
Using R and H2O Isolation Forest anomaly detection for data quality, further analysis.

Introduction: This is the second article on data quality; for the first part, please go to: http://laranikalranalytics.blogspot.com/2019/11/using-r-and-h2o-isolation-forest-for.html

Isolation Forest builds an ensemble of isolation trees, and because these trees are created randomly, the trained model varies from one run to the next. To get a more robust result, I will train 3 isolation forest models for better anomaly detection. I will also use Apache Spark for data handling. To make the example complete, testing data will be scored after the 3 IF (Isolation Forest) models are trained. This way of using Isolation Forest is fairly general and also applies to predictive maintenance. I am working with data from this file:
https://www.kaggle.com/bradklassen/pga-tour-20102018-data

# Set Java parameters, enough memory for Java.
options( java.parameters = c( "-Xmx40G" ) ) # 40GB Ram for Java

# Loading libraries
suppressWarnings(suppressMessages(library(sparklyr)))
suppressWarnings(suppres…
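
As a rough sketch of why several models help, the base R example below averages per-row anomaly scores from three models and flags the rows above a chosen quantile. Averaging is only one possible way to combine the models and is an assumption here, not necessarily what the full article does. The helpers `combine_scores` and `flag_anomalies` are hypothetical names, and the scores are simulated placeholders; in practice each vector would come from one trained model (for H2O, the `predict` column returned by `h2o.predict()`).

```r
# Average per-row anomaly scores across several isolation forest models.
# Hypothetical helper: not part of h2o or of the original article.
combine_scores <- function(score_list) {
  stopifnot(length(score_list) > 0)
  n <- unique(vapply(score_list, length, integer(1)))
  stopifnot(length(n) == 1)             # every model must score the same rows
  rowMeans(do.call(cbind, score_list))  # mean score per row across models
}

# Flag rows whose averaged score exceeds a chosen quantile of all scores.
flag_anomalies <- function(scores, prob = 0.95) {
  scores > quantile(scores, probs = prob, names = FALSE)
}

# Simulated placeholder scores standing in for three trained models.
# Row 100 is built to be an obvious outlier; the noise mimics the
# run-to-run variation between randomly built forests.
set.seed(42)
base <- c(runif(99, 0.1, 0.4), 0.95)
score_list <- lapply(1:3, function(i) {
  pmin(pmax(base + rnorm(100, sd = 0.02), 0), 1)
})

avg_scores <- combine_scores(score_list)
flagged <- which(flag_anomalies(avg_scores, prob = 0.99))
print(flagged)
```

Averaging smooths out the noise that any single randomly built forest adds to a row's score, so a row has to look anomalous to the models on average before it is flagged. The quantile threshold (`prob`) is an assumption as well and should be tuned to the data.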