Note: This is an update to the article http://laranikalranalytics.blogspot.com/2019/03/using-r-and-h2o-to-identify-product.html. It includes some updates as well as a code optimization from Yana Kane-Esrig ( https://www.linkedin.com/in/ykaneesrig/ ). As she mentioned in a message:

"The code you posted has two nested for() {} loops. It took a very long time to run. I used just one for() loop. It was much faster."

Here is her original code:

    num_rows=nrow(allData)
    for(i in 1:ncol(allData)) {
      temp = allData [,i]
      cat( "Processing column:", i, ", number missing:", sum( is.na(temp)), "\n" )
      temp_mising =is.na( allData[, i])
      temp_values = allData[,i][! temp_mising]
      temp_random = sample(temp_values, size = num_rows, replace = TRUE)
      temp_imputed = temp
      temp_imputed[temp_mising]= temp_random [temp_mising]
      # describe(temp_imputed)
      allData [,i] = temp_imputed
      cat( "Processed column:", i, ", number missing:", sum( is.na(allData [,i
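The single-loop idea in the code above can be tried on a small example. The following is only a minimal sketch, not Yana's code or the code from the original post: the function name `impute_random` and the `toy` data frame are hypothetical, made up for illustration. It also differs from her version in two small ways: it draws only as many values as there are missing cells (instead of `num_rows` values per column), and it samples positions with `sample.int()` so that a column with a single observed numeric value is not misread by `sample()` as the range 1:x.

```r
# Hypothetical helper (not from the original post): fill each column's NAs
# with values drawn at random, with replacement, from the observed values
# of that same column. One pass over the columns, no nested loop over rows.
impute_random <- function(df) {
  for (i in seq_along(df)) {
    column     <- df[[i]]
    is_missing <- is.na(column)
    observed   <- column[!is_missing]
    n_missing  <- sum(is_missing)

    # Nothing to fill, or nothing to draw from: leave the column unchanged.
    if (n_missing == 0 || length(observed) == 0) next

    # Sample positions rather than values: sample(x, ...) treats a single
    # number x as the range 1:x, which is not what we want here.
    picks <- sample.int(length(observed), size = n_missing, replace = TRUE)
    column[is_missing] <- observed[picks]
    df[[i]] <- column
  }
  df
}

set.seed(42)  # only to make this illustration reproducible
toy <- data.frame(
  a = c(1, NA, 3, NA, 5),
  b = c("x", NA, "y", "y", NA),
  stringsAsFactors = FALSE
)
imputed <- impute_random(toy)
print(imputed)
```

Because the work inside the loop is vectorized over rows, the running time grows with the number of columns rather than with rows times columns, which is likely the reason the one-loop version ran so much faster.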
R-Analytics
How to do things related to R, Installation, packages usage and algorithm samples.