
Predicting Car Battery Failure With R And H2O - Study

# Loading libraries
suppressWarnings( suppressMessages( library( h2o ) ) ) 
suppressWarnings( suppressMessages( library( data.table ) ) )
suppressWarnings( suppressMessages( library( plotly ) ) )
suppressWarnings( suppressMessages( library( DT ) ) )

# Reading data file
# Data from: https://www.kaggle.com/yunlevin/levin-vehicle-telematics
dataFileName = "/Development/Analytics/AnomalyDetection/AutomovileFailurePrediction/v2.csv"
carData = fread( dataFileName, skip=0, header = TRUE )
carBatteryData = data.table( TimeStamp = carData$timeStamp
                             , BatteryVoltage = as.numeric( carData$battery ) 
                            )
rm(carData)

# Data cleaning, filtering and conversion
carBatteryData = na.omit( carBatteryData ) # Keep only rows without missing values

# According to this article: 
# https://shop.advanceautoparts.com/r/advice/car-maintenance/car-battery-voltage-range
#
# A perfect voltage ( without any devices or electronic systems plugged in )  
# is between 13.7V and 14.7V. 
# If the battery isn’t fully charged, the voltage drops to 12.4V at 75% charge, 
# 12V when it’s only at 25% charge, and down to 11.9V when it’s completely discharged. 
#
# Battery voltage while a load is connected is much lower;
# it should be between 9.5V and 10.5V.
#
# This voltage range ensures that your battery can store and deliver enough 
# current to start your car and power all your electronics and electric devices 
# without any difficulty.

carBatteryData = carBatteryData[BatteryVoltage >= 9.5] # Keep readings of at least 9.5V, the lower bound under load
# Truncate timestamps to the minute: keep "YYYY-MM-DD HH:MM:" and append "00" as seconds
carBatteryData$TimeStamp = as.POSIXct( paste0( substr(carBatteryData$TimeStamp,1,17),"00" ) )
carBatteryData = unique(carBatteryData) # Remove duplicate rows (same minute and same voltage)
carBatteryData = carBatteryData[order(TimeStamp)] # Sort readings chronologically
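The cleaning steps above can be tried on a handful of readings with base R only. This is a minimal sketch: the sample values are hypothetical, it assumes the raw timestamps are text in "YYYY-MM-DD HH:MM:SS" form (as the substr() call implies), and it fixes tz = "UTC" for reproducibility, whereas the original code uses the session time zone.

```r
# Hypothetical sample readings; the real data comes from the Kaggle file.
readings <- data.frame(
  TimeStamp      = c("2018-01-31 10:15:42", "2018-01-31 10:15:07",
                     "2018-01-31 10:14:59", "2018-01-31 10:16:30"),
  BatteryVoltage = c(12.6, 12.6, 9.1, 12.5),
  stringsAsFactors = FALSE
)

# Drop readings below 9.5V, the lower bound quoted for a battery under load.
readings <- readings[readings$BatteryVoltage >= 9.5, ]

# Truncate each timestamp to the minute: keep "YYYY-MM-DD HH:MM:" and append "00".
readings$TimeStamp <- as.POSIXct(paste0(substr(readings$TimeStamp, 1, 17), "00"),
                                 tz = "UTC")

# Rows that now share both minute and voltage collapse into one.
readings <- unique(readings)

# Sort chronologically.
readings <- readings[order(readings$TimeStamp), ]
print(readings)
```

The 9.1V reading is filtered out, and the two 12.6V readings taken during the same minute become a single row, leaving one reading at 10:15:00 and one at 10:16:00.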


# Splitting the data: readings from the last date are used for testing, the rest for training.
readingDate = as.Date( format( carBatteryData$TimeStamp, "%Y-%m-%d" ) ) # Calendar date of each reading
lastDate = max( readingDate )
trainingData = carBatteryData[ readingDate != lastDate ]
testingData = carBatteryData[ readingDate == lastDate ]
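A small base R sketch of this date-based split, computing the calendar date only once. The five timestamps and voltages are hypothetical and use UTC.

```r
# Hypothetical minute-level timestamps spanning three days (UTC).
ts <- as.POSIXct(c("2018-01-29 08:00:00", "2018-01-30 09:30:00",
                   "2018-01-30 17:45:00", "2018-01-31 07:10:00",
                   "2018-01-31 18:20:00"), tz = "UTC")
readings <- data.frame(TimeStamp = ts,
                       BatteryVoltage = c(12.7, 12.6, 12.4, 12.5, 11.8))

# Calendar date of every reading, computed once.
readingDate <- as.Date(format(readings$TimeStamp, "%Y-%m-%d"))
lastDate <- max(readingDate)

# Earlier days train the model; the last day is held out for testing.
trainingData <- readings[readingDate != lastDate, ]
testingData  <- readings[readingDate == lastDate, ]
```

Holding out the most recent day, rather than a random sample, mimics the real use case: the model learns from past driving and is then applied to new readings.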



################################################################################
# Creating Anomaly Detection Model
################################################################################

  h2o.init( nthreads = -1, max_mem_size = "5G" )
## 
## H2O is not running yet, starting it now...
## 
## Note:  In case of errors look at the following log files:
##     C:\Users\LaranIkal\AppData\Local\Temp\Rtmp6lTw4H/h2o_LaranIkal_started_from_r.out
##     C:\Users\LaranIkal\AppData\Local\Temp\Rtmp6lTw4H/h2o_LaranIkal_started_from_r.err
## 
## 
## Starting H2O JVM and connecting:  Connection successful!
## 
## R is connected to the H2O cluster: 
##     H2O cluster uptime:         1 seconds 899 milliseconds 
##     H2O cluster timezone:       America/Mexico_City 
##     H2O data parsing timezone:  UTC 
##     H2O cluster version:        3.24.0.2 
##     H2O cluster version age:    1 month and 7 days  
##     H2O cluster name:           H2O_started_from_R_LaranIkal_tzd452 
##     H2O cluster total nodes:    1 
##     H2O cluster total memory:   4.44 GB 
##     H2O cluster total cores:    8 
##     H2O cluster allowed cores:  8 
##     H2O cluster healthy:        TRUE 
##     H2O Connection ip:          localhost 
##     H2O Connection port:        54321 
##     H2O Connection proxy:       NA 
##     H2O Internal Security:      FALSE 
##     H2O API Extensions:         Amazon S3, Algos, AutoML, Core V3, Core V4 
##     R Version:                  R version 3.6.0 (2019-04-26)
  h2o.no_progress() # Disable progress bars for Rmd
  h2o.removeAll() # Cleans h2o cluster state.
## [1] 0
  # Convert the training dataset to H2O format.
  trainingData_hex = as.h2o( trainingData[,2], destination_frame = "train_hex" )
  
  # Build an Isolation forest model
  trainingModel = h2o.isolationForest( training_frame = trainingData_hex
                                       , sample_rate = 0.1
                                       , max_depth = 32
                                       , ntrees = 100
                                      )
  
  # According to the H2O documentation: 
  # http://docs.h2o.ai/h2o/latest-stable/h2o-docs/data-science/if.html
  #
  # Isolation Forest is similar in principle to Random Forest and is built on the basis of decision trees. 
  #
  # Isolation Forest creates multiple decision trees to isolate observations.
  # 
  # Trees are split randomly. The assumption is that:
  #   
  #   IF A UNIT'S MEASUREMENTS ARE SIMILAR TO OTHERS,
  #   IT WILL TAKE MORE RANDOM SPLITS TO ISOLATE IT.
  # 
  #   The fewer splits needed, the more likely the unit is anomalous.
  # 
  # The average number of splits is then used as a score.

  # Calculate score for training dataset
  score <- h2o.predict( trainingModel, trainingData_hex )
  result_pred <- as.vector( score$predict )
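The idea described in the comments above, counting random splits until a value is isolated, can be illustrated in base R for a single variable such as battery voltage. This is a simplified sketch, not H2O's implementation: it skips the per-tree subsampling (sample_rate) and H2O's score normalization, the helper names isolationDepth() and averageDepth() are hypothetical, and the voltages are simulated.

```r
set.seed(42)

# Count the random splits needed to isolate value x inside vector v (one tree).
# Each split point is drawn uniformly between the current min and max,
# and only the side that still contains x is kept.
isolationDepth <- function(x, v, depth = 0, maxDepth = 32) {
  if (length(v) <= 1 || depth >= maxDepth || min(v) == max(v)) return(depth)
  split <- runif(1, min(v), max(v))
  v <- if (x < split) v[v < split] else v[v >= split]
  isolationDepth(x, v, depth + 1, maxDepth)
}

# Average the depth over many independent random trees.
averageDepth <- function(x, v, ntrees = 100) {
  mean(replicate(ntrees, isolationDepth(x, v)))
}

# Simulated readings: 200 voltages around 12.6V plus one 9.6V outlier.
voltages <- c(rnorm(200, mean = 12.6, sd = 0.2), 9.6)

outlierDepth <- averageDepth(9.6, voltages)
typicalDepth <- averageDepth(voltages[1], voltages)
cat("Average splits, outlier:", outlierDepth, " typical reading:", typicalDepth, "\n")
```

The 9.6V reading sits far from the cluster, so a single random split usually separates it, while a reading inside the cluster needs many more splits. That gap is what the anomaly score captures.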


################################################################################
# Setting threshold value for anomaly detection.
################################################################################

  # Setting the desired threshold percentage.
  threshold = .995 # Assume 99.5% of voltage readings are normal
  
  # Use the threshold above to get the score limit for filtering anomalous voltage readings.
  scoreLimit = round( quantile( result_pred, threshold ), 4 )
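To see what the 0.995 quantile does, here is a minimal base R example on 1000 hypothetical, evenly spaced scores (the rounding step from the original code is left out). The limit lands between the 995th and 996th sorted scores, so only the top 0.5% of training scores exceed it.

```r
# Hypothetical anomaly scores: 1000 evenly spaced values between 0.001 and 1.
scores <- 1:1000 / 1000

threshold  <- 0.995
scoreLimit <- unname(quantile(scores, threshold))

# Number of training scores above the limit: 0.5% of 1000.
aboveLimit <- sum(scores > scoreLimit)
aboveLimit
```

Because the limit comes from the training scores, roughly 0.5% of normal readings would still be flagged; this is one reason for the additional "more than 3 anomalies" filter applied later.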
  

  
################################################################################
# Get anomalous voltage readings from testing data, using the model and the scoreLimit obtained from training data.
################################################################################

  # Convert testing data frame to H2O format.
  testingDataH2O = as.h2o( testingData[,2], destination_frame = "testingData_hex" )
  
  # Get score using training model
  testingScore <- h2o.predict( trainingModel, testingDataH2O )

  # Add row score at the beginning of testing dataset
  testingData = cbind( RowScore = round( as.vector( testingScore$predict ), 4 ), testingData )

  # Check if there are anomalous voltage readings from testing data
  anomalies = testingData[ testingData$RowScore > scoreLimit, ]
  # Additional filter before recommending maintenance:
  # if there are more than 3 anomalous voltage readings, display an alert.
  if( dim( anomalies )[1]  > 3 ) { 
    cat( "Show alert on car display: Battery got anomalous voltage readings, it is recommended to take it to service." )
    
    plot_ly( data = anomalies
             , x = ~TimeStamp
             , y = ~BatteryVoltage
             , type = 'scatter'
             , mode = "lines"
             , name = 'Anomalies') %>%
      layout( yaxis = list( title = 'Battery Voltage.' )
              , xaxis = list( categoryorder='trace', title = 'Date - Time.' )
               )
  }
## Show alert on car display: Battery got anomalous voltage readings, it is recommended to take it to service.
if( dim( anomalies )[1]  > 3 ) { 
  DT::datatable(anomalies[,c(2,3)], rownames = FALSE )
}
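The alert rule can be isolated into a small function and checked without H2O. This is a sketch under assumptions: batteryAlert() is a hypothetical helper, the scores are made up, and the "more than 3" cutoff is the one used above, presumably so that a single noisy reading does not trigger a service recommendation.

```r
# Flag test readings whose score exceeds the training limit and decide
# whether to raise the maintenance alert (more than minAnomalies flagged).
batteryAlert <- function(testScores, scoreLimit, minAnomalies = 3) {
  flagged <- testScores > scoreLimit
  list(anomalyRows = which(flagged),
       showAlert   = sum(flagged) > minAnomalies)
}

# Hypothetical scores for six readings from the last day.
res <- batteryAlert(c(0.41, 0.97, 0.99, 0.38, 0.98, 0.96), scoreLimit = 0.95)
res$anomalyRows
res$showAlert
```

Here four readings exceed the limit, so the alert is shown; with only three flagged readings it would stay off.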
Using this approach we may be able to prevent car failures, not only for batteries but in many other cases where sensors are used.
Carlos Kassab
We are using R, more information about R: