Skip to main content

Production Line Stations Maintenance Prediction - Process Flow.

Steps Needed in a Process to Detect Anomalies And Have a Maintenance Notice Before We Have Scrap Created on The Production Line.

Describing my previous articles( 1, 2 ) process flow:

  • Get Training Data.
    • At least 2 weeks of passed units measurements.
  • Data Cleaning.
    • Ensure no null values.
    • At least 95% data must have measurement values.
  • Anomalies Detection Model Creation.
    • Deep Learning Autoencoders.
      • or
    • Isolation Forest.
  • Set Yield Threshold Desired, Normally 99%
  • Get Prediction Value Limit by Linking Yield Threshold to Training Data Using The Anomaly Detection Model Created.
  • Get Testing Data.
    • Last 24 Hour Data From Station Measurements, Passed And Failed Units.
  • Testing Data Cleaning.
    • Ensure no null values.
  • Get Anomalies From Testing Data by Using The Model Created And Prediction Limit Found Before.
  • If Anomalies Found, Notify Maintenance to Avoid Scrap.
  • Display Chart Showing Last 24 Hour Anomalies And Failures Found:



As you can see( Anomalies in blue, Failures in orange ), we are detecting anomalies( Units close to measurement limits ) before failures.

Sending an alert when the first or second anomaly was detected will prevent scrap because the station will get maintenance to avoid failures.



Carlos Kassab

We are using R, more information about R:











Popular posts from this blog

UPDATED: Using R and H2O to identify product anomalies during the manufacturing process.

Note.  This is an update to article:  http://laranikalranalytics.blogspot.com/2019/03/using-r-and-h2o-to-identify-product.html - It has some updates but also code optimization from  Yana Kane-Esrig(  https://www.linkedin.com/in/ykaneesrig/ ) , as she mentioned in a message: The code you posted has two nested for() {} loops. It took a very long time to run. I used just one for() loop. It was much faster   Here her original code: num_rows=nrow(allData) for(i in 1:ncol(allData)) {   temp = allData [,i]   cat( "Processing column:", i, ", number missing:", sum( is.na(temp)), "\n" )    temp_mising =is.na( allData[, i])    temp_values = allData[,i][! temp_mising]    temp_random = sample(temp_values, size = num_rows, replace = TRUE)      temp_imputed = temp   temp_imputed[temp_mising]= temp_random [temp_mising]   # describe(temp_imputed)   allData [,i] = temp_imputed      cat( "Processed column:", i, ", number missing:", sum( is.na(allData [,i

Installing our R development environment on Ubuntu 20.04

  Step 1: Install R,  Here the link with instructions:  How to instal R on Ubuntu 20.04 Adding the steps I followed because sometimes the links become unavailable: Add GPG key: $ sudo apt-key adv --keyserver keyserver.ubuntu.com --recv-keys E298A3A825C0D65DFD57CBB651716619E084DAB9 Output: Executing: /tmp/apt-key-gpghome.NtZgt0Un4R/gpg.1.sh --keyserver keyserver.ubuntu.com --recv-keys E298A3A825C0D65DFD57CBB651716619E084DAB9 gpg: key 51716619E084DAB9: public key "Michael Rutter " imported gpg: Total number processed: 1 gpg: imported: 1 Add repository: $ sudo add-apt-repository 'deb https://cloud.r-project.org/bin/linux/ubuntu focal-cran40/' Output: Hit:1 https://deb.opera.com/opera-stable stable InRelease Hit:2 http://us.archive.ubuntu.com/ubuntu focal InRelease Hit:3 http://archive.canonical.com/ubuntu focal InRelease

Using R and H2O Isolation Forest anomaly detection for data quality, further analysis.

Introduction: This is the second article on data quality, for the first part, please go to:  http://laranikalranalytics.blogspot.com/2019/11/using-r-and-h2o-isolation-forest-for.html Since Isolation Forest is building an ensemble of isolation trees, and these trees are created randomly, there is a lot of randomness in the isolation forest training, so, to have a more robust result, 3 isolation forest models will be trained for a better anomaly detection. I will also use Apache Spark for data handling. For a full example, testing data will be used after training the 3 IF(Isolation Forest) models. This way of using Isolation Forest is kind of a general usage also for maintenance prediction. I am working with data from file: https://www.kaggle.com/bradklassen/pga-tour-20102018-data NOTE: There was a problem with the data from the link above, so I created some synthetic data that can be downloaded from this link:  Golf Tour Synthetic Data # Set Java parameters, enough memo