
Time Series Analysis With Documentation And Steps I Follow For Analytics Projects.


To illustrate these steps, I will predict the open values for Bitcoin for the next 3 days.

The process I follow is based on the CRISP-DM methodology: https://www.datasciencecentral.com/profiles/blogs/crisp-dm-a-standard-methodology-to-ensure-a-good-outcome

1.- Planning the activities.

To plan the activities I use a spreadsheet. A sample of it is shown below; if you would like the full document, you can get it from this link:
https://github.com/LaranIkal/R-ANALYTICS/blob/master/BitCoinPredictionsAdmin%26FollowUp.ods


| Activity | Activity Description | DueDate | Activity Owner | Status | Comments |
|---|---|---|---|---|---|
| Functional Requirement Specification | A text document explaining the objectives of this project. | 4/19/2018 | Carlos Kassab | Done | |
| Get Data For Analysis | Get initial data to create feasibility analysis | 4/19/2018 | Carlos Kassab | Done | |
| ETL Development | ETL to get final data for next analysis | 4/20/2018 | Carlos Kassab | Done | 2018/04/19: In this case no ETL is needed; the dataset was downloaded from Kaggle: https://www.kaggle.com/vivekchamp/bitcoin/data |
| Exploratory Data Analysis | Dataset summary and histogram to know the data normalization | 4/20/2018 | Carlos Kassab | In progress | |
| Variables Frequency | Frequency of variable occurrence (frequency of value changes, etc.) | 4/20/2018 | Carlos Kassab | Done | 2018/04/19: We have already seen that our values change every day. |
| Outliers Analysis | Analysis of variability in numeric variables, shown in charts and grids. | 4/20/2018 | Carlos Kassab | In progress | |
| Time Series Decomposition | Getting metric charts, raw data, seasonality, trend and remainder. | 4/20/2018 | Carlos Kassab | In progress | |
| Modelling | Create the analytics model | 4/25/2018 | Carlos Kassab | Not Started | |
| SQL View Development | For training, validation and testing | NA | Carlos Kassab | Not Started | 2018/04/19: No SQL view needed, everything is done inside the R script. |
| Model Selection | Use a random parameter search algorithm to find the right model for this data. | 4/25/2018 | Carlos Kassab | Not Started | |
| Model Fine Tuning | After finding the right algorithm, find the right model parameters to be used. | 4/25/2018 | Carlos Kassab | Not Started | |
| Chart Development | Final data chart development | 4/25/2018 | Carlos Kassab | Not Started | |
| Data Validation | Run the analytics model daily for at least 2 weeks in order to see its behavior. | NA | Carlos Kassab | Not Started | |
| Deployment | Schedule the automatic execution of the R code. | NA | Carlos Kassab | Not Started | |
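
The "Model Selection" activity in the plan mentions a random parameter search. The actual search lives in the R sources linked at the end of this post and is not reproduced here, so the following is only a minimal sketch of the idea in Python (standard library only). Everything in it is an assumption made for illustration: Holt's linear-trend smoothing stands in for whatever models the R script compares, the parameter names `alpha` and `beta`, the 200 trials, the mean absolute error metric and the synthetic series are all hypothetical. The only detail taken from the project is the 3-day forecast horizon.

```python
import random


def holt_forecast(values, alpha, beta, horizon):
    """Holt's linear-trend exponential smoothing; returns `horizon` forecasts."""
    level, trend = values[0], values[1] - values[0]
    for y in values[1:]:
        previous_level = level
        level = alpha * y + (1 - alpha) * (level + trend)
        trend = beta * (level - previous_level) + (1 - beta) * trend
    return [level + (step + 1) * trend for step in range(horizon)]


def mean_absolute_error(actual, predicted):
    """Average absolute difference between two equally long sequences."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def random_search(values, horizon=3, trials=200, seed=42):
    """Try random (alpha, beta) pairs and keep the one with the lowest
    error on the last `horizon` observations, which are held out."""
    rng = random.Random(seed)
    train, holdout = values[:-horizon], values[-horizon:]
    best = None
    for _ in range(trials):
        alpha = rng.uniform(0.01, 0.99)
        beta = rng.uniform(0.01, 0.99)
        forecast = holt_forecast(train, alpha, beta, horizon)
        error = mean_absolute_error(holdout, forecast)
        if best is None or error < best[0]:
            best = (error, alpha, beta)
    return best


# Synthetic series (NOT real Bitcoin data): a linear drift plus Gaussian noise.
noise = random.Random(0)
series = [100 + 0.5 * day + noise.gauss(0, 2) for day in range(120)]

error, alpha, beta = random_search(series)
print(f"best alpha={alpha:.3f}, beta={beta:.3f}, holdout MAE={error:.3f}")
```

Holding out the last 3 observations mirrors the 3-day prediction goal: each candidate is scored on exactly the kind of forecast the project needs. The fixed seed only makes the search repeatable.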

The first activity is the functional specification, which is similar to the business understanding phase in the CRISP-DM methodology.

I use a text document for this. For this analysis, you can get the document from this link:
https://github.com/LaranIkal/R-ANALYTICS/blob/master/BitCoinPredictionRequirementSpecification.odt

The next step is to get the data for analysis and create the ETL script. In this case we just downloaded the data from kaggle.com, as mentioned in the documentation, so no ETL script was needed.
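
Even without an ETL script, it is worth confirming that the downloaded file really holds one value per day, as noted in the "Variables frequency" row of the plan. Below is a minimal sketch of that check in Python (standard library only); the author's own analysis is done in R. The column names `Date` and `Open` and the date format are assumptions that must be adjusted to the actual Kaggle file, the helper names are hypothetical, and the sample rows are made up for illustration.

```python
import csv
import io
from datetime import datetime, timedelta


def load_open_series(handle, date_col="Date", value_col="Open",
                     date_fmt="%Y-%m-%d"):
    """Read (date, open) pairs from a CSV file handle, sorted oldest first.

    Column names and date format are assumptions; adjust them to the file.
    """
    rows = []
    for record in csv.DictReader(handle):
        day = datetime.strptime(record[date_col], date_fmt).date()
        rows.append((day, float(record[value_col])))
    rows.sort(key=lambda pair: pair[0])
    return rows


def missing_days(rows):
    """Return the calendar days absent between the first and last observation."""
    present = {day for day, _ in rows}
    first, last = rows[0][0], rows[-1][0]
    span = (last - first).days
    candidates = (first + timedelta(days=offset) for offset in range(span + 1))
    return [day for day in candidates if day not in present]


# Made-up sample rows, for illustration only (not real Bitcoin prices).
sample_csv = io.StringIO(
    "Date,Open\n"
    "2018-01-03,102.0\n"
    "2018-01-01,100.0\n"
    "2018-01-02,101.5\n"
    "2018-01-05,99.0\n"
)
series = load_open_series(sample_csv)
gaps = missing_days(series)
print(len(series), "rows; missing days:", gaps)
```

With a real file, replace the in-memory sample with `open(path, newline="")`. An empty list of gaps confirms the daily frequency.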

Now that we have our data, we are going to analyze it by doing all the activities highlighted in yellow in the grid above. I did this in an R Markdown document; here you can see its output, exported as PDF:

https://github.com/LaranIkal/R-ANALYTICS/blob/master/Bitcoin%20Data%20Analysis.pdf

It seems that GitHub does not render large HTML files, but I am including the HTML link anyway:

https://github.com/LaranIkal/R-ANALYTICS/blob/master/BitcoinDataAnalysis.html

Note: In the end, to keep everything together, I included the time series algorithms in the same Rmd file that creates the BitcoinDataAnalysis.html file.
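
The plan lists a time series decomposition into trend, seasonality and remainder. The decomposition actually used is in the Rmd file and is not shown in this post, so the sketch below is an assumption: a classical additive decomposition, written in Python (standard library only) purely for illustration. The weekly period of 7, the function name and the synthetic series are all hypothetical.

```python
def decompose_additive(values, period=7):
    """Classical additive decomposition for an odd `period`.

    Returns (trend, seasonal, remainder). Trend and remainder are None at
    the edges, where the centered moving average is not defined.
    """
    if period % 2 == 0 or len(values) < 2 * period:
        raise ValueError("need an odd period and at least two full cycles")
    half = period // 2
    n = len(values)

    # Trend: centered moving average over one full cycle.
    trend = [None] * n
    for i in range(half, n - half):
        trend[i] = sum(values[i - half:i + half + 1]) / period

    # Seasonality: mean detrended value per position in the cycle,
    # shifted so the seasonal effects sum to zero over one cycle.
    buckets = [[] for _ in range(period)]
    for i, t in enumerate(trend):
        if t is not None:
            buckets[i % period].append(values[i] - t)
    raw = [sum(bucket) / len(bucket) for bucket in buckets]
    offset = sum(raw) / period
    seasonal = [raw[i % period] - offset for i in range(n)]

    # Remainder: what is left after removing trend and seasonality.
    remainder = [None if t is None else values[i] - t - seasonal[i]
                 for i, t in enumerate(trend)]
    return trend, seasonal, remainder


# Synthetic series (NOT real Bitcoin data): linear trend plus a weekly
# pattern whose values sum to zero, with no noise.
pattern = [3, -1, -2, 0, 1, -3, 2]
series = [10 + 0.5 * i + pattern[i % 7] for i in range(28)]

trend, seasonal, remainder = decompose_additive(series)
print("first defined trend value:", trend[3])
print("seasonal effects for one cycle:", seasonal[:7])
```

Because the synthetic series is built from exactly a linear trend and a zero-sum weekly pattern, the decomposition recovers both and leaves a remainder of essentially zero, which makes it a handy sanity check before trying real data.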

You can get all the sources from the repository below; this is the best way if you are interested in seeing the full analysis:

https://github.com/LaranIkal/R-ANALYTICS


Enjoy it!

Carlos Kassab




