
Installing our R development environment on Ubuntu 20.04

 

Step 1: Install R. Here is the link with instructions: How to install R on Ubuntu 20.04
I am adding the steps I followed because links sometimes become unavailable:

Add GPG key:

$ sudo apt-key adv --keyserver keyserver.ubuntu.com --recv-keys E298A3A825C0D65DFD57CBB651716619E084DAB9

Output:
Executing: /tmp/apt-key-gpghome.NtZgt0Un4R/gpg.1.sh --keyserver keyserver.ubuntu.com --recv-keys E298A3A825C0D65DFD57CBB651716619E084DAB9
gpg: key 51716619E084DAB9: public key "Michael Rutter" imported
gpg: Total number processed: 1
gpg: imported: 1

Add repository:

$ sudo add-apt-repository 'deb https://cloud.r-project.org/bin/linux/ubuntu focal-cran40/'

Output:
Hit:1 https://deb.opera.com/opera-stable stable InRelease
Hit:2 http://us.archive.ubuntu.com/ubuntu focal InRelease
Hit:3 http://archive.canonical.com/ubuntu focal InRelease
Hit:4 http://us.archive.ubuntu.com/ubuntu focal-updates InRelease
Hit:5 http://us.archive.ubuntu.com/ubuntu focal-backports InRelease
Get:6 https://cloud.r-project.org/bin/linux/ubuntu focal-cran40/ InRelease [3 622 B]
Hit:7 http://us.archive.ubuntu.com/ubuntu focal-security InRelease
Get:8 https://cloud.r-project.org/bin/linux/ubuntu focal-cran40/ Packages [31.6 kB]
Fetched 35.3 kB in 2s (20.9 kB/s)
Reading package lists... Done
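
If you repeat this on another Ubuntu release, the repository line changes with the release codename. The following is only a sketch: the helper name cran_repo_line is made up for illustration, and it assumes the same "-cran40" suffix used above applies to your release, so check the CRAN Ubuntu page before relying on it.

```shell
# Build the CRAN apt source line for a given Ubuntu codename.
# Assumption: the "-cran40" suffix (R 4.x) used in this post also
# applies to the release you are targeting.
cran_repo_line() {
    codename="$1"
    printf 'deb https://cloud.r-project.org/bin/linux/ubuntu %s-cran40/\n' "$codename"
}

# Usage (assumes lsb_release is installed, as on a stock Ubuntu desktop):
#   sudo add-apt-repository "$(cran_repo_line "$(lsb_release -cs)")"
```

On Ubuntu 20.04, lsb_release -cs prints focal, so the helper produces the same line that was typed by hand above.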

Update repositories:

$ sudo apt update

Install R:

$ sudo apt install r-base

Step 2: Download and install RStudio, an excellent R IDE:

RStudio download link. Select the package for Ubuntu 18/Debian 10 (64-bit deb package).
Note: I am showing version 1.4.1106 here; adjust the file name to the version you are installing.
After downloading the RStudio installation file, rstudio-1.4.1106-amd64.deb, it is good practice (and in some cases required) to import RStudio's public code-signing key before installing. See this page for details: RStudio Public Key

Here is the verification process for the RStudio install file. First, get the public key:
$ gpg --keyserver keys.gnupg.net --recv-keys 3F32EE77E331692F

Output:
gpg: key 3F32EE77E331692F: public key "RStudio, Inc. (code signing) <info@rstudio.com>" imported
gpg: Total number processed: 1
gpg: imported: 1

Verify the RStudio install file:
$ dpkg-sig --verify rstudio-1.4.1106-amd64.deb

Output:
Command 'dpkg-sig' not found, but can be installed with:
sudo apt install dpkg-sig

Install the needed package:

$ sudo apt install dpkg-sig

Verify again:
$ dpkg-sig --verify rstudio-1.4.1106-amd64.deb

Output:
Processing rstudio-1.4.1106-amd64.deb...
GOODSIG _gpgbuilder FE8564CFF1AB93F1728645193F32EE77E331692F 1613072032
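
If you script this step, you can make the install depend on the verification result. This is a minimal sketch: the helper name has_good_signature is hypothetical, and it assumes that a successful check always prints a line containing GOODSIG, as in the output above.

```shell
# Succeed only if the dpkg-sig output read from stdin contains GOODSIG.
# Assumption: dpkg-sig reports a valid signature with a GOODSIG line,
# as in the output shown in this post.
has_good_signature() {
    grep -q 'GOODSIG'
}

# Usage: install only when the signature check passes.
#   dpkg-sig --verify rstudio-1.4.1106-amd64.deb | has_good_signature \
#       && sudo dpkg -i rstudio-1.4.1106-amd64.deb
```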

Once verified, install the downloaded RStudio package: open a terminal window and, from the folder where you downloaded RStudio, run:
$ sudo dpkg -i rstudio-1.4.1106-amd64.deb


Here is my output, because I got some errors:
$ sudo dpkg -i rstudio-1.4.1106-amd64.deb

Output:
Selecting previously unselected package rstudio.
(Reading database ... 192590 files and directories currently installed.)
Preparing to unpack rstudio-1.4.1106-amd64.deb ...
Unpacking rstudio (1.4.1106) ...
dpkg: dependency problems prevent configuration of rstudio:
 rstudio depends on libclang-dev; however:
  Package libclang-dev is not installed.
 rstudio depends on libpq5; however:
  Package libpq5 is not installed.
dpkg: error processing package rstudio (--install):
 dependency problems - leaving unconfigured
Processing triggers for gnome-menus (3.36.0-1ubuntu1) ...
Processing triggers for desktop-file-utils (0.24-1ubuntu3) ...
Processing triggers for mime-support (3.64ubuntu1) ...
Processing triggers for hicolor-icon-theme (0.17-2) ...
Processing triggers for shared-mime-info (1.15-1) ...
Errors were encountered while processing:
 rstudio

To fix the problem, I tried installing libclang-dev:
$ sudo apt install libclang-dev

Output:
Reading package lists... Done
Building dependency tree
Reading state information... Done
You might want to run 'apt --fix-broken install' to correct these.
The following packages have unmet dependencies:
 libclang-dev : Depends: libclang-10-dev (>= 10~) but it is not going to be installed
 rstudio : Depends: libpq5 but it is not going to be installed
E: Unmet dependencies. Try 'apt --fix-broken install' with no packages (or specify a solution).

So I ran it:
$ sudo apt --fix-broken install

Then I ran the install again and it worked just fine:
$ sudo dpkg -i rstudio-1.4.1106-amd64.deb
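
An alternative that usually avoids the dependency errors is to let apt install the local file, since apt resolves dependencies and dpkg -i does not. The sketch below only prints the command so you can review it first; the helper name deb_install_command is made up for illustration.

```shell
# Print the apt command that installs a local .deb with its dependencies.
# apt needs a path rather than a bare name to treat the argument as a
# file, so a name without a slash gets a "./" prefix.
deb_install_command() {
    case "$1" in
        */*) path="$1" ;;
        *)   path="./$1" ;;
    esac
    printf 'sudo apt install %s\n' "$path"
}

# Usage: print the command, then run it once you are happy with it.
#   deb_install_command rstudio-1.4.1106-amd64.deb
```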
 
After this you will have your basic R development environment. The next thing to do is to install R packages, to have a more complete development environment.

Step 3: Install a series of Ubuntu packages needed before installing R packages:

Open a terminal window and run all of the following installation commands:

sudo apt-get install libcurl4-openssl-dev
sudo apt-get install libssl-dev
sudo apt-get install libxml2-dev
sudo apt install libmariadbclient-dev
sudo apt-get install libpq-dev
sudo apt-get install unixodbc unixodbc-dev
sudo apt-get install libcairo2-dev
sudo apt-get install libgtk2.0-dev
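
Before or after running the commands above, you may want to see which of these packages are still missing. This is a rough sketch with hypothetical helper names (pkg_installed, list_missing); it assumes dpkg -s returns a non-zero status for a package that is not installed.

```shell
# Default check: dpkg -s fails when the package is not installed.
# This is a rough check, not a full package-state inspection.
pkg_installed() {
    dpkg -s "$1" >/dev/null 2>&1
}

# Print, one per line, every package for which the check command fails.
# $1 is the check command; the remaining arguments are package names.
list_missing() {
    check="$1"
    shift
    for pkg in "$@"; do
        "$check" "$pkg" || printf '%s\n' "$pkg"
    done
}

# Usage:
#   list_missing pkg_installed libcurl4-openssl-dev libssl-dev libxml2-dev \
#       libmariadbclient-dev libpq-dev unixodbc unixodbc-dev \
#       libcairo2-dev libgtk2.0-dev
```

Passing the check as an argument keeps the loop independent of dpkg, so the same function works with any other test you prefer.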


The other important thing to do before installing R packages is to install Java and set it as the default. For this installation I am using Java 8.

This is a good reference article: How To Install Java 8 on Ubuntu 20.04/18.04/16.04


In short, on Ubuntu 20.04, run these instructions from a terminal window:
$ sudo su

Install Java 8:
# apt install openjdk-8-jdk

Check the Java version to confirm it was installed:
# java -version

You should get output similar to this:
openjdk version "1.8.0_282"
OpenJDK Runtime Environment (build 1.8.0_282-8u282-b08-0ubuntu1~20.04-b08)
OpenJDK 64-Bit Server VM (build 25.282-b08, mixed mode)

Configure Java:
# cd /usr/lib/jvm
# ln -s /usr/lib/jvm/java-8-openjdk-amd64 /usr/lib/jvm/default-java
# export LD_LIBRARY_PATH=/usr/lib/jvm/default-java/lib:$LD_LIBRARY_PATH
# export JAVA_HOME=/usr/lib/jvm/default-java
# R CMD javareconf

KEEP THIS TERMINAL WINDOW OPEN AS IT IS NOW AND CONTINUE TO THE R PACKAGES INSTALLATION
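
If another Java version is also installed, it is worth confirming that the default one really is Java 8 before running R CMD javareconf. The following is a sketch only: the helper names are hypothetical, and it assumes the version banner has the same shape as the output shown above.

```shell
# Extract the quoted version string from a "java -version" banner on stdin.
# Assumption: the first banner line looks like: openjdk version "1.8.0_282"
java_version_from_banner() {
    sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1
}

# Succeed only for Java 8, which reports itself as 1.8.x.
is_java8() {
    case "$1" in
        1.8.*) return 0 ;;
        *)     return 1 ;;
    esac
}

# Usage (java -version writes its banner to stderr, hence 2>&1):
#   is_java8 "$(java -version 2>&1 | java_version_from_banner)" \
#       && R CMD javareconf
```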

Step 4: Install R packages:

In the same terminal as above, run R:
# R
 
Now continue to the R packages installation: 
 
tidyverse -> Opinionated collection of R packages designed for data science.
install.packages( "tidyverse", dependencies = TRUE )

data.table -> Fast manipulation of large datasets.
install.packages( "data.table", dependencies = TRUE )

sqldf -> Run SQL instructions on your datasets.
install.packages( "sqldf", dependencies = TRUE )

stringdist -> Computes string distances, very useful when creating clusters of catalog descriptions.
install.packages( "stringdist", dependencies = TRUE )

RODBC -> Database access.
install.packages( "RODBC", dependencies = TRUE )

xts -> Non-regular time series package.
install.packages( "xts", dependencies = TRUE )

dygraphs -> Nice graphs for R
install.packages( "dygraphs", dependencies = TRUE )

openxlsx -> Read, Write and Edit XLSX Files
install.packages( "openxlsx", dependencies = TRUE )

lubridate -> Date handling.
install.packages( "lubridate", dependencies = TRUE )

forecast -> ARIMA and forecast package
install.packages( "forecast", dependencies = TRUE )

mailR -> Send email from R
install.packages( "mailR", dependencies = TRUE )

gbm -> gbm (Gradient Boosting Machine) algorithm for R
install.packages( "gbm", dependencies = TRUE )

xgboost -> xgboost algorithm for R
install.packages( "xgboost", dependencies = TRUE )

aTSA -> Time Series Analysis
install.packages( "aTSA", dependencies = TRUE )

rattle -> Tab-oriented user interface that is similar to Microsoft Office's ribbon interface. It makes getting started with data mining in R very easy.
install.packages( "rattle", dependencies = TRUE )

Rcmdr -> R Commander. A platform-independent basic-statistics GUI (graphical user interface) for R, based on the tcltk package.
install.packages( "Rcmdr", dependencies = TRUE )

itsmr -> Time Series Analysis Using the Innovations Algorithm. Provides functions for modeling and forecasting time series data.
install.packages( "itsmr", dependencies = TRUE )

stlplus  -> A new implementation of STL. Allows for NA values, local quadratic smoothing, post-trend smoothing, and endpoint blending. The usage is very similar to that of R's built-in stl().
install.packages( "stlplus", dependencies = TRUE )

TSA -> Useful to compute data seasonality.
install.packages( "TSA", dependencies = TRUE )
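
If you prefer to install everything in one go from the shell instead of typing each line at the R prompt, the list can be turned into a single install.packages() call. This is a sketch under assumptions: the helper name r_install_expr is made up, and the repos URL is a choice I added so that R does not stop to ask for a CRAN mirror.

```shell
# Build one install.packages() call for all package names given as arguments.
# Assumption: the repos URL is added here only to avoid the interactive
# mirror prompt; change it to the mirror you prefer.
r_install_expr() {
    list=""
    for pkg in "$@"; do
        # Append each name in double quotes, comma-separated.
        list="${list}${list:+, }\"${pkg}\""
    done
    printf 'install.packages(c(%s), dependencies = TRUE, repos = "https://cloud.r-project.org")\n' "$list"
}

# Usage, from the same root terminal where JAVA_HOME was exported:
#   Rscript -e "$(r_install_expr tidyverse data.table sqldf stringdist)"
```

Installing one package at a time, as shown above, makes it easier to see which package failed; the single call is just more convenient when rebuilding a machine.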

Enjoy it!

Carlos Kassab
https://www.linkedin.com/in/carlos-kassab-48b40743/

More information about R:
https://www.r-bloggers.com
