
Installing our R development environment on Ubuntu 20.04

 

Step 1: Install R. Here is the link with instructions: How to install R on Ubuntu 20.04
I am also listing the steps I followed, because links sometimes become unavailable:

Add GPG key:

$ sudo apt-key adv --keyserver keyserver.ubuntu.com --recv-keys E298A3A825C0D65DFD57CBB651716619E084DAB9

Output:
Executing: /tmp/apt-key-gpghome.NtZgt0Un4R/gpg.1.sh --keyserver keyserver.ubuntu.com --recv-keys E298A3A825C0D65DFD57CBB651716619E084DAB9
gpg: key 51716619E084DAB9: public key "Michael Rutter " imported
gpg: Total number processed: 1
gpg: imported: 1
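The key ID that gpg prints (51716619E084DAB9) is the last 16 hex digits of the 40-digit fingerprint passed to --recv-keys, which gives a quick way to confirm that the expected key was imported. A minimal sketch of that check; the function name long_key_id is only illustrative:

```shell
#!/bin/sh
# Print the long key ID (last 16 hex digits) of a 40-digit key fingerprint.
# long_key_id is a hypothetical helper name, not part of gpg or apt.
long_key_id() {
    printf '%s' "$1" | tail -c 16
    printf '\n'
}

# Fingerprint used in the apt-key command of this article.
long_key_id E298A3A825C0D65DFD57CBB651716619E084DAB9
```

The printed value should match the ID shown in the "gpg: key ..." line of the output.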

Add repository:

$ sudo add-apt-repository 'deb https://cloud.r-project.org/bin/linux/ubuntu focal-cran40/'

Output:
Hit:1 https://deb.opera.com/opera-stable stable InRelease
Hit:2 http://us.archive.ubuntu.com/ubuntu focal InRelease
Hit:3 http://archive.canonical.com/ubuntu focal InRelease
Hit:4 http://us.archive.ubuntu.com/ubuntu focal-updates InRelease
Hit:5 http://us.archive.ubuntu.com/ubuntu focal-backports InRelease
Get:6 https://cloud.r-project.org/bin/linux/ubuntu focal-cran40/ InRelease [3 622 B]
Hit:7 http://us.archive.ubuntu.com/ubuntu focal-security InRelease
Get:8 https://cloud.r-project.org/bin/linux/ubuntu focal-cran40/ Packages [31.6 kB]
Fetched 35.3 kB in 2s (20.9 kB/s)
Reading package lists... Done

Update repositories:

$ sudo apt update

Install R:

$ sudo apt install r-base
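The repository line above is hard-coded for focal, the codename of Ubuntu 20.04. As a rough sketch, the same line can be built from a codename so the step is easier to adapt; cran_repo_line is a made-up helper name, and the URL and the cran40 suffix are simply the ones used in this article:

```shell
#!/bin/sh
# Print the CRAN apt source line for an Ubuntu release codename.
# cran_repo_line is a hypothetical helper; URL and "cran40" suffix come from this article.
cran_repo_line() {
    codename="$1"
    printf 'deb https://cloud.r-project.org/bin/linux/ubuntu %s-cran40/\n' "$codename"
}

# focal is the codename of Ubuntu 20.04, the release used in this article.
cran_repo_line focal
```

On a real machine the codename can be read with lsb_release -cs, for example: sudo add-apt-repository "$(cran_repo_line "$(lsb_release -cs)")". Check that CRAN actually publishes a repository for your release before relying on this.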

Step 2: Download and install RStudio, an excellent R IDE:

RStudio download link. Select the package for Ubuntu 18/Debian 10 (64-bit deb package).
Note: I am showing version 1.4.1106 here; please adjust the file names to the version you are installing.
After downloading the RStudio installation file, rstudio-1.4.1106-amd64.deb, it is good practice (and in some cases required) to import RStudio's public code-signing key before installing. For more detail, please visit this page: RStudio Public Key

Here is the verification process for the RStudio install file. First, get the public key:
$ gpg --keyserver keys.gnupg.net --recv-keys 3F32EE77E331692F

Output:
gpg: key 3F32EE77E331692F: public key "RStudio, Inc. (code signing) <info@rstudio.com>" imported
gpg: Total number processed: 1
gpg: imported: 1

Verify the RStudio install file:
$ dpkg-sig --verify rstudio-1.4.1106-amd64.deb

Output:
Command 'dpkg-sig' not found, but can be installed with:
sudo apt install dpkg-sig

Install the missing package:

$ sudo apt install dpkg-sig
Verify again:
$ dpkg-sig --verify rstudio-1.4.1106-amd64.deb

Output:
Processing rstudio-1.4.1106-amd64.deb...
GOODSIG _gpgbuilder FE8564CFF1AB93F1728645193F32EE77E331692F 1613072032
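If you want to script this check instead of reading the output by eye, one option is to look for a GOODSIG line that carries the expected fingerprint. This is only a sketch: has_good_sig is a hypothetical helper, and it assumes the output has the same shape as the one shown in this article.

```shell
#!/bin/sh
# Succeed only if the verification output contains a GOODSIG line
# for the expected signing-key fingerprint.
# has_good_sig is a hypothetical helper, not part of dpkg-sig.
has_good_sig() {
    # $1: text printed by the verification command
    # $2: expected signing-key fingerprint
    printf '%s\n' "$1" | grep -Eq "^GOODSIG [^ ]+ $2( |\$)"
}

# Sample text copied from the verification output shown in this article.
sample='Processing rstudio-1.4.1106-amd64.deb...
GOODSIG _gpgbuilder FE8564CFF1AB93F1728645193F32EE77E331692F 1613072032'

if has_good_sig "$sample" FE8564CFF1AB93F1728645193F32EE77E331692F; then
    echo "signature OK"
else
    echo "signature NOT verified"
fi
```

In real use, replace the sample text with "$(dpkg-sig --verify rstudio-1.4.1106-amd64.deb)" and continue with the installation only when the function succeeds.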

Once verified, install the downloaded RStudio package: open a terminal window and, from the folder where you downloaded RStudio, execute:
$ sudo dpkg -i rstudio-1.4.1106-amd64.deb


I am showing my output because I got some errors:
$ sudo dpkg -i rstudio-1.4.1106-amd64.deb

Output:
Selecting previously unselected package rstudio.
(Reading database ... 192590 files and directories currently installed.)
Preparing to unpack rstudio-1.4.1106-amd64.deb ...
Unpacking rstudio (1.4.1106) ...
dpkg: dependency problems prevent configuration of rstudio:
rstudio depends on libclang-dev; however:
Package libclang-dev is not installed.
rstudio depends on libpq5; however:
Package libpq5 is not installed.
dpkg: error processing package rstudio (--install):
dependency problems - leaving unconfigured
Processing triggers for gnome-menus (3.36.0-1ubuntu1) ...
Processing triggers for desktop-file-utils (0.24-1ubuntu3) ...
Processing triggers for mime-support (3.64ubuntu1) ...
Processing triggers for hicolor-icon-theme (0.17-2) ...
Processing triggers for shared-mime-info (1.15-1) ...
Errors were encountered while processing:
rstudio

To fix the problem, I first tried installing libclang-dev:
$ sudo apt install libclang-dev

Output:
Reading package lists... Done
Building dependency tree
Reading state information... Done
You might want to run 'apt --fix-broken install' to correct these.
The following packages have unmet dependencies:
libclang-dev : Depends: libclang-10-dev (>= 10~) but it is not going to be installed
rstudio : Depends: libpq5 but it is not going to be installed
E: Unmet dependencies. Try 'apt --fix-broken install' with no packages (or specify a solution).

So I ran the suggested command:
$ sudo apt --fix-broken install
Then I ran the install again, and it worked just fine:
$ sudo dpkg -i rstudio-1.4.1106-amd64.deb
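When dpkg stops on dependency problems, the names of the missing packages are buried in its messages. A small sketch for pulling them out follows; missing_deps is a hypothetical helper and it assumes English messages worded like the ones in my output:

```shell
#!/bin/sh
# Read dpkg messages on stdin and print the names of packages
# reported as "Package NAME is not installed."
# missing_deps is a hypothetical helper, not part of dpkg.
missing_deps() {
    sed -n 's/.*Package \([^ ]*\) is not installed\..*/\1/p'
}

# Sample lines copied from the dpkg output shown in this article.
printf '%s\n' \
    'rstudio depends on libclang-dev; however:' \
    'Package libclang-dev is not installed.' \
    'rstudio depends on libpq5; however:' \
    'Package libpq5 is not installed.' | missing_deps
```

As an alternative to the dpkg and --fix-broken sequence, apt can install a local file and resolve its dependencies in one step: sudo apt install ./rstudio-1.4.1106-amd64.deb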
 
After this you will have your basic R development environment. The next thing to do is to install R packages, to have a more complete development environment.

Step 3: Install a series of Ubuntu packages needed before installing R packages:

Open a terminal window and run all the next installation commands:

sudo apt-get install libcurl4-openssl-dev
sudo apt-get install libssl-dev
sudo apt-get install libxml2-dev
sudo apt install libmariadbclient-dev
sudo apt-get install libpq-dev
sudo apt-get install unixodbc unixodbc-dev
sudo apt-get install libcairo2-dev
sudo apt-get install libgtk2.0-dev
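The commands above can also be combined into a single apt-get call. The sketch below only prints the combined command so you can review it before running it; build_install_cmd and the PACKAGES variable are names I made up for illustration:

```shell
#!/bin/sh
# Same packages as the individual commands in this step.
PACKAGES="libcurl4-openssl-dev libssl-dev libxml2-dev libmariadbclient-dev libpq-dev unixodbc unixodbc-dev libcairo2-dev libgtk2.0-dev"

# Print one apt-get command that installs every package given as an argument.
# build_install_cmd is a hypothetical helper; it prints, it does not install.
build_install_cmd() {
    printf 'sudo apt-get install %s\n' "$*"
}

# Word splitting of $PACKAGES is intentional: each name becomes one argument.
build_install_cmd $PACKAGES
```

Copy the printed line into the terminal to run the installation in one go.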


The other important thing to do before installing R packages is to install Java and set it as the default. For this installation I am using Java 8.

This is a good reference article: How To Install Java 8 on Ubuntu 20.04/18.04/16.04


In short, on Ubuntu 20.04, run these instructions from a terminal window:
$ sudo su

Install Java 8:
# apt install openjdk-8-jdk

Check the Java version to confirm it was installed:
# java -version

You should get output similar to this:
openjdk version "1.8.0_282"
OpenJDK Runtime Environment (build 1.8.0_282-8u282-b08-0ubuntu1~20.04-b08)
OpenJDK 64-Bit Server VM (build 25.282-b08, mixed mode)

Configure Java:
# cd /usr/lib/jvm
# ln -s /usr/lib/jvm/java-8-openjdk-amd64 /usr/lib/jvm/default-java
# export LD_LIBRARY_PATH=/usr/lib/jvm/default-java/lib:$LD_LIBRARY_PATH
# export JAVA_HOME=/usr/lib/jvm/default-java
# R CMD javareconf

KEEP THIS TERMINAL WINDOW OPEN AS IT IS NOW AND CONTINUE TO THE R PACKAGES INSTALLATION
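Before running R CMD javareconf it can help to confirm that the java on your PATH really is version 8. A minimal sketch that reads the version text and prints the major version; java_major_version is a hypothetical helper, and it assumes a version string with at least two dotted numbers, such as "1.8.0_282" or "11.0.10":

```shell
#!/bin/sh
# Read `java -version` text on stdin and print the major Java version.
# "1.8.0_282" is reported as 8; "11.0.10" is reported as 11.
# java_major_version is a hypothetical helper, not part of the JDK.
java_major_version() {
    sed -n 's/.* version "\([0-9]*\)\.\([0-9]*\).*/\1 \2/p' | head -n 1 | {
        read -r first second
        if [ "$first" = "1" ]; then
            echo "$second"
        else
            echo "$first"
        fi
    }
}

# Sample line copied from the java -version output shown in this article.
printf 'openjdk version "1.8.0_282"\n' | java_major_version
```

Note that java -version writes to standard error, so on a real system the call would be: java -version 2>&1 | java_major_version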

Step 4: Install R packages:

In the same terminal as above, run R: # R
 
Now continue to the R packages installation: 
 
tidyverse -> Opinionated collection of R packages designed for data science.
install.packages( "tidyverse", dependencies = TRUE )

data.table -> Fast manipulation of large datasets.
install.packages( "data.table", dependencies = TRUE )

sqldf -> Run SQL instructions on your datasets.
install.packages( "sqldf", dependencies = TRUE )

stringdist -> Computes string distances, very useful when creating clusters of catalog descriptions.
install.packages( "stringdist", dependencies = TRUE )

RODBC -> Database access.
install.packages( "RODBC", dependencies = TRUE )

xts -> Package for non-regular time series.
install.packages( "xts", dependencies = TRUE )

dygraphs -> Nice graphs for R
install.packages( "dygraphs", dependencies = TRUE )

openxlsx -> Read, Write and Edit XLSX Files
install.packages( "openxlsx", dependencies = TRUE )

lubridate -> Dates handling.
install.packages( "lubridate", dependencies = TRUE )

forecast -> ARIMA and forecast package
install.packages( "forecast", dependencies = TRUE )

mailR -> Send email from R
install.packages( "mailR", dependencies = TRUE )

gbm -> Gradient Boosting Machine (gbm) algorithm for R
install.packages( "gbm", dependencies = TRUE )

xgboost -> xgboost algorithm for R
install.packages( "xgboost", dependencies = TRUE )

aTSA -> Time Series Analysis
install.packages( "aTSA", dependencies = TRUE )

rattle -> Tab-oriented user interface that is similar to Microsoft Office's ribbon interface. It makes getting started with data mining in R very easy.
install.packages( "rattle", dependencies = TRUE )

Rcmdr -> R Commander. A platform-independent basic-statistics GUI (graphical user interface) for R, based on the tcltk package.
install.packages( "Rcmdr", dependencies = TRUE )

itsmr -> Time Series Analysis Using the Innovations Algorithm. Provides functions for modeling and forecasting time series data.
install.packages( "itsmr", dependencies = TRUE )

stlplus  -> A new implementation of STL. Allows for NA values, local quadratic smoothing, post-trend smoothing, and endpoint blending. The usage is very similar to that of R's built-in stl().
install.packages( "stlplus", dependencies = TRUE )

TSA -> Useful to compute data seasonality.
install.packages( "TSA", dependencies = TRUE )
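Since install.packages() accepts a vector of package names, the individual calls above can also be collapsed into one. The sketch below builds that single R expression from a list of names in the shell; r_install_expr is a hypothetical helper, and it only prints the expression, it does not run R:

```shell
#!/bin/sh
# Print one R expression that installs every package given as an argument,
# for example: install.packages(c("a", "b"), dependencies = TRUE)
# r_install_expr is a hypothetical helper, not part of R.
r_install_expr() {
    list=""
    for pkg in "$@"; do
        # Append the quoted name, comma-separated after the first one.
        list="${list:+$list, }\"$pkg\""
    done
    printf 'install.packages(c(%s), dependencies = TRUE)\n' "$list"
}

# A few of the packages listed in this step.
r_install_expr tidyverse data.table sqldf
```

Paste the printed expression into the R session opened in this step. It could also be passed to Rscript -e, assuming a CRAN mirror is already configured for non-interactive use.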

Enjoy it!

Carlos Kassab
https://www.linkedin.com/in/carlos-kassab-48b40743/

More information about R:
https://www.r-bloggers.com
