
Installing our R development environment on Ubuntu 16.04 And Elementary OS 0.4.1



As the first blog entry, I thought a good starting point would be to describe how to set up a development environment on my preferred Linux distro, Ubuntu. In this case I am using 16.04 LTS.

This article was written on March 16, 2018 and updated on July 5, 2018 to include installation on Elementary OS 0.4.1 (Loki). Yes, I am using Elementary OS now.

Note. Elementary OS 0.4.1 is based on Ubuntu 16.04 LTS, so the two are mostly compatible; the differences are minimal. The good thing about the Elementary OS desktop is that it is very fast and responsive, and it is still supported by the Elementary developers.

Note 1. At the time of updating this article, the new Elementary OS 0.5.0 was still in beta 1:
Elementary OS 0.5.0 Developer Preview: Juno Beta 1

Note. On Elementary OS you need to install an additional package called gdebi, a simple tool that installs deb files from the right-click menu. Open a terminal and execute:

sudo apt install gdebi

For more information about gdebi, please go to: https://launchpad.net/gdebi

If you are using Elementary OS, it is worth reading this article before continuing:
11 Things To Do After Installing Elementary OS 0.4.1 Loki


After installing Ubuntu 16.04 or Elementary OS, let's install R. To do so, please follow the steps in this link:

How to install R on Ubuntu 16.04; the steps are the same for Elementary OS Loki.


Now download and install RStudio, our favorite IDE:

RStudio download link ***Select the package Ubuntu 16.04+/Debian 9+ (64-bit).

To install the downloaded RStudio package, open a terminal window and, from the folder where you downloaded it, execute:

sudo dpkg -i rstudio-xenial-1.1.453-amd64.deb

*In this case I am showing version 1.1.453; please adjust to the version you are installing.

When I did this on my computer I got the error message shown below. I just ignored it, finished the installation process described in this article, and restarted my computer, and RStudio is working just fine:

laran@Sirius:~/Downloads$ sudo dpkg -i rstudio-xenial-1.1.453-amd64.deb 
Selecting previously unselected package rstudio.
(Reading database ... 239335 files and directories currently installed.)
Preparing to unpack rstudio-xenial-1.1.453-amd64.deb ...
Unpacking rstudio (1.1.453) ...
dpkg: dependency problems prevent configuration of rstudio:
 rstudio depends on libjpeg62; however:
  Package libjpeg62 is not installed.

dpkg: error processing package rstudio (--install):
 dependency problems - leaving unconfigured
Processing triggers for shared-mime-info (1.5-2ubuntu0.1) ...
Processing triggers for hicolor-icon-theme (0.17-1~elementary0.4.1) ...
Processing triggers for desktop-file-utils (0.22-1ubuntu5.2+elementary2~ubuntu0.4.1.1) ...
Processing triggers for bamfdaemon (0.5.3~bzr0+16.04.20180209-0ubuntu1) ...
Rebuilding /usr/share/applications/bamf-2.index...
Processing triggers for gnome-menus (3.13.3-6ubuntu3.1) ...
Processing triggers for mime-support (3.59ubuntu1) ...
Errors were encountered while processing:
 rstudio
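The dpkg output above says that libjpeg62 is not installed, which leaves the rstudio package unconfigured. Instead of ignoring the message, one common way to resolve this kind of error is to let apt install the missing dependencies with apt-get install -f. The following is only a minimal sketch and was not part of the steps I originally ran; the run helper and the DRY_RUN variable are hypothetical names used here so the command is printed first and executed only on request.

```shell
# Hypothetical helper: print each command; execute it only when DRY_RUN=0.
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-1}" = "0" ]; then
    "$@"
  fi
}

# Ask apt to install whatever dpkg reported as missing (libjpeg62 here)
# and finish configuring the half-installed rstudio package.
fix_rstudio_dependencies() {
  run sudo apt-get install -f
}

fix_rstudio_dependencies
```

With the default setting the script only prints the command. Setting DRY_RUN=0 before running it executes the command for real.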




After this you will have your basic R development environment. The next thing to do is to install R packages, to have a more complete development environment.

Install a series of Ubuntu packages needed before installing R packages:

Open a terminal window and run all of the following installation commands:

sudo apt-get install libcurl4-openssl-dev
sudo apt-get install libssl-dev
sudo apt-get install libxml2-dev
sudo apt-get install libmariadb-client-lgpl-dev
sudo apt-get install libpq-dev
sudo apt-get install unixodbc unixodbc-dev
sudo apt-get install libiodbc2-dev
sudo apt-get install libcairo2-dev
sudo apt-get install libgtk2.0-dev
sudo apt-get install ggobi
sudo apt-get install xserver-xorg-dev
sudo apt-get install libx11-dev freeglut3 freeglut3-dev
sudo apt-get install libmagick++-dev
sudo apt-get install unixodbc-dev
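The same system packages can also be installed with a single apt-get call. The sketch below is just a convenience, assuming the package names listed above. The -y flag (answer yes to prompts) and the RUN_INSTALL switch are additions of this example, not part of the original commands.

```shell
# System libraries from the list above, collected in one variable.
packages="libcurl4-openssl-dev libssl-dev libxml2-dev libmariadb-client-lgpl-dev"
packages="$packages libpq-dev unixodbc unixodbc-dev libiodbc2-dev libcairo2-dev"
packages="$packages libgtk2.0-dev ggobi xserver-xorg-dev libx11-dev"
packages="$packages freeglut3 freeglut3-dev libmagick++-dev"

# Build one install command for all packages.
install_cmd="sudo apt-get install -y $packages"

# Print the command for review; run it only when RUN_INSTALL=1 (hypothetical switch).
echo "$install_cmd"
if [ "${RUN_INSTALL:-0}" = "1" ]; then
  # Unquoted on purpose so each package name becomes its own argument.
  $install_cmd
fi
```

Installing everything in one call lets apt resolve the dependencies once instead of once per package.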

The other important thing to do before installing R packages is to install Oracle Java and set it as the default because, in my experience, the included Java (OpenJDK) did not work with the R packages that depend on Java.

Oracle Java installation on Ubuntu 16.04 or Elementary OS Loki:

Note about Elementary OS Loki: on Elementary OS Loki we must enable PPA repositories in order to install Oracle Java. If you are using Elementary OS Loki, run the following command in a terminal before continuing:

sudo apt-get install software-properties-common


For detailed instructions on installing Oracle Java, go to the following link:

Oracle Java Installation on Ubuntu 16.04; the steps are the same for Elementary OS Loki.

In short, run these commands:
sudo add-apt-repository ppa:webupd8team/java
sudo apt-get update
sudo apt-get install oracle-java8-installer
sudo apt-get install oracle-java8-set-default

This is a very important step. There may be other ways to do it, but this one is very easy.
Using the file explorer, go to:

/usr/lib/jvm

Copy the folder containing Oracle Java, in this case: java-8-oracle

Rename the new copy to: default-java
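The copy-and-rename step can also be done from a terminal. The following is a minimal sketch, assuming the Oracle folder is named java-8-oracle as above. The function name copy_default_java, its optional directory argument, and the SUDO variable are hypothetical and exist only so the function can be tried on a scratch directory first.

```shell
# Copy the Oracle JDK folder to a sibling folder named default-java.
# $1 is the JVM directory; it defaults to /usr/lib/jvm.
copy_default_java() {
  jvm_dir="${1:-/usr/lib/jvm}"
  src="$jvm_dir/java-8-oracle"
  dest="$jvm_dir/default-java"

  if [ ! -d "$src" ]; then
    echo "source folder not found: $src" >&2
    return 1
  fi
  if [ -e "$dest" ]; then
    echo "destination already exists: $dest" >&2
    return 1
  fi

  # SUDO defaults to "sudo"; set SUDO="" when the directory is writable without root.
  # -a preserves permissions, symlinks and timestamps during the copy.
  ${SUDO-sudo} cp -a "$src" "$dest"
}
```

Calling copy_default_java with no arguments works on /usr/lib/jvm and asks for the sudo password. The function refuses to overwrite an existing default-java folder, so it is safe to run more than once.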


Run R in a terminal window: sudo R

INSTALL R PACKAGES:

tidyverse -> Opinionated collection of R packages designed for data science.
install.packages( "tidyverse", dependencies = TRUE )

data.table -> Fast manipulation of large datasets.
install.packages( "data.table", dependencies = TRUE )

sqldf -> Run SQL instructions on your datasets.
install.packages( "sqldf", dependencies = TRUE )

stringdist -> Computes string distances, very useful when creating clusters of catalog descriptions.
install.packages( "stringdist", dependencies = TRUE )

RODBC -> Database access.
install.packages( "RODBC", dependencies = TRUE )

xts -> Irregular time series package.
install.packages( "xts", dependencies = TRUE )

dygraphs -> Nice graphs for R
install.packages( "dygraphs", dependencies = TRUE )

openxlsx -> Read, Write and Edit XLSX Files
install.packages( "openxlsx", dependencies = TRUE )

lubridate -> Dates handling.
install.packages( "lubridate", dependencies = TRUE )

forecast -> ARIMA and forecast package
install.packages( "forecast", dependencies = TRUE )

mailR -> Send email from R
install.packages( "mailR", dependencies = TRUE )

gbm -> Gradient Boosting Machine (gbm) algorithm for R
install.packages( "gbm", dependencies = TRUE )

xgboost -> xgboost algorithm for R
install.packages( "xgboost", dependencies = TRUE )

aTSA -> Time Series Analysis
install.packages( "aTSA", dependencies = TRUE )

rattle -> Tab-oriented user interface that is similar to Microsoft Office's ribbon interface. It makes getting started with data mining in R very easy.
install.packages( "rattle", dependencies = TRUE )

Rcmdr -> R Commander. A platform-independent basic-statistics GUI (graphical user interface) for R, based on the tcltk package.
install.packages( "Rcmdr", dependencies = TRUE )

itsmr -> Time Series Analysis Using the Innovations Algorithm. Provides functions for modeling and forecasting time series data.
install.packages( "itsmr", dependencies = TRUE )

stlplus -> A new implementation of STL. Allows for NA values, local quadratic smoothing, post-trend smoothing, and endpoint blending. The usage is very similar to that of R's built-in stl().
install.packages( "stlplus", dependencies = TRUE )

TSA -> Useful to compute data seasonality.
install.packages( "TSA", dependencies = TRUE )
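Instead of typing each install.packages() call in the R console, the whole list can be installed from the terminal in one go with Rscript. This is only a sketch under some assumptions: build_r_vector is a hypothetical helper name, the RUN_INSTALL switch is made up for this example, and the repos argument is set to the CRAN cloud mirror because a non-interactive Rscript session cannot ask which mirror to use.

```shell
# Package names taken from the list above.
r_packages="tidyverse data.table sqldf stringdist RODBC xts dygraphs openxlsx"
r_packages="$r_packages lubridate forecast mailR gbm xgboost aTSA rattle Rcmdr"
r_packages="$r_packages itsmr stlplus TSA"

# Turn space-separated names into an R character vector: c("a","b",...)
build_r_vector() {
  vec=""
  for pkg in $1; do
    vec="${vec:+$vec,}\"$pkg\""
  done
  printf 'c(%s)' "$vec"
}

# One install.packages() call for the whole list.
r_expr="install.packages($(build_r_vector "$r_packages"), dependencies = TRUE, repos = 'https://cloud.r-project.org')"

# Print the R expression for review; run it only when RUN_INSTALL=1.
echo "$r_expr"
if [ "${RUN_INSTALL:-0}" = "1" ]; then
  sudo Rscript -e "$r_expr"
fi
```

Running it without RUN_INSTALL=1 prints the R expression, which can also be pasted directly into the R console.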

Enjoy it!

Carlos Kassab
https://www.linkedin.com/in/carlos-kassab-48b40743/

More information about R:
https://www.r-bloggers.com
