Chapter 1 Introduction to R
1.1 R Language
R is a complete programming language and software environment for statistical computing and graphical representation. As part of the GNU Project - free software, mass collaboration project - (https://www.gnu.org/software/software.en.html), the source code is free available.
For more details on R see https://www.r-project.org/.
1.1.1 R Packages
Functionalities in R can be expanded by importing packages. A package is a collection of R functions, data and compiled code. The location where the packages are stored is called the library. If there is a particular functionality that you require, you can download the package from the appropriate site and it will be stored in your library.
In all operation systems the function install.packages()
can be used to download and install a package automatically.
Otherwise, a package already installed in R can be loaded in a session by using the command library(package_name)
.
When you open an R Markdown document (.Rmd) the program propose you automatically to install the libraries listed there.
1.1.2 Some tips
- R is case sensitive!
- Previously used command can be recalled in the console by using the up arrow on the keyboard.
- The working directory by default is “
C:/user/.../Documents
”.- It can be found using the command
getwd()
- It can be changed using the command line
setwd("C:/Your/own/path")
- It can be found using the command
- In R Markdown the working directory when evaluating R code chunks is the directory of the input document by default.
- To access to a specific file in a sub-folder use “
. /subfolder/file.ext
” - To access to a specific file in a up-folder use “
. . /upfolder/file.ext
”
- To access to a specific file in a sub-folder use “
1.2 R Markdown
This is an R Markdown document :-)
Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. It is a simple and easy to use plain text language allowing to combine R code, results from data analysis (including plots and tables), and comments into a single nicely formatted and reproducible document (like a report, publication, thesis chapter or web pages).
Code lines are organized into code blocks, seeking to solve specified tasks, and referred to as “code chunk”. For more details on using R Markdown see http://rmarkdown.rstudio.com.
All what you have to do during the computing labs is to read each explanatory paragraph before running each individual R code chunk, one by one, and to interpret the results. Finally, to create a personal document (usually a PDF) from R Markdown, you need to Knit the document. Knitting a document simply means taking all the text and code and creating a nicely formatted document.
1.3 Data type in computational analysis
1.3.1 Variables
Variables are used to store values in a computer program. Values can be numbers (real and complex), words (string), matrices, and even tables.
The fundamental or atomic data in R Programming can be:
- integer: number without decimals
- numeric: number with decimals (float or double depending on the precision)
- character: string, label
- factors: a label with a limited number of categories
- logical: true/false
1.3.2 Data structure in R
R’s base data structures can be organised by their dimensionality (1d, 2d, or nd) and whether they are homogeneous (all contents must be of the same type) or heterogeneous (the contents can be of different types). This gives rise to the four data structures most often used in data analysis:
A Vector is a one-dimensional structure winch can contain object of one type only: numerical (integer and double), character, and logical.
## [1] 0.5 0.7
## [1] "double"
## [1] 1 2 3 4 5 6 7 8 9 10
## [1] "integer"
## [1] TRUE FALSE
## [1] "logical"
## [1] "Swiss" "Itay" "France" "Germany"
## [1] "character"
## [1] 1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5 5.0
## [1] "double"
## [1] 9
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1 2 3 3 4 5
## [1] 2
## [1] 1.0 1.5 2.5 3.0 3.5 4.0 4.5 5.0
## [1] 1.0 1.5 2.5 3.0 3.5 4.0 4.5 5.0
A Matrix is a two-dimensional structure winch can contain object of one type only.
The function matrix()
can be used to construct matrices with specific dimensions.
# Matrix of elements equal to "zero" and dimension 2x5
m1<-matrix(0,2,5); m1 #(two rows by five columns)
## [,1] [,2] [,3] [,4] [,5]
## [1,] 0 0 0 0 0
## [2,] 0 0 0 0 0
## [,1] [,2] [,3] [,4]
## [1,] 1 4 7 10
## [2,] 2 5 8 11
## [3,] 3 6 9 12
## [1] 2 5 8 11
## [1] 7 8 9
## [1] 8
A data frame allows to collect heterogeneous data. All elements must have the same length.
A list is a more flexible structure since it can contain variables of different types and lengths. Nevertheless, the preferred structure for statistical analyses and computation is the data frame.
It is a good practice to explore the data frame before performing further computation on the data.
This can be simply accomplished by using the commands str()
to explore the structure of the data and summary()
to display the summary statistics and quickly summarize the data.
For numerical vectors the command hist()
can be used to plot the basic histogram of the given values.
# Create the vectors with the variables
cities <- c("Berlin", "New York", "Paris", "Tokyo")
area <- c(892, 1214, 105, 2188)
population <- c(3.4, 8.1, 2.1, 12.9)
continent <- c("Europe", "Norh America", "Europe", "Asia")
# Concatenate the vectors into a new data frame
df1 <- data.frame(cities, area, population, continent)
df1
## cities area population continent
## 1 Berlin 892 3.4 Europe
## 2 New York 1214 8.1 Norh America
## 3 Paris 105 2.1 Europe
## 4 Tokyo 2188 12.9 Asia
#Add a column (e.g., language spoken) using the command "cbind"
df2 <- cbind (df1, "Language" = c ("German", "English", "Freanch", "Japanese"))
df2
## cities area population continent Language
## 1 Berlin 892 3.4 Europe German
## 2 New York 1214 8.1 Norh America English
## 3 Paris 105 2.1 Europe Freanch
## 4 Tokyo 2188 12.9 Asia Japanese
## 'data.frame': 4 obs. of 5 variables:
## $ cities : chr "Berlin" "New York" "Paris" "Tokyo"
## $ area : num 892 1214 105 2188
## $ population: num 3.4 8.1 2.1 12.9
## $ continent : chr "Europe" "Norh America" "Europe" "Asia"
## $ Language : chr "German" "English" "Freanch" "Japanese"
## cities area population continent
## Length:4 Min. : 105.0 Min. : 2.100 Length:4
## Class :character 1st Qu.: 695.2 1st Qu.: 3.075 Class :character
## Mode :character Median :1053.0 Median : 5.750 Mode :character
## Mean :1099.8 Mean : 6.625
## 3rd Qu.:1457.5 3rd Qu.: 9.300
## Max. :2188.0 Max. :12.900
## Language
## Length:4
## Class :character
## Mode :character
##
##
##
## [1] 3.4 8.1 2.1 12.9