Chapter 7 Cheatsheet

Here is a summary of expressions we learned in class.

Recall that this class focused on translating between English and code for the R interpreter. Many of the functions below come from the Tidyverse packages, so load them first with `library(tidyverse)`.

7.1 Basic Data Types

English	R Language
Numeric	`2 + 3`
Character	`"hello"`, `"123"`
Logical	`TRUE`, `FALSE`
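A quick way to check which of these types a value has is base R's `class()` function. The sketch below uses arbitrary example values; it also shows that quoting a number turns it into text, which `as.numeric()` can convert back.

```r
# class() reports the basic type of a value
class(2 + 3)        # "numeric"
class("hello")      # "character"
class("123")        # "character": the quotes make it text, not a number
class(TRUE)         # "logical"

# Convert the text "123" into a number
as.numeric("123")   # 123
```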

7.2 Vectors

English	R Language
Create a vector with some elements	`vec = c(1, -4, -9, 12)` `names = c("chris", "hannah", "chris", NA)`
Compute the length of `vec`	`length(vec)`
Access the second element of `names`	`names[2]`
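The rows above can be tried together in a short session. This is a minimal sketch that reuses the example vectors from the table, with the expected results written as comments.

```r
vec = c(1, -4, -9, 12)
names = c("chris", "hannah", "chris", NA)

length(vec)     # 4
length(names)   # 4: a missing value (NA) still counts as an element
names[2]        # "hannah" (R counts positions starting from 1)
```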

7.3 Conditional Operations

These operations are most often used to create a logical (TRUE/FALSE) vector for subsetting.

English	R Language
`vec` is greater than 0	`vec > 0`
`vec` is between 0 and 10	`vec >= 0 & vec <= 10`
`vec` is between 0 and 10, exclusively	`vec > 0 & vec < 10`
`vec` is greater than 4 or less than -4	`vec > 4 | vec < -4`
`names` is “chris”	`names == "chris"`
`names` is not “chris”	`names != "chris"`
The non-missing values of `names`	`!is.na(names)`
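Each expression in the table returns one TRUE/FALSE value per element. A minimal sketch, assuming the same `vec` and `names` as in 7.2; note that comparing `NA` to anything gives `NA`, not `FALSE`.

```r
vec = c(1, -4, -9, 12)
names = c("chris", "hannah", "chris", NA)

vec > 0                  # TRUE FALSE FALSE TRUE
vec >= 0 & vec <= 10     # TRUE FALSE FALSE FALSE
vec > 4 | vec < -4       # FALSE FALSE TRUE TRUE
names == "chris"         # TRUE FALSE TRUE NA  (the missing name stays NA)
!is.na(names)            # TRUE TRUE TRUE FALSE
```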

7.4 Subsetting vectors

English	R Language
Subset `vec` to the first 3 elements	`vec[c(1, 2, 3)]` or `vec[1:3]` or `vec[c(TRUE, TRUE, TRUE, FALSE)]`
Subset `vec` to be greater than 0	`vec[vec > 0]`
Subset `names` to the elements equal to “chris”	`names[names == "chris"]`
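Subsetting with a logical vector keeps the positions marked TRUE. A minimal sketch with the 7.2 vectors follows; it also shows a common surprise: because `names == "chris"` is `NA` for the missing name, the result keeps an `NA`. Combining the test with `!is.na(names)` (or wrapping it in base R's `which()`) drops it.

```r
vec = c(1, -4, -9, 12)
names = c("chris", "hannah", "chris", NA)

vec[1:3]                                   # 1 -4 -9
vec[vec > 0]                               # 1 12
names[names == "chris"]                    # "chris" "chris" NA
names[!is.na(names) & names == "chris"]    # "chris" "chris"
names[which(names == "chris")]             # "chris" "chris" (which() skips NA)
```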

7.5 Dataframes

English	R Language
Load a dataframe from CSV file “data.csv”	`dataframe = read_csv("data.csv")`
Load a dataframe from Excel file “data.xlsx” (requires `library(readxl)`)	`dataframe = read_excel("data.xlsx")`
Compute the dimension of `dataframe`	`dim(dataframe)`
Access the column “subtype” of `dataframe` as a vector	`dataframe$subtype`
Subset `dataframe` to columns “subtype”, “diversity”, “outcome”	`select(dataframe, subtype, diversity, outcome)`
Subset `dataframe` to rows where “outcome” is greater than zero and “subtype” is “lung”	`filter(dataframe, outcome > 0 & subtype == "lung")`
Create a new column “log_outcome” holding the natural log of the “outcome” column	`dataframe$log_outcome = log(dataframe$outcome)` or `dataframe = mutate(dataframe, log_outcome = log(outcome))`
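The sketch below runs most rows of this table on a tiny made-up dataframe (the column values and the extra column `extra` are hypothetical, for illustration only). Instead of a real “data.csv”, it writes the data to a temporary CSV file so that `read_csv()` has something to load.

```r
library(dplyr)
library(readr)

# Hypothetical example data
dataframe = data.frame(
  subtype   = c("lung", "lung", "breast", "colon"),
  diversity = c(0.2, 0.5, 0.9, 0.4),
  outcome   = c(1.5, 0, 2.0, 0.8),
  extra     = c("a", "b", "c", "d")
)

# Save to a temporary CSV file, then load it back with read_csv()
path = tempfile(fileext = ".csv")
write_csv(dataframe, path)
dataframe = read_csv(path)

dim(dataframe)        # 4 rows, 4 columns
dataframe$subtype     # the "subtype" column as a character vector

selected = select(dataframe, subtype, diversity, outcome)     # drops "extra"
lung_pos = filter(dataframe, outcome > 0 & subtype == "lung") # only the first row
dataframe = mutate(dataframe, log_outcome = log(outcome))     # log(0) is -Inf
```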

7.6 Summary Statistics of a Dataframe’s column

English	R Language
Mean of `dataframe`’s “outcome” column	`mean(dataframe$outcome)`
Mean of `dataframe`’s “outcome” column, removing `NA` values	`mean(dataframe$outcome, na.rm = TRUE)`
Max of `dataframe`’s “outcome” column	`max(dataframe$outcome)`
Min of `dataframe`’s “outcome” column	`min(dataframe$outcome)`
Count how often each value appears in `dataframe`’s “subtype” column	`table(dataframe$subtype)`
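To see why `na.rm = TRUE` matters, here is a minimal sketch on a small made-up dataframe with one missing outcome. A single `NA` makes the plain `mean()` return `NA`; the same applies to `max()` and `min()`, which also accept `na.rm = TRUE`.

```r
# Hypothetical example data with one missing outcome
dataframe = data.frame(
  subtype = c("lung", "lung", "breast", "colon"),
  outcome = c(1.5, NA, 2.0, 0.5)
)

mean(dataframe$outcome)                 # NA
mean(dataframe$outcome, na.rm = TRUE)   # (1.5 + 2.0 + 0.5) / 3 = 1.333...
max(dataframe$outcome, na.rm = TRUE)    # 2
min(dataframe$outcome, na.rm = TRUE)    # 0.5

counts = table(dataframe$subtype)       # breast 1, colon 1, lung 2
counts[["lung"]]                        # 2
```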

7.7 Dataframe transformations

English	R Language
Merge dataframes `df1` and `df2` by the common column “id”, keeping all rows from both	`full_join(df1, df2, by = "id")`
Group `dataframe` by the “subtype” column, then summarise the mean “outcome” value and the number of rows for each “subtype” value	`dataframe_grouped = group_by(dataframe, subtype)` `dataframe_summary = summarise(dataframe_grouped, mean_outcome = mean(outcome), n_sample = n())`
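A minimal sketch of both transformations on small made-up dataframes (all values are hypothetical). `full_join()` keeps every “id” from either side and fills gaps with `NA`; for contrast, dplyr's `inner_join()` keeps only the ids present in both.

```r
library(dplyr)

# Hypothetical example data
df1 = data.frame(id = c(1, 2, 3), outcome = c(0.5, 1.2, 2.0))
df2 = data.frame(id = c(2, 3, 4), subtype = c("lung", "breast", "lung"))

merged = full_join(df1, df2, by = "id")   # ids 1, 2, 3, 4; unmatched cells are NA
common = inner_join(df1, df2, by = "id")  # only ids 2 and 3

dataframe = data.frame(subtype = c("lung", "lung", "breast"),
                       outcome = c(1, 3, 2))
dataframe_grouped = group_by(dataframe, subtype)
dataframe_summary = summarise(dataframe_grouped,
                              mean_outcome = mean(outcome),
                              n_sample = n())
# One row per subtype: breast (mean 2, 1 row), lung (mean 2, 2 rows)
```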

Group dataframe by “subtype” column, and summarise the mean “outcome” value for each “subtype” value, and get the total elements for each “subtype” value.

dataframe_grouped = group_by(dataframe, subtype)

dataframe_summary = summarise(dataframe_grouped, mean_outcome = mean(outcome), n_sample = n())