Chapter 7 Cheatsheet

Here is a summary of expressions we learned in class.

Recall that this class focused on translating between English and R code. Many of the functions we learned require the tidyverse package, which must be loaded with library(tidyverse) before they can run.

7.1 Basic Data Types

English      R Language
Numeric      2 + 3
Character    "hello", "123"
Logical      TRUE, FALSE
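
As a quick check of the three types above, the base R function class() reports the type of a value. This is a minimal sketch, and the values are only illustrations:

```r
# class() reports the data type of a value.
class(2 + 3)    # "numeric"
class("123")    # "character": the quotes make it text, not a number
class(TRUE)     # "logical"
```

Note that "123" is a character value, so arithmetic such as "123" + 1 gives an error.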

7.2 Vectors

English                               R Language
Create a vector with some elements    vec = c(1, -4, -9, 12)
                                      names = c("chris", "hannah", "chris", NA)
Compute the length of a vector        length(vec)
Access the second element of names    names[2]
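
The entries above can be run together as a short sketch. The vectors are the same example values used in this section:

```r
# Create the two example vectors.
vec = c(1, -4, -9, 12)
names = c("chris", "hannah", "chris", NA)

length(vec)      # 4
length(names)    # 4: the NA still counts as an element
names[2]         # "hannah"
```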

7.3 Conditional Operations

Conditional operations are often used to create a logical indexing vector for subsetting.

English                                  R Language
vec is greater than 0                    vec > 0
vec is between 0 and 10, inclusive       vec >= 0 & vec <= 10
vec is between 0 and 10, exclusive       vec > 0 & vec < 10
vec is greater than 4 or less than -4    vec > 4 | vec < -4
names is “chris”                         names == "chris"
names is not “chris”                     names != "chris"
names is not missing                     !is.na(names)
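
To see what these expressions return, here is a small sketch using the example vectors from Section 7.2. Each expression produces one logical value per element:

```r
vec = c(1, -4, -9, 12)
names = c("chris", "hannah", "chris", NA)

vec > 0                 # TRUE FALSE FALSE TRUE
vec >= 0 & vec <= 10    # TRUE FALSE FALSE FALSE
vec > 4 | vec < -4      # FALSE FALSE TRUE TRUE
names == "chris"        # TRUE FALSE TRUE NA
!is.na(names)           # TRUE TRUE TRUE FALSE
```

Comparing a missing value with == gives NA rather than FALSE, which is why is.na() is needed to test for missing values.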

7.4 Subsetting vectors

English                                  R Language
Subset vec to its first 3 elements       vec[c(1, 2, 3)]
                                         or
                                         vec[1:3]
                                         or
                                         vec[c(TRUE, TRUE, TRUE, FALSE)]
Subset vec to elements greater than 0    vec[vec > 0]
Subset names to “chris” or “bob”         names[names == "chris" | names == "bob"]
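
The sketch below runs these subsetting patterns on the example vectors from Section 7.2. The last line is an extra illustration, not from class, that combines !is.na() with a comparison to drop the missing value:

```r
vec = c(1, -4, -9, 12)
names = c("chris", "hannah", "chris", NA)

vec[1:3]        # 1 -4 -9
vec[vec > 0]    # 1 12

# An NA in the logical index produces an NA in the result.
names[names == "chris" | names == "bob"]    # "chris" "chris" NA

# Requiring a non-missing value removes that NA.
names[!is.na(names) & names == "chris"]     # "chris" "chris"
```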

7.5 Dataframes

English    R Language    Output data type
Load a dataframe from CSV file “data.csv”    dataframe = read_csv("data.csv")    Dataframe
Load a dataframe from Excel file “data.xlsx” (requires library(readxl), which library(tidyverse) does not load)    dataframe = read_excel("data.xlsx")    Dataframe
Compute the dimensions (rows, columns) of dataframe    dim(dataframe)    Vector of length 2
Examine the column names of dataframe    colnames(dataframe)    Vector
Access the column “subtype” of dataframe as a vector    dataframe$subtype    Vector
Compute the length of the column “subtype” of dataframe    length(dataframe$subtype)    Numeric
Subset dataframe to columns “subtype”, “diversity”, “outcome”    select(dataframe, subtype, diversity, outcome)    Dataframe
Subset dataframe to rows where outcome is greater than zero and subtype is “lung”    filter(dataframe, outcome > 0 & subtype == "lung")    Dataframe
Add a new column “log_outcome” to dataframe that is the log transform of the “outcome” column    mutate(dataframe, log_outcome = log(outcome))    Dataframe
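
The following sketch tries these functions on a small made-up dataframe so that it runs without a data file. The values are hypothetical, and tibble() is used here only to build the example in place of read_csv():

```r
library(dplyr)

# Hypothetical data standing in for "data.csv".
dataframe = tibble(
  subtype   = c("lung", "lung", "breast", "lung"),
  diversity = c(0.2, 0.5, 0.9, 0.4),
  outcome   = c(10, 0, 5, 100)
)

dim(dataframe)               # 4 3
colnames(dataframe)          # "subtype" "diversity" "outcome"
length(dataframe$subtype)    # 4

# Keep two columns.
selected = select(dataframe, subtype, outcome)

# Keep rows with a positive outcome and subtype "lung" (2 rows here).
filtered = filter(dataframe, outcome > 0 & subtype == "lung")

# Add a log-transformed column; log(0) is -Inf for the second row.
mutated = mutate(dataframe, log_outcome = log(outcome))
```

select(), filter(), and mutate() each return a new dataframe and leave the original unchanged, so the result must be assigned to a variable to keep it.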

7.6 Summary Statistics of a Dataframe’s column

English    R Language    Output data type
Mean of dataframe’s “outcome” column    mean(dataframe$outcome)    Numeric
Mean of dataframe’s “outcome” column, removing NA values    mean(dataframe$outcome, na.rm = TRUE)    Numeric
Max of dataframe’s “outcome” column    max(dataframe$outcome)    Numeric
Min of dataframe’s “outcome” column    min(dataframe$outcome)    Numeric
Count of each value in dataframe’s “subtype” column    table(dataframe$subtype)    Table of counts
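
A short sketch of these summaries on made-up data, using a base R data.frame with hypothetical values, shows why na.rm = TRUE matters when a column has missing values:

```r
# Hypothetical data with one missing outcome.
dataframe = data.frame(
  subtype = c("lung", "lung", "breast", "lung"),
  outcome = c(10, NA, 5, 100)
)

mean(dataframe$outcome)                  # NA, because one value is missing
mean(dataframe$outcome, na.rm = TRUE)    # mean of 10, 5, and 100
max(dataframe$outcome, na.rm = TRUE)     # 100

# Number of rows for each subtype value.
counts = table(dataframe$subtype)
counts[["lung"]]                         # 3
```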

7.7 Dataframe transformations

English    R Language    Output data type
Merge dataframes df1 and df2 by their common column “id”, keeping all rows from both    full_join(df1, df2, by = join_by(id))    Dataframe
Group dataframe by the “subtype” column, then compute the mean “outcome” value and the number of rows for each “subtype” value

dataframe_grouped = group_by(dataframe, subtype)

dataframe_summary = summarise(dataframe_grouped, mean_outcome = mean(outcome), n_sample = n())

Dataframe
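
As a sketch of both transformations, the example below joins two small made-up dataframes and then summarises the result. The data values are hypothetical, join_by() assumes dplyr version 1.1.0 or later, and na.rm = TRUE is added here because the join introduces missing values:

```r
library(dplyr)

# Hypothetical dataframes that share the "id" column.
df1 = tibble(id = c(1, 2, 3), subtype = c("lung", "lung", "breast"))
df2 = tibble(id = c(2, 3, 4), outcome = c(4, 6, 8))

# full_join keeps every id from both dataframes (1, 2, 3, 4).
# Cells with no match are filled with NA.
merged = full_join(df1, df2, by = join_by(id))

# One summary row per subtype value, including a row for the NA subtype.
merged_grouped = group_by(merged, subtype)
merged_summary = summarise(
  merged_grouped,
  mean_outcome = mean(outcome, na.rm = TRUE),
  n_sample = n()
)
```

Here id 1 has no outcome and id 4 has no subtype, so merged has 4 rows and merged_summary has 3 rows: “breast”, “lung”, and NA.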