Chapter 7 Cheatsheet

Many of the functions we learned require the tidyverse package, which is loaded with library(tidyverse).

7.1 Lists

The one-size-fits-all data structure…

English R Language Output type
Creating a List my_list = list("hamburger", 1:100, c(TRUE, TRUE)) List
Creating a List with names my_list_named = list(l1 = "hamburger", l2 = 1:100, l3 = c(TRUE, TRUE)) List
Names of a List names(my_list_named) String vector
Accessing elements of a List

my_list[[1]] returns "hamburger"

my_list[[2]][3] returns 3

Accessing elements of a List using names

my_list_named$l1 or my_list_named[["l1"]] returns "hamburger"

my_list_named$l2[3] or my_list_named[["l2"]][3] returns 3

Treating a Dataframe df as a List df$col1, df[["col1"]] Vector
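
The list operations above can be tried out in a short session. This is a minimal sketch that reuses the cheatsheet's own example lists; the variable names first_element, first_sublist, third_value and same_value are made up for illustration. It also shows one detail the table does not spell out: single brackets keep the list wrapper, while double brackets pull out the element itself.

```r
# Build an unnamed and a named list holding elements of different types.
my_list = list("hamburger", 1:100, c(TRUE, TRUE))
my_list_named = list(l1 = "hamburger", l2 = 1:100, l3 = c(TRUE, TRUE))

# [[ ]] extracts the element itself; [ ] returns a shorter list.
first_element = my_list[[1]]   # character vector of length 1
first_sublist = my_list[1]     # list of length 1

# Named access: $ and [["name"]] reach the same element.
third_value = my_list_named$l2[3]
same_value = my_list_named[["l2"]][3]

class(first_element)   # "character"
class(first_sublist)   # "list"
names(my_list_named)   # "l1" "l2" "l3"
```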

7.2 Exploring new data structures

If you encounter an unknown data structure, such as the result of a t.test(), how do you explore it?

English R Language
What data structure is this? class(x)
What are its attributes? attributes(x)
What are its names, if any? names(x)
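
As a sketch of that exploration, the following runs a t.test() on simulated data and inspects the result. The data (group_a, group_b) are invented purely for illustration; the functions are all base R.

```r
# Simulated data, made up for illustration only.
set.seed(1)
group_a = rnorm(20, mean = 5)
group_b = rnorm(20, mean = 6)

result = t.test(group_a, group_b)

class(result)        # "htest"
names(result)        # component names, including "statistic" and "p.value"
attributes(result)   # the names and the class together

# An htest object is a list underneath, so list access works on it.
p_value = result$p.value
conf_interval = result[["conf.int"]]
```

Once names() shows what the components are called, each one can be pulled out with $ or [[ ]] exactly as in section 7.1.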

7.3 Data type checking and coercing

You loaded in the data, now what?

English R Language Output type
Is this vector a ___ type of vector? is.numeric(vec), is.double(vec), is.integer(vec), is.character(vec), is.logical(vec) Logical value

Convert ___ type of vector to ____ type of vector.

Order of coercing that is allowed: Logical vector -> Numeric vector -> Character vector

as.numeric(vec), as.double(vec), as.integer(vec), as.character(vec) Vector of desired form
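
A short sketch of the coercion order in practice, using made-up vectors (flags, mixed). It also shows what tends to happen when coercing against the allowed order: text that does not look like a number becomes NA, with a warning.

```r
flags = c(TRUE, FALSE, TRUE)
is.logical(flags)                     # TRUE

# Logical -> Numeric -> Character follows the allowed order.
as_numbers = as.numeric(flags)        # 1 0 1
as_text = as.character(as_numbers)    # "1" "0" "1"

# Character -> Numeric only works for text that looks like a number;
# anything else becomes NA (the warning is suppressed here for brevity).
mixed = c("3.5", "ten")
converted = suppressWarnings(as.numeric(mixed))   # 3.5 NA
```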

7.4 Subsetting and removing missing values

English R Language Output type
Subset vec to be greater than 0 vec[vec > 0] Vector
Subset vec to have “chris” or “bob” vec[vec == "chris" | vec == "bob"] Vector
Where are the missing values in this vector? is.na(vec) Logical vector indicating where the missing value is
Given vector vec, subset to non-missing values vec[!is.na(vec)] Vector
Given a dataframe df, subset the rows so that the column col1 does not have any missing values filter(df, !is.na(col1)) Dataframe
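
The vector rows above can be checked on a small made-up vector. This sketch uses base R only. It also illustrates a point worth knowing: a comparison involving NA yields NA, so a condition such as vec > 0 carries missing values into the result unless they are excluded explicitly. The %in% form at the end is an equivalent way to write the “chris” or “bob” subset.

```r
vec = c(3, -1, NA, 7)   # hypothetical data

is.na(vec)                        # FALSE FALSE TRUE FALSE
no_missing = vec[!is.na(vec)]     # 3 -1 7

# The NA survives a plain comparison-based subset.
positive = vec[vec > 0]           # 3 NA 7

# Combining both conditions drops it.
positive_clean = vec[!is.na(vec) & vec > 0]   # 3 7

names_vec = c("chris", "amy", "bob")
selected = names_vec[names_vec %in% c("chris", "bob")]   # "chris" "bob"
```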

7.5 Data recoding

English R Language
If vector vec has the value “x”, recode it as “a”

vec[vec == "x"] = "a"

or

if_else(vec == "x", "a", vec)

If vector vec has the value “x”, recode it as “a”, anything else recode as “b” if_else(vec == "x", "a", "b")
If vector vec has the value “x”, recode it as “a”, else if vec has value “y”, recode it as “b”, anything else recode as “z”.
case_when(vec == "x" ~ "a",
          vec == "y" ~ "b",
          .default = "z")
If vector vec has the value “x”, recode it as “a”, else if vec has value “y”, recode it as “b”, anything else leave it as is.
case_when(vec == "x" ~ "a",
          vec == "y" ~ "b",
          .default = vec)
If dataframe df column col has the value “x”, recode it as “a”

df$col[df$col == "x"] = "a"

or

df$col = if_else(df$col == "x", "a", df$col)

If dataframe df column col has the value “x”, recode it as “a”, anything else recode as “b”

df$col = if_else(df$col == "x", "a", "b")

or

df = mutate(df, col = if_else(col == "x", "a", "b"))

If dataframe df column col has the value “x”, recode it as “a”, else if column col has value “y”, recode it as “b”, anything else recode as “z”.
df$col = case_when(df$col == "x" ~ "a",
                   df$col == "y" ~ "b",
                   .default = "z")

or

df = mutate(df, col = case_when(col == "x" ~ "a",
                                col == "y" ~ "b",
                                .default = "z"))
If dataframe df column col has the value “x”, recode it as “a”, else if column col has value “y”, recode it as “b”, anything else leave it as is.
df$col = case_when(df$col == "x" ~ "a",
                   df$col == "y" ~ "b",
                   .default = df$col)

or

df = mutate(df, col = case_when(col == "x" ~ "a",
                                col == "y" ~ "b",
                                .default = col))
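
A worked sketch of the dataframe recoding rows, on a small made-up dataframe. It assumes dplyr 1.1.0 or later, since that is where the .default argument of case_when() is available. Note that mutate() returns a whole dataframe, so its result is stored as a dataframe rather than assigned into a single column.

```r
library(dplyr)

# Hypothetical data for illustration.
df = data.frame(id = 1:4, col = c("x", "y", "w", "x"),
                stringsAsFactors = FALSE)

# Recode "x" and "y"; everything else becomes "z".
df_recoded = mutate(df, col = case_when(col == "x" ~ "a",
                                        col == "y" ~ "b",
                                        .default = "z"))
df_recoded$col   # "a" "b" "z" "a"

# Recode "x" and "y"; leave everything else as it was.
df_kept = mutate(df, col = case_when(col == "x" ~ "a",
                                     col == "y" ~ "b",
                                     .default = col))
df_kept$col      # "a" "b" "w" "a"
```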

7.6 Conditional statements

English R Language
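If statement
if (condition) {
  # code to run when condition is TRUE
}
If-else if statement
if (condition1) {
  # code to run when condition1 is TRUE
} else if (condition2) {
  # code to run when condition1 is FALSE and condition2 is TRUE
}
If-else statement
if (condition1) {
  # code to run when condition1 is TRUE
} else {
  # code to run otherwise
}
If-else if-else statement
if (condition1) {
  # code to run when condition1 is TRUE
} else if (condition2) {
  # code to run when condition1 is FALSE and condition2 is TRUE
} else {
  # code to run when neither condition is TRUE
}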
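
To make the templates concrete, here is a minimal sketch with a hypothetical function, classify_temperature, and made-up thresholds. Keep in mind that if() expects a single TRUE or FALSE; for element-wise decisions on a whole vector, if_else() or case_when() from section 7.5 is the better fit.

```r
# Hypothetical example: label a single temperature reading.
classify_temperature = function(temp_c) {
  if (temp_c < 0) {
    label = "freezing"
  } else if (temp_c < 25) {
    label = "mild"
  } else {
    label = "hot"
  }
  return(label)
}

classify_temperature(-5)   # "freezing"
classify_temperature(18)   # "mild"
classify_temperature(31)   # "hot"
```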

7.7 Dataframe Transformations

English R Language Output type

Pivot longer on Dataframe df so that the column names q1, q2, q3 go into a new column “quarter” and their corresponding values go into a new column “sales”

Columns can be listed by name or selected by a pattern, for example with starts_with().

pivot_longer(df, c("q1", "q2", "q3"), names_to = "quarter", values_to = "sales")

or

pivot_longer(df, starts_with("q"), names_to = "quarter", values_to = "sales")

Dataframe
Pivot wider on Dataframe df so that the values of column measurement_type become column names and the values of column values fill those new columns. pivot_wider(df, names_from = "measurement_type", values_from = "values") Dataframe
Separate Dataframe df’s column patient_id_type into two columns patient_id and patient_type by the separator - separate(df, col = "patient_id_type", into = c("patient_id", "patient_type"), sep = "-") Dataframe
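
The three transformations can be seen end to end on tiny made-up dataframes. This is a sketch assuming the tidyr package is installed; the dataframes sales and ids and their contents are invented for illustration.

```r
library(tidyr)

# Hypothetical wide data: one row per store, one column per quarter.
sales = data.frame(store = c("north", "south"),
                   q1 = c(10, 20), q2 = c(11, 21), q3 = c(12, 22),
                   stringsAsFactors = FALSE)

# Wide -> long: 2 stores x 3 quarters gives 6 rows.
sales_long = pivot_longer(sales, c("q1", "q2", "q3"),
                          names_to = "quarter", values_to = "sales")

# Long -> wide: back to one column per quarter.
sales_wide = pivot_wider(sales_long,
                         names_from = "quarter", values_from = "sales")

# Split one combined column into two at the "-" separator.
ids = data.frame(patient_id_type = c("001-inpatient", "002-outpatient"),
                 stringsAsFactors = FALSE)
ids_split = separate(ids, col = "patient_id_type",
                     into = c("patient_id", "patient_type"), sep = "-")
```

Pivoting longer and then wider on the same columns returns the original layout, which is a handy sanity check when learning the two functions.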

7.8 Writing functions

Some examples.

English R Language
Write a function that takes in a vector and returns a vector of the same length, such as a z-score transformation
z_score = function(vec) {
  result = (vec - mean(vec)) / sd(vec)
  return(result)
}

then, to use it:

df$biomarker_standardized = z_score(df$biomarker)
Write a function that takes in a vector and returns a summary statistic, such as the difference in highest and lowest value
max_diff = function(vec) {
  result = max(vec) - min(vec)
  return(result)
}

then, to use it:

max_diff(df$biomarker)
Write a function that takes in a Dataframe and returns some summary information about it, such as its dimension
my_dim = function(df) {
  result = c(nrow(df), ncol(df))
  return(result)
}

then, to use it:

my_dim(penguins)
Write a function that takes in a character data type, and returns a Dataframe, such as loading and preprocessing the Dataframe.
load_and_process = function(filepath) {
  df = read_csv(filepath)
  df = pivot_longer(df, c("q1", "q2", "q3"), names_to = "quarter", values_to = "sales")
  return(df)
}

then, to use it:

sales_df = load_and_process("sales_data.csv")
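
One practical wrinkle with the z_score example: if the input contains a missing value, mean() and sd() return NA, so every result is NA. The sketch below is a hypothetical variant, not part of the original function, that adds an na.rm argument and passes it through; the biomarker values are made up.

```r
# Variant of z_score with an optional na.rm argument (an added assumption).
z_score = function(vec, na.rm = FALSE) {
  result = (vec - mean(vec, na.rm = na.rm)) / sd(vec, na.rm = na.rm)
  return(result)
}

biomarker = c(2, 4, 6, NA)   # hypothetical measurements

all_missing = z_score(biomarker)                  # NA NA NA NA
standardized = z_score(biomarker, na.rm = TRUE)   # -1 0 1 NA
```

Giving na.rm a default of FALSE keeps the behavior of the original function unless the caller asks otherwise.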

7.9 Iteration

Some examples

English R Language Output type
Iterate on a vector of characters representing filepaths, where the function loads in Dataframes.
files = c("f1.csv", "f2.csv", "f3.csv")

map(files, read_csv)
List of Dataframes
Iterate on a vector of characters representing filepaths, where a custom function loads in Dataframes and processes them.
process_data = function(file) {
  df = read_csv(file)
  df = drop_na(df)
  return(df)
}

files = c("f1.csv", "f2.csv", "f3.csv")

map(files, process_data)
List of Dataframes
Iterate on the columns of a Dataframe to compute summary statistics. (Treat the Dataframe as a List to be iterated through)
penguins_numeric = penguins %>% select(bill_length_mm, bill_depth_mm, flipper_length_mm, body_mass_g)

map_dbl(penguins_numeric, mean, na.rm = TRUE)
Numerical vector
Iterate over different conditions to analyze a Dataframe multiple times.
penguins_analysis = function(current_species) {
  penguins_subset = filter(penguins, species == current_species)
  result = mean(penguins_subset$bill_length_mm, na.rm=TRUE)
  return(result)
}

map_dbl(c("Adelie", "Chinstrap", "Gentoo"), penguins_analysis)
Numerical vector
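
The same patterns can be tried without any files on disk. In this sketch, two small in-memory dataframes stand in for loaded CSV files, and process_table is a hypothetical helper that drops incomplete rows using base R. It assumes the purrr package is installed and shows how the choice of map(), map_int() or map_dbl() determines the output type.

```r
library(purrr)

# Two made-up dataframes standing in for files that were loaded.
tables = list(
  data.frame(value = c(1, NA, 3)),
  data.frame(value = c(4, 5, NA))
)

# Hypothetical helper: keep only complete rows.
# The result is assigned back to df so the change is kept.
process_table = function(df) {
  df = df[complete.cases(df), , drop = FALSE]
  return(df)
}

cleaned = map(tables, process_table)      # list of dataframes
row_counts = map_int(cleaned, nrow)       # integer vector: 2 2
means = map_dbl(cleaned, function(df) mean(df$value))   # 2 4.5
```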