Chapter 7 Cheatsheet

Many of the functions we learned require the tidyverse package, loaded with `library(tidyverse)`.

7.1 Lists

The one-size-fits-all data structure: each element of a List can hold data of a different type and length.

English	R Language	Output type
Creating a List	`my_list = list("hamburger", 1:100, c(TRUE, TRUE))`	List
Creating a List with names	`my_list_named = list(l1 = "hamburger", l2 = 1:100, l3 = c(TRUE, TRUE))`	List
Names of a List	`names(my_list_named)`	Character vector
Accessing elements of a List	`my_list[[1]]`, `my_list[[2]][3]`	`"hamburger"`, `3`
Accessing elements of a List using names	`my_list_named$l1` or `my_list_named[["l1"]]`; `my_list_named$l2[3]` or `my_list_named[["l2"]][3]`	`"hamburger"`; `3`
Treating a Dataframe `df` as a List	`df$col1` or `df[["col1"]]`	Vector
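A minimal sketch that runs the List rows above in one place. The small dataframe `df` is made up for illustration; it only shows that a Dataframe really is a List of equal-length columns.

```r
# Build the lists from the table above
my_list = list("hamburger", 1:100, c(TRUE, TRUE))
my_list_named = list(l1 = "hamburger", l2 = 1:100, l3 = c(TRUE, TRUE))

names(my_list_named)    # "l1" "l2" "l3"
my_list[[1]]            # "hamburger"
my_list[[2]][3]         # 3
my_list_named$l2[3]     # 3

# Single brackets return a smaller List, double brackets return the element itself
class(my_list[1])       # "list"
class(my_list[[1]])     # "character"

# A made-up dataframe: it is a List whose elements are its columns
df = data.frame(col1 = c(1, 2, 3), col2 = c("a", "b", "c"))
is.list(df)             # TRUE
df[["col1"]]            # 1 2 3
```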

7.2 Exploring new data structures

If you encounter an unfamiliar data structure, such as the result of `t.test()`, how do you explore it?

English	R Language
What data structure is this?	`class(x)`
What are its attributes?	`attributes(x)`
What are its names, if any?	`names(x)`
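A short sketch of exploring a `t.test()` result with the three functions above. The measurements in `x` are made up; the point is that the result is a List with a class attached, so `$` works on it just like in 7.1.

```r
# Made-up measurements, tested against a null mean of 0
x = c(2.1, 1.8, 2.5, 2.9, 1.6)
result = t.test(x, mu = 0)

class(result)         # "htest"
is.list(result)       # TRUE: underneath, it is a List
names(result)         # includes "statistic", "p.value", "conf.int", ...
attributes(result)    # its names and its class

# Once you know the names, pull out pieces with $ or [[ ]]
result$p.value
result$conf.int       # a length-2 numeric vector (lower and upper bound)
```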

7.3 Data type checking and coercing

You loaded in the data, now what?

English	R Language	Output type
Is this vector a ___ type of vector?	`is.numeric(vec)`, `is.double(vec)`, `is.integer(vec)`, `is.character(vec)`, `is.logical(vec)`	Logical value
Convert ___ type of vector to ___ type of vector. Coercion always works in this order: Logical vector -> Numeric vector -> Character vector. Going the other way can turn values into `NA`.	`as.numeric(vec)`, `as.double(vec)`, `as.integer(vec)`, `as.character(vec)`	Vector of desired form
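A minimal sketch of checking and coercing types, using made-up vectors. It shows why the order Logical -> Numeric -> Character is the safe direction: going from character back to numeric only works for text that looks like a number.

```r
logical_vec = c(TRUE, FALSE, TRUE)
is.logical(logical_vec)        # TRUE

# Logical -> numeric: TRUE becomes 1, FALSE becomes 0
numeric_vec = as.numeric(logical_vec)
numeric_vec                    # 1 0 1

# Numbers typed directly are doubles, not integers; 1:3 is an integer vector
is.integer(c(1, 2))            # FALSE
is.integer(1:3)                # TRUE

# Numeric -> character always works
as.character(c(1.5, 2))        # "1.5" "2"

# Character -> numeric: text that is not a number becomes NA
# (R also prints a warning, which is silenced here)
suppressWarnings(as.numeric(c("3.14", "abc")))   # 3.14 NA
```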

7.4 Subsetting and removing missing values

English	R Language	Output type
Subset `vec` to be greater than 0	`vec[vec > 0]`	Vector
Subset `vec` to have “chris” or “bob”	`vec[vec == "chris" | vec == "bob"]`	Vector
Where are the missing values in this vector?	`is.na(vec)`	Logical vector indicating where the missing values are
Given vector `vec`, subset to non-missing values	`vec[!is.na(vec)]`	Vector
Given a dataframe `df`, subset the rows so that the column `col1` does not have any missing values	`filter(df, !is.na(col1))`	Dataframe
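A small sketch of the subsetting rows above on made-up data, assuming dplyr (part of the tidyverse) is installed for `filter()`. Note one trap it exposes: comparing `NA > 0` gives `NA`, so `vec[vec > 0]` keeps a missing value unless you also drop `NA`s.

```r
library(dplyr)  # for filter()

vec = c(3, -1, NA, 7, 0)
is.na(vec)                     # FALSE FALSE TRUE FALSE FALSE
vec[!is.na(vec)]               # 3 -1 7 0
vec[vec > 0]                   # 3 NA 7: the NA sneaks through
vec[vec > 0 & !is.na(vec)]     # 3 7

names_vec = c("chris", "amy", "bob")
names_vec[names_vec == "chris" | names_vec == "bob"]   # "chris" "bob"

# Made-up dataframe with one missing value in col1
df = data.frame(col1 = c(1, NA, 3), col2 = c("a", "b", "c"))
filter(df, !is.na(col1))       # keeps the rows where col1 is 1 and 3
```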

7.5 Data recoding

English	R Language
If vector `vec` has the value “x”, recode it as “a”	`vec[vec == "x"] = "a"` or `if_else(vec == "x", "a", vec)`
If vector `vec` has the value “x”, recode it as “a”, anything else recode as “b”	`if_else(vec == "x", "a", "b")`
If vector `vec` has the value “x”, recode it as “a”, else if `vec` has value “y”, recode it as “b”, anything else recode as “z”.	`case_when(vec == "x" ~ "a", vec == "y" ~ "b", .default = "z")`
If vector `vec` has the value “x”, recode it as “a”, else if `vec` has value “y”, recode it as “b”, anything else leave it as is.	`case_when(vec == "x" ~ "a", vec == "y" ~ "b", .default = vec)`
If dataframe `df` column `col` has the value “x”, recode it as “a”	`df$col[df$col == "x"] = "a"` or `df$col = if_else(df$col == "x", "a", df$col)`
If dataframe `df` column `col` has the value “x”, recode it as “a”, anything else recode as “b”	`df$col = if_else(df$col == "x", "a", "b")` or `df = mutate(df, col = if_else(col == "x", "a", "b"))`
If dataframe `df` column `col` has the value “x”, recode it as “a”, else if column `col` has value “y”, recode it as “b”, anything else recode as “z”.	`df$col = case_when(df$col == "x" ~ "a", df$col == "y" ~ "b", .default = "z")` or `df = mutate(df, col = case_when(col == "x" ~ "a", col == "y" ~ "b", .default = "z"))`
If dataframe `df` column `col` has the value “x”, recode it as “a”, else if column `col` has value “y”, recode it as “b”, anything else leave it as is.	`df$col = case_when(df$col == "x" ~ "a", df$col == "y" ~ "b", .default = df$col)` or `df = mutate(df, col = case_when(col == "x" ~ "a", col == "y" ~ "b", .default = col))`
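A minimal sketch of the recoding patterns on made-up values, assuming dplyr 1.1.0 or later (the `.default` argument of `case_when()` is not available in older versions). The key point for dataframes: `mutate()` returns the whole dataframe, so its result is assigned back to `df`, not to `df$col`.

```r
library(dplyr)  # for if_else(), case_when(), mutate()

vec = c("x", "y", "w", "x")

if_else(vec == "x", "a", "b")      # "a" "b" "b" "a"

case_when(vec == "x" ~ "a",
          vec == "y" ~ "b",
          .default = vec)          # "a" "b" "w" "a"

# Made-up dataframe; mutate() returns a dataframe, so assign it back to df
df = data.frame(col = c("x", "y", "w"))
df = mutate(df, col = case_when(col == "x" ~ "a",
                                col == "y" ~ "b",
                                .default = "z"))
df$col                             # "a" "b" "z"
```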

7.6 Conditional statements

English	R Language
If statement	`if (condition) { }`
If-else if statement	`if (condition1) { } else if (condition2) { }`
If-else statement	`if (condition1) { } else { }`
If-else if-else statement	`if (condition1) { } else if (condition2) { } else { }`
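A minimal sketch of an if-else if-else chain inside a function. The helper `describe_sign()` is hypothetical, made up for illustration. Keep in mind that `if ()` expects a single `TRUE` or `FALSE`; to recode a whole vector, use `if_else()` or `case_when()` from 7.5 instead.

```r
# Hypothetical helper: classify a single number
describe_sign = function(x) {
  if (x > 0) {
    result = "positive"
  } else if (x < 0) {
    result = "negative"
  } else {
    result = "zero"
  }
  return(result)
}

describe_sign(5)    # "positive"
describe_sign(-2)   # "negative"
describe_sign(0)    # "zero"
```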

7.7 Dataframe Transformations

English	R Language	Output type
Pivot longer on Dataframe `df` so that the column names `q1`, `q2`, `q3` go into a new column “quarter” and their values go into a new column “sales”. Columns can also be chosen by pattern, such as `starts_with("q")`.	`pivot_longer(df, c("q1", "q2", "q3"), names_to = "quarter", values_to = "sales")` or `pivot_longer(df, starts_with("q"), names_to = "quarter", values_to = "sales")`	Dataframe
Pivot wider on Dataframe `df`, turning the values of column `measurement_type` into column names and filling them with the corresponding values from column `values`.	`pivot_wider(df, names_from = "measurement_type", values_from = "values")`	Dataframe
Separate Dataframe `df`’s column `patient_id_type` into two columns, `patient_id` and `patient_type`, using the separator `-`	`separate(df, col = "patient_id_type", into = c("patient_id", "patient_type"), sep = "-")`	Dataframe
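A minimal sketch of the three transformations on made-up data, assuming tidyr (part of the tidyverse) is installed. The store and patient values are invented for illustration. In recent tidyr versions `separate()` is marked as superseded by `separate_wider_delim()`, but it still works.

```r
library(tidyr)  # for pivot_longer(), pivot_wider(), separate()

# Made-up quarterly sales, one row per store
df = data.frame(store = c("A", "B"),
                q1 = c(10, 20), q2 = c(11, 21), q3 = c(12, 22))

# Wide -> long: 2 stores x 3 quarters = 6 rows
long = pivot_longer(df, starts_with("q"), names_to = "quarter", values_to = "sales")
long

# Long -> wide: undoes the reshaping above
wide = pivot_wider(long, names_from = "quarter", values_from = "sales")

# Made-up IDs that pack two pieces of information into one column
patients = data.frame(patient_id_type = c("101-inpatient", "102-outpatient"))
separate(patients, col = "patient_id_type",
         into = c("patient_id", "patient_type"), sep = "-")
```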

7.8 Writing functions

Some examples.

English	R Language
Write a function that takes in a vector and returns a vector of the same length, such as a z-score transformation	`z_score = function(vec) { result = (vec - mean(vec)) / sd(vec); return(result) }` then, to use it: `df$biomarker_standardized = z_score(df$biomarker)`
Write a function that takes in a vector and returns a summary statistic, such as the difference between the highest and lowest value	`max_diff = function(vec) { result = max(vec) - min(vec); return(result) }` then, to use it: `max_diff(df$biomarker)`
Write a function that takes in a Dataframe and returns some summary information about it, such as its dimensions	`my_dim = function(df) { result = c(nrow(df), ncol(df)); return(result) }` then, to use it: `my_dim(penguins)`
Write a function that takes in a character string, such as a file path, and returns a Dataframe, such as by loading and preprocessing it.	`load_and_process = function(filepath) { df = read_csv(filepath); df = pivot_longer(df, c("q1", "q2", "q3"), names_to = "quarter", values_to = "sales"); return(df) }` then, to use it: `sales_df = load_and_process("sales_data.csv")`
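The same functions written over several lines, which is how they would normally look in a script. The dataframe `df` with a `biomarker` column is made up for illustration. Note that `mean()`, `sd()`, `max()`, and `min()` return `NA` if the vector has missing values, unless you pass `na.rm = TRUE`.

```r
z_score = function(vec) {
  result = (vec - mean(vec)) / sd(vec)
  return(result)
}

max_diff = function(vec) {
  result = max(vec) - min(vec)
  return(result)
}

# Made-up data standing in for a real biomarker column
df = data.frame(biomarker = c(2, 4, 6))
df$biomarker_standardized = z_score(df$biomarker)
df$biomarker_standardized   # -1 0 1
max_diff(df$biomarker)      # 4
```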

7.9 Iteration

Some examples.

English	R Language	Output type
Iterate on a vector of characters representing filepaths, where the function loads in Dataframes.	`files = c("f1.csv", "f2.csv", "f3.csv"); map(files, read_csv)`	List of Dataframes
Iterate on a vector of characters representing filepaths, where a custom function loads in Dataframes and processes them.	`process_data = function(file) { df = read_csv(file); df = drop_na(df); return(df) }; files = c("f1.csv", "f2.csv", "f3.csv"); map(files, process_data)`	List of Dataframes
Iterate on the columns of a Dataframe to compute summary statistics. (Treat the Dataframe as a List to be iterated through.)	`penguins_numeric = penguins %>% select(bill_length_mm, bill_depth_mm, flipper_length_mm, body_mass_g); map_dbl(penguins_numeric, mean, na.rm = TRUE)`	Numerical vector
Iterate over different conditions to analyze a Dataframe multiple times.	`penguins_analysis = function(current_species) { penguins_subset = filter(penguins, species == current_species); result = mean(penguins_subset$bill_length_mm, na.rm = TRUE); return(result) }; map_dbl(c("Adelie", "Chinstrap", "Gentoo"), penguins_analysis)`	Numerical vector
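A self-contained sketch of the three iteration patterns, assuming purrr (part of the tidyverse) is installed. The small dataframe stands in for `penguins` and its numbers are made up. To avoid needing real files, it writes two temporary CSV files first, and it uses base R `read.csv()` and bracket subsetting in place of `read_csv()` and `filter()` so that only purrr is required.

```r
library(purrr)  # for map(), map_dbl()

# Made-up data standing in for a real dataset such as penguins
df = data.frame(species = c("Adelie", "Adelie", "Gentoo", "Gentoo"),
                bill_length_mm = c(38, 40, NA, 48),
                body_mass_g = c(3700, 3800, 5000, 5200))

# 1. Iterate over file paths: write two temporary CSVs, then load them all
files = c(tempfile(fileext = ".csv"), tempfile(fileext = ".csv"))
for (f in files) write.csv(df, f, row.names = FALSE)
dfs = map(files, read.csv)        # a List of 2 Dataframes

# 2. Iterate over columns: a Dataframe is a List of columns
map_dbl(df[c("bill_length_mm", "body_mass_g")],
        function(x) mean(x, na.rm = TRUE))   # 42 and 4425, named by column

# 3. Iterate over conditions with a custom function
species_mean = function(current_species) {
  subset_df = df[df$species == current_species, ]
  return(mean(subset_df$bill_length_mm, na.rm = TRUE))
}
map_dbl(c("Adelie", "Gentoo"), species_mean)  # 39 48
```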