Chapter 7 Cheatsheet
Many of the functions we learned require the “Tidyverse” library to run.
7.1 Lists
The one-size-fits-all data structure…
| English | R Language | Output type |
|---|---|---|
| Creating a List | my_list = list("hamburger", 1:100, c(TRUE, TRUE)) | List |
| Creating a List with names | my_list_named = list(l1 = "hamburger", l2 = 1:100, l3 = c(TRUE, TRUE)) | List |
| Names of a List | names(my_list_named) | Character vector |
| Accessing elements of a List | | |
| Accessing elements of a List using names | | |
| Treating a Dataframe my_df as a List | my_df$col1, my_df[["col1"]] | Vector |
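
The two "Accessing elements" rows have no code in the table, so here is a minimal sketch of how list access is commonly written in base R. It reuses `my_list` and `my_list_named` from the table; the Dataframe `my_df` and its columns are made up for illustration.

```r
my_list = list("hamburger", 1:100, c(TRUE, TRUE))
my_list_named = list(l1 = "hamburger", l2 = 1:100, l3 = c(TRUE, TRUE))

# Double brackets return the element itself.
first_element = my_list[[1]]        # the character value "hamburger"

# Single brackets return a shorter List that still wraps the element.
first_as_list = my_list[1]          # a List of length 1

# Named Lists can be accessed with $ or with [["name"]].
by_dollar = my_list_named$l2        # the integer vector 1:100
by_bracket = my_list_named[["l3"]]  # the logical vector TRUE, TRUE

# A Dataframe is a List of columns, so the same access works.
# (my_df is a hypothetical example Dataframe.)
my_df = data.frame(col1 = c(1, 2, 3), col2 = c("a", "b", "c"))
same_column = identical(my_df$col1, my_df[["col1"]])
```

Note the difference between `[[1]]` and `[1]`: only the first gives you something you can compute on directly.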
7.2 Exploring new data structures
If you encounter an unknown data structure, such as the result of a t.test(), how do you explore it?
| English | R Language |
|---|---|
| What data structure is this? | class(x) |
| What are its attributes? | attributes(x) |
| What are its names, if any? | names(x) |
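
As a sketch of this workflow, the three functions can be applied to the result of `t.test()`. The input data here is simulated purely for illustration.

```r
set.seed(1)  # make the simulated data reproducible

# Simulated example data (an assumption, not course data).
group_a = rnorm(20)
group_b = rnorm(20, mean = 1)
result = t.test(group_a, group_b)

result_class = class(result)       # "htest"
result_attrs = attributes(result)  # a List that includes the names and the class
result_names = names(result)       # names of the pieces stored in the result

# The result behaves like a named List, so pieces can be pulled out by name.
p_value = result$p.value
```

Once `names(result)` shows what is inside, `$` or `[[ ]]` can be used just as with any other named List.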
7.3 Data type checking and coercing
You loaded in the data, now what?
| English | R Language | Output type |
|---|---|---|
| Is this vector a ___ type of vector? | is.numeric(vec), is.double(vec), is.integer(vec), is.character(vec), is.logical(vec) | Logical value |
| Convert ___ type of vector to ___ type of vector. Order of coercing that is allowed: Logical vector -> Numeric vector -> Character vector | as.numeric(vec), as.double(vec), as.integer(vec), as.character(vec) | Vector of desired form |
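
A minimal sketch of checking and coercing along the allowed order, using a made-up logical vector. The last step goes against that order to show what happens when a value cannot be converted.

```r
flags = c(TRUE, FALSE, TRUE)

flags_are_logical = is.logical(flags)  # TRUE
flags_are_numeric = is.numeric(flags)  # FALSE

# Logical -> Numeric: TRUE becomes 1 and FALSE becomes 0.
nums = as.numeric(flags)

# Numeric -> Character: each number becomes its text form.
chars = as.character(nums)

# Going the other way only works for text that looks like a number.
# Anything else becomes NA (R also emits a warning, silenced here).
back = suppressWarnings(as.numeric(c("1", "two")))
```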
7.4 Subsetting and removing missing values
| English | R Language | Output type |
|---|---|---|
| Subset vec to values greater than 0 | vec[vec > 0] | Vector |
| Subset vec to the values “chris” or “bob” | vec[vec == "chris" \| vec == "bob"] | Vector |
| Where are the missing values in this vector? | is.na(vec) | Logical vector indicating where the missing values are |
| Given vector vec, subset to non-missing values | vec[!is.na(vec)] | Vector |
| Given a dataframe df, subset the rows so that the column col1 does not have any missing values | filter(df, !is.na(col1)) | Dataframe |
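
A short sketch of these subsetting patterns on made-up data, assuming the dplyr package (part of the Tidyverse) is installed. It also shows that a comparison involving a missing value is itself missing, so `vec[vec > 0]` keeps an `NA` unless it is filtered out explicitly.

```r
library(dplyr)

vec = c(3, -1, NA, 7)  # example data, made up for illustration

positive = vec[vec > 0]                       # 3 NA 7: the NA survives
where_missing = is.na(vec)                    # FALSE FALSE TRUE FALSE
non_missing = vec[!is.na(vec)]                # 3 -1 7
positive_clean = vec[!is.na(vec) & vec > 0]   # 3 7

name_vec = c("chris", "amy", "bob")
picked = name_vec[name_vec == "chris" | name_vec == "bob"]  # "chris" "bob"

# Same idea on a Dataframe: keep rows where col1 is not missing.
df = data.frame(col1 = vec, col2 = c("a", "b", "c", "d"))
df_complete = filter(df, !is.na(col1))
```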
7.5 Data recoding
| English | R Language |
|---|---|
| If vector vec has the value “x”, recode it as “a” | or |
| If vector vec has the value “x”, recode it as “a”, anything else recode as “b” | if_else(vec == "x", "a", "b") |
| If vector vec has the value “x”, recode it as “a”, else if vec has value “y”, recode it as “b”, anything else recode as “z”. | |
| If vector vec has the value “x”, recode it as “a”, else if vec has value “y”, recode it as “b”, anything else leave it as is. | |
| If dataframe df column col has the value “x”, recode it as “a” | or |
| If dataframe df column col has the value “x”, recode it as “a”, anything else recode as “b” | or |
| If dataframe df column col has the value “x”, recode it as “a”, else if column col has value “y”, recode it as “b”, anything else recode as “z”. | or |
| If dataframe df column col has the value “x”, recode it as “a”, else if column col has value “y”, recode it as “b”, anything else leave it as is. | or |
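
Most of the code for these recoding tasks is missing from the table, so the following is only a sketch of one common way to write them. It assumes dplyr's `if_else()`, `case_when()`, and `mutate()`, and made-up data; it is not necessarily the form the original cells used.

```r
library(dplyr)

vec = c("x", "y", "w")  # example data, made up for illustration

# "x" -> "a", everything else unchanged: base R assignment, or if_else().
v1 = vec
v1[v1 == "x"] = "a"
v1_alt = if_else(vec == "x", "a", vec)

# "x" -> "a", anything else -> "b".
v2 = if_else(vec == "x", "a", "b")

# "x" -> "a", "y" -> "b", anything else -> "z".
v3 = case_when(vec == "x" ~ "a", vec == "y" ~ "b", TRUE ~ "z")

# "x" -> "a", "y" -> "b", anything else left as is.
v4 = case_when(vec == "x" ~ "a", vec == "y" ~ "b", TRUE ~ vec)

# The same recodings on a Dataframe column, via base R or mutate().
df = data.frame(col = vec, stringsAsFactors = FALSE)

df_base = df
df_base$col[df_base$col == "x"] = "a"

df_tidy = mutate(df, col = if_else(col == "x", "a", col))
df_multi = mutate(df, col = case_when(col == "x" ~ "a",
                                      col == "y" ~ "b",
                                      TRUE ~ col))
```

In `case_when()`, the conditions are checked in order and `TRUE ~ ...` acts as the catch-all for everything that did not match earlier.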
7.6 Conditional statements
| English | R Language |
|---|---|
| If statement | |
| If-else if statement | |
| If-else statement | |
| If-else if-else statement | |
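
The code cells in this table are empty, so here is a minimal sketch of the four forms in base R. The variable `x` and the labels are made up for illustration.

```r
x = -3

# If statement: the body runs only when the condition is TRUE.
label = "unknown"
if (x > 0) {
  label = "positive"
}

# If-else if statement: a second condition is checked when the first fails.
size = "unchanged"
if (x > 100) {
  size = "large"
} else if (x < -100) {
  size = "very negative"
}

# If-else statement: exactly one of the two bodies runs.
if (x > 0) {
  sign_label = "positive"
} else {
  sign_label = "not positive"
}

# If-else if-else statement: the final else catches everything left over.
if (x > 0) {
  full_label = "positive"
} else if (x < 0) {
  full_label = "negative"
} else {
  full_label = "zero"
}
```

The condition inside `if ()` must be a single logical value; to apply a condition to every element of a vector, use `if_else()` or `case_when()` instead.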
7.7 Dataframe Transformations
| English | R Language | Output type |
|---|---|---|
| Pivot longer on Dataframe. More notes on specifying patterns for columns here. | or | Dataframe |
| Pivot wider on Dataframe df so that the values of column measurement_type become the new column names, filled with the corresponding entries of column values. | pivot_wider(df, names_from = "measurement_type", values_from = "values") | Dataframe |
| Separate Dataframe df’s column patient_id_type into two columns patient_id and patient_type by the separator - | separate(df, col = "patient_id_type", into = c("patient_id", "patient_type"), sep = "-") | Dataframe |
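
The pivot longer code is missing from the table, so this is a sketch of how the three transformations might fit together, assuming the tidyr package (part of the Tidyverse). The Dataframe and its `height` and `weight` columns are made up; the two `pivot_longer()` calls show two ways of selecting the same columns.

```r
library(tidyr)

# Example data, made up for illustration.
df = data.frame(patient_id_type = c("1-case", "2-control"),
                height = c(170, 160),
                weight = c(70, 60),
                stringsAsFactors = FALSE)

# Pivot longer: name the columns to stack...
df_long = pivot_longer(df, cols = c("height", "weight"),
                       names_to = "measurement_type", values_to = "values")

# ...or select every column except the identifier.
df_long_alt = pivot_longer(df, cols = !patient_id_type,
                           names_to = "measurement_type", values_to = "values")

# Pivot wider reverses the operation.
df_wide = pivot_wider(df_long, names_from = "measurement_type",
                      values_from = "values")

# Separate one column into two by the "-" separator.
df_sep = separate(df, col = "patient_id_type",
                  into = c("patient_id", "patient_type"), sep = "-")
```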
7.8 Writing functions
Some examples.
| English | R Language |
|---|---|
| Write a function that takes in a vector and returns a vector of the same length, such as a z-score transformation | then, to use it: |
| Write a function that takes in a vector and returns a summary statistic, such as the difference between the highest and lowest value | then, to use it: |
| Write a function that takes in a Dataframe and returns some summary information about it, such as its dimensions | then, to use it: |
| Write a function that takes in a character value, such as a filepath, and returns a Dataframe, such as loading and preprocessing the Dataframe. | then, to use it: |
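
The function definitions are missing from the table, so here is a sketch of what the four kinds of functions could look like in base R. All function names, the temporary CSV file, and the choice of "drop rows with missing values" as the preprocessing step are assumptions for illustration.

```r
# Vector in, vector of the same length out: z-score transformation.
z_score = function(x) {
  (x - mean(x)) / sd(x)
}

# Vector in, single summary statistic out: highest minus lowest value.
value_range = function(x) {
  max(x) - min(x)
}

# Dataframe in, summary information out: number of rows and columns.
describe_dimensions = function(df) {
  dim(df)
}

# Filepath in, Dataframe out: load a CSV and drop rows with missing values.
load_and_clean = function(filepath) {
  df = read.csv(filepath)
  df[complete.cases(df), ]
}

# Then, to use them (the CSV is written to a temporary file first):
z = z_score(c(1, 2, 3))
r = value_range(c(3, 9, 4))

example_path = tempfile(fileext = ".csv")
write.csv(data.frame(id = 1:3, value = c(2.5, NA, 4)),
          example_path, row.names = FALSE)
clean_df = load_and_clean(example_path)
d = describe_dimensions(clean_df)
```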
7.9 Iteration
Some examples
| English | R Language | Output type |
|---|---|---|
| Iterate on a vector of characters representing filepaths, where the function loads in Dataframes. | | List of Dataframes |
| Iterate on a vector of characters representing filepaths, where a custom function loads in Dataframes and processes them. | | List of Dataframes |
| Iterate on the columns of a Dataframe to compute summary statistics. (Treat the Dataframe as a List to be iterated through) | | Numerical vector |
| Iterate over different conditions to analyze a Dataframe multiple times. | | Numerical vector |
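
The iteration code is missing from the table. As a sketch, the four cases can be written with base R's `lapply()` (returns a List) and `vapply()` (returns a vector of a declared type); purrr's `map()` and `map_dbl()` from the Tidyverse play the analogous roles. The files, column names, and the `load_and_clean()` helper are made up for illustration.

```r
# Create two small example CSV files in temporary locations.
paths = c(tempfile(fileext = ".csv"), tempfile(fileext = ".csv"))
write.csv(data.frame(x = c(1, 2, NA), y = c(4, 5, 6)),
          paths[1], row.names = FALSE)
write.csv(data.frame(x = c(7, 8, 9), y = c(10, NA, 12)),
          paths[2], row.names = FALSE)

# 1. Iterate over filepaths with an existing loading function.
df_list = lapply(paths, read.csv)

# 2. Iterate over filepaths with a custom load-and-process function.
load_and_clean = function(filepath) {
  df = read.csv(filepath)
  df[complete.cases(df), ]  # drop rows with any missing value
}
clean_list = lapply(paths, load_and_clean)

# 3. Iterate over the columns of one Dataframe (a Dataframe is a List).
df = clean_list[[2]]
column_means = vapply(df, mean, numeric(1))

# 4. Iterate over conditions: mean of x above each cutoff.
cutoffs = c(0, 7.5)
means_above = vapply(cutoffs,
                     function(cutoff) mean(df$x[df$x > cutoff]),
                     numeric(1))
```

`vapply()` requires stating the expected result type (`numeric(1)` here), which guards against a function silently returning something of the wrong shape.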