11 Exercises

The following exercises walk you through using iSEE to explore single-cell RNA-seq data. The data have already been prepared and saved in a SingleCellExperiment object. You will use the pbmc3k dataset from the TENxPBMCData package, which contains gene expression profiles for 2,700 single peripheral blood mononuclear cells. Information on how the data were preprocessed can be found here.

These exercises were adapted from iSEEWorkshopEuroBioc2020/.

To follow along with these exercises, you will need to complete the steps described in the Preparation guide for this demo.

11.1 Loading the Data

First, we’ll load the data that we’ll be using. To do that, we need the AnVIL package, which the Bioconductor team created for working with files on AnVIL. Luckily, it’s already installed.

library(AnVIL)

Next, we’ll use the avfiles_restore() function from the AnVIL package to copy the data onto our environment’s persistent disk.

avfiles_restore( 
  source = "sc_bioconductor_data.RData",
  namespace = "anvil-outreach",
  name = "demos-single-cell-bioconductor"
)

In the code above, we copy the file sc_bioconductor_data.RData from the Workspace demos-single-cell-bioconductor, which was created under the anvil-outreach Billing Project.
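Copying the file does not yet make its contents available in R. A minimal sketch of the loading step, assuming avfiles_restore() placed the file in the current working directory (its default destination):

```r
# Assumes the restored file sits in the current working directory.
rdata_file <- "sc_bioconductor_data.RData"

if (file.exists(rdata_file)) {
  # load() returns the names of the objects it restored, so there is
  # no need to guess what the file contains.
  restored <- load(rdata_file)
  print(restored)
}
```

The printed names tell you which object to pass to iSEE in the following sections.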

11.2 Installing iSEE

Once you’ve loaded the data, you’ll want to install and load the iSEE package. You can install it into your own personal RStudio environment using the pre-installed BiocManager package.

BiocManager::install('iSEE')
library(iSEE)

11.3 Panel Display

First we will take a look at the interactive plots that iSEE can display.
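The app is started from the R console. A minimal sketch, assuming the restored data is a SingleCellExperiment named sce (a hypothetical name; substitute whatever name the data file actually provides):

```r
library(iSEE)

# `sce` is an assumed object name, not one confirmed by the data file.
if (interactive()) {
  app <- iSEE(sce)    # build the Shiny app with iSEE's default panels
  shiny::runApp(app)  # open it in the browser or the RStudio viewer
}
```

The interactive() check only keeps the app from launching when the code is run as a non-interactive script.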

11.4 Visualize Cell Type Assignment

Next, let’s focus specifically on visualizing cell type assignment by cluster membership. The goal is to identify the predominant cell type in each cluster. We can do this by plotting the column data in a ColumnDataPlot.

First, select the panel organization button and select “Organize panels”.

Remove all plots except for the column data plot. This will make things easier to view. Change the width to 12.

You should now see a large scatter plot. Select “Data parameters” underneath the plot.

First, select “labels_fine” under “Column of interest (Y-axis)”. Directly below, select the “Column data” button for “X-axis”. Once the dropdown menu appears for “Column of interest (X-axis)”, select “Cluster”.

Since both cell annotations and the cluster are categorical, iSEE will generate a visual representation of a matrix called a “Hinton plot”.

Now we know that cluster 4 contains almost all the cells that were annotated as classical monocytes. On the other hand, T cells can be found in multiple clusters.
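The Hinton plot is a drawing of a contingency table, and the same counts can be computed directly. The sketch below uses a handful of invented cells (not the pbmc3k values) to show the idea:

```r
# Toy annotations, invented for illustration only.
cells <- data.frame(
  Cluster = c("1", "1", "1", "2", "2", "4", "4", "4"),
  labels_fine = c("T cells", "T cells", "B cells", "T cells", "T cells",
                  "Classical monocytes", "Classical monocytes", "T cells")
)

# Count the cells for every label/cluster pair. A Hinton plot draws this
# matrix, with larger squares for combinations that contain more cells.
counts <- table(cells$labels_fine, cells$Cluster)
print(counts)

# Most common label within each cluster.
predominant <- rownames(counts)[apply(counts, 2, which.max)]
names(predominant) <- colnames(counts)
print(predominant)
```

With the real object, the equivalent table would come from the labels_fine and Cluster columns of its column data.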

We can also save the R code used to create our iSEE plots. This helps make our work reproducible!
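The same panel can also be declared in code instead of being configured by hand. The following is a sketch rather than the exact code iSEE exports; it assumes the SingleCellExperiment is named sce (a hypothetical name) and mirrors the settings chosen above:

```r
library(iSEE)

# Mirror the choices made in the "Data parameters" box.
cluster_panel <- ColumnDataPlot(
  YAxis = "labels_fine",        # Column of interest (Y-axis)
  XAxis = "Column data",        # use a column data field on the X-axis
  XAxisColumnData = "Cluster",  # Column of interest (X-axis)
  PanelWidth = 12L              # full-width panel
)

# `sce` is an assumed object name.
if (interactive()) {
  shiny::runApp(iSEE(sce, initial = list(cluster_panel)))
}
```

Passing the panel through the initial argument starts the app with only that panel displayed.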

11.5 Visualize Expression of a Single Gene

Now let’s take a look at the expression data of a single gene across all the clusters. We can use the “Feature assay plot” panel to plot the distribution of the logcounts values for a particular gene.

Click on the “Organize Panels” icon in the top right corner. Remove the “Column data plot” and add a “Feature assay plot”. Change the width of the plot to 12.

You should now see a rather underwhelming bar plot. We still need to change the data parameters, so click on the “Data parameters” box.

Next, change the “Y-axis feature” to “LYZ”. This is the gene whose expression we’ll be examining. Change the feature selection box to “logcounts”.

The LYZ gene encodes an enzyme called lysozyme, which plays a crucial role in the immune system’s defense against bacterial infections. The primary function of lysozyme is to break down bacterial cell walls.

The highest levels of LYZ gene expression are typically observed in tissues with direct contact with the external environment, such as the epithelial cells of the respiratory tract, gastrointestinal tract, and genitourinary system. These tissues are often exposed to potential pathogens, and the expression of lysozyme helps provide an additional line of defense against bacterial invasion.

Next, click the “column data” button under the “X-axis” header. Finally, choose “Cluster” from the drop down menu of “X-axis column data”.

Now we have a much more exciting violin plot of LYZ gene expression levels across the 14 clusters in our dataset. LYZ is expressed at higher levels in clusters 4, 8, 9, and 13 than in the others. We might be interested in also displaying cell type information in this plot, which we can do using the Visual Parameters options. Click the “Visual parameters” box.

Make sure “Color” is checked in the first row. Choose “Column Data” under the “Color by” options and change the drop down menu to “labels_main”. We could also choose to color the data by “labels_fine”.

We should see the dots in our violin plot colored by cell type annotation. Three of the clusters that have higher LYZ expression contain large numbers of cells identified as monocytes. Since LYZ codes for human lysozyme and is often used as a marker gene for monocytes, this makes a lot of sense.
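For reference, the data and visual parameters chosen for this plot correspond to the following panel definition. This is a sketch that assumes the gene symbol LYZ is a row name of the object and that the object is named sce (a hypothetical name):

```r
library(iSEE)

lyz_panel <- FeatureAssayPlot(
  YAxisFeatureName = "LYZ",           # gene shown on the Y-axis
  Assay = "logcounts",                # assay providing the values
  XAxis = "Column data",              # group cells by a column data field
  XAxisColumnData = "Cluster",
  ColorBy = "Column data",            # color points by cell type annotation
  ColorByColumnData = "labels_main",
  PanelWidth = 12L
)

# `sce` is an assumed object name.
if (interactive()) {
  shiny::runApp(iSEE(sce, initial = list(lyz_panel)))
}
```

Replacing "labels_main" with "labels_fine" colors the points by the finer annotation instead.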

Monocytes are a subset of white blood cells that play a pivotal role in our immune defense against infections. Upon encountering an infection or inflammation, they migrate from the bloodstream to the affected tissues, where they differentiate into specialized cells that engulf and eliminate pathogens.

Let’s download this plot. Click on the “download” button in the top right corner like we did before, but this time choose “Download panel output”. A box will pop up asking you to choose which plots to download. This means you could have multiple plots being displayed in your panel but only choose to download a subset of them. Make sure “Feature assay plot” is checked and click “Download”. Your figure will be saved in a zip file in your Downloads folder.

Your turn!

CD14 is a marker gene for the same type of cells as LYZ. Does it have the same cluster expression pattern as what we saw for LYZ?
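To back up a visual comparison with numbers, the average expression per cluster can be tabulated. The helper below, mean_by_cluster(), is a hypothetical function written for this sketch, and the input values are invented:

```r
# Average expression of one gene within each cluster.
mean_by_cluster <- function(expression, cluster) {
  stopifnot(length(expression) == length(cluster))
  tapply(expression, cluster, mean)
}

# Invented values for six cells in two clusters, for illustration only.
toy_expression <- c(0.0, 0.2, 0.1, 3.1, 2.9, 3.0)
toy_cluster <- c("1", "1", "1", "4", "4", "4")
print(mean_by_cluster(toy_expression, toy_cluster))
```

With the real data, the first argument would be the logcounts row for CD14 (or LYZ) converted with as.numeric(), and the second the Cluster column of the column data.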

11.6 Get Session Info

It’s a good idea to document the packages (and their versions) you used while running the analysis. The last code block uses the sessionInfo() command to do just that. Here’s an example of what that might look like:

sessionInfo()
## R version 4.0.2 (2020-06-22)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 20.04.5 LTS
## 
## Matrix products: default
## BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/liblapack.so.3
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## loaded via a namespace (and not attached):
##  [1] Rcpp_1.0.12       highr_0.8         bslib_0.6.1       compiler_4.0.2   
##  [5] pillar_1.9.0      jquerylib_0.1.4   tools_4.0.2       digest_0.6.25    
##  [9] lattice_0.20-41   jsonlite_1.7.1    udpipe_0.8.3      evaluate_0.23    
## [13] lifecycle_1.0.4   tibble_3.2.1      pkgconfig_2.0.3   rlang_1.1.3      
## [17] Matrix_1.2-18     igraph_1.2.6      cli_3.6.2         curl_5.2.1       
## [21] yaml_2.2.1        xfun_0.26         fastmap_1.1.1     xml2_1.3.2       
## [25] dplyr_1.0.2       stringr_1.4.0     httr_1.4.2        knitr_1.33       
## [29] hms_0.5.3         askpass_1.1       fs_1.5.0          generics_0.0.2   
## [33] vctrs_0.6.5       sass_0.4.8        gitcreds_0.1.1    grid_4.0.2       
## [37] rprojroot_2.0.4   tidyselect_1.1.0  glue_1.4.2        data.table_1.13.0
## [41] R6_2.4.1          cow_0.0.0.9000    fansi_0.4.1       textrank_0.3.0   
## [45] ottrpal_1.2.1     rmarkdown_2.10    bookdown_0.24     readr_1.4.0      
## [49] purrr_0.3.4       magrittr_2.0.3    htmltools_0.5.7   utf8_1.1.4       
## [53] stringi_1.5.3     openssl_1.4.3     cachem_1.0.8
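To keep this record alongside your results, the output can be written to a text file. A small sketch; the file name is an arbitrary choice:

```r
# Capture the printed session details and write them to a text file.
session_file <- "session_info.txt"
writeLines(capture.output(sessionInfo()), session_file)
```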

11.7 Shutting Down

  1. Pausing your cloud environment only temporarily stops your work. When you are ready to delete the cloud environment, click on the RStudio icon on the right-hand side and select “Settings”. If you don’t see this icon, you may need to scroll to the right.

    Screenshot of the Workspace page. The RStudio icon associated with the cloud environment is highlighted. The Settings button is also highlighted

  2. Click on “Delete Environment”.

    Screenshot of the cloud environment popout. "Delete environment" is highlighted.

  3. If you are certain that you do not need the data and configuration on your disk, you should select “Delete everything, including persistent disk”. If there is anything you would like to save, open the compute environment and copy the file(s) from your compute environment to another location, such as the Workspace bucket, GitHub, or your local machine, depending on your needs.

    Screenshot of the cloud environment popout. "Delete everything, including persistent disk" is highlighted.

  4. Select “DELETE”.

    Screenshot of the cloud environment popout. "Delete" is highlighted.

You can also delete your cloud environment(s) and disk storage at https://anvil.terra.bio/#clusters.
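If you want to keep files before deleting the persistent disk, one option is to copy them to the Workspace bucket from R. The sketch below assumes the AnVIL package's avfiles_backup() function is available in your environment and uses a placeholder file name:

```r
# Files worth keeping; the name below is a placeholder for illustration.
files_to_keep <- c("session_info.txt")
files_to_keep <- files_to_keep[file.exists(files_to_keep)]

# Copy the files to the current Workspace bucket (the default destination).
# Nothing is copied if no listed file exists or AnVIL is not installed.
if (length(files_to_keep) > 0 && requireNamespace("AnVIL", quietly = TRUE)) {
  AnVIL::avfiles_backup(files_to_keep)
}
```

Check the bucket contents in the Workspace before selecting “Delete everything, including persistent disk”.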