Getting started with Spatial Transcriptomics

Get workshop files

Download the files for this activity by clicking here: https://github.com/fhdsl/Moffitt_ITN_Workshop/archive/refs/heads/main.zip
Put this file on your desktop so it is easily findable.
Double click the zip file (or right click and choose “unzip” or “decompress”) to unzip the file.
Open the activity files you downloaded so we can see what’s there.

Get familiar with the data set

Within the folder, navigate to activity-files and then spatial_transcriptomics_activity_files. We should see a metadata file (meylan_etal_2022_tumor_grade.csv), a PDF of the manuscript that describes these data, and a visium_samples folder that includes two samples.

Each sample’s folder contains several files resulting from the spaceranger pipeline. However, we will use only the following files:

sample_b_01
- GSM5924046_frozen_b_1_filtered_feature_bc_matrix.h5 (the gene expression data)
- spatial
  - GSM5924046_frozen_b_1_tissue_positions_list.csv (spot x,y spatial positions)
  - GSM5924046_frozen_b_1_tissue_hires_image.png (The H&E-stained tissue image)
  - GSM5924046_frozen_b_1_scalefactors_json.json (to scale the H&E image and plots)
sample_b_07 …
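
Before uploading, it can help to confirm that a sample folder contains the four files listed above. The following is a minimal R sketch, assuming the file-name endings shown in the listing. The helper name check_visium_files is hypothetical and is not part of spatialGE.

```r
# Hypothetical helper (not part of spatialGE): report which of the four
# expected Visium files are present in one sample folder.
check_visium_files = function(sample_dir) {
  # File-name endings taken from the listing above
  expected = c(counts   = 'filtered_feature_bc_matrix\\.h5$',
               coords   = 'tissue_positions_list\\.csv$',
               image    = 'tissue_hires_image\\.png$',
               scalefac = 'scalefactors_json\\.json$')
  # List all files, including those inside the `spatial` subfolder
  found = list.files(sample_dir, recursive=TRUE)
  # One TRUE/FALSE per expected file type
  vapply(expected, function(p) any(grepl(p, found)), logical(1))
}

# Example call (path assumed to be relative to the unzipped workshop folder):
# check_visium_files('./activity-files/spatial_transcriptomics_activity_files/visium_samples/sample_b_01')
```

Any FALSE in the result points to a file type that is missing or named differently than expected.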

The README describes these samples:

These samples result from tumor biopsies collected from a patient cohort with clear cell renal cell carcinoma (ccRCC). The researchers profiled the samples with 10X Visium to study the gene expression in TLS in a spatial context.

The metadata file (meylan_etal_2022_tumor_grade.csv) contains two variables of interest. One is the tumor grade, and the other is positivity for a tertiary lymphoid structure (TLS) as ascertained using immunohistochemistry staining. The file looks like this:

samplename	cohort	ID	pT	tls
sample_b_01	IMM	b_1	pT1	pos
sample_b_07	IMM	b_7	pT3	neg
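
For readers who prefer to inspect the metadata in R rather than in a spreadsheet program, the sketch below reads the file and checks that the columns used later in this tutorial are present. It assumes the file is comma-separated, as the .csv extension suggests. The function name read_sample_metadata is hypothetical.

```r
# Hypothetical helper: read the metadata file and verify the columns
# shown in the table above (samplename, pT, tls) are present.
read_sample_metadata = function(fp) {
  meta = read.csv(fp, stringsAsFactors=FALSE)
  needed = c('samplename', 'pT', 'tls')
  missing_cols = setdiff(needed, colnames(meta))
  if (length(missing_cols) > 0) {
    stop('Missing columns: ', paste(missing_cols, collapse=', '))
  }
  meta
}

# Example call (path assumed to be relative to the unzipped workshop folder):
# meta = read_sample_metadata('./activity-files/spatial_transcriptomics_activity_files/meylan_etal_2022_tumor_grade.csv')
# table(meta$tls)   # number of TLS-positive and TLS-negative samples
```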

Create an account with spatialGE

Go to https://spatialge.moffitt.org/
Click on “Sign Up” in the upper right corner.

Starting a new project

Click the blue “New Project” button.
For What spatial transcriptomics platform are you using for this project? choose Visium, which is the platform our example data come from.
Choose a sensible project name and description. It could be something related to the workshop, such as “ITN-Moffitt workshop”.
Then click “Create”.

Uploading the dataset

IMPORTANT: We will repeat the following steps once per sample to upload that sample’s set of files.

Uploading one sample’s data

For Sample Name enter the ID indicated on the folder, e.g. sample_b_01. This is very important, as sample IDs need to match exactly the sample IDs in the metadata file (meylan_etal_2022_tumor_grade.csv). Otherwise, no metadata is imported.
For the Gene expression box upload the .h5 file e.g. GSM5924046_frozen_b_1_filtered_feature_bc_matrix.h5. You can upload files by dragging and dropping them or by clicking to navigate to them.
For the Coordinates box upload the .csv file e.g. GSM5924046_frozen_b_1_tissue_positions_list.csv.
For the Tissue image box upload the .png file e.g. GSM5924046_frozen_b_1_tissue_hires_image.png.
For the Scale factor box upload the .json file e.g. GSM5924046_frozen_b_1_scalefactors_json.json. The scaling factor file is output automatically by the 10X Space Ranger pipeline, and contains information to approximate the size of the tissue image and the expression plots.

Once the above steps are done, click the green Import Sample button.

Now return to the beginning of these steps to repeat the same steps for the other sample.

You can use this checklist to keep track as you upload and follow the steps for each sample.

sample_b_01 data entered
sample_b_07 data entered

Adding metadata

Now click Option 1: Upload metadata file.
Upload the meylan_etal_2022_tumor_grade.csv file. You can drag and drop the file or click on the (+) button to navigate to it.

Homework: Manually adding sample metadata

Metadata can also be added manually. To do so, follow these steps:

Click Option 2: Add metadata manually.
Click Add new metadata column. Add a column named pT.
Click Add new metadata column again. Add a column named tls.

You can refer to the meylan_etal_2022_tumor_grade.csv file’s contents to add these data for each sample:

samplename	pT	tls
sample_b_01	pT1	pos
sample_b_07	pT3	neg
sample_b_18	pT2	pos
sample_a_01	pT1b	neg

Add the corresponding pT and tls information for sample_b_01.
Add the corresponding pT and tls information for sample_b_07.
Add the corresponding pT and tls information for sample_b_18.
Add the corresponding pT and tls information for sample_a_01.

Remember: The sample IDs in the metadata should match exactly the sample names used during file import.
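
The exact-match requirement can be checked ahead of time. Below is a minimal R sketch with a hypothetical helper, compare_sample_ids, that compares the names you plan to type during upload against the samplename values in the metadata. The misspelled name in the example is made up for illustration.

```r
# Hypothetical helper: find sample names that do not match between the
# names used during upload and the `samplename` column of the metadata.
compare_sample_ids = function(uploaded, metadata_ids) {
  list(missing_in_metadata = setdiff(uploaded, metadata_ids),
       unused_in_metadata  = setdiff(metadata_ids, uploaded))
}

uploaded     = c('sample_b_01', 'Sample_b_07')   # accidental capital "S" (made-up typo)
metadata_ids = c('sample_b_01', 'sample_b_07')

compare_sample_ids(uploaded, metadata_ids)
```

In this example the comparison flags Sample_b_07 as having no match in the metadata, because the comparison is exact and case-sensitive. Both list elements are empty when every name matches.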

Homework: Additional samples in the data set

Users are encouraged to try spatialGE with these additional samples after the workshop. These samples were not used in the workshop to save time.

The metadata, including the two additional samples, looks like this:

samplename	cohort	ID	pT	tls
sample_b_01	IMM	b_1	pT1	pos
sample_b_07	IMM	b_7	pT3	neg
sample_b_18	IMM	b_18	pT2	pos
sample_a_01	ExhauCRF	a_1	pT1b	neg

The metadata file and additional Visium samples can be found within the additional_activity_files_for_home folder.

You can use this checklist to keep track as you upload and follow the steps for each sample.

sample_b_01 data entered
sample_b_07 data entered
sample_b_18 data entered
sample_a_01 data entered

After you’ve entered the data and metadata:

NOTE: Make sure to upload all the samples before clicking the Import Data button. You will not be able to edit the project (unless you start a new project completely) after you click Import Data.

Make sure everything is as you intend and then click Import Data.

This may take a little bit of time. Note you can have it send you an email instead of waiting on the page.

Advanced task: Loading data into the command-line version of spatialGE

Most of the analyses in this step-by-step tutorial can also be performed with the spatialGE R package. To do so, start by installing the spatialGE package:

# Install devtools if not already installed:
#install.packages('devtools')

devtools::install_github('FridleyLab/spatialGE')

Then load the spatialGE package:

library('spatialGE')

Now, specify the directory paths to the folders containing the data and the metadata file:

counts_fp = c('./activity-files/spatial_transcriptomics_activity_files/visium_samples/sample_b_01/',
              './activity-files/spatial_transcriptomics_activity_files/visium_samples/sample_b_07/',
              './activity-files/spatial_transcriptomics_activity_files/additional_activity_files_for_home/sample_a_01/',
              './activity-files/spatial_transcriptomics_activity_files/additional_activity_files_for_home/sample_b_18/')

meta_file = './activity-files/spatial_transcriptomics_activity_files/additional_activity_files_for_home/meylan_etal_2022_tumor_grade.csv'

Finally, create an STlist. The STlist is the R object in spatialGE that holds the data and results:

tls_stlist = STlist(rnacounts=counts_fp, samples=meta_file)

Filtering your data

Each ST technology requires different filtering parameters. Compared to single-cell ST, spot-level ST (e.g., Visium) tends to yield more counts per spot. Even among spot-level ST projects, these parameters need adjustment depending on the sequencing depth and cellularity (i.e., cells per unit area). For these reasons, the values used here should not be taken as a “golden rule”. Rather, users are encouraged to try different parameters and see which filtering procedure removes the most “noise” without losing too much relevant information. spatialGE provides statistics and plots to help the user assess the effect of filtering.

Go to the “Filter data” tab.
Click “Filter spots/cells”.
Enter the minimum number of counts a spot needs to have to be kept in the data set. In this case, enter 500.
Enter the minimum number of genes a spot needs to have to be kept in the data set. In this case, enter 100.
Click the “Mitochondrial genes (^MT-)” box to filter spots by mitochondrial gene content. Keep in mind that some ST platforms do not quantify mitochondrial genes.
Enter the maximum percentage of mitochondrial counts. Use 20% in this case.
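
To make the three spot-level criteria concrete, here is a base R sketch applied to a tiny made-up count matrix. It is an illustration of the general logic only, not spatialGE's internal code. The minimum-genes threshold is scaled down to 3 because the toy matrix has only four genes, and treating the thresholds as inclusive is an assumption.

```r
# Toy count matrix: genes in rows, spots in columns (values are made up)
counts = matrix(c(300,  10, 5,
                  250,  20, 0,
                   50, 400, 1,
                  100,  30, 0),
                nrow=4, byrow=TRUE,
                dimnames=list(c('IGKC', 'COL1A1', 'MT-CO1', 'MS4A1'),
                              c('spot1', 'spot2', 'spot3')))

total_counts = colSums(counts)                   # counts per spot
total_genes  = colSums(counts > 0)               # genes detected per spot
is_mt        = grepl('^MT-', rownames(counts))   # mitochondrial genes by name
pct_mt       = 100 * colSums(counts[is_mt, , drop=FALSE]) / total_counts

# Keep spots that pass all three criteria (thresholds assumed inclusive)
keep = total_counts >= 500 & total_genes >= 3 & pct_mt <= 20
filtered = counts[, keep, drop=FALSE]
colnames(filtered)
```

Only spot1 passes here: spot2 has too few counts and a high mitochondrial percentage, and spot3 has too few counts and too few detected genes.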

Homework: Performing gene-level filtering

Users can also remove genes with a low number of counts. This is advisable in most cases. However, since ST features high gene dropout (i.e., genes that the technology fails to quantify), imposing too stringent a filter might leave very little usable information in the data set.

To perform a gene count filter:

Click “Filter genes”.
Filter out genes with fewer than 100 counts.
Filter out genes expressed in fewer than 20 spots.
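
The two gene-level criteria can be illustrated the same way on a small made-up matrix. This is a sketch of the general logic, not spatialGE's implementation. The minimum number of spots is scaled down to 3 because the toy data has only four spots.

```r
# Toy count matrix: genes in rows, spots in columns (values are made up)
gene_mat = matrix(c(60, 50,  0,  0,
                     1,  0,  0,  0,
                    40, 30, 35, 20),
                  nrow=3, byrow=TRUE,
                  dimnames=list(c('geneA', 'geneB', 'geneC'),
                                paste0('spot', 1:4)))

gene_counts = rowSums(gene_mat)       # total counts per gene
gene_spots  = rowSums(gene_mat > 0)   # number of spots expressing each gene

# Keep genes with at least 100 counts that are expressed in at least 3 spots
keep_genes = gene_counts >= 100 & gene_spots >= 3
rownames(gene_mat)[keep_genes]
```

geneA has enough counts but is expressed in only two spots, so only geneC is retained.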

Advanced task: Filtering data using the command-line version of spatialGE

To achieve the same spot- and gene-level filtering results as those obtained with the web application, users can run the following command in the R console.

tls_stlist = filter_data(tls_stlist,
                         spot_minreads=500,
                         spot_mingenes=100,
                         spot_maxpct=0.2,
                         gene_minreads=100,
                         gene_minspots=20)

Once you have all the filter settings as you’d like, click the blue “APPLY FILTER” button.
Users can also download a “parameter file”, which contains the filtering settings used for reproducibility. To do this, locate the “Download parameter log” link below the “APPLY FILTER” button.

Visualize filtering results

Count distributions

Click “Violin plots” to visualize count distribution after filtering.
Currently, “total_counts” and “total_genes” per spot can be visualized.
When changing the variable to plot, click the blue “GENERATE PLOTS” button to update.

Advanced task: Create count distribution plots with the command-line version of spatialGE

spatialGE::distribution_plots(tls_stlist, plot_meta='total_counts', plot_type='violin')

Homework: Visualizing gene counts in spatial context

Quilt plot

The quilt plot tab within the QC and Data Transformation module allows visualization of the counts or detected genes per spot. This functionality can be useful for locating areas with low cellularity or necrosis.

Click Quilt plot to visualize the total number of genes or counts per spot and their spatial context.
Select total_counts.
Select one sample underneath the First sample dropdown menu.
Select a second sample to compare against underneath the Second sample drop-down menu.
Click the blue “GENERATE PLOTS” button to create the plot.

Normalize Data

Click the “Normalize data” tab.
Click “Use SCTransform” to apply Seurat’s normalization method.
Click the blue “NORMALIZE DATA” button to start normalization.

Homework: Assessing data normalization results at the gene level

The spatialGE web app allows users to assess the distribution of counts per spot or cell for specific genes. To do so, follow these steps:

Query a gene, for example MAP2K2. Keep in mind that the query is case-sensitive. Since these are human samples, use all upper-case letters.
Click “GENERATE PLOTS” to show the number of MAP2K2 counts per spot.

Advanced task: Data normalization using the command-line version of spatialGE

# Perform data normalization using SCTransform
# NOTE: To use log-transformation set `method='log'`
tls_stlist = transform_data(tls_stlist, method='sct')
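
For intuition about what a log-transformation does, the base R sketch below applies library-size scaling followed by a log transform to a made-up count matrix. It illustrates the general idea only and does not reproduce spatialGE's internal code. The scale factor of 10,000 is an assumption borrowed from common single-cell practice.

```r
# Toy count matrix: genes in rows, spots in columns (values are made up).
# matrix() fills column by column, so each line below is one spot.
raw = matrix(c(10, 0, 90,
               20, 5, 75),
             nrow=3,
             dimnames=list(c('g1', 'g2', 'g3'), c('spotA', 'spotB')))

lib_size = colSums(raw)                        # total counts per spot
scaled   = sweep(raw, 2, lib_size, '/') * 1e4  # counts per 10,000 (assumed scale factor)
log_norm = log1p(scaled)                       # log(1 + x) keeps zeros at zero

round(log_norm, 2)
```

After scaling, spots with different sequencing depths become comparable, and the log transform compresses the range of highly expressed genes.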

Visualization

Gene expression comparative visualization

Click the Visualization module on the left side menu.
You can search for your favorite gene in the Search and select genes menu. For this example, query and click IGKC.
Also query and click the MS4A1 gene.
Lastly, query and click COL1A1.
Click the blue “GENERATE PLOTS” button to create the plot.

Images can be exported in multiple formats (PNG/SVG/PDF).

Homework: Generation of gene expression surfaces

Gene expression surface (“kriging”)

Users can also generate an “expression surface” to visualize expression of a gene in the spatial context. This is a type of plot where expression values are inferred for the unquantified spaces between spots.

Click the “Expression surface” tab.
In the Search and select genes menu search and select IGKC.
Click the “ESTIMATE SURFACES” button to create the plot.
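
To illustrate what it means to infer expression between spots, the sketch below uses inverse-distance weighting (IDW) in base R. IDW is a much simpler technique than kriging and is swapped in here only to show the idea of estimating a value at an unmeasured location from nearby spots. The coordinates, expression values, and the function name idw_estimate are all made up.

```r
# Four spots on a unit square with made-up expression values
spots = data.frame(x=c(0, 1, 0, 1),
                   y=c(0, 0, 1, 1),
                   expr=c(1, 2, 3, 4))

# Inverse-distance-weighted estimate at location (x0, y0).
# Closer spots get larger weights. This is NOT kriging.
idw_estimate = function(x0, y0, spots, power=2) {
  d = sqrt((spots$x - x0)^2 + (spots$y - y0)^2)
  if (any(d == 0)) return(spots$expr[d == 0][1])  # location is exactly on a spot
  w = 1 / d^power
  sum(w * spots$expr) / sum(w)
}

idw_estimate(0.5, 0.5, spots)   # centre of the square: all spots equally weighted
idw_estimate(0.1, 0.1, spots)   # close to the first spot
```

At the centre of the square all four spots are equally distant, so the estimate is their plain average. Near the first spot, the estimate is pulled toward that spot's value.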

Advanced task: Create spatial gene expression plots using the command-line version of spatialGE

STplot(tls_stlist, genes=c('IGKC', 'MS4A1', 'COL1A1'))

Spatial Domain Detection

Click “Spatial domain detection” on the left side menu.
In the Number of domains slider, select 3 to 5. This is the range of cluster numbers that the method will attempt to identify in the samples.
For Number of most variable genes to use, choose 3000. The 3000 genes with the highest variation will be used to detect the domains.
Finally, click “RUN STCLUST” to find clusters.
Explore the results by clicking each K= tab.

Images can be exported in multiple formats (PNG/SVG/PDF).

Advanced task: Domain detection using the command-line version of spatialGE

tls_stlist = STclust(tls_stlist, ks=c(3:5))

# Plot the detected domains
STplot(tls_stlist, ks=3:5)
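
For intuition about what requesting 3 to 5 domains means, the base R sketch below runs a generic, spatially informed hierarchical clustering on simulated data and cuts the tree at k = 3, 4, and 5. It is not STclust's actual implementation. The spatial weight of 0.025, the Ward linkage, and the simulated coordinates and expression values are all assumptions made for illustration.

```r
set.seed(1)
n_spots = 30

# Simulated spot coordinates and expression of 5 genes (made-up data)
coords = cbind(x=runif(n_spots), y=runif(n_spots))
expr   = matrix(rnorm(n_spots * 5), nrow=n_spots)

# Pairwise distances between spots, each rescaled to the range [0, 1]
d_expr = as.matrix(dist(expr));   d_expr = d_expr / max(d_expr)
d_spat = as.matrix(dist(coords)); d_spat = d_spat / max(d_spat)

# Weighted combination: mostly expression, with a small spatial contribution
ws = 0.025   # assumed spatial weight
d_comb = as.dist((1 - ws) * d_expr + ws * d_spat)

# Hierarchical clustering, then one domain assignment per requested k
hc = hclust(d_comb, method='ward.D2')
domains = cutree(hc, k=3:5)   # matrix: one row per spot, one column per k

head(domains)
```

Each column of domains holds the cluster label of every spot for one value of k, which mirrors the separate K= tabs shown in the web app.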