Getting started with Spatial Transcriptomics

Get workshop files

Download the files for this activity clicking here: https://github.com/fhdsl/UMich_ITN_Workshop/archive/refs/heads/main.zip
Put this file on your desktop so it is easily findable.
Double click the zip file (or right click and choose “unzip” or “decompress” to unzip the file.
Open up your activity files you downloaded so we can see what’s there.

Get familiar with the data set

Navigate to activity-files and then spatial_transcriptomics_activity_files. Within the folder, we should see a metadata file (lee_etal_2024_sample_metadata.csv), a PDF of the manuscript that describes these data, and a visium_samples folder that includes two samples.

Each sample’s folder contains a several files resulting from the spaceranger pipeline. However, we will use the following files:

sample_00732
- 21_00732_LI_SING_filtered_feature_bc_matrix.h5 (the gene expression data)
- spatial
  - 21_00732_LI_SING_tissue_positions_list.csv (spot x,y spatial positions)
  - 21_00732_LI_SING_tissue_hires_image.png (The H&E-stained tissue image)
  - 21_00732_LI_SING_scalefactors_json (to scale the H&E image and plots)
sample_01675
- …

The README describes these samples:

These samples result from tumor biopsies collected from a patient cohort with gastric cancer (GC). The researchers profiled the samples with 10X Visium to study the gene expression in malignant, stromal, and immune cells within the GC tumor microenvironment.

The metadata file (lee_etal_2024_sample_metadata.csv) contains sample identifiers and classification into “intestinal” or “diffuse” GC. The file looks like this:

sample_id	sample_id2	type
sample_01675	GC9	Diffuse
sample_00732	GC3	Intestinal

Create an account with spatialGE

Go to https://spatialge.moffitt.org/
Click on “Sign Up” in the upper right corner.

Starting a new project

Click the blue “New Project” button.
For What spatial transcriptomics platform are you using for this project? choose Visium – this is the type of data our example data are.
Make your own project name and description that’s sensible. Could be something related to the workshop such as “ITN-UMichigan workshop”.
Then click “Create”.

Uploading the dataset

IMPORTANT: For each sample we will repeat the following steps to upload each sample’s set of files.

Uploading one sample’s data

For Sample Name put the ID indicating on the folder, e.g. sample_00732. This is very important, as sample IDs need to match exactly the sample IDs in the metadata file (lee_etal_2024_sample_metadata.csv). Otherwise, no metadata is imported.
For the Gene expression box upload the .h5 file e.g. 21_00732_LI_SING_filtered_feature_bc_matrix.h5. You can upload files by dragging and dropping or by clicking on them to navigate.
For the Coordinates box upload the .csv file e.g. 21_00732_LI_SING_tissue_positions_list.csv.
For the Tissue image box upload the .png file e.g. 21_00732_LI_SING_tissue_hires_image.png.
For the Scale factor box upload the .json file e.g. 21_00732_LI_SING_scalefactors_json.json. The scaling factor file is output automatically by the 10X Space Ranger pipeline, and contains information to approximate the size of the tissue image and the expression plots.

Once the above steps are done click the green Import Sample.

If you’ve only entered the first sample, click the blue “ADD NEW SAMPLE” button on the top left and then return to the beginning of these steps to repeat the same steps for the other sample.

Do not click the blue Import Data button on the bottom of the page until you’ve uploaded both samples and the associated metadata. Otherwise, you’ll have to create a new project to upload additional samples.

You can use this checklist to keep track as you upload and follow the steps for each sample.

sample_00732 data entered
sample_01675 data entered

Adding metadata

Now click Option 1: Upload metadata file.
Upload the lee_etal_2024_sample_metadata.csv file. You can drag and drop the file or by click on the (+) button to navigate.

Homework: Manually adding sample metadata

Metadata can also be added manually. To do so, follow these steps:

Click Option 2: Add metadata manually.
Click Add new metadata column. Add a column named sample_id2.
Click Add new metadata column again. Add a column named type.

You can refer to the lee_etal_2024_sample_metadata.csv file’s contents to add these data for each sample:

sample_id	sample_id2	type
sample_01675	GC9	Diffuse
sample_00732	GC3	Intestinal

Add sample_01675 corresponding sample_id2 and type information.
Add sample_00732 corresponding sample_id2 and type information.

Remember: The sample IDs in the metadata should match exactly the sample names used during file import.

After you’ve entered the data and metadata:

NOTE: Make sure to upload all the samples before clicking the Import Data button. You will not be able to edit the project (unless you start a new project completely) after you click Import Data.

Make sure everything is as you intend and then click Import Data.

This may take a little bit of time. Note you can have it send you an email instead of waiting on the page.

Using the command-line: Loading data into spatialGE

Most of analyses in this step-by-step tutorial can also be performed with the spatialGE R package. To do so, start by installing the spatialGE package. You can run these commands on Posit Cloud with the link we sent you to avoid running installations.

# Install devtools if not already installed:
#install.packages('devtools')

devtools::install_github('FridleyLab/spatialGE')

Then load the spatialGE package:

library('spatialGE')

Now, specify the directory paths to the folders containing the data and the metadata file:

counts_fp = c('./activity_files/spatial_transcriptomics_activity_files/visium_samples/sample_00732/',
              './activity_files/spatial_transcriptomics_activity_files/visium_samples/sample_01675/')

meta_file = './activity_files/spatial_transcriptomics_activity_files/lee_etal_2024_sample_metadata.csv'

Finally, create an STlist. The STlist is the R object in spatialGE that holds the data and results:

gc_stlist = STlist(rnacounts=counts_fp, samples=meta_file)

Filtering your data

Each ST technology will require different filtering parameters. Compared to single-cell ST, spot-level ST (e.g., Visium), tends to yield more counts per spot. Even among spot-level ST projects, these parameters will need adjustment considering the sequencing depth and cellularity (i.e., cells per area unit). For these reasons, the values used here should not be taken as “golden rule”, but rather, users are encouraged to try different parameters and see what filtering procedure produces the most “noise” reduction without loosing too much relevant information. spatialGE provides statistics and plots to help the user assess the effect of filtering.

Go to the “Filter data” tab.
Click “Filter spots/cells”.
Enter the minimum number of counts a spot needs to have to be kept in the data set. In this case, 500 will be input.
Enter the minimum number of genes a spot needs to have to be kept in the data set. In this case, 100 will be input.
Click the “Mitochondrial genes (^MT-)” box to filter spots by mitochondrial gene content. Keep in mind that some ST platforms do not quantify mitochondrial genes.
Enter the maximum percentage of mitochondrial counts. Use 20% in this case.

Homework: Performing gene-level filtering

Users can also remove genes with low number of counts. This is advisable in most cases. However, since ST features a high gene dropout (i.e., genes that the technology fails to quantify), imposing a filter too stringent might lead to keep very little usable information in the data set.

To perform a gene count filter:

Now, to filter out genes, click “Filter genes”.
Filter out genes with less than 100 counts.
Filter out genes expressed in less than 20 spots.

Using the command-line: Filtering data with spatialGE

To achieve the same spot- and gene level filtering results as those obtained with the web application, users can run the following command in the R console.

gc_stlist = filter_data(gc_stlist,
                        spot_minreads=500,
                        spot_mingenes=100,
                        spot_maxpct=0.2,
                        gene_minreads=100,
                        gene_minspots=20)

Once you have all the filter settings as you’d like click the blue “APPLY FILTER” button.
Users can also download a “parameter file”, which contains the filtering settings used for reproducibility. To do this, locate the “Download parameter log” link below the “APPLY FILTER” button.

Visualize filtering results

Count distributions

Click “Violin plots” to visualize count distribution after filtering.
Currently, “total_counts” and “total_genes” per spot can be visualized.
When changing the variable to plot, click the blue “GENERATE PLOTS” button to update.

Using the command-line: Create count distribution plots with the spatialGE

spatialGE::distribution_plots(gc_stlist, plot_meta='total_counts', plot_type='violin')

Homework: Visualizing gene counts in spatial context

Quilt plot

The quilt plot tab within the QC and Data Transformation module allows visualization of the counts or detected genes per spot. This functionality might be useful to assess the localization of areas with low cellularity or necrotic.

Click Quilt plot to visualize the total number of genes or counts per spot and their spatial context.
Select total_counts.
Select one sample underneath the First sample dropdown menu.
And select a second sample to compare to underneath the Second sample drop-down menu.
Click blue “GENERATE PLOTS” button to create the plot.

Normalize Data

Click the “Normalize data” tab.
Click “Use SCTransform” to apply Seurat’s normalization method.
Click the blue “NORMALIZE DATA” to start normalization.

Homework: Assessing data normalization results at the gene level

The spatialGE web app allows users to assess the distribution of counts per spot or cell for specific genes. To do so, follow these steps:

The distribution of counts per spot for a given gene can also be plotted. For example, MAP2K2. When querying a gene, keep in mind that the query is case-sensitive. Since these are human samples, use all-upper case letters.
Click “GENERATE PLOTS” to show the number of MAP2K2 counts per spot.

Using the command-line: Data normalization using sptialGE

# Perform data normalization using SCTransform
# NOTE: To use log-transformation set `method='log'`
gc_stlist = transform_data(gc_stlist, method='sct')

Visualization

Gene expression comparative visualization

Click the Visualization module on the left side menu.
You can search for your favorite gene in the Search and select genes menu. For this example query and click CCL19.
Also query and click FN1 gene.
Lastly, also query and click COL1A1.
Click blue “GENERATE PLOTS” button to create the plot.

Images can be exported in multiple formats (PNG/SVG/PDF).

Homework: Generation of gene expression surfaces

Gene expression surface (“kriging”)

Users can also generate a “expression surface” to visualize expression of a gene in the spatial context. This is a type of plot where expression values are inferred for the spaces not quantified between spots.

Click the “Expression surface” tab.
In the Search and select genes menu search and select COL1A1.
Click “ESTIMATE SURFACES” button to create the plot.

Using the command-line: Create spatial gene expression plots using spatialGE

STplot(gc_stlist, genes=c('CCL19', 'FN1', 'COL1A1'))

Spatial Domain Detection

Click the “Spatial domain detection” on the left side menu.
Now in the Number of domains slider put 3 to 5 domains will be detected in the samples. This is how many clusters will attempt to be identified.
For Number of most variable genes to use choose 3000 with high variation will be used to detect the domains.
Finally, click “RUN STCLUST” to find clusters.
Explore the results by clicking each K= tab.

Images can be exported in multiple formats (PNG/SVG/PDF).

Using the command-line: Domain detection using spatialGE

gc_stlist = STclust(gc_stlist, ks=c(3:5))

# Plot the detected domains
STplot(gc_stlist, ks=3:5)

Homework: Other analysis types to explore

We encourage the users to run these more advanced analyses. These were not included due to time necessary to complete them. However, these are some of the advanced analysis types that can help in hypothesis generation using spatial transcriptomics data.

Phenotyping

In spatialGE, inferring cell types on Visium data sets is achieved with STdeconvolve (Miller et al.2022). The STdeconvolve is performed in two stages:

The method attempts to fit a series of models composed of “latent topics”. Each latent topic represents a cell type, a cell state, or even a functional niche.
The latent topics are assigned a biological identity based on a list of reference genes. The assignments are obtained via gene set enrichment analysis (GSEA).

Click the “Phenotyping” module on the left side menu.
To begin stage 1, select 7 to 10 topics in the Fit LDA models with this many topics slider. This is the number of topics within each model: One model with 7 topics, another with 8, another with 9, and one with 10 topics.
For Use this many variable genes choose 5000 with high variation will be used to detect the domains.
Finally, click “RUN LDA MODELS”.

To begin stage 2, select the “CellMarker signatures (v2.0, Human-Cancer)” reference data set from the “Gene signatures” drop-down.
Then, click “ASSIGN IDENTITIES” to begin GSEA.

Spatial gene set enrichment

The test for spatial gene set enrichment in spatialGE uses a permutation-based approach to find if the distances among spots with high expression of a gene set are shorter than expected by chance. The spatial gene set enrichment test produces a p-value for each gene set, with a low p-value (typically < 0.05), indicating that there is evidence for the gene set to be spatially distributed as hot-spot(s). In spatialGE, a collection of gene sets including Hallmark and KEGG are available. Nonetheless, users can upload their customs data sets as well.

To start a spatial gene set enrichment test, follow these steps:

Click the “Spatial gene set enrichment” module on the left side menu.
In the “Select/upload a gene set database” dropdown, select the “HALLMARK - human” option.
Write “1000” in the “Permutations” textbox.
Then, click “RUN STENRICH”.

Spatial gradient detection

In spatialGE, users can test if a gene shows higher (or lower) expression closer to a specific cluster or spatial domain (“spatial gradient”). The method is useful to investigate spatial patterns at the interface of spatial domains.

Using the steps previously outlined to detect spatial domains, it seems spatial domain 3 in the plot with title “stclust_spw0_k3”, is an immune-infiltrated area. With the following steps, users can test for genes with spatial expression gradients with respect to this immune region:

Click the “Spatial gradients” module on the left side menu.
Using the table at the top, select all samples.
Write “1000” in the “Number of most variable genes to use” textbox.
From the “Annotation to test” drop-down, select ” STclust; Domains (k): 04; No spatial weight”.
From the “reference cluster” drop-down, select “3”.
Then, click “RUN STGRADIENT”.

Using the command-line: Domain detection using spatialGE

stgradient_res = STgradient(gc_stlist, topgenes=1000,
                            samples='sample_00732',
                            annot='stclust_spw0.025_k4',
                            ref=4)

stgradient_res[['sample_00732']]

Additional Practice with spatialGE

There is an additional bonus activity on Posit Cloud if you want to dive deeper with using the spatialGE R package. We will keep the Posit Cloud account available to you for one week after the workshop. Please download any of your files you’d like to keep before this week is up!

See instructions on accessing the resources on Posit Cloud in the “Activity: Set Up Posit Cloud” section of the Clinical Workshop Activity page.

Additionally, the bonus activity is available on GitHub and you can work through it on your machine. You can download the Rmd file from there using the “Download raw file” () button in the top right. If you encounter any issues installing spatialGE (specifically its dependencies), we recommend using BiocManager to try to install these dependencies (following the instructions of sections 2.2 and 2.3 at this BiocManager resource).