Download the files for this activity by clicking
here: https://github.com/fhdsl/UMich_ITN_Workshop/archive/refs/heads/main.zip
Put this file on your desktop so it is easily
findable.
Double click the zip file (or right click and
choose “unzip” or “decompress”) to unzip the file.
Open up your activity files you downloaded so we
can see what’s there.
Navigate to activity-files and then spatial_transcriptomics_activity_files. Within the folder,
we should see a metadata file (lee_etal_2024_sample_metadata.csv), a PDF of the
manuscript that describes these data, and a visium_samples
folder that includes two samples.
Each sample’s folder contains several files resulting from the spaceranger
pipeline. However, we will use only the following
files:
The README describes these samples:
These samples result from tumor biopsies collected from a patient cohort with gastric cancer (GC). The researchers profiled the samples with 10X Visium to study the gene expression in malignant, stromal, and immune cells within the GC tumor microenvironment.
The metadata file (lee_etal_2024_sample_metadata.csv)
contains sample identifiers and classification into “intestinal” or
“diffuse” GC. The file looks like this:
sample_id | sample_id2 | type |
---|---|---|
sample_01675 | GC9 | Diffuse |
sample_00732 | GC3 | Intestinal |
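Because the sample names entered in spatialGE must match the metadata exactly, it can help to check the IDs before uploading. The snippet below is a minimal sketch in Python and is not part of the workshop materials. It assumes the file is comma-separated with the three columns shown above, embeds the two rows as text for illustration, and uses a hypothetical `folder_names` list in place of the folder names inside visium_samples.

```python
import csv
import io

# The two metadata rows shown above, embedded as CSV text for illustration.
# In practice you would open lee_etal_2024_sample_metadata.csv instead.
METADATA_CSV = """sample_id,sample_id2,type
sample_01675,GC9,Diffuse
sample_00732,GC3,Intestinal
"""


def read_sample_types(csv_text):
    """Return a dict mapping each sample_id to its GC type."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["sample_id"]: row["type"] for row in reader}


def unmatched_names(folder_names, sample_types):
    """Return the folder names that have no exact match in the metadata."""
    return [name for name in folder_names if name not in sample_types]


# Assumed folder names inside visium_samples (hypothetical listing).
folder_names = ["sample_00732", "sample_01675"]

sample_types = read_sample_types(METADATA_CSV)
print(sample_types)
print("Unmatched:", unmatched_names(folder_names, sample_types))
```

The comparison is case-sensitive on purpose, so a name such as `Sample_00732` would be reported as unmatched.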
Go to https://spatialge.moffitt.org/
Click on “Sign Up” in the upper right
corner.
Click the blue “New Project” button.
For “What spatial transcriptomics platform are you using for this project?”
choose Visium, which is the platform our example data come from.
Enter a sensible project name and description of your own. It could be something related to the workshop, such as
“ITN-UMichigan workshop”.
Then click “Create”.
IMPORTANT: For each sample we will repeat the following steps to upload each sample’s set of files.
For Sample Name put the ID indicated on the folder, e.g. sample_00732. This is very
important, as sample IDs need to match exactly the sample IDs in the
metadata file (lee_etal_2024_sample_metadata.csv).
Otherwise, no metadata is imported.
For the Gene expression box upload the .h5 file,
e.g. 21_00732_LI_SING_filtered_feature_bc_matrix.h5. You
can upload files by dragging and dropping them or by clicking on the box to
navigate to them.
For the Coordinates box upload the .csv file,
e.g. 21_00732_LI_SING_tissue_positions_list.csv.
For the Tissue image box upload the .png file,
e.g. 21_00732_LI_SING_tissue_hires_image.png.
For the Scale factor box upload the .json file,
e.g. 21_00732_LI_SING_scalefactors_json.json. The scale
factor file is output automatically by the 10X Space Ranger
pipeline and contains information used to approximate the size of the tissue
image and the expression plots.
Once the above steps are done, click the green
Import Sample button.
If you’ve only entered the first sample, click the blue “ADD NEW SAMPLE” button on the top left and then return to the beginning of these steps to repeat the same steps for the other sample.
Do not click the blue Import Data button on the bottom of the page until you’ve uploaded both samples and the associated metadata. Otherwise, you’ll have to create a new project to upload additional samples.
You can use this checklist to keep track as you upload and follow the steps for each sample.
sample_00732
data entered
sample_01675
data entered
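The four upload boxes each expect one file type, so a small helper can pick the right files out of a sample folder. This is a sketch only: the file-name endings are taken from the sample_00732 examples above and are assumed to be the same for the other sample, and the folder listing is hypothetical.

```python
# File-name endings for the four upload boxes, taken from the examples above.
UPLOAD_BOXES = {
    "Gene expression": "_filtered_feature_bc_matrix.h5",
    "Coordinates": "_tissue_positions_list.csv",
    "Tissue image": "_tissue_hires_image.png",
    "Scale factor": "_scalefactors_json.json",
}


def match_upload_files(file_names):
    """Map each upload box to the first file with the expected ending, or None."""
    matches = {}
    for box, suffix in UPLOAD_BOXES.items():
        found = [name for name in file_names if name.endswith(suffix)]
        matches[box] = found[0] if found else None
    return matches


# Hypothetical folder listing for sample_00732; the last entry is an assumed
# extra file that should be ignored.
files_00732 = [
    "21_00732_LI_SING_filtered_feature_bc_matrix.h5",
    "21_00732_LI_SING_tissue_positions_list.csv",
    "21_00732_LI_SING_tissue_hires_image.png",
    "21_00732_LI_SING_scalefactors_json.json",
    "some_other_spaceranger_output.txt",
]

for box, name in match_upload_files(files_00732).items():
    print(f"{box}: {name}")
```

A box mapped to `None` signals that the expected file is missing from the folder.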
Now click Option 1: Upload metadata file.
Upload the lee_etal_2024_sample_metadata.csv file. You can drag and
drop the file or click on the (+) button to navigate to it.
Remember: The sample IDs in the metadata should match exactly the sample names used during file import.
NOTE: Make sure to upload all the samples before
clicking the Import Data button. You will not be able to
edit the project (unless you start a new project completely) after you
click Import Data.
Make sure everything is as you intend and
then click Import Data.
This may take a little bit of time. Note you can have it send you an email instead of waiting on the page.
Each ST technology will require different filtering parameters. Compared to single-cell ST, spot-level ST (e.g., Visium) tends to yield more counts per spot. Even among spot-level ST projects, these parameters will need adjustment depending on the sequencing depth and cellularity (i.e., cells per unit area). For these reasons, the values used here should not be taken as a “golden rule”. Instead, users are encouraged to try different parameters and see which filtering procedure reduces the most “noise” without losing too much relevant information. spatialGE provides statistics and plots to help the user assess the effect of filtering.
Go to the “Filter data” tab.
Click “Filter spots/cells”.
Enter the minimum number of counts a spot needs
to have to be kept in the data set. In this case, enter 500.
Enter the minimum number of genes a spot needs
to have to be kept in the data set. In this case, enter 100.
Click the “Mitochondrial genes (^MT-)” box to
filter spots by mitochondrial gene content. Keep in mind that some ST
platforms do not quantify mitochondrial genes.
Enter the maximum percentage of mitochondrial
counts. Use 20% in this case.
Once you have all the filter settings as
you’d like, click the blue “APPLY FILTER” button.
Users can also download a “parameter file”,
which contains the filtering settings used for reproducibility. To do
this, locate the “Download parameter log” link below the “APPLY FILTER”
button.
Click “Violin plots” to visualize count
distribution after filtering.
Currently, “total_counts” and “total_genes” per
spot can be visualized.
When changing the variable to plot, click the
blue “GENERATE PLOTS” button to update.
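To make the three filters concrete, here is a minimal Python sketch of the per-spot logic using the thresholds from this activity (500 counts, 100 genes, 20% mitochondrial counts). It is an illustration, not spatialGE's implementation: the toy spots and gene names are made up, and whether spatialGE treats the thresholds as inclusive is an assumption.

```python
def spot_stats(counts):
    """Return (total counts, genes detected, percent mitochondrial) for one spot."""
    total = sum(counts.values())
    genes = sum(1 for c in counts.values() if c > 0)
    # Mirrors the "^MT-" pattern: gene names that start with "MT-".
    mito = sum(c for gene, c in counts.items() if gene.startswith("MT-"))
    pct_mito = 100.0 * mito / total if total else 0.0
    return total, genes, pct_mito


def keep_spot(counts, min_counts=500, min_genes=100, max_pct_mito=20.0):
    """Apply the three filters; inclusive thresholds are an assumption."""
    total, genes, pct_mito = spot_stats(counts)
    return total >= min_counts and genes >= min_genes and pct_mito <= max_pct_mito


# Toy spots with hypothetical gene names (gene -> raw count).
good_spot = {f"GENE{i}": 6 for i in range(100)}        # 600 counts, 100 genes
low_count_spot = {f"GENE{i}": 2 for i in range(100)}   # 200 counts, 100 genes
high_mito_spot = dict(good_spot, **{"MT-CO1": 400})    # 40% mitochondrial

for name, spot in [("good", good_spot),
                   ("low counts", low_count_spot),
                   ("high mito", high_mito_spot)]:
    print(name, spot_stats(spot), "keep" if keep_spot(spot) else "drop")
```

Changing the default arguments of `keep_spot` is a quick way to see how many toy spots survive under stricter or looser settings.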
Click the “Normalize data” tab.
Click “Use SCTransform” to apply Seurat’s
normalization method.
Click the blue “NORMALIZE DATA” button to start
normalization.
Click the Visualization module on the left side menu.
You can search for your favorite gene in the
Search and select genes menu. For this example, query and
click CCL19.
Also query and click the FN1 gene.
Lastly, query and click COL1A1.
Click the blue “GENERATE PLOTS” button to create the
plot.
Images can be exported in multiple formats (PNG/SVG/PDF).
Click the “Spatial domain detection” module on the
left side menu.
Now in the Number of domains slider select 3 to 5, so that 3 to 5 domains will be detected in the samples. This is how many
clusters the method will attempt to identify.
For Number of most variable genes to use choose 3000. The 3000 genes with the highest
variation will be used to detect the domains.
Finally, click “RUN STCLUST” to find
clusters.
Explore the results by clicking each
K=
tab.
Images can be exported in multiple formats (PNG/SVG/PDF).
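The “most variable genes” setting restricts the analysis to the genes whose expression changes the most across spots. The Python sketch below ranks genes by plain variance to illustrate the idea. This is an assumption made for illustration: spatialGE may use a different variability measure, and the gene names and values are hypothetical.

```python
from statistics import pvariance


def top_variable_genes(expression, n_genes):
    """Return the n_genes gene names with the highest variance across spots."""
    variances = {gene: pvariance(values) for gene, values in expression.items()}
    ranked = sorted(variances, key=variances.get, reverse=True)
    return ranked[:n_genes]


# Hypothetical normalized expression: gene -> value in each of four spots.
expression = {
    "GENE_A": [0, 10, 0, 10],  # strongly variable across spots
    "GENE_B": [5, 5, 5, 5],    # constant, carries no spatial signal
    "GENE_C": [4, 6, 4, 6],    # mildly variable
}

print(top_variable_genes(expression, 2))
```

In the activity the same idea is applied with 3000 genes, which drops genes like the constant one in this toy example before clustering.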
We encourage users to run the following, more advanced analyses. They were not included in the workshop because of the time necessary to complete them. However, these are some of the advanced analysis types that can help in hypothesis generation using spatial transcriptomics data.
In spatialGE, inferring cell types on Visium data sets is achieved with STdeconvolve (Miller et al. 2022). STdeconvolve is performed in two stages:
The method attempts to fit a series of models composed of “latent topics”. Each latent topic represents a cell type, a cell state, or even a functional niche.
The latent topics are assigned a biological identity based on a list of reference genes. The assignments are obtained via gene set enrichment analysis (GSEA).
Click the “Phenotyping” module on the left
side menu.
To begin stage 1, select 7 to 10 topics in the
Fit LDA models with this many topics slider. This is the
number of topics within each model: one model with 7 topics, another
with 8, another with 9, and one with 10 topics.
For Use this many variable genes choose 5000. The 5000 genes with the highest variation will be used to fit the
LDA models.
Finally, click “RUN LDA MODELS”.
To begin stage 2, select the “CellMarker
signatures (v2.0, Human-Cancer)” reference data set from the “Gene
signatures” drop-down.
Then, click “ASSIGN IDENTITIES” to begin
GSEA.
The test for spatial gene set enrichment in spatialGE uses a permutation-based approach to find if the distances among spots with high expression of a gene set are shorter than expected by chance. The spatial gene set enrichment test produces a p-value for each gene set. A low p-value (typically < 0.05) indicates that there is evidence for the gene set to be spatially distributed as hot-spot(s). In spatialGE, a collection of gene sets including Hallmark and KEGG is available. Users can also upload their own custom gene sets.
To start a spatial gene set enrichment test, follow these steps:
Click the “Spatial gene set enrichment”
module on the left side menu.
In the “Select/upload a gene set database”
dropdown, select the “HALLMARK - human” option.
Write “1000” in the “Permutations”
textbox.
Then, click “RUN STENRICH”.
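The permutation idea behind the test can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the STenrich implementation: the spots sit on a hypothetical 10 x 10 grid, the "high expression" spots are chosen by hand rather than from a gene set score, and the function names are made up for this example.

```python
import math
import random
from itertools import combinations


def sum_pairwise_distances(points):
    """Sum of Euclidean distances over all pairs of points."""
    return sum(math.dist(a, b) for a, b in combinations(points, 2))


def spatial_enrichment_pvalue(coords, high_idx, n_perm=1000, seed=0):
    """Permutation p-value: how often are random spots as close as the high spots?"""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = sum_pairwise_distances([coords[i] for i in high_idx])
    n_as_close = 0
    for _ in range(n_perm):
        sample = rng.sample(coords, len(high_idx))
        if sum_pairwise_distances(sample) <= observed:
            n_as_close += 1
    # Add one to numerator and denominator so the p-value is never exactly zero.
    return (n_as_close + 1) / (n_perm + 1)


# Hypothetical tissue: a 10 x 10 grid of spots.
coords = [(x, y) for x in range(10) for y in range(10)]

# Case 1: high-expression spots form a 3 x 3 hot-spot in one corner.
clustered = [i for i, (x, y) in enumerate(coords) if x < 3 and y < 3]
# Case 2: high-expression spots are the four corners of the tissue.
spread = [0, 9, 90, 99]

print("clustered p-value:", spatial_enrichment_pvalue(coords, clustered))
print("spread p-value:", spatial_enrichment_pvalue(coords, spread))
```

The clustered case yields a small p-value because random spot sets are almost never as compact as the hot-spot, while the spread case yields a large one. More permutations give a finer-grained p-value at the cost of run time, which is the trade-off behind the “Permutations” setting.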
In spatialGE, users can test if a gene shows higher (or lower) expression closer to a specific cluster or spatial domain (“spatial gradient”). The method is useful to investigate spatial patterns at the interface of spatial domains.
Using the steps previously outlined to detect spatial domains, it seems that spatial domain 3 in the plot titled “stclust_spw0_k3” is an immune-infiltrated area. With the following steps, users can test for genes with spatial expression gradients with respect to this immune region:
Click the “Spatial gradients” module on the
left side menu.
Using the table at the top, select all
samples.
Write “1000” in the “Number of most variable
genes to use” textbox.
From the “Annotation to test” drop-down, select
“STclust; Domains (k): 04; No spatial weight”.
From the “reference cluster” drop-down, select
“3”.
Then, click “RUN STGRADIENT”.
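One way to picture a spatial gradient is to correlate each spot's expression with its distance to the reference domain. The Python sketch below does this with a rank (Spearman) correlation on toy data. It is an illustration under stated assumptions rather than the STgradient implementation: the coordinates, domain labels, and expression values are hypothetical, and using the minimum distance to the reference domain is one choice among several.

```python
import math


def average_ranks(values):
    """Rank values from 1 to n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation (assumes neither input is constant)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def gradient_correlation(coords, domains, expression, reference):
    """Spearman correlation between expression and distance to the reference domain."""
    ref_points = [c for c, d in zip(coords, domains) if d == reference]
    dists, expr = [], []
    for c, d, e in zip(coords, domains, expression):
        if d != reference:  # only spots outside the reference domain are tested
            dists.append(min(math.dist(c, r) for r in ref_points))
            expr.append(e)
    return pearson(average_ranks(dists), average_ranks(expr))


# Toy data: five spots on a line, the first one belongs to reference domain "3".
coords = [(0, 0), (1, 0), (2, 0), (3, 0), (4, 0)]
domains = ["3", "1", "1", "1", "1"]
expression = [9.0, 8.0, 6.0, 3.0, 1.0]  # hypothetical gene, fading with distance

print(gradient_correlation(coords, domains, expression, reference="3"))
```

A coefficient close to -1, as in this toy case, means expression is higher near the reference domain; a value close to +1 means expression rises with distance from it.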
There is an additional bonus activity on Posit Cloud if you want to dive deeper
using the spatialGE R package. We will keep the Posit
Cloud account available to you for one week after the workshop. Please
download any of your files you’d like to keep before this week is
up!
See instructions on accessing the resources on Posit Cloud in the “Activity: Set Up Posit Cloud” section of the Clinical Workshop Activity page.
Additionally, the bonus
activity is available on GitHub and you can work through it on your own
machine. You can download the Rmd file from there using the “Download
raw file” button in the top right. If you encounter any issues installing
spatialGE (specifically its dependencies), we recommend using
BiocManager to install these dependencies (following the instructions
in sections 2.2 and 2.3 of this BiocManager resource).