Download the files for this activity by clicking
here: https://github.com/fhdsl/Moffitt_ITN_Workshop/archive/refs/heads/main.zip
Put this file on your desktop so it is easily
findable.
Double click the zip file (or right click and
choose “unzip” or “decompress”) to unzip the file.
Open the activity files you downloaded so we
can see what’s there.
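If you would rather script the download and unzip steps than click through them, the following is a minimal Python sketch. It assumes a working internet connection. The function names and the Desktop destination are illustrative choices, not part of the workshop materials.

```python
import urllib.request
import zipfile
from pathlib import Path

# URL taken from the download step above.
ZIP_URL = "https://github.com/fhdsl/Moffitt_ITN_Workshop/archive/refs/heads/main.zip"


def download_zip(url, zip_path):
    """Download the archive at `url` and save it to `zip_path`."""
    urllib.request.urlretrieve(url, str(zip_path))
    return Path(zip_path)


def unzip(zip_path, dest_dir):
    """Extract every entry of `zip_path` into `dest_dir` and return the entry names."""
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest_dir)
        return archive.namelist()
```

For example, `unzip(download_zip(ZIP_URL, Path.home() / "Desktop" / "main.zip"), Path.home() / "Desktop")` would place the unzipped folder on the desktop, assuming a `Desktop` folder exists in your home directory.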
Within the folder, navigate to activity-files and then spatial_transcriptomics_activity_files.
We should see a metadata file (meylan_etal_2022_tumor_grade.csv), a PDF of
the manuscript that describes these data, and a visium_samples folder that includes two samples.
Each sample’s folder contains several files resulting from the spaceranger
pipeline. However, we will use only the following
files:
The README describes these samples:
These samples result from tumor biopsies collected from a patient cohort with clear cell renal cell carcinoma (ccRCC). The researchers profiled the samples with 10X Visium to study the gene expression in TLS in a spatial context.
The metadata file (meylan_etal_2022_tumor_grade.csv)
contains two variables of interest. One is the tumor grade, and the
other is positivity for a tertiary lymphoid structure (TLS) as
ascertained using immunohistochemistry staining. The file looks like
this:
samplename | cohort | ID | pT | tls |
---|---|---|---|---|
sample_b_01 | IMM | b_1 | pT1 | pos |
sample_b_07 | IMM | b_7 | pT3 | neg |
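Because the sample names entered in spatialGE must match the samplename column exactly, it can help to check them before uploading. The sketch below is one way to do that in Python. It assumes the file is comma-separated with the header shown above; the function names are hypothetical, and the embedded text reproduces only the two rows displayed in the table.

```python
import csv
import io

# The two rows shown in the table above, written as CSV text.
# Assumption: the real file is comma-separated with this header row.
METADATA_CSV = """samplename,cohort,ID,pT,tls
sample_b_01,IMM,b_1,pT1,pos
sample_b_07,IMM,b_7,pT3,neg
"""


def read_sample_names(csv_text, column="samplename"):
    """Return the values of `column`, in file order, with surrounding spaces removed."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row[column].strip() for row in reader]


def unmatched_names(entered_names, metadata_names):
    """Return the entered names that have no exact (case-sensitive) match in the metadata."""
    known = set(metadata_names)
    return [name for name in entered_names if name not in known]


metadata_names = read_sample_names(METADATA_CSV)
# A capitalised "Sample_b_07" would NOT match "sample_b_07".
print(unmatched_names(["sample_b_01", "Sample_b_07"], metadata_names))
```

The comparison is deliberately strict (no case folding), since the matching is described as exact.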
Go to https://spatialge.moffitt.org/
Click on “Sign Up” in the upper right
corner.
Click the blue “New Project” button.
For “What spatial transcriptomics platform are you using for this project?”
choose Visium, which is the platform our example data come from.
Enter a sensible project name and description
of your own. It could be something related to the workshop, such as
“ITN-Moffitt workshop”.
Then click “Create”.
IMPORTANT: We will repeat the following steps for each sample to upload that sample’s set of files.
For Sample Name, put the ID indicated on the folder, e.g. sample_b_01. This is very
important, as sample IDs need to match exactly the sample IDs in the
metadata file (meylan_etal_2022_tumor_grade.csv).
Otherwise, no metadata is imported.
For the Gene expression box, upload the .h5 file,
e.g. GSM5924046_frozen_b_1_filtered_feature_bc_matrix.h5.
You can upload files by dragging and dropping them or by clicking the box to
navigate to them.
For the Coordinates box, upload the .csv file,
e.g. GSM5924046_frozen_b_1_tissue_positions_list.csv.
For the Tissue image box, upload the .png file,
e.g. GSM5924046_frozen_b_1_tissue_hires_image.png.
For the Scale factor box, upload the .json file,
e.g. GSM5924046_frozen_b_1_scalefactors_json.json. The
scaling factor file is output automatically by the 10X Space Ranger
pipeline, and contains information to
approximate the size of the tissue image and the expression plots.
Once the above steps are done, click the green Import Sample button.
Now return to the beginning of these steps to repeat the same steps for the other sample.
You can use this checklist to keep track as you upload and follow the steps for each sample.
sample_b_01 data entered
sample_b_07 data entered
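The per-sample upload steps pair each upload box with one file type. As a rough aid, the Python sketch below picks the file for each box from a list of file names, based on the file-name endings seen in the examples above. The function name and the rule of matching on endings are assumptions made for illustration; other data sets may name their files differently.

```python
# Upload box -> file-name ending, taken from the example file names above.
UPLOAD_BOXES = {
    "Gene expression": "filtered_feature_bc_matrix.h5",
    "Coordinates": "tissue_positions_list.csv",
    "Tissue image": "tissue_hires_image.png",
    "Scale factor": "scalefactors_json.json",
}

# The example file names given in the steps above.
EXAMPLE_FILES = [
    "GSM5924046_frozen_b_1_filtered_feature_bc_matrix.h5",
    "GSM5924046_frozen_b_1_tissue_positions_list.csv",
    "GSM5924046_frozen_b_1_tissue_hires_image.png",
    "GSM5924046_frozen_b_1_scalefactors_json.json",
]


def files_for_upload(file_names):
    """Map each upload box to the single file whose name has the expected ending.

    Raises ValueError if a box has no matching file or more than one.
    """
    chosen = {}
    for box, ending in UPLOAD_BOXES.items():
        matches = [name for name in file_names if name.endswith(ending)]
        if len(matches) != 1:
            raise ValueError(
                f"expected exactly one file ending in {ending!r} for {box!r}, "
                f"found {len(matches)}"
            )
        chosen[box] = matches[0]
    return chosen


for box, name in files_for_upload(EXAMPLE_FILES).items():
    print(f"{box}: {name}")
```

Raising an error when a file is missing or ambiguous makes it harder to upload an incomplete sample by accident.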
Now click Option 1: Upload metadata file.
Upload the meylan_etal_2022_tumor_grade.csv file. You can drag and
drop the file or click on the (+) button to navigate to it.
Remember: The sample IDs in the metadata should match exactly the sample names used during file import.
NOTE: Make sure to upload all the samples before
clicking the Import Data button. You will not be able to
edit the project (unless you start a new project completely) after you
click Import Data.
Make sure everything is as you intend and
then click Import Data.
This may take a little while. Note that you can ask to be notified by email instead of waiting on the page.
Each ST technology will require different filtering parameters. Compared to single-cell ST, spot-level ST (e.g., Visium) tends to yield more counts per spot. Even among spot-level ST projects, these parameters will need adjustment depending on the sequencing depth and cellularity (i.e., cells per unit area). For these reasons, the values used here should not be taken as a “golden rule”; rather, users are encouraged to try different parameters and see which filtering procedure removes the most “noise” without losing too much relevant information. spatialGE provides statistics and plots to help the user assess the effect of filtering.
Go to the “Filter data” tab.
Click “Filter spots/cells”.
Enter the minimum number of counts a spot needs
to have to be kept in the data set. In this case, enter 500.
Enter the minimum number of genes a spot needs
to have to be kept in the data set. In this case, enter 100.
Click the “Mitochondrial genes (^MT-)” box to
filter spots by mitochondrial gene content. Keep in mind that some ST
platforms do not quantify mitochondrial genes.
Enter the maximum percentage of mitochondrial
counts. Use 20% in this case.
Once you have all the filter settings as
you’d like, click the blue “APPLY FILTER” button.
Users can also download a “parameter file”,
which records the filtering settings used so that the analysis can be reproduced. To do
this, locate the “Download parameter log” link below the “APPLY FILTER”
button.
Click “Violin plots” to visualize count
distribution after filtering.
Currently, “total_counts” and “total_genes” per
spot can be visualized.
When changing the variable to plot, click the
blue “GENERATE PLOTS” button to update.
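To make the filtering thresholds used above concrete (at least 500 counts, at least 100 genes, at most 20% mitochondrial counts), here is a minimal Python sketch of the same three checks applied to a single spot. It is not spatialGE's implementation: the function name, the dictionary input, and the choice to treat the thresholds as inclusive are all assumptions made for illustration.

```python
import re

# Mitochondrial genes are recognised by the "^MT-" pattern named in the steps above.
MITO_PATTERN = re.compile(r"^MT-")


def spot_passes(gene_counts, min_counts=500, min_genes=100, max_mito_pct=20.0):
    """Decide whether one spot is kept.

    gene_counts: dict mapping gene name -> raw count for this spot.
    Thresholds are treated as inclusive (an assumption, not confirmed behaviour).
    """
    total = sum(gene_counts.values())
    detected = sum(1 for count in gene_counts.values() if count > 0)
    if total == 0 or total < min_counts or detected < min_genes:
        return False
    mito = sum(count for gene, count in gene_counts.items() if MITO_PATTERN.match(gene))
    return 100.0 * mito / total <= max_mito_pct


# Toy spot with made-up values: 120 genes with 5 counts each plus one mitochondrial gene.
toy_spot = {f"GENE{i}": 5 for i in range(120)}
toy_spot["MT-CO1"] = 400  # 400 of 1000 counts are mitochondrial (40%)
print(spot_passes(toy_spot))
```

Trying different threshold values on a handful of spots in this way can give a feel for how strict each setting is before applying it in the app.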
Click the “Normalize data” tab.
Click “Use SCTransform” to apply Seurat’s
normalization method.
Click the blue “NORMALIZE DATA” button to start
normalization.
Click the Visualization module on the left side menu.
You can search for your favorite gene in the Search and select genes menu.
For this example, query and click IGKC.
Also query and click the MS4A1 gene.
Lastly, query and click COL1A1.
Click the blue “GENERATE PLOTS” button to create the
plot.
Images can be exported in multiple formats (PNG/SVG/PDF).
Click “Spatial domain detection” on the
left side menu.
In the Number of domains slider, select 3 to 5.
This is the range of domain (cluster) counts that will be searched for in the samples.
For Number of most variable genes to use, choose 3000.
The 3000 genes with the highest variation will be used to detect the domains.
Finally, click “RUN STCLUST” to find
clusters.
Explore the results by clicking each K= tab.
Images can be exported in multiple formats (PNG/SVG/PDF).
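To illustrate what “most variable genes” means in the domain detection step, the Python sketch below ranks genes by their variance across spots and keeps the top N. This is a simplified illustration only: spatialGE may select variable genes differently (for example, on normalized data), the function name is hypothetical, and the expression values are made up.

```python
from statistics import pvariance


def top_variable_genes(expression, n_genes):
    """Return the names of the `n_genes` genes with the highest variance across spots.

    expression: dict mapping gene name -> list of expression values, one per spot.
    """
    ranked = sorted(expression, key=lambda gene: pvariance(expression[gene]), reverse=True)
    return ranked[:n_genes]


# Made-up toy values for the three genes plotted earlier, across four spots.
toy_expression = {
    "IGKC": [0, 10, 0, 10],   # varies strongly between spots
    "MS4A1": [1, 2, 1, 2],    # varies a little
    "COL1A1": [5, 5, 5, 5],   # constant, so it carries no information for clustering
}
print(top_variable_genes(toy_expression, 2))
```

Genes that barely change across the tissue contribute little to separating domains, which is why only the most variable ones are kept.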