Download the files for this activity by clicking
here: https://github.com/fhdsl/ITN_Workshops_2024/archive/refs/heads/main.zip
Put this file on your desktop so it is easily
findable.
Double click the zip file (or right click and
choose “unzip” or “decompress”) to unzip the file.
Open the activity files you downloaded so we
can see what’s here.
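For those who prefer to script the download, the steps above can be sketched in Python. This is a minimal sketch, not part of the workshop materials: the function names `download_zip` and `unzip_and_list` are hypothetical, and the Desktop location in the usage comment is only an assumption that mirrors the instructions above.

```python
import urllib.request
import zipfile
from pathlib import Path

# URL copied from the download step above.
ZIP_URL = "https://github.com/fhdsl/ITN_Workshops_2024/archive/refs/heads/main.zip"


def download_zip(url: str, zip_path: Path) -> Path:
    """Download the archive at `url` to `zip_path` and return that path."""
    zip_path.parent.mkdir(parents=True, exist_ok=True)
    urllib.request.urlretrieve(url, zip_path)
    return zip_path


def unzip_and_list(zip_path: Path, dest_dir: Path) -> list[str]:
    """Extract the archive into `dest_dir` and return the extracted file paths.

    Paths are returned relative to `dest_dir`, sorted, with forward slashes.
    """
    dest_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest_dir)
    return sorted(
        p.relative_to(dest_dir).as_posix()
        for p in dest_dir.rglob("*")
        if p.is_file()
    )


# Example usage (assumes a Desktop folder exists in your home directory):
# desktop = Path.home() / "Desktop"
# zip_file = download_zip(ZIP_URL, desktop / "ITN_Workshops_2024.zip")
# for name in unzip_and_list(zip_file, desktop / "ITN_Workshops_2024"):
#     print(name)
```

Printing the file list gives the same overview as opening the folder by hand.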
We should see two samples, a metadata file, and a PDF of the manuscript that describes these data.
Each sample’s folder has an h5 file and a spatial folder.
The README describes these samples:
The original data set results from triple-negative breast cancer (TNBC) tumor biopsies collected from a diverse patient cohort. Patients underwent different therapeutic regimes. The tumor biopsies were profiled with 10X Visium. The data set presented here is a subset of the original cohort.
Additionally, the metadata describe which samples came from which patients and which treatments they received. The metadata look like this:
samplename | patient | therapy |
---|---|---|
sample_093d | patient_12 | none |
sample_396c | patient_20 | adriamycin |
While two samples are provided, an additional two samples are available in the course’s material, which could be used with this activity walkthrough after the workshop.
Go to https://spatialge.moffitt.org/
Click on “Sign Up” in the upper right
corner.
Click the blue “New Project” button.
For “What spatial transcriptomics platform are you using for this project?”
choose Visium. This is the type of data our example data are.
Enter a sensible project name and description of your own.
It could be something related to the workshop, such as “ITN workshop”.
Then click “Create”.
For each sample we will repeat the following steps to upload each sample’s set of files.
For “Sample Name”, enter the ID indicated on the folder, e.g. sample_093d.
This is very important, as sample IDs need to match exactly the sample IDs in the
metadata file (example_clinical.csv). Otherwise, no metadata is imported.
For the “Gene expression” box, upload the .h5 file,
e.g. GSM6433590_093D_filtered_feature_bc_matrix.h5.
You can upload files by dragging and dropping them, or by clicking the box to
navigate to them.
For the “Coordinates” box, upload the .csv file,
e.g. GSM6433590_093D_tissue_positions_list.csv.
For the “Tissue image” box, upload the .png file,
e.g. GSM6433590_093D_tissue_hires_image.png.
For the “Scale factor” box, upload the .json file,
e.g. GSM6433590_093D_scalefactors_json.json.
The scaling factor file is output automatically by the 10X Space Ranger
pipeline, and contains information to approximate the size of the tissue
image and the expression plots.
Once the above steps are done, click the green “Import Sample” button.
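To make the role of the scale factor file more concrete, here is a minimal sketch of how it can be read and used. It assumes the file follows the usual Space Ranger layout, in which a `tissue_hires_scalef` key holds the factor that converts full-resolution pixel coordinates into coordinates on the hires tissue image. The helper names are hypothetical, and spatialGE's internal handling may differ.

```python
import json
from pathlib import Path


def load_scale_factors(json_path: Path) -> dict:
    """Read the scale factor JSON file into a dictionary."""
    with open(json_path, encoding="utf-8") as handle:
        return json.load(handle)


def to_hires_pixels(x_fullres: float, y_fullres: float, scale_factors: dict) -> tuple[float, float]:
    """Convert full-resolution pixel coordinates to hires-image coordinates.

    Assumes the dictionary contains the Space Ranger key "tissue_hires_scalef".
    """
    scale = scale_factors["tissue_hires_scalef"]
    return x_fullres * scale, y_fullres * scale
```

Multiplying spot positions by this factor is what lets spots be drawn at the right place on the downsized tissue image.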
You can use this checklist to keep track as you upload and follow the steps for each sample.
sample_093d: data entered
sample_396c: data entered
Now return to the beginning of these steps to repeat the same steps for the other sample.
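Before uploading, it can help to confirm that each sample folder contains all four files. The sketch below is one way to do that. The file name patterns are inferred from the example file names in the steps above, and the helper names are hypothetical.

```python
from pathlib import Path

# Upload box name -> file name pattern, inferred from the example files above.
UPLOAD_PATTERNS = {
    "Gene expression": "*.h5",
    "Coordinates": "*tissue_positions_list.csv",
    "Tissue image": "*tissue_hires_image.png",
    "Scale factor": "*scalefactors_json.json",
}


def find_upload_files(sample_dir: Path) -> dict:
    """Map each upload box to the first matching file, or None if not found.

    The search is recursive, so files inside the "spatial" folder are included.
    """
    found = {}
    for box, pattern in UPLOAD_PATTERNS.items():
        matches = sorted(sample_dir.rglob(pattern))
        found[box] = matches[0] if matches else None
    return found


def missing_boxes(sample_dir: Path) -> list[str]:
    """Return the upload boxes for which no file was found."""
    return [box for box, path in find_upload_files(sample_dir).items() if path is None]
```

An empty list from `missing_boxes` for both sample folders means every upload box has a file ready.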
Now click “Add metadata manually”.
Click “Add new metadata column”. Add a column named patient.
Click “Add new metadata column” again. Add a column named therapy.
You can reference the contents of the example_clinical.csv file
to add these data for each sample:
samplename | patient | therapy |
---|---|---|
sample_093d | patient_12 | none |
sample_396c | patient_20 | adriamycin |
Add the corresponding patient and therapy information for sample_093d.
Add the corresponding patient and therapy information for sample_396c.
Remember: The sample IDs in the metadata should match exactly the sample names used during file import.
You will not be able to edit the metadata after you click “Import Data”
(unless you start a completely new project). Make sure everything is as
you intend, and then click “Import Data”. This will take a little bit of
time. Note that you can have it send you an email instead of waiting on the
page.
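Because the sample IDs must match exactly, a quick check before clicking “Import Data” can catch typos. The sketch below compares the names you plan to enter against the metadata file. It assumes example_clinical.csv is comma-separated with a header column called `samplename`, as the table above suggests. The helper names are hypothetical.

```python
import csv
from pathlib import Path


def metadata_sample_names(csv_path: Path, column: str = "samplename") -> list[str]:
    """Read the sample names from the metadata CSV (assumed column: "samplename")."""
    with open(csv_path, newline="", encoding="utf-8") as handle:
        return [row[column] for row in csv.DictReader(handle)]


def compare_sample_names(imported_names: list[str], metadata_names: list[str]) -> dict:
    """Report names that appear on only one side. Matching is exact and case-sensitive."""
    imported, metadata = set(imported_names), set(metadata_names)
    return {
        "missing_from_metadata": sorted(imported - metadata),
        "missing_from_import": sorted(metadata - imported),
    }
```

If both lists in the result are empty, the names agree. A name such as "Sample_396c" would be reported as a mismatch against "sample_396c".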
Each ST technology will require different filtering parameters. Compared to single-cell ST, spot-level ST (e.g., Visium) tends to yield more counts per spot. Even among spot-level ST projects, these parameters will need adjustment depending on the sequencing depth and cellularity (i.e., cells per unit area). For these reasons, the values used here should not be taken as a “golden rule”. Rather, users are encouraged to try different parameters and see which filtering procedure produces the most “noise” reduction without losing too much relevant information. spatialGE provides statistics and plots to help the user assess the effect of filtering.
Go to the “Filter data” tab.
Click “Filter spots/cells”.
Enter the minimum number of counts a spot needs
to have to be kept in the data set. In this case, enter 2000.
Enter the minimum number of genes a spot needs
to have to be kept in the data set. In this case, enter 500.
Click the “Mitochondrial genes (^MT-)” field to
filter spots by mitochondrial gene content. Keep in mind that some ST
platforms do not quantify mitochondrial genes.
Enter the maximum percentage of mitochondrial
counts. Use 20% in this case.
Now, to filter out genes, click “Filter
genes”.
Filter out genes with fewer than 2000
counts.
Filter out genes expressed in fewer than 20
spots.
Once you have all the filter settings as you’d
like, click the blue “APPLY FILTER” button.
Users can also download a “parameter file”,
which contains the filtering settings used for reproducibility. To do
this, locate the “Download parameter log” link below the “APPLY FILTER”
button.
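To illustrate what these thresholds do, the sketch below applies the same kind of rules to a small count matrix stored as a list of rows (genes) by columns (spots). It is only an illustration of the logic, not spatialGE's implementation. It assumes the thresholds are inclusive and that the spot filter is applied before the gene filter, and the function names are hypothetical.

```python
import re


def filter_spots(counts, gene_names, min_counts=2000, min_genes=500,
                 max_mito_pct=20.0, mito_pattern=r"^MT-"):
    """Return the indices of spots (columns) that pass all three spot filters."""
    mito_rows = [i for i, name in enumerate(gene_names) if re.search(mito_pattern, name)]
    n_spots = len(counts[0]) if counts else 0
    keep = []
    for s in range(n_spots):
        total = sum(row[s] for row in counts)              # total counts in the spot
        n_genes = sum(1 for row in counts if row[s] > 0)   # genes detected in the spot
        mito = sum(counts[i][s] for i in mito_rows)        # mitochondrial counts
        mito_pct = 100.0 * mito / total if total else 0.0
        if total >= min_counts and n_genes >= min_genes and mito_pct <= max_mito_pct:
            keep.append(s)
    return keep


def filter_genes(counts, kept_spots, min_counts=2000, min_spots=20):
    """Return the indices of genes (rows) that pass both gene filters.

    Only the spots listed in `kept_spots` are considered (assumed order of operations).
    """
    keep = []
    for g, row in enumerate(counts):
        values = [row[s] for s in kept_spots]
        expressed_in = sum(1 for v in values if v > 0)
        if sum(values) >= min_counts and expressed_in >= min_spots:
            keep.append(g)
    return keep
```

With the defaults shown, the functions mirror the values entered above: 2000 counts, 500 genes, and 20% mitochondrial counts per spot, then 2000 counts and 20 spots per gene.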
Click “Violin plots” to visualize count
distribution after filtering.
Currently, “total_counts” and “total_genes” per
spot can be visualized.
When changing the variable to plot, click the
blue “GENERATE PLOTS” button to update.
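As a rough guide to what the two plotted variables represent, the sketch below computes them from a small genes-by-spots count matrix. It assumes “total_counts” is the sum of counts in a spot and “total_genes” is the number of genes with a non-zero count in that spot. The function names are hypothetical and this is not spatialGE's code.

```python
from statistics import median


def per_spot_totals(counts):
    """Compute total counts and number of detected genes for each spot (column)."""
    n_spots = len(counts[0]) if counts else 0
    total_counts = [sum(row[s] for row in counts) for s in range(n_spots)]
    total_genes = [sum(1 for row in counts if row[s] > 0) for s in range(n_spots)]
    return {"total_counts": total_counts, "total_genes": total_genes}


def summarize(values):
    """Return the minimum, median, and maximum of a list of per-spot values."""
    return {"min": min(values), "median": median(values), "max": max(values)}
```

A violin plot shows the full distribution of these per-spot values. The three-number summary is a compact stand-in for the same information.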
Click “Quilt plot” to visualize
the total number of genes or counts per spot and their spatial
context.
Select total_counts.
Select one sample in the “First sample” dropdown menu.
Then select a second sample to compare it to
in the “Second sample” dropdown menu.
Click the blue “GENERATE PLOTS” button to create the
plot.
Click the “Normalize data” tab.
Click “Use SCTransform” to apply Seurat’s
normalization method.
Click the blue “NORMALIZE DATA” button to start
normalization.
The distribution of counts per spot for a given
gene, for example MAP2K2, can also be plotted. When querying a
gene, keep in mind that the query is case-sensitive. Since these are
human samples, use all uppercase letters.
Click “GENERATE PLOTS” to show the number of
MAP2K2 counts per spot.
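The case-sensitivity of gene queries can be illustrated with a short sketch. It looks up a query exactly and, when that fails, suggests a name that differs only in letter case. The function name `find_gene` is hypothetical and this is not how spatialGE's search box is implemented.

```python
def find_gene(query, gene_names):
    """Return (exact_match, suggestion) for a case-sensitive gene lookup.

    exact_match is the query itself if it is present, otherwise None.
    suggestion is a gene name that matches when case is ignored, otherwise None.
    """
    if query in gene_names:
        return query, None
    by_upper = {name.upper(): name for name in gene_names}
    return None, by_upper.get(query.upper())


genes = ["MAP2K2", "IGKC", "FN1", "C1QA"]
print(find_gene("MAP2K2", genes))  # exact match found
print(find_gene("map2k2", genes))  # no exact match, but a suggestion is returned
```

Typing "map2k2" finds nothing by exact match, which is why uppercase symbols are recommended for human samples.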
Click the “Visualization” module in the left side menu.
You can search for your favorite gene in the
“Search and select genes” menu. For this example, query and
click IGKC.
Also query and click the FN1 gene.
Lastly, also query and click
C1QA.
Click the blue “GENERATE PLOTS” button to create the
plot.
Click “Spatial domain detection” in the
left side menu.
Now, in the “Number of domains” slider, select a range of 3 to 5,
so that 3 to 5 domains will be detected in the samples. This is how many
clusters the tool will attempt to identify.
For “Number of most variable genes to use”, choose 3000.
The 3000 genes with the highest variation will be used to detect the domains.
Finally, click “RUN STCLUST” to find
clusters.
Explore the results by clicking each “K=” tab.
Images can be exported in multiple formats.
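The “Number of most variable genes to use” setting restricts domain detection to the genes that vary most across spots. The sketch below shows the general idea by ranking genes on plain variance. This is a simplification: the variability measure used by spatialGE may differ, and the function name is hypothetical.

```python
from statistics import pvariance


def top_variable_genes(counts, gene_names, n_top=3000):
    """Return the names of the `n_top` genes with the highest variance across spots.

    `counts` is a list of rows, one row of per-spot values for each gene.
    Ties keep the original gene order because Python's sort is stable.
    """
    scored = [(pvariance(row), name) for row, name in zip(counts, gene_names)]
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    return [name for _, name in ranked[:n_top]]
```

Genes with nearly constant values across the tissue carry little information for separating domains, which is why only the most variable ones are kept.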