Get workshop files

Download the files for this activity from this link: https://github.com/fhdsl/ITN_Workshops_2024/archive/refs/heads/main.zip
Put this file on your desktop so it is easily findable.
Double-click the zip file (or right-click and choose “unzip” or “decompress”) to unzip it.
Open the activity files you downloaded so we can see what’s here.
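If you prefer to script the download and unzip steps, the following is a minimal sketch using only the Python standard library. The function names (download_zip, unzip) and the output locations are hypothetical choices made for illustration; only the URL comes from the step above.

```python
import urllib.request
import zipfile
from pathlib import Path

# URL taken from the download step above.
WORKSHOP_ZIP_URL = (
    "https://github.com/fhdsl/ITN_Workshops_2024/archive/refs/heads/main.zip"
)


def download_zip(url, zip_path):
    """Download the archive at `url` to `zip_path` and return the path."""
    urllib.request.urlretrieve(url, str(zip_path))
    return Path(zip_path)


def unzip(zip_path, dest_dir):
    """Extract every member of the archive into `dest_dir`.

    Returns the list of member names so you can see what was unpacked.
    """
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest_dir)
        return archive.namelist()


# Example usage (assumes a Desktop folder exists in your home directory):
# desktop = Path.home() / "Desktop"
# archive = download_zip(WORKSHOP_ZIP_URL, desktop / "ITN_Workshops_2024.zip")
# print(unzip(archive, desktop / "ITN_Workshops_2024"))
```

The usage lines are left commented out so that nothing is downloaded until you choose to run them.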

Get familiar with the data we have

You should see two sample folders, a metadata file, and a PDF of the manuscript that describes these data.

Each sample’s folder has an h5 file and a spatial folder.

  • sample_093d
    • GSM6433590_093D_filtered_feature_bc_matrix.h5
    • spatial
      • GSM6433590_093D_scalefactors_json.json
      • GSM6433590_093D_tissue_hires_image.png
      • GSM6433590_093D_tissue_positions_list.csv
  • sample_396c …

The README describes these samples:

The original data set results from triple-negative breast cancer (TNBC) tumor biopsies collected from a diverse patient cohort. Patients underwent different therapeutic regimes. The tumor biopsies were profiled with 10X Visium. The data set presented here is a subset of the original cohort.

Additionally, the metadata file describes which sample came from which patient and which treatment that patient received. It looks like this:

samplename patient therapy
sample_093d patient_12 none
sample_396c patient_20 adriamycin
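To inspect the metadata programmatically, a small sketch like the one below could be used. It assumes the metadata file (example_clinical.csv) is comma-delimited with a header row containing samplename, patient, and therapy, as in the table above; the function name load_metadata is hypothetical.

```python
import csv


def load_metadata(csv_path):
    """Read the clinical metadata into a dict keyed by sample name.

    Assumes a comma-delimited file with a header row containing the
    columns samplename, patient, and therapy.
    """
    with open(csv_path, newline="") as handle:
        reader = csv.DictReader(handle)
        return {
            row["samplename"]: {
                "patient": row["patient"],
                "therapy": row["therapy"],
            }
            for row in reader
        }


# Example usage (the path is an assumption about where you unzipped the files):
# metadata = load_metadata("example_clinical.csv")
# print(metadata["sample_093d"]["therapy"])
```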

While two samples are provided, an additional two samples are available in the course’s material, which could be used with this activity walkthrough after the workshop.

Create an account with SpatialGE

Go to https://spatialge.moffitt.org/
Click on “Sign Up” in the upper right corner.

Starting a new project

Click the blue “New Project” button.
For “What spatial transcriptomics platform are you using for this project?”, choose Visium, which is the platform our example data come from.
Enter a sensible project name and description of your own, for example something related to the workshop such as “ITN workshop”.
Then click “Create”.

Uploading the dataset

We will repeat the following steps for each sample to upload its set of files.

Uploading one sample’s data

For Sample Name, enter the ID indicated by the folder name, e.g. sample_093d. This is very important, as sample IDs need to match exactly the sample IDs in the metadata file (example_clinical.csv). Otherwise, no metadata is imported.
For the Gene expression box, upload the .h5 file, e.g. GSM6433590_093D_filtered_feature_bc_matrix.h5. You can upload files by dragging and dropping them or by clicking the box to browse for them.
For the Coordinates box upload the .csv file e.g. GSM6433590_093D_tissue_positions_list.csv.
For the Tissue image box upload the .png file e.g. GSM6433590_093D_tissue_hires_image.png.
For the Scale factor box upload the .json file e.g. GSM6433590_093D_scalefactors_json.json. The scaling factor file is output automatically by the 10X Space Ranger pipeline, and contains information to approximate the size of the tissue image and the expression plots.
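Before uploading, it can help to confirm that each sample folder contains exactly one file for each upload box. The sketch below is one way to do that; the glob patterns are inferred from the example file names above, and the names UPLOAD_BOXES and files_for_upload are hypothetical.

```python
from pathlib import Path

# Upload box name -> glob pattern relative to the sample folder.
# Patterns are inferred from the example file names in this walkthrough.
UPLOAD_BOXES = {
    "Gene expression": "*_filtered_feature_bc_matrix.h5",
    "Coordinates": "spatial/*_tissue_positions_list.csv",
    "Tissue image": "spatial/*_tissue_hires_image.png",
    "Scale factor": "spatial/*_scalefactors_json.json",
}


def files_for_upload(sample_dir):
    """Map each upload box to the single matching file in `sample_dir`.

    A box maps to None when no file, or more than one file, matches,
    so missing or ambiguous inputs are easy to spot.
    """
    sample_dir = Path(sample_dir)
    found = {}
    for box, pattern in UPLOAD_BOXES.items():
        matches = sorted(sample_dir.glob(pattern))
        found[box] = matches[0] if len(matches) == 1 else None
    return found


# Example usage (the folder path is an assumption):
# for box, path in files_for_upload("sample_093d").items():
#     print(box, "->", path)
```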

Once the above steps are done, click the green Import Sample button.

You can use this checklist to keep track as you upload and follow the steps for each sample.

sample_093d data entered
sample_396c data entered

Additional datasets for post-workshop walkthrough:

sample_396a data entered
sample_397d data entered

Now return to the beginning of these steps to repeat the same steps for the other sample.

Adding metadata

Now click Add metadata manually.
Click Add new metadata column. Add a column named patient.
Click Add new metadata column again. Add a column named therapy.

You can reference the example_clinical.csv file’s contents to add these data for each sample:

samplename patient therapy
sample_093d patient_12 none
sample_396c patient_20 adriamycin

Add the corresponding patient and therapy information for sample_093d.
Add the corresponding patient and therapy information for sample_396c.

Additional datasets for post-workshop walkthrough:

samplename patient therapy
sample_396a patient_19 pembrolizumab
sample_397d patient_22 taxotere

Add the corresponding patient and therapy information for sample_396a.
Add the corresponding patient and therapy information for sample_397d.

Remember: The sample IDs in the metadata should match exactly the sample names used during file import.
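Because the match must be exact (including letter case), a quick comparison of the two sets of names can catch mistakes before import. This is a minimal sketch; the function name compare_sample_ids and the result keys are hypothetical.

```python
def compare_sample_ids(imported_names, metadata_names):
    """Report sample IDs that appear on only one side.

    The comparison is exact and case-sensitive, mirroring the requirement
    that metadata IDs match the sample names used during file import.
    """
    imported = set(imported_names)
    metadata = set(metadata_names)
    return {
        # Imported samples that would receive no metadata.
        "missing_metadata": sorted(imported - metadata),
        # Metadata rows that match no imported sample.
        "unused_metadata": sorted(metadata - imported),
    }


# Example usage:
# report = compare_sample_ids(
#     ["sample_093d", "Sample_396c"],   # names typed during import
#     ["sample_093d", "sample_396c"],   # names in the metadata
# )
# print(report)
```

In the commented example, the capital “S” in Sample_396c would be reported on both sides, since it matches no metadata row and leaves one row unused.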

After you’ve entered the data and metadata:

After you click Import Data, you will not be able to edit this information unless you start a completely new project. Make sure everything is as you intend, then click Import Data. The import will take a little bit of time. Note that you can have it send you an email instead of waiting on the page.

Filtering your data

Each ST technology requires different filtering parameters. Compared to single-cell ST, spot-level ST (e.g., Visium) tends to yield more counts per spot. Even among spot-level ST projects, these parameters need adjustment according to the sequencing depth and cellularity (i.e., cells per unit area). For these reasons, the values used here should not be taken as a “golden rule”. Rather, users are encouraged to try different parameters and see which filtering procedure produces the most “noise” reduction without losing too much relevant information. spatialGE provides statistics and plots to help the user assess the effect of filtering.

Go to the “Filter data” tab.
Click “Filter spots/cells”.
Enter the minimum number of counts a spot needs to have to be kept in the data set. In this case, 2000 will be input.
Enter the minimum number of genes a spot needs to have to be kept in the data set. In this case, 500 will be input.
Click the “Mitochondrial genes (^MT-)” field to filter spots by mitochondrial gene content. Keep in mind that some ST platforms do not quantify mitochondrial genes.
Enter the maximum percentage of mitochondrial counts. Use 20% in this case.
Now, to filter out genes, click “Filter genes”.
Filter out genes with fewer than 2000 counts.
Filter out genes expressed in fewer than 20 spots.
Once you have all the filter settings as you’d like, click the blue “APPLY FILTER” button.
Users can also download a “parameter file”, which contains the filtering settings used for reproducibility. To do this, locate the “Download parameter log” link below the “APPLY FILTER” button.
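To make the thresholds above concrete, here is a minimal sketch of the same kind of filtering in plain Python. It is not spatialGE’s implementation. It assumes counts are stored as a dict mapping each spot to a dict of gene counts, that thresholds are inclusive, and that spots are filtered before genes; the function names are hypothetical.

```python
def filter_spots(counts, min_counts=2000, min_genes=500,
                 max_mito_pct=20.0, mito_prefix="MT-"):
    """Keep spots that pass the count, gene, and mitochondrial thresholds.

    counts: dict mapping spot ID -> dict mapping gene name -> raw count.
    """
    kept = {}
    for spot, genes in counts.items():
        total = sum(genes.values())
        n_genes = sum(1 for c in genes.values() if c > 0)
        mito = sum(c for g, c in genes.items() if g.startswith(mito_prefix))
        mito_pct = 100.0 * mito / total if total else 0.0
        if (total >= min_counts and n_genes >= min_genes
                and mito_pct <= max_mito_pct):
            kept[spot] = genes
    return kept


def filter_genes(counts, min_counts=2000, min_spots=20):
    """Drop genes with too few total counts or detected in too few spots."""
    totals = {}
    n_spots = {}
    for genes in counts.values():
        for gene, c in genes.items():
            if c > 0:
                totals[gene] = totals.get(gene, 0) + c
                n_spots[gene] = n_spots.get(gene, 0) + 1
    keep = {g for g in totals
            if totals[g] >= min_counts and n_spots[g] >= min_spots}
    return {spot: {g: c for g, c in genes.items() if g in keep}
            for spot, genes in counts.items()}


# Example usage with the values from this walkthrough:
# filtered = filter_genes(filter_spots(raw_counts))
```

Trying a few different threshold values on a small data set and comparing how many spots and genes survive is a quick way to see how sensitive the result is to each setting.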

Visualize filtering results

Count distributions

Click “Violin plots” to visualize count distribution after filtering.
Currently, “total_counts” and “total_genes” per spot can be visualized.
When changing the variable to plot, click the blue “GENERATE PLOTS” button to update.

Quilt plot

Click Quilt plot to visualize the total number of genes or counts per spot and their spatial context.
Select total_counts.
Select one sample underneath the First sample dropdown menu.
And select a second sample to compare to underneath the Second sample dropdown menu.
Click the blue “GENERATE PLOTS” button to create the plot.

Normalize Data

Click the “Normalize data” tab.
Click “Use SCTransform” to apply Seurat’s normalization method.
Click the blue “NORMALIZE DATA” button to start normalization.
The distribution of counts per spot for a given gene can also be plotted, for example MAP2K2. When querying a gene, keep in mind that the query is case-sensitive. Since these are human samples, use all uppercase letters.
Click “GENERATE PLOTS” to show the number of MAP2K2 counts per spot.

Visualization

Gene expression comparative visualization

Click the Visualization module in the left side menu.
You can search for your favorite gene in the Search and select genes menu. For this example, query and click IGKC.
Also query and click the FN1 gene.
Lastly, query and click C1QA.
Click the blue “GENERATE PLOTS” button to create the plot.

Expression surface

Alternatively, an “expression surface” can be generated. This is a type of plot where expression values are inferred for the unsampled spaces between spots.

Click the “Expression surface” tab.
In the Search and select genes menu search and select IGKC.
Click “ESTIMATE SURFACES” button to create the plot.
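To illustrate what “inferring values between spots” means, the sketch below estimates expression at an unsampled location with inverse-distance weighting, a simple interpolation technique chosen here only for illustration. It is not claimed to be the method SpatialGE uses, and the function name idw_estimate is hypothetical.

```python
import math


def idw_estimate(spots, x, y, power=2.0):
    """Estimate expression at (x, y) by inverse-distance weighting.

    spots: list of (spot_x, spot_y, expression_value) tuples.
    Nearby spots contribute more than distant ones; a location that
    coincides with a spot simply returns that spot's value.
    """
    if not spots:
        raise ValueError("at least one measured spot is required")
    weighted_sum = 0.0
    weight_total = 0.0
    for spot_x, spot_y, value in spots:
        distance = math.hypot(x - spot_x, y - spot_y)
        if distance == 0.0:
            return float(value)
        weight = 1.0 / distance ** power
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total


# Example usage: a point halfway between two spots gets their average.
# idw_estimate([(0, 0, 0.0), (2, 0, 10.0)], 1, 0)
```

Evaluating such a function on a fine grid covering the tissue would give a smooth surface rather than a set of discrete spots.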

Spatial Domain Detection

Click “Spatial domain detection” in the left side menu.
In the Number of domains slider, select 3 to 5. This is the range of cluster numbers the method will attempt to identify in the samples.
For Number of most variable genes to use, choose 3000. The 3000 genes with the highest variation will be used to detect the domains.
Finally, click “RUN STCLUST” to find clusters.
Explore the results by clicking each K= tab.
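The “most variable genes” setting can be pictured as ranking genes by how much their expression varies across spots and keeping the top N. The sketch below uses plain population variance as a stand-in for that ranking; SpatialGE may score variability differently, and the function name top_variable_genes is hypothetical.

```python
from statistics import pvariance


def top_variable_genes(expression, n_top=3000):
    """Return the `n_top` genes with the highest variance across spots.

    expression: dict mapping gene name -> list of per-spot values.
    Plain variance is an illustrative choice, not SpatialGE's method.
    """
    ranked = sorted(
        expression,
        key=lambda gene: pvariance(expression[gene]),
        reverse=True,
    )
    return ranked[:n_top]


# Example usage on a toy data set with three spots:
# top_variable_genes({"A": [1, 1, 1], "B": [0, 5, 10], "C": [1, 2, 3]}, n_top=2)
```

Genes that are flat across the tissue carry little information about spatial structure, which is why only the most variable ones are passed on to clustering.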

Images can be exported in multiple formats.