Our Galaxy activity is a condensed tutorial based on the “Reference-based RNA-Seq data analysis” Galaxy Training Tutorial.
It uses data that is deposited on and available from Zenodo, including subsampled data that will be quicker to work with. For more info on the data, check out the tutorial linked above.
If you have files in your history already, use the plus sign button on the top right of the history pane to Create new history.
Click the pencil button next to “Unnamed history”. Fill in the name with something descriptive and, if you want, add a more detailed description to the annotation. Click “Save”.
Our History pane is empty and we’ll need to load data.
Why do we want sequencing reads and a reference genome? Why are there 4 files for sequencing reads?
https://zenodo.org/record/6457007/files/GSM461177_1_subsampled.fastqsanger
https://zenodo.org/record/6457007/files/GSM461177_2_subsampled.fastqsanger
https://zenodo.org/record/6457007/files/GSM461180_1_subsampled.fastqsanger
https://zenodo.org/record/6457007/files/GSM461180_2_subsampled.fastqsanger
This will open up an interactive panel for data upload:
Select the filetype: fastqsanger. (Note the list includes both fastqcsanger and fastqsanger; one contains “qc” and the other just “q”. Select the one with just a “q”.)
Select the reference organism: D. melanogaster Aug. 2014 (BDGP Release 6 + ISO1 MT/dm6) (dm6)
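The fastqsanger filetype indicates that quality scores follow the Sanger convention, in which each quality character stores a Phred score as its ASCII code minus 33. The sketch below illustrates that decoding; the function name and the example quality string are made up for illustration and are not part of Galaxy.

```python
def decode_sanger_qualities(quality_line: str) -> list[int]:
    """Convert a Sanger (Phred+33) quality string into integer Phred scores."""
    return [ord(ch) - 33 for ch in quality_line]


# Toy quality string (an assumption for illustration, not taken from the tutorial data).
example_quality = "II5+!"
scores = decode_sanger_qualities(example_quality)
print(scores)
```

A Phred score of 20 corresponds to an estimated 1% chance that the base call is wrong, which is the cutoff used later in the trimming step.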
This will open up an interactive panel:
Enter 2 PE fastqs as the name.
In the green strips, there are 3 columns for each fastqsanger pair. In the middle column, we’ll edit the displayed name to be more informative.
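To see why the four read files become two pairs, here is a rough sketch that groups the four Zenodo URLs by sample, treating `_1` as the forward mate and `_2` as the reverse mate. The function name and the dictionary layout are assumptions for illustration; this is not how Galaxy builds collections internally.

```python
from collections import defaultdict
from urllib.parse import urlparse
import posixpath

FASTQ_URLS = [
    "https://zenodo.org/record/6457007/files/GSM461177_1_subsampled.fastqsanger",
    "https://zenodo.org/record/6457007/files/GSM461177_2_subsampled.fastqsanger",
    "https://zenodo.org/record/6457007/files/GSM461180_1_subsampled.fastqsanger",
    "https://zenodo.org/record/6457007/files/GSM461180_2_subsampled.fastqsanger",
]


def pair_by_sample(urls):
    """Group read files into {sample: {"forward": url, "reverse": url}}."""
    pairs = defaultdict(dict)
    for url in urls:
        filename = posixpath.basename(urlparse(url).path)
        # File names look like "<sample>_<mate>_subsampled.fastqsanger".
        sample, mate = filename.split("_")[:2]
        pairs[sample]["forward" if mate == "1" else "reverse"] = url
    return dict(pairs)


pairs = pair_by_sample(FASTQ_URLS)
for sample, mates in sorted(pairs.items()):
    print(sample, sorted(mates))
```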
https://zenodo.org/record/6457007/files/Drosophila_melanogaster.BDGP6.32.109_UCSC.gtf.gz
This will open up an interactive panel for data upload:
Paste the copied URL into the middle box.
Using the first dropdown menu on the top (labeled “Auto-detect”), let’s select the filetype: gtf.
Using the second dropdown menu on the top (labeled “unspecified (?)”), let’s select the reference organism: D. melanogaster Aug. 2014 (BDGP Release 6 + ISO1 MT/dm6) (dm6)
Click the blue “Start” button in the bottom stretch of options.
Click the “Close” button at the end of the bottom stretch of options.
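For orientation, a GTF file is a tab-separated text file with nine columns per record, the last of which holds key/value attributes. The sketch below parses one record; the record itself is made up for illustration and is not copied from the tutorial's annotation file, and the parser ignores edge cases such as semicolons inside attribute values.

```python
GTF_COLUMNS = ["seqname", "source", "feature", "start", "end",
               "score", "strand", "frame", "attributes"]


def parse_gtf_line(line: str) -> dict:
    """Split one GTF record into named fields and parse its attribute column."""
    fields = line.rstrip("\n").split("\t")
    record = dict(zip(GTF_COLUMNS, fields))
    record["start"] = int(record["start"])
    record["end"] = int(record["end"])
    attributes = {}
    for item in record["attributes"].split(";"):
        item = item.strip()
        if item:
            # Each attribute looks like: key "value"
            key, _, value = item.partition(" ")
            attributes[key] = value.strip('"')
    record["attributes"] = attributes
    return record


# Hypothetical record for illustration only.
toy_line = 'chr2L\texample\tgene\t100\t500\t.\t+\t.\tgene_id "toy_gene"; gene_name "toy";'
record = parse_gtf_line(toy_line)
print(record["seqname"], record["feature"], record["start"], record["end"])
print(record["attributes"])
```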
Now that we have all of the data uploaded, we’ll begin with some quality control analysis. This verifies that the data is high quality, and it also gives us information we’ll need as inputs for later steps such as the mapping tools (e.g., read size).
On the top left of the page, using the tool pane search bar, type Flatten into the search bar and select the Flatten collection tool. This will open the Flatten collection tool in the middle pane.
You can rename the output to a more informative name; to confirm the new name, click the blue “Save” button.
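Conceptually, flattening turns the nested paired collection (samples, each holding a forward and a reverse file) into a simple list of datasets. A minimal sketch of that idea follows; the dictionary layout, the function name, and the `sample_mate` naming scheme are assumptions for illustration, not Galaxy's implementation.

```python
def flatten_collection(nested):
    """Turn {sample: {mate: dataset}} into a flat list of (name, dataset) tuples."""
    flat = []
    for sample, mates in nested.items():
        for mate, dataset in mates.items():
            flat.append((f"{sample}_{mate}", dataset))
    return flat


# Stand-in for the paired collection built earlier (file names only).
paired = {
    "GSM461177": {
        "forward": "GSM461177_1_subsampled.fastqsanger",
        "reverse": "GSM461177_2_subsampled.fastqsanger",
    },
    "GSM461180": {
        "forward": "GSM461180_1_subsampled.fastqsanger",
        "reverse": "GSM461180_2_subsampled.fastqsanger",
    },
}

flat = flatten_collection(paired)
for name, dataset in flat:
    print(name, dataset)
```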
On the top left of the page, using the tool pane search bar, type Fastq into the search bar and select the FastQC tool. This will open the FastQC tool in the middle panel.
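FastQC reports many metrics; two of the simplest are read length and average base quality. The sketch below computes both for a tiny FASTQ snippet, assuming four-line records and Sanger (Phred+33) qualities. The reads are invented for illustration, and this is only a rough analogue of what FastQC summarizes, not its actual method.

```python
from collections import Counter
from statistics import mean

# Invented reads for illustration; not taken from the tutorial data.
TOY_FASTQ = """@read1
ACGTACGT
+
IIIIIIII
@read2
ACGT
+
II5+
"""


def read_fastq(text):
    """Yield (name, sequence, quality) from four-line FASTQ records."""
    lines = [ln for ln in text.splitlines() if ln]
    for i in range(0, len(lines), 4):
        name, seq, _plus, qual = lines[i:i + 4]
        yield name[1:], seq, qual


def summarize(text):
    """Return the read-length distribution and the mean Phred score per read."""
    lengths = Counter()
    mean_quality = {}
    for name, seq, qual in read_fastq(text):
        lengths[len(seq)] += 1
        mean_quality[name] = mean(ord(c) - 33 for c in qual)
    return lengths, mean_quality


lengths, mean_quality = summarize(TOY_FASTQ)
print(dict(lengths))
print(mean_quality)
```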
On the top left of the page, using the tool pane search bar, type multi into the search bar and select the MultiQC tool. This will open the MultiQC tool in the middle pane.
For the Which tool was used to generate logs? question, use the down arrow to see a list, scroll down until you see FastQC, and select FastQC.
Click the + Insert FastQC output button.
Select the FASTQC on collection __: RawData data set.
Optionally, you can add a Report title near the bottom of the middle pane.
Click the blue Run tool button in the upper right of the middle pane.
On the top left of the page, using the tool pane search bar, type Cut into the search bar and select the Cutadapt tool. This will open the Cutadapt tool in the middle pane.
For the Single-end or Paired-end reads? question, click the down arrow and select Paired-end Collection.
Check that 2 PE fastqs is selected as the paired collection input; if not, select it.
Go to the Other Read Trimming Options section and edit the Quality cutoff(s) (R1)* parameter. Enter a value of 20.
Go to the Read Filtering Options section and edit the Minimum length (R1) parameter. Enter a value of 20.
Go to the Additional outputs to generate checkbox section and check the option Report: Cutadapt's per-adapter statistics. You can use this file with MultiQC
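To give a feel for what the two values of 20 do, here is a simplified sketch of 3' quality trimming followed by a minimum-length filter. The trimming follows the running-sum approach described in the Cutadapt documentation (subtract the cutoff from each quality, sum from the 3' end, cut where the sum is lowest), but it is an illustration and not Cutadapt's code. It assumes the same cutoff and minimum length apply to both mates and that a pair is dropped when either mate ends up too short; the function names and toy reads are hypothetical.

```python
def quality_trim_3prime(qualities, cutoff):
    """Return the number of bases to keep after 3' quality trimming."""
    running = 0
    lowest = 0
    keep = len(qualities)
    for i in reversed(range(len(qualities))):
        running += qualities[i] - cutoff
        if running > 0:
            break  # good-quality bases now outweigh the bad tail; stop scanning
        if running < lowest:
            lowest = running
            keep = i
    return keep


def trim_read(sequence, qualities, cutoff):
    """Trim a read's sequence and qualities to the same kept length."""
    keep = quality_trim_3prime(qualities, cutoff)
    return sequence[:keep], qualities[:keep]


def process_pair(read1, read2, cutoff=20, min_length=20):
    """Trim both mates; return None if either is shorter than min_length."""
    trimmed1 = trim_read(read1[0], read1[1], cutoff)
    trimmed2 = trim_read(read2[0], read2[1], cutoff)
    if len(trimmed1[0]) < min_length or len(trimmed2[0]) < min_length:
        return None
    return trimmed1, trimmed2


# Toy reads: 30 bases each, with a low-quality tail of 5 or 15 bases.
good = ("A" * 30, [40] * 25 + [2] * 5)
bad = ("C" * 30, [40] * 15 + [2] * 15)

print(process_pair(good, good) is not None)  # both mates keep 25 bases
print(process_pair(good, bad))               # second mate keeps only 15 bases
```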
On the top left of the page, using the tool pane search bar, type multi into the search bar and select the MultiQC tool. This will open the MultiQC tool in the middle pane.
In the Results section, in the Which tool was used to generate logs subsection, click the down arrow and select Cutadapt/Trim Galore!.
Select the Cutadapt on collection __: Report data set.
Follow the steps in the Galaxy walkthrough to continue with mapping.
Here are some other relevant tutorials from Galaxy: