Chapter 4 Run the Workflow

4.1 Start the Run

Once you have saved the inputs and outputs, click on RUN ANALYSIS.

Screenshot: The RUN ANALYSIS button is highlighted.

4.2 Monitor the Run

Navigate to the JOB HISTORY tab, where your most recent submissions are listed in a table. Click on the most recent submission. Notice that each sample gets its own job; because the jobs run in parallel, the whole process stays speedy!

Screenshot: The page shows submission IDs and submission times. Several runs have launched, one for each sample, and the workflow status is "Running" for all 8 samples.
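If your workspace runs on Terra (or on AnVIL, which is built on Terra), you can also check on a submission from a script instead of the JOB HISTORY page. The sketch below uses the FISS Python package (firecloud); the billing project, workspace name, and the exact JSON fields returned by the API are assumptions you should verify against your own workspace.

```python
# A rough sketch for checking submission status with FISS (firecloud).
# NAMESPACE and WORKSPACE are placeholders for your billing project and workspace.
from firecloud import api as fapi

NAMESPACE = "my-billing-project"   # assumption: your billing project
WORKSPACE = "my-workspace"         # assumption: your workspace name

# List recent submissions for the workspace.
submissions = fapi.list_submissions(NAMESPACE, WORKSPACE).json()
latest = max(submissions, key=lambda s: s.get("submissionDate", ""))  # assumption: 'submissionDate' field

# Look up the per-sample workflows launched by that submission.
detail = fapi.get_submission(NAMESPACE, WORKSPACE, latest["submissionId"]).json()
for wf in detail.get("workflows", []):
    sample = wf.get("workflowEntity", {}).get("entityName", "unknown sample")
    print(sample, wf.get("status"))
```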

4.3 Inspect the Run Results

You should see the status change to “Succeeded” if everything completed correctly.

Screenshot: The workflow status is now "Succeeded" for all 8 samples, each with a check mark.

After 24 hours, you can also see the costs associated with each run under “Run Cost”.

4.4 Inspecting Other Run Files

It can be helpful to look at intermediate files on Google Cloud Platform, especially if runs did not complete successfully. You can view these files by clicking on the folder icon for “Execution directory”.

Screenshot: The folder icon for the "Execution directory", beside the workflow job for one sample, is highlighted.

For each run, you can see a number of associated files, including the output .fq files and log files.

Screenshot: The workflow's files on GCP are shown, including the two fastq files that were created and three log files useful for debugging: sample_file.log, stderr, and stdout.
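If you prefer a script to the file browser, a library such as google-cloud-storage can list the same files. In this minimal sketch, the bucket name and execution-directory prefix are hypothetical placeholders; copy the actual path shown next to "Execution directory" in your workspace.

```python
# A minimal sketch for listing a run's execution directory from a script.
# The bucket name and prefix are hypothetical; copy the real path from your workspace.
from google.cloud import storage

BUCKET_NAME = "fc-your-workspace-bucket"            # assumption: workspace bucket
EXECUTION_PREFIX = "submissions/<submission-id>/"   # assumption: execution directory path

client = storage.Client()
for blob in client.list_blobs(BUCKET_NAME, prefix=EXECUTION_PREFIX):
    # Print each file's path and size so the .fq outputs and log files are easy to spot.
    print(f"{blob.name}\t{blob.size} bytes")
```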

Click on stdout and/or stderr, then click "DOWNLOAD" to view the terminal output.

Screenshot: The DOWNLOAD button is highlighted.
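As an alternative to the DOWNLOAD button, you can read the log files directly with google-cloud-storage. The bucket name and object path below are placeholders; substitute the paths you see in the execution directory for your run.

```python
# A minimal sketch for reading stdout (or stderr) without the DOWNLOAD button.
# The bucket name and object path are placeholders for the paths in your execution directory.
from google.cloud import storage

BUCKET_NAME = "fc-your-workspace-bucket"   # assumption: workspace bucket
STDOUT_PATH = "submissions/<submission-id>/<workflow-id>/call-subsample/stdout"  # assumption

client = storage.Client()
blob = client.bucket(BUCKET_NAME).blob(STDOUT_PATH)
print(blob.download_as_text())  # shows the same text as the downloaded file
```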

Here is what your output in stdout might look like:

Creating a subsampled file with 20000 lines.
Created SRR10152993_1_subsample.fq.
Subsampling paired-end reads for SRR10152993.
Created SRR10152993_2_subsample.fq.
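The messages above suggest that the workflow keeps the first 20000 lines of each fastq file, which corresponds to 5000 reads because each FASTQ record spans four lines. The workflow's actual implementation may differ; the sketch below only illustrates the idea, and the input file name is a placeholder.

```python
# An illustration of the subsampling idea: keep the first 20000 lines
# (5000 reads x 4 lines per FASTQ record) of an input file.
# The input file name is a placeholder, not the workflow's actual input.
def subsample_fastq(in_path: str, out_path: str, n_lines: int = 20000) -> None:
    with open(in_path) as fin, open(out_path, "w") as fout:
        for i, line in enumerate(fin):
            if i >= n_lines:
                break
            fout.write(line)
    print(f"Created {out_path}")

subsample_fastq("SRR10152993_1.fq", "SRR10152993_1_subsample.fq")
```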

4.5 Confirming Results in the DATA tab

We can see that the subsampled files have been linked in the "sample" table on the DATA tab.

Screenshot: On the DATA tab of the workspace, a new column called read1_subsample has been created. There are 8 files, one corresponding to each original file.

In our example, we have a mixture of single-end and paired-end reads, so only the paired-end samples get a second file.

Screenshot: On the DATA tab of the workspace, a new column called read2_subsample has been created. There are 4 files, one for each original sample with paired-end reads.
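If you would like to double-check these columns outside of the web interface, one option is to export the "sample" table as a TSV file and inspect it locally. The sketch below assumes an exported file named sample.tsv and the column names shown above.

```python
# A minimal sketch for checking the new columns in an exported table.
# "sample.tsv" is an assumed name for a TSV downloaded from the DATA tab.
import pandas as pd

samples = pd.read_csv("sample.tsv", sep="\t")
print("Samples with read1_subsample:", samples["read1_subsample"].notna().sum())
# Only paired-end samples should have a second subsampled file.
print("Samples with read2_subsample:", samples["read2_subsample"].notna().sum())
```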

You should now be able to use the subsampled files for downstream applications!