Chapter 1 Quick Start

In this module, we’ll bring some metagenomic data into AnVIL.

This data comes from this BioProject, which collected soil samples to study bacterial communities in tallgrass prairie. Bacteria play an important role in this ecosystem, but can be changed by disturbance, management, and the presence of herbivores.

The SRA Data corresponding to this project is located here.

Microbiome diversity has many benefitial properties ranging soil and plant health.

You might hear new terms for moving data around in the cloud. Ingress is when data comes to you, similar to downloading a file or receiving an email with an attachment. Egress is sending the data to another resource, similar to uploading or sending an attached file via email. There is no fee for ingressing data to AnVIL from SRA.

1.1 Clone Workspace

Clone the Workspace https://anvil.terra.bio/#workspaces/anvil-outreach/SRA-data-on-AnVIL.

For this demo, we have given the cloned Workspace the name SRA-data-on-AnVIL-example.

1.2 Set Up Samples

Navigate to the WORKFLOWS Tab and select the SRA_Fetch Workflow.

Workflows tab with SRA_Fetch.

Select “Run workflow(s) with inputs defined by data table”.

'Run workflow(s) with inputs defined by data table' has been selected.

Set the “Select root entity type” to “sample” and click SELECT DATA.

Step 1 and 2 for setting up the Workflow.

On the Select Data popup, select only the first sample, SRR22375322, and click OK.

The first sample selected from the data table.

1.3 Launch Workflow

Click on the space underneath “Attribute” and select this.sample_id.

'this.sample_id' must be selected under the Workflow Attribute

Click SAVE.

The SAVE button is highlighted

You are ready to launch the Workflow! Click RUN ANALYSIS.

The RUN ANALYSIS button is highlighted

Voilà! Your Workflow is running.

Because the Workflow is happening in the cloud, you can close your browser or shut down your computer without interrupting the transfer.

The Workflow status page describes submission statistics and job status

1.4 Check Workflow

Click on the JOB HISTORY tab. You should see that the job status is “Done”. This might take a few minutes.

The check mark indicates the Workflow has completed successfully

1.5 Locate Data

Click on the DATA tab and click on the “sample” table on the left.

Navigate to the Files folder under the DATA tab

You should now see the file associated with the first sample!

The imported file is now visible in the sample table

1.6 Summary

  • Clone Workspace
  • Go to the WORKFLOWS tab
  • Select sample via data table (“Run workflow(s) with inputs defined by data table”)
  • Set the Attribute to this.sample_id
  • SAVE and RUN ANALYSIS
  • Go to DATA tab and click “sample” table to see file populated