3 Exercises

3.1 Launch Terra

Open anvilproject.org and click on “Launch” Terra

3.2 Clone HPRC Workspace

At anvil.terra.bio/#workspaces:

  • Enter hprc in the search box
  • Click on the “Public” tab
  • Click on AnVIL_HPRC
  • Click on the circle with three vertical dots in the upper right corner and select “Clone”

3.3 Start a Cloud Environment

  • Click on the Environment Configuration (cloud icon)
  • Select Jupyter Settings
  • Scroll down and click “Create”

3.4 Find Tidbits

  • In the Dashboard tab, what are three types of sequencing data that are available?
  • In the Data tab participant table, what two superpopulations have the most participants?
  • In the Data tab sample table, how many samples lack any ilmn data?
  • In the Data tab assembly_sample table, what is the command to download the HG002 mat_fasta file?

3.5 Enter Terminal

  • In the Analysis tab, click on Terminal
  • Make a working copy of the HG002 mat_fasta with the command below
    • NOTE: For requester pays buckets, add -u <google-project-id> right after gsutil (gsutil -u <google-project-id> cp ...) [ref]
gsutil cp 'gs://fc-4310e737-a388-4a10-8c9e-babe06aaf0cf/working/HPRC_PLUS/HG002/assemblies/year1_f1_assembly_v2_genbank/HG002.maternal.f1_assembly_v2_genbank.fa.gz' .
  • Examine the file with ls -l and zcat *.fa.gz | head
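Beyond looking at the first few lines, it can help to summarize the copied assembly. The sketch below is only an illustration, not part of the original exercise: the helper names count_fasta_records and show_fasta_headers are hypothetical, and it assumes a gzipped FASTA in which every record starts with a ">" header line.

```shell
# Hypothetical helpers for inspecting a gzipped FASTA file.
# Assumes each record begins with a header line starting with ">".

# Print the number of records (header lines) in the file given as $1.
count_fasta_records() {
  gzip -dc "$1" | grep -c '^>'
}

# Print the first N header lines of the file given as $1 (N defaults to 5).
show_fasta_headers() {
  gzip -dc "$1" | grep '^>' | head -n "${2:-5}"
}

# Example usage, once the working copy is in the current directory:
#   count_fasta_records HG002.maternal.f1_assembly_v2_genbank.fa.gz
#   show_fasta_headers HG002.maternal.f1_assembly_v2_genbank.fa.gz 3
```

gzip -dc is used here instead of zcat because it behaves the same on Linux and macOS; on the Terra terminal either should work. Counting headers decompresses the whole file, so expect it to take a while on a full assembly.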

3.6 Shut Down

  • Click on the Environment Configuration (cloud icon)
  • Select Jupyter Settings
  • Scroll down and click “Delete Environment”
  • Decide whether to keep or delete your persistent disk, then select “Delete”
  • Click the “hamburger” icon in the upper left, expand your name, select Cloud Environments, and confirm that no unnecessary resources are running