21 Overview

Every dataset tells a story, yet deciphering exactly how a figure was constructed can feel like solving a puzzle with missing pieces, and reproducing published results is a common difficulty. In this mini-hackathon, teams recreate figures from the MAGE RNA sequencing study using real omics data from the 1000 Genomes Project, and can also create additional visualizations from the open data. Through hands-on experience with cloud-based tools on AnVIL, participants will discover that computational reproducibility requires creativity, problem-solving, and detective work that go beyond simply following published protocols.

21.1 Skill Level

Genetics

Beginner: some genetics knowledge helpful

Programming skills

Beginner: some programming experience helpful

21.2 Learning Objectives

  1. Set up and manage an R analysis environment on AnVIL using cloud-native tools

  2. Import, reshape, and join multiple genomic data types (expression counts, sample metadata, variant calls, and genome annotations)

  3. Import data into an AnVIL workspace from multiple sources

  4. Apply core tidyverse operations including piping, filtering, joining, and mutating (creating or transforming columns)

  5. Extract and visualize expression quantitative trait loci (eQTL) data

  6. Reproduce figures from a peer-reviewed publication using open-access data

21.3 References