21 Overview
Every dataset tells a story, yet deciphering exactly how a figure was constructed can feel like solving a puzzle with missing pieces. Reproducing a published analysis is a common challenge. In this mini-hackathon, teams recreate figures from the MAGE RNA sequencing study using real omics data from the 1000 Genomes Project. Teams can also create additional visualizations from the open data. Through hands-on experience with cloud-based tools on AnVIL, participants will discover that computational reproducibility requires creativity, problem-solving, and detective work that go beyond simply following published protocols.
21.1 Skill Level
Genetics
Beginner: some genetics knowledge helpful
Programming skills
Beginner: some programming experience helpful
21.2 Learning Objectives
Set up and manage an R analysis environment on AnVIL using cloud-native tools
Import, reshape, and join multiple genomic data types (expression counts, sample metadata, variant calls, and genome annotations)
Import data into an AnVIL workspace from multiple sources
Apply core tidyverse operations, including piping, filtering, joining, and mutating columns
Extract and visualize expression quantitative trait loci (eQTL) data
Reproduce figures from a peer-reviewed publication using open-access data
21.3 References
Paper to reproduce: https://pmc.ncbi.nlm.nih.gov/articles/PMC11291278/
Companion GitHub repository for the paper: https://github.com/mccoy-lab/MAGE
Data from the paper: https://zenodo.org/records/10535719
Original workspace: https://anvil.terra.bio/#workspaces/anvil-outreach/demos-mage-minihack