1 Overview

1.1 Background

Over the past 20 years, there has been tremendous growth in human genomics, with millions of human genomes sequenced and many more to come in the future. This data, along with other biomedical information, has the potential to greatly improve our understanding of healthy living and transform healthcare. To achieve these goals, we need new approaches to research that involve cloud computing, which is the only way to effectively share and analyze data at this scale. The traditional method of genomic data sharing through centralized data warehouses is becoming unsustainable and inefficient. It creates redundant infrastructure and administrative inefficiencies that make collaborative analysis challenging, especially as datasets increase in size. This approach also leads to high transfer and download costs and poses security and compliance risks.

The NHGRI Genomic Data Science Analysis, Visualization, and Informatics Lab-Space, or AnVIL, inverts the traditional model, providing a cloud environment for the analysis of large genomic and related datasets. By providing a secure, unified environment for data management and compute, AnVIL eliminates the need for data movement, allows for active threat detection and monitoring, and provides elastic, shared computing resources that can be acquired as needed.

AnVIL currently provides secure access to hundreds of thousands of NHGRI datasets spanning nearly 5 petabytes of data and is built from a set of established components that are accessed and connected through the AnVIL Portal at https://anvilproject.org/ . The AnVIL homepage provides researchers an entry point into these tools, along with documentation, data browser, and links to related resources.

1.2 About this Demo

The Welcome to AnVIL demo provides an overview of the AnVIL platform, its purpose, and how it can benefit researchers and the broader genomics community. We highlight some of the historic challenges of analyzing and sharing large-scale genomic data and describe how AnVIL provides a cloud-based solution for accessing and analyzing such data. We also cover key features and functionalities of AnVIL, including its compatibility with popular analysis tools/ Finally, we emphasize the importance of collaboration and community engagement in driving AnVIL’s development and future growth, providing resources to customize your journey with AnVIL.

1.3 Skills Level

Genetics

Novice: no genetics knowledge needed

Programming skills

Novice: no programming experience needed

1.4 Learning Objectives

  1. Understand the benefits of using cloud computing for genomic data analysis and the limitations of traditional data-sharing paradigms.

  2. Identify the main features and tools available on the AnVIL platform and their potential applications in genomics research.

  3. Learn how to navigate and access a Workspace, the starting point for a basic analysis using the AnVIL tools.

  4. Identify potential use cases for the AnVIL platform in your own research projects.

  5. Learn about additional resources to self-guide your discovery on AnVIL.