About this Book

This is a companion training guide for BioDIGS, a GDSCN project that brings a research experience into the classroom. In this module, students will explore microbiome data for the presence of biosynthesis genes. They will run analyses using antiSMASH as implemented on Galaxy.

Landing page for the Biosynthetic Gene Cluster Activity.

Visit the BioDIGS (BioDiversity and Informatics for Genomics Scholars) website for more information about this collaborative, distributed research project, including how you can get involved!

The GDSCN (Genomics Data Science Community Network) is a consortium of educators working toward a world where researchers, educators, and students from diverse backgrounds can fully participate in genomic data science research. More information about its mission and initiatives is available on the GDSCN website.

0.1 Skill Level

The activities in this guide are written for undergraduate students and beginning graduate students.

Genetics
Beginner: some genetics knowledge needed

Programming skills
Novice: no programming experience needed

0.2 Platform

The activities in this guide are demonstrated on NHGRI’s AnVIL cloud computing platform. AnVIL is the preferred computing platform for the GDSCN. However, all of these activities can be done using your personal installation of R or using the online Galaxy portal.

Please check out our full collection of AnVIL and related resources: https://hutchdatascience.org/AnVIL_Collection/

0.3 Data

The data generated by the BioDIGS project is available through the BioDIGS website, as well as through an AnVIL workspace.

Data about the soil itself, including its metal content, was generated by the Delaware Soil Testing Program at the University of Delaware. Sequences were generated by the Johns Hopkins University Genetic Resources Core Facility and by PacBio.