BioDIGS: Biosynthetic Gene Clusters
October 16, 2024
About this Book
This is a companion training guide for BioDIGS, a GDSCN project that brings a research experience into the classroom. In this module, students will search microbiome data for biosynthetic gene clusters, running their analyses with antiSMASH as implemented on Galaxy.
Visit the BioDIGS (BioDiversity and Informatics for Genomics Scholars) website for more information about this collaborative, distributed research project, including how you can get involved!
The GDSCN (Genomics Data Science Community Network) is a consortium of educators who aim to create a world where researchers, educators, and students from diverse backgrounds are able to fully participate in genomic data science research. More information about its mission and initiatives is available on the GDSCN website.
0.1 Skill Level
The activities in this guide are written for undergraduate students and beginning graduate students.
Genetics: Beginner (some genetics knowledge needed)
Programming skills: Novice (no programming experience needed)
0.2 Platform
The activities in this guide are demonstrated on NHGRI’s AnVIL cloud computing platform. AnVIL is the preferred computing platform for the GDSCN. However, all of these activities can be done using your personal installation of R or using the online Galaxy portal.
Please check out our full collection of AnVIL and related resources: https://hutchdatascience.org/AnVIL_Collection/
0.3 Data
The data generated by the BioDIGS project is available through the BioDIGS website, as well as through an AnVIL workspace.
Data on the soil itself, as well as on its metal content, was generated by the Delaware Soil Testing Program at the University of Delaware. Sequences were generated by the Johns Hopkins University Genetic Resources Core Facility and by PacBio.