Chapter 1 Background
One critical aspect of an undergraduate STEM education is hands-on research. Undergraduate research experiences reinforce what students learn in the classroom and increase their interest in pursuing STEM careers (Russell, Hancock, and McCullough 2007). They can also improve scientific reasoning and overall academic performance (Buffalari et al. 2020). However, many students at underresourced institutions such as community colleges, Historically Black Colleges and Universities (HBCUs), tribal colleges and universities, and Hispanic-serving institutions have limited access to research opportunities compared to their peers at larger four-year colleges and R1 institutions. These students are also more likely to belong to groups that are already underrepresented in STEM disciplines, particularly genomics and data science (Canner et al. 2017; GDSCN 2022).
The BioDIGS Project aims to be at the intersection of genomics, data science, cloud computing, and education.
1.1 What is genomics?
Genomics broadly refers to the study of genomes. A genome is an organism's complete set of DNA, including both genes and non-coding regions. Traditional genomics involves sequencing and analyzing the genomes of individual organisms or species.
Metagenomics expands genomics to look at the collective genomes of entire communities of organisms in an environmental sample, like soil. It allows researchers to study not just the genes of organisms that can be cultured or isolated in the lab, but all of the genetic material present in a given environment. By using genomic techniques to survey soil microbes, we can identify a far broader range of the organisms living in the soil, including microbes that have never been described before.
We are doing both traditional genomics and metagenomics as part of BioDIGS.
1.2 What is data science?
Data science is an interdisciplinary field that uses scientific methods, processes, algorithms and systems to extract knowledge and insights from structured and unstructured data. It includes collecting, cleaning, and combining data from multiple databases, exploring data and developing statistical and machine learning models to identify patterns in complex datasets, and creating tools to efficiently store, process, and access large amounts of data.
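To make the cleaning and combining steps more concrete, here is a minimal sketch in plain Python. It assumes two hypothetical data sources: one listing where each soil sample was collected, and one listing a measured value for each sample. The sample IDs, field names, and numbers are made up for illustration and are not BioDIGS data. The sketch normalizes inconsistent sample IDs, drops a missing measurement, joins the two sources on sample ID, and then summarizes the result by group.

```python
# Hypothetical toy records from two separate sources, keyed by sample ID.
# All IDs, field names, and values are invented for illustration only.
site_info = [
    {"sample_id": "S01 ", "site_type": "urban"},  # stray whitespace
    {"sample_id": "s02", "site_type": "rural"},   # inconsistent case
    {"sample_id": "S03", "site_type": "urban"},
]
measurements = [
    {"sample_id": "S01", "lead_ppm": "41.0"},
    {"sample_id": "S02", "lead_ppm": "12.5"},
    {"sample_id": "S03", "lead_ppm": ""},  # missing measurement
]


def clean_id(raw):
    """Normalize a sample ID by stripping whitespace and upper-casing it."""
    return raw.strip().upper()


def combine(sites, values):
    """Join the two sources on cleaned sample ID, skipping missing values."""
    lead = {}
    for row in values:
        value = row["lead_ppm"].strip()
        if value:  # ignore empty measurements instead of guessing a value
            lead[clean_id(row["sample_id"])] = float(value)
    merged = []
    for row in sites:
        sid = clean_id(row["sample_id"])
        if sid in lead:  # keep only samples present in both sources
            merged.append(
                {"sample_id": sid, "site_type": row["site_type"], "lead_ppm": lead[sid]}
            )
    return merged


def mean_by_group(rows, group_key, value_key):
    """Compute the mean of value_key for each distinct value of group_key."""
    totals = {}
    for row in rows:
        total, count = totals.get(row[group_key], (0.0, 0))
        totals[row[group_key]] = (total + row[value_key], count + 1)
    return {group: total / count for group, (total, count) in totals.items()}


merged = combine(site_info, measurements)
print(merged)
print(mean_by_group(merged, "site_type", "lead_ppm"))
```

Real projects typically use dedicated tools (for example, data frame libraries in R or Python) for these steps, but the underlying ideas of standardizing identifiers, handling missing values, and joining on a shared key are the same.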
1.3 What is cloud computing?
Cloud computing means using the internet to access powerful computing resources like storage, servers, databases, networking tools, and specialized software. Instead of buying and maintaining their own powerful computers, storage servers, and other systems, users pay to use these resources over an internet connection as needed. Users pay only for what they use, when they use it, while professionals update and maintain the systems in large data centers. Cloud computing is particularly useful for researchers and students at smaller institutions with limited local computing resources, especially when working with large, complex datasets.
The genome assembly and analyses for BioDIGS have been done using the NHGRI AnVIL cloud computing platform, as well as Galaxy.
1.4 Why soil microbes?
It can be challenging to include undergraduates in human genomic and health research, especially in a classroom context. Both human genetic data and human health data are protected, which limits what information students can access without undergoing specialized ethics training. However, the same data cleaning and analysis methods used for human genomic data are also used for microbial genomic data, which does not carry the same legal protections. This makes microbial genomic data ideal for training undergraduate students at the beginning of their careers, and it can help prepare them for future research in human genomics and health (Jurkowski, Reid, and Labov 2017). Additionally, soil microbes can have big impacts on our health (Brevik and Burgess 2014).
1.5 Heavy metals and human health
Human activities that change the landscape can also change what sorts of inorganic and abiotic compounds are found in the soil, in particular increasing the amount of heavy metals (Yan et al. 2020). When cars drive on roads, compounds from exhaust, oil, and other fluids can settle onto the pavement and be washed into the soil. When we put salt on roads, parking lots, and sidewalks, the salts are eventually washed away and enter the ecosystem through both water and soil. Chemicals from factories and other businesses also leach into our environment. Previous research has shown that soils in areas with more human activity, like cities, contain higher concentrations of heavy metals than soils in rural areas with small populations (Khan et al. 2023; Wang, Birch, and Liu 2022). Increased heavy metal concentrations also disproportionately affect lower-income and predominantly minority areas (Jones et al. 2022).
Research suggests that increased heavy metal concentration in soils has major impacts on the soil microbial community. In particular, increased heavy metal concentration is associated with an increase in soil bacteria that have antibiotic resistance markers (Gorovtsov, Sazykin, and Sazykina 2018; Nguyen et al. 2019; Sun, Xu, and Fan 2021).