Chapter 1 Introduction


1.1 Learning Objectives

In this chapter we will:

1.2 Motivation

The Data Science for Environmental Health Short Course (DaSEH), developed at the Fred Hutch Data Science Lab, is an NIH Short Course program that provides open-source teaching materials for introductory environmental health data analyses in R. This guide is intended to help instructors make the most of our resources.

1.3 Reuse and Licensing



All of our materials are licensed under CC BY-NC-SA 4.0.

This means you are free to:

  • Share — copy and redistribute the material in any medium or format
  • Adapt — remix, transform, and build upon the material

Under the following terms:

  • Attribution — You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.
  • NonCommercial — You may not use the material for commercial purposes.
  • ShareAlike — If you remix, transform, or build upon the material, you must distribute your contributions under the same license as the original.

1.4 Target Audience

This guide is intended for educators who are interested in using DaSEH materials for instruction.

Elements of our materials can be helpful for instructors who teach:

  • High school students
  • Undergraduate students
  • Graduate students
  • Professionals (researchers and public health practitioners)

They can also be applied to courses or materials in:

  • Environmental health
  • Data science
  • Statistics
  • Public health
  • Programming
  • Technical writing

… and more

1.5 Curriculum

This guide documents:

  • The DaSEH philosophy
  • The general structure of the short course
  • Various entry points to using the short course materials (including our hands-on lab exercises, homework ideas, and more)
  • Examples of how to use the materials in the classroom
  • Instructions on how to modify/adapt our materials
  • Guidelines for contributing to our materials or creating your own short course

1.6 DaSEH Philosophy

The DaSEH short course is an educational resource that educators can use for environmental health data science education. Our goal is to teach students how to gain insights from data to address real-world environmental health challenges.

Most of DaSEH’s examples use real environmental health data from public sources in an effort to make the material more engaging and relevant.

Our materials focus on the decision making process of real data analyses, basic challenges of analyses, applications of data methods, and context for when to use different approaches. We emphasize best practices for mindful data cleaning, reproducibility, and science communication.

We also focus on troubleshooting and looking up documentation to support continued learning, rather than on memorizing small coding details. For more information on our teaching philosophy, see our paper about teaching R.

All DaSEH materials are open source and available at our GitHub Repository. We are transparent about our data sources and, where possible, about how the data were produced. We also cover data limitations and ethical considerations.

Our materials are modular and flexible enough for standalone/bonus assignments or full classroom use.

We hope you find our resources useful!

1.6.1 What problem are we addressing?

Despite growing interest in data science and environmental health, instructors do not always have time to create related materials. These resources are intended to support instructors who want to bring environmental health data science education to their students. Ultimately, we hope this will facilitate more research and scientific breakthroughs in this important field of public health.

1.7 DaSEH Short Course Resources

Illustration of the DaSEH short course, which has three main stages: (1) synchronous online instruction with hands-on labs in weeks one and two, covering reproducibility, basic R, subsetting, summarizing, cleaning, manipulation, visualization, statistics, and functions; (2) self-directed time for review and for finding data for the codeathon project; and (3) a 3-day in-person codeathon covering GitHub, ethics, and mapping, and applying what was learned earlier to a project. We also evaluate learner experiences and skills before the online course, after the online course, and after the codeathon.

The DaSEH Short Course has 3 main stages: online instruction, self-directed learning, and the codeathon.

1.7.1 Stage 1: Online Instruction

In the first stage, learners participate in a synchronous online course where they watch didactic lectures and live demos on a particular topic, followed by hands-on lab activities.

Course instruction lasts for 9 days, with 3.5 hours each day. Each day includes breaks between sessions (breakdown shown below). Each day’s sessions cover one or two modules, described below.

Online course schedule:

  • 10:30 to 11:30 am: first session
  • 11:30 to 11:40 am: 10-minute break
  • 11:40 am to 12:40 pm: second session
  • 12:40 to 12:50 pm: 10-minute break
  • 12:50 to 2:00 pm: third session

Each session has a lecture followed by a lab to practice exercises relevant to the topic.

We recommend having two instructors and one TA. The instructors split the module lectures between them to:

  • reduce the instruction burden
  • allow learners to ask questions via chat during the lecture without disruption
  • expose learners to slightly different teaching styles and explanations

For labs, we provide learners with 3 options via Zoom Breakout Rooms:

  • Room 1: The instructor gives learners a brief amount of time to try each question and then walks through the exercises with them.
  • Room 2: The instructor gives learners time to work through all exercises. Learners then share their work and challenges, with guidance and troubleshooting from the instructor.
  • Room 3: A room where the teaching assistant helps those who need one-on-one support.

Overview of tasks for the personnel involved: two instructors and one teaching assistant (TA).

  • Lectures: Each instructor teaches half of the lectures and answers chat questions while the other instructor lectures. Meanwhile, the TA looks up additional resources and also answers chat questions.
  • Labs: One instructor gives learners a brief amount of time to try each exercise on their own and then walks through the steps together. The other instructor gives learners more time to try the exercises on their own; learners then share their work and challenges, with the instructor adding details and troubleshooting. The TA runs a lab room where learners can get one-on-one support.

Our modules cover the following topics:

Week one:

  • Reproducibility
  • Basic R
  • Subsetting Data
  • Summarizing Data
  • Cleaning Data

Week two:

  • Manipulating Data
  • Visualization of Data
  • Statistics
  • Functions

In week one:

  • Day 1 covers the course structure, best practices for learning, an overview of necessary jargon, context about R, basic R for simple calculations and assigning objects, how to navigate RStudio, how to test and save code, and how to view data.
  • Day 2 covers reproducibility practices, including cleaning the environment, using reports, and reporting packages and versions, as well as importing data such as CSV files, tab-delimited files, and Excel files.
  • Day 3 covers subsetting data, such as filtering rows and selecting columns.
  • Day 4 covers summarizing data with basic statistics, including by grouping variables, as well as data classes and how to transform them.
  • Day 5, the last day, covers data cleaning, including dealing with missing data, recoding values, and creating new values based on other columns.

In week two:

  • Day 1 covers manipulating data to reshape or join it, as well as using a point-and-click option to create visualizations.
  • Day 2 covers the basics of the popular ggplot2 package, extensions for making interactive, combined, or labeled plots, best practices for visualizing data, and an introduction to factors and how to reorder their levels.
  • Day 3 covers statistics for correlation, t-tests, and regression, followed by data output to save CSV and Rda files.
  • Day 4, the last day, covers writing functions to make work more efficient and to apply tasks to multiple columns of a data frame.

Students have access to our course website, which includes lecture slides, lab exercises, cheatsheets to remind them about important functions, homework assignments, and additional resources.

1.7.2 Stage 2: Self-Directed Review and Project Planning

In this stage, learners can complete homework problems for extra practice, rewatch lecture and lab recordings, and catch up on other responsibilities before the codeathon (Stage 3). This stage lasts at least a week, giving learners time to digest and consolidate what they’ve learned.

Learners are also encouraged to find public data or bring their own, so that they can identify a research question for their project in the codeathon stage. Learners are provided with an example project and a template R Markdown file.

1.7.3 Stage 3: Codeathon

In the final stage of the short course, learners participate in a 3-day codeathon to apply what they learned. Learners choose a topic of interest, get in-person assistance from peers and instructors, ask more in-depth questions, and learn about some more advanced topics in reproducibility and data ethics.

See this codeathon schedule for more details.

1.8 Getting Started

DaSEH materials are designed to be beginner friendly. Learners can work through our materials with no prior knowledge of programming, environmental health, public health, or statistics. However, the appropriate technology and software, along with a basic familiarity with RStudio, are helpful. This section details what learners and instructors will need.

1.8.1 Technical Requirements

Our materials use the R statistical programming language for data analysis. R is available for Windows, Mac, and Linux and can be downloaded from the Comprehensive R Archive Network (CRAN). While there is no specific R version requirement, we recommend that instructors have all learners use the same version (typically the most recent release of R).

Additional R packages are also required, including:

  • tidyverse
  • naniar
  • janitor
  • esquisse
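As a minimal sketch (not part of the official DaSEH setup instructions), the packages above can be installed from within R, and the R and package versions printed so that a class can check that everyone is on the same versions. The CRAN mirror URL used here is an assumption; any CRAN mirror should work.

```r
# Packages this guide lists as required for DaSEH materials
pkgs <- c("tidyverse", "naniar", "janitor", "esquisse")

# Find which of them are not yet installed on this machine
missing_pkgs <- setdiff(pkgs, rownames(installed.packages()))

# Install only the missing ones (the mirror URL is an assumption;
# any CRAN mirror should work)
if (length(missing_pkgs) > 0) {
  install.packages(missing_pkgs, repos = "https://cloud.r-project.org")
}

# Print the R version so learners can confirm everyone is on the same one
print(R.version.string)

# Print the installed version of each required package
print(sapply(pkgs, function(p) as.character(packageVersion(p))))
```

Installing only the missing packages avoids reinstalling (and possibly upgrading) packages that learners already have, which keeps versions stable across a class.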

Cheatsheets detailing how to use common packages and functions are available at the RStudio website.

Please note that to install R version 3.5 and higher you will need to have a web browser and a compatible operating system. See this link for more information.

1.8.2 RStudio

To work with R, we recommend the RStudio Integrated Development Environment. RStudio includes a console and a syntax-highlighting editor that supports direct code execution, as well as tools for plotting, history, debugging, and workspace management. RStudio is available in open-source and commercial editions and runs on the desktop (Windows, Mac, and Linux) or in a browser connected to RStudio Server. More information and installation instructions are available at the RStudio website.

For reproducible data analyses and easy publishing of reports and presentations, we recommend using R Markdown. More information about R Markdown is also available at the RStudio website.

RStudio is one of the most widely used development environments among R programmers. Other options can be explored here.

1.8.3 GitHub

DaSEH is hosted on GitHub. GitHub is a website and cloud service that enables developers to store, manage, and track changes to their code. DaSEH uses GitHub for both development and distribution. Users have complete access to all materials needed for the course through our DaSEH GitHub repository. We use this same repository to host our website for free via GitHub Pages.

The DaSEH GitHub repository includes:

  • Slides (in multiple formats)
  • Data used in the examples (that aren’t installed in R otherwise)
  • Cheatsheets to remind learners of packages
  • Homework and lab files, including both keys and raw files

While experience with GitHub is not needed to access the materials, using GitHub can streamline downloading, modifying, and collaborating with us on our resources. An introduction on how to get started with GitHub is available at Happy Git.

1.9 Website Map

The DaSEH website has several relevant pages:

  1. Home: explains what the course is for: https://daseh.org/index.html

  2. Apply: links to a form to apply to DaSEH. The link to the Google Form changes periodically, but you can view a static version here

  3. Course Content

  4. Support

  5. Instructors

1.10 Feedback

We are continually working to improve the DaSEH resources to address learner and educator needs. Feedback is essential for this goal. If you use DaSEH materials, we would love to hear from you!

1.10.1 Survey

We appreciate feedback about all aspects of the DaSEH user experience, including but not limited to technical challenges, potential improvements, or new ideas on how to use our materials in the classroom. Students, instructors, or anyone engaging with DaSEH can leave open-ended feedback here:

You can also add your name and email to the DaSEH Interest Form to be contacted about upcoming events and opportunities:

1.10.2 Feedback for this Guide

If you have feedback for this guide about how we can improve the content, please comment on our GitHub repository here.

1.11 Contact Us

If you have any questions, suggestions for improvement for individual resources, ideas, or thoughts, you can contact us through email at daseh at fredhutch.org.