Data Science Lab Course Catalog

This section contains a list of self-service training materials and other courses.

Self-service Training Materials Overview

Course Name | Description | Link(s)
Cluster 101 | Intro to using the Fred Hutch HPC cluster for new or experienced users. Provides a certification option. | With Certification or Without Certification
Developing WDL Workflows | How to strategically develop and scale up a WDL workflow that is iterative, reproducible, and efficient in terms of time and resources used. | Course Link
Code Review | Leading a lab with novice or experienced code writers and users? Either way, our Code Review guidance materials include helpful suggestions for various types of lab members, levels of expertise, and group dynamics. | Course Link
NIH Data Sharing | A guide, under active development, that walks you through the process of complying with the new 2023 NIH Data Sharing Policy. | Guide Link

Cluster 101

  • If you want a certification, please take the course through Leanpub at this link.

  • If you do not need the certification, or want to bookmark the course for future reference, you can find the material at this link.

If you take this course and want to give us feedback or learn more about it, you can share your thoughts in the #ask-dasl Slack channel or file an issue on the course’s GitHub repository.

Introduction to Git and GitHub

You will learn how to use Git, a version control system that is widely used for reproducible and collaborative research. You will use Git from the command line to document the history of your code, create different versions of your code, and collaborate with others on your code using GitHub!

Target audience: Researchers who want to keep track of the history of their code to a professional standard and share it with others.

Prerequisites: Completion of Intro to Command Line, or demonstrated competency.

Commitment: A 1.5-hour workshop.

Workshop content can be found here.

Intermediate R

You will continue to learn the fundamentals of R and programming, and work on a data science project from start to finish. You will learn how to load and clean messy data, use custom R packages and functions, and effectively scale up your analysis.

Target audience: This course is appropriate for those who understand the basics of R data analysis and want to expand their knowledge to tackle messy data and use custom tools.

Prerequisites: Completion of Intro to R, or knowledge of subsetting vectors and data frames and performing simple analyses such as summarizing data.

Commitment: Six weekly 1.5-hour classes, with 1-2 hours of additional practice encouraged each week.

Course content can be found here.

Developing WDL Workflows

This course is intended for first-time developers of the WDL workflow language who want to iteratively develop a WDL bioinformatics workflow. It shows a bioinformatics workflow developer how to strategically develop and scale up a WDL workflow that is iterative, reproducible, and efficient in terms of time and resources used. The guide applies regardless of where your data is stored, what computing resources you use, and what software your workflow runs. To use this guide, you should understand introductory WDL syntax and be able to run a WDL workflow on a computing engine of your choice, such as Cromwell, miniWDL, or a cloud computing environment such as Terra, AnVIL, or Dockstore.

Code Review

Leading a lab with novice or experienced code writers and users? Either way, see our Code Review materials, which include helpful suggestions for various types of lab members, levels of expertise, and group dynamics.

If you take this course and want to give us feedback or learn more about it, you can share your thoughts in the #ask-dasl Slack channel or file an issue on the course’s GitHub repository.

NIH Data Sharing

We have created, and are actively developing, a guide you can find here that walks you through the process of complying with the new 2023 NIH Data Sharing Policy. We have also created the DMS Helper App to make filling in and downloading your data sharing plan easier.

If you take this course and want to give us feedback or learn more about it, you can share your thoughts in the #ask-dasl Slack channel or fill out our Google Feedback Form.

Full List of Self-Service Training Resources

FH DaSL staff have developed many training resources as part of various collaborations and projects. These previously developed resources span a wide range of data science and tool-specific topics and are available from the sources listed below.

(in alphabetical order)

AnVIL

AnVIL is a computing platform that enables researchers to analyze controlled-access datasets in a cloud computing environment. It offers extensive training materials to support its users.

Code Review Guidance for Research Labs

Leading a lab with novice or experienced code writers and users? Either way, see our Code Review materials, which include helpful suggestions for various types of lab members, levels of expertise, and group dynamics.

DataTrail

The DataTrail courses are free and designed to help people with less familiarity with computers and technology become savvy data scientists. They cover technical data science fundamentals, as well as networking and other skills needed for jobs in data science.

ITCR Training Network

The ITCR Training Network is an effort to catalyze cancer informatics research through training opportunities. It offers online courses that are available for free and/or for certification. It also hosts synchronous training events and workshops related to data science in cancer research. Links to all current ITCR courses can be found here.

Johns Hopkins Data Science Courses

We developed many helpful data science resources as part of our work at Johns Hopkins. These courses cover various applications and tools of data science and mostly focus on using R and the Tidyverse.

Open Case Studies

The Open Case Studies project can be used by educators and learners alike to help people learn how to apply data science to real-life data.