Data Science Lab Course Catalog
This section lists training materials from our live trainings, our self-service materials, and our grant-based training projects.
Training Materials from Live Training
We organize our live trainings into several categories.
Data Science Programming
Course Name | Description | Link |
---|---|---|
Intro to R | For researchers who want to get started with data analysis and visualization. This course is appropriate for those who want to learn coding for the first time, or who have explored programming and want to focus on fundamentals in R. | Course link |
Intermediate R | This course continues building programming fundamentals in R. You will learn how to use complex data structures, use custom functions built by other R users, create your own functions, and iterate repeated tasks in a way that scales naturally. | Course link |
Intro to Python | For researchers who want to get started with data analysis and visualization. This course is appropriate for those who want to learn coding for the first time, or who have explored programming and want to focus on fundamentals in Python. | Course link |
Intermediate Python | This course continues building programming fundamentals in Python. You will learn how to use complex data structures, iterate repeated tasks in a way that scales naturally, and create your own functions. | Course link |
Intro to Databases | This course teaches how to connect to and query databases using Structured Query Language (SQL). In particular, we will focus on querying data in OMOP, a commonly used data model for storing patient data. | Course link |
Scalable Computing
Course Name | Description | Link |
---|---|---|
Intro to Command Line | Fluency in programming and data science requires using computer software from the command line, a text-based way of controlling the computer. You will learn how to interact with and manipulate files, folders, and software. | Course link |
Cluster 101 | Fred Hutch maintains a high performance computing cluster specifically to support work that requires intensive computing. You will learn how to log in and launch jobs on the cluster compute system. | Course link |
Reproducible Research
Course Name | Description | Link |
---|---|---|
Intro to Git and GitHub | You will learn how to use Git, a version control system, and GitHub, a platform for hosting Git repositories. Together they are the primary means of doing reproducible and collaborative research. | Course link |
Intermediate Git and GitHub | You will use Git and GitHub from the command line and learn the underlying commands for conducting reproducible and collaborative research. | Course link |
Data4All
Course Name | Description | Link |
---|---|---|
Better Plots | Do you want your graphs and plots to be more effective in communicating your results to others? Come learn about the principles of data storytelling with visualizations. | Course link |
Better Spreadsheets | Do you want to take your work in spreadsheets and with tables to the next level, and make it easy to collaborate with others? Come learn about tidy principles to format and organize your data. | Course link |
Better Tables | Are you working on tables for publication and you'd like to improve their presentation and readability? Join us for a 2-part workshop on presenting data effectively in tables. | Course link |
Self-service Training Materials Overview
Course Name | Description | Link |
---|---|---|
Developing WDL Workflows | How to strategically develop and scale up a WDL workflow in a way that is iterative, reproducible, and efficient in terms of time and resources used. | Course Link |
Code Review | Leading a lab with novice or experienced code writers and users? Either way, our Code Review guidance materials include helpful suggestions for various types of lab members, expertise and group dynamics. | Course Link |
NIH Data Sharing | We have created and are actively developing a guide that walks you through the process of complying with the new 2023 NIH Data Sharing Policy. | Guide Link |
Developing WDL Workflows
This guide is intended for first-time WDL developers who want to iteratively develop a WDL bioinformatics workflow. It shows a bioinformatics workflow developer how to strategically develop and scale up a WDL workflow in a way that is iterative, reproducible, and efficient in terms of time and resources used. The guide applies regardless of where the data is stored, which computing resources are used, and which software is used. To use this guide, you should be able to understand introductory WDL syntax and run a WDL workflow on a computing engine of your choice, such as Cromwell, miniWDL, or a cloud computing environment such as Terra, AnVIL, or Dockstore.
Code Review
Leading a lab with novice or experienced code writers and users? Either way, see our Code Review materials that include helpful suggestions for various types of lab members, expertise and group dynamics.
If you take this course and want to give us feedback or would like to learn more about it, you can share your thoughts in the #ask-dasl channel on Slack or file an issue on the course’s GitHub repository.
NIH Data Sharing
We have created and are actively developing a guide you can find here that walks you through the process of complying with the new 2023 NIH Data Sharing Policy. We have also created the DMS Helper App to make filling in and downloading your data sharing plan easier.
Full List of Self-Service Training Resources
FH DaSL staff have developed many training resources as part of various collaborations and other efforts. These resources span a wide range of data science and tool-specific topics and are available from the sources listed below. For the full list of DaSL-supported resources, go here.
AnVIL
AnVIL is a computing platform that enables researchers to analyze controlled-access datasets in a cloud computing environment. It has loads of training materials to support those using it!
Code Review Guidance for Research Labs
Leading a lab with novice or experienced code writers and users? Either way, see our Code Review materials that include helpful suggestions for various types of lab members, expertise and group dynamics.
DataTrail
The DataTrail courses are free and designed to help those with less familiarity with computers and technology become savvy data scientists. They cover the technical fundamentals of data science as well as networking and other skills needed for jobs in data science.
ITCR Training Network
The ITCR Training Network is an effort to catalyze cancer informatics research through training opportunities. It offers online courses that are available for free and/or for certification, and it also hosts synchronous training events and workshops related to data science in cancer research. Links to all the current ITCR courses can be found here.
Johns Hopkins Data Science Courses
We made many helpful data science resources as part of our work at Johns Hopkins. These courses cover various applications and tools of data science, mostly focused on using R and the Tidyverse.
Open Case Studies
The Open Case Studies project can be used by educators and learners alike to help people learn how to apply data science to real-life data.