Data Science Courses

Spring Quarter Courses and Workshops

The Fred Hutch Data Science Lab (DaSL) is excited to launch its second year of biomedical data science training and learning communities! At DaSL, we believe that everyone, regardless of their educational background, can excel at data science.

For Spring Quarter of 2025, we are offering in-person and online courses and workshops in the topics listed below. Each course or workshop will have learning community sessions to extend your skills.

Note that the topics change each quarter.

For more information about the other courses we offer, see our Course Catalog.

Full List of Spring Quarter Courses and Workshops

This is a list of the course/workshop offerings for Spring Quarter.

Topic | Name | Type
Data Science Programming | Intro to R | Course
Data Science Programming | Intro to Python | Course
Data Science Programming | Intro to SQL/Big Data | Course
Data Science Programming | Genomic Analysis with Bioconductor | Course
Data4All | Better Plots | Workshop
Data4All | Better Spreadsheets | Workshop
Data4All | AI for Coding | Workshop
Scalable Computing | Intro to Command Line | Workshop
Scalable Computing | Cluster 101 | Workshop

Course Descriptions and Details for Spring Quarter

Note that every course and workshop that is still open has a registration link in its description. We maintain a waiting list for each course and workshop.

Location or Teams information and other details will be made available after registration.

Data Science Programming

Intro to R

Information
Type: Course
Dates: April 22, April 29, May 6, May 13, May 20, May 27
Time: Tuesdays 11:30 AM-1:00 PM PT
Time Commitment: 6 weeks of classes
Audience: Researchers who want to do more with their data analyses and visualizations. This course is appropriate for those who want to learn coding for the first time, or have explored programming and want to focus on fundamentals in R.
Registration: Register Link

In this course, you will learn the fundamentals of R, a statistical programming language, and use it to wrangle data for analysis and visualization. The programming skills you learn will help you continue learning R independently and will transfer to other high-level languages such as Python. At the end of the class, you will reproduce an analysis from a scientific publication!

Learning Objectives

  • Analyze tidy datasets in the R programming language via data wrangling, summary statistics, and visualization.
  • Describe how the R programming environment interprets complex expressions made out of functions, operations, and data structures, in a step-by-step way.
  • Apply problem-solving strategies to debug broken code.

Introduction to Python

Information
Type: Course
Dates: April 22, April 29, May 6, May 13, May 20, May 27
Time: Tuesdays 2:00-3:30 PM PT
Time Commitment: 6 weeks of classes and optional data-thon
Audience: Researchers who want to do more with their data analyses and visualizations. This course is appropriate for those who want to learn coding for the first time, or have explored programming and want to focus on fundamentals in Python.
Registration: Register Link
Course Website: Website Link

In this course, you will learn the fundamentals of Python, a general-purpose programming language, and use it to wrangle data for analysis and visualization. The programming skills you learn will help you continue learning Python independently and will transfer to other high-level languages such as R. At the end of the class, you will reproduce an analysis from a scientific publication!

Learning Objectives

  • Analyze tidy datasets in the Python programming language via data subsetting, joining, and transformations.
  • Evaluate summary statistics and data visualization to understand scientific questions.
  • Describe how the Python programming environment interprets complex expressions made out of functions, operations, and data structures, in a step-by-step way.
  • Apply problem-solving strategies to debug broken code.

Intro to SQL

Information
Type: Course
Dates: April 24, May 8, 15, 22
Time: Thursdays 1:00-2:30 PM PT
Time Commitment: 4 weeks of classes
Audience: Researchers and clinical support staff who need to extract and work with relevant data stored in the OMOP data model.
Registration: Register Link

The data we need to query is often stored in data sources such as databases or data warehouses. In this course, you will learn how to connect to and query databases using Structured Query Language (SQL). In particular, we will focus on querying data in OMOP, a commonly used data model for storing patient data. By the end of this course, you will be prepared to construct complex queries that retrieve large data sets and to automate these queries to produce reports and dashboards.

Learning Objectives

  • Explain data sources such as databases and how to connect to them
  • Query data sources using database engines and Structured Query Language (SQL) to filter, join, summarize, and update data
  • Explain the OMOP data model and how it enables clinical data queries
  • Schedule queries to pull data from data sources on a regular basis
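
To give a flavor of the filter, join, and summarize operations named in the objectives, here is a minimal sketch using Python's built-in sqlite3 module and an in-memory database. The two tables borrow the names person and condition_occurrence from OMOP, but the columns are cut down and every row and concept ID is invented for illustration. It is not course material and does not reflect how Fred Hutch data sources are set up.

```python
import sqlite3

# In-memory toy database; all rows and concept IDs are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (person_id INTEGER PRIMARY KEY, year_of_birth INTEGER);
CREATE TABLE condition_occurrence (
    condition_occurrence_id INTEGER PRIMARY KEY,
    person_id INTEGER,
    condition_concept_id INTEGER
);
INSERT INTO person VALUES (1, 1950), (2, 1985), (3, 1992);
INSERT INTO condition_occurrence VALUES
    (10, 1, 111), (11, 1, 222), (12, 2, 111), (13, 3, 333);
""")

# Filter (WHERE), join (JOIN ... ON), and summarize (GROUP BY + COUNT):
# how many people born before 1990 have each condition concept?
query = """
SELECT co.condition_concept_id, COUNT(DISTINCT p.person_id) AS n_people
FROM person AS p
JOIN condition_occurrence AS co ON co.person_id = p.person_id
WHERE p.year_of_birth < 1990
GROUP BY co.condition_concept_id
ORDER BY co.condition_concept_id;
"""
rows = conn.execute(query).fetchall()
for concept_id, n_people in rows:
    print(concept_id, n_people)
conn.close()
```

The same SELECT statement would work against other SQL engines with little change; only the connection step is specific to SQLite here.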

Genomic Analysis with Bioconductor

Information
Type: Course
Dates: April 23, 30, May 7, 14, 21, 28
Time: Wednesdays 1:00-2:30 PM PT
Time Commitment: 6 weeks of classes
Audience: Researchers who want to learn the basics of genomic data analysis using genomic count data
Prerequisite: Intro to R or equivalent
Registration: Register Link

This course will introduce you to the basic data structures and genomic data analysis in R/Bioconductor. Specifically, we will focus on the basics of bulk RNAseq analysis, including differential expression, annotation, and gene set analysis. We will also cover loading data and metadata into data structures such as SummarizedExperiment. By the end of this course, you should be familiar with a basic RNAseq analysis workflow that starts from RNAseq count data.

Please note that this course requires the Intro to R course as a prerequisite, or an equivalent course with instructor permission. It also does not cover RNAseq workflow steps such as alignment or MultiQC.

Learning Objectives

  • Explain and utilize Bioconductor data structures such as SummarizedExperiment to integrate metadata and assay data in your analysis
  • Explore and clean an RNAseq dataset, including QC analysis
  • Perform differential expression analysis on an RNAseq dataset using Bioconductor packages
  • Identify and annotate gene sets for downstream analysis
  • Load data from RNAseq experiments into Bioconductor

Data4All

We all work with data in different ways. The DaSL Data4All workshops give you an opportunity to learn about data-related topics that are immediately applicable to your current position. There are no prerequisites for these workshops. Everyone is welcome.

Attend multiple sessions and earn your Data4All badge to show others at FH and beyond that you work with data ethically and collaboratively.

Each Data4All workshop includes a list of DaSL training and resources to extend your own knowledge base.

Better Plots

Information
Type: Workshop
Dates: April 21
Time: Monday 2:00-3:30 PM PT
Time Commitment: 1 session
Audience: Anyone who wants to communicate more effectively with plots
Registration: Registration Link

Do you want your graphs and plots to be more effective in communicating your results to others? Come learn about the principles of data storytelling with visualizations. Data storytelling is the art of communicating your message about data to others. Techniques such as decluttering, annotating, and highlighting can make your visualizations more accessible and communicative.

This is a software-agnostic workshop, focusing on essential principles that can be applied to any visualization software. Hands-on examples will be demonstrated in both R and Python.

Learning Objectives

  • Utilize design principles to effectively present plots by decluttering and removing extraneous information
  • Utilize annotations and titles to get people to your conclusions faster
  • Utilize preattentive attributes and color to effectively highlight important information in your plots

Better Spreadsheets

Information
Type: Workshop
Dates: May 12
Time: Monday 2:00-3:30 PM PT
Time Commitment: 1 session
Audience: Researchers who want to collaborate more effectively with Excel tables
Registration: Registration Link

Do you want to take your work with spreadsheets and tables to the next level, and make it easy to collaborate with others? Come learn about tidy principles to format and organize your data.

Demonstrations and examples will be done via Google Sheets.

Learning Objectives

  • Explain and utilize tidy principles to effectively organize your data
  • Format your data to effectively utilize it in analyses
  • Collaborate with data scientists by outputting data formats such as Comma Separated Values (CSVs)
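
One way to picture the tidy principle of one row per observation is to reshape a "wide" sheet (one column per time point) into a "long" one. The sketch below does this in Python using only the standard library and writes the result as CSV. The sample names, column names, and values are hypothetical, and the helper wide_to_long is made up for this illustration; the workshop itself uses Google Sheets rather than code.

```python
import csv
import io

# A "wide" sheet: one row per sample, one column per time point (made-up values).
wide_csv = """sample,day_1,day_7
A,0.5,1.2
B,0.7,1.9
"""

def wide_to_long(text, id_column, var_name, value_name):
    """Return one row per (id, variable) pair instead of one column per variable."""
    long_rows = []
    for row in csv.DictReader(io.StringIO(text)):
        for column, value in row.items():
            if column != id_column:
                long_rows.append(
                    {id_column: row[id_column], var_name: column, value_name: value}
                )
    return long_rows

long_rows = wide_to_long(wide_csv, "sample", "timepoint", "measurement")

# Write the tidy table back out as CSV text that could be shared with an analyst.
out = io.StringIO()
writer = csv.DictWriter(
    out, fieldnames=["sample", "timepoint", "measurement"], lineterminator="\n"
)
writer.writeheader()
writer.writerows(long_rows)
tidy_csv = out.getvalue()
print(tidy_csv)
```

In the long form, each row holds a single measurement, so adding a new time point means adding rows rather than a new column.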

AI for Coding

Information
Type: Workshop
Dates: May 19
Time: Monday 2:00-3:30 PM PT
Time Commitment: 1 session
Audience: Researchers and clinical staff who want to use AI in their data work
Registration: Registration Link

AI tools such as Claude and ChatGPT are powerful aids for coders of any level. In this workshop, we will focus on ways to work with AI tools in a reproducible and ethical manner, especially in light of protected health information (PHI). We will experiment with the AI tools available at Fred Hutch.

Learning Objectives

  • Explain restrictions at Fred Hutch governing the ethical and responsible use of AI for coding
  • Explore the range of coding abilities of LLMs using test data
  • Describe an ethical framework for integrating LLMs in your coding work

Scalable Computing

Interested in working with the Fred Hutch computational resources, such as the gizmo cluster or the rhino machines? Come attend this series of workshops to build your skills in working with these computational resources.

A prerequisite for all workshops is to request an account with access to the FH network: directions are here.

Introduction to Command Line

Information
Type: Workshop
Dates: April 28
Time: Monday 2:00-3:30 PM PT
Prerequisites: None
Time Commitment: 1 session
Audience: Researchers who want to utilize command-line tools such as Git/GitHub
Registration: Registration Link
Course Website: Website

Fluency in programming and data science requires using computer software from the Command Line, a text-based way of controlling the computer. You will go on a guided under-the-hood tour behind the graphical interface we typically use: you will learn how to interact with and manipulate files, folders, and software via the Command Line.

Learning Objectives

  • Describe when it is appropriate to use the Command Line and its pros and cons.
  • Analyze the components of a shell command: what are the possible inputs, arguments/options, and outputs, and where to find documentation for help.
  • Formulate directory tree addresses of interest using full and relative paths and file directory commands.
  • Formulate file operation commands for creating, moving, and deleting files, including using wildcards.

Cluster 101

Information
Type: Workshop
Dates: May 5
Time: Monday 2:00-3:30 PM PT
Time Commitment: 1 session
Prerequisites: Intro to Command Line or equivalent experience
Audience: Researchers who want to use scientific software launched from the command line, want to use a high-performance cluster computing environment, or want to use a cloud computing environment.
Registration: Registration Link

Many scientific computing tasks cannot be done locally on a personal computer due to constraints in computation, data, and memory. In this workshop, you will learn how to connect to the Fred Hutch SLURM high-performance cluster, transfer files, load scientific software, compute interactively, and launch jobs!

Learning Objectives

  • Describe the architecture and filesystems on FH’s cluster.
  • Utilize the command line to log in to the FH cluster and submit a simple job to run.
  • Understand different ways of requesting resources for a cluster job and utilize them for job submission.
  • Utilize already-installed software in job submissions and log in to interactive mode.
  • Describe how one would upload and download files from the FH cluster.