Bash for Bioinformatics

Author

Ted Laderas

Published

October 2, 2025

Bash for Bio

Who this course is for

  • Have you needed to align a folder of FASTA files but not known how to do it?
  • Do you want to automate an R or Python script you wrote to work on a bunch of files?
  • Do you want to do all of this on a high performance cluster (HPC)?

If so, this course is for you! We will learn enough bash scripting to do useful things on the Fred Hutch computing cluster (affectionately called “gizmo”) and automate the boring parts.
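
As a taste of what "automating the boring parts" can look like, here is a minimal sketch of running the same step on every FASTA file in a folder. The demo folder, the file names, and the `count_sequences` function are all hypothetical stand-ins made up for illustration; in real work the function body would be replaced by a call to an aligner or to your own R or Python script.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical demo data: a temporary folder holding two tiny FASTA files.
demo_dir="$(mktemp -d)"
trap 'rm -rf "$demo_dir"' EXIT   # remove the demo folder when the script exits
printf '>seq1\nACGT\n>seq2\nGGCC\n' > "$demo_dir/sample_a.fasta"
printf '>seq1\nTTAA\n' > "$demo_dir/sample_b.fasta"

# Stand-in for a real per-file step (an aligner, an R script, a Python script).
# Here it only counts FASTA header lines, i.e. the number of sequences.
count_sequences() {
  grep -c '^>' "$1"
}

# Loop over every FASTA file in the folder and run the step on each one.
for fasta_file in "$demo_dir"/*.fasta; do
  sample_name="$(basename "$fasta_file" .fasta)"
  echo "${sample_name}: $(count_sequences "$fasta_file") sequence(s)"
done
```

The loop is the same whether the folder holds two files or two thousand, which is the main reason to script the task rather than run each file by hand.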

Learning Objectives

  • Apply bash scripting to run alignments and Python/R scripts
  • Navigate and process data on the different filesystems available at FH
  • Manage software dependencies reproducibly using container-based technologies such as Docker/Apptainer containers or EasyBuild modules
  • Articulate basic HPC architecture concepts and why they’re useful in your work
  • Leverage bash scripting to run jobs on a high performance cluster
  • Utilize workflow managers such as Cromwell to process multiple files in a multi-step workflow
Note: Wasn’t there another Bash for Bioinformatics book?

I originally wrote a book that was called Bash for Bioinformatics, which was about learning enough bash to use the cloud-based DNANexus platform effectively.

I have renamed that book Bash for DNANexus, and named this course Bash for Bioinformatics.

This book shares bones with Bash for DNANexus, but focuses more on running tasks on high performance computing (HPC) systems.

Prerequisites

We will assume that you will do all of your work in your home directory on Rhino. We will not be using that much space in your home directory.

Note: Terminology

We know that not all of us have the same vocabulary, so we try to define terminology as much as possible. Defined terms are indicated by double underlines.

You can click and hold on a term to see its definition.

Instructors / TAs

If you need to schedule some time to talk, please schedule with Ted.

  • Ted Laderas (Main Instructor), Director of Training and Community, Office of the Chief Data Officer
  • Taylor Firman (TA), Research Informatics Lead, Office of the Chief Data Officer
  • Scott Chamberlain (TA), Software Developer, Office of the Chief Data Officer
  • Chris Lo (TA), Data Science Trainer, Office of the Chief Data Officer

We all have experience running jobs on HPC. Please reach out if you have any questions.

Introductions

In chat, please introduce yourself:

  • Your Name & Your Group
  • What you want to learn in this course
  • Favorite Fall activity

Culture of the course

  • Learning on the job is challenging
    • I will move at the learners’ pace; we are learning together.
    • We teach not for mastery, but for empowerment to learn effectively.

We sometimes struggle with our data science in isolation, unaware that someone two doors down from us has gone through the same struggle.

  • We learn and work better with our peers.
  • Know that if you have a question, other people will have it.
  • Asking questions is our way of taking care of others.

We ask you to follow Participation Guidelines and Code of Conduct.

Please note that this is the first time this course has been given. We have done our best to edit out all mistakes, but some may remain, so please be patient and reach out if something isn’t working.

If you do find a mistake, please report it to Ted. I’ll add you to the acknowledgements below.

Schedule

Class is on Thursdays, 12:00 - 1:30 PM. There will be an office hour 1/2 hour after class if you need help.

You should complete the readings before class for weeks 3 and 4, so we can hit the ground running.

| Week | Date | Topics | Reading |
|------|------|--------|---------|
| Preclass | | Review | Intro to Command Line and Cluster 101 |
| Week 1 | October 9 | Filesystem Basics | Bite Size Bash |
| Week 2 | October 16 | Writing and Running Bash Scripts | Bite Size Bash |
| No Class | October 23 | OCDO Retreat | |
| Week 3 | October 30 | Batch Processing and HPC Jobs | HPC Basics |
| Week 4 | November 6 | Testing Scripts/Workflow Managers | Container Basics |
| On your own time | | Testing Scripts | |
| On your own time | | Configuring your Bash Shell | |

Reference Texts

  • We will be using Julia Evans’s Bite Size Bash as our reference text. Julia’s explanations are incredibly clear, and it will be a valuable reference even beyond this course. The PDF is available in the Google Classroom materials. Please do not share it with others: we have a group rate, and it is only $12 for individual purchases.
  • If you want to know the true power of the command line, I recommend Data Science at the Command Line. This book showcases how much you can get done with just the command line.

Badge of completion

We offer a badge of completion when you finish the course!

What it is:

  • A display of what you accomplished in the course, shareable in your professional networks such as LinkedIn, similar to online education services such as Coursera.
  • A way for you to be accountable for your learning.

What it isn’t:

  • Accreditation through a university or degree-granting program.

Requirements:

  • Sign up on the badging spreadsheet (we will send the link out in class).
  • Complete badge-required sections of the exercises for 3 out of 4 assignments. We’ll cover this in class.

Acknowledgements

This course would not be live without the efforts of:

  • Emma Bishop
  • Scott Chamberlain
  • Taylor Firman
  • Chris Lo
  • Sitapriya Moorthi
  • Dan Tenenbaum

Thank you for all your help.