Bash for Bioinformatics

Author

Ted Laderas

Published

July 9, 2025

Bash for Bio

Who this course is for

  • Have you needed to align a folder of FASTA files and not known how to do it?
  • Do you want to automate an R or Python script you wrote to work on a bunch of files?
  • Do you want to do all of this on a high performance computing (HPC) cluster?

If so, this course is for you! We will learn enough bash scripting to do useful things on the Fred Hutch computing cluster (affectionately called “gizmo”) and automate the boring parts.
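As a taste of what "a bunch of files" looks like in practice, here is a minimal sketch of a bash loop that runs one command per FASTA file in a folder. The folder, the file names, and the `summarize_fasta_dir` function are made up for illustration, and counting header lines stands in for a real aligner or an R/Python script call.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical demo folder with two tiny FASTA files (illustration only).
workdir=$(mktemp -d)
printf '>seq1\nACGT\n' > "$workdir/sample_a.fasta"
printf '>seq2\nGGCC\n>seq3\nTTAA\n' > "$workdir/sample_b.fasta"

# Run one command per FASTA file in a directory.
# Counting '>' header lines is a stand-in for a real per-file task.
summarize_fasta_dir() {
  local dir=$1
  local fasta n_records
  for fasta in "$dir"/*.fasta; do
    # grep -c exits non-zero when the count is 0, so guard it under 'set -e'.
    n_records=$(grep -c '^>' "$fasta" || true)
    echo "$(basename "$fasta"): $n_records record(s)"
  done
}

summarize_fasta_dir "$workdir"
```

Run as written, this prints `sample_a.fasta: 1 record(s)` followed by `sample_b.fasta: 2 record(s)`. Swapping the `grep` line for a different command is the basic pattern for batch-processing a folder.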

Learning Objectives

  • Apply bash scripting to run alignments and Python/R scripts
  • Navigate and process data on the different filesystems available at FH
  • Articulate basic HPC architecture concepts and why they’re useful in your work
  • Leverage bash scripting to execute jobs on a high performance cluster
  • Utilize workflow managers such as Cromwell to process multiple files in a multi-step workflow
  • Manage software dependencies reproducibly using container-based technologies such as Docker/Apptainer containers or EasyBuild modules

Wasn’t there another Bash for Bioinformatics book?

I originally wrote a book that was called Bash for Bioinformatics, which was about learning enough bash to use the cloud-based DNANexus platform effectively.

I have renamed that book Bash for DNANexus, and named this course Bash for Bioinformatics.

This book shares bones with Bash for DNANexus, but focuses more on running tasks on high performance computing (HPC) systems.

Prerequisites

We will assume that you will do all of your work in your home directory on Rhino. We will not be using that much space in your home directory.

Terminology

We know that not all of us have the same vocabulary, so we try to define terminology as much as possible. Defined terms are marked with a double underline.

You can click and hold on a term to see its definition.

Schedule

You should have completed the readings before class, so we can hit the ground running.

Week | Topics | Reading
Preclass | Review | Intro to Command Line and Cluster 101
Week 1 | Filesystem Basics | Bite Size Bash
Week 2 | Writing and Running Bash Scripts | Bite Size Bash
Week 3 | Batch Processing and HPC Jobs | HPC Basics
Week 4 | Testing Scripts/Workflow Managers | Container Basics
On your own time | Testing Scripts |
On your own time | Configuring your Bash Shell |

Reference Texts

  • We will be using Julia Evans’s Bite Size Bash as our reference text. Julia’s explanations are incredibly clear, and it will be a valuable reference even beyond this course. You will receive the PDF as part of class, and I will refer to it throughout the course.
  • If you want to know the true power of the command line, I recommend Data Science at the Command Line. This book showcases how much you can get done with just the command line.