Bash for Bioinformatics
Who this course is for
- Have you needed to align a folder of FASTA files but not known how to do it?
- Do you want to automate an R or Python script you wrote to work on a bunch of files?
- Do you want to do all of this on a high performance computing (HPC) cluster?
If so, this course is for you! We will learn enough bash scripting to do useful things on the Fred Hutch computing cluster (affectionately called “gizmo”) and automate the boring parts.
Learning Objectives
- Apply bash scripting to run alignment tools and Python/R scripts
- Navigate and process data on the different filesystems available at FH
- Articulate basic HPC architecture concepts and why they’re useful in your work
- Leverage bash scripting to execute jobs on a high performance cluster.
- Utilize workflow managers such as `cromwell` to process multiple files in a multi-step workflow
- Manage software dependencies reproducibly using Docker/Apptainer containers or EasyBuild modules
I originally wrote a book called Bash for Bioinformatics, which was about learning enough bash to use the cloud-based DNANexus platform effectively.
I have renamed that book Bash for DNANexus, and named this course Bash for Bioinformatics.
This book shares its bones with Bash for DNANexus, but focuses more on running tasks on high performance computing (HPC) systems.
Prerequisites
- You will need an account on `rhino` and know how to connect to it through the VPN. If you have taken the Intro to Fred Hutch Cluster Computing workshop, you will be ready.
- We highly recommend reviewing Intro to Command Line and Intro to Fred Hutch Cluster Computing.
- Basic knowledge of the following commands:
  - `ls`
  - `cd` and basic directory navigation
  - `mv`/`cp`/`mkdir`/`rm`
We will assume that you do all of your work in your home directory on Rhino. We won't use much space there.
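If you want a quick self-check of the prerequisite commands, here is a short practice session you could type into a terminal on rhino. It is only a sketch: the directory and file names (`bash_practice`, `notes.txt`, `backup.txt`) are hypothetical, and it assumes you don't already have a `bash_practice` directory in your home directory. It also uses `touch`, which simply creates an empty file.

```shell
# Practice session for the prerequisite commands.
# Assumes ~/bash_practice does not already exist (hypothetical name).
cd ~                          # start in your home directory
mkdir -p bash_practice        # make a practice directory
cd bash_practice              # move into it
touch notes.txt               # create an empty file to work with
cp notes.txt notes_copy.txt   # copy the file
mv notes_copy.txt backup.txt  # rename (move) the copy
ls                            # should list backup.txt and notes.txt
rm notes.txt backup.txt       # delete both files
cd ..                         # go back up to your home directory
rmdir bash_practice           # remove the now-empty practice directory
```

If every command runs without an error message and `ls` shows the two files, you are ready for Week 1.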
We know that not all of us share the same vocabulary, so we try to define terminology wherever we can. Defined terms are indicated by a double underline; click and hold on a term to see its definition.
Schedule
Please complete the readings before each class so we can hit the ground running.
Week | Topics | Reading |
---|---|---|
Preclass | Review Intro to Command Line and Cluster 101 | |
Week 1 | Filesystem Basics | Bite Size Bash |
Week 2 | Writing and Running Bash Scripts | Bite Size Bash |
Week 3 | Batch Processing and HPC Jobs | HPC Basics |
Week 4 | Testing Scripts/Workflow Managers | Container Basics |
On your own time | Testing Scripts | |
On your own time | Configuring your Bash Shell | |
Reference Texts
- We will be using Julia Evans’s Bite Size Bash as our reference text. Julia’s explanations are incredibly clear, and it will be a valuable reference even beyond this course. You will receive the PDF as part of class, and I will refer to it throughout the course.
- If you want to know the true power of the command line, I recommend Data Science at the Command Line. This book showcases how much you can get done with just the command line.