Bash for Bioinformatics
Who this course is for
- Have you needed to align a folder of FASTA files and not known how to do it?
- Do you want to automate an R or Python script you wrote to work on a bunch of files?
- Do you want to do all of this on a high performance cluster (HPC)?
If so, this course is for you! We will learn enough bash scripting to do useful things on the Fred Hutch computing cluster (affectionately called “gizmo”) and automate the boring parts.
Learning Objectives
- Apply bash scripting to run alignments and Python/R scripts
- Navigate and process data on the different filesystems available at FH
- Manage software dependencies reproducibly using container-based technologies such as Docker/Apptainer containers or EasyBuild modules
- Articulate basic HPC architecture concepts and why they’re useful in your work
- Leverage bash scripting to execute on a high performance cluster.
- Utilize workflow managers such as `cromwell` to process multiple files in a multi-step workflow.
I originally wrote a book that was called Bash for Bioinformatics, which was about learning enough bash to use the cloud-based DNANexus platform effectively.
I have renamed that book Bash for DNANexus, and named this course Bash for Bioinformatics.
This book shares bones with Bash for DNANexus, but has more of a focus on running tasks on high performance computing (HPC) systems.
Prerequisites
- You will need an account on `rhino` and need to know how to connect to it through VPN. If you have taken the Intro to Fred Hutch Cluster Computing workshop, then you will be ready.
- We highly recommend reviewing Intro to Command Line and Intro to Fred Hutch Cluster Computing.
- Basic knowledge of the following commands:
  - `ls`
  - `cd` and basic directory navigation
  - `mv`/`cp`/`mkdir`/`rm`
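As a quick self-check of the commands listed above, here is a minimal sketch you could run in any bash shell. It works inside a throwaway directory created with `mktemp -d`, so nothing in your home directory is touched. The directory name `project` and the file names are hypothetical placeholders, not course materials.

```shell
# Work in a temporary directory so no real files are affected.
workdir=$(mktemp -d)
cd "$workdir"

mkdir project                   # make a new directory (hypothetical name)
cd project                      # move into it
echo ">seq1" > sample.fasta     # create a tiny placeholder file
cp sample.fasta copy.fasta      # copy a file
mv copy.fasta renamed.fasta     # rename (move) the copy
ls                              # list the directory contents
rm renamed.fasta                # delete the renamed copy
cd ..                           # go back up one level
```

If each of these lines makes sense to you, you have the command-line background this course assumes.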
We will assume that you will do all of your work in your home directory on Rhino. We will not be using that much space in your home directory.
We know that not all of us have the same vocabulary, so we try to define terminology as much as possible. Defined terms are indicated by double underlines. You can click and hold on a term to see its definition.
## Instructors / TAs
If you need to schedule some time to talk, please schedule with Ted.
- Ted Laderas (Main Instructor), Director of Training and Community, Office of the Chief Data Officer
- Taylor Firman (TA), Research Informatics Lead, Office of the Chief Data Officer
- Scott Chamberlain (TA), Software Developer, Office of the Chief Data Officer
- Chris Lo (TA), Data Science Trainer, Office of the Chief Data Officer
We all have experience running jobs on HPC. Please reach out if you have any questions.
Introductions
In chat, please introduce yourself:
- Your Name & Your Group
- What you want to learn in this course
- Favorite Fall activity
Culture of the course
- Learning on the job is challenging
- I will move at the learners' pace; we are learning together.
- Teach not for mastery, but teach for empowerment to learn effectively.
We sometimes struggle with our data science in isolation, unaware that someone two doors down from us has gone through the same struggle.
- We learn and work better with our peers.
- Know that if you have a question, other people will have it.
- Asking questions is our way of taking care of others.
We ask you to follow Participation Guidelines and Code of Conduct.
Please note that this is the first time this course has been given. We have done our best to edit out mistakes, but some may remain. Please be patient and reach out if something isn’t working.
If you do find a mistake, please report it to Ted. I’ll add you to the acknowledgements below.
Schedule
Class is on Thursdays, 12:00 - 1:30 PM. There will be an office hour 1/2 hour after class if you need help.
You should complete the readings before class for weeks 3 and 4, so we can hit the ground running.
| Week | Date | Topics | Reading |
|---|---|---|---|
| Preclass | | Review Intro to Command Line and Cluster 101 | |
| Week 1 | October 9 | Filesystem Basics | Bite Size Bash |
| Week 2 | October 16 | Writing and Running Bash Scripts | Bite Size Bash |
| No Class | October 23 | OCDO Retreat | |
| Week 3 | October 30 | Batch Processing and HPC Jobs | HPC Basics |
| Week 4 | November 6 | Testing Scripts/Workflow Managers | Container Basics |
| On your own time | | Testing Scripts | |
| On your own time | | Configuring your Bash Shell | |
Reference Texts
- We will be using Julia Evans’s Bite Size Bash as our reference text. Julia’s explanations are incredibly clear and it will be a valuable reference even beyond this course. The PDF is available in the Google Classroom materials. Please do not share with others - we have a group rate and it is only $12 for individual purchases.
- If you want to know the true power of the command line, I recommend Data Science at the Command Line. This book showcases how much you can get done with just the command line.
Badge of completion

We offer a badge of completion when you finish the course!
What it is:
- A display of what you accomplished in the course, shareable in your professional networks such as LinkedIn, similar to online education services such as Coursera.
- A way for you to be accountable for your learning.
What it isn’t:
- Accreditation through a university or degree-granting program.
Requirements:
- Sign up on the badging spreadsheet (will send link out in class).
- Complete badge-required sections of the exercises for 3 out of 4 assignments. We’ll cover this in class.
Acknowledgements
This course would not be live without the efforts of:
- Emma Bishop
- Scott Chamberlain
- Taylor Firman
- Chris Lo
- Sitapriya Moorthi
- Dan Tenenbaum
Thank you for all your help.