2 Week 1: Navigating
2.1 Learning Objectives
By the end of this session, you should be able to:
- Navigate and copy data to the different filesystems available at Fred Hutch.
- Explain the difference between absolute and relative file paths.
- Set permissions on and execute a bash script
- Execute scripts written in Python and R on the command line
- Find help on the system and on the web
2.2 Exercises
Open up the exercises here or in Google Classroom.
2.4 Setting Yourself Up for Success
Make sure you:
I will demo how to connect to rhino using the Scicomp On Demand dashboard. This site has a handy “Rhino Shell Access” menu item under “Clusters”.
When you are scripting, I suggest you open two terminal windows: the first one is for editing scripts, and the second one is for running scripts on the command line.
So now we have logged into rhino. Now what?
2.5 Grabbing Stuff from GitHub
For the rest of today’s exercises, we’ll be grabbing the scripts from GitHub using git clone.
git clone https://github.com/fhdsl/bash_for_bio
This will create a folder called bash_for_bio/ in our current directory. This directory has all of the course materials, including the scripts.
bash_for_bio/
Throughout this course, I expect you to run code in the base bash_for_bio/ folder, not in scripts or in data. All of the code is tested with this in mind.
If you are having problems executing the code, please make sure you are in the base bash_for_bio folder, or adjust your file paths when running the script.
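A quick way to confirm that you are in the base folder before running anything is to check the name of the current directory. This is a minimal sketch; the helper name in_course_root is hypothetical and is not part of the course scripts.

```shell
# Hypothetical helper: succeeds only when the current directory is named bash_for_bio.
in_course_root() {
  [ "$(basename "$PWD")" = "bash_for_bio" ]
}

if in_course_root; then
  echo "OK: you are in $PWD"
else
  echo "Not in bash_for_bio/ (currently in $PWD); cd there first"
fi
```

If the check fails, cd into bash_for_bio/ (or adjust the file paths in the command you are running) and try again.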
2.5.1 du: How much space?
One of the first things we can do is check disk usage with the du command. If I run du by itself on the command line, it will report the disk usage of every folder, at every depth, under our current directory, which is a lot of output.
There is an option called -d that lets us specify the depth. -d 1 will give us only the sizes of the top-level folders in our directory. Adding -h prints the sizes in human-readable units such as K and M.
Make sure you are in the bash_for_bio/ directory. Then try the following command:
du -d 1 -h .
Here are the first few lines of my du output within the bash_for_bio folder:
240K ./_extensions
192K ./.quarto
616K ./scripts
1.9M ./data
8.6M ./.git
6.7M ./docs
10M ./images
30M .
If we want du to scan only a single folder, we can give the folder name.
du -d 1 -h scripts
And I will get the following output:
144K scripts/week1
56K scripts/__pycache__
128K scripts/week3
232K scripts/week2
616K scripts
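To see which folders take up the most space, one common approach is to pipe the du output into sort. The sketch below builds a small throwaway directory (the names small and big are made up) so that it can be run anywhere. The -h option of sort assumes GNU sort, which is what most Linux systems provide.

```shell
# Build a throwaway directory tree so the example is self-contained.
demo=$(mktemp -d)
mkdir "$demo/small" "$demo/big"
head -c 1000 /dev/zero > "$demo/small/a.txt"      # about 1 KB
head -c 2000000 /dev/zero > "$demo/big/b.txt"     # about 2 MB

# -d 1 limits the depth and -h prints human-readable sizes;
# sort -h then orders those sizes from smallest to largest.
du_sorted=$(du -d 1 -h "$demo" | sort -h)
echo "$du_sorted"

# Clean up the throwaway directory.
rm -r "$demo"
```

In the course folder, the equivalent command would be du -d 1 -h . | sort -h.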
2.6 FH users: the main filesystems
When working on the Fred Hutch HPC, there are four main filesystems you should consider:
- /home/ - The home filesystem. Your scripts can live here. It is also where your configuration files (such as .bashrc) live. Can be accessed using ~/.
- /fh/fast/ (also known as fast) - Research storage. Raw files and processed results should live here.
- /hpc/temp/ (also known as temp) - The temporary filesystem. This filesystem is faster to access for gizmo nodes on the cluster, so files can be copied to it for computation. The output files you generate should be moved back into an appropriate folder on /fh/fast/. Note that files on /hpc/temp/ will be deleted after 30 days.
- /fh/regulated/ - A secure filesystem meant for NIH regulated data. If you are processing data that is regulated under the current NIH guidelines, you will process it here.
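The copy, compute, and move-back pattern described for fast and temp might look like the following sketch. So that it runs anywhere, it simulates the two filesystems with temporary folders. On the cluster you would point FAST_DIR and TEMP_DIR (both hypothetical variable names) at your own folders under /fh/fast/ and /hpc/temp/.

```shell
# Stand-ins for folders on /fh/fast/ and /hpc/temp/ (hypothetical locations).
FAST_DIR=$(mktemp -d)
TEMP_DIR=$(mktemp -d)
printf 'ACGT\nTTGA\n' > "$FAST_DIR/reads.txt"   # pretend raw data

# 1. Copy the input to the temporary filesystem for computation.
cp "$FAST_DIR/reads.txt" "$TEMP_DIR/"

# 2. Do the work on the temporary copy (here: count lines).
wc -l < "$TEMP_DIR/reads.txt" > "$TEMP_DIR/line_count.txt"

# 3. Move the result back to research storage and remove the temporary copy.
mv "$TEMP_DIR/line_count.txt" "$FAST_DIR/"
rm "$TEMP_DIR/reads.txt"

cat "$FAST_DIR/line_count.txt"
```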
So, how do we utilize these filesystems? We will be running commands like this:
- 1: Load the bwa software
- 2: Start bwa mem (aligner)
- 3: Path of genome index
- 4: Path of paired-end read files
- 5: Path of output
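A command with the five parts listed above might be laid out as in the sketch below. The module name, every path, and every file name are placeholders made up for illustration, not the course's actual values. The commands are only printed rather than run, because bwa is available only after its module is loaded on the cluster.

```shell
# All values below are hypothetical placeholders; substitute your own.
bwa_module="BWA"                                   # (1) module to load
genome_index="/path/to/reference/genome.fa"        # (3) genome index
reads_1="/fh/fast/my_lab/raw/sample_R1.fastq.gz"   # (4) paired-end reads
reads_2="/fh/fast/my_lab/raw/sample_R2.fastq.gz"
output_sam="/hpc/temp/my_lab/sample.sam"           # (5) output

# Print the two commands instead of running them (a dry run).
echo "module load $bwa_module"
bwa_cmd="bwa mem $genome_index $reads_1 $reads_2 > $output_sam"   # (2) the aligner
echo "$bwa_cmd"
```

Notice that every file argument is an absolute path, which is why the difference between absolute and relative paths matters here.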
To understand the above, we first have to familiarize ourselves with absolute vs. relative paths.
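As a small illustration of the difference, the sketch below uses a throwaway folder layout (the names project, data, and reads.txt are made up). An absolute path starts at the root (/) and works from any directory, while a relative path is resolved against whatever directory you are currently in.

```shell
# Throwaway folder layout (names are made up for this example).
base=$(mktemp -d)
mkdir -p "$base/project/data"
touch "$base/project/data/reads.txt"

# An absolute path starts at the root (/) and works from any directory.
abs_path="$base/project/data/reads.txt"

# A relative path is resolved against the current directory.
cd "$base/project"
ls data/reads.txt        # works: we are inside project/
ls "$abs_path"           # works from anywhere

cd "$base"
ls data/reads.txt 2>/dev/null || echo "relative path no longer resolves from $PWD"
ls "$abs_path"           # still works
```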
2.6.1 More about the FH Filesystems
2.6.2 Try it out
What are the permissions for the GitHub repo (bash_for_bio) that you just downloaded?
2.7 Running a Bash Script
Ok, now we have a bash script tell_the_time.sh in our current directory, how do we run it?
Because the script is not on our $PATH (Section 15.3.2), we’ll need to use ./ to execute it. ./ is an alias for the current folder, and it is an indicator to bash that the command we want to execute is in our current folder.
./tell_the_time.sh
If we haven’t set the permissions (Section 1.4) correctly, we’ll get this message:
bash: ./tell_the_time.sh: Permission denied
But if we have execute access, we’ll get something like this:
Fri Jul 11 13:27:47 PDT 2025
This is the current date and time.
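The sketch below walks through the permission fix on a stand-in script. The file my_time.sh is a made-up example that just calls date; I am assuming that the course's tell_the_time.sh behaves similarly.

```shell
workdir=$(mktemp -d)
cd "$workdir"

# Create a small script; new files are normally not executable.
printf '#!/bin/bash\ndate\n' > my_time.sh

# Without execute permission, running it fails with "Permission denied".
./my_time.sh 2>/dev/null || echo "cannot run yet: no execute permission"

# Give the owner (u) execute (x) permission, then run the script.
chmod u+x my_time.sh
ls -l my_time.sh
time_output=$(./my_time.sh)
echo "$time_output"
```

After chmod, the ls -l listing should show an x in the owner's permissions.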
2.8 Running an R or Python Script on the command line
2.8.1 Loading the fhR or fhPython modules
Before we can run our scripts in R or Python, we’ll need to load up either R or Python on the cluster. We can do this with the module load command:
- 1: Load up the fhR module - has R and most packages installed.
- 2: Load up the fhPython module - has Python and most packages installed.
We’ll talk more about software modules next week (Section 3.5).
2.8.2 R Users
You might not be aware that there are multiple ways to run R:
- as an interactive console, which is what we usually use in an IDE such as RStudio
- on the command line using the Rscript command.
Rscript my_r_script.R
To run this script, we’ll need to first load fhR:
module load fhR
Rscript my_r_script.R
module purge
2.8.3 Python Users
Python users are much more aware that you can run Python scripts on the command line:
python3 my_python_script.py
To execute this on gizmo, we’ll first need to load fhPython:
module load fhPython
python3 my_python_script.py
module purge
Within a Python script, you can also use a shebang (Section 3.4) as the first line to give the location of python3:
#!/bin/python3
Once the script has execute permission, you can then run it directly:
./my_python_script.py
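Here is a minimal sketch of the shebang approach from start to finish, assuming python3 is available on your PATH (on the cluster, after loading fhPython). The file hello.py is a made-up script, and #!/usr/bin/env python3 is a common alternative to hard-coding a location such as /bin/python3.

```shell
workdir=$(mktemp -d)
cd "$workdir"

# hello.py is a made-up script; its first line is the shebang.
# /usr/bin/env looks up python3 on the PATH instead of using a fixed location.
printf '#!/usr/bin/env python3\nprint("hello from python")\n' > hello.py

chmod u+x hello.py        # the script needs execute permission
py_output=$(./hello.py)   # no "python3" prefix needed now
echo "$py_output"
```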
2.9 Getting Help
You may have heard about man pages. You can usually get help by using the man command:
man wcThis is the first part of the output:
NAME
wc – word, line, character, and byte count
SYNOPSIS
wc [--libxo] [-Lclmw] [file ...]
DESCRIPTION
The wc utility displays the number of lines, words, and bytes contained in each input file, or standard input (if no file is
specified) to the standard output. A line is defined as a string of characters delimited by a ⟨newline⟩ character.
Characters beyond the final ⟨newline⟩ character will not be included in the line count.
A word is defined as a string of characters delimited by white space characters. White space characters are the set of
characters for which the iswspace(3) function returns true. If more than one input file is specified, a line of cumulative
counts for all the files is displayed on a separate line after the output for the last file.
The following options are available:
--libxo
Generate output via libxo(3) in a selection of different human and machine readable formats. See xo_parse_args(3)
for details on command line arguments.
-L Write the length of the line containing the most bytes (default) or characters (when -m is provided) to standard
output. When more than one file argument is specified, the longest input line of all files is reported as the value
of the final “total”.
I personally find man pages very hard to read, especially when there are lots of options for a command.
Instead, I use tldr, which contains examples of the most commonly used options for a command. It is not installed on gizmo, but you can use the page at https://tldr.inbrowser.app/, which has all of the tldr help pages.
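Besides man and tldr, there are a couple of other places to look for help on the system itself. The sketch below assumes a bash shell and GNU versions of the core utilities (as on most Linux systems); the --help line prints nothing on systems where wc does not support that option.

```shell
# man prints the full manual page (paged; press q to quit):
#   man wc

# Many GNU commands print a shorter summary with --help.
wc --help 2>/dev/null | head -n 5

# Commands built into bash (cd, echo, ...) are documented by "help".
help_text=$(help cd)
echo "$help_text" | head -n 3

# A tldr-style example: count the lines coming from a file or stream.
line_count=$(printf 'one\ntwo\nthree\n' | wc -l)
echo "$line_count"
```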
2.10 Recap
We learned how to do the following this week:
- Navigate and copy data to the different filesystems available at Fred Hutch.
- Set permissions on and execute a bash script.
- Find help on the system.