8  Appendix: Configuring your Shell

8.1 .bashrc: Where do I put my configuration?

There is a file in your home directory called .bashrc. This is where you can customize the way the Bash shell behaves.

There are 2 things you should know how to set:

  • Aliases
  • Environment Variables, especially $PATH

8.2 When does the bashrc file get loaded?

You can force reloading of the .bashrc file in your current shell session by running source ~/.bashrc.

8.3 An example .bashrc file
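
As a minimal sketch, a .bashrc that sets the two things listed in 8.1 might look like the following. The alias and directories are the examples used elsewhere in this appendix; treat the specific paths as placeholders for your own.

```shell
# ~/.bashrc (illustrative sketch; the paths are placeholders)

# Aliases: shortcuts for commands you type often
alias ll='ls -la'

# Environment variables: export makes them visible to programs you run
export SAMTOOLS_PATH="/home/tladera2/miniconda/bin/"

# Add a directory to the end of the executable search path
export PATH="$PATH:/home/tladera2/samtools/"
```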

8.4 Aliases

Aliases are shortcuts for commands. You can specify them using alias as a line in your .bashrc file:

alias ll='ls -la'

Here we define an alias called ll that runs ls -la (a long listing of all files in a directory, including hidden ones). Once the alias is defined, typing ll runs ls -la.

Some people even add aliases for things they mistype frequently.
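
A short sketch of defining, inspecting, and removing aliases. The gti alias is a hypothetical example of a typo alias, not something this appendix prescribes.

```shell
# Define the alias from above, plus a hypothetical one for a common typo
alias ll='ls -la'
alias gti='git'

# Show how one alias is defined
alias ll          # prints: alias ll='ls -la'

# Remove an alias you no longer want
unalias gti
```

Running alias with no arguments lists every alias currently defined in your shell.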

8.5 Environment Variables

Environment variables are named values that are passed from a process (such as your shell) to every program it launches, so they are visible across executables on a Linux (or Windows) system.

You can get a list of all set environment variables by using the env command. Here’s an example from my own system:

env
SHELL=/bin/bash
NVM_INC=/home/tladera2/.nvm/versions/node/v21.7.1/include/node
WSL_DISTRO_NAME=Ubuntu
NAME=2QM6TV3
PWD=/home/tladera2
LOGNAME=tladera2
[....]

One common environment variable you may have seen is $JAVA_HOME, which points to the location of the Java Development Kit (JDK). (I usually encounter it when a software application yells at me because I haven’t set it.)

You can check whether an environment variable is set, and see its value, using echo:

echo $PATH
/home/tladera2/.local/bin:/home/tladera2/gems/bin:/home/tladera2/.nvm/versions/node/v21.7.1/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/ [....]
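
echo prints an empty line both when a variable is unset and when it is set to an empty value. If you need to tell the two apart in a script, one option is printenv, sketched below. The helper name report_var is hypothetical, not a standard command.

```shell
# report_var is a hypothetical helper name, not a standard command.
report_var() {
    # printenv prints one variable's value and returns a non-zero
    # exit status when that variable is not in the environment.
    if value=$(printenv "$1"); then
        echo "$1 is set to: $value"
    else
        echo "$1 is not set"
    fi
}

report_var PATH
report_var JAVA_HOME
```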

8.5.1 Setting Environment Variables

In Bash, we use the export command to declare an environment variable. For example, if we wanted to declare the environment variable $SAMTOOLS_PATH we’d do the following:

# works: note no spaces
export SAMTOOLS_PATH="/home/tladera2/miniconda/bin/"

One thing to note is that spacing matters when you declare environment variables. For example, this won’t declare the $SAMTOOLS_PATH variable:

# won't work because of spaces
export SAMTOOLS_PATH = "/home/tladera2/miniconda/bin/"
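
The reason is that Bash splits the line on spaces, so export receives = and the path as separate arguments and rejects them as invalid variable names. A small sketch that tries both forms; the broken one runs inside a subshell so that any error stays contained:

```shell
# Broken form: the spaces turn "=" and the path into separate arguments.
if ( export SAMTOOLS_PATH = "/home/tladera2/miniconda/bin/" ) 2>/dev/null; then
    echo "export with spaces succeeded"
else
    echo "export with spaces failed"
fi

# Correct form: no spaces around the equals sign.
export SAMTOOLS_PATH="/home/tladera2/miniconda/bin/"
echo "SAMTOOLS_PATH is now: $SAMTOOLS_PATH"
```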

Another thing to note is that we declare environment variables differently than we use them. If we wanted to use SAMTOOLS_PATH in a script, we use a dollar sign ($) in front of it:

${SAMTOOLS_PATH}/samtools view -c $input_file

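In this case, the value of $SAMTOOLS_PATH will be expanded (substituted) to give the overall path. Because the value we set ends with a slash, the expanded path contains a doubled slash, which is treated the same as a single one:

/home/tladera2/miniconda/bin//samtools view -c $input_file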

8.5.2 A Very Special Environment Variable: $PATH

The most important environment variable is the $PATH variable. It determines which directories the shell searches for software executables (also called binaries). If you have software installed by a package manager (such as miniconda), you may need to add the location of your executables to your $PATH.

We can add more directories to the $PATH by appending to it. You might have seen the following bit of code in your .bashrc:

export PATH=$PATH:/home/tladera2/samtools/

In this line, we are adding the path /home/tladera2/samtools/ to the end of our $PATH environment variable. Note that we refer to the variable differently on each side of the equals sign: PATH on the left, where we assign to it, and $PATH on the right, where we read its current value.

Order matters in your $PATH
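
Directories in $PATH are searched from left to right, and the first matching executable is the one that runs. A minimal sketch demonstrating this with two throwaway directories (the command name greet and the directory names are hypothetical):

```shell
# Build two throwaway directories, each holding a different "greet" script.
demo_dir=$(mktemp -d)
mkdir "$demo_dir/first" "$demo_dir/second"
printf '#!/bin/sh\necho "greet from first"\n'  > "$demo_dir/first/greet"
printf '#!/bin/sh\necho "greet from second"\n' > "$demo_dir/second/greet"
chmod +x "$demo_dir/first/greet" "$demo_dir/second/greet"

# Put both directories at the front of PATH: "first" comes before
# "second", so its copy of greet is the one that gets found.
PATH="$demo_dir/first:$demo_dir/second:$PATH"
winner=$(greet)
echo "$winner"       # prints: greet from first

# Show which file the name resolved to
command -v greet

# Clean up the throwaway directories
rm -r "$demo_dir"
```

This is why appending a directory to $PATH puts it last in line: if an executable with the same name already exists in an earlier directory, that earlier copy still takes precedence.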

TLDR: We declare the variable using export PATH (no dollar sign) and we append to the variable using $PATH (with dollar sign). This is something that trips me up all the time.

For FH Users

In general, when you use environment modules on gizmo, you do not need to modify your $PATH variable. You mostly need to modify it when you are compiling executables so that the system can find them. Be sure to use which to see where the executable provided by an environment module is actually located:

which samtools

8.5.3 Making your own environment variables

One of the difficulties with working on a cluster is that your scripts may be in one filesystem (/home/), and your data might be in another filesystem (/fh/fast/). You may also be advised to copy files to a faster-access filesystem (/fh/temp/) before processing them.

You can set your own environment variables for use in your own scripts. For example, we might define a $TCR_FILE_HOME variable:

export TCR_FILE_HOME=/fh/fast/my_tcr_project/

to save us some typing across our scripts. We can use this new environment variable like any other existing environment variable:

#!/bin/bash
export my_file_location=$TCR_FILE_HOME/fasta_files/

.bashrc versus .bash_profile

Ok, what’s the difference between .bashrc and .bash_profile?

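The main difference is when these two files are sourced. .bash_profile is read by login shells (for example, when you log in over ssh), and .bashrc is read by interactive shells that are not login shells (for example, when you open a new terminal window or start bash from an existing session).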

.bashrc should contain the environment variables that you use all the time, such as $PATH and $JAVA_HOME.

You can get the best of both worlds by including the following line in your .bash_profile:

source ~/.bashrc

That way, everything in the .bashrc file is loaded when you log in interactively.
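
A slightly more defensive variant, sketched here as a common pattern rather than a requirement, checks that the file exists before loading it, so that logging in does not produce an error on a machine without a .bashrc:

```shell
# In ~/.bash_profile: load ~/.bashrc only if it exists and is readable
if [ -r "$HOME/.bashrc" ]; then
    . "$HOME/.bashrc"    # "." is the portable spelling of "source"
fi
```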

8.6 Project/folder based workflows

On a particular machine, using absolute paths is safe. However, you do this at the cost of portability - code that you write on one machine may not run on another.

If you ever anticipate doing the analysis on a separate machine, using project structures with relative paths is the safest approach. For example, you may want to move from the on-premises FH system to working with the data in AWS.

Here’s one example of putting everything into a single folder:

my_project
├── data
│   ├── chr1.fa.gz
│   ├── chr2.fa.gz
│   └── chr3.fa.gz
├── results
├── run_workflow.sh
└── scripts
    └── run_bowtie.sh

In the above example, our project is named my_project, and there are three folders inside it: data/, results/, and scripts/. Our main script is my_project/run_workflow.sh. Because this script sits in the project's root folder, when we run it from that folder it can refer to data/, results/, and scripts/ with relative paths:

./scripts/run_bowtie.sh data/*.fa.gz results/

When we run run_workflow.sh, it will execute run_bowtie.sh on all of the files in data/ and save the output in results/, resulting in the following updated structure:

my_project
├── data
│   ├── chr1.fa.gz
│   ├── chr2.fa.gz
│   └── chr3.fa.gz
├── results
│   ├── chr1.bam
│   ├── chr2.bam
│   └── chr3.bam
├── run_workflow.sh
└── scripts
    └── run_bowtie.sh
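
One caveat: relative paths are resolved against the directory you run a script from, not the directory the script lives in. A common guard is to have the script change into its own folder first. The sketch below builds a throwaway copy of the layout to show the effect. The contents of run_workflow.sh are not shown in this appendix, so the script body here is an assumption, with ls standing in for the real bowtie step.

```shell
# Build a throwaway copy of the layout above (file contents are stand-ins).
project=$(mktemp -d)/my_project
mkdir -p "$project/data" "$project/results" "$project/scripts"
touch "$project/data/chr1.fa.gz" "$project/data/chr2.fa.gz"

# Stand-in run_workflow.sh: change into the script's own folder first,
# then refer to data/ with a relative path.
cat > "$project/run_workflow.sh" <<'EOF'
#!/bin/bash
cd "$(dirname "$0")" || exit 1
ls data/*.fa.gz
EOF
chmod +x "$project/run_workflow.sh"

# Call the script from a completely different directory:
# data/ still resolves, because the script moved to its own folder.
found=$(cd / && "$project/run_workflow.sh")
echo "$found"

# Clean up the throwaway project
rm -r "$(dirname "$project")"
```

Without the cd line, the same call from / would look for /data/*.fa.gz instead of the project's data/ folder.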

You may have seen relative paths such as ../another_directory/. The .. means to go up one directory in the file hierarchy, and then look there for another_directory/. I try to avoid using relative paths like these.

For portability and reproducibility, use relative paths that navigate down from your project folder, and avoid ones like ../../my_folder that navigate up and out of it.

Why This is Important

When you start executing scripts, it’s important to know where the results go. When you execute SAMtools on a file in /fh/temp/, for example, where does the output go?

Workflow runners such as Cromwell and Nextflow write output into particular file structures by default. This can be changed, but knowing the default behavior is super helpful when you don’t specify an output directory.