8 Appendix: Configuring your Shell
8.1 .bashrc
: Where do I put my configuration?
There is a file in your home directory called .bashrc
. This is where you can customize the way the Bash shell behaves.
There are 2 things you should know how to set:
- Aliases
- Environment Variables, especially
$PATH
8.2 When does the bashrc
file get loaded?
You can force reloading of the
8.3 An example .bashrc
file
8.4 Aliases
Aliases are shortcuts for commands. You can specify them using alias
as a line in your .bashrc
file:
alias ll='ls -la'
We are defining an alias called ll
that runs ls -la
(long listing for directory for all files) here. Once
Some people even add aliases for things they mistype frequently.
8.5 Environment Variables
Environment variables are variables which can be seen globally in the Linux (or Windows) system across executables.
You can get a list of all set environment variables by using the env
command. Here’s an example from my own system:
env
SHELL=/bin/bash
NVM_INC=/home/tladera2/.nvm/versions/node/v21.7.1/include/node
WSL_DISTRO_NAME=Ubuntu
NAME=2QM6TV3
PWD=/home/tladera2
LOGNAME=tladera2
[....]
One common environment variable you may have seen is $JAVA_HOME
, which is used to find the Java Software Development Kit (SDK). (I usually encounter it when a software application yells at me when I haven’t set it.)
You can see whether an environment variable is set using echo
, such as
echo $PATH
/home/tladera2/.local/bin:/home/tladera2/gems/bin:/home/tladera2/.nvm/versions/node/v21.7.1/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/ [....]
8.5.1 Setting Environment Variables
In Bash, we use the export
command to declare an environment variable. For example, if we wanted to declare the environment variable $SAMTOOLS_PATH
we’d do the following:
# works: note no spaces
export SAMTOOLS_PATH="/home/tladera2/miniconda/bin/"
One thing to note is that spacing matters when you declare environment variables. For example, this won’t declare the $SAMTOOLS_PATH
variable:
# won't work because of spaces
export SAMTOOLS_PATH = "/home/tladera2/miniconda/bin/"
Another thing to note is that we declare environment variables differently than we use them. If we wanted to use SAMTOOLS_PATH
in a script, we use a dollar sign ($
) in front of it:
${SAMTOOLS_PATH}/samtools view -c $input_file
In this case, the value of $SAMTOOLS_PATH
will be expanded (substituted) to give the overall path:
/home/tladera2/miniconda/bin/samtools view -c $input_file
8.5.2 A Very Special Environment Variable: $PATH
The most important environment variable is the $PATH
variable. This variable is important because it determines where to search for software executables (also called binaries). If you have softwware installed by a package manager (such as miniconda
), you may need to add the location of your executables to your $PATH
.
We can add more directories to the $PATH
by appending to it. You might have seen the following bit of code in your .bashrc
:
export PATH=$PATH:/home/tladera2/samtools/
In this line, we are adding the path /home/tladera2/samtools/
to our $PATH
environment variable. Note that how we refer to the PATH
variable is different depending on which side the variable is on of the equals sign.
$PATH
TLDR: We declare the variable using export PATH
(no dollar sign) and we append to the variable using $PATH
(with dollar sign). This is something that trips me up all the time.
In general, when you use environment modules on gizmo
, you do not need to modify your $PATH
variable. You mostly need to modify it when you are compiling executables so that the system can find them. Be sure to use which
to see where the environment module is actually located:
which samtools
8.5.3 Making your own environment variables
One of the difficulties with working on a cluster is that your scripts may be in one filesystem (/home/
), and your data might be in another filesystem (/fh/fast/
). And it might be recommended that you transfer over files to a faster-access filesystem (/fh/temp/
) to process them.
You can set your own environment variables for use in your own scripts. For example, we might define a $TCR_FILE_HOME
variable:
export TCR_FILE_HOME=/fh/fast/my_tcr_project/
to save us some typing across our scripts. We can use this new environment variable like any other existing environment variable:
#!/bin/Bash
export my_file_location=$TCR_FILE_HOME/fasta_files/
.bashrc
versus .bash_profile
Ok, what’s the difference between .bashrc
and .bash_profile
?
The main difference is when these two files are sourced. bash_profile
is used when you do an interactive login, and .bashrc
is used for non-interactive shells.
.bashrc
should contain the environment variables that you use all the time, such as $PATH
and $JAVA_HOME
for example.
You can get the best of both worlds by including the following line in your .bash_profile
:
source ~/.bashrc
That way, everything in the .bashrc
file is loaded when you log in interactively.
8.6 Project/folder based workflows
On a particular machine, using absolute paths is safe. However, you do this at the cost of portability - code that you write on one machine may not run on another.
If you ever anticipate doing the analysis on a separate machine, using project structures with relative paths is the safest. For example, you may want to move from the on-premise FH system to working with the data in AWS.
Here’s one example of putting everything into a single folder:
my_project
├── data
│ ├── chr1.fa.gz
│ ├── chr2.fa.gz
│ └── chr3.fa.gz
├── results
├── run_workflow.sh
└── scripts
└── run_bowtie.sh
In the above example, our project is named my_project
, and there are three folders inside it: data/
, results/
, and scripts/
. Our main script for running is my_project/run_workflow.sh
. Because this script is in the root folder, we can refer to the data/
folder to process files:
./scripts/run_bowtie.sh data/*.fa.gz results/
When we run run_workflow.sh
, it will execute run_bowtie.sh
on all of the files in data/
, and save them in results/
, resulting in the following updated structure.
my_project
├── data
│ ├── chr1.fa.gz
│ ├── chr2.fa.gz
│ └── chr3.fa.gz
├── results
│ ├── chr1.bam
│ ├── chr2.bam
│ └── chr3.bam
├── run_workflow.sh
└── scripts
└── run_bowtie.sh
You may have seen relative paths such as ../another_directory/
- the ..
means to go up a directory in the file hierarchy, and then look in that directory for the another_directory/
directory. I try to avoid using relative paths like these.
In general for portability and reproducibility, you will want to use relative paths within a directory, and avoid using relative paths like ../../my_folder
, where you are navigating up. In general, use relative paths to navigate down.