Chapter 5 General Data Analysis Tools
5.2 Command Line vs GUI
When using a computer, there are two different ways to tell a program what you want it to do. You can use a Graphical User Interface (abbreviated as GUI), where you point and click on buttons, or you can use a Command Line Interface, where you type in commands and write scripts that tell the program what to do.
Command Line Interfaces take a bit more time to learn and get used to, but analyses done with them are generally easier to make reproducible, because every step of an analysis can be written in a script. Graphical User Interfaces can be more intuitive and quicker to pick up, but it can be difficult to repeat an analysis in exactly the same way. If you know you will be doing the same analysis many times (with either different or the same samples), it is a good use of your time to learn how to use Command Line tools. We will discuss some of the most commonly used Command Line tools here.
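To make the reproducibility point concrete, here is a minimal sketch in Python of an analysis written as a script. The data, the column names (`sample`, `read_count`), and the function names are all hypothetical and chosen only for illustration; a real analysis would read its data from a file. The point is that each step is written down, so running the script again repeats the analysis in exactly the same way.

```python
import csv
import io
import statistics

# Hypothetical example data; a real script would read this from a file.
RAW_DATA = """sample,read_count
A,120
B,95
C,
D,143
"""


def load_rows(text):
    """Step 1: parse the CSV text into a list of dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def drop_missing(rows):
    """Step 2: remove samples that have no recorded read count."""
    return [row for row in rows if row["read_count"].strip()]


def summarize(rows):
    """Step 3: count the remaining samples and average their read counts."""
    counts = [int(row["read_count"]) for row in rows]
    return {
        "n_samples": len(counts),
        "mean_read_count": statistics.mean(counts),
    }


def run_analysis(text):
    """Run every step in a fixed order so the analysis can be repeated."""
    return summarize(drop_missing(load_rows(text)))


if __name__ == "__main__":
    print(run_analysis(RAW_DATA))
```

Because the filtering and summarizing steps live in the script rather than in a series of clicks, the same steps can be rerun on new samples by changing only the input data.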
5.2.1 Bash
Bash is a command language and shell that is available on many computers and used by many programs. Many of the tasks that you might do every day by clicking on items on your desktop and in menus can also be performed using bash.
On a Mac computer, you can use bash commands by finding your Terminal window. Go to your search bar and search for Terminal. You may want to keep this application handy.
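In Windows, you can use the command line by searching for the Command Prompt application. Go to your search bar and search for Command Prompt. Note that Command Prompt uses its own command language rather than bash; to run bash commands on Windows you will need an additional tool such as Windows Subsystem for Linux or Git Bash. You may want to keep this application handy.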
5.2.2 R
R is a programming language commonly used for statistics and data analysis. It’s free and has lots of packages built for genomics analysis. Many of these packages have been highlighted in this course or are otherwise listed in our tool glossary.
5.2.2.1 Resources for learning R
5.2.2.1.1 R and Tidyverse
- Swirl, an interactive tutorial
- R for Data Science
- Tidyverse skills for Data Science by Carrie Wright
- Handy R cheatsheets
- R Cookbook Second Edition
- Advanced R
- R for Epidemiology - has generally good R advice
- O’Reilly books available through Seattle Public Library
5.2.2.1.3 R and Genomics
- Intro to R and Tidyverse course and exercises from the Childhood Cancer Data Lab.
- Refine.bio examples from the Childhood Cancer Data Lab.
- Biostar Handbook: A Beginner’s Guide to Bioinformatics
5.2.3 Python
Python is a programming language that is also used for data analysis, among many other things. It can be a very powerful development tool. Some Python packages have been highlighted in this course, and others are listed in our tool glossary.
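As a small taste of what data analysis in Python can look like, the sketch below computes the fraction of G and C bases in a DNA sequence using only the standard library. The function name `gc_content` and the example sequence are made up for illustration and are not taken from any package mentioned in this course.

```python
from collections import Counter


def gc_content(sequence):
    """Return the fraction of bases in `sequence` that are G or C."""
    # Count each character, treating lower- and upper-case letters alike.
    counts = Counter(sequence.upper())
    total = sum(counts.values())
    if total == 0:
        # Avoid dividing by zero for an empty sequence.
        return 0.0
    return (counts["G"] + counts["C"]) / total


if __name__ == "__main__":
    example = "ATGCGCTA"  # made-up sequence for illustration
    print(f"GC content: {gc_content(example):.2f}")
```

For real genomics work, a dedicated package would usually be a better choice than hand-written code like this.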