Chapter 5 General Data Analysis Tools

5.1 Learning Objectives

This chapter will demonstrate how to: understand the difference between command line and GUI-based applications; understand what the R and Python languages are; and find links to resources where you can learn R or Python.

5.2 Command Line vs GUI

When using a computer, there are two different ways you can tell a program what you want it to do. You can use a Graphical User Interface (abbreviated as GUI), where you point and click on buttons, or you can use a Command Line Interface, where you type in commands and write scripts that tell the program what you want it to do.

Command Line Interfaces take a bit more time to learn and get used to, but analyses run with them are generally easier to make reproducible, because every step of an analysis can be written in a script. Graphical User Interfaces can be more intuitive and quicker to pick up, but it can be difficult to repeat an analysis in exactly the same way. If you know you will be doing the same analysis many times (with either different or the same samples), it is a good use of your time to learn how to use command line tools. We will discuss some of the most commonly used command line tools here.
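To make the reproducibility point concrete, here is a minimal sketch of a scripted analysis in Python. The script name `summarize.py`, the function names, and the input numbers are all hypothetical and chosen only for illustration. The point is that every step is written down, so it can be rerun the same way.

```python
"""summarize.py: a hypothetical command line script (names and numbers are illustrative)."""
import argparse
import statistics


def summarize(values):
    """Return a few summary statistics for a list of numbers."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
    }


def main(argv=None):
    # Everything the user asks for is typed as part of the command,
    # so the exact same analysis can be repeated later.
    parser = argparse.ArgumentParser(description="Summarize numeric values.")
    parser.add_argument("values", nargs="+", type=float, help="numbers to summarize")
    args = parser.parse_args(argv)

    result = summarize(args.values)
    for name, value in result.items():
        print(f"{name}: {value}")
    return result


# Demonstration call. In a real script you would call main() with no
# arguments so that argparse reads what was typed in the terminal.
main(["2", "4", "9"])
```

With `main()` called without arguments, typing something like `python summarize.py 2 4 9` in a terminal would run the same steps on whatever numbers you supply.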

5.2.1 Bash

Bash is a command language and shell that is available on many computers and used by many programs. Many of the things you do every day by clicking on items on your desktop and in menus can also be done using bash.

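On a Mac computer, you can use bash commands in the Terminal application. Go to your search bar and search for Terminal. (On recent versions of macOS, Terminal's default shell is zsh, which accepts most of the same everyday commands.) You may want to keep this application handy.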

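In Windows, the built-in Command Prompt application uses its own command language rather than bash. To use bash commands in Windows, you can install a tool such as Windows Subsystem for Linux (WSL) or Git for Windows, which includes Git Bash. Once it is installed, go to your search bar and search for it. You may want to keep this application handy.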
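As a rough illustration of doing everyday file tasks with commands instead of clicks, the sketch below carries out a few such tasks in Python, with the comparable bash command noted in each comment. The function name `tidy_example` and the file and folder names are made up for this example, and it works inside a temporary folder so nothing on your computer is changed.

```python
import shutil
import tempfile
from pathlib import Path


def tidy_example(workspace):
    """Do a few everyday file tasks in code instead of by clicking."""
    workspace = Path(workspace)

    # Make a new folder (bash: mkdir -p results)
    results = workspace / "results"
    results.mkdir(parents=True, exist_ok=True)

    # Create a small text file (bash: echo "sample_1" > samples.txt)
    samples = workspace / "samples.txt"
    samples.write_text("sample_1\n")

    # Copy the file into the folder (bash: cp samples.txt results/)
    shutil.copy(samples, results / samples.name)

    # List what is in the folder (bash: ls results)
    return sorted(p.name for p in results.iterdir())


# Work inside a temporary folder that is deleted afterwards.
with tempfile.TemporaryDirectory() as tmp:
    listing = tidy_example(tmp)
    print(listing)
```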

5.2.2 R

R is a programming language and software environment commonly used for statistics and data analysis. It is free and has many packages built for genomics analysis. Many of these packages have been highlighted in this course or are otherwise listed in our tool glossary.

5.2.2.1 Resources for learning R

5.2.2.1.1 R and Tidyverse
5.2.2.1.3 R and Genomics

5.2.3 Python

Python is a general-purpose programming language that is also widely used for data analysis, among many other tasks. It can be a very powerful development tool. Some Python packages have been highlighted in this course or are otherwise listed in our tool glossary.
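For a sense of what a small data analysis looks like in Python, here is a minimal sketch that uses only Python's built-in modules. The gene names, sample names, and expression values are made up for illustration and are not real data. The function name `mean_expression_by_gene` is likewise hypothetical.

```python
import csv
import io
import statistics
from collections import defaultdict

# Made-up measurements for illustration only; not real data.
EXAMPLE_CSV = """gene,sample,expression
geneA,s1,10.0
geneA,s2,14.0
geneB,s1,3.0
geneB,s2,5.0
"""


def mean_expression_by_gene(csv_text):
    """Group rows by gene and return the mean expression for each gene."""
    grouped = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        grouped[row["gene"]].append(float(row["expression"]))
    return {gene: statistics.mean(vals) for gene, vals in grouped.items()}


means = mean_expression_by_gene(EXAMPLE_CSV)
print(means)
```

In practice the table would usually be read from a file, and many analyses rely on dedicated packages rather than the built-in modules used here.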

5.2.3.1 Resources for learning Python