Chapter 1 Introduction
In this course we will explore a variety of tools that can assist with data analysis in a broad range of fields. The tools we cover may take some time to get used to, but the payoff is substantial. These skills are valuable for career advancement, and they will also make your day-to-day work easier. The tools will help you reproduce your work across similar projects, stay organized, collaborate with others effectively, and more.
1.1 Motivation
Many researchers are self-taught when it comes to computer science, yet data analysis has become a requirement for most of them. Working smoothly in a reproducible manner not only makes for easier, more maintainable workflows; it also improves scientific rigor and transparency.
This course will help you use tools that make your data analysis workflows more organized, more understandable to collaborators (and your future self!), and ultimately more efficient.
1.2 Target Audience
This course is intended for people conducting data analyses at the level of a graduate student or higher. Most of the material is presented at a high level so that it applies to researchers working in a broad range of areas. The course is centered on the R programming language, a widely used environment for statistical computing.
1.4 Learning Objectives
- Implement basic project organization tools:
  - RStudio tips and tricks for efficiency
  - R Markdown to create reports
  - Set up and configure RStudio and RStudio Projects for data analysis (the here package and file structure/paths)
  - Install and configure the ProjectTemplate package for formalizing and automating workflows
- Apply the pointblank package for validation of tabular data
- Write functions and package them
  - Apply the testthat package for building software unit tests
- Set up and use Git repositories for version control of code
- Interface with GitHub to share Git repositories for collaboration; execute GitHub-based workflows:
  - Pull Requests
  - Code review
  - Issues
  - Discussions
References will include Gillespie and Lovelace (2021); Riederer (2020); and Timbers, Campbell, and Lee (2022).
Code review references will include “About Scientific Code Review” (n.d.), Radigan (n.d.), Parker (2017), and Bodner (2018).