
Chapter 1 Introduction
In this course, we will explore a variety of tools that support reproducible data analysis across a broad range of fields. The tools we cover may take some time to get used to, but the payoff is substantial. Not only are these skills valuable for career advancement, they will also make your working life easier. The tools will improve your ability to reproduce your work across similar projects, stay organized, and collaborate effectively with others. This course was funded as part of a series of courses in the Training Module for Reproducible Data Science Research project.
1.1 Motivation
Many researchers are self-taught when it comes to computer science, yet data analysis has become a requirement for most of them. The ability to work smoothly in a reproducible manner not only makes for easier, more maintainable workflows, it also improves scientific rigor and transparency.
This course will help you use tools that make your data analysis workflows more organized, more understandable to collaborators (and your future self!), and ultimately more efficient.
1.2 Target Audience
This course is intended for people conducting data analyses at the level of a graduate student or higher. Most of the material is presented at a high level so that it applies to researchers working in a broad range of areas. The course is centered around R, a widely used programming language and environment for statistical analysis.
1.3 Topics Covered
This course will cover organization practices, coding practices, tools, and concepts for making your data analyses more reproducible in R.
We will cover important topics such as version control to track changes in documents over time, coding practices to make your code more transparent and to test your code, and methods for sharing your code and data in efficient and clear ways.
1.4 Curriculum
The course will cover the basics of setting up your projects with tools and practices that make your analyses more reproducible.
We will also point to more advanced topics in other resources.
References will include Gillespie and Lovelace (2021), Riederer (2020), and Timbers, Campbell, and Lee (2022).
Code review references will include “About Scientific Code Review” (n.d.), Radigan (n.d.), Parker (2017), and Bodner (2018).