Chapter 1 Introduction

In this course, we will explore a variety of tools that support reproducible data analysis for researchers in a broad range of fields. These tools may take some time to get used to, but the payoff is substantial. Not only are these skills valuable for career advancement, they will also make your day-to-day work easier. The tools will help you reproduce your work across similar projects, stay organized, collaborate effectively with others, and more. This course was funded as part of a series of courses in the Training Module for Reproducible Data Science Research project.

1.1 Motivation

Many researchers are self-taught when it comes to computing, yet data analysis has become a requirement for most research. Working reproducibly not only makes workflows easier to run and maintain, it also improves scientific rigor and transparency.

This course will help you use tools that make your data analysis workflows more organized, more understandable to collaborators (and your future self!), and ultimately more efficient.

1.2 Target Audience

This course is intended for people conducting data analyses at the level of a graduate student or higher. Most of the material is presented at a high level so that it applies to researchers working in a broad range of areas. The course is centered on R, a programming language widely used for statistical analysis.

The course is for individuals who are new to working in R or RStudio; are familiar with R but want to make their projects more organized, transparent, and reproducible; want to learn how to make reproducible reports; or want to track changes across projects over time with Git and GitHub.

1.3 Topics Covered

This course will cover organizational practices, coding practices, tools, and concepts for making your data analyses in R more reproducible.

We will cover important topics such as version control to track changes to documents over time, coding practices that make your code more transparent and testable, and methods for sharing your code and data efficiently and clearly.

Concepts discussed in the Tools for Reproducible Workflows in R course include: why R is a great tool for reproducibility; the major practices involved in reproducibility and methods for organizing projects; how to use tools in RStudio to make your work more reproducible; how to make reproducible RMarkdown and Quarto reports; coding practices that make your code more transparent; version control with Git and GitHub to track changes over time and collaborate with others on projects; how to be transparent about software versions; and how to share data and code publicly.

1.4 Curriculum

The course covers the basics of configuring your projects to use tools and practices that make your analyses more reproducible.

We will also point you to other resources for more advanced topics.

Overall Course Learning Objectives. This course will demonstrate how to:
1. Explain best practices for making analyses more reproducible and transparent
2. Use special features in RStudio for efficiency and reproducibility
3. Configure and organize projects for data analysis using the here package and the ProjectTemplate package
4. Create reproducible reports using RMarkdown and Quarto
5. Write custom functions to reuse code
6. Test functions with the testthat package
7. Set up and use Git and GitHub to track changes over time
8. Share data and code publicly

References will include Gillespie and Lovelace (2021); Riederer (2020); and Timbers, Campbell, and Lee (2022).

Code review references will include “About Scientific Code Review” (n.d.), Radigan (n.d.), Parker (2017), and Bodner (2018).