Learning Git and GitHub

Who is this workshop for?

  • Those who want a basic understanding of the version control process
  • Those who want to understand how open source collaboration works
  • Those who need to build a mental model

Not for:

  • People who already use Git on the command line (you already have the mental model)
  • Impatient people

Learning Objectives

By the end of this workshop, you will be able to:

  • Explain the reasons we use repositories
  • Explain how version control tracks changes in your work
  • Make a branch in a repository
  • Make a fork of a repository that you don’t own
  • Make contributions to a repository using pull requests

Reminder: Be gentle with yourself and others

  • Participation Guidelines
  • We are all learning together
  • We all learn at different paces
  • Asking questions is a way of taking care of others

Our focus today is on concepts

  • Intermediate Git is much more about the how: the specific commands and tools

Reproducibility and the Research Lifecycle

Benefits of Storing your code in a Repo

  • Centralized code
  • Other people outside your lab can use it
  • The ability to roll back changes that broke your code
  • Recognition for your work
  • Supportive community that can help you learn and improve it

It’s Tough being Open

But it’s also rewarding

Version Control

Version control is a systematic approach to record changes made in a file, or set of files, over time. This allows you and your collaborators to track the history, see what changed, and recall specific versions later when needed.

Ways we work with version control

  • By ourselves (sole developer)
  • As a member of a GitHub repository
  • As an external collaborator of a GitHub repository

Version Control Workflow (by ourselves)

  1. Create files - these may contain text, code or both.
  2. Work on these files, by changing, deleting or adding new content.
  3. Create a snapshot (also known as a version) of the files as they are at this point.
  4. Document what was changed in the version history of that file.

You probably already do a version of this, for example by saving numbered or dated copies of a file.

Git is a formal way of tracking changes

  • Each “save” is called a commit
  • A commit is a snapshot of the file at that moment in time
  • We have one file, but many versions of that file
  • Git lets us see exactly what changed between versions, so we don’t need to keep separate copies of the file
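For readers curious how the four-step workflow above looks outside the web interface, here is a minimal sketch using command-line Git. It assumes Git is installed; the temporary directory, file name, identity, and commit messages are all hypothetical.

```shell
# Work in a throwaway directory so nothing else on your machine is touched.
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Workshop Learner"       # hypothetical identity
git config user.email "learner@example.com"   # hypothetical identity

# Step 1: create a file and record a first snapshot (commit).
printf 'Pancakes\n' > recipe.txt
git add recipe.txt
git commit -q -m "Add pancake recipe title"

# Step 2: work on the file.
printf '2 cups flour\n' >> recipe.txt

# Steps 3 and 4: take a new snapshot and document what changed in the message.
git add recipe.txt
git commit -q -m "Add flour to ingredients"

# The history now lists both commits, newest first.
git log --oneline
```

Each line printed by `git log --oneline` is one commit: a short identifier followed by the message you wrote.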

What’s the diff-erence?

Git can show exactly what changed between two commits (called a diff):

  • Lines of code we add (+)
  • Lines of code we delete (-)
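To make the plus and minus markers concrete, the sketch below changes one line of a file and asks Git for the diff. It assumes command-line Git is installed; the recipe file and its contents are made up for illustration.

```shell
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Workshop Learner"       # hypothetical identity
git config user.email "learner@example.com"   # hypothetical identity

# Commit a first version of the file.
printf 'Pancakes\n2 cups flour\n1 egg\n' > recipe.txt
git add recipe.txt
git commit -q -m "Add pancake recipe"

# Change one line: the recipe now calls for 2 eggs.
printf 'Pancakes\n2 cups flour\n2 eggs\n' > recipe.txt

# Show what changed since the last commit.
git diff
```

In the output, the old line appears prefixed with `-` and the new line prefixed with `+`; unchanged lines are shown without a marker for context.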

Diff Example

We can fix mistakes

  • What if we made a mistake in code?
  • We can roll back or revert changes associated with a commit
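As a rough illustration of reverting, the sketch below records a bad commit and then undoes it. It assumes command-line Git is installed; the file, the mistake, and the messages are hypothetical. Note that `git revert` does not erase history: it adds a new commit that reverses the earlier one.

```shell
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Workshop Learner"       # hypothetical identity
git config user.email "learner@example.com"   # hypothetical identity

printf 'Pancakes\n2 cups flour\n' > recipe.txt
git add recipe.txt
git commit -q -m "Add pancake recipe"

# A commit that introduces a mistake.
printf '20 cups salt\n' >> recipe.txt
git add recipe.txt
git commit -q -m "Add salt"

# Undo that commit by recording a new commit that reverses it.
git revert --no-edit HEAD

# The mistaken line is gone, and the history keeps all three commits.
cat recipe.txt
git log --oneline
```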

Exercise: Look at a commit history

More about commits

  • Commits have a message
  • A single commit can include changes to multiple files

There is an intermediate step: staging

  • Staging exists to bundle multiple changes into a single commit
  • It is hidden in the web interface
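Staging is easier to see on the command line than in the web interface. The sketch below, which assumes command-line Git and uses hypothetical file names, stages two of three new files and commits them together.

```shell
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Workshop Learner"       # hypothetical identity
git config user.email "learner@example.com"   # hypothetical identity

printf 'Pancakes\n' > pancakes.txt
printf 'Waffles\n' > waffles.txt
printf 'scratch notes\n' > notes.txt

# Stage only the two recipe files; notes.txt stays out of the commit.
git add pancakes.txt waffles.txt
git status --short

# One commit, one message, two files.
git commit -q -m "Add pancake and waffle recipes"
git show --stat --oneline HEAD
```

Because `notes.txt` was never staged, it remains in the folder but is not part of the commit.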

Ways we work in a repository

  • By ourselves (sole developer)
  • As a member of a GitHub repository
  • As an external collaborator of a GitHub repository

Git / GitHub is a way for multiple developers to work together

  • Everything we’ve done so far we’ve done by ourselves
  • The key with Git/GitHub is that multiple people can work on a repository at once

What is the difference between Git and GitHub?

  • Git is the software that does version control
    • We use it on our own machines with command line git, GitHub Desktop, RStudio, etc.
  • GitHub is a website that hosts Git repositories
    • The hosted repository is also called the remote

Interacting with GitHub

graph LR
  A["Our local machine"] -- git push --> B["Remote Repo on GitHub"]
  B -- git pull --> A

  • Updating the remote from our local is called a push
  • Updating the local from the remote is called a pull
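Push and pull can be tried without a GitHub account. In the sketch below, a bare repository on disk stands in for the remote on GitHub; that substitution, the directory names, and the "teammate" are assumptions made for illustration. It requires command-line Git.

```shell
work="$(mktemp -d)"
cd "$work"

# A bare repository on disk plays the role of the remote on GitHub.
git init -q --bare remote.git

# Our local repository, connected to that remote under the usual name "origin".
git init -q local
cd local
git config user.name "Workshop Learner"       # hypothetical identity
git config user.email "learner@example.com"   # hypothetical identity
git remote add origin "$work/remote.git"

printf 'Pancakes\n' > recipe.txt
git add recipe.txt
git commit -q -m "Add pancake recipe"

# Push: update the remote from our local copy.
git push -q -u origin HEAD

# A teammate clones the remote and pushes a change of their own.
cd "$work"
git clone -q remote.git teammate
cd teammate
git config user.name "Teammate"               # hypothetical identity
git config user.email "teammate@example.com"  # hypothetical identity
printf 'Waffles\n' > waffles.txt
git add waffles.txt
git commit -q -m "Add waffle recipe"
git push -q

# Pull: update our local copy from the remote.
cd "$work/local"
git pull -q
ls
```

After the pull, the teammate's `waffles.txt` appears in our local folder next to `recipe.txt`.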

How do we do this?

  • Multiple people can each work on their own version of the code, called a branch
  • Developers can work on different features of the same code at the same time
  • This needs a reconciliation process (pull requests and merging)

Branches are isolated versions of the original repository

Many People Can Work on the Same Code

When in doubt, branch

  • Before you make big changes, make a branch
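A small sketch of "when in doubt, branch", assuming command-line Git. The branch name `add-waffles` and the files are hypothetical. It shows that work committed on a branch does not touch the branch you started from.

```shell
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Workshop Learner"       # hypothetical identity
git config user.email "learner@example.com"   # hypothetical identity

printf 'Pancakes\n' > recipe.txt
git add recipe.txt
git commit -q -m "Add pancake recipe"

# Make a branch and switch to it before making changes.
git checkout -q -b add-waffles                # hypothetical branch name
printf 'Waffles\n' > waffles.txt
git add waffles.txt
git commit -q -m "Add waffle recipe"

# Switch back to the branch we started from: it is untouched.
git checkout -q -
ls
git branch
```

Back on the original branch, `ls` lists only `recipe.txt`; the waffle recipe lives safely on `add-waffles` until it is merged.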

Exercise: Add a Recipe to our Cookbook Repository!

Repository Member List

Everyone is a member of one of two repositories:

Exercise: Add a Recipe

Now comes the hard part

  • Integrating the changes

Make a Pull Request

  • A pull request is a formal request to merge your code changes into the history
  • Someone (the owner) needs to merge your changes after a request

Exercise: Make a Pull Request

Reconciliation of Branches (merging)

  • We need to integrate the changes from different branches
  • This is called a merge
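On GitHub, merging happens when a pull request is accepted. The underlying operation can be sketched locally as follows, assuming command-line Git; the branch and file names are hypothetical, and this simple case has no conflicting changes.

```shell
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Workshop Learner"       # hypothetical identity
git config user.email "learner@example.com"   # hypothetical identity

printf 'Pancakes\n' > recipe.txt
git add recipe.txt
git commit -q -m "Add pancake recipe"

# Do some work on a separate branch.
git checkout -q -b add-waffles                # hypothetical branch name
printf 'Waffles\n' > waffles.txt
git add waffles.txt
git commit -q -m "Add waffle recipe"

# Return to the original branch and merge the work in.
git checkout -q -
git merge -q --no-edit add-waffles

# The original branch now contains both recipes.
ls
git log --oneline
```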

Who’s responsible for merging?

  • Repository Owner
    • Could be program manager of a group
    • Could be software engineer
  • It is a big responsibility
    • Need to ensure that a merge doesn’t break things
    • Need to make sure that merges don’t conflict with each other

Merging process

  • Manual review process
    • May submit reviews
    • May submit approvals
    • May deny the pull request

Merging Demo

Ways we work in a repository

  • By ourselves (sole developer)
  • As a member of a GitHub repository
  • As an external collaborator of a GitHub repository

GitHub lets you contribute to code, even if you aren’t a member

  • You can still contribute to code you don’t own
    • Open source is built on collaborations
  • You can do this by making a fork of the code

Forks

  • Your version of the code is called a fork - it belongs to you
    • Like an external branch
  • You can submit your changes to the original repository through a pull request
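Forking is a GitHub feature, so it cannot be reproduced exactly with Git alone. The sketch below only imitates the idea, under the assumption that two bare repositories on disk stand in for the original repository and your fork; all names are hypothetical. It requires command-line Git.

```shell
work="$(mktemp -d)"
cd "$work"

# "upstream.git" stands in for a repository that someone else owns.
git init -q --bare upstream.git
git init -q owner
cd owner
git config user.name "Repository Owner"       # hypothetical identity
git config user.email "owner@example.com"     # hypothetical identity
printf 'Pancakes\n' > recipe.txt
git add recipe.txt
git commit -q -m "Add pancake recipe"
git remote add origin "$work/upstream.git"
git push -q origin HEAD
cd "$work"

# "Forking": a full copy of the repository that belongs to you.
git clone -q --bare upstream.git fork.git

# You work in a clone of your fork, not of the original.
git clone -q fork.git mine
cd mine
git config user.name "Workshop Learner"       # hypothetical identity
git config user.email "learner@example.com"   # hypothetical identity
printf 'Waffles\n' > waffles.txt
git add waffles.txt
git commit -q -m "Add waffle recipe"
git push -q

# The fork has the new commit; the original does not.
git --git-dir="$work/fork.git" rev-list --count HEAD
git --git-dir="$work/upstream.git" rev-list --count HEAD
```

The two counts differ because your change exists only in the fork. A pull request is how you would ask the owner to bring it into the original.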

Exercise: Make a fork and add a recipe

Let’s try reviewing a Pull Request

Exercise: Review a Pull Request

Recap of What we did

Hopefully you are now able to:

  • Explain the reasons we use repositories
  • Explain how version control tracks changes in your work
  • Make a branch in a repository
  • Make a fork of a repository that you don’t own
  • Make contributions to a repository using pull requests

Resources