Chapter 5 Building a GitHub Action | Making Science Reproducible

5.1 Phase 0: Revisiting the Importance of Automation

GitHub Actions is useful as a continuous integration and continuous deployment (CI/CD) tool: it can automatically test an analysis, or rerun its steps, whenever something changes.

GitHub Actions can be triggered by specific events (such as opening a pull request, merging a branch, or opening an issue) or on a schedule (for example, once a month). They can perform many kinds of tasks and can run those tasks inside a specific container or environment.

Building a GitHub Action often utilizes GitHub secrets and/or Docker images – the subjects of the assignments before this one.

If needed, revisit the GitHub Automation for Scientists course for a more thorough refresher.

5.2 Phase 1: Repository Setup and Planning

5.2.1 Step 1: Set Up Your Working Environment

Navigate to your capstone sandbox repository
Locate the GitHub_Action_Assignment.yml file within the GitHub_Action_Assignment directory - this is your starting template
Review the existing template to understand what’s already provided

Important: Do not rename GitHub_Action_Assignment.yml. You will move the file to a new location, but its name must stay exactly as it is.

5.2.2 Step 2: Create Your Working Branch

In order to isolate your work and enable the automated evaluation system this sandbox uses, you will create a new working branch.

Create a new branch for this assignment (e.g., github-action-assignment or gha-build)
Switch to this branch before making any changes

5.2.3 Step 3: Move File to Correct Location

Workflow files must be in the .github/workflows/ directory for GitHub to recognize them as GitHub Actions.

Move GitHub_Action_Assignment.yml to .github/workflows/GitHub_Action_Assignment.yml

Critical: The file must be in this exact location for GitHub to recognize it as a workflow file (and for this assignment to automatically evaluate your GitHub Action).

5.2.4 Step 4: Plan Your GitHub Action

While you are learning, it helps to pick an action that produces something you can inspect visually (e.g., a report or plot). If the output is generated and looks as you expect, you can confirm that your GitHub Action succeeded and that no silent failure occurred in the background.

Decide what useful task your GitHub Action will perform
- Ideas for useful actions:
     Run automated tests on your capstone code
     Generate reports or documentation
     Check code quality or formatting
     Create data visualizations
     Send notifications when certain conditions are met
     Validate data files or configurations

Never put sensitive protected health information (PHI) or personally identifiable information (PII) on GitHub, even in a private repository.

Limit the size of any data files you put on GitHub unless you have Git Large File Storage (Git LFS) available. If you want to run a report, create a visualization, or validate a data file, consider making a small toy dataset for the purpose.
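As one possible way to combine a toy dataset with an output you can inspect, the sketch below creates a tiny CSV file inside the workflow, writes a one-line summary, and uploads that summary as an artifact you can download from the run's page. The workflow name, file names, and artifact name (toy-report) are hypothetical placeholders, not part of the assignment template.

```yaml
# Sketch only: file names and the artifact name are hypothetical.
name: Toy Report
on: pull_request
jobs:
  report:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Create a small toy dataset
        run: |
          mkdir -p output
          printf 'sample,value\nA,1\nB,2\nC,3\n' > output/toy_data.csv

      - name: Summarize the toy dataset
        run: |
          # Count data rows (every line except the header)
          rows=$(tail -n +2 output/toy_data.csv | wc -l)
          echo "Rows in toy dataset: $rows" > output/report.txt
          cat output/report.txt

      - name: Upload the report for inspection
        uses: actions/upload-artifact@v4
        with:
          name: toy-report
          path: output/report.txt
```

Because the data is generated inside the workflow, nothing sensitive or large has to be committed to the repository.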

5.3 Phase 2: GitHub Action Development

5.3.1 Step 5: Understand the Template Structure

Open GitHub_Action_Assignment.yml and examine the existing structure
Identify the key components:
     name: - What your action is called
     on: - When it should trigger
     jobs: - What it should do
     job-name: - An identifier for one specific job (nested under jobs:)
     runs-on: - What environment that job uses (set inside each job)
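To illustrate how these components nest inside one another, here is a minimal skeleton. The workflow name, the job identifier (example-job), and the echo command are placeholders chosen for illustration and may differ from what the provided template contains.

```yaml
# Minimal sketch of a workflow file; all names here are placeholders.
name: Example Workflow          # name: what the action is called in the Actions tab
on: pull_request                # on: the event that triggers the workflow
jobs:                           # jobs: one or more jobs to run
  example-job:                  # the job identifier (your choice)
    runs-on: ubuntu-latest      # runs-on: the runner environment for this job
    steps:
      - name: Say hello
        run: echo "Hello from GitHub Actions"
```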

5.3.2 Step 6: Define Your Action’s Purpose

Choose a meaningful trigger (on:):
     pull_request: - Runs when PRs are opened/updated (good for testing)
     push: - Runs when code is pushed to specific branches
     workflow_dispatch: - Allows manual triggering (useful for development)
     schedule: - Runs on a time schedule (e.g., once a week)
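The four triggers above can also be combined in a single on: block. The following is a sketch under assumed settings: the branch name (main) and the cron schedule are illustrative, and most workflows need only one or two of these triggers.

```yaml
# Sketch only: the branch name and schedule are illustrative assumptions.
on:
  pull_request:            # runs when a PR is opened or updated
  push:
    branches:
      - main               # only pushes to main trigger the workflow
  workflow_dispatch:       # adds a "Run workflow" button in the Actions tab
  schedule:
    - cron: "0 6 * * 1"    # 06:00 UTC every Monday
```

Scheduled runs use the workflow file on the repository's default branch, so a schedule: trigger that exists only on a working branch will not fire until it is merged.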

Design your job steps:
     Start with actions/checkout@v4 to get your repository files
     Add steps that accomplish your chosen task
     Include error handling and status checks
     Name your steps so they are easy to find in the logs, and give a step an id: if later steps need to access and check its outputs.
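The design points above might look something like the following sketch. The step names, the id (count), the output name (csv_count), and the data/ directory are all hypothetical; the pattern of writing to $GITHUB_OUTPUT and reading the value back through steps.<id>.outputs is the part to take away.

```yaml
# Sketch only: step names, the id "count", and the data/ path are hypothetical.
jobs:
  example-job:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Count CSV files
        id: count                      # id lets later steps refer to this step
        run: |
          set -euo pipefail            # stop at the first failing command
          n=$(find data -name '*.csv' | wc -l)
          echo "csv_count=$n" >> "$GITHUB_OUTPUT"

      - name: Fail if no data was found
        run: |
          echo "Found ${{ steps.count.outputs.csv_count }} CSV file(s)"
          if [ "${{ steps.count.outputs.csv_count }}" -eq 0 ]; then
            echo "No CSV files found in data/" >&2
            exit 1
          fi
```

Exiting with a non-zero status marks the step, and therefore the job, as failed, which is what makes the problem visible in the pull request checks.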

Consider and possibly emulate one of these common design strategies for various tasks. Notice how they fill out the key components and outline where the task steps will be filled in:

For Code Quality/Testing:

name: Code Quality Check
on: pull_request
jobs:
  quality-check:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Run quality checks
        run: |
          # Your quality check commands here
          echo "Running code quality checks..."

For Data Analysis:

name: Automated Analysis
on: workflow_dispatch
jobs:
  analyze:
    runs-on: ubuntu-latest
    container:
      image: jhudsl/ottr_python:main  # Example with R and Python
    steps:
      - uses: actions/checkout@v4
      - name: Run analysis
        run: |
          # Your analysis commands here

For Documentation:

name: Generate Documentation
on: push
jobs:
  docs:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Generate docs
        run: |
          # Documentation generation commands

Fill out GitHub_Action_Assignment.yml: complete the key components, then write the commands for the task you chose under run: |.
While on your branch, add any files your GitHub Action needs (e.g., scripts or toy data) to your copy of the sandbox repository so that the action can access them.

5.4 Phase 3: Testing and Iteration

5.4.1 Step 7: Open Your Pull Request

Key: This step triggers the GHA Assignment Eval test.

Commit your changes to your branch.
Push the branch to GitHub.
Open a pull request from your branch to main.

5.4.2 Step 8: Monitor Automated Evaluation & Interpret Evaluation Results

Watch for the GHA Assignment Eval check to start running
This evaluator will test whether your GitHub Action runs successfully. The evaluator checks for common issues and provides guidance.
Wait for it to complete and comment on your PR
     • Success: You’ll receive a validation code in the PR comment
     • Failure: You’ll get specific error messages and troubleshooting tips
Interpret the results and proceed accordingly:
     If the test’s comment provides a validation code, skip to Phase 5.
     If the test’s comment contains error messages, go to Phase 4: Troubleshooting and Refinement.

5.5 Phase 4: Troubleshooting and Refinement

5.5.1 Step 9: Debug Common Issues

Common issues to check:
YAML syntax errors: Check indentation, colons, and spacing
File location errors: Ensure file is in .github/workflows/
Permission errors: May need to add GitHub secrets or tokens (see the GitHub Secrets Assignment for more information if necessary)
Missing dependencies: Check if your chosen environment has required software
Silent failures: Verify your action actually does what you expect

The Actions logs contain the specific error messages and are the first place to look when debugging. If an action is not successful, click Details next to the failed check for more information about why.
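When the logs alone do not explain a failure, it can help to make the workflow print more about its own state. The sketch below is one way to do that; the workflow, job, and step names are hypothetical, and the middle step stands in for your own task.

```yaml
# Sketch only: names are hypothetical; adapt the steps to your own job.
name: Debugging Example
on: pull_request
jobs:
  debug-example:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Print debugging information
        run: |
          echo "Working directory: $(pwd)"
          echo "Event that triggered this run: $GITHUB_EVENT_NAME"
          echo "Branch or ref: $GITHUB_REF"
          ls -la

      - name: Your task
        run: |
          echo "Replace this with your own commands"

      - name: Show extra context if something failed
        if: failure()                  # runs only when an earlier step failed
        run: |
          echo "A previous step failed; files present:"
          ls -R . | head -n 50
```

Printing the file listing right after checkout is a quick check for two of the common issues listed above: a file in the wrong location and a script or dataset that was never committed.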

5.5.2 Step 10: Iterate Until Success

Make changes to your GitHub_Action_Assignment.yml file on your branch
Commit and push changes to trigger new evaluation runs
Keep refining based on feedback until you get an action that runs successfully

5.6 Phase 5: Completion

If you are taking the course on Coursera:

Collect Your Validation Code

Once your GitHub Action runs successfully, copy the validation code from the PR comment. Keep this code safe for your quiz submission.

Submit Your Quiz

Go to your Coursera Graded Assignment (“Building a GitHub Action”)
Paste the validation code when prompted
Submit the quiz to get credit for your work


5.7 Suggestions for Troubleshooting and Best Practices

Remember: This assignment builds on both the containers knowledge (for choosing appropriate environments) and the GitHub Actions automation concepts. The goal is to demonstrate your understanding of continuous integration principles by creating something that automatically improves or validates your capstone project.

5.7.1 Start Simple and Build Up

Begin with a basic action that you know will work.
Your action should be useful but not overly complex. Focus on demonstrating GitHub Actions concepts rather than building production software.
Test early and often to catch issues quickly.
Add complexity gradually, testing each addition.
Use marketplace actions: Leverage existing actions when possible.

5.7.2 Use Course Knowledge

Reference information from the GitHub Automation for Scientists course as needed.
Reference YAML examples from course exercises
Apply containers knowledge:
- Choose appropriate runs-on: environment
- Consider using Docker containers if you need specific software
- Reference container images from Docker Hub if needed
Apply automation principles:
- Ensure your action fails appropriately when something goes wrong
- Include meaningful output and logging
- Use environment variables and secrets when needed. (Remember you may need to add the “workflow” scope to your personal access token for this assignment.)
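As a rough sketch of these automation principles together, the example below sets a plain environment variable, passes a secret to a single step, fails loudly if the secret is missing, and logs what it is doing without printing the secret itself. MY_API_TOKEN and REPORT_TITLE are hypothetical names; you would create the secret yourself in the repository settings.

```yaml
# Sketch only: MY_API_TOKEN and REPORT_TITLE are hypothetical names.
jobs:
  example-job:
    runs-on: ubuntu-latest
    env:
      REPORT_TITLE: "Weekly summary"           # plain environment variable
    steps:
      - uses: actions/checkout@v4

      - name: Use a secret without printing it
        env:
          API_TOKEN: ${{ secrets.MY_API_TOKEN }}
        run: |
          set -euo pipefail                    # fail on the first error
          if [ -z "${API_TOKEN:-}" ]; then
            echo "MY_API_TOKEN is not set for this repository" >&2
            exit 1
          fi
          echo "Building report: $REPORT_TITLE"
          echo "Token is available (value not shown)"
```

Scoping the secret to the one step that needs it, rather than to the whole job, keeps it out of the environment of every other command.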

5.7.3 Leverage the Automated Testing

The automated evaluator provides immediate feedback
Don’t be afraid to make multiple attempts - each iteration teaches you something
Read the evaluator’s comments carefully for specific guidance

5.7.4 Read Error Messages Carefully

Error messages (in the actions logs) often contain the exact solution
Check the logs: Go to the Actions tab and examine detailed output
Print debugging info: Use echo commands to verify assumptions
Search the web for specific error messages if they’re unclear
Paste the error message, along with any relevant context, into a code-trained LLM (e.g., Phind) for additional help.

The automated evaluation system is designed to help you learn through iteration, so embrace the feedback loop and keep refining your action until it works reliably!