Feel free to comment or share your learning.

Glossary

Workflow: A collection of jobs defined in a .yml file, with associated triggers (on:).
Job: A named set of steps, run in a certain environment (runs-on:).
Step: A task within a job.

Style

General

Use 2-space indents. Use the .editorconfig from edx-lint.
The file extension should be .yml, not .yaml.
Use YAML dash-syntax for lists, even for one-item lists. This makes it easier to add/remove items without adjusting other lines.
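As a small illustration of the list rule above, here is a sketch of a trigger with a one-item list. The branch name "main" is only an assumption for the example.

```yaml
on:
  push:
    branches:
      # Preferred: dash syntax, even though there is only one item.
      # Adding a second branch later is a one-line addition.
      - "main"
  pull_request:
    # Avoid the flow form `branches: ["main"]` here; extending it
    # means editing the existing line instead of adding a new one.
    branches:
      - "main"
```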

Workflows

Give every workflow a name. Use title casing: Nightly Unit Tests. Keep the name short, five words or fewer. GitHub truncates longer names in parts of the UI, such as the PR checks interface.
Name the workflow file the same as the workflow name, except lower case with dashes: nightly-unit-tests.yml

Jobs

Use snake_case for the job’s ID, for example python_38_unit_tests.
Give every job a name. Use sentence casing: Python 3.8 unit tests. Keep the name short, five words or fewer.
Leave a blank line between jobs.

Steps

Give every step a name. Use sentence casing: Build and upload the results report. These can be longer than 5 words.
Leave a blank line between steps.
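The job and step conventions above might look like this in practice. This is only a sketch: the second job and both run commands are hypothetical placeholders, and ubuntu-latest is an assumed runner.

```yaml
jobs:
  # snake_case ID, sentence-case name
  python_38_unit_tests:
    name: "Python 3.8 unit tests"
    runs-on: ubuntu-latest
    steps:
      - name: "Check out the repo"
        uses: "actions/checkout@v3"

      - name: "Build and upload the results report"
        run: |
          echo "Hypothetical placeholder for the report command"

  # A blank line separates this job from the previous one.
  quality_checks:
    name: "Code quality checks"
    runs-on: ubuntu-latest
    steps:
      - name: "Check out the repo"
        uses: "actions/checkout@v3"

      - name: "Run the linters"
        run: |
          echo "Hypothetical placeholder for the lint command"
```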

Commands

Always use YAML multi-line strings for shell commands, to simplify quoting:

Instead of this:

# BAD! This won't parse!
run: bash -c "./do.py --extra \"{'offset': '2023-06-14T04:20:00'}\""

Use this:

run: |
  bash -c "./do.py --extra \"{'offset': '2023-06-14T04:20:00'}\""

Example

name: "Nightly Unit Tests"

on:
  push:
    branches:
      - "**/*nightly*"
  schedule:
    # Run at 7:22 UTC every day (2:22am Eastern standard time, 3:22am daylight time)
    # https://crontab.guru/#22_7_%2a_%2a_%2a
    - cron: "22 7 * * *"
  workflow_dispatch:

defaults:
  run:
    shell: bash

permissions:
  contents: read

concurrency:
  group: "${{ github.workflow }}-${{ github.ref }}"
  cancel-in-progress: true

jobs:
  tests:
    name: "Python ${{ matrix.python-version }} tests"
    runs-on: ubuntu-20.04

    strategy:
      matrix:
        python-version:
          - "3.8"
          - "3.11"

    steps:
      - name: "Check out the repo"
        uses: "actions/checkout@v3"

      - name: "Set up Python"
        uses: "actions/setup-python@v4"
        with:
          python-version: "${{ matrix.python-version }}"

      - name: "Do the thing"
        run: |
          python -m tox -- -rfsEX

Help

For schedule/cron triggers, include a link to crontab.guru: https://crontab.guru/#22_7_%2a_%2a_%2a.
Use comments to explain tricky parts of the workflow.

Error Checking

You can enable stricter Bash error handling by setting the default “shell” to “bash” at the top of every workflow, like so:

defaults:
  run:
    shell: bash

Note that bash is already the default interpreter for shell code in your workflows on Linux and macOS runners, but explicitly setting “bash” enables some extra-strict bash behavior. Notably, it enables the “pipefail” option, which means that if you write a | b, an error in a will cause the entire command to fail (without pipefail, errors in a are silenced; only errors in b are raised). You can read more about the nuances in the GitHub Actions documentation for the shell setting.

tl;dr: When in doubt, put this at the top of your workflow. It makes errors less likely to pass silently.
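To see the difference, here is a minimal sketch of a workflow whose step fails only because pipefail is active. The job ID, the step, and the file name does-not-exist.txt are hypothetical, and ubuntu-latest is an assumed runner.

```yaml
name: "Pipefail Demo"

on:
  workflow_dispatch:

defaults:
  run:
    shell: bash

jobs:
  pipefail_demo:
    name: "Pipefail demo"
    runs-on: ubuntu-latest
    steps:
      - name: "Fail when the left side of a pipe fails"
        run: |
          # cat fails because the file is missing; sort itself succeeds.
          # With pipefail, the whole pipeline counts as failed, so the
          # step stops here instead of continuing to the next line.
          cat does-not-exist.txt | sort
          echo "This line is not reached"
```

If the defaults block is removed, the same step would be expected to pass, because only the exit status of sort would be checked.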

Matrix

Always use a matrix for versions like Python/Django, even if there’s only one. This makes it easier to understand and adjust the versions.
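For example, a single-version matrix could be sketched as follows. The job ID and the choice of Python 3.11 are assumptions for illustration. Adding a second version later is a one-line change.

```yaml
jobs:
  unit_tests:
    name: "Python ${{ matrix.python-version }} unit tests"
    runs-on: ubuntu-latest

    strategy:
      matrix:
        # Only one version today, but still expressed as a matrix.
        python-version:
          - "3.11"

    steps:
      - name: "Set up Python"
        uses: "actions/setup-python@v4"
        with:
          python-version: "${{ matrix.python-version }}"
```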

Dynamic Matrix

To set matrix values dynamically in a maintainable way, you can use https://github.com/actions/github-script:

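jobs:
  setup_matrix:
    name: "Set up the matrix"
    # runs-on is required for every job
    runs-on: ubuntu-latest
    steps:
      - name: "Generate the Node versions"
        uses: actions/github-script@v6
        id: generate_matrix
        with:
          script: |
            var nodeVersions = [16, 18];
            // logic to add/remove node versions
            // setOutput serializes the array to a JSON string
            core.setOutput('nodeVersions', nodeVersions);
    outputs:
      node_versions: ${{ steps.generate_matrix.outputs.nodeVersions }}

  run_tests:
    name: "Node ${{ matrix.version }} tests"
    runs-on: ubuntu-latest
    needs:
      - setup_matrix
    strategy:
      matrix:
        # fromJson turns the JSON string back into a list
        version: ${{ fromJson(needs.setup_matrix.outputs.node_versions) }}
    steps:
      - name: "Check out the repo"
        uses: actions/checkout@v3

      - name: "Set up Node"
        uses: actions/setup-node@v3
        with:
          node-version: ${{ matrix.version }}

      - name: "Run the tests"
        run: |
          npm ci
          npm run test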

Security

Use GitHub repo or organization secrets for credentials.
Be careful to avoid script injection attacks: https://docs.github.com/en/actions/security-guides/security-hardening-for-github-actions#understanding-the-risk-of-script-injections
References to other GitHub actions often use tags like @v3. A tag is not a fixed reference: the author of the action can move it to a different commit, which might not be what you want. To be certain of the version you are getting, use a full commit SHA instead.
Actions and workflows can have a permissions clause that limits the actions permissible with the implicit GitHub token: https://docs.github.com/en/actions/using-jobs/assigning-permissions-to-jobs
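The points above can be combined in one sketch. It assumes a pull request trigger and a hypothetical secret named EXAMPLE_API_TOKEN. The key idea for avoiding script injection is to pass untrusted values through environment variables rather than interpolating them into the script text.

```yaml
name: "Pull Request Title Check"

on:
  pull_request:

defaults:
  run:
    shell: bash

# Limit the implicit GitHub token to read-only access.
permissions:
  contents: read

jobs:
  title_check:
    name: "Check the pull request title"
    runs-on: ubuntu-latest
    steps:
      - name: "Print the title safely"
        env:
          # Untrusted input goes into an environment variable, so the
          # shell treats it as data and never parses it as code.
          PR_TITLE: "${{ github.event.pull_request.title }}"
          # Hypothetical secret name, stored in repo or org secrets.
          API_TOKEN: "${{ secrets.EXAMPLE_API_TOKEN }}"
        run: |
          echo "Title: $PR_TITLE"
```

Writing the expression directly inside the run block instead would let a crafted title inject shell commands. If the workflow used a third-party action, its tag could be replaced with the full commit SHA of a release you have reviewed.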

Organization

Workflows (.yml files) can contain one job or many. Here are some general guidelines on when to organize jobs into one workflow versus splitting them up:

When jobs are triggered by different scenarios (for example, pull request vs. scheduled), split them into separate workflows.
When one job depends on the status or output of other jobs, you may need to combine them into a single workflow.
When jobs share a related purpose (for example, code quality checks), consider grouping them into a single workflow.
When using a matrix and required checks, it can simplify configuration to collect the matrix steps into a final success step, then require just the success step. An example of applying this pattern is in https://github.com/openedx/edx-platform/pull/31024.
If many repos across Open edX will be using the same workflow, consider making a reusable workflow—even if all your workflow does is call a non-openedx reusable workflow! This will allow coordinated upgrades.
- Get your workflow working, then make a PR against openedx/.github but use the workflow_call trigger and inputs.
- In the new workflow, be sure to pin any workflows it depends on to commits (ideally) or tags.
- In your repo, depend on the master version of the new reusable workflow. This provides a central point of indirection that allows automatically upgrading all dependent repos to newer versions of actions.
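A reusable workflow along these lines could be sketched as below. The file name example-check.yml, the job IDs, and the python-version input are all hypothetical. Only the workflow_call trigger, the inputs mechanism, and the reference to the master branch come from the guidance above.

```yaml
# Hypothetical shared file: .github/workflows/example-check.yml
# in the openedx/.github repo.
name: "Example Check"

on:
  workflow_call:
    inputs:
      python-version:
        description: "Python version to run the check with"
        type: string
        default: "3.11"

jobs:
  example_check:
    name: "Example check"
    runs-on: ubuntu-latest
    steps:
      - name: "Show the requested version"
        run: |
          echo "Would run the check on Python ${{ inputs.python-version }}"
```

A repo would then call it from its own workflow, depending on master so that upgrades arrive centrally:

```yaml
jobs:
  example_check:
    name: "Example check"
    uses: "openedx/.github/.github/workflows/example-check.yml@master"
    with:
      python-version: "3.11"
```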

Testing

https://github.com/nektos/act allows you to run and test your actions locally (with some limitations)
If you’re making a new GitHub workflow that needs to be manually runnable (on: workflow_dispatch), GitHub won’t offer to run it until it has been run at least once. You can get out of this Catch-22 by temporarily adding push as one of the triggers, then pushing that to your branch. Now GitHub knows about it, and you can remove push and then use the GH CLI to invoke the version that's on your branch: gh workflow run my-new-workflow.yml --ref my-working-branch -f param1=value1 -f param2=value2
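The temporary trigger described above might look like this. It is a sketch that reuses the branch and parameter names from the gh command. The input definitions are assumptions.

```yaml
on:
  # Temporary: lets GitHub register the workflow from your branch.
  # Remove this push trigger once the workflow has run once.
  push:
    branches:
      - "my-working-branch"
  workflow_dispatch:
    inputs:
      param1:
        description: "Hypothetical first parameter"
        type: string
        required: false
      param2:
        description: "Hypothetical second parameter"
        type: string
        required: false
```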

Other advice

Include an on: workflow_dispatch trigger. Why wouldn’t you want to be able to manually start an action if needed?
If you have many required checks, it can be more convenient and less error-prone to aggregate them into a “success” step, and just require the success step. See edx-platform’s unit-tests.yml action for how it works: https://github.com/openedx/edx-platform/blob/f4566cdc7a89b788f8981c637c57c0453e857a3c/.github/workflows/unit-tests.yml#L77-L91
Actions now (Jan 2023) can have configuration variables: https://github.blog/changelog/2023-01-10-github-actions-support-for-configuration-variables-in-workflows/. Previously, this was done with secrets, but it’s not good to store non-secret information in secrets.
Jobs can auto-cancel when changes are pushed to the same branch. Use the concurrency clause to control what jobs get canceled when a new job begins: https://docs.github.com/en/actions/using-jobs/using-concurrency.
Review existing actions and model workflows to understand what’s possible and avoid reinventing existing well-supported code. One good resource for this is https://github.com/sdras/awesome-actions .
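The “success” aggregation mentioned above can be sketched as follows. This is an illustration of the general pattern and not a copy of the edx-platform workflow. The job IDs and the test command are hypothetical.

```yaml
jobs:
  tests:
    name: "Python ${{ matrix.python-version }} tests"
    runs-on: ubuntu-latest

    strategy:
      matrix:
        python-version:
          - "3.8"
          - "3.11"

    steps:
      - name: "Run the tests"
        run: |
          echo "Hypothetical test run on Python ${{ matrix.python-version }}"

  success:
    name: "All tests passed"
    runs-on: ubuntu-latest
    needs:
      - tests
    # Run even when a matrix job fails, so this check reports a
    # failure instead of being skipped.
    if: always()
    steps:
      - name: "Fail unless every matrix job succeeded"
        env:
          TESTS_RESULT: "${{ needs.tests.result }}"
        run: |
          if [[ "$TESTS_RESULT" != "success" ]]; then
            echo "Matrix result was: $TESTS_RESULT"
            exit 1
          fi
```

With this in place, branch protection only needs to require the one “All tests passed” check, and the matrix versions can change without touching the required-checks settings.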

TODO:

How to link to the 2U details
Can we use https://yamllint.readthedocs.io/en/stable/index.html to check some of this?
