Repo Health Job User Guide

What is the Repo Health Job?

The repo health job is a script that runs the repo health checks against every repository in the given organizations. After running the checks on a repository, it generates a summary YAML file for that repository.
The job can also combine all the YAML files into a single CSV file and push that data to a specified Google spreadsheet. The spreadsheet serves as a repo health dashboard, which makes it easier to monitor and review changes at any time.
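
The combining step described above can be pictured with a minimal sketch. It assumes each repository's YAML summary has already been loaded into a nested dictionary, and it flattens nested keys into dotted column names. The function names, the dotted-column convention, and the "repo_name" column are illustrative assumptions, not the job's actual implementation.

```python
import csv
import io


def flatten(data, prefix=""):
    """Flatten nested dicts into one dict with dotted keys."""
    flat = {}
    for key, value in data.items():
        name = f"{prefix}.{key}" if prefix else str(key)
        if isinstance(value, dict):
            flat.update(flatten(value, name))
        else:
            flat[name] = value
    return flat


def combine_to_csv(results_by_repo):
    """Return CSV text with one row per repository.

    results_by_repo maps a repository name to its already-parsed summary
    (hypothetical structure; the real summaries may differ).
    """
    rows = []
    for repo, summary in sorted(results_by_repo.items()):
        row = {"repo_name": repo}
        row.update(flatten(summary))
        rows.append(row)

    # Use the union of all keys so a repository missing a check still fits.
    other_columns = sorted({key for row in rows for key in row} - {"repo_name"})
    columns = ["repo_name"] + other_columns

    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=columns, restval="", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

Taking the union of keys keeps the header stable when some repositories lack a given check. Missing values are written as empty cells.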

What is it used for?

Currently, 2U has set up a scheduled workflow based on the provided template workflow. It is triggered daily and updates the repo health dashboard with the latest data for all the edx and openedx repositories.
The tech-arch-bom team currently uses the repo health dashboard to plan and upgrade the repositories across organizations.

Why did we move from Jenkins to GitHub Actions?

Previously, the repo health job ran as a scheduled Jenkins job. It has been moved to GitHub Actions workflows to make it easier for Axim and other community organizations to run the job against their own organizations and repositories.

How does the job currently work?

The components of the repo health job workflow currently work as follows:

  1. The reusable repo health job workflow lives in the openedx/.github repo. Any organization can reference it from any repo using the provided template.

  2. The reusable workflow triggers a bash script, located in the openedx/edx-repo-health repository, that runs the repo health checks. The script executes all the repo_health_checks present in that repo against every repository of the given organizations.

How can you set up the repo health job for your organization?

To set up the repo health job to run against your organization, follow the steps below:

Set up the scheduled workflow
  • Create a workflow file in your desired repository by copying the template workflow file.

  • This workflow will be triggered on your chosen schedule to parse the repositories and update the data.

Set up the repo health secrets
  • To run the workflow successfully, add the following secrets to the GitHub repository that hosts the workflow:

    • READTHEDOCS_API_KEY: API key for Read the Docs access. Needed to run the docs check against repositories.

    • REPO_HEALTH_GOOGLE_CREDS_FILE: Link to the credentials file for the Google spreadsheet, if you need the job to push the CSV to a Google spreadsheet.

    • REPO_HEALTH_BOT_TOKEN: GitHub token with read access to all repositories and write access to the target repo where you want the YAML files and CSV report to be stored.

    • REPO_HEALTH_BOT_EMAIL: A unique email address associated with your repo health bot. This email will be used to commit the generated YAML files and CSV report to the target repository.
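
As a quick sanity check before a run, a small helper can report which of the secrets listed above are unset. This is only a sketch. It assumes the secrets are exposed to the process as environment variables under the same names, which the workflow may or may not do, and the helper name missing_secrets is hypothetical.

```python
import os

# Secret names taken from the list above.
SECRET_NAMES = (
    "READTHEDOCS_API_KEY",
    "REPO_HEALTH_GOOGLE_CREDS_FILE",
    "REPO_HEALTH_BOT_TOKEN",
    "REPO_HEALTH_BOT_EMAIL",
)


def missing_secrets(env=None, names=SECRET_NAMES):
    """Return the names that are absent or blank in the given mapping.

    Assumption: secrets are available as environment variables with the
    same names; pass a plain dict as env to check other sources.
    """
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name, "").strip()]


if __name__ == "__main__":
    for name in missing_secrets():
        print(f"Secret not set: {name}")
```

If you do not push the CSV to a Google spreadsheet, pass a shorter names tuple that leaves out REPO_HEALTH_GOOGLE_CREDS_FILE.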

Provide the required arguments to the scheduled workflow
  • The scheduled workflow needs the following input parameters to run all the checks and generate the desired reports:

    • ORG_NAMES: Space-separated list of organization names whose repositories should be parsed, e.g. 'openedx edx . . .'

    • EDX_REPO_HEALTH_BRANCH: Branch of the openedx/edx-repo-health repo to check out. This can be used to run custom checks against repositories if needed.

    • ONLY_CHECK_THIS_REPOSITORY: If you only want to run repo health on one repository, set this to org/name of the desired repository.

    • REPORT_DATE: The date for which repo health data is required (format: YYYY-MM-DD). Pass this argument if you want to parse the repositories' data for a specific date.

    • REPO_HEALTH_OWNERSHIP_SPREADSHEET_URL: URL of the Google spreadsheet to populate with the data.

    • REPO_HEALTH_REPOS_WORKSHEET_ID: ID of the worksheet in the Google spreadsheet to populate with the data.

    • TARGET_REPO_TO_STORE_REPORTS: Target repo in which to store the CSV reports and results, e.g. org/repo-name

    • REPOS_TO_IGNORE: Space-separated list of repositories to ignore, e.g. 'repo1 repo2 . . .'
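
The formats described above (space-separated lists, org/name, YYYY-MM-DD) can be illustrated with a small parsing sketch. The JobInputs class, the parse_inputs helper, and the rule that either ORG_NAMES or ONLY_CHECK_THIS_REPOSITORY must be given are assumptions made for illustration; the actual workflow may validate its inputs differently or not at all.

```python
import re
from dataclasses import dataclass, field
from datetime import date, datetime
from typing import List, Optional

# Loose check for the "org/name" shape; not GitHub's exact naming rules.
REPO_PATTERN = re.compile(r"^[A-Za-z0-9_.-]+/[A-Za-z0-9_.-]+$")


@dataclass
class JobInputs:
    org_names: List[str]
    repos_to_ignore: List[str] = field(default_factory=list)
    only_check_this_repository: Optional[str] = None
    report_date: Optional[date] = None


def parse_inputs(raw):
    """Parse a mapping of parameter names to raw string values."""
    org_names = raw.get("ORG_NAMES", "").split()
    only_repo = raw.get("ONLY_CHECK_THIS_REPOSITORY", "").strip() or None

    # Assumption: at least one of the two must say what to check.
    if not org_names and only_repo is None:
        raise ValueError("Provide ORG_NAMES or ONLY_CHECK_THIS_REPOSITORY")
    if only_repo and not REPO_PATTERN.match(only_repo):
        raise ValueError("ONLY_CHECK_THIS_REPOSITORY must look like org/name")

    raw_date = raw.get("REPORT_DATE", "").strip()
    # strptime raises ValueError if the value is not a valid date.
    report_date = (
        datetime.strptime(raw_date, "%Y-%m-%d").date() if raw_date else None
    )

    return JobInputs(
        org_names=org_names,
        repos_to_ignore=raw.get("REPOS_TO_IGNORE", "").split(),
        only_check_this_repository=only_repo,
        report_date=report_date,
    )
```

For example, parse_inputs({"ORG_NAMES": "openedx edx", "REPORT_DATE": "2024-01-31"}) yields two organization names and a date object, while a malformed date raises ValueError before any repository is touched.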

How can you customize the checks for your needs?

The EDX_REPO_HEALTH_BRANCH parameter can be used to test custom changes to the script against the repositories. This can be helpful when you

  • don’t want to run the job on all the repositories together

  • need to add an additional step to the script for your needs

  • want to skip any of the default checks that your organization does not need

By default, a build run triggered against a custom branch other than master does not commit changes to the repository. If you want your changes to be committed to the target repo after running against the custom branch, you’ll need to update the branch comparison condition from master to the name of your custom branch.
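
The commit rule described above amounts to a single branch comparison. The sketch below only mirrors that rule for illustration. The real condition lives in the job itself, and the function and parameter names here are hypothetical.

```python
def should_commit(branch, commit_branch="master"):
    """Return True only when the run used the branch allowed to commit.

    commit_branch stands in for the value in the branch comparison
    condition; change it to your custom branch name to allow commits
    from runs against that branch.
    """
    return branch.strip() == commit_branch


# Default behavior: only runs against master commit results.
print(should_commit("master"))
print(should_commit("my-custom-checks"))
# After changing the comparison to the custom branch:
print(should_commit("my-custom-checks", commit_branch="my-custom-checks"))
```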