Flaky Test Process
Overview
Having dependable test suites is a requirement for decreasing time to value through more frequent edx-platform deployments. This translates to flaky tests no longer being an acceptable nuisance. Instead, our tests need to be 100% trustworthy in our Continuous Integration (CI) and (near)-Continuous Deployment (CD) systems.
What is a flaky test? A flaky test is one that sometimes passes and sometimes fails. Most flaky tests are flaky because of how the test was written, and not due to an actual bug. However, that is not a certainty.
The flaky test process is to delete the flaky test, following the steps defined below.
Consequences of this process
This process has the following consequences:
Test suites become dependable. We can rely on them for deployments. You can rely on them with your PRs.
We save lots of people and compute resources that were wasted on flaky tests.
We no longer pretend like flaky tests are a safety net against bugs.
We accept a potential increase in risk in exchange for improved time to value.
Product development teams will continue to balance the risk vs reward of fixing the test and determine how to move forward, without incurring the above costs.
How do I know I have encountered a flaky test?
If you have encountered a test that both fails and passes on the same commit, then the test is flaky.
If the flakiness is related to the code changes in your PR:
Ensure you didn't introduce a new flaky test. If you did, fix it.
Ensure your code doesn't have a bug related to timing. If it does, fix the bug.
Ensure your code didn't cause a test to become flaky. If it did, either fix the test or follow this process as appropriate.
If the flaky test is unrelated to your code changes, follow the rest of this process in order.
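Before filing a ticket, it can help to confirm flakiness by rerunning the suspect test repeatedly on the same commit and tallying the outcomes. A minimal sketch; the `pytest` invocation and test id in the trailing comment are hypothetical examples:

```python
import subprocess


def tally_runs(cmd, runs=20):
    """Run `cmd` repeatedly and count passing (exit code 0) vs failing runs.

    A test that both passes and fails on the same commit is flaky.
    """
    tally = {"passed": 0, "failed": 0}
    for _ in range(runs):
        result = subprocess.run(cmd, capture_output=True)
        tally["passed" if result.returncode == 0 else "failed"] += 1
    return tally


# Example (hypothetical test id):
# tally_runs(["pytest", "-q", "path/to/test.py::TestClass::test_method"])
```

If the tally shows both passes and failures, you have the evidence (and output) needed for the ticket in Step 1.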
Step 1: File a flaky bug ticket in Github:
Check if someone is already following this flaky test process at the same time:
Search for an existing Github ticket about this issue using the `flaky-test` label.
Search on Github to see if the test was already deleted.
Create (or update) a Github issue:
Create a new issue in the repo containing the test code
Title:
[Flaky test] `path/to/test.py::class::method`
Labels: `flaky-test`
Description: Paste the following template:
This test fails intermittently and will be deleted according to the Flaky Test Process in <https://github.com/openedx/edx-platform/pull/___PENDING___>

See docs on [how to address flaky tests](https://openedx.atlassian.net/wiki/spaces/AC/pages/4306337795/Flaky+Test+Process#Handling-a-flaky-test-Github-issue) for why this should be fixed and how to go about it.

- Failing CI run: https://...
- Subsequent Passing CI run: https://...

Failure output:

```
...test failure...
```
…and fill out these placeholders:
URL of failed test run in GitHub Actions.
URL of passing run for the same commit.
Output from the failing test, including error message, stack trace, and anything else needed to recognize the same failure again. Build logs aren’t kept forever, and this also makes the text searchable.
Now we can remove the test!
Step 2: Make a PR deleting the flaky test(s):
Delete the flaky test(s) in a single commit in a new PR (so that it is easily cherry-picked by others).
In the commit message, link to the GitHub issue you just created, e.g.:
test: Delete flaky test TEST::METHOD

Deleted according to flaky test process:
https://openedx.atlassian.net/wiki/spaces/AC/pages/4306337795/Flaky+Test+Process

Flaky test ticket:
- https://github.com/openedx/edx-platform/issues/_____
Note that `temp:` could be used in place of `test:` for the commit type if you assume it will be fixed and restored.
If you have any helpful thoughts to add, feel free to comment on the PR.
Update your Github issue to link to the PR (replace the `___PENDING___` placeholder in the first line).
Get a passing build and at least 1 review before merging to master.
Note: It would be best if the reviewer has some idea about the relative importance of the test and could help prioritize the ticket to fix the test.
Handling a flaky test Github issue
If you are reviewing a flaky test Github issue, you may want to consider the following:
Is the test necessary? For example, is it a cypress test that makes more sense in a lower part of the pyramid (e.g. python or javascript unit tests)?
Is the test covering a longer flow where only one part is flaky? Maybe you can reduce the amount of the test that needs to be deleted.
Is it only the test that is flaky, or is the code under test itself non-deterministic?
If the test is of debatable usefulness, consider time-boxing the effort to fix and closing the ticket if it takes too long.