Flaky Test Process

In June 2024, this page was copied from 2U’s private Confluence to here, so that it could be adapted/adopted for the community, as was originally intended. The page requires clean-up, and could become an OEP or another type of document at this point.


Copied content begins here:

 

This updated process went into effect August 2017.

If you are familiar with the old Flaky Test Process: we have changed over to a zero-tolerance process, rather than one that attempts to tolerate flaky tests.

Overview

Having dependable test suites is a requirement for decreasing time to value through more frequent edx-platform deployments. This means flaky tests are no longer an acceptable nuisance. Instead, our tests need to be 100% trustworthy in our Continuous Integration (CI) and (near-)Continuous Deployment (CD) systems.

What is a flaky test? A flaky test is one that sometimes passes and sometimes fails. Most are bok-choy acceptance tests, which exercise user flows through the site. Most flaky tests are flaky because of how the test was written, not because of an actual bug.
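As an illustrative sketch (all names here are hypothetical, not from the codebase), the classic flaky pattern is a fixed sleep racing against an asynchronous operation; polling with a generous timeout is the robust alternative:

```python
import random

def async_render_time_ms():
    """Stand-in for a page render whose duration varies from run to run."""
    return random.uniform(50, 500)

def fixed_wait_test():
    """Flaky pattern: wait a fixed time, then check the result.
    Fails whenever the render happens to take longer than the wait."""
    wait_ms = 200
    return async_render_time_ms() <= wait_ms

def polling_test():
    """Robust pattern: allow up to a generous timeout instead of
    asserting after an arbitrary fixed wait."""
    timeout_ms = 1000
    return async_render_time_ms() <= timeout_ms
```

Run repeatedly, `fixed_wait_test` sometimes passes and sometimes fails on the exact same code, which is the definition of flaky above; `polling_test` passes every time.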

The current process is now to delete the flaky test using the process defined below.

Consequences of this process

This updated process has the following consequences:

  • Test suites become dependable. We can rely on them for deployments. You can rely on them with your PRs.

  • We save lots of people and compute resources that were wasted on flaky tests.

  • We no longer pretend that flaky tests are a safety net against bugs.

  • We accept a potential increase in risk in exchange for improved time to value.

  • Product development teams will continue to balance the risk vs reward of fixing the test, and determine how to move forward, without the above costs.

Given these consequences, this seems like a reasonable process to experiment with. The plan is to revisit this process in October 2017 to determine whether and how it needs to be improved.

How do I know I have encountered a flaky test?

Before you type "jenkins run xxx", copy the link to the failed build just in case you need to file a ticket. Alternatively you can later go to the job in Jenkins (e.g. the edx-platform-bok-choy-pr job) and find the old builds for your PR using the search bar in the Build History pane.

If you have encountered a test that both fails and passes on the same commit in Jenkins, then the test is flaky.

If the flakiness is related to the code changes in your PR:

  • Ensure you didn't introduce a new flaky test. If so, fix it.

  • Ensure your code doesn't have a bug related to timing. If it does, fix the bug.

  • Ensure your code didn't cause a test to become flaky. If it did, either fix the test or follow this process as appropriate.

If the flaky test is unrelated to your code changes, follow the rest of this process in order.

Step 1: File a flaky bug ticket in Jira:

  1. Check if someone is already following this flaky test process at the same time:

    1. Search for an existing JIRA ticket about this issue. It's probably easiest to use the Known Flaky Tests JIRA query.

    2. Search on Github to see if the test was already deleted.

  2. Create (or update) a JIRA ticket with the following information:

    1. File a ticket here

    2. Summary field: Something like "<testclass testcase> fails intermittently"

    3. Labels: At a minimum "flaky_test" (this will make it show up using the Known Flaky Tests JIRA query)

    4. Platform Area: Platform Areas & Product Components: Full Listing (if you only know the platform area and not the product component, that's fine!)

    5. Description: Include the following:

      1. A link to a failed build in Jenkins.

        1. Pin the build by pressing “Keep this build forever“; otherwise it will soon disappear.

      2. A link to a passing build in Jenkins for the same commit.

      3. The Error Message and Stacktrace for a failed test.

        1. You should include this text in the JIRA ticket so it is searchable to others.

        2. Wrap each of these in a `{noformat}` macro so it shows up nicely in JIRA.

      4. (optional but super, super helpful) For BokChoy tests, include the screenshot that was captured at the time that the test failed. To find this:

        1. In the center pane of your Jenkins build, look at the section entitled "Test Result". Each failure will have a line in the following format: "Run tests / <number> / <test name>". The number references the shard on which the test was run.

        2. Click the "Build Artifacts" link in the center.

        3. Navigate to "test_root/log/shard_<shard number>".

        4. Download the .png and attach it to the JIRA ticket.

      5. Putting it all together you'll be entering something like this:

        This test fails intermittently and has been removed from the codebase.

        For triaging this bug:
        * Until proven one way or the other, it could be either the code that is flaky or the test that is flaky.
        * The test has been removed from the codebase, and thus the functionality is no longer covered by a bok-choy test.
        ** Evaluate whether or not it is or can be covered by a lower level (e.g. python or JS unit test) and thus the test was unnecessary, or if there is now risk that a bug in the code could escape and thus the test should be fixed and re-enabled.

        TestClass:test_case failed in [this build on jenkins | https://build.testeng.edx.org/job/edx-platform-all-tests-master-flow/999/] and passed in the [subsequent master build | {link}].

        Error Message
        {noformat}
        Whatever the error message was for the testcase.
        {noformat}

        Stacktrace
        {noformat}
        The stacktrace from the test result.
        {noformat}

Step 2: Make a PR deleting the flaky test(s):

  1. Delete the flaky test(s) in a single commit in a new PR (so that it is easily cherry-picked by others.)

    1. Include the Jira ticket ID in your commit message, e.g.:

      test: Delete flaky test TEST::METHOD

      Deleted according to flaky test process:
      https://2u-internal.atlassian.net/wiki/spaces/TE/pages/12812492/Flaky+Test+Process

      Flaky test ticket: CR-99999999
    2. Note that temp: can be used in place of test: for the commit type if you expect the test to be fixed and restored.

    3. If you have any helpful thoughts to add, feel free to comment on the PR.

  2. Place a link to the PR back in your Jira ticket:

    The test was [deleted in this PR|https://github.com/edx/edx-platform/pull/99999].
  3. Get a passing build and at least 1 review before merging to master.

    1. Note: It would be best if the reviewer has some idea about the relative importance of the test and could help prioritize the ticket to fix the test.

Handling a flaky test JIRA ticket

If you are reviewing a flaky test JIRA ticket, you may want to consider the following:

  • Is the test necessary? For example, is it a bok-choy test that makes more sense in a lower part of the pyramid (e.g. python or javascript unit tests)?

  • Especially in bok-choy, is the test covering a longer flow where only one part is flaky? Maybe you can reduce how much of the test needs to be deleted.

  • If the test is of debatable usefulness, consider time-boxing the effort to fix and closing the ticket if it takes too long.

When it’s no longer needed, unpin the Jenkins build by pressing “Don’t keep this build forever“.

Tips for fixing flaky bok-choy tests

  • See Why is my bok choy test flaky? for some common root causes of flakiness.

  • See Examples of Fixes for Flaky Tests for some examples of how flaky tests have been fixed in the past. Feel free to add to this.

  • Use the flaky decorator with the min_passes and max_runs options to run the test multiple times in a row. This will allow you to run the test multiple times locally more quickly, or on Jenkins with only one build. Do not forget to remove the decorator before merging, though.

  • If you are running this locally, and have been working in the codebase for a while, you might remember we used to need to run with the paver argument --extra_args=--with-flaky. This is no longer necessary because we have switched from nose to pytest as the test runner.

  • Due to setup-related issues, and depending on how the test was written, a second attempt at the same test case will in some cases always fail. You can check whether this is the case by temporarily instructing the test to run twice, e.g. @flaky(max_runs=2, min_passes=2), and running it in devstack.

  • It is suggested that you update and write tests using a browser other than Firefox. Note that Chrome and PhantomJS come for free with devstack, but there are a couple of gotchas.

  • For a fast debug/make changes/test changes cycle, use these tips on running in Pycharm.
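The min_passes/max_runs semantics mentioned in the tips above can be sketched with a small stand-in for the plugin's retry logic (this helper is hypothetical; the real decorator comes from `from flaky import flaky` and is applied as e.g. `@flaky(max_runs=10, min_passes=10)`):

```python
# Hypothetical stand-in illustrating the flaky plugin's retry semantics.
def run_with_retries(test_fn, max_runs=2, min_passes=1):
    """Run test_fn up to max_runs times. Report success once it has
    passed min_passes times, and failure as soon as min_passes can no
    longer be reached with the remaining runs."""
    passes = 0
    for run in range(1, max_runs + 1):
        try:
            test_fn()
            passes += 1
        except AssertionError:
            pass
        if passes >= min_passes:
            return True
        if passes + (max_runs - run) < min_passes:
            return False
    return False
```

With max_runs=2 and min_passes=2, a test whose second run always fails (the setup-related issue described above) reports failure, exposing the problem even though a single run would have passed.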

The @skip decorator

The @skip decorator is appropriate only in rare cases. For example, tests might be skipped when they don't apply to a subclass, as in this sample code.

For all other cases, you probably want to follow a process similar to this flaky test process and delete the test from our codebase.

Splunk report for test flakiness