Feature Flags and Settings on edx-platform


Rules of thumb (going forward)

  1. Feature toggles. Use Waffle via waffle_utils for boolean flags to toggle or roll out a feature.
    1. Use WaffleSwitch for a simple toggle that can be dynamically changed while the server is running (example). 
      Toggle via Django Admin at /admin/waffle/switch/

    2. Use WaffleFlag for a staged rollout of the feature when you want to gradually increase the percentage of the population exposed to it (example).
      Toggle via Django Admin at /admin/waffle/flag/
    3. Use CourseWaffleFlag for supporting course-level opt-in (during staged rollout) or opt-out (after full rollout) of the feature (example).
      Toggle via Django Admin at /admin/waffle_utils/waffleflagcourseoverridemodel/

  2. Feature settings. Use Configuration models for more complex settings and non-boolean fields to configure a feature.
    1. Models.
      1. Configuration models are based on Django models.
      2. They are immutable tables where each change introduces a new row in the table.
      3. Declare your feature-specific model in your models.py file, inheriting from ConfigurationModel (example).
      4. Declare a django admin interface to your model in your admin.py (example).
    2. Site-aware. Make your config models site-aware by including a site field in the model (example).
    3. Removal. To remove a Configuration field/model, use a 2-phase deployment strategy where you remove use of it from the code before removing the field/model with a migration. See Everything About Database Migrations#Deploymentandbackward-compatiblemigrations.

  3. System settings. Use Django settings for Open edX instance/deployment-wide configurations that do not change while the service is running. For a complete walkthrough of this method, see this guide: /wiki/spaces/ENG/pages/439223874
    1. Changes. Note that any change to these settings requires a redeployment of the service. To be able to change settings dynamically, see '#2 Feature settings' above.
    2. Configuration files.
      1. Add the setting's default value to (lms|cms)/envs/common.py (example).
      2. Override the value in other relevant (lms|cms)/envs/* files (example).
      3. Make sure to update aws.py so the setting can be configurable via JSON files (example).
        1. If the setting is security-sensitive, read its value from AUTH_TOKENS
        2. Else, read the value from ENV_TOKENS
    3. Documentation. Add your setting to edX Feature Flags, along with description, notes, contact, etc. If prior discussions about the setting are on a JIRA ticket, link that in as well!
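
The configuration-file steps under "System settings" follow a default-then-override pattern. The sketch below is illustrative only: the setting names EXAMPLE_PAGE_SIZE and EXAMPLE_API_SECRET are hypothetical, and inline JSON strings stand in for the deployment's JSON files.

```python
import json

# Defaults, as they would be declared in common.py (hypothetical settings).
EXAMPLE_PAGE_SIZE = 20
EXAMPLE_API_SECRET = ""

# Inline JSON stands in for the deployment's JSON files (assumption).
ENV_TOKENS = json.loads('{"EXAMPLE_PAGE_SIZE": 50}')
AUTH_TOKENS = json.loads('{"EXAMPLE_API_SECRET": "s3cret"}')

# Non-sensitive setting: read from ENV_TOKENS, falling back to the default.
EXAMPLE_PAGE_SIZE = ENV_TOKENS.get("EXAMPLE_PAGE_SIZE", EXAMPLE_PAGE_SIZE)

# Security-sensitive setting: read from AUTH_TOKENS instead.
EXAMPLE_API_SECRET = AUTH_TOKENS.get("EXAMPLE_API_SECRET", EXAMPLE_API_SECRET)
```

Falling back to the existing value keeps the default from common.py in effect whenever a deployment's JSON does not mention the setting.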


Why are Feature Toggles used?

Case 1: Decoupling release from deployment

When teams introduce new changes or new features in the platform and they want to:

  • have control of when that feature is enabled so they can monitor it in stage/production at their own time, independent of the release cycle.
  • have the ability to disable it in case things go unexpectedly in stage/production.
  • be able to submit incremental changes without turning on the entire feature, which prevents exposure to unfinished work.
    • Note: This last reason should be used very sparingly! Instead of having a grand toggle to unlock many changes at once, consider breaking up your feature into iterative verticals that can be released a bite at a time.  See Release Toggles Are The Last Thing You Should Do.

Best Practices

  • Use WaffleSwitch.
  • Gate entry points. Don't try to protect every code path with a toggle. Focus on checking for the toggle on just entry points that would lead users to the change/feature.
  • Tests. Run all tests with the waffle switch enabled, since that is its expected future state.  Have at least a few tests that verify the functionality when the waffle switch is disabled.
  • Removal. Once the feature is field-tested in production, the feature toggle should be removed from both (a) all places in the code and (b) the production/stage configurations. Plan ahead by creating a clean-up story ahead of time.
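
As an illustration of the "gate entry points" practice, here is a minimal sketch. It does not use the waffle_utils API: the SWITCHES dictionary, switch_is_enabled, and the switch name example.new_dashboard are hypothetical stand-ins.

```python
from typing import Dict, List

# Hypothetical stand-in for a switch store; not the waffle_utils API.
SWITCHES: Dict[str, bool] = {"example.new_dashboard": False}


def switch_is_enabled(name: str) -> bool:
    """Return the switch value, treating unknown switches as disabled."""
    return SWITCHES.get(name, False)


def render_new_dashboard() -> str:
    # No toggle check here: users can reach this only through a gated entry point.
    return "new dashboard"


def navigation_links() -> List[str]:
    """Entry point: the only place that checks the switch."""
    links = ["home", "courses"]
    if switch_is_enabled("example.new_dashboard"):
        links.append("new-dashboard")
    return links
```

Because only navigation_links checks the switch, removing the toggle later means touching a single function.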

Case 2: Staged Rollout

Use this case when teams introduce new changes or new features in the platform and want to control the population affected by the change or exposed to the new feature.  Here are some situations in which this occurs:

  • The team is concerned about possible performance/scalability degradation or functional correctness issues and they want to monitor the impact on stage/production by gradually increasing the load.
  • The team wants to Beta-test the feature in Production with individuals on the development team.
  • The team wants to Beta-test the feature in Production with a few brave course teams that have opted-in prior to making the feature available to all courses.

Best Practices

  • Gate entry points. Don't try to protect every code path with a toggle. Focus on checking for the toggle on just entry points that would lead users to the change/feature.
  • Tests. Run tests with both variations, on and off, to verify the functionality continues to work during the rollout transition phase.
  • Removal. Once the feature is fully adopted in production, the feature toggle should be removed from both (a) all places in the code and (b) the production/stage configurations. Plan ahead by creating a clean-up story ahead of time.

  1. Early Beta. Beta test in Production with individual users, courses, or percentage of population (both internal and external).  Implement with CourseWaffleFlag or WaffleFlag as described above. Prepare to remove the flag after full rollout.  Track this with a JIRA ticket.
  2. Gradual Rollout. Add courses in the Beta via CourseWaffleFlag or add percentage of users in the population via WaffleFlag.
  3. Full Rollout. Turn the feature on for everyone on the site (e.g., courses.edx.org). CourseWaffleFlag could be used to turn the feature off for a course that chooses to opt-out. But this delays completion of the feature, so use sparingly.
  4. All Environments. Other environments (e.g. edge.edx.org) may lag (or be ahead). Turn the feature on for environments that aren't yet using the flag and monitor any unexpected impact.
  5. Clean Up. The feature has been rolled out to all edX environments. Remove the toggle from all code and configurations. If there is a strong reason to enable the Open edX community to have the same rollout capabilities, consider delaying clean-up until after the next Open edX release.
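
The way a percentage rollout and course-level overrides combine can be sketched as follows. This is not how Waffle itself selects users: deterministic hash bucketing is swapped in here so the example is reproducible, and user_bucket, is_enabled, the flag name, and the course IDs are all hypothetical.

```python
import hashlib
from typing import Dict, Optional


def user_bucket(flag_name: str, user_id: int) -> int:
    """Map a user to a stable bucket in [0, 100) for a given flag."""
    digest = hashlib.sha256(f"{flag_name}:{user_id}".encode("utf-8")).hexdigest()
    return int(digest, 16) % 100


def is_enabled(
    flag_name: str,
    user_id: int,
    percent: int,
    course_id: Optional[str] = None,
    course_overrides: Optional[Dict[str, bool]] = None,
) -> bool:
    """Decide whether the feature is on for this user (and optional course)."""
    # A course-level override, when present, wins over the percentage:
    # True models opt-in during rollout, False models opt-out afterwards.
    if course_id is not None and course_overrides and course_id in course_overrides:
        return course_overrides[course_id]
    # Otherwise the user is included once the percentage reaches their bucket.
    return user_bucket(flag_name, user_id) < percent
```

With stable buckets, raising percent from 10 to 50 only adds users; nobody who already had the feature loses it. Setting percent to 100 corresponds to the full rollout stage, where a False course override models an opted-out course.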

Case 3: Configuration option for Open edX

Use this case when teams introduce new changes or new features that are not expected to be adopted by all Open edX instances.  This case may co-exist with Case #1 or Case #2 above, so use the guidelines from those cases to decide which Waffle technology to use.

Best Practices

  • Think twice about whether or not this option is necessary. There is a large cost to supporting long-term toggles. Although you can avoid hard decisions by just providing a toggle and letting others decide for themselves, code complexity increases with each unnecessary alternative path.
  • Consider alternative design patterns such as plug-in architectures and other extensible mechanisms that don't require code-level toggles. 

Audit Trail of Changes

Waffle Changes

The django_admin_log table keeps a history and audit trail of all WaffleFlag and WaffleSwitch changes made via Django Admin.

WaffleSwitch changes can be queried via

WaffleFlag changes can be queried via

CourseWaffleFlag changes can be queried via
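
A rough sketch of what such a query might look like. It assumes the standard Django django_admin_log and django_content_type tables, and it assumes that the app label and model names match the admin URLs listed earlier (waffle/switch, waffle/flag, waffle_utils/waffleflagcourseoverridemodel). The example builds a throwaway SQLite database with made-up rows; production uses MySQL, where the parameter placeholder syntax differs.

```python
import sqlite3

# Throwaway in-memory database mimicking the two Django tables involved.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE django_content_type (
        id INTEGER PRIMARY KEY, app_label TEXT, model TEXT
    );
    CREATE TABLE django_admin_log (
        id INTEGER PRIMARY KEY, action_time TEXT, user_id INTEGER,
        content_type_id INTEGER, object_repr TEXT, change_message TEXT
    );
    INSERT INTO django_content_type VALUES
        (1, 'waffle', 'switch'),
        (2, 'waffle', 'flag'),
        (3, 'waffle_utils', 'waffleflagcourseoverridemodel');
    -- Made-up sample rows for illustration only.
    INSERT INTO django_admin_log VALUES
        (1, '2017-01-01 10:00:00', 7, 1, 'example.switch', 'Changed active.'),
        (2, '2017-01-02 11:00:00', 7, 2, 'example.flag', 'Changed percent.'),
        (3, '2017-01-03 12:00:00', 9, 2, 'example.flag', 'Changed everyone.');
    """
)

# Join the admin log to the content type table to select one kind of toggle.
AUDIT_QUERY = """
SELECT log.action_time, log.user_id, log.object_repr, log.change_message
FROM django_admin_log AS log
JOIN django_content_type AS ct ON ct.id = log.content_type_id
WHERE ct.app_label = ? AND ct.model = ?
ORDER BY log.action_time
"""


def audit_trail(app_label: str, model: str) -> list:
    """Return admin log rows for one toggle type, oldest first."""
    return conn.execute(AUDIT_QUERY, (app_label, model)).fetchall()
```

For example, audit_trail("waffle", "flag") returns the two sample flag changes, and audit_trail("waffle_utils", "waffleflagcourseoverridemodel") returns an empty list because no sample rows exist for course overrides.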

Configuration Model Changes

As mentioned above, Configuration models are append-only, so changes to the settings via Django Admin are kept in each model's own table, with a new row added for each change.
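
The append-only behavior can be illustrated without Django. The following is a minimal sketch, not the ConfigurationModel implementation: ConfigRow, AppendOnlyConfig, and the max_items field are hypothetical, and the sketch assumes the current configuration is simply the most recently added row.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List, Optional


@dataclass(frozen=True)
class ConfigRow:
    """One immutable row; hypothetical stand-in for a configuration model row."""
    change_date: datetime
    enabled: bool
    max_items: int  # example of a non-boolean setting


class AppendOnlyConfig:
    """Hypothetical in-memory table whose rows are only ever appended."""

    def __init__(self) -> None:
        self._rows: List[ConfigRow] = []

    def change(self, enabled: bool, max_items: int) -> ConfigRow:
        """Record a change by adding a new row; existing rows are untouched."""
        row = ConfigRow(datetime.now(timezone.utc), enabled, max_items)
        self._rows.append(row)
        return row

    def current(self) -> Optional[ConfigRow]:
        """The latest row is the active configuration."""
        return self._rows[-1] if self._rows else None

    def history(self) -> List[ConfigRow]:
        """Every earlier row remains available as the audit trail."""
        return list(self._rows)


config = AppendOnlyConfig()
config.change(enabled=False, max_items=10)
config.change(enabled=True, max_items=25)
```

After the two changes, config.current() reflects the second change while config.history() still holds both rows, which is what makes the table usable as an audit trail.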

Coding Tips

Testing with Waffle

Python Unit Tests

  • For WaffleFlag and CourseWaffleFlag, use the override_waffle_flag decorator implemented in this testutils.py file.
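
To show the shape of such tests without depending on edx-platform, here is a self-contained sketch. The override_flag helper below is a hypothetical stand-in written for this example, not the override_waffle_flag decorator from testutils.py; the FLAGS store, the flag name, and greeting are likewise made up.

```python
import unittest
from contextlib import contextmanager

# Hypothetical stand-in flag store; not the waffle_utils API.
FLAGS = {"example.new_feature": False}


def flag_is_enabled(name):
    return FLAGS.get(name, False)


@contextmanager
def override_flag(name, active):
    """Force a flag value for the duration of a test, then restore it."""
    missing = object()
    previous = FLAGS.get(name, missing)
    FLAGS[name] = active
    try:
        yield
    finally:
        if previous is missing:
            del FLAGS[name]
        else:
            FLAGS[name] = previous


def greeting():
    """Code under test: behaves differently depending on the flag."""
    return "new" if flag_is_enabled("example.new_feature") else "old"


class GreetingTests(unittest.TestCase):
    # Context managers built with contextlib can also be used as decorators.
    @override_flag("example.new_feature", active=True)
    def test_flag_on(self):
        self.assertEqual(greeting(), "new")

    @override_flag("example.new_feature", active=False)
    def test_flag_off(self):
        self.assertEqual(greeting(), "old")
```

Testing both states explicitly keeps the tests independent of whatever value the flag happens to have in the test database.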

Bokchoy Tests

  • Bokchoy tests cannot use the decorator because the server is separate from the test code.
  • To override in the URL, see the External Test Suites section of the Waffle documentation. Read the following important details as well.
    • In order to override a flag, it must first exist in the database.
      • In edx-platform, you can temporarily create a record in common/test/db_fixtures/waffle_flags.json that will be loaded directly into MySQL for bok-choy tests only.  Note that you can default the flag on or off depending on your needs.
      • Other teams temporarily create a migration which will create the flag in all environments, including Production. 
    • Here is some example code reloading a page with a waffle flag set to a different value.
    • Note: In edx-platform, the WAFFLE_OVERRIDE setting is already taken care of in bokchoy to enable this type of URL override.
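
A small helper for building such URLs might look like the sketch below. It assumes, per the Waffle documentation's description of WAFFLE_OVERRIDE, that a query parameter named after the flag with a value of 1 or 0 forces the flag on or off. The function name url_with_flag, the flag name, and the URLs are hypothetical.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def url_with_flag(url: str, flag_name: str, active: bool) -> str:
    """Return url with the flag override set, replacing any earlier override."""
    parts = urlsplit(url)
    # Keep all existing parameters except a previous value for this flag.
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != flag_name
    ]
    # Assumption: "1" forces the flag on and "0" forces it off.
    query.append((flag_name, "1" if active else "0"))
    return urlunsplit(parts._replace(query=urlencode(query)))
```

A test would then reload the page at, for instance, url_with_flag(page_url, "example.new_feature", False) to see the flag-off behavior, provided the flag already exists in the database as noted above.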

Pending Infrastructural Work (up for grabs)

  1. Extract the waffle_utils djangoapp out of edx-platform into its own repo so it can be used by other IDAs.
  2. Figure out how to enable a waffle switch/flag for ALL unit tests in the platform - without using a context manager.
  3. Enhance waffle_utils to be site-aware so toggles can be overridden for White Label sites.
  4. Implement a CourseWaffleSwitch equivalent of CourseWaffleFlag when the need arises.
  5. Better tools for (1) duplicating settings from one system to another and (2) initializing settings in a system.

Reference

Glossary

  • waffle_utils: A django app (currently within edx-platform) that provides a set of Waffle-related utilities, including CourseWaffleFlag and WaffleFlag.  It is also an explicit abstraction layer for toggling of features.  Use the interfaces provided in this library instead of using Waffle directly.

  • WaffleFlag: A class within waffle_utils that supports request-caching and name-spacing.  Use this for all waffle flags outside a course.

  • CourseWaffleFlag: A class that supports Course-level overrides of WaffleFlags.