Teak - Operator/Dev Notes

The 20th Open edX community release will be named Teak. Consult the Open edX Release Schedule for details on when the release master branch will be cut and when the release itself will occur.

Put items here that we need to remember when we start packaging Teak. Information that system installers or operators will need to know is especially important. Please include your name when you add an item so that we can follow up with you about questions.

Operational

  • In LMS and CMS, Celery now uses task protocol 2. (@Tim McCormack)

    • Action: Any operator using custom Celery tooling should ensure it is compatible with protocol 2. For other operators, no action is required.

    • Background: Celery 4.0 changed the structure of task messages; the new format is called protocol 2. The Celery versions we currently use (4.0 and later) can both create and consume messages in either protocol version, so it should be safe to switch between them with zero downtime.

      • By default, Celery 4.0 and higher produce messages in this format, and Celery 3.1.25 and higher can read messages in this format.

      • edx-platform was pinned to protocol 1 during the upgrade to Celery 4, presumably as a precaution. This change is the long-delayed unpinning of the protocol version so that Celery can use its default version.

      • Operators can still override the protocol version using the Django setting CELERY_TASK_PROTOCOL, although there is no guarantee that protocol 1 compatibility will be preserved in the future.
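Custom tooling that reads raw task messages (queue inspectors, replay scripts, and the like) may need to accept both layouts while any remaining protocol 1 messages drain from the queues. The following is a minimal sketch, assuming the message has already been decoded into a headers dict and a body. It follows the same rule Celery's worker uses to tell the formats apart: protocol 2 carries the task name in the headers, while protocol 1 carries it in the body. The helper name parse_task_message is hypothetical, not a Celery API.

```python
# Sketch: extract task details from a Celery message produced with either
# task protocol. Assumes the message is already decoded (e.g. from JSON)
# into a headers dict and a body; transport details are out of scope.
# parse_task_message is a hypothetical helper, not part of Celery.
from typing import Any


def parse_task_message(headers: dict, body: Any) -> tuple:
    """Return (task_name, task_id, args, kwargs) for a protocol 1 or 2 message."""
    if "task" in headers:
        # Protocol 2: task metadata is in the headers, and the body is the
        # three-item sequence (args, kwargs, embed).
        args, kwargs, _embed = body
        return headers["task"], headers["id"], list(args), dict(kwargs)
    # Protocol 1: the task name, ID, and arguments all live in the body dict.
    return body["task"], body["id"], list(body.get("args", [])), dict(body.get("kwargs", {}))
```

For example, a protocol 2 message with headers {"task": "example.tasks.ping", "id": "abc"} and body [[1], {}, {}] yields the same tuple as a protocol 1 body {"task": "example.tasks.ping", "id": "abc", "args": [1], "kwargs": {}} (the task name here is made up).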

  • When codejail is used by LMS and CMS, it no longer requires write access to the sandbox virtualenv .config or .cache directories. (@Tim McCormack)

    • Action: If you run codejail, we recommend removing write permissions for <SANDENV>/.config and <SANDENV>/.cache from your AppArmor profile, if possible.

    • Background: In Sumac and earlier, running import matplotlib in a custom Python-evaluated XBlock required the AppArmor profile to allow write access to one of these directories. In Teak, edxapp sets the MPLCONFIGDIR environment variable for code sent to codejail, so matplotlib now writes to the ./tmp/ subdirectory inside the codejail-created sandbox.

      • You should be able to identify these exclusions by looking for lines like /home/sandbox/.config/ wrix, although the exact parent directory may vary. Some profiles may instead allow writes to other temporary directories, such as /tmp. Any write permission to a global directory is inadvisable, since it reduces codejail's ability to sandbox code effectively. Removing these lines in Teak will (appropriately) reduce the permissions of sandboxed code. Do not remove them before upgrading to Teak, however, as matplotlib will then fail to load.

      • Operators who have not previously needed to support matplotlib in instructor or learner code may not have these exclusions in their AppArmor configurations. If this is your situation, no action is required.

      • Removing these lines may cause other, unanticipated failures in sandboxed code. Monitor your codejail logs and failure rates when deploying this change.
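To find candidate lines, a rough scan of the profile text can help. This sketch is not an AppArmor parser: it assumes simple one-line file rules of the form path permissions, (optionally prefixed with owner), flags those that grant write (w) or append (a) access to .config, .cache, or /tmp paths, and skips every other rule form. The function name find_write_exclusions is hypothetical. Review each hit by hand before removing it.

```python
import re

# Matches simple AppArmor file rules such as "/home/sandbox/.config/ wrix,"
# or "owner /tmp/** rw,". Other rule forms (deny, audit, network, ...) are
# deliberately not matched by this sketch.
FILE_RULE = re.compile(r"^\s*(?:owner\s+)?(/\S+)\s+([a-zA-Z]+)\s*,")

# Directories called out in the Teak notes; extend to suit your profile.
SUSPECT_PATHS = (
    re.compile(r"/\.config(/|$)"),
    re.compile(r"/\.cache(/|$)"),
    re.compile(r"^/tmp(/|$)"),
)


def find_write_exclusions(profile_text: str) -> list:
    """Return (line_number, rule) pairs granting write or append access to suspect paths."""
    hits = []
    for lineno, line in enumerate(profile_text.splitlines(), start=1):
        match = FILE_RULE.match(line)
        if not match:
            continue
        path, perms = match.groups()
        grants_write = "w" in perms or "a" in perms
        if grants_write and any(p.search(path) for p in SUSPECT_PATHS):
            hits.append((lineno, line.strip()))
    return hits
```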

  • New feature: Codejail local/remote darklaunch (@Tim McCormack)

    • Audience: Deployers who support codejail (e.g. custom Python-graded problem blocks) and are not already using a remote codejail service.

      • This is not relevant to Tutor, which does not support local codejail.

    • Background: Historically, codejail code has been executed on the same hosts as LMS and CMS, an arrangement known as “local codejail”. The new codejail-service allows this code to be executed remotely instead. Running it on separate hosts allows additional security restrictions to be applied, and the new code also includes several security enhancements.

    • Purpose: The darklaunch feature allows operators to gain confidence in preparing for a switch from local to remote codejail. When enabled, it can send all codejail executions to both local and remote codejail, while only using the results of the local execution and suppressing all errors from the remote side. This allows operators to discover issues in the remote service’s configuration under real production traffic conditions.

    • Usage: To use darklaunch to switch from local to remote:

      1. Create a codejail-service cluster.

      2. Configure LMS and CMS to call it by setting CODE_JAIL_REST_SERVICE_HOST. Do not enable ENABLE_CODEJAIL_REST_SERVICE yet; it must remain disabled for the moment.

      3. Begin the dark launch by setting ENABLE_CODEJAIL_DARKLAUNCH to true. Traffic will begin flowing to the new service, but the results will be ignored.

        • The only user-visible impact should be that codejail executions take roughly twice as long, since the local and remote executions are performed serially.

      4. Observe telemetry to discover errors and behavior mismatches.

        • Mismatches can include:

          • One side failed to execute entirely (“unexpected error”) while the other did not. This might include network issues.

          • One side returned an error from the submitted code, while the other did not, or produced a different error.

          • Both sides succeeded, but the returned globals dictionaries differed.

        • Error and warning logs from safe_exec.py in edxapp that contain “codejail darklaunch” report configuration problems, unexpected errors, and behavior mismatches between the two environments.

        • Span-based telemetry (New Relic, Datadog, etc.) can be used to track mismatch rates and break them down by course ID and mismatch type. For the available attributes, see the set_custom_attribute calls in safe_exec.py whose attribute names start with “codejail.”. The local-only, remote-only, and local/remote darklaunch calls also have distinct span names, e.g. safe_exec.remote_exec_darklaunch.

        • Use CODEJAIL_DARKLAUNCH_EMSG_NORMALIZERS to normalize away spurious error-message mismatches between the environments. Not every mismatch can be removed this way; ordering differences in sets, for example, cannot be readily ignored.

      5. Once behavior and performance differences are resolved, remove ENABLE_CODEJAIL_DARKLAUNCH and set ENABLE_CODEJAIL_REST_SERVICE to true. This completes the migration: codejail executions will then be performed only on the remote service.
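The darklaunch comparison described above can be sketched as follows. This is not edxapp's safe_exec.py code: ExecResult, classify_darklaunch, and run_with_darklaunch are hypothetical names, and the normalizers are assumed here to be (regex, replacement) pairs applied to error messages; check the CODEJAIL_DARKLAUNCH_EMSG_NORMALIZERS annotation in edx-platform for the format it actually expects. The sketch runs the two sides serially (which is why latency roughly doubles), always returns the local result, and turns any remote exception into a reported mismatch rather than a learner-facing failure.

```python
# Hypothetical sketch of darklaunch comparison; names and data shapes are
# illustrative and do not mirror edxapp's safe_exec.py.
import re
from dataclasses import dataclass, field
from typing import Callable, Iterable, Optional, Tuple


@dataclass
class ExecResult:
    """Outcome of one codejail execution."""
    unexpected_error: bool = False  # execution itself failed (e.g. network issue)
    emsg: Optional[str] = None      # error raised by the submitted code, if any
    globals_dict: dict = field(default_factory=dict)


Normalizer = Tuple[str, str]  # ASSUMED shape: (regex, replacement)


def normalize_emsg(emsg: Optional[str], normalizers: Iterable[Normalizer]) -> Optional[str]:
    """Apply each regex substitution, in order, to an error message."""
    if emsg is None:
        return None
    for pattern, replacement in normalizers:
        emsg = re.sub(pattern, replacement, emsg)
    return emsg


def classify_darklaunch(local: ExecResult, remote: ExecResult,
                        normalizers: Iterable[Normalizer] = ()) -> Optional[str]:
    """Return a mismatch label, or None if the two sides agree."""
    if local.unexpected_error != remote.unexpected_error:
        return "unexpected_error_mismatch"
    if local.unexpected_error:
        return None  # both sides failed outright; nothing more to compare
    normalizers = list(normalizers)  # may be iterated twice below
    if normalize_emsg(local.emsg, normalizers) != normalize_emsg(remote.emsg, normalizers):
        return "emsg_mismatch"
    # Both succeeded (no error on either side): compare returned globals.
    if local.emsg is None and local.globals_dict != remote.globals_dict:
        return "globals_mismatch"
    return None


def run_with_darklaunch(run_local: Callable[[], ExecResult],
                        run_remote: Callable[[], ExecResult],
                        normalizers: Iterable[Normalizer] = (),
                        report: Callable[[str], None] = print) -> ExecResult:
    """Run both sides serially; only the local result is ever returned."""
    local = run_local()
    try:
        remote = run_remote()
    except Exception as exc:  # never let the remote side affect learners
        remote = ExecResult(unexpected_error=True, emsg=str(exc))
    mismatch = classify_darklaunch(local, remote, normalizers)
    if mismatch is not None:
        report(mismatch)
    return local
```

Under the assumed shape, a hypothetical pair such as (r"/tmp/[a-z0-9_-]+", "/tmp/X") would make two error messages that differ only in a randomly named temporary directory compare equal.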

Deprecations and Removals

Default Changes for Teak

Notes for Release Manager (not for release notes)

  • Certain repos that are transitive dependencies had previously been tagged for release in error because of docs builds. These repos have open-release/sumac.master branches but will not have teak.master branches. For details, see “Link tag versions of repos to the named release builds” (openedx/docs.openedx.org issue #941).

  • Product has reviewed the newly added settings and feature toggles and recorded in this sheet how we’d like them configured for the release and for the release testing sandbox.

  • (New feature) In-Context Metrics in Studio

    • Configuration instructions for the Teak Release

      • Upgrade tutor-contrib-aspects to v2.2.1

      • Add the setting ASPECTS_ENABLE_STUDIO_IN_CONTEXT_METRICS = False to the openedx-cms-common-settings Tutor patch

      • Run tutor config save

      • Rebuild the MFE container

    • Configuration instructions for the Teak Testing Sandbox

      • Upgrade tutor-contrib-aspects to v2.2.1

      • Add the setting ASPECTS_ENABLE_STUDIO_IN_CONTEXT_METRICS = True to the openedx-cms-common-settings Tutor patch

      • Run tutor config save

      • Rebuild the MFE container
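One way to add the setting to the openedx-cms-common-settings patch is a single-file Tutor plugin using Tutor's hooks API, sketched below. The file and plugin name are hypothetical; if you already maintain a plugin for your deployment, add the patch there instead. Only the boolean differs between the release and the testing sandbox.

```python
# teak_in_context_metrics.py: hypothetical single-file Tutor plugin.
# Place it in the directory printed by `tutor plugins printroot`, then run
# `tutor plugins enable teak_in_context_metrics` and `tutor config save`.
from tutor import hooks

# False for the Teak release; set to True on the Teak testing sandbox.
hooks.Filters.ENV_PATCHES.add_item(
    (
        "openedx-cms-common-settings",
        "ASPECTS_ENABLE_STUDIO_IN_CONTEXT_METRICS = False",
    )
)
```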

  • (Existing feature; setting change) Entrance Exams

    • FEATURES['ENTRANCE_EXAMS']

      • Set this to True (default is False)

      • Source: openedx/core/toggles.py (line 7)

      • Desc: Enable entrance exams feature. When enabled, students see an exam xblock as the first unit of the course.

      • Creation Date: 2015-12-01

      • Implementation: ['SettingDictToggle']

      • Use Cases: ['open_edx']
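For Tutor deployments, one way to turn this toggle on is a small plugin that adds it to the openedx-common-settings patch, which Tutor renders into both the LMS and CMS settings. The plugin file name below is hypothetical. Operators not using Tutor would instead set FEATURES['ENTRANCE_EXAMS'] = True directly in their LMS and CMS Django settings.

```python
# teak_entrance_exams.py: hypothetical single-file Tutor plugin.
# Place it in the directory printed by `tutor plugins printroot`, then run
# `tutor plugins enable teak_entrance_exams`, `tutor config save`, and
# restart the platform (for example, `tutor local restart`).
from tutor import hooks

# Rendered into both LMS and CMS settings, where FEATURES is already defined.
hooks.Filters.ENV_PATCHES.add_item(
    ("openedx-common-settings", 'FEATURES["ENTRANCE_EXAMS"] = True')
)
```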