Teak - Operator/Dev Notes
The 20th Open edX community release will be named Teak. Consult the Open edX Release Schedule for details around when the release master branch will be cut and the actual release will occur.
Put stuff here that we have to remember when we start packaging up Teak. Especially important is information that system installers or operators will need to know. Please include your name when you add an item, so that we can get back to you with questions.
Operational
In LMS and CMS, Celery now uses task protocol 2. (@Tim McCormack)
Action: Any operator using custom Celery tooling should ensure it is compatible with protocol 2. For other operators, no action is required.
Background: Celery 4.0 switched how task messages are structured and the new message format is called protocol 2. The version of Celery we currently use (anything >=4.0) can create and consume both protocol versions and it should be safe to switch between them with zero downtime.
By default, Celery 4.0 and higher produce messages in this format, and Celery 3.1.25 and higher can read messages in this format.
edx-platform was pinned to protocol 1 during the upgrade to Celery 4, presumably as a precaution. This change is the long-delayed unpinning of the protocol version so that Celery can use its default version.
Operators can still override the protocol version using the Django setting CELERY_TASK_PROTOCOL, although there is no guarantee that protocol 1 compatibility will be preserved in the future.
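For operators auditing custom Celery tooling, the following is a minimal sketch of how a tool might tell the two message formats apart. It relies on the documented difference between the protocols: protocol 2 carries the task name and ID in the message headers, while protocol 1 carries them in the body dictionary. The function name and the sample messages are hypothetical and are not part of edx-platform or Celery.

```python
def detect_task_protocol(headers, body):
    """Guess the Celery task protocol version of one decoded message.

    Hypothetical helper for illustration: `headers` is the decoded
    message headers dict and `body` is the decoded message body.
    """
    # Protocol 2 moves task metadata (task name, id, ...) into the headers.
    if "task" in headers:
        return 2
    # Protocol 1 keeps everything in a body dictionary.
    if isinstance(body, dict) and "task" in body:
        return 1
    raise ValueError("not a recognizable Celery task message")


# Made-up sample messages in each format.
proto2 = detect_task_protocol(
    {"task": "myapp.add", "id": "abc"},   # headers
    [[2, 2], {}, {}],                      # body: (args, kwargs, embed)
)
proto1 = detect_task_protocol(
    {},                                                                # headers
    {"task": "myapp.add", "id": "abc", "args": [2, 2], "kwargs": {}},  # body
)
```

Tooling that only inspects the message body for a task name would stop recognizing messages once protocol 2 is in use, which is the kind of incompatibility worth checking for before upgrading.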
When codejail is used by LMS and CMS, it no longer requires write access to the sandbox virtualenv .config or .cache directories. (@Tim McCormack)
Action: If you run codejail, it is recommended that you remove write permissions to <SANDENV>/.config and <SANDENV>/.cache from your AppArmor profile, if possible.
Background: Running import matplotlib in a custom Python-evaluated XBlock in Sumac and earlier required the AppArmor profile to allow write access to one of these directories. In Teak, edxapp now sets the MPLCONFIGDIR environment variable for inputs sent to codejail, so matplotlib will now write to the ./tmp/ subdirectory inside the codejail-created sandbox.
You should be able to identify these exclusions by looking for lines like
/home/sandbox/.config/ wrix,
although the exact parent directory may vary. Other temporary directories may have been allowed instead, such as /tmp. Any such write permission to a global directory is inadvisable, since it reduces the ability of codejail to perform effective sandboxing. Removing these lines in Teak will (appropriately) reduce the permissions of sandboxed code. They should not be removed before Teak, however, as this will cause matplotlib to fail to load.
Operators who have not previously needed to support matplotlib in instructor or learner code may not have these exclusions in their AppArmor configurations. If this is your situation, no action is required.
Removing these lines may cause other, unanticipated failures in sandboxed code. Monitor your codejail logs and failure rates when deploying this change.
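To help locate the exclusions described above, here is a rough sketch of a script that scans AppArmor profile text for rules granting write access to a .config or .cache directory. The function name and the sample profile are hypothetical. The sketch only understands simple rules written as a path followed by a permission string, like the example line above, so treat its output as a starting point for manual review rather than a complete audit.

```python
import re

# Matches simple AppArmor file rules such as "/home/sandbox/.config/ wrix,".
# Assumption: rules are written as "<path> <permissions>,"; other rule
# forms (qualifiers, leading permissions) are not handled by this sketch.
RULE_RE = re.compile(r"^\s*(?P<path>/\S*)\s+(?P<perms>[a-zA-Z]+),\s*$")


def find_writable_sandbox_rules(profile_text, names=(".config", ".cache")):
    """Return (line_number, rule) pairs that grant write access to `names`."""
    hits = []
    for number, line in enumerate(profile_text.splitlines(), start=1):
        match = RULE_RE.match(line)
        if not match:
            continue
        path_parts = [part for part in match.group("path").split("/") if part]
        grants_write = "w" in match.group("perms")
        if grants_write and any(name in path_parts for name in names):
            hits.append((number, line.strip()))
    return hits


# Made-up profile fragment for illustration only.
SAMPLE_PROFILE = "\n".join([
    "/home/sandbox/.config/ wrix,",
    "/home/sandbox/.cache/** rw,",
    "/home/sandbox/lib/** mr,",
])

writable_rules = find_writable_sandbox_rules(SAMPLE_PROFILE)
```

Passing names=("tmp",) would flag rules for a global /tmp directory in the same way, since other temporary directories may have been allowed instead.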
New feature: Codejail local/remote darklaunch (@Tim McCormack)
Audience: Deployers who support codejail (e.g. custom Python-graded problem blocks) and are not already using a remote codejail service.
This is not relevant to Tutor, which does not support local codejail.
Background: Historically, codejail execution has been performed on the same hosts as LMS and CMS, aka “local codejail”. There is a new codejail-service that allows performing this code execution remotely. This allows for additional security restrictions, and the new code includes several security enhancements.
Purpose: The darklaunch feature allows operators to gain confidence in preparing for a switch from local to remote codejail. When enabled, it can send all codejail executions to both local and remote codejail, while only using the results of the local execution and suppressing all errors from the remote side. This allows operators to discover issues in the remote service’s configuration under real production traffic conditions.
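The darklaunch pattern described above can be sketched roughly as follows. This is an illustration of the idea only, not the code in safe_exec.py: the function, the runner callables, and the log messages are all hypothetical. The local run decides what the caller sees, the remote run happens afterwards, and anything that goes wrong on the remote side is logged rather than raised.

```python
import copy
import logging

log = logging.getLogger("codejail_darklaunch_sketch")


def run_with_darklaunch(run_local, run_remote, code, globals_dict):
    """Run `code` locally and remotely, but only honor the local outcome.

    `run_local` and `run_remote` are hypothetical callables that execute
    `code` and mutate the globals dictionary they are given.
    """
    # Give the remote side its own copy so it cannot affect local results.
    remote_globals = copy.deepcopy(globals_dict)

    local_error = None
    try:
        run_local(code, globals_dict)
    except Exception as exc:
        local_error = exc

    # Everything on the remote side is best-effort and never raises.
    try:
        remote_error = None
        try:
            run_remote(code, remote_globals)
        except Exception as exc:
            remote_error = exc
        if repr(local_error) != repr(remote_error):
            log.warning("error mismatch: local=%r remote=%r", local_error, remote_error)
        elif local_error is None and globals_dict != remote_globals:
            log.warning("globals mismatch between local and remote")
    except Exception:
        log.exception("darklaunch comparison failed")

    if local_error is not None:
        raise local_error
    return globals_dict


# Hypothetical runners: local succeeds, remote is unreachable.
def fake_local(code, g):
    g["answer"] = 42


def failing_remote(code, g):
    raise ConnectionError("remote codejail unreachable")


result = run_with_darklaunch(fake_local, failing_remote, "answer = 42", {})
```

Because the two executions run one after the other, a wrapper like this roughly doubles execution time, which matches the user-visible impact noted in the usage steps.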
Usage: To use darklaunch to switch from local to remote:
Create a codejail-service cluster
Configure LMS and CMS to call it by configuring CODE_JAIL_REST_SERVICE_HOST but not ENABLE_CODEJAIL_REST_SERVICE (which must remain disabled for the moment).
Begin the dark launch by setting ENABLE_CODEJAIL_DARKLAUNCH to true. Traffic will begin flowing to the new service, but the results will be ignored.
The only user-visible impact should be that codejail executions take twice as long, as the local and remote executions are performed serially.
Observe telemetry to discover errors and behavior mismatches.
Mismatches can include:
One side failed to execute entirely (“unexpected error”) while the other did not. This might include network issues.
One side returned an error from the submitted code, while the other did not, or produced a different error.
Both sides succeeded, but the returned globals dictionaries differed.
Error and warning logs from safe_exec.py in edxapp containing “codejail darklaunch” will tell you about configuration problems, unexpected errors, and mismatches in behavior between the two environments.
Span-based telemetry (New Relic, Datadog, etc.) can be used to track rates of mismatches and break them down by course ID and type. See set_custom_attribute calls starting with “codejail.” in safe_exec.py for available attributes. The local-only, remote-only and local/remote darklaunch calls all have different span names as well, e.g. safe_exec.remote_exec_darklaunch.
Use CODEJAIL_DARKLAUNCH_EMSG_NORMALIZERS to normalize away spurious mismatches between the environments. (Not all mismatches can be readily ignored, such as ordering differences in sets.)
Once behavior and performance differences are resolved, remove ENABLE_CODEJAIL_DARKLAUNCH and set ENABLE_CODEJAIL_REST_SERVICE to true. This will complete the migration, and codejail executions will only be performed on the remote service.
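The setting changes across the migration can be summarized in a small sketch. The setting names come from the steps above. The helper function, the phase names, and the host URL are hypothetical placeholders, and how the values actually reach your LMS and CMS settings depends on your deployment.

```python
def codejail_settings(phase, remote_host="http://codejail-service.example.internal:8080"):
    """Return the codejail-related settings for one migration phase.

    Hypothetical helper: `phase` is "darklaunch" or "remote", and
    `remote_host` is a placeholder for your codejail-service address.
    """
    settings = {"CODE_JAIL_REST_SERVICE_HOST": remote_host}
    if phase == "darklaunch":
        # Local results are used; remote runs in the background for comparison.
        settings["ENABLE_CODEJAIL_REST_SERVICE"] = False
        settings["ENABLE_CODEJAIL_DARKLAUNCH"] = True
    elif phase == "remote":
        # Darklaunch setting removed; all executions go to the remote service.
        settings["ENABLE_CODEJAIL_REST_SERVICE"] = True
    else:
        raise ValueError(f"unknown phase: {phase!r}")
    return settings


darklaunch_phase = codejail_settings("darklaunch")
remote_phase = codejail_settings("remote")
```

Keeping the host setting identical in both phases means the final cutover only flips the two feature settings, so the remote service that was observed during darklaunch is the same one that takes production traffic.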
Deprecations and Removals
[DEPR]: block_structure.storage_backing_for_cache in edx-platform · Issue #32 · openedx/public-engineering (@Feanil Patel)
This is a simplification to how course content is cached. It should be invisible to all end users.
Some operators may need to run a management command to re-populate the cache:
./manage.py lms generate_course_blocks --all_courses --settings=production
The flag ENABLE_BLAKE2B_HASHING was removed. blake2b hashing is now used for caching instead of the deprecated md4 hashing. After upgrading, it’s possible that performance could be degraded as the cache rebuilds.
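As a rough illustration of why the cache may need to rebuild: if cache keys are derived by hashing, switching the hash algorithm changes every derived key, so entries written under the old keys are no longer found. The sketch below uses Python's standard hashlib.blake2b. The helper name, the sample key, and the 16-byte digest size are assumptions made for the example and may not match what edx-platform does.

```python
import hashlib


def hashed_cache_key(key, digest_size=16):
    """Derive a fixed-length cache key from an arbitrary string.

    Hypothetical helper; the digest size is an assumption for this sketch.
    """
    data = key.encode("utf-8")
    return hashlib.blake2b(data, digest_size=digest_size).hexdigest()


# Made-up cache key for illustration only.
example_key = hashed_cache_key("course-v1:Example+Demo+2025 block structure")
```

A key derived with a different algorithm for the same input would not match this value, so lookups miss until the entries are regenerated.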
[DEPR]: django-oauth2-provider (DOP) related tables · Issue #82 · openedx/public-engineering (@Robert Raposa)
For LMS and CMS, there is a new script to clean up old DOP-related authentication tables.
If you have an old installation of the Open edX platform (Palm or later), you may have many outdated/unused authentication-related tables that can lead to confusion when looking at the database.
The script is not related to Teak other than it now being available, and should be ok to run on any installation using Palm, Redwood, or Sumac.
It is possible to work around this breaking change by also exporting your forked Footer component as FooterSlot and your forked StudioFooter component as StudioFooterSlot.
Default Changes for Teak
Notes for Release Manager (not for release notes)
Certain repos (that are transitive dependencies) had previously been erroneously tagged for release due to docs builds. These repos have open-release/sumac.master branches, but will not have teak.master branches. Please see Link tag versions of repos to the named release builds · Issue #941 · openedx/docs.openedx.org for details.
Product has taken a look at the newly added settings and feature toggles, and made notes about how we’d like these new toggles to be configured for the release and for the release testing sandbox on this sheet here.
(New feature) In-Context Metrics in Studio
Configuration instructions for the Teak Release
Upgrade tutor-contrib-aspects to version v2.2.1
Add the setting ASPECTS_ENABLE_STUDIO_IN_CONTEXT_METRICS = False to the openedx-cms-common-settings Tutor patch
Run tutor config save
Rebuild the MFE container
Configuration instructions for the Teak Testing Sandbox
Upgrade tutor-contrib-aspects to version v2.2.1
Add the setting ASPECTS_ENABLE_STUDIO_IN_CONTEXT_METRICS = True to the openedx-cms-common-settings Tutor patch
Run tutor config save
Rebuild the MFE container
(Existing feature; setting change) Entrance Exams
FEATURES['ENTRANCE_EXAMS']
Set this to True (default is False)
Source: openedx/core/toggles.py (line 7)
Desc: Enable entrance exams feature. When enabled, students see an exam xblock as the first unit of the course.
Creation Date: 2015-12-01
Implementation: ['SettingDictToggle']
Use Cases: ['open_edx']