Summary of Issue
There were a number of issues we encountered while trying to merge the new forums backend into edx-platform.
- The process was disruptive to 2U because of the operational effort needed to deploy, test, and debug the new forums code on their infrastructure (it had been tested and deployed using Tutor prior to this point).
- The process was disruptive to the Sumac release process, because of the coupling of the master branch and 2U's release pipeline. This put us in the position where we were cherry-picking the forums code onto the Sumac test sandbox for weeks, without landing it on either master or the sumac release branch.
This happened despite efforts to make the new forums functionality backwards compatible. 2U is also currently running the new forums code in a way that makes as few operational changes as possible from v1, still calling out to the Ruby service for everything. Further forums migration steps have the potential to be even more disruptive, and 2U has already indicated that it has no capacity to manage this transition in the Teak timeframe (as outlined in the DEPR ticket).
The larger context is that there are other upcoming changes that may be similarly disruptive, such as the XQueue replacement project kicking off discovery in January, the content libraries migration slated for late 2025, and possibly even the upcoming roles and permissions work. These efforts are meant to reduce operational complexity in the long term, by retiring services, standardizing our code, and simplifying our deployment stack. But 2U's unique infrastructure and high scale mean that it won't be able to make use of this new code without investing time and energy into devops, testing, and data migration.
Relevant Tickets
Timeline of Events
- 2024-11-11
- Dave tags Kelly to give a heads up that the v2 forums code PR is getting close to merging in edx-platform.
- Dave starts a Slack thread in #ask-2u to ask if 2U has switched over to utf8mb4 for database connections. This was motivated by the fact that the forums PR would create database tables for the new MySQL storage backend. If the config was not changed before deployment, the tables would not support storing emojis.
- 2024-11-13
- After some investigation, Robert verifies that the encoding of the database connection for the LMS and Studio is still utf8 (which means it's utf8mb3).
- 2024-11-18
- Forums PR gets final approvals.
- 2024-11-19
- 2U's staging environment Elasticsearch is not working, making it impossible to properly test forums in that environment. Asad informs the channel that SRE has reached out to AWS support.
- 2024-11-21
- Robert makes a new ticket to 2U SRE. Merging the PR to master is delayed.
- The rationale for this was that the configuration change would be straightforward (it requires no data migration, Tutor has connected this way by default since Redwood, and most sites that we know of also connect in this manner), and that making the change now would save 2U from having to spend more time modifying those tables in the future.
- 2024-11-22
- Alex notifies the Slack thread that SRE is investigating the connection encoding issue.
- The 2U staging instance Elasticsearch is fixed.
- 2024-11-25
- More discussion of the scope of required changes, and whether changing this configuration falls under SRE or app owners.
- 2024-11-26
- Ahtisham and Diana pick up the configuration work, but we agree that deployment shouldn't happen until after the long U.S. Thanksgiving holiday.
- (incomplete: Dave will continue filling this in on Monday)
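For readers less familiar with the utf8 versus utf8mb4 distinction that drives much of the timeline above, the following is a minimal Python sketch of why a utf8 (utf8mb3) connection cannot carry emojis: utf8mb3 allows at most three bytes per character, while emojis encode to four bytes in UTF-8. The helper name `fits_in_utf8mb3` and the `DATABASES` fragment are illustrative assumptions, not the actual edx-platform or 2U configuration; the fragment only shows the general shape of a Django MySQL connection that requests utf8mb4.

```python
# Illustrative sketch only; names below are hypothetical, not taken from edx-platform.

# MySQL's utf8 (an alias for utf8mb3) stores at most 3 bytes per character.
UTF8MB3_MAX_BYTES = 3


def fits_in_utf8mb3(text: str) -> bool:
    """Return True if every character encodes to at most 3 UTF-8 bytes."""
    return all(len(ch.encode("utf-8")) <= UTF8MB3_MAX_BYTES for ch in text)


# Assumed shape of a Django settings fragment that asks the MySQL driver
# to use utf8mb4 for the connection. The database name is a placeholder.
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.mysql",
        "NAME": "example_db",
        "OPTIONS": {"charset": "utf8mb4"},
    }
}

if __name__ == "__main__":
    for sample in ["plain ascii", "caf\u00e9 \u20ac", "thumbs up \U0001F44D"]:
        print(repr(sample), fits_in_utf8mb3(sample))
```

Accented letters and most currency symbols still fit within three bytes, which is why a utf8mb3 connection can appear to work until a post contains an emoji.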
How did this happen?