For those not familiar with the RCA format: The point is to find systematic flaws that led to the incident being discussed. Blame should be assigned to processes, not people. It’s us against the mistakes that got us here.

Summary of Issue

We encountered a number of issues while trying to merge the new forums backend into edx-platform.

...

  • 2024-11-11

    • Dave tags Kelly to give a heads up that the v2 forums code PR is getting close to merging in edx-platform.

    • Dave starts a Slack thread in #ask-2u to ask if 2U has switched over to utf8mb4 for database connections. This was motivated by the fact that the forums PR would create database tables for the new MySQL storage backend. If the config was not changed before deployment, the tables would not support storing emojis.

  • 2024-11-13

    • After some investigation, Robert verifies that the encoding of the database connection for the LMS and Studio is still utf8 (which in MySQL is an alias for utf8mb3, limited to 3 bytes per character).

  • 2024-11-18

    • Forums PR gets final approvals.

  • 2024-11-19

    • Elasticsearch in 2U's staging environment is not working, making it impossible to properly test forums in that environment. Asad informs the channel that SRE has reached out to AWS support.

  • 2024-11-21

    • Robert files a new ticket with 2U SRE. Merging the PR to master is delayed.

      • The rationale for this was that the configuration change would be straightforward (it requires no data migration, Tutor has connected this way by default since Redwood, and most sites that we know of also connect in this manner), and that making the change now would save 2U from having to spend more time modifying those tables in the future.

  • 2024-11-22

    • Alex notifies the Slack thread that SRE is investigating the connection encoding issue.

    • The 2U staging instance Elasticsearch is fixed.

  • 2024-11-25

    • More discussion of the scope of required changes, and whether changing this configuration falls under SRE or app owners.

  • 2024-11-26

    • Ahtisham and Diana pick up the configuration work, but we agree that deployment shouldn't happen until after the long U.S. Thanksgiving holiday.

    (incomplete: Dave will continue filling this in on Monday)

  • 2024-12-03

    • 2U infrastructure was updated to use utf8mb4, removing the last blocker to merge.

    • Diana pauses the release pipeline. The Infinity will test.

    • 11:01 EST: Dave merges the initial PR into master.

  • 2024-12-04

    • 08:43 EST: Ahtisham reports that the staging env is broken. We decide to fix forward.
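
For readers less familiar with the encoding issue in the timeline above: emojis take 4 bytes in UTF-8, while MySQL's utf8 (utf8mb3) stores at most 3 bytes per character. The sketch below is illustrative only. The helper name `needs_utf8mb4` and the database name are hypothetical, and the settings fragment assumes a Django-style `DATABASES` dict with the MySQL backend's `charset` option; it is not the actual 2U or edx-platform configuration.

```python
def needs_utf8mb4(text: str) -> bool:
    """Return True if any character in `text` takes 4 bytes in UTF-8.

    Such characters (for example, most emojis) do not fit in MySQL's
    3-byte utf8/utf8mb3 character set.
    """
    return any(len(ch.encode("utf-8")) == 4 for ch in text)


# Hypothetical Django-style settings fragment (an assumption, not the real
# deployment config): ask the MySQL driver to connect using utf8mb4.
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.mysql",
        "NAME": "example_db",  # hypothetical database name
        "OPTIONS": {"charset": "utf8mb4"},
    }
}

if __name__ == "__main__":
    for post in ["Great post!", "café €", "Great post! 🎉"]:
        print(repr(post), "needs utf8mb4:", needs_utf8mb4(post))
```

Only the last sample post contains a 4-byte character, which is the kind of forum content the new tables would have been unable to store had they been created over a utf8mb3 connection.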

How did this happen?

 

Note
  • We are filling this out async.

    • If there are any invalid assumptions or statements in any of the bullets below, simply add your own bullet to help clarify.

    • Please add “[NAME]” as a prefix to your bullet.

...