All public Working Group meetings follow the Recording Policy for Open edX Meetings

🗓 Date

👥 Participants

⏮️ Previous TODOs

  • Jeremy Ristau will have someone ask about forum performance testing on the DEPR ticket.
    Assignee: Jeremy Ristau. Task appears on: 2024-11-14 Meeting notes.
  • Feanil Patel to follow up with Ed/Felipe about the codejail service and whether we should make it part of Open edX.
    Assignee: Feanil Patel. Task appears on: 2024-10-24 Meeting notes.
  • Feanil Patel to update the Elasticsearch support windows now that it's viable again.
    Assignee: Feanil Patel. Task appears on: 2024-10-17 Meeting notes.
  • Feanil Patel to ticket enabling a weekly cron CI run of master, so we know when external changes might have broken repos that do not usually get updates.
    Assignee: Feanil Patel. Task appears on: 2024-09-12 Meeting notes.
  • Jeremy Ristau to ensure DEPR tickets are created for any frontend that can be deleted as a result of the new course-authoring MFE.
    Assignee: Jeremy Ristau. Task appears on: 2024-05-30 Meeting notes.

🗣 Discussion topics

Item

Presenter

Notes

Discuss access for teams that maintain many repos across the org

  • Introduction of a new CC Role for maintainer-at-large role

    • Will be posted to the forums shortly

    • We can follow up on discussion in that post for feedback.

Continued discussion on whether we should change the DEPR 6-month window approach. Should we have one big ticket for something like Python 3.8 or Node 18 and start the 6-month clock only once all the maintained repos have been updated?


  • Proposal to shorten the DEPR simultaneous support window to 4 months for future upgrade DEPRs that need a support window or have operator impact.

    • We chose 6 months to guarantee it would be in one release.

  • Alternate Proposal:

    • Provide a predictable time when the fix will be guaranteed to be available within the next six months.

    • Announce the DEPR as early as possible (6 months is ideal), and at the end of the DEPR there has to be a 1-month period of simultaneous support.

      • The plan is announced early and the time when the work is completed is as predictable as possible.

      • If the work is done early, we should keep the original date but this could be negotiated. Get agreement from people running master.

      • If the work is completed late, we provide a 1-month simultaneous support window from the time of completion.

      • We give at least a six-month announcement window, but the work does not need to have started or completed when we make the announcement.
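
One possible reading of the alternate proposal, sketched as date arithmetic. The function names and the rule of taking the later of the two dates are assumptions made for illustration, not an agreed policy: the planned end is six months after the announcement, and simultaneous support never ends less than one month after the work is completed.

```python
import calendar
from datetime import date


def add_months(d: date, months: int) -> date:
    """Return d shifted forward by whole months, clamping to the month's last day."""
    month_index = d.month - 1 + months
    year = d.year + month_index // 12
    month = month_index % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def support_end(announced: date, completed: date) -> date:
    """Hypothetical helper: date on which simultaneous support may end.

    Assumption: the window ends at the later of
      - the originally announced date (announcement + 6 months), and
      - one month after the work was actually completed.
    """
    planned_end = add_months(announced, 6)
    earliest_end = add_months(completed, 1)
    return max(planned_end, earliest_end)


# Work done early: the original date is kept.
print(support_end(date(2025, 1, 15), date(2025, 3, 1)))   # 2025-07-15
# Work done late: one month of simultaneous support from completion.
print(support_end(date(2025, 1, 15), date(2025, 8, 10)))  # 2025-09-10
```

The sketch does not model the negotiation mentioned above for work that finishes early; it simply keeps the original date in that case.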

Teak Maintenance Goals, take a look at https://docs.google.com/spreadsheets/d/1wtpoypH1XOPc_G6h9AUNXJ6XiNKD6dlkMP3lubdpE9I/edit?gid=195838733#gid=195838733

Next time

edx-platform Specific Conversations

Celery sharing

  • See https://github.com/edx/configuration/pull/68

  • I think we’ve proven empirically that the issue is as follows (this is not captured well by our docs yet, so that could cause some confusion w.r.t. state of actual resolution):

    1. We were running with celery mingle enabled (because it's enabled by default). Mingle means that, on worker startup (including restarts), each worker asks about the state of every other worker bound to the broker (redis).

    2. Every edx python IDA that uses celery used a single broker (the legacy redis cluster).

    3. edxapp was running with 30 worker instances each, and each one of those runs around 14 parent celery worker processes.

    The confluence of these three things kicked off a “connection storm” in redis, causing massive amounts of (duplicated) task data to be sent out over the network to every worker, which caused us to pin the redis engine CPU at 100%, and blocked all workers from processing tasks from any queue.

    We proved this empirically by looking at the following during deploys (i.e. when we bring up a large number of new worker processes):

    1. The number of “sync with” celery logs emanating from the celery workers.

    2. The total network out from redis to the workers.

    3. Redis engine CPU utilization

    4. Redis new and current network connection counts.

    In the bad state, all four of these metrics spiked and stayed elevated for quite some time. When mingle was disabled (on stage), none of them spiked.
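
A rough back-of-the-envelope sketch of why the three factors above compound. It assumes a deliberately simplified model that is not taken from these notes: all workers share one broker, all start at about the same time (as in a deploy), and each worker's startup sync draws one reply from every other worker. It counts only the edxapp workers described above, so other IDAs on the same broker would push the number higher.

```python
def mingle_replies(instances: int, procs_per_instance: int) -> int:
    """Estimate startup sync replies when every worker mingles with every other.

    Simplified model (an assumption): n workers start together and each one
    receives a reply from each of the other n - 1 workers.
    """
    workers = instances * procs_per_instance
    return workers * (workers - 1)


# Figures from the notes: 30 worker instances, around 14 parent processes each.
total_workers = 30 * 14
print(total_workers)           # 420
print(mingle_replies(30, 14))  # 175980
```

The quadratic growth is the point: doubling the worker count roughly quadruples the startup traffic in this model. Celery's worker command accepts a `--without-mingle` flag to skip the startup sync; these notes do not say whether that is the mechanism used in the linked pull request.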

Config overrides and YAML

  • Old Conversation: You should have your own settings files.

  • New Conversation about Devstack config being dropped:

    • The new development.py settings file should not include YAML support but will allow downstream settings files to add YAML support if they want it.
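
A minimal sketch of how a downstream settings file might add YAML support on top of a base settings module that has none. Everything here is hypothetical: the helper names, the `MY_SETTINGS_YAML` environment variable, and the rule of copying only upper-case keys are illustrative assumptions, not the edx-platform implementation. It relies on PyYAML's `yaml.safe_load`, imported lazily so that only deployments that opt in need the dependency.

```python
import os
from typing import Any, Mapping, MutableMapping


def apply_overrides(namespace: MutableMapping[str, Any],
                    overrides: Mapping[str, Any]) -> None:
    """Copy upper-case keys (Django-style setting names) into the namespace."""
    for key, value in overrides.items():
        if key.isupper():
            namespace[key] = value


def load_yaml_overrides(path: str) -> Mapping[str, Any]:
    """Read a YAML mapping of setting overrides from disk."""
    import yaml  # PyYAML; lazy import keeps it optional for everyone else

    with open(path, encoding="utf-8") as handle:
        return yaml.safe_load(handle) or {}


def configure_from_yaml(namespace: MutableMapping[str, Any],
                        environ: Mapping[str, str] = os.environ) -> None:
    """Apply overrides if the (hypothetical) MY_SETTINGS_YAML variable is set."""
    config_path = environ.get("MY_SETTINGS_YAML")
    if config_path:
        apply_overrides(namespace, load_yaml_overrides(config_path))
```

In a downstream settings module, this would be called as `configure_from_yaml(globals())` after importing the base settings, which keeps the YAML dependency entirely out of the shared `development.py`.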

Toggle annotations and DEPR

  • Can we use the removal dates in toggle annotations as the deadlines when it’s safe to remove?

    • The goal of the annotation was always documentation to make it easier to understand the age of Toggles. This was before the 6-month window was created.

    • Proposal: Drop the removal date and just use the DEPR process, because the dates were only ever aspirational and will mislead folks.

✅ Action items

  • Kyle McCormick will update the DEPR Pilot ticket with the new suggestion for planning major maintenance DEPRs

⤴ Decisions
