2024-09-26 Meeting notes

All public Working Group meetings follow the Recording Policy for Open edX Meetings

 Date

Sep 26, 2024

 Participants

  • @Feanil Patel

Previous TODOs

 Discussion topics

Item

Presenter

Notes

Discuss access for teams that maintain many repos across the org

 

  • Introduction of a new CC role: maintainer-at-large

    • Will be posted to the forums shortly

    • Feedback and follow-up discussion can happen in that post.

Continued discussion on whether we should change the DEPR 6-month window approach. Should we have one big ticket for something like Python 3.8 or Node 18 and just start the 6-month clock once all the maintained repos have been updated?



  • Proposal: shorten the DEPR simultaneous-support window to 4 months for future upgrade DEPRs that need a support window (i.e., those with operator impact).

    • We originally chose 6 months to guarantee that simultaneous support would be included in at least one named release.

  • Alternate Proposal:

    • Commit to a predictable date, typically within the next six months, by which the fix is guaranteed to be available.

    • Announce the DEPR as early as possible (6 months ahead is ideal); at the end of the DEPR, there must be a 1-month period of simultaneous support.

      • The plan is announced early, and the time when the work is completed is as predictable as possible.

      • If the work is done early, we keep the original date by default. An earlier date can be negotiated, but only with agreement from the operators running master.

      • If the work is completed late, we provide a 1-month simultaneous support window from the time of completion.

      • We give at least a six-month announcement window, but the work does not need to have started or been completed when we make the announcement.
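The date rules in the alternate proposal can be sketched in Python to check how they interact. This is a minimal, hypothetical illustration rather than an agreed procedure: the function and parameter names are made up, the 6-month and 1-month values are the numbers from the proposal, and it assumes the six-month notice floor still applies even when an earlier date is negotiated with master-running operators. The sample dates follow the Django example from the discussion (plan announced the day after this meeting, target at the end of July 2025).

```python
import calendar
from datetime import date
from typing import Optional

# Values from the proposal above; treat them as tunable policy knobs.
MIN_NOTICE_MONTHS = 6  # announce at least six months before support is dropped
OVERLAP_MONTHS = 1     # simultaneous support once the new version is available


def add_months(d: date, months: int) -> date:
    """Move d forward by whole calendar months, clamping to the month's last day."""
    month_index = d.month - 1 + months
    year = d.year + month_index // 12
    month = month_index % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(d.day, last_day))


def earliest_removal_date(
    announced: date,
    target: date,
    completed: Optional[date] = None,
    early_date_agreed: bool = False,
) -> date:
    """Hypothetical helper: earliest date old-version support may be dropped."""
    if completed is None:
        ready = target      # planning case: assume the target is hit
    elif completed > target:
        ready = completed   # late: the overlap starts when the work lands
    elif early_date_agreed:
        ready = completed   # early, and master-running operators agreed
    else:
        ready = target      # early: keep the original, already-planned date

    removal = add_months(ready, OVERLAP_MONTHS)
    # Assumption: the six-month notice floor always applies.
    return max(removal, add_months(announced, MIN_NOTICE_MONTHS))


announced = date(2024, 9, 27)
target = date(2025, 7, 31)
print(earliest_removal_date(announced, target))                               # 2025-08-31
print(earliest_removal_date(announced, target, completed=date(2025, 5, 1)))   # 2025-08-31
print(earliest_removal_date(announced, target, completed=date(2025, 9, 15)))  # 2025-10-15
```

Keeping the target date when work lands early is what lets operators plan around a fixed date; only a late completion, or an explicit agreement, moves it.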

Teak Maintenance Goals, take a look at Support Windows

 

Next time

edx-platform Specific Conversations

Celery sharing

 

  • See the PR “feat: run celery without mingle, heartbeat, or gossip” by iloveagent57 (edx/configuration Pull Request #68).

    • @Régis Behmo @Felipe Montoya @Jhony Avella: flagging this as a potential improvement to how Tutor and Harmony run Celery.

  • I think we’ve proven empirically that the issue is as follows (this is not yet captured well in our docs, which could cause some confusion about the actual state of the resolution):

    1. We were running with Celery mingle enabled (because it’s enabled by default). Mingle means that, on worker startup (including restarts), each worker asks about the state of every other worker bound to the broker (Redis).

    2. Every edX Python IDA that uses Celery used a single broker (the legacy Redis cluster).

    3. edxapp was running 30 worker instances, and each of those runs around 14 parent Celery worker processes.

    The confluence of these three things kicked off a “connection storm” in Redis, causing massive amounts of (duplicated) task data to be sent over the network to every worker. That pinned the Redis engine CPU at 100% and blocked all workers from processing tasks from any queue.

    How we proved this empirically: during deploys (i.e., when a large number of new worker processes start up), look at the following:

    1. The number of “sync with” Celery log lines emanating from the Celery workers.

    2. The total network out from redis to the workers.

    3. Redis engine CPU utilization

    4. Redis new and current network connection counts.

    In the bad state, all of these metrics spiked and stayed elevated for quite some time. When mingle was disabled (on stage), none of them spiked.
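A rough, back-of-the-envelope sketch of the scale involved, under stated assumptions: each of the roughly 420 parent worker processes (30 instances times 14) is treated as a separate Celery node, and a node starting with mingle is assumed to receive one reply from every other node on the broker, so a deploy that restarts all of them produces on the order of n(n-1) replies. Workers from the other IDAs sharing the broker would only add to this. The second helper builds a worker command line with the three flags named in the PR title; the app path and queue name are placeholders, not the values used in edx/configuration, Tutor, or Harmony.

```python
import shlex
from typing import List


def mingle_replies(instances: int, procs_per_instance: int) -> int:
    """Rough count of mingle replies if every worker node starts at once.

    Assumes each parent worker process is one Celery node and that every
    starting node gets one reply from each of the other nodes.
    """
    nodes = instances * procs_per_instance
    return nodes * (nodes - 1)


def worker_argv(app: str, queues: List[str], quiet_cluster: bool = True) -> List[str]:
    """Build a `celery worker` command line (app and queues are placeholders)."""
    argv = ["celery", "--app", app, "worker", "--queues", ",".join(queues)]
    if quiet_cluster:
        # The three options named in the PR title: skip startup sync with peers
        # (mingle), worker heartbeat events, and worker-to-worker gossip.
        argv += ["--without-mingle", "--without-heartbeat", "--without-gossip"]
    return argv


# edxapp numbers from the notes: 30 instances x 14 parent processes = 420 nodes.
print(mingle_replies(30, 14))  # 175980
# Hypothetical app path and queue name, for illustration only.
print(shlex.join(worker_argv("myproject.celery", ["default"])))
```

As we understand Celery's behavior, the trade-off is that a worker started without mingle does not sync state such as revoked task ids from its peers at startup, and monitoring that relies on worker heartbeat or gossip events sees less; whether that matters depends on the deployment.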

Config overrides and YAML

 

  • Old Conversation: You should have your own settings files.

  • New Conversation about Devstack config being dropped:

    • The new development.py settings file should not include YAML support but will allow downstream settings files to add YAML support if they want it.

Toggle annotations and DEPR

 

  • Can we use the removal dates in toggle annotations as the deadlines after which it’s safe to remove the toggle?

    • The goal of the annotation was always documentation, to make it easier to understand the age of toggles. This was before the 6-month window was created.

    • Proposal: drop the removal date and just use the DEPR process, because dates that were meant to be aspirational will mislead folks.

 Action items

@Kyle McCormick will update the DEPR Pilot ticket with the new suggestion for planning major maintenance DEPRs

Recording and Transcript

Recording: https://drive.google.com/file/d/1DefRlGrGz4eAeiNc6dtTLe39JNwechVQ/view?usp=sharing

Maintenance Working Group Meeting – 2024/09/26 08:59 EDT – Transcript

Attendees

Adolfo Brandes, Brian Smith, Feanil Patel, Jeremy Ristau, Kyle McCormick, Maksim Sokolskiy, Robert Raposa, Sarina Canelake

Transcript

Brian Smith: Hello.

Robert Raposa: And that's

Sarina Canelake: My name. Brian Morning Robert.

Brian Smith: mark,

Feanil Patel: Hello.

Brian Smith: Hello.

Feanil Patel: Robert, does this mean your production is a little bit more under control?

Robert Raposa: It does. And actually I can share one potential change that just might help reduce Celery load for everyone.

Jeremy Ristau: This morning, everyone.

Robert Raposa: Over reduced.

Feanil Patel: Bringing my name.

Brian Smith: Morning.

Robert Raposa: Good morning, reduce load on the Redis cluster, of course.

Feanil Patel: Sounds good.

Feanil Patel: Notes on her.

Feanil Patel: Skipping that going now.

Feanil Patel: Sharing my screen.

Feanil Patel: Y'all can see that. Okay. there

Feanil Patel: let's just go down the line real quick and then, we'll Talk through the next time stuff.

Jeremy Ristau: Yeah, so for the first one, I have brought up the Django upgrade to arbybaum. They're not really doing anything yet but at least it's on their radar. And then I pinged our SRE manager as well.

Feanil Patel: Yeah.

Feanil Patel: Yeah, I think both of the first two things are.

Jeremy Ristau: To let them know, and connected them with Feanil.

Jeremy Ristau: So I think I can call that done for now.

Feanil Patel: Done for now. And then we can follow up on that conversation that I mentioned, that we had on that ticket.

Jeremy Ristau: Yeah.

Feanil Patel: Here, if that would be useful, around cross-repo maintenance access stuff.

Jeremy Ristau: Yeah, that'd be great.

Feanil Patel: And the new role.

Feanil Patel: And then next thing is the patch window discussion. We want to have That.

Feanil Patel: and I know,

Feanil Patel: and then there's the crown on master, which I've not taken up yet, but I'll try to do that this week. Brian, did you get a chance to update the DEPR pilot issue with the

Brian Smith: Yeah, I left a comment on that issue. Just kind of adding addendums for scope and…

Feanil Patel: Okay.

Brian Smith: unmaintained repos stuff. I think it covers everything that we have there, but if someone wants to double-check and see if there's something I missed, more eyes will lead to better docs.

Feanil Patel: Yeah. Yeah,…

Brian Smith: So yeah.

00:05:00

Feanil Patel: And that is cool. Yeah, we can take a look at those, but I'm going to mark this as done for now. People can follow up on the issue.

Brian Smith: Exactly.

Feanil Patel: Six month window for simultaneous Python version doesn't seem prudent. I think that's like a variation. Yeah, I think that's in here. And in the next time,

Feanil Patel: let's see. Teak maintenance goals.

Feanil Patel: And then Python 3.12 or 3.13, we completed that discussion last time.

Feanil Patel: and then, DEPR tickets for the frontends that need to be deleted as part of the course offering MVP. No update on that, Jeremy?

Jeremy Ristau: No.

Feanil Patel: And then I'm going to drop Clinton's things since he hasn't been here in a while. so, first up

Feanil Patel: So last week, we talked about this notion of teams that are helping maintain many repos across the organization and

Feanil Patel: How they would manage access for them, because needing to sort of expand and contract CC-ship for them across a lot of repos doesn't make a lot of sense. I spoke with Sarina about this, and one of the things we sort of arrived at was the

Feanil Patel: Introduction of a new role.

Feanil Patel: Which I think currently, we're calling maintainer at large.

Sarina Canelake: Yeah, I was gonna actually put this on the forums for discussion. So I want to make sure anybody…

Feanil Patel: Yeah. yeah, I mean

Sarina Canelake: who has thoughts here captures them on the discussion post I'm about to make, not just in this thing.

Feanil Patel: If you make it and drop a link here, we can. Yeah.

Sarina Canelake: Yeah, I actually tagged you on it yesterday because I wanted you to review it before I posted it.

Feanil Patel: I'll take a look at it after this, but I think this is the right thing to do: that role will get introduced. There are details in the post Sarina's going to make, and we can sort of follow up there about that role and whether it seems like the right fit

Feanil Patel: and if it doesn't,

Feanil Patel: So, we'll skip this for now and sort of follow up async on that post, and I'll put a link in here once I've gone through it and we can post it.

Robert Raposa: I will note that the name of it is amazingly, possibly, out of place. Doesn't “at large” mean escaped?

Feanil Patel: It's also used in politics, for representatives that represent the entire city as opposed to certain districts in it.

Robert Raposa: The.

Robert Raposa: I think I just am less aware of that you Cool.

Feanil Patel: We got it. Yeah, and the name is one of the things. You can provide feedback on

Robert Raposa: Yes, that's good.

Feanil Patel: Yeah. It's a surprisingly hard group to name, I think.

Robert Raposa: Okay.

Robert Raposa: Just give it.

Feanil Patel: We'll make sure it's a person carrying, a bag of money or…

Robert Raposa: Okay, exactly.

Feanil Patel: something. Stripe shirt.

Feanil Patel: Let's talk about the deprecation window.

Feanil Patel: Which I think is super relevant to this group. Yeah, I think as we're sort of adding on all of these different matrices, we're finding that the cost of maintaining those for a long period of time, in terms of the time it takes for tests to run and queuing, is actually much higher than I previously expected for a six-month window.

00:10:00

Feanil Patel: And I'd like to propose shortening that window a bit, to reduce that load and to sort of speed up some of these things landing. And I want to open that up for discussion.

Feanil Patel: And just because it's hard to know what to shorten it to, let's say four months, unless somebody has a different number in their head.

Feanil Patel: I'll go for it.

Kyle McCormick: Just a point of clarification: are you proposing it for this upgrade, all upgrades, or all DEPRs?

Feanil Patel: I'm going to say for DEPRs that need to have a support window, because I think the six-month window was specifically for things where there's sort of operator impact.

Feanil Patel: And for things where a forward-compatible new version exists, we don't necessarily need this. But for anything where the operators or sites will have major impact, we want to give them time to adjust.

Kyle McCormick: Breaking changes. Got it.

Feanil Patel: Yeah. Brian.

Brian Smith: And this is specifically in reference to things where we end up having simultaneous support for the old version and the new version, right? That's…

Feanil Patel: Correct.

Brian Smith: where... or are there pain points for other DEPRs…

Feanil Patel: Yeah.

Brian Smith: where we don't have that overlap time?

Feanil Patel: No, I mean the window is for simultaneous support, so I think that's where having the six months of simultaneous support is causing issues, in terms of the amount of CI we have to maintain and the amount of resources it's taking up. And so now that we have some data on that with both the Python and…

Brian Smith: Yeah.

Feanil Patel: node updates in flight,

Feanil Patel: Based on that information and feedback I've been hearing from other people, I kind of want to propose that we change this down to four months, but I am not one of the people who has to upgrade an operating site. So I want feedback from people who are doing that.

Brian Smith: I'd also want to call out that I think the window might change depending on how far out from a release we are, because there's the question of, should we continue to have simultaneous support shipped in the next release? And that means we would want it up until the cut, at least.

Feanil Patel: Mm-hmm. And that's why we chose the six-month window in the first place: it would guarantee that it would be in at least one release.

Robert Raposa: One question that I have is, we all understand the cost of CI, and the wish to prioritize this work to shrink the window as much as we can any time these upgrades are happening. I'm wondering…

Robert Raposa: if we think that is not enough and that we need to say that six months is not the official window, because we feel like we're going to miss the opportunity to shrink the window, or if we can simply say we don't actually need a new rule. We just know that, hey, we're going to put this ahead of other things, because we want to get this out of CI as quickly as possible.

Robert Raposa: Yeah. It's just a question.

Feanil Patel: Can you restate that question? I had some trouble following it.

Robert Raposa: the question is, because the current pilot is we give a six months, official window, because that's what we think is needed as a general rule, but we also,…

Feanil Patel: Mmm.

Robert Raposa: try to,

Robert Raposa: order and organize the DEPRs such that things can be closed earlier. And these are ones where, because of the CI cost, we have even more pressure to bump them up in priority, so that they're the first things that we're working on and we close them even earlier and…

00:15:00

Feanil Patel: Right.

Robert Raposa: and I guess the question is, do we know that the general rule and process is not going to be enough? If we leave it at six months and we don't change anything, and we all understand the cost of CI, do we think that some of us are just going to be... and that's just not a big deal? I mean, the cost of CI affects everyone; no one wants

Robert Raposa: that sitting around longer than it needs to be,…

Feanil Patel: Right.

Robert Raposa: or that it can be sitting around. So I'm just wondering whether we actually need an exception, let's call it four months, for this, or if six months still covers the six-month rule for

Robert Raposa: named releases, and we all understand that this is the type of work that we would like to prioritize over other work.

Feanil Patel: Right, I guess. So it sounds like you're saying we keep the six-month window and just make it clear that we're going to try to land this faster?

Robert Raposa: Yep.

Feanil Patel: Does that... Yeah. And so I think my response to that would be that the window is for dropping the old support in particular, and from a technical perspective that part, dropping the support, is usually not super complex, because it's adding support that requires the forward-compatible deltas; dropping support is usually more cleanup and faster to do.

Feanil Patel: But the impact of dropping support on operators who have not yet transitioned is the thing that I'm concerned about. I think when we were talking about this, we said six months because it sort of straddled both releases and gave other master-following operators sufficient time. What if we said four months plus one release? So if the release was in five months or six months, we'd extend potentially. But

Feanil Patel: Often, for a lot of these, we don't necessarily need simultaneous support on releases, because the way people who are on releases operate tends to be via Tutor, for the most part, and Tutor has a lot of tooling to do these upgrades as part of the shift between releases. So with a four-month window, Tutor nightly and the named releases will capture these deltas pretty well, even if they're shorter than six months.

Feanil Patel: Kyle.

Kyle McCormick: I'm gonna propose something different. It might be hard to explain, bear with me. So, I'm pretending I'm an operator running off of master, right? And we're upgrading Python from 3.12 to 3.14, right? That's the example on the table. Once support is in master, once we have that simultaneous support, if I as an operator have planned and I have my roadmap set up so that my team is available to go into our Dockerfiles, switch that Python version, and then test on master, that's a couple-week process for a smooth upgrade, I think. And for a rough upgrade, that might be a month-long process. It's not six months. But that's assuming that I have it on my roadmap and my team's available to do it. So I guess this is a question directed, I think, at Jeremy mostly: do you need a six-month window of overlapping support, or do you need a

Kyle McCormick: six-or-whatever-month heads-up so that your team is ready to do an upgrade in about a month's time?

Jeremy Ristau: I mean, that is what the pilot is, right? You put the announcement out there and you give it six months. The whole point was that you don't react immediately when you put the DEPR up, right? You give it several months of heads-up so that people who are deploying from master can work it into their roadmaps. And yeah,…

Kyle McCormick: So, I think this is a point of misalignment between us, possibly a good misalignment that we can turn into an opportunity.

Jeremy Ristau: yeah. Good.

Kyle McCormick: So what I am hearing from you, Jeremy, is that you understand the pilot to mean six months from the announcement of the plan, whereas what we're operating under is six months of simultaneous support, both versions of Python in CI. And those are two separate things.

00:20:00

Jeremy Ristau: Yes.

Robert Raposa: Yeah.

Feanil Patel: Yeah. So I'm trying to summarize your suggestion, which is: we provide a predictable time when the fix will be available, probably within the next six months, because it's hard to predict much further out than that. But the fix does not have to be available on master when we make this announcement, if that work is planned for that period.

Jeremy Ristau: I would argue, in fact, it should not be put in immediately, right? You need to give people that time to react before you drop the change on them.

Feanil Patel: What, the addition or the removal? Because the addition is incremental,…

Jeremy Ristau: That's true. Yeah. Great. Yeah.

Feanil Patel: it's more the dropping of support that's operationally complex.

Jeremy Ristau: That's true. And is this just the statuses of DEPR? Is there an announced state? This would be announced and…

Feanil Patel: That. Right.

Jeremy Ristau: then it would have some period of time where it's announced before the expansion, and then some period of time before the contraction, and those should just all be outlined in the DEPR, right?

Feanil Patel: right, and I think this is,…

Kyle McCormick: Yes.

Feanil Patel: this is sort of like, we can make this very predictable for these big upgrades and, by doing so, hopefully shrink the amount of time where we're doing simultaneous support down to, what, a month? A couple of weeks?

Feanil Patel: What does that feel like? Because,

Kyle McCormick: Just to put it on the table, I'd say if we announced six months ahead of time, then the overlapping window could be one month. Straw man, feel free to debate that part, but for the sake of argument, that's what I would say: six months and that's a DEPR. Six months later it can be removed, but there has to be a one-month window of overlapping support at the end of that.

Kyle McCormick: and I guess,

Kyle McCormick: Jeremy, Robert, I'm curious what your opinion on those numbers is. The six months is how long do I need to plan, and the one month is how long do I need to actually execute the upgrade on my environment.

Robert Raposa: Yeah, the six months. Yeah, I understand more clearly what the issue was, because the six-month window for other DEPRs was from when the replacement was made available. Is that true?

Feanil Patel: Currently it's been sort of from when the replacement is made available, and this is saying we will more predictably have the replacement available by month X, and you will have one month to transition after we make it available. And that should hopefully allow enough time to plan and allocate resources to do that.

Robert Raposa: Yeah, I think it makes sense to continue to have the six-month window, at least for announcing the plan, you…

Feanil Patel: Yeah.

Robert Raposa: and having a minimum window for once it's done and ready to be moved onto, and…

Feanil Patel: Right.

Robert Raposa: how long you have, and whether that's a month or more for upgrades, I'll let Jeremy speak about how long that typically takes, because I know,

Robert Raposa: because I don't know. And Jeremy can speak to that. But I think what we're saying here now is: we're still going to give the six-month warning for planning, but just for upgrades that have all this work and this additional cost, we're not going to give six months from the readiness point.

Feanil Patel: Yeah, I will use a new example to make sure we're aligned, which is: in the U release, we want to support Django 5.2. It's two releases from now. We've got 12 months before we need to really have that available, but we already know what to do for the 5.0 and 5.1 updates. There's a bunch of deprecation warnings in our code; we can start fixing them, etc.

00:25:00

Feanil Patel: And so what we would do is, I could do it tomorrow: I can create a ticket that outlines the plan for the Django 5.2 upgrade, which will drop support for Django 4.2. And I would say that will be available

Feanil Patel: In.

Feanil Patel: At the end of July 2025.

Feanil Patel: If it's done early, then it's still the end of July 2025, because presumably you didn't allocate resources before then to actually make that transition. But from the end of July to the end of August is the amount of time you have before we drop support for simultaneously running both Django versions in CI.

Feanil Patel: How does that align with everybody's expectations?

Jeremy Ristau: For me personally, what I would say is

Jeremy Ristau: I'm in total alignment that putting up a DEPR when you want to do something, versus waiting until it's six months before, makes sense. So if it's 12 months before, great: put the plan in there and the target dates you're hoping to go for. That gives everybody as much transparency as possible. At the point where you said the DEPR will say “our target is July, and July is when the one-month time starts,” I would say more: that's your target. If you hit your target, that's great. If you don't hit your target, it's one month from whenever you hit it.

Feanil Patel: And I was saying, specifically, if it's early, that we keep the target. But I would love to, if it's early, also shift the target earlier; I think that's an open question in my mind.

Jeremy Ristau: Yeah, early feels like an outlier or rarity rather than something we should probably build a process around. And then whoever is running the DEPR would have to be very communicative to the whole gamut of users and…

Feanil Patel: All right.

Jeremy Ristau: try to convince everyone to move the date. But I would…

Feanil Patel: Right.

Jeremy Ristau: then the point at which you can remove your support for the old version would still be the original date that you stated, because that is what everybody else has planned to. Yeah.

Feanil Patel: That was my thought. And if the work does run late and does not complete in time, then it's one month from whenever the work is completed.

Jeremy Ristau: That feels like a totally acceptable thing to me, for sure.

Feanil Patel: Okay.

Jeremy Ristau: If we can't bake something in six months, that's on us.

Feanil Patel: I'm gonna say if the work is done early, we should keep the original date, but this could be negotiated.

Kyle McCormick: I'd say we should get thumbs-up from the people running master who are stakeholders. So,

Kyle McCormick: probably reaching out to 2U and MIT and saying, hey, are you ready early? And if they're not both like, yes, let's do it early,

Feanil Patel: Right.

Kyle McCormick: Then we wait.

Robert Raposa: And my understanding is what you just said is part of the process for every DEPR, given the six-month window, you…

Feanil Patel: Right, informally that has I think been the process, and…

Robert Raposa: in six months or as early as we can, it's not

Feanil Patel: this is formalizing it for this specifically costly thing, which is these giant upgrades. But we could apply it to everything.

Robert Raposa: Hey, yeah. And I mean specifically just that last note.

Feanil Patel: Got it.

Feanil Patel: Right.

Robert Raposa: Everything else I get is more upgrade centric.

Feanil Patel: I've spelled simultaneous wrong multiple times.

00:30:00

Kyle McCormick: Those would be an interesting adjustment to the pilot. I can try to write it up,…

Feanil Patel: Yeah.

Kyle McCormick: I think getting this into the DEPR process will be interesting, because in the current DEPR process, before we were even doing this pilot, there's a comment period, and once the comment period is over, it's go ahead. But now we have a comment period, then a waiting period, then an overlapping-support period, and then a removal period. These are all good things, but I think we're gonna have to enrich our language around DEPR and have more statuses, all for good reasons, but just flagging that.

Robert Raposa: I guess one question for this is, how do we choose the target date?

Feanil Patel: I mean, the same way anybody plans anything: with as much information as they have, they make a prediction and then adjust based on new information when they have it.

Robert Raposa: Got it, and there's no.

Feanil Patel: Likely... This is a guess on my part, but I suspect it'll probably be related to the named releases.

Feanil Patel: Just because that is how a lot of the product-oriented planning is happening now. And so it will probably be in some relation to that, although I personally want to push our maintenance up as much as possible, so that we can move past emergency maintenance and into a sort of predictable, continuous cadence. But the Django upgrade is actually going to be our first test of that, because we have two releases to complete it, and I'm curious how much of it will get done before the month before the second release is about to be cut.

Robert Raposa: I guess a different, related question is: is there a minimum of five months, so that with five months plus one month there's going to be a minimum of six months?

Robert Raposa: We're matching the…

Feanil Patel: Mmm.

Robert Raposa: six-month rule, and we're not going to say, hey, we think we can get this done in two months, so now you've got two months to get ready and one month to get it out. The whole point is to continue to have at least a six-month warning about what's going on, and it might be longer if the target is further out. Makes sense or not?

Feanil Patel: Yeah, so you're saying we should give at least a six-month window to let people know, but the work doesn't necessarily have to have started or completed by then. It's just, hey, we expect to be done with the Ubuntu upgrade in six months; we haven't started on it yet, but we expect to be done by that point. Yeah, thanks. Jeremy?

Feanil Patel: All right.

Jeremy Ristau: Hey, I think this is a great start to all these conversations, and I liked how you added a note to pick things up in the action items as the next conversation for next time. Yeah.

Feanil Patel: Yeah.

Feanil Patel: All right, Kyle, you said you were gonna update the DEPR ticket with this. Thank you.

Kyle McCormick: Yep.

Feanil Patel: I think this is a really good improvement to the language and shared understanding, which I think will be great.

Feanil Patel: Do we think this topic is completed and can be checked off? Or do we think we need more discussion on this? I feel like we're in a good place, and the next discussion will be after Kyle writes it up for people to look at.

Brian Smith: Yeah, I feel like moving the discussion to the ticket for a while, and then if we need to bring it back up in a higher-bandwidth form,…

Feanil Patel: Yeah.

Brian Smith: we can add it to an agenda later.

Feanil Patel: I'm gonna check this next-time item off, and also I think this addresses the six-month support window for simultaneous support, because there were two different issues there. Yeah.

Kyle McCormick: Yeah, just leaving an action item.

Kyle McCormick: So I can

Feanil Patel: Yeah, your action item is in there.

Feanil Patel: And then we'll talk about maintenance goals, hopefully. And I'm gonna leave the maintenance teams discussion on for next time also, Jeremy, because hopefully the post will be up by then, and if there's synchronous feedback we want to do, we can do it then. That said, let's transition to edx-platform, and we can move some of... Robert, do you want to add some of these other ones as a next-time to-do?

00:35:00

Robert Raposa: You... we should decide which are edx-platform. I mean that, the...

Feanil Patel: The celery stuff is edx-platform, right?

Robert Raposa: What? And mostly... I mean, it does affect all services, but again, that's why it's hard to know what's edx-platform and what's not, because they all affect edx-platform…

Feanil Patel: Yeah.

Robert Raposa: the most.

Feanil Patel: Yeah. I'm gonna.

Robert Raposa: because,

Feanil Patel: Yeah, let's

Robert Raposa: the same with toggle annotations: that could be general maintenance, but it could also be an edx-platform one.

Kyle McCormick: We've spent a lot of time talking about DEPR processes today. Maybe we bump that, unless anyone's really itching to talk about it today.

Feanil Patel: As long as Robert promises not to have more production incidents the next time we meet.

Robert Raposa: Hey I'm here. We could talk about it. I don't like you got it.

Kyle McCormick: Okay.

Feanil Patel: Let's push it to the end. The other stuff seems definitely edx-platform-centric.

Robert Raposa: Sounds good.

Feanil Patel: And if we have time, we'll add it.

Robert Raposa: Yeah, the other thing, I don't know if it's for a future topic: I saw the post around the Node upgrade, and I don't know if this is just a DEPR meeting topic, of what's too big and too small for DEPR tickets.

Feanil Patel: Let's talk about that with the DEPR folks, because I think the Maintenance Working Group has been super active in the DEPR process. But I think about the future of the DEPR working group: does it make sense at some point to just combine, given that three out of five of us are on both of these working groups?

Feanil Patel: I think it's useful to have the separate time because there is a slightly separate focus. But maybe you could add that as a next-time item for the DEPR conversations, Robert. Okay.

Robert Raposa: Yeah. Yeah. Were there and just starting an ad for myself.

Feanil Patel: Open up this.

Robert Raposa: I mean, In theory, if we don't get to the toggle question, that could also move there.

Feanil Patel: Yeah. Yeah.

Robert Raposa: Yeah, so we had a number of issues, but this is more just sharing that, evidently, this causes a lot of pain on the servers and isn't needed for anything, right? And

Feanil Patel: Yeah, yeah. Because I assume those messages are mostly getting dropped on the floor because

Feanil Patel: Because of the security groups.

Robert Raposa: Yeah.

Robert Raposa: Which one? Yeah, no item.

Robert Raposa: I don't know. I think it all goes through the broker, so I think it all works, but it just puts a load on the broker. And the broker is like,…

Feanil Patel: Got it.

Robert Raposa: Very happy without it and very unhappy with it.

Robert Raposa: The other update, which people may or…

Feanil Patel: Cut.

Robert Raposa: may not need: we also split our cluster so that edxapp has its own Celery cluster, separate from all the other services, but that's

Feanil Patel: Nice. Yeah. Yeah, that's much more operational. But this seems like it's worth passing on to

Robert Raposa: yeah, I mean

Feanil Patel: Tutor operators, for how Tutor starts up Celery. Kyle?

Kyle McCormick: Sorry, I haven't even gotten to the PR yet.

Robert Raposa: Yeah.

Feanil Patel: Yeah, I'll publish the page and also drop the PR link over.

Robert Raposa: Yeah, I mean, it may shrink the amount of resourcing that's required.

Feanil Patel: Yeah, which is super useful.

00:40:00

Robert Raposa: Yeah.

Feanil Patel: Yeah, let's see.

Feanil Patel: Are you looking at it now?

Kyle McCormick: Yeah, I can't say that I understand it.

Feanil Patel: I think the key thing is this bit here,

Feanil Patel: which is reducing the number of coordinating messages that Celery sends internally to itself, which I think are not necessary. Is that what Robert was saying?
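For reference, the PR in the agenda ("run celery without mingle, heartbeat, or gossip") names three features that correspond to real `celery worker` options: `--without-mingle` (don't synchronize with other workers at start-up), `--without-gossip` (don't subscribe to other workers' events), and `--without-heartbeat` (don't send event heartbeats). The helper below is a hypothetical sketch that only builds the command line; the app module `lms.celery` and the queue name are assumptions for illustration, not taken from the PR.

```python
import shlex
from typing import List, Optional

# The three worker-to-worker coordination features named in the PR title.
QUIET_WORKER_FLAGS = ["--without-gossip", "--without-mingle", "--without-heartbeat"]


def worker_command(app: str, queues: Optional[List[str]] = None) -> List[str]:
    """Build argv for a Celery worker with gossip, mingle, and event heartbeats off.

    `app` and `queues` are illustrative; use whatever your deployment already passes.
    """
    argv = ["celery", "-A", app, "worker", "--loglevel=info"]
    if queues:
        # -Q takes a comma-separated list of queue names.
        argv += ["-Q", ",".join(queues)]
    return argv + QUIET_WORKER_FLAGS


if __name__ == "__main__":
    # Hypothetical app module and queue name, for illustration only.
    print(shlex.join(worker_command("lms.celery", ["edx.lms.core.default"])))
```

Run directly, it prints `celery -A lms.celery worker --loglevel=info -Q edx.lms.core.default --without-gossip --without-mingle --without-heartbeat`, which an operator could compare against how their own workers are launched (`shlex.join` needs Python 3.8+).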

Robert Raposa: Yeah, messages from any worker to all other workers, for every worker.

Kyle McCormick: Okay.

Robert Raposa: At least that's the gist.

Feanil Patel: Yeah, because these are essentially different broadcasting protocols for learning about what other workers are doing and what they're up to.

Kyle McCormick: Right.

Robert Raposa: It seems to make it a little worse all the time, but much worse when you are… if you…

Feanil Patel: at scale.

Robert Raposa: if you have to do any restarts. That's when it's like any restarts, or losing a worker, or adding a worker. So I don't know how much that will affect
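Why this gets much worse at scale can be seen with a back-of-the-envelope model. It assumes, per Robert's description, that each worker sends one message per coordination round to every other worker through the broker; this is a simplification, not Celery's exact message accounting.

```python
def broadcast_deliveries(workers: int) -> int:
    """Deliveries per round if every worker sends one message to every other worker."""
    if workers < 0:
        raise ValueError("workers must be non-negative")
    return workers * (workers - 1)


for n in (2, 10, 20, 50):
    print(f"{n:>3} workers -> {broadcast_deliveries(n):>5} deliveries per round")
```

Under this model, going from 10 to 20 workers roughly quadruples broker traffic (90 to 380 deliveries per round), which is consistent with the effect being small for a couple of workers and large for a scaled-out deployment, especially while workers are being added or restarted.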

Robert Raposa: other people…

Feanil Patel: Yeah.

Robert Raposa: But even when you're not doing that, I think it reduces the number of new connections, and all kinds of issues at times, but I'm less certain about that if everything's just running forever.

Kyle McCormick: Yeah, I appreciate you sharing this.

Kyle McCormick: I feel like the people on the Open edX side who'd want to see this are the Large Instances Working Group.

Feanil Patel: but,

Kyle McCormick: Vanilla Tutor runs an LMS worker and a CMS worker, one of each, so it probably isn't affected much by this, but the Harmony project kind of plugs into Tutor to make it scale and have, I'm guessing, potentially multiple workers.

Kyle McCormick: Is there a summary of the problem you could post so that I could share this with people?

Robert Raposa: If we have an

Robert Raposa: easily publishable summary… But we definitely have bullet points that we can copy out. And let me just look for

Feanil Patel: Do you want to toss them into this doc, Robert?

Feanil Patel: And I figure Jhony, Felipe, and Régis seem like they're sufficiently involved in the large-instance plus Tutor work to tag on this. Yeah.

Kyle McCormick: Yeah.

Feanil Patel: And Robert, I think you're in the process of finding that data and pasting it, but maybe, Kyle, you could summarize the next thing while we're doing that.

Robert Raposa: Here's one potential summary and now we can move on.

Feanil Patel: Yeah.

Robert Raposa: I did a copy. Where is this?

Kyle McCormick: Did I put that in the agenda doc again?

Robert Raposa: Did.

Kyle McCormick: Sorry, the next one that Feanil's imagining.

Robert Raposa: That's right.

Feanil Patel: The config overrides and…

Robert Raposa: No. Sorry.

Feanil Patel: YAML. Robert, did you put it?

Robert Raposa: Yeah, no, I dropped all of these and I just had a brief question about it,…

Feanil Patel: Yeah.

Robert Raposa: There was the old topic about overrides, and a proposal, and "Hey, we think this would help you, and we think we're going to do things differently, and we're going to adjust OEPs," and whatever. I was reminded of that topic when the YAML configs came up for devstack, and I was reminded of the fact that I generally followed the old topic, but not well enough, where I'm like

00:45:00

Robert Raposa: "Here's what we're saying needs to happen." Should we even do this or not? And does this affect our use of YAML in devstack, or anywhere and everywhere? Yeah. I can't even remember if the proposal had literally dropping YAML for certain things, or had nothing to do with YAML. So this is more like a bunch of questions that I had that I just laid out, so that you could say this is totally irrelevant and maybe we'll get to that topic again at some future time, when you feel it matters again. But, yeah, over to you, Kyle.

Kyle McCormick: Yeah.

Kyle McCormick: I don't think the new development settings, upon which both devstack settings and Tutor settings are based, should support YAML.

Kyle McCormick: I think that it shouldn't do anything to preclude devstack from using YAML if it wants to, though.

Kyle McCormick: Does that make sense?

Robert Raposa: Is that how things were left on that discussion? Okay, so, okay. So

Kyle McCormick: Yes.

Robert Raposa: so I get that, that is

Robert Raposa: So I get that there is no urgent issue around that topic, but again, it reminded me that there was a semi-urgent, but not urgent, issue around how overrides are dealt with, because we had an issue at one point in time, and that could occur again. And we had a whole discussion about that and…

Feanil Patel: All right.

Robert Raposa: we had discussions about whether or not the OEP around settings was going to change, and then… And…

Kyle McCormick: Sure.

Robert Raposa: then that all just stopped and…

Kyle McCormick: Right. There is a through line between them.

Robert Raposa: I don't,…

Feanil Patel: Yeah.

Robert Raposa: And this reminded me of that. So,

Kyle McCormick: The through line is that

Robert Raposa: Okay.

Kyle McCormick: from Régis's and my experience with Tutor: when you don't control the upstream settings file, YAML is only enough if the upstream settings file has been crafted so that you can write YAML to run your site.

Kyle McCormick: We could be wrong, but that was our experience in Tutor: we just couldn't write a YAML file for Tutor that handled all the things we needed to tweak to make Tutor work the way we wanted to.

Feanil Patel: And furthermore, the added complexity of the YAML file is not necessarily valuable, because it essentially duplicates

Kyle McCormick: It's actually.

Feanil Patel: capabilities that the settings file already has, right?

Kyle McCormick: Yeah, so we ended up having to have a custom Python settings module that we required. And we had to have a YAML file, because edxapp will crash if you don't have a YAML file. So it was really the worst of both worlds: we had to use YAML and we had to use Python, and they layered on top of each other in a way that wasn't, like,

Kyle McCormick: describable in a sentence or two.

Feanil Patel: And I think, if I were to summarize what's happening right now: as we're transitioning the development settings files away from the old devstack, we are not going to continue to support YAML for the development settings. We're not currently making any changes to the settings file production.py, but it is likely that it will move in this direction at some point in the future, when we will announce it, communicate, and do all that stuff.

Kyle McCormick: So what I said was my reasoning for pushing us away from YAML, but there would be a different ticket, with the windows we talked about, for any change there. Yeah.

Robert Raposa: So one thing in what you said that I'm not quite following yet: you're saying YAML only works if you have control over something. And YAML has presumably been working for us to some extent, because we're using it. So what is it that we have control over that allows for that, for using it?

Feanil Patel: production.py.

Kyle McCormick: I'm saying that for years, edX (I mean, everyone in this group call right now) just changed production.py whenever we needed to, and YAML was this kind of secondary thing. And then for the community, production.py would just kind of hit them in the face, and they would re-read it because they had to. And then they would also make another Django settings file, because the YAML wasn't sufficient for them.

00:50:00

Feanil Patel: Because there are things that we are updating where we're like, "This dictionary: we need to get inside this dictionary and update inside of a list. Let me write something in production.py that makes a new YAML variable that lets me inject that data into that structure." And that's great for us, but somebody else in the community needs to update a different dictionary with an injected value, and they can't land that production.py change, nor…

Robert Raposa: Good.

Feanil Patel: so they just make their own settings file that does that injection in code. But they still need the YAML file. So from a community operator perspective, it doesn't reduce complexity to have production.py there. It actually just increases it, because now there are two ways of changing the settings, and they interact weirdly with each other.
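To make the dictionary-and-list point concrete, here is a minimal sketch using a hypothetical setting name (`EXAMPLE_SERVICE` is not a real edx-platform setting). A Python settings file can reach into the nested structure and append a single value; with a YAML file and no upstream hook for that key, the usual option is to restate the whole dictionary, because a top-level override replaces it.

```python
import copy
from typing import Any, Dict

# Hypothetical stand-in for a nested structure defined by an upstream settings file.
UPSTREAM_SETTINGS: Dict[str, Any] = {
    "EXAMPLE_SERVICE": {
        "BACKEND": "example.backend.Client",
        "OPTIONS": {"hosts": ["svc-1.internal:9000"]},
    }
}


def customize(settings: Dict[str, Any]) -> Dict[str, Any]:
    """What a site-specific Python settings file can do directly: reach into the
    nested structure and append to a list, with no upstream YAML hook required."""
    updated = copy.deepcopy(settings)  # copy only so the stand-in above stays unchanged
    updated["EXAMPLE_SERVICE"]["OPTIONS"]["hosts"].append("svc-2.internal:9000")
    return updated
```

In a real operator settings module the same one-line mutation would be applied to the module-level setting after importing the upstream settings, rather than to a copied dictionary.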

Robert Raposa: And so the ultimate wish, the plan, would be that no one's using YAML; is that accurate?

Kyle McCormick: The plan would be that if an operator wants to use YAML,…

Feanil Patel: Right. The recommendation,…

Kyle McCormick: their custom settings file can have the YAML load block in it.
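A minimal sketch of what such an opt-in YAML load block in an operator's own settings module might look like. It assumes PyYAML is installed (imported lazily, only when a file is configured); the environment variable `MY_SITE_SETTINGS_YAML` and the merge rule (dictionaries merged one level deep, everything else replaced) are hypothetical, site-specific choices rather than a project-supported format.

```python
import os
from typing import Any, Dict, Mapping, MutableMapping


def load_yaml_overrides(path: str) -> Dict[str, Any]:
    """Parse a YAML file of setting overrides. Requires PyYAML."""
    import yaml  # lazy import: only needed when a YAML file is actually configured

    with open(path, encoding="utf-8") as f:
        return yaml.safe_load(f) or {}


def apply_overrides(settings: MutableMapping[str, Any], overrides: Mapping[str, Any]) -> None:
    """Set each top-level override; dict values are merged one level deep, not replaced."""
    for key, value in overrides.items():
        current = settings.get(key)
        if isinstance(current, dict) and isinstance(value, dict):
            current.update(value)
        else:
            settings[key] = value


# In the operator's settings module, after importing the upstream settings,
# the block only runs when the (hypothetical) environment variable is set.
_yaml_path = os.environ.get("MY_SITE_SETTINGS_YAML")
if _yaml_path:
    apply_overrides(globals(), load_yaml_overrides(_yaml_path))
```

Because the merge only goes one level deep, a value buried inside a nested list would still need hand-written Python, which is the kind of per-site tailoring discussed here.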

Feanil Patel: I think, would be that nobody use YAML, for me personally.

Feanil Patel: because,

Kyle McCormick: I can see why an operator would want to have a non-Turing-complete, declarative file.

Feanil Patel: A static file.

Kyle McCormick: I…

Feanil Patel: Let's say for

Kyle McCormick: if you want a file that's not Turing-complete.

Feanil Patel: Yeah.

Kyle McCormick: And static, just a static file.

Feanil Patel: At the level of complexity. Yeah.

Kyle McCormick: A configuration file for their settings. I understand that; it just doesn't work for the project. But, yeah.

Kyle McCormick: I mean, personally, running a site, I might write a little block of Python that loads in YAML,…

Robert Raposa: And that's…

Kyle McCormick: but that block of Python would be tailored.

Robert Raposa: what you're saying. We landed would do…

Feanil Patel: Yeah.

Kyle McCormick: to what I need, and it probably won't work for everybody else.

Robert Raposa: if this were adjusted At…

Feanil Patel: And I'm yeah.

Kyle McCormick: Yeah.

Robert Raposa: And I get that you're saying and…

Robert Raposa: and maybe don't do that.

Feanil Patel: Yeah, me personally, in my personal opinion: I'm saying you should not do that and should just have a settings .py file, because the people who are changing that file and the config file are our engineers, and they should know how that bit works. And it reduces the number of… yeah.

Kyle McCormick: But if you have a system running already that moves the YAML files around, which I know you do, then you could just keep that block, so you don't have to change that system.

Feanil Patel: Or you can delete that whole system and have a thing that reads the config out of some data store directly.

Feanil Patel: And then have an actual live-replacing thing: live-updating configs that are not as dependent on reboots.
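Feanil's alternative can be sketched as a small reader that pulls settings from a data store and refreshes them after a time-to-live, so a change takes effect without a restart. The `fetch` callable is a placeholder for whatever store a deployment uses; nothing here is an existing Open edX API. Live reads only help code that looks values up at call time, since Django settings are otherwise fixed at import.

```python
import time
from typing import Any, Callable, Dict, Optional


class LiveConfig:
    """Serve config values from an external store, re-reading it at most once per `ttl` seconds."""

    def __init__(
        self,
        fetch: Callable[[], Dict[str, Any]],
        ttl: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch  # placeholder for a query against your data store
        self._ttl = ttl
        self._clock = clock  # injectable so refresh timing can be tested
        self._values: Dict[str, Any] = {}
        self._loaded_at: Optional[float] = None

    def get(self, key: str, default: Any = None) -> Any:
        now = self._clock()
        if self._loaded_at is None or now - self._loaded_at >= self._ttl:
            # Cache is empty or stale: reload everything from the store.
            self._values = self._fetch()
            self._loaded_at = now
        return self._values.get(key, default)
```

The trade-off is that a change can take up to `ttl` seconds to appear, in exchange for not hitting the store on every lookup.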

Feanil Patel: It depends on how much time and space, you have,…

Robert Raposa: Yeah, cool,…

Feanil Patel: that's everything.

Robert Raposa: And it sounds like this would be a DEPR at some point: expand-contract.

Feanil Patel: Yeah. Yeah.

Robert Raposa: And it's not out there and…

Kyle McCormick: Absolutely. I'd probably say more than a month of overlapping support here…

Robert Raposa: okay, that's good.

Kyle McCormick: because this is not a trivial migration.

Feanil Patel: That's huge.

Robert Raposa: Yeah, depending on how you write the…

Feanil Patel: Yeah.

Kyle McCormick: It's thoroughly medium-sized.

Robert Raposa: Good. Next.

Feanil Patel: We have five minutes. Do you guys want to talk about toggle annotations and DEPR, or do you want to push that to either the DEPR meeting next week or this meeting next week?

Robert Raposa: I mean, I'm happy to.

Feanil Patel: Do you want to prime the pump Kyle?

Robert Raposa: I'm happy to start. Or go ahead. Okay.

Kyle McCormick: Sorry, toggle annotations?

Feanil Patel: So the question here was: toggle annotations already have a date on them at which they're meant to be removed. Can we just use that date? If it's past that date, can we just go remove things? They don't need to have more warning than that.

Robert Raposa: Yeah, so my main comment is that we didn't have any of these discussions about process and six months and all of this at the time those annotations were coming in. We were trying to get some native metadata in the code so that we could see, and the biggest thing we were trying to see, at a glance, is just how old some of these toggles that are, in theory, temporary are. And so the idea, and…

00:55:00

Feanil Patel: Right.

Robert Raposa: even in the documentation, is: if you don't know, just throw in two weeks or two… I don't even remember what it is. Just pick a date, because switching to git blame and all these other things on the code is much harder than having the annotations pulled out into a report, so that we can just scan through and sort the report and be like, "Hey, by the way, we've got temporary toggles that are from five years ago. Maybe, if we're going to look at toggles to deprecate, those would be good candidates." So for me, I feel like those annotations are not set in stone. I think having a date annotation would still serve this purpose of allowing us to more quickly and easily see…
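The report Robert describes can be approximated in a few lines. This standalone sketch parses `# .. toggle_*:` annotation comments and lists temporary toggles oldest first. It assumes the edx-toggles annotation field names (`toggle_name`, `toggle_use_cases`, `toggle_creation_date`, `toggle_target_removal_date`) with ISO dates; the toggle names and dates in `SAMPLE` are invented, and in practice the project's annotation tooling, not this parser, extracts these fields.

```python
import re
from datetime import date
from typing import Dict, List, Tuple

# Matches annotation comments such as "# .. toggle_creation_date: 2019-03-01".
ANNOTATION = re.compile(r"#\s*\.\.\s*(toggle_\w+):\s*(.*)")


def parse_toggles(source: str) -> List[Dict[str, str]]:
    """Collect annotation fields, starting a new toggle at each toggle_name line."""
    toggles: List[Dict[str, str]] = []
    current: Dict[str, str] = {}
    for line in source.splitlines():
        match = ANNOTATION.search(line)
        if not match:
            continue
        key, value = match.group(1), match.group(2).strip()
        if key == "toggle_name" and current:
            toggles.append(current)
            current = {}
        current[key] = value
    if current:
        toggles.append(current)
    return toggles


def temporary_by_age(toggles: List[Dict[str, str]], today: date) -> List[Tuple[str, int]]:
    """(name, age in days) for temporary toggles with a creation date, oldest first."""
    rows = []
    for toggle in toggles:
        is_temporary = "temporary" in toggle.get("toggle_use_cases", "")
        if is_temporary and "toggle_creation_date" in toggle:
            created = date.fromisoformat(toggle["toggle_creation_date"])
            rows.append((toggle.get("toggle_name", "?"), (today - created).days))
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Hypothetical annotated source; names and dates are invented for the example.
SAMPLE = """
# .. toggle_name: example.new_flag
# .. toggle_use_cases: temporary
# .. toggle_creation_date: 2024-06-01
# .. toggle_target_removal_date: 2024-06-15
# .. toggle_name: example.old_flag
# .. toggle_use_cases: temporary
# .. toggle_creation_date: 2019-03-01
# .. toggle_target_removal_date: 2019-04-01
"""

for name, age in temporary_by_age(parse_toggles(SAMPLE), date(2024, 9, 26)):
    print(f"{name}: {age} days old")
```

Sorting on the creation date alone gives the "five years old" signal without relying on the removal date at all.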

Robert Raposa: something, but I wouldn't let those dates drive a process. I would figure out what our process is and then potentially have that drive how we use the annotation, or adjust annotation names as necessary, or whatever. But not pretend like those dates were any more meaningful than they were.

Feanil Patel: What if we said that starting today, those dates are meaningful? If you put a date in, then any time after that, the toggle can be removed. Moving forward, those dates are real.
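If dates became binding only from a given day forward, a check could ignore older, aspirational annotations entirely. A minimal sketch: the cutoff is a hypothetical policy parameter, and each toggle is a dictionary using the annotation fields `toggle_name`, `toggle_creation_date`, and `toggle_target_removal_date` (assumed to hold ISO dates). The example data is invented.

```python
from datetime import date
from typing import Dict, List


def removal_due(toggles: List[Dict[str, str]], cutoff: date, today: date) -> List[str]:
    """Names of toggles past their target removal date, counting only toggles
    created on or after `cutoff`, the day the dates are declared binding."""
    due = []
    for toggle in toggles:
        created = date.fromisoformat(toggle["toggle_creation_date"])
        target = toggle.get("toggle_target_removal_date")
        if created >= cutoff and target and date.fromisoformat(target) < today:
            due.append(toggle["toggle_name"])
    return due


# Invented example data; the cutoff stands in for a hypothetical "starting today" date.
toggles = [
    {"toggle_name": "legacy.flag", "toggle_creation_date": "2020-01-10",
     "toggle_target_removal_date": "2020-02-01"},
    {"toggle_name": "rollout.flag", "toggle_creation_date": "2024-10-01",
     "toggle_target_removal_date": "2024-10-15"},
    {"toggle_name": "future.flag", "toggle_creation_date": "2024-10-01",
     "toggle_target_removal_date": "2025-06-01"},
]
print(removal_due(toggles, cutoff=date(2024, 9, 26), today=date(2025, 1, 1)))  # ['rollout.flag']
```

`legacy.flag` is overdue but predates the cutoff, so it is left for a normal deprecation process; whether a flagged toggle should then be removed directly or only become a DEPR candidate is the open question in this discussion.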

Robert Raposa: So, why?

Feanil Patel: what's the value of that?

Robert Raposa: What problem are we trying to solve? Is this for temporary toggles specifically,…

Feanil Patel: Yeah.

Robert Raposa: temporary toggles, and helping ensure that they don't live forever, or

Feanil Patel: I think it's reducing the amount of time between when we notice that a thing can go away and we can delete it.

Feanil Patel: So a toggle has been around in the system and has a removal date, at which point the feature should be default-on instead of whatever it is now.

Feanil Patel: Do we need to say, "Hey, Y is going to be the new default"? Although I feel like maybe this is a moot conversation, because this isn't necessarily…

Feanil Patel: This isn't high-operator-impact kind of stuff for the most part, right? This feature toggle is being flipped the other way so that we're caching more, or this new UI is the default now, and there could be impact there, I guess, from the old… like the old UI being

Feanil Patel: more capable, with more features than the new one. But,

Feanil Patel: The question is sort of like, should this go through a new DEPR or not? That's fair. But yeah, the question is: does this need a new deadline, or can we use the existing deadline that we wrote down on the toggle as the deadline when it's safe to do the removal?

Jeremy Ristau: I mean, can we use a real example here? Let's say we put a six-month annotation on all the waffle flags around the course authoring pages. Can we just…

Feanil Patel: Right.

Jeremy Ristau: Can we just immediately get rid of them right now and remove the old pages? I don't think so, right? Because you would want to go through a deprecation process.

Feanil Patel: But I think the question is this: let's say we set that deadline for six months from whenever.

Feanil Patel: At that point, can the DEPR just say, "I am announcing that this thing is going to be removed and this is going to be the new default," with a two-week comment period like normal, and then I can just go do the removal? Or does it need further warning than that? I think that's the question that was raised.

Jeremy Ristau: Thinking about it, I would go back to Robert's suggestion: the DEPR drives it, and the annotation is just a reflection of the DEPR's goals. If the annotation gets stale, you still have a pointer back to the actual deprecation. But your mention of "Can we just take everything as-is right now and react to it?"… As-is, there are all these things that have historical dates that…

01:00:00

Feanil Patel: No, not anymore. Right, right. That's fair. I think my suggestion was more that…

Jeremy Ristau: Yeah.

Feanil Patel: can we say that starting today, if you add a new annotation to things, those annotation dates are firmer than the dates that used to be in annotations, because we want to make this process more

Feanil Patel: predictable and reliable, moving forward? Because I understand that historically these dates were aspirational, and what I want to do is shift that.

Jeremy Ristau: Or they were just arbitrary.

Feanil Patel: And they were arbitrary. And what if they were neither of those things? Like, what would it take for us to get from…

Jeremy Ristau: Yeah.

Robert Raposa: but,

Kyle McCormick: There's a thing, it could be.

Feanil Patel: Yeah. Right?

Jeremy Ristau: Yeah.

Kyle McCormick: They could be gone. We have a DEPR process.

Feanil Patel: Exist? We have a DEPR process. They could just not exist. There's still a creation date.

Jeremy Ristau: Yeah.

Feanil Patel: So the removal date in particular, I think, is the thing that's misleading, right? A toggle creation date, I think, is useful for getting age into the reports, but a toggle removal date is almost always aspirational, I feel like.

Robert Raposa: Yeah, I think you're getting towards where I would go, which is: we have some tickets that already have some adjustments to the annotations proposed, and I think you're giving even more of them, and we should just review what the right annotations are that would help us, and let's get them in.

Feanil Patel: Right.

Robert Raposa: So that's one piece. And then, in terms of: do we want a target, and how would it be used?

Robert Raposa: It could be similar to the whole issue of… I think there is a set of rollout toggles where we would like whoever is creating one to put in that target date, and it might be two weeks, and they are on the hook for actually doing the removal. But I think there's going to be another set that is more like the conversation around Jeremy saying,…

Feanil Patel: Yeah.

Robert Raposa: "We don't want to create this DEPR, because we don't want to be on the hook to pull out everything; that's just not something we can commit to right now." And I think there'll be a set of things where it's like, this could potentially be temporary, but only if you get everyone to agree that the old thing can go away. And who's gonna make the old thing go away?

Feanil Patel: Right.

Robert Raposa: And maybe all of that is part of the product discussion. So yes, for short rollout toggles, where this is meant to be temporary,…

Kyle McCormick: Yeah.

Robert Raposa: to make sure that things are safe. And once they're…

Feanil Patel: Right.

Robert Raposa: if we don't need the old thing, actually marking that and having people pull it out sounds great. And if they don't pull it out, knowing that they should have is great.

Kyle McCormick: I think, to your last point, when the simultaneous support window wraps up, before the old stuff is removed, there should be a process where people in the community are empowered to just do that cleanup and aren't hit with a wall of paperwork in order to get it done.

Jeremy Ristau: Yeah, yeah. I mean, it's a tough line to draw in all of these conversations, between speed and appreciation for the users of the platform. It's a really hard line to walk.

Feanil Patel: Robert, I don't know where the notes are for making changes to the toggle process, but I think removing the removal dates, or maybe adding clarifying documentation around the removal dates, which is like, "If this is not a thing you are planning on removing by this date, don't put this date in here."

Robert Raposa: I think there's a view in the DEPR board that's for DEPR-process-related tickets. I think you'll probably find something in the comments there; there's only a small handful of tickets there. And let…

Feanil Patel: Okay.

Robert Raposa: if you don't find the ticket.

Feanil Patel: Yeah, I'll take a look at that. Thanks.

Robert Raposa: And we should probably drop, because we have another meeting, and,…

Feanil Patel: Yeah.

Robert Raposa: but this was helpful. Thank you.

Feanil Patel: That thank…

Kyle McCormick: And Xbox.

Jeremy Ristau: Thanks everybody.

Feanil Patel: Alright, have a good day.

Jeremy Ristau: See you, everyone.

Meeting ended after 01:05:01 👋

This editable transcript was computer generated and might contain errors. People can also change the text after it was created.