Maintenance Working Group Meeting – 2024/09/26 08:59 EDT – Transcript
Attendees
Adolfo Brandes, Brian Smith, Feanil Patel, Jeremy Ristau, Kyle McCormick, Maksim Sokolskiy, Robert Raposa, Sarina Canelake
Transcript
Brian Smith: Hello.
Robert Raposa: And that's
Sarina Canelake: Morning, Brian. Morning, Robert.
Brian Smith: mark,
Feanil Patel: Hello.
Brian Smith: Hello.
Feanil Patel: Robert, does this mean your production is a little bit more under control?
Robert Raposa: Yeah, it does. And actually I can share one potential change that just might help reduce Celery load for everyone.
Jeremy Ristau: Good morning, everyone.
Robert Raposa: Or reduce...
Feanil Patel: Bringing my name.
Brian Smith: Morning.
Robert Raposa: Good morning. Reduce load on the Redis cluster, of course.
Feanil Patel: Sounds good.
Feanil Patel: Notes on her.
Feanil Patel: Skipping that going now.
Feanil Patel: Sharing my screen.
Feanil Patel: Y'all can see that? Okay, there.
Feanil Patel: Let's just go down the line real quick, and then we'll talk through the next-time stuff.
Jeremy Ristau: Yeah, so for the first one, I have brought up the Django upgrade to arbybaum. They're not really doing anything yet but at least it's on their radar. And then I pinged our SRE manager as well.
Feanil Patel: Yeah.
Feanil Patel: Yeah, I think both of the first two things are.
Jeremy Ristau: To let them know, and connected them with Feanil.
Jeremy Ristau: So I think I can call that done for now.
Feanil Patel: Done for now. And then we can follow up on that conversation that I mentioned, the one that we had on that ticket.
Jeremy Ristau: Yeah.
Feanil Patel: Here, if that would be useful, around cross-repo maintenance access stuff.
Jeremy Ristau: Yeah, that'd be great.
Feanil Patel: And the new role.
Feanil Patel: And then the next thing is the patch window discussion we want to have.
Feanil Patel: and I know,
Feanil Patel: And then there's the crown on master, which I've not taken up yet, but I'll try to do that this week. Brian, did you get a chance to update the DEPR pilot issue with the
Brian Smith: Yeah, I left a comment on that issue. Just kind of adding addendums for scope and…
Feanil Patel: Okay.
Brian Smith: unmaintained repos stuff. I think it covers everything that we have there, but if someone wants to double-check and see if there's something I missed, more eyes will lead to better docs.
Feanil Patel: Yeah. Yeah,…
Brian Smith: So yeah.
00:05:00
Feanil Patel: And that is... cool. Yeah, we can take a look at those, but I'm going to mark this as done for now. People can follow up on the issue.
Brian Smith: Exactly.
Feanil Patel: Six month window for simultaneous Python version doesn't seem prudent. I think that's like a variation. Yeah, I think that's in here. And in the next time,
Feanil Patel: let's see. Chief maintenance goals.
Feanil Patel: And then Python 3.12 or 3.13: we completed that discussion last time.
Feanil Patel: And then, DEPR tickets for the frontends that need to be deleted as part of the course offering MVP. No update on that, Jeremy?
Jeremy Ristau: No.
Feanil Patel: And then I'm going to drop Clinton's things since he hasn't been here in a while. so, first up
Feanil Patel: So last week, we talked about this notion of teams that are helping maintain many repos across the organization and
Feanil Patel: how they would manage access for them, because needing to sort of expand and contract CC-ship for them across a lot of repos doesn't make a lot of sense. I spoke with Sarina about this, and one of the things we sort of arrived at was the
Feanil Patel: introduction of a new role,
Feanil Patel: which I think we're currently calling maintainers at large.
Sarina Canelake: Yeah, I was gonna actually put this on the forums for discussion. So I want to make sure anybody…
Feanil Patel: Yeah. yeah, I mean
Sarina Canelake: who has thoughts here captures them on the discussion post I'm about to make, not just in this thing.
Feanil Patel: If you make it and drop a link here, we can. Yeah.
Sarina Canelake: Yeah, I actually tagged you on it yesterday because I wanted you to review it before I posted it.
Feanil Patel: I'll take a look at it after this, but I think this is the right thing to do. That role will get introduced. There are details in the post Sarina's going to make, and we can sort of follow up on there about that role and whether it seems like the right fit
Feanil Patel: and if it doesn't,
Feanil Patel: So, we will skip this for now and sort of follow up async on that post, and I'll put a link in here once I've gone through it and we can post it.
Robert Raposa: I will note that the name of it is amusingly, possibly, out of place. Doesn't "at large" mean escaped?
Feanil Patel: It's also used in politics for representatives that represent the entire city as opposed to certain districts in it.
Robert Raposa: The.
Robert Raposa: I think I just am less aware of that usage. Cool.
Feanil Patel: We got it. Yeah, and the name is one of the things you can provide feedback on.
Robert Raposa: Yes, that's good.
Feanil Patel: Yeah. It's a surprisingly hard group to name, I think.
Robert Raposa: Okay.
Robert Raposa: Just give it.
Feanil Patel: We'll make sure it's a person carrying a bag of money or…
Robert Raposa: Okay, exactly.
Feanil Patel: something. Striped shirt.
Feanil Patel: Let's talk about the deprecation window.
Feanil Patel: Which I think is super relevant to this group. Yeah, I think as we're sort of adding on all of these different matrices, we're finding that the cost of maintaining those for a long period of time, in terms of the time it takes for tests to run and queuing, is actually much higher than I previously expected for a six-month window.
00:10:00
Feanil Patel: And I'd like to propose shortening that window a bit to reduce that load and to sort of speed up some of these things landing. And I want to open that up for discussion.
Feanil Patel: And just because it's hard to know what to shorten it to, let's say four months, unless somebody has a different number in their head.
Feanil Patel: Kyle, go for it.
Kyle McCormick: Just a point of clarification: are you proposing it for this upgrade, all upgrades, or all DEPRs?
Feanil Patel: I'm going to say for DEPRs that need to have a support window, because I think the six-month window was specifically for things where there's sort of operator impact.
Feanil Patel: And for things where a feature-complete new version exists, we don't necessarily need this. But for anything where the operators or sites will have major impact, we want to give them time to adjust.
Kyle McCormick: Breaking changes. Got it.
Feanil Patel: Yeah. Brian.
Brian Smith: And this is specifically in reference to things where we end up having simultaneous support for the old version and the new version, right? That's…
Feanil Patel: Correct.
Brian Smith: where... Or are there pain points for other DEPRs…
Feanil Patel: Yeah.
Brian Smith: where we don't have that overlap time?
Feanil Patel: No, I mean the window is for simultaneous support, so I think that's where having the six months of simultaneous support is causing issues, in terms of the amount of CI we have to maintain and the amount of resources that's taking up. And so now that we have some data on that with both the Python and…
Brian Smith: Yeah.
Feanil Patel: Node updates in flight,
Feanil Patel: based on that information and feedback I've been hearing from other people, I kind of want to propose that we change this down to four months. But I am not one of the people who has to upgrade an operating site, so I want feedback from people who are doing that.
Brian Smith: I'd also want to call out that I think the window might change depending on how far out from a release we are, because there's the question of: should we continue to have simultaneous support shipped in the next release? And that means we would want it up until the cut, at least.
Feanil Patel: Mm-hmm. And that's why we chose the six-month window in the first place: it would guarantee that it would be in at least one release.
Robert Raposa: One question that I have is: we all understand the cost of CI, and the wish to prioritize this work to shrink the window as much as we can anytime these upgrades are happening. I'm wondering
Robert Raposa: if we think that is not enough and that we need to say that six months is not the official window, because we feel like we're going to miss the opportunity to shrink the window, or if we can simply say we don't actually need a new rule. We just know that, hey, we're going to put this ahead of other things because we want to get this out of CI as quickly as possible.
Robert Raposa: Yeah. It's just a question.
Feanil Patel: Can you restate that question? I had some trouble following it.
Robert Raposa: The question is, because the current pilot is we give a six-month official window, because that's what we think is needed as a general rule, but we also…
Feanil Patel: Mmm.
Robert Raposa: try to
Robert Raposa: order and organize the DEPRs such that things can be closed earlier. And these are ones where we have, because of the CI cost, even more pressure to want to bump them up in priority so that they're the first things that we're working on and we close them even earlier, and…
00:15:00
Feanil Patel: Right.
Robert Raposa: and I guess the question is, do we know that general rule and process is not going to be enough? That if we leave it at six months and we don't change anything, and we all understand the cost of CI, do we think that some of us are just going to be... and that's just not a big deal? I mean, the cost of CI affects, like, everyone. No one wants
Robert Raposa: that sitting around longer than it can be,…
Feanil Patel: Right.
Robert Raposa: or that it can be sitting around. So I'm just wondering whether we actually need an exception, let's call it four months, for this, or if it's six months, it still covers the six-month rule for
Robert Raposa: named releases, and we all understand that this is the type of work that we would like to prioritize over other work.
Feanil Patel: Right, I guess. So it sounds like you're saying we keep the six-month window and just make it clear that we're going to try to land this faster?
Robert Raposa: Yep.
Feanil Patel: Does that... Yeah. And so I think my response to that would be that the window is for sort of dropping the old support in particular, and I think from a technical perspective usually that part, the dropping of the support, is not super complex, because it was the adding of support that requires the forward-compatible deltas, and then dropping support is usually more cleanup and faster to do.
Feanil Patel: But the impact of dropping support on operators who have not yet transitioned is the thing that I'm concerned about, and I want to make sure that... I think when we were talking about this we said four months... or, we said six months, because it sort of straddled both the releases and gave other master-following operators sufficient time. What if we said four months plus one release? So if the release was in five months or six months, we'd extend it, potentially. But
Feanil Patel: often, for a lot of these, we don't necessarily need simultaneous support on releases, because the way that people who are on releases operate tends to be via Tutor, for the most part, and Tutor has a lot of tooling to do these upgrades as a part of the shift between releases. So, having a four-month window and having Tutor sort of nightly update, the named releases will sort of capture these deltas pretty well, even if they're shorter than six months.
Feanil Patel: Kyle.
Kyle McCormick: I'm gonna propose something different. It might be hard to explain, bear with me. So, I'm pretending I'm an operator running off of master, right? And we're upgrading Python from 3.12 to 3.14, right? That's the task on the table. Once support is in master, once we have that simultaneous support, if I as an operator have planned and I have my roadmap set up so that my team is available to go into our Dockerfiles and switch that Python version and then test on master, then for a smooth upgrade that's a couple-week process, I think. And for a rough upgrade that might be like a month-long process. It's not six months. But that's assuming that I have it on my roadmap and my team's available to do it. So I guess this is a question directed, I think, at Jeremy mostly: do you need a six-month window of overlapping support, or do you need a
Kyle McCormick: six-or-whatever-month heads-up so that your team is ready to do an upgrade in about a month's time?
Jeremy Ristau: I mean, that is what the pilot is, right? You put the announcement out there and you give it six months. The whole point was that you don't react immediately when you put the DEPR up, right? You give it several months of heads-up so that people who are deploying from master can make it into their roadmaps. And yeah,…
Kyle McCormick: So, I think this is a point of misalignment between us, possibly a good misalignment that we can turn into an opportunity.
Jeremy Ristau: yeah. Good.
Kyle McCormick: So what I am hearing from you, Jeremy, is that you understand the pilot to mean six months from the announcement of the plan, whereas what we're operating under is six months of simultaneous support, both versions of Python in CI. And those are two separate things.
00:20:00
Jeremy Ristau: Yes.
Robert Raposa: Yeah.
Feanil Patel: Yeah. So I'm trying to summarize your suggestion, Kyle, which is we provide a predictable time when the fix will be available, probably within the next six months because it's hard to predict much further out than that. But the fix does not have to be available on master when we make this announcement, if that work is planned for that period.
Jeremy Ristau: I would argue, in fact, it should not be put in immediately, right? You need to give people that time to react before you drop the change on them.
Feanil Patel: What, the addition or the removal? Because the addition is incremental,…
Jeremy Ristau: That's true. Yeah. Great. Yeah.
Feanil Patel: it's more the dropping of support that's operationally complex.
Jeremy Ristau: That's true. And is this just the statuses of a DEPR? Is there an announced state? This would be announced and…
Feanil Patel: That. Right.
Jeremy Ristau: then it would have some period of time where it's announced before the expansion, and then some period of time before the contraction, and those should just all be outlined in the DEPR, right?
Feanil Patel: Right, and I think this is,…
Kyle McCormick: Yes.
Feanil Patel: this is sort of like, we can make this very predictable for these big upgrades and, by doing so, hopefully shrink the amount of time where we're doing simultaneous support down to, what, a month? A couple weeks?
Feanil Patel: What does that feel like? Because,
Kyle McCormick: Just to put it on the table, I'd say if we announced six months ahead of time, then the overlapping window could be one month. Straw man, feel free to debate either part, but for the sake of argument, that's what I would say. Six months, and that's a DEPR. Six months later it can be removed, but there has to be a one-month window of overlapping support at the end of that.
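Kyle's straw man can be read as simple date arithmetic. The following is a minimal sketch of that reading only, not an agreed policy: the function names `add_months` and `depr_timeline`, the dictionary keys, and the example date are hypothetical, and the six-month and one-month defaults are just the numbers floated in the discussion.

```python
import calendar
from datetime import date


def add_months(start: date, months: int) -> date:
    """Return the date `months` calendar months from `start` (may be negative).

    The day is clamped to the last day of the target month.
    """
    index = start.month - 1 + months
    year = start.year + index // 12
    month = index % 12 + 1
    day = min(start.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def depr_timeline(announced: date, notice_months: int = 6, overlap_months: int = 1) -> dict:
    """Hypothetical timeline: announce, then overlap at the end of the notice period.

    Assumption: old-version support may be removed `notice_months` after the
    announcement, and both versions must be supported for at least the final
    `overlap_months` of that period.
    """
    if overlap_months > notice_months:
        raise ValueError("overlap cannot be longer than the notice period")
    removal = add_months(announced, notice_months)
    return {
        "announced": announced,
        "overlap_starts": add_months(removal, -overlap_months),
        "old_version_removed": removal,
    }


if __name__ == "__main__":
    # Example date chosen arbitrarily (the date of this meeting).
    for step, when in depr_timeline(date(2024, 9, 26)).items():
        print(f"{step}: {when.isoformat()}")
```

Under these assumptions, CI would carry both versions only between `overlap_starts` and `old_version_removed`, rather than for the full six months.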