Large Instances Meeting Notes 2023-10-17

Meeting video recording: https://drive.google.com/file/d/1ylWPN53OODTjHSjYJcgBHnmjr_h6WCDQ/view?usp=sharing

Assign meeting lead and note taker.

Greetings & introductions as needed.

Updates from each org on the call - 2U, eduNEXT, OpenCraft, Raccoon Gang. What's new with your deployment(s)?

@Jhony Avella (eduNEXT): We want to do more load testing on our mini k8s installation, and hope to get to it in the next month. We’re in the process of setting it up for a customer. (@Felipe Montoya : Keep in mind, it’s still 3 servers, not 7, so “mini” is a bit misleading. But it’s still a smaller deployment than a traditional k8s one.) We can still add new nodes using k3s if we need to.

@Moisés González Re improvements to MFE hosting: we got a PR merged recently, but we’re still exploring how we can make it easier to define the public path to a CDN endpoint. At the moment, each image has to have the CDN endpoint that pulls from the container and caches the assets. We’re exploring how to circumvent that. @Felipe Montoya We actually have a business requirement for that - we don’t want to have a separate image for each customer/CDN URL.

@Braden MacDonald For what it’s worth, I’ve used a hacky solution for that: put a unique placeholder string into the variable, then run webpack and build the MFE bundle. Then use a script at container startup to do a find-and-replace that substitutes the real value for the placeholder. It’s hacky, but I’ve used it in prod for some time and never saw any problems.
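A minimal sketch of the placeholder approach described above, written in Python for illustration. It is not the script used in production: the placeholder string, the environment variable names (`MFE_BUNDLE_DIR`, `MFE_PUBLIC_PATH`), and the list of file suffixes are all assumptions.

```python
import os
from pathlib import Path

# Hypothetical placeholder baked into the bundle at build time
# (e.g. passed as the public path when running webpack).
PLACEHOLDER = "__MFE_PUBLIC_PATH_PLACEHOLDER__"

# Only text assets are rewritten; binary files such as images are skipped.
TEXT_SUFFIXES = (".js", ".css", ".html", ".map")


def substitute_public_path(bundle_dir, public_path, placeholder=PLACEHOLDER):
    """Replace the placeholder in every text asset under bundle_dir.

    Returns the number of files that were modified.
    """
    changed = 0
    for path in Path(bundle_dir).rglob("*"):
        if not path.is_file() or path.suffix not in TEXT_SUFFIXES:
            continue
        text = path.read_text(encoding="utf-8")
        if placeholder in text:
            path.write_text(text.replace(placeholder, public_path), encoding="utf-8")
            changed += 1
    return changed


def main():
    # Hypothetical environment variables set per customer/CDN at deploy time.
    bundle_dir = os.environ.get("MFE_BUNDLE_DIR")
    public_path = os.environ.get("MFE_PUBLIC_PATH")
    if not bundle_dir or not public_path:
        return 0  # nothing to do; leave the bundle untouched
    changed = substitute_public_path(bundle_dir, public_path)
    print(f"Rewrote public path in {changed} file(s)")
    return changed


if __name__ == "__main__":
    main()
```

Run once from the container entrypoint before the web server starts, this lets a single image serve any customer/CDN URL. The placeholder needs to be unique enough that it cannot collide with real bundle content.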

@Gábor Boros We ran into a Docker rate limit issue, and after we discussed it with @Régis Behmo, he created a PR to moby (Docker) and got the issue fixed upstream. If anyone else is interested in using a registry mirror / pull-through cache to work around Docker rate limit issues, check it out. We (OpenCraft) encountered this problem when building all the images for PR sandboxes; e.g. building 40 sandboxes requires something like 400 image pulls.
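For anyone unfamiliar with registry mirrors, here is a rough sketch of how one is typically wired into the Docker daemon. It builds the `registry-mirrors` entry for `daemon.json` (usually `/etc/docker/daemon.json` on Linux). The mirror URL is a made-up example, and this generic daemon setting is not necessarily the code path the upstream PR touched.

```python
import json


def with_registry_mirror(daemon_config, mirror_url):
    """Return a copy of a daemon.json dict with mirror_url added to registry-mirrors."""
    updated = dict(daemon_config)
    mirrors = list(updated.get("registry-mirrors", []))
    if mirror_url not in mirrors:  # keep the operation idempotent
        mirrors.append(mirror_url)
    updated["registry-mirrors"] = mirrors
    return updated


# Hypothetical pull-through cache address; replace with a real mirror.
config = with_registry_mirror({}, "https://mirror.example.com")
print(json.dumps(config, indent=2))
```

The printed JSON would be merged into the daemon configuration, and the daemon restarted, so that Docker Hub pulls go through the cache instead of counting against the rate limit each time.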

@Felipe Montoya One thing that could be a blocker for us is the SSL termination layer. Sometimes we want to have that outside the cluster, in the ingress controller / load balancer itself, not inside the cluster on cert-manager. @Moisés González Another thing is that we don’t yet have a lot of Palm-version deployments that can take advantage of things like shared elasticsearch.

@Maksim Sokolskiy We are tracking an issue with shared elasticsearch on Palm/Quince/master even now: https://github.com/openedx/openedx-k8s-harmony/pull/31#issuecomment-1616058880 . No updates related to Harmony at Raccoon Gang yet. The ClickHouse Helm Chart project is on hold.

Harmony project updates: Review list of PRs and issues, and assign anything un-assigned.

No changes on most of the issues/PRs this sprint.

Open discussion/questions, if any.

Discussion of the py2neo issue: https://discuss.openedx.org/t/missing-py2neo-package-causing-build-issues-in-all-edx-platform-releases/11371/10

Braden: Frontend Pluggability Summit is next week. @Felipe Montoya I’ve asked Maria to look at it, thinking about what we learned from Hooks & Filters in particular.