Large Instance Meeting Notes 18.04.2023
@Felipe Montoya will be leading the meeting, @Braden MacDonald taking notes
Notes:
Reviewing the issues on the DevOps Working Group board
The issue to update the README for Harmony is now done.
Autoscaling PR: we had two options for how to handle a command (helm dependency update). Jhony has addressed this, so the PR should be ready to go. @Braden MacDonald will take another look today and this should merge shortly.
Jhony is investigating how to allow people to use the helm chart without needing to clone the repo. He will make an issue.
Monitoring with Prometheus: this should be unblocked once the autoscaling PR merges.
The Karpenter issue is also blocked on the autoscaling PR, which will merge soon.
OpenSearch support - we have a draft PR from @Maksim Sokolskiy. He anticipates it will be ready for review in a week or two.
Next steps before we can use this in production: thanks to everyone who commented with their points of view and lists. We’re still missing monitoring, a published helm chart, and a release process.
eduNEXT will be testing this on a prod-like sandbox environment soon.
Jhony: there is a helm chart we can use that provides all the monitoring tools. As mentioned, Jhony will create an issue about the release process.
Question about the release process: do we need to open an Axim engineering ticket so we can publish using an official Open edX account? Answer from Jhony: It depends. If we publish on GitHub Pages it won’t be necessary. It would be needed if we wanted to publish on Artifact Hub etc. Jhony will post details on the issue that he’ll open.
Discussion: do we need log collection? Gabor: It’s nice to have. Felipe: maybe it’s better to have log collection at a higher level, outside the scope of this project.
@Felipe Montoya will open an issue about this so we can have a technical discussion about our options.
Tutor plugin: we’ll add it to the index once we’re using it in prod. Jhony: the pod autoscaling plugin is already in the Tutor plugin index.
SSL cert for Elasticsearch:
@Felipe Montoya: I’ve seen so many issues with self-signed certs; they’re basically a planned outage on the date of their expiration. Can we use cert-manager/Let’s Encrypt to get a valid public cert for this instead? We pinged @Moisés González to get his input.
@Maksim Sokolskiy notes that the solution for OpenSearch may be different from the one for Elasticsearch, but we aren’t sure at this point. We’ll track it with the same issue for now, and create separate issues if needed.
Anything else that we aren’t tracking with an issue on the board?
Jhony: The Elasticsearch helm chart required some changes to the platform. Have those changes landed? See "chore: add ELASTIC_SEARCH_INDEX_PREFIX setting to prefix indices" by keithgg (Pull Request #130, openedx/edx-search).
It has merged, but is it in the named release? If not, we need to backport it. @Maksim Sokolskiy will investigate, since he’s part of the BTR working group.
Braden: Is anyone using MySQL 8 or 5.7?
Most are using 5.7.
OpenCraft is using 8 on their new DigitalOcean Grove clusters, but has no prod customers on it yet.
Regis is planning to switch Tutor to MySQL 8 soon.
Felipe: Is anyone deploying databases in the cluster? We tried but found it too slow.
Braden: In my experience it’s too slow. OpenCraft uses managed databases outside of the cluster.
Thomas: We’re still planning to review the README PR; it was merged recently, but we’ll add any issues we find on the PR. Braden: that’s perfect; I merged it after one review to reduce conflicts down the road, since many other PRs change the README, but we still welcome your review.