Large Instances Meeting Notes 2025-04-29

Transcript | Recording

Summary (AI Generated)

The DevOps collaboration meeting for Open edX Kubernetes and large instances, held on April 29, 2025, focused on current challenges and progress in deploying and managing Open edX at scale. Attendees Braden MacDonald, Felipe Montoya, Jhony Avella, and Moisés González discussed the integration and effectiveness of AI tools like Copilot and ChatGPT in their workflow, acknowledging the significant changes AI brings to software development practices and decision-making.

Jhony Avella highlighted efforts around AWS infrastructure automation, noting that his team has improved the controller resources for Tutor instances but still needs to do extensive parallel testing. His team also addressed slow initialization of Open edX installations, which was caused by insufficient resource allocation. Initialization now takes about 30-35 minutes, and the goal is to bring it down to 15-20 minutes, mainly by speeding up database migrations, which currently account for around half of the initialization time.

The team discussed potential optimizations, including Felipe Montoya's proposal for pre-migrated database templates and Braden MacDonald's suggestion of using Django's migration squashing feature (squashmigrations). Either approach could significantly speed up deployments by reducing the number of migrations that have to run during initialization.
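As a rough back-of-the-envelope check on the numbers above, the following sketch estimates total initialization time if only the migration step gets faster. It assumes migrations take exactly half of the run (the notes say "around half") and that the rest of the initialization is unchanged; the function name and the speedup factors are hypothetical and were not discussed in the meeting.

```python
def projected_init_minutes(total_minutes: float,
                           migration_share: float,
                           migration_speedup: float) -> float:
    """Estimate total init time when only the migration step is sped up.

    total_minutes:     current end-to-end initialization time
    migration_share:   fraction of that time spent in migrations (0..1)
    migration_speedup: factor by which migrations get faster (2 = twice as fast)
    """
    if not 0.0 <= migration_share <= 1.0:
        raise ValueError("migration_share must be between 0 and 1")
    if migration_speedup <= 0:
        raise ValueError("migration_speedup must be positive")
    migration_minutes = total_minutes * migration_share
    other_minutes = total_minutes - migration_minutes
    return other_minutes + migration_minutes / migration_speedup


if __name__ == "__main__":
    # 30 and 35 minutes are the current range reported in the notes;
    # the 0.5 share and the speedup factors are illustrative assumptions.
    for total in (30, 35):
        for speedup in (2, 4, 10):
            estimate = projected_init_minutes(total, 0.5, speedup)
            print(f"{total} min now, migrations {speedup}x faster: "
                  f"about {estimate:.1f} min")
```

Under these assumptions, even removing the migration step entirely would leave about 15 to 17.5 minutes of other work, so reaching the low end of the 15-20 minute target would likely also require savings outside of migrations.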

They also reviewed ongoing work on migrating to the new forum application and noted minor issues encountered during the process, stressing the importance of clear documentation and tickets for tracking them. Felipe raised a longstanding bug in the micro-frontend (MFE) handling of advanced settings and noted the urgency of integrating a pending fix.

Cost efficiency in Kubernetes clusters was another major topic. Braden described how OpenCraft is reviewing its Kubernetes expenses and looking to reduce them by shutting down unused staging clusters and, where possible, running staging environments inside production clusters. Similarly, Felipe shared his team's strategy of closely monitoring AWS bills to identify and eliminate unnecessary infrastructure costs.

Finally, the group touched upon new community adoptions of the Grove and Harmony tools for Kubernetes deployments, notably by Clemson University. They emphasized the critical need for reliable upgrade pathways in deployment tools and reflected on past challenges with maintaining instances post-deployment, underscoring the importance of continuous maintenance capabilities.