Arch Hours: 2023

Meeting Expectations

Why?

  • Provide an opportunity for generative discussion and ideas.

  • Foster camaraderie through technical curiosity and geekdom.

Who?

  • Open to all edX-ers and Arbisoft-ers

What?

  • At times, these informal discussions result in follow-up action and beneficial change in our technology or in our organization. While this is not a decision-making body, these serendipitous discussions spark ideas that may result in ADRs/OEPs and tickets on team backlogs.

  • At times, it serves as a form of informal office hours to ask live technical questions of the archeological collective.

  • At times, we have pre-planned deep-dive topics that folks propose to gather wide input or to answer questions.

  • At times, we have hosted special guests (internal and external to edX) on specialized topics.

When?

  • Not lunch hour in ET timezone: With Covid remote work, "Arch Lunch" has evolved into “Arch Hour” in order to accommodate various home/life situations during lunch time.

How? Live Co-Editing

To circumvent Confluence’s limitations with the maximum number of concurrent editors:

Why not just stick with keeping the notes in the Google doc?

  • Google docs are not as discoverable.

  • Google docs don’t notify observers of future edits.

  • Google doc comments don’t notify all observers.

How? Structure

Please enter your proposed topics for discussion.
When we use Lean Coffee Style (link1, link2), we vote on which topics the group wants to discuss and time-box the discussion to 10 or 15 minutes → 5 minutes (if re-voted) → 5 minutes (if re-voted).

Prefix your topic with your intention so we are clear on what outcome you are seeking from the discussion. Examples:

  • [inform] You are simply seeking to inform the group of this item. You may field clarifying questions from the group on your inform, but you are not seeking further discussion at this time.

  • [ideation] You are seeking divergent and wide perspectives from this group. In this brainstorming mode, all ideas are accepted, without critical analysis.

    • It may be helpful to clarify whether you’d like to ideate on the problem space or the solution space.

  • [analysis] You are asking the group to help you poke holes in your idea/topic/plan/etc.

  • [quest] You are seeking information/responses to a question you have.

2023-12-20

2023-12-13

  • [quest] (Dave): How’s the MySQL 8.0 switchover going?

    • Scheduled for 2am tonight

  • [quest] (Dave): What’s a good way to roll out database connection encoding changes to Studio for http://edx.org ? (configuration repo help)

    • Dave to make issue in configuration repo (?) to track this.

    • This may be overridable in edx-internal (which would allow for faster rollback)

    • Jeremy will see if we want to turn on Issues there

  • High-level, external-safe discussion of recent 2U staff meetings

  • [ideation] (Jeremy) 2U executive management has been talking a lot about the “Open edX maintenance burden”.  Where are teams feeling this, and how much time is it taking?  What parts of it would we actually be comfortable handing over to other organizations to handle?

    • Fixing bugs?

    • Merging dependency upgrade PRs?

    • Big framework upgrades?

    • Roadmap decisions?

    • Reviewing changes from outside the core owning/maintaining team?

    • Deprecating stuff that’s no longer useful?

    • Building extension points so optional features can be added without being added for everyone?

  • [quest] (Jeremy) Are there more things like the Insights stack that are effectively only used by 2U, and should be deprecated as far as Open edX is concerned?

    • And how should we proactively identify things like this moving forward?

    • (Andy) About a year ago, there was at least one other Insights user

  • [quest] How deep are architecture vendor commitments?  What failover features are there?

2023-12-06

  • [Ned] What is bad about installing dependencies from GitHub?

  • [Jeremy] What would people want to see from an Open edX maintenance working group?

    • Expertise about how to use Dependabot and Renovate

    • “Error budget” for teams is useful

    • Test suites that are comprehensive enough and fast enough to run

    • Standard way to avoid trying to upgrade to a known-broken release again somewhere else

    • Clear path from identification of unwanted dependencies to deprecation of them

    • Path to consistently applying new good code patterns across our codebase

  • [Jeremy] What are leading causes of delayed/lost PR reviews on your teams?

    • [Andy/Cosmonauts] Big, intimidating PRs and lack of context after an ownership transfer

    • [Chris] Variety of different reasons

    • [Hilary] PRs that span ownership boundaries

    • [Andy] High/unclear level of responsibility from approving a PR

2023-11-30

  • [inform] (Dave) Submit Open edX conference talks!

  • [AS][Wild Speculation] Are we in a startup runway situation now?

    • Should we consider more extreme short-term measures than we normally would, to minimize operating costs in the meantime?

    • Axim is coordinating a pile of funded contribution (FC) projects to make various improvements already; not a lot of bandwidth for new feature requests right now.

    • Spent a lot of time discussing this, hard to distill it into key points

2023-11-15

  • [inform] (Jeremy) courseware_studentmodule Table Refactoring  

  • [musings/questions] (Ned) Mapping people/squads/repos

  • [Question] (Hilary) If we’re trying to design a new DB for a model, who is the best person/people to talk to?

    • Solution Review

    • Dave Ormsbee at Axim

    • Many principal/staff engineers

    • No real DBAs, but…

    • At least for a while, we have part-time contract DBA via Percona

    • Also, consult Everything About Database Migrations  

  • [inform] (Jeremy) Socket.dev - tool for tracking dependency health 

  • [quest] (Jeremy) Awareness of Django’s transaction.on_commit() and similar tools which are incredibly useful in the right circumstances.  How do we make sure developers are aware of / informed of these when appropriate?

    • New “everything about concurrency” guide?

      • Celery

      • Transactions

      • Event bus

      • Django async

  • [inform] (Alex) Enterprise likes drf-spectacular

2023-11-08

  • [inform] (Dave) New Relic is a great resource!

  • [quest] (Dave) There are multiple places where it would be beneficial to add indexes in large tables (~10-100M+ rows?). I don’t think they’ll lock anything, but would the length of time it takes to run the migration be disruptive to the release process? What’s a good path forward so we don’t take edx.org by surprise?

    • Please hold off until MySQL 8.0 update has completed.

    • Long term: Need to do something about partitioning CSM.

  • [ideation] (Dave) Serving assets with Caddy (and maybe nginx?) via X-Accel-Redirect.

    • django-sendfile2  

      • Dave: FWIW, I wasn’t planning to use this. I think folks have standardized on the header since this was written, it doesn’t seem like it’s completely up to date, and I’m planning to do a lot of header tweaking anyhow.

    • Can make the container run Caddy and have that work in devstack as well.

    • Actual nginx configuration still happens via the configuration repo–can talk to TNL for testing help.

  • [question] (Ned) why are there so many old renovate pull requests open?

    • 86 created before 2023: https://github.com/pulls?q=is%3Apr+is%3Aopen+author%3Aapp%2Frenovate+org%3Aopenedx+created%3A%3C2023-01-01

  • [question] (Jeff) Any special security considerations re browser extensions?

  • [question] (Jeff) AGPL / borrowing Canvas code concerns?

  • [question] (Jeff) How best to query for the presence of SRT captions files for videos?
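The X-Accel-Redirect idea from Dave's [ideation] topic above can be sketched with a small WSGI app: the application only decides whether the request is allowed, then returns an empty response whose `X-Accel-Redirect` header names an internal path, and a reverse proxy that honors the header (nginx does for `internal` locations) sends the file itself. This is a minimal illustration, not the approach Dave was planning; `INTERNAL_PREFIX`, `user_may_view`, `asset_app`, and `call` are hypothetical names, and the permission rule is a placeholder.

```python
from wsgiref.util import setup_testing_defaults

# Hypothetical internal prefix; it would have to match an internal-only
# location configured in the reverse proxy.
INTERNAL_PREFIX = "/protected-assets/"


def user_may_view(environ, name):
    """Hypothetical permission check; a real app would consult its auth system."""
    return environ.get("REMOTE_USER") == "staff"


def asset_app(environ, start_response):
    """Authorize the request, then delegate the file transfer to the proxy."""
    # Use only the last path segment so the client cannot choose a directory.
    name = environ.get("PATH_INFO", "").rsplit("/", 1)[-1]
    if name in ("", ".", "..") or not user_may_view(environ, name):
        start_response("403 Forbidden", [("Content-Type", "text/plain")])
        return [b"forbidden"]
    # Empty body: the proxy replaces it with the file at the internal path.
    start_response("200 OK", [("X-Accel-Redirect", INTERNAL_PREFIX + name)])
    return [b""]


def call(app, path, user=None):
    """Invoke the WSGI app directly and capture status, headers, and body."""
    environ = {}
    setup_testing_defaults(environ)
    environ["PATH_INFO"] = path
    if user:
        environ["REMOTE_USER"] = user
    captured = {}

    def start_response(status, headers):
        captured["status"] = status
        captured["headers"] = dict(headers)

    body = b"".join(app(environ, start_response))
    return captured["status"], captured["headers"], body


status, headers, body = call(asset_app, "/assets/video.srt", user="staff")
print(status, headers)
```

Keeping the authorization decision in Python while the proxy streams the bytes is the main appeal of this pattern: the app worker is freed as soon as the headers are sent.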

2023-11-01

  • [inform] (Ned) GitHub pull request labels can make Jira issues

    • Get in touch with Ned if you want to enable this for specific repo/project pairs

    • TNL & Aperture are testing it out

  • [inform] (Andy) similar to what is going on with edx-search, what’s going on with enrollments https://2u-internal.atlassian.net/wiki/spaces/microb/pages/620036097/Enrollment+Notes

    • Just edit it if it’s wrong 🙂

  • [quest] (Ned) Where are we on cypress testing?

    • Have functional (although sparse) e2e smoke test suite; wasn’t too much easier to set up than bok-choy, but does seem to be easier to maintain

    • There have been mentions of expanding the smoke tests and/or using it for a11y CI checks, but no concrete plans yet

    • Requires JS experience and a review of the “lessons learned” docs we have so far

  • [quest] (Jeremy) Do people think Conventional Comments are a good idea?

  • [ideation] (Jeremy) Mitigating pressure for 2U & Open edX code to diverge

    • 2U concerns

      • Security of releases

        • Including regulatory compliance

      • Ability to deploy changes quickly

      • Extra deprecation overhead (relatively minor point)

    • Axim/Open edX concerns

      • Codebase clean of 2U-specific cruft

    • Shared worries

      • Time needed to reconcile divergent branches

      • Risk of permanent divergence resulting from logistics rather than intent/benefit

    • Ideas

      • Try to make sure that core committer merge process satisfies core regulatory requirements: require at least one non-author approver, have at least a PR (if not a separate issue) to track the rationale for each change, etc. (this may also benefit companies other than 2U in satisfying regulatory needs)

      • If necessary, use fast-forward-only release branches for edx.org that lag main/master just enough to perform any mandatory review for commits from non-2U-employees/contractors (but do we really need this, given that we don’t do it for other dependencies?)

      • Document and expand the range of options for quickly getting changes to production via config changes, extension points, temporary forks, etc. that don’t require putting 2U-specific code in Open edX

2023-10-25

Skipped so 2U developers could stay focused on Innovation Week (hackathon).

2023-10-18

  • [inform] (Ned) Innovation Week!!1!

  • [quest] (Jeremy) Does anyone have a security/vulnerability scanner they would recommend?  Dependabot can’t track both main/master and a release branch.

  • [informish] (Andy) edx-search research (also 2U-specific); maybe we should do more of this. Looking at enrollments now.

  • [Question] CourseOverview has an org string field, not an FK to Organization. Any insight into the architectural reasons for this?

    • CourseOverview predates the organization table, and is essentially a cache of data in MongoDB

2023-10-11

2023-10-04

  • [quest] (Dave) How is the MySQL 8 upgrade coming along?

    • (Jeremy) Largely done except for prod LMS, which is rerunning a days-long mitigation schema change

    • (Jeremy) We just switched devstack over

    • (Jeremy) Django 4.2 LMS upgrade involves password hash change, Arbi-BOM discussing options with BTR

  • [quest] (Dave) Is 2U using OrbStack now?

    • (Jeremy) Several individuals are, still going through vendor review for broader adoption

  • [quest] (Jeremy) Does anybody have thoughts on gaps in Kubernetes for use as a dev environment?

    • (John) Test data is needed regardless of which dev environment we go with

2023-09-27

  • [inform] (Ned) 2U has signed the Axim non-technical contribution agreement.

  • [inform] (Ned) Next 2U hackathon (“Innovation Week”) is Oct 23-25

  • [inform] (Jeremy) 2U vendor review triggers & consequences

    • Install things in a virtual machine with no VPN access to test them?

  • [quest] (Jeremy) What technology/tool do you think would best help you get good work done, but we just aren’t set up to use it yet?

    • (Matt) AI coding assistants?

      • (Jeremy) There are interesting and unanswered questions around inadvertent copyright infringement and copyrightability of the generated code

        • (Andy) Solvable via custom LLM trained on our own code?

      • (Hilary) Would this exacerbate the problem of not having enough people with deep knowledge of different parts of our systems?

      • (Matt) Maybe this will help us get knowledge bases that actually work

    • (Andy) OrbStack!

    • (Andy) Type hinting?  TypeScript?

    • (Hilary) prop-types in JavaScript?

    • (Matt) Cloudflare AI tools

2023-09-20

  • [Ned] Axim would like to remove admin access to repos (this is different from the write access topic I mentioned last week).  What concerns do people have about that?

    • [John] Is it likely that we’ll end up with a bunch of edx-org forks as a result of this?

      • [Ned] Feels unlikely at this point, lots of extra work

      • [Ned] More likely that we start using personal forks more often

    • [Ned] When do people really want admin access, anyway?

      • [Jeremy] Updating branch protection (required CI checks)

      • [Andy] Initial repo setup

  • [David] 2U Internal Marketplace update

    • [Jeff] Will the frontend be using Paragon or something else?

    • We think we’ll still have something like commerce coordinator, but it may have a more limited, appropriately scoped API than originally envisioned

    • [Jeremy] Arbi-BOM noted that we’ve been wanting to get rid of django-oscar & ecommerce for 2 years, now needing to upgrade them again for another Django upgrade

  • [quest](Hilary) - api access

2023-09-13

  • [inform] (Hilary) OEP-66 PR up for review

    • Includes the relevant bits of OEP-9, drops or updates others

    • Bridgekeeper vs. rules

  • [quest] (Jeremy) Has anyone worked with Kompose or any other tools for 1) migrating from docker-compose to k8s or 2) allowing deployment of custom subsets of related microservices in k8s?

    • Not in this crowd

  • [question] (Mat) Infinity is looking to understand if Notifications belongs in edx or openedx. There isn’t a clear understanding of where this feature should eventually land.

  • [question] (Dave): The cut-off for Quince is coming soon (October 9th). Will we be on Django 4.2 for edx-platform? MySQL 8.0?

    • Devstack should be switching over to MySQL 8 within a week

    • Code is largely ready for Django 4.2, last PRs are being finalized and merged

    • Trying to get edx.org on MySQL 8 before updating the default requirements

    • If we miss end of September for that, we’ll try to make Django 4.2 the default with alternate 3.2-using requirements files for edx.org to use (annoying to implement in edx.org deployment pipelines)

    • BTW: https://openedx.atlassian.net/wiki/spaces/COMM/pages/3613392957  

  • [inform] (Ned) Long-term we are going to be reducing 2U access to all repos, and give teams write access to the repos they need.

  • [request] (Ned) Please educate new devs about the public aspects of much of our code.

    • Branch naming

    • Avoid links to private data

2023-09-06

  • [quest] (Robert) It seems the LMS auth_user table doesn’t have a modified date.

    • Context: https://twou.slack.com/archives/C04ACDVM6A1/p1694010936917439?thread_ts=1694010901.764199&cid=C04ACDVM6A1  

    • Is there a way to determine this?

    • If not, what’s the best way to resolve this at our scale (at least going forward)?

    • Django offers support for custom user tables, but edx-platform predates the feature, and it would be a fairly painful switch at this point

    • We send tracking logs on most field changes, but not all

      • first_name and last_name are ignored, for example

    • How about django-simple-history?

      • Would give us a full audit record moving forward, but likely to be massive (and probably overkill)

    • How do post_save and post_commit interact?

  • [quest] (Deborah) Is there a documented general principle about what log level to use?

    • https://docs.python.org/3/library/logging.html#levels

    • Different teams have logs in Splunk, New Relic, and/or DataDog

    • If we did a major change of log formatting, we’d probably want to talk to the community first

    • We should probably have some kind of OEP/ADR for Python logging in general

  • [inform/quest] (Robert) How can we quickly determine the actual impact of a category of error?

    • Number of users impacted, scope of impact for each user, etc.

    • Starting to discuss with Data Engineering

    • https://onenr.io/0kjnpPZ56wo

      • Can this be tied to course completion?

    • May be a good case study for engineering noticing a problem and trying to make a call on how much work to put into getting resources allocated to fixing it
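For Deborah's log level question above, the linked Python documentation defines the ordering of the standard levels. The sketch below shows how a logger's threshold filters records; it is only an illustration of standard-library behavior, not a proposed edX convention. The logger name `example.levels` and the messages are hypothetical.

```python
import io
import logging

# Hypothetical logger name; not an edx-platform convention.
logger = logging.getLogger("example.levels")
logger.setLevel(logging.WARNING)  # records below WARNING are dropped
logger.propagate = False          # keep the demo isolated from root handlers

# Capture output in memory so the result can be inspected.
buffer = io.StringIO()
handler = logging.StreamHandler(buffer)
handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))
logger.addHandler(handler)

logger.debug("cache miss for key %s", "abc")  # filtered out
logger.info("request handled")                # filtered out
logger.warning("retrying after timeout")      # emitted
logger.error("could not save record")         # emitted

emitted = buffer.getvalue().splitlines()
print(emitted)
```

Only the WARNING and ERROR lines reach the handler, which is why the choice of level matters for what ends up in Splunk, New Relic, or DataDog.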

2023-08-30

  • [inform] (Jeremy) We are seeking good topics for the edX engineering blog and/or volunteers to write about them

  • [quest] (Jeremy) Should we consider piloting structlog somewhere?

  • [quest] (Phil) Does anyone know anything about Next.js?

    • Phil:

      • I know relatively little about Next.js; any advice? Have we used it at edX before? Does it work well with our current stack? Are there any good learning resources out there?

      • Looking at: django-nextjs, a Django app plugin to enable a Django IDA to serve Next.js server-side code [blog post]

      • Apparently https://pypi.org/project/django-nextjs/ exists to allow Django templates and Next.js templates to co-exist

      • (Jeremy) FED-BOM did a little discovery on this and related tooling in https://github.com/openedx/wg-frontend/issues/126 (we decided at the time to punt until some of the alternatives mature a bit)

    • Jeremy: Related issue: https://github.com/openedx/wg-frontend/issues/126

    • This is more of a question for Frontend WG/Paragon WG.

    • On the open-source side, OEP-11 is the guiding document that may need amendment for Next.js.

  • [quest] (Jeremy) In light of the Deciphering Glyph post “Get Your Mac Python From Python.org”, should we make any updates to recommendations for installing Python on macOS?

  • [quest] (Dave) Any thoughts on porting forums service to a Django app? (I know there’s Infinity discovery around this, but I’m curious if others had also looked at this problem, or if there were other discussions not captured in those docs.) Context is that Axim is considering funding something here.

    • We’d love to see this get done

    • Diana: concern about data migration

      • Dave: Phase 1 would keep the data in place and start moving towards removing the Ruby aspect of things. Any potential data migration would happen after that.

    • Would alleviate (admittedly modest) security concerns around Sinatra and dependency gems

    • Would greatly reduce barriers to making forums enhancements

  • [quest] (Jeff) Mathjax version 3 OSPR received; could we roll this out course-by-course?

    • Also, native browser rendering or MathCAT may be preferable in some cases

    • (Dave) Maybe go to the Product Working Group to plan out the per-course rollout capability?

  • [quest] (Jeremy) Dev environment direction

    • (Hilary) Think there will still be a need for local long term

      • Intermittent internet connections, for example

    • (Andy) There are some jury-rigged AI configurations that are easier to set up locally

      • But default to remote, fall back to local may be where we want to end up

    • (aside) Open edX is featured on OrbStack’s website! https://docs.orbstack.dev/benchmarks  

  • [quest] (Hilary) How do people set up new large Open edX sites?

    • Mostly custom Terraform and such, sometimes based off of an AWS solution template

    • Although Harmony is a new Kubernetes-based solution for this

2023-08-23

  • [analysis] (Jeremy) Docker Desktop replacement

  • [quest] (Ned) Why are we OK with a 2-hour deployment pipeline?

    • https://en.wikipedia.org/wiki/Boiling_frog

    • (Andy) It’s worse than that, it’s a 2 hour nondeterministic pipeline

    • (Phil) From GoCD edxapp statistics: 45+20+1+1+15+(2+3+7)=94 minutes

      • The 45 minutes (about half the duration) is building the AMI

      • Each number is the average duration of a pipeline step.

      • Numbers in parentheses are steps that happen after the build is available on prod.

    • We don’t have a good basis of comparing even between different pipelines within 2U

    • (Jeremy) We don’t want to continue using GoCD in the long term, which leads to debate on the value of optimizing it (vs. doing work to switch to Argo CD instead)

    • (Jeremy) There are several parallel efforts to reduce edx-platform build time, but the time to value delivery is long doing it that way.  Maybe we should concentrate our efforts a little better for incremental value delivery?

  • [quest] (Alex D.) Any patterns that folks like for data replication between services?

  • [quest] (Ned) What kinds of informal education are useful for developers?

    • High-level block diagram (context/container from c4)

    • Architectural onboarding has fallen by the wayside

    • How code is organized (mono-repo and otherwise)

    • Migrations, what they are and how they go wrong

    • Celery

    • Tour of a new IDA Makefile

    • What counts as “core”

  • [quest] (Adam) How do we get better at either smoke tests or health checks so that we can make big changes to infrastructure more confidently and detect things before we ship bugs to prod?
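As a quick check of the GoCD pipeline arithmetic Phil quoted in the deployment pipeline topic above, the sketch below recomputes the total, the time until the build is available on prod, and the AMI build's share. The step durations come straight from the notes; the variable names are made up for this example, and treating the parenthesized numbers as a separate "after prod" group follows Phil's own annotation.

```python
# Average step durations in minutes, as quoted in the notes.
ami_build = 45
other_steps_before_prod = [20, 1, 1, 15]
steps_after_prod = [2, 3, 7]  # the parenthesized numbers

time_to_prod = ami_build + sum(other_steps_before_prod)
total = time_to_prod + sum(steps_after_prod)
ami_share = ami_build / total

print(f"total pipeline: {total} min")
print(f"time until build is on prod: {time_to_prod} min")
print(f"AMI build share of total: {ami_share:.0%}")
```

The total matches the 94 minutes in the notes, with 82 of those minutes elapsing before the build is on prod and the AMI build accounting for roughly 48% of the whole run.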

2023-08-16

  • [ideation] (Jeremy) What (if anything) should we do for next-generation automated a11y testing?

    • The QA team is interested in working on this if we want to put effort into it

    • (Jeff) We have something to run axe-core for MFEs, unclear how many MFEs are running it and how often

    • (Jeff) There are services that do automated checks for large sites, but they’re ridiculously expensive

    • (Jeff) Have been trying to pick a tool by the end of the year

    • (Jeff) Also, there’s an external audit in the works

    • Jeremy will connect Jeff and the QA team to try to figure out next steps

  • [quest] (Hilary) - Is anyone interested in being a co-author on an OEP about authz best practices?

  • [quest] (Jeff) - Can you think of any problems if no MathML rendering library is included with Open edX or http://edx.org ? (Chrome and Firefox include MathML rendering now, and I wonder if interactivity might be better as a browser extension for some users)

    • There’s also MathCAT

      • Oh hey, it’s written in Rust

    • Browser support has gotten much better in recent years

    • We haven’t yet done a detailed a11y evaluation of the options

    • Worth looking at using the native browser rendering just for the JS download size savings

    • We spend non-trivial $ pushing MathJax bits to everyone in a lot of pages.

  • [question] (Ned) What does a “Cybersecurity review” entail?

    • Came up in the context of being needed for anything being open sourced

    • (Adam) Sometimes involves pentesting or an external security firm review

    • (Hilary) If you don’t have confidence in the answers to the form, you can get help talking through it

      • Brian M and Purva have done an AppSec for Trilogy user account things

  • [question] (Adam) Do operators actually need to upgrade to MySQL 8 by the Q