Arch Hours: 2023
Meeting Expectations
Why?
Provide an opportunity for generative discussion and ideas.
Foster camaraderie through technical curiosity and geekdom.
Who?
Open to all edX-ers and Arbisoft-ers
What?
At times, these informal discussions result in follow-up action and beneficial change in our technology or in our organization. While this is not a decision-making body, these serendipitous discussions spark ideas that may result in ADRs/OEPs and tickets on team backlogs.
At times, it serves as a form of informal office hours to ask live technical questions of the archeological collective.
At times, we have pre-planned deep-dive topics that folks propose to gather wide input or to answer questions.
At times, we have hosted special guests (internal and external to edX) on specialized topics.
When?
Not lunch hour in ET timezone: With Covid remote work, "Arch Lunch" has evolved into “Arch Hour” in order to accommodate various home/life situations during lunch time.
How? Live Co-Editing
To circumvent Confluence’s limitations with the maximum number of concurrent editors:
during the hour together, we capture topics and take notes at https://docs.google.com/document/d/16-IVTGIjfKyMl8F4__Pk8Di4c_Lkx3k0-djIW6H14X0/edit# .
after the hour, we move those notes to this page.
Why not just stick with keeping the notes in the Google doc?
Google docs are not as discoverable.
Google docs don’t notify observers of future edits.
Google doc comments don’t notify all observers.
How? Structure
Please enter your proposed topics for discussion.
When we use Lean Coffee Style (link1, link2), we vote on which topics the group wants to discuss and time-box the discussion to 10 or 15 minutes → 5 minutes (if re-voted) → 5 minutes (if re-voted).
Prefix your topic with your intention so we are clear on what outcome you are striving for from the discussion. Examples:
[inform] You are simply seeking to inform the group of this item. You may field clarifying questions from the group on your inform, but you are not seeking further discussion at this time.
[ideation] You are seeking divergent and wide perspectives from this group. In this brainstorming mode, all ideas are accepted, without critical analysis.
It may be helpful to clarify whether you’d like to ideate on the problem space or the solution space.
[analysis] You are asking the group to help you poke holes in your idea/topic/plan/etc.
[quest] You are seeking information/responses to a question you have.
2023-12-20
(Dave) New Relic -> DataDog status
Related: Application Performance Monitoring
Contract expires around June
edx-platform should have an OpenTelemetry-compatible layer for this so people can plug in their own APM solutions.
[inform] (Kelly) I made a terrible diagram of edx-platform: https://lucid.app/lucidchart/fb870610-f8b4-4b7e-a509-1b871f81c54b/edit?beaconFlowId=8DBD553E85CDC9EE&invitationId=inv_e964873a-34ba-4bea-bc36-2fbe304edf40&page=9J6X4Q5XLMLH#
Related: https://openedx.slack.com/archives/C0497NQCLBT/p1702908527116319 and Architecture diagram updates · Issue #449 · openedx/docs.openedx.org
Within 2U, some of this stuff probably should be owned by Service Experience but isn’t yet
[Ned] Divergence Strategies: https://2u-internal.atlassian.net/wiki/spaces/ENG/pages/730005583/Divergence
Xavier’s doc: https://docs.google.com/document/d/1YyRxBrgIVoxwdcQLTWMyfUdFnxcRaqLJD1kTxBbIUn8/edit
2023-12-13
[quest] (Dave): How’s the MySQL 8.0 switchover going?
Scheduled for 2am tonight
[quest] (Dave): What’s a good way to roll out database connection encoding changes to Studio for http://edx.org ? (configuration repo help)
Dave to make issue in configuration repo (?) to track this.
This may be overridable in edx-internal (which would allow for faster rollback)
Jeremy will see if we want to turn on Issues there
High-level, external-safe discussion of recent 2U staff meetings
[ideation] (Jeremy) 2U executive management has been talking a lot about the “Open edX maintenance burden”. Where are teams feeling this, and how much time is it taking? What parts of it would we actually be comfortable handing over to other organizations to handle?
Fixing bugs?
Merging dependency upgrade PRs?
Big framework upgrades?
Roadmap decisions?
Reviewing changes from outside the core owning/maintaining team?
Deprecating stuff that’s no longer useful?
Building extension points so optional features can be added without being added for everyone?
[quest] (Jeremy) Are there more things like the Insights stack that are effectively only used by 2U, and should be deprecated as far as Open edX is concerned?
And how should we proactively identify things like this moving forward?
(Andy) As of roughly a year ago, there was at least one other Insights user
[quest] How deep are architecture vendor commitments? What failover features are there?
2023-12-06
[Ned] What is bad about installing dependencies from GitHub?
OEP-18: Python Dependency Management — Open edX Proposals 1.0 documentation (see the end of the “Rationale” section)
[Jeremy] What would people want to see from an Open edX maintenance working group?
Expertise about how to use Dependabot and Renovate
“Error budget” for teams is useful
Test suites that are comprehensive enough and fast enough to run
Standard way to avoid trying to upgrade to a known-broken release again somewhere else
Clear path from identification of unwanted dependencies to deprecation of them
Path to consistently applying new good code patterns across our codebase
[Jeremy] What are leading causes of delayed/lost PR reviews on your teams?
[Andy/Cosmonauts] Big, intimidating PRs and lack of context after an ownership transfer
[Chris] Variety of different reasons
[Hilary] PRs that span ownership boundaries
[Andy] High/unclear level of responsibility from approving a PR
2023-11-30
[inform] (Dave) Submit Open edX conference talks!
[AS][Wild Speculation] Are we in a startup runway situation now?
Should we consider more extreme short-term measures than we normally would, to minimize operating costs in the meantime?
Axim is coordinating a pile of funded contribution (FC) projects to make various improvements already; not a lot of bandwidth for new feature requests right now.
Spent a lot of time discussing this, hard to distill it into key points
2023-11-15
[inform] (Jeremy) courseware_studentmodule Table Refactoring
[musings/questions] (Ned) Mapping people/squads/repos
[Question] (Hilary) If we’re trying to design new db for a model, who is the best person/people to talk to?
Solution Review
Dave Ormsbee at Axim
Many principal/staff engineers
No real DBAs, but…
At least for a while, we have part-time contract DBA via Percona
Also, consult Everything About Database Migrations
[inform] (Jeremy) Socket.dev - tool for tracking dependency health
[quest] (Jeremy) Awareness of Django’s transaction.on_commit() and similar tools which are incredibly useful in the right circumstances. How do we make sure developers are aware of / informed of these when appropriate?
New “everything about concurrency” guide?
Celery
Transactions
Event bus
Django async
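For anyone unfamiliar with the idea behind transaction.on_commit(): the callback is held until the surrounding transaction commits, and is dropped if the transaction rolls back. The sketch below is a plain-Python stand-in that only imitates that behavior; ToyTransaction and the "enrollment email" side effect are hypothetical names for illustration, not Django code or anything from edx-platform.

```python
class ToyTransaction:
    """Toy stand-in for a database transaction (NOT Django's implementation)."""

    def __init__(self):
        self._callbacks = []

    def on_commit(self, func):
        # Queue the side effect instead of running it immediately.
        self._callbacks.append(func)

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        if exc_type is None:
            # "Commit": run the queued callbacks in registration order.
            callbacks, self._callbacks = self._callbacks, []
            for func in callbacks:
                func()
        else:
            # "Rollback": discard the queued callbacks without running them.
            self._callbacks.clear()
        return False  # never swallow the exception


sent = []  # hypothetical record of side effects that actually happened

# Successful transaction: the side effect runs only after the block exits.
with ToyTransaction() as txn:
    txn.on_commit(lambda: sent.append("enrollment email"))
    ran_early = bool(sent)  # still empty while the transaction is open

# Failed transaction: the queued side effect is never run.
try:
    with ToyTransaction() as txn:
        txn.on_commit(lambda: sent.append("should never be sent"))
        raise RuntimeError("simulated failure before commit")
except RuntimeError:
    pass

print(sent)
```

In real Django code the equivalent registration is transaction.on_commit(callback) inside an atomic block; a common use is enqueueing a Celery task only once the row it needs is actually committed.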
[inform] (Alex) Enterprise likes drf-spectacular
Related: Making edX Platform APIs public
Related: API Improvements · Issue #32609 · openedx/edx-platform
Should we have a doc/resource of “we like these patterns, do more of this”?
2023-11-08
[inform] (Dave) New Relic is a great resource!
[quest] (Dave) There are multiple places where it would be beneficial to add indexes in large tables (~10-100M+ rows?). I don’t think they’ll lock anything, but would the length of time it takes to run the migration be disruptive to the release process? What’s a good path forward so we don’t take edx.org by surprise?
Please hold off until MySQL 8.0 update has completed.
Long term: Need to do something about partitioning CSM.
[ideation] (Dave) Serving assets with Caddy (and maybe nginx?) via X-Accel-Redirect.
Dave: FWIW, I wasn’t planning to use this. I think folks have standardized on the header since this was written, it doesn’t seem like it’s completely up to date, and I’m planning to do a lot of header tweaking anyhow.
Can make the container run Caddy and have that work in devstack as well.
Actual nginx configuration still happens via the configuration repo–can talk to TNL for testing help.
[question] (Ned) why are there so many old renovate pull requests open?
86 created before 2023: https://github.com/pulls?q=is%3Apr+is%3Aopen+author%3Aapp%2Frenovate+org%3Aopenedx+created%3A%3C2023-01-01
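To re-run this count with a different cutoff date, the search link can be rebuilt from its qualifiers. This is a small sketch using only the standard library; the helper name github_pr_search_url is made up for illustration, and the qualifiers are the ones in the link above.

```python
from urllib.parse import urlencode

# Search qualifiers taken from the link in the notes.
qualifiers = [
    "is:pr",
    "is:open",
    "author:app/renovate",
    "org:openedx",
    "created:<2023-01-01",
]


def github_pr_search_url(terms):
    """Build a github.com pull-request search URL from search qualifiers."""
    # urlencode percent-encodes ':' '/' '<' and turns spaces into '+'.
    return "https://github.com/pulls?" + urlencode({"q": " ".join(terms)})


url = github_pr_search_url(qualifiers)
print(url)
```

Changing the last qualifier (for example to a later date) gives the equivalent link for another cutoff.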
[question] (Jeff) Any special security considerations re browser extensions?
[question] (Jeff) AGPL / borrowing Canvas code concerns?
[question] (Jeff) How best to query for the presence of SRT captions files for videos?
2023-11-01
[inform] (Ned) GitHub pull request labels can make Jira issues
Get in touch with Ned if you want to enable this for specific repo/project pairs
TNL & Aperture are testing it out
[inform] (Andy) Similar to the edx-search research, here are notes on what’s going on with enrollments: https://2u-internal.atlassian.net/wiki/spaces/microb/pages/620036097/Enrollment+Notes
Just edit it if it’s wrong 🙂
[quest] (Ned) Where are we on cypress testing?
Have functional (although sparse) e2e smoke test suite; wasn’t too much easier to set up than bok-choy, but does seem to be easier to maintain
There have been mentions of expanding the smoke tests and/or using it for a11y CI checks, but no concrete plans yet
Requires JS experience and a review of the “lessons learned” docs we have so far
[quest] (Jeremy) Do people think Conventional Comments are a good idea?
Some enterprise engineers at 2U are debating it: https://twou.slack.com/archives/C049C2JGH3L/p1698697036085959
Robert does something like this already, a few others do at least sometimes
People seem generally up for trying it
Don’t think we want to enforce it, but may be good to suggest it?
English is hard (risk of stumbling into the recommended formatting while meaning and thinking about something else)
Arch-BOM uses https://2u-internal.atlassian.net/wiki/spaces/AT/pages/16385625/How+We+Announce guidelines for getting the word out about new things
May be good just to see some senior people doing it as an example, to see if it starts a trend
[ideation] (Jeremy) Mitigating pressure for 2U & Open edX code to diverge
2U concerns
Security of releases
Including regulatory compliance
Ability to deploy changes quickly
Extra deprecation overhead (relatively minor point)
Axim/Open edX concerns
Codebase clean of 2U-specific cruft
Shared worries
Time needed to reconcile divergent branches
Risk of permanent divergence resulting from logistics rather than intent/benefit
Ideas
Try to make sure that core committer merge process satisfies core regulatory requirements: require at least one non-author approver, have at least a PR (if not a separate issue) to track the rationale for each change, etc. (this may also benefit companies other than 2U in satisfying regulatory needs)
If necessary, use fast-forward-only release branches for edx.org that lag main/master just enough to perform any mandatory review for commits from non-2U-employees/contractors (but do we really need this, given that we don’t do it for other dependencies?)
Document and expand the range of options for quickly getting changes to production via config changes, extension points, temporary forks, etc. that don’t require putting 2U-specific code in Open edX
2023-10-25
Skipped so 2U developers could stay focused on Innovation Week (hackathon).
2023-10-18
[inform] (Ned) Innovation Week!!1!
[quest] (Jeremy) Does anyone have a security/vulnerability scanner they would recommend? Dependabot can’t track both main/master and a release branch.
[informish] (Andy) edx-search research (also 2U-specific); maybe we should do more of this. Looking at enrollments now.
[Question] CourseOverview has an org string field, not a fk to Organization - Any insight into the architectural reasons for this?
CourseOverview predates the organization table, and is essentially a cache of data in MongoDB
2023-10-11
[quest] (Ned) Do people understand that Core Contributors are allowed to merge without 2U approval?
It seems at least some people didn’t realize this
Maybe people like Virginia who are new to 2U and/or Open edX aren’t aware of this either?
https://openedx.atlassian.net/wiki/spaces/COMM/pages/3334635570
https://openedx.atlassian.net/wiki/spaces/COMM/pages/3156344833
[inform] (Jeremy) Maintenance Working Group launch preparation
Let Jeremy know if you’re interested in joining
[quest] (Jeff Witt) A11y in CI – opinions on how to implement?
WCAG 2.2 was released last week
A few a11y issues have made it into production that shouldn’t have
We used to have a minimal suite of automated a11y tests, but it was very hard to maintain and rarely caught problems, so we got rid of it
Tools have improved over time, we’re about due to pick one to employ for CI
We were using axe-core which is still popular, but may be going closed-source soon; there are other options now also
Jeremy’s notes on tools, etc. from earlier discussions around this:
https://codeburst.io/automated-accessibility-testing-tool-a11y-pa11y-jest-storybook-2ad294bfe71a
https://dev.to/willkre/3-ways-to-automate-accessibility-testing-a11y-19kc
https://dev.to/steady5063/react-testing-library-accessibility-4fom
https://www.digitala11y.com/open-source-accessibility-tools/
https://medium.com/john-lewis-software-engineering/automating-a11y-testing-part-1-axe-ed3d215de126
https://storybook.js.org/docs/react/writing-tests/accessibility-testing
Shifting left on catching a11y issues really reduces the cost of compliance
We aren’t doing the annual a11y training anymore, but we plan to roll it out again for at least the most relevant personnel
[quest] (Jeremy) Tech stack consolidation - what do you think we should try to get rid of, and in favor of what?
https://backstage.techdev.2u.com/ has a comparison of the 2U vertical tech stacks & processes as of shortly after the edX acquisition (2U-private, it’s on the main page after authenticating via GitHub)
[quest] (Jeff) If I set up a test user in a production course, does that muck with financial reporting?
(Matt) Enterprise has some test user capabilities set up for testing integrations and such
2023-10-04
[quest] (Dave) How is the MySQL 8 upgrade coming along?
(Jeremy) Largely done except for prod LMS, which is rerunning a days-long mitigation schema change
(Jeremy) We just switched devstack over
(Jeremy) Django 4.2 LMS upgrade involves password hash change, Arbi-BOM discussing options with BTR
[quest] (Dave) Is 2U using OrbStack now?
(Jeremy) Several individuals are, still going through vendor review for broader adoption
[quest] (Jeremy) Does anybody have thoughts on gaps in Kubernetes for use as a dev environment?
(John) Test data is needed regardless of which dev environment we go with
2023-09-27
[inform] (Ned) 2U has signed the Axim non-technical contribution agreement.
[inform] (Ned) Next 2U hackathon (“Innovation Week”) is Oct 23-25
[inform] (Jeremy) 2U vendor review triggers & consequences
Install things in a virtual machine with no VPN access to test them?
[quest] (Jeremy) What technology/tool do you think would best help you get good work done, but we just aren’t set up to use it yet?
(Matt) AI coding assistants?
(Jeremy) There are interesting and unanswered questions around inadvertent copyright infringement and copyrightability of the generated code
(Andy) Solvable via custom LLM trained on our own code?
(Hilary) Would this exacerbate the problem of not having enough people with deep knowledge of different parts of our systems?
(Matt) Maybe this will help us get knowledge bases that actually work
(Andy) OrbStack!
(Andy) Type hinting? TypeScript?
(Hilary) prop-types in JavaScript?
(Matt) Cloudflare AI tools
2023-09-20
[Ned] Axim would like to remove admin access to repos (this is different from the write access topic I mentioned last week). What concerns do people have about that?
[John] Is it likely that we’ll end up with a bunch of edx-org forks as a result of this?
[Ned] Feels unlikely at this point, lots of extra work
[Ned] More likely that we start using personal forks more often
[Ned] When do people really want admin access, anyway?
[Jeremy] Updating branch protection (required CI checks)
[Andy] Initial repo setup
[David] 2U Internal Marketplace update
[Jeff] Will the frontend be using Paragon or something else?
We think we’ll still have something like commerce coordinator, but it may have a more limited, appropriately scoped API than originally envisioned
[Jeremy] Arbi-BOM noted that we’ve been wanting to get rid of django-oscar & ecommerce for 2 years, now needing to upgrade them again for another Django upgrade
[quest] (Hilary) API access
[Robert]
Use observability to track down or learn about usage: https://github.com/openedx/edx-django-utils/blob/master/edx_django_utils/monitoring/docs/how_tos/using_custom_attributes.rst
Add linting. :) Use lint-amnesty.
Consider moving cross-service role sharing from JWT to another mechanism (e.g. events + data duplication, other?)
2023-09-13
[inform] (Hilary) OEP-66 PR up for review
Includes the relevant bits of OEP-9, drops or updates others
Bridgekeeper vs. rules
[quest] (Jeremy) Has anyone worked with Kompose or any other tools for 1) migrating from docker-compose to k8s or 2) allowing deployment of custom subsets of related microservices in k8s?
Not in this crowd
[question] (Mat) Infinity is looking to understand if Notifications belongs in edx or openedx. It hasn’t been clear where this feature should eventually land.
[question] (Dave): The cut-off for Quince is coming soon (October 9th). Will we be on Django 4.2 for edx-platform? MySQL 8.0?
Devstack should be switching over to MySQL 8 within a week
Code is largely ready for Django 4.2, last PRs are being finalized and merged
Trying to get edx.org on MySQL 8 before updating the default requirements
If we miss end of September for that, we’ll try to make Django 4.2 the default with alternate 3.2-using requirements files for edx.org to use (annoying to implement in edx.org deployment pipelines)
BTW: https://openedx.atlassian.net/wiki/spaces/COMM/pages/3613392957
[inform] (Ned) Long-term we are going to be reducing 2U access to all repos, and give teams write access to the repos they need.
[request] (Ned) Please educate new devs about the public aspects of much of our code.
Branch naming
Avoid links to private data
2023-09-06
[quest] (Robert) It seems the LMS auth_user table doesn’t have a modified date.
Is there a way to determine this?
If not, what’s the best way to resolve this at our scale (at least going forward)?
Django offers support for custom user tables, but edx-platform predates the feature, and it would be a fairly painful switch at this point
We send tracking logs on most field changes, but not all
first_name and last_name are ignored, for example
How about django-simple-history?
Would give us a full audit record moving forward, but likely to be massive (and probably overkill)
How do post_save and post_commit interact?
[quest] (Deborah) Is there a documented general principle about what log level to use?
Different teams have logs in Splunk, New Relic, and/or DataDog
If we did a major change of log formatting, we’d probably want to talk to the community first
We should probably have some kind of OEP/ADR for Python logging in general
[inform/quest] (Robert) How can we quickly determine the actual impact of a category of error?
Number of users impacted, scope of impact for each user, etc.
Starting to discuss with Data Engineering
Can this be tied to course completion?
May be a good case study for engineering noticing a problem and trying to make a call on how much work to put into getting resources allocated to fixing it
2023-08-30
[inform] (Jeremy) We are seeking good topics for the edX engineering blog and/or volunteers to write about them
[quest] (Jeremy) Should we consider piloting structlog somewhere?
New Relic now supports it: Python agent v8.11.0 | New Relic Documentation
Seems to avoid the need for some of the black-magic parsing we use Splunk for
The pretty console output could be a nice DevEx enhancement in dev environments
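To illustrate why structured log lines reduce the need for after-the-fact parsing, here is a rough sketch using only the standard library logging module. It is not structlog and does not use structlog's API; the JsonFormatter class, the "fields" convention, and the example field values are all assumptions made up for this illustration.

```python
import io
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object (a stdlib stand-in, not structlog)."""

    def format(self, record):
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "event": record.getMessage(),
        }
        # Hypothetical convention: callers pass key/value context
        # via extra={"fields": {...}}.
        payload.update(getattr(record, "fields", {}))
        return json.dumps(payload, sort_keys=True)


# Log to an in-memory stream so the output can be inspected below.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(JsonFormatter())

log = logging.getLogger("structured_logging_demo")
log.setLevel(logging.INFO)
log.propagate = False
log.addHandler(handler)

log.info(
    "enrollment created",
    extra={"fields": {"course_id": "course-v1:demo", "user_id": 42}},
)

line = stream.getvalue().strip()
parsed = json.loads(line)  # no regex needed to recover the fields
print(line)
```

Because every line is already key/value data, a log backend can index fields such as user_id directly instead of extracting them from free text.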
[quest] (Phil) Does anyone know anything about Next.js?
Phil:
I know relatively little about Next.js; any advice? Have we used it at edX before? Does it work well with our current stack? Are there any good learning resources out there?
Looking at: django-nextjs, a Django app plugin to enable a Django IDA to serve Next.js server-side code [blog post]
Apparently https://pypi.org/project/django-nextjs/ exists to allow Django templates and Next.js templates to co-exist
(Jeremy) FED-BOM did a little discovery on this and related tooling in https://github.com/openedx/wg-frontend/issues/126 (we decided at the time to punt until some of the alternatives mature a bit)
Jeremy: Related issue: https://github.com/openedx/wg-frontend/issues/126
This is more of a question for Frontend WG/Paragon WG.
On the open-source side, OEP-11 is the guiding document that may need amendment for Next.js.
[quest] (Jeremy) In light of Deciphering Glyph :: Get Your Mac Python From Python.org , should we make any updates to recommendations for installing Python on macOS?
(flame): pyenv!
Current documentation:
Summary: no real need to change anything here, either pyenv or official installers should work for most people at 2U / working on Open edX
[quest] (Dave) Any thoughts on porting forums service to a Django app? (I know there’s Infinity discovery around this, but I’m curious if others had also looked at this problem, or if there were other discussions not captured in those docs.) Context is that Axim is considering funding something here.
We’d love to see this get done
Diana: concern about data migration
Dave: Phase 1 would keep the data in place and start moving towards removing the Ruby aspect of things. Any potential data migration would happen after that.
Would alleviate (admittedly modest) security concerns around Sinatra and dependency gems
Would greatly reduce barriers to making forums enhancements
[quest] (Jeff) Mathjax version 3 OSPR received; could we roll this out course-by-course?
Also, native browser rendering or MathCAT may be preferable in some cases
(Dave) Maybe go to the Product Working Group to plan out the per-course rollout capability?
[quest] (Jeremy) Dev environment direction
(Hilary) Think there will still be a need for local long term
Intermittent internet connections, for example
(Andy) There are some jury-rigged AI configurations that are easier to set up locally
But default to remote, fall back to local may be where we want to end up
(aside) Open edX is featured on OrbStack’s website! https://docs.orbstack.dev/benchmarks
[quest] (Hilary) How do people set up new large Open edX sites?
Mostly custom Terraform and such, sometimes based off of an AWS solution template
Although Harmony is a new Kubernetes-based solution for this
2023-08-23
[analysis] (Jeremy) Docker Desktop replacement
Wiki page with some analysis: https://openedx.atlassian.net/wiki/spaces/AC/pages/3845914644
Arch-BOM ticket for continuing investigation: https://github.com/edx/edx-arch-experiments/issues/93
[quest] (Ned) Why are we OK with a 2-hour deployment pipeline?
(Andy) It’s worse than that, it’s a 2 hour nondeterministic pipeline
(Phil) From GoCD edxapp statistics: 45+20+1+1+15+(2+3+7)=94 minutes
The 45 minutes (roughly half the duration) is building the AMI
Each number is the average duration of a pipeline step.
Numbers in parentheses are for steps that run after the build is available on prod.
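As a quick check of the arithmetic, the sketch below sums the per-step averages Phil quoted and separates the steps that run before the build reaches prod from those that run after. Only the AMI build is named in the notes, so the other step labels are placeholders.

```python
# Average step durations in minutes, as quoted from the GoCD edxapp statistics.
# Only the first step (building the AMI) is named in the notes; the other
# labels are hypothetical placeholders.
steps_before_prod = [
    ("build AMI", 45),
    ("step 2", 20),
    ("step 3", 1),
    ("step 4", 1),
    ("step 5", 15),
]
# The parenthesized numbers: steps that run after the build is on prod.
steps_after_prod = [
    ("post-prod step 1", 2),
    ("post-prod step 2", 3),
    ("post-prod step 3", 7),
]

minutes_to_prod = sum(minutes for _, minutes in steps_before_prod)
total_minutes = minutes_to_prod + sum(minutes for _, minutes in steps_after_prod)
ami_share = steps_before_prod[0][1] / total_minutes

print(f"Time until the build is on prod: {minutes_to_prod} minutes")
print(f"Full pipeline: {total_minutes} minutes")
print(f"AMI build share of the full pipeline: {ami_share:.0%}")
```

By these averages a change reaches prod after about 82 minutes, and the AMI build accounts for a little under half of the full 94.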
We don’t have a good basis of comparing even between different pipelines within 2U
(Jeremy) We don’t want to continue using GoCD in the long term, which leads to debate on the value of optimizing it (vs. doing work to switch to Argo CD instead)
(Jeremy) There are several parallel efforts to reduce edx-platform build time, but the time to value delivery is long doing it that way. Maybe we should concentrate our efforts a little better for incremental value delivery?
[quest] (Alex D.) Any patterns that folks like for data replication between services?
[we talked about things]
https://openedx.atlassian.net/wiki/spaces/AC/pages/1074397222 see points about Eventual Consistency
[quest] (Ned) What kinds of informal education are useful for developers?
High-level block diagram (context/container from C4)
Architectural onboarding has fallen by the wayside
How code is organized (mono-repo and otherwise)
Migrations, what they are and how they go wrong
Celery
Tour of a new IDA Makefile
What counts as “core”
[quest] (Adam) How do we get better at either smoke tests or health checks so that we can make big changes to infrastructure more confidently and detect things before we ship bugs to prod?
2023-08-16
[ideation] (Jeremy) What (if anything) should we do for next-generation automated a11y testing?
The QA team is interested in working on this if we want to put effort into it
(Jeff) We have something to run axe-core for MFEs, unclear how many MFEs are running it and how often
(Jeff) There are services that do automated checks for large sites, but they’re ridiculously expensive
(Jeff) Have been trying to pick a tool by the end of the year
(Jeff) Also, there’s an external audit in the works
Jeremy will connect Jeff and the QA team to try to figure out next steps
[quest] (Hilary) - Is anyone interested in being a co-author on an OEP about authz best practices?
Would be nice to have someone already familiar with edx-rbac and Django Admin
(Robert) The authentication OEP is more of a collection of documentation and context than official “best practices”
(Jeremy) Any idea yet how this would relate to OEP-9: User Authorization (Permissions) and OEP-4: Application Authorization (Scopes)?
Ideally supplant these, given that they’re pretty old and not reflective of the current system; include the parts that are still useful
Jeremy is up for reviewing, but doesn’t really have time to co-author
[quest] (Jeff) - Can you think of any problems if no MathML rendering library is included with Open edX or http://edx.org ? (Chrome and Firefox include MathML rendering now, and I wonder if interactivity might be better as a browser extension for some users)
There’s also MathCAT
Oh hey, it’s written in Rust
Browser support has gotten much better in recent years
We haven’t yet done a detailed a11y evaluation of the options
Worth looking at using the native browser rendering just for the JS download size savings
We spend non-trivial $ pushing MathJax bits to everyone in a lot of pages.
[question] (Ned) What does a “Cybersecurity review” entail?
Came up in the context of being needed for anything being open sourced
(Adam) Sometimes involves pentesting or an external security firm review
(Hilary) If you don’t have confidence in the answers to the form, you can get help talking through it
Brian M and Purva have done an AppSec for Trilogy user account things
[question] (Adam) Do operators actually need to upgrade to MySQL 8 by the Q