Arch Hours: 2023
Meeting Expectations
Why?
Provide an opportunity for generative discussion and ideas.
Foster camaraderie through technical curiosity and geekdom.
Who?
Open to all edX-ers and Arbisoft-ers
What?
At times, these informal discussions result in follow-up action and beneficial change in our technology or in our organization. While this is not a decision-making body, these serendipitous discussions spark ideas that may result in ADRs/OEPs and tickets on team backlogs.
At times, it serves as a form of informal office hours to ask live technical questions of the archeological collective.
At times, we have pre-planned deep-dive topics that folks propose to gather wide input or to answer questions.
At times, we have hosted special guests (internal and external to edX) on specialized topics.
When?
Not lunch hour in the ET timezone: With Covid remote work, “Arch Lunch” has evolved into “Arch Hour” to accommodate various home/life situations during lunchtime.
How? Live Co-Editing
To circumvent Confluence’s limitations with the maximum number of concurrent editors:
during the hour together, we capture topics and take notes at https://docs.google.com/document/d/16-IVTGIjfKyMl8F4__Pk8Di4c_Lkx3k0-djIW6H14X0/edit# .
after the hour, we move those notes to this page.
Why not just stick with keeping the notes in the Google doc?
Google docs are not as discoverable.
Google docs don’t notify observers of future edits.
Google doc comments don’t notify all observers.
How? Structure
Please enter your proposed topics for discussion.
When we use Lean Coffee Style (link1, link2), we vote on which topics the group wants to discuss and time-box the discussion to 10 or 15 min → 5 min (if re-voted) → 5 min (if re-voted).
Prefix your topic with your intention so we are clear on what outcome you are striving for from the discussion. Examples:
[inform] You are simply seeking to inform the group of this item. You may field clarifying questions from the group, but you are not seeking further discussion at this time.
[ideation] You are seeking divergent and wide perspectives from this group. In this brainstorming mode, all ideas are accepted, without critical analysis.
It may be helpful to clarify whether you’d like to ideate on the problem space or the solution space.
[analysis] You are asking the group to help you poke holes in your idea/topic/plan/etc.
[quest] You are seeking information/responses to a question you have.
2023-12-20
(Dave) New Relic -> DataDog status
Related: https://openedx.atlassian.net/wiki/spaces/AC/pages/3555754025
Contract expires around June
edx-platform should have an OpenTelemetry-compatible layer for this so people can plug in their own APM solutions.
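To make the "plug in their own APM" idea concrete, here is a minimal sketch of a vendor-neutral facade that fans monitoring calls out to whatever backends are registered. Every name here (MonitoringBackend, InMemoryBackend, register_backend, set_attribute) is hypothetical and chosen for illustration; this is neither the edx-platform monitoring code nor the OpenTelemetry API, just one possible shape for such a layer.

```python
from typing import List, Protocol


class MonitoringBackend(Protocol):
    """Hypothetical interface an APM integration would implement."""

    def set_attribute(self, key: str, value: object) -> None: ...

    def record_exception(self, exc: BaseException) -> None: ...


class InMemoryBackend:
    """Toy backend that stores data locally; a real one would call a vendor SDK."""

    def __init__(self) -> None:
        self.attributes: dict = {}
        self.exceptions: list = []

    def set_attribute(self, key: str, value: object) -> None:
        self.attributes[key] = value

    def record_exception(self, exc: BaseException) -> None:
        self.exceptions.append(exc)


_backends: List[MonitoringBackend] = []


def register_backend(backend: MonitoringBackend) -> None:
    """Add a backend; application code never needs to know which ones exist."""
    _backends.append(backend)


def set_attribute(key: str, value: object) -> None:
    """Forward an attribute to every registered backend."""
    for backend in _backends:
        backend.set_attribute(key, value)


def record_exception(exc: BaseException) -> None:
    """Forward an exception to every registered backend."""
    for backend in _backends:
        backend.record_exception(exc)


# Usage: application code calls the facade, operators choose the backends.
demo_backend = InMemoryBackend()
register_backend(demo_backend)
set_attribute("course_id", "demo-course")
record_exception(ValueError("example failure"))
```

With this shape, swapping New Relic for DataDog (or anything else) would mean registering a different backend rather than touching call sites.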
[inform] (Kelly) I made a terrible diagram of edx-platform: https://lucid.app/lucidchart/fb870610-f8b4-4b7e-a509-1b871f81c54b/edit?beaconFlowId=8DBD553E85CDC9EE&invitationId=inv_e964873a-34ba-4bea-bc36-2fbe304edf40&page=9J6X4Q5XLMLH#
Related: https://openedx.slack.com/archives/C0497NQCLBT/p1702908527116319 and https://github.com/openedx/docs.openedx.org/issues/449
Within 2U, some of this stuff probably should be owned by Service Experience but isn’t yet
[Ned] Divergence Strategies: https://2u-internal.atlassian.net/wiki/spaces/ENG/pages/730005583/Divergence
Xavier’s doc: https://docs.google.com/document/d/1YyRxBrgIVoxwdcQLTWMyfUdFnxcRaqLJD1kTxBbIUn8/edit
2023-12-13
[quest] (Dave): How’s the MySQL 8.0 switchover going?
Scheduled for 2am tonight
[quest] (Dave): What’s a good way to roll out database connection encoding changes to Studio for http://edx.org ? (configuration repo help)
Dave to make issue in configuration repo (?) to track this.
This may be overridable in edx-internal (which would allow for faster rollback)
Jeremy will see if we want to turn on Issues there
High-level, external-safe discussion of recent 2U staff meetings
[ideation] (Jeremy) 2U executive management has been talking a lot about the “Open edX maintenance burden”. Where are teams feeling this, and how much time is it taking? What parts of it would we actually be comfortable handing over to other organizations to handle?
Fixing bugs?
Merging dependency upgrade PRs?
Big framework upgrades?
Roadmap decisions?
Reviewing changes from outside the core owning/maintaining team?
Deprecating stuff that’s no longer useful?
Building extension points so optional features can be added without being added for everyone?
[quest] (Jeremy) Are there more things like the Insights stack that are effectively only used by 2U, and should be deprecated as far as Open edX is concerned?
And how should we proactively identify things like this moving forward?
(Andy) As of roughly a year ago, there was at least one other Insights user
[quest] How deep are architecture vendor commitments? What failover features are there?
2023-12-06
[Ned] What is bad about installing dependencies from GitHub?
OEP-18: Python Dependency Management — Open edX Proposals 1.0 documentation (see the end of the “Rationale” section)
[Jeremy] What would people want to see from an Open edX maintenance working group?
Expertise about how to use Dependabot and Renovate
“Error budget” for teams is useful
Test suites that are comprehensive enough and fast enough to run
Standard way to avoid trying to upgrade to a known-broken release again somewhere else
Clear path from identification of unwanted dependencies to deprecation of them
Path to consistently applying new good code patterns across our codebase
[Jeremy] What are leading causes of delayed/lost PR reviews on your teams?
[Andy/Cosmonauts] Big, intimidating PRs and lack of context after an ownership transfer
[Chris] Variety of different reasons
[Hilary] PRs that span ownership boundaries
[Andy] High/unclear level of responsibility from approving a PR
2023-11-30
[inform] (Dave) Submit Open edX conference talks!
[AS][Wild Speculation] Are we in a startup runway situation now?
Should we consider more extreme short-term measures than we normally would, to minimize operating costs in the meantime?
Axim is coordinating a pile of funded contribution (FC) projects to make various improvements already; not a lot of bandwidth for new feature requests right now.
Spent a lot of time discussing this, hard to distill it into key points
2023-11-15
[inform] (Jeremy) https://openedx.atlassian.net/wiki/spaces/AC/pages/3927375918/courseware+studentmodule+Table+Refactoring
[musings/questions] (Ned) Mapping people/squads/repos
[Question] (Hilary) If we’re trying to design a new DB schema for a model, who is the best person/people to talk to?
Solution Review
Dave Ormsbee at Axim
Many principal/staff engineers
No real DBAs, but…
At least for a while, we have part-time contract DBA via Percona
Also, consult https://openedx.atlassian.net/wiki/spaces/AC/pages/23003228
[inform] (Jeremy) Socket.dev - tool for tracking dependency health
[quest] (Jeremy) Awareness of Django’s transaction.on_commit() and similar tools which are incredibly useful in the right circumstances. How do we make sure developers are aware of / informed of these when appropriate?
New “everything about concurrency” guide?
Celery
Transactions
Event bus
Django async
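For anyone unfamiliar with the on_commit idea raised above, here is a minimal standard-library sketch of the underlying pattern: side effects (sending an email, queueing a task, emitting an event) are deferred until the database transaction actually commits, and are dropped on rollback. It uses sqlite3 and a hypothetical DeferredHooks helper as a stand-in; in Django itself the equivalent is calling transaction.on_commit(callback) inside a transaction.atomic() block. The table and function names are made up for illustration.

```python
import sqlite3
from contextlib import contextmanager


class DeferredHooks:
    """Hypothetical helper: collects callbacks and runs them only after commit."""

    def __init__(self):
        self._callbacks = []

    def on_commit(self, func):
        self._callbacks.append(func)

    def run(self):
        callbacks, self._callbacks = self._callbacks, []
        for func in callbacks:
            func()

    def discard(self):
        self._callbacks.clear()


@contextmanager
def atomic(conn, hooks):
    """Commit on success and then run hooks; roll back and drop hooks on error."""
    try:
        yield
    except Exception:
        conn.rollback()
        hooks.discard()
        raise
    else:
        conn.commit()
        hooks.run()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE enrollment (user_id INTEGER, course TEXT)")
hooks = DeferredHooks()
sent = []  # stands in for "notifications actually sent"


def enroll(user_id, course, fail=False):
    with atomic(conn, hooks):
        conn.execute("INSERT INTO enrollment VALUES (?, ?)", (user_id, course))
        # Deferred: runs only if the INSERT above is really committed.
        hooks.on_commit(lambda: sent.append((user_id, course)))
        if fail:
            raise RuntimeError("simulated failure before commit")


enroll(1, "demo-course")
try:
    enroll(2, "demo-course", fail=True)
except RuntimeError:
    pass
```

The second enrollment is rolled back, so its notification never fires. Without the deferral, a background worker could be told about a row that does not exist.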
[inform] (Alex) Enterprise likes drf-spectacular
Related: Making edX Platform APIs public
Related: https://github.com/openedx/edx-platform/issues/32609
Should we have a doc/resource of “we like these patterns, do more of this”?
2023-11-08
[inform] (Dave) New Relic is a great resource!
[quest] (Dave) There are multiple places where it would be beneficial to add indexes in large tables (~10-100M+ rows?). I don’t think they’ll lock anything, but would the length of time it takes to run the migration be disruptive to the release process? What’s a good path forward so we don’t take edx.org by surprise?
Please hold off until MySQL 8.0 update has completed.
Long term: Need to do something about partitioning CSM.
[ideation] (Dave) Serving assets with Caddy (and maybe nginx?) via X-Accel-Redirect.
https://pypi.org/project/django-sendfile2/
Dave: FWIW, I wasn’t planning to use this. I think folks have standardized on the header since this was written, it doesn’t seem like it’s completely up to date, and I’m planning to do a lot of header tweaking anyhow.
Can make the container run Caddy and have that work in devstack as well.
Actual nginx configuration still happens via the configuration repo; can talk to TNL for testing help.
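As a rough illustration of the X-Accel-Redirect approach discussed above: the application only decides whether the request is allowed, then returns an empty response whose X-Accel-Redirect header names an internal path, and the front-end server (nginx, or Caddy configured to honor the header) streams the file. The sketch below is a plain WSGI app using only the standard library. The /internal-assets/ prefix, the X-Demo-User header check, and the helper names are all assumptions for illustration, not the actual edx-platform or configuration-repo setup.

```python
from wsgiref.util import setup_testing_defaults

# Hypothetical prefix that the front-end server maps to an internal-only location.
INTERNAL_PREFIX = "/internal-assets/"


def is_authorized(environ):
    """Placeholder permission check; a real app would inspect the session/user."""
    return environ.get("HTTP_X_DEMO_USER") == "staff"


def asset_app(environ, start_response):
    name = environ.get("PATH_INFO", "").lstrip("/")
    # Reject empty names and any attempt at path traversal.
    if not name or ".." in name.split("/"):
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"not found"]
    if not is_authorized(environ):
        start_response("403 Forbidden", [("Content-Type", "text/plain")])
        return [b"forbidden"]
    # Empty body: the front-end server replaces it with the file contents.
    start_response("200 OK", [("X-Accel-Redirect", INTERNAL_PREFIX + name)])
    return [b""]


def call(path, user=None):
    """Tiny test harness that invokes the WSGI app without a real server."""
    environ = {}
    setup_testing_defaults(environ)
    environ["PATH_INFO"] = path
    if user:
        environ["HTTP_X_DEMO_USER"] = user
    captured = {}

    def start_response(status, headers):
        captured["status"] = status
        captured["headers"] = dict(headers)

    body = b"".join(asset_app(environ, start_response))
    return captured["status"], captured["headers"], body


status, headers, body = call("/course/logo.png", user="staff")
```

The appeal of this design is that Python never reads or streams the file itself, so permission logic stays in the app while the heavy lifting stays in the web server.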
[question] (Ned) why are there so many old renovate pull requests open?
86 created before 2023: https://github.com/pulls?q=is%3Apr+is%3Aopen+author%3Aapp%2Frenovate+org%3Aopenedx+created%3A%3C2023-01-01
[question] (Jeff) Any special security considerations re browser extensions?
[question] (Jeff) AGPL / borrowing Canvas code concerns?
[question] (Jeff) How best to query for the presence of SRT captions files for videos?
2023-11-01
[inform] (Ned) GitHub pull request labels can make Jira issues
Get in touch with Ned if you want to enable this for specific repo/project pairs
TNL & Aperture are testing it out
[inform] (Andy) similar to what is going on with edx-search, what’s going on with enrollments https://2u-internal.atlassian.net/wiki/spaces/microb/pages/620036097/Enrollment+Notes
Just edit it if it’s wrong 🙂
[quest] (Ned) Where are we on cypress testing?
Have functional (although sparse) e2e smoke test suite; wasn’t too much easier to set up than bok-choy, but does seem to be easier to maintain
There have been mentions of expanding the smoke tests and/or using it for a11y CI checks, but no concrete plans yet
Requires JS experience and a review of the “lessons learned” docs we have so far
[quest] (Jeremy) Do people think Conventional Comments are a good idea?
Some enterprise engineers at 2U are debating it: https://twou.slack.com/archives/C049C2JGH3L/p1698697036085959
Robert does something like this already, a few others do at least sometimes
People seem generally up for trying it
Don’t think we want to enforce it, but may be good to suggest it?
English is hard (risk of stumbling into the recommended formatting while meaning and thinking about something else)
Arch-BOM uses https://2u-internal.atlassian.net/wiki/spaces/AT/pages/16385625/How+We+Announce guidelines for getting the word out about new things
May be good just to see some senior people doing it as an example, to see if it starts a trend
[ideation] (Jeremy) Mitigating pressure for 2U & Open edX code to diverge
2U concerns
Security of releases
Including regulatory compliance
Ability to deploy changes quickly
Extra deprecation overhead (relatively minor point)
Axim/Open edX concerns
Codebase clean of 2U-specific cruft
Shared worries
Time needed to reconcile divergent branches
Risk of permanent divergence resulting from logistics rather than intent/benefit
Ideas
Try to make sure that core committer merge process satisfies core regulatory requirements: require at least one non-author approver, have at least a PR (if not a separate issue) to track the rationale for each change, etc. (this may also benefit companies other than 2U in satisfying regulatory needs)
If necessary, use fast-forward-only release branches for edx.org that lag main/master just enough to perform any mandatory review for commits from non-2U-employees/contractors (but do we really need this, given that we don’t do it for other dependencies?)
Document and expand the range of options for quickly getting changes to production via config changes, extension points, temporary forks, etc. that don’t require putting 2U-specific code in Open edX
2023-10-25
Skipped so 2U developers could stay focused on Innovation Week (hackathon).
2023-10-18
[inform] (Ned) Innovation Week!!1!
[quest] (Jeremy) Does anyone have a security/vulnerability scanner they would recommend? Dependabot can’t track both main/master and a release branch.
[informish] (Andy) edx-search research (also 2U-specific); maybe we should do more of this. Looking at enrollments now.
[Question] CourseOverview has an org string field, not an FK to Organization. Any insight into the architectural reasons for this?
CourseOverview predates the organization table, and is essentially a cache of data in MongoDB
2023-10-11
[quest] (Ned) Do people understand that Core Contributors are allowed to merge without 2U approval?
It seems at least some people didn’t realize this
Maybe people like Virginia who are new to 2U and/or Open edX aren’t aware of this either?
https://openedx.atlassian.net/wiki/spaces/COMM/pages/3334635570
https://openedx.atlassian.net/wiki/spaces/COMM/pages/3156344833
[inform] (Jeremy) Maintenance Working Group launch preparation
Let Jeremy know if you’re interested in joining
[quest] (Jeff Witt) A11y in CI – opinions on how to implement?
WCAG 2.2 was released last week
A few a11y issues have made it into production that shouldn’t have
We used to have a minimal suite of automated a11y tests, but it was very hard to maintain and rarely caught problems, so we got rid of it
Tools have improved over time, we’re about due to pick one to employ for CI
We were using axe-core which is still popular, but may be going closed-source soon; there are other options now also
Jeremy’s notes on tools, etc. from earlier discussions around this:
https://codeburst.io/automated-accessibility-testing-tool-a11y-pa11y-jest-storybook-2ad294bfe71a
https://dev.to/willkre/3-ways-to-automate-accessibility-testing-a11y-19kc
https://dev.to/steady5063/react-testing-library-accessibility-4fom
https://www.digitala11y.com/open-source-accessibility-tools/
https://medium.com/john-lewis-software-engineering/automating-a11y-testing-part-1-axe-ed3d215de126
https://storybook.js.org/docs/react/writing-tests/accessibility-testing
Shifting left on catching a11y issues really reduces the cost of compliance
We aren’t doing the annual a11y training anymore, but we plan to roll it out again for at least the most relevant personnel
[quest] (Jeremy) Tech stack consolidation - what do you think we should try to get rid of, and in favor of what?
https://backstage.techdev.2u.com/ has a comparison of the 2U vertical tech stacks & processes as of shortly after the edX acquisition (2U-private, it’s on the main page after authenticating via GitHub)
[quest] (Jeff) If I set up a test user in a production course, does that muck with financial reporting?
(Matt) Enterprise has some test user capabilities set up for testing integrations and such
2023-10-04
[quest] (Dave) How is the MySQL 8 upgrade coming along?
(Jeremy) Largely done except for prod LMS, which is rerunning a days-long mitigation schema change
(Jeremy) We just switched devstack over
(Jeremy) Django 4.2 LMS upgrade involves password hash change, Arbi-BOM discussing options with BTR
[quest] (Dave) Is 2U using OrbStack now?
(Jeremy) Several individuals are, still going through vendor review for broader adoption
[quest] (Jeremy) Does anybody have thoughts on gaps in Kubernetes for use as a dev environment?
(John) Test data is needed regardless of which dev environment we go with
2023-09-27
[inform] (Ned) 2U has signed the Axim non-technical contribution agreement.
[inform] (Ned) Next 2U hackathon (“Innovation Week”) is Oct 23-25
[inform] (Jeremy) 2U vendor review triggers & consequences
Install things in a virtual machine with no VPN access to test them?
[quest] (Jeremy) What technology/tool do you think would best help you get good work done, but we just aren’t set up to use it yet?
(Matt) AI coding assistants?
(Jeremy) There are interesting and unanswered questions around inadvertent copyright infringement and copyrightability of the generated code
(Andy) Solvable via custom LLM trained on our own code?
(Hilary) Would this exacerbate the problem of not having enough people with deep knowledge of different parts of our systems?
(Matt) Maybe this will help us get knowledge bases that actually work
(Andy) OrbStack!
(Andy) Type hinting? TypeScript?
(Hilary) prop-types in JavaScript?
(Matt) Cloudflare AI tools
2023-09-20
[Ned] Axim would like to remove admin access to repos (this is different from the write access topic I mentioned last week). What concerns do people have about that?
[John] Is it likely that we’ll end up with a bunch of edx-org forks as a result of this?
[Ned] Feels unlikely at this point, lots of extra work
[Ned] More likely that we start using personal forks more often
[Ned] When do people really want admin access, anyway?
[Jeremy] Updating branch protection (required CI checks)
[Andy] Initial repo setup
[David] 2U Internal Marketplace update
[Jeff] Will the frontend be using Paragon or something else?
We think we’ll still have something like commerce coordinator, but it may have a more limited, appropriately scoped API than originally envisioned
[Jeremy] Arbi-BOM noted that we’ve been wanting to get rid of django-oscar & ecommerce for 2 years, now needing to upgrade them again for another Django upgrade
[quest] (Hilary) API access
[Robert]
Use observability to track down or learn about usage: https://github.com/openedx/edx-django-utils/blob/master/edx_django_utils/monitoring/docs/how_tos/using_custom_attributes.rst
Add linting. :) Use lint-amnesty.
Consider moving cross-service role sharing from JWT to another mechanism (e.g. events + data duplication, other?)
2023-09-13
[inform] (Hilary) OEP-66 PR up for review
Includes the relevant bits of OEP-9, drops or updates others
Bridgekeeper vs. rules
[quest] (Jeremy) Has anyone worked with Kompose or any other tools for 1) migrating from docker-compose to k8s or 2) allowing deployment of custom subsets of related microservices in k8s?
Not in this crowd
[question] (Mat) Infinity is looking to understand if Notifications belongs in edx or openedx. There’s not been a lot of understanding where this feature should eventually land.
[question] (Dave): The cut-off for Quince is coming soon (October 9th). Will we be on Django 4.2 for edx-platform? MySQL 8.0?
Devstack should be switching over to MySQL 8 within a week
Code is largely ready for Django 4.2, last PRs are being finalized and merged
Trying to get edx.org on MySQL 8 before updating the default requirements
If we miss end of September for that, we’ll try to make Django 4.2 the default with alternate 3.2-using requirements files for edx.org to use (annoying to implement in edx.org deployment pipelines)
BTW: https://openedx.atlassian.net/wiki/spaces/COMM/pages/3613392957
[inform] (Ned) Long-term we are going to be reducing 2U access to all repos, and give teams write access to the repos they need.
[request] (Ned) Please educate new devs about the public aspects of much of our code.
Branch naming
Avoid links to private data
2023-09-06
[quest] (Robert) It seems the LMS auth_user table doesn’t have a modified date.
Is there a way to determine this?
If not, what’s the best way to resolve this at our scale (at least going forward)?
Django offers support for custom user tables, but edx-platform predates the feature, and it would be a fairly painful switch at this point
We sent tracking logs on most field changes, but not all
first_name and last_name are ignored, for example
How about django-simple-history?
Would give us a full audit record moving forward, but likely to be massive (and probably overkill)
How do post_save and post_commit interact?
[quest] (Deborah) Is there a documented general principle about what log level to use?
Different teams have logs in Splunk, New Relic, and/or DataDog
If we did a major change of log formatting, we’d probably want to talk to the community first
We should probably have some kind of OEP/ADR for Python logging in general
[inform/quest] (Robert) How can we quickly determine the actual impact of a category of error?
Number of users impacted, scope of impact for each user, etc.
Starting to discuss with Data Engineering
Can this be tied to course completion?
May be a good case study for engineering noticing a problem and trying to make a call on how much work to put into getting resources allocated to fixing it
2023-08-30
[inform] (Jeremy) We are seeking good topics for the edX engineering blog and/or volunteers to write about them
[quest] (Jeremy) Should we consider piloting structlog somewhere?
New Relic now supports it: Python agent v8.11.0 | New Relic Documentation
Seems to avoid the need for some of the black-magic parsing we use Splunk for
The pretty console output could be a nice DevEx enhancement in dev environments
[quest] (Phil) Does anyone know anything about Next.js?
Phil: