Arch Hours: 2023
Meeting Expectations
Why?
Provide an opportunity for generative discussion and ideas.
Foster camaraderie through technical curiosity and geekdom.
Who?
Open to all edX-ers and Arbisoft-ers
What?
At times, these informal discussions result in follow-up action and beneficial change in our technology or in our organization. While this is not a decision-making body, these serendipitous discussions spark ideas that may result in ADRs/OEPs and tickets on team backlogs.
At times, it serves as a form of informal office hours to ask live technical questions of the archeological collective.
At times, we have pre-planned deep-dive topics that folks propose to gather wide input or to answer questions.
At times, we have hosted special guests (internal and external to edX) on specialized topics.
When?
Not lunch hour in ET timezone: With Covid remote work, “Arch Lunch” has evolved into “Arch Hour” in order to accommodate various home/life situations during lunch time.
How? Live Co-Editing
To circumvent Confluence’s limitations with the maximum number of concurrent editors:
during the hour together, we capture topics and take notes at https://docs.google.com/document/d/16-IVTGIjfKyMl8F4__Pk8Di4c_Lkx3k0-djIW6H14X0/edit# .
after the hour, we move those notes to this page.
Why not just stick with keeping the notes in the Google doc?
Google docs are not as discoverable.
Google docs don’t notify observers of future edits.
Google doc comments don’t notify all observers.
How? Structure
Please enter your proposed topics for discussion.
When we use Lean Coffee Style (link1, link2), we vote on which topics the group wants to discuss and time-box the discussion to 10 or 15 min → 5 min (if re-voted) → 5 min (if re-voted).
Prefix your topic with your intention so we are clear on what outcome you are seeking from the discussion. Examples:
[inform] You are simply seeking to inform the group of this item. You may field clarifying questions from the group on your inform, but are not seeking further discussion at this time.
[ideation] You are seeking divergent and wide perspectives from this group. In this brainstorming mode, all ideas are accepted, without critical analysis.
It may be helpful to clarify whether you’d like to ideate on the problem space or the solution space.
[analysis] You are asking the group to help you poke holes in your idea/topic/plan/etc.
[quest] You are seeking information/responses to a question you have.
2023-12-20
(Dave) New Relic -> DataDog status
Related: Application Performance Monitoring
Contract expires around June
edx-platform should have an OpenTelemetry-compatible layer for this so people can plug in their own APM solutions.
[inform] (Kelly) I made a terrible diagram of edx-platform: https://lucid.app/lucidchart/fb870610-f8b4-4b7e-a509-1b871f81c54b/edit?beaconFlowId=8DBD553E85CDC9EE&invitationId=inv_e964873a-34ba-4bea-bc36-2fbe304edf40&page=9J6X4Q5XLMLH#
Related: https://openedx.slack.com/archives/C0497NQCLBT/p1702908527116319 and Architecture diagram updates · Issue #449 · openedx/docs.openedx.org
Within 2U, some of this stuff probably should be owned by Service Experience but isn’t yet
[Ned] Divergence Strategies: https://2u-internal.atlassian.net/wiki/spaces/ENG/pages/730005583/Divergence
Xavier’s doc: https://docs.google.com/document/d/1YyRxBrgIVoxwdcQLTWMyfUdFnxcRaqLJD1kTxBbIUn8/edit
2023-12-13
[quest] (Dave): How’s the MySQL 8.0 switchover going?
Scheduled for 2am tonight
[quest] (Dave): What’s a good way to roll out database connection encoding changes to Studio for edx.org? (configuration repo help)
Dave to make issue in configuration repo (?) to track this.
This may be overridable in edx-internal (which would allow for faster rollback)
Jeremy will see if we want to turn on Issues there
High-level, external-safe discussion of recent 2U staff meetings
[ideation] (Jeremy) 2U executive management has been talking a lot about the “Open edX maintenance burden”. Where are teams feeling this, and how much time is it taking? What parts of it would we actually be comfortable handing over to other organizations to handle?
Fixing bugs?
Merging dependency upgrade PRs?
Big framework upgrades?
Roadmap decisions?
Reviewing changes from outside the core owning/maintaining team?
Deprecating stuff that’s no longer useful?
Building extension points so optional features can be added without being added for everyone?
[quest] (Jeremy) Are there more things like the Insights stack that are effectively only used by 2U, and should be deprecated as far as Open edX is concerned?
And how should we proactively identify things like this moving forward?
(Andy) At least a yearish ago there was at least one other insights user
[quest] How deep are architecture vendor commitments? What failover features are there?
2023-12-06
[Ned] What is bad about installing dependencies from GitHub?
OEP-18: Python Dependency Management — Open edX Proposals 1.0 documentation (see the end of the “Rationale” section)
[Jeremy] What would people want to see from an Open edX maintenance working group?
Expertise about how to use Dependabot and Renovate
“Error budget” for teams is useful
Test suites that are comprehensive enough and fast enough to run
Standard way to avoid trying to upgrade to a known-broken release again somewhere else
Clear path from identification of unwanted dependencies to deprecation of them
Path to consistently applying new good code patterns across our codebase
[Jeremy] What are leading causes of delayed/lost PR reviews on your teams?
[Andy/Cosmonauts] Big, intimidating PRs and lack of context after an ownership transfer
[Chris] Variety of different reasons
[Hilary] PRs that span ownership boundaries
[Andy] High/unclear level of responsibility from approving a PR
2023-11-30
[inform] (Dave) Submit Open edX conference talks!
[AS][Wild Speculation] Are we in a startup runway situation now?
Should we consider more extreme short-term measures than we normally would, to minimize operating costs in the meantime?
Axim is coordinating a pile of funded contribution (FC) projects to make various improvements already; not a lot of bandwidth for new feature requests right now.
Spent a lot of time discussing this, hard to distill it into key points
2023-11-15
[inform] (Jeremy) courseware_studentmodule Table Refactoring
[musings/questions] (Ned) Mapping people/squads/repos
[Question] (Hilary) If we’re trying to design a new db for a model, who is the best person/people to talk to?
Solution Review
Dave Ormsbee at Axim
Many principal/staff engineers
No real DBAs, but…
At least for a while, we have part-time contract DBA via Percona
Also, consult Everything About Database Migrations
[inform] (Jeremy) Socket.dev - tool for tracking dependency health
[quest] (Jeremy) Awareness of Django’s transaction.on_commit() and similar tools which are incredibly useful in the right circumstances. How do we make sure developers are aware of / informed of these when appropriate?
New “everything about concurrency” guide?
Celery
Transactions
Event bus
Django async
[inform] (Alex) Enterprise likes drf-spectacular
Related: Making edX Platform APIs public
Related: API Improvements · Issue #32609 · openedx/edx-platform
Should we have a doc/resource of “we like these patterns, do more of this”?
2023-11-08
[inform] (Dave) New Relic is a great resource!
[quest] (Dave) There are multiple places where it would be beneficial to add indexes in large tables (~10-100M+ rows?). I don’t think they’ll lock anything, but would the length of time it takes to run the migration be disruptive to the release process? What’s a good path forward so we don’t take edx.org by surprise?
Please hold off until MySQL 8.0 update has completed.
Long term: Need to do something about partitioning CSM.
[ideation] (Dave) Serving assets with Caddy (and maybe nginx?) via X-Accel-Redirect.
Dave: FWIW, I wasn’t planning to use this. I think folks have standardized on the header since this was written, it doesn’t seem like it’s completely up to date, and I’m planning to do a lot of header tweaking anyhow.
Can make the container run Caddy and have that work in devstack as well.
Actual nginx configuration still happens via the configuration repo–can talk to TNL for testing help.
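One way to picture the X-Accel-Redirect idea discussed above: the application only decides whether the request is allowed, then names an internal path in a response header, and the proxy streams the file (nginx honors this header for internal locations; Caddy would presumably need equivalent configuration). This is a minimal stdlib WSGI sketch, not edx-platform code. The `/protected-assets/` prefix, the `X-Demo-User` header check, and the `asset_app` name are hypothetical placeholders.

```python
# Hypothetical internal prefix; the proxy would map it to the real asset storage.
INTERNAL_PREFIX = "/protected-assets/"


def asset_app(environ, start_response):
    """WSGI app that authorizes a request, then hands the transfer to the proxy."""
    asset_path = environ.get("PATH_INFO", "").lstrip("/")

    # Stand-in for a real permission check (assumption for illustration only).
    if environ.get("HTTP_X_DEMO_USER") is None:
        start_response("403 Forbidden", [("Content-Type", "text/plain")])
        return [b"forbidden"]

    # Refuse path traversal before building the internal path.
    if ".." in asset_path.split("/"):
        start_response("400 Bad Request", [("Content-Type", "text/plain")])
        return [b"bad path"]

    headers = [
        ("Content-Type", "application/octet-stream"),
        ("X-Accel-Redirect", INTERNAL_PREFIX + asset_path),
    ]
    start_response("200 OK", headers)
    return [b""]  # the proxy replaces this body with the file contents
```

The point of the design is that the Python process never reads the asset bytes itself; it only makes the access decision.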
[question] (Ned) why are there so many old renovate pull requests open?
86 created before 2023: https://github.com/pulls?q=is%3Apr+is%3Aopen+author%3Aapp%2Frenovate+org%3Aopenedx+created%3A%3C2023-01-01
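For anyone who wants to vary the search above (a different org, author, or cutoff date), the link is just a URL-encoded list of GitHub search qualifiers. A small sketch using only the standard library; the `build_search_url` helper is a hypothetical name, and the qualifiers are copied from the link.

```python
from urllib.parse import urlencode

# Search qualifiers decoded from the link above.
qualifiers = [
    "is:pr",
    "is:open",
    "author:app/renovate",
    "org:openedx",
    "created:<2023-01-01",
]


def build_search_url(terms):
    """Return a github.com/pulls search URL for the given qualifiers."""
    return "https://github.com/pulls?" + urlencode({"q": " ".join(terms)})


url = build_search_url(qualifiers)
print(url)
```

Changing `created:<2023-01-01` to another date, or `org:openedx` to another organization, gives the equivalent query elsewhere.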
[question] (Jeff) Any special security considerations re browser extensions?
[question] (Jeff) AGPL / borrowing Canvas code concerns?
[question] (Jeff) How best to query for the presence of SRT captions files for videos?
2023-11-01
[inform] (Ned) GitHub pull request labels can make Jira issues
Get in touch with Ned if you want to enable this for specific repo/project pairs
TNL & Aperture are testing it out
[inform] (Andy) similar to what is going on with edx-search, what’s going on with enrollments https://2u-internal.atlassian.net/wiki/spaces/microb/pages/620036097/Enrollment+Notes
Just edit it if it’s wrong 🙂
[quest] (Ned) Where are we on cypress testing?
Have functional (although sparse) e2e smoke test suite; wasn’t too much easier to set up than bok-choy, but does seem to be easier to maintain
There have been mentions of expanding the smoke tests and/or using it for a11y CI checks, but no concrete plans yet
Requires JS experience and a review of the “lessons learned” docs we have so far
[quest] (Jeremy) Do people think Conventional Comments are a good idea?
Some enterprise engineers at 2U are debating it: https://twou.slack.com/archives/C049C2JGH3L/p1698697036085959
Robert does something like this already, a few others do at least sometimes
People seem generally up for trying it
Don’t think we want to enforce it, but may be good to suggest it?
English is hard (risk of stumbling into the recommended formatting while meaning and thinking about something else)
Arch-BOM uses https://2u-internal.atlassian.net/wiki/spaces/AT/pages/16385625/How+We+Announce guidelines for getting the word out about new things
May be good just to see some senior people doing it as an example, to see if it starts a trend
[ideation] (Jeremy) Mitigating pressure for 2U & Open edX code to diverge
2U concerns
Security of releases
Including regulatory compliance
Ability to deploy changes quickly
Extra deprecation overhead (relatively minor point)
Axim/Open edX concerns
Codebase clean of 2U-specific cruft
Shared worries
Time needed to reconcile divergent branches
Risk of permanent divergence resulting from logistics rather than intent/benefit
Ideas
Try to make sure that core committer merge process satisfies core regulatory requirements: require at least one non-author approver, have at least a PR (if not a separate issue) to track the rationale for each change, etc. (this may also benefit companies other than 2U in satisfying regulatory needs)
If necessary, use fast-forward-only release branches for edx.org that lag main/master just enough to perform any mandatory review for commits from non-2U-employees/contractors (but do we really need this, given that we don’t do it for other dependencies?)
Document and expand the range of options for quickly getting changes to production via config changes, extension points, temporary forks, etc. that don’t require putting 2U-specific code in Open edX
2023-10-25
Skipped so 2U developers could stay focused on Innovation Week (hackathon).
2023-10-18
[inform] (Ned) Innovation Week!!1!
[quest] (Jeremy) Does anyone have a security/vulnerability scanner they would recommend? Dependabot can’t track both main/master and a release branch.
[informish] (Andy) edx-search research (also 2U-specific), maybe we should do more of this, looking at enrollments now.
[Question] CourseOverview has an org string field, not a fk to Organization - Any insight into the architectural reasons for this?
CourseOverview predates the organization table, and is essentially a cache of data in MongoDB
2023-10-11
[quest] (Ned) Do people understand that Core Contributors are allowed to merge without 2U approval?
It seems at least some people didn’t realize this
Maybe people like Virginia who are new to 2U and/or Open edX aren’t aware of this either?
https://openedx.atlassian.net/wiki/spaces/COMM/pages/3334635570
https://openedx.atlassian.net/wiki/spaces/COMM/pages/3156344833
[inform] (Jeremy) Maintenance Working Group launch preparation
Let Jeremy know if you’re interested in joining
[quest] (Jeff Witt) A11y in CI – opinions on how to implement?
WCAG 2.2 was released last week
A few a11y issues have made it into production that shouldn’t have
We used to have a minimal suite of automated a11y tests, but it was very hard to maintain and rarely caught problems, so we got rid of it
Tools have improved over time, we’re about due to pick one to employ for CI
We were using axe-core which is still popular, but may be going closed-source soon; there are other options now also
Jeremy’s notes on tools, etc. from earlier discussions around this:
https://codeburst.io/automated-accessibility-testing-tool-a11y-pa11y-jest-storybook-2ad294bfe71a
https://dev.to/willkre/3-ways-to-automate-accessibility-testing-a11y-19kc
https://dev.to/steady5063/react-testing-library-accessibility-4fom
https://www.digitala11y.com/open-source-accessibility-tools/
https://medium.com/john-lewis-software-engineering/automating-a11y-testing-part-1-axe-ed3d215de126
https://storybook.js.org/docs/react/writing-tests/accessibility-testing
Shifting left on catching a11y issues really reduces the cost of compliance
We aren’t doing the annual a11y training anymore, but we plan to roll it out again for at least the most relevant personnel
[quest] (Jeremy) Tech stack consolidation - what do you think we should try to get rid of, and in favor of what?
https://backstage.techdev.2u.com/ has a comparison of the 2U vertical tech stacks & processes as of shortly after the edX acquisition (2U-private, it’s on the main page after authenticating via GitHub)
[quest] (Jeff) If I set up a test user in a production course, does that muck with financial reporting?
(Matt) Enterprise has some test user capabilities set up for testing integrations and such
2023-10-04
[quest] (Dave) How is the MySQL 8 upgrade coming along?
(Jeremy) Largely done except for prod LMS, which is rerunning a days-long mitigation schema change
(Jeremy) We just switched devstack over
(Jeremy) Django 4.2 LMS upgrade involves password hash change, Arbi-BOM discussing options with BTR
[quest] (Dave) Is 2U using OrbStack now?
(Jeremy) Several individuals are, still going through vendor review for broader adoption
[quest] (Jeremy) Does anybody have thoughts on gaps in Kubernetes for use as a dev environment?
(John) Test data is needed regardless of which dev environment we go with
2023-09-27
[inform] (Ned) 2U has signed the Axim non-technical contribution agreement.
[inform] (Ned) Next 2U hackathon (“Innovation Week”) is Oct 23-25
[inform] (Jeremy) 2U vendor review triggers & consequences
Install things in a virtual machine with no VPN access to test them?
[quest] (Jeremy) What technology/tool do you think would best help you get good work done, but we just aren’t set up to use it yet?
(Matt) AI coding assistants?
(Jeremy) There are interesting and unanswered questions around inadvertent copyright infringement and copyrightability of the generated code
(Andy) Solvable via custom LLM trained on our own code?
(Hilary) Would this exacerbate the problem of not having enough people with deep knowledge of different parts of our systems?
(Matt) Maybe this will help us get knowledge bases that actually work
(Andy) OrbStack!
(Andy) Type hinting? TypeScript?
(Hilary) prop-types in JavaScript?
(Matt) Cloudflare AI tools
2023-09-20
[Ned] Axim would like to remove admin access to repos (this is different from the write access topic I mentioned last week). What concerns do people have about that?
[John] Is it likely that we’ll end up with a bunch of edx-org forks as a result of this?
[Ned] Feels unlikely at this point, lots of extra work
[Ned] More likely that we start using personal forks more often
[Ned] When do people really want admin access, anyway?
[Jeremy] Updating branch protection (required CI checks)
[Andy] Initial repo setup
[David] 2U Internal Marketplace update
[Jeff] Will the frontend be using Paragon or something else?
We think we’ll still have something like commerce coordinator, but it may have a more limited, appropriately scoped API than originally envisioned
[Jeremy] Arbi-BOM noted that we’ve been wanting to get rid of django-oscar & ecommerce for 2 years, now needing to upgrade them again for another Django upgrade
[quest] (Hilary) API access
[Robert]
Use observability to track down or learn about usage: https://github.com/openedx/edx-django-utils/blob/master/edx_django_utils/monitoring/docs/how_tos/using_custom_attributes.rst
Add linting. :) Use lint-amnesty.
Consider moving cross-service role sharing from JWT to another mechanism (e.g. events + data duplication, other?)
2023-09-13
[inform] (Hilary) OEP-66 PR up for review
Includes the relevant bits of OEP-9, drops or updates others
Bridgekeeper vs. rules
[quest] (Jeremy) Has anyone worked with Kompose or any other tools for 1) migrating from docker-compose to k8s or 2) allowing deployment of custom subsets of related microservices in k8s?
Not in this crowd
[question] (Mat) Infinity is looking to understand if Notifications belongs in edx or openedx. There’s not been a lot of understanding where this feature should eventually land.
[question] (Dave): The cut-off for Quince is coming soon (October 9th). Will we be on Django 4.2 for edx-platform? MySQL 8.0?
Devstack should be switching over to MySQL 8 within a week
Code is largely ready for Django 4.2, last PRs are being finalized and merged
Trying to get edx.org on MySQL 8 before updating the default requirements
If we miss end of September for that, we’ll try to make Django 4.2 the default with alternate 3.2-using requirements files for edx.org to use (annoying to implement in edx.org deployment pipelines)
BTW: https://openedx.atlassian.net/wiki/spaces/COMM/pages/3613392957
[inform] (Ned) Long-term we are going to be reducing 2U access to all repos, and give teams write access to the repos they need.
[request] (Ned) Please educate new devs about the public aspects of much of our code.
Branch naming
Avoid links to private data
2023-09-06
[quest] (Robert) It seems the LMS auth_user table doesn’t have a modified date.
Is there a way to determine this?
If not, what’s the best way to resolve this at our scale (at least going forward)?
Django offers support for custom user tables, but edx-platform predates the feature, and it would be a fairly painful switch at this point
We send tracking logs on most field changes, but not all
first_name and last_name are ignored, for example
How about django-simple-history?
Would give us a full audit record moving forward, but likely to be massive (and probably overkill)
How do post_save and post_commit interact?
[quest] (Deborah) Is there a documented general principle about what log level to use?
Different teams have logs in Splunk, New Relic, and/or DataDog
If we did a major change of log formatting, we’d probably want to talk to the community first
We should probably have some kind of OEP/ADR for Python logging in general
[inform/quest] (Robert) How can we quickly determine the actual impact of a category of error?
Number of users impacted, scope of impact for each user, etc.
Starting to discuss with Data Engineering
Can this be tied to course completion?
May be a good case study for engineering noticing a problem and trying to make a call on how much work to put into getting resources allocated to fixing it
2023-08-30
[inform] (Jeremy) We are seeking good topics for the edX engineering blog and/or volunteers to write about them
[quest] (Jeremy) Should we consider piloting structlog somewhere?
New Relic now supports it: Python agent v8.11.0 | New Relic Documentation
Seems to avoid the need for some of the black-magic parsing we use Splunk for
The pretty console output could be a nice DevEx enhancement in dev environments
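To illustrate the "no black-magic parsing" point above: when each log line is emitted as structured data, downstream tools can read fields directly instead of extracting them with regular expressions. The sketch below uses only the standard `logging` and `json` modules to show the idea; it is not structlog itself, and the `JsonFormatter` class, the `fields` convention, and the sample event are assumptions made for illustration.

```python
import io
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object."""

    def format(self, record):
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "event": record.getMessage(),
        }
        # Hypothetical convention: structured data arrives via extra={"fields": {...}}.
        payload.update(getattr(record, "fields", {}))
        return json.dumps(payload, sort_keys=True)


stream = io.StringIO()  # stands in for stdout or a log shipper
handler = logging.StreamHandler(stream)
handler.setFormatter(JsonFormatter())

log = logging.getLogger("structured_demo")
log.setLevel(logging.INFO)
log.addHandler(handler)
log.propagate = False

log.info(
    "enrollment_created",
    extra={"fields": {"course_id": "course-v1:demo+X+2023", "mode": "audit"}},
)

# A consumer can load the line back without any custom parsing rules.
parsed = json.loads(stream.getvalue())
print(parsed["event"], parsed["course_id"])
```

structlog offers a similar key-value style with less boilerplate, plus the pretty console renderer mentioned above for dev environments.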
[quest] (Phil) Does anyone know anything about Next.js?
Phil:
I know relatively little about Next.js; any advice? Have we used it at edX before? Does it work well with our current stack? Are there any good learning resources out there?
Looking at: django-nextjs, a Django app plugin to enable a Django IDA to serve Next.js server-side code [blog post]
Apparently https://pypi.org/project/django-nextjs/ exists to allow Django templates and Next.js templates to co-exist
(Jeremy) FED-BOM did a little discovery on this and related tooling in https://github.com/openedx/wg-frontend/issues/126 (we decided at the time to punt until some of the alternatives mature a bit)
Jeremy: Related issue: https://github.com/openedx/wg-frontend/issues/126
This is more of a question for Frontend WG/Paragon WG.
On the open-source side, OEP-11 is the guiding document that may need amendment for Next.js.
[quest] (Jeremy) In light of Deciphering Glyph :: Get Your Mac Python From Python.org , should we make any updates to recommendations for installing Python on macOS?
(flame): pyenv!
Current documentation:
Summary: no real need to change anything here, either pyenv or official installers should work for most people at 2U / working on Open edX
[quest] (Dave) Any thoughts on porting forums service to a Django app? (I know there’s Infinity discovery around this, but I’m curious if others had also looked at this problem, or if there were other discussions not captured in those docs.) Context is that Axim is considering funding something here.
We’d love to see this get done
Diana: concern about data migration
Dave: Phase 1 would keep the data in place and start moving towards removing the Ruby aspect of things. Any potential data migration would happen after that.
Would alleviate (admittedly modest) security concerns around Sinatra and dependency gems
Would greatly reduce barriers to making forums enhancements
[quest] (Jeff) Mathjax version 3 OSPR received; could we roll this out course-by-course?
Also, native browser rendering or MathCAT may be preferable in some cases
(Dave) Maybe go to the Product Working Group to plan out the per-course rollout capability?
[quest] (Jeremy) Dev environment direction
(Hilary) Think there will still be a need for local long term
Intermittent internet connections, for example
(Andy) There are some jury-rigged AI configurations that are easier to set up locally
But default to remote, fall back to local may be where we want to end up
(aside) Open edX is featured on OrbStack’s website! https://docs.orbstack.dev/benchmarks
[quest] (Hilary) How do people set up new large Open edX sites?
Mostly custom Terraform and such, sometimes based off of an AWS solution template
Although Harmony is a new Kubernetes-based solution for this
2023-08-23
[analysis] (Jeremy) Docker Desktop replacement
Wiki page with some analysis: https://openedx.atlassian.net/wiki/spaces/AC/pages/3845914644
Arch-BOM ticket for continuing investigation: https://github.com/edx/edx-arch-experiments/issues/93
[quest] (Ned) Why are we OK with a 2-hour deployment pipeline?
(Andy) It’s worse than that, it’s a 2-hour nondeterministic pipeline
(Phil) From GoCD edxapp statistics: 45+20+1+1+15+(2+3+7)=94 minutes
The 45 minutes (about half the duration) is building the AMI
Each number is the average duration of a pipeline step.
Pipeline numbers in parentheses happen after the build is available on prod.
We don’t have a good basis of comparing even between different pipelines within 2U
(Jeremy) We don’t want to continue using GoCD in the long term, which leads to debate on the value of optimizing it (vs. doing work to switch to Argo CD instead)
(Jeremy) There are several parallel efforts to reduce edx-platform build time, but the time to value delivery is long doing it that way. Maybe we should concentrate our efforts a little better for incremental value delivery?
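As a quick check of Phil's GoCD numbers above, the arithmetic can be spelled out. This sketch assumes, as the notes state, that the parenthesized steps run after the build is available on prod and that the 45-minute step is the AMI build; the variable names are made up for illustration.

```python
# Average step durations in minutes, as quoted from the GoCD edxapp statistics.
before_prod = [45, 20, 1, 1, 15]  # steps up to the build being available on prod
after_prod = [2, 3, 7]            # the parenthesized steps

total = sum(before_prod) + sum(after_prod)
time_to_prod = sum(before_prod)
ami_share = before_prod[0] / total  # assumes the 45-minute step is the AMI build

print(f"total pipeline time: {total} min")
print(f"time until the build is on prod: {time_to_prod} min")
print(f"AMI build share of total: {ami_share:.0%}")
```

By these averages the AMI build alone accounts for a little under half of the pipeline, so it is the step where a speedup would show up most.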
[quest] (Alex D.) Any patterns that folks like for data replication between services?
[we talked about things]
https://openedx.atlassian.net/wiki/spaces/AC/pages/1074397222 see points about Eventual Consistency
[quest] (Ned) What kinds of informal education are useful for developers?
High-level block diagram (context/container from c4)
Architectural onboarding has fallen by the wayside
How code is organized (mono-repo and otherwise)
Migrations, what they are and how they go wrong
Celery
Tour of a new IDA Makefile
What counts as “core”
[quest] (Adam) How do we get better at either smoke tests or health checks so that we can make big changes to infrastructure more confidently and detect things before we ship bugs to prod?
2023-08-16
[ideation] (Jeremy) What (if anything) should we do for next-generation automated a11y testing?
The QA team is interested in working on this if we want to put effort into it
(Jeff) We have something to run axe-core for MFEs, unclear how many MFEs are running it and how often
(Jeff) There are services that do automated checks for large sites, but they’re ridiculously expensive
(Jeff) Have been trying to pick a tool by the end of the year
(Jeff) Also, there’s an external audit in the works
Jeremy will connect Jeff and the QA team to try to figure out next steps
[quest] (Hilary) - Is anyone interested in being a co-author on an OEP about authz best practices?
Would be nice to have someone already familiar with edx-rbac and Django Admin
(Robert) The authentication OEP is more of a collection of documentation and context than official “best practices”
(Jeremy) Any idea yet how this would relate to OEP-9: User Authorization (Permissions) — Open edX Proposals 1.0 documentation and OEP-4: Application Authorization (Scopes) — Open edX Proposals 1.0 documentation?
Ideally supplant these, given that they’re pretty old and not reflective of the current system; include the parts that are still useful
Jeremy is up for reviewing, but doesn’t really have time to co-author
[quest] (Jeff) - Can you think of any problems if no MathML rendering library is included with Open edX or edx.org? (Chrome and Firefox include MathML rendering now, and I wonder if interactivity might be better as a browser extension for some users)
There’s also MathCAT
Oh hey, it’s written in Rust
Browser support has gotten much better in recent years
We haven’t yet done a detailed a11y evaluation of the options
Worth looking at using the native browser rendering just for the JS download size savings
We spend non-trivial $ pushing MathJax bits to everyone in a lot of pages.
[question] (Ned) What does a “Cybersecurity review” entail?
Came up in the context of being needed for anything being open sourced
(Adam) Sometimes involves pentesting or an external security firm review
(Hilary) If you don’t have confidence in the answers to the form, you can get help talking through it
Brian M and Purva have done an AppSec for Trilogy user account things
[question] (Adam) Do operators actually need to upgrade to MySQL 8 by the Quince release?
If they need to defer the MySQL 8 upgrade, they’ll defer the Open edX upgrade
It’s tricky to do a release that supports installation with either Django 3.2 or 4.2
Django 4.2 will generate SQL that just doesn’t work with MySQL 5.7
2023-08-09
[inform] (Dave) PSA on deadlock errors
[inform/discuss] (Ned) Whether to open source code is full of subtleties
Slack thread from today: https://twou.slack.com/archives/C030CC8T40N/p1691583253294229
(Ned) edx-platform is in phase 3 of maintainership
2023-07-26
[analysis] (Ned) Marking problematic PRs
Idea: add a label to any PR that required a revert or fix-forward
We could look back through revert PRs to catch many of these
Maybe QA or Incident Response would be up for marking historical ones?
We’d need to update the incident runbook to flag new ones
[inform] (Jeremy) Kolo for Django (Django dev/debugging tool for VSCode)
[inform] (Feanil) There is edx-platform API documentation at https://docs.openedx.org/projects/edx-platform/en/latest/references/lms_apis.html
[quest] (Jeremy) Do people have any eagerness or reluctance to switch from setup.py to pyproject.toml?
Came up recently in https://github.com/openedx/openedx-learning/pull/65#issuecomment-1644360505
(Feanil) I’d like to see it happen so we get more consistent
Cookiecutters first (or at least early)
(Jeremy) I’d love to get it done, just trying to figure out priority relative to other projects
https://github.com/orgs/edx/projects/15/views/7?pane=issue&itemId=34167265
[quest] (Jeremy) Has anybody found a good way to profile the build time of Docker images?
“docker build” has basic per-step timings, but they’re kind of lost in the noise
https://github.com/orgs/edx/projects/15/views/7?pane=issue&itemId=34281583
[quest] (Jeremy) Are people ok with a console report of prioritized repo health issues in code they own? Or is there a strong preference for browser/other UI?
https://github.com/openedx/edx-repo-health/blob/master/scripts/console_dashboard.py
People seem ok with the console version, at least for now
2023-07-19
[quest] (Hilary) Where should docs go for Open edX code?
(Ned) Try to keep it as close to the relevant code as possible.
Common choices include Read the Docs & Confluence
But the main thing is to get something written, it can be moved later if appropriate
(Robert) (post-meeting) See https://docs.openedx.org/projects/openedx-proposals/en/latest/best-practices/oep-0019-bp-developer-documentation.html
[inform] (Ned) We’re starting to plan an October Hackathon
If you want to get involved, please join #interest-hackathon-planning
(Hilary) For some teams, the theme was a disincentive because the assumption was that projects needed to be on-theme, no matter how many times the contrary was stated
Let’s point to the list of projects that were done last time, to illustrate that other topics are actually ok
Adjourned early for lack of topics & low attendance
2023-07-12
[ideation] (Dave O) Proposal: Make MinIO a part of the default Tutor / Devstack install, and let Django apps and services assume an S3-like interface instead of having to accommodate any django-storages backend–i.e. drop support for storing that data directly on the filesystem.
OEP
DEPR
How to migrate folks away
Check on Swift usage/compatibility
Seems better than localstack, which was way too big for this use case
Look into reliability (link)
Don’t force MinIO
[quest] (Jeremy) Should we configure and enable https://github.com/actions/dependency-review-action ?
Have Arbi-BOM try it out, talk to Axim about shared config if it goes well
Quick demo of how to find Dependency Review on PRs
[quest] (Hilary) Does core functionality belong in the platform or should an IDA be used if the functionality could be considered a complete service?
Ownership is much simpler for a separate service, leading 2U to often prefer this
Some operators of smaller sites struggle to manage multiple services, and prefer everything critical to be in edx-platform
Some things make sense as libraries or plugins that get installed into edx-platform
https://openedx.atlassian.net/wiki/spaces/AC/pages/1074397222
https://2u-internal.atlassian.net/wiki/spaces/ENG/pages/26017856/Directory+Of+edX+Sites (2U internal)
2023-07-05
[rant?] (Andy) We keep getting M1 Macs; how much lost dev time before we invest in them?
Ongoing discrepancy between people having no trouble and those having nothing but trouble
Frontend is one of the areas some people have hit trouble in
Fragmentation in services being used is making reproducing problems hard
Projects ongoing - ARM images coming, for example
[quest] (Phil) service-to-service testing
https://twou.slack.com/archives/C030CC8T40N/p1688569461913529
We have Cypress for this at a very small scale, but want to lean into using Pact instead
https://openedx.atlassian.net/wiki/spaces/AC/pages/3769663499
[quest] (Jeremy B) https://openedx.atlassian.net/wiki/spaces/AC/pages/3664904195 - adoption next steps
We’ll be asking owning teams to start tracking tasks from this board that need attention from them
Feedback welcome on whether we should start doing this manually or do some of the proposed automation first
[ideation] (David) Reducing risk of deploying edx-platform
Requiring reviews on edx-platform - at the very least, an edx.org reviewer is required for an OSPR to go out on code we own within edx-platform?
Process for including community members on RCAs - we can have more information on why incidents happened
Process for halting incoming commits - should Clamps be extended to restrict the pool of committers to edx-platform?
(long-term) Requiring more frequent e2e tests on code shipment - hopefully gives us more confidence to deploy
(long-term) Core contributor read-only access to GoCD, or a website built on GoCD’s APIs exposing some info? Mergers are responsible for monitoring build? [eventually leading to a continuous delivery consortium]
[quest] Jeff Witt How does Transifex import work? Is there a single source of truth?
See https://www.frontitude.com/ – edX Design team may want to start using it.
2023-06-28
[analysis] (David) Web notifications vs. edx-ace
Not all stakeholders are present (the team doing the work isn’t here), but I could use some insight into whether we think these two are apples and oranges or what.
(John) Braze supports some form of web notifications
(John) Braze is handling transactional emails like password resets, not just marketing stuff
(John) The thing that we’re missing right now is persistence of the messages that have been sent
(Dave) I think Gabe is the only person from the original edx-ace development team who’s still in the ecosystem
(Dave) Possibly relevant discussion in #notifications-2023 in the Open edX Slack
[analysis] (Jeremy) https://openedx.atlassian.net/wiki/spaces/AC/pages/3801743364
(John) Postgresql has better offline/concurrent index creation (non-locking)
(Dave) Limits on length of indexes, etc. measured in bytes instead of characters
Any ~opinions~ on chatbot architecture? (Jeff Witt)
The accessibility of most chatbots is pretty poor
Has any serious thought gone into the frontend architecture for these chatbots?
(Andy) Allie has put some thought into it
Are we going to make any attempt to do side-by-side comparisons of the different LLM options?
(John) It’s unclear how well any of these work in non-English languages
2023-06-21
[question] (Andy) do we have any caching guidelines / review? Thinking of stashing some data in memcache.
Or am I just being too paranoid about html parse time
OEP-22: Caching in Django — Open edX Proposals 1.0 documentation
Also a Dave Ormsbee talk at the conference
https://2023-open-edx-conference.sessionize.com/session/435597
A Practical Guide to Backend Caching - Open edX
(Dave): Talk to me! I love caching!
No existing guidelines, aside from don’t involve pickles at all
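One reading of "don’t involve pickles" is to cache only plain, JSON-serialisable data instead of arbitrary Python objects. The sketch below shows that idea with a get-or-compute helper. `DictCache` is a hypothetical stand-in with `get`/`set` methods shaped like Django's cache API, and `get_or_compute` is an illustrative name, not an existing Open edX helper.

```python
import json


class DictCache:
    """Stand-in for a cache client exposing get/set like Django's cache API."""

    def __init__(self):
        self._data = {}

    def get(self, key, default=None):
        return self._data.get(key, default)

    def set(self, key, value, timeout=None):
        # The timeout is ignored in this in-memory stand-in.
        self._data[key] = value


def get_or_compute(cache, key, compute, timeout=300):
    """Return cached JSON-decoded data, computing and storing it on a miss."""
    raw = cache.get(key)
    if raw is not None:
        return json.loads(raw)
    value = compute()
    # Store a JSON string so the cached value never depends on pickling.
    cache.set(key, json.dumps(value), timeout)
    return value
```

Only data that survives a JSON round trip can be cached this way, which is the point: the cached value stays readable across code versions and languages.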
[analysis](Hilary) course role scope and new course role management options
An Authorization doc in Confluence
Potential definitions:
System-wide Roles
Service-specific Roles
Course-specific Roles
It’s not clear whether these are system-wide or service-specific, or either or both, so not a great term, and maybe we need more terms.
[quest] [Robert]
Are edX course-discovery course run ids the de facto 2U course ids?
Applies to notifications, and other future work.
Who would own this discussion?
What is the 2U equivalent of an OEP process?
[David] My understanding is that this was the closest thing to a place to put ‘global’ ADRs: https://architecture.techdev.2u.com/decisions/index.html
Whether anyone’s using that or updating it… well, I’m pretty sure they aren’t.
Cohorts/sections/course runs: https://twou.slack.com/archives/C030CC8T40N/p1682617923260639
2023-06-14
[review] (Hilary) Current architecture for roles/permission sets and if time potential decisions around roles/permission set contexts/scopes
Presented a diagram summarizing current understanding; seems largely accurate
Robert Raposa is a good person to go to in order to verify some of the points of uncertainty
(Jeremy) OEP-9: User Authorization (Permissions) — Open edX Proposals 1.0 documentation may be useful for separating the implementation of a permission check from all the places that need to use it
(Phil) Note that at least ecommerce has permissions in Django admin separate from those in edx-platform
(Andy) Support are good people to ask what are the biggest problems/points of confusion with the current permissions scheme
(Phil) 2U’s Security Working Group & Ben Piscopo have talked in the past about the need for a TA role: some role between learner & instructor that doesn’t grant the waterfall of privileges that comes with being course staff.
Future related topic: scopes
[ideation] (Jeremy) If you could make one nontrivial change to Open edX to make it better/easier to work with in the future, what would it be?
(Dave) Require fewer resources to run - CPU, RAM, storage
Run the whole stack on a Raspberry Pi
(Alex) Magical observability (beyond New Relic levels)
Quickly trace why things happened and what they are
(Alex) Have a schema for content (metadata)
(Andy) Types!
Adding types to Python
JS -> TypeScript
Hilary: +1
(Jeremy) Arbi-BOM would love to work on the Python side of this
(Dave) How much of the code do we need to type before this starts becoming useful?
(Andy) cookiecutter might actually be the first place, so you get a typed thing when you get started
(Dave) [mildly crazy] Run Studio and LMS as one deployable thing (single platform)
(to be clear, I haven’t fully thought this through yet. :-P)
2U wants more services, small deployers want fewer services. Is there a possibility of enabling either option for services, rather than forcing one choice that makes some happy and some sad?
(Dave) I think it’s possible to have “deploy this subset of apps as a different service” thing… I need to think about it more though. A lot of the simplification advantages of having things in the same place might be negated if we have to do the cross service thing anyway.
(Jeff) Test user accounts, so I don’t have to enroll in order to test it with e2e testing systems
In order to reproduce the UI as perceived by some users
Catalog of possible states
(Jeremy) Add support for PostgreSQL (dropping MySQL support optional)
Consolidate Elasticsearch and MongoDB into PG full text search & JSONB fields, at least for smaller installations
(Phil)
Learning: Unit-level table of contents
Documentation: Architecture onboarding
2023-06-07
[quest] (Jeremy) Which software industry news sources are people finding useful, if any?
GitHub blog - https://github.blog/feed/
GitHub Changelog - https://github.blog/changelog/feed/
Thoughtworks Tech Radar - https://www.thoughtworks.com/radar
Leadership: https://www.rawsignal.ca/newsletter
Rands in repose outgrowth https://randsinrepose.com/welcome-to-rands-leadership-slack/
https://star-history.com/blog blog (~monthly)
[ideation] (David) How to handle frontend plugin error states when we can’t know if the contents of an iframe has failed to load.
Best idea so far is to use a timeout: assume the iframe failed to load if there isn’t a load-successful event within x time of the load-attempt event
[ideation] (David) Engineering and Architecture Onboarding information architecture
Came up because interns start next week
Some folks have made their own versions, apparently?
… can we find out who without judgment?
Onboarding IA
Dev environment setup
Culture
General tech learning resources
Tools
Separate out ‘meta onboarding’ for managers and people who are onboarding new hires
Glossary of… “domains”
Devstack (“if you’re a search and discover engineer, ignore everything about devstack”, for instance)
Terraform
Ansible
Doc categorizations - are these applicable to onboarding docs?
Explanations
How-Tos
Tutorials
Reference
People onboarding a new hire get some coaching
Being conservative about what you show folks in the beginning
Buddies need to know what to do
Coach people about how to onboard more effectively
How to find stuff?
Goals
Set small goals for the person onboarding that are appropriate for your team
“How can I get this person to commit a production change on their first day?”
“Getting devstack running on your first day is a win”
[Jeremy] Who should be in charge of making onboarding better?
(John) Feels like it should be a competency for engineering management, maybe even explicitly called out in the career pathway
(David) I volunteered to handle some of the docs side of this
(John) It would be awesome to have more onboarding material as courses hosted on Open edX
[quest] (Jeff) What’s up with the new video player project?
(Dave) Still in discovery: https://openedx.atlassian.net/wiki/spaces/OEPM/pages/3674734593
2023-05-31
[Ned] Draft policy for granting write access to openedx org repos: https://openedx.atlassian.net/wiki/spaces/COMM/pages/3771793429
[John] [inform] Adam Stankiewicz and I are demonstrating non-devstack MFE development (using stage auth and apis) at tomorrow’s FedEX meeting
[inform/quest] (Jeremy) Pact contract testing interest revival
https://openedx.atlassian.net/wiki/spaces/AC/pages/3769663499
Frontend WG considering it for testing MFE interaction with backends
Its stub server could also be used to mock back ends for MFE development
(David) It feels like course-discovery could really use this
(David) It’s easier than it looks like at first glance, but you really need to try using it to understand it
[ideation] (David) is there a good way to structure a django app plugin (or something) such that an operator could choose to run it in edx-platform OR as a separate service?
(Jeremy) Seems feasible with a minimal microservice wrapper to install the app into, especially if it communicates with the rest of edx-platform via signals (which could be transmitted over the event bus if needed) instead of direct Python API calls
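A minimal sketch of the idea above, assuming the app talks to the outside world only through a small "send event" interface. All class and event names here are hypothetical. In the monolith the transport calls handlers directly (roughly what Django signals do); in a standalone service it would publish to the event bus, which is represented here by an in-memory outbox.

```python
import json
from typing import Callable, Dict, List

Handler = Callable[[dict], None]


class InProcessTransport:
    """Calls handlers directly: the app is installed into the monolith."""

    def __init__(self) -> None:
        self._handlers: Dict[str, List[Handler]] = {}

    def subscribe(self, event_type: str, handler: Handler) -> None:
        self._handlers.setdefault(event_type, []).append(handler)

    def send(self, event_type: str, payload: dict) -> None:
        for handler in self._handlers.get(event_type, []):
            handler(payload)


class OutboxTransport:
    """Stand-in for an event bus producer: the app runs as its own service."""

    def __init__(self) -> None:
        self.outbox: List[str] = []

    def send(self, event_type: str, payload: dict) -> None:
        # A real deployment would publish this message to the event bus.
        self.outbox.append(json.dumps({"type": event_type, "data": payload}))


class CertificateApp:
    """Hypothetical plugin logic; it only knows about `transport.send`."""

    def __init__(self, transport) -> None:
        self.transport = transport

    def award(self, user_id: int, course_id: str) -> None:
        self.transport.send(
            "certificate.awarded", {"user_id": user_id, "course_id": course_id}
        )
```

Because `CertificateApp` never imports edx-platform code directly, the operator's choice between the two deployments reduces to which transport gets wired in.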
[analysis] (Jeremy) Development metrics and tools for collecting them
Okay got acquired by Stripe for internal purposes, tool no longer available
https://devlake.apache.org/ under consideration
Atlassian Compass - https://www.atlassian.com/software/compass
(Andy) Proposed metric - time to verify locally if an attempted bug fix worked
[quest] (Jeremy) Devstack & Tutor pain points
(Hilary) Mismatch between Confluence & Slack & verbal advice
Especially local vs. hosted devstack
(Hilary) Docker Desktop now asks you to confirm that your company has a license for you, which was a significant speed bump for installation
Maybe we should accelerate switching to Orbstack or Minikube or Rancher Desktop, etc.
2023-05-24
[ideation] (David J) Next steps for this meeting - announcements when it’s starting, perhaps? Seems to work well for other recurring optional meetings.
Arch Hour is starting now at https://edx-org.zoom.us/j/88954838503?pwd=Q3MzbDgvTE00ejU3NXgyTUR3dXQ3Zz09
Announced in #tech-dev-edx
[ideation] (Jeremy B) I’d like to enhance the console repo health dashboard I started during the Hackathon to make it useful for teams in identifying high-priority tech debt. Any nominations for patterns/issues that should be especially prioritized?
(Ned) Proper README content for repos in the edx GitHub org (contributions, security, etc.)
(David) Stop using Enzyme
(David) Count of warnings in test suite
[quest] (Jeremy B) Has anybody looked at JSON5? Might be a viable option when considering YAML over JSON just to get support for comments.
(Chris) I just add “comment” fields to my JSON
Sounds like none of us have a burning need for this
We’ve come to terms with YAML, and it’s imposed on us by our software choices in many cases
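For reference, the "comment fields" workaround can be kept out of application code by stripping those keys right after parsing. This is a small sketch; the key name `comment` and the sample config are assumptions for illustration.

```python
import json


def strip_comments(node, comment_key="comment"):
    """Recursively drop keys named `comment_key` from parsed JSON data."""
    if isinstance(node, dict):
        return {
            key: strip_comments(value, comment_key)
            for key, value in node.items()
            if key != comment_key
        }
    if isinstance(node, list):
        return [strip_comments(item, comment_key) for item in node]
    return node


raw = """
{
  "comment": "Retry settings for a hypothetical sync job",
  "retries": 3,
  "backoff": {"comment": "values are in seconds", "initial": 1, "max": 30}
}
"""
config = strip_comments(json.loads(raw))
print(config)  # {'retries': 3, 'backoff': {'initial': 1, 'max': 30}}
```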
[question] (Ned) I’m casting a wide net to find out what education people need about Open edX / open source / Axim / etc: https://twou.slack.com/archives/C04847T6QNQ/p1684505932705279
If someone asks you a question along these lines, capture the answer
Have we linked to the docs we have for this in places where people who have these questions can find them?
[inform] (Dave) Axim has created a new GitHub org (aximcollaborative), and Axim-specific repos will be moved there.
[quest] (Jeremy) Does “platform engineering” sound substantially different from SRE to others?
Example: https://platformcon.com/
Basic idea: Create an internal developer platform for teams to self-service launch their own services
(John) A key enabling factor for these is having more of the config local to the service repo (12 factor app)
2023-05-17
****[John] any updates on the future of devstack?
Hosted devstack vs. Tutor vs. off-the-shelf cloud dev env tooling
Need to set a budget for hosted devstack
We want to experiment with Tutor + Devspaces/Okteto/other
Want to start building arm64 images for new non-Ansible devstack images
Provisioning is a pain point
Still being on MySQL 5.7 is making this worse
SRE and AWS are working on it
We still owe the NPS survey results
Shortest path to improving the status quo?
Build arm64 images & upgrade MySQL
Improve the DB cache
***[quest] (Alie L) Has anybody worked with https://github.com/openedx/edx-rbac before? I’m trying to understand if it is a viable option for getting/reading course staff roles from the LMS into an IDA (specifically special exams IDA)
[John] Enterprise is using it for what sounds like a similar thing
[Robert] Issue to fix/update OEP for Authorization: https://github.com/openedx/open-edx-proposals/issues/479
[Alex - async] https://github.com/openedx/edx-rbac/blob/master/docs/how_to_guide.rst I can walk through this with you if you want, Alie. I’ve got a good bit of edx-rbac context. It will probably work for you, but it can be a little finicky and…non-obvious.
***[John] [a deliberately provocative question] Does React make us faster or slower?
[John] It’s pretty complex; is it worth the complexity?
[David] Yes.
[Diana] Seems to be easier to work with than what we had before (Backbone/Underscore/jQuery)
[David] Part of it is trying to use what most other people are using successfully
[Jeremy] Some of the complexity and framework churn over time is due to site performance constraints - size and load time
[David] Arguably the MFE framework is too simple - too much divergence between individual MFEs
[Robert] A lot of this depends on “what are we comparing React to as an alternative?”
[Ned] Simplicity of the framework vs. simplicity of the code written using it
[Jeremy] If we want a concrete alternative to consider, I’d nominate Svelte. Much smaller browser footprint, intelligent compiler, and very liked by its users. But not nearly as widely adopted as React yet.
* [quest] (Dave O) What is the charset and collation used for edX’s databases now, and are there any plans to change them anytime soon? (I’d like to get everyone on utf8mb4 and more modern collations, but I don’t know what the state of things is for the big installs, or how painful the migration is)
[Andy] I think it may vary across DBs, we had to look at it for some of the insights work. Alison Langston may remember more.
[Alie] did look into this briefly for insights data pipeline work, because we ran into a bug with getting specific unicode characters through the data pipeline. Turns out it was a MySQL bug that had not yet been patched in the Aurora DB we were using (so not related to collation)
[Jeremy] It’s planned out, but not yet executed (still on utf8). Blocked on the MySQL 8 upgrade, I think because 5.7 doesn’t have the collation we want.
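As background for why utf8mb4 matters: MySQL's legacy `utf8` charset stores at most 3 bytes per character, so any character whose UTF-8 encoding needs 4 bytes cannot be stored. The sketch below checks strings against that limit and shows how a byte-based index limit translates into characters. The 767-byte figure is used as an example limit (it applies to older InnoDB row formats), and the helper names are hypothetical.

```python
MYSQL_UTF8MB3_MAX_BYTES = 3  # legacy MySQL "utf8" stores at most 3 bytes per character


def fits_in_utf8mb3(text):
    """True if every character encodes to 3 bytes or fewer in UTF-8."""
    return all(len(ch.encode("utf-8")) <= MYSQL_UTF8MB3_MAX_BYTES for ch in text)


def max_indexable_chars(index_byte_limit, bytes_per_char):
    """Characters that fit in an index whose limit is measured in bytes."""
    return index_byte_limit // bytes_per_char


print(fits_in_utf8mb3("na\u00efve caf\u00e9"))   # True: accented Latin letters need 2 bytes
print(fits_in_utf8mb3("Graduated \U0001F393"))   # False: the emoji needs 4 bytes
print(max_indexable_chars(767, 3))               # 255 characters under legacy utf8
print(max_indexable_chars(767, 4))               # 191 characters under utf8mb4
```

The second pair of numbers is one reason a charset migration can be painful: indexed columns sized for 3-byte characters may exceed the byte limit once each character can take 4 bytes.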
[Inform] (Andy) pushing for open sourcing most of the summary AI stuff, we’ll see how it goes
[inform] (Jeremy B) Another Thoughtworks Tech Radar is out. Does anyone want to review and discuss, either now or after time to read it?
[inform] (John) edx-rest-client now automatically forwards request ID headers, for traceability purposes
[quest] (Jeremy B) Does https://roadmap.sh/ look like it could help smooth out the learning curve for our tech stack? (The content and/or the tooling for creating such skill paths.)
Reasonable? https://roadmap.sh/full-stack
What do these boxes mean? https://roadmap.sh/react
They make sense to me. ¯\_(ツ)_/¯
Actively harmful: https://roadmap.sh/python
2023-05-10
[Alex] [question] Do we have any collateral/best practices/etc about caching strategies (not tactics)?
https://open-edx-proposals.readthedocs.io/en/latest/best-practices/oep-0022-bp-django-caches.html
https://www.youtube.com/watch?v=pa7igsOf4is (Dave O. presenting at 2023 Open edX conf.)
[John] we still have 2 jira instances? What’s the timeline on combining or sharing access broadly?
[Andy] Thinks the hosted Jira will probably live for a while longer - there’s a lot of idiosyncrasies that make it hard to “jump” from this one to the “other” Jira instance.
[Ned] We’ll eventually get migrated off of “server atlassian” (2u-internal) and onto “cloud atlassian” (TODO: name of other atlassian?)
Atlassian is going to eventually stop supporting their server version - we’re not going to be able to run it on our own, eventually.
[Alex] Who’s the right person or slack channel to answer this question?
[Ned] Ned is an ok person to ask, because he’s on the committee that’s deciding this stuff.
[David] Move this meeting a bit later, tack “office hours” onto it, and announce it’s happening?
[Yes votes] ++++++
[No votes]
Shift to post-lunch Eastern time slot
This basically excludes Pakistan & South Africa folk, but they haven’t been coming anyway
The hardest thing: naming. Is “office hours” somewhat misleading?
“Architecture de-couples therapy”
What’s a good shared calendar for this to live in?
[David] Can add to the platform team calendar
Tech-dev shared calendar would be good, but we need admin/IT help. Also, this calendar is possessed by demons of randomness.
[Emily] Ned, anything interesting at PyCon?
[Ned] Yes, is currently curating a list of recommended videos.
PyCon:
was two, 2-day tutorials (hands on, ya gotta pay extra for that)
More structured talks for the rest of the days through end of week…
…and then sprints for a couple of days.
It’s in Pittsburgh the next two years, you should go!
Ned gave a keynote talk, you can find it on his website, probably. https://nedbatchelder.com/blog/202305/pycon_2023_keynote.html
DjangoCon: you still have 5 days to submit a talk proposal.
In Durham, NC, mid-October.
[John] Inform: SRE has finished injecting X-Request-ID headers on everything.
Right now, you’ll see them only in the nginx logs.
We can start logging this with middleware now probably - like, log the request ID on every log line.
We can also re-use these request IDs when passing requests to services to get a distributed tracing sort of thing.
John will come and bother arch-bom about how to do these two items.
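A rough sketch of the two follow-up items (request ID on every log line, and re-use on outgoing service calls). It follows the shape of a Django-style middleware but uses only the standard library; the class and function names are hypothetical, and this is not how edx-rest-client or any existing edX middleware is implemented.

```python
import contextvars
import logging

REQUEST_ID_HEADER = "X-Request-ID"
_request_id = contextvars.ContextVar("request_id", default=None)


class RequestIdFilter(logging.Filter):
    """Attach the current request ID to every log record."""

    def filter(self, record):
        record.request_id = _request_id.get() or "-"
        return True


class RequestIdMiddleware:
    """Django-style middleware: remember the incoming ID for the request's duration."""

    def __init__(self, get_response):
        self.get_response = get_response

    def __call__(self, request):
        # Assumes the header was already injected upstream (e.g. by nginx).
        token = _request_id.set(request.headers.get(REQUEST_ID_HEADER))
        try:
            return self.get_response(request)
        finally:
            _request_id.reset(token)


def outgoing_headers(extra=None):
    """Headers for a service-to-service call, forwarding the request ID if known."""
    headers = dict(extra or {})
    request_id = _request_id.get()
    if request_id is not None:
        headers[REQUEST_ID_HEADER] = request_id
    return headers
```

With the filter attached to a handler, a format string such as `"%(request_id)s %(message)s"` would put the ID on every log line.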
2023-05-03
[quest] (John) What’s up with course-discovery staying (or not) a core part of Open edX?
What’s “in/out” of Open edX is a complex decision point that isn’t well understood.
The Open Source Process working group is happy to hear about friction, and find ways to reduce it.
There have been some recent conversations around potentially moving large chunks of course-discovery to the event bus instead of celery jobs / data aggregation
How do we start publishing more events to the event bus?
[inform] (Ned) GitHub teams in the openedx org will be cleaned up in June: claim ones you still want: https://docs.google.com/spreadsheets/d/1nDWMTbmDHyrwWJsSsyr6d6Bx97IKAoKpm5mSVjRj108/edit#gid=0
[quest] (Jeremy) Next steps on Maintenance Board adoption - https://github.com/orgs/edx/projects/17/views/1
Some of the stuff here is for code that we honestly don’t know if we want to keep or not
We need a product-side DEPR process
Some of the problem is the difficulty of testing/understanding code you don’t always work in
Get maintenance explicitly in org OGSPs
Justify as enabling ability to rapidly pivot to new priorities
Fix our ability to revert and fix-forward, so there’s more confidence in merging PRs with no obvious flaws
May be time to revisit and re-promote the Arch Manifesto
[Andy] the chaotic FOMO of ai projects
[Jeremy] Feels like the best uses have the lowest visibility. Maybe couple with a big announcement highlighting the improvements?
We need a product person actually responsible for making decisions on what we will and won’t use AI for
[Andy] Orbstack looks cool as a Docker Desktop replacement: https://orbstack.dev/
2023-04-05
[Jeremy] I periodically maintain https://2u-internal.atlassian.net/wiki/spaces/ENG/pages/19471107/Conferences. Is this useful? Do people still get value from technical conferences?
Many people think of conferences in terms of the talks, because that’s what gets hyped a lot. But with those now often available online afterwards, the most value is often in other aspects.
We should highlight the networking benefits and the value to the company of being seen attending conferences. This page probably needs a “how and why to attend conferences” section or link.
[Ned] (what else?) Hackathon last call
2023-03-22
[John] Request IDs are in progress, can we move it along?
[inform] this is close to working, it’s enabled on stage
[Ned] codecov: do people use it?
It’s too expensive for private repos
It sometimes fails, which is a distraction
“More of an impediment than an enabler”
Bad: it complains if you delete covered lines
“Occasionally useful to indicate tests to write”
“Too binary: .1% off shouldn’t be a failure”
“Not sure we’re getting even $500/mo value from it”
Tangent: coverage metrics at all
We’re interested in getting broader feedback from devs, though.
Hackathon
ChatGPT foundations
Credits for experiments?
OpenAI account is hard to get
Trying to satisfy Legal
GPT-3.5-Turbo is better, we should use it, and you can generally access this model via standard usage of the OpenAI API.
Will people need to spend money during the hackathon, and if so, how much?
“A team could spend $20 over the course of the hackathon”
“Multiple models, each with different capabilities and price points. Prices are per 1,000 tokens. You can think of tokens as pieces of words, where 1,000 tokens is about 750 words. This paragraph is 35 tokens.”
3.5 turbo is $0.002 per 1,000 tokens.
Assuming 250 words per page of a high-school essay, $20 would buy you a 30,000 page high-school essay.
Legal has concerns about 2U information being fed to OpenAI. Not clear how to get approval for experiments.
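The hackathon cost estimate above can be checked with a few lines of arithmetic. The inputs are the figures quoted in the notes ($0.002 per 1,000 tokens, about 750 words per 1,000 tokens, 250 words per page); the function name is hypothetical.

```python
from fractions import Fraction


def pages_for_budget(budget_usd, usd_per_1k_tokens,
                     words_per_1k_tokens=750, words_per_page=250):
    """Pages of text a budget buys, using exact fractions to avoid float rounding."""
    tokens = Fraction(budget_usd) / Fraction(usd_per_1k_tokens) * 1000
    words = tokens * words_per_1k_tokens / 1000
    return int(words / words_per_page)


pages = pages_for_budget("20", "0.002")
print(pages)  # 30000
```

That is 10,000,000 tokens, or about 7,500,000 words, which matches the 30,000-page figure in the notes.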
2023-03-15
[Jeremy/Tyler] To what extent should we lean into Kubernetes for development environments?
[Tyler] Devspace deploying into clusters
https://openedx.atlassian.net/wiki/spaces/AC/pages/3615850497
[Jeremy] What are the top concerns people have on a new dev environment?
[Jeremy] What should our next steps be?
[Andy]
Need to be able to wipe and reset state
Recently, slow - moving into the cloud feels like it would fix this
[John]
Shouldn’t have to run all services in order to run 1 service
Tight coupling means lots of setup (~1.5 hours) to test a small change
[Alex] What exactly causes this?
[John] Anything that changes between your last devstack build can cause extra troubleshooting
[Alex] Wonder if we need snapshots.
[Phil] Our requirements are not always pinned.
[Tyler] Do we have a list of dependencies?
[Jeremy] the docker-compose.yml of devstack.git is what defines these, for things added to devstack. There are many new services of late that don’t make it into devstack (e.g. every enterprise service that’s not a library).
K8s? It’s hard to find a concise articulation of using k8s for a development environment and benefits of such, vs. docker-compose.
[John] Is k8s going to be painful because not a lot of people [in the world?] are running it for local development, even if it’s a technologically superior choice?
[Jeremy] We designed Open edX originally to deploy each service as a distinct VM with some communication layer around it. Which translated fine to docker-compose, but it’s not built in a way to take advantage of k8s features.
[Tyler] Suspects data is the first big problem to tackle
[Jeremy] https://open-edx-proposals.readthedocs.io/en/latest/best-practices/oep-0037-bp-test-data.html
[John] A nice property of staging environments is that the work to set up application data is shared/re-used/extended amongst different devs and teams in the course of testing and verification.
[Andy] Would it be better to regularly re-import data into stage, even at the cost of people losing recent modifications/additions they’ve made?
[John] Describes an idea where e.g. sales demo accounts live on the staging environment, so that actual humans are populating realistic data on stage, and then you get pretty good data on staging for free.
[Jeremy] Candidate next steps
Create and maintain better stock data
Pick a cloud dev deployment approach
Clean up our configuration story to require less deployment-specific customization
[Jeremy] Necessary step: smooth the k8s learning-curve. You shouldn’t have to fully understand k8s just to start doing development.
2023-03-08
[John] [question] Thoughts on use of Django message framework.
[John] Django has a built-in notification library. We have 3rd party tools (social-core) that use it and the messages are not making it to the MFE.
[Robert]
Payments might be wrapping messages and delivering to an MFE.
Django messages is probably designed for delivering user messages within a single service (shared session) across multiple calls.
[Phil] Do we usually use Braze for this?
Braze has in-app messaging, but there’s probably a better/more native way to do this. Don’t want to increase our dependencies.
+ [Robert] [analysis] Discussion of blurred boundaries and varying needs of L&P and Marketplace within the monolith.
2023-02-22
DJ: [inform] Building out a few wiki pages for Solution Review: https://2u-internal.atlassian.net/wiki/spaces/ENG/pages/349831223/Solution+Review
*** DJ: [ideation] Offsite (onsite?) Open edX LMS boundaries discussion with tCRIL et al
This isn’t “2U + tCRIL” - we shouldn’t think of it that way
What about doing this during the conference?
Birds of a feather?
Use this time to figure out the shape of it and gauge interest
Friday is all scheduled out as working group time
David wants to discuss further with Robert
*** Ned: [question] Do people feel like they are in tune with what tCRIL is doing?
Example: WooCommerce vs Commerce Coordinator
https://discuss.openedx.org/t/tcril-funded-contribution-woocommerce-discovery/9337
“2U is currently building a next generation commerce platform (commerce-coordinator 1) that is extensible and pluggable. However this platform may be more complex and require more technical capabilities than all operators have. We want to see what it would look like to try to integrate the Open edX platform directly with a 3rd party commerce platform that we do not need to maintain. Is there a possibility that we can have a simple implementation that will work for smaller deployments that would complement commerce-coordinator for larger deployments?”
(David) Following Discourse helps, especially the Announcement space
Had trouble getting notifications for anything smaller than “everything”
DJ: Quibble - notifications and emails in discourse are two unconnected systems… notifications may work reasonably, but it requires you to show up at the site to see them.
(Ned) Discourse in particular is polarizing - tCRIL recommending as primary means of communication, 2U largely ignoring it
(John) Sudden change made by 2U in discovery for representing 3rd-party content
(Jeremy) I track it pretty closely, but that’s a large chunk of my job and it leaves me basically no time for coding
(David) Do we need an Open edX activity digest?
(Andy) Curation would definitely improve the signal/noise ratio
Very large digression about the competing complexity needs between most of the community and 2U
2023-02-15
[andy] 2u infra / edx infra comparative service spin up time
Takes about 2 weeks for edX
Cookiecutter makes the template fast
There’s a bunch of manual configuration lookup settings
Terraform has to be done somewhat by hand
Instructions were still a bit sparse at the time
Takes about 1 day for “2U” (mostly self-service)
Does a lot less, but up much faster
This was for a Node project in k8s by someone with experience doing this
Looking to see if there’s a doc for that process
Doesn’t need to connect to the LMS or other Open edX resources, which simplified things a lot
We should review the instructions to identify the manual and/or tricky parts
[quest] (Jeremy) What are the best resources you’ve seen for learning the basics of Kubernetes and when/why you should choose it?
[quest] (Jeremy) What are good next steps to put all the things in k8s?
2023-02-08
[inform] (Jeremy) I wrote a first draft of https://openedx.atlassian.net/wiki/spaces/AC/pages/3660316693 , feedback welcome
[praise] (Alex) I love this table of contents!
[inform] (Jeremy) Also wrote https://openedx.atlassian.net/wiki/spaces/AC/pages/3664904195 , again feedback welcome
(Andy) There needs to be some incentive on the project team side to follow this process.
(Robert) Maybe this needs an injection into our OGSPs.
(Jeremy) Northwards, the sentiment is that engineering managers should do this triage.
(Jeremy) From “Making Work Visible” - often, not enough importance is placed on “maintaining revenue”; we often err too much on the side of “generate new revenue”.
(Phil) Discusses the nature of sec working group dropping work into other teams’ backlogs and the importance of clarifying the prioritization of that work - often Sec WG starts a thread in a slack channel and it gets treated as a CAT-1, where in actuality, it should be treated as a CAT-3.
(Phil) Can we triage SEC issues onto this maintenance board?
(Jeremy) I hope so, but we’re still collecting feedback on this board/process. Hopefully this board allows us to “see the problem of everything being stuck” and come up with solutions to un-stick us.
(Robert) Is reminded of the necessity of making PRs as small as reasonably possible to increase the probability that it’s properly reviewed by an owning team.
(Jeremy) On the flip side, there’s so much overhead in getting attention onto the PR that it incentivizes jamming more work into a PR (example feedback from Open edX community).
[quest] (Alex) How do we talk about “big things”?
Examples:
(from Robert) “Our marketing site, purchasing capabilities, enrollment handling, etc. all came from a legacy view of the world where the Open edX LMS was the home of all courses. This is no longer true, at least from a non-technical perspective. What conversations and efforts are happening around what the long-term capabilities and boundaries will be? I know we’ve made lots of short-term decisions to just get data to the places that were already feeding the marketing site as quickly as possible. I don’t think that is the ideal long-term solution, but wondering where and if this is being discussed.”
The LMS user identity OEP is another good example - we found a bigger solution that was generally preferable, but we stuck with a kind of local maximum instead (the existing numeric LMS auth.user.id)
(Jeremy) People have limited short term memory, and it’s hard to reason about big, complex things. But we have very good visual processing centers in our brain. So make a visual representation to facilitate a good discussion about “big things”.
(Alex, John) Having the best picture you can up front, discussed synchronously with a somewhat small group of people has worked well for us very recently (and probably historically, too).
(John) The nature of micro-web services gives us constraints that actually make problems more approachable. Good microservice design can help promote team behaviors that reinforce autonomy, ownership, etc.
(Andy) Having the big picture and then farming out pieces of the whole to smaller teams works well when there’s a designated lead who’s in charge of the big picture.
It’s difficult to balance this for the lead and the teams - teams want autonomy and agency, and often leads would prefer the teams to be autonomous (either altruistically or lazily).
(Alex) How do we support leads on projects like this? Is there a doc or something? Would there be value in having an artifact like this? E.g. “who has authority to decide if X is a reasonable thing to do, and can you do it now? Can your team make a local decision about X now? What does ownership mean in terms of responsibility?”
https://github.com/edx/wg-arch-process/issues is a good place to propose the construction of an artifact like this.
(Robert) [Alex missed this, but it seemed important, please fill in if you can…something about docs or process Julie was putting together?]
(Everyone, always) plug for using Pact - if these tests break, we know you’re breaking your promises as an owner.
(Jeremy) QA team is starting to talk to Vanguards about expanding usage, talking to Arch-BOM and Arbi-BOM to find more partners to actually use it.
(Jeremy) Threatens again to have a Pact representative come talk to us.
(John) Being consensus-driven often makes us want to “get it completely right” up-front. But no matter what, once we start writing and deploying code, we’re going to realize we were wrong about assumptions and decisions, but that’s why we iterate.
(Everyone) Notes that we have one or two docs from Andy and Robert each about being a lead (or subsets of that).
(Andy) One major benefit of declaring a lead: making clear the things we will _not_ do. Committees are far more apt to say “yes” to too much.
(John) Do we have too many “dotted lines”? Are our hierarchies too gentle? Who is going to make the strong-handed, hard decisions? And are those people managers?
[quest] (John) There’s also lots of little things where we come up with good ideas to address them, but they get lost or stalled. How do we make the outcomes of the discussions actionable and then get them done?
2023-02-01
[inform] (Jeremy) Feedback and assistance welcome on Dev Environment Features
[note] This is comment-only! Added suggestion as a comment.
[inform] (Jeremy) Maintenance board format updates: https://github.com/orgs/edx/projects/17/views/1
[Robert] Does it make sense to also start using this for product work that requires cross-team review?
[Jeremy] Probably, especially if the volume or latency of that work increases
[inform] (Ned) repos can now get external PRs added to GitHub projects
At least 4 teams are using GitHub Projects actively to coordinate with other Open edX community orgs
[inform] (Robert/John) Web requests greater than 1 minute are erroring, even though they don’t look like it in New Relic.
Load balancer is returning an error code to the user, but New Relic doesn’t see it
[ideation] (Phil) Does anyone know of good API spec authoring tools?
Phil: Is doing this in a Google Sheet, and it is getting unwieldy
Alex D.: Capturing in .rst as an ADR for the repo where the API lives
Benefit: reviewable, can use Git for version control
Robert: There is a list version of tables for rST files that is much easier than the default.
Jansen: In some places we have some OpenAPI code annotations that do autodoc.
Alex D.: There is also a library called drf-spectacular that will let you annotate in a cleaner way.
Andy: Stubbing out the code is probably easier. Fan of writing a doc, but keeping it very small. People don’t read documentation, so it’s better to do in code.
Alex D.: Benefit of ADR code - you’re basically writing the API if it’s default enough, and you will make decisions, which is magic since it’ll be in the ADR already.
Alex D.: Answer may vary by use case. It’s always hard when multiple teams are involved.
John: We’ve been thinking about using Pact! There’s been a previous all hands on this. It will auto-create the tests for interfacing with external APIs.
Jeremy: Gave a mini-Pact 101. See existing work in https://2u-internal.atlassian.net/wiki/spaces/IM/pages/18973253/Contract+Testing+Automation+Flow . We’re still learning how to integrate this at edX. The QA team is the main owner right now.
Phil: Thanks, Alexander Dusenbery, Jansen Kantor, John Nagro, Jeremy Bowman, and Andy Shultz!
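One way to picture the "stub out the code" suggestion from this thread is a minimal sketch in which a typed stub is the single source of truth and a reviewable schema is derived from it. This uses only the Python standard library; it is not drf-spectacular or any tool named above, and the `EnrollmentResponse` model, its field names, and the `schema_for` helper are hypothetical illustrations.

```python
from dataclasses import dataclass, fields
from typing import get_type_hints

# Assumed mapping from Python primitives to OpenAPI-style type names.
_OPENAPI_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


@dataclass
class EnrollmentResponse:
    """Hypothetical response body; the field names are illustrative only."""

    course_key: str
    username: str
    is_active: bool


def schema_for(model) -> dict:
    """Build an OpenAPI-style object schema from a dataclass stub."""
    hints = get_type_hints(model)
    return {
        "type": "object",
        # Every field of the stub is treated as required in this sketch.
        "required": [f.name for f in fields(model)],
        "properties": {
            f.name: {"type": _OPENAPI_TYPES[hints[f.name]]} for f in fields(model)
        },
    }


if __name__ == "__main__":
    print(schema_for(EnrollmentResponse))
```

Because the stub lives in the repo, changes to the proposed API show up as ordinary diffs in review, which is the same benefit noted above for keeping the spec in an ADR.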
[ideation] (Jeremy) Any particular technology or process pain points you’ve been feeling recently?
[John] Lack of distributed tracing
[Robert] Not having per-service configuration-as-code New Relic settings
[Robert] See [Observability] Enabling distributed tracing for LMS workers #174
[Jeremy] Lack of clarity around who’s responsible for keeping PRs moving from creation to deployment
Have added “Author Team Review”, “Owner Review”, and “Approved” columns to some Kanban boards to shed light on this
Otherwise, not many frustrations it seems
2023-01-25
[inform] (Jeremy) Making Work Visible (Summary)
[very quick inform] (Ned) Hackathon?
[inform, request for feedback] we’re planning b2c subscriptions, please see this tech spec if you’re interested in providing feedback B2C Subscriptions - Programs MVP Tech spec
[analysis] (Jeremy) Maintenance workflow automation
[request for feedback] (Ned) roadmap items for continued decoupling/integration/tcril-etc
What automation would help you?
What information are you lacking?
[quest] (Robert) As the rest of 2U moves onto our Confluence wiki (https://2u-internal.atlassian.net), do we want any best practices documented?
[David] Do we have any best practices? (half joke, half serious)
Be sure permissions are what we want (permissive)
[quest] (Alex) What is our current software development methodology? Like, are we still an “Agile” organization?
2023-01-18
*** [Phil] [quest] Poll on ticket creation & different practices
Who is authorized to create tickets?
How do you add tickets to other teams’ backlogs?
How do your tech leads approach ticket creation?
How long do your teams spend in grooming?
[Ned] Jira and/or GitHub Issues?
[Phil] Both
[Andy]
Everybody should be able to create tickets.
Tech lead lays out tickets, but follow on tickets are very common.
[John]
Product Manager helps prioritize.
All new tickets land into “For Grooming”.
Acceptance criteria definition is part of grooming.
Also agree anyone should be able to create ticket
[Jeremy]
Previously, have seen teams accumulate cumbersome piles of tickets.
Maybe deferring tickets to get them out of the main backlog is the right solution.
[Chris]
Is there a way to prioritize/sort GitHub issues?
[Ned]
Issues in projects can be in separate columns
Can have priority fields
[John]
What is the specific pain point?
[Phil] Spending too much time grooming.
Grooming is really a ROM (rough order of magnitude).
Pointing is S/M/L.
[Robert]
Discovery & bug tickets are good for uncertain efforts and can save you grooming/research time.
[Andy]
Philosophy: more tickets are better if the purpose of the ticketing system is work tracking.
This means that you need to reduce ticket creation overhead. Tickets should not be hard to create.
[John]
Linking tickets is also really powerful.
[Robert]
For tickets for other teams: CRs are a good way to create tickets for other teams.
**[Ned] (inform) Hackathon -> #interest-hackathon
Join if you’re at all interested!
** [Jeremy] https://github.blog/2023-01-17-git-security-vulnerabilities-announced-2/
[Ned] GitHub blog posts are a fantastic source of information about updates to Git core.
Subtopics:
Clients
Users may not have admin rights
Users may have multiple copies of Git on their machine
Servers
[Ned] Follow-on: implicit credentials?
Would love ideas on how to make local Git need more authentication to access his GitHub account
[John] Hardware keys are one way to do this.
**[Jeremy] https://github.com/wemake-services/django-test-migrations has a migration name check (to make sure you gave the migration a more useful name than 000x_auto_<timestamp>) and some functionality for testing forward and backward migrations. Does that sound useful enough to try it out?
Consensus: yes
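To illustrate the kind of name check described above, here is a minimal standard-library sketch. It is not the django-test-migrations implementation; the `auto_named_migrations` helper is hypothetical, and the pattern assumes auto-generated names follow the `000x_auto_<date>_<time>` shape with an 8-digit date and 4-digit time.

```python
import re

# Assumed shape of an auto-generated migration name, e.g. 0002_auto_20230117_1530.
_AUTO_NAME = re.compile(r"^\d+_auto_\d{8}_\d{4}$")


def auto_named_migrations(names):
    """Return the migration names that still carry the auto-generated form."""
    return [name for name in names if _AUTO_NAME.match(name)]


if __name__ == "__main__":
    found = auto_named_migrations(
        ["0001_initial", "0002_auto_20230117_1530", "0003_add_enrollment_index"]
    )
    print(found)
```

A check like this could run in CI against the migration file names in a repo and fail the build when the returned list is non-empty.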
*[Jeremy] If you haven’t already reviewed https://openedx.atlassian.net/wiki/spaces/AC/pages/3615850497 , please do so
Test factories
[Ned] (inform) 2U is not in our wiki
[Jeremy] Maintenance GitHub Project board: https://github.com/orgs/edx/projects/17/views/1
[Ned] Why is it private?
[Jeremy] Leftover from when I was collecting initial feedback, just made public
[Feanil] CFP Closes next Monday
[Ned] Munch & Learn on New Relic at 10!
2023-01-04
[John] [inform] Quick inform about indexes with Aurora.
John may update https://openedx.atlassian.net/wiki/spaces/AC/pages/23003228#EverythingAboutDatabaseMigrations-Howtoaddindextoexistingtable(AWSAurora)
non-blocking index creation with Aurora2/Mysql57 https://github.com/openedx/edx-enterprise/pull/1693/files
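As a rough illustration of what non-blocking index creation can look like on MySQL 5.7, the sketch below builds an `ALTER TABLE` statement with the online DDL clauses `ALGORITHM=INPLACE` and `LOCK=NONE`. It is not taken from the linked PR; the `online_add_index_sql` helper and the table, index, and column names are hypothetical, and whether a given index change qualifies for online DDL on Aurora should be confirmed against the wiki page above.

```python
import re

# Conservative identifier check so names can be safely backtick-quoted.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def online_add_index_sql(table, index_name, columns):
    """Build an ADD INDEX statement that requests in-place, lock-free DDL."""
    for identifier in (table, index_name, *columns):
        if not _IDENTIFIER.match(identifier):
            raise ValueError(f"unsafe identifier: {identifier!r}")
    column_list = ", ".join(f"`{column}`" for column in columns)
    return (
        f"ALTER TABLE `{table}` ADD INDEX `{index_name}` ({column_list}), "
        "ALGORITHM=INPLACE, LOCK=NONE"
    )


if __name__ == "__main__":
    # Hypothetical table and column names, for illustration only.
    print(online_add_index_sql("enterprise_example", "example_user_idx", ["user_id"]))
```

Stating both clauses explicitly means MySQL returns an error if the operation cannot be done in place without locking, rather than quietly falling back to a blocking table copy.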
[Ned] Last plea for conference talk ideas: https://docs.google.com/document/d/1nBW_uS7KSjFNq1K_sjkv8IliadcUiDc06HqCo4DIjas/edit#
[Phil] [inform] In re. finding DB performance information in New Relic, Chris Pappas showed me a couple of these recently! (Led to this. All links for Ecommerce.)
APM/Databases/Slow SQL Traces: List of slow queries for a service: https://onenr.io/0PwJY5LJOQ7
APM/Databases/Slow SQL Traces/Query Analysis: Links on what to improve in a slow query: https://onenr.io/0BQ1Z5bqlwx
APM/Transactions/Transaction Traces: List of slow views/transactions for a service: https://onenr.io/0Vwg1ann0jJ
APM/Transactions/Transaction Traces/Database queries: Suggestions for how to improve the index: https://onenr.io/0ZQWamG3MRW