Arch Hours: 2023

Meeting Expectations

Why?

  • Provide an opportunity for generative discussion and ideas.

  • Foster camaraderie through technical curiosity and geekdom.

Who?

  • Open to all edX-ers and Arbisoft-ers

What?

  • At times, these informal discussions result in follow-up action and beneficial change in our technology or in our organization. While this is not a decision-making body, these serendipitous discussions spark ideas that may result in ADRs/OEPs and tickets on team backlogs.

  • At times, it serves as a form of informal office hours to ask live technical questions of the archeological collective.

  • At times, we have pre-planned deep-dive topics that folks propose to gather wide input or to answer questions.

  • At times, we have hosted special guests (internal and external to edX) on specialized topics.

When?

  • Outside lunch hour in the ET time zone: with remote work during Covid, “Arch Lunch” evolved into “Arch Hour” to accommodate various home/life situations around lunchtime.

How? Live Co-Editing

To circumvent Confluence’s limitations with the maximum number of concurrent editors:

Why not just stick with keeping the notes in the Google doc?

  • Google docs are not as discoverable.

  • Google docs don’t notify observers of future edits.

  • Google doc comments don’t notify all observers.

How? Structure

Please enter your proposed topics for discussion.
When we use Lean Coffee style (link1, link2), we vote on which topics the group wants to discuss and time-box each discussion to 10 or 15 minutes → 5 minutes (if re-voted) → 5 minutes (if re-voted).

Prefix your topic with your intention so it is clear what outcome you are seeking from the discussion. Examples:

  • [inform] You are simply seeking to inform the group of this item. You may field clarifying questions from the group on your inform, but you are not seeking further discussion at this time.

  • [ideation] You are seeking divergent and wide perspectives from this group. In this brainstorming mode, all ideas are accepted, without critical analysis.

    • It may be helpful to clarify whether you’d like to ideate on the problem space or the solution space.

  • [analysis] You are asking the group to help you poke holes in your idea/topic/plan/etc.

  • [quest] You are seeking information/responses to a question you have.

2023-12-20

2023-12-13

  • [quest] (Dave): How’s the MySQL 8.0 switchover going?

    • Scheduled for 2am tonight

  • [quest] (Dave): What’s a good way to roll out database connection encoding changes to Studio for ? (configuration repo help)

    • Dave to make issue in configuration repo (?) to track this.

    • This may be overridable in edx-internal (which would allow for faster rollback)

    • Jeremy will see if we want to turn on Issues there

  • High-level, external-safe discussion of recent 2U staff meetings

  • [ideation] (Jeremy) 2U executive management has been talking a lot about the “Open edX maintenance burden”.  Where are teams feeling this, and how much time is it taking?  What parts of it would we actually be comfortable handing over to other organizations to handle?

    • Fixing bugs?

    • Merging dependency upgrade PRs?

    • Big framework upgrades?

    • Roadmap decisions?

    • Reviewing changes from outside the core owning/maintaining team?

    • Deprecating stuff that’s no longer useful?

    • Building extension points so optional features can be added without being added for everyone?

  • [quest] (Jeremy) Are there more things like the Insights stack that are effectively only used by 2U, and should be deprecated as far as Open edX is concerned?

    • And how should we proactively identify things like this moving forward?

    • (Andy) As of a year or so ago, there was at least one other Insights user

  • [quest] How deep are architecture vendor commitments?  What failover features are there?

2023-12-06

  • [Ned] What is bad about installing dependencies from GitHub?

  • [Jeremy] What would people want to see from an Open edX maintenance working group?

    • Expertise about how to use Dependabot and Renovate

    • “Error budget” for teams is useful

    • Test suites that are comprehensive enough and fast enough to run

    • Standard way to avoid trying to upgrade to a known-broken release again somewhere else

    • Clear path from identification of unwanted dependencies to deprecation of them

    • Path to consistently applying new good code patterns across our codebase

  • [Jeremy] What are leading causes of delayed/lost PR reviews on your teams?

    • [Andy/Cosmonauts] Big, intimidating PRs and lack of context after an ownership transfer

    • [Chris] Variety of different reasons

    • [Hilary] PRs that span ownership boundaries

    • [Andy] High/unclear level of responsibility that comes with approving a PR

2023-11-30

  • [inform] (Dave) Submit Open edX conference talks!

  • [AS][Wild Speculation] Are we in a startup runway situation now?

    • Should we consider more extreme short-term measures than we normally would, to minimize operating costs in the meantime?

    • Axim is coordinating a pile of funded contribution (FC) projects to make various improvements already; not a lot of bandwidth for new feature requests right now.

    • We spent a lot of time discussing this; it’s hard to distill into key points

2023-11-15

  • [inform] (Jeremy)  

  • [musings/questions] (Ned) Mapping people/squads/repos

  • [Question] (Hilary) If we’re trying to design a new database for a model, who is the best person/people to talk to?

    • Solution Review

    • Dave Ormsbee at Axim

    • Many principal/staff engineers

    • No real DBAs, but…

    • At least for a while, we have a part-time contract DBA via Percona

    • Also, consult  

  • [inform] (Jeremy) Socket.dev - tool for tracking dependency health 

  • [quest] (Jeremy) Awareness of Django’s transaction.on_commit() and similar tools which are incredibly useful in the right circumstances.  How do we make sure developers are aware of / informed of these when appropriate?

    • New “everything about concurrency” guide?

      • Celery

      • Transactions

      • Event bus

      • Django async

  • [inform] (Alex) Enterprise likes drf-spectacular

2023-11-08

  • [inform] (Dave) New Relic is a great resource!

  • [quest] (Dave) There are multiple places where it would be beneficial to add indexes in large tables (~10-100M+ rows?). I don’t think they’ll lock anything, but would the length of time it takes to run the migration be disruptive to the release process? What’s a good path forward so we don’t take edx.org by surprise?

    • Please hold off until MySQL 8.0 update has completed.

    • Long term: Need to do something about partitioning CSM.

  • [ideation] (Dave) Serving assets with Caddy (and maybe nginx?) via X-Accel-Redirect.

    •  

      • Dave: FWIW, I wasn’t planning to use this. I think folks have standardized on the header since this was written, it doesn’t seem like it’s completely up to date, and I’m planning to do a lot of header tweaking anyhow.

    • Can make the container run Caddy and have that work in devstack as well.

    • Actual nginx configuration still happens via the configuration repo–can talk to TNL for testing help.

  • [question] (Ned) Why are there so many old Renovate pull requests open?

    • 86 created before 2023: https://github.com/pulls?q=is%3Apr+is%3Aopen+author%3Aapp%2Frenovate+org%3Aopenedx+created%3A%3C2023-01-01

  • [question] (Jeff) Any special security considerations re browser extensions?

  • [question] (Jeff) AGPL / borrowing Canvas code concerns?

  • [question] (Jeff) How best to query for the presence of SRT captions files for videos?
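For Dave’s X-Accel-Redirect idea above, the application’s job shrinks to authorization: it returns an empty response carrying an X-Accel-Redirect header, and the proxy serves the file from an internal location. The stdlib WSGI sketch below assumes a hypothetical INTERNAL_PREFIX, URL layout, and user_may_download() check. The proxy has to be configured to honor the header (for example, an nginx location marked internal, or a handle_response block in Caddy’s reverse_proxy); without that, clients just receive the empty body.

```python
# Hypothetical internal prefix that the proxy maps to the real asset storage;
# clients never request this path directly.
INTERNAL_PREFIX = "/protected-assets/"


def user_may_download(environ, asset_name):
    # Placeholder authorization check (assumption): a real service would look
    # at the session, course enrollment, roles, etc.
    return environ.get("HTTP_X_DEMO_ROLE") == "staff"


def asset_app(environ, start_response):
    """WSGI app that authorizes a request, then delegates byte-serving to the proxy."""
    # Hypothetical URL scheme: the asset name is the last path segment.
    asset_name = environ.get("PATH_INFO", "").rsplit("/", 1)[-1]
    if not user_may_download(environ, asset_name):
        start_response("403 Forbidden", [("Content-Type", "text/plain")])
        return [b"forbidden"]
    # Empty body: a proxy configured to honor X-Accel-Redirect performs an
    # internal redirect to this location and streams the file itself.
    start_response("200 OK", [("X-Accel-Redirect", INTERNAL_PREFIX + asset_name)])
    return [b""]
```

The appeal is that the Python worker is freed immediately, while large files are streamed by a process built for it.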

2023-11-01

  • [inform] (Ned) GitHub pull request labels can make Jira issues

    • Get in touch with Ned if you want to enable this for specific repo/project pairs

    • TNL & Aperture are testing it out

  • [inform] (Andy) Similar to the edx-search research, what’s going on with enrollments

    • Just edit it if it’s wrong 🙂

  • [quest] (Ned) Where are we on Cypress testing?

    • We have a functional (although sparse) e2e smoke test suite; it wasn’t much easier to set up than bok-choy, but it does seem easier to maintain

    • There have been mentions of expanding the smoke tests and/or using it for a11y CI checks, but no concrete plans yet

    • Requires JS experience and a review of the “lessons learned” docs we have so far

  • [quest] (Jeremy) Do people think Conventional Comments are a good idea?

    • Some enterprise engineers at 2U are debating it:  

    • Robert does something like this already, a few others do at least sometimes

    • People seem generally up for trying it

    • Don’t think we want to enforce it, but may be good to suggest it?

    • English is hard (risk of stumbling into the recommended formatting by accident while meaning something else)

    • Arch-BOM uses guidelines for getting the word out about new things

    • May be good just to see some senior people doing it as an example, to see if it starts a trend

  • [ideation] (Jeremy) Mitigating pressure for 2U & Open edX code to diverge

    • 2U concerns

      • Security of releases

        • Including regulatory compliance

      • Ability to deploy changes quickly

      • Extra deprecation overhead (relatively minor point)

    • Axim/Open edX concerns

      • Codebase clean of 2U-specific cruft

    • Shared worries

      • Time needed to reconcile divergent branches

      • Risk of permanent divergence resulting from logistics rather than intent/benefit

    • Ideas

      • Try to make sure that core committer merge process satisfies core regulatory requirements: require at least one non-author approver, have at least a PR (if not a separate issue) to track the rationale for each change, etc. (this may also benefit companies other than 2U in satisfying regulatory needs)

      • If necessary, use fast-forward-only release branches for edx.org that lag main/master just enough to perform any mandatory review for commits from non-2U-employees/contractors (but do we really need this, given that we don’t do it for other dependencies?)

      • Document and expand the range of options for quickly getting changes to production via config changes, extension points, temporary forks, etc. that don’t require putting 2U-specific code in Open edX
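The first idea above (at least one non-author approver, plus a written rationale) is simple enough to check mechanically. A sketch, assuming reviews arrive in chronological order with GitHub-style states such as APPROVED, CHANGES_REQUESTED, and COMMENTED; Review, has_non_author_approval(), and meets_merge_policy() are hypothetical names, and a real check would fetch the data from the GitHub API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Review:
    reviewer: str
    state: str  # GitHub-style: "APPROVED", "CHANGES_REQUESTED", "COMMENTED", ...


def has_non_author_approval(author, reviews):
    """True if someone other than the author currently approves the PR.

    Assumes `reviews` is in chronological order; each reviewer's latest
    approving or blocking review wins, and comment-only reviews are ignored.
    """
    latest = {}
    for review in reviews:
        if review.state == "COMMENTED":
            continue
        latest[review.reviewer] = review.state
    return any(
        state == "APPROVED"
        for reviewer, state in latest.items()
        if reviewer != author
    )


def meets_merge_policy(author, reviews, description):
    # Hypothetical policy: an independent approval plus a non-empty PR
    # description recording the rationale for the change.
    return has_non_author_approval(author, reviews) and bool(description.strip())
```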

2023-10-25

Skipped so 2U developers could stay focused on Innovation Week (hackathon).

2023-10-18

  • [inform] (Ned) Innovation Week!!1!

  • [quest] (Jeremy) Does anyone have a security/vulnerability scanner they would recommend?  Dependabot can’t track both main/master and a release branch.

  • [informish] (Andy) edx-search research (also 2U-specific); maybe we should do more of this. Looking at enrollments now.

  • [Question] CourseOverview has an org string field, not an FK to Organization. Any insight into the architectural reasons for this?

    • CourseOverview predates the organization table, and is essentially a cache of data in MongoDB

2023-10-11

  • [quest] (Ned) Do people understand that Core Contributors are allowed to merge without 2U approval?

    • It seems at least some people didn’t realize this

    • Maybe people like Virginia who are new to 2U and/or Open edX aren’t aware of this either?

  • [inform] (Jeremy) Maintenance Working Group launch preparation

    • Let Jeremy know if you’re interested in joining

  • [quest] (Jeff Witt) A11y in CI – opinions on how to implement?

    • WCAG 2.2 was released last week

    • A few a11y issues have made it into production that shouldn’t have

    • We used to have a minimal suite of automated a11y tests, but it was very hard to maintain and rarely caught problems, so we got rid of it

    • Tools have improved over time, we’re about due to pick one to employ for CI

    • We were using axe-core which is still popular, but may be going closed-source soon; there are other options now also

    • Jeremy’s notes on tools, etc. from earlier discussions around this:

    • Shifting left on catching a11y issues really reduces the cost of compliance

    • We aren’t doing the annual a11y training anymore, but we plan to roll it out again for at least the most relevant personnel

  • [quest] (Jeremy) Tech stack consolidation - what do you think we should try to get rid of, and in favor of what?

    • https://backstage.techdev.2u.com/ has a comparison of the 2U vertical tech stacks & processes as of shortly after the edX acquisition (2U-private, it’s on the main page after authenticating via GitHub)

  • [quest] (Jeff) If I set up a test user in a production course, does that muck with financial reporting?

    • (Matt) Enterprise has some test user capabilities set up for testing integrations and such

2023-10-04

  • [quest] (Dave) How is the MySQL 8 upgrade coming along?

    • (Jeremy) Largely done except for prod LMS, which is rerunning a days-long mitigation schema change

    • (Jeremy) We just switched devstack over

    • (Jeremy) Django 4.2 LMS upgrade involves password hash change, Arbi-BOM discussing options with BTR

  • [quest] (Dave) Is 2U using OrbStack now?

    • (Jeremy) Several individuals are, still going through vendor review for broader adoption

  • [quest] (Jeremy) Does anybody have thoughts on gaps in Kubernetes for use as a dev environment?

    • (John) Test data is needed regardless of which dev environment we go with

2023-09-27

  • [inform] (Ned) 2U has signed the Axim non-technical contribution agreement.

  • [inform] (Ned) Next 2U hackathon (“Innovation Week”) is Oct 23-25

  • [inform] (Jeremy) 2U vendor review triggers & consequences

    • Install things in a virtual machine with no VPN access to test them?

  • [quest] (Jeremy) What technology/tool do you think would best help you get good work done, but we just aren’t set up to use it yet?

    • (Matt) AI coding assistants?

      • (Jeremy) There are interesting and unanswered questions around inadvertent copyright infringement and copyrightability of the generated code

        • (Andy) Solvable via custom LLM trained on our own code?

      • (Hilary) Would this exacerbate the problem of not having enough people with deep knowledge of different parts of our systems?

      • (Matt) Maybe this will help us get knowledge bases that actually work

    • (Andy) OrbStack!

    • (Andy) Type hinting?  TypeScript?

    • (Hilary) prop-types in JavaScript?

    • (Matt) Cloudflare AI tools

2023-09-20

  • [Ned] Axim would like to remove admin access to repos (this is different from the write access topic I mentioned last week).  What concerns do people have about that?

    • [John] Is it likely that we’ll end up with a bunch of edx-org forks as a result of this?

      • [Ned] Feels unlikely at this point, lots of extra work

      • [Ned] More likely that we start using personal forks more often

    • [Ned] When do people really want admin access, anyway?

      • [Jeremy] Updating branch protection (required CI checks)

      • [Andy] Initial repo setup

  • [David] 2U Internal Marketplace update

    • [Jeff] Will the frontend be using Paragon or something else?

    • We think we’ll still have something like commerce coordinator, but it may have a more limited, appropriately scoped API than originally envisioned

    • [Jeremy] Arbi-BOM noted that we’ve been wanting to get rid of django-oscar & ecommerce for 2 years, now needing to upgrade them again for another Django upgrade

  • [quest] (Hilary) API access

2023-09-13

  • [inform] (Hilary) OEP-66 PR up for review

    • Includes the relevant bits of OEP-9, drops or updates others

    • Bridgekeeper vs. rules

  • [quest] (Jeremy) Has anyone worked with Kompose or any other tools for 1) migrating from docker-compose to k8s or 2) allowing deployment of custom subsets of related microservices in k8s?

    • Not in this crowd

  • [question] (Mat) Infinity is looking to understand whether Notifications belongs in edx or openedx. There hasn’t been much clarity on where this feature should eventually land.

  • [question] (Dave): The cut-off for Quince is coming soon (October 9th). Will we be on Django 4.2 for edx-platform? MySQL 8.0?

    • Devstack should be switching over to MySQL 8 within a week

    • Code is largely ready for Django 4.2, last PRs are being finalized and merged

    • Trying to get edx.org on MySQL 8 before updating the default requirements

    • If we miss end of September for that, we’ll try to make Django 4.2 the default with alternate 3.2-using requirements files for edx.org to use (annoying to implement in edx.org deployment pipelines)

    • BTW:  

  • [inform] (Ned) Long-term, we are going to reduce 2U’s access to all repos and give teams write access to the repos they need.

  • [request] (Ned) Please educate new devs about the public aspects of much of our code.

    • Branch naming

    • Avoid links to private data

2023-09-06

  • [quest] (Robert) It seems the LMS auth_user table doesn’t have a modified date.

    • Context:  

    • Is there a way to determine this?

    • If not, what’s the best way to resolve this at our scale (at least going forward)?

    • Django offers support for custom user tables, but edx-platform predates the feature, and it would be a fairly painful switch at this point

    • We send tracking logs on most field changes, but not all

      • first_name and last_name are ignored, for example

    • How about django-simple-history?

      • Would give us a full audit record moving forward, but likely to be massive (and probably overkill)

    • How do post_save and post_commit interact?

  • [quest] (Deborah) Is there a documented general principle about what log level to use?

    • Different teams have logs in Splunk, New Relic, and/or DataDog

    • If we did a major change of log formatting, we’d probably want to talk to the community first

    • We should probably have some kind of OEP/ADR for Python logging in general

  • [inform/quest] (Robert) How can we quickly determine the actual impact of a category of error?

    • Number of users impacted, scope of impact for each user, etc.

    • Starting to discuss with Data Engineering

    • https://onenr.io/0kjnpPZ56wo

      • Can this be tied to course completion?

    • May be a good case study for engineering noticing a problem and trying to make a call on how much work to put into getting resources allocated to fixing it
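On Deborah’s log-level question above: until there is an OEP/ADR, the Python logging HOWTO’s guidance is a reasonable baseline. DEBUG is for diagnostic detail, INFO confirms things are working as expected, WARNING flags something unexpected that was still handled, and ERROR means an operation could not be performed. The logger name and enroll() function below are hypothetical; the sketch only shows how those conventions map onto code.

```python
import logging

logger = logging.getLogger("example.enrollments")  # hypothetical module logger


def enroll(user_id, course_id, enrolled, capacity):
    """Toy enrollment function annotated with the level each message uses."""
    # DEBUG: detail that is only useful while diagnosing a problem.
    logger.debug("enroll called user_id=%s course_id=%s", user_id, course_id)
    if user_id in enrolled:
        # WARNING: unexpected, but the request was still handled.
        logger.warning("duplicate enrollment ignored user_id=%s course_id=%s", user_id, course_id)
        return True
    if len(enrolled) >= capacity:
        # ERROR: the operation could not be performed.
        logger.error("course full, enrollment rejected user_id=%s course_id=%s", user_id, course_id)
        return False
    enrolled.add(user_id)
    # INFO: confirmation that things worked as expected.
    logger.info("enrolled user_id=%s course_id=%s", user_id, course_id)
    return True
```

Passing values as arguments (rather than pre-formatting the string) also lets log aggregators group messages by their template.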

2023-08-30

  • [inform] (Jeremy) We are seeking good topics for the edX engineering blog and/or volunteers to write about them

  • [quest] (Jeremy) Should we consider piloting structlog somewhere?

  • [quest] (Phil) Does anyone know anything about Next.js?

    • Phil:

      • I know relatively little about Next.js; any advice? Have we used it at edX before? Does it work well with our current stack? Are there any good learning resources out there?

      • Looking at: django-nextjs, a Django app plugin to enable a Django IDA to serve Next.js server-side code [blog post]

      • Apparently exists to allow Django templates and Next.js templates to co-exist

      • (Jeremy) FED-BOM did a little discovery on this and related tooling in (we decided at the time to punt until some of the alternatives mature a bit)

    • Jeremy: Related issue:

    • This is more of a question for Frontend WG/Paragon WG.

    • On the open-source side, OEP-11 is the guiding document that may need amendment for Next.js.

  • [quest] (Jeremy) In light of Deciphering Glyph :: Get Your Mac Python From Python.org, should we make any updates to recommendations for installing Python on macOS?

    • (flame): pyenv!

    • Current documentation: 

    • Summary: no real need to change anything here, either pyenv or official installers should work for most people at 2U / working on Open edX

  • [quest] (Dave) Any thoughts on porting forums service to a Django app? (I know there’s Infinity discovery around this, but I’m curious if others had also looked at this problem, or if there were other discussions not captured in those docs.) Context is that Axim is considering funding something here.

    • We’d love to see this get done

    • Diana: concern about data migration

      • Dave: Phase 1 would keep the data in place and start moving towards removing the Ruby aspect of things. Any potential data migration would happen after that.

    • Would alleviate (admittedly modest) security concerns around Sinatra and dependency gems

    • Would greatly reduce barriers to making forums enhancements

  • [quest] (Jeff) MathJax version 3 OSPR received; could we roll this out course-by-course?

    • Also, native browser rendering or MathCAT may be preferable in some cases

    • (Dave) Maybe go to the Product Working Group to plan out the per-course rollout capability?

  • [quest] (Jeremy) Dev environment direction

    • (Hilary) Think there will still be a need for local long term

      • Intermittent internet connections, for example

    • (Andy) There are some jury-rigged AI configurations that are easier to set up locally

      • But “default to remote, fall back to local” may be where we want to end up

    • (aside) Open edX is featured on OrbStack’s website!

  • [quest] (Hilary) How do people set up new large Open edX sites?

    • Mostly custom Terraform and such, sometimes based off of an AWS solution template

    • Harmony is also a new Kubernetes-based solution for this
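For Jeff’s question about rolling out MathJax 3 course-by-course: the usual shape is a flag with a global default plus per-course overrides, which is roughly what edx-platform’s course waffle flags (CourseWaffleFlag) provide on top of Django. The dependency-free sketch below only illustrates the lookup order; the flag name, course IDs, and bundle names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class CourseRolloutFlag:
    """Hypothetical per-course flag: a global default plus per-course overrides."""

    name: str
    default: bool = False
    overrides: dict = field(default_factory=dict)  # course_id -> bool

    def is_enabled(self, course_id):
        # An explicit per-course setting wins; otherwise use the global default.
        return self.overrides.get(course_id, self.default)


# Hypothetical flag and course IDs, for illustration only.
MATHJAX_V3 = CourseRolloutFlag("hypothetical.use_mathjax_v3")
MATHJAX_V3.overrides["course-v1:DemoX+Math101+2023"] = True  # opted-in pilot course


def mathjax_bundle(course_id):
    # Placeholder bundle names; real asset paths would differ.
    return "mathjax-v3" if MATHJAX_V3.is_enabled(course_id) else "mathjax-v2"
```

Flipping the default to True later, while keeping explicit False overrides for courses that hit problems, covers the tail end of such a rollout.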

2023-08-23

  • [analysis] (Jeremy) Docker Desktop replacement

    • Wiki page with some analysis:

    • Arch-BOM ticket for continuing investigation:

  • [quest] (Ned) Why are we OK with a 2-hour deployment pipeline?

    • (Andy) It’s worse than that: it’s a 2-hour nondeterministic pipeline

      • in progress

    • (Phil) From GoCD edxapp statistics: 45+20+1+1+15+(2+3+7)=94 minutes

      • The 45 minutes (half the duration) is building the AMI

      • Each number is the average duration of a pipeline step.

      • Steps in parentheses run after the build is available on prod.

    • We don’t have a good basis of comparing even between different pipelines within 2U

    • (Jeremy) We don’t want to continue using GoCD in the long term, which leads to debate on the value of optimizing it (vs. doing work to switch to Argo CD instead)

    • (Jeremy) There are several parallel efforts to reduce edx-platform build time, but the time to value delivery is long doing it that way.  Maybe we should concentrate our efforts a little better for incremental value delivery?

  • [quest] (Alex D.) Any patterns that folks like for data replication between services?

    • [we talked about things]

    • see points about Eventual Consistency

  • [quest] (Ned) What kinds of informal education are useful for developers?

    • High-level block diagram (context/container from C4)

    • Architectural onboarding has fallen by the wayside

    • How code is organized (mono-repo and otherwise)

    • Migrations, what they are and how they go wrong

    • Celery

    • Tour of a new IDA Makefile

    • What counts as “core”

  • [quest] (Adam) How do we get better at smoke tests or health checks so that we can make big infrastructure changes more confidently and catch problems before we ship bugs to prod?
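Phil’s GoCD numbers from the deployment-pipeline discussion above, worked through, assuming the steps run strictly one after another: about 82 minutes pass before the build reaches prod, and the AMI build alone is roughly half of the 94-minute total.

```python
# Average GoCD edxapp step durations in minutes, as quoted in the notes.
before_prod = [45, 20, 1, 1, 15]  # first step (45 min) builds the AMI
after_prod = [2, 3, 7]            # the parenthesized steps, run after prod has the build

total = sum(before_prod) + sum(after_prod)
time_to_prod = sum(before_prod)
ami_share = before_prod[0] / total

print(total, time_to_prod, f"{ami_share:.0%}")  # 94 82 48%
```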

2023-08-16

  • [ideation] (Jeremy) What (if anything) should we do for next-generation automated a11y testing?

    • The QA team is interested in working on this if we want to put effort into it

    • (Jeff) We have something to run axe-core for MFEs, unclear how many MFEs are running it and how often

    • (Jeff) There are services that do automated checks for large sites, but they’re ridiculously expensive

    • (Jeff) Have been trying to pick a tool by the end of the year

    • (Jeff) Also, there’s an external audit in the works

    • Jeremy will connect Jeff and the QA team to try to figure out next steps

  • [quest] (Hilary) - Is anyone interested in being a co-author on an OEP about authz best practices?

  • [quest] (Jeff) - Can you think of any problems if no MathML rendering library is included with Open edX or ? (Chrome and Firefox include MathML rendering now, and I wonder if interactivity might be better as a browser extension for some users)

    • There’s also MathCAT

      • Oh hey, it’s written in Rust

    • Browser support has gotten much better in recent years

    • We haven’t yet done a detailed a11y evaluation of the options

    • Worth looking at using the native browser rendering just for the JS download size savings

    • We spend a non-trivial amount of money pushing MathJax bits to everyone on a lot of pages.

  • [question] (Ned) What does a “Cybersecurity review” entail?

    • Came up in the context of being needed for anything being open sourced

    • (Adam) Sometimes involves pentesting or an external security firm review

    • (Hilary) If you don’t have confidence in the answers to the form, you can get help talking through it

      • Brian M and Purva have done an AppSec review for Trilogy user account things

  • [question] (Adam) Do operators actually need to upgrade to MySQL 8 by the Quince release?

    • If they need to defer the MySQL 8 upgrade, they’ll defer the Open edX upgrade

    • It’s tricky to do a release that supports installation with either Django 3.2 or 4.2

    • Django 4.2 will generate SQL that just doesn’t work with MySQL 5.7

2023-08-09

2023-07-26

  • [analysis] (Ned) Marking problematic PRs

    • Idea: add a label to any PR that required a revert or fix-forward

    • We could look back through revert PRs to catch many of these

    • Maybe QA or Incident Response would be up for marking historical ones?

    • We’d need to update the incident runbook to flag new ones

  • [inform] (Jeremy) Kolo for Django (Django dev/debugging tool for VSCode) 

  • [inform] (Feanil) There is edx-platform API documentation at

  • [quest] (Jeremy) Do people have any eagerness or reluctance to switch from setup.py to pyproject.toml?

    • Came up recently in  

    • (Feanil) I’d like to see it happen so we get more consistent

      • Cookiecutters first (or at least early)

    • (Jeremy) I’d love to get it done, just trying to figure out priority relative to other projects

  • [quest] (Jeremy) Has anybody found a good way to profile the build time of Docker images?

    • “docker build” has basic per-step timings, but they’re kind of lost in the noise

    •  

  • [quest] (Jeremy) Are people ok with a console report of prioritized repo health issues in code they own?  Or is there a strong preference for browser/other UI?
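Ned’s idea of marking PRs that needed a revert could start from existing history. git revert writes a default body line of the form “This reverts commit <sha>.”, and PRs created with GitHub’s Revert button typically say “Reverts owner/repo#123” (treat that second format as an assumption to verify). The sketch below only extracts candidates; mapping a SHA to its PR (for example via GitHub’s “list pull requests associated with a commit” endpoint) and applying a label are left out.

```python
import re

# `git revert` writes a default body line of the form
#   "This reverts commit <full sha>."
GIT_REVERT_RE = re.compile(r"This reverts commit ([0-9a-f]{7,40})\b")
# Assumed PR-body format from GitHub's "Revert" button: "Reverts owner/repo#123".
GITHUB_REVERT_RE = re.compile(r"\bReverts ([\w.-]+/[\w.-]+)#(\d+)")


def reverted_commits(commit_messages):
    """Return the SHAs that a batch of commit messages claim to revert."""
    found = set()
    for message in commit_messages:
        found.update(GIT_REVERT_RE.findall(message))
    return found


def reverted_prs(pr_bodies):
    """Return (repo, number) pairs for PRs reverted via GitHub's Revert button."""
    found = set()
    for body in pr_bodies:
        for repo, number in GITHUB_REVERT_RE.findall(body):
            found.add((repo, int(number)))
    return found
```

Fix-forward PRs leave no such marker, so they would still need to be flagged by hand (or via the incident runbook, as noted above).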

2023-07-19

  • [quest] (Hilary) Where should docs go for Open edX code?

    • (Ned) Try to keep it as close to the relevant code as possible.

      • Common choices include Read the Docs & Confluence

      • But the main thing is to get something written, it can be moved later if appropriate

    • (Robert) (post-meeting) See

  • [inform] (Ned) We’re starting to plan an October Hackathon

    • If you want to get involved, please join #interest-hackathon-planning

    • (Hilary) For some teams, the theme was a dis-incentive because the assumption was that projects needed to be on-theme, no matter how many times the contrary was stated

      • Let’s point to the list of projects that were done last time, to illustrate that other topics are actually ok

  • Adjourned early for lack of topics & low attendance

2023-07-12

  • [ideation] (Dave O) Proposal: Make MinIO a part of the default Tutor / Devstack install, and let Django apps and services assume an S3-like interface instead of having to accommodate any django-storages backend–i.e. drop support for storing that data directly on the filesystem.

    • OEP

    • DEPR

    • How to migrate folks away

    • Check on Swift usage/compatibility

    • Seems better than LocalStack, which was way too big for this use case

    • Look into reliability (link)

      • Don’t force MinIO

  • [quest] (Jeremy) Should we configure and enable ?

    • Have Arbi-BOM try it out, talk to Axim about shared config if it goes well

    • Quick demo of how to find Dependency Review on PRs

  • [quest] (Hilary) Does core functionality belong in the platform or should an IDA be used if the functionality could be considered a complete service?

    • Ownership is much simpler for a separate service, leading 2U to often prefer this

    • Some operators of smaller sites struggle to manage multiple services, and prefer everything critical to be in edx-platform

    • Some things make sense as libraries or plugins that get installed into edx-platform

    • (2U internal)
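Dave’s MinIO proposal relies on the fact that MinIO speaks the S3 API, so Django code can use django-storages’ S3 backend and simply point it at a different endpoint. A settings fragment under those assumptions: the service name, port, credentials, and bucket are placeholders, and on Django 4.2+ the STORAGES setting supersedes DEFAULT_FILE_STORAGE.

```python
# Settings fragment (hypothetical values): point django-storages' S3 backend at
# a local MinIO container instead of AWS S3.
DEFAULT_FILE_STORAGE = "storages.backends.s3boto3.S3Boto3Storage"

AWS_S3_ENDPOINT_URL = "http://minio:9000"    # assumed service name; 9000 is MinIO's default API port
AWS_ACCESS_KEY_ID = "openedx-dev"            # placeholder credentials for local use only
AWS_SECRET_ACCESS_KEY = "openedx-dev-secret"
AWS_STORAGE_BUCKET_NAME = "openedx-uploads"  # hypothetical bucket name
AWS_S3_ADDRESSING_STYLE = "path"             # path-style URLs avoid per-bucket DNS names locally
```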

2023-07-05

  • [rant?] (Andy) We keep getting M1 Macs; how much dev time do we lose before we invest in supporting them?

    • Ongoing discrepancy between people having no trouble and those having nothing but trouble

    • Frontend is one of the areas some people have hit trouble in

    • Fragmentation in services being used is making reproducing problems hard

    • Projects ongoing - ARM images coming, for example

  • [quest] (Phil) service-to-service testing

    • We have Cypress for this at a very small scale, but want to lean into using Pact instead

    •  

  • [quest] (Jeremy B) - adoption next steps

    • We’ll be asking owning teams to start tracking tasks from this board that need attention from them

    • Feedback welcome on whether we should start doing this manually or do some of the proposed automation first

  • [ideation] (David) Reducing risk of deploying edx-platform

    • Requiring reviews on edx-platform - at the very least, an edx.org reviewer is required for an OSPR to go out on code we own within edx-platform?

    • Process for including community members on RCAs - we can have more information on why incidents happened

    • Process for halting incoming commits - should Clamps be extended to restrict the pool of committers to edx-platform?

    • (long-term) Requiring more frequent e2e tests on code shipment - hopefully gives us more confidence to deploy

    • (long-term) Core contributor read-only access to GoCD, or a website built on GoCD’s APIs exposing some info? Mergers are responsible for monitoring build? [eventually leading to a continuous delivery consortium]

  • [quest] (Jeff Witt) How does Transifex import work? Is there a single source of truth?
    See – edX Design team may want to start using it.

2023-06-28

  • [analysis] (David) Web notifications vs. edx-ace

    • Not all stakeholders (including the team doing the work) are present, but I could use some insight into whether we think these two are apples and oranges or not.

    • (John) Braze supports some form of web notifications

    • (John) Braze is handling transactional emails like password resets, not just marketing stuff

    • (John) The thing that we’re missing right now is persistence of the messages that have been sent

    • (Dave) I think Gabe is the only person from the original edx-ace development team who’s still in the ecosystem

    • (Dave) Possibly relevant discussion in #notifications-2023 in the Open edX Slack

  • [analysis] (Jeremy)  

    • (John) PostgreSQL has better online/concurrent (non-locking) index creation

    • (Dave) Limits on length of indexes, etc. measured in bytes instead of characters

  • Any ~opinions~ on chatbot architecture? (Jeff Witt)

    • The accessibility of most chatbots is pretty poor

    • Has any serious thought gone into the frontend architecture for these chatbots?

      • (Andy) Allie has put some thought into it

    • Are we going to make any attempt to do side-by-side comparisons of the different LLM options?

    • (John) It’s unclear how well any of these work in non-English languages
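Dave’s point in the database comparison above, that MySQL measures index length limits in bytes, is easiest to see with utf8mb4, where MySQL sizes a key for the worst case of 4 bytes per character. The sketch uses the InnoDB key-prefix limits from the MySQL manual (3072 bytes for DYNAMIC/COMPRESSED row formats and 767 bytes for REDUNDANT/COMPACT, with the default 16 KB page size); it is arithmetic only and does not query a server.

```python
# InnoDB index key-prefix limits in bytes (default 16 KB pages), per the MySQL manual.
PREFIX_LIMIT_BYTES = {"DYNAMIC": 3072, "COMPRESSED": 3072, "REDUNDANT": 767, "COMPACT": 767}

# Worst-case bytes per character: key length is sized by the maximum, not the actual data.
MAX_BYTES_PER_CHAR = {"latin1": 1, "utf8mb3": 3, "utf8mb4": 4}


def key_bytes(varchar_len, charset):
    return varchar_len * MAX_BYTES_PER_CHAR[charset]


def max_indexable_chars(charset, row_format):
    return PREFIX_LIMIT_BYTES[row_format] // MAX_BYTES_PER_CHAR[charset]


print(key_bytes(255, "utf8mb4"))                  # 1020
print(max_indexable_chars("utf8mb4", "COMPACT"))  # 191
print(max_indexable_chars("utf8mb4", "DYNAMIC"))  # 768
```

This is why VARCHAR(191) shows up so often in older utf8mb4 schemas: 191 × 4 = 764 bytes is the largest length that fits under 767.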

2023-06-21

  • [question] (Andy) Do we have any caching guidelines/review? Thinking of stashing some data in memcache.

  • [analysis](Hilary) course role scope and new course role management options

    • An Authorization doc in Confluence

      • Potential definitions:

        • System-wide Roles

        • Service-specific Roles

        • Course-specific Roles

          • It’s not clear whether these are system-wide or service-specific, or either or both, so not a great term, and maybe we need more terms.

  • [quest] [Robert]

      • Are edX course-discovery course run IDs the de facto 2U course IDs?

      • Applies to notifications, and other future work.

    • Who would own this discussion?

    • What is the 2U equivalent of an OEP process?

    • Cohorts/sections/course runs:
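
Returning to Andy’s caching question above: one common strategy is cache-aside, where you read from the cache, compute and store on a miss, and let entries expire after a TTL. The sketch below is a minimal illustration only; `InMemoryCache` is a hypothetical dict-backed stand-in for a real memcached client, and the versioned key format and 300-second default TTL are assumptions, not existing guidelines.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class InMemoryCache:
    """Hypothetical stand-in for a memcached client: get/set with a TTL in seconds."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._store: Dict[str, Tuple[float, Any]] = {}
        self._clock = clock

    def get(self, key: str) -> Optional[Any]:
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            # Expired entries behave like misses.
            del self._store[key]
            return None
        return value

    def set(self, key: str, value: Any, ttl: float) -> None:
        self._store[key] = (self._clock() + ttl, value)


# Assumption: bump this when the cached value's shape changes, so old entries are never read.
CACHE_VERSION = 1


def cache_key(namespace: str, object_id: str) -> str:
    """Build a namespaced, versioned key such as 'course_overview:v1:<id>'."""
    return f"{namespace}:v{CACHE_VERSION}:{object_id}"


def get_or_compute(
    cache: InMemoryCache, key: str, compute: Callable[[], Any], ttl: float = 300.0
) -> Any:
    """Cache-aside: return the cached value, or compute it, store it, and return it."""
    value = cache.get(key)
    if value is None:
        value = compute()
        cache.set(key, value, ttl)
    return value
```

Bumping `CACHE_VERSION` is one way to invalidate everything after a data-shape change without flushing the cache. Note that this sketch treats a cached `None` as a miss, so it cannot cache “no result”.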

2023-06-14

  • [review] (Hilary) Current architecture for roles/permission sets and, if time permits, potential decisions around roles/permission set contexts/scopes

    • Presented a diagram summarizing current understanding; seems largely accurate

    • Robert Raposa is a good person to go to in order to verify some of the points of uncertainty

    • (Jeremy) OEP-9: User Authorization (Permissions) — Open edX Proposals 1.0 documentation may be useful for separating the implementation of a permission check from all the places that need to use it

    • (Phil) Note that at least ecommerce has permissions in Django admin separate from those in edx-platform

    • (Andy) Support are good people to ask what are the biggest problems/points of confusion with the current permissions scheme

    • (Phil) 2U’s Security Working Group & Ben Piscopo have talked in the past about the need for a TA role: some role between learner & instructor that doesn’t grant the waterfall of privileges that comes with being course staff.

    • Future related topic: scopes

  • [ideation] (Jeremy) If you could make one nontrivial change to Open edX to make it better/easier to work with in the future, what would it be?

    • (Dave) Require fewer resources to run - CPU, RAM, storage

      • Run the whole stack on a Raspberry Pi

    • (Alex) Magical observability (beyond New Relic levels)

      • Quickly trace why things happened and what they are

    • (Alex) Have a schema for content (metadata)

    • (Andy) Types!

      • Adding types to Python

      • JS -> TypeScript

      • Hilary: +1

      • (Jeremy) Arbi-BOM would love to work on the Python side of this

      • (Dave) How much of the code do we need to type before this starts becoming useful?

      • (Andy) cookiecutter might actually be the first place, so you get a typed thing when you get started

    • (Dave) [mildly crazy] Run Studio and LMS as one deployable thing (single platform)

      • (to be clear, I haven’t fully thought this through yet. :-P)

      • 2U wants more services; small deployers want fewer services. Is it possible to make either option available for services, rather than forcing one choice that makes some people happy and others sad?

        • (Dave) I think it’s possible to have “deploy this subset of apps as a different service” thing… I need to think about it more though. A lot of the simplification advantages of having things in the same place might be negated if we have to do the cross service thing anyway.

    • (Jeff) Test user accounts, so I don’t have to enroll in order to test it with e2e testing systems

      • In order to reproduce the UI as perceived by some users

      • Catalog of possible states

    • (Jeremy) Add support for PostgreSQL (dropping MySQL support optional)

      • Consolidate Elasticsearch and MongoDB into PG full text search & JSONB fields, at least for smaller installations

    • (Phil)

      • Learning: Unit-level table of contents

      • Documentation: Architecture onboarding
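
As a small illustration of the “Types!” suggestion above, here is a sketch of gradually typed Python using `TypedDict` and `Protocol`; a checker such as mypy would reject indexing `enrollment` before the `None` check. `EnrollmentData`, `EnrollmentStore`, `FakeStore`, and `active_modes` are hypothetical names, not edx-platform APIs.

```python
from typing import Dict, List, Optional, Protocol, TypedDict


class EnrollmentData(TypedDict):
    """Hypothetical shape of an enrollment record, for illustration only."""

    username: str
    course_id: str
    mode: str
    is_active: bool


class EnrollmentStore(Protocol):
    """Any object with this method satisfies the protocol; no inheritance needed."""

    def get_enrollment(self, username: str, course_id: str) -> Optional[EnrollmentData]:
        ...


def active_modes(store: EnrollmentStore, username: str, course_ids: List[str]) -> Dict[str, str]:
    """Map each course the user is actively enrolled in to their enrollment mode."""
    modes: Dict[str, str] = {}
    for course_id in course_ids:
        enrollment = store.get_enrollment(username, course_id)
        # A type checker rejects enrollment["mode"] unless None is ruled out first.
        if enrollment is not None and enrollment["is_active"]:
            modes[course_id] = enrollment["mode"]
    return modes


class FakeStore:
    """In-memory implementation, e.g. for tests; it matches EnrollmentStore structurally."""

    def __init__(self, rows: List[EnrollmentData]) -> None:
        self._rows = {(row["username"], row["course_id"]): row for row in rows}

    def get_enrollment(self, username: str, course_id: str) -> Optional[EnrollmentData]:
        return self._rows.get((username, course_id))
```

Because `Protocol` is structural, existing classes can be typed against it without changing their inheritance, which fits a gradual rollout starting from templates like cookiecutter.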

2023-06-07

  • [quest] (Jeremy) Which software industry news sources are people finding useful, if any?

  • [ideation] (David) How to handle frontend plugin error states when we can’t know whether the contents of an iframe have failed to load.

    • Best idea so far is to use a timeout: assume the plugin failed to load if no load-success event arrives within x time of the load-attempt event

  • [ideation] (David) Engineering and Architecture Onboarding information architecture

    • Came up because interns start next week

    • Some folks have made their own versions, apparently?

      • … can we find out who without judgment?

    • Onboarding IA

      • Dev environment setup

      • Culture

      • General tech learning resources

      • Tools

      • Separate out ‘meta onboarding’ for managers and people who are onboarding new hires

      • Glossary of… “domains”

        • Devstack (“if you’re a search and discover engineer, ignore everything about devstack”, for instance)

        • Terraform

        • Ansible

      • Doc categorizations - are these applicable to onboarding docs?

        • Explanations

        • How-Tos

        • Tutorials

        • Reference

    • People onboarding a new hire get some coaching

      • Being conservative about what you show folks in the beginning

      • Buddies need to know what to do

      • Coach people about how to onboard more effectively

    • How to find stuff?

    • Goals

      • Set small goals for the person onboarding that are appropriate for your team

      • “How can I get this person to commit a production change on their first day?”

      • “Getting devstack running on your first day is a win”

    • [Jeremy] Who should be in charge of making onboarding better?

      • (John) Feels like it should be a competency for engineering management, maybe even explicitly called out in the career pathway

      • (David) I volunteered to handle some of the docs side of this

    • (John) It would be awesome to have more onboarding material as courses hosted on Open edX

  • [quest] (Jeff) What’s up with the new video player project?

    • (Dave) Still in discovery:
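
David’s timeout idea for iframe plugins (from the plugin error-state discussion above) amounts to waiting on a load-success event with a deadline. In a real MFE this would live in the browser as a timer plus a message from the iframe; the sketch below only expresses the same logic with Python’s `asyncio`, and `PluginFrameLoad` and its methods are hypothetical.

```python
import asyncio


class PluginFrameLoad:
    """Hypothetical model of one iframe plugin's load lifecycle."""

    def __init__(self) -> None:
        # Create inside a running event loop (e.g. from a coroutine).
        self._loaded = asyncio.Event()

    def on_load_success(self) -> None:
        # Called when the plugin reports that it finished loading.
        self._loaded.set()

    async def wait_for_load(self, timeout_seconds: float) -> bool:
        """Return True if the success event arrives in time; otherwise assume failure."""
        try:
            await asyncio.wait_for(self._loaded.wait(), timeout=timeout_seconds)
            return True
        except asyncio.TimeoutError:
            return False
```

A `False` result is where the host would swap in its error state; the timeout value itself is a tradeoff between flagging slow loads as failures and leaving users looking at an empty frame.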

2023-05-31

  • [Ned] Draft policy for granting write access to openedx org repos:  

  • [John] [inform] Adam Stankiewicz and I are demonstrating non-devstack MFE development (using stage auth and apis) at tomorrow’s FedEX meeting

  • [inform/quest] (Jeremy) Pact contract testing interest revival

    • Frontend WG considering it for testing MFE interaction with backends

    • Its stub server could also be used to mock back ends for MFE development

    • (David) It feels like course-discovery could really use this

    • (David) It’s easier than it looks at first glance, but you really need to try using it to understand it

  • [ideation] (David) is there a good way to structure a django app plugin (or something) such that an operator could choose to run it in edx-platform OR as a separate service?

    • (Jeremy) Seems feasible with a minimal microservice wrapper to install the app into, especially if it communicates with the rest of edx-platform via signals (which could be transmitted over the event bus if needed) instead of direct Python API calls

  • [analysis] (Jeremy) Development metrics and tools for collecting them

    • Okay was acquired by Stripe for internal use; the tool is no longer available

    • under consideration

    • Atlassian Compass -  

    • (Andy) Proposed metric - time to verify locally if an attempted bug fix worked

    • Developer Experience Metrics

  • [quest] (Jeremy) Devstack & Tutor pain points

    • (Hilary) Mismatch between Confluence & Slack & verbal advice

      • Especially local vs. hosted devstack

    • (Hilary) Docker Desktop now asks you to confirm that your company has a license for you, which was a significant speed bump for installation

      • Maybe we should accelerate switching to Orbstack or Minikube or Rancher Desktop, etc.
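
Jeremy’s suggestion for the plugin question above (talk to the rest of the platform via signals rather than direct Python API calls) can be sketched as plugin code that depends only on a small publish/subscribe interface. The in-process bus behaves like a signal inside edx-platform; the queued bus serializes events to JSON, standing in for the event bus when the plugin runs as a separate service. All names here (`Bus`, `InProcessBus`, `QueuedBus`, `record_completion`, the `completion.created` event type) are hypothetical, and this is not the openedx-events or event bus API.

```python
import json
from collections import defaultdict
from typing import Any, Callable, Dict, List, Protocol

Payload = Dict[str, Any]
Receiver = Callable[[Payload], None]


class Bus(Protocol):
    """The only thing the (hypothetical) plugin depends on."""

    def publish(self, event_type: str, payload: Payload) -> None: ...

    def subscribe(self, event_type: str, receiver: Receiver) -> None: ...


class InProcessBus:
    """Deliver events synchronously, like a signal when the plugin runs inside edx-platform."""

    def __init__(self) -> None:
        self._receivers: Dict[str, List[Receiver]] = defaultdict(list)

    def subscribe(self, event_type: str, receiver: Receiver) -> None:
        self._receivers[event_type].append(receiver)

    def publish(self, event_type: str, payload: Payload) -> None:
        for receiver in self._receivers[event_type]:
            receiver(payload)


class QueuedBus(InProcessBus):
    """Serialize events to a queue, standing in for an event bus topic between services."""

    def __init__(self) -> None:
        super().__init__()
        self.queue: List[str] = []

    def publish(self, event_type: str, payload: Payload) -> None:
        # Only JSON-serializable data can cross a service boundary.
        self.queue.append(json.dumps({"type": event_type, "data": payload}))

    def drain(self) -> None:
        """Consumer side: deliver everything queued so far to local receivers."""
        while self.queue:
            message = json.loads(self.queue.pop(0))
            super().publish(message["type"], message["data"])


def record_completion(bus: Bus, user_id: int, block_id: str) -> None:
    """Hypothetical plugin code: it talks only to the bus, never to platform internals."""
    bus.publish("completion.created", {"user_id": user_id, "block_id": block_id})
```

The operator’s choice then reduces to which bus gets wired in at startup; the plugin code itself doesn’t change. The cost is that payloads must stay serializable even when everything runs in one process.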

2023-05-24

  • [ideation] (David J) Next steps for this meeting - announcements when it’s starting, perhaps?  Seems to work well for other recurring optional meetings.

    • Arch Hour is starting now at

    • Announced in #tech-dev-edx

  • [ideation] (Jeremy B) I’d like to enhance the console repo health dashboard I started during the Hackathon to make it useful for teams in identifying high-priority tech debt.  Any nominations for patterns/issues that should be especially prioritized?

  • [quest] (Jeremy B) Has anybody looked at JSON5?  Might be a viable option when considering YAML over JSON just to get support for comments.

    • (Chris) I just add “comment” fields to my JSON

    • Sounds like none of us have a burning need for this

    • We’ve come to terms with YAML, and it’s imposed on us by our software choices in many cases

  • [question] (Ned) I’m casting a wide net to find out what education people need about Open edX / open source / Axim / etc:  

    • If someone asks you a question along these lines, capture the answer

    • Have we linked to the docs we have for this in places where people who have these questions can find them?

  • [inform] (Dave) Axim has created a new GitHub org (aximcollaborative), and Axim-specific repos will be moved there.

  • [quest] (Jeremy) Does “platform engineering” sound substantially different from SRE to others?

    • Example:  

    • SRE vs Devops vs Platform Engineering | The New Stack

    • Basic idea: Create an internal developer platform for teams to self-service launch their own services

    • (John) A key enabling factor for these is having more of the config local to the service repo (12 factor app)
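
Chris’s “comment fields” workaround for comments in plain JSON (from the JSON5 discussion above) can be handled with a small post-processing step after parsing with the standard, strict `json` module. A minimal sketch, assuming the convention is a key literally named `comment`; adjust `COMMENT_KEYS` to whatever your team uses:

```python
import json
from typing import Any

# Assumption: the key name(s) your team uses for comments in JSON files.
COMMENT_KEYS = {"comment"}


def strip_comments(value: Any) -> Any:
    """Recursively drop comment keys from parsed JSON (dicts nested in dicts/lists)."""
    if isinstance(value, dict):
        return {k: strip_comments(v) for k, v in value.items() if k not in COMMENT_KEYS}
    if isinstance(value, list):
        return [strip_comments(item) for item in value]
    return value


def load_json_with_comments(text: str) -> Any:
    """Parse ordinary JSON, then remove the comment fields."""
    return strip_comments(json.loads(text))
```

The tradeoff versus JSON5 is that the files stay valid JSON for every tool, at the cost of a reserved key name.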

2023-05-17

  • **** [John] Any updates on the future of devstack?

    • Hosted devstack vs. Tutor vs. off-the-shelf cloud dev env tooling

    • Need to set a budget for hosted devstack

    • We want to experiment with Tutor + Devspaces/Okteto/other

    • Want to start building arm64 versions of the new non-Ansible devstack images

    • Provisioning is a pain point

      • Still being on MySQL 5.7 is making this worse

      • SRE and AWS are working on it

    • We still owe the NPS survey results

    • Shortest path to improving the status quo?

      • Build arm64 images & upgrade MySQL

      • Improve the DB cache

  • ***[quest] (Alie L) Has anybody worked with before? I’m trying to understand if it is a viable option for getting/reading course staff roles from the LMS into an IDA (specifically special exams IDA)

    • [John] Enterprise is using it for what sounds like a similar thing

    • [Robert] Issue to fix/update OEP for Authorization:

    • [Alex - async] I can walk through this with you if you want, Alie.  I’ve got a good bit of edx-rbac context.  It will probably work for you, but it can be a little finicky and…non-obvious.

  • ***[John] [a deliberately provocative question] Does React make us faster or slower?

    • [John]

    • [John] It’s pretty complex; is it worth the complexity?

    • [David] Yes.

    • [Diana] Seems to be easier to work with than what we had before (Backbone/Underscore/jQuery)

    • [David] Part of it is trying to use what most other people are using successfully

    • [Jeremy] Some of the complexity and framework churn over time is due to site performance constraints - size and load time

    • [David] Arguably the MFE framework is too simple - too much divergence between individual MFEs

    • [Robert] A lot of this depends on “what are we comparing React to as an alternative?”

    • [Ned] Simplicity of the framework vs. simplicity of the code written using it

    • [Jeremy] If we want a concrete alternative to consider, I’d nominate Svelte.  Much smaller browser footprint, intelligent compiler, and very liked by its users.  But not nearly as widely adopted as React yet.

  • * [quest] (Dave O) What charset and collation are edX’s databases using now, and are there any plans to change them soon? (I’d like to get everyone on utf8mb4 and more modern collations, but I don’t know what the state of things is for the big installs, or how painful the migration is.)

    • [Andy] I think it may vary across DBs, we had to look at it for some of the insights work. Alison Langston may remember more.

      • [Alie] Looked into this briefly for insights data pipeline work, because we ran into a bug getting specific Unicode characters through the data pipeline. It turned out to be a MySQL bug that had not yet been patched in the Aurora DB we were using (so not related to collation).

    • [Jeremy] It’s planned out, but not yet executed (still on utf8).  Blocked on the MySQL 8 upgrade, I think because 5.7 doesn’t have the collation we want.

  • [inform] (Andy) Pushing to open-source most of the AI summary work; we’ll see how it goes

  • [inform] (Jeremy B) Another Thoughtworks Tech Radar is out.  Does anyone want to review and discuss, either now or after time to read it?

  • [inform] (John) edx-rest-client now automatically forwards request ID headers, for traceability purposes

  • [quest] (Jeremy B) Does look like it could help smooth out the learning curve for our tech stack?  (The content and/or the tooling for creating such skill paths.)

    • Reasonable?

    • What do these boxes mean?

      • They make sense to me. ¯\_(ツ)_/¯  

    • Actively harmful:
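
For context on Dave’s charset question above: MySQL’s `utf8` is an alias for `utf8mb3`, which stores at most 3 bytes per character, so characters outside the Basic Multilingual Plane (most emoji, for example) require `utf8mb4`. A quick, hypothetical helper for spotting such characters in data before or during a migration:

```python
from typing import Dict, List


def chars_needing_utf8mb4(text: str) -> List[str]:
    """Return characters whose UTF-8 encoding exceeds the 3 bytes utf8mb3 allows."""
    return [ch for ch in text if len(ch.encode("utf-8")) > 3]


def fields_needing_utf8mb4(row: Dict[str, str]) -> List[str]:
    """Names of text fields in a (hypothetical) row that a utf8mb3 column couldn't store."""
    return [name for name, value in row.items() if chars_needing_utf8mb4(value)]
```

Accented Latin letters (2 bytes) and most CJK characters (3 bytes) fit in `utf8mb3`; it is the 4-byte characters that force the move.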

2023-05-10

  • [Alex] [question] Do we have any collateral/best practices/etc about caching strategies (not tactics)?

    • (Dave O. presenting at 2023 Open edX conf.)

  • [John] We still have 2 Jira instances? What’s the timeline for combining them or sharing access broadly?

    • [Andy] Thinks the hosted Jira will probably live for a while longer - there are a lot of idiosyncrasies that make it hard to “jump” from this one to the “other” Jira instance.

    • [Ned] We’ll eventually get migrated off of “server atlassian” (2u-internal) and onto “cloud atlassian” (TODO: name of other atlassian?)

      • Atlassian is going to eventually stop supporting their server version - we’re not going to be able to run it on our own, eventually.

    • [Alex] Who’s the right person or slack channel to answer this question?

      • [Ned] Ned is an ok person to ask, because he’s on the committee that’s deciding this stuff.

  • [David] Move this meeting a bit later, tack “office hours” onto it, and announce it’s happening?

    • [Yes votes] ++++++

    • [No votes]

    • Shift to post-lunch Eastern time slot

      • This basically excludes Pakistan & South Africa folk, but they haven’t been coming anyway

    • The hardest thing: naming.  Is “office hours” somewhat misleading?

      • “Architecture de-couples therapy”

    • What’s a good shared calendar for this to live in?

      • [David] Can add to the platform team calendar

      • Tech-dev shared calendar would be good, but we need admin/IT help.  Also, this calendar is possessed by demons of randomness.

  • [Emily] Ned, anything interesting at PyCon?

    • [Ned] Yes, is currently curating a list of recommended videos.

    • PyCon: 

      • Two days of tutorials (hands-on, ya gotta pay extra for those)

      • More structured talks for the rest of the days through end of week…

      • …and then sprints for a couple of days.

    • It’s in Pittsburgh the next two years, you should go!

    • Ned gave a keynote talk, you can find it on his website, probably.  

    • DjangoCon: you still have 5 days to submit a talk proposal.

      • In Durham, NC, mid-October.

  • [John] Inform: SRE has finished injecting X-Request-ID headers on everything.

    • Right now, you’ll see them only in the nginx logs.

    • We can start logging this with middleware now probably - like, log the request ID on every log line.

    • We can also re-use these request IDs when passing requests to services to get a distributed tracing sort of thing.

    • John will come and bother arch-bom about how to do these two items.
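
John’s two follow-ups above (log the request ID on every line, and reuse it on calls to other services) can be sketched with the standard library: a `contextvars.ContextVar` holds the current request’s ID, a `logging.Filter` stamps it onto every record, and outgoing calls copy it into their headers. In Django, a middleware could pass the case-insensitive `request.headers` to something like `start_request`. All helper names here are hypothetical; this is not the edx-django-utils or edx-rest-client implementation.

```python
import contextvars
import logging
import uuid
from typing import Dict, Mapping, Optional

# The current request's ID; each thread/async task sees its own value.
request_id_var = contextvars.ContextVar("request_id", default="-")


class RequestIdFilter(logging.Filter):
    """Attach the current request ID to every log record as record.request_id."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.request_id = request_id_var.get()
        return True


def start_request(incoming_headers: Mapping[str, str]) -> str:
    """What a middleware would do: reuse the ID injected upstream (e.g. by nginx), or mint one."""
    request_id = incoming_headers.get("X-Request-ID") or str(uuid.uuid4())
    request_id_var.set(request_id)
    return request_id


def outgoing_headers(extra: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Headers for a call to another service that forward the same request ID."""
    headers = dict(extra or {})
    headers["X-Request-ID"] = request_id_var.get()
    return headers


def configure_logging(logger: logging.Logger) -> None:
    """Add a handler whose format includes the request ID on every line."""
    handler = logging.StreamHandler()
    handler.addFilter(RequestIdFilter())
    handler.setFormatter(logging.Formatter("%(levelname)s [request_id=%(request_id)s] %(message)s"))
    logger.addHandler(handler)
```

If every service both logs and forwards the same ID, grepping logs across services for one ID gives a poor man’s distributed trace.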

2023-05-03

  • [quest] (John) What’s up with course-discovery staying (or not) a core part of Open edX?

    • What’s “in/out” of Open edX is a complex decision point that isn’t well understood.

    • The Open Source Process working group is happy to hear about friction and find ways to reduce it.

    • There have been some recent conversations around potentially moving large chunks of course-discovery to the event bus instead of celery jobs / data aggregation

      • How do we start publishing more events to the event bus?

  • [inform] (Ned) GitHub teams in the openedx org will be cleaned up in June: claim ones you still want:  

  • [quest] (Jeremy) Next steps on Maintenance Board adoption -  

    • Some of the stuff here is for code that we honestly don’t know if we want to keep or not

    • We need a product-side DEPR process

    • Part of the problem is the difficulty of testing/understanding code you don’t regularly work in

    • Get maintenance explicitly in org OGSPs

      • Justify as enabling ability to rapidly pivot to new priorities

    • Fix our ability to revert and fix-forward, so there’s more confidence in merging PRs with no obvious flaws

    • May be time to revisit and re-promote the Arch Manifesto

  • [Andy] The chaotic FOMO of AI projects

    • [Jeremy] Feels like the best uses have the lowest visibility.  Maybe couple with a big announcement highlighting the improvements?

    • We need a product person actually responsible for making decisions on what we will and won’t use AI for

  • [Andy] Orbstack looks cool as a Docker Desktop replacement:

2023-04-05

  • [Jeremy] I periodically maintain .  Is this useful?  Do people still get value from technical conferences?

    • Many people think of conferences in terms of the talks, because that’s what gets hyped a lot. But with those now often available online afterwards, the most value is often in other aspects.

    • We should highlight the networking benefits and the value to the company of being seen attending conferences. This page probably needs a “how and why to attend conferences” section or link.

  • [Ned] (what else?) Hackathon last call

2023-03-22

  • [John] Request IDs are in progress, can we move it along?

    • [inform] this is close to working, it’s enabled on stage

  • [Ned] codecov: do people use it?

    • It’s too expensive for private repos

    • It sometimes fails, which is a distraction

    • “More of an impediment than an enabler”

    • Bad: it complains if you delete covered lines

    • “Occasionally useful to indicate tests to write”

    • “Too binary: .1% off shouldn’t be a failure”

    • “Not sure we’re getting even $500/mo value from it”

    • Tangent: coverage metrics at all

    • We’re interested in getting broader feedback from devs, though.

  • Hackathon

    • Ideas:  

    • ChatGPT foundations

      • Credits for experiments?

      • OpenAI account is hard to get

        • Trying to satisfy Legal

      • GPT-3.5 Turbo is better, and we should use it; you can generally access this model through standard use of the OpenAI API.

      • Will people need to spend money during the hackathon, and if so, how much?

        • “A team could spend $20 over the course of the hackathon”

        • “Multiple models, each with different capabilities and price points. Prices are per 1,000 tokens. You can think of tokens as pieces of words, where 1,000 tokens is about 750 words. This paragraph is 35 tokens.”

          • GPT-3.5 Turbo is $0.002 per 1,000 tokens.

          • Assuming 250 words per page of a high-school essay, $20 would buy you a 30,000-page high-school essay.

    • Legal has concerns about 2U information being fed to OpenAI.  Not clear how to get approval for experiments.
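
The hackathon cost estimate above can be checked with a few lines of arithmetic, using only the figures quoted in the notes ($0.002 per 1,000 tokens, about 750 words per 1,000 tokens, and 250 words per page):

```python
# Figures quoted in the notes (GPT-3.5 Turbo pricing as of early 2023).
PRICE_PER_1K_TOKENS_USD = 0.002
WORDS_PER_1K_TOKENS = 750
WORDS_PER_PAGE = 250  # the notes' assumption for a high-school essay page


def pages_for_budget(budget_usd: float) -> float:
    """Convert a dollar budget into tokens, then words, then pages."""
    thousands_of_tokens = budget_usd / PRICE_PER_1K_TOKENS_USD
    words = thousands_of_tokens * WORDS_PER_1K_TOKENS
    return words / WORDS_PER_PAGE


print(f"{pages_for_budget(20):,.0f} pages")  # prints "30,000 pages"
```

So $20 buys about 10 million tokens, roughly 7.5 million words, or about 30,000 pages, which matches the estimate in the notes.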

2023-03-15

  • [Jeremy/Tyler] To what extent should we lean into Kubernetes for development environments?

    • [Tyler] Devspace deploying into clusters

    • [Jeremy] What are the top concerns people have on a new dev environment?

      • [Jeremy] What should our next steps be?

      • [Andy]

        • Need to be able to wipe and reset state

        • Recently, slow - moving into the cloud feels like it would fix this

      • [John]

        • Not having to run all services just to run 1 service

        • Tight coupling means lots of setup (~1.5 hours) to test a small change

          • [Alex] What exactly causes this?

            • [John] Anything that has changed since your last devstack build can cause extra troubleshooting

              • [Alex] Wonder if we need snapshots.

            • [Phil] Our requirements are not always pinned.

      • [Tyler] Do we have a list of dependencies?

        • [Jeremy] the docker-compose.yml of devstack.git is what defines these, for things added to devstack.  There are many new services of late that don’t make it into devstack (e.g. every enterprise service that’s not a library).

      • K8s?  It’s hard to find a concise articulation of using k8s for a development environment and benefits of such, vs. docker-compose.

        • [John] Is k8s going to be painful because not many people [in the world?] run it for local development, even if it’s a technologically superior choice?

        • [Jeremy] We designed Open edX originally to deploy each service as a distinct VM with some communication layer around it.  Which translated fine to docker-compose, but it’s not built in a way to take advantage of k8s features.

      • [Tyler] Suspects data is the first big problem to tackle

        • [Jeremy]  

        • [John] A nice property of staging environments is that the work to set up application data is shared/re-used/extended amongst different devs and teams in the course of testing and verification.

        • [Andy] Would it be better to regularly re-import data into stage, even at the cost of people losing recent modifications/additions they’ve made?

        • [John] Describes an idea where, e.g., sales demo accounts live on the staging environment, so that actual humans are populating realistic data on stage, and then you get pretty good staging data for free.

      • [Jeremy] Candidate next steps

        • Create and maintain better stock data

        • Pick a cloud dev deployment approach

        • Clean up our configuration story to require less deployment-specific customization

      • [Jeremy] Necessary step: smooth the k8s learning-curve.  You shouldn’t have to fully understand k8s just to start doing development.

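
Phil’s point about unpinned requirements (from the devstack troubleshooting discussion above) is easy to check mechanically. A rough sketch, assuming pip-style requirements files where “pinned” means an exact `==` version; it skips comments and option lines such as `-r`, `-c`, `-e`, and `--index-url`, and it is not a full requirements-file parser:

```python
import re
from typing import Iterable, List

# "name==version", optionally with extras, e.g. "celery[redis]==5.2.7".
PINNED = re.compile(r"^[A-Za-z0-9._-]+(\[[^\]]*\])?==[^=\s;#]+")
# A "#" at the start of a line or after whitespace begins a comment.
COMMENT = re.compile(r"(^|\s)#.*$")


def unpinned_requirements(lines: Iterable[str]) -> List[str]:
    """Return requirement lines that don't pin an exact version with '=='."""
    problems = []
    for raw in lines:
        line = COMMENT.sub("", raw).strip()
        if not line or line.startswith("-"):
            continue  # blank lines, comments, -r/-c/-e includes, and --options
        if not PINNED.match(line):
            problems.append(line)
    return problems
```

Running something like this in CI for each service repo would surface drift before it turns into a 1.5-hour devstack rebuild.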

2023-03-08

  • [John] [question] Thoughts on using the Django messages framework?

    • [John] Django has a built-in framework for one-time user notifications (django.contrib.messages). We have 3rd-party tools (social-core) that use it, and the messages are not making it to the MFE.

    • [Robert] 

      • Payments might be wrapping messages and delivering to an MFE.

        • See

      • Django messages is probably designed for delivering user messages within a single service (shared session) across multiple calls.

    • [Phil] Do we usually use Braze for this?

      • Braze has in-app messaging, but there’s probably a better/more native way to do this. Don’t want to increase our dependencies.

  • + [Robert] [analysis] Discussion of blurred boundaries and varying needs of L&P and Marketplace within the monolith.

2023-02-22

  • DJ: [inform] Building out a few wiki pages for Solution Review:

  • *** DJ: [ideation] Offsite (onsite?) Open edX LMS boundaries discussion with tCRIL et al

    • This isn’t “2U + tCRIL” - we shouldn’t think of it that way

    • What about doing this during the conference?

      • Birds of a feather?

        • Use this time to figure out the shape of it and gauge interest

      • Friday is all scheduled out as working group time

    • David wants to discuss further with Robert

  • *** Ned: [question] Do people feel like they are in tune with what tCRIL is doing?

    • Example: WooCommerce vs Commerce Coordinator

      • “2U is currently building a next generation commerce platform (commerce-coordinator 1) that is extensible and pluggable. However this platform may be more complex and require more technical capabilities than all operators have. We want to see what it would look like to try to integrate the Open edX platform directly with a 3rd party commerce platform that we do not need to maintain. Is there a possibility that we can have a simple implementation that will work for smaller deployments that would complement commerce-coordinator for larger deployments?”

    • (David) Following Discourse helps, especially the Announcement space

      • Had trouble getting notifications for anything smaller than “everything”

        • DJ: Quibble - notifications and emails in discourse are two unconnected systems… notifications may work reasonably, but it requires you to show up at the site to see them.

    • (Ned) Discourse in particular is polarizing - tCRIL recommending as primary means of communication, 2U largely ignoring it

    • (John) Sudden change made by 2U in discovery for representing 3rd-party content

    • (Jeremy) I track it pretty closely, but that’s a large chunk of my job and it leaves me basically no time for coding

    • (David) Do we need an Open edX activity digest?

      • (Andy) Curation would definitely improve the signal/noise ratio

    • Very large digression about the competing complexity needs between most of the community and 2U

2023-02-15

  • [Andy] 2U infra / edX infra comparative service spin-up time

    • Takes about 2 weeks for edX

      • Cookiecutter makes the template fast

      • There’s a bunch of manual configuration lookup settings

      • Terraform has to be done somewhat by hand

      • Instructions were still a bit sparse at the time

    • Takes about 1 day for “2U” (mostly self-service)

      • Does a lot less, but up much faster

      • This was for a Node project in k8s by someone with experience doing this

      • Looking to see if there’s a doc for that process

      • Doesn’t need to connect to the LMS or other Open edX resources, which simplified things a lot

    • We should review the instructions to identify the manual and/or tricky parts

  • [quest] (Jeremy) What are the best resources you’ve seen for learning the basics of Kubernetes and when/why you should choose it?

  •  [quest] (Jeremy) What are good next steps to put all the things in k8s?

2023-02-08

  •  [inform] (Jeremy) I wrote a first draft of , feedback welcome

    • [praise] (Alex) I love this table of contents!

  •  [inform] (Jeremy) Also wrote , again feedback welcome

    • (Andy) There needs to be some incentive on the project team side to follow this process.

    • (Robert) Maybe this needs an injection into our OGSPs.

    • (Jeremy) Northwards, the sentiment is that engineering managers should do this triage.

    • (Jeremy) From “Making Work Visible”: often, not enough importance is placed on “maintaining revenue”, and we err too far on the side of “generating new revenue”.

    • (Phil) Discusses the nature of sec working group dropping work into other teams’ backlogs and the importance of clarifying the prioritization of that work - often Sec WG starts a thread in a slack channel and it gets treated as a CAT-1, where in actuality, it should be treated as a CAT-3.

    • (Phil) Can we triage SEC issues onto this maintenance board?

      • (Jeremy) I hope so, but we’re still collecting feedback on this board/process.  Hopefully this board allows us to “see the problem of everything being stuck” and come up with solutions to un-stick us.

    • (Robert) Is reminded of the necessity of making PRs as small as reasonably possible to increase the probability that they’re properly reviewed by an owning team.

      • (Jeremy) On the flip side, there’s so much overhead in getting attention onto the PR that it incentivizes jamming more work into a PR (example feedback from Open edX community).

  • [quest]  (Alex) How do we talk about “big things”?

    • Examples:

      • (from Robert) “Our marketing site, purchasing capabilities, enrollment handling, etc. all came from a legacy view of the world where the Open edX LMS was the home of all courses. This is no longer true, at least from a non-technical perspective. What conversations and efforts are happening around what the long-term capabilities and boundaries will be? I know we’ve made lots of short-term decisions to just get data to the places that were already feeding the marketing site as quickly as possible. I don’t think that is the ideal long-term solution, but I’m wondering where and if this is being discussed.”

      • The LMS user identity OEP is another good example - we found a bigger solution that was generally preferable, but we stuck with a kind of local maximum instead (the existing numeric LMS auth.user.id)

    • (Jeremy) People have limited short term memory, and it’s hard to reason about big, complex things.  But we have very good visual processing centers in our brain.  So make a visual representation to facilitate a good discussion about “big things”.

      • (Alex, John) Having the best picture you can up front, discussed synchronously with a somewhat small group of people has worked well for us very recently (and probably historically, too).

      • (John) The nature of microservices gives us constraints that actually make problems more approachable. Good microservice design can help promote team behaviors that reinforce autonomy, ownership, etc.

    • (Andy) Having the big picture and then farming out pieces of the whole to smaller teams works well when there’s a designated lead who’s in charge of the big picture.

      • It’s difficult to balance this for the lead and the teams - teams want autonomy and agency, and often leads would prefer the teams to be autonomous (either altruistically or lazily).

      • (Alex) How do we support leads on projects like this?  Is there a doc or something?  Would there be value in having an artifact like this?  E.g. “who has authority to decide if X is a reasonable thing to do, and can you do it now?  Can your team make a local decision about X now? What does ownership mean in terms of responsibility?”

      • (Robert) [Alex missed this, but it seemed important, please fill in if you can…something about docs or process Julie was putting together?]

      • (Everyone, always) plug for using Pact - if these tests break, we know you’re breaking your promises as an owner.

        • (Jeremy) QA team is starting to talk to Vanguards about expanding usage, talking to Arch-BOM and Arbi-BOM to find more partners to actually use it.

        • (Jeremy) Threatens again to have a Pact representative come talk to us.

    • (John) Being consensus-driven often makes us want to “get it completely right” up-front.  But no matter what, once we start writing and deploying code, we’re going to realize we were wrong about assumptions and decisions, but that’s why we iterate.

    • (Everyone) Notes that we have one or two docs from Andy and Robert each about being a lead (or subsets of that).

    • (Andy) One major benefit of declaring a lead: making clear the things we will _not_ do. Committees are far more apt to say “yes” to too much.

    • (John) Do we have too many “dotted lines”? Are our hierarchies too gentle? Who is going to make the strong-handed, hard decisions? And are those people managers?

  • [quest] (John) There’s also lots of little things where we come up with good ideas to address them, but they get lost or stalled. How do we make the outcomes of the discussions actionable and then get them done?

2023-02-01

  •  [inform] (Jeremy) Feedback and assistance welcome on Dev Environment Features

  •  [inform] (Jeremy) Maintenance board format updates:  

    • [Robert] Does it make sense to also start using this for product work that requires cross-team review?

      • [Jeremy] Probably, especially if the volume or latency of that work increases

  • [inform] (Ned) repos can now get external PRs added to GitHub projects 

    • At least 4 teams are using GitHub Projects actively to coordinate with other Open edX community orgs

  • [inform] (Robert/John) Web requests taking longer than 1 minute are erroring, even though they don’t look like errors in New Relic.

    • Load balancer is returning an error code to the user, but New Relic doesn’t see it

  • [ideation] (Phil) Does anyone know of good API spec authoring tools?

    • Phil: Currently doing this in a Google Sheet, and it’s getting unwieldy.

    • Alex D.: Capture it in .rst as an ADR in the repo where the API lives.

      • Benefit: reviewable, can use Git for version control

      • Robert: rST has a list-table directive that is much easier to write than the default table syntax.

    • Jansen: In some places we use OpenAPI code annotations that generate the docs automatically.

    • Alex D.: There is also a library called drf-spectacular that will let you annotate in a cleaner way.

    • Andy: Stubbing out the code is probably easier. Is a fan of writing a doc, but keeping it very small; people don’t read documentation, so it’s better to capture the spec in code.

    • Alex D.: Benefit of putting code in the ADR: if the API is conventional enough, you’re basically writing the API itself, and the decisions you make along the way are already captured in the ADR.

    • Alex D.: The answer may vary by use case. It’s always hard when multiple teams are involved.

    • John: We’ve been thinking about using Pact! There’s been a previous all-hands on this. It can auto-generate the tests for interfacing with external APIs.

    • Jeremy: Gave a mini-Pact 101. See existing work in . We’re still learning how to integrate this at edX. The QA team is the main owner right now.

    • Phil: Thanks, Alexander Dusenbery, Jansen Kantor, John Nagro, Jeremy Bowman, and Andy Shultz!

  • [ideation] (Jeremy) Any particular technology or process pain points you’ve been feeling recently?

    • [John] Lack of distributed tracing

    • [Robert] Not having per-service configuration-as-code New Relic settings

    • [Robert] See  [Observability] Enabling distributed tracing for LMS workers #174 

    • [Jeremy] Lack of clarity around who’s responsible for keeping PRs moving from creation to deployment

      • Have added “Author Team Review”, “Owner Review”, and “Approved” columns to some Kanban boards to shed light on this

    • Otherwise, there don’t seem to be many frustrations.
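For anyone new to the Pact discussion above: the core idea of consumer-driven contract testing is that the consumer records the requests it makes and the parts of the response it relies on, and the provider's CI replays those interactions against the real provider. The sketch below illustrates only that idea using the Python standard library. It is not Pact's API; the names Interaction, verify_provider, and widget_provider and the /api/v1/widgets/42 endpoint are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Callable

# A provider is modelled as a function: (method, path) -> (status, JSON-like body).
Handler = Callable[[str, str], tuple[int, dict[str, Any]]]


@dataclass
class Interaction:
    """One request the consumer makes and the parts of the response it relies on."""
    description: str
    method: str
    path: str
    expected_status: int
    # Only the fields the consumer actually reads, with the type it expects.
    required_fields: dict[str, type] = field(default_factory=dict)


def verify_provider(contract: list[Interaction], handler: Handler) -> list[str]:
    """Replay every consumer interaction against the provider and collect failures."""
    failures: list[str] = []
    for interaction in contract:
        status, body = handler(interaction.method, interaction.path)
        if status != interaction.expected_status:
            failures.append(
                f"{interaction.description}: expected status "
                f"{interaction.expected_status}, got {status}"
            )
            continue
        for name, expected_type in interaction.required_fields.items():
            if name not in body:
                failures.append(f"{interaction.description}: missing field '{name}'")
            elif not isinstance(body[name], expected_type):
                failures.append(
                    f"{interaction.description}: field '{name}' is "
                    f"{type(body[name]).__name__}, expected {expected_type.__name__}"
                )
    return failures


# Hypothetical consumer contract and provider implementation.
contract = [
    Interaction("fetch widget", "GET", "/api/v1/widgets/42", 200, {"id": str, "name": str}),
]


def widget_provider(method: str, path: str) -> tuple[int, dict[str, Any]]:
    if method == "GET" and path == "/api/v1/widgets/42":
        # Extra fields are fine: the consumer never reads "color".
        return 200, {"id": "42", "name": "Sprocket", "color": "blue"}
    return 404, {}


print(verify_provider(contract, widget_provider))  # []
```

If the provider later renames or retypes a field the consumer depends on, verification fails in the provider's own build, which is the "breaking your promises as an owner" signal mentioned earlier. Extra fields are deliberately ignored so providers can evolve without breaking consumers that don't read them.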

2023-01-25

  • [inform] (Jeremy) Making Work Visible (Summary)

  • [very quick inform] (Ned) Hackathon?

  • [inform, request for feedback] We’re planning B2C subscriptions; please see this tech spec if you’re interested in providing feedback: B2C Subscriptions - Programs MVP Tech spec

  • [analysis] (Jeremy) Maintenance workflow automation

  • [request for feedback] (Ned) roadmap items for continued decoupling/integration/tcril-etc 

    • What automation would help you?

    • What information are you lacking?

  • [quest] (Robert) As the rest of 2U moves onto our Confluence wiki (https://2u-internal.atlassian.net), do we want any best practices documented?

    • [David] Do we have any best practices? (half joke, half serious) 

    • Be sure permissions are what we want (permissive)

  • [quest] (Alex) What is our current software development methodology?  Like, are we still an “Agile” organization?

2023-01-18

  • *** [Phil] [quest] Poll on ticket creation & different practices

    • Who is authorized to create tickets?

    • How do you add tickets to other teams’ backlogs?

    • How do you tech leads approach ticket creation?

    • How long do your teams spend in grooming?

    • [Ned] Jira and/or GitHub Issues?

      • [Phil] Both

    • [Andy]

      • Everybody should be able to create tickets.

      • Tech lead lays out tickets, but follow-on tickets are very common.

    • [John]

      • Product Manager helps prioritize.

      • All new tickets land into “For Grooming”.

      • Acceptance criteria definition is part of grooming.

      • Also agrees anyone should be able to create tickets.

    • [Jeremy]

      • Has previously seen teams accumulate cumbersome piles of tickets.

      • Maybe deferring tickets to get them out of the main backlog is the right solution.

    • [Chris]

      • Is there a way to prioritize/sort GitHub issues?

        • [Ned]

          • Issues in projects can be in separate columns

          • Can have priority fields

    • [John]

      • What is the specific pain point?

        • [Phil] Spending too much time grooming.

      • Grooming estimates are really a ROM (rough order of magnitude).

      • Pointing uses S/M/L sizes.

    • [Robert]

      • Discovery & bug tickets are good for uncertain efforts and can save you grooming/research time.

    • [Andy]

      • Philosophy: more tickets are better if the purpose of the ticketing system is work tracking.

      • This means that you need to reduce ticket creation overhead. Tickets should not be hard to create.

    • [John]

      • Linking tickets is also really powerful.

    • [Robert]

      • CRs are a good way to create tickets for other teams.

  • **[Ned] (inform) Hackathon -> #interest-hackathon 

    • Join if you’re at all interested!

  • ** [Jeremy]  

    • [Ned] The GitHub blog is a fantastic source of information about updates to Git core.

    • Subtopics:

      • Clients

        • Users may not have admin rights

        • Users may have multiple copies of Git on their machine

      • Servers

    • [Ned] Follow-on: implicit credentials?

      • Would love ideas on how to make local Git require more authentication to access his GitHub account.

      • [John] Hardware keys are one way to do this.

  • **[Jeremy] Has a migration name check (to make sure you gave the migration a more useful name than 000x_auto_<timestamp>) and some functionality for testing forward and backward migrations. Does that sound useful enough to try out?

    • Consensus: yes

  • *[Jeremy] If you haven’t already reviewed , please do so

    • Test factories

  • [Ned] (inform) 2U is now in our wiki

  • [Jeremy] Maintenance GitHub Project board: https://github.com/orgs/edx/projects/17/views/1

    • [Ned] Why is it private?

      • [Jeremy] Leftover from when I was collecting initial feedback, just made public

  • [Feanil] CFP closes next Monday.

  • [Ned] Munch & Learn on New Relic at 10!
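Jeremy's migration name check from the 2023-01-18 list isn't shown in these notes. The sketch below is one hypothetical way such a check could work, assuming Django's default auto-generated form such as 0002_auto_20230118_1234.py (a four-digit number, "auto", and a YYYYMMDD_HHMM timestamp). The function name, the regex, and the sample file list are assumptions; the forward/backward migration testing he mentioned is not covered here.

```python
import re
from pathlib import PurePosixPath

# Django's default auto name for a migration: a 4-digit number, "auto",
# and a YYYYMMDD_HHMM timestamp, e.g. "0002_auto_20230118_1234.py".
AUTO_NAME = re.compile(r"^\d{4}_auto_\d{8}_\d{4}\.py$")


def find_auto_named_migrations(paths: list[str]) -> list[str]:
    """Return the sorted file names of migrations that still use the default auto name."""
    offenders = []
    for raw in paths:
        path = PurePosixPath(raw)
        # Only look at files that live inside a "migrations" package.
        if "migrations" in path.parts and AUTO_NAME.match(path.name):
            offenders.append(path.name)
    return sorted(offenders)


# Hypothetical list of files changed in a PR (e.g. from `git diff --name-only`).
changed = [
    "app/migrations/0001_initial.py",
    "app/migrations/0002_auto_20230118_1234.py",
    "app/migrations/0003_add_enrollment_index.py",
    "app/models.py",
]
print(find_auto_named_migrations(changed))  # ['0002_auto_20230118_1234.py']
```

In CI, a non-empty result could fail the build and prompt the author to regenerate the migration with a descriptive name (Django's makemigrations accepts a --name option for this).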

2023-01-04

  • [John] [inform] Quick inform about indexes with Aurora.

    • John may update

    • Non-blocking index creation with Aurora 2 / MySQL 5.7

  • [Ned] Last plea for conference talk ideas:  

  • [Phil] [inform] In re. finding DB performance information in New Relic, Chris Pappas showed me a couple of these recently! (Led to this. All links for Ecommerce.)