Arch Hours: 2023

Meeting Expectations

Why?

  • Provide an opportunity for generative discussion and ideas.

  • Foster camaraderie through technical curiosity and geekdom.

Who?

  • Open to all edX-ers and Arbisoft-ers

What?

  • At times, these informal discussions result in follow-up action and beneficial change in our technology or in our organization. While this is not a decision-making body, these serendipitous discussions spark ideas that may result in ADRs/OEPs and tickets on team backlogs.

  • At times, it serves as a form of informal office hours to ask live technical questions of the archeological collective.

  • At times, we have pre-planned deep-dive topics that folks propose to gather wide input or to answer questions.

  • At times, we have hosted special guests (internal and external to edX) on specialized topics.

When?

  • Not during the ET lunch hour: with Covid-era remote work, “Arch Lunch” has evolved into “Arch Hour” to accommodate various home/life situations around lunchtime.

How? Live Co-Editing

To work around Confluence’s limit on the number of concurrent editors:

Why not just stick with keeping the notes in the Google Doc?

  • Google Docs are not as discoverable.

  • Google Docs don’t notify observers of future edits.

  • Google Doc comments don’t notify all observers.

How? Structure

Please enter your proposed topics for discussion.
When we use Lean Coffee Style (link1, link2), we vote on which topics the group wants to discuss and time-box each discussion to 10 or 15 min → 5 min (if re-voted) → 5 min (if re-voted).
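The Lean Coffee mechanics above (order by votes, then one 10- or 15-minute time-box followed by at most two 5-minute extensions) can be sketched as follows. This is a minimal illustration only, assuming each topic is a (title, votes) pair; `order_topics` and `timebox_minutes` are hypothetical names, not part of any tool we use.

```python
from typing import List, Tuple

# Per the notes: one initial time-box, then up to two 5-minute extensions.
EXTENSION_MINUTES = 5
MAX_EXTENSIONS = 2


def order_topics(topics: List[Tuple[str, int]]) -> List[str]:
    """Return topic titles ordered by vote count, highest first.

    Ties keep their submission order because Python's sort is stable
    (including with reverse=True).
    """
    return [title for title, _ in sorted(topics, key=lambda t: t[1], reverse=True)]


def timebox_minutes(initial: int, revotes_passed: int) -> int:
    """Total minutes a topic may run, given how many re-votes chose to continue."""
    if initial not in (10, 15):
        raise ValueError("initial time-box is 10 or 15 minutes")
    extensions = min(max(revotes_passed, 0), MAX_EXTENSIONS)
    return initial + extensions * EXTENSION_MINUTES


if __name__ == "__main__":
    agenda = order_topics([("MySQL 8 status", 3), ("Type hints", 5), ("OrbStack", 3)])
    print(agenda)  # ['Type hints', 'MySQL 8 status', 'OrbStack']
    print(timebox_minutes(15, 2))  # 25
```

Under these assumptions a single topic can run at most 25 minutes (15 + 5 + 5), which is worth keeping in mind when deciding how many topics fit in the hour.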

Prefix your topic with your intention so we are clear on what outcome you are striving for from the discussion. Examples:

  • [inform] You are simply seeking to inform the group of this item. You may field clarifying questions from the group, but you are not seeking further discussion at this time.

  • [ideation] You are seeking divergent and wide perspectives from this group. In this brainstorming mode, all ideas are accepted, without critical analysis.

    • It may be helpful to clarify whether you’d like to ideate on the problem space or the solution space.

  • [analysis] You are asking the group to help you poke holes in your idea/topic/plan/etc.

  • [quest] You are seeking information/responses to a question you have.

2023-12-20

2023-12-13

  • [quest] (Dave): How’s the MySQL 8.0 switchover going?

    • Scheduled for 2am tonight

  • [quest] (Dave): What’s a good way to roll out database connection encoding changes to Studio for http://edx.org ? (configuration repo help)

    • Dave to make issue in configuration repo (?) to track this.

    • This may be overridable in edx-internal (which would allow for faster rollback)

    • Jeremy will see if we want to turn on Issues there

  • High-level, external-safe discussion of recent 2U staff meetings

  • [ideation] (Jeremy) 2U executive management has been talking a lot about the “Open edX maintenance burden”.  Where are teams feeling this, and how much time is it taking?  What parts of it would we actually be comfortable handing over to other organizations to handle?

    • Fixing bugs?

    • Merging dependency upgrade PRs?

    • Big framework upgrades?

    • Roadmap decisions?

    • Reviewing changes from outside the core owning/maintaining team?

    • Deprecating stuff that’s no longer useful?

    • Building extension points so optional features can be added without being added for everyone?

  • [quest] (Jeremy) Are there more things like the Insights stack that are effectively only used by 2U, and should be deprecated as far as Open edX is concerned?

    • And how should we proactively identify things like this moving forward?

    • (Andy) As of a year or so ago, there was at least one other Insights user

  • [quest] How deep are architecture vendor commitments?  What failover features are there?

2023-12-06

  • [Ned] What is bad about installing dependencies from GitHub?

  • [Jeremy] What would people want to see from an Open edX maintenance working group?

    • Expertise about how to use Dependabot and Renovate

    • “Error budget” for teams is useful

    • Test suites that are comprehensive enough and fast enough to run

    • Standard way to avoid trying to upgrade to a known-broken release again somewhere else

    • Clear path from identification of unwanted dependencies to deprecation of them

    • Path to consistently applying new good code patterns across our codebase

  • [Jeremy] What are leading causes of delayed/lost PR reviews on your teams?

    • [Andy/Cosmonauts] Big, intimidating PRs and lack of context after an ownership transfer

    • [Chris] Variety of different reasons

    • [Hilary] PRs that span ownership boundaries

    • [Andy] High/unclear level of responsibility from approving a PR

2023-11-30

  • [inform] (Dave) Submit Open edX conference talks!

  • [AS][Wild Speculation] Are we in a startup runway situation now?

    • Should we consider more extreme short-term measures than we normally would, to minimize operating costs in the meantime?

    • Axim is coordinating a pile of funded contribution (FC) projects to make various improvements already; not a lot of bandwidth for new feature requests right now.

    • Spent a lot of time discussing this; hard to distill it into key points

2023-11-15

  • [inform] (Jeremy)  

  • [musings/questions] (Ned) Mapping people/squads/repos

  • [Question] (Hilary) If we’re trying to design a new DB for a model, who is the best person/people to talk to?

    • Solution Review

    • Dave Ormsbee at Axim

    • Many principal/staff engineers

    • No real DBAs, but…

    • At least for a while, we have part-time contract DBA via Percona

    • Also, consult  

  • [inform] (Jeremy) Socket.dev - tool for tracking dependency health 

  • [quest] (Jeremy) Awareness of Django’s transaction.on_commit() and similar tools which are incredibly useful in the right circumstances.  How do we make sure developers are aware of / informed of these when appropriate?

    • New “everything about concurrency” guide?

      • Celery

      • Transactions

      • Event bus

      • Django async

  • [inform] (Alex) Enterprise likes drf-spectacular
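To make the `transaction.on_commit()` point above concrete: Django runs an on_commit callback only after the surrounding transaction commits, and drops it if the transaction rolls back, which is what you want before enqueuing a Celery task or publishing an event about a row that other processes must be able to read. The toy analog below reproduces those semantics with stdlib `sqlite3` so it runs without Django; the `Transaction` class and its `on_commit` method are hypothetical names and this is not Django's implementation.

```python
import sqlite3
from typing import Callable, List


class Transaction:
    """Toy stand-in for Django's atomic() + on_commit(), built on sqlite3.

    Callbacks registered inside the block run only after COMMIT succeeds;
    if the block raises, the transaction is rolled back and they are discarded.
    """

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn
        self._callbacks: List[Callable[[], None]] = []

    def on_commit(self, func: Callable[[], None]) -> None:
        self._callbacks.append(func)

    def __enter__(self) -> "Transaction":
        self.conn.execute("BEGIN")
        return self

    def __exit__(self, exc_type, exc, tb) -> bool:
        if exc_type is not None:
            self.conn.execute("ROLLBACK")
            self._callbacks.clear()  # the side effects never happen
            return False  # let the original exception propagate
        self.conn.execute("COMMIT")
        callbacks, self._callbacks = self._callbacks, []
        for func in callbacks:  # the data is now committed and visible
            func()
        return False


if __name__ == "__main__":
    # isolation_level=None lets us issue BEGIN/COMMIT/ROLLBACK ourselves.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("CREATE TABLE enrollment (user_id INTEGER, course TEXT)")
    sent: List[str] = []

    with Transaction(conn) as txn:
        conn.execute("INSERT INTO enrollment VALUES (1, 'demo')")
        txn.on_commit(lambda: sent.append("email user 1"))

    try:
        with Transaction(conn) as txn:
            conn.execute("INSERT INTO enrollment VALUES (2, 'demo')")
            txn.on_commit(lambda: sent.append("email user 2"))
            raise RuntimeError("payment failed")
    except RuntimeError:
        pass

    print(sent)  # ['email user 1']
```

The rolled-back enrollment never triggers its email, which is the class of bug (side effects for data that never committed) that a concurrency guide could warn about.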

2023-11-08

  • [inform] (Dave) New Relic is a great resource!

  • [quest] (Dave) There are multiple places where it would be beneficial to add indexes to large tables (~10-100M+ rows?). I don’t think they’ll lock anything, but would the length of time it takes to run the migration be disruptive to the release process? What’s a good path forward so we don’t take edx.org by surprise?

    • Please hold off until MySQL 8.0 update has completed.

    • Long term: Need to do something about partitioning CSM.

  • [ideation] (Dave) Serving assets with Caddy (and maybe nginx?) via X-Accel-Redirect.

    • https://pypi.org/project/django-sendfile2/  

      • Dave: FWIW, I wasn’t planning to use this. I think folks have standardized on the header since this was written, it doesn’t seem like it’s completely up to date, and I’m planning to do a lot of header tweaking anyhow.

    • Can make the container run Caddy and have that work in devstack as well.

    • Actual nginx configuration still happens via the configuration repo–can talk to TNL for testing help.

  • [question] (Ned) Why are there so many old Renovate pull requests open?

    • 86 created before 2023: https://github.com/pulls?q=is%3Apr+is%3Aopen+author%3Aapp%2Frenovate+org%3Aopenedx+created%3A%3C2023-01-01

  • [question] (Jeff) Any special security considerations re browser extensions?

  • [question] (Jeff) AGPL / borrowing Canvas code concerns?

  • [question] (Jeff) How best to query for the presence of SRT captions files for videos?
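For the X-Accel-Redirect idea above: the app only authorizes the request and returns an empty response carrying an `X-Accel-Redirect` header naming an internal path; the proxy then serves the file itself, so Python never streams the bytes. Below is a minimal sketch as a plain WSGI callable (not Django, so it runs without dependencies). The `/assets/` and `/protected/` prefixes, the `is_authorized` placeholder, and the `HTTP_X_DEMO_USER` header are all hypothetical; on the nginx side this assumes a matching location marked `internal`, and equivalent Caddy configuration is not shown.

```python
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical layout: nginx maps INTERNAL_PREFIX to the asset directory in a
# location marked `internal`, so clients cannot request it directly.
PUBLIC_PREFIX = "/assets/"
INTERNAL_PREFIX = "/protected/"


def is_authorized(environ: Dict[str, str]) -> bool:
    """Placeholder check; a real app would look at the session or user."""
    return bool(environ.get("HTTP_X_DEMO_USER"))


def asset_app(environ: Dict[str, str], start_response: Callable) -> Iterable[bytes]:
    """WSGI app that authorizes a request and hands file serving to the proxy."""
    path = environ.get("PATH_INFO", "")
    if not path.startswith(PUBLIC_PREFIX) or ".." in path.split("/"):
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"not found"]
    if not is_authorized(environ):
        start_response("403 Forbidden", [("Content-Type", "text/plain")])
        return [b"forbidden"]
    relative = path[len(PUBLIC_PREFIX):]
    headers: List[Tuple[str, str]] = [
        # The proxy intercepts this header and serves the file from the
        # internal location instead of relaying our empty body.
        ("X-Accel-Redirect", INTERNAL_PREFIX + relative),
    ]
    start_response("200 OK", headers)
    return [b""]
```

This could be run locally with `wsgiref.simple_server.make_server`, but without nginx (or another proxy that honors the header) in front, a client would just see an empty 200; the header is the whole contract between app and proxy, which is why it works the same in devstack and production once the proxy is configured.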

2023-11-01

  • [inform] (Ned) GitHub pull request labels can make Jira issues

    • Get in touch with Ned if you want to enable this for specific repo/project pairs

    • TNL & Aperture are testing it out

  • [inform] (Andy) similar to what is going on with edx-search, what’s going on with enrollments

    • Just edit it if it’s wrong 🙂

  • [quest] (Ned) Where are we on Cypress testing?

    • Have functional (although sparse) e2e smoke test suite; wasn’t too much easier to set up than bok-choy, but does seem to be easier to maintain

    • There have been mentions of expanding the smoke tests and/or using it for a11y CI checks, but no concrete plans yet

    • Requires JS experience and a review of the “lessons learned” docs we have so far

  • [quest] (Jeremy) Do people think Conventional Comments are a good idea?

  • [ideation] (Jeremy) Mitigating pressure for 2U & Open edX code to diverge

    • 2U concerns

      • Security of releases

        • Including regulatory compliance

      • Ability to deploy changes quickly

      • Extra deprecation overhead (relatively minor point)

    • Axim/Open edX concerns

      • Codebase clean of 2U-specific cruft

    • Shared worries

      • Time needed to reconcile divergent branches

      • Risk of permanent divergence resulting from logistics rather than intent/benefit

    • Ideas

      • Try to make sure that core committer merge process satisfies core regulatory requirements: require at least one non-author approver, have at least a PR (if not a separate issue) to track the rationale for each change, etc. (this may also benefit companies other than 2U in satisfying regulatory needs)

      • If necessary, use fast-forward-only release branches for edx.org that lag main/master just enough to perform any mandatory review for commits from non-2U-employees/contractors (but do we really need this, given that we don’t do it for other dependencies?)

      • Document and expand the range of options for quickly getting changes to production via config changes, extension points, temporary forks, etc. that don’t require putting 2U-specific code in Open edX
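The first idea above (at least one non-author approver, plus a PR that records the rationale) is mechanical enough to audit automatically over merged PRs. GitHub already prevents authors from approving their own PRs, so a check like this matters mainly for producing an audit record. Below is a minimal sketch over hypothetical PR metadata; the `PullRequest` shape and the "non-empty description counts as rationale" rule are assumptions, not an agreed policy.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PullRequest:
    """Hypothetical minimal view of a merged PR (e.g. assembled from the GitHub API)."""
    number: int
    author: str
    approvers: List[str] = field(default_factory=list)
    description: str = ""


def compliance_problems(pr: PullRequest) -> List[str]:
    """Return the reasons this PR would fail the proposed merge policy (empty = OK)."""
    problems: List[str] = []
    independent = {a for a in pr.approvers if a != pr.author}
    if not independent:
        problems.append("no approval from someone other than the author")
    if not pr.description.strip():
        problems.append("no rationale recorded in the PR description")
    return problems
```

Returning a list of reasons rather than a boolean makes the output usable directly in a periodic compliance report.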

2023-10-25

Skipped so 2U developers could stay focused on Innovation Week (hackathon).

2023-10-18

  • [inform] (Ned) Innovation Week!!1!

  • [quest] (Jeremy) Does anyone have a security/vulnerability scanner they would recommend?  Dependabot can’t track both main/master and a release branch.

  • [informish] (Andy) edx-search research (also 2U-specific); maybe we should do more of this. Looking at enrollments now.

  • [Question] CourseOverview has an org string field, not an FK to Organization. Any insight into the architectural reasons for this?

    • CourseOverview predates the organization table, and is essentially a cache of data in MongoDB

2023-10-11

2023-10-04

  • [quest] (Dave) How is the MySQL 8 upgrade coming along?

    • (Jeremy) Largely done except for prod LMS, which is rerunning a days-long mitigation schema change

    • (Jeremy) We just switched devstack over

    • (Jeremy) Django 4.2 LMS upgrade involves password hash change, Arbi-BOM discussing options with BTR

  • [quest] (Dave) Is 2U using OrbStack now?

    • (Jeremy) Several individuals are, still going through vendor review for broader adoption

  • [quest] (Jeremy) Does anybody have thoughts on gaps in Kubernetes for use as a dev environment?

    • (John) Test data is needed regardless of which dev environment we go with

2023-09-27

  • [inform] (Ned) 2U has signed the Axim non-technical contribution agreement.

  • [inform] (Ned) Next 2U hackathon (“Innovation Week”) is Oct 23-25

  • [inform] (Jeremy) 2U vendor review triggers & consequences

    • Install things in a virtual machine with no VPN access to test them?

  • [quest] (Jeremy) What technology/tool do you think would best help you get good work done, but we just aren’t set up to use it yet?

    • (Matt) AI coding assistants?

      • (Jeremy) There are interesting and unanswered questions around inadvertent copyright infringement and copyrightability of the generated code

        • (Andy) Solvable via custom LLM trained on our own code?

      • (Hilary) Would this exacerbate the problem of not having enough people with deep knowledge of different parts of our systems?

      • (Matt) Maybe this will help us get knowledge bases that actually work

    • (Andy) OrbStack!

    • (Andy) Type hinting?  TypeScript?

    • (Hilary) prop-types in JavaScript?

    • (Matt) Cloudflare AI tools

2023-09-20

  • [Ned] Axim would like to remove admin access to repos (this is different from the write access topic I mentioned last week).  What concerns do people have about that?

    • [John] Is it likely that we’ll end up with a bunch of edx-org forks as a result of this?

      • [Ned] Feels unlikely at this point, lots of extra work

      • [Ned] More likely that we start using personal forks more often

    • [Ned] When do people really want admin access, anyway?

      • [Jeremy] Updating branch protection (required CI checks)

      • [Andy] Initial repo setup

  • [David] 2U Internal Marketplace update

    • [Jeff] Will the frontend be using Paragon or something else?

    • We think we’ll still have something like commerce coordinator, but it may have a more limited, appropriately scoped API than originally envisioned

    • [Jeremy] Arbi-BOM noted that we’ve been wanting to get rid of django-oscar & ecommerce for 2 years, now needing to upgrade them again for another Django upgrade

  • [quest] (Hilary) API access

2023-09-13

  • [inform] (Hilary) OEP-66 PR up for review

    • Includes the relevant bits of OEP-9, drops or updates others

    • Bridgekeeper vs. rules

  • [quest] (Jeremy) Has anyone worked with Kompose or any other tools for 1) migrating from docker-compose to k8s or 2) allowing deployment of custom subsets of related microservices in k8s?

    • Not in this crowd

  • [question] (Mat) Infinity is looking to understand whether Notifications belongs in edx or openedx. There hasn’t been much clarity on where this feature should eventually land.

  • [question] (Dave): The cut-off for Quince is coming soon (October 9th). Will we be on Django 4.2 for edx-platform? MySQL 8.0?

    • Devstack should be switching over to MySQL 8 within a week

    • Code is largely ready for Django 4.2, last PRs are being finalized and merged

    • Trying to get edx.org on MySQL 8 before updating the default requirements

    • If we miss end of September for that, we’ll try to make Django 4.2 the default with alternate 3.2-using requirements files for edx.org to use (annoying to implement in edx.org deployment pipelines)

    • BTW:  

  • [inform] (Ned) Long-term, we are going to reduce 2U access to all repos and give teams write access to the repos they need.

  • [request] (Ned) Please educate new devs about the public aspects of much of our code.

    • Branch naming

    • Avoid links to private data

2023-09-06

  • [quest] (Robert) It seems the LMS auth_user table doesn’t have a modified date.

    • Context: https://twou.slack.com/archives/C04ACDVM6A1/p1694010936917439?thread_ts=1694010901.764199&cid=C04ACDVM6A1  

    • Is there a way to determine this?

    • If not, what’s the best way to resolve this at our scale (at least going forward)?

    • Django offers support for custom user tables, but edx-platform predates the feature, and it would be a fairly painful switch at this point

    • We send tracking logs on most field changes, but not all

      • first_name and last_name are ignored, for example

    • How about django-simple-history?

      • Would give us a full audit record moving forward, but likely to be massive (and probably overkill)

    • How do post_save and post_commit interact?

  • [quest] (Deborah) Is there a documented general principle about what log level to use?

    • https://docs.python.org/3/library/logging.html#levels

    • Different teams have logs in Splunk, New Relic, and/or Datadog

    • If we did a major change of log formatting, we’d probably want to talk to the community first

    • We should probably have some kind of OEP/ADR for Python logging in general

  • [inform/quest] (Robert) How can we quickly determine the actual impact of a category of error?

    • Number of users impacted, scope of impact for each user, etc.

    • Starting to discuss with Data Engineering

    • https://onenr.io/0kjnpPZ56wo

      • Can this be tied to course completion?

    • May be a good case study for engineering noticing a problem and trying to make a call on how much work to put into getting resources allocated to fixing it
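For the log-level question above: Python's standard levels are DEBUG (10), INFO (20), WARNING (30), ERROR (40), and CRITICAL (50), and a logger drops records below its configured level. The sketch below shows that filtering with a small in-memory handler. The per-level guidance in the comments is one possible convention we might capture in the proposed OEP/ADR, not an agreed policy; the logger name and messages are made up.

```python
import logging
from typing import List


class ListHandler(logging.Handler):
    """Collects formatted records in memory so the filtering is easy to see."""

    def __init__(self) -> None:
        super().__init__()
        self.lines: List[str] = []

    def emit(self, record: logging.LogRecord) -> None:
        self.lines.append(f"{record.levelname}: {record.getMessage()}")


# One possible convention (an assumption, not an agreed 2U/Open edX policy):
#   DEBUG    - developer detail, normally off in production
#   INFO     - routine, expected events worth counting
#   WARNING  - something unexpected, but the request still succeeded
#   ERROR    - the operation failed and someone may need to act
#   CRITICAL - the service itself is unhealthy
log = logging.getLogger("demo.enrollments")
handler = ListHandler()
log.addHandler(handler)
log.propagate = False  # keep the demo output out of the root logger
log.setLevel(logging.INFO)  # records below INFO are dropped

log.debug("cache miss for course %s", "course-v1:demo")
log.info("enrolled user %s", 42)
log.warning("email bounce for user %s", 42)
log.error("payment service returned %s", 503)

print(handler.lines)
# ['INFO: enrolled user 42', 'WARNING: email bounce for user 42', 'ERROR: payment service returned 503']
print([logging.getLevelName(n) for n in (10, 20, 30, 40, 50)])
# ['DEBUG', 'INFO', 'WARNING', 'ERROR', 'CRITICAL']
```

Because Splunk, New Relic, and Datadog can all filter on level, agreeing on what each level means probably matters more than the log format itself.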

2023-08-30

  • [inform] (Jeremy) We are seeking good topics for the edX engineering blog and/or volunteers to write about them

  • [quest] (Jeremy) Should we consider piloting structlog somewhere?

  • [quest] (Phil) Does anyone know anything about Next.js?

    • Phil:

      • I know relatively little about Next.js; any advice? Have we used it at edX before? Does it work well with our current stack? Are there any good learning resources out there?

      • Looking at: django-nextjs, a Django app plugin to enable a Django IDA to serve Next.js server-side code [blog post]

      • Apparently https://pypi.org/project/django-nextjs/ exists to allow Django templates and Next.js templates to co-exist

      • (Jeremy) FED-BOM did a little discovery on this and related tooling in https://github.com/openedx/wg-frontend/issues/126 (we decided at the time to punt until some of the alternatives mature a bit)

    • Jeremy: Related issue: https://github.com/openedx/wg-frontend/issues/126

    • This is more of a question for Frontend WG/Paragon WG.

    • On the open-source side, OEP-11 is the guiding document that may need amendment for Next.js.

  • [quest] (Jeremy) In light of Deciphering Glyph :: Get Your Mac Python From Python.org, should we make any updates to recommendations for installing Python on macOS?

    • (flame): pyenv!

    • Current documentation: 

    • Summary: no real need to change anything here, either pyenv or official installers should work for most people at 2U / working on Open edX

  • [quest] (Dave) Any thoughts on porting forums service to a Django app? (I know there’s Infinity discovery around this, but I’m curious if others had also looked at this problem, or if there were other discussions not captured in those docs.) Context is that Axim is considering funding something here.

    • We’d love to see this get done

    • Diana: concern about data migration

      • Dave: Phase 1 would keep the data in place and start moving towards removing the Ruby aspect of things. Any potential data migration would happen after that.

    • Would alleviate (admittedly modest) security concerns around Sinatra and dependency gems

    • Would greatly reduce barriers to making forums enhancements

  • [quest] (Jeff) MathJax version 3 OSPR received; could we roll this out course-by-course?

    • Also, native browser rendering or MathCAT may be preferable in some cases

    • (Dave) Maybe go to the Product Working Group to plan out the per-course rollout capability?

  • [quest] (Jeremy) Dev environment direction

    • (Hilary) Think there will still be a need for local long term

      • Intermittent internet connections, for example

    • (Andy) There are some jury-rigged AI configurations that are easier to set up locally

      • But default to remote, fall back to local may be where we want to end up

    • (aside) Open edX is featured on OrbStack’s website! https://docs.orbstack.dev/benchmarks

  • [quest] (Hilary) How do people set up new large Open edX sites?

    • Mostly custom Terraform and such, sometimes based off of an AWS solution template

    • Although Harmony is a new Kubernetes-based solution for this
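For the course-by-course MathJax 3 rollout question above, the core logic is a per-course switch with explicit opt-ins, explicit opt-outs, and optionally a stable percentage bucket for everyone else. edx-platform's course-scoped waffle flags (`CourseWaffleFlag`) are probably the real mechanism to reach for; this sketch only illustrates the precedence and bucketing, and the `CourseRollout` class, its fields, and the precedence order are assumptions.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Set


@dataclass
class CourseRollout:
    """Hypothetical per-course switch for a new renderer (e.g. MathJax 3).

    Precedence: explicit opt-out > explicit opt-in > percentage bucket.
    """
    enabled_courses: Set[str] = field(default_factory=set)
    disabled_courses: Set[str] = field(default_factory=set)
    percent: int = 0  # 0-100: share of remaining courses to enable

    def is_enabled(self, course_key: str) -> bool:
        if course_key in self.disabled_courses:
            return False
        if course_key in self.enabled_courses:
            return True
        # Stable hash so a course lands in the same bucket in every process
        # (Python's built-in hash() of str is randomized per process).
        digest = hashlib.sha256(course_key.encode("utf-8")).digest()
        bucket = int.from_bytes(digest[:2], "big") % 100
        return bucket < self.percent
```

Letting an opt-out win over everything gives course teams an escape hatch if native rendering or MathJax 3 breaks a specific course.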

2023-08-23

  • [analysis] (Jeremy) Docker Desktop replacement

  • [quest] (Ned) Why are we OK with a 2-hour deployment pipeline?

    • https://en.wikipedia.org/wiki/Boiling_frog

    • (Andy) It’s worse than that, it’s a 2 hour nondeterministic pipeline

    • (Phil) From GoCD edxapp statistics: 45+20+1+1+15+(2+3+7)=94 minutes

      • The 45 minutes (half the duration) is building the AMI

      • Each number is the average duration of a pipeline step.

      • Steps in parentheses happen after the build is available on prod.

    • We don’t have a good basis of comparing even between different pipelines within 2U

    • (Jeremy) We don’t want to continue using GoCD in the long term, which leads to debate on the value of optimizing it (vs. doing work to switch to Argo CD instead)

    • (Jeremy) There are several parallel efforts to reduce edx-platform build time, but the time to value delivery is long doing it that way.  Maybe we should concentrate our efforts a little better for incremental value delivery?

  • [quest] (Alex D.) Any patterns that folks like for data replication between services?

  • [quest] (Ned) What kinds of informal education are useful for developers?

    • High-level block diagram (context/container from C4)

    • Architectural onboarding has fallen by the wayside

    • How code is organized (mono-repo and otherwise)

    • Migrations, what they are and how they go wrong

    • Celery

    • Tour of a new IDA Makefile

    • What counts as “core”

  • [quest] (Adam) How do we get better at either smoke tests or health checks so that we can make big changes to infrastructure more confidently and detect things before we ship bugs to prod.
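Splitting Phil’s GoCD edxapp averages from the deployment-pipeline discussion at the point where the build reaches prod (all numbers are from that bullet; only the split is added here). A change takes about 82 minutes to reach prod, and the AMI build alone is just under half of the total:

```latex
\underbrace{45 + 20 + 1 + 1 + 15}_{\text{until the build is on prod}}
  + \underbrace{(2 + 3 + 7)}_{\text{after}}
  = 82 + 12 = 94\ \text{min},
\qquad \frac{45}{94} \approx 0.48
```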

2023-08-16

  • [ideation] (Jeremy) What (if anything) should we do for next-generation automated a11y testing?

    • The QA team is interested in working on this if we want to put effort into it

    • (Jeff) We have something to run axe-core for MFEs, unclear how many MFEs are running it and how often

    • (Jeff) There are services that do automated checks for large sites, but they’re ridiculously expensive

    • (Jeff) Have been trying to pick a tool by the end of the year

    • (Jeff) Also, there’s an external audit in the works

    • Jeremy will connect Jeff and the QA team to try to figure out next steps

  • [quest] (Hilary) - Is anyone interested in being a co-author on an OEP about authz best practices?

  • [quest] (Jeff) - Can you think of any problems if no MathML rendering library is included with Open edX or http://edx.org ? (Chrome and Firefox include MathML rendering now, and I wonder if interactivity might be better as a browser extension for some users)

    • There’s also MathCAT

      • Oh hey, it’s written in Rust

    • Browser support has gotten much better in recent years

    • We haven’t yet done a detailed a11y evaluation of the options

    • Worth looking at using the native browser rendering just for the JS download size savings

    • We spend non-trivial money pushing MathJax bits to everyone on a lot of pages.

  • [question] (Ned) What does a “Cybersecurity review” entail?

    • Came up in the context of being needed for anything being open sourced

    • (Adam) Sometimes involves pentesting or an external security firm review

    • (Hilary) If you don’t have confidence in the answers to the form, you can get help talking through it

      • Brian M and Purva have done an AppSec for Trilogy user account things

  • [question] (Adam) Do operators actually need to upgrade to MySQL 8 by the Quince release?

    • If they need to defer the MySQL 8 upgrade, they’ll defer the Open edX upgrade

    • It’s tricky to do a release that supports installation with either Django 3.2 or 4.2

    • Django 4.2 will generate SQL that just doesn’t work with MySQL 5.7

2023-08-09

2023-07-26

2023-07-19

  • [quest] (Hilary) Where should docs go for Open edX code?

  • [inform] (Ned) We’re starting to plan an October Hackathon

    • If you want to get involved, please join #interest-hackathon-planning

    • (Hilary) For some teams, the theme was a disincentive because the assumption was that projects needed to be on-theme, no matter how many times the contrary was stated

      • Let’s point to the list of projects that were done last time, to illustrate that other topics are actually ok

  • Adjourned early for lack of topics & low attendance

2023-07-12

  • [ideation] (Dave O) Proposal: Make MinIO a part of the default Tutor / Devstack install, and let Django apps and services assume an S3-like interface instead of having to accommodate any django-storages backend–i.e. drop support for storing that data directly on the filesystem.

    • OEP

    • DEPR

    • How to migrate folks away

    • Check on Swift usage/compatibility

    • Seems better than LocalStack, which was way too big for this use case

    • Look into reliability (link)

      • Don’t force MinIO

  • [quest] (Jeremy) Should we configure and enable https://github.com/actions/dependency-review-action ?

    • Have Arbi-BOM try it out, talk to Axim about shared config if it goes well

    • Quick demo of how to find Dependency Review on PRs

  • [quest] (Hilary) Does core functionality belong in the platform or should an IDA be used if the functionality could be considered a complete service?

2023-07-05

  • [rant?] (Andy) We keep getting M1 Macs; how much lost dev time before we invest in them?

    • Ongoing discrepancy between people having no trouble and those having nothing but trouble

    • Frontend is one of the areas some people have hit trouble in

    • Fragmentation in services being used is making reproducing problems hard

    • Projects ongoing - ARM images coming, for example

  • [quest] (Phil) service-to-service testing

  • [quest] (Jeremy B) - adoption next steps

    • We’ll be asking owning teams to start tracking tasks from this board that need attention from them

    • Feedback welcome on whether we should start doing this manually or do some of the proposed automation first

  • [ideation] (David) Reducing risk of deploying edx-platform

    • Requiring reviews on edx-platform - at the very least, an edx.org reviewer is required for an OSPR to go out on code we own within edx-platform?

    • Process for including community members on RCAs - we can have more information on why incidents happened

    • Process for halting incoming commits - should Clamps be extended to restrict the pool of committers to edx-platform?

    • (long-term) Requiring more frequent e2e tests on code shipment - hopefully gives us more confidence to deploy

    • (long-term) Core contributor read-only access to GoCD, or a website built on GoCD’s APIs exposing some info? Mergers are responsible for monitoring build? [eventually leading to a continuous delivery consortium]

  • [quest] (Jeff Witt) How does Transifex import work? Is there a single source of truth?

    • See https://www.frontitude.com/ (the edX Design team may want to start using it)

2023-06-28

  • [analysis] (David) Web notifications vs. edx-ace

    • Not all stakeholders (including the team doing the work) are present, but I could use some insight into whether we think these two are apples and oranges.

    • (John) Braze supports some form of web notifications

    • (John) Braze is handling transactional emails like password resets, not just marketing stuff

    • (John) The thing that we’re missing right now is persistence of the messages that have been sent

    • (Dave) I think Gabe is the only person from the original edx-ace development team who’s still in the ecosystem

    • (Dave) Possibly relevant discussion in #notifications-2023 in the Open edX Slack

  • [analysis] (Jeremy)  

    • (John) PostgreSQL has better offline/concurrent index creation (non-locking)

    • (Dave) Limits on length of indexes, etc. measured in bytes instead of characters

  • Any ~opinions~ on chatbot architecture? (Jeff Witt)

    • The accessibility of most chatbots is pretty poor

    • Has any serious thought gone into the frontend architecture for these chatbots?

      • (Andy) Allie has put some thought into it

    • Are we going to make any attempt to do side-by-side comparisons of the different LLM options?

    • (John) It’s unclear how well any of these work in non-English languages

2023-06-21

2023-06-14

  • [review] (Hilary) Current architecture for roles/permission sets and, time permitting, potential decisions around roles/permission set contexts/scopes

    • Presented a diagram summarizing current understanding; seems largely accurate

    • Robert Raposa is a good person to go to in order to verify some of the points of uncertainty

    • (Jeremy) OEP-9: User Authorization (Permissions) — Open edX Proposals 1.0 documentation may be useful for separating the implementation of a permission check from all the places that need to use it

    • (Phil) Note that at least ecommerce has permissions in Django admin separate from those in edx-platform

    • (Andy) Support are good people to ask what are the biggest problems/points of confusion with the current permissions scheme

    • (Phil) 2U’s Security Working Group & Ben Piscopo have talked in the past about the need for a TA role: some role between learner & instructor that doesn’t grant the waterfall of privileges that comes with being course staff.

    • Future related topic: scopes

  • [ideation] (Jeremy) If you could make one nontrivial change to Open edX to make it better/easier to work with in the future, what would it be?

    • (Dave) Require fewer resources to run - CPU, RAM, storage

      • Run the whole stack on a Raspberry Pi

    • (Alex) Magical observability (beyond New Relic levels)

      • Quickly trace why things happened and what they are

    • (Alex) Have a schema for content (metadata)

    • (Andy) Types!

      • Adding types to Python

      • JS -> TypeScript

      • Hilary: +1

      • (Jeremy) Arbi-BOM would love to work on the Python side of this

      • (Dave) How much of the code do we need to type before this starts becoming useful?

      • (Andy) cookiecutter might actually be the first place, so you get a typed thing when you get started

    • (Dave) [mildly crazy] Run Studio and LMS as one deployable thing (single platform)

      • (to be clear, I haven’t fully thought this through yet. :-P)

      • 2U wants more services, small deployers want fewer services. Is there a possibility of enabling either option for services, rather than forcing one choice that makes some happy and others sad?

        • (Dave) I think it’s possible to have “deploy this subset of apps as a different service” thing… I need to think about it more though. A lot of the simplification advantages of having things in the same place might be negated if we have to do the cross service thing anyway.

    • (Jeff) Test user accounts, so I don’t have to enroll in order to test it with e2e testing systems

      • In order to reproduce the UI as perceived by some users

      • Catalog of possible states

    • (Jeremy) Add support for PostgreSQL (dropping MySQL support optional)

      • Consolidate Elasticsearch and MongoDB into PG full text search & JSONB fields, at least for smaller installations

    • (Phil)

      • Learning: Unit-level table of contents

      • Documentation: Architecture onboarding
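The OEP-9 point from the roles/permissions review above is to define each permission check once, by name, so call sites ask "may this user do X here?" instead of re-implementing role logic. The `rules` and `bridgekeeper` packages provide real versions of this for Django. The stdlib sketch below does not reproduce their APIs: `register`, `has_perm`, the role names, and the permission names are all hypothetical. It also uses the TA-style role raised in the review as an example of a role between learner and course staff.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Set


@dataclass
class User:
    username: str
    roles: Dict[str, Set[str]] = field(default_factory=dict)  # course_id -> role names


@dataclass
class Course:
    course_id: str


Predicate = Callable[[User, Course], bool]
_RULES: Dict[str, Predicate] = {}


def register(name: str) -> Callable[[Predicate], Predicate]:
    """Decorator that records the single implementation of a named permission."""
    def decorator(func: Predicate) -> Predicate:
        if name in _RULES:
            raise ValueError(f"permission {name!r} is already registered")
        _RULES[name] = func
        return func
    return decorator


def has_perm(user: User, name: str, course: Course) -> bool:
    """Call sites ask by name; unknown permissions are denied."""
    rule = _RULES.get(name)
    return rule is not None and rule(user, course)


def _roles(user: User, course: Course) -> Set[str]:
    return user.roles.get(course.course_id, set())


@register("course.view_grades")
def _can_view_grades(user: User, course: Course) -> bool:
    # A TA-style role gets this without inheriting every staff privilege.
    return bool(_roles(user, course) & {"staff", "ta"})


@register("course.edit_settings")
def _can_edit_settings(user: User, course: Course) -> bool:
    return "staff" in _roles(user, course)
```

Because the role-to-permission mapping lives in one place, adding a role such as a TA, or introducing scopes later, changes the registered predicates rather than every call site.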

2023-06-07

  • [quest] (Jeremy) Which software industry news sources are people finding useful, if any?

  • [ideation] (David) How to handle frontend plugin error states when we can’t know whether the contents of an iframe have failed to load.

    • Best idea so far is to use a timeout: assume it failed to load if there isn’t a load-successful event within x time of the load-attempt event

  • [ideation] (David) Engineering and Architecture Onboarding information architecture

    • Came up because interns start next week

    • Some folks have made their own versions, apparently?

      • … can we find out who without judgment?

    • Onboarding IA

      • Dev environment setup

      • Culture

      • General tech learning resources

      • Tools

      • Separate out ‘meta onboarding’ for managers and people who are onboarding new hires

      • Glossary of… “domains”

        • Devstack (“if you’re a search and discover engineer, ignore everything about devstack”, for instance)

        • Terraform

        • Ansible

      • Doc categorizations - are these applicable to onboarding docs?

        • Explanations

        • How-Tos

        • Tutorials

        • Reference

    • People onboarding a new hire get some coaching

    • How to find stuff?

    • Goals

      • Set small goals for the person onboarding that are appropriate for your team

      • “How can I get this person to commit a production change on their first day?”

      • “Getting devstack running on your first day is a win”

    • https://github.com/orgs/openedx/projects/15/views/1  

    • [Jeremy] Who should be in charge of making onboarding better?

      • (John) Feels like it should be a competency for engineering management, maybe even explicitly called out in the career pathway

      • (David) I volunteered to handle some of the docs side of this

    • (John) It would be awesome to have more onboarding material as courses hosted on Open edX

  • [quest] (Jeff) What’s up with the new video player project?
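For David's iframe error-state item in this week's notes, the timeout approach can be sketched as follows. This is a language-agnostic illustration written with Python's asyncio; the real code would live in the frontend. The `wait_for_plugin_load` helper and the timeout values are hypothetical. The `asyncio.Event` stands in for the "load successful" event, and the load-attempt event is the moment waiting starts.

```python
import asyncio


async def wait_for_plugin_load(loaded: asyncio.Event, timeout_s: float = 5.0) -> bool:
    """Return True if the plugin signalled a successful load within timeout_s.

    If no success signal arrives in time, assume the iframe failed to load.
    """
    try:
        await asyncio.wait_for(loaded.wait(), timeout=timeout_s)
        return True
    except asyncio.TimeoutError:
        return False


async def demo() -> tuple:
    # Case 1: the plugin reports success after 10 ms, well inside the timeout.
    ok_event = asyncio.Event()
    asyncio.get_running_loop().call_later(0.01, ok_event.set)
    ok = await wait_for_plugin_load(ok_event, timeout_s=0.5)

    # Case 2: the plugin never reports success, so it is treated as failed.
    silent_event = asyncio.Event()
    failed = await wait_for_plugin_load(silent_event, timeout_s=0.05)
    return ok, failed


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

A timeout cannot tell a slow load from a failed one. The threshold is therefore a trade-off between false "failed" states on slow connections and a long wait before showing an error.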

2023-05-31

  • [Ned] Draft policy for granting write access to openedx org repos:  

  • [John] [inform] Adam Stankiewicz and I are demonstrating non-devstack MFE development (using stage auth and apis) at tomorrow’s FedEX meeting

  • [inform/quest] (Jeremy) Pact contract testing interest revival


    • Frontend WG considering it for testing MFE interaction with backends

    • Its stub server could also be used to mock backends for MFE development

    • (David) It feels like course-discovery could really use this

    • (David) It's easier than it looks at first glance, but you really need to try using it to understand it

  • [ideation] (David) Is there a good way to structure a Django app plugin (or something) such that an operator could choose to run it in edx-platform OR as a separate service?

    • (Jeremy) Seems feasible with a minimal microservice wrapper to install the app into, especially if it communicates with the rest of edx-platform via signals (which could be transmitted over the event bus if needed) instead of direct Python API calls

  • [analysis] (Jeremy) Development metrics and tools for collecting them

  • [quest] (Jeremy) Devstack & Tutor pain points

    • (Hilary) Mismatch between Confluence & Slack & verbal advice

      • Especially local vs. hosted devstack

    • (Hilary) Docker Desktop now asks you to confirm that your company has a license for you, which was a significant speed bump for installation

      • Maybe we should accelerate switching to OrbStack or Minikube or Rancher Desktop, etc.
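Jeremy's suggestion for David's plugin question was for the app to communicate via signals rather than direct Python API calls, so the same code can run in-process or as a separate service. That idea might look roughly like the pure-Python sketch below, which has no Django dependency. `SignalBus`, `InProcessTransport`, `QueuedTransport`, and the `course_published` signal are hypothetical stand-ins for Django signals and the event bus, not real edx-platform APIs.

```python
import json
from collections import defaultdict
from typing import Callable, Protocol

Handler = Callable[[dict], None]


class Transport(Protocol):
    """Anything that can carry a named signal and its payload to receivers."""

    def send(self, signal: str, payload: dict) -> None: ...


class SignalBus:
    """Routes named signals to registered receivers (stand-in for Django signals)."""

    def __init__(self) -> None:
        self.receivers = defaultdict(list)

    def connect(self, signal: str, handler: Handler) -> None:
        self.receivers[signal].append(handler)

    def deliver(self, signal: str, payload: dict) -> None:
        for handler in self.receivers[signal]:
            handler(payload)


class InProcessTransport:
    """Plugin installed in edx-platform: deliver synchronously, in the same process."""

    def __init__(self, bus: SignalBus) -> None:
        self.bus = bus

    def send(self, signal: str, payload: dict) -> None:
        self.bus.deliver(signal, payload)


class QueuedTransport:
    """Plugin running as a separate service: serialize and enqueue (stand-in for the event bus).

    A plain list plays the role of the broker; a consumer in the other
    service would call drain() to deliver messages to its own receivers.
    """

    def __init__(self) -> None:
        self.queue: list = []

    def send(self, signal: str, payload: dict) -> None:
        self.queue.append(json.dumps({"signal": signal, "payload": payload}))

    def drain(self, bus: SignalBus) -> None:
        while self.queue:
            message = json.loads(self.queue.pop(0))
            bus.deliver(message["signal"], message["payload"])


def publish_course(transport: Transport, course_id: str) -> None:
    """Core platform code: emits a signal without knowing where the plugin runs."""
    transport.send("course_published", {"course_id": course_id})


def demo() -> list:
    received = []
    plugin_bus = SignalBus()
    plugin_bus.connect("course_published", lambda p: received.append(p["course_id"]))

    # Operator choice A: the plugin runs inside edx-platform.
    publish_course(InProcessTransport(plugin_bus), "course-v1:A+B+C")

    # Operator choice B: the plugin runs in its own service, fed from a queue.
    queued = QueuedTransport()
    publish_course(queued, "course-v1:X+Y+Z")
    queued.drain(plugin_bus)
    return received


if __name__ == "__main__":
    print(demo())
```

One design consequence is that payloads must stay JSON-serializable so the separate-service path works. That rules out passing model instances through the signal.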

2023-05-24

  • [ideation] (David J) Next steps for this meeting: announcements when it's starting, perhaps? That seems to work well for other recurring optional meetings.

  • [ideation] (Jeremy B) I'd like to enhance the console repo health dashboard I started during the Hackathon so it helps teams identify high-priority tech debt. Any nominations for patterns/issues that should be especially prioritized?

  • [quest] (Jeremy B) Has anybody looked at JSON5? It might be a viable option when we're considering YAML over JSON just to get support for comments.

    • (Chris) I just add “comment” fields to my JSON

    • Sounds like none of us have a burning need for this

    • We’ve come to terms with YAML, and it’s imposed on us by our software choices in many cases

  • [question] (Ned) I’m casting a wide net to find out what education people need about Open edX / open source / Axim / etc: https://twou.slack.com/archives/C04847T6QNQ/p1684505932705279  

    • If someone asks you a question along these lines, capture the answer

    • Have we linked to the docs we have for this in places where people who have these questions can find them?

  • [inform] (Dave) Axim has created a new GitHub org (aximcollaborative), and Axim-specific repos will be moved there.

  • [quest] (Jeremy) Does “platform engineering” sound substantially different from SRE to others?
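Chris's workaround for JSON's lack of comments (adding "comment" fields) can be sketched with the standard `json` module. The `_comment` key name and the `strip_comments` helper are illustrative conventions, not an established standard. Every consumer has to agree to ignore those keys.

```python
import json

CONFIG_TEXT = """
{
  "_comment": "Feature flags for a hypothetical example service",
  "ENABLE_FOO": true,
  "limits": {
    "_comment": "Requests per minute",
    "rate": 60
  }
}
"""


def strip_comments(value):
    """Recursively drop keys named "_comment" from parsed JSON data."""
    if isinstance(value, dict):
        return {k: strip_comments(v) for k, v in value.items() if k != "_comment"}
    if isinstance(value, list):
        return [strip_comments(item) for item in value]
    return value


config = strip_comments(json.loads(CONFIG_TEXT))
print(config)  # {'ENABLE_FOO': True, 'limits': {'rate': 60}}
```

Unlike real comments in JSON5 or YAML, these "comments" are data. They survive parsing and re-serialization, which is occasionally useful and occasionally surprising.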

2023-05-17

  • ****[John] Any updates on the future of devstack?

    • Hosted devstack vs. Tutor vs. off-the-shelf cloud dev env tooling

    • Need to set a budget for hosted devstack

    • We want to experiment with Tutor + Devspaces/Okteto/other

    • Want to start building arm64 versions of the new non-Ansible devstack images

    • Provisioning is a pain point

      • Still being on MySQL 5.7 is making this worse

      • SRE and AWS are working on it

    • We still owe the NPS survey results

    • Shortest path to improving the status quo?

      • Build arm64 images & upgrade MySQL

      • Improve the DB cache

  • ***[quest] (Alie L) Has anybody worked with https://github.com/openedx/edx-rbac before? I'm trying to understand whether it is a viable option for getting/reading course staff roles from the LMS into an IDA (specifically the special exams IDA)

  • ***[John] [a deliberately provocative question] Does React make us faster or slower?

    • [John] https://www.radicalsimpli.city/

    • [John] It’s pretty complex; is it worth the complexity?

    • [David] Yes.

    • [Diana] Seems to be easier to work with than what we had before (Backbone/Underscore/jQuery)

    • [David] Part of it is trying to use what most other people are using successfully

    • [Jeremy] Some of the complexity and framework churn over time is due to site performance constraints - size and load time

    • [David] Arguably the MFE framework is too simple - too much divergence between individual MFEs

    • [Robert] A lot of this depends on “what are we comparing React to as an alternative?”

    • [Ned] Simplicity of the framework vs. simplicity of the code written using it

    • [Jeremy] If we want a concrete alternative to consider, I’d nominate Svelte.  Much smaller browser footprint, intelligent compiler, and very liked by its users.  But not nearly as widely adopted as React yet.

  • * [quest] (Dave O) What charset and collation do edX's databases use now, and are there any plans to change them anytime soon? (I'd like to get everyone on utf8mb4 and more modern collations, but I don't know what the state of things is for the big installs, or how painful the migration is.)

    • [Andy] I think it may vary across DBs, we had to look at it for some of the insights work. Alison Langston may remember more.

      • [Alie] Looked into this briefly for insights data pipeline work, because we ran into a bug getting specific Unicode characters through the data pipeline. It turned out to be a MySQL bug that had not yet been patched in the Aurora DB we were using (so not related to collation).

    • [Jeremy] It’s planned out, but not yet executed (still on utf8).  Blocked on the MySQL 8 upgrade, I think because 5.7 doesn’t have the collation we want.

  • [inform] (Andy) Pushing to open-source most of the summary AI work; we'll see how it goes

  • [inform] (Jeremy B) Another Thoughtworks Tech Radar is out.  Does anyone want to review and discuss, either now or after time to read it?

  • [inform] (John) edx-rest-api-client now automatically forwards request ID headers, for traceability purposes

  • [quest] (Jeremy B) Does https://roadmap.sh/ look like it could help smooth out the learning curve for our tech stack?  (The content and/or the tooling for creating such skill paths.)
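Some background for Dave's charset question. MySQL's `utf8` character set is an alias for `utf8mb3`, which stores at most 3 bytes per character. It therefore cannot hold characters outside the Basic Multilingual Plane, such as most emoji, which need 4 bytes in UTF-8. `utf8mb4` can hold them. The hypothetical helpers below show which strings would not fit in a `utf8mb3` column.

```python
def needs_utf8mb4(text: str) -> bool:
    """True if any character lies outside the Basic Multilingual Plane.

    Those are exactly the characters that take 4 bytes in UTF-8, so they
    cannot be stored in a MySQL utf8 (utf8mb3) column.
    """
    return any(ord(ch) > 0xFFFF for ch in text)


def max_utf8_bytes(text: str) -> int:
    """Largest UTF-8 encoded size of any single character in text."""
    return max((len(ch.encode("utf-8")) for ch in text), default=0)


for sample in ["café", "日本語", "Good job 👍"]:
    print(f"{sample!r}: up to {max_utf8_bytes(sample)} bytes/char, "
          f"needs utf8mb4: {needs_utf8mb4(sample)}")
```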

2023-05-10

  • [Alex] [question] Do we have any collateral/best practices/etc about caching strategies (not tactics)?

  • [John] We still have two Jira instances? What's the timeline for combining them or sharing access broadly?

    • [Andy] Thinks the hosted Jira will probably live for a while longer; there are a lot of idiosyncrasies that make it hard to “jump” from this one to the “other” Jira instance.

    • [Ned] We’ll eventually get migrated off of “server atlassian” (2u-internal) and onto “cloud atlassian” (TODO: name of other atlassian?)

      • Atlassian is going to eventually stop supporting their server version - we’re not going to be able to run it on our own, eventually.

    • [Alex] Who’s the right person or slack channel to answer this question?

      • [Ned] Ned is an ok person to ask, because he’s on the committee that’s deciding this stuff.

  • [David] Move this meeting a bit later, tack “office hours” onto it, and announce it’s happening?

    • [Yes votes] ++++++

    • [No votes]

    • Shift to post-lunch Eastern time slot

      • This basically excludes Pakistan & South Africa folk, but they haven’t been coming anyway

    • The hardest thing: naming.  Is “office hours” somewhat misleading?

      • “Architecture de-couples therapy”

    • What’s a good shared calendar for this to live in?

      • [David] Can add to the platform team calendar

      • Tech-dev shared calendar would be good, but we need admin/IT help.  Also, this calendar is possessed by demons of randomness.

  • [Emily] Ned, anything interesting at PyCon?

    • [Ned] Yes; he's currently curating a list of recommended videos.

    • PyCon: 

      • Two days of tutorials (hands-on, and ya gotta pay extra for those)

      • More structured talks for the rest of the days through end of week…

      • …and then sprints for a couple of days.

    • It’s in Pittsburgh the next two years, you should go!

    • Ned gave a keynote talk; it's on his website: https://nedbatchelder.com/blog/202305/pycon_2023_keynote.html  

    • DjangoCon: you still have 5 days to submit a talk proposal.

      • In Durham, NC, mid-October.

  • [John] Inform: SRE has finished injecting X-Request-ID headers on everything.

    • Right now, you’ll see them only in the nginx logs.

    • We can start logging this with middleware now probably - like, log the request ID on every log line.

    • We can also re-use these request IDs when passing requests to services to get a distributed tracing sort of thing.

    • John will come and bother arch-bom about how to do these two items.
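John listed two follow-ups: logging the request ID on every log line, and re-using it on outbound calls to other services. Both could look roughly like the framework-free Python sketch below. `X-Request-ID` is the header named in the notes. The `request_id_var` context variable and the `RequestIdFilter`, `handle_incoming`, and `build_outbound_request` names are hypothetical. In edx-platform this would more likely live in Django middleware, which reads `request.META` rather than a plain dict, and in the REST client.

```python
import contextvars
import io
import logging
import urllib.request
import uuid

REQUEST_ID_HEADER = "X-Request-ID"

# ID of the request currently being handled; isolated per thread and per asyncio task.
request_id_var = contextvars.ContextVar("request_id", default="-")


class RequestIdFilter(logging.Filter):
    """Attach the current request ID to every log record as record.request_id."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.request_id = request_id_var.get()
        return True


def handle_incoming(headers: dict) -> str:
    """Adopt the ID injected upstream (e.g. by nginx), or mint one if it is missing."""
    request_id = headers.get(REQUEST_ID_HEADER) or uuid.uuid4().hex
    request_id_var.set(request_id)
    return request_id


def build_outbound_request(url: str) -> urllib.request.Request:
    """Forward the current request ID so the downstream service logs the same value."""
    return urllib.request.Request(url, headers={REQUEST_ID_HEADER: request_id_var.get()})


def demo() -> tuple:
    stream = io.StringIO()
    handler = logging.StreamHandler(stream)
    handler.addFilter(RequestIdFilter())
    handler.setFormatter(logging.Formatter("[%(request_id)s] %(message)s"))
    log = logging.getLogger("request_id_demo")
    log.addHandler(handler)
    log.setLevel(logging.INFO)

    handle_incoming({REQUEST_ID_HEADER: "abc123"})
    log.info("enrolling learner")

    outbound = build_outbound_request("https://service.example.com/api/")
    # urllib normalizes header names with str.capitalize(), i.e. "X-request-id".
    forwarded = outbound.get_header(REQUEST_ID_HEADER.capitalize())
    return stream.getvalue(), forwarded


if __name__ == "__main__":
    print(demo())
```

Because the downstream service would run the same `handle_incoming` logic, it adopts the forwarded ID instead of minting a new one. That shared ID is what makes the "distributed tracing sort of thing" possible.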

2023-05-03

  • [quest] (John) What’s up with course-discovery staying (or not) a core part of Open edX?

    • What’s “in/out” of Open edX is a complex decision point that isn’t well understood.

    • The Open Source Process working group is happy to hear about friction and find ways to reduce it.

    • There have been some recent conversations around potentially moving large chunks of course-discovery to the event bus instead of celery jobs / data aggregation

  • [inform] (Ned) GitHub teams in the openedx org will be cleaned up in June: claim ones you still want: https://docs.google.com/spreadsheets/d/1nDWMTbmDHyrwWJsSsyr6d6Bx97IKAoKpm5mSVjRj108/edit#gid=0  

  • [quest] (Jeremy) Next steps on Maintenance Board adoption - https://github.com/orgs/edx/projects/17/views/1  

    • Some of the stuff here is for code that we honestly don’t know if we want to keep or not

    • We need a product-side DEPR process

    • Part of the problem is the difficulty of testing/understanding code you don't regularly work in

    • Get maintenance explicitly in org OGSPs

      • Justify as enabling ability to rapidly pivot to new priorities

    • Fix our ability to revert and fix-forward, so there’s more confidence in merging PRs with no obvious flaws

    • May be time to revisit and re-promote the Arch Manifesto

  • [Andy] The chaotic FOMO of AI projects

    • [Jeremy] Feels like the best uses have the lowest visibility.  Maybe couple with a big announcement highlighting the improvements?

    • We need a product person actually responsible for making decisions on what we will and won’t use AI for

  • [Andy] OrbStack looks cool as a Docker Desktop replacement: https://orbstack.dev/

2023-04-05

  • [Jeremy] I periodically maintain .  Is this useful?  Do people still get value from technical conferences?

    • Many people think of conferences in terms of the talks, because that’s what gets hyped a lot. But with those now often available online afterwards, the most value is often in other aspects.

    • We should highlight the networking benefits and the value to the company of being seen attending conferences. This page probably needs a “how and why to attend conferences” section or link.

  • [Ned] (what else?) Hackathon last call

2023-03-22

  • [John] Request IDs are in progress, can we move it along?

  • [Ned] codecov: do people use it?

    • It’s too expensive for private repos

    • It sometimes fails, which is a distraction

    • “More of an impediment than an enabler”

    • Bad: it complains if you delete covered lines

    • “Occasionally useful to indicate tests to write”

    • “Too binary: .1% off shouldn’t be a failure”

    • “Not sure we’re getting even $500/mo value from it”

    • Tangent: are coverage metrics useful at all?

    • We’re interested in getting broader feedback from devs, though.

  • Hackathon

    • Ideas: https://docs.google.com/document/d/1xcLT2BTeT5La7qx59KaxMQrA_3TKp6jXifjygSa_DhA/edit#heading=h.42hpro60romb  

    • ChatGPT foundations

      • Credits for experiments?

      • OpenAI account is hard to get

        • Trying to satisfy Legal

      • GPT-3.5-Turbo is better and we should use it; it's generally accessible through standard usage of the OpenAI API.

      • Will people need to spend money during the hackathon, and if so, how much?

        • “A team could spend $20 over the course of the hackathon”

        • https://openai.com/pricing

        • “Multiple models, each with different capabilities and price points. Prices are per 1,000 tokens. You can think of tokens as pieces of words, where 1,000 tokens is about 750 words. This paragraph is 35 tokens.”

          • GPT-3.5-Turbo is $0.002 per 1,000 tokens.

          • Assuming 250 words per page of a high-school essay, $20 would buy you a 30,000-page high-school essay.

    • Legal has concerns about 2U information being fed to OpenAI.  Not clear how to get approval for experiments.
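The back-of-the-envelope hackathon cost estimate can be checked with a few lines of Python. The inputs are the figures quoted in these notes: $0.002 per 1,000 tokens for GPT-3.5-Turbo at the time, about 750 words per 1,000 tokens, and an assumed 250 words per page. They are not current prices.

```python
BUDGET_USD = 20.00
PRICE_PER_1K_TOKENS_USD = 0.002   # GPT-3.5-Turbo price quoted in these notes
WORDS_PER_1K_TOKENS = 750         # OpenAI's rule of thumb, as quoted above
WORDS_PER_PAGE = 250              # assumed high-school essay page

tokens = BUDGET_USD / PRICE_PER_1K_TOKENS_USD * 1_000   # 10,000,000 tokens
words = tokens / 1_000 * WORDS_PER_1K_TOKENS            # 7,500,000 words
pages = words / WORDS_PER_PAGE                          # 30,000 pages

print(f"{tokens:,.0f} tokens ≈ {words:,.0f} words ≈ {pages:,.0f} pages")
```

The page count confirms the 30,000-page figure. Real usage also bills prompt tokens, so a chat-style experiment gets fewer output pages per dollar than this suggests.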

2023-03-15

  • [Jeremy/Tyler] To what extent should we lean into Kubernetes for development environments?

    • [Tyler] Devspace deploying into clusters


    • https://github.com/edx/edx-arch-experiments/issues/82  

    • [Jeremy] What are the top concerns people have on a new dev environment?

      • [Jeremy] What should our next steps be?

      • [Andy]

        • Need to be able to wipe and reset state

        • Recently it's been slow; moving into the cloud feels like it would fix this

      • [John]

        • Shouldn't have to run all services to run one service

        • Tight coupling means lots of setup (~1.5 hours) to test a small change

          • [Alex] What exactly causes this?

            • [John] Anything that has changed since your last devstack build can cause extra troubleshooting

              • [Alex] Wonder if we need snapshots.

            • [Phil] Our requirements are not always pinned.

      • [Tyler] Do we have a list of dependencies?

        • [Jeremy] the docker-compose.yml of devstack.git is what defines these, for things added to devstack.  There are many new services of late that don’t make it into devstack (e.g. every enterprise service that’s not a library).

      • K8s?  It’s hard to find a concise articulation of using k8s for a development environment and benefits of such, vs. docker-compose.

        • [John] Is k8s going to be painful because not many people [in the world?] run it for local development, even if it's a technologically superior choice?

        • [Jeremy] We originally designed Open edX to deploy each service as a distinct VM with some communication layer around it. That translated fine to docker-compose, but it isn't built in a way that takes advantage of k8s features.

      • [Tyler] Suspects data is the first big problem to tackle

        • [Jeremy] https://open-edx-proposals.readthedocs.io/en/latest/best-practices/oep-0037-bp-test-data.html  

        • [John] A nice property of staging environments is that the work to set up application data is shared/re-used/extended amongst different devs and teams in the course of testing and verification.

        • [Andy] Would it be better to regularly re-import data into stage, even at the cost of people losing recent modifications/additions they’ve made?

        • [John] Describes an idea where, e.g., sales demo accounts live on the staging environment, so that actual humans populate realistic data on stage and we get pretty good staging data for free.

      • [Jeremy] Candidate next steps

        • Create and maintain better stock data

        • Pick a cloud dev deployment approach

        • Clean up our configuration story to require less deployment-specific customization

      • [Jeremy] Necessary step: smooth the k8s learning-curve.  You shouldn’t have to fully understand k8s just to start doing development.
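Phil's point that our requirements are not always pinned is one source of the devstack drift John described. A minimal, hypothetical check for it is sketched below: it flags requirement lines that lack an exact `==` pin. It deliberately ignores much of pip's requirements syntax beyond skipping option lines like `-r` and `-e`, so treat it as an illustration rather than a linter. Generating pinned files with a tool such as pip-tools' `pip-compile` is the more robust fix.

```python
import re

# Matches lines like "Django==4.2.1" or "edx-opaque-keys[django]==2.3.0".
PINNED = re.compile(r"^[A-Za-z0-9._-]+(\[[^\]]*\])?==[^\s;,]+")


def unpinned_requirements(text: str) -> list:
    """Return requirement lines that are not pinned to an exact version."""
    problems = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop trailing comments and whitespace
        if not line or line.startswith("-"):  # skip blanks and options like -r, -c, -e
            continue
        if not PINNED.match(line):
            problems.append(line)
    return problems


EXAMPLE = """
# base requirements
Django==4.2.1
celery>=5.2
requests
-r other.txt
edx-opaque-keys[django]==2.3.0  # pinned, with extras
"""

print(unpinned_requirements(EXAMPLE))  # ['celery>=5.2', 'requests']
```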

2023-03-08

2023-02-22