Arch Hours: 2022

Meeting Expectations

Why?

  • Provide an opportunity for generative discussion and ideas.

  • Foster camaraderie through technical curiosity and geekdom.

Who?

  • Open to all edX-ers and Arbisoft-ers

What?

  • At times, these informal discussions result in follow-up action and beneficial change in our technology or in our organization. While this is not a decision-making body, these serendipitous discussions spark ideas that may result in ADRs/OEPs and tickets on team backlogs.

  • At times, it serves as a form of informal office hours to ask live technical questions of the archeological collective.

  • At times, we have pre-planned deep-dive topics that folks propose in order to gather wide input or to answer questions.

  • At times, we have hosted special guests (internal and external to edX) on specialized topics.

When?

  • Not during lunch hour in the ET time zone: with Covid-era remote work, “Arch Lunch” has evolved into “Arch Hour” to accommodate various home/life situations during lunch time.

How? Live Co-Editing

To circumvent Confluence’s limitations with the maximum number of concurrent editors:

Why not just stick with keeping the notes in the Google doc?

  • Google docs are not as discoverable.

  • Google docs don’t notify observers of future edits.

  • Google doc comments don’t notify all observers.

How? Structure

Please enter your proposed topics for discussion.
When we use Lean Coffee Style (link1, link2), we vote on which topics the group wants to discuss and time-box each discussion to 10 or 15 minutes → 5 minutes (if re-voted) → 5 minutes (if re-voted).

Prefix your topic with your intention so we are clear on what outcome you are seeking from the discussion. Examples:

  • [inform] You are simply seeking to inform the group of this item. You may field clarifying questions from the group, but you are not seeking further discussion at this time.

  • [ideation] You are seeking divergent and wide perspectives from this group. In this brainstorming mode, all ideas are accepted, without critical analysis.

    • It may be helpful to clarify whether you’d like to ideate on the problem space or the solution space.

  • [analysis] You are asking the group to help you poke holes in your idea/topic/plan/etc.

  • [quest] You are seeking information/responses to a question you have.

2022-12-21

  • [Phil] [discuss] edx-cookiecutter & auto-adding LMS id from JWT to User Django model in non-LMS new services

    • Consensus:

      • Let’s add the lms_user_id in by default: PR + ADR

      • Let’s consider in the future how to reduce the number of identifiers, especially considering future efforts of unifying identity at 2U

        • Enterprise may have a model for this in how they stub users if they are added to subscriptions before they exist in the LMS.

    • Created: Add lms_user_id by default to the user model of new cookiecutter IDAs · Issue #281 · openedx/edx-cookiecutters

    • Raw discussion notes:

      • Purchase squad, migrating ecommerce to 2U pre-existing ecommerce - “Titan”

      • Confusion about canonical user identifiers - LMS user ID

      • Pie or Exams do this thing about auto-adding LMS user ID - should we add this to the cookie cutter?  Should new services automatically have the LMS user ID in their user model?

        • Well, maybe not all of them need it… but many may eventually need it?

      • John: Side note: Maybe we could set the id of the user in the new service to be the same as the lms_user_id?

        • Phil: I didn’t know we could do this!

        • Chris D: What about conflicts?

          • John N: There is only one user table that creates IDs

      • John, Robert: Seconded

      • Robert: We should have docs in the cookiecutter about this information

      • Robert: On the older services we didn’t have this for a long time. We were re-using an assorted variety of user identifiers across services. Users were and many times still are being created in LMS by different services.

        • History: Ecommerce was one of the first repos where we were trying to get the lms_user_id holistically added to all calls to/from the repo & LMS

      • David: Does Enterprise have any use cases of user imports?

        • John: We have a stub record we create if a user doesn’t pre-exist in LMS

      • John: Makes sense to have lms_user_id in the user model. Maybe a future thought is to reduce our total number of ids.

      • Robert: In the LMS, we do have the concept of external IDs.

      • Chris: We have global identity as well.

      • John: Maybe we have options to map it in the future.

  • [Robert] (quest) Arch Monthly Stand-up used to provide me some info about what others are up to. I know we had thoughts about an async replacement, but right now I feel like I just don’t get this info.

    • Do others feel they are getting this info? Where can I tap in?

      • There’s an L&P Scrum of Scrums that covers some of this for managers

    • Or, do we need some replacement?

      • BOM teams try to keep track of what to announce, does this need to be a more widely done practice?

      • Are demo/sharing time meetings common in teams?
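For the lms_user_id consensus above, here is a minimal sketch of the idea in plain Python. It is not the cookiecutter's actual code: ServiceUser, UserStore, and get_or_create_from_jwt are hypothetical names, the in-memory store stands in for a Django user table, and the claim names preferred_username and user_id are assumptions about the JWT payload.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class ServiceUser:
    """Hypothetical stand-in for a new service's user model."""
    id: int                             # local primary key, unique only within this service
    username: str
    lms_user_id: Optional[int] = None   # the user's database ID in the LMS


class UserStore:
    """Hypothetical in-memory stand-in for the service's user table."""

    def __init__(self) -> None:
        self._by_username: Dict[str, ServiceUser] = {}
        self._next_id = 1

    def get_or_create_from_jwt(self, claims: dict) -> ServiceUser:
        # Assumed claim names; check the real JWT payload before relying on them.
        username = claims["preferred_username"]
        user = self._by_username.get(username)
        if user is None:
            user = ServiceUser(id=self._next_id, username=username)
            self._next_id += 1
            self._by_username[username] = user
        # Record the LMS user ID whenever the token carries one.
        lms_user_id = claims.get("user_id")
        if lms_user_id is not None:
            user.lms_user_id = lms_user_id
        return user


store = UserStore()
alice = store.get_or_create_from_jwt({"preferred_username": "alice", "user_id": 4242})
```

The local id and the lms_user_id stay separate fields here, which matches the consensus item; reusing the LMS ID as the local primary key (John's side note) would be a different design.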

2022-12-14

  • [Feanil/Ned] Announcements

  • [Feanil] General overview of how things are going at 2U?

  • [Andy] report on LTI tool actual vs. specified or expected behavior 

    • Unique identifiers

    • PII sharing

2022-12-07

2022-11-30

  • [Ben W] What does the http->https forwarding?

    • [Robert] Probably Cloudflare for http->https. Also, as an answer to a separate question: Google Tag Manager is often where random scripts are dropped on the page.

2022-11-23

Low attendance due to Thanksgiving-related PTO. There was some continuation of discussions about XBlocks, iframes, and CSS conflicts, but notes weren’t taken.

2022-11-16

  • [inform] (Jeremy) Updated draft of Development Environment Vision is ready for review

  • [quest] Jeff Witt 1 min: Use of !important in CSS – OK to use, or to be avoided?  Consensus seems to be that it’d be best to avoid it. Uncertain if there’s any substantial a11y angle on this guideline.

  • [discussion] Ned: OEP-55 Maintainership: monitoring issues, PR SLAs

    • We’ve picked repos for the pilot that are likely to do well at this, but what happens when it’s expanded to repos owned by overwhelmed teams?

    • [John] Do we need someone like Natalia to help teams keep track of this?

    • [Andy] It’ll probably increase the pressure to catch up with the maintenance backlog in various repos

    • [Jeremy] I suspect that much of the need for a project manager arises from immature processes around software maintenance and sustainability; we should also take steps to address that.

    • [Ned] Pilot Phase 2 (archived)

    • [Andy] We have processes for tracking OSPRs, but really not for GitHub Issues yet.  How do we make sure these actually get considered when prioritizing?  (Given that many of our product managers/owners live primarily in Jira.)

    • [John] We could improve scheduling of automated upgrade PRs.

      • [Andy] Some teams are already doing this, at least for the Python upgrade PRs.

    • Much of OSPR handling is currently being dealt with in per-team on-call processes, which works but may not be the ideal approach.

    • [Andy] If you have a product-mandated backlog, fix that first.  Needs to be a conversation that factors in maintenance needs.

    • [John] Having more advance notice that PRs will be coming (and why) really helps.

2022-11-09

2022-11-02

2022-10-26

Skipped due to low attendance

2022-10-19

  • [Jeremy] High-level development environment objectives

    • No need to debug code updating problems

    • Fast to set up a new dev environment

    • Don’t need to carefully preserve manually set up testing data

    • Good support for debugging and observability

    • Consistent between services

    • Able to run reasonable subsets of the full Open edX ecosystem of services

    • Defaults to feature flags currently active in production

    • Comes with data needed to quickly test most features

  • [Adam] [quest] I'd like to discuss with this group and Simon to better understand the plan for moving to Open Search

  • [Jeremy] Can we get away from requiring thorough owning team review for maintenance, bug fix, and small feature enhancement PRs?  What would have to change to make that happen?

    • Plugins/libraries need to have been tested in the things they’re installed in

    • Make test suites more reflective of actual behavior in production deployment

    • Make the changes unused in edx-platform

    • Address issues raised in previous RCAs - trailing slash consistency, database migration linting

    • Shorten the time from merging to detecting problems in production

    • Canary deployments?

    • Shrink the size of edx-platform (small problem can bring down a large chunk of production)

    • Automatically deploy a test environment that exercises the change

2022-10-12

  • [inform] (Ben W) FWG/OpenCraft/Raccoon Gang theming conflict. Working with the groups to consolidate how we coordinate work on the platform across working groups and get them to talk to each other.

    • We ended up with parallel meetings: 2U-focused and Open edX community focused

    • Not much communication happened between these parallel groups

    • Trying to fit all frontend topics into one series of meetings results in a poor signal-to-noise ratio

    • Need a clear forum for this coordination

    • (Ned) Concerned about defaulting to a meeting as the primary forum for this: conflicts, time zones, etc.

    • (Chris) Should we use the Open edX roadmap for this kind of coordination?

    • (Ben) Trying to reconcile architectural initiatives being driven by multiple organizations in the same project sounds terrifying

    • (John) This sounds like a flaw/scalability problem with our architecture and process that needs to be fixed

    • (Andy) We need to get better at sending more redundant communications the larger a project is

  • [quest] (Jeremy B) Developer Experience - reasonable focus for this meeting? Arch-BOM is pivoting(?) to focus on this

    • For a loose definition of DX, perhaps

    • Any examples or resources we should learn from?

      • (Ben) Standardized debugging/troubleshooting tools

    • (Chris) It feels like some aspects of DX start to cross back into architecture

    • (Chris) Would be good to get a status update on development environment efforts

    • (Ben) How can we make “thing not in platform” easier (for a new python API)

    • (Ben) How can we make “New MFE” easier (observability, config, etc)

    • (Hamzah) A “newsletter” of changes and features would be helpful.

  • [inform] (Ben W) FedX exists again.  What this means.  What our focuses will be.

2022-10-05

  • [inform] (Ned) We need hackathon organizers, please volunteer

  • [quest] (Jeremy) Hacktoberfest - do we want to accept contributions this year?

    • Easy way to get T-shirts for developers

    • But it’s not clear how much else we get out of this; usually a few vaguely useful contributions, a few mild wastes of time

    • [Ned] We have enough problems with our contribution pipeline as is, may not be a great idea to pile more into the backlog

    • [Jeremy] We do have a bunch of GitHub Issues for edx-platform pytest warning fixes that we could tag for participants

    • Let’s activate selectively for things where there are useful issues open and maintainers are willing

  • [quest] (Jeremy) How important/useful do people think type checking would be?

    • [Ned] First we’d need to fix our existing linting

    • We’d need a policy, one reasonable example could be “you may add type hints, but you don’t have to.  If you do, the linting must not break”

    • Communicate said policy

    • Hold off on any big push for test-generated type hints or other comprehensive annotations

  • [discuss] (Diana) What do we need to do to make sure there’s not much disruption from Slack migration? 

    • Migrate existing channels

    • Update integrations

    • Handle shared channels

    • There’s a lot written about this, few people have had time to read it all.  And it sounds like there are at least a few corner cases that the docs and process don’t cover yet.

    • Emoji transfers (Matt Hughes seems to be working on this)

  • [discuss] (Ned) Links to private wikis from public wiki. Allowed/disallowed?

    • Feanil: it’s fine as long as it’s clear that it’s a private link and that it was understood by the author that it’s private.

    • [Jeremy] Is it worth wrapping them in conditional content blocks to make it explicit and avoid distracting other readers?

    • [Feanil] How about a table at the bottom of the page for links to each org’s private related context?

  • [inform] (Ned) Kelly is trying to formalize “public workstreams”: https://edx-internal.slack.com/archives/CDA7GMJ4B/p1664910103145889 (private 2U link)

  • [discuss] Max’s impressions of FedBom PR flow

  • [inform/quest] (Jeremy) Arch-BOM -> Developer Experience

    • If you have any suggestions on improvements that should be prioritized, please let us know

2022-09-28

  • [Ned/inform] Open Source Process working group: https://2u-internal.atlassian.net/wiki/spaces/ENG/pages/19467639/Open+Source+Process+Working+Group (private)

  • [Ned] Forking Strategies doc in progress: https://2u-internal.atlassian.net/wiki/spaces/ENG/pages/155746369/Forking+Strategies (private)

  • [Jeff/quest] Do we have a Dates API, for extensions?

    • Idea is that we should have some mechanism in the platform to facilitate people scheduling time to work together on a course

    • Things like this: https://www.flow.club/ and https://focusme.com/  

    • [Dave] There’s support for retrieving key dates about the course, but not adding dates

  • [ideation] (Jeremy) Frontend security vulnerability handling

    • We get dependabot alerts about security vulnerabilities in dependencies.

    • Would be nice to just upgrade things (hopefully automatically)

    • Fed-BOM is working to get upgrade PRs like this assigned to owning teams.

    • [Alex] opines that teams may be missing a more formal on-call process, through which these upgrades could be actualized.

    • [Andy] A big part of the problem is that our frontend test suites are insufficient to catch even fairly major problems before deployment

      • This is not really a frontend unique problem, it hits all PRs from outside the team

  • [Feanil/question] What kind of testing maturity do we feel we need?

    • Better mocking and Test Data

    • More contract testing

    • Adding tests specifically for issues that broke Prod.

    • Record context on the bugs that escaped to production in a more public way so the community can better understand what broke and how.

  • [Ned/question] Hackathon?

    • [Jeremy] We need organizers, please get in touch if interested

2022-09-21

2022-09-07

  • [Phil] [quest] User IDs across services - I was very confused and was hoping for some clarification from people who know Django better.

    • https://open-edx-proposals.readthedocs.io/en/latest/architectural-decisions/oep-0032-arch-unique-identifier-for-users.html

    • Jeremy Bowman

      • User ID in Django is just an auto incrementing identifier

      • Only meant to be unique within service

      • We have used usernames and email addresses in the past to connect services.

      • PII is a concern, though, with usernames and email addresses.

      • We use LMS database ID as the global identifier for the user.

      • Other IDAs have their own user ID which is distinct from that LMS database ID, but have a field in the model.

    • John Nagro

      • Maybe enterprise-access or program-intent-engagement might have clues.

    • Chris Deery

      • Change the API LMS-side to use user ID.

      • Ask the owner too!

      • Interested in the context of how to get more MFEs set up as efficiently as possible.

    • John Nagro

      • Maybe there’s a way to create conveniences in cookiecutter to, e.g. hydrate missing user information

      • In Rails, you can have a class that looks & acts like an ORM object but is backed by an API
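John's Rails comparison above (an object that looks like an ORM record but is backed by an API) can be sketched in Python roughly as follows. This is an illustration under stated assumptions, not an existing cookiecutter feature: ApiBackedUser and fake_lms_lookup are hypothetical names, and a real service would replace the fake lookup with an authenticated call to whatever LMS endpoint it is allowed to use.

```python
from typing import Any, Callable, Dict, Optional


class ApiBackedUser:
    """Hypothetical user object whose fields are fetched on first access."""

    def __init__(self, lms_user_id: int, fetch: Callable[[int], Dict[str, Any]]) -> None:
        self.lms_user_id = lms_user_id
        self._fetch = fetch
        self._data: Optional[Dict[str, Any]] = None

    def __getattr__(self, name: str) -> Any:
        # Only called when normal attribute lookup fails.
        if name.startswith("_"):
            raise AttributeError(name)
        if self._data is None:
            # Hydrate lazily, once, from the injected lookup function.
            self._data = self._fetch(self.lms_user_id)
        try:
            return self._data[name]
        except KeyError:
            raise AttributeError(name) from None


def fake_lms_lookup(lms_user_id: int) -> Dict[str, Any]:
    """Stand-in for an API call; returns made-up data for illustration."""
    return {"username": f"user{lms_user_id}", "email": f"user{lms_user_id}@example.com"}


user = ApiBackedUser(7, fake_lms_lookup)
username = user.username  # triggers the single lookup
```

Because the fields are fetched on demand rather than stored, nothing beyond lms_user_id needs to live in the service's own database, which is relevant to the PII concern raised above.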

2022-08-31

Notes available on private 2U Confluence: https://2u-internal.atlassian.net/wiki/spaces/ENG/pages/150569008/Arch+Hours+Private+2022.

2022-08-24

  • [Ned] Putting public information in the public wiki

    • https://2u-internal.atlassian.net/wiki/spaces/DOC/pages/120586314/2U+or+Open+edX+Where+to+put+new+docs  

    • Leaving stub pointers from 2u-internal to openedx will help remind people about the split

    • Andy says it’s easy to move docs, but you have to fix the links

    • Who will be responsible for informing devs?

      • The Open Source Process working group will figure that out

    • What about wiki vs readthedocs?

      • If it’s going into a wiki, it’s better to put it in the right wiki

      • TODO: is there a global template that can provide in-the-moment guidance?

    • Does 2U have any sort of “enterprise search” solution for docs?

      • We don’t think so, but great point to revisit now

    • What does this “enterprise search” even look like?

      • There are offerings from vendors that search across systems (Confluence, Read the Docs, other Confluence instances, GitHub, emails?!, etc.), making the API calls and scoping results to things the searcher has access to.

      • We did light investigation on this in the past, but dropped it because the available solutions were deemed too invasive at the time.

      • The ability to understand the current state of access control in Google Drive and Confluence is hard.  Adding enterprise search on top of this may exacerbate the hardness.

  • [Ned] OEP-55: Maintainership pilot is underway:

2022-08-17

  • [Andy] what even is this meeting now?

    • How does everyone stay informed about broader engineering efforts?

      • Now we are not all in L&P (“lump”?)

      • Conway’s Law in action

    • [Unstructured rambling from Alex about what/where Enterprise is now in relation to L&P and other parts of the platform system and organization].

    • What could we do to facilitate cross-battalion (column) architecture thoughts and information?

      • [Chris D.] Hire a chief architect?

      • [Alex D.] Is this what the Arch. Coordination WG is for?

      • [Andy] What even is the overriding edX engineering culture now?  A lot of scrum teams have their strong team cultures, but they’re each self-directed.

        • Org-chart: https://drive.google.com/file/d/1th-2GYGEsMzvFnto8iGa6IY69kGbhGZS/view  

        • Some columns have a dedicated architect right now (e.g. David Joy in LnP, although has interest in arch across edX/2U)

        • [Robert R.] Architectural fitness functions - does anyone have experience with this concept outside of edX?  Specifically measurable fitness functions.

          • [Ned] remembers some ideas about using linters to catch some things related to fitness functions.  Seems like we’re looking for a sort of “magic” technology to do architecture for us, instead of talking/training humans.

          • [Andy] Some experience in the past of publicizing cross-organization endpoint performance as a way to improve endpoints, make them adhere to better SLAs.

          • [Chris] Automated performance testing is hard.

      • “Repetition doesn’t ruin the prayer”

      • [Andy] Likes Chris’ concept of an architect - more of an architecture evangelist, trainer/teacher.  Not someone who hands you a design to go implement.

        • “Architecture Shaman”, “Architecture Preacher”, “Sage”, etc.

      • Do we need a role where someone goes around and gets architectural workshops organized on a frequent, regular basis?  Not presenting the workshops themselves, but prodding/requiring all (principal? Senior? anyone?) engineers to present topics at these workshops.

        • This came out of the idea that we lost our lunch/learn workshops that e.g. Dave O. would frequently run on performance (and other) things.

      • [Dave O.] 2U Enterprise is a similar use-case to a lot of open edX providers that have some custom stuff they want to run for paying organizations.  More ownership burden, optimized somewhat for faster speed of delivery.

        • Should things be optimized such that, if the enterprise squad is wholly re-organized (as a team/squad) tomorrow, the systems stay good?

        • [Alex] Rambles.

      • [Ned] Every scrum team feels like they “own too much”.  Why is this the case?  Is this an edX problem?  A product problem? A modern software problem?

      • [Robert] Raises the question of “are there some things which should not be included in open source?”

        • Could we make faster decisions about e.g. deprecating/decommissioning systems if we don’t have to worry about who outside of 2U is using that system?

      • Here’s an awesome diagram: https://openedx.atlassian.net/wiki/spaces/OEPM/pages/3499786241  

  • [Ned] Putting public information in the public wiki

  • [Ned] OEP-55: Maintainership pilot is underway:

  • [Ned] Would someone like to run this next week?
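As one possible reading of the "linters as fitness functions" idea raised above, here is a small sketch of a measurable check: it parses Python source and reports imports that cross a boundary the team has declared off-limits. The function name forbidden_imports and the example package names are illustrative assumptions, not an existing edX tool or rule.

```python
import ast
from typing import Iterable, List


def forbidden_imports(source: str, forbidden_prefixes: Iterable[str]) -> List[str]:
    """Return the sorted, de-duplicated absolute imports that fall under a forbidden package."""
    prefixes = tuple(forbidden_prefixes)
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            names = [node.module]
        else:
            continue
        for name in names:
            # Match the package itself or any submodule, but not look-alike names.
            if any(name == p or name.startswith(p + ".") for p in prefixes):
                found.add(name)
    return sorted(found)


example_source = "import os\nfrom lms.djangoapps.courseware import models\n"
violations = forbidden_imports(example_source, ["lms"])
```

The count of violations over time gives a number a team could track, though as noted in the discussion a check like this supplements rather than replaces talking with and training people.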

2022-08-03

2022-07-27

  • [quest/ideation] (Dave): How to update MySQL charset to utf8mb4

    • We currently use “utf8”, which isn’t real UTF-8: it stores at most 3 bytes per character, so it cannot represent many characters (emoji, for example)

    • utf8mb4 is supported under MySQL 5.7, but the most appropriate collation to use isn’t supported until 8.0.1

    • 2U SRE is still figuring out how to do the 5.7 -> 8 upgrade in Aurora without extensive downtime; there seems to be one option that will require a bunch of prep work

    • Most other installations will likely just want to dump and restore at Open edX upgrade time; for these, upgrading the DB and switching the encoding at the same time may make sense

    • Jeremy will bring this to SRE’s attention and see if/how it impacts MySQL upgrade plans for http://edx.org

    • [Andy] Seriously, is it just easier to switch to PostgreSQL instead?

      • Jeremy will ask about this too…

  • [quest/ideation] (Jeremy): How proactively do we want to track new Ubuntu LTS releases?

    • Question for the BTR WG?

    • This has ramifications for which Python release we next add support for

    • [Ned] Python 3.11 is supposed to be 25% faster than 3.10, but looks like it may be a rocky upgrade bug-wise due to internal changes
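To make the utf8 vs. utf8mb4 point above concrete, this short sketch checks which characters need more than 3 bytes in real UTF-8 and therefore would not fit in MySQL's 3-byte “utf8” charset. The helper names utf8_width and fits_in_three_byte_utf8 are made up for illustration, and the check only models the byte-width limit, not MySQL's actual behavior on insert.

```python
def utf8_width(ch: str) -> int:
    """Number of bytes a single character occupies in real UTF-8."""
    return len(ch.encode("utf-8"))


def fits_in_three_byte_utf8(text: str) -> bool:
    """True if every character fits within the 3-byte-per-character limit."""
    return all(utf8_width(ch) <= 3 for ch in text)


samples = {
    "e-acute": "\u00e9",          # 2 bytes
    "euro sign": "\u20ac",        # 3 bytes
    "grinning face": "\U0001F600",  # 4 bytes, outside the 3-byte range
}
widths = {name: utf8_width(ch) for name, ch in samples.items()}
```

A scan like this over existing text columns could help estimate how much user content is currently affected before planning the charset change.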

2022-07-20

  • [Andy] our standard JWT authentication tangles the global user into a service’s database. JSONWebTokenAuthentication may not be the right choice outside the monolith, but it’s in the cookiecutter.

    • [Jeremy] Django requires a user object even for basic request/response handling, and many of the fields like first name, last name, and email are required.  So we either need to copy them from the LMS or make up bogus data to avoid PII spread.

    • [Andy] I agree that if we need a user it’s better not to have a half-real half-madeup user. :)

  • [Ned] Anybody participating in the Open Courseware architecture meetings?  How’s that going, what degree of overlap is there with this meeting?