Arch Hours: 2022

Meeting Expectations

Why?

  • Provide an opportunity for generative discussion and ideas.

  • Foster camaraderie through technical curiosity and geekdom.

Who?

  • Open to all edX-ers and Arbisoft-ers

What?

  • At times, these informal discussions result in follow-up action and beneficial change in our technology or in our organization. While this is not a decision-making body, these serendipitous discussions spark ideas that may result in ADRs/OEPs and tickets on team backlogs.

  • At times, it serves as a form of informal office hours to ask live technical questions of the archeological collective.

  • At times, we have pre-planned deep-dive topics that folks propose to gather wide input or to answer questions.

  • At times, we have hosted special guests (internal and external to edX) on specialized topics.

When?

  • Not lunch hour in ET timezone: With Covid remote work, “Arch Lunch” has evolved into “Arch Hour” in order to accommodate various home/life situations during lunch time.

How? Live Co-Editing

To circumvent Confluence’s limitations with the maximum number of concurrent editors:

Why not just stick with keeping the notes in the Google doc?

  • Google docs are not as discoverable.

  • Google docs don’t notify observers of future edits.

  • Google doc comments don’t notify all observers.

How? Structure

Please enter your proposed topics for discussion.
When we use Lean Coffee Style (link1, link2), we vote on which topics the group wants to discuss and time-box the discussion to 10 or 15 min → 5 min (if re-voted) → 5 min (if re-voted).

Prefix your topic with your intention so we are clear on what outcome you are seeking from the discussion. Examples:

  • [inform] You are simply seeking to inform the group of this item. You may field clarifying questions from the group, but you are not seeking further discussion at this time.

  • [ideation] You are seeking divergent and wide perspectives from this group. In this brainstorming mode, all ideas are accepted, without critical analysis.

    • It may be helpful to clarify whether you’d like to ideate on the problem space or the solution space.

  • [analysis] You are asking the group to help you poke holes in your idea/topic/plan/etc.

  • [quest] You are seeking information/responses to a question you have.

2022-12-21

  • [Phil] [discuss] edx-cookiecutter & auto-adding LMS id from JWT to User Django model in non-LMS new services

    • Consensus:

      • Let’s add the lms_user_id in by default: PR + ADR

      • Let’s consider in the future how to reduce the number of identifiers, especially considering future efforts of unifying identity at 2U

        • Enterprise may have a model for this in how they stub users if they are added to subscriptions before they exist in the LMS.

    • Created: https://github.com/openedx/edx-cookiecutters/issues/281

    • Raw discussion notes:

      • Purchase squad, migrating ecommerce to 2U pre-existing ecommerce - “Titan”

      • Confusion about canonical user identifiers - LMS user ID

      • Pie or Exams do this thing about auto-adding LMS user ID - should we add this to the cookie cutter?  Should new services automatically have the LMS user ID in their user model?

        • Well, maybe not all of them need it… but many may eventually need it?

      • John: Side note: Maybe we could set the id of the user in the new service to be the same as the lms_user_id?

        • Phil: I didn’t know we could do this!

        • Chris D: What about conflicts?

          • John N: There is only one user table that creates IDs

      • John, Robert: Seconded

      • Robert: We should have docs in the cookiecutter about this information

      • Robert: On the older services we didn’t have this for a long time. We were re-using an assorted variety of user identifiers across services. Users were and many times still are being created in LMS by different services.

        • History: Ecommerce was one of the first repos where we were trying to get the lms_user_id holistically added to all calls to/from the repo & LMS

      • David: Does Enterprise have any use cases of user imports?

        • John: We have a stub record we create if a user doesn’t pre-exist in LMS

      • John: Makes sense to have lms_user_id in the user model. Maybe a future thought is to reduce our total number of ids.

      • Robert: In the LMS, we do have the concept of external IDs.

      • Chris: We have global identity as well.

      • John: Maybe we have options to map it in the future.

  • [Robert] (quest) Arch Monthly Stand-up used to provide me some info about what others are up to. I know we had thoughts about an async replacement, but right now I feel like I just don’t get this info.

    • Do others feel they are getting this info? Where can I tap in?

      • There’s an L&P Scrum of Scrums that covers some of this for managers

    • Or, do we need some replacement?

      • BOM teams try to keep track of what to announce; does this need to be a more widespread practice?

      • Are demo/sharing time meetings common in teams?

2022-12-14

  • [Feanil/Ned] Announcements

  • [Feanil] General overview of how things are going at 2U?

  • [Andy] report on LTI tool actual vs. specified or expected behavior 

    • Unique identifiers

    • PII sharing

2022-12-07

2022-11-30

  • [Ben W] What does the http->https forwarding?

    • [Robert] Cloudflare probably for http->https. Also, as an answer to a separate question, Google Tag Manager is often where random scripts are dropped on the page.

2022-11-23

Low attendance due to Thanksgiving-related PTO. There was some continuation of discussions about XBlocks, iframes, and CSS conflicts, but notes weren’t taken.

2022-11-16

2022-11-09

2022-11-02

2022-10-26

Skipped due to low attendance

2022-10-19

  • [Jeremy] High-level development environment objectives

    • No need to debug code updating problems

    • Fast to set up a new dev environment

    • Don’t need to carefully preserve manually set up testing data

    • Good support for debugging and observability

    • Consistent between services

    • Able to run reasonable subsets of the full Open edX ecosystem of services

    • Defaults to feature flags currently active in production

    • Comes with data needed to quickly test most features

  • [Adam] [quest] I'd like to discuss with this group and Simon to better understand the plan for moving to OpenSearch

  • [Jeremy] Can we get away from requiring thorough owning team review for maintenance, bug fix, and small feature enhancement PRs?  What would have to change to make that happen?

    • Plugins/libraries need to have been tested in the things they’re installed in

    • Make test suites more reflective of actual behavior in production deployment

    • Make the changes unused in edx-platform

    • Address issues raised in previous RCAs - trailing slash consistency, database migration linting

    • Shorten the time from merging to detecting problems in production

    • Canary deployments?

    • Shrink the size of edx-platform (small problem can bring down a large chunk of production)

    • Automatically deploy a test environment that exercises the change

2022-10-12

  • [inform] (Ben W) FWG/OpenCraft/Raccoon Gang theming conflict.  Working with groups to try to consolidate how we coordinate work around the platform between working groups and get them to talk to each other.

    • We ended up with parallel meetings: 2U-focused and Open edX community focused

    • Not much communication happened between these parallel groups

    • Trying to fit all front end stuff into one series of meetings ends up with a poor signal-to-noise ratio

    • Need a clear forum for this coordination

    • (Ned) Concerned about defaulting to a meeting as the primary forum for this: conflicts, time zones, etc.

    • (Chris) Should we use the Open edX roadmap for this kind of coordination?

    • (Ben) Trying to reconcile architectural initiatives being driven by multiple organizations in the same project sounds terrifying

    • (John) This sounds like a flaw/scalability problem with our architecture and process that needs to be fixed

    • (Andy) We need to get better at sending more redundant communications the larger a project is

  • [quest] (Jeremy B) Developer Experience - reasonable focus for this meeting? Arch-BOM is pivoting(?) to focus on this

    • For a loose definition of DX, perhaps

    • Any examples or resources we should learn from?

      • (Ben) Standardized debugging/troubleshooting tools

    • (Chris) It feels like some aspects of DX start to cross back into architecture

    • (Chris) Would be good to get a status update on development environment efforts

    • (Ben) How can we make “thing not in platform” easier (for a new python API)

    • (Ben) How can we make “New MFE” easier (observability, config, etc)

    • (Hamzah) A “newsletter” of changes and features would be helpful.

  • [inform] (Ben W) FedX exists again.  What this means.  What our focuses will be.

2022-10-05

  • [inform] (Ned) We need hackathon organizers, please volunteer

  • [quest] (Jeremy) Hacktoberfest - do we want to accept contributions this year?

    • Easy way to get T-shirts for developers

    • But it’s not clear how much else we get out of this; usually a few vaguely useful contributions, a few mild wastes of time

    • [Ned] We have enough problems with our contribution pipeline as is, may not be a great idea to pile more into the backlog

    • [Jeremy] We do have a bunch of GitHub Issues for edx-platform pytest warning fixes that we could tag for participants

    • Let’s activate selectively for things where there are useful issues open and maintainers are willing

  • [quest] (Jeremy) How important/useful do people think type checking would be?

    • [Ned] First we’d need to fix our existing linting

    • We’d need a policy, one reasonable example could be “you may add type hints, but you don’t have to.  If you do, the linting must not break”

    • Communicate said policy

    • Hold off on any big push for test-generated type hints or other comprehensive annotations

  • [discuss] (Diana) What do we need to do to make sure there’s not much disruption from Slack migration? 

    • Migrate existing channels

    • Update integrations

    • Handle shared channels

    • There’s a lot written about this, few people have had time to read it all.  And it sounds like there are at least a few corner cases that the docs and process don’t cover yet.

    • Emoji transfers (Matt Hughes seems to be working on this)

  • [discuss] (Ned) Links to private wikis from public wiki. Allowed/disallowed?

    • Feanil: it’s fine as long as it’s clear that it’s a private link and that it was understood by the author that it’s private.

    • [Jeremy] Is it worth wrapping them in conditional content blocks to make it explicit and avoid distracting other readers?

    • [Feanil] How about a table at the bottom of the page for links to each org’s private related context?

  • [inform] (Ned) Kelly is trying to formalize “public workstreams”: https://edx-internal.slack.com/archives/CDA7GMJ4B/p1664910103145889 (private 2U link)

  • [discuss] Max’s impressions of FedBom PR flow

  • [inform/quest] (Jeremy) Arch-BOM -> Developer Experience

    • If you have any suggestions on improvements that should be prioritized, please let us know

2022-09-28

  • [Ned/inform] Open Source Process working group: https://2u-internal.atlassian.net/wiki/spaces/ENG/pages/19467639/Open+Source+Process+Working+Group (private)

  • [Ned] Forking Strategies doc in progress: https://2u-internal.atlassian.net/wiki/spaces/ENG/pages/155746369/Forking+Strategies (private)

    • [Jeff] Does this also cover the case where we need to fork an external dependency to fix an a11y issue?

  • [Jeff/quest] Do we have a Dates API, for extensions?

    • Idea is that we should have some mechanism in the platform to facilitate people scheduling time to work together on a course

    • Things like this: https://www.flow.club/ and https://focusme.com/  

    • [Dave] There’s support for retrieving key dates about the course, but not adding dates

  • [ideation] (Jeremy) Frontend security vulnerability handling

    • We get dependabot alerts about security vulnerabilities in dependencies.

    • Would be nice to just upgrade things (hopefully automatically)

    • Fed-BOM is working to get upgrade PRs like this assigned to owning teams.

    • [Alex] opines that teams may be missing a more formal on-call process, through which these upgrades could be actualized.

    • [Andy] A big part of the problem is that our frontend test suites are insufficient to catch even fairly major problems before deployment

      • This is not really a frontend-specific problem; it hits all PRs from outside the team

  • [Feanil/question] What kind of testing maturity do we feel we need?

    • Better mocking and Test Data

    • More contract testing

    • Adding tests specifically for issues that broke Prod.

    • Record context on the bugs that escaped to production in a more public way so the community can better understand what broke and how.

  • [Ned/question] Hackathon?

    • [Jeremy] We need organizers, please get in touch if interested

2022-09-21

2022-09-07

  • [Phil] [quest] User IDs across services - was very confused and was hoping for some clarification from people who know Django better.

    • https://open-edx-proposals.readthedocs.io/en/latest/architectural-decisions/oep-0032-arch-unique-identifier-for-users.html

    • Jeremy Bowman

      • User ID in Django is just an auto incrementing identifier

      • Only meant to be unique within service

      • We have used usernames and email addresses in the past to connect services.

      • PII is a concern, though, with usernames and email addresses.

      • We use LMS database ID as the global identifier for the user.

      • Other IDAs have their own user ID which is distinct from that LMS database ID, but have a field in the model.

    • John Nagro

      • Maybe enterprise-access or program-intent-engagement might have clues.

    • Chris Deery

      • Change the API LMS-side to use user ID.

      • Ask the owner too!

      • Interested in the context of how to get more MFEs getting set up as efficiently as possible.

    • John Nagro

      • Maybe there’s a way to create conveniences in cookiecutter to, e.g. hydrate missing user information

      • In Rails, you can have a class that looks & acts like an ORM object but is backed by an API
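
To illustrate the last idea above (an object that looks and acts like an ORM record but is backed by an API), here is a rough Python sketch. Everything in it is hypothetical: `RemoteUser` is an invented name, and `fetch` stands in for whatever client call would retrieve user details from the LMS; no existing cookiecutter or library API is implied.

```python
from typing import Callable, Dict, Optional


class RemoteUser:
    """Looks like a model instance, but loads its fields from an API on demand."""

    def __init__(self, lms_user_id: int, fetch: Callable[[int], Dict[str, str]]):
        self.lms_user_id = lms_user_id
        self._fetch = fetch  # hypothetical callable wrapping an LMS API request
        self._data: Optional[Dict[str, str]] = None

    def __getattr__(self, name: str):
        # Python calls this only when normal attribute lookup fails,
        # so it is reached for remote fields such as `username` or `email`.
        if name.startswith("_"):
            raise AttributeError(name)
        if self._data is None:
            # Fetch once, lazily, then serve later reads from the cached copy.
            self._data = self._fetch(self.lms_user_id)
        try:
            return self._data[name]
        except KeyError:
            raise AttributeError(name) from None
```

With this shape, `RemoteUser(42, fetch).email` triggers a single call to `fetch` on first access, and the service never has to persist the PII locally.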

2022-08-31

Notes available on private 2U Confluence: https://2u-internal.atlassian.net/wiki/spaces/ENG/pages/150569008/Arch+Hours+Private+2022.

2022-08-24

  • [Ned] Putting public information in the public wiki

    • https://2u-internal.atlassian.net/wiki/spaces/DOC/pages/120586314/2U+or+Open+edX+Where+to+put+new+docs  

    • Leaving stub pointers from 2u-internal to openedx will help remind people about the split

    • Andy says it’s easy to move docs, but you have to fix the links

    • Who will be responsible for informing devs?

      • The Open Source Process working group will figure that out

    • What about wiki vs readthedocs?

      • If it’s going into a wiki, it’s better to put it in the right wiki

      • TODO: is there a global template that can provide in-the-moment guidance?

    • Does 2U have any sort of “enterprise search” solution for docs?

      • We don’t think so, but great point to revisit now

    • What does this “enterprise search” even look like?

      • There are offerings from vendors that search across systems (confluence, read the docs, other confluence, github, emails?!, etc.), making the API calls, scoping to things to which the searcher has access.

      • We did light investigation on this in the past, but dropped it because the available solutions were deemed too invasive at the time.

      • Understanding the current state of access control in Google Drive and Confluence is already hard.  Adding enterprise search on top of this may make it harder.

  • [Ned] OEP-55: Maintainership pilot is underway:

2022-08-17

  • [Andy] what even is this meeting now?

    • How does everyone stay informed about broader engineering efforts?

      • Now we are not all in L&P (“lump”?)

      • Conway’s Law in action

    • [Unstructured rambling from Alex about what/where Enterprise is now in relation to L&P and other parts of the platform system and organization].

    • What could we do to facilitate cross-battalion (column) architecture thoughts and information?

      • [Chris D.] Hire a chief architect?

      • [Alex D.] Is this what the Arch. Coordination WG is for?

      • [Andy] What even is the overriding edX engineering culture now?  A lot of scrum teams have their strong team cultures, but they’re each self-directed.

        • Org-chart: https://drive.google.com/file/d/1th-2GYGEsMzvFnto8iGa6IY69kGbhGZS/view  

        • Some columns have a dedicated architect right now (e.g. David Joy in LnP, although he has an interest in arch across edX/2U)

        • [Robert R.] Architectural fitness functions - does anyone have experience with this concept outside of edX?  Specifically measurable fitness functions.

          • [Ned] remembers some ideas about using linters to catch some things related to fitness functions.  Seems like we’re looking for a sort of “magic” technology to do architecture for us, instead of talking/training humans.

          • [Andy] Some experience in the past of publicizing cross-organization endpoint performance as a way to improve endpoints, make them adhere to better SLAs.

          • [Chris] Automated performance testing is hard.

      • “Repetition doesn’t ruin the prayer”

      • [Andy] Likes Chris’ concept of an architect - more of an architecture evangelist, trainer/teacher.  Not someone who hands you a design to go implement.

        • “Architecture Shaman”, “Architecture Preacher”, “Sage”, etc.

      • Do we need a role where someone goes around and gets architectural workshops organized on a frequent, regular basis?  Not presenting the workshops themselves, but prodding/requiring all (principal? Senior? anyone?) engineers to present topics at these workshops.

        • This came out of the idea that we lost our lunch/learn workshops that e.g. Dave O. would frequently run on performance (and other) things.

      • [Dave O.] 2U Enterprise is a similar use-case to a lot of Open edX providers that have some custom stuff they want to run for paying organizations.  More ownership burden, optimized somewhat for faster speed of delivery.

        • Should things be optimized such that, if the enterprise squad is wholly re-organized (as a team/squad) tomorrow, the systems stay good?

        • [Alex] Rambles.

      • [Ned] Every scrum team feels like they “own too much”.  Why is this the case?  Is this an edX problem?  A product problem? A modern software problem?

      • [Robert] Raises the question of “are there some things which should not be included in open source?”

        • Could we make faster decisions about e.g. deprecating/decommissioning systems if we don’t have to worry about who outside of 2U is using that system?

      • Here’s an awesome diagram: Visualization Brainstorm - Product Core and Tech Core  

  • [Ned] Putting public information in the public wiki

  • [Ned] OEP-55: Maintainership pilot is underway:

    • Maintainership Pilot  

    • [relevant to discussion about open source costs, ownership of crufty things, etc.]

  •  [Ned] Would someone like to run this next week?

2022-08-03

2022-07-27

  • [quest/ideation] (Dave): How to update MySQL charset to utf8mb4

    • We currently use “utf8”, which isn’t real UTF-8: it stores at most 3 bytes per character, so it lacks support for many characters

    • utf8mb4 is supported under MySQL 5.7, but the most appropriate collation to use isn’t supported until 8.0.1

    • 2U SRE is still figuring out how to do the 5.7 -> 8 upgrade in Aurora without extensive downtime; there seems to be one option that will require a bunch of prep work

    • Most other installations will likely just want to dump and restore at Open edX upgrade time; for these, upgrading the DB and switching the encoding at the same time may make sense

    • Jeremy will bring this to SRE’s attention and see if/how it impacts MySQL upgrade plans for http://edx.org

    • [Andy] Seriously, is it just easier to switch to PostgreSQL instead?

      • Jeremy will ask about this too…

  • [quest/ideation] (Jeremy): How proactively do we want to track new Ubuntu LTS releases?

    • Question for the BTR WG?

    • This has ramifications for which Python release we next add support for

    • [Ned] Python 3.11 is supposed to be 25% faster than 3.10, but looks like it may be a rocky upgrade bug-wise due to internal changes
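
On the charset topic above: MySQL's legacy “utf8” is an alias for utf8mb3, which stores at most 3 bytes per character, while real UTF-8 needs 4 bytes for characters outside the Basic Multilingual Plane (emoji, for example). The following sketch shows how one might check whether a string would fit in the old charset; the function names are made up for illustration and this is not part of any migration tooling.

```python
def utf8_byte_length(char: str) -> int:
    """Number of bytes a single character needs in real UTF-8."""
    return len(char.encode("utf-8"))


def fits_in_mysql_utf8mb3(text: str) -> bool:
    """True if every character needs at most 3 bytes (the legacy "utf8" limit)."""
    return all(utf8_byte_length(ch) <= 3 for ch in text)


if __name__ == "__main__":
    samples = ["caf\u00e9", "price: \u20ac5", "nice \U0001F600"]
    for sample in samples:
        # The last sample contains an emoji and would need utf8mb4.
        print(repr(sample), fits_in_mysql_utf8mb3(sample))
```

A check like this could help estimate how much existing input would have been rejected or mangled under the current charset before planning the switch.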

2022-07-20

  • [Andy] our standard JWT authentication tangles the global user into a service’s database. JSONWebTokenAuthentication may not be the right choice outside the monolith, but it’s in the cookiecutter.

    • [Jeremy] Django requires a user object even for basic request/response handling, and many of the fields like first name, last name, and email are required.  So we either need to copy them from the LMS or make up bogus data to avoid PII spread.

    • [Andy] I agree that if we need a user it’s better not to have a half-real, half-made-up user. :)

  •  [Ned] Anybody participating in the Open Courseware architecture meetings?  How’s that going, what degree of overlap is there with this meeting?

    • [Chris] More like what I expect from an “architecture” meeting, about boundaries, following our best practices, etc.

    • [Ned] Wondering how much of the content there is of interest to the broader Open edX community

      • [Chris] Touches on Team Topologies, internal team structures, etc. which may be confidential and/or uninteresting to the community

    • [Ned] There’s also https://discuss.openedx.org/t/new-working-group-proposal-architecture-coordination/7786 , which may be interesting to the people here

  •  [Chris] Why is architecture so distributed/decentralized at edX/OCM compared to many other firms?

    • [Jeremy] We used to be more centralized, but teams often got blocked waiting for consensus from an architectural council with a different cadence.

    • [Jeremy] Also, we’ve already made a number of key choices like framework, deployment process, linting, etc.

      • [Chris] That feels more like DevOps than Architecture, although we do seem to have nailed DevOps pretty well.

    • [Andy] TripAdvisor was even more strongly against centralized architecture, apparently due to some bad experiences at other companies the employees had previously worked at.

    • (much more conversation that we failed to take notes on)

2022-07-13

2022-07-06

  • [inform] (Dave): Sent an email to interested parties about forming an Arch Coordination Working Group. Please ping me if you want to be added to the thread.

  • [quest] (Dave): Sentiment around level of tech debt?

    • (Dave) It feels to me like some old pain points are finally getting addressed

    • (Andy) My team wrote up a doc of existing tech debt, and many of the items were left there for 6+ months and they’ve been ok, may just need to accept that some of those are ok as is.

    • (Jeremy) It feels like a high percentage of the success in this area has been due to hiring contractors to do it for us

      • (Dave) Yes, but there was a lot of prep work building up to those efforts

      • (Jeremy) And we have contractors in the Open edX community now with a lot more experience with the project

  • [inform] (Jeremy) Wrote up a draft of https://openedx.atlassian.net/wiki/spaces/AC/pages/3467640837 , feedback welcome

  • [quest] (Jeremy) Should we cancel future sessions of this meeting?

    • Attendance is down, but we still have several active participants

    • (Dave) With large attendance, felt like it was only appropriate to bring up topics of broad interest

    • (Jeremy) Might be useful to collect and vote on topics ahead of time

    • (Andy) Maybe needs a rebranding?

    • (Andy) Switch to biweekly?

2022-06-29

  • [analysis] (David) This month we switched Monthly Arch Standup to a “lean coffee” style… which makes it feel like this meeting. There are a few things folks get from that meeting: a forced read of team status updates, updates on impactful changes, and the occasional “aha!” when we realize some teams should coordinate.  Is there a better way to do this?

  • [ideation] (Simon) The value of this meeting and how to use that to improve the attendance? Easy solutions include:

    • Adjust frequency

    • Adjust the duration

    • The start time

    • Discuss the historical context with OCM devs

    • (Ned, to add to above): seems like cross-functional meetings in general are getting smaller.

      • [Jeremy] A few people feel like they need to keep track of everything, most feel too busy with immediate needs to pay attention

    • There’s fragmentation between this meeting, OC Arch Hour, enterprise arch meeting, Monthly Arch Standup

    • [Simon] Consumer Review is more structured, with a schedule and specific proposals to be discussed.  Would some elements of that make this meeting and related ones more successful?

    • [Andy] This is often more of a process meeting than an architecture meeting, but that feels valuable

    • [Simon] Maybe add some smaller meetings to replace this one, move most architectural concerns into subgroups, and either reduce frequency or eliminate this meeting?

    • [Jeremy] Next steps?

      • [Andy] Reach out to people who don’t come and ask them what, if anything, would make it valuable to them?

      • [David] Kill this meeting, have tCRIL create a new one for the broader community, double down on 2U working groups, area-specific arch meetings, etc.?

  •  [analysis] (Jeremy) Draft recommendations for making cross-team PR reviews go more smoothly: https://2u-internal.atlassian.net/wiki/spaces/ENG/pages/76808270/Cross-Squad+PR+Reviews .  Thoughts?

  • [ideation] (David) Defining architectural principles and fitness functions for our domains… how!?  Worthwhile?

2022-06-22

  • What information do we provide to our partners (2U, Trilogy, GetSmarter) when sending them leads from our site?  What’s contained in the UTM code and how do we know what happens on the other side?

    • Please connect with Gabe Mulley to figure out the different pathways for learners to go from http://edx.org to the websites of 2U’s other LOBs

  •  [David] Question for Simon around relaunching architecture advisory/working group meeting for Open Courses - Status update?

    • One meeting

      • Identity problem - what should the advisory group be?

        • Touches all sorts of things, team org, cross-functional stuff

    • Potential activities for advisory forum (top of mind list from David)

      • Principles

      • ADR review

      • AIM - architectural idea memo

      • “Ilities” - characteristics

      • Tech Radar

  • [Jeremy] Getting reviews on FED-BOM PRs

    • [Robert] Can we make it even more clear that what we want is just a sanity check that no major changes are inadvertently being made?

    • [David] Some of these are in teams with active maintainers, we could ask them for review on those instead.

    • [David] Renovate PRs that are patch/minor version bumps with no conflicts or GitHub check issues can just get merged

    • [David] Maybe we can use labels to help route after frontend triage takes a look?

      • Okay, maybe not useful

  • [Jeremy] How to get community momentum on the backlog of well-defined maintenance tasks?

    • [Andy] What about having deadlines made it work for INCR and Django 3.2?

      • The fact that many of the people were on the hook to upgrade to the next named release which needed these changes to stay in support windows.

    • [Robert] Do we just need more squads like Arbi-BOM?

      • In the Open edX community, not necessarily at 2U

      • Ask community members to chip in funding/support for such teams?

    • [Andy] Badges and achievements?

      • Discussion forum badges have been considered

      • Some kind of org recognition that could be used in marketing materials?  “Gold level Open edX supporter”

      • Recognition at conferences

2022-06-15

  • [Ned, inform] writing up instructions for using forks/upstream, in prep for SOX compliance: Working with Personal Forks    

    • (Jeremy) Maybe add some notes on how to avoid / recover from accidentally committing changes to master in the fork?

    • (Andy) What if you need to collaborate with other developers on the change?

      • (Ned) Give them access to your fork

    • (Feanil) What’s the obstacle to using the edx org for working forks?

      • (Ned) Large installed base of local clones configured to point at edx instead of openedx, which currently work due to forwarding.  No good data on how often this forwarding still happens.

        • (Feanil) I’ll email GitHub to ask if we can get stats for this

    • (Simon) Where are we using this first?

      • (Ned) There’s an upcoming communication about the first 6 repos where everyone outside the owning teams will have to follow this process.

    • (Ned) We may later need to also do this for a broader range of repos in the openedx org.

      • (Simon) Please let me know as soon as there’s any concrete news on this

  • [inform] (Jeremy) We’re considering having Arbi-BOM kick off implementation of OEP-45: Configuring and Operating Open edX.  Let me know if that concerns anyone.

    • (Feanil) Probably worth running it by Kyle and Regis again

    • (Feanil) Why are we using YAML rather than Python files?

      • (Jeremy) Ability to import twice for ease of dealing with derived settings, easier to write a schema validator for, more flexibility on where the settings file can live (doesn’t have to be on the PYTHONPATH)

      • (Feanil) Good points, but just keep this in mind when implementing in case a Python file turns out to work better for other reasons, given that it’s what Django usually expects

  • [quest] (Simon) Where are we with the Kafka event stream work? When can we expand the implementation to other use cases?

    • We have a working happy path use case, and are currently consolidating relevant code into a shared library

    • Need to complete that consolidation, do some error handling and monitoring improvements

    • Roadmap for Event Bus: https://github.com/openedx/platform-roadmap/issues/28

  • [inform|analysis] (David) We’re adding runtime config for MFEs based on config defined in edx-platform. 

2022-06-08

  • [inform] (Simon) I created an OC Engineering and Architecture Advisory. I can use feedback from attendees of this meeting

    • Adam: Async feedback: I think Arch Hour moved on top of the Embedded SRE meeting. I find it very helpful to read the meeting notes afterwards, though.

    • Chris - It is also on top of Paragon WG

  • [quest] (Simon) What LTI client accounts do 2U/edX maintain for development? Turnitin? H5P.com? Others?

    • Studio and LMS go to different assignments on Turnitin

      • Same parameters with different values somehow

    • How do we get Turnitin accounts?

    • Work with PMs and PCs to collect a list of LTI clients that our partners most frequently use. Then approach those LTI clients and establish a process with them for supporting edx-platform integration. Also establish a process to add to or remove from that list of “supported LTI clients”.

  • [quest] (Robert) Did my devstack hacks document have anything new for anyone? Relates to last week’s discussion around my having less pain than others.

  • [inform] (Robert) Arch-BOM is experimenting with a GitHub project in place of a Jira board.

  • [inform/ideation/quest/analysis] (David J) Categorizing pages in the Architecture and Engineering wiki space by where they should probably end up: Architecture and Engineering wiki categorization

2022-06-01

  • [ideation/quest] [Robert] I’d like to discuss our test strategy regarding cypress tests.

    • Since e2e tests are more costly to run and maintain, we’ve generally kept to a smoke suite of important use cases. What strategy do we want?

    • For edx-platform, we used to have bokchoy integration tests. In this ADR, again, we decided on just a smoke suite because additional tests were too costly to maintain, too costly to run, and very rarely failed due to a real problem. Arbi-BOM removed bokchoy, and I think there is a plan to replace it with a cypress suite. Is the decision in this ADR still accurate?

    • Getting the e2e cypress tests working in the pipeline is currently owned by the QA team.

      • It is exciting that some of this work is making progress.

      • However, it seemed from Ansab Gillani’s Eng-All presentation that his team might be envisioning much greater test coverage using cypress (to be confirmed). Let’s discuss with someone from the team to determine alignment/misalignment, and determine good next steps.

    • Where to run the new Cypress e2e tests?

      • GitHub Actions: not positive it will work with the pipeline, needs discovery work

      • build-jenkins: current e2e tests run here, slated for decommissioning soon

      • tools-jenkins: choice of last resort

      • GoCD: not clear this can work

    • (Ned) Has this been announced/promoted to the Open edX community?

    • (Jeremy) Do the cypress e2e tests work in devstack or Tutor?

      • Not yet, mainly tested against stage so far

      • This isn’t a regression against the bok-choy e2e tests, since they haven’t worked in devstack for a long time

    • (Jeremy) How many of our e2e maintenance problems seem to be from bok-choy vs. cypress?

      • Cypress is a significant improvement over bok-choy, but still fairly problematic and slow compared to Pact tests

    • (Simon) The end goal is to have the Cypress tests running in the edx-platform CI/CD deployment

      • Expect to have this around mid-June, if ESRE can fix the blocker on running Cypress in the pipeline by then

    • (Dawoud) YOW! 2017 Beth Skurrie - It's Not Hard to Test Smart: Delivering Customer Value Faster #YOW is an insightful talk on e2e vs contract testing, what not to e2e test, the intent of the contract testing, etc.

  • [quest] (Jeremy) What factor(s) have most hampered your ability to deliver value in the last 6 months or so? (Deliberately open-ended, don’t want to lead towards any particular problem or solution.)

    • (Chris) Attrition, lots of people needing to take ownership of code they don’t know well yet

      • We need a better culture around writing code with the intention of eventually handing it over to someone else - documentation, etc.

    • (Andy) Test cycle time, too long to avoid context switches

      • Especially for local development, takes too long to get set up to even be able to run tests

      • We’re getting better about duration on GitHub tests, but still requires overhead to push a branch and create a PR, etc.

      • (Chris) Often takes days to get devstack back in a usable state after not using it for a while

        • (Robert) Why do some people keep encountering this and others almost never do?

          • Contributing factor is number of different services in use

          • Also, set of commands for quick fix of most problems isn’t well documented

    • (Simon) Lack of clarity of the value of what we’re currently doing

      • Example: ticket blocked for 2 weeks, but only impacting 1 learner

      • (Kashif) We don’t have good data on how often different parts of our test suite catch actual problems (especially relative to time spent diagnosing flaky tests)

2022-05-25

  • [ideation] (David) What would it take to create high-level (context/system) baseline, current state architecture diagrams for all of Open edX? 

    • In Mermaid, please: https://github.com/mermaid-js/mermaid#readme  

      • (David) I actually tried this and found that its layout engine isn’t good enough for complex diagrams… they just get impossible to read.  Happy to learn and be wrong about that, though!

        • (Ned) I haven’t tried Mermaid, but text would be great if we don’t want to start over every year.

          • (David) Diagrams as code living close to what they describe would be delightful

    • We have this from Content architecture vision, no? Why step away from C4 modeling?

      • I’m not, that was the “context / system” above, but I should have said container for the second level

    • edX architecture onboarding presentation

    • Diagrams are useful, have different audiences, and require effort to maintain

    • (Jeremy) Should we have a designated owner(s) of overview diagrams and docs?

    • (Feanil) Even bad diagrams can be useful in the sense of getting more experience learning how to make better diagrams.

    • (David) Diagrams should come with a description of their intended audience

  • [quest] (Simon) What is known within the tCRIL world? What are the initiatives you are working on?

  • [ideation/quest?] (Kyle): Private 2U-internal Jira links (or other private links) in PRs - ways to nudge people to put context in the PR & commit message?

    • Idea: PR template

      • Ned disagrees - they get ignored/stale

      • [Robert] The decision about what we want should be documented somewhere. It could be an OEP. It could also be in the PR template.

    • Idea: Linter/nagger that warns about private link

      • This would also unfairly warn people, though, who are including private links but also including all the relevant context.

      • Idea: Have some heuristics, e.g., if the pull request description is really short AND there’s a private link, then nag

    • Point: How many PRs are actually looked at by community members?

      • Kyle: potentially all of them. But how can we make this clearer?

        • Robert: Just say so every time we hit the problem.

    • relevant tcril-engineering issue: https://github.com/openedx/tcril-engineering/issues/271
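A minimal sketch of the heuristic idea above (nag only when a PR description is really short AND contains a private link). Everything concrete here is an assumption for illustration: the private host names, the word threshold, and the function names are hypothetical, and this is not an existing linter or bot.

```python
import re

# Hypothetical: hosts treated as private. Replace with the real internal hosts.
PRIVATE_HOSTS = ("jira.internal.example.com", "wiki.internal.example.com")

# Hypothetical threshold: fewer non-link words than this counts as "really short".
MIN_CONTEXT_WORDS = 30

# Group 1 captures the host; the rest of the URL runs until whitespace or a closing bracket.
URL_RE = re.compile(r"https?://([^/\s)>\]]+)[^\s)>\]]*")


def find_private_links(text):
    """Return the URLs in text whose host is listed in PRIVATE_HOSTS."""
    return [
        match.group(0)
        for match in URL_RE.finditer(text)
        if match.group(1).lower() in PRIVATE_HOSTS
    ]


def count_context_words(text):
    """Count words after removing all URLs, so links do not count as context."""
    return len(URL_RE.sub(" ", text).split())


def should_nag(description):
    """Nag only when a private link is present AND the description is short."""
    has_private_link = bool(find_private_links(description))
    is_short = count_context_words(description) < MIN_CONTEXT_WORDS
    return has_private_link and is_short
```

With these assumptions, a description that is only a private ticket link gets nagged, while a long description that also happens to include a private link does not, which addresses the "unfairly warn people" concern raised above.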

2022-05-18

  • [Andy] generic xblock ticketing to enable exam service - be able to convey “the exam service thinks it is ok to show this xblock to this user right now” or conversely “this xblock requires that the exam service said it was ok to see” via signed JWT.

    • Discussed for ~25 minutes, then context switched

  • [Andy] results from RCA “did not test” survey 2020-2022

    • Roughly half of recent RCAs have involved an inability to test in some way

  • [quest] (Danielle) David Joy mentioned edX/Open edX initiatives around guilds/interest groups/team extracurricular initiatives that ended up becoming very distributed over time. What about the distributed approach worked, and what in retrospect didn’t pan out as intended?

    • The question is mainly regarding what we call working groups

    • Add working group participation to career path. 

      • Lends credibility and importance to participating in this kind of activity

    • Not all working groups are the same

      • Some working groups are more aligned with the daily work participants do than others

      • How is my participation in a working group helping me with my daily work?

      • What does the end goal of my participation in a working group look like? 

      • Managers can play a role here by coaching their reports to help evaluate the participation and progress

    • How do we reconcile squad needs vs. working group needs?

      • Part of it is explicit expectation that engineers will spend a percentage of their time on this, and managers should help make that happen

      • Some things work better in squads, others in working groups

        • Are the tasks high latency, or do they require a lot of heads down time?

  • [quest] (Jeremy) We’re thinking of kicking off some kind of initiative to get better consistency across our dev/stage/sandbox/prod/etc. environments. What are people’s top wish list items in this area?

    • [David comment in chat] “How do I sandbox?” and “How do I use a sandbox with this thing?” seem like perennial issues... which I think is influenced by the lack of consistency/predictability
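A rough sketch for the xblock ticketing topic earlier in this session's notes: one way a signed JWT could carry “the exam service thinks it is ok to show this xblock to this user right now”. This is only an illustration under stated assumptions, not what was designed or agreed: it uses HS256 with a shared secret and only the Python standard library so that it is self-contained (a real service would more likely use a maintained JWT library and asymmetric keys), and the `usage_key` claim name and both function names are hypothetical.

```python
import base64
import hashlib
import hmac
import json
import time


def _b64url_encode(data: bytes) -> str:
    """Base64url-encode without padding, as JWTs do."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    """Restore the stripped padding, then base64url-decode."""
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _sign(secret: bytes, signing_input: str) -> bytes:
    """HMAC-SHA256 over the 'header.claims' string."""
    return hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()


def issue_ticket(secret, user_id, usage_key, ttl_seconds, now=None):
    """Exam-service side (hypothetical): sign a short-lived ticket for one user and one xblock."""
    issued_at = int(time.time() if now is None else now)
    header = {"alg": "HS256", "typ": "JWT"}
    claims = {"sub": user_id, "usage_key": usage_key, "exp": issued_at + ttl_seconds}
    signing_input = ".".join(
        _b64url_encode(json.dumps(part).encode("utf-8")) for part in (header, claims)
    )
    return signing_input + "." + _b64url_encode(_sign(secret, signing_input))


def ticket_allows(secret, token, user_id, usage_key, now=None):
    """XBlock side (hypothetical): accept only a valid, unexpired ticket for this user and block."""
    current = time.time() if now is None else now
    try:
        header_b64, claims_b64, signature_b64 = token.split(".")
        expected = _sign(secret, header_b64 + "." + claims_b64)
        # Verify the signature before trusting anything in the claims.
        if not hmac.compare_digest(expected, _b64url_decode(signature_b64)):
            return False
        claims = json.loads(_b64url_decode(claims_b64))
    except ValueError:
        # Covers malformed tokens, bad base64, and bad JSON.
        return False
    return (
        claims.get("sub") == user_id
        and claims.get("usage_key") == usage_key
        and claims.get("exp", 0) > current
    )
```

The short expiry is what stands in for “right now”: the xblock never needs to call the exam service, it only checks that a recent ticket names the same user and block.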

2022-05-11

  • [Ned] Any Atlassian migration concerns to discuss?

    • Process feels a bit confused, but no major concerns right now

    • [Feanil] Curious about how we continue to work in public as much as practical when Jira goes private

      • Arch/Arbi-BOM considering experiment to work primarily in GitHub Issues

    • Looking to get Jira out of the picture for OSPR and BD

  • [Diana] (question) Paver, future of?

    • Should we make a conscious decision to either continue using or move away from paver?

    • Is there a clear “winner” to replace it?

    • [David] What are all the things Paver does?

    • [Feanil] we use paver in bad ways

      • To hide platform complexity, which keeps people from learning those complexities

    • [Jeremy] We should probably create a paver DEPR to clarify that we plan to phase it out over time, and not use it for new things

      • Jeremy will ask Arbi-BOM to enumerate what it is still used for, so we can come up with plans for each of them

  • [Andy] how do we get serious about local testability? We could break the site for a few days maybe?

    • [Jeremy] There are a few efforts in various stages of progress that could help with this:

      • Dev Env WG and the migration to tutor

      • Arch-BOM’s work on the Dev Data OEP and framework

      • Arbi-BOM’s effort to improve the state of Open edX configuration

      • Incident Management’s work on Pact (consumer-based contract testing)

    • [Andy] It keeps coming up in RCAs that something broke because it was too hard to test locally before merging and deploying

    • edX/2U used to have a Test Engineering group, we may still need something like that (especially if not bound specifically to Jenkins maintenance and the edx-platform test suite)

    • [Simon] Should we do an RCA on the problem to get more clarification on what exactly we need to solve?

    • [Andy] audit of RCAs since 2020 for could not / did not test as contributing factor: https://docs.google.com/spreadsheets/d/15UR4R8FWUgdBFJyRnXc6dbUV2ES3O8OPJyQhEJFeHI0/edit#gid=0 - RCA category is filtered to “regular” to exclude SRE and process type RCAs

      • Summary for those outside the 2U Google Docs space: few RCAs so far in 2022, but 66% are in this bucket

      • More RCAs in 2021, maybe 50% in this bucket

      • 2020 similar to 2022

      • General level of terribleness in RCAs much lower in 2022 vs 2021 and especially 2020

2022-05-04

  • [Ned] (inform) Tobie Langel at conference: Moving to Collective Ownership

    • His key points from the keynote:

      • 2U needs to:

        • Spell out the business value of open source

        • Accept changes to flow in order to level the playing field

        • Teach to fish rather than give fish

      • The community needs to:

        • Understand that open source is a do-ocracy

        • Spell out their business value for contributing

        • Stop asking for fish

      • tCRIL needs to:

        • Facilitate everything

    • live notes from the follow-on discussion session: https://docs.google.com/document/d/1BuMwDdsFVto1NLvaUgkJXHgvibzZbmjCuQrMe-v4wTE/edit#

    • Simon: maybe there’s open source value for 2U, but the community won’t like it.  Maybe if 2U earns money?

      • Other projects have run into this problem, where for-profit companies aren’t contributing back

    • Force of divergence might get greater over time.

      • How can we push back against the forces of divergence?

      • 2U uses data pipeline tools that cost money, so the community doesn’t adopt them, and we have a different scale than they do.

        • This is diversity, is that different from divergence?

        • Find places to use common solutions, and also allow for diversity

          • Pick the boundaries appropriately

        • A common reason for organizations to use an Open edX provider (like Appsembler or OpenCraft) is that they use some vendor they’d like to integrate their platform with.

      • [Simon] Can tCRIL help facilitate the identification of what community contributions would be most valuable for everyone?

        • E.g. There was a time when edX devs thought ecommerce would be fully accepted by the community, so a lot of effort was put into documentation and feature work, but that turned out to be a faulty assumption - Open edX installations mostly did not want/need the ecommerce system.

          • Ned contends this was not really wasted effort, the payoff was just very far in the future and adopted by fewer Open edX installations than we thought at the time.  But it still helped drive adoption.

    • [Ned] The attention from 2U/edX toward the community could wander as time goes on, and then the value of community contributions decreases/disappears?

      • Working with the community is an engineering tactic and strategy; there won’t be strong user feedback that indicates to 2U that we’re not working enough with the community.

      • The business value of working with the community can seem counterintuitive, e.g. working to help merge a contribution often doesn’t get us closer to releasing a given feature next week (or whenever).

        • There is at least one example (Raccoon Gang blended projects, coordinated by Adam S.) where community contributions are both timely and relevant/unblocking to active sprint work by a 2U scrum team (Enterprise Titans/Access).

        • Robert R. is reviewing a giant community DEPR PR (from Raccoon Gang) and is happy to see the work moving forward, even though it requires a non-trivial amount of Robert’s time.

          • (Do we just love Raccoon Gang?)

    • [Robert] Tobie brought up the idea of an open-source Member Organization as a way to provide funding.  We don’t really have that, but tCRIL has decided to start funding some community projects (in addition to 2U/edX).  

      • The tCRIL funding just comes out of the general tCRIL project.

        • We’re not sure how nonprofits can collect dues or whatever from potential members and what those dues could be used for.

      • Would it be better if there was just one big community fund?

      • Is it possible to allow donors to specifically allocate money toward blended development? (Other non-profit donations allow this type of thing, e.g. “Here’s money from the Undergraduate basket weaving program at U. State”.)

    • [Dave O.] Community contributions that help with cleanup or deprecation bring real value to edX/2U/tCRIL, though they might not have a direct line to business value.

  • [Ned] OEP-55 (Project Maintainers): increase community rights, or reduce edX rights?

  • [Ned] (somewhat relatedly) people are chattering about changing edX deploys.  Some sketchy notes here: https://openedx.atlassian.net/wiki/spaces/AC/pages/3343646983   

  • Question about “by the end of the year, 2U should be treated like everyone else”.  Do we know what this would look like? (also note this quote is from Tobie, whose statements on this matter are intentionally bold/provocative).

    • Couple of options

      • Are 2U engineers made to go through the same core-committer process as non-2U community members?  Or is access to some repos restricted to only the core-committer process?  Or something else.

    • The hard work is figuring out exactly how this could/should work.

  • [Andy] The language around “rights” is maybe too emotional?  Same with “level playing field”.

    • Maybe “permissions” would be better

    • “Level playing field” might be too big of a lift to get to.  As long as 2U is the largest Open edX installation/organization, it won’t be level.

      • It’s “asymmetric”

    • Part of the thing to fix is an emotional component; the rest of the community feels like a second-class citizen.  This is important to acknowledge.

    • [Simon] Security patching process - we might need to hide community contributions that patch a vulnerability.  Conversely, there are 2U things that need to be hidden/private from the community.

      • [Dave O.] Says the forbidden word (“fork”).

2022-04-27

  • [quest] (Jeremy) What are your top pain points related to Open edX configuration and settings?  Do you have any suggestions for improvement in this area?

    • Lack of consistency

      • Settings file in repo

      • Remote config

      • YAML files generated for devstack & Tutor

    • OEP-45: Configuring and Operating Open edX (Provisional) (link to Configuration section)

    • Multiple override layers make it nearly impossible to keep the big picture in your head

    • Little/inconsistent documentation for each setting

  •  [inform/quest] (Jeremy) We’re working on spinning up an Arbisoft squad for front end architecture and maintenance.  If that pans out, what would be your top project nominations for them to work on?

    • [Simon] Somewhat concerned about interface between maintenance work and front end features impacted by it on owning teams

    • On demand model may work best, where they do more work on request than proactive maintenance

    • May be best to start with Front End WG requests

  • [inform] (Andy) I’ve started working through how we might capture program intent (not enrollments, intent). Docs: the spec by Spencer and me, the beginning of a technical doc on it, and the approach doc by Spencer, which still calls this program enrollments.

  • [quest] (Simon) Why do we have the monthly arch standup, Arch Hour, and the Content theme arch group as a subgroup? Has architecture organically grown to the point where we could use some top-level organization?

    • OEP-56 is trying to address the meta side of this question.

    • Can we please use that OEP to formulate a plan for this question?

    • Feels like too many cross-functional needs land in “well, maybe Arch-BOM will get around to it”

2022-04-20

  • [quest] (Simon) Input or reactions on “How do we perform access control on Special Exams without edx-proctoring”?

    • Async feedback after the meeting is welcome, the doc’s a little long to read and process during the meeting

  • [Ned] Can we set guidelines for how to use our various communication channels?

    • (Simon) This feels like an attempt to impose control on an existing grassroots communication pattern

    • (Ned) Rationale is to make sure that people who want to stay informed about certain kinds of discussions/decisions don’t miss key communications

    • (Jeremy) Arch-BOM and Arbi-BOM use https://openedx.atlassian.net/wiki/spaces/AT/pages/2331836527  

    • (Ned) Is there a set of announcement venues which, when all utilized to announce something, we can realistically assume will reach all developers?

      • True at least for all developers adequately paying attention, as long as we don’t flood the communication channels with irrelevant information

      • (Ned) Good point, we need to take care to preserve the signal/noise ratio in such channels

  • [analysis] (Jeremy) Does it sound reasonable to spin up a bunch of Blended Development projects, coordinated by different squads, to finish building all the MFEs we need?  Then throw away all the edx-platform JS and spin up an Arbisoft squad to help with front end maintenance?

    • [SC] It’s a good practice to calculate the cost of delay on the risks and the scale.

    • [JB] The old JavaScript will also be a reason for poor employee engagement and will trigger employee retention risks.

2022-04-13

  • [Ned] (inform) there’s a thread in #openedx about GitHub wikis

    • Strong suspicion that it would just make things worse

    • Only 14 repos use it, most of them stale or almost empty

xblock-utils              2 pages   2016-01-12 Tim Krones: Improve formatting, spelling, and wording.
edx-analytics-pipeline    9 pages   2017-11-28 brianhw: Updated Tasks to Run to Update Insights (markdown)
edx-analytics-dashboard   2 pages   2015-12-22 Daniel Friedman: Updated WIP: OpenID Connect (markdown)
cs_comments_service       5 pages   2014-06-30 Trinh Nguyen: Updated Query or Delete comments data in mongodb (markdown)
edx-proctoring            2 pages   2015-12-16 chrisndodge: Updated Release Process (markdown)
xblock-sdk                1 pages   2016-05-13 Bui Trung Nghia: Initial Home page
openedx-demo-course       1 pages   2014-04-24 Luyang: Created Home (markdown)
configuration             19 pages  2020-03-11 Tim McCormack: repoint to devstack repo
edx-platform              56 pages  2022-03-25 Julia Eskew: Updated Opaque Keys (Locators) (markdown)
edx-app-android           2 pages   2015-03-25 LiuNaidi: Initial Home page
studio-frontend           1 pages   2018-03-06 Eric Fischer: formatting
edx-notifications         1 pages   2015-04-15 chrisndodge: Updated Home (markdown)
ux-pattern-library        6 pages   2019-12-13 genisys58: Created Styleguide: Sass & CSS (markdown)
edx-tools                 2 pages   2017-10-03 Julia Eskew: Updated Home (markdown)

2022-04-06

  • [Ned/David] we are looking for examples of discussions where you were uncertain whether you could share information from inside 2U to outside.

    • Came up in conversation with 2U privacy policy authors

    • Reconciling differences between said policy and historical edX behavior

    • Doc for 2U Privacy: 2U Privacy Policy and the Open edX Community

    • Grey areas:

      • Vendors and services we use

      • OGSPs

      • 2U-edX convergence opportunities (software, capabilities, product, etc.)

      • What can we share with tCRIL that we can’t share publicly?

    • Examples:

      • Plans for integrating lines of business

      • Roadmap of future features

      • Discussing security