Arch Hours: 2022

Meeting Expectations

Why?

  • Provide an opportunity for generative discussion and ideas.

  • Foster camaraderie through technical curiosity and geekdom.

Who?

  • Open to all edX-ers and Arbisoft-ers

What?

  • At times, these informal discussions result in follow-up action and beneficial change in our technology or in our organization. While this is not a decision-making body, these serendipitous discussions spark ideas that may result in ADRs/OEPs and tickets on team backlogs.

  • At times, it serves as a form of informal office hours to ask live technical questions of the archeological collective.

  • At times, we have pre-planned deep-dive topics that folks propose to gather wide input or to answer questions.

  • At times, we have hosted special guests (internal and external to edX) on specialized topics.

When?

  • Not lunch hour in ET timezone: With Covid remote work, "Arch Lunch" has evolved into “Arch Hour” in order to accommodate various home/life situations during lunch time.

How? Live Co-Editing

To circumvent Confluence’s limitations with the maximum number of concurrent editors:

Why not just stick with keeping the notes in the Google doc?

  • Google docs are not as discoverable.

  • Google docs don’t notify observers of future edits.

  • Google doc comments don’t notify all observers.

How? Structure

Please enter your proposed topics for discussion.
When we use Lean Coffee Style (link1, link2), we vote on which topics the group wants to discuss and time-box the discussion to 10 or 15 min → 5 min (if re-voted) → 5 min (if re-voted).

Prefix your topic with your intention so we are clear on what outcome you are striving for from the discussion. Examples:

  • [inform] You are simply seeking to inform the group of this item. You may field clarifying questions from the group on your inform, but are not seeking further discussion at this time.

  • [ideation] You are seeking divergent and wide perspectives from this group. In this brainstorming mode, all ideas are accepted, without critical analysis.

    • It may be helpful to clarify whether you’d like to ideate on the problem space or the solution space.

  • [analysis] You are asking the group to help you poke holes in your idea/topic/plan/etc.

  • [quest] You are seeking information/responses to a question you have.

2022-12-21

2022-12-14

  • [Feanil/Ned] Announcements

  • [Feanil] General overview of how things are going at 2U?

  • [Andy] report on LTI tool actual vs. specified or expected behavior 

    • Unique identifiers

    • PII sharing

2022-12-07

2022-11-30

  • [Ben W] What does the http->https forwarding?

    • [Robert] Probably Cloudflare for http->https. Also, in answer to a separate question: Google Tag Manager is often where random scripts are dropped on the page.

2022-11-23

Low attendance due to Thanksgiving-related PTO. There was some continuation of discussions about XBlocks, iframes, and CSS conflicts, but notes weren’t taken.

2022-11-16

  • [inform] (Jeremy) Updated draft of Development Environment Vision is ready for review

  • [quest] Jeff Witt 1 min: Use of !important in CSS – OK to use, or to be avoided?  Consensus seems to be that it’d be best to avoid it. Uncertain if there’s any substantial a11y angle on this guideline.

  • [discussion] Ned: OEP-55 Maintainership: monitoring issues, PR SLAs

    • We’ve picked repos for the pilot that are likely to do well at this, but what happens when it’s expanded to repos owned by overwhelmed teams?

    • [John] Do we need someone like Natalia to help teams keep track of this?

    • [Andy] It’ll probably increase the pressure to catch up with the maintenance backlog in various repos

    • [Jeremy] I suspect that much of the need for a project manager arises from immature processes around software maintenance and sustainability; we should also take steps to address that.

    • [Ned] Pilot Phase 2

    • [Andy] We have processes for tracking OSPRs, but really not for GitHub Issues yet.  How do we make sure these actually get considered when prioritizing?  (Given that many of our product managers/owners live primarily in Jira.)

    • [John] We could improve scheduling of automated upgrade PRs.

      • [Andy] Some teams are already doing this, at least for the Python upgrade PRs.

    • Much of OSPR handling is currently being dealt with in per-team on-call processes, which works but may not be the ideal approach.

    • [Andy] If you have a product-mandated backlog, fix that first.  Needs to be a conversation that factors in maintenance needs.

    • [John] Having more advance notice that PRs will be coming (and why) really helps.

2022-11-09

2022-11-02

2022-10-26

Skipped due to low attendance

2022-10-19

  • [Jeremy] High-level development environment objectives

    • No need to debug code updating problems

    • Fast to set up a new dev environment

    • Don’t need to carefully preserve manually set up testing data

    • Good support for debugging and observability

    • Consistent between services

    • Able to run reasonable subsets of the full Open edX ecosystem of services

    • Defaults to feature flags currently active in production

    • Comes with data needed to quickly test most features

  • [Adam] [quest] I'd like to discuss with this group and Simon to better understand the plan for moving to OpenSearch

  • [Jeremy] Can we get away from requiring thorough owning team review for maintenance, bug fix, and small feature enhancement PRs?  What would have to change to make that happen?

    • Plugins/libraries need to have been tested in the things they’re installed in

    • Make test suites more reflective of actual behavior in production deployment

    • Make the changes unused in edx-platform

    • Address issues raised in previous RCAs - trailing slash consistency, database migration linting

    • Shorten the time from merging to detecting problems in production

    • Canary deployments?

    • Shrink the size of edx-platform (small problem can bring down a large chunk of production)

    • Automatically deploy a test environment that exercises the change

2022-10-12

  • [inform] (Ben W) FWG/OpenCraft/Raccoon Gang Theming conflict.  Working with groups to try and consolidate how we co-ordinate work around the platform between working groups and get them to talk to each other.

    • We ended up with parallel meetings: 2U-focused and Open edX community focused

    • Not much communication happened between these parallel groups

    • Trying to fit all front end stuff into one series of meetings ends up with a poor signal-to-noise ratio

    • Need a clear forum for this coordination

    • (Ned) Concerned about defaulting to a meeting as the primary forum for this: conflicts, time zones, etc.

    • (Chris) Should we use the Open edX roadmap for this kind of coordination?

    • (Ben) Trying to reconcile architectural initiatives being driven by multiple organizations in the same project sounds terrifying

    • (John) This sounds like a flaw/scalability problem with our architecture and process that needs to be fixed

    • (Andy) We need to get better at sending more redundant communications the larger a project is

  • [quest] (Jeremy B) Developer Experience - reasonable focus for this meeting? Arch-BOM is pivoting(?) to focus on this

    • For a loose definition of DX, perhaps

    • Any examples or resources we should learn from?

      • (Ben) Standardized debugging/troubleshooting tools

    • (Chris) It feels like some aspects of DX start to cross back into architecture

    • (Chris) Would be good to get a status update on development environment efforts

    • (Ben) How can we make “thing not in platform” easier (for a new python API)

    • (Ben) How can we make “New MFE” easier (observability, config, etc)

    • (Hamzah) A “newsletter” of changes and features would be helpful.

  • [inform] (Ben W) FedX exists again.  What this means.  What our focuses will be.

2022-10-05

  • [inform] (Ned) We need hackathon organizers, please volunteer

  • [quest] (Jeremy) Hacktoberfest - do we want to accept contributions this year?

    • Easy way to get T-shirts for developers

    • But it’s not clear how much else we get out of this; usually a few vaguely useful contributions, a few mild wastes of time

    • [Ned] We have enough problems with our contribution pipeline as is, may not be a great idea to pile more into the backlog

    • [Jeremy] We do have a bunch of GitHub Issues for edx-platform pytest warning fixes that we could tag for participants

    • Let’s activate selectively for things where there are useful issues open and maintainers are willing

  • [quest] (Jeremy) How important/useful do people think type checking would be?

    • [Ned] First we’d need to fix our existing linting

    • We’d need a policy, one reasonable example could be “you may add type hints, but you don’t have to.  If you do, the linting must not break”

    • Communicate said policy

    • Hold off on any big push for test-generated type hints or other comprehensive annotations

  • [discuss] (Diana) What do we need to do to make sure there’s not much disruption from Slack migration? 

    • Migrate existing channels

    • Update integrations

    • Handle shared channels

    • There’s a lot written about this, few people have had time to read it all.  And it sounds like there are at least a few corner cases that the docs and process don’t cover yet.

    • Emoji transfers (Matt Hughes seems to be working on this)

  • [discuss] (Ned) Links to private wikis from public wiki. Allowed/disallowed?

    • Feanil: it’s fine as long as it’s clear that it’s a private link and that it was understood by the author that it’s private.

    • [Jeremy] Is it worth wrapping them in conditional content blocks to make it explicit and avoid distracting other readers?

    • [Feanil] How about a table at the bottom of the page for links to each org’s private related context?

  • [inform] (Ned) Kelly is trying to formalize “public workstreams”: https://edx-internal.slack.com/archives/CDA7GMJ4B/p1664910103145889 (private 2U link)

  • [discuss] Max’s impressions of FedBom PR flow

  • [inform/quest] (Jeremy) Arch-BOM -> Developer Experience

    • If you have any suggestions on improvements that should be prioritized, please let us know
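
On the type-checking topic above, the suggested policy (“you may add type hints, but you don’t have to”) amounts to gradual typing. A minimal Python sketch of annotated and unannotated code living side by side; the function names are hypothetical and not taken from any Open edX repository:

```python
from typing import Optional


def course_display_name(course_key: str, run: Optional[str] = None) -> str:
    """Annotated: a type checker can verify this body and its typed callers."""
    return f"{course_key} ({run})" if run else course_key


def legacy_display_name(course_key, run=None):
    """Unannotated: still valid Python; mypy skips such bodies by default."""
    return course_display_name(course_key, run)
```

Under such a policy, only the annotated function would need to keep the type linting green; the unannotated one is left alone until someone chooses to annotate it.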

2022-09-28

  • [Ned/inform] Open Source Process working group: https://2u-internal.atlassian.net/wiki/spaces/ENG/pages/19467639/Open+Source+Process+Working+Group (private)

  • [Ned] Forking Strategies doc in progress: https://2u-internal.atlassian.net/wiki/spaces/ENG/pages/155746369/Forking+Strategies (private)

  • [Jeff/quest] Do we have a Dates API, for extensions?

    • Idea is that we should have some mechanism in the platform to facilitate people scheduling time to work together on a course

    • Things like this: https://www.flow.club/ and https://focusme.com/  

    • [Dave] There’s support for retrieving key dates about the course, but not adding dates

  • [ideation] (Jeremy) Frontend security vulnerability handling

    • We get dependabot alerts about security vulnerabilities in dependencies.

    • Would be nice to just upgrade things (hopefully automatically)

    • Fed-BOM is working to get upgrade PRs like this assigned to owning teams.

    • [Alex] opines that teams may be missing a more formal on-call process, through which these upgrades could be actualized.

    • [Andy] A big part of the problem is that our frontend test suites are insufficient to catch even fairly major problems before deployment

      • This is not really a frontend unique problem, it hits all PRs from outside the team

  • [Feanil/question] What kind of testing maturity do we feel we need?

    • Better mocking and Test Data

    • More contract testing

    • Adding tests specifically for issues that broke Prod.

    • Record context on the bugs that escaped to production in a more public way so the community can better understand what broke and how.

  • [Ned/question] Hackathon?

    • [Jeremy] We need organizers, please get in touch if interested
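
For the “more contract testing” suggestion above, the core idea is that a consumer records which fields it relies on and the provider’s responses are checked against that record. A minimal Python sketch of that idea, not tied to Pact or any other tool; the contract contents and function name are hypothetical:

```python
from typing import Any, Dict, List

# Hypothetical contract: fields a consumer relies on, with their expected types.
COURSE_DATES_CONTRACT: Dict[str, type] = {"course_id": str, "start": str, "end": str}


def contract_violations(response: Dict[str, Any], contract: Dict[str, type]) -> List[str]:
    """List the ways a provider response fails the consumer's contract."""
    problems = []
    for name, expected in contract.items():
        if name not in response:
            problems.append(f"missing field: {name}")
        elif not isinstance(response[name], expected):
            problems.append(f"wrong type for {name}")
    return problems
```

Extra fields in the response are deliberately ignored, so a provider can add data without breaking consumers that do not use it.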

2022-09-21

2022-09-07

  • [Phil] [quest] User IDs across services - was very confused and was hoping for some clarification from people who know Django better.

    • https://open-edx-proposals.readthedocs.io/en/latest/architectural-decisions/oep-0032-arch-unique-identifier-for-users.html

    • Jeremy Bowman

      • User ID in Django is just an auto incrementing identifier

      • Only meant to be unique within service

      • We have used usernames and email addresses in the past to connect services.

      • PII is a concern, though, with usernames and email addresses.

      • We use LMS database ID as the global identifier for the user.

      • Other IDAs have their own user ID which is distinct from that LMS database ID, but have a field in the model.

    • John Nagro

      • Maybe enterprise-access or program-intent-engagement might have clues.

    • Chris Deery

      • Change the API LMS-side to use user ID.

      • Ask the owner too!

      • Interested in the context of how to get more MFEs getting set up as efficiently as possible.

    • John Nagro

      • Maybe there’s a way to create conveniences in cookiecutter to, e.g. hydrate missing user information

      • In Rails, you can have a class that looks & acts like an ORM object but is backed by an API
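
A minimal sketch of the identifier pattern described above, in plain Python rather than Django models: each service keeps its own auto-incrementing ID and stores the LMS database ID in a separate field as the cross-service key. All class and field names here (ServiceUser, UserStore, lms_user_id) are hypothetical.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import Dict


@dataclass
class ServiceUser:
    local_id: int      # auto-incrementing, unique only within this service
    lms_user_id: int   # LMS database ID, used as the global identifier


@dataclass
class UserStore:
    """Stand-in for one service's user table (hypothetical)."""
    _next_id: count = field(default_factory=lambda: count(1))
    _by_lms_id: Dict[int, ServiceUser] = field(default_factory=dict)

    def get_or_create(self, lms_user_id: int) -> ServiceUser:
        """Look up the local user for an LMS ID, creating one if needed."""
        user = self._by_lms_id.get(lms_user_id)
        if user is None:
            user = ServiceUser(local_id=next(self._next_id), lms_user_id=lms_user_id)
            self._by_lms_id[lms_user_id] = user
        return user
```

Two services using this pattern can assign the same person different local IDs while still agreeing on `lms_user_id`, and neither needs to store a username or email address to do so.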

2022-08-31

Notes available on private 2U Confluence: https://2u-internal.atlassian.net/wiki/spaces/ENG/pages/150569008/Arch+Hours+Private+2022.

2022-08-24

  • [Ned] Putting public information in the public wiki

    • https://2u-internal.atlassian.net/wiki/spaces/DOC/pages/120586314/2U+or+Open+edX+Where+to+put+new+docs  

    • Leaving stub pointers from 2u-internal to openedx will help remind people about the split

    • Andy says it’s easy to move docs, but you have to fix the links

    • Who will be responsible for informing devs?

      • The Open Source Process working group will figure that out

    • What about wiki vs readthedocs?

      • If it’s going into a wiki, it’s better to put it in the right wiki

      • TODO: is there a global template that can provide in-the-moment guidance?

    • Does 2U have any sort of “enterprise search” solution for docs?

      • We don’t think so, but great point to revisit now

    • What does this “enterprise search” even look like?

      • There are offerings from vendors that search across systems (confluence, read the docs, other confluence, github, emails?!, etc.), making the API calls, scoping to things to which the searcher has access.

      • We did light investigation on this in the past, but dropped it because the available solutions were deemed too invasive at the time.

      • Understanding the current state of access control in Google Drive and Confluence is already hard.  Adding enterprise search on top of this may make it harder.

  • [Ned] OEP-55: Maintainership pilot is underway:

2022-08-17

  • [Andy] what even is this meeting now?

    • How does everyone stay informed about broader engineering efforts?

      • Now we are not all in L&P (“lump”?)

      • Conway’s Law in action

    • [Unstructured rambling from Alex about what/where Enterprise is now in relation to L&P and other parts of the platform system and organization].

    • What could we do to facilitate cross-battalion (column) architecture thoughts and information?

      • [Chris D.] Hire a chief architect?

      • [Alex D.] Is this what the Arch. Coordination WG is for?

      • [Andy] What even is the overriding edX engineering culture now?  A lot of scrum teams have their strong team cultures, but they’re each self-directed.

        • Org-chart: https://drive.google.com/file/d/1th-2GYGEsMzvFnto8iGa6IY69kGbhGZS/view  

        • Some columns have a dedicated architect right now (e.g. David Joy in LnP, although has interest in arch across edX/2U)

        • [Robert R.] Architectural fitness functions - does anyone have experience with this concept outside of edX?  Specifically measurable fitness functions.

          • [Ned] remembers some ideas about using linters to catch some things related to fitness functions.  Seems like we’re looking for a sort of “magic” technology to do architecture for us, instead of talking/training humans.

          • [Andy] Some experience in the past of publicizing cross-organization endpoint performance as a way to improve endpoints, make them adhere to better SLAs.

          • [Chris] Automated performance testing is hard.

      • “Repetition doesn’t ruin the prayer”

      • [Andy] Likes Chris’ concept of an architect - more of an architecture evangelist, trainer/teacher.  Not someone who hands you a design to go implement.

        • “Architecture Shaman”, “Architecture Preacher”, “Sage”, etc.

      • Do we need a role where someone goes around and gets architectural workshops organized on a frequent, regular basis?  Not presenting the workshops themselves, but prodding/requiring all (principal? Senior? anyone?) engineers to present topics at these workshops.

        • This came out of the idea that we lost our lunch/learn workshops that e.g. Dave O. would frequently run on performance (and other) things.

      • [Dave O.] 2U Enterprise is a similar use-case to a lot of open edX providers that have some custom stuff they want to run for paying organizations.  More ownership burden, optimized somewhat for faster speed of delivery.

        • Should things be optimized such that, if the enterprise squad is wholly re-organized (as a team/squad) tomorrow, the systems stay good?

        • [Alex] Rambles.

      • [Ned] Every scrum team feels like they “own too much”.  Why is this the case?  Is this an edX problem?  A product problem? A modern software problem?

      • [Robert] Raises the question of “are there some things which should not be included in open source?”

        • Could we make faster decisions about e.g. deprecating/decommissioning systems if we don’t have to worry about who outside of 2U is using that system?

      • Here’s an awesome diagram: https://openedx.atlassian.net/wiki/spaces/OEPM/pages/3499786241  

  • [Ned] Putting public information in the public wiki

  • [Ned] OEP-55: Maintainership pilot is underway:

  •  [Ned] Would someone like to run this next week?

2022-08-03

2022-07-27

  • [quest/ideation] (Dave): How to update MySQL charset to utf8mb4

    • We currently use “utf8”, which isn’t real UTF-8: it stores at most 3 bytes per character, so it lacks support for many characters

    • utf8mb4 is supported under 5.7, but the most appropriate collation to use isn’t supported until 8.0.1

    • 2U SRE is still figuring out how to do the 5.7 -> 8 upgrade in Aurora without extensive downtime; there seems to be one option that will require a bunch of prep work

    • Most other installations will likely just want to dump and restore at Open edX upgrade time; for these, upgrading the DB and switching the encoding at the same time may make sense

    • Jeremy will bring this to SRE’s attention and see if/how it impacts MySQL upgrade plans for http://edx.org

    • [Andy] Seriously, is it just easier to switch to PostgreSQL instead?

      • Jeremy will ask about this too…

  • [quest/ideation] (Jeremy): How proactively do we want to track new Ubuntu LTS releases?

    • Question for the BTR WG?

    • This has ramifications for which Python release we next add support for

    • [Ned] Python 3.11 is supposed to be 25% faster than 3.10, but looks like it may be a rocky upgrade bug-wise due to internal changes
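
For the charset question above: a character needs 4 bytes in UTF-8 exactly when its code point is above U+FFFF, and those are the characters a 3-byte “utf8” column cannot hold. A short Python sketch for spotting such text in existing data before a migration; the helper name is hypothetical:

```python
def needs_utf8mb4(text: str) -> bool:
    """Return True if any character lies above U+FFFF (4 bytes in UTF-8)."""
    return any(ord(ch) > 0xFFFF for ch in text)


print(needs_utf8mb4("café"))        # accented Latin: 2 bytes per character at most
print(needs_utf8mb4("日本語"))       # CJK in the Basic Multilingual Plane: 3 bytes each
print(needs_utf8mb4("\U0001F393"))  # graduation cap emoji: 4 bytes
```

Only the last call reports True; most everyday text fits in 3 bytes, which is why the limitation tends to surface through emoji and less common scripts.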

2022-07-20

  • [Andy] our standard JWT authentication tangles the global user into a service’s database. JSONWebTokenAuthentication may not be the right choice outside the monolith, but it’s in the cookiecutter.

    • [Jeremy] Django requires a user object even for basic request/response handling, and many of the fields like first name, last name, and email are required.  So we either need to copy them from the LMS or make up bogus data to avoid PII spread.

    • [Andy] I agree that if we need a user it’s better not to have a half-real half-madeup user. :)

  •  [Ned] Anybody participating in the Open Courseware architecture meetings?  How’s that going, what degree of overlap is there with this meeting?

    • [Chris] More like what I expect from an “architecture” meeting, about boundaries, following our best practices, etc.

    • [Ned] Wondering how much of the content there is of interest to the broader Open edX community

      • [Chris] Touches on Team Topologies, internal team structures, etc. which may be confidential and/or uninteresting to the community

    • [Ned] There’s also https://discuss.openedx.org/t/new-working-group-proposal-architecture-coordination/7786 , which may be interesting to the people here

  •  [Chris] Why is architecture so distributed/decentralized at edX/OCM compared to many other firms?

    • [Jeremy] We used to be more centralized, but teams often got blocked waiting for consensus from an architectural council with a different cadence.

    • [Jeremy] Also, we’ve already made a number of key choices like framework, deployment process, linting, etc.

      • [Chris] That feels more like DevOps than Architecture, although we do seem to have nailed DevOps pretty well.

    • [Andy] TripAdvisor was even more strongly against centralized architecture, apparently due to some bad experiences at other companies the employees had previously worked at.

    • (much more conversation that we failed to take notes on)

2022-07-13

2022-07-06

  • [inform] (Dave): Sent an email to interested parties about forming an Arch Coordination Working Group. Please ping me if you want to be added to the thread.

  • [quest] (Dave): Sentiment around level of tech debt?

    • (Dave) It feels to me like some old pain points are finally getting addressed

    • (Andy) My team wrote up a doc of existing tech debt. Many of the items sat there for 6+ months without causing problems, so we may just need to accept that some of them are OK as is.

    • (Jeremy) It feels like a high percentage of the success in this area has been due to hiring contractors to do it for us

      • (Dave) Yes, but there was a lot of prep work building up to those efforts

      • (Jeremy) And we have contractors in the Open edX community now with a lot more experience with the project

  • [inform] (Jeremy) Wrote up a draft of https://openedx.atlassian.net/wiki/spaces/AC/pages/3467640837 , feedback welcome

  • [quest] (Jeremy) Should we cancel future sessions of this meeting?

    • Attendance is down, but we still have several active participants

    • (Dave) With large attendance, felt like it was only appropriate to bring up topics of broad interest

    • (Jeremy) Might be useful to collect and vote on topics ahead of time

    • (Andy) Maybe needs a rebranding?

    • (Andy) Switch to biweekly?

2022-06-29

  • [analysis] (David) This month we switched Monthly Arch Standup to a “lean coffee” style… which makes it feel like this meeting. There are a few things folks get from that meeting: a forced read of team status updates, updates on impactful changes, and the occasional “aha!” when we realize some teams should coordinate.  Is there a better way to do this?

  • [ideation] (Simon) What is the value of this meeting, and how can we use that to improve attendance? Easy solutions include:

    • Adjust frequency

    • Adjust the duration

    • The start time

    • Discuss the historical context with OCM devs

    • (Ned, to add to above): seems like cross-functional meetings in general are getting smaller.

      • [Jeremy] A few people feel like they need to keep track of everything, most feel too busy with immediate needs to pay attention

    • There’s fragmentation between this meeting, OC Arch Hour, enterprise arch meeting, Monthly Arch Standup

    • [Simon] Consumer Review is more structured, with a schedule and specific proposals to be discussed.  Would some elements of that make this meeting and related ones more successful?

    • [Andy] This is often more of a process meeting than an architecture meeting, but that feels valuable

    • [Simon] Maybe add some smaller meetings to replace this one, move most architectural concerns into subgroups, and either reduce frequency or eliminate this meeting?

    • [Jeremy] Next steps?

      • [Andy] Reach out to people who don’t come and ask them what, if anything, would make it valuable to them?

      • [David] Kill this meeting, have tCRIL create a new one for the broader community, double down on 2U working groups, area-specific arch meetings, etc.?

  •  [analysis] (Jeremy) Draft recommendations for making cross-team PR reviews go more smoothly: https://2u-internal.atlassian.net/wiki/spaces/ENG/pages/76808270/Cross-Squad+PR+Reviews .  Thoughts?

  • [ideation] (David) Defining architectural principles and fitness functions for our domains… how!?  Worthwhile?

2022-06-22

  •  What information do we provide to our partners (2U, Trilogy, Get Smarter) when sending them leads from our site?  What’s contained in the UTM code and how do we know what happens on the other side?

    • Please connect with Gabe Mulley to figure out the different pathways for learners to go from http://edx.org to the websites of 2U’s other LOBs

  •  [David] Question for Simon around relaunching architecture advisory/working group meeting for Open Courses - Status update?

    • One meeting

      • Identity problem - what should the advisory group be?

        • Touches all sorts of things, team org, cross-functional stuff

    • Potential activities for advisory forum (top of mind list from David)

      • Principles

      • ADR review

      • AIM - architectural idea memo

      • “Ilities” - characteristics

      • Tech Radar

  • [Jeremy] Getting reviews on FED-BOM PRs

    • [Robert] Can we make it even more clear that what we want is just a sanity check that no major changes are inadvertently being made?

    • [David] Some of these are in teams with active maintainers, we could ask them for review on those instead.

    • [David] Renovate PRs that are patch/minor version bumps with no conflicts or Github check issues can just get merged

    • [David] Maybe we can use labels to help route after frontend triage takes a look?

      • Okay, maybe not useful

  • [Jeremy] How to get community momentum on the backlog of well-defined maintenance tasks?

    • [Andy] What about having deadlines made it work for INCR and Django 3.2?

      • The fact that many of the people were on the hook to upgrade to the next named release which needed these changes to stay in support windows.

    • [Robert] Do we just need more squads like Arbi-BOM?

      • In the Open edX community, not necessarily at 2U

      • Ask community members to chip in funding/support for such teams?

    • [Andy] Badges and achievements?

      • Discussion forum badges have been considered

      • Some kind of org recognition that could be used in marketing materials?  “Gold level Open edX supporter”

      • Recognition at conferences
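
The Renovate suggestion above (merge patch/minor bumps that have no conflicts and no failing checks) can be written down as a simple rule. A sketch in Python, assuming plain `MAJOR.MINOR.PATCH` version strings; the function names and parameters are hypothetical, and this is not Renovate’s own configuration or API:

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, int, int]:
    """Parse a plain 'MAJOR.MINOR.PATCH' string; pre-release tags are not handled."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def is_safe_bump(old: str, new: str) -> bool:
    """True for a patch or minor upgrade within the same major version."""
    old_v, new_v = parse_version(old), parse_version(new)
    return new_v[0] == old_v[0] and new_v > old_v


def can_auto_merge(old: str, new: str, has_conflicts: bool, checks_passed: bool) -> bool:
    """Apply the rule: safe bump, no merge conflicts, all checks green."""
    return is_safe_bump(old, new) and not has_conflicts and checks_passed
```

One caveat with this sketch: for `0.x` packages a minor bump may be breaking under semantic versioning, so a real rule would probably treat those separately.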

2022-06-15

  • [Ned, inform] writing up instructions for using forks/upstream, in prep for SOX compliance: https://openedx.atlassian.net/wiki/spaces/AC/pages/3458170881    

    • (Jeremy) Maybe add some notes on how to avoid / recover from accidentally committing changes to master in the fork?

    • (Andy) What if you need to collaborate with other developers on the change?

      • (Ned) Give them access to your fork

    • (Feanil) What’s the obstacle to using the edx org for working forks?

      • (Ned) Large installed base of local clones configured to point at edx instead of openedx, which currently work due to forwarding.  No good data on how often this forwarding still happens.

        • (Feanil) I’ll email GitHub to ask if we can get stats for this

    • (Simon) Where are we using this first?

      • (Ned) There’s an upcoming communication about the first 6 repos where everyone outside the owning teams will have to follow this process.

    • (Ned) We may later need to also do this for a broader range of repos in the openedx org.

      • (Simon) Please let me know as soon as there’s any concrete news on this

  • [inform] (Jeremy) We’re considering having Arbi-BOM kick off implementation of OEP-45: Configuring and Operating Open edX.  Let me know if that concerns anyone.

    • (Feanil) Probably worth running it by Kyle and Regis again

    • (Feanil) Why are we using YAML rather than Python files?

      • (Jeremy) Ability to import twice for ease of dealing with derived settings, easier to write a schema validator for, more flexibility on where the settings file can live (doesn’t have to be on the PYTHONPATH)

      • (Feanil) Good points, but just keep this in mind when implementing in case a Python file turns out to work better for other reasons, given that it’s what Django usually expects

  • [Quest](Simon) Where are we with the Kafka event stream work? When can we expand the implementation to other use cases?

    • We have a working happy path use case, and are currently consolidating relevant code into a shared library

    • Need to complete that consolidation, do some error handling and monitoring improvements

    • Roadmap for Event Bus: https://github.com/openedx/platform-roadmap/issues/28

  • [inform|analysis] (David) We’re adding runtime config for MFEs based on config defined in edx-platform. 
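
On the YAML-versus-Python settings question above, two of the cited advantages (schema validation and derived settings) can be sketched with plain data. The example below is a minimal Python sketch, not the OEP-45 implementation: it uses a dict in place of a parsed YAML file to stay within the standard library, and the setting names and schema are hypothetical.

```python
from typing import Any, Dict

# Hypothetical schema: setting name -> required type.
SCHEMA: Dict[str, type] = {"LMS_BASE": str, "HTTPS_ENABLED": bool}


def validate(settings: Dict[str, Any]) -> None:
    """Raise ValueError if a required setting is missing or has the wrong type."""
    for name, expected in SCHEMA.items():
        if name not in settings:
            raise ValueError(f"missing setting: {name}")
        if not isinstance(settings[name], expected):
            raise ValueError(f"{name} must be {expected.__name__}")


def with_derived(settings: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy with derived values computed from the validated base settings."""
    validate(settings)
    scheme = "https" if settings["HTTPS_ENABLED"] else "http"
    return {**settings, "LMS_ROOT_URL": f"{scheme}://{settings['LMS_BASE']}"}
```

Because the derivation is a pure function over data, it can be re-run after an override is applied, which is harder to do cleanly when the derived value is computed at import time in a Python settings module.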

2022-06-08

  • [inform] (Simon) I created an OC Engineering and Architecture Advisory. I can use feedback from attendees of this meeting.

    • Adam: Async Feedback, I think Arch Hour Moved on top of Embedded SRE meeting, I find it very helpful to read the meeting notes afterwards though.

    • Chris - It is also on top of Paragon WG

  • [quest] (Simon) What LTI client accounts do 2U/edX maintain for development? Turnitin? H5P.com? Others?

    • Studio and LMS go to different assignments on Turnitin

      • Same parameters with different values somehow

    • How do we get Turnitin accounts?

    • Work with PMs and PCs to collect a list of LTI clients that our partners most frequently use. Then approach those LTI clients and establish a process with them for supporting edx-platform integration. Establish also a process to add or subtract from that list of “supported LTI clients”.

  • [quest] (Robert) Did my devstack hacks document have anything new for anyone? Relates to last week’s discussion around my having less pain than others.

  • [inform] (Robert) Arch-BOM is experimenting with a GitHub project in place of a Jira board.

  • [inform/ideation/quest/analysis] (David J) Categorizing pages in the Architecture and Engineering wiki space by where they should probably end up: Architecture and Engineering wiki categorization

2022-06-01

  • [ideation/quest] [Robert] I’d like to discuss our test strategy regarding cypress tests.

    • Since e2e tests are more costly to run and maintain, we’ve generally kept to a smoke suite of important use cases. What strategy do we want?

    • For edx-platform, we used to have bokchoy integration tests. In this ADR, again, we decided on just a smoke suite because additional tests were too costly to maintain, too costly to run, and very rarely failed due to a real problem. Arbi-BOM removed bokchoy, and I think there is a plan to replace it with a cypress suite. Is the decision in this ADR still accurate?

    • Getting the e2e cypress tests working in the pipeline is currently owned by the QA team.

      • It is exciting that some of this work is making progress.

      • However, it seemed from Ansab Gillani’s Eng-All presentation that his team might be envisioning much greater test coverage using cypress (to be confirmed). Let’s discuss with someone from the team to determine alignment/misalignment, and determine good next steps.

    • Where to run the new Cypress e2e tests?

      • GitHub Actions: not positive it will work with the pipeline, needs discovery work

      • build-jenkins: current e2e tests run here, slated for decommissioning soon

      • tools-jenkins: choice of last resort

      • GoCD: not clear this can work

    • (Ned) Has this been announced/promoted to the Open edX community?

    • (Jeremy) Do the cypress e2e tests work in devstack or Tutor?

      • Not yet, mainly tested against stage so far

      • This isn’t a regression against the bok-choy e2e tests, since they haven’t worked in devstack for a long time

    • (Jeremy) How many of our e2e maintenance problems seem to be from bok-choy vs. cypress?

      • Cypress is a significant improvement over bok-choy, but still fairly problematic and slow compared to Pact tests

    • (Simon) The end goal is to have the Cypress tests running in the edx-platform CI/CD deployment

      • Expect to have this around mid-June, if ESRE can fix the blocker on running Cypress in the pipeline by then

    • (Dawoud) YOW! 2017 Beth Skurrie - It's Not Hard to Test Smart: Delivering Customer Value Faster #YOW is an insightful talk on e2e vs contract testing, what not to e2e test, the intent of the contract testing, etc.

  • [quest] (Jeremy) What factor(s) have most hampered your ability to deliver value in the last 6 months or so? (Deliberately open-ended, don’t want to lead towards any particular problem or solution.)

    • (Chris) Attrition, lots of people needing to take ownership of code they don’t know well yet

      • We need a better culture around writing code with the intention of eventually handing it over to someone else - documentation, etc.

    • (Andy) Test cycle time, too long to avoid context switches

      • Especially for local development, takes too long to get set up to even be able to run tests

      • We’re getting better about duration on GitHub tests, but still requires overhead to push a branch and create a PR, etc.

      • (Chris) Often takes days to get devstack back in a usable state after not using it for a while

        • (Robert) Why do some people keep encountering this and others almost never do?

          • Contributing factor is number of different services in use

          • Also, set of commands for quick fix of most problems isn’t well documented

    • (Simon) Lack of clarity of the value of what we’re currently doing

      • Example: ticket blocked for 2 weeks, but only impacting 1 learner

      • (Kashif) We don’t have good data on how often different parts of our test suite catch actual problems (especially relative to time spent diagnosing flaky tests)

2022-05-25

  • [ideation] (David) What would it take to create high-level (context/system) baseline, current state architecture diagrams for all of Open edX? 

    • In Mermaid, please: https://github.com/mermaid-js/mermaid#readme  

      • (David) I actually tried this and found that its layout engine isn’t good enough for complex diagrams… they just get impossible to read.  Happy to learn and be wrong about that, though!

        • (Ned) I haven’t tried Mermaid, but text would be great if we don’t want to start over every year.

          • (David) Diagrams as code living close to what they describe would be delightful

    • We have this from Content architecture vision, no? Why step away from C4 modeling?

      • I’m not, that was the “context / system” above, but I should have said container for the second level

    • edX architecture onboarding presentation

    • Diagrams are useful, have different audiences, and require effort to maintain

    • (Jeremy) Should we have a designated owner(s) of overview diagrams and docs?

    • (Feanil) Even bad diagrams can be useful in the sense of getting more experience learning how to make better diagrams.

    • (David) Each diagram should come with a description of its intended audience

  • [quest] (Simon) What is known within the tCRIL world? What are the initiatives you are working on?

  • [Ideation/quest?] (Kyle) Private 2U-internal Jira links (or other private links) in PRs: ways to nudge people to put context in the PR & commit message?

    • Idea: PR template

      • Ned disagrees - they get ignored/stale

      • [Robert] Whatever we decide we want should be documented somewhere. Could be an OEP. Could also be in the PR template.

    • Idea: Linter/nagger that warns about private links

      • This would also unfairly warn people, though, who are including private links but also including all the relevant context.

      • Idea: Have some heuristics, e.g. if the pull request description is really short AND there’s a private link, then nag

    • Point: How many PRs are actually looked at by community members?

      • Kyle: potentially all of them. But how can we make this clearer?

        • Robert: Just say so every time we hit the problem.

    • relevant tcril-engineering issue: https://github.com/openedx/tcril-engineering/issues/271
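A minimal sketch of the "short description AND private link" heuristic floated in the linter/nagger idea above. Everything specific here is an assumption made for illustration: the hostnames are placeholders, the 30-word threshold is arbitrary, and nothing like this was actually built or agreed on in the meeting.

```python
import re

# Assumption: placeholder hostnames standing in for whatever private
# trackers or wikis outside contributors cannot open.
PRIVATE_HOSTS = ("jira.internal.example.com", "wiki.internal.example.com")

# Assumption: an arbitrary cut-off for "really short".
MIN_CONTEXT_WORDS = 30

URL_RE = re.compile(r"https?://\S+")


def find_private_links(description):
    """Return the URLs in the description that point at a private host."""
    return [
        url for url in URL_RE.findall(description)
        if any(host in url for host in PRIVATE_HOSTS)
    ]


def context_word_count(description):
    """Count the words left once URLs are stripped out."""
    return len(URL_RE.sub(" ", description).split())


def should_nag(description, min_words=MIN_CONTEXT_WORDS):
    """Nag only when a private link appears AND there is little other context."""
    has_private_link = bool(find_private_links(description))
    return has_private_link and context_word_count(description) < min_words


if __name__ == "__main__":
    print(should_nag("Fixes https://jira.internal.example.com/browse/ABC-1"))
```

Requiring both conditions addresses the objection raised above: a PR that includes a private link but also explains itself in the description is not flagged.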

2022-05-18

  • [Andy] Generic xblock ticketing to enable the exam service: be able to convey “the exam service thinks it is ok to show this xblock to this user right now” or, conversely, “this xblock requires that the exam service has said it is ok to see” via a signed JWT.

    • Discussed for ~25 minutes, then context switched

  • [Andy] results from RCA “did not test” survey 2020-2022

    • Roughly half of recent RCAs have involved an inability to test in some way

  • [quest] (Danielle) David Joy mentioned edX/Open edX initiatives around guilds/interest groups/team extracurricular initiatives that ended up becoming very distributed over time. What about the distributed approach worked, and what in retrospect didn’t pan out as intended?

    • The question is mainly regarding what we call working groups

    • Add working group participation to career path. 

      • Lends credibility and importance to participating in this kind of activity

    • Not all working groups are the same

      • Some working groups are more aligned with the daily work participants do than others

      • How is my participation in a working group helping me with my daily work?

      • What does the end goal of my participation in a working group look like? 

      • Managers can play a role here by coaching their reports and helping them evaluate their participation and progress

    • How do we reconcile squad needs vs. working group needs?

      • Part of it is explicit expectation that engineers will spend a percentage of their time on this, and managers should help make that happen

      • Some things work better in squads, others in working groups

        • Are the tasks high latency, or do they require a lot of heads down time?

  • [quest] (Jeremy) We’re thinking of kicking off some kind of initiative to get better consistency across our dev/stage/sandbox/prod/etc. environments.  What are people’s top wish list items in this area?

    • [David comment in chat] “How do I sandbox?” and “How do I use a sandbox with this thing?” seem like perennial issues... which I think is influenced by the lack of consistency/predictability
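On the xblock ticketing topic at the top of this date's notes: a rough sketch of what a signed "ok to show this xblock to this user right now" ticket could look like. This is an illustration only, not the design that was discussed. The claim names (`sub`, `xblock`, `exp`), the shared-secret HS256 signing, and the five-minute lifetime are all assumptions; a real implementation would more likely use an established JWT library and asymmetric keys.

```python
import base64
import hashlib
import hmac
import json
import time


def _b64url(data):
    """Base64url-encode bytes without padding, as JWTs do."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text):
    """Decode unpadded base64url text back to bytes."""
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _sign(secret, signing_input):
    return hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()


def issue_ticket(secret, user_id, usage_key, ttl_seconds=300, now=None):
    """Hypothetical exam-service side: sign a short-lived ticket (HS256 JWT)."""
    now = int(time.time()) if now is None else now
    header = {"alg": "HS256", "typ": "JWT"}
    claims = {"sub": user_id, "xblock": usage_key, "exp": now + ttl_seconds}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, claims)
    )
    return signing_input + "." + _b64url(_sign(secret, signing_input))


def ticket_allows(secret, token, user_id, usage_key, now=None):
    """Hypothetical xblock side: check signature, user, block, and expiry."""
    now = int(time.time()) if now is None else now
    try:
        header_b64, claims_b64, signature_b64 = token.split(".")
        expected = _sign(secret, header_b64 + "." + claims_b64)
        if not hmac.compare_digest(expected, _b64url_decode(signature_b64)):
            return False
        claims = json.loads(_b64url_decode(claims_b64))
    except ValueError:
        # Malformed token: wrong number of segments, bad base64, or bad JSON.
        return False
    return (
        claims.get("sub") == user_id
        and claims.get("xblock") == usage_key
        and claims.get("exp", 0) > now
    )
```

The short expiry is what carries the "right now" part of the requirement: a ticket that leaks or is replayed later stops working on its own.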

2022-05-11

  • [Ned] Any Atlassian migration concerns to discuss?

    • Process feels a bit confused, but no major concerns right now

    • [Feanil] Curious about how we continue to work in public as much as practical when Jira goes private

      • Arch/Arbi-BOM considering experiment to work primarily in GitHub Issues

    • Looking to get Jira out of the picture for OSPR and BD

  • [Diana] (question) Paver, future of?

    • Should we make a conscious decision to either continue using or move away from paver?

    • Is there a clear “winner” to replace it?

    • [David] What are all the things Paver does?

    • [Feanil] we use paver in bad ways

      • To hide platform complexity, which keeps people from learning those complexities

    • [Jeremy] We should probably create a paver DEPR to clarify that we plan to phase it out over time, and not use it for new things

      • Jeremy will ask Arbi-BOM to enumerate what it is still used for, so we can come up with plans for each of them

  • [Andy] how do we get serious about local testability? We could break the site for a few days maybe?

    • [Jeremy] There are a few efforts in various stages of progress that could help with this:

      • Dev Env WG and the migration to tutor

      • Arch-BOM’s work on the Dev Data OEP and framework

      • Arbi-BOM’s effort to improve the state of Open edX configuration

      • Incident Management’s work on Pact (consumer-based contract testing)

    • [Andy] It keeps coming up in RCAs that something broke because it was too hard to test locally before merging and deploying

    • edX/2U used to have a Test Engineering group, we may still need something like that (especially if not bound specifically to Jenkins maintenance and the edx-platform test suite)

    • [Simon] Should we do an RCA on the problem to get more clarification on what exactly we need to solve?

    • [Andy] audit of RCAs since 2020 for could not / did not test as contributing factor: https://docs.google.com/spreadsheets/d/15UR4R8FWUgdBFJyRnXc6dbUV2ES3O8OPJyQhEJFeHI0/edit#gid=0 - RCA category is filtered to “regular” to exclude SRE and process type RCAs

      • Summary for outside of 2U google doc space - few RCAs so far in 2022 but 66% are in this bucket

      • More RCAs in 2021, maybe 50% in this bucket

      • 2020 similar to 2022

      • General level of terribleness in RCAs much lower in 2022 vs 2021 and especially 2020

2022-05-04

  • [Ned] (inform) Tobie Langel at conference: Moving to Collective Ownership

    • His key points from the keynote:

      • 2U needs to:

        • Spell out the business value of open source

        • Accept changes to flow in order to level the playing field

        • Teach to fish rather than give fish

      • The community needs to:

        • Understand that open source is a do-ocracy

        • Spell out their business value for contributing

        • Stop asking for fish

      • tCRIL needs to:

        • Facilitate everything

    • live notes from the follow-on discussion session: https://docs.google.com/document/d/1BuMwDdsFVto1NLvaUgkJXHgvibzZbmjCuQrMe-v4wTE/edit#

    • Simon: maybe there’s open source value for 2U, but the community won’t like it.  Maybe if 2U earns money?

      • Other projects have run into this problem, where for-profit companies aren’t contributing back

    • Force of divergence might get greater over time.

      • How can we push back against the forces of divergence?

      • 2U uses data pipeline tools that cost money, so the community doesn’t adopt them, and we have a different scale than they do.

        • This is diversity, is that different from divergence?

        • Find places to use common solutions, and also allow for diversity

          • Pick the boundaries appropriately

        • A common reason for organizations to use an Open edX provider (like Appsembler or Opencraft) is that they use some vendor they’d like to integrate their platform with.

      • [Simon] Can tCRIL help facilitate the identification of what community contributions would be most valuable for everyone?

        • E.g. There was a time when edX devs thought ecommerce would be fully accepted by the community, so a lot of effort was put into documentation and feature work, but that turned out to be a faulty assumption - Open edX installations mostly did not want/need the ecommerce system.

          • Ned contends this was not really wasted effort, the payoff was just very far in the future and adopted by fewer Open edX installations than we thought at the time.  But it still helped drive adoption.

    • [Ned] The attention from 2U/edX toward the community could wander as time goes on, and then the value of community contributions decreases/disappears?

      • Working with the community is an engineering tactic and strategy; there won’t be strong user feedback that indicates to 2U that we’re not working enough with the community.

      • The business value of working with the community can seem counterintuitive, e.g. working to help merge a contribution often doesn’t get us closer to releasing a given feature next week (or whenever).

        • There is at least one example (Raccoon Gang blended projects, coordinated by Adam S.) where community contributions are both timely and relevant/unblocking to active sprint work by a 2U scrum team (Enterprise Titans/Access).

        • Robert R. is reviewing a giant community DEPR PR (from Raccoon Gang) and is happy to see the work moving forward.  It requires a non-trivial amount of Robert’s time, but the work is moving forward.

          • (Do we just love Raccoon Gang?)

    • [Robert] Tobie brought up the idea of an open-source Member Organization as a way to provide funding.  We don’t really have that, but tCRIL has decided to start funding some community projects (in addition to 2U/edX).  

      • The tCRIL funding just comes out of the general tCRIL project.

        • We’re not sure how nonprofits can collect dues or whatever from potential members and what those dues could be used for.

      • Would it be better if there was just one big community fund?

      • Is it possible to allow donors to specifically allocate money toward blended development (other non-profit donations allow this type of thing, e.g. “Here’s money from the Undergraduate basket weaving program at U. State”).

    • [Dave O.] Community contributions that help clean up or deprecate code bring real value to edX/2U/tCRIL, though they might not have a direct line to business value.

  • [Ned] OEP-55 (Project Maintainers): increase community rights, or reduce edX rights?

  • [Ned] (somewhat relatedly) people are chattering about changing edX deploys.  Some sketchy notes here: https://openedx.atlassian.net/wiki/spaces/AC/pages/3343646983   

  • Question about “by the end of the year, 2U should be treated like everyone else”.  Do we know what this would look like? (also note this quote is from Tobie, whose statements on this matter are intentionally bold/provocative).

    • Couple of options

      • Are 2U engineers made to go through the same core-committer process as non-2U community members?  Or is access to some repos restricted to only the core-committer process?  Or something else?

    • The hard work is figuring out exactly how this could/should work.

  • [Andy] The language around “rights” is maybe too emotional?  Same with “level playing field”.

    • Maybe “permissions” would be better

    • “Level playing field” might be too big of a lift to get to.  As long as 2U is the largest Open edX installation/organization, it won’t be level.

      • It’s “asymmetric”

    • Part of the thing to fix is an emotional component; the rest of the community feels like a second-class citizen.  This is important to acknowledge.

    • [Simon] Security patching process - we might need to hide community contributions that patch a vulnerability.  Conversely, there are 2U things that need to be hidden/private from the community.

      • [Dave O.] Says the forbidden word (“fork”).

2022-04-27

  • [quest] (Jeremy) What are your top pain points related to Open edX configuration and settings?  Do you have any suggestions for improvement in this area?

    • Lack of consistency

      • Settings file in repo

      • Remote config

      • YAML files generated for devstack & Tutor

    • OEP-45: Configuring and Operating Open edX (Provisional) (link to Configuration section)

    • Multiple override layers make it nearly impossible to keep the big picture in your head

    • Little/inconsistent documentation for each setting

  •  [inform/quest] (Jeremy) We’re working on spinning up an Arbisoft squad for front end architecture and maintenance.  If that pans out, what would be your top project nominations for them to work on?

    • [Simon] Somewhat concerned about the interface between this maintenance work and the owning teams whose front end features it affects

    • An on-demand model may work best, where they do more work on request than proactive maintenance

    • May be best to start with Front End WG requests

  • [inform] (Andy) I’ve started working through how we might capture program intent (not enrollments, intent): Spec here (with Spencer and me), Beginning of Technical Doc on it, and the Approach doc by Spencer, which still calls this program enrollments

  • [quest] (Simon) Why do we have Monthly arch standup, Arch hour, and Content theme arch group as a sub group? Has architecture grown organically to the point where we could use some top-level organization?

    • OEP-56 is trying to address the meta side of this question.

    • Can we please use that OEP to formulate a plan for this question?

    • Feels like too many cross-functional needs land in “well, maybe Arch-BOM will get around to it”

2022-04-20

  • [quest](Simon) input or reactions on How do we perform access control on Special Exams without edx-proctoring

    • Async feedback after the meeting is welcome, the doc’s a little long to read and process during the meeting

  • [Ned] Can we set guidelines for how to use our various communication channels?

    • (Simon) This feels like an attempt to impose control on an existing grassroots communication pattern

    • (Ned) Rationale is to make sure that people who want to stay informed about certain kinds of discussions/decisions don’t miss key communications

    • (Jeremy) Arch-BOM and Arbi-BOM use https://openedx.atlassian.net/wiki/spaces/AT/pages/2331836527  

    • (Ned) Is there a set of announcement venues which, when all utilized to announce something, we can realistically assume will reach all developers?

      • True at least for all developers adequately paying attention, as long as we don’t flood the communication channels with irrelevant information

      • (Ned) Good point, we need to take care to preserve the signal/noise ratio in such channels

  • [analysis] (Jeremy) Does it sound reasonable to spin up a bunch of Blended Development projects, coordinated by different squads, to finish building all the MFEs we need?  Then throw away all the edx-platform JS and spin up an Arbisoft squad to help with front end maintenance?

    • [SC] It’s a good practice to calculate the cost of delay on the risks and the scale.

    • [JB] The old JavaScript will also hurt employee engagement and create employee retention risks.

2022-04-13

  • [Ned] (inform) there’s a thread in #openedx about GitHub wikis

    • Strong suspicion that it would just make things worse

    • Only 14 repos use it, most of them stale or almost empty

xblock-utils              2 pages   2016-01-12 Tim Krones: Improve formatting, spelling, and wording.
edx-analytics-pipeline    9 pages   2017-11-28 brianhw: Updated Tasks to Run to Update Insights (markdown)
edx-analytics-dashboard   2 pages   2015-12-22 Daniel Friedman: Updated WIP: OpenID Connect (markdown)
cs_comments_service       5 pages   2014-06-30 Trinh Nguyen: Updated Query or Delete comments data in mongodb (markdown)
edx-proctoring            2 pages   2015-12-16 chrisndodge: Updated Release Process (markdown)
xblock-sdk                1 pages   2016-05-13 Bui Trung Nghia: Initial Home page
openedx-demo-course       1 pages   2014-04-24 Luyang: Created Home (markdown)
configuration             19 pages  2020-03-11 Tim McCormack: repoint to devstack repo
edx-platform              56 pages  2022-03-25 Julia Eskew: Updated Opaque Keys (Locators) (markdown)
edx-app-android           2 pages   2015-03-25 LiuNaidi: Initial Home page
studio-frontend           1 pages   2018-03-06 Eric Fischer: formatting
edx-notifications         1 pages   2015-04-15 chrisndodge: Updated Home (markdown)
ux-pattern-library        6 pages   2019-12-13 genisys58: Created Styleguide: Sass & CSS (markdown)
edx-tools                 2 pages   2017-10-03 Julia Eskew: Updated Home (markdown)

2022-04-06

  • [Ned/David] we are looking for examples of discussions where you were uncertain whether you could share information from inside 2U to outside.

    • Came up in conversation with 2U privacy policy authors

    • Reconciling differences between said policy and historical edX behavior

    • Doc for 2U Privacy: 2U Privacy Policy and the Open edX Community

    • Grey areas:

      • Vendors and services we use

      • OGSPs

      • 2U-edX convergence opportunities (software, capabilities, product, etc.)

      • What can we share with tCRIL that we can’t share publicly?

    • Examples:

      • Plans for integrating lines of business

      • Roadmap of future features

      • Discussing security

        • Plans to upgrade or not

        • Don’t talk about what type of database we’re using

        • “edX.org has a way of privately patching product before it releases things to the rest of the ecosystem”

      • Paragon and 2U design systems conversation

      • Contributions to Open edX public roadmap

    • Philosophical approaches:

      • “Tech is not the competitive advantage, the content is.”?

    • Data working group: what do we want to do for analytics for the platform?  “Hey, my group in 2U is planning on investing a bunch of time in X”  The fact that 2U is planning on spending time on that tells you something. 

    • We don’t want to hamstring the partners who help us implement and use this functionality; at the same time, people who want to know what tech we use already have automated ways of finding out.

    • [Simon] Vendors: Vendor asking about one of our lines of business, LMS choices, etc.  Should we have shared that?  

      • [David] ^^^ Me, right now, not sure what to write down and what to keep back

      • Guideline Jennifer set up for us to follow - let’s not share anything outside of OCM without explicit acknowledgement from other lines of business

        • Within OCM, the line is easier to understand because we’re familiar with it

      • This conversation could very well have happened with a community member and not a vendor, too!

  • (quest) [Jeremy] What would you need to decide that an event bus is useful and ready to use in a project?  Top 1-2 concerns, not full list.

    • Testability

      • Can I just inject events into the bus to avoid bringing up the other end of the thing?

      • How to know if an event fired?

    • Development cycle time

    • Good documentation:

      • How do I add an event?

      • How do I receive an event?

    • Directory of existing events

    • Observability

  • (quest) [Jeremy] If we had a cloud development environment, what are some of the things you’d need to see in it to find it useful?

    • [Andy] Push button startup

    • [Andy] Disposable environments, “Don’t know the state?  Throw it away!”

    • [David] Minimally, I can auth against it so I don’t need edx-platform

    • [Chris D] From a developer perspective, what would be good about it? Are there any developers clamoring for this?

    • [Diane] A simpler, more understandable config setup
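On the event bus testability questions above ("Can I just inject events into the bus?" and "How to know if an event fired?"): one common way to answer both is an in-memory test double that delivers events synchronously and records everything published. The sketch below is a generic illustration; the class, its methods, and the event name are all hypothetical and are not the API of the event bus under development.

```python
class InMemoryEventBus:
    """Hypothetical test double: synchronous delivery plus a record of events."""

    def __init__(self):
        self._handlers = {}   # event type -> list of handler callables
        self.published = []   # every (event_type, payload) seen, in order

    def subscribe(self, event_type, handler):
        """Register a callable to receive payloads of the given type."""
        self._handlers.setdefault(event_type, []).append(handler)

    def publish(self, event_type, payload):
        """Record the event, then hand it to each subscriber immediately."""
        self.published.append((event_type, payload))
        for handler in self._handlers.get(event_type, []):
            handler(payload)

    def fired(self, event_type):
        """Return the payloads published for one event type."""
        return [payload for kind, payload in self.published if kind == event_type]


if __name__ == "__main__":
    bus = InMemoryEventBus()
    received = []
    # "enrollment.created" is a made-up event name for the example.
    bus.subscribe("enrollment.created", received.append)

    # Inject an event directly, without bringing up the producing service.
    bus.publish("enrollment.created", {"user_id": 42, "course": "demo"})

    print(received)
    print(bus.fired("enrollment.created"))
```

A consumer under test only needs the `subscribe` side, and a producer under test only needs `fired`, so neither end has to start the other service.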

2022-03-30

  • [analysis] (SC) I’d like to get feedback on Special Exams IDA design. It’s what Cosmonauts are planning to do starting in April

    • Discussed terminology of “Special Exams”

      • What makes them “special”?

        • Timing constraints

        • Proctoring/monitoring

    • How much of implementation is shared with regular exams?

    • New IDA may utilize the event bus, but unsure of the details until more of the core IDA design is fleshed out

    • Discussed IDA vs plugin, how this is distinct from the standard LTI proctoring interface

      • Data synchronization across IDA boundaries may be painful

2022-03-23

  • [Ned] proposal in review for a pilot of narrowing 2U rights to repos 

    • The idea is to limit write permissions to just the repos each 2U employee actually needs, not all of them

    • Trial for a single squad is being prepared

    • What if a developer suddenly needs access to a new repo?  Can we do this quickly enough?

      • There’s currently a process with an SLA of 24 hours or less

      • Can we increase the pool of approvers for such requests to reduce the turnaround time?

        • “Core Admins”?

    • Will we lose “drive-by” improvements?

    • These are all public repos, it’s always possible to create a PR from a fork even before write permission is granted

      • There’s a proposal to use branch protection instead of write permission to enforce the restrictions

        • Solves some pain points around forking, but doesn’t offer as much visibility into the current state of access

    • Will this worsen the sense among some 2U developers that they’re detached from decisions about the direction of the platform and don’t know how to influence it?

    • Would an alternative to making things more symmetric be to give core committers membership in push-pull-all?

      • Yes, but that doesn’t help the secondary security goal of the proposal.

    • Could some people potentially keep full write access to most repos?

    • Would it be worth doing a study of past committers to a repo and seeing who the new permissions would knock out?

  • [SC] What is the current status of automated acceptance tests? Not only edx-platform

    • Context: Was dealing with acceptance tests on edx-analytics-dashboard

    • [Jeremy] Incident response has been working on this and is porting things over to cypress, but we’re not sure where they are on this.

    • Are we maintaining these in general?  They don’t seem to cover much usual functionality anymore.

      • Not much.  We haven’t been getting much value from them relative to the maintenance burden.

      • We’re testing out consumer-based contract testing via Pact as a replacement for much of the value we had tried to get from acceptance tests

  • [KB] tCRIL proposal says “architecture WG is in the works”, can we map this to the OEP and discuss?

    • No news yet, this has been waiting on OEP-56 finalization.

      • Not because that’s a blocker, but because WIP is high and we’re trying to focus more on one thing at a time

  • [KB] I heard that tCRIL is doing some sort of tool to have broader insight into GitHub repos (called Amore?) and I want to go to there.

    • GrimoireLab - it’s currently broken, Ed is working on setting up a hosted instance of it

    • Aggregates data about a bunch of repos and lets you build dashboards about it via Elasticsearch

    • Not thrilled with the state of the package, some overlap with Repo Health Dashboard functionality

    • We’re not too happy with the state of Elasticsearch licensing and hosting at the moment…

    • We need a better front end for the repo health dashboard, there’s a ticket with ideas for this: https://openedx.atlassian.net/browse/BOM-2145  

      • If anybody wants to talk about this or start working on it, please get in touch with Jeremy Bowman

    • Maybe Backstage could be part of this

  • [PS] [ideation] When to break out a new Django app?

2022-03-16

  • [Ned] Slack shared channel policy: why did we require them to be private?

    • Because it used to be much harder to distinguish channels that had external people in them.  Slack has since fixed this.

  • [Ned] Introductions, we have a new person

  • [David] (maybe a 2U topic, oops) Arch Hours - there’s a Content Arch WG that has one too, and interest in broader 2U.  Is this perhaps the Open edX Arch Hour?

    • Essentially, try to bring up topics of broad interest to the Open edX community here and ones that are likely to touch on 2U proprietary information in the other venues that only have 2U employees in them

    • Arch Hour representation of teams?  I.e., “representative democracy”

    • Reminder: we try not to make decisions in these meetings, because not all stakeholders can reliably attend.  More of a forum for questions and initial feedback on ideas before they go out for broader discussion and review.

    • OEP-56 tries to establish a process for decision making that gives all stakeholders a reasonable opportunity to participate

    • We have existing change broadcast/feedback mechanisms for OEPs, DEPR WG, and Architecture squads

  • [inform](Bernard) New to company and checking out forum 

  • [inform](Feanil) Dave is doing a lot of open thinking around the learning core

  • [inform](Feanil) Consider watching the announcements space on Open edX Discourse

  • [Jeremy] We have a lot of GitHub Project boards now, in different places: edx & openedx orgs, org and repo level in each.  We should make sure they consistently have descriptions, and some kind of index of them.

  • [inform](Feanil) docs.openedx.org design coming soon 

2022-03-09

  • [analysis] (Jeremy B) We have a Confluence page for major Open edX upgrade projects at https://openedx.atlassian.net/wiki/spaces/AC/pages/1165395730 , but it rarely gets updated.  Would a GitHub Projects board work better, with a separate issue for each major upgrade (past, present, or future)?

  • [inform] (Robert) Event bus work is at-risk of being de-prioritized. If you have a strong need, please reach out to discuss.

  • [quest] (Ned) What’s the status of OEP-56?

  • [inform] (Jeremy B) I created an #npm-releases Slack channel

    • Is this useful?

    • What should be subscribed to?

    • Better docs and broader announcement forthcoming.

  • [inform] (Adam B) edX SRE would like to be able to turn off Build Jenkins "soon".  This is possible thanks to the awesome work Arbi-BOM has done to migrate jobs to GHA.  If you know of a job that doesn't yet have a migration plan, please reach out.

2022-03-02

  • Quick Introductions: Simon and Danielle

  • [quest] (Jeremy) 2U’s internal Slack has a #pypi-releases channel that (currently imperfectly) tracks releases of our dependencies from PyPI.  Is this worth enhancing and/or announcing more broadly?  Would the Open edX community benefit from some kind of equivalent that only includes dependencies of Open edX?

  • [quest] (Feanil) So many OEPs in flight; are people getting time to look at them?

  • [inform/question] (Dave) I’ve been sketching out more of the Learning Core proposal that I had mentioned a while back. What’s the best way to get feedback/thoughts/input/etc. on this, particularly from 2U?

    • Robert: Want to see: here are the use cases where I think people are going to be happier. Here’s the added pain you might feel because of this change. (in any ADRs, forum posts, etc.)

    • David Joy: How much will people need to change how they work?

    • David Joy: Bigger than ADRs

    • Andy: Need to find a champion inside 2U

  • [Q] (Ben W) The new meeting time coincides with the paragon working group… is there any other time that might work for this meeting, to allow people to attend both?

    • Jeremy: Options are very limited, but I’ll see if there are any practical alternatives

2022-02-23

  • [inform] (Jeremy) Arch-BOM is in the process of clarifying its mission and role, there are likely to be some minor shifts in recommended ways to interact with the team as a result

    • Arch-BOM is: Jeremy, David Joy, Becca Graber, Robert Raposa, Diana Huang, Tim McCormack, Ned Batchelder

  • [inform] (David) OEP-56: Arch Process has been added as a PR, but is still in Draft state/in flux

  • [inform] (Ned) I’ve been hacking on a GitHub daily digest. Example: 3 Days of activity in three repos of interest to me.

  • [quest] (Jeremy) How should we go about determining if, say, MySQL full text search is “performant enough” to replace Elasticsearch for our usage?

    • How much of this performance testing can/should be done up front?

    • Log existing search queries and pipe them through the suggested replacement

  • [analysis] (David) Mermaid in Markdown on GitHub - straw that broke the documentation OEP’s back?
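On the search performance question earlier in this date's notes, the "log existing search queries and pipe them through the suggested replacement" idea can be sketched as a small replay harness. This is a rough illustration under assumptions: `search_fn` is a hypothetical callable wrapping whichever backend is being evaluated (for example, a function that issues a MySQL full text query), and the stub below only stands in for it.

```python
import statistics
import time


def replay_queries(queries, search_fn):
    """Run each logged query through search_fn; return latencies in seconds."""
    latencies = []
    for query in queries:
        start = time.perf_counter()
        search_fn(query)
        latencies.append(time.perf_counter() - start)
    return latencies


def summarize(latencies):
    """Summarize latencies with median, nearest-rank 95th percentile, and max."""
    if not latencies:
        raise ValueError("no latencies to summarize")
    ordered = sorted(latencies)
    # Nearest-rank p95: ceil(0.95 * n) in integer arithmetic, as a 0-based index.
    p95_index = (95 * len(ordered) + 99) // 100 - 1
    return {
        "count": len(ordered),
        "median": statistics.median(ordered),
        "p95": ordered[p95_index],
        "max": ordered[-1],
    }


if __name__ == "__main__":
    # Assumption: in practice these would be read from a log of real queries.
    logged_queries = ["python", "data science", "intro to statistics"]

    def stub_search(query):
        """Stand-in for the candidate backend; replace with a real call."""
        return [query.upper()]

    print(summarize(replay_queries(logged_queries, stub_search)))
```

Running the same logged queries against both the current and the candidate backend gives two summaries to compare, which speaks to how much of the performance testing can be done up front.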

2022-02-16

  • [Inform] (Simon Chen) Course application architecture diagram as part of Content theme architecture vision

  • *[Inform] (Feanil) Documentation work for Open edX

  • **[Inform] (Ned) next-up open source concerns: managing risk to edX.org deploys, privacy policies

  • [Inform] (Ned) I hacked together a GitHub “daily digest” thingy, and am interested if others would find it useful.  Example digest: https://nedbat.github.io/graphql-learn/example.html  

  • ***[Ideation] (Feanil) Conference talks you can give at the Open edX Conference

    • Conference is 4/26-4/29

    • Talks are due shortly (2/20)

    • Need more proposals

    • Proposed pre-recorded talks are ok if you can’t physically go

    • Don’t need to worry too much about being too technical or not technical enough

    • Topic suggestions:

      • Is there a Paragon talk already?

        • Submit your talk, if there is overlap we may ask you to work with others in the community.

      • Event bus (some combination of Arch-BOM folk) +1

      • Shifting away from the modulestore in LMS (Julia)

        • Learning sequences

        • Blockstore

        • Etc.

      • MFE state of the world in Open edX platform (Julia)

        • Which ones exist?

        • Which are planned?

      • Content authoring editors: Make it React! (Julia)

        • Text editor re-write in Studio

        • Video/problem editor re-writes planned

        • How to re-write your editor in React

      • Code maintenance automation and effort distribution (Jeremy and/or someone from Arbi-BOM?)

      • ID Verification and its Future (Simon)

      • Tutor, Devstack, and the future of Open edX Development (Julia)

      • Lessons from INCR and other distributed upgrade efforts (Jeremy)

  • **[ideation] (Jeremy) How can we best maintain and get momentum on a pipeline of Open edX development tasks that anybody in the community could work on?

    • Prior art: INCR tickets, Django upgrades

    • Open edX Roadmap for big-ish things

    • What are the hooks for people to get involved?

      • Badges on Discourse

    • Potentially good for:

      • Upgrades

      • Fixing linter/interpreter warnings

      • Removing deprecated stuff

    • Maybe some kind of working group for helping new developers get involved in the project?

    • Existing working groups could curate appropriate tasks for the pipeline

    • GitHub Project for good starter tickets?  Maybe another one for slightly more complex tasks?

  • [quest] (Waheed) Some of the edx/openedx-owned packages have not been released for years. What is the process for pushing a new version of them to the NPM registry? E.g. https://github.com/openedx/stylelint-config-edx

    • Semantic-release is not configured for this repo

    • ACTION: Added the release workflow and published a new version successfully

2022-02-09

  • **[Jeremy Bowman] New Working Groups, making WGs work better

2022-02-02

  • **** [Julia] The future of devstack - will we be examining Tutor to move to remote-based containers for development?

  • [Kelly] Do we have a way of spinning up a remote environment for messing around?

  • ***[Jeremy] Do we want to move this meeting a little earlier in the day?  Arbisoft people struggle to make this time slot, and it’s probably not the greatest for Cape Town or Open edX star-people in Europe either.

    • Yes, let’s try for a 9am time slot (10 conflicts with too many team meetings).  May or may not involve moving to a different day of the week.

  • ** [Julia] The authN MFE now starts up with the LMS on devstack. Is this a change in thinking? The learning/courseware MFE still doesn’t start up with the LMS.

    • Julia will follow up with that team.

2022-01-26

  • [quest] (collective) What’s up with Slack vs. Google Chat at 2U?

    • Slack was in use first, but security concerns prompted a migration to Google Chat

    • But there was strong resistance from TechDev

    • Current status: TechDev is still on Slack, pretty much everybody else switched to Google Chat

  • ******** [Quest] (Ben W) How do we integrate 2U Guilds with the concept of edX working groups?

    • There’s an enumeration of OCM/edX working groups (or rather, 2 enumerations: public & private)

    • Guilds mostly came into being last year (1-2 started earlier)

      • FWIW, roughly the same is true of the formalization of working groups at edX

    • 2U Guilds are mostly communities of practice, rather than sources of tasks to be done

    • https://2universe.2u.com/departments/tech/eec~2/engagementandculture/new_wiki (not sure why architecture doesn’t have frequency/channel, but it’s #guild-architecture, and monthly. Feel free to join!)

    • OCM/edX Working Groups tend to be more active, communities of practice tend to be done more as Slack channels and study groups

    • Currently, 2U cross-team initiatives are more ad-hoc

    • Vibram is the paragon-equivalent team in 2U

      • Helpful representatives to meet from edX:

    • 2U Guilds are in Slack as #guild-* channels

      • Ruby is dead

    • Key point for pre-edX 2U folk - OCM/edX has both private and very public working groups, due to close ties to the open source Open edX platform

  • ***** [ideation] (Ben W) Onboarding Working Group - Charter written and ready for potential review and seeking other people interested in contributing to/joining 

    • https://openedx.atlassian.net/wiki/spaces/COMM/pages/3313993290

    • Plan is to focus primarily on the quality and accuracy of documentation

    • A key point will be separating what’s generic to Open edX and what’s specific to OCM/edX

    • Does the working group need a sponsor in management?

      • Not necessarily, but it might help

      • Ask managers of people planning to join the Working Group

  • **** [analysis] (Jeremy) Can we reduce the number of big things we use that have a tricky upgrade treadmill?

  • [quest] (David) How can we start getting involved in the 2U Architecture Guild? 

    • Join #guild-architecture in Slack

  • * [inform] Introductions!

    • Hi, nice to meet you!  Tell us about yourself!

    • David Joy, Principal engineer on the Arch team at OCM/edX

    • Ned Batchelder (edX/OCM), open-source logistics mostly, also Python foundations

    • Jeremy Bowman, Engineering Manager for the Arch-BOM and Arbi-BOM architecture squads at OCM/edX

    • Ben Warzeski - edX - Aurora Content Theme Frontend Engineer

      • Obsessed w/ React, testing, and process

    • Julia Eskew - Principal SW Dev on the Teaching & Learning team (edX)

    • Chris Deery, Principal Engineer on the Engage team

    • Andrew Thal, Software (Enterprise?) Architect @ 2U (global)

    • Andy Shultz, Principal Engineer on Cosmonauts @ 2U/edX

    • Raymond Zhou, Software Engineer on edX Teaching & Learning

    • Robert Raposa, Principal Engineer on Arch-BOM @ 2U/edX

    • Danielle Eriksen, Senior Software Engineer @ 2U, Degrees Vertical

    • Diana Huang - Senior Software Engineer on Arch-BOM (edX/OCM)

2022-01-19

  • [quest] (Binod) The Enterprise team is looking to make better use of feature toggles for more flexible (per-customer or per-subscription) toggling of features. A lot of these are done via config models right now. I wanted to get a base understanding of the best practices in this area (reading edx-toggles gave me words like: Waffle, Django setting, SettingToggle). Also want to brainstorm on how frontends can best leverage these settings. Prior art? link to edx-toggles

  • [quest] (Alex D.) Would dev teams prefer to have a discrete Redis ElastiCache instance for each service they own (that requires Redis)?  Today, there is one Redis instance that is shared by every Redis-requiring service.  What would the benefits be?

  • + [Andy] (WG followup for next time, writing here so I don’t forget) working group / community component as individual heroism vs. team process

    • https://docs.google.com/document/d/1yaqJyIEkmeuq7Q9-LK9QumrUhRP_CLBa-t5qeHH9s_k/edit  

    • Accountability / Visibility

    • Prioritization rubric

      • SWG does a good job of this

    • Sub-topic: addressing the asynchronous problem… how do we work when we're not together?  Many working groups struggle with this, even well-functioning ones like BTR.

    • 30/60/90s - vehicle for getting people involved in WGs?

    • We should add the list of working groups and a request to get involved in one to the onboarding docs

    • The problem may not be pressure to focus on sprint tickets, but lack of pressure to work on working group activities

    • Having explicit small tickets that can be put on sprint boards may help; try to ticket more working group tasks

2022-01-12

  • [Ned] Rollback vs fix-forward: how to decide?

    • A rollback is just one button, but it used to be very involved.

    • We usually recommend a rollback unless there’s a known backwards-incompatible database migration

    • Have people been bitten by rolling back and then causing more trouble through data problems?

    • If we aren’t familiar with the commits, then a rollback might be scary because you don’t know if there’s a data point-of-no-return

    • One technique: pre-prep a revert PR for scary changes

    • “Rollbacks are scary, they can be seen as stepping on people’s toes, and let’s not forget it’s kinda like publicly admitting fault and even though no one’s gonna shame you, it’s still slightly embarrassing”

    • Put another way, the rollback may be more _visibly_ disruptive to us, as it screws up everyone else too.  We’re not having the customer problem thrown in our face, but we see our impact on all our coworkers.  And a revert/fix forward feels less invasive... if you can do it fast enough.

    • First impulse: debug the problem.

    • Rollback will involve other people’s commits

      • You might be removing a new feature

      • Other services might need rollbacks to match

    • After a rollback, doesn’t the entire release train have to be paused so that the next release doesn’t roll back the rollback?

      • Yeah, so a common pattern is rollback first, investigate, revert the problem, unblock the train, fix the problem for real.

    • Rollbacks don’t roll back migrations automatically; that’s an additional step.

      • Doesn’t this make rollbacks more dangerous?

    • We have a whole set of guidelines for dropping a column in stages: https://openedx.atlassian.net/wiki/spaces/AC/pages/23003228#EverythingAboutDatabaseMigrations-Howtodropacolumn

    • Jeremy’s anecdote: outages lasting an hour due to trying to fix forward are more common (~10) than rollbacks that go badly (~2)

    • Suggested action items:

      • Eng All Hands presentation on best practices around rollbacks and fixing forwards

      • Repeat rollback training session and/or promote documents and video from the last one

      • Remind people again of the “Everything About Database Migrations” Confluence page

  • [Ned] How to encourage devs to use their community effort? The edX career pathways now include a Community component.  Are people doing this? Can we raise the visibility?

    • Many people are participating, but they have trouble tracking how much time they spend

    • 20hr/mo came from the Core Contributor guidelines

    • Having community work in trackable (Jira) tickets can help

      • But working groups are public, and Jira isn’t good there

    • Is there a list of possible community work?

      • Working groups (both internal and external)

      • ERGs

      • Discuss/Slack activity

      • External pull requests

    • WG vs Guild vs Tiger Team
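
The rule of thumb from the rollback discussion above (usually roll back unless there is a known backwards-incompatible database migration) and the “common pattern” of steps can be written down as a small sketch. This is not an existing tool: `Deploy`, `recommended_response`, and the flag name are hypothetical, and treating fix-forward as the alternative to a rollback is an assumption drawn from the topic title rather than a recorded decision.

```python
from dataclasses import dataclass

# Ordered steps of the "common pattern" mentioned in the notes.
ROLLBACK_PATTERN = [
    "rollback",
    "investigate",
    "revert the problem",
    "unblock the train",
    "fix the problem for real",
]


@dataclass
class Deploy:
    # Hypothetical flag: True when the release is known to contain a
    # backwards-incompatible database migration.
    has_known_incompatible_migration: bool


def recommended_response(deploy: Deploy) -> str:
    """Apply the rule of thumb: roll back unless a known incompatible migration shipped."""
    if deploy.has_known_incompatible_migration:
        # Assumption: fix-forward is the fallback, since a rollback does not
        # roll back migrations automatically.
        return "fix-forward"
    return "rollback"


print(recommended_response(Deploy(has_known_incompatible_migration=False)))
```

In practice the hard part is the input, not the rule: the notes point out that a rollback feels risky precisely when nobody is sure whether the release contains a data point-of-no-return.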

2022-01-05

  • [quest] (Jeremy) Outreach to the rest of 2U regarding architecture - what do we want to learn from them, what do we want them to know about us?

    • Pass along architecture onboarding deck

    • Clarify that we’re cleaning up a ball of mud

    • What existing meetings and rituals are there?

      • Who are the players?  Is it split up by product lines?

      • How do they interact with their product team?

    • What other guilds/working groups exist?

    • Do they have common infrastructure between product lines/silos?

    • System diagrams?

    • What teams exist that are working on architecture problems?

    • Put together our own version of all the above to share

    • Time to update our Arch Onboarding deck?

    • Security policies/oversight

    • How do you do ownership amongst engineering teams?  

    • How do you get requirements?

    • QA perspectives and philosophies

    • How UX/UI design teams interface with engineering

  • [quest] (Julia) As our infra/processes shift to tCril/2U-based, what team is responsible for updating our Developer Onboarding Docs?

    • Engineering managers are starting to make ad hoc changes as we sort out new onboarding procedures

    • Arch-BOM is starting to think about what we should do next in this area, but there aren’t concrete plans yet

    • We should quite possibly have more formal ownership of docs, aligned with code and service ownership.

    • Perhaps encourage developers to leave a wiki comment when encountering incorrect information? “This isn’t correct - I don’t know how to fix it but it should be fixed.”

    • Should we present a few bullet points at an Eng All Hands or such about how to enable developers to better maintain docs?

    • Onboarding working group?

      • Julia says: Old onboarding WG (6-7 years ago) used to interview developers a few months after onboarding to find out how things were going

      • One idea - have most new developers join this WG by default until they find another one they’d rather join instead

  • Working groups:

    • Are members other than group leads struggling to allocate time to them?

      • Anecdotally, yes.

      • Rotating leads helps avoid burnout, but doesn’t increase the group’s development bandwidth at any given point in time

    • Should tickets labelled as WG work be added to team boards, with an issue raised if the percentage of WG tickets on the board and/or completed in a sprint is too low?

    • Should we have a “tiger team” to set up our working group processes to work correctly?  (Short term “working group working group”.)

    • Security working group is working well, but may or may not be a good model for other groups because it’s concretely different in some respects.

    • Jeremy B to work on getting managerial buy-in on setting up a functional process for working groups.

    • Who wants to be involved:

      • Andy, David J, Jeremy B