Arch Hours: 2022
Meeting Expectations
Why?
Provide an opportunity for generative discussion and ideas.
Foster camaraderie through technical curiosity and geekdom.
Who?
Open to all edX-ers and Arbisoft-ers
What?
At times, these informal discussions result in follow-up action and beneficial change in our technology or in our organization. While this is not a decision-making body, these serendipitous discussions spark ideas that may result in ADRs/OEPs and tickets on team backlogs.
At times, it serves as a form of informal office hours to ask live technical questions of the archeological collective.
At times, we have pre-planned deep-dive topics that folks propose to gather wide input or to answer questions.
At times, we have hosted special guests (internal and external to edX) on specialized topics.
When?
Not lunch hour in ET timezone: With Covid remote work, "Arch Lunch" has evolved into “Arch Hour” in order to accommodate various home/life situations during lunch time.
How? Live Co-Editing
To circumvent Confluence’s limitations with the maximum number of concurrent editors:
during the hour together, we capture topics and take notes at https://docs.google.com/document/d/18TmQf3GllPDfjR7WKiMIhR2eqsbwPi1h3Ojdb6yDYCY/edit.
after the hour, we move those notes to this page.
Why not just stick with keeping the notes in the Google doc?
Google docs are not as discoverable.
Google docs don’t notify observers of future edits.
Google doc comments don’t notify all observers.
How? Structure
Please enter your proposed topics for discussion.
When we use Lean Coffee Style (link1, link2), we vote on which topics the group wants to discuss and time-box the discussion to 10 or 15 minutes → 5 minutes (if re-voted) → 5 minutes (if re-voted).
Prefix your topic with your intention so we are clear on what outcome you are seeking from the discussion. Examples:
[inform] You are simply seeking to inform the group of this item. You may field clarifying questions from the group on your inform, but you are not seeking further discussion at this time.
[ideation] You are seeking divergent and wide perspectives from this group. In this brainstorming mode, all ideas are accepted, without critical analysis.
It may be helpful to clarify whether you’d like to ideate on the problem space or the solution space.
[analysis] You are asking the group to help you poke holes in your idea/topic/plan/etc.
[quest] You are seeking information/responses to a question you have.
2022-12-21
[Phil] [discuss] edx-cookiecutter & auto-adding LMS id from JWT to User Django model in non-LMS new services
Consensus:
Let’s add the lms_user_id in by default: PR + ADR
Let’s consider in the future how to reduce the number of identifiers, especially considering future efforts of unifying identity at 2U
Enterprise may have a model for this in how they stub users if they are added to subscriptions before they exist in the LMS.
Created: https://github.com/openedx/edx-cookiecutters/issues/281
Raw discussion notes:
Purchase squad, migrating ecommerce to 2U pre-existing ecommerce - “Titan”
Confusion about canonical user identifiers - LMS user ID
Pie or Exams do this thing about auto-adding LMS user ID - should we add this to the cookie cutter? Should new services automatically have the LMS user ID in their user model?
Well, maybe not all of them need it… but many may eventually need it?
John: Side note: Maybe we could set the id of the user in the new service to be the same as the lms_user_id?
Phil: I didn’t know we could do this!
Chris D: What about conflicts?
John N: There is only one user table that creates IDs
John, Robert: Seconded
Robert: We should have docs in the cookiecutter about this information
Robert: On the older services we didn’t have this for a long time. We were re-using an assorted variety of user identifiers across services. Users were and many times still are being created in LMS by different services.
History: Ecommerce was one of the first repos where we were trying to get the lms_user_id holistically added to all calls to/from the repo & LMS
David: Does Enterprise have any use cases of user imports?
John: We have a stub record we create if a user doesn’t pre-exist in LMS
John: Makes sense to have lms_user_id in the user model. Maybe a future thought is to reduce our total number of ids.
Robert: In the LMS, we do have the concept of external IDs.
Chris: We have global identity as well.
John: Maybe we have options to map it in the future.
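A minimal sketch of the consensus above, written in plain Python rather than Django so that it runs standalone. It assumes the decoded JWT payload carries the LMS database ID under a `user_id` claim and the username under `preferred_username`; those claim names, the `ServiceUser` class, and `sync_user_from_jwt` are hypothetical and are not the cookiecutter's actual code.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class ServiceUser:
    """Stand-in for a service's own user model (hypothetical)."""
    id: int                            # local auto-incrementing ID, unique only within this service
    username: str
    lms_user_id: Optional[int] = None  # LMS database ID, filled in when known


def sync_user_from_jwt(payload: dict, users: Dict[str, ServiceUser]) -> ServiceUser:
    """Create or update the local user from an already-decoded JWT payload.

    Assumes the claims `preferred_username` and `user_id`; real claim names may differ.
    """
    username = payload["preferred_username"]
    user = users.get(username)
    if user is None:
        user = ServiceUser(id=len(users) + 1, username=username)
        users[username] = user
    lms_user_id = payload.get("user_id")
    if lms_user_id is not None:
        # Only overwrite when the token actually carries the ID.
        user.lms_user_id = lms_user_id
    return user


users: Dict[str, ServiceUser] = {}
first = sync_user_from_jwt({"preferred_username": "phil", "user_id": 4021}, users)
again = sync_user_from_jwt({"preferred_username": "phil"}, users)
```

The local `id` and the `lms_user_id` stay separate fields here, which matches the "add it by default" consensus rather than John's side suggestion of reusing the LMS ID as the primary key.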
[Robert] (quest) Arch Monthly Stand-up used to provide me some info about what others are up to. I know we had thoughts about an async replacement, but right now I feel like I just don’t get this info.
Do others feel they are getting this info? Where can I tap in?
There’s an L&P Scrum of Scrums that covers some of this for managers
Or, do we need some replacement?
BOM teams try to keep track of what to announce; does this need to be a more widely adopted practice?
Are demo/sharing time meetings common in teams?
2022-12-14
[Feanil/Ned] Announcements
Think about conference talk ideas over break! Open edX Conference - 2023: Call for Proposals
[Feanil] General overview of how things are going at 2U?
[Andy] report on LTI tool actual vs. specified or expected behavior
Unique identifiers
PII sharing
2022-12-07
[2U internal] [Ned] (discuss) Brainstorm things 2U should tell people at the Open edX conference: https://docs.google.com/document/d/1nBW_uS7KSjFNq1K_sjkv8IliadcUiDc06HqCo4DIjas/edit
[2U internal] [Ned] (inform) Open edX lunch+learn Weds 14th. What questions don’t people even know they should ask? https://docs.google.com/document/d/1EpQ8TP3P38F5QPRF8IKx8NTISMFnCmKG-ZMXjW4YUiM/edit
[2U internal] [Ned] Is it OK that we don’t have a common town square?
We use the #tech-dev-edx for 2U
Docs: https://2u-internal.atlassian.net/wiki/spaces/AT/pages/16385625/How+We+Announce
[todo] Add sre@edx.org to some announcements
[idea] Use blogs
[question] Should we create a new Google group?
Maybe not, it may hide information from recipients
What would it be called?
Who would maintain it?
[2U internal] [Andy] tales of allowed programs
Ngrok
Github local testing of webhooks
LTI see https://github.com/openedx/xblock-lti-consumer#lti-13
Sharing a dev environment from
2022-11-30
[Ben W] What does the http->https forwarding?
[Robert] Cloudflare probably for http->https. Also, an answer to a separate question, Google TagManager is often where random scripts are dropped on the page.
2022-11-23
Low attendance due to Thanksgiving-related PTO. There was some continuation of discussions about XBlocks, iframes, and CSS conflicts, but notes weren’t taken.
2022-11-16
[inform] (Jeremy) Updated draft of Development Environment Vision is ready for review
[quest] Jeff Witt 1 min: Use of !important in CSS – OK to use, or to be avoided? Consensus seems to be that it’d be best to avoid it. Uncertain if there’s any substantial a11y angle on this guideline.
[discussion] Ned: OEP-55 Maintainership: monitoring issues, PR SLAs
We’ve picked repos for the pilot that are likely to do well at this, but what happens when it’s expanded to repos owned by overwhelmed teams?
[John] Do we need someone like Natalia to help teams keep track of this?
[Andy] It’ll probably increase the pressure to catch up with the maintenance backlog in various repos
[Jeremy] I suspect that much of the need for a project manager arises from immature processes around software maintenance and sustainability; we should also take steps to address that.
[Ned] Pilot Phase 2
[Andy] We have processes for tracking OSPRs, but really not for GitHub Issues yet. How do we make sure these actually get considered when prioritizing? (Given that many of our product managers/owners live primarily in Jira.)
[John] We could improve scheduling of automated upgrade PRs.
[Andy] Some teams are already doing this, at least for the Python upgrade PRs.
Much of OSPR handling is currently being dealt with in per-team on-call processes, which works but may not be the ideal approach.
[Andy] If you have a product-mandated backlog, fix that first. Needs to be a conversation that factors in maintenance needs.
[John] Having more advance notice that PRs will be coming (and why) really helps.
2022-11-09
[ideation] (Beggs) - OAS (OpenAPI), REST API standards and API client/SDK generation
https://2u-internal.atlassian.net/wiki/spaces/IM/pages/18973040/Consumer+Driven+Contract+Testing (and child pages, for Pact)
Django Packages : drf api documentation (drf-spectacular may be worth switching to from drf-yasg, see https://github.com/axnsan12/drf-yasg#openapi-30-note )
[ideation] (Beggs) - Standardized/Convergent core education objects for all of 2U (enrollment, course, grades, course completion, etc…)
[Jeremy] Global identity project and workshops
[Jeremy] L&P Eng Leads conversations around DDD universal language and/or unified data model
[inform] (Ned) Open edX CFP
[quest] (Jeremy) - Senior engineers & business context awareness
Many engineers lack at least context around user demographics
Also lacking good context on specific priorities for the current and upcoming quarters (thanks re-orgs)
Engineering managers in too many meetings, senior engineers in not enough
https://github.com/orgs/edx/projects/15/views/1 (Roadmap of Platform Core Teams)
Business model summary doc could be useful, especially for onboarding
2022-11-02
[Jeff] Are all XBlocks in the Learning MFE kept in iframes?
[Jeremy] Any useful info in https://openedx.atlassian.net/wiki/spaces/AC/pages/1890681313
[Jeremy] Or in https://github.com/openedx/frontend-app-learning/blob/master/docs/decisions/0002-courseware-page-decisions.md ?
An entire unit (which may consist of multiple XBlocks) is rendered in the iframe
[Jeremy] An update in https://github.com/openedx/frontend-app-learning/blob/master/docs/decisions/0009-courseware-api-direction.md indicates that the iframe rendering isn’t on the shortlist of things to change
[Robert] [quest] What process should we use to review and commit (or abandon) to various architectural docs?
Is the OEP process the right process?
From a resourcing perspective, how do we handle this in our current situation, as opposed to under the old chief-architect model?
Here are some of the now-questionable docs: https://openedx.atlassian.net/wiki/spaces/AC/pages/921895082
Note: as an example, DDD is listed as one of the principles, but is that still the case?
Ideas
Break down existing docs into smaller OEPs to try to get them over the line.
Start with less controversial parts of the Manifesto, or less controversial DDD domains (rather than all domains).
Use Architectural Coordination Working Group to find resources in trying to push some of this work over the line.
For DDD, architects driving OEPs would need to include Product.
2022-10-26
Skipped due to low attendance
2022-10-19
[Jeremy] High-level development environment objectives
No need to debug code updating problems
Fast to set up a new dev environment
Don’t need to carefully preserve manually set up testing data
Good support for debugging and observability
Consistent between services
Able to run reasonable subsets of the full Open edX ecosystem of services
Defaults to feature flags currently active in production
Comes with data needed to quickly test most features
[Adam] [quest] I'd like to discuss with this group and Simon to better understand the plan for moving to Open Search
E.g. When will we know that OpenSearch will be the thing for edx-platform?
How do we know that discovery and forums won’t have any issues?
Do we have a plan to remove elasticsearch from edx-platform?
Indexing courses and course teams
What is the cost of delay of staying ?
What is the
Notes: 200 ES searches per month
Adam: I also need to leave shortly after 9:30
John supports less Elasticsearch, RDS is pretty good
Need to push pre-Olive for non-AWS
AWS will support
Current SRE work: https://2u-internal.atlassian.net/browse/ISRE-1280
Infinity needs the data in the new stage cluster
Who owns discovery? Vanguards
DEPR is also looking into stopping our use of elasticsearch
[Jeremy] Can we get away from requiring thorough owning team review for maintenance, bug fix, and small feature enhancement PRs? What would have to change to make that happen?
Plugins/libraries need to have been tested in the things they’re installed in
Make test suites more reflective of actual behavior in production deployment
Make the changes unused in edx-platform
Address issues raised in previous RCAs - trailing slash consistency, database migration linting
Shorten the time from merging to detecting problems in production
Canary deployments?
Shrink the size of edx-platform (small problem can bring down a large chunk of production)
Automatically deploy a test environment that exercises the change
2022-10-12
[inform] (Ben W) FWG/OpenCraft/Raccoon Gang Theming conflict. Working with groups to try to consolidate how we coordinate work around the platform between working groups and get them to talk to each other.
We ended up with parallel meetings: 2U-focused and Open edX community focused
Not much communication happened between these parallel groups
Trying to fit all front end stuff into one series of meetings ends up with a poor signal-to-noise ratio
Need a clear forum for this coordination
(Ned) Concerned about defaulting to a meeting as the primary forum for this: conflicts, time zones, etc.
(Chris) Should we use the Open edX roadmap for this kind of coordination?
(Ben) Trying to reconcile architectural initiatives being driven by multiple organizations in the same project sounds terrifying
(John) This sounds like a flaw/scalability problem with our architecture and process that needs to be fixed
(Andy) We need to get better at sending more redundant communications the larger a project is
[quest] (Jeremy B) Developer Experience - reasonable focus for this meeting? Arch-BOM is pivoting(?) to focus on this
For a loose definition of DX, perhaps
Any examples or resources we should learn from?
(Ben) Standardized debugging/troubleshooting tools
(Chris) It feels like some aspects of DX start to cross back into architecture
(Chris) Would be good to get a status update on development environment efforts
(Ben) How can we make “thing not in platform” easier (for a new python API)
(Ben) How can we make “New MFE” easier (observability, config, etc)
(Hamzah) A “newsletter” of changes and features would be helpful.
[inform] (Ben W) FedX exists again. What this means. What our focuses will be.
2022-10-05
[inform] (Ned) We need hackathon organizers, please volunteer
[quest] (Jeremy) Hacktoberfest - do we want to accept contributions this year?
Easy way to get T-shirts for developers
But it’s not clear how much else we get out of this; usually a few vaguely useful contributions, a few mild wastes of time
[Ned] We have enough problems with our contribution pipeline as is, may not be a great idea to pile more into the backlog
[Jeremy] We do have a bunch of GitHub Issues for edx-platform pytest warning fixes that we could tag for participants
Let’s activate selectively for things where there are useful issues open and maintainers are willing
[quest] (Jeremy) How important/useful do people think type checking would be?
[Ned] First we’d need to fix our existing linting
We’d need a policy, one reasonable example could be “you may add type hints, but you don’t have to. If you do, the linting must not break”
Communicate said policy
Hold off on any big push for test-generated type hints or other comprehensive annotations
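To illustrate the kind of opt-in policy suggested above: annotated and unannotated functions can sit side by side in one module. By default, a checker such as mypy only checks the bodies of functions that carry annotations, so adding hints is voluntary but must keep the lint run passing. The function names below are made up for illustration.

```python
from typing import Optional


def parse_seat_count(raw):
    # No hints: allowed under the policy, and skipped by default type checking.
    return int(raw.strip())


def seats_remaining(capacity: int, enrolled: int, waitlist: Optional[int] = None) -> int:
    # Hints added voluntarily: from here on, the type-check run must stay green.
    taken = enrolled + (waitlist or 0)
    return max(capacity - taken, 0)


remaining = seats_remaining(parse_seat_count(" 30 "), enrolled=28, waitlist=5)
```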
[discuss] (Diana) What do we need to do to make sure there’s not much disruption from Slack migration?
Migrate existing channels
Update integrations
Handle shared channels
There’s a lot written about this, few people have had time to read it all. And it sounds like there are at least a few corner cases that the docs and process don’t cover yet.
Emoji transfers (Matt Hughes seems to be working on this)
[discuss] (Ned) Links to private wikis from public wiki. Allowed/disallowed?
Feanil: it’s fine as long as it’s clear that it’s a private link and that it was understood by the author that it’s private.
[Jeremy] Is it worth wrapping them in conditional content blocks to make it explicit and avoid distracting other readers?
[Feanil] How about a table at the bottom of the page for links to each org’s private related context?
[inform] (Ned) Kelly is trying to formalize “public workstreams”: https://edx-internal.slack.com/archives/CDA7GMJ4B/p1664910103145889 (private 2U link)
[discuss] Max’s impressions of FedBom PR flow
[Jeremy] Gave historical context
A key problem is that teams feel that the risk of breakage is high enough that it’s only worth merging code directly related to current product management priorities. This isn’t good for long-term product quality and maintenance efforts, and we need to find ways to fix it.
[Feanil] Michelle Philbrick has been nominated for new core committer role to help manage PR workflow, evolution of what Natalia has done in the past
[Feanil] Grimoire does some PR tracking work
[inform/quest] (Jeremy) Arch-BOM -> Developer Experience
If you have any suggestions on improvements that should be prioritized, please let us know
2022-09-28
[Ned/inform] Open Source Process working group: https://2u-internal.atlassian.net/wiki/spaces/ENG/pages/19467639/Open+Source+Process+Working+Group (private)
#openedx-internal in edX or 2U slack (private)
[Ned] Forking Strategies doc in progress: https://2u-internal.atlassian.net/wiki/spaces/ENG/pages/155746369/Forking+Strategies (private)
[Jeff] Does this also cover the case where we need to fork an external dependency to fix an a11y issue?
[Ned] Not yet, but it probably should
[Jeremy] Potentially related: https://openedx.atlassian.net/wiki/spaces/AC/pages/3036972032
[Jeff/quest] Do we have a Dates API, for extensions?
Idea is that we should have some mechanism in the platform to facilitate people scheduling time to work together on a course
Things like this: https://www.flow.club/ and https://focusme.com/
[Dave] There’s support for retrieving key dates about the course, but not adding dates
[ideation] (Jeremy) Frontend security vulnerability handling
We get dependabot alerts about security vulnerabilities in dependencies.
Would be nice to just upgrade things (hopefully automatically)
Fed-BOM is working to get upgrade PRs like this assigned to owning teams.
[Alex] opines that teams may be missing a more formal on-call process, through which these upgrades could be actualized.
[Andy] A big part of the problem is that our frontend test suites are insufficient to catch even fairly major problems before deployment
This is not really a frontend-specific problem; it affects all PRs from outside the team
[Feanil/question] What kind of testing maturity do we feel we need?
Better mocking and Test Data
More contract testing
Educating more developers on how to do this
Recording by Dawoud? Jeremy will follow up to discuss the possibility
BTW, he’s giving a DjangoCon US talk this year: https://2022.djangocon.us/talks/building-microservice-architecture-for/
Questions
Can you test backend rendered pages with contract testing?
Adding tests specifically for issues that broke Prod.
Record context on the bugs that escaped to production in a more public way so the community can better understand what broke and how.
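As a rough illustration of the contract-testing idea above (this is not Pact, and not any tool the teams are known to use): a consumer records the fields and types it relies on, and the provider's test suite checks a sample response against that record. The contract, the helper name, and the sample payloads are all hypothetical.

```python
from typing import Any, Dict, List

# Written by the consumer (for example, a frontend): the fields it depends on.
ENROLLMENT_CONTRACT: Dict[str, type] = {
    "course_id": str,
    "mode": str,
    "is_active": bool,
}


def contract_violations(response: Dict[str, Any], contract: Dict[str, type]) -> List[str]:
    """Return human-readable problems; an empty list means the contract holds."""
    problems = []
    for field, expected in contract.items():
        if field not in response:
            problems.append(f"missing field: {field}")
        elif not isinstance(response[field], expected):
            actual = type(response[field]).__name__
            problems.append(f"{field}: expected {expected.__name__}, got {actual}")
    return problems


# Run on the provider side against what the API actually returns.
# Extra fields are fine; only the fields the consumer relies on are checked.
ok = contract_violations(
    {"course_id": "course-v1:edX+DemoX+1T2022", "mode": "audit", "is_active": True, "extra": 1},
    ENROLLMENT_CONTRACT,
)
broken = contract_violations(
    {"course_id": "course-v1:edX+DemoX+1T2022", "is_active": "yes"},
    ENROLLMENT_CONTRACT,
)
```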
[Ned/question] Hackathon?
[Jeremy] We need organizers, please get in touch if interested
2022-09-21
[Ned] What information would be helpful for improving PR review flow, either inside or outside of 2U?
[Ben] Tickets! (Jira) Hard to accurately prioritize among other work.
[John] Get PMs to review them on arrival?
[Jeremy] There are columns in the Squads tab of the ownership spreadsheet to specify how each team wants to be notified of review requests.
[Jeremy] https://2u-internal.atlassian.net/wiki/spaces/ENG/pages/76808270/Cross-Squad+PR+Reviews proposes guidelines for improving this, feel free to suggest enhancements
[Andy] Even pretty innocuous PRs have caused major problems recently, making people very reluctant to review and merge incoming PRs from outside the team
[Andy] Having a dedicated concierge for tracking incoming PR review requests to each team would be useful.
[Jeremy] There’s an auto-formatter for tox.ini files; would people find value in trying it? https://pypi.org/project/tox-ini-fmt/
tox-ini-fmt deletes all comments!! https://github.com/tox-dev/tox-ini-fmt/issues/42
I want one for GitHub actions
It sounds like there’s some interest in both of these, we can experiment with them somewhere. A linter/formatter that understands GitHub Actions semantics would be nice, if one exists.
[John] Related: I used this locally with success to run/debug actions: https://github.com/nektos/act
We should get more repos with old custom workflows to use the same workflow templates
GitHub Action to make a change across many repos and create PRs for each of them: https://github.com/edx/.github/actions/workflows/bulk_repo_update.yml (Arbi-BOM uses this pretty often)
[Jeremy] I also found a tox extension that checks for system dependencies before trying to install Python dependencies, to avoid weird hard-to-interpret installation errors. Would that be useful? https://pypi.org/project/tox-bindep/
Not a burning need, but seems useful, if only for documentation purposes
Particularly useful for operators (who may have reasons to use neither Ansible nor official/Tutor Docker images)
[Ned/inform] OEP-55 progress
[Feanil] Open edX conference preparation is starting
CFP will open in the next few weeks
Will be at MIT starting Tuesday, March 28th, 2023
2022-09-07
[Phil] [quest] User IDs across services - was very confused and was hoping for some clarification from people who know Django better.
User ID in Django is just an auto incrementing identifier
Only meant to be unique within service
We have used usernames and email addresses in the past to connect services.
PII is a concern, though, with usernames and email addresses.
We use LMS database ID as the global identifier for the user.
Other IDAs have their own user ID which is distinct from that LMS database ID, but have a field for it in the model.
Maybe enterprise-access or program-intent-engagement might have clues.
Change the API LMS-side to use user ID.
Ask the owner too!
Interested in the context of how to get more MFEs getting set up as efficiently as possible.
Maybe there’s a way to create conveniences in cookiecutter to, e.g. hydrate missing user information
In Rails, you can have a class that looks & acts like an ORM object but is backed by an API
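One way the "hydrate missing user information" idea above might look, loosely mirroring the Rails pattern of an object that behaves like a record but is backed by an API. This is a sketch only: `LmsBackedUser`, the `fetch_profile` callable, and the field names are hypothetical, and a real version would call an LMS endpoint instead of the fake used here.

```python
from typing import Callable, Dict, List, Optional


class LmsBackedUser:
    """Looks like a local user record, but loads profile fields on first access."""

    def __init__(self, lms_user_id: int, fetch_profile: Callable[[int], Dict[str, str]]):
        self.lms_user_id = lms_user_id
        self._fetch_profile = fetch_profile
        self._profile: Optional[Dict[str, str]] = None

    def __getattr__(self, name: str) -> str:
        # Python only calls this for attributes not found normally,
        # which here means the remotely stored profile fields.
        if name.startswith("_"):
            raise AttributeError(name)
        if self._profile is None:
            self._profile = self._fetch_profile(self.lms_user_id)
        try:
            return self._profile[name]
        except KeyError:
            raise AttributeError(name) from None


calls: List[int] = []


def fake_fetch(lms_user_id: int) -> Dict[str, str]:
    # Stand-in for an HTTP call to the LMS; records each invocation.
    calls.append(lms_user_id)
    return {"username": "phil", "email": "phil@example.com"}


user = LmsBackedUser(4021, fake_fetch)
username, email = user.username, user.email
```

The profile is fetched once and cached on the object, so reading several fields costs a single API call. Nothing is persisted locally, which fits the PII concern noted above.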
2022-08-31
Notes available on private 2U Confluence: https://2u-internal.atlassian.net/wiki/spaces/ENG/pages/150569008/Arch+Hours+Private+2022.
2022-08-24
[Ned] Putting public information in the public wiki
Leaving stub pointers from 2u-internal to openedx will help remind people about the split
Andy says it’s easy to move docs, but you have to fix the links
Who will be responsible for informing devs?
The Open Source Process working group will figure that out
What about wiki vs readthedocs?
If it’s going into a wiki, it’s better to put it in the right wiki
TODO: is there a global template that can provide in-the-moment guidance?
Does 2U have any sort of “enterprise search” solution for docs?
We don’t think so, but great point to revisit now
What does this “enterprise search” even look like?
There are offerings from vendors that search across systems (confluence, read the docs, other confluence, github, emails?!, etc.), making the API calls, scoping to things to which the searcher has access.
We did light investigation on this in the past, but dropped it because the available solutions were deemed too invasive at the time.
Last we checked, Elastic had the best out-of-the-box solution for talking to all the systems we use. https://www.elastic.co/enterprise-search
The ability to understand the current state of access control in Google Drive and Confluence is hard. Adding enterprise search on top of this may exacerbate the hardness.
[Ned] OEP-55: Maintainership pilot is underway:
https://openedx.atlassian.net/wiki/spaces/COMM/pages/3426844690/Bootstrapping+Maintainership
[relevant to discussion about open source costs, ownership of crufty things, etc.]
Slack channel: #maintainers-pilot in edX slack: https://edx-internal.slack.com/archives/C03R320AFJP
https://backstage.techdev.2u.com/
A few people have access to the configuration for this, many people have ideas for how to enhance it, little enhancement has been done yet
https://backstage.openedx.org - run by tCRIL for public Open edX repos, still in early days
Is anonymous access to Backstage possible? Would probably be good for the Open edX one.
2022-08-17
[Andy] what even is this meeting now?
How does everyone stay informed about broader engineering efforts?
Now we are not all in L&P (“lump”?)
Conway’s Law in action
[Unstructured rambling from Alex about what/where Enterprise is now in relation to L&P and other parts of the platform system and organization].
What could we do to facilitate cross-battalion (column) architecture thoughts and information?
[Chris D.] Hire a chief architect?
[Alex D.] Is this what the Arch. Coordination WG is for?
[Andy] What even is the overriding edX engineering culture now? A lot of scrum teams have their strong team cultures, but they’re each self-directed.
Org-chart: https://drive.google.com/file/d/1th-2GYGEsMzvFnto8iGa6IY69kGbhGZS/view
Some columns have a dedicated architect right now (e.g. David Joy in LnP, although he has interest in arch across edX/2U)
[Robert R.] Architectural fitness functions - does anyone have experience with this concept outside of edX? Specifically measurable fitness functions.
[Ned] remembers some ideas about using linters to catch some things related to fitness functions. Seems like we’re looking for a sort of “magic” technology to do architecture for us, instead of talking/training humans.
[Andy] Some experience in the past of publicizing cross-organization endpoint performance as a way to improve endpoints, make them adhere to better SLAs.
[Chris] Automated performance testing is hard.
“Repetition doesn’t ruin the prayer”
[Andy] Likes Chris’ concept of an architect - more of an architecture evangelist, trainer/teacher. Not someone who hands you a design to go implement.
“Architecture Shaman”, “Architecture Preacher”, “Sage”, etc.
Do we need a role where someone goes around and gets architectural workshops organized on a frequent, regular basis? Not presenting the workshops themselves, but prodding/requiring all (principal? Senior? anyone?) engineers to present topics at these workshops.
This came out of the idea that we lost our lunch/learn workshops that e.g. Dave O. would frequently run on performance (and other) things.
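For the measurable fitness functions raised at the start of this topic, one small example is a check that fails when a module imports from a package it is not supposed to depend on. The sketch below uses Python's standard `ast` module; the forbidden-package rule and the sample source are invented for illustration and do not reflect any existing edX lint rule.

```python
import ast
from typing import List, Set


def forbidden_imports(source: str, forbidden: Set[str]) -> List[str]:
    """Return the forbidden top-level packages that `source` imports."""
    found: List[str] = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # level == 0 means an absolute import; relative imports are ignored.
            names = [node.module]
        else:
            continue
        for name in names:
            top = name.split(".")[0]
            if top in forbidden and top not in found:
                found.append(top)
    return found


SAMPLE = """
import json
from lms.djangoapps.courseware import models
import cms.djangoapps.contentstore.views
"""

violations = forbidden_imports(SAMPLE, {"lms", "cms"})
```

A count of such violations per repository is a number that can be tracked over time, which is what makes it a measurable check rather than a guideline.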
[Dave O.] 2U Enterprise is a similar use-case to a lot of Open edX providers that have some custom stuff they want to run for paying organizations. More ownership burden, optimized somewhat for faster speed of delivery.
Should things be optimized such that, if the enterprise squad is wholly re-organized (as a team/squad) tomorrow, the systems stay good?
[Alex] Rambles.
[Ned] Every scrum team feels like they “own too much”. Why is this the case? Is this an edX problem? A product problem? A modern software problem?
[Robert] Raises the question of “are there some things which should not be included in open source?”
Could we make faster decisions about e.g. deprecating/decommissioning systems if we don’t have to worry about who outside of 2U is using that system?
Here’s an awesome diagram: https://openedx.atlassian.net/wiki/spaces/OEPM/pages/3499786241
[Ned] Putting public information in the public wiki
[Ned] OEP-55: Maintainership pilot is underway:
https://openedx.atlassian.net/wiki/spaces/COMM/pages/3426844690/Bootstrapping+Maintainership
[relevant to discussion about open source costs, ownership of crufty things, etc.]
[Ned] Would someone like to run this next week?
2022-08-03
[quest] (CD) Masquerade - there are multiple implementations. Is there an approach that should be considered canonical?
[inform] (Feanil) Arch Advisory Board
[inform] (Ned) Boston-area Open edX social: https://www.eventbrite.com/e/open-edx-boston-meetup-tickets-350633653697
[quest] (Ned) ADRs in OEPs?
Ned added a bookmark to open OEP-repo pull requests to #open-edx-proposals
[inform] (Ned) Readme contents
[inform] (Jeremy) Followup on SRE conversations
MySQL utf8mb4 encoding
edX SRE is prioritizing a discovery sprint, expects it to be relatively straightforward
Public issue to switch over all the code to come after that, before Olive branches cut
Ubuntu LTS upgrade cadence
We’re going to first optimize the process of doing Ubuntu upgrades: better automation of version updates in the code, more standardized deployment process, etc.
Will attempt a 22.04 upgrade after that if we’re not too close to the 24.04 release already
[Feanil] We should make sure to wait for .1 followup releases
[inform] (Jeremy) Successful experiment with GitHub labels for PRs in review, writeup forthcoming
[inform] (Jeremy) Arbi-BOM is making PRs to improve our Dockerfiles and could use more feedback to help establish best practices. Prototype: https://github.com/openedx/edx-analytics-dashboard/pull/1333
Feedback is especially welcome before we copy the patterns established here into more than a dozen other repos, making changes more difficult later
[inform] (Jeremy) The multi-repo bulk PR generation tool has successfully been moved from Jenkins to GitHub Actions: https://github.com/edx/.github/blob/master/.github/workflows/cleanup.yml
Called via https://github.com/edx/.github/actions/workflows/cleanup.yml -> Run Workflow
[Ned] Why GitHub Actions instead of some other means of running a script?
[Jeremy] Access tokens and other secrets pre-configured in the GitHub org, ability to run long processes with lots of network and storage requirements on free hosted servers
[inform] (Feanil) Translations Infrastructure Revamp
Carlos at tCRIL is leading
FYI 2U already upgraded the libraries to use the newest client.
2022-07-27
[quest/ideation] (Dave): How to update MySQL charset to utf8mb4
We currently use “utf8”, which isn’t real UTF-8: it stores at most 3 bytes per character, so it cannot represent many characters (for example, most emoji)
utf8mb4 is supported under MySQL 5.7, but the most appropriate collation to use isn’t supported until 8.0.1
2U SRE is still figuring out how to do the 5.7 -> 8 upgrade in Aurora without extensive downtime; there seems to be one option that will require a bunch of prep work
Most other installations will likely just want to dump and restore at Open edX upgrade time; for these, upgrading the DB and switching the encoding at the same time may make sense
Jeremy will bring this to SRE’s attention and see if/how it impacts MySQL upgrade plans for http://edx.org
[Andy] Seriously, is it just easier to switch to PostgreSQL instead?
Jeremy will ask about this too…
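The 3-byte limit discussed in this topic is easy to see from Python: any character above U+FFFF needs four bytes in UTF-8, so it cannot be stored in a MySQL “utf8” (utf8mb3) column. The helper name below is made up for illustration.

```python
def fits_in_mysql_utf8mb3(text: str) -> bool:
    """True if every character encodes to at most 3 bytes in UTF-8."""
    return all(len(ch.encode("utf-8")) <= 3 for ch in text)


samples = ["résumé", "日本語", "course 🎓"]
report = {s: fits_in_mysql_utf8mb3(s) for s in samples}
```

Accented Latin and CJK text fits in three bytes per character; the emoji in the last sample is what pushes it past the limit.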
[quest/ideation] (Jeremy): How proactively do we want to track new Ubuntu LTS releases?
Question for the BTR WG?
This has ramifications for which Python release we next add support for
[Ned] Python 3.11 is supposed to be 25% faster than 3.10, but looks like it may be a rocky upgrade bug-wise due to internal changes
2022-07-20
[Andy] our standard JWT authentication tangles the global user into a service’s database. JSONWebTokenAuthentication may not be the right choice outside the monolith, but it’s in the cookiecutter.
[Jeremy] Django requires a user object even for basic request/response handling, and many of the fields like first name, last name, and email are required. So we either need to copy them from the LMS or make up bogus data to avoid PII spread.
[Andy] I agree that if we need a user it’s better not to have a half-real half-madeup user. :)
[Ned] Anybody participating in the Open Courseware architecture meetings? How’s that going, what degree of overlap is there with this meeting?
[Chris] More like what I expect from an “architecture” meeting, about boundaries, following our best practices, etc.
[Ned] Wondering how much of the content there is of interest to the broader Open edX community
[Chris] Touches on Team Topologies, internal team structures, etc. which may be confidential and/or uninteresting to the community
[Ned] There’s also https://discuss.openedx.org/t/new-working-group-proposal-architecture-coordination/7786 , which may be interesting to the people here
[Chris] Why is architecture so distributed/decentralized at edX/OCM compared to many other firms?
[Jeremy] We used to be more centralized, but teams often got blocked waiting for consensus from an architectural council with a different cadence.
[Jeremy] Also, we’ve already made a number of key choices like framework, deployment process, linting, etc.
[Chris] That feels more like DevOps than Architecture, although we do seem to have nailed DevOps pretty well.
[Andy] TripAdvisor was even more strongly against centralized architecture, apparently due to some bad experiences at other companies the employees had previously worked at.
(much more conversation that we failed to take notes on)
2022-07-13
[Dave] (inform): Architecture Coordination Working Group proposal thread.
[inform] (Jeremy): https://2u-internal.atlassian.net/wiki/spaces/ENG/pages/76808270/Cross-Squad+PR+Reviews
Internal doc for now (streamlining process of getting PR reviews from 2U squads outside your own), but likely to be distilled into a more general maintainership doc
[inform] (Jeremy): https://openedx.atlassian.net/wiki/spaces/AC/pages/3467640837
The parts around “how do we define tasks to ask for help with” are pretty solid now
Still working on the “how do we incentivize the right people to help” bits
[ideation] (David): Architecture Idea Memo
Engineer comes up with an idea that they think will help with a problem, but has trouble quantifying and justifying it
Small improvements we should just do at engineer discretion, but bigger things might benefit from help in making the case for putting time into it
Further notes: Architecture Idea Memo Discussion
2022-07-06
[inform] (Dave): Sent an email to interested parties about forming an Arch Coordination Working Group. Please ping me if you want to be added to the thread.
[quest] (Dave): Sentiment around level of tech debt?
(Dave) It feels to me like some old pain points are finally getting addressed
(Andy) My team wrote up a doc of existing tech debt. Many of the items sat there for 6+ months without causing problems, so we may just need to accept that some of them are ok as is.
(Jeremy) It feels like a high percentage of the success in this area has been due to hiring contractors to do it for us
(Dave) Yes, but there was a lot of prep work building up to those efforts
(Jeremy) And we have contractors in the Open edX community now with a lot more experience with the project
[inform] (Jeremy) Wrote up a draft of https://openedx.atlassian.net/wiki/spaces/AC/pages/3467640837 , feedback welcome
[quest] (Jeremy) Should we cancel future sessions of this meeting?
Attendance is down, but we still have several active participants
(Dave) With large attendance, felt like it was only appropriate to bring up topics of broad interest
(Jeremy) Might be useful to collect and vote on topics ahead of time
(Andy) Maybe needs a rebranding?
(Andy) Switch to biweekly?
2022-06-29
[analysis] (David) This month we switched Monthly Arch Standup to a “lean coffee” style… which makes it feel like this meeting. There are a few things folks get from that meeting: a forced read of team status updates, updates on impactful changes, and the occasional “aha!” when we realize some teams should coordinate. Is there a better way to do this?
[ideation](Simon) The value of this meeting and how to use that to improve the attendance? Easy solutions include:
Adjust frequency
Adjust the duration
The start time
Discuss the historical context with OCM devs
(Ned, to add to above): seems like cross-functional meetings in general are getting smaller.
[Jeremy] A few people feel like they need to keep track of everything, most feel too busy with immediate needs to pay attention
There’s fragmentation between this meeting, OC Arch Hour, enterprise arch meeting, Monthly Arch Standup
[Simon] Consumer Review is more structured, with a schedule and specific proposals to be discussed. Would some elements of that make this meeting and related ones more successful?
[Andy] This is often more of a process meeting than an architecture meeting, but that feels valuable
[Simon] Maybe add some smaller meetings to replace this one, move most architectural concerns into subgroups, and either reduce frequency or eliminate this meeting?
[Jeremy] Next steps?
[Andy] Reach out to people who don’t come and ask them what, if anything, would make it valuable to them?
[David] Kill this meeting, have tCRIL create a new one for the broader community, double down on 2U working groups, area-specific arch meetings, etc.?
[analysis] (Jeremy) Draft recommendations for making cross-team PR reviews go more smoothly: https://2u-internal.atlassian.net/wiki/spaces/ENG/pages/76808270/Cross-Squad+PR+Reviews . Thoughts?
[ideation] (David) Defining architectural principles and fitness functions for our domains… how!? Worthwhile?
2022-06-22
What information do we provide to our partners (2U, Trilogy, Get Smarter) when sending them leads from our site? What’s contained in the UTM code and how do we know what happens on the other side?
Please connect with Gabe Mulley to figure out the different pathways for learners to go from http://edx.org to the websites of 2U’s other LOBs
[David] Question for Simon around relaunching architecture advisory/working group meeting for Open Courses - Status update?
One meeting
Identity problem - what should the advisory group be?
Touches all sorts of things, team org, cross-functional stuff
Potential activities for advisory forum (top of mind list from David)
Principles
ADR review
AIM - architectural idea memo
“Ilities” - characteristics
Tech Radar
[Jeremy] Getting reviews on FED-BOM PRs
[Robert] Can we make it even more clear that what we want is just a sanity check that no major changes are inadvertently being made?
[David] Some of these are in teams with active maintainers, we could ask them for review on those instead.
[David] Renovate PRs that are patch/minor version bumps with no conflicts or Github check issues can just get merged
[David] Maybe we can use labels to help route after frontend triage takes a look?
Okay, maybe not useful
[Jeremy] How to get community momentum on the backlog of well-defined maintenance tasks?
[Andy] What about having deadlines made it work for INCR and Django 3.2?
The fact that many of the people were on the hook to upgrade to the next named release which needed these changes to stay in support windows.
[Robert] Do we just need more squads like Arbi-BOM?
In the Open edX community, not necessarily at 2U
Ask community members to chip in funding/support for such teams?
[Andy] Badges and achievements?
Discussion forum badges have been considered
Some kind of org recognition that could be used in marketing materials? “Gold level Open edX supporter”
Recognition at conferences
2022-06-15
[Ned, inform] writing up instructions for using forks/upstream, in prep for SOX compliance: https://openedx.atlassian.net/wiki/spaces/AC/pages/3458170881
(Jeremy) Maybe add some notes on how to avoid / recover from accidentally committing changes to master in the fork?
(Andy) What if you need to collaborate with other developers on the change?
(Ned) Give them access to your fork
(Feanil) What’s the obstacle to using the edx org for working forks?
(Ned) Large installed base of local clones configured to point at edx instead of openedx, which currently work due to forwarding. No good data on how often this forwarding still happens.
(Feanil) I’ll email GitHub to ask if we can get stats for this
(Simon) Where are we using this first?
(Ned) There’s an upcoming communication about the first 6 repos where everyone outside the owning teams will have to follow this process.
(Ned) We may later need to also do this for a broader range of repos in the openedx org.
(Simon) Please let me know as soon as there’s any concrete news on this
[inform] (Jeremy) We’re considering having Arbi-BOM kick off implementation of OEP-45: Configuring and Operating Open edX. Let me know if that concerns anyone.
(Feanil) Probably worth running it by Kyle and Regis again
(Feanil) Why are we using YAML rather than Python files?
(Jeremy) Ability to import twice for ease of dealing with derived settings, easier to write a schema validator for, more flexibility on where the settings file can live (doesn’t have to be on the PYTHONPATH)
(Feanil) Good points, but just keep this in mind when implementing in case a Python file turns out to work better for other reasons, given that it’s what Django usually expects
[Quest](Simon) Where are we with the Kafka event stream work? When can we expand the implementation to other use cases?
We have a working happy path use case, and are currently consolidating relevant code into a shared library
Need to complete that consolidation, do some error handling and monitoring improvements
Roadmap for Event Bus: https://github.com/openedx/platform-roadmap/issues/28
[inform|analysis] (David) We’re adding runtime config for MFEs based on config defined in edx-platform.
https://github.com/openedx/edx-platform/pull/30473
(Simon) lms_mfe_config_api perhaps?
(David) This isn’t lms specific, but the auth code is in LMS
(Chris) Why do we need this?
(David) Community wants to be able to change branding, visual look, etc. without needing to rebuild all MFEs
(Jeremy) Also, Tutor currently has to rebuild all MFEs if any configuration variables change
https://github.com/openedx/frontend-app-learning/blob/master/.env
2022-06-08
[inform] (Simon) I created an OC Engineering and Architecture Advisory. I could use feedback from attendees of this meeting.
Adam: Async feedback: I think Arch Hour moved on top of the Embedded SRE meeting. I find it very helpful to read the meeting notes afterwards, though.
Chris - It is also on top of Paragon WG
[quest] (Simon) What LTI client accounts do 2U/edX maintain for development? TurnItIn? H5P.com? Others?
Studio and LMS go to different assignments on Turnitin
Same parameters with different values somehow
How do we get Turnitin accounts?
Work with PMs and PCs to collect a list of LTI clients that our partners most frequently use. Then approach those LTI clients and establish a process with them for supporting edx-platform integration. Establish also a process to add or subtract from that list of “supported LTI clients”.
[quest] (Robert) Did my devstack hacks document have anything new for anyone? Relates to last week’s discussion around my having less pain than others.
[inform] (Robert) Arch-BOM is experimenting with a Github project in place of Jira board.
[inform/ideation/quest/analysis] (David J) Categorizing pages in the Architecture and Engineering wiki space by where they should probably end up: Architecture and Engineering wiki categorization
DJ: If you want to edit it, just ask, happy to add you - I set it to public comments but not public edit.
Maybe ending up here: https://docs.openedx.org/en/latest/developers/developers_home.html
A popular four-category doc strategy: https://diataxis.fr/
2022-06-01
[ideation/quest] [Robert] I’d like to discuss our test strategy regarding cypress tests.
Since e2e tests are more costly to run and maintain, we’ve generally kept to a smoke suite of important use cases. What strategy do we want?
For edx-platform, we used to have bokchoy integration tests. In this ADR, again, we decided on just a smoke suite because additional tests were too costly to maintain, too costly to run, and very rarely failed due to a real problem. Arbi-BOM removed bokchoy, and I think there is a plan to replace it with a cypress suite. Is the decision in this ADR still accurate?
Getting the e2e cypress tests working in the pipeline is currently owned by the QA team.
It is exciting that some of this work is making progress.
However, it seemed from Ansab Gillani’s Eng-All presentation that his team might be envisioning much greater test coverage using cypress (to be confirmed). Let’s discuss with someone from the team to determine alignment/misalignment, and determine good next steps.
Where to run the new Cypress e2e tests?
GitHub Actions: not positive it will work with the pipeline, needs discovery work
build-jenkins: current e2e tests run here, slated for decommissioning soon
tools-jenkins: choice of last resort
GoCD: not clear this can work
(Ned) Has this been announced/promoted to the Open edX community?
Not yet
Probably worth a mention on http://discuss.openedx.org
(Jeremy) Do the cypress e2e tests work in devstack or Tutor?
Not yet, mainly tested against stage so far
This isn’t a regression against the bok-choy e2e tests, since they haven’t worked in devstack for a long time
(Jeremy) How many of our e2e maintenance problems seem to be from bok-choy vs. cypress?
Cypress is a significant improvement over bok-choy, but still fairly problematic and slow compared to Pact tests
(Simon) The end goal is to have the cypress tests running in the edx-platform CI/CD deployment
Expect to have this around mid-June, if ESRE can fix the blocker on running cypress in the pipeline by then
(Dawoud) YOW! 2017 Beth Skurrie - It's Not Hard to Test Smart: Delivering Customer Value Faster #YOW is an insightful talk on e2e vs contract testing, what not to e2e test, the intent of the contract testing, etc.
[quest] (Jeremy) What factor(s) have most hampered your ability to deliver value in the last 6 months or so? (Deliberately open-ended, don’t want to lead towards any particular problem or solution.)
(Chris) Attrition, lots of people needing to take ownership of code they don’t know well yet
We need a better culture around writing code with the intention of eventually handing it over to someone else - documentation, etc.
(Andy) Test cycle time, too long to avoid context switches
Especially for local development, takes too long to get set up to even be able to run tests
We’re getting better about duration on GitHub tests, but still requires overhead to push a branch and create a PR, etc.
(Chris) Often takes days to get devstack back in a usable state after not using it for a while
(Robert) Why do some people keep encountering this and others almost never do?
Contributing factor is number of different services in use
Also, set of commands for quick fix of most problems isn’t well documented
(Simon) Lack of clarity of the value of what we’re currently doing
Example: ticket blocked for 2 weeks, but only impacting 1 learner
(Kashif) We don’t have good data on how often different parts of our test suite catch actual problems (especially relative to time spent diagnosing flaky tests)
2022-05-25
[ideation] (David) What would it take to create high-level (context/system) baseline, current state architecture diagrams for all of Open edX?
In Mermaid, please: https://github.com/mermaid-js/mermaid#readme
(David) I actually tried this and found that its layout engine isn’t good enough for complex diagrams… they just get impossible to read. Happy to learn and be wrong about that, though!
(Ned) I haven’t tried Mermaid, but text would be great if we don’t want to start over every year.
(David) Diagrams as code living close to what they describe would be delightful
We have this from Content architecture vision, no? Why step away from C4 modeling?
I’m not, that was the “context / system” above, but I should have said container for the second level
Diagrams are useful, have different audiences, and require effort to maintain
(Jeremy) Should we have a designated owner(s) of overview diagrams and docs?
(Feanil) Even bad diagrams can be useful in the sense of getting more experience learning how to make better diagrams.
(David) Diagrams should come with a description of their intended audience
[quest](Simon) What is known within the tCRIL world? What are the initiatives you are working on?
(Kyle) This is our roadmap view: https://github.com/orgs/openedx/projects/12/views/1 . Check out “in progress”
And our day-to-day: https://github.com/orgs/openedx/projects/8/
[Ideation/quest?] (Kyle): Private 2u-internal jira links (or other private links) in PRs - ways to nudge people to put context in the PR & commit message?
Idea: PR template
Ned disagrees - they get ignored/stale
[Robert] The decision about what we want should be documented somewhere. Could be an OEP. Could also be in the PR template.
Idea: Linter/nagger that warns about private link
This would also unfairly warn people, though, who are including private links but also including all the relevant context.
Idea: Have some heuristics, e.g. if the pull request description is really short AND there’s a private link, then nag
Point: How many PRs are actually looked at by community members?
Kyle: potentially all of them. But how can we make this clearer?
Robert: Just say so every time we hit the problem.
relevant tcril-engineering issue: https://github.com/openedx/tcril-engineering/issues/271
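The "short description AND private link" heuristic floated above could be sketched roughly as follows. This is only an illustration: the word-count threshold is an arbitrary assumption, the function name is hypothetical, and the only private host matched is the 2u-internal wiki/Jira domain that appears elsewhere in these notes.

```python
import re

# Assumption: the only host treated as "private" here is the 2u-internal
# Atlassian domain; a real check would need an agreed list of hosts.
PRIVATE_LINK_PATTERN = re.compile(r"https?://2u-internal\.atlassian\.net/\S*")

# Assumption: arbitrary cutoff for what counts as a "really short" description.
MIN_CONTEXT_WORDS = 30


def should_nag(description: str) -> bool:
    """Nag only when a private link is present AND little other context is given."""
    if not PRIVATE_LINK_PATTERN.search(description):
        return False
    # Count the words that remain once the private links themselves are removed.
    remaining = PRIVATE_LINK_PATTERN.sub(" ", description)
    return len(remaining.split()) < MIN_CONTEXT_WORDS
```

Because the link itself is excluded from the word count, a description consisting of nothing but a private ticket URL is nagged, while a PR that includes the link alongside a full explanation is left alone, which addresses the "unfairly warn people" concern raised above.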
2022-05-18
[Andy] Generic xblock ticketing to enable the exam service: be able to convey “the exam service thinks it is ok to show this xblock to this user right now” or, conversely, “this xblock requires that the exam service has said it is ok to see” via a signed JWT.
Discussed for ~25 minutes, then context switched
[Andy] results from RCA “did not test” survey 2020-2022
Roughly half of recent RCAs have involved an inability to test in some way
[quest] (Danielle) David Joy mentioned edX/Open edX initiatives around guilds/interest groups/team extracurricular initiatives that ended up becoming very distributed over time. What about the distributed approach worked, and what, in retrospect, didn’t pan out as intended?
The question is mainly regarding what we call working groups
Add working group participation to career path.
Lends credibility and importance to participating in this kind of activity
Not all working groups are the same
Some working groups are more aligned with the daily work participants do than others
How is my participation in a working group helping me with my daily work?
What does the end goal of my participation in a working group look like?
managers can play a role here with coaching their reports to help evaluate the participation and progress
How do we reconcile squad needs vs. working group needs?
Part of it is explicit expectation that engineers will spend a percentage of their time on this, and managers should help make that happen
Some things work better in squads, others in working groups
Are the tasks high latency, or do they require a lot of heads down time?
[quest] (Jeremy) We’re thinking of kicking off some kind of initiative to get better consistency across our dev/stage/sandbox/prod/etc. environments. What are people’s top wish list items in this area?
[David comment in chat] “How do I sandbox?” and “How do I use a sandbox with this thing?” seem like perennial issues... which I think is influenced by the lack of consistency/predictability
2022-05-11
[Ned] Any Atlassian migration concerns to discuss?
Process feels a bit confused, but no major concerns right now
[Feanil] Curious about how we continue to work in public as much as practical when Jira goes private
Arch/Arbi-BOM considering experiment to work primarily in GitHub Issues
Looking to get Jira out of the picture for OSPR and BD
[Diana] (question) Paver, future of?
Should we make a conscious decision to either continue using or move away from paver?
Is there a clear “winner” to replace it?
[David] What are all the things Paver does?
[Feanil] we use paver in bad ways
To hide platform complexity, which keeps people from learning those complexities
[Jeremy] We should probably create a paver DEPR to clarify that we plan to phase it out over time, and not use it for new things
Jeremy will ask Arbi-BOM to enumerate what it is still used for, so we can come up with plans for each of them
[Andy] how do we get serious about local testability? We could break the site for a few days maybe?
[Jeremy] There are a few efforts in various stages of progress that could help with this:
Dev Env WG and the migration to tutor
Arch-BOM’s work on the Dev Data OEP and framework
Arbi-BOM’s effort to improve the state of Open edX configuration
Incident Management’s work on Pact (consumer-based contract testing)
[Andy] It keeps coming up in RCAs that something broke because it was too hard to test locally before merging and deploying
edX/2U used to have a Test Engineering group, we may still need something like that (especially if not bound specifically to Jenkins maintenance and the edx-platform test suite)
[Simon] Should we do an RCA on the problem to get more clarification on what exactly we need to solve?
[Andy] audit of RCAs since 2020 for could not / did not test as contributing factor: https://docs.google.com/spreadsheets/d/15UR4R8FWUgdBFJyRnXc6dbUV2ES3O8OPJyQhEJFeHI0/edit#gid=0 - RCA category is filtered to “regular” to exclude SRE and process type RCAs
Summary for those without access to the 2U Google doc space: few RCAs so far in 2022, but 66% are in this bucket
More RCAs in 2021, maybe 50% in this bucket
2020 similar to 2022
General level of terribleness in RCAs much lower in 2022 vs 2021 and especially 2020
2022-05-04
[Ned] (inform) Tobie Langel at conference: Moving to Collective Ownership
His key points from the keynote:
2U needs to:
Spell out the business value of open source
Accept changes to flow in order to level the playing field
Teach to fish rather than give fish
The community needs to:
Understand that open source is a do-ocracy
Spell out their business value for contributing
Stop asking for fish
tCRIL needs to:
Facilitate everything
live notes from the follow-on discussion session: https://docs.google.com/document/d/1BuMwDdsFVto1NLvaUgkJXHgvibzZbmjCuQrMe-v4wTE/edit#
Simon: maybe there’s open source value for 2U, but the community won’t like it. Maybe if 2U earns money?
Other projects have run into this problem, where for-profit companies aren’t contributing back
Force of divergence might get greater over time.
How can we push back against the forces of divergence?
2U uses data pipeline tools that cost money, so the community doesn’t adopt them, and we have a different scale than they do.
This is diversity, is that different from divergence?
Find places to use common solutions, and also allow for diversity
Pick the boundaries appropriately
A common reason for organizations to use an Open edX provider (like Appsembler or OpenCraft) is that they use some vendor they’d like to integrate their platform with.
[Simon] Can tCRIL help facilitate the identification of what community contributions would be most valuable for everyone?
E.g. There was a time when edX devs thought ecommerce would be fully accepted by the community, so a lot of effort was put into documentation and feature work, but that turned out to be a faulty assumption - Open edX installations mostly did not want/need the ecommerce system.
Ned contends this was not really wasted effort, the payoff was just very far in the future and adopted by fewer Open edX installations than we thought at the time. But it still helped drive adoption.
[Ned] The attention from 2U/edX toward the community could wander as time goes on, and then the value of community contributions decreases/disappears?
Working with the community is an engineering tactic and strategy; there won’t be strong user feedback that indicates to 2U that we’re not working enough with the community.
The business value of working with the community can seem counterintuitive, e.g. working to help merge a contribution often doesn’t get us closer to releasing a given feature next week (or whenever).
There is at least one example (Raccoon Gang blended projects, coordinated by Adam S.) where community contributions are both timely and relevant/unblocking to active sprint work by a 2U scrum team (Enterprise Titans/Access).
Robert R. is reviewing a giant community DEPR PR (from Raccoon Gang) and is happy to see the work moving forward, even though it requires a non-trivial amount of his time.
(Do we just love Raccoon Gang?)
[Robert] Tobie brought up the idea of an open-source Member Organization as a way to provide funding. We don’t really have that, but tCRIL has decided to start funding some community projects (in addition to 2U/edX).
The tCRIL funding just comes out of the general tCRIL project.
We’re not sure how nonprofits can collect dues or whatever from potential members and what those dues could be used for.
Would it be better if there was just one big community fund?
Is it possible to allow donors to specifically allocate money toward blended development (other non-profit donations allow this type of thing, e.g. “Here’s money from the Undergraduate basket weaving program at U. State”)?
[Dave O.] Community contributions that help clean up or deprecate code bring real value to edX/2U/tCRIL, though they might not have a direct line to business value.
[Ned] OEP-55 (Project Maintainers): increase community rights, or reduce edX rights?
[Ned] (somewhat relatedly) people are chattering about changing edX deploys. Some sketchy notes here: https://openedx.atlassian.net/wiki/spaces/AC/pages/3343646983
Question about “by the end of the year, 2U should be treated like everyone else”. Do we know what this would look like? (also note this quote is from Tobie, whose statements on this matter are intentionally bold/provocative).
Couple of options
Are 2U engineers made to go through the same core-committer process as non-2U community members? Or is access to some repos restricted to only the core-committer process? Or something else.
The hard work is figuring out exactly how this could/should work.
[Andy] The language around “rights” is maybe too emotional? Same with “level playing field”.
Maybe “permissions” would be better
“Level playing field” might be too big of a lift to get to. As long as 2U is the largest Open edX installation/organization, it won’t be level.
It’s “asymmetric”
Part of the thing to fix is an emotional component; the rest of the community feels like a second-class citizen. This is important to acknowledge.
[Simon] Security patching process - we might need to hide community contributions that patch a vulnerability. Conversely, there are 2U things that need to be hidden/private from the community.
[Dave O.] Says the forbidden word (“fork”).
2022-04-27
[quest] (Jeremy) What are your top pain points related to Open edX configuration and settings? Do you have any suggestions for improvement in this area?
Lack of consistency
Settings file in repo
Remote config
YAML files generated for devstack & Tutor
OEP-45: Configuring and Operating Open edX (Provisional) (link to Configuration section)
Multiple override layers make it nearly impossible to keep the big picture in your head
Little/inconsistent documentation for each setting
[inform/quest] (Jeremy) We’re working on spinning up an Arbisoft squad for front end architecture and maintenance. If that pans out, what would be your top project nominations for them to work on?
[Simon] Somewhat concerned about interface between maintenance work and front end features impacted by it on owning teams
On demand model may work best, where they do more work on request than proactive maintenance
May be best to start with Front End WG requests
[inform](Andy) I’ve started working through how we might capture program intent (not enrollments, intent). Docs so far: the spec (by Spencer and me), the beginning of a technical doc on it, and Spencer’s approach doc, which still called this program enrollments.
[Quest](Simon) Why do we have Monthly arch standup, Arch hour, and Content theme arch group as a sub group? Has architecture grown organically to the point where we could use some top-level organization?
OEP-56 is trying to address the meta side of this question.
Can we please use that OEP to formulate a plan for this question?
Feels like too many cross-functional needs land in “well, maybe Arch-BOM will get around to it”
2022-04-20
[quest](Simon) input or reactions on How do we perform access control on Special Exams without edx-proctoring
Async feedback after the meeting is welcome, the doc’s a little long to read and process during the meeting
[Ned] Can we set guidelines for how to use our various communication channels?
(Simon) This feels like an attempt to impose control on an existing grassroots communication pattern
(Ned) Rationale is to make sure that people who want to stay informed about certain kinds of discussions/decisions don’t miss key communications
(Jeremy) Arch-BOM and Arbi-BOM use https://openedx.atlassian.net/wiki/spaces/AT/pages/2331836527
(Ned) Is there a set of announcement venues which, when all utilized to announce something, we can realistically assume will reach all developers?
True at least for all developers adequately paying attention, as long as we don’t flood the communication channels with irrelevant information
(Ned) Good point, we need to take care to preserve the signal/noise ratio in such channels
[analysis] (Jeremy) Does it sound reasonable to spin up a bunch of Blended Development projects, coordinated by different squads, to finish building all the MFEs we need? Then throw away all the edx-platform JS and spin up an Arbisoft squad to help with front end maintenance?
[SC] It’s a good practice to calculate the cost of delay on the risks and the scale.
[JB] The old JavaScript will also be a reason for poor employee engagement and could trigger employee retention risks.
2022-04-13
[Ned] (inform) there’s a thread in #openedx about GitHub wikis
Strong suspicion that it would just make things worse
Only 14 repos use it, most of them stale or almost empty
xblock-utils 2 pages 2016-01-12 Tim Krones: Improve formatting, spelling, and wording.
edx-analytics-pipeline 9 pages 2017-11-28 brianhw: Updated Tasks to Run to Update Insights (markdown)
edx-analytics-dashboard 2 pages 2015-12-22 Daniel Friedman: Updated WIP: OpenID Connect (markdown)
cs_comments_service 5 pages 2014-06-30 Trinh Nguyen: Updated Query or Delete comments data in mongodb (markdown)
edx-proctoring 2 pages 2015-12-16 chrisndodge: Updated Release Process (markdown)
xblock-sdk 1 pages 2016-05-13 Bui Trung Nghia: Initial Home page
openedx-demo-course 1 pages 2014-04-24 Luyang: Created Home (markdown)
configuration 19 pages 2020-03-11 Tim McCormack: repoint to devstack repo
edx-platform 56 pages 2022-03-25 Julia Eskew: Updated Opaque Keys (Locators) (markdown)
edx-app-android 2 pages 2015-03-25 LiuNaidi: Initial Home page
studio-frontend 1 pages 2018-03-06 Eric Fischer: formatting
edx-notifications 1 pages 2015-04-15 chrisndodge: Updated Home (markdown)
ux-pattern-library 6 pages 2019-12-13 genisys58: Created Styleguide: Sass & CSS (markdown)
edx-tools 2 pages 2017-10-03 Julia Eskew: Updated Home (markdown)
[David] (quest?) Where should we put arch diagrams so they will be discoverable and kept up to date?
Lucidcharts that aren’t specific to a particular repo
Feedback Requested: Open edX Documentation Restructure - Announcements
[SC](inform) Program Enrollments coming! The approach doc has a lot of support
2022-04-06
[Ned/David] we are looking for examples of discussions where you were uncertain whether you could share information from inside 2U to outside.
Came up in conversation with 2U privacy policy authors
Reconciling differences between said policy and historical edX behavior
Doc for 2U Privacy: 2U Privacy Policy and the Open edX Community
Grey areas:
Vendors and services we use
OGSPs
2U-edX convergence opportunities (software, capabilities, product, etc.)
What can we share with tCRIL that we can’t share publicly?
Examples:
Plans for integrating lines of business
Roadmap of future features
Discussing security
Plans to upgrade or not
Don’t talk about what type of database we’re using
“edX.org has a way of privately patching product before it releases things to the rest of the ecosystem”
Paragon and 2U design systems conversation
Contributions to Open edX public roadmap
Philosophical approaches:
“Tech is not the competitive advantage, the content is.”?
Data working group: what do we want to do for analytics for the platform? “Hey, my group in 2U is planning on investing a bunch of time in X” The fact that 2U is planning on spending time on that tells you something.
We don’t want to hamstring the partners who help us implement and use this functionality. At the same time, people who want to know what tech we use already have automated ways of finding out.
[Simon] Vendors: Vendor asking about one of our lines of business, LMS choices, etc. Should we have shared that?
[David] ^^^ Me, right now, not sure what to write down and what to keep back
Guideline Jennifer set up for us to follow - let’s not share anything outside of OCM without explicit acknowledgement from other lines of business
Within OCM, the line is easier to understand because we’re familiar with it
This conversation could very well have happened with a community member and not a vendor, too!
(quest) [Jeremy] What would you need to decide that an event bus is useful and ready to use in a project? Top 1-2 concerns, not full list.
Testability
Can I just inject events into the bus to avoid bringing up the other end of the thing?
How to know if an event fired?
Development cycle time
Good documentation:
How do I add an event?
How do I receive an event?
Directory of existing events
Observability
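A rough sketch of the testability asks above (injecting events without bringing up the other end, and checking whether an event fired). Everything here is hypothetical: `InMemoryEventBus`, its methods, and the `enrollment.created` event name are made up for illustration and are not the Open edX event bus API.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, List, Tuple

Handler = Callable[[Dict[str, Any]], None]


class InMemoryEventBus:
    """Hypothetical test double; NOT the real event bus implementation."""

    def __init__(self) -> None:
        self._handlers: Dict[str, List[Handler]] = defaultdict(list)
        # Every published event is recorded so tests can inspect it later.
        self.fired: List[Tuple[str, Dict[str, Any]]] = []

    def subscribe(self, event_type: str, handler: Handler) -> None:
        """Register a handler to be called for each event of this type."""
        self._handlers[event_type].append(handler)

    def publish(self, event_type: str, payload: Dict[str, Any]) -> None:
        """Record the event, then deliver it synchronously to subscribers."""
        self.fired.append((event_type, payload))
        for handler in self._handlers[event_type]:
            handler(payload)

    def was_fired(self, event_type: str) -> bool:
        """Answer 'did this event fire?' without a real broker."""
        return any(name == event_type for name, _ in self.fired)


# Usage: inject an event directly instead of running the producing service.
bus = InMemoryEventBus()
received: List[Dict[str, Any]] = []
bus.subscribe("enrollment.created", received.append)
bus.publish("enrollment.created", {"user": "alice"})
```

Delivery is synchronous here on purpose, which keeps test cycle time short; a real bus would be asynchronous and would need its own observability story.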
(quest) [Jeremy] If we had a cloud development environment, what are some of the things you’d need to see in it to find it useful?
[Andy] Push button startup
[Andy] Disposable environments, “Don’t know the state? Throw it away!”
[David] Minimally, I can auth against it so I don’t need edx-platform
[Chris D] From a developer perspective, what would be good about it? Are there any developers clamoring for this?
[Diane] A simpler, more understandable config setup
2022-03-30
[analysis](SC) I’d like to get feedback on Special Exams IDA design. It’s what Cosmonauts are planning to do starting April
Discussed terminology of “Special Exams”
What makes them “special”?
Timing constraints
Proctoring/monitoring
How much of implementation is shared with regular exams?
New IDA may utilize the event bus, but unsure of the details until more of the core IDA design is fleshed out
Discussed IDA vs plugin, how this is distinct from the standard LTI proctoring interface
Data synchronization across IDA boundaries may be painful
2022-03-23
[Ned] proposal in review for a pilot of narrowing 2U rights to repos
The idea is to limit write permissions to just the repos each 2U employee actually needs, not all of them
Trial for a single squad is being prepared
What if a developer suddenly needs access to a new repo? Can we do this quickly enough?
There’s currently a process with a SLA of 24 hours or less
Can we increase the pool of approvers for such requests to reduce the turnaround time?
“Core Admins”?
Will we lose “drive-by” improvements?
These are all public repos, it’s always possible to create a PR from a fork even before write permission is granted
There’s a proposal to use branch protection instead of write permission to enforce the restrictions
Solves some pain points around forking, but doesn’t offer as much visibility into the current state of access
Will this worsen the sense among some 2U developers that they’re detached from decisions about the direction of the platform and don’t know how to influence it?
Would an alternative to making things more symmetric be to give core committers membership in push-pull-all?
Yes, but that doesn’t help the secondary security goal of the proposal.
Could some people potentially keep full write access to most repos?
Would it be worth doing a study of past committers to a repo and seeing who the new permissions would knock out?
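One way the past-committers study could be sketched, assuming we already have per-repo sets of past committers (e.g. extracted from commit history) and the proposed per-repo write lists. The function name and the sample data are invented for illustration.

```python
from typing import Dict, Set


def committers_losing_access(
    past_committers: Dict[str, Set[str]],
    proposed_writers: Dict[str, Set[str]],
) -> Dict[str, Set[str]]:
    """Map each repo to past committers who would lack write access under the proposal."""
    result: Dict[str, Set[str]] = {}
    for repo, committers in past_committers.items():
        # Anyone who committed before but is not on the proposed list is "knocked out".
        lost = committers - proposed_writers.get(repo, set())
        if lost:
            result[repo] = lost
    return result


# Made-up sample data, not real usernames or real access lists.
past = {"edx-platform": {"ana", "bo", "cy"}, "configuration": {"bo"}}
proposed = {"edx-platform": {"ana"}, "configuration": {"bo"}}
knocked_out = committers_losing_access(past, proposed)
```

Repos where nobody loses access are omitted from the result, so an empty dict would mean the proposal knocks out no past committers.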
[SC] What is the current status of automated acceptance tests? Not only edx-platform
Context: Was dealing with acceptance tests on edx-analytics-dashboard
[Jeremy] Incident response has been working on this and is porting things over to Cypress, but we’re not sure where they are on this.
Are we maintaining these in general? They don’t seem to cover much usual functionality anymore.
Not much. We haven’t been getting much value from them relative to the maintenance burden.
We’re testing out consumer-based contract testing via Pact as a replacement for much of the value we had tried to get from acceptance tests
[KB] tCRIL proposal says “architecture WG is in the works”, can we map this to the OEP and discuss?
No news yet, this has been waiting on OEP-56 finalization.
Not because that’s a blocker, but because WIP is high and we’re trying to focus more on one thing at a time
[KB] I heard that tCRIL is doing some sort of tool to have broader insight to Github repos (called Amore?) and I want to go to there.
GrimoireLab - it’s currently broken; Ed is working on setting up a hosted instance of it
Aggregates data about a bunch of repos and lets you build dashboards about it via Elasticsearch
Not thrilled with the state of the package, some overlap with Repo Health Dashboard functionality
We’re not too happy with the state of Elasticsearch licensing and hosting at the moment…
We need a better front end for the repo health dashboard, there’s a ticket with ideas for this: https://openedx.atlassian.net/browse/BOM-2145
If anybody wants to talk about this or start working on it, please get in touch with Jeremy Bowman
Maybe Backstage could be part of this
[PS] [ideation] When to break out a new Django app?
2022-03-16
[Ned] Slack shared channel policy: why did we require them to be private?
Because it used to be much harder to distinguish channels that had external people in them. Slack has since fixed this.
[Ned] Introductions, we have a new person
[David] (maybe a 2U topic, oops) Arch Hours - there’s a Content Arch WG that has one too, and interest in broader 2U. Is this perhaps the Open edX Arch Hour?
Essentially, try to bring up topics of broad interest to the Open edX community here and ones that are likely to touch on 2U proprietary information in the other venues that only have 2U employees in them
Arch Hour representation of teams? I.e., “representative democracy”
Reminder: we try not to make decisions in these meetings, because not all stakeholders can reliably attend. More of a forum for questions and initial feedback on ideas before they go out for broader discussion and review.
OEP-56 tries to establish a process for decision making that gives all stakeholders a reasonable opportunity to participate
We have existing change broadcast/feedback mechanisms for OEPs, DEPR WG, and Architecture squads
[inform](Bernard) New to company and checking out forum
[inform](Feanil) Dave is doing a lot of open thinking around the learning core
[inform](Feanil) Consider watching the announcements space on Open edX Discourse
[Jeremy] We have a lot of GitHub Project boards now, in different places: edx & openedx orgs, org and repo level in each. We should make sure they consistently have descriptions, and some kind of index of them.
[inform](Feanil) docs.openedx.org design coming soon
2022-03-09
[analysis] (Jeremy B) We have a Confluence page for major Open edX upgrade projects at https://openedx.atlassian.net/wiki/spaces/AC/pages/1165395730 , but it rarely gets updated. Would a GitHub Projects board work better, with a separate issue for each major upgrade (past, present, or future)?
Maybe put issues in https://github.com/openedx/public-engineering , and create a new board for it? Jeremy will discuss it with tCRIL.
[inform] (Robert) Event bus work is at-risk of being de-prioritized. If you have a strong need, please reach out to discuss.
https://openedx.atlassian.net/wiki/spaces/AT/pages/3196321805/Event+Bus+Use+Cases
Question: What work would be prioritized over the event bus work?
Still being identified. We’re starting a process to enumerate and prioritize projects for Arch-BOM and Arbi-BOM.
[quest] (Ned) What’s the status of OEP-56?
[inform] (Jeremy B) I created an #npm-releases Slack channel
Is this useful?
What should be subscribed to?
Better docs and broader announcement forthcoming.
[inform] (Adam B) edX SRE would like to be able to turn off Build Jenkins "soon". This is possible thanks to the awesome work Arbi-BOM has done to migrate jobs to GHA. If you know of a job that doesn't yet have a migration plan, please reach out.
2022-03-02
Quick Introductions: Simon and Danielle
[quest] (Jeremy) 2U’s internal Slack has a #pypi-releases that (currently imperfectly) tracks releases of our dependencies from PyPI. Is this worth enhancing and/or announcing more broadly? Would the Open edX community benefit from some kind of equivalent that only includes dependencies of Open edX?
Split into 2 different channels? Core and other
[quest] (Feanil) So many OEPs in flight, are people getting time to look at them?
It’s hard to know if the OEP is ready for review.
Some Suggestions
Mark things in GitHub Draft State
Ensure we have a clear review period after it’s left draft state.
Make the review period clearer and more explicit for updates to existing OEPs, of which there have been many.
Can we split out #oep-notifications? That’d help signal-to-noise.
Make it clear when changes are minor.
Ways to Follow OEPs
[inform/question] (Dave) I’ve been sketching out more of the Learning Core proposal that I had mentioned a while back. What’s the best way to get feedback/thoughts/input/etc. on this, particularly from 2U?
Robert: Want to see: here are the use cases where I think people are going to be happier. Here’s the added pain you might feel because of this change. (in any ADRs, forum posts, etc.)
David Joy: How much will people need to change how they work?
David Joy: Bigger than ADRs
Andy: Need to find a champion inside 2U
[Q] (Ben W) The new meeting time coincides with the paragon working group… is there any other time that might work for this meeting, to allow people to attend both?
Jeremy: Options are very limited, but I’ll see if there are any practical alternatives
2022-02-23
[inform] (Jeremy) Arch-BOM is in the process of clarifying its mission and role. There are likely to be some minor shifts in recommended ways to interact with the team as a result.
Arch-BOM is: Jeremy, David Joy, Becca Graber, Robert Raposa, Diana Huang, Tim McCormack, Ned Batchelder
[inform] (David) OEP-56: Arch Process has been added as a PR, but is still in Draft state/in flux:
[inform] (Ned) I’ve been hacking on a GitHub daily digest. Example: 3 Days of activity in three repos of interest to me.
[quest] (Jeremy) How should we go about determining if, say, MySQL full text search is “performant enough” to replace Elasticsearch for our usage?
How much of this performance testing can/should be done up front?
Log existing search queries and pipe them through the suggested replacement
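A minimal sketch of the "log and replay" idea above: feed logged queries through a candidate search function and summarize latency. `replay_queries` and `fake_search` are hypothetical names; `fake_search` only stands in for whatever the real candidate backend call would be, and wall-clock timing in a script like this is no substitute for a realistic load test.

```python
import statistics
import time
from typing import Callable, Dict, Iterable, List


def replay_queries(
    queries: Iterable[str],
    search: Callable[[str], List[str]],
) -> Dict[str, float]:
    """Run each logged query through `search` and summarize latency in milliseconds."""
    latencies: List[float] = []
    for query in queries:
        start = time.perf_counter()
        search(query)
        latencies.append((time.perf_counter() - start) * 1000.0)
    if not latencies:
        return {"count": 0, "median_ms": 0.0, "max_ms": 0.0}
    return {
        "count": len(latencies),
        "median_ms": statistics.median(latencies),
        "max_ms": max(latencies),
    }


def fake_search(query: str) -> List[str]:
    """Stand-in for the candidate backend; a real test would query the actual service."""
    corpus = ["intro to python", "linear algebra", "python for data science"]
    return [title for title in corpus if query.lower() in title]


# In practice the queries would come from production search logs.
stats = replay_queries(["python", "algebra"], fake_search)
```

Running the same query log against both the current and the candidate backend would give comparable numbers, and comparing the returned results would show relevance differences as well as speed.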
[analysis] (David) Mermaid in Markdown on GitHub - straw that broke the documentation OEP’s back?
2022-02-16
[Inform] (Simon Chen) Course application architecture diagram as part of Content theme architecture vision
*[Inform] (Feanil) Documentation work for Open edX
Is the OEP the canonical entry point for getting more info on the initiative? (or is there another one…)
https://github.com/openedx/tcril-engineering/issues/89 - Overall Epic
https://github.com/openedx/tcril-engineering/issues/103 - Persona Work
https://github.com/openedx/tcril-engineering/issues/104 - Document Audit
**[Inform] (Ned) next-up open source concerns: managing risk to edX.org deploys, privacy policies
[Inform] (Ned) I hacked together a GitHub “daily digest” thingy, and am interested if others would find it useful. Example digest: https://nedbat.github.io/graphql-learn/example.html
***[Ideation] (Feanil) Conference talks you can give at the Open edX Conference
Conference is 4/26-4/29
Talks are due shortly (2/20)
Need more proposals
Proposed pre-recorded talks are ok if you can’t physically go
Don’t need to worry too much about being too technical or not technical enough
Topic suggestions:
Is there a Paragon talk already?
Submit your talk, if there is overlap we may ask you to work with others in the community.
Event bus (some combination of Arch-BOM folk) +1
Shifting away from the modulestore in LMS (Julia)
Learning sequences
Blockstore
Etc.
MFE state of the world in Open edX platform (Julia)
Which ones exist?
Which are planned?
Content authoring editors: Make it React! (Julia)
Text editor re-write in Studio
Video/problem editor re-writes planned
How to re-write your editor in React
Code maintenance automation and effort distribution (Jeremy and/or someone from Arbi-BOM?)
ID Verification and its Future (Simon)
Tutor, Devstack, and the future of Open edX Development (Julia)
Lessons from INCR and other distributed upgrade efforts (Jeremy)
**[ideation] (Jeremy) How can we best maintain and get momentum on a pipeline of Open edX development tasks that anybody in the community could work on?
Prior art: INCR tickets, Django upgrades
Open edX Roadmap for big-ish things
What are the hooks for people to get involved?
Badges on Discourse
Potentially good for:
Upgrades
Fixing linter/interpreter warnings
Removing deprecated stuff
Maybe some kind of working group for helping new developers get involved in the project?
Existing working groups could curate appropriate tasks for the pipeline
GitHub Project for good starter tickets? Maybe another one for slightly more complex tasks?
[quest] (Waheed) Some of the edx/openedx owned packages are not released for years, what is the process to push a new version to NPM registry for them? E.g. https://github.com/openedx/stylelint-config-edx
Semantic-release is not configured for this repo
ACTION: Added the release workflow and published a new version successfully
2022-02-09
**[David Joy] Review Arch Forum Proposal
Draft/WIP: OCM/Open edX Architectural Process
Lots of discussion, but we did very poorly at taking notes. Oops.
**[Jeremy Bowman] New Working Groups, making WGs work better
2022-02-02
**** [Julia] The future of devstack - will we be examining Tutor to move to remote-based containers for development?
[Diana] Short answer is yes, but we should also chat about it
Context: https://github.com/openedx/devstack is the development environment that most OCM engineers use. There’s an alternative created by the community called Tutor, which we’re generally interested in transitioning to
Régis: Creator of Tutor, an alternative development (and deployment) tool for the Open edX platform.
Tutor talk: https://www.youtube.com/watch?v=_7GVNkEQ3qU
tCRIL: The new non-profit that is managing the Open edX platform.
Analysis (slightly outdated) of Tutor usage as replacement for devstack with edX use cases in mind: https://openedx.atlassian.net/wiki/spaces/AC/pages/2451276919
[Kelly] Do we have a way of spinning up a remote environment for messing around?
[Diana] Sandboxes do exist, even if they are not great for development: https://openedx.atlassian.net/wiki/pages/createpage.action?spaceKey=SRE&title=Sandboxes
***[Jeremy] Do we want to move this meeting a little earlier in the day? Arbisoft people struggle to make this time slot, probably not the greatest for Cape Town or Open edX star-people in Europe either.
Yes, let’s try for a 9am time slot (10 conflicts with too many team meetings). May or may not involve moving to a different day of the week.
** [Julia] The authN MFE now starts up with the LMS on devstack. Is this a change in thinking? The learning/courseware MFE still doesn’t start up with the LMS.
Julia will follow up with that team.
2022-01-26
[quest] (collective) What’s up with Slack vs. Google Chats at 2U?
Slack was in use first, but security concerns prompted a migration to Google Chat
But there was strong resistance from TechDev
Current status: TechDev is still on Slack, pretty much everybody else switched to Google Chat
******** [Quest] (Ben W) How do we integrate 2U Guilds with the concept of edX working groups?
There’s an enumeration of OCM/edX working groups (or rather, 2 enumerations: public & private)
Guilds mostly came into being last year (1-2 started earlier)
FWIW, roughly the same with the formalization of working groups at edX
2U Guilds are mostly communities of practice, rather than sources of tasks to be done
https://2universe.2u.com/departments/tech/eec~2/engagementandculture/new_wiki (not sure why architecture doesn’t have frequency/channel, but it’s #guild-architecture, and monthly. Feel free to join!)
OCM/edX Working Groups tend to be more active, communities of practice tend to be done more as Slack channels and study groups
Currently, 2U cross-team initiatives are more ad-hoc
Vibram is the paragon-equivalent team in 2U
Helpful representatives to meet from edX:
Ben Warzeski
David Joy - Frontend Arch lead
Adam Stankiewicz - Paragon Tech Lead
2U Guilds are in Slack as #guild-* channels
Ruby is dead
Key point for pre-edX 2U folk - OCM/edX has both private and very public working groups, due to close ties to the open source Open edX platform
***** [ideation] (Ben W) Onboarding Working Group - Charter written and ready for potential review and seeking other people interested in contributing to/joining
https://openedx.atlassian.net/wiki/spaces/COMM/pages/3313993290
Plan is to focus primarily on the quality and accuracy of documentation
A key point will be separating what’s generic to Open edX and what’s specific to OCM/edX
Does the working group need a sponsor in management?
Not necessarily, but it might help
Ask managers of people planning to join the Working Group
**** [analysis] (Jeremy) Can we reduce the number of big things we use that have a tricky upgrade treadmill?
Node.js? Could we use Rust/Go things like Next.js instead?
Adam: Could this relate to that Nextblock project?
vs npm <-> yarn?
Adam: I like yarn ideas
Part of our node upgrade pain points come from the fact we didn’t pin the version everywhere and that SRE doesn’t rememb
https://nextjs.org/blog/next-12 (new Next.js release bragging about Rust utilization)
Ruby & Mongo - how do we sunset the existing forums implementation?
Elasticsearch - there were conversations recently about dropping this
(Andy) Cosmonauts is trying to kill elasticsearch in analytics at least, mostly by killing the whole feature that uses it but if needed replacing it https://openedx.atlassian.net/wiki/spaces/~mroytman/pages/3230957649/MST-1047+Is+Elasticsearch+inescapable
https://openedx.atlassian.net/wiki/spaces/COMM/pages/1678639164
https://discuss.openedx.org/t/deprecation-removal-depr-170-move-from-elasticsearch-to-opensearch/5844 (conversation veered towards “can we just stop using Elasticsearch instead?”)
[quest] (David) How can we start getting involved in the 2U Architecture Guild?
Join #guild-architecture in Slack
* [inform] Introductions!
Hi, nice to meet you! Tell us about yourself!
David Joy, Principal engineer on the Arch team at OCM/edX
Ned Batchelder (edX/OCM), open-source logistics mostly, also Python foundations
Jeremy Bowman, Engineering Manager for the Arch-BOM and Arbi-BOM architecture squads at OCM/edX
Ben Warzeski - edX - Aurora Content Theme Frontend Engineer
Obsessed w/ React, testing, and process
Julia Eskew - Principal SW Dev on the Teaching & Learning team (edX)
Chris Deery, Principal Engineer on the Engage team
Andrew Thal, Software (Enterprise?) Architect @ 2U (global)
Andy Shultz, Principal Engineer on Cosmonauts @ 2U/edX
Raymond Zhou, Software Engineer on edX Teaching & Learning
Robert Raposa, Principal Engineer on Arch-BOM @ 2U/edX
Danielle Eriksen, Senior Software Engineer @ 2U, Degrees Vertical
Diana Huang - Senior Software Engineer on Arch-BOM (edX/OCM)
2022-01-19
[quest] (Binod) Enterprise team is looking to best leverage feature toggle abilities to do more flexible (per customer or per subscription) toggles of features. A lot of these are done via config models right now. I wanted to get a base understanding of what best practices are in this area (reading edx-toggles gave me words like: Waffle, Django setting, SettingToggle). Also, want to brainstorm on how frontends can best leverage these settings. Prior art? link to edx-toggles
Support for toggles per customer (enterprise)
Customer = a whole enterprise
Recently a PR to add an organization-level override for a waffle flag was created
Distributed toggle state, or single source of truth?
What we have today is de facto distributed toggle state, which isn’t terrible.
https://open-edx-proposals.readthedocs.io/en/latest/best-practices/oep-0017-bp-feature-toggles.html
https://waffle.readthedocs.io/en/stable/usage/javascript.html
We’ve investigated at some point, but it’s not used in production
For Frontends: I believe an existing pain point is: deploy env var via edx-internal then redeploy UI (is this accurate?) (Binod)
Using edx-internal environment variables for MFEs is very simple but not nearly as powerful as our waffle flags. When we need the ability to filter the flag by some criteria, we pass a waffle flag up from the backend in some API used by the MFE.
Waffle + MFE also in gradebook MFE
Optimizely is also used in frontend-app-learning
Very much draft/outline Micro-frontend plugins OEP: https://github.com/openedx/open-edx-proposals/pull/287/files
https://open-edx-proposals.readthedocs.io/en/latest/best-practices/oep-0017-bp-feature-toggles.html#distributed-configuration : distributed toggles ref
Enterprise engineering is also looking more at self-service admin controls for features (for admins to use)
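To make the per-customer idea concrete, here is a small sketch of override resolution: a customer-level override wins if present, otherwise the global default applies, and the backend hands the resolved values to a frontend in an API response. `ToggleStore`, its fields, and the flag and customer names are all hypothetical; this is not the edx-toggles or django-waffle API.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class ToggleStore:
    """Hypothetical in-memory store; NOT the edx-toggles or django-waffle API."""

    defaults: Dict[str, bool] = field(default_factory=dict)
    customer_overrides: Dict[str, Dict[str, bool]] = field(default_factory=dict)

    def is_enabled(self, flag: str, customer: Optional[str] = None) -> bool:
        # A per-customer override wins when one exists...
        if customer is not None:
            override = self.customer_overrides.get(customer, {}).get(flag)
            if override is not None:
                return override
        # ...otherwise fall back to the global default (off if the flag is unknown).
        return self.defaults.get(flag, False)

    def flags_for_frontend(self, customer: str) -> Dict[str, bool]:
        """Resolved flags a backend API could include in a response to an MFE."""
        return {flag: self.is_enabled(flag, customer) for flag in sorted(self.defaults)}


# Made-up flags and customers for illustration.
store = ToggleStore(
    defaults={"new_dashboard": False, "reports": True},
    customer_overrides={"acme": {"new_dashboard": True}},
)
acme_flags = store.flags_for_frontend("acme")
```

Resolving on the backend keeps the frontend free of toggle logic, at the cost of an extra field in whichever API the MFE already calls.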
[quest] (Alex D.) Would dev teams prefer to have a discrete Redis elasticache instance for each service they own (that requires redis)? Today, there is one redis instance that is shared by every redis-requiring service. What would the benefits be?
+ [Andy] (WG followup for next time, writing here so I don’t forget) working group / community component as individual heroism vs. team process
https://docs.google.com/document/d/1yaqJyIEkmeuq7Q9-LK9QumrUhRP_CLBa-t5qeHH9s_k/edit
Accountability / Visibility
Prioritization rubric
SWG does a good job of this
Sub-topic: addressing the asynchronous problem… how do we work when we're not together? Many working groups struggle with this, even well-functioning ones like BTR.
30/60/90s - vehicle for getting people involved in WGs?
We should add the list of working groups and a request to get involved in one to the onboarding docs
The problem may not be pressure to focus on sprint tickets, but lack of pressure to work on working group activities
Having explicit small tickets that can be put on sprint boards may help; try to ticket more working group tasks
2022-01-12
[Ned] Rollback vs fix-forward: how to decide?
A rollback is just one button, but it used to be very involved.
We usually recommend a rollback unless there’s a known backwards-incompatible database migration
Have people been bitten by a rollback that then caused more trouble through data problems?
If we aren’t familiar with the commits, then a rollback might be scary because you don’t know if there’s a data point-of-no-return
One technique: pre-prep a revert PR for scary changes
“Rollbacks are scary, they can be seen as stepping on people’s toes, and let’s not forget it’s kinda like publicly admitting fault and even though no one’s gonna shame you, it’s still slightly embarrassing”
Put another way, the rollback may be more _visibly_ disruptive to us, as it screws up everyone else too. We’re not having the customer problem thrown in our face, but we see our impact on all our coworkers. And a revert/fix forward feels less invasive... if you can do it fast enough.
First impulse: debug the problem.
Rollback will involve other people’s commits
You might be removing a new feature
Other services might need rollbacks to match
After a rollback, doesn’t the entire release train have to be paused so we don’t roll back the rollback?
Yeah, so a common pattern is rollback first, investigate, revert the problem, unblock the train, fix the problem for real.
Rollbacks don’t roll back migrations automatically; that’s an additional step.
Doesn’t this make rollbacks more dangerous?
We have a whole set of guidelines for dropping a column in stages: https://openedx.atlassian.net/wiki/spaces/AC/pages/23003228#EverythingAboutDatabaseMigrations-Howtodropacolumn
Jeremy’s anecdote: outages lasting an hour due to trying to fix forward are more common (~10) than rollbacks that go badly (~2)
Suggested action items:
Eng All Hands presentation on best practices around rollbacks and fixing forwards
Repeat rollback training session and/or promote documents and video from the last one
Remind people again of the “Everything About Database Migrations” Confluence page
[Ned] How to encourage devs to use their community effort? The edX career pathways now include a Community component. Are people doing this? Can we raise the visibility?
Many people are participating, but they have trouble tracking how much time they spend
20hr/mo came from the Core Contributor guidelines
Having community work in trackable (Jira) tickets can help
But working groups are public, and Jira isn’t good there
Is there a list of possible community work?
Working groups (both internal and external)
ERGs
Discuss/Slack activity
External pull requests
WG vs Guild vs Tiger Team
2022-01-05
[quest] (Jeremy) Outreach to the rest of 2U regarding architecture - what do we want to learn from them, what do we want them to know about us?
Pass along architecture onboarding deck
Clarify that we’re cleaning up a ball of mud
What existing meetings and rituals are there?
Who are the players? Is it split up by product lines?
How do they interact with their product team?
What other guilds/working groups exist?
Do they have common infrastructure between product lines/silos?
System diagrams?
What teams exist that are working on architecture problems?
Put together our own version of all the above to share
Time to update our Arch Onboarding deck?
Security policies/oversight
How do you do ownership amongst engineering teams?
How do you get requirements?
QA perspectives and philosophies
How UX/UI design teams interface with engineering
[quest] (Julia) As our infra/processes shift to tCRIL/2U-based, what team is responsible for updating our Developer Onboarding Docs?
Engineering managers are starting to make ad hoc changes as we sort out new onboarding procedures
Arch-BOM is starting to think about what we should do next in this area, but there aren’t concrete plans yet
We should quite possibly have more formal ownership of docs, aligned with code and service ownership.
Perhaps encourage developers to leave a wiki comment when encountering incorrect information? “This isn’t correct - I don’t know how to fix it but it should be fixed.”
Should we present a few bullet points at an Eng All Hands or such about how to enable developers to better maintain docs?
Onboarding working group?
Julia says: Old onboarding WG (6-7 years ago) used to interview developers a few months after onboarding to find out how things were going
One idea - have most new developers join this WG by default until they find another one they’d rather join instead
Working groups:
Are members other than group leads struggling to allocate time to them?
Anecdotally, yes.
Rotating leads helps avoid burnout, but doesn’t increase the group’s development bandwidth at any given point in time
Should tickets labelled as WG work be added to team boards, with an issue raised if the percentage of WG tickets on the board and/or completed in a sprint is too low?
Should we have a “tiger team” to set up our working group processes to work correctly? (Short term “working group working group”.)
Security working group is working well, but may or may not be a good model for other groups because it’s concretely different in some respects.
Jeremy B to work on getting managerial buy-in on setting up a functional process for working groups.
Who wants to be involved:
Andy, David J, Jeremy B