Arch Hour: 2020-12-17
Topics
External Guests in 2021 - In the past, we’ve invited special guests from other orgs, including a QA expert and an MFE expert. What topics/specialties would folks be interested in for guests next year? (Guests could also come from another edX-internal function or from the community.)
Early thoughts on approaches to splitting LMS and Studio (link) +1 +1 +1 +1
CMS and LMS share code and a common database
A few approaches to separate
Approach 1: Duplicate code as a first step → kill unneeded code → evolve duplicated code separately
Approach 1a: Duplicate the code, continue to share the database (at least initially)
Approach 1b: Duplicate the code, duplicate the database
Approach 2: Extract code
Approach 2a: Extract common utility code
Approach 2b: Extract features
Approach 3: Install edx-platform as a library
LMS and CMS are Django projects, installing apps from elsewhere
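A minimal sketch of what Approach 3 could look like, assuming the shared code is packaged so that each Django project lists the apps it needs in its own settings. The package name `edx_platform_lib` and every app label under it are hypothetical, chosen for illustration only; which apps would end up shared, LMS-only, or CMS-only was not decided in this discussion.

```python
from __future__ import annotations

# Sketch only: "edx_platform_lib" and the app labels under it are hypothetical.
SHARED_APPS = [
    "django.contrib.auth",
    "django.contrib.contenttypes",
    "edx_platform_lib.course_overviews",
]
LMS_ONLY_APPS = ["edx_platform_lib.courseware"]
CMS_ONLY_APPS = ["edx_platform_lib.contentstore"]

APPS_BY_PROJECT = {
    "lms": SHARED_APPS + LMS_ONLY_APPS,
    "cms": SHARED_APPS + CMS_ONLY_APPS,
}


def installed_apps(project: str) -> list[str]:
    """Return the INSTALLED_APPS list a project's settings module would use."""
    try:
        return list(APPS_BY_PROJECT[project])
    except KeyError:
        raise ValueError(f"unknown project: {project!r}") from None
```

Each project's settings module would then set `INSTALLED_APPS = installed_apps("lms")` or `installed_apps("cms")`, so the two projects could diverge without duplicating the shared code.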
Open Questions
Would we split the workers at some point? Would we want to pull codejail into a separate service?
Would there ever be certain apps/features that are only for LMS or only for CMS?
Dimensions to consider
ownership
coupling & cohesion
database/storage
event-driven - makes it easier to duplicate data between LMS and CMS
access control - allows
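To make the event-driven dimension concrete, here is a minimal in-process sketch in which CMS publishes an event and LMS keeps its own copy of the data instead of reading a shared table. The `EventBus` class, the `course_published` event name, and the payload fields are all assumptions for illustration; a real deployment would use a message broker rather than an in-memory dictionary.

```python
from collections import defaultdict
from typing import Callable, Dict, List


class EventBus:
    """Tiny in-memory stand-in for a message broker (hypothetical)."""

    def __init__(self) -> None:
        self._handlers: Dict[str, List[Callable[[dict], None]]] = defaultdict(list)

    def subscribe(self, event_type: str, handler: Callable[[dict], None]) -> None:
        self._handlers[event_type].append(handler)

    def publish(self, event_type: str, payload: dict) -> None:
        for handler in self._handlers[event_type]:
            handler(dict(payload))  # each subscriber gets its own shallow copy


# LMS-side store: a duplicate of the course data LMS needs, owned by LMS.
lms_course_titles: Dict[str, str] = {}


def lms_on_course_published(payload: dict) -> None:
    """Update the LMS-owned copy whenever CMS announces a publish."""
    lms_course_titles[payload["course_id"]] = payload["title"]


bus = EventBus()
bus.subscribe("course_published", lms_on_course_published)

# CMS side: emit an event after an author publishes a course.
bus.publish("course_published", {"course_id": "course-v1:demo", "title": "Demo Course"})
```

The point of the sketch is the direction of the dependency: CMS does not know who consumes the event, and LMS never queries CMS storage directly.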
Best Practices for having frontend code fail gracefully when backend APIs go down +2
When we upgrade backend services, the frontend also fails
ex: admin portal
portal.edx.org
https://github.com/edx/frontend-app-admin-portal/blob/master/src/components/TableComponent/index.jsx#L105-L114
ex: When SRE & Purchase upgraded ecommerce to avoid using an EOL database they decided it was ok that payment showed errors for a bit (partially because it was hard to fix)
It’s currently inconsistent - only available if the MFE chose to handle it
From frontend-platform’s perspective, it raises the error up to the caller
Can there be a toolbox for FE developers to choose from to handle graceful degradation?
Error page
Spinner
Banner
Message: "Oops, an error has occurred, please check our <status page> or reach out to <support> if you continue to see this message"
How to prioritize this effort?
Options
Have the default in frontend-platform be opinionated and fail strongly - the squad can then override ("How do we make the right way the easy way")
Don’t do the upgrade for the squad until the squad has demonstrated they are aware of this issue or have addressed it
Currently it defaults to an error page with the message: “An unexpected error occurred. Please click the button below to refresh the page.”
ACTION @Adam Blackwell and @Nimisha Asthagiri to see what we can do with @Stacey Messier and @Adam Butterworth on improving the UX/UI of our default error page.
How to make it easy to add consistent alerting logic for MFEs
On call best practices and discoverability
Where and how should we be building Docker images to best suit the needs of engineers? +3
Current Build locations/paradigms
Jenkins jobs w/ Groovy in jenkins-job-dsl that call shell scripts and push to ECR or DockerHub
Travis jobs that push to DockerHub
GitHub Actions defined in service repos that use make commands to push to ECR or DockerHub (run on either private runners or GitHub runners)
Notes Example:
1: GitHub Action triggered on push to master: https://github.com/edx/edx-notes-api/blob/master/.github/workflows/push-docker-image.yml#L21-L29
2: Calls shell script: https://github.com/edx/edx-notes-api/blob/master/.github/workflows/deployment_prs.sh#L24
3: Calls make docker_build: https://github.com/edx/edx-notes-api/blob/master/Makefile#L77-L92
4: Gets pushed to just DockerHub currently (but we want to push private security fixes to ECR to keep them private until they are deployed)
GoCD jobs that use tubular scripts
https://github.com/edx/edx-notes-api/blob/master/.github/workflows/push-docker-image.yml#L21-L29
Other repos - Amazon ECR
There’s a POC on publishing to a private DockerHub
As part of this, GitHub Actions → GoCD
Note: could use GitHub Flow
Where?
ECR - for privacy
DockerHub - for community
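The ECR/DockerHub split above could be expressed as a small routing rule in whichever CI tool ends up building the images. This is a sketch only: the function name, the `from_private_security_fork` flag, the placeholder ECR host, and the `example-org` DockerHub namespace are assumptions, not part of any existing pipeline.

```python
# Placeholders, not real endpoints: substitute the actual account, region, and org.
ECR_HOST = "ACCOUNT_ID.dkr.ecr.REGION.amazonaws.com"
DOCKERHUB_ORG = "example-org"


def image_ref(service: str, tag: str, from_private_security_fork: bool) -> str:
    """Pick the registry for a build.

    Private security fixes go to ECR so they stay private until deployed;
    everything else goes to DockerHub for the community.
    """
    if from_private_security_fork:
        return f"{ECR_HOST}/{service}:{tag}"
    return f"{DOCKERHUB_ORG}/{service}:{tag}"
```

Keeping the rule in one function (or one make target) rather than in each CI definition would also make it cheaper to move between CI tools later.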
How?
Options
GitHub Actions
Jenkins
GoCD
Travis - legacy
Tradeoffs
Number of repos that code lives in
Long-term maintenance (by SRE or teams who own services)
Pipeline features
GoCD has familiar features that allow us to do things like build images off of private security forks currently which may be hard in
GitHub Actions has GitHub Flow:
Speed
Time to test.
Time to deploy to stage/prod
Ability to cache things
May want to eventually use Kaniko to cache things.
Caching may sometimes be bad.
What’s next?
Make sure we can easily move between CI tools if needed
Start thread in #architecture to get more visibility / input.
Status Page(s) Best Practices
Maintenance Banner Best Practices
Things we don’t know about that will make containerizing edxapp hard +1
Backlog of Questions/Discussions
This section lists a backlog of previously proposed topics that haven’t yet been discussed.
Recent MFE docs
Requirements for public repos (archived) - how will these requirements be reinforced?