Arch Hour: 2020-12-17

Topics

Please enter your proposed topics for discussion here.
In Lean Coffee style (link1, link2), we will vote on which topics the group wants to discuss and time-box each discussion to 10 or 15 minutes → 5 minutes (if re-voted) → 5 minutes (if re-voted).

  • External Guests in 2021 - In the past, we’ve invited special guests from other orgs, including a QA expert and an MFE expert. Which topics or specialties would folks like guests to cover next year? (Guests could also come from another edX-internal function or from the community.)

  • Early thoughts on approaches to splitting LMS and Studio (link) +4

    • CMS and LMS share code and a common database

    • A few approaches to separating them

      • Approach 1: Duplicate code as a first step → kill unneeded code → evolve duplicated code separately

        • Approach 1a: Duplicate the code, continue to share the database (at least initially)

        • Approach 1b: Duplicate the code, duplicate the database

      • Approach 2: Extract code

        • Approach 2a: Extract common utility code

        • Approach 2b: Extract features

      • Approach 3: Install edx-platform as a library

        • LMS and CMS would be separate Django projects that install their apps from elsewhere

    • Open Questions

      • Would we split the workers at some point? Would we want to pull codejail into a separate service?

      • Would there ever be certain apps/features that are only for LMS or only for CMS?

        • Dimensions to consider

          • ownership

          • coupling & cohesion

          • database/storage

        • Event-driven: makes it easier to duplicate data between LMS and CMS

        • access control - allows

  • Best Practices for having frontend code fail gracefully when backend APIs go down +2

    • When we upgrade backend services, the frontend also fails

    • Example: the admin portal (portal.edx.org), https://github.com/edx/frontend-app-admin-portal/blob/master/src/components/TableComponent/index.jsx#L105-L114

    • Example: when SRE and Purchase upgraded ecommerce to move off an end-of-life (EOL) database, they decided it was acceptable for payment to show errors for a while (partly because this was hard to fix)

    • Handling is currently inconsistent: graceful degradation exists only where an MFE chose to implement it

    • frontend-platform itself raises the error up to the caller

    • Could there be a toolbox of graceful-degradation options for frontend developers to choose from?

      • Error page

      • Spinner

      • Banner

      • Message: "Oops, an error has occurred, please check our <status page> or reach out to <support> if you continue to see this message"

    • How to prioritize this effort?

      • Options

        • Make the default in frontend-platform opinionated and fail strongly; squads can then override it ("How do we make the right way the easy way?")

        • Don’t perform the upgrade for a squad until the squad has shown it is aware of the issue or has addressed it

    • Currently it defaults to an error page with the message: “An unexpected error occurred. Please click the button below to refresh the page.”

    • ACTION @Adam Blackwell (Deactivated) and @Nimisha Asthagiri (Deactivated) to see what we can do with @Stacey Messier (Deactivated) and @Adam Butterworth (Deactivated) on improving the UX/UI of our default error page.

  • How to make it easy to add consistent alerting logic for MFEs

  • On-call best practices and discoverability

  • Where and how should we build Docker images to best suit engineers’ needs? +3

    • Current build locations/paradigms

      • edx-notes-api: a GitHub Actions workflow pushes the image, https://github.com/edx/edx-notes-api/blob/master/.github/workflows/push-docker-image.yml#L21-L29

      • Other repos: Amazon ECR

    • There’s a proof of concept (POC) for publishing to a private DockerHub

      • As part of this, GitHub Actions → GoCD

    • Note: could use GitHub Flow

    • Where?

      • ECR - for privacy

      • DockerHub - for community

    • How?

      • Options

        • GitHub Actions

        • Jenkins

        • GoCD

        • Travis - legacy

      • Tradeoffs

        • Number of repos the code lives in

        • Long-term maintenance (by SRE or teams who own services)

        • Pipeline features

          • GoCD has familiar features that currently let us do things like build images from private security forks, which may be hard in

          • GitHub Actions has GitHub Flow:

        • Speed

          • Time to test.

          • Time to deploy to stage/prod

        • Ability to cache things

          • We may eventually want to use Kaniko for caching.

          • Caching is not always desirable.

    • What’s next?

      • Make sure we can easily move between CI tools if needed

      • Start thread in #architecture to get more visibility / input.

  • Status Page(s) Best Practices

  • Maintenance Banner Best Practices

  • Things we don’t know about that will make containerizing edxapp hard +1
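
The graceful-degradation discussion above (a toolbox of error page, banner, and message options, plus an opinionated default that squads can override) can be illustrated with a minimal sketch. It is hypothetical throughout: withGracefulDegradation, Degradation, DegradeOptions, and the example.com status/support URLs are illustrative names, not part of frontend-platform's API. The only assumption about the real client is that a failed API call rejects, consistent with frontend-platform raising the error up to the caller.

```typescript
// Hypothetical sketch: none of these names exist in @edx/frontend-platform.

// Failure-state options from the proposed toolbox. A spinner covers the
// pending state, so it is left to the rendering component.
type Degradation = "error-page" | "banner" | "message";

// Outcome of a wrapped call: the data, or the fallback the UI should render.
type Result<T> =
  | { ok: true; data: T }
  | { ok: false; fallback: Degradation; text: string };

interface DegradeOptions {
  fallback?: Degradation; // squads override; the default fails strongly
  statusPageUrl?: string; // placeholder, assumed to be configured per MFE
  supportUrl?: string; // placeholder, assumed to be configured per MFE
}

// Text of the current default error page, as noted in the meeting.
const DEFAULT_ERROR_PAGE_TEXT =
  "An unexpected error occurred. Please click the button below to refresh the page.";

// The friendlier message proposed for banners and inline messages.
function degradedMessage(statusPageUrl: string, supportUrl: string): string {
  return (
    `Oops, an error has occurred, please check our status page (${statusPageUrl}) ` +
    `or reach out to support (${supportUrl}) if you continue to see this message`
  );
}

// Runs an API call that may reject and converts a rejection into a fallback
// descriptor instead of letting the error escape to the component.
async function withGracefulDegradation<T>(
  call: () => Promise<T>,
  options: DegradeOptions = {},
): Promise<Result<T>> {
  const {
    fallback = "error-page",
    statusPageUrl = "https://status.example.com",
    supportUrl = "https://support.example.com",
  } = options;
  try {
    return { ok: true, data: await call() };
  } catch {
    const text =
      fallback === "error-page"
        ? DEFAULT_ERROR_PAGE_TEXT
        : degradedMessage(statusPageUrl, supportUrl);
    return { ok: false, fallback, text };
  }
}

// Usage: a table (like the admin portal example) that prefers a banner
// over a full error page when its backend is down.
async function demo(): Promise<void> {
  const failingCall = () => Promise.reject(new Error("503 Service Unavailable"));
  const result = await withGracefulDegradation(failingCall, { fallback: "banner" });
  if (!result.ok) {
    console.log(`${result.fallback}: ${result.text}`);
  }
}

demo();
```

Defaulting to the full error page mirrors the "fail strongly by default" option: a squad has to opt in explicitly to a lighter banner or inline message, which keeps the easy path consistent across MFEs.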

Backlog of Questions/Discussions

This section lists a backlog of previous proposed topics that haven’t yet been discussed.