WIP: Known Architecture Backlog

Goals

  • Improve time-to-value of product delivery teams.
  • Minimize the resources required for enhancement and maintenance of the edX platform.

Metrics 

Rationale

In order to validate the impact of architecture initiatives, we need to measure and observe trends in the time and resources required to deliver products and features.

Todo

  1. Define metrics.
    1. Metrics section in Monolith WG: 04/26/2017
    2. Metrics: Resources
    3. Feature toggle usage
  2. Implement dashboard to observe metrics over time

Architecture Linters

Rationale

In order to scale development, we want agile teams to continue to be as autonomous and independent as possible.  The more that we can automate best practices and intentional architecture, the tighter the feedback loop is for individual teams.  Automated linters that enforce architectural principles enable this autonomy.

Todo

  1. Create arch linter that enforces Django Authentication and Permission decorators as required by this PR (in prep for OAuth scopes work).
  2. Create any arch linters as required by the Feature Toggles OEP.

Education (Diagrams, Docs, Courses, etc.)

Rationale

Increase general understanding of the edX platform architecture and best practices across all parts of engineering (internal and external) in order to

  • reduce time to onboard (internal and external) developers.
  • reduce time in reviewing internal PRs, contractor PRs, and OSPRs.
  • reduce surprises that could have been avoided had the system been better understood.

Todo

  1. OEPs and/or edX Courses (also see Architecture Decisions for eventual OEPs)
  2. Design docs and diagrams (see Architecture Onboarding Presentation and Architecture Design Documents)
  3. Consolidate documentation for better discoverability 

Integration and Extension Points (APIs, Plugins, Hooks, Events, etc.)

Rationale

To help with both immediate TTV development effort and long-term maintenance of the core platform, a strong allegiance to the Open-Closed SOLID principle will go a long way.  Specifically, the core platform should be open to extension, but closed to modification.  However, at this time, there is a very limited number of extension points in the platform, so further investment in this area is required.

External APIs (not internal Python APIs): External interfaces into the platform allow separate services to interact with the platform without modifying it.  Typically implemented as REST endpoints, these interfaces usually expose CRUD operations.  They should be:

  • Discoverable
  • Consistent and Clear
  • Secure

Plugins are a manifestation of the Dependency Inversion SOLID principle (in addition to Open-Closed) and allow new implementations of a feature without needing to modify the platform.  This requires creating a plugin interface that abides by "Liskov Substitution" and "Interface Segregation" so new plugins can be easily substituted and configured.
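
A minimal sketch of the idea, assuming a hypothetical grading-policy extension point (none of the names below exist in the platform): the core code depends only on a narrow abstract interface, and any registered implementation can be substituted by configuration.

```python
from abc import ABC, abstractmethod


class GradePolicyPlugin(ABC):
    """Hypothetical, deliberately narrow plugin interface (Interface Segregation)."""

    @abstractmethod
    def compute(self, scores):
        """Return a final grade in [0, 1] from a list of scores in [0, 1]."""


class AveragePolicy(GradePolicyPlugin):
    def compute(self, scores):
        return sum(scores) / len(scores) if scores else 0.0


class BestScorePolicy(GradePolicyPlugin):
    def compute(self, scores):
        return max(scores, default=0.0)


PLUGINS = {}  # plugin name -> plugin class


def register(name, plugin_cls):
    """Accept only classes that honor the interface, so any plugin can be swapped in."""
    if not issubclass(plugin_cls, GradePolicyPlugin):
        raise TypeError(f"{plugin_cls!r} does not implement GradePolicyPlugin")
    PLUGINS[name] = plugin_cls


def final_grade(scores, policy_name):
    """Core code: depends on the abstraction, never on a concrete plugin."""
    return PLUGINS[policy_name]().compute(scores)
```

Adding a new policy then means registering another class; `final_grade` itself never changes.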

Hooks allow external services to customize a feature that is internal to the platform by exposing an interface that allows an external function/service to implement/change a behavior of the feature.  These can be more manipulative than CRUD REST operations.

Events allow external services to be notified when something occurs.  Once again, this allows extending a feature without needing to modify the platform.
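
The difference between hooks and events can be sketched as follows. This is an in-process illustration only, with hypothetical names (`enroll`, `enrollment.course`, `enrollment.created`); real extension points that cross service boundaries would need a transport, which is out of scope here. A hook's return value feeds back into the feature, while an event's listeners are only notified.

```python
from collections import defaultdict

_hooks = defaultdict(list)      # hook name -> functions that may change a value
_listeners = defaultdict(list)  # event name -> functions that are only notified


def add_hook(name, func):
    _hooks[name].append(func)


def run_hook(name, value):
    """Pass value through every registered hook; each may return a modified value."""
    for func in _hooks[name]:
        value = func(value)
    return value


def subscribe(name, func):
    _listeners[name].append(func)


def emit(name, **payload):
    """Notify listeners; return values are ignored, so they cannot alter behavior."""
    for func in _listeners[name]:
        func(**payload)


def enroll(user, course):
    """Hypothetical core feature exposing one hook and one event."""
    course = run_hook("enrollment.course", course)         # extensions may customize
    emit("enrollment.created", user=user, course=course)   # extensions are informed
    return (user, course)
```

For example, registering `str.upper` as a hook on `enrollment.course` changes the value that `enroll` returns, whereas a subscriber to `enrollment.created` only observes the result.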

Todo

  1. Understand the upcoming features required by product development teams and create runways to build in the APIs, Plugins, Hooks, Events, etc. that enable them to move more quickly.  By forcing ourselves to follow the Open-Closed principle internally, we will, over time, put the needed integration points into our platform.

External APIs

  1. Consistency and Clarity
    1. API conventions - create OEP based on updated version of edX REST API Conventions
    2. Common API infrastructure across services (see Libraries we KNOW we want to move out of the monolith)
      1. Rate limiting (see LEARNER-3858)
      2. Pagination
      3. Authentication (JWT-based)
      4. Authorization (Scopes enforcement)
  2. Discoverability
    1. Swagger integration
  3. Security
    1. OAuth Scopes support across services
    2. OAuth - deprecate old libraries (DOP, OpenID, etc)
    3. OAuth - use asymmetric JWT signing keys
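
To make the "Scopes enforcement" item concrete, here is a small sketch of a decorator that rejects a call when the caller's token lacks a required scope. It assumes the token has already been authenticated and decoded elsewhere; the scope names and the convention of passing `token_scopes` as the first argument are hypothetical, chosen only to keep the example self-contained.

```python
import functools


class InsufficientScope(Exception):
    """Raised when a token does not carry every scope an endpoint requires."""


def require_scopes(*required):
    """Decorator factory: allow the call only if all required scopes are granted."""
    def decorator(view):
        @functools.wraps(view)
        def wrapper(token_scopes, *args, **kwargs):
            missing = set(required) - set(token_scopes)
            if missing:
                raise InsufficientScope(", ".join(sorted(missing)))
            return view(token_scopes, *args, **kwargs)
        return wrapper
    return decorator


@require_scopes("grades:read")  # hypothetical scope name
def get_grades(token_scopes, username):
    return {"username": username, "grades": []}
```

Sharing one such check across services is the kind of common API infrastructure the list above has in mind.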

System Architecture

Rationale

"If you can’t build a well-structured monolith, what makes you think microservices is the answer?"  (Simon Brown's keynote)

Regardless of how many integration and extension points are poked into the edX platform, the large monolith will remain sluggish and there will still be many reasons to update it as long as core functionality remains hidden within it.  Alternatively, edX as a business will simply implement cursory features around the platform without really digging into or innovating in its core features - with the mindset of wanting to deliver value (not core value) quickly.

Furthermore, while future development happens outside of the monolith, unless the future architecture is strategically designed, the development team may continue to create tightly-coupled interfaces, only now across process boundaries.  So, a strategic consideration of industry-tested microservice best practices is fully warranted, followed by education and enforcement in future development.

Todo

  1. DDD
    1. DONE. Domain-driven design (DDD) book club.
    2. Follow DDD
      1. Domain Vision Statement
      2. Bounded Contexts
      3. Ubiquitous Language
    3. Evangelize Domain-driven microservices, path to the Reactive Manifesto
    4. Katas of modern architectural design patterns → leading to a strategy.
    5. Determine what Architectural Linters can be put into place
  2. Taming the Monolith
    1. Time-boxed effort on high-level clean up of the platform as started with edx-platform Code Structure: Hackathon XIV and later documented in edx-platform Repository Overview.

Feature Toggles

Rationale

There is a proliferation of feature toggles in the edX platform.  The increase may even have been steeper over the last 2 years due to more frequent releases and encouragement of shorter-duration git branches.  However, the platform faces serious long-term maintenance issues if these toggles are not properly maintained, removed, etc.

Todo

  1. DONE. Document best practices in Feature Toggles OEP.
  2. Implement #9, #11, and #12 of the framework requirements.
  3. Runway for testing toggles in unit tests, end-to-end tests, etc. as suggested by the OEP.
  4. Evangelize and establish a sustainable process of toggle creation, maintenance, and cleanup (as described in the OEP).
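
As one possible shape for the testing and cleanup runways, the sketch below assumes, hypothetically, that every toggle records an owner and an expiration date, and that tests can override a toggle for the duration of a block. The class and field names are illustrative and are not taken from the OEP or from any existing toggle library.

```python
from contextlib import contextmanager
from datetime import date


class Toggle:
    """Hypothetical toggle that carries the metadata needed for later cleanup."""

    registry = {}  # toggle name -> Toggle instance

    def __init__(self, name, default, owner, expires):
        self.name = name
        self.default = default
        self.owner = owner
        self.expires = expires
        self._override = None
        Toggle.registry[name] = self

    def is_enabled(self):
        return self.default if self._override is None else self._override

    @contextmanager
    def override(self, value):
        """Force the toggle on or off inside a test, restoring it afterwards."""
        previous = self._override
        self._override = value
        try:
            yield
        finally:
            self._override = previous


def expired_toggles(today):
    """Names of toggles past their expiration date, i.e. candidates for removal."""
    return sorted(name for name, t in Toggle.registry.items() if t.expires < today)
```

A periodic report built on something like `expired_toggles` would give the cleanup process a concrete, recurring input.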

Sources of Drag (→ Runways)

Rationale

By assessing actual developer pains and challenges with implementing, deploying, and maintaining features, we can determine what gaps exist in evolving our platform.

Todo

  1. Discovery
    1. Assess self-reported pains documented in Architecture Challenges.
    2. Hold live meetings to understand challenges involved: roundtable discussions, feature retros, value stream mapping exercises, 1:1 with team leads, etc.
  2. Known pain-points
    1. FED development
    2. Monolith updates (may be addressed by Integration and Extension Points (APIs, Plugins, Hooks, Events, etc.) and System Architecture from above)
    3. Coupled services (may be addressed by System Architecture from above)
    4. Data migration (can't evolve schema of large tables)