Architecture Challenges (2017-2018)
Purpose
This is a live page to keep track of sources of drag faced by an edX engineer or team due to architectural or design issues in our system.
Goals
- Surface issues that we face when doing rapid experiments or iterative development and see whether they are challenges for other teams as well.
- See if any patterns emerge that call for integration points or other architectural features.
- Use anecdotal data on this page to prioritize initiatives on architectural runways.
Challenges
Top-level Categories (as of 10/15/18)
# | Pain Category | # incidents | # votes in meeting | status | notes |
---|---|---|---|---|---|
1 | Monolith (mindset): Pluggability, Extensibility, Interdependency/coupling, Deprecation/removal | 14 | 2 | NOPE | |
2 | Modernizing FED | 9 | 2 | PROGRESS | Arch-FED efforts: |
3 | Coupled services: Data synchronization and duplication | 5 | 2 | PARTIAL | Publisher efforts to fix data synch issues |
4 | Environments / Testing - End-to-end tests, Stage environments, Sandboxes, etc. | 5 | - | NOPE | DevOps efforts: |
5 | Configuration/Toggles: OEP-17 - Feature Toggles | 2 | - | PROGRESS | Arch-BOM & Tools efforts: |
6 | A/B Testing | 1 | 1 | | |
7 | Release pipeline evolvability | 1 | 1 | | |
8 | Authorization: product access, courseware, etc (aspect-oriented design) | 1 | 1 | | Enterprise efforts: |
9 | Clear Best Practices - Debugging, API versioning | 1 | - | | |
Pain Incidents
| Category | Issue | Team anecdotes (with dates, with team) | Number of occurrences | Status |
|---|---|---|---|---|
| Monolith / New Features | Requires updating many Django files in the existing IDA. | | 14 | Runway created for Django App Plugins. |
| | Inline Discussions: Much of our codebase depends on forums, so forum performance issues can cascade into the entire system. | | | |
| | Needs to receive Django signals from apps in the existing IDA. | | | Runway created for Django Plugin Settings. |
| | Python API: Wants to make use of a Python API defined in the existing IDA (to avoid the unnecessary network round trips and code complexity involved in calling an HTTP API). | | | |
| | Testing: Wants feature package tests to be run when IDA tests are run, to ensure that changes made to the IDA code don't break the feature. | | | |
| | Difficult to add a new LMS Dashboard. | | | |
| | LMS Course listing is not extensible. | | | |
| | New Product: Introducing a new product offering is difficult. | | | Notes: |
| | Functionality stuck in the Monolith: Reusable building blocks are locked up in openedx.core.lib.api in edx-platform, which makes them difficult to pull into other apps and/or projects. | | | Enterprise is actively working on moving pagination to drf-extension. To avoid breaking changes, the analytics-data-api response has not been updated. Evaluation of core.lib.api classes is ongoing. |
| | XBlocks don't run on Block Transformers: The Courseware doesn't use Block Transformers as its source for XBlock field data. The mobile API and Course Outline both do. This inconsistency means that we need to duplicate some features across both. | | | |
| | API Pluggability - Course Blocks API: A new feature wants to make use of a new Transformer and/or add new fields to the API response. | | | |
| Front End Development | UI Pluggability: Adding UI to an existing IDA, especially into LMS/Studio. | | 9 | |
| | Bootstrap / Pattern Library: CSS conflicts. | | | A hack POC exists, but it is not ideal. |
| | Paragon: Components are inconsistent and ongoing maintenance is unclear. | | | |
| | Backbone: Integrating with Backbone. | | | Dahlia's next feature to implement will deal heavily with this. |
| | Mako: Inserting the component into the page with Mako. | | | A studiofrontend Mako definition has been created to ease some of this pain. |
| Changing or testing learner features | TTV is affected when changes are required in edx-platform. | | 3 | |
| | Difficult to take advantage of our endpoint versioning, which should normally let us support multiple external clients (mobile, etc.) without having to sync our releases. | | | |
| | Experimentation Tools: Need to duplicate configuration code for experimentation. NOTE: A workaround exists for now. | | | Generic experiment key-value store can be used for storing experimental config/data. |
| Releases / Pipeline | Altering GoCD pipelines. | | 1 | |
| Environment Sync / Management (Stage, Sandboxes, Production) | Setting up communications between our different IDAs on a sandbox: Setting up sandboxes so that the different IDAs are configured to communicate with each other is a tedious and frustrating process. | | 5 | |
| | True / Complete Staging Environments: Prod data is rarely synchronized to stage or loadtest environments. | | | |
| Debugging & Investigation | There are no clear best practices or points of view on how we add logging to our code. This results in noisy, unhelpful logs in some places and a lack of valuable information in others. | | 1 | |
| Feature Toggle reporting is not automated (see OEP-17) | Testing: Feature toggles (often waffle switches and flags) have not matched in stage/prod, causing unexpected e2e failures or Production Outages/RCAs. | | 2 | |
| | Documentation: It is difficult to know how to remediate issues related to feature toggles that are undocumented regarding intended use and lifespan. | | | |
| | Debugging / Logging: Determining/debugging production outages because of toggle changes. | | | |
| | Deprecation: Not removing toggles after rollout. | | | |
| | End-to-end test failures discovered on a centralized Staging environment block release. | | | OEP-17: Feature Toggles provides a strategy for e2e tests related to toggles. |
| Data Duplication | We have an architecture philosophy of self-contained systems, which often involves systems having local copies or caches of data owned by other systems. | | 5 | |
| | Course language is not in sync across our services. | | | Language value in LMS's CourseOverview is now synced with the value in Catalog. |
| | Course start date is not in sync across our services. | | | |
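
The Data Duplication rows above describe local copies of course data drifting away from the value held by the owning service. As a rough illustration only, a periodic check could compare the fields each service holds for a course and report mismatches. Everything in this sketch is hypothetical: the `find_drift` helper, the snapshot dictionaries, and the sample course keys and values are assumptions made for illustration and do not correspond to any existing edX API.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical snapshot shape: course key -> {field name -> value}.
Snapshot = Dict[str, Dict[str, str]]
# (course key, field name, value in owning service, value in local copy)
Drift = Tuple[str, str, Optional[str], Optional[str]]


def find_drift(owner: Snapshot, copy: Snapshot, fields: List[str]) -> List[Drift]:
    """Return every field whose value in `copy` differs from `owner`.

    A course or field missing from `copy` is reported with None as its value.
    """
    drift: List[Drift] = []
    for course_key in sorted(owner):
        owned = owner[course_key]
        copied = copy.get(course_key, {})
        for field in fields:
            if owned.get(field) != copied.get(field):
                drift.append((course_key, field, owned.get(field), copied.get(field)))
    return drift


# Made-up sample data standing in for the owning service and a local copy.
catalog = {
    "course-v1:DemoX+D101+2018": {"language": "en", "start": "2018-09-01"},
    "course-v1:DemoX+D102+2018": {"language": "es", "start": "2018-10-15"},
}
lms = {
    "course-v1:DemoX+D101+2018": {"language": "en", "start": "2018-09-01"},
    "course-v1:DemoX+D102+2018": {"language": "en", "start": "2018-10-15"},
}

for course_key, field, expected, actual in find_drift(catalog, lms, ["language", "start"]):
    print(f"{course_key}: {field} is {actual!r} in the copy, owner has {expected!r}")
```

A check like this only detects drift after the fact; it does not replace a synchronization mechanism between the services.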