|Table of Contents
Manifesto Description and Examples
Decentralized beats Centralized
We create a scalable system with decentralized points of autonomy. We prevent centralized points of failure and bottlenecks. We balance this with intentional architecture.
We minimize coordination across team boundaries and value team-level autonomy in order to scale our organization. Collaboration between teams, however, is still vital to align on high-level goals and ensure the convergence of architectural decisions.
Developer teams are responsible for the services and code they own; therefore, they have autonomy over local architectural decisions, data modeling, code quality, library usage, monitoring, etc. Although edX promotes standards for many such concerns and encourages teams to collaboratively seek advice, the final say on decisions about a technology lies with its owners. These decisions, however, need to be defendable and should be documented in ADRs.
We follow SOLID design principles. We invest appropriate time in determining the Single Responsibility of functions, modules, and higher-order components. We build interfaces for components that have many dependents, following Dependency Inversion. We depend on interfaces and not on concrete implementations.
We create an extensible platform that allows developers to build extensions that integrate with the core of the platform. This allows the core to remain small, while volatile extensions remain in the periphery.
Clear Bounded Contexts
We decompose large-scale systems into smaller, cognitively understandable, encapsulated and cohesive components called Bounded Contexts.
We intentionally choose to duplicate data across Contexts when data from a source Context is needed for a 'core Job' of a dependent Context. The dependent Context then has the flexibility to store/transform/optimize its local copy of the data according to its own special needs. This produces highly independent self-contained systems.
We look at client-side integrations when a product feature requires time-precise data from the source Context and cannot tolerate delays in asynchronous data synchronization by the dependent Context.
Loosely Coupled Codebase
We prefer building a loosely coupled architecture that is not bottlenecked by centralized implementations (libraries, code, etc). While our engineering impulse may be to de-duplicate code, we do not (usually) apply DRY across Contexts. We understand that when multiple microservices depend on shared libraries, it can lead to unintended team bottlenecks.
We allow exceptions to this trade-off when architecturally significant concerns need to be consistent across services (e.g., API authentication and authorization frameworks in edx-drf-extensions).
We keep the deployment pipelines of each of our microservices to be independent of each other. We never design a feature that requires multiple services to be deployed at the same time.
Asynchronous beats Synchronous
Inspired by the Reactive Manifesto, we create a loosely coupled architecture that is responsive, resilient, and elastic by having asynchronous (non-blocking) communication channels between services wherever possible.
We keep our 99% web response times to less than 2 seconds unless an explicit exception is made (with Product). This means we do not make blocking synchronous remote requests (within a web request) to other microservices, especially those that are outside of our Context’s Subdomain.
We prefer accessing slightly stale data rather than being dependent on the real-time availability of other services. We recognize that the CAP theorem holds in our distributed services. That is, for Partition-tolerance (the system continues to operate in spite of partial network failures) we choose Availability (find a local/nearby replica of data) over Consistency (all read the same data at the same time).
When we write data, we don’t expect it will be immediately available throughout the system and be read at the same time. We prefer each client to be resilient and handle any ambiguities.
When the time-precision of reading up-to-date data is paramount such that delays from background data synchronization cannot be tolerated, we have client (e.g., frontend) code (asynchronously) access the data from its source of truth. We prefer clients to fetch data across contexts (when really necessary) rather than have synchronous remote calls within backend business logic.
We recognize that this design introduces dependencies between Contexts (even if through the client) and so we do not use this pattern when the data is necessary for a core Job of the client.
We minimize the load on our webservers by exposing only APIs that are performant. We implement slow transactions as asynchronous tasks that run in the background and do not interfere with web responses.
We add pagination from the ground up through our backend APIs, not just (a posteriori) in frontend views.
For APIs that provide large data sets, we prefer asynchronously generating data reports and exposing them via separate APIs that do not load our servers.
Intentional Architecture beats Emergent Design
"Without guidance, evolutionary architecture becomes simply a reactionary architecture."
Work to Simplify
We understand that the first solution that comes to mind is not necessarily the best solution. We take some time to better understand the problem space before ideating on solutions. Solving a problem one bit at a time can lead to complicated solutions since each increment addresses symptoms instead of the underlying diagnosis.
Instead, invest time to tease apart the problem, make relevant connections, and create meaningful abstractions in order to create simpler long-term solutions.
Additionally, long-lived code may age and accrue complexity over time as business needs evolve and features are added. So, we continuously refactor and improve the code, leaving it better than it was (“scouts rule”).
In following Agile practices and iterative processes, we recognize that (appropriately) timeboxed upfront design efforts are investments that can lead to better long-term technical outcomes. The design may contain abstractions or APIs that can save development time in the future.
We understand that the phrase ‘You Ain’t Gonna Need It (YAGNI)’ applies more to local code within a Context since that code can be continuously refactored as the feature and code evolve. For a platform developed across multiple distributed services, features and code cannot be as easily refactored and so upfront design that results in additional generalities pay off.
As part of our regular development process, we document technical decisions in GitHub as described in OEP-19. We add the creation of decision records in acceptance criteria during our grooming processes. We remind our teammates to do so when we review PRs. We recognize decision records serve to onboard new members, to understand historical traces, to reference past decisions, and to ensure the team is aligned.
In addition to decision records, we actively maintain READMEs and HowTos as described in the OEP.
We believe automated verifications will more efficiently and precisely direct technical changes than manual verifications. Technical direction includes both detection of errors (e.g., variable not defined) as well as guidance towards a future technical evolution (e.g., adoption of PII annotation).
Fitness function is the term used in Building Evolutionary Architecture to “communicate, validate and preserve architectural characteristics in an automated, continual manner, which is the key to building evolutionary architectures.”