Event Bus Ownership Onboarding
Topics
Helps address “Asynchronous beats Synchronous”
What is an event bus?
De-coupling logic in different parts of an Open edX deployment, so that one part can act asynchronously on what another part did without the two knowing about each other.
De-coupling of two services is pretty fundamental. A service can emit events and others can consume them without knowing much about the source.
Having a way for data to get out of a service without creating a strong link/dependency between the services.
More centralized location for data storage independent of what service produced the data. Standardizing push/pull of data between services without designing a bunch of APIs and dependency cycles.
A way to standardize how we announce events and consume them. A framework we can build learnings into so we don’t need to keep re-inventing them.
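The de-coupling described above can be sketched with a toy in-memory bus. This is only an illustration: InMemoryEventBus, the "enrollment-changed" topic, and the event fields are hypothetical names, not part of openedx-events or any real broker client.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

Handler = Callable[[dict], None]


class InMemoryEventBus:
    """Toy stand-in for a real broker (hypothetical, for illustration only)."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Handler]] = defaultdict(list)
        self._queue: List[Tuple[str, dict]] = []

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, event: dict) -> None:
        # The producer only enqueues; it never calls a consumer directly.
        self._queue.append((topic, event))

    def deliver_pending(self) -> int:
        """Hand queued events to subscribers; return how many events were delivered."""
        pending, self._queue = self._queue, []
        for topic, event in pending:
            for handler in self._subscribers[topic]:
                handler(event)
        return len(pending)


bus = InMemoryEventBus()
seen_by_consumer: List[dict] = []

# The consumer knows the topic name, not the producing service.
bus.subscribe("enrollment-changed", seen_by_consumer.append)

# The producer knows the topic name, not who (if anyone) is listening.
bus.publish("enrollment-changed", {"user_id": 42, "course_id": "course-v1:demo"})
delivered = bus.deliver_pending()
```

The producer and consumer share only a topic name and an event shape, which is the piece a schema library would standardize.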
Questions
What is the source of truth?
Event Sourcing - Events are the source of truth: send the event first and then materialize it into any database that wants it, including the producer's own.
DB Sourcing - The source service’s SQL database is the source of truth. Events should track it closely, but an event is not guaranteed to be correct if it disagrees with the DB.
This is what we’re gonna go with to start, because this is what we have right now.
We’ll need to handle various situations as a result of this choice to ensure that consuming services can still stay consistent with the source.
How can we avoid accidentally getting eventual consistency when we are actually assuming a stronger consistency?
Eventually consistent - Data is not updated immediately for everyone.
Immediately consistent - Reads after writes contain the new data.
Communication that already happens between services is already eventually consistent. This question will have to be asked each time we move an app out of a service, and the answer will depend on the product needs.
We’re not gonna have answers on this for a while and will run into it more as we go.
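To make the difference between the two consistency levels concrete, here is a minimal sketch. SourceService, ConsumerService, and the in_flight list are hypothetical stand-ins for a producing service, a consuming service, and the bus between them.

```python
class SourceService:
    """Owns the data; its own reads see its own writes immediately."""

    def __init__(self):
        self.db = {}
        self.in_flight = []  # events published but not yet delivered

    def set_email(self, user_id, email):
        self.db[user_id] = email
        self.in_flight.append({"user_id": user_id, "email": email})


class ConsumerService:
    """Keeps a local copy that is only updated when an event arrives."""

    def __init__(self):
        self.copy = {}

    def handle(self, event):
        self.copy[event["user_id"]] = event["email"]


def deliver(source, consumer):
    # Simulates the bus catching up at some later point in time.
    while source.in_flight:
        consumer.handle(source.in_flight.pop(0))


source = SourceService()
consumer = ConsumerService()

source.set_email(1, "new@example.com")
read_at_source = source.db[1]               # immediately consistent
read_at_consumer_before = consumer.copy.get(1)  # stale: event not delivered yet

deliver(source, consumer)
read_at_consumer_after = consumer.copy.get(1)   # eventually consistent
```

Code that used to read the source's database in-process and now reads a consumer-side copy silently moves from the first behaviour to the second, which is the accident the question above is about.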
Onboarding of new producers/consumers should involve review, maybe with a checklist, so that we can head off anti-patterns or common pitfalls
We should also have a reference document for how to think about code changes relating to event buses, condensing expert knowledge. For example, in DBs you want to think about transactionality and N+1 query problems, and in caches you want to think about key invalidation and thundering herd problems… what special things do people need to consider for event buses?
How important is it that all messages get delivered? (message delivery guarantees)
The level of guarantee will depend on the message and what you need for a given scenario.
At least once
At most once
Exactly once
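With at-least-once delivery a consumer may see the same message more than once, so handlers are usually written to be idempotent. The sketch below is one common way to do that, assuming every message carries a unique "id" field (an assumption, not something these notes specify). IdempotentConsumer is a hypothetical name, and a real service would keep the seen IDs in durable storage rather than in memory.

```python
class IdempotentConsumer:
    """Wraps a handler so redelivered messages are processed only once."""

    def __init__(self, handler):
        self._handler = handler
        self._seen_ids = set()  # in-memory only; a real service would persist this

    def receive(self, message):
        """Return True if the message was processed, False if it was a duplicate."""
        message_id = message["id"]  # assumed unique per message
        if message_id in self._seen_ids:
            return False
        self._handler(message)
        # Record the ID only after the handler succeeds, so that a failed
        # attempt can still be retried on redelivery.
        self._seen_ids.add(message_id)
        return True


processed = []
consumer = IdempotentConsumer(processed.append)

first = consumer.receive({"id": "evt-1", "type": "certificate.created"})
duplicate = consumer.receive({"id": "evt-1", "type": "certificate.created"})
second = consumer.receive({"id": "evt-2", "type": "certificate.created"})
```

At-least-once delivery plus an idempotent handler gives an effect close to exactly-once processing without requiring the broker to guarantee it.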
Are there things a service should not do with a message?
For example, don’t immediately call back to the source service.
Yes, but we don’t know most of them right now. We can read up on existing best practices to help ourselves level up.
What about managing DB transactions and events?
Just as with Django signals, we’ll likely want to use on-commit hooks so the event is sent only after the DB transaction commits (possibly with some rare exceptions).
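The on-commit idea can be sketched without Django using the standard library's sqlite3. The Transaction class below is a hypothetical stand-in that mimics the behaviour of Django's transaction.atomic() combined with transaction.on_commit(): callbacks run only if the commit succeeds and are dropped on rollback. The table, event shape, and publish callable are illustrative assumptions.

```python
import sqlite3


class Transaction:
    """Hypothetical stand-in for Django's atomic() + on_commit() semantics."""

    def __init__(self, conn):
        self._conn = conn
        self._callbacks = []

    def on_commit(self, func):
        self._callbacks.append(func)

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        if exc_type is not None:
            # Roll back the write and drop the pending events with it.
            self._conn.rollback()
            self._callbacks.clear()
            return False  # re-raise the original exception
        self._conn.commit()
        for func in self._callbacks:
            func()
        return False


def enroll(conn, publish, user_id, course_id, fail=False):
    with Transaction(conn) as tx:
        conn.execute(
            "INSERT INTO enrollment (user_id, course_id) VALUES (?, ?)",
            (user_id, course_id),
        )
        event = {"type": "enrollment.created", "user_id": user_id, "course_id": course_id}
        # Queue the event; it is sent only after the row is durably committed.
        tx.on_commit(lambda: publish(event))
        if fail:
            raise RuntimeError("simulated failure before commit")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE enrollment (user_id INTEGER, course_id TEXT)")
published = []

enroll(conn, published.append, 1, "course-v1:demo")
try:
    enroll(conn, published.append, 2, "course-v1:demo", fail=True)
except RuntimeError:
    pass

row_count = conn.execute("SELECT COUNT(*) FROM enrollment").fetchone()[0]
```

Sending after commit avoids announcing a change that was later rolled back. It does not cover the opposite case, where the commit succeeds but the send fails, which is one of the situations the DB-sourcing choice leaves to be handled.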
How do we manage deprecation of topics/old versions? Do consumers need to register which topics they subscribe to, or do we have to rely on grep? (schema management question)
Devstack POC
Producing
Consuming
Schemas (openedx-events)
Technology - Kafka/Pulsar - OEP
Cloud Hosting
Open edX use cases
Abstraction Layer
Next steps
New Events Checklist
Is the new event going to change the consistency SLA for the data it’s publishing?