Event Bus Ownership Onboarding

Topics

  • Helps address “Asynchronous beats Synchronous”

  • What is an event bus?

    • De-coupling logic in different parts of the Open edX deployment to asynchronously take action without having two parts know about each other.

    • De-coupling of two services is pretty fundamental. A service can emit events and others can consume them without knowing much about the source.

    • Having a way for data to get out of a service without creating a strong link/dependency between the services.

    • More centralized location for data storage independent of what service produced the data. Standardizing push/pull of data between services without designing a bunch of APIs and dependency cycles.

    • A way to standardize how we announce events and consume them. A framework we can build learnings into so we don’t need to keep re-inventing them.
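
The de-coupling described in the points above can be sketched with a toy in-process bus. This is only an illustration: `InMemoryEventBus`, the `course-enrollment` topic, and the handlers are hypothetical names, and delivery here is synchronous, whereas a real broker would deliver asynchronously.

```python
from collections import defaultdict
from typing import Callable, Dict, List

Handler = Callable[[dict], None]


class InMemoryEventBus:
    """Toy stand-in for a real broker (hypothetical, for illustration only)."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        # Consumers register interest in a topic, not in a producer.
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, event: dict) -> None:
        # The producer does not know who (if anyone) is listening.
        for handler in self._subscribers.get(topic, []):
            handler(event)


bus = InMemoryEventBus()
emails_queued: List[int] = []
analytics_rows: List[str] = []

# Two independent consumers; neither knows about the other or the producer.
bus.subscribe("course-enrollment", lambda e: emails_queued.append(e["user_id"]))
bus.subscribe("course-enrollment", lambda e: analytics_rows.append(e["course_id"]))

# The producer only announces what happened.
bus.publish("course-enrollment", {"user_id": 7, "course_id": "demo-course"})
```

Adding a third consumer requires no change to the producer, which is the property the notes above are after.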

  • Questions

    • What is the source of truth?

      • Event Sourcing - Events are the source of truth, send the event first and then materialize it into any database that wants it, including the producer of the event.

      • DB Sourcing - The source service’s SQL database is the source of truth. Events should stay close to it, but an event is not guaranteed to be correct if it disagrees with the DB.

        • This is what we’re gonna go with to start, because this is what we have right now.

        • We’ll need to handle various situations as a result of this choice to ensure that consuming services can still stay consistent with the source.
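
One situation a consumer may need to handle under DB sourcing is events arriving late or out of order. The sketch below shows one possible approach, not a decided design: it assumes each event carries a `source_version` that the source service increments on every DB write. That field, `EnrollmentEvent`, and `EnrollmentReplica` are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class EnrollmentEvent:
    user_id: int
    course_id: str
    is_active: bool
    source_version: int  # hypothetical: bumped by the source on each DB write


class EnrollmentReplica:
    """Consumer-side copy that never lets an older event overwrite a newer one."""

    def __init__(self) -> None:
        self._rows: Dict[Tuple[int, str], EnrollmentEvent] = {}

    def apply(self, event: EnrollmentEvent) -> bool:
        key = (event.user_id, event.course_id)
        current: Optional[EnrollmentEvent] = self._rows.get(key)
        if current is not None and current.source_version >= event.source_version:
            return False  # stale or duplicate event: the newer state wins
        self._rows[key] = event
        return True

    def is_active(self, user_id: int, course_id: str) -> bool:
        row = self._rows.get((user_id, course_id))
        return row is not None and row.is_active


replica = EnrollmentReplica()
# The "unenrolled" event (version 2) arrives before the "enrolled" one (version 1).
applied_new = replica.apply(EnrollmentEvent(7, "demo-course", False, source_version=2))
applied_old = replica.apply(EnrollmentEvent(7, "demo-course", True, source_version=1))
```

After both calls the replica still reports the learner as unenrolled, matching what the source DB would say.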

    • How can we avoid accidentally getting eventual consistency when we are actually assuming a stronger consistency?

      • Eventually Consistent - Data is not updated immediately for everyone.

      • Immediately Consistent - Reads after writes contain the new data.

      • For communication that is already happening between services, we’re already at eventual consistency. This question will have to be asked each time we move an app out of a service, and the answer will depend on the product needs.

        • We’re not gonna have answers on this for a while and will run into it more as we go.

        • Onboarding of new producers/consumers should involve review, maybe with a checklist, so that we can head off anti-patterns or common pitfalls

          • We should also have a reference document for how to think about code changes relating to event buses, condensing expert knowledge. For example, in DBs you want to think about transactionality and N+1 query problems, and in caches you want to think about key invalidation and thundering herd problems. What special things do people need to consider for event buses?

    • How important is it that all messages get delivered? (message delivery guarantees)

      • The level of guarantee will depend on the message and what you need for a given scenario.

        • At-least once

        • At-most once

        • Exactly once
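
With at-least-once delivery, the same message can arrive more than once, so consumers usually need to be idempotent. A minimal sketch follows, assuming every event carries a unique ID. `IdempotentConsumer` and the event IDs are hypothetical, and the in-memory set stands in for what would need to be durable storage in practice.

```python
from typing import Callable, List, Set


class IdempotentConsumer:
    """Wraps a handler so that redelivered events are processed only once."""

    def __init__(self, handler: Callable[[dict], None]) -> None:
        self._handler = handler
        self._seen_ids: Set[str] = set()  # would need to be persistent in practice

    def receive(self, event_id: str, payload: dict) -> bool:
        if event_id in self._seen_ids:
            return False  # duplicate delivery: skip the side effect
        self._handler(payload)
        # Record the ID only after the handler succeeds, so a failed
        # attempt can be retried on redelivery.
        self._seen_ids.add(event_id)
        return True


certificates: List[int] = []
consumer = IdempotentConsumer(lambda p: certificates.append(p["user_id"]))

# The broker redelivers event "evt-1", e.g. after a missed acknowledgement.
results = [
    consumer.receive("evt-1", {"user_id": 7}),
    consumer.receive("evt-1", {"user_id": 7}),
    consumer.receive("evt-2", {"user_id": 9}),
]
```

At-least-once delivery combined with an idempotent handler gives an effect close to exactly-once processing without requiring it from the broker.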

    • Are there things a service should not do with a message?

      • Like don’t immediately call back the source service.

      • Yes, but we don’t know most of them right now. We can read-up on existing best practices to help ourselves level up.

    • What about managing DB transactions and events?

      • Just as with Django signals, we’ll likely want to use on-commit to send the event (possibly with some rare exceptions), so that no event is sent for a transaction that gets rolled back.
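
The on-commit idea can be sketched without a database. `FakeTransaction` below is a hypothetical stand-in that only mimics the semantics: callbacks run if the block finishes cleanly and are dropped if it raises. In Django the real counterpart is `django.db.transaction.on_commit`, used inside `transaction.atomic()`.

```python
from typing import Callable, List


class FakeTransaction:
    """Hypothetical stand-in that mimics on-commit semantics (not Django's API)."""

    def __init__(self) -> None:
        self._callbacks: List[Callable[[], None]] = []

    def on_commit(self, callback: Callable[[], None]) -> None:
        self._callbacks.append(callback)

    def __enter__(self) -> "FakeTransaction":
        return self

    def __exit__(self, exc_type, exc, tb) -> bool:
        if exc_type is None:
            # "Commit": only now do the queued events go out.
            for callback in self._callbacks:
                callback()
        # On "rollback" the queued callbacks are simply discarded.
        self._callbacks.clear()
        return False  # never swallow the exception


sent_events: List[dict] = []


def save_enrollment(txn: FakeTransaction, user_id: int, fail: bool = False) -> None:
    # Queue the event next to the write, but do not send it yet.
    txn.on_commit(lambda: sent_events.append({"user_id": user_id}))
    if fail:
        raise RuntimeError("simulated DB error")


with FakeTransaction() as txn:
    save_enrollment(txn, 1)

try:
    with FakeTransaction() as txn:
        save_enrollment(txn, 2, fail=True)
except RuntimeError:
    pass  # the write was rolled back, so no event should have been sent
```

Only the first enrollment produces an event, so consumers never hear about a write the source DB does not contain.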

    • How do we manage deprecation of topics/old versions? Do consumers need to register which topics they subscribe to, or do we have to rely on grep? (schema management question)
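
One possible alternative to relying on grep is an explicit subscription registry. The sketch below is a thought experiment rather than a proposal the notes have agreed on. `SubscriptionRegistry`, the service names, and the versioned topic names are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set


class SubscriptionRegistry:
    """Records which consumers read which topics, to support deprecation."""

    def __init__(self) -> None:
        self._consumers_by_topic: Dict[str, Set[str]] = defaultdict(set)

    def register(self, consumer: str, topic: str) -> None:
        self._consumers_by_topic[topic].add(consumer)

    def unregister(self, consumer: str, topic: str) -> None:
        self._consumers_by_topic[topic].discard(consumer)

    def consumers_of(self, topic: str) -> List[str]:
        return sorted(self._consumers_by_topic.get(topic, set()))

    def safe_to_retire(self, topic: str) -> bool:
        # A topic can be retired once nobody is registered against it.
        return not self._consumers_by_topic.get(topic)


registry = SubscriptionRegistry()
registry.register("credentials", "enrollment-v1")
registry.register("discovery", "enrollment-v1")
registry.register("discovery", "enrollment-v2")

blockers_before = registry.consumers_of("enrollment-v1")
registry.unregister("credentials", "enrollment-v1")
registry.unregister("discovery", "enrollment-v1")
retirable_after = registry.safe_to_retire("enrollment-v1")
```

A registry like this answers "who still depends on the old version?" directly, at the cost of requiring every consumer to keep its registration current.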

  • Devstack POC

    • Producing

    • Consuming

    • Schemas (openedx-events)

  • Technology - Kafka/Pulsar - OEP

  • Cloud Hosting

  • Open edX use cases

  • Abstraction Layer

  • Next steps

  • New Events Checklist

    • Is the new event going to change the consistency SLA for the data it’s publishing?