This is a place for notes, resource links, etc. that may be cleaned up and moved into more permanent documentation in the future.
Tasking Notes
These are personal notes for Robert Raposa to eventually get into JIRA.
Event Bus Tasks
Schema:
CloudEvents vs Avro
attrs
in openedx-events
Make the real version of this code; the transformation functions likely also need to move into the bridge class.
Write tests to validate the encoding/decoding.
The current code does not handle non-primitive types (e.g., opaque keys), but the final version should.
Possibly raise an error for other class types we’re not prepared to serialize/deserialize.
Hard-code skipping PII objects (there is a personal data attrs object) for now.
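A minimal sketch of the encode/decode step in the bridge, assuming each event field maps to an Avro primitive type. It uses the standard-library dataclasses module as a stand-in for attrs so it runs without dependencies, and CourseKey, PersonalData, and CertificateChanged are hypothetical placeholders, not the real opaque-keys or openedx-events classes. Opaque keys round-trip as strings, PII fields are skipped, and any other non-primitive type raises an error.

```python
import dataclasses
from typing import Any, Dict


class CourseKey:
    """Hypothetical stand-in for an opaque key; real code would use opaque-keys."""

    def __init__(self, value: str) -> None:
        self.value = value

    def __str__(self) -> str:
        return self.value

    def __eq__(self, other: object) -> bool:
        return isinstance(other, CourseKey) and other.value == self.value


@dataclasses.dataclass
class PersonalData:
    """Hypothetical marker type for PII, which is skipped for now."""
    email: str


# Python primitive types mapped to Avro primitive type names.
PRIMITIVES = {bool: "boolean", int: "long", float: "double", str: "string"}
# Non-primitive types sent as Avro strings: (to_string, from_string).
STRING_CODECS = {CourseKey: (str, CourseKey)}


def _serializable_fields(cls: type) -> list:
    """Return the fields to send, skipping PII and rejecting unknown types."""
    fields = []
    for field in dataclasses.fields(cls):
        if field.type is PersonalData:
            continue  # hard-coded PII skip
        if field.type not in PRIMITIVES and field.type not in STRING_CODECS:
            raise TypeError(f"Cannot serialize {cls.__name__}.{field.name}: {field.type!r}")
        fields.append(field)
    return fields


def avro_schema(cls: type) -> Dict[str, Any]:
    """Build an Avro record schema (as a dict) for an event class."""
    return {
        "type": "record",
        "name": cls.__name__,
        "fields": [
            {"name": f.name, "type": PRIMITIVES.get(f.type, "string")}
            for f in _serializable_fields(cls)
        ],
    }


def encode(event: Any) -> Dict[str, Any]:
    """Convert an event instance into an Avro-compatible dict."""
    record = {}
    for f in _serializable_fields(type(event)):
        value = getattr(event, f.name)
        if f.type in STRING_CODECS:
            value = STRING_CODECS[f.type][0](value)
        record[f.name] = value
    return record


def decode(cls: type, record: Dict[str, Any]) -> Any:
    """Rebuild an event instance; skipped PII fields come back as None."""
    kwargs: Dict[str, Any] = {f.name: None for f in dataclasses.fields(cls)}
    for f in _serializable_fields(cls):
        value = record[f.name]
        if f.type in STRING_CODECS:
            value = STRING_CODECS[f.type][1](value)
        kwargs[f.name] = value
    return cls(**kwargs)


@dataclasses.dataclass
class CertificateChanged:
    """Hypothetical example event."""
    course_key: CourseKey
    mode: str
    grade: float
    user: PersonalData
```

Keeping the type checks in one helper means the schema builder, encoder, and decoder cannot disagree about which fields are sent.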
Ensure devstack work is merged to master.
Instructions, review
Looking into Grades events
Grade codes
Design with Aperture
Certificate Change - ideal
Toggles
Start up the Kafka client based on the existing Kafka setting.
This is what provides the option for new events to start.
Implementing producer and event
Not needed yet: each event should have its own toggle to make it optional.
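A minimal sketch of the toggle idea, assuming the producer is only created when a Kafka setting is present and that a per-event toggle may be added later. The setting names EVENT_BUS_KAFKA_BOOTSTRAP_SERVERS and EVENT_BUS_ENABLED_EVENT_TYPES are hypothetical. In production the factory could be confluent_kafka.Producer, which takes a config dict containing bootstrap.servers; here it is injected so the sketch runs without Kafka.

```python
from typing import Any, Callable, Dict, Mapping, Optional


def create_producer(settings: Mapping[str, Any],
                    factory: Callable[[Dict[str, str]], Any]) -> Optional[Any]:
    """Create a Kafka producer only if the (hypothetical) setting is present."""
    servers = settings.get("EVENT_BUS_KAFKA_BOOTSTRAP_SERVERS")
    if not servers:
        return None  # Kafka not configured: the event bus stays off
    return factory({"bootstrap.servers": servers})


def send_event(producer: Optional[Any], settings: Mapping[str, Any],
               topic: str, event_type: str, payload: bytes) -> bool:
    """Send only when a producer exists and the event's own toggle allows it."""
    if producer is None:
        return False
    # Optional per-event toggle (hypothetical setting, "not needed yet").
    enabled_events = settings.get("EVENT_BUS_ENABLED_EVENT_TYPES")
    if enabled_events is not None and event_type not in enabled_events:
        return False
    producer.produce(topic, payload)
    return True
```

Returning None instead of raising when Kafka is not configured lets existing code paths run unchanged while the bus is off.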
Design events for credentials
An unfiltered event, where credentials does its own filtering rather than leaving the filtering in edx-platform
Start with the simplest error handling.
Verify that credentials action is idempotent and could handle event bus + old functionality at the same time.
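One way to make the credentials action idempotent is to treat each update as "set this state" keyed on a natural key, so the same update arriving twice (once via the existing synchronous path, once via the event bus) leaves the same result. A minimal sketch with an in-memory store; CertificateUpdate and CredentialStore are hypothetical names. Note that this handles duplicates, not out-of-order delivery of different statuses, which would need a separate decision.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class CertificateUpdate:
    """Hypothetical payload shared by the legacy call and the event bus."""
    username: str
    course_run: str
    status: str


class CredentialStore:
    """In-memory stand-in for the credentials database."""

    def __init__(self) -> None:
        self.records: Dict[Tuple[str, str], str] = {}
        self.writes = 0

    def apply(self, update: CertificateUpdate) -> bool:
        """Upsert keyed on (username, course_run); return True if state changed."""
        key = (update.username, update.course_run)
        if self.records.get(key) == update.status:
            return False  # duplicate delivery: nothing to do
        self.records[key] = update.status
        self.writes += 1
        return True
```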
Kafka Prototype updated (spike, not expecting to merge this)
TODO: Links to edx-platform PRs
Match Pulsar
No outbox
Maybe get details from Feanil.
Create a management command that will consume a test topic and process the event (print it).
Producer should produce with a well-defined Avro schema.
Fine to be a test event/topic
Consumer should be able to deserialize to a well-defined Python object.
Discovery: abstraction layer (Feanil)
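To illustrate "produce with a well-defined Avro schema, deserialize to a well-defined Python object" for the test management command: real code would use an Avro library (for example fastavro, or the Avro serializers in confluent-kafka) and likely a schema registry. The hand-rolled encoder below covers only string and long fields, following the Avro binary encoding (zig-zag varint longs, length-prefixed UTF-8 strings, record fields in schema order). DEMO_EVENT_SCHEMA, DemoEvent, and process are hypothetical; the Django command and the Kafka polling loop are omitted.

```python
from dataclasses import dataclass
from typing import Any, Dict, Tuple

# Hypothetical test schema: an Avro record with a string and a long field.
DEMO_EVENT_SCHEMA = {
    "type": "record",
    "name": "DemoEvent",
    "fields": [
        {"name": "message", "type": "string"},
        {"name": "count", "type": "long"},
    ],
}


@dataclass
class DemoEvent:
    """Well-defined Python object the consumer deserializes into."""
    message: str
    count: int


def _write_long(n: int, out: bytearray) -> None:
    """Avro long: zig-zag encode, then write as a base-128 varint."""
    z = (n << 1) ^ (n >> 63)
    while z > 0x7F:
        out.append((z & 0x7F) | 0x80)
        z >>= 7
    out.append(z)


def _read_long(buf: bytes, pos: int) -> Tuple[int, int]:
    """Read a varint, undo zig-zag; return (value, new position)."""
    z, shift = 0, 0
    while True:
        byte = buf[pos]
        pos += 1
        z |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            return (z >> 1) ^ -(z & 1), pos


def avro_encode(schema: Dict[str, Any], record: Dict[str, Any]) -> bytes:
    """Avro binary record: field values in schema order, no field names."""
    out = bytearray()
    for field in schema["fields"]:
        value = record[field["name"]]
        if field["type"] == "long":
            _write_long(value, out)
        elif field["type"] == "string":
            data = value.encode("utf-8")
            _write_long(len(data), out)  # strings are length-prefixed
            out += data
        else:
            raise TypeError(f"unsupported type {field['type']!r}")
    return bytes(out)


def avro_decode(schema: Dict[str, Any], buf: bytes) -> Dict[str, Any]:
    record: Dict[str, Any] = {}
    pos = 0
    for field in schema["fields"]:
        if field["type"] == "long":
            record[field["name"]], pos = _read_long(buf, pos)
        elif field["type"] == "string":
            length, pos = _read_long(buf, pos)
            record[field["name"]] = buf[pos:pos + length].decode("utf-8")
            pos += length
        else:
            raise TypeError(f"unsupported type {field['type']!r}")
    return record


def process(message: bytes) -> DemoEvent:
    """Per-message work of the test command: deserialize, then print."""
    event = DemoEvent(**avro_decode(DEMO_EVENT_SCHEMA, message))
    print(event)
    return event
```

Because the binary form carries no field names, the consumer must know the writer's schema, which is why a schema registry usually accompanies Avro on Kafka.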
Implementing consumer
TODO:
infrastructure work
consumer management command
kubernetes/helm work for a consumer process that runs forever
"Hello World", sleeping and logging
Stage [deferred]
streamnative:
set up stage/prod clusters
defer peering until PII and other data
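The "Hello World" consumer process from the TODO list above only needs to run forever, sleep, and log, so the helm/kubernetes side can be proven before real consumption exists. A minimal stdlib sketch; the logger name is hypothetical, and in practice main() would be called from a Django management command. Kubernetes sends SIGTERM before stopping a pod, so the loop exits cleanly on that signal.

```python
import logging
import signal
import threading
from typing import Optional

logger = logging.getLogger("event_bus.hello_world")  # hypothetical logger name


def run_forever(stop: threading.Event, sleep_seconds: float = 5.0,
                max_iterations: Optional[int] = None) -> int:
    """Log and sleep until `stop` is set (or max_iterations, for testing)."""
    iterations = 0
    while not stop.is_set():
        logger.info("Hello World from the consumer process (iteration %d)", iterations)
        iterations += 1
        if max_iterations is not None and iterations >= max_iterations:
            break
        # Event.wait doubles as a sleep that a shutdown signal can interrupt.
        stop.wait(sleep_seconds)
    return iterations


def main() -> None:
    logging.basicConfig(level=logging.INFO)
    stop = threading.Event()
    # Exit the loop cleanly when Kubernetes asks the pod to stop.
    signal.signal(signal.SIGTERM, lambda signum, frame: stop.set())
    run_forever(stop)
```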
Discovery Questions and Notes
Requirements for Enabling Squads
Infrastructure ready
Onboarding Documentation ready
How to create a new event
Requirements for a new event
(Initially) Must have the feature using it behind a feature flag.
Must have a schema
How to consume an event
How can I learn what events exist?
How to not break the abstraction layer (if we have one)
Schemas
Do we start with FULL_TRANSITIVE as a compatibility level?
If we choose a transitive level, do we need to determine which changes (if any) we can’t make?
Or does this just slow performance?
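In Confluent Schema Registry, FULL means a new schema must be both backward compatible (new readers can read old data) and forward compatible (old readers can read new data) with the latest registered version; FULL_TRANSITIVE applies the same check against every earlier version. The checker below is a simplified sketch that models only Avro's rule for added and removed fields (a field the reader expects but the writer lacks needs a default) and ignores type changes; the schemas are hypothetical. It shows a change that passes FULL but fails FULL_TRANSITIVE. Because the registry runs these checks when a schema version is registered, the level mainly constrains which changes are allowed rather than per-message cost.

```python
from typing import Any, Dict, List

Schema = Dict[str, Any]


def can_read(reader: Schema, writer: Schema) -> bool:
    """Simplified Avro rule: reader fields missing from the writer need a default.

    Ignores type changes, promotions, aliases, and nested records.
    """
    writer_fields = {f["name"] for f in writer["fields"]}
    return all(f["name"] in writer_fields or "default" in f for f in reader["fields"])


def fully_compatible(new: Schema, old: Schema) -> bool:
    # Backward: new reader, old data. Forward: old reader, new data.
    return can_read(new, old) and can_read(old, new)


def is_allowed(new: Schema, history: List[Schema], transitive: bool) -> bool:
    """FULL checks only the latest version; FULL_TRANSITIVE checks all of them."""
    to_check = history if transitive else history[-1:]
    return all(fully_compatible(new, old) for old in to_check)


def record(*fields: Dict[str, Any]) -> Schema:
    return {"type": "record", "name": "CertificateChanged", "fields": list(fields)}


v1 = record({"name": "user_id", "type": "long"})
v2 = record({"name": "user_id", "type": "long"},
            {"name": "mode", "type": "string", "default": "audit"})
# v3 drops the default that made adding "mode" safe relative to v1.
v3 = record({"name": "user_id", "type": "long"},
            {"name": "mode", "type": "string"})
```

Here v3 is fine next to v2, but a v3 reader cannot read v1 data, which only the transitive check catches.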
Producing Events
When and how do we handle back-filling data for a new event stream with legacy data?
If we use an outbox:
Do we need to have back-pressure and fail when the outbox is too large?
Do we need to deserialize/serialize again when pulling from outbox?
Is an outbox needed? Does it ensure the event gets into the outbox as part of the DB transaction in the original request?
Could we use synchronous sends by default and only have an emergency outbox for when the bus is down?
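The emergency-outbox idea can be sketched as: send synchronously by default, store the already-serialized payload only when the bus is unreachable, and refuse new entries past a size limit as a form of back-pressure. Storing bytes means no second serialize/deserialize when draining. A minimal sketch using sqlite3 as a stand-in database; EmergencyOutbox, publish, and the table layout are hypothetical, and the injected send function stands in for a real producer.

```python
import sqlite3
from typing import Callable

Send = Callable[[str, bytes], None]


class OutboxFullError(RuntimeError):
    """Raised to apply back-pressure when the outbox reaches its limit."""


class EmergencyOutbox:
    """Holds already-serialized events while the bus is unreachable."""

    def __init__(self, conn: sqlite3.Connection, max_size: int) -> None:
        self.conn = conn
        self.max_size = max_size
        conn.execute(
            "CREATE TABLE IF NOT EXISTS outbox ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, topic TEXT NOT NULL, payload BLOB NOT NULL)"
        )

    def size(self) -> int:
        return self.conn.execute("SELECT COUNT(*) FROM outbox").fetchone()[0]

    def add(self, topic: str, payload: bytes) -> None:
        if self.size() >= self.max_size:
            raise OutboxFullError(f"outbox holds {self.max_size} events; refusing more")
        with self.conn:  # commit the insert
            self.conn.execute("INSERT INTO outbox (topic, payload) VALUES (?, ?)", (topic, payload))

    def drain(self, send: Send) -> int:
        """Resend queued events in order; raises at the first failed send."""
        rows = self.conn.execute("SELECT id, topic, payload FROM outbox ORDER BY id").fetchall()
        for row_id, topic, payload in rows:
            send(topic, payload)  # stored serialized: no re-encoding needed
            # Deleting after sending means a crash in between causes a duplicate
            # (at-least-once delivery), so consumers must be idempotent.
            with self.conn:
                self.conn.execute("DELETE FROM outbox WHERE id = ?", (row_id,))
        return len(rows)


def publish(topic: str, payload: bytes, send: Send, outbox: EmergencyOutbox) -> str:
    """Synchronous send by default; fall back to the outbox if the bus is down."""
    try:
        send(topic, payload)
        return "sent"
    except ConnectionError:
        # Events sent directly while older ones still wait in the outbox
        # arrive out of order; a real design would need to decide on this.
        outbox.add(topic, payload)
        return "queued"
```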
Abstraction Layer
We’ll need to keep this in mind and look into it further as we learn more about what we wish to abstract.
General
When updating OEP, also see https://openedx.atlassian.net/wiki/spaces/AT/pages/3133407254/Event+Bus+POC#Findings
Is it a problem that we are using the term Event Bus and Message Bus interchangeably?
Our Architecture Manifesto (WIP) more generally documents Asynchronous over Synchronous.
Our manifesto was loosely based on the Reactive Manifesto, which highlights Message Driven (in contrast to Event Driven).
Answer: No. Event Bus and Message Bus can be used interchangeably because the term Bus implies the pub/sub messaging pattern. Message Driven vs Event Driven may have different meanings, but it is unclear if that is universal or just according to the Reactive Manifesto.
Do we need to get clearer on use cases, or do we wish to find a one-size-fits-all technology?
Answer: We will be starting with pub/sub.
Documenting event definitions and event implementations (sending).
See annotations introduced in openedx-events for event definitions.
DE is collecting ownership information for events defined/emitted.
How will back-filling of events work?
When might we address questions around reliable data synchronization?
Where would this fall in terms of discovery work?
For example:
Ensuring event matches data committed to database.
Ensuring events are not lost.
Ultimately, we will want how-tos.
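For "ensuring event matches data committed to database", one common option is a transactional outbox: write the model change and the event row in the same database transaction, so either both commit or neither does, and let a separate relay (not shown) publish outbox rows to the bus. A minimal sketch using sqlite3; in edx-platform the equivalent would be Django model writes inside transaction.atomic(). Table, column, and topic names are hypothetical.

```python
import json
import sqlite3


def create_tables(conn: sqlite3.Connection) -> None:
    conn.execute(
        "CREATE TABLE certificate (user_id INTEGER, course_id TEXT, status TEXT, "
        "PRIMARY KEY (user_id, course_id))"
    )
    conn.execute(
        "CREATE TABLE outbox (id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "topic TEXT NOT NULL, payload TEXT NOT NULL)"
    )


def update_certificate(conn: sqlite3.Connection, user_id: int, course_id: str,
                       status: str, fail: bool = False) -> None:
    """Write the model change and its event in one transaction."""
    with conn:  # commits on success; rolls back both writes on any exception
        conn.execute(
            "INSERT OR REPLACE INTO certificate VALUES (?, ?, ?)",
            (user_id, course_id, status),
        )
        event = {"user_id": user_id, "course_id": course_id, "status": status}
        conn.execute(
            "INSERT INTO outbox (topic, payload) VALUES (?, ?)",
            ("certificate-changed", json.dumps(event)),
        )
        if fail:
            raise RuntimeError("simulated failure after both writes")
```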
Ownership and rollout questions regarding this infrastructure work.
How to enable early wild-west-like learning with a small subset, and open more widely as the path gets better paved.
Reminder to get feedback early and often from anyone we can.
When is the right time for a Consumer Review? Is this inform only?
Potential Materials
https://learning.oreilly.com/playlists/caa16a2b-2ce8-4211-b268-852876b0bf70/
Types of Events at edX
Signal Events: https://github.com/eduNEXT/openedx-events
https://open-edx-proposals.readthedocs.io/en/latest/oep-0041-arch-async-server-event-messaging.html
Other edX Initiatives
Message Bus Discovery: Apache Kafka (Revenue Team)
Additional Notes
Kafka Discovery
See 3 Python Libraries for Kafka Compared
In addition to choosing a Python library, we need to determine if messages are sent synchronously.
Kafka Connect or Kafka Streams
https://www.confluent.io/blog/kafka-connect-deep-dive-error-handling-dead-letter-queues/
Note: Connectors are not managed by Amazon MSK
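The linked post covers dead letter queues in Kafka Connect (configured with errors.tolerance and errors.deadletterqueue.topic.name). If we consume with our own code rather than Connect, the same idea could be applied in the consumer: a message that fails processing is parked on a separate topic with error details in headers, so one bad message does not block the rest of the partition. A minimal sketch; consume_one, the "<topic>.dlq" naming, and the header names are hypothetical, and send stands in for a real producer.

```python
import traceback
from typing import Callable, List, Tuple

Headers = List[Tuple[str, bytes]]


def consume_one(topic: str, value: bytes,
                handle: Callable[[bytes], None],
                send: Callable[[str, bytes, Headers], None]) -> bool:
    """Process one message; on failure, park it on a dead letter topic.

    Returns True if handled, False if dead-lettered.
    """
    try:
        handle(value)
        return True
    except Exception as exc:  # broad on purpose: any failure is dead-lettered
        headers = [
            ("dlq.original.topic", topic.encode()),
            ("dlq.error.class", type(exc).__name__.encode()),
            ("dlq.error.message", str(exc).encode()),
            ("dlq.error.trace", traceback.format_exc().encode()),
        ]
        send(f"{topic}.dlq", value, headers)
        return False
```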
Kafka Hosting
Configuring to not lose data
Required discovery
Runbooks
Observability
Backups/disaster recovery
Future:
Zookeeper removal: https://cwiki.apache.org/confluence/display/KAFKA/KIP-500%3A+Replace+ZooKeeper+with+a+Self-Managed+Metadata+Quorum
Addition of Tiered Storage
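For "configuring to not lose data", the usual Kafka settings are producer acks=all with enable.idempotence=true, topics with a replication factor of 3 and min.insync.replicas of 2, and unclean.leader.election.enable=false. Those config names are real Kafka settings; the checker function and its exact thresholds are a hypothetical sketch of that common guidance, not an official rule set.

```python
from typing import Any, Dict, List


def durability_problems(producer: Dict[str, Any], topic: Dict[str, Any]) -> List[str]:
    """List settings that deviate from common no-data-loss guidance."""
    problems = []
    # acks=all (equivalently -1): wait for all in-sync replicas to acknowledge.
    if str(producer.get("acks")).lower() not in ("all", "-1"):
        problems.append("producer acks should be all (-1)")
    # Idempotence prevents duplicates when the producer retries.
    if str(producer.get("enable.idempotence")).lower() != "true":
        problems.append("producer enable.idempotence should be true")
    replication = int(topic.get("replication.factor", 1))
    min_isr = int(topic.get("min.insync.replicas", 1))
    if replication < 3:
        problems.append("replication.factor should be at least 3")
    # Require more than one replica, but leave room for one broker to be down.
    if not 1 < min_isr < replication:
        problems.append("min.insync.replicas should be > 1 and < replication.factor")
    # Electing an out-of-sync replica as leader can silently drop data.
    if str(topic.get("unclean.leader.election.enable", "false")).lower() != "false":
        problems.append("unclean.leader.election.enable should be false")
    return problems
```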
Schemas
Event-Driven Architecture
Medium: Benefits and challenges of Event-Driven architecture
https://www.infoworld.com/article/3269207/busting-event-driven-myths.html
“It is also important to have strong monitoring and observability solutions in place. You need to know which service sent which events, and who is subscribed to these events. Having good visibility into the flow of events will let you understand the system and troubleshoot it with more confidence and less guessing.”
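Knowing "which service sent which events, and who is subscribed" is easier if every event carries a unique id and its source. A minimal sketch using CloudEvents-style attributes (CloudEvents 1.0 requires id, source, specversion, and type; time is optional) plus a simple record of subscribers. make_envelope and SubscriptionRegistry are hypothetical names, not part of openedx-events.

```python
import uuid
from collections import defaultdict
from datetime import datetime, timezone
from typing import Any, Dict, List, Set


def make_envelope(source: str, event_type: str, data: Dict[str, Any]) -> Dict[str, Any]:
    """Wrap event data with CloudEvents-style attributes for tracing."""
    return {
        "specversion": "1.0",
        "id": str(uuid.uuid4()),  # unique id: lets us trace and de-duplicate
        "source": source,         # which service sent the event
        "type": event_type,
        "time": datetime.now(timezone.utc).isoformat(),
        "data": data,
    }


class SubscriptionRegistry:
    """Hypothetical record of which services consume which event types."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, Set[str]] = defaultdict(set)

    def subscribe(self, service: str, event_type: str) -> None:
        self._subscribers[event_type].add(service)

    def subscribers(self, event_type: str) -> List[str]:
        return sorted(self._subscribers.get(event_type, set()))
```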
Miscellaneous discovery for message bus:
Leading book club
Pulsar vs Kafka
Naming and name-spacing
Event observability
Types of events?
Log Events
New Relic Events
Tracking Events
Data Analysis
Segment Events
Realtime Events (Partners)
xAPI and Caliper
Missing unique ids?
Intraprocess Events
(Backend) Django Signals
What changes between inter-process and intra-process events?
(Frontend) Javascript Events
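One way to see what changes between intra-process and inter-process events: a Django-style signal calls receivers synchronously with live Python objects, while forwarding the same signal to a bus requires serializing the data and gives up synchronous error propagation to the sender. A minimal sketch; Signal here is a simplified stand-in loosely modelled on Django's Signal (whose send also returns (receiver, response) pairs), and bus_bridge is hypothetical.

```python
import json
from typing import Any, Callable, List, Tuple

Receiver = Callable[..., Any]


class Signal:
    """Minimal in-process signal, loosely modelled on Django's Signal."""

    def __init__(self) -> None:
        self._receivers: List[Receiver] = []

    def connect(self, receiver: Receiver) -> None:
        self._receivers.append(receiver)

    def send(self, sender: Any, **kwargs: Any) -> List[Tuple[Receiver, Any]]:
        # Intra-process: receivers run synchronously, get live Python objects,
        # and an exception here propagates straight back to the caller.
        return [(r, r(sender=sender, **kwargs)) for r in self._receivers]


def bus_bridge(topic: str, publish: Callable[[str, bytes], None]) -> Receiver:
    """Receiver that forwards a signal onto the event bus (inter-process)."""

    def receiver(sender: Any, **kwargs: Any) -> None:
        # Inter-process: data must be serialized, so only plain values cross;
        # the consumer runs later in another service and cannot raise back here.
        publish(topic, json.dumps(kwargs, sort_keys=True).encode("utf-8"))

    return receiver
```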