Building Event-Driven Microservices Ch 4, 2021-10-20

Chapter 4 Outline

We are continuing our discussion of Chapter 4, which we barely started last time.

Chapter 4: Integrating Event-Driven Architectures with Existing Systems

  • What Is Data Liberation?

    • Compromises for Data Liberation

    • Converting Liberated Data to Events

  • Data Liberation Patterns

  • Data Liberation Frameworks

  • Liberating Data by Query

    • Bulk Loading

    • Incremental Timestamp Loading

    • Autoincrementing ID Loading

    • Custom Querying

    • Incremental Updating

    • Benefits of Query-Based Updating

    • Drawbacks of Query-Based Updating
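As a concrete illustration of the incremental timestamp pattern listed above, here is a minimal sketch in Python using SQLite; the `widgets` table, the `updated_at` column, and `load_since()` are invented for the example, not taken from the book or edX code:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE widgets (id INTEGER PRIMARY KEY, name TEXT, updated_at TEXT)"
)
conn.executemany(
    "INSERT INTO widgets (name, updated_at) VALUES (?, ?)",
    [("a", "2021-10-01T00:00:00"), ("b", "2021-10-15T00:00:00")],
)

def load_since(high_water_mark):
    # Pull only rows modified after the last successful run, then advance
    # the high-water mark to the newest timestamp seen.
    rows = conn.execute(
        "SELECT id, name, updated_at FROM widgets "
        "WHERE updated_at > ? ORDER BY updated_at",
        (high_water_mark,),
    ).fetchall()
    new_mark = rows[-1][2] if rows else high_water_mark
    return rows, new_mark

rows, mark = load_since("2021-10-10T00:00:00")  # picks up only "b"
```

The drawbacks listed in the chapter show up directly here: rows updated twice between polls are captured once, and hard deletes are never seen at all.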

  • Liberating Data Using Change-Data Capture Logs

    • Benefits of Using Data Store Logs

    • Drawbacks of Using Data Store Logs

  • Liberating Data Using Outbox Tables

    • Performance Considerations

    • Isolating Internal Data Models

    • Ensuring Schema Compatibility

    • Capturing Change-Data Using Triggers
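The outbox-table idea listed above can be sketched in miniature; this is an assumed illustration (table names and payload shape are invented), with SQLite standing in for the service's database:

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE enrollment (id INTEGER PRIMARY KEY, user TEXT);
    CREATE TABLE outbox (id INTEGER PRIMARY KEY, payload TEXT);
""")

# The row and its outbox event are written in ONE transaction, so either
# both commit or neither does. A separate relay process would later read
# the outbox, publish each payload to the event bus, and delete the row.
with conn:
    conn.execute("INSERT INTO enrollment (user) VALUES ('alice')")
    conn.execute(
        "INSERT INTO outbox (payload) VALUES (?)",
        (json.dumps({"type": "enrolled", "user": "alice"}),),
    )

pending = [json.loads(p) for (p,) in conn.execute("SELECT payload FROM outbox")]
```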

  • Making Data Definition Changes to Data Sets Under Capture

    • Handling After-the-Fact Data Definition Changes for the Query and CDC Log Patterns

    • Handling Data Definition Changes for Change-Data Table Capture Patterns

  • Sinking Event Data to Data Stores

  • The Impacts of Sinking and Sourcing on a Business

  • Summary

Discussion Notes

  • Was Chapter 4 relevant to edX?

    • Oversold

      • It only talks about integrating with legacy databases, and about many bad ways to integrate with them.

        • Bad things:

          • Publish your internal data model directly onto the event bus.

    • The main point is treating events as first-class data citizens; you need a source of truth you can trust.

    • Monoliths that can't be edited at all are not the usual problem, and not the problem edX has, so the topic felt slightly less useful.

    • These are relevant tools, but it's not clear when we should go to an event stream first.

  • In Django, we could emit the object from the post_save signal if we really wanted to emit it.
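To make the post_save idea concrete without a full Django setup, here is a pure-Python stand-in for the signal flow; `Signal`, `emit_event`, and the fake `save()` are invented for illustration (in real Django this would be a `@receiver(post_save, sender=SomeModel)` hook):

```python
# A miniature stand-in for Django's signal dispatch; everything here is
# illustrative, not real Django or edX code.
class Signal:
    def __init__(self):
        self._receivers = []

    def connect(self, fn):
        self._receivers.append(fn)

    def send(self, instance):
        for fn in self._receivers:
            fn(instance)

post_save = Signal()
emitted = []

def emit_event(instance):
    # Serialize and publish to the bus; here we just record the payload.
    emitted.append({"type": "enrollment.saved", "id": instance["id"]})

post_save.connect(emit_event)

def save(obj):
    # Pretend the DB write happens here...
    post_save.send(obj)  # ...then the signal fires; the bus publish may still fail

save({"id": 1})
```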

  • How sure are we that we want to make events the source of truth?

    • Should enrollments be events as source of truth?

      • Depends 

    • Example: we’ve got a third-party system that notifies us of a thing that happened over there via a REST API; that data could go to a stream or be saved to the DB.

      • It’s hard to believe that the distributed stream is as safe as a local MySQL database.

      • If it goes down, the events are stored in an append-only log that can be kept and replayed, so it’s easy to move them to the relevant place later.

      • If the event fails to be created, the request/response cycle could handle that just as it handles a failed write to MySQL.

      • If you buy into the event-first model, you accept that the event stream is as critical and important as your SQL database.

        • Do we need that storage mechanism, or do we just need a message bus for just-in-time events?

          • Message Bus: you get just-in-time events, but for back-filling you go to the service that’s the source of truth.

          • Event Bus: directives plus storage of historical data.

        • We could choose not to retain data forever and focus on just-in-time cross-service communication, with just enough retention to support unexpected downtime of services.

        • Perhaps this means we don’t want entity topics, which make the bus the source of truth to some degree.

          • Maybe entity topics are an anti-pattern if we want to focus on just-in-time events.
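If we did go the limited-retention route discussed above, and assuming Kafka as the bus (the book's usual example; the team's actual choice isn't stated in these notes), retention is a per-topic setting. The topic name below is hypothetical:

```shell
# Keep roughly one week of events (604800000 ms) -- enough to ride out
# service downtime, but not enough to make the topic a source of truth.
kafka-configs.sh --bootstrap-server localhost:9092 \
  --alter --entity-type topics --entity-name course-enrollments \
  --add-config retention.ms=604800000
```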

  • We have lots of services; do they share the same database?

    • Most microservices have independent databases; LMS/Studio is the exception.

  • Post Save and Data Loss

    • Post save would fire after the data is committed, but the event might fail to get onto the bus?

      • What are other failure modes that can occur?

      • Will this depend on the guarantees any particular piece of data needs?

      • Now that it’s in the DB, I have a way to trust that it will get replicated to other systems.

    • For other services, the message bus is their source of truth; they’ll need to understand the guarantees of their data source.

    • If we go the other way and put the message sending inside the transaction, we may transmit things that failed to commit.

    • If we’re worried about the source of truth, things get more complicated: a single row update is straightforward, but an event might span multiple tables.
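The "message sending inside the transaction" risk above can be demonstrated in miniature; this sketch uses SQLite and an in-memory `publish()` as stand-ins (all names invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE enrollment (id INTEGER PRIMARY KEY, user TEXT)")

published = []  # stand-in for the message bus

def publish(event):
    published.append(event)

# Publish INSIDE the transaction, then have the commit fail.
try:
    with conn:  # sqlite3's context manager commits, or rolls back on error
        conn.execute("INSERT INTO enrollment (user) VALUES ('alice')")
        publish({"type": "enrolled", "user": "alice"})
        raise RuntimeError("simulated failure before commit")
except RuntimeError:
    pass

row_count = conn.execute("SELECT COUNT(*) FROM enrollment").fetchone()[0]
# row_count is 0 (the write rolled back), yet one event went out anyway:
# consumers have been told about a write that never happened.
```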

  • Do we need clear, well-defined foreign keys in our events? That is, fields we can guarantee mean the same thing in all events, so that consumers can safely foreign-key data across events to each other locally.
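As a sketch of what well-defined foreign keys across events might look like in practice (field names like `user_id` and `course_key` are hypothetical, not a proposed edX schema), here are two events a consumer could join locally on shared key fields:

```python
# Two event payloads from different topics; both carry the same
# well-defined key fields, so a consumer can correlate them locally.
enrollment_event = {
    "type": "enrollment.created",
    "user_id": 42,
    "course_key": "course-v1:edX+DemoX+2021",
}
grade_event = {
    "type": "grade.updated",
    "user_id": 42,
    "course_key": "course-v1:edX+DemoX+2021",
    "grade": 0.95,
}

def join_on_keys(a, b, keys=("user_id", "course_key")):
    """Return the merged dict if the key fields agree, else None."""
    if all(a[k] == b[k] for k in keys):
        return {**a, **b}
    return None

joined = join_on_keys(enrollment_event, grade_event)
```

The join only works if every producer guarantees those fields mean the same thing, which is exactly the question raised above.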