Building Event-Driven Microservices Ch 3-4, 2021-09-29

Chapters 3-4 Outline

Chapter 3: Communication and Data Contracts

  • Event-Driven Data Contracts

    • Intro

      • data contract = data definition + triggering logic

    • Using Explicit Schemas as Contracts

    • Schema Definition Comments

    • Full-Featured Schema Evolution

    • Code Generator Support

    • Breaking Schema Changes

  • Selecting an Event Format

  • Designing Events

    • Tell the Truth, the Whole Truth, and Nothing but the Truth

    • Use a Singular Event Definition per Stream

    • Use the Narrowest Data Types

    • Keep Events Single-Purpose

    • Minimize the Size of Events

    • Involve Prospective Consumers in the Event Design

    • Avoid Events as Semaphores or Signals

  • Summary


Chapter 4: Integrating Event-Driven Architectures with Existing Systems

  • What Is Data Liberation?

    • Compromises for Data Liberation

    • Converting Liberated Data to Events

  • Data Liberation Patterns

  • Data Liberation Frameworks

  • Liberating Data by Query

    • Bulk Loading

    • Incremental Timestamp Loading

    • Autoincrementing ID Loading

    • Custom Querying

    • Incremental Updating

    • Benefits of Query-Based Updating

    • Drawbacks of Query-Based Updating

  • Liberating Data Using Change-Data Capture Logs

    • Benefits of Using Data Store Logs

    • Drawbacks of Using Data Base Logs

  • Liberating Data Using Outbox Tables

    • Performance Considerations

    • Isolating Internal Data Models

    • Ensuring Schema Compatibility

    • Capturing Change-Data Using Triggers

  • Making Data Definition Changes to Data Sets Under Capture

    • Handling After-the-Fact Data Definition Changes for the Query and CDC Log Patterns

    • Handling Data Definition Changes for Change-Data Table Capture Patterns

  • Sinking Event Data to Data Stores

  • The Impacts of Sinking and Sourcing on a Business

  • Summary

Discussion Notes

  • Author strongly recommends schema management.  How do we feel about schema management for events?

    • It’s always gonna have a schema; it’s a matter of how much you manage it.

    • Formal schema management is a useful tool for doing this thoughtfully at scale.

    • Where does schema management live?

      • Schema registry: holds schemas and can be used to evaluate whether a new schema is compatible with the existing schemas.

      • Compatibility mode

        • Start with full and go from there.

        • Hard to imagine not having full compatibility if we want to have a large number of consumers.

          • Counterpoint: We could deprecate old versions of schema and communicate deadlines between producer team and consumer teams.

            • This suggests another line of communication between various teams.
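To make the compatibility-mode discussion concrete, here is a minimal sketch of what a registry-style check might look like. The schema representation (a dict of field name to spec) and all function names are hypothetical simplifications, not any real registry's API; real formats also handle type promotion, unions, and nested records.

```python
from typing import Any, Dict

# Hypothetical simplified schema: field name -> spec with "type" and optional "default".
Schema = Dict[str, Dict[str, Any]]


def can_read(reader: Schema, writer: Schema) -> bool:
    """True if data written with `writer` can be read using `reader`."""
    for name, spec in reader.items():
        if name in writer:
            # In this simplified model, shared fields must keep the same type.
            if writer[name]["type"] != spec["type"]:
                return False
        elif "default" not in spec:
            # Reader expects a field the writer never wrote, with no fallback.
            return False
    return True


def is_backward_compatible(new: Schema, old: Schema) -> bool:
    # Consumers on the new schema can read data produced with the old one.
    return can_read(reader=new, writer=old)


def is_forward_compatible(new: Schema, old: Schema) -> bool:
    # Consumers still on the old schema can read data produced with the new one.
    return can_read(reader=old, writer=new)


def is_fully_compatible(new: Schema, old: Schema) -> bool:
    return is_backward_compatible(new, old) and is_forward_compatible(new, old)


# Illustrative enrollment schemas (field names are assumptions).
v1 = {"user_id": {"type": "string"}, "course_id": {"type": "string"}}
v2_ok = {**v1, "mode": {"type": "string", "default": "audit"}}
v2_bad = {**v1, "mode": {"type": "string"}}  # new required field, no default

print(is_fully_compatible(v2_ok, v1))
print(is_fully_compatible(v2_bad, v1))
```

In this sketch, adding a field with a default keeps full compatibility, while adding a required field breaks backward compatibility only. That is the kind of change the deprecation-with-deadlines counterpoint would have to coordinate between teams.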

  • Design Section

    • Many of the principles seemed to conflict.

      • One event definition per stream.

        • How do we handle CRUD? Separate streams so we have one for each action. That seems very heavy and we’d have to worry about ordering at the consumer.

      • The idea is to not overload your entity topics.

        • Let a thousand streams bloom but each stream should be one entity.

      • Where ordering matters, we may want to push away from one event per stream, to be able to reason about when events happened.

      • There’s pure click stream, entity streams, and there are things in between.

        • E.g., I need to take an action when an enrollment occurs.

        • E.g., user used to pass and used to fail.

        • Some things may need to be shortcut by IDs so event sizes don’t blow up. The referenced IDs would be the keys in entity events.

    • When mapping from tables to entities, how do you deal with IDs and foreign keys?

      • Might have to think critically about which domain concept you want to convey. This might be at odds with how it’s laid out in a relational database.

      • Be mindful of entities growing too big.

      • But also be careful about pushing a bunch of IDs in a message and then seeing a bunch of callbacks to fetch the data for those IDs.
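One way to picture the "one entity per stream" idea together with ID references is the sketch below. It assumes a single keyed stream carries creates, updates, and deletes for one entity type (a `None` value is treated as a delete, in the style of a compacted-topic tombstone). A consumer materializes that stream locally and resolves referenced IDs against it instead of calling back to the producer. All names and sample data are hypothetical.

```python
from typing import Dict, Iterable, Optional, Tuple

# (key, value) pairs; a value of None is treated as a delete (tombstone).
EntityEvent = Tuple[str, Optional[dict]]


def materialize(events: Iterable[EntityEvent]) -> Dict[str, dict]:
    """Fold an ordered entity stream into the latest state per key."""
    state: Dict[str, dict] = {}
    for key, value in events:
        if value is None:
            state.pop(key, None)  # delete
        else:
            state[key] = value    # create or update (upsert)
    return state


# Creates, updates, and deletes share one stream, so per-key ordering is preserved.
course_stream = [
    ("course-1", {"title": "Intro to Streams"}),
    ("course-2", {"title": "Schemas 101"}),
    ("course-1", {"title": "Intro to Event Streams"}),
    ("course-2", None),
]
courses = materialize(course_stream)

# An enrollment event stays small by referencing the course by ID only.
enrollment = {"user_id": "user-7", "course_id": "course-1"}
enriched = {**enrollment, "course": courses.get(enrollment["course_id"])}
print(enriched)
```

The trade-off raised in the discussion still applies: this only avoids callbacks if the consumer is willing to keep its own materialized copy of the entity stream.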

  • The idea of redundancy is not really built into the messaging systems.

    • Keeping track of what the context was of a change.

    • E.g., indicate what the enrollment mode was before the change and what it is after this event.
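A minimal sketch of an event that carries its own change context might look like this. The event name, field names, and mode values ("audit", "verified") are assumptions for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EnrollmentModeChanged:
    """Hypothetical event recording both sides of a mode change."""
    user_id: str
    course_id: str
    previous_mode: str  # mode before the change
    new_mode: str       # mode after this event


def is_upgrade_to_verified(event: EnrollmentModeChanged) -> bool:
    # The consumer can decide from the event alone, without looking up prior state.
    return event.previous_mode != "verified" and event.new_mode == "verified"


event = EnrollmentModeChanged(
    user_id="user-7",
    course_id="course-1",
    previous_mode="audit",
    new_mode="verified",
)
print(is_upgrade_to_verified(event))
```

Carrying the previous value is redundant with the stream's history, but it spares each consumer from reconstructing that history just to interpret one event.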

  • Do we want to keep entity events and subsets of entity events?

    • Sometimes you might not care about the underlying event but a meta concept on top of it.

      • Grade change vs. passing a course.
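The "meta concept on top of the underlying event" could be sketched as a small derivation step, shown below. It assumes grade-change events arrive in order per user and course. The passing threshold, event shapes, and function name are all hypothetical.

```python
from typing import Dict, Iterable, Iterator, Tuple

PASSING_GRADE = 0.6  # hypothetical threshold


def derive_course_passed(grade_changes: Iterable[dict]) -> Iterator[dict]:
    """Emit a course_passed event when a grade first crosses the threshold."""
    passing: Dict[Tuple[str, str], bool] = {}
    for change in grade_changes:
        key = (change["user_id"], change["course_id"])
        now_passing = change["grade"] >= PASSING_GRADE
        # Only emit on the transition from not passing to passing.
        if now_passing and not passing.get(key, False):
            yield {
                "type": "course_passed",
                "user_id": change["user_id"],
                "course_id": change["course_id"],
            }
        passing[key] = now_passing


grade_changes = [
    {"user_id": "u1", "course_id": "c1", "grade": 0.4},
    {"user_id": "u1", "course_id": "c1", "grade": 0.7},
    {"user_id": "u1", "course_id": "c1", "grade": 0.9},
]
passed_events = list(derive_course_passed(grade_changes))
print(passed_events)
```

Consumers that only care about passing can subscribe to the derived stream, while the full grade-change stream remains available for anyone who needs the detail.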

  • Author is pushing Kafka

    • Expectation is that it’s treated more akin to the SQL database behind an app.

    • Commit to the data store as a core data store that can make high reliability guarantees.

    • We don’t say what to do if the database falls out of sync.

  • Data Liberation

    • Saying “I’m going to entity stream everything” amounts to saying “I don’t like encapsulation.”

    • Exposing your entire internal RDBMS schema makes change management very complex and schema management more difficult.

    • “Things happen and you need to react to them” is a core part of the business. The book provides many strategies, but it’s up to you to build good events that give you enough context to take the correct business actions.