...
- Continuous Integration
- CI server detects code is committed, verifies code and runs tests
- Versioned Artifacts are also created for further validation and usage in downstream deployments
- Confirms that the artifacts deployed are the ones tested
- Reused without continual recreation
- Traceability back to the commit
- 3 questions from Jez Humble on whether you're really doing it
- Do you check in to mainline once per day?
- Even if you are using short-lived branches, integrate frequently
- Do you have a suite of tests to validate your changes?
- When the build is broken, is it the #1 priority of the team to fix it?
- Repo
- Single repo and single CI build for all microservices
- requires lock-step releases
- OK for an early stage and for a short period of time
- Cycle time impacted - speed of moving a single change to being live
- Ownership issues
- Single repo with separate CI builds mapping to different parts of the source tree
- Better than the above
- Can get into the habit of slipping changes that couple services together.
- Separate repo with separate CI builds for each microservice
- Faster development
- Clearer ownership
- More difficult to make changes across repos - can be easier via command-line scripts
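Cross-repo changes can be scripted. A minimal sketch (repo names and branch are hypothetical) that collects the git commands for a change spanning several service repos, as a dry run rather than executing them:

```python
# Sketch of scripting one change across several service repos,
# since each microservice lives in its own repository.
# Commands are collected (dry run) rather than executed.

REPOS = ["catalog-service", "order-service", "user-service"]  # hypothetical

def cross_repo_commands(repos, branch, message):
    commands = []
    for repo in repos:
        commands += [
            f"git -C {repo} checkout -b {branch}",
            f"git -C {repo} commit -am '{message}'",
            f"git -C {repo} push origin {branch}",
        ]
    return commands

cmds = cross_repo_commands(REPOS, "update-logging", "Bump logging library")
```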
- Continuous Delivery
- Treat each check-in as a release candidate, getting constant feedback on its production readiness
- Build pipeline
- One stage for faster tests and another stage for slower tests
- Feel more confident about the change as it goes through the pipeline
- Fast tests → Slow tests → User acceptance tests → Performance tests → Production
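The staged pipeline above can be sketched as follows (names are illustrative): each release candidate moves through progressively slower stages and stops at the first failure, so cheap feedback comes first.

```python
# Sketch of a build pipeline: a release candidate passes through
# progressively slower stages, stopping at the first failing stage.

STAGES = ["fast tests", "slow tests", "user acceptance tests",
          "performance tests", "production"]

def run_pipeline(candidate, run_stage):
    """Run each stage in order; return the stages that passed."""
    passed = []
    for stage in STAGES:
        if not run_stage(stage, candidate):
            break  # fail fast: later, slower stages never run
        passed.append(stage)
    return passed

# Example: a candidate that fails its user acceptance tests
result = run_pipeline("commit-abc123",
                      lambda stage, c: stage != "user acceptance tests")
# result == ["fast tests", "slow tests"]
```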
- One microservice per build
- Is the goal
- However, while service boundaries are still being defined, a single build for all services reduces the cost of cross-service changes
- Deployable Artifacts
- Platform-specific
- OS-specific
- Images
- Environments for each pipeline stage and different deployments
- different collection of configurations and hosts
- Service configuration
- Keep configuration decoupled from artifacts
- Or use a dedicated config system
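The decoupling can be sketched like this (all names hypothetical): one immutable artifact is promoted through every environment, and only the configuration looked up at deploy time differs.

```python
# Sketch: the same tested artifact is deployed everywhere;
# configuration lives outside it (e.g. in a dedicated config system).

ARTIFACT = "catalog-service-1.4.2.jar"  # built once, never rebuilt per env

CONFIG = {  # kept outside the artifact
    "staging":    {"db_host": "db.staging.internal", "log_level": "DEBUG"},
    "production": {"db_host": "db.prod.internal",    "log_level": "WARN"},
}

def deploy(environment):
    """Pair the same tested artifact with environment-specific config."""
    return {"artifact": ARTIFACT, "config": CONFIG[environment]}

staging = deploy("staging")
prod = deploy("production")
assert staging["artifact"] == prod["artifact"]   # same binary everywhere
assert staging["config"] != prod["config"]       # only config varies
```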
- Service to host mapping
- Multiple services per host
- Coupling, even with Application Containers
- Single service per host
- Monitoring and remediation much easier
- Isolation of failure
- Independent scaling
- Automation mitigates amount of overhead
- Virtualization
- Vagrant - can be taxing on a developer machine when running a lot of VMs at once
- Linux Containers
- Docker - Vagrant can host a Docker instance
Ch 7. Testing
- Brian Marick's Testing Quadrant from Agile Testing by Lisa Crispin and Janet Gregory
- Mike Cohn's Test Pyramid from Succeeding with Agile
- Going up the pyramid
- scope increases, confidence increases, test time increases
- when a test breaks, it is harder to diagnose the cause
- each layer should have an order of magnitude fewer tests than the layer below
- When broader-scoped test fails, write unit regression test
- Test Snow cone - or inverted pyramid - doesn't work well with continuous integration due to slow test cycles
- Unit tests
- fast feedback on functionality
- thousands run in less than a minute
- limiting use of external files or network connections
- outcomes of test-driven development and property-based testing
- Service tests
- bypass UI
- At the API layer for web services
- Can stub out external collaborators to decrease test times
- Cover more scope than unit tests, but less brittle than larger-scoped tests
- Mocking or stubbing?
- Martin Fowler's Test Doubles
- Stubbing is preferable
- Mocking is brittle as it requires more coupling with the fake collaborators
- End-to-end (E2E) journey tests
- GUI/browser-driven tests
- Higher degree of confidence when they pass, but slower and trickier - especially with microservices
- Tricky
- Which versions of the services to test against?
- Duplication of effort across different service owners
- => Single suite of end-to-end tests - fan-in from individual pipelines
- Downsides
- Flaky and brittle
- => Remove flaky tests to retain faith in them
- => See if they can be replaced with smaller-scoped tests
- Lack of ownership
- => Shared codebase with joint ownership
- Long duration
- Large end-to-end test suites take days to run
- Failures can take hours to diagnose
- => Can run in parallel
- => Remove unneeded tests
- Pile-up of issues
- => release small changes more frequently
- Don't create a meta-version of all services that were tested together - results in coupling of services again
- Establish agreed-upon core journeys that are to be tested
- Each user story should NOT result in an end-to-end test
- Very small number (low double digits even for complex systems)
- Consumer-driven contract (CDC) tests
- Interfaces/expectations by consumers are codified
- Pact - open-source consumer-driven testing tool
- Codifying discussions between consumers and producers - a test failure triggers a conversation
- E2E tests are training wheels for CDC
- E2E tests can be a useful safety net, trading off cycle time for decreased risk
- Slowly reduce reliance on E2E tests so they are no longer needed
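The CDC idea can be sketched as data plus a verification step (all names hypothetical; a real tool like Pact does this with recorded request/response pairs): the consumer codifies the fields it relies on, and the provider's build verifies its responses against every consumer's contract.

```python
# Sketch of a consumer-driven contract: the consumer records its
# expectations; the provider's pipeline verifies against them, and a
# failure triggers a conversation between the two teams.

CONSUMER_CONTRACT = {  # recorded by the consumer team's tests
    "consumer": "web-shop",
    "request": {"path": "/customers/42"},
    "expected_fields": {"id": int, "name": str},
}

def verify_contract(contract, provider_response):
    """Run in the provider's build against its real response shape."""
    for field, field_type in contract["expected_fields"].items():
        if not isinstance(provider_response.get(field), field_type):
            return False
    return True

# Extra fields are fine: the provider can evolve without breaking consumers
assert verify_contract(CONSUMER_CONTRACT, {"id": 42, "name": "Ada", "tier": "gold"})
assert not verify_contract(CONSUMER_CONTRACT, {"id": "42"})  # wrong type, missing name
```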
- Post-deployment Testing
- Acknowledge that we cannot find all errors before production
- Smoke test suite runs once code is on prod
- Blue/green deployment
- allows quick rollback if needed
- zero downtime deployments
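Blue/green can be sketched as two copies of the environment behind a router (names illustrative): the new version goes to the idle copy, and both cutover and rollback are just repointing the router.

```python
# Sketch of blue/green deployment: deploy to the idle environment,
# then switch the router; rollback is the same switch in reverse.

class Router:
    def __init__(self):
        self.live = "blue"

    def switch(self):
        self.live = "green" if self.live == "blue" else "blue"

router = Router()
deployments = {"blue": "v1", "green": None}

deployments["green"] = "v2"   # deploy new version to the idle environment
router.switch()               # cut over with zero downtime
assert router.live == "green"
router.switch()               # rollback is equally fast
assert router.live == "blue"
```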
- Canary releasing
- to verify new version is performing as expected (error rates, response time, etc) under real traffic
- versions coexist for a longer period of time
- either a subset of production traffic or shadow of full production traffic
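Routing a subset of traffic to the canary can be sketched deterministically (the percentage and request IDs are illustrative); in practice the comparison would be on error rates and response times between the two versions.

```python
# Sketch of canary routing: send a fixed fraction of requests to the
# new version while the rest stay on the stable one.

def route(request_id, canary_percent=10):
    """Deterministically send ~canary_percent of traffic to the canary."""
    return "canary" if request_id % 100 < canary_percent else "stable"

counts = {"canary": 0, "stable": 0}
for request_id in range(1000):
    counts[route(request_id)] += 1

assert counts == {"canary": 100, "stable": 900}
```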
- Mean time between failures (MTBF) versus Mean time to repair (MTTR)
- Optimizing for MTTR over MTBF allows for less time spent on functional test suites and faster time to customer value/testing an idea
- Cross-functional Testing (a.k.a., nonfunctional requirements)
- Fall into the Property Testing quadrant
- Performance Tests
- Follow the pyramid as well - performance tests at the unit level as well as end-to-end via load tests
- Have a mix - of isolated tests as well as journey tests
- load tests
- gradually increase number of simulated customers
- system matches prod as much as possible
- may get false positives in addition to false negatives
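The gradual ramp can be sketched like this (the latency model is a stand-in for real measurements): the number of simulated customers grows step by step, making the load level where response times degrade visible.

```python
# Sketch of a load-test ramp: increase simulated customers in steps;
# in practice each step would issue real requests and record latencies.

def ramp(start=10, step=10, max_users=50):
    users = start
    while users <= max_users:
        yield users
        users += step

def simulated_latency_ms(users, capacity=35):
    # stand-in for a real measurement: latency degrades past capacity
    return 50 if users <= capacity else 50 * (users - capacity)

schedule = list(ramp())
results = {u: simulated_latency_ms(u) for u in schedule}
# flag load levels where latency exceeds an illustrative 200 ms target
breaking = [u for u, ms in results.items() if ms > 200]
```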
- Test runs
- Run a subset of tests every day; larger set every week
- As regularly as possible, so failures can be isolated to specific commits
- Have targets with clear call-to-action, otherwise results may be ignored
Ch 8. Monitoring
- With microservices, multiple servers, multiple log files, etc.
- Make log files centrally available/searchable
- Metric tracking
- aggregated sampling across time, across services
- but still see data for individual services and individual instances
- Synthetic transaction (a.k.a., semantic monitoring)
- fake events to ensure behavior
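Semantic monitoring can be sketched as follows (all names are illustrative): a clearly marked fake event is injected into the running system, and the check verifies the expected end-to-end behavior rather than any single metric.

```python
# Sketch of a synthetic transaction: submit a fake order, then verify
# it flowed through the system as expected.

def place_synthetic_order(submit, query):
    """Submit a clearly-marked fake order, then verify its outcome."""
    order_id = submit({"customer": "synthetic-monitor", "items": ["test-sku"]})
    return query(order_id) == "confirmed"

# Faked endpoints standing in for real services
orders = {}
def submit(order):
    orders["o-1"] = "confirmed"
    return "o-1"

assert place_synthetic_order(submit, orders.get)
```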
- Correlation IDs
- unique identifier (e.g., GUID) used to track a transaction across services/boundaries
- one path missing a correlation ID will break the monitoring
- having a thin shared client wrapper library can help
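Such a wrapper can be sketched like this (the transport is a stand-in for a real HTTP client): every outgoing call gets the same correlation ID attached, so one transaction can be traced across services.

```python
# Sketch of a thin shared client wrapper that attaches a correlation ID
# to every outgoing call.
import uuid

class TracingClient:
    def __init__(self, transport, correlation_id=None):
        self.transport = transport
        self.correlation_id = correlation_id or str(uuid.uuid4())

    def call(self, service, payload):
        headers = {"X-Correlation-ID": self.correlation_id}
        return self.transport(service, payload, headers)

seen = []
def fake_transport(service, payload, headers):
    seen.append((service, headers["X-Correlation-ID"]))
    return "ok"

client = TracingClient(fake_transport, correlation_id="abc-123")
client.call("payment-service", {"amount": 10})
client.call("email-service", {"to": "a@b.c"})
assert seen == [("payment-service", "abc-123"), ("email-service", "abc-123")]
```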
- Cascading failures
- monitor integration points between systems
- Service
- response time, error rates, application-level metrics
- health of downstream services
- monitor OS to track rogue processes and for capacity planning
- System
- aggregate host-level metrics with application-level metrics, but can drill down to individual hosts
- maintain long-term data to measure trends
- standardize on
- tools
- log formats
- correlation IDs
- call to action with alerts and dashboards
- Future - common, generic system for business metrics and operation metrics to monitor system in a more holistic way
Ch 9. Security
Ch 10. Conway's Law and System Design
- Organizational structure is a strong influence on the structure of the system
- Microservices are modeled after business domains, not technical ones
- Teams aligned along bounded contexts, as services are.
- Align service ownership to co-located teams, which are aligned around the same bounded contexts of the organization.