...

Ch 11. Microservices at Scale

At scale, the main concerns are failures (which become statistically likely) and performance.
Without clear requirements it is easy to over-optimize, so establish targets for:
- response time/latency
- availability
- durability of data
Graceful degradation: when a dependency fails, serve reduced functionality instead of failing the whole request
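A minimal Python sketch of graceful degradation, assuming a hypothetical product page that depends on a recommendations service. `with_fallback`, `fetch_recommendations` and `render_product_page` are illustrative names, not from any library; the point is that the failing dependency removes one feature rather than the whole page.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def with_fallback(call: Callable[[], T], fallback: T) -> T:
    """Return call()'s result, or a degraded fallback value if the call fails."""
    try:
        return call()
    except Exception:
        # Degrade gracefully: serve a reduced response instead of failing the request.
        return fallback


def fetch_recommendations() -> list[str]:
    # Hypothetical downstream call that is currently failing.
    raise ConnectionError("recommendation service unavailable")


def render_product_page(product: str) -> dict:
    # The page still renders; it just shows no recommendations.
    return {
        "product": product,
        "recommendations": with_fallback(fetch_recommendations, []),
    }


print(render_product_page("widget"))  # prints {'product': 'widget', 'recommendations': []}
```

Catching `Exception` broadly keeps the sketch short; a real caller would usually catch only the errors it expects from that dependency.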
Architectural safety measures
- Antifragile organization (term from Nassim Taleb)
  - Netflix and Google intentionally cause failures to make sure their systems can survive them
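- Timeouts
  - too long: a slow downstream call ties up resources and slows the whole system
  - too short: calls that might have succeeded are treated as failures
  - put timeouts on all out-of-process calls, pick a default, log when they fire, then monitor and adjust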
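- Circuit Breakers
  - After a certain number of failed requests, the breaker opens and later calls fail fast
    - gracefully degrade or return an error
    - queue the request for later if the work is asynchronous
  - After a cool-down period, send a few trial requests; if they succeed, reset (close) the breaker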
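- Bulkheads
  - Like a ship's bulkheads: one compartment can flood while the rest of the ship stays intact
  - Isolate resources per dependency, e.g. a separate connection pool for each downstream service
  - Separation of concerns: splitting functionality into separate microservices limits how far a failure spreads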
- Timeouts and circuit breakers help free up resources when they become constrained
- Bulkheads ensure they don't become constrained in the first place
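A minimal, single-threaded Python sketch of a circuit breaker, assuming a consecutive-failure threshold and a fixed cool-down. `CircuitBreaker`, `CircuitOpenError` and the parameter names are hypothetical; production libraries differ in detail (for example, letting several trial requests through rather than one). A timed-out call is treated like any other failure, which is how timeouts and circuit breakers work together.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(Exception):
    """Raised when the breaker is open and the call is rejected without being attempted."""


class CircuitBreaker:
    def __init__(self, failure_threshold: int = 3, reset_timeout: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold  # consecutive failures before opening
        self.reset_timeout = reset_timeout          # seconds to wait before a trial call
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None      # None means the breaker is closed

    def call(self, fn: Callable[[], T]) -> T:
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout:
                # Open: fail fast instead of waiting on an unhealthy dependency.
                raise CircuitOpenError("circuit open; failing fast")
            # Half-open: the cool-down has elapsed, so let a trial request through.
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                # Trial failed or threshold reached: (re)open and restart the cool-down.
                self.opened_at = self.clock()
            raise
        # Success: reset (close) the breaker.
        self.failures = 0
        self.opened_at = None
        return result


# Usage with a fake clock so the cool-down can be simulated.
now = [0.0]
breaker = CircuitBreaker(failure_threshold=2, reset_timeout=10.0, clock=lambda: now[0])


def flaky() -> str:
    raise TimeoutError("downstream timed out")


for _ in range(2):
    try:
        breaker.call(flaky)
    except TimeoutError:
        pass

try:
    breaker.call(flaky)  # rejected without calling flaky()
except CircuitOpenError:
    print("failed fast")

now[0] = 11.0  # cool-down elapsed
print(breaker.call(lambda: "ok"))  # trial succeeds and the breaker resets; prints ok
```

A caller that catches `CircuitOpenError` is where the degrade, error, or queue-for-later decision from the notes would go. This sketch is not thread-safe; concurrent callers would need a lock around the state.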
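An in-process bulkhead can be approximated by capping concurrent calls per downstream dependency, similar in spirit to a separate connection pool per service. A Python sketch, assuming hypothetical `Bulkhead` and `BulkheadFullError` classes and made-up dependency names; it rejects immediately when a compartment is full rather than queueing.

```python
import threading
from typing import Callable, TypeVar

T = TypeVar("T")


class BulkheadFullError(Exception):
    """Raised when a dependency's compartment has no free slots."""


class Bulkhead:
    """Caps concurrent calls to one downstream dependency (hypothetical helper)."""

    def __init__(self, max_concurrent: int) -> None:
        self._slots = threading.BoundedSemaphore(max_concurrent)

    def call(self, fn: Callable[[], T]) -> T:
        # Reject rather than wait, so a slow dependency cannot soak up every thread.
        if not self._slots.acquire(blocking=False):
            raise BulkheadFullError("no free slots for this dependency")
        try:
            return fn()
        finally:
            self._slots.release()


# One compartment per downstream service, so exhausting one leaves the others usable.
bulkheads = {"inventory": Bulkhead(max_concurrent=2), "payments": Bulkhead(max_concurrent=5)}

entered = threading.Semaphore(0)
release = threading.Event()


def hung_inventory_call() -> str:
    entered.release()  # signal that this call now holds a slot
    release.wait()     # simulate a downstream call that does not return promptly
    return "stock"


threads = [threading.Thread(target=bulkheads["inventory"].call, args=(hung_inventory_call,))
           for _ in range(2)]
for t in threads:
    t.start()
for _ in threads:
    entered.acquire()  # wait until both inventory slots are taken

try:
    bulkheads["inventory"].call(lambda: "stock")
except BulkheadFullError:
    print("inventory compartment full; rejected")
print(bulkheads["payments"].call(lambda: "charged"))  # unaffected compartment; prints charged

release.set()
for t in threads:
    t.join()
```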