...
Ch 11. Microservices at Scale
- At scale, failures become statistically likely, so failure handling and performance both become concerns
- Easy to over-optimize unless you know the requirements for:
- response time/latency
- availability
- durability of data
- Graceful degradation
- Architectural safety measures
- Antifragile organization (idea from Nassim Taleb's book Antifragile)
- intentionally causing failures at Netflix and Google
- Timeouts
- too long: slow downstream calls slow down the whole system
- too short: calls that would have succeeded get treated as failures (false negatives)
- choose defaults and log → monitor → adjust
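The "choose defaults, log, monitor, adjust" loop above can be sketched roughly as follows. This is a minimal illustration, not a recommended implementation: `call_with_timeout`, `DEFAULT_TIMEOUT_S`, and the one-second default are assumptions made for the example, and a real service would normally rely on the timeout support in its HTTP or RPC client.

```python
import logging
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout

log = logging.getLogger("timeouts")

DEFAULT_TIMEOUT_S = 1.0  # assumed starting default; tune it from the logged durations
_pool = ThreadPoolExecutor(max_workers=4)


def call_with_timeout(fn, *args, timeout_s=DEFAULT_TIMEOUT_S):
    """Run fn(*args), give up waiting after timeout_s, and log how long we waited."""
    start = time.monotonic()
    outcome = "error"
    future = _pool.submit(fn, *args)
    try:
        result = future.result(timeout=timeout_s)
        outcome = "ok"
        return result
    except FutureTimeout:
        # The caller stops waiting; the worker thread itself keeps running.
        outcome = "timeout"
        raise
    finally:
        elapsed = time.monotonic() - start
        # These log lines are the raw material for monitoring and adjusting the default.
        log.info("call=%s outcome=%s elapsed=%.3fs timeout=%.3fs",
                 getattr(fn, "__name__", "?"), outcome, elapsed, timeout_s)
```

Logging the elapsed time next to the configured timeout makes it possible to see whether the default is cutting off calls that would have succeeded or letting slow calls hold resources too long.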
- Circuit Breakers
- Fail fast after a certain number of failures
- gracefully degrade or error
- queue for later if async
- Reset the breaker once a certain threshold of trial requests succeeds again
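A minimal sketch of the circuit breaker behaviour described above, under some assumptions: the names `CircuitBreaker` and `CircuitOpenError`, the threshold of three failures, and the 30-second cool-down are all hypothetical, and the reset rule is simplified to a single successful trial call after the cool-down.

```python
import time


class CircuitOpenError(Exception):
    """Raised when the breaker is open and the call is not attempted."""


class CircuitBreaker:
    def __init__(self, max_failures=3, reset_after_s=30.0, clock=time.monotonic):
        self.max_failures = max_failures    # failures tolerated before opening
        self.reset_after_s = reset_after_s  # cool-down before a trial call is allowed
        self.clock = clock                  # injectable so the sketch can be tested
        self.failures = 0
        self.opened_at = None               # None means the breaker is closed

    def call(self, fn, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after_s:
                # Open: fail fast without touching the downstream service.
                raise CircuitOpenError("downstream marked unhealthy")
            # Cool-down elapsed: fall through and let one trial call proceed.
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()  # open, or re-open after a failed trial
            raise
        # Success: close the breaker and clear the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

A caller would catch `CircuitOpenError` and either degrade gracefully, return an error, or queue the work for later if the operation is asynchronous.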
- Bulkheads
- Lose a part of the ship but rest remains intact
- Separation of concerns via separate microservices
- Timeouts and circuit breakers help free up resources when they become constrained
- Bulkheads ensure they don't become constrained in the first place
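One way to picture a bulkhead inside a single service is a separate, bounded set of slots per downstream dependency, so a slow dependency can exhaust only its own slots. The sketch below is an illustration under assumptions: `Bulkhead`, `BulkheadFullError`, the dependency names, and the limit of two concurrent calls are all hypothetical.

```python
import threading


class BulkheadFullError(Exception):
    """Raised when a dependency's compartment has no free slots."""


class Bulkhead:
    def __init__(self, max_concurrent):
        # Each bulkhead owns its own slots; nothing is shared between dependencies.
        self._slots = threading.BoundedSemaphore(max_concurrent)

    def call(self, fn, *args, **kwargs):
        # Reject immediately instead of queueing behind a saturated dependency.
        if not self._slots.acquire(blocking=False):
            raise BulkheadFullError("no capacity left for this dependency")
        try:
            return fn(*args, **kwargs)
        finally:
            self._slots.release()


# One compartment per downstream service (hypothetical names).
bulkheads = {
    "inventory": Bulkhead(max_concurrent=2),
    "recommendations": Bulkhead(max_concurrent=2),
}
```

If every slot for `recommendations` is held by slow calls, further calls to it are rejected at once, while calls through `inventory` still proceed.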