Health Checks and Graceful Degradation in Distributed Systems

The Two Types of Health Checks

Health is a Spectrum, not a Binary Taxonomy

The Need for Feedback Loops when applying Backpressure

Image from my presentation on the Prometheus monitoring system at Google NYC in November 2016
Alert taken from my presentation on the Prometheus monitoring system at OSCON in May 2017
Myriad forms of rate limiting and load shedding techniques
  1. Applied Performance Theory, Kavya Joshi from QCon London 2018
  2. Queueing Theory in Practice: Performance Modeling for the Working Engineer, Eben Freeman from LISA 2017
  3. Stop Rate Limiting — Capacity Planning Done Right, Jon Moore from Strangeloop 2017
  4. Predictive Load Balancing: Unfair but Faster and More Robust, Steve Gury from Strangeloop 2017
  5. The chapters on Handling Overload and Addressing Cascading Failures from the SRE Book

Conclusion

@copyconstruct on Twitter. Views expressed on this blog are solely mine, not those of present or past employers.

Cindy Sridharan