Testing Microservices, the sane way

The mainspring of microservices

The ability to develop, deploy and scale different business functionality independently is one of the most touted benefits of adopting a microservices architecture.

Full Stack in a Box — A Cautionary Tale

I often see or hear about many companies trying to replicate the entire service topology on developer laptops locally. I’ve had first hand experience with this fallacy at a previous company I worked at where we tried to spin up our entire stack in a Vagrant box. The Vagrant repo itself was called something along the lines of “full-stack in a box”, and the idea, as you might imagine, was that a simple vagrant up should enable any engineer in the company (even frontend and mobile developers) to be able to spin up the stack in its entirety on their laptops.

Building and Testing microservices, monolith style

What I’ve since come to realize after speaking with friends is that this is a problem that plagues not just startups but also a vast number of much larger organizations. Over the course of the last few years I’ve heard enough anecdotal evidence about the inherent fragility of this setup and the maintenance costs of the machinery required to support it that I now believe trying to spin up the full stack on developer laptops is fundamentally the wrong mindset to begin with, be it at startups or at bigger companies.

The Spectrum of Testing

Historically, testing has been something that referred to a pre-production or pre-release activity. Some companies employed — and continue to employ — dedicated teams of testers or QA engineers whose sole responsibility was to perform manual or automated tests for the software built by development teams. Once a piece of software passed QA, it was handed over to the Operations team to run (in the case of services) or shipped as a product release (in the case of desktop software or games and what have you).

Testing in Production as a substitute to Pre-Production testing?

I’ve previously written in great detail about “post-production testing” from primarily an Observability standpoint. Monitoring is a form of post-production testing, as is alerting, exploration and dynamic instrumentation. It might not even be stretching it to call techniques like feature flagging and gating as forms of testing in production. User interaction or user experience measurement — often performed using techniques like A/B testing and real user monitoring — can be interpreted as a form of testing in production as well.

What to test in production and what to test pre-production?

Inasmuch as testing of services is a spectrum, it’s important for both forms of testing to be primary considerations at the time of system design (both architecture as well as code), since it then unlocks the ability to decide what functionality of a system absolutely must be verified pre-production and what characteristics (more like a very long tail of idiosyncrasies) lend themselves better to being explored in production with the help of more comprehensive instrumentation and tooling.

From Charity Majors’ Strangeloop 2017 talk

Exploration is NOT for pre-production testing

Exploratory testing is an approach to testing that has been around since the 80’s. Practiced mostly by professional testers, exploratory testing was something deemed to require less preparation on the part of the tester, uncover crucial bugs and prove to be more “intellectually stimulating than execution of scripted tests”. I’ve never been a professional tester nor worked in an organization that had a separate team of software testers so that might explain why I only learned about this form of testing recently.

Operational semantics of the application

This includes writing code while concerning oneself with questions such as:

Operational characteristics of the dependencies

We build on top of increasingly leaky (and oftentimes frangible) abstractions with failure modes that are not well-understood. Examples of such characteristics I’ve had to be conversant with in the last three years have been:

Debuggable code

Writing debuggable code involves being able to ask questions in the future, which in turn involves:

Pre-production testing

Having made the case for a hybrid approach to testing, let’s get to the crux of what the rest of this post is about — pre-production testing of microservices.

The Goal of pre-production testing

As stated previously, I view pre-production testing as a best effort verification of the correctness of a system as well as a best effort simulation of the known failure modes. The goal of pre-production testing, as such, isn’t to prove there aren’t any bugs (except perhaps in parsers and any application that deals with money or safety), but to assure that the known-knowns are well covered and the known-unknowns have instrumentation in place for.

The Scope of pre-production testing

The scope of pre-production testing is only as good as our ability to conceive good heuristics that might prove to be a precursor of production bugs. This includes being able to approximate or intuit the boundaries of the system, the happy code paths (success cases) and more importantly the sad paths (error and exception handling), and continuously refine these heuristics over time.

Unit testing

Microservices are built on the notion of splitting up units of business logic into standalone services in-keeping with the single responsibility principle, where every individual service is responsible for a standalone piece of business or infrastructural functionality. These services then communicate with each other over the network either via some form of synchronous RPC mechanism or asynchronous message passing.

Mike Cohn’s test pyramid

Not all I/O is equal

Cory Benfield gave a great talk at PyCon 2016, where he argued that most libraries make the mistake of not separating protocol parsing from I/O, which makes both testing and code reuse really hard. This is certainly very true and something I absolutely agree with.

Making peace with mocking

From Google’s testing blog on knowing your doubles

The myriad unsung benefits of unit tests

There’s more to unit testing than what’s been discussed heretofore in this post. It’d be remiss to not talk about property-based testing and fuzzing while I’m on the topic of unit testing. Popularized by the QuickCheck library in Haskell (since then ported to Scala and other languages) and the Hypothesis library in Python, property-based testing allows one to run the same test multiple times over with varying inputs without requiring the programmer to generate a fixed set of inputs in the test case. Jessica Kerr has the best talk I’ve ever seen on the topic of property-based testing. Fred Hébert has an entire book on the topic for those interested in learning more, and in his review of this post, cast more light on the different types of approaches of different property-based tools:

VCR — or replaying or caching of test responses

Integration Testing

If unit testing with mocks is so fraught with fragility and lacking in verisimilitude, does that mean integration testing is then the remedy to cure all ills?

The Step Up Rule

I coined the term “step-up testing”, the general idea being to test at one layer above what’s generally advocated for. Under this model, unit tests would look more like integration tests (by treating I/O as a part of the unit under test within a bounded context), integration testing would look more like testing against real production, and testing in production looks more like, well, monitoring and exploration. The restructured test pyramid (test funnel?) for distributed systems would look like the following:

The Test Pyramid for Distributed Systems

Unit tests look more like integration tests

Well over 80% of the functionality of the central API server involves communicating with MongoDB, and thus the vast number of unit tests involve actually connecting to a local MongoDB instance. Service E is a LuaJIT auth proxy that load balances traffic originating from three different sources to different instances of the central API. One of the most critical pieces of functionality of service E is to ensure that the appropriate handler gets invoked for every Consul watch it sets, and thus some unit tests actually spin up a child Consul process to communicate with it, and then kill the process when the test finishes. These are two examples of services that were better tested with the sort of unit testing that purists might frown upon as integration tests.

Integration testing looks more like testing in production

Let’s take a closer look at service G, H, I and K (not shown in the diagram above).

Traffic Shaping with Service Meshes

One of the reasons I’m so excited about the emerging service mesh paradigm is because a proxy enables traffic shaping in a way that’s extremely conducive to testing. With a small amount of logic in the proxy to route staging traffic to the staging instance (which can be achieved with something as simple as setting a specific HTTP header in all non-production requests or based on the IP address of the incoming request), one can end up exercising the actual production stack for all but the service in question. This enables performing real integration testing with production services without the overhead of maintaining an ornate test environment.


The goal of this post wasn’t to make an argument for one form of testing over another. I’m hardly an expert in any of these things and I’d be the first to admit that my thinking has been primarily shaped by the type of systems I’ve worked with, the constraints I’ve had to deal with (chiefly very limited time and resources) and the organizational dynamics of the companies I’ve worked at. It might be very possible that none of what I posited in this post hold water in different scenarios with differing contexts.



@copyconstruct on Twitter. views expressed on this blog are solely mine, not those of present or past employers.

Love podcasts or audiobooks? Learn on the go with our new app.

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store
Cindy Sridharan

Cindy Sridharan

@copyconstruct on Twitter. views expressed on this blog are solely mine, not those of present or past employers.