Microservices open up new ways to design system tests

I’ve written before about Pagero’s journey from a Monolith to a Microservices architecture, and efforts to achieve Continuous Delivery. One part of that journey that I’ve been very involved in is the evolution of our automated end-to-end system tests.

It turns out that having lots of microservices opens up new ways to design these tests. We found we could make them faster and easier to debug, with less test code to maintain. I’m quite proud of the approach we’ve come up with, and I thought others might be interested to hear about it.

Before we started with microservices, we had a relatively simple test structure for the monolith. We were testing at various levels, mostly according to the standard testing pyramid:

Figure 1: Our testing pyramid before we introduced microservices.

Most of the test cases were unit tests, at the class and method level. Above that, we had quite a few tests focused on the data access layer, using a fairly fine-grained API to check the ORM integration with the database. Then there were some higher-level tests that accessed an API to test whole features. Finally came the end-to-end tests, testing the full stack via the GUI.

This was working pretty well when I arrived just over three years ago. The trouble was, new functionality was being built in external microservices, and we needed some way to test them together with the monolith. Each new microservice of course had its own tests – mostly unit tests, but also a few tests for the whole service running in isolation from the others. Testing the pieces like that only gets you so far, though. The higher levels of the testing pyramid are also needed, and that’s where our bottleneck seemed to be.

“Having lots of microservices opens up new ways to design automated end-to-end system tests.”

Pain points with the old system tests

The thing was, the old service-level tests weren’t very easy to read or understand, and they were a lot of work to maintain. The GUI tests were in better shape, using modern tools, with a good structure. But actually, GUI tests are always expensive to maintain and run, and it turns out a lot of the functionality in our system is provided primarily via APIs, not the user interface.

I thought we should probably be looking to expand our service-layer API tests to cover the new microservices too. I was experimenting with different ways to do this when we ran into the killer problem with these old tests – when we ran them in the new microservices architecture, they suddenly got 10 times slower!

When the tests could be run in the same JVM as the entire monolithic system, everything was fine. The tests accessed the functionality using RMI calls. As if by magic, the application server converted all these remote calls into local calls. It all changed when we introduced microservices though.

We had to deploy the monolith in its Docker container alongside all the others, and suddenly the tests were in a different JVM. All those hundreds of RMI calls immediately became an order of magnitude slower, as the application server no longer optimised them away. What’s more, we found we had to rebuild the tests every time we rebuilt the system, since the client end of an RMI call must exactly match the server version. It was all getting rather difficult to manage, and the tests no longer took 20 minutes to run; it was more like 2 hours.

So I proposed that we should build a new suite of API tests, designed from the ground up to work in a microservices architecture. They should be faster to run and cheaper to maintain, and should make it easier to find the root cause when something failed. Quite a shopping list! This is what I came up with:

Strategy for the New System Tests

The new tests access all the services using REST, which is a much more flexible and universal way to make remote calls than Java RMI. Each test case also listens to all the messages being sent between the various microservices while it processes the test request. This enables us to track the progress of a test while it’s running and, if it fails, makes it easier to work out at what point it went wrong. It also means the test can react to events in the system, and trigger exceptional workflows.
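Just to make that concrete, here’s a rough sketch of what a test in that style might look like. It’s simplified and not our actual code: the URLs and message topic are made up, the REST calls use the standard java.net.http client, and MessageSpy is a hypothetical stand-in for whatever broker subscription you use to listen in on the services.

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;

    public class SendDocumentSystemTest {

        // Hypothetical test-environment URL, for illustration only.
        private static final String BASE_URL = "https://testenv.example.com";

        public static void main(String[] args) throws Exception {
            HttpClient http = HttpClient.newHttpClient();

            // Start listening to the inter-service messages *before* triggering
            // the workflow, so nothing is missed while the request is processed.
            MessageSpy spy = MessageSpy.subscribeToAllServiceTopics();

            // Kick off the use case with a plain REST call rather than RMI.
            HttpRequest send = HttpRequest.newBuilder()
                    .uri(URI.create(BASE_URL + "/documents"))
                    .header("Content-Type", "application/json")
                    .POST(HttpRequest.BodyPublishers.ofString(
                            "{\"issuer\":\"acme\",\"recipient\":\"globex\",\"file\":\"invoice42.xml\"}"))
                    .build();
            http.send(send, HttpResponse.BodyHandlers.ofString());

            // React to events in the system: wait for the delivery message, then
            // confirm the document is visible to the recipient via another REST call.
            spy.awaitMessage("document.delivered", 60 /* seconds */);

            HttpRequest check = HttpRequest.newBuilder()
                    .uri(URI.create(BASE_URL + "/recipients/globex/documents"))
                    .GET()
                    .build();
            String body = http.send(check, HttpResponse.BodyHandlers.ofString()).body();
            if (!body.contains("invoice42")) {
                throw new AssertionError("Document never reached the recipient");
            }
        }
    }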

In a way, those decisions were quite obvious: the tests should speak the same language as the microservices. By doing that, the tests naturally ran more quickly, and were easier to debug. I felt we could do more to reduce the maintenance burden, though. To that end we decided they would also be Data-Driven, and use Approval testing.

“The new tests access all the services using REST, which is a much more flexible and universal way to make remote calls than Java RMI.”

Data-Driven Testing

Data-Driven Testing means you generally keep the workflow the same, but vary the data from case to case. This minimizes the amount of new code per test case, since most of the time it is enough to change the input data in order to exercise a new or updated feature. In the case of Pagero Online, the most important use case is sending a document from an Issuer to a Recipient. So this is the use case that almost all our tests exercise.

Within that basic premise, that a test comprises an Issuer, a Recipient, and a Document being sent between them, there are a host of variations that you need to cover. If you look at our test cases, essentially all of them are based around these three elements.

We define them in structured text files separate from the fixture code that performs the use case. The fixture code parses the text files, makes REST calls to create the corresponding Issuer and Recipient in the system, then makes a request to Pagero Online on behalf of the Issuer to send the document to the Recipient.
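To give a feel for it, here’s a sketch of what such a fixture might look like. The file format, field names and RestClient helper are all invented for the example rather than our actual formats and APIs:

    // Example test definition file (format and fields invented for illustration),
    // e.g. cases/basic-invoice.properties:
    //
    //   issuer.name       = Acme AB
    //   issuer.format     = PEPPOL_BIS3
    //   recipient.name    = Globex Inc
    //   recipient.channel = EMAIL
    //   document.file     = invoice42.xml

    import java.io.InputStream;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.util.Properties;

    public class SendDocumentFixture {

        private final RestClient rest; // hypothetical helper wrapping the REST calls

        public SendDocumentFixture(RestClient rest) {
            this.rest = rest;
        }

        // The workflow is the same for every test case: create the two parties,
        // then send the document from the Issuer to the Recipient. Only the data
        // in the definition file varies from case to case.
        public void run(Path testDefinition) throws Exception {
            Properties def = new Properties();
            try (InputStream in = Files.newInputStream(testDefinition)) {
                def.load(in);
            }

            String issuerId = rest.createCompany(def.getProperty("issuer.name"),
                                                 def.getProperty("issuer.format"));
            String recipientId = rest.createCompany(def.getProperty("recipient.name"),
                                                    def.getProperty("recipient.channel"));

            // Send the document to Pagero Online on behalf of the Issuer.
            rest.sendDocument(issuerId, recipientId, def.getProperty("document.file"));
        }
    }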

Defining a new test case is often just a matter of defining a different configuration for these three entities. It’s only if you need a significant variation on that basic use case that you need to go in and write any new test fixture code.

Approval Testing

Approval Testing refers to the way you determine whether a test has passed. You may already be familiar with Assertion-Based Testing, where you decide in each test case what the correct behaviour is by making assertions about the output you require. In Approval Testing, you instead compare the current output with an ‘Approved’ version of that output, gathered in a previous test run. Any difference between the actual output and the approved version is flagged as a failure.

So with Pagero Online, we store an approved version of the document that is presented to the Recipient. We also store an approved version of the list of messages that were passed between the services while the document was being processed, and any error logs related to it. By examining all those different kinds of output from Pagero Online, we can be confident the document was processed correctly.
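The core mechanism is simple enough to sketch in a few lines. This is just an illustration, assuming the approved output is stored as a plain text file alongside the test; it isn’t our actual tooling (libraries such as ApprovalTests give you the same mechanism off the shelf):

    import java.nio.charset.StandardCharsets;
    import java.nio.file.Files;
    import java.nio.file.Path;

    public class Approvals {

        // Compare the actual output with the previously approved version. On any
        // difference, write a *.received.txt file for a human to inspect.
        public static void verify(String actual, Path approvedFile) throws Exception {
            Path receivedFile = Path.of(
                    approvedFile.toString().replace(".approved.txt", ".received.txt"));

            String approved = Files.exists(approvedFile)
                    ? Files.readString(approvedFile, StandardCharsets.UTF_8)
                    : "";

            if (!approved.equals(actual)) {
                Files.writeString(receivedFile, actual, StandardCharsets.UTF_8);
                throw new AssertionError("Output differs from approved version, see "
                        + receivedFile);
            }
        }
    }

When a change in output turns out to be intentional, you simply promote the received file to become the new approved version.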

This approach has several advantages, particularly when used together with Data-Driven testing. You mostly don’t have to write assertion code that is specific to a certain test case, reducing the amount of test code further. What’s more, because every test is actually checking quite a large number of things, you end up finding bugs you didn’t anticipate when you wrote the test. For example, discovering a stray extra message being sent that shouldn’t be, or a formatting error on page three of the document presentation.

New system tests replacing the old ones

It took a few months, but eventually the new system tests started to prove themselves useful for testing new functionality we had built. Over the following two years or so, we did a lot of work replacing the old tests with new ones – and we’re not finished yet, actually. In accordance with the 80/20 rule, that last 20% of them seems to be taking an inordinately long time.

In the meantime, I think we’ve shown the new approach to automated system tests has been largely successful in the aim of reducing test maintenance, while being effective and running quickly. Because they execute in parallel, we can make the tests run faster by adding hardware. Usually it takes about 15 minutes to get results.

“I think we’ve shown the new approach has been largely successful in the aim of reducing test maintenance, while being effective and running quickly.”

The overall picture

Looking at the full picture of all our testing activities, we still have many unit tests; those are the cheapest and simplest to maintain. Each microservice has whole-service tests that prove it works in isolation, with calls to other services stubbed out. The next level of testing is the new system tests, which run all the services together, and then the GUI tests, which exercise the whole stack. If all those automated tests pass, we have some manual tests and some further regression tests before the release candidate is ready to be deployed.

Figure 2: Testing pyramid including our microservice tests and new system tests.

Actually, it’s a little more complex than I just made it sound. Configuration management of all those microservices and the test cases that exercise them is decidedly more complex than I imagined when we started! Instead of having a new version of one monolith to test, you have new versions of loads of microservices to co-ordinate. It’s not a trivial problem, and I’ll tell you how we handle that in part two of this article.