Cloud native testing – the future?
A Monday morning at Pagero. The early spring sunshine is met with uncompromising laughter and plenty of fresh coffee. There is also plenty of automated testing. We’re working on our new smart analytics platform and have just finished a shiny new feature; we’re happy with our unit and integration test coverage. Now we’re just waiting for the CI/CD pipeline to run. And run it does. In fact, it keeps on running.
Being impatient by nature, we start biting our nails. They’re really short in no time. But the pipeline is still running. Our patience is at an end, so we context-switch and start working on the next feature. Meanwhile, another team just pushed a bugfix and is now also waiting for our pipeline. Eventually, they tire.
Suddenly, the by now long-forgotten pipeline turns red, and chaos reigns supreme. What happened? Network issues were causing Docker pulls to take longer to execute. And since there were quite a few services in the test suite…
Experiencing déjà vu? Then there’s a fair chance that you work with automated tests in a micro-service environment.
This is not a singular event; we keep seeing different scenarios that delay feedback from our test suites. As a strong believer in shifting tests left, I start thinking. On our journey from a monolith, via micro-services, to a cloud-native approach, our environment just keeps getting more complex. This makes automated testing harder – and the tests tend to become more error-prone.
Perhaps it’s time for a cloud native approach to testing?
Automated testing in a micro-service architecture
If you haven’t already seen it, I strongly recommend the material produced by Emily Bache on automated testing in a micro-service architecture.
Here is a video on the topic:
And, the first of a series of blog posts
Automated testing – because we care
Every day, our customers trust us with documents that are vital to their business. It might be ten thousand invoices or a single order of critical medical supplies.
At Pagero, we take this trust seriously. That’s why we get all warm and fuzzy when talking about boring things like compliance, redundancy, failover and high availability. It’s also why we do not compromise on automated testing – because we want to be able to sleep soundly at night.
Today, this means that we have multiple environments. We have a production cluster, a failover cluster, a staging environment and a test environment. We also have a CI/CD environment where we automatically deploy a new, temporary environment for each test suite. As the environment grows more complex, the preparations take longer and longer to run. This, in turn, forces context-switching, shifts testing to the right, and eventually incurs a cost of delay.
In the future, I think we can simplify this.
Enter cloud native
With a cloud native approach, there is really no need for multiple environments.
We can have a hybrid cluster that spans multiple hardware profiles, continents and providers. We can also segment our environment so that an untrusted service cannot access our customer data, or in any way affect the customer experience. We are also able to route traffic along different paths depending on how we classify it.
This means that everything for which we need separate environments today can already be handled in one cloud native environment. That also applies to automated tests. We will no longer have to spend time and resources trying to produce production-like environments. We can do our testing in a verified environment.
We now have a way to run all the tests in a single, prepared environment. Perhaps we can also simplify how we test?
Purpose of testing
Unit tests are a great tool, so use them! But let’s now completely ignore them for the rest of this article. There are many other types of tests to talk about:
- Integration tests
- E2E tests / functional tests
- Acceptance tests
- Performance tests
- Security / penetration tests
- A/B tests
In fact, all these tests serve the same purpose. They strive to increase confidence in the new functionality before deployment. They merely focus on different areas, and in some cases – like acceptance tests – on different targets. Security/penetration tests are also a bit different, but we’ll cover those in a moment.
What are we really trying to achieve? We want to make sure that our features work for the customer. We often do this by faking a small subset of the features a customer uses, or of the data a customer provides.
Automate the customer
Depending on what industry you work in, it might actually be OK to let the customer test your system. For us, it is not. So we need another way to test the system with realistic customer data. The important thing is that it should not affect the customer’s critical documents.
Since we are working in the production environment, we have access to real-time customer data. It would be a shame not to use it. We can also record it, to be able to reproduce the traffic later.
Quite often, this is not enough. We may want to test our system with higher volumes of data, or with data that has not started to arrive yet. In this case, we can write a small client whose sole purpose is to generate data and send it into the system.
We do this already in our current environment: we have set up fake customer accounts that constantly receive a steady load of realistic data. We use a small client written in Go, combined with a custom generator based on Go templates.
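To give a feel for what a template-based generator can look like, here is a minimal sketch using Go’s standard text/template package. It is not our actual generator: the Invoice type, the payload format and the Generate function are all made up for illustration, and sending the documents into the system is left out.

```go
package main

import (
	"bytes"
	"fmt"
	"math/rand"
	"text/template"
)

// Invoice is a hypothetical document shape used only for this sketch.
type Invoice struct {
	Number   string
	Customer string
	Amount   float64
}

// invoiceTmpl is a made-up payload format. Note that text/template does not
// escape values, so a real generator would have to take care of that.
var invoiceTmpl = template.Must(template.New("invoice").Parse(
	`<invoice number="{{.Number}}"><customer>{{.Customer}}</customer><amount>{{printf "%.2f" .Amount}}</amount></invoice>`))

// Generate renders n synthetic invoices for a fake customer account.
// A fixed seed makes the generated data reproducible between runs.
func Generate(customer string, n int, seed int64) ([]string, error) {
	rng := rand.New(rand.NewSource(seed))
	docs := make([]string, 0, n)
	for i := 1; i <= n; i++ {
		inv := Invoice{
			Number:   fmt.Sprintf("TEST-%06d", i),
			Customer: customer,
			Amount:   float64(rng.Intn(1000000)) / 100,
		}
		var buf bytes.Buffer
		if err := invoiceTmpl.Execute(&buf, inv); err != nil {
			return nil, err
		}
		docs = append(docs, buf.String())
	}
	return docs, nil
}
```

Calling Generate("fake-customer-1", 1000, 42) from a small main function would produce a thousand documents that are identical on every run, which makes the expected results easy to pin down later.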
Regardless of how we generate the data, we need a way to safely test it.
A story of two services
Let’s consider the following two services for a moment:
- Candidate service – a service that has just been deployed. We do not yet have sufficient confidence that it will not cause fire, death and destruction.
- Graduated service – a service that we have tested, and that has been behaving well in production for a while now.
What we now need is a way to take a candidate service through graduation.
So imagine for a second that we deploy these side by side: the stable, graduated service, and the updated service as a candidate. We want them to behave similarly, or differently but in a predictable way.
We want to subject them to the same type of traffic and data – but without letting the candidate service wreak havoc in production. It could look something like this:
This approach would let us gain confidence in the following:
- The service will produce the required results and not have undesired side effects.
- The service will be able to work together with all graduated versions of services in production – and with current candidates.
- The service will handle the same load as the graduated service.
- Security issues in the new service will be contained, and will not let anyone access other parts of the environment.
There are a couple of tools from the cloud native sphere that might let us do exactly this – service meshes and container sandboxing.
Service meshes and container sandboxing
The concept of sandboxed containers has been around for a while, but there are some new implementations. If you’re familiar with Docker, you’ll know that Docker containers share the host kernel, and that most of them run as root by default.
This is a potential security issue, so some alternatives have recently started to surface. These alternatives instead run the container inside a very lightweight virtual machine (VM).
This means that if the container is compromised, it’s still contained within the VM, which sounds really useful for testing.
Without going into the complexity of the service mesh concept, I’ll focus on what it can do for us.
Traffic can be routed using labels. So if we slap a “TEST” label on a specific request, it will be possible to route that traffic through a specific service – or chain of services. We could then split the traffic and send the same requests to both the graduated and the candidate services.
The same pattern is applicable to message queues and events. When there are no more routes available for this traffic, we can let it output the result. This can then be compared with the output from the corresponding graduated service.
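In a real service mesh this routing is expressed as mesh configuration rather than application code, but the decision itself is simple enough to sketch in Go. The Request and Delivery types, the backend names and the Split function below are assumptions made for this illustration; they do not correspond to any particular mesh API.

```go
package main

// The label value and backend names are assumptions made for this sketch.
const (
	TestLabel = "TEST"
	Graduated = "graduated"
	Candidate = "candidate"
)

// Request is a minimal stand-in for an HTTP request, a queue message or an event.
type Request struct {
	Label string
	Body  string
}

// Delivery pairs a request with the backend it should be sent to.
type Delivery struct {
	Backend string
	Request Request
}

// Split decides where a request goes. Every request reaches the graduated
// service; requests labelled as test traffic are also copied to the candidate,
// so that both services see exactly the same input.
func Split(req Request) []Delivery {
	deliveries := []Delivery{{Backend: Graduated, Request: req}}
	if req.Label == TestLabel {
		deliveries = append(deliveries, Delivery{Backend: Candidate, Request: req})
	}
	return deliveries
}
```

The important property is that unlabelled customer traffic never reaches the candidate, while labelled traffic always produces a graduated result to compare against.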
Using the results
We now have two sets of results, one from the candidate services and one from the graduated services.
If our results come from recorded traffic, we can use the baseline provided by the graduated services. If we used generated data, we can create a list of expected results.
For real-time data, we can manually compare the results. If we’re feeling really devious, we could apply a machine learning algorithm to attempt to detect variations.
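For recorded or generated traffic, the comparison can be as plain as the following sketch. It assumes that every result can be keyed by some document ID and reduced to a comparable string; both the keying and the Diff function are hypothetical.

```go
package main

import "sort"

// Diff compares candidate output against a baseline, keyed by document ID.
// The baseline is either the graduated service's output or a list of
// expected results. It returns the sorted IDs that are missing from either
// side or whose outputs differ.
func Diff(baseline, candidate map[string]string) []string {
	var mismatched []string
	for id, want := range baseline {
		if got, ok := candidate[id]; !ok || got != want {
			mismatched = append(mismatched, id)
		}
	}
	for id := range candidate {
		if _, ok := baseline[id]; !ok {
			mismatched = append(mismatched, id)
		}
	}
	sort.Strings(mismatched)
	return mismatched
}
```

An empty result means the candidate behaved exactly like the baseline. A non-empty one is not necessarily a failure, since a candidate may be expected to differ in a predictable way, but it tells us where to look.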
A word of caution
If you’d like to try these patterns for testing, there are certain things to keep in mind.
Quite often, our services will maintain state – like storing a result set in a database – or will produce side effects – like emitting an event to another service. Unless we take specific precautions, this would also happen from the candidate services. It’s generally a good idea to keep your services free of state and side effects whenever possible.
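One possible precaution, sketched below, is to put side effects behind an interface, so that a candidate can be wired to an implementation that records events instead of sending them. The Emitter interface, RecordingEmitter and Service are hypothetical names for this illustration, not a description of how our services are built.

```go
package main

// Emitter abstracts a side effect, here emitting an event to another service.
type Emitter interface {
	Emit(event string) error
}

// RecordingEmitter keeps events in memory instead of sending them anywhere.
// A candidate wired to it cannot affect downstream services, and the recorded
// events can be compared with what the graduated service emitted.
type RecordingEmitter struct {
	Events []string
}

// Emit stores the event and never fails.
func (r *RecordingEmitter) Emit(event string) error {
	r.Events = append(r.Events, event)
	return nil
}

// Service is a stand-in for a service whose only side effect is one event
// per processed document.
type Service struct {
	Out Emitter
}

// Handle processes a document and emits a hypothetical "processed" event.
func (s *Service) Handle(doc string) error {
	return s.Out.Emit("processed:" + doc)
}
```

A graduated service would get an Emitter that really publishes the event, while a candidate would get a RecordingEmitter. The same idea applies to database writes.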
Know your system
Make sure that you have monitoring and metrics in place. If you add load to the system, it might not scale, and you need to be able to detect that early. The same applies if you’re on a cloud provider, and the services start to autoscale – that could get expensive.
Advantages of cloud native testing
This approach would let us automatically deploy services to production as soon as the code is pushed and built. This should decrease context-switching and let us stay in the flow. It will also lower the threshold that often prevents us from working effectively with rapid deployments.
Meanwhile, the outcome of this type of testing should be stronger confidence than usual. We will know that the service performs as expected, scales, and is not prone to security issues. And more importantly – we will know this for the same configuration and data that are used in production.
It will also let us focus our work in one area. Any work we do towards improving the architecture, monitoring, metrics or stability of the production environment will automatically improve the tests.
And as a side effect, if one of our graduated services should start regressing – perhaps due to memory issues or changing data – we will be able to detect this as well.