Testing Microservices: Challenges and Solutions

The transition to distributed environments has created complexity, overhead, and friction when writing and running new backend tests.

These tests require a lot of preparation, infrastructure, and maintenance because many services communicate asynchronously. They also tend to miss exceptions thrown in the “deeper layers” of the system architecture, which are hard to make testable.

In this post, I would like to show how trace-based automation lets you validate your data through robust tests with almost zero effort. Here’s how it can be done.

Backend Testing in a Distributed Environment

What does the testing of a distributed backend look like today? The following flow depicts a trace of a typical financial transaction app. As you can see, there are a lot of components that depend on each other.

There are two Kafka topics, as well as Postgres, DynamoDB, third-party APIs, and five microservices. Any code change in any service can and often does affect multiple others.

A trace visualization using Helios
A trace of a typical microservices-based app using Helios

If this were a monolith, a failure would simply return an HTTP 500 status code from the server. But in this microservices architecture, you might get a 200 from the BFF (Backend for Frontend) microservice while an exception is thrown in another microservice, without you getting any indication of it.
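
To make this concrete, here is a hedged sketch of a hypothetical BFF endpoint that publishes a message to Kafka and returns immediately; the route, topic name, and broker address are made up for illustration and are not taken from the traced app above:

import json
from flask import Flask, jsonify, request
from kafka import KafkaProducer

app = Flask(__name__)
producer = KafkaProducer(
    bootstrap_servers='localhost:9092',  # assumed broker address
    value_serializer=lambda v: json.dumps(v).encode(),
)

@app.route('/deposit', methods=['POST'])
def deposit():
    # Fire-and-forget: the BFF publishes to Kafka and returns 200 immediately.
    producer.send('deposits', request.get_json())
    return jsonify({'status': 'accepted'}), 200

# A consumer in another service processes the message later. If it raises an
# exception, the original caller has already received a 200 and never hears about it.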

How Developers Usually Build Test Automation Infrastructure for Microservices

Let’s say we want to test the following use cases; both are part of the backend e2e happy flow (a sketch of the downstream handler follows the list).

  1. Each time a POST request reaches the `deposit` endpoint, verify that an email is sent to the customer via SES.
  2. Verify that the client was charged and that the Stripe call succeeded.
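
Continuing the sketch above, the downstream handler that consumes the `deposits` topic might look roughly like this. Only the Stripe charge and the SES email correspond to the two use cases; the field names and sender address are assumptions for illustration:

import boto3
import stripe

ses = boto3.client('ses')

def handle_deposit(message):
    # Use case 2: charge the client via Stripe.
    charge = stripe.Charge.create(
        amount=message['amount'],
        currency='usd',
        customer=message['customer_id'],
    )
    # Use case 1: notify the customer via SES.
    ses.send_email(
        Source='noreply@example.com',  # assumed sender address
        Destination={'ToAddresses': [message['email']]},
        Message={
            'Subject': {'Data': 'Deposit received'},
            'Body': {'Text': {'Data': f"Charge {charge['id']} succeeded."}},
        },
    )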

There are a few ways to build test automation for these two scenarios.

1. Log-based Testing

Assuming that the developer added logs for each operation, we can fetch the logs from those services and validate that the relevant data exists.

For example:

import unittest
import requests

# BASE_URL and TEST_CLIENT_ID are assumed to be defined elsewhere in the test suite.
class OrderTest(unittest.TestCase):
    def test_process_order_happy_flow(self):
        with self.assertLogs('foo', level='INFO') as cm:
            requests.get(f'{BASE_URL}/process_order/{TEST_CLIENT_ID}')
        self.assertEqual(
            cm.output,
            # assertLogs formats each record as 'LEVEL:logger:message'
            [f'INFO:foo:send email to {TEST_CLIENT_ID}',
             f'INFO:foo:Charge {TEST_CLIENT_ID} succeeded!'],
        )

What are the challenges with this approach?

  1. We assume that the developer added logs, which is not always true.
  2. We can only validate data that was written to the logs. If, for example, request payloads were not written to the logs, they can’t be validated.
  3. We coupled the tests to the logs, which is a perfect recipe for flaky tests.
  4. We are testing only the logs of the operation and not the operation itself. As a result, we don’t know if the operation succeeded.

2. DB Querying

If each operation writes an indication to the database, the test can query the database and validate that the indication exists.

class OrderTest(unittest.TestCase):
    def test_process_order_happy_flow(self):
        requests.get(f'{BASE_URL}/process_order/{TEST_CLIENT_ID}')
        # Query the application's own DB model to check the side effects.
        client = Client.get_by_id(TEST_CLIENT_ID)
        assert client.charged_successfully is True
        assert client.email_sent is True

What are the challenges with this approach?

  1. We need to expose the database to the test project, which sometimes requires heavy lifting.
  2. We need to model the database with the tests in mind, which is over-engineering and not the focus of our work.
  3. We coupled our DB model to the tests, which is logically wrong.
  4. And again, just like with the log-based solution, we are testing the DB update and not the actual operation, so it can’t detect real issues in the operation itself.

These approaches are only the tip of the iceberg when it comes to awkward testing solutions for distributed applications.

How We Perform Backend Testing

At Helios, we use trace-based testing. Traces enable a new type of testing because they allow us to see all the operations triggered in our distributed system by a single operation. This makes it easy to approach each operation and see it as part of a whole, and not as an individual action.

In addition, traces can be created automatically, without a developer having to decide where to insert them, unlike logs.

The building block of a trace is a span (you can read more about that here). A span is an interaction between two components in your application. Spans allow us to review the attributes we want to validate when testing.
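
As a rough illustration, a span carries a name and a set of key-value attributes describing the interaction. The snippet below is not a real API call, just an example of the kind of data a span holds; the attribute names follow OpenTelemetry HTTP semantic conventions, and the exact set depends on the instrumentation:

# Illustrative span data for an outgoing Stripe call (attribute names follow
# OpenTelemetry HTTP semantic conventions; the exact attributes vary by instrumentation).
charge_span = {
    'name': 'POST /v1/charges',
    'service': 'accounts-service',
    'attributes': {
        'http.method': 'POST',
        'http.url': 'https://api.stripe.com/v1/charges',
        'http.status_code': 200,
    },
}
# A trace-based test asserts on spans like this one instead of on logs or DB rows.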

As a result:

  • You don’t need to create test infrastructure that exposes interfaces of ‘inner layers’ of your system.
  • You can create meaningful tests without fully understanding the system and how it operates.
  • It is extremely easy to see all kinds of flows in the application across multiple services.
  • If developers see a trace of a bug in production, they can create a test directly from it. This is an implementation of the paradigm of writing a test for every bug you find.

How To Test with Traces and Spans

In this next section, I’m going to show how we auto-generate test code using Helios. You can try this out yourself for free here; just follow the instructions below. If you don’t want to install it, you can start by playing with the Sandbox.

Once you have it installed, spans are created automatically by the auto-instrumentation SDK, which is based on OpenTelemetry.
OpenTelemetry (OTel) is an open-source solution that provides a collection of SDKs, APIs, and tools for collecting and correlating telemetry data from different interactions in cloud-native, distributed systems.

Through spans, the SDK collects the payloads of every communication between two components in the system, i.e., each request and response. With this capability, you can build robust tests without changing a single line of code.
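
For context, here is roughly what creating a span looks like with the OpenTelemetry Python API when done by hand; the auto-instrumentation SDK does the equivalent automatically for supported libraries, so this sketch only illustrates the mechanism (the function and attribute names are made up):

from opentelemetry import trace

tracer = trace.get_tracer(__name__)

def charge_client(client_id, amount):
    # Auto-instrumentation would wrap the outgoing HTTP/Stripe call in a span for you;
    # this manual span only illustrates what a span records.
    with tracer.start_as_current_span('charge-client') as span:
        span.set_attribute('client.id', client_id)
        span.set_attribute('charge.amount', amount)
        # ... perform the Stripe call here ...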

Here is the code for the use cases from before, generated automatically based on traces:

# test_trace_manager and the constants below are part of the Helios-generated test
# module (their definitions are omitted here).
def test_by_helios():
    requests.post(URL, headers=HEADERS, json=DATA)

    # Validate that the charge call was made by the accounts-service.
    http_post_spans = test_trace_manager.find_spans(
        service=CHARGE_SERVICE_NAME,
        operation=CHARGE_SPAN_OPERATION,
        span_selectors=STRIPE_SPAN_SELECTORS
    )
    assert http_post_spans, 'HTTPS POST in accounts-service did not occur'

    # Validate that the confirmation email was sent via SES by the emails-service.
    ses_send_spans = test_trace_manager.find_spans(
        service=SES_SERVICE_NAME,
        operation=SES_SPAN_OPERATION,
        span_selectors=SES_SPAN_SELECTORS
    )
    assert ses_send_spans, 'aws.ses.sendEmail in emails-service did not occur'

Notice the `test_trace_manager.find_spans(…)` method. This method allows developers to write tests based on spans that occurred in a specific trace. This enables anyone, regardless of their coding experience, to generate backend automation tests.

Generating Trace-Based Tests

To generate the test from a trace in the Helios app, you can change the mode from view to test and then set a validation checkpoint for each span.

A view of a trace in ‘view’ mode in Helios
Switching the mode from ‘view’ to ‘test’

You can also configure each validation checkpoint and select what you want to validate.

A configuration window in Helios for selecting attributes and payloads to generate precise tests
Customizing attributes and payloads to generate precise tests for various flows

After that, you can generate the test code and export it to any supported language.

Here’s what the auto-generated test will look like:

A microservices test code that was auto-generated using Helios
Test code that was auto-generated using Helios

You can apply the well-known practice of creating a test from a bug. In addition, using the trace visualization tool, you can now create a test from a trace visually, which is a game-changer in the backend testing domain.

Thanks for reading and please feel free to comment and ask anything below.
