This content originally appeared on Level Up Coding - Medium and was authored by Victor Ronin
I was interviewing a bunch of SDET engineers some time ago, and one of my standard questions was about flaky tests: had they seen them, and how had they dealt with them?
This question was pretty much a litmus test. Everybody has flaky tests (especially if you have complex end-to-end tests), and even simple unit tests can easily be flaky.
The most interesting part was to just listen to how people try to handle that.
A personal story: during my time at Okta, I ran the Eng Productivity team. One of its responsibilities (probably the most significant) was developing and operating a home-grown CI system, and the question of flaky tests was front and center. The monolith had something like ~20k tests, and you had about a 2/3 probability of stepping on a flaky test while doing a build.
Tim Secor and I spent quite a long time discussing and debating how to solve this problem.
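Those numbers compose in a non-obvious way. As a back-of-the-envelope sketch (assuming, unrealistically, that flaky failures are independent and equally likely across tests), the per-test flake rate implied by a 2/3 build-failure rate across 20k tests is tiny:

```python
def green_build_probability(num_tests: int, per_test_flake_rate: float) -> float:
    """Probability that a build passes, assuming independent flaky failures."""
    return (1.0 - per_test_flake_rate) ** num_tests

# With 20k tests, what per-test flake rate yields only a ~1/3 green-build
# rate?  Solve (1 - q)^20000 = 1/3 for q.
q = 1.0 - (1.0 / 3.0) ** (1.0 / 20000)
print(f"implied per-test flake rate: {q:.6%}")  # on the order of 0.005%
print(f"green-build probability: {green_build_probability(20000, q):.3f}")
```

The takeaway: at scale, even a vanishingly small per-test flake rate compounds into builds that fail most of the time.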
I will talk next about some ideas/solutions we discovered back then and some things I read/thought about later on.
BTW. As usual, don't expect magic. Everything is relatively straightforward and pragmatic.
Don'ts
Before jumping into what you should be doing, here are a few simple things that you should NOT be doing.
Rerun
Reruns are a big no-no, either automatic or manual. As Yoda might say, "Reruns lead to fear. Fear leads to anger. Anger leads to hate," and so on.
Rerunning feels so natural. It's right there: one click, and you get a green build. The obvious problem is that you just sweep the issue under the rug instead of fixing flaky tests, and they will keep accumulating.
For example, when I joined the Eng Productivity team at Okta, we had three automatic reruns for failed tests. And it took us a while to dig ourselves out of that hole (since it let hundreds and hundreds of flaky tests accumulate unaddressed).
BTW, I am not saying you should stop all reruns in their tracks. If you have gone far down this road (relying on reruns), you won't be able to quit cold turkey. However, you really want to wind them down.
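A quick sketch of why automatic reruns are so good at hiding problems: even a badly flaky test almost always goes green within a few attempts, so CI stays quiet while the flake persists.

```python
def pass_within_attempts(fail_rate: float, attempts: int) -> float:
    """Probability a flaky test goes green within the given number of
    attempts, assuming each attempt fails independently at fail_rate."""
    return 1.0 - fail_rate ** attempts

# A test that fails 30% of the time, with 3 automatic reruns
# (4 attempts total), still produces a green build ~99% of the time.
print(f"{pass_within_attempts(0.30, 4):.4f}")
```

In other words, reruns turn a screaming 30% failure rate into background noise that nobody feels any urgency to fix.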
Don't assume that it's the test code that is broken.
This is almost a continuation of the rerun point. People use reruns because they assume the test code is broken (vs. the production code). However, imagine that your production code actually fails at this spot 30% of the time. CI is showing you a flashing red alert about it, and instead of fixing the production code, you are just ignoring it. Do you really want to let that ship, so that it's your customers who see the flashing red alerts this time around?
Tests that do too much
I have seen tests that try to do too many things (in a row). This is especially common for end-to-end tests. Usually, the argument is, "The test setup is long and complicated, so we don't want to do it for each test. So, let's put all of these tests in one uber-test."
Yep. You save compute time, but you lose test simplicity and reproducibility (and tons of engineering time).
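One way out of the uber-test trap is to do the expensive setup once and share it, while keeping each scenario as its own test. A minimal sketch (the setup, test names, and data are all hypothetical; most frameworks offer the same idea as shared or module-scoped fixtures):

```python
# Instead of one uber-test that exercises signup, login, and profile in a
# single run, factor the expensive setup into a cached helper and keep each
# scenario as its own small test that can fail -- and be diagnosed -- alone.

_cached_env = None

def expensive_setup():
    """Simulate a slow environment bootstrap; done once and cached."""
    global _cached_env
    if _cached_env is None:
        _cached_env = {"user": "alice", "logged_in": True,
                       "profile": {"name": "Alice"}}
    return _cached_env

def test_signup():
    assert expensive_setup()["user"] == "alice"

def test_login():
    assert expensive_setup()["logged_in"]

def test_profile():
    assert expensive_setup()["profile"]["name"] == "Alice"

# Each scenario is now a separate reportable unit; a flake in "login" no
# longer poisons the signup and profile checks.
test_signup(); test_login(); test_profile()
```

You keep most of the compute savings, and a failure now points at one small scenario instead of a 40-step monster.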
Dos
Fix your flaky test
This sounds obvious. However, you really need to spend time on it, especially if you are at the beginning of the journey (don't let hundreds of flaky tests accumulate).
However, I know it's not always possible. Sometimes you can't fix it because you can't reproduce it locally, and/or it happens once in a blue moon, and you just don't have enough clues to figure it out.
However, overall you want to fix more problems than introduce them.
Keep your test code clean.
Often enough, production code is treated well while test code is treated like a third-class citizen.
Just to be fair, my tests may not win a beauty contest, but they are simple/readable, and so on.
Bad, smelly, and complex test code often results in brittleness that would be much easier to spot in cleaner code.
Statistics are your friend
One of the things we started doing almost immediately (back at Okta) was capturing statistics on how often each test fails. This was helpful: if you have a big test base, you don't want to start with tests that fail only 0.1% of the time. Instead, you want to start with tests that fail 30% of the time (to get a bigger bang for your buck).
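The mechanics are simple. A minimal sketch (the CI history and test names are made up for illustration): fold per-run pass/fail records into failure rates and sort, so the worst offenders float to the top of the fix-it queue.

```python
from collections import Counter

def flakiness_report(results):
    """results: iterable of (test_name, passed) pairs across many CI runs.
    Returns (test_name, failure_rate) pairs sorted worst-first."""
    runs, failures = Counter(), Counter()
    for name, passed in results:
        runs[name] += 1
        if not passed:
            failures[name] += 1
    return sorted(
        ((name, failures[name] / runs[name]) for name in runs),
        key=lambda pair: pair[1],
        reverse=True,
    )

# Hypothetical CI history: test_login fails 3 of 10 runs,
# test_search fails 1 of 1000.
history = ([("test_login", False)] * 3 + [("test_login", True)] * 7
           + [("test_search", False)] + [("test_search", True)] * 999)
report = flakiness_report(history)
print(report[0])  # ('test_login', 0.3) -- fix this one first
```

Feed it a few weeks of CI results and the priority order writes itself.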
Figure out how to run only the tests you need
Let me speak out of both sides of my mouth here. On the one hand, microservices are good in that you have clear boundaries, and as a result, you can run only the tests related to a given service.
On the other hand, most of the problems with tests happen for end-to-end scenarios (vs. unit tests). And end-to-end tests will span multiple microservices, so microservices architecture doesn't completely remove this problem.
However, stepping back: if you don't need to run some tests for a specific change (because they are unrelated), don't run them. You don't want to spend cycles on unrelated tests (especially if they are not in your area of ownership).
Respect the test pyramid
There is a classic test pyramid, which I have already mentioned in several articles (including Building high-quality software). The idea is that you should have tons of unit tests, fewer integration tests, and even fewer end-to-end tests. Unit tests should be fast and (most of the time) bulletproof. Integration tests are slower and have a higher probability of failing, and end-to-end tests are slower still, with an even higher chance of failing.
As a result, you really want to be unit-test heavy, because this makes your test suite way more robust; end-to-end tests should be just sprinkled on top.
Sometimes it's ok to kill the test.
Please, please be cautious with this one; it's usually a last resort. Normally, tests have positive value: they prevent regressions and let us move forward with confidence. However, I have seen some poorly written tests with a high probability of failure that were tolerated for a long time (via reruns). They definitely had serious negative value, and in such cases it can be ok to delete them.
Obviously, it's better to fix or rewrite them. However, again, it's ok to delete tests in some exceptional cases (as a last resort).
BTW, I always argue for one of two things: you either fix the test or delete it, never comment it out. Commenting things out creates a very strange middle ground.
Final thoughts
The problem of flaky tests will never go away. It will rear its ugly head again and again in crazy situations.
The most important part here is not eradicating the problem (it's impossible) but keeping it continuously at bay.
A simple rule of thumb is that your project should get a green build at least 95% of the time (before you resort to any reruns). If it fails more often than that, it's a good time to start spending extra cycles addressing these issues.
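That 95% target implies a flake budget per test, and the budget shrinks fast as the suite grows. A rough sketch (again assuming independent, equally flaky tests, which real suites are not):

```python
def max_tolerable_flake_rate(num_tests: int,
                             target_green_rate: float = 0.95) -> float:
    """Per-test flake-rate budget so that independent flakes still leave
    at least target_green_rate of builds green: solve (1-q)^n = target."""
    return 1.0 - target_green_rate ** (1.0 / num_tests)

# The larger the suite, the smaller the per-test budget.
for n in (100, 1000, 20000):
    print(n, f"{max_tolerable_flake_rate(n):.8f}")
```

With 20k tests, the budget is on the order of a few failures per million runs per test, which is why "it only flakes occasionally" doesn't scale.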
And to repeat: I am not a saint. Some of the projects I worked on were green all the way home, while others had issues with flaky tests that I left unattended for quite a while.
In the end, it's a pragmatic decision. Tests exist to reduce the time spent on manual testing and regressions (and all the associated expenses). In most cases, your investment pays off many times over, but ultimately you are the judge of when enough investment is enough.
Flaky (as a good pastry) tests. was originally published in Level Up Coding on Medium, where people are continuing the conversation by highlighting and responding to this story.

Victor Ronin | Sciencx (2022-01-05T22:28:40+00:00) Flaky (as a good pastry) tests.. Retrieved from https://www.scien.cx/2022/01/05/flaky-as-a-good-pastry-tests/