
Test Failures Should Be Actionable

This article was adapted from a Google Testing on the Toilet (TotT) episode. You can download a printer-friendly version of this TotT episode and post it in your office.

By Titus Winters

There are a lot of rules and best practices around unit testing. There are many posts on this blog; there is deeper material in the Software Engineering at Google book; there is specific guidance for every major language; there is guidance on test frameworks, test naming, and dozens of other test-related topics. Isn’t this excessive?

Good unit tests have several important properties, but you can focus on one key principle: Test failures should be actionable.

When a test fails, you should be able to begin investigation with nothing more than the test’s name and its failure messages—no need to add more information and rerun the test.

Effective use of unit test frameworks and assertion libraries (JUnit, Truth, pytest, GoogleTest, etc.) serves two important purposes. Firstly, the more precisely we express the invariants we are testing, the more informative and less brittle our tests will be. Secondly, when those invariants don’t hold and the tests fail, the failure info should be immediately actionable. This meshes well with Site Reliability Engineering guidance on alerting.

Consider this example of a C++ unit test of a function returning an absl::Status (an Abseil type that represents either an “OK” status or one of a number of different error codes). A first attempt checks only that the status is OK:

EXPECT_TRUE(LoadMetadata().ok());

Sample failure output:

load_metadata_test.cc:42: Failure
Value of: LoadMetadata().ok()
Expected: true
Actual: false

Compare that to a version using a more precise GoogleTest matcher:

EXPECT_OK(LoadMetadata());

Sample failure output:

load_metadata_test.cc:42: Failure
Value of: LoadMetadata()
Expected: is OK
Actual: NOT_FOUND: /path/to/metadata.bin

If the first test fails, you have to investigate why it failed; the second immediately gives you all the available detail, because the more precise matcher includes the actual status in the failure message.
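The same principle applies when a test expects a specific error. Here is a minimal sketch, assuming the open-source Abseil status matchers from absl/status/status_matchers.h (the exact header and namespace may differ in your codebase) and a hypothetical setup in which the metadata file is absent:

#include "gmock/gmock.h"
#include "gtest/gtest.h"
#include "absl/status/status.h"
#include "absl/status/status_matchers.h"

absl::Status LoadMetadata();  // Function under test; declared elsewhere.

using ::absl_testing::StatusIs;

TEST(LoadMetadataTest, MissingFileYieldsNotFound) {
  // Hypothetical scenario: the fixture has not created the metadata file.
  // On failure, the message shows both the expected and the actual status,
  // so the failure is actionable without rerunning the test.
  EXPECT_THAT(LoadMetadata(), StatusIs(absl::StatusCode::kNotFound));
}

StatusIs can also take a second argument that matches the error message, which keeps the assertion narrow while still producing detailed failure output.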

Here are some other posts on this blog that emphasize making test failures actionable:

  • Writing Descriptive Test Names - If our tests are narrow and sufficiently descriptive, the test name itself may give us enough information to start debugging.

  • Keep Tests Focused - If we test multiple scenarios in a single test, it’s hard to identify exactly what went wrong.

  • Prefer Narrow Assertions in Unit Tests - If we have overly wide assertions (such as depending on every field of a complex output proto), the test may fail for many unimportant reasons. False positives are the opposite of actionable. (See the sketch after this list.)

  • Keep Cause and Effect Clear - Refrain from using large global test data structures shared across multiple unit tests, so that each test’s setup is clear and easy to identify.
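To make the narrow-assertions point concrete, here is a minimal sketch using GoogleTest and gMock’s Field matcher; the ServerConfig struct, MakeDefaultConfig(), and the field names are hypothetical, invented only for illustration:

#include <string>

#include "gmock/gmock.h"
#include "gtest/gtest.h"

// Hypothetical output type and factory, assumed only for this sketch.
struct ServerConfig {
  int port;
  std::string hostname;
  bool logging_enabled;
};

ServerConfig MakeDefaultConfig() { return {8080, "localhost", true}; }

using ::testing::Field;

// Narrow assertion: only the property under test appears, so the test name
// and the failure message together say exactly what broke.
TEST(DefaultConfig, ListensOnExpectedPort) {
  EXPECT_THAT(MakeDefaultConfig(), Field(&ServerConfig::port, 8080));
}

A full-object comparison here would fail whenever the hostname or logging default changed, even though neither is relevant to the port behavior this test is named for.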

How I Learned To Stop Writing Brittle Tests and Love Expressive APIs

This article was adapted from a Google Testing on the Toilet (TotT) episode. You can download a printer-friendly version of this TotT episode and post it in your office.

By Titus Winters

A valuable but challenging property for tests is “resilience”: a test should fail only when something important has gone wrong. However, the opposite property may be easier to recognize: a “brittle” test is one that fails not because of a real problem that would break in production, but for innocuous reasons, because the test itself is fragile. Changes to error messages, to the order of metadata headers in a web request, or to the order of calls to a heavily mocked dependency can all cause a brittle test to fail.

Expressive test APIs are a powerful tool in the fight against brittle, implementation-detail heavy tests. A test written with IsSquare(output) is more expressive (and less brittle) than a test written with details such as JsonEquals(.width = 42, .length = 42), in cases where the size of the square is irrelevant. Similar expressive designs might include unordered element matching for hash containers, metadata comparisons for photos, and activity logs in processing objects, just to name a few. 
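As a sketch of what such an expressive API could look like in GoogleTest, using its MATCHER macro (the Rectangle type and IsSquare matcher below are hypothetical; the article does not prescribe an implementation):

#include "gmock/gmock.h"
#include "gtest/gtest.h"

// Hypothetical output type, assumed only for this sketch.
struct Rectangle {
  int width = 0;
  int length = 0;
};

// An expressive matcher: the test states the property that matters
// (squareness), not the exact dimensions.
MATCHER(IsSquare, "is a square") { return arg.width == arg.length; }

TEST(LayoutTest, DefaultOutputIsSquare) {
  Rectangle output = {42, 42};          // Stand-in for the real computation.
  EXPECT_THAT(output, IsSquare());      // Still passes if the size changes.
}

Because the matcher encodes only the property that matters, the test keeps passing when the output’s exact size changes, and its failure description (“is a square”) stays meaningful.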

As an example, consider this C++ test code:

absl::flat_hash_set<int> GetValuesFromConfig(const Config&);

TEST(ConfigValues, DefaultConfigsArePrime) {
  // Note the strange order of these values. BAD CODE, DON’T DO THIS!
  EXPECT_THAT(GetValuesFromConfig(Config()), ElementsAre(29, 17, 31));
}

The reliance on hash ordering makes this test brittle, preventing improvements to the API being tested. A critical part of the fix to the above code was to provide better test APIs that allowed engineers to more effectively express the properties that mattered. Thus we added UnorderedElementsAre to the GoogleTest test framework and refactored brittle tests to use that: 

TEST(ConfigValues, DefaultConfigsArePrimeAndOrderDoesNotMatter) {
  EXPECT_THAT(GetValuesFromConfig(Config()), UnorderedElementsAre(17, 29, 31));
}

It’s easy to see brittle tests and think, “Whoever wrote this did the wrong thing! Why are these tests so bad?” But it’s far better to see that these brittle failures are a signal indicating where the available testing APIs are missing, under-advertised, or need attention.

Brittleness may indicate that the original test author didn’t have access to (or didn’t know about) test APIs that could more effectively identify the salient properties that the test meant to enforce. Without the right tools, it’s too easy to write tests that depend on irrelevant details, making those tests brittle. 

If your tests are brittle, look for ways to narrow down golden diff tests that compare exact pixel layouts or log outputs, discover and learn more expressive test APIs, and file feature requests with the owners of the upstream systems.

If you maintain infrastructure libraries and can’t make changes because your users’ tests are brittle, think about which test APIs your users are lacking, and invest in more expressive ones.