Category Archives: Google Testing Blog

If it ain’t broke, you’re not trying hard enough

How I Learned To Stop Writing Brittle Tests and Love Expressive APIs

This article was adapted from a Google Testing on the Toilet (TotT) episode. You can download a printer-friendly version of this TotT episode and post it in your office.

By Titus Winters

A valuable but challenging property for tests is “resilience,” meaning a test should only fail when something important has gone wrong. However, an opposite property may be easier to see: A “brittle” test is one that fails not for real problems that would break in production, but because the test itself is fragile for innocuous reasons. Error messages, changing the order of metadata headers in a web request, or the order of calls to a heavily-mocked dependency can often cause a brittle test to fail.

Expressive test APIs are a powerful tool in the fight against brittle, implementation-detail heavy tests. A test written with IsSquare(output) is more expressive (and less brittle) than a test written with details such as JsonEquals(.width = 42, .length = 42), in cases where the size of the square is irrelevant. Similar expressive designs might include unordered element matching for hash containers, metadata comparisons for photos, and activity logs in processing objects, just to name a few. 

As an example, consider this C++ test code:

absl::flat_hash_set<int> GetValuesFromConfig(const Config&);


TEST(ConfigValues, DefaultConfigsArePrime) {

  // Note the strange order of these values. BAD CODE, DON’T DO THIS!

  EXPECT_THAT(GetValuesFromConfig(Config()), ElementsAre(29, 17, 31));

}

The reliance on hash ordering makes this test brittle, preventing improvements to the API being tested. A critical part of the fix to the above code was to provide better test APIs that allowed engineers to more effectively express the properties that mattered. Thus we added UnorderedElementsAre to the GoogleTest test framework and refactored brittle tests to use that: 

TEST(ConfigValues, DefaultConfigsArePrimeAndOrderDoesNotMatter) {

  EXPECT_THAT(GetValuesFromConfig(Config()), UnorderedElementsAre(17, 29, 31));

}

It’s easy to see brittle tests and think, “Whoever wrote this did the wrong thing! Why are these tests so bad?” But it’s far better to see that these brittle failures are a signal indicating where the available testing APIs are missing, under-advertised, or need attention.

Brittleness may indicate that the original test author didn’t have access to (or didn’t know about) test APIs that could more effectively identify the salient properties that the test meant to enforce. Without the right tools, it’s too easy to write tests that depend on irrelevant details, making those tests brittle. 

If your tests are brittle, look for ways to narrow down golden diff tests that compare exact pixel layouts or log outputs. Discover and learn more expressive APIs. File feature requests with the owners of the upstream systems.

If you maintain infrastructure libraries and can’t make changes because of brittleness, think about what your users are lacking, and invest in expressive test APIs.




Prefer Narrow Assertions in Unit Tests

This article was adapted from a Google Testing on the Toilet (TotT) episode. You can download a printer-friendly version of this TotT episode and post it in your office.

by Kai Kent

Your project is adding a loyalty promotion feature, so you add a new column CREATION_DATE to the ACCOUNT table. Suddenly the test below starts failing. Can you spot the problem?

TEST_F(AccountTest, UpdatesBalanceAfterWithdrawal) {

  ASSERT_OK_AND_ASSIGN(Account account,

                       database.CreateNewAccount(/*initial_balance=*/5000));

  ASSERT_OK(account.Withdraw(3000));

  const Account kExpected = { .balance = 2000, /* a handful of other fields */ };

  EXPECT_EQ(account, kExpected);

}

You forgot to update the test for the newly added column; but the test also has an underlying problem:

It checks for full equality of a potentially complex object, and thus implicitly tests unrelated behaviors. Changing anything in Account, such as adding or removing a field, will cause all the tests with a similar pattern to fail. Broad assertions are an easy way to accidentally create brittle tests  - tests that fail when anything about the system changes, and need frequent fixing even though they aren't finding real bugs.

Instead, the test should use narrow assertions that only check the relevant behavior. The example test should be updated to only check the relevant field account.balance:

TEST_F(AccountTest, UpdatesBalanceAfterWithdrawal) {

  ASSERT_OK_AND_ASSIGN(Account account,

                       database.CreateNewAccount(/*initial_balance=*/5000));

  ASSERT_OK(account.Withdraw(3000));

  EXPECT_EQ(account.balance, 2000);

}

Broad assertions should only be used for unit tests that care about all of the implicitly tested behaviors, which should be a small minority of unit tests. Prefer to have at most one such test that checks for full equality of a complex object for the common case, and use narrow assertions for all other cases.

Similarly, when writing frontend unit tests, use one screenshot diff test to verify the layout of your UI, but test individual behaviors with narrow DOM assertions.

For testing large protocol buffers, some languages provide libraries for verifying a subset of proto fields in a single assertion, such as:

Prefer Narrow Assertions in Unit Tests

This article was adapted from a Google Testing on the Toilet (TotT) episode. You can download a printer-friendly version of this TotT episode and post it in your office.

by Kai Kent

Your project is adding a loyalty promotion feature, so you add a new column CREATION_DATE to the ACCOUNT table. Suddenly the test below starts failing. Can you spot the problem?

TEST_F(AccountTest, UpdatesBalanceAfterWithdrawal) {

  ASSERT_OK_AND_ASSIGN(Account account,

                       database.CreateNewAccount(/*initial_balance=*/5000));

  ASSERT_OK(account.Withdraw(3000));

  const Account kExpected = { .balance = 2000, /* a handful of other fields */ };

  EXPECT_EQ(account, kExpected);

}

You forgot to update the test for the newly added column; but the test also has an underlying problem:

It checks for full equality of a potentially complex object, and thus implicitly tests unrelated behaviors. Changing anything in Account, such as adding or removing a field, will cause all the tests with a similar pattern to fail. Broad assertions are an easy way to accidentally create brittle tests  - tests that fail when anything about the system changes, and need frequent fixing even though they aren't finding real bugs.

Instead, the test should use narrow assertions that only check the relevant behavior. The example test should be updated to only check the relevant field account.balance:

TEST_F(AccountTest, UpdatesBalanceAfterWithdrawal) {

  ASSERT_OK_AND_ASSIGN(Account account,

                       database.CreateNewAccount(/*initial_balance=*/5000));

  ASSERT_OK(account.Withdraw(3000));

  EXPECT_EQ(account.balance, 2000);

}

Broad assertions should only be used for unit tests that care about all of the implicitly tested behaviors, which should be a small minority of unit tests. Prefer to have at most one such test that checks for full equality of a complex object for the common case, and use narrow assertions for all other cases.

Similarly, when writing frontend unit tests, use one screenshot diff test to verify the layout of your UI, but test individual behaviors with narrow DOM assertions.

For testing large protocol buffers, some languages provide libraries for verifying a subset of proto fields in a single assertion, such as:

What’s in a Name?

This is another post in our Code Health series. A version of this post originally appeared in Google bathrooms worldwide as a Google Testing on the Toilet episode. You can download a printer-friendly version to display in your office.

by Adam Raider


“There are only two hard things in computer science: cache invalidation and naming things.” —Phil Karlton

Have you ever read an identifier only to realize later it doesn’t do what you expected? Or had to read the implementation in order to understand an interface? These indirections eat up our cognitive bandwidth and make our work more difficult. We spend far more time reading code than we do writing it; thoughtful names can save the reader (and writer) a lot of time and frustration. Here are some naming tips:

  • Spend time considering names—it’s worth it. Don’t default to the first name that comes to mind. The more public the name, the more expensive it is to change. Past a certain scale, names become infeasible to change, especially for APIs. Pay attention to a name in proportion to the cost of renaming it later. If you’re feeling stuck, consider running a new name by a teammate.

  • Describe behavior. Encourage naming based on what functions do rather than when the functions are called. Avoid prefixes like “handle” or “on” as they describe when and provide no added meaning:

button.listen('click', handleClick)

button.listen('click', addItemToCart)

  • Reveal intent with a contextually appropriate level of abstraction

    • High-abstraction functions describe the what and operate on high-level types.

    • Lower-abstraction functions describe the how and operate on lower-level types.

For example, logout might call into clearUserToken, and recordWithCamera might call into parseStreamBytes.

  • Prefer unique, precise names. Are you frequently asking for the UserManager? Manager, Util, and similar suffixes are a common but imprecise naming convention. What does it do? It manages! If you’re struggling to come up with a more precise name, consider splitting the class into smaller ones. 

  • Balance clarity and conciseness—use abbreviations with care. Commonly used abbreviations, such as HTML, i18n, and RPC, can aid communication but less-known ones can confuse your average readers. Ask yourself, “Will my readers immediately understand this label? Will a reader five years from now understand it?” 

  • Avoid repetition and filler words. Or in other words, don’t say the same thing twice. It adds unnecessary visual noise:

userData.userBirthdayDate

user.birthDate

  • Software changes—names should, too. If you see an identifier that doesn’t aptly describe itself—fix it!

Learn more about identifier naming in this post: IdentifierNamingPostForWorldWideWebBlog.



Increase Test Fidelity By Avoiding Mocks

This article was adapted from a Google Testing on the Toilet (TotT) episode. You can download a printer-friendly version of this TotT episode and post it in your office.

by Andrew Trenk and Dillon Bly

Replacing your code’s dependencies with mocks can make unit tests easier to write and faster to run. However, among other problems, using mocks can lead to tests that are less effective at catching bugs.

The fidelity of a test refers to how closely the behavior of the test resembles the behavior of the production code. A test with higher fidelity gives you higher confidence that your code will work properly. 

When specifying a dependency to use in a test, prefer the highest-fidelity option. Learn more in the Test Doubles chapter of the Software Engineering at Google book.

  1. Try to use the real implementation. This provides the most fidelity, because the code in the implementation will be executed in the test. There may be tradeoffs when using a real implementation: they can be slow, non-deterministic, or difficult to instantiate (e.g., it connects to an external server). Use your judgment to decide if a real implementation is the right choice.
  2. Use a fake if you can’t use the real implementation. A fake is a lightweight implementation of an API that behaves similarly to the real implementation, e.g., an in-memory database. A fake ensures a test has high fidelity, but takes effort to write and maintain; e.g., it needs its own tests to ensure that it conforms to the behavior of the real implementation. Typically, the owner of the real implementation creates and maintains the fake.
  3. Use a mock if you can’t use the real implementation or a fake. A mock reduces fidelity, since it doesn’t execute any of the actual implementation of a dependency; its behavior is specified inline in a test (a technique known as stubbing), so it may diverge from the behavior of the real implementation. Mocks provide a basic level of confidence that your code works properly, and can be especially useful when testing a code path that is hard to trigger (e.g., an error condition such as a timeout).
    (Note: Although “mocks” are objects created using mocking frameworks such as Mockito or unittest.mock, the same problems will occur if you manually create your own implementation within tests.)

A low-fidelity test: Dependencies are replaced with mocks. Try to avoid this.

A high-fidelity test: Dependencies use real implementations or fakes. Prefer this.

@Mock OrderValidator validator;

@Mock PaymentProcessor processor;

...


ShoppingCart cart =

new ShoppingCart(

validator, processor);

OrderValidator validator =

createValidator();

PaymentProcessor processor =

new FakeProcessor();


...


ShoppingCart cart =

    new ShoppingCart(

validator, processor);

Aim for as much fidelity as you can achieve without increasing the size of a test. At Google, tests are classified by size. Most tests should be small: they must run in a single process and must not wait on a system or event outside of their process. Increasing the fidelity of a small test is often a good choice if the test stays within these constraints. A healthy test suite also includes medium and large tests, which have higher fidelity since they can use heavyweight dependencies that aren’t feasible to use in small tests, e.g., dependencies that increase execution times or call other processes.

Let Code Speak for Itself

This is another post in our Code Health series. A version of this post originally appeared in Google bathrooms worldwide as a Google Testing on the Toilet episode. You can download a printer-friendly version to display in your office.

by Shiva Garg and Francois Aube

Comments can be invaluable for understanding and maintaining a code base.  But excessive comments in code can become unhelpful clutter full of extraneous and/or outdated detail.

Comments that offer useless (or worse, obsolete) information hurt readability. Here are some tips to let your code speak for itself: 

  • Write comments to explain the “why” behind a certain approach in code. The comment below has two good reasons to exist: documenting non-obvious behavior and answering a question that a reader is likely to have (i.e. why doesn’t this code render directly on the screen?):

// Eliminate flickering by rendering the next frame off-screen and swapping into the

// visible buffer.

RenderOffScreen();

SwapBuffers();

  • Use well-named identifiers to guide the reader and reduce the need for comments:

// Payout should not happen if the user is

// in an ineligible country.

std::unordered_set<std::string> ineligible =

  {"Atlantis", "Utopia"};

if (!ineligible.contains(country)) {

  Payout(user.user_id);

}

if (IsCountryEligibleForPayout(country)) { Payout(user.user_id); }

  • Write function comments (a.k.a. API documentation) that describe intended meaning and purpose, not implementation details. Choose unambiguous function signatures that callers can use  without reading any documentation. Don’t explain inner details that could change without affecting the contract with the caller:

// Reads an input string containing either a

// number of milliseconds since epoch or an

// ISO 8601 date and time. Invokes the

// Sole, Laces, and ToeCap APIs, then

// returns an object representing the Shoe

// available then or nullptr if none were.

Shoe* ModelAvailableAt(char* time);

// Returns the Shoe that was available for

// purchase at `time`. If no model was

// available, throws a runtime_error.

Shoe ModelAvailableAt(time_t time);

  • Omit comments that state the obvious. Superfluous comments increase code maintenance when code gets refactored and don’t add value, only overhead to keep these comments current:

// Increment counter by 1.

counter++;

Learn more about writing good comments: To Comment or Not to Comment?, Best practices for writing code comments



Exceptional Exception Handling

This is another post in our Code Health series. A version of this post originally appeared in Google bathrooms worldwide as a Google Testing on the Toilet episode. You can download a printer-friendly version to display in your office.

by Yiming Sun

Have you ever seen huge exception-handling blocks? Here is an example. Let's assume we are calling bakePizza() to bake a pizza, and it can be overbaked, throwing a PizzaOverbakedException.

class PizzaOverbakedException extends Exception {};


void bakePizza () throws PizzaOverbakedException {};


try {

  // 100+ lines of code to prepare pizza ingredients.

  ...

  bakePizza();

  // Another 100+ lines of code to deliver pizza to a customer.

  ...

} catch (Exception e) {

  throw new IllegalStateException(); // Root cause ignored while throwing new exception.

}

Here are the problems with the above code:

  • Obscuring the logic. The method bakePizza(), is obscured by the additional lines of code of preparation and delivery, so unintended exceptions from preparation and delivery may be caught.
  • Catching the general exception. catch (Exception e) will catch everything, despite that we might only want to handle PizzaOverbakedException here.
  • Rethrowing a general exception, with the original exception ignored. This means that the root cause is lost - we don't know what exactly goes wrong with pizza baking while debugging.

Here is a better alternative, rewritten to avoid the problems above.

class PizzaOverbakedException extends Exception {};


void bakePizza () throws PizzaOverbakedException {};


// 100+ lines of code to prepare pizza ingredients.

...

try {

  bakePizza();

} catch (PizzaOverbakedException e) {  // Other exceptions won’t be caught.

  // Rethrow a more meaningful exception; so that we know pizza is overbaked.

  throw new IllegalStateException(“You burned the pizza!”, e);  

}

// Another 100+ lines of code to deliver pizza to a customer.

...

Clean Up Code Cruft

This is another post in our Code Health series. A version of this post originally appeared in Google bathrooms worldwide as a Google Testing on the Toilet episode. You can download a printer-friendly version to display in your office.

By Per Jacobsson

The book Clean Code discusses a camping rule that is good to keep in the back of your mind when writing code:


Leave the campground cleaner than you found it


So how does that fit into software development? The thinking is this: When you make changes to code that can potentially be improved, try to make it just a little bit better.

This doesn't necessarily mean you have to go out of your way to do huge refactorings. Changing something small can go a long way:

  • Rename a variable to something more descriptive. 

  • Break apart a huge function into a few logical pieces.

  • Fix a lint warning.

  • Bring an outdated comment up to date.

  • Extract duplicated lines to a function.

  • Write a unit test for an untested function.

  • Whatever other itch you feel like scratching.

Cleaning up the small things often makes it easier to see and fix the bigger issues.

But what about "If it's not broken, don't fix it"? Changing code can be risky, right? There's no obvious rule, but if you're always afraid to change your code, you have bigger problems. Cruft in code that is actively being changed is like credit card debt. Either you pay it off, or you eventually go bankrupt.  

Unit tests help mitigate the risk of changing code. When you're doing cleanup work, be sure there are unit tests for the things you're about to change. This may mean writing a few new ones yourself.

If you’re working on a change and end up doing some minor cleanup, you can often include these cleanups in the same change. Be careful to not distract your code reviewer by adding too many unrelated cleanups. An option that works well is to send the cleanup fixes in multiple tiny changes that are small enough to just take a few seconds to review

As mentioned in the book: "Can you imagine working on a project where the code simply got better as time passed?"

“Clean Code: A Handbook of Agile Software Craftsmanship” by Robert C. Martin was published in 2008.

Write Clean Code to Reduce Cognitive Load

This is another post in our Code Health series. A version of this post originally appeared in Google bathrooms worldwide as a Google Testing on the Toilet episode. You can download a printer-friendly version to display in your office.

By Andrew Trenk

Do you ever read code and find it hard to understand? You may be experiencing cognitive load!

Cognitive load refers to the amount of mental effort required to complete a task. When reading code, you have to keep in mind information such as values of variables, conditional logic, loop indices, data structure state, and interface contracts. Cognitive load increases as code becomes more complex. People can typically hold up to 5–7 separate pieces of information in their short-term memory (source); code that involves more information than that can be difficult to understand.

Two brains displayed side-by-side. 

The left brain is red with a sad face. The text below it says 'Complex code: Too much cognitive load'.;

The left brain is green with a happy face. The text below it says 'Simple code: Minimal cognitive load'.

Cognitive load is often higher for other people reading code you wrote than it is for yourself, since readers need to understand your intentions. Think of the times you read someone else’s code and struggled to understand its behavior. One of the reasons for code reviews is to allow reviewers to check if the changes to the code cause too much cognitive load. Be kind to your co-workers: reduce their cognitive load by writing clean code.

The key to reducing cognitive load is to make code simpler so it can be understood more easily by readers. This is the principle behind many code health practices. Here are some examples:

  • Limit the amount of code in a function or file. Aim to keep the code concise enough that you can keep the whole thing in your head at once. Prefer to keep functions small, and try to limit each class to a single responsibility.

  • Create abstractions to hide implementation details. Abstractions such as functions and interfaces allow you to deal with simpler concepts and hide complex details. However, remember that over-engineering your code with too many abstractions also causes cognitive load.

  • Simplify control flow. Functions with too many if statements or loops can be hard to understand since it is difficult to keep the entire control flow in your head. Hide complex logic in helper functions, and reduce nesting by using early returns to handle special cases.

  • Minimize mutable state. Stateless code is simpler to understand. For example, avoid mutable class fields when possible, and make types immutable.

  • Include only relevant details in tests. A test can be hard to follow if it includes boilerplate test data that is irrelevant to the test case, or relevant test data is hidden in helper functions.

  • Don’t overuse mocks in tests. Improper use of mocks can lead to tests that are cluttered with calls that expose implementation details of the system under test.

Learn more about cognitive load in the book The Programmer’s Brain, by Felienne Hermans.

Include Only Relevant Details In Tests

This article was adapted from a Google Testing on the Toilet (TotT) episode. You can download a printer-friendly version of this TotT episode and post it in your office.

By Dagang Wei

What problem in the code below makes the test hard to follow?

def test_get_balance(self):

  settings = BankSettings(FDIC_INSURED, REGULATED, US_BASED)

  account = Account(settings, ID, BALANCE, ADDRESS, NAME, EMAIL, PHONE)

  self.assertEqual(account.GetBalance(), BALANCE)

The problem is that there is a lot of noise in the account creation code, which makes it hard to tell which details are relevant to the assert statement. 

But going from one extreme to the other can also make the test hard to follow:

def test_get_balance(self):

  account = _create_account()

  self.assertEqual(account.GetBalance(), BALANCE)

Now the problem is that critical details are hidden in the _create_account() helper function, so it’s not obvious where the BALANCE field comes from. In order to understand the test, you need to switch context by diving into the helper function.

A good test should include only details relevant to the test, while hiding noise:

def test_get_balance():

  account = _create_account(BALANCE)

  self.assertEqual(account.GetBalance(), BALANCE)

By following this advice, it should be easy to see the flow of data throughout a test. For example:

Bad (flow of data is hidden):

Good (flow of data is clear):

def test_bank_account_overdraw_fails(self):

  account = _create_account()

  outcome = _overdraw(account)

  self._assert_withdraw_failed(

    outcome, account)


def _create_account():

  settings = BankSettings(...)

  return Account(settings, BALANCE, ...)


def _overdraw(account):

  # Boilerplate code

  ...

  return account.Withdraw(BALANCE + 1)


def _assert_withdraw_failed(

    self, outcome, account):

  self.assertEqual(outcome, FAILED)

  self.assertEqual(

    account.GetBalance(), BALANCE)

def test_bank_account_overdraw_fails(self):

  account = _create_account(BALANCE)

  outcome = _withdraw(account, BALANCE + 1)

  self.assertEqual(outcome, FAILED)

  self.assertEqual(

    account.GetBalance(), BALANCE)

def _create_account(balance):

  settings = BankSettings(...)

  return Account(settings, balance, ...)


def _withdraw(account, amount):

  # Boilerplate code

  ...

  return account.Withdraw(amount)