Category Archives: Google Testing Blog

If it ain’t broke, you’re not trying hard enough

Tech on the Toilet: Driving Software Excellence, One Bathroom Break at a Time

By Kanu Tewary and Andrew Trenk


Tech on the Toilet (TotT) is a weekly one-page publication about software development that is posted in bathrooms in Google offices worldwide. At Google, TotT is a trusted source for high-quality technical content and software engineering best practices. TotT episodes relevant outside Google are posted to this blog.


We have been posting TotT to this blog since 2007. We're excited to announce that Testing on the Toilet has been renamed to Tech on the Toilet. TotT originally covered only software testing topics, but for many years has been covering any topics relevant to software development, such as coding practices, machine learning, web development, and more.


A Cultural Institution


TotT is a grassroots effort with a mission to deliver easily digestible one-pagers on software development to engineers in the most unexpected of places: bathroom stalls! But TotT is more than just bathroom reading -- it's a movement. Driven by a team of 20-percent volunteers, TotT empowers Google employees to learn and grow, fostering a culture of excellence within the Google engineering community.



Photos of TotT posted in bathroom stalls at Google.


Anyone at Google can author a TotT episode (regardless of tenure or seniority). Each episode is carefully curated and edited to provide concise, actionable, authoritative information about software best practices and developer tools. After an episode is published, it is posted to Google bathrooms around the world, and is also available to read online internally at Google. TotT episodes often become a canonical source for helping far-flung teams standardize their software development tools and practices.  


Because Every Superhero Has An Origin Story 

TotT began as a bottom-up approach to drive a culture change. The year was 2006 and Google was experiencing rapid growth and huge challenges: there were many costly bugs and rolled-back releases. A small group of engineers, members of the so-called Testing Grouplet, passionate about testing, brainstormed about how to instill a culture of software testing at Google. In a moment of levity, someone suggested posting flyers in restrooms (since people have time to read there, clearly!). The Testing Grouplet named their new publication Testing on the Toilet. TotT’s red lightbulb, green lightbulb logo, displayed at the top of the page of each printed flyer, was adapted from the Testing Grouplet’s logo.

The TotT logo.

The first TotT episode, a simple code example with a suggested improvement, was written by an engineer at Google headquarters in Mountain View, and posted by a volunteer in Google bathrooms in London. Soon other engineers wrote episodes, and an army of volunteers started posting those episodes at their sites. Hundreds of engineers started encountering TotT episodes.

The initial response was a mix of surprise and intrigue, with some engineers even expressing outrage at the "violation" of their bathroom sanctuary. However, the majority of feedback was positive, with many appreciating the readily accessible knowledge. Learn more about the history of TotT in this blog post by one of the original members of the Testing Grouplet.  

Trusted, Concise, Actionable

TotT has become an authoritative source for software development best practices at Google. Many popular episodes are cited hundreds of times in code reviews and other internal documents.

A 2019 research paper presented at the International Conference on Software Engineering even analyzed the impact of TotT episodes on the adoption of internal tools and infrastructure, demonstrating its effectiveness in driving positive change.

TotT has inspired various other publications at Google, like Learning on the Loo: non-technical articles to improve efficiency, reduce stress and improve work satisfaction. Other companies have been inspired to create their own bathroom publications, thanks to TotT. So the next time you find yourself reading a TotT episode, take a moment to appreciate its humble bathroom beginnings. After all, where better to ponder the mysteries of the code than in a place of quiet contemplation?

SMURF: Beyond the Test Pyramid

This article was adapted from a Google Testing on the Toilet (TotT) episode. You can download a printer-friendly version of this TotT episode and post it in your office.

By Adam Bender

The test pyramid is the canonical heuristic for guiding test suite evolution. It conveys a simple message: prefer more unit tests than integration tests, and more integration tests than end-to-end tests.

A diagram of the test pyramid

While useful, the test pyramid lacks the details you need as your test suite grows and you face challenging trade-offs. To scale your test suite, go beyond the test pyramid.

The SMURF mnemonic is an easy way to remember the tradeoffs to consider when balancing your test suite:

  • Speed: Unit tests are faster than other test types and can be run more often—you’ll catch problems sooner.

  • Maintainability: The aggregated cost of debugging and maintaining tests (of all types) adds up quickly. A larger system under test has more code, and thus greater exposure to dependency churn and requirement drift which, in turn, creates more maintenance work.  

  • Utilization: Tests that use fewer resources (memory, disk, CPU) cost less to run. A good test suite optimizes resource utilization so that it does not grow super-linearly with the number of tests. Unit tests usually have better utilization characteristics, often because they use test doubles or only involve limited parts of a system. 

  • Reliability: Reliable tests only fail when an actual problem has been discovered. Sorting through flaky tests for problems wastes developer time and costs resources in rerunning the tests. As the size of a system and its corresponding tests grow, non-determinism (and thus, flakiness) creeps in, and your test suite is more likely to become unreliable.

  • Fidelity: High-fidelity tests come closer to approximating real operating conditions (e.g., real databases or traffic loads) and better predict the behavior of our production systems. Integration and end-to-end tests can better reflect realistic conditions, while unit tests have to simulate the environment, which can lead to drift between test expectations and reality.

A radar chart depicting the relationship between SMURF attributes as applied to unit, integration, and end-to-end tests. Unit tests perform best on all attributes except fidelity, where they are the worst. Integration tests are mid-way performers on all aspects. End-to-end tests are worst on all aspects, except fidelity where they are the best.

A radar chart of Test Type vs. Test Property (i.e., SMURF). Farther from center is better.


In many cases, the relationships between the SMURF dimensions are in tension: improving one dimension can affect the others. However, if you can improve one or more dimensions of a test without harming the others, then you should do so. When thinking about the types of your tests (unit, integration, end-to-end), your choices have meaningful implications for your test suite’s cost and the value it provides.



Write Change-Resilient Code With Domain Objects

This is another post in our Code Health series. A version of this post originally appeared in Google bathrooms worldwide as a Google Testing on the Toilet episode. You can download a printer-friendly version to display in your office.

By Amy Fu

Although a product's requirements can change often, its fundamental ideas usually change slowly. This leads to an interesting insight: if we write code that matches the fundamental ideas of the product, it will be more likely to survive future product changes.

Domain objects are building blocks (such as classes and interfaces) in our code that match the fundamental ideas of the product. Instead of writing code to match the desired behavior for the product's requirements ("configure text to be white"), we match the underlying idea ("text color settings").

For example, imagine you’re part of the gPizza team, which sells tasty, fresh pizzas to feed hungry Googlers. Due to popular demand, your team has decided to add a delivery service.

Without domain objects, the quickest path to pizza delivery is to simply create a deliverPizza method:

public class DeliveryService {
  public void deliverPizza(List<Pizza> pizzas) { ... }
}

Although this works well at first, what happens if gPizza expands its offerings to other foods?
You could add a new method:

  public void deliverWithDrinks(List<Pizza> pizzas, List<Drink> drinks) { ... }

But as your list of requirements grows (snacks, sweets, etc.), you’ll be stuck adding more and more methods. How can you change your initial implementation to avoid this continued maintenance burden?

You could add a domain object that models the product's ideas, instead of its requirements:

  • A use case is a specific behavior that helps the product satisfy its business requirements.
    (In this case, "Deliver pizzas so we make more money".)

  • A domain object represents a common idea that is shared by several similar use cases.

To identify the appropriate domain object, ask yourself:

  1. What related use cases does the product support, and what do we plan to support in future?

A: gPizza wants to deliver pizzas now, and eventually other products such as drinks and snacks.

  2. What common idea do these use cases share?

A: gPizza wants to send the customer the food they ordered.

  3. What is a domain object we can use to represent this common idea?

A: The domain object is a food order. We can encapsulate the use cases in a FoodOrder class.

Domain objects can be a useful generalization - but avoid choosing objects that are too generic, since there is a tradeoff between improved maintainability and more complex, ambiguous code. Generally, aim to support only planned use cases - not all possible use cases (see YAGNI principles).

// GOOD: It's clear what we're delivering.
public void deliver(FoodOrder order) {}

// BAD: Don't support furniture delivery.
public void deliver(DeliveryList items) {}
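As a rough illustration of the FoodOrder idea, here is a minimal Python sketch (the article's own examples are Java). The OrderItem class, the field names, and the summary string are all hypothetical, chosen only to show a single delivery entry point absorbing new kinds of food without new methods.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class OrderItem:
    """One line of an order. Hypothetical shape, for illustration only."""
    name: str
    quantity: int = 1


@dataclass
class FoodOrder:
    """Domain object: the food a customer ordered, whatever kind it is."""
    customer: str
    items: list[OrderItem] = field(default_factory=list)

    def add(self, name: str, quantity: int = 1) -> "FoodOrder":
        self.items.append(OrderItem(name, quantity))
        return self  # returned so calls can be chained


def deliver(order: FoodOrder) -> str:
    """Single delivery entry point; new food types need no new method."""
    if not order.items:
        raise ValueError("Cannot deliver an empty order")
    summary = ", ".join(f"{item.quantity} x {item.name}" for item in order.items)
    return f"Delivering to {order.customer}: {summary}"


order = FoodOrder("Ada").add("pizza", 2).add("lemonade")
print(deliver(order))
```

Adding drinks or snacks later changes the data passed in, not the signature of deliver, which is the maintenance benefit the post describes.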

Learn more about domain objects and the more advanced topic of domain-driven design in the book Domain-Driven Design by Eric Evans.

Less Is More: Principles for Simple Comments

This is another post in our Code Health series. A version of this post originally appeared in Google bathrooms worldwide as a Google Testing on the Toilet episode. You can download a printer-friendly version to display in your office.

By David Bendory

Simplicity is the ultimate sophistication. — Leonardo da Vinci

You’re staring at a wall of code resembling a Gordian knot of Klingon. What’s making it worse? A sea of code comments so long that you’d need a bathroom break just to read them all! Let’s fix that.

  • Adopt the mindset of someone unfamiliar with the project to ensure simplicity. One approach is to separate the process of writing your comments from reviewing them; proofreading your comments without code context in mind helps ensure they are clear and concise for future readers.

  • Use self-contained comments to clearly convey intent without relying on the surrounding code for context. If you need to read the code to understand the comment, you’ve got it backwards!

Not self-contained; requires reading the code:

// Respond to flashing lights in
// rearview mirror.

Suggested alternative:

// Pull over for police and/or yield to
// emergency vehicles.

while flashing_lights_in_rearview_mirror() {
  move_to_slower_lane() || stop_on_shoulder();
}

  • Include only essential information in the comments and leverage external references to reduce cognitive load on the reader. For comments suggesting improvements, links to relevant bugs or docs keep comments concise while providing a path for follow-up. Note that linked docs may be inaccessible, so use judgment in deciding how much context to include directly in the comments.

Too much potential improvement in the comment:

// The local bus offers good average-
// case performance. Consider using
// the subway which may be faster
// depending on factors like time of
// day, weather, etc.

Suggested alternative:

// TODO: Consider various factors to
// present the best transit option.
// See issuetracker.fake/bus-vs-subway

commute_by_local_bus();

  • Avoid extensive implementation details in function-level comments. When implementations change, such details often result in outdated comments. Instead, describe the public API contract, focusing on what the function does.

Too much implementation detail:

// For high-traffic intersections
// prone to accidents, pass through
// the intersection and make 3 right
// turns, which is equivalent to
// turning left.

Suggested alternative:

// Perform a safe left turn at a
// high-traffic intersection.
// See discussion in
// dangerous-left-turns.fake/about.

fn safe_turn_left() {
  go_straight();
  for i in 0..3 {
    turn_right();
  }
}


In Praise of Small Pull Requests

This is another post in our Code Health series. A version of this post originally appeared in Google bathrooms worldwide as a Google Testing on the Toilet episode. You can download a printer-friendly version to display in your office.


By Elliotte Rusty Harold

Note: A “pull request” refers to one self-contained change that has been submitted to version control or which is undergoing code review. At Google, this is referred to as a “CL”, which is short for “changelist”.

Prefer small, focused pull requests that do exactly one thing each. Why? Several reasons:
  • Small pull requests are easier to review. A mistake in a focused pull request is more obvious. In a 40-file pull request that does several things, would you notice that one if statement had its logic reversed and was using true instead of false? By contrast, if that if block and its test were the only things that changed in a pull request, you’d be a lot more likely to catch the error.

  • Small pull requests can be reviewed quickly. A reviewer can often respond quickly by slipping small reviews in between other tasks. Larger pull requests are a big task by themselves, often waiting until the reviewer has a significant chunk of time.

  • If something does go wrong and your continuous build breaks on a small pull request, the small size makes it much easier to figure out exactly where the mistake is. Small pull requests are also easier to roll back.

  • By virtue of their size, small pull requests are less likely to conflict with other developers’ work. Merge conflicts are less frequent and easier to resolve.

  • If you’ve made a critical error, it saves a lot of work when the reviewer can point this out after you’ve only gone a little way down the wrong path. Better to find out after an hour than after several weeks.

  • Pull request descriptions are more accurate when pull requests are focused on one task. The revision history becomes easier to read.

  • Small pull requests can lead to increased code coverage because it’s easier to make sure each individual pull request is completely tested.

Small pull requests are not always possible. In particular:

  • Frequent pull requests require reviewers to respond quickly to code review requests. If it takes multiple hours to get a pull request reviewed, developers spend more time blocked. Small pull requests often work better when reviewers are co-located (ideally within Nerf gun range for gentle reminders). 

  • Some features cannot safely be committed in partial states. If this is a concern, try to put the new feature behind a flag.

  • Refactorings such as changing an argument type in a public method may require modifying many dozens of files at once.

Nonetheless, even if a pull request can’t be small, it can still be focused, e.g., fixing one bug, adding one feature or UI element, or refactoring one method.
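The flag suggestion above can be sketched in a few lines of Python. This is a minimal illustration, not a real flag system: the FLAGS dictionary, the new_checkout flag name, and both checkout functions are hypothetical.

```python
# Hypothetical in-process flag store; real systems usually load flags
# from configuration or a dedicated flag service.
FLAGS = {"new_checkout": False}


def legacy_checkout(cart_total: float) -> str:
    return f"legacy checkout: {cart_total:.2f}"


def new_checkout(cart_total: float) -> str:
    # Partially built feature: safe to commit in small pieces because it
    # is unreachable while the flag is off.
    return f"new checkout: {cart_total:.2f}"


def checkout(cart_total: float) -> str:
    """Route to the new code path only when the flag is enabled."""
    if FLAGS.get("new_checkout", False):
        return new_checkout(cart_total)
    return legacy_checkout(cart_total)


print(checkout(12.5))  # flag is off, so the legacy path runs
```

Each small pull request can then extend new_checkout without affecting users, and turning the feature on becomes its own tiny, easily reverted change.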

Don’t DRY Your Code Prematurely

This is another post in our Code Health series. A version of this post originally appeared in Google bathrooms worldwide as a Google Testing on the Toilet episode. You can download a printer-friendly version to display in your office.

By Dan Maksimovich

Many of us have been told the virtues of “Don’t Repeat Yourself” or DRY. Pause and consider: Is the duplication truly redundant, or will the functionality need to evolve independently over time? Applying DRY principles too rigidly leads to premature abstractions that make future changes more complex than necessary.

Consider carefully if code is truly redundant or just superficially similar.  While functions or classes may look the same, they may also serve different contexts and business requirements that evolve differently over time. Think about how the functions’ purpose holds with time, not just about making the code shorter. When designing abstractions, do not prematurely couple behaviors that may evolve separately in the longer term.

When does introducing an abstraction harm our code? Let’s consider the following code: 

from datetime import datetime

# Premature DRY abstraction assuming
# uniform rules, limiting entity-specific
# changes.
class DeadlineSetter:
    def __init__(self, entity_type):
        self.entity_type = entity_type

    def set_deadline(self, deadline):
        if deadline <= datetime.now():
            raise ValueError(
                "Date must be in the future")

task = DeadlineSetter("task")
task.set_deadline(datetime(2024, 3, 12))
payment = DeadlineSetter("payment")
payment.set_deadline(datetime(2024, 3, 18))

# Repetitive but allows for clear,
# entity-specific logic and future
# changes.
def set_task_deadline(task_deadline):
    if task_deadline <= datetime.now():
        raise ValueError(
            "Date must be in the future")

def set_payment_deadline(payment_deadline):
    if payment_deadline <= datetime.now():
        raise ValueError(
            "Date must be in the future")

set_task_deadline(datetime(2024, 3, 12))
set_payment_deadline(datetime(2024, 3, 18))

The second approach (the separate set_task_deadline and set_payment_deadline functions) seems to violate the DRY principle, since its ValueError checks are identical. But they are only coincidentally the same: tasks and payments represent distinct concepts with potentially diverging logic. If payment dates later required a new validation, you could easily add it to the separate functions; adding it to the shared DeadlineSetter class is much more invasive.
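To make that divergence concrete, here is a hedged sketch of how the separate functions might look after one such change. The weekday-only rule for payments is an invented requirement, and the now parameter is added only so the example behaves the same whenever it is run.

```python
from datetime import datetime


def set_task_deadline(task_deadline: datetime, now: datetime) -> datetime:
    if task_deadline <= now:
        raise ValueError("Date must be in the future")
    return task_deadline


def set_payment_deadline(payment_deadline: datetime, now: datetime) -> datetime:
    if payment_deadline <= now:
        raise ValueError("Date must be in the future")
    # Hypothetical new rule that applies to payments only.
    if payment_deadline.weekday() >= 5:  # 5 = Saturday, 6 = Sunday
        raise ValueError("Payment date must be a weekday")
    return payment_deadline


now = datetime(2024, 3, 1)
saturday = datetime(2024, 3, 16)

set_task_deadline(saturday, now)  # still fine for tasks
try:
    set_payment_deadline(saturday, now)
except ValueError as err:
    print(err)
```

The task function is untouched by the new rule. With a shared DeadlineSetter, the same change would mean branching on entity_type inside set_deadline or splitting the class after the fact.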

When in doubt, keep behaviors separate until enough common patterns emerge over time that justify the coupling. On a small scale, managing duplication can be simpler than resolving a premature abstraction’s complexity. In early stages of development, tolerate a little duplication and wait to abstract. 

Future requirements are often unpredictable. Think about the “You Aren’t Gonna Need It” or YAGNI principle. Either the duplication will prove to be a nonissue, or with time, it will clearly indicate the need for a well-considered abstraction.


Avoid the Long Parameter List

This is another post in our Code Health series. A version of this post originally appeared in Google bathrooms worldwide as a Google Testing on the Toilet episode. You can download a printer-friendly version to display in your office.

By Gene Volovich

Have you seen code like this?

void transform(String fileIn, String fileOut, String separatorIn, String separatorOut);

This seems simple enough, but it can be difficult to remember the parameter ordering. It gets worse if you add more parameters (e.g., to specify the encoding, or to email the resulting file):

void transform(String fileIn, String fileOut, String separatorIn, String separatorOut,

    String encoding, String mailTo, String mailSubject, String mailTemplate);

To make the change, will you add another (overloaded) transform method? Or add more parameters to the existing method, and update every single call to transform? Neither seems satisfactory.

One solution is to encapsulate groups of the parameters into meaningful objects. The CsvFile class used here is a “value object”: simply a holder for the data.

class CsvFile {
  CsvFile(String filename, String separator, String encoding) { ... }
  String filename() { return filename; }
  String separator() { return separator; }
  String encoding() { return encoding; }
}
// ... and do the same for the EmailMessage class

void transform(CsvFile src, CsvFile target, EmailMessage resultMsg) { ... }

How to define a value object varies by language. For example, in Java, you can use a record class, which is available in Java 16+ (for older versions of Java, you can use AutoValue to generate code for the value object); in Kotlin, you can use a data class; in C++, you can use an option struct.

Using a value object this way may still result in a long parameter list when instantiating it. Solutions for this vary by language. For example, in Python, you can use keyword arguments and default parameter values to shorten the parameter list; in Java, one option is to use the Builder pattern, which lets you call a separate function to set each field, and allows you to skip setting fields that have default values.

CsvFile src = CsvFile.builder().setFilename("a.txt").setSeparator(":").build();
CsvFile target = CsvFile.builder().setFilename("b.txt").setEncoding(UTF_8).build();
EmailMessage msg =
    EmailMessage.builder().setMailTo(rcpt).setMailTemplate("template").build();
transform(src, target, msg);

Always try to group data that belongs together and break up long, complicated parameter lists. The result will be code that is easier to read and maintain, and harder to make mistakes with. 
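The Python option mentioned above (keyword arguments plus default values) might look like the following sketch. The field names mirror the Java CsvFile example; the default separator and encoding are assumptions made for illustration, and transform here only returns a description instead of doing real file I/O.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CsvFile:
    """Value object: a plain holder for the data describing one CSV file."""
    filename: str
    separator: str = ","      # assumed default
    encoding: str = "utf-8"   # assumed default


def transform(src: CsvFile, target: CsvFile) -> str:
    # Placeholder body: a real implementation would read src and write target.
    return (f"{src.filename} ({src.separator!r}, {src.encoding}) -> "
            f"{target.filename} ({target.separator!r}, {target.encoding})")


# Keyword arguments make each value self-describing at the call site,
# and defaults let callers skip fields they do not care about.
src = CsvFile(filename="a.txt", separator=":")
target = CsvFile(filename="b.txt", encoding="utf-16")
print(transform(src, target))
```

Because every argument is named, swapping the input and output separators by accident is much harder than with four positional strings.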

Test Failures Should Be Actionable

This article was adapted from a Google Testing on the Toilet (TotT) episode. You can download a printer-friendly version of this TotT episode and post it in your office.

By Titus Winters

There are a lot of rules and best practices around unit testing. There are many posts on this blog; there is deeper material in the Software Engineering at Google book; there is specific guidance for every major language; there is guidance on test frameworks, test naming, and dozens of other test-related topics. Isn’t this excessive?

Good unit tests have several important properties, but you could focus on a key principle: Test failures should be actionable.

When a test fails, you should be able to begin investigation with nothing more than the test’s name and its failure messages—no need to add more information and rerun the test.

Effective use of unit test frameworks and assertion libraries (JUnit, Truth, pytest, GoogleTest, etc.) serves two important purposes. Firstly, the more precisely we express the invariants we are testing, the more informative and less brittle our tests will be. Secondly, when those invariants don’t hold and the tests fail, the failure info should be immediately actionable. This meshes well with Site Reliability Engineering guidance on alerting.

Consider this example of a C++ unit test of a function returning an absl::Status (an Abseil type that holds either an “OK” status or one of a number of different error codes):

EXPECT_TRUE(LoadMetadata().ok());

Sample failure output:

load_metadata_test.cc:42: Failure
Value of: LoadMetadata().ok()
Expected: true
Actual: false

EXPECT_OK(LoadMetadata());

Sample failure output:

load_metadata_test.cc:42: Failure
Value of: LoadMetadata()
Expected: is OK
Actual: NOT_FOUND: /path/to/metadata.bin

If the EXPECT_TRUE test fails, you have to investigate why it failed; the EXPECT_OK test immediately gives you all the available detail, in this case because of a more precise GoogleTest matcher.
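The same contrast can be shown with plain Python assertions. This is only a sketch: the Status class and load_metadata function are hypothetical stand-ins, not the Abseil or GoogleTest APIs, and load_metadata always fails so that the two styles can be compared.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Status:
    """Hypothetical stand-in for a status type; not the Abseil API."""
    code: str
    message: str = ""

    @property
    def ok(self) -> bool:
        return self.code == "OK"


def load_metadata() -> Status:
    # Always fails, so the two assertion styles can be compared.
    return Status("NOT_FOUND", "/path/to/metadata.bin")


def check_vague() -> None:
    assert load_metadata().ok


def check_actionable() -> None:
    status = load_metadata()
    assert status.ok, f"expected OK, got {status.code}: {status.message}"


def failure_message(check) -> str:
    """Run a check and return the text of its AssertionError, if any."""
    try:
        check()
    except AssertionError as err:
        return str(err)
    return "passed"


print(repr(failure_message(check_vague)))
print(repr(failure_message(check_actionable)))
```

When run as a plain script, the first check fails with an empty message, while the second names the error code and the missing path, which is enough to start investigating without rerunning anything.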

Here are some other posts on this blog that emphasize making test failures actionable:

  • Writing Descriptive Test Names - If our tests are narrow and sufficiently descriptive, the test name itself may give us enough information to start debugging.

  • Keep Tests Focused - If we test multiple scenarios in a single test, it’s hard to identify exactly what went wrong.

  • Prefer Narrow Assertions in Unit Tests - If we have overly wide assertions (such as depending on every field of a complex output proto), the test may fail for many unimportant reasons. False positives are the opposite of actionable.

  • Keep Cause and Effect Clear - Refrain from using large global test data structures shared across multiple unit tests, allowing for clear identification of each test’s setup.

isBooleanTooLongAndComplex

This is another post in our Code Health series. A version of this post originally appeared in Google bathrooms worldwide as a Google Testing on the Toilet episode. You can download a printer-friendly version to display in your office.

By Yiming Sun

You may have come across some complex, hard-to-read Boolean expressions in your codebase and wished they were easier to understand. For example, let's say we want to decide whether a pizza is fantastic:

// Decide whether this pizza is fantastic.
if ((!pepperoniService.empty() || sausages.size() > 0)
    && (useOnionFlag.get() || hasMushroom(ENOKI, PORTOBELLO)) && hasCheese()) {
  ...
}

A first step toward improving this is to extract the condition into a well-named variable:

boolean isPizzaFantastic =
    (!pepperoniService.empty() || sausages.size() > 0)
    && (useOnionFlag.get() || hasMushroom(ENOKI, PORTOBELLO)) && hasCheese();

if (isPizzaFantastic) {
  ...
}

However, the Boolean expression is still too complex. It's potentially confusing to calculate the value of isPizzaFantastic from a given set of inputs. You might need to grab a pen and paper, or start a server locally and set breakpoints. 

Instead, try to group the details into intermediate Booleans that provide meaningful abstractions. Each Boolean below represents a single well-defined quality, and you no longer need to mix && and || within an expression. Without changing the business logic, you’ve made it easier to see how the Booleans relate to each other:

boolean hasGoodMeat = !pepperoniService.empty() || sausages.size() > 0;
boolean hasGoodVeggies = useOnionFlag.get() || hasMushroom(ENOKI, PORTOBELLO);
boolean isPizzaFantastic = hasGoodMeat && hasGoodVeggies && hasCheese();

Another option is to hide the logic in a separate method. This also offers the possibility of early returns using guard clauses, further reducing the need to keep track of intermediate states:

boolean isPizzaFantastic() {
  if (!hasCheese()) {
    return false;
  }
  if (pepperoniService.empty() && sausages.size() == 0) {
    return false;
  }
  return useOnionFlag.get() || hasMushroom(ENOKI, PORTOBELLO);
}
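One way to gain confidence that such refactorings preserve the business logic is to compare the versions over every combination of inputs. The sketch below does this in Python, replacing the article's service and flag calls with five plain Boolean parameters (an assumption made so the check is self-contained).

```python
from itertools import product


def original(has_pepperoni, has_sausage, use_onion, has_mushroom, has_cheese):
    return ((has_pepperoni or has_sausage)
            and (use_onion or has_mushroom) and has_cheese)


def with_intermediates(has_pepperoni, has_sausage, use_onion, has_mushroom,
                       has_cheese):
    has_good_meat = has_pepperoni or has_sausage
    has_good_veggies = use_onion or has_mushroom
    return has_good_meat and has_good_veggies and has_cheese


def with_guard_clauses(has_pepperoni, has_sausage, use_onion, has_mushroom,
                       has_cheese):
    if not has_cheese:
        return False
    if not has_pepperoni and not has_sausage:
        return False
    return use_onion or has_mushroom


# Exhaustively compare all 2**5 = 32 input combinations.
all_inputs = list(product([False, True], repeat=5))
mismatches = [
    args for args in all_inputs
    if not (original(*args) == with_intermediates(*args)
            == with_guard_clauses(*args))
]
print(f"{len(all_inputs)} cases checked, {len(mismatches)} mismatches")
```

An exhaustive check like this is practical only because there are few inputs. It also ignores evaluation order, which matters if the real calls have side effects.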