Author Archives: Google Testing Bloggers

Testing on the Toilet: Don’t Mock Types You Don’t Own

This article was adapted from a Google Testing on the Toilet (TotT) episode. You can download a printer-friendly version of this TotT episode and post it in your office.

By Stefan Kennedy and Andrew Trenk

The code below mocks a third-party library. What problems can arise when doing this?

// Mock a salary payment library
@Mock SalaryProcessor mockSalaryProcessor;
@Mock TransactionStrategy mockTransactionStrategy;
MyPaymentService myPaymentService = new MyPaymentService(mockSalaryProcessor);

Mocking types you don’t own can make maintenance more difficult:
  • It can make it harder to upgrade the library to a new version: The expectations of an API hardcoded in a mock can be wrong or get out of date. This may require time-consuming work to manually update your tests when upgrading the library version. In the above example, an update that changes addStrategy() to return a new type derived from TransactionStrategy (e.g. SalaryStrategy) requires the mock to be updated to return this type, even though the code under test doesn’t need to be changed since it can still reference TransactionStrategy.
  • It can make it harder to know whether a library update introduced a bug in your code: The assumptions built into mocks may get out of date as changes are made to the library, resulting in tests that pass even when the code under test has a bug. In the above example, if a library update changes paySalary() to instead return TransactionStrategy.SCHEDULED, a bug could potentially be introduced due to the code under test not handling this return value properly. However, the maintainer wouldn’t know because the mock would not return this value so the test would continue to pass.
Instead of using a mock, use the real implementation, or if that’s not feasible, use a fake implementation that is ideally provided by the library owner. This reduces the maintenance burden since the issues with mocks listed above don’t occur when using a real or fake implementation. For example:
FakeSalaryProcessor fakeProcessor = new FakeSalaryProcessor(); // Designed for tests
MyPaymentService myPaymentService = new MyPaymentService(fakeProcessor);

If you can’t use the real implementation and a fake implementation doesn’t exist (and library owners aren’t able to create one), create a wrapper class that calls the type, and mock this instead. This reduces the maintenance burden by avoiding mocks that rely on the signatures of the library API. For example:

@Mock MySalaryProcessor mockMySalaryProcessor; // Wraps the SalaryProcessor library
// Mock the wrapper class rather than the library itself

MyPaymentService myPaymentService = new MyPaymentService(mockMySalaryProcessor);

To avoid the problems listed above, prefer to test the wrapper class with calls to the real implementation. The downsides of testing with the real implementation (e.g. tests taking longer to run) are limited only to the tests for this wrapper class rather than tests throughout your codebase.
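As a rough illustration of the wrapper approach, here is a minimal Python sketch using the standard unittest.mock module. Every name in it (ThirdPartySalaryProcessor, pay_salary, MySalaryProcessor, send_salary, send_payment) is a hypothetical stand-in chosen for this example, not the API of any real payment library.

```python
from unittest import mock


class ThirdPartySalaryProcessor:
    """Hypothetical stand-in for a library type you don't own."""

    def pay_salary(self, employee_id: str) -> str:
        raise RuntimeError("talks to a real payment backend")


class MySalaryProcessor:
    """Wrapper you own: the only place that touches the library API."""

    def __init__(self, processor: ThirdPartySalaryProcessor) -> None:
        self._processor = processor

    def send_salary(self, employee_id: str) -> bool:
        # Translate the library's return value into a type you control.
        return self._processor.pay_salary(employee_id) == "SUCCESS"


class MyPaymentService:
    def __init__(self, salary_processor: MySalaryProcessor) -> None:
        self._salary_processor = salary_processor

    def send_payment(self, employee_id: str) -> str:
        ok = self._salary_processor.send_salary(employee_id)
        return "PAID" if ok else "FAILED"


def run_payment_with_mocked_wrapper() -> str:
    # Mock the wrapper (a type you own), not the library type.
    mock_wrapper = mock.create_autospec(MySalaryProcessor, instance=True)
    mock_wrapper.send_salary.return_value = True
    return MyPaymentService(mock_wrapper).send_payment("emp-1")
```

If the library later changes what pay_salary() returns, only MySalaryProcessor and its own tests need to change; tests that mock the wrapper stay as they are.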

“Don’t mock types you don’t own” is also described by Steve Freeman and Nat Pryce in their book, Growing Object-Oriented Software, Guided by Tests. For more details about the downsides of overusing mocks (even for types you do own), see this Google Testing Blog post.

COOL to be a TE @ Google

By Anantha Keesara

Test Engineers are a part of Google’s Engineering Productivity (EngProd) Group. As mentioned in a previous post, we advocate for our users, provide comprehensive testing solutions, and play a key role creating successful and reliable products and platforms. At Google, Test Engineers are not manual testers; we are technical engineers whose focus is on advancing product excellence and engineering productivity.

In short, it’s COOL (Constant learner, Out-of-the-box thinker, Orchestrator, Leading-edge user) to be a Test Engineer at Google:

Constant learning is what keeps Google Test Engineers motivated. We understand holistically how all the pieces of the software stack are interconnected and what kind of coverage exists or is needed to test the connections between the stacks. This product knowledge makes us test experts. We work closely with Software Engineers from the very beginning of the development process to discuss the testability of the designs before the features are implemented. We develop test strategies, methodologies, and test plans; we write scripts, design systems, and build tools and test infrastructure. We review design docs, do deep dives into Google's massive codebase, evaluate stack traces, and determine the root causes of production outages. Through this constant learning, we not only build deep technical expertise and do risk management by identifying weak spots in the code base, we also find creative ways to break software and identify potential problems. Our job ladder also gives us the flexibility and independence to explore and learn new technologies like ML concepts and Cloud computing and to build new testing solutions or improve existing ones.

Out-of-the-box thinking, a result of constant learning, is another thing that keeps us motivated. As Google Test Engineers, we champion Engineering excellence by providing optimized solutions to address engineering inefficiencies, testing gaps, and process gaps. We constantly think of ways to make machines do the work to increase testability and productivity. Hundreds and thousands of lines of code get checked in every minute at Google. To maintain velocity, quality, and code health, we devise creative ways to test and debug the test failures -- like performing diff testing, building dynamic test cases from the logs, designing heuristic algorithms to identify culprits for test failures, building solutions to reduce the test run time, and implementing stubs, fakes, and mock objects and servers to help developers write stable unit and integration tests. Along with devising creative ways to test and debug the test failures, we also focus on improving engineering excellence and product excellence by defining and measuring productivity metrics and product health metrics like quality, stability, and performance. The testing of, for example, Search, Ads, Maps, YouTube, Cloud, self-driving cars, and Google Apps would not have scaled with traditional testing practices.

Orchestrating the testing efforts is a key responsibility of Google Test Engineers. As orchestrators, we collaborate with cross-functional teams including Product Managers, Technical Program Managers, and Software Engineers to define critical user journeys (CUJs), determine testing strategies, and ensure that the right tests are run on the right configurations/environments. With our strong communication and collaboration skills, we work with the cross-functional teams and play the role of evangelists in spreading the word on new tools, technologies, and best testing practices. We also have the opportunity to host Hackathons and Fixits, host interns, drive college recruiting events, engage with the open source community in testing the open source products, listen to feedback, and convert that feedback into product improvements.

Leading-edge user: the fun part of being a Test Engineer! We can engage with product development, participate in the review of product designs, documentation, and prototypes, play with features and products early on, and provide informed feedback. Best of all, as early adopters we get to wear wearables, ride in self-driving cars, be in our own world with AR/VR, engage with Google Assistant to get our chores done, and have multiple laptops, phones, and smart display units!

Stay tuned to learn more COOL things about Test Engineering at Google! 

Testing on the Toilet: Tests Too DRY? Make Them DAMP!

This article was adapted from a Google Testing on the Toilet (TotT) episode. You can download a printer-friendly version of this TotT episode and post it in your office.

By Derek Snyder and Erik Kuefler

The test below follows the DRY principle (“Don’t Repeat Yourself”), a best practice that encourages code reuse rather than duplication, e.g., by extracting helper methods or by using loops. But is it a well-written test?
def setUp(self):
  self.users = [User('alice'), User('bob')]  # This field can be reused across tests.
  self.forum = Forum()

def testCanRegisterMultipleUsers(self):
  self._RegisterAllUsers()
  for user in self.users:  # Use a for-loop to verify that all users are registered.
    self.assertTrue(self.forum.HasRegisteredUser(user))

def _RegisterAllUsers(self):  # This method can be reused across tests.
  for user in self.users:
    self.forum.Register(user)

While the test body above is concise, the reader needs to do some mental computation to understand it, e.g., by following the flow of self.users from setUp() through _RegisterAllUsers(). Since tests don't have tests, it should be easy for humans to manually inspect them for correctness, even at the expense of greater code duplication. This means that the DRY principle often isn’t a good fit for unit tests, even though it is a best practice for production code.

In tests we can use the DAMP principle (“Descriptive and Meaningful Phrases”), which emphasizes readability over uniqueness. Applying this principle can introduce code redundancy (e.g., by repeating similar code), but it makes tests more obviously correct. Let’s add some DAMP-ness to the above test:

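def setUp(self):
  self.forum = Forum()

def testCanRegisterMultipleUsers(self):
  # Create the users in the test instead of relying on users created in setUp.
  user1 = User('alice')
  user2 = User('bob')

  # Register the users in the test instead of in a helper method, and don't use a for-loop.
  self.forum.Register(user1)
  self.forum.Register(user2)

  # Assert each user individually instead of using a for-loop.
  self.assertTrue(self.forum.HasRegisteredUser(user1))
  self.assertTrue(self.forum.HasRegisteredUser(user2))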

Note that the DRY principle is still relevant in tests; for example, using a helper function for creating value objects can increase clarity by removing redundant details from the test body. Ideally, test code should be both readable and unique, but sometimes there’s a trade-off. When writing unit tests and faced with a choice between the DRY and DAMP principles, lean more heavily toward DAMP.
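To make the point about helper functions concrete, here is a small sketch of a DAMP test that still uses one DRY helper for building value objects. The User fields beyond the name, the Forum methods, and the make_user helper are all assumptions invented for this illustration.

```python
import unittest
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    # Hypothetical value object with fields most tests don't care about.
    name: str
    email: str = "user@example.com"
    is_admin: bool = False


class Forum:
    def __init__(self) -> None:
        self._users = set()

    def register(self, user: User) -> None:
        self._users.add(user)

    def has_registered_user(self, user: User) -> bool:
        return user in self._users


def make_user(name: str) -> User:
    # Helper hides fields that are irrelevant to the behavior under test.
    return User(name=name, email=f"{name}@example.com")


class ForumTest(unittest.TestCase):
    def test_can_register_multiple_users(self):
        forum = Forum()
        user1 = make_user("alice")
        user2 = make_user("bob")

        forum.register(user1)
        forum.register(user2)

        self.assertTrue(forum.has_registered_user(user1))
        self.assertTrue(forum.has_registered_user(user2))
```

The test body still reads top to bottom without jumping to setUp or to a registration helper; the only thing factored out is construction detail that would otherwise distract from the behavior being checked.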

Code Health: Respectful Reviews == Useful Reviews

This is another post in our Code Health series. A version of this post originally appeared in Google bathrooms worldwide as a Google Testing on the Toilet episode. You can download a printer-friendly version to display in your office.

By Liz Kammer (Google), Maggie Hodges (UX research consultant), and Ambar Murillo (Google)

While code review is recognized as a valuable tool for improving the quality of software projects, code review comments that are perceived as being unclear or harsh can have unfavorable consequences: slow reviews, blocked dependent code reviews, negative emotions, or negative perceptions of other contributors or colleagues.

Consider these tips to resolve code review comments respectfully.

As a Reviewer or Author:
  • DO: Assume competence. An author’s implementation or a reviewer’s recommendation may be due to the other party having different context than you. Start by asking questions to gain understanding.
  • DO: Provide rationale or context, such as a best practices document, a style guide, or a design document. This can help others understand your decision or provide mentorship.
  • DO: Consider how comments may be interpreted. Be mindful of the differing ways hyperbole, jokes, and emojis may be perceived.
    Author Don’t:
    I prefer short names so I’d rather
    not change this. Unless you make
    me? :)
    Author Do:
    Best practice suggests omitting
    obvious/generic terms. I’m not
    sure how to reconcile that
    advice with this request.
  • DON’T: Criticize the person. Instead, discuss the code. Even the perception that a comment is about a person (e.g., due to using “you” or “your”) distracts from the goal of improving the code.
    Reviewer Don’t:
    Why are you using this approach?
    You’re adding unnecessary
    complexity.
    Reviewer Do:
    This concurrency model appears to
    be adding complexity to the
    system without any visible
    performance benefit.
  • DON’T: Use harsh language. Code review comments with a negative tone are less likely to be useful. For example, prior research found very negative comments were considered useful by authors 57% of the time, while more-neutral comments were useful 79% of the time.  

As a Reviewer:
  • DO: Provide specific and actionable feedback. If you don’t have specific advice, sometimes it’s helpful to ask for clarification on why the author made a decision.
    Reviewer Don’t:
    I don’t understand this.
    Reviewer Do:
    If this is an optimization, can you
    please add comments?
  • DO: Clearly mark nitpicks and optional comments by using prefixes such as ‘Nit’ or ‘Optional’. This allows the author to better gauge the reviewer’s expectations.

As an Author:
  • DO: Clarify code or reply to the reviewer’s comment in response to feedback. Failing to do so can signal a lack of receptiveness to implementing improvements to the code.
    Author Don’t:
    That makes sense in some cases but
    not here.
    Author Do:
    I added a comment about why
    it’s implemented that way.
  • DO: When disagreeing with feedback, explain the advantage of your approach. In cases where you can’t reach consensus, follow Google’s guidance for resolving conflicts in code review.

Truth 1.0: Fluent Assertions for Java and Android Tests

By Chris Povirk, Java Core Libraries

Software testing is important—and sometimes frustrating. The frustration can come from working on innately hard domains, like concurrency, but too often it comes from a thousand small cuts:

assertEquals("Message has been sent", getString(notification, EXTRA_BIG_TEXT));
assertTrue(
    getString(notification, EXTRA_TEXT)
        .contains("Kurt Kluever <[email protected]>"));

The two assertions above test almost the same thing, but they are structured differently. The difference in structure makes it hard to identify the difference in what's being tested.

A better way to structure these assertions is to use a fluent API:

assertThat(getString(notification, EXTRA_BIG_TEXT))
.isEqualTo("Message has been sent");
assertThat(getString(notification, EXTRA_TEXT))
.contains("Kurt Kluever <[email protected]>");

A fluent API naturally leads to other advantages:
  • IDE autocompletion can suggest assertions that fit the value under test, including rich operations like containsExactly(permission.SEND_SMS, permission.READ_SMS).
  • Failure messages can include the value under test and the expected result. Contrast this with the assertTrue call above, which lacks a failure message entirely.
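To show why a fluent shape makes richer failure messages possible, here is a tiny Python sketch of the idea. It is not Truth's implementation or API; assert_that, Subject, and failure_message are names made up for this example.

```python
class Subject:
    """Tiny fluent-assertion sketch; holds the value under test."""

    def __init__(self, actual):
        self._actual = actual

    def is_equal_to(self, expected):
        if self._actual != expected:
            # The subject knows both values, so the message can show them.
            raise AssertionError(
                f"expected: {expected!r}\nbut was : {self._actual!r}")
        return self

    def contains(self, item):
        if item not in self._actual:
            raise AssertionError(
                f"expected to contain: {item!r}\nbut was: {self._actual!r}")
        return self


def assert_that(actual):
    return Subject(actual)


def failure_message(check):
    """Runs `check`; returns the AssertionError text, or None if it passed."""
    try:
        check()
    except AssertionError as e:
        return str(e)
    return None
```

Because every check hangs off the same subject object, both assertions from the earlier example take the same form, and a failing contains() reports the actual string instead of a bare "expected true".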
Google's fluent assertion library for Java and Android is Truth. We're happy to announce that we've released Truth 1.0, which stabilizes our API after years of fine-tuning.

Truth started in 2011 as a Googler's personal open source project. Later, it was donated back to Google and cultivated by the Java Core Libraries team, the people who bring you Guava.

You might already be familiar with assertion libraries like Hamcrest and AssertJ, which provide similar features. We've designed Truth to have a simpler API and more readable failure messages. For example, here's a failure message from AssertJ:

<[year: 2019
month: 7
day: 15
to contain exactly in any order:
<[year: 2019
month: 6
day: 30
elements not found:
<[year: 2019
month: 6
day: 30
and elements not expected:
<[year: 2019
month: 7
day: 15

And here's the equivalent message from Truth:

value of:
year: 2019
month: 6
day: 30

but was:
year: 2019
month: 7
day: 15

For more details, read our comparison of the libraries, and try Truth for yourself.

Also, if you're developing for Android, try AndroidX Test. It includes Truth extensions that make assertions even easier to write and failure messages even clearer:

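assertThat(notification).extras().string(EXTRA_BIG_TEXT)
    .isEqualTo("Message has been sent");
assertThat(notification).extras().string(EXTRA_TEXT)
    .contains("Kurt Kluever <[email protected]>");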

Coming soon: Kotlin users of Truth can look forward to Kotlin-specific enhancements.

Android Platform Testing Made Easy

By Simran Basi, Dan Shi, Dan Willemsen, and Clay Murphy

Android Engineering Productivity (Android EngProd) seeks to ease development of the Android operating system for the entire ecosystem. Android EngProd creates tools, processes, and documentation aimed at Android platform development. We are now starting to push the best previously internal development infrastructure into the open for all to benefit.

Although comprehensive, the Android Compatibility Test Suite (CTS) and Trade Federation Test Harness can be unwieldy to configure. So we recently publicly released new tooling and associated docs that simplify device configuration and testing in the form of the Soong build system replacing Make, Test Mapping for easy configs, and Atest to run tests locally.

Configuring tests in Soong builds

The Soong build system was introduced in Android 8.0 (Oreo) to eventually replace the Make-based system (i.e. Android.mk files) used in previous releases. Soong allows simple build configuration with support for android_test declarations arriving in Android Q, now available in the Android Open Source Project (AOSP) master branch.

Soong uses Android.bp files, which are JSON-like simple declarative descriptions of modules to build. Here is an example test configuration in Soong, from: /platform_testing/tests/example/instrumentation/Android.bp
android_test {
    name: "HelloWorldTests",
    srcs: ["src/**/*.java"],
    sdk_version: "current",
    static_libs: ["android-support-test"],
    certificate: "platform",
    test_suites: ["device-tests"],
}

Note the android_test declaration at the beginning indicates this is a test. Including android_app instead would conversely indicate this is a build package. Complex test configuration options still exist for test modules requiring customized setup and tear down that cannot be performed within the test case itself.

Mapping tests in the source tree

Test Mapping allows developers to create pre- and post-submit test rules directly in the Android source tree and leave the decisions of branches and devices to be tested to the test infrastructure itself. Test Mapping definitions are JSON files named TEST_MAPPING that can be placed in any source directory.

Test Mapping categorizes tests via a test group. The name of a test group can be any string. For example, presubmit can be for a group of tests to run when validating changes. And postsubmit tests can be used to validate the builds after changes are merged.

For the directory requiring test coverage, simply add a TEST_MAPPING JSON file resembling the example below. These rules will ensure the tests run in presubmit checks when any files are touched in that directory or any of its subdirectories.

Here is a sample TEST_MAPPING file:
{
  "presubmit": [
    {
      "name": "CtsAccessibilityServiceTestCases",
      "options": [
        {"include-annotation": "android.platform.test.annotations.Presubmit"}
      ]
    }
  ],
  "postsubmit": [{"name": "CtsWindowManagerDeviceTestCases"}],
  "imports": [
    {"path": "frameworks/base/services/core/java/com/android/server/am"}
  ]
}

Running tests locally with Atest

Atest is a command line tool that allows developers to build, install, and run Android tests locally, greatly speeding test re-runs without requiring knowledge of Trade Federation Test Harness command line options.

Atest commands take the following form:
atest [optional-arguments] test-to-run

You can run one or more tests by separating test references with spaces, like so:
atest test-to-run-1 test-to-run-2

To run an entire test module, use its module name. Input the name as it appears in the LOCAL_MODULE or LOCAL_PACKAGE_NAME variables in that test's Android.mk or Android.bp file.

For example:
atest FrameworksServicesTests
atest CtsJankDeviceTestCases

Discovering tests with Atest and TEST_MAPPING

Atest and TEST_MAPPING work together to solve the problem of test discovery, i.e. what tests need to be run when a directory of code is edited. For example, to execute all presubmit test rules for a given directory locally:

  1. Go to the directory containing the TEST_MAPPING file.
  2. Run the command: atest
All presubmit tests configured in the TEST_MAPPING files of the current directory and its parent directories are run. Atest will locate and run two tests for presubmit.
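The discovery rule described above (current directory plus its parents) can be sketched in a few lines of Python. This is only an illustration of the lookup, not how Atest is implemented; the function names are invented, and the sketch assumes each TEST_MAPPING file is plain JSON and ignores the "imports" and "options" entries.

```python
import json
from pathlib import Path


def find_test_mapping_files(start_dir: Path, root_dir: Path) -> list:
    """Returns TEST_MAPPING paths from start_dir up to root_dir, nearest first."""
    start_dir, root_dir = start_dir.resolve(), root_dir.resolve()
    found = []
    for directory in [start_dir, *start_dir.parents]:
        candidate = directory / "TEST_MAPPING"
        if candidate.is_file():
            found.append(candidate)
        if directory == root_dir:
            break  # Don't walk above the source tree root.
    return found


def presubmit_test_names(start_dir: Path, root_dir: Path) -> list:
    """Collects the names listed under "presubmit" in every file found."""
    names = []
    for path in find_test_mapping_files(start_dir, root_dir):
        mapping = json.loads(path.read_text())
        names.extend(entry["name"] for entry in mapping.get("presubmit", []))
    return names
```

Calling presubmit_test_names() with the edited directory and the source tree root returns the tests a presubmit run for that directory would need, nearest directory first.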

Finding more testing documentation

Further, introductory testing documents were published to support Soong and platform testing in general.
In addition to exposing more testing documentation, Android has recently opened up build infrastructure to monitor submissions. See the More visibility into the Android Open Source Project blog post and the Continuous Integration Dashboard for instructions on viewing build status and downloading build artifacts.

Android EngProd endeavors to bring you more previously internal-only features to make your life easier. Watch this Google Testing Blog and the Android Developers Blog for future enhancements.

Testing on the Toilet: Exercise Service Call Contracts in Tests

This article was adapted from a Google Testing on the Toilet (TotT) episode. You can download a printer-friendly version of this TotT episode and post it in your office.

By Ben Yu

The following test mocks out a service call to CloudService. Does the test provide enough confidence that the service call is likely to work?

@Test public void uploadFileToCloudStorage() {
  CloudUploader cloudUploader = new CloudUploader(mockCloudService);

  Uri uri = cloudUploader.uploadFile(new File("/path/to/foo.txt"));
  // The uploaded file URI contains the user ID, file type, and upload ID. (Or does it?)
  assertThat(uri).isEqualTo(new Uri("/testuser/text/uploadId.txt"));
}

Lots of things can go wrong, especially when service contracts get complex. For example, plain/text may not be a valid file type, and you can’t verify that the URI of the uploaded file is correct.

If the code under test relies on the contract of a service, prefer exercising the service call instead of mocking it out. This gives you more confidence that you are using the service correctly:
@Test public void uploadFileToCloudStorage() {
  CloudUploader cloudUploader = new CloudUploader(cloudService);
  Uri uri = cloudUploader.uploadFile(new File("/path/to/foo.txt"));
}

How can you exercise the service call?

  1. Use a fake. A fake is a fast and lightweight implementation of the service that behaves just like the real implementation. A fake is usually maintained by the service owners; don’t create your own fake unless you can ensure its behavior will stay in sync with the real implementation.
  2. Use a hermetic server. This is a real server that is brought up by the test and runs on the same machine that the test is running on. A downside of using a hermetic server is that starting it up and interacting with it can slow down tests.
If the service you are using doesn’t have a fake or hermetic server, mocks may be the only tool at your disposal. But if your tests are not exercising the service call contract, you must take extra care to ensure the service call works, such as by having a comprehensive suite of end-to-end tests or resorting to manual QA (which can be inefficient and hard to scale).
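To illustrate what a fake buys you over a mock, here is a minimal Python sketch. FakeCloudService, its set of valid file types, and the CloudUploader shown here are hypothetical; in practice the fake would ideally come from the service owners so that it stays in sync with the real contract.

```python
class FakeCloudService:
    """Hypothetical in-memory fake that enforces the service contract."""

    VALID_FILE_TYPES = {"text/plain", "image/png"}  # Assumed for this example.

    def __init__(self):
        self._files = {}
        self._next_id = 0

    def write(self, user_id, file_type, content):
        # Unlike a mock, the fake rejects requests the real service would reject.
        if file_type not in self.VALID_FILE_TYPES:
            raise ValueError(f"invalid file type: {file_type}")
        self._next_id += 1
        uri = f"/{user_id}/{file_type}/{self._next_id}"
        self._files[uri] = content
        return uri

    def retrieve_file(self, uri):
        return self._files[uri]


class CloudUploader:
    def __init__(self, cloud_service, user_id="testuser"):
        self._cloud_service = cloud_service
        self._user_id = user_id

    def upload_text(self, content):
        return self._cloud_service.write(self._user_id, "text/plain", content)
```

A test can upload through CloudUploader and then read the content back with retrieve_file(), so it never hardcodes what the URI should look like. If the uploader sent "plain/text" by mistake, the fake would raise instead of silently returning a canned response.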

Efficacy Presubmit

By Peter Spragins
with input from John Roane, Collin Johnston, Matt Rodrigues and Dave Chen

A Brief History of Efficacy

A small team, originally named "Test Efficacy", was formed in 2014 to quantify the value of individual tests to the development process. Some tests were particularly valuable because they provided a reliable breakage signal for critical code. Some tests were not useful because they were non-deterministic or they never failed. Confoundingly, the value of a test would also change over time. The team’s initial intention was to present this information to developers and help them optimize the development process.

To achieve the goal of informing developers about their tests, the team had to collect a huge amount of developer infrastructure/workflow data from a variety of sources across Google. Collecting all of this data in one place turned out to be incredibly valuable.

In addition to collecting and processing the data, the team developed a somewhat radical philosophy towards running tests at scale: the only important results come from tests which deterministically fail. Running an additional test that you know will pass is not a valuable signal to developers, and likely a waste of resources.

Background on Google Presubmit

The process of committing code at Google has several testing stages. Perhaps the three most important testing stages are:
  1. Individual ad-hoc testing
  2. Presubmit
  3. Continuous build/continuous integration (hereafter referred to as continuous build).
Stages 1 and 2 can actually be interleaved in any order and repeated any number of times.

A presubmit executes all of the tests which are known to be affected by the edited code within one user's proposed code changes. The "affected tests" are calculated with the help of a "project definition", a configuration maintained by teams. A presubmit can run at any point during the change proposal process, but most importantly it must run before a user can permanently commit their changes.

Continuous build, (3), is the continuous running of all tests within a project at the newest committed version of the code. Continuous build will execute tests even when they have already passed at presubmit.

The same test may run several times at presubmit during the development process, one last time at presubmit before a commit and then finally once again at continuous build, after being merged into the main branch of Google's huge repository. For this reason, a "missed failure" at presubmit is not a critical failure. The test will still be run at continuous build, and the offending change will be rolled back if the test fails.

Efficacy Presubmit Service

Efficacy Presubmit Service is the fusion of "running the right tests at the right time" with one of the largest collections of test/developer data in the world. The service has one simple job: save time and resources by not running, or even compiling, tests that we are very confident will pass at Presubmit. The ideal "Efficacy Presubmit" would predict which tests will pass ahead of time and only run tests which were going to fail. Then the user can get feedback from the failing tests, and fix their mistakes with the minimal possible cost of user and CPU time.

To make this idea possible we have made one significant abstraction of the actual presubmit testing process. In a given presubmit there may be zero tests run, or many. In a presubmit with one test, if that test fails then the presubmit fails. In a presubmit with a thousand tests, only one failing test will still fail the presubmit. Efficacy Presubmit makes the abstraction that each of these test executions is an equivalent unit. This greatly simplifies creating a training dataset.

Machine Learning / Probabilistic Safety

Quick background on ML

ML techniques and processes are quite well known throughout the industry at this point. The TensorFlow tutorials are a great introduction. The type of ML we use is classification. A classifier is essentially a mapping from the domain of the dataset to the range of the classes. MNIST is a very famous example of classification. An MNIST classifier maps from the domain of the input image to the range of digits {0, 1, …, 9}.

In some other classification problems, the inputs are more "tabular". A famous example of tabular classification is Iris Species. This is very similar to what Efficacy does.

Efficacy's Application of ML

Given the abstraction on the presubmit testing process described above, predicting the outcomes of automated testing at a large company is a perfect machine learning problem in many ways. You have:

  1. A very large labelled dataset: the set of test executions and their results
  2. Copious numerical feature columns with trustworthy values
    1. Recent failure history of each test
    2. Various "distance" metrics from edited source files to tests - i.e. is this a test for the edited code?
    3. Test size and runtime data
  3. Several dimensions that can be aggregated

There are some aspects of the problem which make ML difficult as well:

  1. The classes are highly imbalanced with respect to labels (the vast majority of tests are going to pass, not fail)
  2. Flaky tests can mislead the model because their labels are "untrue"

We chose to reduce the problem to binary classification. The model chooses whether or not to run the test. In other words, failure is the positive class, and everything else is the negative class.

We pick a threshold that results in an extremely low number of false negatives - failing tests which are not run because the model thinks they would have passed. This does reduce the number of skipped tests, true negatives, in exchange for a very high margin of safety. In addition to this, tests will be run afterwards at continuous build anyway, making presubmit skipping very safe.
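One way to picture the threshold choice is the following Python sketch. It is an assumption-laden illustration, not the Efficacy team's method: it supposes the model outputs a failure probability per test, that a test is run when its probability is at or above the threshold, and that the threshold is chosen on labelled historical data to hit a target sensitivity.

```python
import math


def pick_threshold(failure_probs, failed, target_sensitivity):
    """Highest threshold that still runs at least the target share of failing tests.

    failure_probs: predicted failure probability per test execution.
    failed: parallel list of booleans, True if the test actually failed.
    """
    failing_scores = sorted(
        (p for p, did_fail in zip(failure_probs, failed) if did_fail),
        reverse=True)
    if not failing_scores:
        return 0.0  # No failures observed: run everything to stay safe.
    must_catch = math.ceil(target_sensitivity * len(failing_scores))
    must_catch = min(len(failing_scores), max(1, must_catch))
    # Running every test scoring >= this value catches `must_catch` failures.
    return failing_scores[must_catch - 1]


def skipped_fraction(failure_probs, threshold):
    """Share of test executions that fall below the threshold and are skipped."""
    skipped = sum(1 for p in failure_probs if p < threshold)
    return skipped / len(failure_probs)
```

Raising target_sensitivity pushes the threshold down, so fewer tests are skipped; that is the trade of savings for safety described above.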

Difficulties of Scale

In addition to the problems that were natural to the "schema" of the dataset, we faced some problems due to the scale of Google's testing.

Many of these problems stem from the fact that Google works out of one large repository (paper, talk). Because of this, some presubmits have a very large number of tests, and some commits require a large number of presubmits before they are finished. This means that the service has to make predictions for a very large number of tests all at once. If a presubmit tried to run every test at Google, then the service would have to make a separate prediction for each test. For N tests, that means generating N rows of feature columns, and loading the data to generate all of these feature values uses a lot of memory.

Another difficulty of doing this work at scale is that even with very rare false negatives, they will still happen somewhat frequently. This requires our team to be open to communication with any customer team. In some cases we may have to tell them they were the victim of a very low probability event. In other cases we may find a bug, or room for improvement.


The two key numbers for the system's performance are sensitivity, the percentage of failing tests we actually execute, and specificity, the percentage of passing tests we actually skip. The two numbers go hand in hand. For a given model, requiring a higher sensitivity will result in a lower specificity, or vice versa. We can easily tune the percentage of tests skipped, resulting in changes to the fidelity of the testing signal the developers receive. When the system is wrong, it can have some negative impact to developers if the prediction is a false negative. Rarely, it will allow a developer to commit code that will break a test during continuous build. This results in a broken "project", which takes some time to detect, and then a roll-back of the code. This requires some developer time, and a flexible mentality towards testing. In order to achieve a positive balance from this, we must extract millions of skipped tests for every negative developer experience. The sensitivity of our system is very high, and our specificity is around 25%.
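In confusion-matrix terms, with "failure" as the positive class, the two numbers can be computed as in the short Python sketch below. The function and parameter names are chosen for this example only.

```python
def sensitivity(failing_run, failing_skipped):
    """Share of failing tests that were actually run: TP / (TP + FN)."""
    return failing_run / (failing_run + failing_skipped)


def specificity(passing_skipped, passing_run):
    """Share of passing tests that were actually skipped: TN / (TN + FP)."""
    return passing_skipped / (passing_skipped + passing_run)
```

For instance, with made-up counts of 25,000 passing tests skipped and 75,000 passing tests run, specificity(25_000, 75_000) is 0.25, the same order as the figure quoted above.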

Code Health: Make Interfaces Hard to Misuse

This is another post in our Code Health series. A version of this post originally appeared in Google bathrooms worldwide as a Google Testing on the Toilet episode. You can download a printer-friendly version to display in your office.

By Marek Kiszkis

We all try to avoid errors in our code. But what about errors created by callers of your code? A good interface design can make it easy for callers to do the right thing, and hard for callers to do the wrong thing. Don't push the responsibility of maintaining invariants required by your class on to its callers.
Can you see the issues that can arise with this code?
class Vector {
 public:
  explicit Vector(int num_slots);  // Creates an empty vector with `num_slots` slots.
  int RemainingSlots() const;      // Returns the number of currently remaining slots.
  void AddSlots(int num_slots);    // Adds `num_slots` more slots to the vector.
  // Adds a new element at the end of the vector. Caller must ensure that RemainingSlots()
  // returns at least 1 before calling this, otherwise caller should call AddSlots().
  void Insert(int value);
};

If the caller forgets to call AddSlots(), undefined behavior might be triggered when Insert() is called. The interface pushes complexity onto the caller, exposing the caller to implementation details.

Since maintaining the slots is not relevant to the caller-visible behaviors of the class, don't expose them in the interface; make it impossible to trigger undefined behavior by adding slots as needed in Insert().
class Vector {
 public:
  explicit Vector(int num_slots);
  // Adds a new element at the end of the vector. If necessary,
  // allocates new slots to ensure that there is enough storage
  // for the new value.
  void Insert(int value);
};
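One possible way to implement the safer interface is sketched below. This is a minimal sketch under stated assumptions, not the article's implementation: the doubling growth policy and the `size()` and `Get()` accessors are hypothetical additions that exist only so the behavior can be observed.

```cpp
#include <memory>
#include <utility>

// Sketch only: slot management is an internal detail of Insert().
class Vector {
 public:
  // Treats a non-positive `num_slots` as 1 so that growth always makes progress.
  explicit Vector(int num_slots)
      : capacity_(num_slots > 0 ? num_slots : 1),
        data_(std::make_unique<int[]>(capacity_)) {}

  // Adds a new element at the end, allocating more slots if necessary.
  void Insert(int value) {
    if (size_ == capacity_) Grow();
    data_[size_++] = value;
  }

  int size() const { return size_; }                 // hypothetical accessor
  int Get(int index) const { return data_[index]; }  // hypothetical, unchecked

 private:
  // Doubles the storage and copies existing elements (assumed policy;
  // integer overflow of the capacity is not handled in this sketch).
  void Grow() {
    const int new_capacity = capacity_ * 2;
    auto new_data = std::make_unique<int[]>(new_capacity);
    for (int i = 0; i < size_; ++i) new_data[i] = data_[i];
    data_ = std::move(new_data);
    capacity_ = new_capacity;
  }

  int capacity_;
  int size_ = 0;
  std::unique_ptr<int[]> data_;
};
```

A caller can construct `Vector v(1)` and call `v.Insert()` any number of times without ever reasoning about remaining slots.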

Contracts enforced by the compiler are usually better than contracts enforced by runtime checks, or worse, documentation-only contracts that rely on callers to do the right thing.
Here are other examples that could signal that an interface is easy to misuse:
  • Requiring callers to call an initialization function (alternative: expose factory methods that return your object fully initialized).
  • Requiring callers to perform custom cleanup (alternative: use language-specific constructs that ensure automated cleanup when your object goes out of scope).
  • Allowing code paths that create objects without required parameters (e.g. a user without an ID).
  • Allowing parameters for which only some values are valid, especially if it is possible to use a more appropriate type (e.g. prefer Duration timeout instead of int timeout_in_millis).
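Two of the alternatives in the list above can be sketched as follows, assuming C++17. The `User`, `RequestOptions`, and `WithTimeout` names are hypothetical and exist only for illustration; `std::chrono::milliseconds` stands in for the `Duration` type mentioned above.

```cpp
#include <chrono>
#include <optional>
#include <string>
#include <utility>

// Hypothetical class: the private constructor means a User can only be
// obtained through Create(), so no User exists without an ID.
class User {
 public:
  // Returns std::nullopt instead of a half-initialized object.
  static std::optional<User> Create(std::string id) {
    if (id.empty()) return std::nullopt;
    return User(std::move(id));
  }

  const std::string& id() const { return id_; }

 private:
  explicit User(std::string id) : id_(std::move(id)) {}
  std::string id_;
};

// Hypothetical options struct: the duration type carries its own unit.
struct RequestOptions {
  std::chrono::milliseconds timeout{0};
};

// Accepts any duration that converts losslessly to milliseconds.
// A bare integer such as WithTimeout(2000) does not compile.
RequestOptions WithTimeout(std::chrono::milliseconds timeout) {
  return RequestOptions{timeout};
}
```

For example, `WithTimeout(std::chrono::seconds(2))` yields a timeout of 2000 milliseconds, and the compiler rejects a unitless `int`, so the contract is enforced at compile time rather than by documentation.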
It is not always practical to have a foolproof interface. In certain cases, relying on static analysis or documentation is necessary since some requirements are impossible to express in an interface (e.g. that a callback function needs to be thread-safe).

Don’t enforce what you don’t need to enforce: avoid code that is too defensive. For example, extensive validation of function parameters can increase complexity and reduce performance.

Testing on the Toilet: Only Verify Relevant Method Arguments

This article was adapted from a Google Testing on the Toilet (TotT) episode. You can download a printer-friendly version of this TotT episode and post it in your office.

By Dillon Bly

What makes this test fragile?
@Test public void displayGreeting_showSpecialGreetingOnNewYearsDay() {
  fakeUser.setName("Fake User");
  userGreeter.displayGreeting();
  // The test will fail if userGreeter.displayGreeting() didn’t call
  // mockUserPrompter.updatePrompt() with these exact arguments.
  verify(mockUserPrompter).updatePrompt(
      "Hi Fake User! Happy New Year!", TitleBar.of("2018-01-01"), PromptStyle.NORMAL);
}


The test specifies exact values for all arguments to mockUserPrompter. These arguments may need to be updated when the code under test is changed, even if the changes are unrelated to the behavior being tested. For example, if additional text is added to TitleBar, every test in the codebase that specifies this argument will need to be updated.

In addition, verifying too many arguments makes it difficult to understand what behavior is being tested since it’s not obvious which arguments are important to the test and which are irrelevant.

Instead, only verify arguments that affect the correctness of the specific behavior being tested. You can use argument matchers (e.g., any() and contains() in Mockito) to ignore arguments that don't need to be verified:
@Test public void displayGreeting_showSpecialGreetingOnNewYearsDay() {
  userGreeter.displayGreeting();
  verify(mockUserPrompter).updatePrompt(contains("Happy New Year!"), any(), any());
}

Arguments ignored in one test can be verified in other tests. Following this pattern allows us to verify only one behavior per test, which makes tests more readable and more resilient to change. For example, here is a separate test that we might write:
@Test public void displayGreeting_renderUserName() {
  fakeUser.setName("Fake User");
  userGreeter.displayGreeting();
  // Focus on the argument relevant to showing the user's name.
  verify(mockUserPrompter).updatePrompt(contains("Hi Fake User!"), any(), any());
}