Category Archives: Google Testing Blog

If it ain’t broke, you’re not trying hard enough

Truth 1.0: Fluent Assertions for Java and Android Tests

By Chris Povirk, Java Core Libraries

Software testing is important—and sometimes frustrating. The frustration can come from working on innately hard domains, like concurrency, but too often it comes from a thousand small cuts:

assertEquals("Message has been sent", getString(notification, EXTRA_BIG_TEXT));
getString(notification, EXTRA_TEXT)
.contains("Kurt Kluever <>"));

The two assertions above test almost the same thing, but they are structured differently. The difference in structure makes it hard to identify the difference in what's being tested.

A better way to structure these assertions is to use a fluent API:

assertThat(getString(notification, EXTRA_BIG_TEXT))
.isEqualTo("Message has been sent");
assertThat(getString(notification, EXTRA_TEXT))
.contains("Kurt Kluever <>");

A fluent API naturally leads to other advantages:
  • IDE autocompletion can suggest assertions that fit the value under test, including rich operations like containsExactly(permission.SEND_SMS, permission.READ_SMS).
  • Failure messages can include the value under test and the expected result. Contrast this with the assertTrue call above, which lacks a failure message entirely.
Google's fluent assertion library for Java and Android is Truth. We're happy to announce that we've released Truth 1.0, which stabilizes our API after years of fine-tuning.

Truth started in 2011 as a Googler's personal open source project. Later, it was donated back to Google and cultivated by the Java Core Libraries team, the people who bring you Guava.

You might already be familiar with assertion libraries like Hamcrest and AssertJ, which provide similar features. We've designed Truth to have a simpler API and more readable failure messages. For example, here's a failure message from AssertJ:

<[year: 2019
month: 7
day: 15
to contain exactly in any order:
<[year: 2019
month: 6
day: 30
elements not found:
<[year: 2019
month: 6
day: 30
and elements not expected:
<[year: 2019
month: 7
day: 15

And here's the equivalent message from Truth:

value of:
year: 2019
month: 6
day: 30

but was:
year: 2019
month: 7
day: 15

For more details, read our comparison of the libraries, and try Truth for yourself.

Also, if you're developing for Android, try AndroidX Test. It includes Truth extensions that make assertions even easier to write and failure messages even clearer:

.isEqualTo("Message has been sent");
.contains("Kurt Kluever <>");

Coming soon: Kotlin users of Truth can look forward to Kotlin-specific enhancements.

Android Platform Testing Made Easy

By Simran Basi, Dan Shi, Dan Willemsen, and Clay Murphy

Android Engineering Productivity (Android EngProd) seeks to ease development of the Android operating system for the entire ecosystem. Android EngProd creates tools, processes, and documentation aimed at Android platform development. We are now starting to push the best previously internal development infrastructure into the open for all to benefit.

Although comprehensive, the Android Compatibility Test Suite (CTS) and Trade Federation Test Harness can be unwieldy to configure. So we recently publicly released new tooling and associated docs that simplify device configuration and testing in the form of the Soong build system replacing Make, Test Mapping for easy configs, and Atest to run tests locally.

Configuring tests in Soong builds

The Soong build system was introduced in Android 8.0 (Oreo) to eventually replace the Make-based system (i.e. files) used in previous releases. Soong allows simple build configuration with support for android_test declarations arriving in Android Q, now available in the Android Open Source Project (AOSP) master branch.

Soong uses Android.bp files, which are JSON-like simple declarative descriptions of modules to build. Here is an example test configuration in Soong, from: /platform_testing/tests/example/instrumentation/Android.bp
android_test {
    name: "HelloWorldTests",
    srcs: ["src/**/*.java"],
    sdk_version: "current",
    static_libs: ["android-support-test"],
    certificate: "platform",
    test_suites: ["device-tests"],

Note the android_test declaration at the beginning indicates this is a test. Including android_app instead would conversely indicate this is a build package. Complex test configuration options still exist for test modules requiring customized setup and tear down that cannot be performed within the test case itself.

Mapping tests in the source tree

Test Mapping allows developers to create pre- and post-submit test rules directly in the Android source tree and leave the decisions of branches and devices to be tested to the test infrastructure itself. Test Mapping definitions are JSON files named TEST_MAPPING that can be placed in any source directory.

Test Mapping categorizes tests via a test group. The name of a test group can be any string. For example, presubmit can be for a group of tests to run when validating changes. And postsubmit tests can be used to validate the builds after changes are merged.

For the directory requiring test coverage, simply add a TEST_MAPPING JSON file resembling the example below. These rules will ensure the tests run in presubmit checks when any files are touched in that directory or any of its subdirectories.

Here is a sample TEST_MAPPING file:
  "presubmit": [
      "name": "CtsAccessibilityServiceTestCases",
      "options": [
          "include-annotation": "android.platform.test.annotations.Presubmit"
  "postsubmit": [
      "name": "CtsWindowManagerDeviceTestCases"
  "imports": [
      "path": "frameworks/base/services/core/java/com/android/server/am"

Running tests locally with Atest

Atest is a command line tool that allows developers to build, install, and run Android tests locally, greatly speeding test re-runs without requiring knowledge of Trade Federation Test Harness command line options.

Atest commands take the following form:
atest [optional-arguments] test-to-run

You can run one or more tests by separating test references with spaces, like so:
atest test-to-run-1 test-to-run-2

To run an entire test module, use its module name. Input the name as it appears in the LOCAL_MODULE or LOCAL_PACKAGE_NAME variables in that test's or Android.bp file.

For example:
atest FrameworksServicesTests
atest CtsJankDeviceTestCases

Discovering tests with Atest and TEST MAPPING

Atest and TEST MAPPING work together to solve the problem of test discovery, i.e. what tests need to be run when a directory of code is edited. For example, to execute all presubmit test rules for a given directory locally:

  1. Go to the directory containing the TEST_MAPPING file.
  2. Run the command: atest
All presubmit tests configured in the TEST_MAPPING files of the current directory and its parent directories are run. Atest will locate and run two tests for presubmit.

Finding more testing documentation

Further, introductory testing documents were published on to support Soong and platform testing in general:
In addition to exposing more testing documentation, Android has recently opened up build infrastructure to monitor submissions through See the More visibility into the Android Open Source Project blog post and the Continuous Integration Dashboard for instructions on viewing build status and downloading build artifacts.

Android EngProd endeavors to bring you more previously internal-only features to make your life easier. Watch this Google Testing Blog, the Android Developers Blog, and for future enhancements.

Testing on the Toilet: Exercise Service Call Contracts in Tests

This article was adapted from a Google Testing on the Toilet (TotT) episode. You can download a printer-friendly version of this TotT episode and post it in your office.

By Ben Yu

The following test mocks out a service call to CloudService Does the test provide enough confidence that the service call is likely to work?

@Test public void uploadFileToCloudStorage() {

CloudUploader cloudUploader = new CloudUploader(mockCloudService);

Uri uri = cloudUploader.uploadFile(new File(“/path/to/foo.txt”));
// The uploaded file URI contains the user ID, file type, and upload ID. (Or does it?)
assertThat(uri).isEqualTo(new Uri(“/testuser/text/uploadId.txt”));

Lots of things can go wrong, especially when service contracts get complex. For example, plain/text may not be a valid file type, and you can’t verify that the URI of the uploaded file is correct.

If the code under test relies on the contract of a service, prefer exercising the service call instead of mocking it out. This gives you more confidence that you are using the service correctly:
@Test public void uploadFileToCloudStorage() {
CloudUploader cloudUploader = new CloudUploader(cloudService);
Uri uri = cloudUploader.uploadFile(”/path/to/foo.txt”);

How can you exercise the service call?

  1. Use a fake.  A fake is a fast and lightweight implementation of the service that behaves just like the real implementation. A fake is usually maintained by the service owners; don’t create your own fake unless you can ensure its behavior will stay in sync with the real implementation.  Learn more about fakes at
  2. Use a hermetic server.  This is a real server that is brought up by the test and runs on the same machine that the test is running on. A downside of using a hermetic server is that starting it up and interacting with it can slow down tests.  Learn more about hermetic servers at
If the service you are using doesn’t have a fake or hermetic server, mocks may be the only tool at your disposal. But if your tests are not exercising the service call contract, you must take extra care to ensure the service call works, such as by having a comprehensive suite of end-to-end tests or resorting to manual QA (which can be inefficient and hard to scale).

Efficacy Presubmit

By Peter Spragins
with input from John Roane, Collin Johnston, Matt Rodrigues and Dave Chen

A Brief History of Efficacy

Originally named "Test Efficacy", a small team was formed in 2014 to quantify the value of individual tests to the development process. Some tests were particularly valuable because they provided a reliable breakage signal for critical code. Some tests were not useful because they were non-deterministic or they never failed. Confoundingly, tests would change in value over time as well. The team’s initial intention was to present this information to developers and help them optimize the development process.

To achieve the goal of informing developers about their tests, the team had to collect a huge amount of developer infrastructure/workflow data from a variety of sources across Google. Collecting all of this data in one place turned out to be incredibly valuable.

In addition to collecting and processing the data, the team developed a somewhat radical philosophy towards running tests at scale: the only important results come from tests which deterministically fail. Running an additional test that you know will pass is not a valuable signal to developers, and likely a waste of resources.

Background on Google Presubmit

The process of committing code at Google has several testing stages. Perhaps the three most important testing stages are:
  1. Individual ad-hoc testing
  2. Presubmit
  3. Continuous build/continuous integration (hereafter referred to as continuous build).
Stages 1 and 2 can actually be interleaved in any order and repeated any number of times.

A presubmit executes all of the tests which are known to be affected by the edited code within one user's proposed code changes. The "affected tests" are calculated with the help of a "project definition", a configuration maintained by teams. A presubmit can run at any point during the change proposal process, but most importantly it must run before a user can permanently commit their changes.

Continuous build, (3), is the continuous running of all tests within a project at the newest committed version of the code. Continuous build will execute tests even when they have already passed at presubmit.

The same test may run several times at presubmit during the development process, one last time at presubmit before a commit and then finally once again at continuous build, after being merged into the main branch of Google's huge repository. For this reason, a "missed failure" at presubmit is not a critical failure. The test will still be run at continuous build, and then rolled back if it fails.

Efficacy Presubmit Service

Efficacy Presubmit Service is the fusion of "running the right tests at the right time" with one of the largest collections of test/developer data in the world. The service has one simple job: save time and resources by not running, or even compiling, tests that we are very confident will pass at Presubmit. The ideal "Efficacy Presubmit" would predict which tests will pass ahead of time and only run tests which were going to fail. Then the user can get feedback from the failing tests, and fix their mistakes with the minimal possible cost of user and CPU time.

To make this idea possible we have made one significant abstraction of the actual presubmit testing process. In a given presubmit there may be zero tests run, or many. In a presubmit with one test, if that test fails then the presubmit fails. In a presubmit with a thousand tests, only one failing test will still fail the presubmit. Efficacy Presubmit makes the abstraction that each of these test executions is an equivalent unit. This greatly simplifies creating a training dataset.

Machine Learning / Probabilistic Safety

Quick background on ML

ML techniques and processes are quite well known throughout the industry at this point. The Tensorflow tutorials are a great introduction. The type of ML we use is classification. A classifier is essentially a mapping from the domain of the dataset, to the range of the classes. Mnist is a very famous example of classification. An mnist classifier maps from the domain of the input image to the range of digits {0, 1, …, 9}.

In some other classification problems, the inputs are more "tabular". A famous example of tabular classification is Iris Species. This is very similar to what Efficacy does.

Efficacy's Application of ML

Given the abstraction on the presubmit testing process described above, predicting the outcomes of automated testing at a large company is a perfect machine learning problem in many ways. You have:

  1. The set of test executions and results is a very large labelled dataset
  2. Copious numerical feature columns with trustworthy values
    1.  Recent failure history of each test
    2.  Various "distance" metrics from edited source files to tests - i.e. is this a test for the edited code?
    3. Test size and runtime data
  3. Several dimensions that can be aggregated
There are some aspects of the problem which make ML difficult as well:

  1. The classes are highly imbalanced with respect to labels (the vast majority of tests are going to pass, not fail)
  2. Flaky tests can mislead the model because their labels are "untrue"

We chose to reduce the problem to binary classification. The model chooses whether or not to run the test. In other words, failure is the positive class, and everything else is the negative class.

We pick a threshold that results in an extremely low number of false negatives - failing tests which are not run because the model thinks they would have passed. This does reduce the number of skipped tests, true negatives, in exchange for a very high margin of safety. In addition to this, tests will be run afterwards at continuous build anyway, making presubmit skipping very safe.

Difficulties of Scale

In addition to the problems that were natural to the "schema" of the dataset, we faced some problems due to the scale of Google's testing.

Many of these problems stem from the fact that Google works out of one large repository (paper, talk). Because of this some presubmits have a very large number of tests and some commits require a large number of presubmits before they are finished. This means that the service has to make predictions for a very large number of tests all at once. If a presubmit tried to run every test at Google, then the service would have to predict each test individually. That means N times the number of columns, etc. Loading the data to generate all of these feature values uses a lot of memory.

Another difficulty of doing this work at scale is that even with very rare false negatives, they will still happen somewhat frequently. This requires our team to be open to communication with any customer team. In some cases we may have to tell them they were the victim of a very low probability event. In other cases we may find a bug, or room for improvement.


The two key numbers for the system's performance are sensitivity, the percentage of failing tests we actually execute, and specificity, the percentage of passing tests we actually skip. The two numbers go hand in hand. For a given model, requiring a higher sensitivity will result in a lower specificity, or vice versa. We can easily tune the percentage of tests skipped, resulting in changes to the fidelity of the testing signal the developers receive. When the system is wrong, it can have some negative impact to developers if the prediction is a false negative. Rarely, it will allow a developer to commit code that will break a test during continuous build. This results in a broken "project", which takes some time to detect, and then a roll-back of the code. This requires some developer time, and a flexible mentality towards testing. In order to achieve a positive balance from this, we must extract millions of skipped tests for every negative developer experience. The sensitivity of our system is very high, and our specificity is around 25%.

Code Health: Make Interfaces Hard to Misuse

This is another post in our Code Health series. A version of this post originally appeared in Google bathrooms worldwide as a Google Testing on the Toilet episode. You can download a printer-friendly version to display in your office.

By Marek Kiszkis

We all try to avoid errors in our code. But what about errors created by callers of your code? A good interface design can make it easy for callers to do the right thing, and hard for callers to do the wrong thing. Don't push the responsibility of maintaining invariants required by your class on to its callers.
Can you see the issues that can arise with this code?
class Vector {
explicit Vector(int num_slots); // Creates an empty vector with `num_slots` slots.
int RemainingSlots() const; // Returns the number of currently remaining slots.
void AddSlots(int num_slots); // Adds `num_slots` more slots to the vector.
// Adds a new element at the end of the vector. Caller must ensure that RemainingSlots()
// returns at least 1 before calling this, otherwise caller should call AddSlots().
void Insert(int value);

If the caller forgets to call AddSlots(), undefined behavior might be triggered when Insert() is called. The interface pushes complexity onto the caller, exposing the caller to implementation details.

Since maintaining the slots is not relevant to the caller-visible behaviors of the class, don't expose them in the interface; make it impossible to trigger undefined behavior by adding slots as needed in Insert().
@Test public void class Vector {
explicit Vector(int num_slots);
// Adds a new element at the end of the vector. If necessary,
// allocates new slots to ensure that there is enough storage
// for the new value.
void Insert(int value);

Contracts enforced by the compiler are usually better than contracts enforced by runtime checks, or worse, documentation-only contracts that rely on callers to do the right thing.
Here are other examples that could signal that an interface is easy to misuse:
  • Requiring callers to call an initialization function (alternative: expose factory methods that return your object fully initialized).
  • Requiring callers to perform custom cleanup (alternative: use language-specific constructs that ensure automated cleanup when your object goes out of scope).
  • Allowing code paths that create objects without required parameters (e.g. a user without an ID).
  • Allowing parameters for which only some values are valid, especially if it is possible to use a more appropriate type (e.g. prefer Duration timeout instead of int timeout_in_millis).
It is not always practical to have a foolproof interface. In certain cases, relying on static analysis or documentation is necessary since some requirements are impossible to express in an interface (e.g. that a callback function needs to be thread-safe).

Don’t enforce what you don’t need to enforce - avoid code that is too defensive. For example, extensive validation of function parameters can increase complexity and reduce performance.

Testing on the Toilet: Only Verify Relevant Method Arguments

This article was adapted from a Google Testing on the Toilet (TotT) episode. You can download a printer-friendly version of this TotT episode and post it in your office.

By Dillon Bly

What makes this test fragile?
@Test public void displayGreeting_showSpecialGreetingOnNewYearsDay() {
fakeUser.setName("Fake User”);
// The test will fail if userGreeter.displayGreeting() didn’t call
// mockUserPrompter.updatePrompt() with these exact arguments.
"Hi Fake User! Happy New Year!", TitleBar.of("2018-01-01"), PromptStyle.NORMAL);


The test specifies exact values for all arguments to mockUserPrompter. These arguments may need to be updated when the code under test is changed, even if the changes are unrelated to the behavior being tested. For example, if additional text is added to TitleBar, every test in the codebase that specifies this argument will need to be updated.

In addition, verifying too many arguments makes it difficult to understand what behavior is being tested since it’s not obvious which arguments are important to the test and which are irrelevant.

Instead, only verify arguments that affect the correctness of the specific behavior being tested. You can use argument matchers (e.g., any() and contains() in Mockito) to ignore arguments that don't need to be verified:
@Test public void displayGreeting_showSpecialGreetingOnNewYearsDay() {
verify(mockUserPrompter).updatePrompt(contains("Happy New Year!"), any(), any()));

Arguments ignored in one test can be verified in other tests. Following this pattern allows us to verify only one behavior per test, which makes tests more readable and more resilient to change. For example, here is a separate test that we might write:
@Test public void displayGreeting_renderUserName() {
fakeUser.setName(“Fake User”);
// Focus on the argument relevant to showing the user's name.
verify(mockUserPrompter).updatePrompt(contains("Hi Fake User!"), any(), any());

Testing on the Toilet: Keep Tests Focused

This article was adapted from a Google Testing on the Toilet (TotT) episode. You can download a printer-friendly version of this TotT episode and post it in your office.

By Ben Yu

What scenario does the following code test?
TEST_F(BankAccountTest, WithdrawFromAccount) {
Transaction transaction = account_.Deposit(Usd(5));

EXPECT_THAT(account_.Withdraw(Usd(5)), IsOk());
EXPECT_THAT(account_.Withdraw(Usd(1)), IsRejected());
EXPECT_THAT(ccount_.Withdraw(Usd(1)), IsOk());
Translated to English: “(1) I had $5 and was able to withdraw $5; (2) then got rejected when overdrawing $1; (3) but if I enable overdraft with a $1 limit, I can withdraw $1.” If that sounds a little hard to track, it is: it is testing three scenarios, not one.

A better approach is to exercise each scenario in its own test:
TEST_F(BankAccountTest, CanWithdrawWithinBalance) {
DepositAndSettle(Usd(5)); // Common setup code is extracted into a helper method.
EXPECT_THAT(account_.Withdraw(Usd(5)), IsOk());
TEST_F(BankAccountTest, CannotOverdraw) {
EXPECT_THAT(account_.Withdraw(Usd(6)), IsRejected());
TEST_F(BankAccountTest, CanOverdrawUpToOverdraftLimit) {
EXPECT_THAT(account_.Withdraw(Usd(6)), IsOk());

Writing tests this way provides many benefits:

  • Logic is easier to understand because there is less code to read in each test method.
  • Setup code in each test is simpler because it only needs to serve a single scenario.
  • Side effects of one scenario will not accidentally invalidate or mask a later scenario’s assumptions.
  • If a scenario in one test fails, other scenarios will still run since they are unaffected by the failure.
  • Test names clearly describe each scenario, which makes it easier to learn which scenarios exist.
One sign that you might be testing more than one scenario: after asserting the output of one call to the system under test, the test makes another call to the system under test.

While a scenario for a unit test often consists of a single call to the system under test, its scope can be larger for integration and end-to-end tests. For example, a test that a web UI can send email might open the inbox, click the compose button, write some text, and press the send button.

Code Health: Understanding Code In Review

This is another post in our Code Health series. A version of this post originally appeared in Google bathrooms worldwide as a Google Testing on the Toilet episode. You can download a printer-friendly version to display in your office.

By Max Kanat-Alexander

It's easy to assume that a developer who sends you some code for review is smarter than you'll ever be, and that's why you don't understand their code.

But in reality, if code is hard to understand, it's probably too complex. If you're familiar with the programming language being used, reading healthy code should be almost as easy as reading a book in your native language.

Pretend a developer sends you this block of Python to be reviewed:
def IsOkay(n):
f = False
for i in range(2, n):
if n % i == 0:
f = True
return not f

Don't spend more than a few seconds trying to understand it. Simply add a code review comment saying, "It's hard for me to understand this piece of code," or be more specific, and say, "Please use more descriptive names here."

The developer then clarifies the code and sends it to you for review again:
def IsPrime(n):
for divisor in range(2, n / 2):
if n % divisor == 0:
return False

return True

Now we can read it pretty easily, which is a benefit in itself.

Often, just asking a developer to clarify a piece of code will result in fundamental improvements. In this case, the developer noticed possible performance improvements since the code was easier to read—the function now returns earlier when the number isn't prime, and the loop only goes to n/2 instead of n.

However, now that we can easily understand this code, we can see many problems with it. For example, it has strange behavior with 0 and 1, and there are other problems, too. But most importantly, it is now apparent that this entire function should be removed and be replaced with a preexisting function for detecting if a number is prime. Clarifying the code helped both the developer and reviewer.

In summary, don't waste time reviewing code that is hard to understand, just ask for it to be clarified. In fact, such review comments are one of the most useful and important tools a code reviewer has!

Testing on the Toilet: Cleanly Create Test Data

This article was adapted from a Google Testing on the Toilet (TotT) episode. You can download a printer-friendly version of this TotT episode and post it in your office.

By Ben Yu

Helper methods make it easier to create test data. But they can become difficult to read over time as you need more variations of the test data to satisfy constantly evolving requirements from new tests:
// This helper method starts with just a single parameter:
Company company = newCompany(PUBLIC);

// But soon it acquires more and more parameters.
// Conditionals creep into the newCompany() method body to handle the nulls,
// and the method calls become hard to read due to the long parameter lists:
Company small = newCompany(2, 2, null, PUBLIC);
Company privatelyOwned = newCompany(null, null, null, PRIVATE);
Company bankrupt = newCompany(null, null, PAST_DATE, PUBLIC);

// Or a new method is added each time a test needs a different combination of fields:
Company small = newCompanyWithEmployeesAndBoardMembers(2, 2, PUBLIC);
Company privatelyOwned = newCompanyWithType(PRIVATE);
Company bankrupt = newCompanyWithBankruptcyDate(PAST_DATE, PUBLIC);

Instead, use the test data builder pattern: create a helper method that returns a partially-built object (e.g., a Builder in languages such as Java, or a mutable object) whose state can be overridden in tests. The helper method initializes logically-required fields to reasonable defaults, so each test can specify only fields relevant to the case being tested:
Company small = newCompany().setEmployees(2).setBoardMembers(2).build();
Company privatelyOwned = newCompany().setType(PRIVATE).build();
Company bankrupt = newCompany().setBankruptcyDate(PAST_DATE).build();
Company arbitraryCompany = newCompany().build();

// Zero parameters makes this method reusable for different variations of Company.
// It also doesn’t need conditionals to ignore parameters that aren’t set (e.g. null
// values) since a test can simply not set a field if it doesn’t care about it.
private static Company.Builder newCompany() {
return Company.newBuilder().setType(PUBLIC).setEmployees(100); // Set required fields

Also note that tests should never rely on default values that are specified by a helper method since that forces readers to read the helper method’s implementation details in order to understand the test.
// This test needs a public company, so explicitly set it.
// It also needs a company with no board members, so explicitly clear it.
Company publicNoBoardMembers = newCompany().setType(PUBLIC).clearBoardMembers().build();

You can learn more about this topic at

Testing on the Toilet: Only Verify State-Changing Method Calls

This article was adapted from a Google Testing on the Toilet (TotT) episode. You can download a printer-friendly version of this TotT episode and post it in your office.

By Dillon Bly and Andrew Trenk

Which lines can be safely removed from this test?
@Test public void addPermissionToDatabase() {
new UserAuthorizer(mockUserService, mockPermissionDb).grantPermission(USER, READ_ACCESS);

  // The test will fail if any of these methods is not called.
verify(mockPermissionDb).addPermission(USER, READ_ACCESS);

The answer is that the calls to verify the non-state-changing methods can be removed.

Method calls on another object fall into one of two categories:
  • State-changing: methods that have side effects and change the world outside the code under test, e.g., sendEmail(), saveRecord(), logAccess().
  • Non-state-changing: methods that return information about the world outside the code under test and don't modify anything, e.g., getUser(), findResults(), readFile().
You should usually avoid verifying that non-state-changing methods are called:
  • It is often redundant: a method call that doesn't change the state of the world is meaningless on its own. The code under test will use the return value of the method call to do other work that you can assert.
  • It makes tests brittle: tests need to be updated whenever method calls change. For example, if a test is expecting mockUserService.isUserActive(USER) to be called, it would fail if the code under test is modified to call user.isActive() instead.
  • It makes tests less readable: the additional assertions in the test make it more difficult to determine which method calls actually affect the state of the world.
  • It gives a false sense of security: just because the code under test called a method does not mean the code under test did the right thing with the method’s return value.
Instead of verifying that they are called, use non-state-changing methods to simulate different conditions in tests, e.g., when(mockUserService.isUserActive(USER)).thenReturn(false). Then write assertions for the return value of the code under test, or verify state-changing method calls.
Verifying non-state-changing method calls may be useful if there is no other output you can assert. For example, if your code should be caching an RPC result, you can verify that the method that makes the RPC is called only once.
With the unnecessary verifications removed, the test looks like this:
@Test public void addPermissionToDatabase() {
new UserAuthorizer(mockUserService, mockPermissionDb).grantPermission(USER, READ_ACCESS);

  // Verify only the state-changing method.
verify(mockPermissionDb).addPermission(USER, READ_ACCESS);

That’s much simpler! But remember that instead of using a mock to verify that a method was called, it would be even better to use a real or fake object to actually execute the method and check that it works properly. For example, the above test could use a fake database to check that the permission exists in the database rather than just verifying that addPermission() was called.

You can learn more about this topic in the book Growing Object-Oriented Software, Guided by Tests. Note that the book uses the terms “command” and “query” instead of “state-changing” and “non-state-changing”.