Category Archives: Google Testing Blog

If it ain’t broke, you’re not trying hard enough

Improve Readability With Positive Booleans

This is another post in our Code Health series. A version of this post originally appeared in Google bathrooms worldwide as a Google Testing on the Toilet episode. You can download a printer-friendly version to display in your office.

By Max Kanat-Alexander

Reading healthy code should be as easy as reading a book in your native language. You shouldn’t have to stop and puzzle over what a line of code is doing. One small trick that can assist with this is to make boolean checks about something positive rather than about something negative.

Here’s an extreme example:

if not nodisable_kryponite_shield:

  devise_clever_escape_plan()

else:

  engage_in_epic_battle()

What does that code do? Sure, you can figure it out, but healthy code is not a puzzle, it’s a simple communication. Let’s look at two principles we can use to simplify this code.

1. Name your flags and variables in such a way that they represent the positive check you wish to make (the presence of something, something being enabled, something being true) rather than the negative check you wish to make (the absence of something, something being disabled, something being false).


if not enable_kryponite_shield:

  devise_clever_escape_plan()

else:

  engage_in_epic_battle()

That is already easier to read and understand than the first example.

2. If your conditional looks like “if not else ” then reverse it to put the positive case first.

if enable_kryponite_shield:

  engage_in_epic_battle()

else:

  devise_clever_escape_plan()


Now the intention of the code is immediately obvious.

There are many other contexts in which this gives improvements to readability. For example, the command foo --disable_feature=False is harder to read and think about than
foo --enable_feature=True, particularly when you change the default to enable the feature.

There are some exceptions (for example, in Python, if foo is not None could be considered a “positive check” even though it has a “not” in it), but in general checking the presence or absence of a positive is simpler for readers to understand than checking the presence or absence of a negative.


Shell Scripts: Stay Small & Simple

A version of this post originally appeared in Google bathrooms worldwide as a Google Testing on the Toilet episode. You can download a printer-friendly version to display in your office.

By David Mandelberg

Shell scripts (including Bash scripts) can be convenient for automating simple command line procedures, and they are often better than keeping complicated commands in a single developer's history. However, shell scripts can be hard to understand and maintain, and are typically not as well-supported as other programming languages. Shell scripts have less support for unit testing, and there is likely a lower chance that somebody reading one will be experienced with the language.

Python, Go, or other general-purpose languages are often better choices than shell. Shell is convenient for some simple use cases, and the Google shell style guide can help with writing better shell scripts. But it is difficult, even for experienced shell scripters, to mitigate the risks of its many surprising behaviors. So whenever possible, use shell scripts only for small, simple use cases, or avoid shell entirely.

Here are some examples of mistakes that are far too easy to make when writing a shell script (see Bash Pitfalls for many more):

  • Forgetting to quote something can have surprising results, due to shell's complicated evaluation rules. E.g., even if a wildcard is properly quoted, it can still be unexpectedly expanded elsewhere:

$ msg='Is using bash a pro? Or a con?'

$ echo $msg  # Note that there's a subdirectory called 'proc' in the current directory.

Is using bash a proc Or a con?  # ? was unexpectedly treated as a wildcard.

  • Many things that would be function arguments in other languages are command line arguments in shell. Command line arguments are world-readable, so they can leak secrets:

$ curl -H "Authorization: Bearer ${SECRET}" "$URL" &

$ ps aux  # The current list of processes shows the secret.

  • By default, the shell ignores all errors from commands, which can cause severe bugs if code assumes that earlier commands succeeded. The command set -e can appear to force termination at the first error, but its behavior is inconsistent. For example, set -e does not affect some commands in pipelines (like false in false | cat), nor will it affect some command substitutions (such as the false in export FOO="$(false)"). Even worse, its behavior inside a function depends on how that function is called.

  • Many things run in subshells, which can (often unexpectedly) hide changes to variables from the main shell. It can also make manual error handling harder, compounding the issue above:

$ run_or_exit() { "$@" || exit $?; }  # Executes the arguments then exits on failure.

$ foo="$(run_or_exit false)"  # Exits the $() subshell, but the script continues.




The Secret to Great Code Reviews: Respect Reviewers’ Comments

This is another post in our Code Health series. A version of this post originally appeared in Google bathrooms worldwide as a Google Testing on the Toilet episode. You can download a printer-friendly version to display in your office.

By Marius Latinis

You prepared a code change and asked for a review. A reviewer left a comment you disagree with. Are you going to reply that you will not address the comment? 

When addressing comments for your code reviewed by colleagues, find a solution that makes both you and the reviewer happy. The fact that a reviewer left a comment suggests you may be able to improve the code further. Here are two effective ways to respond:

  • When it’s easy for you to make an improvement, update the code. Improved code benefits future readers and maintainers. You will also avoid a potentially long and emotional debate with a reviewer.

  • If the comment is unclear, ask the reviewer to explain. To facilitate the process, talk directly with the reviewer through chat, or in person.

Let’s demonstrate with an example code review scenario:

  1. You prepare a code change that modifies the following function:

    3 // Return the post with the most upvotes.

    3 // Return the post with the most upvotes.

                                                 

    4 // Restrict to English if englishOnly = true.

    4 Post findMostUpvotedPost(

    5 Post findMostUpvotedPost(

    5     List<Post> posts) {

    6     List<Post> posts,

     

    7     boolean englishOnly) {

    6   ... 

    8   ... // Old and new logic mixed together.

    7 }

    9 }


  1. The code reviewer leaves the following comment:

    alertreviewer

    11:51 AM

    The new function signature is too complex. Can we keep the signature unchanged?


    Reply

You disagree with the comment that one additional parameter makes the signature too complex. Nevertheless, do not reject the suggestion outright.

There is another issue that might have prompted the comment: it is not the responsibility of this function to check the post’s language (https://en.wikipedia.org/wiki/Single-responsibility_principle).

  1. You rewrite your code to address the reviewer’s comment:

    ImmutableList<Post> englishPosts = selectEnglishPosts(posts);  // Your new logic.

    Post mostUpvotedEnglishPost = findMostUpvotedPost(englishPosts);  // No change needed.

Now the code is improved, and both you and the reviewer are happy.

Communicate Design Tradeoffs Visually

A version of this post originally appeared in Google bathrooms worldwide as a Google Testing on the Toilet episode. You can download a printer-friendly version to display in your office.

By Tim Lyakhovetskiy


A goal of any written design or project proposal is to present and evaluate alternatives. However, documents that include multiple solutions can be difficult to read when the qualities of each solution are not clearly expressed.

A common approach to simplifying proposals is to use “pros and cons” for each alternative, but this leads to biased writing since the pros and cons may be weighed differently depending on the reader’s priorities.

In this example, can you quickly tell how this option would measure up against others?

Option 1 - Optimize Shoelace Untangling Wizard in ShoeApp UI


Pros

  • Shoelace Untangling Wizard UI will use 10% less CPU

  • Less than one quarter to implement

  • Users will see 100ms less UI lag

Cons

  • Security risk (shoelace colors exposed) until ShoeAppBackend team fixes lacing API

  • ShoeAppBackend will be blocked for 3 months

  • User documentation for Shoelace Untangling Wizard UI has to change

This format requires the reader to remember many details in order to evaluate which option they prefer. Instead, express tradeoffs using impact on qualities. There are many common quality attributes including Performance, Security, Maintainability, Usability, Testability, Scalability, and Cost.

Use colors and symbols in a table ( negative, somewhat negative, positive) to make it easy for readers to parse your ideas. The symbols are needed for accessibility, e.g. color-blindness and screen readers.

Option 1 - Optimize Shoelace Untangling Wizard in ShoeApp UI

Usability

➕ Users will see 100ms less UI lag

User documentation for Shoelace Untangling Wizard UI has to change

Security

➖ Security risk (shoelace colors exposed) until ShoeAppBackend fixes lacing API

Partner impact

➖ ShoeAppBackend will be blocked for 3 months

Performance

➕ Shoelace Untangling Wizard UI will use 10% less CPU

Schedule/Cost

➕ Less than one quarter to implement

Notice that the content uses approximately the same space but communicates more visually. The benefit is even greater when there are many alternatives/attributes, as it’s possible to evaluate the whole option at a glance.



Else Nuances

 This is another post in our Code Health series. A version of this post originally appeared in Google bathrooms worldwide as a Google Testing on the Toilet episode. You can download a printer-friendly version to display in your office.

By Sam Lee and Stan Chan

If your function exits early in an if statement, using or not using an else clause is equivalent in terms of behavior. However, the proper use of else clauses and guard clauses (lack of else) can help emphasize the intent of the code to the reader.

Consider the following guidelines to help you structure your functions:

  • Use a guard clause to handle special cases upfront, so that the rest of the code can focus on the core logic. A guard clause checks a criterion and fails fast or returns early if it is not met, which reduces nesting (see the Reduce Nesting article).

def parse_path(path: str) -> Path:

  if not path:

    raise ValueError(“path is empty.”)

  else:

    # Nested logic here.

    ...

def parse_path(path: str) -> Path:

  if not path:

    raise ValueError(“path is empty.”)

  # No nesting needed for the valid case.

  ...

  • Use else if it is part of the core responsibility. Prefer to keep related conditional logic syntactically grouped together in the same if...else structure if each branch is relevant to the core responsibility of the function. Grouping logic in this way emphasizes the complementary nature of each condition. The complementary nature is emphasized explicitly by the else statement, instead of being inferred and relying on the resulting behavior of the prior return statement.

def get_favicon(self) -> Icon:

  if self.user.id is None:

    return Icon.SIGNED_OUT

  if self.browser.incognito: return Icon.INCOGNITO

  if not self.new_inbox_items:

    return Icon.EMPTY;

  return Icon.HAS_ITEMS

def get_favicon(self) -> Icon:

  if self.user.id is None:

    return Icon.SIGNED_OUT

  elif self.browser.incognito:

    return Icon.INCOGNITO

  elif not self.new_inbox_items:

    return Icon.EMPTY

  else:

    return Icon.HAS_ITEMS

  # No trailing return is needed or allowed.

When it’s idiomatic, use a switch (or similar) statement instead of if...else statements. (switch/when in Go/Kotlin can accept boolean conditions like if...else.)

Not all scenarios will be clear-cut for which pattern to use; use your best judgment to choose between these two styles. A good rule of thumb is use a guard if it's a special case, use else if its core logic. Following these guidelines can improve code understandability by emphasizing the connections between different logical branches.


Use Abstraction to Improve Function Readability

This is another post in our Code Health series. A version of this post originally appeared in Google bathrooms worldwide as a Google Testing on the Toilet episode. You can download a printer-friendly version to display in your office.


By Palak Bansal and Mark Manley


Which version of the createPizza function below is easier to understand?

func createPizza(order *Order) *Pizza {  

  pizza := &Pizza{Base: order.Size,

                  Sauce: order.Sauce,

                  Cheese: “Mozzarella”}


  if order.kind == “Veg” {

    pizza.Toppings = vegToppings

  } else if order.kind == “Meat” {

    pizza.Toppings = meatToppings

  }


  oven := oven.New()

  if oven.Temp != cookingTemp { 

    for (oven.Temp < cookingTemp) {

      time.Sleep(checkOvenInterval)

      oven.Temp = getOvenTemp(oven)

    }

  }


  if !pizza.Baked {

    oven.Insert(pizza)

    time.Sleep(cookTime)

    oven.Remove(pizza)

    pizza.Baked = true

  }


  box := box.New()

  pizza.Boxed = box.PutIn(pizza)

  pizza.Sliced = box.SlicePizza(order.Size)

  pizza.Ready = box.Close()

  return pizza  

}

func createPizza(order *Order) *Pizza {

  pizza := prepare(order)

  bake(pizza)

  box(pizza)

  return pizza

}


func prepare(order *Order) *Pizza {

  pizza := &Pizza{Base: order.Size,

                  Sauce: order.Sauce,

                  Cheese: “Mozzarella”}

  addToppings(pizza, order.kind)

  return pizza

}


func addToppings(pizza *Pizza, kind string) {

  if kind == “Veg” {

    pizza.Toppings = vegToppings

  } else if kind == “Meat” {

    pizza.Toppings = meatToppings

  }

}


func bake(pizza *Pizza) {

  oven := oven.New()

  heatOven(oven) 

  bakePizza(pizza, oven)

}


func heatOven(oven *Oven) { … }

func bakePizza(pizza *Pizza, oven *Oven) { … }

func box(pizza *Pizza) { … }

You probably said the right-hand side is easier, but why? The left side mixes together several levels of abstraction: low-level implementation details (e.g., how to heat the oven), intermediate-level functions (e.g., how to bake pizza), and high-level abstractions (e.g., preparing, baking, and boxing the pizza).

The right side is easier to follow because the functions have a consistent level of abstraction, providing a top-down narrative of the code’s logic. createPizza is a high-level function that delegates the preparing, baking, and boxing steps to lower-level specialized functions with intuitive names. Those functions, in turn, delegate to their own lower-level specialized functions (e.g., heatOven) until they reach a function that handles implementation details without needing to call other functions. 

Avoid mixing different abstraction layers into a single function. Nest functions at equal abstraction levels to provide a narrative. This self-documenting style is simpler to follow, debug, and reuse.

You can learn more about this topic in the book Clean Code by Robert C. Martin. 



Better Feedback Makes for Faster Reviews

Testing on the Toilet Presents...

Healthy Code on the Commode



Better Feedback Makes for Faster Reviews

by Felipe Sodré and Adam Bender

Have you ever received a code review with a comment like this?

server_is_alive = InitializeServer()

Comment:

This doesn’t make any sense. Why don't you use InitializeServerWithAFewExtraSteps() instead?


If so, it probably left you scratching your head. Why is the reviewer asking this question? Are they testing you? Did you make a big mistake? The problem is the comment is vague (and maybe even a bit rude). It also leaves out important context as to what the reviewer is thinking, making it difficult to respond to. Here are a few simple ways to ensure your review comments are effective, helpful, and clear. 

  • Be kind! People are more receptive to feedback if you assume competence and treat them with respect. 

  • Focus your comments on the code, not the author. Avoid statements with the word ‘you’ which can give the impression that you are judging the person and not the artifact.

  • Explain why you are making the comment. You may be aware of alternatives that are not obvious to the author, or they may be aware of additional constraints.

  • Express the tradeoffs your suggestion entails and take a pragmatic approach to working with the author to achieve the right balance.

  • Approach your role as a guide, not a test grader. Balance giving direct guidance with leaving some degrees of freedom for the author.

  • Consider marking low priority comments with severity like Nit, Optional, or FYI to help the author prioritize the important ones. 

Some comments can be difficult to express concisely. If you have a rewrite in mind, like a subtle API usage or structural change (e.g., split this big change into smaller changes), consider providing an example. If you find you are stuck after more than two rounds of feedback, consider moving to a more direct communication channel like chat or live call to discuss next steps.

When done well, comments can make code reviews more efficient and collaborative:

server_is_alive = InitializeServer()

Comment:

InitializeServerWithAFewExtraSteps() seems to achieve the same result but with built-in logging and auditing (see http://short_link_here). Would that be a better option here?


The goal of reviews of any kind is to get the best possible outcome for you and your team. Software Engineering is a team effort, and by helping each other more effectively you will deliver better, together!

References: Code Health: Respectful Reviews == Useful Reviews, Google's guide on Code Review Comments

More information, discussion, and archives: testing.googleblog.com

Copyright Google LLC. Licensed under a Creative Commons

Attribution–ShareAlike 4.0 License (http://creativecommons.org/licenses/by-sa/4.0/).


Sensenmann: Code Deletion at Scale

By Phil Norman

Code at Scale

At Google, tens of thousands of software engineers contribute to a multi-billion-line mono-repository. This repository, stored in a system called Piper, contains the source code of shared libraries, production services, experimental programs, diagnostic and debugging tools: basically anything that's code-related.


This open approach can be very powerful. For example, if an engineer is unsure how to use a library, they can find examples just by searching. It also allows kind-hearted individuals to perform important updates across the whole repository, be that migrating to newer APIs, or following language developments such as Python 3, or Go generics.


Code, however, doesn't come for free: it's expensive to produce, but also costs real engineering time to maintain. Such maintenance cannot easily be skipped, at least if one wants to avoid larger costs later on.


But what if there were less code to maintain? Are all those lines of code really necessary?


Deletion at Scale

Any large project accumulates dead code: there's always some module that is no longer needed, or a program that was used during early development but hasn't been run in years. Indeed, entire projects are created, function for a time, and then stop being useful. Sometimes they are cleaned up, but cleanups require time and effort, and it's not always easy to justify the investment.


However, while this dead code sits around undeleted, it's still incurring a cost: the automated testing system doesn't know it should stop running dead tests; people running large-scale cleanups aren't aware that there's no point migrating this code, as it is never run anyway.


So what if we could clean up dead code automatically? That was exactly what people started thinking several years ago, during the Zürich Engineering Productivity team's annual hackathon. The Sensenmann project, named after the German word for the embodiment of Death, has been highly successful. It submits over 1000 deletion changelists per week, and has so far deleted nearly 5% of all C++ at Google.


Its goal is simple (at least, in principle): automatically identify dead code, and send code review requests ('changelists') to delete it.


What to Delete?

Google's build system, Blaze (the internal version of Bazel) helps us determine this: by representing dependencies between binary targets, libraries, tests, source files and more, in a consistent and accessible way, we're able to construct a dependency graph. This allows us to find libraries that are not linked into any binary, and propose their deletion.


That's only a small part of the problem, though: what about all those binaries? All the one-shot data migration programs, and diagnostic tools for deprecated systems? If they don't get removed, all the libraries they depend on will be kept around too.


The only real way to know if programs are useful is to check whether they're being run, so for internal binaries (programs run in Google's data centres, or on employee workstations), a log entry is written when a program runs, recording the time and which specific binary it is. By aggregating this, we get a liveness signal for every binary used in Google. If a program hasn't been used for a long time, we try sending a deletion changelist.


What Not to Delete?

There are, of course, exceptions: some program code is there simply to serve as an example of how to use an API; some programs run only in places we can't get a log signal from. There are many other exceptions too, where removing the code would be deleterious. For this reason, it's important to have a blocklisting system so that exceptions can be marked, and we can avoid bothering people with spurious changelists.


The Devel's in the Details

Consider a simple case. We have two binaries, each depending on its own library, and also on a third, shared library. Drawing this (ignoring the source files and other dependencies), we find this kind of structure:


If we see that main1 is in active use, but main2 was last used over a year ago, we can propagate the liveness signal through the build tree, marking main1 as alive along with everything it depends upon. What is left can be removed; as main2 depends on lib2, we want to delete these two targets in the same change:


So far so good, but real production code has unit tests, whose build targets depend upon the libraries they test. This immediately makes the graph traversal a lot more complicated:


The testing infrastructure is going to run all those tests, including lib2_test, despite lib2 never being executed 'for real'. This means we cannot use test runs as a 'liveness' signal: if we did, we'd consider lib2_test to be alive, which would keep lib2 around forever. We would only be able to clean up untested code, which would severely hamper our efforts.


What we really want is for each test to share the fate of the library it is testing. We can do this by making the library and its test interdependent, thus creating loops in the graph:


This turns each library and its test into a strongly connected component. We can use the same technique as before, marking the 'live' nodes and then hunting for collections of 'dead' nodes to be deleted, but this time using Tarjan's strongly connected components algorithm to deal with the loops.


Simple, right? Well, yes, if it's easy to identify the relationships between tests and the libraries they're testing. Sadly, that is not always the case. In the examples above, there's a simple naming convention which allows us to match tests to libraries, but we can't rely on that heuristic in general.


Consider the following two cases:



On the left, we have an implementation of the LZW compression algorithm, as separate compressor and decompressor libraries. The test is actually testing both of them, to ensure data isn't corrupted after being compressed and then decompressed. On the right, we have a web_test that is testing our web server library; it uses a URL encoder library for support, but isn't actually testing the URL encoder itself. On the left, we want to consider the LZW test and both LZW libraries as one connected component, but on the right, we'd want to exclude the URL encoder and consider web_test and web_lib as the connected component.


Despite requiring different treatment, these two cases have identical structures. In practice, we can encourage engineers to mark libraries like url_encoder_lib as being 'test only' (ie. only for use to support unit testing), which can help in the web-test case; otherwise our current approach is to use the edit distance between test and library names to pick the most likely library to match to a given test. Being able to identify cases like the LZW example, with one test and two libraries, is likely to involve processing test coverage data, and has not yet been explored.


Focus on the User...

While the ultimate beneficiaries of dead code deletion are the software engineers themselves, many of whom appreciate the help in keeping their projects tidy, not everyone is happy to receive automated changelists trying to delete code they wrote. This is where the social engineering side of the project comes in, which is every bit as important as the software engineering.


Automatic code deletion is an alien concept to many engineers, and just as with the introduction of unit testing 20 years ago, many are resistent to it. It takes time and effort to change people's minds, along with a good deal of careful communication.


There are three main parts to Sensenmann's communication strategy. Of primary importance are the change descriptions, as they are the first thing a reviewer will see. They must be concise, but must provide enough background for all reviewers to be able to make a judgement. This is a difficult balance to achieve: too short, and many people will fail to find the information they need; too long, and one ends up with a wall of text no one will bother to read. Well-labelled links to supporting documentation and FAQs can really help here.


The second part is the supporting documentation. Concise and clear wording is vital here, too, as is a good navigable structure. Different people will need different information: some need reassurance that in a source control system, deletions can be rolled back; some will need guidance in how best to deal with a bad change, for example by fixing a misuse of the build system. Through careful thought, and iterations of user feedback, the supporting documentation can become a useful resource.


The third part is dealing with user feedback. This can be the hardest part at times: feedback is more frequently negative than positive, and can require a cool head and a good deal of diplomacy at times. However, accepting such feedback is the best way to improve the system in general, make users happier, and thus avoid negative feedback in the future.


Onwards and Upwards

Automatically deleting code may sound like a strange idea: code is expensive to write, and is generally considered to be an asset. However, unused code costs time and effort, whether in maintaining it, or cleaning it up. Once a code base reaches a certain size, it starts to make real sense to invest engineering time in automating the clean-up process. At Google's scale, it is estimated that automatic code deletion has paid for itself tens of times over, in saved maintenance costs.


The implementation requires solutions to problems both technical and social in nature. While a lot of progress has been made in both these areas, they are not entirely solved yet. As improvements are made, though, the rate of acceptance of the deletions increases and automatic deletion becomes more and more impactful. This kind of investment will not make sense everywhere, but if you have a huge mono-repository, maybe it'd make sense for you too. At least at Google, reducing the C++ maintenance burden by 5% is a huge win.



Code Health: Now You’re Thinking With Functions

This is another post in our Code Health series. A version of this post originally appeared in Google bathrooms worldwide as a Google Testing on the Toilet episode. You can download a printer-friendly version to display in your office.

by Cathal Weakliam

Loops are the standard way to process collections like arrays and lists. However, some loops implement the same patterns repeatedly, leading to duplicate code. Higher-order functions—functions that use other functions as inputs or outputs—can reduce duplication by providing a simpler way to express common operations with collections.

Consider these two loops in JavaScript that decide if every object in an array meets a condition:

let everyRequestValid = true;
for (const request of requests) {
if (!isValid(request)) {
everyRequestValid = false;
break;
}
}

if (everyRequestValid) {
// do something
}
let everyUserEligible = true;
for (const user of users) {
if (!isEligible(user)) {
everyUserEligible = false;
break;
}
}

if (everyUserEligible) {
// do something
}


The high similarity between the two loops violates the Don’t Repeat Yourself principle and creates an unnecessary burden on readers and maintainers of the code.

To reduce the maintenance burden, use the every method to replace each loop with a single expression.  (In other languages every may have a different name, like all or allMatch).

if (requests.every(isValid)) {
// do something
}
if (users.every(isEligible)) {
// do something
}


Processing collections with higher-order functions has several benefits:
  • It significantly reduces duplication by abstracting away the common looping code.
  • The resulting code is much shorter and simpler, with less opportunity for bugs.
  • A reader can quickly see the intent of the code as it is not obscured behind low-level control flow.
Two other common higher-order functions are map (apply a function to each element of a collection) and filter (select the elements of a collection that pass a predicate). While the exact syntax varies by language, here’s how they look in JavaScript (using an anonymous function as the argument):

// Double each value in `ints`
const doubled = ints.map(n => n * 2);
// Get only the even values in `ints`
const evens = ints.filter(n => n % 2 === 0);


Just don’t overdo it! Don’t rewrite a loop with functions if it makes it harder to understand, or if it would be considered unidiomatic in your language (for example, in Python, list comprehensions are equivalent to map and filter but are usually preferred).

A Tale of Two Features

By George Pirocanac


I have often been asked, “What is the most memorable bug that you have encountered in your testing career?” For me, it is hands down a bug that happened quite a few years ago. I was leading an Engineering Productivity team that supported Google App Engine. At that time App Engine was still in its early stages, and there were many challenges associated with testing rapidly evolving features. Our testing frameworks and processes were also evolving, so it was an exciting time to be on the team. 

What makes this bug so memorable is that I spent so much time developing a comprehensive suite of test scenarios, yet a failure occurred during such an obvious use case that it left me shaking my head and wondering how I had missed it. Even with many years of testing experience it can be very humbling to construct scenarios that adequately mirror what will happen in the field.

I’ll try to provide enough background for the reader to play along and see if they can determine the anomalous scenario. As a further hint, the problem resulted from the interaction of two App Engine features, so I like calling this story A Tale of Two Features.

Feature 1 - Datastore Admin (backup, restore, delete)


Google App Engine was released 13 years ago as Google’s first Cloud product. It allowed users to build and deploy highly scalable web applications in the Cloud. To support this, it had its own scalable database called the Datastore. An administration console allowed users to manage the application and its Datastore through a web interface. Users wrote applications that consisted of request handlers that App Engine invoked according to the URL that was specified. The handlers could call App Engine services like Datastore through a remote procedure call (RPC) mechanism. Figure 1 illustrates this flow.



The first feature in this Tale of Two Features resided in the administration console, providing the ability to back up, restore, and delete selected or all of an application’s entities in the Datastore. It was implemented in a clever way that incorporated it directly into the application, rather than as an independent utility. As part of the application it could freely operate on the Datastore and incur the same billing charges as other datastore operations within the application. When the feature was invoked, traffic would be sent to its handler and the application would process it. Figure 2 illustrates this request flow.




By the time this memorable bug occurred, this Datastore administration feature was mature, well tested, and stable. No changes were being made to it.


Feature 2 - Utilities for Migrating to the HR - Datastore


The second feature (or more accurately, set of features) came at least a year after the first feature was released and stable. It helped users migrate their applications to a new High Replication (HR) Datastore. The HR Datastore was more reliable than its predecessor, but using it meant creating a new application and copying over all the data from the old Datastore. To support such migrations, App Engine developers added two new features to the administration console. The first copied all the data from the Datastore of one application to another, and the second redirected all traffic from one application to another. The latter was particularly useful because it meant the new application would seamlessly receive the traffic after a migration. This set of features was written by another team, and we in Engineering Productivity supported them by creating processes for testing various Datastore migrations. The migration-support features were thoroughly tested and released to developers. Figure 3 illustrates the request flow of the redirection feature.



What Could Possibly Go Wrong?


So this was the situation when we released these utilities for migrating to the new Datastore. We were very confident that they worked, as we had tested migrations of many different types and sizes of Datastore entities. We had also tested that a migration could be interrupted without data loss. All checked out fine. I was confident that this new feature would work, yet soon after we released it, we started getting problem reports.

If you have been playing along, now is the time to ask yourself,  “What could possibly go wrong?” As an added hint, the problem reports claimed that all the data in the newly migrated application was disappearing.


What Did Go Wrong

As mentioned above, developers began to report that data was disappearing from their newly migrated applications. It wasn’t at all common, yet of course it is most disconcerting when data “just disappears.” We had to investigate how this could occur. Our standard processes ensured that we had internal backups of the data, which were promptly restored. In parallel we tried to reproduce the problem, but we couldn’t—at least until we figured out what was happening. As I mentioned earlier, once we understood it, it was quite obvious, but that only made it all the more painful that we missed it.

What was happening was that, after migrating and automatically redirecting traffic to the new application, a number of customers thought they still needed to delete the data from their old application, so they used the first Datastore admin feature to do that. As expected, the feature sent traffic to that application to delete the entities from the Datastore. But that traffic was now being automatically redirected to the new application, and voila—all the data that had been copied earlier was now deleted there. Since only a handful of developers tried to delete the data from their old applications, this explained why the problem occurred only rarely. Figure 4 illustrates this request flow.



Obvious, isn’t it, once you know what is happening.


Lessons Learned


This all occurred years ago, and App Engine is based on a far different and more robust framework today. Datastore migrations are but a memory from the past, yet this experience made a great impression on me.

The most important thing I learned from this experience is that, while it is important to test features for their functionality, it’s also important to think of them as part of workflows. In performing our testing we exercised a very limited number of steps in the migration process workflow and omitted a very reasonable step at the end: trying to delete the data from the old application. Our focus was in testing the variability of contents in the Datastore rather than different steps in the migration process. It was this focus that kept our eyes away from the relatively obvious failure case.

Another thing I learned was that this bug might have been caught if the developer of the first feature had been in the design review for the second set of migration features (particularly the feature that automatically redirects traffic). Unfortunately, that person had already joined a new team. A key step in reducing bugs can occur at the design stage if “what-if” questions are being asked.

Finally, I was enormously impressed that we were able to recover so quickly. Protecting against data loss is one of the most important aspects of Cloud management, and being able to recover from mistakes is at least as important as trying to prevent them. I have the utmost respect for my coworkers in Site Reliability Engineering (SRE).


References