Tag Archives: Jobs

COOL to be a TE @ Google

By Anantha Keesara

Test Engineers are a part of Google’s Engineering Productivity (EngProd) Group. As mentioned in a previous post, we advocate for our users, provide comprehensive testing solutions, and play a key role creating successful and reliable products and platforms. At Google, Test Engineers are not manual testers; we are technical engineers whose focus is on advancing product excellence and engineering productivity.

In short, it’s COOL (Constant learner, Out-of-the-box thinker, Orchestrator, Leading-edge user) to be a Test Engineer at Google:


Constant learning is what keeps Google Test Engineers motivated. We understand holistically how all the pieces of the software stack are interconnected and what kind of coverage exists or is needed to test the connections between the stacks.This product knowledge makes us test experts. We work closely with Software Engineers from the very beginning of the development process to discuss the testability of the designs before the features are implemented.  We develop test strategies, methodologies, and test plans; we write scripts, design systems, and build tools and test infrastructure. We review design docs, do deep dives into Google's massive codebase, evaluate stack traces, and determine the root causes of production outages. Through this constant learning, we not only build deep technical expertise and do risk management by identifying weak spots in the code base, we also find creative ways to break software and identify potential problems. Our job ladder also gives us the flexibility and independence to explore and learn new technologies like ML concepts and Cloud computing and to build new testing solutions or improve existing ones.


Out-of-the-box thinking, a result of constant learning, is another thing that keeps us motivated. As Google Test Engineers, we champion Engineering excellence by  providing optimized solutions to address engineering inefficiencies, testing gaps, and process gaps. We constantly think of ways to make machines do the work to increase testability and productivity. Hundreds and thousands of lines of code get checked-in every minute at Google. To maintain velocity, quality, and code health, we devise creative ways to test and debug the test failures -- like performing diff testing, building dynamic test cases from the logs, designing heuristic algorithms to identify culprits for test failures, building solutions to reduce the test run time, and implementing stubs, fakes, and mock objects and servers to help developers write stable unit and integration testing. Along with devising creative ways to test and debug the test failures, we also focus on improving engineering excellence and product excellence by defining and measuring productivity metrics and product health metrics like quality, stability, and performance. The testing of, for example, Search, Ads, Maps, YouTube, Cloud, self-driving cars, and Google Apps would not have scaled with traditional testing practices. 

Orchestrating the testing efforts is a key responsibility of Google Test Engineers. As orchestrators we can collaborate with cross functional teams including Product Managers, Technical Program Managers, and Software engineers to define critical user journeys (CUJs), determine testing strategies, and ensure that the right tests are run on the right configurations/environments. With our strong communication and collaboration skills, we work with the cross-functional teams and play the role of evangelists in spreading the word on new tools, technologies, and best testing practices.  We also have the opportunity to host Hackathons and Fixits, host interns, drive college recruiting events, engage with the open source community in testing the open source products, listen to feedback, and convert that feedback into product improvements.

Leading-edge user: the fun part of being a Test Engineer! We can engage with product development, participate in the review of product designs, documentation, and prototypes, play with features and products early on, and provide informed feedback. Best of all, as early adopters we get to wear wearables, ride in self driving cars, be in our own world with AR/VR, engage with Google Assistant to get our chores done, and have multiple laptops, phones, and smart display units! 


Stay tuned to learn more COOL things about Test Engineering at Google! 

What Test Engineers do at Google: Building Test Infrastructure

Author: Jochen Wuttke

In a recent post, we broadly talked about What Test Engineers do at Google. In this post, I talk about one aspect of the work TEs may do: building and improving test infrastructure to make engineers more productive.

Refurbishing legacy systems makes new tools necessary
A few years ago, I joined an engineering team that was working on replacing a legacy system with a new implementation. Because building the replacement would take several years, we had to keep the legacy system operational and even add features, while building the replacement so there would be no impact on our external users.

The legacy system was so complex and brittle that the engineers spent most of their time triaging and fixing bugs and flaky tests, but had little time to implement new features. The goal for the rewrite was to learn from the legacy system and to build something that was easier to maintain and extend. As the team's TE, my job was to understand what caused the high maintenance cost and how to improve on it. I found two main causes:
  • Tight coupling and insufficient abstraction made unit testing very hard, and as a consequence, a lot of end-to-end tests served as functional tests of that code.
  • The infrastructure used for the end-to-end tests had no good way to create and inject fakes or mocks for these services. As a result, the tests had to run the large number of servers for all these external dependencies. This led to very large and brittle tests that our existing test execution infrastructure was not able to handle reliably.
Exploring solutions
At first, I explored if I could split the large tests into smaller ones that would test specific functionality and depend on fewer external services. This proved impossible, because of the poorly structured legacy code. Making this approach work would have required refactoring the entire system and its dependencies, not just the parts my team owned.

In my second approach, I also focussed on large tests and tried to mock services that were not required for the functionality under test. This also proved very difficult, because dependencies changed often and individual dependencies were hard to trace in a graph of over 200 services. Ultimately, this approach just shifted the required effort from maintaining test code to maintaining test dependencies and mocks.

My third and final approach, illustrated in the figure below, made small tests more powerful. In the typical end-to-end test we faced, the client made RPCcalls to several services, which in turn made RPC calls to other services. Together the client and the transitive closure over all backend services formed a large graph (not tree!) of dependencies, which all had to be up and running for the end-to-end test. The new model changes how we test client and service integration. Instead of running the client on inputs that will somehow trigger RPC calls, we write unit tests for the code making method calls to the RPC stub. The stub itself is mocked with a common mocking framework like Mockito in Java. For each such test, a second test verifies that the data used to drive that mock "makes sense" to the actual service. This is also done with a unit test, where a replay client uses the same data the RPC mock uses to call the RPC handler method of the service.


This pattern of integration testing applies to any RPC call, so the RPC calls made by a backend server to another backend can be tested just as well as front-end client calls. When we apply this approach consistently, we benefit from smaller tests that still test correct integration behavior, and make sure that the behavior we are testing is "real".

To arrive at this solution, I had to build, evaluate, and discard several prototypes. While it took a day to build a proof-of-concept for this approach, it took me and another engineer a year to implement a finished tool developers could use.

Adoption
The engineers embraced the new solution very quickly when they saw that the new framework removes large amounts of boilerplate code from their tests. To further drive its adoption, I organized multi-day events with the engineering team where we focussed on migrating test cases. It took a few months to migrate all existing unit tests to the new framework, close gaps in coverage, and create the new tests that validate the mocks. Once we converted about 80% of the tests, we started comparing the efficacy of the new tests and the existing end-to-end tests.

The results are very good:
  • The new tests are as effective in finding bugs as the end-to-end tests are.
  • The new tests run in about 3 minutes instead of 30 minutes for the end-to-end tests.
  • The client side tests are 0% flaky. The verification tests are usually less flaky than the end-to-end tests, and never more.
Additionally, the new tests are unit tests, so you can run them in your IDE and step through them to debug. These results allowed us to run the end-to-end tests very rarely, only to detect misconfigurations of the interacting services, but not as functional tests.

Building and improving test infrastructure to help engineers be more productive is one of the many things test engineers do at Google. Running this project from requirements gathering all the way to a finished product gave me the opportunity to design and implement several prototypes, drive the full implementation of one solution, lead engineering teams to adoption of the new framework, and integrate feedback from engineers and actual measurements into the continuous refinement of the tool.

What Test Engineers do at Google

by Matt Lowrie, Manjusha Parvathaneni, Benjamin Pick, and Jochen Wuttke

Test engineers (TEs) at Google are a dedicated group of engineers who use proven testing practices to foster excellence in our products. We orchestrate the rapid testing and releasing of products and features our users rely on. Achieving this velocity requires creative and diverse engineering skills that allow us to advocate for our users. By building testable user journeys into the process, we ensure reliable products. TEs are also the glue that bring together feature stakeholders (product managers, development teams, UX designers, release engineers, beta testers, end users, etc.) to confirm successful product launches. Essentially, every day we ask ourselves, “How can we make our software development process more efficient to deliver products that make our users happy?”.

The TE role grew out of the desire to make Google’s early free products, like Search, Gmail and Docs, better than similar paid products on the market at the time. Early on in Google’s history, a small group of engineers believed that the company’s “launch and iterate” approach to software deployment could be improved with continuous automated testing. They took it upon themselves to promote good testing practices to every team throughout the company, via some programs you may have heard about: Testing on the Toilet, the Test Certified Program, and the Google Test Automation Conference (GTAC). These efforts resulted in every project taking ownership of all aspects of testing, such as code coverage and performance testing. Testing practices quickly became commonplace throughout the company and engineers writing tests for their own code became the standard. Today, TEs carry on this tradition of setting the standard of quality which all products should achieve.

Historically, Google has sustained two separate job titles related to product testing and test infrastructure, which has caused confusion. We often get asked what the difference is between the two. The rebranding of the Software engineer, tools and infrastructure (SETI) role, which now concentrates on engineering productivity, has been addressed in a previous blog post. What this means for test engineers at Google, is an enhanced responsibility of being the authority on product excellence. We are expected to uphold testing standards company-wide, both programmatically and persuasively.

Test engineer is a unique role at Google. As TEs, we define and organize our own engineering projects, bridging gaps between engineering output and end-user satisfaction. To give you an idea of what TEs do, here are some examples of challenges we need to solve on any particular day:
  • Automate a manual verification process for product release candidates so developers have more time to respond to potential release-blocking issues.
  • Design and implement an automated way to track and surface Android battery usage to developers, so that they know immediately when a new feature will cause users drained batteries.
  • Quantify if a regenerated data set used by a product, which contains a billion entities, is better quality than the data set currently live in production.
  • Write an automated test suite that validates if content presented to a user is of an acceptable quality level based on their interests.
  • Read an engineering design proposal for a new feature and provide suggestions about how and where to build in testability.
  • Investigate correlated stack traces submitted by users through our feedback tracking system, and search the code base to find the correct owner for escalation.
  • Collaborate on determining the root cause of a production outage, then pinpoint tests that need to be added to prevent similar outages in the future.
  • Organize a task force to advise teams across the company about best practices when testing for accessibility.
Over the next few weeks leading up to GTAC, we will also post vignettes of actual TEs working on different projects at Google, to showcase the diversity of the Google Test Engineer role. Stay tuned!

From QA to Engineering Productivity

By Ari Shamash

In Google’s early days, a small handful of software engineers built, tested, and released software. But as the user-base grew and products proliferated, engineers started specializing in roles, creating more scale in the development process:

  • Test Engineers (TEs) --  tested new products and systems integration
  • Release Engineers (REs) --  pushed bits into production
  • Site Reliability Engineers (SREs) --  managed systems and data centers 24x7.

This story focuses on the evolution of quality assurance and the roles of the engineers behind it at Google.  The REs and SREs also evolved, but we’ll leave that for another day.

Initially, teams relied heavily on manual operations.  When we attempted to automate testing, we largely focused on the frontends, which worked, because Google was small and our products had fewer integrations.  However, as Google grew, longer and longer manual test cycles bogged down iterations and delayed feature launches.  Also, since we identified bugs later in the development cycle, it took us longer and longer to fix them.  We determined that pushing testing upstream via automation would help address these issues and accelerate velocity.

As manual testing transitioned to automated processes, two separate testing roles began to emerge at Google:

  • Test Engineers (TEs) -- With their deep product knowledge and test/quality domain expertise, TEs focused on what should be tested.
  • Software Engineers in Test (SETs) -- Originally software engineers with deep infrastructure and tooling expertise, SETs built the frameworks and packages required to implement automation.

The impact was significant:

  • Automated tests became more efficient and deterministic (e.g. by improving runtimes, eliminating sources of flakiness, etc.) 
  • Metrics driven engineering proliferated (e.g. improving code and feature coverage led to higher quality products).

Manual operations were reduced to manual verification on new features, and typically only in end-to-end, cross product integration boundaries.  TEs developed extreme depth of knowledge for the products they supported.  They became go-to engineers for product teams that needed expertise in test automation and integration. Their role evolved into a broad spectrum of responsibilities: writing scripts to automate testing, creating tools so developers could test their own code, and constantly designing better and more creative ways to identify weak spots and break software.

SETs (in collaboration with TEs and other engineers) built a wide array of test automation tools and developed best practices that were applicable across many products. Release velocity accelerated for products.  All was good, and there was much rejoicing!

SETs initially focused on building tools for reducing the testing cycle time, since that was the most manually intensive and time consuming phase of getting product code into production.  We made some of these tools available to the software development community: webdriver improvements, protractor, espresso, EarlGrey, martian proxy, karma, and GoogleTest. SETs were interested in sharing and collaborating with others in the industry and established conferences. The industry has also embraced the Test Engineering discipline, as other companies hired software engineers into similar roles, published articles, and drove Test-Driven Development into mainstream practices.

Through these efforts, the testing cycle time decreased dramatically, but interestingly the overall velocity did not increase proportionately, since other phases in the development cycle became the bottleneck.  SETs started building tools to accelerate all other aspects of product development, including:

  • Extending IDEs to make writing and reviewing code easier, shortening the “write code” cycle
  • Automating release verification, shortening the “release code” cycle.
  • Automating real time production system log verification and anomaly detection, helping automate production monitoring.
  • Automating measurement of developer productivity, helping understand what’s working and what isn’t.

In summary, the work done by the SETs naturally progressed from supporting only product testing efforts to include supporting product development efforts as well. Their role now encompassed a much broader Engineering Productivity agenda.

Given the expanded SET charter, we wanted the title of the role to reflect the work. But what should the new title be?  We empowered the SETs to choose a new title, and they overwhelmingly (91%) selected Software Engineer, Tools & Infrastructure (abbreviated to SETI).

Today, SETIs and TEs still collaborate very closely on optimizing the entire development life cycle with a goal of eliminating all friction from getting features into production. Interested in building next generation tools and infrastructure?  Join us!