Posted by Harini Chandrasekharan, Staff Software Engineer, Google Play
The Google Play Store, launched 10 years ago in 2012, sits at the heart of Android, connecting billions of users with an equally staggering and ever-growing collection of apps and games worldwide.
Let's take a peek behind the curtain to learn what it takes to design the serving infrastructure of the world's largest Android marketplace. In the world of consumer-facing software, it's no surprise that out-of-the-box engineering solutions fail to meet the requirements that Google scale demands. Every system is therefore carefully crafted and honed with iterative enhancements to meet the unique availability, quality and latency demands of the Google Play Store.
What is feature engineering?
In the domain of consumer-facing features, users’ opinions and choices, the developer ecosystem and demand often change faster than infrastructure can. In such an environment, the biggest challenge engineers face is how to stay nimble and design infrastructure that’s not only future-proof but also meets the needs of the consumer space within the constraints of scalability and performance. Let’s take a deeper look at some engineering challenges in such a dynamic space.
What does success look like?
In a data-driven organization such as the Play Store, metrics are built for measuring anything and everything of importance. Here are some of the dimensions that come in handy when measuring and tracking success:
- Product/business metrics - These are metrics specific to the product or service under consideration. Running A/B experiments to measure changes to these metrics for the new treatment builds confidence, particularly when decision making involves several tradeoffs.
- Performance - Measuring latency, error rates and availability forms the backbone of almost every service, and for good reason. Knowing these baseline metrics is essential since they closely track user experience and perception of the product.
- System health - These are internal system metrics tracking resource utilization and fleet stability.
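To make the performance dimension concrete, here is a minimal sketch of how latency percentiles, error rate and availability could be derived from a batch of request records. The `Request` record, the `summarize` helper and the request-based definition of availability (the fraction of requests that succeed) are assumptions made for illustration; they are not a description of how the Play Store computes its metrics.

```python
import math
from dataclasses import dataclass


@dataclass
class Request:
    """Hypothetical record of one served request."""
    latency_ms: float
    ok: bool  # True if the request succeeded


def percentile(values, pct):
    """Nearest-rank percentile: smallest sample with at least pct% of samples at or below it."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def summarize(requests):
    """Compute baseline performance metrics for a batch of requests."""
    if not requests:
        raise ValueError("no requests")
    total = len(requests)
    errors = sum(1 for r in requests if not r.ok)
    latencies = [r.latency_ms for r in requests]
    return {
        "error_rate": errors / total,
        # Assumption: availability is measured per request, not per time window.
        "availability": 1 - errors / total,
        "p50_ms": percentile(latencies, 50),
        "p99_ms": percentile(latencies, 99),
    }
```

Tracking a tail percentile such as p99 alongside the median matters because a small share of slow requests can dominate how responsive the product feels, even when the median looks healthy.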
Challenges in feature engineering infrastructure
Designing backend systems that scale to the requirements of the Play Store while also meeting the performance criteria required to make user interactions feel fluid and responsive is paramount. From an engineering perspective, infrastructure needs to continuously evolve to meet the needs of the business. The Play Store is no different: the store infrastructure has evolved several times in the last decade, not only to support the new features that are available to users today, but also to modernize, eliminate tech debt and, most of all, reduce latency.
Frequent iteration
Challenge: Features often require large amounts of iteration over time, so it's hard to plan engineering infrastructure that meets all future requirements.
In an experiment-driven culture, the optimum approach for rapidly building features at scale often results in tech debt. Tech debt takes various forms: relics of past features that did not launch leave behind layers that are hard to clean up, hurt performance, and make code error-prone and hard to test.
Independent evolution
Challenge: In large organizations spanning hundreds of engineers, several features are often being built in parallel and independently of each other.
Infrastructure reuse and sharing innovations are often impossible without significantly compromising on velocity. In a space where the product evolves at a rapid pace, there is often a large amount of uncertainty about which levers and knobs to build into systems to make them flexible. Too many levers lead to high system complexity. Too few levers and the cost of iteration is sky high. Finding the balance between the two is one of the core competencies of a feature engineer in this space.
Time to experiment
Challenge: There is often an opportunity cost to pay for time spent building elegant engineering solutions.
Time to experiment is one of the most important metrics to keep in mind when designing solutions for user-facing features. Flexible design that enables rapid iteration and meets the latency and other performance SLOs is ideal.
In practice, a large amount of guesswork often goes into estimating the impact of a particular user-facing change. While we can confidently use past data and learnings to estimate in some scenarios, that is not sufficient for a brand-new, ambitious, never-before-tried idea.
Feature engineering guiding principles
Let’s see how the Play Store solves these challenges to enable state-of-the-art innovation.
Data-driven experiments and launches - understand your success metrics
Optimizing for time to market, i.e., getting the feature to the user and measuring how it impacts app installs and other store business metrics using A/B experiments, is of prime importance. Iterating fast based on data helps tune the final feature to the desired end state. Google has several homegrown technologies for running A/B experiments at worldwide scale, with seamless integration with metric presentation tools that make running these experiments smooth and easy, so developers can spend more time coding and less on analysis.
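As an illustration of the general A/B technique (not of Google's internal experiment tooling, which is not described here), the sketch below assigns users to arms deterministically by hashing and computes the relative lift in install rate between arms. The function names, the 10,000-bucket scheme and the install-rate metric are all assumptions chosen for the example.

```python
import hashlib

NUM_BUCKETS = 10_000  # Assumed bucket granularity for this sketch.


def assign_arm(user_id: str, experiment: str, treatment_fraction: float = 0.5) -> str:
    """Deterministically map a user to 'treatment' or 'control' for one experiment."""
    # Salting the hash with the experiment name keeps assignments
    # independent across experiments.
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % NUM_BUCKETS
    threshold = int(treatment_fraction * NUM_BUCKETS)
    return "treatment" if bucket < threshold else "control"


def install_rate_lift(control_installs: int, control_users: int,
                      treatment_installs: int, treatment_users: int) -> float:
    """Relative change in install rate of treatment versus control."""
    control_rate = control_installs / control_users
    treatment_rate = treatment_installs / treatment_users
    return (treatment_rate - control_rate) / control_rate
```

Because the assignment depends only on the user and experiment identifiers, a user sees the same arm on every request without any stored state. A real launch decision would also need a significance test on the measured lift, which this sketch omits.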
Design and experiment with polished MVPs - with a focus on quality
Deciding what to build, whether it meets Google quality standards, and understanding engineering costs and the user needs it solves are all important questions that need to be answered before designing anything. Feature engineering is therefore often done in close collaboration with Product Managers. Aligning on an MVP that serves the user journey and can be built in a reasonable amount of engineering time is the key to a successful product.
Frequently modernize the infrastructure - clean up tech debt
Frequent iterations and a fast MVP development culture often come with their own set of cons, the biggest being tech debt. In optimizing for velocity, cutting corners results in obsolete code (due to unlaunchable metrics) or discarded experiment flags. If left unfixed, these make testing and maintenance harder and slow future development velocity. Additionally, using the latest and greatest frameworks to shave off the last milliseconds of latency or to make development easier pays great dividends in the long run. Frequently modernizing the infrastructure, either via refactoring or full rewrites, may traditionally be seen as a sign of poorly designed code, but it's one of the bigger tradeoffs that feature engineers often have to make, because after all, what use is all the fancy infrastructure if users don't interact with the feature in the first place!