Category Archives: Google Developers Blog

News and insights on Google platforms, tools and events

Debug TensorFlow Models with tfdbg

Posted by Shanqing Cai, Software Engineer, Tools and Infrastructure.

We are excited to share TensorFlow Debugger (tfdbg), a tool that makes debugging machine learning (ML) models in TensorFlow easier.
TensorFlow, Google's open-source ML library, is based on dataflow graphs. A typical TensorFlow ML program consists of two separate stages:
  1. Setting up the ML model as a dataflow graph by using the library's Python API,
  2. Training or performing inference on the graph by using the Session.run() method.
If errors or bugs occur during the second stage (i.e., inside the TensorFlow runtime), they are difficult to debug.

To understand why that is the case, note that to standard Python debuggers, the Session.run() call is effectively a single statement and does not expose the running graph's internal structure (nodes and their connections) or state (the output arrays, or tensors, of the nodes). Lower-level debuggers such as gdb cannot organize stack frames and variable values in a way relevant to TensorFlow graph operations. A specialized runtime debugger has been among the most frequently raised feature requests from TensorFlow users.

tfdbg addresses this runtime debugging need. Let's see tfdbg in action with a short snippet of code that sets up and runs a simple TensorFlow graph to fit a linear equation through gradient descent.

import numpy as np
import tensorflow as tf
from tensorflow.python import debug as tf_debug

xs = np.linspace(-0.5, 0.49, 100)
x = tf.placeholder(tf.float32, shape=[None], name="x")
y = tf.placeholder(tf.float32, shape=[None], name="y")
k = tf.Variable([0.0], name="k")
y_hat = tf.multiply(k, x, name="y_hat")
sse = tf.reduce_sum((y - y_hat) * (y - y_hat), name="sse")
train_op = tf.train.GradientDescentOptimizer(learning_rate=0.02).minimize(sse)

sess = tf.Session()
sess.run(tf.global_variables_initializer())

# Wrap the session for debugging: run() calls now launch the tfdbg CLI.
sess = tf_debug.LocalCLIDebugWrapperSession(sess)
for _ in range(10):
    sess.run(train_op, feed_dict={x: xs, y: 42 * xs})

As the wrapping line in this example shows, the session object is wrapped in a debugging class (LocalCLIDebugWrapperSession), so calling the run() method launches the command-line interface (CLI) of tfdbg. Using mouse clicks or commands, you can proceed through the successive run() calls, inspect the graph's nodes and their attributes, and visualize the complete execution history of all relevant nodes in the graph through the list of intermediate tensors. By using the invoke_stepper command, you can let the Session.run() call execute in "stepper mode", in which you step to nodes of your choice, observe and modify their outputs, and then step further, in a way analogous to debugging procedural languages (e.g., with gdb or pdb).

A frequently encountered class of issues in developing TensorFlow ML models is the appearance of bad numerical values (infinities and NaNs) caused by overflow, division by zero, log of zero, and so on. In large TensorFlow graphs, finding the node where such values originate can be tedious and time-consuming. With the help of the tfdbg CLI and its conditional breakpoint support, you can quickly identify the culprit node. The video below demonstrates how to debug infinity/NaN issues in a neural network with tfdbg:
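Conceptually, the conditional breakpoint just described is a predicate that scans a tensor's values for infinities or NaNs and fires when one is found. Here is a minimal pure-Python sketch of such a check; the function name and signature are illustrative, not tfdbg's actual API:

```python
import math

def has_inf_or_nan(values):
    """Return True if any element is +inf, -inf, or NaN.

    Illustrative stand-in for the kind of tensor filter a
    debugger can use as a conditional-breakpoint predicate.
    """
    return any(math.isinf(v) or math.isnan(v) for v in values)

print(has_inf_or_nan([0.5, 2.0, float("nan")]))  # True
print(has_inf_or_nan([0.5, 2.0, 3.0]))           # False
```

In tfdbg itself, a filter along these lines can be registered on the wrapped session and then used from the CLI to run until the first tensor that satisfies it.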

A screencast of the TensorFlow Debugger in action, from this tutorial.


Compared with alternative debugging options such as print ops, tfdbg requires fewer lines of code change, provides more comprehensive coverage of the graph, and offers a more interactive debugging experience. It will speed up your model development and debugging workflows. It also offers additional features such as offline debugging of dumped tensors from server environments and integration with tf.contrib.learn. To get started, please visit this documentation. This research paper lays out the design of tfdbg in greater detail.

The minimum required TensorFlow version for tfdbg is 0.12.1. To report bugs, please open issues on TensorFlow's GitHub Issues Page. For general usage help, please post questions on Stack Overflow using the tag tensorflow.
Acknowledgements
This project would not be possible without the help and feedback from members of the Google TensorFlow Core/API Team and the Applied Machine Intelligence Team.





Announcing TensorFlow 1.0

Posted By: Amy McDonald Sandjideh, Technical Program Manager, TensorFlow

In just its first year, TensorFlow has helped researchers, engineers, artists, students, and many others make progress with everything from language translation to early detection of skin cancer and preventing blindness in diabetics. We're excited to see people using TensorFlow in over 6000 open-source repositories online.


Today, as part of the first annual TensorFlow Developer Summit, hosted in Mountain View and livestreamed around the world, we're announcing TensorFlow 1.0:


It's faster: TensorFlow 1.0 is incredibly fast! XLA lays the groundwork for even more performance improvements in the future, and tensorflow.org now includes tips & tricks for tuning your models to achieve maximum speed. We'll soon publish updated implementations of several popular models to show how to take full advantage of TensorFlow 1.0 - including a 7.3x speedup on 8 GPUs for Inception v3 and a 58x speedup for distributed Inception v3 training on 64 GPUs!


It's more flexible: TensorFlow 1.0 introduces a high-level API for TensorFlow, with tf.layers, tf.metrics, and tf.losses modules. We've also announced the inclusion of a new tf.keras module that provides full compatibility with Keras, another popular high-level neural networks library.


It's more production-ready than ever: TensorFlow 1.0 promises Python API stability (details here), making it easier to pick up new features without worrying about breaking your existing code.

Other highlights from TensorFlow 1.0:

  • Python APIs have been changed to resemble NumPy more closely. For this and other backwards-incompatible changes made to support API stability going forward, please use our handy migration guide and conversion script.
  • Experimental APIs for Java and Go
  • Higher-level API modules tf.layers, tf.metrics, and tf.losses - brought over from tf.contrib.learn after incorporating skflow and TF Slim
  • Experimental release of XLA, a domain-specific compiler for TensorFlow graphs, that targets CPUs and GPUs. XLA is rapidly evolving - expect to see more progress in upcoming releases.
  • Introduction of the TensorFlow Debugger (tfdbg), a command-line interface and API for debugging live TensorFlow programs.
  • New Android demos for object detection and localization, and camera-based image stylization.
  • Installation improvements: Python 3 docker images have been added, and TensorFlow's pip packages are now PyPI compliant. This means TensorFlow can now be installed with a simple invocation of pip install tensorflow.

We're thrilled to see the pace of development in the TensorFlow community around the world. To hear more about TensorFlow 1.0 and how it's being used, you can watch the TensorFlow Developer Summit talks on YouTube, covering recent updates from higher-level APIs to TensorFlow on mobile to our new XLA compiler, as well as the exciting ways that TensorFlow is being used:





Click here for a link to the livestream and video playlist (individual talks will be posted online later in the day).


The TensorFlow ecosystem continues to grow with new techniques like Fold for dynamic batching and tools like the Embedding Projector, along with updates to our existing tools like TensorFlow Serving. We're incredibly grateful to the community of contributors, educators, and researchers who have made advances in deep learning available to everyone. We look forward to working with you on forums like GitHub issues, Stack Overflow, @TensorFlow, the discuss@tensorflow.org group, and at future events.



G Suite Developer Sessions at Google Cloud Next 2017

Originally posted on the G Suite Developers Blog

Posted by Wesley Chun (@wescpy), Developer Advocate, G Suite

There are over 200 sessions happening next month at Google Cloud's Next 2017 conference in San Francisco... so many choices! Along with content geared towards Google Cloud Platform, this year features the addition of G Suite, so all 3 pillars of cloud computing (IaaS, PaaS, SaaS) are represented!


There are already thousands of developers including Independent Software Vendors (ISVs) creating solutions to help schools and enterprises running the G Suite collaboration and productivity suite (formerly Google Apps). If you're thinking about becoming one, consider building applications that extend, enhance, and integrate G Suite apps and data with other mission critical systems to help businesses and educational institutions succeed.


Looking for inspiration? Here's a preview of some of the sessions that current and potential G Suite developers should consider:


The first sessions cover the latest Google Sheets API and the Google Slides API; for background, check out the intro blog post & video for the Sheets API as well as the intro blog post & video for the Slides API. Part of the talk also covers Google Apps Script, the JavaScript-in-the-cloud solution that gives developers programmatic access to authorized G Suite data, along with the ability to connect to other Google and external services.


If that's not enough Apps Script for you, or you're new to the technology, swing by to hear its Product Manager give an introduction in his talk. There's also a quick intro video to give you an idea of what you can do with it!


Did you know that Apps Script also powers "add-ons", which extend the functionality of Google Docs, Sheets, and Forms? Then come learn about the G Suite Marketplace, where administrators or employees can install your add-ons for their organizations.


In addition to Apps Script apps, all your Google Docs, Sheets, and Slides documents live in Google Drive. But did you know that Drive is not just for individual file storage? Hear directly from a Drive Product Manager on how, with the Drive API and Team Drives, you can extend what Drive can do for your organization. One example from the most recent Google I/O tells the story of how WhatsApp used the Drive API to back up all your conversations! To get started with your own Drive API integration, check out this blog post and short video. Confused about when to use Google Drive versus Google Cloud Storage? I've got an app, err, video, for that too! :-)


Not a software engineer but still code as part of your profession? Want to build a custom app for your department or line of business without having to worry about IT overhead? You may have heard about Google App Maker, our low-code development tool that does exactly that. Curious to learn more? Hear directly from its Product Manager lead in his talk.

All of these talks are just waiting for you at Next, the best place to get your feet wet developing for G Suite, and of course, the Google Cloud Platform. Start by checking out the session schedule. Next will also offer many opportunities to meet and interact with industry peers, along with representatives from all over Google who love the cloud. Register today, and see you in San Francisco!




Introducing Google Developers India: A Local YouTube Channel for India’s Mobile Development Revolution

Posted By Peter Lubbers, Senior Program Manager

Today, we're launching the Google Developers India channel: a brand new YouTube channel tailored for Indian developers. The channel will include original content like interviews with local experts, developer spotlights, technical tutorials, and complete Android courses to help you be a successful developer.

Why India?

By 2018, India will have the largest developer base in the world, with over 4 million developers. Our initiative to train 2 million Indian developers, along with the tremendous popularity of mobile development in the country and the desire to build better mobile apps, will be best served by an India-specific developer channel featuring Indian developers, influencers, and experts.



Here is a taste of what's to come in 2017:
  • Tech Interviews: Advice from India's top developers, influencers and tech experts.
  • Developer Stories: Inspirational stories of Indian developers.
  • DevShow India: A weekly show that will keep new and seasoned developers updated on all the news, trainings, and APIs from Google.
  • Skilled to Scaled: A real-life developer journey that takes us from the germination of an idea for an app all the way to monetizing it on Google Play.
So what's next?


The channel is live now. Click here to check it out.



What’s in an AMP URL?

Posted by Alex Fischer, Software Engineer, Google Search.

TL;DR: Today, we're adding a feature to the AMP integration in Google Search that allows users to access, copy, and share the canonical URL of an AMP document. But before diving deeper into the news, let's take a step back to elaborate more on URLs in the AMP world and how they relate to the speed benefits of AMP.

What's in a URL? On the web, a lot: URLs and origins represent, to some extent, trust and ownership of content. When you're reading a New York Times article, a quick glimpse at the URL gives you a level of trust that what you're reading represents the voice of the New York Times. Attribution, brand, and ownership are clear.

Recent product launches in different mobile apps and the recent launch of AMP in Google Search have blurred this line a little. In this post, I'll first try to explain the reasoning behind some of the technical decisions we made and make sense of the different kinds of AMP URLs that exist. I'll then outline changes we are making to address the concerns around URLs.

To start with, AMP documents have three different kinds of URLs:
  • Original URL: The publisher's document written in the AMP format. http://www.example.com/amp/doc.html
  • AMP Cache URL: The document served through an AMP Cache (e.g., all AMPs served by Google are served through the Google AMP Cache). Most users will never see this URL. https://www-example-com.cdn.ampproject.org/c/www.example.com/amp/doc.html
  • Google AMP Viewer URL: The document displayed in an AMP viewer (e.g., when rendered on the search result page). https://www.google.com/amp/www.example.com/amp/doc.html


Although having three different URLs with different origins for essentially the same content can be confusing, there are two main reasons why these different URLs exist: caching and pre-rendering. Both are large contributors to AMP's speed, but both require new URLs; I will elaborate on why.

AMP Cache URLs

Let's start with AMP Cache URLs. Paul Bakaus, a Google Developer Advocate for AMP, has an excellent post describing why AMP Caches exist. Paul's post goes into great detail describing the benefits of AMP Caches, but it doesn't quite answer the question why they require new URLs. The answer to this question comes down to one of the design principles of AMP: build for easy adoption. AMP tries to solve some of the problems of the mobile web at scale, so its components must be easy to use for everyone.

There are a variety of options to get validation, proximity to users, and other benefits provided by AMP Caches. For a small site, however, that doesn't manage its own DNS entries, doesn't have engineering resources to push content through complicated APIs, or can't pay for content delivery networks, a lot of these technologies are inaccessible.

For this reason, the Google AMP Cache works by means of a simple URL "transformation." A webmaster only has to make their content available at some URL and the Google AMP Cache can then cache and serve the content through Google's world-wide infrastructure through a new URL that mirrors and transforms the original. It's as simple as that. Leveraging an AMP Cache using the original URL, on the other hand, would require the webmaster to modify their DNS records or reconfigure their name servers. While some sites do just that, the URL-based approach is easier to use for the vast majority of sites.
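The URL "transformation" above can be sketched in a few lines of Python. This is a simplified, illustrative version only: the helper name is mine, and the real Google AMP Cache applies additional escaping rules (for example, for hostnames that already contain hyphens) and encodes the original scheme differently for HTTPS content.

```python
def amp_cache_url(original_url):
    """Sketch of the Google AMP Cache URL transformation.

    Dots in the hostname become hyphens to form a cache subdomain,
    and the original host and path are appended after a '/c/' prefix.
    """
    scheme, rest = original_url.split("://", 1)
    host, _, path = rest.partition("/")
    cache_host = host.replace(".", "-") + ".cdn.ampproject.org"
    return "https://%s/c/%s/%s" % (cache_host, host, path)

print(amp_cache_url("http://www.example.com/amp/doc.html"))
# https://www-example-com.cdn.ampproject.org/c/www.example.com/amp/doc.html
```

Note how the output matches the AMP Cache URL example listed earlier: the cache can derive it mechanically from the original URL, with no action required from the webmaster.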

AMP Viewer URLs

In the previous section, we learned about Google AMP Cache URLs -- URLs that point to the cached version of an AMP document. But what about www.google.com/amp URLs? Why are they needed? These are "AMP Viewer" URLs, and they exist because of pre-rendering.

AMP's built-in support for privacy- and resource-conscious pre-rendering is rarely talked about and often misunderstood. AMP documents can be pre-rendered without setting off a cascade of resource fetches, without hogging users' CPU and memory, and without running any privacy-sensitive analytics code. This works regardless of whether the embedding application is a mobile web page or a native application. The need for new URLs, however, comes mostly from mobile web implementations, so I am using Google's mobile search result page (SERP) as an illustrative example.

How does pre-rendering work?

When a user performs a Google search that returns AMP-enabled results, some of these results are pre-rendered behind the scenes. When the user clicks on a pre-rendered result, the AMP page loads instantly.

Pre-rendering works by loading a hidden iframe on the embedding page (the search result page) with the content of the AMP page and an additional parameter that indicates that the AMP document is only being pre-rendered. The JavaScript component that handles the lifecycle of these iframes is called "AMP Viewer".
The AMP Viewer pre-renders an AMP document in a hidden iframe.


The user's browser loads the document and the AMP runtime and starts rendering the AMP page. Since all other resources, such as images and embeds, are managed by the AMP runtime, nothing else is loaded at this point. The AMP runtime may decide to fetch some resources, but it will do so in a way that is sensible with respect to resources and privacy.

When a user clicks on the result, all the AMP Viewer has to do is show the iframe that the browser has already rendered and let the AMP runtime know that the AMP document is now visible.
As you can see, this operation is incredibly cheap - there is no network activity or hard navigation to a new page involved. This leads to a near-instant loading experience of the result.

Where do google.com/amp URLs come from?

All of the above happens while the user is still on the original page (in our example, that's the search results page). In other words, the user hasn't gone to a different page; they have just viewed an iframe on the same page and so the browser doesn't change the URL.

We still want the URL in the browser to reflect the page that is displayed on the screen and make it easy for users to link to. When users hit refresh in their browser, they expect the same document to show up and not the underlying search result page. So the AMP viewer has to manually update this URL. This happens using the History API. This API allows the AMP Viewer to update the browser's URL bar without doing a hard navigation.

The question is what URL the browser should be updated to. Ideally, this would be the URL of the result itself (e.g., www.example.com/amp/doc.html); or the AMP Cache URL (e.g., www-example-com.cdn.ampproject.org/www.example.com/amp/doc.html). Unfortunately, it can't be either of those. One of the main restrictions of the History API is that the new URL must be on the same origin as the original URL (reference). This is enforced by browsers (for security reasons), but it means that in Google Search, this URL has to be on the www.google.com origin.

Why do we show a header bar?

The previous section explained restrictions on URLs that an AMP Viewer has to handle. These URLs, however, can be confusing and misleading. They can open up the doors to phishing attacks. If an AMP page showed a login page that looks like Google's and the URL bar says www.google.com, how would a user know that this page isn't actually Google's? That's where the need for additional attribution comes in.

To provide appropriate attribution of content, every AMP Viewer must make it clear to users where the content that they're looking at is coming from. And one way of accomplishing this is by adding a header bar that displays the "true" origin of a page.

What's next?

I hope the previous sections made it clear why these different URLs exist and why there needs to be a header in every AMP viewer. We have heard how you feel about this approach and the importance of URLs. So what next? As you know, we want to be thoughtful in what we do and ensure that we don't break the speed and performance users expect from AMP pages.

Since the launch of AMP in Google Search in February 2016, we have taken the following steps:
  • All Google URLs (i.e., the Google AMP cache URL and the Google AMP viewer URL) reflect the original source of the content as best as possible: www.google.com/amp/www.example.com/amp/doc.html.
  • When users scroll down the page to read a document, the AMP viewer header bar hides, freeing up precious screen real-estate.
  • When users visit a Google AMP viewer URL on a platform where the viewer is not available, we redirect them to the canonical page for the document.
In addition to the above, many users have requested a way to access, copy, and share the canonical URL of a document. Today, we're adding support for this functionality in the form of an anchor button in the AMP Viewer header on Google Search. This feature allows users to use their browser's native share functionality by long-tapping on the link that is displayed.

In the coming weeks, the Android Google app will share the original URL of a document when users tap on the app's share button. This functionality is already available on the iOS Google app.

Lastly, we're working on leveraging upcoming web platform APIs that allow us to improve this functionality even further. One such API is the Web Share API that would allow AMP viewers to invoke the platform's native sharing flow with the original URL rather than the AMP viewer URL.

We at Google have every intention of making the AMP experience as good as we can for both users and publishers. A thriving ecosystem is very important to us, and attribution, user trust, and ownership are important pieces of this ecosystem. I hope this blog post helps clear up the origin of the three URLs of AMP documents, their role in making AMP fast, and our efforts to further improve the AMP experience in Google Search. Lastly, an ecosystem can only flourish with your participation: give us feedback and get involved with AMP.

Introducing Associate Android Developer Certification by Google

Originally posted on Android Developer Blog

The Associate Android Developer Certification program was announced at Google I/O 2016, and launched a few months later. Since then, over 322 Android developers spanning 61 countries have proven their competency and earned the title of Google Certified Associate Android Developer.
To establish a professional standard for what it means to be an Associate Android developer in the current job market, Google created this certification, which allows us to recognize developers who have proven themselves to uphold that standard.

We conducted a job task analysis to determine the required competencies and content of the certification exam. Through field research and interviews with experts, we identified the knowledge, work practices, and essential skills expected of an Associate Android developer.

The certification process consists of a performance-based exam and an exit interview. The certification fee includes three exam attempts. The cost for certification is $149 USD, or 6500 INR if you reside in India. After payment, the exam will be available for download, and you have 48 hours to complete and submit it for grading.

In the exam, you will implement missing features and debug an Android app using Android Studio. If you pass, you will undergo an exit interview, where you will answer questions about your exam and demonstrate your knowledge of Associate Android Developer competencies.

Check out this short video for a quick overview of the Associate Android Developer certification process:



Earning your AAD Certification signifies that you possess the skills expected of an Associate Android developer, as determined by Google. You can showcase your credential on your resume and display your digital badge on your social media profiles. As a member of the AAD Alumni Community, you will also have access to program benefits focused on increasing your visibility as a certified developer.

Test your Android development skills and earn the title of Google Certified Associate Android Developer. Visit g.co/devcertification to get started!


New resources for building inclusive tech hubs

Posted by Amy Schapiro and the Women Techmakers team

For the tech industry to thrive and create groundbreaking technology that supports the global ecosystem, it is critical to increase the diversity and inclusion of the communities that make the technology. To support this global network of tech hubs - incubators, community organizations, accelerators, and coworking spaces - Women Techmakers partnered with Change Catalyst to develop an in-depth video series and a set of guides on how to build inclusive technology hubs.


Watch the videos on the Women Techmakers YouTube channel, and access the how-to guides on the Change Catalyst site.


For more information about Women Techmakers, Google's global program supporting women in technology, and to join the Membership program, visit womentechmakers.com.

Welcoming Fabric to Google

Originally posted on the Firebase Blog

Posted by Francis Ma, Firebase Product Manager

Almost eight months ago, we launched the expansion of Firebase to help developers build high-quality apps, grow their user base, and earn more money across iOS, Android, and the Web. We've already seen great adoption of the platform, which brings together the best of Google's core businesses, from Cloud to mobile advertising.

Our ultimate goal with Firebase is to free developers from so much of the complexity associated with modern software development, giving them back more time and energy to focus on innovation.

As we work towards that goal, we've continued to improve Firebase, working closely with our user community. We recently introduced major enhancements to many core features, including Firebase Analytics, Test Lab, and Cloud Messaging, and added support for game developers with a C++ SDK and Unity plug-in.


We're deeply committed to Firebase and are doubling down on our investment to solve developer challenges.
Fabric and Firebase Joining Forces

Today, we're excited to announce that we've signed an agreement to acquire Fabric to continue the great work that Twitter put into the platform. Fabric will join Google's Developer Product Group, working with the Firebase team. Our missions align closely: help developers build better apps and grow their business.

Crashlytics has been a popular, trusted tool for many years, and we expect it will become the main crash reporting offering for Firebase, augmenting the work we have already done in this area. While Fabric was built on the foundation of Crashlytics, the Fabric team leveraged its success to launch a broad set of important tools, including Answers and Fastlane. We'll share further details in the coming weeks after we close the deal, as we work closely with the Fabric team to determine the most efficient ways to combine our strengths. During the transition period, Digits, the SMS authentication service, will be maintained by Twitter.


The integration of Fabric is part of our larger, long-term effort of delivering a comprehensive suite of features for iOS, Android and mobile Web app development.

This is a great moment for the industry and a unique opportunity to bring the best of Firebase with the best of Fabric. We're committed to making mobile app development seamless, so that developers can focus more of their time on building creative experiences.

Silence speaks louder than words when finding malware

Originally posted on Android Developer Blog

Posted by Megan Ruthven, Software Engineer

In Android Security, we're constantly working to better understand how to make Android devices operate more smoothly and securely. One security solution included on all devices with Google Play is Verify apps. Verify apps checks if there are Potentially Harmful Apps (PHAs) on your device. If a PHA is found, Verify apps warns the user and enables them to uninstall the app.

But sometimes devices stop checking up with Verify apps. This may happen for a non-security-related reason, like buying a new phone, or it could mean something more concerning is going on. When a device stops checking up with Verify apps, it is considered Dead or Insecure (DOI). An app with a high enough percentage of DOI devices downloading it is considered a DOI app. We use the DOI metric, along with other security signals, to help determine whether an app is a PHA, in order to protect Android users. Additionally, when we discover vulnerabilities, we patch Android devices through our security update system. This blog post explores the Android Security team's research into identifying the security-related reasons that devices stop working, and into preventing it from happening in the future.
Flagging DOI Apps
To understand this problem more deeply, the Android Security team correlates app install attempts and DOI devices to find apps that harm the device in order to protect our users.
With these factors in mind, we then focus on 'retention'. A device is considered retained if it continues to perform periodic Verify apps security check-ups after an app download; if it doesn't, it's considered potentially Dead or Insecure (DOI). An app's retention rate is the percentage of all retained devices that downloaded the app in one day. Because retention is a strong indicator of device health, we work to maximize the ecosystem's retention rate. We therefore use an app DOI scorer, which assumes that all apps should have a similar device retention rate; if an app's retention rate is a couple of standard deviations lower than average, the DOI scorer flags it. A common way to express the number of standard deviations from the average is the Z-score, which is computed from the following quantities:
  • N = Number of devices that downloaded the app.
  • x = Number of retained devices that downloaded the app.
  • p = Probability of a device downloading any app will be retained.
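With N, x, and p defined as above, the Z-score (the original post showed it as an image; this is the standard binomial approximation, reconstructed from those definitions) is Z = (x - N*p) / sqrt(N*p*(1 - p)). A minimal sketch in Python, with purely hypothetical numbers:

```python
import math

def doi_score(N, x, p):
    """Z-score of an app's observed retention vs. the expected rate p.

    N: devices that downloaded the app
    x: of those, devices that remained retained
    p: probability that a device downloading any app is retained
    """
    return (x - N * p) / math.sqrt(N * p * (1 - p))

# Hypothetical example: 10,000 downloads, only 9,000 devices retained,
# against an expected retention probability of 0.95.
print(round(doi_score(N=10000, x=9000, p=0.95), 1))  # -22.9, far below -3.7
```

A score this far below -3.7 would flag the app for further scrutiny, per the threshold described below.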

In this context, we call the Z-score of an app's retention rate its DOI score. The DOI score indicates that an app has a statistically significantly lower retention rate if the Z-score is much less than -3.7. This means that, if the null hypothesis were true, there would be much less than a 0.01% chance of the Z-score's magnitude being this large. Here, the null hypothesis is that the app's correlation with lower retention is accidental and independent of what the app does.
This allows for percolation of extreme apps (with low retention rate and high number of downloads) to the top of the DOI list. From there, we combine the DOI score with other information to determine whether to classify the app as a PHA. We then use Verify apps to remove existing installs of the app and prevent future installs of the app.
Difference between a regular and DOI app download on the same device.
Results in the wild
Among others, the DOI score flagged many apps in three well-known malware families: Hummingbad, Ghost Push, and Gooligan. Although they behave differently, the DOI scorer flagged over 25,000 apps in these three families of malware because they can degrade the Android experience to such an extent that a non-negligible number of users factory-reset or abandon their devices. This approach gives us another perspective for discovering PHAs and blocking them before they gain popularity. Without the DOI scorer, many of these apps would have escaped the extra scrutiny of a manual review.
The DOI scorer and all of Android's anti-malware work is one of multiple layers protecting users and developers on Android. For an overview of Android's security and transparency efforts, check out our page.

Google AMP Cache, AMP Lite, and the need for speed

Posted by Huibao Lin and Eyal Peled, Software Engineers, Google


At Google we believe in designing products with speed as a core principle. The Accelerated Mobile Pages (AMP) format helps ensure that content reliably loads fast, but we can do even better.

Smart caching is one of the key ingredients in the near-instant AMP experiences users get in products like Google Search and Google News & Weather. With caching, we can generally bring content physically closer to the users who are requesting it, so bytes take a shorter trip over the wire. In addition, using a single common infrastructure like a cache provides greater consistency in page serving times, even though the content originates from many hosts, which might have very different (and much larger) latency in serving the content than the cache does.

Faster and more consistent delivery are the major reasons why pages served in Google Search's AMP experience come from the Google AMP Cache. The Cache's unified content serving infrastructure opens up the exciting possibility of building optimizations that scale to improve the experience across hundreds of millions of documents. Enabling any document to take advantage of these benefits is one of the main reasons the Google AMP Cache is free for anyone to use.

In this post, we'll highlight two improvements we've recently introduced: (1) optimized image delivery and (2) enabling content to be served more successfully in bandwidth-constrained conditions through a project called "AMP Lite."

Image optimizations by the Google AMP Cache


On average across the web, images make up 64% of the bytes of a page. This makes images a very promising target for impactful optimizations.

Applying image optimizations is an effective way to cut bytes on the wire. The Google AMP Cache employs the image optimization stack used by the PageSpeed Modules and Chrome Data Compression. (Note that in order to make the transformations below, the Google AMP Cache disregards the "Cache-Control: no-transform" header.) Sites can get the same image optimizations on their origin by installing PageSpeed on their server.

Here's a rundown of some of the optimizations we've made:

1) Removing data which is invisible or difficult to see
We remove image data that is invisible to users, such as thumbnail and geolocation metadata. For JPEG images, we also reduce quality and color samples if they are higher than necessary. To be exact, we reduce JPEG quality to 85 and color samples to 4:2:0 — i.e., one color sample per four pixels. Compressing a JPEG to quality higher than this or with more color samples takes more bytes, but the visual difference is difficult to notice.

The reduced image data is then exhaustively compressed. We've found that these optimizations reduce bytes by 40%+ while not being noticeable to the user's eye.
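As a hedged illustration of the recompression described above, here is what the quality-85, 4:2:0 step might look like using Pillow's JPEG writer (this is a sketch, not the actual PageSpeed/AMP Cache stack):

```python
from PIL import Image

def recompress_jpeg(src_path, dst_path):
    """Recompress a JPEG to quality 85 with 4:2:0 chroma subsampling."""
    img = Image.open(src_path)
    # Saving without passing through the original's EXIF or thumbnail
    # data drops invisible metadata such as geolocation tags.
    img.save(
        dst_path,
        "JPEG",
        quality=85,     # perceptually near-lossless quality ceiling
        subsampling=2,  # 2 == 4:2:0 in Pillow (one color sample per 4 pixels)
        optimize=True,  # extra pass to optimize the entropy coding
    )
```

The `subsampling=2` and `quality=85` values mirror the numbers in the text; real serving infrastructure would of course make these decisions per image and per client.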

2) Converting images to WebP format
Some image formats are more mobile-friendly. We convert JPEG to WebP for supported browsers. This transformation leads to an additional 25%+ reduction in bytes with no loss in quality.
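A minimal sketch of this conversion, again using Pillow rather than the AMP Cache's actual pipeline; the `prefers_webp` helper is a hypothetical stand-in for the real content negotiation a server would do on the Accept header:

```python
from PIL import Image

def to_webp(src_path, dst_path, quality=85):
    """Convert an image to lossy WebP at a comparable quality level."""
    Image.open(src_path).save(dst_path, "WEBP", quality=quality)

def prefers_webp(accept_header):
    # Browsers that can decode WebP advertise it in their Accept header,
    # so WebP is served only to them and JPEG remains the fallback.
    return "image/webp" in accept_header
```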

3) Adding srcset
We add a "srcset" attribute if one has not been included. This applies to "amp-img" tags with "src" but no "srcset" attribute. The operation includes expanding the "amp-img" tag as well as resizing the image to multiple dimensions. This further reduces the byte count on devices with small screens.
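The resize-and-srcset step could be sketched as follows; the target widths and the file-naming scheme here are assumptions for illustration, not the AMP Cache's actual values:

```python
import os
from PIL import Image

WIDTHS = [320, 640, 1080]  # illustrative target widths

def build_srcset(src_path, out_dir):
    """Resize an image to several widths and build a srcset string."""
    img = Image.open(src_path)
    base, ext = os.path.splitext(os.path.basename(src_path))
    entries = []
    for w in WIDTHS:
        if w >= img.width:
            continue  # never upscale beyond the original
        h = round(img.height * w / img.width)  # preserve aspect ratio
        variant = os.path.join(out_dir, f"{base}_{w}w{ext}")
        img.resize((w, h)).save(variant)
        entries.append(f"{variant} {w}w")
    entries.append(f"{src_path} {img.width}w")  # original as the largest candidate
    return ", ".join(entries)
```

The returned string is what would go into the tag's "srcset" attribute, letting the browser pick the smallest variant that still fills its viewport.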

4) Using lower quality images under some circumstances
We decrease the quality of JPEG images when there is an indication that this is desired by the user or for very slow network conditions (as part of AMP Lite discussed below). For example, we reduce JPEG image quality to 50 for Chrome users who have turned on Data Saver. This transformation leads to another 40%+ byte reduction to JPEG images.

The following example shows the image before (left) and after (right) optimizations. Originally the image had 241,260 bytes, and after applying Optimizations 1, 2, and 4 it becomes 25,760 bytes. After the optimizations the image looks essentially the same, but 89% of the bytes have been saved.
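The quoted savings follows directly from the two byte counts:

```python
before, after = 241_260, 25_760
savings = 1 - after / before
print(f"{savings:.0%} of the bytes saved")  # → 89% of the bytes saved
```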



AMP Lite for Slow Network Conditions


Many people around the world access the internet with slow connection speeds or on devices with low RAM, and we've found that some AMP pages are not optimized for these severely bandwidth-constrained users. For this reason, Google has also launched AMP Lite to remove even more bytes from AMP pages for these users.

With AMP Lite, we apply all of the above optimizations to images. In particular, we always use lower quality levels (see Bullet 4 above).

In addition, we optimize external fonts by using the amp-font tag and setting the font loading timeout to 0 seconds, so pages can be displayed immediately regardless of whether the external font was previously cached.

AMP Lite is rolling out for bandwidth-constrained users in several countries, such as Vietnam and Malaysia, and for holders of low-RAM devices globally. Note that these optimizations may modify the fine details of some images but do not affect other parts of the page, including ads.

* * *

All told, we see a combined 45% reduction in bytes across all optimizations listed above.
We hope to go even further in making more efficient use of users' data to provide even faster AMP experiences.