URLs - googblogs.com

Posted by Alex Fischer, Software Engineer, Google Search.

TL;DR: Today, we're adding a feature to the AMP integration in Google Search that allows users to access, copy, and share the canonical URL of an AMP document. But before diving deeper into the news, let's take a step back to elaborate more on URLs in the AMP world and how they relate to the speed benefits of AMP.

What's in a URL? On the web, a lot - URLs and origins represent, to some extent, trust and ownership of content. When you're reading a New York Times article, a quick glimpse at the URL gives you a level of trust that what you're reading represents the voice of the New York Times. Attribution, brand, and ownership are clear.

Recent product launches in different mobile apps and the recent launch of AMP in Google Search have blurred this line a little. In this post, I'll first try to explain the reasoning behind some of the technical decisions we made and make sense of the different kinds of AMP URLs that exist. I'll then outline changes we are making to address the concerns around URLs.

To start with, AMP documents have three different kinds of URLs:

Original URL: The publisher's document written in the AMP format. http://www.example.com/amp/doc.html
AMP Cache URL: The document served through an AMP Cache (e.g., all AMPs served by Google are served through the Google AMP Cache). Most users will never see this URL. https://www-example-com.cdn.ampproject.org/c/www.example.com/amp/doc.html
Google AMP Viewer URL: The document displayed in an AMP viewer (e.g., when rendered on the search result page). https://www.google.com/amp/www.example.com/amp.doc.html

Although having three different URLs with different origins for essentially the same content can be confusing, there are two main reasons why these different URLs exist: caching and pre-rendering. Both are large contributors to AMP's speed, but require new URLs and I will elaborate on why that is.

AMP Cache URLs

Let's start with AMP Cache URLs. Paul Bakaus, a Google Developer Advocate for AMP, has an excellent post describing why AMP Caches exist. Paul's post goes into great detail describing the benefits of AMP Caches, but it doesn't quite answer the question why they require new URLs. The answer to this question comes down to one of the design principles of AMP: build for easy adoption. AMP tries to solve some of the problems of the mobile web at scale, so its components must be easy to use for everyone.

There are a variety of options to get validation, proximity to users, and other benefits provided by AMP Caches. For a small site, however, that doesn't manage its own DNS entries, doesn't have engineering resources to push content through complicated APIs, or can't pay for content delivery networks, a lot of these technologies are inaccessible.

For this reason, the Google AMP Cache works by means of a simple URL "transformation." A webmaster only has to make their content available at some URL and the Google AMP Cache can then cache and serve the content through Google's world-wide infrastructure through a new URL that mirrors and transforms the original. It's as simple as that. Leveraging an AMP Cache using the original URL, on the other hand, would require the webmaster to modify their DNS records or reconfigure their name servers. While some sites do just that, the URL-based approach is easier to use for the vast majority of sites.

AMP Viewer URLs

In the previous section, we learned about Google AMP Cache URLs -- URLs that point to the cached version of an AMP document. But what about www.google.com/amp URLs? Why are they needed? These are "AMP Viewer" URLs and they exist because of pre-rendering.
AMP's built-in support for privacy and resource-conscientious pre-rendering is rarely talked about and often misunderstood. AMP documents can be pre-rendered without setting off a cascade of resource fetches, without hogging up users' CPU and memory, and without running any privacy-sensitive analytics code. This works regardless of whether the embedding application is a mobile web page or a native application. The need for new URLs, however, comes mostly from mobile web implementations, so I am using Google's mobile search result page (SERP) as an illustrative example.

How does pre-rendering work?

When a user performs a Google search that returns AMP-enabled results, some of these results are pre-rendered behind the scenes. When the user clicks on a pre-rendered result, the AMP page loads instantly.

Pre-rendering works by loading a hidden iframe on the embedding page (the search result page) with the content of the AMP page and an additional parameter that indicates that the AMP document is only being pre-rendered. The JavaScript component that handles the lifecycle of these iframes is called "AMP Viewer".

The AMP Viewer pre-renders an AMP document in a hidden iFrame.

The user's browser loads the document and the AMP runtime and starts rendering the AMP page. Since all other resources, such as images and embeds, are managed by the AMP runtime, nothing else is loaded at this point. The AMP runtime may decide to fetch some resources, but it will do so in a resource and privacy sensible way.

When a user clicks on the result, all the AMP Viewer has to do is show the iframe that the browser has already rendered and let the AMP runtime know that the AMP document is now visible.
As you can see, this operation is incredibly cheap - there is no network activity or hard navigation to a new page involved. This leads to a near-instant loading experience of the result.

Where do google.com/amp URLs come from?

All of the above happens while the user is still on the original page (in our example, that's the search results page). In other words, the user hasn't gone to a different page; they have just viewed an iframe on the same page and so the browser doesn't change the URL.

We still want the URL in the browser to reflect the page that is displayed on the screen and make it easy for users to link to. When users hit refresh in their browser, they expect the same document to show up and not the underlying search result page. So the AMP viewer has to manually update this URL. This happens using the History API. This API allows the AMP Viewer to update the browser's URL bar without doing a hard navigation.

The question is what URL the browser should be updated to. Ideally, this would be the URL of the result itself (e.g., www.example.com/amp/doc.html); or the AMP Cache URL (e.g., www-example-com.cdn.ampproject.org/www.example.com/amp/doc.html). Unfortunately, it can't be either of those. One of the main restrictions of the History API is that the new URL must be on the same origin as the original URL (reference). This is enforced by browsers (for security reasons), but it means that in Google Search, this URL has to be on the www.google.com origin.

Why do we show a header bar?

The previous section explained restrictions on URLs that an AMP Viewer has to handle. These URLs, however, can be confusing and misleading. They can open up the doors to phishing attacks. If an AMP page showed a login page that looks like Google's and the URL bar says www.google.com, how would a user know that this page isn't actually Google's? That's where the need for additional attribution comes in.

To provide appropriate attribution of content, every AMP Viewer must make it clear to users where the content that they're looking at is coming from. And one way of accomplishing this is by adding a header bar that displays the "true" origin of a page.

What's next?

I hope the previous sections made it clear why these different URLs exist and why there needs to be a header in every AMP viewer. We have heard how you feel about this approach and the importance of URLs. So what next? As you know, we want to be thoughtful in what we do and ensure that we don't break the speed and performance users expect from AMP pages.

Since the launch of AMP in Google Search in Feb 2015, we have taken the following steps:

All Google URLs (i.e., the Google AMP cache URL and the Google AMP viewer URL) reflect the original source of the content as best as possible: www.google.com/amp/www.example.com/amp/doc.html.
When users scroll down the page to read a document, the AMP viewer header bar hides, freeing up precious screen real-estate.
When users visit a Google AMP viewer URL on a platform where the viewer is not available, we redirect them to the canonical page for the document.

In addition to the above, many users have requested a way to access, copy, and share the canonical URL of a document. Today, we're adding support for this functionality in form of an anchor button in the AMP Viewer header on Google Search. This feature allows users to use their browser's native share functionality by long-tapping on the link that is displayed.

In the coming weeks, the Android Google app will share the original URL of a document when users tap on the app's share button. This functionality is already available on the iOS Google app.

Lastly, we're working on leveraging upcoming web platform APIs that allow us to improve this functionality even further. One such API is the Web Share API that would allow AMP viewers to invoke the platform's native sharing flow with the original URL rather than the AMP viewer URL.

We as Google have every intention in making the AMP experience as good as we can for both, users and publishers. A thriving ecosystem is very important to us and attribution, user trust, and ownership are important pieces of this ecosystem. I hope this blog post helps clear up the origin of the three URLs of AMP documents, their role in making AMP fast, and our efforts to further improve the AMP experience in Google Search. Lastly, an ecosystem can only flourish with your participation: give us feedback and get involved with AMP.

googblogs.com

All Google blogs and Press in one site

Tag Archives: URLs

What’s in an AMP URL?