Category Archives: Google Webmaster Central Blog

Official news on crawling and indexing sites for the Google index

Google’s robots.txt parser is now open source

For 25 years, the Robots Exclusion Protocol (REP) was only a de-facto standard, which sometimes had frustrating implications. On one hand, for webmasters, it meant uncertainty in corner cases, like when their text editor included BOM characters in their robots.txt files. On the other hand, for crawler and tool developers, it also brought uncertainty; for example, how should they deal with robots.txt files that are hundreds of megabytes in size?



Today, we announced that we're spearheading the effort to make the REP an internet standard. While this is an important step, it means extra work for developers who parse robots.txt files.
We're here to help: we open sourced the C++ library that our production systems use for parsing and matching rules in robots.txt files. This library has been around for 20 years and contains pieces of code that were written in the 90's. Since then, the library has evolved; we learned a lot about how webmasters write robots.txt files and the corner cases we had to cover, and we added what we learned over the years to the internet draft where it made sense.
We also included a testing tool in the open source package to help you test a few rules. Once built, the usage is very straightforward:
robots_main <robots.txt content> <user_agent> <url>
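To give a rough sense of what "parsing and matching rules" means, here is a simplified TypeScript sketch; it is not the open-sourced C++ code, and it deliberately ignores wildcards, "$" anchors, group selection, and the many corner cases the real parser covers.

// Simplified illustration only: NOT the open-sourced C++ library, just a
// sketch of parsing robots.txt lines and picking the most specific rule.
// Wildcards, "$" anchors, multi-agent groups, and other details are ignored.
interface Rule { allow: boolean; path: string; }

function parseRules(robotsTxt: string, userAgent: string): Rule[] {
  const rules: Rule[] = [];
  let inMatchingGroup = false;
  for (const rawLine of robotsTxt.split("\n")) {
    const line = rawLine.split("#")[0].trim(); // strip comments
    const [key, ...rest] = line.split(":");
    const value = rest.join(":").trim();
    switch (key.trim().toLowerCase()) {
      case "user-agent":
        inMatchingGroup =
          value === "*" || value.toLowerCase() === userAgent.toLowerCase();
        break;
      case "allow":
      case "disallow":
        if (inMatchingGroup && value !== "") {
          rules.push({ allow: key.trim().toLowerCase() === "allow", path: value });
        }
        break;
    }
  }
  return rules;
}

function isAllowed(robotsTxt: string, userAgent: string, url: string): boolean {
  const path = new URL(url).pathname;
  let best: Rule | undefined;
  for (const rule of parseRules(robotsTxt, userAgent)) {
    // The most specific (longest) matching rule wins; no match means allowed.
    if (path.startsWith(rule.path) && (!best || rule.path.length > best.path.length)) {
      best = rule;
    }
  }
  return best ? best.allow : true;
}

// Example, mirroring the testing tool's arguments (robots.txt content, user agent, URL):
console.log(isAllowed("user-agent: *\ndisallow: /private/", "Googlebot",
  "https://example.com/private/page.html")); // -> false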
If you want to check out the library, head over to our GitHub repository for the robots.txt parser. We'd love to see what you can build using it! If you built something using the library, drop us a comment on Twitter, and if you have comments or questions about the library, find us on GitHub.
Posted by Edu Pereda, Lode Vandevenne, and Gary, Search Open Sourcing team

Formalizing the Robots Exclusion Protocol Specification

For 25 years, the Robots Exclusion Protocol (REP) has been one of the most basic and critical components of the web. It allows website owners to exclude automated clients, for example web crawlers, from accessing their sites - either partially or completely.
In 1994, Martijn Koster (a webmaster himself) created the initial standard after crawlers were overwhelming his site. With more input from other webmasters, the REP was born, and it was adopted by search engines to help website owners manage their server resources more easily.
However, the REP was never turned into an official Internet standard, which means that developers have interpreted the protocol somewhat differently over the years. And since its inception, the REP hasn't been updated to cover today's corner cases. This is a challenging problem for website owners because the ambiguous de-facto standard made it difficult to write the rules correctly.
We wanted to help website owners and developers create amazing experiences on the internet instead of worrying about how to control crawlers. Together with the original author of the protocol, webmasters, and other search engines, we've documented how the REP is used on the modern web, and submitted it to the IETF.
The proposed REP draft reflects over 20 years of real-world experience with robots.txt rules, as used by Googlebot and other major crawlers, as well as by about half a billion websites that rely on the REP. These fine-grained controls give publishers the power to decide what they'd like to be crawled on their site and potentially shown to interested users. It doesn't change the rules created in 1994, but rather defines essentially all undefined scenarios for robots.txt parsing and matching, and extends the protocol for the modern web. Notably:
  1. Any URI-based transfer protocol can use robots.txt. For example, it's no longer limited to HTTP and can be used for FTP or CoAP as well.
  2. Developers must parse at least the first 500 kibibytes of a robots.txt file. Defining a maximum file size ensures that connections aren't open for too long, alleviating unnecessary strain on servers.
  3. A new maximum caching time of 24 hours (or the value of a cache directive, if available) gives website owners the flexibility to update their robots.txt whenever they want, while crawlers don't overload websites with robots.txt requests. For example, in the case of HTTP, Cache-Control headers could be used to determine the caching time. (A sketch of how a crawler might apply this rule and the size limit above follows this list.)
  4. The specification now provides that when a previously accessible robots.txt file becomes inaccessible due to server failures, known disallowed pages are not crawled for a reasonably long period of time.
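To make points 2 and 3 concrete, here is a minimal sketch of how a crawler might apply the size and caching rules. It is not Google's implementation; the cache structure, constants, and fetch logic are assumptions for the example.

// Illustrative sketch only, not Google's implementation: fetch a site's
// robots.txt, parse at most the first 500 kibibytes, and cache the result
// for 24 hours or for the lifetime advertised by Cache-Control, if present.
const MAX_ROBOTS_BYTES = 500 * 1024;        // 500 kibibytes
const DEFAULT_TTL_MS = 24 * 60 * 60 * 1000; // 24 hours

interface CachedRobots {
  body: string;
  expiresAt: number;
}

const robotsCache = new Map<string, CachedRobots>(); // keyed by origin

async function getRobotsTxt(origin: string): Promise<string> {
  const cached = robotsCache.get(origin);
  if (cached && cached.expiresAt > Date.now()) {
    return cached.body; // reuse the cached copy instead of re-fetching
  }

  const response = await fetch(`${origin}/robots.txt`);

  // Read the body, stopping once we have the first 500 KiB.
  const reader = response.body!.getReader();
  const chunks: Uint8Array[] = [];
  let received = 0;
  while (received < MAX_ROBOTS_BYTES) {
    const { done, value } = await reader.read();
    if (done) break;
    chunks.push(value);
    received += value.length;
  }
  await reader.cancel().catch(() => {});

  const bytes = new Uint8Array(received);
  let offset = 0;
  for (const chunk of chunks) {
    bytes.set(chunk, offset);
    offset += chunk.length;
  }
  const body = new TextDecoder().decode(bytes.subarray(0, MAX_ROBOTS_BYTES));

  // Prefer an explicit max-age from Cache-Control; otherwise cache for 24 hours.
  const cacheControl = response.headers.get("cache-control") ?? "";
  const maxAge = /max-age=(\d+)/.exec(cacheControl);
  const ttlMs = maxAge ? Number(maxAge[1]) * 1000 : DEFAULT_TTL_MS;
  robotsCache.set(origin, { body, expiresAt: Date.now() + ttlMs });
  return body;
}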
Additionally, we've updated the augmented Backus–Naur form in the internet draft to better define the syntax of robots.txt, which is critical for developers to parse the lines.
RFC stands for Request for Comments, and we mean it: we uploaded the draft to the IETF to get feedback from developers who care about the basic building blocks of the internet. As we work to give web creators the controls they need to tell us how much information they want to make available to Googlebot, and by extension, eligible to appear in Search, we have to make sure we get this right.
If you'd like to drop us a comment, ask us questions, or just say hi, you can find us on Twitter and in our Webmaster Community, both offline and online.

Posted by Henner Zeller, Lizzi Harvey, and Gary

Bye Bye Preferred Domain setting

As we progress with the migration to the new Search Console experience, we will be saying farewell to one of our settings: preferred domain.



It's common for a website to have the same content on multiple URLs. For example, it might have the same content on http://example.com/ as on https://www.example.com/index.html. To make things easier, when our systems recognize that, we'll pick one URL as the "canonical" for Search. You can still tell us your preference in multiple ways if there's something specific you want us to pick (see paragraph below). But if you don't have a preference, we'll choose the best option we find. Note that with the deprecation we will no longer use any existing Search Console preferred domain configuration.

You can find detailed explanations on how to tell us your preference in the Consolidate duplicate URLs help center article. Here are some of the options available to you:
  1. Use a rel="canonical" link tag on HTML pages
  2. Use a rel="canonical" HTTP header (see the sketch after this list)
  3. Use a sitemap
  4. Use 301 redirects for retired URLs
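If you serve pages from your own application code, options 2 and 4 can be set at the HTTP layer. Below is a minimal sketch using Express and the example.com URLs from above; the framework, routes, and port are assumptions for illustration, not a recommendation.

// Minimal, illustrative sketch: signal the canonical URL with a Link HTTP
// header (option 2) and permanently redirect a retired URL (option 4).
// Express, the routes, and the port are assumptions for this example.
import express from "express";

const app = express();
const CANONICAL_URL = "https://www.example.com/index.html";

// Option 2: add a rel="canonical" HTTP header to the duplicate's response.
app.get("/", (req, res) => {
  res.set("Link", `<${CANONICAL_URL}>; rel="canonical"`);
  res.send("<!doctype html><title>Example</title>Same content as the canonical page.");
});

// Option 4: 301-redirect a retired URL to the preferred one.
app.get("/old-page.html", (req, res) => {
  res.redirect(301, CANONICAL_URL);
});

app.listen(3000);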
Send us any feedback either through Twitter or our forum.

Posted by Daniel Waisberg, Search Advocate

Webmaster Conference: an event made for you

Over the years we have attended hundreds of conferences, spoken to thousands of webmasters, and recorded hundreds of hours of videos to help web creators find information about how to perform better in Google Search results. Now we'd like to go further and help those who aren't able to travel internationally access the same information.

Today we're officially announcing the Webmaster Conference, a series of local events around the world. These events are primarily located where it's difficult to access search conferences or information about Google Search, or where there's a specific need for a Search event. For example, if we identify that a region has problems with hacked sites, we may organize an event focusing on that specific topic.

We want web creators to have equal opportunity in Google Search regardless of their language, financial status, gender, location, or any other attribute. The conferences are always free and easily accessible in the region where they're organized, and, based on feedback from the local communities and our analyses, they're tailored for the audience that signed up for the events. That means it doesn't matter how much you already know about Google Search; the event you attend will have takeaways tailored to you. The talks will be in the local language (through interpreters in the case of international speakers), and we'll do our best to also offer sign language interpretation if requested.
Collage from past Webmaster Conference events
Webmaster Conference Okinawa

The structure of the event varies from region to region. For example, in Okinawa, Japan, we had a wonderful half-day event with novice and advanced web creators where we focused on how to perform better in Google Images. At Webmaster Conference India and Indonesia, that might change and we may focus more on how to create faster websites. We will also host web communities in Europe and North America later this year, so keep an eye out for the announcements!
We will continue attending external events as usual; we are doing these events to complement the existing ones. If you want to learn more about our upcoming events, visit the Webmaster Conference site which we'll update monthly, and follow our blogs and @googlewmc on Twitter!

Posted by Takeaki Kanaya and Gary

A video series on SEO myths for web developers

We invited members of the SEO and web developer community to join us for a new video series called "SEO mythbusting".
In this series, we discuss various topics around SEO from a developer's perspective, how we can work to make the "SEO black box" more transparent, and what technical SEO might look like as the web keeps evolving. We've already published a few episodes:
  1. Web developer's 101
  2. A look at Googlebot
  3. Microformats and structured data
  4. JavaScript and SEO
We have a few more episodes for you, and we'll launch them weekly on the Google Webmasters YouTube channel, so don't forget to subscribe to stay in the loop. You can also find all published episodes in this YouTube playlist. We look forward to hearing your feedback, topic suggestions, and guest recommendations in the YouTube comments as well as on our Twitter account!

Mobile-First Indexing by default for new domains

Over the years since announcing mobile-first indexing - Google's crawling of the web using a smartphone Googlebot - our analysis has shown that new websites are generally ready for this method of crawling. Accordingly, we're happy to announce that mobile-first indexing will be enabled by default for all new websites (those previously unknown to Google Search) starting July 1, 2019. It's fantastic to see that new websites are now generally showing users - and search engines - the same content on both mobile and desktop devices!

You can continue to check for mobile-first indexing of your website by using the URL Inspection Tool in Search Console. By looking up a URL from your website there, you'll quickly see how it was last crawled and indexed. For older websites, we'll continue monitoring and evaluating pages for their readiness for mobile-first indexing, and will notify site owners through Search Console once their sites are seen as ready. Since the default state for new websites will be mobile-first indexing, there's no need to send a notification.


Using the URL Inspection Tool to check the mobile-first indexing status

Our guidance on making all websites work well for mobile-first indexing continues to be relevant, for new and existing sites. For existing websites we determine their readiness for mobile-first indexing based on parity of content (including text, images, videos, links), structured data, and other meta-data (for example, titles and descriptions, robots meta tags). We recommend double-checking these factors when a website is launched or significantly redesigned.

While we continue to support responsive web design, dynamic serving, and separate mobile URLs for mobile websites, we recommend responsive web design for new websites. Because of issues and confusion we've seen from separate mobile URLs over the years, both from search engines and users, we recommend using a single URL for both desktop and mobile websites.

Mobile-first indexing has come a long way. We're happy to see how the web has evolved from being focused on desktop, to becoming mobile-friendly, and now to being mostly crawlable and indexable with mobile user-agents! We realize it has taken a lot of work from your side to get there, and on behalf of our mostly-mobile users, we appreciate that. We’ll continue to monitor and evaluate this change carefully. If you have any questions, please drop by our Webmaster forums or our public events.

Search at Google I/O 2019

Google I/O is our yearly developer conference where we have the pleasure of announcing some exciting new Search-related features and capabilities. A good place to start is Google Search: State of the Union, which explains how to take advantage of the latest capabilities in Google Search.

We also gave more details on how JavaScript and Google Search work together and what you can do to make sure your JavaScript site performs well in Search.

Try out new features today

Here are some of the new features, codelabs, and documentation that you can try out today:
The Google I/O sign at Shoreline Amphitheatre in Mountain View, CA

Be among the first to test new features

Your help is invaluable to making sure our products work for everyone. We shared some new features that we're still testing and would love your feedback and participation.
A large crowd at Google I/O

Learn more about what's coming soon

I/O is a place where we get to showcase new Search features, so we're excited to give you a heads up on what's next on the horizon:
Two people posing for a photo at Google I/O, forming a heart with their arms

We hope these cool announcements help and inspire you to create even better websites that work well in Search. Should you have any questions, feel free to post in our webmaster help forums, contact us on Twitter, or reach out to us at any of the events we attend next.

New in structured data: FAQ and How-to


Over the last few weeks, we've been discussing structured data: first providing best practices and then showing how to monitor it with Search Console. Today we are announcing support for FAQ and How-to structured data on Google Search and the Google Assistant, including new reports in Search Console to monitor how your site is performing.

In this post, we provide details to help you implement structured data on your FAQ and how-to pages in order to make your pages eligible to feature on Google Search as rich results and How-to Actions for the Assistant. We also show examples of how to monitor your search appearance with new Search Console enhancement reports.

Disclaimer: Google does not guarantee that your structured data will show up in search results, even if your page is marked up correctly. To determine whether a result gets a rich treatment, Google algorithms use a variety of additional signals to make sure that users see rich results when their content best serves the user’s needs. Learn more about structured data guidelines.

How-to on Search and the Google Assistant

How-to rich results provide richer previews of web results that guide users through step-by-step tasks. For example, if you provide information on how to tile a kitchen backsplash, tie a tie, or build a treehouse, you can add How-to structured data to your pages to enable the page to appear as a rich result on Search and a How-to Action for the Assistant.

Add structured data to the steps, tools, duration, and other properties to enable a How-to rich result for your content on the search page. If your page uses images or video for each step, make sure to mark up your visual content to enhance the preview and expose a more visual representation of your content to users. Learn more about the required and recommended properties you can use on your markup in the How-to developer documentation.
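As a concrete illustration of the markup, here is a minimal sketch that adds HowTo structured data as JSON-LD from page JavaScript; the how-to name, steps, time, and image URLs are made up for the example, and the full list of required and recommended properties is in the How-to developer documentation.

// Minimal sketch (not from the original post): HowTo structured data added
// as JSON-LD via JavaScript. The name, steps, and image URLs are illustrative.
const howToData = {
  "@context": "https://schema.org",
  "@type": "HowTo",
  name: "How to tile a kitchen backsplash",
  totalTime: "PT3H",
  tool: [{ "@type": "HowToTool", name: "Notched trowel" }],
  step: [
    {
      "@type": "HowToStep",
      name: "Prepare the wall",
      text: "Clean the wall and mark a level line for the first row of tiles.",
      image: "https://example.com/images/backsplash-step1.jpg",
    },
    {
      "@type": "HowToStep",
      name: "Set the tiles",
      text: "Spread adhesive with the trowel and press the tiles into place.",
      image: "https://example.com/images/backsplash-step2.jpg",
    },
  ],
};

// Add the markup to the page so crawlers can pick it up when rendering.
const howToScript = document.createElement("script");
howToScript.type = "application/ld+json";
howToScript.textContent = JSON.stringify(howToData);
document.head.appendChild(howToScript);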


Sample search result showing How-to structured data

Your content can also start surfacing on the Assistant through new voice-guided experiences. This feature lets you expand your content to new surfaces, helping users complete tasks wherever they are and interactively progress through the steps using voice commands.

As shown in the Google Home Hub example below, the Assistant provides a conversational, hands-free experience that can help users complete a task. This is an incredibly lightweight way for web developers to expand their content to the Assistant. For more information about How-to for the Assistant, visit Build a How-to Guide Action with Markup.


How-to for the Assistant

To help you monitor How-to markup issues, we launched a report in Search Console that shows all errors, warnings and valid items for pages with HowTo structured data. Learn more about how to use the report to monitor your results.

Search Console enhancement report

FAQ on Search and the Google Assistant

An FAQ page provides a list of frequently asked questions and answers on a particular topic. For example, an FAQ page on an e-commerce website might provide answers on shipping destinations, purchase options, return policies, and refund processes. By using FAQPage structured data, you can make these questions and answers eligible to display directly on Google Search and the Assistant, helping users quickly find answers to frequently asked questions.

FAQ structured data is only for official questions and answers; don't add FAQ structured data on forums or other pages where users can submit answers to questions - in that case, use the Q&A Page markup.

You can learn more about implementation details in the FAQ developer documentation.
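As a minimal sketch of what that markup can look like, the following adds FAQPage structured data as JSON-LD from page JavaScript; the questions and answers are invented for the example, and the developer documentation covers the required properties.

// Minimal sketch (not from the original post): FAQPage structured data added
// as JSON-LD via JavaScript. The questions and answers are illustrative only.
const faqData = {
  "@context": "https://schema.org",
  "@type": "FAQPage",
  mainEntity: [
    {
      "@type": "Question",
      name: "Which countries do you ship to?",
      acceptedAnswer: {
        "@type": "Answer",
        text: "We currently ship to the US, Canada, and the EU.",
      },
    },
    {
      "@type": "Question",
      name: "What is your return policy?",
      acceptedAnswer: {
        "@type": "Answer",
        text: "Unused items can be returned within 30 days for a full refund.",
      },
    },
  ],
};

// Add the markup to the page so crawlers can pick it up when rendering.
const faqScript = document.createElement("script");
faqScript.type = "application/ld+json";
faqScript.textContent = JSON.stringify(faqData);
document.head.appendChild(faqScript);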

FAQ on Search


To provide more ways for users to access your content, FAQ answers can also be surfaced on the Google Assistant. Your users can invoke your FAQ content by asking direct questions and get the answers that you marked up in your FAQ pages. For more information, visit Build an FAQ Action with Markup.
FAQ on Google Assistant


To help you monitor FAQ issues and search appearance, we also launched an FAQ report in Search Console that shows all errors, warnings and valid items related to your marked-up FAQ pages.

We would love to hear your thoughts on how FAQ or How-to structured data works for you. Send us any feedback either through Twitter or our forum.

Posted by Daniel Waisberg, Damian Biollo, Patrick Nevels, and Yaniv Loewenstein

The new evergreen Googlebot

Googlebot is the crawler that visits web pages to include them in the Google Search index. The number one question we got from the community at events and on social media was whether we could make Googlebot evergreen with the latest Chromium. Today, we are happy to announce that Googlebot now runs the latest Chromium rendering engine (74 at the time of this post) when rendering pages for Search. Moving forward, Googlebot will regularly update its rendering engine to ensure support for the latest web platform features.

What that means for you

Compared to the previous version, Googlebot now supports 1000+ new features, including ES6 and newer JavaScript features, IntersectionObserver for lazy-loading, and Web Components v1 APIs.


You should check whether you're transpiling your JavaScript or serving polyfills specifically for Googlebot and, if so, evaluate whether this is still necessary. There are still some limitations, so check our troubleshooter for JavaScript-related issues and the video series on JavaScript SEO.
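For instance, rather than shipping transpiled code or polyfills specifically to Googlebot, you can feature-detect in the page and load a polyfill only when the API is actually missing. This is a minimal sketch; the polyfill path and the lazy-loading use case are assumptions for the example.

// Minimal sketch: serve modern code to every client, including the evergreen
// Googlebot, and load a polyfill only where the feature is actually missing.
// The polyfill path is an assumption for this example.
async function setUpLazyLoading(): Promise<void> {
  if (!("IntersectionObserver" in window)) {
    // Only older browsers reach this branch; Chromium 74 supports the API natively.
    await import("./polyfills/intersection-observer.js");
  }

  const observer = new IntersectionObserver((entries) => {
    for (const entry of entries) {
      if (entry.isIntersecting) {
        const img = entry.target as HTMLImageElement;
        img.src = img.dataset.src ?? img.src; // swap in the real image source
        observer.unobserve(img);
      }
    }
  });

  document.querySelectorAll<HTMLImageElement>("img[data-src]").forEach((img) =>
    observer.observe(img)
  );
}

setUpLazyLoading();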

Any thoughts on this? Talk to us on Twitter, the webmaster forums, or join us for the online office hours.

Google I/O 2019 – What sessions should SEOs and webmasters watch?

Google I/O 2019 is starting tomorrow and will run for 3 days, until Thursday. Google I/O is our yearly developers festival, where product announcements are made, new APIs and frameworks are introduced, and Product Managers present the latest from Google to an audience of 7,000+ developers who fly to California.

However, you don't have to physically attend the event to take advantage of this once-a-year opportunity: many of the sessions and talks are live streamed on YouTube for anyone to watch. Browse the full schedule of events, including a list of talks that we think will be interesting for webmasters to watch (all talks are in English). All the links shared below bring you to pages with more details about each talk, and links to watch the sessions will appear on the day of each event. All times are Pacific time (California time).



This list is only a small part of the agenda that we think is useful to webmasters and SEOs. There are many more sessions that you could find interesting! To learn about those other talks, check out the full list of “web” sessions, design sessions, Cloud sessions, machine learning sessions, and more. Use the filtering function to toggle the sessions on and off.

We hope you can make the time to watch the talks online and participate in the excitement of I/O! The videos will also be available on YouTube after the event, in case you can't tune in live.

Posted by Vincent Courson, Search Outreach Specialist