Tag Archives: Structured Data

Ways to succeed in Google News

With the New Year now underway, we'd like to offer some best practices and advice we hope will lead publishers to more success within Google News in 2019.

General advice

There is a lot of helpful information in the Google News Publisher Help Center. Be sure to read the material there, in particular the content and technical guidelines.

Headlines and dates


  • Present clear headlines: Google News looks at a variety of signals to determine the headline of an article, including your HTML title tag and the most prominent text on the page. Review our headline tips.
  • Provide accurate times and dates: Google News tries to determine the time and date to display for an article in a variety of ways. You can help ensure we get it right by using the following methods:
    • Show one clear date and time: As per our date guidelines, show a clear, visible date and time between the headline and the article text. Prevent other dates from appearing on the page whenever possible, such as for related stories.
    • Use structured data: Use the datePublished and dateModified properties, with the correct time zone designator, for both AMP and non-AMP pages.
  • Avoid artificially freshening stories: If an article has been substantially changed, it can make sense to give it a fresh date and time. However, don't artificially freshen a story without adding significant information or some other compelling reason for the freshening. Also, do not create a very slightly updated story from one previously published, then delete the old story and redirect to the new one. That's against our article URLs guidelines.
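
As a rough illustration of the structured data advice above, the following Python sketch builds a JSON-LD block with datePublished and dateModified values that carry an explicit time zone designator. The headline, the dates, and the UTC-5 offset are made-up example values, and NewsArticle is assumed as the schema.org type; check the date guidelines for the properties that apply to your pages.

```python
import json
from datetime import datetime, timedelta, timezone

# Hypothetical newsroom offset (UTC-5); use your publication's real time zone.
newsroom_tz = timezone(timedelta(hours=-5))
published = datetime(2019, 1, 17, 8, 30, tzinfo=newsroom_tz)
modified = datetime(2019, 1, 17, 14, 5, tzinfo=newsroom_tz)

article = {
    "@context": "https://schema.org",
    "@type": "NewsArticle",
    "headline": "Example headline for a hypothetical story",
    # isoformat() on a timezone-aware datetime appends the UTC offset.
    "datePublished": published.isoformat(),
    "dateModified": modified.isoformat(),
}

# Wrap the JSON in the script tag that would be placed in the page.
json_ld = json.dumps(article, indent=2)
print(f'<script type="application/ld+json">\n{json_ld}\n</script>')
```

Using timezone-aware datetime objects means the offset cannot be forgotten when the value is serialized.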

Duplicate content

Google News seeks to reward independent, original journalistic content by giving credit to the originating publisher, as both users and publishers would prefer. This means we try not to allow duplicate content—which includes scraped, rewritten, or republished material—to perform better than the original content. In line with this, these are guidelines publishers should follow:

  • Block scraped content: Scraping commonly refers to taking material from another site, often on an automated basis. Sites that scrape content must block scraped content from Google News.
  • Block rewritten content: Rewriting refers to taking material from another site, then rewriting that material so that it is not identical. Sites that rewrite content in a way that provides no substantial or clear added value must block that rewritten content from Google News. This includes, but is not limited to, rewrites that make only very slight changes or those that make many word replacements but still keep the original article's overall meaning.
  • Block or consider canonical for republished content: Republishing refers to when a publisher has permission from another publisher or author to republish an original work, such as material from wire services or in partnership with other publications.
    Publishers that allow others to republish content can help ensure that their original versions perform better in Google News by asking those republishing to block or make use of canonical.
    Google News also encourages those that republish material to consider proactively blocking such content or making use of the canonical, so that we can better identify the original content and credit it appropriately.
  • Avoid duplicate content: If you operate a network of news sites that share content, the advice above about republishing is applicable to your network. Select what you consider to be the original article and consider blocking duplicates or making use of the canonical to point to the original.
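
For publishers that syndicate or share content across a network, a small script can help audit whether republished copies point back to the original. The sketch below uses only Python's standard library to read the rel="canonical" link from a page's HTML; the URLs and the sample markup are hypothetical.

```python
from html.parser import HTMLParser

class CanonicalFinder(HTMLParser):
    """Records the href of the first <link rel="canonical"> in a document."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        rel = (attr.get("rel") or "").lower().split()
        if tag == "link" and "canonical" in rel and self.canonical is None:
            self.canonical = attr.get("href")

def find_canonical(html):
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical

# Hypothetical republished copy that credits the original article.
republished = (
    '<html><head><link rel="canonical" '
    'href="https://original.example.com/story"></head><body>...</body></html>'
)
print(find_canonical(republished))
```

A page with no canonical link returns None, which is a prompt to either add one or block the copy.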

Transparency


  • Be transparent: Visitors to your site want to know who publishes it and who has written its articles, so that they can trust it. That's why our content guidelines stress that sites should have posts with clear bylines, information about authors, and contact information for the publication.
  • Don't be deceptive: Our content policies do not allow sites or accounts that impersonate any person or organization, or that misrepresent or conceal their ownership or primary purpose. We do not allow sites or accounts that engage in coordinated activity to mislead users. This includes, but isn't limited to, sites or accounts that misrepresent or conceal their country of origin or that direct content at users in another country under false premises.

More tips


  • Avoid taking part in link schemes: Don't participate in link schemes, which can include large-scale article marketing programs or selling links that pass PageRank. Review our page on link schemes for more information.
  • Use structured data for rich presentation: Whether you use AMP or non-AMP pages, you can use structured data to optimize your content for rich results or carousel-like presentations.
  • Protect your users and their data: Consider securing every page of your website with HTTPS to protect the integrity and confidentiality of the data users exchange on your site. You can find more useful tips in our best practices on how to implement HTTPS.

Here's to a great 2019!

We hope these tips help publishers succeed in Google News over the coming year. For those who have more questions about Google News, we are unable to provide one-to-one support. However, we do monitor our newly revamped Google News Publisher Forum and try to provide guidance on questions that might help a number of publishers all at once. The forum is also a great resource where publishers share tips and advice with each other.

Mobile-First indexing, structured data, images, and your site

It's been two years since we started working on "mobile-first indexing" - crawling the web with smartphone Googlebot, similar to how most users access it. We've seen websites across the world embrace the mobile web, making fantastic websites that work on all kinds of devices. There's still a lot to do, but today, we're happy to announce that we now use mobile-first indexing for over half of the pages shown in search results globally.

Checking for mobile-first indexing

In general, we move sites to mobile-first indexing when our tests assure us that they're ready. When we move sites over, we notify the site owner through a message in Search Console. It's possible to confirm this by checking the server logs, where a majority of the requests should be from Googlebot Smartphone. Even easier, the URL inspection tool allows a site owner to check how a URL from the site (it's usually enough to check the homepage) was last crawled and indexed.

If your site uses responsive design techniques, you should be all set! For sites that aren't using responsive web design, we've seen two kinds of issues come up more frequently in our evaluations:

Missing structured data on mobile pages

Structured data is very helpful to better understand the content on your pages, and allows us to highlight your pages in fancy ways in the search results. If you use structured data on the desktop versions of your pages, you should have the same structured data on the mobile versions of the pages. This is important because with mobile-first indexing, we'll only use the mobile version of your page for indexing, and will otherwise miss the structured data.

Testing your pages in this regard can be tricky. We suggest testing for structured data in general, and then comparing that to the mobile version of the page. For the mobile version, check the source code when you simulate a mobile device, or use the HTML generated with the mobile-friendly testing tool. Note that a page does not need to be mobile-friendly in order to be considered for mobile-first indexing.
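
One way to make the comparison above less manual is to extract the structured data types from both versions of a page and diff them. The following is a minimal sketch that assumes the markup is JSON-LD (Microdata and RDFa are not handled) and that you have already saved the desktop and mobile HTML; the two sample strings are hypothetical.

```python
import json
from html.parser import HTMLParser

class JsonLdCollector(HTMLParser):
    """Collects the parsed contents of application/ld+json script blocks."""

    def __init__(self):
        super().__init__()
        self._in_json_ld = False
        self._buffer = []
        self.items = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_json_ld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_json_ld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_json_ld:
            self.items.append(json.loads("".join(self._buffer)))
            self._in_json_ld = False

def json_ld_types(html):
    """Return the set of top-level @type values found in the page."""
    collector = JsonLdCollector()
    collector.feed(html)
    types = set()
    for item in collector.items:
        declared = item.get("@type") if isinstance(item, dict) else None
        if declared is not None:
            types.update(declared if isinstance(declared, list) else [declared])
    return types

desktop_html = '<script type="application/ld+json">{"@type": "NewsArticle"}</script>'
mobile_html = "<p>Same article, but the markup was left out.</p>"
missing_on_mobile = json_ld_types(desktop_html) - json_ld_types(mobile_html)
print(missing_on_mobile)
```

Any type that appears in the difference is present on the desktop version but absent from the mobile version.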

Missing alt-text for images on mobile pages

Alt attributes on images ("alt-text") are a great way to describe images to users with screen readers (which are used on mobile too!) and to search engine crawlers. Without alt-text for images, it's a lot harder for Google Images to understand the context of images that you use on your pages.

Check "img" tags in the source code of the mobile version for representative pages of your website. As above, the source of the mobile version can be seen by either using the browser to simulate a mobile device, or by using the Mobile-Friendly test to check the Googlebot rendered version. Search the source code for "img" tags, and double-check that your page is providing appropriate alt-attributes for any that you want to have findable in Google Images.

For example, that might look like this:

With alt-text (good!):
<img src="cute-puppies.png" alt="A photo of cute puppies on a blanket">

Without alt-text:
<img src="sad-puppies.png">
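
Searching the source by hand gets tedious on larger pages. As a sketch of how the check could be automated, the Python snippet below lists the src of every img tag that has no alt text, using only the standard library. It treats an empty alt attribute the same as a missing one, which is a simplification: an empty alt can be a deliberate choice for purely decorative images.

```python
from html.parser import HTMLParser

class MissingAltFinder(HTMLParser):
    """Collects the src of img tags whose alt attribute is absent or blank."""

    def __init__(self):
        super().__init__()
        self.missing = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attr = dict(attrs)
        if not (attr.get("alt") or "").strip():
            self.missing.append(attr.get("src"))

def images_missing_alt(html):
    finder = MissingAltFinder()
    finder.feed(html)
    return finder.missing

mobile_source = (
    '<img src="cute-puppies.png" alt="A photo of cute puppies on a blanket">'
    '<img src="sad-puppies.png">'
)
print(images_missing_alt(mobile_source))
```

Run it against the HTML of the mobile version, since that is the version used for indexing.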

It's fantastic to see so many great websites that work well on mobile! We're looking forward to being able to index more and more of the web using mobile-first indexing, helping more users to search the web in the same way that they access it: with a smartphone. We’ll continue to monitor and evaluate this change carefully. If you have any questions, please drop by our Webmaster forums or our public events.


Introducing the Indexing API and structured data for livestreams

Over the past few years, it's become easier than ever to stream live videos online, from celebrity updates to special events. But it's not always easy for people to determine which videos are live and know when to tune in.
Today, we're introducing new tools to help more people discover your livestreams in Search and Assistant. With livestream structured data and the Indexing API, you can let Google know when your video is live, so it will be eligible to appear with a red "live" badge.

Add livestream structured data to your page

If your website streams live videos, use the livestream developer documentation to flag your video as a live broadcast and mark the start and end times. In addition, VideoObject structured data is required to tell Google that there's a video on your page.
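
To make this more concrete, here is a hedged sketch of what such markup might look like, generated with Python. It assumes the schema.org pattern of a VideoObject whose publication property holds a BroadcastEvent with isLiveBroadcast, startDate, and endDate; the video name, URLs, and times are invented, and the livestream developer documentation remains the authority on which properties are required.

```python
import json

# All values below are hypothetical examples.
livestream = {
    "@context": "https://schema.org",
    "@type": "VideoObject",
    "name": "Town hall livestream",
    "description": "Live coverage of a hypothetical town hall meeting.",
    "thumbnailUrl": "https://example.com/thumbnails/town-hall.jpg",
    "uploadDate": "2018-12-05T18:00:00+00:00",
    "contentUrl": "https://example.com/live/town-hall",
    "publication": {
        "@type": "BroadcastEvent",
        "isLiveBroadcast": True,
        # Start and end times of the broadcast, with explicit UTC offsets.
        "startDate": "2018-12-05T18:00:00+00:00",
        "endDate": "2018-12-05T19:30:00+00:00",
    },
}

print(json.dumps(livestream, indent=2))
```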

Update Google quickly with the Indexing API

The Indexing API now supports pages with livestream structured data. We encourage you to call the Indexing API to request that your site is crawled in time for the livestream. We recommend calling the Indexing API when your livestream begins and ends, and if the structured data changes.
For more information, visit our developer documentation. If you have any questions, ask us in the Webmaster Help Forum. We look forward to seeing your live videos on Google!

Rich Results expands for Question & Answer pages

People come to Google seeking information about all kinds of questions.
Frequently, the information they're looking for is on sites where users ask and answer each other's questions. Popular social news sites, expert forums, and help and support message boards are all examples of this pattern.

A screenshot of an example search result for a page titled “Why do touchscreens sometimes register a touch when ...” with a preview of the top answers from the page.
In order to help users better identify which search results may give the best information about their question, we have developed a new rich result type for question and answer sites. Search results for eligible Q&A pages display a preview of the top answers. This new presentation helps site owners reach the right users for their content and helps users get the relevant information about their questions faster.

To be eligible for this feature, add Q&A structured data to your pages with Q&A content. Be sure to use the Structured Data Testing Tool to see if your page is eligible and to preview the appearance in search results. You can also check out Search Console to see aggregate stats and markup error examples. The Performance report also tells you which queries show your Q&A Rich Result in Search results, and how these change over time.
If you have any questions, ask us in the Webmaster Help Forum or reach out on Twitter!
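
As an illustration of what Q&A structured data can look like, the sketch below assembles a schema.org QAPage from a question and its answers. The property names (mainEntity, answerCount, acceptedAnswer, suggestedAnswer, upvoteCount) come from schema.org; the helper function, the question, and the answers are hypothetical, and the developer documentation should be consulted for the full list of required properties.

```python
import json

def qa_page(question, answers):
    """Build QAPage markup. answers is a list of (text, upvotes, accepted) tuples."""
    def answer(text, upvotes):
        return {"@type": "Answer", "text": text, "upvoteCount": upvotes}

    main = {"@type": "Question", "name": question, "answerCount": len(answers)}
    accepted = [answer(t, u) for t, u, ok in answers if ok]
    suggested = [answer(t, u) for t, u, ok in answers if not ok]
    if accepted:
        main["acceptedAnswer"] = accepted[0]
    if suggested:
        main["suggestedAnswer"] = suggested
    return {"@context": "https://schema.org", "@type": "QAPage", "mainEntity": main}

markup = qa_page(
    "Why does my screen dim when the battery is low?",
    [
        ("Battery saver mode lowers brightness automatically.", 42, True),
        ("Check the adaptive brightness setting as well.", 7, False),
    ],
)
print(json.dumps(markup, indent=2))
```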

Building Google Dataset Search and Fostering an Open Data Ecosystem



Earlier this month we launched Google Dataset Search, a tool designed to make it easier for researchers to discover datasets that can help with their work. What we colloquially call "Google Scholar for data," Google Dataset Search is a search engine across metadata for millions of datasets in thousands of repositories across the Web. In this post, we go into some detail of how Dataset Search is built, outlining what we believe will help develop an open data ecosystem, and we also address a question that we have received frequently since the Dataset Search launch: "Why is my dataset not showing up in Google Dataset Search?"

An Overview
At a very high level, Google Dataset Search relies on dataset providers, big and small, adding structured metadata on their sites using the open schema.org/Dataset standard. The metadata specifies the salient properties of each dataset: its name and description, spatial and temporal coverage, provenance information, and so on. Dataset Search uses this metadata, links it with other resources that are available at Google (more on this below!), and builds an index of this enriched corpus of metadata. Once we have built the index, we can start answering user queries and figuring out which results best correspond to each query.
An overview of the technology behind Google Dataset Search
Using Structured Metadata from Data Providers
When Google's search engine processes a Web page with schema.org/Dataset mark-up, it understands that there is dataset metadata there and processes that structured metadata to create "records" describing each annotated dataset on a page. The use of schema.org allows developers to embed this structured information into HTML, without affecting the appearance of the page while making the semantics of the information visible to all search engines.
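
For readers who have not seen such markup, the following sketch shows roughly what a provider might embed. It builds a schema.org/Dataset description in Python and prints it as JSON-LD; the dataset, organization, and URLs are invented, and the selection of properties is only an example of the kind of metadata described above.

```python
import json

# A hypothetical dataset description using schema.org/Dataset properties.
dataset = {
    "@context": "https://schema.org",
    "@type": "Dataset",
    "name": "Daily rainfall measurements, example region",
    "description": "Rain gauge readings collected daily at 12 hypothetical stations.",
    "url": "https://data.example.org/rainfall",
    "temporalCoverage": "2010-01-01/2017-12-31",
    "spatialCoverage": {"@type": "Place", "name": "Example Region"},
    "creator": {"@type": "Organization", "name": "Example Weather Institute"},
    "distribution": [
        {
            "@type": "DataDownload",
            "encodingFormat": "text/csv",
            "contentUrl": "https://data.example.org/rainfall.csv",
        }
    ],
}

print(json.dumps(dataset, indent=2))
```

Because the JSON-LD lives in a script element, it does not change how the page looks to visitors.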

However, no matter how precise schema.org definitions or guidelines are, some metadata will inevitably be incomplete, wrong, or entirely missing. Furthermore, distinctions between some fields can be vague: is the dataset repository a publisher or a provider of a dataset? How do we distinguish between citations to a scientific paper that describes the creation of the dataset vs. papers describing its use? Indeed, many of these questions often generate active scholarly discussions.

Despite these variations, Dataset Search must provide a uniform and predictable user experience on the front end. Therefore, in some cases we substitute a more general field name (e.g., “provided by”) to display the values coming from multiple other fields (e.g., “publisher”, “creator”, etc.). In other cases, we are not able to use some of the fields at all: if a specific field is being misinterpreted in many different ways by dataset providers, we bypass that field for now and work with the community to clarify the guidelines. In each decision, one specific question helped us in difficult cases: "What will help data discovery the most?" This focus on the task that we were addressing made some of the problems easier than they seemed at first.

Connecting Replicas of Datasets
It is very common for a dataset, in particular a popular one, to be present in more than one repository. We use a variety of signals to determine when two datasets are replicas of each other. For example, schema.org has a way to specify the connection explicitly, through schema.org/sameAs, which is the best way to link different replicas together and to point to the canonical source of a dataset. Other signals include two dataset descriptions pointing to the same canonical page, having the same Digital Object Identifier (DOI), sharing links for downloading the dataset, or having a large overlap in other metadata fields. None of these signals is perfect in isolation, so we combine them to get the strongest possible indication of when two datasets are the same.
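
The idea of combining several imperfect signals can be illustrated with a toy example. The sketch below is not how Dataset Search is implemented; the record fields, the function, and the "two or more signals" threshold are all assumptions chosen to show the general approach.

```python
def replica_signals(a, b):
    """Return the names of the signals suggesting a and b describe the same dataset."""
    signals = []
    same_as_a, same_as_b = a.get("sameAs"), b.get("sameAs")
    # Explicit link: one record points at the other, or both point at the same page.
    if same_as_a and (same_as_a == b.get("url") or same_as_a == same_as_b):
        signals.append("sameAs")
    elif same_as_b and same_as_b == a.get("url"):
        signals.append("sameAs")
    if a.get("doi") and a.get("doi") == b.get("doi"):
        signals.append("doi")
    if set(a.get("downloads", [])) & set(b.get("downloads", [])):
        signals.append("shared download link")
    return signals

# Two hypothetical metadata records for the same rainfall dataset.
original = {
    "url": "https://data.example.org/rainfall",
    "doi": "10.1234/example",
    "downloads": ["https://data.example.org/rainfall.csv"],
}
mirror = {
    "url": "https://mirror.example.net/rainfall",
    "sameAs": "https://data.example.org/rainfall",
    "doi": "10.1234/example",
    "downloads": ["https://mirror.example.net/rainfall.csv"],
}

matched = replica_signals(original, mirror)
likely_replicas = len(matched) >= 2  # arbitrary illustrative threshold
print(matched, likely_replicas)
```

Requiring more than one signal is one simple way to avoid trusting any single, possibly wrong, metadata field.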

Reconciling to the Google Knowledge Graph
Google's Knowledge Graph is a powerful platform that describes and links information about many entities, including the ones that appear in dataset metadata: organizations providing datasets, locations for spatial coverage of the data, funding agencies, and so on. Therefore, we try to reconcile information mentioned in the metadata fields with the items in the Knowledge Graph. We can do this reconciliation with good precision for two main reasons. First, we know the types of items in the Knowledge Graph and the types of entities that we expect in the metadata fields. Therefore, we can limit the types of entities from the Knowledge Graph that we match with values for a particular metadata field. For example, a provider of a dataset should match with an organization entity in the Knowledge Graph and not with, say, a location. Second, the context of the Web page itself helps reduce the number of choices, which is particularly useful for distinguishing between organizations that share the same acronym. For example, the acronym CAMRA can stand for “Chilbolton Advanced Meteorological Radar” or “Campaign for Real Ale”. If we use terms from the Web page, we can then more easily determine that CAMRA is in fact the Chilbolton Radar when we see terms such as “clouds”, “vapor”, and “water” on the page.
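
The two ideas in this paragraph, restricting candidates by entity type and then using page context to choose among them, can be sketched in a few lines. This is a toy illustration only: the candidate list, the term sets, and the "Place" entry are invented for the example and do not reflect the contents of the Knowledge Graph or the actual matching logic.

```python
# Hypothetical candidate entities for the string "CAMRA".
CANDIDATES = [
    {"name": "Chilbolton Advanced Meteorological Radar", "type": "Organization",
     "terms": {"radar", "clouds", "vapor", "water", "meteorological"}},
    {"name": "Campaign for Real Ale", "type": "Organization",
     "terms": {"ale", "beer", "pub", "brewery"}},
    {"name": "Camra (hypothetical place)", "type": "Place", "terms": {"water"}},
]

def reconcile(expected_type, page_terms, candidates=CANDIDATES):
    """Pick the candidate of the expected type that shares the most terms with the page."""
    typed = [c for c in candidates if c["type"] == expected_type]
    best = max(typed, key=lambda c: len(c["terms"] & page_terms), default=None)
    if best is None or not (best["terms"] & page_terms):
        return None  # no candidate of that type is supported by the page context
    return best["name"]

page_terms = {"clouds", "vapor", "water", "measurements"}
print(reconcile("Organization", page_terms))
```

The type filter removes the place before context is considered, so a provider field is never matched to a location.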

This type of reconciliation opens up lots of possibilities to improve the search experience for users. For instance, Dataset Search can localize results by showing reconciled values of metadata in the same language as the rest of the page. Additionally, it can rely on synonyms, correct misspellings, expand acronyms, or use other relations in the Knowledge Graph for query expansion.

Linking to other Google Resources
Google has many other data resources that are useful in augmenting the dataset metadata, such as Google Scholar. Knowing which datasets are referenced and cited in publications serves at least two purposes:
  1. It provides a valuable signal about the importance and prominence of a dataset.
  2. It gives dataset authors an easy place to see citations to their data and to get credit.
Indeed, we hope that highlighting publications that use the data will lead to a healthier ecosystem of data citation. For the moment, our links to Google Scholar are very approximate, as we lack a good model of how people cite data. We try to go beyond DOIs to give somewhat better coverage, but the number of articles citing a dataset ends up being approximate. We hope to make more progress in this area in order to get a higher level of precision.

Search and Ranking of Results
When a user issues a query, we search through the corpus of datasets, much as Google Search works over Web pages. Just like with any search, we need to determine whether a document is relevant for the query and then rank the relevant documents. Because there are no large-scale studies on how users search for datasets, as a first approximation, we rely on Google Web ranking. However, ranking datasets is different from ranking Web pages, and we add some additional signals that take into account the metadata quality, citations, and so on. As Dataset Search gets used more by our users and we understand better how users search for datasets, we hope that ranking will improve significantly.

A Better Open Data Ecosystem
We built Dataset Search in an attempt to create a tool that will positively impact the discoverability of data. The decision to rely on open standards (schema.org, W3C DCAT, JSON-LD, etc.) for markup is intentional, as Dataset Search can only be as good as the open-data ecosystem that it supports. As such, Google Dataset Search aims to support a strong open data ecosystem by encouraging:
  1. Widespread adoption of open metadata formats to describe published data.
  2. Further development of open metadata formats to describe more types of data and in more detail.
  3. The culture of citing data the way we cite research publications, giving those who create and publish the data the credit that they deserve.
  4. The development of tools that leverage this metadata to enable more discovery or better use of data. 
The increased adoption of open metadata standards in conjunction with the continued development of Dataset Search (and, hopefully, other tools) should foster a healthier open data ecosystem where data is a first-class citizen of research.

So, Where is Your Dataset?
It is probably clear by now that Dataset Search is only as good as the metadata that exists on the Web pages for datasets. The most common answer to the question of why a specific dataset does not show up in our results is that the Web page for that dataset does not have any markup. Just pop that page into the Structured Data Testing Tool and you will see whether the markup is there. If you don't see any markup there and you own the page, you can add it. If you don't own the page, you can ask the page owners to add it, which will make their page more easily discoverable by everyone.

We hope that the community finds Dataset Search useful, that users make serendipitous discoveries and save time, and that scientists and journalists spend less time searching for data and more time using it.

Acknowledgements
We would like to thank Xiaomeng Ban, Dan Brickley, Lee Butler, Thomas Chen, Corinna Cortes, Kevin Espinoza, Archana Jain, Mike Jones, Kishore Papineni, Chris Sater, Gokhan Turhan, Shubin Zhao and Andi Vajda for their work on the project and all our partners, collaborators, and early adopters for their help.

Source: Google AI Blog


Hey Google, what’s the latest news?

Since launching the Google Assistant in 2016, we have seen users ask questions about everything from weather to recipes and news. In order to fulfill news queries with results people can count on, we collaborated on a new schema.org structured data specification called speakable for eligible publishers to mark up sections of a news article that are most relevant to be read aloud by the Google Assistant.

When people ask the Google Assistant "Hey Google, what's the latest news on NASA?", the Google Assistant responds with an excerpt from a news article and the name of the news organization. Then the Google Assistant asks if the user would like to hear another news article and also sends the relevant links to the user's mobile device.

As a news publisher, you can surface your content on the Google Assistant by implementing Speakable markup according to the developer documentation. This feature is now available for English language users in the US and we hope to launch in other languages and countries as soon as a sufficient number of publishers have implemented speakable. As this is a new feature, we are experimenting over time to refine the publisher and user experience.
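
As a rough sketch of the idea, speakable markup points at the parts of a page that are suitable for being read aloud. The example below assumes the schema.org SpeakableSpecification type with its cssSelector property; the page name, URL, and CSS class names are hypothetical, and the developer documentation describes the exact requirements.

```python
import json

# Hypothetical article page; the selectors name the elements to be read aloud.
speakable_page = {
    "@context": "https://schema.org",
    "@type": "WebPage",
    "name": "Example article about a rocket launch",
    "url": "https://news.example.com/rocket-launch",
    "speakable": {
        "@type": "SpeakableSpecification",
        "cssSelector": [".article-headline", ".article-summary"],
    },
}

print(json.dumps(speakable_page, indent=2))
```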

If you have any questions, ask us in the Webmaster Help Forum. We look forward to hearing from you!

Introducing the Indexing API for job posting URLs

Last June we launched a job search experience that has since connected tens of millions of job seekers around the world with relevant job opportunities from third party providers across the web. Timely indexing of new job content is critical because many jobs are filled relatively quickly. Removal of expired postings is important because nothing's worse than finding a great job only to discover it's no longer accepting applications.

Today we're releasing the Indexing API to address this problem. This API allows any site owner to directly notify Google when job posting pages are added or removed. This allows Google to schedule job postings for a fresh crawl, which can lead to higher quality user traffic and job applicant satisfaction. Currently, the Indexing API can only be used for job posting pages that include job posting structured data.

For websites with many short-lived pages like job postings, the Indexing API keeps job postings fresh in Search results because it allows updates to be pushed individually. This API can be integrated into your job posting flow, allowing high quality job postings to be searchable quickly after publication. In addition, you can check the last time Google received each kind of notification for a given URL.
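
To give a feel for what a notification looks like, here is a minimal sketch that builds (but does not send) a publish request with Python's standard library. The endpoint and the URL_UPDATED and URL_DELETED type values are given as recalled from the Indexing API documentation and should be verified against the Quickstart guide; the access token is a placeholder, since real calls need OAuth credentials for a verified site owner, and the job URL is hypothetical.

```python
import json
import urllib.request

# Endpoint as recalled from the Indexing API docs; verify before relying on it.
PUBLISH_ENDPOINT = "https://indexing.googleapis.com/v3/urlNotifications:publish"

def build_notification(url, removed=False, access_token="PLACEHOLDER_TOKEN"):
    """Build a request telling Google that a job posting URL was updated or removed."""
    body = {"url": url, "type": "URL_DELETED" if removed else "URL_UPDATED"}
    return urllib.request.Request(
        PUBLISH_ENDPOINT,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {access_token}",
        },
        method="POST",
    )

# A hypothetical posting that has just expired.
request = build_notification("https://jobs.example.com/postings/42", removed=True)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Sending the request would be a matter of passing it to urllib.request.urlopen once a valid token is supplied.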

Follow the Quickstart guide to see how the Indexing API works. If you have any questions, ask us in the Webmaster Help Forum. We look forward to hearing from you!

Google Search at I/O 2018

With the eleventh annual Google I/O wrapped up, it’s a great time to reflect on some of the highlights.

What we did at I/O


The event was a wonderful way to meet many great people from various communities across the globe, exchange ideas, and gather feedback. Besides many great web sessions, codelabs, and office hours, we shared a few things with the community in two sessions specific to Search.




The sessions included the launch of JavaScript error reporting in the Mobile-Friendly Test tool, dynamic rendering (we will discuss this in more detail in a future post), and an explanation of how content management systems (CMSs) can use the Indexing and Search Console APIs to provide users with insights. For example, Wix lets their users submit their homepage to the index and see it in Search results instantly, and Squarespace created a Google Search keywords report to help webmasters understand what prospective users search for.

During the event, we also presented the new Search Console in the Sandbox area for people to try and were happy to get a lot of positive feedback, from people being excited about the AMP Status report to others exploring how to improve their content for Search.

Hands-on codelabs, case studies and more


We presented the Structured Data Codelab that walks you through adding and testing structured data. We were really happy to see that it ended up being one of the top 20 codelabs by completions at I/O. If you want to learn more about the benefits of using Structured Data, check out our case studies.



During the in-person office hours we saw a lot of interest around HTTPS, mobile-first indexing, AMP, and many other topics. The in-person Office Hours were a wonderful addition to our monthly Webmaster Office Hours hangout. The questions and comments will help us adjust our documentation and tools by making them clearer and easier to use for everyone.

Highlights and key takeaways


We also repeated a few key points that web developers should have an eye on when building websites, such as:


  • Indexing and rendering don’t happen at the same time. We may defer the rendering to a later point in time.
  • Make sure the content you want in Search has metadata, correct HTTP statuses, and the intended canonical tag.
  • Hash-based routing (URLs with "#") should be deprecated in favour of the JavaScript History API in Single Page Apps.
  • Links should have an href attribute pointing to a URL, so Googlebot can follow the links properly.
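
The last two points lend themselves to a quick automated check. The sketch below, using only Python's standard library, sorts the anchors in a page into links with a real URL, links whose href starts with "#", and anchors with no href at all. The sample markup is hypothetical, and the "#" bucket will also catch ordinary in-page anchors, so its results need a manual look.

```python
from html.parser import HTMLParser

class LinkAuditor(HTMLParser):
    """Sorts <a> tags into crawlable links, hash-based links, and anchors without href."""

    def __init__(self):
        super().__init__()
        self.crawlable = []
        self.hash_links = []
        self.no_href = 0

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            self.no_href += 1  # e.g. navigation handled only by an onclick handler
        elif href.startswith("#"):
            self.hash_links.append(href)
        else:
            self.crawlable.append(href)

sample = (
    '<a href="/products/shoes">Shoes</a>'
    '<a href="#/products/hats">Hats</a>'
    '<a onclick="go()">Bags</a>'
)
auditor = LinkAuditor()
auditor.feed(sample)
print(auditor.crawlable, auditor.hash_links, auditor.no_href)
```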

Make sure to watch this talk for more on indexing, dynamic rendering, and troubleshooting your site. If you want to learn more about what to do as a CMS developer or theme author, or about structured data, watch this talk.

We were excited to meet some of you at I/O as well as the global I/O extended events and share the latest developments in Search. To stay in touch, join the Webmaster Forum or follow us on Twitter, Google+, and YouTube.

 

We updated our job posting guidelines

Last year, we launched job search on Google to connect more people with jobs. When you provide Job Posting structured data, it helps drive more relevant traffic to your page by connecting job seekers with your content. To ensure that job seekers are getting the best possible experience, it's important to follow our Job Posting guidelines.

We've recently made some changes to our Job Posting guidelines to help improve the job seeker experience.

  • Remove expired jobs
  • Place structured data on the job's detail page
  • Make sure all job details are present in the job description

Remove expired jobs

When job seekers put in effort to find a job and apply, it can be very discouraging to discover that the job that they wanted is no longer available. Sometimes, job seekers only discover that the job posting is expired after deciding to apply for the job. Removing expired jobs from your site may drive more traffic because job seekers are more confident when jobs that they visit on your site are still open for application. For more information on how to remove a job posting, see Remove a job posting.


Place structured data on the job's detail page

Job seekers find it confusing when they land on a list of jobs instead of the specific job's detail page. To fix this, put structured data on the most detailed leaf page possible. Don't add structured data to pages intended to present a list of jobs (for example, search result pages); only add it to the most specific page describing a single job with its relevant details.

Make sure all job details are present in the job description

We've also noticed that some sites include information in the JobPosting structured data that is not present anywhere in the job posting. Job seekers are confused when the job details they see in Google Search don't match the job description page. Make sure that the information in the JobPosting structured data always matches what's on the job posting page. Here are some examples:

  • If you add salary information to the structured data, then also add it to the job posting. Both salary figures should match.
  • The location in the structured data should match the location in the job posting.
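
A consistency check along these lines can be scripted. The following sketch compares a few values from JobPosting structured data against the visible text of the job page. The property names follow schema.org (jobLocation, baseSalary, and so on), while the posting, the page text, and the matching rule (a simple case-insensitive substring search with commas removed) are assumptions made for illustration.

```python
# Hypothetical JobPosting structured data for one job detail page.
job_posting = {
    "@context": "https://schema.org",
    "@type": "JobPosting",
    "title": "Software Engineer",
    "jobLocation": {
        "@type": "Place",
        "address": {"@type": "PostalAddress", "addressLocality": "Kirkland"},
    },
    "baseSalary": {
        "@type": "MonetaryAmount",
        "currency": "USD",
        "value": {"@type": "QuantitativeValue", "value": 95000, "unitText": "YEAR"},
    },
}

def mismatched_fields(posting, page_text):
    """Return the fields whose structured data value is not found in the page text."""
    text = page_text.replace(",", "").lower()  # so "$95,000" matches 95000
    checks = {
        "title": posting["title"],
        "location": posting["jobLocation"]["address"]["addressLocality"],
        "salary": str(posting["baseSalary"]["value"]["value"]),
    }
    return [field for field, value in checks.items() if value.lower() not in text]

consistent_page = "Software Engineer, Kirkland. Salary: $95,000 per year."
page_without_salary = "Software Engineer, Kirkland. Competitive pay."
print(mismatched_fields(job_posting, consistent_page))
print(mismatched_fields(job_posting, page_without_salary))
```

An empty list means every checked value also appears on the page; anything else names the fields to fix.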

Providing structured data content that is consistent with the content of the job posting pages not only helps job seekers find the exact job that they were looking for, but may also drive more relevant traffic to your job postings and therefore increase the chances of finding the right candidates for your jobs.

If your site violates the Job Posting guidelines (including the guidelines in this blog post), we may take manual action against your site and it may not be eligible for display in the jobs experience on Google Search. You can submit a reconsideration request to let us know that you have fixed the problem(s) identified in the manual action notification. If your request is approved, the manual action will be removed from your site or page.

For more information, visit our Job Posting developer documentation and our JobPosting FAQ.

Introducing Rich Results & the Rich Results Testing Tool

Over the years, the number of ways you can choose to highlight your website's content in Search has grown dramatically. In the past, we've called these rich snippets, rich cards, or enriched results. Going forward, to simplify the terminology, our documentation will use the name "rich results" for all of them. Additionally, we're introducing a new rich results testing tool to make diagnosing your pages' structured data easier.

The new testing tool focuses on the structured data types that are eligible to be shown as rich results. It allows you to test all data sources on your pages, such as JSON-LD (which we recommend), Microdata, or RDFa. The new tool provides a more accurate reflection of the page’s appearance on Search and includes improved handling for structured data found on dynamically loaded content. Tests for Recipes, Jobs, Movies, and Courses are currently supported, but this is just a first step; we plan on expanding over time.

Testing a page is easy: just open the testing tool, enter a URL, and review the output. If there are issues, the tool will highlight the invalid code in the page source. If you're working with others on this page, the share icon at the bottom right lets you share the results quickly. You can also use the preview button to view all the different rich results the page is eligible for. Once you're happy with the result, use Submit To Google to fetch and index this page for Search.

Want to get started with rich results? Check out our guides for marking up your content. Feel free to drop by our Webmaster Help forums should you have any questions or get stuck; the awesome experts there can often help resolve issues and give you tips in no time!