Tag Archives: Structured Data

Improving Developer Experience for Writing Structured Data

Though we’re still waiting on the full materialization of the promise of the Semantic Web, search engines—including Google—are heavy consumers of structured data on the web through Schema.org. In 2015, pages with Schema.org markup accounted for 31.3% of the web. Among SEO communities, interest in Schema.org and structured data has been on the rise in recent years.

Yet, as the use of structured data continues to grow, the developer experience in authoring pieces of structured data remains spotty. I ran into this as I was trying to write my own snippets of JSON-LD. It turns out, the state-of-the-art way of writing JSON-LD is to: read the Schema.org reference; try writing a JSON literal on your own; when you think you’re done, paste the JSON into a validator (like Google’s structured data testing tool); see what’s wrong, fix; and repeat, as needed.

If it’s your first time writing JSON-LD, you might spend a few minutes figuring out how to represent an enum or boolean, looking for examples as needed.

Enter schema-dts

My experience left me with a feeling that things could be improved; writing JSON-LD should be no harder than any JSON that is constrained by a certain schema. This led me to create schema-dts (npm, github) a TypeScript-based library (and an optional codegen tool) with type definitions of the latest Schema.org JSON-LD spec.

The thinking was this: Just as IDEs (and, later, language server protocols for lightweight code editors) supercharge our developer experience with as-you-type error highlighting and code completions, we can supercharge the experience of writing those JSON-LD literals.

With IDEs and language server protocols, the write-test-debug loop was made much tighter. Developers get immediate feedback on the basic correctness of the code they write, rather than having to save sporadically and feed their code to a compiler for that feedback. With schema-dts, we try to take validators like the structured data testing tool out of the critical path of write-test-debug. Instead, you can use a library to type-check your JSON, reporting errors as you type, and offering completions for `@type`s, property names, and their values.


Thanks to TypeScript’s structural typing and discriminated unions, the general shape of Schema.org’s JSON-LD can be well-represented in TypeScript typings. I have previously described the type theory behind creating a TypeScript structure that expresses the Schema.org class structure, enumerations, `DataType`s, and properties.

Schema-dts includes two related pieces: the ‘default’ schema-dts NPM package which includes the latest Schema.org definitions, and the schema-dts-gen CLI which allows you to create your own typing definitions from Schema.org-like .nt N-Triple files. The CLI also has flags to control whether deprecated classes, properties, and enums should be included, what `@context` should be assumed by objects you write, etc.

Goals and Non-Goals

The goal of schema-dts isn’t to make type definitions that accept all legal Schema.org JSON literals. Rather, it is to make sure we provide typings that always (or almost always) result in legal Schema.org JSON-LD literals that search engines would accept. In the process, we’d like to make sure it’s as general as possible, without sacrificing type checking and useful completions.

For instance, RDF’s perspective is that structured data is property-centric, and the Schema.org reference of the domains and ranges of properties is only a suggestion for what values are inferred as. RDF actually permits values of any type to be assigned to a property. Instead, schema-dts will actually constrain you by the Schema.org values.

***

If you’re passionate about structured data, try schema-dts and join the conversation on GitHub!

By: Eyas Sharaiha, Geo Engineering & Open Source scheme-dts Project

Google Search News: coming soon to a screen near you

The world of search is constantly evolving. New tools, opportunities, and features are regularly arriving, sometimes existing things change, and sometimes we say goodbye to some things to make way for the new. To help you stay on top of things, we've started a new YouTube series called Google Search News.

With Google Search News, we want to give you a regular & short summary of what's been happening around Google Search, specifically for SEOs, publishers, developers, and webmasters. The first episode is out now, so check it out. 

(The first episode, now on your screen)

In this first episode, we cover:

We plan to make these updates regularly, and adjust the format over time as needed. Let us know what you think in the video comments!

An End-to-End AutoML Solution for Tabular Data at KaggleDays



Machine learning (ML) for tabular data (e.g. spreadsheet data) is one of the most active research areas in both ML research and business applications. Solutions to tabular data problems, such as fraud detection and inventory prediction, are critical for many business sectors, including retail, supply chain, finance, manufacturing, marketing and others. Current ML-based solutions to these problems can be achieved by those with significant ML expertise, including manual feature engineering and hyper-parameter tuning, to create a good model. However, the lack of broad availability of these skills limits the efficiency of business improvements through ML.

Google’s AutoML efforts aim to make ML more scalable and accelerate both research and industry applications. Our initial efforts of neural architecture search have enabled breakthroughs in computer vision with NasNet, and evolutionary methods such as AmoebaNet and hardware-aware mobile vision architecture MNasNet further show the benefit of these learning-to-learn methods. Recently, we applied a learning-based approach to tabular data, creating a scalable end-to-end AutoML solution that meets three key criteria:
  • Full automation: Data and computation resources are the only inputs, while a servable TensorFlow model is the output. The whole process requires no human intervention.
  • Extensive coverage: The solution is applicable to the majority of arbitrary tasks in the tabular data domain.
  • High quality: Models generated by AutoML has comparable quality to models manually crafted by top ML experts.
To benchmark our solution, we entered our algorithm in the KaggleDays SF Hackathon, an 8.5 hour competition of 74 teams with up to 3 members per team, as part of the KaggleDays event. The first time that AutoML has competed against Kaggle participants, the competition involved predicting manufacturing defects given information about the material properties and testing results for batches of automotive parts. Despite competing against participants thats were at the Kaggle progression system Master level, including many who were at the GrandMaster level, our team (“Google AutoML”) led for most of the day and ended up finishing second place by a narrow margin, as seen in the final leaderboard.

Our team’s AutoML solution was a multistage TensorFlow pipeline. The first stage is responsible for automatic feature engineering, architecture search, and hyperparameter tuning through search. The promising models from the first stage are fed into the second stage, where cross validation and bootstrap aggregating are applied for better model selection. The best models from the second stage are then combined in the final model.
The workflow for the “Google AutoML” team was quite different from that of other Kaggle competitors. While they were busy with analyzing data and experimenting with various feature engineering ideas, our team spent most of time monitoring jobs and and waiting for them to finish. Our solution for second place on the final leaderboard required 1 hour on 2500 CPUs to finish end-to-end.

After the competition, Kaggle published a public kernel to investigate winning solutions and found that augmenting the top hand-designed models with AutoML models, such as ours, could be a useful way for ML experts to create even better performing systems. As can be seen in the plot below, AutoML has the potential to enhance the efforts of human developers and address a broad range of ML problems.
Potential model quality improvement on final leaderboard if AutoML models were merged with other Kagglers’ models. “Erkut & Mark, Google AutoML”, includes the top winner “Erkut & Mark” and the second place “Google AutoML” models. Erkut Aykutlug and Mark Peng used XGBoost with creative feature engineering whereas AutoML uses both neural network and gradient boosting tree (TFBT) with automatic feature engineering and hyperparameter tuning.
Google Cloud AutoML Tables
The solution we presented at the competitions is the main algorithm in Google Cloud AutoML Tables, which was recently launched (beta) at Google Cloud Next ‘19. The AutoML Tables implementation regularly performs well in benchmark tests against Kaggle competitions as shown in the plot below, demonstrating state-of-the-art performance across the industry.
Third party benchmark of AutoML Tables on multiple Kaggle competitions
We are excited about the potential application of AutoML methods across a wide range of real business problems. Customers have already been leveraging their tabular enterprise data to tackle mission-critical tasks like supply chain management and lead conversion optimization using AutoML Tables, and we are excited to be providing our state-of-the-art models to solve tabular data problems.

Acknowledgements
This project was only possible thanks to Google Brain team members Ming Chen, Da Huang, Yifeng Lu, Quoc V. Le and Vishy Tirumalashetty. We also thank Dawei Jia, Chenyu Zhao and Tin-yun Ho from the Cloud AutoML Tables team for great infrastructure and product landing collaboration. Thanks to Walter Reade, Julia Elliott and Kaggle for organizing such an engaging competition.

Source: Google AI Blog


Google I/O 2019 – What sessions should SEOs and webmasters watch?

Google I/O 2019 is starting tomorrow and will run for 3 days, until Thursday. Google I/O is our yearly developers festival, where product announcements are made, new APIs and frameworks are introduced, and Product Managers present the latest from Google to an audience of 7,000+ developers who fly to California.

However, you don't have to physically attend the event to take advantage of this once-a-year opportunity: many conferences and talks are live streamed on YouTube for anyone to watch. Browse the full schedule of events, including a list of talks that we think will be interesting for webmasters to watch (all talks are in English). All the links shared below will bring you to pages with more details about each talk, and links to watch the sessions will display on the day of each event. All times are Pacific Central time (California time).



This list is only a small part of the agenda that we think is useful to webmasters and SEOs. There are many more sessions that you could find interesting! To learn about those other talks, check out the full list of “web” sessions, design sessions, Cloud sessions, machine learning sessions, and more. Use the filtering function to toggle the sessions on and off.

We hope you can make the time to watch the talks online, and participate in the excitement of I/O ! The videos will also be available on Youtube after the event, in case you can't tune in live.

Posted by Vincent Courson, Search Outreach Specialist

Help Google Search know the best date for your web page

Sometimes, Google shows dates next to listings in its search results. In this post, we’ll answer some commonly-asked questions webmasters have about how these dates are determined and provide some best practices to help improve their accuracy.

How dates are determined

Google shows the date of a page when its automated systems determine that it would be relevant to do so, such as for pages that can be time-sensitive, including news content:

Google determines a date using a variety of factors, including but not limited to: any prominent date listed on the page itself or dates provided by the publisher through structured markup.

Google doesn’t depend on one single factor because all of them can be prone to issues. Publishers may not always provide a clear visible date. Sometimes, structured data may be lacking or may not be adjusted to the correct time zone. That’s why our systems look at several factors to come up with what we consider to be our best estimate of when a page was published or significantly updated.

How to specify a date on a page

To help Google to pick the right date, site owners and publishers should:

  • Show a clear date: Show a visible date prominently on the page.
  • Use structured data: Use the datePublished and dateModified schema with the correct time zone designator for AMP or non-AMP pages. When using structured data, make sure to use the ISO 8601 format for dates.

Guidelines specific to Google News

Google News requires clearly showing both the date and the time that content was published or updated. Structured data alone is not enough, though it is recommended to use in addition to a visible date and time. Date and time should be positioned between the headline and the article text. For more guidance, also see our help page about article dates.

If an article has been substantially changed, it can make sense to give it a fresh date and time. However, don't artificially freshen a story without adding significant information or some other compelling reason for the freshening. Also, do not create a very slightly updated story from one previously published, then delete the old story and redirect to the new one. That's against our article URLs guidelines.

More best practices for dates on web pages

In addition to the most important requirements listed above, here are additional best practices to help Google determine the best page to consider showing for a web page:

  • Show when a page has been updated: If you update a page significantly, also update the visible date (and time, if you display that). If desired, you can show two dates: when a page was originally published and when it was updated. Just do so in a way that’s visually clear to your readers. If showing both dates, it’s also highly recommended to use datePublished and dateModified for AMP or non-AMP pages to make it easier for algorithms to recognize.
  • Use the right time zone: If specifying a time, make sure to provide the correct timezone, taking into account daylight saving time as appropriate.
  • Be consistent in usage. Within a page, make sure to use exactly the same date (and, potentially, time) in structured data as well as in the visible part of the page. Make sure to use the same timezone if you specify one on the page.
  • Don’t use future dates or dates related to what a page is about: Always use a date for when a page itself was published or updated, not a date linked to something like an event that the page is writing about, especially for events or other subjects that happen in the future (you may use Event markup separately, if appropriate).
  • Follow Google's structured data guidelines: While Google doesn't guarantee that a date (or structured data in general) specified on a page will be used, following our structured data guidelines does help our algorithms to have it available in a machine-readable way.
  • Troubleshoot by minimizing other dates on the page: If you’ve followed the best practices above and find incorrect dates are being selected, consider if you can remove or minimize other dates that may appear on the page, such as those that might be next to related stories.

We hope these guidelines help to make it easier to specify the right date on your website's pages! For questions or comments on this, or other structured data topics, feel free to drop by our webmaster help forums.


Help customers discover your products on Google

People come to Google to discover new brands and products throughout their shopping journey. On Search and Google Images, shoppers are provided with rich snippets like product description, ratings, and price to help guide purchase decisions.

Connecting potential customers with up-to-date and accurate product information is key to successful shopping journeys on Google, so today, we’re introducing new ways for merchants to provide this information to improve results for shoppers.

  1. Search Console

    Many retailers and brands add structured data markup to their websites to ensure Google understands the products they sell. A new report for ‘Products’ is now available in Search Console for sites that use schema.org structured data markup to annotate product information. The report allows you to see any pending issues for markup on your site. Once an issue is fixed, you can use the report to validate if your issues were resolved by re-crawling your affected pages. Learn more about the rich result status reports

  1. Merchant Center

    While structured data markup helps Google properly display your product information when we crawl your site, we are expanding capabilities for all retailers to directly provide up-to-date product information to Google in real-time. Product data feeds uploaded to Google Merchant Center will now be eligible for display in results on surfaces like Search and Google Images. This product information will be ranked based only on relevance to users’ queries, and no payment is required or accepted for eligibility. We’re starting with the expansion in the US, and support for other countries will be announced later in the year.

    Get started

    You don’t need a Google Ads campaign to participate. If you don’t have an existing account and sell your products in the US, create a Merchant Center account and upload a product data feed.

  1. Manufacturer Center

    We’re also rolling out new features to improve your brand’s visibility and help customers find your products on Google by providing authoritative and up-to-date product information through Google Manufacturer Center. This information includes product description, variants, and rich content, such as high-quality images and videos that can show on the product’s knowledge panel.

These solutions give you multiple options to better reach and inform potential customers about your products as they shop across Google.

If you have any questions, be sure to post in our forum.

Ways to succeed in Google News

With the New Year now underway, we'd like to offer some best practices and advice we hope will lead publishers to more success within Google News in 2019.

General advice

There is a lot of helpful information to consider within the Google News Publisher Help Center. Be sure to have read the material in this area, in particular the content and technical guidelines.

Headlines and dates


  • Present clear headlines: Google News looks at a variety of signals to determine the headline of an article, including within your HTML title tag and for the most prominent text on the page. Review our headline tips.
  • Provide accurate times and dates: Google News tries to determine the time and date to display for an article in a variety of ways. You can help ensure we get it right by using the following methods:
    • Show one clear date and time: As per our date guidelines, show a clear, visible date and time between the headline and the article text. Prevent other dates from appearing on the page whenever possible, such as for related stories.
    • Use structured data: Use the datePublished and dateModified schema and use the correct time zone designator for AMP or non-AMP pages
  • Avoid artificially freshening stories: If an article has been substantially changed, it can make sense to give it a fresh date and time. However, don't artificially freshen a story without adding significant information or some other compelling reason for the freshening. Also, do not create a very slightly updated story from one previously published, then delete the old story and redirect to the new one. That's against our article URLs guidelines.

Duplicate content

Google News seeks to reward independent, original journalistic content by giving credit to the originating publisher, as both users and publishers would prefer. This means we try not to allow duplicate content—which includes scraped, rewritten, or republished material—to perform better than the original content. In line with this, these are guidelines publishers should follow:

  • Block scraped content: Scraping commonly refers to taking material from another site, often on an automated basis. Sites that scrape content must block scraped content from Google News.
  • Block rewritten content: Rewriting refers to taking material from another site, then rewriting that material so that it is not identical. Sites that rewrite content in a way that provides no substantial or clear added value must block that rewritten content from Google News. This includes, but is not limited to, rewrites that make only very slight changes or those that make many word replacements but still keep the original article's overall meaning.
  • Block or consider canonical for republished content: Republishing refers to when a publisher has permission from another publisher or author to republish an original work, such as material from wire services or in partnership with other publications.
    Publishers that allow others to republish content can help ensure that their original versions perform better in Google News by asking those republishing to block or make use of canonical.
    Google News also encourages those that republish material to consider proactively blocking such content or making use of the canonical, so that we can better identify the original content and credit it appropriately.
  • Avoid duplicate content: If you operate a network of news sites that share content, the advice above about republishing is applicable to your network. Select what you consider to be the original article and consider blocking duplicates or making use of the canonical to point to the original.

Transparency


  • Be transparent: Visitors to your site want to trust and understand who publishes it and information about those who have written articles. That's why our content guidelines stress that content should have posts with clear bylines, information about authors, and contact information for the publication.
  • Don't be deceptive: Our content policies do not allow sites or accounts that impersonate any person or organization, or that misrepresent or conceal their ownership or primary purpose. We do not allow sites or accounts that engage in coordinated activity to mislead users. This includes, but isn't limited to, sites or accounts that misrepresent or conceal their country of origin or that direct content at users in another country under false premises.

More tips


  • Avoid taking part in link schemes: Don't participate in link schemes, which can include large-scale article marketing programs or selling links that pass PageRank. Review our page on link schemes for more information.
  • Use structured for rich presentation: Both those using AMP and non-AMP pages can make use of structured data to optimize your content for rich results or carousel-like presentations.
  • Protect your users and their data: Consider securing every page of your website with HTTPS to protect the integrity and confidentiality of the data users exchange on your site. You can find more useful tips in our best practices on how to implement HTTPS.

Here's to a great 2019!

We hope these tips help publishers succeed in Google News over the coming year. For those who have more questions about Google News, we are unable to do one-to-one support. However, we do monitor our Google News Publisher Forum—which has been newly-revamped—and try to provide guidance on questions that might help a number of publishers all at once. The forum is also a great resource where publishers share tips and advice with each other.

Mobile-First indexing, structured data, images, and your site

It's been two years since we started working on "mobile-first indexing" - crawling the web with smartphone Googlebot, similar to how most users access it. We've seen websites across the world embrace the mobile web, making fantastic websites that work on all kinds of devices. There's still a lot to do, but today, we're happy to announce that we now use mobile-first indexing for over half of the pages shown in search results globally.

Checking for mobile-first indexing

In general, we move sites to mobile-first indexing when our tests assure us that they're ready. When we move sites over, we notify the site owner through a message in Search Console. It's possible to confirm this by checking the server logs, where a majority of the requests should be from Googlebot Smartphone. Even easier, the URL inspection tool allows a site owner to check how a URL from the site (it's usually enough to check the homepage) was last crawled and indexed.

If your site uses responsive design techniques, you should be all set! For sites that aren't using responsive web design, we've seen two kinds of issues come up more frequently in our evaluations:

Missing structured data on mobile pages

Structured data is very helpful to better understand the content on your pages, and allows us to highlight your pages in fancy ways in the search results. If you use structured data on the desktop versions of your pages, you should have the same structured data on the mobile versions of the pages. This is important because with mobile-first indexing, we'll only use the mobile version of your page for indexing, and will otherwise miss the structured data.

Testing your pages in this regard can be tricky. We suggest testing for structured data in general, and then comparing that to the mobile version of the page. For the mobile version, check the source code when you simulate a mobile device, or use the HTML generated with the mobile-friendly testing tool. Note that a page does not need to be mobile-friendly in order to be considered for mobile-first indexing.

Missing alt-text for images on mobile pages

The value of alt-attributes on images ("alt-text") is a great way to describe images to users with screen-readers (which are used on mobile too!), and to search engine crawlers. Without alt-text for images, it's a lot harder for Google Images to understand the context of images that you use on your pages.

Check "img" tags in the source code of the mobile version for representative pages of your website. As above, the source of the mobile version can be seen by either using the browser to simulate a mobile device, or by using the Mobile-Friendly test to check the Googlebot rendered version. Search the source code for "img" tags, and double-check that your page is providing appropriate alt-attributes for any that you want to have findable in Google Images.

For example, that might look like this:

With alt-text (good!):
<img src="cute-puppies.png" alt="A photo of cute puppies on a blanket">

Without alt-text:
<img src="sad-puppies.png">

It's fantastic to see so many great websites that work well on mobile! We're looking forward to being able to index more and more of the web using mobile-first indexing, helping more users to search the web in the same way that they access it: with a smartphone. We’ll continue to monitor and evaluate this change carefully. If you have any questions, please drop by our Webmaster forums or our public events.


Introducing the Indexing API and structured data for livestreams

Over the past few years, it's become easier than ever to stream live videos online, from celebrity updates to special events. But it's not always easy for people to determine which videos are live and know when to tune in.
Today, we're introducing new tools to help more people discover your livestreams in Search and Assistant. With livestream structured data and the Indexing API, you can let Google know when your video is live, so it will be eligible to appear with a red "live" badge:

Add livestream structured data to your page

If your website streams live videos, use the livestream developer documentation to flag your video as a live broadcast and mark the start and end times. In addition, VideoObject structured data is required to tell Google that there's a video on your page.

Update Google quickly with the Indexing API

The Indexing API now supports pages with livestream structured data. We encourage you to call the Indexing API to request that your site is crawled in time for the livestream. We recommend calling the Indexing API when your livestream begins and ends, and if the structured data changes.
For more information, visit our developer documentation. If you have any questions, ask us in the Webmaster Help Forum. We look forward to seeing your live videos on Google!

Rich Results expands for Question & Answer pages

People come to Google seeking information about all kinds of questions.
Frequently, the information they're looking for is on sites where users ask and answer each other's questions. Popular social news sites, expert forums, and help and support message boards are all examples of this pattern.

A screenshot of an example search result for a page titled “Why do touchscreens sometimes register a touch when ...” with a preview of the top answers from the page.
In order to help users better identify which search results may give the best information about their question, we have developed a new rich result type for question and answer sites. Search results for eligible Q&A pages display a preview of the top answers. This new presentation helps site owners reach the right users for their content and helps users get the relevant information about their questions faster.
A screenshot of an example search result for a page titled “Why do touchscreens sometimes register a touch when ...” with a preview of the top answers from the page.

To be eligible for this feature, add Q&A structured data to your pages with Q&A content. Be sure to use the Structured Data Testing Tool to see if your page is eligible and to preview the appearance in search results. You can also check out Search Console to see aggregate stats and markup error examples. The Performance report also tells you which queries show your Q&A Rich Result in Search results, and how these change over time.
If you have any questions, ask us in the Webmaster Help Forum or reach out on Twitter!