
Apps for the Google Assistant: new languages, devices and features!

Posted by Brad Abrams, Product Manager

As you may have seen, it's a big day for the Google Assistant with new features, new devices and new languages coming soon. But it's also a big day for developers like you, as Actions on Google is also coming to new devices and new languages, and getting better for building and sharing apps.

Say hallo, bonjour and kon'nichiwa to apps for the Google Assistant

Actions on Google is already available in English in the US, UK and Australia and today, we're adding new languages to the mix—German (de-DE), French (fr-FR), Japanese (ja-JP), Korean (ko-KR), and both French and English in Canada (en-CA, fr-CA). Starting this week, you can build apps for the Google Assistant in these new languages and soon, they'll be available via the Assistant! Users will soon be able to talk to apps like Zalando, Namatata and Drop the Puck, with more apps on the way. We can't wait to see what you build!

Apps are now available on new devices

Along with the new Pixelbook come apps for the Assistant. As soon as the Pixelbook hits shelves later this year, your apps will just work, with no extra steps from you! With that said, as with every new surface, especially one with a screen, it's good to make sure that your app is in tip-top shape, including using high-quality images or adding images to make your conversations more visual.

With apps on Pixelbook, you'll be able to reach a whole new audience and give users the chance to explore your app on a bigger screen, while they get things done throughout their day.

And, in case you missed it, we also recently introduced apps on headphones optimized for the Google Assistant, as well as on Android TV with the Assistant.

Building Apps for Families

Today we shared how the Assistant is great for families—giving people the chance to connect, explore, learn and have fun with the Assistant. And from trivia to storytelling, you can now build Apps for Families and get a dedicated badge via the Assistant on your phone, letting people know your app is family-friendly! Soon, users will be able to say "Ok Google, what's my Justice League superhero?" or, if you're looking for a game, "Ok Google, play Sports Illustrated Kids Trivia". Or "Ok Google, let's learn" for some educational fun.

To participate, you first need to make sure your app complies with the program policies and, after that, simply submit it for review. Once approved it will be live for anyone to try! You can learn more about that here. Apps for Families will only be available in US English at the start.

Apps made easy with templates and more

It's easier than ever to make your first (or fifth!) app. With new templates, you can create your own trivia game, flash card app or personality quiz for the Google Assistant without writing code or doing any complex server configurations. All you have to do is add some questions and answers via a Google Sheet. Within minutes, voilà, you can try it out on your Google Assistant and publish it! And if you want to try one today, just say "Ok Google, Play Planet Quiz."

We even provide pre-defined personalities when you create an app from the templates, offering a voice, tone and natural conversational feel for your app's users, without any additional work on your end.

If you prefer to code your own apps, we put a fresh coat of paint on our Actions Console UI to make it easier to create apps with tools like API.AI.

Transactions with the Assistant on phones

In May we announced that you could start building transactional apps for the Google Assistant on phones and starting this week in the US, you can submit your apps for review! To get a first look at how transactions will work, you'll soon be able to try out 1-800-Flowers, Applebee's, Panera and Ticketmaster.

Ready to give it a try for yourself? You can build and test transactional apps that include payments, status updates and follow-on actions here.

In addition to paying, with transactional apps, a user can see their order history, get status updates and more.

Community Program: Benefits for great apps

And, last up, to support your efforts in building apps for the Google Assistant and to celebrate your accomplishments, we created a new developer community program. Starting with up to $200 in monthly Google Cloud credit and an Assistant t-shirt when you publish your first app, the perks and opportunities available to you will grow as you hit milestone after milestone, including the chance to earn a Google Home. And if you've already created an app, don't fret! Your perks are on the way!

Thanks for everything you do to make the Assistant more helpful, fun and interactive! It's been an exciting 10 months to see the platform expand to new languages and devices and to see what you've all created.

Introducing Cloud Firestore: Our New Document Database for Apps

Originally posted by Alex Dufetel on the Firebase Blog

Today we're excited to launch Cloud Firestore, a fully-managed NoSQL document database for mobile and web app development. It's designed to easily store and sync app data at global scale, and it's now available in beta.

Key features of Cloud Firestore include:

  • Documents and collections with powerful querying
  • iOS, Android, and Web SDKs with offline data access
  • Real-time data synchronization
  • Automatic, multi-region data replication with strong consistency
  • Node, Python, Go, and Java server SDKs

And of course, we've aimed for the simplicity and ease-of-use that is always top priority for Firebase, while still making sure that Cloud Firestore can scale to power even the largest apps.

Optimized for app development

Managing app data is still hard; you have to scale servers, handle intermittent connectivity, and deliver data with low latency.

We've optimized Cloud Firestore for app development, so you can focus on delivering value to your users and shipping better apps, faster. Cloud Firestore:

  • Synchronizes data between devices in real-time. Our Android, iOS, and JavaScript SDKs sync your app data almost instantly. This makes it incredibly easy to build reactive apps, automatically sync data across devices, and build powerful collaborative features. And if you don't need real-time sync, one-time reads are a first-class feature.
  • Uses collections and documents to structure and query data. This data model is familiar and intuitive for many developers. It also allows for expressive queries (see the sketch after this list). Queries scale with the size of your result set, not the size of your data set, so you'll get the same performance fetching 1 result from a set of 100, or 100,000,000.
  • Enables offline data access via a powerful, on-device database. This local database means your app will function smoothly, even when your users lose connectivity. This offline mode is available on Web, iOS and Android.
  • Enables serverless development. Cloud Firestore's client-side SDKs take care of the complex authentication and networking code you'd normally need to write yourself. Then, on the backend, we provide a powerful set of security rules so you can control access to your data. Security rules let you control which users can access which documents, and let you apply complex validation logic to your data as well. Combined, these features allow your mobile app to connect directly to your database.
  • Integrates with the rest of the Firebase platform. You can easily configure Cloud Functions to run custom code whenever data is written, and our SDKs automatically integrate with Firebase Authentication, to help you get started quickly.
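
To make the collections-and-documents model concrete, here's a minimal sketch using the Python server SDK (google-cloud-firestore), one of the server SDKs listed above. The cities collection, its fields, and the query threshold are invented for illustration; the mobile and web client SDKs expose the same collection/document shape.

# A rough sketch only -- collection and field names are made up.
from google.cloud import firestore

db = firestore.Client()

# Documents live in collections and hold key/value fields.
db.collection('cities').document('TOK').set({
    'name': 'Tokyo',
    'country': 'Japan',
    'population': 9273000,
})

# Queries are expressive, and their cost scales with the result set,
# not with the size of the data set.
big_cities = db.collection('cities').where('population', '>', 1000000).get()
for snapshot in big_cities:
    print(snapshot.id, snapshot.to_dict())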

Putting the 'Cloud' in Cloud Firestore

As you may have guessed from the name, Cloud Firestore was built in close collaboration with the Google Cloud Platform team.

This means it's a fully managed product, built from the ground up to automatically scale. Cloud Firestore is a multi-region replicated database that ensures once data is committed, it's durable even in the face of unexpected disasters. Not only that, but despite being a distributed database, it's also strongly consistent, removing tricky edge cases to make building apps easier regardless of scale.

It also means that delivering a great server-side experience for backend developers is a top priority. We're launching SDKs for Java, Go, Python, and Node.js today, with more languages coming in the future.

Another database?

Over the last 3 years Firebase has grown to become Google's app development platform; it now has 16 products to build and grow your app. If you've used Firebase before, you know we already offer a database, the Firebase Realtime Database, which helps with some of the challenges listed above.

The Firebase Realtime Database, with its client SDKs and real-time capabilities, is all about making app development faster and easier. Since its launch, it has been adopted by hundreds of thousands of developers, and as its adoption grew, so did usage patterns. Developers began using the Realtime Database for more complex data and to build bigger apps, pushing the limits of the JSON data model and the performance of the database at scale. Cloud Firestore is inspired by what developers love most about the Firebase Realtime Database while also addressing its key limitations like data structuring, querying, and scaling.

So, if you're a Firebase Realtime Database user today, we think you'll love Cloud Firestore. However, this does not mean that Cloud Firestore is a drop-in replacement for the Firebase Realtime Database. For some use cases, it may make sense to use the Realtime Database to optimize for cost and latency, and it's also easy to use both databases together. You can read a more in-depth comparison between the two databases here.

We're continuing development on both databases and they'll both be available in our console and documentation.

Get started!

Cloud Firestore enters public beta starting today. If you're comfortable using a beta product, give it a spin on your next project! Many companies and startups are already building with Cloud Firestore.

Get started by visiting the database tab in your Firebase console. For more details, see the documentation, pricing, code samples, performance limitations during beta, and view our open source iOS and JavaScript SDKs on GitHub.

We can't wait to see what you build and hear what you think of Cloud Firestore!

Generating Google Slides from Images using Apps Script

Posted by Wesley Chun (@wescpy), Developer Advocate, G Suite

Today, we announceda collection of exciting new features in Google Slides—among these is support for Google Apps Script. Now you can use Apps Script for Slides to programmatically create and modify Slides, plus customize menus, dialog boxes and sidebars in the user interface.

Programming presentations with Apps Script

Presentations have come a long way—from casting hand shadows over fires in caves to advances in lighting technology (magic lanterns) to, eventually, (in)famous 35mm slide shows of your Uncle Bob's endless summer vacation. More recently, we have presentation software—like Slides—and developers have been able to write applications to create or update them. This is made even easier with the new Apps Script support for Google Slides. In the latest G Suite Dev Show episode, we demo this new service, walking you through a short example that automatically creates a slideshow from a collection of images.

To keep things simple, the chosen images are already available online, accessible by URL. For each image, a new (blank) slide is added, then the image is inserted. The key to this script is these two lines of JavaScript (given an existing presentation and a link to each image):

var slide = presentation.appendSlide(SlidesApp.PredefinedLayout.BLANK);
var image = slide.insertImage(link);

The first line of code adds a new slide while the other inserts an image on the new slide. Both lines are repeated for each image in the collection. While this initial, rudimentary solution works, the slide presentation created doesn't exactly fit the bill. It turns out that adding a few more lines makes the application much more useful. See the video for all the details.

Getting started

To get started, check the documentation to learn more about Apps Script for Slides, or check out the Translate and Progress Bar sample Add-ons. If you want to dig deeper into the code sample from our video, take a look at the corresponding tutorial. And, if you love watching videos, check out our Apps Script video library or other G Suite Dev Show episodes. If you wish to build applications with Google Slides outside of the Apps Script environment and want to use your own development tools, you can do so with the Slides (REST) API—check out its documentation and video library.

With all these options, we look forward to seeing the applications you build with Google Slides!

Monitoring the Apps Script issue tracker…with Apps Script

Originally posted by Paul McReynolds, Product Manager (@pauljmcr), Apps Script and Wesley Chun (@wescpy), Developer Advocate, G Suite on the G Suite Developers Blog

Apps Script is just as popular inside Google as it is among external users and developers. In fact, there are more than 70,000 weekly active scripts written by thousands of Googlers. One of our many uses for Apps Script at Google is to automate and monitor our internal issue tracker.

Your business depends on Apps Script...so does ours

In spring of this year, we migrated our G Suite issue trackers to a new system based on our internal tracker. This carries a lot of benefits, including improving our ability to track how issues reported from outside of Google relate to bugs and features we're working on internally. We also have an internal Apps Script API that talks to our issue tracker, which we can now use to work with issues reported from outside of Google.

As soon as the migration was finished, we put Apps Script to work monitoring…itself. Now we have a script in place that monitors Apps Script issues as they are reported and upvoted on the public tracker. When we see an issue that's having widespread or sudden impact, the script generates an alert that we can then investigate. With the help of our large, active community of developers, and leveraging Apps Script, we're now able to identify and respond to issues more quickly.

There's no substitute for independent monitoring, and our Apps Script-based approach isn't the first or the last line of defense. Instead, this new script helps us catch anything that our monitoring systems miss by listening to what developers are saying on the tracker.

If you see something, say something

Please help us keep Apps Script humming! When you notice a problem, search the issue tracker for it and file an issue if it's new. Click the star to let us know you're affected and leave a comment with instructions to reproduce, along with any other relevant details. Those instructions and other details help us respond to the issues more effectively, so please be sure to include them.

Happy scripting!

How Machine Learning with TensorFlow Enabled Mobile Proof-Of-Purchase at Coca-Cola

In this guest editorial, Patrick Brandt of The Coca-Cola Company tells us how they're using AI and TensorFlow to achieve frictionless proof-of-purchase.

Coca-Cola's core loyalty program launched in 2006 as MyCokeRewards.com. The "MCR.com" platform included the creation of unique product codes for every Coca-Cola, Sprite, Fanta, and Powerade product sold in 20oz bottles and cardboard "fridge-packs" purchasable at grocery stores and other retail outlets. Users could enter these product codes at MyCokeRewards.com to participate in promotional campaigns.

Fast-forward to 2016: Coke's loyalty programs are still hugely popular with millions of product codes having been entered for promotions and sweepstakes. However, mobile browsing went from non-existent in 2006 to over 50% share by the end of 2016. The launch of Coke.com as a mobile-first web experience (replacing MCR.com) was a response to these changes in browsing behavior. Thumb-entering 14-character codes into a mobile device could be a difficult enough user experience to impact the success of our programs. We want to provide our mobile audience the best possible experience, and recent advances in artificial intelligence opened new opportunities.

The quest for frictionless proof-of-purchase

For years Coke attempted to use off-the-shelf optical character recognition (OCR) libraries and services to read product codes with little success. Our printing process typically uses low-resolution dot-matrix fonts with the cap or fridge-pack media running under the printhead at very high speeds. All of this translates into a low-fidelity string of characters that defeats off-the-shelf OCR offerings (and can sometimes be hard to read with the human eye as well). OCR is critical to simplifying the code-entry process for mobile users: they should be able to take a picture of a code and automatically have the purchase registered for a promotional entry. We needed a purpose-built OCR system to recognize our product codes.

Bottlecap and fridge-pack examples

Our research led us to a promising solution: Convolutional Neural Networks. CNNs are one family of "deep learning" neural networks that are at the heart of modern artificial intelligence products. Google has used CNNs to extract street address numbers from StreetView images. CNNs also perform remarkably well at recognizing handwritten digits. These number-recognition use-cases were a perfect proxy for the type of problem we were trying to solve: extracting strings from images that contain small character sets with lots of variance in the appearance of the characters.

CNNs with TensorFlow

In the past, developing deep neural networks like CNNs has been a challenge because of the complexity of available training and inference libraries. TensorFlow, a machine learning framework that was open sourced by Google in November 2015, is designed to simplify the development of deep neural networks.

TensorFlow provides high-level interfaces to different kinds of neuron layers and popular loss functions, which makes it easier to implement different CNN model architectures. The ability to rapidly iterate over different model architectures dramatically reduced the time required to build Coke's custom OCR solution because different models could be developed, trained, and tested in a matter of days. TensorFlow models are also portable: the framework supports model execution natively on mobile devices ("AI on the edge") or in servers hosted remotely in the cloud. This enables a "create once, run anywhere" approach for model execution across many different platforms, including web-based and mobile.
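
To give a flavor of those high-level interfaces, below is a rough sketch of a small convolutional stack and loss written with TensorFlow 1.x's tf.layers and tf.losses modules. This is not Coke's production architecture: the image size, filter counts, character-set size, and the choice of one softmax per character position are placeholder values picked for illustration.

import tensorflow as tf

NUM_POSITIONS = 14       # characters per product code (from the post)
NUM_CHARS = 32           # size of the character set -- placeholder value
HEIGHT, WIDTH = 64, 128  # input image size -- placeholder values

def code_cnn(images):
    """Toy convolutional stack; `images` is a [batch, HEIGHT, WIDTH, 1] tensor."""
    net = tf.layers.conv2d(images, filters=32, kernel_size=3,
                           padding='same', activation=tf.nn.relu)
    net = tf.layers.max_pooling2d(net, pool_size=2, strides=2)
    net = tf.layers.conv2d(net, filters=64, kernel_size=3,
                           padding='same', activation=tf.nn.relu)
    net = tf.layers.max_pooling2d(net, pool_size=2, strides=2)
    net = tf.reshape(net, [-1, (HEIGHT // 4) * (WIDTH // 4) * 64])  # flatten
    net = tf.layers.dense(net, 256, activation=tf.nn.relu)
    # Emit one set of logits per character position, packed into a single tensor.
    logits = tf.layers.dense(net, NUM_POSITIONS * NUM_CHARS)
    return tf.reshape(logits, [-1, NUM_POSITIONS, NUM_CHARS])

def code_loss(logits, labels):
    """`labels` is a [batch, NUM_POSITIONS] tensor of integer character indices."""
    return tf.losses.sparse_softmax_cross_entropy(labels=labels, logits=logits)

images = tf.placeholder(tf.float32, [None, HEIGHT, WIDTH, 1])
logits = code_cnn(images)

Swapping layer types, filter counts, or the number of convolutional blocks is only a handful of edited lines, which is what makes the rapid architecture iteration described above practical.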

Machine learning: practice makes perfect

Any neural network is only as good as the data used to train it. We knew that we needed a large set of labeled product-code images to train a CNN that would achieve our performance goals. Our training set would be built in three phases:

  1. Pre-launch simulated images
  2. Pre-launch real-world images
  3. Images labeled by our users in production

The pre-launch training phase began by programmatically generating millions of simulated product-code images. These simulated images included variations in tilt, lighting, shadows, and blurriness. The prediction accuracy (i.e. how often all 14 characters were correctly predicted within the top-10 predictions) was at 50% against real-world images when the model was trained using only simulated images. This provided a baseline for transfer-learning: a model initially trained with simulated images was the foundation for a more accurate model that would be trained against real-world images.

The challenge now turned to enriching the simulated images with enough real-world images to hit our performance goals. We created a purpose-built training app for iOS and Android devices that "trainers" could use to take pictures of codes and label them; these labeled images were then transferred to cloud storage for training. We did a production run of several thousand product codes on bottle caps and fridge-packs and distributed these to multiple suppliers who used the app to create the initial real-world training set.

Even with an augmented and enriched training set, there is no substitute for images created by end-users in a variety of environmental conditions. We knew that scans would sometimes result in an inaccurate code prediction, so we needed to provide a user-experience that would allow users to quickly correct these predictions. Two components are essential to delivering this experience: a product-code validation service that has been in use since the launch of our original loyalty platform in 2006 (to verify that a predicted code is an actual code) and a prediction algorithm that performs a regression to determine a per-character confidence at each one of the 14 character positions. If a predicted code is invalid, the top prediction as well as the confidence levels for each character are returned to the user interface. Low-confidence characters are visually highlighted to guide the user to update characters that need attention.
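
As a rough illustration of that last step, the sketch below takes a [14, alphabet-size] array of per-position softmax probabilities (the shape implied by the description above) and returns the top prediction along with the positions to highlight; the alphabet and confidence threshold are made up.

import numpy as np

ALPHABET = "ABCDEFGHJKLMNPQRSTUVWXYZ23456789"  # hypothetical code alphabet
LOW_CONFIDENCE = 0.6                            # hypothetical UI threshold

def top_prediction(position_probs):
    """position_probs: [14, len(ALPHABET)] array, one softmax row per character."""
    best = position_probs.argmax(axis=1)       # most likely character per position
    confidences = position_probs.max(axis=1)   # confidence per position
    code = "".join(ALPHABET[i] for i in best)
    flagged = [pos for pos, conf in enumerate(confidences) if conf < LOW_CONFIDENCE]
    return code, confidences, flagged

# If the validation service rejects `code`, the UI highlights the `flagged`
# positions so the user only has to correct those characters.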

Error correction user interface lets users correct invalid predictions and generate useful training data

This user interface innovation enables an active learning process: a feedback loop allows the model to gradually improve by returning corrected predictions to the training pipeline. In this way, our users organically improve the accuracy of the character recognition model over time.

Product-code recognition pipeline

Optimizing for maximum performance

To meet user expectations around performance, we established a few ambitious requirements for the product-code OCR pipeline:

  • It had to be fast: we needed a one-second average processing time once the image of the product-code was sent into the OCR pipeline
  • It had to be accurate: our goal was to achieve 95% string recognition accuracy at launch with the guarantee that the model could be improved over time via active learning
  • It had to be small: the OCR pipeline needs to be small enough to be distributed directly to mobile apps and accommodate over-the-air updates as the model improves over time
  • It had to handle diverse product code media: dozens of different combinations of font types, bottlecaps, and cardboard fridge-pack media

We initially explored an architecture that used a single CNN for all product-code media. This approach created a model that was too large to be distributed to mobile apps and the execution time was longer than desired. Our applied-AI partners at Quantiphi, Inc. began iterating on different model architectures, eventually landing on one that used multiple CNNs.

This new architecture reduced the model size dramatically without sacrificing accuracy, but it was still on the high end of what we needed in order to support over-the-air updates to mobile apps. We next used TensorFlow's prebuilt quantization module to reduce the model size by reducing the fidelity of the weights between connected neurons. Quantization reduced the model size by a factor of 4, but a dramatic reduction in model size occurred when Quantiphi had a breakthrough using a new approach called SqueezeNet.
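
The idea behind weight quantization is easy to sketch outside of TensorFlow: store each float32 weight as an 8-bit integer plus a shared offset and scale, which is where the roughly 4x size reduction comes from. The snippet below is a toy illustration of the concept, not TensorFlow's actual quantization module.

import numpy as np

def quantize_weights(weights):
    """Map a float32 array onto uint8 values plus an offset and scale."""
    w_min = float(weights.min())
    scale = (float(weights.max()) - w_min) / 255.0 or 1.0
    q = np.round((weights - w_min) / scale).astype(np.uint8)
    return q, w_min, scale

def dequantize_weights(q, w_min, scale):
    """Recover approximate float32 weights at inference time."""
    return q.astype(np.float32) * scale + w_min

weights = np.random.randn(1024, 256).astype(np.float32)  # pretend layer weights
q, w_min, scale = quantize_weights(weights)
approx = dequantize_weights(q, w_min, scale)
print("max reconstruction error:", np.abs(weights - approx).max())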

The SqueezeNet model was published by a team of researchers from UC Berkeley and Stanford in November of 2016. It uses a small but highly complex design to achieve accuracy levels on par with much larger models against popular benchmarks such as ImageNet. After re-architecting our character recognition models to use a SqueezeNet CNN, Quantiphi was able to reduce the model size of certain media types by a factor of 100. Since the SqueezeNet model was inherently smaller, a richer feature detection architecture could be constructed, achieving much higher accuracy at much smaller sizes compared to our first batch of models trained without SqueezeNet. We now have a highly accurate model that can be easily updated on remote devices; the recognition success rate of our final model before active learning was close to 96%, which translates into a 99.7% character recognition accuracy (just 3 misses for every 1000 character predictions).
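
For a sense of what makes SqueezeNet so compact, here's a sketch of its core building block, the "fire module," written with tf.layers: a 1x1 "squeeze" convolution feeding parallel 1x1 and 3x3 "expand" convolutions whose outputs are concatenated. The filter counts and feature-map shape below are placeholders, not the values used in Coke's models.

import tensorflow as tf

def fire_module(inputs, squeeze_filters, expand_filters):
    """SqueezeNet-style fire module: squeeze with 1x1 convs, then expand with
    parallel 1x1 and 3x3 convs and concatenate along the channel axis."""
    squeezed = tf.layers.conv2d(inputs, squeeze_filters, kernel_size=1,
                                activation=tf.nn.relu)
    expand_1x1 = tf.layers.conv2d(squeezed, expand_filters, kernel_size=1,
                                  activation=tf.nn.relu)
    expand_3x3 = tf.layers.conv2d(squeezed, expand_filters, kernel_size=3,
                                  padding='same', activation=tf.nn.relu)
    return tf.concat([expand_1x1, expand_3x3], axis=-1)

# Example: stack two fire modules on a feature map (placeholder shape).
features = tf.placeholder(tf.float32, [None, 32, 64, 96])
net = fire_module(features, squeeze_filters=16, expand_filters=64)
net = fire_module(net, squeeze_filters=32, expand_filters=128)

Because the squeeze layer shrinks the channel count before the more expensive 3x3 convolutions, the parameter count stays small even as the network gets deeper.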

Valid product-code recognition examples with different types of occlusion, translation, and camera focus issues

Crossing boundaries with AI

Advances in artificial intelligence and the maturity of TensorFlow enabled us to finally achieve a long-sought proof-of-purchase capability. Since launching in late February 2017, our product code recognition platform has fueled more than a dozen promotions and resulted in over 180,000 scanned codes; it is now a core component for all of Coca-Cola North America's web-based promotions.

Moving to an AI-enabled product-code recognition platform has been valuable for two key reasons:

  • Frictionless proof-of-purchase was enabled in a timely fashion, corresponding to our overall move to a mobile-first marketing platform.
  • Coke saved millions of dollars by avoiding the requirement to update printers in our production lines to support higher-fidelity fonts that would work with existing off-the-shelf OCR software.

Our product-code recognition platform is the first execution of new AI-enabled capabilities at scale within Coca-Cola. We're now exploring AI applications across multiple lines of business, from new product development to ecommerce retail optimization.

Actions on Google is now available in Australia

Posted by Brad Abrams, Product Manager

Last month we announced that UK users can access apps for the Google Assistant on Google Home and their phones—and starting today, we're bringing Actions on Google to Australia. From Perth to Sydney, developers can start building apps for the Google Assistant, giving their users even more ways to get things done.

Similar to our launch in the UK, your English apps will appear in the local directory automatically. With that said, there are a few things to help make your app a true blue Aussie:

  • New TTS voices: There are a number of new TTS voices with an Australian (English) accent. We've automatically selected one for your app, but you can change the selected voice or opt to use your current US or UK English voice by going to the Actions console.
  • Practice makes perfect: We also recommend reviewing your response text strings and making adjustments to account for differences between the locales, such as knowing that candy should be lollies and a servo is a gas station.

Our developer tools, documentation and simulator have all been updated to make it easy for you to create, test and deploy your app. So what are you waiting for?

UK and Aussie users are just the start; we'll continue to make the Actions on Google platform available in more languages over the coming year. If you have questions about internationalization, please reach out to us on Stack Overflow and Google+.

Introduction to TensorFlow Datasets and Estimators

Posted by The TensorFlow Team

TensorFlow 1.3 introduces two important features that you should try out:

  • Datasets: A completely new way of creating input pipelines (that is, reading data into your program).
  • Estimators: A high-level way to create TensorFlow models. Estimators include pre-made models for common machine learning tasks, but you can also use them to create your own custom models.

Below you can see how they fit in the TensorFlow architecture. Combined, they offer an easy way to create TensorFlow models and to feed data to them:

Our Example Model

To explore these features we're going to build a model and show you relevant code snippets. The complete code is available here, including instructions for getting the training and test files. Note that the code was written to demonstrate how Datasets and Estimators work functionally, and was not optimized for maximum performance.

The trained model categorizes Iris flowers based on four botanical features (sepal length, sepal width, petal length, and petal width). So, during inference, you can provide values for those four features and the model will predict that the flower is one of the following three beautiful variants:

From left to right: Iris setosa (by Radomil, CC BY-SA 3.0), Iris versicolor (by Dlanglois, CC BY-SA 3.0), and Iris virginica (by Frank Mayfield, CC BY-SA 2.0).

We're going to train a Deep Neural Network Classifier with the below structure. All input and output values will be float32, and the sum of the output values will be 1 (as we are predicting the probability for each individual Iris type):

For example, an output result might be 0.05 for Iris Setosa, 0.9 for Iris Versicolor, and 0.05 for Iris Virginica, which indicates a 90% probability that this is an Iris Versicolor.

Alright! Now that we have defined the model, let's look at how we can use Datasets and Estimators to train it and make predictions.

Introducing The Datasets

Datasets is a new way to create input pipelines to TensorFlow models. This API is much more performant than using feed_dict or the queue-based pipelines, and it's cleaner and easier to use. Although Datasets still resides in tf.contrib.data at 1.3, we expect to move this API to core at 1.4, so it's high time to take it for a test drive.

At a high level, the Datasets API consists of the following classes:

  • Dataset: Base class containing methods to create and transform datasets. Also allows you to initialize a dataset from data in memory, or from a Python generator.
  • TextLineDataset: Reads lines from text files.
  • TFRecordDataset: Reads records from TFRecord files.
  • FixedLengthRecordDataset: Reads fixed size records from binary files.
  • Iterator: Provides a way to access one dataset element at a time.

Our dataset

To get started, let's first look at the dataset we will use to feed our model. We'll read data from a CSV file, where each row contains the four input values followed by the label:

The label will be:

  • 0 for Iris Setosa
  • 1 for Iris Versicolor
  • 2 for Iris Virginica

Representing our dataset

To describe our dataset, we first create a list of our features:


feature_names = [
    'SepalLength',
    'SepalWidth',
    'PetalLength',
    'PetalWidth']

When we train our model, we'll need a function that reads the input file and returns the feature and label data. Estimators require that you create a function of the following format:


def input_fn():
    ...<code>...
    return ({ 'SepalLength':[values], ..<etc>.., 'PetalWidth':[values] },
            [IrisFlowerType])

The return value must be a two-element tuple organized as follows:

  • The first element must be a dict in which each input feature is a key, mapped to a list of values for the training batch.
  • The second element is a list of labels for the training batch.

Since we are returning a batch of input features and training labels, all lists in the return statement will have equal lengths. Technically speaking, whenever we refer to "list" here, we actually mean a 1-d TensorFlow tensor.

To allow simple reuse of the input_fn we're going to add some arguments to it. This allows us to build input functions with different settings. The arguments are pretty straightforward:

  • file_path: The data file to read.
  • perform_shuffle: Whether the record order should be randomized.
  • repeat_count: The number of times to iterate over the records in the dataset. For example, if we specify 1, then each record is read once. If we specify None, iteration will continue forever.

Here's how we can implement this function using the Dataset API. We will wrap this in an "input function" that is suitable when feeding our Estimator model later on:

def my_input_fn(file_path, perform_shuffle=False, repeat_count=1):
    def decode_csv(line):
        parsed_line = tf.decode_csv(line, [[0.], [0.], [0.], [0.], [0]])
        label = parsed_line[-1:]  # Last element is the label
        del parsed_line[-1]  # Delete last element
        features = parsed_line  # Everything (but last element) are the features
        d = dict(zip(feature_names, features)), label
        return d

    dataset = (tf.contrib.data.TextLineDataset(file_path)  # Read text file
               .skip(1)  # Skip header row
               .map(decode_csv))  # Transform each elem by applying decode_csv fn
    if perform_shuffle:
        # Randomizes input using a window of 256 elements (read into memory)
        dataset = dataset.shuffle(buffer_size=256)
    dataset = dataset.repeat(repeat_count)  # Repeats dataset this # times
    dataset = dataset.batch(32)  # Batch size to use
    iterator = dataset.make_one_shot_iterator()
    batch_features, batch_labels = iterator.get_next()
    return batch_features, batch_labels

Note the following:

  • TextLineDataset: The Dataset API will do a lot of memory management for you when you're using its file-based datasets. You can, for example, read in dataset files much larger than memory or read in multiple files by specifying a list as argument.
  • shuffle: Reads buffer_size records, then shuffles (randomizes) their order.
  • map: Calls the decode_csv function with each element in the dataset as an argument (since we are using TextLineDataset, each element will be a line of CSV text). Then we apply decode_csv to each of the lines.
  • decode_csv: Splits each line into fields, providing the default values if necessary. Then returns a dict with the field keys and field values. The map function updates each elem (line) in the dataset with the dict.

That's an introduction to Datasets! Just for fun, we can now use this function to print the first batch:

next_batch = my_input_fn(FILE, True) # Will return 32 random elements

# Now let's try it out, retrieving and printing one batch of data.
# Although this code looks strange, you don't need to understand
# the details.
with tf.Session() as sess:
    first_batch = sess.run(next_batch)
    print(first_batch)

# Output
({'SepalLength': array([ 5.4000001, ...<repeat to 32 elems>], dtype=float32),
'PetalWidth': array([ 0.40000001, ...<repeat to 32 elems>], dtype=float32),
...
},
[array([[2], ...<repeat to 32 elems>], dtype=int32) # Labels
)

That's actually all we need from the Dataset API to implement our model. Datasets have a lot more capabilities though; please see the end of this post where we have collected more resources.

Introducing Estimators

Estimators is a high-level API that reduces much of the boilerplate code you previously needed to write when training a TensorFlow model. Estimators are also very flexible, allowing you to override the default behavior if you have specific requirements for your model.

There are two possible ways you can build your model using Estimators:

  • Pre-made Estimator - These are predefined estimators, created to generate a specific type of model. In this blog post, we will use the DNNClassifier pre-made estimator.
  • Estimator (base class) - Gives you complete control of how your model should be created by using a model_fn function. We will cover how to do this in a separate blog post.

Here is the class diagram for Estimators:

We hope to add more pre-made Estimators in future releases.

As you can see, all estimators make use of input_fn that provides the estimator with input data. In our case, we will reuse my_input_fn, which we defined for this purpose.

The following code instantiates the estimator that predicts the Iris flower type:

# Create the feature_columns, which specifies the input to our model.
# All our input features are numeric, so use numeric_column for each one.
feature_columns = [tf.feature_column.numeric_column(k) for k in feature_names]

# Create a deep neural network classifier.
# Use the DNNClassifier pre-made estimator.
classifier = tf.estimator.DNNClassifier(
    feature_columns=feature_columns,  # The input features to our model
    hidden_units=[10, 10],  # Two layers, each with 10 neurons
    n_classes=3,
    model_dir=PATH)  # Path to where checkpoints etc are stored

We now have an estimator that we can start to train.

Training the model

Training is performed using a single line of TensorFlow code:

# Train our model, using the previously defined function my_input_fn
# Input to training is a file with training examples
# Stop training after 8 iterations of train data (epochs)
classifier.train(
    input_fn=lambda: my_input_fn(FILE_TRAIN, True, 8))

But wait a minute... what is this "lambda: my_input_fn(FILE_TRAIN, True, 8)" stuff? That is where we hook up Datasets with the Estimators! Estimators need data to perform training, evaluation, and prediction, and they use the input_fn to fetch it. Estimators require an input_fn with no arguments, so we create a function with no arguments using lambda, which calls our input_fn with the desired arguments: the file_path, shuffle setting, and repeat_count. In our case, we use our my_input_fn, passing it:

  • FILE_TRAIN, which is the training data file.
  • True, which tells the Estimator to shuffle the data.
  • 8, which tells the Estimator to repeat the dataset 8 times.

Evaluating Our Trained Model

Ok, so now we have a trained model. How can we evaluate how well it's performing? Fortunately, every Estimator contains an evaluate method:

# Evaluate our model using the examples contained in FILE_TEST
# Return value will contain evaluation_metrics such as: loss & average_loss
evaluate_result = classifier.evaluate(
    input_fn=lambda: my_input_fn(FILE_TEST, False, 4))
print("Evaluation results")
for key in evaluate_result:
    print(" {}, was: {}".format(key, evaluate_result[key]))

In our case, we reach an accuracy of about 93%. There are various ways of improving this accuracy, of course. One way is to simply run the program over and over. Since the state of the model is persisted (in model_dir=PATH above), the model will improve the more iterations you train it, until it settles. Another way would be to adjust the number of hidden layers or the number of nodes in each hidden layer. Feel free to experiment with this; please note, however, that when you make a change, you need to remove the directory specified in model_dir=PATH, since you are changing the structure of the DNNClassifier.

Making Predictions Using Our Trained Model

And that's it! We now have a trained model, and if we are happy with the evaluation results, we can use it to predict an Iris flower based on some input. As with training and evaluation, we make predictions using a single function call:

# Predict the type of some Iris flowers.
# Let's predict the examples in FILE_TEST, repeat only once.
predict_results = classifier.predict(
    input_fn=lambda: my_input_fn(FILE_TEST, False, 1))
print("Predictions on test file")
for prediction in predict_results:
    # Will print the predicted class, i.e: 0, 1, or 2 if the prediction
    # is Iris Setosa, Versicolor, or Virginica, respectively.
    print(prediction["class_ids"][0])

Making Predictions on Data in Memory

The preceding code specified FILE_TEST to make predictions on data stored in a file, but how could we make predictions on data residing in other sources, for example, in memory? As you may guess, this does not actually require a change to our predict call. Instead, we configure the Dataset API to use a memory structure as follows:

# Let's create a memory dataset for prediction.
# We've taken the first 3 examples in FILE_TEST.
prediction_input = [[5.9, 3.0, 4.2, 1.5],  # -> 1, Iris Versicolor
                    [6.9, 3.1, 5.4, 2.1],  # -> 2, Iris Virginica
                    [5.1, 3.3, 1.7, 0.5]]  # -> 0, Iris Setosa

def new_input_fn():
    def decode(x):
        x = tf.split(x, 4)  # Need to split into our 4 features
        # When predicting, we don't need (or have) any labels
        return dict(zip(feature_names, x))  # Then build a dict from them

    # The from_tensor_slices function will use a memory structure as input
    dataset = tf.contrib.data.Dataset.from_tensor_slices(prediction_input)
    dataset = dataset.map(decode)
    iterator = dataset.make_one_shot_iterator()
    next_feature_batch = iterator.get_next()
    return next_feature_batch, None  # In prediction, we have no labels

# Predict all our prediction_input
predict_results = classifier.predict(input_fn=new_input_fn)

# Print results
print("Predictions on memory data")
for idx, prediction in enumerate(predict_results):
    predicted_class = prediction["class_ids"][0]  # Get the predicted class (index)
    if predicted_class == 0:
        print("I think: {}, is Iris Setosa".format(prediction_input[idx]))
    elif predicted_class == 1:
        print("I think: {}, is Iris Versicolor".format(prediction_input[idx]))
    else:
        print("I think: {}, is Iris Virginica".format(prediction_input[idx]))

Dataset.from_tensor_slices() is designed for small datasets that fit in memory. When using TextLineDataset as we did for training and evaluation, you can have arbitrarily large files, as long as your memory can manage the shuffle buffer and batch sizes.

Freebies

Using a pre-made Estimator like DNNClassifier provides a lot of value. In addition to being easy to use, pre-made Estimators also provide built-in evaluation metrics, and create summaries you can see in TensorBoard. To see this reporting, start TensorBoard from your command-line as follows:

# Replace PATH with the actual path passed as model_dir argument when the
# DNNClassifier estimator was created.
tensorboard --logdir=PATH

The following diagrams show some of the data that TensorBoard will provide:

Summary

In this blog post, we explored Datasets and Estimators. These are important APIs for defining input data streams and creating models, so investing time to learn them is definitely worthwhile!

For more details, be sure to check out the TensorFlow documentation for the Dataset API and Estimators.

But it doesn't stop here. We will shortly publish more posts that describe how these APIs work, so stay tuned for that!

Until then, Happy TensorFlow coding!

Making the Google Developers documentation style guide public

Posted by Jed Hartman, Technical Writer

You can now use our developer-documentation style guide for open source documentation projects.

For some years now, our technical writers at Google have used an internal-only editorial style guide for most of our developer documentation. In order to better support external contributors to our open source projects, such as Kubernetes, AMP, or Dart, and to allow for more consistency across developer documentation, we're now making that style guide public.

If you contribute documentation to projects like those, you now have direct access to useful guidance about voice, tone, word choice, and other style considerations. It can be useful for general issues, like reminders to use second person, present tense, active voice, and the serial comma; it can also be great for checking very specific issues, like whether to write "app" or "application" when you want to be consistent with the Google Developers style.

The style guide is a reference document, so instead of reading through it in linear order, you can use it to look things up as needed. For matters of punctuation, grammar, and formatting, you can do a search-in-page to find items like "Commas," "Lists," and "Link text" in the left nav. For specific terms and phrases, you can look at the word list.

Keep an eye on the guide's release notes page for updates and developments, and send us your comments and suggestions via the Send Feedback link on each page of the guide—we want to hear from you as we continue to evolve the style guide.

Introducing the Mobile Web Specialist Certification by Google Developers

Posted by Sarah Clark, Program Manager, Web Developer Training

The market for web developers is crowded, and you likely want to set yourself apart. Would you like to show that you have the skills to create responsive and flexible web applications?

The Google Developers Certification Team is pleased to announce the Mobile Web Specialist Certification. Based on a thorough analysis of the market, this new certification highlights developers who have in-demand skills as mobile web developers. (But don't worry, the skills demonstrated in this exam can be used on the desktop and across all browsers.)

Use our Mobile Web Specialist Study Guide to help you prepare. When you're ready, you'll take a timed, performance-based exam in which you write code. The cost for certification is $99 USD (6500 INR if you reside in India) and includes up to three exam attempts.

Check out this short video for a quick overview of the Mobile Web Specialist certification process:

Earning your Mobile Web Specialist Certification gives you a digital badge to display on your resume and social media profiles. As a member of the Mobile Web Specialist Alumni Community, you will also have access to program benefits focused on increasing your visibility as a certified developer.

The Mobile Web Specialist Certification joins the Associate Android Developer Certification in Google's family of performance-based certifications.

Visit g.co/dev/mobile-web-cert to get started and earn your Google Mobile Web Specialist Certification.

Bringing Real-time Spatial Audio to the Web with Songbird

Posted by Jamieson Brettle and Drew Allen, Chrome Media Team

For a virtual scene to be truly immersive, stunning visuals need to be accompanied by true spatial audio to create a realistic and believable experience. Spatial audio tools allow developers to include sounds that can come from any direction, and that are associated in 3D space with audio sources, thus completely enveloping the user in 360-degree sound.

Spatial audio helps draw the user into a scene and creates the illusion of entering an entirely new world. To make this possible, the Chrome Media team has created Songbird, an open source, spatial audio encoding engine that works in any web browser by using the Web Audio API.

The Songbird library takes in any number of mono audio streams and allows developers to programmatically place them in 3D space around the user. Songbird allows you to create immersive soundscapes, realistically reproducing reflection and reverb for the space you describe. Sounds bounce off walls and reflect off materials just as they would in real-life, capturing truly 360-degree sound. Songbird creates an ambisonic soundfield that can then be rendered in real-time for use in your application. We've partnered with the Omnitone project, which we blogged about last year, to add higher-order ambisonic support to Omnitone's binaural renderer to produce far more accurate sounding audio than ever before.

Songbird encapsulates Omnitone; with it, developers can now add interactive, full-sphere audio to any web-based application. Songbird can scale to any ambisonic order, creating more realistic sound and higher performance than is achievable through the standard Web Audio API.

Songbird Audio Processing Diagram

The implementation of Songbird is based on the Google spatial media specification. It expects mono input and outputs ambisonic (multichannel) ACN channel layout with SN3D normalization. Detailed documentation may be found here.

As the web emerges as an important VR platform for delivering content, spatial audio will play a vital role in users' embrace of this new medium. Songbird and Omnitone are key tools in enabling spatial audio on the web platform and establishing it as a preeminent platform for compelling VR experiences. Combining these audio experiences with 3D JavaScript libraries like three.js gives a glimpse into the future on the web.

Demo combining spatial sound in 3D environment

This project was made possible through close collaboration with Google's Daydream and Web Audio teams. This collaboration allowed us to deliver similar audio capabilities to the web as are available to developers creating Daydream applications.

We look forward to seeing what people do with Songbird now that it's open source. Check out the code on GitHub and let us know what you think. Also available are a number of demos on creating full spherical audio with Songbird.