Category Archives: Google Developers Blog

News and insights on Google platforms, tools and events

How Machine Learning with TensorFlow Enabled Mobile Proof-Of-Purchase at Coca-Cola

In this guest editorial, Patrick Brandt of The Coca-Cola Company tells us how they're using AI and TensorFlow to achieve frictionless proof-of-purchase.

Coca-Cola's core loyalty program launched in 2006 as MyCokeRewards.com. The "MCR.com" platform included the creation of unique product codes for every Coca-Cola, Sprite, Fanta, and Powerade product sold in 20oz bottles and cardboard "fridge-packs" purchasable at grocery stores and other retail outlets. Users could enter these product codes at MyCokeRewards.com to participate in promotional campaigns.

Fast-forward to 2016: Coke's loyalty programs are still hugely popular with millions of product codes having been entered for promotions and sweepstakes. However, mobile browsing went from non-existent in 2006 to over 50% share by the end of 2016. The launch of Coke.com as a mobile-first web experience (replacing MCR.com) was a response to these changes in browsing behavior. Thumb-entering 14-character codes into a mobile device could be a difficult enough user experience to impact the success of our programs. We want to provide our mobile audience the best possible experience, and recent advances in artificial intelligence opened new opportunities.

The quest for frictionless proof-of-purchase

For years Coke attempted to use off-the-shelf optical character recognition (OCR) libraries and services to read product codes with little success. Our printing process typically uses low-resolution dot-matrix fonts with the cap or fridge-pack media running under the printhead at very high speeds. All of this translates into a low-fidelity string of characters that defeats off-the-shelf OCR offerings (and can sometimes be hard to read with the human eye as well). OCR is critical to simplifying the code-entry process for mobile users: they should be able to take a picture of a code and automatically have the purchase registered for a promotional entry. We needed a purpose-built OCR system to recognize our product codes.

Bottlecap and fridge-pack examples

Our research led us to a promising solution: Convolutional Neural Networks. CNNs are one of a family of "deep learning" neural networks that are at the heart of modern artificial intelligence products. Google has used CNNs to extract street address numbers from StreetView images. CNNs also perform remarkably well at recognizing handwritten digits. These number-recognition use-cases were a perfect proxy for the type of problem we were trying to solve: extracting strings from images that contain small character sets with lots of variance in the appearance of the characters.

CNNs with TensorFlow

In the past, developing deep neural networks like CNNs has been a challenge because of the complexity of available training and inference libraries. TensorFlow, a machine learning framework that was open sourced by Google in November 2015, is designed to simplify the development of deep neural networks.

TensorFlow provides high-level interfaces to different kinds of neuron layers and popular loss functions, which makes it easier to implement different CNN model architectures. The ability to rapidly iterate over different model architectures dramatically reduced the time required to build Coke's custom OCR solution because different models could be developed, trained, and tested in a matter of days. TensorFlow models are also portable: the framework supports model execution natively on mobile devices ("AI on the edge") or in servers hosted remotely in the cloud. This enables a "create once, run anywhere" approach for model execution across many different platforms, including web-based and mobile.

Machine learning: practice makes perfect

Any neural network is only as good as the data used to train it. We knew that we needed a large set of labeled product-code images to train a CNN that would achieve our performance goals. Our training set would be built in three phases:

  1. Pre-launch simulated images
  2. Pre-launch real-world images
  3. Images labeled by our users in production

The pre-launch training phase began by programmatically generating millions of simulated product-code images. These simulated images included variations in tilt, lighting, shadows, and blurriness. The prediction accuracy (i.e. how often all 14 characters were correctly predicted within the top-10 predictions) was at 50% against real-world images when the model was trained using only simulated images. This provided a baseline for transfer-learning: a model initially trained with simulated images was the foundation for a more accurate model that would be trained against real-world images.

The challenge now turned to enriching the simulated images with enough real-world images to hit our performance goals. We created a purpose-built training app for iOS and Android devices that "trainers" could use to take pictures of codes and label them; these labeled images were then transferred to cloud storage for training. We did a production run of several thousand product codes on bottle caps and fridge-packs and distributed these to multiple suppliers who used the app to create the initial real-world training set.

Even with an augmented and enriched training set, there is no substitute for images created by end-users in a variety of environmental conditions. We knew that scans would sometimes result in an inaccurate code prediction, so we needed to provide a user-experience that would allow users to quickly correct these predictions. Two components are essential to delivering this experience: a product-code validation service that has been in use since the launch of our original loyalty platform in 2006 (to verify that a predicted code is an actual code) and a prediction algorithm that performs a regression to determine a per-character confidence at each one of the 14 character positions. If a predicted code is invalid, the top prediction as well as the confidence levels for each character are returned to the user interface. Low-confidence characters are visually highlighted to guide the user to update characters that need attention.

Error correction user interface lets users correct invalid predictions and generate useful training data

This user interface innovation enables an active learning process: a feedback loop allows the model to gradually improve by returning corrected predictions to the training pipeline. In this way, our users organically improve the accuracy of the character recognition model over time.

Product-code recognition pipeline

Optimizing for maximum performance

To meet user expectations around performance, we established a few ambitious requirements for the product-code OCR pipeline:

  • It had to be fast: we needed a one-second average processing time once the image of the product-code was sent into the OCR pipeline
  • It had to be accurate: our goal was to achieve 95% string recognition accuracy at launch with the guarantee that the model could be improved over time via active learning
  • It had to be small: the OCR pipeline needs to be small enough to be distributed directly to mobile apps and accommodate over-the-air updates as the model improves over time
  • It had to handle diverse product code media: dozens of different combinations of font types, bottlecaps, and cardboard fridge-pack media

We initially explored an architecture that used a single CNN for all product-code media. This approach created a model that was too large to be distributed to mobile apps and the execution time was longer than desired. Our applied-AI partners at Quantiphi, Inc.began iterating on different model architectures, eventually landing on one that used multiple CNNs.

This new architecture reduced the model size dramatically without sacrificing accuracy, but it was still on the high end of what we needed in order to support over-the-air updates to mobile apps. We next used TensorFlow's prebuilt quantization module to reduce the model size by reducing the fidelity of the weights between connected neurons. Quantization reduced the model size by a factor of 4, but a dramatic reduction in model size occurred when Quantiphi had a breakthrough using a new approach called SqueezeNet.

The SqueezeNet model was published by a team of researchers from UC Berkeley and Stanford in November of 2016. It uses a small but highly complex design to achieve accuracy levels on par with much larger models against popular benchmarks such as Imagenet. After re-architecting our character recognition models to use a SqueezeNet CNN, Quantiphi was able to reduce the model size of certain media types by a factor of 100. Since the SqueezeNet model was inherently smaller, a richer feature detection architecture could be constructed, achieving much higher accuracy at much smaller sizes compared to our first batch of models trained without SqueezeNet. We now have a highly accurate model that can be easily updated on remote devices; the recognition success rate of our final model before active learning was close to 96%, which translates into a 99.7% character recognition accuracy (just 3 misses for every 1000 character predictions).

Valid product-code recognition examples with different types of occlusion, translation, and camera focus issues

Crossing boundaries with AI

Advances in artificial intelligence and the maturity of TensorFlow enabled us to finally achieve a long-sought proof-of-purchase capability. Since launching in late February 2017, our product code recognition platform has fueled more than a dozen promotions and resulted in over 180,000 scanned codes; it is now a core component for all of Coca-Cola North America's web-based promotions.

Moving to an AI-enabled product-code recognition platform has been valuable for two key reasons:

  • Frictionless proof-of-purchase was enabled in a timely fashion, corresponding to our overall move to a mobile-first marketing platform.
  • Coke saved millions of dollars by avoiding the requirement to update printers in our production lines to support higher-fidelity fonts that would work with existing off-the-shelf OCR software.

Our product-code recognition platform is the first execution of new AI-enabled capabilities at scale within Coca-Cola. We're now exploring AI applications across multiple lines of business, from new product development to ecommerce retail optimization.

Actions on Google is now available in Australia

Posted by Brad Abrams, Product Manager

Last month we announcedthat UK users can access apps for the Google Assistant on Google Home and their phones—and starting today, we're bringing Actions on Google to Australia. From Perth to Sydney, developers can start building apps for the Google Assistant, giving their userseven more ways to get things done.

Similar to our launch in the UK, your English apps will appear in the local directory automatically. With that said, there are a few things to help make your app a true blue Aussie:

  • New TTS voices: There are a number of new TTS voices with an Australian (english) accent. We've automatically selected one for your app but you can change the selected voice or opt to use your current English US or UK voice by going to the actions console.
  • Practice makes perfect: We also recommend reviewing your response text strings andmaking adjustments to accommodate for differences between the languages, like making sure you know the important things, like candy should be lollies and servo is a gas station.

Our developer tools, documentationand simulatorhave all been updated to make it easy for you to create, test and deploy your app. So what are you waiting for?

UK and Aussie users are just the start, we'll continue to make the Actions on Google platform available in more languages over the coming year. If you have questions about internationalization, please reach out to us on Stackoverflowand Google+.

Introduction to TensorFlow Datasets and Estimators

Posted by The TensorFlow Team

TensorFlow 1.3 introduces two important features that you should try out:

  • Datasets: A completely new way of creating input pipelines (that is, reading data into your program).
  • Estimators: A high-level way to create TensorFlow models. Estimators include pre-made models for common machine learning tasks, but you can also use them to create your own custom models.

Below you can see how they fit in the TensorFlow architecture. Combined, they offer an easy way to create TensorFlow models and to feed data to them:

Our Example Model

To explore these features we're going to build a model and show you relevant code snippets. The complete code is available here, including instructions for getting the training and test files. Note that the code was written to demonstrate how Datasets and Estimators work functionally, and was not optimized for maximum performance.

The trained model categorizes Iris flowers based on four botanical features (sepal length, sepal width, petal length, and petal width). So, during inference, you can provide values for those four features and the model will predict that the flower is one of the following three beautiful variants:

From left to right: Iris setosa(by Radomil, CC BY-SA 3.0), Iris versicolor (by Dlanglois, CC BY-SA 3.0), and Iris virginica(by Frank Mayfield, CC BY-SA 2.0).

We're going to train a Deep Neural Network Classifier with the below structure. All input and output values will be float32, and the sum of the output values will be 1 (as we are predicting the probability for each individual Iris type):

For example, an output result might be 0.05 for Iris Setosa, 0.9 for Iris Versicolor, and 0.05 for Iris Virginica, which indicates a 90% probability that this is an Iris Versicolor.

Alright! Now that we have defined the model, let's look at how we can use Datasets and Estimators to train it and make predictions.

Introducing The Datasets

Datasets is a new way to create input pipelines to TensorFlow models. This API is much more performant than using feed_dict or the queue-based pipelines, and it's cleaner and easier to use. Although Datasets still resides in tf.contrib.data at 1.3, we expect to move this API to core at 1.4, so it's high time to take it for a test drive.

At a high-level, the Datasets consists of the following classes:

Where:

  • Dataset: Base class containing methods to create and transform datasets. Also allows you initialize a dataset from data in memory, or from a Python generator.
  • TextLineDataset: Reads lines from text files.
  • TFRecordDataset: Reads records from TFRecord files.
  • FixedLengthRecordDataset: Reads fixed size records from binary files.
  • Iterator: Provides a way to access one dataset element at a time.

Our dataset

To get started, let's first look at the dataset we will use to feed our model. We'll read data from a CSV file, where each row will contain five values-the four input values, plus the label:

The label will be:

  • 0 for Iris Setosa
  • 1 for Versicolor
  • 2 for Virginica.

Representing our dataset

To describe our dataset, we first create a list of our features:


feature_names = [
'SepalLength',
'SepalWidth',
'PetalLength',
'PetalWidth']

When we train our model, we'll need a function that reads the input file and returns the feature and label data. Estimators requires that you create a function of the following format:


def input_fn():
...<code>...
return ({ 'SepalLength':[values], ..<etc>.., 'PetalWidth':[values] },
[IrisFlowerType])

The return value must be a two-element tuple organized as follows: :

  • The first element must be a dict in which each input feature is a key, and then a list of values for the training batch.
  • The second element is a list of labels for the training batch.

Since we are returning a batch of input features and training labels, it means that all lists in the return statement will have equal lengths. Technically speaking, whenever we referred to "list" here, we actually mean a 1-d TensorFlow tensor.

To allow simple reuse of the input_fn we're going to add some arguments to it. This allows us to build input functions with different settings. The arguments are pretty straightforward:

  • file_path: The data file to read.
  • perform_shuffle: Whether the record order should be randomized.
  • repeat_count: The number of times to iterate over the records in the dataset. For example, if we specify 1, then each record is read once. If we specify None, iteration will continue forever.

Here's how we can implement this function using the Dataset API. We will wrap this in an "input function" that is suitable when feeding our Estimator model later on:

def my_input_fn(file_path, perform_shuffle=False, repeat_count=1):
def decode_csv(line):
parsed_line = tf.decode_csv(line, [[0.], [0.], [0.], [0.], [0]])
label = parsed_line[-1:] # Last element is the label
del parsed_line[-1] # Delete last element
features = parsed_line # Everything (but last element) are the features
d = dict(zip(feature_names, features)), label
return d

dataset = (tf.contrib.data.TextLineDataset(file_path) # Read text file
.skip(1) # Skip header row
.map(decode_csv)) # Transform each elem by applying decode_csv fn
if perform_shuffle:
# Randomizes input using a window of 256 elements (read into memory)
dataset = dataset.shuffle(buffer_size=256)
dataset = dataset.repeat(repeat_count) # Repeats dataset this # times
dataset = dataset.batch(32) # Batch size to use
iterator = dataset.make_one_shot_iterator()
batch_features, batch_labels = iterator.get_next()
return batch_features, batch_labels

Note the following: :

  • TextLineDataset: The Dataset API will do a lot of memory management for you when you're using its file-based datasets. You can, for example, read in dataset files much larger than memory or read in multiple files by specifying a list as argument.
  • shuffle: Reads buffer_size records, then shuffles (randomizes) their order.
  • map: Calls the decode_csv function with each element in the dataset as an argument (since we are using TextLineDataset, each element will be a line of CSV text). Then we apply decode_csv to each of the lines.
  • decode_csv: Splits each line into fields, providing the default values if necessary. Then returns a dict with the field keys and field values. The map function updates each elem (line) in the dataset with the dict.

That's an introduction to Datasets! Just for fun, we can now use this function to print the first batch:

next_batch = my_input_fn(FILE, True) # Will return 32 random elements

# Now let's try it out, retrieving and printing one batch of data.
# Although this code looks strange, you don't need to understand
# the details.
with tf.Session() as sess:
first_batch = sess.run(next_batch)
print(first_batch)

# Output
({'SepalLength': array([ 5.4000001, ...<repeat to 32 elems>], dtype=float32),
'PetalWidth': array([ 0.40000001, ...<repeat to 32 elems>], dtype=float32),
...
},
[array([[2], ...<repeat to 32 elems>], dtype=int32) # Labels
)

That's actually all we need from the Dataset API to implement our model. Datasets have a lot more capabilities though; please see the end of this post where we have collected more resources.

Introducing Estimators

Estimators is a high-level API that reduces much of the boilerplate code you previously needed to write when training a TensorFlow model. Estimators are also very flexible, allowing you to override the default behavior if you have specific requirements for your model.

There are two possible ways you can build your model using Estimators:

  • Pre-made Estimator - These are predefined estimators, created to generate a specific type of model. In this blog post, we will use the DNNClassifier pre-made estimator.
  • Estimator (base class) - Gives you complete control of how your model should be created by using a model_fn function. We will cover how to do this in a separate blog post.

Here is the class diagram for Estimators:

We hope to add more pre-made Estimators in future releases.

As you can see, all estimators make use of input_fn that provides the estimator with input data. In our case, we will reuse my_input_fn, which we defined for this purpose.

The following code instantiates the estimator that predicts the Iris flower type:

# Create the feature_columns, which specifies the input to our model.
# All our input features are numeric, so use numeric_column for each one.
feature_columns = [tf.feature_column.numeric_column(k) for k in feature_names]

# Create a deep neural network regression classifier.
# Use the DNNClassifier pre-made estimator
classifier = tf.estimator.DNNClassifier(
feature_columns=feature_columns, # The input features to our model
hidden_units=[10, 10], # Two layers, each with 10 neurons
n_classes=3,
model_dir=PATH) # Path to where checkpoints etc are stored

We now have a estimator that we can start to train.

Training the model

Training is performed using a single line of TensorFlow code:

# Train our model, use the previously function my_input_fn
# Input to training is a file with training example
# Stop training after 8 iterations of train data (epochs)
classifier.train(
input_fn=lambda: my_input_fn(FILE_TRAIN, True, 8))

But wait a minute... what is this "lambda: my_input_fn(FILE_TRAIN, True, 8)" stuff? That is where we hook up Datasets with the Estimators! Estimators needs data to perform training, evaluation, and prediction, and it uses the input_fn to fetch the data. Estimators require an input_fn with no arguments, so we create a function with no arguments using lambda, which calls our input_fn with the desired arguments: the file_path, shuffle setting, and repeat_count. In our case, we use our my_input_fn, passing it:

  • FILE_TRAIN, which is the training data file.
  • True, which tells the Estimator to shuffle the data.
  • 8, which tells the Estimator to and repeat the dataset 8 times.

Evaluating Our Trained Model

Ok, so now we have a trained model. How can we evaluate how well it's performing? Fortunately, every Estimator contains an evaluatemethod:

# Evaluate our model using the examples contained in FILE_TEST
# Return value will contain evaluation_metrics such as: loss & average_loss
evaluate_result = estimator.evaluate(
input_fn=lambda: my_input_fn(FILE_TEST, False, 4)
print("Evaluation results")
for key in evaluate_result:
print(" {}, was: {}".format(key, evaluate_result[key]))

In our case, we reach an accuracy of about ~93%. There are various ways of improving this accuracy, of course. One way is to simply run the program over and over. Since the state of the model is persisted (in model_dir=PATH above), the model will improve the more iterations you train it, until it settles. Another way would be to adjust the number of hidden layers or the number of nodes in each hidden layer. Feel free to experiment with this; please note, however, that when you make a change, you need to remove the directory specified in model_dir=PATH, since you are changing the structure of the DNNClassifier.

Making Predictions Using Our Trained Model

And that's it! We now have a trained model, and if we are happy with the evaluation results, we can use it to predict an Iris flower based on some input. As with training, and evaluation, we make predictions using a single function call:

# Predict the type of some Iris flowers.
# Let's predict the examples in FILE_TEST, repeat only once.
predict_results = classifier.predict(
input_fn=lambda: my_input_fn(FILE_TEST, False, 1))
print("Predictions on test file")
for prediction in predict_results:
# Will print the predicted class, i.e: 0, 1, or 2 if the prediction
# is Iris Sentosa, Vericolor, Virginica, respectively.
print prediction["class_ids"][0]

Making Predictions on Data in Memory

The preceding code specified FILE_TEST to make predictions on data stored in a file, but how could we make predictions on data residing in other sources, for example, in memory? As you may guess, this does not actually require a change to our predict call. Instead, we configure the Dataset API to use a memory structure as follows:

# Let create a memory dataset for prediction.
# We've taken the first 3 examples in FILE_TEST.
prediction_input = [[5.9, 3.0, 4.2, 1.5], # -> 1, Iris Versicolor
[6.9, 3.1, 5.4, 2.1], # -> 2, Iris Virginica
[5.1, 3.3, 1.7, 0.5]] # -> 0, Iris Sentosa
def new_input_fn():
def decode(x):
x = tf.split(x, 4) # Need to split into our 4 features
# When predicting, we don't need (or have) any labels
return dict(zip(feature_names, x)) # Then build a dict from them

# The from_tensor_slices function will use a memory structure as input
dataset = tf.contrib.data.Dataset.from_tensor_slices(prediction_input)
dataset = dataset.map(decode)
iterator = dataset.make_one_shot_iterator()
next_feature_batch = iterator.get_next()
return next_feature_batch, None # In prediction, we have no labels

# Predict all our prediction_input
predict_results = classifier.predict(input_fn=new_input_fn)

# Print results
print("Predictions on memory data")
for idx, prediction in enumerate(predict_results):
type = prediction["class_ids"][0] # Get the predicted class (index)
if type == 0:
print("I think: {}, is Iris Sentosa".format(prediction_input[idx]))
elif type == 1:
print("I think: {}, is Iris Versicolor".format(prediction_input[idx]))
else:
print("I think: {}, is Iris Virginica".format(prediction_input[idx])

Dataset.from_tensor_slides() is designed for small datasets that fit in memory. When using TextLineDataset as we did for training and evaluation, you can have arbitrarily large files, as long as your memory can manage the shuffle buffer and batch sizes.

Freebies

Using a pre-made Estimator like DNNClassifier provides a lot of value. In addition to being easy to use, pre-made Estimators also provide built-in evaluation metrics, and create summaries you can see in TensorBoard. To see this reporting, start TensorBoard from your command-line as follows:

# Replace PATH with the actual path passed as model_dir argument when the
# DNNRegressor estimator was created.
tensorboard --logdir=PATH

The following diagrams show some of the data that TensorBoard will provide:

Summary

In this this blogpost, we explored Datasets and Estimators. These are important APIs for defining input data streams and creating models, so investing time to learn them is definitely worthwhile!

For more details, be sure to check out

But it doesn't stop here. We will shortly publish more posts that describe how these APIs work, so stay tuned for that!

Until then, Happy TensorFlow coding!

Making the Google Developers documentation style guide public

Posted by Jed Hartman, Technical Writer

You can now use our developer-documentation style guide for open source documentation projects.

For some years now, our technical writers at Google have used an internal-only editorial style guide for most of our developer documentation. In order to better support external contributors to our open source projects, such as Kubernetes, AMP, or Dart, and to allow for more consistency across developer documentation, we're now making that style guide public.

If you contribute documentation to projects like those, you now have direct access to useful guidance about voice, tone, word choice, and other style considerations. It can be useful for general issues, like reminders to use second person, present tense, active voice, and the serial comma; it can also be great for checking very specific issues, like whether to write "app" or "application" when you want to be consistent with the Google Developers style.

The style guide is a reference document, so instead of reading through it in linear order, you can use it to look things up as needed. For matters of punctuation, grammar, and formatting, you can do a search-in-page to find items like "Commas," "Lists," and "Link text" in the left nav. For specific terms and phrases, you can look at the word list.

Keep an eye on the guide's release notes pagefor updates and developments, and send us your comments and suggestions via the Send Feedback link on each page of the guide—we want to hear from you as we continue to evolve the style guide.

Introducing the Mobile Web Specialist Certification by Google Developers

Posted by Sarah Clark, Program Manager, Web Developer Training
If you're a web developer, it's a crowded market, and you likely want to set yourself apart from other web developers. Would you like to show that you have the skills to create responsive and flexible web applications?
The Google Developers Certification Team is pleased to announce the Mobile Web Specialist Certification. Based on a thorough analysis of the market, this new certification highlights developers who have in-demand skills as mobile web developers. (But don't worry, the skills demonstrated in this exam can be used on the desktop and across all browsers.)
Use our Mobile Web Specialist Study Guide to help you prepare. When you're ready to take the exam, you will write code in a timed, performance-based exam. The cost for certification is $99 USD (6500 INR if you reside in India) and includes up to three exam attempts.
Check out this short video for a quick overview of the Mobile Web Specialist certification process:
Earning your Mobile Web Specialist Certification gives you a digital badge to display on your resume and social media profiles. As a member of the Mobile Web Specialist Alumni Community, you will also have access to program benefits focused on increasing your visibility as a certified developer.
The Mobile Web Specialist Certification joins the Associate Android Developer Certification in Google's family of performance-based certifications.
Visit g.co/dev/mobile-web-cert to get started and earn your Google Mobile Web Specialist Certification.

Bringing Real-time Spatial Audio to the Web with Songbird

Posted by Jamieson Brettle and Drew Allen, Chrome Media Team

For a virtual scene to be truly immersive, stunning visuals need to be accompanied by true spatial audio to create a realistic and believable experience. Spatial audio tools allow developers to include sounds that can come from any direction, and that are associated in 3D space with audio sources, thus completely enveloping the user in 360-degree sound.

Spatial audio helps draw the user into a scene and creates the illusion of entering an entirely new world. To make this possible, the Chrome Media team has created Songbird, an open source, spatial audio encoding engine that works in any web browser by using the Web Audio API.

The Songbird library takes in any number of mono audio streams and allows developers to programmatically place them in 3D space around the user. Songbird allows you to create immersive soundscapes, realistically reproducing reflection and reverb for the space you describe. Sounds bounce off walls and reflect off materials just as they would in real-life, capturing truly 360-degree sound. Songbird creates an ambisonic soundfield that can then be rendered in real-time for use in your application. We've partnered with the Omnitoneproject, which we blogged about last year, to add higher-order ambisonic support to Omnitone's binaural rendererto produce far more accurate sounding audio than ever before.

Songbird encapsulates Omnitone and with it, developers can now add interactive, full-sphere audio to any web based application. Songbird can scale to any order ambisonics, thereby creating a more realistic sound and higher performance than what is achievable through standard Web Audio API.

Songbird Audio Processing Diagram

The implementation of Songbird is based on the Google spatial mediaspecification. It expects mono input and outputs ambisonic (multichannel) ACN channel layout with SN3D normalization. Detailed documentation may be found here.

As the web emerges as an important VR platformfor delivering content, spatial audio will play a vital role in users' embrace of this new medium. Songbird and Omnitone are key tools in enabling spatial audio on the web platform and establishing it as a preeminent platform for compelling VR experiences. Combining these audio experiences with 3D JavaScript libraries like three.js gives a glimpseinto the future on the web.

Demo combining spatial sound in 3D environment

This project was made possible through close collaboration with Google's Daydream and Web Audio teams. This collaboration allowed us to deliver similar audio capabilities to the web as are available to developers creating Daydream applications.

We look forward to seeing what people do with Songbird now that it's open source. Check out the code on GitHub and let us know what you think. Also available are a number of demoson creating full spherical audio with Songbird.

Launchpad Accelerator is open to more countries around the world! Apply now.

Posted by Roy Glasberg, Global Lead, Launchpad Program & Accelerator

Launchpad Accelerator gives us an opportunity to work with and empower amazing developers, who are solving major challenges all around the world -- whether it's streamlining digital commerce across Africa, providing access to multimedia tools that support special needs education, or using AI to simplify business operations.

That's why we're doubling down on our efforts and opening up applications for the next class of the program to more countries for the first time starting today. Here's the full list of the new additions:

  • Africa: Algeria, Egypt, Ghana, Morocco, Tanzania, Tunisia & Uganda
  • Asia: Bangladesh, Myanmar, Pakistan & Sri Lanka
  • Europe: Estonia, Romania, Ukraine, Belarus & Russia
  • Latin America: Costa Rica, Panama, Peru & Uruguay

They'll be joined by our larger list of countries that are already part of the program, including: Argentina, Brazil, Chile, Colombia, Czech Republic, Hungary, India, Indonesia, Kenya, Malaysia, Mexico, Nigeria, Philippines, Poland, South Africa, Thailand, and Vietnam.

The application process for the equity-free program will end on October 2, 2017 at 9AM PST. Later in the year, the list of selected developers will be invited to the Google Developers Launchpad Space in San Francisco for 2 weeks of all-expense-paid training.

What are the benefits?

The training at Google HQ includes intensive mentoring from 20+ Google teams, and expert mentors from top technology companies and VCs in Silicon Valley. Participants receive equity-free support, credits for Google products, PR support and continue to work closely with Google back in their home country during the 6-month program. Hear from some alumnus about their experiences here.

What do we look for when selecting startups?

Each startup that applies to the Launchpad Accelerator is considered holistically and with great care. Below are general guidelines behind our process to help you understand what we look for in our candidates.

All startups in the program must:

  • Be a technological startup.
  • Be targeting their local markets.
  • Have proven product-market fit (beyond ideation stage).
  • Be based in the countries listed above.

Additionally, we are interested in what kind of startup you are. We also consider:

  • The problem you are trying to solve. How does it create value for users? How are you addressing a real challenge for your home city, country or region?
  • Does your management team have a leadership mindset and the drive to become an influencer?
  • Will you share what you learn in Silicon Valley for the benefit of other startups in your local ecosystem?
  • If you're based outside of these countries, stay tuned, as we expect to add more countries to the program in the future.

We can't wait to hear from you and see how we can work together to improve your business.

Participants from Class 4

Google Play Developer API new fields for In-app Billing information

Posted by Neto Marin, Developer Advocate

We'd like to share with you some good news about an improvement in the data available via the Google Play Developer API. Starting Monday Aug 28, the API for Purchases.productsand Purchases.subscriptionswill be returning a couple of new values:

  • orderId
    • To be returned via both products and subscriptions API
      • For Purchases, this will be the order id present in the purchase.
      • For subscriptions, this will be the orderId associated with the most recent recurring order id.
  • New subscription cancelReason: 2. Subscription replaced
    • Will be returned for subscriptions which were canceled due to the user changing subscription plans (e.g. upgrading to a new subscription plan).

This additional data will be automatically returned to you in the JSON responses to your API calls. Please double check your integration to make sure this new field and value will not cause any problems for you.

To view all of the values returned by the APIs, check Purchases.productsand Purchases.subscriptionsreference pages.

AIY Projects update: new maker projects, new partners, new kits

Posted by Billy Rutledge, Director, AIY Projects

Makers are hands-on when it comes to making change. We're explorers, hackers and problem solvers who build devices, ecosystems, art (sometimes a combination of the three) on the basis of our own (often unconventional) ideas. So when my team first sought out to empower makers of all types and ages with the AI technology we've honed at Google, we knew whatever we built had to be open and accessible. We stayed clear of limitations that come from platform and software stack requirements, high cost and complex set up, and fixed our focus on the curiosity and inventiveness that inspire makers around the world.

When we launched our Voice Kit with help from our partner Raspberry Pi in May and sold out globally in just a few hours, we got the message loud and clear. There is a genuine demand among do-it-yourselfers for artificial intelligence that makes human-to-machine interaction more like natural human interaction.

Last week we announced the Speech Commands Dataset, a collaboration between the TensorFlow and AIY teams. The dataset has 65,000 one-second long utterances of 30 short words by thousands of different contributors of the AIY websiteand allows you to build simple voice interfaces for applications. We're currently in the process of integrating the dataset with the next release of the Voice Kit, so makers could build devices that respond to simple voice commands without the press of a button or an internet connection.

Today, you can pre-order your Voice Kit, which will be available for purchase in stores and online through Micro Center.

Or you may have to resort to the hackthat maker Shivasiddarthcreated when Voice Kit with MagPi #57 sold out in May, and then again (within 17 minutes) earlier this month.

Cool ways that makers are already using the Voice Kit

Martin Mander created a retro-inspired intercom that he calls 1986 Google Pi Intercom. He describes it as "a wall-mounted Google voice assistant using a Raspberry PI 3 and the Google AIY (Artificial Intelligence Yourself) [voice] kit." He used a mid-80s intercom that he bought on sale for £4. It cleaned up well!

Get the full story from Martin and see what Slashgear had to say about the project.

(This one's for Dr. Who fans) Tom Minnich created a Dalek-voiced assistant.

He offers a tutorialon how you can modify the Voice Kit to do something similar — perhaps create a Drogon-voiced assistant?

Victor Van Heeused the Voice Kit to create a voice-activated internet streaming radio that can play other types of audio files as well. He provides instructions, so you can do the same.

The Voice Kit is currently available in the U.S. We'll be expanding globally by the end of this year. Stay tuned here, where we'll share the latest updates. The strong demand for the Voice Kit drives us to keep the momentum going on AIY Projects.

Inspiring makers with kits that understand human speech, vision and movement

What we build next will include vision and motion detection and will go hand in hand with our existing Voice Kit. AIY Project kits will soon offer makers the "eyes," "ears," "voice" and sense of "balance" to allow simple yet powerful device interfaces.

We'd love to bake your input into our next releases. Go to hackster.io or leave a comment to start up a conversation with us. Show us and the maker community what you're working on by using hashtag #AIYprojects on social media.

Kaldi now offers TensorFlow integration

Posted by Raziel Alvarez, Staff Research Engineer at Google and Yishay Carmiel, Founder of IntelligentWire

Automatic speech recognition (ASR) has seen widespread adoption due to the recent proliferation of virtual personal assistants and advances in word recognition accuracy from the application of deep learning algorithms. Many speech recognition teams rely on Kaldi, a popular open-source speech recognition toolkit. We're announcing today that Kaldi now offers TensorFlow integration.

With this integration, speech recognition researchers and developers using Kaldi will be able to use TensorFlow to explore and deploy deep learning models in their Kaldi speech recognition pipelines. This will allow the Kaldi community to build even better and more powerful ASR systems as well as providing TensorFlow users with a path to explore ASR while drawing upon the experience of the large community of Kaldi developers.

Building an ASR system that can understand human speech in every language, accent, environment, and type of conversation is an extremely complex undertaking. A traditional ASR system can be seen as a processing pipeline with many separate modules, where each module operates on the output from the previous one. Raw audio data enters the pipeline at one end and a transcription of recognized speech emerges from the other. In the case of Kaldi, these ASR transcriptions are post processed in a variety of ways to support an increasing array of end-user applications.

Yishay Carmiel and Hainan Xu of Seattle-based IntelligentWire, who led the development of the integration between Kaldi and TensorFlow with support from the two teams, know this complexity first-hand. Their company has developed cloud software to bridge the gap between live phone conversations and business applications. Their goal is to let businesses analyze and act on the contents of the thousands of conversations their representatives have with customers in real-time and automatically handle tasks like data entry or responding to requests. IntelligentWire is currently focused on the contact center market, in which more than 22 million agents throughout the world spend 50 billion hours a year on the phone and about 25 billion hours interfacing with and operating various business applications.

For an ASR system to be useful in this context, it must not only deliver an accurate transcription but do so with very low latency in a way that can be scaled to support many thousands of concurrent conversations efficiently. In situations like this, recent advances in deep learning can help push technical limits, and TensorFlow can be very useful.

In the last few years, deep neural networks have been used to replace many existing ASR modules, resulting in significant gains in word recognition accuracy. These deep learning models typically require processing vast amounts of data at scale, which TensorFlow simplifies. However, several major challenges must still be overcome when developing production-grade ASR systems:

  • Algorithms - Deep learning algorithms give the best results when tailored to the task at hand, including the acoustic environment (e.g. noise), the specific language spoken, the range of vocabulary, etc. These algorithms are not always easy to adapt once deployed.
  • Data - Building an ASR system for different languages and different acoustic environments requires large quantities of multiple types of data. Such data may not always be available or may not be suitable for the use case.
  • Scale - ASR systems that can support massive amounts of usage and many languages typically consume large amounts of computational power.

One of the ASR system modules that exemplifies these challenges is the language model. Language models are a key part of most state-of-the-art ASR systems; they provide linguistic context that helps predict the proper sequence of words and distinguish between words that sound similar. With recent machine learning breakthroughs, speech recognition developers are now using language models based on deep learning, known as neural language models. In particular, recurrent neural language models have shown superior results over classic statistical approaches.

However, the training and deployment of neural language models is complicated and highly time-consuming. For IntelligentWire, the integration of TensorFlow into Kaldi has reduced the ASR development cycle by an order of magnitude. If a language model already exists in TensorFlow, then going from model to proof of concept can take days rather than weeks; for new models, the development time can be reduced from months to weeks. Deploying new TensorFlow models into production Kaldi pipelines is straightforward as well, providing big gains for anyone working directly with Kaldi as well as the promise of more intelligent ASR systems for everyone in the future.

Similarly, this integration provides TensorFlow developers with easy access to a robust ASR platform and the ability to incorporate existing speech processing pipelines, such as Kaldi's powerful acoustic model, into their machine learning applications. Kaldi modules that feed the training of a TensorFlow deep learning model can be swapped cleanly, facilitating exploration, and the same pipeline that is used in production can be reused to evaluate the quality of the model.

We hope this Kaldi-TensorFlow integration will bring these two vibrant open-source communities closer together and support a wide variety of new speech-based products and related research breakthroughs. To get started using Kaldi with TensorFlow, please check out the Kaldi repo and also take a look at an example for Kaldi setup running with TensorFlow.