
Dialogflow adds versioning and other new features to help enterprises build vibrant conversational experiences

At Google I/O recently, we announced that Dialogflow has been adopted by more than half a million developers, and that number is growing rapidly. We also released new features that make it easier for enterprises to create, maintain, and improve a vibrant conversational experience powered by AI: versions and environments, an improved history tool for easier debugging, and support for training models with negative examples.

Versions and Environments BETA
Dialogflow’s new versions and environments feature gives enterprises a familiar approach for building, testing, deploying, and iterating conversational interfaces. Using this beta feature, you can deploy multiple versions of agents (which represent the conversational interface of your application, device, or bot in Dialogflow) to separate, customizable environments, giving you more control over how new features are built, tested, and deployed. For example, you may want to maintain one version of your agent in production and another in your development environment, do quality assurance on a version that contains just a subset of features, or develop different draft versions of your agent at the same time.
Creating versions and environments
Publishing a version

Managing versions within an environment

You can publish your agent either to Google Assistant production or to the Actions Console's Alpha and Beta testing environments. You can even invite and manage testers of your digital agents—up to 20 for Alpha releases, and up to 200 for Beta releases!

Get started right away: Explore this tutorial to learn more about how to activate and use versions and environments in Dialogflow.

Improved conversation history tool for easier debugging
Dialogflow’s history tool now cleanly displays conversations between your users and your agent, and flags places where your agent was unable to match an intent. It also links to diagnostics via new integration with Google Stackdriver Logging so you can easily diagnose and quickly fix performance issues. We’ve also expanded the diagnostic information shown in the test console, so you can see the raw request being sent to your fulfillment webhook as well as the response.

Training with negative examples for improved precision
It can be frustrating for end users when certain phrases trigger unwanted intents. To improve your digital agent’s precision, you can now add negative examples as training phrases for fallback intents. For example, suppose an agent only books flights. By adding the negative example “Buy bus ticket to San Francisco” to the Default Fallback Intent, the agent will no longer classify that request as a flight-purchase intent; instead, it will respond by clarifying which methods of transportation it supports.
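To build intuition for why a negative example helps, here is a toy sketch of intent matching. The scoring function and intent names are hypothetical illustrations, not Dialogflow's actual algorithm: a phrase that partially matches a real intent can be pulled into the fallback instead, once the fallback holds a closer negative example.

```python
# Toy sketch: how a negative example in a fallback intent can win over a
# partial match against a real intent. Names and scoring are hypothetical.

def score(phrase, examples):
    """Crude similarity: best Jaccard word overlap with any training example."""
    words = set(phrase.lower().split())
    best = 0.0
    for example in examples:
        ex_words = set(example.lower().split())
        best = max(best, len(words & ex_words) / len(words | ex_words))
    return best

INTENTS = {
    "book_flight": ["Buy plane ticket to San Francisco", "Book a flight to New York"],
    "fallback": ["Buy bus ticket to San Francisco"],  # the negative example
}

def classify(phrase):
    scores = {name: score(phrase, examples) for name, examples in INTENTS.items()}
    return max(scores, key=scores.get)
```

With this setup, "Buy bus ticket to San Francisco" lands in the fallback, while "Buy plane ticket to San Francisco" still matches the flight-booking intent.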

Try Dialogflow today using a free credit
See the quickstart to set up a Google Cloud Platform project and quickly create a digital agent with Dialogflow Enterprise Edition. Remember, you get a $300 free credit to get started with any GCP product (good for 12 months).

Scale big while staying small with serverless on GCP — the Guesswork.co story

[Editor’s note: Mani Doraisamy built two products—Guesswork.co and CommerceDNA—on top of Google Cloud Platform. In this blog post he shares insights into how his application architecture evolved to support the changing needs of his growing customer base while still staying cost-effective.]

Guesswork is a machine learning startup that helps e-commerce companies in emerging markets recommend products for first-time buyers on their site. Large and established e-commerce companies can analyze their users' past purchase history to predict what product they are most likely to buy next and make personalized recommendations. But in developing countries, where e-commerce companies are mostly focused on attracting new users, there’s no history to work from, so most recommendation engines don’t work for them. Here at Guesswork, we can understand users and recommend them relevant products even if we don’t have any prior history about them. To do that, we analyze lots of data points about where a new user is coming from (e.g., did they come from an email campaign for t-shirts, or a fashion blog about shoes?) to find every possible indicator of intent. Thus far, we’ve worked with large e-commerce companies around the world such as Zalora (Southeast Asia), Galeries Lafayette Group (France) and Daraz (South Asia).

Building a scalable system to support this workload is no small feat. In addition to processing high data volumes for each customer, we also need to handle hundreds of millions of users every month, plus the traffic spikes that happen during peak shopping seasons.

As a bootstrapped startup, we had three key goals while designing the system:

  1. Stay small. As a small team of three developers, we didn’t want to add any additional personnel even if we needed to scale up for a huge volume of users.
  2. Stay profitable. Our revenue is based on the performance of our recommendation engine. Instead of a recurring fee, customers pay us a commission on sales to their users that come from our recommendations. This business model made our application architecture and infrastructure costs a key factor in our ability to turn a profit.
  3. Embrace constraints. In order to increase our development velocity and stay flexible, we decided to trade off control over our development stack and embrace constraints imposed by managed cloud services.

These three goals turned into our motto: "I would rather optimize my code than fundraise." By turning our business goals into a coding problem, we also had so much more fun. I hope you will too, as I recount how we did it.

Choosing a database: The Three Musketeers

The first layer we focused on was the database. Since we wanted to build on top of managed services, we decided to go with Google Cloud Platform (GCP)—a best-in-class option when it comes to scaling, in our opinion.

But, unlike traditional databases, cloud databases are not general purpose. They are specialized. So we picked three separate databases for transactional, analytical and machine learning workloads. We chose:

  • Cloud Datastore for our transactional database, because it supports a high volume of writes. In our case, user events number in the billions and are updated into Cloud Datastore in real time.
  • BigQuery to analyze user behaviour. For example, we understand from BigQuery that users coming from a fashion blog usually buy a specific type of formal shoes.
  • Vision API to analyze product images and categorize products. Since we work with e-commerce companies across different geographies, the product names and descriptions are in different languages, and categorizing products based on images is more efficient than text analysis. We use this data along with user behaviour data from BigQuery and Cloud Datastore to make product recommendations.

First take: the App Engine approach

Once we chose our databases, we moved on to selecting the front-end service to receive user events from e-commerce sites and update Cloud Datastore. We chose App Engine, since it is a managed service that scales well at our volumes. Once App Engine updates the user events in Cloud Datastore, we synchronize that data into BigQuery and our recommendation engine using Cloud Dataflow, another managed service that orchestrates different databases in real time (i.e., in streaming mode).

This architecture powered the first version of our product. As our business grew, our customers started asking for new features. One request was to alert users when the price of a product changed. So, in the second version, we began listening for price changes on our customers' e-commerce sites and triggering alert events. A product’s price is already recorded as a user event in Cloud Datastore, but to detect a change:

  • We compare the price we receive in the user event with the product master and determine if there is a difference.
  • If there is a difference, we propagate it to the analytical and machine learning databases to trigger an alert and reflect that change in the product recommendation.
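The two steps above can be sketched as a single compare-and-propagate function. This is an illustrative sketch only (the field names and in-memory product master are hypothetical); the key point is that every incoming event triggers a read against the product master, which is what drove up Datastore read costs:

```python
# Sketch of the version-two flow: compare each incoming user event against
# the product master and emit a change record only when the price differs.
# Field names and the dict-based "master" are hypothetical stand-ins.

PRODUCT_MASTER = {"sku-123": {"price": 49.99}}

def detect_price_change(event, master=PRODUCT_MASTER):
    """Return a change record if the event's price differs from the master, else None."""
    sku = event["sku"]
    old_price = master[sku]["price"]  # in production this was a Datastore read
    new_price = event["price"]
    if new_price == old_price:
        return None
    master[sku]["price"] = new_price  # record the new price
    # This record would be propagated to the analytical and ML databases:
    return {"sku": sku, "old": old_price, "new": new_price}

change = detect_price_change({"sku": "sku-123", "price": 39.99})
```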

There are millions of user events every day. Comparing each user event with the product master increased the number of reads on our datastore dramatically. Since each Cloud Datastore read counts toward our monthly GCP bill, this drove our costs to an unsustainable level.

Take two: the Cloud Functions approach

To bring down our costs, we had two options for redesigning our system:

  • Use memcache to load the product master in memory and compare the price/stock for every user event. With this option, we had no guarantee that memcache would be able to hold so many products in memory. So, we might miss a price change and end up with inaccurate product prices.
  • Use Cloud Firestore to record user events and product data. Firestore has an option to trigger Cloud Functions whenever there’s a change in value of an entity. In our case, the price/stock change automatically triggers a cloud function that updates the analytical and machine learning databases.

During our redesign, Firestore and Cloud Functions were in alpha, but we decided to use them as it gave us a clean and simple architecture:

  • With Firestore, we replaced both App Engine and Datastore. Firestore was able to accept user requests directly from a browser without the need for a front-end service like App Engine. It also scaled well like Datastore.
  • We used Cloud Functions not only as a way to trigger price/stock alerts, but as an orchestration tool to synchronize data between Firestore, BigQuery and our recommendation engine.
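The trigger-based design inverts the costly compare-on-every-event flow: Firestore hands the function before and after snapshots of the changed document, so no extra reads against a product master are needed. Here is a pure-logic sketch of such a handler (the actual Cloud Functions runtime of the time was Node.js, and these field names are hypothetical):

```python
# Sketch of a Firestore-trigger-style handler: the trigger delivers both the
# old and new document state, so the function fires downstream work only when
# the price field actually changed. Names are illustrative.

alerts = []  # stand-in for the downstream alerting / analytics sinks

def on_product_write(before, after):
    """Hypothetical trigger handler: propagate only genuine price changes."""
    if before.get("price") == after.get("price"):
        return  # no-op: nothing to propagate, and no product-master read needed
    alerts.append({
        "sku": after["sku"],
        "old": before.get("price"),
        "new": after["price"],
    })

on_product_write({"sku": "sku-123", "price": 49.99}, {"sku": "sku-123", "price": 39.99})
on_product_write({"sku": "sku-123", "price": 39.99}, {"sku": "sku-123", "price": 39.99})
```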

It turned out to be a good decision, as Cloud Functions scaled extremely well, even in alpha. For example, we went from one to 20 million users on Black Friday. In this new architecture, Cloud Functions replaced Dataflow’s streaming functionality with triggers, while providing a more intuitive language (JavaScript) than Dataflow’s pipeline transformations. Eventually, Cloud Functions became the glue that tied all the components together.

What we gained

Thanks to the flexibility of our serverless microservice-oriented architecture, we were able to replace and upgrade components as the needs of our business evolved without redesigning the whole system. We achieved the key goal of being profitable by using the right set of managed services and keeping our infrastructure costs well below our revenue. And since we didn't have to manage any servers, we were also able to scale our business with a small engineering team and still sleep peacefully at night.

Additionally, we saw some great outcomes that we didn't initially anticipate:

  • We increased our sales commissions by improving recommendation accuracy

    The best thing about this new version was the ability to A/B test new algorithms. For example, we found that users who browse e-commerce sites on an Android phone are more likely to buy products that are on sale. So, we included the user’s device as a feature in the recommendation algorithm and tested it with a small sample set. Since Cloud Functions are loosely coupled (via Cloud Pub/Sub), we could implement a new algorithm and redirect users based on their device and geography. Once the algorithm produced good results, we rolled it out to all users without taking down the system. With this approach, we were able to continuously improve the accuracy of our recommendations, increasing revenue.
  • We reduced costs by optimizing our algorithm

    As counterintuitive as it may sound, we also found that paying more for compute didn't improve accuracy. For example, we compared a month of a user’s events against just the latest session’s events for predicting what the user was likely to buy next. We found that the latest session was more accurate even though it had fewer data points. The simpler and more intuitive the algorithm, the better it performed. Since Cloud Functions are modular by design, we were able to refactor each module and reduce costs without losing accuracy.
  • We reduced our dependence on external IT teams and signed more customers 

    We work with large companies and depending on their IT team, it can take a long time to integrate our solution. Cloud Functions allowed us to implement configurable modules for each of our customers. For example, while working with French e-commerce companies, we had to translate the product details we receive in the user events into English. Since Cloud Functions supports Node.js, we enabled scriptable modules in JavaScript for each customer that allowed us to implement translation on our end, instead of waiting for the customer’s IT team. This reduced our go-live time from months to days, and we were able to sign up new customers who otherwise might not have been able to invest the necessary time and effort up-front.
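The device-based A/B routing described in the first point above can be sketched as deterministic bucketing: hash a stable user id so each user always lands in the same arm, and send only a small percentage of the target device's traffic to the candidate algorithm. All names and percentages here are hypothetical:

```python
# Sketch of deterministic A/B routing by device (hypothetical names):
# Android users are sampled into the candidate recommendation algorithm,
# everyone else stays on the control algorithm.

import hashlib

def bucket(user_id, device, sample_pct=5):
    """Route sample_pct% of Android users to the candidate arm, stably per user."""
    if device != "android":
        return "control"
    digest = hashlib.sha256(user_id.encode()).hexdigest()
    return "candidate" if int(digest, 16) % 100 < sample_pct else "control"
```

Because the hash is stable, a given user sees a consistent experience across sessions, which keeps the experiment's results clean.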

Since Cloud Functions was alpha at the time, we did face challenges while implementing non-standard functionality such as running headless Chrome. In such cases, we fell back on App Engine flexible environment and Compute Engine. Over time though, the Cloud Functions product team moved most of our desired functionality back into the managed environment, simplifying maintenance and giving us more time to work on functionality.

Let a thousand flowers bloom

If there is one takeaway from this story, it is this: running a bootstrapped startup that serves 100 million users with three developers was unheard of just five years ago. With the relentless pursuit of abstraction among cloud platforms, this has become a reality. Serverless computing is at the bleeding edge of this abstraction. Among serverless computing products, I believe Cloud Functions has a leg up on its competition because it stands on the shoulders of GCP's data products and their near-infinite scale. By combining simplicity with scale, Cloud Functions is the glue that makes GCP greater than the sum of its parts. The day has come when a bootstrapped startup can build a large-scale application like Gmail or Salesforce. You just read one such story—now it’s your turn :)

Registration for the Associate Cloud Engineer beta exam is now open

Mastering a discipline means learning the fundamentals of the craft before aspiring to the next level.

To that end, we’ve developed a new certification exam, Associate Cloud Engineer, which identifies individuals who have the foundational skills necessary to use Google Cloud Console to deploy applications, monitor operations, and manage enterprise solutions. We are excited to announce that registration for the Associate Cloud Engineer beta exam is now open.

As businesses move in growing numbers to cloud-based environments, the need to hire or fill existing skills gaps with individuals proficient in cloud technology has skyrocketed. Unfortunately, there is a clear lack of people with the requisite skills to work with cloud technologies.

If you’re an aspiring cloud architect or data engineer who is technically proficient in the Google Cloud environment but don’t have years of experience designing cloud solutions, this certification is for you. The Associate Cloud Engineer is an entry point to our professional-level cloud certifications, Cloud Architect and Data Engineer, which recognize individuals who can use Google Cloud Platform (GCP) to solve more complex and strategic business problems.

Demonstrate that you have mastered the fundamental cloud skills as an Associate Cloud Engineer so you can take your next steps to become a Google Cloud Certified professional.

  • The beta exam is now open for registration. The testing period runs May 9-30, 2018
  • To earn this certification, you must successfully pass our Associate Cloud Engineer exam
  • Save 40% on the cost of certification by participating in this beta
  • The length of the exam is four hours 

When you become an Associate Cloud Engineer, you show potential employers that you have the essential skills to work on GCP. So, what are you waiting for?

Register to take the beta exam today.

BigQuery arrives in the Tokyo region

We launched BigQuery, our enterprise data warehouse solution, in 2011 so that our customers could leverage the processing power of Google's infrastructure to perform super-fast SQL queries. And although BigQuery is already available to all our customers no matter where they’re located, many enterprises need additional options for storing data and performing analysis in the countries where they operate. That’s why we’re rolling out regional availability for BigQuery. Google Cloud’s Tokyo region is our first expansion site, but it’s only the first step in a globally phased rollout that will continue throughout 2018 and 2019.

By bringing BigQuery availability to places like Tokyo, we’re helping more enterprises analyze and draw actionable insights from their business data at scales that would be prohibitively expensive or impossible with legacy data warehouses. Mizuho Bank—a leading global bank with one of the largest customer bases in Japan and a global network of financial and business centers—is one of many businesses exploring the potential of local BigQuery resources newly at their disposal.
“We believe BigQuery will become a significant driver to transform the way our data analysts work. Mizuho Bank currently runs SQL queries in on-premise data warehouses, but we used to experience issues of long processing time due to a fixed limit on our data-processing resources. We have wanted to move the data to the cloud for some time. Now that BigQuery is available at Google Cloud’s Tokyo region, we were able to conduct a proof of concept (PoC) under the conditions that satisfy Mizuho’s security requirements. We used real-world data to design the PoC to reflect our day-to-day operations. 
With BigQuery, we no longer need to worry about limited data-processing resources. We can aggregate processing tasks, perform parallel processing on large queries, and engage multiprocessing in order to substantially reduce working hours of data analysts collaborating across multiple departments. This streamlining can potentially generate more time for our data analysts to work on designing queries and models, and then interpreting the results. BigQuery can not only achieve cost-savings over the existing system but also provides integrated services that are very easy to use. 
The PoC has been a great success as we were able to confirm that by using data processing and visualization features like Dataprep and Data Studio with BigQuery, we can conduct all these data analysis processes seamlessly in the cloud.”

—Yoshikazu Kurosu, Senior Manager of Business Development Department, Mizuho Bank, Ltd. 

Financial services organizations are one of several industries that depend on the robust security Google Cloud offers when storing and analyzing sensitive data. Another industry that shares the same challenges is telecom providers. NTT Communications, a global leader in information and communication technology (ICT) solutions, is also looking to BigQuery due to its speed and scale.
“We’ve been using BigQuery on our Enterprise Cloud, a service we provide for enterprises, to detect and analyze anomalies in our network and servers. In our proof of concept (PoC) using BigQuery in Google Cloud’s Tokyo region, we performed evaluations of large-scale log data (cumulative total of 27.8 billion records) streamed real-time from nine regions around the world. Data analytics infrastructure requires real-time handling of extensive log data generated by both equipment and software. Our infrastructure also depends on speed and power to perform ultra-high speed analysis and output of time-series data.

BigQuery achieves high-speed response even in emergency situations, offers an excellent cost-performance ratio, and enables usage and application of large-scale log data that exceeds the capabilities of traditional data warehouses. We will continue to strengthen our cooperation with GCP services, BigQuery included, to provide cloud solutions to support secure data usage on behalf of our customers.” 
—Masaaki Moribayashi, Senior Vice President, Cloud Services, NTT Communications Corporation 
We hope regional availability will help more enterprises use BigQuery to store and analyze their sensitive data in a way that addresses local requirements.

To learn more about BigQuery, visit our website. And to get started using BigQuery in the US, EU or Tokyo, read our documentation on working with Dataset Locations.

Dialogflow Enterprise Edition is now generally available

Back in November, we announced the beta of Dialogflow Enterprise Edition. Today, on the heels of introducing Cloud Text-to-Speech and updating Cloud Speech-to-Text, we’re releasing Dialogflow Enterprise Edition for general availability, complete with support and a Service Level Agreement (SLA) that businesses need for their production deployments. It’s been a busy month for Cloud AI and conversational technology!

Hundreds of thousands of developers already use Dialogflow (formerly API.AI) to build voice- and text-based conversational experiences powered by machine learning and natural language understanding. Even if you have little or no experience in conversational interfaces, you can build natural and rich experiences in days or weeks with the easy-to-use Dialogflow platform. Further, Dialogflow's cross-platform capabilities let you build the experience once and launch it across websites, mobile apps and a variety of popular platforms and devices, including the Google Assistant, Amazon Alexa, and Facebook Messenger.

Starting today, Dialogflow API V2 is generally available and is now the default for all new agents. It integrates with Google Cloud Speech-to-Text, enables agent management via API, supports gRPC, and provides an easy transition to Enterprise Edition with no code migration.
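To give a sense of the V2 surface, here is a sketch of a detectIntent request body assembled as a plain dict. The field names follow the v2 REST API; no network call is made, and the example text is illustrative:

```python
# Sketch of a Dialogflow API V2 detectIntent request body. The structure
# mirrors the v2 REST surface (queryInput.text.text / languageCode); this
# builds the payload only and does not call the API.

def build_detect_intent_request(text, language_code="en-US"):
    return {
        "queryInput": {
            "text": {
                "text": text,
                "languageCode": language_code,
            }
        }
    }

request = build_detect_intent_request("Book a flight to Tokyo")
```

The same payload shape works over REST or gRPC, which is part of what makes the transition to Enterprise Edition a configuration change rather than a code migration.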

We're constantly expanding Dialogflow capabilities to provide you with the best developer experience, and we've added a number of other new features since the beta.

KLM Royal Dutch Airlines, Domino’s and Ticketmaster have built conversational experiences with Dialogflow, helping them meet their customers where they are and assist them across their journeys. And for DPD, the UK’s leading parcel delivery company, “Dialogflow made it easy to build an AI-powered conversational experience that delights consumers using the resources and skill sets we already have. We estimate that Dialogflow helped us get our conversational interface to market 12 months sooner than planned,” says Max Glaisher, Product Innovation Manager at DPD.

Innovative businesses across industries are adopting Dialogflow Enterprise Edition

Companies are now turning to Dialogflow Enterprise Edition as they scale up to engage with growing numbers of users across multiple platforms.


Video game publisher Ubisoft uses Dialogflow as part of “Sam,” a personal gaming assistant that delivers personalized information and tips related to its video games and services. “Using Dialogflow Enterprise Edition, Ubisoft has access to a natural language processing system for Sam that can understand text and voice input efficiently out of the box,” says Thomas Belmont, Producer at Ubisoft. He adds that while developing Sam, “The team needed tools that let them iterate quickly and make modifications immediately, and Dialogflow Enterprise Edition was the best choice for those needs.” Dialogflow has also helped ensure a good user experience, with Sam answering 88% of player (user) requests in its first three months as a beta.

Best Buy Canada

To enhance the online shopping experience, Best Buy Canada built a conversational experience to make it quicker and easier for consumers to find the information they need using the Google Assistant. “Using Dialogflow, we've been able to steadily increase user sessions with our agent,” says Chris Rowinski, Product Owner at Best Buy Canada. “In the coming months, we plan on moving to Dialogflow Enterprise Edition so we can continue to scale up as our users and user engagement grow on voice- and screen-based devices."

(Experience available in Canada only)


Ticketmaster, the world’s top live-event ticketing company, picked Dialogflow to build a conversational experience for event discovery and ticketing, and is now also developing a solution to optimize interactive voice response (IVR) for customer service. “I remember how excited I was the first time I saw Dialogflow; my mind started racing with ideas about how Ticketmaster could benefit from a cloud-based natural language processing provider,” says Tariq El-Khatib, Product Manager at Ticketmaster. “Now with the launch of Dialogflow Enterprise Edition, I can start turning those ideas into reality. With higher transaction quotas and support levels, we can integrate Dialogflow with our Customer Service IVR to increase our rate of caller intent recognition and improve customer experience.”

Try Dialogflow today using a free credit

See the quickstart to set up a Google Cloud Platform project and quickly create a Dialogflow Enterprise Edition agent. Remember, you get a $300 free credit to get started with any GCP product (good for 12 months).

Toward better phone call and video transcription with new Cloud Speech-to-Text

It’s been full speed ahead for our Cloud AI speech products as of late. Last month, we introduced Cloud Text-to-Speech, our speech synthesis API featuring DeepMind WaveNet models. And today, we’re announcing the largest overhaul of Cloud Speech-to-Text (formerly known as Cloud Speech API) since it was introduced two years ago.

We first unveiled the Cloud Speech API in 2016, and it’s been generally available for almost a year now, with usage more than doubling every six months. Today, with the opening of NAB and SpeechTek conferences, we’re introducing new features and updates that we think will make Speech-to-Text much more useful for business, including phone-call and video transcription.

Cloud Speech-to-Text now supports:

  1. A selection of pre-built models for improved transcription accuracy from phone calls and video
  2. Automatic punctuation, to improve readability of transcribed long-form audio
  3. A new mechanism (recognition metadata) to tag and group your transcription workloads, and provide feedback to the Google team
  4. A standard service level agreement (SLA) with a commitment to 99.9% availability

Let’s take a deeper look at the new updates to Cloud Speech-to-Text.

New video and phone call transcription models

There are lots of different ways to use speech recognition technology—everything from human-computer interaction (e.g., voice commands or IVRs) to speech analytics (e.g., call center analytics). In this version of Cloud Speech-to-Text, we’ve added models that are tailored for specific use cases—e.g., phone call transcription and transcription of audio from video.
For example, for processing phone calls, we’ve routed incoming US English phone-call requests to a model that's optimized to handle phone calls—one that many customers consider best-in-class in the industry. Now we’re giving customers the power to explicitly choose the model they prefer rather than rely on automatic model selection.

Most major cloud providers use speech data from incoming requests to improve their products. Here at Google Cloud, we’ve avoided this practice, but customers routinely request that we use real data that's representative of theirs, to improve our models. We want to meet this need, while being thoughtful about privacy and adhering to our data protection policies. That’s why today, we’re putting forth one of the industry’s first opt-in programs for data logging, and introducing a first model based on this data: enhanced phone_call.

We developed the enhanced phone_call model using data from customers who volunteered to share their data with Cloud Speech-to-Text for model enhancement purposes. Customers who choose to participate in the program going forward will gain access to this and other enhanced models that result from customer data. The enhanced phone_call model has 54% fewer errors than our basic phone_call model for our phone call test set.
In addition, we’re also unveiling the video model, which has been optimized to process audio from videos and/or audio with multiple speakers. The video model uses machine learning technology similar to that used by YouTube captioning, and shows a 64% reduction in errors compared to our default model on a video test set.

Both the enhanced phone_call model and the premium-priced video model are now available for en-US transcription, and will soon be available for additional languages. We also continue to offer our existing command_and_search model for voice commands and search, as well as our default model for long-form transcription.
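Model selection is a field on the recognition config. Here is a sketch of such a config as a plain dict, using field names from the beta REST surface of the time (verify them against current documentation before relying on them); the helper itself is hypothetical:

```python
# Sketch of a Speech-to-Text recognition config that explicitly selects a
# model. Field names follow the beta REST surface ("model", "useEnhanced",
# "enableAutomaticPunctuation"); treat them as illustrative.

def build_config(model, use_enhanced=False, language_code="en-US"):
    config = {
        "languageCode": language_code,
        "model": model,                      # "phone_call", "video",
                                             # "command_and_search", or "default"
        "enableAutomaticPunctuation": True,  # beta: punctuate the transcript
    }
    if use_enhanced:
        config["useEnhanced"] = True         # enhanced models require opting in
    return config

phone_config = build_config("phone_call", use_enhanced=True)
```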
Check out the demo on our product website to upload an audio file and see transcription results from each of these models.

Generate readable text with automatic punctuation

Most of us learn how to use basic punctuation (commas, periods, question marks) by the time we leave grade school. But properly punctuating transcribed speech is hard to do. Here at Google, we learned just how hard it can be from our early attempts at transcribing voicemail messages, which produced run-on sentences that were notoriously hard to read.
A few years ago, Google started providing automatic punctuation in our Google Voice voicemail transcription service. Recently, the team created a new LSTM neural network to improve punctuation in long-form speech transcription. Architected with performance in mind, the model is now available to you in beta in Cloud Speech-to-Text, and automatically suggests commas, question marks, and periods for your text.

Describe your use cases with recognition metadata

The progress we've made with Cloud Speech-to-Text is due in large part to the feedback you’ve given us over the last two years, and we want to open up those lines of communication even further, with recognition metadata. Now, you can describe your transcribed audio or video with tags such as “voice commands for a shopping app” or “basketball sports tv shows.” We then aggregate this information across Cloud Speech-to-Text users to prioritize what we work on next. Providing recognition metadata increases the probability that your use case will improve with time, but the program is entirely optional.
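Recognition metadata is attached to the same recognition config. The sketch below shows the shape using field names from the beta REST surface of the time ("interactionType", "audioTopic"); the helper and the example tags are illustrative, not authoritative:

```python
# Sketch of tagging a recognition config with recognition metadata so the
# workload can be grouped and fed back to the Speech-to-Text team. Field
# names follow the beta REST surface; treat as illustrative.

def with_metadata(config, interaction_type, audio_topic):
    config = dict(config)  # copy so the original config is untouched
    config["metadata"] = {
        "interactionType": interaction_type,  # e.g. "VOICE_COMMAND", "PHONE_CALL"
        "audioTopic": audio_topic,            # free-text tag for the workload
    }
    return config

tagged = with_metadata({"languageCode": "en-US"}, "VOICE_COMMAND",
                       "voice commands for a shopping app")
```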

Customer references

We’re really excited about this new version of Cloud Speech-to-Text, but don’t just take our word for it—here’s what our customers have to say.
“Unstructured data, like audio, is full of rich information but many businesses struggle to find applications that make it easy to extract value from it and manage it. Descript makes it easier to edit and view audio files, just like you would a document. We chose to power our application with Google Cloud Speech-to-Text. Based on our testing, it’s the most advanced speech recognition technology and the new video model had half as many errors as anything else we looked at. And, with its simple pricing model, we’re able to offer the best prices for our users.”  
—Andrew Mason, CEO, Descript
"LogMeIn’s GoToMeeting provides market leading collaboration software to millions of users around the globe. We are always looking for the best customer experience and after evaluating multiple solutions to allow our users to transcribe meetings we found Google’s Cloud Speech-to-Text’s new video model to be far more accurate than anything else we’ve looked at. We are excited to work with Google to help drive value for our customers beyond the meeting with the addition of transcription for GoToMeeting recordings." 
 – Matt Kaplan, Chief Product Officer, Collaboration Products at LogMeIn
"At InteractiveTel, we've been using Cloud Speech-to-Text since the beginning to power our real-time telephone call transcription and analytics products. We've constantly been amazed by Google's ability to rapidly improve features and performance, but we were stunned by the results obtained when using the new phone_call model. Just by switching to the new phone_call model we experienced accuracy improvements in excess of 64% when compared to other providers, and 48% when compared to Google's generic narrow-band model."  
 – Jon Findley, Lead Product Engineer, InteractiveTel
Access to quality speech transcription technology opens up a world of possibilities for companies that want to connect with and learn from their users. With this update to Cloud Speech-to-Text, you get access to the latest research from our team of machine learning experts, all through a simple REST API. Pricing is $0.006 per 15 seconds of audio for all models except the video model, which is $0.012 per 15 seconds. We'll be providing the new video model for the same price ($0.006 per 15 seconds) for a limited trial period through May 31. To learn more, try out the demo on our product page or visit our documentation.
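To make the pricing above concrete, here is a quick cost check: $0.006 per 15 seconds of audio for standard models, $0.012 per 15 seconds for the video model, with rounding up to 15-second increments assumed per request.

```python
import math

def transcription_cost(seconds: float, rate_per_15s: float) -> float:
    """Cost in USD, assuming billing in 15-second increments, rounded up."""
    return math.ceil(seconds / 15) * rate_per_15s

one_hour = 3600
print(transcription_cost(one_hour, 0.006))  # standard models: ~$1.44
print(transcription_cost(one_hour, 0.012))  # video model: ~$2.88
```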

Introducing Cloud Text-to-Speech powered by DeepMind WaveNet technology

Many Google products (e.g., the Google Assistant, Search, Maps) come with built-in high-quality text-to-speech synthesis that produces natural sounding speech. Developers have been telling us they’d like to add text-to-speech to their own applications, so today we’re bringing this technology to Google Cloud Platform with Cloud Text-to-Speech.

You can use Cloud Text-to-Speech in a variety of ways, for example:
  • To power voice response systems for call centers (IVRs) and enable real-time natural language conversations 
  • To enable IoT devices (e.g., TVs, cars, robots) to talk back to you 
  • To convert text-based media (e.g., news articles, books) into spoken format (e.g., podcast or audiobook)
Cloud Text-to-Speech lets you choose from 32 different voices across 12 languages and variants. It correctly pronounces complex text such as names, dates, times, and addresses for authentic-sounding speech right out of the gate. It also allows you to customize pitch, speaking rate, and volume gain, and supports a variety of audio formats, including MP3 and WAV.
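The customization options above map directly onto the JSON body of the `text:synthesize` REST method. The sketch below is illustrative: the voice name `en-US-Wavenet-A` is one of the published voice names, but treat the specific values as assumptions for your own setup.

```python
def build_synthesize_request(text: str) -> dict:
    """Build the JSON body for a text:synthesize call."""
    return {
        "input": {"text": text},
        "voice": {"languageCode": "en-US", "name": "en-US-Wavenet-A"},
        "audioConfig": {
            "audioEncoding": "MP3",   # MP3 and LINEAR16 (WAV) are supported
            "pitch": 0.0,             # in semitones; negative lowers the voice
            "speakingRate": 1.0,      # 1.0 is normal speed
            "volumeGainDb": 0.0,
        },
    }

body = build_synthesize_request("Hello from Cloud Text-to-Speech!")
```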

Rolling in the DeepMind

In addition, we're excited to announce that Cloud Text-to-Speech also includes a selection of high-fidelity voices built using WaveNet, a generative model for raw audio created by DeepMind. WaveNet synthesizes more natural-sounding speech and, on average, produces speech audio that people prefer over other text-to-speech technologies.

In late 2016, DeepMind introduced the first version of WaveNet, a neural network trained with a large volume of speech samples that's able to create raw audio waveforms from scratch. During training, the network extracts the underlying structure of the speech, for example, which tones follow one another and what shape a realistic speech waveform should have. When given text input, the trained WaveNet model generates the corresponding speech waveforms, one sample at a time, achieving higher accuracy than alternative approaches.

Fast forward to today, and we're now using an updated version of WaveNet that runs on Google’s Cloud TPU infrastructure. The new, improved WaveNet model generates raw waveforms 1,000 times faster than the original model, and can generate one second of speech in just 50 milliseconds. In fact, the model is not just quicker, but also higher-fidelity, capable of creating waveforms with 24,000 samples a second. We’ve also increased the resolution of each sample from 8 bits to 16 bits, producing higher quality audio for a more human sound.
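A quick consistency check on those figures: generating one second of audio in 50 ms is 20x faster than real time, which implies the original model would have needed roughly 50 seconds per second of audio.

```python
# Figures quoted above: 50 ms per second of 24 kHz audio, 1,000x speed-up.
generation_ms_per_audio_second = 50
realtime_factor = 1000 / generation_ms_per_audio_second   # 20x real time
original_ms = generation_ms_per_audio_second * 1000       # ~50 s per audio second
samples_per_audio_second = 24000                          # 24 kHz output
print(realtime_factor, original_ms / 1000)
```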
With these adjustments, the new WaveNet model produces more natural sounding speech. In tests, people gave the new US English WaveNet voices an average mean-opinion-score (MOS) of 4.1 on a scale of 1-5 — over 20% better than for standard voices and reducing the gap with human speech by over 70%. As WaveNet voices also require less recorded audio input to produce high quality models, we expect to continue to improve both the variety as well as quality of the WaveNet voices available to Cloud customers in the coming months.
Cloud Text-to-Speech is already helping multiple customers deliver a better experience to their end users. Customers include Cisco and Dolphin ONE.
“As the leading provider of collaboration solutions, Cisco has a long history of bringing the latest technology advances into the enterprise. Google’s Cloud Text-to-Speech has enabled us to achieve the natural sound quality that our customers desire."  
 – Tim Tuttle, CTO of Cognitive Collaboration, Cisco
“Dolphin ONE’s Calll.io telephony platform offers connectivity from a multitude of devices, at practically any location. We’ve integrated Cloud Text-to-Speech into our products and allow our users to create natural call center experiences. By using Google Cloud’s machine learning tools, we’re instantly delivering cutting-edge technology to our users.” 
– Jason Berryman, Dolphin ONE

Get started today

With Cloud Text-to-Speech, you’re now a few clicks away from one of the most advanced speech technologies in the world. To learn more, please visit the documentation or our pricing page. To get started with our public beta or try out the new voices, visit the Cloud Text-to-Speech website.

Cloud TPU machine learning accelerators now available in beta

Starting today, Cloud TPUs are available in beta on Google Cloud Platform (GCP) to help machine learning (ML) experts train and run their ML models more quickly.
Cloud TPUs are a family of Google-designed hardware accelerators that are optimized to speed up and scale up specific ML workloads programmed with TensorFlow. Built with four custom ASICs, each Cloud TPU packs up to 180 teraflops of floating-point performance and 64 GB of high-bandwidth memory onto a single board. These boards can be used alone or connected together via an ultra-fast, dedicated network to form multi-petaflop ML supercomputers that we call “TPU pods.” We will offer these larger supercomputers on GCP later this year.

We designed Cloud TPUs to deliver differentiated performance per dollar for targeted TensorFlow workloads and to enable ML engineers and researchers to iterate more quickly. For example:

  • Instead of waiting for a job to schedule on a shared compute cluster, you can have interactive, exclusive access to a network-attached Cloud TPU via a Google Compute Engine VM that you control and can customize. 
  • Rather than waiting days or weeks to train a business-critical ML model, you can train several variants of the same model overnight on a fleet of Cloud TPUs and deploy the most accurate trained model in production the next day. 
  • Using a single Cloud TPU and following this tutorial, you can train ResNet-50 to the expected accuracy on the ImageNet benchmark challenge in less than a day, all for well under $200! 
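The "well under $200" claim in the last bullet is easy to sanity-check against the beta price quoted later in this post ($6.50 per Cloud TPU per hour, billed by the second). The 20-hour run length below is a hypothetical figure for illustration.

```python
TPU_RATE_PER_HOUR = 6.50  # beta price quoted in this post

def training_cost(hours: float) -> float:
    """USD cost of a single-Cloud-TPU run at the quoted hourly rate."""
    return hours * TPU_RATE_PER_HOUR

# e.g. a hypothetical ~20-hour ResNet-50 run on one Cloud TPU:
print(training_cost(20))  # comfortably under $200
```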

ML model training, made easy

Traditionally, writing programs for custom ASICs and supercomputers has required deeply specialized expertise. By contrast, you can program Cloud TPUs with high-level TensorFlow APIs, and we have open-sourced a set of high-performance reference model implementations for Cloud TPUs, such as ResNet-50 and Transformer, to help you get started right away.

To save you time and effort, we continuously test these model implementations both for performance and for convergence to the expected accuracy on standard datasets.

Over time, we'll open-source additional model implementations. Adventurous ML experts may be able to optimize other TensorFlow models for Cloud TPUs on their own using the documentation and tools we provide.

By getting started with Cloud TPUs now, you’ll be able to benefit from dramatic time-to-accuracy improvements when we introduce TPU pods later this year. As we announced at NIPS 2017, both ResNet-50 and Transformer training times drop from the better part of a day to under 30 minutes on a full TPU pod, no code changes required.

Two Sigma, a leading investment management firm, is impressed with the performance and ease of use of Cloud TPUs.
"We made a decision to focus our deep learning research on the cloud for many reasons, but mostly to gain access to the latest machine learning infrastructure. Google Cloud TPUs are an example of innovative, rapidly evolving technology to support deep learning, and we found that moving TensorFlow workloads to TPUs has boosted our productivity by greatly reducing both the complexity of programming new models and the time required to train them. Using Cloud TPUs instead of clusters of other accelerators has allowed us to focus on building our models without being distracted by the need to manage the complexity of cluster communication patterns." 
– Alfred Spector, Chief Technology Officer, Two Sigma

A scalable ML platform

Cloud TPUs also simplify planning and managing ML computing resources:

  • You can provide your teams with state-of-the-art ML acceleration and adjust your capacity dynamically as their needs change. 
  • Instead of committing the capital, time and expertise required to design, install and maintain an on-site ML computing cluster with specialized power, cooling, networking and storage requirements, you can benefit from large-scale, tightly-integrated ML infrastructure that has been heavily optimized at Google over many years.
  • There’s no more struggling to keep drivers up-to-date across a large collection of workstations and servers. Cloud TPUs are preconfigured—no driver installation required!
  • You are protected by the same sophisticated security mechanisms and practices that safeguard all Google Cloud services.

“Since working with Google Cloud TPUs, we’ve been extremely impressed with their speed—what could normally take days can now take hours. Deep learning is fast becoming the backbone of the software running self-driving cars. The results get better with more data, and there are major breakthroughs coming in algorithms every week. In this world, Cloud TPUs help us move quickly by incorporating the latest navigation-related data from our fleet of vehicles and the latest algorithmic advances from the research community.”
– Anantha Kancherla, Head of Software, Self-Driving Level 5, Lyft
Here at Google Cloud, we want to provide customers with the best cloud for every ML workload and will offer a variety of high-performance CPUs (including Intel Skylake) and GPUs (including NVIDIA’s Tesla V100) alongside Cloud TPUs.

Getting started with Cloud TPUs

Cloud TPUs are available in limited quantities today and usage is billed by the second at the rate of $6.50 USD / Cloud TPU / hour.

We’re thrilled to see the enthusiasm that customers have expressed for Cloud TPUs. To help us manage demand, please sign up here to request Cloud TPU quota and describe your ML needs. We’ll do our best to give you access to Cloud TPUs as soon as we can.

To learn more about Cloud TPUs, join us for a Cloud TPU webinar on February 27th, 2018.

How we built a serverless digital archive with machine learning APIs, Cloud Pub/Sub and Cloud Functions

[Editor’s note: Today we hear from Incentro, a digital service provider and Google partner, which recently built a digital asset management solution on top of GCP. It combines machine learning services like Cloud Vision and Speech APIs to easily find and tag digital assets, plus Cloud Pub/Sub and Cloud Functions for an automated, serverless solution. Read on to learn how they did it.]

Here at Incentro, we have a large customer base among media and publishing companies. Recently, we noticed that our customers struggle with storing and searching for digital media assets in their archives. It’s a cumbersome process that involves a lot of manual tagging. As a result, the videos are often stored without being properly tagged, making it nearly impossible to find and reuse these assets afterwards.

To eliminate this sort of manual labour and to generate more business value from these expensive video assets, we sought to create a solution that would take care of mundane tasks like tagging photos and videos. This solution is called Segona Media (https://segona.io/media), and lets our customers store assets and tag and index their digital assets automatically.

Segona Media management features

Segona Media currently supports images, video and audio assets. For each of these asset types, Google Cloud provides specific managed APIs to extract relevant content from the asset without customers having to tag them manually or transcribe them.

  • For images, Cloud Vision API extracts most of the content we need: labels, landmarks, text and image properties are all extracted and can be used to find an image.
  • For audio, Cloud Speech API showed us tremendous results in transcribing an audio track. After extracting the audio into speech, we also use Google Cloud Natural Language API to discover sentiment and categories in the transcription. This way, users can search for spoken text, but also search for categories of text and even sentiment.
  • For video, we typically use a combination of audio and image analysis. Cloud Video Intelligence API extracts labels and timeframes, and detects explicit content. On top of that, we process the audio track from a video the same way we process audio assets (see above). This way users can search content from the video as well as from spoken text in the video.

Segona Media architecture

The traditional way of developing a solution like this involves getting hardware running, then determining and installing application servers, databases, storage nodes and so on. After developing the solution and getting it into production, you may then come across a variety of familiar challenges: the operating system needs to be updated or upgraded, or databases don't scale to cope with unexpected production data. We didn't want any of this, so after careful consideration, we decided on a completely managed, serverless architecture. That way we’d have no servers to maintain, we could leverage Google’s ongoing API improvements, and our solution could scale to handle the largest archives we could find.

We wanted Segona Media to also be able to easily connect to common tools in the media and publishing industries. Adobe InDesign, Premiere, Photoshop and Digital Asset Management solutions must all be able to easily store and retrieve assets from Segona Media. We solved this by using the GCP APIs that were already in place for storing assets in Google Cloud Storage, and taking it from there. We retrieve assets using the managed Elasticsearch API that runs on GCP.

Each action that Segona Media performs is a separate Google Cloud Function, usually triggered by a Cloud Pub/Sub queue. Using a Pub/Sub queue to trigger a Cloud Function is an easy and scalable way to publish new actions.

Here’s a high-level architecture view of Segona Media:

High Level Architecture

And here's how the assets flow through Segona Media:

  1. An asset is uploaded/stored to a Cloud Storage bucket
  2. This event triggers a Cloud Function, which generates a unique ID, extracts metadata from the file object, moves it to the appropriate bucket (with lifecycle management) and creates the asset in the Elasticsearch index (we run Elastic Cloud hosted on GCP).
  3. This queues up multiple asset processors in Google Cloud Pub/Sub that are specific for an asset type and that extract relevant content from assets using Google APIs.
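Step 2 above can be sketched as a background Cloud Function triggered by a Cloud Storage event. This is a simplified assumption of the shape of such a function, not Incentro's actual implementation; the field names come from the Cloud Storage object metadata, and the Elasticsearch write and Pub/Sub publishing are left as comments.

```python
import uuid

def on_asset_uploaded(event: dict, context=None) -> dict:
    """Entry point for a GCS-triggered Cloud Function (event = file object)."""
    asset_id = str(uuid.uuid4())  # generate a unique ID for the asset
    doc = {
        "id": asset_id,
        "name": event.get("name"),
        "bucket": event.get("bucket"),
        "contentType": event.get("contentType"),
        "size": int(event.get("size", 0)),  # GCS reports size as a string
    }
    # In the real pipeline, this document would be written to the
    # Elasticsearch index, and one Pub/Sub message would be published
    # per asset-type-specific processor.
    return doc

sample_event = {"name": "clip.mp4", "bucket": "uploads",
                "contentType": "video/mp4", "size": "1048576"}
doc = on_asset_uploaded(sample_event)
```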

Media asset management

Now, let's see how Segona Media handles different types of media assets.


Images

Images have a lot of features on which you can search; we extract them via a dedicated microservice processor.

  • We use ImageMagick to extract metadata from the image object itself. We extract all XMP and EXIF metadata that's embedded in the file itself. This information is then added to the Elastic index and makes the image searchable by, for example, copyright info or resolution. 
  • Cloud Vision API extracts labels in the image, landmarks, text and image properties. This takes away manual tagging of objects in the image and makes a picture searchable for its contents.
  • Segona Media also lets customers create custom labels. For example, a television manufacturer might want to know the specific model of a television set in a picture. We’ve implemented custom predictions by building our own TensorFlow models trained on custom data, and we train and run predictions on Cloud ML Engine.
  • For easier serving of all assets, we also create a low resolution thumbnail of every image.


Audio

Processing audio is pretty straightforward. We want to be able to search for spoken text in audio files, so we use Cloud Speech API to extract text from the audio. We then feed the transcription into the Elasticsearch index, making the audio file searchable by every word.


Video

Video is basically the combination of everything we do with images and audio files. There are some minor differences though, so let's see what microservices we invoke for these assets:

  • First of all, we create a thumbnail so we can serve up a low-res image of the video. We take a thumbnail at 50% of the video. We do this by combining FFmpeg and FFprobe in Cloud Functions, and store this thumbnail alongside the video asset. Creating a thumbnail with Cloud Functions and FFmpeg is easy! Check out this code snippet: https://bitbucket.org/snippets/keesvanbemmel/keAkqe 
  • Using the same FFmpeg architecture, we extract the audio stream from the video. This audio stream is then processed like any other audio file: We extract the text from spoken text in the audio stream and add it to the Elastic index so the video itself can also be found by searching for every word that's spoken. We extract the audio stream from the video in a single channel FLAC format as this gives us the best results. 
  • We also extract relevant information from the video contents itself using Cloud Video Intelligence. We extract labels that are in the video as well as the timestamps for when the labels were created. This way, we know which objects are at what point in the video. Knowing the timestamps for a given label is a fantastic way to point a user to not only a video, but the exact moment in the video that contains the object they're looking for.
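The 50%-point thumbnail step in the first bullet can be sketched as follows: probe the duration, then ask FFmpeg for a single frame at the midpoint. The command shape is standard FFmpeg usage, but the paths, scaling filter and helper function are illustrative assumptions, not the snippet linked above.

```python
def thumbnail_command(video_path: str, duration_seconds: float,
                      out_path: str) -> list:
    """Build an FFmpeg command that grabs one frame at 50% of the video."""
    midpoint = duration_seconds / 2  # duration obtained via ffprobe
    return [
        "ffmpeg",
        "-ss", f"{midpoint:.2f}",  # seek to the 50% mark
        "-i", video_path,
        "-vframes", "1",           # capture exactly one frame
        "-vf", "scale=320:-1",     # low-res thumbnail, keep aspect ratio
        out_path,
    ]

cmd = thumbnail_command("/tmp/asset.mp4", 120.0, "/tmp/asset_thumb.jpg")
# run with e.g. subprocess.run(cmd, check=True)
```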

There you have it—a summary of how to do smart media tagging in a completely serverless fashion, without all the OS updates when scaling up or out, or of course, the infrastructure maintenance and support! This way we can focus on what we care about: bringing an innovative, scalable solution to our end customers. Any questions? Let us know! We love to talk about this stuff ;) Leave a comment below, email me at kees.vanbemmel@incentro.com, or find us on Twitter at @incentro_.

Introducing Preemptible GPUs: 50% Off

In May 2015, Google Cloud introduced Preemptible VM instances to dramatically change how you think about (and pay for) computational resources for high-throughput batch computing, machine learning, scientific and technical workloads. Then last year, we introduced lower pricing for Local SSDs attached to Preemptible VMs, expanding preemptible cloud resources to high performance storage. Now we're taking it even further by announcing the beta release of GPUs attached to Preemptible VMs.

You can now attach NVIDIA K80 and NVIDIA P100 GPUs to Preemptible VMs for $0.22 and $0.73 per GPU hour, respectively. This is 50% cheaper than GPUs attached to on-demand instances, whose prices we also recently lowered. Preemptible GPUs are a particularly good fit for large-scale machine learning and other computational batch workloads, as customers can harness the power of GPUs to run distributed batch workloads at predictably affordable prices.

As a bonus, we're also glad to announce that our GPUs are now available in our us-central1 region. See our GPU documentation for a full list of available locations.

Resources attached to Preemptible VMs are the same as equivalent on-demand resources, with two key differences: Compute Engine may shut them down after providing you a 30-second warning, and you can use them for a maximum of 24 hours. This makes them a great choice for distributed, fault-tolerant workloads that don’t continuously require any single instance, and allows us to offer them at a substantial discount. But just like their on-demand equivalents, preemptible pricing is fixed: you always get low cost and financial predictability, and we bill on a per-second basis. Any GPUs attached to a Preemptible VM instance are considered Preemptible and are billed at the lower rate.

To get started, simply append --preemptible to your instance create command in gcloud, set scheduling.preemptible to true in the REST API, or set Preemptibility to "On" in the Google Cloud Platform Console, and then attach a GPU as usual. You can use your regular GPU quota to launch Preemptible GPUs or, alternatively, request a special Preemptible GPU quota that applies only to GPUs attached to Preemptible VMs.
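Putting those pieces together, a gcloud invocation might look like the sketch below. The flags shown exist in gcloud, but the instance name, zone, image, and GPU count are placeholders for your own setup; GPU instances also require the TERMINATE maintenance policy.

```shell
# Hypothetical example: create a preemptible VM with one K80 GPU attached.
gcloud compute instances create preemptible-gpu-worker \
    --zone us-central1-c \
    --accelerator type=nvidia-tesla-k80,count=1 \
    --preemptible \
    --maintenance-policy TERMINATE \
    --image-family ubuntu-1604-lts \
    --image-project ubuntu-os-cloud
```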
For users looking to create dynamic pools of affordable GPU power, Compute Engine’s managed instance groups can be used to automatically re-create your preemptible instances when they're preempted (if capacity is available). Preemptible VMs are also integrated into cloud products built on top of Compute Engine, such as Kubernetes Engine (GKE’s GPU support is currently in preview. The sign-up form can be found here).

Over the years we’ve seen customers do some very exciting things with preemptible resources: everything from satellite image analysis and financial services to quantum physics, computational mathematics and drug screening.
"Preemptible GPU instances from GCP give us the best combination of affordable pricing, easy access and sufficient scalability. In our drug discovery programs, cheaper computing means we can look at more molecules, thereby increasing our chances of finding promising drug candidates. Preemptible GPU instances have advantages over the other discounted cloud offerings we have explored, such as consistent pricing and transparent terms. This greatly improves our ability to plan large simulations, control costs and ensure we get the throughput needed to make decisions that impact our projects in a timely fashion." 
– Woody Sherman, CSO, Silicon Therapeutics

We’re excited to see what you build with GPUs attached to Preemptible VMs. If you want to share stories and demos of the cool things you've built with Preemptible VMs, reach out on Twitter, Facebook or G+.

For more details on Preemptible GPU resources, please check out the preemptible documentation, GPU documentation and best practices. For more pricing information, take a look at our Compute Engine pricing page or try out our pricing calculator. If you have questions or feedback, please visit our Getting Help page.

To get started using Preemptible GPUs today, sign up for Google Cloud Platform and get $300 in credits to try them out.