How to build a conversational app using Cloud Machine Learning APIs, Part 1

For consumers, conversational apps (such as chatbots) are among the most visible examples of machine learning in action. For developers, building a conversational app is an instructive way to understand the value that machine-learning APIs bring to the process of creating completely new user experiences.

In this two-part post, we'll show you how to build an example “tour guide” app for Apple iOS that can see, listen, talk and translate via API.AI (a developer platform for creating conversational experiences) and Google Cloud Machine Learning APIs for Speech, Vision and Translate. You'll also see how easy it is to support multiple languages on these platforms.
The two parts will focus on the following topics:

Part 1
  • Overview
  • Architecture
  • API.AI intents
  • API.AI contexts

Part 2
This post is Part 1. Part 2 will be published in the coming weeks.


Using API.AI

API.AI is a platform for building natural and rich conversational experiences. For our example, it will handle all core conversation flows in the tour guide app. (Note that API.AI provides great documentation and a sample app for its iOS SDK. SDKs for other platforms are also available, so you could easily extend this tour guide app to support Android.)

Create Agent
The first step is to create a “Tour Guide Agent.”

Create Intents
To engage users in a conversation, we first need to understand what users are saying to the agent. We do that with intents and entities. Intents map what your users say to what your conversational experience should do. Entities are used to extract parameter values from user queries.

Each intent contains a set of examples of user input and the desired automated response. To build that set, you need to predict what users will say to open the conversation, and then enter those phrases in the “Add user expression” box. This list doesn’t need to be comprehensive: API.AI uses machine learning to train the agent to understand variations of these examples, and you can continue training the agent on more variations later on. For example, go to the Default Welcome Intent and add some user expressions such as “how are you,” “hello” and “hi” to open the conversation.
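To make the idea concrete, here is a minimal Python sketch of an intent as a plain data structure: a name, a list of example user expressions and a list of text responses. This is only an illustration, not how API.AI is implemented. The `Intent` class, the `match_intent` helper and the response text are hypothetical, and the exact-match lookup stands in for the machine-learned matching that lets the real agent handle variations.

```python
from dataclasses import dataclass


@dataclass
class Intent:
    name: str
    user_expressions: list  # example phrases entered in the console
    responses: list         # text responses the agent may return


def normalize(text):
    """Lowercase the text and drop punctuation so "Hello!" equals "hello"."""
    kept = (ch for ch in text.lower() if ch.isalnum() or ch.isspace())
    return "".join(kept).strip()


def match_intent(intents, utterance):
    """Return the first intent with an example equal to the utterance, else None.

    Exact matching is a deliberate simplification of what a trained agent does.
    """
    wanted = normalize(utterance)
    for intent in intents:
        if any(normalize(example) == wanted for example in intent.user_expressions):
            return intent
    return None


# The expressions come from the example above; the response text is made up.
welcome = Intent(
    name="Default Welcome Intent",
    user_expressions=["how are you", "hello", "hi"],
    responses=["Hi! I am your tour guide."],
)

matched = match_intent([welcome], "Hello!")
print(matched.name if matched else "no intent matched")
```

A phrase that is not in the list, such as “good morning,” returns `None` in this sketch, which is exactly the gap that the agent's training is meant to close.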

After that, add a few more text responses to the intent.
Next, it’s time to work on contexts.

Contexts represent the current context of a user’s request. They're helpful for differentiating phrases that may be vague or have different meanings depending on the user’s preferences or geographic location, the current page in an app or the topic of conversation. Let’s look at an example.

User: Where am I?
Bot: Please upload a nearby picture and I can help find out where you are.
[User uploads a picture of Golden Gate Bridge.]
Bot: You are near Golden Gate Bridge.
User: How much is the ticket?
Bot: Golden Gate Bridge is free to visit.
User: When does it close today?
Bot: Golden Gate Bridge is open 24 hours a day, 7 days a week.
User: How do I get there?
[Bot shows a map to Golden Gate Bridge.]

In the above conversation, when the user asks “How much is the ticket?”, “When does it close today?” or “How do I get there?”, the bot understands that the context is the Golden Gate Bridge.

The next thing to do is to weave intents and contexts together. For our example, each box in the diagram below is an intent and a context; the arrows indicate the relationships between them.

Output Contexts
Contexts are tied to user sessions (identified by a session ID that you pass in API calls). When a user expression is matched to an intent, the intent can set an output context that is shared with subsequent expressions in the same session. You can also add a context when you send the user request to your API.AI agent. In our example, the where intent sets the where output context so that the location intent can be matched in the future.
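As a rough illustration of sending a context along with a user request, the Python sketch below builds a request body that carries a session ID and an explicit context. Nothing is sent over the network. The `build_query` helper is hypothetical, and the field names (`query`, `lang`, `sessionId`, and `contexts` entries with `name`, `lifespan` and `parameters`) reflect our reading of API.AI's HTTP query request, so check them against the API reference before relying on them.

```python
import json
import uuid


def build_query(text, session_id, contexts=None, lang="en"):
    """Build a query request body (assumed field names; no request is sent)."""
    body = {"query": text, "lang": lang, "sessionId": session_id}
    if contexts:
        # Contexts supplied by the client are added to the session's contexts.
        body["contexts"] = contexts
    return body


# One session ID per user conversation, reused for every request in it.
session_id = str(uuid.uuid4())

payload = build_query(
    "How much is the ticket?",
    session_id,
    contexts=[
        {
            "name": "location",
            "lifespan": 5,
            "parameters": {"location": "Golden Gate Bridge"},
        }
    ],
)
print(json.dumps(payload, indent=2))
```

Reusing the same session ID across requests is what lets the agent keep contexts alive between one user expression and the next.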

Input Contexts
Input contexts restrict an intent so that it is matched only when certain contexts are set. In our example, the location intent’s input context is set to where, so the location intent is matched only while the where context is active.
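The gating rule can be sketched in a few lines of Python: an intent is eligible only if every one of its input contexts is currently active. The `INTENTS` table and the `eligible_intents` function are hypothetical names for this illustration, and the sketch ignores the phrase matching that the agent also performs.

```python
# Intent name -> list of input contexts that must be active for a match.
INTENTS = {
    "where": [],            # root intent: no input context required
    "location": ["where"],  # only eligible while the where context is set
}


def eligible_intents(intents, active_contexts):
    """Return the names of intents whose input contexts are all active."""
    active = set(active_contexts)
    return [name for name, inputs in intents.items() if set(inputs) <= active]


print(eligible_intents(INTENTS, []))         # before the where intent has fired
print(eligible_intents(INTENTS, ["where"]))  # after it has set its output context
```

Before any context is set, only the where intent qualifies. Once the where context is active, the location intent becomes eligible as well.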

Here are the steps to generate these intents and contexts:

First, create the where intent and add the where output context. This intent is the root of the context tree and has no input context.
Second, create the location intent. Add the where input context, then reset the where output context and add the location output context. In our tour guide app, the input context of location is where. When the location intent is detected, the where context needs to be reset so that subsequent conversation won’t trigger this context again. This is done by setting the lifespan of the where output context to 0. By default, a context has a lifespan of 5 requests or 10 minutes.

Next, create the ticket intent. Add the location input context. Also add the location output context so that the hours and map intents can continue to use location as their input context.

You can pass a parameter from the input context using the format #context.parameter. For example, to pass the location string from intent inquiry-where-location to inquiry.where.location.ticket, use #inquiry-where-location.location.
Finally, create the hours and map intents in the same way as the ticket intent.
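The steps above can be simulated with a small Python sketch that tracks active contexts, their lifespans and their parameters for one session. It is a simplified model under stated assumptions, not API.AI's actual behavior: the `Session` class and its methods are hypothetical, lifespans are counted in requests only (the 10-minute timeout is not modeled), contexts are aged before output contexts are applied, and the short context names where and location are used in place of the longer names from the console.

```python
import re

DEFAULT_LIFESPAN = 5  # requests, per the default quoted above

# Intent name -> (input contexts, output contexts as {name: lifespan}).
INTENTS = {
    "where": ([], {"where": DEFAULT_LIFESPAN}),
    "location": (["where"], {"where": 0, "location": DEFAULT_LIFESPAN}),
    "ticket": (["location"], {"location": DEFAULT_LIFESPAN}),
    "hours": (["location"], {"location": DEFAULT_LIFESPAN}),
    "map": (["location"], {"location": DEFAULT_LIFESPAN}),
}


class Session:
    def __init__(self):
        # Context name -> {"lifespan": remaining requests, "params": dict}
        self.active = {}

    def handle(self, intent_name, params=None):
        """Apply an intent if its input contexts are active; return success."""
        inputs, outputs = INTENTS[intent_name]
        if not all(name in self.active for name in inputs):
            return False
        previous = {name: ctx["params"] for name, ctx in self.active.items()}
        # Age every active context by one request and drop expired ones.
        self.active = {
            name: {"lifespan": ctx["lifespan"] - 1, "params": ctx["params"]}
            for name, ctx in self.active.items()
            if ctx["lifespan"] - 1 > 0
        }
        for name, lifespan in outputs.items():
            if lifespan == 0:
                self.active.pop(name, None)  # lifespan 0 resets the context
            else:
                merged = {**previous.get(name, {}), **(params or {})}
                self.active[name] = {"lifespan": lifespan, "params": merged}
        return True

    def resolve(self, template):
        """Replace #context.parameter references with stored parameter values."""
        def lookup(match):
            context, parameter = match.group(1), match.group(2)
            return str(self.active[context]["params"][parameter])
        return re.sub(r"#([\w-]+)\.(\w+)", lookup, template)


session = Session()
print(session.handle("ticket"))  # no location context yet, so no match
session.handle("where")
session.handle("location", {"location": "Golden Gate Bridge"})
session.handle("ticket")
print(sorted(session.active))
print(session.resolve("#location.location is free to visit."))
```

After the location intent fires, the where context is gone and only location remains, so the ticket, hours and map intents can all read the stored location string.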

Next time

In Part 2, we’ll cover how to use webhook integrations in API.AI to pass information from a matched intent into a Cloud Functions web service and then get a result. We’ll also cover how to integrate the Cloud Vision, Speech and Translation APIs, including support for the Chinese language.

You can download the source code from GitHub.