
On the Personalities of Dead Authors



“Great, ice cream for dinner!”

How would you interpret that? If a six-year-old says it, it feels very different than if a parent says it. People are good at inferring the deeper meaning of language from both the context in which something is said and their knowledge of the speaker's personality.

But can one program a computer to understand the intended meaning of natural language in a way similar to the way we do? Developing a system that knows definitions of words and rules of grammar is one thing, but giving a computer conversational context along with the expectations of a speaker’s behaviors and language patterns is quite another!

To tackle this challenge, our Natural Language Understanding research group, led by Ray Kurzweil, works on building systems that understand natural language at a deeper level. By experimenting with systems that can perceive and project different personality types, we aim to enable computers to interpret the meaning of natural language much as we do.

One way to explore this research is to build a system capable of sentence prediction: given a sentence from a book and knowledge of the author’s style and “personality”, can it predict what the author is most likely to write next?

We started by utilizing the works of a thousand different authors found on Project Gutenberg to see if we could train a Deep Neural Network (DNN) to predict, given an input sentence, what sentence would come next. The idea was to see whether a DNN could - given millions of lines from a jumble of authors - “learn” a pattern or style that would lead one sentence to follow another.

This initial system had no author ID at the input: we just gave it pairs (line, following line) from 80% of the literary sample, holding out the remaining 20% for validation. The label at the output of the network was a simple YES or NO, depending on whether the example was truly a pair of sentences in sequence from the training data or a randomly matched pair. This initial system had an error rate of 17.2%, where a random guess would be 50%. A slightly more sophisticated version also added a fixed number of previous sentences for context, which decreased the error to 12.8%.

We then improved that initial system by giving the network an additional signal per example: a unique ID representing the author. We told it who was saying what. All examples from that author were now accompanied by this ID during training time. The new system learned to leverage the Author ID, and decreased the relative error by 12.3% compared to the previous system (from 12.8% down to 11.1%). At some level, the system is saying “I've been told that this is Shakespeare, who tends to write like this, so I'll take that into account when weighing which sentence is more likely to follow”. On a slightly different ranking task (pick which of two responses most likely follows, instead of just a yes/no on a given trigger/response pair), including the fixed window of previous sentences along with this author ID resulted in an error rate of less than 5%.
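
The architecture can be pictured with a rough sketch. This is an illustrative toy model rather than the actual system: the use of precomputed sentence vectors, the layer sizes, and the way the author ID is embedded and concatenated are all assumptions made for the example.

```python
# Illustrative sketch (not the production system): a binary classifier that
# decides whether a candidate sentence truly follows the current sentence,
# conditioned on a learned author embedding. Sizes are assumptions.
import tensorflow as tf
from tensorflow.keras import layers, Model

NUM_AUTHORS = 1000   # authors sampled from Project Gutenberg (assumed dense IDs)
SENT_DIM = 512       # assumed size of a precomputed sentence representation
AUTHOR_DIM = 300     # the post describes 300-dimensional author vectors

current_vec = layers.Input(shape=(SENT_DIM,), name="current_sentence")
candidate_vec = layers.Input(shape=(SENT_DIM,), name="candidate_next_sentence")
author_id = layers.Input(shape=(1,), dtype="int32", name="author_id")

# The rows of this embedding table are what become the "author vectors".
author_vec = layers.Flatten()(layers.Embedding(NUM_AUTHORS, AUTHOR_DIM)(author_id))

x = layers.Concatenate()([current_vec, candidate_vec, author_vec])
x = layers.Dense(1024, activation="relu")(x)
x = layers.Dense(256, activation="relu")(x)
is_true_pair = layers.Dense(1, activation="sigmoid", name="is_true_pair")(x)

model = Model([current_vec, candidate_vec, author_id], is_true_pair)
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```

Training such a model on (sentence, true next sentence) pairs labeled YES and randomly matched pairs labeled NO mirrors the setup described above.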

The 300-dimensional vectors our system derived to make these predictions are presumably representative of the author’s word choice, thinking, and style. We call these “author vectors”, analogous to word vectors or paragraph vectors. To get an intuitive sense of what these vectors are capturing, we projected the 300-dimensional space into two dimensions and plotted them, as shown in the figure below. This gives some sense of the similarity and relative positions of authors in the space.
A two-dimensional representation of the vector embeddings for some of the authors in our study. To project the 300-dimensional vectors to two dimensions, we used the t-SNE algorithm. Note that contemporaries and influencers tend to be near each other (e.g., Nathaniel Hawthorne and Herman Melville, or Marlowe and Shakespeare).
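
A projection like the one in the figure can be reproduced with off-the-shelf tools. The sketch below assumes the learned author embeddings have been exported to `author_vectors.npy` with a parallel list of names in `author_names.txt` (both hypothetical files):

```python
# Sketch: project 300-dimensional author vectors to 2-D with t-SNE and plot them.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

author_vectors = np.load("author_vectors.npy")               # hypothetical export, shape (n_authors, 300)
author_names = open("author_names.txt").read().splitlines()  # hypothetical parallel list of names

coords = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(author_vectors)

plt.figure(figsize=(12, 12))
plt.scatter(coords[:, 0], coords[:, 1], s=5)
for (x, y), name in zip(coords, author_names):
    plt.annotate(name, (x, y), fontsize=6)   # label each point with the author's name
plt.show()
```
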
It is interesting to consider which dimensions are most pertinent to defining personality and style, and which are more related to content or areas of interest. In the example above, we find Shakespeare and Marlowe in adjacent space. At the very least, these two dimensions reflect similarities of contemporary authors, but are there also measurable variables corresponding to “snark”, or humor, or sarcasm? Or perhaps there is something related to interests in sports?

With this working, we wondered, “How would the model respond to the questions of a personality test?” To simulate how different authors might respond to questions found in such tests, we needed a neural network whose yes/no decisions would be influenced by the author vector, even for sentences it had never seen before.

To simulate different authors’ responses to questions, we use the author vectors described above as inputs to our more general networks. In that way, we get the performance and generalization of a network trained across all authors and texts, but influenced by what’s unique to a chosen author. Combined with our generative model, these vectors allow us to generate responses as different authors. In effect, one can chat with a statistical representation of the text written by Shakespeare!

Once we set the author vector for a chosen author, we posed the Myers-Briggs questions to the system as the “current sentence” and gave the Myers-Briggs response options as the next-sentence candidates. When we asked our model of Shakespeare’s texts “Are you more of”: “a private person” or “an outgoing person”, it predicted “a private person”. When we changed the author vector to Mark Twain and posed the same question, we got “an outgoing person”.
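
In terms of the toy pair classifier sketched earlier, posing a question amounts to scoring each answer option as a candidate next sentence and keeping the highest-scoring one. The helpers `encode_sentence` and `author_ids` below are hypothetical stand-ins for a sentence encoder and an author-to-ID mapping:

```python
# Sketch: pose a personality-test question under a chosen author vector.
import numpy as np

def ask(model, encode_sentence, author_ids, question, options, author_name):
    """Score each answer option as a possible next sentence and pick the best."""
    q = encode_sentence(question)                      # shape (SENT_DIM,)
    author = np.array([[author_ids[author_name]]])     # shape (1, 1)
    scores = [model.predict([q[None, :], encode_sentence(o)[None, :], author])[0, 0]
              for o in options]
    return options[int(np.argmax(scores))]

# e.g. ask(model, encode_sentence, author_ids,
#          "Are you more of", ["a private person", "an outgoing person"], "Shakespeare")
```
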
If you're interested in more predictions our models made, here's the complete list for the small set of authors that we used. We have no reason to believe these assessments are particularly accurate, since our systems weren't trained for that task. Also, the responses are based on the writings of the author, and dialogue from fictional characters is not necessarily representative of the author’s actual personality. But we do know that these kinds of text-based systems can predict such classifications (for example, this UPenn study used the language of public posts to predict users' personality traits). So we thought it would be interesting to see what we could get from our early models.

Though we can in no way claim that these models accurately respond with what the authors would have said, there are a few amusing anecdotes. When asked “Who is your favorite author?” and given the options “Mark Twain”, “William Shakespeare”, “Myself”, and “Nobody”, the Twain model responded with “Mark Twain” and the Shakespeare model responded with “William Shakespeare”. Another example comes from the personality test: “When the phone rings”, Shakespeare's model “hope[s] someone else will answer”, while Twain's “[tries] to get to it first”. Fitting, perhaps, since the telephone was patented during Twain's lifetime, but well after Shakespeare's.

This work is an early step towards better understanding intent, and how long-term context influences interpretation of text. In addition to being fun and interesting, this work has the potential to enrich products through personalization. For example, it could help provide more personalized response options for the recently introduced Smart Reply feature in Inbox by Gmail.

Google Science Fair 2016: #howcanwe make things better with science?



(Cross-posted from the Google for Education blog.)

Editor's note: The 2016 Google Science Fair opens for submissions today. Together with LEGO Education, National Geographic, Scientific American and Virgin Galactic, we’re inviting all young explorers and innovators to make something better through science and engineering. To learn more about the competition, how to enter, prize details and more, visit the site, and follow along on Google+ and Twitter.

In this post, 2015 Grand Prize winner Olivia Hallisey joins us to reflect on her own experience with the Google Science Fair.

I remember the day I first heard about the Google Science Fair last year. I was sitting in my 10th grade science class when my teacher asked us: “What will you try?” I loved the invitation—and the challenge—that the Google Science Fair offered. It was a chance to use science to do something that could really make a difference in the world.

I had always been curious and interested in science, and knew I wanted to submit a project, but didn’t really know exactly where to begin. I asked my teacher for his advice on selecting a research topic. He encouraged me to choose something that I felt passionate about, or something that outraged me, and told me to look at the world around me for inspiration. So I did. At that time, the Ebola crisis was all over the news. It was a devastating situation and I wanted to help be a part of the solution. I had found my project.

With the outbreak spreading so quickly, I decided that I wanted to find a way to diagnose the virus earlier so that treatment could be delivered as quickly as possible to those who were affected. I read online about silk’s amazing storage and stabilizing properties, and wondered if I could use silk to transport antibodies that could test for the virus. After many failed attempts (and cutting up lots of cocoons) I finally succeeded in creating a temperature-independent, portable, and inexpensive diagnostic test that could detect the Ebola virus in under 30 minutes. I was really excited that my research could help contribute to saving lives, and I was proud to be selected as the Grand Prize winner a few months later.

As the 2016 Google Science Fair launches today, I wanted to share a few tips from my own experience: First, as my teacher once guided me to do, look at the world around you for ideas. If you’re stuck, try the Make Better Generator to find something that excites or inspires you. Second, find a mentor who’s interested in the same things as you. There are a lot of helpful ideas on the GSF site to get you started. And finally, don’t get discouraged—often what first appears like failure can teach you so much more.

I urge other teenagers like me to take this opportunity to find a way to make the world around them better. Every one of us, no matter our age or background, can make a difference—and as young people, we’re not always so afraid to try things that adults think will fail. But change doesn’t happen overnight, and it often starts with a question. So look at the world around you and challenge yourself to make something better.
Science isn’t just a subject—it’s a way to make things better. So I hope you’ll join the conversation and enter the Google Science Fair this year. Our world is waiting to see what you come up with!

Exploring the Intersection of Art and Machine Intelligence



In June of last year, we published a story about visualization techniques that helped us understand how neural networks carry out difficult visual classification tasks. In addition to helping us gain a deeper understanding of how NNs work, these techniques also produced strange, wonderful and oddly compelling images.

Following that blog post, and especially after we released the source code, dubbed DeepDream, we witnessed tremendous interest not only from the machine learning community but also from the creative coding community. Several artists, such as Amanda Peterson (aka Gucky), Memo Akten, Samim Winiger, Kyle McDonald and many others, immediately started experimenting with the technique as a new way to create art.
“GCHQ”, 2015, Memo Akten, used with permission.
Soon after, the paper A Neural Algorithm of Artistic Style by Leon Gatys and colleagues in Tübingen was released. Their technique uses a convolutional neural network to factor images into separate style and content components. By using a neural network as a generic image parser, this in turn allows the creation of new images that combine the style of one image with the content of another. Once again it took the creative coding community by storm, and many artists and coders immediately began experimenting with the new algorithm, resulting in Twitter bots and other explorations and experiments.
The style transfer algorithm crosses a photo with a painting style; for example, Neil deGrasse Tyson in the style of Kandinsky’s Jane Rouge Bleu. Photo by Guillaume Piolle, used with permission.
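
At its core, the technique compares images in the feature space of a pretrained convolutional network: content is matched through raw activations, style through correlations (Gram matrices) of those activations. The sketch below only illustrates that idea; it is not the authors' implementation, and the layer selection and loss weighting are left out.

```python
# Sketch of the two losses at the heart of neural style transfer.
# `feats` stands for activations of one layer of a pretrained CNN (e.g. VGG),
# with shape (height, width, channels); an illustration, not the paper's code.
import tensorflow as tf

def gram_matrix(feats):
    # Channel-to-channel correlations summarize "style" independent of layout.
    h, w, c = feats.shape
    flat = tf.reshape(feats, (h * w, c))
    return tf.matmul(flat, flat, transpose_a=True) / tf.cast(h * w, tf.float32)

def content_loss(generated, content):
    return tf.reduce_mean(tf.square(generated - content))

def style_loss(generated, style):
    return tf.reduce_mean(tf.square(gram_matrix(generated) - gram_matrix(style)))

# The generated image's pixels are optimized (e.g. by gradient descent) to
# minimize content_loss + weight * style_loss, summed over selected layers.
```
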
The open-source deep-learning community, especially projects such as GitXiv, hugely contributed to the spread, accessibility and development of these algorithms. Both DeepDream and style transfer were rapidly implemented in a plethora of different languages and deep learning packages. Immediately others took the techniques and developed them further.
“Saxophone dreams” - Mike Tyka.
With machine learning as a field moving forward at a breakneck pace and rapidly becoming part of many -- if not most -- online products, the opportunities for artistic uses are as wide as they are unexplored and perhaps overlooked. However, interest is growing rapidly: the University of London is now offering a course on machine learning and art, NYU ITP offers a similar program this year, and the topic of the Tate Modern’s IK Prize 2016 is Artificial Intelligence.

These are exciting early days, and we want to continue to stimulate artistic interest in these emerging technologies. To that end, we are announcing a two-day DeepDream event in San Francisco at the Gray Area Foundation for the Arts, aimed at showcasing some of the latest explorations of the intersection of Machine Intelligence and Art, and spurring discussion focused on future directions:
  • Friday Feb 26th: DeepDream: The Art of Neural Networks, an exhibit consisting of 29 neural network generated artworks, created by artists at Google and from around the world. The works will be auctioned, with all proceeds going to the Gray Area Foundation, which has been active in supporting the intersection between arts and technology for over 10 years.
  • Saturday Feb 27th: Art and Machine Learning Symposium, an open one-day symposium on Machine Learning and Art, aiming to bring together the neural network and the creative coding communities to exchange ideas, learn and discuss. Videos of all the talks will be posted online after the event.
We look forward to sharing some of the interesting works of art generated by the art and machine learning community, and being part of the discussion of how art and technology can be combined.

Text-to-Speech for low resource languages (episode 3): But can it say “Google”?



This is the third episode in the series of posts reporting on the work we are doing to build text-to-speech (TTS) systems for low resource languages. In the first episode, we described the crowdsourced acoustic data collection effort for Project Unison. In the second episode, we described how we built parametric voices based on that data. In this episode, we look at how we are compiling a pronunciation lexicon for a TTS system.

In Project Unison we are developing ways to bring Google's spoken language technology to the world’s major languages. As part of this broader goal, we are piloting a process for building a text-to-speech (TTS) system that can speak Bengali (Bangla). While our exploration of new methods has allowed us to gather sufficient data to train a statistical parametric voice capable of speaking Bengali, we then had to address the next challenge: How do we make the voice sound like it is fluent in that language?

When people learn foreign languages, they are usually expected to pick up the full details from repeated exposure once they've mastered the basics and reached sufficient fluency. Second-language learners often struggle with issues that seem so natural to fluent speakers that they are taken for granted. For instance, in order to read text out loud, one must know how to read different kinds of numerical expressions (e.g. dates, times, phone numbers, Roman numerals) and how to pronounce a wide variety of words, ranging from native words to newly coined brand names to loanwords, which themselves can come from different source languages. As TTS systems rely heavily on machine learning, they tend to face challenges similar to those of human learners: the way words are pronounced is often complex, sometimes surprising, and rarely fully documented.

Take the Bengali word meaning "microscope", which is অণুবীক্ষণ. Its pronunciation can be transcribed in the International Phonetic Alphabet as /o.nu.bik.kʰɔn/. When our system encounters this word, it analyzes the spelling in Bengali script into abstract written units called graphemes and then predicts the spoken sounds, or phonemes, of Bengali phonology from these graphemes.
The correspondence between graphemes and phonemes varies along several dimensions. One dimension is horizontal complexity: in many cases a single grapheme corresponds to a single phoneme, but the Bengali ligature ক্ষ is special, as several graphemes correspond to several phonemes in a somewhat surprising way. Another dimension is vertical predictability: a grapheme may correspond to different phonemes in different contexts, and the correct phoneme may be difficult to predict. The Bengali grapheme “a” (অ) is both very frequent and very unpredictable in its pronunciation: it corresponds to the phoneme /o/, to the phoneme /ɔ/, or it is not pronounced at all. In the example word above, all three possibilities occur within one word. As is standard in speech processing, our approach relies on human experts who transcribe words into phoneme sequences, and on machine learning models that capture the complex aspects of the grapheme-phoneme correspondence.

In order for our Bengali TTS system to pronounce the words in a sentence, it relies on a pronunciation dictionary, or lexicon, that provides pronunciations of a number of common words. When a word is not in the lexicon, it falls back on a machine learning model that was trained on thousands of pronunciations, which can then provide a pretty good guess at how a previously unseen word is pronounced. With a sufficiently large pronunciation dictionary, the system can be expected to reach a high level of fluency.
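
The lookup-then-fallback flow can be pictured with a small sketch. The lexicon file format (word, tab, space-separated phonemes) and the `g2p_model.predict` call are assumptions standing in for the real lexicon and the trained grapheme-to-phoneme model:

```python
# Sketch: pronounce a word from the lexicon if possible, otherwise fall back
# to a learned grapheme-to-phoneme (G2P) model. File format and model API are
# assumptions for illustration.

def load_lexicon(path):
    lexicon = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            word, pron = line.rstrip("\n").split("\t")
            lexicon.setdefault(word, []).append(pron.split())
    return lexicon

def pronounce(word, lexicon, g2p_model):
    if word in lexicon:
        return lexicon[word][0]        # first listed pronunciation
    return g2p_model.predict(word)     # model's best guess for an unseen word

# lexicon = load_lexicon("bn_lexicon.tsv")     # hypothetical path
# pronounce("অণুবীক্ষণ", lexicon, g2p_model)    # -> phoneme sequence for /o.nu.bik.kʰɔn/
```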

We started compiling a Bengali pronunciation lexicon by having our Bangladeshi linguists transcribe a few thousand words into phonemes. This work was done in a web application custom built for this purpose. Just like an earlier version, this transcription tool supports the work of linguists by providing a virtual keyboard for entering phonemes.

Once a few thousand words had been transcribed, we trained a machine learning system that could predict phonemic transcriptions for previously unseen words, so that the linguists only had to correct the output of that system. After the TTS voice had been built, it also became possible to listen to the voice reading out the entered transcriptions.

Even before the first machine learning model had been trained for Bengali, we configured the transcription tool to provide some constraints on how words could be transcribed. The Bengali writing system, like most writing systems, has certain aspects that make it complex, while in other ways it is quite regular. As discussed above, the grapheme “a” (অ) can have different pronunciations depending on context, but its pronunciation does not vary wildly: it is either silent or pronounced as a vowel, never as a consonant. By incorporating constraints on which graphemes can correspond to which phonemes, we can easily identify unlikely or erroneous transcriptions. This methodology has been in use at Google for several years.
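
A sanity check of this kind can be as simple as the sketch below. The allowed mapping shown covers only the single example discussed above and is in no way the full Bengali rule inventory:

```python
# Sketch: flag transcriptions that violate known grapheme-to-phoneme constraints.
# ALLOWED covers only the example from the text (অ -> /o/, /ɔ/, or silent);
# it is not the real Bengali rule set.
ALLOWED = {
    "অ": {"o", "ɔ", ""},   # inherent vowel: /o/, /ɔ/, or silent -- never a consonant
}

def violations(alignment):
    """`alignment` is a list of (grapheme, phoneme) pairs proposed by a linguist
    or a model; returns the pairs that fall outside the allowed mapping."""
    return [(g, p) for g, p in alignment if g in ALLOWED and p not in ALLOWED[g]]

# violations([("অ", "k")])  ->  [("অ", "k")]   (flagged as an unlikely transcription)
```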

The grapheme-phoneme correspondence varies along several dimensions, including regular words vs. abbreviations, and native words vs. loanwords. For example the word meaning "doctor" and pronounced /ɖɔk.ʈor/ can be written in several ways in Bengali: in Bengali script as ডক্টর or as the abbreviation ডঃ; and in Latin script as the English loanword doctor or as the abbreviation Dr. A TTS system should accept all ways of writing this word, hence all written variations are in our pronunciation lexicon.

A Bengali TTS voice should further be able to pronounce a variety of common brand names written in Latin script. The linguists from Project Unison therefore transcribed a few thousand such words phonemically into Bengali. For example, "WhatsApp" was transcribed /ho.aʈs.æp/, and "Google" was straightforwardly transcribed as /gu.gol/ just as if it had been spelled গুগল.

Overall our linguists transcribed more than 65,000 Bengali words into phonemic notation. In an effort to contribute to the community working on speech synthesis, speech recognition, and related natural language efforts, we are releasing our Bengali pronunciation dictionary under a Creative Commons License (CC BY 4.0). It is our hope that this will be a valuable resource for researchers and developers who are improving the state of spoken language systems.

Despite our efforts, this Bengali dictionary is incomplete and contains residual errors. As a work-in-progress it will continue to improve over time. We are hoping that other natural language and speech researchers will join us in making available more datasets under open licenses. As we refine our development process and extend it to more languages, we are planning on releasing additional datasets for other languages in the future.

NEXT UP: One Down, 299 to Go (Ep 4)

Running your models in production with TensorFlow Serving



Machine learning powers many Google product features, from speech recognition in the Google app to Smart Reply in Inbox to search in Google Photos. While decades of experience have enabled the software industry to establish best practices for building and supporting products, doing so for services based upon machine learning introduces new and interesting challenges.

Today, we announce the release of TensorFlow Serving, designed to address some of these challenges. TensorFlow Serving is a high performance, open source serving system for machine learning models, designed for production environments and optimized for TensorFlow.
TensorFlow Serving is ideal for running multiple models, at large scale, that change over time based on real-world data, enabling:
  • model lifecycle management
  • experiments with multiple algorithms
  • efficient use of GPU resources
TensorFlow Serving makes the process of taking a model into production easier and faster. It allows you to safely deploy new models and run experiments while keeping the same server architecture and APIs. Out of the box it provides integration with TensorFlow, but it can be extended to serve other types of models.

Here’s how it works. In the simplified, supervised training pipeline shown below, training data is fed to the learner, which outputs a model:
Once a new model version becomes available, upon validation, it is ready to be deployed to the serving system, as shown below.
TensorFlow Serving uses the (previously trained) model to perform inference - predictions based on new data presented by its clients. Since clients typically communicate with the serving system using a remote procedure call (RPC) interface, TensorFlow Serving comes with a reference front-end implementation based on gRPC, a high performance, open source RPC framework from Google.
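
From the client's point of view, a prediction request is a single RPC. The sketch below uses the `tensorflow_serving.apis` Python stubs that ship with the open-source project; the server address, model name, and input tensor name are assumptions for illustration.

```python
# Sketch: query a TensorFlow Serving instance over gRPC. Module paths come from
# the open-source tensorflow-serving-api package; names and shapes are assumed.
import grpc
import tensorflow as tf
from tensorflow_serving.apis import predict_pb2, prediction_service_pb2_grpc

channel = grpc.insecure_channel("localhost:8500")        # assumed serving address
stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)

request = predict_pb2.PredictRequest()
request.model_spec.name = "my_model"                      # hypothetical model name
request.inputs["x"].CopyFrom(                             # "x" is an assumed input tensor name
    tf.make_tensor_proto([[1.0, 2.0, 3.0]], dtype=tf.float32))

response = stub.Predict(request, timeout=10.0)            # blocking RPC with a 10 s deadline
print(response.outputs)
```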

It is quite common to launch and iterate on your model over time, as new data becomes available, or as you improve the model. In fact, at Google, many pipelines run continuously, producing new model versions as new data becomes available.
TensorFlow Serving is written in C++ and supports Linux. It introduces minimal overhead: in our benchmarks we recorded ~100,000 queries per second (QPS) per core on a 16 vCPU Intel Xeon E5 2.6 GHz machine, excluding gRPC and the TensorFlow inference processing time.

We are excited to share this important component of TensorFlow today under the Apache 2.0 open source license. We would love to hear your questions and feature requests on Stack Overflow and GitHub respectively. To get started quickly, clone the code from github.com/tensorflow/serving and check out this tutorial.

You can expect to keep hearing more about TensorFlow as we continue to develop what we believe to be one of the best machine learning toolboxes in the world. If you'd like to stay up to date, follow @googleresearch or +ResearchatGoogle, and keep an eye out for Jeff Dean's keynote address at GCP Next 2016 in March.

Google Research Awards: Fall 2015



We have just completed another round of the Google Research Awards, our biannual open call for proposals on computer science and related topics, including machine learning, speech recognition, natural language processing, and computational neuroscience. Our grants cover tuition for a graduate student and provide both faculty and students the opportunity to work directly with Google researchers and engineers.

This round we received 950 proposals, an increase of 18% over last round, covering 55 countries and over 350 universities. After expert reviews and committee discussions, we decided to fund 151 projects. Our support of machine learning projects increased by 71% from last round, and physical interfaces and immersive experiences, a relatively new area for the Google Research Awards, saw a 19% increase in the number of submitted proposals.

Congratulations to the well-deserving recipients of this round’s awards. If you are interested in applying for the next round (deadline is October 15), please visit our website for more information. Please note that we are now moving to an annual cycle.

Announcing the Google Internet of Things (IoT) Technology Research Award Pilot



Over the past year, Google engineers have experimented and developed a set of building blocks for the Internet of Things - an ecosystem of connected devices, services and “things” that promises direct and efficient support of one’s daily life. While there has been significant progress in this field, there remain significant challenges in terms of (1) interoperability and a standardized modular systems architecture, (2) privacy, security and user safety, as well as (3) how users interact with, manage and control an ensemble of devices in this connected environment.

It is in this context that we are happy to invite university researchers1 to participate in the Internet of Things (IoT) Technology Research Award Pilot. This pilot provides selected researchers in-kind gifts of Google IoT related technologies (listed below), with the goal of fostering collaboration with the academic community on small-scale (~4-8 week) experiments, discovering what they can do with our software and devices.

We invite you to submit proposals in which Google IoT technologies are used to (1) explore interesting use cases and innovative user interfaces, (2) address technical challenges as well as interoperability between devices and applications, or (3) experiment with new approaches to privacy, safety and security. Proposed projects should make use of one or a combination of these Google technologies:
  • Google beacon platform - consisting of the open beacon format Eddystone and various client and cloud APIs, this platform allows developers to mark up the world and make their apps and devices work smarter by providing timely, contextual information.
  • Physical Web - based on the Eddystone URL beacon format, the Physical Web is an approach designed to let people interact with any smart device or real-world object - a vending machine, a poster, a toy, a bus stop, a rental car - without having to download an app first.
  • Nearby Messages API - a publish-subscribe API that lets you pass small binary payloads between internet-connected Android and iOS devices as well as with beacons registered with Google's proximity beacon service.
  • Brillo & Weave - Brillo is an Android-based embedded OS that brings the simplicity and speed of mobile software development to IoT hardware to make it cost-effective to build a secure smart device, and to keep it updated over time. Weave is an open communications and interoperability platform for IoT devices that allows for easy connections to networks, smartphones (both Android and iOS), mobile apps, cloud services, and other smart devices.
  • OnHub router - a communication hub for the Internet of Things supporting Bluetooth® Smart Ready, 802.15.4 and 802.11a/b/g/n/ac. It also allows you to quickly create a guest network and control the devices you want to share (see On.Here).
  • Google Cloud Platform IoT Solutions - tools to scale connections, gather and make sense of data, and provide the reliable customer experiences that IoT hardware devices require.
  • Chrome Boxes & Kiosk Apps - provide custom full-screen apps for a purpose-built Chrome device, such as a guest registration desk, a library catalog station, or a point-of-sale system in a store.
  • Vanadium - an open-source framework designed to make it easier to develop secure, multi-device user experiences, with or without an Internet connection.
Check out the Ubiquity Dev Summit playlist for more information on these platforms and their best practices.

Please submit your proposal here by February 29th in order to be considered for an award. Proposals will be reviewed by researchers and product teams within Google. In addition to looking for impact and interesting ideas, priority will be given to research that can make immediate use of the available technologies. Selected proposals will be notified by the end of March 2016. If selected, the award will be subject to Google’s terms, and your use of Google technologies will be subject to the applicable Google terms of service.

To connect our physical world to the Internet is a broad and long-term challenge, one we hope to address by working with researchers across many disciplines and work practices. We are looking forward to the collaborative opportunity provided by this pilot, and learning about innovative applications you create for these new technologies.



1 The same eligibility conditions as for the Faculty Research Award Program apply - see here.

AlphaGo: Mastering the ancient game of Go with Machine Learning



Games are a great testing ground for developing smarter, more flexible algorithms that have the ability to tackle problems in ways similar to humans. Creating programs that are able to play games better than the best humans has a long history - the first classic game mastered by a computer was noughts and crosses (also known as tic-tac-toe) in 1952 as a PhD candidate’s project. Then fell checkers in 1994. Chess was tackled by Deep Blue in 1997. The success isn’t limited to board games, either - IBM's Watson won first place on Jeopardy in 2011, and in 2014 our own algorithms learned to play dozens of Atari games just from the raw pixel inputs.

But one game has thwarted A.I. research thus far: the ancient game of Go. Invented in China over 2500 years ago, Go is played by more than 40 million people worldwide. The rules are simple: players take turns to place black or white stones on a board, trying to capture the opponent's stones or surround empty space to make points of territory. Confucius wrote about the game, and its aesthetic beauty elevated it to one of the four essential arts required of any true Chinese scholar. The game is played primarily through intuition and feel, and because of its subtlety and intellectual depth it has captured the human imagination for centuries.

But as simple as the rules are, Go is a game of profound complexity. The search space in Go is vast -- more than a googol times larger than chess (a number greater than there are atoms in the universe!). As a result, traditional “brute force” AI methods -- which construct a search tree over all possible sequences of moves -- don’t have a chance in Go. To date, computers have played Go only as well as amateurs. Experts predicted it would be at least another 10 years until a computer could beat one of the world’s elite group of Go professionals.

We saw this as an irresistible challenge! We started building a system, AlphaGo, described in a paper in Nature this week, that would overcome these barriers. The key to AlphaGo is reducing the enormous search space to something more manageable. To do this, it combines a state-of-the-art tree search with two deep neural networks, each of which contains many layers with millions of neuron-like connections. One neural network, the “policy network”, predicts the next move, and is used to narrow the search to consider only the moves most likely to lead to a win. The other neural network, the “value network”, is then used to reduce the depth of the search tree -- estimating the winner in each position in place of searching all the way to the end of the game.

AlphaGo’s search algorithm is much more human-like than previous approaches. For example, when Deep Blue played chess, it searched by brute force over thousands of times more positions than AlphaGo. Instead, AlphaGo looks ahead by playing out the remainder of the game in its imagination, many times over - a technique known as Monte-Carlo tree search. But unlike previous Monte-Carlo programs, AlphaGo uses deep neural networks to guide its search. During each simulated game, the policy network suggests intelligent moves to play, while the value network astutely evaluates the position that is reached. Finally, AlphaGo chooses the move that is most successful in simulation.
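
A drastically simplified sketch of that idea follows. It is not AlphaGo's actual search: the tree statistics, exploration bonuses, and the blending of rollout results with value estimates are all omitted, and the game interface (`legal_moves`, `play`) and the two networks are assumed to be given.

```python
# Toy sketch of policy/value-guided search (NOT AlphaGo's full MCTS).
import random

def simulate(position, policy_net, depth):
    """Play `depth` moves forward by sampling from the policy network."""
    for _ in range(depth):
        moves = position.legal_moves()
        if not moves:
            break
        probs = policy_net(position)                       # assumed: dict of move -> probability
        move = random.choices(moves, weights=[probs[m] for m in moves])[0]
        position = position.play(move)
    return position

def choose_move(position, policy_net, value_net, top_k=10, n_playouts=20, depth=8):
    probs = policy_net(position)
    # The policy network narrows the search to the most promising candidate moves.
    candidates = sorted(position.legal_moves(), key=lambda m: probs[m], reverse=True)[:top_k]
    best_move, best_value = None, float("-inf")
    for move in candidates:
        after = position.play(move)
        # The value network scores positions reached by short policy-guided playouts
        # (assumed to return the winning probability for the player who chose `move`).
        value = sum(value_net(simulate(after, policy_net, depth))
                    for _ in range(n_playouts)) / n_playouts
        if value > best_value:
            best_move, best_value = move, value
    return best_move
```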

We first trained the policy network on 30 million moves from games played by human experts, until it could predict the human move 57% of the time (the previous record before AlphaGo was 44%). But our goal is to beat the best human players, not just mimic them. To do this, AlphaGo learned to discover new strategies for itself, by playing thousands of games between its neural networks, and gradually improving them using a trial-and-error process known as reinforcement learning. This approach led to much better policy networks, so strong in fact that the raw neural network (immediately, without any tree search at all) can defeat state-of-the-art Go programs that build enormous search trees.
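
The trial-and-error loop can be illustrated with a bare-bones, REINFORCE-style update. This is a pedagogical sketch under many assumptions (for instance, a hypothetical `play_game` helper returning the states visited, the moves chosen, and a ±1 outcome); it is not DeepMind's training procedure.

```python
# Bare-bones self-play policy-gradient update (illustrative only).
import tensorflow as tf

def self_play_update(policy_net, opponent_net, play_game, optimizer):
    """Play one game, then nudge the policy toward its moves if it won and away
    from them if it lost. `policy_net` is assumed to map a batch of board
    tensors to move logits; `play_game` is a hypothetical helper."""
    states, moves, outcome = play_game(policy_net, opponent_net)  # outcome: +1 win, -1 loss
    with tf.GradientTape() as tape:
        logits = policy_net(tf.stack(states))
        neg_log_probs = tf.nn.sparse_softmax_cross_entropy_with_logits(
            labels=tf.constant(moves), logits=logits)             # -log pi(move | state)
        loss = outcome * tf.reduce_mean(neg_log_probs)            # reinforce wins, discourage losses
    grads = tape.gradient(loss, policy_net.trainable_variables)
    optimizer.apply_gradients(zip(grads, policy_net.trainable_variables))
```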

These policy networks were in turn used to train the value networks, again by reinforcement learning from games of self-play. These value networks can evaluate any Go position and estimate the eventual winner - a problem so hard it was believed to be impossible.

Of course, all of this requires a huge amount of compute power, so we made extensive use of Google Cloud Platform, which enables researchers working on AI and Machine Learning to access elastic compute, storage and networking capacity on demand. In addition, new open source libraries for numerical computation using data flow graphs, such as TensorFlow, allow researchers to efficiently deploy the computation needed for deep learning algorithms across multiple CPUs or GPUs.

So how strong is AlphaGo? To answer this question, we played a tournament between AlphaGo and the best of the rest - the top Go programs at the forefront of A.I. research. Using a single machine, AlphaGo won all but one of its 500 games against these programs. In fact, AlphaGo even beat those programs after giving them a four-move head start at the beginning of each game. A high-performance version of AlphaGo, distributed across many machines, was even stronger.
This figure from the Nature article shows the Elo rating and approximate rank of AlphaGo (both single machine and distributed versions), the European champion Fan Hui (a professional 2-dan), and the strongest other Go programs, evaluated over thousands of games. Pale pink bars show the performance of other programs when given a four-move head start.
It seemed that AlphaGo was ready for a greater challenge. So we invited the reigning 3-time European Go champion Fan Hui — an elite professional player who has devoted his life to Go since the age of 12 — to our London office for a challenge match. The match was played behind closed doors between October 5-9 last year. AlphaGo won by 5 games to 0 -- the first time a computer program has ever beaten a professional Go player.
AlphaGo’s next challenge will be to play the top Go player in the world over the last decade, Lee Sedol. The match will take place this March in Seoul, South Korea. Lee Sedol is excited to take on the challenge, saying, "I am privileged to be the one to play, but I am confident that I can win." It should prove to be a fascinating contest!

We are thrilled to have mastered Go and thus achieved one of the grand challenges of AI. However, the most significant aspect of all this for us is that AlphaGo isn’t just an ‘expert’ system built with hand-crafted rules, but instead uses general machine learning techniques to allow it to improve itself, just by watching and playing games. While games are the perfect platform for developing and testing AI algorithms quickly and efficiently, ultimately we want to apply these techniques to important real-world problems. Because the methods we have used are general purpose, our hope is that one day they could be extended to help us address some of society’s toughest and most pressing problems, from climate modelling to complex disease analysis.

Teach Yourself Deep Learning with TensorFlow and Udacity



Deep learning has become one of the hottest topics in machine learning in recent years. With TensorFlow, the deep learning platform that we recently released as an open-source project, our goal was to bring the capabilities of deep learning to everyone. So far, we are extremely excited by the uptake: more than 4000 users have forked it on GitHub in just a few weeks, and the project has been starred more than 16000 times by enthusiasts around the globe.

To help make deep learning even more accessible to engineers and data scientists at large, we are launching a new Deep Learning Course developed in collaboration with Udacity. This short, intensive course provides you with all the basic tools and vocabulary to get started with deep learning, and walks you through how to use it to address some of the most common machine learning problems. It is also accompanied by interactive TensorFlow notebooks that directly mirror and implement the concepts introduced in the lectures.
The course consists of four lectures which provide a tour of the main building blocks used to solve problems ranging from image recognition to text analysis. The first lecture focuses on the basics that will be familiar to those already versed in machine learning: setting up your data and experimental protocol, and training simple classification models. The second lecture builds on these fundamentals to explore how these simple models can be made deeper and more powerful, and explores the scalability problems that come with that, in particular regularization and hyperparameter tuning. The third lecture is all about convolutional networks and image recognition. The fourth and final lecture explores models for text and sequences in general, with embeddings and recurrent neural networks. By the end of the course, you will have implemented and trained this variety of models on your own machine and will be ready to transfer that knowledge to solve your own problems!

Our overall goal in designing this course was to provide the machine learning enthusiast with a rapid and direct path to solving real and interesting problems with deep learning techniques, and we're now very excited to share what we've built! It has been a lot of fun putting it together with the fantastic team of experts in online course design and production at Udacity. For more details, see the Udacity blog post, and register for the course. We hope you enjoy it!

Why attend USENIX Enigma?



Last August, we announced USENIX Enigma, a new conference intended to shine a light on great, thought-provoking research in security, privacy, and electronic crime. With Enigma beginning in just a few short weeks, I wanted to share a couple of the reasons I’m personally excited about this new conference.

Enigma aims to bridge the divide that exists between experts working in academia, industry, and public service, explicitly bringing researchers from different sectors together to share their work. Our speakers include those spearheading the defense of digital rights (Electronic Frontier Foundation, Access Now), practitioners at a number of well known industry leaders (Akamai, Blackberry, Facebook, LinkedIn, Netflix, Twitter), and researchers from multiple universities in the U.S. and abroad. With the diverse session topics and organizations represented, I expect interesting—and perhaps spirited—coffee break and lunchtime discussions among the equally diverse list of conference attendees.

Of course, I’m very proud to have some of my Google colleagues speaking at Enigma:

  • Adrienne Porter Felt will talk about blending research and engineering to solve usable security problems. You’ll hear how Chrome’s usable security team runs user studies and experiments to motivate engineering and design decisions. Adrienne will share the challenges they’ve faced when trying to adapt existing usable security research to practice, and give insight into how they’ve achieved successes.
  • Ben Hawkes will be speaking about Project Zero, a security research team dedicated to the mission of “making 0day hard.” Ben will talk about why Project Zero exists, and some of the recent trends and technologies that make vulnerability discovery and exploitation fundamentally harder.
  • Elie Bursztein will go through key lessons the Gmail team learned over the past 11 years while protecting users from spam, phishing, malware, and web attacks. Illustrated with concrete numbers and examples from one of the largest email systems on the planet, attendees will gain insight into specific techniques and approaches useful in fighting abuse and securing their online services.

In addition to raw content, my Program Co-Chair, David Brumley, and I have prioritized talk quality. Researchers dedicate months or years of their time to thinking about a problem and conducting the technical work of research, but a common criticism of technical conferences is that the actual presentation of that research seems like an afterthought. Rather than be a regurgitation of a research paper in slide format, a presentation is an opportunity for a researcher to explain the context and impact of their work in their own voice; a chance to inspire the audience to want to learn more or dig deeper. Taking inspiration from the TED conference, Enigma will have shorter presentations, and the program committee has worked with each speaker to help them craft the best version of their talk.

Hope to see some of you at USENIX Enigma later this month!