Tag Archives: machine learning

Google at NIPS 2017



This week, Long Beach, California hosts the 31st annual Conference on Neural Information Processing Systems (NIPS 2017), a machine learning and computational neuroscience conference that includes invited talks, demonstrations and presentations of some of the latest in machine learning research. Google will have a strong presence at NIPS 2017, with over 450 Googlers attending to contribute to, and learn from, the broader academic research community via technical talks and posters, workshops, competitions and tutorials.

Google is at the forefront of machine learning, actively exploring virtually all aspects of the field from classical algorithms to deep learning and more. Focusing on both theory and application, much of our work on language understanding, speech, translation, visual processing, and prediction relies on state-of-the-art techniques that push the boundaries of what is possible. In all of those tasks and many others, we develop learning approaches to understand and generalize, providing us with new ways of looking at old problems and helping transform how we work and live.

If you are attending NIPS 2017, we hope you’ll stop by our booth and chat with our researchers about the projects and opportunities at Google that go into solving interesting problems for billions of people, and see demonstrations of some of the exciting research we pursue. You can also learn more about our work being presented in the list below (Googlers highlighted in blue).

Google is a Platinum Sponsor of NIPS 2017.

Organizing Committee
Program Chair: Samy Bengio
Senior Area Chairs include: Corinna Cortes, Dale Schuurmans, Hugo Larochelle
Area Chairs include: Afshin Rostamizadeh, Amir Globerson, Been Kim, D. Sculley, Dumitru Erhan, Gal Chechik, Hartmut Neven, Honglak Lee, Ian Goodfellow, Jasper Snoek, John Wright, Jon Shlens, Kun Zhang, Lihong Li, Maya Gupta, Moritz Hardt, Navdeep Jaitly, Ryan Adams, Sally Goldman, Sanjiv Kumar, Surya Ganguli, Tara Sainath, Umar Syed, Viren Jain, Vitaly Kuznetsov

Invited Talk
Powering the next 100 years
John Platt

Accepted Papers
A Meta-Learning Perspective on Cold-Start Recommendations for Items
Manasi Vartak, Hugo Larochelle, Arvind Thiagarajan

AdaGAN: Boosting Generative Models
Ilya Tolstikhin, Sylvain Gelly, Olivier Bousquet, Carl-Johann Simon-Gabriel, Bernhard Schölkopf

Deep Lattice Networks and Partial Monotonic Functions
Seungil You, David Ding, Kevin Canini, Jan Pfeifer, Maya Gupta

From which world is your graph
Cheng Li, Varun Kanade, Felix MF Wong, Zhenming Liu

Hiding Images in Plain Sight: Deep Steganography
Shumeet Baluja

Improved Graph Laplacian via Geometric Self-Consistency
Dominique Joncas, Marina Meila, James McQueen

Model-Powered Conditional Independence Test
Rajat Sen, Ananda Theertha Suresh, Karthikeyan Shanmugam, Alexandros Dimakis, Sanjay Shakkottai

Nonlinear random matrix theory for deep learning
Jeffrey Pennington, Pratik Worah

Resurrecting the sigmoid in deep learning through dynamical isometry: theory and practice
Jeffrey Pennington, Samuel Schoenholz, Surya Ganguli

SGD Learns the Conjugate Kernel Class of the Network
Amit Daniely

SVCCA: Singular Vector Canonical Correlation Analysis for Deep Learning Dynamics and Interpretability
Maithra Raghu, Justin Gilmer, Jason Yosinski, Jascha Sohl-Dickstein

Learning Hierarchical Information Flow with Recurrent Neural Modules
Danijar Hafner, Alexander Irpan, James Davidson, Nicolas Heess

Online Learning with Transductive Regret
Scott Yang, Mehryar Mohri

Acceleration and Averaging in Stochastic Descent Dynamics
Walid Krichene, Peter Bartlett

Parameter-Free Online Learning via Model Selection
Dylan J Foster, Satyen Kale, Mehryar Mohri, Karthik Sridharan

Dynamic Routing Between Capsules
Sara Sabour, Nicholas Frosst, Geoffrey E Hinton

Modulating early visual processing by language
Harm de Vries, Florian Strub, Jeremie Mary, Hugo Larochelle, Olivier Pietquin, Aaron C Courville

MarrNet: 3D Shape Reconstruction via 2.5D Sketches
Jiajun Wu, Yifan Wang, Tianfan Xue, Xingyuan Sun, Bill Freeman, Josh Tenenbaum

Affinity Clustering: Hierarchical Clustering at Scale
Mahsa Derakhshan, Soheil Behnezhad, Mohammadhossein Bateni, Vahab Mirrokni, MohammadTaghi Hajiaghayi, Silvio Lattanzi, Raimondas Kiveris

Asynchronous Parallel Coordinate Minimization for MAP Inference
Ofer Meshi, Alexander Schwing

Cold-Start Reinforcement Learning with Softmax Policy Gradient
Nan Ding, Radu Soricut

Filtering Variational Objectives
Chris J Maddison, Dieterich Lawson, George Tucker, Mohammad Norouzi, Nicolas Heess, Andriy Mnih, Yee Whye Teh, Arnaud Doucet

Multi-Armed Bandits with Metric Movement Costs
Tomer Koren, Roi Livni, Yishay Mansour

Multiscale Quantization for Fast Similarity Search
Xiang Wu, Ruiqi Guo, Ananda Theertha Suresh, Sanjiv Kumar, Daniel Holtmann-Rice, David Simcha, Felix Yu

Reducing Reparameterization Gradient Variance
Andrew Miller, Nicholas Foti, Alexander D'Amour, Ryan Adams

Statistical Cost Sharing
Eric Balkanski, Umar Syed, Sergei Vassilvitskii

The Unreasonable Effectiveness of Structured Random Orthogonal Embeddings
Krzysztof Choromanski, Mark Rowland, Adrian Weller

Value Prediction Network
Junhyuk Oh, Satinder Singh, Honglak Lee

REBAR: Low-variance, unbiased gradient estimates for discrete latent variable models
George Tucker, Andriy Mnih, Chris J Maddison, Dieterich Lawson, Jascha Sohl-Dickstein

Approximation and Convergence Properties of Generative Adversarial Learning
Shuang Liu, Olivier Bousquet, Kamalika Chaudhuri

Attention is All you Need
Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, Illia Polosukhin

PASS-GLM: polynomial approximate sufficient statistics for scalable Bayesian GLM inference
Jonathan Huggins, Ryan Adams, Tamara Broderick

Repeated Inverse Reinforcement Learning
Kareem Amin, Nan Jiang, Satinder Singh

Fair Clustering Through Fairlets
Flavio Chierichetti, Ravi Kumar, Silvio Lattanzi, Sergei Vassilvitskii

Affine-Invariant Online Optimization and the Low-rank Experts Problem
Tomer Koren, Roi Livni

Batch Renormalization: Towards Reducing Minibatch Dependence in Batch-Normalized Models
Sergey Ioffe

Bridging the Gap Between Value and Policy Based Reinforcement Learning
Ofir Nachum, Mohammad Norouzi, Kelvin Xu, Dale Schuurmans

Discriminative State Space Models
Vitaly Kuznetsov, Mehryar Mohri

Dynamic Revenue Sharing
Santiago Balseiro, Max Lin, Vahab Mirrokni, Renato Leme, Song Zuo

Multi-view Matrix Factorization for Linear Dynamical System Estimation
Mahdi Karami, Martha White, Dale Schuurmans, Csaba Szepesvari

On Blackbox Backpropagation and Jacobian Sensing
Krzysztof Choromanski, Vikas Sindhwani

On the Consistency of Quick Shift
Heinrich Jiang

Revenue Optimization with Approximate Bid Predictions
Andres Munoz, Sergei Vassilvitskii

Shape and Material from Sound
Zhoutong Zhang, Qiujia Li, Zhengjia Huang, Jiajun Wu, Josh Tenenbaum, Bill Freeman

Learning to See Physics via Visual De-animation
Jiajun Wu, Erika Lu, Pushmeet Kohli, Bill Freeman, Josh Tenenbaum

Conference Demos
Electronic Screen Protector with Efficient and Robust Mobile Vision
Hee Jung Ryu, Florian Schroff

Magenta and deeplearn.js: Real-time Control of Deep Generative Music Models in the Browser
Curtis Hawthorne, Ian Simon, Adam Roberts, Jesse Engel, Daniel Smilkov, Nikhil Thorat, Douglas Eck

Workshops
6th Workshop on Automated Knowledge Base Construction (AKBC) 2017
Program Committee includes: Arvind Neelakantan
Authors include: Jiazhong Nie, Ni Lao

Acting and Interacting in the Real World: Challenges in Robot Learning
Invited Speakers include: Pierre Sermanet

Advances in Approximate Bayesian Inference
Panel moderator: Matthew D. Hoffman

Conversational AI - Today's Practice and Tomorrow's Potential
Invited Speakers include: Matthew Henderson, Dilek Hakkani-Tur
Organizers include: Larry Heck

Extreme Classification: Multi-class and Multi-label Learning in Extremely Large Label Spaces
Invited Speakers include: Ed Chi, Mehryar Mohri

Learning in the Presence of Strategic Behavior
Invited Speakers include: Mehryar Mohri
Presenters include: Andres Munoz Medina, Sebastien Lahaie, Sergei Vassilvitskii, Balasubramanian Sivan

Learning on Distributions, Functions, Graphs and Groups
Invited speakers include: Corinna Cortes

Machine Deception
Organizers include: Ian Goodfellow
Invited Speakers include: Jacob Buckman, Aurko Roy, Colin Raffel, Ian Goodfellow

Machine Learning and Computer Security
Invited Speakers include: Ian Goodfellow
Organizers include: Nicolas Papernot
Authors include: Jacob Buckman, Aurko Roy, Colin Raffel, Ian Goodfellow

Machine Learning for Creativity and Design
Keynote Speakers include: Ian Goodfellow
Organizers include: Doug Eck, David Ha

Machine Learning for Audio Signal Processing (ML4Audio)
Authors include: Aren Jansen, Manoj Plakal, Dan Ellis, Shawn Hershey, Channing Moore, Rif A. Saurous, Yuxuan Wang, RJ Skerry-Ryan, Ying Xiao, Daisy Stanton, Joel Shor, Eric Battenberg, Rob Clark

Machine Learning for Health (ML4H)
Organizers include: Jasper Snoek, Alex Wiltschko
Keynote: Fei-Fei Li

NIPS Time Series Workshop 2017
Organizers include: Vitaly Kuznetsov
Authors include: Brendan Jou

OPT 2017: Optimization for Machine Learning
Organizers include: Sashank Reddi

ML Systems Workshop
Invited Speakers include: Rajat Monga, Alexander Mordvintsev, Chris Olah, Jeff Dean
Authors include: Alex Beutel, Tim Kraska, Ed H. Chi, D. Sculley, Michael Terry

Aligned Artificial Intelligence
Invited Speakers include: Ian Goodfellow

Bayesian Deep Learning
Organizers include: Kevin Murphy
Invited speakers include: Nal Kalchbrenner, Matthew D. Hoffman

BigNeuro 2017
Invited speakers include: Viren Jain

Cognitively Informed Artificial Intelligence: Insights From Natural Intelligence
Authors include: Jiazhong Nie, Ni Lao

Deep Learning At Supercomputer Scale
Organizers include: Erich Elsen, Zak Stone, Brennan Saeta, Danijar Hafner

Deep Learning: Bridging Theory and Practice
Invited Speakers include: Ian Goodfellow

Interpreting, Explaining and Visualizing Deep Learning
Invited Speakers include: Been Kim, Honglak Lee
Authors include: Pieter-Jan Kindermans, Sara Hooker, Dumitru Erhan, Been Kim

Learning Disentangled Features: from Perception to Control
Organizers include: Honglak Lee
Authors include: Jasmine Hsu, Arkanath Pathak, Abhinav Gupta, James Davidson, Honglak Lee

Learning with Limited Labeled Data: Weak Supervision and Beyond
Invited Speakers include: Ian Goodfellow

Machine Learning on the Phone and other Consumer Devices
Invited Speakers include: Rajat Monga
Organizers include: Hrishikesh Aradhye
Authors include: Suyog Gupta, Sujith Ravi

Optimal Transport and Machine Learning
Organizers include: Olivier Bousquet

The future of gradient-based machine learning software & techniques
Organizers include: Alex Wiltschko, Bart van Merriënboer

Workshop on Meta-Learning
Organizers include: Hugo Larochelle
Panelists include: Samy Bengio
Authors include: Aliaksei Severyn, Sascha Rothe

Symposiums
Deep Reinforcement Learning Symposium
Authors include: Benjamin Eysenbach, Shane Gu, Julian Ibarz, Sergey Levine

Interpretable Machine Learning
Authors include: Minmin Chen

Metalearning
Organizers include: Quoc V Le

Competitions
Adversarial Attacks and Defences
Organizers include: Alexey Kurakin, Ian Goodfellow, Samy Bengio

Competition IV: Classifying Clinically Actionable Genetic Mutations
Organizers include: Wendy Kan

Tutorial
Fairness in Machine Learning
Solon Barocas, Moritz Hardt


Machine learning gives environmentalists something to tweet about

Editor’s note: TensorFlow, our open source machine learning library, is just that—open to anyone. Companies, nonprofits, researchers and developers have used TensorFlow in some pretty cool ways, and we’re sharing those stories here on Keyword. Here’s one of them.


Victor Anton captured tens of thousands of birdsong recordings, collected over a three-year period. But he had no way to figure out which birdsong belonged to what bird.

The recordings, taken at 50 locations around a bird sanctuary in New Zealand known as “Zealandia,” were part of an effort to better understand the movement and numbers of threatened species including the Hihi, Tīeke and Kākāriki. Because researchers didn’t have reliable information about where the birds were and how they moved about, it was difficult to make good decisions about where to target conservation efforts on the ground.

Endangered species include the Kākāriki, Hihi and Tīeke.

That’s where the recordings come in. Yet the amount of audio data was overwhelming. So Victor—a Ph.D. student at Victoria University of Wellington, New Zealand—and his team turned to technology.

“We knew we had lots of incredibly valuable data tied up in the recordings, but we simply didn’t have the manpower or a viable solution that would help us unlock this,” Victor tells us. “So we turned to machine learning to help us.”

Some of the audio recorders set up at 50 sites around the sanctuary.

In one of the quirkier applications of machine learning, they trained a Google TensorFlow-based system to recognize specific bird calls and measure bird activity. The more audio it deciphered, the more it learned, and the more accurate it became.


It worked like this: the AI system used audio that had been recorded and stored, chopping it into minute-long segments, and then converting the file into a spectrogram. After the spectrograms were chopped into chunks, each spanning less than a second, they were processed individually by a deep convolutional neural network. A recurrent neural network then tied together the chunks and produced a continual prediction of which of the three birds was present across the minute-long segment. These segments were compiled to create a fuller picture about the presence and movement of the birds.
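To make that pipeline concrete, here is a minimal sketch in TensorFlow (Keras) of the kind of per-chunk convolutional network feeding a recurrent layer that the description above implies. The chunk count, spectrogram dimensions, layer sizes and class list are illustrative assumptions, not the team's actual configuration.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

# Illustrative dimensions (assumptions, not the project's real settings):
# a one-minute recording is split into 75 chunks of under a second, each
# rendered as a 64-band spectrogram with 96 time frames.
NUM_CHUNKS, FRAMES, MEL_BINS = 75, 96, 64
NUM_CLASSES = 4  # Hihi, Tieke, Kakariki, plus a "background" class

def build_birdsong_model():
    # Per-chunk convolutional feature extractor.
    cnn = models.Sequential([
        layers.Conv2D(16, 3, activation="relu", padding="same",
                      input_shape=(FRAMES, MEL_BINS, 1)),
        layers.MaxPooling2D(2),
        layers.Conv2D(32, 3, activation="relu", padding="same"),
        layers.MaxPooling2D(2),
        layers.GlobalAveragePooling2D(),
    ])

    # One minute-long segment is a sequence of spectrogram chunks.
    segment = layers.Input(shape=(NUM_CHUNKS, FRAMES, MEL_BINS, 1))
    # Run the same CNN on every chunk, then let a recurrent layer tie the
    # chunks together into a per-chunk prediction across the segment.
    chunk_features = layers.TimeDistributed(cnn)(segment)
    context = layers.Bidirectional(layers.GRU(64, return_sequences=True))(chunk_features)
    per_chunk_probs = layers.TimeDistributed(
        layers.Dense(NUM_CLASSES, activation="softmax"))(context)
    return models.Model(segment, per_chunk_probs)

model = build_birdsong_model()
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
model.summary()
```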

TensorFlow processed the spectrograms and learned to identify the calls of different species.

The team faced some unique challenges: they were starting with a small quantity of labelled data; the software would often pick up other noises, like construction, cars and even doorbells; and some bird species had a variety of songs, or two would sing at the same time.

To overcome these hurdles, they tested, verified and retrained the system many times over. As a result, they have learned things that would have otherwise remained locked up in thousands of hours of data. While it’s still early days, already conservation groups are talking to Victor about how they can use these initial results to better target their efforts. Moreover, the team has seen enough encouraging signs that they believe that their tools can be applied to other conservation projects.

“We are only just beginning to understand the different ways we can put machine learning to work in helping us protect different fauna,” says Victor, “ultimately allowing us to solve other environmental challenges across the world.”

Introducing AIY Vision Kit: Make devices that see

Earlier this year, we kicked off AIY Projects to help makers experiment with and learn about artificial intelligence. Our first release, AIY Voice Kit, was a huge hit! People built many amazing projects, showing what was possible with voice recognition in maker projects.

Today, we’re excited to announce our latest AIY Project, the Vision Kit. It’s our first project that features on-device neural network acceleration, providing powerful computer vision without a cloud connection.  

AIY Vision Kit's do-it-yourself assembly

What’s in the AIY Vision Kit?

Like AIY Voice Kit (released in May), Vision Kit is a do-it-yourself build. You’ll need to add a Raspberry Pi Zero W, a Raspberry Pi Camera, an SD card and a power supply, which must be purchased separately.

The kit includes a cardboard outer shell, the VisionBonnet circuit board, an RGB arcade-style button, a piezo speaker, a macro/wide lens kit, a tripod mounting nut and other connecting components.

AIY Vision Kit components

The main component of AIY Vision Kit is the VisionBonnet board for Raspberry Pi. The bonnet features the Intel® Movidius™ MA2450, a low-power vision processing unit capable of running neural network models on-device.

AIY Vision Kit's VisionBonnet accessory for Raspberry Pi

The provided software includes three TensorFlow-based neural network models for different vision applications. One, based on MobileNets, can recognize a thousand common objects; a second can recognize faces and their expressions; and the third is a person, cat and dog detector. We've also included a tool to compile models for Vision Kit, so you can train and retrain models with TensorFlow on your workstation or any cloud service.

We also provide a Python API that gives you the ability to change the RGB button colors, adjust the piezo element sounds and access the four GPIO pins.
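As an illustration of the kind of script this enables, here is a hedged sketch of a simple detector loop. The module, class and method names below (vision_kit, DogCatHumanDetector, set_color, play_tone and so on) are hypothetical placeholders, not the kit's actual Python API.

```python
# A minimal sketch of the kind of script the kit's Python API enables.
# All module and function names here are hypothetical placeholders.
import time
import vision_kit  # hypothetical package name

detector = vision_kit.DogCatHumanDetector()  # the bundled person/cat/dog model
button = vision_kit.Button()                 # RGB arcade-style button
speaker = vision_kit.Speaker()               # piezo speaker

while True:
    frame = vision_kit.camera.capture()      # one frame from the Pi Camera
    detections = detector.run(frame)         # on-device inference on the VisionBonnet
    if any(d.label == "dog" and d.score > 0.7 for d in detections):
        button.set_color(255, 0, 0)                           # button turns red
        speaker.play_tone(frequency_hz=880, duration_s=0.2)   # short beep
    else:
        button.set_color(0, 0, 255)
    time.sleep(0.5)
```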

With all of these features, you can explore many creative builds that use computer vision. For example, you can:


  • Identify all kinds of plant and animal species

  • See when your dog is at the back door

  • See when your car left the driveway

  • See that your guests are delighted by your holiday decorations

  • See when your little brother comes into your room (sound the alarm!)

Where can you get it?

AIY Vision Kit will be available in stores in early December. Pre-order your kit today through Micro Center.

** Please note that full assembly requires Raspberry Pi Zero W, Raspberry Pi Camera and a micro SD card, which must be purchased separately.

We're listening

Please let us know how we can improve on future kits and show us what you’re building by using the #AIYProjects hashtag on social media.

We’re excited to see what you build!

Source: Education


The new maker toolkit: IoT, AI and Google Cloud Platform

Voice interaction is everywhere these days—via phones, TVs, laptops and smart home devices that use technology like the Google Assistant. And with the availability of maker-friendly offerings like Google AIY’s Voice Kit, the maker community has been getting in on the action and adding voice to their Internet of Things (IoT) projects.

As avid makers ourselves, we wrote an open-source, maker-friendly tutorial to show developers how to piggyback on a Google Assistant-enabled device (Google Home, Pixel, Voice Kit, etc.) and add voice to their own projects. We also created an example application to help you connect your project with GCP-hosted web and mobile applications, or tap into sophisticated AI frameworks that can provide more natural conversational flow.

Let’s take a look at what this tutorial, and our example application, can help you do.

Particle Photon: the brains of the operation

The Photon microcontroller from Particle is an easy-to-use IoT prototyping board that comes with onboard Wi-Fi and USB support, and is compatible with the popular Arduino ecosystem. It’s also a great choice for internet-enabled projects: every Photon gets its own webhook in Particle Cloud, and Particle provides a host of additional integration options with its web-based IDE, JavaScript SDK and command-line interface. Most importantly for the maker community, Particle Photons are super affordable, starting at just $19.


Connecting the Google Assistant and Photon: Actions on Google and Dialogflow

The Google Assistant (via Google Home, Pixel, Voice Kit, etc.) responds to your voice input, and the Photon (through Particle Cloud) reacts to your application’s requests (in this case, turning an LED on and off). But how do you tie the two together? Let’s take a look at all the moving parts:


  • Actions on Google is the developer platform for the Google Assistant. With Actions on Google, developers build apps to help answer specific queries and connect users to products and services. Users interact with apps for the Assistant through a conversational, natural-sounding back-and-forth exchange, and your Action passes those user requests on to your app.

  • Dialogflow (formerly API.AI) lets you build even more engaging voice and text-based conversational interfaces powered by AI, and sends out request data via a webhook.

  • A server (or service) running Node.js handles the resulting user queries.


Along with some sample applications, our guide includes a Dialogflow agent, which lets you parse queries and route actions back to users (by voice and/or text) or to other applications. Dialogflow provides a variety of interface options, from an easy-to-use web-based GUI to a robust Node.js-powered SDK for interacting with both your queries and the outside world. In addition, its powerful machine learning tools add intelligence and natural language processing. Your applications can learn queries and intents over time, exposing even more powerful options for making and providing better results along the way. (The recently announced Dialogflow Enterprise Edition offers greater flexibility and support to meet the needs of large-scale businesses.)
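The tutorial's sample services are written in Node.js; purely as an illustration of the same request flow, here is a hedged Python (Flask) sketch of a webhook that receives a Dialogflow fulfillment request and toggles the Photon's LED through the Particle Cloud REST API. The device ID, access token, cloud function name and parameter name are placeholders, and the request/response fields follow the older v1-style payload, so check them against the Dialogflow version you target.

```python
import os
import requests
from flask import Flask, request, jsonify

app = Flask(__name__)

PARTICLE_DEVICE_ID = os.environ["PARTICLE_DEVICE_ID"]  # placeholder
PARTICLE_TOKEN = os.environ["PARTICLE_ACCESS_TOKEN"]   # placeholder
PARTICLE_FUNC = "led"  # assumes the Photon firmware exposes a cloud function named "led"

@app.route("/webhook", methods=["POST"])
def webhook():
    body = request.get_json(force=True)
    # Dialogflow's v1-style payload puts matched parameters under result.parameters;
    # the field names below ("led_state") are placeholders for your own agent.
    state = body.get("result", {}).get("parameters", {}).get("led_state", "off")

    # Call the Particle Cloud REST API to invoke the function on the Photon.
    resp = requests.post(
        f"https://api.particle.io/v1/devices/{PARTICLE_DEVICE_ID}/{PARTICLE_FUNC}",
        data={"arg": state, "access_token": PARTICLE_TOKEN},
        timeout=10,
    )
    reply = (f"Turning the LED {state}." if resp.ok
             else "Sorry, I couldn't reach the device.")
    return jsonify({"speech": reply, "displayText": reply})

if __name__ == "__main__":
    app.run(port=8080)
```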


Backend infrastructure: GCP

It’s a no-brainer to build your IoT apps on a Google Cloud Platform (GCP) backend, as you can use a single Google account to sign into your voice device, create Actions on Google apps and Dialogflow agents, and host the web services. To help get you up and running, we developed two sample web applications based on different GCP technologies that you can use as inspiration when creating a voice-powered IoT app:


  • Cloud Functions for Firebase. If your goal is quick deployment and iteration, Cloud Functions for Firebase is a simple, low-cost and powerful option—even if you don’t have much server-side development experience. It integrates quickly and easily with the other tools used here. Dialogflow, for example, now allows you to drop Cloud Functions for Firebase code directly into its graphical user interface.

  • App Engine. For those of you with more development experience and/or curiosity, App Engine is just as easy to deploy and scale, but includes more options for integrations with your other applications, additional programming language/framework choices, and a host of third-party add-ons. App Engine is a great choice if you already have a Node.js application to which you want to add voice actions, you want to tie into more of Google’s machine learning services, or you want to get deeper into device connection and management.


Next steps

As makers, we’ve only just scratched the surface of what we can do with these new tools like IoT, AI and cloud. Check out our full tutorials, and grab the code on Github. With these examples to build from, we hope we’ve made it easier for you to add voice powers to your maker project. For some extra inspiration, check out what other makers have built with AIY Voice Kit. And for even more ways to add machine learning to your maker project, check out the AIY Vision Kit, which just went on pre-sale today.

We can’t wait to see what you build!

Source: Google Cloud


Introducing the AIY Vision Kit: Add computer vision to your maker projects

Posted by Billy Rutledge, Director, AIY Projects

Since we released AIY Voice Kit, we've been inspired by the thousands of amazing builds coming in from the maker community. Today, the AIY Team is excited to announce our next project: the AIY Vision Kit — an affordable, hackable, intelligent camera.

Much like the Voice Kit, our Vision Kit is easy to assemble and connects to a Raspberry Pi computer. Based on user feedback, this new kit is designed to work with the smaller Raspberry Pi Zero W computer and runs its vision algorithms on-device so there's no cloud connection required.

Build intelligent devices that can perceive, not just see

The kit materials list includes a VisionBonnet, a cardboard outer shell, an RGB arcade-style button, a piezo speaker, a macro/wide lens kit, flex cables, standoffs, a tripod mounting nut and connecting components.

The VisionBonnet is an accessory board for Raspberry Pi Zero W that features the Intel® Movidius™ MA2450, a low-power vision processing unit capable of running neural networks. This will give makers visual perception instead of image sensing. It can run at speeds of up to 30 frames per second, providing near real-time performance.

Bundled with the software image are three neural network models:

  • A model based on MobileNets that can recognize a thousand common objects.
  • A model for face detection capable of not only detecting faces in the image, but also scoring facial expressions on a "joy scale" that ranges from "sad" to "laughing."
  • A model for the important task of discerning between cats, dogs and people.

For those of you who have your own models in mind, we've included the original TensorFlow code and a compiler. Take a new model you have (or train) and run it on the Intel® Movidius™ MA2450.
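The exact compiler invocation is beyond the scope of this post, but a common first step with TF 1.x-era models is to freeze the trained graph into a single GraphDef file before handing it to a compiler. A rough sketch, with the checkpoint path and output node name as placeholder assumptions:

```python
# Sketch of freezing a trained TensorFlow (1.x-era) model into a single
# GraphDef file, the usual input format for on-device model compilers.
# The checkpoint path and output node name below are placeholders.
import tensorflow as tf

CHECKPOINT = "my_model/model.ckpt"                  # placeholder checkpoint
OUTPUT_NODE = "MobilenetV1/Predictions/Reshape_1"   # placeholder output tensor name

saver = tf.train.import_meta_graph(CHECKPOINT + ".meta")
with tf.Session() as sess:
    saver.restore(sess, CHECKPOINT)
    # Bake the trained variables into constants so the graph is self-contained.
    frozen = tf.graph_util.convert_variables_to_constants(
        sess, tf.get_default_graph().as_graph_def(), [OUTPUT_NODE])

with tf.gfile.GFile("frozen_graph.pb", "wb") as f:
    f.write(frozen.SerializeToString())
# frozen_graph.pb would then be passed to the kit's bundled compiler.
```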

Extend the kit to solve your real-world problems

The AIY Vision Kit is completely hackable:

  • Want to prototype your own product? The Vision Kit and the Raspberry Pi Zero W can fit into any number of tiny enclosures.
  • Want to change the way the camera reacts? Use the Python API to write new software to customize the RGB button colors, piezo element sounds and GPIO pins.
  • Want to add more lights, buttons, or servos? Use the 4 GPIO expansion pins to connect your own hardware.

We hope you'll use it to solve interesting challenges, such as:

  • Build "hotdog/not hotdog" (or any other food recognizer)
  • Turn music on when someone walks through the door
  • Send a text when your car leaves the driveway
  • Open the dog door when she wants to get back in the house

Ready to get your hands on one?

AIY Vision Kits will be available in December, with online pre-sales at Micro Center starting today.

*** Please note that AIY Vision Kit requires Raspberry Pi Zero W, Raspberry Pi Camera V2 and a micro SD card, which must be purchased separately.

Tell us what you think!

We're listening — let us know how we can improve our kits and share what you're making using the #AIYProjects hashtag on social media. We hope AIY Vision Kit inspires you to build all kinds of creative devices.

Interpreting Deep Neural Networks with SVCCA



Deep Neural Networks (DNNs) have driven unprecedented advances in areas such as vision, language understanding and speech recognition. But these successes also bring new challenges. In particular, in contrast to many previous machine learning methods, DNNs can be susceptible to adversarial examples in classification, catastrophic forgetting of tasks in reinforcement learning, and mode collapse in generative modelling. In order to build better and more robust DNN-based systems, it is critically important to be able to interpret these models. In particular, we would like a notion of representational similarity for DNNs: can we effectively determine when the representations learned by two neural networks are the same?

In our paper, “SVCCA: Singular Vector Canonical Correlation Analysis for Deep Learning Dynamics and Interpretability,” we introduce a simple and scalable method to address these points. Two specific applications of this that we look at are comparing the representations learned by different networks, and interpreting representations learned by hidden layers in DNNs. Furthermore, we are open sourcing the code so that the research community can experiment with this method.

Key to our setup is the interpretation of each neuron in a DNN as an activation vector. As shown in the figure below, the activation vector of a neuron is the scalar output it produces on the input data. For example, for 50 input images, a neuron in a DNN will output 50 scalar values, encoding how much it responds to each input. These 50 scalar values then make up an activation vector for the neuron. (Of course, in practice, we take many more than 50 inputs.)
Here a DNN is given three inputs, x1, x2, x3. Looking at a neuron inside the DNN (bolded in red, right pane), this neuron produces a scalar output zi corresponding to each input xi. These values form the activation vector of the neuron.
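As a minimal illustration of collecting such activation vectors with tf.keras, the sketch below runs a batch of inputs through one hidden layer of a model and stacks the responses into a (neurons × datapoints) matrix; the model, layer choice and inputs are placeholders.

```python
import numpy as np
import tensorflow as tf

# Placeholder model and inputs; substitute your own trained network and data.
model = tf.keras.applications.MobileNet(weights=None)
inputs = np.random.rand(50, 224, 224, 3).astype("float32")   # 50 example inputs

# Build a sub-model that outputs a chosen hidden layer, then run the inputs
# through it. After reshaping, each row of `acts` is one neuron's activation
# vector over the 50 inputs.
layer_name = model.layers[-3].name                            # placeholder layer choice
extractor = tf.keras.Model(model.input, model.get_layer(layer_name).output)
features = extractor.predict(inputs)                          # per-input layer outputs
acts = features.reshape(len(inputs), -1).T                    # (num_neurons, 50)
print(acts.shape)
```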
With this basic observation and a little more formulation, we introduce Singular Vector Canonical Correlation Analysis (SVCCA), a technique for taking in two sets of neurons and outputting aligned feature maps learned by both of them. Critically, this technique accounts for superficial differences such as permutations in neuron orderings (crucial for comparing different networks), and can detect similarities where other, more straightforward comparisons fail.
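To make the two-step procedure concrete, here is a rough NumPy sketch of the core SVCCA computation on two activation matrices of shape (neurons × datapoints), like the one built above. The variance threshold and normalization details follow the paper only loosely, and regularization is omitted; the open-sourced code is the authoritative reference.

```python
import numpy as np

def svcca(acts1, acts2, var_kept=0.99):
    """Rough sketch of SVCCA between two sets of activation vectors.

    acts1, acts2: arrays of shape (num_neurons, num_datapoints), each row
    being one neuron's activation vector over the same inputs.
    Returns the mean canonical correlation of the aligned directions.
    """
    def svd_reduce(acts):
        acts = acts - acts.mean(axis=1, keepdims=True)      # center each neuron
        U, s, Vt = np.linalg.svd(acts, full_matrices=False)
        # Keep the top singular directions explaining `var_kept` of the variance.
        k = np.searchsorted(np.cumsum(s**2) / np.sum(s**2), var_kept) + 1
        return np.diag(s[:k]) @ Vt[:k]                       # (k, num_datapoints)

    x, y = svd_reduce(acts1), svd_reduce(acts2)

    # CCA between the two reduced subspaces via whitening plus an SVD of the
    # cross-covariance (a standard formulation; regularization omitted).
    def whiten(z):
        cov = z @ z.T / z.shape[1]
        evals, evecs = np.linalg.eigh(cov)
        return evecs @ np.diag(evals**-0.5) @ evecs.T @ z

    xw, yw = whiten(x), whiten(y)
    corrs = np.linalg.svd(xw @ yw.T / xw.shape[1], compute_uv=False)
    return float(np.mean(np.clip(corrs, 0.0, 1.0)))

# Example: compare two random "layers" evaluated on the same 500 inputs.
rng = np.random.default_rng(0)
layer_a = rng.normal(size=(64, 500))
layer_b = 0.5 * layer_a[:32] + rng.normal(size=(32, 500))  # partially shared structure
print("mean SVCCA correlation:", svcca(layer_a, layer_b))
```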

As an example, consider training two convolutional neural nets (net1 and net2, below) on CIFAR-10, a medium-scale image classification task. To visualize the results of our method, we compare activation vectors of neurons with the aligned features output by SVCCA. Recall that the activation vector of a neuron is its raw scalar outputs on the input images. The x-axis of the plot consists of images sorted by class (gray dotted lines showing class boundaries), and the y-axis shows the output value of the neuron.
On the left pane, we show the two highest-activation (largest Euclidean norm) neurons in net1 and net2. Examining the highest-activation neurons has been a popular method to interpret DNNs in computer vision, but in this case the highest-activation neurons in net1 and net2 have no clear correspondence, despite both being trained on the same task. However, after applying SVCCA (right pane), we see that the latent representations learned by both networks do indeed share some very similar features. Note that the top two rows representing aligned feature maps are close to identical, as are the second highest aligned feature maps (bottom two rows). Furthermore, these aligned mappings in the right pane also show a clear correspondence with the class boundaries, e.g. we see the top pair give negative outputs for Class 8, with the bottom pair giving a positive output for Class 2 and Class 7.

While SVCCA can be applied across networks, it can also be applied within a single network across training time, enabling the study of how different layers converge to their final representations. Below, we show panes that compare the representation of layers in net1 during training (y-axes) with the layers at the end of training (x-axes). For example, in the top left pane (titled “0% trained”), the x-axis shows layers of increasing depth of net1 at 100% trained, and the y-axis shows layers of increasing depth at 0% trained. Each (i,j) square then tells us how similar the representation of layer i at 100% trained is to layer j at 0% trained. The input layer is at the bottom left, and is (as expected) identical at 0% and 100%. We make this comparison at several points through training, at 0%, 35%, 75% and 100%, for convolutional (top row) and residual (bottom row) nets on CIFAR-10.
Plots showing learning dynamics of convolutional and residual networks on CIFAR-10. Note the additional structure also visible: the 2x2 blocks in the top row are due to batch norm layers, and the checkered pattern in the bottom row due to residual connections.
We find evidence of bottom-up convergence, with layers closer to the input converging first, and layers higher up taking longer to converge. This suggests a faster training method, Freeze Training — see our paper for details. Furthermore, this visualization also helps highlight properties of the network. In the top row, there are a couple of 2x2 blocks. These correspond to batch normalization layers, which are representationally identical to their previous layers. On the bottom row, towards the end of training, we can see a checkerboard like pattern appear, which is due to the residual connections of the network having greater similarity to previous layers.

So far, we’ve concentrated on applying SVCCA to CIFAR-10. But by applying preprocessing techniques based on the Discrete Fourier Transform, we can scale this method to ImageNet-sized models. We applied this technique to the ImageNet ResNet, comparing the similarity of latent representations to representations corresponding to different classes:
SVCCA similarity of latent representations with different classes. We take different layers in the ImageNet ResNet, with 0 indicating input and 74 indicating output, and compare representational similarity of the hidden layer and the output class. Interestingly, different classes are learned at different speeds: the firetruck class is learned faster than the different dog breeds. Furthermore, the two pairs of dog breeds (a husky-like pair and a terrier-like pair) are learned at the same rate, reflecting the visual similarity between them.
Our paper gives further details on the results we’ve explored so far, and also touches on different applications, e.g. compressing DNNs by projecting onto the SVCCA outputs, and Freeze Training, a computationally cheaper method for training deep networks. There are many followups we’re excited about exploring with SVCCA — moving on to different kinds of architectures, comparing across datasets, and better visualizing the aligned directions are just a few ideas we’re eager to try out. We look forward to presenting these results next week at NIPS 2017 in Long Beach, and we hope the code will also encourage many people to apply SVCCA to their network representations to interpret and understand what their network is learning.

Learn more about the world around you with Google Lens and the Assistant

Looking at a landmark and not sure what it is? Interested in learning more about a movie as you stroll by the poster? With Google Lens and your Google Assistant, you now have a helpful sidekick to tell you more about what’s around you, right on your Pixel.

When we introduced the new Pixel 2 last month, we talked about how Google Lens builds on Google’s advancements in computer vision and machine learning. When you combine that with the Google Assistant, which is built on many of the same technologies, you can get quick help with what you see. That means that you can learn more about what’s in front of you—in real time—by selecting the Google Lens icon and tapping on what you’re interested in.


Here are the key ways your Assistant and Google Lens can help you today:


  • Text: Save information from business cards, follow URLs, call phone numbers and navigate to addresses.
  • Landmarks: Explore a new city like a pro with your Assistant to help you recognize landmarks and learn about their history.
  • Art, books and movies: Learn more about a movie, from the trailer to reviews, right from the poster. Look up a book to see the rating and a short synopsis. Become a museum guru by quickly looking up an artist’s info and more. You can even add events, like the movie release date or gallery opening, to your calendar right from Google Lens.
  • Barcodes: Quickly look up products by barcode, or scan QR codes, all with your Assistant.


Google Lens in the Assistant will be rolling out to all Pixel phones set to English in the U.S., U.K., Australia, Canada, India and Singapore over the coming weeks. Once you get the update, go to your Google Assistant on your phone and tap the Google Lens icon in the bottom right corner.


We can’t wait to see how Google Lens helps you explore the world around you, with the help of your Google Assistant. And don’t forget, Google Lens is also available in Google Photos, so even after you take a picture, you can continue to explore and get more information about what’s in your photo.

Posted by Ibrahim Badr, Associate Product Manager, Google Assistant

An AI Resident at work: Suhani Vora and her work on genomics

Suhani Vora is a bioengineer, aspiring (and self-taught) machine learning expert, SNES Super Mario World ninja, and Google AI Resident. This means that she’s part of a 12-month research training program designed to jumpstart a career in machine learning. Residents, who are paired with Google AI mentors to work on research projects according to their interests, apply machine learning to their expertise in various backgrounds—from computer science to epidemiology.

I caught up with Suhani to hear more about her work as an AI Resident, her typical day, and how AI can help transform the field of genomics.

Phing: How did you get into machine learning research?

Suhani: During graduate school, I worked on engineering CRISPR/Cas9 systems, which enable a wide range of research on genomes. And though I was working with the most efficient tools available for genome editing, I knew we could make progress even faster.

One important factor was our limited ability to predict what novel biological designs would work. Each design cycle, we were only using very small amounts of previously collected data and relied on individual interpretation of that data to make design decisions in the lab.

By failing to incorporate more powerful computational methods to make use of big data and aid in the design process, it was affecting our ability to make progress quickly. Knowing that machine learning methods would greatly accelerate the speed of scientific discovery, I decided to work on finding ways to apply machine learning to my own field of genetic engineering.

I reached out to researchers in the field, asking how best to get started. A Googler I knew suggested I take the machine learning course by Andrew Ng on Coursera (could not recommend it more highly), so I did that. I’ve never had more fun learning! I had also started auditing an ML course at MIT, and reading papers on deep learning applications to problems in genomics. Ultimately, I took the plunge and ended up joining the Residency program after finishing grad school.

Tell us about your role at Google, and what you’re working on right now.

I’m a cross-disciplinary deep learning researcher—I research, code, and experiment with deep learning models to explore their applicability to problems in genomics.

In the same way that we use machine learning models to predict which objects are present in an image (think: searching for your dogs in Google Photos), I research ways we can build neural networks to automatically predict the properties of a DNA sequence. This has all kinds of applications, like predicting whether a DNA mutation is likely to cause cancer or is benign.
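As a toy illustration of that framing (not Suhani's actual models), a DNA sequence can be one-hot encoded and fed to a small 1D convolutional classifier in TensorFlow; the window length, architecture and binary "pathogenic vs. benign" label below are illustrative assumptions.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models

BASES = "ACGT"

def one_hot_dna(seq):
    """Encode a DNA string as a (length, 4) one-hot array."""
    idx = {b: i for i, b in enumerate(BASES)}
    out = np.zeros((len(seq), 4), dtype=np.float32)
    for i, base in enumerate(seq.upper()):
        if base in idx:                 # unknown bases (e.g. 'N') stay all-zero
            out[i, idx[base]] = 1.0
    return out

# A small 1D-convolutional classifier over fixed-length sequence windows;
# the binary label is used purely as an illustration.
SEQ_LEN = 1000
model = models.Sequential([
    layers.Conv1D(32, 12, activation="relu", input_shape=(SEQ_LEN, 4)),
    layers.MaxPooling1D(4),
    layers.Conv1D(64, 8, activation="relu"),
    layers.GlobalMaxPooling1D(),
    layers.Dense(64, activation="relu"),
    layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# Example: one random sequence window as a batch of size 1.
window = "".join(np.random.choice(list(BASES), SEQ_LEN))
print(model.predict(one_hot_dna(window)[np.newaxis, ...]).shape)  # (1, 1)
```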

What’s a typical day like for you?

On any given day, I’m writing code to process new genomics data, or creating a neural network in TensorFlow to model the data. Right now, a lot of my time is spent troubleshooting such models.

I also spend time chatting with fellow Residents, or a member of the TensorFlow team, to get their expertise on the experiments or code I’m writing. This could include a meeting with my two mentors, Mark DePristo and Quoc Le, top researchers in the field of machine learning who regularly provide invaluable guidance for developing the neural network models I’m interested in.

What do you like most about the AI Residency program? About working at Google?

I like the freedom to pursue topics of our interest, combined with the strong support network we have to get things done. Google is a really positive work environment, and I feel set up to succeed. In a different environment I wouldn’t have the chance to work with a world-class researcher in computational genomics like Mark, AND Quoc, one of the world’s leading machine learning researchers, at the same time and place. It’s pretty mind-blowing.

What kind of background do you need to work in machine learning?

We have such a wide array of backgrounds among our AI Residents! The only real common thread I see is a very strong desire to work on machine learning, or to apply machine learning to a particular problem of choice. I think having a strong background in linear algebra, statistics, computer science, and perhaps modeling makes things easier—but these skills are also now accessible to almost anyone with an interest, through MOOCs!

What kinds of problems do you think that AI can help solve for the world?

Ultimately, it really just depends how creative we are in figuring out what AI can do for us. Current deep learning methods have become state of the art for image recognition tasks, such as automatically detecting pets or scenes in images, and natural language processing, like translating from Chinese to English. I’m excited to see the next wave of applications in areas such as speech recognition, robotic handling, and medicine.

Interested in the AI Residency? Check out submission details and apply for the 2018 program on our Careers site.