Tag Archives: machine learning

Get ready for AI to help make your business more productive

Editor’s note: Companies are evaluating how to use artificial intelligence to transform how they work. Nicholas McQuire, analyst at CCS Insight, reflects on how businesses are using machine learning and assistive technologies to help employees be more productive. He also provides tangible takeaways on how enterprises can better prepare for the future of work.

Employees are drowning in a sea of data and sprawling digital tools, using an average of 6.1 mobile apps for work purposes today, according to a recent CCS Insight survey of IT decision-makers. Part of the reason we’ve seen a lag in macro productivity since the 2008 financial crisis is that we waste a lot of time doing mundane tasks, like searching for data, booking meetings and learning the ins and outs of complex software.

According to Harvard Business Review, wasted time and inefficient processes—what experts call "organizational drag"—cost the U.S. economy a staggering $3 trillion each year. Employees need more assistive and personalized technology to help them tackle organizational drag and work faster and smarter.

Over the next five years, artificial intelligence (AI) will change the way we work and, in the process, transform businesses.

The arrival of AI in the enterprise is quickening

I witnessed a number of proofs of concept in machine learning in 2017; many speech- and image-based cognitive applications are emerging in specific markets, like fraud detection in finance, low-level contract analysis in the legal sector and personalization in retail. There are also AI applications emerging in corporate functions such as IT support, human resources, sales and customer service.

This shows promise for the technology, particularly in the face of challenges like trust, complexity, security and the training required for machine learning systems. But it also suggests that the arrival of AI in enterprises could be moving more quickly than we think.

According to the same study, 58 percent of respondents said they are either using, trialing or researching the technology in their business. Decision-makers also said that on average, 29 percent of their applications will be enhanced with AI within the next two years—a remarkably bullish view.

New opportunities for businesses to evolve productivity

In this context, new AI capabilities pose exciting opportunities to evolve productivity and collaboration.

  • Assistive software: In the past year, assistive, cognitive features such as search, quicker access to documents, automated email replies and virtual assistants have become more prevalent in productivity software. These solutions help surface contextually relevant information for employees and can automate simple, time-consuming tasks, like scheduling meetings, creating help desk tickets, booking conference rooms or summarizing content. In the future, they might also help firms improve and manage employee engagement, a critical human resources and leadership challenge at the moment.
  • Natural language processing: It won’t be long before we also see the integration of voice or natural language processing in productivity apps. The rise of speech-controlled smart speakers such as Google Home, Amazon Echo or the recently launched Alexa for Business shows that creating and completing documents using speech dictation, or using natural language queries to parse data or control functions in spreadsheets, is no longer in the realm of science fiction.
  • Security: Perhaps one of the biggest uses of AI will be to protect company information. Companies are beginning to use AI to protect against spam, phishing and malware in email, as well as to combat the alarming rise of data breaches across the globe; the use of AI to detect threats and improve incident response will likely rise exponentially. Cloud security vendors with access to higher volumes of signals to train AI models are well placed to help businesses leverage early detection of threats. Perhaps this is why IT professionals listed cybersecurity as the most likely use of AI to be adopted in their organizations.

One thing to note: it’s important that enterprises introduce machine learning capabilities in productivity apps to their employees gradually, so as not to undermine the familiarity of the user experience or put employees off with fears of privacy violations. In this respect, the advent of AI in work activities resembles consumer apps like YouTube, Maps, Spotify or Amazon, where the technology is subtle enough that users may not even be aware of its cognitive features. The fact that 54 percent of employees in our survey stated they don't use AI in their personal lives, despite the widespread use of AI in these successful apps, is a telling illustration.

How your company can prepare for change

Businesses of all shapes and sizes need to prepare for one of the most important technology shifts of our generation. For those who have yet to get started, here are a few things to consider:

  1. Introduce your employees to AI in collaboration tools early. New, assistive AI features in collaboration software help employees get familiar with the technology and its benefits. Smart email, improved document access and search, chatbots and speech assistants will all be important and accessible technologies that can save employees time, improve workflows and enhance employee experiences.
  2. Take advantage of tools that use AI for data security. Rising data breaches and insider threats, coupled with the growing use of cloud and mobile applications, means the integrity of company data is consistently at risk. Security products that incorporate machine learning-based threat intelligence and anomaly detection should be a key priority.
  3. Don’t neglect change management. New collaboration tools that use AI have a high impact on organizational culture, but not all employees will be immediately supportive of this new way of working. While our surveys reveal employees are generally positive about AI, there is still much fear and confusion surrounding AI as a source of job displacement. Be mindful of change management, specifically the importance of good communication, training and, above all, employee engagement throughout the process.

AI will no doubt face some challenges over the next few years as it enters the workplace, but sentiment is shifting away from doom-and-gloom scenarios toward understanding how the technology can be used more effectively to assist humans and enable smarter work.

It will be fascinating to see how businesses and technology markets transform as AI matures in the coming years.

Source: Google Cloud


The makings of a smart cookie

Now that the holidays are in full swing, you’ve probably already dipped your hand into the cookie jar. You may have a favorite time-tested holiday cookie recipe, but this year we decided to mix up our seasonal baking with two new ingredients: a local bakery in Pittsburgh and our Google AI technology.

Over the past year, a small research team at Google has been experimenting with a new technology for experimental design. To demonstrate what this technology could do, our team came up with a real-world challenge: designing the best possible chocolate chip cookies using a given set of ingredients. Adding to the allure of this project was the fact that our team works out of Google’s Pittsburgh office, which was once an old Nabisco factory.

Using a technique called “Bayesian Optimization,” the team stepped away from their computers and rolled up their sleeves in the kitchen. First, we defined a set of (metaphorical) knobs—in this case, the ingredients in the cookie recipe: type of chocolate, quantities of sugar, flour, vanilla and so on. The ingredients provide enough unique variables to manipulate and measure, and the recipe is easy to replicate. Our system guessed at a first recipe to try. We baked it, and our eager taste-testers—Googlers ready and willing to sacrifice for science by eating the cookies—tasted it and gave it a numerical score relative to store-bought cookie samples. We fed that rating back into the system, which learned from the rating and adjusted those “knobs” to create a new recipe. We did this dozens of times—baking, rating and feeding the results back in for a new recipe—and pretty soon the system got much better at creating tasty recipes.
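
To make the loop concrete, here is a minimal sketch of this kind of human-in-the-loop Bayesian optimization using the open-source scikit-optimize library. The library choice, the ingredient ranges and the 1-to-10 scoring scale are illustrative assumptions, not the system the team actually used.

```python
# A hedged sketch of a bake-rate-refeed Bayesian optimization loop.
# The library (scikit-optimize), the knobs and the scoring scale are
# assumptions for illustration; Google's internal system is not shown here.
from skopt import Optimizer
from skopt.space import Categorical, Real

# The "knobs": ingredient choices and quantities for one batch of dough.
space = [
    Categorical(["dark", "milk", "white"], name="chocolate_type"),
    Real(100, 300, name="sugar_g"),
    Real(150, 350, name="flour_g"),
    Real(0.0, 10.0, name="vanilla_ml"),
    Real(0.0, 10.0, name="cardamom_g"),  # hypothetical spice knob
]

opt = Optimizer(space, base_estimator="GP", acq_func="EI")

for batch in range(20):                           # dozens of bake-and-rate cycles
    recipe = opt.ask()                            # the system proposes a recipe
    print(f"Batch {batch}: bake with {recipe}")
    score = float(input("Taste score (1-10): "))  # human taste-testers rate it
    opt.tell(recipe, -score)                      # the optimizer minimizes, so negate
```

Each `tell` call updates the Gaussian-process model of "ingredients in, tastiness out," so later `ask` calls concentrate on the most promising corners of recipe space.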

After coming up with a really good recipe within Google, we wanted to see what an expert could do with our “smart cookie.” So Chef John, our lead chef in the office teaching kitchen, introduced the team to Jeanette Harris of the Gluten Free Goat Bakery & Cafe. Jeanette was diagnosed with celiac disease more than 10 years ago, and she turned her passion for baking into an opportunity to offer treats to those who usually can’t partake. “When John came to me with the idea of creating an AI-generated cookie I didn’t know what to expect,” says Jeanette. “I run a small local bakery and take great care to ensure I’m providing safe, quality ingredients to my customers. But once the team took the time to explain what they were trying to do, I was all in!”

Working out of the Goat Bakery kitchen, Chef John and Jeanette mixed and matched some unusual ingredients like cardamom and Szechuan pepper, using the measurements provided by Google’s system. Two months and 59 test batches later, the culinary duo came up with a new take on the classic chocolate chip cookie: The Chocolate Chip and Cardamom Cookie.


“This was such a fun experiment! Being able to create something entirely new and different, with the help of AI, was so exciting and makes me wonder what other unique recipe concepts I can develop for my customers,” Jeanette says.

The smart cookie experiment is a taste of what’s possible with AI. We hope it gets you thinking about what kinds of things you can bake up with it.

Google at NIPS 2017



This week, Long Beach, California hosts the 31st annual Conference on Neural Information Processing Systems (NIPS 2017), a machine learning and computational neuroscience conference that includes invited talks, demonstrations and presentations of some of the latest in machine learning research. Google will have a strong presence at NIPS 2017, with over 450 Googlers attending to contribute to, and learn from, the broader academic research community via technical talks and posters, workshops, competitions and tutorials.

Google is at the forefront of machine learning, actively exploring virtually all aspects of the field from classical algorithms to deep learning and more. Focusing on both theory and application, much of our work on language understanding, speech, translation, visual processing, and prediction relies on state-of-the-art techniques that push the boundaries of what is possible. In all of those tasks and many others, we develop learning approaches to understand and generalize, providing us with new ways of looking at old problems and helping transform how we work and live.

If you are attending NIPS 2017, we hope you’ll stop by our booth and chat with our researchers about the projects and opportunities at Google that go into solving interesting problems for billions of people, and to see demonstrations of some of the exciting research we pursue. You can also learn more about our work being presented in the list below (Googlers highlighted in blue).

Google is a Platinum Sponsor of NIPS 2017.

Organizing Committee
Program Chair: Samy Bengio
Senior Area Chairs include: Corinna Cortes, Dale Schuurmans, Hugo Larochelle
Area Chairs include: Afshin Rostamizadeh, Amir Globerson, Been Kim, D. Sculley, Dumitru Erhan, Gal Chechik, Hartmut Neven, Honglak Lee, Ian Goodfellow, Jasper Snoek, John Wright, Jon Shlens, Kun Zhang, Lihong Li, Maya Gupta, Moritz Hardt, Navdeep Jaitly, Ryan Adams, Sally Goldman, Sanjiv Kumar, Surya Ganguli, Tara Sainath, Umar Syed, Viren Jain, Vitaly Kuznetsov

Invited Talk
Powering the next 100 years
John Platt

Accepted Papers
A Meta-Learning Perspective on Cold-Start Recommendations for Items
Manasi Vartak, Hugo Larochelle, Arvind Thiagarajan

AdaGAN: Boosting Generative Models
Ilya Tolstikhin, Sylvain Gelly, Olivier Bousquet, Carl-Johann Simon-Gabriel, Bernhard Schölkopf

Deep Lattice Networks and Partial Monotonic Functions
Seungil You, David Ding, Kevin Canini, Jan Pfeifer, Maya Gupta

From which world is your graph
Cheng Li, Varun Kanade, Felix MF Wong, Zhenming Liu

Hiding Images in Plain Sight: Deep Steganography
Shumeet Baluja

Improved Graph Laplacian via Geometric Self-Consistency
Dominique Joncas, Marina Meila, James McQueen

Model-Powered Conditional Independence Test
Rajat Sen, Ananda Theertha Suresh, Karthikeyan Shanmugam, Alexandros Dimakis, Sanjay Shakkottai

Nonlinear random matrix theory for deep learning
Jeffrey Pennington, Pratik Worah

Resurrecting the sigmoid in deep learning through dynamical isometry: theory and practice
Jeffrey Pennington, Samuel Schoenholz, Surya Ganguli

SGD Learns the Conjugate Kernel Class of the Network
Amit Daniely

SVCCA: Singular Vector Canonical Correlation Analysis for Deep Learning Dynamics and Interpretability
Maithra Raghu, Justin Gilmer, Jason Yosinski, Jascha Sohl-Dickstein

Learning Hierarchical Information Flow with Recurrent Neural Modules
Danijar Hafner, Alexander Irpan, James Davidson, Nicolas Heess

Online Learning with Transductive Regret
Scott Yang, Mehryar Mohri

Acceleration and Averaging in Stochastic Descent Dynamics
Walid Krichene, Peter Bartlett

Parameter-Free Online Learning via Model Selection
Dylan J Foster, Satyen Kale, Mehryar Mohri, Karthik Sridharan

Dynamic Routing Between Capsules
Sara Sabour, Nicholas Frosst, Geoffrey E Hinton

Modulating early visual processing by language
Harm de Vries, Florian Strub, Jeremie Mary, Hugo Larochelle, Olivier Pietquin, Aaron C Courville

MarrNet: 3D Shape Reconstruction via 2.5D Sketches
Jiajun Wu, Yifan Wang, Tianfan Xue, Xingyuan Sun, Bill Freeman, Josh Tenenbaum

Affinity Clustering: Hierarchical Clustering at Scale
Mahsa Derakhshan, Soheil Behnezhad, Mohammadhossein Bateni, Vahab Mirrokni, MohammadTaghi Hajiaghayi, Silvio Lattanzi, Raimondas Kiveris

Asynchronous Parallel Coordinate Minimization for MAP Inference
Ofer Meshi, Alexander Schwing

Cold-Start Reinforcement Learning with Softmax Policy Gradient
Nan Ding, Radu Soricut

Filtering Variational Objectives
Chris J Maddison, Dieterich Lawson, George Tucker, Mohammad Norouzi, Nicolas Heess, Andriy Mnih, Yee Whye Teh, Arnaud Doucet

Multi-Armed Bandits with Metric Movement Costs
Tomer Koren, Roi Livni, Yishay Mansour

Multiscale Quantization for Fast Similarity Search
Xiang Wu, Ruiqi Guo, Ananda Theertha Suresh, Sanjiv Kumar, Daniel Holtmann-Rice, David Simcha, Felix Yu

Reducing Reparameterization Gradient Variance
Andrew Miller, Nicholas Foti, Alexander D'Amour, Ryan Adams

Statistical Cost Sharing
Eric Balkanski, Umar Syed, Sergei Vassilvitskii

The Unreasonable Effectiveness of Structured Random Orthogonal Embeddings
Krzysztof Choromanski, Mark Rowland, Adrian Weller

Value Prediction Network
Junhyuk Oh, Satinder Singh, Honglak Lee

REBAR: Low-variance, unbiased gradient estimates for discrete latent variable models
George Tucker, Andriy Mnih, Chris J Maddison, Dieterich Lawson, Jascha Sohl-Dickstein

Approximation and Convergence Properties of Generative Adversarial Learning
Shuang Liu, Olivier Bousquet, Kamalika Chaudhuri

Attention is All you Need
Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, Illia Polosukhin

PASS-GLM: polynomial approximate sufficient statistics for scalable Bayesian GLM inference
Jonathan Huggins, Ryan Adams, Tamara Broderick

Repeated Inverse Reinforcement Learning
Kareem Amin, Nan Jiang, Satinder Singh

Fair Clustering Through Fairlets
Flavio Chierichetti, Ravi Kumar, Silvio Lattanzi, Sergei Vassilvitskii

Affine-Invariant Online Optimization and the Low-rank Experts Problem
Tomer Koren, Roi Livni

Batch Renormalization: Towards Reducing Minibatch Dependence in Batch-Normalized Models
Sergey Ioffe

Bridging the Gap Between Value and Policy Based Reinforcement Learning
Ofir Nachum, Mohammad Norouzi, Kelvin Xu, Dale Schuurmans

Discriminative State Space Models
Vitaly Kuznetsov, Mehryar Mohri

Dynamic Revenue Sharing
Santiago Balseiro, Max Lin, Vahab Mirrokni, Renato Leme, Song Zuo

Multi-view Matrix Factorization for Linear Dynamical System Estimation
Mahdi Karami, Martha White, Dale Schuurmans, Csaba Szepesvari

On Blackbox Backpropagation and Jacobian Sensing
Krzysztof Choromanski, Vikas Sindhwani

On the Consistency of Quick Shift
Heinrich Jiang

Revenue Optimization with Approximate Bid Predictions
Andres Munoz, Sergei Vassilvitskii

Shape and Material from Sound
Zhoutong Zhang, Qiujia Li, Zhengjia Huang, Jiajun Wu, Josh Tenenbaum, Bill Freeman

Learning to See Physics via Visual De-animation
Jiajun Wu, Erika Lu, Pushmeet Kohli, Bill Freeman, Josh Tenenbaum

Conference Demos
Electronic Screen Protector with Efficient and Robust Mobile Vision
Hee Jung Ryu, Florian Schroff

Magenta and deeplearn.js: Real-time Control of DeepGenerative Music Models in the Browser
Curtis Hawthorne, Ian Simon, Adam Roberts, Jesse Engel, Daniel Smilkov, Nikhil Thorat, Douglas Eck

Workshops
6th Workshop on Automated Knowledge Base Construction (AKBC) 2017
Program Committee includes: Arvind Neelakantan
Authors include: Jiazhong Nie, Ni Lao

Acting and Interacting in the Real World: Challenges in Robot Learning
Invited Speakers include: Pierre Sermanet

Advances in Approximate Bayesian Inference
Panel moderator: Matthew D. Hoffman

Conversational AI - Today's Practice and Tomorrow's Potential
Invited Speakers include: Matthew Henderson, Dilek Hakkani-Tur
Organizers include: Larry Heck

Extreme Classification: Multi-class and Multi-label Learning in Extremely Large Label Spaces
Invited Speakers include: Ed Chi, Mehryar Mohri

Learning in the Presence of Strategic Behavior
Invited Speakers include: Mehryar Mohri
Presenters include: Andres Munoz Medina, Sebastien Lahaie, Sergei Vassilvitskii, Balasubramanian Sivan

Learning on Distributions, Functions, Graphs and Groups
Invited speakers include: Corinna Cortes

Machine Deception
Organizers include: Ian Goodfellow
Invited Speakers include: Jacob Buckman, Aurko Roy, Colin Raffel, Ian Goodfellow

Machine Learning and Computer Security
Invited Speakers include: Ian Goodfellow
Organizers include: Nicolas Papernot
Authors include: Jacob Buckman, Aurko Roy, Colin Raffel, Ian Goodfellow

Machine Learning for Creativity and Design
Keynote Speakers include: Ian Goodfellow
Organizers include: Doug Eck, David Ha

Machine Learning for Audio Signal Processing (ML4Audio)
Authors include: Aren Jansen, Manoj Plakal, Dan Ellis, Shawn Hershey, Channing Moore, Rif A. Saurous, Yuxuan Wang, RJ Skerry-Ryan, Ying Xiao, Daisy Stanton, Joel Shor, Eric Battenberg, Rob Clark

Machine Learning for Health (ML4H)
Organizers include: Jasper Snoek, Alex Wiltschko
Keynote: Fei-Fei Li

NIPS Time Series Workshop 2017
Organizers include: Vitaly Kuznetsov
Authors include: Brendan Jou

OPT 2017: Optimization for Machine Learning
Organizers include: Sashank Reddi

ML Systems Workshop
Invited Speakers include: Rajat Monga, Alexander Mordvintsev, Chris Olah, Jeff Dean
Authors include: Alex Beutel, Tim Kraska, Ed H. Chi, D. Sculley, Michael Terry

Aligned Artificial Intelligence
Invited Speakers include: Ian Goodfellow

Bayesian Deep Learning
Organizers include: Kevin Murphy
Invited speakers include: Nal Kalchbrenner, Matthew D. Hoffman

BigNeuro 2017
Invited speakers include: Viren Jain

Cognitively Informed Artificial Intelligence: Insights From Natural Intelligence
Authors include: Jiazhong Nie, Ni Lao

Deep Learning At Supercomputer Scale
Organizers include: Erich Elsen, Zak Stone, Brennan Saeta, Danijar Hafner

Deep Learning: Bridging Theory and Practice
Invited Speakers include: Ian Goodfellow

Interpreting, Explaining and Visualizing Deep Learning
Invited Speakers include: Been Kim, Honglak Lee
Authors include: Pieter-Jan Kindermans, Sara Hooker, Dumitru Erhan, Been Kim

Learning Disentangled Features: from Perception to Control
Organizers include: Honglak Lee
Authors include: Jasmine Hsu, Arkanath Pathak, Abhinav Gupta, James Davidson, Honglak Lee

Learning with Limited Labeled Data: Weak Supervision and Beyond
Invited Speakers include: Ian Goodfellow

Machine Learning on the Phone and other Consumer Devices
Invited Speakers include: Rajat Monga
Organizers include: Hrishikesh Aradhye
Authors include: Suyog Gupta, Sujith Ravi

Optimal Transport and Machine Learning
Organizers include: Olivier Bousquet

The future of gradient-based machine learning software & techniques
Organizers include: Alex Wiltschko, Bart van Merriënboer

Workshop on Meta-Learning
Organizers include: Hugo Larochelle
Panelists include: Samy Bengio
Authors include: Aliaksei Severyn, Sascha Rothe

Symposiums
Deep Reinforcement Learning Symposium
Authors include: Benjamin Eysenbach, Shane Gu, Julian Ibarz, Sergey Levine

Interpretable Machine Learning
Authors include: Minmin Chen

Metalearning
Organizers include: Quoc V Le

Competitions
Adversarial Attacks and Defences
Organizers include: Alexey Kurakin, Ian Goodfellow, Samy Bengio

Competition IV: Classifying Clinically Actionable Genetic Mutations
Organizers include: Wendy Kan

Tutorial
Fairness in Machine Learning
Solon Barocas, Moritz Hardt


Machine learning gives environmentalists something to tweet about

Editor’s note: TensorFlow, our open source machine learning library, is just that—open to anyone. Companies, nonprofits, researchers and developers have used TensorFlow in some pretty cool ways, and we’re sharing those stories here on Keyword. Here’s one of them.


Victor Anton captured tens of thousands of birdsong recordings, collected over a three-year period. But he had no way to figure out which birdsong belonged to what bird.

The recordings, taken at 50 locations around a bird sanctuary in New Zealand known as “Zealandia,” were part of an effort to better understand the movement and numbers of threatened species including the Hihi, Tīeke and Kākāriki. Because researchers didn’t have reliable information about where the birds were and how they moved about, it was difficult to make good decisions about where to target conservation efforts on the ground.

Endangered species include the Kākāriki, Hihi, and Tīeke.

That’s where the recordings come in. Yet the amount of audio data was overwhelming. So Victor—a Ph.D. student at Victoria University of Wellington, New Zealand—and his team turned to technology.

“We knew we had lots of incredibly valuable data tied up in the recordings, but we simply didn’t have the manpower or a viable solution that would help us unlock this,” Victor tells us. “So we turned to machine learning to help us.”

Some of the audio recorders set up at 50 sites around the sanctuary.

In one of the quirkier applications of machine learning, they trained a Google TensorFlow-based system to recognize specific bird calls and measure bird activity. The more audio it deciphered, the more it learned, and the more accurate it became.


It worked like this: the system took the recorded and stored audio, chopped it into minute-long segments, and converted each segment into a spectrogram. The spectrograms were then chopped into chunks, each spanning less than a second, and processed individually by a deep convolutional neural network. A recurrent neural network then tied the chunks together and produced a continual prediction of which of the three birds was present across the minute-long segment. These segments were compiled to create a fuller picture of the presence and movement of the birds.
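
For readers who want to see roughly what such a model looks like, here is a compact tf.keras sketch of a per-chunk CNN feeding a recurrent layer. The team's actual architecture, chunk sizes and class list are not public, so the shapes and the "three species plus background" output below are assumptions.

```python
# A sketch of the spectrogram pipeline: a small CNN runs on each ~1-second chunk,
# then a GRU ties the chunks together and emits a per-chunk species prediction.
# Chunk count, spectrogram size, layer widths and class count are assumptions.
import tensorflow as tf

NUM_CHUNKS, FREQ_BINS, FRAMES = 60, 128, 96   # ~60 chunks per minute-long segment
NUM_CLASSES = 4                               # Hihi, Tīeke, Kākāriki, background

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(NUM_CHUNKS, FREQ_BINS, FRAMES, 1)),
    # Convolutional feature extractor applied to every chunk independently.
    tf.keras.layers.TimeDistributed(tf.keras.layers.Conv2D(16, 3, activation="relu")),
    tf.keras.layers.TimeDistributed(tf.keras.layers.MaxPooling2D()),
    tf.keras.layers.TimeDistributed(tf.keras.layers.Conv2D(32, 3, activation="relu")),
    tf.keras.layers.TimeDistributed(tf.keras.layers.GlobalAveragePooling2D()),
    # Recurrent layer carries context across the minute-long segment.
    tf.keras.layers.GRU(64, return_sequences=True),
    # One prediction per chunk: which bird, if any, is calling.
    tf.keras.layers.TimeDistributed(tf.keras.layers.Dense(NUM_CLASSES, activation="softmax")),
])
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
model.summary()
```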

TensorFlow processed the spectrograms and learned to identify the calls of different species.

The team faced some unique challenges: they were starting with a small quantity of labelled data; the software would often pick up other noises like construction, cars and even doorbells; and some of the bird species had a variety of songs, or two birds would sing at the same time.

To overcome these hurdles, they tested, verified and retrained the system many times over. As a result, they have learned things that would have otherwise remained locked up in thousands of hours of data. While it’s still early days, already conservation groups are talking to Victor about how they can use these initial results to better target their efforts. Moreover, the team has seen enough encouraging signs that they believe that their tools can be applied to other conservation projects.

“We are only just beginning to understand the different ways we can put machine learning to work in helping us protect different fauna,” says Victor, “ultimately allowing us to solve other environmental challenges across the world.”

Introducing AIY Vision Kit: Make devices that see

Earlier this year, we kicked off AIY Projects to help makers experiment with and learn about artificial intelligence. Our first release, AIY Voice Kit, was a huge hit! People built many amazing projects, showing what was possible with voice recognition in maker projects.

Today, we’re excited to announce our latest AIY Project, the Vision Kit. It’s our first project that features on-device neural network acceleration, providing powerful computer vision without a cloud connection.  

AIY Vision Kit's do-it-yourself assembly

What’s in the AIY Vision Kit?

Like AIY Voice Kit (released in May), Vision Kit is a do-it-yourself build. You’ll need to add a Raspberry Pi Zero W, a Raspberry Pi Camera, an SD card and a power supply, which must be purchased separately.

The kit includes a cardboard outer shell, the VisionBonnet circuit board, an RGB arcade-style button, a piezo speaker, a macro/wide lens kit, a tripod mounting nut and other connecting components.

AIY Vision Kit components

The main component of AIY Vision Kit is the VisionBonnet board for Raspberry Pi. The bonnet features the Intel® Movidius™ MA2450, a low-power vision processing unit capable of running neural network models on-device.

AIY Vision Kit's VisionBonnet accessory for Raspberry Pi

The provided software includes three TensorFlow-based neural network models for different vision applications. One, based on MobileNets, can recognize a thousand common objects; a second can recognize faces and their expressions; and the third is a person, cat and dog detector. We've also included a tool to compile models for Vision Kit, so you can train and retrain models with TensorFlow on your workstation or any cloud service.
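
To give a feel for what the object-recognition model does, here is a small workstation-side sketch that runs the stock ImageNet-pretrained MobileNet from tf.keras on a photo. This is not the VisionBonnet-compiled model that ships with the kit, and the image path is a placeholder.

```python
# A workstation sketch of MobileNet-style object recognition, similar in spirit
# to the kit's bundled model. Uses the stock ImageNet-pretrained MobileNet from
# tf.keras, not the VisionBonnet-compiled model; the image path is a placeholder.
import numpy as np
import tensorflow as tf

model = tf.keras.applications.MobileNet(weights="imagenet")

img = tf.keras.utils.load_img("backyard_photo.jpg", target_size=(224, 224))
x = tf.keras.applications.mobilenet.preprocess_input(
    tf.keras.utils.img_to_array(img)[np.newaxis, ...])

preds = model.predict(x)
for _, label, score in tf.keras.applications.mobilenet.decode_predictions(preds, top=3)[0]:
    print(f"{label}: {score:.2f}")
```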

We also provide a Python API that gives you the ability to change the RGB button colors, adjust the piezo element sounds and access the four GPIO pins.

With all of these features, you can explore many creative builds that use computer vision. For example, you can:


  • Identify all kinds of plant and animal species

  • See when your dog is at the back door

  • See when your car left the driveway

  • See that your guests are delighted by your holiday decorations

  • See when your little brother comes into your room (sound the alarm!)

Where can you get it?

AIY Vision Kit will be available in stores in early December. Pre-order your kit today through Micro Center.

** Please note that full assembly requires Raspberry Pi Zero W, Raspberry Pi Camera and a micro SD card, which must be purchased separately.

We're listening

Please let us know how we can improve on future kits and show us what you’re building by using the #AIYProjects hashtag on social media.

We’re excited to see what you build!

Source: Education


The new maker toolkit: IoT, AI and Google Cloud Platform

Voice interaction is everywhere these days—via phones, TVs, laptops and smart home devices that use technology like the Google Assistant. And with the availability of maker-friendly offerings like Google AIY’s Voice Kit, the maker community has been getting in on the action and adding voice to their Internet of Things (IoT) projects.

As avid makers ourselves, we wrote an open-source, maker-friendly tutorial to show developers how to piggyback on a Google Assistant-enabled device (Google Home, Pixel, Voice Kit, etc.) and add voice to their own projects. We also created an example application to help you connect your project with GCP-hosted web and mobile applications, or tap into sophisticated AI frameworks that can provide more natural conversational flow.

Let’s take a look at what this tutorial, and our example application, can help you do.

Particle Photon: the brains of the operation

The Photon microcontroller from Particle is an easy-to-use IoT prototyping board that comes with onboard Wi-Fi and USB support, and is compatible with the popular Arduino ecosystem. It’s also a great choice for internet-enabled projects: every Photon gets its own webhook in Particle Cloud, and Particle provides a host of additional integration options with its web-based IDE, JavaScript SDK and command-line interface. Most importantly for the maker community, Particle Photons are super affordable, starting at just $19.


Connecting the Google Assistant and Photon: Actions on Google and Dialogflow

The Google Assistant (via Google Home, Pixel, Voice Kit, etc.) responds to your voice input, and the Photon (through Particle Cloud) reacts to your application’s requests (in this case, turning an LED on and off). But how do you tie the two together? Let’s take a look at all the moving parts:


  • Actions on Google is the developer platform for the Google Assistant. With Actions on Google, developers build apps to help answer specific queries and connect users to products and services. Users interact with apps for the Assistant through a conversational, natural-sounding back-and-forth exchange, and your Action passes those user requests on to your app.

  • Dialogflow (formerly API.AI) lets you build even more engaging voice and text-based conversational interfaces powered by AI, and sends out request data via a webhook.

  • A server (or service) running Node.js handles the resulting user queries.


Along with some sample applications, our guide includes a Dialogflow agent, which lets you parse queries and route actions back to users (by voice and/or text) or to other applications. Dialogflow provides a variety of interface options, from an easy-to-use web-based GUI to a robust Node.js-powered SDK for interacting with both your queries and the outside world. In addition, its powerful machine learning tools add intelligence and natural language processing: your applications can learn queries and intents over time, exposing even more powerful options and providing better results along the way. (The recently announced Dialogflow Enterprise Edition offers greater flexibility and support to meet the needs of large-scale businesses.)
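
As a sketch of how the webhook hand-off can look, here is a small Flask service that receives a Dialogflow fulfillment request and forwards the parsed parameter to the Photon through Particle Cloud's device-function endpoint. The guide's own samples use Node.js; the Python version below, along with the parameter name, the Particle function name ("led") and the environment variables, are illustrative assumptions.

```python
# A Flask sketch of the webhook step: Dialogflow POSTs the parsed query here,
# and we forward the result to the Photon via Particle Cloud. The parameter
# name, Particle function name and credentials are placeholders; the guide's
# published samples use Node.js rather than Python.
import os

import requests
from flask import Flask, jsonify, request

app = Flask(__name__)
PARTICLE_DEVICE_ID = os.environ["PARTICLE_DEVICE_ID"]
PARTICLE_TOKEN = os.environ["PARTICLE_ACCESS_TOKEN"]

@app.route("/dialogflow-webhook", methods=["POST"])
def webhook():
    query = request.get_json(force=True)["queryResult"]
    state = query["parameters"].get("led_state", "off")   # e.g. "on" or "off"

    # Call the cloud function the Photon firmware registered (named "led" here).
    resp = requests.post(
        f"https://api.particle.io/v1/devices/{PARTICLE_DEVICE_ID}/led",
        data={"arg": state, "access_token": PARTICLE_TOKEN},
        timeout=10,
    )
    resp.raise_for_status()

    # Dialogflow reads this text back to the user through the Assistant.
    return jsonify({"fulfillmentText": f"Okay, turning the LED {state}."})

if __name__ == "__main__":
    app.run(port=8080)
```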


Backend infrastructure: GCP

It’s a no-brainer to build your IoT apps on a Google Cloud Platform (GCP) backend, as you can use a single Google account to sign into your voice device, create Actions on Google apps and Dialogflow agents, and host the web services. To help get you up and running, we developed two sample web applications based on different GCP technologies that you can use as inspiration when creating a voice-powered IoT app:


  • Cloud Functions for Firebase. If your goal is quick deployment and iteration, Cloud Functions for Firebase is a simple, low-cost and powerful option—even if you don’t have much server-side development experience. It integrates quickly and easily with the other tools used here. Dialogflow, for example, now allows you to drop Cloud Functions for Firebase code directly into its graphical user interface.

  • App Engine. For those of you with more development experience and/or curiosity, App Engine is just as easy to deploy and scale, but includes more options for integrations with your other applications, additional programming language/framework choices, and a host of third-party add-ons. App Engine is a great choice if you already have a Node.js application to which you want to add voice actions, you want to tie into more of Google’s machine learning services, or you want to get deeper into device connection and management.


Next steps

As makers, we’ve only just scratched the surface of what we can do with new tools like IoT, AI and the cloud. Check out our full tutorials, and grab the code on GitHub. With these examples to build from, we hope we’ve made it easier for you to add voice powers to your maker project. For some extra inspiration, check out what other makers have built with AIY Voice Kit. And for even more ways to add machine learning to your maker project, check out the AIY Vision Kit, which just went on pre-sale today.

We can’t wait to see what you build!

Source: Google Cloud


Introducing the AIY Vision Kit: Add computer vision to your maker projects

Posted by Billy Rutledge, Director, AIY Projects

Since we released AIY Voice Kit, we've been inspired by the thousands of amazing builds coming in from the maker community. Today, the AIY Team is excited to announce our next project: the AIY Vision Kit — an affordable, hackable, intelligent camera.

Much like the Voice Kit, our Vision Kit is easy to assemble and connects to a Raspberry Pi computer. Based on user feedback, this new kit is designed to work with the smaller Raspberry Pi Zero W computer and runs its vision algorithms on-device so there's no cloud connection required.

Build intelligent devices that can perceive, not just see

The kit materials list includes a VisionBonnet, a cardboard outer shell, an RGB arcade-style button, a piezo speaker, a macro/wide lens kit, flex cables, standoffs, a tripod mounting nut and connecting components.

The VisionBonnet is an accessory board for Raspberry Pi Zero W that features the Intel® Movidius™ MA2450, a low-power vision processing unit capable of running neural networks. This will give makers visual perception instead of image sensing. It can run at speeds of up to 30 frames per second, providing near real-time performance.

Bundled with the software image are three neural network models:

  • A model based on MobileNets that can recognize a thousand common objects.
  • A model for face detection capable of not only detecting faces in the image, but also scoring facial expressions on a "joy scale" that ranges from "sad" to "laughing."
  • A model for the important task of discerning between cats, dogs and people.

For those of you who have your own models in mind, we've included the original TensorFlow code and a compiler. Take a new model you have (or train) and run it on the Intel® Movidius™ MA2450.
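
As a rough illustration of the "train your own model" path, here is a minimal tf.keras transfer-learning sketch that retrains a MobileNet classification head on your own image folders before handing the saved model to the compiler. The class count, image size, directory name and training schedule are assumptions, and the compilation step itself is not shown.

```python
# A transfer-learning sketch: reuse pretrained MobileNet features and retrain
# only a small classification head on your own images, then save the model for
# compilation. Class count, image size and data directory are placeholders.
import tensorflow as tf

NUM_CLASSES = 3              # e.g. "hotdog", "not hotdog", "background"
IMG_SIZE = (160, 160)

base = tf.keras.applications.MobileNet(
    input_shape=IMG_SIZE + (3,), include_top=False, weights="imagenet", pooling="avg")
base.trainable = False       # keep the pretrained features frozen

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=IMG_SIZE + (3,)),
    tf.keras.layers.Rescaling(1.0 / 127.5, offset=-1),   # scale pixels to [-1, 1]
    base,
    tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

train_ds = tf.keras.utils.image_dataset_from_directory(
    "my_training_images/", image_size=IMG_SIZE, batch_size=32)
model.fit(train_ds, epochs=5)
model.save("my_vision_model.h5")   # this saved model is what you'd then compile
```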

Extend the kit to solve your real-world problems

The AIY Vision Kit is completely hackable:

  • Want to prototype your own product? The Vision Kit and the Raspberry Pi Zero W can fit into any number of tiny enclosures.
  • Want to change the way the camera reacts? Use the Python API to write new software to customize the RGB button colors, piezo element sounds and GPIO pins.
  • Want to add more lights, buttons, or servos? Use the 4 GPIO expansion pins to connect your own hardware.

We hope you'll use it to solve interesting challenges, such as:

  • Build "hotdog/not hotdog" (or any other food recognizer)
  • Turn music on when someone walks through the door
  • Send a text when your car leaves the driveway
  • Open the dog door when she wants to get back in the house

Ready to get your hands on one?

AIY Vision Kits will be available in December, with online pre-sales at Micro Center starting today.

*** Please note that AIY Vision Kit requires Raspberry Pi Zero W, Raspberry Pi Camera V2 and a micro SD card, which must be purchased separately.

Tell us what you think!

We're listening — let us know how we can improve our kits and share what you're making using the #AIYProjects hashtag on social media. We hope AIY Vision Kit inspires you to build all kinds of creative devices.


Interpreting Deep Neural Networks with SVCCA



Deep Neural Networks (DNNs) have driven unprecedented advances in areas such as vision, language understanding and speech recognition. But these successes also bring new challenges. In particular, contrary to many previous machine learning methods, DNNs can be susceptible to adversarial examples in classification, catastrophic forgetting of tasks in reinforcement learning, and mode collapse in generative modelling. In order to build better and more robust DNN-based systems, it is critically important to be able to interpret these models. In particular, we would like a notion of representational similarity for DNNs: can we effectively determine when the representations learned by two neural networks are the same?

In our paper, “SVCCA: Singular Vector Canonical Correlation Analysis for Deep Learning Dynamics and Interpretability,” we introduce a simple and scalable method to address these points. Two specific applications of this that we look at are comparing the representations learned by different networks, and interpreting representations learned by hidden layers in DNNs. Furthermore, we are open sourcing the code so that the research community can experiment with this method.

Key to our setup is the interpretation of each neuron in a DNN as an activation vector. As shown in the figure below, the activation vector of a neuron is the set of scalar outputs it produces on the input data. For example, for 50 input images, a neuron in a DNN will output 50 scalar values, encoding how much it responds to each input. These 50 scalar values then make up an activation vector for the neuron. (Of course, in practice, we take many more than 50 inputs.)
Here a DNN is given three inputs, x1, x2, x3. Looking at a neuron inside the DNN (bolded in red, right pane), this neuron produces a scalar output zi corresponding to each input xi. These values form the activation vector of the neuron.
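
To make the "neurons as activation vectors" view concrete in code, here is a toy example with a randomly initialized layer and random inputs; the layer and data are placeholders rather than the networks studied in the paper.

```python
# A toy illustration of activation vectors: run a layer over a batch of inputs
# and read off each neuron's column of outputs. The layer and data here are
# random placeholders, not the networks studied in the paper.
import numpy as np
import tensorflow as tf

inputs = np.random.rand(50, 32).astype("float32")     # 50 datapoints, 32 features

layer = tf.keras.layers.Dense(10, activation="relu")  # a hidden layer of 10 neurons
activations = layer(inputs).numpy()                   # shape (50, 10)

# Column j is neuron j's activation vector: its scalar response to each input.
neuron_0_vector = activations[:, 0]
print(neuron_0_vector.shape)                          # (50,)
```
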
With this basic observation and a little more formulation, we introduce Singular Vector Canonical Correlation Analysis (SVCCA), a technique for taking in two sets of neurons and outputting aligned feature maps learned by both of them. Critically, this technique accounts for superficial differences such as permutations in neuron orderings (crucial for comparing different networks), and can detect similarities where other, more straightforward comparisons fail.
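
For readers who want to experiment before digging into the open-sourced code, here is a compact numpy sketch of the two stages the name describes: an SVD step that prunes low-variance directions in each set of activation vectors, followed by CCA to find aligned, maximally correlated directions. This is an illustrative reimplementation rather than the released code, and the fixed `keep_dims` truncation stands in for the variance-based threshold used in the paper.

```python
# A compact numpy sketch of SVCCA: SVD each activation matrix to prune
# low-variance directions, then CCA to measure how well the two pruned
# subspaces align. Illustrative only; keep_dims is a simplifying assumption.
import numpy as np

def svcca_similarity(X, Y, keep_dims=20):
    """Mean canonical correlation between activation matrices X and Y.

    X, Y: arrays of shape (num_datapoints, num_neurons); each column is one
    neuron's activation vector over the dataset.
    """
    # Center each neuron's activation vector.
    X = X - X.mean(axis=0, keepdims=True)
    Y = Y - Y.mean(axis=0, keepdims=True)

    # SVD step: keep only the top singular directions of each representation.
    Ux, Sx, _ = np.linalg.svd(X, full_matrices=False)
    Uy, Sy, _ = np.linalg.svd(Y, full_matrices=False)
    Xr = Ux[:, :keep_dims] * Sx[:keep_dims]
    Yr = Uy[:, :keep_dims] * Sy[:keep_dims]

    # CCA step: with orthonormal bases for the two pruned subspaces, the
    # singular values of their product are the canonical correlations.
    Qx, _, _ = np.linalg.svd(Xr, full_matrices=False)
    Qy, _, _ = np.linalg.svd(Yr, full_matrices=False)
    corrs = np.linalg.svd(Qx.T @ Qy, compute_uv=False)
    return float(np.mean(np.clip(corrs, 0.0, 1.0)))

# A layer compared with a noisy linear transform of itself versus an unrelated
# random layer; the first similarity should come out noticeably higher.
rng = np.random.default_rng(0)
acts = rng.normal(size=(500, 64))
related = acts @ rng.normal(size=(64, 64)) + 0.1 * rng.normal(size=(500, 64))
unrelated = rng.normal(size=(500, 64))
print(svcca_similarity(acts, related), svcca_similarity(acts, unrelated))
```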

As an example, consider training two convolutional neural nets (net1 and net2, below) on CIFAR-10, a medium-scale image classification task. To visualize the results of our method, we compare activation vectors of neurons with the aligned features output by SVCCA. Recall that the activation vector of a neuron is its raw scalar outputs on the input images. The x-axis of the plot consists of images sorted by class (gray dotted lines showing class boundaries), and the y-axis shows the output value of the neuron.
On the left pane, we show the two highest-activation (largest Euclidean norm) neurons in net1 and net2. Examining the highest-activation neurons has been a popular method for interpreting DNNs in computer vision, but in this case the highest-activation neurons in net1 and net2 have no clear correspondence, despite both being trained on the same task. However, after applying SVCCA (right pane), we see that the latent representations learned by both networks do indeed share some very similar features. Note that the top two rows representing aligned feature maps are close to identical, as are the second-highest aligned feature maps (bottom two rows). Furthermore, these aligned mappings in the right pane also show a clear correspondence with the class boundaries, e.g. we see the top pair give negative outputs for Class 8, with the bottom pair giving a positive output for Class 2 and Class 7.

While SVCCA can be applied across networks, it can also be applied to the same network across time, enabling the study of how different layers in a network converge to their final representations. Below, we show panes that compare the representation of layers in net1 during training (y-axes) with the layers at the end of training (x-axes). For example, in the top left pane (titled “0% trained”), the x-axis shows layers of increasing depth of net1 at 100% trained, and the y-axis shows layers of increasing depth at 0% trained. Each (i,j) square then tells us how similar the representation of layer i at 100% trained is to layer j at 0% trained. The input layer is at the bottom left, and is (as expected) identical at 0% and 100%. We make this comparison at several points through training, at 0%, 35%, 75% and 100%, for convolutional (top row) and residual (bottom row) nets on CIFAR-10.
Plots showing learning dynamics of convolutional and residual networks on CIFAR-10. Note the additional structure also visible: the 2x2 blocks in the top row are due to batch norm layers, and the checkered pattern in the bottom row due to residual connections.
We find evidence of bottom-up convergence, with layers closer to the input converging first and layers higher up taking longer to converge. This suggests a faster training method, Freeze Training — see our paper for details. Furthermore, this visualization also helps highlight properties of the network. In the top row, there are a couple of 2x2 blocks. These correspond to batch normalization layers, which are representationally identical to the layers preceding them. On the bottom row, towards the end of training, a checkerboard-like pattern appears, which is due to the residual connections of the network making layers more similar to the layers before them.

So far, we’ve concentrated on applying SVCCA to CIFAR-10. But by applying preprocessing techniques based on the Discrete Fourier Transform, we can scale this method to ImageNet-sized models. We applied this technique to an ImageNet ResNet, comparing the similarity of latent representations to representations corresponding to different classes:
SVCCA similarity of latent representations with different classes. We take different layers in the ImageNet ResNet, with 0 indicating input and 74 indicating output, and compare representational similarity of the hidden layer and the output class. Interestingly, different classes are learned at different speeds: the firetruck class is learned faster than the different dog breeds. Furthermore, the two pairs of dog breeds (a husky-like pair and a terrier-like pair) are learned at the same rate, reflecting the visual similarity between them.
Our paper gives further details on the results we’ve explored so far, and also touches on different applications, e.g. compressing DNNs by projecting onto the SVCCA outputs, and Freeze Training, a computationally cheaper method for training deep networks. There are many follow-ups we’re excited about exploring with SVCCA: moving on to different kinds of architectures, comparing across datasets, and better visualizing the aligned directions are just a few ideas we’re eager to try out. We look forward to presenting these results next week at NIPS 2017 in Long Beach, and we hope the code will encourage many people to apply SVCCA to their network representations to interpret and understand what their networks are learning.