About a year ago, the Google Brain team first shared our mission “Make machines intelligent. Improve people’s lives.” In that time, we’ve shared updates on our work to infuse machine learning across Google products that hundreds of millions of users access everyday, including Translate, Maps, and more. Today, I’d like to share more about how we approach this mission both through advancement in the fundamental theory and understanding of machine learning, and through research in the service of product.
Five years ago, our colleagues Alfred Spector, Peter Norvig, and Slav Petrov published a blog post and paper explaining Google’s hybrid approach to research, an approach that always allowed for varied balances between curiosity-driven and application-driven research. The biggest challenges in machine learning that the Brain team is focused on require the broadest exploration of new ideas, which is why our researchers set their own agendas with much of our team focusing specifically on advancing the state-of-the-art in machine learning. In doing so, we have published hundreds of papers over the last several years in conferences such as NIPS, ICML and ICLR, with acceptance rates significantly above conference averages.
Critical to achieving our mission is contributing new and fundamental research in machine learning. To that end, we’ve built a thriving team that conducts long-term, open research to advance science. In pursuing research across fields such as visual and auditory perception, natural language understanding, art and music generation, and systems architecture and algorithms, we regularly collaborate with researchers at external institutions, with fully 1/3rd of our papers in 2017 having one or more cross-institutional authors. Additionally, we host collaborators from academic institutions to enhance our own work and strengthen our connection to the external scientific community.
In addition to working with the best minds in academia and industry, the Brain team, like many other teams at Google, believes in fostering the development of the next generation of scientists. Our team hosts more than 50 interns every year, with the goal of publishing their work in top machine learning venues (roughly 25% of our group’s publications so far in 2017 have intern co-authors, usually as primary authors). Additionally, in 2016, we welcomed the first cohort of the Google Brain Residency Program, a one-year program for people who want to learn to do machine learning research. In its inaugural year, 27 residents conducted research alongside and under the mentorship of Brain team members, and authored more than 40 papers that were accepted in top research conferences. Our second group of 36 residents started their one-year residency in our group in July, and are already involved in a wide variety of projects.
Along with other teams within Google Research, we enjoy the freedom to both contribute fundamental advances in machine learning, and separately conduct product-focused research. Both paths are important in ensuring that advances in machine learning have a significant impact on the world.
Google Fellows attending the 2017 Global PhD Fellowship Summit
The event included a panel discussion with Domagoj Babic, Kathryn McKinley, Nina Taft, Roy Want and Sunny Colsalvo about their unique career paths in academia and industry. Fellows also had the chance to connect one-on-one with Googlers to discuss their research, as well as receive feedback from leaders in their fields in smaller deep dives and a poster event.
Fellows share their work with Google researchers during the poster session
Our PhD Fellows represent some the best and brightest young researchers around the globe in Computer Science and it is our ongoing goal to support them as they make their mark on the world.
We’d additionally like to announce the complete list of our 2017 Google PhD Fellows, including the latest recipients from China and East Asia, India, and Australia. We look forward to seeing each of them at next year’s summit!
2017 Google PhD Fellows
Algorithms, Optimizations and Markets Chiu Wai Sam Wong, University of California, Berkeley Eric Balkanski, Harvard University Haifeng Xu, University of Southern California
Human-Computer Interaction Motahhare Eslami, University of Illinois, Urbana-Champaign Sarah D'Angelo, Northwestern University Sarah Mcroberts, University of Minnesota - Twin Cities Sarah Webber, The University of Melbourne
Machine Learning Aude Genevay, Fondation Sciences Mathématiques de Paris Dustin Tran, Columbia University Jamie Hayes, University College London Jin-Hwa Kim, Seoul National University Ling Luo, The University of Sydney Martin Arjovsky, New York University Sayak Ray Chowdhury, Indian Institute of Science Song Zuo, Tsinghua University Taco Cohen, University of Amsterdam Yuhuai Wu, University of Toronto Yunhe Wang, Peking University Yunye Gong, Cornell University
Machine Perception, Speech Technology and Computer Vision Avijit Dasgupta, International Institute of Information Technology - Hyderabad Franziska Müller, Saarland University - Saarbrücken GSCS and Max Planck Institute for Informatics George Trigeorgis, Imperial College London Iro Armeni, Stanford University Saining Xie, University of California, San Diego Yu-Chuan Su, University of Texas, Austin
Mobile Computing Sangeun Oh, Korea Advanced Institute of Science and Technology Shuo Yang, Shanghai Jiao Tong University
Natural Language Processing Bidisha Samanta, Indian Institute of Technology Kharagpur Ekaterina Vylomova, The University of Melbourne Jianpeng Cheng, The University of Edinburgh Kevin Clark, Stanford University Meng Zhang, Tsinghua University Preksha Nama, Indian Institute of Technology Madras Tim Rocktaschel, University College London
Privacy and Security Romain Gay, ENS - École Normale Supérieure Xi He, Duke University Yupeng Zhang, University of Maryland, College Park
Programming Languages, Algorithms and Software Engineering Christoffer Quist Adamsen, Aarhus University Muhammad Ali Gulzar, University of California, Los Angeles Oded Padon, Tel-Aviv University
Structured Data and Database Management Amir Shaikhha, EPFL CS Jingbo Shang, University of Illinois, Urbana-Champaign
Systems and Networking Ahmed M. Said Mohamed Tawfik Issa, Georgia Institute of Technology Khanh Nguyen, University of California, Irvine Radhika Mittal, University of California, Berkeley Ryan Beckett, Princeton University Samaneh Movassaghi, Australian National University
Posted by Chi Zeng and Justine Tunney, Software Engineers, Google Brain Team
When we open-sourced TensorFlow in 2015, it included TensorBoard, a suite of visualizations for inspecting and understanding your TensorFlow models and runs. Tensorboard included a small, predetermined set of visualizations that are generic and applicable to nearly all deep learning applications such as observing how loss changes over time or exploring clusters in high-dimensional spaces. However, in the absence of reusable APIs, adding new visualizations to TensorBoard was prohibitively difficult for anyone outside of the TensorFlow team, leaving out a long tail of potentially creative, beautiful and useful visualizations that could be built by the research community.
To allow the creation of new and useful visualizations, we announce the release of a consistent set of APIs that allows developers to add custom visualization plugins to TensorBoard. We hope that developers use this API to extend TensorBoard and ensure that it covers a wider variety of use cases.
We have updated the existing dashboards (tabs) in TensorBoard to use the new API, so they serve as examples for plugin creators. For the current listing of plugins included within TensorBoard, you can explore the tensorboard/plugins directory on GitHub. For instance, observe the new plugin that generates precision-recall curves:
The plugin demonstrates the 3 parts of a standard TensorBoard plugin:
A TensorFlow summary op used to collect data for later visualization. [GitHub]
A Python backend that serves custom data. [GitHub]
A dashboard within TensorBoard built with TypeScript and polymer. [GitHub]
Additionally, like other plugins, the “pr_curves” plugin provides a demo that (1) users can look over in order to learn how to use the plugin and (2) the plugin author can use to generate example data during development. To further clarify how plugins work, we’ve also created a barebones TensorBoard “Greeter” plugin. This simple plugin collects greetings (simple strings preceded by “Hello, ”) during model runs and displays them. We recommend starting by exploring (or forking) the Greeter plugin as well as other existing plugins.
A notable example of how contributors are already using the TensorBoard API is Beholder, which was recently created by Chris Anderson while working on his master’s degree. Beholder shows a live video feed of data (e.g. gradients and convolution filters) as a model trains. You can watch the demo video here.
We look forward to seeing what innovations will come out of the community. If you plan to contribute a plugin to TensorBoard’s repository, you should get in touch with us first through the issue tracker with your idea so that we can help out and possibly guide you.
Acknowledgements Dandelion Mané and William Chargin played crucial roles in building this API.
The Game of Go is an ancient Chinese board game, which has tremendous popularity with millions of players worldwide. Since the success of Deep Blue in the game of Chess in the late 90’s, Go has been considered as the next benchmark for machine learning and games. Indeed, it has simple rules, can be efficiently simulated, and progress can be measured objectively. However, due to the vast search space of possible moves, making an ML system capable of playing Go well represented a considerable challenge. Over the last two years, DeepMind’s AlphaGo has pushed the limit of what is possible with machine learning in games, bringing many innovations and technological advances in order to successfully defeat some of the best players in the world , , .
A little more than 10 years before the success of AlphaGo, the classical tree search techniques that were so successful in Chess were reigning in computer Go programs, but only reaching weak amateur level for human Go players. Thanks to Monte-Carlo Tree Search — a (then) new type of search algorithm based on sampling possible outcomes of the game from a position, and incrementally improving the search tree from the results of those simulations — computers were able to search much deeper in the game. This is important because it made it possible to incorporate less human knowledge in the programs — a task which is very hard to do right. Indeed, any missing knowledge that a human expert either cannot express or did not think about may create errors in the computer evaluation of the game position, and lead to blunders*.
In 2007, Sylvain and David augmented the Monte Carlo Tree Search techniques by exploring two types of knowledge incorporation: (i) online, where the decision for the next move is taken from the current position, using compute resources at the time when the next move is needed, and (ii) offline, where the learning process happens entirely before the game starts, and is summarized into a model that can be applied to all possible positions of a game (even though not all possible positions have been seen during the learning process). This ultimately led to the computer program MoGo, which showed an improvement in performance over previous Go algorithms.
For the online part, they adapted the simple idea that some actions don’t necessarily depend on each other. For example, if you need to book a vacation, the choice of the hotel, flight and car rental is obviously dependent on the choice of your destination. However, once given a destination, these things can be chosen (mostly) independently of each other. The same idea can be applied to Go, where some moves can be estimated partially independently of each other to get a very quick, albeit imprecise, estimate. Of course, when time is available, the exact dependencies are also analysed.
For offline knowledge incorporation, they explored the impact of learning an approximation of the position value with the computer playing against itself using reinforcement learning, adding that knowledge in the tree search algorithm. They also looked at how expert play patterns, based on human knowledge of the game, can be used in a similar way. That offline knowledge was used in two places; first, it helped focus the program on moves that looked similar to good moves it learned offline. Second, it helped simulate more realistic games when the program tried to estimate a given position value.
These improvements led to good success on the smaller version of the game of Go (9x9), even beating one professional player in an exhibition game, and also reaching a stronger amateur level on the full game (19x19). And in the years since 2007, we’ve seen many rapid advances (almost on a monthly basis) from researchers all over the world that have allowed the development of algorithms culminating in AlphaGo (which itself introduced many innovations).
Importantly, these algorithms and techniques are not limited to applications towards games, but also enable improvements in many domains. The contributions introduced by David and Sylvain in their collaboration 10 years ago were an important piece to many of the improvements and advancements in machine learning that benefit our lives daily, and we offer our sincere congratulations to both authors on this well-deserved award.
* As a side note, that’s why machine learning as a whole is such a powerful tool: replacing expert knowledge with algorithms that can more fully explore potential outcomes.↩
In our paper, we show that the Transformer outperforms both recurrent and convolutional models on academic English to German and English to French translation benchmarks. On top of higher translation quality, the Transformer requires less computation to train and is a much better fit for modern machine learning hardware, speeding up training by up to an order of magnitude.
BLEU scores (higher is better) of single models on the standard WMT newstest2014 English to German translation benchmark.
BLEU scores (higher is better) of single models on the standard WMT newstest2014 English to French translation benchmark.
Accuracy and Efficiency in Language Understanding Neural networks usually process language by generating fixed- or variable-length vector-space representations. After starting with representations of individual words or even pieces of words, they aggregate information from surrounding words to determine the meaning of a given bit of language in context. For example, deciding on the most likely meaning and appropriate representation of the word “bank” in the sentence “I arrived at the bank after crossing the…” requires knowing if the sentence ends in “... road.” or “... river.”
RNNs have in recent years become the typical network architecture for translation, processing language sequentially in a left-to-right or right-to-left fashion. Reading one word at a time, this forces RNNs to perform multiple steps to make decisions that depend on words far away from each other. Processing the example above, an RNN could only determine that “bank” is likely to refer to the bank of a river after reading each word between “bank” and “river” step by step. Prior research has shown that, roughly speaking, the more such steps decisions require, the harder it is for a recurrent network to learn how to make those decisions.
The sequential nature of RNNs also makes it more difficult to fully take advantage of modern fast computing devices such as TPUs and GPUs, which excel at parallel and not sequential processing. Convolutional neural networks (CNNs) are much less sequential than RNNs, but in CNN architectures like ByteNet or ConvS2S the number of steps required to combine information from distant parts of the input still grows with increasing distance.
The Transformer In contrast, the Transformer only performs a small, constant number of steps (chosen empirically). In each step, it applies a self-attention mechanism which directly models relationships between all words in a sentence, regardless of their respective position. In the earlier example “I arrived at the bank after crossing the river”, to determine that the word “bank” refers to the shore of a river and not a financial institution, the Transformer can learn to immediately attend to the word “river” and make this decision in a single step. In fact, in our English-French translation model we observe exactly this behavior.
More specifically, to compute the next representation for a given word - “bank” for example - the Transformer compares it to every other word in the sentence. The result of these comparisons is an attention score for every other word in the sentence. These attention scores determine how much each of the other words should contribute to the next representation of “bank”. In the example, the disambiguating “river” could receive a high attention score when computing a new representation for “bank”. The attention scores are then used as weights for a weighted average of all words’ representations which is fed into a fully-connected network to generate a new representation for “bank”, reflecting that the sentence is talking about a river bank.
The animation below illustrates how we apply the Transformer to machine translation. Neural networks for machine translation typically contain an encoder reading the input sentence and generating a representation of it. A decoder then generates the output sentence word by word while consulting the representation generated by the encoder. The Transformer starts by generating initial representations, or embeddings, for each word. These are represented by the unfilled circles. Then, using self-attention, it aggregates information from all of the other words, generating a new representation per word informed by the entire context, represented by the filled balls. This step is then repeated multiple times in parallel for all words, successively generating new representations.
The decoder operates similarly, but generates one word at a time, from left to right. It attends not only to the other previously generated words, but also to the final representations generated by the encoder.
Flow of Information Beyond computational performance and higher accuracy, another intriguing aspect of the Transformer is that we can visualize what other parts of a sentence the network attends to when processing or translating a given word, thus gaining insights into how information travels through the network.
To illustrate this, we chose an example involving a phenomenon that is notoriously challenging for machine translation systems: coreference resolution. Consider the following sentences and their French translations:
It is obvious to most that in the first sentence pair “it” refers to the animal, and in the second to the street. When translating these sentences to French or German, the translation for “it” depends on the gender of the noun it refers to - and in French “animal” and “street” have different genders. In contrast to the current Google Translate model, the Transformer translates both of these sentences to French correctly. Visualizing what words the encoder attended to when computing the final representation for the word “it” sheds some light on how the network made the decision. In one of its steps, the Transformer clearly identified the two nouns “it” could refer to and the respective amount of attention reflects its choice in the different contexts.
The encoder self-attention distribution for the word “it” from the 5th to the 6th layer of a Transformer trained on English to French translation (one of eight attention heads).
Given this insight, it might not be that surprising that the Transformer also performs very well on the classic language analysis task of syntactic constituency parsing, a task the natural language processing community has attacked with highly specialized systems for decades. In fact, with little adaptation, the same network we used for English to German translation outperformed all but one of the previously proposed approaches to constituency parsing.
Next Steps We are very excited about the future potential of the Transformer and have already started applying it to other problems involving not only natural language but also very different inputs and outputs, such as images and video. Our ongoing experiments are accelerated immensely by the Tensor2Tensor library, which we recently open sourced. In fact, after downloading the library you can train your own Transformer networks for translation and parsing by invoking just a few commands. We hope you’ll give it a try, and look forward to seeing what the community can do with the Transformer.
Acknowledgements This research was conducted by Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez and Łukasz Kaiser. Additional thanks go to David Chenell for creating the animation above.
Posted by Reena Jana, Creative Lead, Business Inclusion, and Josh Lovejoy, UX Designer, Google Research
Machine learning systems are increasingly influencing many aspects of everyday life, and are used by both the hardware and software products that serve people globally. As such, researchers and designers seeking to create products that are useful and accessible for everyone often face the challenge of finding data sets that reflect the variety and backgrounds of users around the world. In order to train these machine learning systems, open, global — and growing — datasets are needed.
Over the last six months, we’ve seen such a dataset emerge from users of Quick, Draw!, Google’s latest approach to helping wide, international audiences understand how neural networks work. A group of Googlers designed Quick, Draw! as a way for anyone to interact with a machine learning system in a fun way, drawing everyday objects like trees and mugs. The system will try to guess what their drawing depicts, within 20 seconds. While the goal of Quick, Draw! was simply to create a fun game that runs on machine learning, it has resulted in 800 million drawings from twenty million people in 100 nations, from Brazil to Japan to the U.S. to South Africa.
And now we are releasing an open dataset based on these drawings so that people around the world can contribute to, analyze, and inform product design with this data. The dataset currently includes 50 million drawings Quick Draw! players have generated (we will continue to release more of the 800 million drawings over time).
It’s a considerable amount of data; and it’s also a fascinating lens into how to engage a wide variety of people to participate in (1) training machine learning systems, no matter what their technical background; and (2) the creation of open data sets that reflect a wide spectrum of cultures and points of view.
Seeing national — and global — patterns in one glance To understand visual patterns within the dataset quickly and efficiently, we worked with artist Kyle McDonald to overlay thousands of drawings from around the world. This helped us create composite images and identify trends in each nation, as well across all nations. We made animations of 1000 layered international drawings of cats and chairs, below, to share how we searched for visual trends with this data:
Cats, made from 1000 drawings from around the world:
Chairs, made from 1,000 drawings around the world:
Doodles of naturally recurring objects, like cats (or trees, rainbows, or skulls) often look alike across cultures:
However, for objects that might be familiar to some cultures, but not others, we saw notable differences. Sandwiches took defined forms or were a jumbled set of lines; mug handles pointed in opposite directions; and chairs were drawn facing forward or sideways, depending on the nation or region of the world:
One size doesn’t fit all These composite drawings, we realized, could reveal how perspectives and preferences differ between audiences from different regions, from the type of bread used in sandwiches to the shape of a coffee cup, to the aesthetic of how to depict objects so they are visually appealing. For example, a more straightforward, head-on view was more consistent in some nations; side angles in others.
Overlaying the images also revealed how to improve how we train neural networks when we lack a variety of data — even within a large, open, and international data set. For example, when we analyzed 115,000+ drawings of shoes in the Quick, Draw! dataset, we discovered that a single style of shoe, which resembles a sneaker, was overwhelmingly represented. Because it was so frequently drawn, the neural network learned to recognize only this style as a “shoe.”
But just as in the physical world, in the realm of training data, one size does not fit all. We asked, how can we consistently and efficiently analyze datasets for clues that could point toward latent bias? And what would happen if a team built a classifier based on a non-varied set of data? Diagnosing data for inclusion With the open source tool Facets, released last month as part of Google’s PAIR initiative, one can see patterns across a large dataset quickly. The goal is to efficiently, and visually, diagnose how representative large datasets, like the Quick, Draw! Dataset, may be.
Here’s a screenshot from the Quick,Draw! dataset within the Facets tool. The tool helped us position thousands of drawings by "faceting" them in multiple dimensions by their feature values, such as country, up to 100 countries. You, too, can filter for for features such as “random faces” in a 10-country view, which can then be expanded to 100 countries. At a glance, you can see proportions of country representations. You can also zoom in and see details of each individual drawing, allowing you to dive deeper into single data points. This is especially helpful when working with a large visual data set like Quick, Draw!, allowing researchers to explore for subtle differences or anomalies, or to begin flagging small-scale visual trends that might emerge later as patterns within the larger data set.
Here’s the same Quick, Draw! data for “random faces,” faceted for 94 countries and seen from another view. It’s clear in the few seconds that Facets loads the drawings in this new visualization that the data is overwhelmingly representative of the United States and European countries. This is logical given that the Quick, Draw! game is currently only available in English. We plan to add more languages over time. However, the visualization shows us that Brazil and Thailand seem to be non-English-speaking nations that are relatively well-represented within the data. This suggested to us that designers could potentially research what elements of the interface design may have worked well in these countries. Then, we could use that information to improve Quick,Draw! in its next iteration for other global, non-English-speaking audiences. We’re also using the faceted data to help us figure out how prioritize local languages for future translations.
Another outcome of using Facets to diagnose the Quick, Draw! data for inclusion was to identify concrete ways that anyone can improve the variety of data, as well as check for potential biases. Improvements could include:
Changing protocols for human rating of data or content generation, so that the data is more accurately representative of local or global populations
Analyzing subgroups of data and identify the database equivalent of "intersectionality" surfaced within visual patterns
Augmenting and reweighting data so that it is more inclusive
By releasing this dataset, and tools like Facets, we hope to facilitate the exploration of more inclusive approaches to machine learning, and to turn those observations into opportunities for innovation. We’re just beginning to draw insights from both Quick, Draw! and Facets. And we invite you to draw more with us, too.
Acknowledgements Jonas Jongejan, Henry Rowley, Takashi Kawashima, Jongmin Kim, Nick Fox-Gieg, built Quick, Draw! in collaboration with Google Creative Lab and Google’s Data Arts Team. The video about fairness in machine learning was created by Teo Soares, Alexander Chen, Bridget Prophet, Lisa Steinman, and JR Schmidt from Google Creative Lab. James Wexler, Jimbo Wilson, and Mahima Pushkarna, of PAIR, designed Facets, a project led by Martin Wattenberg and Fernanda Viégas, Senior Staff Research Scientists on the Google Brain team, and UX Researcher Jess Holbrook. Ian Johnson from the Google Cloud team contributed to the visualizations of overlaid drawings.
Posted by Pete Warden, Software Engineer, Google Brain Team
At Google, we’re often asked how to get started using deep learning for speech and other audio recognition problems, like detecting keywords or commands. And while there are some great open source speech recognition systems like Kaldi that can use neural networks as a component, their sophistication makes them tough to use as a guide to a simpler tasks. Perhaps more importantly, there aren’t many free and openly available datasets ready to be used for a beginner’s tutorial (many require preprocessing before a neural network model can be built on them) or that are well suited for simple keyword detection.
To solve these problems, the TensorFlow and AIY teams have created the Speech Commands Dataset, and used it to add training* and inference sample code to TensorFlow. The dataset has 65,000 one-second long utterances of 30 short words, by thousands of different people, contributed by members of the public through the AIY website. It’s released under a Creative Commons BY 4.0 license, and will continue to grow in future releases as more contributions are received. The dataset is designed to let you build basic but useful voice interfaces for applications, with common words like “Yes”, “No”, digits, and directions included. The infrastructure we used to create the data has been open sourced too, and we hope to see it used by the wider community to create their own versions, especially to cover underserved languages and applications.
The results will depend on whether your speech patterns are covered by the dataset, so it may not be perfect — commercial speech recognition systems are a lot more complex than this teaching example. But we’re hoping that as more accents and variations are added to the dataset, and as the community contributes improved models to TensorFlow, we’ll continue to see improvements and extensions.
You can also learn how to train your own version of this model through the new audio recognition tutorial on TensorFlow.org. With the latest development version of the framework and a modern desktop machine, you can download the dataset and train the model in just a few hours. You’ll also see a wide variety of options to customize the neural network for different problems, and to make different latency, size, and accuracy tradeoffs to run on different platforms.
We are excited to see what new applications people are able to build with the help of this dataset and tutorial, so I hope you get a chance to dive in and start recognizing!
Posted by Bryan Perozzi, Research Scientist, NYC Algorithms and Optimization Team
The 23rd ACM conference on Knowledge Discovery and Data Mining (KDD’17), a main venue for academic and industry research in data science, information retrieval, data mining and machine learning, was held last week in Halifax, Canada. Google has historically been an active participant in KDD, and this year was no exception, with Googlers’ contributing numerous papers and participating in workshops.
In addition to our overall participation, we are happy to congratulate fellow Googler Bryan Perozzi for receiving the SIGKDD 2017 Doctoral Dissertation Award, which serves to recognize excellent research by doctoral candidates in the field of data mining and knowledge discovery. This award was given in recognition of his thesis on the topic of machine learning on graphs performed at Stony Brook University, under the advisorship of Steven Skiena. Part of his thesis was developed during his internships at Google. The thesis dealt with using a restricted set of local graph primitives (such as ego-networks and truncated random walks) to effectively exploit the information around each vertex for classification, clustering, and anomaly detection. Most notably, the work introduced the random-walk paradigm for graph embedding with neural networks in DeepWalk.
DeepWalk: Online Learning of Social Representations, originally presented at KDD'14, outlines a method for using a series of local information obtained from truncated random walks to learn latent representations of nodes in a graph (e.g. users in a social network). The core idea was to treat each segment of a random walk as a sentence “in the language of the graph.” These segments could then be used as input for neural network models to learn representations of the graph’s nodes, using sequence modeling methods like word2vec (which had just been developed at the time). This research continues at Google, most recently with Learning Edge Representations via Low-Rank Asymmetric Projections.
The full list of Google contributions at KDD’17 is listed below (Googlers highlighted in blue).
Organizing Committee Panel Chair: Andrew Tomkins Research Track Program Chair: Ravi Kumar Applied Data Science Track Program Chair: Roberto J. Bayardo Research Track Program Committee: Sergei Vassilvitskii, Alex Beutel, Abhimanyu Das, Nan Du, Alessandro Epasto, Alex Fabrikant, Silvio Lattanzi, Kristen Lefevre, Bryan Perozzi, Karthik Raman, Steffen Rendle, Xiao Yu Applied Data Science Program Track Committee: Edith Cohen, Ariel Fuxman, D. Sculley, Isabelle Stanton, Martin Zinkevich, Amr Ahmed, Azin Ashkan, Michael Bendersky, James Cook, Nan Du, Balaji Gopalan, Samuel Huston, Konstantinos Kollias, James Kunz, Liang Tang, Morteza Zadimoghaddam
Today, we’re excited to provide more insights into the research done in the Big Apple with the launch of the NYC Algorithms and Optimization Team page. The NYC Algorithms and Optimization Team comprises multiple overlapping research groups working on large-scale graph mining, large-scale optimization and market algorithms.
Large-scale Graph Mining The Large-scale Graph Mining Group is tasked with building the most scalable library for graph algorithms and analysis and applying it to a multitude of Google products. We formalize data mining and machine learning challenges as graph algorithms problems and perform fundamental research in those fields leading to publications in top venues.
Balanced Partitioning: Balanced partitioning is often a crucial first step in solving large-scale graph optimization problems. As our paper shows, we are able to achieve a 15-25% reduction in cut size compared to state-of-the-art algorithms in the literature.
Clustering and Connected Components: We have state-of-the-art implementations of many different algorithms including hierarchical clustering, overlapping clustering, local clustering, spectral clustering, and connected components. Our methods are 10-30x faster than the best previously studied algorithms and can scale to graphs with trillions of edges.
Public-private Graph Computation: Our research on novel models of graph computation based on a personal view of private data preserves the privacy of each user.
Large-scale Optimization The Large-scale Optimization Group’s mission is to develop large-scale optimization techniques and use them to improve the efficiency and robustness of infrastructure at Google. We apply techniques from areas such as combinatorial optimization, online algorithms, and control theory to make Google’s massive computational infrastructure do more with less. We combine online and offline optimizations to achieve such goals as increasing throughput, decreasing latency, minimizing resource contention, maximizing the efficacy of caches, and eliminating unnecessary work in distributed systems.
Our research is used in critical infrastructure that supports core products:
Google Search Infrastructure Optimization: We partnered with the Google Search infrastructure team to build a distributed feedback control loop to govern the way queries are fanned out to machines. We also improved the efficacy of caching by increasing the homogeneity of the stream of queries seen by any single machine.
Market Algorithms The Market Algorithms Group analyzes, designs, and delivers economically and computationally efficient marketplaces across Google. Our research serves to optimize display ads for DoubleClick’s reservation ads and exchange, as well as sponsored search and mobile ads.
In the past few years, we have explored a number of areas, including:
Robust Stochastic Allocation: In one paper, we study online algorithms that achieve a good performance in both adversarial and stochastic arrival models. In another paper, we develop a hybrid model and algorithms with approximation factors that change as a function of the accuracy of the forecast.
Dynamic Mechanism Design: We have developed efficient mechanisms for sophisticated settings that occur in internet advertising, such as online settings and polyhedral constraints. We have also designed a new family of dynamic mechanisms, called bank account mechanisms, and showed their effectiveness in designing non-clairvoyant dynamic mechanisms that can be implemented without relying on forecasting the future steps.
Posted by Tali Dekel and Michael Rubinstein, Research Scientists
Whether you are a photographer, a marketing manager, or a regular Internet user, chances are you have encountered visible watermarks many times. Visible watermarks are those logos and patterns that are often overlaid on digital images provided by stock photography websites, marking the image owners while allowing viewers to perceive the underlying content so that they could license the images that fit their needs. It is the most common mechanism for protecting the copyrights of hundreds of millions of photographs and stock images that are offered online daily.
It’s standard practice to use watermarks on the assumption that they prevent consumers from accessing the clean images, ensuring there will be no unauthorized or unlicensed use. However, in “On The Effectiveness Of Visible Watermarks” recently presented at the 2017 Computer Vision and Pattern Recognition Conference (CVPR 2017), we show that a computer algorithm can get past this protection and remove watermarks automatically, giving users unobstructed access to the clean images the watermarks are intended to protect.
Left: example watermarked images from popular stock photography websites. Right: watermark-free version of the images on the left, produced automatically by a computer algorithm. More results are available below and on our project page. Image sources: Adobe Stock, 123RF.
As often done with vulnerabilities discovered in operating systems, applications or protocols, we want to disclose this vulnerability and propose solutions in order to help the photography and stock image communities adapt and better protect its copyrighted content and creations. From our experiments much of the world’s stock imagery is currently susceptible to this circumvention. As such, in our paper we also propose ways to make visible watermarks more robust to such manipulations. The Vulnerability of Visible Watermarks Visible watermarks are often designed to contain complex structures such as thin lines and shadows in order to make them harder to remove. Indeed, given a single image, for a computer to detect automatically which visual structures belong to the watermark and which structures belong to the underlying image is extremely difficult. Manually, the task of removing a watermark from an image is tedious, and even with state-of-the-art editing tools it may take a Photoshop expert several minutes to remove a watermark from one image.
However, a fact that has been overlooked so far is that watermarks are typically added in a consistent manner to many images. We show that this consistency can be used to invert the watermarking process — that is, estimate the watermark image and its opacity, and recover the original, watermark-free image underneath. This can be all be done automatically, without any user intervention, and by only observing watermarked image collections publicly available online.
The consistency of a watermark over many images allows to automatically remove it in mass scale. Left: input collection marked by the same watermark, middle: computed watermark and its opacity, right: recovered, watermark-free images. Image sources: COCO dataset, Copyright logo.
The first step of this process is identifying which image structures are repeating in the collection. If a similar watermark is embedded in many images, the watermark becomes the signal in the collection and the images become the noise, and simple image operations can be used to pull out a rough estimation of the watermark pattern.
Watermark extraction with increasing number of images. Left: watermarked input images, Middle: median intensities over the input images (up to the input image shown), Right: the corresponding estimated (matted) watermark. All images licensed from 123RF.
This provides a rough (noisy) estimate of the matted watermark (the watermark image times its spatially varying opacity, i.e., alpha matte). To actually recover the image underneath the watermark, we need to know the watermark’s decomposition into its image and alpha matte components. For this, a multi-image optimization problem can be formed, which we call “multi-image matting” (an extension of the traditional, single image matting problem), where the watermark (“foreground”) is separated into its image and opacity components while reconstructing a subset of clean (“background”) images. This optimization is able to produce very accurate estimations of the watermark components already from hundreds of images, and can deal with most watermarks used in practice, including ones containing thin structures, shadows or color gradients (as long as the watermarks are semi-transparent). Once the watermark pattern is recovered, it can be efficiently removed from any image marked by it.
Here are some more results, showing the estimated watermarks and example watermark-free results generated for several popular stock image services. We show many more results in our supplementary material on the project page.
Left column: Watermark estimated automatically from watermarked images online (rendered on a gray background). Middle colum: Input watermarked image. Right column: Automatically removed watermark. Image sources: Adobe Stock, Can Stock Photo, 123RF, Fotolia.
Making Watermarks More Effective The vulnerability of current watermarking techniques lies in the consistency in watermarks across image collections. Therefore, to counter it, we need to introduce inconsistencies when embedding the watermark in each image. In our paper we looked at several types of inconsistencies and how they affect the techniques described above. We found for example that simply changing the watermark’s position randomly per image does not prevent removing the watermark, nor do small random changes in the watermark’s opacity. But we found that introducing random geometric perturbations to the watermark — warping it when embedding it in each image — improves its robustness. Interestingly, very subtle warping is already enough to generate watermarks that this technique cannot fully defeat.
Flipping between the original watermark and a slightly, randomly warped watermark that can improve its robustness
This warping produces a watermarked image that is very similar to the original (top right in the following figure), yet now if an attempt is made to remove it, it leaves very visible artifacts (bottom right):
In a nutshell, the reason this works is because that removing the randomly-warped watermark from any single image requires to additionally estimate the warp field that was applied to the watermark for that image — a task that is inherently more difficult. Therefore, even if the watermark pattern can be estimated in the presence of these random perturbations (which by itself is nontrivial), accurately removing it without any visible artifact is far more challenging.
Here are some more results on the images from above when using subtle, randomly warped versions of the watermarks. Notice again how visible artifacts remain when trying to remove the watermark in this case, compared to the accurate reconstructions that are achievable with current, consistent watermarks. More results and a detailed analysis can be found in our paper and project page.
Left column: Watermarked image, using subtle, random warping of the watermark. Right Column: Watermark removal result.
This subtle random warping is only one type of randomization that can introduced to make watermarks more effective. A nice feature of that solution is that it is simple to implement and already improves the robustness of the watermark to image-collection attacks while at the same time being mostly imperceptible. If more visible changes to the watermark across the images are acceptable — for example, introducing larger shifts in the watermark or incorporating other random elements in it — they may lead to an even better protection.
While we cannot guarantee that there will not be a way to break such randomized watermarking schemes in the future, we believe (and our experiments show) that randomization will make watermarked collection attacks fundamentally more difficult. We hope that these findings will be helpful for the photography and stock image communities.
Acknowledgements The research described in this post was performed by Tali Dekel, Michael Rubinstein, Ce Liu and Bill Freeman. We thank Aaron Maschinot for narrating our video.