The 23rd ACM conference on Knowledge Discovery and Data Mining (KDD’17), a main venue for academic and industry research in data science, information retrieval, data mining and machine learning, was held last week in Halifax, Canada. Google has historically been an active participant in KDD, and this year was no exception, with Googlers’ contributing numerous papers and participating in workshops.
In addition to our overall participation, we are happy to congratulate fellow Googler Bryan Perozzi for receiving the SIGKDD 2017 Doctoral Dissertation Award, which serves to recognize excellent research by doctoral candidates in the field of data mining and knowledge discovery. This award was given in recognition of his thesis on the topic of machine learning on graphs performed at Stony Brook University, under the advisorship of Steven Skiena. Part of his thesis was developed during his internships at Google. The thesis dealt with using a restricted set of local graph primitives (such as ego-networks and truncated random walks) to effectively exploit the information around each vertex for classification, clustering, and anomaly detection. Most notably, the work introduced the random-walk paradigm for graph embedding with neural networks in DeepWalk.
DeepWalk: Online Learning of Social Representations, originally presented at KDD'14, outlines a method for using a series of local information obtained from truncated random walks to learn latent representations of nodes in a graph (e.g. users in a social network). The core idea was to treat each segment of a random walk as a sentence “in the language of the graph.” These segments could then be used as input for neural network models to learn representations of the graph’s nodes, using sequence modeling methods like word2vec (which had just been developed at the time). This research continues at Google, most recently with Learning Edge Representations via Low-Rank Asymmetric Projections.
The full list of Google contributions at KDD’17 is listed below (Googlers highlighted in blue).
Panel Chair: Andrew Tomkins
Research Track Program Chair: Ravi Kumar
Applied Data Science Track Program Chair: Roberto J. Bayardo
Research Track Program Committee: Sergei Vassilvitskii, Alex Beutel, Abhimanyu Das, Nan Du, Alessandro Epasto, Alex Fabrikant, Silvio Lattanzi, Kristen Lefevre, Bryan Perozzi, Karthik Raman, Steffen Rendle, Xiao Yu
Applied Data Science Program Track Committee: Edith Cohen, Ariel Fuxman, D. Sculley, Isabelle Stanton, Martin Zinkevich, Amr Ahmed, Azin Ashkan, Michael Bendersky, James Cook, Nan Du, Balaji Gopalan, Samuel Huston, Konstantinos Kollias, James Kunz, Liang Tang, Morteza Zadimoghaddam
Doctoral Dissertation Award: Bryan Perozzi, for Local Modeling of Attributed Graphs: Algorithms and Applications.
Doctoral Dissertation Runner-up Award: Alex Beutel, for User Behavior Modeling with Large-Scale Graph Analysis.
Ego-Splitting Framework: from Non-Overlapping to Overlapping Clusters
Alessandro Epasto, Silvio Lattanzi, Renato Paes Leme
HyperLogLog Hyperextended: Sketches for Concave Sublinear Frequency Statistics
Google Vizier: A Service for Black-Box Optimization
Daniel Golovin, Benjamin Solnik, Subhodeep Moitra, Greg Kochanski, John Karro, D. Sculley
Quick Access: Building a Smart Experience for Google Drive
Sandeep Tata, Alexandrin Popescul, Marc Najork, Mike Colagrosso, Julian Gibbons, Alan Green, Alexandre Mah, Michael Smith, Divanshu Garg, Cayden Meyer, Reuben KanPapers
TFX: A TensorFlow Based Production Scale Machine Learning Platform
Denis Baylor, Eric Breck, Heng-Tze Cheng, Noah Fiedel, Chuan Yu Foo, Zakaria Haque, Salem Haykal, Mustafa Ispir, Vihan Jain, Levent Koc, Chiu Yuen Koo, Lukasz Lew, Clemens Mewald, Akshay Modi, Neoklis Polyzotis, Sukriti Ramesh, Sudip Roy, Steven Whang, Martin Wicke, Jarek Wilkiewicz, Xin Zhang, Martin Zinkevich
Construction of Directed 2K Graphs
Balint Tillman, Athina Markopoulou, Carter T. Butts, Minas Gjoka
A Practical Algorithm for Solving the Incoherence Problem of Topic Models In Industrial Applications
Amr Ahmed, James Long, Dan Silva, Yuan Wang
Train and Distribute: Managing Simplicity vs. Flexibility in High-Level Machine Learning Frameworks
Heng-Tze Cheng, Lichan Hong, Mustafa Ispir, Clemens Mewald, Zakaria Haque, Illia Polosukhin, Georgios Roumpos, D Sculley, Jamie Smith, David Soergel, Yuan Tang, Philip Tucker, Martin Wicke, Cassandra Xia, Jianwei Xie
Learning to Count Mosquitoes for the Sterile Insect Technique
Yaniv Ovadia, Yoni Halpern, Dilip Krishnan, Josh Livni, Daniel Newburger, Ryan Poplin, Tiantian Zha, D. Sculley
13th International Workshop on Mining and Learning with Graphs
Keynote Speaker: Vahab Mirrokni - Distributed Graph Mining: Theory and Practice
Contributed talks include:
HARP: Hierarchical Representation Learning for Networks
Haochen Chen, Bryan Perozzi, Yifan Hu and Steven Skiena
Fairness, Accountability, and Transparency in Machine Learning
Contributed talks include:
Fair Clustering Through Fairlets
Flavio Chierichetti, Ravi Kumar, Silvio Lattanzi, Sergei Vassilvitskii
Data Decisions and Theoretical Implications when Adversarially Learning Fair Representations
Alex Beutel, Jilin Chen, Zhe Zhao, Ed H. Chi
Rajat Monga, Martin Wicke, Daniel ‘Wolff’ Dobson, Joshua Gordon