
Google-Landmarks: A New Dataset and Challenge for Landmark Recognition

Image classification technology has shown remarkable improvement over the past few years, exemplified in part by the ImageNet classification challenge, where error rates continue to drop substantially every year. In order to continue advancing the state of the art in computer vision, many researchers are now putting more focus on fine-grained and instance-level recognition problems. Instead of recognizing general entities such as buildings, mountains and (of course) cats, many are designing machine learning algorithms capable of identifying the Eiffel Tower, Mount Fuji or Persian cats. However, a significant obstacle for research in this area has been the lack of large annotated datasets.

Today, we are excited to advance instance-level recognition by releasing Google-Landmarks, the largest worldwide dataset for recognition of human-made and natural landmarks. Google-Landmarks is being released as part of the Landmark Recognition and Landmark Retrieval Kaggle challenges, which will be the focus of the CVPR’18 Landmarks workshop. The dataset contains more than 2 million images depicting 30 thousand unique landmarks from across the world (their geographic distribution is presented below), a number of classes that is ~30x larger than what is available in commonly used datasets. Additionally, to spur research in this field, we are open-sourcing Deep Local Features (DELF), an attentive local feature descriptor that we believe is especially suited for this kind of task.

Geographic distribution of landmarks in our dataset.

Landmark recognition presents some noteworthy differences from other problems. For example, even within a large annotated dataset, there might not be much training data available for some of the less popular landmarks. Additionally, since landmarks are generally rigid objects which do not move, the intra-class variation is very small (in other words, a landmark’s appearance does not change that much across different images of it). As a result, variations only arise due to image capture conditions, such as occlusions, different viewpoints, weather and illumination, making this distinct from other image recognition datasets where images of a particular class (such as a dog) can vary much more. These characteristics are also shared with other instance-level recognition problems, such as artwork recognition, so we hope the new dataset can benefit research for other image recognition problems as well.

The two Kaggle challenges provide access to annotated data to help researchers address these problems. The recognition track challenge is to build models that recognize the correct landmark in a dataset of challenging test images, while the retrieval track challenges participants to retrieve images containing the same landmark.

A few examples of images from the Google-Landmarks dataset, including landmarks such as Big Ben, Sacre Coeur Basilica, the rock sculpture of Decebalus and the Megyeri Bridge, among others.

If you plan to be at CVPR this year, we hope you’ll attend the CVPR’18 Landmarks workshop. However, everyone is able to participate in the challenge, and access to the new dataset is available via the Kaggle website. We hope this resource is valuable to your research and we can’t wait to see the ideas you will come up with for recognizing landmarks!

Acknowledgements
Jack Sim, Will Cukierski, Maggie Demkin, Hartwig Adam, Bohyung Han, Shih-Fu Chang, Ondrej Chum, Torsten Sattler, Giorgos Tolias, Xu Zhang, Fernando Brucher, Marco Andreetto, Gursheesh Kour.

Google at CVPR 2017

From July 21-26, Honolulu, Hawaii hosts the 2017 Conference on Computer Vision and Pattern Recognition (CVPR 2017), the premier annual computer vision event comprising the main conference and several co-located workshops and tutorials. As a leader in computer vision research and a Platinum Sponsor, Google will have a strong presence at CVPR 2017 — over 250 Googlers will be in attendance to present papers and invited talks at the conference, and to organize and participate in multiple workshops.

If you are attending CVPR this year, please stop by our booth and chat with our researchers who are actively pursuing the next generation of intelligent systems that utilize the latest machine learning techniques applied to various areas of machine perception. Our researchers will also be available to talk about and demo several recent efforts, including the technology behind Headset Removal for Virtual and Mixed Reality, Image Compression with Neural Networks, Jump, TensorFlow Object Detection API and much more.

You can learn more about our research being presented at CVPR 2017 in the list below.

Organizing Committee
Corporate Relations Chair - Mei Han
Area Chairs include - Alexander Toshev, Ce Liu, Vittorio Ferrari, David Lowe

Training object class detectors with click supervision
Dim Papadopoulos, Jasper Uijlings, Frank Keller, Vittorio Ferrari

Unsupervised Pixel-Level Domain Adaptation With Generative Adversarial Networks
Konstantinos Bousmalis, Nathan Silberman, David Dohan, Dumitru Erhan, Dilip Krishnan

BranchOut: Regularization for Online Ensemble Tracking With Convolutional Neural Networks
Bohyung Han, Jack Sim, Hartwig Adam

Enhancing Video Summarization via Vision-Language Embedding
Bryan A. Plummer, Matthew Brown, Svetlana Lazebnik

Learning by Association - A Versatile Semi-Supervised Training Method for Neural Networks
Philip Haeusser, Alexander Mordvintsev, Daniel Cremers

Context-Aware Captions From Context-Agnostic Supervision
Ramakrishna Vedantam, Samy Bengio, Kevin Murphy, Devi Parikh, Gal Chechik

Spatially Adaptive Computation Time for Residual Networks
Michael Figurnov, Maxwell D. Collins, Yukun Zhu, Li Zhang, Jonathan Huang, Dmitry Vetrov, Ruslan Salakhutdinov

Xception: Deep Learning With Depthwise Separable Convolutions
François Chollet

Deep Metric Learning via Facility Location
Hyun Oh Song, Stefanie Jegelka, Vivek Rathod, Kevin Murphy

Speed/Accuracy Trade-Offs for Modern Convolutional Object Detectors
Jonathan Huang, Vivek Rathod, Chen Sun, Menglong Zhu, Anoop Korattikara, Alireza Fathi, Ian Fischer, Zbigniew Wojna, Yang Song, Sergio Guadarrama, Kevin Murphy

Synthesizing Normalized Faces From Facial Identity Features
Forrester Cole, David Belanger, Dilip Krishnan, Aaron Sarna, Inbar Mosseri, William T. Freeman

Towards Accurate Multi-Person Pose Estimation in the Wild
George Papandreou, Tyler Zhu, Nori Kanazawa, Alexander Toshev, Jonathan Tompson, Chris Bregler, Kevin Murphy

GuessWhat?! Visual Object Discovery Through Multi-Modal Dialogue
Harm de Vries, Florian Strub, Sarath Chandar, Olivier Pietquin, Hugo Larochelle, Aaron Courville

Learning discriminative and transformation covariant local feature detectors
Xu Zhang, Felix X. Yu, Svebor Karaman, Shih-Fu Chang

Full Resolution Image Compression With Recurrent Neural Networks
George Toderici, Damien Vincent, Nick Johnston, Sung Jin Hwang, David Minnen, Joel Shor, Michele Covell

Learning From Noisy Large-Scale Datasets With Minimal Supervision
Andreas Veit, Neil Alldrin, Gal Chechik, Ivan Krasin, Abhinav Gupta, Serge Belongie

Unsupervised Learning of Depth and Ego-Motion From Video
Tinghui Zhou, Matthew Brown, Noah Snavely, David G. Lowe

Cognitive Mapping and Planning for Visual Navigation
Saurabh Gupta, James Davidson, Sergey Levine, Rahul Sukthankar, Jitendra Malik

Fast Fourier Color Constancy
Jonathan T. Barron, Yun-Ta Tsai

On the Effectiveness of Visible Watermarks
Tali Dekel, Michael Rubinstein, Ce Liu, William T. Freeman

YouTube-BoundingBoxes: A Large High-Precision Human-Annotated Data Set for Object Detection in Video
Esteban Real, Jonathon Shlens, Stefano Mazzocchi, Xin Pan, Vincent Vanhoucke

Deep Learning for Robotic Vision
Organizers include: Anelia Angelova, Kevin Murphy
Program Committee includes: George Papandreou, Nathan Silberman, Pierre Sermanet

The Fourth Workshop on Fine-Grained Visual Categorization
Organizers include: Yang Song
Advisory Panel includes: Hartwig Adam
Program Committee includes: Anelia Angelova, Yuning Chai, Nathan Frey, Jonathan Krause, Catherine Wah, Weijun Wang

Language and Vision Workshop
Organizers include: R. Sukthankar

The First Workshop on Negative Results in Computer Vision
Organizers include: R. Sukthankar, W. Freeman, J. Malik

Visual Understanding by Learning from Web Data
General Chairs include: Jesse Berent, Abhinav Gupta, Rahul Sukthankar
Program Chairs include: Wei Li

YouTube-8M Large-Scale Video Understanding Challenge
General Chairs: Paul Natsev, Rahul Sukthankar
Program Chairs: Joonseok Lee, George Toderici
Challenge Organizers: Sami Abu-El-Haija, Anja Hauth, Nisarg Kothari, Hanhan Li, Sobhan Naderi Parizi, Balakrishnan Varadarajan, Sudheendra Vijayanarasimhan, Jian Wang

CVPR 2016 & Research at Google

This week, Las Vegas hosts the 2016 Conference on Computer Vision and Pattern Recognition (CVPR 2016), the premier annual computer vision event comprising the main conference and several co-located workshops and short courses. As a leader in computer vision research, Google has a strong presence at CVPR 2016, with many Googlers presenting papers and invited talks at the conference, tutorials and workshops.

We congratulate Google Research Scientist Ce Liu and Google Faculty Advisor Abhinav Gupta, who were selected as this year’s recipients of the PAMI Young Researcher Award for outstanding research contributions within computer vision. We also congratulate Googler Henrik Stewenius for receiving the Longuet-Higgins Prize, a retrospective award that recognizes up to two CVPR papers from ten years ago that have made a significant impact on computer vision research, for his 2006 CVPR paper “Scalable Recognition with a Vocabulary Tree”, co-authored with David Nister.

If you are attending CVPR this year, please stop by our booth and chat with our researchers about the projects and opportunities at Google that go into solving interesting problems for hundreds of millions of people. The Google booth will also showcase several recent efforts, including the technology behind Motion Stills and a live demo of neural network-based image compression. Learn more about our research being presented at CVPR 2016 in the list below.

Oral Presentations
Generation and Comprehension of Unambiguous Object Descriptions
Junhua Mao, Jonathan Huang, Alexander Toshev, Oana Camburu, Alan L. Yuille, Kevin Murphy

Detecting Events and Key Actors in Multi-Person Videos
Vignesh Ramanathan, Jonathan Huang, Sami Abu-El-Haija, Alexander Gorban, Kevin Murphy, Li Fei-Fei

Spotlight Session: 3D Reconstruction
DeepStereo: Learning to Predict New Views From the World’s Imagery
John Flynn, Ivan Neulander, James Philbin, Noah Snavely

Discovering the Physical Parts of an Articulated Object Class From Multiple Videos
Luca Del Pero, Susanna Ricco, Rahul Sukthankar, Vittorio Ferrari

Blockout: Dynamic Model Selection for Hierarchical Deep Networks
Calvin Murdock, Zhen Li, Howard Zhou, Tom Duerig

Rethinking the Inception Architecture for Computer Vision
Christian Szegedy, Vincent Vanhoucke, Sergey Ioffe, Jon Shlens, Zbigniew Wojna

Improving the Robustness of Deep Neural Networks via Stability Training
Stephan Zheng, Yang Song, Thomas Leung, Ian Goodfellow

Semantic Image Segmentation With Task-Specific Edge Detection Using CNNs and a Discriminatively Trained Domain Transform
Liang-Chieh Chen, Jonathan T. Barron, George Papandreou, Kevin Murphy, Alan L. Yuille

Optimization Algorithms for Subset Selection and Summarization in Large Data Sets
Ehsan Elhamifar, Jeff Bilmes, Alex Kulesza, Michael Gygli

Perceptual Organization in Computer Vision: The Role of Feedback in Recognition and Reorganization
Organizers: Katerina Fragkiadaki, Phillip Isola, Joao Carreira
Invited talks: Viren Jain, Jitendra Malik

VQA Challenge Workshop
Invited talks: Jitendra Malik, Kevin Murphy

Women in Computer Vision
Invited talk: Caroline Pantofaru

Computational Models for Learning Systems and Educational Assessment
Invited talk: Jonathan Huang

Large-Scale Scene Understanding (LSUN) Challenge
Invited talk: Jitendra Malik

Large Scale Visual Recognition and Retrieval: BigVision 2016
General Chairs: Jason Corso, Fei-Fei Li, Samy Bengio

ChaLearn Looking at People
Invited talk: Florian Schroff

Medical Computer Vision
Invited talk: Ramin Zabih