Tag Archives: Kaggle

Machine Learning Communities: Q4 ‘21 highlights and achievements

Posted by HyeJung Lee, DevRel Community Manager and Soonson Kwon, DevRel Program Manager

Image shows graphic illustrating Q4 success. Includes an arrow pointing to a group of stick figures

Let’s explore highlights and achievements of vast Google Machine Learning communities over the last quarter of last year! We are excited and grateful about all the activities that the communities across the globe do.

Image of the Jax logo  next to images of animals and objects. The animals and objects are labelled Predictions

India-based Aakash Nain has completed the TF-Jax tutorial series with Part 9 (Autodiff in JAX) and Part 10 (Pytrees in JAX). Aakash also started a new tutorial series to learn about the existing methods of building models in JAX. The first tutorial Building models in JAX - Part1 (Stax) is released.

Christmas tree made of code next to words that say Advent of Code

On Dec 12th, ML GDE Paolo Galeone started to solve puzzles of the Advent of Code series using “pure TensorFlow” (without any other library). His solution has been updated in a series of 12 on his blog. He explained how he designed the solutions, how he implemented them, and - when needed - focused on some TensorFlow features not widely used. (Day 1, Day 2, Day 3, Day 4, Day 5, Day 6, Day 7, Day 8, Day 9, Day 10, Day 11, Day 12, Wrap up)

Detailed  diagram of batch prediction/evaluation pipeline leading to model training pipeline

ML GDE Chansung Park (Korea) & Sayak Paul (India) published an “Continuous Adaptation for Machine Learning System to Data Changes” article on TensorFlow blog. They presented a project that implements a workflow combining batch prediction and model evaluation for continuous evaluation retraining In order to capture changes in the data.

Image of Elyes Manais' Google Cloud Certification

ML GDE Elyes Manai from Tunisia wrote an article on GDE blog about his experience on the Google Cloud ML Engineer certification covering guide to certificate and tips.

Image collage of medical staff wearing PPE

TFUG organizer Ali Mustufa Shaikh and Rishit Dagli released “CPPE-5: Medical Personal Protective Equipment Dataset” (paper, code). This paper got featured on Google Research TRC's publication section on January 5, 2022.

Image of a Google slide with text reading Ok, but what are transformers?

TFUG New York hosted a series of events in Dec. End-to-End NLP Workshop with TensorFlow. Brief introduction to the Kaggle competition for Great Barrier Reef challenge by Google(Slide). TF idea for ML Projects with GCP.

Left side of image shows a screenshot  from the Google for Startups Accelerator:MENA page. Right side of mage shows man with glasses holding a piece of paper in front of a wall that has signs on it that say hashtag creativity and hashtag technology

ML GDE Elyes Manai from Tunisia wrote an article “The ability to change people’s lives and leave one’s mark“. Are you facing difficulties growing in constrained environments? And do you think you're not a first-class student and you don't have connections in the industry? Then, check out Elyes’s story. He shared how Google helped him accelerate his impact.

Image shows a graph with data. Labels are on the side to denote wing, body, and tail

ML GDE Sayak Paul (India) and Soumik Rakshit’s Point Cloud Segmentation implemented the PointNet architecture for segmenting 3D point clouds using the ShapeNetCore dataset with TensorFlow 2.x. It is a winner of #TFCommunitySpotlight too.

Screenshot from a paper titled What Should Not be Contrastive in Contrastive Learning

Annotated Research Papers by ML GDE Aakash Kumar Nain (India) is an effort to make papers more accessible to a wider community. It also supports the web version and includes papers from Google Research and etc. This repository is popular enough to have a +2k star and a +200 fork.

Graphic wih text that reads A DevLibrary video interview wth Shai Reznik

Interview series of DevLibrary contributors : Meet the ML GDE Shai Reznik (Israel) and Doug Duhaime. And check out what they built with Google technology and what made them passionate.

Image of a TensorFlow 2.0 Global Docs Sprint event invite with Vikram Tiwari

ML DevFest 2021 by GDG Cloud San Francisco. There are 5 sessions that walk you through framing ML problems, researching ML, building proofs of concepts using existing ML APIs and models, building ML pipelines and etc. ML GDE Vikram Tiwari (USA) presented Vertex, ML Ops and GCP.

The words using Machine Learning for COVID19 helpline with Krupal Modi next to a picture of a man holding a microphone

Krupal Modi (India)’s blog article and #IamaGDE video shares how he’s been leading the machine learning initiatives at Haptik, a conversational AI platform, and how the team paired with the Indian Government and WhatsApp to build a COVID-19 helpline.

Hashtag I am a GDE next to a photo of a woman with sunglasses on her head

Leigh Johnson from USA is the founder of Print Nanny, an automated failure detection system and monitoring system for 3D printers. Meet Leigh in this blog and video!

ML Olympiad: Globally Distributed ML Competitions by the Community

Posted by Hee Jung, DevRel Community Manager

Blog header image shows graphic illustration of people, a group, and a medal

We are happy to announce ML Olympiad, an associated Kaggle Community Competitions hosted by Machine Learning Google Developer Experts (ML GDE) and TensorFlow User Group (TFUG).

Kaggle recently announced "Community Competitions" allowing anyone to create and host a competition at no cost. And our proud members of ML communities decided to dive in and take advantage of the feature to solve critical issues of our time, providing opportunities to train developers.

Why the ML Olympiad?

To train ML for developers leveraging Kaggle’s community competition. This is an opportunity for the participants to practice ML. This is the first 2022 global campaign of the ML Ecosystem team and this helps build stronger communities.

Image with text that reads Community Competitions make machine learning fun

ML Olympiad Community Competitions

Currently, 16 ML Olympiad community competitions are open, hosted by ML GDEs and TFUGs.

Arabic_Poems (in local language) link

  • Predict the name of a poet for Arabic poems. Encourage people to practice on Arabic NLP using TF.
  • Hosts: Ruqiya Bin Safi (ML GDE), Eyad Sibai, Hussain Alfayez / Saudi TFUG & Applied ML/AI group

Sky Survey link

  • Stellar classification with the digital sky survey
  • Hosts: Jieun Yoo, Michael Mellinger / NYTFUG

Análisis epidemiológico Guatemala (in local language) link

  • Make an analysis and prediction of epidemiological cases in Guatemala and the relations.
  • Hosts: Alvin Estrada, Julio Monterroso / TensorFlow User Group Guatemala

QUALITY EDUCATION (in local language) link

  • Competition will be focused on the Enem (National High School Examination) data. Competitors will have to create models to predict student scores in multiple tests.
  • Hosts: Vinicius Fernandes Caridá (ML GDE), Pedro Gengo, Alex Fernandes Mansano / Tensorflow User Group São Paulo

Landscape Image Classification link

  • Classification of partially masked natural images of mountains, buildings, seas, etc.
  • Hosts: Aditya Kane, Yogesh Kulkarni (ML GDE), Shashank Sane / TFUG Pune

Autism Prediction Challenge link

  • Classifying whether individuals have Autism or not.
  • Hosts: Usha Rengaraju, Vijayabharathi Karuppasamy, Samuel T / TFUG Mysuru and TFUG Chennai

Tamkeen Fund Granted link

  • Predict the company funds based on the company's features
  • Hosts: Mohammed buallay (ML GDE), Sayed Ali Alkamel (ML GDE)

Hausa Sentiment Analysis (in local language) link

  • Classify the sentiment of sentences of Hausa Language
  • Hosts: Nuruddeen Sambo, Dattijo Murtala Makama / TFUG Bauchi

TSA Classification (in local language) link

  • We invite participants to develop a classification method to identify early autistic disorders.
  • Hosts: Yannick Serge Obam (ML GDE), Arnold Junior Mve Mve

Let's Fight lung cancer (in local language) link

  • Spotting factors that are link to lung cancer detection
  • Hosts: abderrahman jaize, Sara EL-ATEIF / TFUG Casablanca

Genome Sequences classification (in local language) link

  • Genome sequence classification based on NCBI's GenBank database
  • Hosts: Taha Bouhsine, Said ElHachmey, Lahcen Ousayd / TensorFlow User Group Agadir

GOOD HEALTH AND WELL BEING link

  • Using ML to predict heart disease - If a patient has heart disease or not
  • Hosts: Ibrahim Olagoke, Ahmad Olanrewaju, Ernest Owojori / TensorFlow User Group Ibadan

Preserving North African Culture link

  • We are tackling cultural preservation through a machine learning model capable of identifying the origin of a given item (food, clothing, building).
  • Hosts: elyes manai (ML GDE), Rania Boughanmi, Kayoum Djedidi / IEEE ESSTHS + GDSC ENIT

Delivery Assignment Prediction link

  • The aim of this competition is to build a multi-class classification model capable of accurately predicting the most suitable driver for one or several given orders based on the destination of the order and the paths covered by the deliverers.
  • Host: Thierno Ibrahima DIOP (ML GDE)

Used car price link

  • Predicting the price of an imported used car.
  • Hosts: Armel Yara, Kimana Misago, Jordan Erifried / TFUG Abidjan

TensorFlow Malaysia User Group link

  • Using AI/ML to solve Business Data problem
  • Hosts: Poo Kuan Hoong (ML GDE), Yu Yong Poh, Lau Sian Lun / TensorFlow & Deep Learning Malaysia User Group

Navigating ML Olympiad

You can search “ML Olympiad” on Kaggle Community Competitions page to see them all. And for further info, look for #MLOlympiad on social media.

Google Developers support ML Olympiad by providing swag for top 3 winners of each competition. Find your interest among the competitions, join/share them, and get your part of the swag for competition winners!

Machine Learning Communities: Q3 ‘21 highlights and achievements

Posted by HyeJung Lee, DevRel Community Manager and Soonson Kwon, DevRel Program Manager

Let’s explore highlights and achievements of vast Google Machine Learning communities by region for the last quarter. Activities of experts (GDE, professional individuals), communities (TFUG, TensorFlow user groups), students (GDSC, student clubs), and developers groups (GDG) are presented here.

Key highlights

Image shows a banner for 30 days of ML with Kaggle

30 days of ML with Kaggle is designed to help beginners study ML using Kaggle Learn courses as well as a competition specifically for the participants of this program. Collaborated with the Kaggle team so that +30 the ML GDEs and TFUG organizers participated as volunteers as online mentors as well as speakers for this initiative.

Total 16 of the GDE/GDSC/TFUGs run community organized programs by referring to the shared community organize guide. Houston TensorFlow & Applied AI/ML placed 6th out of 7573 teams — the only Americans in the Top 10 in the competition. And TFUG Santiago (Chile) organizers participated as well and they are number 17 on the public leaderboard.

Asia Pacific

Image shows Google Cloud and Coca-Cola logos

GDE Minori MATSUDA (Japan)’s project on Coca-Cola Bottlers Japan was published on Google Cloud Japan Blog covering creating an ML pipeline to deploy into real business within 2 months by using Vertex AI. This is also published on GCP blog in English.

GDE Chansung Park (Korea) and Sayak Paul (India) published many articles on GCP Blog. First, “Image search with natural language queries” explained how to build a simple image parser from natural language inputs using OpenAI's CLIP model. From this second “Model training as a CI/CD system: (Part I, Part II)” post, you can learn more about why having a resilient CI/CD system for your ML application is crucial for success. Last, “Dual deployments on Vertex AI” talks about end-to-end workflow using Vertex AI, TFX and Kubeflow.

In China, GDE Junpeng Ye used TensorFlow 2.x to significantly reduce the codebase (15k → 2k) on WeChat Finder which is a TikTok alternative in WeChat. GDE Dan lee wrote an article on Understanding TensorFlow Series: Part 1, Part 2, Part 3-1, Part 3-2, Part 4

GDE Ngoc Ba from Vietnam has contributed AI Papers Reading and Coding series implementing ML/DL papers in TensorFlow and creates slides/videos every two weeks. (videos: Vit Transformer, MLP-Mixer and Transformer)

A beginner friendly codelabs (Get started with audio classification ,Go further with audio classification) by GDSC Sookmyung (Korea) learning to customize pre-trained audio classification models to your needs and deploy them to your apps, using TFlite Model Maker.

Cover image for Mat Kelcey's talk on JAX at the PyConAU event

GDE Matthew Kelcey from Australia gave a talk on JAX at PyConAU event. Mat gave an overview to fundamentals of JAX and an intro to some of the libraries being developed on top.

Image shows overview for the released PerceiverIO code

In Singapore, TFUG Singapore dived back into some of the latest papers, techniques, and fields of research that are delivering state-of-the-art results in a number of fields. GDE Martin Andrews included a brief code walkthrough for the released PerceiverIO code at perceiver- highlighting what JAX looks like, how Haiku relates to Sonnet, but also the data loading stuff which is done via tf.data.

Machine Learning Experimentation with TensorBoard book cover

GDE Imran us Salam Mian from Pakistan published a book "Machine Learning Experimentation with TensorBoard".

India

GDE Aakash Nain has published the TF-JAX tutorial series from Part 4 to Part 8. Part 4 gives a brief introduction about JAX (What/Why), and DeviceArray. Part 5 covers why pure functions are good and why JAX prefers them. Part 6 focuses on Pseudo Random Number Generation (PRNG) in Numpy and JAX. Part 7 focuses on Just In Time Compilation (JIT) in JAX. And Part 8 covers vmap and pmap.

Image of Bhavesh's Google Cloud certificate

GDE Bhavesh Bhatt published a video about his experience on the Google Cloud Professional Data Engineer certification exam.

Image shows phase 1 and 2 of the Climate Change project using Vertex AI

Climate Change project using Vertex AI by ML GDE Sayak Paul and Siddha Ganju (NVIDIA). They published a paper (Flood Segmentation on Sentinel-1 SAR Imagery with Semi-Supervised Learning) and open-sourced the project with regard to NASA Impact's ETCI competition. This project made four NeurIPS workshops AI for Science: Mind the Gaps, Tackling Climate Change with Machine Learning, Women in ML, and Machine Learning and the Physical Sciences. And they finished as the first runners-up (see Test Phase 2).

Image shows example of handwriting recognition tutorial

Tutorial on handwriting recognition was contributed to Keras example by GDE Sayak Paul and Aakash Kumar Nain.

Graph regularization for image classification using synthesized graphs by GDE Sayak Pau was added to the official examples in the Neural Structured Learning in TensorFlow.

GDE Sayak Paul and Soumik Rakshit shared a new NLP dataset for multi-label text classification. The dataset consists of paper titles, abstracts, and term categories scraped from arXiv.

North America

Banner image shows students participating in Google Summer of Code

During the GSoC (Google Summer of Code), some GDEs mentored or co-mentored students. GDE Margaret Maynard-Reid (USA) mentored TF-GAN, Model Garden, TF Hub and TFLite products. You can get some of her experience and tips from the GDE Blog. And you can find GDE Sayak Paul (India) and Googler Morgan Roff’s GSoC experience in (co-)mentoring TensorFlow and TF Hub as well.

A beginner friendly workshop on TensorFlow with ML GDE Henry Ruiz (USA) was hosted by GDSC Texas A&M University (USA) for the students.

Screenshot from Youtube video on how transformers work

Youtube video Self-Attention Explained: How do Transformers work? by GDE Tanmay Bakshi from Canada explained how you can build a Transformer encoder-based neural network to classify code into 8 different programming languages using TPU, Colab with Keras.

Europe

GDG / GDSC Turkey hosted AI Summer Camp in cooperation with Global AI Hub. 7100 participants learned about ML, TensorFlow, CV and NLP.

Screenshot from slide presentation titled Why Jax?

TechTalk Speech Processing with Deep Learning and JAX/Trax by GDE Sergii Khomenko (Germany) and M. Yusuf Sarıgöz (Turkey). They reviewed technologies such as Jax, TensorFlow, Trax, and others that can help boost our research in speech processing.

South/Central America

Image shows Custom object detection in the browser using TensorFlow.js

On the other side of the world, in Brazil, GDE Hugo Zanini Gomes wrote an article about “Custom object detection in the browser using TensorFlow.js” using the TensorFlow 2 Object Detection API and Colab was posted on the TensorFlow blog.

Screenshot from a talk about Real-time semantic segmentation in the browser - Made with TensorFlow.js

And Hugo gave a talk about Real-time semantic segmentation in the browser - Made with TensorFlow.js covered using SavedModels in an efficient way in JavaScript directly enabling you to get the reach and scale of the web for your new research.

Data Pipelines for ML was talked about by GDE Nathaly Alarcon Torrico from Bolivia explained all the phases involved in the creation of ML and Data Science products, starting with the data collection, transformation, storage and Product creation of ML models.

Screensho from TechTalk “Machine Learning Competitivo: Top 1% en Kaggle (Video)

TechTalk “Machine Learning Competitivo: Top 1% en Kaggle (Video)“ was hosted by TFUG Santiago (Chile). In this talk the speaker gave a tour of the steps to follow to generate a model capable of being in the top 1% of the Kaggle Leaderboard. The focus was on showing the libraries and“ tricks ”that are used to be able to test many ideas quickly both in implementation and in execution and how to use them in productive environments.

MENA

Screenshot from workshop about Recurrent Neural Networks

GDE Ruqiya Bin Safi (Saudi Arabia) had a workshop about Recurrent Neural Networks : part 1 (Github / Slide) at the GDG Mena. And Ruqiya gave a talk about Recurrent Neural Networks: part 2 at the GDG Cloud Saudi (Saudi Arabia).

AI Training with Kaggle by GDSC Islamic University of Gaza from Palestine. It is a two month training covering Data Processing, Image Processing and NLP with Kaggle.

Sub-Saharan Africa

TFUG Ibadan had two TensorFlow events : Basic Sentiment analysis with Tensorflow and Introduction to Recommenders Systems with TensorFlow”.

Image of Yannick Serge Obam Akou's TensorFlow Certificate

Article covered some tips to study, prepare and pass the TensorFlow developer exam in French by ML GDE Yannick Serge Obam Akou (Cameroon).

Advancing Instance-Level Recognition Research

Instance-level recognition (ILR) is the computer vision task of recognizing a specific instance of an object, rather than simply the category to which it belongs. For example, instead of labeling an image as “post-impressionist painting”, we’re interested in instance-level labels like “Starry Night Over the Rhone by Vincent van Gogh”, or “Arc de Triomphe de l'Étoile, Paris, France”, instead of simply “arch”. Instance-level recognition problems exist in many domains, like landmarks, artwork, products, or logos, and have applications in visual search apps, personal photo organization, shopping and more. Over the past several years, Google has been contributing to research on ILR with the Google Landmarks Dataset and Google Landmarks Dataset v2 (GLDv2), and novel models such as DELF and Detect-to-Retrieve.

Three types of image recognition problems, with different levels of label granularity (basic, fine-grained, instance-level), for objects from the artwork, landmark and product domains. In our work, we focus on instance-level recognition.

Today, we highlight some results from the Instance-Level Recognition Workshop at ECCV’20. The workshop brought together experts and enthusiasts in this area, with many fruitful discussions, some of which included our ECCV’20 paper “DEep Local and Global features” (DELG), a state-of-the-art image feature model for instance-level recognition, and a supporting open-source codebase for DELG and other related ILR techniques. Also presented were two new landmark challenges (on recognition and retrieval tasks) based on GLDv2, and future ILR challenges that extend to other domains: artwork recognition and product retrieval. The long-term goal of the workshop and challenges is to foster advancements in the field of ILR and push forward the state of the art by unifying research workstreams from different domains, which so far have mostly been tackled as separate problems.

DELG: DEep Local and Global Features
Effective image representations are the key components required to solve instance-level recognition problems. Often, two types of representations are necessary: global and local image features. A global feature summarizes the entire contents of an image, leading to a compact representation but discarding information about spatial arrangement of visual elements that may be characteristic of unique examples. Local features, on the other hand, comprise descriptors and geometry information about specific image regions; they are especially useful to match images depicting the same objects.

Currently, most systems that rely on both of these types of features need to separately adopt each of them using different models, which leads to redundant computations and lowers overall efficiency. To address this, we proposed DELG, a unified model for local and global image features.

The DELG model leverages a fully-convolutional neural network with two different heads: one for global features and the other for local features. Global features are obtained using pooled feature maps of deep network layers, which in effect summarize the salient features of the input images making the model more robust to subtle changes in input. The local feature branch leverages intermediate feature maps to detect salient image regions, with the help of an attention module, and to produce descriptors that represent associated localized contents in a discriminative manner.

Our proposed DELG model (left). Global features can be used in the first stage of a retrieval-based system, to efficiently select the most similar images (bottom). Local features can then be employed to re-rank top results (top, right), increasing the precision of the system.

This novel design allows for efficient inference since it enables extraction of global and local features within a single model. For the first time, we demonstrated that such a unified model can be trained end-to-end and deliver state-of-the-art results for instance-level recognition tasks. When compared to previous global features, this method outperforms other approaches by up to 7.5% mean average precision; and for the local feature re-ranking stage, DELG-based results are up to 7% better than previous work. Overall, DELG achieves 61.2% average precision on the recognition task of GLDv2, which outperforms all except two methods of the 2019 challenge. Note that all top methods from that challenge used complex model ensembles, while our results use only a single model.

Tensorflow 2 Open-Source Codebase
To foster research reproducibility, we are also releasing a revamped open-source codebase that includes DELG and other techniques relevant to instance-level recognition, such as DELF and Detect-to-Retrieve. Our code adopts the latest Tensorflow 2 releases, and makes available reference implementations for model training & inference, besides image retrieval and matching functionalities. We invite the community to use and contribute to this codebase in order to develop strong foundations for research in the ILR field.

New Challenges for Instance Level Recognition
Focused on the landmarks domain, the Google Landmarks Dataset v2 (GLDv2) is the largest available dataset for instance-level recognition, with 5 million images spanning 200 thousand categories. By training landmark retrieval models on this dataset, we have demonstrated improvements of up to 6% mean average precision, compared to models trained on earlier datasets. We have also recently launched a new browser interface for visually exploring the GLDv2 dataset.

This year, we also launched two new challenges within the landmark domain, one focusing on recognition and the other on retrieval. These competitions feature newly-collected test sets, and a new evaluation methodology: instead of uploading a CSV file with pre-computed predictions, participants have to submit models and code that are run on Kaggle servers, to compute predictions that are then scored and ranked. The compute restrictions of this environment put an emphasis on efficient and practical solutions.

The challenges attracted over 1,200 teams, a 3x increase over last year, and participants achieved significant improvements over our strong DELG baselines. On the recognition task, the highest scoring submission achieved a relative increase of 43% average precision score and on the retrieval task, the winning team achieved a 59% relative improvement of the mean average precision score. This latter result was achieved via a combination of more effective neural networks, pooling methods and training protocols (see more details on the Kaggle competition site).

In addition to the landmark recognition and retrieval challenges, our academic and industrial collaborators discussed their progress on developing benchmarks and competitions in other domains. A large-scale research benchmark for artwork recognition is under construction, leveraging The Met’s Open Access image collection, and with a new test set consisting of guest photos exhibiting various photometric and geometric variations. Similarly, a new large-scale product retrieval competition will capture various challenging aspects, including a very large number of products, a long-tailed class distribution and variations in object appearance and context. More information on the ILR workshop, including slides and video recordings, is available on its website.

With this research, open source code, data and challenges, we hope to spur progress in instance-level recognition and enable researchers and machine learning enthusiasts from different communities to develop approaches that generalize across different domains.

Acknowledgements
The main Google contributors of this project are André Araujo, Cam Askew, Bingyi Cao, Jack Sim and Tobias Weyand. We’d like to thank the co-organizers of the ILR workshop Ondrej Chum, Torsten Sattler, Giorgos Tolias (Czech Technical University), Bohyung Han (Seoul National University), Guangxing Han (Columbia University), Xu Zhang (Amazon), collaborators on the artworks dataset Nanne van Noord, Sarah Ibrahimi (University of Amsterdam), Noa Garcia (Osaka University), as well as our collaborators from the Metropolitan Museum of Art: Jennie Choi, Maria Kessler and Spencer Kiser. For the open-source Tensorflow codebase, we’d like to thank the help of recent contributors: Dan Anghel, Barbara Fusinska, Arun Mukundan, Yuewei Na and Jaeyoun Kim. We are grateful to Will Cukierski, Phil Culliton, Maggie Demkin for their support with the landmarks Kaggle competitions. Also we’d like to thank Ralph Keller and Boris Bluntschli for their help with data collection.

Source: Google AI Blog


An End-to-End AutoML Solution for Tabular Data at KaggleDays



Machine learning (ML) for tabular data (e.g. spreadsheet data) is one of the most active research areas in both ML research and business applications. Solutions to tabular data problems, such as fraud detection and inventory prediction, are critical for many business sectors, including retail, supply chain, finance, manufacturing, marketing and others. Current ML-based solutions to these problems can be achieved by those with significant ML expertise, including manual feature engineering and hyper-parameter tuning, to create a good model. However, the lack of broad availability of these skills limits the efficiency of business improvements through ML.

Google’s AutoML efforts aim to make ML more scalable and accelerate both research and industry applications. Our initial efforts of neural architecture search have enabled breakthroughs in computer vision with NasNet, and evolutionary methods such as AmoebaNet and hardware-aware mobile vision architecture MNasNet further show the benefit of these learning-to-learn methods. Recently, we applied a learning-based approach to tabular data, creating a scalable end-to-end AutoML solution that meets three key criteria:
  • Full automation: Data and computation resources are the only inputs, while a servable TensorFlow model is the output. The whole process requires no human intervention.
  • Extensive coverage: The solution is applicable to the majority of arbitrary tasks in the tabular data domain.
  • High quality: Models generated by AutoML has comparable quality to models manually crafted by top ML experts.
To benchmark our solution, we entered our algorithm in the KaggleDays SF Hackathon, an 8.5 hour competition of 74 teams with up to 3 members per team, as part of the KaggleDays event. The first time that AutoML has competed against Kaggle participants, the competition involved predicting manufacturing defects given information about the material properties and testing results for batches of automotive parts. Despite competing against participants thats were at the Kaggle progression system Master level, including many who were at the GrandMaster level, our team (“Google AutoML”) led for most of the day and ended up finishing second place by a narrow margin, as seen in the final leaderboard.

Our team’s AutoML solution was a multistage TensorFlow pipeline. The first stage is responsible for automatic feature engineering, architecture search, and hyperparameter tuning through search. The promising models from the first stage are fed into the second stage, where cross validation and bootstrap aggregating are applied for better model selection. The best models from the second stage are then combined in the final model.
The workflow for the “Google AutoML” team was quite different from that of other Kaggle competitors. While they were busy with analyzing data and experimenting with various feature engineering ideas, our team spent most of time monitoring jobs and and waiting for them to finish. Our solution for second place on the final leaderboard required 1 hour on 2500 CPUs to finish end-to-end.

After the competition, Kaggle published a public kernel to investigate winning solutions and found that augmenting the top hand-designed models with AutoML models, such as ours, could be a useful way for ML experts to create even better performing systems. As can be seen in the plot below, AutoML has the potential to enhance the efforts of human developers and address a broad range of ML problems.
Potential model quality improvement on final leaderboard if AutoML models were merged with other Kagglers’ models. “Erkut & Mark, Google AutoML”, includes the top winner “Erkut & Mark” and the second place “Google AutoML” models. Erkut Aykutlug and Mark Peng used XGBoost with creative feature engineering whereas AutoML uses both neural network and gradient boosting tree (TFBT) with automatic feature engineering and hyperparameter tuning.
Google Cloud AutoML Tables
The solution we presented at the competitions is the main algorithm in Google Cloud AutoML Tables, which was recently launched (beta) at Google Cloud Next ‘19. The AutoML Tables implementation regularly performs well in benchmark tests against Kaggle competitions as shown in the plot below, demonstrating state-of-the-art performance across the industry.
Third party benchmark of AutoML Tables on multiple Kaggle competitions
We are excited about the potential application of AutoML methods across a wide range of real business problems. Customers have already been leveraging their tabular enterprise data to tackle mission-critical tasks like supply chain management and lead conversion optimization using AutoML Tables, and we are excited to be providing our state-of-the-art models to solve tabular data problems.

Acknowledgements
This project was only possible thanks to Google Brain team members Ming Chen, Da Huang, Yifeng Lu, Quoc V. Le and Vishy Tirumalashetty. We also thank Dawei Jia, Chenyu Zhao and Tin-yun Ho from the Cloud AutoML Tables team for great infrastructure and product landing collaboration. Thanks to Walter Reade, Julia Elliott and Kaggle for organizing such an engaging competition.

Source: Google AI Blog


Welcome Kaggle to Google Cloud

Today, I’m excited to announce that Kaggle will be joining Google Cloud. Founded in 2010, Kaggle is home to the world's largest community of data scientists and machine learning enthusiasts. More than 800,000 data experts use Kaggle to explore, analyze and understand the latest updates in machine learning and data analytics. Kaggle is the best place to search and analyze public datasets, build machine learning models and grow your data science expertise.


During my keynote talk at Next ‘17, I emphasized the importance of democratizing AI. We must lower the barriers of entry to AI and make it available to the largest community of developers, users and enterprises, so they can apply it to their own unique needs. With Kaggle joining the Google Cloud team, we can accelerate this mission.


Kaggle and Google Cloud will continue to support machine learning training and deployment services, while offering the community the ability to store and query large datasets.


I’m thrilled to welcome Kaggle to the team. Kaggle and Google Cloud will foster a thriving community of machine learning developers and data scientists, giving them direct access to the most advanced cloud machine learning environment.


I can’t wait to see what machine learning problems you solve and the breakthroughs you achieve.

Posted by, Fei-Fei Li, Chief Scientist, Google Cloud AI and Machine Learning