Anomaly detection with few labeled samples under distribution mismatch

SPADE: Semi-Supervised Anomaly Detection under Distribution Mismatch

What is SPADE?

We recently open-sourced SPADE (Semi-supervised Pseudo-labeler Anomaly Detection with Ensembling), a semi-supervised framework for anomaly detection that overcomes some of the drawbacks of alternative anomaly detection methods.

What Problem does SPADE Solve?

Anomaly detection is the process of identifying samples in a dataset that diverge from some expected pattern. It has wide applications across industries, including API security, financial fraud detection and manufacturing defect detection. SPADE is designed specifically for semi-supervised settings where we have a small number of labeled samples and a large number of unlabeled samples.

When is SPADE better for your Use Case?

Creating a large labeled set of anomalous and non-anomalous samples for supervised learning can be time-consuming, expensive and error-prone. As a result, unsupervised and semi-supervised methods have become an active area of research.

Most of these semi-supervised methods assume that the labeled and unlabeled data come from the same distribution, that is, that they are generated by the same underlying process, whether physical, financial, manufacturing or other. This assumption is often violated in practice: the labeled data may contain one type of anomaly while the unlabeled data contains other types, or the labeled data may contain only samples that were easy to label. In these and potentially other cases of distribution mismatch, SPADE has been shown to perform better than alternatives.

How does it Work?

SPADE constructs an ensemble of One-Class Classifiers (OCCs); each OCC is a Gaussian Mixture Model trained in a self-supervised manner on a disjoint subset of the unlabeled samples and non-anomalous samples.
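
The ensemble construction described above can be sketched roughly as follows. This is a minimal, dependency-free illustration and not the code in the SPADE repository: the names `split_disjoint`, `GaussianOCC` and `train_ensemble` are hypothetical, a single diagonal Gaussian is swapped in for the Gaussian Mixture Model, and the sketch assumes that "non-anomalous samples" means labeled normal samples shared by every OCC.

```python
import random
from statistics import fmean, pstdev


def split_disjoint(samples, k, seed=0):
    """Shuffle a copy of the samples and deal them into k disjoint subsets."""
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i::k] for i in range(k)]


class GaussianOCC:
    """Stand-in one-class classifier: one diagonal Gaussian, not a full GMM."""

    def fit(self, rows):
        cols = list(zip(*rows))
        self.means = [fmean(c) for c in cols]
        # Floor the spread so a constant feature cannot cause division by zero.
        self.stds = [max(pstdev(c), 1e-6) for c in cols]
        return self

    def score(self, row):
        """Anomaly score: sum of squared z-scores (higher means more unusual)."""
        return sum(((x - m) / s) ** 2
                   for x, m, s in zip(row, self.means, self.stds))


def train_ensemble(unlabeled, labeled_normal, k=3, seed=0):
    """Fit one OCC per disjoint unlabeled subset plus the shared labeled normals."""
    return [GaussianOCC().fit(subset + labeled_normal)
            for subset in split_disjoint(unlabeled, k, seed)]


# Synthetic two-feature data, for illustration only.
rng = random.Random(1)
unlabeled = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(300)]
labeled_normal = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(20)]
ensemble = train_ensemble(unlabeled, labeled_normal, k=3)
# A point far from the bulk of the data should score high under every OCC.
print([round(occ.score([6.0, 6.0]), 1) for occ in ensemble])
```

Because every OCC sees a different slice of the unlabeled data, their scores differ slightly, which is what makes agreement across the ensemble informative.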

Figure 1. SPADE first trains an ensemble of OCCs to provide pseudo-labels for the unlabeled samples. Then, both labeled and pseudo-labeled samples are used to train a supervised model for anomaly detection.

The ensemble is used to obtain pseudo-labels for the unlabeled data. A pseudo-label of is-anomalous or not-anomalous is assigned to a sample only if all members of the ensemble agree. The pseudo-labels and any original labels are then used together to train a supervised anomaly detector model. In the version of SPADE that we are open-sourcing, this model is a TensorFlow Random Forest that is trained with a binary cross-entropy loss. Once trained on the labels and pseudo-labels, the detector model can be used for online or batch prediction.
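
The unanimity rule can be illustrated with a short sketch. It assumes each OCC has already produced a binary vote per sample (how scores are thresholded into votes is not covered here), and the function names `consensus_pseudo_labels` and `build_training_set` are hypothetical rather than part of the SPADE package.

```python
from typing import List, Optional


def consensus_pseudo_labels(votes: List[List[int]]) -> List[Optional[int]]:
    """Turn per-sample ensemble votes into pseudo-labels.

    votes[i][j] is OCC j's verdict for sample i: 1 = anomalous, 0 = not anomalous.
    A sample gets a pseudo-label only when every OCC agrees; otherwise None.
    Assumes each sample has at least one vote.
    """
    labels: List[Optional[int]] = []
    for sample_votes in votes:
        if all(v == 1 for v in sample_votes):
            labels.append(1)
        elif all(v == 0 for v in sample_votes):
            labels.append(0)
        else:
            labels.append(None)  # disagreement: leave the sample unlabeled
    return labels


def build_training_set(labeled, unlabeled, votes):
    """Combine original (features, label) pairs with unanimous pseudo-labels."""
    pseudo = [(x, y)
              for x, y in zip(unlabeled, consensus_pseudo_labels(votes))
              if y is not None]
    return list(labeled) + pseudo


print(consensus_pseudo_labels([[1, 1, 1], [0, 0, 0], [1, 0, 1]]))
# [1, 0, None]
```

Samples on which the ensemble disagrees are simply left out of the supervised training set in this sketch, so only the original labels and the unanimous pseudo-labels reach the downstream detector.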

Example Use Cases

The benefits of SPADE described above are highlighted in our experiments, as detailed in the published paper (in TMLR, with Featured Certification). Here we present results on a selection of datasets that demonstrate SPADE's performance when (a) the unlabeled dataset contains new types of anomalies, (b) the labeled anomalies are those that were easy to label, and (c) the dataset contains only positively labeled and unlabeled samples.

Figure 2. SPADE performance compared against other supervised, semi-supervised and unsupervised methods. Details about the datasets and the methods can be found in our paper.

As shown in Figure 2, SPADE consistently outperforms alternative methods. The CoverType and Thyroid datasets have Creative Commons Attribution 4.0 International (CC BY 4.0) licenses and are present in the SPADE repository.

How to use SPADE

We have just open-sourced SPADE. The repository contains scripts that build and push a Docker container, then run it as a Vertex Custom Job on Google Cloud Platform. The dataset is read from BigQuery. Metrics such as AUC, Precision and Recall can currently be tracked in the job logs. The job launch script is configured with a default set of hyperparameters, as described in the documentation; users may need to adjust them to obtain optimal performance. The final trained anomaly detection model artifact is written to Google Cloud Storage (GCS). This artifact can be deployed as a Vertex Endpoint to serve predictions (not demonstrated in this repository).

Ways to Help

By open-sourcing SPADE, we hope to foster wider use of this anomaly detection method in the community, and to invite contributions that improve it. The SPADE model and code are freely available on GitHub under the Apache-2.0 license. SPADE is currently set up to run in a Docker container as a Vertex Custom Job on Google Cloud Platform. It can also be installed from PyPI using pip install spade-anomaly-detection. Users can upload their dataset to BigQuery and run the training job on Vertex, or on a local machine from the PyPI installation.

More detailed usage instructions are available in the documentation.

By Raj Sinha and Jinsung Yoon, Cloud AI Research Team

Source: Google Open Source Blog

7 new Android features to elevate your everyday

Android is announcing new features rolling out now and in the coming weeks.

Source: Android

Beta Channel Update for ChromeOS/ChromeOS Flex

The Beta channel is being updated to OS version: 15886.16.0, Browser version: 126.0.6478.24 for most ChromeOS devices.

If you find new issues, please let us know in one of the following ways:

File a bug
Visit our ChromeOS communities

General: Chromebook Help Community
Beta Specific: ChromeOS Beta Help Community

Report an issue or send feedback on Chrome
Interested in switching channels? Find out how.

Cole Brown,

Google ChromeOS

Source: Google Chrome Releases

News Showcase is launching in Cyprus

Google News Showcase is rolling out in Cyprus. Here’s how we are partnering with publishers.

Source: The Official Google Blog

Supporting the UK General Election in 2024

We are supporting the UK General Election by surfacing high-quality information, safeguarding our platforms and equipping campaigns with best-in-class security tools and…

Source: Google in Europe

Search Central Live 2024 is coming back to the APAC region

Search Central Live is coming back to the Asia Pacific region, bringing you insights from Google Search, fun networking opportunities, and more! This year we're aiming to visit Indonesia, Malaysia, Taiwan, and Thailand.

Source: Google Search Central Blog

AI Edge Torch Generative API for Custom LLMs on Device

AI Edge Torch Generative API enables developers to bring powerful new capabilities on-device, such as summarization, content generation, and more.

Source: Google Developers Blog

Chrome Beta for Desktop Update

The Beta channel has been updated to 126.0.6478.26 for Windows, Mac and Linux.

A partial list of changes is available in the Git log. Interested in switching release channels? Find out how. If you find a new issue, please let us know by filing a bug. The community help forum is also a great place to reach out for help or learn about common issues.

Srinivas Sista
Google Chrome

Source: Google Chrome Releases

Gemini for Workspace usage reports are now available in Admin console

What’s changing

Starting today, we’re introducing Gemini for Workspace usage reports in the Admin console. This report gives admins an overarching view of how Gemini is being used in their organization, specifically:

Assigned Gemini licenses,
Active Gemini users,
And the number of users who are using Gemini over time.

Gemini usage reports in the Admin console

These reports will help admins understand how many users are using Gemini features and make informed decisions about expanding Gemini further within their organizations. We plan to introduce more reporting features over time, such as the ability to filter these reports by Organizational Units and Groups.

Additional details

Admins can access these reports in the Admin console under Menu > Generative AI > Gemini reports. Visit the Help Center to learn more about reviewing Gemini usage in your organization.

Getting started

Admins: Visit the Help Center to learn more about reviewing Gemini usage in your organization.
End users: There is no end user impact or action required.

Rollout pace

Rapid Release and Scheduled Release domains: Full rollout (1–4 days for feature visibility) starting on May 27, 2024

Availability

Available for Google Workspace customers with the Gemini Business and Gemini Enterprise add-ons.

We plan to introduce Gemini reports for the Gemini Education and Gemini Premium add-ons in the coming weeks. Stay tuned to the Workspace Updates blog for more information.

Resources

Source: Google Workspace Updates

Chrome Beta for Android Update

Hi everyone! We've just released Chrome Beta 126 (126.0.6478.26) for Android. It's now available on Google Play.

You can see a partial list of the changes in the Git log. For details on new features, check out the Chromium blog, and for details on web platform updates, check here.

If you find a new issue, please let us know by filing a bug.

Krishna Govind
Google Chrome

Source: Google Chrome Releases