Anomaly detection with few labeled samples under distribution mismatch


SPADE: Semi-Supervised Anomaly Detection under Distribution Mismatch


What is SPADE?

Recently, we have open-sourced SPADE (Semi-supervised Pseudo-labeler Anomaly Detection with Ensembling), a semi-supervised framework for anomaly detection that overcomes some of the drawbacks of alternative anomaly detection methods.


What Problem does SPADE Solve?

Anomaly detection is the process of identifying samples in a dataset that diverge from some expected pattern. This process has wide applications in several industries such as API security, financial fraud and manufacturing defect detection. SPADE is especially designed for semi-supervised settings where we have a handful of labeled data and a large number of unlabeled data.


When is SPADE better for your Use Case?

Creating a large labeled set of anomalous and non-anomalous samples for supervised learning can be time-consuming, expensive and error-prone. So unsupervised and semi-supervised methods have become an active area of research.

Most of these semi-supervised methods make the assumption that the labeled and unlabeled data come from the same distribution, that is, they are generated by the same underlying process—physical, financial, manufacturing or other process. This assumption is often violated in different ways—the labeled data could contain one type of anomaly while the unlabeled data contains other types of anomalies; or the labeled data could only contain samples that were easy to label. In these and potentially other cases, SPADE has been shown to have better performance than alternatives.


How does it Work?

SPADE constructs an ensemble of One-Class Classifiers (OCCs); each OCC is a Gaussian Mixture Model trained in a self-supervised manner on a disjoint subset of the unlabeled samples and non-anomalous samples.

moving image of the process of SPADE training an ensemble of OCC, providing pseudo-labels, and then using both labeled and pseudo-labeled sampled to train a supervised model for anomaly detection
Figure 1. SPADE first trains an ensemble of OCC to provide pseudo-labels to the unlabeled samples. Then, both labeled and pseudo-labeled samples are used to train a supervised model for the anomaly detection.

The ensemble is used to obtain pseudo-labels for the unlabeled data. A pseudo-label of is-anomalous or not-anomalous is assigned only if all the members of the ensemble agree. The pseudo-labels and any original labels are used together to train a supervised anomaly detector model. In the version of SPADE that we are open-sourcing, this model is a Tensorflow Random Forest that is trained with a binary cross-entropy loss. Once trained on the labels and pseudo-labels, the detector model can be used for online or batch prediction.


Example Use Cases

The above described benefits of SPADE are highlighted in our experiments as detailed in the published paper (in TMLR with feature certification). Here we present some results on a selection of datasets that demonstrate SPADE performance when (a) there are new types of anomalies in the unlabeled dataset, (b) when the labeled anomalies are easy to label, and (c) when the dataset contains only positively labeled and unlabeled samples.

Graph showing SPADE performance compared against other supervised, semi-supervised and unsupervised methods.
Figure 2. SPADE performance compared against other supervised, semi-supervised and unsupervised methods. Details about the datasets and the methods can be found in our paper.

As shown in Figure 2, SPADE consistently outperforms alternative methods. The CoverType and Thyroid datasets have Creative Commons Attribution 4.0 International (CC BY 4.0) licenses and are present in the SPADE repository.


How to use SPADE

We have just open-sourced SPADE. The repository contains scripts that build a Docker container and push the container, then run the container as a Vertex Custom Job on Google Cloud Platform. The dataset is read from BigQuery. Metrics such as AUC, Precision and Recall can currently be tracked in the job logs. The job launch script is configured with a default set of hyperparameters as described in the documentation. Users may need to adjust the hyperparameters to obtain optimal performance. The final trained anomaly detection model artifact is written to Google Cloud Storage (GCS). This artifact can be deployed as a Vertex Endpoint to serve predictions (not demonstrated in this repository).


Ways to Help

By open sourcing SPADE, we hope to foster more usage of this innovative anomaly detection method in the community, as well as invite contributions to improve the method. The SPADE model and code is freely available on Github under the Apache-2.0 license. SPADE is currently set up to run in a Docker container as a Vertex Custom Job on Google Cloud Platform. It can also be run by installing from PyPi using pip install spade-anomaly-detection. Users can upload their dataset to BigQuery, and run the training job on Vertex, or on a local machine from the PyPi installation.

More detailed usage instructions are available in the documentation.

By Raj Sinha and Jinsung Yoon, Cloud AI Research Team

Beta Channel Update for ChromeOS/ChromeOS Flex

 The Beta channel is being updated to OS version: 15886.16.0, Browser version: 126.0.6478.24 for most ChromeOS devices.

If you find new issues, please let us know one of the following ways:

  1. File a bug
  2. Visit our ChromeOS communities
    1. General: Chromebook Help Community
    2. Beta Specific: ChromeOS Beta Help Community
  3. Report an issue or send feedback on Chrome
  4. Interested in switching channels? Find out how.

Cole Brown,

Google ChromeOS

Chrome Beta for Desktop Update

The Beta channel has been updated to 126.0.6478.26 for Windows, Mac and Linux.

A partial list of changes is available in the Git log. Interested in switching release channels? Find out how. If you find a new issue, please let us know by filing a bug. The community help forum is also a great place to reach out for help or learn about common issues.

Srinivas Sista
Google Chrome

Gemini for Workspace usage reports are now available in Admin console

What’s changing 

Starting today, we’re introducing Gemini for Workspace usage reports in the Admin console. This report gives admins an overarching view of how Gemini is being used in their organization, specifically: 
  • Assigned Gemini licenses, 
  • Active Gemini users, 
  • And the number of users who are using Gemini over time.


Gemini usage reports in the Admin console


These reports will help admins understand how many users are using Gemini features and make informed decisions about expanding Gemini further within their organizations. We plan to introduce more reporting features over time, such as the ability to filter these reports by Organizational Units and Groups.


Additional details

Admins can access these reports via admin  console under Menu > Generative AI > Gemini reports. Visit the Help Center to learn more about reviewing Gemini usage in your organization.


Getting started

Rollout pace


Availability

  • Available for Google Workspace customers with the Gemini Business and Gemini Enterprise add-ons.
We plan to introduce Gemini reports for the Gemini Education and Gemini Premium add-ons in the coming weeks. Stay tuned to the Workspace Updates blog for more information. 

Chrome Beta for Android Update

Hi everyone! We've just released Chrome Beta 126 (126.0.6478.26) for Android. It's now available on Google Play.

You can see a partial list of the changes in the Git log. For details on new features, check out the Chromium blog, and for details on web platform updates, check here.

If you find a new issue, please let us know by filing a bug.

Krishna Govind
Google Chrome