Tag Archives: Supervised Learning

Extending Contrastive Learning to the Supervised Setting

In recent years, self-supervised representation learning, which is used in a variety of image and video tasks, has significantly advanced due to the application of contrastive learning. These contrastive learning approaches typically teach a model to pull together the representations of a target image (a.k.a., the “anchor”) and a matching (“positive”) image in embedding space, while also pushing apart the anchor from many non-matching (“negative”) images. Because labels are assumed to be unavailable in self-supervised learning, the positive is often an augmentation of the anchor, and the negatives are chosen to be the other samples from the training minibatch. However, because of this random sampling, false negatives, i.e., negatives generated from samples of the same class as the anchor, can cause a degradation in the representation quality. Furthermore, determining the optimal method to generate positives is still an area of active research.

In contrast to the self-supervised approach, a fully-supervised approach could use labeled data to generate positives from existing same-class examples, providing more variability in pretraining than could typically be achieved by simply augmenting the anchor. However, very little work has been done to successfully apply contrastive learning in the fully-supervised domain.

In “Supervised Contrastive Learning”, presented at NeurIPS 2020, we propose a novel loss function, called SupCon, that bridges the gap between self-supervised learning and fully supervised learning and enables contrastive learning to be applied in the supervised setting. Leveraging labeled data, SupCon encourages normalized embeddings from the same class to be pulled closer together, while embeddings from different classes are pushed apart. This simplifies the process of positive selection, while avoiding potential false negatives. Because it accommodates multiple positives per anchor, this approach results in an improved selection of positive examples that are more varied, while still containing semantically relevant information. SupCon also allows label information to play an active role in representation learning rather than restricting it to be used only in downstream training, as is the case for conventional contrastive learning. To the best of our knowledge, this is the first contrastive loss to consistently perform better on large-scale image classification problems than the common approach of using cross-entropy loss to train the model directly. Importantly, SupCon is straightforward to implement and stable to train, provides consistent improvement to top-1 accuracy for a number of datasets and architectures (including Transformer architectures), and is robust to image corruptions and hyperparameter variations.

Self-supervised (left) vs supervised (right) contrastive losses: The self-supervised contrastive loss contrasts a single positive for each anchor (i.e., an augmented version of the same image) against a set of negatives consisting of the entire remainder of the minibatch. The supervised contrastive loss considered in this paper, however, contrasts the set of all samples from the same class as positives against the negatives from the remainder of the batch.

The Supervised Contrastive Learning Framework
SupCon can be seen as a generalization of both the SimCLR and N-pair losses — the former uses positives generated from the same sample as that of the anchor, and the latter uses positives generated from different samples by exploiting known class labels. The use of many positives and many negatives for each anchor allows SupCon to achieve state-of-the-art performance without the need for hard negative mining (i.e., searching for negatives similar to the anchor), which can be difficult to tune properly.

SupCon subsumes multiple losses from the literature and is a generalization of the SimCLR and N-Pair losses.

This method is structurally similar to those used in self-supervised contrastive learning, with modifications for supervised classification. Given an input batch of data, we first apply data augmentation twice to obtain two copies, or “views,” of each sample in the batch (though one could create and use any number of augmented views). Both copies are forward propagated through an encoder network, and the resulting embedding is then L2-normalized. Following standard practice, the representation is further propagated through an optional projection network to help identify meaningful features. The supervised contrastive loss is computed on the normalized outputs of the projection network. Positives for an anchor consist of the representations originating from the same batch instance as the anchor or from other instances with the same label as the anchor; the negatives are then all remaining instances. To measure performance on downstream tasks, we train a linear classifier on top of the frozen representations.

Cross-entropy, self-supervised contrastive loss and supervised contrastive loss Left: The cross-entropy loss uses labels and a softmax loss to train a classifier. Middle: The self-supervised contrastive loss uses a contrastive loss and data augmentations to learn representations. Right: The supervised contrastive loss also learns representations using a contrastive loss, but uses label information to sample positives in addition to augmentations of the same image.

Key Findings
SupCon consistently boosts top-1 accuracy compared to cross-entropy, margin classifiers (with use of labels), and self-supervised contrastive learning techniques on CIFAR-10 and CIFAR-100 and ImageNet datasets. With SupCon, we achieve excellent top-1 accuracy on the ImageNet dataset with the ResNet-50 and ResNet-200 architectures. On ResNet-200, we achieve a top-1 accuracy of 81.4%, which is a 0.8% improvement over the state-of-the-art cross-entropy loss using the same architecture (which represents a significant advance for ImageNet). We also compared cross-entropy and SupCon on a Transformer-based ViT-B/16 model and found a consistent improvement over cross-entropy (77.8% versus 76% for ImageNet; 92.6% versus 91.6% for CIFAR-10) under the same data augmentation regime (without any higher-resolution fine-tuning).

The SupCon loss consistently outperforms cross-entropy with standard data augmentation strategies (AutoAugment, RandAugment and CutMix). We show top-1 accuracy for ImageNet, on ResNet-50, ResNet-101 and ResNet200.

We also demonstrate analytically that the gradient of our loss function encourages learning from hard positives and hard negatives. The gradient contributions from hard positives/negatives are large while those for easy positives/negatives are small. This implicit property allows the contrastive loss to sidestep the need for explicit hard mining, which is a delicate but critical part of many losses, such as triplet loss. See the supplementary material of our paper for a full derivation.

SupCon is also more robust to natural corruptions, such as noise, blur and JPEG compression. The mean Corruption Error (mCE) measures the average degradation in performance compared to the benchmark ImageNet-C dataset. The SupCon models have lower mCE values across different corruptions compared to cross-entropy models, showing increased robustness.

We show empirically that the SupCon loss is less sensitive than cross-entropy to a range of hyperparameters. Across changes in augmentations, optimizers, and learning rates, we observe significantly lower variance in the output of the contrastive loss. Moreover, applying different batch sizes while holding all other hyperparameters constant results in consistently better top-1 accuracy of SupCon to that of cross-entropy at each batch size.

Accuracy of cross-entropy and supervised contrastive loss as a function of hyperparameters and training data size, measured on ImageNet with a ResNet-50 encoder. Left: Boxplot showing Top-1 accuracy vs changes in augmentation, optimizer and learning rates. SupCon yields more consistent results across variations in each, which is useful when the best strategies are unknown a priori. Right: Top-1 accuracy as a function of batch size shows both losses benefit from larger batch sizes while SupCon has higher Top-1 accuracy, even when trained with small batch sizes.
Accuracy of supervised contrastive loss as a function of training duration and the temperature hyperparameter, measured on ImageNet with a ResNet-50 encoder. Left: Top-1 accuracy as a function of SupCon pre-training epochs. Right: Top-1 accuracy as a function of temperature during the pre-training stage for SupCon. Temperature is an important hyperparameter in contrastive learning and reducing sensitivity to temperature is desirable.

Broader Impact and Next Steps
This work provides a technical advancement in the field of supervised classification. Supervised contrastive learning can improve both the accuracy and robustness of classifiers with minimal complexity. The classic cross-entropy loss can be seen as a special case of SupCon where the views correspond to the images and the learned embeddings in the final linear layer corresponding to the labels. We note that SupCon benefits from large batch sizes, and being able to train the models on smaller batches is an important topic for future research.

Our Github repository includes Tensorflow code to train the models in the paper. Our pre-trained models are also released on TF-Hub.

The NeurIPS paper was jointly co-authored with Prannay Khosla, Piotr Teterwak, Chen Wang, Aaron Sarna, Yonglong Tian, Phillip Isola, Aaron Maschinot, Ce Liu, and Dilip Krishnan. Special thanks to Jenny Huang for leading the writing process for this blogpost.

Source: Google AI Blog

Learning to Smell: Using Deep Learning to Predict the Olfactory Properties of Molecules

Smell is a sense shared by an incredible range of living organisms, and plays a critical role in how they analyze and react to the world. For humans, our sense of smell is tied to our ability to enjoy food and can also trigger vivid memories. Smell allows us to appreciate all of the fragrances that abound in our everyday lives, be they the proverbial roses, a batch of freshly baked cookies, or a favorite perfume. Yet despite its importance, smell has not received the same level of attention from machine learning researchers as have vision and hearing.

Odor perception in humans is the result of the activation of 400 different types of olfactory receptors (ORs), expressed in 1 million olfactory sensory neurons (OSNs), in a small patch of tissue called the olfactory epithelium. These OSNs send signals to the olfactory bulb, and then to further structures in the brain. Based on analogous advances in deep learning for sight and sound, it should be possible to directly predict the end sensory result of an input molecule, even without knowing the intricate details of all the systems involved. Solving the odor prediction problem would aid in discovering new synthetic odorants, thereby reducing the ecological impact of harvesting natural products. Inspection of the resulting olfactory models may even lead to new insights into the biology of smell.

Small odorant molecules are the most basic building blocks of flavors and fragrances, and therefore represent the simplest version of the odor prediction problem. Yet each molecule can have multiple odor descriptors. Vanillin, for example, has descriptors such as sweet, vanilla, creamy, and chocolate, with some notes being more apparent than others. So odor prediction is also a multi-label classification problem.

In “Machine Learning for Scent: Learning Generalizable Perceptual Representations of Small Molecules”, we leverage graph neural networks (GNNs), a kind of deep neural network designed to operate on graphs as input, to directly predict the odor descriptors for individual molecules, without using any handcrafted rules. We demonstrate that this approach yields significantly improved performance in odor prediction compared to current state-of-the-art and is a promising direction for future research.

Graph Neural Networks for Odor Prediction
Since molecules are analogous to graphs, with atoms forming the vertices and bonds forming the edges, GNNs are the natural model of choice for their understanding. But how does one translate the structure of a molecule into a graph representation? Initially, every node in the graph is represented as a vector, using any preferred featurization — atom identity, atom charge, etc. Then, in a series of message passing steps, every node broadcasts its current vector value to each of its neighbors. An update function then takes the collection of vectors sent to it, and generates an updated vector value. This process can be repeated many times, until finally all of the nodes in the graph are summarized into a single vector via summing or averaging. That single vector, representing the entire molecule, can then be passed into a fully connected network as a learned molecular featurization. This network outputs a prediction for odor descriptors, as provided by perfume experts.
Each node is represented as a vector, and each entry in the vector initially encodes some atomic-level information.
For each node we look at adjacent nodes and collect their information, which is then transformed with a neural network into new information for the centered node. This procedure is performed iteratively. Other variants of GNNs utilize edge and graph-level information.
Illustration of a GNN for odor prediction. We translate the structure of molecules into graphs that are fed into GNN layers to learn a better representation of the nodes. These nodes are reduced into a single vector and passed into a neural network that is used to predict multiple odor descriptors.
This representation doesn’t know anything about spatial positions of atoms, and so it can’t distinguish stereoisomers, molecules made of the same atoms but in slightly different configurations that can smell different, such as (R)- and (S)-carvone. Nevertheless, we have found that even without distinguishing stereoisomers, in practice it is still possible to predict odor quite well.

For odor prediction, GNNs consistently demonstrate improved performance compared to previous state-of-the-art methods, such as random forests, which do not directly encode graph structure. The magnitude of the improvement depends on which odor one tries to predict.
Example of the performance of a GNN on odor descriptors against a strong baseline, as measured by the AUROC score. Example odor descriptors are picked randomly. Closer to 1.0 means better. In the majority of cases GNNs outperform the field-standard baseline substantially, with similar performance seen against other metrics (e.g., AUPRC, recall, precision).
Learning from the Model, and Extending It to Other Tasks
In addition to predicting odor descriptors, GNNs can be applied to other olfaction tasks. For example, take the case of classifying new or refined odor descriptors using only limited data. For each molecule, we extract a learned representation from an intermediate layer of the model that is optimized for our odor descriptors, which we call an “odor embedding”. One can think of this as an olfaction version of a color space, like RGB or CMYK. To see if this odor embedding is useful for predicting related but different tasks, we designed experiments that test our learned embedding on related tasks for which it was not originally designed. We then compared the performance of our odor embedding representation to a common chemoinformatic representation that encodes structural information of a molecule, but is agnostic to odor and found that the odor embedding generalized to several challenging new tasks, even matching state-of-the-art on some.
2D snapshot of our embedding space with some example odors highlighted. Left: Each odor is clustered in its own space. Right: The hierarchical nature of the odor descriptor. Shaded and contoured areas are computed with a kernel-density estimate of the embeddings.
Future Work
Within the realm of machine learning, smell remains the most elusive of the senses, and we’re excited to continue doing a small part to shed light on it through further fundamental research. The possibilities for future research are numerous, and touch on everything from designing new olfactory molecules that are cheaper and more sustainably produced, to digitizing scent, or even one day giving those without a sense of smell access to roses (and, unfortunately, also rotten eggs). We hope to also bring this problem to the attention of more of the machine learning world through the eventual creation and sharing of high-quality, open datasets.

This early research is the result of the work and advisement of a team of talented researchers and engineers in Google Brain — Benjamin Sanchez-Lengeling, Jennifer Wei, Brian Lee, Emily Reif, Carey Radebaugh, Max Bileschi, Yoni Halpern, and D. Sculley. We are delighted to have collaborated on this work with Richard Gerkin at ASU and Alán Aspuru-Guzik at the University of Toronto. We are of course building on an enormous amount of prior work, and have benefitted particularly from work by Justin Gilmer, George Dahl and others on fundamental methodology in GNNs, among many other works in neuroscience, statistics and chemistry. We are also grateful to helpful comments from Steven Kearnes, David Belanger, Joel Mainland, and Emily Mayhew.

Source: Google AI Blog

Accurate Online Speaker Diarization with Supervised Learning

Speaker diarization, the process of partitioning an audio stream with multiple people into homogeneous segments associated with each individual, is an important part of speech recognition systems. By solving the problem of “who spoke when”, speaker diarization has applications in many important scenarios, such as understanding medical conversations, video captioning and more. However, training these systems with supervised learning methods is challenging — unlike standard supervised classification tasks, a robust diarization model requires the ability to associate new individuals with distinct speech segments that weren't involved in training. Importantly, this limits the quality of both online and offline diarization systems. Online systems usually suffer more, since they require diarization results in real time.
Online speaker diarization on streaming audio input. Different colors in the bottom axis indicate different speakers.
In “Fully Supervised Speaker Diarization”, we describe a new model that seeks to make use of supervised speaker labels in a more effective manner. Here “fully” implies that all components in the speaker diarization system, including the estimation of the number of speakers, are trained in supervised ways, so that they can benefit from increasing the amount of labeled data available. On the NIST SRE 2000 CALLHOME benchmark, our diarization error rate (DER) is as low as 7.6%, compared to 8.8% DER from our previous clustering-based method, and 9.9% from deep neural network embedding methods. Moreover, our method achieves this lower error rate based on online decoding, making it specifically suitable for real-time applications. As such we are open sourcing the core algorithms in our paper to accelerate more research along this direction.

Clustering versus Interleaved-state RNN
Modern speaker diarization systems are usually based on clustering algorithms such as k-means or spectral clustering. Since these clustering methods are unsupervised, they could not make good use of the supervised speaker labels available in data. Moreover, online clustering algorithms usually have worse quality in real-time diarization applications with streaming audio inputs. The key difference between our model and common clustering algorithms is that in our method, all speakers’ embeddings are modeled by a parameter-sharing recurrent neural network (RNN), and we distinguish different speakers using different RNN states, interleaved in the time domain.

To understand how this works, consider the example below in which there are four possible speakers: blue, yellow, pink and green (this is arbitrary, and in fact there may be more — our model uses the Chinese restaurant process to accommodate the unknown number of speakers). Each speaker starts with its own RNN instance (with a common initial state shared among all speakers) and keeps updating the RNN state given the new embeddings from this speaker. In the example below, the blue speaker keeps updating its RNN state until a different speaker, yellow, comes in. If blue speaks again later, it resumes updating its RNN state. (This is just one of the possibilities for speech segment y7 in the figure below. If new speaker green enters, it will start with a new RNN instance.)
The generative process of our model. Colors indicate labels for speaker segments.
Representing speakers as RNN states enables us to learn the high-level knowledge shared across different speakers and utterances using RNN parameters, and this promises the usefulness of more labeled data. In contrast, common clustering algorithms almost always work with each single utterance independently, making it difficult to benefit from a large amount of labeled data.

The upshot of all this is that given time-stamped speaker labels (i.e. we know who spoke when), we can train the model with standard stochastic gradient descent algorithms. A trained model can be used for speaker diarization on new utterances from unheard speakers. Furthermore, the use of online decoding makes it more suitable for latency-sensitive applications.

Future Work
Although we've already achieved impressive diarization performance with this system, there are still many exciting directions we are currently exploring. First, we are refining our model so it can easily integrate contextual information to perform offline decoding. This will likely further reduce the DER, which is more useful for latency-insensitive applications. Second, we would like to model acoustic features directly instead of using d-vectors. In this way, the entire speaker diarization system can be trained in an end-to-end way.

To learn more about this work, please see our paper. To download the core algorithm of this system, please visit the Github page.

This work was done as a close collaboration between Google AI and Speech & Assistant teams. Contributors include Aonan Zhang (intern), Quan Wang, Zhengyao Zhu and Chong Wang.

Source: Google AI Blog

Neural Network-Generated Illustrations in Allo

Taking, sharing, and viewing selfies has become a daily habit for many — the car selfie, the cute-outfit selfie, the travel selfie, the I-woke-up-like-this selfie. Apart from a social capacity, self-portraiture has long served as a means for self and identity exploration. For some, it’s about figuring out who they are. For others it’s about projecting how they want to be perceived. Sometimes it’s both.

Photography in the form of a selfie is a very direct form of expression. It comes with a set of rules bounded by reality. Illustration, on the other hand, empowers people to define themselves - it’s warmer and less fraught than reality.
Today, Google is introducing a feature in Allo that uses a combination of neural networks and the work of artists to turn your selfie into a personalized sticker pack. Simply snap a selfie, and it’ll return an automatically generated illustrated version of you, on the fly, with customization options to help you personalize the stickers even further.
What makes you, you?
The traditional computer vision approach to mapping selfies to art would be to analyze the pixels of an image and algorithmically determine attribute values by looking at pixel values to measure color, shape, or texture. However, people today take selfies in all types of lighting conditions and poses. And while people can easily pick out and recognize qualitative features, like eye color, regardless of the lighting condition, this is a very complex task for computers. When people look at eye color, they don’t just interpret the pixel values of blue or green, but take into account the surrounding visual context.

In order to account for this, we explored how we could enable an algorithm to pick out qualitative features in a manner similar to the way people do, rather than the traditional approach of hand coding how to interpret every permutation of lighting condition, eye color, etc. While we could have trained a large convolutional neural network from scratch to attempt to accomplish this, we wondered if there was a more efficient way to get results, since we expected that learning to interpret a face into an illustration would be a very iterative process.

That led us to run some experiments, similar to DeepDream, on some of Google's existing more general-purpose computer vision neural networks. We discovered that a few neurons among the millions in these networks were good at focusing on things they weren’t explicitly trained to look at that seemed useful for creating personalized stickers. Additionally, by virtue of being large general-purpose neural networks they had already figured out how to abstract away things they didn’t need. All that was left to do was to provide a much smaller number of human labeled examples to teach the classifiers to isolate out the qualities that the neural network already knew about the image.

To create an illustration of you that captures the qualities that would make it recognizable to your friends, we worked alongside an artistic team to create illustrations that represented a wide variety of features. Artists initially designed a set of hairstyles, for example, that they thought would be representative, and with the help of human raters we used these hairstyles to train the network to match the right illustration to the right selfie. We then asked human raters to judge the sticker output against the input image to see how well it did. In some instances, they determined that some styles were not well represented, so the artists created more that the neural network could learn to identify as well.
Raters were asked to classify hairstyles that the icon on the left resembled closest. Then, once consensus was reached, resident artist Lamar Abrams drew a representation of what they had in common.
Avoiding the uncanny valley
In the study of aesthetics, a well-known problem is the uncanny valley - the hypothesis that human replicas which appear almost, but not exactly, like real human beings can feel repulsive. In machine learning, this could be compounded if were confronted by a computer’s perception of you, versus how you may think of yourself, which can be at odds.

Rather than aim to replicate a person’s appearance exactly, pursuing a lower resolution model, like emojis and stickers, allows the team to explore expressive representation by returning an image that is less about reproducing reality and more about breaking the rules of representation.
The team worked with artist Lamar Abrams to design the features that make up more than 563 quadrillion combinations.
Translating pixels to artistic illustrations
Reconciling how the computer perceives you with how you perceive yourself and what you want to project is truly an artistic exercise. This makes a customization feature that includes different hairstyles, skin tones, and nose shapes, essential. After all, illustration by its very nature can be subjective. Aesthetics are defined by race, culture, and class which can lead to creating zones of exclusion without consciously trying. As such, we strove to create a space for a range of race, age, masculinity, femininity, and/or androgyny. Our teams continue to evaluate the research results to help prevent against incorporating biases while training the system.
Creating a broad palette for identity and sentiment
There is no such thing as a ‘universal aesthetic’ or ‘a singular you’. The way people talk to their parents is different than how they talk to their friends which is different than how they talk to their colleagues. It’s not enough to make an avatar that is a literal representation of yourself when there are many versions of you. To address that, the Allo team is working with a range of artistic voices to help others extend their own voice. This first style that launched today speaks to your sarcastic side but the next pack might be more cute for those sincere moments. Then after that, maybe they’ll turn you into a dog. If emojis broadened the world of communication it’s not hard to imagine how this technology and language evolves. What will be most exciting is listening to what people say with it.

This feature is starting to roll out in Allo today for Android, and will come soon to Allo on iOS.

This work was made possible through a collaboration of the Allo Team and Machine Perception researchers at Google. We additionally thank Lamar Abrams, Koji Ashida, Forrester Cole, Jennifer Daniel, Shiraz Fuman, Dilip Krishnan, Inbar Mosseri, Aaron Sarna, and Bhavik Singh.

Equality of Opportunity in Machine Learning

As machine learning technology progresses rapidly, there is much interest in understanding its societal impact. A particularly successful branch of machine learning is supervised learning. With enough past data and computational resources, learning algorithms often produce surprisingly effective predictors of future events. To take one hypothetical example: an algorithm could, for example, be used to predict with high accuracy who will pay back their loan. Lenders might then use such a predictor as an aid in deciding who should receive a loan in the first place. Decisions based on machine learning can be both incredibly useful and have a profound impact on our lives.

Even the best predictors make mistakes. Although machine learning aims to minimize the chance of a mistake, how do we prevent certain groups from experiencing a disproportionate share of these mistakes? Consider the case of a group that we have relatively little data on and whose characteristics differ from those of the general population in ways that are relevant to the prediction task. As prediction accuracy is generally correlated with the amount of data available for training, it is likely that incorrect predictions will be more common in this group. A predictor might, for example, end up flagging too many individuals in this group as ‘high risk of default’ even though they pay back their loan. When group membership coincides with a sensitive attribute, such as race, gender, disability, or religion, this situation can lead to unjust or prejudicial outcomes.

Despite the need, a vetted methodology in machine learning for preventing this kind of discrimination based on sensitive attributes has been lacking. A naive approach might require a set of sensitive attributes to be removed from the data before doing anything else with it. This idea of “fairness through unawareness,” however, fails due to the existence of “redundant encodings.” Even if a particular attribute is not present in the data, combinations of other attributes can act as a proxy.

Another common approach, called demographic parity, asks that the prediction must be uncorrelated with the sensitive attribute. This might sound intuitively desirable, but the outcome itself is often correlated with the sensitive attribute. For example, the incidence of heart failure is substantially more common in men than in women. When predicting such a medical condition, it is therefore neither realistic nor desirable to prevent all correlation between the predicted outcome and group membership.

Equal Opportunity

Taking these conceptual difficulties into account, we’ve proposed a methodology for measuring and preventing discrimination based on a set of sensitive attributes. Our framework not only helps to scrutinize predictors to discover possible concerns. We also show how to adjust a given predictor so as to strike a better tradeoff between classification accuracy and non-discrimination if need be.

At the heart of our approach is the idea that individuals who qualify for a desirable outcome should have an equal chance of being correctly classified for this outcome. In our fictional loan example, it means the rate of ‘low risk’ predictions among people who actually pay back their loan should not depend on a sensitive attribute like race or gender. We call this principle equality of opportunity in supervised learning.

When implemented, our framework also improves incentives by shifting the cost of poor predictions from the individual to the decision maker, who can respond by investing in improved prediction accuracy. Perfect predictors always satisfy our notion, showing that the central goal of building more accurate predictors is well aligned with the goal of avoiding discrimination.

Learn more

To explore the ideas in this blog post on your own, our Big Picture team created a beautiful interactive visualization of the different concepts and tradeoffs. So, head on over to their page to learn more.

Once you’ve walked through the demo, please check out the full version of our paper, a joint work with Eric Price (UT Austin) and Nati Srebro (TTI Chicago). We’ll present the paper at this year’s Conference on Neural Information Processing Systems (NIPS) in Barcelona. So, if you’re around, be sure to stop by and chat with one of us.

Our paper is by no means the final word on this important and complex topic. It joins an ongoing conversation with a multidisciplinary focus of research. We hope to inspire future research that will sharpen the discussion of the different achievable tradeoffs surrounding discrimination and machine learning, as well as the development of tools that will help practitioners address these challenges.