Tag Archives: HCI

Predicting Text Readability from Scrolling Interactions

Posted by Sian Gooding, Intern, Google Research

Illiteracy affects at least 773 million people globally, both young and old. For these individuals, reading information from unfamiliar sources or on unfamiliar topics can be extremely difficult. Unfortunately, these inequalities have been further magnified by the global pandemic as a result of unequal access to education in reading and writing. In fact, UNESCO reports that over 100 million children are falling behind the minimum proficiency level in reading due to COVID-related school closures.

With increasing world-wide access to technology, reading on a device, such as a tablet or phone, has largely taken the place of traditional formats. This provides a unique opportunity to observe reading interactions, e.g., how a reader scrolls through a text, which can inform our understanding of what can make text difficult to read. This understanding is crucial when designing educational applications for low-proficiency readers and language learners, because it can be used to match learners with appropriately leveled texts as well as to support readers in understanding texts beyond their reading level.

In “Predicting Text Readability from Scrolling Interactions”, presented at CoNLL 2021, we show that data from on-device reading interactions can be used to predict how readable a text is. This novel approach provides insights into subjective readability — whether an individual reader has found a text accessible — and demonstrates that existing readability models can be improved by including feedback from scroll-based reading interactions. In order to encourage research in this area and to help enable more personalized tools for language learning and text simplification, we are releasing the dataset of reading interactions generated from our scrolling behavior–based readability assessment of English-language texts.

Understanding Text Difficulty
There are multiple aspects of a text that impact how difficult it is to read, including the vocabulary level, the syntactic structure, and overall coherence. Traditional machine learning approaches to measure readability have exclusively relied on such linguistic features. However, using these features alone does not work well for online content, because such content often contains abbreviations, emojis, broken text, and short passages, which detrimentally impact the performance of readability models.

To address this, we investigated whether aggregate data about the reading interactions of a group can be used to predict how difficult a text is, as well as how reading interactions may differ based on a readers’ understanding. When reading on a device, readers typically interact with text by scrolling in a vertical fashion, which we hypothesize can be used as a coarse proxy for reading comprehension. With this in mind, we recruited 518 paid participants and asked them to read English-language texts of different difficulty levels. We recorded the reading interactions by measuring different features of the participants’ scrolling behavior, such as the speed, acceleration and number of times areas of text were revisited. We then used this information to produce a set of features for a readability classifier.

Predicting Text Difficulty from Scrolling Behavior
We investigated which types of scrolling behaviors were most impacted by text difficulty and tested the significance using linear mixed effect models. In our set up, we have repeated measures, as multiple participants read the same texts and each participant reads more than one text. Using linear mixed-effect models gives us a higher confidence that the differences in interactions we are observing are because of the text difficulty, and not other random effects.

Our results showed that multiple reading behaviors differed significantly based on the text level, for example, the average, maximum and minimum acceleration of scrolling. We found the most significant features to be the total read time and the maximum reading speeds.

We then used these features as inputs to a machine learning algorithm. We designed and trained a support vector machine (i.e., a binary classifier) to predict whether a text is either advanced or elementary based only on scrolling behaviors as individuals interacted with it. The dataset on which the model was trained contains 60 articles, each of which were read by an average of 17 participants. From these interactions we produced aggregate features by taking the mean of the significant measures across participants.

We measured the accuracy of the approach using a metric called f-score, which measures how accurate the model is at classifying a text as either “easy” or “difficult” (where 1.0 reflects perfect classification accuracy). We are able to achieve an f-score of 0.77 on this task, using interaction features alone. This is the first work to show that it is possible to predict the readability of a text using only interaction features.

Improving Readability Models
In order to demonstrate the value of applying readability measures from scrolling behaviors to existing readability models, we integrated scroll-based features into the state-of-the-art automated readability assessment tool, which was released as part of the OneStopEnglish corpus. We found that the addition of interaction features improves the f-score of this model from 0.84 to 0.88. In addition, we were able to significantly outperform this system by using interaction information with simple vocabulary features, such as the number of words in the text, achieving an impressive f-score of 0.96.

In our study, we recorded comprehension scores to evaluate the understanding and readability of text for individuals. Participants were asked three questions per article to assess the reader’s understanding of what they had read. The interaction features of an individual’s scrolling behavior was represented as a high dimensional vector. To explore this data, we visualized the reading interaction features for each participant using t-distributed stochastic neighbor embeddings, which is a statistical method for visualizing high-dimensional data. The results revealed clusters in the comprehension score based on how well individuals understood the text. This shows that there is implicit information in reading interactions about the likelihood that an individual has understood a given text. We refer to this phenomenon as subjective readability. This information can be very useful for educational applications or for simplifying online content.

Plot showing t-SNE projection of scroll interactions in 2-dimensions. The color of each data point corresponds to the comprehension score. Clusters of comprehension scores indicate that there are correlations between reading behaviors and comprehension.

Finally, we investigated the extent to which reading interactions vary across audiences. We compared the average scrolling speed across different reader groups, covering reading proficiency and the reader’s first language. We found that the speed distribution varies depending on the proficiency and first language of the audience. This supports the case that first language and proficiency alter the reading behaviors of audiences, which allows us to contextualize the reading behavior of groups and better understand which areas of text may be harder for them to read.

Histogram showing the average speeds of scrolling (in vertical pixels per millisecond) across readers of different proficiency levels (beginner, intermediate and advanced), with lines showing the smoothed trend for each group. A higher average scroll speed indicates faster reading times. For example, a more challenging text that corresponds to slower scroll speeds by advanced readers is associated with higher scroll speeds by beginners because they engage with the text only superficially.

Histogram showing the average speeds of scrolling (in vertical pixels per millisecond) across audiences by first language of the readers, Tamil or English, with lines showing the smoothed trend for each group. A higher average scroll speed indicates faster reading times. Dark blue bars are where the histograms overlap.

Conclusion
This work is the first to show that reading interactions, such as scrolling behavior, can be used to predict the readability of text, which can yield numerous benefits. Such measures are language agnostic, unobtrusive, and robust to noisy text. Implicit user feedback allows insight into readability at an individual level, thereby allowing for a more inclusive and personalisable assessment of text difficulty. Furthermore, being able to judge the subjective readability of text benefits language learning and educational apps. We conducted a 518 participant study to investigate the impact of text readability on reading interactions and are releasing a novel dataset of the associated reading interactions. We confirm that there are statistically significant differences in the way that readers interact with advanced and elementary texts, and that the comprehension scores of individuals correlate with specific measures of scrolling interaction. For more information our conference presentation is available to view.

Acknowledgements
We thank our collaborators Yevgeni Berzak, Tony Mak and Matt Sharifi, as well as Dmitry Lagun and Blaise Aguera y Arcas for their helpful feedback on the paper.

Source: Google AI Blog

An Open Source Vibrotactile Haptics Platform for On-Body Applications.

Posted by Artem Dementyev, Hardware Engineer, Google Research

Most wearable smart devices and mobile phones have the means to communicate with the user through tactile feedback, enabling applications from simple notifications to sensory substitution for accessibility. Typically, they accomplish this using vibrotactile actuators, which are small electric vibration motors. However, designing a haptic system that is well-targeted and effective for a given task requires experimentation with the number of actuators and their locations in the device, yet most practical applications require standalone on-body devices and integration into small form factors. This combination of factors can be difficult to address outside of a laboratory as integrating these systems can be quite time-consuming and often requires a high level of expertise.

A typical lab setup on the left and the VHP board on the right.

In “VHP: Vibrotactile Haptics Platform for On-body Applications”, presented at ACM UIST 2021, we develop a low-power miniature electronics board that can drive up to 12 independent channels of haptic signals with arbitrary waveforms. The VHP electronics board can be battery-powered, and integrated into wearable devices and small gadgets. It allows all-day wear, has low latency, battery life between 3 and 25 hours, and can run 12 actuators simultaneously. We show that VHP can be used in bracelet, sleeve, and phone-case form factors. The bracelet was programmed with an audio-to-tactile interface to aid lipreading and remained functional when worn for multiple months by developers. To facilitate greater progress in the field of wearable multi-channel haptics with the necessary tools for their design, implementation, and experimentation, we are releasing the hardware design and software for the VHP system via GitHub.

Front and back sides of the VHP circuit board.

Block diagram of the system.

Platform Specifications.
VHP consists of a custom designed circuit board, where the main components are the microcontroller and haptic amplifier, which converts microcontroller’s digital output into signals that drive the actuators. The haptic actuators can be controlled by signals arriving via serial, USB, and Bluetooth Low Energy (BLE), as well as onboard microphones, using an nRF52840 microcontroller, which was chosen because it offers many input and output options and BLE, all in a small package. We added several sensors into the board to provide more experimental flexibility: an on-board digital microphone, an analog microphone amplifier, and an accelerometer. The firmware is a portable C/C++ library that works in the Arduino ecosystem.

To allow for rapid iteration during development, the interface between the board and actuators is critical. The 12 tactile signals’ wiring have to be quick to set up in order to allow for such development, while being flexible and robust to stand up to prolonged use. For the interface, we use a 24-pin FPC (flexible printed circuit) connector on the VHP. We support interfacing to the actuators in two ways: with a custom flexible circuit board and with a rigid breakout board.

VHP board (small board on the right) connected to three different types of tactile actuators via rigid breakout board (large board on the left).

Using Haptic Actuators as Sensors
In our previous blog post, we explored how back-EMF in a haptic actuator could be used for sensing and demonstrated a variety of useful applications. Instead of using back-EMF sensing in the VHP system, we measure the electrical current that drives each vibrotactile actuator and use the current load as the sensing mechanism. Unlike back-EMF sensing, this current-sensing approach allows simultaneous sensing and actuation, while minimizing the additional space needed on the board.

One challenge with the current-sensing approach is that there is a wide variety of vibrotactile actuators, each of which may behave differently and need different presets. In addition, because different actuators can be added and removed during prototyping with the adapter board, it would be useful if the VHP were able to identify the actuator automatically. This would improve the speed of prototyping and make the system more novice-friendly.

To explore this possibility, we collected current-load data from three off-the-shelf haptic actuators and trained a simple support vector machine classifier to recognize the difference in the signal pattern between actuators. The test accuracy was 100% for classifying the three actuators, indicating that each actuator has a very distinct response.

Different actuators have a different current signature during a frequency sweep, thus allowing for automatic identification.

Additionally, vibrotactile actuators require proper contact with the skin for consistent control over stimulation. Thus, the device should measure skin contact and either provide an alert or self-adjust if it is not loaded correctly. To test whether a skin contact measuring technique works in practice, we measured the current load on actuators in a bracelet as it was tightened and loosened around the wrist. As the bracelet strap is tightened, the contact pressure between the skin and the actuator increases and the current required to drive the actuator signal increases commensurately.

Current load sensing is responding to touch, while the actuator is driven at 250 Hz frequency.

Quality of the fit of the bracelet is measured.

Audio-to-Tactile Feedback
To demonstrate the utility of the VHP platform, we used it to develop an audio-to-tactile feedback device to help with lipreading. Lipreading can be difficult for many speech sounds that look similar (visemes), such as “pin” and “min”. In order to help the user differentiate visemes like these, we attach a microphone to the VHP system, which can then pick up the speech sounds and translate the audio to vibrations on the wrist. For audio-to-tactile translation, we used our previously developed algorithms for real-time audio-to-tactile conversion, available via GitHub. Briefly, audio filters are paired with neural networks to recognize certain viesemes (e.g., picking up the hard consonant “p” in “pin”), and are then translated to vibrations in different parts of the bracelet. Our approach is inspired by tactile phonemic sleeve (TAPS), however the major difference is that in our approach the tactile signal is presented continuously and in real-time.

One of the developers who employs lipreading in daily life wore the bracelet daily for several months and found it to give better information to facilitate lipreading than previous devices, allowing improved understanding of lipreading visemes with the bracelet versus lipreading alone. In the future, we plan to conduct full-scale experiments with multiple users wearing the device for an extended time.

Left: Audio-to-tactile sleeve. Middle: Audio-to-tactile bracelet. Right: One of our developers tests out the bracelets, which are worn on both arms.

Potential Applications
The VHP platform enables rapid experimentation and prototyping that can be used to develop techniques for a variety of applications. For example:

Rich haptics on small devices: Expanding the number of actuators on mobile phones, which typically only have one or two, could be useful to provide additional tactile information. This is especially useful as fingers are sensitive to vibrations. We demonstrated a prototype mobile phone case with eight vibrotactile actuators. This could be used to provide rich notifications and enhance effects in a mobile game or when watching a video.
Lab psychophysical experiments: Because VHP can be easily set up to send and receive haptic signals in real time, e.g., from a Jupyter notebook, it could be used to perform real-time haptic experiments.
Notifications and alerts: The wearable VHP could be used to provide haptic notifications from other devices, e.g., alerting if someone is at the door, and could even communicate distinguishable alerts through use of multiple actuators.
Sensory substitution: Besides the lipreading assistance example above, there are many other potential applications for accessibility using sensory substitution, such as visual-to-tactile sensing or even sensing magnetic fields.
Loading sensing: The ability to sense from the haptic actuator current load is unique to our platform, and enables a variety of features, such as pressure sensing or automatically adjusting actuator output.

Integrating eight voice coils into a phone case. We used loading sensing to understand which voice coils are being touched.

What's next?
We hope that others can utilize the platform to build a diverse set of applications. If you are interested and have ideas about using our platform or want to receive updates, please fill out this form. We hope that with this platform, we can help democratize the use of haptics and inspire a more widespread use of tactile devices.

Acknowledgments
This work was done by Artem Dementyev, Pascal Getreuer, Dimitri Kanevsky, Malcolm Slaney and Richard Lyon. We thank Alex Olwal, Thad Starner, Hong Tan, Charlotte Reed, Sarah Sterman for valuable feedback and discussion on the paper. Yuhui Zhao, Dmitrii Votintcev, Chet Gnegy, Whitney Bai and Sagar Savla for feedback on the design and engineering.

Source: Google AI Blog

Data Cascades in Machine Learning

Nithya Sambasivan, Research Scientist, Google Research

Data is a foundational aspect of machine learning (ML) that can impact performance, fairness, robustness, and scalability of ML systems. Paradoxically, while building ML models is often highly prioritized, the work related to data itself is often the least prioritized aspect. This data work can require multiple roles (such as data collectors, annotators, and ML developers) and often involves multiple teams (such as database, legal, or licensing teams) to power a data infrastructure, which adds complexity to any data-related project. As such, the field of human-computer interaction (HCI), which is focused on making technology useful and usable for people, can help both to identify potential issues and to assess the impact on models when data-related work is not prioritized.

In “‘Everyone wants to do the model work, not the data work’: Data Cascades in High-Stakes AI”, published at the 2021 ACM CHI Conference, we study and validate downstream effects from data issues that result in technical debt over time (defined as "data cascades"). Specifically, we illustrate the phenomenon of data cascades with the data practices and challenges of ML practitioners across the globe working in important ML domains, such as cancer detection, landslide detection, loan allocation and more — domains where ML systems have enabled progress, but also where there is opportunity to improve by addressing data cascades. This work is the first that we know of to formalize, measure, and discuss data cascades in ML as applied to real-world projects. We further discuss the opportunity presented by a collective re-imagining of ML data as a high priority, including rewarding ML data work and workers, recognizing the scientific empiricism in ML data research, improving the visibility of data pipelines, and improving data equity around the world.

Origins of Data Cascades
We observe that data cascades often originate early in the lifecycle of an ML system, at the stage of data definition and collection. Cascades also tend to be complex and opaque in diagnosis and manifestation, so there are often no clear indicators, tools, or metrics to detect and measure their effects. Because of this, small data-related obstacles can grow into larger and more complex challenges that affect how a model is developed and deployed. Challenges from data cascades include the need to perform costly system-level changes much later in the development process, or the decrease in users’ trust due to model mis-predictions that result from data issues. Nevertheless and encouragingly, we also observe that such data cascades can be avoided through early interventions in ML development.

Different color arrows indicate different types of data cascades, which typically originate upstream, compound over the ML development process, and manifest downstream.

Examples of Data Cascades
One of the most common causes of data cascades is when models that are trained on noise-free datasets are deployed in the often-noisy real world. For example, a common type of data cascade originates from model drifts, which occur when target and independent variables deviate, resulting in less accurate models. Drifts are more common when models closely interact with new digital environments — including high-stakes domains, such as air quality sensing, ocean sensing, and ultrasound scanning — because there are no pre-existing and/or curated datasets. Such drifts can lead to more factors that further decrease a model’s performance (e.g., related to hardware, environmental, and human knowledge). For example, to ensure good model performance, data is often collected in controlled, in-house environments. But in the live systems of new digital environments with resource constraints, it is more common for data to be collected with physical artefacts such as fingerprints, shadows, dust, improper lighting, and pen markings, which can add noise that affects model performance. In other cases, environmental factors such as rain and wind can unexpectedly move image sensors in deployment, which also trigger cascades. As one of the model developers we interviewed reported, even a small drop of oil or water can affect data that could be used to train a cancer prediction model, therefore affecting the model’s performance. Because drifts are often caused by the noise in real-world environments, they also take the longest — up to 2-3 years — to manifest, almost always in production.

Another common type of data cascade can occur when ML practitioners are tasked with managing data in domains in which they have limited expertise. For instance, certain kinds of information, such as identifying poaching locations or data collected during underwater exploration, rely on expertise in the biological sciences, social sciences, and community context. However, some developers in our study described having to take a range of data-related actions that surpassed their domain expertise — e.g., discarding data, correcting values, merging data, or restarting data collection — leading to data cascades that limited model performance. The practice of relying on technical expertise more than domain expertise (e.g., by engaging with domain experts) is what appeared to set off these cascades.

Two other cascades observed in this paper resulted from conflicting incentives and organizational practices between data collectors, ML developers, and other partners — for example, one cascade was caused by poor dataset documentation. While work related to data requires careful coordination across multiple teams, this is especially challenging when stakeholders are not aligned on priorities or workflows.

How to Address Data Cascades
Addressing data cascades requires a multi-part, systemic approach in ML research and practice:

Develop and communicate the concept of goodness of the data that an ML system starts with, similar to how we think about goodness of fit with models. This includes developing standardized metrics and frequently using those metrics to measure data aspects like phenomenological fidelity (how accurately and comprehensively does the data represent the phenomena) and validity (how well the data explains things related to the phenomena captured by the data), similar to how we have developed good metrics to measure model performance, like F1-scores.
Innovate on incentives to recognize work on data, such as welcoming empiricism on data in conference tracks, rewarding dataset maintenance, or rewarding employees for their work on data (collection, labelling, cleaning, or maintenance) in organizations.
Data work often requires coordination across multiple roles and multiple teams, but this is quite limited currently (partly, but not wholly, because of the previously stated factors). Our research points to the value of fostering greater collaboration, transparency, and fairer distribution of benefits between data collectors, domain experts, and ML developers, especially with ML systems that rely on collecting or labelling niche datasets.
Finally, our research across multiple countries indicates that data scarcity is pronounced in lower-income countries, where ML developers face the additional problem of defining and hand-curating new datasets, which makes it difficult to even start developing ML systems. It is important to enable open dataset banks, create data policies, and foster ML literacy of policy makers and civil society to address the current data inequalities globally.

Conclusion
In this work we both provide empirical evidence and formalize the concept of data cascades in ML systems. We hope to create an awareness of the potential value that could come from incentivising data excellence. We also hope to introduce an under-explored but significant new research agenda for HCI. Our research on data cascades has led to evidence-backed, state-of-the-art guidelines for data collection and evaluation in the revised PAIR Guidebook, aimed at ML developers and designers.

Acknowledgements
This paper was written in collaboration with Shivani Kapania, Hannah Highfill, Diana Akrong, Praveen Paritosh and Lora Aroyo. We thank our study participants, and Sures Kumar Thoddu Srinivasan, Jose M. Faleiro, Kristen Olson, Biswajeet Malik, Siddhant Agarwal, Manish Gupta, Aneidi Udo-Obong, Divy Thakkar, Di Dang, and Solomon Awosupin.

Source: Google AI Blog

Haptics with Input: Using Linear Resonant Actuators for Sensing

Posted by Artem Dementyev, Hardware Engineer, Google Research

As wearables and handheld devices decrease in size, haptics become an increasingly vital channel for feedback, be it through silent alerts or a subtle "click" sensation when pressing buttons on a touch screen. Haptic feedback, ubiquitous in nearly all wearable devices and mobile phones, is commonly enabled by a linear resonant actuator (LRA), a small linear motor that leverages resonance to provide a strong haptic signal in a small package. However, the touch and pressure sensing needed to activate the haptic feedback tend to depend on additional, separate hardware which increases the price, size and complexity of the system.

In “Haptics with Input: Back-EMF in Linear Resonant Actuators to Enable Touch, Pressure and Environmental Awareness”, presented at ACM UIST 2020, we demonstrate that widely available LRAs can sense a wide range of external information, such as touch, tap and pressure, in addition to being able to relay information about contact with the skin, objects and surfaces. We achieve this with off-the-shelf LRAs by multiplexing the actuation with short pulses of custom waveforms that are designed to enable sensing using the back-EMF voltage. We demonstrate the potential of this approach to enable expressive discrete buttons and vibrotactile interfaces and show how the approach could bring rich sensing opportunities to integrated haptics modules in mobile devices, increasing sensing capabilities with fewer components. Our technique is potentially compatible with many existing LRA drivers, as they already employ back-EMF sensing for autotuning of the vibration frequency.

Different off-the-shelf LRAs that work using this technique.

Back-EMF Principle in an LRA
Inside the LRA enclosure is a magnet attached to a small mass, both moving freely on a spring. The magnet moves in response to the excitation voltage introduced by the voice coil. The motion of the oscillating mass produces a counter-electromotive force, or back-EMF, which is a voltage proportional to the rate of change of magnetic flux. A greater oscillation speed creates a larger back-EMF voltage, while a stationary mass generates zero back-EMF voltage.

Anatomy of the LRA.

Active Back-EMF for Sensing
Touching or making contact with the LRA during vibration changes the velocity of the interior mass, as energy is dissipated into the contact object. This works well with soft materials that deform under pressure, such as the human body. A finger, for example, absorbs different amounts of energy depending on the contact force as it flattens against the LRA. By driving the LRA with small amounts of energy, we can measure this phenomenon using the back-EMF voltage. Because leveraging the back-EMF behavior for sensing is an active process, the key insight that enabled this work was the design of a custom, off-resonance driver waveform that allows continuous sensing while minimizing vibrations, sound and power consumption.

Touch and pressure sensing on the LRA.

We measure back-EMF from the floating voltage between the two LRA leads, which requires disconnecting the motor driver briefly to avoid disturbances. While the driver is disconnected, the mass is still oscillating inside the LRA, producing an oscillating back-EMF voltage. Because commercial back-EMF sensing LRA drivers do not provide the raw data, we designed a custom circuit that is able to pick up and amplify small back-EMF voltage. We also generated custom drive pulses that minimize vibrations and energy consumption.

Simplified schematic of the LRA driver and the back-EMF measurement circuit for active sensing.

After exciting the LRA with a short drive pulse, the back-EMF voltage fluctuates due to the continued oscillations of the mass on the spring (top, red line). The change in the back-EMF signal when subject to a finger press depends on the pressure applied (middle/bottom, green/blue lines).

Applications
The behavior of the LRAs used in mobile phones is the same, whether they are on a table, on a soft surface, or hand held. This may cause problems, as a vibrating phone could slide off a glass table or emit loud and unnecessary vibrating sounds. Ideally, the LRA on a phone would automatically adjust based on its environment. We demonstrate our approach for sensing using the LRA back-EMF technique by wiring directly to a Pixel 4’s LRA, and then classifying whether the phone is held in hand, placed on a soft surface (foam), or placed on a table.

Sensing phone surroundings.

We also present a prototype that demonstrates how LRAs could be used as combined input/output devices in portable electronics. We attached two LRAs, one on the left and one on the right side of a phone. The buttons provide tap, touch, and pressure sensing. They are also programmed to provide haptic feedback, once the touch is detected.

Pressure-sensitive side buttons.

There are a number of wearable tactile aid devices, such as sleeves, vests, and bracelets. To transmit tactile feedback to the skin with consistent force, the tactor has to apply the right pressure; it can not be too loose or too tight. Currently, the typical way to do so is through manual adjustment, which can be inconsistent and lacks measurable feedback. We show how the LRA back-EMF technique can be used to continuously monitor the fit bracelet device and prompt the user if it's too tight, too loose, or just right.

Fit sensing bracelet.

Evaluating an LRA as a Sensor
The LRA works well as a pressure sensor, because it has a quadratic response to the force magnitude during touch. Our method works for all five off-the-shelf LRA types that we evaluated. Because the typical power consumption is only 4.27 mA, all-day sensing would only reduce the battery life of a Pixel 4 phone from 25 to 24 hours. The power consumption can be greatly reduced by using low-power amplifiers and employing active sensing only when needed, such as when the phone is active and interacting with the user.

Back-EMF voltage changes when pressure is applied with a finger.

The challenge with active sensing is to minimize vibrations, so they are not perceptible when touching and do not produce audible sound. We optimize the active sensing to produce only 2 dB of sound and 0.45 m/s² peak-to-peak acceleration, which is just barely perceptible by finger and is quiet, in contrast to the regular 8.49 m/s² .

Future Work and Conclusion
To see the work presented here in action, please see the video below.

In the future, we plan to explore other sensing techniques, perhaps measuring the current could be an alternative approach. Also, using machine learning could potentially improve the sensing and provide more accurate classification of the complex back-EMF patterns. Our method could be developed further to enable closed-loop feedback with the actuator and sensor, which would allow the actuator to provide the same force, regardless of external conditions.

We believe that this work opens up new opportunities for leveraging existing ubiquitous hardware to provide rich interactions and closed-loop feedback haptic actuators.

Acknowledgments
This work was done by Artem Dementyev, Alex Olwal, and Richard Lyon. Thanks to Mathieu Le Goc and Thad Starner for feedback on the paper.

Source: Google AI Blog

Enabling E-Textile Microinteractions: Gestures and Light through Helical Structures

Posted by Alex Olwal, Research Scientist, Google Research

Textiles have the potential to help technology blend into our everyday environments and objects by improving aesthetics, comfort, and ergonomics. Consumer devices have started to leverage these opportunities through fabric-covered smart speakers and braided headphone cords, while advances in materials and flexible electronics have enabled the incorporation of sensing and display into soft form factors, such as jackets, dresses, and blankets.

A scalable interactive E-textile architecture with embedded touch sensing, gesture recognition and visual feedback.

In “E-textile Microinteractions” (Proceedings of ACM CHI 2020), we bring interactivity to soft devices and demonstrate how machine learning (ML) combined with an interactive textile topology enables parallel use of discrete and continuous gestures. This work extends our previously introduced E-textile architecture (Proceedings of ACM UIST 2018). This research focuses on cords, due to their modular use as drawstrings in garments, and as wired connections for data and power across consumer devices. By exploiting techniques from textile braiding, we integrate both gesture sensing and visual feedback along the surface through a repeating matrix topology.

For insight into how this works, please see this video about E-textile microinteractions and this video about the E-textile architecture.

E-textile microinteractions combining continuous sensing with discrete motion and grasps.

The Helical Sensing Matrix (HSM)
Braiding generally refers to the diagonal interweaving of three or more material strands. While braids are traditionally used for aesthetics and structural integrity, they can also be used to enable new sensing and display capabilities.

Whereas cords can be made to detect basic touch gestures through capacitive sensing, we developed a helical sensing matrix (HSM) that enables a larger gesture space. The HSM is a braid consisting of electrically insulated conductive textile yarns and passive support yarns,where conductive yarns in opposite directions take the role of transmit and receive electrodes to enable mutual capacitive sensing. The capacitive coupling at their intersections is modulated by the user’s fingers, and these interactions can be sensed anywhere on the cord since the braided pattern repeats along the length.

Left: A Helical Sensing Matrix based on a 4×4 braid (8 conductive threads spiraled around the core). Magenta/cyan are conductive yarns, used as receive/transmit lines. Grey are passive yarns (cotton). Center: Flattened matrix, that illustrates the infinite number of 4×4 matrices (colored circles 0-F), which repeat along the length of the cord. Right: Yellow are fiber optic lines, which provide visual feedback.

Rotation Detection
A key insight is that the two axial columns in an HSM that share a common set of electrodes (and color in the diagram of the flattened matrix) are 180º opposite each other. Thus, pinching and rolling the cord activates a set of electrodes and allows us to track relative motion across these columns. Rotation detection identifies the current phase with respect to the set of time-varying sinusoidal signals that are offset by 90º. The braid allows the user to initiate rotation anywhere, and is scalable with a small set of electrodes.

Rotation is deduced from horizontal finger motion across the columns. The plots below show the relative capacitive signal strengths, which change with finger proximity.

Interaction Techniques and Design Guidelines
This e-textile architecture makes the cord touch-sensitive, but its softness and malleability limit suitable interactions compared to rigid touch surfaces. With the unique material in mind, our design guidelines emphasize:

Simple gestures. We design for short interactions where the user either makes a single discrete gesture or performs a continuous manipulation.
Closed-loop feedback. We want to help the user discover functionality and get continuous feedback on their actions. Where possible, we provide visual, tactile, and audio feedback integrated in the device.

Based on these principles, we leverage our e-textile architecture to enable interaction techniques based on our ability to sense proximity, area, contact time, roll and pressure.

Our e-textile enables interaction based on capacitive sensing of proximity, contact area, contact time, roll, and pressure.

The inclusion of fiber optic strands that can display color of varying intensity enable dynamic real-time feedback to the user.

Braided fiber optics strands create the illusion of directional motion.

Motion Gestures (Flicks and Slides) and Grasping Styles (Pinch, Grab, Pinch)
We conducted a gesture elicitation study, which showed opportunities for an expanded gesture set. Inspired by these results, we decided to investigate five motion gestures based on flicks and slides, along with single-touch gestures (pinch, grab and pat).

Gesture elicitation study with imagined touch sensing.

We collected data from 12 new participants, which resulted in 864 gesture samples (12 participants performed eight gestures each, repeating nine times), each having 16 features linearly interpolated to 80 observations over time. Participants performed the eight gestures in their own style without feedback as we wanted to accommodate individual differences since the classification is highly dependent on user style (“contact”), preference (“how to pinch/grab”) and anatomy (e.g., hand size). Our pipeline was thus designed for user-dependent training to enable individual styles with differences across participants, such as the inconsistent use of clockwise/counterclockwise, overlap between temporal gestures (e.g., flick vs. flick and hold, and similar pinch and grab gestures.) For a user-independent system, we would need to address such differences, for example with stricter instructions for consistency, data from a larger population, and in more diverse settings. Real-time feedback during training will also help mitigate differences as the user learns to adjust their behavior.

Twelve participants (horizontal axis) performed 9 repetitions (animation) for the eight gestures (vertical axis). Each sub-image shows 16 overlaid feature vectors, interpolated to 80 observations over time.

We performed cross-validation for each user across the gestures by training on eight repetitions and testing on one, through nine permutations, and achieved a gesture recognition accuracy of ~94%. This result is encouraging, especially given the expressivity enabled by such a low-resolution sensor matrix (eight electrodes).

Notable here is that inherent relationships in the repeated sensing matrices are well-suited for machine learning classification. The ML classifier used in our research enables quick training with limited data, which makes a user-dependent interaction system reasonable. In our experience, training for a typical gesture takes less than 30s, which is comparable to the amount of time required to train a fingerprint sensor.

User-Independent, Continuous Twist: Quantifying Precision and Speed
The per-user trained gesture recognition enabled eight new discrete gestures. For continuous interactions, we also wanted to quantify how well user-independent, continuous twist performs for precision tasks. We compared our e-textile with two baselines, a capacitive multi-touch trackpad (“Scroll”) and the familiar headphone cord remote control (“Buttons”). We designed a lab study where the three devices controlled 1D movement in a targeting task.

We analysed three dependent variables for the 1800 trials, covering 12 participants and three techniques: time on task (milliseconds), total motion, and motion during end-of-trial. Participants also provided qualitative feedback through rankings and comments.

Our quantitative analysis suggests that our e-textile’s twisting is faster than existing headphone button controls and comparable in speed to a touch surface. Qualitative feedback also indicated a preference for e-textile interaction over headphone controls.

Left: Weighted average subjective feedback. We mapped the 7-point Likert scale to a score in the range [-3, 3] and multiplied by the number of times the technique received that rating, and computed an average for all the scores. Right: Mean completion times for target distances show that Buttons were consistently slower.

These results are particularly interesting given that our e-textile was more sensitive, compared to the rigid input devices. One explanation might be its expressiveness — users can twist quickly or slowly anywhere on the cord, and the actions are symmetric and reversible. Conventional buttons on headphones require users to find their location and change grips for actions, which adds a high cost to pressing the wrong button. We use a high-pass filter to limit accidental skin contact, but further work is needed to characterize robustness and evaluate long-term performance in actual contexts of use.

Gesture Prototypes: Headphones, Hoodie Drawstrings, and Speaker Cord
We developed different prototypes to demonstrate the capabilities of our e-textile architecture: e-textile USB-C headphones to control media playback on the phone, a hoodie drawstring to invisibly add music control to clothing, and an interactive cord for gesture controls of smart speakers.

Left: Tap = Play/Pause; Center: Double-tap = Next track; Right: Roll = Volume +/-

Interactive speaker cord for simultaneous use of continuous (twisting/rolling) and discrete gestures (pinch/pat) to control music playback.

Conclusions and Future Directions
We introduce an interactive e-textile architecture for embedded sensing and visual feedback, which can enable both precise small-scale and large-scale motion in a compact cord form factor. With this work, we hope to advance textile user interfaces and inspire the use of microinteractions for future wearable interfaces and smart fabrics, where eyes-free access and casual, compact and efficient input is beneficial. We hope that our e-textile will inspire others to augment physical objects with scalable techniques, while preserving industrial design and aesthetics.

Acknowledgements
This work is a collaboration across multiple teams at Google. Key contributors to the project include Alex Olwal, Thad Starner, Jon Moeller, Greg Priest-Dorman, Ben Carroll, and Gowa Mainini. We thank the Google ATAP Jacquard team for our collaboration, especially Shiho Fukuhara, Munehiko Sato, and Ivan Poupyrev. We thank Google Wearables, and Kenneth Albanowski and Karissa Sawyer, in particular. Finally, we would like to thank Mark Zarich for illustrations, Bryan Allen for videography, Frank Li for data processing, Mathieu Le Goc for valuable discussions, and Carolyn Priest-Dorman for textile advice.

Source: Google AI Blog

Toward Human-Centered Design for ML Frameworks

Posted by Carrie J. Cai, Senior Research Scientist, Google Research and Philip J. Guo, Assistant Professor, UC San Diego

As machine learning (ML) increasingly impacts diverse stakeholders and social groups, it has become necessary for a broader range of developers — even those without formal ML training — to be able to adapt and apply ML to their own problems. In recent years, there have been many efforts to lower the barrier to machine learning, by abstracting complex model behavior into higher-level APIs. For instance, Google has been developing TensorFlow.js, an open-source framework that lets developers write ML code in JavaScript to run directly in web browsers. Despite the abundance of engineering work towards improving APIs, little is known about what non-ML software developers actually need to successfully adopt ML into their daily work practices. Specifically, what do they struggle with when trying modern ML frameworks, and what do they want these frameworks to provide?

In “Software Developers Learning Machine Learning: Motivations, Hurdles, and Desires,” which received a Best Paper Award at the IEEE conference on Visual Languages and Human-Centric Computing (VL/HCC), we share our research on these questions and report the results from a large-scale survey of 645 people who used TensorFlow.js. The vast majority of respondents were software or web developers, who were fairly new to machine learning and usually did not use ML as part of their primary job. We examined the hurdles experienced by developers when using ML frameworks and explored the features and tools that they felt would best assist in their adoption of these frameworks into their programming workflows.

What Do Developers Struggle With Most When Using ML Frameworks?
Interestingly, by far the most common challenge reported by developers was not the lack of a clear API, but rather their own lack of conceptual understanding of ML, which hindered their ability to successfully use ML frameworks. These hurdles ranged from the initial stages of picking a good problem to which they could apply TensorFlow.js (e.g., survey respondents reported not knowing “what to apply ML to, where ML succeeds, where it sucks”), to creating the architecture of a neural net (e.g., “how many units [do] I have to put in when adding layers to the model?”) and knowing how to set and tune parameters during model training (e.g., “deciding what optimizers, loss functions to use”). Without a conceptual understanding of how different parameters affect outcomes, developers often felt overwhelmed by the seemingly infinite space of parameters to tune when debugging ML models.

Without sufficient conceptual support, developers also found it hard to transfer lessons learned from “hello world” API tutorials to their own real-world problems. While API tutorials provide syntax for implementing specific models (e.g., classifying MNIST digits), they typically don't provide the underlying conceptual scaffolding necessary to generalize beyond that specific problem.

Developers often attributed these challenges to their own lack of experience in advanced mathematics. Ironically, despite the abundance of non-experts tinkering with ML frameworks nowadays, many felt that ML frameworks were intended for specialists with advanced training in linear algebra and calculus, and thus not meant for general software developers or product managers. This semblance of imposter syndrome may be fueled by the prevalence of esoteric mathematical terminology in API documentation, which may unintentionally give the impression that an advanced math degree is necessary for even practical integration of ML into software projects. Though math training is indeed beneficial, the ability to grasp and apply practical concepts (e.g., a model’s learning rate) to real-world problems does not require an advanced math degree.

What Do Developers Want From ML Frameworks?
Developers who responded to our survey wanted ML frameworks to teach them not only how to use the API, but also the unspoken idioms that would help them to effectively apply the framework to their own problems.

Pre-made Models with Explicit Support for Modification
A common desire was to have access to libraries of canonical ML models, so that they could modify an existing template rather than creating new ones from scratch. Currently, pre-trained models are being made more widely available in many ML platforms, including TensorFlow.js. However, in their current form, these models do not provide explicit support for novice consumption. For example, in our survey, developers reported substantial hurdles transferring and modifying existing model examples to their own use cases. Thus, the provision of pre-made ML models should also be coupled with explicit support for modification.

Synthesize ML Best Practices into Just-in-Time Hints
Developers also wished frameworks could provide ML best practices, i.e., practical tips and tricks that they could use when designing or debugging models. While ML experts may acquire heuristics and go-to strategies through years of dedicated trial and error, the mere decision overhead of “which parameter should I try tuning first?” can be overwhelming for developers who aren't ML experts. To help narrow this broad space of decision possibilities, ML frameworks could embed tips on best practices directly into the programming workflow. Currently, visualizations like TensorBoard and tfjs-vis make it possible to help see what's going on inside of their models.

Coupling these with just-in-time strategic pointers, such as whether to adapt a pre-trained model or to build one from scratch, or diagnostic checks, like practical tips to “decrease learning rate” if the model is not converging, could help users acquire and make use of practical strategies. These tips could serve as an intermediate scaffolding layer that helps demystify the math theory underlying ML into developer-friendly terms.

Support for Learning-by-Doing
Finally, even though ML frameworks are not traditional learning platforms, software developers are indeed treating them as lightweight vehicles for learning-by-doing. For example, one survey respondent appreciated when conceptual support was tightly interwoven into the framework, rather than being a separate resource: “...the small code demos that you can edit and run right there. Really helps basic understanding.” Another explained that “I prefer learning by doing, so I would like to see more tutorials, examples” embedded into ML frameworks. Some found it difficult to take a formal online course, and would rather learn in bite-sized pieces through hands-on tinkering: “Due to the rest of life, I have to fit learning into small 5-15 minute blocks.”

Given these desires to learn-by-doing, ML frameworks may need to more clearly distinguish between a spectrum of resources aimed at different levels of expertise. Although many frameworks already have “hello world” tutorials, to properly set expectations these frameworks could more explicitly differentiate between API (syntax-specific) onboarding and ML (conceptual) onboarding.

Looking Forward
Ultimately, as the frontiers of ML are still evolving, providing practical, conceptual tips for software developers and creating a shared reservoir of community-curated best practices can benefit ML experts and novices alike. Hopefully, these research findings pave the way for more user-centric designs of future ML frameworks.

Acknowledgements
This work would not have been possible without Yannick Assogba, Sandeep Gupta, Lauren Hannah-Murphy, Michael Terry, Ann Yuan, Nikhil Thorat, Daniel Smilkov, Martin Wattenberg, Fernanda Viegas, and members of PAIR and TensorFlow.js.

Source: Google AI Blog

Toward Human-Centered Design for ML Frameworks

Source: Google AI Blog

Toward Human-Centered Design for ML Frameworks

Source: Google AI Blog

Building SMILY, a Human-Centric, Similar-Image Search Tool for Pathology

Posted by Narayan Hegde, Software Engineer, Google Health and Carrie J. Cai, Research Scientist, Google Research

Advances in machine learning (ML) have shown great promise for assisting in the work of healthcare professionals, such as aiding the detection of diabetic eye disease and metastatic breast cancer. Though high-performing algorithms are necessary to gain the trust and adoption of clinicians, they are not always sufficient—what information is presented to doctors and how doctors interact with that information can be crucial determinants in the utility that ML technology ultimately has for users.

The medical specialty of anatomic pathology, which is the gold standard for the diagnosis of cancer and many other diseases through microscopic analysis of tissue samples, can greatly benefit from applications of ML. Though diagnosis through pathology is traditionally done on physical microscopes, there has been a growing adoption of “digital pathology,” where high-resolution images of pathology samples can be examined on a computer. With this movement comes the potential to much more easily look up information, as is needed when pathologists tackle the diagnosis of difficult cases or rare diseases, when “general” pathologists approach specialist cases, and when trainee pathologists are learning. In these situations, a common question arises, “What is this feature that I’m seeing?” The traditional solution is for doctors to ask colleagues, or to laboriously browse reference textbooks or online resources, hoping to find an image with similar visual characteristics. The general computer vision solution to problems like this is termed content-based image retrieval (CBIR), one example of which is the “reverse image search” feature in Google Images, in which users can search for similar images by using another image as input.

Today, we are excited to share two research papers describing further progress in human-computer interaction research for similar image search in medicine. In “Similar Image Search for Histopathology: SMILY” published in Nature Partner Journal (npj) Digital Medicine, we report on our ML-based tool for reverse image search for pathology. In our second paper, “Human-Centered Tools for Coping with Imperfect Algorithms During Medical Decision-Making” (preprint available here), which received an honorable mention at the 2019 ACM CHI Conference on Human Factors in Computing Systems, we explored different modes of refinement for image-based search, and evaluated their effects on doctor interaction with SMILY.

SMILY Design
The first step in developing SMILY was to apply a deep learning model, trained using 5 billion natural, non-pathology images (e.g., dogs, trees, man-made objects, etc.), to compress images into a “summary” numerical vector, called an embedding. The network learned during the training process to distinguish similar images from dissimilar ones by computing and comparing their embeddings. This model is then used to create a database of image patches and their associated embeddings using a corpus of de-identified slides from The Cancer Genome Atlas. When a query image patch is selected in the SMILY tool, the query patch’s embedding is similarly computed and compared with the database to retrieve the image patches with the most similar embeddings.

Schematic of the steps in building the SMILY database and the process by which input image patches are used to perform the similar image search.

The tool allows a user to select a region of interest, and obtain visually-similar matches. We tested SMILY’s ability to retrieve images along a pre-specified axis of similarity (e.g. histologic feature or tumor grade), using images of tissue from the breast, colon, and prostate (3 of the most common cancer sites). We found that SMILY demonstrated promising results despite not being trained specifically on pathology images or using any labeled examples of histologic features or tumor grades.

Example of selecting a small region in a slide and using SMILY to retrieve similar images. SMILY efficiently searches a database of billions of cropped images in a few seconds. Because pathology images can be viewed at different magnifications (zoom levels), SMILY automatically searches images at the same magnification as the input image.

Second example of using SMILY, this time searching for a lobular carcinoma, a specific subtype of breast cancer.

Refinement tools for SMILY
However, a problem emerged when we observed how pathologists interacted with SMILY. Specifically, users were trying to answer the nebulous question of “What looks similar to this image?” so that they could learn from past cases containing similar images. Yet, there was no way for the tool to understand the intent of the search: Was the user trying to find images that have a similar histologic feature, glandular morphology, overall architecture, or something else? In other words, users needed the ability to guide and refine the search results on a case-by-case basis in order to actually find what they were looking for. Furthermore, we observed that this need for iterative search refinement was rooted in how doctors often perform “iterative diagnosis”—by generating hypotheses, collecting data to test these hypotheses, exploring alternative hypotheses, and revisiting or retesting previous hypotheses in an iterative fashion. It became clear that, for SMILY to meet real user needs, it would need to support a different approach to user interaction.

Through careful human-centered research described in our second paper, we designed and augmented SMILY with a suite of interactive refinement tools that enable end-users to express what similarity means on-the-fly: 1) refine-by-region allows pathologists to crop a region of interest within the image, limiting the search to just that region; 2) refine-by-example gives users the ability to pick a subset of the search results and retrieve more results like those; and 3) refine-by-concept sliders can be used to specify that more or less of a clinical concept be present in the search results (e.g., fused glands). Rather than requiring that these concepts be built into the machine learning model, we instead developed a method that enables end-users to create new concepts post-hoc, customizing the search algorithm towards concepts they find important for each specific use case. This enables new explorations via post-hoc tools after a machine learning model has already been trained, without needing to re-train the original model for each concept or application of interest.

Through our user study with pathologists, we found that the tool-based SMILY not only increased the clinical usefulness of search results, but also significantly increased users’ trust and likelihood of adoption, compared to a conventional version of SMILY without these tools. Interestingly, these refinement tools appeared to have supported pathologists’ decision-making process in ways beyond simply performing better on similarity searches. For example, pathologists used the observed changes to their results from iterative searches as a means of progressively tracking the likelihood of a hypothesis. When search results were surprising, many re-purposed the tools to test and understand the underlying algorithm, for example, by cropping out regions they thought were interfering with the search or by adjusting the concept sliders to increase the presence of concepts they suspected were being ignored. Beyond being passive recipients of ML results, doctors were empowered with the agency to actively test hypotheses and apply their expert domain knowledge, while simultaneously leveraging the benefits of automation.

With these interactive tools enabling users to tailor each search experience to their desired intent, we are excited for SMILY’s potential to assist with searching large databases of digitized pathology images. One potential application of this technology is to index textbooks of pathology images with descriptive captions, and enable medical students or pathologists in training to search these textbooks using visual search, speeding up the educational process. Another application is for cancer researchers interested in studying the correlation of tumor morphologies with patient outcomes, to accelerate the search for similar cases. Finally, pathologists may be able to leverage tools like SMILY to locate all occurrences of a feature (e.g. signs of active cell division, or mitosis) in the same patient’s tissue sample to better understand the severity of the disease to inform cancer therapy decisions. Importantly, our findings add to the body of evidence that sophisticated machine learning algorithms need to be paired with human-centered design and interactive tooling in order to be most useful.

Acknowledgements
This work would not have been possible without Jason D. Hipp, Yun Liu, Emily Reif, Daniel Smilkov, Michael Terry, Craig H. Mermel, Martin C. Stumpe and members of Google Health and PAIR. Preprints of the two papers are available here and here.

Source: Google AI Blog

Using Deep Learning to Improve Usability on Mobile Devices

Posted by Yang Li, Research Scientist, Google AI

Tapping is the most commonly used gesture on mobile interfaces, and is used to trigger all kinds of actions ranging from launching an app to entering text. While the style of clickable elements (e.g., buttons) in traditional desktop graphical user interfaces is often conventionally defined, on mobile interfaces it can still be difficult for people to distinguish tappable versus non-tappable elements due to the diversity of styles. This confusion can lead to false affordances (e.g., a feature that could be mistaken for a button) and a lack of discoverability that can lead to user frustration, uncertainty, and errors. To avoid this, interface designers can conduct a study or a visual affordance test to help clarify the tappability of items in their interfaces. However, such studies are time-consuming and their findings are often limited to a specific app or interface design.

In our CHI'19 paper, "Modeling Mobile Interface Tappability Using Crowdsourcing and Deep Learning", we introduced an approach for modeling the usability of mobile interfaces at scale. We crowdsourced a task to study UI elements across a range of mobile apps to measure the perceived tappability by a user. Our model predictions were consistent with the user group at the ~90% level, demonstrating that a machine learning model can be effectively used to estimate the perceived tappability of interface elements in their design without the need for expensive and time consuming user testing.

Predicting Tappability with Deep Learning
Designers often use visual properties such as the color or depth of an element to signify its availability for interaction on interfaces, e.g., the blue color and underline of a link. While these common signifiers are useful, it is not always clear when to apply them in each specific design setting. Furthermore, with design trends evolving, traditional signifiers are constantly being altered and challenged, potentially causing user uncertainty and mistakes.

To understand how users perceive this changing landscape, we analyzed the potential signifiers affecting tappability in real mobile apps—element type (e.g., check boxes, text boxes, etc.), location, size, color, and words. We started by crowdsourcing volunteers to label the perceived clickability of ~20,000 unique interface elements from ~3,500 apps. With the exception of text boxes, type signifiers yielded low uncertainty in user perceived tappability. The location signifier refers to the position of a feature on the screen and is informed by the common layout design in mobile apps, as demonstrated in the figure below.

Heatmaps displaying the accuracy of tappable and non-tappable elements by location, where warmer colors represent areas of higher accuracy. Users labeled non-tappable elements more accurately towards the upper center of the interface, and tappable elements towards the bottom center of the interface.

The impact of element size was relatively weak, but did indicate confusion in the case of large non-tappable elements. Users showed a tendency to bright colors and short word counts for tappable elements, though word semantics also played a significant role.

We used these labels to train a simple deep neural network that predicts the likelihood that a user will perceive an interface element as tappable versus non-tappable. For a given element of the interface, the model uses a range of features, including the spatial context of the element on the screen (location), the semantics and functionality of the element (words and type), and the visual appearance (size as well as raw pixels). The neural network model applies a convolutional neural network (CNN) to extract features from raw pixels, and uses learned semantic embeddings to represent text content and element properties. The concatenation of all these features are then fed to a fully-connected network layer, the output of which produces a binary classification of an element's tappability.

Evaluation of the Model
The model allowed us to automatically diagnose mismatches between the tappability of each interface element as perceived by a user—predicted by our model—and the intended or actual tappable state of the element specified by the developer or designer. In the example below, our model predicts that there is a 73% chance that a user would think the labels such as "Followers" or "Following" are tappable, while these interface elements are in fact not programmed to be tappable.

To understand how our model behaves compared to human users, particularly when there is ambiguity in human perception, we generated a second, independent dataset by crowdsourcing an effort among 290 volunteers to label each of 2,000 unique interface elements with respect to their perceived tappability. Each element was labeled independently by five different users. We found that more than 40% of the elements in our sample were labeled inconsistently by volunteers. Our model matches this uncertainty in human perception quite well, as demonstrated in the figure below.

The scatterplot of the tappability probability predicted by the model (the Y axis) versus the consistency in the human user labels (the X axis) for each element in the consistency dataset.

When users agree an element's tappability, our model tends to give a more definite answer—a probability close to 1 for tappable and close to 0 for not tappable. When workers are less consistent on an element (towards the middle of the X axis), our model is also less certain about the decision. Overall, our model achieved reasonable accuracy of matching human perception in identifying tappable UI elements with a mean precision of 90.2% and recall of 87.0%.

Predicting tappability is merely one example of what we can do with machine learning to solve usability issues in user interfaces. There are many other challenges in interaction design and user experience research where deep learning models can offer a vehicle to distill large, diverse user experience datasets and advance scientific understandings about interaction behaviors.

Acknowledgements
This research was a joint work of Amanda Swangson, summer intern at Google, and Yang Li, a Research Scientist in Deep Learning and Human Computer Interaction.

googblogs.com

All Google blogs and Press in one site

Tag Archives: HCI

An Open Source Vibrotactile Haptics Platform for On-Body Applications.

Source: Google AI Blog

Data Cascades in Machine Learning

Source: Google AI Blog

Haptics with Input: Using Linear Resonant Actuators for Sensing

Source: Google AI Blog

Enabling E-Textile Microinteractions: Gestures and Light through Helical Structures

Source: Google AI Blog

Toward Human-Centered Design for ML Frameworks

Source: Google AI Blog

Toward Human-Centered Design for ML Frameworks

Source: Google AI Blog

Toward Human-Centered Design for ML Frameworks

Source: Google AI Blog

Building SMILY, a Human-Centric, Similar-Image Search Tool for Pathology

Source: Google AI Blog

Using Deep Learning to Improve Usability on Mobile Devices

Source: Google AI Blog