Tag Archives: UI

Grounding Natural Language Instructions to Mobile UI Actions



Mobile devices offer a myriad of functionalities that can assist in everyday activities. However, many of these functionalities are not easily discoverable or accessible, forcing users to look up how to perform a specific task -- how to turn on the traffic mode in Maps or change notification settings in YouTube, for example. While searching the web for detailed instructions is an option, it is still up to the user to follow those instructions step by step and navigate UI details on a small touchscreen, which can be tedious, time-consuming, and a barrier to accessibility. What if one could design a computational agent to turn these language instructions into actions and automatically execute them on the user’s behalf?

In “Mapping Natural Language Instructions to Mobile UI Action Sequences”, published at ACL 2020, we present the first step towards addressing the problem of automatic action sequence mapping, creating three new datasets used to train deep learning models that ground natural language instructions to executable mobile UI actions. This work lays the technical foundation for task automation on mobile devices that would alleviate the need to maneuver through UI details, which may be especially valuable for users who are visually or situationally impaired. We have also open-sourced our model code and data pipelines through our GitHub repository, in order to spur further developments among the research community.

Constructing Language Grounding Models
People often provide one another with instructions in order to coordinate joint efforts and accomplish tasks involving complex sequences of actions, for example, following a recipe to bake a cake, or having a friend walk you through setting up a home network. Building computational agents able to help with similar interactions is an important goal that requires true language grounding in the environments in which the actions take place.

The learning task addressed here is to predict a sequence of actions for a mobile platform given a set of instructions, a sequence of screens produced as the system transitions from one screen to another, as well as the set of interactive elements on those screens. Training such a model end-to-end would require paired language-action data, which is difficult to acquire at a large scale.

Instead, we deconstruct the problem into two sequential steps: an action phrase-extraction step and a grounding step.
The workflow of grounding language instructions to executable actions.
The action phrase-extraction step identifies the operation, object and argument descriptions from multi-step instructions using a Transformer model with area attention for representing each description phrase. Area attention allows the model to attend to a group of adjacent words in the instruction (a span) as a whole for decoding a description.
The action phrase extraction model takes a word sequence of a natural language instruction and outputs a sequence of spans (denoted in red boxes) that indicate the phrases describing the operation, the object and the argument of each action in the task.
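
To make the span-as-a-whole idea more concrete, below is a minimal NumPy sketch of attention over areas (contiguous spans), assuming each span is represented by the mean of its token keys and the sum of its token values; the actual model learns such representations inside a Transformer decoder, so the pooling choices and names here are only illustrative.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def area_attention(query, keys, values, max_span=4):
    """Attend over all contiguous spans (areas) of up to max_span tokens.

    Each area is summarized by the mean of its token keys and the sum of its
    token values -- one simple way to form span-level representations.
    Returns the attention-weighted context and the most-attended span.
    """
    n = keys.shape[0]
    span_keys, span_values, spans = [], [], []
    for start in range(n):
        for end in range(start + 1, min(start + max_span, n) + 1):
            span_keys.append(keys[start:end].mean(axis=0))
            span_values.append(values[start:end].sum(axis=0))
            spans.append((start, end))
    span_keys = np.stack(span_keys)        # (num_spans, d)
    span_values = np.stack(span_values)    # (num_spans, d)
    weights = softmax(span_keys @ query / np.sqrt(len(query)))
    return weights @ span_values, spans[int(weights.argmax())]

# Toy usage: 6 "token" vectors stand in for encoded instruction words.
rng = np.random.default_rng(0)
tokens = rng.normal(size=(6, 8))
query = rng.normal(size=8)
context, best_span = area_attention(query, tokens, tokens)
print("most-attended span (start, end):", best_span)
```
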
Next, the grounding step matches the extracted operation and object descriptions with a UI object on the screen. Again, we use a Transformer model, but in this case, it contextually represents UI objects and grounds object descriptions to them.
The grounding model takes the extracted spans as input and grounds them to executable actions, including the object an action is applied to, given the UI screen at each step during execution.
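
Conceptually, grounding reduces to scoring each on-screen object's contextual representation against the representation of the extracted object description. The sketch below assumes those embeddings already exist (in the paper they come from Transformer encoders); the dot-product scoring and the embedding dimensions are illustrative, not the published architecture.

```python
import numpy as np

def ground_description(description_embedding, ui_object_embeddings):
    """Return the index of the UI object whose (contextual) embedding best
    matches the extracted object-description embedding, plus the softmax
    distribution over candidates."""
    scores = ui_object_embeddings @ description_embedding
    probs = np.exp(scores - scores.max())
    probs /= probs.sum()
    return int(probs.argmax()), probs

# Toy usage: five hypothetical objects on the current screen.
rng = np.random.default_rng(1)
screen_objects = rng.normal(size=(5, 16))
description = screen_objects[3] + 0.1 * rng.normal(size=16)  # noisy match to #3
index, probs = ground_description(description, screen_objects)
print("grounded to object", index)
```
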
Results
To investigate the feasibility of this task and the effectiveness of our approach, we construct three new datasets to train and evaluate our model. The first dataset includes 187 multi-step English instructions for operating Pixel phones, along with their corresponding action-screen sequences; it enables assessment of full task performance on naturally occurring instructions and is used to test end-to-end grounding quality. For action phrase extraction training and evaluation, we obtain English “how-to” instructions, which are abundant on the web, and annotate the phrases that describe each action. To train the grounding model, we synthetically generate 295K single-step commands to UI actions, covering 178K different UI objects across 25K mobile UI screens from a public Android UI corpus.

A Transformer with area attention obtains 85.56% accuracy for predicting span sequences that completely match the ground truth. The phrase extractor and grounding model together obtain 89.21% partial and 70.59% complete accuracy for matching ground-truth action sequences on the more challenging task of mapping language instructions to executable actions end-to-end. We also evaluated alternative methods and representations of UI objects, such as a graph convolutional network (GCN) or a feedforward network, and found that representations that capture an object's context on the screen lead to better grounding accuracy. The new datasets, models and results provide an important first step on the challenging problem of grounding natural language instructions to mobile UI actions.
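
For readers wondering how the two end-to-end numbers relate, the sketch below shows one plausible way to compute them, assuming “complete” means the entire predicted action sequence matches the ground truth and “partial” counts correct actions position by position; the paper's exact metric definitions may differ.

```python
def sequence_accuracies(predictions, ground_truths):
    """Compute (partial, complete) match rates over predicted action sequences.

    An action is written here as an (operation, object_id) tuple; a sequence is
    'complete' only if every action matches the ground truth in order.
    """
    complete, matched, total = 0, 0, 0
    for pred, gold in zip(predictions, ground_truths):
        complete += int(pred == gold)
        matched += sum(p == g for p, g in zip(pred, gold))
        total += len(gold)
    return matched / total, complete / len(ground_truths)

# Toy usage: the second sequence gets its last action wrong.
preds = [[("CLICK", 3), ("CLICK", 7)], [("CLICK", 1), ("INPUT", 2)]]
golds = [[("CLICK", 3), ("CLICK", 7)], [("CLICK", 1), ("INPUT", 5)]]
print(sequence_accuracies(preds, golds))  # (0.75, 0.5)
```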

Conclusion
This research, and language grounding in general, is an important step for translating multi-stage instructions into actions on a graphical user interface. Successful application of task automation to the UI domain has the potential to significantly improve accessibility, where language interfaces might help individuals who are visually impaired perform tasks with interfaces that are predicated on sight. This also matters for situational impairment when one cannot access a device easily while encumbered by tasks at hand.

By deconstructing the problem into action phrase extraction and language grounding, progress on either step can improve full task performance, and the decomposition alleviates the need for paired language-action datasets, which are difficult to collect at scale. For example, action span extraction is related to both semantic role labeling and the extraction of multiple facts from text, and could benefit from innovations in span identification and multitask learning. Reinforcement learning, which has been applied in previous grounding work, may help improve out-of-sample prediction for grounding in UIs and improve direct grounding from hidden state representations. Although our datasets were based on Android UIs, our approach can be applied generally to instruction grounding on other user interface platforms. Lastly, our work provides a technical foundation for investigating user experiences in language-based human-computer interaction.

Acknowledgements
Many thanks to my collaborators on this work at Google Research. Xin Zhou and Jiacong He contributed substantially to the data pipelines and the creation of the datasets. Yuan Zhang and Jason Baldridge provided much valuable advice for the project and contributed to the presentation of the work. Gang Li provided generous help for creating open-source datasets. Many thanks to Ashwin Kakarla, Muqthar Mohammad and Mohd Majeed for their help with the annotations.

Source: Google AI Blog


Expand your app beyond mobile to reach Android users at large

Posted by Sameer Samat, Vice President, Platforms & Ecosystems

From day one, we designed Android to be a flexible, adaptive platform.

Most people picture a smartphone when they think of Android, but Android also powers an amazing number of large-screen devices. In fact, there are more than 175 million Android tablets with the Google Play store [1], making Android tablets a vital form factor for Google and our OEM partners today. Android apps also run on Chrome OS laptops, and the number of monthly active users who enabled Android apps grew 250% in just the last year [2].

Here at Google, we’re excited to see how you can take advantage of large-screen formats -- including Samsung’s new Galaxy Tab S6, the upcoming Lenovo™ Smart Tab M8 with Google Assistant, the upcoming Samsung Fold, and other devices launching this week at IFA. Our OEM partners are building experiences that help users every day:

[Two quotes from OEM partners]

From the start, Android was designed as a platform that could handle multiple screen sizes. Over the years, we’ve continued to add functionality for developers to accommodate new devices and form factors.

  • We started with a phone. Developers could write Android apps that would work on phones of all sizes, all over the world. Part of what made this work was Android’s resource and layout system, which enabled applications to smoothly adapt to different screen sizes.
  • In Android 3.0 Honeycomb, we added support for tablets. In particular, capabilities like Fragments allow you to create applications that work across vastly different form factors.
  • Android 7 Nougat brought multi-window and multi-display capabilities, including the ability to drag-and-drop across apps. Meanwhile, Chrome OS added the capability to run Android applications on laptops. With some adjustments to handle different inputs and windowing dynamics, you could now reach app users in a desktop-style environment.
Android’s layout system helps applications smoothly resize and adjust their layout interactively.

  • Now, in Android 10, we’ve made even more enhancements for development on large screens. We’ve improved multi-window capabilities, making it easier to use multiple apps in parallel. We also continued improving multi-display support, enabling more multi-monitor use cases. And we made it easy for you to experiment and test new form factors by adding a dedicated emulator for foldables as well as publishing a foldables guide.

By optimizing your app to take advantage of different form factors, you have an opportunity to deliver richer, more engaging experiences to millions of users on larger screens. And if you don’t have access to physical devices, the Android Emulator supports all of the form factors mentioned above, from Chrome OS to phones and tablets.


Developers of apps like Mint, Evernote, and Asphalt are just a few of those who have seen success from taking their existing APKs to larger screens.

[Quote from Damien Marchi, VP of Marketing at Gameloft]

To learn more about optimizing your Android apps for richer experiences on tablets, Chrome OS laptops, foldables, and more, join us at the Android Developer Summit on October 23-24 — either in person or via the livestream — or check out our recap videos on YouTube.

Sources:

[1] The number of tablets only accounts for devices that have the Google Play Store installed (for example, this excludes tablets in China); the actual number of tablets capable of running Android applications is much larger.

[2] Google Internal Data, March 2018 to March 2019.

Flutter and Chrome OS: Better Together

Posted by the Flutter and Chrome OS teams

Chrome OS is the fast, simple, and secure operating system that powers Chromebooks, including the Google Pixelbook and millions of devices used by consumers and students every day. The latest Flutter release adds support for building beautiful, tailored Chrome OS applications, including rich support for keyboard and mouse, and tooling to ensure that your app runs well on a Chromebook. Furthermore, Chrome OS is a great developer workstation for building general-purpose Flutter apps, thanks to its support for developing and running Flutter apps locally on the same device.

Flutter is a great way to build Chrome OS apps

Since its inception, Flutter has shared many of the same principles as Chrome OS: productive, fast, and beautiful experiences. Flutter allows developers to build beautiful, fast UIs, while also providing a high degree of developer productivity, and a completely open-source engine, framework and tools. In short, it’s the ideal modern toolkit for building multi-platform apps, including apps for Chrome OS.

Flutter initially focused on providing a UI toolkit for building apps for mobile devices, which typically feature touch input and small screens. However, we’ve been building keyboard and mouse support into Flutter since before our 1.0 release last December. And today, we’re pleased to announce that Flutter for Chrome OS is now stronger with scroll wheel support, hover management, and better keyboard event support. In addition, Flutter has always been great at allowing you to build apps that run at any size (large screen or small), with seamless resizing, as shown here in the Chrome OS Best Practices Sample:

The Chrome OS best practices sample in action

The Chrome OS Hello World sample is an app built with Flutter that is optimized for Chrome OS. This includes a responsive UI to showcase how to reposition items and have layouts that respond well to changes in size from mobile to desktop.

Because Chrome OS runs Android apps, targeting Android is the way to build Chrome OS apps today. However, while building Chrome OS apps with Android has always been possible, as described in these guidelines, it’s often difficult to know whether your Android app is going to run well on Chrome OS. To help with that problem, today we are adding a new set of lint rules to the Flutter tooling to catch violations of the most important Chrome OS best practice guidelines:

The Flutter Chrome OS lint rules in action

Once you put these Chrome OS lint rules in place, you’ll quickly see any problems in your Android app that would hamper it when running on Chrome OS. To learn how to take advantage of these rules, see the linting docs for Flutter Chrome OS.

But all of that is just the beginning -- the Flutter tools allow you to develop and test your apps directly on Chrome OS as well.

Chrome OS is a great developer platform to build Flutter apps

No matter what platform you're targeting, Flutter has support for rich IDEs and programming tools like Android Studio and Visual Studio Code. Over the last year, Chrome OS has been building support for running the Linux version of these tools with the beta of Linux on Chrome OS (aka Crostini). And, because Chrome OS also supports Android natively, you can configure the Flutter tooling to run your Android apps directly without an emulator involved.

The Flutter development tools running on Chrome OS

All of the great productivity of Flutter is available, including Stateful Hot Reload, seamless resizing, keyboard and mouse support, and so on. Recent improvements in Crostini, such as high DPI support, Crostini file system integration, easier adb, and so on, have made this experience even better! Of course, you don’t have to test against the Android container running on Chrome OS; you can also test against Android devices attached to your Chrome OS box. In short, Chrome OS is the ideal environment in which to develop and test your Flutter apps, especially when you’re targeting Chrome OS itself.

Customers love Flutter on Chrome OS

With its unique combination of simplicity, security, and capability, Chrome OS is an increasingly popular platform for enterprise applications. These apps often work with large quantities of data, whether it’s a chart or graph for visualization, or lists and forms for data entry. Flutter’s support for high-quality graphics, large-screen layouts, and input features (like text selection, tab order and mousewheel) makes it an ideal way to port mobile applications for the enterprise. One purveyor of such apps is AppTree, which uses Flutter and Chrome OS to solve problems for its enterprise customers.

“Creating a Chrome OS version of our app took very little effort. In 10 minutes we tweaked a few values and now our users have access to our app on a whole new class of devices. This is a huge deal for our enterprise customers who have been wanting access to our app on Desktop devices.”
--Matthew Smith, CTO, AppTree Software

By using Flutter to target Chrome OS, AppTree was able to start with their existing Flutter mobile app and easily adapt it to take advantage of the capabilities of Chrome OS.

Try Flutter on Chrome OS today!

If you’d like to target Chrome OS with Flutter, you can do so today simply by installing the latest version of Flutter. If you’d like to run the Flutter development tools on Chrome OS, you can follow these instructions to get started fast. To see a real-world app built with Flutter that has been optimized for Chrome OS, check out the Developer Quest sample that the Flutter DevRel team launched at the 2019 Google I/O conference. And finally, don’t forget to try out the Flutter Chrome OS linting rules to make sure that your Chrome OS apps are following the most important practices.

Flutter and Chrome OS go great together. What are you going to build?

Build your own Machine Learning Visualizations with the new TensorBoard API



When we open-sourced TensorFlow in 2015, it included TensorBoard, a suite of visualizations for inspecting and understanding your TensorFlow models and runs. TensorBoard included a small, predetermined set of visualizations that are generic and applicable to nearly all deep learning applications, such as observing how loss changes over time or exploring clusters in high-dimensional spaces. However, in the absence of reusable APIs, adding new visualizations to TensorBoard was prohibitively difficult for anyone outside of the TensorFlow team, leaving out a long tail of potentially creative, beautiful and useful visualizations that could be built by the research community.

To allow the creation of new and useful visualizations, we are announcing the release of a consistent set of APIs that allows developers to add custom visualization plugins to TensorBoard. We hope that developers will use this API to extend TensorBoard and ensure that it covers a wider variety of use cases.

We have updated the existing dashboards (tabs) in TensorBoard to use the new API, so they serve as examples for plugin creators. For the current listing of plugins included within TensorBoard, you can explore the tensorboard/plugins directory on GitHub. For instance, observe the new plugin that generates precision-recall curves:
The plugin demonstrates the three parts of a standard TensorBoard plugin:
  • A TensorFlow summary op used to collect data for later visualization. [GitHub]
  • A Python backend that serves custom data. [GitHub]
  • A dashboard within TensorBoard built with TypeScript and Polymer. [GitHub]
Additionally, like other plugins, the “pr_curves” plugin provides a demo that (1) users can look over in order to learn how to use the plugin and (2) the plugin author can use to generate example data during development. To further clarify how plugins work, we’ve also created a barebones TensorBoard “Greeter” plugin. This simple plugin collects greetings (simple strings preceded by “Hello, ”) during model runs and displays them. We recommend starting by exploring (or forking) the Greeter plugin as well as other existing plugins.
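
As a rough schematic of that three-part structure -- not the actual TensorBoard plugin API, which is documented in the GitHub links above -- a Greeter-style plugin boils down to something that records data during a run and a small backend route that serves it as JSON for a dashboard to fetch. Every file path and function name below is hypothetical.

```python
import json, os
from http.server import BaseHTTPRequestHandler, HTTPServer

LOG_DIR = "/tmp/greeter_demo"  # hypothetical stand-in for a TensorBoard log dir

def record_greeting(name, step):
    """Stand-in for the summary op: append one greeting per training step."""
    os.makedirs(LOG_DIR, exist_ok=True)
    with open(os.path.join(LOG_DIR, "greetings.jsonl"), "a") as f:
        f.write(json.dumps({"step": step, "greeting": "Hello, " + name}) + "\n")

class GreetingsHandler(BaseHTTPRequestHandler):
    """Stand-in for the Python backend: serve the recorded data as JSON so a
    TypeScript/Polymer dashboard could fetch and render it."""
    def do_GET(self):
        with open(os.path.join(LOG_DIR, "greetings.jsonl")) as f:
            body = json.dumps([json.loads(line) for line in f]).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    for step, name in enumerate(["world", "TensorBoard"]):
        record_greeting(name, step)
    HTTPServer(("localhost", 6007), GreetingsHandler).serve_forever()
```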

A notable example of how contributors are already using the TensorBoard API is Beholder, recently created by Chris Anderson while he was working on his master’s degree. Beholder shows a live video feed of data (e.g., gradients and convolution filters) as a model trains. You can watch the demo video here.
We look forward to seeing what innovations will come out of the community. If you plan to contribute a plugin to TensorBoard’s repository, you should get in touch with us first through the issue tracker with your idea so that we can help out and possibly guide you.

Acknowledgements
Dandelion Mané and William Chargin played crucial roles in building this API.



Harness the Power of Machine Learning in Your Browser with Deeplearn.js



Machine learning (ML) has become an increasingly powerful tool, one that can be applied to a wide variety of areas spanning object recognition, language translation, health and more. However, the development of ML systems is often restricted to those with computational resources and the technical expertise to work with commonly available ML libraries.

With PAIR — an initiative to study and redesign human interactions with ML — we want to open machine learning up to as many people as possible. In pursuit of that goal, we are excited to announce deeplearn.js 0.1.0, an open source WebGL-accelerated JavaScript library for machine learning that runs entirely in your browser, with no installations and no backend.
There are many reasons to bring machine learning into the browser. A client-side ML library can be a platform for interactive explanations, for rapid prototyping and visualization, and even for offline computation. And if nothing else, the browser is one of the world's most popular programming platforms.

While web machine learning libraries have existed for years (e.g., Andrej Karpathy's convnetjs), they have been limited by the speed of JavaScript, or have been restricted to inference rather than training (e.g., TensorFire). By contrast, deeplearn.js offers a significant speedup by exploiting WebGL to perform computations on the GPU, along with the ability to do full backpropagation.

The API mimics the structure of TensorFlow and NumPy, with a delayed execution model for training (like TensorFlow), and an immediate execution model for inference (like NumPy). We have also implemented versions of some of the most commonly-used TensorFlow operations. With the release of deeplearn.js, we will be providing tools to export weights from TensorFlow checkpoints, which will allow authors to import them into web pages for deeplearn.js inference.
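
The split between the two execution models is easiest to illustrate in Python, the home of the libraries deeplearn.js mimics. The toy deferred-graph classes below are a hypothetical stand-in for the idea, not deeplearn.js or TensorFlow code.

```python
import numpy as np

# Immediate execution (NumPy-like): every expression is evaluated on the spot.
x = np.array([1.0, 2.0, 3.0])
print(x * 2.0 + 1.0)  # computed immediately

# Delayed execution (TensorFlow-graph-like): describe the computation first,
# then run it later -- possibly many times, with different inputs.
class Node:
    def __init__(self, fn, *inputs):
        self.fn, self.inputs = fn, inputs
    def run(self, feeds):
        args = [i.run(feeds) if isinstance(i, Node) else i for i in self.inputs]
        return self.fn(*args)

class Placeholder(Node):
    def __init__(self, name):
        self.name = name
    def run(self, feeds):
        return feeds[self.name]

inp = Placeholder("x")
graph = Node(np.add, Node(np.multiply, inp, 2.0), 1.0)  # build: nothing runs yet
print(graph.run({"x": np.array([1.0, 2.0, 3.0])}))      # execute on demand
```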

You can explore the potential of this library by training a convolutional neural network to recognize photos and handwritten digits — all in your browser without writing a single line of code.
We're releasing a series of demos that show deeplearn.js in action. Play with an image classifier that uses your webcam in real-time and watch the network’s internal representations of what it sees. Or generate abstract art videos at a smooth 60 frames per second. The deeplearn.js homepage contains these and other demos.

Our vision is that this library will significantly increase visibility and engagement with machine learning, giving developers access to powerful tools while simultaneously providing the everyday user with a way to interact with them. We’re looking forward to collaborating with the open source community to drive this vision forward.

Build beautiful apps and websites with modular, customizable UI components

Posted by Adrian Secord and Omer Ziv, Material Design

Material Components lets you build easily for Android, iOS, and the web using open-source code for Material Design, a shared set of principles uniting style, brand, interaction, and motion.

These components are regularly updated by a team of engineers and designers to follow the latest Material Design guidelines, ensuring well-crafted implementations that meet development standards such as internationalization and accessibility support.

  • Accurate: Pixel-perfect components for Android, iOS, and the web.
  • Current: Maintained by Google engineers and designers, using the latest APIs and features.
  • Open-source: The code on GitHub is available for you to contribute to, or simply to use elements as needed.
  • Industry standards: Also used in Google's products, these components meet industry standards such as internationalization and accessibility.

Material Components are maintained by a core team of Android, iOS, and web engineers and UX designers at Google. We strive to support the best of each platform by:

  • Supporting older Android versions with graceful degradation
  • Developing iOS apps that use industry standards like Swift, Objective-C, and storyboards
  • Integrating seamlessly with popular web frameworks and libraries

With these components, your team can easily develop rich user experiences using Material Design. We'll be continually updating the components to match the latest Material Design guidelines, and we're looking forward to you and your team contributing to the project. To get the latest news and chat with us directly, please check out our GitHub repos, follow us on Twitter (@materialdesign), and visit us at https://material.io/components/.

How to measure translation quality in your user interfaces



Worldwide, there are about 200 languages that are spoken by at least 3 million people. In this global context, software developers are required to translate their user interfaces into many languages. While graphical user interfaces have evolved substantially when compared to text-based user interfaces, they still rely heavily on textual information. The perceived language quality of translated user interfaces (UIs) can have a significant impact on the overall quality and usability of a product. But how can software developers and product managers learn more about the quality of a translation when they don’t speak the language themselves?

Key information in interaction elements and content is mostly conveyed through text. This can be illustrated by removing text elements from a UI, as shown in the figure below.
Three versions of the YouTube UI: (a) the original, (b) YouTube without text elements, and (c) YouTube without graphic elements. It becomes apparent how the textless version is stripped of the most useful information: it is almost impossible to choose a video to watch, and navigating the site is impossible.
In "Measuring user rated language quality: Development and validation of the user interface Language Quality Survey (LQS)", recently published in the International Journal of Human-Computer Studies, we describe the development and validation of a survey that enables users to provide feedback about the language quality of the user interface.

UIs are generally developed in one source language and translated afterwards, string by string. The process of translation is prone to errors and might introduce problems that are not present in the source, most often due to difficulties in the translation process. For example, the word “auto” can be translated to French as automatique (automatic) or automobile (car), which have very different meanings. Translators might choose the wrong term if context is missing during the process. Another problem arises from words that act as a verb when placed on a button but as a noun when part of a label. For example, “access” can stand for “you have access” (as a label) or “you can request access” (as a button).

Further pitfalls include gender, prepositions without context, and other characteristics of the source text that might influence translation. These problems are sometimes aggravated by the fact that translations are made by different linguists at different points in time. Such mistranslations might not only negatively affect trustworthiness and brand perception, but also the acceptance of the product and its perceived usefulness.

This work was motivated by the fact that in 2012, the YouTube internationalization team had anecdotal evidence suggesting that some language versions of YouTube might benefit from improvement efforts. While expert evaluations led to significant improvements in text quality, these evaluations were expensive and time-consuming. Therefore, it was decided to develop a survey that enables users to provide feedback about the language quality of the user interface, as a scalable way of gathering quantitative data about language quality.

The Language Quality Survey (LQS) contains 10 questions about language quality. The first five questions form the factor “Readability”, which describes how natural and smooth the text is to read. For instance, one question targets ease of understanding (“How easy or difficult to understand is the text used in the [product name] interface?”). Questions 6 to 9 summarize the frequency of (in)consistencies in the text and form the factor “Linguistic Correctness”. The full survey can be found in the publication.

Case study: applying the LQS in the field

As the LQS was developed to discover problematic translations of the YouTube interface and to allow focused quality-improvement efforts, it was made available in over 60 languages, and data were gathered for all of these versions of the YouTube interface. To understand the quality of each UI version, we compared the results for the translated versions to the source language (here: US English). We first inspected the global item, in combination with Linguistic Correctness and Readability. Second, we inspected each item separately to understand which aspect of Linguistic Correctness or Readability showed worse (or better) values. Here are some results (a rough sketch of this kind of comparison follows the list below):
  • The data revealed that about one third of the languages showed subpar language quality levels, when compared to the source language.
  • To understand the source of these problems and fix them, we analyzed the qualitative feedback users had provided (every time someone selected one of the two lowest scale points, indicating a problem with the language, a text box was surfaced asking them to provide examples or links illustrating the issues).
  • The analysis of these comments provided linguists with valuable feedback of various kinds. For instance, users pointed to confusing terminology, untranslated words that were missed during translation, typographical or grammatical problems, words that were translated but are commonly used in English, or screenshots in help pages that were in English but needed to be localized. Some users also pointed to readability aspects such as sections with old fashioned or too formal tone as well as too informal translations, complex technical or legal wordings, unnatural translations or rather lengthy sections of text. In some languages users also pointed to text that was too small or criticized the readability of the font that was used.
  • In parallel, in-depth expert reviews (so-called “language find-its”) were organized. In these sessions, a group of experts for each language met and screened all of YouTube to discover aspects of the language that could be improved, and decided on concrete actions to fix them. By using the LQS data to select target languages, it was possible to reduce the number of language find-its to about one third of what would have been required had all languages been screened.
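
The sketch below illustrates the kind of per-language comparison described above, assuming each response is ten item scores on an ordinal scale (items 1-5 for Readability, 6-9 for Linguistic Correctness, 10 as a global item). The sample data, the 0.5-point flagging margin, and the language codes are hypothetical, not the published methodology.

```python
from statistics import mean

# Hypothetical responses: language -> list of 10-item answer lists.
responses = {
    "en-US": [[6, 6, 7, 6, 6, 7, 6, 7, 6, 6], [7, 6, 6, 7, 6, 6, 7, 6, 6, 7]],
    "xx-XX": [[4, 5, 4, 4, 5, 3, 4, 4, 3, 4], [5, 4, 4, 5, 4, 4, 3, 4, 4, 4]],
}

def subscale_means(answer_sets):
    readability = mean(a for ans in answer_sets for a in ans[0:5])
    correctness = mean(a for ans in answer_sets for a in ans[5:9])
    global_item = mean(ans[9] for ans in answer_sets)
    return readability, correctness, global_item

baseline = subscale_means(responses["en-US"])  # source-language reference
for lang, answer_sets in responses.items():
    scores = subscale_means(answer_sets)
    # Flag a language when any subscale trails the source language by a margin.
    flagged = any(s < b - 0.5 for s, b in zip(scores, baseline))
    print(lang, [round(s, 2) for s in scores], "needs review" if flagged else "ok")
```
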
LQS has since been successfully adapted and used for various Google products such as Docs, Analytics, or AdWords. We have found the LQS to be a reliable, valid and useful tool to approach language quality evaluation and improvement. The LQS can be regarded as a small piece in the puzzle of understanding and improving localization quality. Google is making this survey broadly available, so that everyone can start improving their products for everyone around the world.