GDE community highlight: Lars Knudsen

Posted by Monika Janota, Community Manager

Lars Knudsen is a Google Developer Expert; we talked to him about how a $10 device can make computers more accessible for people with disabilities.
 

Monika: What inspired you to become a developer? What’s your current professional focus?

Lars: I got my MSc in engineering, but in fact my interest in tech started much earlier. When I was a kid in the 80s, my father owned a computing company working with graphic design. Sometimes, especially during the summer holidays, he would take me to work with him. At times, some of his employees would keep an eye on me. There was this really smart guy who once said to me, “Lars, I need to get some work done, but here's a C manual, and there’s a computer over there. Here’s how you start a C compiler. If you have any questions, come and ask me.” I started to write short texts that were translated into something the computer could understand. It seemed magical to me. I was 11 years old when I started and around seventh grade, I was able to create small applications for my classmates or to be used at school. That’s how it started.

Over the years, I’ve worked for many companies, including Nokia, Maersk, and Openwave. At the beginning, as in many other professions, knowing a little makes you feel like you can do everything, but with time you learn that each company has its own way of doing things.

After a few years of working for a medical company, I started my own business in 1999. I worked as a freelance contractor and, thanks to that, had the chance to get to know multiple organizations quickly. After completing the first five contracts, I found out that every company thinks they’ve found the perfect setup, but all of them are completely different. At that time, I was also exposed to a lot of different technologies, operating systems etc. Around my early twenties, my mindset changed. At the beginning, I was strictly focused on one technology and wanted to learn all about it. With time, I started to think about combining technologies as a way of improving our lives. I have a particular interest in narrowing the gap between what we call the A and the B team in the world. I try to transfer as much knowledge as possible to regions where people don’t have the luxury of owning a computer or studying at university free of charge.

I continue to work as a contractor for external partners but, whenever possible, I try to choose projects that have some kind of positive impact on the environment or society. I’m currently working on embedded software for a hearing-aid company called Oticon. Software-wise, I’ve been working on everything from the tiniest microcontrollers to the cloud; a lot of what I do revolves around the web. I’m trying to combine technologies whenever it makes sense.

Monika: Were you involved in developer communities before joining the Google Developer Experts program?

Lars: Yes, I was engaged in meetups and conferences. I first connected with the community while working for Nokia. Around 2010, I met Kenneth Rohde Christiansen, who became a GDE before me. He inspired me to see how web technologies can be useful for aspiring tech professionals in developing countries. Developing and deploying solutions using C++, C# or Java requires some years of experience, but everyone who has access to a computer, browser, and notepad can start developing web-based applications and learn really fast. It’s possible to build a fully functional application with limited resources, and ramp up from nothing. That’s why I call the web a very democratizing technology stack.

But back to the community—after a while I got interested in web standardization and the problems that bleeding-edge web technologies could solve. I experimented with new capabilities in a browser before release. I was working for Nokia at the time, developing for a Linux-based flagship device, the N9. The browser we built was WebKit-based, and I got some great experience developing features for a large open source project. In the years after leaving Nokia, I got involved in web conferences and meetups, so it made sense to join the GDE community in 2017.

I really enjoy the community work and everything we’re doing together, especially the pre-pandemic Chrome Developer Summits, where I got to help with booth duty alongside a bunch of awesome Google Engineers and other GDEs.

Monika: What advice would you give to a young developer who’s just starting their professional career and is not sure which path to take?

Lars: I’d say from my own experience—if you can afford it—consider freelancing for a couple of different companies. This way, you’ll be exposed to code in many different forms and stages of development. You’ll get to know a multitude of operating systems and languages, and learn how to resolve problems in many ways. This helped me a lot: I gained experience as a senior developer in my twenties. This approach will help you achieve your professional goals faster.

Besides that, have fun, explore, play with the hardware and software. Consider building something that solves a real problem—maybe for your friends, family, or a local business. Don’t be afraid to jump into something you’ve never done before.

Monika: What does the future hold for web technologies?

Lars: I think that for a couple of years now the web has been fully capable of providing a platform for large field applications, both for the consumer and for business. On the server side of things, web technologies offer a seamless experience, especially for frontend developers who want to build a backend component; it’s easier for them to get started now. I know people who were using both Firebase and Heroku to get the job done. And this trend will grow—web technologies will be enough to build complex solutions of any kind. I believe that the Web Capabilities project (Project Fugu) really unlocks that potential.

Looking at it from a slightly different point of view, I also think that if we provide full documentation and in-depth articles not only in English but also in other languages (for example, Spanish and Portuguese), we would unlock a lot of potential in Latin America—and other regions, of course. Developers there often don’t know English well enough to fully understand all the relevant articles. We should also give them the opportunity to learn as early as possible, even before they start university, while still in their hometowns. They may use those skills to help local communities and businesses before they leave home and maybe never come back.

Thomas: You came a long way from doing C development on a random computer to hacking on hardware. How did you do that?

Lars: I started taking apart a lot of hardware I had at home. My dad was not always happy when I couldn’t put it back together. With time, I learned how to build some small devices, but it really took off much later, around the time I joined Nokia, where I got my embedded experience. I had the chance to build small screensavers and components for the Series 30 phones. I was really passionate about it and could really think outside the box. They assigned me a task to build a Snake game for those devices. It was a very interesting experience. The main difference between building embedded systems and most other things (including the web) is that you have to leave a small footprint—you don’t have much space or memory to use. While building Snake, the RAM I had available was less than one-third of the frame buffer (around 120 x 120 pixels). I had to come up with ways to algorithmically rejoin components on screen so they’d look static, as if they were tiles. I learned a lot—that was the move from larger systems to small, embedded solutions.

Thomas: The skill set of a typical frontend developer is very different from the skill set of someone who builds embedded hardware. How would you encourage a frontend developer to look into hardware and to start thinking in binary?

Lars: I think that the first step is to look at some of the Fugu APIs that work in Chrome and Edge, and are built into all the major systems today. That’s all you need at the start.

Another thing is that the toolchains for building embedded solutions have a steep learning curve. If you want to build your own custom hardware, start with Arduino or ESP32—something that is easy to buy and fairly cheap. With the right development environment, you can get your project up and running in no time.

You could also buy a heart rate monitor or a multisensor unit that already exposes Bluetooth GATT services, so you don’t have to build your own hardware or firmware—you can use what’s already there and start experimenting with the Web Bluetooth API to communicate with it.
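
Lars is describing the Web Bluetooth API in the browser, but the underlying GATT flow is the same everywhere. As a rough illustration only, here is a minimal sketch using the Python bleak library to subscribe to the standard Heart Rate Measurement characteristic; the device-name match is a placeholder you would adapt to your own monitor.

```python
import asyncio
from bleak import BleakScanner, BleakClient

# Standard Bluetooth SIG UUID for the Heart Rate Measurement characteristic.
HR_MEASUREMENT = "00002a37-0000-1000-8000-00805f9b34fb"

def on_heart_rate(_, data: bytearray):
    # Flags bit 0 selects the value format: 0 -> uint8, 1 -> uint16 little-endian.
    bpm = data[1] if not (data[0] & 0x01) else int.from_bytes(data[1:3], "little")
    print(f"heart rate: {bpm} bpm")

async def main():
    devices = await BleakScanner.discover(timeout=5.0)
    # Placeholder match: pick the first device advertising "HR" in its name.
    monitor = next(d for d in devices if d.name and "HR" in d.name)
    async with BleakClient(monitor) as client:
        await client.start_notify(HR_MEASUREMENT, on_heart_rate)
        await asyncio.sleep(30)  # stream notifications for 30 seconds

asyncio.run(main())
```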

There are also devices that use a serial protocol—for these, you can use the Web Serial API (also Fugu). Recently I’ve been looking into using the WebHID API, which enables you to talk to all the human interface devices that everyone has access to. I found some old ones in my basement that had not been supported by any operating system for years, but thanks to reverse engineering it took me a few hours to re-enable them.

There are different approaches depending on what you want to build, but to a web developer I would say: get a solid sensor unit, maybe a Thingy:52 from Nordic Semiconductor. It has a lot of sensors, and you can hook it up to your web application with very little effort.

Thomas: Connecting to the device is the first step, but then speaking to it effectively—that’s a whole other thing. How come you did not give up after facing obstacles? What kept you motivated to continue working?

Lars: For me personally the social aspect of solving a problem was the most important. When I started working on my own embedded projects, I had a vision and a desire to build a science lab in a box for developing regions. My wife is from Mexico and I saw some of the schools there; some that are located outside of the big cities are pretty shabby, without access to the materials and equipment that we have in our part of the world.

The passion for building something that can potentially be used to help others—that’s what kept me going. I also really enjoyed the community support. I reached out to some people at Google and all were extremely helpful and patiently answered all of my questions.

Thomas: A lot of people have some sort of hardware at home, but don’t know what to do with it. How do you find inspiration for all your amazing projects, in particular the one under the working name SimpleMouse?

Lars: Well, recently I have been in fact reviving a lot of old hardware, but for this particular project—the name has not been set yet, but let’s call it SimpleMouse—I used my experience. I worked with some accessibility solutions earlier and I saw how some of them just don’t work anymore; you’d need to have an old Windows XP with certain software installed to run them. You can’t really update those, you can only use those at home because you can’t move your setup.

Because of that, I wondered how to combine my skills from the embedded world with project Fugu and what is now possible on the web to create cheap, affordable hardware combined with easy-to-understand software on both sides, so people can build on that.

For that particular project, I took a small USB dongle based on the nRF52840 chip. It communicates with Bluetooth on one side and USB on the other. You can basically program it to be anything on both sides. And then I thought about the devices that control a computer—a mouse and a keyboard. Some people with disabilities may find it difficult to operate those devices, and I wanted to help them.

The first thing I did was to make sure that any operating system would see the USB dongle as a mouse. You can control it from a native application or a web application, directly over Bluetooth. After that, I built a web application—a simple template that people can extend the way they want using web components. Thanks to that, anyone can control their computer from an Android phone with a web app that I made in just a couple of hours.

Having that set up will enable anyone in the world with some web experience to build, in a matter of days, a very customized solution for anyone with a disability who wants to control their computer. The cool thing is that you can take it with you anywhere you go and use it with other devices as well. It will be the exact same experience. To me, the portability and affordability of the device are very important because people are no longer confined to using their own devices, and are no longer limited to one location.

Thomas: Did you have a chance to test the device in real life?

Lars: Actually during my last trip to Mexico I discussed it with a web professional living there; he’s now looking into the possibilities of using the device locally. Over there the equipment is really expensive, but a USB dongle normally costs around ten US dollars. He’s now checking if we could build local setups there to try it out. But I haven’t done official trials yet here in Denmark.

Thomas: Many devices designed to assist people with disabilities are really expensive. Are you planning on cooperating with any particular company and putting it into production for a fraction of the price of that expensive equipment?

Lars: Yes, definitely! I’ve already been talking to a local hardware manufacturer about that. Of course, the device won’t replace all those highly specialized solutions, but it can be the first step to building something bigger—for example, using voice recognition, already available for web technologies. It’ll be an easy way of controlling devices using your Android phone; it can work with a device of any kind.

Just being able to build whatever you want on the web and to use that to control any host computer opens up a lot of possibilities.

Thomas: Are you releasing your Zephyr project as open source? What kind of license do you use? Are there plans to monetize the project?

Lars: Yes, the solution is open source. I did not put a specific license on it, but I think Apache 2.0 would be the way to go. Many major companies use this license, including Google. When I worked on SimpleMouse, I did not think about monetizing the project—that was not my goal. But I also think it would make sense to try to put it into production in some way, and with this comes cost. The ultimate goal is to make it available. I’d love to see it being implemented at a low cost and on a large scale.

Beta Channel Update for Desktop

The Chrome team is excited to announce the promotion of Chrome 109 to the Beta channel for Windows, Mac and Linux. Chrome 109.0.5414.25 contains our usual under-the-hood performance and stability tweaks, but there are also some cool new features to explore - please head to the Chromium blog to learn more!



A full list of changes in this build is available in the log. Interested in switching release channels? Find out how here. If you find a new issue, please let us know by filing a bug. The community help forum is also a great place to reach out for help or learn about common issues.



Prudhvikumar Bommana, Google Chrome


Talking to Robots in Real Time

A grand vision in robot learning, going back to the SHRDLU experiments in the late 1960s, is that of helpful robots that inhabit human spaces and follow a wide variety of natural language commands. Over the last few years, there have been significant advances in the application of machine learning (ML) for instruction following, both in simulation and in real-world systems. Recent PaLM-SayCan work has produced robots that leverage language models to plan long-horizon behaviors and reason about abstract goals. Code as Policies has shown that code-generating language models combined with pre-trained perception systems can produce language-conditioned policies for zero-shot robot manipulation. Despite this progress, an important missing property of current "language in, actions out" robot learning systems is real-time interaction with humans.

Ideally, robots of the future would react in real time to any relevant task a user could describe in natural language. Particularly in open human environments, it may be important for end users to customize robot behavior as it is happening, offering quick corrections ("stop, move your arm up a bit") or specifying constraints ("nudge that slowly to the right"). Furthermore, real-time language could make it easier for people and robots to collaborate on complex, long-horizon tasks, with people iteratively and interactively guiding robot manipulation with occasional language feedback.

The challenges of open-vocabulary language following. To be successfully guided through a long horizon task like "put all the blocks in a vertical line", a robot must respond precisely to a wide variety of commands, including small corrective behaviors like "nudge the red circle right a bit".

However, getting robots to follow open-vocabulary language poses a significant challenge from an ML perspective. This is a setting with an inherently large number of tasks, including many small corrective behaviors. Existing multitask learning setups make use of curated imitation learning datasets or complex reinforcement learning (RL) reward functions to drive the learning of each task, and this significant per-task effort is difficult to scale beyond a small predefined set. Thus, a critical open question in the open-vocabulary setting is: how can we scale the collection of robot data to include not dozens, but hundreds of thousands of behaviors in an environment, and how can we connect all these behaviors to the natural language an end user might actually provide?

In Interactive Language, we present a large-scale imitation learning framework for producing real-time, open-vocabulary, language-conditionable robots. After training with our approach, we find that an individual policy is capable of addressing over 87,000 unique instructions (an order of magnitude larger than prior work), with an estimated average success rate of 93.5%. We are also excited to announce the release of Language-Table, the largest available language-annotated robot dataset, which we hope will drive further research focused on real-time language-controllable robots.




Guiding robots with real time language.

Real Time Language-Controllable Robots

Key to our approach is a scalable recipe for creating large, diverse language-conditioned robot demonstration datasets. Unlike prior setups that define all the skills up front and then collect curated demonstrations for each skill, we continuously collect data across multiple robots without scene resets or any low-level skill segmentation. All data, including failure data (e.g., knocking blocks off a table), goes through a hindsight language relabeling process to be paired with text. Here, annotators watch long robot videos to identify as many behaviors as possible, marking when each began and ended, and use freeform natural language to describe each segment. Importantly, in contrast to prior instruction following setups, all skills used for training emerge bottom up from the data itself rather than being determined upfront by researchers.
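
As a rough sketch of what one hindsight-relabeled training example could look like, the record below uses hypothetical field names; the released Language-Table dataset defines the actual schema.

```python
# Hypothetical record layout for one relabeled segment; the real Language-Table
# schema may differ. An annotator watched the long robot video, marked where a
# behavior started and ended, and described it in freeform language.
example = {
    "instruction": "nudge the red circle right a bit",  # annotator's freeform text
    "start_frame": 1240,                                # marked segment start
    "end_frame": 1385,                                  # marked segment end
    "observations": [],  # per-frame RGB images within the segment
    "actions": [],       # per-frame low-level robot actions within the segment
}
print(example["instruction"])
```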

Our learning approach and architecture are intentionally straightforward. Our robot policy is a cross-attention transformer, mapping 5 Hz video and text to 5 Hz robot actions, using a standard supervised learning behavioral cloning objective with no auxiliary losses. At test time, new spoken commands can be sent to the policy (via speech-to-text) at any time, up to 5 Hz.
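
The objective itself is plain supervised regression from observations to the teleoperator's actions. The sketch below computes that behavioral cloning loss on random data, with a single linear map standing in for the cross-attention transformer; the shapes and the linear policy are illustrative assumptions, not the actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes only; the real policy consumes 5 Hz video and text tokens.
obs_dim, act_dim, batch = 64, 2, 32
W = rng.normal(scale=0.01, size=(obs_dim, act_dim))

def policy(obs):
    """Toy stand-in for the cross-attention transformer: a single linear map."""
    return obs @ W

def bc_loss(obs, expert_actions):
    """Behavioral cloning: mean squared error to the demonstrated actions,
    with no auxiliary losses."""
    return np.mean((policy(obs) - expert_actions) ** 2)

obs = rng.normal(size=(batch, obs_dim))      # fused video + text features
expert = rng.normal(size=(batch, act_dim))   # demonstrated low-level actions
print("BC loss on a random batch:", bc_loss(obs, expert))
```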

Interactive Language: an imitation learning system for producing real time language-controllable robots.

Open Source Release: Language-Table Dataset and Benchmark

This annotation process allowed us to collect the Language-Table dataset, which contains over 440k real and 180k simulated demonstrations of the robot performing a language command, along with the sequence of actions the robot took during the demonstration. This is the largest language-conditioned robot demonstration dataset of its kind, by an order of magnitude. Language-Table comes with a simulated imitation learning benchmark that we use to perform model selection, which can be used to evaluate new instruction following architectures or approaches.


Dataset                        # Trajectories (k)   # Unique (k)
Episodic Demonstrations
  BC-Z                         25                   0.1
  SayCan                       68                   0.5
  Playhouse                    1,097                779
Hindsight Language Labeling
  BLOCKS                       30                   n/a
  LangLFP                      10                   n/a
  LOREL                        6                    1.7
  CALVIN                       20                   0.4
  Language-Table (real + sim)  623 (442 + 181)      206 (127 + 79)

We compare Language-Table to existing robot datasets, highlighting whether the data is simulated or real, the number of trajectories collected, and the number of unique language-describable tasks.

Learned Real Time Language Behaviors

Examples of short horizon instructions the robot is capable of following, sampled randomly from the full set of over 87,000.

Short-Horizon Instruction                        Success
push the blue triangle to the top left corner    80.0%
separate the red star and red circle             100.0%
nudge the yellow heart a bit right               80.0%
place the red star above the blue cube           90.0%
point your arm at the blue triangle              100.0%
push the group of blocks left a bit              100.0%
Average over 87k (95% CI)                        93.5% ± 3.42%

95% Confidence interval (CI) on the average success of an individual Interactive Language policy over 87,000 unique natural language instructions.

We find that interesting new capabilities arise when robots are able to follow real time language. We show that users can walk robots through complex long-horizon sequences using only natural language to solve for goals that require multiple minutes of precise, coordinated control (e.g., "make a smiley face out of the blocks with green eyes" or "place all the blocks in a vertical line"). Because the robot is trained to follow open vocabulary language, we see it can react to a diverse set of verbal corrections (e.g., "nudge the red star slightly right") that might otherwise be difficult to enumerate up front.

Examples of long horizon goals reached under real time human language guidance.

Finally, we see that real time language allows for new modes of robot data collection. For example, a single human operator can control four robots simultaneously using only spoken language. This has the potential to scale up the collection of robot data in the future without requiring undivided human attention for each robot.

One operator controlling multiple robots at once with spoken language.

Conclusion

While currently limited to a tabletop with a fixed set of objects, Interactive Language shows initial evidence that large-scale imitation learning can indeed produce real-time, interactive robots that follow freeform end-user commands. We open source Language-Table, the largest language-conditioned real-world robot demonstration dataset of its kind and an associated simulated benchmark, to spur progress in real-time language control of physical robots. We believe the utility of this dataset may not be limited to robot control; it may provide an interesting starting point for studying language- and action-conditioned video prediction, robot video-conditioned language modeling, or a host of other open questions in the broader ML context. See our paper and GitHub page to learn more.


Acknowledgements

We would like to thank everyone who supported this research. This includes robot teleoperators: Alex Luong, Armando Reyes, Elio Prado, Eric Tran, Gavin Gonzalez, Jodexty Therlonge, Joel Magpantay, Rochelle Dela Cruz, Samuel Wan, Sarah Nguyen, Scott Lehrer, Norine Rosales, Tran Pham, Kyle Gajadhar, Reece Mungal, and Nikauleene Andrews; robot hardware support and teleoperation coordination: Sean Snyder, Spencer Goodrich, Cameron Burns, Jorge Aldaco, Jonathan Vela; data operations and infrastructure: Muqthar Mohammad, Mitta Kumar, Arnab Bose, Wayne Gramlich; and the many who helped provide language labeling of the datasets. We would also like to thank Pierre Sermanet, Debidatta Dwibedi, Michael Ryoo, Brian Ichter and Vincent Vanhoucke for their invaluable advice and support.

Source: Google AI Blog


Open sourcing the attention center model

When you look at an image, what parts of an image do you pay attention to first? Would a machine be able to learn this? We provide a machine learning model that can be used to do just that. Why is it useful? The latest generation image format (JPEG XL) supports serving the parts that you pay attention to first, which results in an improved user experience: images will appear to load faster. The model is useful not only for encoding JPEG XL images, but whenever we need to know where a human would look first.

An open source attention center model

What regions in an image will attract the majority of human visual attention first? We trained a model, called the attention center model, to predict such a region when given an image; it is now open sourced. In addition to the model, we provide a script to use it in combination with the JPEG XL encoder: google/attention-center.

Some example predictions of our attention center model are shown in the following figure, where the green dot is the predicted attention center point for the image. Note that in the “two parrots” image both parrots’ heads are visually important, so the attention center point will be in the middle.

Four example images: a red door with a brass doorknob; a smiling girl with ribbons in her hair, a painted face, and a colorful sweater; a teal-shuttered, cathedral-style window in a sand-colored stucco wall with pink and red hibiscus in the foreground; and a blue-and-yellow macaw next to a red-and-green macaw.
Images are from Kodak image data set: http://r0k.us/graphics/kodak/

The model is 2 MB and in the TensorFlow Lite format. It takes an RGB image as input and outputs a 2D point, which is the predicted center of human attention on the image. That predicted center is the place where we should start with operations (decoding and displaying, in the JPEG XL case). This allows the most visually salient/important regions to be processed as early as possible. Check out the code and continue to build upon it!
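
As a rough sketch of how the released model can be run with the standard TensorFlow Lite interpreter: the file name and the zero-filled input below are placeholders, so check google/attention-center for the model's exact input size and preprocessing.

```python
import numpy as np
import tensorflow as tf

# Load the ~2 MB TFLite model; the path is a placeholder for the file shipped
# in the google/attention-center repository.
interpreter = tf.lite.Interpreter(model_path="attention_center.tflite")
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]
out = interpreter.get_output_details()[0]

# Build a dummy RGB input with the shape the model expects; replace this with
# a real, appropriately preprocessed image.
image = np.zeros(inp["shape"], dtype=inp["dtype"])

interpreter.set_tensor(inp["index"], image)
interpreter.invoke()
center = interpreter.get_tensor(out["index"])  # predicted 2D attention center
print("predicted attention center:", center)
```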

Attention center ground-truth data

To train a model to predict the attention center, we first need ground-truth data. Given an image, attention points can either be collected by eye trackers [1] or approximated by mouse clicks on a blurry version of the image [2]. We first apply temporal filtering to those attention points and keep only the initial ones, and then apply spatial filtering to remove noise (e.g., random gazes). We then compute the center of the remaining attention points as the attention center ground truth. The figure below illustrates this process.

Five images in a row showing the process for a photo of a person standing on a rock by the ocean: the original image, the gaze/attention points, temporal filtering, spatial filtering, and the resulting attention center.

Attention center model architecture

The attention center model is a deep neural net that takes an image as input and uses a pre-trained classification network, e.g., ResNet or MobileNet, as the backbone. The outputs of several intermediate layers of the backbone network are used as input for the attention center prediction module. These intermediate layers contain different kinds of information: shallow layers often carry low-level information like intensity, color, and texture, while deeper layers usually carry higher-level, more semantic information like shape and objects. All are useful for attention prediction. The attention center prediction module applies convolution, deconvolution, and/or resizing operators, together with aggregation and a sigmoid function, to generate a weighting map for the attention center. An operator (the Einstein summation operator in our case) is then applied to compute the (gravity) center from the weighting map. An L2 norm between the predicted attention center and the ground-truth attention center is used as the training loss.

Attention center model architecture
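
A minimal numeric sketch of the last two steps described above, assuming the network has already produced a sigmoid weighting map over a coarse grid; the grid size and the ground-truth point are made-up values for illustration.

```python
import numpy as np

H, W = 16, 16
rng = np.random.default_rng(0)

# Pretend this is the sigmoid weighting map produced by the prediction module,
# normalized so it behaves like a distribution over grid cells.
w = rng.random((H, W))
w /= w.sum()

ys = np.arange(H, dtype=np.float64)
xs = np.arange(W, dtype=np.float64)

# "Gravity" center of the map as an expectation, computed with einsum.
center_y = np.einsum("hw,h->", w, ys)
center_x = np.einsum("hw,w->", w, xs)
pred = np.array([center_x, center_y])

ground_truth = np.array([7.0, 9.0])          # made-up label for illustration
loss = np.linalg.norm(pred - ground_truth)   # L2 norm training loss
print("predicted center:", pred, "loss:", loss)
```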

Progressive JPEG XL images with attention center model

JPEG XL is a new image format that allows the user to encode images in a way that ensures the more interesting parts come first. This has the advantage that, when viewing images transferred over the web, we can already display the attention-grabbing part of the image, i.e. the parts where the user looks first, and as soon as the user looks elsewhere, ideally the rest of the image has already arrived and been decoded. The post "Using Saliency in progressive JPEG XL images" on the Google Open Source Blog illustrates how this works in principle. In short, in JPEG XL the image is divided into square groups (typically of size 256 x 256), and the JPEG XL encoder will choose a starting group in the image and then grow concentric squares around that group. It was this need to figure out where the attention center of an image is that led us to open source the attention center model, together with a script to use it in combination with the JPEG XL encoder. Progressive decoding of JPEG XL images has recently been added to Chrome, starting from version 107. At the moment, JPEG XL is behind an experimental flag, which can be enabled by going to chrome://flags and searching for “jxl”.

To try out how partially loaded progressive JPEG XL images look, you can go to https://google.github.io/attention-center/.

By Moritz Firsching, Junfeng He, and Zoltan Szabadka – Google Research

References

[1] Valliappan, Nachiappan, Na Dai, Ethan Steinberg, Junfeng He, Kantwon Rogers, Venky Ramachandran, Pingmei Xu, et al. "Accelerating eye movement research via accurate and affordable smartphone eye tracking." Nature Communications 11, no. 1 (2020): 1-12.

[2] Jiang, Ming, Shengsheng Huang, Juanyong Duan, and Qi Zhao. "SALICON: Saliency in context." In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072-1080. 2015.

Memory Safe Languages in Android 13

For more than a decade, memory safety vulnerabilities have consistently represented more than 65% of vulnerabilities across products, and across the industry. On Android, we’re now seeing something different - a significant drop in memory safety vulnerabilities and an associated drop in the severity of our vulnerabilities.

Looking at vulnerabilities reported in the Android security bulletin, which includes critical/high severity vulnerabilities reported through our vulnerability rewards program (VRP) and vulnerabilities reported internally, we see that the number of memory safety vulnerabilities has dropped considerably over the past few years/releases. From 2019 to 2022, the annual number of memory safety vulnerabilities dropped from 223 down to 85.

This drop coincides with a shift in programming language usage away from memory unsafe languages. Android 13 is the first Android release where a majority of new code added to the release is in a memory safe language.

As the amount of new memory-unsafe code entering Android has decreased, so too has the number of memory safety vulnerabilities. From 2019 to 2022, memory safety issues dropped from 76% down to 35% of Android’s total vulnerabilities. 2022 is the first year in which memory safety vulnerabilities do not represent a majority of Android’s vulnerabilities.

While correlation doesn’t necessarily mean causation, it’s interesting to note that the percent of vulnerabilities caused by memory safety issues seems to correlate rather closely with the development language that’s used for new code. This matches the expectations published in our blog post 2 years ago about the age of memory safety vulnerabilities and why our focus should be on new code, not rewriting existing components. Of course there may be other contributing factors or alternative explanations. However, the shift is a major departure from industry-wide trends that have persisted for more than a decade (and likely longer) despite substantial investments in improvements to memory unsafe languages.

We continue to invest in tools to improve the safety of our C/C++. Over the past few releases we’ve introduced the Scudo hardened allocator, HWASAN, GWP-ASAN, and KFENCE on production Android devices. We’ve also increased our fuzzing coverage on our existing code base. Vulnerabilities found using these tools contributed both to prevention of vulnerabilities in new code as well as vulnerabilities found in old code that are included in the above evaluation. These are important tools, and critically important for our C/C++ code. However, these alone do not account for the large shift in vulnerabilities that we’re seeing, and other projects that have deployed these technologies have not seen a major shift in their vulnerability composition. We believe Android’s ongoing shift from memory-unsafe to memory-safe languages is a major factor.

Rust for Native Code

In Android 12 we announced support for the Rust programming language in the Android platform as a memory-safe alternative to C/C++. Since then we’ve been scaling up our Rust experience and usage within the Android Open Source Project (AOSP).

As we noted in the original announcement, our goal is not to convert existing C/C++ to Rust, but rather to shift development of new code to memory safe languages over time.

In Android 13, about 21% of all new native code (C/C++/Rust) is in Rust. There are approximately 1.5 million total lines of Rust code in AOSP across new functionality and components such as Keystore2, the new Ultra-wideband (UWB) stack, DNS-over-HTTP3, Android’s Virtualization framework (AVF), and various other components and their open source dependencies. These are low-level components that require a systems language which otherwise would have been implemented in C++.

Security impact

To date, there have been zero memory safety vulnerabilities discovered in Android’s Rust code.


We don’t expect that number to stay zero forever, but given the volume of new Rust code across two Android releases, and the security-sensitive components where it’s being used, it’s a significant result. It demonstrates that Rust is fulfilling its intended purpose of preventing Android’s most common source of vulnerabilities. Historical vulnerability density is greater than 1/kLOC (1 vulnerability per thousand lines of code) in many of Android’s C/C++ components (e.g. media, Bluetooth, NFC, etc). Based on this historical vulnerability density, it’s likely that using Rust has already prevented hundreds of vulnerabilities from reaching production.
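
A quick back-of-the-envelope check of that claim, using only numbers quoted in this post and treating the historical density as a lower bound:

```python
rust_loc = 1_500_000              # ~1.5 million lines of Rust in AOSP (see above)
density = 1 / 1000                # >1 vulnerability per kLOC, historically
expected = rust_loc * density
print(f"roughly {expected:.0f}+ vulnerabilities expected at historical density")
# ~1500+, which is why "hundreds prevented" reads as a conservative estimate.
```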

What about unsafe Rust?

Operating system development requires accessing resources that the compiler cannot reason about. For memory-safe languages this means that an escape hatch is required to do systems programming. For Java, Android uses JNI to access low-level resources. When using JNI, care must be taken to avoid introducing unsafe behavior. Fortunately, it has proven significantly simpler to review small snippets of C/C++ for safety than entire programs. There are no pure Java processes in Android. It’s all built on top of JNI. Despite that, memory safety vulnerabilities are exceptionally rare in our Java code.

Rust likewise has the unsafe{} escape hatch which allows interacting with system resources and non-Rust code. Much like with Java + JNI, using this escape hatch comes with additional scrutiny. But like Java, our Rust code is proving to be significantly safer than pure C/C++ implementations. Let’s look at the new UWB stack as an example.

There are exactly two uses of unsafe in the UWB code: one to materialize a reference to a Rust object stored inside a Java object, and another for the teardown of the same. Unsafe was actively helpful in this situation because the extra attention on this code allowed us to discover a possible race condition and guard against it.

In general, use of unsafe in Android’s Rust appears to be working as intended. It’s used rarely, and when it is used, it’s encapsulating behavior that’s easier to reason about and review for safety.

Safety measures make memory-unsafe languages slow

Mobile devices have limited resources and we’re always trying to make better use of them to provide users with a better experience (for example, by optimizing performance, improving battery life, and reducing lag). Using memory unsafe code often means that we have to make tradeoffs between security and performance, such as adding additional sandboxing, sanitizers, runtime mitigations, and hardware protections. Unfortunately, these all negatively impact code size, memory, and performance.

Using Rust in Android allows us to optimize both security and system health with fewer compromises. For example, with the new UWB stack we were able to save several megabytes of memory and avoid some IPC latency by running it within an existing process. The new DNS-over-HTTP/3 implementation uses fewer threads to perform the same amount of work by using Rust’s async/await feature to process many tasks on a single thread in a safe manner.

What about non-memory-safety vulnerabilities?

The number of vulnerabilities reported in the bulletin has stayed somewhat steady over the past 4 years at around 20 per month, even as the number of memory safety vulnerabilities has gone down significantly. So, what gives? A few thoughts on that.

A drop in severity

Memory safety vulnerabilities disproportionately represent our most severe vulnerabilities. In 2022, despite only representing 36% of vulnerabilities in the security bulletin, memory-safety vulnerabilities accounted for 86% of our critical severity security vulnerabilities, our highest rating, and 89% of our remotely exploitable vulnerabilities. Over the past few years, memory safety vulnerabilities have accounted for 78% of confirmed exploited “in-the-wild” vulnerabilities on Android devices.

Many vulnerabilities have a well defined scope of impact. For example, a permissions bypass vulnerability generally grants access to a specific set of information or resources and is generally only reachable if code is already running on the device. Memory safety vulnerabilities tend to be much more versatile. Getting code execution in a process grants access not just to a specific resource, but everything that that process has access to, including attack surface to other processes. Memory safety vulnerabilities are often flexible enough to allow chaining multiple vulnerabilities together. The high versatility is perhaps one reason why the vast majority of exploit chains that we have seen use one or more memory safety vulnerabilities.

With the drop in memory safety vulnerabilities, we’re seeing a corresponding drop in vulnerability severity.



With the decrease in our most severe vulnerabilities, we’re seeing increased reports of less severe vulnerability types. For example, about 15% of vulnerabilities in 2022 are DoS vulnerabilities (requiring a factory reset of the device). This represents a drop in security risk.

Android appreciates our security research community and all contributions made to the Android VRP. We apply higher payouts for more severe vulnerabilities to ensure that incentives are aligned with vulnerability risk. As we make it harder to find and exploit memory safety vulnerabilities, security researchers are pivoting their focus towards other vulnerability types. Perhaps the total number of vulnerabilities found is primarily constrained by the total researcher time devoted to finding them. Or perhaps there’s another explanation that we have not considered. In any case, we hope that if our vulnerability researcher community is finding fewer of these powerful and versatile vulnerabilities, the same applies to adversaries.

Attack surface

Despite most of the existing code in Android being in C/C++, most of Android’s API surface is implemented in Java. This means that Java is disproportionately represented in the OS’s attack surface that is reachable by apps. This provides an important security property: most of the attack surface that’s reachable by apps isn’t susceptible to memory corruption bugs. It also means that we would expect Java to be over-represented when looking at non-memory safety vulnerabilities. It’s important to note however that types of vulnerabilities that we’re seeing in Java are largely logic bugs, and as mentioned above, generally lower in severity. Going forward, we will be exploring how Rust’s richer type system can help prevent common types of logic bugs as well.

Google’s ability to react

With the vulnerability types we’re seeing now, Google’s ability to detect and prevent misuse is considerably better. Apps are scanned to help detect misuse of APIs before being published on the Play store and Google Play Protect warns users if they have abusive apps installed.

What’s next?

Migrating away from C/C++ is challenging, but we’re making progress. Rust use is growing in the Android platform, but that’s not the end of the story. To meet the goals of improving security, stability, and quality Android-wide, we need to be able to use Rust anywhere in the codebase that native code is required. We’re implementing userspace HALs in Rust. We’re adding support for Rust in Trusted Applications. We’ve migrated VM firmware in the Android Virtualization Framework to Rust. With support for Rust landing in Linux 6.1 we’re excited to bring memory-safety to the kernel, starting with kernel drivers.

As Android migrates away from C/C++ to Java/Kotlin/Rust, we expect the number of memory safety vulnerabilities to continue to fall. Here’s to a future where memory corruption bugs on Android are rare!