Author Archives: Scott Westover

Rust in the Linux kernel

In our previous post, we announced that Android now supports the Rust programming language for developing the OS itself. Related to this, we are also participating in the effort to evaluate the use of Rust as a supported language for developing the Linux kernel. In this post, we discuss some technical aspects of this work using a few simple examples.

C has been the language of choice for writing kernels for almost half a century because it offers the level of control and predictable performance required by such a critical component. Density of memory safety bugs in the Linux kernel is generally quite low due to high code quality, high standards of code review, and carefully implemented safeguards. However, memory safety bugs do still regularly occur. On Android, vulnerabilities in the kernel are generally considered high-severity because they can result in a security model bypass due to the privileged mode that the kernel runs in.

We feel that Rust is now ready to join C as a practical language for implementing the kernel. It can help us reduce the number of potential bugs and security vulnerabilities in privileged code while playing nicely with the core kernel and preserving its performance characteristics.

Supporting Rust

We developed an initial prototype of the Binder driver to allow us to make meaningful comparisons between the safety and performance characteristics of the existing C version and its Rust counterpart. The Linux kernel has over 30 million lines of code, so naturally our goal is not to convert it all to Rust but rather to allow new code to be written in Rust. We believe this incremental approach allows us to benefit from the kernel’s existing high-performance implementation while providing kernel developers with new tools to improve memory safety and maintain performance going forward.

We joined the Rust for Linux organization, where the community had already done and continues to do great work toward adding Rust support to the Linux kernel build system. We also need designs that allow code in the two languages to interact with each other: we're particularly interested in safe, zero-cost abstractions that allow Rust code to use kernel functionality written in C, and how to implement functionality in idiomatic Rust that can be called seamlessly from the C portions of the kernel.

Since Rust is a new language for the kernel, we also have the opportunity to enforce best practices in terms of documentation and uniformity. For example, we have specific machine-checked requirements around the usage of unsafe code: for every unsafe function, the developer must document the requirements that need to be satisfied by callers to ensure that its usage is safe; additionally, for every call to unsafe functions (or usage of unsafe constructs like dereferencing a raw pointer), the developer must document the justification for why it is safe to do so.
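
As a rough illustration of this convention, here is a minimal userspace sketch. The functions read_u32 and read_via_reference are hypothetical and not part of the kernel crate. The unsafe function documents its requirements in a "# Safety" section, and each unsafe block carries a "SAFETY:" comment justifying it. Clippy's missing_safety_doc and undocumented_unsafe_blocks lints check conventions of this kind; whether the kernel tooling relies on exactly these lints is an assumption on our part.

```rust
/// Reads the `u32` that `ptr` points to.
///
/// # Safety
///
/// Callers must ensure that `ptr` is non-null, properly aligned, and points
/// to an initialized `u32` that stays valid for the duration of the call.
unsafe fn read_u32(ptr: *const u32) -> u32 {
    // SAFETY: the caller upholds the requirements in the `# Safety` section.
    unsafe { *ptr }
}

/// Safe wrapper: a shared reference always satisfies `read_u32`'s contract.
fn read_via_reference(value: &u32) -> u32 {
    // SAFETY: a `&u32` is non-null, aligned, and points to an initialized
    // `u32` that outlives this call.
    unsafe { read_u32(value as *const u32) }
}

fn main() {
    let x = 42u32;
    println!("{}", read_via_reference(&x));
}
```

The safe wrapper is the interesting part: once the justification is written down next to the unsafe block, callers of read_via_reference have nothing left to prove.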

Just as important as safety, Rust support needs to be convenient and helpful for developers to use. Let’s get into a few examples of how Rust can assist kernel developers in writing drivers that are safe and correct.

Example driver

We'll use an implementation of a semaphore character device. Each device has a current value; writes of n bytes result in the device value being incremented by n; reads decrement the value by 1 unless the value is 0, in which case they will block until they can decrement the count without going below 0.

Suppose semaphore is a file representing our device. We can interact with it from the shell as follows:

> cat semaphore

When semaphore is a newly initialized device, the command above will block because the device's current value is 0. It will be unblocked if we run the following command from another shell because it increments the value by 1, which allows the original read to complete:

> echo -n a > semaphore

We could also increment the count by more than 1 if we write more data, for example:

> echo -n abc > semaphore

increments the count by 3, so the next 3 reads won't block.

To allow us to show a few more aspects of Rust, we'll add the following features to our driver: remember what the maximum value was throughout the lifetime of a device, and remember how many reads each file issued on the device.

We'll now show how such a driver would be implemented in Rust, contrasting it with a C implementation. We note, however, that this work is still at an early stage, so all of it is subject to change. The aspect we'd like to emphasize is how Rust can assist the developer: for example, it allows us to eliminate or greatly reduce the chances of introducing whole classes of bugs at compile time, while remaining flexible and having minimal overhead.

Character devices

A developer needs to do the following to implement a driver for a new character device in Rust:

  1. Implement the FileOperations trait: all associated functions are optional, so the developer only needs to implement the relevant ones for their scenario. They relate to the fields in C's struct file_operations.
  2. Implement the FileOpener trait: it is a type-safe equivalent to C's open field of struct file_operations.
  3. Register the new device type with the kernel: this lets the kernel know what functions need to be called in response to files of this new type being operated on.
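
To make "all associated functions are optional" concrete, here is a simplified userspace sketch. FileOps, ZeroDevice, and Error are hypothetical stand-ins and not the kernel crate's FileOperations trait. Every trait method has a default body, so an implementer overrides only the operations its device supports.

```rust
/// Hypothetical error type standing in for kernel error codes.
#[derive(Debug, PartialEq)]
enum Error {
    InvalidArgument, // analogous to EINVAL
}

/// Simplified stand-in for a file-operations trait: every method has a
/// default body, so implementers override only what they need.
trait FileOps {
    fn read(&self, _buf: &mut Vec<u8>) -> Result<usize, Error> {
        Err(Error::InvalidArgument)
    }

    fn write(&self, _data: &[u8]) -> Result<usize, Error> {
        Err(Error::InvalidArgument)
    }
}

/// A device that only supports reads.
struct ZeroDevice;

impl FileOps for ZeroDevice {
    fn read(&self, buf: &mut Vec<u8>) -> Result<usize, Error> {
        buf.push(0);
        Ok(1)
    }
    // `write` is not overridden, so the default body applies.
}

fn main() {
    let dev = ZeroDevice;
    let mut buf = Vec::new();
    println!("read  -> {:?}", dev.read(&mut buf));
    println!("write -> {:?}", dev.write(b"abc"));
}
```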

The following outlines how the first two steps of our example compare in Rust and C:

Rust:

    impl FileOpener<Arc<Semaphore>> for FileState {
        fn open(
            shared: &Arc<Semaphore>
        ) -> KernelResult<Box<Self>> {
            [...]
        }
    }

    impl FileOperations for FileState {
        type Wrapper = Box<Self>;

        fn read(
            &self,
            _: &File,
            data: &mut UserSlicePtrWriter,
            offset: u64
        ) -> KernelResult<usize> {
            [...]
        }

        fn write(
            &self,
            data: &mut UserSlicePtrReader,
            _offset: u64
        ) -> KernelResult<usize> {
            [...]
        }

        fn ioctl(
            &self,
            file: &File,
            cmd: &mut IoctlCommand
        ) -> KernelResult<i32> {
            [...]
        }

        fn release(_obj: Box<Self>, _file: &File) {
            [...]
        }

        declare_file_operations!(read, write, ioctl);
    }

C:

    static int semaphore_open(struct inode *nodp,
                              struct file *filp)
    {
        struct semaphore_state *shared =
            container_of(filp->private_data,
                         struct semaphore_state,
                         miscdev);
        [...]
    }

    static ssize_t semaphore_write(struct file *filp,
                                   const char __user *buffer,
                                   size_t count, loff_t *ppos)
    {
        struct file_state *state = filp->private_data;
        [...]
    }

    static ssize_t semaphore_read(struct file *filp,
                                  char __user *buffer,
                                  size_t count, loff_t *ppos)
    {
        struct file_state *state = filp->private_data;
        [...]
    }

    static long semaphore_ioctl(struct file *filp,
                                unsigned int cmd,
                                unsigned long arg)
    {
        struct file_state *state = filp->private_data;
        [...]
    }

    static int semaphore_release(struct inode *nodp,
                                 struct file *filp)
    {
        struct file_state *state = filp->private_data;
        [...]
    }

    static const struct file_operations semaphore_fops = {
        .owner = THIS_MODULE,
        .open = semaphore_open,
        .read = semaphore_read,
        .write = semaphore_write,
        .compat_ioctl = semaphore_ioctl,
        .release = semaphore_release,
    };

Character devices in Rust benefit from a number of safety features:

  • Per-file state lifetime management: FileOpener::open returns an object whose lifetime is owned by the caller from then on. Any object that implements the PointerWrapper trait can be returned, and we provide implementations for Box<T> and Arc<T>, so developers that use Rust's idiomatic heap-allocated or reference-counted pointers have no additional requirements.

    All associated functions in FileOperations receive non-mutable references to self (more about this below), except the release function, which is the last function to be called and receives the plain object back (and its ownership with it). The release implementation can then defer the object destruction by transferring its ownership elsewhere, or destroy it then; in the case of a reference-counted object, 'destruction' means decrementing the reference count (and actual object destruction if the count goes to zero).

    That is, we use Rust's ownership discipline when interacting with C code by handing the C portion ownership of a Rust object, allowing it to call functions implemented in Rust, then eventually giving ownership back. So as long as the C code is correct, the lifetime of Rust file objects works seamlessly as well, with the compiler enforcing correct lifetime management on the Rust side. For example, open cannot return stack-allocated pointers or heap-allocated objects containing pointers to the stack, and ioctl/read/write cannot free (or modify without synchronization) the contents of the object stored in filp->private_data.

  • Non-mutable references: the associated functions called between open and release all receive non-mutable references to self because they can be called concurrently by multiple threads and Rust aliasing rules prohibit more than one mutable reference to an object at any given time.

    If a developer needs to modify some state (and they generally do), they can do so via interior mutability: mutable state can be wrapped in a Mutex<T> or SpinLock<T> (or atomics) and safely modified through them.

    This prevents, at compile-time, bugs where a developer fails to acquire the appropriate lock when accessing a field (the field is inaccessible), or when a developer fails to wrap a field with a lock (the field is read-only).

  • Per-device state: when file instances need to share per-device state, which is a very common occurrence in drivers, they can do so safely in Rust. When a device is registered, a typed object can be provided, and a non-mutable reference to it is passed in when FileOpener::open is called. In our example, the shared object is wrapped in Arc<T>, so files can safely clone it and hold on to a reference to it.

    The reason FileOpener is its own trait (as opposed to, for example, open being part of the FileOperations trait) is to allow a single file implementation to be registered in different ways.

    This eliminates opportunities for developers to get the wrong data when trying to retrieve shared state. For example, in C when a miscdevice is registered, a pointer to it is available in filp->private_data; when a cdev is registered, a pointer to it is available in inode->i_cdev. These structs are usually embedded in an outer struct that contains the shared state, so developers usually use the container_of macro to recover the shared state. Rust encapsulates all of this and the potentially troublesome pointer casts in a safe abstraction.

  • Static typing: we take advantage of Rust's support for generics to implement all of the above functions and types with static types. So there are no opportunities for a developer to convert an untyped variable or field to the wrong type. The C code in the table above has casts from an essentially untyped (void *) pointer to the desired type at the start of each function: this is likely to work fine when first written, but may lead to bugs as the code evolves and assumptions change. Rust would catch any such mistakes at compile time.

  • File operations: as we mentioned before, a developer needs to implement the FileOperations trait to customize the behavior of their device. They do this with a block starting with impl FileOperations for Device, where Device is the type implementing the file behavior (FileState in our example). Once inside this block, tools know that only a limited number of functions can be defined, so they can automatically insert the prototypes. (Personally, I use neovim and the rust-analyzer LSP server.)

    While we use this trait in Rust, the C portion of the kernel still requires an instance of struct file_operations. The kernel crate automatically generates one from the trait implementation (and optionally the declare_file_operations macro): although it has code to generate the correct struct, it is all const, so evaluated at compile-time with zero runtime cost.
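
The ownership hand-off described in the first bullet above can be sketched in userspace with the standard library's Box::into_raw and Box::from_raw. This is only an approximation under our own assumptions: the free functions open, read, and release below are hypothetical, and we do not claim that the kernel crate's PointerWrapper is implemented this way.

```rust
use std::ffi::c_void;
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical per-file state, loosely modeled on the article's FileState.
struct FileState {
    read_count: AtomicU64,
}

/// "open": allocate the state and hand ownership to the C side as an
/// untyped pointer (the role `filp->private_data` plays in the kernel).
fn open() -> *mut c_void {
    let state = Box::new(FileState { read_count: AtomicU64::new(0) });
    Box::into_raw(state) as *mut c_void
}

/// "read": borrow the state through a shared reference; it cannot be freed here.
///
/// # Safety
///
/// `ptr` must come from `open` and must not have been passed to `release`.
unsafe fn read(ptr: *mut c_void) -> u64 {
    // SAFETY: per the contract above, `ptr` points to a live `FileState`.
    let state = unsafe { &*(ptr as *const FileState) };
    state.read_count.fetch_add(1, Ordering::Relaxed) + 1
}

/// "release": take ownership back; the state is dropped immediately.
///
/// # Safety
///
/// `ptr` must come from `open` and must not be used again after this call.
unsafe fn release(ptr: *mut c_void) {
    // SAFETY: per the contract above, we are the sole owner of this allocation.
    drop(unsafe { Box::from_raw(ptr as *mut FileState) });
}

fn main() {
    let ptr = open();
    // SAFETY: `ptr` comes from `open` and is released exactly once, after the reads.
    unsafe {
        read(ptr);
        println!("reads so far: {}", read(ptr));
        release(ptr);
    }
}
```

Between open and release only a shared reference exists, so the only way to change read_count is through its atomic interface; that mirrors the non-mutable references bullet.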

Ioctl handling

For a driver to provide a custom ioctl handler, it needs to implement the ioctl function that is part of the FileOperations trait, as exemplified in the table below.

Rust:

    fn ioctl(
        &self,
        file: &File,
        cmd: &mut IoctlCommand
    ) -> KernelResult<i32> {
        cmd.dispatch(self, file)
    }

    impl IoctlHandler for FileState {
        fn read(
            &self,
            _file: &File,
            cmd: u32,
            writer: &mut UserSlicePtrWriter
        ) -> KernelResult<i32> {
            match cmd {
                IOCTL_GET_READ_COUNT => {
                    writer.write(
                        &self
                            .read_count
                            .load(Ordering::Relaxed))?;
                    Ok(0)
                }
                _ => Err(Error::EINVAL),
            }
        }

        fn write(
            &self,
            _file: &File,
            cmd: u32,
            reader: &mut UserSlicePtrReader
        ) -> KernelResult<i32> {
            match cmd {
                IOCTL_SET_READ_COUNT => {
                    self
                        .read_count
                        .store(reader.read()?,
                               Ordering::Relaxed);
                    Ok(0)
                }
                _ => Err(Error::EINVAL),
            }
        }
    }

C:

    #define IOCTL_GET_READ_COUNT _IOR('c', 1, u64)
    #define IOCTL_SET_READ_COUNT _IOW('c', 1, u64)

    static long semaphore_ioctl(struct file *filp,
                                unsigned int cmd,
                                unsigned long arg)
    {
        struct file_state *state = filp->private_data;
        void __user *buffer = (void __user *)arg;
        u64 value;

        switch (cmd) {
        case IOCTL_GET_READ_COUNT:
            value = atomic64_read(&state->read_count);
            if (copy_to_user(buffer, &value, sizeof(value)))
                return -EFAULT;
            return 0;
        case IOCTL_SET_READ_COUNT:
            if (copy_from_user(&value, buffer, sizeof(value)))
                return -EFAULT;
            atomic64_set(&state->read_count, value);
            return 0;
        default:
            return -EINVAL;
        }
    }

Ioctl commands are standardized such that, given a command, we know whether a user buffer is provided, its intended use (read, write, both, none), and its size. In Rust, we provide a dispatcher (accessible by calling cmd.dispatch) that uses this information to automatically create user memory access helpers and pass them to the caller.
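
To show what "standardized" means here, the following sketch packs and unpacks a command number using the generic Linux layout: 8 bits of command number, 8 bits of type, 14 bits of size, and 2 bits of direction. A few architectures use a different split, so treat the constants as an assumption for the common case. The helpers ioc and decode and the Decoded struct are our own illustrative names, not kernel crate APIs.

```rust
use std::mem::size_of;

/// Direction bits in the generic Linux ioctl encoding.
const IOC_WRITE: u32 = 1; // user space writes; the driver reads the buffer
const IOC_READ: u32 = 2; // user space reads; the driver writes the buffer

/// Packs a command: 2 direction bits, 14 size bits, 8 type bits, 8 number bits.
const fn ioc(dir: u32, ty: u8, nr: u8, size: u32) -> u32 {
    (dir << 30) | (size << 16) | ((ty as u32) << 8) | (nr as u32)
}

/// What a dispatcher can learn from the command number alone.
#[derive(Debug, PartialEq)]
struct Decoded {
    dir: u32,
    size: u32,
    ty: u8,
    nr: u8,
}

fn decode(cmd: u32) -> Decoded {
    Decoded {
        dir: cmd >> 30,
        size: (cmd >> 16) & 0x3fff,
        ty: ((cmd >> 8) & 0xff) as u8,
        nr: (cmd & 0xff) as u8,
    }
}

// Counterparts of _IOR('c', 1, u64) and _IOW('c', 1, u64).
const IOCTL_GET_READ_COUNT: u32 = ioc(IOC_READ, b'c', 1, size_of::<u64>() as u32);
const IOCTL_SET_READ_COUNT: u32 = ioc(IOC_WRITE, b'c', 1, size_of::<u64>() as u32);

fn main() {
    println!("{:#x} -> {:?}", IOCTL_GET_READ_COUNT, decode(IOCTL_GET_READ_COUNT));
    println!("{:#x} -> {:?}", IOCTL_SET_READ_COUNT, decode(IOCTL_SET_READ_COUNT));
}
```

Because the direction and size are recoverable from the number, a dispatcher can decide up front whether to build a reader or a writer and how many bytes it may touch.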

A driver is not required to use this though. If, for example, it doesn't use the standard ioctl encoding, Rust offers the flexibility of simply calling cmd.raw to extract the raw arguments and using them to handle the ioctl (potentially with unsafe code, which will need to be justified).

However, if a driver implementation does use the standard dispatcher, it will benefit from not having to implement any unsafe code, and:

  • The pointer to user memory is never a native pointer, so the developer cannot accidentally dereference it.
  • The types that allow the driver to read from user space only allow data to be read once, so we eliminate the risk of time-of-check to time-of-use (TOCTOU) bugs because when a driver needs to access data twice, it needs to copy it to kernel memory, where an attacker is not allowed to modify it. Excluding unsafe blocks, there is no way to introduce this class of bugs in Rust.
  • No accidental overflow of the user buffer: we'll never read or write past the end of the user buffer because this is enforced automatically based on the size encoded in the ioctl command. In our example above, the implementation of IOCTL_GET_READ_COUNT only has access to an instance of UserSlicePtrWriter, which limits the number of writable bytes to sizeof(u64) as encoded in the ioctl command.
  • No mixing of reads and writes: we'll never write buffers for ioctls that are only meant to read and never read buffers for ioctls that are only meant to write. This is enforced by read and write handlers only getting instances of UserSlicePtrWriter and UserSlicePtrReader respectively.
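
The read-once and bounded-size properties in the list above can be approximated in userspace with a reader that consumes its input as it goes. OnceReader and its methods are hypothetical and only mimic the idea behind UserSlicePtrReader; the real type copies from user memory rather than from a slice.

```rust
/// Hypothetical error type; `Fault` is analogous to EFAULT.
#[derive(Debug, PartialEq)]
enum Error {
    Fault,
}

/// Hands out each byte at most once and never reads past the length
/// fixed at construction.
struct OnceReader<'a> {
    remaining: &'a [u8],
}

impl<'a> OnceReader<'a> {
    /// `limit` plays the role of the size encoded in the ioctl command.
    fn new(buf: &'a [u8], limit: usize) -> Self {
        let end = limit.min(buf.len());
        OnceReader { remaining: &buf[..end] }
    }

    fn len(&self) -> usize {
        self.remaining.len()
    }

    /// Consumes the next 8 bytes and copies them into an owned value.
    fn read_u64(&mut self) -> Result<u64, Error> {
        let data = self.remaining;
        if data.len() < 8 {
            return Err(Error::Fault);
        }
        let (head, tail) = data.split_at(8);
        let mut bytes = [0u8; 8];
        bytes.copy_from_slice(head);
        self.remaining = tail; // the consumed bytes are no longer reachable
        Ok(u64::from_le_bytes(bytes))
    }
}

fn main() {
    let user_buffer = 5u64.to_le_bytes();
    let mut reader = OnceReader::new(&user_buffer, 8);
    println!("first read:  {:?}", reader.read_u64());
    println!("second read: {:?}", reader.read_u64());
}
```

A second read of the same bytes is impossible through this interface, so a driver that needs the value twice has to keep its own copy.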

All of the above could potentially also be done in C, but it's very easy for developers to (likely unintentionally) break contracts that lead to unsafety; Rust requires unsafe blocks for this, which should only be used in rare cases and brings additional scrutiny. Additionally, Rust offers the following:

  • The types used to read and write user memory do not implement the Send and Sync traits, which means that they (and pointers to them) are not safe to be used in another thread context. In Rust, if a driver developer attempted to write code that passed one of these objects to another thread (where it wouldn't be safe to use them because it isn't necessarily in the right memory manager context), they would get a compilation error.
  • When calling IoctlCommand::dispatch, one might understandably think that we need dynamic dispatching to reach the actual handler implementation (which would incur additional cost in comparison to C), but we don't. Our usage of generics will lead the compiler to monomorphize the function, which will result in static function calls that can even be inlined if the optimizer so chooses.
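
The difference between generic (monomorphized) and dynamic dispatch can be seen in a few lines. Handler, Doubler, dispatch_static, and dispatch_dynamic are hypothetical names chosen for this sketch; the kernel crate's dispatcher is more involved.

```rust
/// Hypothetical handler trait, loosely modeled on the article's IoctlHandler.
trait Handler {
    fn handle(&self, cmd: u32) -> i32;
}

struct Doubler;

impl Handler for Doubler {
    fn handle(&self, cmd: u32) -> i32 {
        (cmd as i32) * 2
    }
}

/// Generic: the compiler emits one copy per concrete `H`, so the call to
/// `handle` is a direct call that the optimizer may inline.
fn dispatch_static<H: Handler>(handler: &H, cmd: u32) -> i32 {
    handler.handle(cmd)
}

/// Trait object: the call goes through a vtable at run time.
fn dispatch_dynamic(handler: &dyn Handler, cmd: u32) -> i32 {
    handler.handle(cmd)
}

fn main() {
    println!("static:  {}", dispatch_static(&Doubler, 21));
    println!("dynamic: {}", dispatch_dynamic(&Doubler, 21));
}
```

Both calls return the same value; only the generic version lets the compiler resolve the target at compile time.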

Locking and condition variables

We allow developers to use mutexes and spinlocks to provide interior mutability. In our example, we use a mutex to protect mutable data; in the tables below we show the data structures we use in C and Rust, and how we implement a wait until the count is nonzero so that we can satisfy a read:

Rust:

    struct SemaphoreInner {
        count: usize,
        max_seen: usize,
    }

    struct Semaphore {
        changed: CondVar,
        inner: Mutex<SemaphoreInner>,
    }

    struct FileState {
        read_count: AtomicU64,
        shared: Arc<Semaphore>,
    }

C:

    struct semaphore_state {
        struct kref ref;
        struct miscdevice miscdev;
        wait_queue_head_t changed;
        struct mutex mutex;
        size_t count;
        size_t max_seen;
    };

    struct file_state {
        atomic64_t read_count;
        struct semaphore_state *shared;
    };

Rust:

    fn consume(&self) -> KernelResult {
        let mut inner = self.shared.inner.lock();
        while inner.count == 0 {
            if self.shared.changed.wait(&mut inner) {
                return Err(Error::EINTR);
            }
        }
        inner.count -= 1;
        Ok(())
    }

C:

    static int semaphore_consume(
        struct semaphore_state *state)
    {
        DEFINE_WAIT(wait);

        mutex_lock(&state->mutex);
        while (state->count == 0) {
            prepare_to_wait(&state->changed, &wait,
                            TASK_INTERRUPTIBLE);
            mutex_unlock(&state->mutex);
            schedule();
            finish_wait(&state->changed, &wait);
            if (signal_pending(current))
                return -EINTR;
            mutex_lock(&state->mutex);
        }

        state->count--;
        mutex_unlock(&state->mutex);

        return 0;
    }

We note that such waits are not uncommon in the existing C code, for example, a pipe waiting for a "partner" to write, a unix-domain socket waiting for data, an inode search waiting for completion of a delete, or a user-mode helper waiting for state change.

The following are benefits from the Rust implementation:

  • The Semaphore::inner field is only accessible when the lock is held, through the guard returned by the lock function. So developers cannot accidentally read or write protected data without locking it first. In the C example above, count and max_seen in semaphore_state are protected by mutex, but there is no enforcement that the lock is held while they're accessed.
  • Resource Acquisition Is Initialization (RAII): the lock is unlocked automatically when the guard (inner in this case) goes out of scope. This ensures that locks are always unlocked: if the developer needs to keep a lock locked, they can keep the guard alive, for example, by returning the guard itself; conversely, if they need to unlock before the end of the scope, they can explicitly do it by calling the drop function.
  • Developers can use any lock that implements the Lock trait, which includes Mutex and SpinLock, at no additional runtime cost when compared to a C implementation. Other synchronization constructs, including condition variables, also work transparently and with zero additional run-time cost.
  • Rust implements condition variables using kernel wait queues. This allows developers to benefit from atomic release of the lock and putting the thread to sleep without having to reason about low-level kernel scheduler functions. In the C example above, semaphore_consume is a mix of semaphore logic and subtle Linux scheduling: for example, the code is incorrect if mutex_unlock is called before prepare_to_wait because it may result in a wake up being missed.
  • No unsynchronized access: as we mentioned before, variables shared by multiple threads/CPUs must be read-only, with interior mutability being the solution for cases when mutability is needed. In addition to the example with locks above, the ioctl example in the previous section also has an example of using an atomic variable; Rust also requires developers to specify how memory is to be synchronized by atomic accesses. In the C part of the example, we happen to use atomic64_t, but the compiler won't alert a developer to this need.
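
The same pattern can be tried out in userspace with the standard library's Mutex and Condvar. This is a sketch under our own assumptions, not the kernel crate's API: the std types differ in detail (for example, lock returns a Result because of lock poisoning, and wait takes and returns the guard), and there is no signal handling here.

```rust
use std::sync::{Arc, Condvar, Mutex};
use std::thread;

struct Inner {
    count: usize,
    max_seen: usize,
}

struct Semaphore {
    changed: Condvar,
    inner: Mutex<Inner>,
}

impl Semaphore {
    fn new() -> Self {
        Semaphore {
            changed: Condvar::new(),
            inner: Mutex::new(Inner { count: 0, max_seen: 0 }),
        }
    }

    /// Blocks until the count is nonzero, then decrements it.
    fn consume(&self) {
        let mut inner = self.inner.lock().unwrap();
        while inner.count == 0 {
            // Atomically releases the lock and sleeps; the lock is held
            // again when `wait` returns.
            inner = self.changed.wait(inner).unwrap();
        }
        inner.count -= 1;
    } // the guard is dropped here, which unlocks the mutex

    /// Adds `n` to the count and wakes any waiting readers.
    fn add(&self, n: usize) {
        {
            let mut inner = self.inner.lock().unwrap();
            inner.count = inner.count.saturating_add(n);
            if inner.count > inner.max_seen {
                inner.max_seen = inner.count;
            }
        } // unlock before notifying
        self.changed.notify_all();
    }
}

fn main() {
    let sem = Arc::new(Semaphore::new());
    let reader = {
        let sem = Arc::clone(&sem);
        thread::spawn(move || sem.consume())
    };
    sem.add(3);
    reader.join().unwrap();
    let inner = sem.inner.lock().unwrap();
    println!("count = {}, max_seen = {}", inner.count, inner.max_seen);
}
```

The fields count and max_seen are reachable only through the guard, so there is no way to write the unlocked access that the C struct permits.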

Error handling and control flow

In the tables below, we show how open, read, and write are implemented in our example driver:

Rust:

    fn read(
        &self,
        _: &File,
        data: &mut UserSlicePtrWriter,
        offset: u64
    ) -> KernelResult<usize> {
        if data.is_empty() || offset > 0 {
            return Ok(0);
        }

        self.consume()?;
        data.write_slice(&[0u8; 1])?;
        self.read_count.fetch_add(1, Ordering::Relaxed);
        Ok(1)
    }

C:

    static ssize_t semaphore_read(struct file *filp,
                                  char __user *buffer,
                                  size_t count, loff_t *ppos)
    {
        struct file_state *state = filp->private_data;
        char c = 0;
        int ret;

        if (count == 0 || *ppos > 0)
            return 0;

        ret = semaphore_consume(state->shared);
        if (ret)
            return ret;

        if (copy_to_user(buffer, &c, sizeof(c)))
            return -EFAULT;

        atomic64_add(1, &state->read_count);
        *ppos += 1;
        return 1;
    }

Rust:

    fn write(
        &self,
        data: &mut UserSlicePtrReader,
        _offset: u64
    ) -> KernelResult<usize> {
        {
            let mut inner = self.shared.inner.lock();
            inner.count = inner.count.saturating_add(data.len());
            if inner.count > inner.max_seen {
                inner.max_seen = inner.count;
            }
        }

        self.shared.changed.notify_all();
        Ok(data.len())
    }

C:

    static ssize_t semaphore_write(struct file *filp,
                                   const char __user *buffer,
                                   size_t count, loff_t *ppos)
    {
        struct file_state *state = filp->private_data;
        struct semaphore_state *shared = state->shared;

        mutex_lock(&shared->mutex);
        shared->count += count;
        if (shared->count < count)
            shared->count = SIZE_MAX;

        if (shared->count > shared->max_seen)
            shared->max_seen = shared->count;

        mutex_unlock(&shared->mutex);

        wake_up_all(&shared->changed);
        return count;
    }

Rust:

    fn open(
        shared: &Arc<Semaphore>
    ) -> KernelResult<Box<Self>> {
        Ok(Box::try_new(Self {
            read_count: AtomicU64::new(0),
            shared: shared.clone(),
        })?)
    }

C:

    static int semaphore_open(struct inode *nodp,
                              struct file *filp)
    {
        struct semaphore_state *shared =
            container_of(filp->private_data,
                         struct semaphore_state,
                         miscdev);
        struct file_state *state;

        state = kzalloc(sizeof(*state), GFP_KERNEL);
        if (!state)
            return -ENOMEM;

        kref_get(&shared->ref);
        state->shared = shared;
        atomic64_set(&state->read_count, 0);

        filp->private_data = state;

        return 0;
    }

They illustrate other benefits brought by Rust:

  • The ? operator: it is used by the Rust open and read implementations to do error handling implicitly; the developer can focus on the semaphore logic, the resulting code being quite small and readable. The C versions have error-handling noise that can make them less readable.
  • Required initialization: Rust requires all fields of a struct to be initialized on construction, so the developer can never accidentally fail to initialize a field; C offers no such facility. In our open example above, the developer of the C version could easily fail to call kref_get (even though all fields would have been initialized); in Rust, the user is required to call clone (which increments the ref count), otherwise they get a compilation error.
  • RAII scoping: the Rust write implementation uses a statement block to control when inner goes out of scope and therefore the lock is released.
  • Integer overflow behavior: Rust encourages developers to always consider how overflows should be handled. In our write example, we want a saturating addition so that we don't end up with a zero value when adding to our semaphore. In C, we need to manually check for overflows; there is no additional support from the compiler.
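
For readers less familiar with the ? operator, here is a small userspace sketch of how it propagates errors. The function apply_increment and the Error enum are hypothetical; the Overflow variant in particular is our own addition, since the driver above saturates rather than failing.

```rust
use std::num::ParseIntError;

/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
enum Error {
    Invalid,  // analogous to EINVAL
    Overflow, // our own addition; the article's driver saturates instead
}

/// Lets `?` convert a parse failure into our error type.
impl From<ParseIntError> for Error {
    fn from(_: ParseIntError) -> Self {
        Error::Invalid
    }
}

/// Parses an increment and applies it; each `?` returns early on failure.
fn apply_increment(count: usize, text: &str) -> Result<usize, Error> {
    let n: usize = text.trim().parse()?; // ParseIntError becomes Error::Invalid
    let next = count.checked_add(n).ok_or(Error::Overflow)?;
    Ok(next)
}

fn main() {
    println!("{:?}", apply_increment(1, "2"));
    println!("{:?}", apply_increment(1, "not a number"));
    println!("{:?}", apply_increment(usize::MAX, "1"));
}
```

The function body contains no explicit error branches, yet every failure path returns a typed error to the caller.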

What's next

The examples above are only a small part of the whole project. We hope it gives readers a glimpse of the kinds of benefits that Rust brings. At the moment we have nearly all generic kernel functionality needed by Binder neatly wrapped in safe Rust abstractions, so we are in the process of gathering feedback from the broader Linux kernel community with the intent of upstreaming the existing Rust support.

We also continue to make progress on our Binder prototype, implement additional abstractions, and smooth out some rough edges. This is an exciting time and a rare opportunity to potentially influence how the Linux kernel is developed, as well as inform the evolution of the Rust language. We invite those interested to join us in Rust for Linux and attend our planned talk at Linux Plumbers Conference 2021!


Thanks Nick Desaulniers, Kees Cook, and Adrian Taylor for contributions to this post. Special thanks to Jeff Vander Stoep for contributions and editing, and to Greg Kroah-Hartman for reviewing and contributing to the code examples.

Rust in the Android platform

Correctness of code in the Android platform is a top priority for the security, stability, and quality of each Android release. Memory safety bugs in C and C++ continue to be the most-difficult-to-address source of incorrectness. We invest a great deal of effort and resources into detecting, fixing, and mitigating this class of bugs, and these efforts are effective in preventing a large number of bugs from making it into Android releases. Yet in spite of these efforts, memory safety bugs continue to be a top contributor of stability issues, and consistently represent ~70% of Android’s high severity security vulnerabilities.

In addition to ongoing and upcoming efforts to improve detection of memory bugs, we are ramping up efforts to prevent them in the first place. Memory-safe languages are the most cost-effective means for preventing memory bugs. In addition to memory-safe languages like Kotlin and Java, we’re excited to announce that the Android Open Source Project (AOSP) now supports the Rust programming language for developing the OS itself.

Systems programming

Managed languages like Java and Kotlin are the best option for Android app development. These languages are designed for ease of use, portability, and safety. The Android Runtime (ART) manages memory on behalf of the developer. The Android OS uses Java extensively, effectively protecting large portions of the Android platform from memory bugs. Unfortunately, for the lower layers of the OS, Java and Kotlin are not an option.


Lower levels of the OS require systems programming languages like C, C++, and Rust. These languages are designed with control and predictability as goals. They provide access to low level system resources and hardware. They are light on resources and have more predictable performance characteristics.

For C and C++, the developer is responsible for managing memory lifetime. Unfortunately, it's easy to make mistakes when doing this, especially in complex and multithreaded codebases.


Rust provides memory safety guarantees by using a combination of compile-time checks to enforce object lifetime/ownership and runtime checks to ensure that memory accesses are valid. This safety is achieved while providing equivalent performance to C and C++.

The limits of sandboxing

C and C++ don’t provide these same safety guarantees and require robust isolation. All Android processes are sandboxed and we follow the Rule of 2 to decide if functionality necessitates additional isolation and deprivileging. The Rule of 2 is simple: developers may select at most two of the following three options: processing untrustworthy input, using an unsafe implementation language (C/C++), and running without a tightly constrained sandbox.

For Android, this means that if code is written in C/C++ and parses untrustworthy input, it should be contained within a tightly constrained and unprivileged sandbox. While adherence to the Rule of 2 has been effective in reducing the severity and reachability of security vulnerabilities, it does come with limitations. Sandboxing is expensive: the new processes it requires consume additional overhead and introduce latency due to IPC and additional memory usage. Sandboxing doesn’t eliminate vulnerabilities from the code and its efficacy is reduced by high bug density, allowing attackers to chain multiple vulnerabilities together.

Memory-safe languages like Rust help us overcome these limitations in two ways:

  1. They lower the density of bugs within our code, which increases the effectiveness of our current sandboxing.
  2. They reduce our sandboxing needs, allowing introduction of new features that are both safer and lighter on resources.

But what about all that existing C++?

Of course, introducing a new programming language does nothing to address bugs in our existing C/C++ code. Even if we redirected the efforts of every software engineer on the Android team, rewriting tens of millions of lines of code is simply not feasible.

An analysis of the age of memory safety bugs in Android (measured from when they were first introduced) demonstrates why our memory-safe language efforts are best focused on new development and not on rewriting mature C/C++ code. Most of our memory bugs occur in new or recently modified code, with about 50% being less than a year old.

The comparative rarity of older memory bugs may come as a surprise to some, but we’ve found that old code is not where we most urgently need improvement. Software bugs are found and fixed over time, so we would expect the number of bugs in code that is being maintained but not actively developed to go down over time. Just as reducing the number and density of bugs improves the effectiveness of sandboxing, it also improves the effectiveness of bug detection.

Limitations of detection

Bug detection via robust testing, sanitization, and fuzzing is crucial for improving the quality and correctness of all software, including software written in Rust. A key limitation for the most effective memory safety detection techniques is that the erroneous state must actually be triggered in instrumented code in order to be detected. Even in code bases with excellent test/fuzz coverage, this results in a lot of bugs going undetected.

Another limitation is that bug detection is scaling faster than bug fixing. In some projects, bugs that are being detected are not always getting fixed. Bug fixing is a long and costly process.

Each step of that process is costly, and missing any one of them can result in the bug going unpatched for some or all users. For complex C/C++ code bases, often there are only a handful of people capable of developing and reviewing the fix, and even with a high amount of effort spent on fixing bugs, sometimes the fixes are incorrect.

Bug detection is most effective when bugs are relatively rare and dangerous bugs can be given the urgency and priority that they merit. Our ability to reap the benefits of improvements in bug detection requires that we prioritize preventing the introduction of new bugs.

Prioritizing prevention

Rust modernizes a range of other language aspects, which results in improved correctness of code:

  • Memory safety - enforces memory safety through a combination of compiler and run-time checks.
  • Data concurrency - prevents data races. The ease with which this allows users to write efficient, thread-safe code has given rise to Rust’s Fearless Concurrency slogan.
  • More expressive type system - helps prevent logical programming bugs (e.g. newtype wrappers, enum variants with contents).
  • References and variables are immutable by default - assist the developer in following the security principle of least privilege, marking a reference or variable mutable only when they actually intend it to be so. While C++ has const, it tends to be used infrequently and inconsistently. In comparison, the Rust compiler assists in avoiding stray mutability annotations by offering warnings for mutable values which are never mutated.
  • Better error handling in standard libraries - wrap potentially failing calls in Result, which causes the compiler to require that users check for failures even for functions which do not return a needed value. This protects against bugs like the Rage Against the Cage vulnerability which resulted from an unhandled error. By making it easy to propagate errors via the ? operator and optimizing Result for low overhead, Rust encourages users to write their fallible functions in the same style and receive the same protection.
  • Initialization - requires that all variables be initialized before use. Uninitialized memory vulnerabilities have historically been the root cause of 3-5% of security vulnerabilities on Android. In Android 11, we started auto initializing memory in C/C++ to reduce this problem. However, initializing to zero is not always safe, particularly for things like return values, where this could become a new source of faulty error handling. Rust requires every variable be initialized to a legal member of its type before use, avoiding the issue of unintentionally initializing to an unsafe value. Similar to Clang for C/C++, the Rust compiler is aware of the initialization requirement, and avoids any potential performance overhead of double initialization.
  • Safer integer handling - Overflow sanitization is on for Rust debug builds by default, encouraging programmers to specify a wrapping_add if they truly intend a calculation to overflow or saturating_add if they don’t. We intend to enable overflow sanitization for all builds in Android. Further, all integer type conversions are explicit casts: developers cannot accidentally cast during a function call, when assigning to a variable, or when attempting to do arithmetic with other types.
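Several of the properties above can be seen together in a few lines. The following is a minimal sketch, not code from Android: the names `UserId`, `ParseError`, `parse_user_id`, and `next_user_id` are hypothetical and exist only for illustration. It shows a newtype wrapper, `Result` with the `?` operator, explicit integer conversion, and the explicit overflow-handling methods.

```rust
use std::convert::TryFrom;

/// Hypothetical newtype wrapper: a `UserId` cannot be confused with a bare
/// `u32` (or with another wrapper around `u32`) at a call site.
#[derive(Debug, Clone, Copy, PartialEq)]
struct UserId(u32);

/// Hypothetical error type; each enum variant names one failure mode.
#[derive(Debug, PartialEq)]
enum ParseError {
    Empty,
    NotANumber,
    OutOfRange,
}

/// Fallible functions return `Result`, and `?` propagates the error.
fn parse_user_id(text: &str) -> Result<UserId, ParseError> {
    if text.is_empty() {
        return Err(ParseError::Empty);
    }
    let wide: u64 = text.parse().map_err(|_| ParseError::NotANumber)?;
    // Narrowing is never implicit: `try_from` reports values that do not fit.
    let narrow = u32::try_from(wide).map_err(|_| ParseError::OutOfRange)?;
    Ok(UserId(narrow))
}

/// `checked_add` returns `None` on overflow instead of wrapping silently.
fn next_user_id(id: UserId) -> Result<UserId, ParseError> {
    id.0.checked_add(1).map(UserId).ok_or(ParseError::OutOfRange)
}

fn main() {
    // `max` is immutable because it is not declared with `mut`.
    let max = u8::MAX;
    assert_eq!(max.wrapping_add(1), 0); // overflow is intended, so it wraps
    assert_eq!(max.saturating_add(1), 255); // clamps at the maximum value
    assert_eq!(max.checked_add(1), None); // overflow is reported to the caller

    match parse_user_id("42").and_then(next_user_id) {
        Ok(id) => println!("next id: {:?}", id),
        Err(e) => println!("error: {:?}", e),
    }
}
```

Because both functions return `Result`, the compiler warns if a caller ignores the outcome, and the caller decides how to handle each `ParseError` variant.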

Where we go from here

Adding a new language to the Android platform is a large undertaking. There are toolchains and dependencies that need to be maintained, test infrastructure and tooling that must be updated, and developers that need to be trained. For the past 18 months we have been adding Rust support to the Android Open Source Project, and we have a few early adopter projects that we will be sharing in the coming months. Scaling this to more of the OS is a multi-year project. Stay tuned, we will be posting more updates on this blog.

Java is a registered trademark of Oracle and/or its affiliates.

Announcing the Android Ready SE Alliance

When the Pixel 3 launched in 2018, it had a new tamper-resistant hardware enclave called Titan M. In addition to being a root-of-trust for Pixel software and firmware, it also enabled tamper-resistant key storage for Android Apps using StrongBox. StrongBox is an implementation of the Keymaster HAL that resides in a hardware security module. It is an important security enhancement for Android devices and paved the way for us to consider features that were previously not possible.

StrongBox and tamper-resistant hardware are becoming important requirements for emerging user features, including:

  • Digital keys (car, home, office)
  • Mobile Driver’s License (mDL), National ID, ePassports
  • eMoney solutions (for example, Wallet)

All these features need to run on tamper-resistant hardware to protect the integrity of the application executables and a user’s data, keys, wallet, and more. Most modern phones now include discrete tamper-resistant hardware called a Secure Element (SE). We believe this SE offers the best path for introducing these new consumer use cases in Android.

In order to accelerate adoption of these new Android use cases, we are announcing the formation of the Android Ready SE Alliance. SE vendors are joining hands with Google to create a set of open-source, validated, and ready-to-use SE Applets. Today, we are launching the General Availability (GA) version of StrongBox for SE. This applet is qualified and ready for use by our OEM partners. It is currently available from Giesecke+Devrient, Kigen, NXP, STMicroelectronics, and Thales.

It is important to note that these features are not just for phones and tablets. StrongBox is also applicable to WearOS, Android Auto Embedded, and Android TV.

Using Android Ready SE in a device requires the OEM to:

  1. Pick the appropriate, validated hardware part from their SE vendor
  2. Enable SE to be initialized from the bootloader and provision the root-of-trust (RoT) parameters through the SPI interface or cryptographic binding
  3. Work with Google to provision Attestation Keys/Certificates in the SE factory
  4. Use the GA version of the StrongBox for the SE applet, adapted to their SE
  5. Integrate HAL code
  6. Enable an SE upgrade mechanism
  7. Run CTS/VTS tests for StrongBox to verify that the integration is done correctly

We are working with our ecosystem to prioritize and deliver the following Applets in conjunction with corresponding Android feature releases:

  • Mobile driver’s license and Identity Credentials
  • Digital car keys

We already have several Android OEMs adopting Android Ready SE for their devices. We look forward to working with our OEM partners to bring these next-generation features to our users.

Please visit our Android Security and Privacy developer site for more info.

Continuing to Raise the Bar for Verifiable Security on Pixel

Evaluating the security of mobile devices is difficult, and a trusted way to validate a company’s claims is through independent, industry certifications. When it comes to smartphones, one of the most rigorous end-to-end certifications is the Common Criteria (CC) Mobile Device Fundamentals (MDF) Protection Profile. Common Criteria is the driving force for establishing widespread mutual recognition of secure IT products across 31 countries. Over the past few years only three smartphone manufacturers have continually been certified on every OS version: Google, Samsung, and Apple. At the beginning of February, we successfully completed this certification for all currently supported Pixel smartphones running Android 11. Google is the first manufacturer to be certified on the latest OS version.

This specific certification is designed to evaluate how a device defends against the real-world threats facing both consumers and businesses. The table below outlines the threats and mitigations provided in the CC MDF protection profile:

Threat: Network Eavesdropping - An attacker is positioned on a wireless communications channel or elsewhere on the network infrastructure

Threat: Network Attack - An attacker is positioned on a wireless communications channel or elsewhere on the network infrastructure

Mitigations:

  • Protected Communications - Standard protocols such as IPsec, DTLS, TLS, HTTPS, and Bluetooth to ensure encrypted communications are secure
  • Authorization and Authentication - Secure authentication for networks and backends
  • Mobile Device Configuration - Capabilities for configuring and applying security policies defined by the user and/or Enterprise Administrator

Threat: Physical Access - An attacker, with physical access, may attempt to retrieve user data on the mobile device, including credentials

Mitigations:

  • Protected Storage - Secure storage (that is, encryption of data-at-rest) for data contained on the device
  • Authorization and Authentication - Secure device authentication using a known unlock factor, such as a password, PIN, fingerprint, or face authentication

Threat: Malicious or Flawed Application - Applications loaded onto the Mobile Device may include malicious or exploitable code

Mitigations:

  • Protected Communications - Standard protocols such as IPsec, DTLS, TLS, HTTPS, and Bluetooth to ensure encrypted communications are secure
  • Authorization and Authentication - Secure authentication for networks and backends
  • Mobile Device Configuration - Capabilities for configuring and applying security policies defined by the user and/or Enterprise Administrator
  • Mobile Device Integrity - Device integrity for critical functionality of both software and hardware
  • End User Privacy and Device Functionality - Application isolation/sandboxing and framework permissions provide separation and privacy between user activities

Threat: Persistent Presence - Persistent presence on a device by an attacker implies that the device has lost integrity and cannot regain it

Mitigations:

  • Mobile Device Integrity - Device integrity to ensure the integrity of critical functionality of both software and hardware is maintained
  • End User Privacy and Device Functionality - Application isolation/sandboxing and framework permissions provide separation and privacy between user activities


What makes this certification important is that it is a hands-on evaluation: an authorized lab examines the device and performs a variety of tests to ensure that:

  1. Every mitigation meets a predefined standard and set of criteria.
  2. Every mitigation works as advertised.

At a high level, the target of evaluation (TOE) is the combination of device hardware (i.e. system on chip) and operating system (i.e. Android). In order to validate our mitigations for the threats listed above, the lab looks at the following security functionality:

  • Protected Communications (encryption of data-in-transit) - Cryptographic algorithms and transport protocols used to encrypt the Wi-Fi traffic and all other network operations and communications.
  • Protected Storage (encryption of data-at-rest) - Cryptography provided by the system on chip, trusted execution environment, and any other discrete tamper-resistant hardware such as the Titan M and the Android OS. The lab looks specifically at things like the implementation of file-based encryption, hardware root of trust, keystore operations (such as key generation), key storage, key destruction, and key hierarchy.
  • Authorization and Authentication - Mechanisms for unlocking the user’s devices, such as password, PIN, or biometric. This also covers mitigation techniques like rate limiting and, for biometrics, False Acceptance and Spoof Acceptance Rates.
  • Mobile Device Integrity - Android’s implementation of Verified Boot, Google Play System Updates, and Seamless OS Updates.
  • Auditability - Features that allow a user or IT admin to log events such as device start-up and shutdown, data encryption, data decryption, and key management.
  • Mobile Device Configuration - Capabilities that allow the user or enterprise admin to apply security policies to the device using Android Enterprise.

Why this is important for enterprises

It’s incredibly important to ensure Pixel security can specifically support enterprise needs. Many regulated industries require the use of Common Criteria certified devices to ensure that sensitive data is backed by the strongest possible protections. The Android Enterprise management framework enables enterprises to do things like control devices by setting restrictions around what the end user can do and audit devices to ensure all software settings are configured properly. For example, enterprise IT admins wish to enforce policies for features like the camera, location services or app installation process.

Why this is important for consumers

Security isn’t just an enterprise concern and many of the protections validated by Common Criteria certification apply to consumers as well. For example, when you’re connecting to Wi-Fi, you want to ensure no one can spy on your web browsing. If your device is lost or stolen, you want to be confident that your lock screen can reduce the chances of someone accessing your personal information.

We believe in making security & privacy accessible to all of our users. This is why we take care to ensure that Pixel devices meet or exceed these certification standards. We’re committed to meeting these standards moving forward, so you can rest assured that your Pixel phone comes with top-of-the-line security built in, from the moment you turn it on.

Why this is important to the Android Ecosystem

While certifications are a great form of third party validation, they often fall under what we like to call the 3 C’s:

  • Complex - Due to the scope of the evaluation including the device hardware, the operating system and everything in between.
  • Costly - Because they require a hands-on evaluation by a certified lab for every make/model combination (SoC + OS), which equates to hundreds of individual tests.
  • Cumbersome - Because it’s a fairly lengthy evaluation process that can take upwards of 18 months the first time you go through it.

We have been working these last three years to reduce this complexity for our OEM partners. We are excited to tell you that the features required to satisfy the necessary security requirements are baked directly into the Android Open Source Project. We’ve also added all of the management and auditability requirements into the Android Enterprise Management framework. Last year we started publishing the tools we have developed for this on GitHub to allow other Android OEMs to take advantage of our efforts as they go through their certification.

While we continue certifying Pixel smartphones with new Android OS versions, we have worked to enable other Android OEMs to achieve this certification as well as others, such as:

  • National Institute of Standards and Technology’s Cryptographic Algorithm and Module Validation Programs, which evaluate cryptographic algorithms and/or modules and are something the US Public Sector and numerous other regulated verticals look for. With Android 11, BoringSSL, which is part of the Conscrypt mainline module, has completed this validation (Certificate #3753).
  • US Department of Defense's Security Technical Implementation Guide (STIG), a guideline for how to deploy technology on a US Department of Defense network. In the past, there were different STIGs for different Android OEMs, which had their own implementations and proprietary controls. Thanks to our efforts, we are now unifying these under a single Android STIG template so that Android OEMs don’t have to go through the burden of building custom controls to satisfy the various requirements.

We’ll continue to invest in additional ways to measure security for both enterprises and consumers, and we welcome the industry to join us in this effort.

New Password Checkup Feature Coming to Android

With the proliferation of digital services in our lives, it’s more important than ever to make sure our online information remains safe and secure. Passwords are usually the first line of defense against hackers, and with the number of data breaches that could publicly expose those passwords, users must be vigilant about safeguarding their credentials.

To make this easier, Chrome introduced the Password Checkup feature in 2019, which notifies you when one of the passwords you’ve saved in Chrome is exposed. We’re now bringing this functionality to your Android apps through Autofill with Google. Whenever you fill or save credentials into an app, we’ll check those credentials against a list of known compromised credentials and alert you if your password has been compromised. The prompt can also take you to your Password Manager page, where you can do a comprehensive review of your saved passwords. Password Checkup on Android apps is available on Android 9 and above, for users of Autofill with Google.

Follow the instructions below to enable Autofill with Google on your Android device:

  1. Open your phone’s Settings app
  2. Tap System > Languages & input > Advanced
  3. Tap Autofill service
  4. Tap Google to make sure the setting is enabled

If you can’t find these options, check out this page with details on how to get information from your device manufacturer.

How it works

User privacy is top of mind, especially when it comes to features that handle sensitive data such as passwords. Autofill with Google is built on the Android autofill framework which enforces strict privacy & security invariants that ensure that we have access to the user’s credentials only in the following two cases: 1) the user has already saved said credential to their Google account; 2) the user was offered to save a new credential by the Android OS and chose to save it to their account.

When the user interacts with a credential by either filling it into a form or saving it for the first time, we use the same privacy preserving API that powers the feature in Chrome to check if the credential is part of the list of known compromised passwords tracked by Google.

This implementation ensures that:

  • Only an encrypted hash of the credential leaves the device (the first two bytes of the hash are sent unencrypted to partition the database)
  • The server returns a list of encrypted hashes of known breached credentials that share the same prefix
  • The actual determination of whether the credential has been breached happens locally on the user’s device
  • The server (Google) does not have access to the unencrypted hash of the user’s password and the client (User) does not have access to the list of unencrypted hashes of potentially breached credentials
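The prefix-partitioning idea in the list above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the production protocol: `BreachServer`, `credential_hash`, and `is_breached` are hypothetical names, `DefaultHasher` is a non-cryptographic stand-in for a real password hash, and the encryption layer that hides full hashes from both sides is deliberately omitted. It shows only that the server sees a two-byte prefix and that the final match happens on the client.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Stand-in hash for illustration only. `DefaultHasher` is NOT cryptographic;
/// a real deployment would use a slow, salted password hash.
fn credential_hash(username: &str, password: &str) -> [u8; 8] {
    let mut hasher = DefaultHasher::new();
    username.hash(&mut hasher);
    password.hash(&mut hasher);
    hasher.finish().to_be_bytes()
}

/// The first two bytes of the hash: the only part the server gets to see.
fn prefix(hash: &[u8; 8]) -> [u8; 2] {
    [hash[0], hash[1]]
}

/// Hypothetical server holding breached hashes partitioned by prefix.
struct BreachServer {
    buckets: HashMap<[u8; 2], Vec<[u8; 8]>>,
}

impl BreachServer {
    fn new(breached: &[(&str, &str)]) -> Self {
        let mut buckets: HashMap<[u8; 2], Vec<[u8; 8]>> = HashMap::new();
        for (user, pass) in breached {
            let h = credential_hash(user, pass);
            buckets.entry(prefix(&h)).or_default().push(h);
        }
        BreachServer { buckets }
    }

    /// Receives only a prefix and returns every hash stored in that bucket.
    /// (In the real protocol these entries would be encrypted.)
    fn bucket_for(&self, p: [u8; 2]) -> Vec<[u8; 8]> {
        self.buckets.get(&p).cloned().unwrap_or_default()
    }
}

/// Client side: request the bucket by prefix, then compare locally.
fn is_breached(server: &BreachServer, username: &str, password: &str) -> bool {
    let h = credential_hash(username, password);
    server.bucket_for(prefix(&h)).contains(&h)
}

fn main() {
    let server = BreachServer::new(&[("alice", "hunter2"), ("bob", "password1")]);
    println!("{}", is_breached(&server, "alice", "hunter2"));
    println!("{}", is_breached(&server, "alice", "a-much-better-passphrase"));
}
```

Partitioning by a short prefix keeps each response small while leaving the server unable to tell which entry in the bucket, if any, the client was checking.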

For more information on how this API is built under the hood, check out this blog from the Chrome team.

Additional security features

In addition to Password Checkup, Autofill with Google offers other features to help you keep your data secure:

  • Password generation: With so many credentials to manage, it’s easy for users to recycle the same password across multiple accounts. With password generation, we’ll generate a unique, secure password for you and save it to your Google account so you don’t have to remember it at all. On Android, you can request password generation for an app by long pressing the password field and selecting “Autofill” in the pop-up menu.
  • Biometric authentication: You can add an extra layer of protection on your device by requiring biometric authentication any time you autofill your credentials or payment information. Biometric authentication can be enabled inside of the Autofill with Google settings.

As always, stay tuned to the Google Security blog to keep up to date on the latest ways we’re improving security across our products.

Vulnerability Reward Program: 2020 Year in Review

Despite the challenges of this unprecedented year, our vulnerability researchers have achieved more than ever before, partnering with our Vulnerability Reward Programs (VRPs) to protect Google’s users by discovering security and abuse bugs and reporting them to us for remediation. Their diligence helps us keep our users, and the internet at large, safe, and enables us to fix security issues before they can be exploited.

The incredibly hard work, dedication, and expertise of our researchers in 2020 resulted in a record-breaking payout of over $6.7 million in rewards, with an additional $280,000 given to charity. We’d like to extend a big thank you to our community of researchers for collaborating with us. It’s your excellent work that brings our programs to life, so we wanted to take a moment to look back on last year’s successes.

Our rewards programs span several Google product areas, including Chrome, Android, and the Google Play Store. As in past years, we are sharing our 2020 Year in Review statistics across all of these programs.

Android 

2020 was a fantastic year for the Android VRP, and in response to the valiant efforts of multiple teams of researchers, we paid out $1.74M in rewards. Following our increase in exploit payouts in November 2019, we received a record 13 working exploit submissions in 2020, representing over $1M in exploit reward payouts. Some highlights include:

  • We awarded our first-ever Android 11 developer preview bonus, which paid out over $50,000 across 11 reports. This allowed us to patch the issues proactively, before the official release of Android 11.
  • Guang Gong (@oldfresher) and his team at 360 Alpha Lab, Qihoo 360 Technology Co. Ltd., now hold a record 8 exploits (30% of the all-time total) on the leaderboard. Most recently, Alpha Lab submitted an impressive 1-click remote root exploit targeting recent Android devices. They maintain the top Android payout ($161,337, plus another $40,000 from Chrome VRP) for their 2019 exploit.
  • Another researcher submitted an additional two exploits, and is vying for the top all-time spot with an impressive $400,000 in all-time exploit payouts.

In addition, we launched a number of pilot rewards programs to guide security researchers toward additional areas of interest, including Android Auto OS, writing fuzzers for Android code, and a reward program for Android chipsets. And in 2021, we'll be working on additional improvements and exciting initiatives related to our programs.

Chrome 

Chrome has also seen a record year of VRP payouts! We increased our reward amounts in July 2019, and as a result, 2020 has seen us pay out 83% more than 2019, totalling $2.1M across 300 bugs.

In 2019, 14% of our payouts were for V8 bugs. This decreased to just 6% in 2020. At the end of 2020, we announced a further bonus reward for clearly exploitable V8 bugs, so we expect to see this amount increase again in 2021.

Google Play 

It’s been another stellar year for the Google Play Security Rewards Program! This year, we expanded the criteria for qualifying Android apps to include apps utilizing the Exposure Notification API and performing contact tracing to help combat Covid-19. We also increased our maximum bounty award amount to $20,000 for qualifying vulnerabilities.

In 2020, the Google Play Security Rewards Program and Developer Data Protection Reward Program awarded over $270,000 to Android researchers around the world.

Abuse Program

Beyond typical security vulnerabilities, we remain interested in research focused on abuse-related risks.

The Abuse program released an official definition describing what an abuse risk is and how abuse-related reports are assessed. We also announced increased rewards for reports focused on abuse-related methodologies. These efforts led to a huge spike of abuse-related reports. In fact, we received more than twice as many reports in 2020 as in 2019, a level of growth we’ve never seen before. The fantastic work of our researchers in 2020 allowed us to identify and fix over 100 issues across more than 60 different products.

Research Grants

Besides reward payouts, in 2020 we also awarded over $400,000 in grants to more than 180 security researchers around the world, which is a record for this program. More than a third of these grants were awarded in response to the Covid-19 crisis, to extend our support to researchers and enable them to continue with their work. Our researchers got back to us with over 200 reports which resulted in more than 100 identified vulnerabilities.

"The point is, the value of these research grants is not $1337, $500 or $5000 etc. It is priceless!" – Research Grantee

Looking Forward

Finally, because of the ongoing Covid-19 pandemic and related restrictions on travel last year, we couldn’t keep our tradition of meeting our bug hunters in person and organizing events like ESCAL8, where we can engage with our incredible community of researchers. Like everyone else, we are full of hope that 2021 will allow us to meet in person again, and celebrate the 10 year VRP anniversary and the fantastic work our researchers have contributed during this time.

We look forward to another year of working with our security researchers to make Google, Android, Chrome and the Google Play Store safer for everyone. Follow us on @GoogleVRP to keep tabs on the latest.

Thank you to Mike Antares, Adam Bacchus, Dirk Göhmann, Amy Ressler, Martin Straka, Adrian Taylor and Jan Keller for their contributions to this post.

Data Driven Security Hardening in Android

The Android platform team is committed to securing Android for every user across every device. In addition to monthly security updates to patch vulnerabilities reported to us through our Vulnerability Rewards Program (VRP), we also proactively architect Android to protect against undiscovered vulnerabilities through hardening measures such as applying compiler-based mitigations and improving sandboxing. This post focuses on the decision-making process that goes into these proactive measures: in particular, how we choose which hardening techniques to deploy and where they are deployed. As device capabilities vary widely within the Android ecosystem, these decisions must be made carefully, guided by data available to us to maximize the value to the ecosystem as a whole.

The overall approach to Android Security is multi-pronged and leverages several principles and techniques to arrive at data-guided solutions to make future exploitation more difficult. In particular, when it comes to hardening the platform, we try to answer the following questions:

  • What data are available and how can they guide security decisions?
  • What mitigations are available, how can they be improved, and where should they be enabled?
  • What are the deployment challenges of particular mitigations and what tradeoffs are there to consider?

By shedding some light on the process we use to choose security features for Android, we hope to provide a better understanding of Android's overall approach to protecting our users.

Data-driven security decision-making

We use a variety of sources to determine what areas of the platform would benefit the most from different types of security mitigations. The Android Vulnerability Rewards Program (VRP) is one very informative source: all vulnerabilities submitted through this program are analyzed by our security engineers to determine the root cause of each vulnerability and its overall severity (based on these guidelines). Other sources are internal and external bug-reports, which identify vulnerable components and reveal coding practices that commonly lead to errors. Knowledge of problematic code patterns combined with the prevalence and severity of the vulnerabilities they cause can help inform decisions about which mitigations are likely to be the most beneficial.



Figure: Types of Critical and High severity vulnerabilities fixed in Android Security Bulletins in 2019

Relying purely on vulnerability reports is not sufficient as the data are inherently biased: often, security researchers flock to "hot" areas, where other researchers have already found vulnerabilities (e.g. Stagefright). Or they may focus on areas where readily-available tools make it easier to find bugs (for instance, if a security research tool is posted to GitHub, other researchers commonly utilize that tool to explore deeper).

To ensure that mitigation efforts are not biased only toward areas where bugs and vulnerabilities have been reported, internal Red Teams analyze less scrutinized or more complex parts of the platform. In addition, continuous automated fuzzers run at scale on both Android virtual machines and physical devices, so that bugs can be found and fixed early in the development lifecycle. Any vulnerabilities uncovered through this process are also analyzed for root cause and severity, which inform mitigation deployment decisions.

The Android VRP rewards submissions of full exploit-chains that demonstrate a full end-to-end attack. These exploit-chains, which generally utilize multiple vulnerabilities, are very informative in demonstrating techniques that attackers use to chain vulnerabilities together to accomplish their goals. Whenever a researcher submits a full exploit chain, a team of security engineers analyzes and documents the overall approach, each link in the chain, and any innovative attack strategies used. This analysis informs which exploit mitigation strategies could be employed to prevent pivoting directly from one vulnerability to another (some examples include Address Space Layout Randomization and Control-Flow Integrity) and whether the process’s attack surface could be reduced if it has unnecessary access to resources.

There are often multiple different ways to use a collection of vulnerabilities to create an exploit chain. Therefore a defense-in-depth approach is beneficial, with the goal of reducing the usefulness of some vulnerabilities and lengthening exploit chains so that successful exploitation requires more vulnerabilities. This increases the cost for an attacker to develop a full exploit chain.

Keeping up with developments in the wider security community helps us understand the current threat landscape, what techniques are currently used for exploitation, and what future trends look like. This involves but is not limited to:

  • Close collaboration with the external security research community
  • Reading journals and attending conferences
  • Monitoring techniques used by malware
  • Following security research trends in security communities
  • Participating in external efforts and projects such as KSPP, syzbot, LLVM, Rust, and more

All of these data sources provide feedback for the overall security hardening strategy, where new mitigations should be deployed, and what existing security mitigations should be improved.

Reasoning About Security Hardening

Hardening and Mitigations

Analyzing the data reveals areas where broader mitigations can eliminate entire classes of vulnerabilities. For instance, if parts of the platform show a large number of vulnerabilities due to integer overflow bugs, they are good candidates to enable Undefined Behavior Sanitizer (UBSan) mitigations such as the Integer Overflow Sanitizer. When common patterns in memory access vulnerabilities appear, they inform efforts to build hardened memory allocators (enabled by default in Android 11) and implement mitigations (such as CFI) against exploitation techniques that provide better resilience against memory overflows or Use-After-Free vulnerabilities.

Before discussing how the data can be used, it is important to understand how we classify our overall efforts in hardening the platform. There are a few broadly defined buckets that hardening techniques and mitigations fit into (though sometimes a particular mitigation may not fit cleanly into any single one):

  • Exploit mitigations
    • Deterministic runtime prevention of vulnerabilities detects undefined or unexpected behavior and aborts execution when the behavior is detected. This turns potential memory corruption vulnerabilities into less harmful crashes. Often these mitigations can be enabled selectively and still be effective because they impact individual bugs. Examples include Integer Sanitizer and Bounds Sanitizer.
    • Exploitation technique mitigations target the techniques used to pivot from one vulnerability to another or to gain code execution. These mitigations theoretically may render some vulnerabilities useless, but more often serve to constrain the actions available to attackers seeking to exploit vulnerabilities. This increases the difficulty of exploit development in terms of time and resources. These mitigations may need to be enabled across an entire process's memory space to be effective. Examples include Address Space Layout Randomization, Control Flow Integrity (CFI), Stack Canaries and Memory Tagging.
    • Compiler transformations that change undefined behavior to defined behavior at compile-time. This prevents attackers from taking advantage of undefined behavior such as uninitialized memory. An example of this is stack initialization.
  • Architectural decomposition
    • Splits larger, more privileged components into smaller pieces, each of which has fewer privileges than the original. After this decomposition, a vulnerability in one of the smaller components will have reduced severity by providing less access to the system, lengthening exploit chains, and making it harder for an attacker to gain access to sensitive data or additional privilege escalation paths.
  • Sandboxing/isolation
    • Related to architectural decomposition, enforces a minimal set of permissions/capabilities that a process needs to correctly function, often through mandatory and/or discretionary access control. Like architectural decomposition, this makes vulnerabilities in these processes less valuable as there are fewer things attackers can do in that execution context, by applying the principle of least privilege. Some examples are Android Permissions, Unix Permissions, Linux Capabilities, SELinux, and Seccomp.
  • Migrating to memory-safe languages
    • C and C++ do not provide memory safety the way that languages like Java, Kotlin, and Rust do. Given that the majority of security vulnerabilities reported to Android are memory safety issues, a two-pronged approach is applied: improving the safety of C/C++ while also encouraging the use of memory safe languages.

Enabling these mitigations

With the broad arsenal of mitigation techniques available, which of these to employ and where to apply them depends on the type of problem being solved. For instance, a monolithic process that handles a lot of untrusted data and does complex parsing would be a good candidate for all of these. The media frameworks provide an excellent historical example where an architectural decomposition enabled incrementally turning on more exploit mitigations and deprivileging.

Architectural decomposition and isolation of the Media Frameworks over time

Remotely reachable attack surfaces such as NFC, Bluetooth, WiFi, and media components have historically housed the most severe vulnerabilities, and as such these components are also prioritized for hardening. These components often contain some of the most common vulnerability root causes that are reported in the VRP, and we have recently enabled sanitizers in all of them.

Libraries and processes that enforce or sit at security boundaries, such as libbinder, and widely-used core libraries such as libui, libcore, and libcutils are good targets for exploit mitigations since these are not process-specific. However, due to performance and stability sensitivities around these core libraries, mitigations need to be supported by strong evidence of their security impact.

Finally, the kernel’s high level of privilege makes it an important target for hardening as well. Because different codebases have different characteristics and functionality, susceptibility to and prevalence of certain kinds of vulnerabilities will differ. Stability and performance of mitigations here are exceptionally important to avoid negatively impacting the user experience, and some mitigations that make sense to deploy in user space may not be applicable or effective. Therefore our considerations for which hardening strategies to employ in the kernel are based on a separate analysis of the available kernel-specific data.

This data-driven approach has led to tangible and measurable results. Starting in 2015 with Stagefright, a large number of Critical severity vulnerabilities were reported in Android's media framework. These were especially sensitive because many of these vulnerabilities were remotely reachable. This led to a large architectural decomposition effort in Android Nougat, followed by additional efforts to improve our ability to patch media vulnerabilities quickly. Thanks to these changes, in 2020 we had no internet-reachable Critical severity vulnerabilities reported to us in the media frameworks.

Deployment Considerations

Some of these mitigations provide more value than others, so it is important to focus engineering resources where they are most effective. This involves weighing the performance cost of each mitigation as well as how much work is required to deploy it and support it without negatively affecting device stability or user experience.

Performance

Understanding the performance impact of a mitigation is a critical step toward enabling it. Adding too much overhead to some components or the entire system can negatively impact user experience by reducing battery life and making the device less responsive. This is especially true for entry-level devices, which should benefit from hardening as well. We thus want to prioritize engineering efforts on impactful mitigations with acceptable overheads.

When investigating performance, important factors include not just CPU time but also memory increase, code size, battery life, and UI jank. These factors are especially important to consider for more constrained entry-level devices, to ensure that the mitigations perform well across the entire Android ecosystem.

The system-wide performance impact of a mitigation is also dependent on where that mitigation is enabled, as certain components are more performance-sensitive than others. For example, binder is one of the most used paths for interprocess communication, so even small additional overhead could significantly impact user experience on a device. On the other hand, video players only need to ensure that frames are rendered at the source framerate; if frames are rendered much faster than the rate at which they are displayed, additional overhead may be more acceptable.
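As a rough worked example of the video player case, the headroom left for a mitigation can be estimated from the frame deadline. The numbers below (a 30 fps source and 12 ms of work per frame) are hypothetical and not taken from any real measurement, and the function names are made up for this sketch.

```python
def frame_budget_ms(source_fps: float) -> float:
    """Time available to produce one frame at the source framerate."""
    return 1000.0 / source_fps


def max_tolerable_overhead(source_fps: float, frame_cost_ms: float) -> float:
    """Largest fractional slowdown that still meets the frame deadline.

    Returns 0.0 if the component is already missing its deadline.
    """
    budget = frame_budget_ms(source_fps)
    if frame_cost_ms >= budget:
        return 0.0
    return budget / frame_cost_ms - 1.0


# Hypothetical numbers, for illustration only.
headroom = max_tolerable_overhead(source_fps=30.0, frame_cost_ms=12.0)
print(f"Decoder can slow down by up to {headroom:.0%} before dropping frames")
```

This only models the frame deadline. Battery life, memory, and code size, which also matter, would have to be measured separately.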

Benchmarks, if available, can be extremely useful to evaluate the performance impact of a mitigation. If there are no benchmarks for a certain component, new ones should be created, for instance by calling impacted codec code to decode a media file. If this testing reveals unacceptable overhead, there are often a few options to address it:

  • Selectively disable the mitigation in performance-sensitive functions identified during benchmarks. A small number of functions are often responsible for a large part of the runtime overhead, so disabling the mitigation in those functions can maximize the security benefit while minimizing the performance cost. Here is an example of this in one of the media codecs. These exempted functions must be manually reviewed for bugs to reduce the risk of disabling the mitigation there.
  • Optimize the implementation of the mitigation to improve its performance. This often involves modifying the compiler. For example, our team has upstreamed optimizations to the Integer Overflow Sanitizer and the Bounds Sanitizer.
  • Certain mitigations, such as the Scudo allocator’s built-in robustness against heap-based vulnerabilities, have tunable parameters that can be tweaked to improve performance.
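
The first option above can be sketched as a small selection problem: given per-function overhead from a benchmark, find the fewest functions to exempt in order to recover a target share of the cost. This is only an illustration of the reasoning, not the tooling the team uses; the function names and numbers are hypothetical.

```python
def functions_to_exempt(overhead_by_function: dict[str, float],
                        target_reduction: float) -> list[str]:
    """Pick the fewest functions whose combined overhead reaches the target.

    overhead_by_function maps a function name to the extra runtime (in any
    consistent unit) that the mitigation adds inside that function.
    target_reduction is the fraction of the total overhead to eliminate.
    """
    total = sum(overhead_by_function.values())
    if total <= 0:
        return []
    chosen, removed = [], 0.0
    # Greedy: taking the most expensive functions first minimizes the count.
    for name, cost in sorted(overhead_by_function.items(),
                             key=lambda item: item[1], reverse=True):
        if removed >= target_reduction * total:
            break
        chosen.append(name)
        removed += cost
    return chosen


# Hypothetical benchmark result for a codec (extra microseconds per frame).
measured = {"idct_block": 60, "deblock_filter": 25, "parse_header": 10, "init": 5}
print(functions_to_exempt(measured, target_reduction=0.8))
```

Every function returned by such a selection loses the mitigation's protection, which is why the exempted functions still need manual review.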

Most of these improvements involve changes or contributions to the LLVM project. By working with upstream LLVM, these improvements have impact and benefit beyond Android. At the same time Android benefits from upstream improvements when others in the LLVM community make improvements as well.

Deployment and Support

There is more to consider when enabling a mitigation than its security benefit and performance cost, such as the cost of short-term deployment and long-term support.

Deployment Stability Considerations

One important issue is whether a mitigation can contain false positives. For example, if the Bounds Sanitizer produces an error, there is definitely an out-of-bounds access (although it might not be exploitable). But the Integer Overflow Sanitizer can produce false positives, as many integer overflows are harmless or even perfectly expected and correct.
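
The difference can be illustrated with a short sketch. Python integers do not overflow, so the code below simulates 32-bit unsigned arithmetic with a mask; the helper names are made up for this example. The hash function wraps on purpose, while the size computation wraps by accident, yet an overflow check sees the same event in both.

```python
U32_MAX = 0xFFFFFFFF


def mul_u32(a: int, b: int) -> tuple[int, bool]:
    """Multiply like a 32-bit unsigned C integer: wrap and report overflow."""
    exact = a * b
    return exact & U32_MAX, exact > U32_MAX


def fnv1a_32(data: bytes) -> tuple[int, int]:
    """FNV-1a hash; returns (hash, number of multiplications that wrapped)."""
    h, wraps = 0x811C9DC5, 0
    for byte in data:
        h, overflowed = mul_u32(h ^ byte, 0x01000193)
        wraps += overflowed
    return h, wraps


def alloc_size(count: int, elem_size: int) -> tuple[int, bool]:
    """Buffer size computed in 32 bits; wrapping here under-allocates."""
    return mul_u32(count, elem_size)


digest, wraps = fnv1a_32(b"a")
print(hex(digest), wraps)           # wrapping is intended by the hash design
print(alloc_size(0x4000_0001, 4))   # wraps to a 4-byte buffer: a real bug
```

A sanitizer that aborts on every wrap would stop both programs, so the hash would need an explicit exemption before the mitigation could be enabled safely.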

It is thus important to consider the impact of a mitigation on the stability of the system. Whether a crash is due to a false positive or a legitimate security issue, it still disrupts the user experience and so is undesirable. This is another reason to carefully consider which components should have which mitigations, as crashes in some components are worse than others. If a mitigation causes a crash in a media codec, the user’s video playback will be stopped, but if netd crashes during an update, the phone could be bricked. For a mitigation like Bounds Sanitizer, where false positives are not an issue, we still need to perform extensive testing to ensure the device remains stable. Off-by-one errors, for example, may not crash during normal operation, but Bounds Sanitizer would abort execution and result in instability.

Another consideration is whether it is possible to enumerate everything a mitigation might break. For example, it is not easy to contain the risk of the Integer Overflow Sanitizer without extensive testing, as it is difficult to determine which overflows are intentional/benign (and thus should be allowed) and which could lead to vulnerabilities.

Support

We must consider not just issues caused by deploying mitigations but also how to support them long-term. This includes the developer time to integrate a mitigation into existing systems, enable and debug it, deploy it onto devices, and support it after launch. SELinux is a good example of this; it takes a significant amount of effort to write the policy for a new device, and even once enforcing mode is enabled, the policy must be supported for years as code changes and functionality is added or removed.

We try to make mitigations less disruptive and spread awareness of how they affect developers. This is done by making documentation available on source.android.com and by improving existing algorithms to reduce false positives. Making it easier to debug mitigations when something goes wrong reduces the developer maintenance burden that can accompany mitigations. For example, when developers found it difficult to identify UBSan errors, we enabled support for the UBSan Minimal Runtime by default in the Android build system. The minimal runtime itself was first upstreamed by others at Google specifically for this purpose. When the Integer Overflow Sanitizer aborts a program, the minimal runtime adds the following hint to the generic SIGABRT crash message:

    Abort message: 'ubsan: sub-overflow'

Developers who see this message then know to enable diagnostics mode, which prints out details about the crash:

    frameworks/native/services/surfaceflinger/SurfaceFlinger.cpp:2188:32: runtime error: unsigned integer overflow: 0 - 1 cannot be represented in type 'size_t' (aka 'unsigned long')

Similarly, upstream SELinux provides a tool called audit2allow that can be used to suggest rules to allow blocked behaviors:

    adb logcat -d | audit2allow -p policy

    #============= rmt ==============
    allow rmt kmem_device:chr_file { read write };
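
To show the kind of transformation involved, here is a simplified sketch that turns AVC denial lines into candidate allow rules. It is not how audit2allow is implemented (the real tool also consults the policy passed with -p and handles many more cases), and the sample denial lines are made up to match the rule shown above.

```python
import re
from collections import defaultdict

DENIAL = re.compile(
    r"avc:\s+denied\s+\{(?P<perms>[^}]*)\}.*?"
    r"scontext=(?P<scontext>\S+).*?"
    r"tcontext=(?P<tcontext>\S+).*?"
    r"tclass=(?P<tclass>\S+)"
)


def type_of(context: str) -> str:
    """Extract the type field from a user:role:type[:level] context."""
    return context.split(":")[2]


def suggest_rules(log_text: str) -> list[str]:
    """Turn AVC denial lines into candidate allow rules, merging permissions."""
    perms_by_key = defaultdict(set)
    for match in DENIAL.finditer(log_text):
        key = (type_of(match["scontext"]), type_of(match["tcontext"]), match["tclass"])
        perms_by_key[key].update(match["perms"].split())
    return [
        f"allow {src} {tgt}:{cls} {{ {' '.join(sorted(perms))} }};"
        for (src, tgt, cls), perms in sorted(perms_by_key.items())
    ]


# Made-up denial lines, for illustration only.
sample_log = (
    'avc: denied { read } for pid=1 comm="rmt" name="mem" dev="tmpfs" '
    "scontext=u:r:rmt:s0 tcontext=u:object_r:kmem_device:s0 tclass=chr_file permissive=0\n"
    'avc: denied { write } for pid=1 comm="rmt" name="mem" dev="tmpfs" '
    "scontext=u:r:rmt:s0 tcontext=u:object_r:kmem_device:s0 tclass=chr_file permissive=0\n"
)
for rule in suggest_rules(sample_log):
    print(rule)
```

As with the real tool, a suggested rule is only a starting point: allowing every denied access would defeat the purpose of the policy.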

A debugging tool does not need to be perfect to be helpful; audit2allow does not always suggest the correct options, but for developers without detailed knowledge of SELinux it provides a strong starting point.

Conclusion

With every Android release, our team works hard to balance security improvements that benefit the entire ecosystem with performance and stability, drawing heavily from the data that are available to us. We hope that this sheds some light on the particular challenges involved and the overall process that leads to mitigations introduced in each Android release.

Thank you to Jeff Vander Stoep for contributions to this blog post.

New Year, new password protections in Chrome

Passwords help protect our online information, which is why it’s never been more important to keep them safe. But when we’re juggling dozens (if not hundreds!) of passwords across various websites—from shopping, to entertainment to personal finance—it feels like there’s always a new account to set up or manage. While it’s definitely a best practice to have a strong, unique password for each account, it can be really difficult to remember them all—that’s why we have a password manager in Chrome to back you up.

As you browse the web on your phone, computer or tablet, Chrome can create, store and fill in your passwords with a single click. After you log in to a site, we'll warn you if your password has been compromised, and you can always check for yourself in Chrome Settings. As we kick off the New Year, we’re excited to announce new updates that will give you even greater control over your passwords:

Easily fix weak passwords

We’ve all had moments where we’ve rushed to set up a new login, choosing a simple “name-of-your-pet” password to get set up quickly. However, weak passwords expose you to security risks and should be avoided. In Chrome 88, you can now complete a simple check to identify any weak passwords and take action easily.

To check your passwords, click on the key icon under your profile image, or type chrome://settings/passwords in your address bar.

Edit your passwords in one place

Chrome can already prompt you to update your saved passwords when you log in to websites. However, you may want to update multiple usernames and passwords easily, in one convenient place. That’s why starting in Chrome 88, you can manage all of your passwords more quickly and easily in Chrome Settings on desktop and iOS (Chrome’s Android app will be getting this feature soon, too).

Building on the 2020 improvements

These new updates come on top of many improvements from last year which have all contributed to your online safety and make browsing the web even easier:

  • Password breaches remain a critical concern online. So we’re proud to share that Chrome’s Safety Check is used 14 million times every week! As a result of Safety Check and other improvements launched in 2020, we’ve seen a 37% reduction in compromised credentials stored in Chrome.
  • Starting last September, iOS users were able to autofill their saved passwords in other apps and browsers. Today, Chrome is streamlining 3 million sign-ins across iOS apps every week! We also made password filling more secure for users of Chrome on iOS by adding biometric authentication (coming soon to Chrome on Android).
  • We’re always looking for ways to improve the user experience, so we made the password manager easier to use on Android with features like Touch-to-fill.

The new features with Chrome 88 will be rolled out over the coming weeks, so take advantage of the new updates to keep your passwords secure. Stay tuned for more great password features throughout 2021.

Announcing Bonus Rewards for V8 Exploits

Starting today, the Chrome Vulnerability Rewards Program is offering a new bonus for reports which demonstrate exploitability in V8, Chrome’s JavaScript engine. We have historically had many great V8 bugs reported (thank you to all of our reporters!) but we'd like to know more about the exploitability of different V8 bug classes, and what mechanisms are effective to go from an initial bug to a full exploit. That's why we're offering this additional reward for bugs that show how a V8 vulnerability could be used as part of a real world attack.

In the past, exploits had to be fully functional to be rewarded at our highest tier, high-quality report with functional exploit. Demonstration of how a bug might be exploited is one factor that the panel may use to determine that a report is high-quality, our second highest tier, but we want to encourage more of this type of analysis. This information is very useful for us when planning future mitigations, making release decisions, and fixing bugs faster. We also know it requires a bit more effort for our reporters, and that effort should be rewarded. For the time being this only applies to V8 bugs, but we’re curious to see what our reporters come up with!

The full details are available on the Chrome VRP rules page. At a high level, we’re offering increased reward amounts, up to double, for qualifying V8 bugs.

The following table shows the updated reward amounts for reports qualifying for this new bonus. These new, higher values replace the normal reward. If a bug in V8 doesn’t fit into one of these categories, it may still qualify for an increased reward at the panel’s discretion.

[1] Baseline reports are unable to meet the requirements to qualify for this special reward.

So what does a report need to do to demonstrate that a bug is likely exploitable? Any V8 bug report which would have previously been rewarded at the high-quality report with functional exploit level will likely qualify with no additional effort from the reporter. By definition, these demonstrate that the issue was exploitable. V8 reports at the high-quality level may also qualify if they include evidence that the bug is exploitable as part of their analysis. See the rules page for more information about our reward levels.

The following are some examples of how a report could demonstrate that exploitation is likely, but any analysis or proof of concept will be considered by the panel:

  • Executing shellcode from the context of Chrome or d8 (V8’s developer shell)
  • Creating an exploit primitive that allows arbitrary reads from or writes to specific addresses or attacker-controlled offsets
  • Demonstrating instruction pointer control
  • Demonstrating an ASLR bypass by computing the memory address of an object in a way that’s exposed to script
  • Providing analysis of how a bug could lead to type confusion with a JSObject

For example reports, see issues 914736 and 1076708.

We’d like to thank all of our VRP reporters for helping us keep Chrome users safe! We look forward to seeing what you find.

-The Chrome Vulnerability Rewards Panel

Privacy-preserving features in the Mobile Driving License

In the United States and other countries, a driver's license is used not only to convey driving privileges but also, commonly, to prove identity or personal details.

Presenting a Driving License is simple, right? You hand over the card to the individual wishing to confirm your identity (the so-called “Relying Party” or “Verifier”); they check the security features of the plastic card (hologram, micro-printing, etc.) to ensure it’s not counterfeit; they check that it’s really your license, making sure you look like the portrait image printed on the card; and they read the data they’re interested in, typically your age, legal name, address etc. Finally, the verifier needs to hand back the plastic card.

Most people are so familiar with this process that they don’t think twice about it, or consider the privacy implications. In the following we’ll discuss how the new and soon-to-be-released ISO 18013-5 standard will improve on nearly every aspect of the process, and what it has to do with Android.

Mobile Driving License ISO Standard

The ISO 18013-5 “Mobile driving licence (mDL) application” standard has been written by a diverse group of people representing driving license issuers (e.g. state governments in the US), relying parties (federal and state governments, including law enforcement), academia, industry (including Google), and many others. This ISO standard allows for construction of Mobile Driving License (mDL) applications which users can carry in their phone and can use instead of the plastic card.

Instead of handing over your plastic card, you open the mDL application on your phone and press a button to share your mDL. The Verifier (aka “Relying Party”) has their own device with an mDL reader application and they either scan a QR code shown in your mDL app or do an NFC tap. The QR code (or NFC tap) conveys an ephemeral cryptographic public key and hardware address the mDL reader can connect to.

Once the mDL reader obtains the cryptographic key, it creates its own ephemeral keypair and establishes an encrypted and authenticated wireless channel (BLE, Wi-Fi Aware or NFC). The mDL reader uses this secure channel to request data, such as the portrait image or what kinds of vehicles you're allowed to drive, and can also use it to ask more abstract questions such as “is the holder older than 18?”
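
The shape of this exchange can be sketched with a toy Diffie-Hellman key agreement using only the Python standard library. Everything here is an assumption made for illustration: the group parameters are far too small to be secure, the function names are made up, and neither the key agreement nor the key derivation is claimed to match the algorithms that ISO 18013-5 specifies. The sketch also leaves out the authentication step.

```python
import hashlib
import secrets

# Toy parameters, NOT secure: a 127-bit Mersenne prime and a small base.
P = 2**127 - 1
G = 3


def ephemeral_keypair() -> tuple[int, int]:
    """Create a fresh (private, public) pair used for one transaction only."""
    private = secrets.randbelow(P - 3) + 2      # uniformly in [2, P - 2]
    return private, pow(G, private, P)


def session_key(own_private: int, peer_public: int) -> bytes:
    """Derive a shared symmetric key from our private and the peer's public key."""
    shared = pow(peer_public, own_private, P)
    return hashlib.sha256(shared.to_bytes(16, "big")).digest()


# 1. The mDL app makes an ephemeral key and shows the public half (QR code or NFC).
mdl_private, mdl_public = ephemeral_keypair()
# 2. The reader makes its own ephemeral key and sends back its public half.
reader_private, reader_public = ephemeral_keypair()
# 3. Both sides now compute the same key without ever transmitting it.
assert session_key(mdl_private, reader_public) == session_key(reader_private, mdl_public)
```

Because both keypairs are discarded after the transaction, the public key shown in the QR code cannot be used to link two presentations of the same mDL.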

Crucially, the mDL application can ask the user to approve which data to release and may require the user to authenticate with fingerprint or face — none of which a passive plastic card could ever do.

With this explanation in mind, let’s see how presenting an mDL application compares with presenting a plastic-card driving license:

  • Your phone need not be handed to the verifier, unlike your plastic card. The first step, which requires closer contact with the verifier to scan the QR code or tap the NFC reader, is safe from a data privacy point of view, and does not reveal any identifying information to the verifier. For additional protection, mDL apps will have the option of both requiring user authentication before releasing data and then immediately placing the phone in lockdown mode, to ensure that if the verifier takes the device they cannot easily get information from it.
  • All data is cryptographically signed by the Issuing Authority (for example the DMV who issued the mDL) and the verifier's app automatically validates the authenticity of the data transmitted by the mDL and refuses to display inauthentic data. This is far more secure than holograms and microprinting used in plastic cards where verification requires special training which most (human) verifiers don't receive. With most plastic cards, fake IDs are relatively easy to create, especially in an international context, putting everyone’s identity at risk.
  • The amount of data presented by the mDL is minimized: only data the user elects to release, either explicitly via prompts or implicitly via e.g. pre-approval and user settings, is released. This minimizes potential data abuse and increases the personal safety of users.

    For example, a bartender who checks your mDL solely to verify that you’re old enough to buy a drink needs only a single piece of information: whether the holder is, e.g., older than 21, yes or no. Compared to the plastic card, this is a huge improvement; a plastic card shows all your data even if the verifier doesn’t need it.

    Additionally, on a plastic card all of this information is available via a 2D barcode on the back, and in some states it’s common for the cashier to scan your license when you use it to buy beer, tobacco, or other restricted items at a store. In some cases, this merely means you may get advertising in the mail, but the store may also sell your identifying information to the highest bidder or, worst case, leak its whole database.
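
The data minimization described in the last point can be sketched as a filter over the requested data elements. The element names, the credential contents, and the function names are hypothetical and chosen for readability; they are not the identifiers defined by the standard. The sketch also computes the age answer locally only to stay self-contained, whereas a real mDL releases issuer-signed data.

```python
from datetime import date


def age_in_years(birth_date: date, today: date) -> int:
    """Whole years elapsed between birth_date and today."""
    had_birthday = (today.month, today.day) >= (birth_date.month, birth_date.day)
    return today.year - birth_date.year - (0 if had_birthday else 1)


def build_response(credential: dict, requested: list[str],
                   approved: set[str], today: date) -> dict:
    """Release only the requested elements that the holder has approved."""
    derived = {
        "age_over_21": age_in_years(credential["birth_date"], today) >= 21,
    }
    available = {**credential, **derived}
    return {name: available[name]
            for name in requested
            if name in approved and name in available}


# Hypothetical credential and request, for illustration only.
credential = {
    "family_name": "Doe",
    "birth_date": date(1990, 5, 17),
    "address": "123 Example Street",
}
response = build_response(
    credential,
    requested=["age_over_21", "birth_date", "address"],
    approved={"age_over_21"},
    today=date(2021, 1, 1),
)
print(response)   # only the yes/no answer leaves the device
```

Even though the reader asked for the birth date and address, the holder's approval limits the response to the single boolean the bartender actually needs.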

These are some of the reasons why we think mDL is a big win for end users in terms of privacy.

One commonality between plastic-card driving licences and the mDL is how the relying party verifies that the person presenting the license is the authorized holder. In both cases, the verifier manually compares the appearance of the individual against a portrait photo, either printed on the plastic or transmitted electronically. Research has shown that it’s hard for individuals to match strangers to portrait images.

The initial version of ISO 18013-5 won’t improve on this but the ISO committee working on the standard is already investigating ways to utilize on-device biometrics sensors to perform this match in a secure and privacy-protecting way. The hope is that improved fidelity in the process helps reduce unauthorized use of identity documents.

mDL support in Android

Through facilities such as hardware-based Keystore, Android already offers excellent support for security and privacy-sensitive applications and in fact it’s already possible to implement the ISO 18013-5 standard on Android without further platform changes. Many organizations participating in the ISO committee have already implemented 18013-5 Android apps.

That said, with purpose-built support in the operating system it is possible to provide better security and privacy properties. Android 11 includes the Identity Credential APIs at the Framework level along with a Hardware Abstraction Layer interface which can be implemented by Android OEMs to enable identity credential support in Secure Hardware. Using the Identity Credential API, the Trusted Computing Base of mDL applications does not include the application or even Android itself. This will be particularly important for future versions where the verifier must trust the device to identify and authenticate the user, for example through fingerprint or face matching on the holder's own device. It’s likely such a solution will require certified hardware and/or software and certification is not practical if the TCB includes the hundreds of millions of lines of code in Android and the Linux kernel.

One advantage of plastic cards is that they don't require power or network communication to be useful. Putting all your licenses on your phone could seem inconvenient in cases where your device is low on battery, or does not have enough battery life to start. The Android Identity Credential HAL therefore provides support for a mode called Direct Access, where the license is still available through an NFC tap even when the phone's battery is too low to boot it up. Device makers can implement this mode, but it will require hardware support that will take several years to roll out.

For devices without the Identity Credential HAL, we have an Android Jetpack library which implements the same API and works on nearly every Android device in the world (API level 24 or later). If the device has hardware-backed Identity Credential support then this library simply forwards calls to the platform API. Otherwise, an Android Keystore-backed implementation will be used. While the Android Keystore-backed implementation does not provide the same level of security and privacy, it is perfectly adequate for both holders and issuers in cases where all data is issuer-signed. Because of this, the Jetpack library is the preferred way to use the Identity Credential APIs. We also made available sample open-source mDL and mDL Reader applications using the Identity Credential APIs.

Conclusion

Android now includes APIs for managing and presenting identity documents in a more secure and privacy-focused way than was previously possible. These can be used to implement ISO 18013-5 mDLs but the APIs are generic enough to be usable for other kinds of electronic documents, from school ID or bonus program club cards to passports.

Additionally, the Android Security and Privacy team actively participates in the ISO committees where these standards are written and also works with civil liberties groups to ensure this work has a positive impact on our end users.