
Pigweed: A collection of embedded libraries

We’re excited to announce Pigweed, an open source collection of embedded-targeted libraries, or as we like to call them, modules. Pigweed modules are built to enable faster and more reliable development on 32-bit microcontrollers.

Pigweed is in early development and is not suitable for production use at this time.

Getting Started with Pigweed

As of today, the source is available for everyone at pigweed.googlesource.com under an Apache 2.0 license. Instructions on getting up and running can be found in the README.

See Pigweed in action

Pigweed offers modules that address a wide range of embedded developer needs. These highlight how Pigweed can accelerate embedded development across the entire lifecycle:
  • Setup: Get started faster with simplified setup via a virtual environment with all required tools
  • Development: Accelerated edit-compile-flash-test cycles with watchers and distributed testing
  • Code Submission: Pre-configured code formatting and integrated presubmit checks

SETUP

A classic challenge in the embedded space is reducing the time from running git clone to having a binary executing on a device. Oftentimes, an entire suite of tools is needed for non-trivial production embedded projects. For example, your project likely needs:
  • A C++ compiler for your target device, and also for your host
  • A build system (or three); for example, GN, Ninja, CMake, Bazel
  • A code formatting program like clang-format
  • A debugger like OpenOCD to flash and debug your embedded device
  • A known Python version with known modules installed for scripting
  • … and so on
Below is a system with Python 2.7, clang-format, and no ARM compiler. The bootstrap script in Pigweed’s pw_env_setup module sets up the current shell with access to a standardized set of tools, including Python 3.8, clang-format, and an ARM compiler. All of this is done in a virtual environment, so the system’s default environment remains unmodified.

DEVELOPMENT

In typical embedded development, adding even a small change involves the following additional manual steps:
  • Re-building the image
  • Flashing the image to a device
  • Ensuring that the change works as expected
  • Verifying that existing tests continue to pass
This is a huge disparity from web development workflows where file watchers are prevalent—you save a file and instantly see the results of the change.

Pigweed’s pw_watch module solves this inefficiency directly, providing a watcher that automatically invokes a build when a file is saved, and also runs the specific tests affected by the code changes. This drastically reduces the edit-compile-flash-test cycle for changes.
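The core idea can be sketched in a few lines of Python. The directory layout and build commands below are placeholders for illustration, not pw_watch’s actual implementation:

# Minimal sketch of the watch-build-test idea (not Pigweed's pw_watch API):
# poll source files for changes, then rebuild and re-run tests.
import os
import subprocess
import time

WATCHED_DIRS = ["src", "tests"]           # hypothetical project layout
BUILD_CMD = ["ninja", "-C", "out"]        # hypothetical build invocation
TEST_CMD = ["ninja", "-C", "out", "test"]

def snapshot():
    """Return a mapping of source file -> last modification time."""
    mtimes = {}
    for root_dir in WATCHED_DIRS:
        for dirpath, _, filenames in os.walk(root_dir):
            for name in filenames:
                path = os.path.join(dirpath, name)
                mtimes[path] = os.path.getmtime(path)
    return mtimes

def watch(poll_seconds=1.0):
    previous = snapshot()
    while True:
        time.sleep(poll_seconds)
        current = snapshot()
        if current != previous:           # something was saved, added, or deleted
            previous = current
            subprocess.run(BUILD_CMD, check=False)
            subprocess.run(TEST_CMD, check=False)

if __name__ == "__main__":
    watch()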



In the demo above, the pw_watch module (on the right) does the following:
  • Detects source file changes
  • Builds the affected libraries, tests, and binaries
  • Flashes the tests to the device (in this case an STM32F429i Discovery board)
  • Runs the specific unit tests
There’s no need to leave your code editor—all of this is done automatically. You can save additional time by using the pw_target_runner module to run tests in parallel across multiple devices.

CODE SUBMISSION

When developing code as part of a team, a consistent code style is an important part of a healthy codebase. However, setting up linters, configuring code formatting, and adding automated presubmit checks is work that often gets delayed indefinitely.

Pigweed’s pw_presubmit module provides an off-the-shelf, integrated suite of linters, based on tools that you’ve probably already used, pre-configured for immediate use by microcontroller developers. This means that your project can have linting and automatic formatting presubmit checks from its inception.
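The underlying pattern is simple. Here is a minimal, hypothetical sketch of a presubmit runner (not pw_presubmit’s actual API; the particular checks are placeholders) that applies a set of checks to the files touched in a change and fails if any of them fail:

# Hypothetical presubmit runner: run each check over the changed files and
# report failures. The check commands are illustrative, not Pigweed's set.
import subprocess
import sys

CHECKS = [
    ["black", "--check"],   # Python formatting check
    ["pylint"],             # Python lint
]

def run_presubmit(changed_files):
    """Return the names of checks that failed for the given files."""
    failures = []
    for check in CHECKS:
        result = subprocess.run(check + changed_files)
        if result.returncode != 0:
            failures.append(check[0])
    return failures

if __name__ == "__main__":
    failed = run_presubmit(sys.argv[1:])
    sys.exit(1 if failed else 0)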

And a bunch of other modules

There are many modules in addition to the ones highlighted above...
  • pw_tokenizer – A module that converts strings to binary tokens at compile time. This enables logging with dramatically less overhead in flash, RAM, and CPU usage (a rough sketch of the idea appears after this list).
  • pw_string – Provides the flexibility, ease-of-use, and safety of C++-style string manipulation, but with no dynamic memory allocation and a much smaller binary size impact. Using pw_string in place of the standard C functions eliminates issues related to buffer overflow or missing null terminators.
  • pw_bloat – A module that generates memory reports for output binaries, giving developers information about the memory impact of any change.
  • pw_unit_test – Unit testing is important, and Pigweed offers a portable library that’s broadly compatible with Google Test. Unlike Google Test, pw_unit_test is built on top of embedded-friendly primitives; for example, it does not use dynamic memory allocation. Additionally, it is easy to port to new target platforms by implementing the test event handler interface.
  • pw_kvs – A key-value-store implementation for flash-backed persistent storage with integrated wear levelling. This is a lightweight alternative to a file system for embedded devices.
  • pw_cpu_exception_armv7m – Robust low level hardware fault handler for ARM Cortex-M; the handler even has unit tests written in assembly to verify nested-hardware-fault handling!
  • pw_protobuf – An early preview of our wire-format-oriented protocol buffer implementation. This protobuf compiler makes a different set of implementation tradeoffs than the most popular protocol buffer library in this space, nanopb.
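To make the pw_tokenizer idea concrete, here is a rough illustration of tokenized logging. The hash and wire format below are made up for the example and are not Pigweed’s actual scheme:

# Illustration of tokenized logging: only a fixed-size token plus the
# arguments are emitted; the host uses a token database to rebuild the text.
import struct
import zlib

def tokenize(format_string: str) -> int:
    """Map a log format string to a stable 32-bit token."""
    return zlib.crc32(format_string.encode("utf-8"))

def encode_log(format_string: str, *args: int) -> bytes:
    """Emit a compact binary record: 4-byte token + packed integer args."""
    token = tokenize(format_string)
    return struct.pack("<I%di" % len(args), token, *args)

# Host-side token database (token -> original string) rebuilds the message.
TOKEN_DB = {tokenize(s): s for s in ["Battery level: %d%%", "Boot complete in %d ms"]}

record = encode_log("Battery level: %d%%", 87)
token, value = struct.unpack("<Ii", record)
print(TOKEN_DB[token] % value)   # prints "Battery level: 87%"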

Why name the project Pigweed?

Pigweed, also known as amaranth, is a nutritious grain and leafy salad green that is also a rapidly growing weed. When developing the project that eventually became Pigweed, we wanted to find a name that was fun, playful, and reflective of how we saw Pigweed growing. Teams would start out using one module that catches their eye, and after that goes well, they’d quickly start using more.

So far, so good.

What’s next?

We’re continuing to evolve the collection and add new modules. It’s our hope that others in the embedded community find these modules helpful for their projects.

By Keir Mierle and Mohammed Habibulla, on behalf of the Pigweed team

Google Summer of Code 2020 now open for student applications!

If you’re a university student and want to sharpen your software development skills while doing good for the open source community, check out Google Summer of Code (GSoC) 2020! This will be our 16th year of GSoC!

We are now accepting student applications for our program that introduces university students from around the world to open source software communities, as well as our enthusiastic and generous community of mentors. For three months students code from the comfort of their homes (the program is entirely online!) and receive stipends based on the successful completion of their project milestones.

Past participants say the real-world experience that GSoC provides honed their technical skills, boosted their confidence, expanded their professional network, and enhanced their resume, all while making them better developers.

Interested students can submit proposals on the program site between now and Tuesday, March 31, 2020 at 18:00 UTC.

While many students began preparing in February when we announced the 200 participating open source organizations, it’s not too late for you to start! The first step is to browse the list of organizations and look for project ideas that appeal to you. Next, reach out to the organization to introduce yourself and determine if your skills and interests are a good fit. Since spots are limited, we recommend writing a strong proposal and submitting a draft early so you can communicate with the organization and get their feedback to increase your odds of being selected.

You can learn more about how to prepare by watching the video below and checking out the Student Guide and Advice for Students.



You can find more information on our website, including a full timeline of important dates. We also highly recommend reviewing the FAQ and Program Rules.

Remember to submit your proposals early as you only have until Tuesday, March 31 at 18:00 UTC. Good luck to all who apply!

By Stephanie Taylor, Google Open Source

WebAssembly brings extensibility to network proxies

With the Istio 1.5 release we are happy to introduce WebAssembly (Wasm) extensions in Envoy, built with our long-running collaborators Lyft and IBM. With partners like Solo.io deepening their engagement as well, we are excited to see the community and ecosystem developing around this segment of the open source world.

The Envoy service proxy has taken the Cloud Native landscape by storm since it was open sourced by Lyft in 2016, quickly becoming a fixture in modern app deployment—both at the edge and as a sidecar. Since Google and IBM started the Istio project and selected Envoy as the proxy of choice for service mesh, we have been working with the Envoy community to improve performance and add functionality. In fact, Google now commits more code to Envoy than any other company.

Envoy has always had an extension mechanism, either with compiled-in C++ modules or Lua scripts—both with downsides. One of our design goals with Istio was to bring ease of extensibility to allow an ecosystem of policy, telemetry, and logging systems. We did this with a control plane component and out-of-process adapters that could be written in any language, but this approach introduced additional network hops and latency.

This is where Wasm comes in. Wasm is a binary instruction format, compilable from over 30 languages, with a runtime to execute it in a sandboxed environment. It is already embedded in all major browsers, with a W3C working group defining the standards, and we are now bringing it server-side via Envoy. It allows adding functionality to the Envoy proxy without recompiling it, without forking, and without difficult rollouts. Istio can distribute extensions to proxies and load them without even restarting. This really brings together the best of both worlds in terms of extensibility: choice of language and great performance.

“I am extremely excited to see Wasm support land in Envoy; this is the future of Envoy extensibility, full stop. Envoy’s Wasm support coupled with a community driven hub will unlock an incredible amount of innovation in the networking space across both service mesh and API gateway use cases. I can’t wait to see what the community builds moving forward.” – Matt Klein, Envoy creator

To make developing Wasm extensions a great experience, our partner Solo.io has been hard at work on developer tooling. Solo.io recently announced WebAssembly Hub, a service for building, sharing, discovering, and deploying Wasm extensions. With the WebAssembly Hub, Wasm extensions are as easy to manage, install, and run as containers.

“We are committed to creating the most user friendly developer experience for service mesh. Like Docker did for containers, our goal is to simplify the consumption of WebAssembly extensions, which is the ‘why’ behind WebAssembly Hub. By working with Google and the Istio open source community, we are able to simplify the experience of creating, sharing and deploying WebAssembly extensions to Envoy proxy and Istio, to bring the power of WebAssembly to more languages, and to enable a broader set of developers to innovate on service mesh,” said Idit Levine, CEO and Founder, Solo.io.

One major retailer is looking to use Wasm to integrate with their policy system as they standardize use of Envoy—at the edge, as a sidecar, and even in their stores. The ability to roll out a policy change that is enforced everywhere they serve traffic, all with a great developer experience, makes Wasm a very attractive option for them.

By Dan Ciruli, Istio

Measuring Compositional Generalization

People are capable of learning the meaning of a new word and then applying it to other language contexts. As Lake and Baroni put it, “Once a person learns the meaning of a new verb ‘dax’, he or she can immediately understand the meaning of ‘dax twice’ and ‘sing and dax’.” Similarly, one can learn a new object shape and then recognize it with different compositions of previously learned colors or materials (e.g., in the CLEVR dataset). This is because people exhibit the capacity to understand and produce a potentially infinite number of novel combinations of known components, or as Chomsky said, to make “infinite use of finite means.” In the context of a machine learning model learning from a set of training examples, this skill is called compositional generalization.

A common approach for measuring compositional generalization in machine learning (ML) systems is to split the training and testing data based on properties that intuitively correlate with compositional structure. For instance, one approach is to split the data based on sequence length—the training set consists of short examples, while the test set consists of longer examples. Another approach uses sequence patterns, meaning the split is based on randomly assigning clusters of examples sharing the same pattern to either train or test sets. For instance, the questions "Who directed Movie1" and "Who directed Movie2" both fall into the pattern "Who directed <MOVIE>" so they would be grouped together. Yet another method uses held out primitives—some linguistic primitives are shown very rarely during training (e.g., the verb “jump”), but are very prominent in testing. While each of these experiments is useful, it is not immediately clear which experiment is a "better" measure for compositionality. Is it possible to systematically design an “optimal” compositional generalization experiment?
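For illustration, a length-based split of question-answer pairs might look like the following sketch; the example data and the token threshold are made up:

# Toy illustration of one conventional compositionality split: train on
# short examples, test on longer ones.
def length_based_split(examples, max_train_tokens=8):
    """Split (question, answer) pairs by question length in tokens."""
    train, test = [], []
    for question, answer in examples:
        if len(question.split()) <= max_train_tokens:
            train.append((question, answer))
        else:
            test.append((question, answer))
    return train, test

examples = [
    ("Who directed Inception?", "Christopher Nolan"),
    ("Who directed and produced the sequel to Inception?", "unknown"),
]
train, test = length_based_split(examples, max_train_tokens=4)
# The 3-token question lands in train; the 8-token question lands in test.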

In “Measuring Compositional Generalization: A Comprehensive Method on Realistic Data”, we attempt to address this question by introducing the largest and most comprehensive benchmark for compositional generalization using realistic natural language understanding tasks, specifically, semantic parsing and question answering. In this work, we propose a metric—compound divergence—that allows one to quantitatively assess how much a train-test split measures the compositional generalization ability of an ML system. We analyze the compositional generalization ability of three sequence-to-sequence ML architectures, and find that they fail to generalize compositionally. We are also releasing the Compositional Freebase Questions dataset used in the work as a resource for researchers wishing to improve upon these results.

Measuring Compositionality

In order to measure the compositional generalization ability of a system, we start with the assumption that we understand the underlying principles of how examples are generated. For instance, we begin with the grammar rules to which we must adhere when generating questions and answers. We then draw a distinction between atoms and compounds. Atoms are the building blocks that are used to generate examples, and compounds are concrete (potentially partial) compositions of these atoms. For example, in the figure below, every box is an atom (e.g., Shane Steel, brother, <entity>'s <entity>, produce, etc.), and atoms fit together to form compounds, such as produce and <verb>, Shane Steel’s brother, Did Shane Steel’s brother produce and direct Revenge of the Spy?, etc.
Building compositional sentences (compounds) from building blocks (atoms)


An ideal compositionality experiment then should have a similar atom distribution, i.e., the distribution of words and sub-phrases in the training set is as similar as possible to their distribution in the test set, but with a different compound distribution. To measure compositional generalization on a question answering task about a movie domain, one might, for instance, have the following questions in train and test:

Train set:
  • Who directed Inception?
  • Did Greta Gerwig direct Goldfinger?
  • ...
Test set:
  • Did Greta Gerwig produce Goldfinger?
  • Who produced Inception?
  • ...
While atoms such as “directed”, “Inception”, and “who <predicate> <entity>” appear in both the train and test sets, the compounds are different.

The Compositional Freebase Questions dataset

In order to conduct an accurate compositionality experiment, we created the Compositional Freebase Questions (CFQ) dataset, a simple, yet realistic, large dataset of natural language questions and answers generated from the public Freebase knowledge base. CFQ can be used for text-in / text-out tasks, as well as semantic parsing. In our experiments, we focus on semantic parsing, where the input is a natural language question and the output is a query, which, when executed against Freebase, produces the correct outcome. CFQ contains around 240k examples and almost 35k query patterns, making it significantly larger and more complex than comparable datasets: it is about 4 times the size of WikiSQL and has about 17x more query patterns than Complex Web Questions. Special care has been taken to ensure that the questions and answers are natural. We also quantify the complexity of the syntax in each example using the “complexity level” metric (L), which corresponds roughly to the depth of the parse tree. Examples are shown below.

L = 10: What did Commerzbank acquire? → Eurohypo; Dresdner Bank
L = 15: Did Dianna Rhodes’s spouse produce Soldier Blue? → No
L = 20: Which costume designer of E.T. married Mannequin’s cinematographer? → Deborah Lynn Scott
L = 40: Was Weekend Cowgirls produced, directed, and written by a film editor that The Evergreen State College and Fairway Pictures employed? → No
L = 50: Were It’s Not About the Shawerma, The Fifth Wall, Rick’s Canoe, White Stork Is Coming, and Blues for the Avatar executive produced, edited, directed, and written by a screenwriter’s parent? → Yes

Compositional Generalization Experiments on CFQ

For a given train-test split, if the compound distributions of the train and test sets are very similar, then their compound divergence would be close to 0, indicating that they are not difficult tests for compositional generalization. A compound divergence close to 1 means that the train-test sets have many different compounds, which makes it a good test for compositional generalization. Compound divergence thus captures the notion of "different compound distribution", as desired.
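As a rough sketch of the idea, a divergence between two compound-frequency distributions can be computed with a Chernoff-style similarity coefficient. The weighting (alpha) and the toy compounds below are illustrative, not a faithful reproduction of the paper’s exact formula:

# Sketch of a divergence between two compound-frequency distributions:
# 1 - sum_k p_k^alpha * q_k^(1-alpha), over compounds seen in either set.
from collections import Counter

def normalize(counts):
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()}

def compound_divergence(train_compounds, test_compounds, alpha=0.1):
    """Return 0 for identical distributions, 1 for disjoint compound sets."""
    p = normalize(Counter(train_compounds))
    q = normalize(Counter(test_compounds))
    similarity = sum(
        p.get(k, 0.0) ** alpha * q.get(k, 0.0) ** (1 - alpha)
        for k in set(p) | set(q)
    )
    return 1.0 - similarity

print(compound_divergence(["who directed <m>"] * 3, ["who directed <m>"] * 3))  # 0.0
print(compound_divergence(["who directed <m>"], ["who produced <m>"]))          # 1.0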

We algorithmically generate train-test splits using the CFQ dataset that have a compound divergence ranging from 0 to 0.7 (the maximum that we were able to achieve). We fix the atom divergence to be very small. Then, for each split we measure the performance of three standard ML architectures — LSTM+attention, Transformer, and Universal Transformer. The results are shown in the graph below.
Compound divergence vs accuracy for three ML architectures. There is a surprisingly strong negative correlation between compound divergence and accuracy.

We measure the performance of a model by comparing the correct answers with the output string given by the model. All models achieve an accuracy greater than 95% when the compound divergence is very low. The mean accuracy on the split with highest compound divergence is below 20% for all architectures, which means that even a large training set with a similar atom distribution between train and test is not sufficient for the architectures to generalize well. For all architectures, there is a strong negative correlation between the compound divergence and the accuracy. This seems to indicate that compound divergence successfully captures the core difficulty for these ML architectures to generalize compositionally.

Potentially promising directions for future work might be to apply unsupervised pre-training on input language or output queries, or to use more diverse or more targeted learning architectures, such as syntactic attention. It would also be interesting to apply this approach to other domains such as visual reasoning, e.g. based on CLEVR, or to extend our approach to broader subsets of language understanding, including the use of ambiguous constructs, negations, quantification, comparatives, additional languages, and other vertical domains. We hope that this work will inspire others to use this benchmark to advance the compositional generalization capabilities of learning systems.

By Marc van Zee, Software Engineer, Google Research – Brain Team

USB Keystroke Injection Protection

USB keystroke injection attacks have been an issue for a long time: they are cheap to mount, given the availability and low price of keystroke injection tools, and they send keystrokes immensely fast, in a human eyeblink, while remaining effectively invisible to the victim. Originally intended to ease system administrator tasks, this technology was quickly adopted by attackers to compromise user systems. Here is an attack example, with a more or less benign payload:



To make the life of an attacker harder, we propose a tool that measures the timing of incoming keystrokes and determines if it is an attack based on predefined heuristics (without a user being involved in the decision). In contrast to the successful “attack” shown above, the following shows the same payload but with the tool installed on the system:


Choosing the RUN mode

The tool offers two different modes of operation: MONITOR and HARDENING. When run in MONITOR mode, it won’t block a device that was classified as malicious, but it will write a log line with information about the device to syslog. When run in HARDENING mode, it will immediately block a device that was classified as malicious/attacking. Out of the box, the tool ships in HARDENING mode.

Investigation

If the tool is running in MONITOR mode, it logs information to syslog. For one-time inspection, this log can be read by simply using journalctl:
journalctl -u ukip.service
If it is rolled out to more machines in a network, it makes sense to collect each syslog at a centralized place for investigation.

Choosing the heuristics

A challenge when running the tool is the proper selection of the two main heuristic variables, KEYSTROKE_WINDOW and ABNORMAL_TYPING, which control the behaviour of the tool and its detection capabilities. The first is the number of keystrokes the tool looks at to determine whether it is dealing with an attack. The lower the number, the higher the false positive rate; if it is set to 2, the tool only looks at a single interarrival time (the time between 2 keystrokes), and since users sometimes hit two keys almost at the same time, this leads to false positives. Based on internal observations, 5 is an effective value, but it should be adjusted to the specific user’s experience and typing behavior. The second variable specifies which interarrival times should be classified as malicious. A higher number produces more false positives (normal typing speed will be classified as malicious), while a lower number produces more false negatives (even very fast typing attacks will be classified as benign). That said, the preset of 50000 after initial installation is a safe default, but it should be changed to a number reflecting the typing speed of the user. Finding the proper speed can be achieved in two ways: 1) by using one of the various online tools to measure typing speed, or 2) by running the tool in MONITOR mode for a few days (or even weeks) and gradually adjusting the value until the false positives are gone.
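A minimal sketch of this heuristic in Python follows. It is a simplified illustration rather than the tool’s actual code, and the timestamp units are assumed to be microseconds for the example:

# Keep the last KEYSTROKE_WINDOW timestamps per device; if every interarrival
# time in that window is below ABNORMAL_TYPING, treat it as likely injection.
from collections import deque

KEYSTROKE_WINDOW = 5        # number of keystrokes considered
ABNORMAL_TYPING = 50000     # interarrival threshold (assumed microseconds here)

def make_monitor():
    timestamps = deque(maxlen=KEYSTROKE_WINDOW)

    def on_keystroke(timestamp_us: int) -> bool:
        """Return True if the recent keystrokes look machine-injected."""
        timestamps.append(timestamp_us)
        if len(timestamps) < KEYSTROKE_WINDOW:
            return False
        deltas = [b - a for a, b in zip(timestamps, list(timestamps)[1:])]
        return all(delta < ABNORMAL_TYPING for delta in deltas)

    return on_keystroke

monitor = make_monitor()
for t in (0, 10_000, 20_000, 30_000, 40_000):   # keystrokes 10 ms apart: far too fast
    attack = monitor(t)
print(attack)   # True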

Getting it up and running

The README on GitHub contains a step-by-step guide to preparing the tool, setting it up, and running it as a systemd daemon that is enabled on reboot. Over time it may be necessary to revise the tool’s variables by simply adjusting the values at the top of /usr/sbin/ukip and restarting the daemon:
sudo systemctl restart ukip.service

A note on silver bullets

The tool is not a silver bullet against USB-based attacks or keystroke injection attacks, since an attacker with access to a user’s machine (required for USB-based keystroke injection attacks) can do worse things if the machine is left unlocked. The tool is meant to provide another layer of protection and to defend a user sitting in front of their unlocked machine by making the attack visible: either the keystrokes are delayed enough to circumvent the tool’s logic, in which case the user can see them being typed, or they are fast enough to be detected, in which case the device is blocked by unbinding its driver and information is logged to syslog.

Keystroke injection attacks are difficult to detect and prevent since they’re delivered over USB (the most widely used computer peripheral connector) and require a Human Interface Device driver (likely available on every operating system for mouse and keyboard input). The proposed tool raises the bar, making the attacker’s job more difficult, while removing the user from the decision about whether a device is malicious or benign, apart from the refinement of the heuristic variables mentioned above. The tool can be complemented with other Linux tools, such as fine-grained udev rules, or open source projects like USBGuard, to make successful attacks more challenging. The latter lets users define policies and allow/block specific USB devices, or block USB devices while the screen is locked. That feature is specifically useful, since an attacker could plug in a device while the user is away from their keyboard and launch an attack once they are back. With USBGuard in place, the device would need to be replugged while the system is unlocked in order to work.

By Sebastian Neuner, Google Information Security Engineering Team

Season of Docs announces the successful 2019 long-running projects


Season of Docs is happy to announce that all eight of the 2019 long-running documentation projects have finished successfully!

The successful long-running documentation projects are (in alphabetical order):

Apache Cassandra (Project Report, Project Description)

CERN-HSF (Project Report, Project Description)

Kolibri (Project Report, Project Description)

Mattermost (Project Report, Project Description)

MDAnalysis (Project Report, Project Description)

Open Food Facts (Project Report, Project Description)

Open Source Geospatial Foundation (Project Report, Project Description)

Tor Project (Project Report, Project Description)

Congratulations to the technical writers and organization mentors on these successful projects!

During the program, technical writers spent a few months working closely with an open source community. They brought their technical writing expertise to improve the project's documentation while the open source projects provided mentors to introduce the technical writers to open source tools, workflows, and the project's technology.

The technical writers and their mentors did a fantastic job with the inaugural year of Season of Docs! Participants in the 2019 program represented countries across all continents except for Antarctica!

You can view a list of the 44 successfully completed technical writing projects and read their project reports on the Season of Docs website.

What’s next?

Program participants should expect an email in the next few weeks about how to get their Season of Docs 2019 t-shirt (sure to be a collector’s item)!

If you’re interested in participating in a future Season of Docs, stay tuned for more information shortly—watch for posts on this blog and sign up for the announcements email list.

By Erin McKean and Kassandra Dhillon, Google Open Source

FuzzBench: Fuzzer Benchmarking as a Service

We are excited to launch FuzzBench, a fully automated, open source, free service for evaluating fuzzers. The goal of FuzzBench is to make it painless to rigorously evaluate fuzzing research and make fuzzing research easier for the community to adopt.

Fuzzing is an important bug finding technique. At Google, we’ve found tens of thousands of bugs (1, 2) with fuzzers like libFuzzer and AFL. There are numerous research papers that either improve upon these tools (e.g., MOpt-AFL, AFLFast, etc.) or introduce new techniques (e.g., Driller, QSYM, etc.) for bug finding. However, it is hard to know how well these new tools and techniques generalize on a large set of real world programs. Though research normally includes evaluations, these often have shortcomings—they don't use a large and diverse set of real world benchmarks, use few trials, use short trials, or lack statistical tests to illustrate if findings are significant. This is understandable, since full scale experiments can be prohibitively expensive for researchers. For example, a 24-hour, 10-trial, 10-fuzzer, 20-benchmark experiment would require 2,000 CPUs to complete in a day.

To help solve these issues the OSS-Fuzz team is launching FuzzBench, a fully automated, open source, free service. FuzzBench provides a framework for painlessly evaluating fuzzers in a reproducible way. To use FuzzBench, researchers can simply integrate a fuzzer and FuzzBench will run an experiment for 24 hours with many trials and real world benchmarks. Based on data from this experiment, FuzzBench will produce a report comparing the performance of the fuzzer to others and give insights into the strengths and weaknesses of each fuzzer. This should allow researchers to focus more of their time on perfecting techniques and less time setting up evaluations and dealing with existing fuzzers.

Integrating a fuzzer with FuzzBench is simple, as most integrations are less than 50 lines of code (example). Once a fuzzer is integrated, it can fuzz almost all 250+ OSS-Fuzz projects out of the box. We have already integrated ten fuzzers, including AFL, libFuzzer, Honggfuzz, and several academic projects such as QSYM and Eclipser.
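As a rough, hypothetical sketch of what such an integration can look like, the following Python module defines build and fuzz steps. The hook names, build commands, and fuzzer flags below are assumptions for illustration only; the linked example shows the real interface:

# Hypothetical shape of a fuzzer integration module.
import os
import subprocess

def build():
    """Build the benchmark with the fuzzer's instrumentation (placeholder steps)."""
    os.environ["CC"] = "clang"
    os.environ["CFLAGS"] = "-fsanitize=fuzzer-no-link"   # illustrative flag
    subprocess.check_call(["make", "all"])                # placeholder build command

def fuzz(input_corpus, output_corpus, target_binary):
    """Run the fuzzer against the instrumented target for the whole trial."""
    subprocess.check_call([
        "./my_fuzzer",            # hypothetical fuzzer binary
        "--in", input_corpus,
        "--out", output_corpus,
        "--", target_binary,
    ])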

Reports include statistical tests to give an idea how likely it is that performance differences between fuzzers are simply due to chance, as well as the raw data so researchers can do their own analysis. Performance is determined by the amount of covered program edges, though we plan on adding crashes as a performance metric. You can view a sample report here.

How to Participate

Our goal is to develop FuzzBench with community contributions and input so that it becomes the gold standard for fuzzer evaluation. We invite members of the fuzzing research community to contribute their fuzzers and techniques, even while they are in development. Better evaluations will lead to more adoption and greater impact for fuzzing research.

We also encourage contributions of better ideas and techniques for evaluating fuzzers. Though we have made some progress on this problem, we have not solved it and we need the community’s help in developing these best practices.

Please join us by contributing to the FuzzBench repo on GitHub.

By Jonathan Metzman, Abhishek Arya, Google OSS-Fuzz Team, and László Szekeres, Google Software Analysis Team

Google Summer of Code 2020 mentoring orgs announced!

We are delighted to announce the open source projects and organizations that have been accepted for Google Summer of Code (GSoC) 2020, the 16th year of the program!

After careful review, we have chosen 200 open source projects to be mentor organizations this year, 30 of which are new to the program. Please see the program website for a complete list of the accepted organizations.

Are you a student interested in participating in GSoC this year? We will begin accepting student applications on Monday, March 16, 2020 at 18:00 UTC and the deadline to apply is Tuesday, March 31, 2020 at 18:00 UTC.


The most successful applications come from students who start preparing now, before the application period begins. You can find more information on our website, which includes a full timeline of important dates. We also highly recommend perusing the FAQ and Program Rules and watching some of our other videos with more details about GSoC for students and mentors.

A hearty congratulations—and thank you—to all of our mentor organizations! We look forward to working with all of you during Google Summer of Code 2020.

By Stephanie Taylor, Google Open Source

AutoFlip: An Open Source Framework for Intelligent Video Reframing

Originally posted on the AI Blog

Videos filmed and edited for television and desktop are typically created and viewed in landscape aspect ratios (16:9 or 4:3). However, with an increasing number of users creating and consuming content on mobile devices, historical aspect ratios don’t always fit the display being used for viewing. Traditional approaches for reframing video to different aspect ratios usually involve static cropping, i.e., specifying a camera viewport, then cropping visual contents that are outside. Unfortunately, these static cropping approaches often lead to unsatisfactory results due to the variety of composition and camera motion styles. More bespoke approaches, however, typically require video curators to manually identify salient contents on each frame, track their transitions from frame-to-frame, and adjust crop regions accordingly throughout the video. This process is often tedious, time-consuming, and error-prone.

To address this problem, we are happy to announce AutoFlip, an open source framework for intelligent video reframing. AutoFlip is built on top of the MediaPipe framework that enables the development of pipelines for processing time-series multimodal data. Taking a video (casually shot or professionally edited) and a target dimension (landscape, square, portrait, etc.) as inputs, AutoFlip analyzes the video content, develops optimal tracking and cropping strategies, and produces an output video with the same duration in the desired aspect ratio.
Left: Original video (16:9). Middle: Reframed using a standard central crop (9:16). Right: Reframed with AutoFlip (9:16). By detecting the subjects of interest, AutoFlip is able to avoid cropping off important visual content.

AutoFlip Overview

AutoFlip provides a fully automatic solution to smart video reframing, making use of state-of-the-art ML-enabled object detection and tracking technologies to intelligently understand video content. AutoFlip detects changes in the composition that signify scene changes in order to isolate scenes for processing. Within each shot, video analysis is used to identify salient content before the scene is reframed by selecting a camera mode and path optimized for the contents.

Shot (Scene) Detection

A scene or shot is a continuous sequence of video without cuts (or jumps). To detect the occurrence of a shot change, AutoFlip computes the color histogram of each frame and compares this with prior frames. If the distribution of frame colors changes at a different rate than a sliding historical window, a shot change is signaled. AutoFlip buffers the video until the scene is complete before making reframing decisions, in order to optimize the reframing for the entire scene.
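As a toy illustration of histogram-based shot detection (simplified relative to AutoFlip’s sliding-window comparison, and with a made-up threshold), one could write:

# Compare each frame's color histogram with the previous frame's and flag
# large drops in similarity as cuts.
import cv2

def detect_shot_changes(video_path, threshold=0.5):
    """Yield frame indices where the color histogram changes sharply."""
    capture = cv2.VideoCapture(video_path)
    previous_hist = None
    index = 0
    while True:
        ok, frame = capture.read()
        if not ok:
            break
        hist = cv2.calcHist([frame], [0, 1, 2], None, [8, 8, 8],
                            [0, 256, 0, 256, 0, 256])
        hist = cv2.normalize(hist, hist).flatten()
        if previous_hist is not None:
            # Correlation near 1.0 means similar frames; a sharp drop suggests a cut.
            similarity = cv2.compareHist(previous_hist, hist, cv2.HISTCMP_CORREL)
            if similarity < threshold:
                yield index
        previous_hist = hist
        index += 1
    capture.release()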

Video Content Analysis

We utilize deep learning-based object detection models to find interesting, salient content in the frame. This content typically includes people and animals, but other elements may be identified, depending on the application, including text overlays and logos for commercials, or motion and ball detection for sports.

The face and object detection models are integrated into AutoFlip through MediaPipe, which uses TensorFlow Lite on CPU. This structure allows AutoFlip to be extensible, so developers may conveniently add new detection algorithms for different use cases and video content. Each object type is associated with a weight value, which defines its relative importance — the higher the weight, the more influence the feature will have when computing the camera path.


Left: People detection on sports footage. Right: Two face boxes (‘core’ and ‘all’ face landmarks). In narrow portrait crop cases, often only the core landmark box can fit.

Reframing

After identifying the subjects of interest on each frame, logical decisions about how to reframe the content for a new view can be made. AutoFlip automatically chooses an optimal reframing strategy — stationary, panning or tracking — depending on the way objects behave during the scene (e.g., moving around or stationary). In stationary mode, the reframed camera viewport is fixed in a position where important content can be viewed throughout the majority of the scene. This mode can effectively mimic professional cinematography in which a camera is mounted on a stationary tripod or where post-processing stabilization is applied. In other cases, it is best to pan the camera, moving the viewport at a constant velocity. The tracking mode provides continuous and steady tracking of interesting objects as they move around within the frame.

Based on which of these three reframing strategies the algorithm selects, AutoFlip then determines an optimal cropping window for each frame, while best preserving the content of interest. While the bounding boxes track the objects of focus in the scene, they typically exhibit considerable jitter from frame-to-frame and, consequently, are not sufficient to define the cropping window. Instead, we adjust the viewport on each frame through the process of Euclidean-norm optimization, in which we minimize the residuals between a smooth (low-degree polynomial) camera path and the bounding boxes.
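As a rough illustration of that idea (not AutoFlip’s actual solver, which presumably adds further constraints such as keeping the crop inside the frame), a least-squares fit of a low-degree polynomial to the noisy per-frame box positions minimizes exactly this kind of Euclidean-norm residual:

# Fit a low-degree polynomial to jittery per-frame box centers and use the
# fitted curve as the smoothed camera path.
import numpy as np

def smooth_camera_path(box_centers_x, degree=2):
    """Least-squares polynomial fit over frame index -> horizontal position."""
    frames = np.arange(len(box_centers_x))
    coefficients = np.polyfit(frames, box_centers_x, degree)
    return np.polyval(coefficients, frames)   # smoothed viewport center per frame

jittery = [100, 104, 98, 103, 130, 162, 201, 198, 204, 199]
smoothed = smooth_camera_path(jittery)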

Top: Camera paths resulting from following the bounding boxes from frame-to-frame. Bottom: Final smoothed camera paths generated using Euclidean-norm path formation. Left: Scene in which objects are moving around, requiring a tracking camera path. Right: Scene where objects stay close to the same position; a stationary camera covers the content for the full duration of the scene.

AutoFlip’s configuration graph provides settings for either best-effort or required reframing. If it becomes infeasible to cover all the required regions (for example, when they are too spread out on the frame), the pipeline will automatically switch to a less aggressive strategy by applying a letterbox effect, padding the image to fill the frame. For cases where the background is detected as being a solid color, this color is used to create seamless padding; otherwise a blurred version of the original frame is used.

AutoFlip Use Cases

We are excited to release this tool directly to developers and filmmakers, reducing the barriers to their design creativity and reach through the automation of video editing. The ability to adapt any video format to various aspect ratios is becoming increasingly important as the diversity of devices for video content consumption continues to rapidly increase. Whether your use case is portrait to landscape, landscape to portrait, or even small adjustments like 4:3 to 16:9, AutoFlip provides a solution for intelligent, automated and adaptive video reframing.


What’s Next?

Like any machine learning algorithm, AutoFlip can benefit from an improved ability to detect objects relevant to the intent of the video, such as speaker detection for interviews or animated face detection on cartoons. Additionally, a common issue arises when input video has important overlays on the edges of the screen (such as text or logos) as they will often be cropped from the view. By combining text/logo detection and image inpainting technology, we hope that future versions of AutoFlip can reposition foreground objects to better fit the new aspect ratios. Lastly, in situations where padding is required, deep uncrop technology could provide improved ability to expand beyond the original viewable area.

While we work to improve AutoFlip internally at Google, we encourage contributions from developers and filmmakers in the open source communities.

Acknowledgments

We would like to thank our colleagues who contributed to AutoFlip: Alexander Panagopoulos, Jenny Jin, Brian Mulford, Yuan Zhang, Alex Chen, Xue Yang, Mickey Wang, Justin Parra, Hartwig Adam, Jingbin Wang, and Weilong Yang; and the MediaPipe team, who helped with open sourcing: Jiuqiang Tang, Tyler Mullen, Mogan Shieh, Ming Guang Yong, and Chuo-Ling Chang.

By Nathan Frey, Senior Software Engineer, Google Research, Los Angeles and Zheng Sun, Senior Software Engineer, Google Research, Mountain View