Posted by Nikhil Thorat and Daniel Smilkov, Software Engineers, Google Big Picture Team
Machine learning (ML) has become an increasingly powerful tool, one that can be applied to a wide variety of areas spanning object recognition, language translation, health and more. However, the development of ML systems is often restricted to those with computational resources and the technical expertise to work with commonly available ML libraries.
There are many reasons to bring machine learning into the browser. A client-side ML library can be a platform for interactive explanations, for rapid prototyping and visualization, and even for offline computation. And if nothing else, the browser is one of the world's most popular programming platforms.
The API mimics the structure of TensorFlow and NumPy, with a delayed execution model for training (like TensorFlow), and an immediate execution model for inference (like NumPy). We have also implemented versions of some of the most commonly-used TensorFlow operations. With the release of deeplearn.js, we will be providing tools to export weights from TensorFlow checkpoints, which will allow authors to import them into web pages for deeplearn.js inference.
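To make the difference between the two execution models concrete, here is a minimal Python sketch. It does not use the deeplearn.js API; the `Node` class and the `placeholder`, `const`, `add` and `mul` helpers are hypothetical names invented for illustration. In the delayed style you first describe a computation as a graph and run it later with data, while in the immediate style each operation returns a value as soon as it is called.

```python
# Hypothetical mini-API for illustration only; this is not deeplearn.js.

class Node:
    """A deferred computation: nothing runs until evaluate() is called."""

    def __init__(self, op, inputs=(), value=None):
        self.op = op
        self.inputs = inputs
        self.value = value

    def evaluate(self, feed):
        # Placeholders are looked up in the feed dictionary at run time.
        if self.op == "placeholder":
            return feed[self]
        if self.op == "const":
            return self.value
        args = [node.evaluate(feed) for node in self.inputs]
        if self.op == "add":
            return args[0] + args[1]
        if self.op == "mul":
            return args[0] * args[1]
        raise ValueError(f"unknown op: {self.op}")


def placeholder():
    return Node("placeholder")


def const(value):
    return Node("const", value=value)


def add(a, b):
    return Node("add", (a, b))


def mul(a, b):
    return Node("mul", (a, b))


# Delayed style: build the graph for 3*x + 1 first, then run it with data.
x = placeholder()
graph = add(mul(x, const(3.0)), const(1.0))
deferred_result = graph.evaluate({x: 2.0})

# Immediate style: every operation produces a concrete value right away.
immediate_result = 2.0 * 3.0 + 1.0
```

Both styles compute the same number here; the delayed style simply keeps a description of the computation around, which is what makes it convenient for training.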
You can explore the potential of this library by training a convolutional neural network to recognize photos and handwritten digits — all in your browser without writing a single line of code.
We're releasing a series of demos that show deeplearn.js in action. Play with an image classifier that uses your webcam in real-time and watch the network’s internal representations of what it sees. Or generate abstract art videos at a smooth 60 frames per second. The deeplearn.js homepage contains these and other demos.
Our vision is that this library will significantly increase visibility and engagement with machine learning, giving developers access to powerful tools while simultaneously providing the everyday user with a way to interact with them. We’re looking forward to collaborating with the open source community to drive this vision forward.
Climate change is one of the greatest challenges of our time, and the way we generate and use electricity now is a major contributor to that issue. To solve it, we need to find a way to eliminate the carbon emissions associated with our electricity as quickly and as cheaply as possible.
Many analysts have come up with a number of possible solutions: renewable energy plus increased energy storage capacity, nuclear power, carbon capture and sequestration from fossil fuels, or a mixture of these. But we realized that the different answers came from different assumptions that people were making about what combination of those technologies and policies would lead to a positive change.
To help our team understand these dynamics, we created a tool that allows us to quickly see how different assumptions—wind, solar, coal, nuclear, for example—affect the future cost to generate electricity and the amount of carbon dioxide emitted.
We created a simplified model of the electrical grid, where demand is always fulfilled at least cost. By “least cost,” we mean the cost of constructing and maintaining power plants, and generating electricity (with fuel, if required). For a given set of assumptions, the model determines the amount of generation capacity to build and when to turn on which type of generator. Our model is similar to models proposed in other research, but we’ve simplified the model to make it run fast.
We then ran the model hundreds of thousands of times with different assumptions, using our computing infrastructure. We gathered all of the runs of the model and present them in a simple web page. Anyone, from students to energy policy wonks, can try different assumptions and see how those assumptions will affect the cost and CO2. The web UI is available for you to try: you can explore how utilities decide to dispatch their generation capacity, then test different assumptions. Finally, you can compare different assumptions and share them with others.
We’ve written up the technical details of the model in this paper. In case you want to change the assumptions in the model, we are also releasing the code on GitHub. The paper shows how the costs of generation technologies change as a function of the fraction of demand that they fulfill. The paper also discusses the limitations and validity of the model.
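As a rough illustration of the “when to turn on which type of generator” part, the sketch below dispatches a fixed fleet in merit order, running the cheapest generators first. It is not the released model: the `dispatch` function, the fleet and all of the numbers are made up for this example, and the capacity-planning step (deciding how much to build) is left out entirely.

```python
def dispatch(demand_mw, generators):
    """Meet demand by running the cheapest generators first (merit order).

    generators: list of (name, capacity_mw, marginal_cost_per_mwh) tuples.
    Returns (output in MW per generator, total generation cost per hour).
    """
    remaining = demand_mw
    output = {}
    cost = 0.0
    # Sort by marginal cost so the cheapest energy is used first.
    for name, capacity, marginal_cost in sorted(generators, key=lambda g: g[2]):
        used = min(capacity, remaining)
        output[name] = used
        cost += used * marginal_cost
        remaining -= used
    if remaining > 0:
        raise ValueError("demand exceeds total capacity")
    return output, cost


# Made-up fleet: (name, capacity in MW, marginal cost in $/MWh).
fleet = [("gas", 500, 40.0), ("solar", 200, 0.0), ("nuclear", 300, 10.0)]
output, hourly_cost = dispatch(600, fleet)
# Solar and nuclear run at full capacity; gas covers the last 100 MW.
```

A fuller model would repeat this for every hour of the year and add construction and maintenance costs, which is where different assumptions start to change the answer.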
One interesting conclusion of the paper: if we can find a zero-carbon, 24x7 electricity source that costs about $2200/kW to build, it can displace carbon emission from the electricity grid in less than 27 years. We hope that the tool and the paper help people understand their assumptions about the future of electricity, and stimulate research into climate and energy.
Posted by Christian Howard, Editor-in-Chief, Research Communications
Machine learning (ML) is a key strategic focus at Google, with highly active groups pursuing research in virtually all aspects of the field, including deep learning and more classical algorithms, exploring theory as well as application. We utilize scalable tools and architectures to build machine learning systems that enable us to solve deep scientific and engineering challenges in areas of language, speech, translation, music, visual processing and more.
As a leader in ML research, Google is proud to be a Platinum Sponsor of the thirty-fourth International Conference on Machine Learning (ICML 2017), a premier annual event supported by the International Machine Learning Society taking place this week in Sydney, Australia. With over 130 Googlers attending the conference to present publications and host workshops, we look forward to our continued collaboration with the larger ML research community.
If you're attending ICML 2017, we hope you'll visit the Google booth and talk with our researchers to learn more about the exciting work, creativity and fun that goes into solving some of the field's most interesting challenges. Our researchers will also be available to talk about and demo several recent efforts, including the technology behind Facets, neural audio synthesis with NSynth, a Q&A session on the Google Brain Residency program and much more. You can also learn more about our research being presented at ICML 2017 in the list below (Googlers highlighted in blue).
ICML 2017 Committees Senior Program Committee includes: Alex Kulesza, Amr Ahmed, Andrew Dai, Corinna Cortes, George Dahl, Hugo Larochelle, Matthew Hoffman, Maya Gupta, Moritz Hardt, Quoc Le
As a leader in natural language processing & understanding and a Platinum sponsor of ACL 2017, Google will be on hand to showcase research interests that include syntax, semantics, discourse, conversation, multilingual modeling, sentiment analysis, question answering, summarization, and generally building better systems using labeled and unlabeled data, state-of-the-art modeling and learning from indirect supervision.
If you’re attending ACL 2017, we hope that you’ll stop by the Google booth to check out some demos, meet our researchers and discuss projects and opportunities at Google that go into solving interesting problems for billions of people. Learn more about the Google research being presented at ACL 2017 below (Googlers highlighted in blue).
Organizing Committee Area Chairs include: Sujith Ravi (Machine Learning), Thang Luong (Machine Translation) Publication Chairs include: Margaret Mitchell (Advisory)
Posted by Avneesh Sud, Software Engineer, Machine Perception
Recently Google Machine Perception researchers, in collaboration with Daydream Labs and YouTube Spaces, presented a solution for virtual headset ‘removal’ for mixed reality in order to create a richer and more engaging VR experience. While that work could infer eye-gaze directions and blinks, enabled by a headset modified with eye-tracking technology, a richer set of facial expressions, which are key to understanding a person's experience in VR as well as conveying important social engagement cues, was missing.
Today we present an approach to infer select facial action units and expressions entirely by analyzing a small part of the face while the user is engaged in a virtual reality experience. Specifically, we show that images of the user’s eyes captured from an infrared (IR) gaze-tracking camera within a VR headset are sufficient to infer at least a subset of facial expressions without the use of any external cameras or additional sensors.
Left: A user wearing a VR HMD modified with eye-tracking used for expression classification (Note that no external camera is used in our method; this is just for visualization). Right: inferred expression from eye images using our model. A video demonstrating the work can be seen here.
We use supervised deep learning to classify facial expressions from images of the eyes and surrounding areas, which typically contain the iris, sclera, eyelids and may include parts of the eyebrows and top of cheeks. Obtaining large scale annotated data from such novel sensors is a challenging task, hence we collected training data by capturing 46 subjects while performing a set of facial expressions.
To perform expression classification, we fine-tuned a variant of the widespread Inception architecture with TensorFlow using weights from a model trained to convergence on ImageNet. We attempted to partially remove variance due to differences in participant appearance (i.e., individual differences that do not depend on expression), inspired by the standard practice of mean image subtraction. Since this variance removal occurs within-subject, it is effectively personalization. Further details, along with examples of eye-images, and results are presented in our accompanying paper.
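One simple way to picture within-subject variance removal is to subtract each participant's own mean image from that participant's images. The sketch below does exactly that on flat lists of pixel values. It is an assumption-laden illustration, not the procedure from the paper: the `subtract_subject_mean` function and the toy data are hypothetical, and the paper may compute the per-subject reference differently.

```python
from collections import defaultdict


def subtract_subject_mean(samples):
    """Center each image on the mean image of the same subject.

    samples: list of (subject_id, pixels), where pixels is a flat list of floats.
    Returns a list of (subject_id, centered_pixels) in the original order.
    """
    by_subject = defaultdict(list)
    for subject, pixels in samples:
        by_subject[subject].append(pixels)

    # Per-subject mean image, computed pixel by pixel.
    means = {}
    for subject, images in by_subject.items():
        count = len(images)
        means[subject] = [sum(column) / count for column in zip(*images)]

    return [
        (subject, [p - m for p, m in zip(pixels, means[subject])])
        for subject, pixels in samples
    ]


# Toy data: two images from subject "a", one from subject "b".
samples = [
    ("a", [1.0, 3.0]),
    ("a", [3.0, 5.0]),
    ("b", [10.0, 10.0]),
]
centered = subtract_subject_mean(samples)
```

After centering, what remains in each image is the deviation from that person's typical appearance, which is the part more likely to depend on expression.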
Results and Extensions We demonstrate that the information required to classify a variety of facial expressions is reliably present in IR eye images captured by a commercial HMD sensor, and that this information can be decoded using a CNN-based method, even though classifying facial expressions from eye-images alone is non-trivial even for humans. Our model inference can be performed in real-time, and we show this can be used to generate expressive avatars in real-time, which can function as an expressive surrogate for users engaged in VR. This interaction mechanism also yields a more intuitive interface for sharing expression in VR as opposed to gestures or keyboard inputs.
The ability to capture a user’s facial expressions using existing eye-tracking cameras enables a fully mobile solution to facial performance capture in VR, without additional external cameras. This technology extends beyond animating cartoon avatars; it could be used to provide a richer headset removal experience, enhancing communication and social interaction in VR by transmitting far more authentic and emotionally textured information.
Acknowledgements The research described in this post was performed by Steven Hickson (as an intern), Nick Dufour, Avneesh Sud, Vivek Kwatra and Irfan Essa. We also thank Hayes Raffle and Alex Wong from Daydream, and Chris Bregler, Sergey Ioffe and authors of TF-Slim from Google Research for their guidance and suggestions.
Posted by Ted Baltz, Senior Staff Software Engineer, Google Accelerated Science Team
Wait, what? Why is Google interested in plasma physics?
Google is always interested in solving complex engineering problems, and few are more complex than fusion. Physicists have been trying since the 1950s to control the fusion of hydrogen atoms into helium, which is the same process that powers the Sun. The key to harnessing this power is to confine hydrogen plasmas for long enough to get more energy out from fusion reactions than was put in. This point is called “breakeven.” If it works, it would represent a technological breakthrough, and could provide an abundant source of zero-carbon energy.
There are currently several large academic and government research efforts in fusion. Just to rattle off a few, in plasma fusion there are tokamak machines like ITER and stellarator machines like Wendelstein 7-X. The stellarator design actually goes back to 1951, so physicists have been working on this for a while. Oh yeah, and if you like giant lasers, there’s the National Ignition Facility which uses lasers to generate X-rays to generate fusion reactions. So far, none of these has gotten to the economic breakeven point.
Did you really just say that you got to fire a plasma collider?
Yeah. Tri Alpha Energy has a unique scheme for plasma confinement called a field-reversed configuration that’s predicted to get more stable as the energy goes up, in contrast to other methods where plasmas get harder to control as you heat them. Tri Alpha built a giant ionized plasma machine, C-2U, that fills an entire warehouse in an otherwise unassuming office park. The plasma that this machine generates and confines exhibits all kinds of highly nonlinear behavior. The machine itself pushes the envelope of how much electrical power can be applied to generate and confine the plasma in such a small space over such a short time. It’s a complex machine with more than 1000 knobs and switches, an investment (not ours!) in exploring clean energy north of $100 million. This is a high-stakes optimization problem, dealing with both plasma performance and equipment constraints. This is where Google comes in.
End-on view of C-2U
Wait, why not just simulate what will happen? Isn’t this simple physics?
The “simple” simulations using magnetohydrodynamics don’t really apply. Even if these machines operated in that limit, which they very much don’t, the simulations make fluid dynamics simulations look easy! The reality is much more complicated. The ion temperature is three times larger than the electron temperature, so the plasma is far out of thermal equilibrium. The fluid approximation is also totally invalid, so you have to track at least some of the trillion+ individual particles. The whole thing is beyond what we know how to do even with Google-scale compute resources.
So why are we doing this? Real experiments! With atoms not bits! At Google we love to run experiments and optimize things. We thought it would be a great challenge to see if we could help Tri Alpha. They run a plasma “shot” on the C-2U machine every 8 minutes. Each shot consists of creating two spinning blobs of plasma in the vacuum sealed innards of C-2U, smashing them together at over 600,000 miles per hour, creating a bigger, hotter, spinning football of plasma. Then they blast it continuously with particle beams (actually neutral hydrogen atoms) to keep it spinning. They hang on to the spinning football with magnetic fields for as long as 10 milliseconds. They’re trying to experimentally verify that these advanced beam driven field-reversed plasma configurations behave as expected by theory. If they do, this scheme could lead to net-energy-out fusion.
Now 8 minutes sounds like a long time (which is the time it takes for C-2U to cool, recharge, and get ready for another 10 ms shot), but when you’re sitting in the control room during an experimental campaign, it goes by really quickly. There are a lot of sensor outputs to look at, to try to figure out how the plasma was behaving. Before you know it, the power supplies are charged again, and they’re ready for another go!
What was that about optimization? What are you optimizing?
That’s the thing, it’s not completely obvious what good plasma performance is. Of course, Tri Alpha has some of the world’s best plasma physicists, but even they disagree on what “good” is. We can boil down the machine controls to “only” 30 parameters or so, but when you have to wait 8 minutes per experiment, it’s a pretty hard problem even with a concrete objective. Also, it’s not entirely known, day-to-day, what the reliable operating envelope of the machine is. And it keeps changing since the quality of the vacuum keeps changing and electrodes wear out and...
So we boil the problem down to “let’s find plasma behaviors that an expert human plasma physicist thinks are interesting, and let’s not break the machine when we’re doing it.” We developed the Optometrist Algorithm, which is sort of a Markov Chain Monte Carlo (MCMC) where the likelihood function being explored is in the plasma physicist’s mind rather than being explicitly written down. Just like getting an eyeglass prescription, the algorithm presents the expert human with machine settings and the associated outcomes. They can just use their judgement on what is interesting, and what is unhealthy for the machine. These could be “That initial collision looked really strong!” or “The edge biasing is actually working well now!” or “Wow, that was awesome, but the electrode current was way too high, let’s not do that again!” The key improvement we provided was a technique to search the high-dimensional space of machine parameters efficiently.
Oh, I like MCMC, it’s like the best thing ever!
I knew you’d like that bit. Using this technique, we actually found something really interesting. As we describe in our paper, we found a regime where the neutral particle beams dumping energy into the plasma were able to completely balance the cooling losses, and the total energy in the plasma actually went up after formation. It was only for about 2 milliseconds, but still, it was a first! Since rising energy due to neutral beam heating was not necessarily expected for C-2U, it would have been difficult to plug into an objective function. We really needed a human expert to notice. This was a classic case of humans and computers doing a better job together than either could have separately. You know how it is — when you think you have an optimization problem, and you optimize the objective, you usually just look at the result and say, “No no no, that’s not what I meant,” and you add some other term and repeat until you get sick of it?
That hasn’t happened to me. This week. Yet.
Yeah, so we just cut out that iteration and let the expert human use their judgment. This learning from human preferences is becoming a thing. Google and Tri Alpha made a pretty good team for it, for a really important problem.
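For readers who want to see the shape of the idea, here is a heavily simplified Python sketch of a preference-driven search in the spirit of the Optometrist Algorithm described above. It is not the published algorithm: `optometrist_search`, `toy_experiment` and `toy_expert` are hypothetical names, the human expert is replaced by a stand-in function, and real concerns such as machine safety limits are ignored.

```python
import random


def optometrist_search(initial, run_experiment, prefers_new, steps, step_size, rng):
    """Random search where an expert's preference decides what to keep.

    initial: list of starting parameter values.
    run_experiment: maps settings to an outcome the expert can judge.
    prefers_new: (reference_outcome, new_outcome) -> bool, the expert's choice.
    """
    reference = list(initial)
    reference_outcome = run_experiment(reference)
    for _ in range(steps):
        # Propose nearby settings by perturbing every parameter a little.
        candidate = [p + rng.gauss(0.0, step_size) for p in reference]
        candidate_outcome = run_experiment(candidate)
        # "Which is better, one or two?" The preferred option becomes the reference.
        if prefers_new(reference_outcome, candidate_outcome):
            reference, reference_outcome = candidate, candidate_outcome
    return reference, reference_outcome


# Toy stand-ins: the "machine" scores closeness to a hidden target, and the
# "expert" simply prefers the higher score.
target = [1.0, -2.0]


def toy_experiment(settings):
    return -sum((s - t) ** 2 for s, t in zip(settings, target))


def toy_expert(old_outcome, new_outcome):
    return new_outcome > old_outcome


rng = random.Random(0)
best, best_outcome = optometrist_search(
    [0.0, 0.0], toy_experiment, toy_expert, steps=200, step_size=0.3, rng=rng
)
```

The point of the design is that `prefers_new` never needs to be written down as a formula. In the real setting it is a person looking at the shot data and saying which of two outcomes they like better.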
So what now?
So actually, Tri Alpha learned everything they could have from C-2U and then dismantled it. They built a new machine called Norman (after their late co-founder Norman Rostoker) in the same warehouse. It’s much more powerful both in plasma acceleration and in neutral particle beams. It also has a more sophisticated system to confine the plasma in the central region. The pressure vessel, accelerators, and banks of capacitors and power supplies cover the building’s concrete floor.
They just achieved “first plasma” on it. They’re hoping, with our help, to verify this theoretical prediction that the plasma will actually behave better in the “burning plasma” regime. If they can do that over the next 18 months, it will be a lot more likely that the field-reversed configuration is a viable approach for breakeven fusion. In that case, Tri Alpha will try to build their follow-on design, an actual demonstration power generator. That one won’t fit in their warehouse!
Acknowledgements On the Google side, we wish to thank John Platt, Michael Dikovsky, Patrick Riley and Ross Koningstein for their significant contributions to this work. We thank the Google Accelerated Science team for their continual support. We are also grateful to the entire team at Tri Alpha for giving us the opportunity to try our hand at optimization for this crucially important problem.
Posted by Sergey Levine, Faculty Advisor and Pierre Sermanet, Research Scientist, Google Brain Team
Machine learning can allow robots to acquire complex skills, such as grasping and opening doors. However, learning these skills requires us to manually program reward functions that the robots then attempt to optimize. In contrast, people can understand the goal of a task just from watching someone else do it, or simply by being told what the goal is. We can do this because we draw on our own prior knowledge about the world: when we see someone cut an apple, we understand that the goal is to produce two slices, regardless of what type of apple it is, or what kind of tool is used to cut it. Similarly, if we are told to pick up the apple, we understand which object we are to grab because we can ground the word “apple” in the environment: we know what it means.
These are semantic concepts: salient events like producing two slices, and object categories denoted by words such as “apple.” Can we teach robots to understand semantic concepts, to get them to follow simple commands specified through categorical labels or user-provided examples? In this post, we discuss some of our recent work on robotic learning that combines experience that is autonomously gathered by the robot, which is plentiful but lacks human-provided labels, with human-labeled data that allows a robot to understand semantics. We will describe how robots can use their experience to understand the salient events in a human-provided demonstration, mimic human movements despite the differences between human and robot bodies, and understand semantic categories, like “toy” and “pen”, to pick up objects based on user commands.
Understanding human demonstrations with deep visual features In the first set of experiments, which appear in our paper Unsupervised Perceptual Rewards for Imitation Learning, our aim is to enable a robot to understand a task, such as opening a door, from seeing only a small number of unlabeled human demonstrations. By analyzing these demonstrations, the robot must understand what is the semantically salient event that constitutes task success, and then use reinforcement learning to perform it.
Examples of human demonstrations (left) and the corresponding robotic imitation (right).
Unsupervised learning on very small datasets is one of the most challenging scenarios in machine learning. To make this feasible, we use deep visual features from a large network trained for image recognition on ImageNet. Such features are known to be sensitive to semantic concepts, while maintaining invariance to nuisance variables such as appearance and lighting. We use these features to interpret user-provided demonstrations, and show that it is indeed possible to learn reward functions in an unsupervised fashion from a few demonstrations and without retraining.
Example of reward functions learned solely from observation for the door opening tasks. Rewards progressively increase from zero to the maximum reward as a task is completed.
After learning a reward function from observation only, we use it to guide a robot to learn a door opening task, using only the images to evaluate the reward function. With the help of an initial kinesthetic demonstration that succeeds about 10% of the time, the robot learns to improve to 100% accuracy using the learned reward function.
Emulating human movements with self-supervision and imitation. In Time-Contrastive Networks: Self-Supervised Learning from Multi-View Observation, we propose a novel approach to learn about the world from observation and demonstrate it through self-supervised pose imitation. Our approach relies primarily on co-occurrence in time and space for supervision: by training to distinguish frames from different times of a video, it learns to disentangle and organize reality into useful abstract representations.
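A common way to train an embedding to tell frames from different times apart is a triplet loss, sketched below in plain Python. This is an illustration under assumptions rather than the exact objective from the paper: the post does not spell out the loss, and `time_contrastive_loss`, the margin value and the toy embeddings are chosen only for this example. The anchor and positive are taken to be embeddings of the same moment (for instance seen from two viewpoints), and the negative an embedding of a different moment.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two embedding vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def time_contrastive_loss(anchor, positive, negative, margin=0.5):
    """Triplet hinge loss on embeddings.

    The loss is zero once the co-occurring frame (positive) is closer to the
    anchor than the frame from a different time (negative) by at least margin.
    """
    gap = squared_distance(anchor, positive) - squared_distance(anchor, negative)
    return max(0.0, gap + margin)


# Same moment embedded nearby, different moment far away: nothing to fix.
well_separated = time_contrastive_loss([0.0, 0.0], [0.0, 1.0], [2.0, 0.0])

# Same moment embedded far away, different moment nearby: large loss.
confused = time_contrastive_loss([0.0, 0.0], [2.0, 0.0], [0.0, 1.0])
```

Minimizing a loss like this pulls together frames that show the same instant and pushes apart frames that merely look similar but come from different times.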
In a pose imitation task for example, different dimensions of the representation may encode for different joints of a human or robotic body. Rather than defining by hand a mapping between human and robot joints (which is ambiguous in the first place because of physiological differences), we let the robot learn to imitate in an end-to-end fashion. When our model is simultaneously trained on human and robot observations, it naturally discovers the correspondence between the two, even though no correspondence is provided. We thus obtain a robot that can imitate human poses without having ever been given a correspondence between humans and robots.
Self-supervised human pose imitation by a robot.
Striking evidence of the benefits of learning end-to-end is the many-to-one and highly non-linear joint mapping shown above. In this example, the up-down motion involves many joints for the human while only one joint is needed for the robot. We show that the robot has discovered this highly complex mapping on its own, without any explicit human pose information.
Grasping with semantic object categories The experiments above illustrate how a person can specify a goal for a robot through an example demonstration, in which case the robot must interpret the semantics of the task: salient events and relevant features of the pose. What if, instead of showing the task, the human simply wants to tell the robot what to do? This also requires the robot to understand semantics, in order to identify which objects in the world correspond to the semantic category specified by the user. In End-to-End Learning of Semantic Grasping, we study how a combination of manually labeled and autonomously collected data can be used to perform the task of semantic grasping, where the robot must pick up an object from a cluttered bin that matches a user-specified class label, such as “eraser” or “toy.”
In our semantic grasping setup, the robotic arm is tasked with picking up an object corresponding to a user-provided semantic category (e.g. Legos).
To learn how to perform semantic grasping, our robots first gather a large dataset of grasping data by autonomously attempting to pick up a large variety of objects, as detailed in our previous post and prior work. This data by itself can allow a robot to pick up objects, but doesn’t allow it to understand how to associate them with semantic labels. To enable an understanding of semantics, we again enlist a modest amount of human supervision. Each time a robot successfully grasps an object, it presents it to the camera in a canonical pose, as illustrated below.
The robot presents objects to the camera after grasping. These images can be used to label which object category was picked up.
A subset of these images is then labeled by human labelers. Since the presentation images show the object in a canonical pose, it is easy to then propagate these labels to the remaining presentation images by training a classifier on the labeled examples. The labeled presentation images then tell the robot which object was actually picked up, and it can associate this label, in hindsight, with the images that it observed while picking up that object from the bin.
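The label propagation step can be sketched with a very small classifier. The example below uses a nearest-centroid rule over made-up two-dimensional feature vectors as a stand-in; the actual system presumably trains an image classifier, and `fit_centroids`, `propagate_labels` and the data are hypothetical.

```python
def fit_centroids(labeled):
    """Average the feature vectors of each class.

    labeled: list of (features, label) pairs from the human-labeled subset.
    Returns {label: mean feature vector}.
    """
    sums, counts = {}, {}
    for features, label in labeled:
        if label not in sums:
            sums[label] = [0.0] * len(features)
            counts[label] = 0
        sums[label] = [s + f for s, f in zip(sums[label], features)]
        counts[label] += 1
    return {
        label: [s / counts[label] for s in total] for label, total in sums.items()
    }


def propagate_labels(centroids, unlabeled):
    """Assign each unlabeled feature vector the label of its nearest centroid."""

    def nearest(features):
        return min(
            centroids,
            key=lambda label: sum(
                (f - c) ** 2 for f, c in zip(features, centroids[label])
            ),
        )

    return [nearest(features) for features in unlabeled]


# Made-up features for a few labeled presentation images.
labeled = [([0.0, 0.0], "eraser"), ([0.2, 0.0], "eraser"), ([1.0, 1.0], "toy")]
centroids = fit_centroids(labeled)
predicted = propagate_labels(centroids, [[0.1, 0.1], [0.9, 1.2]])
```

Because the presentation images show objects in a canonical pose, even a simple classifier has a much easier job here than it would on cluttered bin images.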
Using this labeled dataset, we can then train a two-stream model that predicts which object will be grasped, conditioned on the current image and the actions that the robot might take. The two-stream model that we employ is inspired by the dorsal-ventral decomposition observed in the human visual cortex, where the ventral stream reasons about the semantic class of objects, while the dorsal stream reasons about the geometry of the grasp. Crucially, the ventral stream can incorporate auxiliary data consisting of labeled images of objects (not necessarily from the robot), while the dorsal stream can incorporate auxiliary data of grasping that does not have semantic labels, allowing the entire system to be trained more effectively using larger amounts of heterogeneously labeled data. In this way, we can combine a limited amount of human labels with a large amount of autonomously collected robotic data to grasp objects based on desired semantic category, as illustrated in the video below:
Future Work Our experiments show how limited semantically labeled data can be combined with data that is collected and labeled automatically by the robots, in order to enable robots to understand events, object categories, and user demonstrations. In the future, we might imagine that robotic systems could be trained with a combination of user-annotated data and ever-increasing autonomously collected datasets, improving robotic capability and easing the engineering burden of designing autonomous robots. Furthermore, as robotic systems collect more and more automatically annotated data in the real world, this data can be used to improve not just robotic systems, but also systems for computer vision, speech recognition, and natural language processing that can all benefit from such large auxiliary data sources.
Of course, we are not the first to consider the intersection of robotics and semantics. Extensive prior work in natural language understanding, robotic perception, grasping, and imitation learning has considered how semantics and action can be combined in a robotic system. However, the experiments we discussed above might point the way to future work into combining self-supervised and human-labeled data in the context of autonomous robotic systems.
Acknowledgements The research described in this post was performed by Pierre Sermanet, Kelvin Xu, Corey Lynch, Jasmine Hsu, Eric Jang, Sudheendra Vijayanarasimhan, Peter Pastor, Julian Ibarz, and Sergey Levine. We also thank Mrinal Kalakrishnan, Ali Yahya, and Yevgen Chebotar for developing the policy learning framework used for the door task, and John-Michael Burke for conducting experiments for semantic grasping.
Posted by Christian Howard, Editor-in-Chief, Research Communications
From July 21-26, Honolulu, Hawaii hosts the 2017 Conference on Computer Vision and Pattern Recognition (CVPR 2017), the premier annual computer vision event comprising the main conference and several co-located workshops and tutorials. As a leader in computer vision research and a Platinum Sponsor, Google will have a strong presence at CVPR 2017 — over 250 Googlers will be in attendance to present papers and invited talks at the conference, and to organize and participate in multiple workshops.
YouTube-8M Large-Scale Video Understanding Challenge General Chairs: Paul Natsev, Rahul Sukthankar Program Chairs: Joonseok Lee, George Toderici Challenge Organizers: Sami Abu-El-Haija, Anja Hauth, Nisarg Kothari, Hanhan Li, Sobhan Naderi Parizi, Balakrishnan Varadarajan, Sudheendra Vijayanarasimhan, Jian Wang
Posted by Vittorio Ferrari, Research Scientist, Machine Perception
Last year we introduced Open Images, a collaborative release of ~9 million images annotated with labels spanning over 6000 object categories, designed to be a useful dataset for machine learning research. The initial release featured image-level labels automatically produced by a computer vision model similar to Google Cloud Vision API, for all 9M images in the training set, and a validation set of 167K images with 1.2M human-verified image-level labels.
Today, we introduce an update to Open Images, which contains the addition of a total of ~2M bounding-boxes to the existing dataset, along with several million additional image-level labels. Details include:
1.2M bounding-boxes around objects for 600 categories on the training set. These have been produced semi-automatically by an enhanced version of the technique outlined in , and are all human-verified.
Complete bounding-box annotation for all object instances of the 600 categories on the validation set, all manually drawn (830K boxes). The bounding-box annotations in the training and validation sets will enable research on object detection on this dataset. The 600 categories offer a broader range than those in the ILSVRC and COCO detection challenges, and include new objects such as fedora hat and snowman.
4.3M human-verified image-level labels on the training set (over all categories). This will enable large-scale experiments on object classification, based on a clean training set with reliable labels.
We hope that this update to Open Images will stimulate the broader research community to experiment with object classification and detection models, and facilitate the development and evaluation of new techniques.
Posted by Karthik Raveendran and Suril Shah, Software Engineers, Google Research
Last year, we launched Motion Stills, an iOS app that stabilizes your Live Photos and lets you view and share them as looping GIFs and videos. Since then, Motion Stills has been well received, being listed as one of the top apps of 2016 by The Verge and Mashable. However, from its initial release, the community has been asking us to also make Motion Stills available for Android. We listened to your feedback and today, we're excited to announce that we’re bringing this technology, and more, to devices running Android 5.1 and later!
Motion Stills on Android: Instant stabilization on your device.
With Motion Stills on Android we built a new recording experience where everything you capture is instantly transformed into delightful short clips that are easy to watch and share. You can capture a short Motion Still with a single tap like a photo, or condense a longer recording into a new feature we call Fast Forward. In addition to stabilizing your recordings, Motion Stills on Android comes with an improved trimming algorithm that guards against pocket shots and accidental camera shakes. All of this is done during capture on your Android device, no internet connection required!
New streaming pipeline For this release, we redesigned our existing iOS video processing pipeline to use a streaming approach that processes each frame of a video as it is being recorded. By computing intermediate motion metadata, we are able to immediately stabilize the recording while still performing loop optimization over the full sequence. All this leads to instant results after recording — no waiting required to share your new GIF.
Capture using our streaming pipeline gives you instant results.
In order to display your Motion Stills stream immediately, our algorithm computes and stores the necessary stabilizing transformation as a low resolution texture map. We leverage this texture to apply the stabilization transform using the GPU in real-time during playback, instead of writing a new, stabilized video that would tax your mobile hardware and battery.
Fast Forward Fast Forward allows you to speed up and condense a longer recording into a short, easy to share clip. The same pipeline described above allows Fast Forward to process up to a full minute of video, right on your phone. You can even change the speed of playback (from 1x to 8x) after recording. To make this possible, we encode videos with a denser I-frame spacing to enable efficient seeking and playback. We also employ additional optimizations in the Fast Forward mode. For instance, we apply adaptive temporal downsampling in the linear solver and long-range stabilization for smooth results over the whole sequence.
Fast Forward condenses your recordings into easy to share clips.
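To see why denser I-frame spacing helps with fast playback, consider a rough cost model, sketched below. It assumes, as a deliberate simplification, that every displayed frame is reached by an independent seek: the decoder starts at the preceding I-frame and decodes forward to the requested frame. The functions and numbers are hypothetical and are not taken from the Motion Stills implementation.

```python
def frames_to_show(total_frames, speed):
    """Frame indices displayed when playing back at an integer speed-up."""
    return list(range(0, total_frames, speed))


def decode_cost(frame_index, iframe_spacing):
    """Frames decoded to display frame_index after seeking to it.

    Decoding starts at the nearest preceding I-frame and runs forward, so the
    cost is the offset from that I-frame plus the target frame itself.
    """
    return frame_index % iframe_spacing + 1


def total_seek_cost(total_frames, speed, iframe_spacing):
    """Total frames decoded to play the clip at the given speed."""
    return sum(
        decode_cost(index, iframe_spacing)
        for index in frames_to_show(total_frames, speed)
    )


# A 64-frame clip played at 8x shows 8 frames.
sparse_cost = total_seek_cost(64, speed=8, iframe_spacing=32)
dense_cost = total_seek_cost(64, speed=8, iframe_spacing=8)
```

Under this model, an I-frame every 8 frames means each displayed frame at 8x is itself an I-frame, so far fewer frames need to be decoded. The trade-off, not modeled here, is that I-frames are larger, so denser spacing tends to increase file size.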
Try out Motion Stills Motion Stills is an app for us to experiment and iterate quickly with short-form video technology, gathering valuable feedback along the way. The tools our users find most fun and useful may be integrated later on into existing products like Google Photos. Download Motion Stills for Android from the Google Play store (available for mobile phones running Android 5.1 and later) and share your favorite clips on social media with hashtag #motionstills.
Acknowledgements Motion Stills would not have been possible without the help of many Googlers. We want to especially acknowledge the work of Matthias Grundmann in advancing our stabilization technology, as well as our UX and interaction designers Jacob Zukerman, Ashley Ma and Mark Bowers.