Source: Official Google Australia Blog
- Better videos with Cinematic Pan: Pixel 4a with 5G and Pixel 5 come with Cinematic Pan, which gives your videos a professional look with ultrasmooth panning that’s inspired by the equipment Hollywood directors use.
- Night Sight in Portrait Mode: Night Sight currently gives you the ability to capture amazing low-light photos—and even the Milky Way with astrophotography. Now, these phones bring the power of Night Sight into Portrait Mode to capture beautifully blurred backgrounds in Portraits even in extremely low light.
- Portrait Light: Portrait Mode on the Pixel 4a with 5G and Pixel 5 lets you capture beautiful portraits that focus on your subject as the background fades into an artful blur. If the lighting isn’t right, your Pixel can drop in extra light to illuminate your subjects.
- Ultrawide lens for ultra awesome shots: With an ultrawide lens alongside the standard rear camera, you’ll be able to capture the whole scene. And thanks to Google’s software magic, the latest Pixels still get our Super Res Zoom. So whether you’re zooming in or zooming out, you get sharp details and breathtaking images.
- New editor in Google Photos: Even after you’ve captured your portrait, Google Photos can help you add studio-quality light to your portraits of people with Portrait Light, in the new, more helpful Google Photos editor.
Source: Official Google Australia Blog
This year, we’ve all spent a lot of time exploring things to do at home. Some of us gardened, and others baked. We tried at-home workouts, or took up art projects. But one thing that many—maybe all of us—did? Enjoyed a lot of music at home. I’ve spent so much more time listening to music during quarantine—bossa nova is my go-to soundtrack for doing the dishes and Lil Baby has become one of my favorite artists. But you might even prefer Mohammed Rafi or Ilayaraja.
To help provide a richer soundtrack to your time at home, we’re especially excited to introduce Nest Audio, our latest smart speaker made for music lovers.
A music machine
Nest Audio is 75 percent louder and has 50 percent stronger bass than the original Google Home—measurements of both devices were taken in controlled conditions. With a 19mm tweeter for consistent high frequency coverage and clear vocals and a 75mm mid-woofer that really brings the bass, this smart speaker is built to deliver a rich musical experience.
Nest Audio’s sound is full, clear, and natural. We completed more than 500 hours of tuning to ensure balanced lows, mids and highs so nothing is lacking or overbearing. The bass is significant and the vocals have depth, which makes Nest Audio sound great across genres: classical, R&B, pop and more. The custom-designed tweeter allows each musical detail to come through, and we optimized the grill, fabric and materials so that you can enjoy the audio without distortion.
Our goal was to ensure that Nest Audio stayed faithful to what the artist intended when they were in the recording studio. We minimized the use of compressors to preserve dynamic range, so the auditory contrast in the original production is preserved—the quiet parts are delicate and subtle, and the loud ones are more dramatic and powerful.
Nest Audio also adapts to your home. Our Media EQ feature enables Nest Audio to automatically tune itself to whatever you’re listening to: music, podcasts, audiobooks or even a response from Google Assistant. And Ambient IQ lets Nest Audio also adjust the volume of Assistant, news, podcasts and audiobooks based on the background noise in your home, so you can hear the weather forecast over a noisy vacuum cleaner.
Whole home audio
If you have a Google Home, Nest Mini or even a Nest Hub, you can easily make Nest Audio the center of your whole home sound system. In my living room, I’ve connected two Nest Audio speakers as a stereo pair for left and right channel separation. I also have a Nest Mini in my bedroom and a Nest Hub in the entryway. These devices are grouped so that I can blast the same song on all of them when I have my daily dance party.
With our stream transfer feature, I can move music from one device to the other with just my voice*. I can even transfer music or podcasts from my phone when I walk in the door. Just last month, we launched multi-room control, which allows you to dynamically group multiple cast-enabled Nest devices in real time.
The Google Assistant you love
Google Assistant, available in Hindi and English, helps you tackle your day, enjoy your entertainment and control compatible smart home brands like Philips Hue, TP-Link and more. In fact, people have already set up more than 100 million devices to work with Google Assistant. Plus, if you’re a YouTube Music or Spotify Premium subscriber, you can say, “Ok Google, recommend some music” and Google Assistant will offer a variety of choices from artists and genres that you like as well as others that are similar.
Differentiated by design
Typically, a bigger speaker equals bigger sound, but Nest Audio has a really slim profile—so it fits anywhere in the home. In order to maximize audio output, we custom-designed quality drivers and housed them in an enclosure that helps it squeeze out every bit of sound possible.
Nest Audio will be available in India in two colors: Chalk and Charcoal. Its soft, rounded edges blend in with your home’s decor, and its minimal footprint doesn't take up too much space on your shelf or countertop.
We’re continuing our commitment to sustainability with Nest Audio. It’s covered in the same sustainable fabric that we first introduced with Nest Mini last year, and the enclosure (meaning the fabric, housing, foot, and a few smaller parts) is made from 70 percent recycled plastic.
Nest Audio will be available in India on Flipkart and at other retail outlets later this month. Stay tuned for more information on pricing and offers, which will be announced closer to the sale date.
Posted by Mark Spates, Product Manager, Google Nest
*currently only available in English in India
Source: Official Google India Blog
Last year, we announced that Google entered into an agreement to acquire Fitbit to help spur innovation in wearable devices and build products that help people lead healthier lives. As we continue to work with regulators to answer their questions, we wanted to share more about how we believe this deal will increase choice, and create engaging products and helpful experiences for consumers.
There's vibrant competition when it comes to smartwatches and fitness trackers, with Apple, Samsung, Garmin, Fossil, Huawei, Xiaomi and many others offering numerous products at a range of prices. We don’t currently make or sell wearable devices like these today. We believe the combination of Google and Fitbit's hardware efforts will increase competition in the sector, making the next generation of devices better and more affordable.
This deal is about devices, not data. We’ve been clear from the beginning that we will not use Fitbit health and wellness data for Google ads. We recently offered to make a legally binding commitment to the European Commission regarding our use of Fitbit data. As we do with all our products, we will give Fitbit users the choice to review, move or delete their data. And we’ll continue to support wide connectivity and interoperability across our and other companies’ products.
We appreciate the opportunity to work with the European Commission on an approach that addresses consumers' expectations of their wearable devices. We’re confident that by working closely with Fitbit’s team of experts, and bringing together our experience in AI, software and hardware, we can build compelling devices for people around the world.
Source: Google in Europe
Textiles have the potential to help technology blend into our everyday environments and objects by improving aesthetics, comfort, and ergonomics. Consumer devices have started to leverage these opportunities through fabric-covered smart speakers and braided headphone cords, while advances in materials and flexible electronics have enabled the incorporation of sensing and display into soft form factors, such as jackets, dresses, and blankets.
|A scalable interactive E-textile architecture with embedded touch sensing, gesture recognition and visual feedback.|
For insight into how this works, please see this video about E-textile microinteractions and this video about the E-textile architecture.
|E-textile microinteractions combining continuous sensing with discrete motion and grasps.|
Braiding generally refers to the diagonal interweaving of three or more material strands. While braids are traditionally used for aesthetics and structural integrity, they can also be used to enable new sensing and display capabilities.
Whereas cords can be made to detect basic touch gestures through capacitive sensing, we developed a helical sensing matrix (HSM) that enables a larger gesture space. The HSM is a braid consisting of electrically insulated conductive textile yarns and passive support yarns,where conductive yarns in opposite directions take the role of transmit and receive electrodes to enable mutual capacitive sensing. The capacitive coupling at their intersections is modulated by the user’s fingers, and these interactions can be sensed anywhere on the cord since the braided pattern repeats along the length.
A key insight is that the two axial columns in an HSM that share a common set of electrodes (and color in the diagram of the flattened matrix) are 180º opposite each other. Thus, pinching and rolling the cord activates a set of electrodes and allows us to track relative motion across these columns. Rotation detection identifies the current phase with respect to the set of time-varying sinusoidal signals that are offset by 90º. The braid allows the user to initiate rotation anywhere, and is scalable with a small set of electrodes.
|Rotation is deduced from horizontal finger motion across the columns. The plots below show the relative capacitive signal strengths, which change with finger proximity.|
This e-textile architecture makes the cord touch-sensitive, but its softness and malleability limit suitable interactions compared to rigid touch surfaces. With the unique material in mind, our design guidelines emphasize:
- Simple gestures. We design for short interactions where the user either makes a single discrete gesture or performs a continuous manipulation.
- Closed-loop feedback. We want to help the user discover functionality and get continuous feedback on their actions. Where possible, we provide visual, tactile, and audio feedback integrated in the device.
|Our e-textile enables interaction based on capacitive sensing of proximity, contact area, contact time, roll, and pressure.|
|Braided fiber optics strands create the illusion of directional motion.|
We conducted a gesture elicitation study, which showed opportunities for an expanded gesture set. Inspired by these results, we decided to investigate five motion gestures based on flicks and slides, along with single-touch gestures (pinch, grab and pat).
|Gesture elicitation study with imagined touch sensing.|
|Twelve participants (horizontal axis) performed 9 repetitions (animation) for the eight gestures (vertical axis). Each sub-image shows 16 overlaid feature vectors, interpolated to 80 observations over time.|
Notable here is that inherent relationships in the repeated sensing matrices are well-suited for machine learning classification. The ML classifier used in our research enables quick training with limited data, which makes a user-dependent interaction system reasonable. In our experience, training for a typical gesture takes less than 30s, which is comparable to the amount of time required to train a fingerprint sensor.
User-Independent, Continuous Twist: Quantifying Precision and Speed
The per-user trained gesture recognition enabled eight new discrete gestures. For continuous interactions, we also wanted to quantify how well user-independent, continuous twist performs for precision tasks. We compared our e-textile with two baselines, a capacitive multi-touch trackpad (“Scroll”) and the familiar headphone cord remote control (“Buttons”). We designed a lab study where the three devices controlled 1D movement in a targeting task.
We analysed three dependent variables for the 1800 trials, covering 12 participants and three techniques: time on task (milliseconds), total motion, and motion during end-of-trial. Participants also provided qualitative feedback through rankings and comments.
Our quantitative analysis suggests that our e-textile’s twisting is faster than existing headphone button controls and comparable in speed to a touch surface. Qualitative feedback also indicated a preference for e-textile interaction over headphone controls.
|Left: Weighted average subjective feedback. We mapped the 7-point Likert scale to a score in the range [-3, 3] and multiplied by the number of times the technique received that rating, and computed an average for all the scores. Right: Mean completion times for target distances show that Buttons were consistently slower.|
Gesture Prototypes: Headphones, Hoodie Drawstrings, and Speaker Cord
We developed different prototypes to demonstrate the capabilities of our e-textile architecture: e-textile USB-C headphones to control media playback on the phone, a hoodie drawstring to invisibly add music control to clothing, and an interactive cord for gesture controls of smart speakers.
|Left: Tap = Play/Pause; Center: Double-tap = Next track; Right: Roll = Volume +/-|
|Interactive speaker cord for simultaneous use of continuous (twisting/rolling) and discrete gestures (pinch/pat) to control music playback.|
We introduce an interactive e-textile architecture for embedded sensing and visual feedback, which can enable both precise small-scale and large-scale motion in a compact cord form factor. With this work, we hope to advance textile user interfaces and inspire the use of microinteractions for future wearable interfaces and smart fabrics, where eyes-free access and casual, compact and efficient input is beneficial. We hope that our e-textile will inspire others to augment physical objects with scalable techniques, while preserving industrial design and aesthetics.
This work is a collaboration across multiple teams at Google. Key contributors to the project include Alex Olwal, Thad Starner, Jon Moeller, Greg Priest-Dorman, Ben Carroll, and Gowa Mainini. We thank the Google ATAP Jacquard team for our collaboration, especially Shiho Fukuhara, Munehiko Sato, and Ivan Poupyrev. We thank Google Wearables, and Kenneth Albanowski and Karissa Sawyer, in particular. Finally, we would like to thank Mark Zarich for illustrations, Bryan Allen for videography, Frank Li for data processing, Mathieu Le Goc for valuable discussions, and Carolyn Priest-Dorman for textile advice.
The revolution of modern computing has been largely enabled by remarkable advances in computer systems and hardware. With the slowing of Moore’s Law and Dennard scaling, the world is moving toward specialized hardware to meet the exponentially growing demand for compute. However, today’s chips take years to design, resulting in the need to speculate about how to optimize the next generation of chips for the machine learning (ML) models of 2-5 years from now. Dramatically shortening the chip design cycle would allow hardware to adapt to the rapidly advancing field of ML. What if ML itself could provide the means to shorten the chip design cycle, creating a more integrated relationship between hardware and ML, with each fueling advances in the other?
In “Chip Placement with Deep Reinforcement Learning”, we pose chip placement as a reinforcement learning (RL) problem, where we train an agent (i.e, an RL policy) to optimize the quality of chip placements. Unlike prior methods, our approach has the ability to learn from past experience and improve over time. In particular, as we train over a greater number of chip blocks, our method becomes better at rapidly generating optimized placements for previously unseen chip blocks. Whereas existing baselines require human experts in the loop and take several weeks to generate, our method can generate placements in under six hours that outperform or match their manually designed counterparts. While we show that we can generate optimized placements for Google accelerator chips (TPUs), our methods are applicable to any kind of chip (ASIC).
The Chip Floorplanning Problem
A computer chip is divided into dozens of blocks, each of which is an individual module, such as a memory subsystem, compute unit, or control logic system. These blocks can be described by a netlist, a graph of circuit components, such as macros (memory components) and standard cells (logic gates like NAND, NOR, and XOR), all of which are connected by wires. Determining the layout of a chip block, a process called chip floorplanning, is one of the most complex and time-consuming stages of the chip design process and involves placing the netlist onto a chip canvas (a 2D grid), such that power, performance, and area (PPA) are minimized, while adhering to constraints on density and routing congestion. Despite decades of research on this topic, it is still necessary for human experts to iterate for weeks to produce solutions that meet multi-faceted design criteria. This problem’s complexity arises from the size of the netlist graph (millions to billions of nodes), the granularity of the grid onto which that graph must be placed, and the exorbitant cost of computing the true target metrics, which can take many hours (sometimes over a day) using industry-standard electronic design automation tools.
The Deep Reinforcement Learning Model
The input to our model is the chip netlist (node types and graph adjacency information), the ID of the current node to be placed, and some netlist metadata, such as the total number of wires, macros, and standard cell clusters. The netlist graph and the current node are passed through an edge-based graph neural network that we developed to encode the input state. This generates embeddings of the partially placed graph and the candidate node.
|A graph neural network generates embeddings that are concatenated with the metadata embeddings to form the input to the policy and value networks.|
In each iteration of training, the macros are sequentially placed by the RL agent, after which the standard cell clusters are placed by a force-directed method, which models the circuit as a system of springs to minimize wirelength. RL training is guided by a fast-but-approximate reward signal calculated for each of the agent’s chip placements using the weighted average of approximate wirelength (i.e., the half-perimeter wirelength, HPWL) and approximate congestion (the fraction of routing resources consumed by the placed netlist).
|During each training iteration, the macros are placed by the policy one at a time and the standard cell clusters are placed by a force-directed method. The reward is calculated from the weighted combination of approximate wirelength and congestion.|
To our knowledge, this method is the first chip placement approach that has the ability to generalize, meaning that it can leverage what it has learned while placing previous netlists to generate better placements for new unseen netlists. We show that as we increase the number of chip netlists on which we perform pre-training (i.e., as our method becomes more experienced in placement optimization), our policy better generalizes to new netlists.
For example, the pre-trained policy organically identifies an arrangement that places the macros near the edges of the chip with a convex space in the center in which to place the standard cells. This results in lower wirelength between the macros and standard cells without introducing excessive routing congestion. In contrast, the policy trained from scratch starts with random placements and takes much longer to converge to a high-quality solution, rediscovering the need to leave an opening in the center of the chip canvas. This is demonstrated in the animation below.
|Macro placements of Ariane, an open-source RISC-V processor, as training progresses. On the left, the policy is being trained from scratch, and on the right, a pre-trained policy is being fine-tuned for this chip. Each rectangle represents an individual macro placement. Notice how the cavity discovered by the from-scratch policy is already present from the outset in the pre-trained policy’s placement.|
|Convergence plots for two policies on Ariane blocks. One is training from scratch and the other is finetuning a pre-trained policy.|
|Training data size vs. fine-tuning performance.|
This project was a collaboration between Google Research and Google Hardware and Architecture teams. We would like to thank our coauthors: Mustafa Yazgan, Joe Jiang, Ebrahim Songhori, Shen Wang, Young-Joon Lee, Eric Johnson, Omkar Pathak, Sungmin Bae, Azade Nazi, Jiwoo Pak, Andy Tong, Kavya Srinivasa, William Hang, Emre Tuncer, Anand Babu, Quoc Le, James Laudon, Roger Carpenter, Richard Ho, and Jeff Dean for their support and contributions to this work.
The ability to determine 3D information about the scene, called depth sensing, is a valuable tool for developers and users alike. Depth sensing is a very active area of computer vision research with recent innovations ranging from applications like portrait mode and AR to fundamental sensing innovations such as transparent object detection. Typical RGB-based stereo depth sensing techniques can be computationally expensive, suffer in regions with low texture, and fail completely in extreme low light conditions.
Because the Face Unlock feature on Pixel 4 must work at high speed and in darkness, it called for a different approach. To this end, the front of the Pixel 4 contains a real-time infrared (IR) active stereo depth sensor, called uDepth. A key computer vision capability on the Pixel 4, this technology helps the authentication system identify the user while also protecting against spoof attacks. It also supports a number of novel capabilities, such as after-the-fact photo retouching, depth-based segmentation of a scene, background blur, portrait effects and 3D photos.
Recently, we provided access to uDepth as an API on Camera2, using the Pixel Neural Core, two IR cameras, and an IR pattern projector to provide time-synchronized depth frames (in DEPTH16) at 30Hz. The Google Camera App uses this API to bring improved depth capabilities to selfies taken on the Pixel 4. In this post, we explain broadly how uDepth works, elaborate on the underlying algorithms, and discuss applications with example results for the Pixel 4.
Overview of Stereo Depth Sensing
All stereo camera systems reconstruct depth using parallax. To observe this effect, look at an object, close one eye, then switch which eye is closed. The apparent position of the object will shift, with closer objects appearing to move more. uDepth is part of the family of dense local stereo matching techniques, which estimate parallax computationally for each pixel. These techniques evaluate a region surrounding each pixel in the image formed by one camera, and try to find a similar region in the corresponding image from the second camera. When calibrated properly, the reconstructions generated are metric, meaning that they express real physical distances.
|Pixel 4 front sensor setup, an example of an active stereo system.|
What Makes uDepth Distinct?
Stereo sensing systems can be extremely computationally intensive, and it’s critical that a sensor running at 30Hz is low power while remaining high quality. uDepth leverages a number of key insights to accomplish this.
One such insight is that given a pair of regions that are similar to each other, most corresponding subsets of those regions are also similar. For example, given two 8x8 patches of pixels that are similar, it is very likely that the top-left 4x4 sub-region of each member of the pair is also similar. This informs the uDepth pipeline’s initialization procedure, which builds a pyramid of depth proposals by comparison of non-overlapping tiles in each image and selecting those most similar. This process starts with 1x1 tiles, and accumulates support hierarchically until an initial low-resolution depth map is generated.
After initialization, we apply a novel technique for neural depth refinement to support the regular grid pattern illuminator on the Pixel 4. Typical active stereo systems project a pseudo-random grid pattern to help disambiguate matches in the scene, but uDepth is capable of supporting repeating grid patterns as well. Repeating structure in such patterns produces regions that look similar across stereo pairs, which can lead to incorrect matches. We mitigate this issue using a lightweight (75k parameter) convolutional architecture, using IR brightness and neighbor information to adjust incorrect matches — in less than 1.5ms per frame.
|Neural depth refinement architecture.|
Finally, the best match from among neighboring plane hypotheses is selected, with subpixel refinement and invalidation if no good match could be found.
|Simplified depth architecture. Green components run on the GPU, yellow on the CPU, and blue on the Pixel Neural Core.|
|Left: Stereo depth with inaccurate calibration. Right: After autocalibration.|
Depth for Computational Photography
The raw data from the uDepth sensor is designed to be accurate and metric, which is a fundamental requirement for Face Unlock. Computational photography applications such as portrait mode and 3D photos have very different needs. In these use cases, it is not critical to achieve video frame rates, but the depth should be smooth, edge-aligned and complete in the whole field-of-view of the color camera.
|Left to right: raw depth sensing result, predicted depth, 3D photo. Notice the smooth rotation of the wall, demonstrating a continuous depth gradient rather than a single focal plane.|
|Architecture for computational photography depth enhancement.|
|Data acquisition overview.|
With all of these components in place, uDepth produces both a depth stream at 30Hz (exposed via Camera2), and smooth, post-processed depth maps for photography (exposed via Google Camera App when you take a depth-enabled selfie). The smooth, dense, per-pixel depth that our system produces is available on every Pixel 4 selfie with Social Media Depth features enabled, and can be used for post-capture effects such as bokeh and 3D photos for social media.
|Example applications. Notice the multiple focal planes in the 3D photo on the right.|
|Example single-frame, RGB point cloud from uDepth on the Pixel 4.|
This work would not have been possible without the contributions of many, many people, including but not limited to Peter Barnum, Cheng Wang, Matthias Kramm, Jack Arendt, Scott Chung, Vaibhav Gupta, Clayton Kimber, Jeremy Swerdlow, Vladimir Tankovich, Christian Haene, Yinda Zhang, Sergio Orts Escolano, Sean Ryan Fanello, Anton Mikhailov, Philippe Bouchilloux, Mirko Schmidt, Ruofei Du, Karen Zhu, Charlie Wang, Jonathan Taylor, Katrina Passarella, Eric Meisner, Vitalii Dziuba, Ed Chang, Phil Davidson, Rohit Pandey, Pavel Podlipensky, David Kim, Jay Busch, Cynthia Socorro Herrera, Matt Whalen, Peter Lincoln, Geoff Harvey, Christoph Rhemann, Zhijie Deng, Daniel Finchelstein, Jing Pu, Chih-Chung Chang, Eddy Hsu, Tian-yi Lin, Sam Chang, Isaac Christensen, Donghui Han, Speth Chang, Zhijun He, Gabriel Nava, Jana Ehmann, Yichang Shih, Chia-Kai Liang, Isaac Reynolds, Dillon Sharlet, Steven Johnson, Zalman Stern, Jiawen Chen, Ricardo Martin Brualla, Supreeth Achar, Mike Mehlman, Brandon Barbello, Chris Breithaupt, Michael Rosenfield, Gopal Parupudi, Steve Goldberg, Tim Knight, Raj Singh, Shahram Izadi, as well as many other colleagues across Devices and Services, Google Research, Android and X.
For several decades, computer processors have doubled their performance every couple of years by reducing the size of the transistors inside each chip, as described by Moore’s Law. As reducing transistor size becomes more and more difficult, there is a renewed focus in the industry on developing domain-specific architectures — such as hardware accelerators — to continue advancing computational power. This is especially true for machine learning, where efforts are aimed at building specialized architectures for neural network (NN) acceleration. Ironically, while there has been a steady proliferation of these architectures in data centers and on edge computing platforms, the NNs that run on them are rarely customized to take advantage of the underlying hardware.
Today, we are happy to announce the release of EfficientNet-EdgeTPU, a family of image classification models derived from EfficientNets, but customized to run optimally on Google’s Edge TPU, a power-efficient hardware accelerator available to developers through the Coral Dev Board and a USB Accelerator. Through such model customizations, the Edge TPU is able to provide real-time image classification performance while simultaneously achieving accuracies typically seen only when running much larger, compute-heavy models in data centers.
Using AutoML to customize EfficientNets for Edge TPU
EfficientNets have been shown to achieve state-of-the-art accuracy in image classification tasks while significantly reducing the model size and computational complexity. To build EfficientNets designed to leverage the Edge TPU’s accelerator architecture, we invoked the AutoML MNAS framework and augmented the original EfficientNet’s neural network architecture search space with building blocks that execute efficiently on the Edge TPU (discussed below). We also built and integrated a “latency predictor” module that provides an estimate of the model latency when executing on the Edge TPU, by running the models on a cycle-accurate architectural simulator. The AutoML MNAS controller implements a reinforcement learning algorithm to search this space while attempting to maximize the reward, which is a joint function of the predicted latency and model accuracy. From past experience, we know that Edge TPU’s power efficiency and performance tend to be maximized when the model fits within its on-chip memory. Hence we also modified the reward function to generate a higher reward for models that satisfy this constraint.
|Overall AutoML flow for designing customized EfficientNet-EdgeTPU models.|
When performing the architecture search described above, one must consider that EfficientNets rely primarily on depthwise-separable convolutions, a type of neural network block that factorizes a regular convolution to reduce the number of parameters as well as the amount of computations. However, for certain configurations, a regular convolution utilizes the Edge TPU architecture more efficiently and executes faster, despite the much larger amount of compute. While it is possible, albeit tedious, to manually craft a network that uses an optimal combination of the different building blocks, augmenting the AutoML search space with these accelerator-optimal blocks is a more scalable approach.
swish non-linearity and squeeze-and-excitation block, naturally leads to models that are readily ported to the Edge TPU hardware. These operations tend to improve model quality slightly, so by eliminating them from the search space, we have effectively instructed AutoML to discover alternate network architectures that may compensate for any potential loss in quality.
The neural architecture search (NAS) described above produced a baseline model, EfficientNet-EdgeTPU-S, which is subsequently scaled up using EfficientNet's compound scaling method to produce the -M and -L models. The compound scaling approach selects an optimal combination of input image resolution scaling, network width, and depth scaling to construct larger, more accurate models. The -M, and -L models achieve higher accuracy at the cost of increased latency as shown in the figure below.
|EfficientNet-EdgeTPU-S/M/L models achieve better latency and accuracy than existing EfficientNets (B1), ResNet, and Inception by specializing the network architecture for Edge TPU hardware. In particular, our EfficientNet-EdgeTPU-S achieves higher accuracy, yet runs 10x faster than ResNet-50.|
This work represents a first experiment in building accelerator-optimized models using AutoML. The AutoML-based model customization can be extended to not only a wide range of hardware accelerators, but also to several different applications that rely on neural networks.
From Cloud TPU training to Edge TPU deployment
We have released the training code and pretrained models for EfficientNet-EdgeTPU on our github repository. We employ tensorflow’s post-training quantization tool to convert a floating-point trained model to an Edge TPU-compatible integer-quantized model. For these models, the post-training quantization works remarkably well and produces only a very slight loss in accuracy (~0.5%). The script for exporting the quantized model from a training checkpoint can be found here. For an update on the Coral platform, see this post on the Google Developer’s Blog, and for full reference materials and detailed instructions, please refer to the Coral website.
Special thanks to Quoc Le, Hongkun Yu, Yunlu Li, Ruoming Pang, and Vijay Vasudevan from the Google Brain team; Bo Wu, Vikram Tank, and Ajay Nair from the Google Coral team; Han Vanholder, Ravi Narayanaswami, John Joseph, Dong Hyuk Woo, Raksit Ashok, Jason Jong Kyu Park, Jack Liu, Mohammadali Ghodrat, Cao Gao, Berkin Akin, Liang-Yun Wang, Chirag Gandhi, and Dongdong Li from the Google Edge TPU team.