
Gemini breaks new ground with a faster model, longer context, AI agents and more

AI is unlocking experiences that were not even possible a few years ago, and we’ve been hard at work reimaging Android with AI at the core, to help enable you to build a whole new class of apps. At this year’s Google I/O, we’re covering how new tools like Gemini can power building the next generations of apps on Android. Plus, we showcased a range of updates to our tools and services grounded in productivity, helping you make it faster and easier to build excellent experiences across form factors. Let’s dive in!
Gemini in Android Studio (formerly Studio Bot) is your coding companion for Android development, and thanks to your feedback since its preview at last year’s Google I/O, we’ve evolved our models, expanded to over 200 countries and territories, and brought it into the Gemini family of products. Earlier today, we previewed a number of new features coming soon, like Code suggestions, App Quality Insights that leverage Gemini, and a preview of the multi-modal inputs that are coming using Gemini 1.5 Pro. You can read more about the updates here, and make sure to check out What’s new in Android development tools.
Android provides the solution you need to build Generative AI apps. You can use our most capable models over the Cloud with the Gemini API in Google AI or Vertex AI for Firebase directly in your Android apps. For on-device, Gemini Nano is our most efficient model. We’re working closely with a few early adopters such as Patreon, Grammarly, and Adobe to ensure we’re creating the best APIs that unlock the most innovative experiences. For example, Adobe is experimenting with Gemini Nano to enhance the on-device experience of Acrobat AI Assistant, a tool that allows their users to summarize and interact with documents. Be sure to check out the Build your own generative AI powered Android app, Android on-device gen AI under the hood, and the What’s New in Android sessions to learn more!
Build and design apps that adapt beyond the phone, with the new Compose adaptive layout libraries built with Material guidance in beta. Add rich stylus and keyboard support to increase user productivity. Check out three of our key Android adaptive sessions at Google I/O: Designing adaptive apps, Building adaptive Android apps, and Increase user productivity with large screens and accessories.
Jetpack Glance 1.1 is now available in release candidate and lets you build high quality widgets using your Compose skills. Check out our new canonical layouts, design guidance and figma updates to the Android UI kit. To learn more check out our Improve the user experience of your Android app workshop and Build Android widgets with Jetpack Glance technical session.
We’ll continue to share more updates for Android Developers throughout Google I/O, so check back here tomorrow!
Kotlin Multiplatform (KMP) enables sharing Kotlin code across different platforms and several of our Jetpack libraries, like DataStore and Room, have already been migrated to take advantage of KMP. We use Kotlin Multiplatform within Google and recommend using KMP for sharing business logic between platforms. Learn more about it here.
The upcoming Compose June ‘24 release is packed with the features you’ve been asking for! Shared element transitions, lazy list item reordering animations, strong skipping mode, performance improvements, a new lazy flow layout and more. Read more about it in our blog.
Android Studio Koala 🐨Feature Drop (2024.1.2) available today in the canary channel, builds on top of IntelliJ 2024.1 and adds new innovative features unlocked by Gemini, such as insights for crashes in App Quality Insights, code transformations and a Gemini API starter template to get you quickly started with Gemini. Additionally, new features such as USB speed detection, shortcut UI to control device settings, a new way to sign into Google services, updated and speedier UI for profilers with a new task centric approach and a deep integration with the Google Play SDK index are intended to make the development process extremely productive. Read more here.
Discover new ways to attract and engage users with enhanced custom store listings. Optimize revenue with expanded payment options. Reinforce trust through secure, high-quality experiences made easier with our latest SDK Console improvements. Learn about these updates and more, including our new vertical approach, in our blog.
Streamline your app's privacy compliance with Checks, Google's AI-powered compliance solution! Checks empowers developers to swiftly identify, address, resolve privacy issues, and enables you to launch apps faster and with confidence. Harness the power of automation with Checks' intelligent reports, saving you valuable time and resources. Get started now at checks.google.com.
…but for that, you’ll have to stay tuned tomorrow, when we’ve got a bit more up our sleeve!
As AI models grow increasingly complex and compute-intensive, the need for efficient, scalable, and hardware-agnostic infrastructure has never been greater. OpenXLA is a deep learning compiler framework that makes it easy to speed up and massively scale AI models on a wide range of hardware types—from GPUs and CPUs to specialized chips like Google TPUs and AWS Trainium. It is compatible with popular modeling frameworks—JAX, PyTorch, and TensorFlow—and delivers leading performance. OpenXLA is the acceleration infrastructure of choice for global-scale AI-powered products like Amazon.com Search, Google Gemini, Waymo self-driving vehicles, and x.AI's Grok.
On April 25th, the OpenXLA Dev Lab played host to over 100 expert ML practitioners from 10 countries, representing industry leaders like AMD, Arm, AWS, ByteDance, Cerebras, Cruise, Google, NVIDIA, Intel, Tesla, SambaNova, and more. The full-day event, tailored to AI hardware vendors and infrastructure engineers, broke the mold of previous OpenXLA Summits by focusing purely on “Lab Sessions”, akin to office hours for developers, and hands-on Tutorials. The energy of the event was palpable as developers worked side-by-side, learning and collaborating on both practical challenges and exciting possibilities for AI infrastructure.
![]() |
Figure 1: Developers from around the world congregated at the OpenXLA Dev Lab. |
The Dev Lab was all about three key things:
The Tutorials included:
Led by Jieying Luo and Skye Wanderman-Milne
Led by Kevin Gleason Jen Ha, and Xing Liu
Led by Yeounoh Chung and Pratik Fegade
Led by Frederik Gossen, TJ Xu, and Abhinav Goel
Lab Sessions featured use case-specific office hours for AMD, Arm, AWS, ByteDance, Intel, NVIDIA, SambaNova, Tesla, and more. OpenXLA engineers were on hand to provide development teams with dedicated support and walkthrough specific pain points and designs. In addition, Informational Roundtables that covered broader topics like GPU ML Performance Optimization, JAX, and PyTorch-XLA GPU were available for those without specific use cases. This approach led to productive exchanges and fine-grained exploration of critical contribution areas for ML hardware vendors.
![]() |
Don’t just take our word for it – here’s some of the feedback we received from developers:
"PJRT is awesome, we're looking forward to building with it. We are very grateful for the support we are getting."
"Today I learned a lot about Shardonnay and about some of the bugs I found in the GSPMD partitioner, and I got to learn a lot of cool stuff."
“I learned a lot, a lot about how XLA is making tremendous progress in building their community.”
“Loved the format this year - please continue … lots of learning, lots of interactive sessions. It was great!”
The event kicked off with a keynote by Robert Hundt, Distinguished Engineer at Google, who outlined OpenXLA's ambitious plans for 2024, particularly three major areas of focus:,/p>
OpenXLA is introducing powerful features to enable model training at record-breaking scales. One of the most notable additions is Shardonnay, a tool coming soon to OpenXLA that automates and optimizes how large AI workloads are divided across multiple processing units, ensuring efficient use of resources and faster time to solution. Building on the success of its predecessor, SPMD, Shardonnay empowers developers with even more fine-grained control over partitioning decisions, all while maintaining the productivity benefits that SPMD is known for.
![]() |
Figure 2: Sharding representation example with a simple rank 2 tensor and 4 devices. |
In addition to Shardonnay, developers can expect a suite of features designed to optimize computation and communication overlap, including:
These innovations will enable developers to push the boundaries of large-scale training and achieve unprecedented performance and efficiency.
OpenXLA has also made significant strides in enhancing performance, particularly on GPUs with key PyTorch-based generative AI models. PyTorch-XLA GPU is now neck and neck with TorchInductor for TorchBench Full Graph Models and has a TorchBench pass rate within 5% of TorchInductor.
![]() |
Figure 3: Performance comparison of TorchInductor vs. PyTorch-XLA GPU on Google Cloud NVIDIA H100 GPUs. “Full graph models” represent all TorchBench models that can be fully represented by StableHLO |
Behind these impressive gains lies XLA GPU's global cost model, a game-changer for developers. In essence, this cost model acts as a sophisticated decision-making system, intelligently determining how to best optimize computations for specific hardware. The cost model delivers state-of-the-art performance through a priority-based queue for fusion decisions and is highly extensible, allowing third-party developers to seamlessly integrate their backend infrastructure for both general-purpose and specialized accelerators. The cost model's adaptability ensures that computation optimizations are tailored to specific accelerator architectures, while less suitable computations can be offloaded to the host or other accelerators.
OpenXLA is also breaking new ground with novel kernel programming languages, Pallas and Mosaic, which empower developers to write highly optimized code for specialized hardware. Mosaic demonstrates remarkable efficiency in programming key AI accelerators, surpassing widely used libraries in GPU code generation efficiency for models with 64, 128, and 256 Q head sizes, as evidenced by its enhanced utilization of TensorCores.
![]() |
Figure 4: Performance comparison of Flash Attention vs. Mosaic GPU on NVIDIA H100 GPUs. |
In addition to performance enhancements, OpenXLA is committed to making the entire stack more modular and extensible. Several initiatives planned for 2024 include:
![]() |
Figure 5: Modules and subcomponents of the OpenXLA stack. |
These improvements will make it easier for developers to build upon and extend OpenXLA.
Alibaba's success with PyTorch XLA FSDP within their TorchAcc framework is a prime example of the benefits of OpenXLA's modularity and extensibility. By leveraging these features, Alibaba achieved state-of-the-art performance for the LLaMa 2 13B model, surpassing the previous benchmark set by Megatron. This demonstrates the power of the developer community in extending OpenXLA to push the boundaries of AI development.
![]() |
Figure 6: Performance comparison of TorchAcc and Megatron for LLaMa 2 13B at different numbers of GPUs. |
If you missed the Dev Lab, don't worry! You can still access StableHLO walkthroughs on openxla.org, as well as the GitHub Gist for the PJRT session. Additionally, the recorded keynote and tutorials are available on our YouTube channel. Explore these resources and join our global community – whether you're an AI systems expert, model developer, student, or just starting out, there's a place for you in our innovative ecosystem.
![]() |
Adam Paske, Allen Hutchison, Amin Vahdat, Andrew Leaver, Andy Davis, Artem Belevich, Abhinav Goel, Benjamin Kramer, Berkin Illbeyi, Bill Jia, Eugene Zhulenev, Florian Reichl, Frederik Gossen, George Karpenkov, Gunhyun Park, Han Qi, Jack Cao, Jaesung Chung, Jen Ha, Jianting Cao, Jieying Luo, Jiewin Tan, Jini Khetan, Kevin Gleason, Kyle Lucke, Kuy Mainwaring, Lauren Clemens, Manfei Bai, Marisa Miranda, Michael Levesque-Dion, Milad Mohammadi, Nisha Miriam Johnson, Penporn Koanantakool, Robert Hundt, Sandeep Dasgupta, Sayce Falk, Shauheen Zahirazami, Skye Wanderman-milne, Yeounoh Chung, Pratik Fegade, Peter Hawkins, Vaibhav Singh, Tamás Danyluk, Thomas Jeorg, Adam Paszke and TJ Xu.
By James Rubin, Aditi Joshi, and Elliot English on behalf of the OpenXLA Project