
Generative AI ‘Food Coach’ that pairs food with your mood

Posted by Avneet Singh, Product Manager, Google Partner Innovation

Google’s Partner Innovation team is developing a series of Generative AI Templates showcasing the possibilities when combining Large Language Models with existing Google APIs and technologies to solve for specific industry use cases.

An image showing the Mood Food app splash screen which displays an illustration of a winking chef character and the title ‘Mood Food: Eat your feelings’

Overview

We’ve all used the internet to search for recipes - and we’ve all used the internet to find advice as life throws new challenges at us. But what if, using Generative AI, we could combine these superpowers and create a quirky personal chef that listens to how your day went, how you are feeling, and what you are thinking, and then creates new, inventive dishes with unique ingredients based on your mood?

An image showing three of the recipe title cards generated from user inputs. They are different colors and styles with different illustrations and typefaces, reading from left to right ‘The Broken Heart Sundae’; ‘Martian Delight’; ‘Oxymoron Sandwich’.

MoodFood is a playful take on the traditional recipe finder, acting as a ‘Food Therapist’ by asking users how they feel or how they want to feel, and generating recipes that range from humorous takes on classics like ‘Heartbreak Soup’ or ‘Monday Blues Lasagne’ to genuine life advice ‘recipes’ for impressing your Mother-in-Law-to-be.

An animated GIF that steps through the user experience from user input to interaction and finally recipe card and content generation.

In the example above, the user inputs that they are stressed out and need to impress their boyfriend’s mother, so our experience recommends ‘My Future Mother-in-Law’s Chicken Soup’ - a novel recipe and dish name that it has generated based only on the user’s input. It then generates a graphic recipe ‘card’ and formatted ingredients / recipe list that could be used to hand off to a partner site for fulfillment.

Potential use cases are rooted in a novel take on product discovery. Asking users their mood could surface song recommendations in a music app, travel destinations for a tourism partner, or actual recipes to order from food delivery apps. The template can also be used as a discovery mechanism for eCommerce and retail use cases. LLMs are opening a new world of exploration and possibilities. We’d love for our users to see the power of LLMs to combine known ingredients, place them in a completely different context, like a user’s mood, and invent new dishes for users to try!


Implementation

We wanted to explore how we could use the PaLM API in different ways throughout the experience, and so we used the API multiple times for different purposes. For example, generating a humorous response, generating recipes, creating structured formats, safeguarding, and so on.

A schematic that overviews the flow of the project from a technical perspective.

In the current demo, we use the LLM four times. The first prompt asks the LLM to be creative and invent recipes based on the user’s input and context. The second prompt formats the responses as JSON. The third prompt acts as a safeguard, ensuring the naming is appropriate. The final prompt turns the unstructured recipes into fully formatted JSON recipes.

One of the jobs that LLMs can help developers with is data formatting. Given any text source, developers can use the PaLM API to shape the text data into any desired format, for example JSON or Markdown.

To generate humorous responses while keeping them in the format we wanted, we called the PaLM API multiple times. We used a higher “temperature” for the model to make its output more varied, and lowered the temperature when formatting the responses.

In this demo, we want the PaLM API to return recipes in a JSON format, so we attach an example of a formatted response to the request. This gives the LLM a small amount of guidance on how to answer accurately in the expected format. However, JSON-formatting the recipes is quite time-consuming, which can hurt the user experience. To deal with this, we generate only a short reaction message from the humorous response (which takes less time), in parallel with the JSON recipe generation. We render the reaction response character by character as soon as it starts arriving, while waiting for the JSON recipe response. This reduces the feeling of waiting for a time-consuming response.
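
As a rough illustration of this parallel pattern, here is a minimal Python sketch using the PaLM API client library (google.generativeai). The model name and prompts are illustrative assumptions, not the exact ones used in MoodFood:

import google.generativeai as palm
from concurrent.futures import ThreadPoolExecutor

palm.configure(api_key="YOUR_API_KEY")

def generate(prompt: str, temperature: float) -> str:
    # One call to the PaLM text endpoint; higher temperature = more creative.
    completion = palm.generate_text(
        model="models/text-bison-001", prompt=prompt, temperature=temperature)
    return completion.result

user_input = "I'm stressed out and need to impress my boyfriend's mother."

with ThreadPoolExecutor() as pool:
    # The short reaction message usually finishes first...
    reaction = pool.submit(
        generate, f"Write a short, funny reaction to: {user_input}", 0.9)
    # ...while the slower JSON recipe generation runs in parallel.
    recipe = pool.submit(
        generate, f"Invent a recipe, returned as JSON, for: {user_input}", 0.2)
    print(reaction.result())  # rendered in the UI right away
    print(recipe.result())    # rendered once the structured recipe arrives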

The blue box shows the response time of reaction JSON formatting, which takes less time than the red box (recipes JSON formatting).

If a task requires a little more creativity while keeping the response in a predefined format, we encourage developers to separate the main task into two subtasks: one that generates creative responses with a higher temperature setting, and another that shapes them into the desired format with a low temperature setting, balancing the output.
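
Where the previous sketch ran two independent calls in parallel, the two-subtask split chains them: the creative output becomes the input of a low-temperature formatting call. A minimal sketch, again assuming the google.generativeai SDK and an illustrative JSON schema:

import google.generativeai as palm

palm.configure(api_key="YOUR_API_KEY")

# Subtask 1: a high temperature encourages creative, varied recipes.
creative = palm.generate_text(
    model="models/text-bison-001",
    prompt="You are a funny chef. Invent a recipe for someone having a rough Monday.",
    temperature=0.9,
)

# Subtask 2: a low temperature makes the formatting step predictable.
formatted = palm.generate_text(
    model="models/text-bison-001",
    prompt=('Convert this recipe into JSON with the keys "name", '
            '"ingredients", and "instructions":\n' + creative.result),
    temperature=0.1,
)

print(formatted.result)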


Prompting

Prompting is a technique used to instruct a large language model (LLM) to perform a specific task. It involves providing the LLM with a short piece of text that describes the task, along with any relevant information that the LLM may need to complete it. With the PaLM API, a prompt takes four fields as parameters: context, messages, temperature, and candidate_count.

  • The context sets the scene for the conversation and gives the LLM a better understanding of what it is being asked to do.
  • The messages field is an array of chat messages, from past to present, alternating between the user (author=0) and the LLM (author=1). The first message is always from the user.
  • The temperature is a float between 0 and 1. The higher the temperature, the more creative the response; the lower the temperature, the more predictable the response.
  • The candidate_count is the number of responses that the LLM will return.

In MoodFood, we used prompting to instruct the PaLM API. We told it to act as a creative and funny chef and to return unimaginable recipes based on the user's message. We also asked it to format the response in four parts: reaction, name, ingredients, and description.

  • Reaction: the direct humorous response to the user’s message, in a polite but entertaining way.
  • Name: the recipe name. We tell the PaLM API to generate recipe names with polite puns that don’t offend anyone.
  • Ingredients: a list of ingredients with measurements.
  • Description: the food description generated by the PaLM API.
An example of the prompt used in MoodFood
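
Putting these parameters together, a prompt along these lines could be sent with the PaLM API Python SDK. This is a hedged sketch: the context wording is paraphrased from the description above, not the exact MoodFood prompt:

import google.generativeai as palm

palm.configure(api_key="YOUR_API_KEY")

response = palm.chat(
    # context: who the LLM should be for the whole conversation.
    context=("You are a creative and funny chef. Based on the user's mood, "
             "invent a recipe with four parts: reaction, name, ingredients, "
             "and description. Keep recipe names polite and inoffensive."),
    # messages: the chat history; here, a single user message.
    messages=["I'm stressed out and need to impress my boyfriend's mother."],
    # temperature: closer to 1 for more creative replies.
    temperature=0.8,
    # candidate_count: how many alternative responses to return.
    candidate_count=1,
)

print(response.last)  # the model's latest reply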

Third Party Integration

The PaLM API offers embedding services that facilitate seamless integration of the PaLM API with customer data. To get started, you simply set up an embedding database of a partner’s data using the PaLM API’s embedding services.

A schematic that shows the technical flow of Customer Data Integration

Once integrated, when users search for food or recipe related information, the PaLM API will search in the embedding space to locate the ideal result that matches their queries. Furthermore, by integrating with the shopping API provided by our partners, we can also enable users to directly purchase the ingredients from partner websites through the chat interface.
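
As a rough sketch of this flow, the snippet below embeds a few partner recipe titles and finds the closest match for a user query with cosine similarity. The model name is the PaLM embedding model; the in-memory list stands in for a real embedding database, and the data is illustrative:

import numpy as np
import google.generativeai as palm

palm.configure(api_key="YOUR_API_KEY")

def embed(text: str) -> np.ndarray:
    # Returns the embedding vector for a piece of text.
    result = palm.generate_embeddings(
        model="models/embedding-gecko-001", text=text)
    return np.array(result["embedding"])

# Build a small embedding "database" from partner data.
recipes = ["Heartbreak Soup", "Monday Blues Lasagne", "Oxymoron Sandwich"]
index = [(title, embed(title)) for title in recipes]

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Embed the user's query and locate the closest recipe in embedding space.
query = embed("something comforting for a rough Monday")
best_title, _ = max(index, key=lambda item: cosine(query, item[1]))
print(best_title)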


Partnerships

Swiggy, an Indian online food ordering and delivery platform, expressed their excitement when considering the use cases made possible by experiences like MoodFood.

“We're excited about the potential of Generative AI to transform the way we interact with our customers and merchants on our platform. Moodfood has the potential to help us go deeper into the products and services we offer, in a fun and engaging way"- Madhusudhan Rao, CTO, Swiggy

Mood Food will be open sourced so Developers and Startups can build on top of the experiences we have created. Google’s Partner Innovation team will also continue to build features and tools in partnership with local markets to expand on the R&D already underway. View the project on GitHub here.


Acknowledgements

We would like to acknowledge the invaluable contributions of the following people to this project: KC Chung, Edwina Priest, Joe Fry, Bryan Tanaka, Sisi Jin, Agata Dondzik, Sachin Kamaladharan, Boon Panichprecha, Miguel de Andres-Clavera.

PaLM API & MakerSuite moving into public preview

Posted by Barnaby James, Director, Engineering, Google Labs and Simon Tokumine, Director, Product Management, Google Labs

At Google I/O, we showed how PaLM 2, our next generation model, is being used to improve products across Google. Today, we’re making PaLM 2 available to developers so you can build your own generative AI applications through the PaLM API and MakerSuite. If you’re a Google Cloud customer, you can also use PaLM API in Vertex AI.


The PaLM API, now powered by PaLM 2

We’ve instruction-tuned PaLM 2 for ease of use by developers, unlocking PaLM 2’s improved reasoning and code generation capabilities and enabling developers to easily use the PaLM API for use cases like content and code generation, dialog agents, summarization, classification, and more using natural language prompting. Thanks to its new model architecture improvements, it’s highly efficient and can handle complex prompts and instructions; combined with our TPU technologies, this enables speeds of 75+ tokens per second and 8K context windows.

Integrating the PaLM API into the developer ecosystem

Since March, we've been running a private preview with the PaLM API, and it’s been amazing to see how quickly developers have used it in their applications. Here are just a few:

  • GameOn Technology has used the chat endpoint to build their next-gen chat experience to bring fans together and summarize live sporting events
  • Vercel has been using the text endpoint to build a video title generator
  • Wendy’s has used embeddings so customers can place the correct order with their talk-to-menu feature

We’ve also been excited by the response from the developer tools community. Developers want choice in language models, and we're working with a range of partners to be able to access the PaLM API in the common frameworks, tools, and services that you’re using. We’re also making the PaLM API available in Google developer tools, like Firebase and Colab.

Image of logos of PaLM API partners including Baseplate, Gradient, Hubble, Magick, Stack, Vellum, Vercel, Weaviate. Text reads, 'Integrated into Google tools you already use'. Below this is the Firebase logo
The PaLM API and MakerSuite make it fast and easy to use Google’s large language models to build innovative AI applications

Build powerful prototypes with the PaLM API and MakerSuite

The PaLM API and MakerSuite are now available for public preview. For developers based in the U.S., you can access the documentation and sign up to test your own prototypes at no cost. We showed two demos at Google I/O to give you a sense of how easy it is to get started building generative AI applications.

We demoed Project Tailwind at Google I/O 2023, an AI-first notebook that helps you learn faster using your notes and sources

Project Tailwind is an AI-first notebook that helps you learn faster by using your personal notes and sources. It’s a prototype that was built with the PaLM API by a core team of five engineers at Google in just a few weeks. You simply import your notes and documents from Google Drive, and it essentially creates a personalized and private AI model grounded in your sources. From there, you can prompt it to learn about anything related to the information you’ve provided it. You can sign up to test it now.

MakerSuite was used to help create the descriptions in I/O FLIP

I/O FLIP is an AI-designed take on a classic card game where you compete against opposing players with AI-generated cards. We created millions of unique cards for the game using DreamBooth, an AI technique invented at Google Research, and then populated the cards with fun descriptions. To build the descriptions, we used MakerSuite to quickly experiment with different prompts and generate examples. You can play I/O FLIP and sign up for MakerSuite now.

Over the next few months, we’ll keep expanding access to the PaLM API and MakerSuite. Please keep sharing your feedback on the #palm-api channel on the Google Developer Discord. Whether it’s helping generate code, create content, or come up with ideas for your app or website, we want to help you be more productive and creative than ever before.

Introducing MediaPipe Solutions for On-Device Machine Learning

Posted by Paul Ruiz, Developer Relations Engineer & Kris Tonthat, Technical Writer

MediaPipe Solutions is available in preview today

This week at Google I/O 2023, we introduced MediaPipe Solutions, a new collection of on-device machine learning tools to simplify the developer process. This is made up of MediaPipe Studio, MediaPipe Tasks, and MediaPipe Model Maker. These tools provide no-code to low-code solutions to common on-device machine learning tasks, such as audio classification, segmentation, and text embedding, for mobile, web, desktop, and IoT developers.

image showing a 4 x 2 grid of solutions via MediaPipe Tools

New solutions

In December 2022, we launched the MediaPipe preview with five tasks: gesture recognition, hand landmarker, image classification, object detection, and text classification. Today we’re happy to announce that we have launched an additional nine tasks for Google I/O, with many more to come. Some of these new tasks include:

  • Face Landmarker, which detects facial landmarks and blendshapes to determine human facial expressions, such as smiling, raised eyebrows, and blinking. Additionally, this task is useful for applying effects to a face in three dimensions that match the user’s actions.
moving image showing a human with a raccoon face filter tracking a range of accurate movements and facial expressions
  • Image Segmenter, which lets you divide images into regions based on predefined categories. You can use this functionality to identify humans or multiple objects, then apply visual effects like background blurring.
moving image of two panels showing a person on the left and how the image of that person is segmented into regions on the right
  • Interactive Segmenter, which takes the region of interest in an image, estimates the boundaries of an object at that location, and returns the segmentation for the object as image data.
moving image of a dog moving around as the interactive segmenter identifies boundaries and segments

Coming soon

  • Image Generator, which enables developers to apply a diffusion model within their apps to create visual content.
moving image showing the rendering of an image of a puppy among an array of white and pink wildflowers in MediaPipe from a prompt that reads, 'a photo realistic and high resolution image of a cute puppy with surrounding flowers'
  • Face Stylizer, which lets you take an existing style reference and apply it to a user’s face.
image of a 4 x 3 grid showing varying iterations of a known female and male face across four different art styles

MediaPipe Studio

Our first MediaPipe tool lets you view and test MediaPipe-compatible models on the web, rather than having to create your own custom testing applications. You can even use MediaPipe Studio in preview right now to try out the new tasks mentioned here, and more, by visiting the MediaPipe Studio page.

In addition, we have plans to expand MediaPipe Studio to provide a no-code model training solution so you can create brand new models without a lot of overhead.

moving image showing Gesture Recognition in MediaPipe Studio

MediaPipe Tasks

MediaPipe Tasks simplifies on-device ML deployment for web, mobile, IoT, and desktop developers with low-code libraries. You can easily integrate on-device machine learning solutions, like the examples above, into your applications in a few lines of code without having to learn all the implementation details behind those solutions. These currently include tools for three categories: vision, audio, and text.

To give you a better idea of how to use MediaPipe Tasks, let’s take a look at an Android app that performs gesture recognition.

moving image showing Gesture Recognition across a series of hand gestures in MediaPipe Studio including closed fist, victory, thumb up, thumb down, open palm and i love you.

The following code will create a GestureRecognizer object using a built-in machine learning model, then that object can be used repeatedly to return a list of recognition results based on an input image:

// STEP 1: Create a gesture recognizer
val baseOptions = BaseOptions.builder()
    .setModelAssetPath("gesture_recognizer.task")
    .build()
val gestureRecognizerOptions = GestureRecognizerOptions.builder()
    .setBaseOptions(baseOptions)
    .build()
val gestureRecognizer = GestureRecognizer.createFromOptions(
    context, gestureRecognizerOptions)

// STEP 2: Prepare the image
val mpImage = BitmapImageBuilder(bitmap).build()

// STEP 3: Run inference
val result = gestureRecognizer.recognize(mpImage)

As you can see, with just a few lines of code you can implement seemingly complex features in your applications. Combined with other Android features, like CameraX, you can provide delightful experiences for your users.

Along with simplicity, one of the other major advantages of using MediaPipe Tasks is that your code will look similar across multiple platforms, regardless of the task you’re using. This helps you develop even faster, since you can reuse the same logic in each application.
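
For example, here is a hedged sketch of the same gesture recognition flow with the MediaPipe Tasks Python package. The image file name is a placeholder, but the create-options-then-recognize structure mirrors the Kotlin example above:

import mediapipe as mp
from mediapipe.tasks import python
from mediapipe.tasks.python import vision

# STEP 1: Create a gesture recognizer from the bundled model
base_options = python.BaseOptions(model_asset_path="gesture_recognizer.task")
options = vision.GestureRecognizerOptions(base_options=base_options)
recognizer = vision.GestureRecognizer.create_from_options(options)

# STEP 2: Prepare the image
mp_image = mp.Image.create_from_file("hand.jpg")

# STEP 3: Run inference
result = recognizer.recognize(mp_image)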


MediaPipe Model Maker

While being able to recognize and use gestures in your apps is great, what if you have a situation where you need to recognize custom gestures outside of the ones provided by the built-in model? That’s where MediaPipe Model Maker comes in. With Model Maker, you can retrain the built-in model on a dataset with only a few hundred examples of new hand gestures, and quickly create a brand new model specific to your needs. For example, with just a few lines of code you can customize a model to play Rock, Paper, Scissors.

image showing 5 examples of the 'paper' hand gesture in the top row and 5 examples of the 'rock' hand gesture on the bottom row

from mediapipe_model_maker import gesture_recognizer

# STEP 1: Load the dataset, holding out data for validation and testing.
data = gesture_recognizer.Dataset.from_folder(dirname='images')
train_data, rest_data = data.split(0.8)
validation_data, test_data = rest_data.split(0.5)

# STEP 2: Train the custom model.
model = gesture_recognizer.GestureRecognizer.create(
    train_data=train_data,
    validation_data=validation_data,
    hparams=gesture_recognizer.HParams(export_dir='exported_model')
)

# STEP 3: Evaluate using unseen data.
metric = model.evaluate(test_data)

# STEP 4: Export as a model asset bundle.
model.export_model(model_name='rock_paper_scissor.task')

After retraining your model, you can use it in your apps with MediaPipe Tasks for an even more versatile experience.

moving image showing Gesture Recognition in MediaPipe Studio recognizing rock, paper, and scissors hand gestures

Getting started

To learn more, watch our I/O 2023 sessions: Easy on-device ML with MediaPipe, Supercharge your web app with machine learning and MediaPipe, and What's new in machine learning, and check out the official documentation over on developers.google.com/mediapipe.


What’s next?

We will continue to improve and provide new features for MediaPipe Solutions, including new MediaPipe Tasks and no-code training through MediaPipe Studio. You can also keep up to date by joining the MediaPipe Solutions announcement group, where we send out announcements as new features are available.

We look forward to all the exciting things you make, so be sure to share them with @googledevs and your developer communities!

Google I/O 2023 recap: Updates across mobile, web, AI, and cloud

Posted by Jeanine Banks, VP & General Manager of Developer X & Head of Developer Relations

Thank you for another great Google I/O! We’re continuing to make deep investments across AI, mobile, web, and the cloud to make your life easier as a developer. Today you saw many of the ways we’re using generative AI to improve our products. We’re excited about the opportunities these tools can unlock and to see what you build. From simplifying your end-to-end workflows to improving productivity, catch up on key announcements below.


AI

Making it possible for everyone to build AI-powered products in the most productive and responsible way.

PaLM API and MakerSuite
Build generative AI applications with access to Google’s state-of-the-art large language model through the PaLM API. Quickly create and prototype prompts directly in your browser with MakerSuite — no machine learning expertise or coding required. 
Firebase AI extensions
Developers can now access the PaLM API with Firebase Extensions. The new Chatbot with PaLM API extension allows you to add a chat interface for continuous dialog, text summarization, and more.
MediaPipe Studio and solutions
MediaPipe is an open source cross-platform framework for building machine learning solutions on mobile, desktop, and the web. You can try nine new solutions, like a face landmarker, running locally on-device in the browser with MediaPipe Studio. 
Tools across your workflow
From datasets and pre-trained models with Kaggle to easy-to-use modular libraries for computer vision and natural language processing with KerasCV and KerasNLP, we’re proud to power end-to-end experiences with a diverse set of tools across your workflow.


Mobile

Increase productivity with the power of AI, build for a multi-device world, and do more faster with Modern Android Development.

Studio Bot
We’re introducing Studio Bot, an AI-powered conversational experience in Android Studio which makes you more productive. This is an early experiment that helps you write and debug code, and answers your Android development questions.
Going big on Android foldables & tablets
With two new Android devices coming from Pixel, the Pixel Fold and the Pixel Tablet, Google and our partners are all in on large screens. It's a great time to invest, with improved tools and guidance like the new Pixel Fold and Pixel Tablet emulator configurations in Android Studio Hedgehog Canary 3, expanded Material Design updates, and inspiration for gaming and creativity apps.
Wear OS: Watch faces, Wear OS 4, & Tiles animations
Wear OS active devices have grown 5x since launching Wear OS 3, so there’s more reason to build a great app experience for the wrist. To help you on your way, we announced the new Watch Face Format, a new declarative XML format built in partnership with Samsung to help you bring your great idea to the watch face market.
Modern Android Development
Several updates to Jetpack Compose make it easier to build rich UIs across more surfaces like Compose for TV in alpha and screen widgets with Glance, now in beta. Meanwhile, the new features in Android Studio help you stay productive, including added functionality in App Quality Insights and more.
Flutter 3.10
Tap into Impeller for enhanced graphics performance. The latest version of Flutter now includes a JNI bridge to Jetpack libraries written in Kotlin, enabling you to call a new Jetpack library directly from Dart without needing an external plugin.
Geospatial Creator
Easily design and publish AR content with the new Geospatial Creator powered by ARCore and 3D maps from Google Maps Platform. Geospatial Creator is available in Unity or Adobe Aero.
 

Web

Experience a more powerful and open web, made easier and AI-ready.

WebAssembly (aka WASM) - managed memory language support
WASM now supports Kotlin and Dart, extending its benefit of reaching new customers on the web with native performance while reusing existing code, to Android and Flutter developers.
WebGPU
This newly available API unlocks the power of GPU hardware and makes the web AI-ready. Save money, increase speed, and build privacy-preserving AI features with access to on-device computing power.
Support for web frameworks
Chrome DevTools has improved debugging for various frameworks. Firebase Hosting is also expanding experimental support to Nuxt, Flutter, and many more. Angular v16 includes better server-side rendering, hydration, Signals, and more. Lastly, Flutter 3.10 reduces load time for web apps and integrates with existing web components.
Baseline
We introduced Baseline, a stable and predictable view of the web, alongside browser vendors in the W3C and framework providers. Baseline captures an evergreen set of cross-browser features and will be updated every year.
 

Cloud

New generative AI cloud capabilities open the door for developers with all different skill levels to build enterprise-ready applications.

Duet AI
Duet AI is a new generative AI-powered interface that acts as your expert pair programmer, providing assistance within Cloud Workstations, Cloud Console, and Chat. It will also allow you to call Google-trained models and custom code models trained directly on your code.
Vertex AI
Vertex AI lets you tune, customize, and deploy foundation models with simple prompts, no ML expertise required. Now you can access foundation models like Imagen, our text-to-image foundation model, with enterprise-grade security and governance controls.
Text Embeddings API
This new API endpoint lets developers build recommendation engines, classifiers, question-answering systems, similarity matching, and other sophisticated applications based on semantic understanding of text or images.
Workspace additions
New Chat APIs in Google Workspace will help you build apps that provide link previews and let users create or update records, generally available in the coming weeks. And coming to Preview this summer, new Google Meet APIs and two new SDKs will enable Google Meet and its data capabilities in your apps.
 

And that’s a wrap

These are just a few highlights of a number of new tools and technologies we announced today to help developers more easily harness the power of AI, and to more easily create applications for a variety of form factors and platforms. And we’re not done yet. Visit the Google I/O website to find over 200 sessions and other learning material, and connect with Googlers and fellow developers in I/O Adventure Chat.

We’re also excited to come to you with four Google I/O Connect events, which will bring Google experts and developers together for hands-on demos, code labs, office hours, and more. In addition, you can join one of the more than 250 I/O Extended meetups taking place across the globe over the next few months. We can’t wait to see what you will build next!

New Resources to Build with Google AI

Posted by Jaimie Hwang, ML Product Marketing and Danu Mbanga, ML Product Management

Today's development environment is only getting more complex, and as machine learning becomes increasingly integrated with mobile, web, and cloud platforms, developers are looking for clear pathways to cut through this growing complexity. To help developers of all levels, we've unified our machine learning products, tools, and guidance on Google AI, so you can spend less time searching and more time building AI solutions.

Whether you are looking for a pre-trained dataset, the latest in Generative AI, or tracking the latest announcements from Google I/O, you’ll be able to find it on ai.google/build/machinelearning. It’s a single destination for building AI solutions, no matter where you are in your machine learning workflow, or where your models are deployed.

cropped screenshot of Google AI landing page

Toolkits: end-to-end guidance

We're taking it one step further with new toolkits that provide end-to-end guidance for building the latest AI solutions. These toolkits bring together many of our products, most of them open source, with a walkthrough so you can learn best practices and implement code. Check out how to build a text classifier using Keras, or how to take a large language model and shrink it to run on Android using Keras and TensorFlow Lite. And we are just getting started: these are our first two toolkits, with many more to come soon, so stay tuned!

moving image of finding a toolkit on Google AI to build an LLM on Android and Keras and TensorFlow Lite
Toolkit to build a LLM on Android with Keras and TensorFlow Lite

Whether you're just starting out with machine learning or you're an experienced developer looking for the latest tools and resources, Google AI has the resources you need to build AI solutions. Visit ai.google/build/machinelearning today to learn more.

Bringing Kotlin to the Web

Posted by Vivek Sekhar, Product Manager

This post describes early experimental work from JetBrains and Google. You can learn more in the session on WebAssembly at Google I/O 2023.

Application developers want to reach as many users on as many platforms as they can. Until now, that goal has meant building an app on each of Android, iOS and the Web, as well as building the backend servers and infrastructure to power them.

Image showing infrastructure of Web, Android, and iOS Apps in relation to backend servers and programming support - JavaScript, Kotlin, and Swift respectively

To reduce effort, some developers use multiplatform languages and frameworks to develop their app's business logic and UI. Bringing these multiplatform apps to the Web has previously meant "compiling" shared application code to a slower JavaScript version that can run in the browser. Instead, developers often rewrite their apps in JavaScript, or simply direct Web users to download their native mobile apps.

The Web community is developing a better alternative: direct Web support for modern languages thanks to a new technology called WebAssembly GC. This new Web feature allows cross-platform code written in supported languages to run with near-native performance inside all major browsers.

We're excited to roll out experimental support for this new capability on the Web for Kotlin, unlocking new code-sharing opportunities with faster performance for Android and Web developers.


Kotlin Multiplatform Development on the Web

Kotlin is a productive and powerful language used in 95% of the top 1,000 Android apps. Developers say they are more productive and produce fewer bugs after switching to Kotlin.

The Kotlin Multiplatform Mobile and Compose Multiplatform frameworks from JetBrains help developers share code between their Android and iOS apps. These frameworks now offer experimental support for Kotlin compilation to WebAssembly. Early experiments indicate Kotlin code runs up to 2x faster on the Web using WebAssembly instead of JavaScript.


JetBrains shares more details in the release notes for Kotlin 1.8.20, as well as documentation on how you can try Kotlin/Wasm with your app.


Pulling it off

Bringing modern mobile languages like Kotlin to the Web required solving challenging technical problems like multi-language garbage collection and JavaScript interoperability. You can learn more in the session on new WebAssembly languages from this year's Google I/O conference.

This work wouldn't have been possible without an open collaboration between browser vendors, academics, and service providers across the Web as part of the W3C WebAssembly Community Group. In the coming weeks, we'll share technical details about this innovative work on the V8 Blog.


Looking ahead: Web and Native Development

For decades, developers have dreamed of the Web as a kind of "universal runtime," while at the same time acknowledging certain feature or performance gaps relative to native platforms. Developers have long had to switch between working on the Web or their native mobile apps.

However, we want to make it possible for you to work on the Web and your native experiences together, not only to help you reduce effort, but also to help you tap into the Web's unique superpowers.

On the open web, your app is just a click away from new users, who can discover it and share it just as easily as they share a web page, with no app stores getting in the way and no revenue split affecting your profitability.

The productivity of cross-platform development, the performance of native mobile apps, and the openness of the web: that's why we love WebAssembly.

We can't wait to see what you build next!


"The productivity of cross-platform development, the performance of native mobile apps, and the openness of the Web."

Create world-scale augmented reality experiences in minutes with Google’s Geospatial Creator

Posted by Stevan Silva, Senior Product Manager

ARCore, our augmented reality developer platform, provides developers and creators alike with simple yet powerful tools to build world-scale and room-scale immersive experiences on 1.4 billion Android devices.

Since last year, we have extended coverage of the ARCore Geospatial API from 87 countries to over 100, powered by Google’s Visual Positioning System and the expansion of Street View coverage, helping developers build and publish more transformative and robust location-based immersive experiences. We continue to push the boundaries with helpful applications and delightful new world-scale use cases, whether it's the innovative hackathon submissions from the ARCore Geospatial API Challenge or our partnership with Gorillaz, where we transformed Times Square and Piccadilly Circus into a music stage to witness Gorillaz play in a larger-than-life immersive performance.

One thing we’ve consistently heard from you over the past year is to broaden access to these powerful resources and ensure anyone can create, visualize, and deploy augmented reality experiences around the world.

Introducing Geospatial Creator


Today, we are launching Geospatial Creator, a tool that helps anyone easily visualize, design, and publish world-anchored immersive content in minutes straight from platforms you already know and love — Unity or Adobe Aero.

Easily visualize, create, and publish augmented reality experiences with Geospatial Creator in Unity (left) and Adobe Aero (right)

Geospatial Creator, powered by ARCore and Photorealistic 3D Tiles from Google Maps Platform, enables developers and creators to easily visualize where in the real-world they want to place their digital content, similar to how Google Earth or Google Street View visualize the world. Geospatial Creator also includes new capabilities, such as Rooftop anchors, to make it even easier to anchor virtual content with the 3D Tiles, saving developers and creators time and effort in the creation process.

These tools help you build world-anchored, cross-platform experiences on supported devices on both Android and iOS. Immersive experiences built in Adobe Aero can be shared via a simple QR code scan or link with no full app download required. Everything you create in Geospatial Creator can be experienced in the physical world through real time localization and real world augmentation.


With Geospatial Creator, developers and creators can now build on top of Photorealistic 3D Tiles from Google Maps Platform (left) which provide real time localization and real time augmentation (right)

When the physical world is augmented with digital content, it redefines the way people play, shop, learn, create, and get information. To give you an idea of what you can achieve with these tools, we’ve been working with partners in gaming, retail, and local discovery, including Gap, Mattel, Global Street Art, Singapore Tourism Board, Gensler, TAITO, and more, to build real-world use cases.

SPACE INVADERS: World Defense immersive game turns the world into a playground

Later this summer you’ll be able to play one of the most acclaimed arcade games in real life, in the real world. To celebrate the 45th anniversary of the original release, TAITO will launch SPACE INVADERS: World Defense. The game, powered by ARCore and Geospatial Creator, is inspired by the original gameplay, with players defending the Earth from SPACE INVADERS in their own neighborhood. It combines AR and 3D gameplay to deliver a fully contextual and highly engaging immersive experience that connects multiple generations of players.



Gap and Mattel transform a storefront into an interactive immersive experience

Gap and Mattel will transform the iconic Times Square Gap Store into an interactive Gap x Barbie experience powered by Geospatial Creator in Adobe Aero. Starting May 23, customers will see the store come to life with colors and shapes and be able to interact with Barbie and her friends modeling the new limited edition Gap x Barbie collection of clothing.

moving image of Gap by Mattel

Global Street Art brings street art to a new dimension with AR murals

Google Arts & Culture partnered with Global Street Art and three world-renowned artists to augment physical murals in London (Camille Walala), Mexico City (Edgar Saner), and Los Angeles (Tristan Eaton). The artists used Geospatial Creator in Adobe Aero to create the virtual experience, augmenting physical murals digitally in AR and bringing to life a deeper and richer story about the art pieces.



Singapore Tourism Board creates an immersive guided tour to explore Singapore

Google Partner Innovation team partnered with Singapore Tourism Board to launch a preview of an immersive Singapore guided tour in their VisitSingapore app. Merli, Singapore's tourism mascot, leads visitors on an interactive augmented tour of the city’s iconic landmarks and hidden gems, beginning with the iconic Merlion Park and engaging visitors with an AR symphony performance at Victoria Theatre and Concert Hall. The full guided tour is launching later this summer, and will help visitors discover the best local hawker food, uncover the city's history through scenes from the past, and more.


Gensler helps communities visualize new urban projects

Gensler used Geospatial Creator in Adobe Aero to help communities easily envision what new city projects for the unhoused might look like. The immersive designs of housing projects allow everyone to better visualize the proposed urban changes and their social impact, ultimately bringing suitable shelter to those who need it.

moving image of city projects from Gensler

Geospatial Creator gives anyone the superpower of creating world scale AR experience remotely. Both developers and creators can build and publish immersive experiences in minutes in countries where Photorealistic 3D Tiles are available. In just a few clicks, you can create applications that help communities, delight your users, and provide solutions to businesses. Get started today at goo.gle/geospatialcreator. We’re excited to see what you create when the world is your canvas, playground, gallery, or more!

Build transformative augmented reality experiences with new ARCore and geospatial features

Posted by Eric Lai, Group Product Manager

With ARCore, Google’s platform for building augmented reality experiences, we continue to enhance the ways we interact with information and experience the people and things around us. ARCore is now available on 1.4 billion Android devices and select features are also available on compatible iOS devices, making it the largest cross-device augmented reality platform.

Last year, we launched the ARCore Geospatial API, which leverages our understanding of the world through Google Maps and helps developers build AR experiences that are more immersive, richer, and more useful. We further engaged with all of you through global hackathons, such as the ARCore Geospatial API Challenge, where we saw many high-quality submissions across a range of use cases, including gaming, local discovery, and navigation.

Today, we are introducing new ARCore Geospatial capabilities, including Streetscape Geometry API, Geospatial Depth API, and Scene Semantics API to help you build transformative, world-scale immersive experiences.


Introducing Streetscape Geometry API

With the new Streetscape Geometry API, you can interact with, visualize, and transform building geometry around the user. By providing a 3D mesh within a 100m radius of the user’s mobile device location, the Streetscape Geometry API makes it easy for developers to build experiences that interact with real-world geometry, like reskinning buildings, powering more accurate occlusion, or simply placing a virtual asset on a building.

moving image showing streetscape geometry
Streetscape Geometry API provides a 3D mesh of nearby buildings and terrain geometry

You can use this API to build immersive experiences like transforming building geometry into live plants growing on top of them or using the building geometry as a feature in your game by having virtual balls bounce off and interact with them.

Streetscape Geometry API is available on Android and iOS.


Introducing Rooftop Anchors and Geospatial Depth

Previously, we launched Geospatial anchors, which allow developers to place stable geometry at exact locations using latitude, longitude, and altitude. Over the past year, we added Terrain anchors, which are placed on the Earth's terrain using only latitude and longitude coordinates, with the altitude calculated automatically.

Today we are introducing a new type of anchor: Rooftop anchors. Rooftop anchors let you anchor digital content securely to building rooftops, respecting the building geometry and the height of buildings.

moving image showing rooftop anchors
Rooftop anchors make it easier to anchor digital content to building rooftops
moving image showing geospatial depth
Geospatial Depth combines real-time depth measurement from the user's device with Streetscape Geometry data to generate a depth map of up to 65 meters

In addition to new anchoring features, we are also leveraging the Streetscape Geometry API to improve one of the most important capabilities in AR: Depth. Depth is critical to enable more realistic occlusion or collision of virtual objects in the real world.

Today, we are launching Geospatial Depth. It combines the mobile device’s real-time depth measurement with Streetscape Geometry data, using building and terrain data to improve depth measurements and provide depth for up to 65 meters. With Geospatial Depth you can build increasingly realistic geospatial experiences in the real world.

Rooftop Anchors are available on Android and iOS. Geospatial Depth is available on Android.


Introducing Scene Semantics API

The Scene Semantics API uses AI to provide a class label to every pixel in an outdoor scene, so you can create custom AR experiences based on the features in an area around your user. At launch, twelve class labels are available, including sky, building, tree, road, sidewalk, vehicle, person, water and more.

moving image showing scene semantics
Scene Semantics API uses AI to provide accurate labels for different features that are present in a scene outdoors

You can use the Scene Semantics API to enable different experiences in your app. For example, you can identify specific scene components: roads and sidewalks to help guide a user through the city, people and vehicles to render realistic occlusions, the sky to create a sunset at any time of the day, and buildings to modify their appearance and anchor virtual objects.

The Scene Semantics API is available on Android.


Mega Golf: The game that brings augmented mini-golf to your neighborhood

To help you get started, we’re also releasing Mega Golf, an open source demo that lets you experience the new APIs in action. In Mega Golf, you use buildings in your city to bounce and propel a golf ball toward a hole while avoiding virtual 3D obstacles. The demo is available on GitHub. We're excited to see what you can do with this project.

moving image showing Mega Golf gameplay
Mega Golf uses Streetscape Geometry API to transform neighborhoods into a playable mini golf course where players use nearby buildings to bounce and propel a golf ball towards a hole

With these new ARCore feature improvements and the new Geospatial Creator in Adobe Aero and Unity, we’re making it easier than ever for developers and creators to build realistic augmented reality experiences that delight and provide utility for users. Get started today at g.co/ARCore. We’re excited to see what you create when the world is your canvas, playground, gallery, or more!

How It’s Made: I/O FLIP adds a twist to a classic card game with generative AI

Posted by Jay Chang, Product Marketing Manager for Flutter & Dart and Glenn Cameron, Product Marketing Manager for Core ML

I/O FLIP is an AI-designed take on a classic card game, powered by Google and created to inspire developers to experiment with what is possible with Google’s new generative AI technologies. Thousands of custom character images were pre-generated with DreamBooth on Muse, and their descriptions were written using the PaLM API. The game’s UI and backend were built in Flutter and Dart, a suite of Firebase tools was used for hosting and sharing, and Cloud Run was used to help scale.

When a user plays I/O FLIP, they:

  1. Select a character class and a power to generate a pack of 12 cards
  2. Select three cards from the pack to create their team
  3. Join a match and win a best-of-3
  4. Win multiple matches in a row to create a streak of wins for a chance to make the leaderboard
  5. Share their deck with players from all over the globe
Four phones side by side showing the I/O FLIP game, including drop-downs to select classes and powers for cards and various card battles.

Let’s dig into how we built the game.

Flutter and Dart: User Interface, Hologram effects, and backend

I/O FLIP’s game logic and UI are built on a foundation provided by the Flutter Casual Games Toolkit, including audio functionality and app navigation via the go_router package. Since FLIP is a web app, it was important that it was responsive, resizing to the user’s screen size, and that it took input from a variety of devices: mobile, tablet, and desktop.

Much of the logic in FLIP is based on the game cards, so they’re a good place to start. Each card consists of an image of one of four Google mascots (Dash, Sparky, Dino, and Android) and a description, both of which are inspired by the class and power the user selects at the beginning of the game. Cards are also randomly assigned an elemental power (air, water, fire, metal, earth) and a number between 10 and 100 indicating the card’s strength. Elemental powers can impact each other in match play, as shown in the image below.

Five phones side by side showing the I/O FLIP game, including screens including illustrations of the elemental powers and their effects on each other

Elemental powers aren’t just for show. Cards receive a 10 point penalty if they are on the wrong end of an element matchup, as explained in the images above.

Speaking of matchups, each match is a best-of-3. The winner continues playing with their chosen hand to start (or continue) their streak, while the loser can share their hand or pick a new hand to try again.

New Flutter and Dart features helped us quickly bring this to life. For instance, records, a Dart feature announced at Flutter Forward, helped us render a frame based on the card element, and Flutter’s official support for fragment shaders on the web helped us create a special hologram effect on some cards, which are the only cards in the game that have 100 points.

A screen recording from I/O FLIP, showing a character card with the shader effect applied

Dreambooth on Muse and PaLM API: AI-generated images and descriptions


Four cards side by side from the I/O FLIP game, including screens

Each card in I/O FLIP is unique because it contains an AI-generated image and description.

Images were pre-generated using two technologies pioneered out of Google Research: Muse, a text-to-image AI model from the Imagen family of models, and DreamBooth, a technique running on top of Muse that allows you to personalize text-to-image models to generate novel images of a specific subject using a small set of your own images for training.

Card descriptions were prototyped in MakerSuite and pre-generated using the PaLM API, which accesses Google’s large language models. Based on the power a player selects at the beginning of the game, you may get a card description that provides context for the image, including the character’s special powers, such as: “Dash the Wizard lives in a castle with his pet dragon. He loves to cast spells and make people laugh.” Join the PaLM API and MakerSuite waitlist here.
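
As a rough illustration of this kind of offline pre-generation, here is a hedged Python sketch using the PaLM API SDK. The model name and prompt wording are assumptions for illustration, not the exact ones used for I/O FLIP:

import google.generativeai as palm

palm.configure(api_key="YOUR_API_KEY")

def card_description(character: str, power: str) -> str:
    # A moderate temperature keeps descriptions short and on-theme
    # while still allowing some variety across cards.
    completion = palm.generate_text(
        model="models/text-bison-001",
        prompt=(f"Write a playful two-sentence trading card description for "
                f"{character}, whose special power is {power}."),
        temperature=0.7,
    )
    return completion.result

print(card_description("Dash the Wizard", "casting spells"))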

Flutter composes each card from a name, description, image, and power using the GameCard widget. Once the card is created, a border indicating its element is applied. If you’re lucky enough to land a hologram card, a special foil shader effect is applied to the design.


Firebase: game hosting, sharing, and real-time game play

Cloud Storage for Firebase stores all of the images, descriptions, elements, and numbers that make up players’ card decks. Firestore keeps track of the “Highest Win Streak” leaderboard, with new leaders added using the firedart package.

In all cases where the Flutter app directly accesses Firestore, we've used App Check to ensure that only the code that we wrote ourselves is allowed, and we used Firebase security rules to ensure the code can only access data and make changes that it is authorized to.


Dart Frog: sharing code between the backend and frontend

I/O FLIP needed more ways to prevent cheating, and this is where Dart Frog came in handy. It allowed us to keep game logic, such as determining the winner of each round, on the backend, while sharing that code between the Flutter frontend and the Dart Frog backend. This not only helped with cheating prevention, but also let the team move a little faster, since we were writing our backend and frontend code in the same language.

I/O FLIP is most fun when many players are online and playing. By deploying the I/O FLIP Dart Frog server to Cloud Run, the game can take advantage of features like autoscaling, which allows it to handle many players at once.

Finally, Dart Frog also enables downloading or sharing cards on social media. At the end of a round, a player can choose to download or share to Twitter or Facebook. When a user clicks the share button, Dart Frog generates a pre-populated post that contains text to share and a link to a webpage with the corresponding hand or card and a button for visitors to play the game too!


Try it yourself

We hope you’ve had a chance to try I/O FLIP and that it inspires you to think about ways generative AI can be used in your products, safely and responsibly. We’ve open sourced the code for I/O FLIP so you can take a deeper look at how we built it too. If you’d like to try your hand at some of the generative AI technologies used in I/O FLIP, tune in to Google I/O to learn more.

Introducing the new Google Pay button view on Android

Posted by Jose Ugia – Developer Relations Engineer

The Google Pay button is your customer’s entry point to a swift and familiar checkout with Google Pay. Here are some updates we are bringing to the button to make it easier to use and customize to match your checkout design, while creating a more consistent and informative experience that helps your customers fly through your checkout flows.


A new look and enhanced customization capabilities

Previously, you told us about the importance of a consistent checkout experience for your business. We gave the Google Pay button a fresh new look, applying the latest Material 3 design principles. The new Google Pay button comes in two versions that look great in dark- and light-themed applications. We have also added a subtle stroke that makes the button stand out in low-contrast interfaces. In addition, we are introducing new customization capabilities that make it easy to adjust the button’s shape and corner roundness, helping you create more consistent checkout experiences.

image of the updated Pay with Google Pay button annotating the subtle stroke and adjustable rounded corners
Figure 1: The new Google Pay button view for Android can be customized to make it more consistent with your checkout experience.

A new button view for Android

We also improved the Google Pay API for Android, introducing a new button view that simplifies the integration. Now you can configure the Google Pay button in a more familiar way and add it directly to your XML layout; you can also configure the button programmatically if you prefer. The view lets you configure properties like the button theme and corner radius, and includes graphics and translations so you don’t have to download and configure them manually. Simply add the view to your interface, adjust it, and that’s it.

The new button API is available today in beta. Check out the updated tutorial for Android to start using the new button view.

image of an example implementation showing how you can add and configure the new button view directly to your XML layout on Android
Figure 2: An example implementation of how you can add and configure the new button view directly to your XML layout on Android.

Giving additional context to customers with cards’ last 4 digits

Earlier, we introduced the dynamic Google Pay button for Web, which previews the card network and the last four digits of the customer’s last-used form of payment directly on the button. Since then, we have seen great results, with this additional context driving conversion improvements and increases of up to 40% in the number of transactions.

Later this quarter, you’ll be able to configure the new button view for Android to show your users additional information about the last card they used to complete a payment with Google Pay. We are also working to offer the Google Pay button as a UI element in Jetpack Compose, to help you build your UIs more rapidly using intuitive Kotlin APIs.

image showing an example of how the dynamic version of the new Google Pay button view will look on Android.
Figure 3: An example of how the dynamic version of the new Google Pay button view will look on Android.

Next steps

Check out the documentation to start using the new Google Pay button view on Android. Also, check out our own button integration in the sample open source application on GitHub.