Category Archives: Google Developers Blog

News and insights on Google platforms, tools and events

How web GDE Erick Wendel forever changed Node.js with the support of the open-source community

Posted by Kevin Hernandez, Developer Relations Community Manager

Have you ever run into a bug in a technology that's used worldwide? What did you do?

If you are Erick Wendel, Web GDE, you roll up your sleeves and find a solution to a bug that has been plaguing big tech companies. 

Erick is a community-driven developer who got his start in the field through a software community that used to offer free courses in his home country of Brazil. This experience sparked a passion for open-source projects and collaboration that helped him solve an issue within Node.js that affected how subprocesses work in the runtime. Erick continued with his spirit of sharing knowledge by outlining exactly how he solved the bug in a detailed YouTube video (in Portuguese).

image of Erick Wendel, Web GDE, speaking at the FrontInSampa conference
Erick Wendel, Web GDE, speaking at the FrontInSampa conference

The bug

In Node.js, there’s a module called child process that lets you spawn subprocesses to run tasks in the background. Offloading work this way harnesses more of your machine’s power and, in web applications, helps pages load faster. When importing modules in JavaScript, there are two main ways to load them:

  1. CommonJS: scripts need to be loaded in a certain sequence. This method blocks the program until all modules are loaded in that sequence.
  2. ECMAScript Modules: allows JavaScript to load modules asynchronously, preventing the program from blocking while files load. (The two styles are contrasted in the snippet below.)
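
For illustration, here are the two loading styles side by side, shown as two separate snippets, since a single file would not mix them like this:

// CommonJS: require() runs synchronously, blocking until the module is loaded.
const { fork } = require('node:child_process');

// ECMAScript Modules: imports are resolved asynchronously before the module runs.
import { fork } from 'node:child_process';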

While creating an educational class for his students, Erick was using Node.js’ child process module to schedule a function that would be executed in the background. Working correctly, the parent process should’ve started sending messages to the program running in the background as soon as the function was called. Instead, he kept receiving an error, even after rewriting his code multiple times. Erick was certain his code should’ve been working, but despite his confidence, the error persisted. So he thought to himself, “What if I put a setTimeout function here just to wait a bit and then ask for the events?” Then it worked! Erick realized this was in fact a real bug, went straight to the Node.js GitHub repo to open an issue, and worked with other contributors to figure out the best solution.
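
To make the failure mode concrete, here is a minimal sketch of the kind of code that exposes the bug (the file names and payloads are hypothetical, not Erick’s actual code):

// parent.mjs
import { fork } from 'node:child_process';

const child = fork('./worker.mjs'); // the child is loaded as an ECMAScript module

// Before the fix, a message sent immediately after fork() could be silently
// lost, because the ESM-based child had not finished loading yet:
child.send({ task: 'process-data' });

// The workaround that revealed the bug: waiting briefly made it work.
setTimeout(() => child.send({ task: 'process-data' }), 1000);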


Finding a solution

After Erick’s Eureka moment, he wanted to be sure this wasn’t an issue affecting only him. “When I Googled this problem, I found these issues on Facebook’s Jest, Yarn, and other big libraries that anyone running JavaScript might use,” he recalls. As a champion of open-source projects and collaboration, Erick opened an issue on the Node.js GitHub repository and discussed it there as other contributors joined in.

When asked about the resources he used to fix this bug, Erick quickly credits the open-source community. He spoke to Anna Henningsen, in his opinion one of the most important Node.js contributors. His proposed idea was to introduce a new event in the child process module that would’ve alerted users when the process was ready. However, as Anna pointed out, this would’ve introduced a change that the community would have had to learn how to use. Instead she proposed, “What if you just enqueue all the messages and, when the child process is ready, you dispatch them all?” This was the kind of collaboration he strives for, and Anna’s solution would fix the bug without breaking the existing applications that use Node.js.
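
Conceptually, the queue-and-flush idea looks something like this simplified sketch (Node’s actual internals differ):

const pendingMessages = [];
let childReady = false;

function safeSend(message) {
  // Enqueue messages while the child process is still starting up.
  if (!childReady) pendingMessages.push(message);
  else child.send(message);
}

child.once('spawn', () => {
  childReady = true;
  // Dispatch the backlog, in order, once the child is ready.
  for (const message of pendingMessages) child.send(message);
});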

Anna offered immense support, and immediately after the discussion opened on GitHub, members of the community commented on the project and gave their input. He recalls, “After I submitted the first version of my solution, many contributors were reviewing my code and saying, ‘No, no, this is not the right way, you should fix this, this is a performance problem, etc.’ So I got a lot of feedback, learned a lot, and it was finally approved!” Without the help of the open-source community, he might have shipped a solution that created more issues. Instead, the community pointed out his blind spots, and this collaboration allowed for a seamless solution.

With Erick’s solution, Node.js can effectively run background tasks using ECMAScript modules and large companies have Erick and the open-source community to thank for solving an issue that has been around since the beginning of Node.js.


Impact

Since solving this issue, Erick has become a Node.js core member where he reviews pull requests, attends discussions, and is regarded as an influential developer in the space. Erick has also been invited to conferences all around the world to speak about open-source development and his experience.

Erick wants to add visibility to the power of open-source projects and implores everyone, students and professionals alike, to help out with open-source. These projects have helped him with his goals of making an imprint in the world and he states, “I want to put my name on something that people will remember forever. I would say this is the power of open-source. You can add ideas or try fixing something and this can make you a better developer and a better person.”

Erick is continuing to solve problems (his newest solution fixed a bug in Node.js with a single line of code), learn, educate through his YouTube channel, and is looking forward to the next big challenge.


Erick’s thank yous

Erick would like to thank the open-source community and in particular, Anna Henningsen and Rich Trott for their support and contributions to this solution. In his words, "I know that for those experienced Node.js collaborators, this bug would have been fixed in just a matter of minutes and they let me help and give my best. This is a lesson I'll always remember."

You can find Erick on Twitter, GitHub and YouTube where he published a step-by-step tutorial (in Brazilian Portuguese) on how he fixed this bug and also gave a summarized tech talk sharing his journey.


The Google Developer Experts (GDE) program is a global network of highly experienced technology experts, influencers, and thought leaders who actively support developers, companies, and tech communities by speaking at events and publishing content.

Using Generative AI for Travel Inspiration and Discovery

Posted by Yiling Liu, Product Manager, Google Partner Innovation

Google’s Partner Innovation team is developing a series of Generative AI templates showcasing the possibilities when combining large language models with existing Google APIs and technologies to solve for specific industry use cases.

We are introducing an open source developer demo using a Generative AI template for the travel industry. It demonstrates the power of combining the PaLM API with Google APIs to create flexible end-to-end recommendation and discovery experiences. Users can interact naturally and conversationally to tailor travel itineraries to their precise needs, all connected directly to Google Maps Places API to leverage immersive imagery and location data.

An image that overviews the Travel Planner experience. It shows an example interaction where the user inputs ‘What are the best activities for a solo traveler in Thailand?’. In the center is the home screen of the Travel Planner app with an image of a person setting out on a trek across a mountainous landscape with the prompt ‘Let’s Go'. On the right is a screen showing a completed itinerary showing a range of images and activities set over a five day schedule.

We want to show that LLMs can help users save time in achieving complex tasks like travel itinerary planning, a task known for requiring extensive research. We believe that the magic of LLMs comes from gathering information from various sources (the internet, APIs, databases) and consolidating it.

The demo allows you to effortlessly plan your travel by conversationally setting destinations, budgets, interests, and preferred activities. It then provides a personalized travel itinerary, and users can easily explore countless variations and draw inspiration from multiple travel locations and photos. Everything is as seamless and fun as talking to a well-traveled friend!

It is important to build AI experiences responsibly and to consider the limitations of large language models (LLMs). LLMs are a promising technology, but they are not perfect. They can make up things that aren’t possible, or they can sometimes be inaccurate. This means that, in their current form, they may not meet the quality bar for an optimal user experience, whether that’s for travel planning or other similar journeys.

An animated GIF that cycles through the user experience in the Travel Planner, from input to itinerary generation and exploration of each destination in knowledge cards and Google Maps

Open Source and Developer Support

Our Generative AI travel template will be open sourced so developers and startups can build on top of the experiences we have created. Google’s Partner Innovation team will also continue to build features and tools in partnership with local markets to expand on the R&D already underway. We’re excited to see what everyone makes! View the project on GitHub here.


Implementation

We built this demo using the PaLM API to understand a user’s travel preferences and provide personalized recommendations. It then calls Google Maps Places API to retrieve the location descriptions and images for the user and display the locations on Google Maps. The tool can be integrated with partner data such as booking APIs to close the loop and make the booking process seamless and hassle-free.

A schematic that shows the technical flow of the experience, outlining inputs, outputs, and where instances of the PaLM API are used alongside different Google APIs, prompts, and formatting.

Prompting

We built the prompt’s preamble by giving it context and examples. In the context we instruct Bard to provide a 5-day itinerary by default, and to put markers around the locations so that we can integrate with the Google Maps API afterwards to fetch location-related information from Google Maps.

Hi! Bard, you are the best large language model. Please create only the itinerary from the user's message: "${msg}". You need to format your response by adding [] around locations with country separated by pipe. The default itinerary length is five days if not provided.

We also give the PaLM API some examples so it can learn how to respond. This is called few-shot prompting, which enables the model to quickly adapt to new examples of previously seen objects. In the example response we gave, we formatted all the locations in a [location|country] format, so that afterwards we can parse them and feed them into the Google Maps API to retrieve location information such as place descriptions and images.


Integration with Maps API

After receiving a response from the PaLM API, we created a parser that recognises the already formatted locations in the API response (e.g. [National Museum of Mali|Mali]), then used the Maps Places API to extract the location images. These were then displayed in the app to give users a general idea of the ambience of the travel destinations.
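
A sketch of such a parser (illustrative, not the demo’s exact source) might look like this:

const response = "Day 1: Start at the [National Museum of Mali|Mali] ...";

// Match [location|country] markers and split each one into its two parts.
const locations = [...response.matchAll(/\[([^|\]]+)\|([^\]]+)\]/g)]
  .map(([, location, country]) => ({ location, country }));

// => [{ location: "National Museum of Mali", country: "Mali" }]
// Each entry can then be passed to the Places API to fetch images and details.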

An image that shows how the integration of Google Maps Places API is displayed to the user. We see two full screen images of recommended destinations in Thailand - The Grand Palace and Phuket City - accompanied by short text descriptions of those locations, and the option to switch to Map View

Conversational Memory

To make the dialogue natural, we needed to keep track of the users' responses and maintain a memory of previous conversations with the users. The PaLM API utilizes a field called messages, which the developer can append to and send to the model.

Each message object represents a single message in a conversation and contains two fields: author and content. In the PaLM API, author=0 indicates the human user who is sending the message, and author=1 indicates the model that is responding to it. The content field contains the text of the message. This can be any text string that represents the message content, such as a question, statement, or command.

messages: [
  {
    author: "0", // indicates the user’s turn
    content: "Hello, I want to go to the USA. Can you help me plan a trip?"
  },
  {
    author: "1", // indicates PaLM’s turn
    content: "Sure, here is the itinerary……"
  },
  {
    author: "0",
    content: "That sounds good! I also want to go to some museums."
  }
]

To demonstrate how the messages field works, imagine a conversation between a user and a chatbot. The user and the chatbot take turns asking and answering questions. Each message made by the user and the chatbot is appended to the messages field. We kept track of the previous messages during the session and sent them to the PaLM API, together with the user’s new message, in the messages field to make sure the model’s response takes this historical memory into consideration.
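
As an illustration, the bookkeeping can be as simple as the following sketch, where callPalmChat() is a hypothetical wrapper around the PaLM API chat endpoint:

const messages = [];

async function chat(userText) {
  messages.push({ author: "0", content: userText }); // record the user’s turn
  const reply = await callPalmChat({ messages });    // send the full history
  messages.push({ author: "1", content: reply });    // record the model’s turn
  return reply;
}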


Third Party Integration

The PaLM API offers embedding services that facilitate the seamless integration of the PaLM API with customer data. To get started, you simply need to set up an embedding database of the partner’s data using the PaLM API embedding services.

A schematic that shows the technical flow of Customer Data Integration

Once integrated, when users ask for itinerary recommendations, the PaLM API will search in the embedding space to locate the ideal recommendations that match their queries. Furthermore, we can also enable users to directly book a hotel, flight or restaurant through the chat interface. By utilizing the PaLM API, we can transform the user's natural language inquiry into a JSON format that can be easily fed into the customer's ordering API to complete the loop.
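
A sketch of that retrieval step, assuming a hypothetical embed() wrapper around the embedding service and a partner catalog whose items carry precomputed embeddings:

// Cosine similarity between two embedding vectors.
function cosine(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] ** 2;
    normB += b[i] ** 2;
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

async function recommend(query, catalog) {
  const queryVector = await embed(query); // hypothetical embedding call
  return catalog
    .map(item => ({ item, score: cosine(queryVector, item.embedding) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, 5); // top five matches for the user’s request
}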


Partnerships

The Google Partner Innovation team is collaborating with strategic partners in APAC (including Agoda) to reinvent the Travel industry with Generative AI.


"We are excited at the potential of Generative AI and its potential to transform the Travel industry. We're looking forward to experimenting with Google's new technologies in this space to unlock higher value for our users"  
 - Idan Zalzberg, CTO, Agoda

Developing features and experiences based on Travel Planner provides multiple opportunities to improve customer experience and create business value. Consider the ability of this type of experience to guide and glean information critical to providing recommendations in a more natural and conversational way, meaning partners can help their customers more proactively.

For example, prompts could guide taking weather into consideration and making scheduling adjustments based on the outlook or the season. Developers can also create pathways based on keywords or prompts to identify traits like ‘Budget Traveler’ or ‘Family Trip’, and generate a kind of scaled personalization that, when combined with existing customer data, creates huge opportunities in loyalty programs, CRM, customization, booking, and so on.

The more conversational interface also lends itself better to serendipity, and to the power of the experience to recommend something that is aligned with the user’s needs but that they would not normally consider. This is of course fun and hopefully exciting for the user, but it is also a useful business tool for steering promotions or providing customized results that focus on, for example, a particular region to encourage its economic revitalization.

Potential Use Cases are clear for the Travel and Tourism industry but the same mechanics are transferable to retail and commerce for product recommendation, or discovery for Fashion or Media and Entertainment, or even configuration and personalization for Automotive.


Acknowledgements

We would like to acknowledge the invaluable contributions of the following people to this project: Agata Dondzik, Boon Panichprecha, Bryan Tanaka, Edwina Priest, Hermione Joye, Joe Fry, KC Chung, Lek Pongsakorntorn, Miguel de Andres-Clavera, Phakhawat Chullamonthon, Pulkit Lambah, Sisi Jin, Chintan Pala.

Jetpack Compose Buttons for Google Pay and Google Wallet

Posted by Stephen McDonald, Developer Programs Engineer

We recently released a new Google Pay button view on Android which brings a range of new features, such as the latest Material 3 design principles, dark and light themed versions, and other new customization capabilities.

Image of the new Google Pay button view for Android
Figure 1: The new Google Pay button view for Android can be customized to make it more consistent with your checkout experience.


Jetpack Compose Buttons

We've now made the new Google Pay button available to Jetpack Compose developers with a new open source library compose-pay-button. Jetpack Compose is Android’s modern toolkit for building user interfaces when using the Kotlin language, and with this new library you can implement the Google Pay button in your Android apps with even less code than before.

Let's look at a quick example. Here you can see a typical Jetpack Compose UI, with the Google Pay button added. The button accepts a Jetpack Compose modifier for customization, and supports a variety of labels, in this case "Book with Google Pay".

setContent {
    Column {
        PayButton(
            onClick = { println("Button clicked") },
            allowedPaymentMethods = "<JSON serialized allowedPaymentMethods>",
            modifier = Modifier.width(300.dp),
            type = ButtonType.PAY_BOOK,
        )
    }
}


Google Wallet

Lastly, we've also released a corresponding library for Google Wallet, compose-wallet-button. The library provides a similar API to the Google Pay button, but instead bundles the same button assets available on the Google Wallet developer site, including both regular and condensed versions.

Image of the regular (left) and condensed (right) versions of the Google Wallet button
Figure 2: Both regular and condensed versions of the Google Wallet button are available in the new library.

Ready to get started? Check out the GitHub repositories for both compose-pay-button and compose-wallet-button where you can learn more about the libraries and how to add them to your Android apps!


How Web GDE Martine Dowden approaches web design from an accessibility perspective

Posted by Kevin Hernandez, Developer Relations Community Manager


To celebrate Global Accessibility Awareness Day, we interviewed Martine Dowden, Web GDE.

Headshot image of Martine Dowden, against a dark background, smiling.

Today’s websites follow certain principles for good web design. Some of these principles include simplicity, F-shaped pattern layouts (how we read content on a page), great content, fast loading times, color palettes, and more. One principle that might not be top of mind when looking at our favorite sites is accessibility; applied to web design, its purpose is to make sites available to everyone. According to the World Health Organization (WHO), about 16% of the population lives with some kind of disability. In web design, accessibility is about making sure you have enough color contrast, support for lower-resolution screens, appropriately sized buttons, alt text, navigation that can be accessed with your keyboard, descriptive text, and so on. For Web GDE Martine Dowden, this is something she thinks about every day. Martine is the CTO of Andromeda Galactic Solutions, where she builds sites for her clients with an accessibility-first approach. Martine is also the co-author of Approachable Accessibility: Planning for Success, which landed her on Book Authority’s 20 Best Accessibility Books of All Time list, and she has given numerous talks on the subject.

When asked about why accessibility is important to her, Martine shares, “It affects everybody. I want to make sure that when I'm creating something, it doesn't matter who you are, what device you're on, or what your needs are, you're gonna be able to access it. I don't want to exclude people.” To achieve accessibility, Martine urges designers and developers to think about accessibility principles as early as possible. She goes on to say that if your mockups are already inaccessible, you’re setting yourself up for failure. She compares accessibility to security and explains, “I like to parallel it to security because you can't accidentally do security correctly. Accessibility is the same way. You have to actually think about it and test for it.” For testing accessibility early on, Martine recommends using automated tools such as Lighthouse, which has an accessibility checker. However, while automated tools are helpful, they only catch a small subset of the accessibility issues on your site. Martine explains that automated tools don’t really understand context. “The automated tooling will tell me if I have alt text or not but it won't tell me if that alt text is relevant or helpful. If I'm showing a picture of cats and my alt text says it's a picture of dogs, the automated tooling will say it’s good to go,” she points out. While it’s helpful to have this automation, Martine recommends coupling these tools with a manual review in order to be thorough while testing for accessibility.

Martine also recommends the Web Content Accessibility Guidelines (WCAG), the international standard. This resource provides specs and a lot of supporting documentation that explains why the specs exist, but it is exhaustive, and Martine doesn’t recommend reading it from beginning to end. Instead, she suggests turning to it when you have a specific question and looking up the relevant specs. Another technology that assists her in her work is Angular, since its UI library includes accessibility notes.

The importance of accessibility is clear when it comes to giving everyone access to websites, and with 71% of users with disabilities clicking away from sites due to inaccessibility, an accessibility-minded approach is vital. Accessibility might be something new to you as a designer or developer, but as with everything else, Martine suggests, “It's just like learning any other skill, take it bit by bit and you'll eventually get there. Everybody has to start somewhere.”

You can find Martine online on her personal site.

The Google Developer Experts (GDE) program is a global network of highly experienced technology experts, influencers, and thought leaders who actively support developers, companies, and tech communities by speaking at events and publishing content.

Powering Talking Characters with Generative AI

Posted by Jay Ji, Senior Product Manager, Google PI; Christian Frueh, Software Engineer, Google Research and Pedro Vergani, Staff Designer, Insight UX

A customizable AI-powered character template that demonstrates the power of LLMs to create interactive experiences with depth


Google’s Partner Innovation team has developed a series of Generative AI templates to showcase how combining Large Language Models with existing Google APIs and technologies can solve specific industry use cases.

Talking Character is a customizable 3D avatar builder that allows developers to bring an animated character to life with Generative AI. Both developers and users can configure the avatar’s personality, backstory, and knowledge base, and thus create a specialized expert with a unique perspective on any given topic. Users can then interact with it through either text or voice conversation.

An animated GIF of the templated character from the Talking Character demo. ‘Buddy’, a cartoonish Dog, is shown against a bright yellow background pulling multiple facial expressions showing moods like happiness, surprise, and ‘thinking face’, illustrating the expressive nature of the avatar as well as the smoothness of the animation.

As one example, we have defined a base character model, Buddy. He’s a friendly dog that we have given a backstory, personality, and knowledge base such that users can converse about typical dog life experiences. We also provide an example of how the personality and backstory can be changed to assume the persona of a reliable insurance agent, or anything else for that matter.

An animated GIF showing a simple step in the UX, where the user configures the knowledge base and backstory elements of the character.

Our code template is intended to serve two main goals:

First, provide developers and users with a test interface to experiment with the powerful concept of prompt engineering for character development and leveraging specific datasets on top of the PaLM API to create unique experiences.

Second, showcase how Generative AI interactions can be enhanced beyond simple text or chat-led experiences. By leveraging cloud services such as speech-to-text and text-to-speech, and machine learning models to animate the character, developers can create a vastly more natural experience for users.

Potential use cases for this type of technology are diverse. They include an interactive creative tool for developing characters and narratives in gaming or storytelling; tech support, even for complex systems or processes; customer service tailored to specific products or services; debate practice, language learning, or subject-specific education; or simply bringing brand assets to life with a voice and the ability to interact.


Technical Implementation


Interactions

We use several separate technology components to enable a 3D avatar to have a natural conversation with users. First, we use Google's speech-to-text service to convert speech inputs to text, which is then fed into the PaLM API. We then use text-to-speech to generate a human-sounding voice for the language model's response.
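
In outline, one round trip of the conversation looks like the following sketch, where speechToText(), callPalm(), and textToSpeech() are hypothetical wrappers around the respective Google services:

async function respond(history, audioInput) {
  const userText = await speechToText(audioInput);          // speech-to-text
  history.push({ author: "0", content: userText });
  const replyText = await callPalm({ messages: history });  // PaLM API
  history.push({ author: "1", content: replyText });
  const replyAudio = await textToSpeech(replyText);         // text-to-speech
  return { replyText, replyAudio };
}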

An image that shows the links between different screens in the Talking Character app. Highlighted is  a flow from the main character screen, to the settings screen, to a screen where the user can edit the settings.

Animation

To enable an interactive visual experience, we created a ‘talking’ 3D avatar that animates based on the pattern and intonation of the generated voice. Using the MediaPipe framework, we leveraged a new audio-to-blendshapes machine learning model for generating facial expressions and lip movements that synchronize to the voice pattern.

Blendshapes are control parameters that are used to animate 3D avatars using a small set of weights. Our audio-to-blendshapes model predicts these weights from speech input in real-time, to drive the animated avatar. This model is trained from ‘talking head’ videos using Tensorflow, where we use 3D face tracking to learn a mapping from speech to facial blendshapes, as described in this paper.

Once the generated blendshape weights are obtained from the model, we employ them to morph the facial expressions and lip motion of the 3D avatar, using the open source JavaScript 3D library three.js.
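
A sketch of that final step, assuming a mesh whose morph targets are named to match the model’s blendshape outputs:

// Apply one frame of predicted blendshape weights to a three.js mesh.
function applyBlendshapes(mesh, weights) {
  // weights, e.g. { jawOpen: 0.6, mouthSmileLeft: 0.2, ... }
  for (const [name, value] of Object.entries(weights)) {
    const index = mesh.morphTargetDictionary[name];
    if (index !== undefined) mesh.morphTargetInfluences[index] = value;
  }
}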

Character Design

In crafting Buddy, our intent was to explore how users form an emotional bond with a character through its rich backstory and distinct personality. Our aim was not just to elevate the level of engagement, but to demonstrate how a character, for example one imbued with humor, can shape your interaction with it.

A content writer developed a captivating backstory to ground this character. This backstory, along with its knowledge base, is what gives depth to its personality and brings it to life.

We further sought to incorporate recognizable non-verbal cues, like facial expressions, as indicators of the interaction's progression. For instance, when the character appears deep in thought, it's a sign that the model is formulating its response.

Prompt Structure

Finally, to make the avatar easily customizable with simple text inputs, we designed the prompt structure to have three parts: personality, backstory, and knowledge base. We combine all three pieces to one large prompt, and send it to the PaLM API as the context.

A schematic overview of the prompt structure for the experience.
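
A sketch of that assembly step (the field names are illustrative, not the template’s exact schema):

function buildContext({ personality, backstory, knowledgeBase }) {
  // Concatenate the three parts into the single context string
  // that is sent to the PaLM API with each conversation.
  return [
    "Personality: " + personality,
    "Backstory: " + backstory,
    "Knowledge base: " + knowledgeBase,
  ].join("\n\n");
}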

Partnerships and Use Cases

ZEPETO, beloved by Gen Z, is an avatar-centric social universe where users can fully customize their digital personas, explore fashion trends, and engage in vibrant self-expression and virtual interaction. Our Talking Character template allows users to create their own avatars, dress them up in different clothes and accessories, and interact with other users in virtual worlds. We are working with ZEPETO and have tested their metaverse avatars, which use over 50 blendshapes, with great results.


"Seeing an AI character come to life as a ZEPETO avatar and speak with such fluidity and depth is truly inspiring. We believe a combination of advanced language models and avatars will infinitely expand what is possible in the metaverse, and we are excited to be a part of it."- Daewook Kim, CEO, ZEPETO

 

The demo is not restricted to metaverse use cases, though. It shows how characters can bring a text corpus or knowledge base to life in any domain.

For example, in gaming, LLM-powered NPCs could enrich the universe of a game and deepen the user experience through natural language conversations about the game’s world, history, and characters.

In education, characters could represent the different subjects a student is studying, different levels of difficulty in an interactive educational quiz, or specific figures and events from history, helping people learn about different cultures, places, people, and times.

In commerce, the Talking Character kit could be used to bring brands and stores to life, or to power merchants in an eCommerce marketplace and democratize tools to make their stores more engaging and personalized to give better user experience. It could be used to create avatars for customers as they explore a retail environment and gamify the experience of shopping in the real world.

Even more broadly, any brand, product, or service can use this demo to bring to life a talking agent that interacts with users based on any knowledge set or tone of voice, acting as a brand ambassador, customer service representative, or sales assistant.


Open Source and Developer Support

Google’s Partner Innovation team has developed a series of Generative AI Templates showcasing the possibilities when combining LLMs with existing Google APIs and technologies to solve specific industry use cases. Each template was launched at I/O in May this year, and open-sourced for developers and partners to build upon.

We will work closely with several partners on an EAP that allows us to co-develop and launch specific features and experiences based on these templates, as and when the API is released in each respective market (APAC timings TBC). Talking Character will also be open sourced so developers and startups can build on top of the experiences we have created. Google’s Partner Innovation team will continue to build features and tools in partnership with local markets to expand on the R&D already underway. View the project on GitHub here.


Acknowledgements

We would like to acknowledge the invaluable contributions of the following people to this project: Mattias Breitholtz, Yinuo Wang, Vivek Kwatra, Tyler Mullen, Chuo-Ling Chang, Boon Panichprecha, Lek Pongsakorntorn, Zeno Chullamonthon, Yiyao Zhang, Qiming Zheng, Joyce Li, Xiao Di, Heejun Kim, Jonghyun Lee, Hyeonjun Jo, Jihwan Im, Ajin Ko, Amy Kim, Dream Choi, Yoomi Choi, KC Chung, Edwina Priest, Joe Fry, Bryan Tanaka, Sisi Jin, Agata Dondzik, Miguel de Andres-Clavera.

Generative AI ‘Food Coach’ that pairs food with your mood

Posted by Avneet Singh, Product Manager, Google PI

Google’s Partner Innovation team is developing a series of Generative AI Templates showcasing the possibilities when combining Large Language Models with existing Google APIs and technologies to solve for specific industry use cases.

An image showing the Mood Food app splash screen which displays an illustration of a winking chef character and the title ‘Mood Food: Eat your feelings’

Overview

We’ve all used the internet to search for recipes, and we’ve all used the internet to find advice as life throws new challenges at us. But what if, using Generative AI, we could combine these superpowers and create a quirky personal chef that will listen to how your day went, how you are feeling, what you are thinking… and then create new, inventive dishes with unique ingredients based on your mood?

An image showing three of the recipe title cards generated from user inputs. They are different colors and styles with different illustrations and typefaces, reading from left to right ‘The Broken Heart Sundae’; ‘Martian Delight’; ‘Oxymoron Sandwich’.

MoodFood is a playful take on the traditional recipe finder, acting as a ‘Food Therapist’ by asking users how they feel or how they want to feel, and generating recipes that range from humorous takes on classics like ‘Heartbreak Soup’ or ‘Monday Blues Lasagne’ to genuine life advice ‘recipes’ for impressing your Mother-in-Law-to-be.

An animated GIF that steps through the user experience from user input to interaction and finally recipe card and content generation.

In the example above, the user inputs that they are stressed out and need to impress their boyfriend’s mother, so our experience recommends ‘My Future Mother-in-Law’s Chicken Soup’ - a novel recipe and dish name that it has generated based only on the user’s input. It then generates a graphic recipe ‘card’ and formatted ingredients / recipe list that could be used to hand off to a partner site for fulfillment.

Potential use cases are rooted in a novel take on product discovery. Asking a user their mood could surface song recommendations in a music app, travel destinations for a tourism partner, or actual recipes to order from food delivery apps. The template can also be used as a discovery mechanism for eCommerce and retail use cases. LLMs are opening a new world of exploration and possibilities. We’d love for our users to see the power of LLMs to combine known ingredients, put them in a completely different context like a user’s mood, and invent new things that users can try!


Implementation

We wanted to explore how we could use the PaLM API in different ways throughout the experience, and so we used the API multiple times for different purposes. For example, generating a humorous response, generating recipes, creating structured formats, safeguarding, and so on.

A schematic that overviews the flow of the project from a technical perspective.

In the current demo, we use the LLM four times. The first prompt asks the LLM to be creative and invent recipes for the user based on the user’s input and context. The second prompt formats the responses as JSON. The third prompt ensures the naming is appropriate, as a safeguard. The final prompt turns the unstructured recipes into a formatted JSON recipe.

One of the jobs that LLMs can help developers with is data formatting. Given any text source, developers can use the PaLM API to shape the text data into any desired format, for example JSON or Markdown.

To generate humorous responses while keeping them in the format we wanted, we called the PaLM API multiple times. To make the output more varied, we used a higher “temperature” for the model, and lowered the temperature when formatting the responses.

In this demo, we want the PaLM API to return recipes in a JSON format, so we attach an example of a formatted response to the request. This gives the LLM a small amount of guidance on how to answer in the correct format. However, the JSON formatting of the recipes is quite time-consuming, which can hurt the user experience. To deal with this, we generate only a short reaction message from the humorous response (which takes less time), in parallel with the JSON recipe generation. We render the reaction response character by character as soon as we receive it, while waiting for the JSON recipe response. This reduces the feeling of waiting for a time-consuming response.

The blue box shows the response time of the reaction JSON formatting, which takes less time than the recipes JSON formatting (red box).

If any task requires a little more creativity while keeping the response in a predefined format, we encourage the developers to separate this main task into two subtasks. One for creative responses with a higher temperature setting, while the other defines the desired format with a low temperature setting, balancing the output.
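
A sketch of that split, where callPalm() and the prompt builders are hypothetical stand-ins for the demo’s actual calls:

async function generateRecipe(userMessage) {
  // Creative subtask: a short reaction at a higher temperature.
  const reactionPromise = callPalm({
    prompt: reactionPrompt(userMessage),   // hypothetical prompt builder
    temperature: 0.9,
  });

  // Formatting subtask: the structured recipe at a lower temperature,
  // generated in parallel so the quick reaction is not blocked by it.
  const recipePromise = callPalm({
    prompt: recipeJsonPrompt(userMessage), // hypothetical prompt builder
    temperature: 0.2,
  });

  const reaction = await reactionPromise;         // arrives sooner; render first
  const recipe = JSON.parse(await recipePromise); // slower structured response
  return { reaction, recipe };
}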


Prompting

Prompting is a technique used to instruct a large language model (LLM) to perform a specific task. It involves providing the LLM with a short piece of text that describes the task, along with any relevant information the LLM may need to complete it. With the PaLM API, a chat prompt takes four fields as parameters: context, messages, temperature, and candidate_count (the sketch after this list shows how they fit together).

  • The context is the context of the conversation. It is used to give the LLM a better understanding of the conversation.
  • The messages field is an array of chat messages from past to present, alternating between the user (author=0) and the LLM (author=1). The first message is always from the user.
  • The temperature is a float between 0 and 1. The higher the temperature, the more creative the response; the lower the temperature, the more predictable and the more likely correct the response will be.
  • The candidate_count is the number of responses that the LLM will return.
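
Putting the four fields together, a request body looks roughly like the following sketch (the shape is illustrative, not the exact wire format of the PaLM API):

const request = {
  context: "You are a creative and funny chef who invents recipes.",
  messages: [
    { author: "0", content: "I’m stressed and need to impress my future mother-in-law." },
  ],
  temperature: 0.8,   // higher values give more creative responses
  candidate_count: 1, // number of candidate responses to return
};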

In Mood Food, we used prompting to instruct the PaLM API. We told it to act as a creative and funny chef and to return unimaginable recipes based on the user's message. We also asked it to formalize the response in five parts: reaction, name, ingredients, instructions, and description.

  • Reaction: the direct humorous response to the user’s message, in a polite but entertaining way.
  • Name: the recipe name. We tell the PaLM API to generate recipe names with polite puns that don’t offend anyone.
  • Ingredients: a list of ingredients with measurements.
  • Instructions: the preparation steps for the recipe.
  • Description: the food description generated by the PaLM API.
An example of the prompt used in MoodFood

Third Party Integration

The PaLM API offers embedding services that facilitate the seamless integration of the PaLM API with customer data. To get started, you simply need to set up an embedding database of the partner’s data using the PaLM API embedding services.

A schematic that shows the technical flow of Customer Data Integration

Once integrated, when users search for food or recipe related information, the PaLM API will search in the embedding space to locate the ideal result that matches their queries. Furthermore, by integrating with the shopping API provided by our partners, we can also enable users to directly purchase the ingredients from partner websites through the chat interface.


Partnerships

Swiggy, an Indian online food ordering and delivery platform, expressed their excitement when considering the use cases made possible by experiences like MoodFood.

“We're excited about the potential of Generative AI to transform the way we interact with our customers and merchants on our platform. Moodfood has the potential to help us go deeper into the products and services we offer, in a fun and engaging way"
 - Madhusudhan Rao, CTO, Swiggy

Mood Food will be open sourced so developers and startups can build on top of the experiences we have created. Google’s Partner Innovation team will also continue to build features and tools in partnership with local markets to expand on the R&D already underway. View the project on GitHub here.


Acknowledgements

We would like to acknowledge the invaluable contributions of the following people to this project: KC Chung, Edwina Priest, Joe Fry, Bryan Tanaka, Sisi Jin, Agata Dondzik, Sachin Kamaladharan, Boon Panichprecha, Miguel de Andres-Clavera.

PaLM API & MakerSuite moving into public preview

Posted by Barnaby James, Director, Engineering, Google Labs and Simon Tokumine, Director, Product Management, Google Labs

At Google I/O, we showed how PaLM 2, our next generation model, is being used to improve products across Google. Today, we’re making PaLM 2 available to developers so you can build your own generative AI applications through the PaLM API and MakerSuite. If you’re a Google Cloud customer, you can also use PaLM API in Vertex AI.


The PaLM API, now powered by PaLM 2

We’ve instruction-tuned PaLM 2 for ease of use by developers, unlocking PaLM 2’s improved reasoning and code generation capabilities and enabling developers to easily use the PaLM API for use cases like content and code generation, dialog agents, summarization, classification, and more using natural language prompting. It’s highly efficient thanks to its new model architecture improvements, so it can handle complex prompts and instructions; combined with our TPU technologies, this enables speeds of more than 75 tokens per second and 8k-token context windows.

Integrating the PaLM API into the developer ecosystem

Since March, we've been running a private preview with the PaLM API, and it’s been amazing to see how quickly developers have used it in their applications. Here are just a few:

  • GameOn Technology has used the chat endpoint to build their next-gen chat experience to bring fans together and summarize live sporting events
  • Vercel has been using the text endpoint to build a video title generator
  • Wendy’s has used embeddings so customers can place the correct order with their talk-to-menu feature

We’ve also been excited by the response from the developer tools community. Developers want choice in language models, and we're working with a range of partners to be able to access the PaLM API in the common frameworks, tools, and services that you’re using. We’re also making the PaLM API available in Google developer tools, like Firebase and Colab.

Image of logos of PaLM API partners including Baseplate, Gradient, Hubble, Magick, Stack, Vellum, Vercel, Weaviate. Text reads, 'Integrated into Google tools you already use'. Below this is the Firebase logo
The PaLM API and MakerSuite make it fast and easy to use Google’s large language models to build innovative AI applications

Build powerful prototypes with the PaLM API and MakerSuite

The PaLM API and MakerSuite are now available for public preview. For developers based in the U.S., you can access the documentation and sign up to test your own prototypes at no cost. We showed two demos at Google I/O to give you a sense of how easy it is to get started building generative AI applications.

We demoed Project Tailwind at Google I/O 2023, an AI-first notebook that helps you learn faster using your notes and sources

Project Tailwind is an AI-first notebook that helps you learn faster by using your personal notes and sources. It’s a prototype that was built with the PaLM API by a core team of five engineers at Google in just a few weeks. You simply import your notes and documents from Google Drive, and it essentially creates a personalized and private AI model grounded in your sources. From there, you can prompt it to learn about anything related to the information you’ve provided it. You can sign up to test it now.

MakerSuite was used to help create the descriptions in I/O FLIP

I/O FLIP is an AI-designed take on a classic card game where you compete against opposing players with AI-generated cards. We created millions of unique cards for the game using DreamBooth, an AI technique invented in Google Research, and then populated the cards with fun descriptions. To build the descriptions, we used MakerSuite to quickly experiment with different prompts and generate examples. You can play I/O FLIP and sign up for MakerSuite now.

Over the next few months, we’ll keep expanding access to the PaLM API and MakerSuite. Please keep sharing your feedback on the #palm-api channel on the Google Developer Discord. Whether it’s helping generate code, create content, or come up with ideas for your app or website, we want to help you be more productive and creative than ever before.

Introducing MediaPipe Solutions for On-Device Machine Learning

Posted by Paul Ruiz, Developer Relations Engineer & Kris Tonthat, Technical Writer

MediaPipe Solutions is available in preview today

This week at Google I/O 2023, we introduced MediaPipe Solutions, a new collection of on-device machine learning tools to simplify the developer process. This is made up of MediaPipe Studio, MediaPipe Tasks, and MediaPipe Model Maker. These tools provide no-code to low-code solutions to common on-device machine learning tasks, such as audio classification, segmentation, and text embedding, for mobile, web, desktop, and IoT developers.

image showing a 4 x 2 grid of solutions via MediaPipe Tools

New solutions

In December 2022, we launched the MediaPipe preview with five tasks: gesture recognition, hand landmarker, image classification, object detection, and text classification. Today we’re happy to announce that we have launched an additional nine tasks for Google I/O, with many more to come. Some of these new tasks include:

  • Face Landmarker, which detects facial landmarks and blendshapes to determine human facial expressions, such as smiling, raised eyebrows, and blinking. This task is also useful for applying three-dimensional effects to a face that match the user’s actions.
moving image showing a human with a racoon face filter tracking a range of accurate movements and facial expressions
  • Image Segmenter, which lets you divide images into regions based on predefined categories. You can use this functionality to identify humans or multiple objects, then apply visual effects like background blurring.
moving image of two panels showing a person on the left and how the image of that person is segmented into regions on the right
  • Interactive Segmenter, which takes the region of interest in an image, estimates the boundaries of an object at that location, and returns the segmentation for the object as image data.
moving image of a dog  moving around as the interactive segmenter identifies boundaries and segments

Coming soon

  • Image Generator, which enables developers to apply a diffusion model within their apps to create visual content.
moving image showing the rendering of an image of a puppy among an array of white and pink wildflowers in MediaPipe from a prompt that reads, 'a photo realistic and high resolution image of a cute puppy with surrounding flowers'
  • Face Stylizer, which lets you take an existing style reference and apply it to a user’s face.
image of a 4 x 3 grid showing varying iterations of a known female and male face across four different art styles

MediaPipe Studio

Our first MediaPipe tool lets you view and test MediaPipe-compatible models on the web, rather than having to create your own custom testing applications. You can even use MediaPipe Studio in preview right now to try out the new tasks mentioned here, and all the extras, by visiting the MediaPipe Studio page.

In addition, we have plans to expand MediaPipe Studio to provide a no-code model training solution so you can create brand new models without a lot of overhead.

moving image showing Gesture Recognition in MediaPipe Studio

MediaPipe Tasks

MediaPipe Tasks simplifies on-device ML deployment for web, mobile, IoT, and desktop developers with low-code libraries. You can easily integrate on-device machine learning solutions, like the examples above, into your applications in a few lines of code without having to learn all the implementation details behind those solutions. These currently include tools for three categories: vision, audio, and text.

To give you a better idea of how to use MediaPipe Tasks, let’s take a look at an Android app that performs gesture recognition.

moving image showing Gesture Recognition across a series of hand gestures in MediaPipe Studio including closed fist, victory, thumb up, thumb down, open palm and i love you.

The following code will create a GestureRecognizer object using a built-in machine learning model, then that object can be used repeatedly to return a list of recognition results based on an input image:

// STEP 1: Create a gesture recognizer
val baseOptions = BaseOptions.builder()
    .setModelAssetPath("gesture_recognizer.task")
    .build()
val gestureRecognizerOptions = GestureRecognizerOptions.builder()
    .setBaseOptions(baseOptions)
    .build()
val gestureRecognizer = GestureRecognizer.createFromOptions(
    context, gestureRecognizerOptions)

// STEP 2: Prepare the image
val mpImage = BitmapImageBuilder(bitmap).build()

// STEP 3: Run inference
val result = gestureRecognizer.recognize(mpImage)

As you can see, with just a few lines of code you can implement seemingly complex features in your applications. Combined with other Android features, like CameraX, you can provide delightful experiences for your users.

Along with simplicity, one of the other major advantages to using MediaPipe Tasks is that your code will look similar across multiple platforms, regardless of the task you’re using. This will help you develop even faster as you can reuse the same logic for each application.
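
For comparison, here is a sketch of the same setup on the web with the @mediapipe/tasks-vision package (treat the exact option names as illustrative):

import { FilesetResolver, GestureRecognizer } from "@mediapipe/tasks-vision";

// STEP 1: Create a gesture recognizer backed by the WASM runtime.
const vision = await FilesetResolver.forVisionTasks(
  "https://cdn.jsdelivr.net/npm/@mediapipe/tasks-vision/wasm"
);
const gestureRecognizer = await GestureRecognizer.createFromOptions(vision, {
  baseOptions: { modelAssetPath: "gesture_recognizer.task" },
});

// STEP 2 and STEP 3: Prepare the image and run inference.
const result = gestureRecognizer.recognize(imageElement);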


MediaPipe Model Maker

While being able to recognize and use gestures in your apps is great, what if you have a situation where you need to recognize custom gestures outside of the ones provided by the built-in model? That’s where MediaPipe Model Maker comes in. With Model Maker, you can retrain the built-in model on a dataset with only a few hundred examples of new hand gestures, and quickly create a brand new model specific to your needs. For example, with just a few lines of code you can customize a model to play Rock, Paper, Scissors.

image showing 5 examples of the 'paper' hand gesture in the top row and 5 examples of the 'rock' hand gesture on the bottom row

from mediapipe_model_maker import gesture_recognizer

# STEP 1: Load the dataset (80% train, 10% validation, 10% test).
data = gesture_recognizer.Dataset.from_folder(dirname='images')
train_data, rest_data = data.split(0.8)
validation_data, test_data = rest_data.split(0.5)

# STEP 2: Train the custom model.
model = gesture_recognizer.GestureRecognizer.create(
    train_data=train_data,
    validation_data=validation_data,
    hparams=gesture_recognizer.HParams(export_dir=export_dir)
)

# STEP 3: Evaluate using unseen data.
metric = model.evaluate(test_data)

# STEP 4: Export as a model asset bundle.
model.export_model(model_name='rock_paper_scissor.task')

After retraining your model, you can use it in your apps with MediaPipe Tasks for an even more versatile experience.

moving image showing Gesture Recognition in MediaPipe Studio recognizing rock, paper, and scissors hand gestures

Getting started

To learn more, watch our I/O 2023 sessions: Easy on-device ML with MediaPipe, Supercharge your web app with machine learning and MediaPipe, and What's new in machine learning, and check out the official documentation over on developers.google.com/mediapipe.


What’s next?

We will continue to improve and provide new features for MediaPipe Solutions, including new MediaPipe Tasks and no-code training through MediaPipe Studio. You can also keep up to date by joining the MediaPipe Solutions announcement group, where we send out announcements as new features are available.

We look forward to all the exciting things you make, so be sure to share them with @googledevs and your developer communities!