Tag Archives: Tutorial

How to effectively A/B test power consumption for your Android app’s features

Posted by Mayank Jain - Product Manager, and Yasser Dbeis - Software Engineer; Android Studio

Android developers have been telling us they're looking for tools to help optimize power consumption for different devices on Android.

The new Power Profiler in Android Studio helps Android developers by showing power consumption happening on devices as the app is being used. Understanding power consumption across Android devices can help Android developers identify and fix power consumption issues in their apps. They can run A/B tests to compare the power consumption of different algorithms, features or even different versions of their app.

The new Power Profiler in Android Studio
The new Power Profiler in Android Studio

Apps which are optimized for lower power consumption lead to an improved battery and thermal performance of the device, which means an improved user experience on Android.

This power consumption data is made available through the On Device Power Monitor (ODPM) on Pixel 6+ devices, segmented by each sub-system called “Power Rails”. See Profileable power rails for a list of supported sub-systems.

The Power Profiler can help app developers detect problems in several areas:

    • Detecting unoptimized code that is using more power than necessary.
    • Finding background tasks that are causing unnecessary CPU usage.
    • Identifying wakelocks that are keeping the device awake when they are not needed.

Once a power consumption issue has been identified, the Power Profiler can be used when testing different hypotheses to understand why the app could be consuming excessive power. For example, if the issue is caused by background tasks, the developer can try to stop the tasks from running unnecessarily or for longer periods. And if the issue is caused by wakelocks, the developer can try to release the wakelocks when the resource is not in use or use them more judiciously. Then compare the power consumption before/after the change using the Power Profiler.

In this blog post, we showcase a technique which uses A/B testing to understand how your app’s power consumption characteristics might change with different versions of the same feature - and how you can effectively measure them.

A real-life example of how the Power Profiler can be used to improve the battery life of an app.

Let’s assume you have an app through which users can purchase their favorite movies.

Sample app to demonstrate A/B testing for measure power consumption
Sample app to demonstrate A/B testing for measure power consumption 
Video (c) copyright Blender Foundation | www.bigbuckbunny.org

As your app becomes popular and is used by more users, you realize that a high quality 4K video takes very long to load every time the app is started. Because of its large size, you want to understand its impact on power consumption on the device.

Originally, this video was in 4K quality in the best of intentions, so as to showcase the best possible movie highlights to your customers.

This makes you think…

    • Do you really need a 4K video banner on the home screen?
    • Does it make sense to load a 4K video over the network every time your app is run?
    • How will the power consumption characteristics of your app change if you replace the 4K video with something of lower quality (while still preserving the vivid look & feel of the video)?

This is a perfect scenario to perform an A/B test for power consumption

With an A/B test, you can test two slightly different variations of the video banner feature and choose the one with the better power consumption characteristics.

Scenario A : Run the app with 4K video banner on screen & measure power consumption

Scenario B : Run the app with lower resolution video banner on screen & measure power consumption

A/B Test setup

Let's take a moment and set up our Android Studio profiler to run this A/B test. We need to start the app and attach the CPU profiler to it and trigger a system trace (where the Power Profiler will be shown).

Step 1

Create a custom “Run configuration” by clicking the 3 dot menu > Edit

Custom run configuration
Custom run configuration

Step 2

Then select the “Profiling” tab and ensure that “Start this recording on startup” and CPU Activity > System Trace is selected. Then click “Apply”.

Edit configuration settings
Edit configuration settings

Now simply run the “Profile app startup profiling with low overhead” whenever you want to run this app from start and attach the CPU profiler to it.

Note on precision

The following example scenarios use the entire app startup for estimating the power consumption for this blog’s purpose. However you can use more advanced techniques to have even higher precision in getting power readings. Some techniques to try are:

    • Isolate and measure power consumption for video playback only after a tap event on the video player
    • Use the trace markers API to mark the start and stop time for power measurement timeline - and then only measure power consumption within that marked window

Scenario A

In this scenario, we run the app with 4K video playing and measure power consumption for the first 30 seconds. We can optionally also run the scenario A multiple times and average out the readings. Once the System trace is shown in Android Studio, select the 0-30 second time range from the timeline selection panel and record as a screenshot for comparing against scenario B

Power consumption in scenario A - playing a 4k video
Power consumption in scenario A - playing a 4k video

As you can see, the average power consumed by WLAN, CPU cores & Memory combined is about 1,352 mW (milliwatts)

Now let's compare and contrast how this power consumption changes in Scenario B

Scenario B

In this scenario, we run the app with low quality video playing and measure power consumption for the first 30 seconds. As before, we can also optionally run scenario B multiple times and average out the power consumption readings. Again, once the System trace is shown in Android Studio, select the 0-30 second time range from the timeline selection panel.

Power consumption in scenario B - playing a lower quality video
Power consumption in scenario B - playing a lower quality video

The total power consumed by WLAN, CPU Little, CPU Big and CPU Mid & Memory is about 741 mW (milliwatts)

Conclusion

All else being equal, Scenario B (with lower quality video) consumed 741 mW power as compared to Scenario A (with 4K video) which required 1,352 mW power.

Scenario B (lower quality video) took 45% less power than Scenario A (4K) - while the lower quality video provides little to no visual difference in perceived quality of the app’s screen.

As a result of this A/B test for power consumption, you conclude that replacing the 4K video with a lower quality video on our app’s home screen not only reduces power consumption by 45%, also reduces the required network bandwidth and can potentially also improve the thermal performance of the devices.

If your app’s business logic still requires the 4K video to be shown on the app’s screen, you can explore strategies like:

    • Caching the 4K video across subsequent runs of the app.
    • Loading video on a user tap.
    • Loading an image initially and only load the video after the screen has fully rendered (delayed loading).

The overall power consumption numbers presented in the above A/B test scenario might seem small, but it shows the techniques that app developers can use to effectively A/B test power consumption for their app’s features using the Power Profiler in Android Studio.

Next Steps

The new Power Profiler is available in Android Studio Hedgehog onwards. To know more, please head over to the official documentation.

Create smart chips for link previewing in Google Docs

Posted by Chanel Greco, Developer Advocate

Earlier this year, we announced the general availability of third-party smart chips in Google Docs. This new feature lets you add, view, and engage with critical information from third party apps directly in Google Docs. Several partners, including Asana, Atlassian, Figma, Loom, Miro, Tableau, and Whimsical, have already created smart chips so users can start embedding content from their apps directly into Docs. Sourabh Choraria, a Google Developer Expert for Google Workspace and hobby developer, published a third-party smart chip solution called “Link Previews” to the Google Workspace Marketplace. This app adds information to Google Docs from multiple commonly used SaaS tools.

In this blog post you will find out how you too can create your own smart chips for Google Docs.

Example of a smart chip that was created to preview information from an event management system
Example of a smart chip that was created to preview information from an event management system


Understanding how smart chips for third-party services work

Third-party smart chips are powered by Google Workspace Add-ons and can be published to the Google Workspace Marketplace. From there, an admin or user can install the add-on and it will appear in the sidebar on the right hand side of Google Docs.

The Google Workspace Add-on detects a service's links and prompts Google Docs users to preview them. This means that you can create smart chips for any service that has a publicly accessible URL. You can configure an add-on to preview multiple URL patterns, such as links to support cases, sales leads, employee profiles, and more. This configuration is done in the add-on’s manifest file.

{
  "timeZone": "America/Los_Angeles",
  "exceptionLogging": "STACKDRIVER",
  "runtimeVersion": "V8",
  "oauthScopes": [
    "https://www.googleapis.com/auth/workspace.linkpreview",
    "https://www.googleapis.com/auth/script.external_request"
  ],
  "addOns": {
    "common": {
      "name": "Preview Books Add-on",
      "logoUrl": "https://developers.google.com/workspace/add-ons/images/library-icon.png",
      "layoutProperties": {
        "primaryColor": "#dd4b39"
      }
    },
    "docs": {
      "linkPreviewTriggers": [
        {
          "runFunction": "bookLinkPreview",
          "patterns": [
            {
              "hostPattern": "*.google.*",
              "pathPrefix": "books"
            },
            {
              "hostPattern": "*.google.*",
              "pathPrefix": "books/edition"
            }
          ],
          "labelText": "Book",
          "logoUrl": "https://developers.google.com/workspace/add-ons/images/book-icon.png",
          "localizedLabelText": {
            "es": "Libros"
          }
        }
      ]
    }
  }
}
The manifest file contains the URL pattern for the Google Books API

The smart chip displays an icon and short title or description of the link's content. When the user hovers over the chip, they see a card interface that previews more information about the file or link. You can customize the card interface that appears when the user hovers over a smart chip. To create the card interface, you use widgets to display information about the link. You can also build actions that let users open the link or modify its contents. For a list of all the supported components for preview cards check the developer documentation.

function getBook(id) {
// Code to fetch the data from the Google Books API
}

function bookLinkPreview(event) {
 if (event.docs.matchedUrl.url) {
// Through getBook(id) the relevant data is fetched and used to build the smart chip and card

    const previewHeader = CardService.newCardHeader()
      .setSubtitle('By ' + bookAuthors)
      .setTitle(bookTitle);

    const previewPages = CardService.newDecoratedText()
      .setTopLabel('Page count')
      .setText(bookPageCount);

    const previewDescription = CardService.newDecoratedText()
      .setTopLabel('About this book')
      .setText(bookDescription).setWrapText(true);

    const previewImage = CardService.newImage()
      .setAltText('Image of book cover')
      .setImageUrl(bookImage);

    const buttonBook = CardService.newTextButton()
      .setText('View book')
      .setOpenLink(CardService.newOpenLink()
        .setUrl(event.docs.matchedUrl.url));

    const cardSectionBook = CardService.newCardSection()
      .addWidget(previewImage)
      .addWidget(previewPages)
      .addWidget(CardService.newDivider())
      .addWidget(previewDescription)
      .addWidget(buttonBook);

    return CardService.newCardBuilder()
    .setHeader(previewHeader)
    .addSection(cardSectionBook)
    .build();
  }
}
This is the Apps Script code to create a smart chip.

A smart chip hovered state.
A smart chip hovered state. The data displayed is fetched from the Google for Developers blog post URL that was pasted by the user.


For a detailed walkthrough of the code used in this post, please checkout the Preview links from Google Books with smart chips sample tutorial.



How to choose the technology for your add-on

When creating smart chips for link previewing, you can choose from two different technologies to create your add-on: Google Apps Script or alternate runtime.

Apps script is a rapid application development platform that is built into Google Workspace. This fact makes Apps Script a good choice for prototyping and validating your smart chip solution as it requires no pre-existing development environment. But Apps Script isn’t only for prototyping as some developers choose to create their Google Workspace Add-on with it and even publish it to the Google Workspace Marketplace for users to install.

If you want to create your smart chip with Apps Script you can check out the video below in which you learn how to build a smart chip for link previewing in Google Docs from A - Z. Want the code used in the video tutorial? Then have a look at the Preview links from Google Books with smart chips sample page.

If you prefer to create your Google Workspace Add-on using your own development environment, programming language, hosting, packages, etc., then alternate runtime is the right choice. You can choose from different programming languages like Node.js, Java, Python, and more. The hosting of the add-on runtime code can be on any cloud or on premise infrastructure as long as runtime code can be exposed as a public HTTP(S) endpoint. You can learn more about how to create smart chips using alternate runtimes from the developer documentation.



How to share your add-on with others

You can share your add-on with others through the Google Workspace Marketplace. Let’s say you want to make your smart chip solution available to your team. In that case you can publish the add-on to your Google Workspace organization, also known as a private app. On the other hand, if you want to share your add-on with anyone who has a Google Account, you can publish it as a public app.

To find out more about publishing to the Google Workspace Marketplace, you can watch this video that will walk you through the process.



Getting started

Learn more about creating smart chips for link previewing in the developer documentation. There you will find further information and code samples you can base your solution of. We can’t wait to see what smart chip solutions you will build.

Create smart chips for link previewing in Google Docs

Posted by Chanel Greco, Developer Advocate

Earlier this year, we announced the general availability of third-party smart chips in Google Docs. This new feature lets you add, view, and engage with critical information from third party apps directly in Google Docs. Several partners, including Asana, Atlassian, Figma, Loom, Miro, Tableau, and Whimsical, have already created smart chips so users can start embedding content from their apps directly into Docs. Sourabh Choraria, a Google Developer Expert for Google Workspace and hobby developer, published a third-party smart chip solution called “Link Previews” to the Google Workspace Marketplace. This app adds information to Google Docs from multiple commonly used SaaS tools.

In this blog post you will find out how you too can create your own smart chips for Google Docs.

Example of a smart chip that was created to preview information from an event management system
Example of a smart chip that was created to preview information from an event management system


Understanding how smart chips for third-party services work

Third-party smart chips are powered by Google Workspace Add-ons and can be published to the Google Workspace Marketplace. From there, an admin or user can install the add-on and it will appear in the sidebar on the right hand side of Google Docs.

The Google Workspace Add-on detects a service's links and prompts Google Docs users to preview them. This means that you can create smart chips for any service that has a publicly accessible URL. You can configure an add-on to preview multiple URL patterns, such as links to support cases, sales leads, employee profiles, and more. This configuration is done in the add-on’s manifest file.

{
  "timeZone": "America/Los_Angeles",
  "exceptionLogging": "STACKDRIVER",
  "runtimeVersion": "V8",
  "oauthScopes": [
    "https://www.googleapis.com/auth/workspace.linkpreview",
    "https://www.googleapis.com/auth/script.external_request"
  ],
  "addOns": {
    "common": {
      "name": "Preview Books Add-on",
      "logoUrl": "https://developers.google.com/workspace/add-ons/images/library-icon.png",
      "layoutProperties": {
        "primaryColor": "#dd4b39"
      }
    },
    "docs": {
      "linkPreviewTriggers": [
        {
          "runFunction": "bookLinkPreview",
          "patterns": [
            {
              "hostPattern": "*.google.*",
              "pathPrefix": "books"
            },
            {
              "hostPattern": "*.google.*",
              "pathPrefix": "books/edition"
            }
          ],
          "labelText": "Book",
          "logoUrl": "https://developers.google.com/workspace/add-ons/images/book-icon.png",
          "localizedLabelText": {
            "es": "Libros"
          }
        }
      ]
    }
  }
}
The manifest file contains the URL pattern for the Google Books API

The smart chip displays an icon and short title or description of the link's content. When the user hovers over the chip, they see a card interface that previews more information about the file or link. You can customize the card interface that appears when the user hovers over a smart chip. To create the card interface, you use widgets to display information about the link. You can also build actions that let users open the link or modify its contents. For a list of all the supported components for preview cards check the developer documentation.

function getBook(id) {
// Code to fetch the data from the Google Books API
}

function bookLinkPreview(event) {
 if (event.docs.matchedUrl.url) {
// Through getBook(id) the relevant data is fetched and used to build the smart chip and card

    const previewHeader = CardService.newCardHeader()
      .setSubtitle('By ' + bookAuthors)
      .setTitle(bookTitle);

    const previewPages = CardService.newDecoratedText()
      .setTopLabel('Page count')
      .setText(bookPageCount);

    const previewDescription = CardService.newDecoratedText()
      .setTopLabel('About this book')
      .setText(bookDescription).setWrapText(true);

    const previewImage = CardService.newImage()
      .setAltText('Image of book cover')
      .setImageUrl(bookImage);

    const buttonBook = CardService.newTextButton()
      .setText('View book')
      .setOpenLink(CardService.newOpenLink()
        .setUrl(event.docs.matchedUrl.url));

    const cardSectionBook = CardService.newCardSection()
      .addWidget(previewImage)
      .addWidget(previewPages)
      .addWidget(CardService.newDivider())
      .addWidget(previewDescription)
      .addWidget(buttonBook);

    return CardService.newCardBuilder()
    .setHeader(previewHeader)
    .addSection(cardSectionBook)
    .build();
  }
}
This is the Apps Script code to create a smart chip.

A smart chip hovered state.
A smart chip hovered state. The data displayed is fetched from the Google for Developers blog post URL that was pasted by the user.


For a detailed walkthrough of the code used in this post, please checkout the Preview links from Google Books with smart chips sample tutorial.



How to choose the technology for your add-on

When creating smart chips for link previewing, you can choose from two different technologies to create your add-on: Google Apps Script or alternate runtime.

Apps script is a rapid application development platform that is built into Google Workspace. This fact makes Apps Script a good choice for prototyping and validating your smart chip solution as it requires no pre-existing development environment. But Apps Script isn’t only for prototyping as some developers choose to create their Google Workspace Add-on with it and even publish it to the Google Workspace Marketplace for users to install.

If you want to create your smart chip with Apps Script you can check out the video below in which you learn how to build a smart chip for link previewing in Google Docs from A - Z. Want the code used in the video tutorial? Then have a look at the Preview links from Google Books with smart chips sample page.

If you prefer to create your Google Workspace Add-on using your own development environment, programming language, hosting, packages, etc., then alternate runtime is the right choice. You can choose from different programming languages like Node.js, Java, Python, and more. The hosting of the add-on runtime code can be on any cloud or on premise infrastructure as long as runtime code can be exposed as a public HTTP(S) endpoint. You can learn more about how to create smart chips using alternate runtimes from the developer documentation.



How to share your add-on with others

You can share your add-on with others through the Google Workspace Marketplace. Let’s say you want to make your smart chip solution available to your team. In that case you can publish the add-on to your Google Workspace organization, also known as a private app. On the other hand, if you want to share your add-on with anyone who has a Google Account, you can publish it as a public app.

To find out more about publishing to the Google Workspace Marketplace, you can watch this video that will walk you through the process.



Getting started

Learn more about creating smart chips for link previewing in the developer documentation. There you will find further information and code samples you can base your solution of. We can’t wait to see what smart chip solutions you will build.

How it’s Made: Interacting with Gemini through multimodal prompting

Posted by Alexander Chen, Creative Director


Let’s try an experiment. We’ll show this picture to our multimodal model Gemini and ask it to describe what it sees:

A close up of a hand with five fingers spread out, palm facing down on a wooden surface.
Tell me what you see
Gemini:  I see a person's right hand. The hand is open with the fingers spread apart.

Ok, let’s try this one:

A closeup of the same hand, now balled up in a fist.
Gemini: A person knocking on a wooden door.

How about this one?

A closeup of the same hand, with two fingers extended like that of a peace sign.
Gemini: I see a hand with two fingers extended, which is a common symbol for the number two.

But wait … what if we asked Gemini to reason about all of these images together?

A sequence of three frames depicting the same hand in different formations. The first frame shows the hand with all five fingers spread out, palm facing down. The second frame shows the hand balled up in a fist. The third frame shows the hand with two fingers extended, like that of a peace sign.
What do you think I'm doing? Hint: it's a game.
Gemini: You're playing rock, paper, scissors.

Nice. What else could we try? Let's see if Gemini can notice this pattern:

A sequence of six frames that alternate between a hand that is balled up in a fist, and a hand that has two fingers extended.
I’m playing rock paper scissors. Do you notice a pattern in my gameplay?
Gemini: You are alternating between rock and scissors.

Neat, it spotted the pattern! But does Gemini really know how this game works? Let’s try asking:

Hmm ... do you think it's a good strategy?
Gemini: No, it is not a good strategy. Your opponent will quickly learn your pattern and will be able to win every time. You should try to randomize your choices more.

Good advice, Gemini! Okay, for this last one, let’s put your multimodal reasoning capabilities to the test. Here’s your prompt:

This is a secret message. What does it say?

A six by eight grid of frames of a hands balled up in a fist or the hand with all five fingers spread out, palm facing down. The series of photos spells 'Gemini' in ASCII binary.

Can you figure it out? We have gotten Gemini to solve it, but we won’t spoil it by putting the answer in this post. 🙂



What is multimodal prompting?

Everything we did just now is an example of “multimodal prompting.” We’re basically giving Gemini combinations of different modalities — image and text in this case — and having Gemini respond by predicting what might come next. Gemini’s ability to seamlessly combine these modes together enables new possibilities for what you can do.

In this post, we’ll explore some of the prompting approaches we used in our Hands on with Gemini demo video. We’ll soon be rolling out Gemini for people to try in Google AI Studio, our free, web-based developer tool where you’ll be able to try your own multimodal prompts with Gemini. We’re hoping this guide of starter prompts and ideas helps inspire you to start exploring your own ideas.



🧮 Spatial reasoning and logic

Logic and puzzles are fun way to test Gemini. Let’s try one that requires both left-to-right spatial reasoning and knowledge about our solar system:

Three sticky notes on a wooden surface, each featuring a hand-drawn sketch of a solar-body. The left-most note depicts a drawing of the sun. The middle note has a drawing of saturn. The right-most note shows a drawing of earth.
Is this the right order? Consider the distance from the sun and explain your reasoning.

Gemini: No, the correct order is Sun, Earth, Saturn. The Sun is the closest to the center of the solar system, followed by the Earth, and then Saturn.

Nice! Here’s another, inspired by pinewood derby challenges:

Two sticky notes depicting two different hand-drawn cars going down-hill. The car on the left is blocky and awkwardly shaped, the car on the right is slanted and smooth.
Which of these cars is more aerodynamic? The one on the left or the right? Explain why, using specific visual details.

Gemini: The car on the right is more aerodynamic. It has a lower profile and a more streamlined shape. The car on the left has a higher profile and a more boxy shape, which makes it less aerodynamic.



📷 Image sequences

Next, what if we showed Gemini a sequence of images? Let’s see we can show a few still frames from a game of charades and have Gemini guess the movie:

Four frames from a video of a person leaning back and moving their arms wildly.
Guess what movie I'm acting out.
Gemini: The Matrix
Nice! But which part specifically? Look at my body movements.
Gemini: The part where Neo dodges bullets.



🪄 Magic tricks

Exploring image sequences really lets you dig into how Gemini reasons over time. Let’s try a classic magic trick.

An image of two hands with palms facing the camera, there is a coin in the right hand.
Explain what I'm doing.
Gemini: You are holding a coin in your right hand.
An image of two hands in motion, palms turning inward and away from the camera.
An image of two hands, palms facing down and away from the camera.
Where do you think the coin is?
Gemini: The coin is in your right hand.

Sure, that’s a logical answer given what it’s seen. But let’s pull a sleight of hand as we reveal this next image to Gemini...

An image of two hands, the left palm is facing down and the right hand is tilted upwards revealing nothing under the right palm.
Look here, what happened?
Gemini: The coin is gone.

Gemini notices it’s gone. But can it summarize what may have happened?

Summarize what may have happened, step by step, starting with the first image.

Gemini: 
1. You started with a coin in your right hand.
2. You then made a motion with your hands, and the coin disappeared.
3. You then showed that the coin was gone.

Gemini is leveraging the entire context of this conversation – both text and image – to summarize its answer. Finally, let’s ask:

Can you explain how it disappeared?
Gemini: The coin was palmed.

Gemini makes a really solid guess! But we won’t reveal our secret 🙂.



🔀 Cup shuffling

Let’s try a trick that requires memory and logic: the classic ball and cup shuffling game. For this one, we’ll break it down into steps for Gemini. First, let’s establish the basics with these instructions:

Instructions: There are three cups in position 1 (left), position 2 (middle), position 3 (right).

Next, we’ll need to tell Gemini where we’re placing the ball:

Starting position: The ball is in position 3.
Current state: empty, empty, ball

Notice how we came up with a simple format for remembering where the ball is with “empty, empty, ball.” Next, let’s show Gemini two example turns:

Images:
Three frames depicting a person shuffling three green cups. The person switches positions of the middle and right-most cups. 
Swap: 2 and 3
Current state: empty, ball, empty

Images:
Three frames depicting a person shuffling three green cups. The person switches positions of the middle and left-most cups. 
Swap: 1 and 2
Current state: ball, empty, empty

In these two example turns, we’re both showing it what a swap looks like, and how to update the ball position. Now, let’s see if Gemini can generalize to a swap it hasn't seen:

Images:
Three frames depicting a person shuffling three green cups. The person switches positions of the left-most and right-most cups. 
Swap: 

Here's Gemini's response:

1 and 3
Current state: empty, empty, ball

Gemini got it! It looked at these images and correctly inferred that cups 1 and 3 are being swapped. And it reasoned correctly about how to update the ball position. Let’s ask:

Tell me where the ball is and summarize how it got there, in paragraph form.

Gemini: The ball is in cup 3. 

Starting with the ball in cup 3, the first swap was cup 2 and 3. This put the ball in cup 2. The second swap was cup 1 and 2. This put the ball in cup 1. The third swap was cup 1 and 3. This put the ball in cup 3.

Not only did Gemini get the answer correctly, it accurately summarized the game history. Of course, it won’t always get this challenge right. Sometimes the fake out move (where you swap two empty cups) seems to trip it up, but sometimes it gets that too. But simple prompts like this make it really fun to rapidly test Gemini. You can change the variables in your prompt, including the order of swaps, and see how it does.



🔨 Tool use

If you want to use Gemini in your own apps, you’ll want it to be able to connect to other tools. Let’s try a simple idea where Gemini needs to combine multimodality with tool use: drawing a picture to search for music.

A musical instruments and palm tree sketch on a notebook. The sketch features a guitar, an amplifier, a drum set, and a palm tree drawn in a loose and expressive style. The notebook sits on a wooden desk.
Step 1. Look at this drawing, describe everything in it, and come up with a fun, specific idea for a good musical soundtrack genre, mood, or soundtrack. 
Step 2. Provide a search query in the form: "Search [insert query]"

Gemini: 
The drawing is of a guitar, drums, and an amplifier. There is also a palm tree. The mood is fun, upbeat, and tropical. The genre is reggae. The soundtrack could be used for a movie or TV show set in a tropical location. 

Search "Reggae music for a tropical vacation"

Nice! Gemini both reasons about what it sees and then generates a search query you can parse to do a search. It’s like Gemini is acting like a translator for you – but instead of translating between languages, it’s translating modalities – from drawing to music in this case. With multimodal prompting, you can use Gemini to invent your own entirely new translations between different inputs and outputs.



🕹️Game creation

What if we tried using Gemini to quickly prototype a multimodal game? Here’s an idea: a geography guessing game where you have to point at a map to make your guess. Let’s start by prompting Gemini with the core idea:

Instructions: Let's play a game. Think of a country and give me a clue. The clue must be specific enough that there is only one correct country. I will try pointing at the country on a map.

Next, let’s give Gemini an example turn of gameplay, showing it how we want it to handle both incorrect and correct answers:

Country: This country is huge island that is mostly covered in ice.
(Answer: Greenland) 

Guess:
An image of a world map, a hand icon with the index finger pointing at Canada
I am pointing at the country of Canada *incorrect*

Guess:
An image of a world map, a hand icon with the index finger pointing at Greenland
I am pointing at the country of Greenland *correct*

Let’s give it a go and prompt Gemini to generate a clue:

Clue:
This country is known for its unique wildlife, including kangaroos and koalas. (Answer: Australia)

Ok, that’s a good clue. Let’s test out whether pointing will work. Just for fun, let’s try pointing at the wrong place first:

Guess:
A person pointing at Brazil on a map of the world with their right index finger. The map includes blue oceans, and green continents with no country borders.
I am pointing at the country of Brazil *incorrect*

Great! Gemini looked at my image and figured out I’m pointing at Brazil, and correctly reasoned that’s wrong. Now let’s point at the right place on the map:

A person pointing at Brazil on a map of the world with their right index finger. The map includes blue oceans, and green continents with no country borders.
I am pointing at the country of Australia *correct*

Nice! We’ve basically taught Gemini our game logic just by giving it an example. You'll also notice that it generalized from the illustrated hand in the examples.



⌨️ Coding

Of course, to bring your game idea to life, you’ll eventually have to write some executable code. Let’s see if Gemini can make a simple countdown timer for a game, but with a few fun twists:

Implement a simple timer in HTML/CSS/Javascript. Use a sans serif font and dark mode. Start it at 10 seconds and start counting down. When it reaches zero, replace the timer with a random emoji that is associated with excitement and motivation! Then go back to the timer at 10 seconds and start counting down again.

With just this single instruction, Gemini gives us a working timer that does what we asked for:

An animated gif of a countdown timer starting from 10. At the end of the countdown, a rocket emoji is shown followed lightning bolt emoji and a confetti emoji.

My favorite part is scrolling through Gemini’s source code to find the array of motivational emojis it picked for me:

 const emojis  = ['🚀', '⚡️', '🎉', '🎊', '🥳', '🤩', '✨'];



👀 A sneak peek

Throughout this post, we’ve been giving Gemini an input, and having Gemini make predictions for what might come next. That's basically what prompting is. And our inputs have been multimodal – image and text, combined.

But so far we've only shown Gemini responding in text. Maybe you’re wondering, can Gemini also respond with a combination of image and text? It can! This is a capability of Gemini called “interleaved text and image generation.” While this feature won’t be ready in the first version of Gemini for people to try, we hope to roll it out soon. Here’s a sneak peek of what’s possible.

Let’s see if we could use Gemini to provide everyday creative inspiration. And let’s try it in a domain that requires a bit of multimodal reasoning ... knitting! 🧶. Similar to our map game above, let's provide one example turn of interaction:

Give me three ideas for crochet creations I could make with these two colors:

Balls of blue and yellow yarn on a wooden table.

Colors: I see blue and yellow yarn.
Idea 1: How about a cute dog with one yellow ear? 
Image 1:

A blue and yellow crocheted dog head with ears placed on a wooden table.

Idea 2: Or how about a jellyfish with a few yellow tentacles?
Image 2:

A blue and yellow crocheted octopus with tentacles placed on a wooden table.

Idea 3: Or how about a blue cat with a yellow nose?
Image 3:

A blue and yellow crocheted cat with whiskers placed on a wooden table.

We’re essentially teaching Gemini about how we want each interaction to go: “I’ll take a photo of two balls of yarn, and I expect you (Gemini) to both come up with an idea for something I could make, and generate an image of it.”

Now, let’s show it a new pair of yarn colors it hasn't yet seen, and see if it can generalize:

Give me three ideas for crochet creations I could make with these two colors:

Balls of blue and pink yarn on a wooden table.

Gemini:
Colors: I see blue and pink yarn.
Idea 1: How about a pig with blue ears?
Image 1:

A blue and pink crocheted pig face with ears placed on a wooden table.

Idea 2: Or an octopus with pink and blue tentacles?
Image 2:

A blue and yellow crocheted octopus with tentacles placed on a wooden table.

Idea 3: Or a bunny with a pink nose?
Image 3:

A blue and pink crocheted bunny placed on a wooden table.

Nice! Gemini correctly reasoned about the new colors (“I see blue and pink yarn”) and generated these ideas and the images in a single, interleaved output of text and image.

What Gemini did here is fundamentally different from today’s text-to-image models. It's not just passing an instruction to a separate text-to-image model. It sees the image of my actual yarn on my wooden table, truly doing multimodal reasoning about my text and image together.


What's Next?

We hope you found this a helpful starter guide to get a sense of what’s possible with Gemini. We’re very excited to roll it out to more people soon so you can explore your own ideas through prompting. Stay tuned!

Order Files in Android

Posted by Aditya Kumar – Software Engineer

Context

Binary layout using a symbol order file (also known as binary order file or linker order file) is a well-known link-time optimization. The linker uses the order of symbols in order file to lay out symbols in the binary. Order file based binary layout improves application launch time as well as other critical user journeys. Order file generation is typically a multi-step process where developers use different tools at every stage. We are providing a unified set of tools and documentation that will allow every native app developer to leverage this optimization. Both Android app developers and the AOSP community can benefit from the tools.

Background

Source code is typically structured to facilitate software development and comprehension. The layout of functions and variables in a binary is also impacted by their relative ordering in the source code. The binary layout impacts application performance as the operating system has no way of knowing which symbols will be required in future and typically uses spatial locality as one of the cost models for prefetching subsequent pages.

But the order of symbols in a binary may not reflect the program execution order. When an application executes, fetching symbols that are not present in memory would result in page faults. For example, consider the following program:

// Test.cpp
int foo() { /* */ } int bar() { /* */ } // Other functions... int main() { bar(); foo();

}

Which gets compiled into:

# Test.app page_x: _foo page_y: _bar # Other symbols page_z:_main

When Test.app starts, its entrypoint _main is fetched first then _bar followed by _foo. Executing Test.app can lead to page faults for fetching each function. Compare this to the following binary layout where all the functions are located in the same page (assuming the functions are small enough).

# Test.app page_1: _main page_1: _bar page_1: _foo # Other symbols

In this case when _main gets fetched, _bar and _foo can get fetched in the memory at the same time. In case these symbols are large and they are located in consecutive pages, there is a high chance the operating system may prefetch those pages resulting in less page faults.

Because execution order of functions during an application lifecycle may depend on various factors it is impossible to have a unique order of symbols that is most efficient. Fortunately, application startup sequence is fairly deterministic and stable in general. And it is also possible to build a binary having a desired symbol order with the help of linkers like lld which is the default linker for Android NDK toolchain.

Order file is a text file containing a list of symbols. The linker uses the order of symbols in order file to lay out symbols in the binary. An order file having functions that get called during the app startup sequence can reduce page faults resulting in improved launch time. Order files can improve the launch time of mobile applications by more than 2%. The benefits of order files are more meaningful on larger apps and lower end devices. A more mature order file generation system can improve other critical user journeys.

Design

The order file generation involves the following steps

    • Collect app startup sequence using compiler instrumentation technique
      • Use compiler instrumentation to report every function invocation
      • Run the instrumented binary to collect launch sequence in a (binary) profraw file
    • Generate order file from the profraw files
    • Validate order file
    • Merge multiple order files into one
    • Recompile the app with the merged order file

Overview

The order file generation is based on LLVM’s compiler instrumentation process. LLVM has a stage to generate the order file then recompile the source code using the order file.ALT TEXT


Collect app startup sequence

The source code is instrumented by passing -forder-file-instrumentation to the compiler. Additionally, the -orderfile-write-mapping flag is also required for the compiler to generate a mapping file. The mapping file is generated during compilation and it is used while processing the profraw file. The mapping file shows the mapping from MD5 hash to function symbol (as shown below).

# Mapping file MD5 db956436e78dd5fa main MD5 83bff1e88ac48f32 _GLOBAL__sub_I_main.cpp MD5 c943255f95351375 _Z5mergePiiii MD5 d2d2238cf08db816 _Z9mergeSortPiii MD5 11ed18006e729e73 _Z4partPiii MD5 3e897b5ee8bebbd1 _Z9quickSortPiii

The profile (profraw file) is generated every time the instrumented application is executed. The profile data in the profraw file contains the MD5 hash of the functions executed in chronological order. The profraw file does not have duplicate entries because each function only outputs its MD5 hash on first invocation. A typical run of binary containing the functions listed in the mapping file above can have the following profraw entries.

# Profraw file 00000000 32 8f c4 8a e8 f1 bf 83 fa d5 8d e7 36 64 95 db |2...........6d..| 00000010 16 b8 8d f0 8c 23 d2 d2 75 13 35 95 5f 25 43 c9 |.....#..u.5._%C.| 00000020 d1 bb be e8 5e 7b 89 3e 00 00 00 00 00 00 00 00 |....^{.>........| 00000030 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|

In order to find the function names corresponding to the MD5 hashes in a profraw file, a corresponding mapping file is used.

Note: The compiler instrumentation for order files (-forder-file-instrumentation) only works when an optimization flag (01, 02, 03, 0s, 0z) is passed. So, if -O0 (compiler flag typically used for debug builds) is passed, the compiler will not instrument the binary. In principle, one should use the same optimization flag for instrumentation that is used in shipping release binaries.

The Android NDK repository has scripts that automate the order file generation given a mapping file and an order file.


Recompiling with Order File

Once you have an order file, you provide the path of the order file to the linker using the --symbol-ordering-file flag.


Detailed design

Creating Order File Build Property

The Android Open Source Project (AOSP) uses a build system called soong so we can leverage this build system to pass the flags as necessary. The order file build property has four main fields:

    • instrumentation
    • load_order_file
    • order_file_path
    • cflags

The cflags are meant to add other necessary flags (like mapping flags) during compilation. The load_order_file and order_file_path tells the build system to recompile with the order file rather than set it to the profiling stage. The order files must be in saved in one of two paths:

    • toolchain/pgo-profiles/orderfiles
    • vendor/google_data/pgo_profile/orderfiles

# Profiling orderfile: { instrumentation: true, load_order_file: false, order_file_path: "", cflags: [ "-mllvm", "-orderfile-write-mapping=<filename>-mapping.txt", ], } #Recompiling with Order File orderfile: { instrumentation: true, load_order_file: true, order_file_path: "<filename>.orderfile", }

Creating order files

We provide a python script to create an order file from a mapping file and a profraw file. The script also allows removing a particular symbol or creating an order file until a particular symbol.

Script Flags:

        • Profile file (--profile-file):
                • Description: The profile file generated by running a binary compiled with -forder-file-instrumentation
        • Mapping file (--mapping-file):
                • Description: The mapping file generated during compilation that maps MD5 hashes to symbol names
        • Output file (--output):
                • Description: The output file name for the order file. Default Name: default.orderfile
        • Deny List (--denylist):
                • Description: Symbols that you want to exclude from the order file
        • Last symbol (--last-symbol):
                • Description: The order file will end at the passed last symbol and ignore the symbols after it. If you want an order file only for startup, you should pass the last startup symbol. Last-symbol has priority over leftover so we will output until the last symbol and ignore the leftover flag.
        • Leftover symbols (--leftover):
                • Description: Some symbols (functions) might not have been executed so they will not appear in the profile file. If you want these symbols in your order file, you can use this flag and it will add them at the end.

Validating order files

Once we get an order file for a library or binary, we need to check if it is valid based on a set of criteria. Some order files may not be of good quality so they are better discarded. This can happen due to several reasons like application terminated unexpectedly, the runtime could not write the complete profraw file before exiting, an undesired code-sequence was collected in the profile, etc. To automate this process, we provide a python script that can help developers check for:

    • Partial order that needs to be in the order file
    • Symbols that have to be present in order file
    • Symbols that should not be present in order file
    • Minimum number of symbols to make an order file

Script Flags:

        • Order file (--order-file):
                • Description: The order file you are validating on the below criteria.
        • Partial Order (--partial):
                • Description: A partial order of symbols that must be held in the order file.
        • Allowed Lists (--allowlist):
                • Description: Symbols that must be present in the order file.
        • Denied Lists (--denylist):
                • Description: Symbols that should not be in the order file. Denylist flag has priority over allowlist.
        • Minimum Number of Entries (--min):
                • Description: Minimum number of symbols needed for an order file

Merging orderfiles

At a higher level, the order file symbols in a collection of order files approximate a partial order (poset) of function names with order defined by time of execution. Across different runs of an application, the order files might have variations. These variations could be due to OS, device class, build version, user configurations etc. However, the linker can only take one order file to build an application. In order to have one order file that provides the desired benefits, we need to merge these order files into a single order file. The merging algorithm also needs to be efficient so as to not slow down the build time. There are non-linear clustering algorithms that may not scale well for merging large numbers of order files, each having many symbols. We provide an efficient merging algorithm that converges quickly. The algorithm allows for customizable parameters, such that developers can tune the outcome.

Merging N partial order sets can be done either pessimistically (merging a selection of order files all the way until there is one order file left) or optimistically (merging all of them at once). The pessimistic approach can be inefficient as well as sub-optimal. As a result, it is better to work with all N partial order sets at once. In order to have an efficient implementation it helps to represent all N posets with a weighted directed Graph (V,E) where:

    • V: Elements of partial order sets (symbols) and the number of times it appears in different partial order sets. Note that the frequency of vertices may be greater than the sum of all incoming edges because of invocations from uninstrumented parts of binary, dependency injection etc.
    • E (V1 -> V2): An edge occurs if the element of V2 immediately succeeds V1 in any partial order set with its weight being the number of times this happens.

For a binary executable, there is one root (e.g., main) vertex, but shared libraries might have many roots based on which functions are called in the binary using them. The graph gets complicated if the application has threads as they frequently result in cycles. To have a topological order, cycles are removed by preferring the highest probability path over others. A Depth-First traversal that selects the highest weighted edge serves the purpose.

Removing Cycles:

- Mark back edges during a Depth-First traversal - For each Cycle (c):      - Add the weights of all in-edges of each vertex (v) in the cycle excluding the edges in the cycle      - Remove the cycle edge pointing **to** the vertex with highest sum

After cycles are removed, the same depth first traversal gives a topological order (the order file) when all the forward edges are removed. Essentially, the algorithm computes a minimum-spanning-tree of maximal weights and traverses the tree in topological order.

Producing an order:

printOrderUtil(G, n, order):    - If n was visited:         - return    - Add n to the end of order    - Sort all out edges based on weight    - For every out_edge (n, v):        - printOrderUtil(G, v, order) printOrder(G):    - Get all roots    - order = []    - For each root r:        - printOrderUtil(G, r, order)    - return order

Example:

Given the following order files:

    • main -> b -> c -> d
    • main -> a -> c
    • main -> e -> f
    • main -> b
    • main -> b
    • main -> c -> b
Flow diagram of orderfiles

The graph to the right is obtained by removing cycles.

    • DFS: main -> b-> c -> b
    • Back edge: c -> b
    • Cycle: b -> c-> b
    • Cycle edges: [b -> c, c -> b]
    • b’s sum of in-edges is 3
    • c’s sum of in-edges is 2
    • This implies b will be traversed from a higher frequency edge, so c -> b is removed
    • Ignore forward edges a->c, main->c
    • The DFS of the acyclic graph on the right will produce an order file main -> b -> c -> d -> a -> e -> f after ignoring the forward edges.

Collecting order files for Android Apps (Java, Kotlin)

The order file instrumentation and profile data collection is only enabled for C/C++ applications. As a result, it cannot benefit Java or Kotlin applications. However, Android apps that ship compiled C/C++ libraries can benefit from order file.

To generate order file for libraries that are used by Java/Kotlin applications, we need to invoke the runtime methods (called as part of order file instrumentation) at the right places. There are three functions that users have to call:

    • __llvm_profile_set_filename(char *f): Set the name of the file where profraw data will be dumped.
    • __llvm_profile_initialize_file: Initialize the file set by __llvm_profile_set_filename
    • __llvm_orderfile_dump: Dumps the profile(order file data) collected while running instrumented binary

Similarly, the compiler and linker flags should be added to build configurations. We provide template build system files e.g, CMakeLists.txt to compile with the correct flags and add a function to dump the order files when the Java/Kotlin application calls it.

# CMakeLists.txt set(GENERATE_PROFILES ON) #set(USE_PROFILE "${CMAKE_SOURCE_DIR}/demo.orderfile") add_library(orderfiledemo SHARED orderfile.cpp) target_link_libraries(orderfiledemo log) if(GENERATE_PROFILES) # Generating profiles require any optimization flag aside from -O0. # The mapping file will not generate and the profile instrumentation does not work without an optimization flag. target_compile_options( orderfiledemo PRIVATE -forder-file-instrumentation -O2 -mllvm -orderfile-write-mapping=mapping.txt ) target_link_options( orderfiledemo PRIVATE -forder-file-instrumentation ) target_compile_definitions(orderfiledemo PRIVATE GENERATE_PROFILES) elseif(USE_PROFILE) target_compile_options( orderfiledemo PRIVATE -Wl,--symbol-ordering-file=${USE_PROFILE} -Wl,--no-warn-symbol-ordering ) target_link_options( orderfiledemo PRIVATE -Wl,--symbol-ordering-file=${USE_PROFILE} -Wl,--no-warn-symbol-ordering ) endif()

We also provide a sample app to dump order files from a Kotlin application. The sample app creates a shared library called “orderfiledemo” and invokes the DumpProfileDataIfNeeded function to dump the order file. This library can be taken out of this sample app and can be repurposed for other applications.

// Order File Library #if defined(GENERATE_PROFILES) extern "C" int __llvm_profile_set_filename(const char *); extern "C" int __llvm_profile_initialize_file(void); extern "C" int __llvm_orderfile_dump(void); #endif void DumpProfileDataIfNeeded(const char *temp_dir) { #if defined(GENERATE_PROFILES) char profile_location[PATH_MAX] = {}; snprintf(profile_location, sizeof(profile_location), "%s/demo.output", temp_dir); __llvm_profile_set_filename(profile_location); __llvm_profile_initialize_file(); __llvm_orderfile_dump(); __android_log_print(ANDROID_LOG_DEBUG, kLogTag, "Wrote profile data to %s", profile_location); #else __android_log_print(ANDROID_LOG_DEBUG, kLogTag, "Did not write profile data because the app was not " "built for profile generation"); #endif } extern "C" JNIEXPORT void JNICALL Java_com_example_orderfiledemo_MainActivity_runWorkload(JNIEnv *env, jobject /* this */, jstring temp_dir) { DumpProfileDataIfNeeded(env->GetStringUTFChars(temp_dir, 0)); }

# Kotlin Application class MainActivity : AppCompatActivity() { private lateinit var binding: ActivityMainBinding override fun onCreate(savedInstanceState: Bundle?) { super.onCreate(savedInstanceState) binding = ActivityMainBinding.inflate(layoutInflater) setContentView(binding.root) runWorkload(applicationContext.cacheDir.toString()) binding.sampleText.text = "Hello, world!" } /** * A native method that is implemented by the 'orderfiledemo' native library, * which is packaged with this application. */ external fun runWorkload(tempDir: String) companion object { // Used to load the 'orderfiledemo' library on application startup. init { System.loadLibrary("orderfiledemo") } } }

Limitation

order file generation only works for native binaries. The validation and merging scripts will work for any set of order files.

References

External References

MediaPipe for Raspberry Pi and iOS

Posted by Paul Ruiz, Developer Relations Engineer

Back in May we released MediaPipe Solutions, a set of tools for no-code and low-code solutions to common on-device machine learning tasks, for Android, web, and Python. Today we’re happy to announce that the initial version of the iOS SDK, plus an update for the Python SDK to support the Raspberry Pi, are available. These include support for audio classification, face landmark detection, and various natural language processing tasks. Let’s take a look at how you can use these tools for the new platforms.

Object Detection for Raspberry Pi

Aside from setting up your Raspberry Pi hardware with a camera, you can start by installing the MediaPipe dependency, along with OpenCV and NumPy if you don’t have them already.

python -m pip install mediapipe

From there you can create a new Python file and add your imports to the top.

import mediapipe as mp from mediapipe.tasks import python from mediapipe.tasks.python import vision import cv2 import numpy as np

You will also want to make sure you have an object detection model stored locally on your Raspberry Pi. For your convenience, we’ve provided a default model, EfficientDet-Lite0, that you can retrieve with the following command.

wget -q -O efficientdet.tflite -q https://storage.googleapis.com/mediapipe-models/object_detector/efficientdet_lite0/int8/1/efficientdet_lite0.tflite

Once you have your model downloaded, you can start creating your new ObjectDetector, including some customizations, like the max results that you want to receive, or the confidence threshold that must be exceeded before a result can be returned.

# Initialize the object detection model base_options = python.BaseOptions(model_asset_path=model)options = vision.ObjectDetectorOptions(                                   base_options=base_options,                                   running_mode=vision.RunningMode.LIVE_STREAM,                                   max_results=max_results,                                                       score_threshold=score_threshold,                                    result_callback=save_result) detector = vision.ObjectDetector.create_from_options(options)

After creating the ObjectDetector, you will need to open the Raspberry Pi camera to read the continuous frames. There are a few preprocessing steps that will be omitted here, but are available in our sample on GitHub.

Within that loop you can convert the processed camera image into a new MediaPipe.Image, then run detection on that new MediaPipe.Image before displaying the results that are received in an associated listener.

mp_image = mp.Image(image_format=mp.ImageFormat.SRGB, data=rgb_image) detector.detect_async(mp_image, time.time_ns())

Once you draw out those results and detected bounding boxes, you should be able to see something like this:

Moving image of a person holidng up a cup and a phone, and detected bounded boxes identifying these items in real time

You can find the complete Raspberry Pi example shown above on GitHub, or see the official documentation here.

Text Classification on iOS

While text classification is one of the more direct examples, the core ideas will still apply to the rest of the available iOS Tasks. Similar to the Raspberry Pi, you’ll start by creating a new MediaPipe Tasks object, which in this case is a TextClassifier.

var textClassifier: TextClassifier? textClassifier = TextClassifier(modelPath: model.modelPath)

Now that you have your TextClassifier, you just need to pass a String to it to get a TextClassifierResult.

func classify(text: String) -> TextClassifierResult? { guard let textClassifier = textClassifier else { return nil } return try? textClassifier.classify(text: text) }

You can do this from elsewhere in your app, such as a ViewController DispatchQueue, before displaying the results.

let result = self?.textClassifier.classify(text: inputText) let categories = result?.classificationResult.classifications.first?.categories?? []

You can find the rest of the code for this project on GitHub, as well as see the full documentation on developers.google.com/mediapipe.

Moving image of TextClasifier on an iPhone

Getting started

To learn more, watch our I/O 2023 sessions: Easy on-device ML with MediaPipe, Supercharge your web app with machine learning and MediaPipe, and What's new in machine learning, and check out the official documentation over on developers.google.com/mediapipe.

We look forward to all the exciting things you make, so be sure to share them with @googledevs and your developer communities!

Media3 is ready to play!

Posted by Nevin Mital - Developer Relations Engineer, Android Media

Today, we’re pleased to announce the full release of the Jetpack Media3 library. After sharing a first look at the library at Android Developer Summit 2021, we published several alpha and beta releases over the past several months to ensure a high-quality set of APIs that we now encourage everyone to adopt.

Media3 is the new home for APIs that enable you to create rich audio and video experiences. If you’ve used libraries like ExoPlayer, MediaCompat, or Media2, you’ll find Media3 to be familiar. However, instead of using these separate libraries, Media3 provides a unified API for playback use-cases and also expands to cover new use-cases like video editing and transcoding. The APIs are simple to use yet powerful, customizable to meet your needs, and reliable and optimized so you can build for the diverse Android device ecosystem.

In this blog post, we’ll focus on the playback APIs in Media3, so please stay tuned for an upcoming post where we’ll dive deeper into the video editing and transcoding APIs. As a brief introduction, the following table describes key components for playback in Media3:

Player

An interface that defines traditional high-level functionality for an audio or video player, such as playback controls.

ExoPlayer

The default implementation of the Player interface in Media3.

MediaSession

An API that advertises media playback to and receives playback command requests from external clients.

MediaSessionService

A service that holds a MediaSession to enable background playback.

MediaLibraryService

A service that additionally allows you to expose a content library to external clients.

MediaController

An API that is generally used by external clients to retrieve playback information and send playback command requests to your media app. Complementary to a MediaSession. Examples of external clients include the notification and lock screen media controls on mobile and large screen devices, Android Auto, WearOS, and Google Assistant.

MediaBrowser

An API that additionally enables external clients to navigate your media app’s content library. Complementary to a MediaLibraryService.

Our developer documentation has more details on these components. Let’s take a closer look into what this new library offers and how you can start using it.

Keeping it simple

By consolidating the APIs for the playback developer journey into a single library, Media3 is able to introduce a Player interface that is used by several components, such as MediaSession and MediaController. This interface outlines traditional high-level functionality for audio and video playback, such as playback controls and the ability to query properties of the currently playing media.

Having a common interface for all “player-like” components means that creating new instances of these objects is straightforward:

val player = ExoPlayer.builder(context).build() val session = MediaSession.Builder(context, player).build() val controller = MediaController.Builder(context, session.token).build()

Media3's MediaSession and MediaController will automatically reflect the state of the components they're connected to. As a result, you can also simplify your app’s architecture by removing connectors like ExoPlayer’s MediaSessionConnector and more easily follow the flow of logic through your app. Calling play() on the MediaController will forward the action to the MediaSession, which will then forward it to the player.

Similarly, Media3 aims to make background playback cases easier to handle. The PlayerNotificationManager from ExoPlayer is no longer needed, as Media3’s MediaSessionService and MediaLibraryService automatically handle publishing a media notification as needed. The library handles configuring, starting, and stopping a foreground service for you as needed, but please also note some known issues summarized in this comment.

ExoPlayer is deprecated, long live ExoPlayer!

ExoPlayer has a new home and is the default implementation of the aforementioned Player interface in Media3. The standalone ExoPlayer project, with package name com.google.android.exoplayer2, will soon be discontinued, and future updates will be published in Media3. For the next few months, we’ll continue publishing equivalent releases of both Media3 and ExoPlayer to help you make the transition to Media3. For example, this means that ExoPlayer 2.18.5 and ExoPlayer in Media3 1.0.0 are identical aside from their package names. However, this is only temporary and we will deprecate the standalone ExoPlayer later this year, so we highly recommend migrating to Media3 as soon as possible. The “Migrating to Media3” section below describes the process in more detail, which includes a script that handles most of the work for you.

Note that Media3 is developed with the same philosophy as ExoPlayer (and in fact, is developed by the same team!). In other words, Media3 retains ExoPlayer’s customizable components, open source development on GitHub, receptivity to pull requests, and public issue tracker, to name a few similarities.

Migrating to Media3

As mentioned previously, the standalone ExoPlayer project, with package name com.google.android.exoplayer2, will soon be discontinued, so to continue receiving updates, you will need to migrate to Media3 ExoPlayer. Other Media APIs that should be migrated to Media3 include, but are not limited to, MediaSessionConnectorMediaBrowserServiceCompat, and MediaBrowserCompat.

We’ve prepared two key resources to help you achieve this migration as smoothly as possible:

  1. migration guide to walk you through the process step-by-step
  2. migration script to convert your standalone ExoPlayer project packages to the corresponding new modules and packages under Media3

The good news is that if you’re currently using ExoPlayer, there’s no need for any code changes and no need to re-integrate or re-write any customizations. The standalone ExoPlayer and Media3 ExoPlayer are identical aside from the package name, and the conversion can be done automatically with the aforementioned migration script. Just make sure you’ve updated your project to use the latest version of ExoPlayer before getting started. For full details and steps, please refer to the migration guide.

Furthermore, since Media3 is fully backwards-compatible with prior media APIs such as MediaControllerCompat and MediaMetadataCompat, your existing integrations will continue to work as before even after the migration. Note that new features such as per-controller customization of commands are only available for clients using Media3. That is to say, for example, all legacy controllers, such as MediaControllerCompat, will receive the same set of available commands. You can identify a legacy controller by checking if getControllerVersion() returns 0 in the MediaSession.ControllerInfo.

The power of Media3, in the palm of your hand

Media3 offers several options for you to adjust its behavior to better fit your needs. The next few sections describe some such mechanisms.

Play it your own way

Although ExoPlayer is the recommended Player implementation to use for audio and video streaming apps, Media3 also introduces the SimpleBasePlayer to minimize the number of methods you need to implement to integrate with a custom player. Start by implementing the getState method. This is where you can declare the Command set supported by your player and configure metadata such as the currently playing media item index and the current timestamp.

class CustomPlayer : SimpleBasePlayer(looper) { override fun getState(): State { // Set available Commands // Configure playWhenReady, mediaItemIndex, currentPosition, etc. } // Implement methods required by available Commands }

The SimpleBasePlayer class will enforce valid player state and handle informing listeners of state changes. Additionally, any methods related to a Command you don’t declare as available are ignored, so beyond getState, you only need to implement the methods that will actually be used.

Better control over your commands

The MediaSession and MediaController APIs have also been updated to give you more control. With Media3, you can advertise your app’s playback capabilities on a per-controller basis. Modify the commands available to a client app in the onConnect method of your MediaSession.Callback. For example, to prevent a client app with package name com.example.myClient from having access to the “seek to next media item” Player.Command:

var sessionCallback = object : MediaSession.Callback { override fun onConnect( session: MediaSession, controller: MediaSession.ControllerInfo ): MediaSession.ConnectionResult { val connectionResult = super.onConnect(session, controller) if (controller.packageName == "com.example.myClient") { val availablePlayerCommands = connectionResult.availablePlayerCommands.buildUpon() .remove(Player.COMMAND_SEEK_TO_NEXT_MEDIA_ITEM) // Disallow myClient from being able to skip to the next media item .build() return MediaSession.ConnectionResult.accept( connectionResult.availableSessionCommands, availablePlayerCommands ) } return connectionResult // Other clients retain normal command access } } var mediaSession = MediaSession.Builder(context, player) .setCallback(sessionCallback) // Remember to set the callback on your MediaSession! .build()

Creating custom commands

Of course, as with the previous media APIs, you can add custom commands tailored to your app. To implement a custom command, create a new SessionCommand. Similar to as shown above, you can give controllers access to this custom command by including it in the list of available session commands. You can handle custom command behavior in the onCustomCommand method of the same Callback:

override fun onCustomCommand( session: MediaSession, controller: MediaSession.ControllerInfo, customCommand: SessionCommand, args: Bundle ): ListenableFuture<SessionResult> { if (customCommand.customAction == MY_CUSTOM_COMMAND) { // Do custom action return Futures.immediateFuture(SessionResult(SessionResult.RESULT_SUCCESS)) } // Return error for invalid custom command return Futures.immediateFuture(SessionResult(SessionResult.RESULT_ERROR_BAD_VALUE)) }

You can also ask client apps to display your custom command by including it in a setCustomLayout call in the onPostConnect method of the MediaSession.Callback.

Next steps

We’d love for you to start using Media3 in your app! 

To start exploring the library, feel free to check out the demo app to see an example of audio and video playback, including how to integrate with a media session. Stay tuned to our developer guides for more detailed guidance on the different components in Media3 landing soon. Our sample app, the Universal Android Music Player, and our testing tool, the Media Controller Test app, will also be updated to Media3 on their main branches in the coming weeks.

If you run into any issues, have any feature requests, or would like to share any other sort of feedback, please let us know using the Media3 issue tracker on GitHub. We look forward to hearing from you!

Build your first AppSheet app: how I built a food tracker

Posted by Filipe Gracio, PhD - Customer Engineer

I keep forgetting what I have in the freezer. At first I used Google Sheets to keep track of it, but I wanted something that was easy to consult and update on my smartphone. So I turned to AppSheet! Here’s a tutorial to follow to make a similar tracking solution.

Creating the database

First I created a database that imported my data from the Sheet:

A cropped screen shot illustrating creating a database in AppSheet by importing data from sheets

After I selected “Import from Sheets” and selected the sheet I was cumbersomely maintaining, I get the preview of the new database:

A cropped screen shot illustrating creating a database in AppSheet by importing data from sheets

Creating the App

Then I can go back and create an App:

A cropped screen shot illustrating creating an app

After I name it I can select the database I just created.

A cropped screen shot illustrating step 1 of selecting the database

Then

A cropped screen shot illustrating step 2 of selecting the database

The App now starts getting created, and then I can start customizing it!

Customizing the App:

I decided I want to actually add more information to the App. For example, I want to categorize my items, so I need another column. I can edit the data for this and I'll add a column “Category”.

A cropped screen shot illustrating editing the data

After adding the extra column, this is the result:

A cropped screen shot showing the data with the new column added

That’s going to come in handy later for presentation and organization!

Now let's do some configuration about how the items are presented on the actual app. That’s in the UX section of the App builder. I want to select “Table”, Group by “category” and then sort alphabetically by “Item”

A cropped screen shot showing The Pirmary views in the UX Section of the App builder

After tweaking a few more options in UX “Brand” and “Format Rules”, this is how my app is looks:

A screen shot of the app on a mobile device displaying with content from the original dataset

Using the App - adding and updating items.

Now, I can see what I have in the freezer at all times. If I cook something and have a leftover, I can just add it by clicking the + button. After that, I just need to add in the info:

A screen shot illustrating the functionality of the app on a mobile device

And of course, if I use something I can just tap on it to edit the amount (or delete it).

Try it yourself!

This small App is something I use every week now! It is much easier than my old method, plus I learned how to use AppSheet. And this was just a quite simple use case - which only touched the tip of the iceberg of AppSheet’s features. If you work for organizations that have information to share and organize, this technology could be useful for you.

Try it out for yourself: you can use the complete set of AppSheet features at no cost while building one or many app prototypes. You can also invite up to 10 test users at no cost to use your apps and share feedback.

Thank you to my colleague Florian Opitz, Customer Engineer - Google Workspace + Security , for his useful edits and suggestions.

Google Dev Library Letters: 19th Edition

Posted by the Dev Library team

In this newsletter, we’re highlighting the best projects developed with Google technologies that have been contributed to the Google Dev Library platform. We hope this will spark some inspiration for your next project!


Contributions of the Month


[ML] Serving Stable Diffusion by Chansung Park

Learn the various ways to deploy Stable Diffusion with TensorFlow Serving, Hugging Face Endpoint, and FastAPI.


[ML] Textual inversion pipeline for Stable Diffusion by Chansung Park

Dive into this repository which demonstrates how to manage multiple models and their prototype applications of fine-tuned Stable Diffusion on new concepts by Textual Inversion.

Read more on DevLibrary 


[Flutter] Animated soccer rating hexagon by Prateek Sharma

Create a hexagon widget in Flutter that displays the ratings of a soccer player or team. The six sides represent a different aspect of the player or team's rating such as speed, strength, and accuracy.

Read more on DevLibrary 


Android & Kotlin


Mastering Kotlin Coroutines by Amit Shekhar

Dive into an introduction to coroutines in Kotlin programming language. Coroutines are a way to write asynchronous and non-blocking code in a sequential and easy-to-understand manner.

Kotlin Symbol Processing (KSP) for code generation by Tim Lin

Discover more about KSP API you can use to develop lightweight compiler plugins, which helps you get the complete source code information during compile time.

Form Conductor by Naing Aung Luu

Learn about form conductor. More than form validation, it provides a handful of reusable API to construct a form in simple easy steps.

MovieDB by Gabriel Bronzatti Moro

Discover how to fetch data from Movie DB API and allow users to search for movies and view details and store them on a local database in this Android project.


Angular


A complete guide to Angular Multilingual Application by Hossein Mousavi

Dive into the technical aspects of building a multilingual Angular application, starting with the localization of the application's text.


Flutter


Bank cards UI by Ethiel Adiassa

See how Flutter can be used to create aesthetically pleasing and functional UI designs for banking applications.

macOS UI by Reuben Turner

Dive into the repo resource for designers and developers looking to create beautiful templates and tutorials to create macOS applications and interfaces.


Google Cloud


Search for Brazilian laws using Dialogflow CX and matching engine by Rubens Zimbres

Develop a chatbot using Dialogflow CX and a matching engine to help users search for something specific in legislation.

Awesome CloudOps automation by Doug Sillars

Learn how a single repository could satisfy all your day-to-day CloudOps automation needs.

Serverless Kubernetes on Google Cloud Platform by Gursimar Singh

Learn how serverless technologies like Cloud Run can be used to simplify and expedite the process of designing software applications.

Implement secure CI/CD with Workload Identity Federation, GitLab CI, and Cloud Deploy by Ezekias Bokove

See how to implement a secure Continuous Integration/Continuous Deployment (CI/CD) pipeline using Workload Identity Federation and GitLab CI.


Google Dev Library Letters: 18th Edition

Posted by the Dev Library Team

In this newsletter, we’re highlighting the best projects developed with Google technologies that have been contributed to the Google Dev Library platform. We hope this will spark some inspiration for your next project!


Contributions of the month


Moving image showing SSImagePicker in different modes

[Android] SSImagePicker by Simform

See how to use a lightweight and easy-to-use image picker library that has features like cropping, compression and rotation, video, and Live Photos support.



Moving image showing overview of coroutines

[Kotlin] Mastering Coroutines in Kotlin by Reyhaneh Ezatpanah

Dive into a comprehensive overview of coroutines including tips and best practices, along with a detailed explanation of the different types of coroutines available in Kotlin and how to use them effectively.

Read more on DevLibrary


Flow Chart demonstrating Image to Image stable diffusion in Flax

[Machine Learning] Image2Image with Stable Diffusion in Flax by Bachir Chihani

Learn the uses of the Diffusion method, a technique used to improve the stability and performance of image-to-image translation models.

Read more on DevLibrary


Android


Jetpack Compose state, deconstructed by Yves Kalume

Learn how state management in Jetpack Compose is implemented, how it can be used to build a responsive and dynamic UI, and how it compares to other solutions in Android development.


Dynamic environment switching on Android by Ashwini Kumar

Find out how to switch between different environments (such as development, staging, and production) in an Android app.


Migration to Jetpack Compose for a legacy application by Abhishek Saxena

Migrate an existing legacy Android application to Jetpack Compose, a modern UI toolkit for building native Android apps



Machine Learning


Simple diffusion in TensorFlow by Bachir Chihani

Understand the benefits of using TensorFlow for image processing, including the ability to easily parallelize computations and utilize GPUs for faster processing.


Deep dive into stable diffusion by Bachir Chihani

Look into the Flax implementation of the Stable Diffusion model to better understand how it works.


Create-tf-app by Radi Cho

See the tool that allows you to quickly create a TensorFlow application by generating the necessary code and file structure.

 

Angular


NGX-Valdemort by Cédric Exbrayat

Dive into a set of pre-built validation rules and error messages for commonly encountered use cases, making it easy to quickly implement robust form validation for your application.


Passing configuration dynamically from one module to another using ModuleWithProviders by Madhusuthanan B

Learn how to pass configuration data dynamically between modules in an Angular application.


Flutter


Mastering Dart & Flutter DevTools by Ashita Prasad

Look at the first part of the series aimed at helping developers to understand how to use the tools effectively to build applications with Dart and Flutter.


Server-driven UI in Flutter - an experiment on remote widgets by Akshat Vinaybhai Patel

Learn the insights, code snippets and results of the experiment for readers to better understand the concept of Server-Driven UI and its potential in Flutter app development.


Flutter Photo Manager by Alex Li

Learn an easy-to-use API for accessing the device's photo library, that performs operations like retrieving images, videos, and albums, as well as deleting, creating, and updating files in the photo library.


Firebase


How to authenticate to Firebase using email and password in Jetpack Compose? By Alex Mamo 

Here’s a simple solution for implementing Firebase Authentication with email and password, using a clean architecture with Jetpack Compose on Android.


Google Cloud


Google Firestore Data Source plugin for Grafana by Prasanna Kumar

Learn how it allows users to perform operations like querying, aggregating, and visualizing data from Firestore, making it a powerful tool for monitoring and analyzing real-time data in a variety of applications. The repository provides the source code for the plugin and documentation on how to install and use it with Grafana.


Cluster cloner by Joshua Fox

See how this project aims to replicate clusters across different cloud environments and examine these varying infrastructure models.


Getting to know Cloud Firestore by Mustapha Adekunle

Learn how this post covers the basic features and benefits of Cloud Firestore, and how this document database is a scalable and versatile NoSQL cloud database.


Google’s Mandar Chaphalkar has submitted Data Governance with Dataplex

Discover how Dataplex can be used to transform data to meet specific business requirements, and how it can integrate with other Google Cloud services like BigQuery for efficient data storage and analysis.