Category Archives: Android Developers Blog

Chrome on Android to support third-party autofill services natively

Posted by Eiji Kitamura – Developer Advocate

Chrome on Android will soon allow third-party autofill services (like password managers) to natively autofill forms on websites. Developers of these services need to tell their users to toggle a setting in Chrome to continue using their service with Chrome.


Background

Google is the default autofill service on Chrome, providing passwords, passkeys and autofill for other information like addresses and payment data.

A third-party password manager can be set as the preferred autofill service on Android through System Settings. The preferred autofill service can fill across all Android apps. However, to autofill forms on Chrome, the autofill service needs to use "compatibility mode". This causes glitches in Chrome, such as janky page scrolling, and can show duplicate suggestions from both Google and the third-party service.

With this coming change, Chrome on Android will allow third-party autofill services to natively autofill forms, giving users a smoother and simpler experience. Third-party autofill services can autofill passwords, passkeys and other information like addresses and payment data, just as they would in other Android apps.


Try the feature yourself

You can already test the functionality on Chrome 131 and later. First, set a third-party autofill service as preferred in Android 14:

Note: Instructions may vary by device manufacturer. The below steps are for a Google Pixel device running Android 15.
    1. Open Android's System Settings
    2. Select Passwords, passkeys & accounts
    3. Tap on Change button under Preferred service
    4. Select a preferred service
    5. Confirm changing the preferred autofill service

Side by side screenshots show the steps involved in enabling third-party autofill service from your device: first tap 'Change', then select the new service, and finally confirm the change.

Second, enable the third-party autofill service in Chrome:

    1. Open Chrome on Android
    2. Open chrome://flags#enable-autofill-virtual-view-structure
    3. Set the flag to "Enabled" and restart
    4. Open Chrome's Settings and tap Autofill Services
    5. Choose Autofill using another service
    6. Confirm and restart Chrome
Note: Steps 2 and 3 are not necessary after Chrome 131. Chrome 131 is scheduled to be stable on November 12th, 2024.
Side by side screenshots show the steps involved in changing your preferred password service on a smartphone: first tap 'Autofill Services', then select 'Autofill using another service', and finally restart Chrome to complete setup.

You can emulate how Chrome behaves after compatibility mode is disabled by updating chrome://flags#suppress-autofill-via-accessibility to Enabled.

Actions required from third-party autofill services

On the implementation side, autofill service developers don't need any additional work as long as their service already integrates properly with the Android autofill framework. Chrome will respect that integration and autofill forms accordingly.
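For context, that framework integration is the standard AutofillService entry point. The sketch below is a minimal, hypothetical service skeleton (the class name and the empty responses are ours, purely illustrative) showing the callbacks Chrome invokes once the user opts in:

import android.os.CancellationSignal
import android.service.autofill.AutofillService
import android.service.autofill.FillCallback
import android.service.autofill.FillRequest
import android.service.autofill.SaveCallback
import android.service.autofill.SaveRequest

class MyAutofillService : AutofillService() {

    override fun onFillRequest(
        request: FillRequest,
        cancellationSignal: CancellationSignal,
        callback: FillCallback
    ) {
        // Chrome (like any other app) supplies the page's view structure here.
        val structure = request.fillContexts.last().structure
        // ... traverse the structure, match fields against your credential store,
        // and build a FillResponse with one or more datasets ...

        // Returning null means "no suggestions for this form".
        callback.onSuccess(null)
    }

    override fun onSaveRequest(request: SaveRequest, callback: SaveCallback) {
        // Persist the values the user just submitted, then acknowledge.
        callback.onSuccess()
    }
}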

Chrome plans to stop supporting compatibility mode in early 2025. Users must select Autofill using another service in Chrome settings to ensure their autofill experience is unaffected. The new setting is available in Chrome 131. You should encourage your users to toggle the setting, to ensure they have the best autofill experience possible with your service and Chrome on Android.


Timeline

    • October 16th, 2024: Chrome 131 beta is available
    • November 12th, 2024: Chrome 131 is in stable
    • Early 2025: Compatibility mode is no longer available on Chrome

Creating a responsive dashboard layout for JetLagged with Jetpack Compose

Posted by Rebecca Franks - Developer Relations Engineer

This blog post is part of our series: Adaptive Spotlight Week where we provide resources—blog posts, videos, sample code, and more—all designed to help you adapt your apps to phones, foldables, tablets, ChromeOS and even cars. You can read more in the overview of the Adaptive Spotlight Week, which will be updated throughout the week.


We’ve heard the news: creating adaptive layouts in Jetpack Compose is easier than ever. As a declarative UI toolkit, Jetpack Compose is well suited for designing and implementing layouts that adjust themselves to render content differently across a variety of sizes. By using logic coupled with Window Size Classes, Flow layouts, movableContentOf and LookaheadScope, we can ensure fluid responsive layouts in Jetpack Compose.

Following the release of the JetLagged sample at Google I/O 2023, we decided to add more examples to it. Specifically, we wanted to demonstrate how Compose can be used to create a beautiful dashboard-like layout. This article shows how we’ve achieved this.

Moving image demonstrating responsive design in Jetlagged where items animate positions automatically
Responsive design in Jetlagged where items animate positions automatically

Use FlowRow and FlowColumn to build layouts that respond to different screen sizes

Flow layouts (FlowRow and FlowColumn) make it much easier to implement responsive, reflowing layouts that respond to screen sizes and automatically flow content to a new line when the available space in a row or column is full.

In the JetLagged example, we use a FlowRow with maxItemsInEachRow set to 3. This maximizes the space available for the dashboard and places each individual card in a row or column where space is used wisely. On mobile devices we mostly have one card per row; only when the items are smaller are two visible per row.

Some cards use Modifiers that don’t specify an exact size, allowing the cards to grow to fill the available width (for instance Modifier.widthIn(max = 400.dp)), while others set a fixed size, like Modifier.width(200.dp).

FlowRow(
    modifier = Modifier.fillMaxSize(),
    horizontalArrangement = Arrangement.Center,
    verticalArrangement = Arrangement.Center,
    maxItemsInEachRow = 3
) {
    Box(modifier = Modifier.widthIn(max = 400.dp))
    Box(modifier = Modifier.width(200.dp))
    Box(modifier = Modifier.size(200.dp))
    // etc 
}

We could also leverage the weight modifier to divide up the remaining area of a row or column; check out the documentation on item weights for more information.
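As an illustrative sketch (not from the JetLagged sample), weights inside a FlowRow could look like this, assuming the foundation-layout Flow APIs where FlowRowScope inherits RowScope's weight modifier:

FlowRow(
    modifier = Modifier.fillMaxWidth(),
    maxItemsInEachRow = 2
) {
    // Each Box takes an equal share of the remaining width in its row.
    Box(modifier = Modifier.weight(1f).height(100.dp))
    Box(modifier = Modifier.weight(1f).height(100.dp))
}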


Use WindowSizeClasses to differentiate between devices

WindowSizeClasses are useful for building up breakpoints in our UI for when elements should display differently. In JetLagged, we use the classes to know whether we should include cards in Columns or keep them flowing one after the other.
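How you obtain windowSizeClass depends on which library you use; one common option (an assumption here, the JetLagged sample may wire this up differently) is the material3-window-size-class artifact:

class MainActivity : ComponentActivity() {
    @OptIn(ExperimentalMaterial3WindowSizeClassApi::class)
    override fun onCreate(savedInstanceState: Bundle?) {
        super.onCreate(savedInstanceState)
        setContent {
            // widthSizeClass is Compact, Medium or Expanded.
            val windowSizeClass = calculateWindowSizeClass(this).widthSizeClass
            // Pass windowSizeClass down to your dashboard composable, e.g.
            // JetLaggedScreen(windowSizeClass = windowSizeClass)
        }
    }
}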

For example, if the width size class is WindowWidthSizeClass.COMPACT, we keep items in the same FlowRow, whereas if the layout is larger than compact, they are placed in a FlowColumn nested inside a FlowRow:

            FlowRow(
                modifier = Modifier.fillMaxSize(),
                horizontalArrangement = Arrangement.Center,
                verticalArrangement = Arrangement.Center,
                maxItemsInEachRow = 3
            ) {
                JetLaggedSleepGraphCard(uiState.value.sleepGraphData)
                if (windowSizeClass == WindowWidthSizeClass.COMPACT) {
                    AverageTimeInBedCard()
                    AverageTimeAsleepCard()
                } else {
                    FlowColumn {
                        AverageTimeInBedCard()
                        AverageTimeAsleepCard()
                    }
                }
                if (windowSizeClass == WindowWidthSizeClass.COMPACT) {
                    WellnessCard(uiState.value.wellnessData)
                    HeartRateCard(uiState.value.heartRateData)
                } else {
                    FlowColumn {
                        WellnessCard(uiState.value.wellnessData)
                        HeartRateCard(uiState.value.heartRateData)
                    }
                }
            }

From the above logic, the UI will appear in the following ways on different device sizes:

Side by side comparisons of the differences in UI on three different sized devices
Different UI on different sized devices

Use movableContentOf to maintain bits of UI state across screen resizes

Movable content allows you to save the contents of a Composable to move it around your layout hierarchy without losing state. It should be used for content that is perceived to be the same - just in a different location on screen.

Imagine this: you are moving house to a different city, and you pack a box with a clock inside it. When you open the box in your new home, the clock is still ticking from where it left off. It might not show the correct time in your new timezone, but it has definitely ticked on from where you left it. The contents of the box don’t reset their internal state when the box is moved around.

What if you could use the same concept in Compose to move items on screen without losing their internal state?

Take the following scenario into account: Define different Tile composables that display an infinitely animating value between 0 and 100 over 5000ms.


@Composable
fun Tile1() {
    val repeatingAnimation = rememberInfiniteTransition()

    val float = repeatingAnimation.animateFloat(
        initialValue = 0f,
        targetValue = 100f,
        animationSpec = infiniteRepeatable(repeatMode = RepeatMode.Reverse,
            animation = tween(5000))
    )
    Box(modifier = Modifier
        .size(100.dp)
        .background(purple, RoundedCornerShape(8.dp))){
        Text("Tile 1 ${float.value.roundToInt()}",
            modifier = Modifier.align(Alignment.Center))
    }
}

We then display them on screen using a Column layout, showing the infinite animations as they go:
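A minimal sketch of that Column layout (our own illustration using the Tile composables above, not code taken from the sample) could be:

@Composable
fun ColumnTilesDemo() {
    // Both tiles keep their own infinite animations running independently.
    Column {
        Tile1()
        Tile2()
    }
}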

A purple tile stacked in a column above a pink tile. Both tiles show a counter, counting up from 0 to 100 and back down to 0

But what if we wanted to lay the tiles out differently based on whether the phone is in a different orientation (or has a different screen size), without the animation values stopping or restarting? Something like the following:

@Composable
fun WithoutMovableContentDemo() {
    val mode = remember {
        mutableStateOf(Mode.Portrait)
    }
    if (mode.value == Mode.Landscape) {
        Row {
           Tile1()
           Tile2()
        }
    } else {
        Column {
           Tile1()
           Tile2()
        }
    }
}

This looks pretty standard, but running it on a device, we can see that switching between the two layouts causes our animations to restart.

A purple tile stacked in a column above a pink tile. Both tiles show a counter, counting upward from 0. The column changes to a row and back to a column, and the counter restarts every time the layout changes

This is the perfect case for movable content: the same composables stay on screen, they are just in a different location. So how do we use it? We define our tiles in a movableContentOf block, using remember to ensure it's saved across compositions:

val tiles = remember {
    movableContentOf {
        Tile1()
        Tile2()
    }
}

Now instead of calling our composables again inside the Column and Row respectively, we call tiles() instead.

@Composable
fun MovableContentDemo() {
    val mode = remember {
        mutableStateOf(Mode.Portrait)
    }
    val tiles = remember {
        movableContentOf {
            Tile1()
            Tile2()
        }
    }
    Box(modifier = Modifier.fillMaxSize()) {
        if (mode.value == Mode.Landscape) {
            Row {
                tiles()
            }
        } else {
            Column {
                tiles()
            }
        }

        Button(onClick = {
            if (mode.value == Mode.Portrait) {
                mode.value = Mode.Landscape
            } else {
                mode.value = Mode.Portrait
            }
        }, modifier = Modifier.align(Alignment.BottomCenter)) {
            Text("Change layout")
        }
    }
}

This will then remember the nodes generated by those Composables and preserve the internal state that these composables currently have.

A purple tile stacked in a column above a pink tile. Both tiles show a counter, counting upward from 0 to 100. The column changes to a row and back to a column, and the counter continues seamlessly when the layout changes

We can now see that our animation state is remembered across the different compositions. Our clock in the box will now keep state when it's moved around the world.

Using this concept, we can keep the animating bubble state of our cards, by placing the cards in movableContentOf:

val timeSleepSummaryCards = remember {
    movableContentOf {
        AverageTimeInBedCard()
        AverageTimeAsleepCard()
    }
}

LookaheadScope {
    FlowRow(
        modifier = Modifier.fillMaxSize(),
        horizontalArrangement = Arrangement.Center,
        verticalArrangement = Arrangement.Center,
        maxItemsInEachRow = 3
    ) {
        //..
        if (windowSizeClass == WindowWidthSizeClass.Compact) {
            timeSleepSummaryCards()
        } else {
            FlowColumn {
                timeSleepSummaryCards()
            }
        }
        //..
    }
}

This allows the cards' state to be remembered, and the cards won't be recomposed. This is evident when observing the bubbles in the background of the cards: on resizing the screen, the bubble animation continues without restarting.

A purple tile showing Average time in bed stacked in a column above a green tile showing average time sleep. Both tiles show moving bubbles. The column changes to a row and back to a column, and the bubbles continue to move across the tiles as the layout changes

Use Modifier.animateBounds() to have fluid animations between different window sizes

From the above example, we can see that state is maintained between changes in layout size (or layout itself), but the transition between the two layouts is a bit jarring. We’d like to animate between the two states smoothly.

In the latest compose-bom-alpha (2024.09.03), there is a new experimental custom Modifier, Modifier.animateBounds(). The animateBounds modifier requires a LookaheadScope.

LookaheadScope enables Compose to perform intermediate measurement passes of layout changes, notifying composables of the intermediate states between them. LookaheadScope is also used for the new shared element APIs, that you may have seen recently.

To use Modifier.animateBounds(), we wrap the top-level FlowRow in a LookaheadScope, and then apply the animateBounds modifier to each card. We can also customize how the animation runs, by specifying the boundsTransform parameter to a custom spring spec:

val boundsTransform = { _ : Rect, _: Rect ->
   spring(
       dampingRatio = Spring.DampingRatioNoBouncy,
       stiffness = Spring.StiffnessMedium,
       visibilityThreshold = Rect.VisibilityThreshold
   )
}


LookaheadScope {
   val animateBoundsModifier = Modifier.animateBounds(
       lookaheadScope = this@LookaheadScope,
       boundsTransform = boundsTransform)
   val timeSleepSummaryCards = remember {
       movableContentOf {
           AverageTimeInBedCard(animateBoundsModifier)
           AverageTimeAsleepCard(animateBoundsModifier)
       }
   }
   FlowRow(
       modifier = Modifier
           .fillMaxSize()
           .windowInsetsPadding(insets),
       horizontalArrangement = Arrangement.Center,
       verticalArrangement = Arrangement.Center,
       maxItemsInEachRow = 3
   ) {
       JetLaggedSleepGraphCard(uiState.value.sleepGraphData, animateBoundsModifier.widthIn(max = 600.dp))
       if (windowSizeClass == WindowWidthSizeClass.Compact) {
           timeSleepSummaryCards()
       } else {
           FlowColumn {
               timeSleepSummaryCards()
           }
       }


       FlowColumn {
           WellnessCard(
               wellnessData = uiState.value.wellnessData,
               modifier = animateBoundsModifier
                   .widthIn(max = 400.dp)
                   .heightIn(min = 200.dp)
           )
           HeartRateCard(
               modifier = animateBoundsModifier
                   .widthIn(max = 400.dp, min = 200.dp),
               uiState.value.heartRateData
           )
       }
   }
}

Applying this to our layout, we can see the transition between the two states is more seamless without jarring interruptions.

A purple tile showing Average time in bed stacked in a column above a green tile showing average time sleep. Both tiles show moving bubbles. The column changes to a row and back to a column, and the bubbles continue to move across the tiles as the layout changes

Applying this logic to the whole dashboard, resizing the layout now gives a fluid UI interaction throughout the whole screen.

Moving image demonstrating responsive design in Jetlagged where items animate positions automatically

Summary

As you can see from this article, using Compose has enabled us to build a responsive dashboard-like layout by leveraging flow layouts, WindowSizeClasses, movable content and LookaheadScope. These concepts can also be used for your own layouts that may have items moving around in them too.

For more information on these different topics, be sure to check out the official documentation. For the detailed changes to JetLagged, take a look at this pull request.

#WeArePlay | NomadHer helps women travel the world

Posted by Robbie McLachlan, Developer Marketing


In our latest film for #WeArePlay, which celebrates the people behind apps and games, we meet Hyojeong - the visionary behind the app NomadHer. She’s aiming to reshape how women explore the world by building a global community: sharing travel tips, prioritizing safety, and connecting with one another to explore new destinations.



What inspired you to create NomadHer?

Honestly, NomadHer was born out of a personal need. I started traveling solo when I was 19 and have visited over 60 countries alone, and while it was an incredibly empowering and enriching journey, it wasn’t always easy—especially as a woman. There was this one moment when I was traveling in Italy that really shook me. I realized just how important it was to have a support system, not just friends or family, but other women who understand what it's like to be out there on your own. That’s when the idea hit me— I wanted to create a space where women could feel safe and confident while seeing the world.


NomadHer Founder Hyojeong Kim from South Korea, smiling, wearing a white t-shirt with green text that reads 'she can travel anywhere'

The focus on connecting women who share similar travel plans is a powerful tool. Can you share feedback from someone who has found travel buddies through NomadHer?

Absolutely! One of my favorite stories comes from a woman who was planning a solo trip to Bali. She connected with another ‘NomadHer’ through the app who had the exact same travel dates and itinerary. They started chatting, and by the time they met up in Bali, it was like they’d known each other forever. They ended up traveling together, trying out new restaurants, exploring hidden beaches, and even taking a surfing class! After the trip, they both messaged us saying how the experience felt safer and more fun because they had each other. It’s stories like these that remind me why I started NomadHer in the first place.

How did Google Play help you grow NomadHer?

We couldn’t connect with the 90,000+ women worldwide without Google Play. We’ve been able to reach people from Latin America, Africa, Asia, and beyond. It’s incredible seeing women connect, share tips, and support each other, no matter where they are. With tools like Firebase, we can track and improve the app experience, making sure everything runs smoothly. Plus, Google Play's startup programs gave us mentorship and visibility, which really helped us grow and expand our reach faster. It’s been a game-changer.

NomadHer on Google Play on a device

What are your hopes for NomadHer in the future?

I want NomadHer to be more than just an app—it’s a movement. My dream is for it to become the go-to platform for women travelers everywhere. I want to see more partnerships with local women entrepreneurs, like the surf shop owner we work with in Busan. We host offline events like the She Can Travel Festival in Busan and I’m excited to host similar events in other countries like Paris, Tokyo, and Delhi. The goal is to continue creating these offline connections to build a community that empowers women, both socially and economically, through partnerships with local female businesses.

Discover more global #WeArePlay stories and share your favorites.



Here’s what’s happening in our latest Spotlight Week: Adaptive Android Apps

Posted by Alex Vanyo - Developer Relations Engineer

Adaptive Spotlight Week

With Android powering a diverse range of devices, users expect a seamless and optimized experience across their foldables, tablets, ChromeOS, and even cars. To meet these expectations, developers need to build their apps with multiple screen sizes and form factors in mind. Changing how you approach UI can drastically improve users' experiences across foldables, tablets, and more, while preventing tech debt that a portrait-only mindset can create – simply put, building adaptive is a great way to help future-proof your app.

The latest in our Spotlight Week series will focus on Building Adaptive Android apps all this week (October 14-18), and we’ll highlight the many ways you can improve your mobile app to adapt to all of these different environments.



Here’s what we’re covering during Adaptive Spotlight Week

Monday: What is adaptive?

October 14, 2024

Check out the new documentation for building adaptive apps and catch up on building adaptive Android apps if you missed it at I/O 2024. Also, learn how adaptive apps can be made available on another new form factor: cars!

Tuesday: Adaptive UIs with Compose

October 15, 2024

Learn the principles for how you can use Compose to build layouts that adapt to available window size and how the Material 3 adaptive library enables you to create list-detail and supporting pane layouts with out-of-the-box behavior.

Wednesday: Desktop windowing and productivity

October 16, 2024

Learn what desktop windowing on Android is, together with details about how to handle it in your app and build productivity experiences that let users take advantage of more powerful multitasking Android environments.

Thursday: Stylus

October 17, 2024

Take a closer look at how you can build powerful drawing experiences across stylus and touch input with the new Ink API.

Friday: #AskAndroid

October 18, 2024

Join us for a live Q&A on making apps more adaptive. During Spotlight Week, ask your questions on X and LinkedIn with #AskAndroid.


These are just some of the ways that you can improve your mobile app’s experience for more than just the smartphone with touch input. Keep checking this blog post for updates. We’ll be adding links and more throughout the week. Follow Android Developers on X and Android by Google on LinkedIn to hear even more about ways to adapt your app, and send in your questions with #AskAndroid.

Introducing Ink API, a new Jetpack library for stylus apps

Posted by Chris Assigbe – Developer Relations Engineer and Tom Buckley – Product Manager

With stylus input, Android apps on phones, foldables, tablets, and Chromebooks become even more powerful tools for productivity and creativity. While there's already a lot to think about when designing for large screens – see our full guidance and inspiration gallery – styluses are especially impactful, transforming these devices into a digital notebook or sketchbook. Users expect stylus experiences to feel as fluid and natural as writing on paper, which is why Android previously added APIs to reduce inking latency to as low as 4ms; virtually imperceptible. However, latency is just one aspect of an inking experience – developers currently need to generate stroke shapes from stylus input, render those strokes quickly, and efficiently run geometric queries over strokes for tools like selection and eraser. These capabilities can require significant investment in geometry and graphics just to get started.

Today, we're excited to share Ink API, an alpha Jetpack library that makes it easy to create, render, and manipulate beautiful ink strokes, enabling developers to build amazing features on top of these APIs. Ink API builds upon the Android framework's foundation of low latency and prediction, providing you with a powerful and intuitive toolkit for integrating rich inking features into your apps.

moving image of a stylus writing with Ink API on a Samsung Tab S8, 4ms showing end-to-end latency
Writing with Ink API on a Samsung Tab S8, 4ms end-to-end latency

What is Ink API?

Ink API is a comprehensive stylus input library that empowers you to quickly create innovative and expressive inking experiences. It offers a modular architecture rather than a one-size-fits-all canvas, so you can tailor Ink API to your app's stack and needs. The modules encompass key functionalities like:

    • Strokes module: Represents the ink input and its visual representation.
    • Geometry module: Supports manipulating and analyzing strokes, facilitating features like erasing and selecting strokes.
    • Brush module: Provides a declarative way to define the visual style of strokes, including color, size, and the type of tool to draw with.
    • Rendering module: Efficiently displays ink strokes on the screen, allowing them to be combined with Jetpack Compose or Android Views.
    • Live Authoring module: Handles real-time inking input to create smooth strokes with the lowest latency a device can provide.

Ink API is compatible with devices running Android 5.0 (API level 21) or later, and offers benefits on all of these devices. It can also take advantage of latency improvements in Android 10 (API 29) and improved rendering effects and performance in Android 14 (API 34).

Why choose Ink API?

Ink API provides an out-of-the-box implementation for basic inking tasks so you can create a unique drawing experience for your own app. Ink API offers several advantages over a fully custom implementation:

    • Ease of Use: Ink API abstracts away the complexities of graphics and geometry, allowing you to focus on your app's unique inking features.
    • Performance: Built-in low latency support and optimized rendering ensure a smooth and responsive inking experience.
    • Flexibility: The modular design allows you to pick and choose the components you need, tailoring the library to your specific requirements.

Ink API has already been adopted across many Google apps because of these advantages, including for markup in Docs and Circle-to-Search; and the underlying technology also powers markup in Photos, Drive, Meet, Keep, and Classroom. For Circle to Search, the Ink API modular design empowered the team to utilize only the components they needed. They leveraged the live authoring and brush capabilities of Ink API to render a beautiful stroke as users circle (to search). The team also built custom geometry tools tailored to their ML models. That’s modularity at its finest.

moving image of a stylus writing with Ink API on a Samsung Tab S8, 4ms showing end-to-end latency

“Ink API was our first choice for Circle-to-Search (CtS). Utilizing their extensive documentation, integrating the Ink API was a breeze, allowing us to reach our first working prototype w/in just one week. Ink's custom brush texture and animation support allowed us to quickly iterate on the stroke design.” 

- Jordan Komoda, Software Engineer, Google

We have also designed Ink API with our Android app partners' feedback in mind to make sure it fits with their existing app architectures and requirements.

With Ink API, building a natural and fluid inking experience on Android is simpler than ever. Ink API lets you focus on what differentiates your experience rather than on the details of paths, meshes, and shaders. Whether you are exploring inking for note-taking, photo or document markup, interactive learning, or something completely different, we hope you’ll give Ink API a try!

Get started with Ink API

Ready to dive into the well of Ink API? Check out the official developer guide and explore the API reference to start building your next-generation inking app. We're eager to see the innovative experiences you create!

Note: This alpha release is just the beginning for Ink API. We're committed to continuously improving the library, adding new features and functionalities based on your feedback. Stay tuned for updates and join us in shaping the future of inking on Android!

Gemini API in action: showcase of innovative Android apps

Posted by Thomas Ezan, Sr Developer Relation Engineer

With the advent of Generative AI, Android developers now have access to capabilities that were previously out of reach. For instance, you can now easily add image captioning to your app without any computer vision knowledge.

With the upcoming launch of the stable version of Vertex AI in Firebase in a few weeks (available in beta since Google I/O), you'll be able to effortlessly incorporate the capabilities of Gemini 1.5 Flash and Gemini 1.5 Pro into your app. The inference runs on Google's servers, making these capabilities accessible to any device with an internet connection.

Several Android developers have already begun leveraging this technology. Let's explore some of their creations.


Generate your meals for the week

The team behind Meal Planner, a meal planner and shopping list management app, is leveraging Gemini 1.5 Flash to create original meal plans. Based on the user’s diet, the number of people they are cooking for, and any food allergies or intolerances, the app automatically creates a meal plan for the selected week.

For each dish, the model lists ingredients and quantities, taking into account the number of portions. It also provides instructions on how to prepare it. The app automatically generates a shopping list for the week based on the ingredient list for each meal.

moving image of Meal Planner app user experience

To enable reliable processing of the model’s response and to integrate it in the app, the team leveraged Gemini's JSON mode. They specified responseMimeType = "application/json" in the model configuration and defined the expected JSON response schema in the prompt (see API documentation).
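With the Vertex AI in Firebase Kotlin SDK, that setup looks roughly like the sketch below; the schema in the prompt is an illustrative placeholder, not Meal Planner's actual schema.

val generativeModel = Firebase.vertexAI.generativeModel(
    modelName = "gemini-1.5-flash",
    generationConfig = generationConfig {
        // Ask the model to return raw JSON instead of prose.
        responseMimeType = "application/json"
    }
)

// The expected JSON schema is described directly in the prompt.
val prompt = """
    Generate a weekly meal plan as JSON matching this schema:
    {"meals": [{"name": string, "ingredients": [string], "instructions": string}]}
""".trimIndent()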

Following the launch of the meal generation feature, Meal Planner received overwhelmingly positive feedback. The feature simplified meal planning for users with dietary restrictions and helped reduce food waste. In the months after its introduction, Meal Planner experienced a 17% surge in premium users.


Journaling while chatting with Leo

A few months ago, the team behind the journal app Life wanted to provide an innovative way to let their users log entries. They created "Leo", an AI diary assistant that chats with users and converts conversations into journal entries.

To modify the behavior of the model and the tone of its responses, the team used system instructions to define the chatbot persona. This lets the user set the behavior and tone of the assistant: pick “Professional and formal” and the model keeps the conversation strict; select “Friendly and cheerful” and it lightens up the dialogue with lots of emojis!
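Using the Vertex AI in Firebase SDK, a persona like this is set through system instructions when the model is initialized; the sketch below is illustrative (Life lets users pick the tone themselves):

val generativeModel = Firebase.vertexAI.generativeModel(
    modelName = "gemini-1.5-flash",
    systemInstruction = content {
        // Illustrative persona text; swap it based on the user's chosen tone.
        text("You are Leo, a friendly and cheerful diary assistant. Keep the tone light and use emojis.")
    }
)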

moving image of Leo app user experience

The team saw an increase in user engagement following the launch of the feature.

And if you want to know more about how the Life developers used the Gemini API in their app, we had a great conversation with Jomin from the team. This conversation is part of a new Android podcast series called Android Build Time, which you can also watch on YouTube.


Create a nickname on the fly

The HiiKER app provides offline hiking maps. The app also fosters a community by letting users rate trails and leave comments. But users signing up don’t always add a username to their profile. To avoid lowering the conversion rate by making username selection mandatory at signup, the team decided to use the Gemini API to suggest unique usernames based on the user’s country or area.

moving image of HiiKER app user experience

To generate original usernames, the team set a high temperature value and played with the top-K and top-P values to increase the creativity of the model.
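Those sampling parameters are part of the generation configuration; a sketch with illustrative values (not HiiKER's actual settings) might look like this:

val generativeModel = Firebase.vertexAI.generativeModel(
    modelName = "gemini-1.5-flash",
    generationConfig = generationConfig {
        temperature = 1.0f  // higher temperature -> more creative output
        topK = 40           // sample from the 40 most likely tokens
        topP = 0.95f        // nucleus sampling threshold
    }
)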

This AI-assisted feature led to a significant lift in the percentage of users with "complete" profiles, contributing to a positive impact on engagement and retention.

It’s time to build!

Generative AI is still a very new space, and we are just starting to get easy access to these capabilities on Android. Whether it's enabling advanced personalization, creating delightful interactive experiences, or simplifying signup, you might have unique challenges that you are trying to solve as an Android app developer. It is a great time to start looking at these challenges as opportunities that generative AI can help you tackle!


You can learn more about the advanced features of the Gemini Cloud models, find an introduction to generative AI for Android developers, and get started with Vertex AI in Firebase documentation.

To learn more about AI on Android, check out other resources we have available during AI in Android Spotlight Week.

Use #AndroidAI hashtag to share your creations or feedback on social media, and join us at the forefront of the AI revolution!

Advanced capabilities of the Gemini API for Android developers

Posted by Thomas Ezan, Sr Developer Relation Engineer

Thousands of developers across the globe are harnessing the power of the Gemini 1.5 Pro and Gemini 1.5 Flash models to infuse advanced generative AI features into their applications. Android developers are no exception, and with the upcoming launch of the stable version of VertexAI in Firebase in a few weeks (available in Beta since Google I/O), it's the perfect time to explore how your app can benefit from it. We just published a codelab to help you get started.

Let's deep dive into some advanced capabilities of the Gemini API that go beyond simple text prompting and discover the exciting use cases they can unlock in your Android app.

Shaping AI behavior with system instructions

System instructions serve as a "preamble" that you incorporate before the user prompt. This enables shaping the model's behavior to align with your specific requirements and scenarios. You set the instructions when you initialize the model, and then those instructions persist through all interactions with the model, across multiple user and model turns.

For example, you can use system instructions to:

    • Define a persona or role for a chatbot (e.g., “explain like I am 5”)
    • Specify the output format for the response (e.g., Markdown, YAML, etc.)
    • Set the output style and tone (e.g., verbosity, formality, etc.)
    • Define the goals or rules for the task (e.g., “return a code snippet without further explanation”)
    • Provide additional context for the prompt (e.g., a knowledge cutoff date)

To use system instructions in your Android app, pass them as a parameter when you initialize the model:

val generativeModel = Firebase.vertexAI.generativeModel(
  modelName = "gemini-1.5-flash",
  ...
  systemInstruction = 
    content { text("You are a knowledgeable tutor. Answer the questions using the socratic tutoring method.") }
)

You can learn more about system instruction in the Vertex AI in Firebase documentation.

You can also easily test your prompt with different system instructions in Vertex AI Studio, a Google Cloud console tool for rapidly prototyping and testing prompts with Gemini models.


test system instructions with your prompts in Vertex AI Studio
Vertex AI Studio lets you test system instructions with your prompts

When you are ready to go to production, it is recommended to target a specific version of the model (e.g. gemini-1.5-flash-002). As new model versions are released and previous ones are deprecated, it is advised to use Firebase Remote Config so you can update the version of the Gemini model without releasing a new version of your app.
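A sketch of that approach, assuming a hypothetical Remote Config key named gemini_model_name (and that fetchAndActivate() runs elsewhere at app start):

val remoteConfig = Firebase.remoteConfig
// Ship a sensible default so the app works before the first fetch completes.
remoteConfig.setDefaultsAsync(mapOf("gemini_model_name" to "gemini-1.5-flash-002"))

val generativeModel = Firebase.vertexAI.generativeModel(
    // Swap model versions from the Firebase console without releasing a new app version.
    modelName = remoteConfig.getString("gemini_model_name")
)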

Beyond chatbots: leveraging generative AI for advanced use cases

While chatbots are a popular application of generative AI, the capabilities of the Gemini API go beyond conversational interfaces and you can integrate multimodal GenAI-enabled features into various aspects of your Android app.

Many tasks that previously required human intervention (such as analyzing text, image or video content, synthesizing data into a human-readable format, or engaging in a creative process to generate new content) can potentially be automated using GenAI.

Gemini JSON support

Android apps don’t interface well with natural language outputs. Conversely, JSON is ubiquitous in Android development, and provides a more structured way for Android apps to consume input. However, ensuring proper key/value formatting when working with generative models can be challenging.

With the general availability of Vertex AI in Firebase, you can use the following solutions to streamline JSON generation with proper key/value formatting:

Response MIME type identifier

If you have tried generating JSON with a generative AI model, it's likely you have found yourself with unwanted extra text that makes the JSON parsing more challenging.

e.g:

Sure, here is your JSON:
```
{
   "someKey": "someValue",
   ...
}
```

When using Gemini 1.5 Pro or Gemini 1.5 Flash, you can explicitly set the model’s response MIME type to application/json in the generation configuration and instruct the model to generate well-structured JSON output.

val generativeModel = Firebase.vertexAI.generativeModel(
  modelName = "gemini-1.5-flash",
  
  generationConfig = generationConfig {
     responseMimeType = "application/json"
  }
)

Review the API reference for more details.

Soon, the Android SDK for Vertex AI in Firebase will enable you to define the JSON schema expected in the response.


Multimodal capabilities

Both Gemini 1.5 Flash and Gemini 1.5 Pro are multimodal models. This means they can process input in multiple formats, including text, images, audio, and video. In addition, they both have long context windows, capable of handling up to 1 million tokens for Gemini 1.5 Flash and 2 million tokens for Gemini 1.5 Pro.

These features open doors to innovative functionalities that were previously inaccessible, such as automatically generating descriptive captions for images, identifying topics in a conversation, generating chapters from an audio file, or describing the scenes and actions in a video file.

You can pass an image to the model as shown in this example:

val contentResolver = applicationContext.contentResolver
contentResolver.openInputStream(imageUri).use { stream ->
  stream?.let {
    val bitmap = BitmapFactory.decodeStream(stream)

    // Provide a prompt that includes the image specified above and text
    val prompt = content {
      image(bitmap)
      text("How many people are on this picture?")
    }

    // Keep the call inside the let block so the prompt stays in scope
    val response = generativeModel.generateContent(prompt)
  }
}

You can also pass a video to the model:

val contentResolver = applicationContext.contentResolver
contentResolver.openInputStream(videoUri).use { stream ->
  stream?.let {
    val bytes = stream.readBytes()

    // Provide a prompt that includes the video specified above and text
    val prompt = content {
        blob("video/mp4", bytes)
        text("What is in the video?")
    }

    val fullResponse = generativeModel.generateContent(prompt)
  }
}

You can learn more about multimodal prompting in the VertexAI for Firebase documentation.

Note: This method enables you to pass files up to 20 MB. For larger files, use Cloud Storage for Firebase and include the file’s URL in your multimodal request. Read the documentation for more information.

Function calling: Extending the model's capabilities

Function calling enables you to extend the capabilities of generative models. For example, you can enable the model to retrieve information from your SQL database and feed it back into the context of the prompt. You can also let the model trigger actions by calling functions in your app's source code. In essence, function calling bridges the gap between the Gemini models and your Kotlin code.

Take the example of a food delivery application that wants to implement a conversational interface with Gemini 1.5 Flash. Assume that this application has a getFoodOrder(cuisine: String) function that returns the list of orders from the user for a specific type of cuisine:

fun getFoodOrder(cuisine: String) : JSONObject {
        // implementation…  
}

Note that the function, to be usable by the model, needs to return its response in the form of a JSONObject.

To make the response available to Gemini 1.5 Flash, create a definition of your function that the model will be able to understand using defineFunction:

val getOrderListFunction = defineFunction(
    name = "getOrderList",
    description = "Get the list of food orders from the user for a defined type of cuisine.",
    Schema.str(name = "cuisineType", description = "the type of cuisine for the order")
) { cuisineType ->
    getFoodOrder(cuisineType)
}

Then, when you instantiate the model, share this function definition with the model using the tools parameter:

val generativeModel = Firebase.vertexAI.generativeModel(
    modelName = "gemini-1.5-flash",
    ...
    tools = listOf(Tool(listOf(getOrderListFunction)))
)

Finally, when you get a response from the model, check whether the model is actually requesting to execute the function:

// Send the message to the generative model
var response = chat.sendMessage(prompt)

// Check if the model responded with a function call
response.functionCall?.let { functionCall ->
  // Try to retrieve the stored lambda from the model's tools and
  // throw an exception if the returned function was not declared
  val matchedFunction = generativeModel.tools?.flatMap { it.functionDeclarations }
      ?.first { it.name == functionCall.name }
      ?: throw InvalidStateException("Function not found: ${functionCall.name}")
  
  // Call the lambda retrieved above
  val apiResponse: JSONObject = matchedFunction.execute(functionCall)

  // Send the API response back to the generative model
  // so that it generates a text response that can be displayed to the user
  response = chat.sendMessage(
    content(role = "function") {
        part(FunctionResponsePart(functionCall.name, apiResponse))
    }
  )
}

// If the model responds with text, show it in the UI
response.text?.let { modelResponse ->
    println(modelResponse)
}

To summarize, you provide the functions (or tools) to the model at initialization:

A flow diagram shows a green box labeled 'Generative Model' connected to a list of model parameters and a list of tools. The parameters include 'gemini-1.5-flash', 'api_key', and 'configuration', while the tools are 'getOrderList()', 'getDate()', and 'placeOrder()'.

And when appropriate, the model will request to execute the appropriate function and provide the results:

A flow diagram illustrating the interaction between an Android app and a 'Generative Model'. The app sends 'getDate()' and 'getOrderList()' requests.

You can read more about function calling in the VertexAI for Firebase documentation.

Unlocking the potential of the Gemini API in your app

The Gemini API offers a treasure trove of advanced features that empower Android developers to craft truly innovative and engaging applications. By going beyond basic text prompts and exploring the capabilities highlighted in this blog post, you can create AI-powered experiences that delight your users and set your app apart in the competitive Android landscape.

Read more about how some Android apps are already starting to leverage the Gemini API.


To learn more about AI on Android, check out other resources we have available during AI on Android Spotlight Week.

Use #AndroidAI hashtag to share your creations or feedback on social media, and join us at the forefront of the AI revolution!


The code snippets in this blog post have the following license:

// Copyright 2024 Google LLC.
// SPDX-License-Identifier: Apache-2.0

Advanced capabilities of the Gemini API for Android developers

Posted by Thomas Ezan, Sr Developer Relation Engineer

Thousands of developers across the globe are harnessing the power of the Gemini 1.5 Pro and Gemini 1.5 Flash models to infuse advanced generative AI features into their applications. Android developers are no exception, and with the upcoming launch of the stable version of VertexAI in Firebase in a few weeks (available in Beta since Google I/O), it's the perfect time to explore how your app can benefit from it. We just published a codelab to help you get started.

Let's deep dive into some advanced capabilities of the Gemini API that go beyond simple text prompting and discover the exciting use cases they can unlock in your Android app.

Shaping AI behavior with system instructions

System instructions serve as a "preamble" that you incorporate before the user prompt. This enables shaping the model's behavior to align with your specific requirements and scenarios. You set the instructions when you initialize the model, and then those instructions persist through all interactions with the model, across multiple user and model turns.

For example, you can use system instructions to:

    • Define a persona or role for a chatbot (e.g, “explain like I am 5”)
    • Specify the response to the output format (e.g., Markdown, YAML, etc.)
    • Set the output style and tone (e.g, verbosity, formality, etc…)
    • Define the goals or rules for the task (e.g, “return a code snippet without further explanation”)
    • Provide additional context for the prompt (e.g., a knowledge cutoff date)

To use system instructions in your Android app, pass it as parameter when you initialize the model:

val generativeModel = Firebase.vertexAI.generativeModel(
  modelName = "gemini-1.5-flash",
  ...
  systemInstruction = 
    content { text("You are a knowledgeable tutor. Answer the questions using the socratic tutoring method.") }
)

You can learn more about system instruction in the Vertex AI in Firebase documentation.

You can also easily test your prompt with different system instructions in Vertex AI Studio, Google Cloud console tool for rapidly prototyping and testing prompts with Gemini models.


test system instructions with your prompts in Vertex AI Studio
Vertex AI Studio let’s you test system instructions with your prompts

When you are ready to go to production it is recommended to target a specific version of the model (e.g. gemini-1.5-flash-002). But as new model versions are released and previous ones are deprecated, it is advised to use Firebase Remote Config to be able to update the version of the Gemini model without releasing a new version of your app.

Beyond chatbots: leveraging generative AI for advanced use cases

While chatbots are a popular application of generative AI, the capabilities of the Gemini API go beyond conversational interfaces and you can integrate multimodal GenAI-enabled features into various aspects of your Android app.

Many tasks that previously required human intervention (such as analyzing text, image or video content, synthesizing data into a human readable format, engaging in a creative process to generate new content, etc… ) can be potentially automated using GenAI.

Gemini JSON support

Android apps don’t interface well with natural language outputs. Conversely, JSON is ubiquitous in Android development, and provides a more structured way for Android apps to consume input. However, ensuring proper key/value formatting when working with generative models can be challenging.

With the general availability of Vertex AI in Firebase, implemented solutions to streamline JSON generation with proper key/value formatting:

Response MIME type identifier

If you have tried generating JSON with a generative AI model, it's likely you have found yourself with unwanted extra text that makes the JSON parsing more challenging.

e.g:

Sure, here is your JSON:
```
{
   "someKey”: “someValue",
   ...
}
```

When using Gemini 1.5 Pro or Gemini 1.5 Flash, in the generation configuration, you can explicitly specify the model’s response mime/type as application/json and instruct the model to generate well-structured JSON output.

val generativeModel = Firebase.vertexAI.generativeModel(
  modelName = "gemini-1.5-flash",
  
  generationConfig = generationConfig {
     responseMimeType = "application/json"
  }
)

Review the API reference for more details.

Soon, the Android SDK for Vertex AI in Firebase will enable you to define the JSON schema expected in the response.


Multimodal capabilities

Both Gemini 1.5 Flash and Gemini 1.5 Pro are multimodal models. It means that they can process input from multiple formats, including text, images, audio, video. In addition, they both have long context windows, capable of handling up to 1 million tokens for Gemini 1.5 Flash and 2 million tokens for Gemini 1.5 Pro.

These features open doors to innovative functionalities that were previously inaccessible such as automatically generate descriptive captions for images, identify topics in a conversation and generate chapters from an audio file or describe the scenes and actions in a video file.

You can pass an image to the model as shown in this example:

val contentResolver = applicationContext.contentResolver
contentResolver.openInputStream(imageUri).use { stream ->
  stream?.let {
     val bitmap = BitmapFactory.decodeStream(stream)

    // Provide a prompt that includes the image specified above and text
    val prompt = content {
       image(bitmap)
       text("How many people are on this picture?")
    }
  }
  val response = generativeModel.generateContent(prompt)
}

You can also pass a video to the model:

val contentResolver = applicationContext.contentResolver
contentResolver.openInputStream(videoUri).use { stream ->
  stream?.let {
    val bytes = stream.readBytes()

    // Provide a prompt that includes the video specified above and text
    val prompt = content {
        blob("video/mp4", bytes)
        text("What is in the video?")
    }

    val fullResponse = generativeModel.generateContent(prompt)
  }
}

You can learn more about multimodal prompting in the VertexAI for Firebase documentation.

Note: This method enables you to pass files up to 20 MB. For larger files, use Cloud Storage for Firebase and include the file’s URL in your multimodal request. Read the documentation for more information.

Function calling: Extending the model's capabilities

Function calling enables you to extend the capabilities to generative models. For example you can enable the model to retrieve information in your SQL database and feed it back to the context of the prompt. You can also let the model trigger actions by calling the functions in your app source code. In essence, function calls bridge the gap between the Gemini models and your Kotlin code.

Take the example of a food delivery application that is interested in implementing a conversational interface with the Gemini 1.5 Flash. Assume that this application has a getFoodOrder(cuisine: String) function that returns the list orders from the user for a specific type of cuisine:

fun getFoodOrder(cuisine: String) : JSONObject {
        // implementation…  
}

Note that the function, to be usable to by the model, needs to return the response in the form of a JSONObject.

To make the response available to Gemini 1.5 Flash, create a definition of your function that the model will be able to understand using defineFunction:

val getOrderListFunction = defineFunction(
    name = "getOrderList",
    description = "Get the list of food orders from the user for a defined type of cuisine.",
    Schema.str(name = "cuisineType", description = "the type of cuisine for the order")
) { cuisineType ->
    getFoodOrder(cuisineType)
}

Then, when you instantiate the model, share this function definition with the model using the tools parameter:

val generativeModel = Firebase.vertexAI.generativeModel(
    modelName = "gemini-1.5-flash",
    ...
    tools = listOf(Tool(listOf(getOrderListFunction)))
)

Finally, when you get a response from the model, check whether the model is actually requesting the execution of a function:

// Send the message to the generative model
var response = chat.sendMessage(prompt)

// Check if the model responded with a function call
response.functionCall?.let { functionCall ->
  // Try to retrieve the stored lambda from the model's tools and
  // throw an exception if the returned function was not declared
  val matchedFunction = generativeModel.tools?.flatMap { it.functionDeclarations }
      ?.firstOrNull { it.name == functionCall.name }
      ?: throw IllegalStateException("Function not found: ${functionCall.name}")
  
  // Call the lambda retrieved above
  val apiResponse: JSONObject = matchedFunction.execute(functionCall)

  // Send the API response back to the generative model
  // so that it generates a text response that can be displayed to the user
  response = chat.sendMessage(
    content(role = "function") {
        part(FunctionResponsePart(functionCall.name, apiResponse))
    }
  )
}

// If the model responds with text, show it in the UI
response.text?.let { modelResponse ->
    println(modelResponse)
}

To summarize, you'll provide the functions (or tools) to the model at initialization:

A flow diagram shows a green box labeled 'Generative Model' connected to a list of model parameters and a list of tools. The parameters include 'gemini-1.5-flash', 'api_key', and 'configuration', while the tools are 'getOrderList()', 'getDate()', and 'placeOrder()'.

And when appropriate, the model will request the execution of the matching function, and your app will feed the results back to it:

A flow diagram illustrating the interaction between an Android app and a 'Generative Model'. The app sends 'getDate()' and 'getOrderList()' requests.

You can read more about function calling in the Vertex AI in Firebase documentation.

Unlocking the potential of the Gemini API in your app

The Gemini API offers a treasure trove of advanced features that empower Android developers to craft truly innovative and engaging applications. By going beyond basic text prompts and exploring the capabilities highlighted in this blog post, you can create AI-powered experiences that delight your users and set your app apart in the competitive Android landscape.

Read more about how some Android apps are already starting to leverage the Gemini API.


To learn more about AI on Android, check out other resources we have available during AI on Android Spotlight Week.

Use the #AndroidAI hashtag to share your creations or feedback on social media, and join us at the forefront of the AI revolution!


The code snippets in this blog post have the following license:

// Copyright 2024 Google LLC.
// SPDX-License-Identifier: Apache-2.0

PyTorch machine learning models on Android

Posted by Paul Ruiz – Senior Developer Relations Engineer

Earlier this year we launched Google AI Edge, a suite of tools providing easy access to ready-to-use ML tasks, frameworks for building ML pipelines, and support for running popular LLMs and custom models – all on-device. For AI on Android Spotlight Week, the Google team is highlighting various ways that Android developers can use machine learning to help improve their applications.

In this post, we'll dive into Google AI Edge Torch, which enables you to convert PyTorch models to run locally on Android and other platforms, using the Google AI Edge LiteRT (formerly TensorFlow Lite) and MediaPipe Tasks libraries. For insights on other powerful tools, be sure to explore the rest of the AI on Android Spotlight Week content.

To make it easier to get started with Google AI Edge, we've provided samples on GitHub as an executable codelab. They demonstrate how to convert the MobileViT model for image classification (compatible with MediaPipe Tasks) and the DIS model for segmentation (compatible with LiteRT).

a red Android figurine is shown next to a black and white silhouette of the same figure, labeled 'Original Image' and 'PT Mask' respectively, demonstrating image segmentation.
DIS model output

This post guides you through using the MobileViT model with MediaPipe Tasks. Keep in mind that the LiteRT runtime provides similar capabilities, enabling you to build custom pipelines and features.

Convert MobileViT model for image classification compatible with MediaPipe Tasks

Once you've installed the necessary dependencies and utilities for your app, the first step is to retrieve the PyTorch model you wish to convert, along with any other MobileViT components you might need (such as an image processor for testing).

from transformers import MobileViTImageProcessor, MobileViTForImageClassification

hf_model_path = 'apple/mobilevit-small'
processor = MobileViTImageProcessor.from_pretrained(hf_model_path)
pt_model = MobileViTForImageClassification.from_pretrained(hf_model_path)

Since the end result of this tutorial should work with MediaPipe Tasks, take an extra step to wrap the model so that its input and output shapes match what the MediaPipe image classification Task expects.

class HF2MP_ImageClassificationModelWrapper(nn.Module):

  def __init__(self, hf_image_classification_model, hf_processor):
    super().__init__()
    self.model = hf_image_classification_model
    if hf_processor.do_rescale:
      self.rescale_factor = hf_processor.rescale_factor
    else:
      self.rescale_factor = 1.0

  def forward(self, image: torch.Tensor):
    # BHWC -> BCHW.
    image = image.permute(0, 3, 1, 2)
    # RGB -> BGR.
    image = image.flip(dims=(1,))
    # Scale [0, 255] -> [0, 1].
    image = image * self.rescale_factor
    logits = self.model(pixel_values=image).logits  # [B, 1000] float32.
    # Softmax is required for MediaPipe classification model.
    logits = torch.nn.functional.softmax(logits, dim=-1)

    return logits

hf_model_path = 'apple/mobilevit-small'
hf_mobile_vit_processor = MobileViTImageProcessor.from_pretrained(hf_model_path)
hf_mobile_vit_model = MobileViTForImageClassification.from_pretrained(hf_model_path)
wrapped_pt_model = HF2MP_ImageClassificationModelWrapper(
    hf_mobile_vit_model, hf_mobile_vit_processor).eval()

Whether you plan to use the converted MobileViT model with MediaPipe Tasks or LiteRT, the next step is to convert the model to the .tflite format.

First, match the input shape. In this example, the input shape is (1, 256, 256, 3) for a 256x256-pixel three-channel RGB image.

Then, call AI Edge Torch's convert function to complete the conversion process.

import ai_edge_torch

sample_args = (torch.rand((1, 256, 256, 3)),)
edge_model = ai_edge_torch.convert(wrapped_pt_model, sample_args)

After converting the model, you can further refine it by incorporating metadata for the image classification labels. MediaPipe Tasks will utilize this metadata to display or return pertinent information after classification.

from mediapipe.tasks.python.metadata.metadata_writers import image_classifier
from mediapipe.tasks.python.metadata.metadata_writers import metadata_writer
from mediapipe.tasks.python.vision.image_classifier import ImageClassifier
from pathlib import Path

flatbuffer_file = Path('hf_mobile_vit_mp_image_classification_raw.tflite')
edge_model.export(flatbuffer_file)
tflite_model_buffer = flatbuffer_file.read_bytes()

# Extract the image classification labels from the HF model for later integration into the TFLite model.
labels = list(hf_mobile_vit_model.config.id2label.values())

writer = image_classifier.MetadataWriter.create(
    tflite_model_buffer,
    input_norm_mean=[0.0], #  Normalization is not needed for this model.
    input_norm_std=[1.0],
    labels=metadata_writer.Labels().add(labels),
)
tflite_model_buffer, _ = writer.populate()

With all of that completed, it's time to integrate your model into an Android app. If you're following the official Colab notebook, this involves saving the model locally. For an example of image classification with MediaPipe Tasks, explore the GitHub repository. You can find more information in the official Google AI Edge documentation.

moving image of Newly converted ViT model with MediaPipe Tasks
Newly converted ViT model with MediaPipe Tasks
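
To give a feel for the integration step, here is a rough Kotlin sketch of loading a converted, metadata-populated .tflite file with MediaPipe Tasks' ImageClassifier. The asset filename and result handling are assumptions for this example, so rely on the sample repository and documentation for the exact setup:

import android.content.Context
import android.graphics.Bitmap
import com.google.mediapipe.framework.image.BitmapImageBuilder
import com.google.mediapipe.tasks.core.BaseOptions
import com.google.mediapipe.tasks.vision.imageclassifier.ImageClassifier

// Rough sketch: classify a bitmap with the converted model bundled in app assets.
// The asset filename below is an assumption for this example.
fun classifyWithConvertedModel(context: Context, bitmap: Bitmap): List<String> {
  val options = ImageClassifier.ImageClassifierOptions.builder()
    .setBaseOptions(
      BaseOptions.builder()
        .setModelAssetPath("hf_mobile_vit_mp_image_classification.tflite")
        .build()
    )
    .setMaxResults(3)
    .build()

  val classifier = ImageClassifier.createFromOptions(context, options)
  val result = classifier.classify(BitmapImageBuilder(bitmap).build())

  // Flatten the top categories into human-readable labels with scores.
  return result.classificationResult().classifications()
    .flatMap { it.categories() }
    .map { "${it.categoryName()} (${it.score()})" }
}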

After understanding how to convert a simple image classification model, you can use the same techniques to adapt various PyTorch models for Google AI Edge LiteRT or MediaPipe Tasks tooling on Android.

For further model optimization, consider methods like quantizing during conversion. Check out the GitHub example to learn more about how to convert a PyTorch image segmentation model to LiteRT and quantize it.

What's Next

To keep up to date on Google AI Edge developments, look for announcements on the Google for Developers YouTube channel and blog.

We look forward to hearing about how you're using these features in your projects. Use the #AndroidAI hashtag to share your feedback or what you've built on social media, and check out other content in AI on Android Spotlight Week!