Tag Archives: Gemini

Preview summaries in the Google Chat home view with the help of Gemini in four additional languages

What’s changing 

Last year, we announced that Gemini in Google Chat can help you catch up on unread conversations in the Chat home view with summaries. In February, we extended this ability to direct messages and read conversations, and most recently this feature became available in Spanish, Portuguese, and German.

Today, we’re excited to announce that Gemini summaries in home are now available in four additional languages: 
  • French 
  • Italian 
  • Japanese 
  • Korean 
Upon navigating to a conversation in the home view, click the “Summarize” button to see a quick, bulleted synopsis of the message content. This enables you to quickly review recent activity across all active conversations to determine where best to focus your time and attention. 

Who’s impacted

End users 

Why it matters 

Since introducing a more streamlined user experience in Chat to help you find what you need faster, we've been adding more ways to help you prioritize the most important conversations. Summaries in the home view do just that by helping you catch up more quickly. 

Getting started


Rollout pace 


Availability 

Available for Google Workspace: 
  • Business Standard and Plus 
  • Enterprise Standard and Plus 
  • Customers with the Gemini Education or Gemini Education Premium add-on 
Anyone who previously purchased these add-ons will also receive this feature: 
  • Gemini Business* 
  • Gemini Enterprise* 
  • AI Meetings and Messages
*As of January 15, 2025, we’re no longer offering the Gemini Business and Gemini Enterprise add-ons for sale. Please refer to this announcement for more details.

Resources 

New Gemini summary cards now available in the Gmail app on Android and iOS devices

What’s changing

Last year, we announced the general availability of Gemini in the side panel of Gmail, allowing users to summarize email threads, get help drafting an email, see suggested responses to an email thread, and more. To build upon this and streamline how users review and interact with emails, we’re excited to introduce Gemini summary cards on mobile. 

Prior to today, to generate an email summary, you would tap “Summarize this email” at the top of a message, which then opened Gemini to show you a summary of the thread. 

Starting today, summaries will appear at the top of the email content for messages where a summary is helpful, such as longer email threads or messages with several replies. Gemini will synthesize the key points from the email thread, and any subsequent replies will be incorporated into the synopsis, keeping summaries up to date.

gemini summary cards in gmail app
Additional details 

  • This feature is only available for emails in English at this time. 
  • As with all of our AI features, Gmail remains committed to protecting user data and prioritizing privacy. Refer to our Privacy Hub site to learn more. 
  • Emails that do not show summary cards will continue to support user-triggered summaries via the “Summarize this email” chip at the top of the message, or the option to summarize using Gemini in the side panel. 

Getting started 


Rollout pace


Availability 

Available for Google Workspace: 
  • Business Starter, Standard and Plus 
  • Enterprise Starter, Standard and Plus 
  • Google One AI Premium 
  • Customers with the Gemini Education or Gemini Education Premium add-on 
Anyone who previously purchased these add-ons will also receive this feature: 
  • Gemini Business* 
  • Gemini Enterprise* 
*As of January 15, 2025, we’re no longer offering the Gemini Business and Gemini Enterprise add-ons for sale. Please refer to this announcement for more details.

Resources 


Understand your videos much faster using Gemini in Google Drive

What’s changing

In addition to the ability to use Gemini in the side panel of Google Drive to summarize documents or to interact with PDFs in Google Drive’s overlay file previewer, we’re excited to extend Gemini’s summarization and Q&A capabilities to videos in Drive. 

Starting today, you can use Gemini in Drive to get summaries and ask questions about the content of videos in your Drive, such as: 
  • “Summarize this video” 
  • “List action items from this meeting recording” 
  • “What are the highlights from this announcement video?" 

Who’s impacted 

End users 

Why you’d use it 

Videos contain a wealth of information; however, going back to watch them can be time-consuming. With this update, users can leverage Gemini to get what they need from their videos much faster. 

Additional details 

This feature is currently available in English only and is accessible when using Google Drive’s overlay previewer or a standalone file viewer (new browser tab). 

Getting started 


Rollout pace 


Availability 

Available for Google Workspace: 
  • Business Standard and Plus 
  • Enterprise Standard and Plus 
  • Customers with the Gemini Education or Gemini Education Premium add-on 
  • Google One AI Premium 
Anyone who previously purchased these add-ons will also receive this feature: 
  • Gemini Business* 
  • Gemini Enterprise* 
*As of January 15, 2025, we’re no longer offering the Gemini Business and Gemini Enterprise add-ons for sale. Please refer to this announcement for more details.

Resources 

New in-car app experiences

Posted by Ben Sagmoe - Developer Relations Engineer

The in-car experience continues to evolve rapidly, and Google remains committed to pushing the boundaries of what's possible. At Google I/O 2025, we're excited to unveil the latest advancements for drivers, car manufacturers, and developers, furthering our goal of a safe, seamless, and helpful connected driving experience.

Today's car cabins are increasingly digital, offering developers exciting new opportunities with larger displays and more powerful computing. Android Auto is now supported in nearly all new cars sold, with almost 250 million compatible vehicles on the road.

We're also seeing significant growth in cars powered by Android Automotive OS with Google built-in. Over 50 models are currently available, with more launching this year. This growth is fueled by a thriving app ecosystem, including over 300 apps already available on the Play Store. These include apps optimized for a safe and seamless experience while driving, as well as entertainment apps for when you're parked and waiting in your car, many of which are adaptive mobile apps that have been brought to cars through the Car Ready Mobile Apps Program.

A vibrant developer community is essential to delivering these innovative in-car experiences that utilize the different screens within the car cabin. This past year, we've focused on key areas that empower developers to build more differentiated experiences across both platforms as we embark on the Gemini era in cars!

Gemini for Cars

Exciting news for in-car experiences: Gemini, Google's advanced AI, is coming to vehicles! This unlocks a new era of safe and helpful interactions on the go.

Gemini enables natural voice conversations and seamless multitasking, empowering drivers to get more done simply by speaking naturally. Imagine effortlessly finding charging stations or navigating to a location pulled directly from an email, all with just your voice.

You can learn how to leverage Gemini's potential to create engaging in-car experiences in your app.

Navigation apps can integrate with Gemini using three core intent formats, allowing you to start navigation, display relevant search results, and execute custom actions, such as enabling users to report incidents like traffic congestion using their voice.
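While this post doesn't spell out those intent formats, the plumbing builds on how Car App Library navigation apps already receive destinations as intents. As a rough, hypothetical sketch (CarContext.ACTION_NAVIGATE and the Session callbacks below are existing Car App Library APIs, not Gemini-specific ones, and MapScreen is an assumed screen in your app):

// Hedged sketch: a navigation app's Session handling an incoming navigation request.
// The Gemini-specific intent formats are not detailed in this post.
class NavigationSession : Session() {

    override fun onCreateScreen(intent: Intent): Screen = MapScreen(carContext)

    override fun onNewIntent(intent: Intent) {
        if (intent.action == CarContext.ACTION_NAVIGATE) {
            val destination = intent.data // e.g. a geo: URI describing the destination
            // Start or update routing to the requested destination here.
        }
    }
}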

Gemini for cars will be rolling out in the coming months. Get ready to build the next generation of in-car AI experiences!

New developer programs and tools

table of app categories showing availability in android Auto and cars with Google built-in, including media, navigation, point-of-interest, internet of things, weather, video, browsers, games, and communication such as messaging and voip

Last year, we introduced car app quality tiers to inspire developers to create high-quality in-car experiences. By developing your app to meet the Car ready tier, you can bring video, gaming, or browser apps to cars with Google built-in, running while parked, with almost no additional effort. Learn more about Car Ready Mobile Apps.

Your app can further shine in the Car optimized and Car differentiated tiers, which unlock experiences while the car is in motion and when transitioning between parked and driving modes, and let you take advantage of the different screens within the modern car cabin. Check the car app quality guidelines for details.

To start with, we've made some exciting improvements to the Car App Library across both Android Auto and cars with Google built-in:

    • The Weather app category has graduated from beta: any developer can now publish weather apps to production tracks on both Android Auto and cars with Google built-in. Before you publish your app, check that it meets the quality guidelines for weather apps.


    • Two new templates, the SectionedItemTemplate and MediaPlaybackTemplate, are now available in the Car App Library 1.8 alpha release for use on Android Auto. These templates are a great fit for building templated media apps, allowing for increased customization in layout and browsing structure.

      example of sectioneditemtemplate on the left and mediaplaybacktemplate on the right

On Android Auto, many new app categories and capabilities are now in beta:

    • We are adding support for building media apps with the Car App Library, enabling media app developers to build richer, more complete experiences like those users are used to on their phones. During beta, developers can build and publish media apps built using the Car App Library to internal testing and closed testing tracks. You can also express interest in being an early access partner to publish to production while the category is in beta. 

    • The communications category is in beta. We've simplified calling integration by utilizing the CallsManager Jetpack API. Together with the templates provided by the Car App Library, this enables communications apps to build features like full message history, upcoming meetings lists, rich in-call views, and more. During beta, developers can build and publish communications apps to internal testing and closed testing tracks. You can also express interest in being an early access partner to publish to production while the category is in beta.

    • Games are now supported in Android Auto, while parked, on phones running Android 15 and above. You can already find some popular titles like Angry Birds 2, Farm Heroes Saga, Candy Crush Soda Saga and Beach Buggy Racing 2. The Games category is in Beta and developers can publish games to internal testing and closed testing tracks. You can also express interest in being an early access partner to publish to production while the category is in beta.

Finally, we have further simplified the building, testing, and distribution experience for developers creating apps for Android Automotive OS cars with Google built-in.

The road ahead

You can look forward to more updates later this year, including:

    • Video apps will be supported on Android Auto, starting with phones running Android 16 on select compatible cars. If your app is already adaptive, enabling your app experience while parked only requires minimal steps to distribute to cars.

    • For Android Automotive OS cars running Android 14+ with Google built-in, we are working with car manufacturers to add additional app compatibility, to enable thousands of adaptive mobile apps in the next phase of the Car Ready Mobile Apps Program.

    • Updated design documentation that visualizes car app quality guidelines and integration paths to simplify designing your app for cars.

    • Google Play Services for cars with Google built-in are expanding to bring them on par with mobile, including:
      • Passkeys and Credential Manager APIs for a more seamless user sign-in experience.
      • Quick Share, which will enable easy cross-device sharing from phone to car.



    • Pre-launch reports for Android Automotive OS are coming soon to the Play Console, helping you ensure app quality before distributing your app to cars.

Be sure to keep up to date on these features and more through goo.gle/cars-whats-new as we continuously invest in the future of Android in the car. Stay tuned for more resources to help you build innovative and engaging experiences for drivers and passengers.

Ready to publish your car app? Check our guidance for distributing to cars.

Explore this announcement and all Google I/O 2025 updates on io.google starting May 22.

Androidify: Building powerful AI-driven experiences with Jetpack Compose, Gemini and CameraX

Posted by Rebecca Franks – Developer Relations Engineer

The Android bot is a beloved mascot for Android users and developers, and previous versions of the bot builder have been very popular, so this year we decided to rebuild the bot maker from the ground up using the latest technology, backed by Gemini. Today we are releasing a new open source app, Androidify, for learning how to build powerful AI-driven experiences on Android using the latest technologies such as Jetpack Compose, Gemini through Firebase, CameraX, and Navigation 3.

a moving image of various droid bots dancing individually

Androidify app demo

Here’s an example of the app running on a device, showing how it converts a photo into an Android bot that represents my likeness:

moving image showing the conversion of an image of a woman in a pink dress holding an umbrella into a 3D image of a droid bot wearing a pink dress holding an umbrella

Under the hood

The app combines a variety of different Google technologies, such as:

    • Gemini API - through Firebase AI Logic SDK, for accessing the underlying Imagen and Gemini models.
    • Jetpack Compose - for building the UI with delightful animations and making the app adapt to different screen sizes.
    • Navigation 3 - the latest navigation library for building up Navigation graphs with Compose.
    • CameraX Compose and Media3 Compose - for building up a custom camera with custom UI controls (rear camera support, zoom support, tap-to-focus) and playing the promotional video.

This sample app is currently using a standard Imagen model, but we've been working on a fine-tuned model that's trained specifically on all of the pieces that make the Android bot cute and fun; we'll share that version later this year. In the meantime, don't be surprised if the sample app puts out some interesting looking examples!

How does the Androidify app work?

The app leverages our best practices for Architecture, Testing, and UI to showcase a real-world, modern AI application on device.

Flow chart describing Androidify app flow
Androidify app flow chart detailing how the app works with AI

AI in Androidify with Gemini and ML Kit

The Androidify app uses the Gemini models in a multitude of ways to enrich the app experience, all powered by the Firebase AI Logic SDK. The app uses Gemini 2.5 Flash and Imagen 3 under the hood:

    • Image validation: We ensure that the captured image contains sufficient information, such as a clearly focused person, and we assess it for safety. This feature uses the multimodal capabilities of the Gemini API, by giving it a prompt and an image at the same time:

val response = generativeModel.generateContent(
   content {
       text(prompt)
       image(image)
   },
)

    • Text prompt validation: If the user opts for text input instead of image, we use Gemini 2.5 Flash to ensure the text contains a sufficiently descriptive prompt to generate a bot.

    • Image captioning: Once we’re sure the image has enough information, we use Gemini 2.5 Flash to perform image captioning. We ask Gemini to be as descriptive as possible, focusing on the clothing and its colors.

    • “Help me write” feature: Similar to an “I’m feeling lucky” type feature, “Help me write” uses Gemini 2.5 Flash to create a random description of the clothing and hairstyle of a bot.

    • Image generation from the generated prompt: As the final step, we provide the prompt, along with the selected skin tone of the bot, to Imagen, which generates the image.

The app also uses ML Kit pose detection to detect a person in the viewfinder, enabling the capture button when a person is detected and adding fun indicators around the content to signal detection.

Explore more detailed information about AI usage in Androidify.

Jetpack Compose

The user interface of Androidify is built using Jetpack Compose, the modern UI toolkit that simplifies and accelerates UI development on Android.

Delightful details with the UI

The app uses Material 3 Expressive, the latest alpha release that makes your apps more premium, desirable, and engaging. It provides delightful bits of UI out of the box, like new shapes and componentry, and it uses MotionScheme variables wherever a motion spec is needed.

MaterialShapes are used in various locations. These are a preset list of shapes that allow for easy morphing between each other—for example, the cute cookie shape for the camera capture button:


Androidify app UI showing camera button
Camera button with a MaterialShapes.Cookie9Sided shape
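As a rough illustration (not the app's exact code, and assuming the toShape() conversion available in the Material 3 expressive alpha), clipping a capture button to the cookie shape looks something like this:

// Sketch: clipping a button to a MaterialShapes polygon.
// MaterialShapes.Cookie9Sided is a RoundedPolygon; toShape() converts it to a Compose Shape.
@Composable
fun CaptureButton(onClick: () -> Unit, modifier: Modifier = Modifier) {
    Box(
        modifier = modifier
            .size(96.dp)
            .clip(MaterialShapes.Cookie9Sided.toShape())
            .background(MaterialTheme.colorScheme.primary)
            .clickable(onClick = onClick),
    )
}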

Beyond using the standard Material components, Androidify also features custom composables and delightful transitions tailored to the specific needs of the app:

    • There are plenty of shared element transitions across the app—for example, a morphing shape shared element transition is performed between the “take a photo” button and the camera surface.

      moving example of expressive button shapes in slow motion

    • Custom enter transitions for the ResultsScreen using marquee modifiers.

      animated marquee example

    • Fun color splash animation as a transition between screens.

      moving image of a blue color splash transition between Androidify demo screens

    • Animating gradient buttons for the AI-powered actions.

      animated gradient button for AI powered actions example

To learn more about the unique details of the UI, read Androidify: Building delightful UIs with Compose.

Adapting to different devices

Androidify is designed to look great and function seamlessly across candy bar phones, foldables, and tablets. The general goal of developing adaptive apps is to avoid reimplementing the same app multiple times for each form factor by extracting reusable composables and leveraging APIs like WindowSizeClass to determine what kind of layout to display.

a collage of different adaptive layouts for the Androidify app across small and large screens
Various adaptive layouts in the app

For Androidify, we only needed to leverage the width window size class. Combining this with different layout mechanisms, we were able to reuse or extend the composables to cater to the multitude of different device sizes and capabilities.

    • Responsive layouts: The CreationScreen demonstrates adaptive design. It uses helper functions like isAtLeastMedium() to detect window size categories and adjust its layout accordingly. On larger windows, the image/prompt area and color picker might sit side-by-side in a Row, while on smaller windows, the color picker is accessed via a ModalBottomSheet. This pattern, called “supporting pane”, highlights the supporting dependencies between the main content and the color picker.

    • Foldable support: The app actively checks for foldable device features. The camera screen uses WindowInfoTracker to get FoldingFeature information to adapt to different features by optimizing the layout for tabletop posture.

    • Rear display: Support for devices with multiple displays is included via the RearCameraUseCase, allowing for the device camera preview to be shown on the external screen when the device is unfolded (so the main content is usually displayed on the internal screen).

Using window size classes, coupled with creating a custom @LargeScreensPreview annotation, helps achieve unique and useful UIs across the spectrum of device sizes and window sizes.
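As a minimal sketch of the kind of width helper mentioned above (assuming the material3-adaptive artifact; the app's actual implementation may differ):

// Sketch: a width-based helper built on the window size class API from material3-adaptive.
// Returns true for medium and expanded width windows (e.g. tablets, foldables, desktop windows).
@Composable
fun isAtLeastMedium(): Boolean {
    val widthClass = currentWindowAdaptiveInfo().windowSizeClass.windowWidthSizeClass
    return widthClass != WindowWidthSizeClass.COMPACT
}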

CameraX and Media3 Compose

To allow users to base their bots on photos, Androidify integrates CameraX, the Jetpack library that makes camera app development easier.

The app uses a custom CameraLayout composable that supports the layout of the typical composables that a camera preview screen would include, for example zoom buttons, a capture button, and a flip camera button. This layout adapts to different device sizes and more advanced use cases, like tabletop mode and the rear-camera display. For the actual rendering of the camera preview, it uses the new CameraXViewfinder that is part of the camerax-compose artifact.

CameraLayout in Compose
CameraLayout composable that takes care of different device configurations, such as table top mode
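As a simplified illustration of just the viewfinder piece (the app's CameraLayout and state plumbing are more involved), CameraXViewfinder renders a SurfaceRequest produced by the CameraX Preview use case:

// Sketch: rendering the CameraX Preview use case with the camerax-compose viewfinder.
// surfaceRequest would typically be exposed as state from a ViewModel that binds the
// Preview use case and forwards requests from Preview.setSurfaceProvider { ... }.
@Composable
fun CameraViewfinder(surfaceRequest: SurfaceRequest?, modifier: Modifier = Modifier) {
    surfaceRequest?.let { request ->
        CameraXViewfinder(
            surfaceRequest = request,
            modifier = modifier.fillMaxSize(),
        )
    }
}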


The app also integrates with Media3 APIs to load an instructional video for showing how to get the best bot from a prompt or image. Using the new media3-ui-compose artifact, we can easily add a VideoPlayer into the app:

@Composable
private fun VideoPlayer(modifier: Modifier = Modifier) {
    val context = LocalContext.current
    var player by remember { mutableStateOf<Player?>(null) }
    LifecycleStartEffect(Unit) {
        player = ExoPlayer.Builder(context).build().apply {
            setMediaItem(MediaItem.fromUri(Constants.PROMO_VIDEO))
            repeatMode = Player.REPEAT_MODE_ONE
            prepare()
        }
        onStopOrDispose {
            player?.release()
            player = null
        }
    }
    Box(
        modifier
            .background(MaterialTheme.colorScheme.surfaceContainerLowest),
    ) {
        player?.let { currentPlayer ->
            PlayerSurface(currentPlayer, surfaceType = SURFACE_TYPE_TEXTURE_VIEW)
        }
    }
}

Using the new onLayoutRectChanged modifier, we also listen for whether the composable is completely visible or not, and play or pause the video based on this information:

var videoFullyOnScreen by remember { mutableStateOf(false) }     

LaunchedEffect(videoFullyOnScreen) {
     if (videoFullyOnScreen) currentPlayer.play() else currentPlayer.pause()
} 

// We add this to the player composable to determine whether the video composable is fully visible,
// and mutate the videoFullyOnScreen variable, which then toggles the player state.
Modifier.onVisibilityChanged(
                containerWidth = LocalView.current.width,
                containerHeight = LocalView.current.height,
) { fullyVisible -> videoFullyOnScreen = fullyVisible }

// A simple version of visibility changed detection
fun Modifier.onVisibilityChanged(
    containerWidth: Int,
    containerHeight: Int,
    onChanged: (visible: Boolean) -> Unit,
) = this then Modifier.onLayoutRectChanged(100, 0) { layoutBounds ->
    onChanged(
        layoutBounds.boundsInRoot.top > 0 &&
            layoutBounds.boundsInRoot.bottom < containerHeight &&
            layoutBounds.boundsInRoot.left > 0 &&
            layoutBounds.boundsInRoot.right < containerWidth,
    )
}

Additionally, using rememberPlayPauseButtonState, we add on a layer on top of the player to offer a play/pause button on the video itself:

val playPauseButtonState = rememberPlayPauseButtonState(currentPlayer)
            OutlinedIconButton(
                onClick = playPauseButtonState::onClick,
                enabled = playPauseButtonState.isEnabled,
            ) {
                val icon =
                    if (playPauseButtonState.showPlay) R.drawable.play else R.drawable.pause
                val contentDescription =
                    if (playPauseButtonState.showPlay) R.string.play else R.string.pause
                Icon(
                    painterResource(icon),
                    stringResource(contentDescription),
                )
            }

Check out the code for more details on how CameraX and Media3 were used in Androidify.

Navigation 3

Screen transitions are handled using the new Jetpack Navigation 3 library androidx.navigation3. The MainNavigation composable defines the different destinations (Home, Camera, Creation, About) and displays the content associated with each destination using NavDisplay. You get full control over your back stack, and navigating to and from destinations is as simple as adding and removing items from a list.

@Composable
fun MainNavigation() {
   val backStack = rememberMutableStateListOf<NavigationRoute>(Home)
   NavDisplay(
       backStack = backStack,
       onBack = { backStack.removeLastOrNull() },
       entryProvider = entryProvider {
           entry<Home> { entry ->
               HomeScreen(
                   onAboutClicked = {
                       backStack.add(About)
                   },
               )
           }
           entry<Camera> {
               CameraPreviewScreen(
                   onImageCaptured = { uri ->
                       backStack.add(Create(uri.toString()))
                   },
               )
           }
           // etc
       },
   )
}

Notably, Navigation 3 exposes a new composition local, LocalNavAnimatedContentScope, to easily integrate your shared element transitions without needing to keep track of the scope yourself. By default, Navigation 3 also integrates with predictive back, providing delightful back experiences when navigating between screens, as seen in the shared element transition shown earlier.
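As a hedged sketch of what that integration can look like (assuming Compose's SharedTransitionScope APIs and a hypothetical "camera-button" key; this is not the app's exact code):

// Sketch: tagging a composable inside a NavDisplay entry as a shared element, using the
// AnimatedContentScope that Navigation 3 exposes through LocalNavAnimatedContentScope.
// Assumes the entry content is composed within a SharedTransitionScope.
@Composable
fun SharedTransitionScope.CameraButton(modifier: Modifier = Modifier) {
    Box(
        modifier = modifier.sharedElement(
            rememberSharedContentState(key = "camera-button"),
            animatedVisibilityScope = LocalNavAnimatedContentScope.current,
        ),
    )
}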


Learn more about Jetpack Navigation 3, currently in alpha.

Learn more

By combining the declarative power of Jetpack Compose, the camera capabilities of CameraX, the intelligent features of Gemini, and thoughtful adaptive design, Androidify is a personalized avatar creation experience that feels right at home on any Android device. You can find the full code sample at github.com/android/androidify where you can see the app in action and be inspired to build your own AI-powered app experiences.

Explore this announcement and all Google I/O 2025 updates on io.google starting May 22.


Androidify: How Androidify leverages Gemini, Firebase and ML Kit

Posted by Thomas Ezan – Developer Relations Engineer, Rebecca Franks – Developer Relations Engineer, and Avneet Singh – Product Manager

We’re bringing back Androidify later this year, this time powered by Google AI, so you can customize your very own Android bot and share your creativity with the world. Today, we’re releasing a new open source demo app for Androidify as a great example of how Google is using its Gemini AI models to enhance app experiences.

In this post, we'll dive into how the Androidify app uses Gemini models and Imagen via the Firebase AI Logic SDK, and we'll provide some insights learned along the way to help you incorporate Gemini and AI into your own projects. Read more about the Androidify demo app.

App flow

The overall app functions as follows, with various parts of it using Gemini and Firebase along the way:

flow chart demonstrating Androidify app flow

Gemini and image validation

To get started with Androidify, take a photo or choose an image on your device. The app needs to make sure that the image you upload is suitable for creating an avatar.

Gemini 2.5 Flash via Firebase helps with this by verifying that the image contains a person, that the person is in focus, and that the image is safe, including checking whether it contains abusive content.

val jsonSchema = Schema.obj(
    properties = mapOf("success" to Schema.boolean(), "error" to Schema.string()),
    optionalProperties = listOf("error"),
)
   
val generativeModel = Firebase.ai(backend = GenerativeBackend.googleAI())
    .generativeModel(
        modelName = "gemini-2.5-flash-preview-04-17",
        generationConfig = generationConfig {
            responseMimeType = "application/json"
            responseSchema = jsonSchema
        },
        safetySettings = listOf(
            SafetySetting(HarmCategory.HARASSMENT, HarmBlockThreshold.LOW_AND_ABOVE),
            SafetySetting(HarmCategory.HATE_SPEECH, HarmBlockThreshold.LOW_AND_ABOVE),
            SafetySetting(HarmCategory.SEXUALLY_EXPLICIT, HarmBlockThreshold.LOW_AND_ABOVE),
            SafetySetting(HarmCategory.DANGEROUS_CONTENT, HarmBlockThreshold.LOW_AND_ABOVE),
            SafetySetting(HarmCategory.CIVIC_INTEGRITY, HarmBlockThreshold.LOW_AND_ABOVE),
        ),
    )

val response = generativeModel.generateContent(
    content {
        text("You are to analyze the provided image and determine if it is acceptable and appropriate based on specific criteria.... (more details see the full sample)")
        image(image)
    },
)

val jsonResponse = Json.parseToJsonElement(response.text!!)
val isSuccess = jsonResponse.jsonObject["success"]?.jsonPrimitive?.booleanOrNull == true
val error = jsonResponse.jsonObject["error"]?.jsonPrimitive?.content

In the snippet above, we’re leveraging the structured output capabilities of the model by defining the schema of the response. We’re passing a Schema object via the responseSchema param in the generationConfig.

We want to validate that the image has enough information to generate a nice Android avatar, so we ask the model to return a JSON object with success = true/false and an optional error message explaining why the image doesn't have enough information.

Structured output is a powerful feature enabling a smoother integration of LLMs to your app by controlling the format of their output, similar to an API response.

Image captioning with Gemini Flash

Once it's established that the image contains sufficient information to generate an Android avatar, it is captioned using Gemini 2.5 Flash with structured output.

val jsonSchema = Schema.obj(
    properties = mapOf(
        "success" to Schema.boolean(),
        "user_description" to Schema.string(),
    ),
    optionalProperties = listOf("user_description"),
)
val generativeModel = createGenerativeTextModel(jsonSchema)

val prompt = "You are to create a VERY detailed description of the main person in the given image. This description will be translated into a prompt for a generative image model..."

val response = generativeModel.generateContent(
    content {
        text(prompt)
        image(image)
    },
)
        
val jsonResponse = Json.parseToJsonElement(response.text!!) 
val isSuccess = jsonResponse.jsonObject["success"]?.jsonPrimitive?.booleanOrNull == true

val userDescription = jsonResponse.jsonObject["user_description"]?.jsonPrimitive?.content

The other option in the app is to start with a text prompt. You can enter details about your accessories, hairstyle, and clothing, and let Imagen be a bit more creative.
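For that text path, the app can reuse the same structured-output pattern shown above. A minimal, hypothetical sketch (the helper and prompt wording below are illustrative, not the app's exact code):

// Sketch: validating a free-form text prompt with Gemini 2.5 Flash, reusing the
// structured-output generativeModel created earlier via createGenerativeTextModel(jsonSchema).
suspend fun validateTextPrompt(userPrompt: String): Boolean {
    val response = generativeModel.generateContent(
        content {
            // Illustrative instruction; the app's actual prompt is more detailed.
            text(
                "Determine whether the following description has enough detail about " +
                    "clothing, colors, and accessories to generate an Android bot. " +
                    "Description: $userPrompt",
            )
        },
    )
    val json = Json.parseToJsonElement(response.text!!)
    return json.jsonObject["success"]?.jsonPrimitive?.booleanOrNull == true
}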

Android generation via Imagen

We’ll use this detailed description of your image to enrich the prompt used for image generation. We’ll add extra details about what we would like to generate, and we’ll include the bot color selection as part of this, such as the skin tone chosen by the user.

val imagenPrompt = "A 3D rendered cartoonish Android mascot in a photorealistic style, the pose is relaxed and straightforward, facing directly forward [...] The bot looks as follows $userDescription [...]"

We then call the Imagen model to create the bot. Using this new prompt, we create a model and call generateImages:

// In the fine-tuned version we supply our own model here; this sample uses "imagen-3.0-generate-002"
val generativeModel = Firebase.ai(backend = GenerativeBackend.googleAI()).imagenModel(
    "imagen-3.0-generate-002",
    safetySettings = ImagenSafetySettings(
        ImagenSafetyFilterLevel.BLOCK_LOW_AND_ABOVE,
        personFilterLevel = ImagenPersonFilterLevel.ALLOW_ALL,
    ),
)

val response = generativeModel.generateImages(imagenPrompt)

val image = response.images.first().asBitmap()

And that’s it! The Imagen model generates a bitmap that we can display on the user’s screen.
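Displaying the result is then straightforward in Compose (a trivial sketch, not the app's exact UI code):

// Sketch: showing the generated bot bitmap on screen.
@Composable
fun BotResult(bitmap: Bitmap, modifier: Modifier = Modifier) {
    Image(
        bitmap = bitmap.asImageBitmap(),
        contentDescription = "Generated Android bot",
        modifier = modifier.fillMaxWidth(),
    )
}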

Fine-tuning the Imagen model

The Imagen 3 model was fine-tuned using Low-Rank Adaptation (LoRA). LoRA is a fine-tuning technique designed to reduce the computational burden of training large models. Instead of updating the entire model, LoRA adds smaller, trainable "adapters" that make small changes to the model's behavior. We ran a fine-tuning pipeline on the generally available Imagen 3 model using Android bot assets of different color combinations and different assets for enhanced cuteness and fun. We generated text captions for the training images, and the image-text pairs were used to fine-tune the model effectively.
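Conceptually (this is the standard LoRA formulation, not anything specific to this pipeline), a frozen weight matrix W_0 is augmented with a low-rank update:

W = W_0 + B·A, where B is d×r, A is r×k, and the rank r ≪ min(d, k)

Only the small matrices A and B are trained, which is what keeps the adapters cheap to train and store compared to updating the full model.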

The current sample app uses a standard Imagen model, so the results may look a bit different from the visuals in this post. However, the app using the fine-tuned model and a custom version of the Firebase AI Logic SDK was demoed at Google I/O. This app will be released later this year, and we are also planning to add support for fine-tuned models to the Firebase AI Logic SDK later in the year.

moving image of Androidify app demo turning a selfie image of a bearded man wearing a black t-shirt and sunglasses, with a blue backpack, into a green 3D bearded droid wearing a black t-shirt and sunglasses with a blue backpack
The original image... and Androidifi-ed image

ML Kit

The app also uses the ML Kit Pose Detection SDK to detect a person in the camera view, which enables the capture button and adds visual indicators.

To do this, we add the SDK to the app and use PoseDetection.getClient(). Then, using the poseDetector, we look at the detected landmarks in the streaming image coming from the camera, and we set _uiState.detectedPose to true if a nose and shoulders are visible:

private suspend fun runPoseDetection() {
    PoseDetection.getClient(
        PoseDetectorOptions.Builder()
            .setDetectorMode(PoseDetectorOptions.STREAM_MODE)
            .build(),
    ).use { poseDetector ->
        // Since image analysis is processed by ML Kit asynchronously in its own thread pool,
        // we can run this directly from the calling coroutine scope instead of pushing this
        // work to a background dispatcher.
        cameraImageAnalysisUseCase.analyze { imageProxy ->
            imageProxy.image?.let { image ->
                val poseDetected = poseDetector.detectPersonInFrame(image, imageProxy.imageInfo)
                _uiState.update { it.copy(detectedPose = poseDetected) }
            }
        }
    }
}

private suspend fun PoseDetector.detectPersonInFrame(
    image: Image,
    imageInfo: ImageInfo,
): Boolean {
    val results = process(InputImage.fromMediaImage(image, imageInfo.rotationDegrees)).await()
    val landmarkResults = results.allPoseLandmarks
    val detectedLandmarks = mutableListOf<Int>()
    for (landmark in landmarkResults) {
        if (landmark.inFrameLikelihood > 0.7) {
            detectedLandmarks.add(landmark.landmarkType)
        }
    }

    return detectedLandmarks.containsAll(
        listOf(PoseLandmark.NOSE, PoseLandmark.LEFT_SHOULDER, PoseLandmark.RIGHT_SHOULDER),
    )
}
moving image showing the camera shutter button activating when an orange droid figurine is held in the camera frame
The camera shutter button is activated when a person (or a bot!) enters the frame.

Get started with AI on Android

The Androidify app makes extensive use of Gemini 2.5 Flash to validate the image and produce the detailed description used to generate the final image. It also leverages the specifically fine-tuned Imagen 3 model to generate images of Android bots. Gemini and Imagen models are easily integrated into the app via the Firebase AI Logic SDK. In addition, the ML Kit Pose Detection SDK controls the capture button, enabling it only when a person is present in front of the camera.

To get started with AI on Android, go to the Gemini and Imagen documentation for Android.

Explore this announcement and all Google I/O 2025 updates on io.google starting May 22.