Tag Archives: Android

Advanced capabilities of the Gemini API for Android developers

Posted by Thomas Ezan, Sr. Developer Relations Engineer

Thousands of developers across the globe are harnessing the power of the Gemini 1.5 Pro and Gemini 1.5 Flash models to infuse advanced generative AI features into their applications. Android developers are no exception, and with the upcoming launch of the stable version of Vertex AI in Firebase in a few weeks (available in Beta since Google I/O), it's the perfect time to explore how your app can benefit from it. We just published a codelab to help you get started.

Let's deep dive into some advanced capabilities of the Gemini API that go beyond simple text prompting and discover the exciting use cases they can unlock in your Android app.

Shaping AI behavior with system instructions

System instructions serve as a "preamble" that you incorporate before the user prompt. This enables shaping the model's behavior to align with your specific requirements and scenarios. You set the instructions when you initialize the model, and then those instructions persist through all interactions with the model, across multiple user and model turns.

For example, you can use system instructions to:

    • Define a persona or role for a chatbot (e.g., “explain like I am 5”)
    • Specify the output format of the response (e.g., Markdown, YAML, etc.)
    • Set the output style and tone (e.g., verbosity, formality, etc.)
    • Define the goals or rules for the task (e.g., “return a code snippet without further explanation”)
    • Provide additional context for the prompt (e.g., a knowledge cutoff date)

To use system instructions in your Android app, pass them as a parameter when you initialize the model:

val generativeModel = Firebase.vertexAI.generativeModel(
  modelName = "gemini-1.5-flash",
  ...
  systemInstruction = 
    content { text("You are a knowledgeable tutor. Answer the questions using the socratic tutoring method.") }
)
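
Once the model is initialized this way, the tutoring persona applies to every turn of a conversation. Here is a minimal sketch; the coroutine scope and the prompt text are illustrative assumptions, not part of the original example:

// Start a multi-turn chat; the system instructions set above apply to every turn.
val chat = generativeModel.startChat()

// sendMessage is a suspend function, so call it from a coroutine scope.
scope.launch {
    val response = chat.sendMessage("Why is the sky blue?")
    println(response.text)
}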

You can learn more about system instructions in the Vertex AI in Firebase documentation.

You can also easily test your prompt with different system instructions in Vertex AI Studio, the Google Cloud console tool for rapidly prototyping and testing prompts with Gemini models.


Vertex AI Studio lets you test system instructions with your prompts

When you are ready to go to production, it is recommended to target a specific version of the model (e.g., gemini-1.5-flash-002). As new model versions are released and previous ones are deprecated, use Firebase Remote Config so you can update the Gemini model version without releasing a new version of your app.
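
For instance, here is a minimal sketch of wiring this up with Firebase Remote Config; the "gemini_model_name" parameter key and its default value are hypothetical names you would define in your own Remote Config template:

val remoteConfig = Firebase.remoteConfig
// Fallback value used until the first successful fetch.
remoteConfig.setDefaultsAsync(mapOf("gemini_model_name" to "gemini-1.5-flash-002"))

remoteConfig.fetchAndActivate().addOnCompleteListener {
    val generativeModel = Firebase.vertexAI.generativeModel(
        modelName = remoteConfig.getString("gemini_model_name")
    )
    // ...
}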

Beyond chatbots: leveraging generative AI for advanced use cases

While chatbots are a popular application of generative AI, the capabilities of the Gemini API go beyond conversational interfaces, and you can integrate multimodal GenAI-enabled features into various aspects of your Android app.

Many tasks that previously required human intervention (such as analyzing text, image, or video content; synthesizing data into a human-readable format; or engaging in a creative process to generate new content) can potentially be automated using GenAI.

Gemini JSON support

Android apps don’t interface well with natural language outputs. Conversely, JSON is ubiquitous in Android development, and provides a more structured way for Android apps to consume input. However, ensuring proper key/value formatting when working with generative models can be challenging.

With the general availability of Vertex AI in Firebase, we've implemented solutions to streamline JSON generation with proper key/value formatting:

Response MIME type identifier

If you have tried generating JSON with a generative AI model, it's likely you have found yourself with unwanted extra text that makes the JSON parsing more challenging.

e.g:

Sure, here is your JSON:
```
{
   "someKey”: “someValue",
   ...
}
```

When using Gemini 1.5 Pro or Gemini 1.5 Flash, you can explicitly specify the model’s response MIME type as application/json in the generation configuration and instruct the model to generate well-structured JSON output.

val generativeModel = Firebase.vertexAI.generativeModel(
  modelName = "gemini-1.5-flash",
  
  generationConfig = generationConfig {
     responseMimeType = "application/json"
  }
)

Review the API reference for more details.
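
Because the response is now constrained to JSON, you can parse response.text directly. Here is a minimal sketch using org.json; the prompt and key names are illustrative:

// Call from a coroutine: generateContent is a suspend function.
val response = generativeModel.generateContent(
    "List three popular Android libraries as a JSON object with a \"libraries\" array."
)

// There is no surrounding prose to strip before parsing.
val json = JSONObject(response.text ?: "{}")
val libraries = json.optJSONArray("libraries")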

Soon, the Android SDK for Vertex AI in Firebase will enable you to define the JSON schema expected in the response.


Multimodal capabilities

Both Gemini 1.5 Flash and Gemini 1.5 Pro are multimodal models. This means they can process input in multiple formats, including text, images, audio, and video. In addition, they both have long context windows: Gemini 1.5 Flash can handle up to 1 million tokens and Gemini 1.5 Pro up to 2 million tokens.

These features open the door to innovative functionalities that were previously inaccessible, such as automatically generating descriptive captions for images, identifying topics in a conversation and generating chapters from an audio file, or describing the scenes and actions in a video file.

You can pass an image to the model as shown in this example:

val contentResolver = applicationContext.contentResolver
contentResolver.openInputStream(imageUri).use { stream ->
  stream?.let {
    val bitmap = BitmapFactory.decodeStream(stream)

    // Provide a prompt that includes the image specified above and text
    val prompt = content {
      image(bitmap)
      text("How many people are on this picture?")
    }

    val response = generativeModel.generateContent(prompt)
  }
}

You can also pass a video to the model:

val contentResolver = applicationContext.contentResolver
contentResolver.openInputStream(videoUri).use { stream ->
  stream?.let {
    val bytes = stream.readBytes()

    // Provide a prompt that includes the video specified above and text
    val prompt = content {
        blob("video/mp4", bytes)
        text("What is in the video?")
    }

    val fullResponse = generativeModel.generateContent(prompt)
  }
}

You can learn more about multimodal prompting in the Vertex AI in Firebase documentation.

Note: This method enables you to pass files up to 20 MB. For larger files, use Cloud Storage for Firebase and include the file’s URL in your multimodal request. Read the documentation for more information.

Function calling: Extending the model's capabilities

Function calling enables you to extend the capabilities of generative models. For example, you can enable the model to retrieve information from your SQL database and feed it back into the context of the prompt. You can also let the model trigger actions by calling functions in your app's source code. In essence, function calling bridges the gap between the Gemini models and your Kotlin code.

Take the example of a food delivery application that wants to implement a conversational interface with Gemini 1.5 Flash. Assume that this application has a getFoodOrder(cuisine: String) function that returns the list of orders from the user for a specific type of cuisine:

fun getFoodOrder(cuisine: String) : JSONObject {
        // implementation…  
}

Note that the function, to be usable by the model, needs to return its response in the form of a JSONObject.
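
For illustration, here is a minimal sketch of such a function backed by hard-coded data; in a real app you would query your database or backend instead:

fun getFoodOrder(cuisine: String): JSONObject {
    // Hypothetical in-memory data standing in for a database or API lookup.
    val ordersByCuisine = mapOf(
        "thai" to listOf("Pad Thai", "Green Curry"),
        "italian" to listOf("Margherita Pizza")
    )
    val orders = ordersByCuisine[cuisine.lowercase()] ?: emptyList()
    return JSONObject()
        .put("cuisine", cuisine)
        .put("orders", JSONArray(orders))
}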

To make the response available to Gemini 1.5 Flash, create a definition of your function that the model will be able to understand using defineFunction:

val getOrderListFunction = defineFunction(
  name = "getOrderList",
  description = "Get the list of food orders from the user for a defined type of cuisine.",
  Schema.str(name = "cuisineType", description = "the type of cuisine for the order")
) { cuisineType ->
  getFoodOrder(cuisineType)
}

Then, when you instantiate the model, share this function definition with the model using the tools parameter:

val generativeModel = Firebase.vertexAI.generativeModel(
    modelName = "gemini-1.5-flash",
    ...
    tools = listOf(Tool(listOf(getOrderListFunction)))
)

Finally, when you get a response from the model, check in the response if the model is actually requesting to execute the function:

// Send the message to the generative model
var response = chat.sendMessage(prompt)

// Check if the model responded with a function call
response.functionCall?.let { functionCall ->
  // Try to retrieve the stored lambda from the model's tools and
  // throw an exception if the returned function was not declared
  val matchedFunction = generativeModel.tools?.flatMap { it.functionDeclarations }
      ?.firstOrNull { it.name == functionCall.name }
      ?: throw IllegalStateException("Function not found: ${functionCall.name}")
  
  // Call the lambda retrieved above
  val apiResponse: JSONObject = matchedFunction.execute(functionCall)

  // Send the API response back to the generative model
  // so that it generates a text response that can be displayed to the user
  response = chat.sendMessage(
    content(role = "function") {
        part(FunctionResponsePart(functionCall.name, apiResponse))
    }
  )
}

// If the model responds with text, show it in the UI
response.text?.let { modelResponse ->
    println(modelResponse)
}

To summarize, you’ll provide the functions (or tools) to the model at initialization:

A flow diagram shows a green box labeled 'Generative Model' connected to a list of model parameters and a list of tools. The parameters include 'gemini-1.5-flash', 'api_key', and 'configuration', while the tools are 'getOrderList()', 'getDate()', and 'placeOrder()'.

And when appropriate, the model will request to execute the appropriate function and provide the results:

A flow diagram illustrating the interaction between an Android app and a 'Generative Model'. The app sends 'getDate()' and 'getOrderList()' requests.

You can read more about function calling in the Vertex AI in Firebase documentation.

Unlocking the potential of the Gemini API in your app

The Gemini API offers a treasure trove of advanced features that empower Android developers to craft truly innovative and engaging applications. By going beyond basic text prompts and exploring the capabilities highlighted in this blog post, you can create AI-powered experiences that delight your users and set your app apart in the competitive Android landscape.

Read more about how some Android apps are already starting to leverage the Gemini API.


To learn more about AI on Android, check out other resources we have available during AI on Android Spotlight Week.

Use the #AndroidAI hashtag to share your creations or feedback on social media, and join us at the forefront of the AI revolution!


The code snippets in this blog post have the following license:

// Copyright 2024 Google LLC.
// SPDX-License-Identifier: Apache-2.0

Pixel’s Proactive Approach to Security: Addressing Vulnerabilities in Cellular Modems

Pixel phones have earned a well-deserved reputation for being security-conscious. In this blog, we'll take a peek under the hood to see how Pixel mitigates common exploits on cellular basebands.

Smartphones have become an integral part of our lives, but few of us think about the complex software that powers them, especially the cellular baseband – the processor on the device responsible for handling all cellular communication (such as LTE, 4G, and 5G). Most smartphones use cellular baseband processors with tight performance constraints, making security hardening difficult. Security researchers have increasingly exploited this attack vector and routinely demonstrated the possibility of exploiting basebands used in popular smartphones.

The good news is that Pixel has been deploying security hardening mitigations in our basebands for years, and Pixel 9 represents the most hardened baseband we've shipped yet. Below, we’ll dive into why this is so important, how specifically we’ve improved security, and what this means for our users.

The Cellular Baseband

The cellular baseband within a smartphone is responsible for managing the device's connectivity to cellular networks. This function inherently involves processing external inputs, which may originate from untrusted sources. For instance, malicious actors can employ false base stations to inject fabricated or manipulated network packets. In certain protocols like IMS (IP Multimedia Subsystem), this can be executed remotely from any global location using an IMS client.

The firmware within the cellular baseband, similar to any software, is susceptible to bugs and errors. In the context of the baseband, these software vulnerabilities pose a significant concern due to the heightened exposure of this component within the device's attack surface. There is ample evidence demonstrating the exploitation of software bugs in modem basebands to achieve remote code execution, highlighting the critical risk associated with such vulnerabilities.

The State of Baseband Security

Baseband security has emerged as a prominent area of research, with demonstrations of software bug exploitation featuring in numerous security conferences. Many of these conferences now also incorporate training sessions dedicated to baseband firmware emulation, analysis, and exploitation techniques.

Recent reports by security researchers have noted that most basebands lack exploit mitigations commonly deployed elsewhere and considered best practices in software development. Mature software hardening techniques that are commonplace in the Android operating system, for example, are often absent from cellular firmwares of many popular smartphones.

There are clear indications that exploit vendors and cyber-espionage firms abuse these vulnerabilities to breach the privacy of individuals without their consent. For example, 0-day exploits in the cellular baseband are being used to deploy the Predator malware in smartphones. Additionally, exploit marketplaces explicitly list baseband exploits, often with relatively low payouts, suggesting a potential abundance of such vulnerabilities. These vulnerabilities allow attackers to gain unauthorized access to a device, execute arbitrary code, escalate privileges, or extract sensitive information.

Recognizing these industry trends, Android and Pixel have proactively updated their Vulnerability Rewards Program in recent years, placing a greater emphasis on identifying and addressing exploitable bugs in connectivity firmware.

Building a Fortress: Proactive Defenses in the Pixel Modem

In response to the rising threat of baseband security attacks, Pixel has incrementally incorporated many of the following proactive defenses over the years, with the Pixel 9 phones (Pixel 9, Pixel 9 Pro, Pixel 9 Pro XL and Pixel 9 Pro Fold) showcasing the latest features:

  • Bounds Sanitizer: Buffer overflows occur when a bug in code allows attackers to cram too much data into a space, causing it to spill over and potentially corrupt other data or execute malicious code. Bounds Sanitizer automatically adds checks around a specific subset of memory accesses to ensure that code does not access memory outside of designated areas, preventing memory corruption.
  • Integer Overflow Sanitizer: Numbers matter, and when they get too large an “overflow” can cause them to be incorrectly interpreted as smaller values. The reverse can also happen: a number can overflow in the negative direction and be incorrectly interpreted as a larger value. These overflows can be exploited by attackers to cause unexpected behavior. Integer Overflow Sanitizer adds checks around these calculations to eliminate the risk of memory corruption from this class of vulnerabilities.
  • Stack Canaries: Stack canaries are like tripwires set up to ensure code executes in the expected order. If a hacker tries to exploit a vulnerability in the stack to change the flow of execution without being mindful of the canary, the canary "trips," alerting the system to a potential attack.
  • Control Flow Integrity (CFI): Similar to stack canaries, CFI makes sure code execution is constrained along a limited number of paths. If an attacker tries to deviate from the allowed set of execution paths, CFI causes the modem to restart rather than take the unallowed execution path.
  • Auto-Initialize Stack Variables: When memory is designated for use, it’s not normally initialized in C/C++ as it is expected the developer will correctly set up the allocated region. When a developer fails to handle this correctly, the uninitialized values can leak sensitive data or be manipulated by attackers to gain code execution. Pixel phones automatically initialize stack variables to zero, preventing this class of vulnerabilities for stack data.

We also leverage a number of bug detection tools, such as address sanitizer, during our testing process. This helps us identify software bugs and patch them prior to shipping devices to our users.

The Pixel Advantage: Combining Protections for Maximum Security

Security hardening is difficult and our work is never done, but when these security measures are combined, they significantly increase Pixel 9’s resilience to baseband attacks.

Pixel's proactive approach to security demonstrates a commitment to protecting its users across the entire software stack. Hardening the cellular baseband against remote attacks is just one example of how Pixel is constantly working to stay ahead of the curve when it comes to security.

Special thanks to our colleagues who supported our cellular baseband hardening efforts: Dominik Maier, Shawn Yang, Sami Tolvanen, Pirama Arumuga Nainar, Stephen Hines, Kevin Deus, Xuan Xing, Eugene Rodionov, Stephan Somogyi, Wes Johnson, Suraj Harjani, Morgan Shen, Valery Wu, Clint Chen, Cheng-Yi He, Estefany Torres, Hungyen Weng, Jerry Hung, Sherif Hanna

PyTorch machine learning models on Android

Posted by Paul Ruiz – Senior Developer Relations Engineer

Earlier this year we launched Google AI Edge, a suite of tools with easy access to ready-to-use ML tasks, frameworks that enable you to build ML pipelines, and run popular LLMs and custom models – all on-device. For AI on Android Spotlight Week, the Google team is highlighting various ways that Android developers can use machine learning to help improve their applications.

In this post, we'll dive into Google AI Edge Torch, which enables you to convert PyTorch models to run locally on Android and other platforms, using the Google AI Edge LiteRT (formerly TensorFlow Lite) and MediaPipe Tasks libraries. For insights on other powerful tools, be sure to explore the rest of the AI on Android Spotlight Week content.

To make getting started with Google AI Edge easier, we've provided samples available on GitHub as an executable codelab. They demonstrate how to convert the MobileViT model for image classification (compatible with MediaPipe Tasks) and the DIS model for segmentation (compatible with LiteRT).

a red Android figurine is shown next to a black and white silhouette of the same figure, labeled 'Original Image' and 'PT Mask' respectively, demonstrating image segmentation.
DIS model output

This blog guides you through how to use the MobileViT model with MediaPipe Tasks. Keep in mind that the LiteRT runtime provides similar capabilities, enabling you to build custom pipelines and features.

Convert MobileViT model for image classification compatible with MediaPipe Tasks

Once you've installed the necessary dependencies and utilities for your app, the first step is to retrieve the PyTorch model you wish to convert, along with any other MobileViT components you might need (such as an image processor for testing).

from transformers import MobileViTImageProcessor, MobileViTForImageClassification

hf_model_path = 'apple/mobilevit-small'
processor = MobileViTImageProcessor.from_pretrained(hf_model_path)
pt_model = MobileViTForImageClassification.from_pretrained(hf_model_path)

Since the end result of this tutorial should work with MediaPipe Tasks, take an extra step to match the expected input and output shapes for image classification to what is used by the MediaPipe image classification Task.

import torch
from torch import nn

class HF2MP_ImageClassificationModelWrapper(nn.Module):

  def __init__(self, hf_image_classification_model, hf_processor):
    super().__init__()
    self.model = hf_image_classification_model
    if hf_processor.do_rescale:
      self.rescale_factor = hf_processor.rescale_factor
    else:
      self.rescale_factor = 1.0

  def forward(self, image: torch.Tensor):
    # BHWC -> BCHW.
    image = image.permute(0, 3, 1, 2)
    # RGB -> BGR.
    image = image.flip(dims=(1,))
    # Scale [0, 255] -> [0, 1].
    image = image * self.rescale_factor
    logits = self.model(pixel_values=image).logits  # [B, 1000] float32.
    # Softmax is required for MediaPipe classification model.
    logits = torch.nn.functional.softmax(logits, dim=-1)

    return logits

hf_model_path = 'apple/mobilevit-small'
hf_mobile_vit_processor = MobileViTImageProcessor.from_pretrained(hf_model_path)
hf_mobile_vit_model = MobileViTForImageClassification.from_pretrained(hf_model_path)
wrapped_pt_model = HF2MP_ImageClassificationModelWrapper(
    hf_mobile_vit_model, hf_mobile_vit_processor).eval()

Whether you plan to use the converted MobileViT model with MediaPipe Tasks or LiteRT, the next step is to convert the model to the .tflite format.

First, match the input shape. In this example, the input shape is 1, 256, 256, 3 for a 256x256 pixel three-channel RGB image.

Then, call AI Edge Torch's convert function to complete the conversion process.

import ai_edge_torch

sample_args = (torch.rand((1, 256, 256, 3)),)
edge_model = ai_edge_torch.convert(wrapped_pt_model, sample_args)

After converting the model, you can further refine it by incorporating metadata for the image classification labels. MediaPipe Tasks will utilize this metadata to display or return pertinent information after classification.

from mediapipe.tasks.python.metadata.metadata_writers import image_classifier
from mediapipe.tasks.python.metadata.metadata_writers import metadata_writer
from mediapipe.tasks.python.vision.image_classifier import ImageClassifier
from pathlib import Path

flatbuffer_file = Path('hf_mobile_vit_mp_image_classification_raw.tflite')
edge_model.export(flatbuffer_file)
tflite_model_buffer = flatbuffer_file.read_bytes()

# Extract the image classification labels from the HF models for later integration into the TFLite model.
labels = list(hf_mobile_vit_model.config.id2label.values())

writer = image_classifier.MetadataWriter.create(
    tflite_model_buffer,
    input_norm_mean=[0.0], #  Normalization is not needed for this model.
    input_norm_std=[1.0],
    labels=metadata_writer.Labels().add(labels),
)
tflite_model_buffer, _ = writer.populate()

With all of that completed, it's time to integrate your model into an Android app. If you're following the official Colab notebook, this involves saving the model locally. For an example of image classification with MediaPipe Tasks, explore the GitHub repository. You can find more information in the official Google AI Edge documentation.

Newly converted ViT model with MediaPipe Tasks
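
As an illustrative sketch, once the .tflite file with metadata is bundled in your app, you can run it on Android with the MediaPipe Tasks ImageClassifier roughly like this; the asset file name is a placeholder, and the bitmap is assumed to come from your own image source:

val options = ImageClassifier.ImageClassifierOptions.builder()
    .setBaseOptions(
        BaseOptions.builder()
            .setModelAssetPath("hf_mobile_vit_mp_image_classification.tflite")
            .build()
    )
    .setMaxResults(3)
    .build()
val classifier = ImageClassifier.createFromOptions(context, options)

// Classify a bitmap (for example, decoded from the camera or an asset).
val result = classifier.classify(BitmapImageBuilder(bitmap).build())
result.classificationResult().classifications().firstOrNull()?.categories()?.forEach { category ->
    println("${category.categoryName()}: ${category.score()}")
}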

After understanding how to convert a simple image classification model, you can use the same techniques to adapt various PyTorch models for Google AI Edge LiteRT or MediaPipe Tasks tooling on Android.

For further model optimization, consider methods like quantizing during conversion. Check out the GitHub example to learn more about how to convert a PyTorch image segmentation model to LiteRT and quantize it.

What's Next

To keep up to date on Google AI Edge developments, look for announcements on the Google for Developers YouTube channel and blog.

We look forward to hearing about how you're using these features in your projects. Use the #AndroidAI hashtag to share your feedback or what you've built on social media, and check out other content in AI on Android Spotlight Week!

How to bring your AI Model to Android devices

Posted by Kateryna Semenova – Senior Developer Relations Engineer and Mark Sherwood – Senior Product Manager

During AI on Android Spotlight Week, we're diving into how you can bring your own AI model to Android-powered devices such as phones, tablets, and beyond. By leveraging the tools and technologies available from Google and other sources, you can run sophisticated AI models directly on these devices, opening up exciting possibilities for better performance, privacy, and usability.

Understanding on-device AI

On-device AI involves deploying and executing machine learning or generative AI models directly on hardware devices, instead of relying on cloud-based servers. This approach offers several advantages, such as reduced latency, enhanced privacy, cost savings, and less dependence on internet connectivity.

For generative text use cases, explore Gemini Nano, which is now available in experimental access through its SDK. For many on-device AI use cases, you might want to package your own models in your app. Today we will walk through how to do so on Android.

Key resources for on-device AI

The Google AI Edge platform provides a comprehensive ecosystem for building and deploying AI models on edge devices. It supports various frameworks and tools, enabling developers to integrate AI capabilities seamlessly into their applications. The Google AI Edge platform consists of:

    • MediaPipe Tasks - Cross-platform low-code APIs to tackle common generative AI, vision, text, and audio tasks
    • LiteRT (formerly known as TensorFlow Lite) - Lightweight runtime for deploying custom machine learning models on Android
    • MediaPipe Framework - Pipeline framework for chaining multiple ML models along with pre and post processing logic


Google AI Edge Logo

How to build custom AI features on Android

    1. Define your use case: Before diving into technical details, it's crucial to clearly define what you want your AI feature to achieve. Whether you're aiming for image classification, natural language processing, or another application, having a well-defined goal will guide your development process.

    2. Choose the right tools and frameworks: Depending on your use case, you might be able to use an out-of-the-box solution, or you might need to create or source your own model. Look through MediaPipe Tasks for common solutions such as gesture recognition, image segmentation or face landmark detection. If you find a solution that aligns with your needs, you can proceed directly to the testing and deployment step.



    If you need to create or source a custom model for your use case, you will need an on-device ML framework such as LiteRT (formerly TensorFlow Lite). LiteRT is designed specifically for mobile and edge devices and provides a lightweight runtime for deploying machine learning models. Simply follow these substeps:

        a. Develop and train your model: Develop your AI model using your chosen framework. Training can be performed on a powerful machine or cloud environment, but the model should be optimized for deployment on a device. Techniques like quantization and pruning can help reduce the model size and improve inference speed. Model Explorer can help understand and explore your model as you're working with it.

        b. Convert and optimize the model: Once your model is trained, convert it to a format suitable for on-device deployment. LiteRT, for example, requires conversion to its specific format. Optimization tools can help reduce the model’s footprint and enhance performance. AI Edge Torch allows you to convert PyTorch models to run locally on Android and other platforms, using Google AI Edge LiteRT and MediaPipe Tasks libraries.

        c. Accelerate your model: You can speed up model inference on Android by using the GPU and NPU. LiteRT’s GPU delegate allows you to run your model on the GPU today. We’re working hard on building the next generation of GPU and NPU delegates that will make your models run even faster and enable more models to run on GPU and NPU. We’d like to invite you to participate in our early access program to try out this new GPU and NPU infrastructure. We will select participants on a rolling basis, so don’t wait to reach out. (A minimal LiteRT example using the GPU delegate follows this list.)

    3. Test and deploy: To ensure that your model delivers the expected performance across various devices, rigorous testing is crucial. Deploy your app to users after completing the testing phase, offering them a seamless and efficient AI experience. We're working on bringing the benefits of Google Play and Android App Bundles to delivering custom ML models for on-device AI features. Play for On-device AI takes the complexity out of launching, targeting, versioning, downloading, and updating on-device models so that you can offer your users a better user experience without compromising your app's size and at no additional cost. Complete this form to express interest in joining the Play for On-device AI early access program.
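
Following up on step 2c, here is a rough sketch of running a converted LiteRT (.tflite) model with the GPU delegate using the TensorFlow Lite APIs; the model file name and tensor sizes are placeholders for your own model:

// Load the converted model from assets (requires the LiteRT/TensorFlow Lite,
// support library, and GPU delegate dependencies).
val modelBuffer = FileUtil.loadMappedFile(context, "my_model.tflite")

val options = Interpreter.Options().apply {
    addDelegate(GpuDelegate())   // run supported operations on the GPU
}
val interpreter = Interpreter(modelBuffer, options)

// Placeholder shapes: adapt to your model's actual input and output tensors.
val input = ByteBuffer.allocateDirect(1 * 256 * 256 * 3 * 4).order(ByteOrder.nativeOrder())
val output = Array(1) { FloatArray(1000) }
interpreter.run(input, output)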

Build trust in AI through privacy and transparency

With the growing role of AI in everyday life, ensuring models run as intended on devices is crucial. We're emphasizing a "zero trust" approach, providing developers with tools to verify device integrity and user control over their data. In the zero trust approach, developers need the ability to make informed decisions about the device's trustworthiness.

The Play Integrity API is recommended for developers looking to verify their app, server requests, and the device environment (and, soon, the recency of security updates on the device). You can call the API at important moments before your app’s backend decides to download and run your models. You can also consider turning on integrity checks for installing your app to reduce your app’s distribution to unknown and untrusted environments.
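
As a rough sketch of that flow using the Play Integrity client library (the nonce provisioning and the sendTokenToServer call are hypothetical pieces of your own backend integration):

val integrityManager = IntegrityManagerFactory.create(applicationContext)

integrityManager.requestIntegrityToken(
    IntegrityTokenRequest.builder()
        .setNonce(nonce)   // nonce obtained from your server for this request
        .build()
).addOnSuccessListener { response ->
    // Your backend decodes the verdict with Google Play before deciding
    // whether to serve the model download to this device.
    sendTokenToServer(response.token())
}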

Play Integrity API makes use of Android Platform Key Attestation to verify hardware components and generate integrity verdicts across the fleet, eliminating the need for most developers to directly integrate different attestation tools and reducing device ecosystem complexity. Developers can use one or both of these tools to assess device security and software integrity before deciding whether to trust a device to run AI models.

Conclusion

Bringing your own AI model to a device involves several steps, from defining your use case to deploying and testing the model. With resources like Google AI Edge, developers have access to powerful tools and insights to make this process smoother and more effective. As on-device AI continues to evolve, leveraging these resources will enable you to create cutting-edge applications that offer enhanced performance, privacy, and user experience. We are currently seeking early access partners to try out some of our latest tools and APIs at Google AI Edge. Simply fill in this form to connect and explore how we can work together to make your vision a reality.

Dive into these resources and start exploring the potential of on-device AI—your next big innovation could be just a model away!

Use the #AndroidAI hashtag to share your feedback or what you've built on social media, and catch up with the rest of the updates being shared during Spotlight Week: AI on Android.

An introduction to privacy and safety for Gemini Nano

Posted by Terence Zhang – Developer Relations Engineer, and Adrien Couque – Software Engineer

AI can enhance the user experience and productivity of Android apps. If you're looking to build GenAI features that benefit from additional data privacy or offline inference, on-device GenAI is a good choice as it processes prompts directly on your device without any server calls.

Gemini Nano is the most efficient model in Google's Gemini family, and Android’s foundational model for running on-device GenAI. It's supported by AICore, a system service that works behind the scenes to centralize the model’s runtime, ensure its safe execution, and protect your privacy. With Gemini Nano, apps can offer more personalized and reliable AI experiences without sending your data off the device.

In this blog post, we'll provide an introductory look into how Gemini Nano and AICore work together to deliver powerful on-device AI capabilities while prioritizing users’ privacy and safety.

Private Compute Core (PCC) compliance

At Google I/O 2021, we introduced Private Compute Core (PCC), a secure environment designed to keep your data private. At I/O in 2024, we shared that AICore is PCC compliant, meaning that it operates under strict privacy rules. It can only interact with a limited set of other system packages that are also PCC compliant, and it cannot directly access the internet. Any requests to download models or other information are routed through a separate, open-source companion APK called Private Compute Services.

This framework helps protect your privacy while still allowing apps to benefit from the power of Gemini Nano. Consider a keyboard application using Gemini Nano for a reply suggestion feature. Without PCC, the keyboard would require direct access to the conversation context. With PCC, the code that has access to the conversation runs in a secure sandbox and interacts directly with Gemini Nano to generate suggestions on behalf of the keyboard. This allows the keyboard app to benefit from Gemini Nano's capabilities without directly accessing or storing sensitive conversation data. You can find out more about how this works in the PCC Whitepaper.

Protecting your privacy through data isolation

AICore is built to isolate each request to protect your privacy. This prevents apps from accessing data that does not belong to them. Requests are handled independently and processed from a single app at a time to mitigate the risk of data being exposed to other apps.

Additionally, AICore doesn't store any record of the input data or the resulting outputs after processing each request. This design, combined with the fact that Gemini Nano’s inference happens directly on your device, helps ensure your app’s data stays private and secure.

Prioritizing Safety in Gemini Nano

A flow chart illustrating the architecture of an AI system, highlighting the flow of data and processing steps from the 'Client app' to the 'Service' component, including 'Input safety signals', 'Output safety signals', 'Weights' and 'Runtime'

We're committed to building AI responsibly, and that includes making sure Gemini Nano is safe. We've implemented multiple layers of protection to limit harmful or unintended results:

    • Native model safety: All Gemini models, including Gemini Nano, are trained to be safety-aware out of the box. This means safety considerations are built into the core of the model, not just added as an afterthought.
    • Safety aware fine-tuning: We use a LoRA fine-tuning block to adapt Gemini Nano for the needs of specific apps. When we train the LoRA block, we incorporate safety data specific to the app’s use case to preserve and even enhance the model's safety features during fine-tuning where applicable.
    • Safety filters on input and output: As a final safeguard, both the input prompt and results generated by the Gemini Nano runtime are evaluated against our safety filters before providing the results to the app. This helps prevent unsafe content from slipping through, without any loss in quality.

These layers of protection work together to ensure that Gemini Nano provides a safe and helpful experience for everyone.


Get started

Learn more about Gemini Nano for app development, and try it out in your own app!

Be sure to check out the other amazing AI on Android Spotlight week content!

Gemini Nano is now available on Android via experimental access

Posted by Taj Darra – Product Manager

Gemini, introduced last year, is Google’s most capable family of models yet; designed for flexibility, it can run on everything from data centers to mobile devices. Since announcing Gemini Nano, our most efficient model built for on-device tasks, we've been working with a limited set of partners to support a range of use cases for their apps.

Today, we’re opening up access to experiment with Gemini Nano to all Android developers with the AI Edge SDK via AICore. Developers will initially have access to experiment with text-to-text prompts on Pixel 9 series devices. Support for more devices and modalities will be added in the future. Check out our documentation and video to get started. Note that experimental access is for development purposes, and is not for production usage at this time.


Fast, private and cost-effective on-device AI

On-device generative AI processes prompts directly on your device without server calls. It offers many benefits: sensitive user data is processed locally on the device, full functionality without internet connectivity, and no additional monetary cost for each inference.

Since on-device generative AI models run on devices with less computational power than cloud servers, they are significantly smaller and less generalized than their cloud-based equivalents. As a result, the model works best for tasks where the requests can be clearly specified rather than open-ended use cases such as chatbots. Here are some use cases you can try:

    • Rephrasing - Rephrasing and rewriting text to change the tone to be more casual or formal.
    • Smart reply - Given several chat messages in a thread, suggest the next likely response.
    • Proofreading - Removing spelling or grammatical errors from text.
    • Summarization - Generating a summary of a long document, either as a paragraph or as bullet points.

Check out our prompting strategies to achieve best results when experimenting with the above use-cases. If you want to test your own use case, you can download our sample app for an easy way to start experimenting with Gemini Nano.


Gemini Nano performance and usage

Compared to its predecessor, the model being made available to developers today (referred to in the academic paper as “Nano 2”) delivers a substantial improvement in quality. At nearly twice the size of the predecessor (“Nano 1”), it excels in both academic benchmarks and real-world applications, offering capabilities that rival much larger models.


           MMLU (5-shot)*    MATH (4-shot)*    Paraphrasing**    Smart Reply**
Nano 1     46%               14%               44%               44%
Nano 2     56%               23%               90%               82%

* As reported in Gemini: A Family of Highly Capable Multimodal Models. Note that both these models are a part of our Gemini 1.0 series.
** Percentage of good answers measured on public datasets via an autorater powered by Gemini 1.5 Pro.

Gemini Nano is already in use by Google apps. Pixel Screenshots, Talkback, Recorder and many more have leveraged Gemini Nano’s text and image understanding to deliver new experiences:

    • Talkback - Android’s accessibility app leverages Gemini Nano’s multimodal capabilities to improve image descriptions for blind and low vision users.
    moving image of Talkback app UI highlighting improved image descriptions with multimodality model for users with low vision

    • Pixel Recorder - Gemini Nano with Multimodality model enables support for longer recordings and higher quality summaries.

Seamless model integration with AI Edge SDK using AICore

Integrating generative AI models directly into mobile apps is challenging due to the significant computational resources and storage space they require. To address this challenge, we developed AICore, a new system service in Android. AICore allows you to benefit from AI running directly on the device without needing to distribute runtimes, models and other components yourself.

To run inference with Gemini Nano in AICore, you use the AI Edge SDK. The AI Edge SDK enables developers to customize prompts and inference parameters to their specific needs, enabling greater control over each inference.

To experiment with the AI Edge SDK, add the following to your app’s dependencies:

implementation("com.google.ai.edge.aicore:aicore:0.0.1-exp01")

The AI Edge SDK allows you to customize inference parameters. Some of the more commonly used parameters include the following (see the configuration sketch after this list):

    • Temperature, which controls randomness. Higher values increase diversity and creativity of output.
    • Top K, which specifies how many tokens from the highest-ranking ones are to be considered.
    • Candidate count, which describes the maximum number of responses to return.
    • Max output tokens, which is the length of the desired response.
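
For example, here is a rough sketch of setting these parameters when creating the model with the AI Edge SDK; the exact builder fields should be checked against the integration guide, and the values shown are arbitrary:

val generationConfig = generationConfig {
    context = applicationContext   // application context required by the on-device runtime
    temperature = 0.2f             // lower values reduce randomness
    topK = 16                      // consider only the 16 highest-ranking tokens
    candidateCount = 1             // number of responses to return
    maxOutputTokens = 256          // cap the length of the response
}

val generativeModel = GenerativeModel(generationConfig = generationConfig)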

When you are ready to run the inference with your model, the AI Edge SDK offers an easy way to pass in multiple strings as input to accommodate long inference data.

Here’s an example:

scope.launch {
    // Single string input prompt
    val input = "I want you to act as an English proofreader. I will " +
        "provide you texts, and I would like you to review them for any " +
        "spelling, grammar, or punctuation errors. Once you have finished " +
        "reviewing the text, provide me with any necessary corrections or " +
        "suggestions for improving the text: " +
        "These arent the droids your looking for."
    val singleResponse = generativeModel.generateContent(input)
    print(singleResponse.text)

    // Or multiple strings as input
    val multiResponse = generativeModel.generateContent(
        content {
            text("I want you to act as an English proofreader. I will " +
                "provide you texts and I would like you to review them for " +
                "any spelling, grammar, or punctuation errors.")
            text("Once you have finished reviewing the text, " +
                "provide me with any necessary corrections or suggestions " +
                "for improving the text:")
            text("These arent the droids your looking for.")
        }
    )
    print(multiResponse.text)
}

Our integration guide has more information on the AI Edge SDK as well as detailed instructions to start your experimentation with Gemini Nano. To learn more about prompting, check out the Gemini prompting strategies.


Get Started

Learn more about Gemini Nano for app development by watching our video walkthrough, and try out Gemini Nano experimental access in your own app today.

We are excited to see what you build and welcome your input as you evaluate this new technology for your use cases! Post your creations on social media and include the hashtag #AndroidAI to share what you build. To share your ideas and feedback for on-device GenAI and help shape our APIs, you can file a ticket.

There’s a lot more that we’re covering this week for you to build great AI experiences on Android so be sure to check out the rest of the AI on Android Spotlight Week content!

Welcome to AI on Android Spotlight Week

Posted by Joseph Lewis – Technical Writer, Android AI

AI on Android Spotlight Week this year runs September 30th to October 4th! As part of the Android “Spotlight Weeks” series, this week’s content and updates are your gateway to understanding how to integrate cutting-edge AI into your Android apps. Whether you're a seasoned Android developer, an AI enthusiast, or just starting out on your development journey, get ready for a week filled with insightful sessions, practical demos, and inspiring success stories that'll help you build intuitive and powerful AI integrations.

Throughout the week, we'll dive into the core technologies driving AI experiences on Android. This blog will be updated throughout the week with links to new announcements and resources, so check back here daily for updates!


Monday: Getting started with AI

September 30, 2024

Learn how to begin with AI on Android development. Understand which AI models and versions you can work with. Learn about developer tools to help you start building features empowered with AI.

We'll guide you through the differences between traditional programming and machine learning, and contrast traditional machine learning with generative AI. The post explains large language models (LLMs), the transformer architecture, and key concepts like context windows and embeddings. It also touches on fine-tuning and the future of LLMs on Android.

Read the blog post: A quick introduction to large language models for Android Developers

We'll then provide a look behind the scenes at our work improving developer productivity with Gemini in Android Studio. We'll discuss Studio's new AI code completion feature, how we've been working to improve the accuracy and quality of suggested code, and how this feature can benefit your workflow.

Read the blog post: Gemini in Android Studio: Code Completion gains powerful model improvements


Tuesday: On-device AI capabilities with Gemini Nano

October 1, 2024

Discover how Gemini Nano empowers Android developers to unlock the full potential of generative AI, offering personalization and privacy benefits for next-generation apps. We'll share how you can begin integrating your Android apps with on-device LLMs. Look for more information and announcements here on Tuesday!


Wednesday: On-device AI with custom models

October 2, 2024

On Wednesday, we'll help you understand how to bring your own AI model to Android devices, and how you can integrate tools and technologies from Google and other sources. The ability to run sophisticated AI models directly on devices – whether it's a smartphone, tablet, or embedded system – opens up exciting possibilities for better performance, privacy, usability, and cost efficiency.

We'll also give you a detailed walkthrough of how Android developers can leverage Google AI Edge Torch to convert PyTorch machine learning models for on-device execution, using the LiteRT and MediaPipe Tasks libraries. This walkthrough includes code examples and explanations for converting a MobileViT model for image classification and a DIS model for segmentation, and highlights the steps involved in preparing these models for seamless integration into Android applications. By following this guide, developers can harness PyTorch models to enhance their Android apps with advanced machine learning capabilities.


Thursday: Access cloud models with Android SDKs

October 3, 2024

Tap into the boundless potential of Gemini 1.5 Pro and Gemini 1.5 Flash, the revolutionary generative AI models that are redefining the capabilities of Android apps. With Gemini 1.5 Pro and 1.5 Flash, you'll have the tools you need to create apps that are truly intelligent and interactive.

On Thursday, we'll give you a codelab that'll help you understand how to integrate the Gemini API capabilities into your Android projects. We'll guide you through crafting effective prompts and integrating Vertex AI in Firebase. By the end of this hands-on tutorial, you'll be able to implement features like text summarization in your own app, all powered by the cutting-edge Gemini models.
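To give you a flavor of what the codelab covers, here is a minimal sketch of a summarization call using the Vertex AI in Firebase Kotlin SDK; the model name and prompt are examples you would adapt to your own app:

import com.google.firebase.Firebase
import com.google.firebase.vertexai.vertexAI

suspend fun summarizeArticle(article: String): String? {
  // Example model name; in production, manage the model version with Firebase Remote Config.
  val model = Firebase.vertexAI.generativeModel(modelName = "gemini-1.5-flash")

  // Ask the model for a short summary of the provided text.
  val response = model.generateContent(
    "Summarize the following article in three sentences:\n$article"
  )
  return response.text
}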

Next we'll publish a blog post exploring the potential of the Gemini API with case studies. We'll delve into how Android developers are leveraging generative AI capabilities in innovative ways, showcasing real-world examples of apps that have successfully integrated the Gemini API. From meal planning to journaling and personalized user experiences, the article highlights how Android developers are already taking advantage of Gemini's transformative capabilities in their apps.

We'll also share examples of advanced features of the Gemini API that go beyond simple text prompting. You'll learn how system instructions can shape the model's behavior, how JSON support streamlines development, and how multimodal capabilities and function calling can unlock exciting new use cases for your apps.


Friday: Build with AI on Android and beyond

October 4, 2024

As the capstone for AI on Android Spotlight Week, we'll host a discussion with Kateryna Semenova, Oli Gaymond, Miguel Ramos, and Khanh LeViet to talk about building with AI on Android. We'll explore the latest AI advancements tailored for Android engineers, showcasing how these technologies can elevate your app development game. Through engaging discussions and real-world examples, we will unveil the potential of AI, from fast, private on-device solutions using Gemini Nano to the powerful capabilities of Gemini 1.5 Flash and Pro. We'll discuss building generative AI solutions rapidly using Vertex AI in Firebase. And we'll dive into harnessing the power of AI with safety and privacy in mind.


Work with Gemini beyond Android

As we wrap things up for AI on Android Spotlight Week, know that we're striving to provide comprehensive AI solutions for cross-platform Gemini development. The AI capabilities showcased during the week can extend to other platforms, such as built-in AI in Chrome. Web developers can leverage similar tools and techniques to create web experiences enhanced by AI. You can run Gemini Pro in the cloud for natural language processing and other complex user journeys, or explore the benefits of performing AI inference client-side with Gemini Nano in Chrome.

Build with usability and privacy in mind

As you embark on your AI development journey, we want you to keep in mind a few important considerations:

    • Privacy: Prioritize user privacy and data security when implementing AI features, especially when handling sensitive user information. When it becomes available, opt for on-device AI solutions like Gemini Nano whenever possible to minimize data exposure.
    • Seamless user experience: Ensure that AI features seamlessly integrate into your app's overall user experience. AI should enhance the user experience, not disrupt it.
    • Ethical considerations: Ensure that AI technologies are developed and deployed in a way that benefits society while minimizing potential harm. By considering fairness, transparency, privacy, accountability, and societal impact, developers can play a vital role in creating a future where AI serves humanity's best interests. Be mindful of the ethical implications of AI, such as potential biases in your AI models, and strive to create AI-powered features that are fair and inclusive.

AI on Android Spotlight Week is an opportunity to explore the latest in AI and its potential for Android app development. We encourage you to delve into the wealth of resources shared during the week and begin experimenting with AI in your own projects. The future of Android is rooted in AI and machine learning, and with the tools and knowledge shared during Android AI Week, developers are well-equipped to build the next generation of AI-powered apps.


What's next

Come back to this blog post for updates; we’ll add links to blog and video content and more throughout the week. Follow Android Developers on X and LinkedIn, use the hashtag #AndroidAI to share your AI-powered Android creations, and join the vibrant community of developers pushing the boundaries of mobile AI.

Gemini in Android Studio: Code Completion Gains Powerful Model Improvements

Posted by Sandhya Mohan – Product Manager, Android Studio and Sarmad Hashmi – Software Engineer, Labs

The Android team believes AI has the potential to revolutionize coding and drive unprecedented innovation and productivity in software development. AI code completion is a key part of this effort within Gemini in Android Studio.

Since launching in May 2024, we've been hard at work improving this feature to provide the best possible experience for all Android developers. In this post, we want to take you “under the hood” to show how we achieved a 40% relative increase in acceptance rate since release, and share some of the ways we've seen Android developers use this feature. We hope you'll give it a try and let us know what you think.


An AI coding companion for every developer

Our vision for Gemini in Android Studio is to empower developers to build high quality Android apps — making it easy for developers to quickly write correct code aligned with Android's best practices. Launched last year, the first version of Studio Bot provided a chat experience where developers could access Android-specific guidance, powered by Google's latest AI models. Developers are able to ask Gemini in Android Studio to provide developer guidance, summarize technical documentation, and critique their Android code. But in all these cases the feedback is reactive, responding to a user's question.

AI code completion takes these capabilities a step further by providing real-time assistance as you work: it thinks ahead and suggests the next few lines of code you are likely to type, based on the context of the surrounding file and what you just typed. You can think of AI Code Completion as a partner in your work — a coding companion waiting to offer guidance when you need it.

This feature is particularly well suited for tasks like defining business logic, creating database schemas, making network requests, or even writing tests — tasks that are often time-consuming and distract from building the core experience for your app. Many developers have told us how much they enjoy the speed AI completions brings to their app development workflow.

A moving image demonstrating AI autocomplete in Android Studio

Bringing more intelligent code completion to Android development

While we are excited to see how AI Code Completions have improved developers’ workflows, we know there's still more we can do to improve developer productivity. Development of Gemini in Android Studio is an ongoing, large-scale collaborative effort by many teams across Google. Earlier this year, we switched to Gemini 1.5 models and saw a significant improvement in the quality of code completions, resulting in a 2x increase in our developer productivity metrics, including overall acceptance rate for suggestions.

Once we started running A/B experiments to improve AI code completion, we identified several opportunities around model quality, context, and heuristics. This overall effort led to a 40% relative increase in acceptance rate — how often users accept the AI's proposed code suggestions — since we launched. Since then, we've been exploring improvements like:

    • Retrieval augmentation: With your opt-in consent, we use the files and dependencies most relevant to your current coding context to enhance the accuracy of suggestions. This is just the first step and we're continuing to experiment with adding even more context from the IDE as part of each request.
    • Filtering out low-confidence completions: We prioritize showing high-quality suggestions where they are most relevant, and therefore most likely to be accepted. We do this by combining the probabilities returned by the model with a classifier trained to identify high-quality completions based on developer feedback (a hypothetical sketch of this kind of gating follows after this list).
    • Smarter post-processing: The LLM's output for AI Code Completion is fundamentally different from the output users expect in a chat session. Responses need to be tightly scoped in order to quickly output useful code, without surrounding expository text. We apply additional heuristics on the model output to ensure responses are concise and accurate, as well as making sure that the generated code is valid within the context of the user's codebase.
    • Improved models: We use opt-in feedback from Android Studio users, such as noting when a code suggestion is accepted or rejected, to adapt the code completion model to their coding style and preferences over time. We regularly ship new models with higher quality data based on your feedback.
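To make the low-confidence filtering idea more concrete, here is a purely hypothetical Kotlin sketch of how a completion candidate might be gated by a model probability and a classifier score. It only illustrates the concept; it is not Android Studio's actual implementation, and all names and thresholds are invented.

// Hypothetical illustration only; not Android Studio's implementation.
data class CompletionCandidate(
  val code: String,
  val modelProbability: Double,  // likelihood reported by the completion model
  val classifierScore: Double    // score from a quality classifier trained on feedback
)

// Invented thresholds, for illustration only.
private const val MIN_PROBABILITY = 0.35
private const val MIN_QUALITY_SCORE = 0.6

fun shouldShowCompletion(candidate: CompletionCandidate): Boolean =
  candidate.modelProbability >= MIN_PROBABILITY &&
    candidate.classifierScore >= MIN_QUALITY_SCORE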

We are also exploring metrics beyond acceptance rate to better measure AI impact on developer velocity, such as the percentage of total code written by AI.


Try it out!

We are rolling out these successful experiments and others as quickly as possible.

If you haven't tried AI code completions yet, you can enable this feature by clicking on the Gemini button in your editor window and signing in to your Google account.

A screenshot of Android Studio with a pop-up notification about the Gemini AI coding companion. The notification explains that Gemini is a free feature in preview and requires a Google account login to use.
Figure 1. Launching Gemini in Android Studio for the first time

After doing so, navigate to Settings > Tools > Gemini and select "Enable AI-based inline code completions".

A screenshot of the settings menu within Android Studio, with the 'Gemini' section expanded showing options related to the AI coding companion, including privacy and context awareness.
Figure 2. Enabling "AI-based inline code completions"

As always, Google is committed to the responsible use of AI. Android Studio won't send any of your source code to servers without your consent — which means you'll need to opt-in to enable Gemini's developer assistance features in Android Studio. You can read more on Gemini in Android Studio's commitment to privacy.

Try enabling AI Code Completions in your project and tell us what you think on social media with #AndroidGeminiEra. We're excited to see how these enhancements help you build amazing apps!


This blog post is part of our series: AI on Android Spotlight Week, where we provide resources — blog posts, videos, sample code, and more — all designed to explore the latest in AI and its potential for Android app development.

Quick introduction to Large Language Models for Android developers

Posted by Thomas Ezan, Sr Developer Relation Engineer

Android has supported traditional machine learning models for years. Frameworks and SDKs like LiteRT (formerly known as TensorFlow Lite), ML Kit and MediaPipe enabled developers to easily implement tasks like image classification and object detection.

In recent years, generative AI (gen AI) and large language models (LLMs) have opened up new possibilities for language understanding and text generation. We have lowered the barriers to integrating gen AI features into your apps, and this blog post will provide you with the high-level knowledge you need to get started.

Before we dive into the specifics of generative AI models, let’s take a high-level look at how machine learning (ML) differs from traditional programming.


Machine learning as a new programming paradigm

A key difference between traditional programming and ML lies in how solutions are implemented.

In traditional programming, developers write explicit algorithms that take input and produce a desired output.

A flow chart showing the process of machine learning model training. Input data is fed into the training process, resulting in a trained ML model

Machine learning takes a different approach: developers provide a large set of previously collected input data and the corresponding output, and the ML model is trained to learn how to map the input to the output.

A flow chart illustrating the machine learning model training. This step is labeled above the process '1. Train the model with a large set of input and output data'. Below, arrows labeled 'Input' and 'Output' point to a green box labeled 'ML Model Training'.  Another arrow points away from the box and is labeled 'ML Model'.

Then, the model is deployed on the Cloud or on-device to process input data. This step is called inference.

A flow chart illustrating ML inference. This step is labeled above the process '2. Deploy the model to run inferences on input data'. Below, an arrow labeled 'Input' points to a green box labeled 'Run ML Inference'. Another arrow points away from the box and is labeled 'Output'.

This paradigm enables developers to tackle problems that were previously difficult or impossible to solve with rule-based programming.
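As a toy illustration of the shift, compare a hand-written rule with a call into a trained model. The SpamClassifier interface below is hypothetical and simply stands in for any on-device or cloud model your app might call.

// Traditional programming: the developer writes the rule explicitly.
fun isSpamRuleBased(message: String): Boolean =
  message.contains("win a prize", ignoreCase = true) ||
    message.count { it == '!' } > 3

// Machine learning: the mapping from input to output was learned from examples,
// and the app only runs inference. SpamClassifier is a hypothetical wrapper
// around a trained model (for example, a LiteRT model bundled with the app).
interface SpamClassifier {
  fun predict(message: String): Float  // probability that the message is spam
}

fun isSpamLearned(message: String, classifier: SpamClassifier): Boolean =
  classifier.predict(message) > 0.5f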


Traditional machine learning vs. generative AI on Android

Traditional ML on Android includes tasks such as image classification, which can be implemented using MobileNet and LiteRT, or pose estimation, which can be easily added to your Android app with the ML Kit SDK. These models are often trained on specific datasets and perform extremely well on well-defined, narrow tasks.
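For instance, here is a minimal sketch of on-device image labeling with ML Kit, assuming the com.google.mlkit:image-labeling dependency has been added to your project:

import android.graphics.Bitmap
import android.util.Log
import com.google.mlkit.vision.common.InputImage
import com.google.mlkit.vision.label.ImageLabeling
import com.google.mlkit.vision.label.defaults.ImageLabelerOptions

fun labelImage(bitmap: Bitmap) {
  // Wrap the bitmap so ML Kit can process it (0 = no rotation).
  val image = InputImage.fromBitmap(bitmap, 0)
  val labeler = ImageLabeling.getClient(ImageLabelerOptions.DEFAULT_OPTIONS)

  labeler.process(image)
    .addOnSuccessListener { labels ->
      // Each label comes with a confidence score between 0 and 1.
      labels.forEach { Log.d("ImageLabeling", "${it.text}: ${it.confidence}") }
    }
    .addOnFailureListener { e -> Log.e("ImageLabeling", "Labeling failed", e) }
}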

Generative AI introduces the capability to understand inputs such as text, images, audio and video and generate human-like responses. This enables applications like chatbots, language translation, text summarization, image captioning, image or code generation, creative writing assistance, and much more.

Most state-of-the-art generative AI models, such as the Gemini models, are built on the transformer architecture. To generate images, diffusion models are often used.


Understanding large language models

At its core, an LLM is a neural network model trained on massive amounts of text data. It learns patterns, grammar, and semantic relationships between words and phrases, enabling it to predict and generate text that mimics human language.

As mentioned earlier, most recent LLMs use the transformer architecture. It breaks down input into tokens, assigns numerical representations called “embeddings” (see Key concepts below) to these tokens, and then processes these embeddings through multiple layers of the neural network to understand the context and meaning.

LLMs typically go through two main phases of training:

      1. Pre-training phase: The model is exposed to vast amounts of text from different sources to learn general language patterns and knowledge.

      2. Fine-tuning phase: The model is trained on specific tasks and datasets to refine its performance for particular applications.


Classes of models and their capabilities

Gen AI models come in various sizes, from smaller models like Gemini Nano or Gemma 2 2B, to massive models like Gemini 1.5 Pro that run on Google Cloud. The size of a model generally correlates with the capabilities and compute power required to run it.

Models are constantly evolving, with new research pushing the boundaries of their capabilities. These models are being evaluated on tasks like question answering, code generation, and creative writing, demonstrating impressive results.

In addition, some models are multimodal, which means they are designed to process and understand information from multiple modalities, such as images, audio, and video, alongside text. This allows them to tackle a wider range of tasks, including image captioning, visual question answering, and audio transcription. Several Google generative AI models, such as Gemini 1.5 Flash, Gemini 1.5 Pro, Gemini Nano with Multimodality, and PaliGemma, are multimodal.


Key concepts

Context Window

Context window refers to the number of tokens (converted from text, image, audio or video) the model considers when generating a response. For chat use cases, it includes both the current input and a history of past interactions. For reference, 100 tokens is equal to about 60-80 English words. Gemini 1.5 Pro currently supports a 2 million token context window, which is enough to fit all seven Harry Potter books… and more!
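In a chat use case, the history you pass to the model is what fills up the context window. Here is a minimal sketch using the Vertex AI in Firebase Kotlin SDK; the history turns are invented examples, and every one of them consumes tokens alongside the new question:

import com.google.firebase.Firebase
import com.google.firebase.vertexai.vertexAI
import com.google.firebase.vertexai.type.content

suspend fun askWithHistory(question: String): String? {
  val model = Firebase.vertexAI.generativeModel(modelName = "gemini-1.5-flash")

  // Each past user and model turn counts toward the context window,
  // together with the new question sent below.
  val chat = model.startChat(
    history = listOf(
      content(role = "user") { text("Hello, I'm planning a trip to Japan.") },
      content(role = "model") { text("Great! What would you like to know?") }
    )
  )
  return chat.sendMessage(question).text
}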

Embeddings

Embeddings are multidimensional numerical representations of tokens that accurately encode their semantic meaning and relationships within a given vector space. Words with similar meanings are closer together, while words with opposite meanings are farther apart.

The embedding process is a key component of an LLM. You can try it independently using MediaPipe Text Embedder for Android. It can be used to identify relations between words and sentences and implement a simplified semantic search directly on-device.

A 3-D graph plots 'Man' and 'King' in blue and 'Woman' and 'Queen' in green, with arrows pointing from 'Man' to 'Woman' and from 'King' to 'Queen'.
A (very) simplified representation of the embeddings for the words “king”, “queen”, “man” and “woman”
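If you want to experiment with on-device embeddings, here is a minimal sketch using the MediaPipe Text Embedder task. It assumes the MediaPipe tasks-text dependency is added to your project and that a compatible embedding model is bundled with your app (the asset name below is an example):

import android.content.Context
import com.google.mediapipe.tasks.core.BaseOptions
import com.google.mediapipe.tasks.text.textembedder.TextEmbedder
import com.google.mediapipe.tasks.text.textembedder.TextEmbedder.TextEmbedderOptions

fun semanticSimilarity(context: Context, first: String, second: String): Double {
  // "universal_sentence_encoder.tflite" is an example asset name.
  val options = TextEmbedderOptions.builder()
    .setBaseOptions(
      BaseOptions.builder().setModelAssetPath("universal_sentence_encoder.tflite").build()
    )
    .build()
  val embedder = TextEmbedder.createFromOptions(context, options)

  val firstEmbedding = embedder.embed(first).embeddingResult().embeddings().first()
  val secondEmbedding = embedder.embed(second).embeddingResult().embeddings().first()

  // Cosine similarity close to 1.0 means the two texts are semantically similar.
  val similarity = TextEmbedder.cosineSimilarity(firstEmbedding, secondEmbedding)
  embedder.close()
  return similarity
}

A similarity score computed this way is the building block for the kind of simplified on-device semantic search mentioned above.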

Top-K, Top-P and Temperature

Parameters like Top-K, Top-P and Temperature enable you to control the creativity of the model and the randomness of its output.

Top-K limits the number of tokens considered for output. For example, a Top-K of 3 keeps the three most probable tokens. Increasing the Top-K value increases the randomness of the model's response (learn about the Top-K parameter).

Then, defining the Top-P value adds another step of filtering. Tokens with the highest probabilities are selected until their sum equals the Top-P value. Lower Top-P values result in less random responses, and higher values result in more random responses (learn about Top-P parameter).

Finally, the Temperature controls how randomly the remaining tokens are selected. Lower temperatures are good for prompts that require a more deterministic and less open-ended or creative response, while higher temperatures can lead to more diverse or creative results (learn about Temperature).
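With Vertex AI in Firebase, you can set these parameters through the generation config when you initialize the model. The values below are arbitrary examples for this sketch, not recommended settings:

import com.google.firebase.Firebase
import com.google.firebase.vertexai.vertexAI
import com.google.firebase.vertexai.type.generationConfig

val focusedModel = Firebase.vertexAI.generativeModel(
  modelName = "gemini-1.5-flash",
  generationConfig = generationConfig {
    temperature = 0.2f  // lower temperature: more deterministic output
    topK = 20           // consider only the 20 most probable tokens
    topP = 0.8f         // keep tokens until their probabilities sum to 0.8
  }
)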

Fine-tuning

Iterating over several versions of a prompt to achieve an optimal response from the model for your use-case isn’t always enough. The next step is to fine-tune the model by re-training it with data specific to your use-case. You will then obtain a model customized to your application.

More specifically, Low rank adaptation (LoRA) is a fine-tuning technique that makes LLM training much faster and more memory-efficient while maintaining the quality of the model outputs. The process to fine-tune open models via LoRA is well documented. See, for example, how you can fine-tune Gemini models through Google AI Studio without advanced ML expertise. You can also fine-tune Gemma models using the KerasNLP library.


The future of generative AI on Android

With ongoing research and optimization of LLMs for mobile devices, we can expect even more innovative gen AI enabled features coming to Android soon. In the meantime check out other AI on Android Spotlight Week blog posts, and go to the Android AI documentation to learn more about how to power your apps with gen AI capabilities!