
It’s time for developers and enterprises to build with Gemini Pro

Posted by Jeanine Banks – VP/GM, Developer X and Developer Relations, and Burak Gokturk – VP/GM, Cloud AI and Industry Solutions

Learn more about how to integrate Gemini Pro into your app or business at ai.google.dev

This article is also published on the Keyword blog.

Last week, we announced Gemini, our largest and most capable AI model and the next step in our journey to make AI more helpful for everyone. It comes in three sizes: Ultra, Pro and Nano. We've already started rolling out Gemini in our products: Gemini Nano is in Android, starting with Pixel 8 Pro, and a specifically tuned version of Gemini Pro is in Bard.

Today, we’re making Gemini Pro available for developers and enterprises to build for your own use cases, and we’ll be further fine-tuning it in the weeks and months ahead as we listen and learn from your feedback.


Gemini Pro is available today

The first version of Gemini Pro is now accessible via the Gemini API and here’s more about it:

  • Gemini Pro outperforms other similarly sized models on research benchmarks.
  • Today’s version comes with a 32K context window for text, and future versions will have a larger context window.
  • It’s free to use right now, within limits, and it will be competitively priced.
  • It comes with a range of features: function calling, embeddings, semantic retrieval and custom knowledge grounding, and chat functionality.
  • It supports 38 languages across 180+ countries and territories worldwide.
  • In today’s release, Gemini Pro accepts text as input and generates text as output. We’ve also made a dedicated Gemini Pro Vision multimodal endpoint available today that accepts text and imagery as input, with text output.
  • SDKs are available for Gemini Pro to help you build apps that run anywhere. Python, Android (Kotlin), Node.js, Swift and JavaScript are all supported; a minimal Kotlin sketch follows the screenshot below.
A screenshot of a code snippet illustrating the SDKs supporting Gemini.
Gemini Pro has SDKs that help you build apps that run anywhere.
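
For instance, here’s a minimal sketch of a first call to Gemini Pro with the Android (Kotlin) SDK; it assumes the com.google.ai.client.generativeai client and an API key from Google AI Studio, and the prompt text is our own:

    import com.google.ai.client.generativeai.GenerativeModel

    // Create a client for the text-only Gemini Pro model and send a prompt.
    suspend fun firstPrompt(apiKey: String): String? {
        val model = GenerativeModel(
            modelName = "gemini-pro",
            apiKey = apiKey
        )
        val response = model.generateContent("Write a haiku about multimodal AI.")
        return response.text  // generated text, or null if nothing was returned
    }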

Google AI Studio: The fastest way to build with Gemini

Google AI Studio is a free, web-based developer tool that enables you to quickly develop prompts and then get an API key to use in your app development. You can sign into Google AI Studio with your Google account and take advantage of the free quota, which allows 60 requests per minute — 20x more than other free offerings. When you’re ready, you can simply click on “Get code” to transfer your work to your IDE of choice, or use one of the quickstart templates available in Android Studio, Colab or Project IDX. To help us improve product quality, when you use the free quota, your API and Google AI Studio input and output may be accessible to trained reviewers. This data is de-identified from your Google account and API key.

A screen recording of a developer using Google AI Studio.
Google AI Studio is a free, web-based developer tool that enables you to quickly develop prompts and then get an API key to use in your app development.

Build with Vertex AI on Google Cloud

When you’re ready for a fully managed AI platform, you can easily transition from Google AI Studio to Vertex AI, which lets you customize Gemini with full data control and adds Google Cloud features for enterprise security, safety, privacy, and data governance and compliance.

With Vertex AI, you will have access to the same Gemini models, and will be able to:

  • Tune and distill Gemini with your own company’s data, and augment it with grounding to include up-to-the-minute information and extensions to take real-world actions.
  • Build Gemini-powered search and conversational agents in a low code / no code environment, including support for retrieval-augmented generation (RAG), blended search, embeddings, conversation playbooks and more.
  • Deploy with confidence. We never train our models on inputs or outputs from Google Cloud customers. Your data and IP are always your data and IP.

To read more about our new Vertex AI capabilities, visit the Google Cloud blog.


Gemini Pro pricing

Right now, developers have free access to Gemini Pro and Gemini Pro Vision through Google AI Studio, with up to 60 requests per minute, making it suitable for most app development needs. Vertex AI developers can try the same models, with the same rate limits, at no cost until general availability early next year, after which there will be a charge per 1,000 characters or per image across Google AI Studio and Vertex AI.

A screenshot of input and output prices for Gemini Pro.
Big impact, small price: Because of our investments in TPUs, Gemini Pro can be served more efficiently.

Looking ahead

We’re excited that Gemini is now available to developers and enterprises. As we continue to fine-tune it, your feedback will help us improve. You can learn more and start building with Gemini on ai.google.dev, or use Vertex AI’s robust capabilities on your own data with enterprise-grade controls.

Early next year, we’ll launch Gemini Ultra, our largest and most capable model for highly complex tasks, after further fine-tuning, safety testing and gathering valuable feedback from partners. We’ll also bring Gemini to more of our developer platforms like Chrome and Firebase.

We’re excited to see what you build with Gemini.

A New Foundation for AI on Android

Posted by Dave Burke, VP of Engineering

Foundation Models learn from a diverse range of data sources to produce AI systems capable of adapting to a wide range of tasks, instead of being trained for a single narrow use case. Today, we announced Gemini, our most capable model yet. Gemini was designed for flexibility, so it can run on everything from data centers to mobile devices. It's been optimized for three different sizes: Ultra, Pro and Nano.

Gemini Nano, optimized for mobile

Gemini Nano, our most efficient model built for on-device tasks, runs directly on mobile silicon, opening support for a range of important use cases. Running on-device enables features where the data should not leave the device, such as suggesting replies to messages in an end-to-end encrypted messaging app. It also enables consistent experiences with deterministic latency, so features are always available even when there’s no network.

Gemini Nano is distilled down from the larger Gemini models and specifically optimized to run on mobile silicon accelerators. Gemini Nano enables powerful capabilities such as high quality text summarization, contextual smart replies, and advanced proofreading and grammar correction. For example, the enhanced language understanding of Gemini Nano enables the Pixel 8 Pro to concisely summarize content in the Recorder app, even when the phone’s network connection is offline.

Moving image of Gemini Nano being used in the Recorder app on a Pixel 8 Pro device
Pixel 8 Pro using Gemini Nano in the Recorder app to summarize meeting audio, even without a network connection.

Gemini Nano is starting to power Smart Reply in Gboard on Pixel 8 Pro, ready to be enabled in settings as a developer preview. Available now to try with WhatsApp and coming to more apps next year, the on-device AI model saves you time by suggesting high-quality responses with conversational awareness.¹

Moving image of WhatsApp’s use of Smart Reply in Gboard using Gemini Nano on Pixel 8 Pro device
Smart Reply in Gboard within WhatsApp using Gemini Nano on Pixel 8 Pro.

Android AICore, a new system service for on-device foundation models

Android AICore is a new system service in Android 14 that provides easy access to Gemini Nano. AICore handles model management, runtimes, safety features and more, simplifying the work for you to incorporate AI into your apps.

AICore is private by design, following the example of Android’s Private Compute Core with isolation from the network via open-source APIs, providing transparency and auditability. As part of our efforts to build and deploy AI responsibly, we also built dedicated safety features to make it safer and more inclusive for everyone.

AICore architecture
AICore manages model, runtime and safety features.

AICore enables Low Rank Adaptation (LoRA) fine-tuning with Gemini Nano. This powerful technique lets app developers create small LoRA adapters based on their own training data. AICore loads the adapter at runtime, resulting in a large language model fine-tuned for the app’s own use cases.
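
As general background on the technique (a sketch of LoRA itself, not a statement about AICore’s internals): LoRA freezes the base weights and learns only a low-rank update, which is why adapters stay small relative to the full model:

    W' = W + \Delta W, \qquad \Delta W = BA, \quad B \in \mathbb{R}^{d \times r},\ A \in \mathbb{R}^{r \times k},\ r \ll \min(d, k)

Only A and B are trained and shipped with the app, so the adapter’s size scales with the rank r rather than with the full weight matrices.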

AICore takes advantage of new ML hardware like the latest Google Tensor TPU and NPUs in flagship Qualcomm Technologies, Samsung S.LSI and MediaTek silicon. AICore and Gemini Nano are rolling out to Pixel 8 Pro, with more devices and silicon partners to be announced in the coming months.

Build with Gemini

We're excited to bring together state-of-the-art AI research with easy-to-use tools and APIs for Android developers to build with Gemini on-device. If you are interested in building apps using Gemini Nano and AICore, please sign up for our Early Access Program.


¹ Available globally, only when using the United States English keyboard language. Read more for details.

How it’s Made: Interacting with Gemini through multimodal prompting

Posted by Alexander Chen, Creative Director


Let’s try an experiment. We’ll show this picture to our multimodal model Gemini and ask it to describe what it sees:

A close up of a hand with five fingers spread out, palm facing down on a wooden surface.
Tell me what you see
Gemini:  I see a person's right hand. The hand is open with the fingers spread apart.

Ok, let’s try this one:

A closeup of the same hand, now balled up in a fist.
Gemini: A person knocking on a wooden door.

How about this one?

A closeup of the same hand, with two fingers extended like that of a peace sign.
Gemini: I see a hand with two fingers extended, which is a common symbol for the number two.

But wait … what if we asked Gemini to reason about all of these images together?

A sequence of three frames depicting the same hand in different formations. The first frame shows the hand with all five fingers spread out, palm facing down. The second frame shows the hand balled up in a fist. The third frame shows the hand with two fingers extended, like that of a peace sign.
What do you think I'm doing? Hint: it's a game.
Gemini: You're playing rock, paper, scissors.

Nice. What else could we try? Let's see if Gemini can notice this pattern:

A sequence of six frames that alternate between a hand that is balled up in a fist, and a hand that has two fingers extended.
I’m playing rock paper scissors. Do you notice a pattern in my gameplay?
Gemini: You are alternating between rock and scissors.

Neat, it spotted the pattern! But does Gemini really know how this game works? Let’s try asking:

Hmm ... do you think it's a good strategy?
Gemini: No, it is not a good strategy. Your opponent will quickly learn your pattern and will be able to win every time. You should try to randomize your choices more.

Good advice, Gemini! Okay, for this last one, let’s put your multimodal reasoning capabilities to the test. Here’s your prompt:

This is a secret message. What does it say?

A six-by-eight grid of frames, each showing a hand either balled up in a fist or with all five fingers spread out, palm facing down. The series of photos spells 'Gemini' in ASCII binary.

Can you figure it out? We have gotten Gemini to solve it, but we won’t spoil it by putting the answer in this post. 🙂



What is multimodal prompting?

Everything we did just now is an example of “multimodal prompting.” We’re basically giving Gemini combinations of different modalities — image and text in this case — and having Gemini respond by predicting what might come next. Gemini’s ability to seamlessly combine these modes together enables new possibilities for what you can do.
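
As a concrete sketch of what such a prompt looks like in code, here’s how the rock-paper-scissors exchange above might be sent through the Gemini Pro Vision endpoint with the Kotlin SDK (bitmap loading elided; handFrames is a hypothetical list of the photos):

    import android.graphics.Bitmap
    import com.google.ai.client.generativeai.GenerativeModel
    import com.google.ai.client.generativeai.type.content

    // Combine several image parts and one text part into a single
    // multimodal prompt.
    suspend fun guessTheGame(handFrames: List<Bitmap>, apiKey: String): String? {
        val model = GenerativeModel(modelName = "gemini-pro-vision", apiKey = apiKey)
        val prompt = content {
            handFrames.forEach { frame -> image(frame) }
            text("What do you think I'm doing? Hint: it's a game.")
        }
        return model.generateContent(prompt).text
    }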

In this post, we’ll explore some of the prompting approaches we used in our Hands on with Gemini demo video. We’ll soon be rolling out Gemini for people to try in Google AI Studio, our free, web-based developer tool where you’ll be able to try your own multimodal prompts with Gemini. We’re hoping this guide of starter prompts and ideas helps inspire you to start exploring your own ideas.



🧮 Spatial reasoning and logic

Logic and puzzles are a fun way to test Gemini. Let’s try one that requires both left-to-right spatial reasoning and knowledge about our solar system:

Three sticky notes on a wooden surface, each featuring a hand-drawn sketch of a solar body. The left-most note depicts a drawing of the sun. The middle note has a drawing of Saturn. The right-most note shows a drawing of Earth.
Is this the right order? Consider the distance from the sun and explain your reasoning.

Gemini: No, the correct order is Sun, Earth, Saturn. The Sun is the closest to the center of the solar system, followed by the Earth, and then Saturn.

Nice! Here’s another, inspired by pinewood derby challenges:

Two sticky notes depicting two different hand-drawn cars going downhill. The car on the left is blocky and awkwardly shaped; the car on the right is slanted and smooth.
Which of these cars is more aerodynamic? The one on the left or the right? Explain why, using specific visual details.

Gemini: The car on the right is more aerodynamic. It has a lower profile and a more streamlined shape. The car on the left has a higher profile and a more boxy shape, which makes it less aerodynamic.



📷 Image sequences

Next, what if we showed Gemini a sequence of images? Let’s see if we can show a few still frames from a game of charades and have Gemini guess the movie:

Four frames from a video of a person leaning back and moving their arms wildly.
Guess what movie I'm acting out.
Gemini: The Matrix
Nice! But which part specifically? Look at my body movements.
Gemini: The part where Neo dodges bullets.



🪄 Magic tricks

Exploring image sequences really lets you dig into how Gemini reasons over time. Let’s try a classic magic trick.

An image of two hands with palms facing the camera, there is a coin in the right hand.
Explain what I'm doing.
Gemini: You are holding a coin in your right hand.
An image of two hands in motion, palms turning inward and away from the camera.
An image of two hands, palms facing down and away from the camera.
Where do you think the coin is?
Gemini: The coin is in your right hand.

Sure, that’s a logical answer given what it’s seen. But let’s pull a sleight of hand as we reveal this next image to Gemini...

An image of two hands, the left palm is facing down and the right hand is tilted upwards revealing nothing under the right palm.
Look here, what happened?
Gemini: The coin is gone.

Gemini notices it’s gone. But can it summarize what may have happened?

Summarize what may have happened, step by step, starting with the first image.

Gemini: 
1. You started with a coin in your right hand.
2. You then made a motion with your hands, and the coin disappeared.
3. You then showed that the coin was gone.

Gemini is leveraging the entire context of this conversation – both text and image – to summarize its answer. Finally, let’s ask:

Can you explain how it disappeared?
Gemini: The coin was palmed.

Gemini makes a really solid guess! But we won’t reveal our secret 🙂.
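
A note on how follow-up questions like this work: each new question is sent together with the running conversation history, so the model answers in context. Here’s a text-only sketch using the Kotlin SDK’s chat API (the history strings are condensed from the transcript above):

    import com.google.ai.client.generativeai.GenerativeModel
    import com.google.ai.client.generativeai.type.content

    // Multi-turn prompting: the Chat object resends the accumulated
    // history with every message, so answers can draw on the whole
    // conversation rather than a single prompt.
    suspend fun askFollowUp(apiKey: String): String? {
        val model = GenerativeModel(modelName = "gemini-pro", apiKey = apiKey)
        val chat = model.startChat(
            history = listOf(
                content(role = "user") { text("I showed you a coin in my right hand, then turned my hands over.") },
                content(role = "model") { text("The coin is in your right hand.") }
            )
        )
        return chat.sendMessage("Can you explain how it disappeared?").text
    }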



🔀 Cup shuffling

Let’s try a trick that requires memory and logic: the classic ball and cup shuffling game. For this one, we’ll break it down into steps for Gemini. First, let’s establish the basics with these instructions:

Instructions: There are three cups in position 1 (left), position 2 (middle), position 3 (right).

Next, we’ll need to tell Gemini where we’re placing the ball:

Starting position: The ball is in position 3.
Current state: empty, empty, ball

Notice how we came up with a simple format for remembering where the ball is with “empty, empty, ball.” Next, let’s show Gemini two example turns:

Images:
Three frames depicting a person shuffling three green cups. The person switches positions of the middle and right-most cups. 
Swap: 2 and 3
Current state: empty, ball, empty

Images:
Three frames depicting a person shuffling three green cups. The person switches positions of the middle and left-most cups. 
Swap: 1 and 2
Current state: ball, empty, empty

In these two example turns, we’re showing it both what a swap looks like and how to update the ball position. Now, let’s see if Gemini can generalize to a swap it hasn’t seen:

Images:
Three frames depicting a person shuffling three green cups. The person switches positions of the left-most and right-most cups. 
Swap: 

Here's Gemini's response:

1 and 3
Current state: empty, empty, ball

Gemini got it! It looked at these images and correctly inferred that cups 1 and 3 are being swapped. And it reasoned correctly about how to update the ball position. Let’s ask:

Tell me where the ball is and summarize how it got there, in paragraph form.

Gemini: The ball is in cup 3. 

Starting with the ball in cup 3, the first swap was cup 2 and 3. This put the ball in cup 2. The second swap was cup 1 and 2. This put the ball in cup 1. The third swap was cup 1 and 3. This put the ball in cup 3.

Not only did Gemini get the answer right, it accurately summarized the game history. Of course, it won’t always get this challenge right. Sometimes the fake-out move (where you swap two empty cups) seems to trip it up, but sometimes it gets that too. But simple prompts like this make it really fun to rapidly test Gemini. You can change the variables in your prompt, including the order of swaps, and see how it does.
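
Expressed in code, assembling a few-shot prompt like this is just interleaving image and text parts in order. Here’s a sketch with the Kotlin SDK’s multimodal endpoint; the three frame bitmaps are hypothetical stand-ins for the photos above:

    import android.graphics.Bitmap
    import com.google.ai.client.generativeai.GenerativeModel
    import com.google.ai.client.generativeai.type.content

    // Interleave the instructions, the two worked example turns, and the
    // new unseen swap into a single image-and-text prompt.
    suspend fun inferSwap(
        exampleSwap1: Bitmap,  // frames of the "2 and 3" swap
        exampleSwap2: Bitmap,  // frames of the "1 and 2" swap
        newSwap: Bitmap,       // frames of the swap Gemini hasn't seen
        apiKey: String
    ): String? {
        val model = GenerativeModel(modelName = "gemini-pro-vision", apiKey = apiKey)
        val prompt = content {
            text("Instructions: There are three cups in position 1 (left), position 2 (middle), position 3 (right).")
            text("Starting position: The ball is in position 3.\nCurrent state: empty, empty, ball")
            image(exampleSwap1)
            text("Swap: 2 and 3\nCurrent state: empty, ball, empty")
            image(exampleSwap2)
            text("Swap: 1 and 2\nCurrent state: ball, empty, empty")
            image(newSwap)
            text("Swap:")  // Gemini completes the pattern from here
        }
        return model.generateContent(prompt).text
    }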



🔨 Tool use

If you want to use Gemini in your own apps, you’ll want it to be able to connect to other tools. Let’s try a simple idea where Gemini needs to combine multimodality with tool use: drawing a picture to search for music.

A musical instruments and palm tree sketch on a notebook. The sketch features a guitar, an amplifier, a drum set, and a palm tree drawn in a loose and expressive style. The notebook sits on a wooden desk.
Step 1. Look at this drawing, describe everything in it, and come up with a fun, specific idea for a good musical soundtrack genre, mood, or soundtrack. 
Step 2. Provide a search query in the form: "Search [insert query]"

Gemini: 
The drawing is of a guitar, drums, and an amplifier. There is also a palm tree. The mood is fun, upbeat, and tropical. The genre is reggae. The soundtrack could be used for a movie or TV show set in a tropical location. 

Search "Reggae music for a tropical vacation"

Nice! Gemini both reasons about what it sees and then generates a search query you can parse to do a search. It’s like Gemini is acting like a translator for you – but instead of translating between languages, it’s translating modalities – from drawing to music in this case. With multimodal prompting, you can use Gemini to invent your own entirely new translations between different inputs and outputs.
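
On the app side, the parsing half of this pattern can be tiny. Here’s a minimal Kotlin sketch (our own illustrative code, not from the demo) that extracts the query from the agreed Search "..." format so you can hand it to a real search or media API:

    // Pull the query out of a model reply of the form: Search "..."
    val searchPattern = Regex("Search\\s+\"([^\"]+)\"")

    fun extractQuery(modelOutput: String): String? =
        searchPattern.find(modelOutput)?.groupValues?.get(1)

    fun main() {
        val reply = "The genre is reggae. Search \"Reggae music for a tropical vacation\""
        println(extractQuery(reply))  // Reggae music for a tropical vacation
    }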



🕹️Game creation

What if we tried using Gemini to quickly prototype a multimodal game? Here’s an idea: a geography guessing game where you have to point at a map to make your guess. Let’s start by prompting Gemini with the core idea:

Instructions: Let's play a game. Think of a country and give me a clue. The clue must be specific enough that there is only one correct country. I will try pointing at the country on a map.

Next, let’s give Gemini an example turn of gameplay, showing it how we want it to handle both incorrect and correct answers:

Country: This country is a huge island that is mostly covered in ice.
(Answer: Greenland) 

Guess:
An image of a world map, a hand icon with the index finger pointing at Canada
I am pointing at the country of Canada *incorrect*

Guess:
An image of a world map, a hand icon with the index finger pointing at Greenland
I am pointing at the country of Greenland *correct*

Let’s give it a go and prompt Gemini to generate a clue:

Clue:
This country is known for its unique wildlife, including kangaroos and koalas. (Answer: Australia)

Ok, that’s a good clue. Let’s test out whether pointing will work. Just for fun, let’s try pointing at the wrong place first:

Guess:
A person pointing at Brazil on a map of the world with their right index finger. The map includes blue oceans, and green continents with no country borders.
I am pointing at the country of Brazil *incorrect*

Great! Gemini looked at my image and figured out I’m pointing at Brazil, and correctly reasoned that’s wrong. Now let’s point at the right place on the map:

A person pointing at Australia on a map of the world with their right index finger. The map includes blue oceans, and green continents with no country borders.
I am pointing at the country of Australia *correct*

Nice! We’ve basically taught Gemini our game logic just by giving it an example. You'll also notice that it generalized from the illustrated hand in the examples.



⌨️ Coding

Of course, to bring your game idea to life, you’ll eventually have to write some executable code. Let’s see if Gemini can make a simple countdown timer for a game, but with a few fun twists:

Implement a simple timer in HTML/CSS/Javascript. Use a sans serif font and dark mode. Start it at 10 seconds and start counting down. When it reaches zero, replace the timer with a random emoji that is associated with excitement and motivation! Then go back to the timer at 10 seconds and start counting down again.

With just this single instruction, Gemini gives us a working timer that does what we asked for:

An animated gif of a countdown timer starting from 10. At the end of the countdown, a rocket emoji is shown, followed by a lightning bolt emoji and a confetti emoji.

My favorite part is scrolling through Gemini’s source code to find the array of motivational emojis it picked for me:

    const emojis = ['🚀', '⚡️', '🎉', '🎊', '🥳', '🤩', '✨'];



👀 A sneak peek

Throughout this post, we’ve been giving Gemini an input, and having Gemini make predictions for what might come next. That's basically what prompting is. And our inputs have been multimodal – image and text, combined.

But so far we've only shown Gemini responding in text. Maybe you’re wondering, can Gemini also respond with a combination of image and text? It can! This is a capability of Gemini called “interleaved text and image generation.” While this feature won’t be ready in the first version of Gemini for people to try, we hope to roll it out soon. Here’s a sneak peek of what’s possible.

Let’s see if we could use Gemini to provide everyday creative inspiration. And let’s try it in a domain that requires a bit of multimodal reasoning ... knitting! 🧶. Similar to our map game above, let's provide one example turn of interaction:

Give me three ideas for crochet creations I could make with these two colors:

Balls of blue and yellow yarn on a wooden table.

Colors: I see blue and yellow yarn.
Idea 1: How about a cute dog with one yellow ear? 
Image 1:

A blue and yellow crocheted dog head with ears placed on a wooden table.

Idea 2: Or how about a jellyfish with a few yellow tentacles?
Image 2:

A blue and yellow crocheted octopus with tentacles placed on a wooden table.

Idea 3: Or how about a blue cat with a yellow nose?
Image 3:

A blue and yellow crocheted cat with whiskers placed on a wooden table.

We’re essentially teaching Gemini about how we want each interaction to go: “I’ll take a photo of two balls of yarn, and I expect you (Gemini) to both come up with an idea for something I could make, and generate an image of it.”

Now, let’s show it a new pair of yarn colors it hasn't yet seen, and see if it can generalize:

Give me three ideas for crochet creations I could make with these two colors:

Balls of blue and pink yarn on a wooden table.

Gemini:
Colors: I see blue and pink yarn.
Idea 1: How about a pig with blue ears?
Image 1:

A blue and pink crocheted pig face with ears placed on a wooden table.

Idea 2: Or an octopus with pink and blue tentacles?
Image 2:

A blue and pink crocheted octopus with tentacles placed on a wooden table.

Idea 3: Or a bunny with a pink nose?
Image 3:

A blue and pink crocheted bunny placed on a wooden table.

Nice! Gemini correctly reasoned about the new colors (“I see blue and pink yarn”) and generated these ideas and the images in a single, interleaved output of text and image.

What Gemini did here is fundamentally different from today’s text-to-image models. It's not just passing an instruction to a separate text-to-image model. It sees the image of my actual yarn on my wooden table, truly doing multimodal reasoning about my text and image together.


What's Next?

We hope you found this a helpful starter guide to get a sense of what’s possible with Gemini. We’re very excited to roll it out to more people soon so you can explore your own ideas through prompting. Stay tuned!

NordVPN boosted the speed of its login user flow by 60% using Baseline Profiles

Posted by Ben Weiss, Senior Developer Relations Engineer

NordVPN is a virtual private network (VPN) app that protects users while they’re browsing the web by providing them a more secure and private connection. As a network utility, NordVPN’s users deserve a responsive UI, allowing them to set up their protections at a moment’s notice. That's why NordVPN developers recently integrated Baseline Profiles, a profile-guided optimization that helps Android developers improve an app's startup and runtime performance using ahead-of-time compilation.

Improving performance with Baseline Profiles

As part of its product roadmap for 2023, the NordVPN team wanted to boost the application’s performance. Before implementing Baseline Profiles, NordVPN’s startup times on Android devices didn’t meet the team’s standards, prompting them to examine new ways to make the app run better.

After exploring ways to improve its runtime performance and streamline the login process for users, NordVPN developers identified an opportunity to make the app faster using Baseline Profiles. Baseline Profiles lets the Android Runtime (ART) know which code paths to optimize through Ahead-of-Time (AOT) compilation before an app launches, boosting speed, stability, and overall responsiveness during startup, when navigating through the app, and while viewing content.
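
For a sense of what ART actually consumes, here’s what a few human-readable rules in a baseline-prof.txt file look like (hypothetical class and method names): the leading flags mark a method as hot (H), used during startup (S), or used post-startup (P), and a bare L line requests ahead-of-time class loading:

    HSPLcom/example/app/MainActivity;-><init>()V
    HSPLcom/example/app/login/LoginViewModel;->authenticate(Ljava/lang/String;)V
    Lcom/example/app/login/LoginActivity;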

“App speed and stability are essential for a better user experience, so we’re always looking for new ways to improve NordVPN’s performance,” said Himanshu Singh, senior Android engineer at NordVPN. “We wanted to speed up the app’s load time and make launch and navigation faster than ever.”

By applying Baseline Profiles, NordVPN improved its launch speed by an average of 24%. Using tools like Android Vitals, the NordVPN team measured that it had reduced the application’s cold start time from 4.3 seconds to 3.2 seconds, the warm start time from 2.7 seconds to 1.8 seconds, and the hot start time from 1 second to 0.7 seconds.

After implementation, NordVPN developers also noticed that Baseline Profiles made it faster for users to log in to the app, improving the user login flow. The login flow is measured from when a user starts the app to when they are logged in. Using the Macrobenchmark library to monitor the improvements, the team observed that the NordVPN app runs its login flow 60% faster than before.
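
As an illustration of how a journey like this can be measured, here’s a minimal Macrobenchmark sketch; the package name and journey steps are hypothetical placeholders, not NordVPN’s actual benchmark:

    import androidx.benchmark.macro.StartupMode
    import androidx.benchmark.macro.StartupTimingMetric
    import androidx.benchmark.macro.junit4.MacrobenchmarkRule
    import androidx.test.ext.junit.runners.AndroidJUnit4
    import org.junit.Rule
    import org.junit.Test
    import org.junit.runner.RunWith

    @RunWith(AndroidJUnit4::class)
    class LoginFlowBenchmark {
        @get:Rule
        val benchmarkRule = MacrobenchmarkRule()

        @Test
        fun coldStartToLogin() = benchmarkRule.measureRepeated(
            packageName = "com.example.vpn",  // hypothetical package name
            metrics = listOf(StartupTimingMetric()),
            iterations = 5,
            startupMode = StartupMode.COLD
        ) {
            pressHome()
            startActivityAndWait()
            // Drive the login UI here (e.g. with UiAutomator) so the
            // measured window covers start-to-logged-in.
        }
    }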

A black quote card featuring a headshot of Ernestas on the right with text in white that reads “The introduction of Baseline Profiles helped us achieve outstanding results, elevating our application’s speed with minimal effort.” — Ernestas Balčiūnas, engineering lead at NordVPN

Integrating and testing Baseline Profiles is easy

The ease of implementing Baseline Profiles impressed NordVPN developers. The available resources, in-depth documentation, and codelabs from Android allowed them to enhance the app’s UX without having to write an extensive amount of code themselves.

Using the Macrobenchmark library, NordVPN developers quickly generated Baseline Profiles for the application. To do this, they used a Gradle-managed device, which enabled them to create new profiles without a physical device. Using a Gradle-managed device also allowed NordVPN developers to create fresh profiles for each app release build on their continuous integration platform. Looking forward, NordVPN developers plan to migrate Baseline Profile generation to the official Gradle plugin, which will further automate profile generation.
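
A sketch of that generation step, assuming Macrobenchmark 1.2’s BaselineProfileRule (again with a hypothetical package name and journey):

    import androidx.benchmark.macro.junit4.BaselineProfileRule
    import androidx.test.ext.junit.runners.AndroidJUnit4
    import org.junit.Rule
    import org.junit.Test
    import org.junit.runner.RunWith

    @RunWith(AndroidJUnit4::class)
    class BaselineProfileGenerator {
        @get:Rule
        val baselineProfileRule = BaselineProfileRule()

        @Test
        fun generate() = baselineProfileRule.collect(
            packageName = "com.example.vpn"  // hypothetical package name
        ) {
            // Record the critical user journeys whose code paths should
            // be compiled ahead of time.
            pressHome()
            startActivityAndWait()
        }
    }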

NordVPN developers combined development workflows to create an integration pipeline, allowing them to test the app under various conditions. The Macrobenchmark library then ran Baseline Profile generation tests, pushing the latest Baseline Profiles into the code base.

A black Quote card featuring droid on the right side and a green half oval overlay on the left with white text that reads, 'Applying Baseline Profiles in the NordVPN application led to a 29% improvement to overall in-app speed.'

A quick boost to app quality

After integrating Baseline Profiles into NordVPN’s code, its developers saw immediate speed improvements. The engineering team assessed the app’s overall speed after finishing the project and found that, beyond improving the app’s launch times, applying Baseline Profiles led to a 29% improvement to overall in-app speed.

"We’re constantly working to improve app quality, and Baseline Profiles integration has proven to be one of the most successful steps we’ve taken,” said Šarūnas Rimša, product owner at NordVPN. “We’re helping users access the services they’re entitled to faster. What's not to like?"

Get started

Learn how you can improve your app’s performance using Baseline Profiles.

Women in ML Symposium 2023: Meet the presenters



Posted by Sharbani Roy – Senior Director, Product Management, Google

We’re back with the third annual Women in Machine Learning Symposium on December 7, 2023

Join us virtually from 9:30 am to 1:00 pm PT for an immersive and insightful set of deep dives for every level of Machine Learning experience.

The Women in ML Symposium is an inclusive event for anyone passionate about the transformative fields of Machine Learning (ML) and Artificial Intelligence (AI). Meet this year’s women in ML as they uncover practical applications across multiple industries and discuss the latest advancements in frameworks, generative AI, and more.


Joana Carrasqueira, presenter for “Enabling Anyone to Build with Google AI”

Joana is a Developer Relations Lead for AI/ML at Google and her mission is to empower individuals and organizations to harness the power of AI to address real-world challenges.

She is a business leader with a track record of bringing strategic vision and global cross-functional programs to life. She’s also the creator of Google’s Women in ML program and flagship symposium, a pioneering initiative that has equipped thousands of developers with knowledge and skills in AI/ML.

Prior to Google, she worked at the Silicon Valley Innovation Center on innovation consulting for Forbes Top 500 companies, startups, and venture capital firms. She served as Education Manager at the International Pharmaceutical Federation, working closely with WHO, UNESCO, and the United Nations, and started her career at the Portuguese Pharmaceutical Society.

Joana holds an MBA from IE Business School, a Master’s in Pharmaceutical Sciences, and a Leadership Certificate from U.C. Berkeley in California.



Sharbani Roy, presenter for “What’s New in Machine Learning?”

Sharbani is Sr. Director in Google’s Core Machine Learning group.

Before joining Google, Sharbani led engineering and product teams in Amazon Alexa, focused on media streaming, real-time communication, and applied ML (e.g., NLU, CV, and AR) for 1P/3P developers and end consumers.

Sharbani holds degrees in physics and mathematics from the University of Chicago and an MBA from Stanford University, and lives in Seattle with her husband and three children.



Eve Phillips, presenter for “Future of Frameworks: Navigate the OSS Landscape"

Eve is a Director of Product Management at Google.

Currently, Eve leads the ML Frameworks product team, which includes responsibility for TensorFlow, JAX and Keras. Previously, she led product teams within Google for Clinicians and ChromeOS. Prior to Google, she served as CEO of Empower Interactive, delivering tech-enabled behavioral health.

Earlier, she held roles in leading technology companies and investors including Trilogy, Microsoft, and Greylock.

Eve earned a BS and M.Eng in EECS from MIT and an MBA from Stanford.



Meenu Gaba, presenter for “Data-Centric AI: A New Paradigm"

Meenu leads the Machine Learning infrastructure team at Google, with a mission to power AI innovation with world-class ML infrastructure and services.

She is a technology leader with years of experience launching new products and growing small teams into mature scalable, multi-tiered organizations that are poised to deliver high quality products. Meenu enjoys fast-paced, dynamic, highly iterative/innovative environments and has lots of experience in balancing these disciplines while fostering a people-first culture and forming solid grounds for cross-functional relationships.

Meenu holds a Master's degree in Computer Science. In her free time, she enjoys hiking, solving crosswords, and watching movies.



Kelly Shaefer, presenter for “Maximize Your Data Exploration”

Kelly leads product teams at Google Labs, building both entirely new AI products and AI-enabled features into Google's largest existing products.

In the past, she led the Growth team for Google Workspace, including Gmail, Drive, Docs, and many more.

Outside of Google, she led the Enterprise product team at Stripe and was the P&L owner for Stripe's multi-billion dollar Payments area.

Kelly has an undergraduate degree from Wharton at UPenn, and an MBA from Harvard Business School.



Divyashree Sreepathihalli, presenter for “Keras: Shortcut to AI Mastery”

Divya is a talented machine learning software engineer who is currently a part of the Keras team at Google.

In this role, she specializes in developing Keras core modeling APIs and KerasCV to improve the functionality of the software.

Prior to joining Google, Divya worked as a Deep Learning Scientist for Zazu Sensor, a startup group in Intel's Emerging Growth Incubation (EGI) group. Her work there focused on computer vision and deep learning algorithm development for object detection and tracking, resulting in significant advancements for the startup.

Divya completed her Master’s in Computer Engineering at Texas A&M University in 2017, where she focused on artificial intelligence.



Na Li, presenter for “Prototype ML with Visual Blocks”

Na Li is a software engineering manager on Google CoreML.

She leads a team building developer tools that support the ML development journey, from prototyping to model visualization and benchmarking.

Prior to Google, she was a research scientist at Harvard, working in the HCI domain.

Throughout her career, Na has strived to make ML accessible for everyone.



Zoe Wang, presenter for “Deploying ML Models to Mobile Devices”

Zoe is a technical program manager at Google.

Her career has been focused on Machine Learning (ML) productionization.

Currently, she works with her team to bring ML models to mobile devices, powering some of the AI features of Pixel and other edge devices.

Prior to Google, Zoe worked at Meta on ML Platforms for end-to-end ML lifecycles.



Yvonne Li, presenter for “New GenAI Products and Solutions on Google Cloud”

Yvonne Li is a software engineer on the Duet Platform team at Google, where she focuses on improving the quality of generative AI models.

Previously, as a machine learning engineer and developer advocate at IBM, she designed and developed language models and curated open source datasets.

She has over 3 years of experience in the big tech industry, and is passionate about using machine learning to solve real-world problems.

Yvonne is the author of two Coursera courses: Data Analysis with R and Data Visualization with R.



Nithya Natesan, presenter for “AI-powered Infrastructure: Cloud TPUs”

Nithya Natesan is a Group Product Manager on the Cloud ML Accelerators team, focusing on GPU / TPU offerings for Google Cloud.

Prior to Google, she was head of product management at NVIDIA, launching several products such as DGX Cloud and Base Command Platform.

She has ~14 years of experience in hyperconverged data center software products, with a recent focus on ML / AI infrastructure and platform products. She is passionate about building rock-solid PM teams and shipping high-quality, usable ML / AI products.

Nithya has also earned industry accolades, including WomenImpactTech 2023.



Andrada Vulpe, presenter for “Community Matters: 8 Reasons Why You Should Be Involved with Kaggle”

Andrada is a Data Scientist at Endava, a Notebooks Grandmaster on Kaggle, a Dev Expert at Weights and Biases and a proud Z by HP Data Science Global Ambassador.

She is highly passionate about Python, R, Machine and Deep Learning, powerful visualizations and everything in between.

Andrada finished her MSc in Data Science and Analytics in the UK and won 2 Kaggle Analytics competitions.



Jeehae Lee, presenter for “From Recovering Pro Golfer to AI Entrepreneur”

Jeehae Lee is a golf industry executive who has worked to create and build transformational sports technology businesses.

As the Co-Founder & CEO of Sportsbox AI, Jeehae is currently developing products using AI-enabled 3D motion analysis technology that will help participants of various sports and fitness activities learn and improve their skills.

Before founding Sportsbox, she spent five years between 2015 and 2020 at Topgolf Entertainment Group, leading strategy and new business development for various divisions including Toptracer. Between 2012 and 2013, she was at global sports and entertainment marketing agency, IMG, representing professional golfer icon Michelle Wie West. Prior to her career in sports business, she played professional golf at the highest level in the sport, competing on the LPGA tour for three years between 2009 and 2011.

Jeehae is a proud graduate of Phillips Academy in Andover, MA, and has a BA in Economics from Yale and an MBA from The Wharton School at University of Pennsylvania.



Jingwan (Cynthia) Lu, panelist for “The Impact of Generative AI in Different Industries”

Cynthia is a senior director at Adobe, leading an applied research organization focused on developing the Adobe Firefly family of GenAI models built from the ground up.

Her team started training Adobe’s first large-scale foundation model and helped rally the rest of the company to roll out Firefly, a new web-based product featuring the image generation model, in early 2023.

The same technology and its extensions power Adobe Photoshop’s Generative Fill and Generative Expand features, giving users an intelligent image inpainting and outpainting experience. Time recognized Generative Fill and Generative Expand among its best inventions of 2023 in the AI category.

Before Firefly, Jingwan was a computer vision research scientist and team lead who pioneered and led a large group effort to explore early generative models such as GANs within Adobe.



Wei Xiao, panelist for “The Impact of Generative AI in Different Industries”

Wei is the Director of Developer Relations at NVIDIA for the Middle East, Africa, and emerging regions. Her primary focus is to drive AI and accelerated computing integration within the ecosystem.

Before assuming her current role, Wei Xiao headed Ecosystem Engineering and Evangelism teams at both ARM and Samsung Semiconductor.

In addition to her professional endeavors, Wei dedicates her free time to teaching AI courses at the Graduate School of Computer Science at Santa Clara University.



Priya Mathur, panelist for “The Impact of Generative AI in Different Industries”

Priya is a Staff Data Science Manager at Google and she is the founder of Sparkle – GenAI Data Analyst.

At Google, she leads Data Science for Home Platform Monetization and GenAI efforts for DSPA.

Previously at Groupon, she led Data Science for App Push Notifications and TV Ads.



Katherine Chou, panelist for “The Impact of Generative AI in Different Industries”

Katherine is the Senior Director of Research and Innovations at Google with a specific focus on nurturing scientific and technical breakthroughs that can lead to global impact for science, health, climate, and advancement of platform technologies for our developers and researchers.

Katherine is focused on improving the availability and accuracy of healthcare using machine learning. She is a serial intrapreneur, particularly interested in removing health inequities and improving health and well-being outcomes across all populations.

She previously developed products within Google[x] Labs for Life Sciences (now Verily) and co-founded Medical Brain (now “Health AI”) at Google. She also headed up global teams to develop partner solutions and establish developer ecosystems for Mobile Payments, Mobile Search, GeoCommerce, YouTube, and Android.

Outside of Google, she is a Board member and Program Chair of Lewa Wildlife Conservancy, a Scientific Advisor to the ARCS Foundation, a fellow of the Zoological Society of London, and collaborates with other wildlife NGOs and the Cambridge Business Sustainability Programme in applying the Silicon Valley innovation mindset to new areas.

Katherine holds a double major in Computer Science and Economics from Stanford University and an M.S. in CS with a specialization in graphics.



Jaimie Hwang, presenter for “Take Action, Learn More, Start Building with Google AI”

Jaimie Hwang is a global product marketing leader with over a decade of experience, specifically in AI/ML.

She has built and led global product marketing teams at a number of AI companies, including an award-winning computer vision startup and tech giant Amazon.

She specializes in executive thought leadership, product storytelling, and integrated GTM strategy. She is passionate about promoting AI technology that is built responsibly and solves real-world problems in a human-centric way.

Jaimie holds a BS in Journalism and Integrated Marketing and Communications from Northwestern University. She lives in Seattle, Washington.


Save your spot at WiML Symposium 2023

The Women in ML Symposium offers sessions for all expertise levels, from beginners to advanced practitioners. RSVP today to secure your spot and explore our comprehensive agenda. We can’t wait to see you there!


Snapchat integrated new camera features 50% faster with the Camera2 Extensions API

Posted by Fred Chung, Android Developer Relations

Snapchat is a visual messaging app that enhances Snapchatters’ relationships with friends, family, and the world. It opens to the camera and offers millions of augmented reality and AI-powered Lenses for self expression, learning, and play. Ensuring Snapchatters can easily capture and share their lives with close friends and family is a priority for Snapchat, and they're always exploring new ways to improve the overall app experience.

As part of this, the Snapchat team added new camera features to the app using Android’s Camera2 Extensions API, which gives developers access to capabilities that OEMs have implemented on their devices, like Night Mode and Bokeh. Thanks to Android’s intuitive API, the Snapchat team implemented new camera features 50% faster than before.

Camera2 Extensions API gives access to advanced features

The Snapchat team wanted to optimize the application for the expanding selection of Android devices, knowing many OEMs differentiate their devices with their respective camera technologies. As Snapchat is a primarily visual app that works with a device’s camera, the team optimizes the app to take full advantage of each device’s unique hardware.

“We wanted to leverage each OEM’s software to enhance the Snapchat experience on Android,” said Ye Tian, a software engineer at Snapchat. “This would help the app achieve higher-quality Snaps that are comparable to what a device's native camera offers.”

Snapchat developers enhanced the app’s zoom and night mode camera capabilities using the Camera2 Extensions API

What started as a goal to improve the app’s low-light capabilities led to much more. The Snapchat team found new ways to expand the app’s camera capabilities, implementing features like night mode, portrait mode, face retouch, tap-to-focus, zoom, and more.
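To illustrate the general pattern (a sketch, not Snapchat’s actual code), an app can query which extensions a device supports and open an extension session. The context, cameraId, cameraDevice, and the preview and capture surfaces are assumed to come from a standard Camera2 setup:

val cameraManager = context.getSystemService(CameraManager::class.java)
val extensionCharacteristics =
    cameraManager.getCameraExtensionCharacteristics(cameraId)

// Night mode is only available where the OEM has implemented it.
if (CameraExtensionCharacteristics.EXTENSION_NIGHT in
    extensionCharacteristics.supportedExtensions
) {
    val sessionConfig = ExtensionSessionConfiguration(
        CameraExtensionCharacteristics.EXTENSION_NIGHT,
        listOf(OutputConfiguration(previewSurface), OutputConfiguration(captureSurface)),
        context.mainExecutor,
        object : CameraExtensionSession.StateCallback() {
            override fun onConfigured(session: CameraExtensionSession) {
                // Start the repeating preview request and capture as usual.
            }

            override fun onConfigureFailed(session: CameraExtensionSession) {
                // Fall back to a regular capture session.
            }
        }
    )
    cameraDevice.createExtensionSession(sessionConfig)
}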

“Our collaboration with Google Pixel paved the way for collaboration with other OEMs to implement night mode and super-night mode in their devices with very minimal code changes,” said Ye. “The Camera2 Extensions API is flexible and extensive. Snapchat can now use it to build full-fledged applications on demand without negatively impacting performance and stability.”

The Camera2 Extensions API made it easy for Snapchat developers to add more camera features to the app. And by using the extensions made available through Android’s camera API, Snapchat integrated new camera features 50% faster than with the industry-standard approaches it had used in the past.


More opportunities on more devices

The Snapchat team was happy to give its users a more cohesive experience using the Camera2 Extensions API. Thanks to the extensions provided in the API, developers improved the app’s camera experience across a range of manufacturers’ devices on the Android platform, and much faster than before.

“I enjoy the diversity of the Android platform and utilizing the unique advantages of each mobile phone manufacturers’ devices,” said Ye. “It helps us bring their cutting-edge innovations into the Snapchat app, allowing Snapchatters to better capture their life moments.”

Snapchat’s team looks forward to working with more OEMs to further improve the app’s processing capabilities across devices using the Camera2 Extensions API. They’re also looking forward to improving the app’s backward compatibility using the new API, which will allow even more users to benefit from the extensions.

“I would recommend using Camera2 Extension API. It provides extensive functionalities and stable performance to improve the velocity that developers can deliver features,” said Ye.

Get started

Learn how to increase your app’s camera capabilities with the Camera2 Extensions API.

Bringing New Input Support to Desktop AVD

Posted by Joshua Hale – Software Engineer

As large screens become increasingly important within the Android app ecosystem, we are committed to enhancing the tools that help Android developers adapt their apps for these form factors. We strive to bring impactful tools that improve the overall experience of building for all large screens, such as foldables, tablets, and Chromebooks.

Over the last year, the team has worked on bringing Android 13 to the Desktop AVD, along with some additional enhancements to input support within the emulator. The Android 13 release of the Desktop AVD is now available within Android Studio. To test using this emulator, create a new virtual device.

What is the Desktop AVD?

Android Studio comes bundled with various virtual devices that run on different API levels and architectures. These emulators help developers test Android apps across a variety of devices, allowing for testing across different screen sizes, form factors, and APIs.

When an Android app runs on a Chromebook, it uses functionality that mirrors desktop behaviors, such as minimizing, maximizing, or resizing to a user-specified size. The Desktop Android Virtual Device (AVD) is an emulator that allows testing in a freeform windowing mode, similar to a Chromebook, to support this functionality.

For a deeper dive into the Desktop AVD, check out Desktop AVD in Android Studio.

Screenshot of the Desktop AVD emulator, rendering a clock app, browser window, and downloads folder in freeform windowing mode

What enhancements come with the Android 13 desktop AVD?

Most laptops use a keyboard—and it’s a common input device for increased productivity with tablets and foldables. Prior to Android 13, the Desktop AVD relied solely on uncustomizable input mapping built into Android Studio, which could cause friction for users who rely on physical devices for mapped input and shortcuts. The Android 13 release of the Desktop AVD adds support for common keyboard interactions with Android apps. You can now test keyboard shortcuts, support keys, and mouse interactions to help you adhere to the large screen app quality guidelines.

Keyboard Shortcuts

The majority of apps within Google Play are designed for mobile usage and as such do not always support keyboard interactions. In Android 13, the Desktop AVD adds support for commonly used shortcuts, such as Ctrl+C (Copy) and Ctrl+V (Paste). These shortcuts can be used when copying text from a TextView/Text composable or pasting text into an EditText/TextField. These shortcuts are intercepted by the system and automatically applied.

Custom shortcuts (which are not intercepted by the system) are also included in this release. An example of this type of shortcut: a media player app that uses the Spacebar to play or pause media. You must use the new Hardware Input feature within Android Studio Hedgehog to use custom shortcuts. This will allow Android Studio to pass custom shortcuts directly to the emulator. If this is not enabled, Android Studio may consume the key combination.
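As a minimal sketch of such a custom shortcut (the PlayerActivity class and togglePlayback() helper are hypothetical):

class PlayerActivity : AppCompatActivity() {

    override fun onKeyUp(keyCode: Int, event: KeyEvent): Boolean {
        // Spacebar is not a system shortcut, so the app handles it directly.
        // With Hardware Input enabled in Android Studio Hedgehog, the emulator
        // forwards the key instead of the IDE consuming it.
        if (keyCode == KeyEvent.KEYCODE_SPACE) {
            togglePlayback() // hypothetical media-control helper
            return true
        }
        return super.onKeyUp(keyCode, event)
    }

    private fun togglePlayback() {
        // Play or pause the current media item.
    }
}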

Support keys

Android 13 supports additional keymappings for support keys. These keys map to controls that mirror familiar keyboard behaviors on a desktop. Some examples of these support keys include:

    • Esc: Dismisses pop-ups and notifications.
    • Delete / Backspace: Deletes text within an EditText or TextField.
    • Arrow Keys: Provides in-app navigation (Arrow Up/Down to scroll).

Mouse support

In addition to enhanced keyboard support, there are additional mouse controls integrated into the Desktop AVD. Using the scroll wheel sends a mouse scroll event to the app that has input focus. Right-clicking the mouse sends a right-click event—which can be used to show context menus if the app supports it.
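A minimal sketch of handling both inputs in an app (the HistoryActivity class, layout, and view IDs are hypothetical):

class HistoryActivity : AppCompatActivity() {

    override fun onCreate(savedInstanceState: Bundle?) {
        super.onCreate(savedInstanceState)
        setContentView(R.layout.activity_history) // hypothetical layout

        // Right-click is delivered as a context click on API 23+.
        findViewById<View>(R.id.history_list).setOnContextClickListener { view ->
            view.showContextMenu()
        }
    }

    // The scroll wheel arrives as a generic motion event carrying AXIS_VSCROLL.
    override fun onGenericMotionEvent(event: MotionEvent): Boolean {
        if (event.actionMasked == MotionEvent.ACTION_SCROLL) {
            val notches = event.getAxisValue(MotionEvent.AXIS_VSCROLL)
            // Scroll the focused content proportionally to `notches`.
            return true
        }
        return super.onGenericMotionEvent(event)
    }
}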

Where can you start?

Large screen app quality provides guidance around creating high quality large screen apps across all form factors, outlining a comprehensive set of quality requirements for most types of Android apps. Not all requirements need to be met, but it’s best practice for you to adhere to the requirements that make sense for your apps.

Create a desktop emulator today in Android Studio Hedgehog to see how your Android app responds to keyboard and mouse inputs and freeform window resizing.

How KAYAK reduced sign in time by 50% and improved security with passkeys

Posted by Kateryna Semenova, Developer Relations Engineer, Android

Introduction

KAYAK is one of the world's leading travel search engines that helps users find the best deals on flights, hotels, and rental cars. In 2023, KAYAK integrated passkeys - a new type of passwordless authentication - into its Android and web apps. As a result, KAYAK reduced the average time it takes their users to sign up and sign in by 50%, and also saw a decrease in support tickets.

This case study explains KAYAK's implementation on Android with Credential Manager API and RxJava. You can use this case study as a model for implementing Credential Manager to improve security and user experience in your own apps.

If you want a quick summary, check out the companion video on YouTube.


Problem

Like most businesses, KAYAK has relied on passwords in the past to authenticate users. Passwords are a liability for both users and businesses alike: they're often weak, reused, guessed, phished, leaked, or hacked.

“Offering password authentication comes with a lot of effort and risk for the business. Attackers are constantly trying to brute force accounts while not all users understand the need for strong passwords. However, even strong passwords are not fully secure and can still be phished.” – Matthias Keller, Chief Scientist and SVP, Technology at KAYAK

To make authentication more secure, KAYAK sent "magic links" via email. While helpful from a security standpoint, this extra step introduced more user friction by requiring users to switch to a different app to complete the login process. Additional measures needed to be introduced to mitigate the risk of phishing attacks.


Solution

KAYAK's Android app now uses passkeys for a more secure, user-friendly, and faster authentication experience. Passkeys are unique, secure tokens that are stored on the user's device and can be synchronized across multiple devices. Users can sign in to KAYAK with a passkey by simply using their existing device's screen lock, making it simpler and more secure than entering a password.

“We've added passkeys support to our Android app so that more users can use passkeys instead of passwords. Within that work, we also replaced our old Smartlock API implementation with the Sign in with Google supported by Credential Manager API. Now, users are able to sign up and sign in to KAYAK with passkeys twice as fast as with an email link, which also improves the completion rate" – Matthias Keller, Chief Scientist and SVP, Technology at KAYAK


Credential Manager API integration

To integrate passkeys on Android, KAYAK used the Credential Manager API. Credential Manager is a Jetpack library that unifies passkey support (starting with Android 9, API level 28) and traditional sign-in methods, such as passwords and federated authentication, into a single user interface and API.

Image of Credential Manager's passkey creation screen.
Figure 1: Credential Manager's passkey creation screens.
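As a minimal sketch of that unified surface (not KAYAK's code), a single request can ask for saved passwords and passkeys at once; the requestJson string, containing the WebAuthn challenge and RP ID fetched from your server, is assumed:

val credentialManager = CredentialManager.create(context)

// One request covers both saved passwords and passkeys; Credential Manager
// renders the single account-picker UI on top of it.
val getRequest = GetCredentialRequest(
    listOf(
        GetPasswordOption(),
        GetPublicKeyCredentialOption(requestJson = requestJson)
    )
)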

Designing a robust authentication flow for apps is crucial to ensure security and a trustworthy user experience. The following diagram demonstrates how KAYAK integrated passkeys into their registration and authentication flows:

Flow diagram of KAYAK's registration and authentication processes
Figure 2: KAYAK's diagram showing their registration and authentication flows.

At registration time, users are given the opportunity to create a passkey. Once registered, users can sign in using their passkey, Sign in with Google, or password. Since Credential Manager launches the UI automatically, be careful not to introduce unexpected wait times, such as network calls. Always fetch a one-time challenge and other passkeys configuration (such as RP ID) at the beginning of any app session.

While the KAYAK team is now heavily invested in coroutines, their initial integration used RxJava to integrate with the Credential Manager API. They wrapped Credential Manager calls into RxJava as follows:

override fun createCredential(
    request: CreateCredentialRequest,
    activity: Activity
): Single<CreateCredentialResponse> {
    return Single.create { emitter ->
        // Triggers credential creation flow
        credentialManager.createCredentialAsync(
            request = request,
            activity = activity,
            cancellationSignal = null,
            executor = Executors.newSingleThreadExecutor(),
            callback = object :
                CredentialManagerCallback<CreateCredentialResponse, CreateCredentialException> {
                override fun onResult(result: CreateCredentialResponse) {
                    emitter.onSuccess(result)
                }

                override fun onError(e: CreateCredentialException) {
                    emitter.tryOnError(e)
                }
            }
        )
    }
}

This example defines a Kotlin function called createCredential() that returns a credential from the user as an RxJava Single of type CreateCredentialResponse. The createCredential() function encapsulates the asynchronous process of credential registration in a reactive programming style using the RxJava Single class.

For a Kotlin implementation of this process using coroutines, read the Sign in your user with Credential Manager guide.

New user registration sign-up flow

This example demonstrates the approach KAYAK used to register a new credential; here, the Credential Manager calls are wrapped in Rx primitives.


webAuthnRetrofitService
    .getClientParams(username = /** email address **/)
    .flatMap { response ->
        // Produce a passkeys request from client params that include a one-time challenge
        CreatePublicKeyCredentialOption(/** produce JSON from response **/)
    }
    .subscribeOn(schedulers.io())
    .flatMap { request ->
        // Call the earlier defined wrapper which calls the Credential Manager UI
        // to register a new passkey credential
        credentialManagerRepository.createCredential(
            request = request,
            activity = activity
        )
    }
    .flatMap {
        // send credential to the authentication server
    }
    .observeOn(schedulers.main())
    .subscribe(
        { /** process successful login, update UI etc. **/ },
        { /** process error, send to logger **/ }
    )

Rx allowed KAYAK to produce more complex pipelines that can involve multiple interactions with Credential Manager.

Existing user sign-in

KAYAK used the following steps to launch the sign-in flow. The process launches a bottom sheet UI element, allowing the user to log in using a Google ID and an existing passkey or saved password.

Image of bottom sheet for passkey authentication
Figure 3: Bottom sheet for passkey authentication.

Developers should follow these steps when setting up a sign-in flow:

  1. Since the bottom sheet is launched automatically, be careful not to introduce unexpected wait times in the UI, such as network calls. Always fetch a one-time challenge and other passkeys configuration (such as RP ID) at the beginning of any app session.
  2. When offering Google sign-in via Credential Manager API, your code should initially look for Google accounts that have already been used with the app. To handle this, call the API with the setFilterByAuthorizedAccounts parameter set to true.
  3. If the result returns a list of available credentials, the app shows the bottom sheet authentication UI to the user.
  4. If a NoCredentialException appears, no credentials were found: No Google accounts, no passkeys, and no saved passwords. At this point, your app should call the API again and set setFilterByAuthorizedAccounts to false to initiate the Sign up with Google flow.
  5. Process the credential returned from Credential Manager.
Single.fromSupplier<GetPublicKeyCredentialOption> {
    GetPublicKeyCredentialOption(/** Insert challenge and RP ID that was fetched earlier **/)
}
    .flatMap { response ->
        // Produce a passkeys request
        GetPublicKeyCredentialOption(response.toGetPublicKeyCredentialOptionRequest())
    }
    .subscribeOn(schedulers.io())
    .map { publicKeyCredentialOption ->
        // Merge passkeys request together with other desired options,
        // such as Google sign-in and saved passwords.
    }
    .flatMap { request ->
        // Trigger Credential Manager system UI
        credentialManagerRepository.getCredential(
            request = request,
            activity = activity
        )
    }
    .onErrorResumeNext { throwable ->
        // When offering Google sign-in, it is recommended to first only look for Google accounts
        // that have already been used with our app. If there are no such Google accounts, no passkeys,
        // and no saved passwords, we try looking for any Google sign-in one more time.
        if (throwable is NoCredentialException) {
            return@onErrorResumeNext credentialManagerRepository.getCredential(
                request = GetCredentialRequest(/* Google ID with filterByAuthorizedOnly = false */),
                activity = activity
            )
        }
        Single.error(throwable)
    }
    .flatMapCompletable {
        // Step 1: Use Retrofit service to send the credential to the server for validation. Waiting
        // for the server is handled on an IO thread using subscribeOn(schedulers.io()).
        // Step 2: Show the result in the UI. This includes changes such as loading the profile
        // picture, updating to the personalized greeting, making member-only areas active,
        // hiding the sign-in dialog, etc. The activities of step 2 are executed on the main thread.
    }
    .observeOn(schedulers.main())
    .subscribe(
        // Handle errors, e.g. send to log ingestion service.
        // A subset of exceptions shown to the user can also be helpful,
        // such as user setup problems.
        // Check out more info in Troubleshoot common errors at
        // https://developer.android.com/training/sign-in/passkeys#troubleshoot
    )
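For step 2 above, the authorized-accounts filter is set when building the Google ID option. A minimal sketch, assuming the androidx.credentials and googleid libraries and a hypothetical WEB_CLIENT_ID constant:

// First pass: only Google accounts that have already been used with the app.
val googleIdOption = GetGoogleIdOption.Builder()
    .setFilterByAuthorizedAccounts(true)
    .setServerClientId(WEB_CLIENT_ID) // hypothetical server client ID constant
    .build()

// Merged with the passkey option before calling Credential Manager.
val request = GetCredentialRequest(listOf(googleIdOption, publicKeyCredentialOption))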


“Once the Credential Manager API is generally implemented, it is very easy to add other authentication methods. Adding Google One-Tap Sign In was almost zero work after adding passkeys.” – Matthias Keller

To learn more, follow the guide on how to Integrate Credentials Manager API and how to Integrate Credential Manager with Sign in with Google.


UX considerations

Some of the major user experience considerations KAYAK faced when switching to passkeys included whether users should be able to delete passkeys or create more than one passkey.

Our UX guide for passkeys recommends that you have an option to revoke a passkey, and that you ensure that the user does not create duplicate passkeys for the same username in the same password manager.

Image of KAYAK's UI for passkey management
Figure 4: KAYAK's UI for passkey management.

To prevent registration of multiple credentials for the same account, KAYAK used the excludeCredentials property that lists credentials already registered for the user. The following example demonstrates how to create new credentials on Android without creating duplicates:


fun WebAuthnClientParamsResponse.toCreateCredentialRequest(): String {
    val credentialRequest = WebAuthnCreateCredentialRequest(
        challenge = this.challenge!!.asSafeBase64,
        relayingParty = this.relayingParty!!,
        pubKeyCredParams = this.pubKeyCredParams!!,
        userEntity = WebAuthnUserEntity(
            id = this.userEntity!!.id.asSafeBase64,
            name = this.userEntity.name,
            displayName = this.userEntity.displayName
        ),
        authenticatorSelection = WebAuthnAuthenticatorSelection(
            authenticatorAttachment = "platform",
            residentKey = "preferred"
        ),
        // Setting already existing credentials here prevents
        // creating multiple passkeys on the same keychain/password manager
        excludeCredentials = this.allowedCredentials!!.map { it.copy(id = it.id.asSafeBase64) },
    )
    return GsonBuilder().disableHtmlEscaping().create().toJson(credentialRequest)
}

And this is how KAYAK implemented the excludeCredentials functionality in their web client:

var registrationOptions = {
    'publicKey': {
        'challenge': self.base64ToArrayBuffer(data.challenge),
        'rp': data.rp,
        'user': {
            'id': new TextEncoder().encode(data.user.id),
            'name': data.user.name,
            'displayName': data.user.displayName
        },
        'pubKeyCredParams': data.pubKeyCredParams,
        'authenticatorSelection': {
            'residentKey': 'required'
        }
    }
};

if (data.allowCredentials && data.allowCredentials.length > 0) {
    var excludeCredentials = [];
    for (var i = 0; i < data.allowCredentials.length; i++) {
        excludeCredentials.push({
            'id': self.base64ToArrayBuffer(data.allowCredentials[i].id),
            'type': data.allowCredentials[i].type
        });
    }
    registrationOptions.publicKey.excludeCredentials = excludeCredentials;
}

navigator.credentials.create(registrationOptions);

Server-side implementation

The server-side part is an essential component of an authentication solution. KAYAK added passkey capabilities to their existing authentication backend by utilizing WebAuthn4J, an open source Java library.

KAYAK broke down the server-side process into the following steps:

  1. The client requests parameters needed to create or use a passkey from the server. This includes the challenge, the supported encryption algorithm, the relying party ID, and related items. If the client already has a user email address, the parameters will include the user object for registration, and a list of passkeys if any exist.
  2. The client runs browser or app flows to start passkey registration or sign-in.
  3. The client sends retrieved credential information to the server. This includes the credential ID, authenticator data, client data, and other related items. This information is needed to create an account or verify a sign-in.
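The shape of this exchange might look like the following hypothetical Retrofit interface; the endpoint paths and payload types are illustrative and not KAYAK's actual API:

interface WebAuthnService {
    // Step 1: fetch the challenge, RP ID, supported algorithms, and
    // (for known users) any existing passkeys.
    @GET("webauthn/client-params")
    fun getClientParams(@Query("username") username: String?): Single<WebAuthnClientParamsResponse>

    // Step 3: return the credential created on the client so the server
    // can create the account or verify the sign-in.
    @POST("webauthn/register")
    fun registerCredential(@Body credential: WebAuthnRegistrationResult): Single<AccountResponse>

    @POST("webauthn/authenticate")
    fun verifyAssertion(@Body assertion: WebAuthnAssertionResult): Single<SessionResponse>
}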

When KAYAK worked on this project, no third-party products supported passkeys. However, many resources are now available for creating a passkey server, including documentation and library examples.


Results

Since integrating passkeys, KAYAK has seen a significant increase in user satisfaction. Users have reported that they find passkeys much easier to use than passwords, as they do not require remembering or typing in a long, complex string of characters. KAYAK reduced the average time it takes their users to sign up and sign in by 50%, has seen a decrease in support tickets related to forgotten passwords, and has made their system more secure by reducing exposure to password-based attacks. Thanks to these improvements, KAYAK plans to eliminate password-based authentication in their app by the end of 2023.

“Passkeys make creating an account lightning fast by removing the need for password creation or navigating to a separate app to get a link or code. As a bonus, implementing the new Credential Manager library also reduced technical debt in our code base by putting passkeys, passwords and Google sign-in all into one new modern UI. Indeed, users are able to sign up and sign in to KAYAK with passkeys twice as fast as with an email link, which also improves the completion rate." – Matthias Keller


Conclusion

Passkeys are a new and innovative authentication solution that offers significant benefits over traditional passwords. KAYAK is a great example of how an organization can improve the security and usability of its authentication process by integrating passkeys. If you are looking for a more secure and user-friendly authentication experience, we encourage you to consider using passkeys with Android's Credential Manager API.

Password manager Dashlane sees 70% increase in conversion rate for signing in with passkeys compared to passwords

Posted by Milica Mihajlija, Technical Writer

This article was originally posted on Google for Developers

Dashlane is a password management tool that provides a secure way to manage user credentials, access control, and authentication across multiple systems and applications. Dashlane has over 18 million users and 20,000 businesses in 180 countries. It’s available on Android, iOS, macOS, Windows, and as a web app with an extension for Chrome, Firefox, Edge, and Safari.


The opportunity

Many users choose password managers because of the pain and frustration of dealing with passwords. While password managers help here, the fact remains that one of the biggest issues with passwords is security breaches. Passkeys, on the other hand, bring passwordless authentication with major advancements in security.

Passkeys are a simple and secure authentication technology that enables signing in to online accounts without entering a password. They cannot be reused, don't leak in server breaches of relying parties, and protect users from phishing attacks. Passkeys are built on open standards and work on all major platforms and browsers.

As an authentication tool, Dashlane’s primary goal is to ensure customers’ credentials are kept safe. They realized how significant the impact of passkeys could be for the security of their users and adapted their applications to support passkeys across devices, browsers, and platforms. With passkey support, they provide users secure and convenient access through a phishing-resistant authentication method.


Implementation

Passkeys as a replacement for passwords are a relatively new concept. To address the challenge of moving users from a familiar to an unfamiliar way of logging in, the Dashlane team considered various solutions.

On the desktop web, they implemented conditional UI support through a browser extension to help users gracefully navigate the choice between using a password and a passkey to log into websites that support both login methods. As soon as the user taps on the username input field, an autofill suggestion dialog pops up with the stored passkeys and password autofill suggestions. The user can then choose an account and use the device screen lock to sign in.

Moving image showing conditional UI experience on the web

Note: To learn how to add passkeys support with conditional UI to your web app check out Create a passkey for passwordless logins and Sign in with a passkey through form autofill.

On Android, they used the Credential Manager API, which supports multiple sign-in methods, such as username and password, passkeys, and federated sign-in solutions (such as Sign in with Google) in a single API. Credential Manager simplifies the development process, and it enabled Dashlane to implement passkey support on Android in 8 weeks with a team of one engineer.

Moving image showing authentication UI experience in Android

Note: If you are a credential provider, such as a password manager app, check out the guide on how to integrate Credential Manager with your credential provider solution.


Results

Data shows that users are more satisfied with the passkey flows than the existing password flows.

The conversion rate is 92% on passkey authentication opportunities on the web (when Dashlane suggests a saved passkey for the user to sign in), compared to a 54% conversion rate on opportunities to automatically sign in with passwords. That’s a 70% increase in conversion rate compared to passwords, a great sign for passkey adoption.

Graph showing evolution of positive actions on passkeys, measuring the rates of authentication with a passkey and registration of a passkey over a six month period

Image showing password sign-in prompt
Password sign-in prompt.

Image showing passkey sign-in prompt
Passkey sign-in prompt.

The conversion rate here refers to user actions when they visit websites that support passkeys. If a user attempts to register or use a passkey, they will see a Dashlane dialog appear on Chrome on desktop. If they proceed and create a new passkey or use an existing one, it is considered a success. If they dismiss the dialog or cancel passkey creation, it’s considered a failure. The same user experience flow applies to passwords.

Dashlane also saw a 63% conversion rate on passkey registration opportunities (when Dashlane offers to save a newly created passkey to the user’s vault) compared to only around 25% conversion rate on suggestions to save new passwords. This indicates that Dashlane’s suggestions to save passkeys are more relevant and precise than the suggestions to save passwords.

Image showing save passkey prompt
Save passkey prompt.

Image showing save password prompt
Save password prompt.

Dashlane observed an acceleration of passkey usage with 6.8% average weekly growth of passkeys saved and used on the web.

Graph showing the percentage of active users that performed a passkey-related event, out of users having ever interacted with a passkey (7-day moving average over a six-month period)

Takeaways

While passkeys are a new technology that users are just starting to get familiar with, the adoption and engagement rates show that Dashlane users are more satisfied with passkey flows than with the existing password flows.


“Staying up to date on developments in the market landscape and industry, anticipating the potential impact to your customers’ experience, and being ready to meet their needs can pay off. Thanks in part to our rapid implementation of the Credential Manager API, customers can rest assured that they can continue to rely on Dashlane to store and help them access services, no matter how authentication methods evolve.“ –Rew Islam, Director of Product Engineering and Innovation at Dashlane
 

Dashlane tracks and investigates all passkey errors and says that there haven’t been many. They also receive few questions from customers around how to use or manage their passkeys. This can be a sign of an intuitive user experience, clear help center documentation, a tendency of passkey users today already being knowledgeable about passkeys, or some combination of these factors.