
Leverage Gemini in your Android apps

Posted by Dave Burke, VP of Engineering

Last week we unveiled our most capable foundation model, Gemini. Gemini is multimodal – it can accept both text and image inputs. We introduced a way for Android developers to leverage our smallest model, Gemini Nano, on-device. This is available on select devices through AICore, a system service that handles model management, runtimes, safety features and more, simplifying the work for developers. And today, we're introducing new ways for Android developers to access the Gemini Pro model – which runs off-device, in Google's data centers.

App development with Gemini Pro

Gemini Pro is accessible via the Gemini API, and it’s our best model for scaling across a wide range of text and image reasoning tasks. To simplify integrating Gemini Pro, you can use the Google AI SDK, a client SDK for Android. This SDK enables direct integration from Android apps and removes the need for developers to build and manage their own backend infrastructure, reducing development costs and improving velocity.

Google AI Studio provides a streamlined way for developers to integrate the Gemini Pro model, craft prompts, create API keys, and effortlessly transform ideas into AI apps. Once you have developed your prompt in Google AI Studio, you can simply click on the “Get code” action to generate a Kotlin code snippet, and start integrating Gemini today using the Google AI SDK for Android.

Generate Kotlin code for the Gemini API in Google AI Studio
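
For reference, here’s a minimal sketch of what that generated Kotlin looks like. The dependency coordinates and API surface below reflect the Google AI SDK for Android at the time of writing and may change; the prompt is just an example:

// In your module's build.gradle.kts (check for the latest version):
// dependencies { implementation("com.google.ai.client.generativeai:generativeai:0.1.1") }

import com.google.ai.client.generativeai.GenerativeModel

suspend fun askGemini(apiKey: String): String? {
    // gemini-pro is the text-in, text-out model served from Google's data centers.
    val model = GenerativeModel(modelName = "gemini-pro", apiKey = apiKey)
    // generateContent is a suspend function, so call it from a coroutine.
    val response = model.generateContent("Write a one-line summary of what Gemini is.")
    return response.text
}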

We are also making it easier for developers to use the Gemini API directly in the latest preview version of Android Studio. We’re introducing a new project template for developers to get started with the Google AI SDK for Android right away. You’ll benefit from Android Studio’s enhanced code completion and lint checkers, which help you handle API keys securely.

New Project template for AI in Android Studio

To leverage the new template in Android Studio, start a new project through File > New > New Project and pick the Gemini API starter template. This template provides a pre-configured project with the necessary code to use the Gemini API. After choosing a project name and location, you will be prompted to generate an API key in Google AI Studio, and asked to enter it in Android Studio. Android Studio will automatically set up the project for you with the Gemini API connection, simplifying your workflow.
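
A note on the key itself: keep it out of version control. The template wires the key in for you; a common pattern it follows (the property and field names here are illustrative rather than guaranteed) is to read the key from local.properties and expose it through BuildConfig:

# In local.properties (not checked into source control):
apiKey=YOUR_API_KEY

// In your Kotlin code, once the build exposes it:
val model = GenerativeModel(modelName = "gemini-pro", apiKey = BuildConfig.apiKey)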

Alternatively, you can import the generative AI code sample and set it up in Android Studio through File > New > Import Sample, and searching for "Generative AI Sample".

Get started building AI-powered features and Android apps using Gemini Pro.

It’s time for developers and enterprises to build with Gemini Pro

Posted by Jeanine Banks – VP/GM, Developer X and Developer Relations, and Burak Gokturk – VP/GM, Cloud AI and Industry Solutions

Learn more about how to integrate Gemini Pro into your app or business at ai.google.dev

This article is also published on the Keyword blog.

Last week, we announced Gemini, our largest and most capable AI model and the next step in our journey to make AI more helpful for everyone. It comes in three sizes: Ultra, Pro and Nano. We've already started rolling out Gemini in our products: Gemini Nano is in Android, starting with Pixel 8 Pro, and a specifically tuned version of Gemini Pro is in Bard.

Today, we’re making Gemini Pro available for developers and enterprises to build for your own use cases, and we’ll be further fine-tuning it in the weeks and months ahead as we listen and learn from your feedback.


Gemini Pro is available today

The first version of Gemini Pro is now accessible via the Gemini API and here’s more about it:

  • Gemini Pro outperforms other similarly sized models on research benchmarks.
  • Today’s version comes with a 32K context window for text, and future versions will have a larger context window.
  • It’s free to use right now, within limits, and it will be competitively priced.
  • It comes with a range of features: function calling, embeddings, semantic retrieval and custom knowledge grounding, and chat functionality.
  • It supports 38 languages across 180+ countries and territories worldwide.
  • In today’s release, Gemini Pro accepts text as input and generates text as output. We’ve also made a dedicated Gemini Pro Vision multimodal endpoint available today that accepts text and imagery as input, with text output (see the sketch below).
  • SDKs are available for Gemini Pro to help you build apps that run anywhere. Python, Android (Kotlin), Node.js, Swift and JavaScript are all supported.
A screenshot of a code snippet illustrating the SDKs supporting Gemini.
Gemini Pro has SDKs that help you build apps that run anywhere.
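
As a concrete example, here’s a minimal sketch of calling the Gemini Pro Vision endpoint from the Android (Kotlin) SDK; the bitmap is assumed to be an image you’ve already decoded:

import android.graphics.Bitmap
import com.google.ai.client.generativeai.GenerativeModel
import com.google.ai.client.generativeai.type.content

suspend fun describeImage(apiKey: String, bitmap: Bitmap): String? {
    // gemini-pro-vision accepts text and imagery as input and returns text.
    val model = GenerativeModel(modelName = "gemini-pro-vision", apiKey = apiKey)
    val response = model.generateContent(
        content {
            image(bitmap)
            text("Describe what you see in this image.")
        }
    )
    return response.text
}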

Google AI Studio: The fastest way to build with Gemini

Google AI Studio is a free, web-based developer tool that enables you to quickly develop prompts and then get an API key to use in your app development. You can sign into Google AI Studio with your Google account and take advantage of the free quota, which allows 60 requests per minute — 20x more than other free offerings. When you’re ready, you can simply click on “Get code” to transfer your work to your IDE of choice, or use one of the quickstart templates available in Android Studio, Colab or Project IDX. To help us improve product quality, when you use the free quota, your API and Google AI Studio input and output may be accessible to trained reviewers. This data is de-identified from your Google account and API key.

A screen recording of a developer using Google AI Studio.
Google AI Studio is a free, web-based developer tool that enables you to quickly develop prompts and then get an API key to use in your app development.

Build with Vertex AI on Google Cloud

When it's time for a fully managed AI platform, you can easily transition from Google AI Studio to Vertex AI, which allows you to customize Gemini with full data control and adds Google Cloud features for enterprise security, safety, privacy, and data governance and compliance.

With Vertex AI, you will have access to the same Gemini models, and will be able to:

  • Tune and distill Gemini with your own company’s data, and augment it with grounding to include up-to-the-minute information and extensions to take real-world actions.
  • Build Gemini-powered search and conversational agents in a low code / no code environment, including support for retrieval-augmented generation (RAG), blended search, embeddings, conversation playbooks and more.
  • Deploy with confidence. We never train our models on inputs or outputs from Google Cloud customers. Your data and IP are always your data and IP.

To read more about our new Vertex AI capabilities, visit the Google Cloud blog.


Gemini Pro pricing

Right now, developers have free access to Gemini Pro and Gemini Pro Vision through Google AI Studio, with up to 60 requests per minute, making it suitable for most app development needs. Vertex AI developers can try the same models, with the same rate limits, at no cost until general availability early next year, after which there will be a charge per 1,000 characters or per image across Google AI Studio and Vertex AI.

A screenshot of input and output prices for Gemini Pro.
Big impact, small price: Because of our investments in TPUs, Gemini Pro can be served more efficiently.

Looking ahead

We’re excited that Gemini is now available to developers and enterprises. As we continue to fine-tune it, your feedback will help us improve. You can learn more and start building with Gemini on ai.google.dev, or use Vertex AI’s robust capabilities on your own data with enterprise-grade controls.

Early next year, we’ll launch Gemini Ultra, our largest and most capable model for highly complex tasks, after further fine-tuning, safety testing and gathering valuable feedback from partners. We’ll also bring Gemini to more of our developer platforms like Chrome and Firebase.

We’re excited to see what you build with Gemini.

Announcing the inaugural Google for Startups Accelerator: Women Founders program, Europe & Israel – applications now open.

Posted by Karina Govindji, Senior Director, LEAD – Global Workforce Diversity, and Noa Havazelet, Head of Google’s accelerator programs across Europe and Israel

Applications are also open for underrepresented founders in North America

Artificial intelligence (AI) stands at the forefront of transformative technologies, reshaping industries and redefining the way we live and work. Yet a closer look at the AI startup ecosystem reveals a stark gender disparity. Women, despite their profound capabilities and innovative prowess, often find themselves navigating a maze of obstacles in their entrepreneurial journey. While investment in AI software is booming globally, the venture capital funding gap for women is even more marked: women-founded startups accounted for only 2.1% of VC deals involving AI startups1. This is a reality that demands attention and action. Globally in 2023, all-women founding teams raised just 3% of all dollars invested in the year, with mixed-gender founding teams taking 15%, leaving 82% of dollars to flow to founding teams that are all men2.

Google's accelerator programs have actively taken a leading role in championing diversity and empowering women and minority founders, having supported 1,100+ startups across the globe since 2016, 36% of which are women-led. As such, we are pleased to announce the launch of the Google for Startups Accelerator: Women Founders program (Europe & Israel), a 12-week program for Seed to Series A AI startups based in Europe and Israel.

The Google for Startups Accelerator: Women Founders program (Europe & Israel) provides a comprehensive mix of mentorship, technical support, and workshops, establishing a robust foundation for participants. Beyond Google's expert guidance, the accelerator cultivates a collaborative network among women founders, propelling innovation within the tech startup space. By empowering women founders, the Google for Startups Accelerator: Women Founders program (Europe & Israel) proactively contributes to creating a more inclusive and equitable tech community.

Applications for the Google for Startups Accelerator: Women Founders Europe & Israel program are open until January 19th, 2024. You can learn more and apply here.

In a similar vein, two other Google for Startups Accelerator programs for underrepresented founders in North America have opened applications for the fifth Women Founders and Black Founders programs. These 10-week, equity-free programs are best suited for Seed to Series A, high-potential, revenue-generating women-led and Black-led startups with growing teams (5+ employees). Applications for both programs close on February 1st, 2024.

To further explore these opportunities and why you should apply, listen to what past participants of the North American Women Founders and Black Founders programs have to say here.



A New Foundation for AI on Android

Posted by Dave Burke, VP of Engineering

Foundation Models learn from a diverse range of data sources to produce AI systems capable of adapting to a wide range of tasks, instead of being trained for a single narrow use case. Today, we announced Gemini, our most capable model yet. Gemini was designed for flexibility, so it can run on everything from data centers to mobile devices. It's been optimized for three different sizes: Ultra, Pro and Nano.

Gemini Nano, optimized for mobile

Gemini Nano, our most efficient model built for on-device tasks, runs directly on mobile silicon, opening support for a range of important use cases. Running on-device enables features where the data should not leave the device, such as suggesting replies to messages in an end-to-end encrypted messaging app. It also enables consistent experiences with deterministic latency, so features are always available even when there’s no network.

Gemini Nano is distilled down from the larger Gemini models and specifically optimized to run on mobile silicon accelerators. Gemini Nano enables powerful capabilities such as high quality text summarization, contextual smart replies, and advanced proofreading and grammar correction. For example, the enhanced language understanding of Gemini Nano enables the Pixel 8 Pro to concisely summarize content in the Recorder app, even when the phone’s network connection is offline.

Moving image of Gemini Nano being used in the Recorder app on a Pixel 8 Pro device
Pixel 8 Pro using Gemini Nano in the Recorder app to summarize meeting audio, even without a network connection.

Gemini Nano is starting to power Smart Reply in Gboard on Pixel 8 Pro, ready to be enabled in settings as a developer preview. Available now to try with WhatsApp and coming to more apps next year, the on-device AI model saves you time by suggesting high-quality responses with conversational awareness1.

Moving image of WhatsApp’s use of Smart Reply in Gboard using Gemini Nano on Pixel 8 Pro device
Smart Reply in Gboard within WhatsApp using Gemini Nano on Pixel 8 Pro.

Android AICore, a new system service for on-device foundation models

Android AICore is a new system service in Android 14 that provides easy access to Gemini Nano. AICore handles model management, runtimes, safety features and more, simplifying the work for you to incorporate AI into your apps.

AICore is private by design, following the example of Android’s Private Compute Core with isolation from the network via open-source APIs, providing transparency and auditability. As part of our efforts to build and deploy AI responsibly, we also built dedicated safety features to make it safer and more inclusive for everyone.

AICore architecture
AICore manages model, runtime and safety features.

AICore enables Low Rank Adaptation (LoRA) fine tuning with Gemini Nano. This powerful concept enables app developers to create small LoRA adapters based on their own training data. The LoRA adapter is loaded by AICore, resulting in a powerful large language model fine tuned for the app’s own use-cases.
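
For intuition, here is the standard LoRA formulation (a general sketch of the technique, not AICore-specific detail). A frozen pretrained weight matrix $W_0$ is augmented with a trainable low-rank correction:

$$W' = W_0 + BA, \qquad B \in \mathbb{R}^{d \times r},\; A \in \mathbb{R}^{r \times k},\; r \ll \min(d, k)$$

Only $B$ and $A$ are trained, so the adapter an app ships is a tiny fraction of the full model’s size, which is what makes per-app adapters on top of a shared on-device model practical.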

AICore takes advantage of new ML hardware like the latest Google Tensor TPU and NPUs in flagship Qualcomm Technologies, Samsung S.LSI and MediaTek silicon. AICore and Gemini Nano are rolling out to Pixel 8 Pro, with more devices and silicon partners to be announced in the coming months.

Build with Gemini

We're excited to bring together state-of-the-art AI research with easy-to-use tools and APIs for Android developers to build with Gemini on-device. If you are interested in building apps using Gemini Nano and AICore, please sign up for our Early Access Program.


1 Available globally, only using the United States English keyboard language. Read more for details.

How it’s Made: Interacting with Gemini through multimodal prompting

Posted by Alexander Chen, Creative Director


Let’s try an experiment. We’ll show this picture to our multimodal model Gemini and ask it to describe what it sees:

A close up of a hand with five fingers spread out, palm facing down on a wooden surface.
Tell me what you see
Gemini:  I see a person's right hand. The hand is open with the fingers spread apart.

Ok, let’s try this one:

A closeup of the same hand, now balled up in a fist.
Gemini: A person knocking on a wooden door.

How about this one?

A closeup of the same hand, with two fingers extended like that of a peace sign.
Gemini: I see a hand with two fingers extended, which is a common symbol for the number two.

But wait … what if we asked Gemini to reason about all of these images together?

A sequence of three frames depicting the same hand in different formations. The first frame shows the hand with all five fingers spread out, palm facing down. The second frame shows the hand balled up in a fist. The third frame shows the hand with two fingers extended, like that of a peace sign.
What do you think I'm doing? Hint: it's a game.
Gemini: You're playing rock, paper, scissors.

Nice. What else could we try? Let's see if Gemini can notice this pattern:

A sequence of six frames that alternate between a hand that is balled up in a fist, and a hand that has two fingers extended.
I’m playing rock paper scissors. Do you notice a pattern in my gameplay?
Gemini: You are alternating between rock and scissors.

Neat, it spotted the pattern! But does Gemini really know how this game works? Let’s try asking:

Hmm ... do you think it's a good strategy?
Gemini: No, it is not a good strategy. Your opponent will quickly learn your pattern and will be able to win every time. You should try to randomize your choices more.

Good advice, Gemini! Okay, for this last one, let’s put your multimodal reasoning capabilities to the test. Here’s your prompt:

This is a secret message. What does it say?

A six by eight grid of frames of a hands balled up in a fist or the hand with all five fingers spread out, palm facing down. The series of photos spells 'Gemini' in ASCII binary.

Can you figure it out? We have gotten Gemini to solve it, but we won’t spoil it by putting the answer in this post. 🙂



What is multimodal prompting?

Everything we did just now is an example of “multimodal prompting.” We’re basically giving Gemini combinations of different modalities — image and text in this case — and having Gemini respond by predicting what might come next. Gemini’s ability to seamlessly combine these modes together enables new possibilities for what you can do.

In this post, we’ll explore some of the prompting approaches we used in our Hands on with Gemini demo video. We’ll soon be rolling out Gemini for people to try in Google AI Studio, our free, web-based developer tool where you’ll be able to try your own multimodal prompts with Gemini. We hope this guide of starter prompts inspires you to explore your own ideas.
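
If you’d like to reproduce this kind of prompt from code, here’s a minimal Kotlin sketch using the Google AI SDK for Android covered earlier in this archive; the Bitmap frames are assumed to be images you’ve already loaded:

import android.graphics.Bitmap
import com.google.ai.client.generativeai.GenerativeModel
import com.google.ai.client.generativeai.type.content

suspend fun guessTheGame(apiKey: String, frames: List<Bitmap>): String? {
    val model = GenerativeModel(modelName = "gemini-pro-vision", apiKey = apiKey)
    val response = model.generateContent(
        content {
            frames.forEach { image(it) }  // the hand-gesture sequence, in order
            text("What do you think I'm doing? Hint: it's a game.")
        }
    )
    return response.text
}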



🧮 Spatial reasoning and logic

Logic and puzzles are a fun way to test Gemini. Let’s try one that requires both left-to-right spatial reasoning and knowledge about our solar system:

Three sticky notes on a wooden surface, each featuring a hand-drawn sketch of a solar-body. The left-most note depicts a drawing of the sun. The middle note has a drawing of saturn. The right-most note shows a drawing of earth.
Is this the right order? Consider the distance from the sun and explain your reasoning.

Gemini: No, the correct order is Sun, Earth, Saturn. The Sun is the closest to the center of the solar system, followed by the Earth, and then Saturn.

Nice! Here’s another, inspired by pinewood derby challenges:

Two sticky notes depicting two different hand-drawn cars going down-hill. The car on the left is blocky and awkwardly shaped, the car on the right is slanted and smooth.
Which of these cars is more aerodynamic? The one on the left or the right? Explain why, using specific visual details.

Gemini: The car on the right is more aerodynamic. It has a lower profile and a more streamlined shape. The car on the left has a higher profile and a more boxy shape, which makes it less aerodynamic.



📷 Image sequences

Next, what if we showed Gemini a sequence of images? Let’s see if we can show a few still frames from a game of charades and have Gemini guess the movie:

Four frames from a video of a person leaning back and moving their arms wildly.
Guess what movie I'm acting out.
Gemini: The Matrix
Nice! But which part specifically? Look at my body movements.
Gemini: The part where Neo dodges bullets.



🪄 Magic tricks

Exploring image sequences really lets you dig into how Gemini reasons over time. Let’s try a classic magic trick.

An image of two hands with palms facing the camera, there is a coin in the right hand.
Explain what I'm doing.
Gemini: You are holding a coin in your right hand.
An image of two hands in motion, palms turning inward and away from the camera.
An image of two hands, palms facing down and away from the camera.
Where do you think the coin is?
Gemini: The coin is in your right hand.

Sure, that’s a logical answer given what it’s seen. But let’s pull a sleight of hand as we reveal this next image to Gemini...

An image of two hands, the left palm is facing down and the right hand is tilted upwards revealing nothing under the right palm.
Look here, what happened?
Gemini: The coin is gone.

Gemini notices it’s gone. But can it summarize what may have happened?

Summarize what may have happened, step by step, starting with the first image.

Gemini: 
1. You started with a coin in your right hand.
2. You then made a motion with your hands, and the coin disappeared.
3. You then showed that the coin was gone.

Gemini is leveraging the entire context of this conversation – both text and image – to summarize its answer. Finally, let’s ask:

Can you explain how it disappeared?
Gemini: The coin was palmed.

Gemini makes a really solid guess! But we won’t reveal our secret 🙂.



🔀 Cup shuffling

Let’s try a trick that requires memory and logic: the classic ball and cup shuffling game. For this one, we’ll break it down into steps for Gemini. First, let’s establish the basics with these instructions:

Instructions: There are three cups in position 1 (left), position 2 (middle), position 3 (right).

Next, we’ll need to tell Gemini where we’re placing the ball:

Starting position: The ball is in position 3.
Current state: empty, empty, ball

Notice how we came up with a simple format for remembering where the ball is with “empty, empty, ball.” Next, let’s show Gemini two example turns:

Images:
Three frames depicting a person shuffling three green cups. The person switches positions of the middle and right-most cups. 
Swap: 2 and 3
Current state: empty, ball, empty

Images:
Three frames depicting a person shuffling three green cups. The person switches positions of the middle and left-most cups. 
Swap: 1 and 2
Current state: ball, empty, empty

In these two example turns, we’re both showing it what a swap looks like, and how to update the ball position. Now, let’s see if Gemini can generalize to a swap it hasn't seen:

Images:
Three frames depicting a person shuffling three green cups. The person switches positions of the left-most and right-most cups. 
Swap: 

Here's Gemini's response:

1 and 3
Current state: empty, empty, ball

Gemini got it! It looked at these images and correctly inferred that cups 1 and 3 are being swapped. And it reasoned correctly about how to update the ball position. Let’s ask:

Tell me where the ball is and summarize how it got there, in paragraph form.

Gemini: The ball is in cup 3. 

Starting with the ball in cup 3, the first swap was cup 2 and 3. This put the ball in cup 2. The second swap was cup 1 and 2. This put the ball in cup 1. The third swap was cup 1 and 3. This put the ball in cup 3.

Not only did Gemini answer correctly, it accurately summarized the game history. Of course, it won’t always get this challenge right. Sometimes the fake-out move (where you swap two empty cups) seems to trip it up, but sometimes it gets that too. But simple prompts like this make it really fun to rapidly test Gemini. You can change the variables in your prompt, including the order of swaps, and see how it does.
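
If you want to vary those swaps programmatically, a small harness helps. Here’s a hypothetical Kotlin sketch that builds a random, text-only swap sequence and tracks the ground-truth state so you can score Gemini’s final answer (in the demo above the swaps were shown as images; this text variant is just for rapid testing):

// Hypothetical test harness: build a random swap sequence and track the
// ground truth so you can check Gemini's final answer against it.
fun buildShufflePrompt(turns: Int = 3): Pair<String, List<String>> {
    val state = mutableListOf("empty", "empty", "ball")  // ball starts in position 3
    val prompt = StringBuilder(
        "Instructions: There are three cups in position 1 (left), " +
        "position 2 (middle), position 3 (right).\n" +
        "Starting position: The ball is in position 3.\n" +
        "Current state: ${state.joinToString(", ")}\n"
    )
    repeat(turns) {
        val (a, b) = (0..2).shuffled().take(2).sorted()  // two distinct cup indices
        state[a] = state[b].also { state[b] = state[a] }  // apply the swap
        prompt.append("Swap: ${a + 1} and ${b + 1}\n")
    }
    prompt.append("Tell me where the ball is.")
    return prompt.toString() to state  // compare Gemini's answer with `state`
}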



🔨 Tool use

If you want to use Gemini in your own apps, you’ll want it to be able to connect to other tools. Let’s try a simple idea where Gemini needs to combine multimodality with tool use: drawing a picture to search for music.

A musical instruments and palm tree sketch on a notebook. The sketch features a guitar, an amplifier, a drum set, and a palm tree drawn in a loose and expressive style. The notebook sits on a wooden desk.
Step 1. Look at this drawing, describe everything in it, and come up with a fun, specific idea for a good musical soundtrack genre, mood, or soundtrack. 
Step 2. Provide a search query in the form: "Search [insert query]"

Gemini: 
The drawing is of a guitar, drums, and an amplifier. There is also a palm tree. The mood is fun, upbeat, and tropical. The genre is reggae. The soundtrack could be used for a movie or TV show set in a tropical location. 

Search "Reggae music for a tropical vacation"

Nice! Gemini both reasons about what it sees and then generates a search query you can parse to do a search. It’s as if Gemini is acting as a translator for you – but instead of translating between languages, it’s translating modalities – from drawing to music in this case. With multimodal prompting, you can use Gemini to invent your own entirely new translations between different inputs and outputs.
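
To actually wire this up, your app just needs to scan the model’s reply for the search line and hand the query to whatever search integration you use. A minimal Kotlin sketch of that parsing step (performSearch is a hypothetical hook, not part of any SDK):

fun performSearch(query: String) {
    println("Searching for: $query")  // stand-in for your real music-search call
}

fun handleModelOutput(output: String) {
    // Extract the query from a line of the form: Search "..."
    val match = Regex("Search\\s+\"([^\"]+)\"").find(output)
    match?.groupValues?.get(1)?.let { performSearch(it) }
}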



🕹️ Game creation

What if we tried using Gemini to quickly prototype a multimodal game? Here’s an idea: a geography guessing game where you have to point at a map to make your guess. Let’s start by prompting Gemini with the core idea:

Instructions: Let's play a game. Think of a country and give me a clue. The clue must be specific enough that there is only one correct country. I will try pointing at the country on a map.

Next, let’s give Gemini an example turn of gameplay, showing it how we want it to handle both incorrect and correct answers:

Country: This country is a huge island that is mostly covered in ice.
(Answer: Greenland) 

Guess:
An image of a world map, a hand icon with the index finger pointing at Canada
I am pointing at the country of Canada *incorrect*

Guess:
An image of a world map, a hand icon with the index finger pointing at Greenland
I am pointing at the country of Greenland *correct*

Let’s give it a go and prompt Gemini to generate a clue:

Clue:
This country is known for its unique wildlife, including kangaroos and koalas. (Answer: Australia)

Ok, that’s a good clue. Let’s test out whether pointing will work. Just for fun, let’s try pointing at the wrong place first:

Guess:
A person pointing at Brazil on a map of the world with their right index finger. The map includes blue oceans, and green continents with no country borders.
I am pointing at the country of Brazil *incorrect*

Great! Gemini looked at my image and figured out I’m pointing at Brazil, and correctly reasoned that’s wrong. Now let’s point at the right place on the map:

A person pointing at Australia on a map of the world with their right index finger. The map includes blue oceans, and green continents with no country borders.
I am pointing at the country of Australia *correct*

Nice! We’ve basically taught Gemini our game logic just by giving it an example. You'll also notice that it generalized from the illustrated hand in the examples.



⌨️ Coding

Of course, to bring your game idea to life, you’ll eventually have to write some executable code. Let’s see if Gemini can make a simple countdown timer for a game, but with a few fun twists:

Implement a simple timer in HTML/CSS/Javascript. Use a sans serif font and dark mode. Start it at 10 seconds and start counting down. When it reaches zero, replace the timer with a random emoji that is associated with excitement and motivation! Then go back to the timer at 10 seconds and start counting down again.

With just this single instruction, Gemini gives us a working timer that does what we asked for:

An animated gif of a countdown timer starting from 10. At the end of the countdown, a rocket emoji is shown followed lightning bolt emoji and a confetti emoji.

My favorite part is scrolling through Gemini’s source code to find the array of motivational emojis it picked for me:

 const emojis = ['🚀', '⚡️', '🎉', '🎊', '🥳', '🤩', '✨'];



👀 A sneak peek

Throughout this post, we’ve been giving Gemini an input, and having Gemini make predictions for what might come next. That's basically what prompting is. And our inputs have been multimodal – image and text, combined.

But so far we've only shown Gemini responding in text. Maybe you’re wondering, can Gemini also respond with a combination of image and text? It can! This is a capability of Gemini called “interleaved text and image generation.” While this feature won’t be ready in the first version of Gemini for people to try, we hope to roll it out soon. Here’s a sneak peek of what’s possible.

Let’s see if we could use Gemini to provide everyday creative inspiration. And let’s try it in a domain that requires a bit of multimodal reasoning ... knitting! 🧶. Similar to our map game above, let's provide one example turn of interaction:

Give me three ideas for crochet creations I could make with these two colors:

Balls of blue and yellow yarn on a wooden table.

Colors: I see blue and yellow yarn.
Idea 1: How about a cute dog with one yellow ear? 
Image 1:

A blue and yellow crocheted dog head with ears placed on a wooden table.

Idea 2: Or how about a jellyfish with a few yellow tentacles?
Image 2:

A blue and yellow crocheted octopus with tentacles placed on a wooden table.

Idea 3: Or how about a blue cat with a yellow nose?
Image 3:

A blue and yellow crocheted cat with whiskers placed on a wooden table.

We’re essentially teaching Gemini about how we want each interaction to go: “I’ll take a photo of two balls of yarn, and I expect you (Gemini) to both come up with an idea for something I could make, and generate an image of it.”

Now, let’s show it a new pair of yarn colors it hasn't yet seen, and see if it can generalize:

Give me three ideas for crochet creations I could make with these two colors:

Balls of blue and pink yarn on a wooden table.

Gemini:
Colors: I see blue and pink yarn.
Idea 1: How about a pig with blue ears?
Image 1:

A blue and pink crocheted pig face with ears placed on a wooden table.

Idea 2: Or an octopus with pink and blue tentacles?
Image 2:

A blue and pink crocheted octopus with tentacles placed on a wooden table.

Idea 3: Or a bunny with a pink nose?
Image 3:

A blue and pink crocheted bunny placed on a wooden table.

Nice! Gemini correctly reasoned about the new colors (“I see blue and pink yarn”) and generated these ideas and the images in a single, interleaved output of text and image.

What Gemini did here is fundamentally different from today’s text-to-image models. It's not just passing an instruction to a separate text-to-image model. It sees the image of my actual yarn on my wooden table, truly doing multimodal reasoning about my text and image together.


What's Next?

We hope you found this a helpful starter guide to get a sense of what’s possible with Gemini. We’re very excited to roll it out to more people soon so you can explore your own ideas through prompting. Stay tuned!