Tag Archives: Google Cloud Next ’24

Gemini 1.5 Pro Now Available in 180+ Countries; With Native Audio Understanding, System Instructions, JSON Mode and More

Posted by Jaclyn Konzelmann and Megan Li - Google Labs

Grab an API key in Google AI Studio, and get started with the Gemini API Cookbook

Less than two months ago, we made our next-generation Gemini 1.5 Pro model available in Google AI Studio for developers to try out. We’ve been amazed by what the community has been able to debug, create and learn using our groundbreaking 1 million context window.

Today, we’re making Gemini 1.5 Pro available in 180+ countries via the Gemini API in public preview, with a first-ever native audio (speech) understanding capability and a new File API to make it easy to handle files. We’re also launching new features like system instructions and JSON mode to give developers more control over the model’s output. Lastly, we’re releasing our next generation text embedding model that outperforms comparable models. Go to Google AI Studio to create or access your API key, and start building.

Unlock new use cases with audio and video modalities

We’re expanding the input modalities for Gemini 1.5 Pro to include audio (speech) understanding in both the Gemini API and Google AI Studio. Additionally, Gemini 1.5 Pro is now able to reason across both image (frames) and audio (speech) for videos uploaded in Google AI Studio, and we look forward to adding API support for this soon.

screen grab of a clooege professor using Gemini 1.5 Pro to create a quiz based on their latest lecture video in Google AI Studio
You can upload a recording of a lecture, like this 117,000+ token lecture from Jeff Dean, and Gemini 1.5 Pro can turn it into a quiz with an answer key. Video sped up for demo purposes.

Gemini API Improvements

Today, we’re addressing a number of top developer requests:

1. System instructions: Guide the model’s responses with system instructions, now available in Google AI Studio and the Gemini API. Define roles, formats, goals, and rules to steer the model's behavior for your specific use case.

image showing where System Instructions is located in Google AI Studio
Set System Instructions easily in Google AI Studio

2. JSON mode: Instruct the model to only output JSON objects. This mode enables structured data extraction from text or images. You can get started with cURL, and Python SDK support is coming soon.

3. Improvements to function calling: You can now select modes to limit the model’s outputs, improving reliability. Choose text, function call, or just the function itself.

A new embedding model with improved performance

Starting today, developers will be able to access our next generation text embedding model via the Gemini API. The new model, text-embedding-004, (text-embedding-preview-0409 in Vertex AI), achieves a stronger retrieval performance and outperforms existing models with comparable dimensions, on the MTEB benchmarks.

table showing Gecko: Versativel Text Embeddings Distilled from Large Language Models
'Text-embedding-004' (aka Gecko) using 256 dims output outperforms all larger 768 dim output models on MTEB benchmarks

These are just the first of many improvements coming to the Gemini API and Google AI Studio in the next few weeks. We’re continuing to work on making Google AI Studio and the Gemini API the easiest way to build with Gemini. Get started today in Google AI Studio with Gemini 1.5 Pro, explore code examples and quickstarts in our new Gemini API Cookbook, and join our community channel on Discord.

Gemma Family Expands with Models Tailored for Developers and Researchers

Posted by Tris Warkentin – Director, Product Management and Jane Fine - Senior Product Manager

In February we announced Gemma, our family of lightweight, state-of-the-art open models built from the same research and technology used to create the Gemini models. The community's incredible response – including impressive fine-tuned variants, Kaggle notebooks, integration into tools and services, recipes for RAG using databases like MongoDB, and lots more – has been truly inspiring.

Today, we're excited to announce our first round of additions to the Gemma family, expanding the possibilities for ML developers to innovate responsibly: CodeGemma for code completion and generation tasks as well as instruction following, and RecurrentGemma, an efficiency-optimized architecture for research experimentation. Plus, we're sharing some updates to Gemma and our terms aimed at improvements based on invaluable feedback we've heard from the community and our partners.

Introducing the first two Gemma variants

CodeGemma: Code completion, generation, and chat for developers and businesses

Harnessing the foundation of our Gemma models, CodeGemma brings powerful yet lightweight coding capabilities to the community. CodeGemma models are available as a 7B pretrained variant that specializes in code completion and code generation tasks, a 7B instruction-tuned variant for code chat and instruction-following, and a 2B pretrained variant for fast code completion that fits on your local computer. CodeGemma models have several advantages:

  • Intelligent code completion and generation: Complete lines, functions, and even generate entire blocks of code – whether you're working locally or leveraging cloud resources. 
  • Enhanced accuracy: Trained on 500 billion tokens of primarily English language data from web documents, mathematics, and code, CodeGemma models generate code that's not only more syntactically correct but also semantically meaningful, helping reduce errors and debugging time. 
  • Multi-language proficiency: Your invaluable coding assistant for Python, JavaScript, Java, and other popular languages. 
  • Streamlined workflows: Integrate a CodeGemma model into your development environment to write less boilerplate, and focus on interesting and differentiated code that matters – faster.
image of streamlined workflows within an exisitng AI dev project with CodeGemma integrated
This table compares the performance of CodeGemma with other similar models on both single and multi-line code completion tasks. Learn more in the technical report.

Learn more about CodeGemma in our report or try it in this quickstart guide.

RecurrentGemma: Efficient, faster inference at higher batch sizes for researchers

RecurrentGemma is a technically distinct model that leverages recurrent neural networks and local attention to improve memory efficiency. While achieving similar benchmark score performance to the Gemma 2B model, RecurrentGemma's unique architecture results in several advantages:

  • Reduced memory usage: Lower memory requirements allow for the generation of longer samples on devices with limited memory, such as single GPUs or CPUs. 
  • Higher throughput: Because of its reduced memory usage, RecurrentGemma can perform inference at significantly higher batch sizes, thus generating substantially more tokens per second (especially when generating long sequences). 
  • Research innovation: RecurrentGemma showcases a non-transformer model that achieves high performance, highlighting advancements in deep learning research. 
graph showing maximum thoughput when sampling from a prompt of 2k tokens on TPUv5e
This chart reveals how RecurrentGemma maintains its sampling speed regardless of sequence length, while Transformer-based models like Gemma slow down as sequences get longer.

To understand the underlying technology, check out our paper. For practical exploration, try the notebook, which demonstrates how to finetune the model.

Built upon Gemma foundations, expanding capabilities

Guided by the same principles of the original Gemma models, the new model variants offer:

  • Open availability: Encourages innovation and collaboration with its availability to everyone and flexible terms of use. 
  • High-performance and efficient capabilities: Advances the capabilities of open models with code-specific domain expertise and optimized design for exceptionally fast completion and generation. 
  • Responsible design: Our commitment to responsible AI helps ensure the models deliver safe and reliable results. 
  • Flexibility for diverse software and hardware:  
    • Both CodeGemma and RecurrentGemma: Built with JAX and compatible with JAX, PyTorch, , Hugging Face Transformers, and Gemma.cpp. Enable local experimentation and cost-effective deployment across various hardware, including laptops, desktops, NVIDIA GPUs, and Google Cloud TPUs.  
    • CodeGemma: Additionally compatible with Keras, NVIDIA NeMo, TensorRT-LLM, Optimum-NVIDIA, MediaPipe, and availability on Vertex AI. 
    • RecurrentGemma: Support for all the aforementioned products will be available in the coming weeks.

Gemma 1.1 update

Alongside the new model variants, we're releasing Gemma 1.1, which includes performance improvements. Additionally, we've listened to developer feedback, fixed bugs, and updated our terms to provide more flexibility.

Get started today

These first Gemma model variants are available in various places worldwide, starting today on Kaggle, Hugging Face, and Vertex AI Model Garden. Here's how to get started:

We invite you to try the CodeGemma and RecurrentGemma models and share your feedback on Kaggle. Together, let's shape the future of AI-powered content creation and understanding.