Introducing PaliGemma, Gemma 2, and an Upgraded Responsible AI Toolkit

MAY 14, 2024
Tris Warkentin Director, Product Management
Xiaohua Zhai Senior Staff Research Scientist
Ludovic Peran Product Manager

At Google, we believe in the power of collaboration and open research to drive innovation, and we're grateful to see Gemma embraced by the community with millions of downloads within a few short months of its launch.

This enthusiastic response has been incredibly inspiring. With projects ranging from Navarasa, a multilingual variant for Indic languages, to Octopus v2, an on-device action model, developers are showcasing the potential of Gemma to create impactful and accessible AI solutions.

This spirit of exploration and creativity has also fueled our development of CodeGemma, with its powerful code completion and generation capabilities, and RecurrentGemma, offering efficient inference and research possibilities.


Gemma is a family of lightweight, state-of-the-art open models built from the same research and technology used to create the Gemini models. Today, we're excited to further expand the Gemma family with the introduction of PaliGemma, a powerful open vision-language model (VLM), and a sneak peek into the near future with the announcement of Gemma 2. Additionally, we're furthering our commitment to responsible AI with updates to our Responsible Generative AI Toolkit, providing developers with new and enhanced tools for evaluating model safety and filtering harmful content.


Introducing PaliGemma: Open Vision-Language Model

PaliGemma is a powerful open VLM inspired by PaLI-3. Built on open components including the SigLIP vision model and the Gemma language model, PaliGemma is designed for class-leading fine-tune performance on a wide range of vision-language tasks. This includes image and short video captioning, visual question answering, understanding text in images, object detection, and object segmentation.

We're providing both pretrained and fine-tuned checkpoints at multiple resolutions, as well as checkpoints specifically tuned to a mixture of tasks for immediate exploration.

To facilitate open exploration and research, PaliGemma is available through various platforms and resources. Start exploring today with free options like Kaggle and Colab notebooks. Academic researchers seeking to push the boundaries of vision-language research can also apply for Google Cloud credits to support their work.

Get started with PaliGemma today. You can find PaliGemma on GitHub, Hugging Face models, Kaggle, Vertex AI Model Garden, and ai.nvidia.com (accelerated with TensorRT-LLM), with easy integration through JAX and Hugging Face Transformers (Keras integration coming soon). You can also interact with the model via this Hugging Face Space.
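To illustrate the Hugging Face Transformers path, here is a minimal captioning sketch. The checkpoint id and the "caption en" task prefix are assumptions drawn from the published PaliGemma model cards; the helper names are our own, and you should verify the details against the card for your chosen checkpoint.

```python
# Hedged sketch of PaliGemma inference via Hugging Face Transformers.
# MODEL_ID and the "caption en" task prefix are assumptions based on the
# public model cards -- adjust both for your chosen checkpoint/resolution.
MODEL_ID = "google/paligemma-3b-mix-224"  # mixture-of-tasks checkpoint


def strip_prompt(decoded: str, prompt: str) -> str:
    """The decoded sequence echoes the task prefix; keep only the answer."""
    if decoded.startswith(prompt):
        return decoded[len(prompt):].strip()
    return decoded.strip()


def caption(image_path: str, prompt: str = "caption en") -> str:
    # Heavy dependencies are imported lazily so strip_prompt stays usable
    # without transformers/PIL installed.
    from PIL import Image
    from transformers import AutoProcessor, PaliGemmaForConditionalGeneration

    model = PaliGemmaForConditionalGeneration.from_pretrained(MODEL_ID)
    processor = AutoProcessor.from_pretrained(MODEL_ID)

    image = Image.open(image_path).convert("RGB")
    inputs = processor(text=prompt, images=image, return_tensors="pt")
    output = model.generate(**inputs, max_new_tokens=32)
    decoded = processor.decode(output[0], skip_special_tokens=True)
    return strip_prompt(decoded, prompt)


if __name__ == "__main__":
    # Requires downloading the (gated) checkpoint from Hugging Face.
    print(caption("my_image.jpg"))
```

Other task prefixes from the model cards (for example detection or visual question answering prompts) can be swapped in for `prompt` without changing the rest of the flow.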

Screenshot from the Hugging Face Space running PaliGemma, showing an image of a cat wearing a tiny hat with its head on a stack of four pancakes

Announcing Gemma 2: Next-Gen Performance and Efficiency

We're thrilled to announce the upcoming arrival of Gemma 2, the next generation of Gemma models. Gemma 2 will be available in new sizes for a broad range of AI developer use cases and features a brand new architecture designed for breakthrough performance and efficiency, offering benefits such as:

  • Class-Leading Performance: At 27 billion parameters, Gemma 2 delivers performance comparable to Llama 3 70B at less than half the size. This breakthrough efficiency sets a new standard in the open model landscape.

  • Reduced Deployment Costs: Gemma 2's efficient design allows it to fit on less than half the compute of comparable models. The 27B model is optimized to run on NVIDIA GPUs or efficiently on a single TPU host in Vertex AI, making deployment more accessible and cost-effective for a wider range of users.

  • Versatile Tuning Toolchains: Gemma 2 will provide developers with robust tuning capabilities across a diverse ecosystem of platforms and tools. From cloud-based solutions like Google Cloud to popular community tools like Axolotl, fine-tuning Gemma 2 will be easier than ever. Plus, seamless partner integration with Hugging Face and NVIDIA TensorRT-LLM, along with our own JAX and Keras, ensures you can optimize performance and efficiently deploy across various hardware configurations.
Gemma pre-trained model performance benchmarks
Gemma 2 is still pretraining. This chart shows performance from the latest Gemma 2 checkpoint along with benchmark pretraining metrics. Source: Hugging Face Open LLM Leaderboard (April 22, 2024) and Grok announcement blog

Stay tuned for the official launch of Gemma 2 in the coming weeks!


Expanding the Responsible Generative AI Toolkit

To help developers conduct more robust model evaluations, we're expanding our Responsible Generative AI Toolkit by releasing the LLM Comparator in open source. The LLM Comparator is a new interactive and visual tool for effective side-by-side evaluations of the quality and safety of models' responses. To see the LLM Comparator in action, explore our demo showcasing a comparison between Gemma 1.1 and Gemma 1.0.

Screenshot showing a side-by-side evaluation in the LLM Comparator

We hope that this tool will further advance the toolkit's mission to help developers create AI applications that are not only innovative but also safe and responsible.

As we continue to expand the Gemma family of open models, we remain dedicated to fostering a collaborative environment where cutting-edge AI technology and responsible development go hand in hand. We're excited to see what you build with these new tools and how, together, we can shape the future of AI.