The Gemini API provides access to Imagen 3, Google's state-of-the-art image generation model. Using Imagen, you can generate novel images from text prompts. The Gemini API integration with Imagen is designed to help you build next-generation AI applications that transform user prompts into high-quality visual assets in a matter of seconds.
This guide will help you get started with Imagen using the Gemini API Python SDK.
About Imagen 3
Imagen 3 is Google's highest quality text-to-image model, featuring a number of new and improved capabilities. Imagen 3 can do the following:
- Generate images with better detail, richer lighting, and fewer distracting artifacts than previous models.
- Understand prompts written in natural, everyday language, making it easier to generate output that matches your prompt without complex prompt engineering.
- Generate images in a wide range of formats and styles, from photorealistic landscapes to richly textured oil paintings or whimsical claymation scenes.
- Render text more effectively than previous models, opening up new possibilities for use cases like stylized birthday cards, presentations, and more.
Imagen 3 was built with Google's latest safety and responsibility innovations, from data and model development to production. The Google DeepMind team used extensive filtering and data labeling to minimize harmful content in datasets and reduce the likelihood of harmful outputs. The team also conducted red teaming and evaluations on topics including fairness, bias, and content safety.
To learn more and see example output, see the Google DeepMind Imagen 3 overview.
Before you begin: Set up your project and API key
Before calling the Gemini API, you need to set up your project and configure your API key.
Get and secure your API key
You need an API key to call the Gemini API. If you don't already have one, create a key in Google AI Studio.
It's strongly recommended that you do not check an API key into your version control system.
You should store your API key in a secrets store such as Google Cloud Secret Manager.
This tutorial assumes that you're accessing your API key as an environment variable.
Install the SDK package and configure your API key
Install the dependency using pip:
pip install -U git+https://github.com/google-gemini/generative-ai-python@imagen
Import the package and configure the service with your API key:
import os
import google.generativeai as genai

genai.configure(api_key=os.environ['API_KEY'])
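The configure call above reads os.environ['API_KEY'] directly, so a missing variable surfaces as a bare KeyError. If you want a clearer error message, you could wrap the lookup in a small helper. The following is a minimal sketch; load_api_key is a hypothetical helper written for this guide, not part of the SDK.

```python
import os


def load_api_key(var_name: str = "API_KEY") -> str:
    """Return the API key stored in the environment variable `var_name`.

    Hypothetical helper (not part of the SDK). Raises a RuntimeError with a
    readable message instead of a bare KeyError when the variable is unset
    or contains only whitespace.
    """
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(
            f"Environment variable {var_name!r} is not set; "
            "export your Gemini API key before running this script."
        )
    return key
```

You would then call genai.configure(api_key=load_api_key()) in place of indexing os.environ directly.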
Generate images
This section shows you how to instantiate an Imagen model and generate images.
To run the example code, you must first install Pillow:
pip install --upgrade Pillow
Then, with Pillow and the Python SDK installed, you can use the following code to generate images:
import os
import google.generativeai as genai
genai.configure(api_key=os.environ['API_KEY'])
imagen = genai.ImageGenerationModel("imagen-3.0-generate-001")
result = imagen.generate_images(
    prompt="Fuzzy bunnies in my kitchen",
    number_of_images=4,
    safety_filter_level="block_only_high",
    person_generation="allow_adult",
    aspect_ratio="3:4",
    negative_prompt="Outside",
)
for image in result.images:
    print(image)

# The output should look similar to this:
# <vertexai.preview.vision_models.GeneratedImage object at 0x78f3396ef370>
# <vertexai.preview.vision_models.GeneratedImage object at 0x78f3396ef700>
# <vertexai.preview.vision_models.GeneratedImage object at 0x78f33953c2b0>
# <vertexai.preview.vision_models.GeneratedImage object at 0x78f33953c280>

for image in result.images:
    # Open and display the image using your local operating system.
    image._pil_image.show()
The code should display the four generated images.
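If you'd rather keep the generated images than only display them, you could save each one to a file. This sketch assumes, as the display loop in the example does, that every object in result.images exposes a Pillow image through the private _pil_image attribute; because that attribute is private, it may change between SDK versions. save_generated_images is a hypothetical helper, not part of the SDK.

```python
from pathlib import Path


def save_generated_images(images, out_dir="imagen_output", prefix="image"):
    """Save each generated image as a numbered PNG file and return the paths.

    Hypothetical helper. Assumes every item exposes a Pillow image via the
    private `_pil_image` attribute, as in the display loop in the example.
    """
    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)
    saved = []
    for index, image in enumerate(images, start=1):
        file_path = out_path / f"{prefix}_{index}.png"
        # Pillow infers the output format from the .png extension.
        image._pil_image.save(file_path)
        saved.append(file_path)
    return saved
```

For example, paths = save_generated_images(result.images) writes image_1.png through image_4.png to the imagen_output directory.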
Imagen model parameters
The following parameters are available for generate_images():
- prompt: The text prompt for the image.
- negative_prompt: A description of what you want to omit from the generated images. Defaults to none. For example, consider the prompt "a rainy city street at night with no people". The model might interpret "people" as a directive of what to include rather than what to omit. To generate better results, you could use the prompt "a rainy city street at night" with a negative prompt "people".
- number_of_images: The number of images to generate, from 1 to 4 (inclusive). The default is 4.
- aspect_ratio: Changes the aspect ratio of the generated image. Supported values are "1:1", "3:4", "4:3", "9:16", and "16:9". The default is "1:1".
- safety_filter_level: Adds a filter level to safety filtering. The following values are valid:
  - "block_low_and_above": Block when the probability score or the severity score is LOW, MEDIUM, or HIGH.
  - "block_medium_and_above": Block when the probability score or the severity score is MEDIUM or HIGH.
  - "block_only_high": Block when the probability score or the severity score is HIGH.
- person_generation: Allow the model to generate images of people. The following values are supported:
  - "dont_allow": Block generation of images of people.
  - "allow_adult": Generate images of adults, but not children.
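To catch typos in these parameters before making a request, you could validate them locally against the values listed above. The following is a minimal sketch; build_generation_kwargs is a hypothetical helper, not part of the SDK, and it only checks the values documented in this guide (the service may apply further constraints). Because this guide doesn't state defaults for safety_filter_level and person_generation, the sketch leaves them out of the request unless you set them.

```python
# Allowed values as listed in the parameter descriptions of this guide.
SUPPORTED_ASPECT_RATIOS = {"1:1", "3:4", "4:3", "9:16", "16:9"}
SAFETY_FILTER_LEVELS = {"block_low_and_above", "block_medium_and_above", "block_only_high"}
PERSON_GENERATION_OPTIONS = {"dont_allow", "allow_adult"}


def build_generation_kwargs(
    prompt,
    negative_prompt=None,
    number_of_images=4,
    aspect_ratio="1:1",
    safety_filter_level=None,
    person_generation=None,
):
    """Validate parameters locally and return kwargs for generate_images().

    Hypothetical helper: raises ValueError for values outside the documented sets.
    """
    if not isinstance(prompt, str) or not prompt.strip():
        raise ValueError("prompt must be a non-empty string")
    if not isinstance(number_of_images, int) or not 1 <= number_of_images <= 4:
        raise ValueError("number_of_images must be an integer from 1 to 4")
    if aspect_ratio not in SUPPORTED_ASPECT_RATIOS:
        raise ValueError(f"unsupported aspect_ratio: {aspect_ratio!r}")
    if safety_filter_level is not None and safety_filter_level not in SAFETY_FILTER_LEVELS:
        raise ValueError(f"unsupported safety_filter_level: {safety_filter_level!r}")
    if person_generation is not None and person_generation not in PERSON_GENERATION_OPTIONS:
        raise ValueError(f"unsupported person_generation: {person_generation!r}")

    kwargs = {
        "prompt": prompt,
        "number_of_images": number_of_images,
        "aspect_ratio": aspect_ratio,
    }
    # Only include optional parameters that were actually set.
    if negative_prompt:
        kwargs["negative_prompt"] = negative_prompt
    if safety_filter_level is not None:
        kwargs["safety_filter_level"] = safety_filter_level
    if person_generation is not None:
        kwargs["person_generation"] = person_generation
    return kwargs
```

For example, the negative prompt case described above could be written as imagen.generate_images(**build_generation_kwargs("a rainy city street at night", negative_prompt="people")).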
Text prompt language
The following input text prompt languages are supported:
- Chinese (simplified) (zh/zh-CN)
- Chinese (traditional) (zh-TW)
- English (en)
- Hindi (hi)
- Japanese (ja)
- Korean (ko)
- Portuguese (pt)
- Spanish (es)
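If your application accepts prompts from users in several languages, you could check a prompt's language code against the codes listed above before sending the request. This is a minimal sketch; is_supported_prompt_language is a hypothetical helper, it matches only the exact codes in the list (so a regional variant such as "pt-BR" is not treated as supported), and how you determine a prompt's language code in the first place is outside its scope.

```python
# Language codes listed above as supported for text prompts.
SUPPORTED_PROMPT_LANGUAGES = {"zh", "zh-CN", "zh-TW", "en", "hi", "ja", "ko", "pt", "es"}

# Lowercased copies so the lookup is case-insensitive.
_NORMALIZED_CODES = {code.lower() for code in SUPPORTED_PROMPT_LANGUAGES}


def is_supported_prompt_language(code: str) -> bool:
    """Return True if `code` matches one of the listed language codes.

    Hypothetical helper. Matching ignores case and accepts "_" in place of "-".
    """
    normalized = code.strip().replace("_", "-").lower()
    return normalized in _NORMALIZED_CODES
```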
What's next
Imagen 3 in the Gemini API is in early access. Stay tuned for announcements about the status of the feature.