Batch text generation

Batch prediction is an efficient way to send large numbers of multimodal prompts that aren't latency sensitive. Unlike online prediction, which limits you to one input prompt per request, a single batch request can contain many prompts. The responses then asynchronously populate the BigQuery or Cloud Storage output location that you specify.

Batch requests for Gemini models are discounted 50% from standard requests. To learn more, see the Pricing page.

Multimodal models that support batch predictions

The following multimodal models support batch predictions.

  • gemini-1.5-flash-001
  • gemini-1.5-pro-001
  • gemini-1.0-pro-002
  • gemini-1.0-pro-001

Prepare your inputs

Batch requests for multimodal models accept BigQuery storage sources and Cloud Storage sources.

BigQuery storage input

  • The content in the request column must be valid JSON. This JSON data is your input for the model.
  • The JSON in the request column must match the structure of a GenerateContentRequest.
  • Your input table can have columns other than request. They are ignored for content generation but passed through to the output table. The system reserves two column names for output, response and status, which report the outcome of each batch prediction row.
  • Batch prediction doesn't support the fileData field for Gemini.
Example input (JSON)

{
  "contents": [
    {
      "role": "user",
      "parts": [
        {
          "text": "Give me a recipe for banana bread."
        }
      ]
    }
  ],
  "system_instruction": {
    "parts": [
      {
        "text": "You are a chef."
      }
    ]
  }
}
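
If you assemble the input table programmatically, a minimal sketch using the google-cloud-bigquery client might look like the following. The project, dataset, and table names are placeholders, and the request column is modeled as a STRING holding serialized JSON:

# Sketch: build a BigQuery input table for batch prediction.
# Project, dataset, and table names below are placeholders.
import json
from google.cloud import bigquery

client = bigquery.Client(project="my-project")

# The table needs a column named "request"; any extra columns are
# passed through to the output table unchanged.
table = client.create_table(
    bigquery.Table(
        "my-project.mydataset.batch_predictions_input",
        schema=[bigquery.SchemaField("request", "STRING")],
    ),
    exists_ok=True,
)

request = {
    "contents": [
        {"role": "user", "parts": [{"text": "Give me a recipe for banana bread."}]}
    ]
}
errors = client.insert_rows_json(table, [{"request": json.dumps(request)}])
assert not errors, errors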

Cloud Storage input

  • File format: JSON Lines (JSONL), with one request per line.
  • The input file must be in a Cloud Storage bucket in us-central1.
  • The service account that runs the job needs read permission on the input file.
  • The fileData field is subject to limitations for certain Gemini models.

Example input (JSONL)

    {"request":{"contents": [{"role": "user", "parts": [{"text": "What is the relation between the following video and image samples?"}, {"file_data": {"file_uri": "gs://cloud-samples-data/generative-ai/video/animals.mp4", "mime_type": "video/mp4"}}, {"file_data": {"file_uri": "gs://cloud-samples-data/generative-ai/image/cricket.jpeg", "mime_type": "image/jpeg"}}]}]}}
    {"request":{"contents": [{"role": "user", "parts": [{"text": "Describe what is happening in this video."}, {"file_data": {"file_uri": "gs://cloud-samples-data/generative-ai/video/another_video.mov", "mime_type": "video/mov"}}]}]}}
        
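One way to produce and stage such a file, sketched with the google-cloud-storage client (the bucket and object names are placeholders):

# Sketch: write batch requests to a local JSONL file, then upload it
# to Cloud Storage. Bucket and object names are placeholders.
import json
from google.cloud import storage

requests = [
    {"request": {"contents": [{"role": "user", "parts": [
        {"text": "Describe what is happening in this video."},
        {"file_data": {"file_uri": "gs://cloud-samples-data/generative-ai/video/animals.mp4",
                       "mime_type": "video/mp4"}},
    ]}]}},
]

with open("batch_input.jsonl", "w") as f:
    for item in requests:
        f.write(json.dumps(item) + "\n")  # exactly one JSON object per line

# The bucket must be in us-central1 and readable by the service account
# that runs the batch prediction job.
bucket = storage.Client().bucket("my-input-bucket")
bucket.blob("batch_input.jsonl").upload_from_filename("batch_input.jsonl")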

Request a batch response

Depending on the number of input items that you submitted, a batch generation task can take some time to complete.

REST

To create a batch prediction job by using the Vertex AI API, send a POST request to the batchPredictionJobs endpoint.

Before using any of the request data, make the following replacements:

  • PROJECT_ID: The ID of your Google Cloud project.
  • BP_JOB_NAME: A name you choose for your job.
  • INPUT_URI: The input source URI: either a BigQuery table URI in the form bq://PROJECT_ID.DATASET.TABLE or the Cloud Storage URI of your JSONL file.
  • INPUT_SOURCE: The input source type: bigquerySource or gcsSource.
  • INSTANCES_FORMAT: The format of the input instances: bigquery for a BigQuery table or jsonl for a Cloud Storage file.
  • OUTPUT_URI: The URI of the target output table, in the form bq://PROJECT_ID.DATASET.TABLE. If the table doesn't already exist, it is created for you.

HTTP method and URL:

POST https://us-central1-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/us-central1/batchPredictionJobs

Request JSON body:

{
    "displayName": "BP_JOB_NAME",
    "model": "publishers/google/models/gemini-1.0-pro-002",
    "inputConfig": {
      "instancesFormat":"INSTANCES_FORMAT",
      "inputSource":{ INPUT_SOURCE
        "inputUri" : "INPUT_URI"
      }
    },
    "outputConfig": {
      "predictionsFormat":"bigquery",
      "bigqueryDestination":{
        "outputUri": "OUTPUT_URI"
        }
    }
}

To send your request, choose one of these options:

curl

Save the request body in a file named request.json, and execute the following command:

curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \
"https://us-central1-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/us-central1/batchPredictionJobs"

PowerShell

Save the request body in a file named request.json, and execute the following command:

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
-Method POST `
-Headers $headers `
-ContentType: "application/json; charset=utf-8" `
-InFile request.json `
-Uri "https://us-central1-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/us-central1/batchPredictionJobs" | Select-Object -Expand Content

You should receive a JSON response similar to the following:

{
  "name": "projects/{PROJECT_ID}/locations/us-central1/batchPredictionJobs/{BATCH_JOB_ID}",
  "displayName": "My first batch prediction",
  "model": "projects/{PROJECT_ID}/locations/us-central1/models/gemini-1.0-pro-002",
  "inputConfig": {
    "instancesFormat": "bigquery",
    "bigquerySource": {
      "inputUri": "bq://{PROJECT_ID}.mydataset.batch_predictions_input"
    }
  },
  "modelParameters": {},
  "outputConfig": {
    "predictionsFormat": "bigquery",
    "bigqueryDestination": {
      "outputUri": "bq://{PROJECT_ID}.mydataset.batch_predictions_output"
    }
  },
  "state": "JOB_STATE_PENDING",
  "createTime": "2023-07-12T20:46:52.148717Z",
  "updateTime": "2023-07-12T20:46:52.148717Z",
  "modelVersionId": "1"
}

The response includes a unique identifier for the batch job. You can poll for the status of the batch job using the BATCH_JOB_ID until the job state is JOB_STATE_SUCCEEDED. For example:

curl \
  -X GET \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
https://us-central1-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/us-central1/batchPredictionJobs/BATCH_JOB_ID
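
Equivalently, a minimal polling sketch with the google-cloud-aiplatform SDK (the project and job ID are placeholders):

# Sketch: poll a batch prediction job until it reaches a terminal state.
import time
from google.cloud import aiplatform

aiplatform.init(project="my-project", location="us-central1")

# BATCH_JOB_ID is the numeric ID from the "name" field of the creation response.
job = aiplatform.BatchPredictionJob("BATCH_JOB_ID")

# Accessing job.state re-fetches the resource from the API.
while job.state.name not in ("JOB_STATE_SUCCEEDED", "JOB_STATE_FAILED", "JOB_STATE_CANCELLED"):
    time.sleep(60)
print(job.state.name)  # e.g. JOB_STATE_SUCCEEDED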

Retrieve batch output

When a batch prediction task completes, the output is stored in the BigQuery table or Cloud Storage location that you specified in your request.

BigQuery output example

The output table contains your original request column (plus any passthrough columns), along with a status column and a response column. For a request cell such as '{"contents":[{...}]}', the response cell contains JSON similar to the following:
{
  "candidates": [
    {
      "content": {
        "role": "model",
        "parts": [
          {
            "text": "In a medium bowl, whisk together the flour, baking soda, baking powder."
          }
        ]
      },
      "finishReason": "STOP",
      "safetyRatings": [
        {
          "category": "HARM_CATEGORY_SEXUALLY_EXPLICIT",
          "probability": "NEGLIGIBLE",
          "probabilityScore": 0.14057204,
          "severity": "HARM_SEVERITY_NEGLIGIBLE",
          "severityScore": 0.14270912
        }
      ]
    }
  ],
  "usageMetadata": {
    "promptTokenCount": 8,
    "candidatesTokenCount": 396,
    "totalTokenCount": 404
  }
}
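
To pull the generated text back out, you can query the output table. A minimal sketch with the google-cloud-bigquery client (the table name is a placeholder, and the response column is assumed to come back as a JSON string):

# Sketch: read generated text from the batch prediction output table.
import json
from google.cloud import bigquery

client = bigquery.Client(project="my-project")
rows = client.query(
    "SELECT request, response, status "
    "FROM `my-project.mydataset.batch_predictions_output`"
).result()

for row in rows:
    if row.status:  # a non-empty status indicates that the row failed
        print("error:", row.status)
        continue
    response = json.loads(row.response)
    print(response["candidates"][0]["content"]["parts"][0]["text"])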

Cloud Storage output example

To write the output to Cloud Storage instead of BigQuery, specify jsonl as the predictionsFormat and a gcsDestination in outputConfig. For example:

PROJECT_ID=[PROJECT ID]
REGION="us-central1"
MODEL_URI="publishers/google/models/gemini-1.0-pro-001"
INPUT_URI="[GCS INPUT URI]"
OUTPUT_URI="[OUTPUT URI]"

# Set variables derived from the parameters above
ENDPOINT="${REGION}-aiplatform.googleapis.com"
API_VERSION=v1
BP_JOB_NAME="BP_testing_`date +%Y%m%d_%H%M%S`"

curl \
  -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  https://${ENDPOINT}/${API_VERSION}/projects/${PROJECT_ID}/locations/${REGION}/batchPredictionJobs \
  -d '{
    "displayName": "'${BP_JOB_NAME}'",
    "model": "'${MODEL_URI}'",
    "inputConfig": {
      "instancesFormat": "jsonl",
      "gcsSource": {
        "uris": ["'${INPUT_URI}'"]
      }
    },
    "outputConfig": {
      "predictionsFormat": "jsonl",
      "gcsDestination": {
        "outputUriPrefix": "'${OUTPUT_URI}'"
      }
    }
  }'
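
When the job finishes, the predictions land as one or more files under the outputUriPrefix that you specified. A minimal sketch for listing them with the google-cloud-storage client (the bucket and prefix are placeholders):

# Sketch: list the prediction files the job wrote under the output prefix.
from google.cloud import storage

# Placeholders: the bucket name and the prefix portion of OUTPUT_URI
# (without the gs://bucket/ part).
client = storage.Client()
for blob in client.list_blobs("my-output-bucket", prefix="batch_output/"):
    print(blob.name)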
