Rank and rerank documents with RAG

As part of your Retrieval Augmented Generation (RAG) experience in Vertex AI Agent Builder, you can rank a set of documents based on a query.

The ranking API takes a list of documents and reranks those documents based on how relevant the documents are to a query. Compared to embeddings, which look only at the semantic similarity of a document and a query, the ranking API can give you precise scores for how well a document answers a given query. The ranking API can be used to improve the quality of search results after retrieving an initial set of candidate documents.

The ranking API is stateless, so there's no need to index documents before calling the API. All you need to do is pass in the query and documents. This makes the API well suited for reranking documents from Vector Search and other search solutions.

This page describes how to use the ranking API to rank a set of documents based on a query.

Use cases

The primary use case of the ranking API is to improve the quality of search results.

However, the ranking API can be valuable for any scenario where you need to find what pieces of content are most relevant to a user's query. For example, the ranking API can assist you in the following:

Finding the right content to give to an LLM for grounding
Improving the relevance of an existing search experience
Identifying relevant sections of a document

The following flow outlines how you might use the ranking API to improve the quality of results for chunked documents:

Use the Document AI Layout Parser API to split a set of documents into chunks.
Use an embeddings API to create embeddings for each of the chunks.
Load the embeddings into Vector Search or another search solution.
Query your search index and retrieve the most relevant chunks.
Rerank the relevant chunks using the ranking API.
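
The hand-off between the last two steps of this flow can be sketched as follows. This is a minimal illustration, assuming the search step returns chunks as dictionaries with the hypothetical keys `chunk_id`, `heading`, and `text`. Those names are not part of any API, so adapt them to whatever your search solution returns. The helper only converts retrieved chunks into the record shape (ID, title, content) that the ranking API accepts.

```python
from typing import Dict, List


def chunks_to_records(chunks: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Convert retrieved chunks into ranking records (id, title, content).

    The input keys (chunk_id, heading, text) are hypothetical; rename them
    to match the output of your own search index.
    """
    records = []
    for chunk in chunks:
        record = {"id": chunk["chunk_id"]}
        # Include title and content only when they are non-empty.
        if chunk.get("heading"):
            record["title"] = chunk["heading"]
        if chunk.get("text"):
            record["content"] = chunk["text"]
        records.append(record)
    return records


# Illustrative chunks, standing in for the results of a search query.
retrieved_chunks = [
    {
        "chunk_id": "doc1-0",
        "heading": "Rayleigh scattering",
        "text": "Blue light is scattered more than red light.",
    },
    {"chunk_id": "doc2-3", "heading": "", "text": "A poem about the sky."},
]
records = chunks_to_records(retrieved_chunks)
print(records)
```

The resulting list can be passed as the `records` field of a ranking request.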

Input data

The ranking API requires the following inputs:

The query for which you're ranking the records.

For example:
```
"query": "Why is the sky blue?"
```

A set of records that are relevant to the query. The records are provided as an array of objects. Each record can include a unique ID, a title, and the content of the document. For each record, include either a title, content, or both. If the combined length of the title and content exceeds 512 tokens, the additional content is truncated. You can include up to 200 records per request.

For example, a record array looks something like this. In reality, many more records would be included in the array and the content would be much longer:

"records": [
   {
       "id": "1",
       "title": "The Color of the Sky: A Poem",
       "content": "A canvas stretched across the day,\nWhere sunlight learns to dance and play.\nBlue, a hue of scattered light,\nA gentle whisper, soft and bright."
   },
   {
       "id": "2",
       "title": "The Science of a Blue Sky",
       "content": "The sky appears blue due to a phenomenon called Rayleigh scattering. Sunlight is comprised of all the colors of the rainbow. Blue light has shorter wavelengths than other colors, and is thus scattered more easily."
   }
]
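
Before sending a request, it can help to check the records against the constraints described above. The following is a minimal sketch, not an official validator: the function name `validate_records` is hypothetical, and it checks only the 200-record limit, the title-or-content requirement, and ID uniqueness. It does not check the 512-token limit, because that depends on the model's tokenizer, which is not described here.

```python
from typing import Dict, List

MAX_RECORDS_PER_REQUEST = 200  # Limit stated in the input requirements.


def validate_records(records: List[Dict[str, str]]) -> None:
    """Raise ValueError if the records break the documented constraints."""
    if len(records) > MAX_RECORDS_PER_REQUEST:
        raise ValueError(
            f"Too many records: {len(records)} > {MAX_RECORDS_PER_REQUEST}"
        )
    seen_ids = set()
    for index, record in enumerate(records):
        # Each record needs a title, content, or both.
        if not record.get("title") and not record.get("content"):
            raise ValueError(
                f"Record at index {index} needs a title, content, or both"
            )
        # IDs, when present, must be unique within the request.
        record_id = record.get("id")
        if record_id is not None:
            if record_id in seen_ids:
                raise ValueError(f"Duplicate record ID: {record_id}")
            seen_ids.add(record_id)


validate_records(
    [
        {"id": "1", "title": "The Color of the Sky: A Poem"},
        {"id": "2", "content": "The sky appears blue due to Rayleigh scattering."},
    ]
)
```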

Optional: The maximum number of records that you want the ranking API to return. By default, all records are returned; however, you can use the topN field to return fewer records. All records are ranked regardless of what value is set.

For example, this returns the top 10 ranked records:
```
"topN": 10,
```
Optional: A setting that specifies whether the API returns only the ID of each record or also the record title and content. By default, the full record is returned. Set this field mainly to reduce the size of the response payload.

For example, setting to true returns only the record ID, not the title or content:
```
"ignoreRecordDetailsInResponse": true,
```
Optional: The model name. This specifies the model to be used for ranking the documents. If no model is specified, then semantic-ranker-512@latest is used, which automatically points to the latest available model. To point to a specific model, specify one of the model names listed in Supported models, for example semantic-ranker-512-002.

In the following example, model is set to semantic-ranker-512@latest. This means that the ranking API will always use the latest available model.
```
"model": "semantic-ranker-512@latest"
```
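
Putting the inputs together, a request body can be assembled as in the following sketch. The helper name `build_rank_body` and its parameter names are illustrative only. The JSON field names (`query`, `records`, `topN`, `ignoreRecordDetailsInResponse`, `model`) are the ones described above. Optional fields are left out when they aren't set, so the API defaults apply.

```python
import json
from typing import Any, Dict, List, Optional


def build_rank_body(
    query: str,
    records: List[Dict[str, str]],
    top_n: Optional[int] = None,
    ignore_record_details: Optional[bool] = None,
    model: Optional[str] = None,
) -> Dict[str, Any]:
    """Build the JSON body for a ranking request, omitting unset options."""
    body: Dict[str, Any] = {"query": query, "records": records}
    if top_n is not None:
        body["topN"] = top_n
    if ignore_record_details is not None:
        body["ignoreRecordDetailsInResponse"] = ignore_record_details
    if model is not None:
        body["model"] = model
    return body


body = build_rank_body(
    query="Why is the sky blue?",
    records=[
        {
            "id": "2",
            "title": "The Science of a Blue Sky",
            "content": "The sky appears blue due to a phenomenon called Rayleigh scattering.",
        }
    ],
    top_n=10,
    ignore_record_details=True,
    model="semantic-ranker-512@latest",
)
print(json.dumps(body, indent=2))
```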

Output data

The ranking API returns a ranked list of records with the following outputs:

Score: a float value between 0 and 1 that indicates relevance of the record.
ID: the unique ID of the record.
If requested, the full object: the ID, title, and content.

For example:

{
    "records": [
        {
            "id": "2",
            "score": 0.98,
            "title": "The Science of a Blue Sky",
            "content": "The sky appears blue due to a phenomenon called Rayleigh scattering. Sunlight is comprised of all the colors of the rainbow. Blue light has shorter wavelengths than other colors, and is thus scattered more easily."
        },
        {
            "id": "1",
            "score": 0.64,
            "title": "The Color of the Sky: A Poem",
            "content": "A canvas stretched across the day,\nWhere sunlight learns to dance and play.\nBlue, a hue of scattered light,\nA gentle whisper, soft and bright."
        }
    ]
}
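
One way to consume this output is to keep only the records whose score clears a relevance threshold. The sketch below parses a shortened copy of the response shown above. The helper name `records_above` and the threshold of 0.7 are arbitrary choices for illustration, not recommendations; a suitable cutoff depends on your data.

```python
import json
from typing import Any, Dict, List

# Shortened copy of the example response (content fields omitted).
response_text = """
{
    "records": [
        {"id": "2", "score": 0.98, "title": "The Science of a Blue Sky"},
        {"id": "1", "score": 0.64, "title": "The Color of the Sky: A Poem"}
    ]
}
"""


def records_above(response: Dict[str, Any], min_score: float) -> List[Dict[str, Any]]:
    """Return records scoring at least min_score, preserving ranked order."""
    return [
        record
        for record in response.get("records", [])
        if record.get("score", 0.0) >= min_score
    ]


response = json.loads(response_text)
relevant = records_above(response, min_score=0.7)
print([record["id"] for record in relevant])  # prints ['2']
```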

Rank (or rerank) a set of records according to a query

Typically, you'll supply the ranking API with a query and a set of records that are relevant to that query and have already been ranked by some other method such as a keyword search or a vector search. Then, you use the ranking API to improve the quality of the ranking and determine a score that indicates the relevance of each record to the query.

Obtain the query and resulting records. Ensure that each record has an ID and either a title, content, or both.

The model supports up to 512 tokens per record. If the combined length of the title and content is more than 512 tokens, the extra content is truncated.

Call the rankingConfigs.rank method using the following code:

REST

curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
-H "X-Goog-User-Project: PROJECT_ID" \
"https://discoveryengine.googleapis.com/v1/projects/PROJECT_ID/locations/global/rankingConfigs/default_ranking_config:rank" \
-d '{
"model": "semantic-ranker-512@latest",
"query": "QUERY",
"records": [
    {
        "id": "RECORD_ID_1",
        "title": "TITLE_1",
        "content": "CONTENT_1"
    },
    {
        "id": "RECORD_ID_2",
        "title": "TITLE_2",
        "content": "CONTENT_2"
    },
    {
        "id": "RECORD_ID_3",
        "title": "TITLE_3",
        "content": "CONTENT_3"
    }
]
}'

Replace the following:

PROJECT_ID: the ID of your Google Cloud project.
QUERY: the query against which the records are ranked and scored.
RECORD_ID_n: a unique string that identifies the record.
TITLE_n: the title of the record.
CONTENT_n: the content of the record.

For general information about this method, see rankingConfigs.rank.

Example curl command and response:

    curl -X POST \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    -H "Content-Type: application/json" \
    -H "X-Goog-User-Project: my-project-123" \
    "https://discoveryengine.googleapis.com/v1/projects/my-project-123/locations/global/rankingConfigs/default_ranking_config:rank" \
    -d '{
        "model": "semantic-ranker-512@latest",
        "query": "what is Google gemini?",
        "records": [
            {
                "id": "1",
                "title": "Gemini",
                "content": "The Gemini zodiac symbol often depicts two figures standing side-by-side."
            },
            {
                "id": "2",
                "title": "Gemini",
                "content": "Gemini is a cutting edge large language model created by Google."
            },
            {
                "id": "3",
                "title": "Gemini Constellation",
                "content": "Gemini is a constellation that can be seen in the night sky."
            }
        ]
    }'

{
    "records": [
        {
            "id": "2",
            "title": "Gemini",
            "content": "Gemini is a cutting edge large language model created by Google.",
            "score": 0.97
        },
        {
            "id": "3",
            "title": "Gemini Constellation",
            "content": "Gemini is a constellation that can be seen in the night sky.",
            "score": 0.18
        },
        {
            "id": "1",
            "title": "Gemini",
            "content": "The Gemini zodiac symbol often depicts two figures standing side-by-side.",
            "score": 0.05
        }
    ]
}
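
If you prefer to issue the same REST call from a script without the client library, the request can be built with the Python standard library. The following is a sketch under stated assumptions: the helper name `build_rank_http_request` is hypothetical, and the access token is passed in as a plain string that you obtain yourself (for example, from `gcloud auth print-access-token`). The URL and headers mirror the curl command above. The example only constructs the request; the lines that send it are commented out because they need a valid token and network access.

```python
import json
import urllib.request
from typing import Any, Dict


def build_rank_http_request(
    project_id: str, access_token: str, body: Dict[str, Any]
) -> urllib.request.Request:
    """Build (but do not send) a POST request to the rank endpoint."""
    url = (
        "https://discoveryengine.googleapis.com/v1/"
        f"projects/{project_id}/locations/global/"
        "rankingConfigs/default_ranking_config:rank"
    )
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
            "X-Goog-User-Project": project_id,
        },
        method="POST",
    )


http_request = build_rank_http_request(
    project_id="my-project-123",
    access_token="ACCESS_TOKEN",  # Placeholder; supply a real token.
    body={
        "query": "what is Google gemini?",
        "records": [
            {
                "id": "2",
                "title": "Gemini",
                "content": "Gemini is a cutting edge large language model created by Google.",
            }
        ],
    },
)
print(http_request.full_url)

# To send the request (requires a valid token and network access):
# with urllib.request.urlopen(http_request) as http_response:
#     ranked = json.load(http_response)
```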

Python

For more information, see the Vertex AI Agent Builder Python API reference documentation.

To authenticate to Vertex AI Agent Builder, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

from google.cloud import discoveryengine_v1alpha as discoveryengine

# TODO(developer): Replace with your Google Cloud project ID before running the sample.
project_id = "YOUR_PROJECT_ID"

client = discoveryengine.RankServiceClient()

# The full resource name of the ranking config.
# Format: projects/{project_id}/locations/{location}/rankingConfigs/default_ranking_config
ranking_config = client.ranking_config_path(
    project=project_id,
    location="global",
    ranking_config="default_ranking_config",
)
request = discoveryengine.RankRequest(
    ranking_config=ranking_config,
    model="semantic-ranker-512@latest",
    top_n=10,
    query="What is Google Gemini?",
    records=[
        discoveryengine.RankingRecord(
            id="1",
            title="Gemini",
            content="The Gemini zodiac symbol often depicts two figures standing side-by-side.",
        ),
        discoveryengine.RankingRecord(
            id="2",
            title="Gemini",
            content="Gemini is a cutting edge large language model created by Google.",
        ),
        discoveryengine.RankingRecord(
            id="3",
            title="Gemini Constellation",
            content="Gemini is a constellation that can be seen in the night sky.",
        ),
    ],
)

response = client.rank(request=request)

# Handle the response
print(response)
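
Rather than printing the whole response, you will often want to walk the ranked records. The sketch below assumes the response exposes a `records` sequence whose items carry `id` and `score`, matching the output format described earlier. It uses a `SimpleNamespace` stand-in instead of a real response so that it runs without calling the API. The helper name `attach_scores` is hypothetical. Joining scores back to local data by ID is mainly useful when the request sets ignoreRecordDetailsInResponse, so that titles and content are not returned.

```python
from types import SimpleNamespace
from typing import Any, Dict, List, Tuple


def attach_scores(
    response: Any, local_records: Dict[str, Dict[str, str]]
) -> List[Tuple[float, Dict[str, str]]]:
    """Return (score, local_record) pairs in the order the API ranked them.

    Expects response.records items to expose .id and .score, and
    local_records to be a dict keyed by record ID.
    """
    return [
        (ranked.score, local_records[ranked.id]) for ranked in response.records
    ]


# Stand-in for a real response so the sketch runs without calling the API.
fake_response = SimpleNamespace(
    records=[
        SimpleNamespace(id="2", score=0.97),
        SimpleNamespace(id="3", score=0.18),
    ]
)
local_records = {
    "2": {"title": "Gemini"},
    "3": {"title": "Gemini Constellation"},
}

for score, record in attach_scores(fake_response, local_records):
    print(f"{score:.2f}  {record['title']}")
```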

Supported models

The following models are available.

| Model name | Latest model (`semantic-ranker-512@latest`) | Input | Context window (tokens) | Release date | Discontinuation date |
| --- | --- | --- | --- | --- | --- |
| `semantic-ranker-512-003` | Yes | Text (25 languages) | 512 | September 10, 2024 | To be determined |
| `semantic-ranker-512-002` | No | Text (en only) | 512 | June 3, 2024 | To be determined |

What's next

Learn how to use the ranking method with other RAG APIs to generate grounded answers from unstructured data.