
Model Garden Gemma Deployment on Vertex - incomplete documentation about prediction response format #2799

Closed
afirstenberg opened this issue Mar 25, 2024 · 3 comments
@afirstenberg

Environment

  • Deployed a gemma-7b-it model on Vertex AI Model Garden using the "Deploy" button from the Gemma card. No additional tuning was done.
  • I have an instance running on a g2-standard-12 machine with an L4 GPU. It is visible in the Online Prediction section of my Cloud Console.
  • I am able to reach the endpoint without any issues.

Unable to find any good documentation on what needs to be sent to the model and what comes back, I used the "Model Garden Gemma Deployment on Vertex" notebook to try to get an idea. It does provide an example of what to send as the prompt.
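For reference (this is not the notebook's exact cell; the client code and the parameters other than prompt are illustrative assumptions based on the Vertex AI SDK), a request to the deployed endpoint looks roughly like this:

from google.cloud import aiplatform

# Assumed endpoint reference; substitute your own project, region, and endpoint ID.
endpoint = aiplatform.Endpoint(
    "projects/PROJECT_ID/locations/us-central1/endpoints/ENDPOINT_ID"
)

# "prompt" is the field shown in the notebook; the sampling parameters are assumptions.
response = endpoint.predict(
    instances=[
        {
            "prompt": "What is a car?",
            "max_tokens": 50,
            "temperature": 1.0,
        }
    ]
)
print(response.predictions[0])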

However, it does not indicate what to expect in the reply. In particular, it wasn't clear that the reply string includes the original prompt, which needs to be parsed out:

{
  "predictions": [
    "Prompt:\nWhat is a car?\nOutput:\nA car is a motor vehicle that is propelled by gasoline. It has four wheels, a steering wheel, and a seat."
  ],
  "deployedModelId": "xxx",
  "model": "projects/111/locations/us-central1/models/gemma-7b-it-google",
  "modelDisplayName": "gemma-7b-it-google",
  "modelVersionId": "1"
}

The documentation should make clear what the output will be.
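In the meantime, a minimal parsing sketch (assuming the "Prompt:\n{prompt}\nOutput:\n{output}" formatting shown in the example response above) is something like:

# Strip the echoed prompt from a prediction string.
# Assumes the "Prompt:\n{prompt}\nOutput:\n{output}" format shown above.
def extract_output(prediction: str) -> str:
    marker = "\nOutput:\n"
    idx = prediction.find(marker)
    return prediction[idx + len(marker):] if idx != -1 else prediction

output = extract_output(response.predictions[0])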

@gericdong
Contributor

@kathyyu-google: could you please assist? Thanks.

kathyyu-google self-assigned this Sep 12, 2024
@kathyyu-google
Collaborator

Thank you @afirstenberg for this feedback. We will work on clarifying the prediction response format. The example response listed above includes the original prompt ("Prompt: ... Output: ..."). You can control whether the response goes through extra formatting by setting raw_response to True or False in the request.
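For example (a sketch only, assuming raw_response is passed as a field on each instance, and reusing the endpoint object from the request example above):

# Request the raw model output instead of the "Prompt:\n...\nOutput:\n..." formatting.
response = endpoint.predict(
    instances=[
        {
            "prompt": "What is a car?",
            "max_tokens": 50,
            "raw_response": True,  # False applies the extra formatting shown above
        }
    ]
)
print(response.predictions[0])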

@kathyyu-google
Collaborator

We have updated the notebook and clarified the prediction response format with the following instructions: "Set raw_response to True to obtain the raw model output. Set raw_response to False to apply additional formatting in the structure of "Prompt:\n{prompt.strip()}\nOutput:\n{output}"."

Marking the issue as closed as the question has been addressed in the notebook. Please reopen if there are any further questions, thank you!
