Chat: google/gemma-7b-it

Changes to Anyscale Endpoints API

Effective August 1, 2024, the Anyscale Endpoints API will be available exclusively through the fully hosted Anyscale Platform. Multi-tenant access to LLM models will be removed.

With the Hosted Anyscale Platform, you can access the latest GPUs billed by the second, and deploy models on your own dedicated instances. Enjoy full customization to build your end-to-end applications with Anyscale. Get started today.

Info: See the Hugging Face model page for more model details.

About this model

Model name to use in API calls:

google/gemma-7b-it

Gemma is a family of lightweight, state-of-the-art open models from Google, built from the same research and technology used to create the Gemini models. They're text-to-text, decoder-only large language models, available in English, with open weights, pre-trained variants, and instruction-tuned variants. Gemma models are well-suited for a variety of text generation tasks, including question answering, summarization, and reasoning. Their relatively small size makes it possible to deploy them in environments with limited resources such as a laptop, desktop, or your own cloud infrastructure, democratizing access to state of the art AI models and helping foster innovation for everyone.

Model Developers: Google

Variations: This model is the 7B instruction-tuned version of Gemma. Other variants include the 2B base model, the 7B base model, and the 2B instruction-tuned model.

Input: text only.

Output: generated text only.

Context Length: 8192 tokens

License: a custom commercial license is available at https://ai.google.dev/gemma/terms.

Get started

  1. Register an account. (If you are viewing this from Anyscale Endpoints, you can skip this step.)
  2. Generate an API key.
  3. Run your first query:

See Query a model for more model query details.

import openai

query = "Write a program to load data from S3 with Ray and train using PyTorch."

client = openai.OpenAI(
    base_url="https://api.endpoints.anyscale.com/v1",
    api_key="esecret_ANYSCALE_API_KEY",
)

# Note: not all arguments are currently supported; unsupported ones are ignored by the backend.
chat_completion = client.chat.completions.create(
    model="google/gemma-7b-it",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": query},
    ],
    temperature=0.1,
    stream=True,
)

# Print each streamed token as it arrives. The final chunk's delta may have
# no content, so guard against printing a literal "None".
for message in chat_completion:
    if message.choices[0].delta.content is not None:
        print(message.choices[0].delta.content, end="", flush=True)
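The loop above prints each streamed delta as it arrives. If you instead want the full reply as a single string, the chunks can be accumulated. Below is a minimal sketch: collect_stream is a hypothetical helper (not part of the OpenAI client), and the SimpleNamespace objects merely stand in for the chunk objects the client yields when stream=True.

```python
from types import SimpleNamespace

def collect_stream(chunks):
    """Join the text deltas from a stream of chat-completion chunks."""
    parts = []
    for chunk in chunks:
        delta = chunk.choices[0].delta.content
        # The final chunk's delta content is typically None, so skip it.
        if delta is not None:
            parts.append(delta)
    return "".join(parts)

# Simulated chunks standing in for what create(..., stream=True) yields:
fake_chunks = [
    SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content=c))])
    for c in ["Hello", ", ", "world", None]
]
print(collect_stream(fake_chunks))  # Hello, world
```

In real use you would pass the chat_completion iterator from the earlier example directly to collect_stream instead of the simulated list.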