The Gemini API provides safety settings that you can adjust during the prototyping stage to determine if your application requires a more or less restrictive safety configuration. You can adjust these settings across five filter categories to restrict or allow certain types of content.
This guide covers how the Gemini API handles safety settings and filtering and how you can change the safety settings for your application.
Safety filters
The Gemini API's adjustable safety filters cover the following categories:
Category | Description |
---|---|
Harassment | Negative or harmful comments targeting identity and/or protected attributes. |
Hate speech | Content that is rude, disrespectful, or profane. |
Sexually explicit | Contains references to sexual acts or other lewd content. |
Dangerous | Promotes, facilitates, or encourages harmful acts. |
Civic integrity | Election-related queries. |
You can use these filters to adjust what's appropriate for your use case. For example, if you're building video game dialogue, you may deem it acceptable to allow more content that's rated as Dangerous due to the nature of the game.
In addition to the adjustable safety filters, the Gemini API has built-in protections against core harms, such as content that endangers child safety. These types of harm are always blocked and cannot be adjusted.
Content safety filtering level
The Gemini API categorizes the probability level of content being unsafe as HIGH, MEDIUM, LOW, or NEGLIGIBLE.
The Gemini API blocks content based on the probability of content being unsafe, not the severity. This is important to consider because some content can have a low probability of being unsafe even though the severity of harm could still be high. For example, compare the following sentences:
- The robot punched me.
- The robot slashed me up.
The first sentence might result in a higher probability of being unsafe, but you might consider the second sentence to be of higher severity in terms of violence. Given this, it is important that you carefully test and decide what level of blocking is needed to support your key use cases while minimizing harm to end users.
Safety filtering per request
You can adjust the safety settings for each request you make to the API. When you make a request, the content is analyzed and assigned a safety rating. The safety rating includes the category and the probability of the harm classification. For example, if the content was blocked due to the harassment category having a high probability, the safety rating returned would have category equal to HARASSMENT and harm probability set to HIGH.
By default, safety settings block content (including prompts) with a medium or higher probability of being unsafe across any filter. This baseline safety is designed to work for most use cases, so you should only adjust your safety settings if your application consistently requires it.
The following table describes the block settings you can adjust for each category. For example, if you set the block setting to Block few for the Hate speech category, everything that has a high probability of being hate speech content is blocked. But anything with a lower probability is allowed.
Threshold (Google AI Studio) | Threshold (API) | Description |
---|---|---|
Block none | BLOCK_NONE | Always show regardless of probability of unsafe content |
Block few | BLOCK_ONLY_HIGH | Block when high probability of unsafe content |
Block some | BLOCK_MEDIUM_AND_ABOVE | Block when medium or high probability of unsafe content |
Block most | BLOCK_LOW_AND_ABOVE | Block when low, medium or high probability of unsafe content |
N/A | HARM_BLOCK_THRESHOLD_UNSPECIFIED | Threshold is unspecified; block using default threshold |
If the threshold is not set, the default block threshold is Block most (for gemini-1.5-pro-002 and gemini-1.5-flash-002 only) or Block some (for all other models) for all categories except the Civic integrity category.
The default block threshold for the Civic integrity category is Block most when sending prompts using Google AI Studio, and Block none when using the Gemini API directly.
You can set these settings for each request you make to the generative service. See the HarmBlockThreshold API reference for details.
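As a quick illustration using the Python SDK (a minimal sketch: the prompt is a placeholder and genai.configure(api_key=...) is assumed to have been called), the following applies the Block few (BLOCK_ONLY_HIGH) threshold to the Hate speech category for a single request, so only content with a high probability of being hate speech is blocked:

import google.generativeai as genai
from google.generativeai.types import HarmCategory, HarmBlockThreshold

model = genai.GenerativeModel(model_name='gemini-1.5-flash')

# Override the Hate speech threshold for this request only; all other
# categories keep their default block thresholds.
response = model.generate_content(
    'Write a playful chant for a sports rivalry.',  # placeholder prompt
    safety_settings={
        HarmCategory.HARM_CATEGORY_HATE_SPEECH: HarmBlockThreshold.BLOCK_ONLY_HIGH,
    },
)

The equivalent settings for other languages are shown in the Gemini API SDKs section below.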
Safety feedback
generateContent returns a GenerateContentResponse which includes safety feedback.
Prompt feedback is included in promptFeedback. If promptFeedback.blockReason is set, then the content of the prompt was blocked.
Response candidate feedback is included in Candidate.finishReason and Candidate.safetyRatings. If response content was blocked and the finishReason was SAFETY, you can inspect safetyRatings for more details. The content that was blocked is not returned.
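For example, with the Python SDK (a minimal sketch: the prompt is a placeholder and genai.configure(api_key=...) is assumed to have been called), you can check whether the prompt was blocked and, if a candidate was returned, inspect its finish reason and safety ratings:

import google.generativeai as genai

model = genai.GenerativeModel(model_name='gemini-1.5-flash')
response = model.generate_content('A placeholder prompt to test safety filtering.')

if response.prompt_feedback.block_reason:
    # The prompt itself was blocked; no candidates are returned.
    print('Prompt blocked:', response.prompt_feedback.block_reason)
else:
    candidate = response.candidates[0]
    # finish_reason is SAFETY when the response content was blocked.
    print('Finish reason:', candidate.finish_reason)
    for rating in candidate.safety_ratings:
        print(rating.category, rating.probability)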
Adjust safety settings
This section covers how to adjust the safety settings in both Google AI Studio and in your code.
Google AI Studio
You can adjust safety settings in Google AI Studio, but you cannot turn them off.
Click Edit safety settings in the Run settings panel to open the Run safety settings modal. In the modal, you can use the sliders to adjust the content filtering level per safety category.
When you send a request (for example, by asking the model a question), a No Content message appears if the request's content is blocked. To see more details, hold the pointer over the No Content text and click Safety.
Gemini API SDKs
The following code snippet shows how to set safety settings in your GenerateContent call. This sets the thresholds for the harassment (HARM_CATEGORY_HARASSMENT) and hate speech (HARM_CATEGORY_HATE_SPEECH) categories. For example, setting these categories to BLOCK_LOW_AND_ABOVE blocks any content that has a low or higher probability of being harassment or hate speech. To understand the threshold settings, see Safety filtering per request.
Python
import google.generativeai as genai
from google.generativeai.types import HarmCategory, HarmBlockThreshold

model = genai.GenerativeModel(model_name='gemini-1.5-flash')
# img is an image loaded earlier, for example with PIL.Image.open()
response = model.generate_content(
    ['Do these look store-bought or homemade?', img],
    safety_settings={
        HarmCategory.HARM_CATEGORY_HATE_SPEECH: HarmBlockThreshold.BLOCK_LOW_AND_ABOVE,
        HarmCategory.HARM_CATEGORY_HARASSMENT: HarmBlockThreshold.BLOCK_LOW_AND_ABOVE,
    }
)
Go
// client is a *genai.Client created earlier with genai.NewClient
model := client.GenerativeModel("gemini-1.5-flash")
model.SafetySettings = []*genai.SafetySetting{
    {
        Category:  genai.HarmCategoryHarassment,
        Threshold: genai.HarmBlockLowAndAbove,
    },
    {
        Category:  genai.HarmCategoryHateSpeech,
        Threshold: genai.HarmBlockLowAndAbove,
    },
}
Node.js
import { HarmBlockThreshold, HarmCategory } from "@google/generative-ai";
// ...
const safetySettings = [
  {
    category: HarmCategory.HARM_CATEGORY_HARASSMENT,
    threshold: HarmBlockThreshold.BLOCK_LOW_AND_ABOVE,
  },
  {
    category: HarmCategory.HARM_CATEGORY_HATE_SPEECH,
    threshold: HarmBlockThreshold.BLOCK_LOW_AND_ABOVE,
  },
];
const model = genAi.getGenerativeModel({ model: "gemini-1.5-flash", safetySettings: safetySettings });
Web
import { HarmBlockThreshold, HarmCategory } from "@google/generative-ai";
// ...
const safetySettings = [
  {
    category: HarmCategory.HARM_CATEGORY_HARASSMENT,
    threshold: HarmBlockThreshold.BLOCK_LOW_AND_ABOVE,
  },
  {
    category: HarmCategory.HARM_CATEGORY_HATE_SPEECH,
    threshold: HarmBlockThreshold.BLOCK_LOW_AND_ABOVE,
  },
];
const model = genAi.getGenerativeModel({ model: "gemini-1.5-flash", safetySettings });
Dart (Flutter)
final safetySettings = [
  SafetySetting(HarmCategory.harassment, HarmBlockThreshold.low),
  SafetySetting(HarmCategory.hateSpeech, HarmBlockThreshold.low),
];
final model = GenerativeModel(
  model: 'gemini-1.5-flash',
  apiKey: apiKey,
  safetySettings: safetySettings,
);
Kotlin
val harassmentSafety = SafetySetting(HarmCategory.HARASSMENT, BlockThreshold.LOW_AND_ABOVE)
val hateSpeechSafety = SafetySetting(HarmCategory.HATE_SPEECH, BlockThreshold.LOW_AND_ABOVE)
val generativeModel = GenerativeModel(
    modelName = "gemini-1.5-flash",
    apiKey = BuildConfig.apiKey,
    safetySettings = listOf(harassmentSafety, hateSpeechSafety)
)
Java
SafetySetting harassmentSafety = new SafetySetting(HarmCategory.HARASSMENT,
    BlockThreshold.LOW_AND_ABOVE);
SafetySetting hateSpeechSafety = new SafetySetting(HarmCategory.HATE_SPEECH,
    BlockThreshold.LOW_AND_ABOVE);
GenerativeModel gm = new GenerativeModel(
    "gemini-1.5-flash",
    BuildConfig.apiKey,
    null, // generation config is optional
    Arrays.asList(harassmentSafety, hateSpeechSafety)
);
GenerativeModelFutures model = GenerativeModelFutures.from(gm);
REST
echo '{
  "safetySettings": [
    {"category": "HARM_CATEGORY_HARASSMENT", "threshold": "BLOCK_ONLY_HIGH"},
    {"category": "HARM_CATEGORY_HATE_SPEECH", "threshold": "BLOCK_MEDIUM_AND_ABOVE"}
  ],
  "contents": [{
    "parts": [{
      "text": "I support Martians Soccer Club and I think Jupiterians Football Club sucks! Write an ironic phrase about them."}]}]}' > request.json

curl "https://generativelanguage.googleapis.com/v1beta/models/gemini-1.5-flash:generateContent?key=$GOOGLE_API_KEY" \
  -H 'Content-Type: application/json' \
  -X POST \
  -d @request.json 2> /dev/null
Next steps
- See the API reference to learn more about the full API.
- Review the safety guidance for a general look at safety considerations when developing with LLMs.
- Learn more about assessing probability versus severity from the Jigsaw team.
- Learn more about the products that contribute to safety solutions, like the Perspective API.
- You can use these safety settings to create a toxicity classifier. See the classification example to get started.