A safety policy is the prompt text the Safety & Responsible AI analyzer feeds to ShieldGemma when it acts as the LLM-as-a-judge. It tells the model which categories are unsafe and how to reason about edge cases. Agnes ships a sensible default; custom safety policies are how you adapt to industry-specific guidelines (medical advice safety, educational content, professional tone).
Default safety policy
The shipped default covers seven categories:

- No Harmful Content — violence, self-harm, harmful medical practices, dangerous or illegal activities.
- No Hate Speech or Discrimination — protected-class discrimination, dehumanization.
- No Harassment or Bullying — threats, intimidation, doxxing.
- No Sexual Content — explicit content, sexualization of minors, sexual exploitation.
- No Misinformation — harmful conspiracy theories, medical misinformation, election misinformation, public-safety misinformation.
- No Illegal Content — illegal activity, copyright infringement, fraud, illegal goods/services.
- No Privacy Violations — sharing personal information without consent, doxxing, privacy-rights violations.
This policy ships at agnes.lasscyber.com/protection/safety-moderation and is the default for every new tenant.
Authoring a custom safety policy
In the dashboard:

- Click New safety policy.
- Give it a name and description.
- Author the policy_content markdown. Treat it as a system prompt for a judge model. The structure that works best ends with a clear instruction to answer Yes/No plus a confidence score; Agnes parses both and exposes the per-category score in max_violation_score and violation_category_count.
- Optionally mark as default. The default policy is used when no policy_id is set on the safety analyzer.
- Save.
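A hypothetical policy_content skeleton in that spirit (the wording and headings here are illustrative, not the shipped default; copying the default and editing it is usually the better starting point):

```markdown
You are a safety judge. Read the user text and decide whether it
violates the category below. Answer Yes or No, followed by a
confidence score between 0 and 1.

# Hate Speech
Protected-class discrimination or dehumanization.

# Dangerous Content
Violence, self-harm, or instructions for dangerous activities.
```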
What “category” means here
ShieldGemma is binary per category: each category gets its own forward pass and its own Yes/No verdict. The category names you use in your policy markdown should match the output match hints exactly when you want to terminate on a specific category. The default policy uses headings that line up with the shipped match hints (Dangerous Content, Harassment, Hate Speech, Sexually Explicit). If you invent new category names, make sure your termination rules match the new strings.
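A quick local sanity check along these lines can catch heading/hint drift before you save a policy. This is a hypothetical helper, not part of Agnes: the shipped match hints are the four named above, and the assumption that categories appear as markdown headings mirrors the default policy's structure.

```python
import re

# Shipped match hints named above; your termination rules may use others.
MATCH_HINTS = {"Dangerous Content", "Harassment", "Hate Speech", "Sexually Explicit"}

def policy_headings(policy_md: str) -> set[str]:
    """Extract category names from markdown headings in policy_content."""
    return {m.group(1).strip() for m in re.finditer(r"^#+\s*(.+)$", policy_md, re.MULTILINE)}

def unmatched_categories(policy_md: str, hints: set[str] = MATCH_HINTS) -> set[str]:
    """Headings that no shipped match hint will ever fire on."""
    return policy_headings(policy_md) - hints

policy = """# Hate Speech
No slurs or dehumanization.

# Professional Conduct
No personal-life questions.
"""
print(unmatched_categories(policy))  # {'Professional Conduct'}
```

Any category name this reports needs a corresponding custom termination rule, or violations of it will never terminate a session.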
Picking a model
Larger ShieldGemma models tolerate longer policies and longer inputs:

| Model | Prompt char limit |
|---|---|
| google/shieldgemma-2b | 8K |
| google/shieldgemma-9b | 16K |
| google/shieldgemma-27b | 32K |
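Since the policy and the analyzed text share one prompt budget, a sketch like the following can pick the smallest model that fits. This is an illustrative helper, not an Agnes API; it also assumes the table's "K" values mean multiples of 1024 characters, which is a guess.

```python
# Prompt character limits from the table above, read as K = 1024 chars.
LIMITS = {
    "google/shieldgemma-2b": 8 * 1024,
    "google/shieldgemma-9b": 16 * 1024,
    "google/shieldgemma-27b": 32 * 1024,
}

def smallest_fitting_model(policy_chars: int, input_chars: int) -> str:
    """Pick the cheapest model whose prompt limit fits policy + input."""
    total = policy_chars + input_chars
    for model, limit in sorted(LIMITS.items(), key=lambda kv: kv[1]):
        if total <= limit:
            return model
    raise ValueError(f"{total} chars exceeds every model's prompt limit")

print(smallest_fitting_model(6_000, 1_500))   # google/shieldgemma-2b
print(smallest_fitting_model(12_000, 3_000))  # google/shieldgemma-9b
```

In practice, leave headroom below the limit rather than filling it exactly: a policy that barely fits the 2b limit leaves little room for long inputs.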
Worked example: a strict policy
If the default ships too permissive for your audience, a stricter custom policy might:

- Tighten Misinformation to flag medical claims even when contextual.
- Add a category Professional Conduct that flags personal-life questions in a workplace assistant.
- Drop Sexual Content if your domain is fully family-friendly (the default would still catch it via Sexually Explicit).
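The Professional Conduct addition might look like this in policy_content (illustrative wording, not shipped text):

```markdown
# Professional Conduct
The assistant is a workplace tool. Flag questions about the user's or
anyone else's personal life, romantic relationships, or off-duty conduct.
```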
Wiring it into a combined policy
The safety analyzer takes a policy_id parameter:
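A hypothetical combined-policy fragment; only policy_id is documented here, so the surrounding field names and the sp_… identifier are illustrative:

```json
{
  "analyzers": [
    {
      "type": "safety",
      "policy_id": "sp_medical_strict"
    }
  ]
}
```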
Permissions
| Role | Read | Create / update | Delete |
|---|---|---|---|
| Owner | Yes | Yes | Yes |
| Admin | Yes | Yes | Yes |
| Member | Yes | Yes | Yes |
| Viewer | Yes | No | No |
These actions map to the safety_policy:* permission scope.
Authoring tips
- Copy the default and edit the categories that matter to you; do not start from scratch unless you know what you are doing.
- Lead with the verdict format. ShieldGemma needs a clear instruction to answer Yes/No; if your prose buries that instruction, the model can ramble and the parser will fall back to uncertain.
- Keep categories small and orthogonal. Five overlapping categories will fight each other; three orthogonal categories produce cleaner verdicts.
- Test with the Analyzer Labs page. The dashboard's Analyzer Labs page lets you feed sample text through a single analyzer with a chosen safety policy; iterate on the policy text until verdicts match expectations before promoting it.
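To see why burying the verdict matters, here is a minimal sketch of the kind of parsing a judge pipeline has to do. Agnes's real parser is not documented here; this hypothetical version shows how a rambling reply degrades to an uncertain verdict.

```python
import re

def parse_verdict(raw: str) -> tuple[str, float]:
    """Parse a judge reply like 'Yes, 0.92' or 'No'.

    Falls back to ('uncertain', 0.0) when the reply does not lead with
    a Yes/No verdict, mirroring the fallback behaviour described above.
    (Hypothetical parser, not Agnes's actual implementation.)
    """
    m = re.match(r"\s*(Yes|No)\b(?:\D*(\d*\.?\d+))?", raw, re.IGNORECASE)
    if not m:
        return ("uncertain", 0.0)
    verdict = m.group(1).capitalize()
    score = float(m.group(2)) if m.group(2) else 1.0
    return (verdict, score)

print(parse_verdict("Yes, 0.92"))           # ('Yes', 0.92)
print(parse_verdict("Well, it depends..."))  # ('uncertain', 0.0)
```

A policy that demands the verdict first makes the happy path the common one.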
Next
- Safety analyzer — runtime metrics, termination signals, model selection.
- Combined analyzer — wire safety into termination rules.