# Safety policies

> Author the policy text ShieldGemma uses as a judge in the Safety & Responsible AI analyzer.

A **safety policy** is the *prompt text* the
[Safety & Responsible AI analyzer](/analyzers/safe-responsible-ai)
feeds to ShieldGemma when it acts as the LLM-as-a-judge. It tells the
model which categories are unsafe and how to reason about edge cases.

Agnes ships a sensible default; custom safety policies are how you
adapt to industry-specific guidelines (medical advice safety,
educational content, professional tone).

## Default safety policy

The shipped default covers seven categories:

1. **No Harmful Content** — violence, self-harm, harmful medical
   practices, dangerous or illegal activities.
2. **No Hate Speech or Discrimination** — protected-class
   discrimination, dehumanization.
3. **No Harassment or Bullying** — threats, intimidation, doxxing.
4. **No Sexual Content** — explicit content, sexualization of minors,
   sexual exploitation.
5. **No Misinformation** — harmful conspiracy theories, medical
   misinformation, election misinformation, public-safety
   misinformation.
6. **No Illegal Content** — illegal activity, copyright infringement,
   fraud, illegal goods/services.
7. **No Privacy Violations** — sharing personal information without
   consent, doxxing, privacy-rights violations.

The full text lives in the dashboard at
[`agnes.lasscyber.com/protection/safety-moderation`](https://agnes.lasscyber.com/protection/safety-moderation)
and is the default for every new tenant.

## Authoring a custom safety policy

In the dashboard:

1. Click **New safety policy**.

2. Give it a name and description.

3. Author the `policy_content` markdown. Treat it as a *system prompt
   for a judge model*. The structure that works best:

   ```markdown
   # <Safety policy name>

   ## Core Principles

   ### 1. <Category name>
   - <bulleted guideline>
   - <bulleted guideline>

   ### 2. <Category name>
   - …

   ## Content Assessment Guidelines

   When evaluating content, consider:
   1. The intent and context of the content
   2. The potential impact on individuals or groups
   3. Whether the content promotes or enables harmful behaviour

   ## Response Format

   The model should:
   1. Start with a clear "Yes" or "No" answer
   2. Explain which specific principles are relevant
   3. Provide a step-by-step analysis
   ```

4. Optionally mark as **default**. The default policy is used when no
   `policy_id` is set on the safety analyzer.

5. Save.

The analyzer constructs a per-category prompt at runtime, wrapping
your policy text in the ShieldGemma chat template:

```
<start_of_turn>user
You are a policy expert ...

<your policy_content here>

<the input under evaluation>

Does the input violate the above policy?
<end_of_turn>
<start_of_turn>model
```
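As a rough illustration of the wrapping shown above, the following Python sketch assembles one prompt from its parts. The function name `build_judge_prompt` and its parameters are hypothetical, the preamble is passed in as an argument because the real instruction text is elided above, and the exact whitespace Agnes uses between sections is an assumption.

```python
def build_judge_prompt(preamble: str, policy_content: str, input_text: str) -> str:
    """Wrap policy text and input in ShieldGemma-style turn markers (illustrative only)."""
    body = "\n\n".join([
        preamble.strip(),
        policy_content.strip(),
        input_text.strip(),
        "Does the input violate the above policy?",
    ])
    # The trailing model turn is left open so the judge writes its verdict next.
    return f"<start_of_turn>user\n{body}\n<end_of_turn>\n<start_of_turn>model\n"


prompt = build_judge_prompt(
    preamble="You are a policy expert ...",  # elided in the docs; kept as-is here
    policy_content="### 1. Harassment\n- No threats or intimidation",
    input_text="Example input to evaluate",
)
print(prompt)
```

Printing the result is a convenient way to see how much of the prompt budget your policy text consumes before any input is added.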

ShieldGemma replies with a `Yes`/`No` plus a confidence score. Agnes
parses both for each category and summarizes the per-category results
in `max_violation_score` and `violation_category_count`.
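To make the roll-up concrete, here is a minimal sketch of how per-category verdicts could feed the two metrics. Only the metric names come from the analyzer; the `CategoryVerdict` type, the reading of `max_violation_score` as the highest per-category score, and the reading of `violation_category_count` as the number of `Yes` verdicts are assumptions, not the documented implementation.

```python
from dataclasses import dataclass


@dataclass
class CategoryVerdict:
    category: str   # category name, e.g. "Harassment"
    violates: bool  # parsed from the model's Yes/No answer
    score: float    # confidence that the input violates the category


def summarize(verdicts: list[CategoryVerdict]) -> dict:
    """Roll per-category verdicts up into aggregate metrics (assumed semantics)."""
    return {
        "max_violation_score": max((v.score for v in verdicts), default=0.0),
        "violation_category_count": sum(1 for v in verdicts if v.violates),
    }


verdicts = [
    CategoryVerdict("Harassment", True, 0.91),
    CategoryVerdict("Hate Speech", False, 0.12),
    CategoryVerdict("Dangerous Content", False, 0.05),
]
summary = summarize(verdicts)
print(summary)
```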

## What "category" means here

ShieldGemma is *binary per category*: each category gets its own
forward pass and its own `Yes`/`No` verdict. The category names you
use in your policy markdown should match the
[output match hints](/analyzers/safe-responsible-ai#termination-signals)
exactly when you want to terminate on a specific category. The default
policy uses headings that line up with the shipped match hints
(`Dangerous Content`, `Harassment`, `Hate Speech`, `Sexually Explicit`).

If you invent new category names, make sure your termination rules
match the new strings.
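One way to catch mismatches before saving is to extract the category headings from the policy markdown and compare them with the strings your termination rules expect. The sketch below assumes categories are written as `### <n>. <Category name>` headings, as in the authoring template; the helper names are hypothetical and the check runs entirely on your side, not inside Agnes.

```python
import re

# Matches headings such as "### 1. Harassment" and captures the category name.
CATEGORY_HEADING = re.compile(
    r"^[ \t]*###[ \t]+\d+\.[ \t]+(.+?)[ \t]*$", re.MULTILINE
)


def category_names(policy_content: str) -> list[str]:
    """Return category names in the order they appear in the policy."""
    return CATEGORY_HEADING.findall(policy_content)


def unmatched_hints(policy_content: str, match_hints: set[str]) -> set[str]:
    """Return hints that no category heading matches exactly."""
    return match_hints - set(category_names(policy_content))


policy = """# Example policy

## Core Principles

### 1. Harassment
- No threats or intimidation

### 2. Professional Conduct
- No personal-life questions
"""

hints = {"Harassment", "Hate Speech"}
print(category_names(policy))
print(unmatched_hints(policy, hints))
```

A non-empty result from `unmatched_hints` suggests a termination rule that refers to a category the policy never defines under that exact name.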

## Picking a model

Larger ShieldGemma models tolerate longer policies and longer inputs:

| Model                    | Prompt char limit |
| ------------------------ | ----------------- |
| `google/shieldgemma-2b`  | 8K                |
| `google/shieldgemma-9b`  | 16K               |
| `google/shieldgemma-27b` | 32K               |

If your policy text is long (e.g. detailed industry guidelines), it
may exceed the 2B model's character limit, in which case you need to
move up to the 9B model.
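A quick length check can tell you which model a draft fits. The sketch below assumes that 8K, 16K, and 32K mean 8,000, 16,000, and 32,000 characters and that the limit covers policy text plus input; both are assumptions, as are the function name and the example input allowance, so confirm the exact accounting against the analyzer reference.

```python
# Prompt character limits from the table; "K" is assumed to mean 1,000.
PROMPT_CHAR_LIMITS = {
    "google/shieldgemma-2b": 8_000,
    "google/shieldgemma-9b": 16_000,
    "google/shieldgemma-27b": 32_000,
}


def smallest_model_that_fits(policy_content: str, max_input_chars: int):
    """Return the smallest model whose limit covers policy plus input, or None."""
    needed = len(policy_content) + max_input_chars
    for model_id, limit in sorted(PROMPT_CHAR_LIMITS.items(), key=lambda kv: kv[1]):
        if needed <= limit:
            return model_id
    return None  # nothing fits; shorten the policy or the input


draft_policy = "x" * 7_500  # stand-in for a long policy
print(smallest_model_that_fits(draft_policy, max_input_chars=2_000))
```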

## Worked example: a strict policy

If the shipped default is too permissive for your audience, a stricter
custom policy might:

* Tighten **Misinformation** to flag medical claims even when
  contextual.
* Add a category **Professional Conduct** that flags personal-life
  questions in a workplace assistant.
* Drop **Sexual Content** if your domain is fully family-friendly
  (the default would still catch it via `Sexually Explicit`).

You can also clone the default and remove categories you do not want
to evaluate. Fewer categories means lower cost, because each category
is an independent ShieldGemma forward pass.

## Wiring it into a combined policy

The safety analyzer takes a `policy_id` parameter:

```json
{
  "name": "safety_moderation_analyzer",
  "params": {
    "model_id": "google/shieldgemma-9b",
    "policy_id": "<uuid-of-safety-policy>"
  }
}
```

There is no per-request safety policy override at this time; pick the
right policy in your combined policy.
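If you generate combined-policy configuration programmatically, a small helper can validate the policy ID before emitting the analyzer entry. This sketch only reproduces the JSON shape shown above; the helper name is hypothetical, and treating `policy_id` as a standard UUID is an assumption based on the `<uuid-of-safety-policy>` placeholder.

```python
import json
import uuid


def safety_analyzer_entry(model_id: str, policy_id: str) -> dict:
    """Build the analyzer entry, rejecting IDs that are not well-formed UUIDs."""
    uuid.UUID(policy_id)  # raises ValueError on malformed input
    return {
        "name": "safety_moderation_analyzer",
        "params": {"model_id": model_id, "policy_id": policy_id},
    }


example_id = str(uuid.uuid4())  # placeholder; use your real safety policy ID
entry = safety_analyzer_entry("google/shieldgemma-9b", example_id)
print(json.dumps(entry, indent=2))
```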

## Permissions

| Role   | Read | Create / update | Delete |
| ------ | ---- | --------------- | ------ |
| Owner  | Yes  | Yes             | Yes    |
| Admin  | Yes  | Yes             | Yes    |
| Member | Yes  | Yes             | Yes    |
| Viewer | Yes  | No              | No     |

The relevant scope family is `safety_policy:*`.

## Authoring tips

* **Copy the default** and edit the categories that matter to you;
  do not start from scratch unless you know what you are doing.
* **Lead with the verdict format.** ShieldGemma needs a clear
  instruction to answer `Yes`/`No`; if your prose buries that
  instruction, the model may ramble and the parser will fall back to
  an uncertain verdict.
* **Keep categories small and orthogonal.** Five overlapping
  categories will fight each other; three orthogonal categories
  produce cleaner verdicts.
* **Test with the Analyzer Labs page.** The dashboard's
  [Analyzer Labs](https://agnes.lasscyber.com/analyzer) page lets you
  feed sample text through a single analyzer with a chosen safety
  policy; iterate on the policy text until verdicts match expectations
  before promoting it.

## Next

* [Safety analyzer](/analyzers/safe-responsible-ai) — runtime
  metrics, termination signals, model selection.
* [Combined analyzer](/concepts/combined-analyzer) — wire safety into
  termination rules.
