Agnes ships seven analyzers. Each is a self-contained detector with its own implementation (BERT classifiers, LLM-as-judge, Google Cloud APIs, the YARA engine, vector similarity). You compose them inside a combined policy; your code calls the combined policy, never an analyzer directly.
## The seven canonical analyzers

| Canonical name | Friendly name | What it detects | Implementation |
|---|---|---|---|
| `prompt-injection-jailbreak` | Prompt Injection & Jailbreak Detection | Injection / jailbreak prompts | BERT-family classifiers (Llama Prompt Guard, DeBERTa, ONNX defenders) on the model service |
| `safe-responsible-ai` | Safety & Responsible AI Guardrails | Hate, harassment, dangerous content, sexual content | ShieldGemma 2B / 9B / 27B via vLLM |
| `sensitive-data` | Sensitive Data Protection | PII, PHI, credentials, financial identifiers | Google Cloud DLP / SDP |
| `natural-language` | Natural Language Analysis | Sentiment, entities, topics, moderation | Google Cloud Natural Language |
| `url-risk` | Malicious URL Detection | Malware, phishing, unwanted-software URLs | URL extraction + Google Web Risk |
| `yara` | YARA Rule Enforcement | Customer-defined patterns | YARA engine, in-process |
| `semantic-threat-intelligence` | Semantic Threat Intelligence | Paraphrased / novel attacks similar to known ones | Vertex AI embeddings + pgvector cosine similarity |
Internal analyzer IDs (`adversarial_detection_analyzer`, `safety_moderation_analyzer`) are an implementation detail you'll only see in raw policy JSON.
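To make the composition concrete, a combined policy that references analyzers by canonical name might look like the sketch below. This is a hedged illustration only: the field names (`analyzers`, `params`, `termination_rules`), the `model` parameter, and the rule format are assumptions, not the documented policy schema.

```json
{
  "name": "example-inbound",
  "analyzers": [
    { "name": "yara" },
    { "name": "url-risk" },
    { "name": "sensitive-data" },
    { "name": "prompt-injection-jailbreak", "params": { "model": "22M" } }
  ],
  "termination_rules": [
    { "analyzer": "prompt-injection-jailbreak", "signal": "match", "action": "block" }
  ]
}
```

Your code would submit traffic to this combined policy; the per-analyzer entries are configuration, not endpoints you call yourself.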
## When to enable each

If you are starting from a blank policy, reach for analyzers in this order:

1. Always-on cheap signals: `yara`, `url-risk`, `sensitive-data`. They cost milliseconds and catch a large class of attacks before you spend GPU time on the classifier.
2. Adversarial classifier: `prompt-injection-jailbreak`. The single most valuable analyzer for inbound traffic. Use the 22M model for latency, the 86M model for accuracy on hard cases.
3. Safety judge: `safe-responsible-ai`. Most useful on outbound responses. On inbound it overlaps with the prompt-injection classifier; on outbound it earns its keep catching unsafe model completions.
4. Threat intel: `semantic-threat-intelligence`. Fast (a vector comparison) but adds Vertex embedding cost. Worth turning on once you have ingested known-bad prompts via the Workbench.
5. Linguistic signals: `natural-language`. Useful as a flag rather than a block; sentiment and entity counts are noisy termination signals.
The `default-inbound` policy uses analyzers 1–3; `default-outbound` swaps the order to weight the safety judge first. See the Combined analyzer page for both.
## What each page covers

Every analyzer page below is structured the same way:

- What it detects — the threat model.
- How it works — the implementation, model IDs, and external dependencies.
- Parameters — config knobs you can set in a policy.
- Outputs and metrics — what shows up in `analyzer_results` and the metrics map.
- Termination signals — boolean and match signals available for termination rules.
- Limits and cost — request budget, tokens, and where the cost goes.
- When to use it — guidance on inbound vs outbound, latency, noisy categories.
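A termination rule consumes those boolean and match signals from `analyzer_results`. A hedged sketch of what evaluating one rule could look like — the result shape and rule format here are assumptions for illustration, not the documented schema:

```python
# Hypothetical analyzer_results shape: analyzer name -> map of emitted signals.
analyzer_results = {
    "prompt-injection-jailbreak": {"match": True, "score": 0.97},
    "natural-language": {"match": False, "sentiment": -0.2},
}

def evaluate(rule: dict, results: dict) -> bool:
    """True when the named analyzer emitted a truthy value for the rule's signal."""
    signals = results.get(rule["analyzer"], {})
    return bool(signals.get(rule["signal"]))

block_on_injection = {"analyzer": "prompt-injection-jailbreak", "signal": "match"}
print(evaluate(block_on_injection, analyzer_results))  # → True
```

Note the asymmetry the guidance implies: `prompt-injection-jailbreak.match` makes a good blocking signal, while noisier signals like sentiment are better logged as flags than wired into termination.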
## Source of truth

The analyzer catalog at `api/data/analyzer_catalog.py` is the single source of truth for parameters, metrics, model IDs, and limits. The dashboard policy editor and these docs both render from that catalog. If you spot a mismatch, file a ticket — the catalog is canon.