How Agnes works

This page walks through what happens when your code calls POST /api/v1/analyze/. It is the right starting point for anyone designing a policy, debugging a surprising decision, or reasoning about latency and cost. If you only need to ship, the Quickstart and Combined analyzer pages are enough.

The 30-second version

Three things to remember:

Policies drive everything. The policy you reference (by slug or id) defines which analyzers run, in what order, and what terminates the run.
Termination is explicit. Analyzers do not “block” on their own. Termination rules in the policy decide whether a metric or output match short-circuits the pipeline.
Errors are typed. Every error response carries the canonical envelope with a stable code, a request_id, and a link to the matching /errors page.

Components in play

The API service is the only customer-facing component. It is a multi-tenant, async-first FastAPI application deployed to Cloud Run.
The model service is internal-only and runs the GPU-backed analyzers (prompt-injection classifiers and ShieldGemma via vLLM) on NVIDIA L4 (24 GB) GPU instances.
Postgres + pgvector stores tenants, users, policies, rules, embeddings, usage, and audit data.
Google Cloud APIs back the DLP, NLP, Web Risk, and embedding analyzers.

See Architecture for the full deployment picture.

Request lifecycle

The path of a single analyze call:

1. Transport and middleware

Every request hits a stack of middleware before any business logic runs:

| Middleware | Purpose |
| --- | --- |
| Request ID | Generates `X-Request-ID` and threads it through logs and the response body. |
| Tenant context | Resolves the tenant from the API key (or JWT) and rejects mismatched `X-Tenant-ID` headers. |
| Idempotency | Honours `Idempotency-Key` for retry-safe operations. |
| Rate limiter | Per-tenant, per-route token-bucket. Surfaces `X-RateLimit-*` headers. |

A request rejected here with 401, 403, 413, or 429 never reaches the engine; the middleware returns the response directly with a typed error envelope.

2. Policy resolution

The engine looks up the policy you referenced. A policy is the source of truth for a run; it bundles:

available_analyzers — which analyzers may run, with parameter values (e.g. which model ID for prompt-injection or safety).
execution_plan — an ordered list of steps. Each step is either sequential (analyzers run one after another) or asynchronous (asyncio.gather — analyzers run in parallel).
termination_conditions — output-pattern matches and metric thresholds that short-circuit the run when met.

If you do not pass a policy, Agnes uses your tenant’s is_default inbound policy. The shipped default is default-inbound, defined in api/data/policy_configs/default_config.json; see Combined analyzer for the full walkthrough.
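To make the three parts concrete, here is a rough sketch of a policy as a Python dictionary. Only the top-level keys available_analyzers, execution_plan, and termination_conditions come from this page; the slug, the analyzer keys, and every nested field name (mode, analyzers, model_id, threshold, and so on) are assumptions for illustration. The real schema is the one in the annotated JSON on the Combined analyzer page.

```python
from typing import List

# Hypothetical policy shape; nested field names are assumptions.
example_policy = {
    "slug": "example-inbound",
    "available_analyzers": {
        "yara": {},
        "url": {},
        "prompt_injection": {"model_id": "example-model"},  # placeholder model ID
    },
    "execution_plan": [
        # Cheap checks run concurrently in the first step.
        {"mode": "asynchronous", "analyzers": ["yara", "url"]},
        # The GPU classifier runs only if nothing terminated earlier.
        {"mode": "sequential", "analyzers": ["prompt_injection"]},
    ],
    "termination_conditions": {
        "prompt_injection": {
            "output_match": "INJECTION/JAILBREAK",
            "threshold": {"metric": "score", "operator": ">=", "value": 0.85},
            "logical_operator": "AND",
            "action": "terminate_immediately",
        }
    },
}


def undeclared_analyzers(policy: dict) -> List[str]:
    """Return analyzers used in the plan but missing from available_analyzers."""
    declared = set(policy["available_analyzers"])
    planned = [name for step in policy["execution_plan"] for name in step["analyzers"]]
    return [name for name in planned if name not in declared]
```

A check like undeclared_analyzers is the kind of sanity test worth running before saving a policy, since a plan that references an analyzer the policy never declares cannot execute as written.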

3. Step-by-step execution

The engine iterates the execution_plan:

Sequential steps stop on the first analyzer that errors. The whole run’s overall status flips to ERROR and the response surfaces the failing analyzer.
Asynchronous steps run all listed analyzers concurrently. If any one errors, the overall run is ERROR, but every analyzer in the group still gets a chance to produce its result.
After every successful analyzer output, the engine evaluates that analyzer’s termination rules. If a rule whose action is terminate_immediately matches, the run is marked TERMINATED_EARLY and the engine returns immediately.

Every analyzer reports:

A structured output specific to the analyzer (e.g. INJECTION/JAILBREAK or SAFE for prompt injection; an array of category violations for safety; an array of findings for SDP).
A metrics map (e.g. score, inference_time_ms, findings_count).
A status of OK, TERMINATED_EARLY, or ERROR.
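The execution rules above can be sketched with asyncio. This is a simplified model, not the engine's code: AnalyzerResult, run_plan, the step dictionary keys (mode, analyzers), and the should_terminate callback are hypothetical names, and the choice to report ERROR ahead of a termination match inside one concurrent step is an assumption.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Awaitable, Callable, Dict, List, Tuple


@dataclass
class AnalyzerResult:
    name: str
    status: str = "OK"              # OK, TERMINATED_EARLY, or ERROR
    output: object = None
    metrics: dict = field(default_factory=dict)


Analyzer = Callable[[str], Awaitable[AnalyzerResult]]


async def _safe_run(name: str, analyzer: Analyzer, text: str) -> AnalyzerResult:
    """Turn an analyzer exception into an ERROR result instead of raising."""
    try:
        return await analyzer(text)
    except Exception as exc:
        return AnalyzerResult(name=name, status="ERROR", output=str(exc))


async def run_plan(plan: List[dict], analyzers: Dict[str, Analyzer],
                   should_terminate: Callable[[AnalyzerResult], bool],
                   text: str) -> Tuple[str, List[AnalyzerResult]]:
    results: List[AnalyzerResult] = []
    for step in plan:
        if step["mode"] == "asynchronous":
            # Every analyzer in the group runs, even if a sibling fails.
            batch = list(await asyncio.gather(
                *(_safe_run(n, analyzers[n], text) for n in step["analyzers"])))
        else:
            batch = []
            for n in step["analyzers"]:
                result = await _safe_run(n, analyzers[n], text)
                batch.append(result)
                # Sequential steps stop at the first error or termination match.
                if result.status == "ERROR" or should_terminate(result):
                    break
        results.extend(batch)
        if any(r.status == "ERROR" for r in batch):
            return "ERROR", results
        hit = next((r for r in batch if should_terminate(r)), None)
        if hit is not None:
            hit.status = "TERMINATED_EARLY"
            return "TERMINATED_EARLY", results
    return "OK", results
```

Wrapping each analyzer in _safe_run is what lets a concurrent group finish: asyncio.gather then collects a result for every analyzer rather than propagating the first exception.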

4. Termination rules

A termination rule can use two kinds of signal:

Output match — a regex against the analyzer’s structured output (e.g. output_match: "INJECTION/JAILBREAK" for prompt injection).
Threshold — a comparison on a metric (e.g. score >= 0.85 on prompt injection).

When logical_operator is AND, both signals must hold to terminate. When it is OR, either one is enough. The action on match is one of:

terminate_immediately — stop the pipeline, return a “blocked” decision.
proceed_to_next_step — continue, but mark the analyzer as having raised a flag.

default-inbound uses terminate_immediately for high-confidence prompt-injection (score >= 0.85), unsafe safety output, and any DLP / URL / YARA finding. See Combined analyzer for the annotated JSON.
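The way the two signals combine can be sketched as a small evaluator. The rule layout used here (a threshold object with metric, operator, and value keys), the default of AND when logical_operator is absent, and treating the analyzer output as a string are assumptions made for the example; only output_match, logical_operator, and the score >= 0.85 comparison are taken from the text above.

```python
import operator
import re

# Comparison operators a threshold might use (assumed set).
_OPS = {
    ">=": operator.ge,
    ">": operator.gt,
    "<=": operator.le,
    "<": operator.lt,
    "==": operator.eq,
}


def rule_matches(rule: dict, output: str, metrics: dict) -> bool:
    """Evaluate the output-match and threshold signals of one hypothetical rule."""
    signals = []
    if "output_match" in rule:
        # Output match: a regex searched against the analyzer's output.
        signals.append(re.search(rule["output_match"], output) is not None)
    if "threshold" in rule:
        t = rule["threshold"]
        value = metrics.get(t["metric"])
        # A missing metric never satisfies a threshold.
        signals.append(value is not None and _OPS[t["operator"]](value, t["value"]))
    if not signals:
        return False
    if rule.get("logical_operator", "AND") == "AND":
        return all(signals)
    return any(signals)


rule = {
    "output_match": "INJECTION/JAILBREAK",
    "threshold": {"metric": "score", "operator": ">=", "value": 0.85},
    "logical_operator": "AND",
    "action": "terminate_immediately",
}
print(rule_matches(rule, "INJECTION/JAILBREAK", {"score": 0.91}))  # True
print(rule_matches(rule, "INJECTION/JAILBREAK", {"score": 0.40}))  # False
```

With AND, a low-confidence INJECTION/JAILBREAK label does not terminate the run; switching the same rule to OR would terminate on either the label or the score alone.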

5. Decision and response

The response body packages everything together:

overall_status — OK, TERMINATED_EARLY, or ERROR.
terminated_early — boolean (also derivable from overall_status).
analyzer_results — per-analyzer output, metrics, and status.
aggregated_metrics — totals (e.g. summed cost, total processing time).
request_id — echoes the response header.

The Python SDK wraps this as a Decision object with allowed, blocked_by, reasons, and request_id for the common case; the full server response is on decision.raw. The TypeScript SDK is the camelCase mirror. See Interpreting results.
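As a rough illustration of how the raw response relates to the summary fields, the sketch below derives allowed, blocked_by, and reasons from a response dictionary. It is not the SDK's code: DecisionSketch and summarize are hypothetical names, analyzer_results is assumed to be a list of objects with analyzer, status, and output keys, and treating only an OK overall_status as allowed is an assumption.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DecisionSketch:
    """Hypothetical stand-in for the SDK's Decision object."""
    allowed: bool
    blocked_by: List[str]
    reasons: List[str]
    request_id: str
    raw: dict


def summarize(raw: dict) -> DecisionSketch:
    # Analyzers that triggered early termination are treated as the blockers.
    blockers = [r for r in raw.get("analyzer_results", [])
                if r.get("status") == "TERMINATED_EARLY"]
    return DecisionSketch(
        allowed=raw.get("overall_status") == "OK",
        blocked_by=[r["analyzer"] for r in blockers],
        reasons=[str(r.get("output")) for r in blockers],
        request_id=raw.get("request_id", ""),
        raw=raw,  # keep the full server response available
    )


sample = {
    "overall_status": "TERMINATED_EARLY",
    "terminated_early": True,
    "analyzer_results": [
        {"analyzer": "url", "status": "OK", "output": []},
        {"analyzer": "prompt_injection", "status": "TERMINATED_EARLY",
         "output": "INJECTION/JAILBREAK", "metrics": {"score": 0.93}},
    ],
    "request_id": "req-example",
}
decision = summarize(sample)
print(decision.allowed, decision.blocked_by)  # False ['prompt_injection']
```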

6. Logging and observability

Independently of the response, the engine emits a structured analyzer log line capturing the policy, the analyzers run, the decision, and the request ID. These lines always flow into Loguru and, when configured, into Elasticsearch. The records back the in-app Analysis Log and the threat-summary charts. Sandbox traffic carries an X-Agnes-Test-Mode: true header so you can exclude it from billing dashboards.

Failure modes

A small set of well-defined failure modes covers everything Agnes can return:

| What happened | HTTP | `code` | Behaviour |
| --- | --- | --- | --- |
| Bad credentials | 401 | `unauthorized` | Refresh the key; do not retry. |
| Wrong tenant or scope | 403 | `forbidden` | Fix permissions; do not retry. |
| Idempotency key reused with different body | 409 | `idempotency_conflict` | Use a fresh key. |
| Body too large | 413 | `payload_too_large` | Trim before retrying. |
| Body schema invalid | 422 | `validation_error` | Fix the request. |
| Rate limit hit | 429 | `rate_limit_exceeded` | Honour `Retry-After`. |
| Unexpected | 500 | `internal_error` | Quote `request_id` when reporting. |
| Specific analyzer dependency degraded | 503 | `analyzer_unavailable` | Honour `Retry-After`; SDK retries automatically. |
| Generic transient | 503 | `service_unavailable` | Retry with backoff. |

The full table and per-code pages live under Errors.
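If you call the API without an SDK, the retry guidance above could be encoded roughly as follows. The function name, the backoff base and cap, and the decision to treat internal_error as non-retryable are assumptions; the error codes and the Retry-After behaviour come from the failure modes listed above.

```python
from typing import Optional

# Codes for which the guidance above is to retry.
RETRYABLE_CODES = {"rate_limit_exceeded", "analyzer_unavailable", "service_unavailable"}


def retry_delay(code: str, attempt: int, retry_after: Optional[float] = None,
                base: float = 0.5, cap: float = 30.0) -> Optional[float]:
    """Return seconds to wait before retrying, or None if the call should not be retried.

    `attempt` is zero-based: 0 for the first retry, 1 for the second, and so on.
    """
    if code not in RETRYABLE_CODES:
        return None
    if retry_after is not None:
        # Honour the server's Retry-After value when one is present.
        return retry_after
    # Otherwise fall back to capped exponential backoff.
    return min(cap, base * (2 ** attempt))


print(retry_delay("unauthorized", 0))                          # None
print(retry_delay("rate_limit_exceeded", 0, retry_after=3.0))  # 3.0
print(retry_delay("service_unavailable", 2))                   # 2.0
```

Production clients usually add random jitter to the backoff so that many callers do not retry in lockstep; it is left out here to keep the output deterministic.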

What this means for you

Order your policy. Cheap, fast analyzers (YARA, URL, SDP) belong in early steps; expensive GPU classifiers (prompt-injection, safety) belong after them, so that an early termination skips the GPU cost entirely.
Lean on Idempotency-Key for write endpoints (policies, rules, keys). The middleware deduplicates retries automatically.
Watch X-Request-ID. Both SDKs surface it on every response and every error. Quote it on every support ticket.
Pin Agnes-Version. It immunizes you against future minor contract changes. See Versioning.
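A minimal sketch of the header handling these points imply, for callers not using an SDK. The helper names are hypothetical, the Agnes-Version value is left as a parameter because its format is covered on the Versioning page, and authentication headers are omitted.

```python
import uuid
from typing import Dict, Mapping, Optional


def write_headers(agnes_version: str, idempotency_key: Optional[str] = None) -> Dict[str, str]:
    """Headers for a write request. Reuse the same key when retrying the same operation."""
    return {
        "Agnes-Version": agnes_version,
        "Idempotency-Key": idempotency_key or str(uuid.uuid4()),
    }


def request_id_from(response_headers: Mapping[str, str]) -> Optional[str]:
    """Read X-Request-ID from response headers; header names are case-insensitive."""
    for name, value in response_headers.items():
        if name.lower() == "x-request-id":
            return value
    return None


first_try = write_headers("<your-pinned-version>")
# A retry of the same operation must carry the same Idempotency-Key.
retry = write_headers("<your-pinned-version>", first_try["Idempotency-Key"])
```

Generating the key once per logical operation, rather than once per HTTP attempt, is what lets the middleware recognise a retry as a duplicate.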

Next

Combined analyzer — the hero endpoint in detail with an annotated default-inbound policy.
Architecture — Cloud Run, GPUs, Cloud SQL, and the data plane.
Analyzers overview — what each analyzer does and when to enable it.
