# How Agnes works

> The end-to-end request lifecycle from your application through the analyzer pipeline and back.

This page walks through what happens when your code calls
`POST /api/v1/analyze/`. It is the right starting point for
anyone designing a policy, debugging a surprising decision, or
reasoning about latency and cost.

If you only need to ship, the [Quickstart](/get-started/quickstart) and
[Combined analyzer](/concepts/combined-analyzer) pages are enough.

***

## The 30-second version

```mermaid
flowchart LR
    SDK[Your code / SDK] -->|"POST /api/v1/analyze/"| API[Agnes API]
    API --> Mw[Middleware: request id, tenant, idempotency, rate limit]
    Mw --> Engine[Execution engine]
    Engine --> Plan[Resolve policy + execution plan]
    Plan --> Analyzers[Run analyzers in steps]
    Analyzers --> Term[Apply termination rules]
    Term --> Decision[Decision + reasons + metrics]
    Decision --> SDK
    Engine --> Logs[Structured analyzer log]
```

Three things to remember:

1. **Policies drive everything.** The policy you reference (by `slug` or
   `id`) defines which analyzers run, in what order, and what terminates
   the run.
2. **Termination is explicit.** Analyzers do not "block" on their own.
   Termination rules in the policy decide whether a metric or output
   match short-circuits the pipeline.
3. **Errors are typed.** Every error response carries the canonical
   envelope with a stable `code`, a `request_id`, and a link to the
   matching [`/errors`](/errors/overview) page.

## Components in play

```mermaid
flowchart TB
    subgraph Edge[" "]
        Frontend[agnes.lasscyber.com<br/>React dashboard]
        SDKs[Python / TypeScript SDKs]
    end
    subgraph Cloud[Google Cloud Run]
        API[api.lasscyber.com<br/>FastAPI service]
        Model[model_service<br/>L4 GPU]
        DB[(Cloud SQL Postgres + pgvector)]
    end
    subgraph Upstream[External APIs]
        DLP[Google Cloud DLP]
        NLP[Google Cloud Natural Language]
        WebRisk[Google Web Risk]
        Vertex[Vertex AI Embeddings]
    end
    Frontend -->|JWT| API
    SDKs -->|API key| API
    API --> DB
    API -->|prompt injection / ShieldGemma| Model
    API --> DLP
    API --> NLP
    API --> WebRisk
    API --> Vertex
```

* The **API service** is the only customer-facing component. It is a
  multi-tenant, async-first FastAPI application deployed to Cloud Run.
* The **model service** is internal-only and runs the GPU-backed
  analyzers (prompt-injection classifiers and ShieldGemma via vLLM) on
  L4 16 GB GPU instances.
* **Postgres + pgvector** stores tenants, users, policies, rules, embeddings,
  usage, and audit data.
* **Google Cloud APIs** back the DLP, NLP, Web Risk, and embedding
  analyzers.

See [Architecture](/concepts/architecture) for the full deployment
picture.

## Request lifecycle

The path of a single `analyze` call:

### 1. Transport and middleware

Every request hits a stack of middleware before any business logic runs:

| Middleware     | Purpose                                                                                     |
| -------------- | ------------------------------------------------------------------------------------------- |
| Request ID     | Generates `X-Request-ID` and threads it through logs and the response body.                 |
| Tenant context | Resolves the tenant from the API key (or JWT) and rejects mismatched `X-Tenant-ID` headers. |
| Idempotency    | Honours `Idempotency-Key` for retry-safe operations.                                        |
| Rate limiter   | Per-tenant, per-route token-bucket. Surfaces `X-RateLimit-*` headers.                       |

A request rejected here (401, 403, 413, or 429) never reaches the engine;
the response is returned directly with a typed error envelope.
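
The rate limiter row names a per-tenant, per-route token bucket. The sketch below shows the general algorithm only; it is not Agnes's implementation, and the `TokenBucket` class, the `check` helper, and the limits (`capacity=2`, `rate=1.0`) are hypothetical values chosen for illustration.

```python
import time
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class TokenBucket:
    """Generic token bucket: refills `rate` tokens per second, up to `capacity`."""

    capacity: float
    rate: float
    tokens: float = field(init=False)
    updated_at: float = field(init=False)

    def __post_init__(self) -> None:
        self.tokens = self.capacity  # start with a full bucket
        self.updated_at = time.monotonic()

    def allow(self, now: Optional[float] = None) -> bool:
        """Consume one token if available; return False when the bucket is empty."""
        now = time.monotonic() if now is None else now
        elapsed = max(0.0, now - self.updated_at)
        # Refill in proportion to elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated_at = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


# One bucket per (tenant, route) pair, mirroring "per-tenant, per-route".
_buckets: Dict[Tuple[str, str], TokenBucket] = {}


def check(tenant_id: str, route: str) -> bool:
    """Return True if this tenant may call this route right now (illustrative limits)."""
    key = (tenant_id, route)
    if key not in _buckets:
        _buckets[key] = TokenBucket(capacity=2, rate=1.0)
    return _buckets[key].allow()
```

A `False` result here corresponds to the 429 case: the request is rejected before the engine runs, and the caller waits before trying again.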

### 2. Policy resolution

The engine looks up the policy you referenced. A policy is the source of
truth for a run; it bundles:

* `available_analyzers` — which analyzers may run, with parameter values
  (e.g. which model ID for prompt-injection or safety).
* `execution_plan` — an ordered list of **steps**. Each step is either
  `sequential` (analyzers run one after another) or `asynchronous`
  (`asyncio.gather` — analyzers run in parallel).
* `termination_conditions` — output-pattern matches and metric thresholds
  that short-circuit the run when met.

If you do not pass a policy, Agnes uses your tenant's `is_default`
inbound policy. The shipped default is `default-inbound`, defined in
[`api/data/policy_configs/default_config.json`](https://github.com/lasscyber/agnes-docs/tree/main/policy-fixtures);
see [Combined analyzer](/concepts/combined-analyzer) for the full
walkthrough.
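
To make the three parts concrete, here is a hypothetical policy written as a Python dict. Only the top-level keys, the two step types, `logical_operator`, `output_match`, and the action name come from this page. The analyzer names, the nested field names (`type`, `analyzers`, `threshold`, `model_id`), and all values are assumptions for illustration and are not the shipped `default-inbound` configuration.

```python
# Hypothetical policy shape, for illustration only.
policy = {
    "slug": "example-inbound",  # hypothetical slug
    "available_analyzers": {
        "yara": {},
        "url": {},
        "prompt_injection": {"model_id": "example-model"},  # placeholder model ID
    },
    "execution_plan": [
        # Cheap analyzers first, run in parallel.
        {"type": "asynchronous", "analyzers": ["yara", "url"]},
        # The GPU-backed classifier runs only if nothing terminated earlier.
        {"type": "sequential", "analyzers": ["prompt_injection"]},
    ],
    "termination_conditions": {
        "prompt_injection": {
            "output_match": "INJECTION/JAILBREAK",
            "threshold": {"metric": "score", "operator": ">=", "value": 0.85},
            "logical_operator": "AND",
            "action": "terminate_immediately",
        },
    },
}


def unknown_analyzers(policy: dict) -> list:
    """Return analyzers the plan references but `available_analyzers` does not define."""
    available = set(policy["available_analyzers"])
    return [
        name
        for step in policy["execution_plan"]
        for name in step["analyzers"]
        if name not in available
    ]
```

A check like `unknown_analyzers(policy)` is a cheap way to catch a plan that names an analyzer the policy never enabled before you submit it.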

### 3. Step-by-step execution

The engine iterates the `execution_plan`:

* **Sequential steps** stop on the first analyzer that errors. The whole
  run's overall status flips to `ERROR` and the response surfaces the
  failing analyzer.
* **Asynchronous steps** run all listed analyzers concurrently. If any
  one errors, the overall run is `ERROR`, but every analyzer in the
  group still gets a chance to produce its result.
* After every successful analyzer output, the engine evaluates that
  analyzer's termination rules. If a rule matches, the run is marked
  `TERMINATED_EARLY` and the engine returns immediately.

Every analyzer reports:

* A structured **output** specific to the analyzer (e.g.
  `INJECTION/JAILBREAK` or `SAFE` for prompt injection; an array of
  category violations for safety; an array of findings for SDP).
* A **metrics** map (e.g. `score`, `inference_time_ms`,
  `findings_count`).
* A **status** of `OK`, `TERMINATED_EARLY`, or `ERROR`.
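
The control flow above can be sketched with plain `asyncio`. This is a simplified model, not the engine's source: the `run_plan` and `run_one` names, the step and result dict shapes, and the demo analyzers are hypothetical. It also assumes that a run stops after any step containing an error, and that errors take precedence over termination within a parallel group; the page does not specify either.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List

Analyzer = Callable[[str], Awaitable[dict]]


async def run_one(name: str, fn: Analyzer, text: str) -> dict:
    """Run one analyzer and convert any exception into an ERROR result."""
    try:
        result = await fn(text)
        return {"analyzer": name, "status": "OK", **result}
    except Exception as exc:
        return {"analyzer": name, "status": "ERROR", "error": str(exc)}


async def run_plan(
    plan: List[dict],
    analyzers: Dict[str, Analyzer],
    text: str,
    should_terminate: Callable[[dict], bool],
) -> dict:
    results: List[dict] = []

    def finish(status: str) -> dict:
        return {"overall_status": status, "analyzer_results": results}

    for step in plan:
        names = step["analyzers"]
        if step["type"] == "asynchronous":
            # Every analyzer in the group runs concurrently and gets to finish.
            batch = await asyncio.gather(
                *(run_one(n, analyzers[n], text) for n in names)
            )
            results.extend(batch)
            if any(r["status"] == "ERROR" for r in batch):
                return finish("ERROR")
            if any(should_terminate(r) for r in batch):
                return finish("TERMINATED_EARLY")
        else:
            for n in names:
                result = await run_one(n, analyzers[n], text)
                results.append(result)
                if result["status"] == "ERROR":
                    return finish("ERROR")  # sequential steps stop on first error
                if should_terminate(result):
                    return finish("TERMINATED_EARLY")
    return finish("OK")


# Demo analyzers (hypothetical stand-ins for real ones).
async def clean(text: str) -> dict:
    return {"output": "SAFE", "metrics": {"score": 0.01}}


async def flagged(text: str) -> dict:
    return {"output": "INJECTION/JAILBREAK", "metrics": {"score": 0.97}}


async def broken(text: str) -> dict:
    raise RuntimeError("upstream timeout")


def is_injection(result: dict) -> bool:
    return result.get("output") == "INJECTION/JAILBREAK"


DEMO_PLAN = [
    {"type": "asynchronous", "analyzers": ["a", "b"]},
    {"type": "sequential", "analyzers": ["c"]},
]
```

Calling `asyncio.run(run_plan(DEMO_PLAN, {"a": clean, "b": clean, "c": flagged}, "hello", is_injection))` walks both steps and ends with `TERMINATED_EARLY`; swapping `b` for `broken` ends the run with `ERROR` after the first step while still recording the result from `a`.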

### 4. Termination rules

A termination rule can fire on two kinds of signal:

* **Output match** — a regex against the analyzer's structured output
  (e.g. `output_match: "INJECTION/JAILBREAK"` for prompt injection).
* **Threshold** — a comparison on a metric (e.g. `score >= 0.85` on
  prompt injection).

When `logical_operator` is `AND`, both signals must hold to terminate.
With `OR`, either is enough. The action on match is one of:

* `terminate_immediately` — stop the pipeline, return a "blocked"
  decision.
* `proceed_to_next_step` — continue, but mark the analyzer as having
  raised a flag.

`default-inbound` uses `terminate_immediately` for high-confidence
prompt-injection (`score >= 0.85`), unsafe safety output, and any DLP /
URL / YARA finding. See
[Combined analyzer](/concepts/combined-analyzer) for the annotated JSON.
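
A minimal sketch of how one rule could be evaluated against one analyzer result. The `rule_matches` function, the `threshold` sub-fields (`metric`, `operator`, `value`), and the choice of `AND` as the default operator are assumptions for illustration; check the annotated policy JSON for the real field names.

```python
import operator
import re

COMPARATORS = {
    ">=": operator.ge,
    ">": operator.gt,
    "<=": operator.le,
    "<": operator.lt,
}


def rule_matches(rule: dict, output: str, metrics: dict) -> bool:
    """Return True when the rule's signals, combined by its operator, hold."""
    signals = []
    if "output_match" in rule:
        # Output match: a regex search against the analyzer's output.
        signals.append(re.search(rule["output_match"], output) is not None)
    if "threshold" in rule:
        # Threshold: compare one metric against a configured value.
        t = rule["threshold"]
        value = metrics.get(t["metric"])
        compare = COMPARATORS[t["operator"]]
        signals.append(value is not None and compare(value, t["value"]))
    if not signals:
        return False  # a rule with no signals never fires
    combine = all if rule.get("logical_operator", "AND") == "AND" else any
    return combine(signals)


EXAMPLE_RULE = {
    "output_match": "INJECTION/JAILBREAK",
    "threshold": {"metric": "score", "operator": ">=", "value": 0.85},
    "logical_operator": "AND",
    "action": "terminate_immediately",
}
```

With `AND`, a low-confidence `INJECTION/JAILBREAK` output (say `score` of 0.5) does not fire the rule; switching the same rule to `OR` makes the output match alone sufficient.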

### 5. Decision and response

The response body packages everything together:

* `overall_status` — `OK`, `TERMINATED_EARLY`, or `ERROR`.
* `terminated_early` — boolean (also derivable from `overall_status`).
* `analyzer_results` — per-analyzer output, metrics, and status.
* `aggregated_metrics` — totals (e.g. summed cost, total processing
  time).
* `request_id` — echoes the `X-Request-ID` response header.

The Python SDK wraps this as a `Decision` object with `allowed`,
`blocked_by`, `reasons`, and `request_id` for the common case; the full
server response is on `decision.raw`. The TypeScript SDK is the
camelCase mirror. See [Interpreting results](/threat-analysis/interpreting-results).
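
If you work with the raw response directly, a summary of the same shape can be derived by hand. The mapping below is a guess at how that could look, not the SDK's code: `DecisionSketch` and `summarize` are hypothetical names, `analyzer_results` is assumed to be a list of dicts carrying an `analyzer` name, and only `OK` is treated as allowed (a fail-closed assumption).

```python
from dataclasses import dataclass, field


@dataclass
class DecisionSketch:
    """Hypothetical stand-in for the SDK's Decision object."""

    allowed: bool
    blocked_by: list
    reasons: list
    request_id: str
    raw: dict = field(repr=False, default_factory=dict)


def summarize(raw: dict) -> DecisionSketch:
    """Collapse a raw analyze response into the common-case fields."""
    flagged = [
        r
        for r in raw.get("analyzer_results", [])
        if r.get("status") == "TERMINATED_EARLY"
    ]
    return DecisionSketch(
        # Assumption: anything other than OK (including ERROR) is not allowed.
        allowed=raw.get("overall_status") == "OK",
        blocked_by=[r["analyzer"] for r in flagged],
        reasons=[str(r.get("output")) for r in flagged],
        request_id=raw.get("request_id", ""),
        raw=raw,
    )
```

Whether an `ERROR` run should be treated as blocked or allowed is an application decision; the sketch fails closed, which suits inbound filtering but may not suit every use.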

### 6. Logging and observability

Independently of the response, the engine emits a **structured analyzer
log line** capturing the policy, the analyzers run, the decision, and
the request ID. Lines always flow into Loguru and, when configured, also
into Elasticsearch.

These records back the in-app
[Analysis Log](/threat-analysis/analysis-logs) and the threat-summary
charts. Sandbox traffic carries an `X-Agnes-Test-Mode: true` header so
you can exclude it from billing dashboards.

## Failure modes

A small set of well-defined failure modes covers everything Agnes can
return:

| What happened                              | HTTP | `code`                 | Behaviour                                        |
| ------------------------------------------ | ---- | ---------------------- | ------------------------------------------------ |
| Bad credentials                            | 401  | `unauthorized`         | Refresh the key; do not retry.                   |
| Wrong tenant or scope                      | 403  | `forbidden`            | Fix permissions; do not retry.                   |
| Body schema invalid                        | 422  | `validation_error`     | Fix the request.                                 |
| Body too large                             | 413  | `payload_too_large`    | Trim before retrying.                            |
| Idempotency key reused with different body | 409  | `idempotency_conflict` | Use a fresh key.                                 |
| Rate limit hit                             | 429  | `rate_limit_exceeded`  | Honour `Retry-After`.                            |
| Specific analyzer dependency degraded      | 503  | `analyzer_unavailable` | Honour `Retry-After`; SDK retries automatically. |
| Generic transient                          | 503  | `service_unavailable`  | Retry with backoff.                              |
| Unexpected                                 | 500  | `internal_error`       | Quote `request_id` when reporting.               |

The full table and per-code pages live under [Errors](/errors/overview).
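
For callers that handle retries themselves rather than relying on an SDK, the table reduces to a small decision function. This is a sketch under assumptions: `retry_delay` is a hypothetical helper, the backoff base and cap are arbitrary, and `internal_error` is treated as non-retryable because the table does not say to retry it.

```python
from typing import Optional

# Codes the table marks as worth retrying.
RETRYABLE = {"rate_limit_exceeded", "analyzer_unavailable", "service_unavailable"}


def retry_delay(
    code: str,
    attempt: int,
    retry_after: Optional[float] = None,
    base: float = 0.5,
    cap: float = 30.0,
) -> Optional[float]:
    """Return seconds to wait before retrying, or None if the error is not retryable.

    `attempt` is zero-based; `retry_after` is the parsed Retry-After value, if any.
    """
    if code not in RETRYABLE:
        return None
    if retry_after is not None:
        return retry_after  # honour the server's hint when it is present
    return min(cap, base * (2 ** attempt))  # capped exponential backoff
```

In production you would normally add random jitter to the backoff so that many clients do not retry in lockstep.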

## What this means for you

* **Order your policy.** Cheap, fast analyzers (YARA, URL, SDP) belong
  in early steps; expensive GPU classifiers (prompt-injection, safety)
  belong after them so that early termination avoids their cost.
* **Lean on `Idempotency-Key`** for write endpoints (policies, rules,
  keys). The middleware deduplicates retries automatically.
* **Watch `X-Request-ID`.** Both SDKs surface it on every response and
  every error. Quote it on every support ticket.
* **Pin `Agnes-Version`.** Pinning shields you from future minor
  contract changes. See [Versioning](/sdks/versioning).
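
As a rough illustration of the `Idempotency-Key` and `Agnes-Version` advice, the helper below builds headers for a write request. `write_headers` is a hypothetical name, the version string is a placeholder you would replace with a real value from the Versioning page, and authentication headers are deliberately left out because this page does not specify them.

```python
import uuid
from typing import Optional


def write_headers(agnes_version: str, idempotency_key: Optional[str] = None) -> dict:
    """Build headers for a write request; generate a key only if none is supplied."""
    return {
        "Idempotency-Key": idempotency_key or str(uuid.uuid4()),
        "Agnes-Version": agnes_version,
    }


# Generate the key once per logical operation and reuse it on every retry,
# so the server can recognise the retries as the same request.
key = str(uuid.uuid4())
first_try = write_headers("<pinned-version>", key)   # placeholder version value
second_try = write_headers("<pinned-version>", key)
```

Generating a fresh key inside the retry loop would defeat deduplication, and reusing a key with a different body is the `idempotency_conflict` case from the failure table.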

## Next

* [Combined analyzer](/concepts/combined-analyzer) — the hero endpoint
  in detail with an annotated `default-inbound` policy.
* [Architecture](/concepts/architecture) — Cloud Run, GPUs, Cloud SQL,
  and the data plane.
* [Analyzers overview](/analyzers/overview) — what each analyzer does
  and when to enable it.
