This page walks through what happens when your code calls
POST /api/v1/analyze/. It is the right starting point for
anyone designing a policy, debugging a surprising decision, or
reasoning about latency and cost.
If you only need to ship, the Quickstart and
Combined analyzer pages are enough.
The 30-second version
Three things to remember:

- Policies drive everything. The policy you reference (by slug or id) defines which analyzers run, in what order, and what terminates the run.
- Termination is explicit. Analyzers do not “block” on their own. Termination rules in the policy decide whether a metric or output match short-circuits the pipeline.
- Errors are typed. Every error response carries the canonical envelope with a stable code, a request_id, and a link to the matching /errors page.
Components in play
- The API service is the only customer-facing component. It is a multi-tenant, async-first FastAPI service deployed to Cloud Run.
- The model service is internal-only and runs the GPU-backed analyzers (prompt-injection classifiers and ShieldGemma via vLLM) on L4 16 GB GPU instances.
- Postgres + pgvector stores tenants, users, policies, rules, embeddings, usage, and audit data.
- Google Cloud APIs back the DLP, NLP, Web Risk, and embedding analyzers.
Request lifecycle
The path of a single analyze call:
1. Transport and middleware
Every request hits a stack of middleware before any business logic runs:

| Middleware | Purpose |
|---|---|
| Request ID | Generates X-Request-ID and threads it through logs and the response body. |
| Tenant context | Resolves the tenant from the API key (or JWT) and rejects mismatched X-Tenant-ID headers. |
| Idempotency | Honours Idempotency-Key for retry-safe operations. |
| Rate limiter | Per-tenant, per-route token-bucket. Surfaces X-RateLimit-* headers. |
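The rate limiter row describes a per-tenant, per-route token bucket. The sketch below shows that technique in Python. It is not Agnes's middleware. The class name `PerTenantRateLimiter`, the capacity and refill values, and how the returned numbers would map onto `X-RateLimit-*` and `Retry-After` headers are all assumptions made for illustration.

```python
import time
from typing import Callable, Dict, Tuple


class PerTenantRateLimiter:
    """Hypothetical per-tenant, per-route token-bucket limiter (illustrative only)."""

    def __init__(self, capacity: float, refill_per_second: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self.clock = clock
        # (tenant_id, route) -> (tokens remaining, timestamp of last refill)
        self._buckets: Dict[Tuple[str, str], Tuple[float, float]] = {}

    def allow(self, tenant_id: str, route: str) -> Tuple[bool, int, float]:
        """Return (allowed, remaining_tokens, retry_after_seconds)."""
        now = self.clock()
        # A bucket seen for the first time starts full.
        tokens, last = self._buckets.get((tenant_id, route), (self.capacity, now))
        # Refill continuously for the elapsed time, never above capacity.
        tokens = min(self.capacity, tokens + (now - last) * self.refill_per_second)
        if tokens >= 1.0:
            tokens -= 1.0
            self._buckets[(tenant_id, route)] = (tokens, now)
            return True, int(tokens), 0.0
        self._buckets[(tenant_id, route)] = (tokens, now)
        # Seconds until one whole token is available again.
        retry_after = (1.0 - tokens) / self.refill_per_second
        return False, 0, retry_after
```

Each `(tenant, route)` pair gets its own bucket. One noisy tenant, or one hot route, therefore cannot drain another's budget. The clock is injectable so the behaviour can be tested without sleeping.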
2. Policy resolution
The engine looks up the policy you referenced. A policy is the source of truth for a run. It bundles:

- available_analyzers: which analyzers may run, with parameter values (e.g. which model ID to use for prompt injection or safety).
- execution_plan: an ordered list of steps. Each step is either sequential (analyzers run one after another) or asynchronous (asyncio.gather; analyzers run in parallel).
- termination_conditions: output-pattern matches and metric thresholds that short-circuit the run when met.
One inbound policy carries the is_default flag. The shipped default is default-inbound, defined in api/data/policy_configs/default_config.json. See Combined analyzer for the full walkthrough.
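To make the three policy fields concrete, here is a hypothetical policy written as a Python dict. It is followed by a consistency check: every analyzer named in `execution_plan` or `termination_conditions` must be declared in `available_analyzers`. The field shapes, the analyzer names (`yara`, `prompt_injection`, `safety`), and the model IDs are assumptions. The authoritative schema is the shipped api/data/policy_configs/default_config.json.

```python
from typing import Any, Dict, List

# Hypothetical policy shape for illustration; not the real default_config.json schema.
example_policy: Dict[str, Any] = {
    "slug": "example-inbound",
    "available_analyzers": {
        "yara": {},
        "prompt_injection": {"model_id": "example-pi-model"},
        "safety": {"model_id": "example-safety-model"},
    },
    "execution_plan": [
        {"type": "sequential", "analyzers": ["yara"]},
        {"type": "asynchronous", "analyzers": ["prompt_injection", "safety"]},
    ],
    "termination_conditions": {
        "prompt_injection": {"output_match": "INJECTION/JAILBREAK"},
    },
}


def validate_policy(policy: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the policy looks consistent."""
    problems: List[str] = []
    available = policy.get("available_analyzers", {})
    for index, step in enumerate(policy.get("execution_plan", [])):
        # Only the two step kinds described above are accepted.
        if step.get("type") not in ("sequential", "asynchronous"):
            problems.append(f"step {index}: unknown type {step.get('type')!r}")
        for name in step.get("analyzers", []):
            if name not in available:
                problems.append(f"step {index}: {name!r} is not in available_analyzers")
    for name in policy.get("termination_conditions", {}):
        if name not in available:
            problems.append(f"termination_conditions: {name!r} is not in available_analyzers")
    return problems
```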
3. Step-by-step execution
The engine iterates the execution_plan:

- Sequential steps stop on the first analyzer that errors. The whole run's overall status flips to ERROR, and the response surfaces the failing analyzer.
- Asynchronous steps run all listed analyzers concurrently. If any one errors, the overall run is ERROR, but every analyzer in the group still gets a chance to produce its result.
- After every successful analyzer output, the engine evaluates that analyzer's termination rules. If a rule matches, the run is marked TERMINATED_EARLY and the engine returns immediately.

Each analyzer result carries:

- A structured output specific to the analyzer (e.g. INJECTION/JAILBREAK or SAFE for prompt injection; an array of category violations for safety; an array of findings for SDP).
- A metrics map (e.g. score, inference_time_ms, findings_count).
- A status of OK, TERMINATED_EARLY, or ERROR.
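The iteration rules above can be sketched with asyncio. This is a simplified illustration, not the engine's code. The analyzer signature, the result dict shape, and the `should_terminate` callback (standing in for the policy's termination rules) are assumptions. One simplification: for an asynchronous step, the sketch evaluates termination only after the whole group finishes, so every analyzer in the group still produces its result.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, List

# Hypothetical analyzer signature: input text in, {"output": ..., "metrics": {...}} out.
Analyzer = Callable[[str], Awaitable[Dict[str, Any]]]
ShouldTerminate = Callable[[str, Dict[str, Any]], bool]


async def _run_one(name: str, analyzer: Analyzer, text: str) -> Dict[str, Any]:
    try:
        result = await analyzer(text)
        return {**result, "analyzer": name, "status": "OK"}
    except Exception as exc:  # turn any analyzer failure into an ERROR result
        return {"analyzer": name, "status": "ERROR", "error": str(exc)}


async def run_plan(plan: List[Dict[str, Any]], analyzers: Dict[str, Analyzer],
                   text: str, should_terminate: ShouldTerminate) -> Dict[str, Any]:
    results: List[Dict[str, Any]] = []

    def finish(status: str) -> Dict[str, Any]:
        return {"overall_status": status,
                "terminated_early": status == "TERMINATED_EARLY",
                "analyzer_results": results}

    for step in plan:
        names = step["analyzers"]
        if step["type"] == "sequential":
            for name in names:
                result = await _run_one(name, analyzers[name], text)
                results.append(result)
                if result["status"] == "ERROR":
                    return finish("ERROR")  # stop on the first analyzer that errors
                if should_terminate(name, result):
                    result["status"] = "TERMINATED_EARLY"
                    return finish("TERMINATED_EARLY")
        else:  # "asynchronous": every analyzer in the group runs to completion
            group = await asyncio.gather(*(_run_one(n, analyzers[n], text) for n in names))
            results.extend(group)
            if any(r["status"] == "ERROR" for r in group):
                return finish("ERROR")
            for result in group:
                if should_terminate(result["analyzer"], result):
                    result["status"] = "TERMINATED_EARLY"
                    return finish("TERMINATED_EARLY")
    return finish("OK")
```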
4. Termination rules
Two kinds of termination signals fire:

- Output match: a regex against the analyzer's structured output (e.g. output_match: "INJECTION/JAILBREAK" for prompt injection).
- Threshold: a comparison on a metric (e.g. score >= 0.85 on prompt injection).

When logical_operator is AND, both signals must hold to terminate. With OR, either one is enough. The action on match is one of:

- terminate_immediately: stop the pipeline and return a “blocked” decision.
- proceed_to_next_step: continue, but mark the analyzer as having raised a flag.

The default-inbound policy uses terminate_immediately for high-confidence prompt injection (score >= 0.85), unsafe safety output, and any DLP / URL / YARA finding. See Combined analyzer for the annotated JSON.
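The sketch below evaluates one termination rule against one analyzer result. A rule combines an output match and a threshold through a logical operator. The rule's dictionary layout (`output_match`, `threshold` with `metric`/`operator`/`value`, `logical_operator`, `action`), the set of comparison operators, and the OR default are assumptions made for illustration. Consult Combined analyzer for the real JSON.

```python
import operator
import re
from typing import Any, Dict, Optional

# Comparison operators a threshold rule may use (an assumed, illustrative set).
_COMPARATORS = {">=": operator.ge, ">": operator.gt, "<=": operator.le,
                "<": operator.lt, "==": operator.eq}


def rule_fires(rule: Dict[str, Any], output: Any, metrics: Dict[str, float]) -> bool:
    """Evaluate one hypothetical termination rule against an analyzer result."""
    signals = []
    if "output_match" in rule:
        # Regex search against the analyzer's structured output, stringified.
        signals.append(re.search(rule["output_match"], str(output)) is not None)
    threshold: Optional[Dict[str, Any]] = rule.get("threshold")
    if threshold is not None:
        value = metrics.get(threshold["metric"])
        compare = _COMPARATORS[threshold["operator"]]
        # A missing metric never satisfies the threshold.
        signals.append(value is not None and compare(value, threshold["value"]))
    if not signals:
        return False
    if rule.get("logical_operator", "OR") == "AND":
        return all(signals)
    return any(signals)


def decide_action(rule: Dict[str, Any], output: Any, metrics: Dict[str, float]) -> str:
    """Return "terminate", "flag", or "continue" for a single rule."""
    if not rule_fires(rule, output, metrics):
        return "continue"
    return "terminate" if rule["action"] == "terminate_immediately" else "flag"
```

With AND, a prompt-injection verdict alone is not enough: the score must also clear the threshold. With OR, either signal terminates on its own.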
5. Decision and response
The response body packages everything together:

- overall_status: OK, TERMINATED_EARLY, or ERROR.
- terminated_early: boolean (also derivable from overall_status).
- analyzer_results: per-analyzer output, metrics, and status.
- aggregated_metrics: totals (e.g. summed cost, total processing time).
- request_id: echoes the response header.

The Python SDK wraps this in a Decision object with allowed, blocked_by, reasons, and request_id for the common case. The full server response is on decision.raw. The TypeScript SDK is the camelCase mirror. See Interpreting results.
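For readers calling the HTTP API without an SDK, the sketch below shows one plausible way to derive Decision-style fields from the raw response body. It is a hypothetical mirror, not the Python SDK's implementation. It assumes analyzer_results is a list of dicts with `analyzer`, `status`, and `output` keys. It also treats ERROR as not allowed (fail closed), which is a design choice rather than documented behaviour.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Decision:
    """Hypothetical mirror of the SDK's Decision fields; not the SDK's own code."""
    allowed: bool
    blocked_by: List[str]
    reasons: List[str]
    request_id: str
    raw: Dict[str, Any] = field(repr=False)


def to_decision(body: Dict[str, Any]) -> Decision:
    # Assumes analyzer_results is a list of {"analyzer", "status", "output"} dicts.
    results = body.get("analyzer_results", [])
    blocking = [r for r in results if r.get("status") == "TERMINATED_EARLY"]
    return Decision(
        allowed=body["overall_status"] == "OK",  # ERROR fails closed in this sketch
        blocked_by=[r["analyzer"] for r in blocking],
        reasons=[f"{r['analyzer']}: {r.get('output')}" for r in blocking],
        request_id=body["request_id"],
        raw=body,
    )
```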
6. Logging and observability
Independently of the response, the engine emits a structured analyzer log line capturing the policy, the analyzers run, the decision, and the request ID. Lines flow into Loguru (always) and optionally Elasticsearch (when configured). These records back the in-app Analysis Log and the threat-summary charts. Sandbox traffic carries an X-Agnes-Test-Mode: true header so you can exclude it from billing dashboards.
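The helpers below sketch how the test-mode header could keep sandbox traffic out of billing numbers. They read the header case-insensitively, serialize a hypothetical analyzer log record as a JSON line, and count only non-sandbox records. The record's field names, including the `test_mode` flag derived from X-Agnes-Test-Mode, are assumptions. The actual Loguru/Elasticsearch record format is not specified on this page.

```python
import json
from typing import Iterable, List, Mapping


def is_test_mode(headers: Mapping[str, str]) -> bool:
    # HTTP header names are case-insensitive, so normalize before looking up.
    normalized = {k.lower(): v for k, v in headers.items()}
    return normalized.get("x-agnes-test-mode", "").strip().lower() == "true"


def analyzer_log_line(policy: str, analyzers: List[str], decision: str,
                      request_id: str, test_mode: bool) -> str:
    """Serialize one hypothetical structured analyzer log record as a JSON line."""
    return json.dumps({"policy": policy, "analyzers": analyzers, "decision": decision,
                       "request_id": request_id, "test_mode": test_mode}, sort_keys=True)


def billable_requests(lines: Iterable[str]) -> int:
    """Count records that belong on a billing dashboard (sandbox traffic excluded)."""
    return sum(1 for line in lines if not json.loads(line).get("test_mode", False))
```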
Failure modes
A small set of well-defined failure modes covers everything Agnes can return:

| What happened | HTTP | code | Behaviour |
|---|---|---|---|
| Bad credentials | 401 | unauthorized | Refresh the key, do not retry. |
| Wrong tenant or scope | 403 | forbidden | Fix permissions; do not retry. |
| Body schema invalid | 422 | validation_error | Fix the request. |
| Body too large | 413 | payload_too_large | Trim before retrying. |
| Idempotency key reused with different body | 409 | idempotency_conflict | Use a fresh key. |
| Rate limit hit | 429 | rate_limit_exceeded | Honour Retry-After. |
| Specific analyzer dependency degraded | 503 | analyzer_unavailable | Honour Retry-After; SDK retries automatically. |
| Generic transient | 503 | service_unavailable | Retry with backoff. |
| Unexpected | 500 | internal_error | Quote request_id when reporting. |
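The Behaviour column translates into a small client-side retry policy. Retry only the transient codes, prefer the server's Retry-After, and otherwise fall back to jittered exponential backoff. The sketch below assumes a hypothetical `AgnesError` exception carrying the envelope's code, request_id, and a parsed Retry-After value. The SDKs already retry analyzer_unavailable automatically, so this matters mainly for raw HTTP clients.

```python
import random
import time
from typing import Any, Callable, Dict, Optional

# Codes from the table whose documented behaviour is to retry.
RETRYABLE = {"rate_limit_exceeded", "analyzer_unavailable", "service_unavailable"}


class AgnesError(Exception):
    """Hypothetical client-side error carrying the canonical envelope's fields."""

    def __init__(self, status: int, code: str, request_id: str,
                 retry_after: Optional[float] = None) -> None:
        super().__init__(f"{status} {code} (request_id={request_id})")
        self.status = status
        self.code = code
        self.request_id = request_id
        self.retry_after = retry_after


def call_with_retries(send: Callable[[], Dict[str, Any]], max_attempts: int = 4,
                      base_delay: float = 0.5,
                      sleep: Callable[[float], None] = time.sleep) -> Dict[str, Any]:
    for attempt in range(1, max_attempts + 1):
        try:
            return send()
        except AgnesError as err:
            # 4xx contract errors and internal_error are surfaced, not retried.
            if err.code not in RETRYABLE or attempt == max_attempts:
                raise
            if err.retry_after is not None:
                delay = err.retry_after  # the server said how long to wait
            else:
                delay = base_delay * 2 ** (attempt - 1) * random.uniform(0.5, 1.0)
            sleep(delay)
    raise ValueError("max_attempts must be at least 1")
```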
What this means for you
- Order your policy. Cheap, fast analyzers (YARA, URL, SDP) belong in early steps. Expensive GPU classifiers (prompt injection, safety) belong after them, so termination short-circuits cost.
- Lean on Idempotency-Key for write endpoints (policies, rules, keys). The middleware deduplicates retries automatically.
- Watch X-Request-ID. Both SDKs surface it on every response and every error. Quote it on every support ticket.
- Pin Agnes-Version. It immunizes you against future minor contract changes. See Versioning.
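The in-memory sketch below illustrates what deduplicating retries by Idempotency-Key means. A repeated key with the same body replays the stored response without re-running the handler. The same key with a different body raises a conflict, which would map to 409 idempotency_conflict. Three details are assumptions for this sketch, not documented server behaviour: keys are scoped per tenant, the body is fingerprinted with SHA-256 over canonical JSON, and keys never expire.

```python
import hashlib
import json
from typing import Any, Callable, Dict, Tuple


class IdempotencyConflict(Exception):
    """Key reused with a different body (would map to 409 idempotency_conflict)."""


class IdempotencyStore:
    """Hypothetical in-memory sketch; a real deployment would persist and expire keys."""

    def __init__(self) -> None:
        # (tenant_id, key) -> (body fingerprint, stored response)
        self._seen: Dict[Tuple[str, str], Tuple[str, Any]] = {}

    def execute(self, tenant_id: str, key: str, body: Dict[str, Any],
                handler: Callable[[Dict[str, Any]], Any]) -> Any:
        # Fingerprint the body canonically so dict key order does not matter.
        digest = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
        cached = self._seen.get((tenant_id, key))
        if cached is not None:
            seen_digest, response = cached
            if seen_digest != digest:
                raise IdempotencyConflict(key)
            return response  # replay: same key and body, handler is not re-run
        response = handler(body)
        self._seen[(tenant_id, key)] = (digest, response)
        return response
```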
Next
- Combined analyzer: the hero endpoint in detail, with an annotated default-inbound policy.
- Architecture: Cloud Run, GPUs, Cloud SQL, and the data plane.
- Analyzers overview: what each analyzer does and when to enable it.