This page walks through what happens when your code calls POST /api/v1/analyze/. It is the right starting point for anyone designing a policy, debugging a surprising decision, or reasoning about latency and cost. If you only need to ship, the Quickstart and Combined analyzer pages are enough.

The 30-second version

Three things to remember:
  1. Policies drive everything. The policy you reference (by slug or id) defines which analyzers run, in what order, and what terminates the run.
  2. Termination is explicit. Analyzers do not “block” on their own. Termination rules in the policy decide whether a metric or output match short-circuits the pipeline.
  3. Errors are typed. Every error response carries the canonical envelope with a stable code, a request_id, and a link to the matching /errors page.

Components in play

  • The API service is the only customer-facing component. It is a multi-tenant, async-first FastAPI service deployed to Cloud Run.
  • The model service is internal-only and runs the GPU-backed analyzers (prompt-injection classifiers and ShieldGemma via vLLM) on L4 16 GB GPU instances.
  • Postgres + pgvector stores tenants, users, policies, rules, embeddings, usage, and audit data.
  • Google Cloud APIs back the DLP, NLP, Web Risk, and embedding analyzers.
See Architecture for the full deployment picture.

Request lifecycle

The path of a single analyze call:

1. Transport and middleware

Every request hits a stack of middleware before any business logic runs:
Middleware | Purpose
--- | ---
Request ID | Generates X-Request-ID and threads it through logs and the response body.
Tenant context | Resolves the tenant from the API key (or JWT) and rejects mismatched X-Tenant-ID headers.
Idempotency | Honours Idempotency-Key for retry-safe operations.
Rate limiter | Per-tenant, per-route token bucket. Surfaces X-RateLimit-* headers.
A 401 / 403 / 429 / 413 here never reaches the engine; the response is returned directly with a typed error envelope.
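The rate limiter is described only as a per-tenant, per-route token bucket. The following is a minimal in-memory Python sketch of that technique and is not Agnes’s actual middleware. The class names, the capacity and refill-rate parameters, and the exact X-RateLimit-Limit / X-RateLimit-Remaining header names are assumptions. A deployment with several instances would also need shared state rather than a per-process dict.

```python
from __future__ import annotations

import math
import time


class TokenBucket:
    """Holds at most `capacity` tokens and refills at `rate` tokens per second."""

    def __init__(self, capacity: float, rate: float) -> None:
        self.capacity = capacity
        self.rate = rate
        self.tokens = capacity            # a fresh bucket starts full
        self.updated: float | None = None

    def try_acquire(self, now: float) -> bool:
        if self.updated is not None:
            # Refill in proportion to elapsed time, never above capacity.
            elapsed = max(0.0, now - self.updated)
            self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

    def seconds_until_token(self) -> float:
        # Time until one whole token is available again.
        return max(0.0, (1 - self.tokens) / self.rate)


class RateLimiter:
    """Hypothetical limiter: one bucket per (tenant, route) pair, created on first use."""

    def __init__(self, capacity: float, rate: float) -> None:
        self.capacity = capacity
        self.rate = rate
        self.buckets: dict[tuple[str, str], TokenBucket] = {}

    def check(self, tenant_id: str, route: str, now: float | None = None) -> tuple[bool, dict[str, str]]:
        now = time.monotonic() if now is None else now
        key = (tenant_id, route)
        if key not in self.buckets:
            self.buckets[key] = TokenBucket(self.capacity, self.rate)
        bucket = self.buckets[key]
        allowed = bucket.try_acquire(now)
        headers = {
            # Assumed header names; the page only says "X-RateLimit-*".
            "X-RateLimit-Limit": str(int(self.capacity)),
            "X-RateLimit-Remaining": str(int(bucket.tokens)),
        }
        if not allowed:
            headers["Retry-After"] = str(math.ceil(bucket.seconds_until_token()))
        return allowed, headers
```

Keying buckets by (tenant, route) means one tenant hammering /api/v1/analyze/ drains only its own bucket for that route. Other tenants and other routes are unaffected.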

2. Policy resolution

The engine looks up the policy you referenced. A policy is the source of truth for a run; it bundles:
  • available_analyzers — which analyzers may run, with parameter values (e.g. which model ID for prompt-injection or safety).
  • execution_plan — an ordered list of steps. Each step is either sequential (analyzers run one after another) or asynchronous (analyzers run concurrently via asyncio.gather).
  • termination_conditions — output-pattern matches and metric thresholds that short-circuit the run when met.
If you do not pass a policy, Agnes uses your tenant’s is_default inbound policy. The shipped default is default-inbound, defined in api/data/policy_configs/default_config.json; see Combined analyzer for the full walkthrough.
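The resolution rule has two branches. An explicit slug or id wins; otherwise the engine uses the tenant’s default inbound policy. A minimal Python sketch of that rule follows. It is illustrative only: the Policy dataclass, its direction field, PolicyNotFound, and resolve_policy are hypothetical names. This page does not say how the real API responds when a referenced policy does not exist.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any


class PolicyNotFound(LookupError):
    """Hypothetical error raised when no usable policy can be found."""


@dataclass
class Policy:
    id: str
    slug: str
    direction: str = "inbound"   # assumption: policies are tagged inbound/outbound
    is_default: bool = False
    available_analyzers: dict[str, dict[str, Any]] = field(default_factory=dict)
    execution_plan: list[dict[str, Any]] = field(default_factory=list)
    termination_conditions: list[dict[str, Any]] = field(default_factory=list)


def resolve_policy(tenant_policies: list[Policy], ref: str | None) -> Policy:
    """Use the policy referenced by slug or id, else the tenant's default inbound policy."""
    if ref is not None:
        for policy in tenant_policies:
            if ref in (policy.slug, policy.id):
                return policy
        raise PolicyNotFound(f"no policy with slug or id {ref!r}")
    for policy in tenant_policies:
        if policy.is_default and policy.direction == "inbound":
            return policy
    raise PolicyNotFound("tenant has no default inbound policy")
```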

3. Step-by-step execution

The engine iterates the execution_plan:
  • Sequential steps stop on the first analyzer that errors. The run’s overall status flips to ERROR and the response surfaces the failing analyzer.
  • Asynchronous steps run all listed analyzers concurrently. If any one errors, the overall run is ERROR, but every analyzer in the group still gets a chance to produce its result.
  • After each analyzer that completes successfully, the engine evaluates that analyzer’s termination rules. If a rule matches, the run is marked TERMINATED_EARLY and the engine returns immediately.
Every analyzer reports:
  • A structured output specific to the analyzer (e.g. INJECTION/JAILBREAK or SAFE for prompt injection; an array of category violations for safety; an array of findings for SDP).
  • A metrics map (e.g. score, inference_time_ms, findings_count).
  • A status of OK, TERMINATED_EARLY, or ERROR.
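The execution loop described above can be sketched with asyncio. This is not the engine’s code. The step dictionaries with mode and analyzers keys, the AnalyzerResult dataclass, and the should_terminate callback are hypothetical stand-ins. The sketch also assumes the run stops after any step in which an analyzer errors, because this page does not say whether later steps still run.

```python
from __future__ import annotations

import asyncio
from dataclasses import dataclass, field
from typing import Any, Awaitable, Callable

# Hypothetical shapes: an analyzer is an async callable returning (output, metrics),
# and a termination check is a predicate over (analyzer name, output, metrics).
Analyzer = Callable[[str], Awaitable[tuple[Any, dict[str, float]]]]
ShouldTerminate = Callable[[str, Any, dict[str, float]], bool]


@dataclass
class AnalyzerResult:
    name: str
    status: str  # "OK", "TERMINATED_EARLY", or "ERROR"
    output: Any = None
    metrics: dict[str, float] = field(default_factory=dict)
    error: str | None = None


async def _run_one(name: str, analyzer: Analyzer, text: str) -> AnalyzerResult:
    """Run one analyzer, turning any exception into an ERROR result instead of raising."""
    try:
        output, metrics = await analyzer(text)
        return AnalyzerResult(name, "OK", output, metrics)
    except Exception as exc:
        return AnalyzerResult(name, "ERROR", error=str(exc))


async def run_plan(
    plan: list[dict[str, Any]],
    analyzers: dict[str, Analyzer],
    should_terminate: ShouldTerminate,
    text: str,
) -> tuple[str, list[AnalyzerResult]]:
    """Return (overall_status, per-analyzer results) for one run."""
    results: list[AnalyzerResult] = []

    def terminates(r: AnalyzerResult) -> bool:
        # Termination rules are only evaluated on successful outputs.
        if r.status == "OK" and should_terminate(r.name, r.output, r.metrics):
            r.status = "TERMINATED_EARLY"
            return True
        return False

    for step in plan:
        names = step["analyzers"]
        if step["mode"] == "asynchronous":
            # Run the whole group concurrently and wait for all of it.
            batch = await asyncio.gather(*(_run_one(n, analyzers[n], text) for n in names))
            results.extend(batch)
            for r in batch:
                if terminates(r):
                    return "TERMINATED_EARLY", results
            if any(r.status == "ERROR" for r in batch):
                return "ERROR", results
        else:
            for n in names:
                r = await _run_one(n, analyzers[n], text)
                results.append(r)
                if terminates(r):
                    return "TERMINATED_EARLY", results
                if r.status == "ERROR":
                    return "ERROR", results  # stop at the first error
    return "OK", results
```

_run_one converts exceptions into ERROR results, so asyncio.gather never raises here. That is what lets every analyzer in an asynchronous group report its result even when a sibling fails.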

4. Termination rules

Termination rules combine two kinds of signal:
  • Output match — a regex against the analyzer’s structured output (e.g. output_match: "INJECTION/JAILBREAK" for prompt injection).
  • Threshold — a comparison on a metric (e.g. score >= 0.85 on prompt injection).
When logical_operator is AND, both signals must hold to terminate; with OR, either one is enough. The action on match is one of:
  • terminate_immediately — stop the pipeline, return a “blocked” decision.
  • proceed_to_next_step — continue, but mark the analyzer as having raised a flag.
default-inbound uses terminate_immediately for high-confidence prompt-injection (score >= 0.85), unsafe safety output, and any DLP / URL / YARA finding. See Combined analyzer for the annotated JSON.
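A single termination rule can be sketched as a small predicate over an analyzer’s output and metrics. Only output_match, logical_operator, and the two action names come from this page. The metric / op / value threshold fields, TerminationRule, and first_action are hypothetical. Stringifying the output before matching is a simplification.

```python
from __future__ import annotations

import operator
import re
from dataclasses import dataclass
from typing import Any, Optional

_COMPARATORS = {">=": operator.ge, ">": operator.gt, "<=": operator.le, "<": operator.lt, "==": operator.eq}


@dataclass
class TerminationRule:
    output_match: Optional[str] = None      # regex searched in the (stringified) output
    metric: Optional[str] = None            # hypothetical threshold fields: metric, op, value
    op: str = ">="
    value: Optional[float] = None
    logical_operator: str = "OR"            # "AND" or "OR"
    action: str = "terminate_immediately"   # or "proceed_to_next_step"

    def matches(self, output: Any, metrics: dict[str, float]) -> bool:
        signals: list[bool] = []
        if self.output_match is not None:
            signals.append(re.search(self.output_match, str(output)) is not None)
        if self.metric is not None and self.value is not None:
            observed = metrics.get(self.metric)
            # A missing metric counts as "not met" rather than raising.
            signals.append(observed is not None and _COMPARATORS[self.op](observed, self.value))
        if not signals:
            return False
        return all(signals) if self.logical_operator == "AND" else any(signals)


def first_action(rules: list[TerminationRule], output: Any, metrics: dict[str, float]) -> Optional[str]:
    """Return the action of the first matching rule, or None if no rule matches."""
    for rule in rules:
        if rule.matches(output, metrics):
            return rule.action
    return None
```

A caller would stop the pipeline on terminate_immediately, and record a flag and continue on proceed_to_next_step. One way to express "any finding" is a findings_count >= 1 threshold, but the shipped policy JSON may encode it differently.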

5. Decision and response

The response body packages everything together:
  • overall_status: OK, TERMINATED_EARLY, or ERROR.
  • terminated_early: boolean (also derivable from overall_status).
  • analyzer_results: per-analyzer output, metrics, and status.
  • aggregated_metrics: totals (e.g. summed cost, total processing time).
  • request_id: echoes the X-Request-ID response header.
The Python SDK wraps this as a Decision object with allowed, blocked_by, reasons, and request_id for the common case; the full server response is on decision.raw. The TypeScript SDK is the camelCase mirror. See Interpreting results.
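If you work with the raw body directly, you can derive Decision-style fields yourself, for example from a language without an SDK. The following is a hypothetical Python sketch of one way to do that. It is not the SDK’s implementation. The field names allowed, blocked_by, reasons, request_id, and raw come from the SDK description above. The shape of each analyzer_results entry (a list of objects with name, status, and output) and the way reasons are built are assumptions. The sketch fails closed, treating ERROR as not allowed; check Interpreting results for how the SDKs actually behave.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any


@dataclass
class Decision:
    """Hypothetical mirror of the SDK's Decision fields."""
    allowed: bool
    blocked_by: list[str]
    reasons: list[str]
    request_id: str | None
    raw: dict[str, Any] = field(default_factory=dict, repr=False)


def decision_from_response(body: dict[str, Any]) -> Decision:
    # Assumed shape: analyzer_results is a list of {"name", "status", "output", ...}.
    results = body.get("analyzer_results", [])
    blocked = [r for r in results if r.get("status") == "TERMINATED_EARLY"]
    return Decision(
        # Fail closed: only an OK overall status counts as allowed.
        allowed=body.get("overall_status") == "OK",
        blocked_by=[r["name"] for r in blocked],
        reasons=[f"{r['name']}: {r.get('output')}" for r in blocked],
        request_id=body.get("request_id"),
        raw=body,
    )
```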

6. Logging and observability

Independently of the response, the engine emits a structured analyzer log line capturing the policy, the analyzers run, the decision, and the request ID. Lines always go to Loguru and, when configured, to Elasticsearch. These records back the in-app Analysis Log and the threat-summary charts. Sandbox traffic carries an X-Agnes-Test-Mode: true header so you can exclude it from billing dashboards.

Failure modes

A small set of well-defined failure modes covers every error Agnes can return:
What happened | HTTP | code | Behaviour
--- | --- | --- | ---
Bad credentials | 401 | unauthorized | Refresh the key; do not retry.
Wrong tenant or scope | 403 | forbidden | Fix permissions; do not retry.
Body schema invalid | 422 | validation_error | Fix the request.
Body too large | 413 | payload_too_large | Trim before retrying.
Idempotency key reused with different body | 409 | idempotency_conflict | Use a fresh key.
Rate limit hit | 429 | rate_limit_exceeded | Honour Retry-After.
Specific analyzer dependency degraded | 503 | analyzer_unavailable | Honour Retry-After; the SDK retries automatically.
Generic transient failure | 503 | service_unavailable | Retry with backoff.
Unexpected error | 500 | internal_error | Quote the request_id when reporting.
The full table and per-code pages live under Errors.
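The Behaviour column translates into a simple client-side retry policy. The following is a hypothetical Python helper, not part of either SDK. send is a caller-supplied function that performs one HTTP attempt and returns (status, code, headers). Retry-After is assumed to be a number of seconds. 500 internal_error is treated as non-retryable. The helper generates one Idempotency-Key per logical request and reuses it on every attempt, because reusing a key with a different body is what triggers idempotency_conflict.

```python
from __future__ import annotations

import random
import time
import uuid
from typing import Callable, Mapping, Optional, Tuple

# (HTTP status, error code or None on success, response headers)
Response = Tuple[int, Optional[str], Mapping[str, str]]

# Retryable codes, taken from the failure-mode table.
HONOUR_RETRY_AFTER = {"rate_limit_exceeded", "analyzer_unavailable"}
BACKOFF_ONLY = {"service_unavailable"}


def send_with_retries(
    send: Callable[[dict[str, str]], Response],
    *,
    max_attempts: int = 4,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    """Retry only the transient codes, reusing one Idempotency-Key for every attempt."""
    idempotency_key = str(uuid.uuid4())
    for attempt in range(1, max_attempts + 1):
        status, code, headers = send({"Idempotency-Key": idempotency_key})
        retryable = code in HONOUR_RETRY_AFTER or code in BACKOFF_ONLY
        if status < 400 or not retryable or attempt == max_attempts:
            return status, code, headers
        if code in HONOUR_RETRY_AFTER and "Retry-After" in headers:
            delay = float(headers["Retry-After"])  # assumes delta-seconds, not an HTTP date
        else:
            # Exponential backoff with full jitter: uniform over [0, base * 2^(attempt-1)].
            delay = random.uniform(0, base_delay * 2 ** (attempt - 1))
        sleep(delay)
    raise ValueError("max_attempts must be at least 1")
```

Injecting sleep keeps the helper testable, and full jitter spreads retries from many clients so they do not all hit the service at the same instant.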

What this means for you

  • Order your policy. Put cheap, fast analyzers (YARA, URL, SDP) in early steps and expensive GPU classifiers (prompt-injection, safety) after them, so a termination in an early step skips the GPU work entirely.
  • Lean on Idempotency-Key for write endpoints (policies, rules, keys). The middleware deduplicates retries automatically.
  • Watch X-Request-ID. Both SDKs surface it on every response and every error. Quote it on every support ticket.
  • Pin Agnes-Version. It immunizes you against future minor contract changes. See Versioning.
