# Architecture

> How Agnes is deployed, the major components, and how data flows between them.

This page is the operator-level view of Agnes. It covers the deployed
components, the data plane, and the security boundaries that customers
care about (where their data goes, what's persisted, and what is sent
to external APIs).

If you only need to call the API, you can safely skip this page; the
[Quickstart](/get-started/quickstart) is enough.

***

## Components

Agnes has four runtime components plus a small set of external upstream
APIs.

```mermaid theme={null}
flowchart TB
    subgraph customer["Customer environment"]
        app[Customer application]
        sdk[Python / TypeScript SDK]
    end

    subgraph public["Public surfaces"]
        front["agnes.lasscyber.com<br/>(React dashboard)"]
        api["api.lasscyber.com<br/>(FastAPI)"]
    end

    subgraph internal["Internal — not customer-reachable"]
        model["model_service<br/>(L4 GPU Cloud Run)"]
        db[("Cloud SQL Postgres<br/>+ pgvector")]
    end

    subgraph google["Google Cloud APIs"]
        dlp["Cloud DLP / SDP"]
        nlp["Cloud Natural Language"]
        webrisk["Cloud Web Risk"]
        vertex["Vertex AI Embeddings"]
    end

    auth0["Auth0"]
    stripe["Stripe"]
    sendgrid["SendGrid"]
    sentry["Sentry"]
    statuspage["status.lasscyber.com<br/>(Better Stack)"]

    app --> sdk
    sdk --> api
    front --> api
    front --> auth0
    api --> auth0
    api --> db
    api --> model
    api --> dlp
    api --> nlp
    api --> webrisk
    api --> vertex
    api --> stripe
    api --> sendgrid
    front --> sentry
    api --> sentry
```

### `agnes.lasscyber.com` — frontend dashboard

A React + TypeScript SPA that customers use to manage tenants, users,
roles, API keys, policies, YARA rules, SDP and safety policies, billing,
and analysis history. The dashboard itself authenticates through Auth0,
not with API keys; the bearer-token API keys you mint from this UI are
for programmatic calls only.

The frontend ships a thin in-app help link to this docs site. It does
**not** embed docs.

### `api.lasscyber.com` — API service

The customer-facing FastAPI application. Async-first, deployed to Google
Cloud Run with autoscaling. Every product endpoint lives under
`/api/v1/`. Health endpoints live at `/health`, `/healthz`, and the
mirror under `/api/v1/`.

The API is the only component that talks to:

* the Postgres database,
* the model service,
* Google Cloud DLP / NLP / Web Risk / Vertex,
* Auth0 management,
* Stripe and SendGrid.

### `model_service` — internal inference

A separate Cloud Run service running on **L4 16 GB GPU** instances.
Hosts:

* The prompt-injection / jailbreak BERT classifiers
  (`Llama-Prompt-Guard-2`, `DeBERTa-v3` injection v2, ONNX defenders).
* The safety LLM-as-a-judge (`google/shieldgemma-2b`,
  `google/shieldgemma-9b`, `google/shieldgemma-27b`) via vLLM.

The API service authenticates to the model service with an internal
service account; the model service is not reachable from the public
internet. If you ever see a model-service hostname in a URL, that is a
bug: only the API service calls the model service.

### Postgres + pgvector

A single Cloud SQL Postgres instance backs the API:

* Tenants, users, roles, invitations, audit events.
* API keys (hashed, never the raw value).
* Policies, YARA rules and policies, SDP policies, safety policies.
* Threat intelligence embeddings (`PromptInjection` table, 768-dim
  pgvector column). Queries use cosine similarity.
* Billing artifacts (subscription state, usage rollups, plan catalog).
* Idempotency keys and rate-limit counters.
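pgvector's `<=>` operator returns cosine distance (1 minus cosine similarity), so ordering by `<=>` ascending ranks the closest threat-intel entries first. The following is a minimal in-memory sketch of that ranking in plain Python; the SQL in the comment uses hypothetical `id` and `embedding` column names, since this page does not document the table's columns.

```python
from __future__ import annotations

import math
from typing import Mapping, Sequence

# Roughly what the pgvector lookup could look like (`id` and `embedding`
# are hypothetical column names):
#   SELECT id FROM "PromptInjection" ORDER BY embedding <=> %s LIMIT 3;
# `<=>` is pgvector's cosine *distance*, i.e. 1 - cosine similarity.


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the cosine similarity of two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


def top_k(
    query: Sequence[float], corpus: Mapping[str, Sequence[float]], k: int = 3
) -> list[tuple[str, float]]:
    """Rank stored vectors by similarity to `query`, most similar first."""
    scored = [(key, cosine_similarity(query, vec)) for key, vec in corpus.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]
```

In the real store the vectors are 768-dimensional `text-embedding-004` outputs and the ranking runs inside Postgres; small vectors are enough to exercise the math.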

All tables that hold customer data carry a `tenant_id` column and every
query is scoped by that column. Cross-tenant reads are not possible from
the public API surface.
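One way to get "scoped by query construction" is to route every read through a helper that is created with a tenant ID and bakes `tenant_id = ?` into each statement it issues. The sketch below uses an in-memory SQLite database as a stand-in for Cloud SQL Postgres; the `policies` table, its columns, and the `TenantScopedRepo` class are hypothetical, not Agnes's actual schema or code.

```python
from __future__ import annotations

import sqlite3


class TenantScopedRepo:
    """Hypothetical data-access helper whose queries are always filtered by tenant."""

    def __init__(self, conn: sqlite3.Connection, tenant_id: str) -> None:
        if not tenant_id:
            raise ValueError("tenant_id is required")
        self._conn = conn
        self._tenant_id = tenant_id

    def list_policies(self) -> list[str]:
        # The tenant filter is part of the SQL text, so no caller can omit it.
        rows = self._conn.execute(
            "SELECT name FROM policies WHERE tenant_id = ? ORDER BY name",
            (self._tenant_id,),
        ).fetchall()
        return [name for (name,) in rows]

    def get_policy(self, policy_id: int) -> str | None:
        # Asking for another tenant's row ID returns nothing instead of leaking it.
        row = self._conn.execute(
            "SELECT name FROM policies WHERE id = ? AND tenant_id = ?",
            (policy_id, self._tenant_id),
        ).fetchone()
        return row[0] if row else None


def demo_db() -> sqlite3.Connection:
    """Build a throwaway database with rows for two tenants."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE policies (id INTEGER PRIMARY KEY, tenant_id TEXT NOT NULL, name TEXT NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO policies (id, tenant_id, name) VALUES (?, ?, ?)",
        [(1, "tenant-a", "block-pii"), (2, "tenant-a", "strict"), (3, "tenant-b", "default")],
    )
    return conn
```

Because the repository never exposes an unscoped query, a bug in a handler can at worst return an empty result, not another tenant's data.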

## External upstream APIs

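Most of these are called by the API service (the Google APIs only when
the analyzer that needs them runs); the dashboard also talks to Auth0
and Sentry directly. Each upstream has its own pricing, latency, and
reliability profile. When an analyzer's upstream fails, Agnes surfaces
the failure as `analyzer_unavailable` (HTTP 503) so SDK clients can
retry safely.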

| Upstream                                               | Used for                                                                      |
| ------------------------------------------------------ | ----------------------------------------------------------------------------- |
| **Google Cloud DLP / SDP**                             | Sensitive Data Protection analyzer.                                           |
| **Google Cloud Natural Language**                      | Natural Language analyzer.                                                    |
| **Google Web Risk**                                    | URL Risk analyzer.                                                            |
| **Google Vertex AI Embeddings** (`text-embedding-004`) | Semantic Threat Intelligence analyzer.                                        |
| **Auth0**                                              | Web dashboard authentication; API key auth never touches Auth0.               |
| **Stripe**                                             | Subscription, metered usage, invoices, customer portal.                       |
| **SendGrid**                                           | Transactional email (verification, billing alerts, support tickets).          |
| **Sentry**                                             | Error reporting for operators; customer payloads are never sent.              |
| **Better Stack (status page)**                         | Real-time API health at [status.lasscyber.com](https://status.lasscyber.com). |
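Because upstream failures surface as `analyzer_unavailable` (HTTP 503), a client can retry those responses with backoff. The SDKs' actual retry policy is not described on this page, so treat this as a minimal sketch: `call_with_retry` and its parameters are hypothetical, and `send` stands in for whatever performs the HTTP request. Only 503 is retried, because other errors (for example a 4xx validation failure) would not change on a second attempt.

```python
from __future__ import annotations

import random
import time
from typing import Callable, Tuple

Response = Tuple[int, dict]  # (HTTP status, parsed JSON body)


def call_with_retry(
    send: Callable[[], Response],
    max_attempts: int = 4,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    """Call `send`, retrying only HTTP 503 responses with jittered exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    attempt = 1
    while True:
        status, body = send()
        # Success, non-retryable errors, and the final attempt are returned as-is.
        if status != 503 or attempt >= max_attempts:
            return status, body
        # "Full jitter": wait a random time in [0, base_delay * 2**(attempt - 1)].
        sleep(random.uniform(0, base_delay * 2 ** (attempt - 1)))
        attempt += 1
```

Injecting `sleep` keeps the function testable without real waiting; in production the default `time.sleep` applies.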

## Where customer data goes

Where text passed to `analyze` ends up depends on which analyzers run:

| Analyzer                     | Sends prompt to                                                                                        |
| ---------------------------- | ------------------------------------------------------------------------------------------------------ |
| Prompt Injection & Jailbreak | The internal `model_service`; never leaves Google Cloud.                                               |
| Safety & Responsible AI      | Same — internal model service.                                                                         |
| Sensitive Data Protection    | Google Cloud DLP.                                                                                      |
| Natural Language             | Google Cloud Natural Language.                                                                         |
| URL Risk                     | Extracts URLs locally, then queries Google Web Risk for each URL (URL only, not the surrounding text). |
| YARA                         | Stays in the API service — pure local matching against compiled rules.                                 |
| Semantic Threat Intelligence | Sends the prompt to Vertex AI for embedding, then compares the embedding against the pgvector store.  |
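Of the analyzers that call Google APIs, URL Risk is the only one that sends less than the full text. A minimal sketch of that split, assuming a deliberately simple regex (Agnes's real extraction rules are not documented here): `extract_urls` runs locally, and `lookup` is a hypothetical callable standing in for the Web Risk query, so it only ever receives individual URLs.

```python
from __future__ import annotations

import re
from typing import Callable

# Deliberately simple: an http(s) URL runs until whitespace, a quote, or a bracket.
_URL_RE = re.compile(r"https?://[^\s<>\"'()\[\]]+", re.IGNORECASE)


def extract_urls(text: str) -> list[str]:
    """Return unique URLs in order of first appearance, minus trailing punctuation."""
    unique: dict[str, None] = {}
    for match in _URL_RE.finditer(text):
        unique.setdefault(match.group(0).rstrip(".,;:!?"), None)
    return list(unique)


def check_urls(text: str, lookup: Callable[[str], bool]) -> dict[str, bool]:
    """Query `lookup` once per extracted URL; the surrounding text is never passed to it."""
    return {url: lookup(url) for url in extract_urls(text)}
```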

Sandbox keys (`ak_test_*`) bypass *all* upstream calls; the
[`TestModeStubProvider`](/testing/sandbox-mode) returns deterministic
canned results.
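Because the sandbox decision depends only on the key prefix, it can be made before any analyzer runs. A minimal sketch, assuming a plain prefix check; `StubProvider` is a hypothetical stand-in for `TestModeStubProvider`, and its fixed return value is illustrative, not the real sandbox response shape.

```python
from __future__ import annotations

from typing import Protocol

SANDBOX_PREFIX = "ak_test_"  # documented sandbox key prefix


class Provider(Protocol):
    def analyze(self, text: str) -> dict: ...


class StubProvider:
    """Hypothetical stand-in for TestModeStubProvider: fixed output, no network calls."""

    def analyze(self, text: str) -> dict:
        # Returns the same canned answer for any input (illustrative fields only).
        return {"sandbox": True, "flagged": False}


def is_sandbox_key(api_key: str) -> bool:
    return api_key.startswith(SANDBOX_PREFIX)


def select_provider(api_key: str, live: Provider) -> Provider:
    """Route sandbox keys to the stub so the live provider (and its upstreams) is never called."""
    return StubProvider() if is_sandbox_key(api_key) else live
```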

## Network and tenancy

* **Single region.** Customer-facing services run in one Google
  Cloud region. Multi-region is on the roadmap; ask sales if it
  matters for compliance.
* **TLS everywhere.** Public endpoints use Google-managed certificates.
  Internal calls (API → model\_service, API → Cloud SQL, API → Cloud
  APIs) ride VPC Peering / Private Service Connect where available.
* **No customer egress.** Agnes never calls back into customer
  infrastructure. There are no webhooks: events such as policy
  decisions are pull-based, retrieved from the analyzer log API.
* **Tenant isolation at every layer.** Tenant ID is set by middleware
  from the API key (or JWT), not from request bodies. The database
  layer rejects cross-tenant reads by query construction.
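A minimal sketch of the tenant-resolution step, assuming API keys are stored as SHA-256 digests (this page only says keys are stored hashed, not which algorithm) and arrive as bearer tokens. `hash_key`, `resolve_tenant`, `build_request_context`, and the in-memory key table are hypothetical; the point is that the tenant comes from the key lookup and any `tenant_id` in the request body is discarded.

```python
from __future__ import annotations

import hashlib


def hash_key(raw_key: str) -> str:
    """Digest used for storage and lookup (SHA-256 is an assumption)."""
    return hashlib.sha256(raw_key.encode("utf-8")).hexdigest()


def resolve_tenant(authorization: str, key_table: dict[str, str]) -> str | None:
    """Map an 'Authorization: Bearer <key>' header value to a tenant ID, or None."""
    scheme, _, token = authorization.partition(" ")
    if scheme.lower() != "bearer" or not token.strip():
        return None
    # Only digests are stored, so the lookup is by digest, never by raw key.
    return key_table.get(hash_key(token.strip()))


def build_request_context(headers: dict[str, str], body: dict, key_table: dict[str, str]) -> dict:
    """Attach the tenant derived from the API key; drop any client-supplied tenant_id."""
    tenant_id = resolve_tenant(headers.get("Authorization", ""), key_table)
    if tenant_id is None:
        raise PermissionError("invalid or missing API key")
    payload = {k: v for k, v in body.items() if k != "tenant_id"}
    return {"tenant_id": tenant_id, "payload": payload}
```

A fast digest is a common choice for long, randomly generated keys because it permits direct lookup by digest; whether Agnes uses this exact scheme is not stated here.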

## Deployment surface

Agnes's source repository keeps its Docker, Terraform, and Cloud Run
configuration in `infrastructure/`. Customers do not deploy Agnes
themselves; this section exists for operator-facing audits and security
reviews. Reach out to [`security@lasscyber.com`](mailto:security@lasscyber.com)
if you need a SOC 2-style packet, signed pentest report, or DPA.

## What is *not* deployed

A few things are intentionally absent so the threat surface stays small:

* **No customer model hosting.** Agnes does not run your model. Bring
  your own LLM (OpenAI, Anthropic, Google, self-hosted, …). Agnes's
  optional [OpenAI integration](/sdks/python#openai-drop-in) is a
  client-side wrapper, not a hosted endpoint.
* **No long-lived plaintext storage** of analyzed prompts. Structured
  decision metadata is logged; the raw prompt is not persisted unless
  you explicitly ingest it into the threat-intel store via
  [Workbench](/threat-analysis/analysis-logs).
* **No third-party data brokers.** Threat intel comes from public
  research datasets and customer-supplied data, never from resold
  third-party feeds.
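The split between logged decision metadata and the unpersisted prompt can be pictured as a record type that simply has no field for the text. Every name below is hypothetical; this page does not document the actual decision-log schema.

```python
from __future__ import annotations

from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class DecisionLogRecord:
    """Hypothetical persisted record: decision metadata only, no prompt field."""

    tenant_id: str
    analyzers: tuple[str, ...]
    decision: str  # illustrative values such as "allow" or "block"
    latency_ms: int


def to_log_record(
    tenant_id: str, prompt: str, decision: str, analyzers: list[str], latency_ms: int
) -> dict:
    """Build the record to persist after an analysis; `prompt` is deliberately dropped."""
    del prompt  # analyzed in memory, never copied into the record
    return asdict(DecisionLogRecord(tenant_id, tuple(analyzers), decision, latency_ms))
```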

## Next

* [How Agnes works](/concepts/how-agnes-works) — request lifecycle.
* [Combined analyzer](/concepts/combined-analyzer) — the hero endpoint.
* [Errors](/errors/overview) — the canonical error envelope and what
  503s mean.
