Architecture

This page is the operator-level view of Agnes. It covers the deployed components, the data plane, and the security boundaries that customers care about (where their data goes, what’s persisted, and what crosses external API boundaries). If you only need to call the API, you can safely skip this page; the Quickstart is enough.

Components

Agnes has four runtime components plus a small set of external upstream APIs.

`agnes.lasscyber.com` — frontend dashboard

A React + TypeScript SPA that customers use to manage tenants, users, roles, API keys, policies, YARA rules, SDP and safety policies, billing, and analysis history. It authenticates via Auth0 and never holds API keys; all programmatic calls go through bearer-token API keys minted from this UI. The frontend ships a thin in-app help link to this docs site. It does not embed docs.

`api.lasscyber.com` — API service

The customer-facing FastAPI application. It is async-first and deployed to Google Cloud Run with autoscaling. Every product endpoint lives under /api/v1/. Health endpoints live at /health and /healthz and are mirrored under /api/v1/. The API is the only component that talks to:

the Postgres database,
the model service,
Google Cloud DLP / NLP / Web Risk / Vertex,
Auth0 management,
Stripe and SendGrid.
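
As a rough illustration of the URL layout and bearer-token authentication described above, the sketch below builds (but does not send) a request with Python's standard library. The `product_url` and `build_request` helpers, the `health` path under the prefix, and the placeholder key are assumptions for illustration, not part of any Agnes SDK.

```python
from urllib.parse import urljoin
from urllib.request import Request

BASE_URL = "https://api.lasscyber.com"
API_PREFIX = "/api/v1/"  # every product endpoint lives under this prefix


def product_url(path: str) -> str:
    """Join a relative path onto the /api/v1/ prefix."""
    return urljoin(BASE_URL + API_PREFIX, path.lstrip("/"))


def build_request(path: str, api_key: str) -> Request:
    """Build (but do not send) a GET request carrying a bearer-token API key."""
    return Request(
        product_url(path),
        headers={"Authorization": f"Bearer {api_key}"},
        method="GET",
    )


# Assumed path and placeholder key, for illustration only.
req = build_request("health", "ak_test_example")
print(req.full_url)
```

Keeping the prefix in one constant means a client never assembles a product URL by hand, which makes it harder to call a path outside /api/v1/ by accident.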

`model_service` — internal inference

A separate Cloud Run service running on L4 16 GB GPU instances. Hosts:

The prompt-injection / jailbreak BERT classifiers (Llama-Prompt-Guard-2, DeBERTa-v3 injection v2, ONNX defenders).
The safety LLM-as-a-judge (google/shieldgemma-2b, google/shieldgemma-9b, google/shieldgemma-27b) via vLLM.

The API service authenticates to the model service with an internal service account; the model service is not reachable from the public internet. If you ever see a model-service hostname in a URL, that is a bug: only the API service proxies requests to it.

Postgres + pgvector

A single Cloud SQL Postgres instance backs the API:

Tenants, users, roles, invitations, audit events.
API keys (hashed, never the raw value).
Policies, YARA rules and policies, SDP policies, safety policies.
Threat intelligence embeddings (PromptInjection table, 768-dim pgvector column). Queries use cosine similarity.
Billing artifacts (subscription state, usage rollups, plan catalog).
Idempotency keys and rate-limit counters.
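
The threat intelligence entry above mentions cosine-similarity queries. The following is a minimal pure-Python sketch of that ranking on toy three-dimensional vectors; the real column is 768-dimensional and the comparison runs inside Postgres, so the helper names and sample data here are purely illustrative.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors of equal dimension."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as matching nothing
    return dot / (norm_a * norm_b)


def top_matches(
    query: Sequence[float], stored: Dict[str, Sequence[float]], k: int = 3
) -> List[Tuple[str, float]]:
    """Return the k stored entries most similar to the query, best first."""
    scored = [(label, cosine_similarity(query, vec)) for label, vec in stored.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:k]


# Toy data standing in for stored threat embeddings (hypothetical labels).
stored = {
    "known-injection": [1.0, 0.0, 0.0],
    "benign-sample": [0.0, 1.0, 0.0],
}
print(top_matches([0.9, 0.1, 0.0], stored, k=1))
```

In pgvector, the `<=>` operator returns cosine distance, which is 1 minus the similarity computed here, so a smaller distance means a closer match.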

All tables that hold customer data carry a tenant_id column and every query is scoped by that column. Cross-tenant reads are not possible from the public API surface.
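
To make the tenant-scoping rule concrete, here is a small sketch that uses an in-memory SQLite database as a stand-in for Cloud SQL Postgres. The `policies` table layout and the `list_policies` helper are hypothetical; the point is only that the data-access function requires a tenant ID and always binds it into the WHERE clause.

```python
import sqlite3

# In-memory SQLite stands in for Postgres; the schema is illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE policies ("
    "id INTEGER PRIMARY KEY, tenant_id TEXT NOT NULL, name TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO policies (tenant_id, name) VALUES (?, ?)",
    [
        ("tenant-a", "block-pii"),
        ("tenant-a", "strict-safety"),
        ("tenant-b", "default"),
    ],
)


def list_policies(db: sqlite3.Connection, tenant_id: str) -> list:
    """Return policy names for exactly one tenant; the filter is not optional."""
    rows = db.execute(
        "SELECT name FROM policies WHERE tenant_id = ? ORDER BY name",
        (tenant_id,),
    )
    return [name for (name,) in rows]


print(list_policies(conn, "tenant-a"))
print(list_policies(conn, "tenant-b"))
```

Because callers cannot reach the table except through a function that takes `tenant_id`, a query without the tenant filter cannot be expressed by accident.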

External upstream APIs

These are called by the API service when an analyzer needs them. Each has its own pricing, latency, and reliability profile. Agnes surfaces upstream failures as analyzer_unavailable (HTTP 503) so that SDK clients can retry safely.
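
A client that wants to retry on analyzer_unavailable could follow a pattern like the sketch below: exponential backoff with jitter, retrying only on HTTP 503. The `call_with_retry` helper, the `send` callable, and the response body shape are assumptions made for illustration; they are not the behavior of any official SDK.

```python
import random
import time
from typing import Any, Callable, Tuple

RETRYABLE_STATUS = 503  # surfaced by Agnes as analyzer_unavailable


def call_with_retry(
    send: Callable[[], Tuple[int, Any]],
    max_attempts: int = 4,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, Any]:
    """Call send() until it returns a non-503 status or attempts run out."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    attempt = 0
    while True:
        status, body = send()
        attempt += 1
        if status != RETRYABLE_STATUS or attempt >= max_attempts:
            return status, body
        # Exponential backoff with full jitter: 0..base, 0..2*base, 0..4*base, ...
        sleep(random.uniform(0, base_delay * 2 ** (attempt - 1)))


# Simulated upstream: two failures, then success (hypothetical body shapes).
_responses = iter(
    [
        (503, {"error": "analyzer_unavailable"}),
        (503, {"error": "analyzer_unavailable"}),
        (200, {"ok": True}),
    ]
)
status, body = call_with_retry(lambda: next(_responses), sleep=lambda _: None)
print(status, body)
```

Only 503 is retried here; other error statuses are returned to the caller immediately, since retrying them would not help.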

Upstream	Used by
Google Cloud DLP / SDP	Sensitive Data Protection analyzer.
Google Cloud Natural Language	Natural Language analyzer.
Google Web Risk	URL Risk analyzer.
Google Vertex AI Embeddings (`text-embedding-004`)	Semantic Threat Intelligence analyzer.
Auth0	Web dashboard authentication; API key auth never touches Auth0.
Stripe	Subscription, metered usage, invoices, customer portal.
SendGrid	Transactional email (verification, billing alerts, support tickets).
Sentry	Error reporting (only to operators; never customer payloads).
Better Stack (status page)	Real-time API health at status.lasscyber.com.

Where customer data goes

The destination of text passed to analyze depends on which analyzers run:

Analyzer	Sends prompt to
Prompt Injection & Jailbreak	The internal `model_service`; never leaves Google Cloud.
Safety & Responsible AI	Same — internal model service.
Sensitive Data Protection	Google Cloud DLP.
Natural Language	Google Cloud Natural Language.
URL Risk	Extracts URLs locally, then queries Google Web Risk for each URL (URL only, not the surrounding text).
YARA	Stays in the API service — pure local matching against compiled rules.
Semantic Threat Intelligence	Sends the prompt to Vertex AI for embedding, compares against pgvector locally.

Sandbox keys (ak_test_*) bypass all upstream calls; the TestModeStubProvider returns deterministic canned results.
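
For a data-flow review, the table and the sandbox rule above can be encoded as a small lookup. The sketch below does this; the analyzer identifiers and the `external_destinations` helper are hypothetical names chosen for the example, and `None` marks analyzers whose input stays inside the Agnes deployment.

```python
from typing import Dict, Iterable, Optional, Set

# External service that receives prompt-derived data, per analyzer.
# None means the data stays inside the Agnes deployment.
EXTERNAL_DESTINATION: Dict[str, Optional[str]] = {
    "prompt_injection": None,        # internal model_service
    "safety": None,                  # internal model_service
    "sensitive_data_protection": "Google Cloud DLP",
    "natural_language": "Google Cloud Natural Language",
    "url_risk": "Google Web Risk",   # URLs only, not the surrounding text
    "yara": None,                    # local matching in the API service
    "semantic_threat_intel": "Vertex AI Embeddings",
}

SANDBOX_PREFIX = "ak_test_"


def external_destinations(analyzers: Iterable[str], api_key: str) -> Set[str]:
    """Return the external services that would receive data for this call."""
    if api_key.startswith(SANDBOX_PREFIX):
        return set()  # sandbox keys bypass all upstream calls
    found = {EXTERNAL_DESTINATION[name] for name in analyzers}
    found.discard(None)
    return found


print(external_destinations(["yara", "url_risk"], "example-production-key"))
print(external_destinations(["yara", "url_risk"], "ak_test_example"))
```

An unknown analyzer name raises `KeyError` rather than being treated as local, which is the safer default for a data-flow check.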

Network and tenancy

Single region. Customer-facing services run in one Google Cloud region. Multi-region is on the roadmap; ask sales if it matters for your compliance requirements.
TLS everywhere. Public endpoints use Google-managed certificates. Internal calls (API → model_service, API → Cloud SQL, API → Cloud APIs) ride VPC Peering / Private Service Connect where available.
No customer egress. Agnes never calls back into customer infrastructure. Events that would otherwise be delivered by webhook (e.g. policy decision events) are pull-based: you fetch them from the analyzer log API.
Tenant isolation at every layer. Tenant ID is set by middleware from the API key (or JWT), not from request bodies. The database layer rejects cross-tenant reads by query construction.
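
The tenant-isolation rule above can be sketched as follows: the tenant is looked up from the hash of the presented API key, and any `tenant_id` in the request body is ignored. This is a minimal illustration, not the actual middleware; the SHA-256 hashing scheme, the in-memory `KEY_STORE`, and the handler shape are all assumptions.

```python
import hashlib
from typing import Any, Dict, Optional


def hash_key(raw_key: str) -> str:
    """Hash an API key for storage and lookup (SHA-256 is an assumed scheme)."""
    return hashlib.sha256(raw_key.encode("utf-8")).hexdigest()


# Hypothetical key store: key hash -> tenant ID. Raw keys are never stored.
KEY_STORE: Dict[str, str] = {
    hash_key("ak_test_alpha"): "tenant-a",
    hash_key("ak_test_beta"): "tenant-b",
}


def resolve_tenant(authorization_header: str) -> Optional[str]:
    """Derive the tenant from a 'Bearer <key>' header, or None if invalid."""
    scheme, _, raw_key = authorization_header.partition(" ")
    raw_key = raw_key.strip()
    if scheme.lower() != "bearer" or not raw_key:
        return None
    return KEY_STORE.get(hash_key(raw_key))


def handle(authorization_header: str, body: Dict[str, Any]) -> Dict[str, Any]:
    """Toy request handler: the tenant comes from the key, never from the body."""
    tenant_id = resolve_tenant(authorization_header)
    if tenant_id is None:
        return {"status": 401}
    # Any tenant_id supplied in the body is deliberately ignored.
    return {"status": 200, "tenant_id": tenant_id, "text": body.get("text")}


# A caller holding tenant-a's key cannot claim to be tenant-b.
print(handle("Bearer ak_test_alpha", {"tenant_id": "tenant-b", "text": "hi"}))
```

Resolving the tenant before any handler runs means downstream code only ever sees a tenant ID that was derived from credentials.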

Deployment surface

The repo’s infrastructure/ directory holds Docker, Terraform, and Cloud Run config. Customers do not deploy Agnes themselves; this is documented for operator-facing audits and security reviews. Reach out to security@lasscyber.com if you need a SOC 2-style packet, signed pentest report, or DPA.

What is not deployed

A few things are intentionally absent so the threat surface stays small:

No customer model hosting. Agnes does not run your model. Bring your own LLM (OpenAI, Anthropic, Google, self-hosted, …). Agnes’s optional OpenAI integration is a client-side wrapper, not a hosted endpoint.
No long-lived plaintext storage of analyzed prompts. Structured decision metadata is logged; the raw prompt is not persisted unless you explicitly ingest it into the threat-intel store via Workbench.
No third-party data brokers. Threat intel comes from public research datasets and customer-supplied data, never resold third-party feeds.

Next

How Agnes works — request lifecycle.
Combined analyzer — the hero endpoint.
Errors — the canonical error envelope and what 503s mean.
