Agnes enforces per-tenant rate limits on every authenticated route. The limit applied depends on the route group: analyzer endpoints have their own budget, and administrative endpoints have a separate one.

What you’ll see

Every successful response carries the current window state:
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 87
X-RateLimit-Reset: 1762000060
| Header | Meaning |
| --- | --- |
| X-RateLimit-Limit | Requests allowed in the current window. |
| X-RateLimit-Remaining | Requests still available before the window closes. |
| X-RateLimit-Reset | Unix timestamp when the window resets. |
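As an illustration of reading these headers, here is a minimal Python sketch that turns them into a small record and computes how long the current window has left. The names `RateLimitState` and `parse_rate_limit` are hypothetical helpers made up for this example, not part of any Agnes SDK, and the sketch assumes all three headers are present.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class RateLimitState:
    limit: int      # X-RateLimit-Limit
    remaining: int  # X-RateLimit-Remaining
    reset_at: int   # X-RateLimit-Reset (Unix seconds)

    def seconds_until_reset(self, now: float) -> float:
        """Seconds left in the current window, never negative."""
        return max(0.0, float(self.reset_at) - now)


def parse_rate_limit(headers: Mapping[str, str]) -> RateLimitState:
    """Build a RateLimitState from response headers (case-insensitive lookup)."""
    lowered = {name.lower(): value for name, value in headers.items()}
    return RateLimitState(
        limit=int(lowered["x-ratelimit-limit"]),
        remaining=int(lowered["x-ratelimit-remaining"]),
        reset_at=int(lowered["x-ratelimit-reset"]),
    )


# Header values taken from the example response above.
state = parse_rate_limit({
    "X-RateLimit-Limit": "100",
    "X-RateLimit-Remaining": "87",
    "X-RateLimit-Reset": "1762000060",
})
print(state.remaining, state.seconds_until_reset(now=1762000030.0))
```

In real code `now` would typically come from `time.time()`; it is passed in explicitly here so the example is deterministic.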
When the budget hits zero, the next request returns:
HTTP/1.1 429 Too Many Requests
Retry-After: 30
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1762000060

{
  "detail": "Rate limit exceeded for analysis endpoints. Limit: 100/minute.",
  "code": "rate_limit_exceeded",
  "request_id": "...",
  "doc_url": "https://docs.lasscyber.com/errors/rate_limit_exceeded"
}
Retry-After is the number of seconds to wait before retrying. The SDKs honour it automatically.
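If you handle responses without an SDK, the 429 response can be unpacked roughly as follows. This is a sketch only: `RateLimited` and `parse_rate_limited` are hypothetical names invented for the example, and it assumes Retry-After is always an integer number of seconds, as described above.

```python
import json
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class RateLimited:
    retry_after: int           # seconds to wait, from the Retry-After header
    detail: str                # human-readable message from the body
    code: str                  # machine-readable error code from the body
    doc_url: Optional[str]     # link to the error documentation, if present


def parse_rate_limited(
    status: int, headers: Mapping[str, str], body: str
) -> Optional[RateLimited]:
    """Return a RateLimited record for a 429 response, or None otherwise."""
    if status != 429:
        return None
    lowered = {name.lower(): value for name, value in headers.items()}
    payload = json.loads(body)
    return RateLimited(
        retry_after=int(lowered["retry-after"]),
        detail=payload["detail"],
        code=payload["code"],
        doc_url=payload.get("doc_url"),
    )


# Headers and body copied from the example 429 response above.
example_body = json.dumps({
    "detail": "Rate limit exceeded for analysis endpoints. Limit: 100/minute.",
    "code": "rate_limit_exceeded",
    "request_id": "...",
    "doc_url": "https://docs.lasscyber.com/errors/rate_limit_exceeded",
})
info = parse_rate_limited(429, {"Retry-After": "30"}, example_body)
print(info.retry_after, info.code)
```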

Default limits

Limits depend on plan tier and route group. The headers are the authoritative source — the table below is a snapshot at the time of writing.
| Route family | Trial | Starter | Professional | Enterprise |
| --- | --- | --- | --- | --- |
| POST /api/v1/analyze/ and analyzer endpoints | 30 / min | 100 / min | 500 / min | 2,000 / min (custom available) |
| Administrative (policies, keys, tenants) | 30 / min | 60 / min | 120 / min | 300 / min |
| Analyzer logs read | 30 / min | 60 / min | 120 / min | 300 / min |
These numbers are per-tenant, not per-key. Two keys in the same tenant share a single budget. If you need a higher ceiling on a paid plan, email sales@lasscyber.com.
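Because the budget is shared, a client that sends traffic with several keys may want one local counter for the whole tenant rather than one per key. The sketch below is one possible approach, assuming a fixed one-minute window; the server's exact windowing algorithm is not specified here, and `TenantBudget` is a hypothetical class written for this example.

```python
import threading
from typing import Optional


class TenantBudget:
    """Client-side fixed-window counter shared by every key in one tenant."""

    def __init__(self, limit: int, window_seconds: float = 60.0) -> None:
        self.limit = limit
        self.window_seconds = window_seconds
        self._window_start: Optional[float] = None
        self._used = 0
        self._lock = threading.Lock()

    def try_acquire(self, now: float) -> bool:
        """Reserve one request slot; return False if the window is full."""
        with self._lock:
            expired = (
                self._window_start is None
                or now - self._window_start >= self.window_seconds
            )
            if expired:
                # Start a fresh window and reset the counter.
                self._window_start = now
                self._used = 0
            if self._used >= self.limit:
                return False
            self._used += 1
            return True


# Two keys draw from the same budget, so the third request is refused
# regardless of which key sends it.
budget = TenantBudget(limit=2)
print(budget.try_acquire(now=0.0))   # request sent with key A
print(budget.try_acquire(now=1.0))   # request sent with key B
print(budget.try_acquire(now=2.0))   # either key: window is full
```

The response headers remain the authoritative source; a local counter like this only helps avoid sending requests that are likely to be rejected.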

How SDKs handle it

Both Python and TypeScript SDKs:
  1. Detect a 429 response.
  2. Parse the Retry-After header.
  3. Sleep for that many seconds, plus a small jitter.
  4. Retry the request.
  5. Surface a RateLimitError (the class has the same name in the Python and TypeScript SDKs) if retries are exhausted.
You can disable automatic retries when constructing the client if you want to handle them yourself.
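If you disable the built-in retries, the steps above can be approximated with a loop like the one below. This is a sketch of the general pattern, not the SDKs' actual implementation: `call_with_retries`, its parameters, and the `(status, headers)` response shape are assumptions for illustration, and `RateLimitError` is redefined locally so the example runs on its own.

```python
import random
import time
from typing import Callable, Mapping, Tuple

Response = Tuple[int, Mapping[str, str]]  # (status code, headers)


class RateLimitError(Exception):
    """Local stand-in for the SDK error of the same name."""


def call_with_retries(
    send: Callable[[], Response],
    max_retries: int = 3,
    max_jitter: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    """Call send(); on 429, wait Retry-After plus jitter, then try again."""
    for attempt in range(max_retries + 1):
        status, headers = send()
        if status != 429:
            return status, headers
        if attempt == max_retries:
            break  # retries exhausted, fall through to the error
        # Assumes the header key is spelled exactly "Retry-After".
        delay = float(headers["Retry-After"])
        sleep(delay + random.uniform(0.0, max_jitter))
    raise RateLimitError(f"still rate limited after {max_retries} retries")


# Simulated transport: one 429, then a success. Sleeps are recorded
# instead of performed so the example finishes immediately.
responses = iter([(429, {"Retry-After": "2"}), (200, {})])
slept = []
result = call_with_retries(lambda: next(responses), sleep=slept.append)
print(result[0], len(slept))
```

Injecting `sleep` keeps the loop testable; production code would leave the default `time.sleep` in place.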

Token quotas vs request rate limits

Two independent meters apply to your traffic:
  • Request rate limits — covered on this page. Reset every minute.
  • Monthly token quotas — the included_tokens_monthly from your plan. Once exhausted, on-demand tokens (paid Stripe meter) kick in on tiers that support it. See Billing.
The 429 response covers rate limits only. Hitting the monthly quota without on-demand enabled produces a 402 with code: "billing_required".
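Since the two meters fail with different status codes, client code can branch on them before deciding whether waiting will help. The sketch below assumes only the status codes and `code` values quoted on this page; the function name and the returned labels are hypothetical.

```python
from typing import Optional


def classify_failure(status: int, code: Optional[str]) -> str:
    """Map a failed response to a coarse action label."""
    if status == 429 and code == "rate_limit_exceeded":
        # Request rate limit: waiting for Retry-After will resolve it.
        return "wait_and_retry"
    if status == 402 and code == "billing_required":
        # Monthly token quota exhausted: retrying will not help.
        return "fix_billing"
    return "other"


print(classify_failure(429, "rate_limit_exceeded"))
print(classify_failure(402, "billing_required"))
```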

Best practices

  • Honour Retry-After. Custom retry logic that ignores it will hammer the API and slow you down.
  • Prefer batching where the API supports it. POST /api/v1/analyze/ is single-prompt only by design (so each analysis carries a request ID and a clean termination); for log queries and policy reads, prefer the bulk endpoints.
  • Distribute load across keys for noisy-neighbour isolation. Two keys in the same tenant share a budget, so this only helps you throttle different workloads against each other; it does not bypass the limit.
  • Watch the headers. Charting X-RateLimit-Remaining over time is the easiest way to tell if you are approaching the ceiling.
