Agnes enforces per-tenant rate limits on every authenticated route. The limit applied depends on the route group: analyzer endpoints have their own budget, and administrative endpoints have a separate one.
## What you’ll see
Every successful response carries the current window state:

| Header | Meaning |
|---|---|
| `X-RateLimit-Limit` | Requests allowed in the current window. |
| `X-RateLimit-Remaining` | Requests still available before the window closes. |
| `X-RateLimit-Reset` | Unix timestamp when the window resets. |
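As a minimal sketch, the window state can be read straight off any response's headers. The header names come from the table above; the dict-style header access is an assumption about your HTTP client (it matches e.g. `requests`):

```python
from dataclasses import dataclass


@dataclass
class RateLimitWindow:
    limit: int      # requests allowed in the current window
    remaining: int  # requests still available
    reset: int      # Unix timestamp when the window resets


def parse_rate_limit(headers: dict) -> RateLimitWindow:
    """Extract the rate-limit window state from response headers."""
    return RateLimitWindow(
        limit=int(headers["X-RateLimit-Limit"]),
        remaining=int(headers["X-RateLimit-Remaining"]),
        reset=int(headers["X-RateLimit-Reset"]),
    )


# Example header values (illustrative, not real API output):
window = parse_rate_limit({
    "X-RateLimit-Limit": "100",
    "X-RateLimit-Remaining": "42",
    "X-RateLimit-Reset": "1735689600",
})
```

Parsing once into a small value object keeps the rest of your client code free of raw header strings.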
On a `429`, the `Retry-After` header gives the number of seconds to wait before retrying. The SDKs honour it automatically.
## Default limits
Limits depend on plan tier and route group. The headers are the authoritative source — the table below is a snapshot at the time of writing.

| Route family | Trial | Starter | Professional | Enterprise |
|---|---|---|---|---|
| `POST /api/v1/analyze/` and analyzer endpoints | 30 / min | 100 / min | 500 / min | 2,000 / min (custom available) |
| Administrative (policies, keys, tenants) | 30 / min | 60 / min | 120 / min | 300 / min |
| Analyzer logs read | 30 / min | 60 / min | 120 / min | 300 / min |
For custom Enterprise limits, contact sales@lasscyber.com.
## How SDKs handle it
Both Python and TypeScript SDKs:

- Detect a `429` response.
- Parse the `Retry-After` header.
- Sleep for that many seconds.
- Retry the request, with a small jitter on top.
- Surface a `RateLimitError` (Python) / `RateLimitError` (TypeScript) if retries are exhausted.
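The steps above can be sketched as follows. `RateLimitError` is named after the SDK exception; the zero-argument `send` callable and its response shape (`.status_code`, `.headers`) are placeholders for whatever HTTP client you actually use:

```python
import random
import time


class RateLimitError(Exception):
    """Raised when retries are exhausted, mirroring the SDK exception."""


def send_with_retries(send, max_retries=3, sleep=time.sleep):
    """Call send() and retry on 429, honouring Retry-After plus jitter.

    `send` is any zero-argument callable returning an object with
    `.status_code` and `.headers` (a stand-in for your HTTP client).
    """
    for attempt in range(max_retries + 1):
        response = send()
        if response.status_code != 429:
            return response
        if attempt == max_retries:
            break
        # Honour Retry-After (default 1s if absent), plus a small jitter.
        wait = int(response.headers.get("Retry-After", 1))
        sleep(wait + random.uniform(0, 0.5))
    raise RateLimitError("rate limit retries exhausted")
```

Injecting `sleep` as a parameter keeps the retry loop testable without real delays.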
## Token quotas vs request rate limits
Two independent meters apply to your traffic:

- Request rate limits — covered on this page. Reset every minute.
- Monthly token quotas — the `included_tokens_monthly` from your plan. Once exhausted, on-demand tokens (paid Stripe meter) kick in on tiers that support it; tiers without on-demand billing instead receive a `402` with `code: "billing_required"`. See Billing.
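A sketch of telling the two meters apart when handling an error response. The status codes and the `billing_required` code are from this page; the parsed-JSON `body` dict is an assumption about your client:

```python
def classify_throttle(status_code: int, body: dict) -> str:
    """Distinguish the two meters from an error response.

    `body` is the parsed JSON error payload (shape assumed, not
    documented here beyond the `code` field).
    """
    if status_code == 429:
        # Per-minute request budget exhausted; retry after Retry-After.
        return "rate_limited"
    if status_code == 402 and body.get("code") == "billing_required":
        # Monthly token quota used up on a tier without on-demand tokens.
        return "quota_exhausted"
    return "other"
```

The distinction matters because a `429` resolves itself within a minute, while a `402` needs a plan or billing change.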
## Best practices
- Honour `Retry-After`. Custom retry logic that ignores it will hammer the API and slow you down.
- Prefer batching where the API supports it. `POST /api/v1/analyze/` is single-prompt only by design (so each analysis carries a request ID and a clean termination); for log queries and policy reads, prefer the bulk endpoints.
- Distribute load across keys for noisy-neighbour isolation. Two keys in the same tenant share a budget, so this only helps you throttle different workloads against each other rather than bypassing the limit.
- Watch the headers. Charting `X-RateLimit-Remaining` over time is the easiest way to tell if you are approaching the ceiling.
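A minimal sketch of the last point: compute your remaining headroom from the headers and flag when it drops below a threshold. The 20% warning threshold is an arbitrary example, not an API value:

```python
def headroom(headers: dict, warn_fraction: float = 0.2):
    """Return (remaining/limit, below_threshold) for one response.

    `warn_fraction` is an illustrative threshold you would tune
    yourself; the API defines no such value.
    """
    limit = int(headers["X-RateLimit-Limit"])
    remaining = int(headers["X-RateLimit-Remaining"])
    fraction = remaining / limit if limit else 0.0
    return fraction, fraction < warn_fraction


# Example: 10 of 100 requests left in the window -> below the 20% line.
frac, low = headroom({
    "X-RateLimit-Limit": "100",
    "X-RateLimit-Remaining": "10",
})
```

Emitting this fraction to your metrics system on every response gives you the chart this section recommends.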
## Next
- Errors → `rate_limit_exceeded`
- Idempotency — make retries safe on write endpoints.
- Billing — token quotas and on-demand rates.