Agnes enforces per-tenant rate limits on every authenticated route. The limit applied depends on the route group: analyzer endpoints have their own budget, and administrative endpoints have a separate one.
## What you’ll see
Every successful response carries the current window state:

| Header | Meaning |
|---|---|
| `X-RateLimit-Limit` | Requests allowed in the current window. |
| `X-RateLimit-Remaining` | Requests still available before the window closes. |
| `X-RateLimit-Reset` | Unix timestamp when the window resets. |
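As a minimal sketch, the window state can be read straight off any response's headers. The header names come from the table above; the dict-style header access is an assumption about your HTTP client (it matches e.g. `requests`):

```python
from dataclasses import dataclass


@dataclass
class RateLimitWindow:
    limit: int      # requests allowed in the current window
    remaining: int  # requests still available
    reset: int      # Unix timestamp when the window resets


def parse_rate_limit(headers: dict) -> RateLimitWindow:
    """Extract the rate-limit window state from response headers."""
    return RateLimitWindow(
        limit=int(headers["X-RateLimit-Limit"]),
        remaining=int(headers["X-RateLimit-Remaining"]),
        reset=int(headers["X-RateLimit-Reset"]),
    )


# Example header values (illustrative, not real API output):
window = parse_rate_limit({
    "X-RateLimit-Limit": "100",
    "X-RateLimit-Remaining": "42",
    "X-RateLimit-Reset": "1735689600",
})
```

Parsing once into a small value object keeps the rest of your client code free of raw header strings.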
On a `429`, the `Retry-After` header gives the number of seconds to wait before retrying. The SDKs honour it automatically.
## Default limits
Limits depend on plan tier and route group. The headers are the authoritative source — the table below is a snapshot at the time of writing.

| Route family | Trial | Starter | Professional | Enterprise |
|---|---|---|---|---|
| `POST /api/v1/analyze/` and analyzer endpoints | 30 / min | 100 / min | 500 / min | 2,000 / min (custom available) |
| Administrative (policies, keys, tenants) | 30 / min | 60 / min | 120 / min | 300 / min |
| Analyzer logs read | 30 / min | 60 / min | 120 / min | 300 / min |
For custom Enterprise limits, contact sales@lasscyber.com.
## How SDKs handle it
Both Python and TypeScript SDKs:

- Detect a `429` response.
- Parse the `Retry-After` header.
- Sleep for that many seconds.
- Retry the request, with a small jitter on top.
- Surface a `RateLimitError` (Python) / `RateLimitError` (TypeScript) if retries are exhausted.
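The steps above can be sketched as follows. `RateLimitError` is named after the SDK exception; the zero-argument `send` callable and its response shape (`.status_code`, `.headers`) are placeholders for whatever HTTP client you actually use:

```python
import random
import time


class RateLimitError(Exception):
    """Raised when retries are exhausted, mirroring the SDK exception."""


def send_with_retries(send, max_retries=3, sleep=time.sleep):
    """Call send() and retry on 429, honouring Retry-After plus jitter.

    `send` is any zero-argument callable returning an object with
    `.status_code` and `.headers` (a stand-in for your HTTP client).
    """
    for attempt in range(max_retries + 1):
        response = send()
        if response.status_code != 429:
            return response
        if attempt == max_retries:
            break
        # Honour Retry-After (default 1s if absent), plus a small jitter.
        wait = int(response.headers.get("Retry-After", 1))
        sleep(wait + random.uniform(0, 0.5))
    raise RateLimitError("rate limit retries exhausted")
```

Injecting `sleep` as a parameter keeps the retry loop testable without real delays.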
## Token quotas vs request rate limits
Two independent meters apply to your traffic:

- Request rate limits — covered on this page. Reset every minute.
- Monthly token quotas — the `included_tokens_monthly` from your plan. Once exhausted, on-demand tokens (paid Stripe meter) kick in on tiers that support it; tiers without on-demand billing instead receive a `402` with `code: "billing_required"`. See Billing.
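A sketch of telling the two meters apart when handling an error response. The status codes and the `billing_required` code are from this page; the parsed-JSON `body` dict is an assumption about your client:

```python
def classify_throttle(status_code: int, body: dict) -> str:
    """Distinguish the two meters from an error response.

    `body` is the parsed JSON error payload (shape assumed, not
    documented here beyond the `code` field).
    """
    if status_code == 429:
        # Per-minute request budget exhausted; retry after Retry-After.
        return "rate_limited"
    if status_code == 402 and body.get("code") == "billing_required":
        # Monthly token quota used up on a tier without on-demand tokens.
        return "quota_exhausted"
    return "other"
```

The distinction matters because a `429` resolves itself within a minute, while a `402` needs a plan or billing change.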
## Best practices
- Honour `Retry-After`. Custom retry logic that ignores it will hammer the API and slow you down.
- Prefer batching where the API supports it. `POST /api/v1/analyze/` is single-prompt only by design (so each analysis carries a request ID and a clean termination); for log queries and policy reads, prefer the bulk endpoints.
- Distribute load across keys for noisy-neighbour isolation. Two keys in the same tenant share a budget, so this only helps you throttle different workloads against each other rather than bypassing the limit.
- Watch the headers. Charting `X-RateLimit-Remaining` over time is the easiest way to tell if you are approaching the ceiling.
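A minimal sketch of the last point: compute your remaining headroom from the headers and flag when it drops below a threshold. The 20% warning threshold is an arbitrary example, not an API value:

```python
def headroom(headers: dict, warn_fraction: float = 0.2):
    """Return (remaining/limit, below_threshold) for one response.

    `warn_fraction` is an illustrative threshold you would tune
    yourself; the API defines no such value.
    """
    limit = int(headers["X-RateLimit-Limit"])
    remaining = int(headers["X-RateLimit-Remaining"])
    fraction = remaining / limit if limit else 0.0
    return fraction, fraction < warn_fraction


# Example: 10 of 100 requests left in the window -> below the 20% line.
frac, low = headroom({
    "X-RateLimit-Limit": "100",
    "X-RateLimit-Remaining": "10",
})
```

Emitting this fraction to your metrics system on every response gives you the chart this section recommends.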
## Next
- Errors → `rate_limit_exceeded`
- Idempotency — make retries safe on write endpoints.
- Billing — token quotas and on-demand rates.