# Rate limits

> Per-tenant rate limits, the headers Agnes returns, and how the SDKs honour them.

Agnes enforces per-tenant rate limits on every authenticated route.
The limit applied depends on the route group: analyzer endpoints have
their own budget, and administrative endpoints have a separate one.

## What you'll see

Every successful response carries the current window state:

```http
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 87
X-RateLimit-Reset: 1762000060
```

| Header                  | Meaning                                            |
| ----------------------- | -------------------------------------------------- |
| `X-RateLimit-Limit`     | Requests allowed in the current window.            |
| `X-RateLimit-Remaining` | Requests still available before the window closes. |
| `X-RateLimit-Reset`     | Unix timestamp when the window resets.             |

When the budget hits zero, the next request returns:

```http
HTTP/1.1 429 Too Many Requests
Retry-After: 30
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1762000060
```

```json
{
  "detail": "Rate limit exceeded for analysis endpoints. Limit: 100/minute.",
  "code": "rate_limit_exceeded",
  "request_id": "...",
  "doc_url": "https://docs.lasscyber.com/errors/rate_limit_exceeded"
}
```

`Retry-After` is the number of seconds to wait before retrying. The
SDKs honour it automatically.
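
If you call the API without an SDK, the headers above can be read into a small typed record. The following is a minimal sketch, not SDK code: `RateLimitState` and `parse_rate_limit` are hypothetical names, and it assumes `Retry-After` is always an integer number of seconds, as described on this page.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class RateLimitState:
    """Hypothetical container for the rate-limit headers on one response."""
    limit: int                  # X-RateLimit-Limit
    remaining: int              # X-RateLimit-Remaining
    reset_at: int               # X-RateLimit-Reset (Unix timestamp)
    retry_after: Optional[int]  # Retry-After in seconds, None if absent


def parse_rate_limit(headers: Mapping[str, str]) -> RateLimitState:
    """Read the rate-limit headers from a response into a typed record."""
    # HTTP header names are case-insensitive, so normalise before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    retry_after = lowered.get("retry-after")
    return RateLimitState(
        limit=int(lowered["x-ratelimit-limit"]),
        remaining=int(lowered["x-ratelimit-remaining"]),
        reset_at=int(lowered["x-ratelimit-reset"]),
        # Assumption: Retry-After is integer seconds, never an HTTP date.
        retry_after=int(retry_after) if retry_after is not None else None,
    )


# Header values taken from the 429 example above.
state = parse_rate_limit({
    "Retry-After": "30",
    "X-RateLimit-Limit": "100",
    "X-RateLimit-Remaining": "0",
    "X-RateLimit-Reset": "1762000060",
})
```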

## Default limits

Limits depend on plan tier and route group. The headers are the
authoritative source — the table below is a snapshot at the time of
writing.

| Route family                                   | Trial    | Starter   | Professional | Enterprise                     |
| ---------------------------------------------- | -------- | --------- | ------------ | ------------------------------ |
| `POST /api/v1/analyze/` and analyzer endpoints | 30 / min | 100 / min | 500 / min    | 2,000 / min (custom available) |
| Administrative (policies, keys, tenants)       | 30 / min | 60 / min  | 120 / min    | 300 / min                      |
| Analyzer logs read                             | 30 / min | 60 / min  | 120 / min    | 300 / min                      |

These numbers are per-tenant, not per-key. Two keys in the same tenant
share a single budget.

If you need a higher ceiling on a paid plan, mail
[`sales@lasscyber.com`](mailto:sales@lasscyber.com).

## How SDKs handle it

Both the Python and TypeScript SDKs:

1. Detect a `429` response.
2. Parse the `Retry-After` header.
3. Sleep for that many seconds.
4. Retry the request, with a small jitter on top.
5. Surface a `RateLimitError` (the same name in both SDKs) if retries
   are exhausted.
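
For readers who disable the built-in retries, the steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the SDKs' actual implementation: `call_with_retries`, `RetriesExhausted`, the `send` callable, and the default retry count and jitter range are all hypothetical.

```python
import random
import time
from typing import Callable, Mapping, Tuple

# Hypothetical response shape: (status code, headers, body text).
Response = Tuple[int, Mapping[str, str], str]


class RetriesExhausted(Exception):
    """Hypothetical stand-in for the SDKs' RateLimitError."""


def call_with_retries(
    send: Callable[[], Response],
    max_retries: int = 3,
    max_jitter: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    """Call send(), retrying on 429 after waiting Retry-After plus jitter."""
    for attempt in range(max_retries + 1):
        status, headers, body = send()
        if status != 429:
            return status, headers, body
        if attempt == max_retries:
            break  # no retries left, fall through to the error
        # Assumption: Retry-After is integer seconds and the key is spelled
        # exactly like this; fall back to 1 second if it is missing.
        wait = float(headers.get("Retry-After", "1"))
        # A small random jitter keeps many clients from retrying in lockstep.
        sleep(wait + random.uniform(0.0, max_jitter))
    raise RetriesExhausted(f"still rate limited after {max_retries} retries")
```

Passing `sleep` as a parameter keeps the loop easy to test without real delays.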

You can disable automatic retries when constructing the client if you
want to handle them yourself.

## Token quotas vs request rate limits

Two independent meters apply to your traffic:

* **Request rate limits** — covered on this page. Reset every minute.
* **Monthly token quotas** — the `included_tokens_monthly` from your
  plan. Once exhausted, on-demand tokens (paid Stripe meter) kick in
  on tiers that support it. See [Billing](/administration/billing).
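
Because the two meters fail with different status codes, client code can tell them apart. A minimal sketch, assuming the error body is JSON with a `code` field, and using the `429` / `rate_limit_exceeded` and `402` / `billing_required` pairs described on this page; `classify_limit_error` and its return labels are hypothetical:

```python
import json


def classify_limit_error(status: int, body: str) -> str:
    """Map an error response to the meter that rejected it."""
    try:
        code = json.loads(body).get("code")
    except (ValueError, AttributeError):
        # Body was not JSON, or not a JSON object.
        code = None
    if status == 429 and code == "rate_limit_exceeded":
        return "rate_limited"      # wait for Retry-After, then retry
    if status == 402 and code == "billing_required":
        return "quota_exhausted"   # retrying will not help; see Billing
    return "other"
```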

The 429 response covers rate limits *only*. Hitting the monthly quota
without on-demand enabled produces a `402` with
`code: "billing_required"`.

## Best practices

* **Honour `Retry-After`.** Custom retry logic that ignores it will
  hammer the API and slow you down.
* **Prefer batching where the API supports it.** `POST /api/v1/analyze/`
  is single-prompt only by design (so each analysis carries a request
  ID and a clean termination); for log queries and policy reads, prefer
  the bulk endpoints.
* **Distribute load across keys for noisy-neighbour isolation.** Two
  keys in the same tenant share a budget, so this does not raise the
  limit; it only helps you throttle different *workloads* against each
  other.
* **Watch the headers.** Charting `X-RateLimit-Remaining` over time is
  the easiest way to tell if you are approaching the ceiling.
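
One way to act on the headers, rather than only chart them, is to pace requests so the remaining budget lasts until the window resets. The helper below is a rough sketch of that idea, not an Agnes or SDK feature; `pacing_delay` is a hypothetical name and the even-spacing policy is an assumption.

```python
def pacing_delay(remaining: int, reset_at: int, now: float) -> float:
    """Seconds to wait before the next request.

    Spreads the remaining budget evenly over the time left in the window.
    `remaining` and `reset_at` come from X-RateLimit-Remaining and
    X-RateLimit-Reset; `now` is the current Unix time.
    """
    window_left = max(0.0, reset_at - now)
    if remaining <= 0:
        # Budget is spent: wait until the window resets.
        return window_left
    return window_left / remaining
```

For example, with 87 requests remaining and 30 seconds left in the window, `pacing_delay(87, 1762000060, 1762000030.0)` suggests waiting roughly a third of a second between requests.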

## Next

* [Errors → `rate_limit_exceeded`](/errors/rate_limit_exceeded)
* [Idempotency](/api-reference/idempotency) — make retries safe on
  write endpoints.
* [Billing](/administration/billing) — token quotas and on-demand
  rates.
