Documentation Index
Fetch the complete documentation index at: https://docs.lasscyber.com/llms.txt
Use this file to discover all available pages before exploring further.
This analyzer detects prompt-injection attacks, jailbreak attempts, and
maliciously crafted inputs using specialized BERT-family classifier
models. It is the single most valuable analyzer to run on inbound
traffic.
| |
|---|
| Canonical name | prompt-injection-jailbreak |
| Python | prompt_injection_jailbreak |
| TypeScript | promptInjectionJailbreak |
| Server key | adversarial_detection_analyzer |
| Category | Adversarial |
What it detects
Three kinds of malicious input:
- Prompt injection — instructions that try to override your system
prompt (“Ignore previous instructions and …”).
- Jailbreaks — well-known role-play exploits (“DAN mode”,
“developer mode”, reverse-roleplay framings).
- Maliciously formed input — adversarial decorations, encoding
tricks, and adversarial perturbations against system prompts.
It does not detect unsafe content (that’s
Safety & Responsible AI), sensitive
data leakage (that’s SDP), or
domain-specific patterns (that’s YARA).
How it works
The input is tokenized and split into chunks of 400 tokens with 50
tokens of overlap. Each chunk is sent to the chosen BERT classifier
running on the internal model service (Cloud Run + L4 GPU). The
worst-case chunk score determines the final label and confidence:
- Label
INJECTION/JAILBREAK if the worst chunk crosses the model’s
decision threshold.
- Label
SAFE otherwise.
Available models
You select a model with the model_id parameter. Larger models are
more accurate; smaller models are faster.
| Model ID | Friendly name | Notes |
|---|
meta-llama/Llama-Prompt-Guard-2-22M | Llama Prompt Guard 22M | Default. Fastest. |
meta-llama/Llama-Prompt-Guard-2-86M | Llama Prompt Guard 86M | Balanced accuracy / latency. |
protectai/deberta-v3-base-prompt-injection-v2 | DeBERTa v3 Prompt Injection | Strong on classic injection idioms. |
testsavantai/prompt-injection-defender-large-v0-onnx | Prompt Injection Defender ONNX | ONNX runtime; useful as a second opinion. |
Parameters
| Key | Type | Required | Default | Notes |
|---|
model_id | select | Yes | meta-llama/Llama-Prompt-Guard-2-22M | Choose from the table above. |
Outputs and metrics
The analyzer_results.prompt-injection-jailbreak block looks like:
{
"label": "INJECTION/JAILBREAK",
"score": 0.97,
"metrics": {
"score": 0.97,
"inference_time_ms": 38.4
},
"status": "OK"
}
| Metric | Type | Range | Notes |
|---|
score | float | 0.0–1.0 | Probability the input is adversarial. |
inference_time_ms | float | — | Model inference duration. |
Termination signals
| Signal | What it matches |
|---|
Boolean: is_malicious | Fires when the classifier label is INJECTION/JAILBREAK. |
Output match: INJECTION/JAILBREAK | Same as above, expressed as a regex. |
Output match: SAFE | Fires on benign input. Useful for “allow lists”. |
Suggested score thresholds:
| Stance | Operator | Value |
|---|
| Conservative | > | 0.50 |
| Balanced | > | 0.75 |
| Aggressive | > | 0.90 |
The shipped default-inbound policy uses score >= 0.85 AND
output_match: INJECTION/JAILBREAK with terminate_immediately.
Limits and cost
| Limit | Value |
|---|
| Max input tokens | 100,000 |
| Requests / minute | 500 (per tenant) |
| Chunk size | 400 tokens with 50-token overlap |
Cost is model-inference time on the internal model service —
approximately $0.02 / call at the time of writing, billed via
metered tokens on your subscription. See
Billing.
Typical latency
20–100 ms depending on input length and model size. Cold-start adds a
small one-off penalty per Cloud Run instance.
When to use it
- Always on inbound. This is the single most valuable analyzer to
put before your LLM. Use the 22M model unless you have measured
evidence the 86M model is worth the extra latency for your traffic.
- Optional on outbound. The classifier is trained for inputs; on
outputs it tends to overfire on quoted user text. Prefer
Safety & Responsible AI for
outbound.
- Pair with semantic threat intel. This classifier is strong on the
syntactic shape of attacks; the
Semantic Threat Intelligence
analyzer catches paraphrases and obfuscations the classifier misses.
Failure modes
- Model service unavailable →
analyzer_unavailable 503 with
Retry-After. SDKs retry automatically.
- Token limit exceeded →
payload_too_large 413.
Next