This analyzer detects prompt-injection attacks, jailbreak attempts, and maliciously crafted inputs using specialized BERT-family classifier models. It is the single most valuable analyzer to run on inbound traffic.
  • Canonical name: prompt-injection-jailbreak
  • Python: prompt_injection_jailbreak
  • TypeScript: promptInjectionJailbreak
  • Server key: adversarial_detection_analyzer
  • Category: Adversarial

What it detects

Three kinds of malicious input:
  • Prompt injection — instructions that try to override your system prompt (“Ignore previous instructions and …”).
  • Jailbreaks — well-known role-play exploits (“DAN mode”, “developer mode”, reverse-roleplay framings).
  • Maliciously formed input — adversarial decorations, encoding tricks, and perturbations crafted to defeat system prompts.
It does not detect unsafe content (that’s Safety & Responsible AI), sensitive data leakage (that’s SDP), or domain-specific patterns (that’s YARA).

How it works

The input is tokenized and split into chunks of 400 tokens with 50 tokens of overlap. Each chunk is sent to the chosen BERT classifier running on the internal model service (Cloud Run + L4 GPU). The worst-case chunk score determines the final label and confidence:
  • Label INJECTION/JAILBREAK if the worst chunk crosses the model’s decision threshold.
  • Label SAFE otherwise.
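
To make the flow concrete, here is a minimal sketch of the chunk-and-aggregate step in Python. The score_chunk callable is a stand-in for the hosted classifier call, and the 0.5 decision threshold is illustrative (the real threshold is model-specific):

```python
# Minimal sketch of the chunking and worst-case aggregation described above.
# `score_chunk` stands in for the hosted BERT classifier; the 0.5 threshold
# is illustrative only -- the real decision threshold is model-specific.
from typing import Callable

CHUNK_TOKENS = 400
OVERLAP_TOKENS = 50

def chunks(tokens: list[int]) -> list[list[int]]:
    stride = CHUNK_TOKENS - OVERLAP_TOKENS  # 350-token stride gives 50 tokens of overlap
    return [tokens[i:i + CHUNK_TOKENS] for i in range(0, len(tokens), stride)] or [tokens]

def classify(tokens: list[int], score_chunk: Callable[[list[int]], float]) -> dict:
    worst = max(score_chunk(c) for c in chunks(tokens))  # worst-case chunk decides
    label = "INJECTION/JAILBREAK" if worst >= 0.5 else "SAFE"
    return {"label": label, "score": worst}
```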

Available models

You select a model with the model_id parameter. Larger models are more accurate; smaller models are faster.

| Model ID | Friendly name | Notes |
| --- | --- | --- |
| meta-llama/Llama-Prompt-Guard-2-22M | Llama Prompt Guard 22M | Default. Fastest. |
| meta-llama/Llama-Prompt-Guard-2-86M | Llama Prompt Guard 86M | Balanced accuracy / latency. |
| protectai/deberta-v3-base-prompt-injection-v2 | DeBERTa v3 Prompt Injection | Strong on classic injection idioms. |
| testsavantai/prompt-injection-defender-large-v0-onnx | Prompt Injection Defender ONNX | ONNX runtime; useful as a second opinion. |

Parameters

| Key | Type | Required | Default | Notes |
| --- | --- | --- | --- | --- |
| model_id | select | Yes | meta-llama/Llama-Prompt-Guard-2-22M | Choose from the table above. |
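
For illustration, a hypothetical configuration payload selecting the 86M model. Only the server key adversarial_detection_analyzer and the model_id parameter come from this page; the surrounding structure is an assumption:

```python
# Hypothetical configuration payload; only `adversarial_detection_analyzer`
# and `model_id` are documented here -- the envelope shape is an assumption.
analyzer_config = {
    "analyzer": "adversarial_detection_analyzer",
    "parameters": {
        "model_id": "meta-llama/Llama-Prompt-Guard-2-86M",  # trade latency for accuracy
    },
}
```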

Outputs and metrics

The analyzer_results.prompt-injection-jailbreak block looks like:
```json
{
  "label": "INJECTION/JAILBREAK",
  "score": 0.97,
  "metrics": {
    "score": 0.97,
    "inference_time_ms": 38.4
  },
  "status": "OK"
}
```

| Metric | Type | Range | Notes |
| --- | --- | --- | --- |
| score | float | 0.0–1.0 | Probability the input is adversarial. |
| inference_time_ms | float | n/a | Model inference duration in milliseconds. |
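
Consuming that block from a parsed response could look like the following sketch, assuming the response is a dict containing the analyzer_results object shown above:

```python
# Assumes `response` is the parsed API response containing the
# `analyzer_results` block shown above.
result = response["analyzer_results"]["prompt-injection-jailbreak"]
if result["status"] == "OK" and result["label"] == "INJECTION/JAILBREAK":
    print(f"adversarial input (score {result['score']:.2f}, "
          f"{result['metrics']['inference_time_ms']:.1f} ms inference)")
```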

Termination signals

| Signal | What it matches |
| --- | --- |
| Boolean: is_malicious | Fires when the classifier label is INJECTION/JAILBREAK. |
| Output match: INJECTION/JAILBREAK | Same as above, expressed as a regex. |
| Output match: SAFE | Fires on benign input. Useful for “allow lists”. |

Suggested score thresholds:

| Stance | Operator | Value |
| --- | --- | --- |
| Conservative | > | 0.50 |
| Balanced | > | 0.75 |
| Aggressive | > | 0.90 |

The shipped default-inbound policy uses score >= 0.85 AND output_match: INJECTION/JAILBREAK with terminate_immediately.
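
Restated as code, the shipped rule has these semantics (a sketch of the boolean logic only; the actual policy syntax is product-specific and not shown on this page):

```python
# Boolean semantics of the shipped default-inbound rule described above.
def should_terminate_immediately(result: dict) -> bool:
    return result["score"] >= 0.85 and result["label"] == "INJECTION/JAILBREAK"
```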

Limits and cost

| Limit | Value |
| --- | --- |
| Max input tokens | 100,000 |
| Requests / minute | 500 (per tenant) |
| Chunk size | 400 tokens with 50-token overlap |

Cost is driven by model-inference time on the internal model service: approximately $0.02 per call at the time of writing, billed as metered tokens on your subscription. See Billing.
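
As a rough worked example at that rate (an estimate only; see Billing for authoritative pricing):

```python
# Back-of-envelope cost at the quoted ~$0.02/call rate (estimate, not a quote).
calls_per_day = 100_000  # hypothetical inbound volume
print(f"~${calls_per_day * 0.02:,.0f}/day")  # prints: ~$2,000/day
```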

Typical latency

20–100 ms depending on input length and model size. Cold-start adds a small one-off penalty per Cloud Run instance.

When to use it

  • Always on inbound. This is the single most valuable analyzer to put before your LLM. Use the 22M model unless you have measured evidence the 86M model is worth the extra latency for your traffic.
  • Optional on outbound. The classifier is trained for inputs; on outputs it tends to overfire on quoted user text. Prefer Safety & Responsible AI for outbound.
  • Pair with semantic threat intel. This classifier is strong on the syntactic shape of attacks; the Semantic Threat Intelligence analyzer catches paraphrases and obfuscations the classifier misses.

Failure modes

  • Model service unavailable: analyzer_unavailable (HTTP 503) with a Retry-After header. SDKs retry automatically.
  • Token limit exceeded: payload_too_large (HTTP 413).
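
For callers using the HTTP API directly rather than an SDK, a sketch of honoring Retry-After on 503; the endpoint URL and the use of the requests library are assumptions:

```python
# Sketch of the retry-on-503 behavior the SDKs implement automatically.
# The endpoint URL passed in is a placeholder, not a documented path.
import time
import requests

def post_with_retry(url: str, payload: dict, attempts: int = 3) -> requests.Response:
    resp = None
    for _ in range(attempts):
        resp = requests.post(url, json=payload, timeout=30)
        if resp.status_code != 503:  # anything but analyzer_unavailable: done
            break
        time.sleep(float(resp.headers.get("Retry-After", "1")))
    return resp
```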
