YARA Rule Enforcement

The YARA analyzer matches input text against compiled YARA rule sets. Agnes ships a catalog of system rules (instruction bypass, secret patterns, credential leakage, common jailbreak phrases) and lets each tenant author its own rules and group them into YARA policies.


Canonical name	`yara`
Python	`yara`
TypeScript	`yara`
Server key	`yara_analyzer`
Category	Pattern Matching

What it detects

Anything you can express as a YARA rule:

Known prompt-injection idioms (“Ignore previous instructions…”, “Disregard the above…”).
Secret shapes (API tokens, SSH keys, generic high-entropy identifiers).
Internal codenames or document signatures you do not want leaving your tenancy.
Output formats you want to enforce (e.g. JSON-only outputs).

YARA is not statistical — a rule either matches or it does not. That makes it complementary to the ML analyzers: zero false positives in exchange for narrow coverage.

How it works

Agnes compiles all active rules in the selected YARA policy (or all active tenant rules if no policy is selected) into a single YARA matcher. Input text is scanned against the matcher; matches surface as analyzer findings with the rule name and category. System rules ship in api/data/yara/ and include things like instruction bypass, generic-secret detection, SSH-key shapes, and a “fake IT maintenance” social-engineering rule.

Parameters

Key	Type	Required	Default	Notes
`yara_policy_id`	select	No	tenant default	Reference a YARA policy. Leave empty to use the default (or all active rules if no default exists).

You can also override per request:

{ "prompt": "...", "policy_slug": "default-inbound", "yara_policy_id": "code-leakage" }

Outputs and metrics

{
  "matches": [
    { "rule_name": "InstructionBypass", "category": "Injection",
      "strings": ["ignore previous instructions"] },
    { "rule_name": "GenericSecret", "category": "Secrets",
      "strings": ["sk-XXXXXXXXX"] }
  ],
  "metrics": {
    "processing_time_ms": 1.2,
    "matches_found": 2
  },
  "status": "OK"
}

Metric	Suggested thresholds
`matches_found`	`> 0` (any match), `>= 3` (multiple).
`processing_time_ms`	Observability only.

Termination signals

Signal	What it matches
Boolean: `match_found`	Any rule matched.
Match: `rule_name`	A specific rule, by name.
Match: `category`	All rules in a meta `category =` block.

rule_name and category are dynamic signals — the dashboard policy editor populates them from the rules in your tenant.

Limits and cost

Limit	Value
Max input tokens	1,000,000
Requests / minute	5,000 (per tenant)

YARA runs in-process; no external API cost. Compilation is cached per tenant + policy.

Typical latency

1–50 ms depending on the number of compiled rules. YARA is one of the cheapest analyzers — put it early in your execution plan to short-circuit on cheap signals.

A small worked example

The shipped InstructionBypass rule:

rule InstructionBypass: Injection
{
    meta:
        category = "Instruction Bypass"
        description = "Detects phrases used to ignore, disregard, or bypass instructions."

    strings:
        $bypass_phrase = /(Ignore|Disregard|Skip|Forget|Neglect|Overlook|Omit|Bypass|Pay no attention to|Do not follow|Do not obey)\s*(prior|previous|preceding|above|foregoing|earlier|initial)?\s*(content|text|instructions|instruction|directives|directive|commands|command|context|conversation|input|inputs|data|message|messages|communication|response|responses|request|requests)\s*(and start over|and start anew|and begin afresh|and start from scratch)?/

    condition:
        $bypass_phrase
}

The meta: category = value (Instruction Bypass) becomes a termination signal you can match on — “terminate when any rule with category ‘Instruction Bypass’ fires” — without naming each rule individually.

When to use it

Always on as a cheap first line. Put YARA early in your combined policy so it short-circuits before the GPU classifiers fire.
Encode product-specific rules. Internal codenames, regulatory phrasings, product-specific PII shapes that Cloud DLP does not ship. YARA is the right place.
Outbound is high-value. Outbound YARA catches LLMs that have internalised your codebase or customer data and started reproducing patterns.

Failure modes

Rule compilation error → the rule fails to load and is excluded from the run; surfaced in admin logs but does not break the analyzer.
No rules in the selected policy → the analyzer returns matches_found: 0 and OK status. Fix the policy if you expected matches.

YARA rules and policies — author rules, group them into policies, manage activation.
Combined analyzer — wiring YARA into termination rules.

Get started

Concepts

Analyzers

Policies

Threat analysis

Testing

Administration

YARA Rule Enforcement

What it detects

How it works

Parameters

Outputs and metrics

Termination signals

Limits and cost

Typical latency

A small worked example

When to use it

Failure modes

Next

Get started

Concepts

Analyzers

Policies

Threat analysis

Testing

Administration

Documentation Index

​What it detects

​How it works

​Parameters

​Outputs and metrics

​Termination signals

​Limits and cost

​Typical latency

​A small worked example

​When to use it

​Failure modes

​Next

What it detects

How it works

Parameters

Outputs and metrics

Termination signals

Limits and cost

Typical latency

A small worked example

When to use it

Failure modes

Next