Malicious URL Detection

The URL Risk analyzer extracts every URL from the input text and checks each one against Google Web Risk. URLs that match Web Risk’s threat lists are flagged with their threat type.


| Field | Value |
| --- | --- |
| Canonical name | `url-risk` |
| Python | `url_risk` |
| TypeScript | `urlRisk` |
| Server key | `url_analyzer` |
| Category | Threat Detection |

What it detects

Three Google Web Risk threat types:

MALWARE — pages distributing malware.
SOCIAL_ENGINEERING — phishing or social-engineering pages.
UNWANTED_SOFTWARE — sites distributing unwanted software (toolbars, bundleware).

It does not classify URLs as “spammy” or “low quality”; Web Risk’s job is exclusively the three categories above.

How it works

1. Extract URLs from the input using the `urlextract` library and a recent TLD list (refreshed when older than 7 days).
2. For each extracted URL, call the Web Risk API.
3. Return per-URL verdicts and counts.

If the Web Risk API returns an error for a particular URL, Agnes treats that URL as unsafe: the analyzer fails closed. As a result, the `unsafe_urls_count` metric over-reports rather than under-reports during outages.
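The extract, look up, and aggregate flow, including the fail-closed behavior, can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not Agnes's implementation: `extract_urls` uses a simple regular expression as a stand-in for the `urlextract` library, `lookup` is a hypothetical callable standing in for the Web Risk API call, and leaving `threat_types` empty for a failed lookup is a guess, since the source does not say what that field contains in that case.

```python
import re
from typing import Callable, List

# Simplified stand-in for urlextract: matches http(s) URLs only.
_URL_RE = re.compile(r"https?://[^\s<>()\[\]'\"]+")


def extract_urls(text: str) -> List[str]:
    """Return every URL found in the text, in order of appearance."""
    # Trailing sentence punctuation is stripped so "…example." is not part of the URL.
    return [m.group(0).rstrip(".,;:!?") for m in _URL_RE.finditer(text)]


def analyze(text: str, lookup: Callable[[str], List[str]]) -> dict:
    """Check each URL; `lookup` (hypothetical) returns the threat types for a URL."""
    results = []
    for url in extract_urls(text):
        try:
            threats = lookup(url)
            is_safe = not threats
        except Exception:
            # Fail closed: a lookup error marks the URL unsafe.
            # Assumption: threat_types stays empty when the lookup itself failed.
            threats, is_safe = [], False
        results.append({"url": url, "is_safe": is_safe, "threat_types": threats})
    return {
        "urls": results,
        "metrics": {
            "urls_detected_count": len(results),
            "unsafe_urls_count": sum(1 for r in results if not r["is_safe"]),
        },
    }
```

Because errors are counted as unsafe, a lookup function that raises for every URL would make `unsafe_urls_count` equal `urls_detected_count`, which is the over-reporting described above.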

Parameters

This analyzer takes no parameters. Web Risk lookups always check all three threat types.

Outputs and metrics

```json
{
  "urls": [
    { "url": "https://safe-corp.example",
      "is_safe": true,  "threat_types": [] },
    { "url": "https://phish.bad.example",
      "is_safe": false, "threat_types": ["SOCIAL_ENGINEERING"] }
  ],
  "metrics": {
    "processing_time_ms": 12.4,
    "urls_detected_count": 2,
    "unsafe_urls_count": 1
  },
  "status": "OK"
}
```

| Metric | Suggested thresholds |
| --- | --- |
| `urls_detected_count` | Observability only. |
| `unsafe_urls_count` | `> 0` (any unsafe URL). |
| `processing_time_ms` | Observability only. |
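As an illustration of the suggested `unsafe_urls_count > 0` threshold, the sketch below applies it to the sample output shown earlier in this section. The helper names `should_flag` and `unsafe_entries` are hypothetical and not part of any SDK; only the field names come from the sample.

```python
import json

# The sample analyzer output from this section, parsed into a dict.
SAMPLE = json.loads("""
{
  "urls": [
    {"url": "https://safe-corp.example", "is_safe": true, "threat_types": []},
    {"url": "https://phish.bad.example", "is_safe": false,
     "threat_types": ["SOCIAL_ENGINEERING"]}
  ],
  "metrics": {"processing_time_ms": 12.4, "urls_detected_count": 2,
              "unsafe_urls_count": 1},
  "status": "OK"
}
""")


def should_flag(result: dict, threshold: int = 0) -> bool:
    """True when unsafe_urls_count exceeds the threshold (default: any unsafe URL)."""
    return result["metrics"]["unsafe_urls_count"] > threshold


def unsafe_entries(result: dict) -> list:
    """Return (url, threat_types) pairs for every URL marked unsafe."""
    return [(u["url"], u["threat_types"]) for u in result["urls"] if not u["is_safe"]]
```

For the sample, `should_flag(SAMPLE)` is true and `unsafe_entries(SAMPLE)` contains only the phishing URL.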

Termination signals

| Signal | What it matches |
| --- | --- |
| Boolean: `unsafe_url_found` | Any URL flagged unsafe by Web Risk. |
| Match: `threat_type` | One of `MALWARE`, `SOCIAL_ENGINEERING`, `UNWANTED_SOFTWARE`. |

Output match hints (regex): `MALWARE`, `SOCIAL_ENGINEERING`, `UNWANTED_SOFTWARE`, and `(MALWARE|SOCIAL_ENGINEERING|UNWANTED_SOFTWARE)` for any threat.
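To show how the "any threat" hint behaves, here is a small Python sketch that runs that regex over a serialized analyzer output. How Agnes evaluates match rules internally is not described here, so treat this only as a way to test a pattern locally; `matched_threats` is a hypothetical helper.

```python
import json
import re

# The "any threat" pattern from the match hints above.
ANY_THREAT = re.compile(r"(MALWARE|SOCIAL_ENGINEERING|UNWANTED_SOFTWARE)")


def matched_threats(output_text: str) -> set:
    """Return the distinct threat types that appear in the serialized output."""
    return set(ANY_THREAT.findall(output_text))


serialized = json.dumps({
    "urls": [{"url": "https://phish.bad.example", "is_safe": False,
              "threat_types": ["SOCIAL_ENGINEERING"]}],
})
threats = matched_threats(serialized)  # contains only "SOCIAL_ENGINEERING"
```

An output with no flagged URLs yields an empty set, so the pattern doubles as a check for "no threat reported".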

Limits and cost

| Limit | Value |
| --- | --- |
| Max input tokens | 1,000,000 |
| Requests / minute | 5,000 (per tenant) |

Cost follows Google Web Risk pricing, billed per URL lookup. See the Web Risk pricing page.

Typical latency

1–50 ms total, dominated by the number of URLs found. Web Risk lookups are fast (a handful of milliseconds each); URL extraction itself is near-instant.

When to use it

Always on. URL Risk is one of the cheapest analyzers and catches a category of attack the ML analyzers do not.
Both directions. Inbound: catches user-submitted phishing links. Outbound: catches malicious links that an LLM has hallucinated.
Pair with YARA. YARA is the right tool for URL patterns you care about beyond Web Risk’s categories (e.g. internal hostnames you do not want appearing in outputs).

Failure modes

Web Risk error on a URL → that URL is treated as unsafe, so `unsafe_urls_count` overcounts. The analyzer does not error; the run continues.
Web Risk fully unreachable → `analyzer_unavailable` (HTTP 503) with a `Retry-After` header.
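A client can handle the 503 case by waiting for the interval given in `Retry-After` and trying again. The sketch below is one possible approach, not a documented client behavior: `send` is a hypothetical callable that performs the request and returns `(status, headers, body)`, and the retry count and default wait are arbitrary choices.

```python
import time
from typing import Callable, Mapping, Tuple

Response = Tuple[int, Mapping[str, str], dict]  # (status, headers, body)


def call_with_retry(
    send: Callable[[], Response],
    max_attempts: int = 3,
    default_wait: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call `send`, retrying on HTTP 503 and honoring Retry-After (in seconds)."""
    for attempt in range(1, max_attempts + 1):
        status, headers, body = send()
        if status != 503:
            # Any non-503 response is handed back to the caller unchanged.
            return body
        if attempt == max_attempts:
            break
        try:
            wait = float(headers.get("Retry-After", default_wait))
        except ValueError:
            # Retry-After may also be an HTTP date; fall back to the default.
            wait = default_wait
        sleep(max(wait, 0.0))
    raise RuntimeError("analyzer_unavailable: retries exhausted")
```

Injecting `sleep` keeps the function easy to test without real delays; in production code the default `time.sleep` applies.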

Next

Combined analyzer: wire `url_risk` into termination rules.
YARA: for URL patterns Web Risk does not cover.
