# Malicious URL Detection

> Extract URLs and check them against Google Web Risk for malware, phishing, and unwanted-software threats.

The URL Risk analyzer extracts every URL from the input text and checks
each one against **Google Web Risk**. URLs that match Web Risk's threat
lists are flagged with their threat type.

|                    |                  |
| ------------------ | ---------------- |
| **Canonical name** | `url-risk`       |
| **Python**         | `url_risk`       |
| **TypeScript**     | `urlRisk`        |
| **Server key**     | `url_analyzer`   |
| **Category**       | Threat Detection |

## What it detects

Three Google Web Risk threat types:

* `MALWARE` — pages distributing malware.
* `SOCIAL_ENGINEERING` — phishing or social-engineering pages.
* `UNWANTED_SOFTWARE` — sites distributing unwanted software (toolbars,
  bundleware).

It does **not** classify URLs as "spammy" or "low quality"; Web Risk
covers only the three categories above.

## How it works

1. Extract URLs from the input using the
   [`urlextract`](https://pypi.org/project/urlextract/) library and a
   recent TLD list (refreshed when older than 7 days).
2. For each extracted URL, call the Web Risk API.
3. Return per-URL verdicts and counts.

If the Web Risk API errors on a particular URL, Agnes treats that URL
as unsafe — the analyzer fails *closed*. This means the
`unsafe_urls_count` metric over-reports rather than under-reports
during outages.
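The fail-closed behaviour can be sketched as follows. This is a minimal illustration, not Agnes's actual implementation: `check_urls` and the `lookup` callable are hypothetical names, the real Web Risk client call is left out, and returning an empty `threat_types` list for an errored URL is an assumption.

```python
from typing import Callable, Iterable, List

# Hypothetical lookup signature: takes a URL and returns the matched threat
# types (empty list when clean). It may raise if the API call fails.
Lookup = Callable[[str], List[str]]


def check_urls(urls: Iterable[str], lookup: Lookup) -> dict:
    """Build per-URL verdicts, treating lookup errors as unsafe (fail closed)."""
    verdicts = []
    for url in urls:
        try:
            threats = lookup(url)
            is_safe = not threats
        except Exception:
            # Fail closed: an errored lookup is counted as unsafe.
            # Assumption: no threat type is known, so the list stays empty.
            threats, is_safe = [], False
        verdicts.append({"url": url, "is_safe": is_safe, "threat_types": threats})

    return {
        "urls": verdicts,
        "metrics": {
            "urls_detected_count": len(verdicts),
            "unsafe_urls_count": sum(1 for v in verdicts if not v["is_safe"]),
        },
    }
```

With a lookup that raises for one URL, that URL still appears in `urls` with `is_safe` set to `false`, which is why `unsafe_urls_count` can exceed the number of URLs that are genuinely on a threat list.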

## Parameters

This analyzer takes no parameters. Web Risk lookups always check all
three threat types.

## Outputs and metrics

```json theme={null}
{
  "urls": [
    { "url": "https://safe-corp.example",
      "is_safe": true,  "threat_types": [] },
    { "url": "https://phish.bad.example",
      "is_safe": false, "threat_types": ["SOCIAL_ENGINEERING"] }
  ],
  "metrics": {
    "processing_time_ms": 12.4,
    "urls_detected_count": 2,
    "unsafe_urls_count": 1
  },
  "status": "OK"
}
```

| Metric                | Suggested thresholds    |
| --------------------- | ----------------------- |
| `urls_detected_count` | Observability only.     |
| `unsafe_urls_count`   | `> 0` (any unsafe URL). |
| `processing_time_ms`  | Observability only.     |
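As an illustration of applying the suggested `unsafe_urls_count > 0` threshold on the client side, the sketch below parses the sample response shown earlier. The helper names `unsafe_urls` and `should_block` are hypothetical and not part of any SDK.

```python
import json

# Sample response copied from the example above.
SAMPLE = json.loads("""
{
  "urls": [
    {"url": "https://safe-corp.example", "is_safe": true, "threat_types": []},
    {"url": "https://phish.bad.example", "is_safe": false,
     "threat_types": ["SOCIAL_ENGINEERING"]}
  ],
  "metrics": {"processing_time_ms": 12.4, "urls_detected_count": 2,
              "unsafe_urls_count": 1},
  "status": "OK"
}
""")


def unsafe_urls(response: dict) -> list:
    """Return the URLs whose verdict is not safe."""
    return [u["url"] for u in response.get("urls", []) if not u.get("is_safe", False)]


def should_block(response: dict, max_unsafe: int = 0) -> bool:
    """Apply the suggested threshold: block when unsafe_urls_count > max_unsafe."""
    return response["metrics"]["unsafe_urls_count"] > max_unsafe


if __name__ == "__main__":
    print(unsafe_urls(SAMPLE), should_block(SAMPLE))
```

A missing `is_safe` field is treated as unsafe here, which keeps the client consistent with the analyzer's fail-closed stance.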

## Termination signals

| Signal                      | What it matches                                              |
| --------------------------- | ------------------------------------------------------------ |
| Boolean: `unsafe_url_found` | Any URL flagged unsafe by Web Risk.                          |
| Match: `threat_type`        | One of `MALWARE`, `SOCIAL_ENGINEERING`, `UNWANTED_SOFTWARE`. |

Output match hints (regex): `MALWARE`, `SOCIAL_ENGINEERING`,
`UNWANTED_SOFTWARE`, and `(MALWARE|SOCIAL_ENGINEERING|UNWANTED_SOFTWARE)`
for any threat.
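To see what the match hints select, the sketch below applies the "any threat" pattern to a piece of serialized output. It assumes the pattern is run as an unanchored search over the output text; `matched_threats` is a hypothetical helper.

```python
import re

# The "any threat" hint from above, compiled as an unanchored pattern.
ANY_THREAT = re.compile(r"(MALWARE|SOCIAL_ENGINEERING|UNWANTED_SOFTWARE)")


def matched_threats(output_text: str) -> list:
    """Return the distinct threat types that appear in the output, sorted."""
    return sorted(set(ANY_THREAT.findall(output_text)))
```

Because the pattern is unanchored, it would also match a URL that happens to contain one of these uppercase strings. Matching against the `threat_types` values instead of the whole serialized output avoids that edge case.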

## Limits and cost

| Limit             | Value              |
| ----------------- | ------------------ |
| Max input tokens  | 1,000,000          |
| Requests / minute | 5,000 (per tenant) |

Cost follows **Google Web Risk pricing**, billed per URL lookup. See the
[Web Risk pricing page](https://cloud.google.com/web-risk/pricing).

## Typical latency

1–50 ms total, scaling with the number of URLs found. Each Web Risk
lookup takes a handful of milliseconds; URL extraction itself is
near-instant.

## When to use it

* **Always on.** URL Risk is one of the cheapest analyzers and catches
  a category of attack the ML analyzers do not.
* **Both directions.** Inbound: catches user-submitted phishing links.
  Outbound: catches malicious links that an LLM hallucinated.
* **Pair with YARA.** YARA is the right tool for *URL patterns* you
  care about beyond Web Risk's categories (e.g. internal hostnames you
  do not want appearing in outputs).

## Failure modes

* **Web Risk error on a URL** → that URL is treated as unsafe; the
  metric overcounts. The analyzer does not error; the run continues.
* **Web Risk fully unreachable** → the request fails with a 503
  `analyzer_unavailable` error that carries `Retry-After`.
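A client can honour `Retry-After` on a 503 along the lines of the sketch below. It assumes `Retry-After` arrives as a standard HTTP header (either delta-seconds or an HTTP date) and that headers are available as a plain dict with that exact key. `retry_delay_seconds` and its 1-second fallback are hypothetical choices, not documented behaviour.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Mapping, Optional


def retry_delay_seconds(
    status: int,
    headers: Mapping[str, str],
    now: Optional[datetime] = None,
    default: float = 1.0,
) -> Optional[float]:
    """Return how long to wait before retrying, or None if status is not 503."""
    if status != 503:
        return None

    value = headers.get("Retry-After")
    if value is None:
        return default  # assumption: fall back to a short fixed delay

    value = value.strip()
    if value.isdigit():
        return float(value)  # delta-seconds form, e.g. "30"

    try:
        when = parsedate_to_datetime(value)  # HTTP-date form
    except (TypeError, ValueError):
        return default  # unparseable header value

    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return max(0.0, (when - now).total_seconds())
```

The per-URL error case needs no retry logic on the client, since the run continues and the URL is simply reported as unsafe.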

## Next

* [Combined analyzer](/concepts/combined-analyzer) — wire `url_risk`
  into termination rules.
* [YARA](/analyzers/yara) — for URL patterns Web Risk does not cover.
