# Sensitive Data Protection

> Detect and optionally de-identify PII, credentials, and other sensitive data using Google Cloud DLP.

The Sensitive Data Protection analyzer (formerly *DLP*) detects
personally identifiable information (PII), credentials, financial
identifiers, and protected health information using **Google Cloud DLP**.
Agnes wraps DLP behind a tenant-scoped *SDP policy* so you control
which info types are inspected and how findings are de-identified.

| Property           | Value            |
| ------------------ | ---------------- |
| **Canonical name** | `sensitive-data` |
| **Python**         | `sensitive_data` |
| **TypeScript**     | `sensitiveData`  |
| **Server key**     | `dlp_analyzer`   |
| **Category**       | Data Protection  |

## What it detects

Google Cloud DLP recognizes \~150 built-in info types. Agnes ships five
default SDP policies covering common bundles:

* **General PII** — email, phone, person name, location, date of
  birth, IP address, URL, age.
* **Financial** — credit cards, IBAN, SWIFT, U.S. bank routing /
  account numbers.
* **Healthcare PHI** — medical record number, NPI, DEA, ICD-9 / 10,
  FDA codes, blood type.
* **Government IDs** — SSN, passport, driver's license, ITIN, ATIN.
* **Credentials & Secrets** — auth tokens, basic-auth headers,
  passwords, GCP API keys, signed URLs.

You can author your own SDP policy with any combination of info types
and likelihood thresholds; see [SDP policies](/policies/sdp-policies).

## How it works

The prompt is sent to the Google Cloud DLP API
(`projects.content.inspect` or `projects.content.deidentify` depending
on the policy). DLP returns findings — info type, likelihood, location,
optionally the matched quote — and Agnes packages them into the
analyzer output.

Findings can optionally be **de-identified** in place: Agnes returns a
sanitized copy of the input with the configured transformation
applied (replace with `[REDACTED]`, mask with `*`, replace with the
info type label, fully redact, etc.).
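
To make the in-place transformations concrete, here is a minimal local sketch of what they do to a string. It is not Agnes's or DLP's implementation. It assumes each finding carries a `byte_range` into the UTF-8 encoded input with an exclusive `end`; the function name `apply_redaction` and the style names are hypothetical.

```python
from typing import Iterable


def apply_redaction(text: str, findings: Iterable[dict], style: str = "replace") -> str:
    """Return a sanitized copy of `text` with every finding transformed.

    Assumption: byte_range offsets index the UTF-8 encoding, end exclusive.
    """
    data = text.encode("utf-8")
    # Splice from right to left so earlier offsets stay valid.
    ordered = sorted(findings, key=lambda f: f["byte_range"]["start"], reverse=True)
    for finding in ordered:
        start = finding["byte_range"]["start"]
        end = finding["byte_range"]["end"]
        if style == "replace":
            replacement = b"[REDACTED]"
        elif style == "mask":
            # One '*' per byte of the match (equals one per character for ASCII).
            replacement = b"*" * (end - start)
        elif style == "info_type":
            replacement = f"[{finding['info_type']}]".encode("utf-8")
        else:
            raise ValueError(f"unknown style: {style}")
        data = data[:start] + replacement + data[end:]
    return data.decode("utf-8")


text = "Email me at alice@example.com - SSN 123-45-6789"
findings = [
    {"info_type": "EMAIL_ADDRESS", "byte_range": {"start": 12, "end": 29}},
    {"info_type": "US_SOCIAL_SECURITY_NUMBER", "byte_range": {"start": 36, "end": 47}},
]
print(apply_redaction(text, findings))
print(apply_redaction(text, findings, style="info_type"))
```

Working on bytes rather than characters keeps the offsets correct when the input contains multi-byte characters before a finding.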

## Parameters

| Key             | Type   | Required | Default        | Notes                                                                                     |
| --------------- | ------ | -------- | -------------- | ----------------------------------------------------------------------------------------- |
| `sdp_policy_id` | select | No       | tenant default | Reference an [SDP policy](/policies/sdp-policies). Leave empty to use the tenant default. |

You can also override the SDP policy per request:

```json
{ "prompt": "...", "policy_slug": "default-inbound", "sdp_policy_id": "ph-strict" }
```
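
A request body like the one above could be assembled as follows. This is a sketch: the helper `build_analyze_request` is hypothetical, and it assumes that omitting `sdp_policy_id` is equivalent to leaving it empty, so the tenant default applies.

```python
import json
from typing import Optional


def build_analyze_request(prompt: str, policy_slug: str,
                          sdp_policy_id: Optional[str] = None) -> dict:
    """Build a request body; include the SDP override only when one is given."""
    body = {"prompt": prompt, "policy_slug": policy_slug}
    if sdp_policy_id:
        # Per-request override of the tenant default SDP policy.
        body["sdp_policy_id"] = sdp_policy_id
    return body


print(json.dumps(build_analyze_request("...", "default-inbound", "ph-strict")))
```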

## Outputs and metrics

```json
{
  "findings": [
    { "info_type": "EMAIL_ADDRESS", "likelihood": "VERY_LIKELY",
      "quote": "alice@example.com", "byte_range": { "start": 12, "end": 29 } },
    { "info_type": "US_SOCIAL_SECURITY_NUMBER", "likelihood": "LIKELY",
      "quote": "123-45-6789", "byte_range": { "start": 36, "end": 47 } }
  ],
  "deidentified_text": "Email me at [REDACTED] - SSN [REDACTED]",
  "metrics": {
    "dlp_processing_time_ms": 312.0,
    "findings_count": 2
  },
  "status": "OK"
}
```

| Metric                   | Suggested thresholds                                           |
| ------------------------ | -------------------------------------------------------------- |
| `findings_count`         | `> 0` (any finding), `>= 3` (multiple), `>= 10` (high volume). |
| `dlp_processing_time_ms` | Observability only; no recommended threshold.                  |
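
The suggested `findings_count` thresholds can be applied client-side as in the sketch below. The bucket labels and both function names are hypothetical; only the threshold values and the output field names come from this page.

```python
from collections import Counter


def findings_bucket(findings_count: int) -> str:
    """Map findings_count onto the suggested thresholds (labels are illustrative)."""
    if findings_count >= 10:
        return "high_volume"
    if findings_count >= 3:
        return "multiple"
    if findings_count > 0:
        return "any"
    return "none"


def count_by_info_type(output: dict) -> Counter:
    """Tally findings per info type from an analyzer output."""
    return Counter(f["info_type"] for f in output.get("findings", []))


output = {
    "findings": [
        {"info_type": "EMAIL_ADDRESS", "likelihood": "VERY_LIKELY"},
        {"info_type": "US_SOCIAL_SECURITY_NUMBER", "likelihood": "LIKELY"},
    ],
    "metrics": {"findings_count": 2},
    "status": "OK",
}
print(findings_bucket(output["metrics"]["findings_count"]))
print(count_by_info_type(output))
```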

## Termination signals

| Signal                        | What it matches                                                                                |
| ----------------------------- | ---------------------------------------------------------------------------------------------- |
| Boolean: `has_sensitive_data` | Any finding (regardless of info type).                                                         |
| Match: `info_type`            | A specific info type, e.g. `US_SOCIAL_SECURITY_NUMBER`, `EMAIL_ADDRESS`, `CREDIT_CARD_NUMBER`. |

Output match hints (regex) ship for the most common info types:
`EMAIL_ADDRESS`, `PHONE_NUMBER`, `CREDIT_CARD_NUMBER`,
`US_SOCIAL_SECURITY_NUMBER`, plus a wildcard `.*` for "any sensitive
data".
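
The two signals can be approximated locally to reason about which rules would fire for a given output. This is a sketch under assumptions: both function names are hypothetical, and it assumes a match pattern must cover the whole info type name (`re.fullmatch`), which this page does not specify.

```python
import re


def has_sensitive_data(output: dict) -> bool:
    """Boolean signal: true when the output contains at least one finding."""
    return bool(output.get("findings"))


def matches_info_type(output: dict, pattern: str) -> bool:
    """Match signal: true when any finding's info type matches the regex."""
    compiled = re.compile(pattern)
    return any(compiled.fullmatch(f["info_type"]) for f in output.get("findings", []))


output = {"findings": [{"info_type": "EMAIL_ADDRESS"}]}
print(has_sensitive_data(output))
print(matches_info_type(output, "US_SOCIAL_SECURITY_NUMBER"))
print(matches_info_type(output, ".*"))
```

Under these assumptions, the wildcard `.*` fires in exactly the same cases as `has_sensitive_data`.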

## Limits and cost

| Limit                | Value             |
| -------------------- | ----------------- |
| Max input tokens     | 1,000,000         |
| Requests / minute    | 500 (per tenant)  |
| DLP timeout          | 30 s              |
| Findings per request | 100 (DLP default) |

Cost follows **Google Cloud DLP pricing**, which is based on the
volume of content inspected and transformed. See the
[DLP pricing page](https://cloud.google.com/sensitive-data-protection/pricing)
for current rates.
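
To stay under the per-tenant request limit, a caller can throttle itself before sending. The sliding-window limiter below is a client-side sketch only; the class name `MinuteRateLimiter` is hypothetical, and how Agnes measures the window server-side is not stated on this page.

```python
import time
from collections import deque
from typing import Callable


class MinuteRateLimiter:
    """Allow at most `max_requests` calls in any `window_s`-second window."""

    def __init__(self, max_requests: int = 500, window_s: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._max_requests = max_requests
        self._window_s = window_s
        self._clock = clock
        self._stamps = deque()  # timestamps of accepted requests, oldest first

    def try_acquire(self) -> bool:
        """Record a request and return True if it fits in the window."""
        now = self._clock()
        # Drop timestamps that have aged out of the window.
        while self._stamps and now - self._stamps[0] >= self._window_s:
            self._stamps.popleft()
        if len(self._stamps) < self._max_requests:
            self._stamps.append(now)
            return True
        return False


limiter = MinuteRateLimiter()
if limiter.try_acquire():
    print("send request")
else:
    print("back off and retry later")
```

The injectable `clock` makes the limiter easy to test without sleeping.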

## Typical latency

100–3000 ms depending on input size and the number of info types
inspected. Networking to DLP dominates; expect \~200 ms baseline.

## When to use it

* **Inbound and outbound.** SDP is one of the highest-value analyzers
  to put on outbound — LLMs hallucinate plausible-looking PII (credit
  cards, SSNs, emails) regularly.
* **Match info types to your domain.** Healthcare apps want PHI
  detection; fintech wants financial info types; everyone wants
  credential detection. Author per-product SDP policies.
* **Use `replaceConfig` for high-precision sanitization.** Replacing
  with `[REDACTED]` keeps the LLM's surrounding context legible while
  removing the leak.

## Failure modes

* **DLP API error** → the analyzer status becomes `ERROR`, and the
  combined run's overall status becomes `ERROR` as well.
* **Token limit exceeded** → the request is rejected with HTTP 413
  (`payload_too_large`).
## Next

* [SDP policies](/policies/sdp-policies) — author info-type bundles
  and de-identification configs.
* [Combined analyzer](/concepts/combined-analyzer) — wiring SDP into
  termination rules.
