Sensitive Data Protection

The Sensitive Data Protection analyzer (formerly DLP) detects personally identifiable information (PII), credentials, financial identifiers, and protected health information using Google Cloud DLP. Agnes wraps DLP behind a tenant-scoped SDP policy so you control which info types are inspected and how findings are de-identified.


Canonical name	`sensitive-data`
Python	`sensitive_data`
TypeScript	`sensitiveData`
Server key	`dlp_analyzer`
Category	Data Protection

What it detects

Google Cloud DLP recognizes ~150 built-in info types. Agnes ships five default SDP policies covering common bundles:

General PII — email, phone, person name, location, date of birth, IP address, URL, age.
Financial — credit cards, IBAN, SWIFT, U.S. bank routing / account numbers.
Healthcare PHI — medical record number, NPI, DEA, ICD-9 / 10, FDA codes, blood type.
Government IDs — SSN, passport, driver’s license, ITIN, ATIN.
Credentials & Secrets — auth tokens, basic-auth headers, passwords, GCP API keys, signed URLs.

You can author your own SDP policy with any combination of info types and likelihood thresholds; see SDP policies.
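The sketch below shows how per-info-type likelihood thresholds filter DLP findings. It assumes a hypothetical policy shape (a mapping of info type to minimum likelihood) that is not Agnes's real SDP policy schema. Only the likelihood names and their order come from Google Cloud DLP's Likelihood enum.

```python
from typing import Dict, List

# Google Cloud DLP likelihood values, ordered from least to most likely.
LIKELIHOOD_ORDER = ["VERY_UNLIKELY", "UNLIKELY", "POSSIBLE", "LIKELY", "VERY_LIKELY"]
RANK = {name: i for i, name in enumerate(LIKELIHOOD_ORDER)}

# Hypothetical custom policy: inspect only these info types, each with its own floor.
custom_policy: Dict[str, str] = {
    "EMAIL_ADDRESS": "LIKELY",
    "US_SOCIAL_SECURITY_NUMBER": "POSSIBLE",
    "CREDIT_CARD_NUMBER": "LIKELY",
}


def apply_policy(findings: List[dict], policy: Dict[str, str]) -> List[dict]:
    """Keep findings whose info type is in the policy and meets its likelihood floor."""
    kept = []
    for f in findings:
        floor = policy.get(f["info_type"])
        if floor is not None and RANK[f["likelihood"]] >= RANK[floor]:
            kept.append(f)
    return kept


findings = [
    {"info_type": "EMAIL_ADDRESS", "likelihood": "POSSIBLE"},  # below LIKELY: dropped
    {"info_type": "US_SOCIAL_SECURITY_NUMBER", "likelihood": "LIKELY"},  # kept
    {"info_type": "PHONE_NUMBER", "likelihood": "VERY_LIKELY"},  # not in policy: dropped
]
print([f["info_type"] for f in apply_policy(findings, custom_policy)])
# ['US_SOCIAL_SECURITY_NUMBER']
```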

How it works

The prompt is sent to the Google Cloud DLP API, using projects.content.inspect or projects.content.deidentify depending on whether the policy de-identifies findings. DLP returns findings (info type, likelihood, location, and optionally the matched quote), and Agnes packages them into the analyzer output. Findings can also be de-identified in place: Agnes returns a sanitized copy of the input with the configured transformation applied, such as replacing the match with [REDACTED], masking it with *, replacing it with the info type label, or removing it entirely.
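To make the in-place transformations concrete, here is a minimal local sketch that applies them to findings. It is not how DLP or Agnes implements de-identification. It assumes non-overlapping, half-open UTF-8 byte ranges stored under `byte_range` (matching the output example), and the `mode` names are hypothetical labels.

```python
from typing import List


def deidentify(text: str, findings: List[dict], mode: str = "replace") -> str:
    """Return a sanitized copy of `text` with every finding transformed."""
    data = text.encode("utf-8")
    pieces: List[bytes] = []
    cursor = 0
    for f in sorted(findings, key=lambda x: x["byte_range"]["start"]):
        start, end = f["byte_range"]["start"], f["byte_range"]["end"]
        pieces.append(data[cursor:start])  # untouched text before the finding
        matched = data[start:end].decode("utf-8")
        if mode == "replace":
            replacement = "[REDACTED]"
        elif mode == "mask":
            replacement = "*" * len(matched)  # one '*' per character
        elif mode == "info_type":
            replacement = f"[{f['info_type']}]"
        elif mode == "redact":
            replacement = ""  # remove the match entirely
        else:
            raise ValueError(f"unknown mode: {mode!r}")
        pieces.append(replacement.encode("utf-8"))
        cursor = end
    pieces.append(data[cursor:])  # untouched tail
    return b"".join(pieces).decode("utf-8")


text = "Email me at alice@example.com - SSN 123-45-6789"
findings = [
    {"info_type": "EMAIL_ADDRESS", "byte_range": {"start": 12, "end": 29}},
    {"info_type": "US_SOCIAL_SECURITY_NUMBER", "byte_range": {"start": 36, "end": 47}},
]
print(deidentify(text, findings))  # Email me at [REDACTED] - SSN [REDACTED]
```

Working over bytes instead of characters keeps byte offsets correct for non-ASCII input. For pure ASCII text like this example, the two coincide.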

Parameters

Key	Type	Required	Default	Notes
`sdp_policy_id`	select	No	tenant default	Reference an SDP policy. Leave empty to use the tenant default.

You can also override per-request:

{ "prompt": "...", "policy_slug": "default-inbound", "sdp_policy_id": "ph-strict" }
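Below is a small sketch of building that request body in Python. It only serializes the JSON shown above. The endpoint and transport are not covered here. The helper name `build_request` is hypothetical. The comment's assumption is that omitting `sdp_policy_id` leaves the analyzer's configured policy (or the tenant default) in effect.

```python
import json
from typing import Dict, Optional


def build_request(prompt: str, policy_slug: str, sdp_policy_id: Optional[str] = None) -> str:
    """Serialize a request body, adding sdp_policy_id only when overriding."""
    body: Dict[str, str] = {"prompt": prompt, "policy_slug": policy_slug}
    if sdp_policy_id:
        # Assumption: leaving this out falls back to the configured/tenant-default policy.
        body["sdp_policy_id"] = sdp_policy_id
    return json.dumps(body)


print(build_request("Email me at alice@example.com", "default-inbound", "ph-strict"))
```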

Outputs and metrics

{
  "findings": [
    { "info_type": "EMAIL_ADDRESS", "likelihood": "VERY_LIKELY",
      "quote": "alice@example.com", "byte_range": { "start": 12, "end": 29 } },
    { "info_type": "US_SOCIAL_SECURITY_NUMBER", "likelihood": "LIKELY",
      "quote": "123-45-6789", "byte_range": { "start": 36, "end": 47 } }
  ],
  "deidentified_text": "Email me at [REDACTED] - SSN [REDACTED]",
  "metrics": {
    "dlp_processing_time_ms": 312.0,
    "findings_count": 2
  },
  "status": "OK"
}
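A consumer of this output might sanity-check it before acting on it. The sketch below assumes the original prompt was `Email me at alice@example.com - SSN 123-45-6789` and that `byte_range` is a half-open span over UTF-8 bytes. Both are assumptions, and `summarize` is a hypothetical helper.

```python
import json
from collections import Counter

original = "Email me at alice@example.com - SSN 123-45-6789"
raw = """
{
  "findings": [
    {"info_type": "EMAIL_ADDRESS", "likelihood": "VERY_LIKELY",
     "quote": "alice@example.com", "byte_range": {"start": 12, "end": 29}},
    {"info_type": "US_SOCIAL_SECURITY_NUMBER", "likelihood": "LIKELY",
     "quote": "123-45-6789", "byte_range": {"start": 36, "end": 47}}
  ],
  "deidentified_text": "Email me at [REDACTED] - SSN [REDACTED]",
  "metrics": {"dlp_processing_time_ms": 312.0, "findings_count": 2},
  "status": "OK"
}
"""


def summarize(resp: dict, text: str) -> Counter:
    """Count findings per info type after checking status, offsets, and metrics."""
    if resp["status"] != "OK":
        raise RuntimeError(f"analyzer status is {resp['status']}")
    data = text.encode("utf-8")
    for f in resp["findings"]:
        r = f["byte_range"]
        # The quote is optional; when present it should equal the sliced bytes.
        if "quote" in f and data[r["start"]:r["end"]].decode("utf-8") != f["quote"]:
            raise ValueError(f"byte_range does not match quote for {f['info_type']}")
    if resp["metrics"]["findings_count"] != len(resp["findings"]):
        raise ValueError("findings_count disagrees with the findings list")
    return Counter(f["info_type"] for f in resp["findings"])


counts = summarize(json.loads(raw), original)
print(dict(counts))  # {'EMAIL_ADDRESS': 1, 'US_SOCIAL_SECURITY_NUMBER': 1}
```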

Metric	Suggested thresholds
`findings_count`	`> 0` (any finding), `>= 3` (multiple), `>= 10` (high volume).
`dlp_processing_time_ms`	Observability only; no recommended threshold.
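The suggested `findings_count` thresholds can be expressed as a simple tiering function. This is a sketch: the tier labels are hypothetical. Only the cut-offs (`> 0`, `>= 3`, `>= 10`) come from the table above.

```python
def findings_tier(findings_count: int) -> str:
    """Map findings_count onto the suggested thresholds (labels are hypothetical)."""
    if findings_count >= 10:
        return "high_volume"
    if findings_count >= 3:
        return "multiple"
    if findings_count > 0:
        return "any"
    return "none"


print([findings_tier(n) for n in (0, 1, 3, 10)])  # ['none', 'any', 'multiple', 'high_volume']
```

The checks run from the strictest threshold down, so each count lands in the highest tier it qualifies for.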

Termination signals

Signal	What it matches
Boolean: `has_sensitive_data`	Any finding (regardless of info type).
Match: `info_type`	A specific info type, e.g. `US_SOCIAL_SECURITY_NUMBER`, `EMAIL_ADDRESS`, `CREDIT_CARD_NUMBER`.

Regex output match hints ship for the most common info types (EMAIL_ADDRESS, PHONE_NUMBER, CREDIT_CARD_NUMBER, US_SOCIAL_SECURITY_NUMBER), plus a wildcard .* that matches “any sensitive data”.
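Here is one way to picture how the two signals evaluate against the findings list. The sketch assumes that a match hint is a regex tested against each finding's `info_type` name, so `.*` fires on any finding. This reading is an assumption, and the function names are hypothetical.

```python
import re
from typing import List


def has_sensitive_data(findings: List[dict]) -> bool:
    """Boolean signal: true if there is at least one finding of any info type."""
    return bool(findings)


def info_type_matches(findings: List[dict], hint: str) -> bool:
    """Match signal: true if any finding's info_type fully matches the regex hint."""
    pattern = re.compile(hint)
    return any(pattern.fullmatch(f["info_type"]) for f in findings)


findings = [{"info_type": "EMAIL_ADDRESS"}, {"info_type": "US_SOCIAL_SECURITY_NUMBER"}]
print(info_type_matches(findings, "US_SOCIAL_SECURITY_NUMBER"))  # True
print(info_type_matches(findings, "CREDIT_CARD_NUMBER"))  # False
```

Using `fullmatch` keeps a hint like `EMAIL_ADDRESS` from accidentally matching a longer, unrelated info type name.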

Limits and cost

Limit	Value
Max input tokens	1,000,000
Requests / minute	500 (per tenant)
DLP timeout	30 s
Findings per request	100 (DLP default)
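If a single client sends bursts of requests, a client-side sliding-window limiter can help it stay under the 500 requests/minute tenant limit. This is a generic sketch, not an Agnes feature. Because the limit is per tenant, each process only sees its own share, so set `limit` to that process's budget.

```python
import time
from collections import deque
from typing import Callable, Deque


class MinuteRateLimiter:
    """Sliding-window limiter: at most `limit` acquisitions per `window_s` seconds."""

    def __init__(self, limit: int = 500, window_s: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limit = limit
        self.window_s = window_s
        self.clock = clock
        self.sent: Deque[float] = deque()

    def try_acquire(self) -> bool:
        now = self.clock()
        # Drop timestamps that have fallen out of the window.
        while self.sent and now - self.sent[0] >= self.window_s:
            self.sent.popleft()
        if len(self.sent) < self.limit:
            self.sent.append(now)
            return True
        return False  # caller should wait or queue the request
```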

Cost follows Google Cloud DLP pricing, which bills content inspection by the volume of data inspected. See the DLP pricing page for current rates.

Typical latency

100–3000 ms, depending on input size and the number of info types inspected. The network round trip to DLP dominates, so expect a baseline of roughly 200 ms.

When to use it

Inbound and outbound. SDP is one of the highest-value analyzers to put on outbound — LLMs hallucinate plausible-looking PII (credit cards, SSNs, emails) regularly.
Match info types to your domain. Healthcare apps want PHI detection; fintech wants financial info types; everyone wants credential detection. Author per-product SDP policies.
Use replaceConfig for high-precision sanitization. Replacing with [REDACTED] keeps the LLM’s surrounding context legible while removing the leak.
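For reference, the sketch below builds the raw Google Cloud DLP `content:deidentify` request body that uses `replaceConfig`. This is DLP's own REST shape, not Agnes's SDP policy format, and the helper name is hypothetical. A transformation without an `infoTypes` list applies to every info type requested in `inspectConfig`.

```python
import json
from typing import Any, Dict, List


def build_deidentify_body(text: str, info_types: List[str],
                          replacement: str = "[REDACTED]",
                          min_likelihood: str = "LIKELY") -> Dict[str, Any]:
    """Build a DLP v2 content:deidentify body that replaces each finding with a fixed value."""
    return {
        "item": {"value": text},
        "inspectConfig": {
            "infoTypes": [{"name": t} for t in info_types],
            "minLikelihood": min_likelihood,
        },
        "deidentifyConfig": {
            "infoTypeTransformations": {
                "transformations": [{
                    # No "infoTypes" key: applies to all inspected info types.
                    "primitiveTransformation": {
                        "replaceConfig": {"newValue": {"stringValue": replacement}}
                    }
                }]
            }
        },
    }


print(json.dumps(build_deidentify_body("Email me at alice@example.com", ["EMAIL_ADDRESS"]), indent=2))
```

Swapping `replaceConfig` for `characterMaskConfig` (with `maskingCharacter`) or `replaceWithInfoTypeConfig` gives the masking and info-type-label variants.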

Failure modes

DLP API error → the analyzer status flips to ERROR, and the combined run’s overall status follows.
Input exceeds the token limit → the request is rejected with HTTP 413 (`payload_too_large`).
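A caller can turn both failure modes into explicit exceptions. In this sketch, the exception classes and `check_result` are hypothetical names, and it assumes the top-level `status` field shown in the output example carries `ERROR` when DLP fails.

```python
from typing import Any, Dict


class PayloadTooLarge(Exception):
    """Input exceeded the analyzer's max input tokens (HTTP 413)."""


class AnalyzerError(Exception):
    """The DLP call failed and the analyzer reported ERROR."""


def check_result(http_status: int, body: Dict[str, Any]) -> Dict[str, Any]:
    """Raise on the documented failure modes; otherwise return the body unchanged."""
    if http_status == 413:
        raise PayloadTooLarge("payload_too_large: shorten or split the input")
    if body.get("status") == "ERROR":
        raise AnalyzerError("Sensitive Data Protection analyzer returned ERROR")
    return body
```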

Next

SDP policies: author info-type bundles and de-identification configs.
Combined analyzer: wiring SDP into termination rules.
