| Canonical name | sensitive-data |
| Python | sensitive_data |
| TypeScript | sensitiveData |
| Server key | dlp_analyzer |
| Category | Data Protection |
What it detects
Google Cloud DLP recognizes ~150 built-in info types. Agnes ships five default SDP policies covering common bundles:
- General PII — email, phone, person name, location, date of birth, IP address, URL, age.
- Financial — credit cards, IBAN, SWIFT, U.S. bank routing / account numbers.
- Healthcare PHI — medical record number, NPI, DEA, ICD-9 / 10, FDA codes, blood type.
- Government IDs — SSN, passport, driver’s license, ITIN, ATIN.
- Credentials & Secrets — auth tokens, basic-auth headers, passwords, GCP API keys, signed URLs.
How it works
The prompt is sent to the Google Cloud DLP API (projects.content.inspect or projects.content.deidentify depending
on the policy). DLP returns findings — info type, likelihood, location,
optionally the matched quote — and Agnes packages them into the
analyzer output.
Findings can optionally be de-identified in place: Agnes returns a
sanitized copy of the input with the configured transformation
applied (replace with [REDACTED], mask with *, replace with the
info type label, fully redact, etc.).
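The in-place de-identification described above can be sketched as follows. This is a minimal illustration, not Agnes's or DLP's implementation: the `Finding` class, the `deidentify` function, and the mode names are hypothetical, offsets are treated as plain character offsets, and findings are assumed not to overlap.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    """Hypothetical, simplified finding: an info type plus a character span."""
    info_type: str  # e.g. "EMAIL_ADDRESS"
    start: int      # character offset, inclusive
    end: int        # character offset, exclusive


def deidentify(text: str, findings: list[Finding], mode: str = "replace") -> str:
    """Return a sanitized copy of `text` with every finding transformed."""
    out = text
    # Work right to left so earlier offsets stay valid after each substitution.
    for f in sorted(findings, key=lambda f: f.start, reverse=True):
        if mode == "replace":
            sub = "[REDACTED]"
        elif mode == "mask":
            sub = "*" * (f.end - f.start)
        elif mode == "info_type":
            sub = f"[{f.info_type}]"
        elif mode == "redact":
            sub = ""
        else:
            raise ValueError(f"unknown mode: {mode}")
        out = out[:f.start] + sub + out[f.end:]
    return out


text = "Contact jane@example.com or 555-0100."
findings = [Finding("EMAIL_ADDRESS", 8, 24), Finding("PHONE_NUMBER", 28, 36)]
print(deidentify(text, findings, "replace"))
print(deidentify(text, findings, "info_type"))
```

Applying substitutions from the last finding to the first avoids recomputing offsets when a replacement is longer or shorter than the matched text.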
Parameters
| Key | Type | Required | Default | Notes |
|---|---|---|---|---|
| sdp_policy_id | select | No | tenant default | Reference an SDP policy. Leave empty to use the tenant default. |
Outputs and metrics
| Metric | Suggested thresholds |
|---|---|
| findings_count | > 0 (any finding), >= 3 (multiple), >= 10 (high volume). |
| dlp_processing_time_ms | Observability only; no recommended threshold. |
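As a rough illustration of the suggested findings_count thresholds, the sketch below maps a count to a bucket. The function name and the bucket labels are hypothetical; only the numeric cut-offs come from the table.

```python
def findings_severity(findings_count: int) -> str:
    """Map findings_count to a bucket using the suggested thresholds."""
    if findings_count < 0:
        raise ValueError("findings_count cannot be negative")
    if findings_count >= 10:
        return "high_volume"
    if findings_count >= 3:
        return "multiple"
    if findings_count > 0:
        return "any"
    return "none"


for n in (0, 1, 3, 10):
    print(n, findings_severity(n))
```

Checking the highest threshold first keeps the buckets mutually exclusive.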
Termination signals
| Signal | What it matches |
|---|---|
| Boolean: has_sensitive_data | Any finding (regardless of info type). |
| Match: info_type | A specific info type, e.g. US_SOCIAL_SECURITY_NUMBER, EMAIL_ADDRESS, CREDIT_CARD_NUMBER. |
Common info_type values include EMAIL_ADDRESS, PHONE_NUMBER, CREDIT_CARD_NUMBER, and US_SOCIAL_SECURITY_NUMBER, plus a wildcard .* for “any sensitive data”.
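One way to picture the info_type match signal is as a pattern tested against the info types returned in the findings. The sketch below assumes the pattern is a regular expression that must match the whole info type name, which is consistent with the .* wildcard but is not confirmed here; `matches_info_type` is a hypothetical helper.

```python
import re


def matches_info_type(pattern: str, found_types: list[str]) -> bool:
    """True if any found info type fully matches the pattern (assumed regex)."""
    rx = re.compile(pattern)
    return any(rx.fullmatch(t) is not None for t in found_types)


found = ["EMAIL_ADDRESS", "PHONE_NUMBER"]
print(matches_info_type("EMAIL_ADDRESS", found))
print(matches_info_type("CREDIT_CARD_NUMBER", found))
print(matches_info_type(".*", found))
```

Under this reading, the wildcard fires only when at least one finding exists, so it behaves like has_sensitive_data.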
Limits and cost
| Limit | Value |
|---|---|
| Max input tokens | 1,000,000 |
| Requests / minute | 500 (per tenant) |
| DLP timeout | 30 s |
| Findings per request | 100 (DLP default) |
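A caller that wants to stay under the per-tenant request limit could pace itself client-side. The sketch below uses a sliding-window limiter; how the server actually counts requests (sliding or fixed window) is not stated here, so the class and its defaults are an assumption for illustration only.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most `max_calls` acquisitions in any `window_s`-second window."""

    def __init__(self, max_calls: int = 500, window_s: float = 60.0,
                 clock=time.monotonic):
        self.max_calls = max_calls
        self.window_s = window_s
        self.clock = clock       # injectable so tests can control time
        self._calls = deque()    # timestamps of accepted calls, oldest first

    def try_acquire(self) -> bool:
        now = self.clock()
        # Drop timestamps that have aged out of the window.
        while self._calls and now - self._calls[0] >= self.window_s:
            self._calls.popleft()
        if len(self._calls) < self.max_calls:
            self._calls.append(now)
            return True
        return False


limiter = SlidingWindowLimiter()
if limiter.try_acquire():
    pass  # safe to send one analyzer request here
```

When `try_acquire()` returns False, the caller can queue the request or back off instead of sending it.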
Typical latency
100–3000 ms depending on input size and the number of info types inspected. Networking to DLP dominates; expect ~200 ms baseline.
When to use it
- Inbound and outbound. SDP is one of the highest-value analyzers to put on outbound — LLMs hallucinate plausible-looking PII (credit cards, SSNs, emails) regularly.
- Match info types to your domain. Healthcare apps want PHI detection; fintech wants financial info types; everyone wants credential detection. Author per-product SDP policies.
- Use replaceConfig for high-precision sanitization. Replacing with [REDACTED] keeps the LLM’s surrounding context legible while removing the leak.
Failure modes
- DLP API error → the analyzer status flips to ERROR and the combined run’s overall status follows.
- Token limit exceeded → payload_too_large 413.
Next
- SDP policies — author info-type bundles and de-identification configs.
- Combined analyzer — wiring SDP into termination rules.