The Sensitive Data Protection analyzer (formerly DLP) detects
personally identifiable information (PII), credentials, financial
identifiers, and protected health information using Google Cloud DLP.
Agnes wraps DLP behind a tenant-scoped SDP policy so you control
which info types are inspected and how findings are de-identified.
| | |
|---|---|
| Canonical name | sensitive-data |
| Python | sensitive_data |
| TypeScript | sensitiveData |
| Server key | dlp_analyzer |
| Category | Data Protection |
What it detects
Google Cloud DLP recognizes ~150 built-in info types. Agnes ships five
default SDP policies covering common bundles:
- General PII — email, phone, person name, location, date of
birth, IP address, URL, age.
- Financial — credit cards, IBAN, SWIFT, U.S. bank routing /
account numbers.
- Healthcare PHI — medical record number, NPI, DEA, ICD-9 / 10,
FDA codes, blood type.
- Government IDs — SSN, passport, driver’s license, ITIN, ATIN.
- Credentials & Secrets — auth tokens, basic-auth headers,
passwords, GCP API keys, signed URLs.
You can author your own SDP policy with any combination of info types
and likelihood thresholds; see SDP policies.
How it works
The prompt is sent to the Google Cloud DLP API
(projects.content.inspect or projects.content.deidentify depending
on the policy). DLP returns findings — info type, likelihood, location,
optionally the matched quote — and Agnes packages them into the
analyzer output.
Findings can optionally be de-identified in place: Agnes returns a
sanitized copy of the input with the configured transformation
applied (replace with [REDACTED], mask with *, replace with the
info type label, fully redact, etc.).
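The transformations listed above can be sketched in a few lines of Python. This is an illustration only, not Agnes's or Google Cloud DLP's implementation: the `Finding` class, the `deidentify` helper, and the mode names are hypothetical, and the sketch assumes byte ranges are UTF-8 offsets with an exclusive end.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    info_type: str
    start: int  # byte offset, inclusive
    end: int    # byte offset, exclusive (assumed)


def deidentify(text: str, findings: list[Finding], mode: str = "replace") -> str:
    """Apply one transformation to every finding, working on UTF-8 bytes."""
    data = text.encode("utf-8")
    # Splice from right to left so earlier offsets stay valid.
    for f in sorted(findings, key=lambda f: f.start, reverse=True):
        if mode == "replace":
            sub = b"[REDACTED]"
        elif mode == "mask":
            # One '*' per byte; for ASCII input that is one per character.
            sub = b"*" * (f.end - f.start)
        elif mode == "info_type":
            sub = f"[{f.info_type}]".encode("utf-8")
        elif mode == "redact":
            sub = b""
        else:
            raise ValueError(f"unknown mode: {mode}")
        data = data[:f.start] + sub + data[f.end:]
    return data.decode("utf-8")


text = "Email me at alice@example.com - SSN 123-45-6789"
findings = [
    Finding("EMAIL_ADDRESS", 12, 29),
    Finding("US_SOCIAL_SECURITY_NUMBER", 36, 47),
]
print(deidentify(text, findings, "replace"))
# Email me at [REDACTED] - SSN [REDACTED]
```

Splicing from the last finding backwards avoids recomputing offsets after each substitution changes the length of the text.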
Parameters
| Key | Type | Required | Default | Notes |
|---|---|---|---|---|
| sdp_policy_id | select | No | tenant default | Reference an SDP policy. Leave empty to use the tenant default. |
You can also override the SDP policy per request:
{ "prompt": "...", "policy_slug": "default-inbound", "sdp_policy_id": "ph-strict" }
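A minimal sketch of building that request body in Python. The `build_request` helper is hypothetical, and the sketch assumes that omitting `sdp_policy_id` behaves like leaving it empty, so the configured or tenant-default policy applies. How the body is sent (endpoint, authentication) is not covered here.

```python
import json
from typing import Optional


def build_request(prompt: str, policy_slug: str,
                  sdp_policy_id: Optional[str] = None) -> str:
    """Serialize a request body, adding the SDP override only when given."""
    body = {"prompt": prompt, "policy_slug": policy_slug}
    if sdp_policy_id:
        # Per-request override of the SDP policy.
        body["sdp_policy_id"] = sdp_policy_id
    return json.dumps(body)


print(build_request("...", "default-inbound", "ph-strict"))
```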
Outputs and metrics
{
  "findings": [
    { "info_type": "EMAIL_ADDRESS", "likelihood": "VERY_LIKELY",
      "quote": "alice@example.com", "byte_range": { "start": 12, "end": 30 } },
    { "info_type": "US_SOCIAL_SECURITY_NUMBER", "likelihood": "LIKELY",
      "quote": "123-45-6789", "byte_range": { "start": 60, "end": 71 } }
  ],
  "deidentified_text": "Email me at [REDACTED] - SSN [REDACTED]",
  "metrics": {
    "dlp_processing_time_ms": 312.0,
    "findings_count": 2
  },
  "status": "OK"
}
| Metric | Suggested thresholds |
|---|---|
| findings_count | > 0 (any finding), >= 3 (multiple), >= 10 (high volume). |
| dlp_processing_time_ms | Observability only; no recommended threshold. |
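As a rough illustration, the suggested `findings_count` thresholds map onto bands like this. The `findings_band` function and its band labels are hypothetical names chosen for the sketch, not part of the analyzer.

```python
def findings_band(findings_count: int) -> str:
    """Map findings_count onto the suggested threshold bands."""
    if findings_count >= 10:
        return "high_volume"
    if findings_count >= 3:
        return "multiple"
    if findings_count > 0:
        return "any"
    return "none"


for n in (0, 2, 3, 10):
    print(n, findings_band(n))
```

Checking the largest threshold first keeps each count in exactly one band.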
Termination signals
| Signal | What it matches |
|---|---|
| Boolean: has_sensitive_data | Any finding (regardless of info type). |
| Match: info_type | A specific info type, e.g. US_SOCIAL_SECURITY_NUMBER, EMAIL_ADDRESS, CREDIT_CARD_NUMBER. |
Output match hints (regex) ship for the most common info types:
EMAIL_ADDRESS, PHONE_NUMBER, CREDIT_CARD_NUMBER,
US_SOCIAL_SECURITY_NUMBER, plus a wildcard .* for “any sensitive
data”.
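The two signals can be approximated client-side against the analyzer output shown earlier. This sketch assumes the match hint is a regex applied to each finding's `info_type` with full-match semantics; the actual matching rules on the server may differ, and both function names are hypothetical.

```python
import re


def has_sensitive_data(output: dict) -> bool:
    """Boolean signal: true when the analyzer reported any finding."""
    return len(output.get("findings", [])) > 0


def matches_info_type(output: dict, pattern: str) -> bool:
    """Match signal: true when any finding's info_type matches the regex."""
    rx = re.compile(pattern)
    return any(rx.fullmatch(f["info_type"]) for f in output.get("findings", []))


output = {
    "findings": [
        {"info_type": "EMAIL_ADDRESS", "likelihood": "VERY_LIKELY"},
        {"info_type": "US_SOCIAL_SECURITY_NUMBER", "likelihood": "LIKELY"},
    ]
}
print(has_sensitive_data(output))                       # True
print(matches_info_type(output, "CREDIT_CARD_NUMBER"))  # False
print(matches_info_type(output, ".*"))                  # True
```

Under these assumptions the `.*` wildcard behaves like `has_sensitive_data`: it matches whenever at least one finding exists and never on an empty findings list.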
Limits and cost
| Limit | Value |
|---|---|
| Max input tokens | 1,000,000 |
| Requests / minute | 500 (per tenant) |
| DLP timeout | 30 s |
| Findings per request | 100 (DLP default) |
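To stay under the per-tenant request limit, a caller might throttle on its own side. The sketch below is a simple sliding-window counter. The `MinuteWindow` class is hypothetical, and how the server actually measures its window (fixed or sliding) is not stated here, so treat the 60-second sliding window as an assumption.

```python
from collections import deque


class MinuteWindow:
    """Sliding window that admits at most `limit` requests per `window_s` seconds."""

    def __init__(self, limit: int = 500, window_s: float = 60.0):
        self.limit = limit
        self.window_s = window_s
        self.sent = deque()  # timestamps of admitted requests, oldest first

    def try_acquire(self, now: float) -> bool:
        # Drop timestamps that have aged out of the window.
        while self.sent and now - self.sent[0] >= self.window_s:
            self.sent.popleft()
        if len(self.sent) < self.limit:
            self.sent.append(now)
            return True
        return False
```

In real use, pass `time.monotonic()` as `now`; taking the clock as an argument keeps the class easy to test.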
Cost is Google Cloud DLP pricing, billed per content item
inspected. See the DLP pricing page for current rates.
Typical latency
100–3000 ms depending on input size and the number of info types
inspected. Networking to DLP dominates; expect ~200 ms baseline.
When to use it
- Inbound and outbound. SDP is one of the highest-value analyzers
to put on outbound — LLMs hallucinate plausible-looking PII (credit
cards, SSNs, emails) regularly.
- Match info types to your domain. Healthcare apps want PHI
detection; fintech wants financial info types; everyone wants
credential detection. Author per-product SDP policies.
- Use replaceConfig for high-precision sanitization. Replacing
with [REDACTED] keeps the LLM’s surrounding context legible while
removing the leak.
Failure modes
- DLP API error → the analyzer status flips to ERROR and the
combined run’s overall status follows.
- Token limit exceeded → payload_too_large 413.
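A caller might separate these two failure modes as in the sketch below. It assumes the 413 arrives as an HTTP status code and that `status` sits at the top level of the response body, as in the sample output. The `classify_result` function and its return labels are hypothetical.

```python
def classify_result(http_status: int, body: dict) -> str:
    """Classify a response as ok, analyzer_error, payload_too_large, or unknown."""
    if http_status == 413:
        # Input exceeded the token limit; shrink or split the prompt.
        return "payload_too_large"
    status = body.get("status")
    if status == "ERROR":
        # DLP API error; the combined run's overall status follows.
        return "analyzer_error"
    if status == "OK":
        return "ok"
    return "unknown"


print(classify_result(200, {"status": "OK"}))
print(classify_result(200, {"status": "ERROR"}))
print(classify_result(413, {}))
```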