The Sensitive Data Protection analyzer (formerly DLP) detects personally identifiable information (PII), credentials, financial identifiers, and protected health information using Google Cloud DLP. Agnes wraps DLP behind a tenant-scoped SDP policy so you control which info types are inspected and how findings are de-identified.
Canonical name: sensitive-data
Python: sensitive_data
TypeScript: sensitiveData
Server key: dlp_analyzer
Category: Data Protection

What it detects

Google Cloud DLP recognizes ~150 built-in info types. Agnes ships five default SDP policies covering common bundles:
  • General PII — email, phone, person name, location, date of birth, IP address, URL, age.
  • Financial — credit cards, IBAN, SWIFT, U.S. bank routing / account numbers.
  • Healthcare PHI — medical record number, NPI, DEA, ICD-9 / 10, FDA codes, blood type.
  • Government IDs — SSN, passport, driver’s license, ITIN, ATIN.
  • Credentials & Secrets — auth tokens, basic-auth headers, passwords, GCP API keys, signed URLs.
You can author your own SDP policy with any combination of info types and likelihood thresholds; see SDP policies.

How it works

The prompt is sent to the Google Cloud DLP API (projects.content.inspect or projects.content.deidentify depending on the policy). DLP returns findings — info type, likelihood, location, optionally the matched quote — and Agnes packages them into the analyzer output. Findings can optionally be de-identified in place: Agnes returns a sanitized copy of the input with the configured transformation applied (replace with [REDACTED], mask with *, replace with the info type label, fully redact, etc.).
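
To make the transformations concrete, here is a minimal local sketch of what de-identifying in place looks like. It is not how Agnes or Cloud DLP implements it; the work happens server-side according to the SDP policy. The mode names (`replace`, `mask`, `info_type`, `redact`) and the `deidentify` helper are hypothetical, and the sketch assumes `byte_range` offsets are UTF-8 byte positions with an exclusive end and that findings do not overlap.

```python
from typing import Callable, Dict, List

# Hypothetical transformation names; the real options are set in the SDP policy.
TRANSFORMS: Dict[str, Callable[[str, str], str]] = {
    "replace": lambda quote, info_type: "[REDACTED]",
    "mask": lambda quote, info_type: "*" * len(quote),
    "info_type": lambda quote, info_type: f"[{info_type}]",
    "redact": lambda quote, info_type: "",
}


def deidentify(text: str, findings: List[dict], mode: str = "replace") -> str:
    """Return a sanitized copy of `text` with each finding transformed.

    Assumes byte offsets into the UTF-8 encoding, exclusive end offsets,
    and non-overlapping findings.
    """
    transform = TRANSFORMS[mode]
    data = text.encode("utf-8")
    # Process findings right to left so earlier offsets stay valid
    # after each substitution changes the length of the buffer.
    ordered = sorted(findings, key=lambda f: f["byte_range"]["start"], reverse=True)
    for finding in ordered:
        start = finding["byte_range"]["start"]
        end = finding["byte_range"]["end"]
        quote = data[start:end].decode("utf-8")
        replacement = transform(quote, finding["info_type"]).encode("utf-8")
        data = data[:start] + replacement + data[end:]
    return data.decode("utf-8")
```

Working on the encoded bytes rather than on the string keeps the offsets correct when the input contains multi-byte characters.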

Parameters

Key | Type | Required | Default | Notes
sdp_policy_id | select | No | tenant default | Reference an SDP policy. Leave empty to use the tenant default.
You can also override per-request:
{ "prompt": "...", "policy_slug": "default-inbound", "sdp_policy_id": "ph-strict" }
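
As a rough sketch, the request body above could be assembled like this in Python. Only the three field names come from the example; the `build_request` helper is hypothetical, and the endpoint and transport are deliberately left out because this page does not specify them.

```python
import json
from typing import Optional


def build_request(prompt: str, policy_slug: str,
                  sdp_policy_id: Optional[str] = None) -> str:
    """Serialize a request body, adding the SDP override only when given."""
    body = {"prompt": prompt, "policy_slug": policy_slug}
    if sdp_policy_id:
        # Omitting the key falls back to the tenant default SDP policy.
        body["sdp_policy_id"] = sdp_policy_id
    return json.dumps(body)
```

Leaving the key out entirely, rather than sending an empty string, mirrors the "leave empty to use the tenant default" behavior described for the parameter.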

Outputs and metrics

{
  "findings": [
    { "info_type": "EMAIL_ADDRESS", "likelihood": "VERY_LIKELY",
      "quote": "alice@example.com", "byte_range": { "start": 12, "end": 30 } },
    { "info_type": "US_SOCIAL_SECURITY_NUMBER", "likelihood": "LIKELY",
      "quote": "123-45-6789", "byte_range": { "start": 60, "end": 71 } }
  ],
  "deidentified_text": "Email me at [REDACTED] - SSN [REDACTED]",
  "metrics": {
    "dlp_processing_time_ms": 312.0,
    "findings_count": 2
  },
  "status": "OK"
}
Metric | Suggested thresholds
findings_count | > 0 (any finding), >= 3 (multiple), >= 10 (high volume).
dlp_processing_time_ms | Observability only; no recommended threshold.
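
The suggested `findings_count` thresholds can be read as a simple bucketing rule. The sketch below is illustrative only: the bucket names and the `findings_severity` helper are hypothetical, while the cut-offs are the ones from the table.

```python
def findings_severity(findings_count: int) -> str:
    """Map findings_count onto the suggested threshold buckets."""
    if findings_count >= 10:
        return "high_volume"
    if findings_count >= 3:
        return "multiple"
    if findings_count > 0:
        return "any"
    return "none"
```

Checking the largest threshold first keeps each count in exactly one bucket.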

Termination signals

Signal | What it matches
Boolean: has_sensitive_data | Any finding (regardless of info type).
Match: info_type | A specific info type, e.g. US_SOCIAL_SECURITY_NUMBER, EMAIL_ADDRESS, CREDIT_CARD_NUMBER.
Output match hints (regex) ship for the most common info types: EMAIL_ADDRESS, PHONE_NUMBER, CREDIT_CARD_NUMBER, US_SOCIAL_SECURITY_NUMBER, plus a wildcard .* for “any sensitive data”.
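
A minimal sketch of how the two signals could be evaluated against the analyzer output shown earlier. Both helper functions are hypothetical, and the sketch assumes a match hint is applied as a full-match regex against each finding's `info_type`; the actual matching semantics are not described on this page.

```python
import re


def has_sensitive_data(output: dict) -> bool:
    """Boolean signal: true when the output contains at least one finding."""
    return len(output.get("findings", [])) > 0


def matches_info_type(output: dict, pattern: str) -> bool:
    """Match signal: true when any finding's info_type fully matches `pattern`.

    Assumption: hints are full-match regexes, so ".*" matches every finding.
    """
    regex = re.compile(pattern)
    return any(
        regex.fullmatch(finding["info_type"])
        for finding in output.get("findings", [])
    )
```

Under this reading, the wildcard `.*` behaves like `has_sensitive_data`: it fires on any finding and stays false when the findings list is empty.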

Limits and cost

Limit | Value
Max input tokens | 1,000,000
Requests / minute | 500 (per tenant)
DLP timeout | 30 s
Findings per request | 100 (DLP default)
Cost is Google Cloud DLP pricing — billed per content item inspected. See the DLP pricing page for current rates.

Typical latency

100–3000 ms depending on input size and the number of info types inspected. Networking to DLP dominates; expect ~200 ms baseline.

When to use it

  • Inbound and outbound. Run SDP in both directions. It is one of the highest-value analyzers to put on outbound, because LLMs regularly hallucinate plausible-looking PII (credit cards, SSNs, emails).
  • Match info types to your domain. Healthcare apps want PHI detection; fintech wants financial info types; everyone wants credential detection. Author per-product SDP policies.
  • Use replaceConfig for high-precision sanitization. Replacing with [REDACTED] keeps the LLM’s surrounding context legible while removing the leak.

Failure modes

  • DLP API error → the analyzer status flips to ERROR and the combined run’s overall status follows.
  • Token limit exceeded → payload_too_large (HTTP 413).
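
A caller might separate the two failure modes along these lines. This is a sketch under assumptions: the `classify_outcome` helper and its return labels are hypothetical, it assumes the analyzer `status` field appears at the top level of the response body as in the sample output, and the shape of the 413 error body is not documented here, so only the HTTP status code is inspected.

```python
def classify_outcome(http_status: int, body: dict) -> str:
    """Classify a response into one of the documented failure modes or success."""
    if http_status == 413:
        # Input exceeded the token limit; the caller needs to send less text.
        return "payload_too_large"
    if body.get("status") == "ERROR":
        # The DLP call failed, so no findings can be trusted for this run.
        return "analyzer_error"
    if http_status == 200 and body.get("status") == "OK":
        return "ok"
    # Anything else is outside what this page documents.
    return "unexpected"
```

Whether an `analyzer_error` should block the request or let it through is a policy decision left to the caller.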
