Detection
lumen-argus ships with three built-in detectors that run sequentially on every request body: secrets, PII, and proprietary content. All regex patterns are compiled at import time to meet the <50ms scanning target.
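The sketch below shows what compiling patterns at import time looks like, assuming the detectors use Python's standard `re` module. `SECRET_PATTERNS` and `scan_secrets` are hypothetical names. The two expressions are simplified stand-ins for the `aws_access_key` and `google_api_key` rows below, not lumen-argus's actual patterns.

```python
import re

# Compiled once, when this module is first imported. Every later scan
# reuses the same pattern objects instead of re-parsing the strings.
# Simplified, hypothetical patterns (not lumen-argus's own expressions).
SECRET_PATTERNS = {
    # "AKIA" + 16 uppercase letters/digits = 20 characters
    "aws_access_key": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    # "AIza" + 35 URL-safe characters = 39 characters
    "google_api_key": re.compile(r"AIza[0-9A-Za-z_\-]{35}"),
}


def scan_secrets(body: str) -> list[tuple[str, str]]:
    """Return (pattern_name, matched_value) pairs found in a request body."""
    findings = []
    for name, pattern in SECRET_PATTERNS.items():
        for match in pattern.finditer(body):
            findings.append((name, match.group(0)))
    return findings


print(scan_secrets('aws_access_key_id = "AKIAIOSFODNN7EXAMPLE"'))
# [('aws_access_key', 'AKIAIOSFODNN7EXAMPLE')]
```

The `re.compile` calls run once per process, so a request only pays for matching, not for parsing the patterns.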
Secrets Detector
The secrets detector combines 34+ compiled regex patterns with Shannon entropy analysis to catch credentials, tokens, and cryptographic material.
Cloud Provider Keys
| Pattern Name | Example Match | Severity |
|---|---|---|
| `aws_access_key` | `AKIA...` (20 chars) | critical |
| `aws_secret_key` | `aws_secret_key = "..."` (40 chars) | critical |
| `google_api_key` | `AIza...` (39 chars) | critical |
| `gcp_service_account` | `"type": "service_account"` | critical |
| `azure_subscription_key` | `azure_key = "..."` (32 hex chars) | high |
AI Provider Keys
| Pattern Name | Example Match | Severity |
|---|---|---|
| `anthropic_api_key` | `sk-ant-...` | critical |
| `openai_api_key` | `sk-...` (20+ chars, entropy-gated) | critical |
Version Control and CI Tokens
| Pattern Name | Example Match | Severity |
|---|---|---|
| `github_token` | `ghp_...`, `ghs_...`, `gho_...`, `ghr_...` (36+ chars) | critical |
| `github_fine_grained_pat` | `github_pat_...` (22+ chars) | critical |
| `gitlab_token` | `glpat-...` (20+ chars) | critical |
| `npm_token` | `npm_...` (36+ chars) | critical |
| `pypi_token` | `pypi-...` (50+ chars) | critical |
Cryptographic Material
| Pattern Name | Example Match | Severity |
|---|---|---|
| `private_key_pem` | `-----BEGIN RSA PRIVATE KEY-----` | critical |
| `ssh_private_key` | `-----BEGIN OPENSSH PRIVATE KEY-----` | critical |
Tokens and Sessions
| Pattern Name | Example Match | Severity |
|---|---|---|
| `jwt_token` | `eyJ...eyJ...` (three dot-separated base64url segments) | high |
| `slack_token` | `xoxb-...`, `xoxp-...` | critical |
| `slack_webhook` | `https://hooks.slack.com/services/T.../B.../...` | high |
| `discord_webhook` | `https://discord.com/api/webhooks/...` | high |
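The `eyJ` prefix follows from base64url encoding: a compact JSON object whose first key starts with a letter, such as `{"alg":"HS256"}`, encodes to a string beginning with `eyJ`, and JWT headers and payloads are JSON objects. A regex hit can therefore be confirmed structurally. This is a hypothetical check (`looks_like_jwt` and `_b64url_decode` are illustrative names), not lumen-argus's implementation, and it does not verify signatures.

```python
import base64
import json


def _b64url_decode(segment: str) -> bytes:
    # JWT segments are base64url without padding; restore it before decoding.
    padded = segment + "=" * (-len(segment) % 4)
    return base64.urlsafe_b64decode(padded)


def looks_like_jwt(candidate: str) -> bool:
    """Heuristic: three non-empty dot-separated segments whose first two
    decode to JSON objects. The signature segment is not checked."""
    parts = candidate.split(".")
    if len(parts) != 3 or not all(parts):
        return False
    try:
        header = json.loads(_b64url_decode(parts[0]))
        payload = json.loads(_b64url_decode(parts[1]))
    except ValueError:  # covers bad base64, bad UTF-8, and bad JSON
        return False
    return isinstance(header, dict) and isinstance(payload, dict)
```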
Payment
| Pattern Name | Example Match | Severity |
|---|---|---|
| `stripe_secret_key` | `sk_live_...`, `sk_test_...`, `rk_live_...` | critical |
| `stripe_webhook_secret` | `whsec_...` | critical |
Communication Services
| Pattern Name | Example Match | Severity |
|---|---|---|
| `twilio_api_key` | `SK...` (32 hex chars) | high |
| `sendgrid_api_key` | `SG.` + two dot-separated base64 segments | critical |
| `mailgun_api_key` | `key-...` (32 chars) | high |
Infrastructure
| Pattern Name | Example Match | Severity |
|---|---|---|
| `heroku_api_key` | `heroku_api_key = "..."` (UUID) | critical |
| `docker_hub_pat` | `dckr_pat_...` (20+ chars) | critical |
| `terraform_cloud_token` | `terraform_token = "..."` (entropy-gated) | high |
| `vault_token` | `hvs.` prefix (24+ chars) | critical |
| `datadog_api_key` | `datadog_key = "..."` (32 hex chars) | high |
| `pagerduty_key` | `pagerduty_key = "..."` (entropy-gated) | high |
Database URLs
| Pattern Name | Example Match | Severity |
|---|---|---|
| `database_url` | `postgres://user:pass@host/db`, `mongodb+srv://...` | critical |
| `basic_auth_url` | `https://user:pass@host` | high |
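One way to confirm that a URL really embeds a password is to parse it rather than rely on the regex alone. In this sketch a loose regex finds `scheme://userinfo@...` candidates and Python's `urllib.parse.urlsplit` decides whether a password is present. The regex, the `DATABASE_SCHEMES` set, and the function names are assumptions for illustration.

```python
import re
from typing import Optional
from urllib.parse import urlsplit

# Loose candidate finder for "scheme://userinfo@rest" (hypothetical).
URL_WITH_USERINFO = re.compile(
    r"[a-z][a-z0-9+.\-]*://[^\s/@]+@[^\s\"'<>]+", re.IGNORECASE
)
# Schemes this sketch treats as database URLs (an assumption).
DATABASE_SCHEMES = {"postgres", "postgresql", "mysql", "mongodb", "mongodb+srv", "redis"}


def classify_credential_url(url: str) -> Optional[str]:
    """Return 'database_url', 'basic_auth_url', or None if no password is present."""
    try:
        parts = urlsplit(url)
        password = parts.password
    except ValueError:  # e.g. malformed bracketed IPv6 host
        return None
    if not password:
        return None  # e.g. https://user@host carries no secret
    scheme = parts.scheme.lower()
    if scheme in DATABASE_SCHEMES:
        return "database_url"
    if scheme in ("http", "https"):
        return "basic_auth_url"
    return None


def find_credential_urls(text: str) -> list[tuple[str, str]]:
    results = []
    for match in URL_WITH_USERINFO.finditer(text):
        kind = classify_credential_url(match.group(0))
        if kind is not None:
            results.append((kind, match.group(0)))
    return results
```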
Generic Patterns (Entropy-Gated)
These patterns require a Shannon entropy above 4.5 bits per character to avoid false positives on placeholder values:
| Pattern Name | What It Matches | Severity |
|---|---|---|
| `generic_password` | `password = "..."` (8+ chars) | high |
| `generic_api_key` | `api_key = "..."` (16+ chars) | high |
| `generic_secret` | `secret = "..."` (16+ chars) | high |
| `env_file_assignment` | `export SECRET_KEY=...` | warning |
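A minimal sketch of an entropy-gated pattern, assuming entropy is computed over the captured value's character frequencies as H = -Σ p·log2(p). `GENERIC_API_KEY` is a simplified, hypothetical form of `generic_api_key`, and the function names are illustrative.

```python
import math
import re
from collections import Counter

ENTROPY_THRESHOLD = 4.5  # bits per character, as stated above

# Simplified, hypothetical form of the generic_api_key pattern.
GENERIC_API_KEY = re.compile(r"""api_key\s*=\s*["']([^"']{16,})["']""", re.IGNORECASE)


def shannon_entropy(value: str) -> float:
    """H = -sum(p * log2(p)) over character frequencies, in bits per character."""
    if not value:
        return 0.0
    length = len(value)
    return -sum((n / length) * math.log2(n / length) for n in Counter(value).values())


def find_generic_api_keys(text: str) -> list[str]:
    """Keep only captured values that clear the entropy gate."""
    return [
        m.group(1)
        for m in GENERIC_API_KEY.finditer(text)
        if shannon_entropy(m.group(1)) > ENTROPY_THRESHOLD
    ]
```

With this per-character definition, entropy cannot exceed log2 of the number of distinct characters in the value, so clearing 4.5 bits requires at least 23 distinct characters (log2 23 ≈ 4.52); an 8-character value tops out at 3 bits.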
Shannon Entropy Analysis
Beyond pattern matching, the secrets detector performs an entropy sweep on text near secret-related keywords. This catches credentials that do not match any specific pattern but have the statistical profile of a random secret.
Proximity keywords
The entropy sweep activates when text appears near keywords such as `key`, `secret`, `token`, `password`, `credential`, `auth`, `private`, `api_key`, `access_key`, `bearer`, and `authorization`.
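A hypothetical sketch of a proximity-gated sweep. Only the keyword list comes from this page. The 40-character look-back window, the 20-character minimum candidate length, the candidate character class, case-insensitive keyword matching, and reusing the 4.5 bits/char threshold are all assumptions of this sketch, not documented lumen-argus settings.

```python
import math
import re
from collections import Counter

# Keyword list from the text above; case-insensitive matching is assumed.
PROXIMITY_KEYWORDS = re.compile(
    r"key|secret|token|password|credential|auth|private|api_key|access_key|bearer|authorization",
    re.IGNORECASE,
)
# Illustrative tuning values (assumptions, not documented settings).
CANDIDATE = re.compile(r"[A-Za-z0-9+/=_\-]{20,}")
WINDOW = 40      # characters of context searched before each candidate
THRESHOLD = 4.5  # bits per character


def shannon_entropy(value: str) -> float:
    length = len(value)
    return -sum((n / length) * math.log2(n / length) for n in Counter(value).values())


def entropy_sweep(text: str) -> list[str]:
    """Return high-entropy tokens that appear shortly after a secret-related keyword."""
    hits = []
    for match in CANDIDATE.finditer(text):
        context = text[max(0, match.start() - WINDOW):match.start()]
        if PROXIMITY_KEYWORDS.search(context) and shannon_entropy(match.group(0)) > THRESHOLD:
            hits.append(match.group(0))
    return hits
```

Gating on a nearby keyword keeps the sweep from flagging every random-looking string in a request, such as hashes or base64 payloads in unrelated code.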
PII Detector
The PII detector uses regex patterns with domain-specific validators to reduce false positives. Where a pattern defines a validator, a match must pass it before a finding is produced; patterns listed with no validation report every regex match.
Patterns
- Email addresses
  - Pattern: standard email format
  - Severity: warning
  - Validation: none (a regex match is sufficient)
- US Social Security numbers
  - Pattern: `NNN-NN-NNNN`
  - Severity: critical
  - Validation: range checks reject area `000`, `666`, and `900`+, group `00`, and serial `0000`
- Credit card numbers
  - Pattern: 13-19 digit card numbers (with optional spaces or dashes)
  - Severity: critical
  - Validation: Luhn checksum
- US phone numbers
  - Pattern: US phone numbers with optional `+1`, parentheses, dots, or dashes
  - Severity: warning
  - Validation: none
- International phone numbers
  - Pattern: `+N NNNN...` (country code + 4-14 digits)
  - Severity: info
  - Validation: none
- IPv4 addresses
  - Pattern: `N.N.N.N` (dotted quad)
  - Severity: info
  - Validation: excludes private ranges (`10.0.0.0/8`, `172.16.0.0/12`, `192.168.0.0/16`), loopback (`127.0.0.0/8`), and link-local (`169.254.0.0/16`)
- IBANs
  - Pattern: two-letter country code + 2 check digits + up to 30 alphanumeric characters
  - Severity: warning
  - Validation: MOD-97 checksum (ISO 13616)
- Letter-prefixed 8-digit identifiers
  - Pattern: one uppercase letter followed by 8 digits
  - Severity: info
  - Validation: none
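The validators listed above are standard checks. This Python sketch implements the Luhn, SSN range, IBAN MOD-97, and IPv4 exclusion rules as described on this page. The function names are hypothetical and the code is not taken from lumen-argus.

```python
import ipaddress
import re


def luhn_valid(number: str) -> bool:
    """Luhn checksum for a 13-19 digit card number (spaces/dashes allowed)."""
    digits = re.sub(r"[ \-]", "", number)
    if not re.fullmatch(r"[0-9]{13,19}", digits):
        return False
    total = 0
    # Walk right to left, doubling every second digit after the check digit.
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def ssn_valid(ssn: str) -> bool:
    """Reject area 000, 666, and 900-999; group 00; serial 0000."""
    m = re.fullmatch(r"([0-9]{3})-([0-9]{2})-([0-9]{4})", ssn)
    if not m:
        return False
    area, group, serial = m.groups()
    if area in ("000", "666") or int(area) >= 900:
        return False
    return group != "00" and serial != "0000"


def iban_valid(iban: str) -> bool:
    """MOD-97 check: the rearranged numeric form mod 97 must equal 1."""
    s = iban.replace(" ", "").upper()
    if not re.fullmatch(r"[A-Z]{2}[0-9]{2}[A-Z0-9]{1,30}", s):
        return False
    rearranged = s[4:] + s[:4]  # move country code and check digits to the end
    numeric = "".join(str(int(ch, 36)) for ch in rearranged)  # A=10 ... Z=35
    return int(numeric) % 97 == 1


# Ranges this page says are excluded from IP address findings.
EXCLUDED_NETWORKS = [
    ipaddress.ip_network(cidr)
    for cidr in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16", "127.0.0.0/8", "169.254.0.0/16")
]


def reportable_ipv4(value: str) -> bool:
    """True for a valid IPv4 address outside the excluded ranges."""
    try:
        addr = ipaddress.IPv4Address(value)
    except ValueError:
        return False
    return not any(addr in net for net in EXCLUDED_NETWORKS)
```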
Proprietary Content Detector
The proprietary detector catches two categories: sensitive file types being sent to AI providers and confidentiality keywords in request content.
File Pattern Blocklist
Matches against the file pattern blocklist are reported at critical severity.
Keyword Detection
Keywords are matched case-insensitively against the full request body text.
- Critical keywords: `CONFIDENTIAL`, `PROPRIETARY`, `TRADE SECRET`, `DO NOT DISTRIBUTE`, `INTERNAL ONLY`, `NDA REQUIRED`
- Warning keywords: `DRAFT`, `PRE-RELEASE`, `UNRELEASED`
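A minimal sketch of case-insensitive keyword matching with per-keyword severities. The page only states that matching is case-insensitive. Requiring word boundaries (so `DRAFT` does not fire inside "redrafted") and accepting any run of whitespace inside multi-word phrases are assumptions of this sketch. `keyword_regex` and `find_keywords` are hypothetical names.

```python
import re

# Keyword lists from the text above.
CRITICAL_KEYWORDS = ["CONFIDENTIAL", "PROPRIETARY", "TRADE SECRET",
                     "DO NOT DISTRIBUTE", "INTERNAL ONLY", "NDA REQUIRED"]
WARNING_KEYWORDS = ["DRAFT", "PRE-RELEASE", "UNRELEASED"]


def keyword_regex(words: list[str]) -> re.Pattern:
    """Build one case-insensitive alternation. Word boundaries and flexible
    whitespace inside phrases are assumptions of this sketch."""
    phrases = (r"\s+".join(re.escape(part) for part in w.split()) for w in words)
    return re.compile(r"\b(?:" + "|".join(phrases) + r")\b", re.IGNORECASE)


KEYWORD_RULES = [
    ("critical", keyword_regex(CRITICAL_KEYWORDS)),
    ("warning", keyword_regex(WARNING_KEYWORDS)),
]


def find_keywords(body: str) -> list[tuple[str, str]]:
    """Return (severity, matched_text) for every keyword occurrence in the body."""
    return [(severity, m.group(0)) for severity, rx in KEYWORD_RULES for m in rx.finditer(body)]
```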
Severity Levels
All findings carry one of four severity levels:
| Level | Meaning | Typical Action |
|---|---|---|
| critical | Active credentials, keys, or highly sensitive data | block |
| high | Probable secrets, passwords with high entropy | block or alert |
| warning | Possible PII, sensitive keywords, draft markers | alert |
| info | Low-confidence signals (international phone, public IP) | log |
Finding Deduplication
When the same secret appears multiple times in a request (common with autocomplete context), findings are collapsed into a single entry with a count.
lumen-argus: 3 finding(s) detected
[CRITICAL] secrets: aws_access_key (x47)
[WARNING] pii: email (x3)
[INFO] pii: ip_address
Deduplication uses a composite key of `(detector, type, matched_value)`. The count reflects how many times the identical value was found across all scanned fields in the request.
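A sketch of the deduplication step under stated assumptions: `Finding`, `Severity`, `dedupe`, and `render` are hypothetical names; sorting by descending severity is inferred from the example output above; and the `(xN)` suffix appears only when a value occurs more than once, as in that example. Because `matched_value` is part of the key, two different AWS keys would still produce two entries.

```python
from collections import Counter
from dataclasses import dataclass
from enum import IntEnum


class Severity(IntEnum):
    # Higher value = more severe, so sorting can rank findings.
    INFO = 0
    WARNING = 1
    HIGH = 2
    CRITICAL = 3


@dataclass(frozen=True)
class Finding:
    detector: str
    type: str
    matched_value: str
    severity: Severity


def dedupe(findings: list[Finding]) -> list[tuple[Finding, int]]:
    """Collapse findings that share (detector, type, matched_value)."""
    counts: Counter = Counter()
    first: dict = {}
    for f in findings:
        key = (f.detector, f.type, f.matched_value)
        counts[key] += 1
        first.setdefault(key, f)
    return [(first[key], n) for key, n in counts.items()]


def render(findings: list[Finding]) -> str:
    """Format deduplicated findings, most severe first."""
    grouped = sorted(dedupe(findings), key=lambda item: item[0].severity, reverse=True)
    lines = [f"lumen-argus: {len(grouped)} finding(s) detected"]
    for finding, count in grouped:
        suffix = f" (x{count})" if count > 1 else ""
        lines.append(f"[{finding.severity.name}] {finding.detector}: {finding.type}{suffix}")
    return "\n".join(lines)
```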