Dedup Layers¶
Cross-request deduplication runs at three layers. Each one prevents a different waste or duplication problem at a different cost tier.
| Layer | Where | Key | Lifetime | Purpose |
|---|---|---|---|---|
| L1 ContentFingerprint | RAM (sharded dict) | (conv_key, hash(field.text)) | TTL + LRU (default 30 min) | Skip re-scanning text already scanned in earlier requests of the same conversation |
| L2 FindingDedup | RAM (sharded dict) | (detector, type, hash(matched_value), session_id) | TTL (default 30 min) | Skip re-recording the same finding within a session |
| L3 UNIQUE partial index | SQLite findings table | (content_hash, session_id, namespace_id) where content_hash and session_id are both non-empty | Persistent | Backstop after process restart; on conflict the insert collapses into the existing row and bumps seen_count |
Layered design means the right work is skipped at the right cost tier — L1 skips entire scans, L2 skips DB writes, L3 catches anything the in-memory layers miss.
Layer 1 — ContentFingerprint¶
Module: packages/proxy/lumen_argus/pipeline/_fingerprint.py
Problem solved: Claude Code (and most AI tools) re-send the entire conversation history on every turn. Without dedup, scanning a 50-message conversation on its 50th turn would re-scan all 50 messages — O(n²) total work over the session.
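To make the quadratic cost concrete, here is a small arithmetic sketch. It assumes, purely for illustration, that each turn adds exactly one new message and that every message is one scan field; the function names are hypothetical.

```python
def scans_without_dedup(turns: int) -> int:
    # Turn k re-sends all k messages, so all k are scanned again.
    return sum(range(1, turns + 1))


def scans_with_dedup(turns: int) -> int:
    # Each message is scanned once, on the turn where it first appears.
    return turns


print(scans_without_dedup(50))  # 1275
print(scans_with_dedup(50))     # 50
```

Under that assumption, a 50-turn session drops from 1275 field scans to 50.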
How it works:
- After extracting ScanFields from the request body, compute a SHA-256 hash of each field's text.
- Look up (conv_key, hash) in the per-conversation seen_hashes set, where conv_key = session.session_id.
- Fields whose hash is already present skip detection entirely.
- New fields get scanned. On non-block actions, hashes are committed to the cache. On block, hashes stay "pending" until the proxy strips the offending content; only then are they committed.
Storage: 16 sharded dicts, each guarded by its own lock for low contention. LRU eviction once max_conversations is reached. TTL eviction (default 30 min) reclaims idle conversations.
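The storage layout described above could look roughly like the following sketch. Only the shard count, the per-shard lock, and the LRU and TTL eviction rules come from the text. The class name, method names, the even split of max_conversations across shards, and the refresh-on-read behaviour are assumptions made for illustration.

```python
import threading
import time
from collections import OrderedDict

SHARD_COUNT = 16  # from the text; everything else here is illustrative


class ShardedConversationStore:
    """Hypothetical sketch: one lock per shard, LRU order, TTL on last access."""

    def __init__(self, max_conversations=10000, ttl_seconds=30 * 60,
                 clock=time.monotonic):
        if max_conversations < SHARD_COUNT:
            raise ValueError("max_conversations must be >= the shard count")
        # Assumption: the global cap is split evenly across shards.
        self._per_shard_cap = max_conversations // SHARD_COUNT
        self._ttl = ttl_seconds
        self._clock = clock
        self._shards = [OrderedDict() for _ in range(SHARD_COUNT)]
        self._locks = [threading.Lock() for _ in range(SHARD_COUNT)]

    def _index(self, conv_key):
        return hash(conv_key) % SHARD_COUNT

    def _hashes(self, shard, conv_key, now, create):
        # TTL eviction: drop conversations idle for longer than the TTL.
        expired = [k for k, (_, last) in shard.items() if now - last > self._ttl]
        for key in expired:
            del shard[key]
        entry = shard.pop(conv_key, None)
        if entry is None:
            if not create:
                return None
            entry = (set(), now)
        hashes = entry[0]
        shard[conv_key] = (hashes, now)  # re-insert at the most-recent end
        # LRU eviction: drop least recently used entries above the cap.
        while len(shard) > self._per_shard_cap:
            shard.popitem(last=False)
        return hashes

    def add(self, conv_key, field_hash):
        i = self._index(conv_key)
        with self._locks[i]:
            self._hashes(self._shards[i], conv_key, self._clock(), True).add(field_hash)

    def contains(self, conv_key, field_hash):
        i = self._index(conv_key)
        with self._locks[i]:
            hashes = self._hashes(self._shards[i], conv_key, self._clock(), False)
            return hashes is not None and field_hash in hashes
```

Because each conversation key maps to exactly one shard, two requests contend for a lock only when their conversations land in the same shard.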
Block-then-strip invariant: if the hashes of a blocked request were committed prematurely, the same content would bypass scanning on retry. The two-phase commit (filter_new_fields returns hashes, commit_hashes is called only on success) prevents this.
Empty session_id: L1 sits behind an if conv_key: gate, so it is skipped entirely when the session_id is empty. Every field is re-scanned on every request, which is slower but correct (no global bucket).
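A minimal sketch of the two-phase commit, assuming a plain unsharded dict and hex SHA-256 digests. The method names filter_new_fields and commit_hashes come from the text; the class name, signatures, and return shapes are assumptions and may differ from the real module.

```python
import hashlib


class FingerprintCache:
    """Hypothetical two-phase sketch; not the project's actual class."""

    def __init__(self):
        self._seen = {}  # conv_key -> set of committed field hashes

    def filter_new_fields(self, conv_key, texts):
        """Phase one: return (texts to scan, pending hashes). Commits nothing."""
        if not conv_key:
            # Empty session_id: no cache lookup, scan everything, nothing pending.
            return list(texts), (conv_key, [])
        seen = self._seen.get(conv_key, set())
        new_texts, pending = [], []
        for text in texts:
            digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
            if digest not in seen:
                new_texts.append(text)
                pending.append(digest)
        return new_texts, (conv_key, pending)

    def commit_hashes(self, pending):
        """Phase two: call only after a non-block action or after stripping."""
        conv_key, hashes = pending
        if conv_key:
            self._seen.setdefault(conv_key, set()).update(hashes)


cache = FingerprintCache()
fields = ["system prompt", "user message with a secret"]

to_scan, pending = cache.filter_new_fields("fp:a3b1c2d4", fields)
# Blocked request: pending is NOT committed, so a retry is scanned again.
retry, _ = cache.filter_new_fields("fp:a3b1c2d4", fields)

cache.commit_hashes(pending)
after_commit, _ = cache.filter_new_fields("fp:a3b1c2d4", fields)
print(len(to_scan), len(retry), len(after_commit))  # 2 2 0
```

Keeping the pending hashes outside the cache until commit is what makes a blocked request safe to retry.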
Layer 2 — FindingDedup¶
Module: packages/proxy/lumen_argus/pipeline/_finding_dedup.py
Problem solved: L1 prevents re-scanning, but if a scan was re-done (after L1 eviction, after a process restart, or because a block-retry path re-scanned the same content) and detected the same finding, you don't want a duplicate analytics row. L2 also collapses repeated findings within a single request (the same secret in two fields).
How it works:
- Pipeline runs detection and policy. result.findings contains every detection.
- filter_new(findings, session_id) looks up (detector, type, hash(matched_value), session_id) for each. New keys are returned; keys seen within the TTL window are dropped.
- Only the filtered list is sent to record_findings() (analytics store).
- All findings, new and dropped, still feed policy enforcement (block / redact / alert / log). Dedup never weakens action.
Storage: 16 sharded dicts, per-shard locks. Pure TTL eviction (default 30 min); no LRU.
Empty session_id: the cache is bypassed entirely (see empty-session contract). Every finding is treated as new — no read, no write — so cross-user collapse is impossible by construction.
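The L2 behaviour can be sketched as follows. The names is_new and filter_new and the key shape come from the text. The Finding tuple, the 16-character value hash, the unsharded dict, and the choice not to refresh the TTL on repeat sightings are assumptions made to keep the example short.

```python
import hashlib
import time
from typing import NamedTuple


class Finding(NamedTuple):  # hypothetical shape, just enough to build the key
    detector: str
    type: str
    matched_value: str


class FindingDedup:
    """Hypothetical TTL-only sketch (single dict instead of 16 shards)."""

    def __init__(self, ttl_seconds=30 * 60, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._seen = {}  # key -> monotonic time of the recorded sighting

    def is_new(self, finding, session_id):
        if not session_id:
            return True  # empty session: bypass, no read and no write
        digest = hashlib.sha256(finding.matched_value.encode("utf-8")).hexdigest()
        key = (finding.detector, finding.type, digest[:16], session_id)
        now = self._clock()
        recorded_at = self._seen.get(key)
        if recorded_at is not None and now - recorded_at <= self._ttl:
            return False  # seen within the TTL window: drop
        self._seen[key] = now
        return True

    def filter_new(self, findings, session_id):
        return [f for f in findings if self.is_new(f, session_id)]


dedup = FindingDedup()
secret = Finding("secrets", "aws_access_key", "AKIAIOSFODNN7EXAMPLE")
findings = [secret, secret]  # same secret detected in two fields

to_record = dedup.filter_new(findings, "fp:a3b1c2d4")
# Policy enforcement still receives all of `findings`;
# only `to_record` goes to the analytics store.
print(len(findings), len(to_record))  # 2 1
```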
Layer 3 — UNIQUE Partial Index¶
Module: packages/proxy/lumen_argus/analytics/schema.py, analytics/findings.py
Problem solved: L1 and L2 are in-memory only. After a process restart both are empty. Without L3, the same conversation re-sent post-restart would write duplicate rows. L3 also serves as a backstop for any code path that bypasses the in-memory layers (e.g. response-scan, MCP-scan, plugin paths).
Index definition:
CREATE UNIQUE INDEX idx_findings_dedup
ON findings(content_hash, session_id, namespace_id)
WHERE content_hash != '' AND session_id != '';
Key fields:
- content_hash = sha256("detector|type|sha256(matched_value)[:16]")[:16], stored on the row.
- session_id: raw session id from SessionContext.
- namespace_id: multi-tenant scope (always 1 in community).
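The content_hash formula can be written out as a short sketch. It assumes hex digests and UTF-8 encoding, neither of which the text specifies, and the function names are hypothetical.

```python
import hashlib


def _sha256_hex(text: str) -> str:
    # Assumption: UTF-8 input and a lowercase hex digest.
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def content_hash(detector: str, finding_type: str, matched_value: str) -> str:
    # Inner hash: the raw secret never appears in the outer hash input.
    value_hash = _sha256_hex(matched_value)[:16]
    return _sha256_hex(f"{detector}|{finding_type}|{value_hash}")[:16]


h = content_hash("secrets", "aws_access_key", "AKIAIOSFODNN7EXAMPLE")
print(len(h))  # 16
```

Because session_id is not part of content_hash, the same secret in two sessions produces the same hash; the index separates the rows through its session_id column.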
Insert path:
INSERT INTO findings (...)
VALUES (...)
ON CONFLICT(content_hash, session_id, namespace_id) WHERE content_hash != '' AND session_id != ''
DO UPDATE SET seen_count = seen_count + 1, timestamp = excluded.timestamp;
A new finding INSERTs with seen_count = 1. The same finding re-sent in the same session (e.g. after a restart wiped L2) UPDATEs the existing row and bumps seen_count. No duplicate rows.
Predicate carve-outs: WHERE content_hash != '' AND session_id != '' excludes two row classes from the constraint:
- content_hash = '': legacy or migration rows that did not compute a hash.
- session_id = '': requests with no resolvable session (see empty-session contract). The constraint must not collapse multiple unknown-session users into a single row.
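The index and insert path above can be exercised end to end with Python's sqlite3 module. The table below is a cut-down, hypothetical schema with only the columns needed for the demonstration; the index and the ON CONFLICT clause are copied from the definitions above. Upsert syntax requires SQLite 3.24 or newer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE findings (
    id INTEGER PRIMARY KEY,
    content_hash TEXT NOT NULL DEFAULT '',
    session_id TEXT NOT NULL DEFAULT '',
    namespace_id INTEGER NOT NULL DEFAULT 1,
    seen_count INTEGER NOT NULL DEFAULT 1,
    timestamp TEXT NOT NULL
);
CREATE UNIQUE INDEX idx_findings_dedup
    ON findings(content_hash, session_id, namespace_id)
    WHERE content_hash != '' AND session_id != '';
""")

UPSERT = """
INSERT INTO findings (content_hash, session_id, namespace_id, timestamp)
VALUES (?, ?, 1, ?)
ON CONFLICT(content_hash, session_id, namespace_id)
    WHERE content_hash != '' AND session_id != ''
DO UPDATE SET seen_count = seen_count + 1, timestamp = excluded.timestamp
"""


def record(content_hash, session_id, ts):
    conn.execute(UPSERT, (content_hash, session_id, ts))


record("7a2b", "fp:a3b1c2d4", "t1")  # new row, seen_count = 1
record("7a2b", "fp:a3b1c2d4", "t2")  # conflict: bumps seen_count to 2
record("7a2b", "", "t3")             # empty session: outside the partial index
record("7a2b", "", "t4")             # also outside: a second plain row

rows = conn.execute(
    "SELECT session_id, seen_count, timestamp FROM findings ORDER BY id"
).fetchall()
print(rows)
```

The populated-session inserts collapse into one row with seen_count = 2, while the two empty-session inserts stay as separate rows.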
Empty session_id contract¶
A session_id of "" reaches the dedup layers when every extraction path failed: no x-session-id header, no x-opencode-session / x-session-affinity, no provider-metadata session, and the derived fingerprint had fewer than two parts to hash. This is rare on healthy traffic but reachable on malformed bodies, probes, or after a regression in session.extract_session.
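As a rough illustration of how an empty session_id can fall out of a fallback chain, consider the sketch below. It is not the project's session.extract_session: the function name, the precedence order, the fingerprint format, and the 8-character digest are all assumptions. Only the header names and the "fewer than two parts" rule come from the text.

```python
import hashlib


def resolve_session_id(headers, provider_session=None, fingerprint_parts=()):
    """Hypothetical fallback chain; returns "" when every path fails."""
    lowered = {name.lower(): value for name, value in headers.items()}
    # 1. Explicit session headers (order is an assumption).
    for name in ("x-session-id", "x-opencode-session", "x-session-affinity"):
        if lowered.get(name):
            return lowered[name]
    # 2. Session carried in provider metadata.
    if provider_session:
        return provider_session
    # 3. Derived fingerprint: needs at least two non-empty parts to hash.
    parts = [p for p in fingerprint_parts if p]
    if len(parts) >= 2:
        digest = hashlib.sha256("|".join(parts).encode("utf-8")).hexdigest()
        return "fp:" + digest[:8]
    return ""  # every extraction path failed


print(repr(resolve_session_id({}, fingerprint_parts=["only-one-part"])))  # ''
```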
The defended contract:
| Layer | Behavior with empty session_id |
|---|---|
| L1 ContentFingerprint | Skipped (if conv_key: gate) — every field re-scans |
| L2 FindingDedup | Cache bypassed — is_new and filter_new early-return; no read, no write |
| L3 UNIQUE partial index | Predicate excludes the row from the constraint — every insert lands as a new row |
Without these carve-outs, all unknown-session requests would share a single bucket per layer, and the first finding would suppress every subsequent one for the TTL window — across all users hitting the same condition.
Observability: two throttled WARNINGs surface the upstream condition without flooding the log:
- argus.pipeline: fired from ScannerPipeline.scan at the L2 boundary.
- argus.analytics: fired from FindingsRepository.record at the L3 boundary. Defence-in-depth: response-scan, MCP-scan, and reload-import paths reach record() without going through the pipeline, so the storage layer is the only place that catches every empty-session insert.
Each warning emits at most once per minute and rolls the suppressed count into the next emission. The implementation lives in packages/proxy/lumen_argus/_throttled_log.py (ThrottledWarning).
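The throttling behaviour could be sketched like this. The class name matches the helper named above, but the constructor, the warn method, and the message format are assumptions, and the sketch is not thread-safe.

```python
import logging
import time


class ThrottledWarning:
    """Hypothetical sketch: at most one WARNING per interval, with a rollup."""

    def __init__(self, logger, interval_seconds=60.0, clock=time.monotonic):
        self._logger = logger
        self._interval = interval_seconds
        self._clock = clock
        self._last_emit = None
        self._suppressed = 0

    def warn(self, message):
        """Emit the warning, or count it if one was emitted within the interval."""
        now = self._clock()
        if self._last_emit is not None and now - self._last_emit < self._interval:
            self._suppressed += 1
            return False
        # Roll the suppressed count into this emission, then reset it.
        self._logger.warning(
            "%s (%d similar warnings suppressed in last %ds)",
            message, self._suppressed, int(self._interval),
        )
        self._last_emit = now
        self._suppressed = 0
        return True
```

In the three-user scenario from the worked example, the first request would emit the warning with a count of 0, the next two would only increment the counter, and the next emission after the window would report 2 suppressed warnings.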
Per-user attribution for empty-session findings is preserved via the source_ip, api_key_hash, hostname, and username columns on each row.
Worked example — populated session_id¶
Three turns of an Anthropic conversation on session_id = "fp:a3b1c2d4". The first user message contains a secret. Constants:
- AWS access key: AKIAIOSFODNN7EXAMPLE (well-known example value)
- value_hash = sha256("AKIAIOSFODNN7EXAMPLE")[:16] = "f3d8a91b2c4e5d6f" (illustrative)
- content_hash = sha256("secrets|aws_access_key|f3d8a91b2c4e5d6f")[:16] = "7a2b…" (illustrative)
Turn 1 — first request¶
Request:
{
  "model": "claude-opus-4-7",
  "system": "You are a coding assistant. cwd=/Users/alice/proj",
  "messages": [
    {"role": "user", "content": "Help me debug this AWS key: AKIAIOSFODNN7EXAMPLE"}
  ]
}
Pipeline trace:
extractor.extract(...) returns 2 ScanFields:
  [0] path="system", text="You are a coding assistant…"
  [1] path="messages[0].content", text="Help me debug this AWS key: AKIA…"
L1 ContentFingerprint:
  conv_key = "fp:a3b1c2d4"
  shard[7]["fp:a3b1c2d4"] absent → both fields are NEW
  pending_hashes = ("fp:a3b1c2d4", ["e4f1…", "9c7a…"])
Detection: hits AWS access key → Finding(detector="secrets", type="aws_access_key", …)
Policy: action = "alert" → commit_hashes(pending_hashes)
L1 after Turn 1:
  shard[7] = {
    "fp:a3b1c2d4": ConversationCache(seen_hashes={"e4f1…", "9c7a…"}, last_access=now)
  }
L2 FindingDedup:
  key = ("secrets", "aws_access_key", "f3d8a91b2c4e5d6f", "fp:a3b1c2d4")
  shard[3][key] absent → new
  shard[3][key] = monotonic_now
L3 SQLite:
  INSERT … content_hash="7a2b…" session_id="fp:a3b1c2d4" seen_count=1
  (no conflict; partial-index predicate true)
DB state: 1 row, seen_count = 1.
Turn 2 — conversation grows¶
Claude Code re-sends history plus a new assistant reply and a new user follow-up:
{
  "messages": [
    {"role": "user", "content": "Help me debug this AWS key: AKIAIOSFODNN7EXAMPLE"},
    {"role": "assistant", "content": "I see the AWS access key in your message…"},
    {"role": "user", "content": "Should I rotate it?"}
  ]
}
L1 lookup on 4 fields (system + 3 messages):
  e4f1… in seen_hashes → SKIP
  9c7a… in seen_hashes → SKIP
  b8e2… NEW ← assistant reply
  5d04… NEW ← "Should I rotate it?"
skipped = 2 → bump_seen_counts("fp:a3b1c2d4")
  UPDATE findings SET seen_count = seen_count + 1 WHERE session_id = 'fp:a3b1c2d4';
  → Turn 1 row now has seen_count = 2.
Detection on the 2 new fields: clean. No findings.
L2: nothing to dedup.
L3: no INSERT.
DB state: 1 row, seen_count = 2.
Turn 3 — user pastes the same key again¶
User: "Just to confirm, this is the key: AKIAIOSFODNN7EXAMPLE" (different surrounding text → different field hash).
L1: new field → scan
Detection: same matched_value as Turn 1
L2: key = ("secrets", "aws_access_key", "f3d8a91b2c4e5d6f", "fp:a3b1c2d4")
    present in shard[3] within TTL → is_new returns False → drop
record_findings called with empty list. No DB write.
DB state unchanged: 1 row, seen_count = 2.
If L2 had also missed (for example, its entry had expired before Turn 3), the re-detected finding would have reached record_findings, and L3's ON CONFLICT … DO UPDATE SET seen_count = seen_count + 1 would have collapsed the duplicate at the storage layer. Defence in depth.
Worked example — empty session_id (the cross-user case)¶
Three users sit behind a central proxy. Each sends a request whose extraction yields session_id = "" and whose detection returns the same aws_access_key finding:
| User | source_ip | api_key_hash | session_id | matched_value |
|---|---|---|---|---|
| A | 10.0.0.1 | aaaa1111 | "" | AKIAIOSFODNN7EXAMPLE |
| B | 10.0.0.2 | bbbb2222 | "" | AKIAIOSFODNN7EXAMPLE |
| C | 10.0.0.3 | cccc3333 | "" | AKIAIOSFODNN7EXAMPLE |
L1: conv_key = "" → if conv_key: is False → L1 skipped for all three.
L2: with the empty-session contract in place,
    is_new(finding, session_id="") returns True without touching the cache.
    All 3 findings reach record_findings.
    Pipeline emits one throttled WARNING (1/min):
      WARNING argus.pipeline empty session_id at finding-dedup boundary:
        provider=anthropic findings=1 (dedup bypassed; 0 similar warnings
        suppressed in last 60s)
    Users B and C land inside the throttle window → counter increments.
L3: predicate WHERE content_hash != '' AND session_id != ''
    User A insert: '7a2b…' != '' AND '' != '' → predicate FALSE
      row not in partial index → no UNIQUE check → plain INSERT
    User B insert: same → plain INSERT
    User C insert: same → plain INSERT
    Storage emits one throttled WARNING (1/min):
      WARNING argus.analytics empty session_id at findings storage:
        provider=anthropic findings=1 (unique-index bypassed; 0 similar
        warnings suppressed in last 60s)
DB state:
  id=1 source_ip=10.0.0.1 api_key_hash=aaaa1111 session_id="" seen_count=1
  id=2 source_ip=10.0.0.2 api_key_hash=bbbb2222 session_id="" seen_count=1
  id=3 source_ip=10.0.0.3 api_key_hash=cccc3333 session_id="" seen_count=1
Per-user attribution is preserved via the network/identity columns. The operator sees two distinct WARNINGs (one per layer) confirming the upstream extraction gap, without a log flood.
Failure-mode matrix¶
| Failure | L1 catches | L2 catches | L3 catches |
|---|---|---|---|
| Conversation history re-sent | ✓ skip re-scan | – | – |
| L1 evicted (TTL/LRU) before retry | ✗ | ✓ skip re-record | ✓ on-conflict bump |
| Process restart between requests | ✗ wiped | ✗ wiped | ✓ on-conflict bump |
| Same secret in two fields of one request | ✗ different field hashes | ✓ same (detector, type, value) key | ✓ same content_hash |
| Code path bypasses pipeline (response-scan, MCP, reload) | ✗ | ✗ | ✓ |
Tuning¶
# .lumen-argus.yaml
dedup:
  conversation_ttl_minutes: 30        # L1 TTL
  finding_ttl_minutes: 30             # L2 TTL
  max_conversations: 10000            # L1 LRU cap (must be >= 16, the shard count)
  max_hashes_per_conversation: 5000   # L1 per-conversation cap
L3 has no tunable — partial-index semantics are fixed in schema.
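A loader for these settings might validate them along the following lines. This is a sketch only: the DedupConfig class is hypothetical, the defaults mirror the values shown above, and apart from the documented max_conversations >= 16 rule the positivity checks are assumptions.

```python
from dataclasses import dataclass

SHARD_COUNT = 16  # from the comment in the config above


@dataclass(frozen=True)
class DedupConfig:
    """Hypothetical container for the `dedup:` section."""

    conversation_ttl_minutes: int = 30
    finding_ttl_minutes: int = 30
    max_conversations: int = 10000
    max_hashes_per_conversation: int = 5000

    def __post_init__(self):
        if self.max_conversations < SHARD_COUNT:
            raise ValueError(f"max_conversations must be >= {SHARD_COUNT}")
        # Assumption: the remaining values must be positive.
        for name in ("conversation_ttl_minutes", "finding_ttl_minutes",
                     "max_hashes_per_conversation"):
            if getattr(self, name) <= 0:
                raise ValueError(f"{name} must be positive")


# `parsed` stands in for the already-parsed YAML document.
parsed = {"dedup": {"finding_ttl_minutes": 60, "max_conversations": 64}}
config = DedupConfig(**parsed.get("dedup", {}))
print(config.finding_ttl_minutes, config.conversation_ttl_minutes)  # 60 30
```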
Related modules¶
- packages/proxy/lumen_argus/pipeline/_pipeline.py: orchestration + L1/L2 call sites
- packages/proxy/lumen_argus/pipeline/_fingerprint.py: L1 implementation
- packages/proxy/lumen_argus/pipeline/_finding_dedup.py: L2 implementation
- packages/proxy/lumen_argus/analytics/schema.py: L3 index definition
- packages/proxy/lumen_argus/analytics/findings.py: L3 INSERT path
- packages/proxy/lumen_argus/_throttled_log.py: ThrottledWarning helper for both empty-session warnings