Changelog¶

All notable changes to lumen-argus are documented here.

0.6.0 (2026-03-23)¶

MCP Security Hardening (Phase 2)¶

Confused deputy protection: RequestTracker tracks outbound request IDs, rejects unsolicited responses from MCP servers (FIFO eviction at 10K, seeding gate, configurable warn/block)
Tool description poisoning detection: 7 pattern categories (instruction tags, file exfiltration, cross-tool manipulation, dangerous exec, download+exec, script injection, command injection)
Tool drift/rug-pull detection: SHA-256 baselines in mcp_tool_baselines DB table, human-readable diff summary on change (description length, added text, parameter changes)
Session binding: validates tools/call against tool inventory from first tools/list response (opt-in via mcp.session_binding, 10K tool cap, configurable warn/block)
All 4 proxy modes (stdio, HTTP bridge, HTTP listener, WS bridge) wired with all security features
Config: mcp.request_tracking, mcp.unsolicited_response_action, mcp.scan_tool_descriptions, mcp.detect_drift, mcp.drift_action, mcp.session_binding, mcp.unknown_tool_action
31 new tests (935 total)

MCP Proxy Unification (Phase 1)¶

Unified lumen-argus mcp subcommand replaces mcp-wrap with flag-based transport modes
Stdio subprocess mode: lumen-argus mcp -- <command> (replaces mcp-wrap)
HTTP bridge mode: lumen-argus mcp --upstream http://... (stdio client, HTTP upstream)
HTTP reverse proxy mode: lumen-argus mcp --listen :8089 --upstream http://...
WebSocket bridge mode: lumen-argus mcp --upstream ws://...
New lumen_argus/mcp/ package with transport abstraction (scanner, transport, proxy, env_filter)
Environment variable restriction for subprocess mode — safe vars only by default, --env KEY=VALUE to add more, --no-env-filter to disable
Config: mcp.env_filter, mcp.env_allowlist in YAML
--action flag to override default action per invocation
Removed mcp_scanner.py and mcp_wrap.py (replaced by lumen_argus/mcp/ package)

Rules Performance Optimization (Phase 1)¶

Aho-Corasick pre-filter: single O(n) pass narrows 1,700+ rules to ~15 candidates per field
Literal extraction from regex patterns via sre_parse (handles alternation, (?i), escapes)
Early termination: stop after first match when action is block
Hot-first ordering: rules sorted by hit_count DESC on reload
In-memory hit count accumulation with 60s periodic batch flush to DB
Graceful fallback: if pyahocorasick unavailable, sequential scan (current behavior)
Pro metrics hook: extensions.set_rule_metrics_collector(collector)
pyahocorasick>=2.0 added as dependency (pre-built wheels for all major platforms)
Benchmark: 184KB/53 rules: 236ms → 36ms (under 50ms target)
Parallel rule batching: when enabled (Pro toggle) and candidates > 50, groups rules by detector category and evaluates concurrently via ThreadPoolExecutor
RulesDetector.set_parallel(bool) for runtime toggle from Pipeline page
Pipeline page: "Parallel rule evaluation" toggle in Advanced section
Config: pipeline.parallel_batching in YAML + DB overrides, applied at startup and SIGHUP
Accelerator factory hook: extensions.set_accelerator_factory(factory) — Pro/Enterprise can swap Aho-Corasick with a custom pre-filter engine (e.g., Hyperscan/Vectorscan); graceful fallback if factory raises

0.5.0 (2026-03-22)¶

Async Proxy (Phase 1)¶

New async_proxy.py — aiohttp.web-based proxy server replacing ThreadingHTTPServer
Non-blocking I/O with coroutine per request instead of thread per request
CPU-bound scanning runs in thread pool via asyncio.to_thread()
SSE streaming via StreamResponse + async iteration
Built-in connection pooling via aiohttp.ClientSession with TCPConnector
Retry on connection errors (aiohttp.ClientConnectionError)
SIGHUP reload via loop.add_signal_handler()
All existing functionality preserved: MCP detection, history stripping, session extraction, response scanning
aiohttp>=3.9 added as dependency
18 new integration tests, 867 total tests passing

WebSocket Unification (Phase 2)¶

WebSocket relay moved from standalone port 8083 into the main proxy on port 8080
Client connects via ws://localhost:8080/ws?url=ws://target
Uses aiohttp.web.WebSocketResponse + ClientSession.ws_connect() — no separate websockets package needed
websockets dependency removed — aiohttp handles both HTTP and WebSocket
SSRF protection: only ws:// and wss:// schemes allowed
Origin validation configurable via websocket.allowed_origins
SIGHUP reloads scanner config without server restart (same port)
Dependencies reduced from 3 (pyyaml, aiohttp, websockets) to 2 (pyyaml, aiohttp)

WebSocket Connection Lifecycle Hooks¶

Each WebSocket connection assigned a unique connection_id (UUID)
Extension hook fires on open, finding_detected, and close events
ws_connections SQLite table tracks connection history (target URL, origin, duration, frame counts, findings, close code)
Default community hook records to analytics store; Pro can override for richer analytics
Hook calls run in thread pool via asyncio.to_thread() — no event loop blocking
Connection data included in daily retention cleanup
WebSocket findings now enforced by policy (block closes connection, alert logs + continues)
REST API: GET /api/v1/ws/connections, GET /api/v1/ws/stats

Cleanup¶

Removed dead proxy.py (old ThreadingHTTPServer) and pool.py (old connection pool)
Removed legacy test files (test_proxy_integration.py, test_pool.py)
Session tracking tests updated to use async_proxy module

Thread Safety (Python 3.13+ free-threaded / no-GIL)¶

ScannerPipeline.scan() snapshots shared references (_allowlist, _policy, _decoder, _detectors) under _reload_lock before scanning — prevents torn reads during reload()
ScannerPipeline.reload() swaps references under the same _reload_lock
AsyncArgusProxy._active_requests uses threading.Lock for atomic increment/decrement
Safe for PYTHON_GIL=0 (PEP 703) — all shared mutable state is properly synchronized

0.4.0 (2026-03-20)¶

Rules Engine¶

DB-backed detection rules replace hardcoded Python pattern files
rules table in SQLite: name, pattern, detector, severity, action, enabled, tier, source, description, tags, validator
CLI: lumen-argus rules import/export/list/validate subcommands
Auto-import 43 community rules on first serve (opt out: --no-default-rules)
RulesDetector: loads compiled patterns from DB, license-gated for Pro rules
Validator registry: luhn, ssn_range, iban_mod97, exclude_private_ips
Capture-group-aware matching: group(1) preferred over group(0)
Pipeline uses RulesDetector when DB has rules, falls back to hardcoded detectors
YAML custom_rules: reconciled to DB on startup/SIGHUP (Kubernetes-style)
SIGHUP reloads rules from DB via RulesDetector.reload()
Configurable: rules.auto_import: false to skip auto-import

Cross-Request Deduplication¶

3-layer dedup architecture eliminates redundant scanning of conversation history
Layer 1: Content fingerprinting — per-session SHA-256 hash set skips already-scanned fields before detectors run
Layer 2: Finding-level TTL cache — session-scoped (detector, type, matched_value_hash, session_id) suppresses duplicate DB writes
Layer 3: Store-level unique constraint — content_hash column with UNIQUE(content_hash, session_id) index, INSERT OR IGNORE
Configurable via dedup: config section (conversation_ttl_minutes, finding_ttl_minutes, max_conversations, max_hashes_per_conversation)
Background cleanup schedulers for both content fingerprint and finding caches
All findings remain in ScanResult for policy enforcement — dedup only affects DB recording
Notification dispatcher still receives all findings (has its own cooldown)
seen_count column tracks how many requests included each finding (dashboard shows ×N badge)
content_hash uses hash(matched_value) — no collisions between different secrets with same masked preview
bump_seen_counts() increments existing findings when conversation history is re-sent

Value Hashing¶

HMAC-SHA-256 hash of matched secret values stored as value_hash in findings DB
Enables cross-session secret tracking without persisting raw secrets
Auto-generated 32-byte key at ~/.lumen-argus/hmac.key (0600 permissions)
Full 64 hex chars output (256 bits, no truncation)
Configurable: analytics.hash_secrets (default: true)
Dashboard detail panel shows "Value Hash" field when populated

0.3.0 (2026-03-19)¶

Session Tracking¶

Per-request session context extraction: account_id, session_id, device_id, source_ip, working_directory, git_branch, os_platform, client_name, api_key_hash
Claude Code metadata.user_id JSON string parsing (account_uuid, device_id, session_id)
System prompt field extraction for working directory, git branch, and OS platform
User-Agent parsing for client tool identification
Derived fingerprint (fp:<hash>) fallback when no provider session ID
9 session columns in findings DB (no migration — direct schema update)
GET /api/v1/sessions endpoint with grouped finding counts
GET /api/v1/findings supports session_id and account_id filters
Dashboard: Session, Account, Device, Branch, Client columns in findings table
Session filter dropdown with clickable session IDs
api_key_hash excluded from JSONL audit log (stored in analytics DB only)
post_scan hook signature updated with session= kwarg (backward-compatible)

0.2.0 (2026-03-19)¶

Notification Channels¶

Notifications page unlocked in community dashboard (freemium: 1 channel any type, unlimited with Pro)
7 channel types available: webhook, email, Slack, Teams, PagerDuty, OpsGenie, Jira
Kubernetes-style YAML reconciliation — YAML is fully authoritative
Dashboard: CRUD for dashboard-managed channels, read-only for YAML channels with badge
Source-aware buttons: YAML channels get Toggle + Test, dashboard channels get full CRUD
Audit trail: created_by / updated_by fields on all channel operations
Channel limit enforcement with atomic count + insert under same lock
Sensitive field masking in API responses (webhook URLs, passwords, API keys)

Observability¶

/health endpoint: added uptime field, extension hook for Pro enrichment
/metrics endpoint: extension hook for Pro to append Prometheus metrics
OpenTelemetry tracing hook: set_trace_request_hook() wraps full request lifecycle
Span attributes: provider, body.size, findings.count, action, scan.duration_ms
All hooks fully guarded — exceptions never break requests

CI/CD¶

Docker smoke test in GitHub Actions (build image, verify /health)
Workflow permissions hardened (contents: read)
Concurrency: cancel in-progress runs on new push
README: CI status badge

Documentation¶

MkDocs Material site at lumen-argus.github.io/lumen-argus/docs/
Landing page at lumen-argus.github.io/lumen-argus/ with comparison table
GitHub Pages auto-deploy via Actions

Open Source¶

LICENSE (MIT, Artem Senenko)
SECURITY.md (responsible disclosure policy)
CONTRIBUTING.md (setup, constraints, commit format)
Issue templates (bug report, feature request, security link)
PR template (stdlib-only, security checklist)
Repo topics, description, homepage

0.1.0 (2026-03-17)¶

Initial release of the lumen-argus Community Edition.

Detection¶

34+ secret detection patterns (AWS, GitHub, Anthropic, OpenAI, Google, Stripe, Slack, JWT, database URLs, PEM keys, generic passwords)
Shannon entropy analysis (>4.5 bits/char near secret keywords)
PII detection with validation: email, SSN (range validation), credit card (Luhn), phone, IP (excludes private), IBAN (MOD-97), passport
Proprietary code detection: file pattern blocklist, keyword detection
Custom regex rules in config (unlimited, SIGHUP reloadable)
Duplicate finding deduplication

Proxy¶

Transparent HTTP proxy for AI coding tools (Claude Code, Copilot, Cursor)
Provider auto-detection: Anthropic, OpenAI, Gemini
SSE streaming passthrough via read1()
Connection pooling with idle timeout
Backpressure via max_connections semaphore
Graceful drain on shutdown (drain_timeout)
Custom CA bundle support for corporate proxies
/health JSON endpoint
/metrics Prometheus endpoint

Scanning¶

lumen-argus scan for file and stdin scanning
--diff mode for git pre-commit hooks (staged changes or ref diff)
--baseline / --create-baseline for known finding suppression
Differentiated exit codes (0=clean, 1=block, 2=alert, 3=log)
JSON output format for CI pipelines

Configuration¶

Bundled YAML parser (no PyYAML dependency)
Global + project-level config with merge semantics
Hot-reload via SIGHUP with config diff logging
Allowlists for secrets, PII, and paths (exact + glob)
Per-detector action overrides

Logging¶

Rotating application log (~/.lumen-argus/logs/lumen-argus.log)
Secure file permissions (0o600, atomic creation)
Startup summary at INFO (version, Python, OS, config, detectors)
Block/redact actions at INFO, slow scans at WARNING
lumen-argus logs export --sanitize for safe log sharing
Thread-safe JSONL audit log with retention policy

Extensions¶

Plugin system via Python entry points
Hooks: pre_request, post_scan, evaluate, config_reload, redact
Public API: pipeline.reload(), registry.set_proxy_server()

Security¶

Localhost-only binding (127.0.0.1, enforced at runtime)
matched_value never written to disk
Async-signal-safe shutdown handlers
TLS certificate verification with custom CA support