Proxy Server¶
The lumen-argus serve command starts a transparent HTTP proxy on localhost
that intercepts requests from AI coding tools, scans them for sensitive data,
and forwards clean requests to the upstream API provider over HTTPS.
Starting the Proxy¶
# Default: port 8080, all detectors enabled
lumen-argus serve
# Custom port
lumen-argus serve --port 9090
# With explicit config
lumen-argus serve --config /path/to/config.yaml
# JSON output (for log aggregation)
lumen-argus serve --format json
# Debug logging
lumen-argus serve --log-level debug
CLI Flags¶
| Flag | Default | Description |
|---|---|---|
| --port, -p | 8080 | Port to listen on (overrides config) |
| --config, -c | ~/.lumen-argus/config.yaml | Path to config file |
| --log-dir | ~/.lumen-argus/audit | Audit log directory |
| --format, -f | text | Output format: text or json |
| --no-color | off | Disable ANSI color output |
| --log-level | warning | Console log verbosity: debug, info, warning, error |
Provider Auto-Detection¶
lumen-argus automatically routes requests to the correct upstream API based on the request path and headers. No manual provider configuration is needed.
Anthropic is detected by:
- Path starts with /v1/messages or /v1/complete
- Header x-api-key is present
- Header anthropic-version is present
- Authorization header contains Bearer sk-ant-
Upstream: https://api.anthropic.com
OpenAI is detected by:
- Path starts with /v1/chat/completions
- Path starts with /v1/completions
- Path starts with /v1/embeddings
- Authorization header contains Bearer sk-
Upstream: https://api.openai.com
Gemini is detected by:
- Path contains /generateContent
- Path starts with /v1beta/
Upstream: https://generativelanguage.googleapis.com
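The rules above overlap in two places: /v1/complete is a prefix of /v1/completions, and Bearer sk- is a prefix of Bearer sk-ant-, so the order of the checks matters. The following is a minimal Python sketch of one possible ordering, not the actual lumen-argus implementation. It assumes that any single matching rule selects a provider and that path rules are evaluated before header rules; detect_provider and UPSTREAMS are hypothetical names.

```python
from typing import Mapping, Optional

# Upstream base URLs as listed in this section.
UPSTREAMS = {
    "anthropic": "https://api.anthropic.com",
    "openai": "https://api.openai.com",
    "gemini": "https://generativelanguage.googleapis.com",
}


def detect_provider(path: str, headers: Mapping[str, str]) -> Optional[str]:
    """Return a key of UPSTREAMS, or None when no rule matches (hypothetical helper)."""
    h = {k.lower(): v for k, v in headers.items()}
    auth = h.get("authorization", "")

    # Path rules, most specific first: "/v1/complete" is a prefix of "/v1/completions".
    if path.startswith(("/v1/chat/completions", "/v1/completions", "/v1/embeddings")):
        return "openai"
    if path.startswith(("/v1/messages", "/v1/complete")):
        return "anthropic"
    if "/generateContent" in path or path.startswith("/v1beta/"):
        return "gemini"

    # Header rules: test the narrower "sk-ant-" prefix before the broader "sk-".
    if "x-api-key" in h or "anthropic-version" in h or "Bearer sk-ant-" in auth:
        return "anthropic"
    if "Bearer sk-" in auth:
        return "openai"
    return None
```

With this ordering, a request to /v1/completions resolves to OpenAI even though it also matches the Anthropic /v1/complete prefix.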
Connecting AI Tools¶
Point each tool's base URL environment variable at the proxy:
# Claude Code
export ANTHROPIC_BASE_URL=http://localhost:8080
# GitHub Copilot / OpenAI-compatible tools
export OPENAI_BASE_URL=http://localhost:8080
# Gemini
export GEMINI_BASE_URL=http://localhost:8080
Connection Pooling¶
lumen-argus maintains a per-host connection pool for upstream HTTPS connections. Non-streaming responses return their connection to the pool for reuse. SSE streaming connections are consumed fully and closed (they cannot be reused mid-stream).
proxy:
  timeout: 120          # Idle-read timeout in seconds (1-300)
  connect_timeout: 10   # TCP connect timeout in seconds (1-120)
  retries: 1            # Retry count on connection failure (0-5)
timeout is an idle-read timeout, not a total-duration cap — a long-running SSE
stream (e.g. a 1M-context extended thinking response) will not be killed as long
as the upstream keeps sending data. connect_timeout controls how long to wait
for the initial TCP handshake; raise it behind corporate proxy chains or
cross-region VPNs.
Idle connection eviction
Pooled connections are evicted after sitting idle for timeout * 2 seconds.
When a retry occurs after a stale-connection failure, the retry bypasses
the pool and creates a fresh connection.
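One way to picture the eviction rule is a pool that timestamps each connection when it is returned and discards anything that has been idle for longer than timeout * 2. This is an illustrative sketch only: IdlePool and its methods are hypothetical names, connections are stand-in objects, and the injectable clock exists purely to make the behavior easy to exercise.

```python
import time
from collections import deque
from typing import Callable, Deque, Optional, Tuple


class IdlePool:
    """Toy per-host pool that drops connections idle for longer than timeout * 2."""

    def __init__(self, timeout: float, clock: Callable[[], float] = time.monotonic):
        self.max_idle = timeout * 2
        self._clock = clock
        self._idle: Deque[Tuple[float, object]] = deque()

    def release(self, conn: object) -> None:
        # Record the moment the connection went idle.
        self._idle.append((self._clock(), conn))

    def acquire(self) -> Optional[object]:
        now = self._clock()
        while self._idle:
            idle_since, conn = self._idle.pop()  # most recently returned first
            if now - idle_since <= self.max_idle:
                return conn
            # Stale: a real pool would close the socket here before discarding it.
        return None  # caller opens a fresh connection
```

Returning None models the "bypass the pool" case: the caller is expected to create a new connection instead of reusing a pooled one.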
Backpressure¶
The proxy limits concurrent upstream connections with a semaphore to prevent overwhelming the upstream provider during bursts.
When all connection slots are in use, new requests queue on the semaphore. A warning is logged when a request has to wait.
Changing max_connections requires restart
Unlike most config values, max_connections cannot be updated via SIGHUP
reload. The semaphore is initialized at startup and changing it requires
restarting the proxy.
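The queueing behavior described in this section can be sketched with asyncio.Semaphore. This is an assumption-laden illustration, not the proxy's code: UpstreamLimiter is a hypothetical name, and the warning text is invented for the example rather than copied from real lumen-argus logs.

```python
import asyncio
import logging
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")
log = logging.getLogger("backpressure-sketch")


class UpstreamLimiter:
    """Caps concurrent upstream calls; the cap is fixed when the object is created."""

    def __init__(self, max_connections: int):
        self._sem = asyncio.Semaphore(max_connections)

    async def run(self, request_id: object, handler: Callable[[], Awaitable[T]]) -> T:
        if self._sem.locked():
            # All slots are taken, so this request will wait (hypothetical message).
            log.warning("request %s waiting for a free upstream slot", request_id)
        async with self._sem:
            return await handler()
```

Because the semaphore is built once in the constructor, changing the limit means building a new limiter, which mirrors why max_connections needs a restart.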
Graceful Shutdown¶
On SIGINT or SIGTERM, lumen-argus stops accepting new connections and waits
for in-flight requests to complete before exiting.
The shutdown sequence:
- Signal received -- stop accepting new connections
- Wait up to drain_timeout seconds for active requests to finish
- Force-close any requests still active after the timeout (logged as a warning)
- Close the connection pool
- Flush and close audit logs
- Print session summary
A second signal during the drain period forces an immediate exit.
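The drain and force-close steps of the sequence above can be sketched with asyncio.wait. This is a minimal illustration under the assumption that in-flight requests are tracked as asyncio tasks; drain is a hypothetical helper, and the signal handling, pool shutdown, and audit-log steps are omitted.

```python
import asyncio
from typing import Set


async def drain(active: Set["asyncio.Task"], drain_timeout: float) -> int:
    """Wait for in-flight tasks, cancel stragglers, return how many were force-closed."""
    if not active:
        return 0  # asyncio.wait() rejects an empty set
    _done, pending = await asyncio.wait(active, timeout=drain_timeout)
    for task in pending:
        task.cancel()  # force-close requests that outlived the drain window
    # Let the cancellations settle so nothing is left running at exit.
    await asyncio.gather(*pending, return_exceptions=True)
    return len(pending)
```

A caller would log a warning when the returned count is non-zero, then continue with the remaining shutdown steps.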
TLS Configuration¶
Corporate Proxy / Custom CA¶
If your organization uses a TLS-intercepting proxy, provide the CA certificate bundle through the ca_bundle setting.
The ca_bundle value can be a path to a single PEM file or a directory of
certificates. The path is validated at config load time.
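For readers unfamiliar with how a file-or-directory CA setting maps onto Python's ssl module, here is a hedged sketch. build_ssl_context is a hypothetical helper, not the function lumen-argus uses; it only shows that a PEM file goes in as cafile and a directory as capath, and what disabling verification does to the context.

```python
import os
import ssl
from typing import Optional


def build_ssl_context(ca_bundle: Optional[str] = None, verify_ssl: bool = True) -> ssl.SSLContext:
    """Build a client-side context from ca_bundle / verify_ssl style settings."""
    ctx = ssl.create_default_context()  # starts from the system trust store
    if not verify_ssl:
        # Development only. check_hostname must be turned off before CERT_NONE.
        ctx.check_hostname = False
        ctx.verify_mode = ssl.CERT_NONE
        return ctx
    if ca_bundle:
        path = os.path.expanduser(ca_bundle)
        if os.path.isdir(path):
            ctx.load_verify_locations(capath=path)  # directory of certificates
        else:
            ctx.load_verify_locations(cafile=path)  # single PEM file
    return ctx
```

Note that load_verify_locations adds to the trust store rather than replacing it, so the system CAs remain trusted in this sketch.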
Disable Verification (Development Only)¶
Never disable TLS verification in production
Setting verify_ssl: false disables all certificate checks, making the
connection vulnerable to man-in-the-middle attacks. A warning is logged on
every startup when this is set.
TLS settings reload on SIGHUP -- the SSL context is rebuilt and all pooled
connections are closed so new ones pick up the updated certificates.
SSE Streaming Passthrough¶
For streaming API responses (used by Claude, ChatGPT, and other LLM providers),
lumen-argus uses read1() for low-latency chunk forwarding. Each chunk is
written to the client and flushed immediately, preserving the real-time
streaming experience.
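The forwarding loop can be sketched as follows. read1() returns as soon as some data is available instead of waiting to fill the whole buffer, which is what keeps latency low. The function name relay_stream, the write and flush callables, and the chunk size are assumptions made for this example.

```python
import io
from typing import Callable


def relay_stream(
    upstream: io.BufferedIOBase,
    write: Callable[[bytes], object],
    flush: Callable[[], object],
    chunk_size: int = 8192,
) -> int:
    """Copy a streaming body chunk by chunk, flushing after every write."""
    total = 0
    while True:
        # read1() does at most one underlying read, so a partial chunk is
        # forwarded immediately rather than held until chunk_size bytes arrive.
        chunk = upstream.read1(chunk_size)
        if not chunk:
            break  # upstream closed the stream
        write(chunk)
        flush()
        total += len(chunk)
    return total
```

Any object that provides read1(), such as http.client.HTTPResponse or io.BufferedReader, can serve as the upstream argument here.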
Scanning happens on the request, not the response
lumen-argus scans the outbound request body (your prompts and context). Response streams are forwarded without inspection -- the goal is to prevent sensitive data from leaving your machine, not to filter what comes back.
Performance¶
The proxy is designed to add minimal overhead to each request:
| Metric | Target |
|---|---|
| Scan latency | < 50ms per request |
| Scan budget | First 200KB of request body |
| Pattern compilation | At import time (zero per-request cost) |
| Connection reuse | Pooled HTTPS connections for non-streaming requests |
Bodies larger than max_body_size (default 50MB) skip scanning entirely with a
warning logged. This prevents memory issues with unusually large payloads.
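The two size limits interact as sketched below: an oversized body is not scanned at all, and otherwise only the leading portion is scanned. This is a simplified illustration; bytes_to_scan is a hypothetical helper, the warning text is invented, and interpreting 200KB and 50MB as binary units (multiples of 1024) is an assumption.

```python
import logging
from typing import Optional

log = logging.getLogger("scan-budget-sketch")

SCAN_BUDGET = 200 * 1024          # assumption: 200KB means 200 * 1024 bytes
MAX_BODY_SIZE = 50 * 1024 * 1024  # assumption: 50MB means 50 * 1024 * 1024 bytes


def bytes_to_scan(
    body: bytes,
    max_body_size: int = MAX_BODY_SIZE,
    scan_budget: int = SCAN_BUDGET,
) -> Optional[bytes]:
    """Return the slice of the body to scan, or None when scanning is skipped."""
    if len(body) > max_body_size:
        log.warning("body of %d bytes exceeds max_body_size, skipping scan", len(body))
        return None
    return body[:scan_budget]
```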
Local Endpoints¶
The proxy exposes two local endpoints that are handled directly (not forwarded upstream):
Health Check¶
Metrics¶
Returns Prometheus exposition format metrics including total requests, active requests, and per-provider statistics.
Configuration Reload¶
Send SIGHUP to the proxy process (for example, kill -HUP <pid>) to reload the config file without restarting.
Reloadable settings include:
- Default action and per-detector action overrides
- Allowlists (secrets, PII, paths)
- Custom rules (recompiled on reload)
- Timeout, connect timeout, and retry counts
- port and bind (graceful rebind; in-flight requests complete on the old address)
- max_body_size (scan limit and aiohttp rejection limit)
- File log level
Settings that require a restart:
- max_connections (aiohttp connector is initialized at startup)
- ca_bundle and verify_ssl (SSL context is baked into the connector)
Settings that take effect on next shutdown:
- drain_timeout
Passthrough Mode¶
The proxy supports an active/passthrough mode toggle for disabling inspection without stopping the proxy:
# Disable inspection (forward everything without scanning)
curl -X PUT localhost:8081/api/v1/config \
-H "Content-Type: application/json" \
-d '{"proxy.mode": "passthrough"}'
# Re-enable inspection
curl -X PUT localhost:8081/api/v1/config \
-H "Content-Type: application/json" \
-d '{"proxy.mode": "active"}'
In passthrough mode:
- All requests are forwarded without scanning (no findings, no blocking)
- Audit trail still logs requests with action=pass
- MCP scanning, response scanning, and WebSocket frame scanning are all skipped
- GET /api/v1/status includes "mode": "active" or "mode": "passthrough"
- SSE emits a mode-changed event on transition
- A mode_changed finding (severity=warning) is recorded when switching to passthrough
The mode is persisted in the SQLite config overrides — it survives proxy restarts.
No auth required
Any process that can reach the dashboard API can change the mode. This will be gated on auth/RBAC when available.
Internal Fail-Open¶
The proxy never returns 5xx to an AI tool because of an internal scanning bug. All scanning paths are wrapped in try/except:
- Pipeline scan failure → request forwarded unscanned, scan_error finding logged (severity=critical)
- MCP detection failure → MCP scanning skipped, request forwarded normally
- MCP argument scan failure → arguments not scanned, request forwarded
- WebSocket frame scan failure → frame relayed without scanning
- Response scan failure → response returned unmodified (already had try/except)
A spike in scan_error findings is a clear signal that a detector or rule has a bug.
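The fail-open pattern for the pipeline scan path can be sketched as a wrapper around the scan call. This is an illustration of the idea rather than the proxy's code: scan_fail_open, the scan and record_finding callables, and the finding's dictionary shape are all hypothetical.

```python
from typing import Callable, List


def scan_fail_open(
    scan: Callable[[bytes], List[dict]],
    body: bytes,
    record_finding: Callable[[dict], object],
) -> List[dict]:
    """Run a scan; on any internal error, record scan_error and return no findings."""
    try:
        return scan(body)
    except Exception as exc:  # broad on purpose: a detector bug must not block traffic
        record_finding({"type": "scan_error", "severity": "critical", "detail": repr(exc)})
        return []  # no findings, so the request is forwarded unscanned
```

The trade-off is explicit: availability of the AI tool wins over inspection, and the scan_error finding is the only trace that a request went out unscanned.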
Relay + Engine Architecture¶
For fault isolation, the proxy can run as two separate processes:
AI Tool → Relay (:8080) → Engine (:8090) → Upstream LLM Provider
            ↓ (engine down, fail-open)
            └──────────────────────────→ Upstream LLM Provider
- Relay (port 8080): lightweight HTTP forwarder, ~400 lines, no scanning imports, near-zero crash risk
- Engine (port 8090): full inspection pipeline, rules, findings, dashboard — this is where bugs happen
Three runtime modes¶
# Separate processes (tray app spawns both)
lumen-argus engine --port 8090
lumen-argus relay --port 8080 --engine http://localhost:8090 --fail-mode open
# Combined in one process
lumen-argus serve --engine-port 8090 --fail-mode open
# Standard (no relay, backwards compatible)
lumen-argus serve
How the relay works¶
The relay has a 3-state machine:
| State | Behavior |
|---|---|
| STARTING | Queue requests for queue_on_startup seconds, then apply fail_mode |
| HEALTHY | Forward all traffic to engine |
| UNHEALTHY | Apply fail_mode policy |
A background task health-checks the engine every health_check_interval seconds via GET /health. The engine returns 503 while its pipeline is loading ("starting"), then 200 when ready ("ready"). The relay only marks the engine as healthy on 200.
Fail modes¶
| Engine state | fail_mode | Relay behavior | Tool experience |
|---|---|---|---|
| Healthy | any | Forward via engine | Normal (inspected) |
| Unhealthy | open | Forward directly to upstream provider | Normal (uninspected) |
| Unhealthy | closed | Return 503 | Request fails |
| Starting | any | Queue briefly, then apply fail_mode | Brief delay |
Default: open (developer tools keep working even if the engine crashes).
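The state and fail-mode tables reduce to a small routing decision. The sketch below is one reading of them, with assumptions stated: EngineState, route, the returned action strings, and the startup_window_open flag (true while the queue_on_startup period has not yet elapsed) are hypothetical names introduced for illustration.

```python
from enum import Enum


class EngineState(Enum):
    STARTING = "starting"
    HEALTHY = "healthy"
    UNHEALTHY = "unhealthy"


def route(state: EngineState, fail_mode: str, startup_window_open: bool = False) -> str:
    """Return 'engine', 'queue', 'upstream', or 'reject_503' for one request."""
    if fail_mode not in ("open", "closed"):
        raise ValueError(f"unknown fail_mode: {fail_mode!r}")
    if state is EngineState.HEALTHY:
        return "engine"  # inspected path
    if state is EngineState.STARTING and startup_window_open:
        return "queue"  # hold the request briefly while the engine loads
    # UNHEALTHY, or STARTING after the queue window has elapsed: apply fail_mode.
    return "upstream" if fail_mode == "open" else "reject_503"
```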
Relay health endpoint¶
Configuration¶
relay:
  port: 8080
  fail_mode: open
  engine_url: http://localhost:8090
  health_check_interval: 2
  health_check_timeout: 1
  queue_on_startup: 2
  timeout: 150
  connect_timeout: 10
  max_connections: 50
engine:
  port: 8090
The relay timeout (150s) is intentionally higher than the engine timeout (120s) to account for scanning overhead. Both timeout and connect_timeout use the same idle-read / TCP-connect semantics as the proxy.
SIGHUP reload¶
Both relay and engine support kill -HUP <pid> for config reload:
- Engine: reloads rules, allowlists, detectors, timeouts, port/bind, mode
- Relay: reloads fail_mode, engine_url, health check intervals, timeout
Request tracing¶
The relay adds X-Request-ID: relay-N and X-Forwarded-For headers when forwarding to the engine. This enables correlating logs across the two processes.
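As a rough illustration of the header enrichment, the sketch below stamps a sequential relay-N ID and records the client address. Several details are assumptions rather than documented behavior: add_tracing_headers is a hypothetical name, the counter starts at 1, and an existing X-Forwarded-For value is extended with a comma-separated entry instead of being replaced.

```python
import itertools
from typing import Dict, Mapping

_request_counter = itertools.count(1)  # assumption: N starts at 1 per relay process


def add_tracing_headers(headers: Mapping[str, str], client_ip: str) -> Dict[str, str]:
    """Return a copy of headers with X-Request-ID and X-Forwarded-For set."""
    out = dict(headers)
    out["X-Request-ID"] = f"relay-{next(_request_counter)}"
    # Assumption: append to an existing chain rather than overwrite it.
    prior = out.get("X-Forwarded-For")
    out["X-Forwarded-For"] = f"{prior}, {client_ip}" if prior else client_ip
    return out
```

Searching both processes' logs for the same relay-N value then ties a relay log line to the corresponding engine log line.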
Forward Proxy Mode¶
Some AI tools hardcode their API endpoints and don't support custom base URLs. For these tools, the agent provides a forward proxy mode that uses TLS interception (via mitmproxy) to scan traffic transparently.
How it works¶
AI Tool → HTTPS_PROXY=:9090 → Agent (mitmproxy TLS intercept)
↓ (decrypt + add identity headers)
Proxy (:8080) via /_forward
↓ (scan + forward to original host)
api.individual.githubcopilot.com
The tool thinks it's talking directly to the API. The agent terminates TLS
with a local CA certificate, inspects the request, enriches it with identity
headers, and re-routes AI traffic to the proxy's /_forward endpoint. Non-AI
hosts pass through without TLS interception.
Supported tools¶
| Tool | Why forward proxy is needed |
|---|---|
| Copilot CLI (GitHub auth) | COPILOT_PROVIDER_BASE_URL activates BYOK mode and breaks GitHub authentication |
Setup¶
# Start agent with both reverse relay and forward proxy
lumen-argus-agent relay --port 8070 --upstream http://proxy:8080 --forward-proxy-port 9090
# View CA certificate path
lumen-argus-agent forward-proxy ca-path
# Install CA to system trust store (requires admin)
sudo lumen-argus-agent forward-proxy install-ca
# Generate tool aliases
lumen-argus-agent forward-proxy aliases
Tool-specific aliases¶
Forward proxy uses shell aliases instead of global HTTPS_PROXY to avoid
routing all terminal HTTPS traffic through the proxy:
# Added to ~/.zshrc (or via lumen-argus-agent forward-proxy aliases)
source ~/.lumen-argus/forward-proxy-aliases.sh
The aliases file contains entries like:
alias copilot='HTTPS_PROXY=http://localhost:9090 NODE_EXTRA_CA_CERTS=~/.lumen-argus/ca/ca-cert.pem copilot'
Only the aliased tools route through the forward proxy. Other terminal tools
(curl, pip, brew, git) are unaffected.
CA certificate¶
The agent generates a CA certificate on first forward proxy start. The
certificate is stored at ~/.lumen-argus/ca/ca-cert.pem and must be trusted
by the tool's runtime:
- Node.js tools (Copilot CLI): NODE_EXTRA_CA_CERTS env var (set by alias)
- System-wide: sudo lumen-argus-agent forward-proxy install-ca
Findings¶
Forward proxy findings appear in the same findings table with
intercept_mode: forward and original_host populated from the
X-Lumen-Forward-Host header (e.g., api.individual.githubcopilot.com)
so you can distinguish traffic by its pre-interception destination. All
identity fields (hostname, username, working directory) are populated
via PID resolution, same as reverse proxy.
/_forward access control¶
The proxy's /_forward route has a three-state authentication gate,
checked in order:
- Pro mode (auth provider registered). Only authenticated agents pass the gate. Unauthenticated callers are rejected with HTTP 403 and an authentication_error body. Providers that raise AuthenticationError (explicit token rejection) surface as HTTP 401. This protects multi-tenant deployments where the proxy may hold agent-scoped credentials or enforce per-agent egress policy.
- Community mode (no auth provider, non-loopback bind). Rejected with HTTP 403 for the same reason as Pro. On a network-exposed bind (e.g. Docker --host 0.0.0.0) the proxy cannot rely on "every caller is local", so the relaxed gate would turn the proxy into an open SSRF relay to arbitrary hosts via X-Lumen-Forward-Host.
- Community mode (no auth provider, loopback bind). The gate is skipped. Loopback callers are already local processes that could reach the same upstream host directly; the proxy builds forwarding headers purely from the incoming request (with X-Lumen-* stripped) and does not inject credentials on the caller's behalf, so /_forward grants no capability the caller does not already have. The first such request emits a one-shot INFO log line (#N /_forward: community mode ... — trusting local callers).
This means the community forward-proxy path works out of the box for
workstation deployments (lumen-argus-agent forward-proxy start +
shell aliases), and the security posture automatically upgrades the
moment a Pro extension registers an auth provider via
extensions.set_agent_auth_provider() or the operator rebinds the
proxy to a non-loopback address.
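The three-state gate can be sketched as a single decision function. This is a simplified model under stated assumptions, not the proxy's implementation: forward_gate and is_loopback are hypothetical names, the auth provider is modeled as a callable that returns True or False, and the AuthenticationError class defined here is a local stand-in for the real exception type.

```python
import ipaddress
from typing import Callable, Mapping, Optional


class AuthenticationError(Exception):
    """Stand-in for the provider's explicit token-rejection error."""


def is_loopback(bind_host: str) -> bool:
    if bind_host == "localhost":
        return True
    try:
        return ipaddress.ip_address(bind_host).is_loopback
    except ValueError:
        return False  # unknown hostname: treat as network-exposed


def forward_gate(
    auth_provider: Optional[Callable[[Mapping[str, str]], bool]],
    headers: Mapping[str, str],
    bind_host: str,
) -> Optional[int]:
    """Return None when the request may proceed, else the HTTP status to reject with."""
    if auth_provider is not None:  # Pro mode
        try:
            authenticated = auth_provider(headers)
        except AuthenticationError:
            return 401
        return None if authenticated else 403
    if not is_loopback(bind_host):  # community mode, network-exposed bind
        return 403
    return None  # community mode, loopback bind: gate skipped
```

Treating unresolvable hostnames as non-loopback keeps the sketch on the strict side when the bind address cannot be classified.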
Security Model¶
Loopback by default
The proxy binds to 127.0.0.1 by default. Non-loopback binds
(e.g. --host 0.0.0.0 for Docker deployments) are supported but log
a warning at startup (async_proxy/_server.py:50), because a
network-exposed proxy changes the threat model. Specifically, the
community-mode /_forward gate relaxation only applies when the
bind is loopback — on non-loopback binds, /_forward requires an
authenticated agent even in community mode. If you run the proxy on
0.0.0.0, either register an auth provider (Pro) or use reverse-proxy
mode (ANTHROPIC_BASE_URL=http://proxy:8080) instead of forward proxy.
When bound to loopback, the proxy receives plain HTTP on localhost and upgrades to HTTPS for upstream connections. The unencrypted local hop never leaves the machine.