Rule Analysis¶

The Rule Analysis page detects duplicate, subset, and overlapping detection rules in your ruleset. It uses Crossfire, an optional corpus-based analysis engine, to generate test strings for each rule and measure how much they overlap.

This helps SecOps/admins clean up redundant rules — reducing scan time, eliminating duplicate alerts, and keeping the ruleset maintainable as it grows.

Installation¶

Crossfire is an optional dependency. Without it, the Rule Analysis tab shows install instructions.

# Via pip extras
pip install lumen-argus[rules-analysis]

# Or directly from GitHub (until PyPI release)
pip install git+https://github.com/lumen-argus/crossfire.git

The Docker image includes crossfire by default.

How it works¶

When you click Analyze, the engine runs four steps:

Load rules — reads all enabled rules from the DB (community, pro, custom)
Generate corpus — creates random strings that match each rule's regex pattern (default: 50 strings per rule)
Cross-evaluate — tests every rule against every other rule's corpus strings to build an overlap matrix
Classify — categorizes each pair as duplicate, subset, overlap, or disjoint based on match ratios

The analysis runs in a background thread. The dashboard shows live progress with a log terminal streaming each step.

Understanding results¶

Duplicates¶

Two rules match the same set of strings. Example: aws_key_v1 and aws_key_v2 both detect AKIA[0-9A-Z]{16} — one is redundant.

Jaccard score measures overlap symmetry (0.0 = no overlap, 1.0 = identical). Duplicates have Jaccard >= the configured threshold (default 0.8).

Action: Disable the redundant rule. The recommendation tells you which to keep based on tier priority (community > custom > pro).

Subsets¶

Rule A's corpus is fully matched by rule B, but not vice versa. Rule B is broader — it catches everything A catches, plus more.

Example: terraform_cloud_token is a subset of generic_secret — the generic pattern matches all Terraform tokens, but the Terraform pattern doesn't match all generic secrets.

Action: This is nuanced. The broader rule catches more, but the specific rule gives better finding names. In a DLP scanner, you often want both — the specific rule for precise alerting and the broad rule as a safety net. Use Review to compare the patterns, then decide:

Disable the subset if the broad rule's finding name is good enough
Dismiss if you want both rules active (the overlap is intentional)

Overlaps¶

Partial overlap — some strings match both rules, but neither fully contains the other. The card shows directional overlap percentages:

A→B: 40% means 40% of rule B's corpus is also matched by rule A
B→A: 30% means 30% of rule A's corpus is also matched by rule B

Action: Review both rules. Overlaps are usually expected between related patterns (e.g., generic_password and generic_password_unquoted). Dismiss if intentional.

Clusters¶

Groups of rules that are all connected by overlaps. If rules A, B, and C all overlap with each other, they form a cluster. Useful for identifying families of related patterns that could be consolidated.

Dashboard actions¶

Disable¶

Disables the recommended rule directly from the analysis page. The rule is toggled off via PUT /api/v1/rules/:name with enabled: false. The card shows a "resolved" state.

Review¶

Opens the Rules page filtered to both rules in the pair (comma-separated search). You can compare patterns, severity, tier, and hit counts side by side.

Dismiss¶

Hides the finding — it won't appear in future results until the next analysis. Dismissed pairs are stored in the DB and survive re-analysis. Use this for intentional overlaps you've reviewed and accepted.

Overlap badges on Rules page¶

Each rule on the Rules page shows a small badge like [2 ovr] if it appears in overlap analysis results. The badge links directly to the Rule Analysis page.

Configuration¶

Configure in config.yaml:

rule_analysis:
  samples: 50           # corpus strings per rule (higher = more stable, slower)
  threshold: 0.8        # overlap fraction for duplicate/subset classification
  seed: 42              # random seed for reproducible results
  auto_on_import: true  # run analysis after rule import

Tuning samples¶

The samples setting controls accuracy vs speed:

Samples	55 rules	1,700 rules	Stability
30	~0.5s	~30s	Borderline pairs may appear/disappear between runs
50	~0.8s	~50s	Good balance for most deployments
100	~1.5s	~2m	Reliable for compliance reporting

Lower samples means faster analysis but borderline overlaps (near the threshold) may fluctuate between runs. The crossfire classifier warns when confidence intervals are wide — increase samples if you see these warnings.

Threshold¶

The threshold controls what counts as a duplicate or subset (default 0.8 = 80% overlap). Lower values catch more pairs but increase noise. Higher values only flag near-exact matches.

Auto-analysis¶

When auto_on_import: true (default), analysis runs automatically in a background thread after:

Community rules auto-import on first startup
lumen-argus rules import CLI command

The dashboard receives an SSE event when analysis completes and auto-refreshes the Rule Analysis page.

API endpoints¶

Endpoint	Method	Description
`/api/v1/rules/analysis`	GET	Get cached results
`/api/v1/rules/analysis`	POST	Trigger new analysis (202, background)
`/api/v1/rules/analysis/status`	GET	Progress with log streaming (`?since=N`)
`/api/v1/rules/analysis/dismiss`	POST	Dismiss a finding pair

Pro enhancements¶

Pro extends the Rule Analysis page with:

Quality scoring — broad pattern detection, specificity analysis
Fully redundant rules list — one-click cleanup
Export — JSON and CSV reports for compliance
Auto-dedup on import — automatically disables Pro rules that duplicate community rules