Crossfire¶
Find duplicate and overlapping regex rules before they create duplicate alerts.
If you maintain detection rules for secret scanning, DLP, SAST, YARA, or IDS — you probably have rules that fire on the same input. Crossfire finds them.
$ crossfire scan my_rules.json
========================================================================
Crossfire Analysis Report
Rules: 142 from 1 file(s) | Corpus: 4,260 strings | Time: 3.2s
========================================================================
Duplicates (3 pairs)
Rule A Rule B Jaccard Recommendation
aws_key_v1 aws_key_v2 1.00 Keep A
slack_hook slack_webhook_url 0.94 Keep A
npm_token npm_auth_token 1.00 Keep A
Subsets (5 pairs)
Subset Rule Superset Rule A->B %
stripe_sk_live generic_secret 98%
github_pat_v2 github_token 100%
Quality Insights
Broad patterns (2):
generic_secret overlaps with 15 rules
email overlaps with 27 rules
Summary: Drop 7 rules, review 3, 2 broad patterns
The Problem¶
Detection rule sets grow over time. Different team members add rules independently. You merge rules from upstream projects. Eventually:
- Multiple rules match the same secret (duplicate findings)
- Broad rules like
generic_secretsilently cover what specific rules already catch - Nobody knows which rules to remove without breaking coverage
No existing tool solves this. Regex equivalence is undecidable for real-world patterns (anchors, lookahead, backreferences). YARA dedup tools only find exact matches. Secret scanners handle duplicates at the output level — too late.
How It Works¶
Crossfire uses a corpus-based approach: generate synthetic strings from each rule's regex, test every rule against every string, and measure overlap empirically.
This handles any regex feature Python supports — anchors, lookahead, backreferences, Unicode. No structural analysis limitations.
Performance¶
Tested on 1,537 rules (50 samples/rule, Docker):
| Step | Time |
|---|---|
| Load + validate | <1s |
| Generate corpus (parallel) | ~5s |
| Cross-evaluate (83M regex matches) | ~2s |
| Classify + quality assessment | <1s |
| Total | ~8s |
On Linux, Crossfire uses fork-based workers that inherit pre-compiled patterns via copy-on-write — zero serialization and recompilation overhead.
Want it faster? Install with RE2 support for accelerated regex compilation:
See Installation for details.