Skip to content

Crossfire

Find duplicate and overlapping regex rules before they create duplicate alerts.

If you maintain detection rules for secret scanning, DLP, SAST, YARA, or IDS — you probably have rules that fire on the same input. Crossfire finds them.

$ crossfire scan my_rules.json

========================================================================
  Crossfire Analysis Report
  Rules: 142 from 1 file(s) | Corpus: 4,260 strings | Time: 3.2s
========================================================================

  Duplicates (3 pairs)
  Rule A                    Rule B                     Jaccard Recommendation
  aws_key_v1                aws_key_v2                   1.00      Keep A
  slack_hook                slack_webhook_url            0.94      Keep A
  npm_token                 npm_auth_token               1.00      Keep A

  Subsets (5 pairs)
  Subset Rule               Superset Rule               A->B %
  stripe_sk_live            generic_secret                 98%
  github_pat_v2             github_token                  100%

  Quality Insights
  Broad patterns (2):
    generic_secret           overlaps with 15 rules
    email                    overlaps with 27 rules

  Summary: Drop 7 rules, review 3, 2 broad patterns

The Problem

Detection rule sets grow over time. Different team members add rules independently. You merge rules from upstream projects. Eventually:

  • Multiple rules match the same secret (duplicate findings)
  • Broad rules like generic_secret silently cover what specific rules already catch
  • Nobody knows which rules to remove without breaking coverage

No existing tool solves this. Regex equivalence is undecidable for real-world patterns (anchors, lookahead, backreferences). YARA dedup tools only find exact matches. Secret scanners handle duplicates at the output level — too late.

How It Works

Crossfire uses a corpus-based approach: generate synthetic strings from each rule's regex, test every rule against every string, and measure overlap empirically.

Load rules -> Validate regexes -> Generate test strings -> Cross-evaluate -> Classify -> Report

This handles any regex feature Python supports — anchors, lookahead, backreferences, Unicode. No structural analysis limitations.

Performance

Tested on 1,537 rules (50 samples/rule, Docker):

Step Time
Load + validate <1s
Generate corpus (parallel) ~5s
Cross-evaluate (83M regex matches) ~2s
Classify + quality assessment <1s
Total ~8s

On Linux, Crossfire uses fork-based workers that inherit pre-compiled patterns via copy-on-write — zero serialization and recompilation overhead.

Want it faster? Install with RE2 support for accelerated regex compilation:

pip install crossfire-rules[re2]

See Installation for details.