PyDiffWatch detection rules are structured YAML placed in rules/community/*.yaml. They are pure
data: the engine walks them against facts it computed from a package diff. There is no code in a rule
and no eval in the matcher — a rule can describe a match but can never execute. A rule that fails
validation (unknown predicate, wrong-scope predicate, bad value, malformed tree) is dropped at load time
with a logged warning and never evaluated. This is what makes it safe to load rules from anyone.
This guide is written so a human or an LLM assistant can author a valid rule. If you're an assistant: emit YAML that conforms exactly to the schema below; do not invent predicates or fields.
For each changed release the engine builds facts, runs every rule whose match is satisfied, and
sums the weights of the fired rules. If the total reaches the configured threshold (threshold_t,
default 40), the release is escalated to the LLM reviewer. Weights are additive and tunable — there is no
magic; a rule worth 45 on its own crosses the default threshold, a rule worth 20 needs a second signal.
| field | required | meaning |
|---|---|---|
id |
yes | unique identifier (kebab-case) |
applies_to |
yes | evaluation scope: code | binary | dep | maintainer |
weight |
yes | points added when the rule fires (number) |
match |
yes | a boolean tree of predicates (below) |
attack_type |
no | informational label (e.g. credential-exfil) |
location_scaled |
no | if true, the fired weight is multiplied by the file's location weight (code scope only); default false |
description |
no | human/LLM-readable explanation |
Evaluation scope decides what each rule runs against:
code— evaluated once per changed.py/.pyx/.pyifile.binary— once per added non-source / oversized / foreign member.dep— once per newly-added dependency finding.maintainer— once per release.
Rules in binary/dep scope fire once per matching item, and the weights sum — so two foreign-language
files at weight 25 contribute 50 automatically.
match is a single node. A node is either a boolean node or a predicate leaf.
all: [ ...nodes... ]— true if every child is true (AND)any: [ ...nodes... ]— true if any child is true (OR)not: { ...node... }— negation
Predicate leaves (each a one-key mapping). The Scope column says which applies_to it's valid in;
using a predicate outside its scope makes the rule invalid (and dropped).
| predicate | scope | fires when |
|---|---|---|
bound_call: {category: <c>} |
code | an import-bound call of category c is on an added line. c ∈ decode, exec, process, network, credential. Accepts a list: {category: [decode, exec]} (any of). |
bound_call: {name: <fn>} |
code | a bound/builtin call with that exact function name (e.g. system, b64decode) is on an added line |
import_present: {module: <m>} |
code | module m is imported in the file |
regex: {pattern: <re>} |
code | any added line matches the regular expression |
blob_present: true |
code | the file has an added line that looks like an encoded blob (long base64 run, very long line, or high-entropy window) |
syntax_error: true |
code | a complete .py file fails to parse |
location_at_least: <n> |
code | the file's location weight ≥ n (3.0 = auto-exec/auto-import: setup.py/setup.cfg/pyproject.toml/__init__.py/conftest.py/sitecustomize.py/.pth; 1.0 = normal; 0.2 = tests/docs/examples) |
binary_reason: <r> |
binary | the binary's reason is r. r ∈ source-too-large, foreign-language-source, new-binary |
dep_reason: <r> |
dep | the dependency finding's reason is r. r ∈ typosquat, nonexistent, brand-new |
maintainer_changed: true |
maintainer | the owner set changed vs. the prior stored release |
bound_call only counts a call when its receiver resolves, through the file's imports, to a dangerous
origin. os.system(...) counts as process; re.compile(...) does not count as exec;
pickle.loads(...) counts as decode but json.loads(...) does not. This binding is what keeps rules
precise — you can write bound_call: {category: exec} without it firing on every .compile() in the
ecosystem.
1. Download-and-run in an install hook (high confidence, escalates alone):
- id: autoexec-location
applies_to: code
weight: 45
attack_type: install-hook
match:
all:
- location_at_least: 3.0
- any:
- bound_call: {category: process}
- bound_call: {category: exec}
- bound_call: {category: network}2. Credentials read and sent out (exfil):
- id: combo-cred-network
applies_to: code
weight: 45
attack_type: credential-exfil
match:
all:
- bound_call: {category: credential}
- bound_call: {category: network}3. A specific dangerous call by name, anywhere in changed code (accumulating signal):
- id: marshal-loads-used
applies_to: code
weight: 20
description: marshal.loads on added lines — deserializing opaque bytecode is a loader smell.
match:
bound_call: {name: loads}4. A newly-added typosquat dependency (escalates alone):
- id: dep-typosquat
applies_to: dep
weight: 40
attack_type: dependency-typosquat
match:
dep_reason: typosquatDrop the YAML in rules/community/, then check it loads and behaves:
from pathlib import Path
from pydiffwatch.rules import load_rules
from pydiffwatch.engine import triage
from pydiffwatch.config import Config
from pydiffwatch.models import Diff, FileDiff, Hunk
rules = load_rules(Path("rules/community"))
assert any(r.id == "your-rule-id" for r in rules) # if absent, it failed validation (check the logs)
def code(path, lines):
return Diff("p", "1.1", False, [FileDiff(path, "modified",
[Hunk((0, 0), (0, len(lines)), lines, [])], "\n".join(lines))], [])
res = triage(code("setup.py", ["import os", "os.system('id')"]), Config(), rules)
print(res.score, res.escalate, [fr.rule for fr in res.fired_rules])If your rule doesn't appear in load_rules(...), it was rejected as invalid — run with logging at
WARNING to see why.
- Rules are pure data — the loader uses
yaml.safe_loadand the matcher never callseval/exec/getattr. A rule can describe a match; it cannot run code. An invalid rule is dropped at load (fail-closed), so one bad rule can never break the rest of the ruleset. - The
regexpredicate runs your pattern against untrusted package text at match time, using Python's backtrackingreengine. A pathologically nested pattern (e.g.(a+)+$) can backtrack catastrophically and hang. Patterns are length-capped (1000 chars) to bound the surface, but the cap does not prevent catastrophic backtracking. Preferbound_calland the other structural predicates overregex, and keep any regex simple and anchored.
Rules describe observable signals, not conclusions. Keep each rule narrow and explainable; let the
weights and the LLM reviewer combine them. Prefer a precise bound_call over a broad regex. A good rule
is one a reviewer can read and immediately understand why it fired.