Transformer dispatch: bottom-up by default, Root for tree-wide walkers¶
Date: 2026-04-16 Status: Implemented
Context¶
TransformerDescriptor previously had a scope field selecting between two
traversal strategies:
TransformScope::Node— bottom-up walk; the transformer is called on every matching node after its children have already been processed.TransformScope::Subtree— top-down walk; at the first matching ancestor the transformer is handed the whole subtree, and the framework does not recurse further.
Both MoveDetector and CopyDetector in binoc-stdlib used Subtree with
NodeShapeFilter::Container. This turned out to be wrong in two ways.
Bug 1: nested matches were invisible¶
In Subtree mode, if the transformer matched a node and returned
Unchanged, the controller treated that as "this subtree is done" and did
not recurse. Concretely: MoveDetector would match the root directory, find
no add/remove pairs at root level, return Unchanged, and the docs/
subdirectory that did have a matching pair was never visited. Unchanged
was documented as "zero cost, no-op," but in Subtree mode it was silently
data-destructive.
Bug 2: the detectors wanted Node all along¶
Even with Bug 1 fixed, MoveDetector/CopyDetector didn't want Subtree
semantics: they wanted "run me on every container, bottom-up, so inner
containers resolve their add/remove pairs first and the parent sees a clean
residual." That's exactly what Node provides.
The underlying conceptual split¶
The two strategies expressed two different things a tree-walking transformer might want:
- A visitor. "Run me on every node that matches my descriptor." Order: bottom-up, so child transformations are the input to the enclosing one. Covers correlation, summarization, rewriting.
- An owner. "When you find a subtree I match, hand it to me and stop. I will walk it myself." Matters when a transformer needs the original (untransformed-by-itself) shape, or wants to suppress recursive self-application.
The visitor covers almost every plugin. The owner shape only matters for a narrow set of cases — most prominently hierarchical dedup (e.g., a whole directory moved, suppress inner move detection on its contents).
Decision¶
Part 1: one default traversal strategy (bottom-up, visit every match)¶
TransformScope is removed. The controller has one default traversal:
bottom-up, visit every matching node. Concretely:
TransformScopeenum deleted frombinoc-sdk.scopefield andwith_scopebuilder dropped fromTransformerDescriptor.- The controller's
apply_transformeris a single bottom-up walk.
For every realistic plugin in the codebase this is a clearing operation, not a semantic change.
Part 2: NodeShapeFilter::Root for tree-wide walkers¶
When we came to generalize move and copy detection to work across container boundaries (files that move out of a zip into a surrounding directory, or whole folder renames that span nested subfolders), we needed something the visitor semantics doesn't provide: run exactly once, at the tree root, and let the transformer walk the tree itself.
Rather than reintroduce a full "owner" scope as a separate axis (which was
what this ADR originally predicted), we added a new variant to the existing
NodeShapeFilter:
pub enum NodeShapeFilter {
Any, // default
Container, // nodes with children
Leaf, // nodes without children
Root, // the tree root — called exactly once per diff
}
The controller threads an is_root flag through its bottom-up recursion
(true only at the outermost call); Root matches iff that flag is set. A
transformer declaring NodeShapeFilter::Root sees the whole tree, exactly
once, and is responsible for its own traversal. When dispatched with Root,
the controller also skips the automatic recursion into children — the
transformer is given the tree as it is and decides what to visit.
This is strictly simpler than a separate owner/subtree axis: it reuses
machinery we already have (the shape filter), is a single variant to add,
and covers every use case we currently need. The two stdlib detectors that
use it —CorrelationDetector and FolderMoveDetector— both run once at
the root. If a future use case needs "owner at an arbitrary matching
subtree," we can add that axis then; Root handles the concrete cases we
have today.
Part 3: per-transformer config¶
Related but separable: Transformer::transform gained a third argument:
fn transform(
&self,
node: DiffNode,
data: &dyn DataAccess,
config: &serde_json::Value,
) -> TransformResult;
This mirrors Renderer::render. The controller reads per-transformer
configuration from DatasetConfig.transformer_config (a BTreeMap<String,
serde_json::Value> keyed by transformer name) and passes the appropriate
slice into each transform call. Transformers that don't need
configuration simply ignore the argument; the default is
serde_json::Value::Null.
FolderMoveDetector.threshold is the first knob that uses this channel;
CorrelationDetector uses it for the aggregate_many_to_many flag. The
wire format (plugin_abi::TransformRequest) carries config across the
ABI boundary.
Alternatives Considered¶
-
Keep
TransformScopeand makeSubtreemean "Unchangedrecurses". Arguably principled, but the semantic distinction is still subtle enough that plugin authors pick the wrong one — the stdlib itself did. -
Reintroduce
TransformScope::Subtreewith owner semantics for the correlation detectors. The original reintroduction criteria in an earlier draft of this ADR. Rejected: owner-scope "pick a matching subtree, stop recursing" is meaningfully more general than what we need. For the two concrete plugins (CorrelationDetector,FolderMoveDetector) the matching subtree is always the root.NodeShapeFilter::Rootis the right-sized primitive; a separate axis would cost a new enum and a new piece of conceptual surface area for no gain. -
Top-down vs. bottom-up traversal as a separate axis decoupled from
Node/Root. Plausible future design — a transformer says "I want to see this node before/after my children are processed." Overshoots current needs. Bottom-up is the right default; add top-down only when a concrete transformer wants it. -
Put config in
DataAccessinstead of a fourth parameter. Rejected: mixes two concerns (I/O and configuration), andDataAccessis a per-session binding while config is per-transformer.
Consequences¶
- Stdlib plugins that need tree-wide correlation declare
NodeShapeFilter::Rootand walk the tree themselves — seebinoc-stdlib/src/transformers/correlation_detector.rsandfolder_move_detector.rs. - Existing plugins keep working: the new
configparameter is the only breaking trait change, and the greenfield rule (AGENTS.md #9) allows it. - Tree-wide walkers pay a full-tree walk per invocation, but they run once per diff — the visitor shape's O(n) amortized cost across many invocations doesn't apply.
- The
Rootdispatch shape is a narrow, well-typed admission that correlation passes are special. If a third use case appears it's already named.
Reintroduction criteria for "owner" scope (if ever needed)¶
We'll add a full owner/subtree mode only when a concrete plugin needs
"run once at the topmost matching ancestor; Replace/ReplaceMany/Remove
are final; Unchanged means 'decline' and the controller keeps descending".
No such plugin exists today. Candidates to watch:
- Hierarchical dedup at arbitrary nesting. E.g., the SQLite plugin
wants to collapse all row-level edits into a single schema-aware
summary — but only at the
.sqlitenode, not at arbitrary containers.Rootdoesn't cover this because the target isn't the root. - Subtree-wide statistical rewriting that needs pre-transformation
shape. E.g., "if this subtree has >200 leaf changes, collapse to a
one-line summary." Works under the default bottom-up visitor if
children propagate counts via
details; breaks if we need information destroyed by bottom-up summarization.
When either arrives with a concrete implementation in hand, add the scope
and document its interaction with NodeShapeFilter::Root.