Declared Write-Sets on TransformerDescriptor
Date: 2026-06-11
Status: Superseded in part by Correspondence-First Engine — TransformerDescriptor was removed in the migration; the write-set discipline carried over to rule descriptors and is mechanized in Invariant and Lint Tiers
Context
TransformerDescriptor declared only what a transformer READS — the
match_types/match_tags/match_actions/match_artifacts/node_shape
dispatch filters. What a transformer WRITES (the tags it adds, the action
values it sets, the item types and artifact formats it introduces) was
undeclared, discoverable only by reading the implementation.
That gap had concrete costs. Audits like the one behind the
pure-reorder collapse had to
grep every transformer body to learn who emits which tag; nothing checked
that a transformer's emissions stayed inside what its author believed it
emitted (the ColumnReorderDetector tags.clear() bug lived undetected
in exactly that blind spot); and there was no machine-readable way to ask
"which tags are single-producer/single-consumer dispatch channels?"
MLIR's experience with open vocabularies suggests that the cheap, high-value move is a
dependentDialects analogue: transformers declare the vocabularies they
may emit, and a verifier checks those declarations.
Decision
Add four declared write-sets to TransformerDescriptor, with builders in
the existing style: emits_tags, emits_actions, emits_item_types,
and publishes_artifacts (artifact formats). emits_item_types exists
because transformers do write item types — TableSplitter rewrites a
node to tabular_collection and creates tabular children.
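A minimal sketch of how the four fields and their builders might look, assuming a consuming-builder style. The struct below is trimmed to the write-set fields (the real descriptor also carries the match_* read filters), and the helper `declared` is hypothetical.

```rust
/// Hypothetical, trimmed-down TransformerDescriptor: only the four write-set
/// fields from this ADR are shown; the match_* read filters are omitted.
#[derive(Debug, Clone, Default, PartialEq)]
pub struct TransformerDescriptor {
    pub name: String,
    pub emits_tags: Option<Vec<String>>,
    pub emits_actions: Option<Vec<String>>,
    pub emits_item_types: Option<Vec<String>>,
    /// Artifact formats this transformer may publish.
    pub publishes_artifacts: Option<Vec<String>>,
}

/// Turns borrowed names into an explicit (Some) declaration.
fn declared(items: &[&str]) -> Option<Vec<String>> {
    Some(items.iter().map(|s| s.to_string()).collect())
}

impl TransformerDescriptor {
    /// Starts with every write-set left as None (undeclared).
    pub fn new(name: &str) -> Self {
        Self { name: name.to_string(), ..Self::default() }
    }

    pub fn emits_tags(mut self, tags: &[&str]) -> Self {
        self.emits_tags = declared(tags);
        self
    }

    pub fn emits_actions(mut self, actions: &[&str]) -> Self {
        self.emits_actions = declared(actions);
        self
    }

    pub fn emits_item_types(mut self, item_types: &[&str]) -> Self {
        self.emits_item_types = declared(item_types);
        self
    }

    pub fn publishes_artifacts(mut self, formats: &[&str]) -> Self {
        self.publishes_artifacts = declared(formats);
        self
    }
}
```

In this sketch, leaving a builder uncalled keeps that field None, while calling it with an empty slice records an explicit "writes nothing of this kind".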
Declared vs. legacy is distinguishable. Each field is
Option<Vec<…>> with #[serde(default)]: None means "legacy plugin,
nothing declared" and is exempt from enforcement; Some(vec![]) means
"writes nothing" and is enforced. This deliberately inverts the
match_* convention — in READ fields, empty means unconstrained; in
WRITE fields, empty means writes nothing. The asymmetry is documented
on the struct. The Option is the escape hatch that lets third-party
plugins compiled against older SDKs keep loading and running unchecked.
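The READ/WRITE asymmetry can be made concrete with two small predicates. This is a sketch only; `read_filter_admits`, `write_set_check` and `WriteCheck` are hypothetical names, not the SDK's API.

```rust
/// READ side: an empty match_tags list is unconstrained and admits every tag;
/// a non-empty list admits only its members.
pub fn read_filter_admits(match_tags: &[String], tag: &str) -> bool {
    match_tags.is_empty() || match_tags.iter().any(|t| t == tag)
}

/// Outcome of checking one emitted tag against a WRITE field.
#[derive(Debug, PartialEq, Eq)]
pub enum WriteCheck {
    /// Field is None: legacy plugin, nothing declared, not enforced.
    Exempt,
    /// Field is Some and lists the tag.
    Allowed,
    /// Field is Some but omits the tag; Some(vec![]) rejects every emission.
    Violation,
}

/// WRITE side: None is exempt, while an empty Some means "writes nothing".
pub fn write_set_check(emits_tags: &Option<Vec<String>>, tag: &str) -> WriteCheck {
    match emits_tags {
        None => WriteCheck::Exempt,
        Some(declared) if declared.iter().any(|t| t == tag) => WriteCheck::Allowed,
        Some(_) => WriteCheck::Violation,
    }
}
```

The same empty list therefore means opposite things on the two sides: on the read side it admits everything, on the write side it forbids everything.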
Never for scheduling or dispatch. Write-sets are for verification, lint, and future capability negotiation only. No ordering logic may be built on them. The rationale is LLVM's fifteen-year arc: the legacy pass manager had declared dependencies plus a scheduler, and the new pass manager — like MLIR after it — abandoned that for explicit, user-ordered pipelines. Declared effects fed to a solver rot into untruthful declarations precisely because they are load-bearing; declared effects checked by a verifier stay honest because lying fails the build. Binoc's pipeline order remains an explicit config list ("config order is semantics, no solver").
Wire visibility. Descriptors already cross the C ABI inside
PluginDescription (the _binoc_plugin_describe registration payload),
so the new #[serde(default)] fields are wire-visible with no request
struct changes; TransformRequest carries nodes, not descriptors, and is
untouched. Per the SDK compatibility policy,
additive #[serde(default)] fields do not bump the compatibility floor
(MIN_COMPATIBLE_MINOR stays 1); the SDK minor version bumps 0.1 → 0.2
so a plugin built against the write-set SDK is identifiable and is not
loaded by older hosts that would silently ignore its declarations.
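The versioning consequence can be sketched as a host-side predicate. The rule encoded here (within major 0, load a plugin if and only if its SDK minor lies between the host's MIN_COMPATIBLE_MINOR and the host's own minor) is an assumption about the SDK compatibility policy; `host_accepts` and `HOST_SDK_MINOR` are hypothetical names.

```rust
/// Hypothetical host constants: this host is SDK 0.2 (write-set aware), and
/// the additive write-set change left the compatibility floor at minor 1.
pub const HOST_SDK_MINOR: u32 = 2;
pub const MIN_COMPATIBLE_MINOR: u32 = 1;

/// Assumed load rule for major version 0: the plugin must be at or above the
/// host's floor and must not be newer than the host itself.
pub fn host_accepts(host_minor: u32, min_compatible_minor: u32, plugin_minor: u32) -> bool {
    plugin_minor >= min_compatible_minor && plugin_minor <= host_minor
}
```

Under that rule a 0.2 host still loads 0.1 plugins, whose write-set fields deserialize to None and run unchecked, while a 0.1 host refuses a 0.2 plugin instead of silently discarding its declarations.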
Harness enforcement, not runtime. The test-vector harness's
AbiTransformer wrapper snapshots the facts of each transform call's
input subtree (tags, actions, item types, artifact formats anywhere in
the tree) and asserts that everything new in the output subtree(s) is
inside the transformer's declared write-set. A violation is a test
failure naming the transformer and the undeclared emission. Because
every stdlib vector runs through the ABI-wrapped registry, every
transformer pass on every vector is checked; production runs pay
nothing. The set-difference semantics means moving an existing tag or
action between nodes is not an "emission" — only introducing one the
input tree didn't have.
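A minimal sketch of the set-difference check, assuming a simplified node shape. `Node`, `Fact`, `WriteSets`, `collect_facts` and `undeclared_emissions` are hypothetical names, not the harness's actual types, and this sketch treats each write-set field independently, so a None field is exempt while its siblings are still enforced.

```rust
use std::collections::BTreeSet;

/// Hypothetical, simplified diff-tree node.
#[derive(Debug, Clone, Default)]
pub struct Node {
    pub item_type: String,
    pub tags: Vec<String>,
    pub action: Option<String>,
    pub artifact_formats: Vec<String>,
    pub children: Vec<Node>,
}

/// One observable fact; the variant keeps a tag and an action with the same
/// spelling distinct.
#[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Ord)]
pub enum Fact {
    Tag(String),
    Action(String),
    ItemType(String),
    ArtifactFormat(String),
}

/// Declared write-sets; None marks a legacy field that is not enforced.
pub struct WriteSets {
    pub tags: Option<BTreeSet<String>>,
    pub actions: Option<BTreeSet<String>>,
    pub item_types: Option<BTreeSet<String>>,
    pub artifact_formats: Option<BTreeSet<String>>,
}

/// Collects every fact found anywhere in the subtree rooted at `node`.
fn collect_facts(node: &Node, out: &mut BTreeSet<Fact>) {
    out.insert(Fact::ItemType(node.item_type.clone()));
    out.extend(node.tags.iter().cloned().map(Fact::Tag));
    out.extend(node.action.iter().cloned().map(Fact::Action));
    out.extend(node.artifact_formats.iter().cloned().map(Fact::ArtifactFormat));
    for child in &node.children {
        collect_facts(child, out);
    }
}

/// True if the fact's kind is undeclared (None) or its value is listed.
fn is_declared(ws: &WriteSets, fact: &Fact) -> bool {
    let (field, value) = match fact {
        Fact::Tag(v) => (&ws.tags, v),
        Fact::Action(v) => (&ws.actions, v),
        Fact::ItemType(v) => (&ws.item_types, v),
        Fact::ArtifactFormat(v) => (&ws.artifact_formats, v),
    };
    field.as_ref().map_or(true, |set| set.contains(value))
}

/// Facts present in the output subtree(s) but absent from the input subtree
/// and not covered by the declared write-sets; empty means the check passed.
pub fn undeclared_emissions(input: &Node, outputs: &[Node], declared: &WriteSets) -> Vec<Fact> {
    let mut before = BTreeSet::new();
    collect_facts(input, &mut before);
    let mut after = BTreeSet::new();
    for output in outputs {
        collect_facts(output, &mut after);
    }
    after
        .difference(&before)
        .filter(|fact| !is_declared(declared, fact))
        .cloned()
        .collect()
}
```

Because only facts absent from the whole input subtree count, this sketch also lets a transformer copy an existing tag onto a new node without complaint, which is the trade-off weighed under Alternatives Considered.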
Lint for single-producer/single-consumer tags.
single_producer_single_consumer_tags() walks registered descriptors
and flags any tag declared in exactly one emits_tags and matched by
exactly one other transformer's match_tags — the "function call drawn
slowly" shape that the pure-reorder collapse retired. Callers pass an
allowlist for tags that are legitimately consumed outside transformer
dispatch: binoc.cell-change → binoc-row-reorder is the documented
example, allowlisted because renderer group configs also consume the tag
and the consumer genuinely needs its own scan.
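One way the lint could be written, sketched over a hypothetical `Desc` that carries only `name`, `match_tags` and `emits_tags`; the real function walks the registered descriptors and its signature may differ. The sketch reads "matched by exactly one other transformer" as exactly one consumer once the producer itself is excluded, and legacy descriptors (emits_tags: None) never count as producers because they declare nothing.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical minimal descriptor: only the fields the lint reads.
pub struct Desc {
    pub name: String,
    pub match_tags: Vec<String>,
    pub emits_tags: Option<Vec<String>>,
}

/// Flags tags declared by exactly one producer and matched by exactly one
/// other transformer, skipping tags on the caller's allowlist.
pub fn single_producer_single_consumer_tags(descs: &[Desc], allowlist: &[&str]) -> Vec<String> {
    // tag -> names of the transformers that declare it in emits_tags
    let mut producers: BTreeMap<&str, BTreeSet<&str>> = BTreeMap::new();
    for d in descs {
        for tag in d.emits_tags.iter().flatten() {
            producers.entry(tag.as_str()).or_default().insert(d.name.as_str());
        }
    }

    let mut flagged = Vec::new();
    for (tag, prods) in &producers {
        if prods.len() != 1 || allowlist.contains(tag) {
            continue;
        }
        let producer = prods.iter().next().expect("exactly one producer");
        // Count consumers other than the producer itself.
        let consumers = descs
            .iter()
            .filter(|d| d.name != *producer && d.match_tags.iter().any(|m| m == tag))
            .count();
        if consumers == 1 {
            flagged.push(tag.to_string());
        }
    }
    flagged
}
```

A tag such as binoc.cell-change would show up here whenever its only transformer consumer is binoc-row-reorder, which is why the allowlist is a parameter rather than a hard-coded exception.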
Stdlib declares fully. Every stdlib transformer and
binoc-row-reorder declares all four write-sets (audited against the
implementations); a test asserts stdlib never regresses to None. The
declarations are facts about the code, not aspirations — the harness
catches drift in either direction for emissions the vectors exercise.
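The "never regresses to None" test could take roughly this shape. `Descriptor`, `undeclared_fields` and `assert_fully_declared` are hypothetical; a real test would iterate the stdlib registry rather than a hand-built slice.

```rust
/// Hypothetical slimmed descriptor holding only the four write-set fields.
pub struct Descriptor {
    pub name: &'static str,
    pub emits_tags: Option<Vec<&'static str>>,
    pub emits_actions: Option<Vec<&'static str>>,
    pub emits_item_types: Option<Vec<&'static str>>,
    pub publishes_artifacts: Option<Vec<&'static str>>,
}

/// Names of the write-set fields still left as None (legacy/undeclared).
pub fn undeclared_fields(d: &Descriptor) -> Vec<&'static str> {
    let fields = [
        ("emits_tags", d.emits_tags.is_none()),
        ("emits_actions", d.emits_actions.is_none()),
        ("emits_item_types", d.emits_item_types.is_none()),
        ("publishes_artifacts", d.publishes_artifacts.is_none()),
    ];
    fields.iter().filter(|(_, missing)| *missing).map(|(name, _)| *name).collect()
}

/// Regression guard: panics, naming the descriptor, if any field is None.
pub fn assert_fully_declared(descs: &[Descriptor]) {
    for d in descs {
        let missing = undeclared_fields(d);
        assert!(missing.is_empty(), "{} leaves {:?} undeclared", d.name, missing);
    }
}
```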
Out of scope, recorded deliberately: inherent-vs-discardable tag
classification (MLIR's other half), cost functions, any change to
transformer ordering or recompare, and the comparator descriptor refactor
— though the schema is shaped to unify later into a reads/writes pair
shared with ComparatorDescriptor (comparators publish artifacts and
emit actions too); a doc comment on TransformerDescriptor sketches it.
Alternatives Considered
Enforce at runtime in the controller. Rejected: production diffs would pay a full tree walk per transformer pass to catch what is a plugin-author bug, and a runtime failure would turn a harmless undeclared annotation into a user-facing crash. The harness sees every stdlib transformer on every vector; third-party authors get the same check by running their vectors through the shared harness.
Use the write-sets to order or skip transformers. Rejected permanently, not deferred — see the LLVM rationale above. The moment declarations drive scheduling, authors are incentivized to game them and the verifier's ground truth is gone.
A single declared: bool flag plus plain Vec fields. Same
expressiveness, but it admits incoherent states (for example,
declared: false alongside non-empty vectors), makes every field's meaning
depend on a sibling flag, and gives up a wire property that Option keeps:
serde's None omission leaves legacy descriptors byte-identical on the wire.
Option<Vec> also makes "undeclared" (None) impossible to confuse with any
declared value, including the empty declaration Some(vec![]).
Per-node (positional) diff instead of tree-set difference. Stricter — it would catch a transformer copying an existing tag onto a new node — but transformers legitimately restructure trees (fold children, split tables, relocate remainders), and node identity across a rewrite is not well-defined. Set semantics over the subtree is the invariant that survives restructuring.