Invariant and Lint Tiers: Harness, Mechanical, Agent¶
Date: 2026-06-12 Status: Implemented
Context¶
Binoc accumulated invariant checks in scattered, single-purpose homes:
changeset structural invariants hardcoded inside the test-vector harness,
write-set enforcement inside AbiTransformer, a descriptor lint in
test_support, and one-off assertions in per-crate test files. Each new
check faced the same questions with no settled answers: where does it
live, how does a plugin author adopt it, can it warn without failing, and
what about properties that no mechanism can check — behavioral
completeness on untested branches, performance posture, security of
container expansion, layering discipline?
The write-sets work sharpened the gap: the dynamic harness only audits code paths that vectors exercise, source scans only see literal patterns, and a class of contracts (the never-for-scheduling rule, "tags are facts") was enforced purely by ADR prose.
Decision¶
Three tiers, each with a designated home, a failure mode, and an adoption
path. The tier table lives in binoc_sdk::lints (module docs) and
.agents/skills/lint-plugin/SKILL.md so both humans and agents find it.
Tier 1 — harness invariants (hard fail, every vector). Properties of
every produced changeset, checked between diff and snapshot so gold files
cannot silently enshrine bugs. Home:
binoc_stdlib::test_vectors::check_changeset_invariants (now public —
plugins call it, plus their own domain checks, on changesets they build),
and the per-call write-set enforcement in AbiTransformer. Adding an
invariant here costs every harness user a hard failure, so it must be a
true always-property, cheap, and self-explanatory when it fires.
Tier 2 — mechanical lints (errors fail, warnings print). Static
checks over descriptor data and source text in binoc_sdk::lints
(test-support feature): a LintReport { errors, warnings } whose
assert_clean() panics on errors and prints warnings to stderr. The
convention is one tests/lints.rs per crate calling the helpers it cares
about; binoc-sqlite's is the model for plugin authors. Initial lints:
single-producer/single-consumer tags and undeclared write-sets
(descriptor-level — these also work on closed-source plugins, since
descriptors cross the ABI), wholesale tag wipes (tags.clear() bug
class), and the tripwire that binoc-core never reads write-set
declarations (mechanizing the write-sets ADR's never-for-scheduling
constraint). Source scans honor a binoc-lint: allow(<rule>) comment on
or above the flagged line — except the dispatch tripwire, where the
answer is "don't".
Warnings ride the test suite rather than a separate tool: zero new
infrastructure, colocated with what they check, and lint errors already
fail just test. The cost is that libtest hides stderr from passing
tests, so just lint runs the lint targets with --nocapture to surface
advisory warnings on demand.
Tier 3 — agent lints (review findings). Contracts that need judgment,
not mechanism: behavioral write-set completeness on branches vectors
don't reach, tags-as-facts for targeted removals, keyed-vs-positional
judgment hygiene, performance posture (no unguarded full-data scans,
streaming, bounded captures), security (path traversal, decompression
bombs, no ambient I/O), layering (AGENTS.md rules), and docs/ADR hygiene.
Documented as an operational checklist in
.agents/skills/lint-plugin/SKILL.md, including a reporting format
(violation vs. smell, file:line, offended contract, smallest fix). The
skill explicitly lists what tiers 1–2 already prove so agents verify the
gaps instead of re-deriving the automated parts.
The tiers are ordered by preference: a check that can move down a tier should. An agent-lint observation that turns out to be greppable becomes a tier-2 lint; a tier-2 pattern that is really an always-property of changesets becomes a tier-1 invariant.
Alternatives Considered¶
A separate lint binary / cargo xtask pass. A dedicated runner could
report warnings without the --nocapture wrinkle and gain flags
(--deny-warnings). Rejected for now: it adds a build target and a CI
step for what three test files express today, and lint errors must fail
just test anyway — the runner would duplicate that path. Revisit if
warnings need machine-readable output.
Custom rustc lints (dylint/clippy plugins). True AST-level precision
(e.g. resolving that a computed with_tag(tag) value is declared).
Rejected: heavy toolchain coupling for marginal gain over substring scans
plus the dynamic harness, which together already cover both sides
(patterns statically, behavior dynamically).
Type-level enforcement (sealed tag emission via declared-set tokens).
Strongest static guarantee, but it conflicts with the open-IR
architecture (AGENTS.md rule 4: open strings, conventions not
enforcement) and would break the DiffNode.tags: BTreeSet<String> plugin
surface. Deliberately not pursued.
Putting agent checks in CLAUDE.md / AGENTS.md prose. The checklist is operational (per-review, per-plugin), not ambient project context; AGENTS.md stays the pointer, the skill holds the procedure. A skill file can also be invoked selectively by tooling rather than occupying every session's context.