IR and changesets

The changeset IR is binoc's user-facing contract. The correspondence engine uses side trees, links, artifacts, edit lists, and compaction internally; after that work is complete, it projects the result as a tree of DiffNode values for renderers and downstream tools.

Understanding the projected tree is essential if you write renderers, consume changeset JSON, or add rule-pack output that should survive serialization.

A changeset is a tree of `DiffNode` values

Each changeset has one root DiffNode with children, grandchildren, and so on. The shape is structural and user-facing: directories and archives become containers, tabular files become leaves or table-collection children, and content changes become detail blocks under the relevant node.

flowchart TD
    Root["root: directory (modify)"]
    Root --> A["data/extra.csv (add)"]
    Root --> B["data/records.csv (modify, +1 row)"]
    Root --> C["docs/readme.txt (modify, +2/-1 lines)"]

The tree is a projection, not the engine's internal source of truth. Pairing, copy awareness, and edit compaction happen before projection.
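Because the projection is an ordinary tree, a renderer or downstream tool can consume it with a plain depth-first walk. The following is a minimal sketch, assuming the serialized node exposes the `action`, `item_type`, `path`, and `children` fields under those names. The sample data is hand-written to mirror the diagram above (the empty root path and the `item_type` values are assumptions); it is not real binoc output.

```python
from typing import Iterator, Tuple

# Hand-written sample mirroring the diagram; not real binoc output.
# The empty root path and the item_type values are assumptions.
changeset = {
    "action": "modify",
    "item_type": "directory",
    "path": "",
    "children": [
        {"action": "add", "item_type": "tabular",
         "path": "data/extra.csv", "children": []},
        {"action": "modify", "item_type": "tabular",
         "path": "data/records.csv", "children": []},
        {"action": "modify", "item_type": "file",
         "path": "docs/readme.txt", "children": []},
    ],
}


def walk(node: dict, depth: int = 0) -> Iterator[Tuple[int, dict]]:
    """Yield (depth, node) pairs in depth-first order, parents before children."""
    yield depth, node
    for child in node.get("children", []):
        yield from walk(child, depth + 1)


visited = [(depth, node["path"], node["action"]) for depth, node in walk(changeset)]
for depth, path, action in visited:
    print("  " * depth + f"{path or '<root>'} ({action})")
```

The walk only reads the projected tree. Anything that depends on pairing or compaction decisions has already been resolved by the time these nodes exist.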

`DiffNode` fields

The full set of fields is defined in `binoc-sdk/src/ir.rs`.

| Field | Type | Purpose |
| --- | --- | --- |
| `action` | open string | What happened: `"add"`, `"remove"`, `"modify"`, `"move"`, `"reorder"`, `"identical"`, or a plugin-defined value. |
| `item_type` | open string | What the item is: `"directory"`, `"file"`, `"tabular"`, `"zip_archive"`, or a plugin-defined value. Core does not schedule on it. |
| `path` | string | Logical path within the snapshot, e.g. `"archive.zip/>data/file.csv"`. `/>` marks a decompose boundary; a literal segment beginning with `>` is escaped as `\>`. |
| `sources` | list | Renderer-visible provenance records with path, side, and optional evidence/action. Moves, copies, merges, and deduplications use the same shape. |
| `summary` | optional | Human-readable one-liner set during projection. |
| `tags` | set of strings | Semantic observations such as `binoc.column-reorder` or `binoc.content-changed`. |
| `children` | list | Child diff nodes forming the projected tree. |
| `details` | map | Structured data for the projected change. |
| `annotations` | map | Renderer/plugin metadata kept separate from details. |
| `detail_blocks` | list | Structured sections and extract hints for renderer-controlled verbosity. |
| `diagnostics` | list | Reportable warnings/errors associated with the node. |
| `source_items` | transient | Live source-item provenance used during a run. Stripped from user-facing JSON. |
| `artifacts` | transient | Live artifact descriptors used by rules during a run. Stripped from user-facing JSON. |
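The `path` convention is easy to get wrong when splitting by hand, so here is a small sketch of one way a consumer might parse it. It assumes only the two rules stated in the `path` row: `/>` separates decompose layers, and a segment that literally begins with `>` is written as `\>`. The function name `split_logical_path` is hypothetical, and binoc's own parser may handle cases this sketch does not.

```python
from typing import List


def split_logical_path(path: str) -> List[List[str]]:
    """Split a logical path into layers at each `/>` decompose boundary.

    Each layer is returned as a list of ordinary path segments. A segment
    written as `\\>name` is unescaped to the literal `>name`.
    Assumption: no other escape sequences exist.
    """
    layers = []
    for layer in path.split("/>"):
        segments = []
        for segment in layer.split("/"):
            if segment.startswith("\\>"):
                segment = segment[1:]  # drop the escaping backslash
            segments.append(segment)
        layers.append(segments)
    return layers


# One decompose boundary: the archive, then a path inside it.
print(split_logical_path("archive.zip/>data/file.csv"))
# An escaped literal segment: `/\>` never matches the `/>` separator.
print(split_logical_path("dir/\\>odd.txt"))
```

Splitting on `/>` first works because the escaped form puts a backslash between the slash and the `>`, so an escaped segment can never be mistaken for a boundary.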

Three design commitments behind the IR

Everything is openly typed

`action`, `item_type`, `tags`, evidence kinds, and edit verbs are plain strings by convention. A plugin can introduce `action: "biobinoc.gap-shift"` without changing core.

Namespace custom values so downstream pipelines can distinguish plugin-owned facts from standard `binoc.*` facts. See Vocabulary.
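A downstream pipeline can make that distinction with a simple prefix check. The sketch below assumes the namespace is everything before the first dot, and that un-namespaced values such as `"add"` are standard. Both are assumptions for illustration; the Vocabulary page is the authority on the actual convention. The helper names are hypothetical.

```python
from typing import Optional


def namespace_of(value: str) -> Optional[str]:
    """Return the text before the first dot, or None if the value has no dot."""
    prefix, dot, _ = value.partition(".")
    return prefix if dot else None


def is_plugin_owned(value: str) -> bool:
    """True when the value carries a namespace other than `binoc`."""
    ns = namespace_of(value)
    return ns is not None and ns != "binoc"


for value in ["add", "binoc.column-reorder", "biobinoc.gap-shift"]:
    print(value, "->", "plugin" if is_plugin_owned(value) else "standard")
```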

Tags are facts, not judgments

Every tag is a factual observation. `binoc.column-reorder` means the columns were reordered; it does not say whether that change is important. Renderer config maps tags into headings or other presentation choices.

This split lets the same changeset JSON serve multiple audiences without rerunning the diff.
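To illustrate the split, the sketch below maps the same tag set to different headings for two audiences. The config shape (an ordered mapping from tag to heading), the heading text, and the `heading_for` helper are all hypothetical; this is not binoc's renderer config format.

```python
from typing import Dict, Set


def heading_for(tags: Set[str], tag_headings: Dict[str, str],
                default: str = "Other changes") -> str:
    """Return the heading of the first configured tag present on the node."""
    for tag, heading in tag_headings.items():  # dicts preserve insertion order
        if tag in tags:
            return heading
    return default


# The node's tags are facts and stay the same for every audience.
node_tags = {"binoc.column-reorder"}

# Hypothetical per-audience presentation choices.
analyst_config = {
    "binoc.content-changed": "Data changes",
    "binoc.column-reorder": "Layout changes",
}
engineer_config = {
    "binoc.column-reorder": "Schema-affecting changes",
}

print(heading_for(node_tags, analyst_config))
print(heading_for(node_tags, engineer_config))
```

The judgment about importance lives entirely in the config passed to the renderer, which is why the stored changeset does not need to change between audiences.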

Projection is lossy by design

The projected tree is optimized for explanation, not for replaying the engine. It omits unchanged internal links, transient artifacts, and rejected compaction attempts. When a user asks for extracted data later, binoc reruns the correspondence engine against the original snapshots and resolves the projected node back to a live link.

What a changeset looks like on disk

A changeset on disk is JSON. The root node is the top-level object; children nest naturally. The default Markdown renderer reads the tree and produces a flat factual list:

# Changelog: snapshot-a -> snapshot-b

Claims: none

- **data/records.csv**: 1 row added
- **data/extra.csv**: New table (2 columns, 1 row)
- **summary.csv**: Columns reordered (content unchanged)

With explicit renderer config, the same tree can be grouped under literal headings in a declared order.

Pipeline integrators consume the JSON directly. The changeset JSON schema is the canonical serialized contract.
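As a rough illustration of consuming the JSON directly, the sketch below parses a changeset and emits one bullet per node that carries a `summary`, in the style of the flat list shown earlier. The JSON here is hand-written and abbreviated, and the rendering logic is an assumption for illustration; it is not the default Markdown renderer, and the schema reference remains the authority on field names and shapes.

```python
import json
from typing import List

# Hand-written, abbreviated sample; real changesets carry more fields.
raw = """
{
  "action": "modify", "item_type": "directory", "path": "",
  "children": [
    {"action": "modify", "item_type": "tabular", "path": "data/records.csv",
     "summary": "1 row added", "children": []},
    {"action": "add", "item_type": "tabular", "path": "data/extra.csv",
     "summary": "New table (2 columns, 1 row)", "children": []}
  ]
}
"""


def flat_lines(node: dict) -> List[str]:
    """Collect one Markdown bullet per node that has a summary, depth-first."""
    lines = []
    if node.get("summary"):
        lines.append(f"- **{node['path']}**: {node['summary']}")
    for child in node.get("children", []):
        lines.extend(flat_lines(child))
    return lines


lines = flat_lines(json.loads(raw))
print("\n".join(lines))
```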

Combining changesets

`binoc changelog changeset-1.json changeset-2.json ...` reads multiple stored changesets and produces a single changelog spanning all of them. The combination is a renderer operation; it does not modify the stored IR.

Where to go next

For typed payloads used before projection: Artifacts and composition.
For extract ownership: Extract and provenance.
For the current engine design: correspondence-first engine.
For the JSON contract: changeset schema reference.