Extract changed data
Goal. Given a saved changeset, pull out the actual changed content — the added rows, removed lines, or reordered columns — not just the summary.
Prerequisites.
- A changeset JSON produced by binoc diff A B -o changeset.json.
- Both original snapshots still present on disk (extract reopens
them).
- The same plugin set that produced the changeset (extract looks up
plugins by name to reopen and format the data).
The one-liner
Where PATH is the logical path of a node in the changeset (for
example data.csv or archive.zip/records.csv) and ASPECT is the
kind of data to extract.
Example — pull the rows that were added to a CSV:
The output is valid CSV. Pipe it into another tool or inspect it directly.
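Because the output is plain CSV, any CSV reader can consume it. The sketch below is one way to summarize such output in Python. It assumes the first row is a header, which this page does not state, and the sample text is made up for illustration rather than real extract output.

```python
import csv
import io


def summarize_csv(text):
    """Return (header, data_row_count) for a block of CSV text.

    Assumption: the first row is a header. Empty input yields ([], 0).
    """
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return [], 0
    return rows[0], len(rows) - 1


# Made-up sample standing in for extracted rows.
sample = "id,name\n3,carol\n4,dave\n"
header, count = summarize_csv(sample)
print(f"{count} rows, columns: {', '.join(header)}")
```

To use this at the end of a pipe, pass `sys.stdin.read()` to `summarize_csv` instead of the sample string.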
Available aspects
Aspects depend on the node type. The common ones for the standard library:
| Node type | Aspects |
|---|---|
| Tabular (CSV today; any comparator publishing tabular_v1) | rows_added, rows_removed, cells_changed, columns_added, columns_removed, content |
| Text | diff, content_left, content_right, content |
| Column reorder | column_order |
A plugin-authored comparator or transformer can define its own aspects. Unknown aspects produce an error listing what's supported for that node.
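The "unknown aspect" behavior can be pictured as a lookup against the table above. This is an illustrative sketch, not binoc's implementation: the node-type keys (`tabular`, `text`, `column_reorder`), the function name, and the error wording are all hypothetical. Only the aspect names come from the table.

```python
# Aspect names copied from the table above; the dictionary keys are
# hypothetical labels chosen for this sketch.
SUPPORTED_ASPECTS = {
    "tabular": [
        "rows_added", "rows_removed", "cells_changed",
        "columns_added", "columns_removed", "content",
    ],
    "text": ["diff", "content_left", "content_right", "content"],
    "column_reorder": ["column_order"],
}


def check_aspect(node_type, aspect):
    """Return the aspect if supported, else raise listing what is supported."""
    supported = SUPPORTED_ASPECTS.get(node_type, [])
    if aspect not in supported:
        raise ValueError(
            f"unknown aspect {aspect!r} for {node_type} node; "
            f"supported: {', '.join(supported)}"
        )
    return aspect


print(check_aspect("text", "diff"))
```

Asking for `rows_added` on a `text` node raises a `ValueError` whose message names the four text aspects.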
Why both snapshots have to exist
The changeset JSON captures what changed — it does not carry the
changed bytes themselves (that would balloon the file). Extract walks
the provenance chain recorded on each node (comparator and
transformed_by fields) and reopens the snapshots through the same
comparator sequence that produced the node: directory → zip →
directory → csv, for example. At the leaf, it re-derives the data
and hands it to whichever plugin last touched the node to format.
The upshot: extract needs the snapshots and the plugins, but it does not need to re-diff. See Provenance and extract ADR and Extract and provenance.
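A toy model can make the reopen chain concrete. The sketch below is not binoc's code. It treats a snapshot as an in-memory mapping, registers three hypothetical openers by name, and replays a directory → zip → directory → csv chain to reach the leaf data. The registry, the function names, and the chain representation are assumptions made for illustration.

```python
import csv
import io
import zipfile


def open_directory(node, name):
    # A "directory" here is a mapping of entry name to content.
    return node[name]


def open_zip(node, name):
    # node holds raw zip bytes; expose the members as a directory-like mapping.
    with zipfile.ZipFile(io.BytesIO(node)) as archive:
        return {member: archive.read(member) for member in archive.namelist()}


def open_csv(node, name):
    # node holds raw CSV bytes; parse them into a list of rows.
    return list(csv.reader(io.StringIO(node.decode("utf-8"))))


# Hypothetical registry: steps are looked up by name, as extract does with plugins.
PLUGINS = {"directory": open_directory, "zip": open_zip, "csv": open_csv}


def reopen(snapshot, chain):
    """Replay a recorded chain of (plugin_name, entry_name) steps from the root."""
    node = snapshot
    for plugin_name, name in chain:
        if plugin_name not in PLUGINS:
            raise LookupError(f"Plugin {plugin_name} not found")
        node = PLUGINS[plugin_name](node, name)
    return node


def make_snapshot():
    # Build a tiny snapshot: one zip archive containing one CSV file.
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("records.csv", "id,name\n1,alice\n")
    return {"archive.zip": buffer.getvalue()}


CHAIN = [
    ("directory", "archive.zip"),
    ("zip", None),
    ("directory", "records.csv"),
    ("csv", None),
]
rows = reopen(make_snapshot(), CHAIN)
print(rows)
```

Nothing is diffed here. The chain only says how to get back to the data, which is why both the snapshot and every named step must still be available.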
Common issues
"Comparator X cannot extract aspect Y from node Z"
The plugin responsible for the node does not support that aspect.
Either pick a supported aspect for the node type or, if you're a
plugin author, implement extract for your plugin.
"Plugin X not found"
Extract uses plugin names, so the environment running extract must
have the same plugins installed as the environment that produced the
changeset. If you produced a changeset with a custom plugin and try
to extract in a plain pip install binoc environment, you'll hit
this. Install the plugin and retry.
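One way to catch this before running extract is to list the plugin names a changeset refers to and compare them with what is installed. The sketch below walks a changeset-like dictionary. The `comparator` and `transformed_by` field names come from this page, but their shapes (a string and a list of strings) and the `children` key are assumptions, so check the Changeset JSON schema before relying on it.

```python
def required_plugins(node):
    """Collect plugin names referenced by a node and its descendants.

    Assumed shape (not confirmed by this page): "comparator" is a string,
    "transformed_by" is a list of strings, "children" is a list of nodes.
    """
    names = set()
    if node.get("comparator"):
        names.add(node["comparator"])
    names.update(node.get("transformed_by", []))
    for child in node.get("children", []):
        names |= required_plugins(child)
    return names


def missing_plugins(root, installed):
    """Return the sorted plugin names the changeset needs but the environment lacks."""
    return sorted(required_plugins(root) - set(installed))


# Made-up changeset fragment and plugin names, for illustration only.
changeset_root = {
    "comparator": "directory",
    "children": [
        {"comparator": "csv", "transformed_by": ["my_custom_transformer"]},
    ],
}
print(missing_plugins(changeset_root, installed=["directory", "csv"]))
```

An empty result means every referenced name is available. Anything else is the list of plugins to install before retrying.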
Changeset describes a snapshot path that no longer exists
Extract reopens the original snapshots. Moving or deleting them after the diff breaks extract. Keep the snapshots alongside the changeset if you plan to extract later.
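A quick pre-flight check can confirm the snapshots are still where the diff found them. This sketch takes the snapshot paths as arguments rather than reading them from the changeset, because the field that stores them is not described on this page. The function name and example paths are hypothetical.

```python
from pathlib import Path


def missing_snapshots(snapshot_paths):
    """Return the given snapshot paths that no longer exist on disk."""
    return [str(path) for path in map(Path, snapshot_paths) if not path.exists()]


# Hypothetical locations; substitute the A and B paths used for the diff.
print(missing_snapshots(["snapshots/A", "snapshots/B"]))
```

An empty list means both snapshots are still in place.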
Where to go next
- Extract and provenance: the design of the reopen chain.
- Changeset JSON schema: the comparator and transformed_by fields extract relies on.
- Write a Python comparator / Write a Rust comparator: the hooks to implement for a custom plugin to support extract.