Python API
The public Python API lives in the top-level binoc package. Every symbol
on this page is reachable as binoc.<name> and is listed in
binoc.__all__; private names (anything starting with _) are
deliberately omitted. The page below is rendered directly from the
installed package's docstrings by
mkdocstrings[python] — see the
Documentation platform ADR.
Limitations of the Python surface
Python comparators, transformers, and renderers receive a deliberately simplified interface compared to Rust plugins:
- No DataAccess. Python comparators get physical file paths on ItemPair, not the trait object. They cannot publish artifacts or call workspace() for scratch space.
- No content_hash or media_type on ItemPair.
- No source_items on Python transformers: they operate on the DiffNode tree only and cannot re-read the raw snapshot data.
For plugins that need those capabilities, write a Rust plugin. See "Write a Rust comparator" and the Rust SDK reference.
For worked Python examples, see:
binoc
Binoc: the missing changelog for datasets.
Binoc generates changelogs for datasets that don't ship with them. Given
snapshots of a dataset downloaded at different times, Binoc detects what
changed, expresses changes as a minimal structured diff (the Changeset
/ DiffNode tree), and renders changes as JSON or Markdown.
This module is the top-level Python API. Every symbol listed in
binoc.__all__ is considered public and is documented on this page.
Quick start:

```python
import binoc

changeset = binoc.diff("snapshots/2024-03", "snapshots/2024-06")
print(changeset)

# Inspect the diff tree (root is None when the snapshots are identical)
if changeset.root is not None:
    for child in changeset.root:
        print(f"{child.path}: {child.action}")

# Serialize
json_str = changeset.to_json()
markdown = binoc.to_markdown([changeset])
```
Writing plugins:
Subclass Comparator to parse a new file format into the IR, or
subclass Transformer to rewrite the diff tree. Register them on
a Config with Config.add_comparator /
Config.add_transformer, or on a PluginRegistry for
reuse across multiple diffs and for distribution as an entry point.
Test-vector helpers for plugin authors live in binoc.testing.
Changeset
The result of diff: a rooted diff tree plus metadata.
A Changeset records the two snapshot names it was computed from, the
root of the diff tree (None if the snapshots are identical), and a
free-form metadata dict for plugin use. Serialize with
to_json / to_dict, or via the module-level to_json
and to_markdown.
from_snapshot (property)
Name/identifier of the earlier snapshot this changeset was computed from.
node_count (property)
Total number of nodes in the diff tree (0 if root is None).
root (property)
Root of the diff tree, or None if the snapshots compare identical.
to_snapshot (property)
Name/identifier of the later snapshot this changeset was computed from.
find_node (method)
Recursively search the diff tree for a node whose path matches
selector. Returns None if there is no root or no match.
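The following self-contained model shows how the two tree-walking helpers above behave. It is not binoc's code. ToyNode is a hypothetical stand-in for DiffNode. Matching selector by exact path equality is an assumption, because the docstring does not define the selector syntax.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ToyNode:
    # Hypothetical stand-in for binoc.DiffNode: only a path and children.
    path: str
    children: list[ToyNode] = field(default_factory=list)


def node_count(root: Optional[ToyNode]) -> int:
    # Mirrors Changeset.node_count: 0 when there is no root.
    if root is None:
        return 0
    return 1 + sum(node_count(child) for child in root.children)


def find_node(root: Optional[ToyNode], selector: str) -> Optional[ToyNode]:
    # Mirrors Changeset.find_node: None if there is no root or no match.
    # Assumption: the selector matches by exact path equality.
    if root is None:
        return None
    if root.path == selector:
        return root
    for child in root.children:
        found = find_node(child, selector)
        if found is not None:
            return found
    return None
```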
to_dict (method)
Serialize this changeset to a plain Python dict.
to_json (method)
Serialize this changeset to canonical binoc changeset JSON.
Config
Dataset-level diff configuration.
A Config selects which registered comparators and transformers run for
a given dataset, and holds references to ad-hoc Python plugin instances
registered via add_comparator / add_transformer (i.e.
without packaging them as entry points).
comparators (property)
Names of the comparators this config will run, in order.
transformers (property)
Names of the transformers this config will run, in order.
add_comparator (method)
Register an ad-hoc Comparator instance with this config.
Useful for quick scripts and tests where packaging the comparator as a distribution entry point would be overkill. The comparator is inserted before the stdlib binary fallback when that fallback is present.
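Ordering matters because declarative dispatch hands an item to the first comparator that claims it. A catch-all binary fallback would presumably shadow any comparator listed after it. The sketch below models the insertion rule on a plain list of names. The fallback name "binary", the other names, and appending when no fallback exists are all assumptions; this page does not confirm them as binoc behavior.

```python
def insert_comparator(order: list[str], name: str, fallback: str = "binary") -> list[str]:
    """Return a new comparator order with name placed before the fallback.

    Assumptions: the fallback's registry name ("binary") is hypothetical, and
    appending when no fallback is present is a guess at binoc's behavior.
    """
    new_order = list(order)  # leave the caller's list untouched
    if fallback in new_order:
        new_order.insert(new_order.index(fallback), name)
    else:
        new_order.append(name)
    return new_order
```

Against the real API, print config.comparators after calling add_comparator to see where your comparator landed.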
add_transformer (method)
Register an ad-hoc Transformer instance with this config.
default (staticmethod)
Return a fresh Config populated with the standard-library
defaults (stdlib comparators and transformers, in their default
order).
from_file (staticmethod)
Load a dataset config from a YAML file on disk.
DiffNode

```python
DiffNode(
    action: str,
    item_type: str,
    path: str,
    *,
    source_path: str | None = None,
    summary: str | None = None,
    tags: list[str] | set[str] | None = None,
    details: dict[str, Any] | None = None,
    annotations: dict[str, Any] | None = None,
    children: list[DiffNode] | None = None,
)
```
A node in the diff tree — the primary IR type.
A DiffNode records one change (or unchanged item) at one logical path.
Every comparator emits nodes; every transformer rewrites them. action,
item_type, and tags are open strings so plugins can introduce new
vocabulary without a core release.
Nodes are iterable and indexable over their children:

```python
for child in node:
    print(child.path, child.action)

first_child = node[0]
count = len(node)
```
annotations (property)
Transient/presentation annotations not part of the persisted IR.
details (property)
Structured JSON-serializable details describing the change.
item_type (property)
Open-string noun describing the kind of item ("file", "csv.row", ...).
source_path (property)
Prior logical path if this item was moved or renamed; None otherwise.
tags (property)
Open-string tags attached to this node (used for renderer significance classification and transformer dispatch).
all_tags (method)
Union of all tags on this node and its descendants.
find_node (method)
Recursively search this subtree for a node whose path matches
selector. Returns None if no match is found.
node_count (method)
Total number of nodes in the subtree rooted at this node.
to_dict (method)
Serialize this node (recursively) to a plain Python dict.
to_json (method)
Serialize this node (recursively) to pretty-printed JSON.
with_children (method)
Return a clone of this node with its children replaced.
with_detail (method)
Return a clone of this node with details[key] set to value; value
must be JSON-serializable.
with_source_path (method)
Return a clone of this node with source_path replaced (used to
record moves/renames).
with_summary (method)
Return a clone of this node with summary replaced.
with_tag (method)
Return a clone of this node with tag added to tags.
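The with_* methods return clones instead of mutating the node, and all_tags unions tags across the whole subtree. The toy frozen dataclass below models these semantics together with the iteration protocol. ToyNode is hypothetical and only illustrates the documented contract.

One consequence is worth checking against the real class. Because a node supports len(), a childless node is falsy unless the class also defines __bool__. Compare against None explicitly rather than relying on truthiness.

```python
from __future__ import annotations

from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ToyNode:
    # Hypothetical immutable stand-in for binoc.DiffNode.
    path: str
    tags: frozenset[str] = frozenset()
    children: tuple[ToyNode, ...] = ()

    def __iter__(self):
        # Iterate over children, as DiffNode does.
        return iter(self.children)

    def __getitem__(self, index: int) -> ToyNode:
        return self.children[index]

    def __len__(self) -> int:
        return len(self.children)

    def with_tag(self, tag: str) -> ToyNode:
        # Clone with one extra tag; self is left unchanged.
        return replace(self, tags=self.tags | {tag})

    def with_children(self, children) -> ToyNode:
        # Clone with the children replaced wholesale.
        return replace(self, children=tuple(children))

    def all_tags(self) -> set[str]:
        # Union of this node's tags and every descendant's tags.
        result = set(self.tags)
        for child in self:
            result |= child.all_tags()
        return result
```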
Expand
Identical
Comparator result: the two items are semantically identical; no diff node is produced.
ItemPair
Leaf
Comparator result: produce this DiffNode as a terminal leaf; the
controller will not recurse into its children.
PluginRegistry
A mutable registry of comparator, transformer, and renderer plugins.
Test harnesses and plugin authors build a PluginRegistry,
register plugin instances or load native .so plugins into it, and
pass it to diff to control which plugins are available for
config resolution.
default (staticmethod)
Return a fresh registry preloaded with the standard-library plugins.
list_comparators (method)
Return the names of all registered comparators.
list_renderers (method)
Return the names of all registered renderers.
list_transformers (method)
Return the names of all registered transformers.
load_native_plugin (method)
Load a native Rust plugin from a shared library path.
The library must expose the binoc plugin C ABI (_binoc_plugin_describe
and related entry points). This is the same mechanism used by the
entry-point-based plugin discovery in binoc.
register_comparator (method)
Register a Python Comparator instance with this registry.
The comparator's own name attribute is used for dispatch; the
_name argument is accepted for API symmetry and ignored.
register_renderer (method)
Register a Python renderer instance with this registry.
A Python renderer is any object with a name attribute, a
file_extension attribute (defaults to "txt"), and a
render(changesets, config) -> str method.
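Renderers are duck-typed, so a working renderer needs no base class. The sketch below follows the protocol just described and reads only documented Changeset properties. The name "example.summary" and the one-line-per-changeset output format are illustrative choices, not binoc conventions.

```python
class SummaryRenderer:
    """A minimal Python renderer following the documented protocol.

    The name "example.summary" is hypothetical; namespace it for your package.
    """

    name = "example.summary"
    file_extension = "txt"

    def render(self, changesets, config) -> str:
        # Uses only documented Changeset properties: from_snapshot,
        # to_snapshot and node_count. config is accepted but ignored here.
        lines = [
            f"{cs.from_snapshot} -> {cs.to_snapshot}: {cs.node_count} node(s)"
            for cs in changesets
        ]
        return "\n".join(lines) + "\n"
```

Register an instance with PluginRegistry.register_renderer. This page does not show that method's parameters, so check its signature before relying on it. For comparison, the binoc.testing example passes register_comparator a name followed by the instance.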
register_transformer (method)
Register a Python Transformer instance with this registry.
Replace
Transformer result: replace the matched node with this single new node.
ReplaceMany
Transformer result: replace the matched node with zero or more new nodes.
Skip
Comparator result: this comparator cannot handle the item after all; the controller should continue to the next matching comparator.
Comparator
Base class for Python-authored comparators.
A comparator is the parser layer of binoc: it takes an ItemPair
and decides whether the two sides are semantically identical, whether
they differ (and how), and, for container formats, what child items
the controller should recursively diff next.
Subclass this and set the class attributes listed below, then implement
compare. If neither extensions nor media_types is set, the
comparator is treated as an imperative fallback and can_handle
decides whether it should run for each item.
Attributes:
- name: Dispatch name / registry key for this comparator, e.g. "bio.fasta". Plugins should namespace by package.
- extensions: File extensions (with leading .) this comparator claims. Declarative dispatch: the first comparator to claim an item wins. Ordering is a Config concern.
- media_types: MIME media types this comparator claims.
Example:

```python
import binoc


class FastaComparator(binoc.Comparator):
    name = "bio.fasta"
    extensions = [".fasta", ".fa"]

    def compare(self, pair):
        # Always reports a modification; a real comparator would inspect the
        # pair first and return an Identical result when nothing changed.
        return binoc.Leaf(binoc.DiffNode(
            action="modify",
            item_type="fasta",
            path=pair.logical_path,
        ))


config = binoc.Config.default()
config.add_comparator(FastaComparator())
changeset = binoc.diff("a", "b", config=config)
```
can_handle
Return True if this comparator can handle pair.
Declarative dispatch by extensions / media_types is the normal
path. This method is only consulted for Python comparators that declare
neither list.
compare
Compare an ItemPair and return a result variant.
Must return one of:
- Identical: items are semantically the same; produce no diff node.
- Skip: this comparator cannot handle the item after all; the controller should try the next matching comparator.
- Leaf: terminal diff node; the controller will not recurse into it.
- Expand: container diff node plus the child ItemPairs to recurse into.
Raises NotImplementedError if a subclass forgets to
implement it.
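The comparator dispatch rules described above can be modeled in a few lines:

- the first declarative claim wins;
- can_handle is consulted only for comparators with no declared lists;
- a Skip result falls through to the next candidate.

Everything here is a hypothetical stand-in. Skip and Leaf are toy classes rather than binoc's. Matching on media_types is omitted, and only the final suffix of a path is compared.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath


class Skip:
    """Toy stand-in for binoc.Skip."""


@dataclass
class Leaf:
    """Toy stand-in for binoc.Leaf, carrying a summary string."""
    summary: str


@dataclass
class ToyComparator:
    name: str
    extensions: tuple = ()
    skip: bool = False  # when True, compare() declines the item

    def can_handle(self, path: str) -> bool:
        # Imperative fallback: consulted only when no extensions are declared.
        return True

    def compare(self, path: str):
        if self.skip:
            return Skip()
        return Leaf(f"{self.name} handled {path}")


def dispatch(comparators, path: str):
    """Return the first non-Skip result from the comparators that claim path."""
    suffix = PurePosixPath(path).suffix
    for comp in comparators:
        if comp.extensions:
            claims = suffix in comp.extensions  # declarative dispatch
        else:
            claims = comp.can_handle(path)  # imperative fallback
        if not claims:
            continue
        result = comp.compare(path)
        if isinstance(result, Skip):
            continue  # fall through to the next matching comparator
        return result
    return None  # nothing claimed the item
```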
Transformer
Base class for Python-authored transformers.
A transformer is an optimization / normalization pass over the diff
tree: it rewrites DiffNodes after all comparators have run
but before rendering. Transformers operate only on the IR; they do
not have access to the raw snapshot data.
Subclass this, set the dispatch filters, and implement transform.
Attributes:
- name: Dispatch name / registry key for this transformer.
- match_types: If non-empty, only call transform on nodes whose DiffNode.item_type is in this list.
- match_tags: If non-empty, only call transform on nodes carrying at least one of these tags.
- match_actions: If non-empty, only call transform on nodes whose DiffNode.action is in this list.
- node_shape: Dispatch filter on node shape; one of "any" (default), "container" (only nodes with children), or "leaf" (only childless nodes).
Example:

```python
import binoc


class Normalizer(binoc.Transformer):
    name = "myproject.normalizer"
    match_tags = ["myproject.raw"]

    def transform(self, node):
        # Tag the matched node and put the clone in its place.
        return binoc.Replace(node.with_tag("myproject.normalized"))


config = binoc.Config.default()
config.add_transformer(Normalizer())
```
can_handle
Return True if this transformer should process node.
Imperative escape hatch for cases where the declarative filters
(match_types / match_tags / match_actions /
node_shape) cannot express the match.
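As a reading aid for the dispatch filters, this sketch evaluates them against a toy node. It assumes the filters combine with AND, which the docstring does not state. ToyNode, Filters, and matches are hypothetical names, not binoc API.

```python
from dataclasses import dataclass, field


@dataclass
class ToyNode:
    # Hypothetical stand-in carrying only the fields the filters inspect.
    action: str
    item_type: str
    tags: set = field(default_factory=set)
    children: list = field(default_factory=list)


@dataclass
class Filters:
    match_types: list = field(default_factory=list)
    match_tags: list = field(default_factory=list)
    match_actions: list = field(default_factory=list)
    node_shape: str = "any"


def matches(f: Filters, node: ToyNode) -> bool:
    # Empty lists impose no constraint; non-empty ones must be satisfied.
    # Assumption: the individual filters are combined with AND.
    if f.match_types and node.item_type not in f.match_types:
        return False
    if f.match_tags and not set(f.match_tags) & node.tags:
        return False  # needs at least one shared tag
    if f.match_actions and node.action not in f.match_actions:
        return False
    if f.node_shape == "container":
        return bool(node.children)
    if f.node_shape == "leaf":
        return not node.children
    return True
```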
transform
Rewrite a matched DiffNode and return a result variant.
Must return one of:
- Unchanged: leave the node alone.
- Replace: replace the node with one new node.
- ReplaceMany: replace the node with zero or more nodes.
- Remove: drop the node from the tree entirely.
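One way a controller might splice these results back into a parent's child list is sketched below. It assumes one result per child. The payload attribute names node and nodes are hypothetical; this page only shows binoc's classes being constructed positionally, as in binoc.Replace(...). All four classes here are toy stand-ins.

```python
from dataclasses import dataclass


# Hypothetical stand-ins for binoc's transformer result variants.
class Unchanged:
    pass


class Remove:
    pass


@dataclass
class Replace:
    node: object


@dataclass
class ReplaceMany:
    nodes: list


def apply_results(children, transform):
    """Rebuild a child list from one transform result per child."""
    out = []
    for child in children:
        result = transform(child)
        if isinstance(result, Unchanged):
            out.append(child)
        elif isinstance(result, Replace):
            out.append(result.node)
        elif isinstance(result, ReplaceMany):
            out.extend(result.nodes)  # an empty list behaves like Remove
        elif isinstance(result, Remove):
            continue
        else:
            raise TypeError(f"unexpected transform result: {result!r}")
    return out
```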
diff (function)

```python
diff(
    snapshot_a: str,
    snapshot_b: str,
    *,
    config: Config | None = None,
    registry: PluginRegistry | None = None,
) -> Changeset
```
Diff two snapshots and return the resulting Changeset.
Parameters:
- snapshot_a (str): Path to the earlier snapshot (file or directory).
- snapshot_b (str): Path to the later snapshot (file or directory).
- config (Config | None, default None): Optional Config controlling which comparators and transformers run. If None, the stdlib defaults are used.
- registry (PluginRegistry | None, default None): Optional PluginRegistry providing the set of plugins available to resolve from config. If None, the stdlib registry is used.
Returns:
- Changeset: The resulting Changeset.
to_json (function)
Render a Changeset as canonical binoc changeset JSON.
to_markdown (function)
Render one or more changesets to Markdown using the stdlib renderer.
Parameters:
- changesets (list[Changeset]): List of Changesets to render.
- config (Config | None, default None): Optional Config. The binoc.markdown section of its output config controls per-renderer options (significance rules, section layout, etc.).
Returns:
- str: The rendered Markdown string.
testing
Test vector helpers for binoc plugins.
Provides utilities to discover test vectors and run them against the binoc Python API. Plugin authors use this to validate their comparators end-to-end through the Python stack.
Snapshots are assumed to be already materialized: any .zip.d
/ .tar.gz.d / plugin-specific staging dirs in test-vectors/ have been
built into real artifacts by just materialize (or the equivalent
cargo run -p <crate> --bin materialize-test-vectors invocations). See
docs/adr/test_vector_materialization.md for the design; pytest sessions
typically materialize once in a session-scoped fixture:
```python
import subprocess

import pytest


@pytest.fixture(scope="session")
def vectors_dir(tmp_path_factory):
    # Materialize the plugin's test vectors once per test session.
    dest = tmp_path_factory.mktemp("vectors")
    subprocess.check_call([
        "cargo", "run", "-q", "-p", "my_plugin",
        "--features", "test-support",
        "--bin", "materialize-test-vectors", "--",
        str(dest), "my-plugin/test-vectors",
    ])
    return dest
```
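Typical usage in a plugin's pytest suite (vectors_dir is the session fixture above):

```python
import binoc
import pytest
from binoc.testing import discover_vectors, run_vector


@pytest.fixture
def registry():
    r = binoc.PluginRegistry.default()
    # MyComparator stands for your plugin's Comparator subclass.
    r.register_comparator("my-plugin.foo", MyComparator())
    return r


def test_vectors(vectors_dir, registry):
    # Fixtures cannot be called at collection time to feed parametrize,
    # so iterate over the materialized vectors inside the test instead.
    for vector_dir in discover_vectors(vectors_dir):
        run_vector(vector_dir, registry=registry)
```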
discover_vectors
Find test vector directories under vectors_dir.
A valid vector directory contains manifest.toml, snapshot-a/,
and snapshot-b/. Returns a sorted list of Paths.
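The following sketch reimplements the documented discovery contract, which can help when checking a vectors tree by hand. It is not binoc.testing's code. It assumes a recursive search, which the docstring does not specify. A root manifest.toml without snapshot directories is skipped, consistent with the three-part definition above.

```python
from pathlib import Path


def discover_vectors_sketch(vectors_dir):
    """Return sorted dirs containing manifest.toml, snapshot-a/ and snapshot-b/.

    Assumption: the search is recursive; binoc.testing may only look one
    level deep.
    """
    root = Path(vectors_dir)
    found = []
    for manifest in root.rglob("manifest.toml"):
        candidate = manifest.parent
        # A root manifest that only holds defaults has no snapshot dirs.
        if (candidate / "snapshot-a").is_dir() and (candidate / "snapshot-b").is_dir():
            found.append(candidate)
    return sorted(found)
```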
load_manifest
Load a vector's manifest, merging defaults from the root manifest.
Returns a dict with keys vector, config (optional),
expected (optional).
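The merge rule is not spelled out here. One plausible reading, sketched below as an assumption, is a recursive overlay in which the vector's own values win over the root manifest's. The function name merge_defaults is hypothetical.

```python
def merge_defaults(defaults: dict, manifest: dict) -> dict:
    """Recursively overlay a vector manifest on root-manifest defaults.

    Assumption: nested tables merge key by key and the vector's values win;
    binoc's actual precedence and merge depth are not documented here.
    """
    merged = dict(defaults)  # shallow copy so the defaults stay untouched
    for key, value in manifest.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_defaults(merged[key], value)
        else:
            merged[key] = value
    return merged
```

Under this reading, consider a root manifest that sets config.options.strict = true and a vector that sets config.options.strict = false. The merged result has strict set to false, while keys that appear only in the root manifest survive.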
run_vector

```python
run_vector(
    vector_dir: str | Path,
    *,
    vectors_root: str | Path | None = None,
    registry: PluginRegistry | None = None,
) -> binoc.Changeset
```
Run a single test vector and check its manifest assertions.
vector_dir must be a materialized vector: any .zip.d /
.tar.gz.d / plugin-specific staging directories should already have
been built into real artifacts. See the module docstring for how to run
materialization once per session.
Steps:
1. Parse the manifest (with root-manifest defaults).
2. Build a binoc.Config from the manifest's [config] section.
3. Run binoc.diff() against the snapshots with the config and
optional registry.
4. Check [expected] assertions from the manifest.
Returns the resulting binoc.Changeset.
check_assertions
Verify a changeset against [expected] assertions from a manifest.
binoc.testing
Test-vector helpers for plugin authors. Separate submodule; import as
from binoc.testing import discover_vectors, run_vector.