Python API

The public Python API lives in the top-level binoc package. Every symbol on this page is reachable as binoc.<name> and is listed in binoc.__all__; private names (anything starting with _) are deliberately omitted. The page below is rendered directly from the installed package's docstrings by mkdocstrings[python] — see the Documentation platform ADR.

Limitations of the Python surface

Python comparators, transformers, and renderers receive a deliberately simplified interface compared to Rust plugins:

  • No DataAccess. Python comparators get physical file paths on ItemPair, not the trait object. They cannot publish artifacts or call workspace() for scratch space.
  • No content_hash or media_type on ItemPair.
  • No source_items on Python transformers — they operate on the DiffNode tree only, and cannot re-read the raw snapshot data.

For plugins that need those capabilities, write a Rust plugin. See Write a Rust comparator and the Rust SDK reference.

For worked Python examples, see:

binoc

Binoc: the missing changelog for datasets.

Binoc generates changelogs for datasets that don't ship with them. Given snapshots of a dataset downloaded at different times, Binoc detects what changed, expresses changes as a minimal structured diff (the :class:Changeset / :class:DiffNode tree), and renders changes as JSON or Markdown.

This module is the top-level Python API. Every symbol listed in binoc.__all__ is considered public and is documented on this page.

Quick start::

import binoc

changeset = binoc.diff("snapshots/2024-03", "snapshots/2024-06")
print(changeset)

# Inspect the diff tree (root is None when the snapshots are identical)
if changeset.root is not None:
    for child in changeset.root:
        print(f"{child.path}: {child.action}")

# Serialize
json_str = changeset.to_json()
markdown = binoc.to_markdown([changeset])

Writing plugins: Subclass :class:Comparator to parse a new file format into the IR, or subclass :class:Transformer to rewrite the diff tree. Register them on a :class:Config with :meth:Config.add_comparator / :meth:Config.add_transformer, or on a :class:PluginRegistry for reuse across multiple diffs and for distribution as an entry point.

Test-vector helpers for plugin authors live in :mod:binoc.testing.

Changeset

Changeset(
    from_snapshot: str,
    to_snapshot: str,
    root: DiffNode | None = None,
)

The result of :func:diff — a rooted diff tree plus metadata.

A Changeset records the two snapshot names it was computed from, the root of the diff tree (None if the snapshots are identical), and a free-form metadata dict for plugin use. Serialize with :meth:to_json / :meth:to_dict, or via the module-level :func:to_json and :func:to_markdown.

from_snapshot property

from_snapshot: str

Name/identifier of the earlier snapshot this changeset was computed from.

metadata property

metadata: dict[str, str]

Free-form metadata dict (plugin-populated).

node_count property

node_count: int

Total number of nodes in the diff tree (0 if :attr:root is None).

root property

root: DiffNode | None

Root of the diff tree, or None if the snapshots compare identical.

to_snapshot property

to_snapshot: str

Name/identifier of the later snapshot this changeset was computed from.

find_node method descriptor

find_node(selector: str) -> DiffNode | None

Recursively search the diff tree for a node whose path matches selector. Returns None if there is no root or no match.
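
The recursive search described above can be sketched in plain Python. This is a stand-in model, not binoc's implementation: the Node class is hypothetical, and the sketch assumes that "matches" means exact path equality, which the docstring does not confirm.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for DiffNode: only a path and children."""
    path: str
    children: list[Node] = field(default_factory=list)


def find_node(root: Node | None, selector: str) -> Node | None:
    """Depth-first search. ASSUMPTION: a match is exact path equality."""
    if root is None:
        return None  # mirrors "no root" returning None
    if root.path == selector:
        return root
    for child in root.children:
        found = find_node(child, selector)
        if found is not None:
            return found
    return None  # no match anywhere in this subtree


tree = Node("data", [
    Node("data/a.csv"),
    Node("data/sub", [Node("data/sub/b.csv")]),
])
hit = find_node(tree, "data/sub/b.csv")
miss = find_node(tree, "data/missing.csv")
```

Because both outcomes return None, callers that need to tell "identical snapshots" apart from "path not in the diff" would have to check root separately.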

to_dict method descriptor

to_dict() -> dict[str, Any]

Serialize this changeset to a plain Python dict.

to_json method descriptor

to_json() -> str

Serialize this changeset to canonical binoc changeset JSON.

Config

Config(
    *,
    comparators: list[str] | None = None,
    transformers: list[str] | None = None,
)

Dataset-level diff configuration.

A Config selects which registered comparators and transformers run for a given dataset, and holds references to ad-hoc Python plugin instances registered via :meth:add_comparator / :meth:add_transformer (i.e. without packaging them as entry points).

comparators property

comparators: list[str]

Names of the comparators this config will run, in order.

transformers property

transformers: list[str]

Names of the transformers this config will run, in order.

add_comparator method descriptor

add_comparator(comparator: Any) -> None

Register an ad-hoc :class:Comparator instance with this config.

Useful for quick scripts and tests where packaging the comparator as a distribution entry point would be overkill. The comparator is inserted before the stdlib binary fallback when that fallback is present.
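
Ordering matters here because the first comparator to claim an item wins, so a catch-all fallback has to stay last. The sketch below models that insertion on a plain list of names. The fallback name is a placeholder, and appending when no fallback is present is an assumption; the docstring only describes the case where the fallback exists.

```python
def insert_before_fallback(names: list[str], new_name: str, fallback: str) -> list[str]:
    """Return a new ordering with new_name placed just ahead of fallback.

    ASSUMPTION: when the fallback is absent, the new name is appended.
    """
    ordered = list(names)  # copy so the caller's list is not mutated
    if fallback in ordered:
        ordered.insert(ordered.index(fallback), new_name)
    else:
        ordered.append(new_name)
    return ordered


# "example.binary" and "example.csv" are placeholders, not real binoc names.
with_fallback = insert_before_fallback(
    ["example.csv", "example.binary"], "bio.fasta", "example.binary"
)
without_fallback = insert_before_fallback(
    ["example.csv"], "bio.fasta", "example.binary"
)
```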

add_transformer method descriptor

add_transformer(transformer: Any) -> None

Register an ad-hoc :class:Transformer instance with this config.

default staticmethod

default() -> Config

Return a fresh Config populated with the standard-library defaults (stdlib comparators and transformers, in their default order).

from_file staticmethod

from_file(path: str) -> Config

Load a dataset config from a YAML file on disk.

DiffNode

DiffNode(
    action: str,
    item_type: str,
    path: str,
    *,
    source_path: str | None = None,
    summary: str | None = None,
    tags: list[str] | set[str] | None = None,
    details: dict[str, Any] | None = None,
    annotations: dict[str, Any] | None = None,
    children: list[DiffNode] | None = None,
)

A node in the diff tree — the primary IR type.

A DiffNode records one change (or unchanged item) at one logical path. Every comparator emits nodes; every transformer rewrites them. action, item_type, and tags are open strings so plugins can introduce new vocabulary without a core release.

Nodes are iterable and indexable over their children::

for child in node:
    print(child.path, child.action)

first_child = node[0]
count = len(node)

action property

action: str

Open-string verb describing what changed ("add", "modify", ...).

annotations property

annotations: dict[str, Any]

Transient/presentation annotations not part of the persisted IR.

children property

children: list[DiffNode]

Direct children of this node.

details property

details: dict[str, Any]

Structured JSON-serializable details describing the change.

item_type property

item_type: str

Open-string noun describing the kind of item ("file", "csv.row", ...).

path property

path: str

Logical path of this item within its snapshot.

source_path property

source_path: str | None

Prior logical path if this item was moved or renamed; None otherwise.

summary property

summary: str | None

Optional one-line human summary of the change.

tags property

tags: list[str]

Open-string tags attached to this node (used for renderer significance classification and transformer dispatch).

all_tags method descriptor

all_tags() -> list[str]

Union of all tags on this node and its descendants.

find_node method descriptor

find_node(selector: str) -> DiffNode | None

Recursively search this subtree for a node whose path matches selector. Returns None if no match is found.

node_count method descriptor

node_count() -> int

Total number of nodes in the subtree rooted at this node.
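
Both all_tags and node_count are whole-subtree walks. A minimal sketch on a hypothetical stand-in class shows the shape of each; returning the tag union sorted is an assumption made so the result is deterministic, and the tag names are invented for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for DiffNode."""
    path: str
    tags: list[str] = field(default_factory=list)
    children: list[Node] = field(default_factory=list)


def all_tags(node: Node) -> list[str]:
    """Union of tags on this node and every descendant (sorted by assumption)."""
    seen = set(node.tags)
    for child in node.children:
        seen.update(all_tags(child))
    return sorted(seen)


def node_count(node: Node) -> int:
    """This node plus all of its descendants."""
    return 1 + sum(node_count(child) for child in node.children)


tree = Node("data", ["example.container"], [
    Node("data/a.csv", ["example.schema-change"]),
    Node("data/b.csv", ["example.schema-change", "example.minor"]),
])
tags = all_tags(tree)
total = node_count(tree)
```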

to_dict method descriptor

to_dict() -> dict[str, Any]

Serialize this node (recursively) to a plain Python dict.

to_json method descriptor

to_json() -> str

Serialize this node (recursively) to pretty-printed JSON.

with_children method descriptor

with_children(children: list[DiffNode]) -> DiffNode

Return a clone of this node with its children replaced.

with_detail method descriptor

with_detail(key: str, value: Any) -> DiffNode

Return a clone of this node with details[key] = value set. value must be JSON-serializable.

with_source_path method descriptor

with_source_path(source: str) -> DiffNode

Return a clone of this node with source_path replaced (used to record moves/renames).

with_summary method descriptor

with_summary(summary: str) -> DiffNode

Return a clone of this node with summary replaced.

with_tag method descriptor

with_tag(tag: str) -> DiffNode

Return a clone of this node with tag added to tags.
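
Since every with_* method returns a clone, calls can be chained without touching the original node. The sketch below imitates that pattern with a frozen dataclass; it is a hypothetical model rather than the real DiffNode, and treating a repeated tag as a no-op is an assumption.

```python
from __future__ import annotations

from dataclasses import dataclass, field, replace
from typing import Any


@dataclass(frozen=True)
class Node:
    """Hypothetical stand-in for DiffNode with clone-style updaters."""
    action: str
    item_type: str
    path: str
    summary: str | None = None
    tags: tuple[str, ...] = ()
    details: dict[str, Any] = field(default_factory=dict)

    def with_summary(self, summary: str) -> Node:
        return replace(self, summary=summary)

    def with_tag(self, tag: str) -> Node:
        # ASSUMPTION: adding a tag that is already present changes nothing.
        tags = self.tags if tag in self.tags else self.tags + (tag,)
        return replace(self, tags=tags)

    def with_detail(self, key: str, value: Any) -> Node:
        # Build a new dict so the original node's details stay untouched.
        return replace(self, details={**self.details, key: value})


original = Node(action="modify", item_type="file", path="data/a.csv")
updated = (
    original.with_summary("3 rows added")
    .with_tag("example.reviewed")
    .with_detail("rows_added", 3)
)
```

This style suits transformers, which can derive a replacement node from the one they were handed and return it without worrying about shared state.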

Expand

Expand(node: DiffNode, children: list[ItemPair])

Comparator result: produce this :class:DiffNode as a container, and schedule the given children as additional item pairs for the controller to dispatch.

children property

children: list[ItemPair]

Child item pairs to recurse into.

node property

node: DiffNode

The container diff node.

Identical

Identical()

Comparator result: the two items are semantically identical; no diff node is produced.

ItemPair

extension property

extension: str | None

is_dir property

is_dir: bool

left_path property

left_path: str | None

logical_path property

logical_path: str

right_path property

right_path: str | None

Leaf

Leaf(node: DiffNode)

Comparator result: produce this :class:DiffNode as a terminal leaf — the controller will not recurse into its children.

node property

node: DiffNode

The terminal diff node.

PluginRegistry

A mutable registry of comparator, transformer, and renderer plugins.

Test harnesses and plugin authors build a PluginRegistry, register plugin instances or load native .so plugins into it, and pass it to :func:diff to control which plugins are available for config resolution.

default staticmethod

default() -> PluginRegistry

Return a fresh registry preloaded with the standard-library plugins.

list_comparators method descriptor

list_comparators() -> list[str]

Return the names of all registered comparators.

list_renderers method descriptor

list_renderers() -> list[str]

Return the names of all registered renderers.

list_transformers method descriptor

list_transformers() -> list[str]

Return the names of all registered transformers.

load_native_plugin method descriptor

load_native_plugin(module_path: str) -> None

Load a native Rust plugin from a shared library path.

The library must expose the binoc plugin C ABI (_binoc_plugin_describe and related entry points). This is the same mechanism used by the entry-point-based plugin discovery in :mod:binoc.

register_comparator method descriptor

register_comparator(_name, obj: Any) -> None

Register a Python :class:Comparator instance with this registry.

The comparator's own name attribute is used for dispatch; the _name argument is accepted for API symmetry and ignored.

register_renderer method descriptor

register_renderer(_name, obj: Any) -> None

Register a Python renderer instance with this registry.

A Python renderer is any object with a name attribute, a file_extension attribute (defaults to "txt") and a render(changesets, config) -> str method.
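
The duck-typed contract above needs no base class. A minimal sketch of such a renderer follows; the name "example.plaintext" and the output format are invented for illustration, and a SimpleNamespace stands in for a Changeset so the block runs without binoc installed.

```python
from types import SimpleNamespace


class PlainTextRenderer:
    """Duck-typed renderer: a name, a file_extension, and render()."""

    name = "example.plaintext"  # hypothetical registry key
    file_extension = "txt"

    def render(self, changesets, config) -> str:
        # Uses only properties documented on Changeset on this page.
        lines = [
            f"{cs.from_snapshot} -> {cs.to_snapshot}: {cs.node_count} node(s)"
            for cs in changesets
        ]
        return "\n".join(lines)


# Stand-in for a Changeset; a real one would come from binoc.diff().
fake_changeset = SimpleNamespace(
    from_snapshot="2024-03", to_snapshot="2024-06", node_count=4
)
output = PlainTextRenderer().render([fake_changeset], config=None)
```

With binoc available, an instance like this would presumably be passed to register_renderer alongside its name, following the signature shown above.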

register_transformer method descriptor

register_transformer(_name, obj: Any) -> None

Register a Python :class:Transformer instance with this registry.

Remove

Remove()

Transformer result: drop the matched node from the tree entirely.

Replace

Replace(node: DiffNode)

Transformer result: replace the matched node with this single new node.

node property

node: DiffNode

The replacement diff node.

ReplaceMany

ReplaceMany(nodes: list[DiffNode])

Transformer result: replace the matched node with zero or more new nodes.

nodes property

nodes: list[DiffNode]

The replacement diff nodes.

Skip

Skip()

Comparator result: this comparator cannot handle the item after all; the controller should continue to the next matching comparator.

Unchanged

Unchanged()

Transformer result: do not rewrite this node.

Comparator

Base class for Python-authored comparators.

A comparator is the parser layer of binoc: it takes an :class:ItemPair and decides whether the two sides are semantically identical, whether they differ (and how), and — for container formats — what child items the controller should recursively diff next.

Subclass this and set the class attributes listed below, then implement :meth:compare. If neither extensions nor media_types is set, the comparator is treated as an imperative fallback and :meth:can_handle decides whether it should run for each item.

Attributes:

  • name: Dispatch name / registry key for this comparator, e.g. "bio.fasta". Plugins should namespace by package.
  • extensions: File extensions (with leading .) this comparator claims. Declarative dispatch: first comparator to claim an item wins. Ordering is a :class:Config concern.
  • media_types: MIME media types this comparator claims.

Example::

class FastaComparator(binoc.Comparator):
    name = "bio.fasta"
    extensions = [".fasta", ".fa"]

    def compare(self, pair):
        return binoc.Leaf(binoc.DiffNode(
            action="modify",
            item_type="fasta",
            path=pair.logical_path,
        ))

config = binoc.Config.default()
config.add_comparator(FastaComparator())
changeset = binoc.diff("a", "b", config=config)

can_handle

can_handle(pair: ItemPair) -> bool

Return True if this comparator can handle pair.

Declarative dispatch by extensions / media_types is the normal path. This method is only consulted for Python comparators that do not declare either list.

compare

compare(pair: ItemPair) -> Identical | Skip | Leaf | Expand

Compare an :class:ItemPair and return a result variant.

Must return one of:

  • :class:Identical — items are semantically the same; produce no diff node.
  • :class:Skip — this comparator cannot handle the item after all; the controller should try the next matching comparator.
  • :class:Leaf — terminal diff node; the controller will not recurse into it.
  • :class:Expand — container diff node plus the child :class:ItemPair s to recurse into.

Raises :class:NotImplementedError if a subclass forgets to implement it.
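
How a controller might act on the four result variants can be sketched with stand-in classes. Nothing here is binoc's actual controller: the variant classes are hypothetical copies, item pairs are reduced to path strings, and raising LookupError when every comparator skips is an assumption.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Identical:
    pass


@dataclass
class Skip:
    pass


@dataclass
class Leaf:
    node: dict


@dataclass
class Expand:
    node: dict
    children: list


def dispatch(path, comparators):
    """Try comparators in order; return (diff node or None, children to recurse)."""
    for compare in comparators:
        result = compare(path)
        if isinstance(result, Skip):
            continue  # fall through to the next comparator
        if isinstance(result, Identical):
            return None, []  # no diff node is produced
        if isinstance(result, Leaf):
            return result.node, []  # terminal: nothing to recurse into
        if isinstance(result, Expand):
            return result.node, list(result.children)
        raise TypeError(f"unexpected comparator result: {result!r}")
    # ASSUMPTION: an item nobody handles is an error in this sketch.
    raise LookupError(f"no comparator handled {path!r}")


def archive(path):
    if path.endswith(".zip"):
        return Expand({"path": path, "action": "modify"}, [path + "/inner.fa"])
    return Skip()


def fasta(path):
    if path.endswith(".fa"):
        return Leaf({"path": path, "action": "modify"})
    return Skip()


def fallback(path):
    return Identical()


chain = [archive, fasta, fallback]
leaf_result = dispatch("x.fa", chain)
expand_result = dispatch("b.zip", chain)
identical_result = dispatch("c.txt", chain)
```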

Transformer

Base class for Python-authored transformers.

A transformer is an optimization / normalization pass over the diff tree: it rewrites :class:DiffNode s after all comparators have run but before rendering. Transformers operate only on the IR — they do not have access to the raw snapshot data.

Subclass this, set the dispatch filters, and implement :meth:transform.

Attributes:

  • name: Dispatch name / registry key for this transformer.
  • match_types: If non-empty, only call :meth:transform on nodes whose :attr:~DiffNode.item_type is in this list.
  • match_tags: If non-empty, only call :meth:transform on nodes carrying at least one of these tags.
  • match_actions: If non-empty, only call :meth:transform on nodes whose :attr:~DiffNode.action is in this list.
  • node_shape: Dispatch filter on node shape: one of "any" (default), "container" (only nodes with children), or "leaf" (only childless nodes).

Example::

class Normalizer(binoc.Transformer):
    name = "myproject.normalizer"
    match_tags = ["myproject.raw"]

    def transform(self, node):
        return binoc.Replace(node.with_tag("myproject.normalized"))

config = binoc.Config.default()
config.add_transformer(Normalizer())

can_handle

can_handle(node: DiffNode) -> bool

Return True if this transformer should process node.

Imperative escape hatch for cases where the declarative filters (match_types / match_tags / match_actions / node_shape) cannot express the match.
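
The declarative filters can be read as a single predicate over a node. The sketch below assumes the filters are combined with AND and that an empty filter matches everything; the first point is not stated in the docstrings, and the Node class and item types are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for DiffNode."""
    action: str
    item_type: str
    tags: list = field(default_factory=list)
    children: list = field(default_factory=list)


def matches(node, *, match_types=(), match_tags=(), match_actions=(), node_shape="any"):
    """ASSUMPTION: all non-empty filters must pass (logical AND)."""
    if match_types and node.item_type not in match_types:
        return False
    if match_tags and not set(match_tags) & set(node.tags):
        return False  # needs at least one tag in common
    if match_actions and node.action not in match_actions:
        return False
    if node_shape == "container" and not node.children:
        return False
    if node_shape == "leaf" and node.children:
        return False
    return True


row = Node("modify", "csv.row", tags=["myproject.raw"])
folder = Node("modify", "directory", children=[row])
```

A can_handle override would cover whatever such a predicate cannot express, for example a condition on details.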

transform

transform(
    node: DiffNode,
) -> Unchanged | Replace | ReplaceMany | Remove

Rewrite a matched :class:DiffNode and return a result variant.

Must return one of:

  • :class:Unchanged — leave the node alone.
  • :class:Replace — replace the node with one new node.
  • :class:ReplaceMany — replace the node with zero or more nodes.
  • :class:Remove — drop the node from the tree entirely.
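
The effect of each result variant on a parent's child list can be sketched as follows. The variant classes are hypothetical stand-ins, nodes are reduced to strings, and the single-pass traversal is an assumption about how results are applied.

```python
from dataclasses import dataclass


@dataclass
class Unchanged:
    pass


@dataclass
class Replace:
    node: str


@dataclass
class ReplaceMany:
    nodes: list


@dataclass
class Remove:
    pass


def apply_to_children(children, transform):
    """Rebuild a child list from the result returned for each child."""
    rebuilt = []
    for child in children:
        result = transform(child)
        if isinstance(result, Unchanged):
            rebuilt.append(child)
        elif isinstance(result, Replace):
            rebuilt.append(result.node)
        elif isinstance(result, ReplaceMany):
            rebuilt.extend(result.nodes)  # zero or more replacements
        elif isinstance(result, Remove):
            continue  # dropped from the tree
        else:
            raise TypeError(f"unexpected transformer result: {result!r}")
    return rebuilt


def example_transform(node):
    if node == "rename":
        return Replace("renamed")
    if node == "split":
        return ReplaceMany(["split-1", "split-2"])
    if node == "drop":
        return Remove()
    return Unchanged()


rebuilt = apply_to_children(["keep", "rename", "split", "drop"], example_transform)
```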

diff builtin

diff(
    snapshot_a: str,
    snapshot_b: str,
    *,
    config: Config | None = None,
    registry: PluginRegistry | None = None,
) -> Changeset

Diff two snapshots and return the resulting :class:Changeset.

Parameters:

  • snapshot_a (str) –

    Path to the earlier snapshot (file or directory).

  • snapshot_b (str) –

    Path to the later snapshot (file or directory).

  • config (Config | None, default: None ) –

    Optional :class:Config controlling which comparators and transformers run. If None, the stdlib defaults are used.

  • registry (PluginRegistry | None, default: None ) –

    Optional :class:PluginRegistry providing the set of plugins available to resolve from config. If None, the stdlib registry is used.

Returns:

  • Changeset

    The resulting :class:Changeset.

to_json builtin

to_json(changeset: Changeset) -> str

Render a :class:Changeset as canonical binoc changeset JSON.

to_markdown builtin

to_markdown(
    changesets: list[Changeset],
    *,
    config: Config | None = None,
) -> str

Render one or more changesets to Markdown using the stdlib renderer.

Parameters:

  • changesets (list[Changeset]) –

    List of :class:Changeset s to render.

  • config (Config | None, default: None ) –

    Optional :class:Config. The binoc.markdown section of its output config controls per-renderer options (significance rules, section layout, etc.).

Returns:

  • str

    The rendered Markdown string.

testing

Test vector helpers for binoc plugins.

Provides utilities to discover test vectors and run them against the binoc Python API. Plugin authors use this to validate their comparators end-to-end through the Python stack.

Snapshots are assumed to be already materialized — that is, any .zip.d / .tar.gz.d / plugin-specific staging dirs in test-vectors/ have been built into real artifacts by just materialize (or the equivalent cargo run -p <crate> --bin materialize-test-vectors invocations). See docs/adr/test_vector_materialization.md for the design; pytest sessions typically materialize once in a session-scoped fixture::

import subprocess

import pytest

@pytest.fixture(scope="session")
def vectors_dir(tmp_path_factory):
    # Materialize the staged vectors once per test session.
    dest = tmp_path_factory.mktemp("vectors")
    subprocess.check_call([
        "cargo", "run", "-q", "-p", "my_plugin",
        "--features", "test-support",
        "--bin", "materialize-test-vectors", "--",
        str(dest), "my-plugin/test-vectors",
    ])
    return dest

Typical usage in a plugin's pytest suite::

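import pytest

import binoc
from binoc.testing import discover_vectors, run_vector

# Fixtures cannot be called at collection time, so parametrize over a
# plain path to the materialized vectors (placeholder; adjust to your layout).
VECTORS_DIR = "path/to/materialized/vectors"

@pytest.fixture
def registry():
    r = binoc.PluginRegistry.default()
    # MyComparator is your plugin's Comparator subclass.
    r.register_comparator("my-plugin.foo", MyComparator())
    return r

@pytest.mark.parametrize(
    "vector_dir",
    discover_vectors(VECTORS_DIR),
    ids=lambda v: v.name,
)
def test_vector(vector_dir, registry):
    run_vector(vector_dir, registry=registry)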

discover_vectors

discover_vectors(vectors_dir: str | Path) -> list[Path]

Find test vector directories under vectors_dir.

A valid vector directory contains manifest.toml, snapshot-a/, and snapshot-b/. Returns a sorted list of Paths.
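
The validity rule above is easy to mirror with pathlib. The sketch below is not the library's code; it assumes only the direct children of vectors_dir are examined, which the docstring leaves open, and the vector names are invented.

```python
import tempfile
from pathlib import Path


def discover(vectors_dir):
    """Return sorted vector directories that have all three required entries.

    ASSUMPTION: only direct children of vectors_dir are considered.
    """
    root = Path(vectors_dir)
    return sorted(
        d for d in root.iterdir()
        if d.is_dir()
        and (d / "manifest.toml").is_file()
        and (d / "snapshot-a").is_dir()
        and (d / "snapshot-b").is_dir()
    )


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for name in ("rename-column", "add-file"):
        (root / name / "snapshot-a").mkdir(parents=True)
        (root / name / "snapshot-b").mkdir()
        (root / name / "manifest.toml").write_text("")
    # Missing manifest.toml and snapshot-b/, so this one is skipped.
    (root / "incomplete" / "snapshot-a").mkdir(parents=True)
    found = [p.name for p in discover(root)]
```

Returning Path objects is what makes ids=lambda v: v.name work in the parametrize call shown earlier.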

load_manifest

load_manifest(
    vector_dir: str | Path,
    vectors_root: str | Path | None = None,
) -> dict

Load a vector's manifest, merging defaults from the root manifest.

Returns a dict with keys vector, config (optional), expected (optional).
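
Merging root defaults into a vector's manifest could look like the sketch below. The merge strategy (a shallow per-section merge where the vector's values win) and the keys inside each section are assumptions for illustration; the source only states that defaults are merged and names the three top-level keys.

```python
def merge_manifest(root_defaults: dict, vector_manifest: dict) -> dict:
    """Combine two already-parsed manifests, section by section.

    ASSUMPTION: shallow merge per section; the vector's values override.
    """
    merged = {}
    for section in ("vector", "config", "expected"):
        combined = {
            **root_defaults.get(section, {}),
            **vector_manifest.get(section, {}),
        }
        # "config" and "expected" are optional, so omit them when empty.
        if combined or section == "vector":
            merged[section] = combined
    return merged


# Hypothetical manifest contents, written as the dicts a TOML parser would yield.
root_defaults = {"config": {"comparators": ["example.csv"]}}
vector_manifest = {
    "vector": {"name": "rename-column"},
    "config": {"transformers": ["example.moves"]},
}
merged = merge_manifest(root_defaults, vector_manifest)
```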

run_vector

run_vector(
    vector_dir: str | Path,
    *,
    vectors_root: str | Path | None = None,
    registry: PluginRegistry | None = None,
) -> binoc.Changeset

Run a single test vector and check its manifest assertions.

vector_dir must be a materialized vector — any .zip.d / .tar.gz.d / plugin-specific staging directories should already have been built into real artifacts. See module docstring for how to run materialization once per session.

Steps:

  1. Parse the manifest (with root-manifest defaults).
  2. Build a binoc.Config from the manifest's [config] section.
  3. Run binoc.diff() against the snapshots with the config and optional registry.
  4. Check [expected] assertions from the manifest.

Returns the resulting :class:binoc.Changeset.

check_assertions

check_assertions(
    name: str, changeset: Changeset, expected: dict
) -> None

Verify a changeset against [expected] assertions from a manifest.

binoc.testing

Test-vector helpers for plugin authors. Separate submodule; import as from binoc.testing import discover_vectors, run_vector.
