
Python API

The public Python API lives in the top-level binoc package. Every symbol on this page is reachable as binoc.<name> and is listed in binoc.__all__; private names (anything starting with _) are deliberately omitted. The page below is rendered directly from the installed package's docstrings by mkdocstrings[python] — see the Documentation platform ADR.

Limitations of the Python surface

Rule authoring

Python can embed binoc, configure dataset semantics, discover plugins, and author renderers. Correspondence rule authoring is currently Rust-only and in-process until the stable ABI tier lands.

Plugins that need to author correspondence rules must implement them in Rust. See Plugin model and the Rust SDK reference.

For worked Python examples, see:

binoc

Binoc: the missing changelog for datasets.

Binoc generates changelogs for datasets that don't ship with them. Given snapshots of a dataset downloaded at different times, Binoc detects what changed, expresses the changes as a minimal structured diff (the Changeset / DiffNode tree), and renders them as JSON or Markdown.

This module is the top-level Python API. Every symbol listed in binoc.__all__ is considered public and is documented on this page.

Quick start:

import binoc

changeset = binoc.diff("snapshots/2024-03", "snapshots/2024-06")
print(changeset)

# Inspect the diff tree (root is None when the snapshots compare identical)
if changeset.root is not None:
    for child in changeset.root:
        print(f"{child.path}: {child.action}")

# Serialize
json_str = changeset.to_json()
markdown = binoc.to_markdown([changeset])

Writing plugins: Python supports embedding, rendering, and dataset configuration. Parser and rewrite rule authoring is Rust-only until the correspondence rule ABI lands.

Test-vector helpers for plugin authors live in binoc.testing.

Changeset

Changeset(
    from_snapshot: str,
    to_snapshot: str,
    root: DiffNode | None = None,
)

The result of diff(): a rooted diff tree plus metadata.

A Changeset records the two snapshot names it was computed from, the root of the diff tree (None if the snapshots are identical), and a free-form metadata dict for plugin use. Its diagnostics list carries structured findings from plugins and renderers. Serialize with to_json() / to_dict(), or via the module-level to_json() and to_markdown() functions.

claims property

claims: list[dict[str, Any]]

Run-scoped global claims. Reserved for future claim producers.

diagnostics property

diagnostics: list[dict[str, Any]]

Structured diagnostics attached to this changeset.

from_snapshot property

from_snapshot: str

Name/identifier of the earlier snapshot this changeset was computed from.

metadata property

metadata: dict[str, str]

Free-form metadata dict (plugin-populated).

node_count property

node_count: int

Total number of nodes in the diff tree (0 if root is None).

root property

root: DiffNode | None

Root of the diff tree, or None if the snapshots compare identical.
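Because root is None whenever the snapshots compare identical, code that walks a changeset should handle that case before touching the tree. The sketch below is a hypothetical helper (not part of binoc) that groups node paths by their action string, relying only on the root, children, path, and action properties documented on this page.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Any


def paths_by_action(changeset: Any) -> dict[str, list[str]]:
    """Group every node's path under its action string.

    Hypothetical helper: returns {} for identical snapshots (root is None).
    """
    if changeset.root is None:
        return {}
    groups: dict[str, list[str]] = defaultdict(list)
    stack = [changeset.root]
    while stack:
        node = stack.pop()
        groups[node.action].append(node.path)
        # Push children in reverse so they are visited in their listed order.
        stack.extend(reversed(node.children))
    return dict(groups)
```

For example, paths_by_action(binoc.diff(a, b)).get("add", []) would list every added path, or nothing when the snapshots match.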

to_snapshot property

to_snapshot: str

Name/identifier of the later snapshot this changeset was computed from.

find_node method descriptor

find_node(selector: str) -> DiffNode | None

Recursively search the diff tree for a node whose path matches selector. Returns None if there is no root or no match.

to_dict method descriptor

to_dict() -> dict[str, Any]

Serialize this changeset to a plain Python dict.

to_json method descriptor

to_json() -> str

Serialize this changeset to canonical binoc changeset JSON.

Config

Config()

Dataset-level diff configuration.

A Config holds dataset-level semantic configuration for the correspondence engine.

default staticmethod

default() -> Config

Return a fresh Config populated with the standard-library defaults.

from_file staticmethod

from_file(path: str) -> Config

Load a dataset config from a YAML file on disk.

DiffNode

DiffNode(
    action: str,
    item_type: str,
    path: str,
    *,
    sources: list[SourceRecord] | None = None,
    summary: str | None = None,
    tags: list[str] | set[str] | None = None,
    details: dict[str, Any] | None = None,
    annotations: list[AnnotationRecord] | None = None,
    children: list[DiffNode] | None = None,
)

A node in the diff tree — the primary IR type.

A DiffNode records one change (or unchanged item) at one logical path. Correspondence rules project nodes from links and edit lists. action, item_type, and tags are open strings so plugins can introduce new vocabulary without a core release.

Nodes are iterable and indexable over their children:

for child in node:
    print(child.path, child.action)

first_child = node[0]
count = len(node)
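Since iterating a node yields its children, a full depth-first walk needs nothing beyond `for child in node`. A minimal sketch of two hypothetical helpers (not part of binoc): one yields every node in a subtree with its depth, the other prints an indented outline using the documented path and action properties.

```python
from __future__ import annotations

from collections.abc import Iterator
from typing import Any


def iter_nodes(node: Any, depth: int = 0) -> Iterator[tuple[int, Any]]:
    """Yield (depth, node) for node and all its descendants, pre-order.

    Relies only on DiffNode being iterable over its direct children.
    """
    yield depth, node
    for child in node:
        yield from iter_nodes(child, depth + 1)


def outline(root: Any) -> str:
    """Render an indented 'action path' outline of a diff subtree."""
    return "\n".join(
        f"{'  ' * depth}{n.action} {n.path}" for depth, n in iter_nodes(root)
    )
```

On a real DiffNode, sum(1 for _ in iter_nodes(node)) should agree with node.node_count().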

action property

action: str

Open-string verb describing what changed ("add", "modify", ...).

annotations property

annotations: list[AnnotationRecord]

Renderer-visible annotations as {"package", "key", "value"} records.
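A short sketch of reading a namespaced annotation back from a node. It assumes each AnnotationRecord behaves like a mapping with "package", "key", and "value" entries, as the description above suggests; get_annotation is a hypothetical helper, not part of binoc.

```python
from typing import Any


def get_annotation(node: Any, package: str, key: str, default: Any = None) -> Any:
    """Return the value of the (package, key) annotation on node, or default.

    Assumes node.annotations is a list of mappings with "package", "key",
    and "value" entries.
    """
    for record in node.annotations:
        if record["package"] == package and record["key"] == key:
            return record["value"]
    return default
```

Matching on both package and key keeps two plugins that reuse the same key name from reading each other's values.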

children property

children: list[DiffNode]

Direct children of this node.

details property

details: dict[str, Any]

Structured JSON-serializable details describing the change.

item_type property

item_type: str

Open-string noun describing the kind of item ("file", "csv.row", ...).

path property

path: str

Logical path of this item within its snapshot.

sources property

sources: list[SourceRecord]

Provenance records for this projected node.

summary property

summary: str | None

Optional one-line human summary of the change, as plain text.

tags property

tags: list[str]

Open-string tags attached to this node (used for renderer grouping and rule-pack semantics).

all_tags method descriptor

all_tags() -> list[str]

Union of all tags on this node and its descendants.

annotate_from method descriptor

annotate_from(
    package: str, key: str, value: Any
) -> DiffNode

Alias for with_annotation_from.

find_node method descriptor

find_node(selector: str) -> DiffNode | None

Recursively search this subtree for a node whose path matches selector. Returns None if no match is found.

node_count method descriptor

node_count() -> int

Total number of nodes in the subtree rooted at this node.

to_dict method descriptor

to_dict() -> dict[str, Any]

Serialize this node (recursively) to a plain Python dict.

to_json method descriptor

to_json() -> str

Serialize this node (recursively) to pretty-printed JSON.

with_annotation_from method descriptor

with_annotation_from(
    package: str, key: str, value: Any
) -> DiffNode

Return a clone of this node with a namespaced annotation set. value must be JSON-serializable.

with_children method descriptor

with_children(children: list[DiffNode]) -> DiffNode

Return a clone of this node with its children replaced.

with_detail method descriptor

with_detail(key: str, value: Any) -> DiffNode

Return a clone of this node with details[key] = value set. value must be JSON-serializable.

with_source method descriptor

with_source(
    path: str,
    side: str,
    evidence: str | None,
    action: str | None,
) -> DiffNode

Return a clone of this node with a provenance source appended.

with_summary method descriptor

with_summary(summary: str) -> DiffNode

Return a clone of this node with summary replaced.

with_tag method descriptor

with_tag(tag: str) -> DiffNode

Return a clone of this node with tag added to tags.
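Because every with_* method returns a modified clone instead of mutating the node, rewriting a whole subtree means rebuilding it bottom-up with with_children. A minimal sketch, assuming only the methods documented above; tag_where is a hypothetical helper and the tag value is up to the caller.

```python
from typing import Any


def tag_where(node: Any, action: str, tag: str) -> Any:
    """Return a rebuilt copy of node's subtree with tag added to every
    node whose action equals action.

    Children are rebuilt first and attached with with_children(); because
    each with_* call returns a clone, the input tree is never modified.
    """
    rebuilt = node.with_children(
        [tag_where(child, action, tag) for child in node.children]
    )
    if rebuilt.action == action:
        rebuilt = rebuilt.with_tag(tag)
    return rebuilt
```

For instance, tag_where(changeset.root, "add", "new-item") would return a new tree in which every "add" node carries the hypothetical "new-item" tag, leaving changeset.root itself unchanged.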

ItemPair

extension property

extension: str | None

is_dir property

is_dir: bool

left_path property

left_path: str | None

logical_path property

logical_path: str

right_path property

right_path: str | None

PluginRegistry

A mutable registry of renderer plugins.

Test harnesses and plugin authors build a PluginRegistry, register plugin instances or load native .so plugins into it, and pass it to rendering paths that resolve configured renderers.

default staticmethod

default() -> PluginRegistry

Return a fresh registry preloaded with the standard-library plugins.

list_renderers method descriptor

list_renderers() -> list[str]

Return the names of all registered renderers.

load_native_plugin method descriptor

load_native_plugin(module_path: str) -> None

Load a native Rust plugin from a shared library path.

The library must expose the binoc plugin C ABI (_binoc_plugin_describe and related entry points). This is the same mechanism used by the entry-point-based plugin discovery in binoc.

register_renderer method descriptor

register_renderer(_name, obj: Any) -> None

Register a Python renderer instance with this registry.

A Python renderer is any object with a name attribute, a file_extension attribute (defaults to "txt") and a render(changesets, config) -> str method.
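A minimal sketch of an object that satisfies that renderer protocol. The class, its "action-summary" name, and the output format are hypothetical; it assumes each changeset exposes the documented from_snapshot, to_snapshot, and root properties, and it ignores config.

```python
from __future__ import annotations

from collections import Counter
from typing import Any


class ActionSummaryRenderer:
    """Hypothetical renderer: one line per changeset, counting node actions."""

    name = "action-summary"
    file_extension = "txt"

    def render(self, changesets: list[Any], config: Any) -> str:
        lines = []
        for cs in changesets:
            counts: Counter[str] = Counter()
            # root is None for identical snapshots, so start from an empty stack.
            stack = [cs.root] if cs.root is not None else []
            while stack:
                node = stack.pop()
                counts[node.action] += 1
                stack.extend(node.children)
            summary = ", ".join(f"{a}={n}" for a, n in sorted(counts.items()))
            lines.append(f"{cs.from_snapshot} -> {cs.to_snapshot}: {summary or 'no changes'}")
        return "\n".join(lines) + "\n"
```

Registering it would look like registry = binoc.PluginRegistry.default() followed by registry.register_renderer("action-summary", ActionSummaryRenderer()), passing the name positionally as in the signature above.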

diff builtin

diff(
    snapshot_a: str,
    snapshot_b: str,
    *,
    config: Config | None = None,
    registry: PluginRegistry | None = None,
) -> Changeset

Diff two snapshots and return the resulting Changeset.

Parameters:

  • snapshot_a (str) –

    Path to the earlier snapshot (file or directory).

  • snapshot_b (str) –

    Path to the later snapshot (file or directory).

  • config (Config | None, default: None ) –

    Optional Config carrying dataset semantics. If None, the standard-library defaults are used.

  • registry (PluginRegistry | None, default: None ) –

    Optional PluginRegistry supplying the plugins that config may refer to. If None, the standard-library registry is used.

Returns:

  • Changeset

    The resulting Changeset.

to_json builtin

to_json(changeset: Changeset) -> str

Render a Changeset as canonical binoc changeset JSON.

to_markdown builtin

to_markdown(
    changesets: list[Changeset],
    *,
    config: Config | None = None,
) -> str

Render one or more changesets to Markdown using the stdlib renderer.

Parameters:

  • changesets (list[Changeset]) –

    List of Changeset objects to render.

  • config (Config | None, default: None ) –

    Optional Config. The binoc.markdown section of its output config controls per-renderer options (significance rules, section layout, etc.).

Returns:

  • str

    The rendered Markdown string.
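Because to_markdown takes a list of changesets, a changelog for a whole series of snapshots can be built by diffing each consecutive pair. The sketch below is a hypothetical helper that receives the diff and render functions as parameters, so a real call would look like changelog_for_series(paths, binoc.diff, binoc.to_markdown), with paths given oldest first.

```python
from __future__ import annotations

from collections.abc import Callable, Sequence
from typing import Any


def changelog_for_series(
    snapshots: Sequence[str],
    diff: Callable[[str, str], Any],
    render: Callable[[list[Any]], str],
) -> str:
    """Diff each consecutive pair of snapshots and render them together.

    Assumes snapshots are ordered oldest first.
    """
    if len(snapshots) < 2:
        raise ValueError("need at least two snapshots to diff")
    changesets = [diff(a, b) for a, b in zip(snapshots, snapshots[1:])]
    return render(changesets)
```

Passing the functions in keeps the pairing logic independent of binoc, which also makes it easy to test with stand-ins.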

testing

Test vector helpers for binoc plugins.

Provides utilities to discover test vectors and run them against the binoc Python API. Plugin authors use this to validate correspondence-first output through the Python stack.

Snapshots are assumed to be already materialized: any .zip.d / .tar.gz.d / plugin-specific staging dirs in test-vectors/ have been built into real artifacts by running the just materialize recipe (or the equivalent cargo run -p <crate> --bin materialize-test-vectors invocations). See docs/adr/test_vector_materialization.md for the design. A pytest session typically materializes once, in a session-scoped fixture:

import subprocess

import pytest

@pytest.fixture(scope="session")
def vectors_dir(tmp_path_factory):
    dest = tmp_path_factory.mktemp("vectors")
    subprocess.check_call([
        "cargo", "run", "-q", "-p", "my_plugin",
        "--features", "test-support",
        "--bin", "materialize-test-vectors", "--",
        str(dest), "my-plugin/test-vectors",
    ])
    return dest

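Typical usage in a plugin's pytest suite. Because parametrize runs at collection time, before any fixture exists, the vectors_dir fixture cannot be called there; instead, discover vector names from the source directory and resolve them against the materialized directory inside the test:

from pathlib import Path

import pytest
from binoc.testing import discover_vectors, run_vector

# Assumes the source vectors already have the manifest.toml / snapshot-a/ /
# snapshot-b/ layout, so discovery works before materialization.
SOURCE_VECTORS = Path("my-plugin/test-vectors")

@pytest.mark.parametrize(
    "vector_name",
    [v.name for v in discover_vectors(SOURCE_VECTORS)],
)
def test_vector(vectors_dir, vector_name):
    # Assumes materialization mirrors the source layout under vectors_dir.
    run_vector(vectors_dir / vector_name)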

discover_vectors

discover_vectors(vectors_dir: str | Path) -> list[Path]

Find test vector directories under vectors_dir.

A valid vector directory contains manifest.toml, snapshot-a/, and snapshot-b/. Returns a sorted list of Paths.
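When a vector is not being picked up, it can help to check the layout rule above directly. A minimal sketch of that rule in plain Python; is_vector_dir and find_vector_dirs are hypothetical helpers, and unlike discover_vectors (which may search deeper) this version only inspects the immediate subdirectories of root.

```python
from __future__ import annotations

from pathlib import Path


def is_vector_dir(path: Path) -> bool:
    """True if path contains manifest.toml plus snapshot-a/ and snapshot-b/."""
    return (
        (path / "manifest.toml").is_file()
        and (path / "snapshot-a").is_dir()
        and (path / "snapshot-b").is_dir()
    )


def find_vector_dirs(root: str | Path) -> list[Path]:
    """Sorted immediate subdirectories of root that match the vector layout."""
    root = Path(root)
    return sorted(p for p in root.iterdir() if p.is_dir() and is_vector_dir(p))
```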

load_manifest

load_manifest(
    vector_dir: str | Path,
    vectors_root: str | Path | None = None,
) -> dict

Load a vector's manifest, merging defaults from the root manifest.

Returns a dict with keys vector, config (optional), expected (optional).
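The docstring does not spell out how root-manifest defaults are merged. One plausible reading, sketched below purely as an assumption, is a recursive merge in which the vector's own tables override the root defaults key by key. merge_manifest is hypothetical and operates on already-parsed TOML dicts (for example from tomllib.loads on Python 3.11+).

```python
from typing import Any


def merge_manifest(defaults: dict, vector: dict) -> dict:
    """Recursively merge a vector manifest over root-manifest defaults.

    Assumption, not confirmed binoc behavior: nested tables merge key by
    key and the vector's value wins on conflicts. Inputs are not mutated.
    """
    merged: dict[str, Any] = dict(defaults)
    for key, value in vector.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_manifest(merged[key], value)
        else:
            merged[key] = value
    return merged
```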

run_vector

run_vector(
    vector_dir: str | Path,
    *,
    vectors_root: str | Path | None = None,
    registry: PluginRegistry | None = None,
) -> binoc.Changeset

Run a single test vector and check its manifest assertions.

vector_dir must be a materialized vector: any .zip.d / .tar.gz.d / plugin-specific staging directories must already have been built into real artifacts. See the module docstring above for how to run materialization once per session.

Steps:

  1. Parse the manifest (with root-manifest defaults).
  2. Build a binoc.Config from supported manifest [config] fields.
  3. Run binoc.diff() against the snapshots with the config and optional registry.
  4. Check [expected] assertions from the manifest.

Returns the resulting binoc.Changeset.

check_assertions

check_assertions(
    name: str, changeset: Changeset, expected: dict
) -> None

Verify a changeset against [expected] assertions from a manifest.

binoc.testing

Test-vector helpers for plugin authors, in a separate submodule. Import them with: from binoc.testing import discover_vectors, run_vector.
