Python API
The public Python API lives in the top-level binoc package. Every symbol
on this page is reachable as binoc.<name> and is listed in
binoc.__all__; private names (anything starting with _) are
deliberately omitted. The page below is rendered directly from the
installed package's docstrings by
mkdocstrings[python] — see the
Documentation platform ADR.
Limitations of the Python surface
Rule authoring
Python can embed binoc, configure dataset semantics, discover plugins, and author renderers. Correspondence rule authoring is currently Rust-only and in-process until the stable ABI tier lands.
For plugins that need to author correspondence rules, write them in Rust. See Plugin model and the Rust SDK reference.
For worked Python examples, see:
binoc
Binoc: the missing changelog for datasets.
Binoc generates changelogs for datasets that don't ship with them. Given
snapshots of a dataset downloaded at different times, Binoc detects what
changed, expresses changes as a minimal structured diff (the Changeset
/ DiffNode tree), and renders changes as JSON or Markdown.
This module is the top-level Python API. Every symbol listed in
binoc.__all__ is considered public and is documented on this page.
Quick start::

    import binoc

    changeset = binoc.diff("snapshots/2024-03", "snapshots/2024-06")
    print(changeset)

    # Inspect the diff tree (root is None when the snapshots are identical)
    if changeset.root is not None:
        for child in changeset.root:
            print(f"{child.path}: {child.action}")

    # Serialize
    json_str = changeset.to_json()
    markdown = binoc.to_markdown([changeset])

Writing plugins: Python supports embedding, rendering, and dataset configuration. Parser and rewrite rule authoring is Rust-only until the correspondence rule ABI lands.
Test-vector helpers for plugin authors live in binoc.testing.
Changeset
The result of diff(): a rooted diff tree plus metadata.
A Changeset records the two snapshot names it was computed from, the
root of the diff tree (None if the snapshots are identical), and a
free-form metadata dict for plugin use. Structured diagnostics
carry plugin and renderer findings. Serialize with the
to_json() / to_dict() methods, or via the module-level to_json()
and to_markdown() functions.
claims
property
Run-scoped global claims. Reserved for future claim producers.
diagnostics
property
Structured diagnostics attached to this changeset.
from_snapshot
property
Name/identifier of the earlier snapshot this changeset was computed from.
node_count
property
Total number of nodes in the diff tree (0 if root is None).
root
property
Root of the diff tree, or None if the snapshots compare identical.
to_snapshot
property
Name/identifier of the later snapshot this changeset was computed from.
find_node
method descriptor
Recursively search the diff tree for a node whose path matches
selector. Returns None if there is no root or no match.
to_dict
method descriptor
Serialize this changeset to a plain Python dict.
to_json
method descriptor
Serialize this changeset to canonical binoc changeset JSON.
Config
Dataset-level diff configuration.
A Config holds dataset-level semantic configuration for the
correspondence engine.
DiffNode
DiffNode(
    action: str,
    item_type: str,
    path: str,
    *,
    sources: list[SourceRecord] | None = None,
    summary: str | None = None,
    tags: list[str] | set[str] | None = None,
    details: dict[str, Any] | None = None,
    annotations: list[AnnotationRecord] | None = None,
    children: list[DiffNode] | None = None,
)
A node in the diff tree — the primary IR type.
A DiffNode records one change (or unchanged item) at one logical path.
Correspondence rules project nodes from links and edit lists. action,
item_type, and tags are open strings so plugins can introduce new
vocabulary without a core release.
Nodes are iterable and indexable over their children::

    for child in node:
        print(child.path, child.action)

    first_child = node[0]
    count = len(node)

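The container behavior described above can be illustrated without binoc installed. The sketch below uses a hypothetical stand-in class, SketchNode, that mimics only the documented protocol (iteration, indexing, and len over direct children) plus a recursive count in the spirit of node_count. The action and item_type values are illustrative open strings, not a vocabulary confirmed by this page.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SketchNode:
    """Hypothetical stand-in for DiffNode; not the binoc type."""

    action: str
    item_type: str
    path: str
    children: list[SketchNode] = field(default_factory=list)

    def __iter__(self):
        # Iterating a node yields its direct children only.
        return iter(self.children)

    def __getitem__(self, index: int) -> SketchNode:
        return self.children[index]

    def __len__(self) -> int:
        return len(self.children)

    def node_count(self) -> int:
        # This node plus every descendant.
        return 1 + sum(child.node_count() for child in self.children)


root = SketchNode("modify", "directory", "data", [
    SketchNode("add", "file", "data/new.csv"),
    SketchNode("remove", "file", "data/old.csv"),
])

child_actions = [child.action for child in root]
first_path = root[0].path
total = root.node_count()
```

Because the stand-in defines __len__, a childless SketchNode is falsy, so an explicit "is not None" test is safer than a truthiness check when a node may be absent. Whether the real DiffNode behaves the same way is not stated on this page.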
annotations
property
Renderer-visible annotations as {"package", "key", "value"}
records.
details
property
Structured JSON-serializable details describing the change.
item_type
property
Open-string noun describing the kind of item ("file", "csv.row", ...).
summary
property
Optional one-line human summary of the change, as plain text.
tags
property
Open-string tags attached to this node (used for renderer grouping and rule-pack semantics).
all_tags
method descriptor
Union of all tags on this node and its descendants.
annotate_from
method descriptor
Alias for with_annotation_from.
find_node
method descriptor
Recursively search this subtree for a node whose path matches
selector. Returns None if no match is found.
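A recursive search of this kind can be sketched as a depth-first walk. The example below is a minimal illustration on a hypothetical stand-in type, not binoc's implementation. In particular, it assumes that "matches" means exact path equality, which this page does not confirm; the real selector syntax may be richer.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SketchNode:
    """Hypothetical stand-in for DiffNode; not the binoc type."""

    path: str
    children: list[SketchNode] = field(default_factory=list)


def find_node(node: Optional[SketchNode], selector: str) -> Optional[SketchNode]:
    # ASSUMPTION: a selector matches when it equals the node's path exactly.
    if node is None:
        return None
    if node.path == selector:
        return node
    # Depth-first: return the first match found in any child subtree.
    for child in node.children:
        found = find_node(child, selector)
        if found is not None:
            return found
    return None


tree = SketchNode("data", [
    SketchNode("data/a.csv"),
    SketchNode("data/sub", [SketchNode("data/sub/b.csv")]),
])

hit = find_node(tree, "data/sub/b.csv")
miss = find_node(tree, "data/missing.csv")
```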
node_count
method descriptor
Total number of nodes in the subtree rooted at this node.
to_dict
method descriptor
Serialize this node (recursively) to a plain Python dict.
to_json
method descriptor
Serialize this node (recursively) to pretty-printed JSON.
with_annotation_from
method descriptor
Return a clone of this node with a namespaced annotation set.
value must be JSON-serializable.
with_children
method descriptor
Return a clone of this node with its children replaced.
with_detail
method descriptor
Return a clone of this node with details[key] = value set. value
must be JSON-serializable.
with_source
method descriptor
Return a clone of this node with a provenance source appended.
with_summary
method descriptor
Return a clone of this node with summary replaced.
with_tag
method descriptor
Return a clone of this node with tag added to tags.
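The with_* methods all follow a return-a-clone pattern: the receiver is left untouched and a modified copy is returned, so calls can be chained. The sketch below shows that pattern on a hypothetical frozen dataclass using dataclasses.replace. It is not the binoc type, the tag and detail names are made up for illustration, and the JSON check via json.dumps is an assumption about how "must be JSON-serializable" could be enforced.

```python
from __future__ import annotations

import json
from dataclasses import dataclass, field, replace
from typing import Any


@dataclass(frozen=True)
class SketchNode:
    """Hypothetical stand-in for DiffNode; not the binoc type."""

    action: str
    item_type: str
    path: str
    summary: str | None = None
    tags: frozenset[str] = frozenset()
    details: dict[str, Any] = field(default_factory=dict)

    def with_summary(self, summary: str) -> SketchNode:
        return replace(self, summary=summary)

    def with_tag(self, tag: str) -> SketchNode:
        return replace(self, tags=self.tags | {tag})

    def with_detail(self, key: str, value: Any) -> SketchNode:
        # ASSUMPTION: reject values that the json module cannot encode.
        json.dumps(value)  # raises TypeError for e.g. a set
        # Build a new dict so the original node's details are not mutated.
        return replace(self, details={**self.details, key: value})


base = SketchNode("modify", "file", "data/a.csv")
tagged = base.with_tag("schema-change").with_detail("rows_added", 3)

try:
    base.with_detail("bad", {1, 2})  # sets are not JSON-serializable
    rejected = False
except TypeError:
    rejected = True
```

The original node is unchanged after the chained calls, which is what makes it safe to share nodes between renderers in this sketch.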
ItemPair
PluginRegistry
A mutable registry of renderer plugins.
Test harnesses and plugin authors build a PluginRegistry,
register plugin instances or load native .so plugins into it, and
pass it to rendering paths that resolve configured renderers.
default
staticmethod
Return a fresh registry preloaded with the standard-library plugins.
list_renderers
method descriptor
Return the names of all registered renderers.
load_native_plugin
method descriptor
Load a native Rust plugin from a shared library path.
The library must expose the binoc plugin C ABI (_binoc_plugin_describe
and related entry points). This is the same mechanism used by the
entry-point-based plugin discovery in binoc.
register_renderer
method descriptor
Register a Python renderer instance with this registry.
A Python renderer is any object with a name attribute, a
file_extension attribute (defaults to "txt") and a
render(changesets, config) -> str method.
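Since the renderer contract is duck-typed, a plain class is enough. The sketch below is one possible renderer under that description. The class name and the renderer name "example.summary" are hypothetical, and it reads only Changeset properties documented on this page (from_snapshot, to_snapshot, node_count). To keep the example runnable without binoc, it is exercised with a SimpleNamespace stand-in rather than a real Changeset.

```python
from types import SimpleNamespace


class SummaryRenderer:
    """Duck-typed renderer: a name, a file_extension, and render()."""

    name = "example.summary"  # hypothetical renderer name
    file_extension = "txt"

    def render(self, changesets, config) -> str:
        # One line per changeset; config is accepted but unused here.
        lines = []
        for cs in changesets:
            lines.append(
                f"{cs.from_snapshot} -> {cs.to_snapshot}: {cs.node_count} node(s)"
            )
        return "\n".join(lines)


# Stand-in for a Changeset, exposing only the properties used above.
fake_changeset = SimpleNamespace(
    from_snapshot="2024-03", to_snapshot="2024-06", node_count=4
)
text = SummaryRenderer().render([fake_changeset], config=None)
```

Assuming register_renderer takes the renderer instance as its only argument (the signature is not shown on this page), registration would be a call like registry.register_renderer(SummaryRenderer()) on a PluginRegistry.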
diff
builtin
diff(
    snapshot_a: str,
    snapshot_b: str,
    *,
    config: Config | None = None,
    registry: PluginRegistry | None = None,
) -> Changeset
Diff two snapshots and return the resulting Changeset.
Parameters:
- snapshot_a (str) – Path to the earlier snapshot (file or directory).
- snapshot_b (str) – Path to the later snapshot (file or directory).
- config (Config | None, default: None) – Optional Config carrying dataset semantics. If None, the stdlib defaults are used.
- registry (PluginRegistry | None, default: None) – Optional PluginRegistry providing the set of plugins available to resolve from config. If None, the stdlib registry is used.
Returns:
- Changeset – The resulting Changeset.
to_json
builtin
Render a Changeset as canonical binoc changeset JSON.
to_markdown
builtin
Render one or more changesets to Markdown using the stdlib renderer.
Parameters:
- changesets (list[Changeset]) – List of Changesets to render.
- config (Config | None, default: None) – Optional Config. The binoc.markdown section of its output config controls per-renderer options (significance rules, section layout, etc.).
Returns:
- str – The rendered Markdown string.
binoc.testing
Test-vector helpers for plugin authors. Separate submodule; import as
from binoc.testing import discover_vectors, run_vector.
testing
Test vector helpers for binoc plugins.
Provides utilities to discover test vectors and run them against the binoc Python API. Plugin authors use this to validate correspondence-first output through the Python stack.
Snapshots are assumed to be already materialized — that is, any .zip.d
/ .tar.gz.d / plugin-specific staging dirs in test-vectors/ have been
built into real artifacts by just materialize (or the equivalent
cargo run -p <crate> --bin materialize-test-vectors invocations). See
docs/adr/test_vector_materialization.md for the design; pytest sessions
typically materialize once in a session-scoped fixture::

    import subprocess

    import pytest

    @pytest.fixture(scope="session")
    def vectors_dir(tmp_path_factory):
        # Materialize the test vectors once per pytest session.
        dest = tmp_path_factory.mktemp("vectors")
        subprocess.check_call([
            "cargo", "run", "-q", "-p", "my_plugin",
            "--features", "test-support",
            "--bin", "materialize-test-vectors", "--",
            str(dest), "my-plugin/test-vectors",
        ])
        return dest

Typical usage in a plugin's pytest suite::

    from binoc.testing import discover_vectors, run_vector

    def test_vectors(vectors_dir):
        # Fixtures cannot be called at collection time, so discover the
        # materialized vectors inside the test and run each one.
        for vector_dir in discover_vectors(vectors_dir):
            run_vector(vector_dir)

discover_vectors
Find test vector directories under vectors_dir.
A valid vector directory contains manifest.toml, snapshot-a/,
and snapshot-b/. Returns a sorted list of Paths.
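The validity rule above can be sketched with pathlib alone. This is an illustration of the described rule, not the code in binoc.testing: the function name is hypothetical, and searching at any depth below vectors_dir is an assumption the page does not confirm.

```python
import tempfile
from pathlib import Path


def discover_vectors_sketch(vectors_dir):
    """Return sorted directories that look like test vectors (sketch)."""
    root = Path(vectors_dir)
    found = []
    # ASSUMPTION: vectors may sit at any depth below vectors_dir.
    for manifest in root.rglob("manifest.toml"):
        candidate = manifest.parent
        # A vector needs both snapshot directories next to its manifest.
        if (candidate / "snapshot-a").is_dir() and (candidate / "snapshot-b").is_dir():
            found.append(candidate)
    return sorted(found)


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for name in ("b-vector", "a-vector"):
        (root / name / "snapshot-a").mkdir(parents=True)
        (root / name / "snapshot-b").mkdir()
        (root / name / "manifest.toml").write_text("")
    # A manifest without snapshot directories is not a vector.
    (root / "incomplete").mkdir()
    (root / "incomplete" / "manifest.toml").write_text("")
    names = [p.name for p in discover_vectors_sketch(root)]
```

Under this rule a root-level manifest.toml that only carries defaults is skipped automatically, because it has no snapshot directories beside it.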
load_manifest
Load a vector's manifest, merging defaults from the root manifest.
Returns a dict with keys vector, config (optional),
expected (optional).
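How the root defaults are merged is not specified on this page. As one plausible reading, the sketch below performs a shallow, per-section merge in which the vector's own values override the root manifest's. The function name, the merge strategy, and every key inside the sections are assumptions made for illustration only.

```python
def merge_manifest_sketch(root_defaults: dict, vector_manifest: dict) -> dict:
    """Merge root defaults into a vector manifest, section by section."""
    merged = {}
    for section in ("vector", "config", "expected"):
        # ASSUMPTION: shallow merge, vector values win over root defaults.
        combined = {
            **root_defaults.get(section, {}),
            **vector_manifest.get(section, {}),
        }
        # "vector" is always present; the other sections are optional.
        if combined or section == "vector":
            merged[section] = combined
    return merged


# Hypothetical manifest contents, already parsed from TOML into dicts.
root_defaults = {"config": {"ignore": ["*.tmp"]}}
vector_manifest = {
    "vector": {"name": "csv-row-added"},
    "expected": {"node_count": 3},
}
manifest = merge_manifest_sketch(root_defaults, vector_manifest)
```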
run_vector
run_vector(
    vector_dir: str | Path,
    *,
    vectors_root: str | Path | None = None,
    registry: PluginRegistry | None = None,
) -> binoc.Changeset
Run a single test vector and check its manifest assertions.
vector_dir must be a materialized vector — any .zip.d /
.tar.gz.d / plugin-specific staging directories should already have
been built into real artifacts. See module docstring for how to run
materialization once per session.
Steps:
1. Parse the manifest (with root-manifest defaults).
2. Build a binoc.Config from supported manifest [config] fields.
3. Run binoc.diff() against the snapshots with the config and
optional registry.
4. Check [expected] assertions from the manifest.
Returns the resulting binoc.Changeset.
check_assertions
Verify a changeset against [expected] assertions from a manifest.