
Add Python SDK utilities for benchmarking (similar to fda bench)#5716

Open
Karakatiza666 wants to merge 4 commits into main from python-sdk-bench

Conversation

@Karakatiza666
Contributor

Add benchmarking utilities to the Python SDK (feldera.benchmarking)

The fda bench --upload CLI command collects pipeline performance metrics, formats them as Bencher Metric Format (BMF), and uploads results to a Bencher-compatible server. Until now there was no Python equivalent — users working with Python-based benchmark workloads (e.g. test_tpch.py) had to use the CLI or roll their own polling loop.

This PR adds a feldera/benchmarking.py module that mirrors that functionality.

New public API (all exported from feldera):

  • collect_metrics(pipeline, duration_secs=None) — polls pipeline.stats() in a 1-second loop until pipeline_complete is True or the optional duration elapses. Validates incarnation UUID consistency across samples and raises RuntimeError if any input connector reported errors.
  • BenchmarkMetrics.from_samples(samples) — aggregates the raw snapshots into throughput, peak/min memory, peak/min storage, buffered-input-record statistics, and state amplification ratio.
  • BenchmarkResult — wraps a name, metrics, and timing. Provides to_bmf() (dict), to_json() (pretty-printed BMF string), and format_table() (ASCII table).
  • bench(pipeline, name=None, duration_secs=None) — convenience wrapper that calls collect_metrics and returns a BenchmarkResult.
  • upload_to_bencher(result, project, *, host, token, branch, feldera_client, ...) — POSTs the BMF report to a Bencher-compatible server. Reads BENCHER_API_TOKEN, BENCHER_PROJECT, and BENCHER_HOST from the environment. When feldera_client is provided, enriches the run context with the Feldera instance edition and revision (matching what fda does via get_config()).
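A minimal sketch of the polling contract that collect_metrics is described to implement (a 1-second loop with completion and optional-duration stop conditions). The `collect_samples` helper, the `get_stats` callable, and the dict-shaped snapshots below are illustrative stand-ins, not the actual SDK API:

```python
import time

def collect_samples(get_stats, duration_secs=None, poll_interval=1.0):
    """Poll get_stats() until the snapshot reports pipeline_complete or the
    optional duration elapses; return all collected snapshots."""
    samples = []
    start = time.monotonic()
    while True:
        snapshot = get_stats()
        samples.append(snapshot)
        if snapshot.get("pipeline_complete"):
            break
        if duration_secs is not None and time.monotonic() - start >= duration_secs:
            break
        time.sleep(poll_interval)
    return samples
```

Because the loop only reads stats, the caller keeps full control of the pipeline lifecycle, matching the observation-only design described below.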

Design notes:

  • These utilities are observation-only — they do not start, stop, or otherwise manage pipeline lifetime. Callers retain full control over the pipeline lifecycle, which allows them to configure compilation profile, storage, transactions, etc. before the benchmark window.
  • The metric aggregation logic and BMF structure are a direct translation of crates/fda/src/bench.rs, keeping the two outputs compatible.

@Karakatiza666 Karakatiza666 changed the title from "Add Python SDK utilities for equivalent" to "Add Python SDK utilities for benchmarking (same as fda bench)" on Feb 27, 2026
@Karakatiza666 Karakatiza666 changed the title from "Add Python SDK utilities for benchmarking (same as fda bench)" to "Add Python SDK utilities for benchmarking (similar to fda bench)" on Feb 27, 2026
last = samples[-1]

uptime_s = last.runtime_elapsed_msecs / 1000.0
throughput = int(last.total_processed_records / uptime_s) if uptime_s > 0 else 0
Collaborator

This computes average throughput since pipeline start, not throughput during the measurement window. If the pipeline was running for minutes before collect_metrics was called, this dramatically understates the throughput seen during the benchmark.

The correct formula is delta-based:

first = samples[0]
delta_records = last.total_processed_records - first.total_processed_records
delta_secs = (last.runtime_elapsed_msecs - first.runtime_elapsed_msecs) / 1000.0
throughput = int(delta_records / delta_secs) if delta_secs > 0 else 0

Similarly the state_amplification denominator (input_bytes) is a cumulative total — so it has the same issue when the pipeline was pre-warmed.
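The delta-based approach above can be packaged as a small pure function; the dict-shaped samples and field names mirror the snippet's attributes but are illustrative assumptions, not the PR's actual types:

```python
def window_throughput(first, last):
    """Records/sec over the measurement window, computed from deltas so that
    processing done before the window opened is excluded."""
    delta_records = last["total_processed_records"] - first["total_processed_records"]
    delta_secs = (last["runtime_elapsed_msecs"] - first["runtime_elapsed_msecs"]) / 1000.0
    return int(delta_records / delta_secs) if delta_secs > 0 else 0
```

With a single sample (first == last) the deltas are zero and the function returns 0 instead of dividing by zero.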

Contributor

@gz left a comment

looks good but let's not make this public documentation because it likely has little use for someone using it outside of our org

  • so maybe put it e.g., under the testutils module?
  • we should have at least one test that benchmarks something before we put this in

@gz
Contributor

gz commented Feb 27, 2026

for @abhizer and (@snkas @swanandx): it may be useful to release a utilities Python package on PyPI that contains the testing/qa/benchmarking code that's not core Feldera Python SDK (but depends on SDK functionality)

:header-rows: 1
:widths: 40 60

* - ``fda`` flag
Contributor

I wouldn't explain these as "fda equivalent" args; just document the args.

Signed-off-by: Karakatiza666 <bulakh.96@gmail.com>
Signed-off-by: Heorhii Bulakh <bulakh.96@gmail.com>
@Karakatiza666 Karakatiza666 force-pushed the python-sdk-bench branch 3 times, most recently from f4216b7 to e5cc06d on March 9, 2026 11:26
@snkas
Contributor

snkas commented Mar 9, 2026

Could an explanation be added of what is being benchmarked and why? Is it the Feldera instance itself, or is it about a user pipeline? Is this more like an additional monitoring-service helper that does some regular polling?

Signed-off-by: Heorhii Bulakh <bulakh.96@gmail.com>
@Karakatiza666
Contributor Author

Karakatiza666 commented Mar 9, 2026

@snkas are you talking about this description? Not sure what is ambiguous here?

The :mod:`feldera.benchmarking` module provides utilities to collect and upload
benchmark metrics for Feldera pipelines.  It polls :meth:`.Pipeline.stats` in a
loop, aggregates the snapshots into :class:`.BenchmarkMetrics`, and can
optionally upload a
`Bencher Metric Format (BMF) <https://bencher.dev/docs/reference/test-harnesses/>`_
report to a Bencher-compatible server.

.. note::
   These utilities only **observe** a running pipeline — they do not start,
   stop, or otherwise manage pipeline lifetime.  The caller is responsible for
   starting the pipeline before calling :func:`.bench` or
   :func:`.collect_metrics`, and for stopping it afterwards.

This is not a standalone service; rather, it is designed as SDK utilities for use in monitoring tools, tests, etc.

…time_revision to Python SDK

Signed-off-by: Heorhii Bulakh <bulakh.96@gmail.com>
@Karakatiza666 Karakatiza666 marked this pull request as ready for review March 9, 2026 13:33
@Karakatiza666
Contributor Author

Karakatiza666 commented Mar 9, 2026

I tested this PR privately and found it useful when benchmarking a pipeline during a test.

I can move it under feldera.testutils.benchmarking;

we should have at least one test that benchmarks something before we put this in

A chicken-and-egg problem; I suggest merging under e.g. testutils and proceeding from there. I don't have a strong opinion on whether it needs to be in a separate Python package, but that would be a new artifact (which I did not expect to introduce), and "Feldera Python SDK" implies some useful tools, not just a thin API wrapper.

@snkas
Contributor

snkas commented Mar 9, 2026

It seems to me the equivalent of fda bench would be to have pipeline.benchmark(), what's the motivation to have it be separate?

If the pipeline start and stop are not done by the function itself, the utilities seem to be more about monitoring with a specific end condition rather than benchmarking.

One nice-to-have would be for the end condition to also support a user-defined predicate (e.g., a lambda over the pipeline): for benchmarks that are about completely processing some data, many connectors never report completed (there is no guarantee that they end), yet the benchmark should still finish on a meaningful condition rather than just a timeout.
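The user-defined end condition suggested above could be sketched as a predicate over the stats snapshot; `collect_until` and the snapshot shape are hypothetical illustrations, not part of the PR:

```python
import time

def collect_until(get_stats, done, poll_interval=1.0, timeout_secs=None):
    """Poll get_stats() until done(snapshot) is truthy or the optional
    timeout expires; return the collected snapshots."""
    samples = []
    start = time.monotonic()
    while True:
        snapshot = get_stats()
        samples.append(snapshot)
        if done(snapshot):
            break
        if timeout_secs is not None and time.monotonic() - start >= timeout_secs:
            break
        time.sleep(poll_interval)
    return samples
```

A caller benchmarking "process exactly N records" could then pass `done=lambda s: s["total_processed_records"] >= N` instead of relying on connector completion or a fixed duration.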

@snkas
Contributor

snkas commented Mar 9, 2026

It's generally difficult to capture what "benchmarking" means across pipelines: benchmarking implies capturing how well something performs, which can be subjective. It makes sense to keep it in a separate module or testutils until it's more settled, so that it doesn't become an API that we need to keep backward compatible. In the latter case it might be worthwhile to prefix the functions with _ to indicate they are for internal use.

Collaborator

@mythical-fred left a comment

No tests. 723 lines of new logic with zero test coverage. Functions like _stddev, _human_readable_bytes, BenchmarkMetrics.from_samples, _averaged_metrics, and format_table are pure functions — no pipeline, no infrastructure needed. They should have unit tests before this ships.



def _stddev(values: list[float]) -> float:
"""Population standard deviation."""
Collaborator

_stddev, _human_readable_bytes, BenchmarkMetrics.from_samples, _averaged_metrics, and format_table are all pure functions with no external dependencies. These should have unit tests. Edge cases worth covering: empty sample list, 1-sample list (delta = 0, throughput = 0), runs with mismatched state_amplification = None, formatting with zero bytes, and multi-run stddev correctness.

if edition == "Open source":
    context["bencher.dev/v0/repo/hash"] = (
        "de8879fbda0c9e9392e3b94064c683a1b4bae216"
    )
Collaborator

What are these hardcoded hashes? They look like git commit SHAs but bencher.dev/v0/repo/hash is a permanent identifier — these will be wrong the moment anything changes. If this is an internal Bencher convention, add a comment explaining what they represent and why they're static.
