arrow ipc sdk #5814

Open
monochromatti wants to merge 3 commits into feldera:main from monochromatti:arrow-ipc-sdk

@monochromatti monochromatti commented Mar 13, 2026

Summary

This PR adds Arrow IPC query support to the Python SDK so query results can be fetched directly as pyarrow.Table.

What changed

  • Added a new client API:
    • FelderaClient.query_as_arrow_ipc(...) -> pyarrow.Table
  • Added a pipeline convenience method:
    • Pipeline.query_arrow(...) -> pyarrow.Table
  • Added an optional arrow extra for the Python package:
    • pip install "feldera[arrow]"
  • Updated Python README with installation guidance for Arrow support.
  • Added unit tests covering:
    • non-empty results
    • empty results with schema preservation
    • request parameter wiring
    • missing-pyarrow error path
    • Pipeline.query_arrow delegation

Notes

  • Arrow IPC responses are currently fully buffered in memory before deserialization.
  • Error messaging for missing pyarrow is explicit and points users to feldera[arrow].
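One common way to get an explicit error for a missing optional dependency is to import it lazily and re-raise with an install hint. The sketch below is an assumption about the general pattern, not the PR's actual code; `require_optional` and its parameters are hypothetical names.

```python
import importlib
from types import ModuleType


def require_optional(module_name: str, extra: str, package: str = "feldera") -> ModuleType:
    """Import an optional dependency or fail with an actionable message.

    Hypothetical helper: `extra` is the name of the pip extra that
    provides `module_name` (for example "arrow" for pyarrow).
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            f"{module_name} is required for this feature; "
            f'install it with: pip install "{package}[{extra}]"'
        ) from exc


if __name__ == "__main__":
    try:
        pyarrow = require_optional("pyarrow", extra="arrow")
        print("pyarrow available:", pyarrow.__version__)
    except ImportError as err:
        print(err)
```

Deferring the import to call time keeps the base package usable without pyarrow, and only the Arrow-specific methods pay the dependency cost.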

@monochromatti monochromatti force-pushed the arrow-ipc-sdk branch 2 times, most recently from 4065f37 to edcaa7e Compare March 13, 2026 06:54
Collaborator

@mythical-fred mythical-fred left a comment

LGTM — but see inline: there is an existing open PR covering the same feature.

@@ -1217,6 +1232,51 @@ def query_as_parquet(self, pipeline_name: str, query: str, path: str):
file.write(chunk)

Heads up: PR #4226 ("py: support arrow_ipc format for adhoc queries" by @abhizer) is still open and touches the same files with similar intent. It has been open since June 2025 waiting for @gz to review. You may want to coordinate — either close one in favour of the other, or check whether #4226 has superseded functionality that should be absorbed here.

Contributor

gz commented Mar 13, 2026

Hi @monochromatti, this looks good, thanks a lot for your contribution. @abhizer, can you review this?

@monochromatti
Author

I'd like input on whether to return Generator[pyarrow.RecordBatch, ...] or a pyarrow.Table directly. The latter is the current state of the PR, but after some thinking it feels like generating batches is more in style with similar existing functionality and better suited for big payloads.

Contributor

@abhizer abhizer left a comment

Thank you!

As a heads-up, we didn't merge the prior PR because the server intermittently sent bad data and we were unable to figure out why.

Contributor

abhizer commented Mar 13, 2026

> I'd like input on whether to return Generator[pyarrow.RecordBatch, ...] or a pyarrow.Table directly

We normally return a generator, and it might be a good idea to keep this behavior consistent.

@mihaibudiu
Contributor

@monochromatti please re-request a review from @abhizer when this is ready again

Signed-off-by: Mattias Matthiesen <mattias.matthiesen@eviny.no>
Signed-off-by: Mattias Matthiesen <mattias.matthiesen@eviny.no>
Signed-off-by: Mattias Matthiesen <mattias.matthiesen@eviny.no>