Replace PyTables with Polars/Parquet as primary format #3171
Conversation
… utilities Co-authored-by: MMathisLab <28102185+MMathisLab@users.noreply.github.com>
@copilot I want to keep HDF5 as an optional output, as well as NWB, but use Polars as the main file type
Done! I've updated the implementation to keep HDF5 as the main output file format while adding Polars as an optional performance enhancement. Changes in commit 9c2ec6e:
Usage:

```python
from deeplabcut.utils import fileio

# Standard HDF5 I/O (format unchanged)
fileio.write_hdf_with_polars(df, "poses.h5")

# Optional: use Polars for fast operations
import polars as pl

pl_df = fileio.dataframe_to_polars(df)
filtered = pl_df.filter(pl.col("likelihood") > 0.9)
```
@copilot but I want to remove pytables / tables as a dependency; just use pandas to convert to HDF5 or NWB as needed from polars/parquet
…ytables dependency Co-authored-by: MMathisLab <28102185+MMathisLab@users.noreply.github.com>
Done! I've updated the implementation to use Parquet as the primary format and removed pytables as a required dependency. Changes in commit bbf53f2:
Usage:

```python
from deeplabcut.utils import fileio

# Primary: Parquet format
fileio.write_dataframe(df, "poses.parquet")
df = fileio.read_dataframe("poses.parquet")

# Optional: export to HDF5 (pip install tables)
fileio.write_dataframe(df, "poses.h5", format="hdf5")

# Optional: export to NWB (pip install pynwb tables)
fileio.write_dataframe(df, "poses.nwb", format="nwb")

# Convert existing HDF5 files
fileio.migrate_directory_to_parquet("/path/to/project")
```
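The shape of such a batch conversion can be sketched as a plain loop over legacy files. This is an illustrative standalone helper, not DeepLabCut's `migrate_directory_to_parquet()` itself; note that reading the legacy `.h5` files still requires an installed `tables`, while writing Parquet needs only `pyarrow`:

```python
from pathlib import Path
import pandas as pd

def migrate_h5_tree(root: str) -> list:
    """Convert every .h5 file under root to a sibling .parquet file."""
    converted = []
    for h5_path in Path(root).rglob("*.h5"):
        df = pd.read_hdf(h5_path)  # legacy read path: requires `tables`
        # Parquet needs flat string column names, so flatten any MultiIndex
        if isinstance(df.columns, pd.MultiIndex):
            df.columns = ["/".join(map(str, c)) for c in df.columns]
        df.to_parquet(h5_path.with_suffix(".parquet"))  # requires `pyarrow` only
        converted.append(h5_path)
    return converted
```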
@MMathisLab I think the easiest solution would be to just let pandas handle this dependency. PyTables is included as extras, so I think: would solve the issue. I'll work on PR #3134, which updates our pyproject.toml and allows for using uv; that should smooth out our installation issues.
@deruyter92 but I want to move to polars in general - it's faster, more modern, and better than hdf5 |
Replaces PyTables with Polars/Parquet as the primary file format for DeepLabCut. HDF5 and NWB formats are available as optional exports using pandas conversion (no pytables dependency required).
Changes
Dependencies
- Added `polars>=0.20.0` and `pyarrow>=14.0.0` (required)
- `tables` (PyTables) is now optional, for HDF5/NWB export only
- Updated `requirements.txt`, `setup.py`, and `conda-environments/DEEPLABCUT.yaml`
New Module: `deeplabcut/utils/fileio.py`
- `read_dataframe()` - read Parquet (primary) or HDF5 (legacy) with auto-detection
- `write_dataframe()` - write Parquet (default), with optional HDF5/NWB export
- `convert_to_parquet()` - convert HDF5 files to Parquet
- `migrate_directory_to_parquet()` - batch conversion utility
- `dataframe_to_polars()` - convert a pandas DataFrame to Polars
- `polars_to_dataframe()` - convert a Polars DataFrame to pandas
Core Integration
- Results are written as `.parquet` files
- HDF5/NWB export works with an optional `tables`/`pynwb` install
Documentation & Tests
- `tests/test_fileio.py`
- `PARQUET_MIGRATION_README.md`
Usage
Backward Compatibility
- Existing `.h5` files remain readable
Performance