datafusion.dataframe_formatter¶

HTML formatting utilities for DataFusion DataFrames.

Classes¶

CellFormatter

Protocol for cell value formatters.

DataFrameHtmlFormatter

Configurable HTML formatter for DataFusion DataFrames.

DefaultStyleProvider

Default implementation of StyleProvider.

FormatterManager

Manager class for the global DataFrame HTML formatter instance.

StyleProvider

Protocol for HTML style providers.

Functions¶

_refresh_formatter_reference() → None

Refresh formatter reference in any modules using it.

_validate_bool(value, param_name) → None

Validate that a parameter is a boolean.

_validate_positive_int(value, param_name) → None

Validate that a parameter is a positive integer.

configure_formatter(**kwargs) → None

Configure the global DataFrame HTML formatter.

get_formatter() → DataFrameHtmlFormatter

Get the current global DataFrame HTML formatter.

reset_formatter() → None

Reset the global DataFrame HTML formatter to default settings.

set_formatter(formatter) → None

Set the global DataFrame HTML formatter.

Module Contents¶

class datafusion.dataframe_formatter.CellFormatter¶

Bases: Protocol

Protocol for cell value formatters.

__call__(value: Any) str¶

Format a cell value to string representation.
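As a minimal sketch of a callable that fits this protocol: any function that takes one value and returns a string should qualify. The name `format_float` and the two-decimal precision are illustrative choices, not part of the library.

```python
from typing import Any


def format_float(value: Any) -> str:
    """Render a numeric cell with two decimal places (illustrative choice)."""
    return f"{value:.2f}"


print(format_float(3.14159))
```

A function like this could then be passed to `register_formatter(float, format_float)` on a `DataFrameHtmlFormatter` instance.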

class datafusion.dataframe_formatter.DataFrameHtmlFormatter(max_cell_length: int = 25, max_width: int = 1000, max_height: int = 300, max_memory_bytes: int = 2 * 1024 * 1024, min_rows_display: int = 20, repr_rows: int = 10, enable_cell_expansion: bool = True, custom_css: str | None = None, show_truncation_message: bool = True, style_provider: StyleProvider | None = None, use_shared_styles: bool = True)¶

Configurable HTML formatter for DataFusion DataFrames.

This class handles the HTML rendering of DataFrames for display in Jupyter notebooks and other rich display contexts.

This class supports extension through composition. Key extension points:
  • Provide a custom StyleProvider for styling cells and headers

  • Register custom formatters for specific types

  • Provide custom cell builders for specialized cell rendering

Parameters:
  • max_cell_length – Maximum characters to display in a cell before truncation

  • max_width – Maximum width of the HTML table in pixels

  • max_height – Maximum height of the HTML table in pixels

  • max_memory_bytes – Maximum memory in bytes for rendered data (default: 2MB)

  • min_rows_display – Minimum number of rows to display

  • repr_rows – Default number of rows to display in repr output

  • enable_cell_expansion – Whether to add expand/collapse buttons for long cell values

  • custom_css – Additional CSS to include in the HTML output

  • show_truncation_message – Whether to display a message when data is truncated

  • style_provider – Custom provider for cell and header styles

  • use_shared_styles – Whether to load styles and scripts only once per notebook session

Initialize the HTML formatter.

Parameters:
  • max_cell_length (int, default 25) – Maximum length of cell content before truncation.

  • max_width (int, default 1000) – Maximum width of the displayed table in pixels.

  • max_height (int, default 300) – Maximum height of the displayed table in pixels.

  • max_memory_bytes (int, default 2097152 (2MB)) – Maximum memory in bytes for rendered data.

  • min_rows_display (int, default 20) – Minimum number of rows to display.

  • repr_rows (int, default 10) – Default number of rows to display in repr output.

  • enable_cell_expansion (bool, default True) – Whether to allow cells to expand when clicked.

  • custom_css (str, optional) – Custom CSS to apply to the HTML table.

  • show_truncation_message (bool, default True) – Whether to show a message indicating that content has been truncated.

  • style_provider (StyleProvider, optional) – Provider of CSS styles for the HTML table. If None, DefaultStyleProvider is used.

  • use_shared_styles (bool, default True) – Whether to use shared styles across multiple tables.

Raises:
  • ValueError – If max_cell_length, max_width, max_height, max_memory_bytes, min_rows_display, or repr_rows is not a positive integer.

  • TypeError – If enable_cell_expansion, show_truncation_message, or use_shared_styles is not a boolean, or if custom_css is provided but is not a string, or if style_provider is provided but does not implement the StyleProvider protocol.

_build_expandable_cell(formatted_value: str, row_count: int, col_idx: int, table_uuid: str) str¶

Build an expandable cell for long content.

Build the HTML footer with JavaScript and messages.

_build_html_header() list[str]¶

Build the HTML header with CSS styles.

_build_regular_cell(formatted_value: str) str¶

Build a regular table cell.

_build_table_body(batches: list, table_uuid: str) list[str]¶

Build the HTML table body with data rows.

_build_table_container_start() list[str]¶

Build the opening tags for the table container.

_build_table_header(schema: Any) list[str]¶

Build the HTML table header with column names.

_format_cell_value(value: Any) str¶

Format a cell value for display.

Uses registered type formatters if available.

Parameters:

value – The cell value to format

Returns:

Formatted cell value as string
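The lookup of registered type formatters can be pictured as a type-keyed dispatch. The sketch below is not the library's implementation: `type_formatters`, `format_cell_value`, the `isinstance` matching, and the `str()` fallback are all assumptions made for illustration.

```python
from typing import Any, Callable

# Hypothetical registry mapping a type to its formatter callable.
type_formatters: dict[type, Callable[[Any], str]] = {
    float: lambda v: f"{v:.2f}",
    bytes: lambda v: v.hex(),
}


def format_cell_value(value: Any) -> str:
    """Return the output of the first matching formatter, else str(value)."""
    for type_class, formatter in type_formatters.items():
        if isinstance(value, type_class):
            return formatter(value)
    return str(value)


print(format_cell_value(1.5))      # handled by the float formatter
print(format_cell_value(b"\x01"))  # handled by the bytes formatter
print(format_cell_value("plain"))  # no formatter registered, falls back to str()
```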

_get_cell_value(column: Any, row_idx: int) Any¶

Extract a cell value from a column.

Parameters:
  • column – Arrow array

  • row_idx – Row index

Returns:

The raw cell value

_get_default_css() str¶

Get default CSS styles for the HTML table.

_get_javascript() str¶

Get JavaScript code for interactive elements.

format_html(batches: list, schema: Any, has_more: bool = False, table_uuid: str | None = None) str¶

Format record batches as HTML.

This method is used by DataFrame’s _repr_html_ implementation and can be called directly when custom HTML rendering is needed.

Parameters:
  • batches – List of Arrow RecordBatch objects

  • schema – Arrow Schema object

  • has_more – Whether there are more batches not shown

  • table_uuid – Unique ID for the table, used for JavaScript interactions

Returns:

HTML string representation of the data

Raises:

TypeError – If schema is invalid and no batches are provided

format_str(batches: list, schema: Any, has_more: bool = False, table_uuid: str | None = None) str¶

Format record batches as a string.

This method is used by DataFrame’s __repr__ implementation and can be called directly when string rendering is needed.

Parameters:
  • batches – List of Arrow RecordBatch objects

  • schema – Arrow Schema object

  • has_more – Whether there are more batches not shown

  • table_uuid – Unique ID for the table, used for JavaScript interactions

Returns:

String representation of the data

Raises:

TypeError – If schema is invalid and no batches are provided

register_formatter(type_class: type, formatter: CellFormatter) None¶

Register a custom formatter for a specific data type.

Parameters:
  • type_class – The type to register a formatter for

  • formatter – Function that takes a value of the given type and returns a formatted string

set_custom_cell_builder(builder: collections.abc.Callable[[Any, int, int, str], str]) None¶

Set a custom cell builder function.

Parameters:

builder – Function that takes (value, row, col, table_id) and returns HTML
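A builder with the `(value, row, col, table_id)` shape might look like the sketch below. The function name `build_cell` and the `data-*` attributes are hypothetical. Whether the library passes the raw or the already formatted value is not stated here, so the sketch converts and escapes defensively.

```python
import html
from typing import Any


def build_cell(value: Any, row: int, col: int, table_id: str) -> str:
    """Return one <td> element; the data-* attributes are illustrative."""
    text = html.escape(str(value))
    safe_id = html.escape(table_id)
    return f'<td data-table="{safe_id}" data-row="{row}" data-col="{col}">{text}</td>'


print(build_cell("a < b", 0, 1, "t1"))
```

Such a function would be handed to `set_custom_cell_builder(build_cell)` on a formatter instance.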

set_custom_header_builder(builder: collections.abc.Callable[[Any], str]) None¶

Set a custom header builder function.

Parameters:

builder – Function that takes a field and returns HTML
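A header builder could be sketched as follows. It assumes the field exposes `name` and `type` attributes, as an Arrow schema field does. `build_header` and the `SimpleNamespace` stand-in are hypothetical and serve only to make the example runnable without Arrow installed.

```python
import html
from types import SimpleNamespace
from typing import Any


def build_header(field: Any) -> str:
    """Return one <th> element showing the column name, with its type as a tooltip."""
    name = html.escape(str(field.name))
    dtype = html.escape(str(field.type))
    return f'<th title="{dtype}">{name}</th>'


# Stand-in for a schema field, used only to exercise the function here.
demo_field = SimpleNamespace(name="price", type="double")
print(build_header(demo_field))
```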

_custom_cell_builder: collections.abc.Callable[[Any, int, int, str], str] | None = None¶
_custom_header_builder: collections.abc.Callable[[Any], str] | None = None¶
_type_formatters: dict[type, CellFormatter]¶
custom_css = None¶
enable_cell_expansion = True¶
max_cell_length = 25¶
max_height = 300¶
max_memory_bytes = 2097152¶
max_width = 1000¶
min_rows_display = 20¶
repr_rows = 10¶
show_truncation_message = True¶
style_provider¶
use_shared_styles = True¶

class datafusion.dataframe_formatter.DefaultStyleProvider¶

Default implementation of StyleProvider.

get_cell_style() str¶

Get the CSS style for table cells.

Returns:

CSS style string

get_header_style() str¶

Get the CSS style for header cells.

Returns:

CSS style string

class datafusion.dataframe_formatter.FormatterManager¶

Manager class for the global DataFrame HTML formatter instance.

classmethod get_formatter() DataFrameHtmlFormatter¶

Get the current global DataFrame HTML formatter.

Returns:

The global HTML formatter instance

classmethod set_formatter(formatter: DataFrameHtmlFormatter) None¶

Set the global DataFrame HTML formatter.

Parameters:

formatter – The formatter instance to use globally
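The manager pattern described here, a class-level default that classmethods read and replace, can be sketched in isolation. `Formatter` and `Manager` below are hypothetical stand-ins for `DataFrameHtmlFormatter` and `FormatterManager`; the real classes may differ in detail.

```python
class Formatter:
    """Hypothetical stand-in for DataFrameHtmlFormatter."""

    def __init__(self, max_cell_length: int = 25) -> None:
        self.max_cell_length = max_cell_length


class Manager:
    """Hypothetical stand-in for FormatterManager."""

    # Shared instance stored on the class, so every caller sees the same one.
    _default_formatter = Formatter()

    @classmethod
    def get_formatter(cls) -> Formatter:
        return cls._default_formatter

    @classmethod
    def set_formatter(cls, formatter: Formatter) -> None:
        cls._default_formatter = formatter


Manager.set_formatter(Formatter(max_cell_length=100))
print(Manager.get_formatter().max_cell_length)
```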

_default_formatter: DataFrameHtmlFormatter¶

class datafusion.dataframe_formatter.StyleProvider¶

Bases: Protocol

Protocol for HTML style providers.

get_cell_style() str¶

Get the CSS style for table cells.

get_header_style() str¶

Get the CSS style for header cells.
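A custom provider only needs these two methods. The class below is an illustrative sketch: the name `CompactStyleProvider` and the CSS values are made up. It also assumes that, because `StyleProvider` is a Protocol, a class with matching methods is accepted without inheriting from it.

```python
class CompactStyleProvider:
    """Illustrative provider; matches the protocol structurally."""

    def get_cell_style(self) -> str:
        return "padding: 2px 6px; border: 1px solid #ddd;"

    def get_header_style(self) -> str:
        return "padding: 2px 6px; font-weight: bold; background-color: #f0f0f0;"


provider = CompactStyleProvider()
print(provider.get_header_style())
```

An instance could then be supplied through the `style_provider` parameter of `DataFrameHtmlFormatter`.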

datafusion.dataframe_formatter._refresh_formatter_reference() None¶

Refresh formatter reference in any modules using it.

This helps ensure that changes to the formatter are reflected in existing DataFrames that might be caching the formatter reference.

datafusion.dataframe_formatter._validate_bool(value: Any, param_name: str) None¶

Validate that a parameter is a boolean.

Parameters:
  • value – The value to validate

  • param_name – Name of the parameter (used in error message)

Raises:

TypeError – If the value is not a boolean

datafusion.dataframe_formatter._validate_positive_int(value: Any, param_name: str) None¶

Validate that a parameter is a positive integer.

Parameters:
  • value – The value to validate

  • param_name – Name of the parameter (used in error message)

Raises:

ValueError – If the value is not a positive integer
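The two validators can be approximated as below. This is a sketch of the documented behaviour (a `TypeError` for non-booleans, a `ValueError` for non-positive integers), not the library's source. The function names without a leading underscore and the error message wording are assumptions.

```python
from typing import Any


def validate_positive_int(value: Any, param_name: str) -> None:
    """Raise ValueError unless value is an int greater than zero."""
    if not isinstance(value, int) or value <= 0:
        raise ValueError(f"{param_name} must be a positive integer")


def validate_bool(value: Any, param_name: str) -> None:
    """Raise TypeError unless value is a bool."""
    if not isinstance(value, bool):
        raise TypeError(f"{param_name} must be a boolean")


validate_positive_int(25, "max_cell_length")  # passes silently
try:
    validate_positive_int(0, "max_cell_length")
except ValueError as exc:
    print(exc)
```

Note that in Python `bool` is a subclass of `int`, so a check written this way would let `True` through as a positive integer; a stricter variant would have to exclude booleans explicitly.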

datafusion.dataframe_formatter.configure_formatter(**kwargs: Any) None¶

Configure the global DataFrame HTML formatter.

This function creates a new formatter with the provided configuration and sets it as the global formatter for all DataFrames.

Parameters:

**kwargs – Formatter configuration parameters like max_cell_length, max_width, max_height, enable_cell_expansion, etc.

Raises:

ValueError – If any invalid parameters are provided

Example

>>> from datafusion.dataframe_formatter import configure_formatter
>>> configure_formatter(
...     max_cell_length=50,
...     max_height=500,
...     enable_cell_expansion=True,
...     use_shared_styles=True
... )

datafusion.dataframe_formatter.get_formatter() DataFrameHtmlFormatter¶

Get the current global DataFrame HTML formatter.

This function is used by the DataFrame._repr_html_ implementation to access the shared formatter instance. It can also be used directly when custom HTML rendering is needed.

Returns:

The global HTML formatter instance

Example

>>> from datafusion.dataframe_formatter import get_formatter
>>> formatter = get_formatter()
>>> formatter.max_cell_length = 50  # Increase cell length

datafusion.dataframe_formatter.reset_formatter() None¶

Reset the global DataFrame HTML formatter to default settings.

This function creates a new formatter with default configuration and sets it as the global formatter for all DataFrames.

Example

>>> from datafusion.dataframe_formatter import reset_formatter
>>> reset_formatter()  # Reset formatter to default settings

datafusion.dataframe_formatter.set_formatter(formatter: DataFrameHtmlFormatter) None¶

Set the global DataFrame HTML formatter.

Parameters:

formatter – The formatter instance to use globally

Example

>>> from datafusion.dataframe_formatter import DataFrameHtmlFormatter, set_formatter
>>> custom_formatter = DataFrameHtmlFormatter(max_cell_length=100)
>>> set_formatter(custom_formatter)