Feature Summary
Texera workflows often involve exploring a dataset before applying cleaning, visualization, or analysis operators. A basic column-level summary operator would help users quickly understand the shape and quality of an input table.
This issue proposes adding a Column Summary Statistics workflow operator that takes one input table and outputs one summary row per input column.
Initial output fields:
- columnName
- dataType
- rowCount
- nullCount
- nonNullCount
- minValue
- maxValue
- meanValue
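As a rough illustration of the row shape listed above, here is a minimal Python sketch. It is not Texera's schema API: the `ColumnSummary` class, the string type label, and the choice of nullable floats for the numeric fields are assumptions made only for this example.

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass(frozen=True)
class ColumnSummary:
    """One output row describing a single input column (hypothetical shape)."""
    columnName: str
    dataType: str          # assumed to be a plain type label such as "string"
    rowCount: int
    nullCount: int
    nonNullCount: int
    # Numeric summary fields; None when they do not apply to the column.
    minValue: Optional[float] = None
    maxValue: Optional[float] = None
    meanValue: Optional[float] = None


# Example row for a string column with 2 nulls out of 10 rows.
example = ColumnSummary("city", "string", rowCount=10, nullCount=2, nonNullCount=8)
example_as_dict = asdict(example)
```

Keeping the three numeric fields nullable lets every input column produce a row with the same set of fields, whatever its type.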
Proposed Solution or Design
For the first version:
- Numeric columns should report min, max, and mean in addition to the row, null, and non-null counts.
- Non-numeric columns should report row, null, and non-null counts and leave the numeric summary fields (minValue, maxValue, meanValue) null.
- The operator should follow existing Texera native operator patterns.
- Unit tests should cover numeric columns, non-numeric columns, null values, mixed columns, and empty input.
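The behavior described in the list above could be sketched roughly as follows. This is a standalone Python illustration, not a Texera native operator. The `summarize_columns` function, the `NUMERIC_TYPES` labels, and the dict-based rows are hypothetical. It also assumes two things the issue does not state: the mean is taken over non-null values only, and empty input still yields one row per column with zero counts.

```python
from typing import Any, Dict, List

# Assumed type labels for numeric columns; the real operator would use Texera's own types.
NUMERIC_TYPES = {"integer", "long", "double"}


def summarize_columns(schema: Dict[str, str], rows: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Return one summary dict per column in `schema` (column name -> type label)."""
    summaries = []
    for name, data_type in schema.items():
        values = [row.get(name) for row in rows]
        non_null = [v for v in values if v is not None]
        summary = {
            "columnName": name,
            "dataType": data_type,
            "rowCount": len(values),
            "nullCount": len(values) - len(non_null),
            "nonNullCount": len(non_null),
            "minValue": None,
            "maxValue": None,
            "meanValue": None,
        }
        # Numeric fields are filled only for numeric columns with at least one value.
        if data_type in NUMERIC_TYPES and non_null:
            summary["minValue"] = min(non_null)
            summary["maxValue"] = max(non_null)
            summary["meanValue"] = sum(non_null) / len(non_null)
        summaries.append(summary)
    return summaries


schema = {"age": "integer", "name": "string"}
rows = [
    {"age": 30, "name": "Ann"},
    {"age": None, "name": "Bob"},
    {"age": 40, "name": None},
]
result = summarize_columns(schema, rows)
empty_result = summarize_columns(schema, [])
```

In this sketch a numeric column that contains only nulls also leaves its min, max, and mean as null, which avoids a division by zero when computing the mean.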
This is intended as a focused workflow operator for basic per-column summary statistics.
Affected Area
Workflow Engine (Amber)