Skip to content

feat: Implement data quality field contraint protos#366

Merged
omirandadev merged 6 commits into
masterfrom
feat/field-constraints-data-quality
Jun 17, 2026
Merged

feat: Implement data quality field contraint protos#366
omirandadev merged 6 commits into
masterfrom
feat/field-constraints-data-quality

Conversation

@omirandadev

Copy link
Copy Markdown
Collaborator

What this PR does / why we need it:

Adds typed FieldConstraints and Imputation proto messages plus matching Python/Pydantic surfaces so customers can declare per-column data quality rules and null-imputation strategies on a SparkSource. Constraints are stored on the source's proto and round-trip through the registry. The runtime that enforces them lives downstream in implementing feature engineering jobs that handle them.

Which issue(s) this PR fixes:

Misc

if len(proto.allowed_values) > 0:
kwargs["allowed_values"] = list(proto.allowed_values)
# `regex` is a plain string in proto3; empty == unset.
if proto.regex != "":

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Saw the empty == unset note here, makes sense for proto3. Only nit: the validator still accepts regex="" and to_proto writes it, so you can create a value that silently reads back as None. Could reject empty regex in _regex_compiles to keep the two ends consistent.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice find. I added a check to reject empty regex strings and a unit test that should reject them as well.

allowed_values: Optional[List[str]] = None
regex: Optional[str] = None
unique: Optional[bool] = None
custom: Optional[Dict[str, str]] = None

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Every other field validates at definition time (regex compiles, ranges in [0,1], etc.), but custom predicates are stored as-is, so a bad/empty predicate only fails at the next FE run. Could we reject empty predicate strings here? Maybe a non-empty check would catch the common typo early.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a method _custom_nonempty that checks that the predicate and name are non-empty. Other invalid custom sql expressions will still be accepted by the registry, but we dont have a spark session at the feast apply stage, so basic syntax checks seem like the best we can do for now. The upstream feature engineering job will (likely) have a spark session to perform that check.

@Manisha4 Manisha4 left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@omirandadev omirandadev merged commit b561d08 into master Jun 17, 2026
27 of 29 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants