Skip to content

feat: Updating FeatureViewProjection and OnDemandFeatureView to add batch_source and entities#4530

Merged
HaoXuAI merged 16 commits into
masterfrom
fv-projections
Sep 23, 2024
Merged

feat: Updating FeatureViewProjection and OnDemandFeatureView to add batch_source and entities#4530
HaoXuAI merged 16 commits into
masterfrom
fv-projections

Conversation

@franciscojavierarceo

@franciscojavierarceo franciscojavierarceo commented Sep 17, 2024

Copy link
Copy Markdown
Member

What this PR does / why we need it:

This PR addresses some gaps in the FeatureViewProjection, BaseFeatureView, and OnDemandFeatureview classes. In particular, this PR aims to allow the data source to be included in the BaseFeatureView and all data sources in a FeatureViewProjection. I make the assumption that a single BaseFeatureView can only have one datasource, which is a reasonable assumption.

This structure allows users to express an explicit dependency graph with the BaseFeatureView being ultimately tied to one single, foundational data source which I believe is the right pattern.

Along with storing the datasources, we also store relevant batch_source data in the Projection.

More specificially, this PR update the following:

  1. FeatureViewProjection to include the underlying data sources for OnDemandFeatureView. This allows for much richer lineage when constructing metadata about an OnDemandFeatureViews and its data sources.
  2. OnDemandFeatureView to optionally support entities, which will be required to write an OnDemandFeatureView to the online store
  3. BaseFeatureView to include sources to optionally include bath_source
  4. OnDemandFeatureView to include write_to_online_store, a boolean to be used in a follow-up PR (not used in this PR)

Which issue(s) this PR fixes:

This is a step along the path of solving #4376

Misc

N/A

name_alias=None,
features=base_feature_view.features,
desired_features=[],
timestamp_field=base_feature_view.batch_source.created_timestamp_column # type:ignore[attr-defined]

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is admittedly a hack and I don't like it but this should be refactored in 1.0.0

My overall learning from this is that FeatureViews should be really derived from the same object with the same class parameters and make them optionally instantiated.

This should be described more thoroughly for 1.0.0

@franciscojavierarceo franciscojavierarceo changed the title feat: Updating protos for Projections to include more info feat: Updating OnDemandFeatureView to add Entities and batch_source Sep 18, 2024
@franciscojavierarceo franciscojavierarceo changed the title feat: Updating OnDemandFeatureView to add Entities and batch_source feat: Updating FeatureViewProjection and OnDemandFeatureView to add batch_source and entities Sep 21, 2024
Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
…otobuf

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
…ing to put a workaround

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
@franciscojavierarceo franciscojavierarceo marked this pull request as ready for review September 21, 2024 18:23
@franciscojavierarceo franciscojavierarceo requested review from a team, HaoXuAI, adchia, felixwang9817, shuchu and tokoko and removed request for a team September 21, 2024 18:23
@tokoko

tokoko commented Sep 22, 2024

Copy link
Copy Markdown
Collaborator

A few questions from me:

I make the assumption that a single BaseFeatureView can only have one datasource, which is a reasonable assumption.

I'm not sure this will hold true actually... BatchFeatureViews or whatever we will call them will need to rely on multiple sources for example.

FeatureViewProjection to include the underlying data sources for OnDemandFeatureView. This allows for much richer lineage when constructing metadata about an OnDemandFeatureViews and its data sources.

Can you elaborate on why we need this? My understanding is that FeatureViewProjection objects hold additional info (request-specific modifications) about a specific FeatureView, what's the point of copying some of those field over here when they can already be accessed from FeatureView objects anyway. Feels like unnecessary duplication to me.

OnDemandFeatureView to optionally support entities, which will be required to write an OnDemandFeatureView to the online store

I sort of understand the rationale behind this, but wouldn't it better for us to try to keep OnDemandFeatureViews entity-agnostic? I guess I have concerns similar to FeatureViewProjection ones above. odfv already has dependency on feature views and their entities as a result. Relisting them here feels redundant and would break a single of version of truth. What is a user applies an odfv whose entities do not match the entities of relevant feature views?

My idea about a odfv caching was not to make them retrievable by any set of entities, but rather by some sort of key that would hold information about all the input fields.

@franciscojavierarceo

Copy link
Copy Markdown
Member Author

ODFVs today can behave the old way by not using the optional forthcoming optional parameter that configures writes. The benefit of storing the data here for batch will be the backfilling and seeing the full lineage of the data sources that's used to compute the ODFV.

In a simple way:

ODFV = f(Data source 1, data source 2, feature view 3)

In practice, combinations like this may occur and we get this flexibility and lineage. Having the write option basically acts as a precomputation option for faster retrieval.

@franciscojavierarceo

franciscojavierarceo commented Sep 22, 2024

Copy link
Copy Markdown
Member Author

I sort of understand the rationale behind this, but wouldn't it better for us to try to keep OnDemandFeatureViews entity-agnostic? I guess I have concerns similar to FeatureViewProjection ones above. odfv already has dependency on feature views and their entities as a result.

In order to write an ODFV that can be stored in the online store for retrieval, you need an entity.

This is also important for backfilling.

Relisting them here feels redundant and would break a single of version of truth. What if a user applies an odfv whose entities do not match the entities of relevant feature views?

Not redundant as things can change at different points in time.

@HaoXuAI

HaoXuAI commented Sep 22, 2024

Copy link
Copy Markdown
Collaborator

Instead of directly updating the BaseFeatureView with DataSource, is it possible to change OndemandFeatureView in inherit from FeatureView which has the source attribute already?
I'm not saying this is a better idea since all types of concrete feature views have the source attribute. Just in case updating the BaseFeatureView might cause breaking changes

@franciscojavierarceo

Copy link
Copy Markdown
Member Author

It won't be a breaking change as it's optional. I do think we should revisit all of these for free 1.0.0 release and make StreamFeatureView, FeatureView, and OnDemandFeatureView all inherit from BaseFeatureView with a fixed set of parameters and make things much cleaner. They have lots of duplicate code all over the place and it's very unnecessary...but I'd rather do that full cleanup once we have this functionality done. Wdyt?

@HaoXuAI HaoXuAI left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@HaoXuAI HaoXuAI merged commit 0795496 into master Sep 23, 2024
@tokoko

tokoko commented Sep 23, 2024

Copy link
Copy Markdown
Collaborator

Sorry, couldn't follow up sooner.

Not redundant as things can change at different points in time.

I think that's precisely my point. we are allowing essentially same concepts to be defined twice in unrelated places that can easily diverge from one another.

For example, after this PR what's the difference between entities field of an odfvs vs retrieving the list of entities from underlying feature views:

odfv = store.get_on_demand_feature_view('transformed_conv_rate')

entities_1 = odfv.entities
entities_2 = [store.registry.get_any_feature_view(fvp.name).entities for fvp in odfv.source_feature_view_projections]

Is there any scenario where it makes sense for entities_1 and entities_2 not to be exactly the same? If not, why do we need them defined twice?

P.S. the same applied to the FeatureViewProjection. Those fields can easily be retrieved from the underlying FeatureView imho.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants