feat: Updating FeatureViewProjection and OnDemandFeatureView to add batch_source and entities#4530
Conversation
| name_alias=None, | ||
| features=base_feature_view.features, | ||
| desired_features=[], | ||
| timestamp_field=base_feature_view.batch_source.created_timestamp_column # type:ignore[attr-defined] |
There was a problem hiding this comment.
This is admittedly a hack and I don't like it but this should be refactored in 1.0.0
My overall learning from this is that FeatureViews should be really derived from the same object with the same class parameters and make them optionally instantiated.
This should be described more thoroughly for 1.0.0
Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
…otobuf Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
…ing to put a workaround Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
f50923f to
cf86862
Compare
|
A few questions from me:
I'm not sure this will hold true actually... BatchFeatureViews or whatever we will call them will need to rely on multiple sources for example.
Can you elaborate on why we need this? My understanding is that
I sort of understand the rationale behind this, but wouldn't it better for us to try to keep OnDemandFeatureViews entity-agnostic? I guess I have concerns similar to FeatureViewProjection ones above. odfv already has dependency on feature views and their entities as a result. Relisting them here feels redundant and would break a single of version of truth. What is a user applies an odfv whose entities do not match the entities of relevant feature views? My idea about a odfv caching was not to make them retrievable by any set of entities, but rather by some sort of key that would hold information about all the input fields. |
|
ODFVs today can behave the old way by not using the optional forthcoming optional parameter that configures writes. The benefit of storing the data here for batch will be the backfilling and seeing the full lineage of the data sources that's used to compute the ODFV. In a simple way: ODFV = f(Data source 1, data source 2, feature view 3) In practice, combinations like this may occur and we get this flexibility and lineage. Having the write option basically acts as a precomputation option for faster retrieval. |
In order to write an ODFV that can be stored in the online store for retrieval, you need an entity. This is also important for backfilling.
Not redundant as things can change at different points in time. |
|
Instead of directly updating the |
|
It won't be a breaking change as it's optional. I do think we should revisit all of these for free 1.0.0 release and make StreamFeatureView, FeatureView, and OnDemandFeatureView all inherit from BaseFeatureView with a fixed set of parameters and make things much cleaner. They have lots of duplicate code all over the place and it's very unnecessary...but I'd rather do that full cleanup once we have this functionality done. Wdyt? |
|
Sorry, couldn't follow up sooner.
I think that's precisely my point. we are allowing essentially same concepts to be defined twice in unrelated places that can easily diverge from one another. For example, after this PR what's the difference between Is there any scenario where it makes sense for P.S. the same applied to the |
What this PR does / why we need it:
This PR addresses some gaps in the
FeatureViewProjection,BaseFeatureView, andOnDemandFeatureviewclasses. In particular, this PR aims to allow the data source to be included in theBaseFeatureViewand all data sources in aFeatureViewProjection. I make the assumption that a singleBaseFeatureViewcan only have one datasource, which is a reasonable assumption.This structure allows users to express an explicit dependency graph with the
BaseFeatureViewbeing ultimately tied to one single, foundational data source which I believe is the right pattern.Along with storing the datasources, we also store relevant batch_source data in the Projection.
More specificially, this PR update the following:
FeatureViewProjectionto include the underlying data sources forOnDemandFeatureView. This allows for much richer lineage when constructing metadata about anOnDemandFeatureViews and its data sources.OnDemandFeatureViewto optionally supportentities, which will be required to write anOnDemandFeatureViewto the online storeBaseFeatureViewto include sources to optionally includebath_sourceOnDemandFeatureViewto includewrite_to_online_store, a boolean to be used in a follow-up PR (not used in this PR)Which issue(s) this PR fixes:
This is a step along the path of solving #4376
Misc
N/A