feat: Valkey Online Write Batch Vector Search Support by Manisha4 · Pull Request #351 · ExpediaGroup/feast

Manisha4 · 2026-03-31T17:31:20Z

What this PR does / why we need it:

                                                                                                                                                                                                                                                                                                                                                                                                                                 Adds vector embedding storage and indexing support to the Valkey online store, enabling vector similarity search capabilities in Feast (Phase 1 - Write Support).

Changes

Implementation (eg_valkey.py):

Add vector field detection via field.vector_index property
Store vector fields with original field names + raw numpy bytes (required for Valkey Search FT.SEARCH)
Store non-vector fields with mmh3 hash + protobuf (existing behavior preserved)
Automatic vector index creation (FT.CREATE) on first write when vector fields are present
Support for FLAT and HNSW index algorithms with configurable parameters
Support for Float32 and Float64 vector types
Vector dimension validation against field.vector_length

Configuration (EGValkeyOnlineStoreConfig):

vector_index_algorithm: FLAT or HNSW (default: HNSW)
vector_index_hnsw_m: Max outgoing edges per node (default: 16)
vector_index_hnsw_ef_construction: Index build quality (default: 200)
vector_index_hnsw_ef_runtime: Search quality (default: 10)

Design Decisions

Vector support for regular FeatureViews only (not SortedFeatureView, consistent with Milvus/ES)
Single vector per FeatureView (matches current Feast API constraint)
Backward compatible: non-vector feature views work unchanged

…batch functionality

vanitabhagwat · 2026-03-31T19:12:26Z

+        feast_dtype: Feast data type (e.g., Array(Float32))
+
+    Returns:
+        Valkey vector type string: "FLOAT32" or "FLOAT64"


Is float64 a supported vector type in valkey?

yes it is supported

This document says, only float32 is supported. A'm I missing something?
https://valkey.io/topics/search-data-formats/

You're right, I looked at data formats not relating to Search by mistake. Thanks!!

vanitabhagwat · 2026-03-31T20:39:18Z

+        index_name = _get_vector_index_name(config.project, table.name)
+
+        # Check if index exists
+        try:


If two processes call online_write_batch concurrently for the same feature view, both could see the index as non-existent and both attempt FT.CREATE, with one failing.

Will catch the exception like the other DBs

vanitabhagwat · 2026-03-31T20:58:31Z

+
+            if feature_name in vector_field_names:
+                # Vector field: deserialize from raw bytes
+                field = next(f for f in feature_view.features if f.name == feature_name)


This is O(n) field lookup per vector feature per entity. Can we try a dict lookup?

vanitabhagwat · 2026-03-31T21:15:01Z

+            schema_fields.append(VectorField(field_name, algorithm, attributes))
+
+        # Define index on HASH keys with specific prefix
+        key_prefix = _redis_key_prefix(table.join_keys)


_redis_key_prefix(table.join_keys) produces a prefix based only on the join key names (e.g., ["item_id"]), without including the project. But _redis_key() builds the actual HASH keys as
<serialized_entity_key>, with the project appended as a suffix.

This means the IndexDefinition prefix will match HASH keys from all projects that share the same join key names. If two projects (e.g., prod and staging) both have a feature view with join_keys=["item_id"], the vector index created for prod will also index staging's keys, and vice versa.
Im not sure how the solution would look like for this. Maybe use a filter expression or something when building the search to scope the results to correct project?

Yes, I added a TODO comment for now but will address this while implementing the Search part.

vanitabhagwat

I think we need to clean up the vector indexes somewhere. Maybe in delete_table and call it in teardown like we do for hashes.

vanitabhagwat · 2026-04-02T19:22:25Z

                if len(_tables) == 1:
                    pipe.delete(_k)
                else:
                    pipe.hdel(_k, *valkey_hash_keys)


Since we are using the original field names for vector fields, *pipe.hdel(_k, valkey_hash_keys) wont delete the vector fields. I
n case of vector fields, you need to have the original vector names added to valkey_hash_keys.

Thanks, made the fix!

piket · 2026-04-02T20:13:38Z

+def _get_vector_index_name(project: str, feature_view_name: str) -> str:
+    """Generate Valkey Search index name for vector fields."""
+    return f"{project}_{feature_view_name}_vidx"
+


Can we add the feature name into the index name? Looking ahead at multi-vector we are going to run into issues with just project + fv name.

piket

LGTM

* Adding support for Valkey Search, adding changes to the online_write_batch functionality * Addressing PR comments * addressing linting error * fix tests * addressing PR comments * addressing PR comments * fixing linting --------- Co-authored-by: Manisha4 <Manisha4@github.com>

* feat: Valkey Online Write Batch Vector Search Support (#351) * Adding support for Valkey Search, adding changes to the online_write_batch functionality * Addressing PR comments * addressing linting error * fix tests * addressing PR comments * addressing PR comments * fixing linting --------- Co-authored-by: Manisha4 <Manisha4@github.com> * feat: Support Vector Search in Valkey (#354) * Adding support for Valkey Search, adding changes to the online_write_batch functionality * Addressing PR comments * addressing linting error * Adding changes to support search in valkey * fix tests * adding unit tests * reformatting files and adding checks and more tests * reformatting files and adding checks and more tests * reformatting files and adding checks and more tests * Fix linter errors: type annotations and code formatting - Add explicit type annotation for schema_fields to support both TagField and VectorField - Encode project string to bytes for consistency with other hash values - Decode doc_key bytes to string for hmget compatibility - Fix code formatting: break long lines and remove extra blank lines - Remove tests for multiple vector fields (Feast enforces one vector per feature view) - Fix config type: use 'eg-valkey' (hyphen) not 'eg_valkey' (underscore) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * addressing PR comments * addressing PR comments * fixing linting * Fix missing feature_name argument in retrieve_online_documents_v2 Add the third argument (vector_field.name) to _get_vector_index_name call to match the updated function signature. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * addressing comments, PR changes for some fixes and merge conflicts * fixing tests * fixing tests * fixing linting * fixing linting --------- Co-authored-by: Manisha4 <Manisha4@github.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> * fix: Valkey vector search - remove unsupported SORTBY (#356) * fix: Valkey vector search - remove unsupported SORTBY and fix tag filter syntax Valkey Search KNN queries return results pre-sorted by distance, so explicit SORTBY is not supported and causes a ResponseError. This removes the .sort_by() call from the query builder. Additionally, fixes the project tag filter to use unquoted syntax with backslash escaping for special characters (e.g. hyphens, dots) instead of the quoted syntax which was returning empty results. Updates unit tests to reflect both changes: replaces three metric-specific sort order tests with a single test asserting no SORTBY is set, and updates escaping assertions to match the new backslash-escape approach. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * style: apply ruff format to eg_valkey.py and test_valkey.py Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: Manisha4 <Manisha4@github.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: Manisha4 <Manisha4@github.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* fix: Updated the elastic search to use vector_index defined in the feature view to identify vector fields (#348) * updated the elastic search to use vector_index defined in the feature view to identify vector fields * fix: formatting * Added logging and switched to use open source elastic search --------- Co-authored-by: vanitabhagwat <vbhagwat@expediagroup.com> * fix: ES integration tests (#350) * fix: ES integration tests * fix: Added fromisoformat() for converting timestamps --------- Co-authored-by: vanitabhagwat <vbhagwat@expediagroup.com> * fix:Elasticsearch online store — correctness, performance, and robustness fixes (#353) Co-authored-by: vanitabhagwat <vbhagwat@expediagroup.com> * Feature/vector store (#357) * feat: Valkey Online Write Batch Vector Search Support (#351) * Adding support for Valkey Search, adding changes to the online_write_batch functionality * Addressing PR comments * addressing linting error * fix tests * addressing PR comments * addressing PR comments * fixing linting --------- Co-authored-by: Manisha4 <Manisha4@github.com> * feat: Support Vector Search in Valkey (#354) * Adding support for Valkey Search, adding changes to the online_write_batch functionality * Addressing PR comments * addressing linting error * Adding changes to support search in valkey * fix tests * adding unit tests * reformatting files and adding checks and more tests * reformatting files and adding checks and more tests * reformatting files and adding checks and more tests * Fix linter errors: type annotations and code formatting - Add explicit type annotation for schema_fields to support both TagField and VectorField - Encode project string to bytes for consistency with other hash values - Decode doc_key bytes to string for hmget compatibility - Fix code formatting: break long lines and remove extra blank lines - Remove tests for multiple vector fields (Feast enforces one vector per feature view) - Fix config type: use 'eg-valkey' (hyphen) not 'eg_valkey' (underscore) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * addressing PR comments * addressing PR comments * fixing linting * Fix missing feature_name argument in retrieve_online_documents_v2 Add the third argument (vector_field.name) to _get_vector_index_name call to match the updated function signature. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * addressing comments, PR changes for some fixes and merge conflicts * fixing tests * fixing tests * fixing linting * fixing linting --------- Co-authored-by: Manisha4 <Manisha4@github.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> * fix: Valkey vector search - remove unsupported SORTBY (#356) * fix: Valkey vector search - remove unsupported SORTBY and fix tag filter syntax Valkey Search KNN queries return results pre-sorted by distance, so explicit SORTBY is not supported and causes a ResponseError. This removes the .sort_by() call from the query builder. Additionally, fixes the project tag filter to use unquoted syntax with backslash escaping for special characters (e.g. hyphens, dots) instead of the quoted syntax which was returning empty results. Updates unit tests to reflect both changes: replaces three metric-specific sort order tests with a single test asserting no SORTBY is set, and updates escaping assertions to match the new backslash-escape approach. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * style: apply ruff format to eg_valkey.py and test_valkey.py Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: Manisha4 <Manisha4@github.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: Manisha4 <Manisha4@github.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> * feat: Quantization support for elastic search (#355) * feat: Add retrieve_online_documents_v3 SDK with multi-vector and hybrid fusion (#358) * feat: implement retrieve_online_documents_v3 SDK method - Multi-vector search with configurable fusion (RRF, WEIGHTED_LINEAR, VECTOR_ONLY) via the ES retriever API. Valkey gracefully degrades to single-vector KNN with warnings. - "embedding" magic key for V2→V3 migration convenience - Reserved output fields: final_score, signal_scores - include_signal_scores and distance_metric accepted as reserved params - ODFV and reserved-name collision validation - Shared signal_scores encoding via _signal_scores helper * update tests * update tests * fixing linting * docs: clarify final_score semantics in Valkey V3 docstring Correct the Valkey final_score description — Valkey's __distance__ is lower-is-better across all metrics (COSINE, L2, IP), not higher-is-better for IP. Call out the ordering inversion vs Elasticsearch so callers don't assume cross-backend score portability. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * fix: plumb include_signal_scores through V3 and align defaults to False Valkey/ES/provider/passthrough/online-store defaults were True, mismatching the SDK's False default. Align all layers on False and thread the parameter from retrieve_online_documents_v3 through the internal dispatcher, provider, and online stores so callers can opt in today and transparently pick up the explain-based per-signal path when it lands — no API change required. Tighten docstrings to describe the current best-effort behavior instead of hinting at latency tradeoffs that aren't wired yet. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * updating doc string * fix: preserve ranked row order in V3 retrieve_online_documents _retrieve_from_online_store_v3 was passing the driver's ranked rows through _get_unique_entities_from_values, which sorts and dedupes by entity-key bytes. That helper is correct for batch entity lookups but wrong here — ES/Valkey have already ordered rows by relevance, and the sort was scrambling them in the final DataFrame (e.g. doc_10 jumping ahead of doc_3 because "10" < "3" lexicographically). Replace the helper call with an identity mapping so the driver's rank order flows through untouched. No change to V1, V2, batch reads, or the helper itself; V3 output now matches the order returned by the online store. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> --------- Co-authored-by: Manisha4 <Manisha4@github.com> Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com> * fix: apply rescore_oversample to V3 ES kNN retrievers (#363) V3's retriever construction was silently ignoring rescore_oversample. V2 honored it (lines 483-486, 617-620) but V3 never added rescore_vector to its kNN clauses. On quantized indices (int8_hnsw / int4_hnsw / bbq_hnsw), this meant V3 queries returned lower recall than the config promised, with no error or warning. Wire rescore_oversample into each kNN retriever the same way V2 does. Covers single-vector and multi-vector V3 queries; BM25 retrievers skip the branch since they lack a "knn" key. Existing config validation (lines 102-127) already prevents rescore on non-quantized indices, so no new validation needed. Added three unit tests in TestRetrieveOnlineDocumentsV3QueryBuilding: - rescore_vector appears in single-vector query body when configured - rescore_vector appears on every kNN retriever in multi-vector query - rescore_vector absent when rescore_oversample is None Co-authored-by: Manisha4 <Manisha4@github.com> Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com> --------- Co-authored-by: vanitabhagwat <92561664+vanitabhagwat@users.noreply.github.com> Co-authored-by: vanitabhagwat <vbhagwat@expediagroup.com> Co-authored-by: Manisha4 <Manisha4@github.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

Adding support for Valkey Search, adding changes to the online_write_…

4f04a75

…batch functionality

vanitabhagwat reviewed Mar 31, 2026

View reviewed changes

Manisha4 added 3 commits April 1, 2026 23:45

Addressing PR comments

c6512cc

addressing linting error

59c7753

fix tests

0b7ad6c

vanitabhagwat reviewed Apr 2, 2026

View reviewed changes

piket reviewed Apr 2, 2026

View reviewed changes

Manisha4 added 3 commits April 2, 2026 22:28

addressing PR comments

ad3987a

addressing PR comments

24df6ab

fixing linting

ba1b6d7

piket approved these changes Apr 3, 2026

View reviewed changes

Manisha4 merged commit bd19925 into feature/vector-store Apr 6, 2026
28 of 29 checks passed

Manisha4 deleted the add/valkey-search-support-writes branch April 6, 2026 23:34

Conversation

Manisha4 commented Mar 31, 2026

What this PR does / why we need it:

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

vanitabhagwat left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

piket left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants