Some of our users work with deeply nested data: they perform vector search on some nesting level, but are actually interested in retrieving the root-level documents.
In memory this can be solved by traversing the nested structure on the fly, but with a database backend this is not feasible: nested levels are only present in serialized form, so one would have to load everything into memory in order to traverse the structure.
To tackle this, we propose the following:
- We create a function `get_root_doc(da, doc)` that returns the root document of `doc`. The implementation could be something similar to this:
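    def get_root_doc(da, doc):
        # Flatten all nesting levels so that any document can be looked up by id
        root_da_flat = da[...]
        result = doc
        # Follow parent_id links upward until a document without a parent is reached
        while result.parent_id:
            result = root_da_flat[result.parent_id]
        return result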
- For storage backends, we expose an API that lets you search on some nesting level but retrieve documents on the root level: `da.find(..., return_root=True)`
- It works as follows:
  - When inserting a (batch of) Document(s), it calls `get_root_doc()` on each of them.
  - It stores the root document's `id` as a separate column in the database.
  - When searching with `return_root=True`, it performs the search, then takes each result's stored `root_id` and returns the corresponding root document.
- The level the user searches on needs to exist as a subindex (this is already the case), and the root level is always properly indexed anyway. The intermediate nesting levels can stay serialized.
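
To make the flow above concrete, here is a minimal, self-contained sketch. It is not the actual backend implementation: the `Doc` class, the `docs` table, and the helpers `flatten`, `get_root_id`, `insert_batch` and `find` are all hypothetical stand-ins, SQLite plays the role of the storage backend, and a brute-force distance scan replaces the real vector search on a subindex. It only illustrates the idea of resolving the root once at insert time, storing it in a `root_id` column, and reading that column back at search time.

```python
import json
import sqlite3
from dataclasses import dataclass, field
from typing import Dict, Iterable, Iterator, List, Optional


@dataclass
class Doc:
    """Hypothetical stand-in for a Document; only the fields this sketch needs."""
    id: str
    embedding: Optional[List[float]] = None
    parent_id: Optional[str] = None
    chunks: List["Doc"] = field(default_factory=list)


def flatten(docs: Iterable[Doc]) -> Iterator[Doc]:
    """Yield documents from every nesting level, filling in parent_id on the way."""
    for doc in docs:
        yield doc
        for chunk in doc.chunks:
            chunk.parent_id = doc.id
        yield from flatten(doc.chunks)


def get_root_id(by_id: Dict[str, Doc], doc: Doc) -> str:
    """Follow parent_id links upward and return the id of the root document."""
    while doc.parent_id:
        doc = by_id[doc.parent_id]
    return doc.id


def insert_batch(conn: sqlite3.Connection, docs: List[Doc]) -> None:
    """Resolve each document's root once, at insert time, and store it as a column."""
    by_id = {d.id: d for d in flatten(docs)}
    conn.executemany(
        "INSERT INTO docs (id, root_id, embedding) VALUES (?, ?, ?)",
        [(d.id, get_root_id(by_id, d), json.dumps(d.embedding)) for d in by_id.values()],
    )


def find(conn: sqlite3.Connection, query: List[float], return_root: bool = False) -> Optional[str]:
    """Brute-force nearest neighbour; returns the match's id, or its stored root_id."""
    best, best_dist = None, float("inf")
    for doc_id, root_id, emb_json in conn.execute("SELECT id, root_id, embedding FROM docs"):
        emb = json.loads(emb_json)
        if emb is None:  # documents without an embedding are not searchable
            continue
        dist = sum((a - b) ** 2 for a, b in zip(emb, query))
        if dist < best_dist:
            best, best_dist = (doc_id, root_id), dist
    if best is None:
        return None
    return best[1] if return_root else best[0]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (id TEXT PRIMARY KEY, root_id TEXT, embedding TEXT)")

# Two roots, each with an embedded leaf two levels down
root = Doc("root", chunks=[Doc("mid", chunks=[Doc("leaf", embedding=[1.0, 0.0])])])
other = Doc("other", chunks=[Doc("other-leaf", embedding=[0.0, 1.0])])
insert_batch(conn, [root, other])

print(find(conn, [0.9, 0.1]))                    # leaf
print(find(conn, [0.9, 0.1], return_root=True))  # root
```

Because the root is resolved while the batch is still in memory, the search path never has to deserialize the intermediate levels: it only reads the `root_id` column of the matched row.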