fix: allow doclist to have nested optional document#1472
Conversation
Signed-off-by: samsja <sami.jaghouar@hotmail.fr>
Signed-off-by: samsja <sami.jaghouar@hotmail.fr>
Signed-off-by: samsja <sami.jaghouar@hotmail.fr>
Signed-off-by: samsja <sami.jaghouar@hotmail.fr>
Signed-off-by: samsja <sami.jaghouar@hotmail.fr>
Signed-off-by: samsja <sami.jaghouar@hotmail.fr>
Signed-off-by: samsja <sami.jaghouar@hotmail.fr>
Signed-off-by: samsja <sami.jaghouar@hotmail.fr>
Signed-off-by: samsja <sami.jaghouar@hotmail.fr>
Signed-off-by: samsja <sami.jaghouar@hotmail.fr>
is this breaking change necessary? can't we just keep nan values in the respective backend format? |
Hmm not it is none. But this is a way harder story to tell if we have different behavior for Optional[Tensor] or Optional[BaseDoc]. I would rather have the same behavior. Plus I think this one is way more optimized and solve benchmark problem we had before wit pytorch. Because we don't carry long none value anymore |
Co-authored-by: Johannes Messner <44071807+JohannesMessner@users.noreply.github.com> Signed-off-by: samsja <55492238+samsja@users.noreply.github.com>
Signed-off-by: samsja <sami.jaghouar@hotmail.fr>
…into fix-docarray-optional
Co-authored-by: Johannes Messner <44071807+JohannesMessner@users.noreply.github.com> Signed-off-by: samsja <55492238+samsja@users.noreply.github.com>
Signed-off-by: samsja <sami.jaghouar@hotmail.fr>
Co-authored-by: Johannes Messner <44071807+JohannesMessner@users.noreply.github.com> Signed-off-by: samsja <55492238+samsja@users.noreply.github.com>
Signed-off-by: samsja <sami.jaghouar@hotmail.fr>
…into fix-docarray-optional
Signed-off-by: samsja <sami.jaghouar@hotmail.fr>
Signed-off-by: samsja <sami.jaghouar@hotmail.fr>
JohannesMessner
left a comment
There was a problem hiding this comment.
Sorry to be pain in the butt, a lot of suggestions :D
But honestly it seems like the entire doc section could be condensed down to a few sentences:
- Nested optionals are a bit tricky, watch out
- For doclist the return will be a
DocListinstead ofDocList[...] - For docvec the return will be
DocList[...]if all documents are there, orNoneif all docs are None. No mix of docs and None allowed!
I would lead with these "rules", I think they are most important to the user. And after that we can go into an explanation of why that is, maybe even hidden away behind a collapsed box.
Otherwise I think it is difficult for the users to tease out what is the important information
| !!! note | ||
| This whole section is specific for nested BaseDoc, DocList and DocVec. |
There was a problem hiding this comment.
If this is the case then the heading for this should be different, something like "dealing with nested, optional fields" or something like that
| assert docs.image == [ImageDoc(tensor=np.ones((3, 224, 224))), None] | ||
| ``` | ||
|
|
||
| but for DocVec it is a bit different. Indeed, DocVec store the data for each filed as contiguous column. |
There was a problem hiding this comment.
What does the fact that it stores it as a column to do with it not being a DocList? That column could in theory be a DocList, right?
There was a problem hiding this comment.
well not if some document are none ?
|
|
||
|
|
||
| ```python | ||
| from docarray import DocVec |
There was a problem hiding this comment.
maybe use tabs for these examples?
There was a problem hiding this comment.
I split it a bit differently now
sure I will follow your suggestion |
Co-authored-by: Johannes Messner <44071807+JohannesMessner@users.noreply.github.com> Signed-off-by: samsja <55492238+samsja@users.noreply.github.com>
Signed-off-by: samsja <sami.jaghouar@hotmail.fr>
Signed-off-by: samsja <sami.jaghouar@hotmail.fr>
…nto fix-docarray-optional
| or extend your data. | ||
|
|
||
|
|
||
| ## Dealing with Optional fields |
There was a problem hiding this comment.
| ## Dealing with Optional fields | |
| ## Dealing with nested Optional fields |
Everything below only applies to the nested case, right?
|
|
||
| For nested BaseDoc | ||
| * DocList will return a list of document if the field is optional and a DocList if the field is not optional | ||
| * DocVec will return a DocList if all documents are there, or None if all docs are None. No mix of docs and None allowed! |
There was a problem hiding this comment.
Shouldn't DocVec return a DocVec? Am i remembering correctly that we wanted to do that but haven't implemented it yet?
Co-authored-by: Johannes Messner <44071807+JohannesMessner@users.noreply.github.com> Signed-off-by: samsja <55492238+samsja@users.noreply.github.com>
Signed-off-by: samsja <sami.jaghouar@hotmail.fr>
|
📝 Docs are deployed on https://ft-fix-docarray-optional--jina-docs.netlify.app 🎉 |
Context
This pr fix #1468
What this PR do
This pr do two things related to Optional field
DocList support for Optional field
Before calling
docs.imagewould raise an error because it will try to return aDoctList[ImageDoc]but the None value cannot he pass into such a DocList.This PR change this behavior (this is a breaking change) and now return a list of the collected object. It can contain None or a ImageDoc.
DocVec support for Optional field
before you could not construct such a DocVec because you the doc are set to None.
We enforce as well the following behavior: if one field is None all of the other field must be None. An error will be raise at init time if this is not the case.
This pr allows this. Instead of storing all of the None it will just store None for this column.
This behavior is uniformized for the tensor field as well.
This is a breaking change. Before we were casting the None value into Nan value of each of the tensor backends and we would store the whole column of Nan value.
Now this is different we just store the column as None.
Example usage