Release Note
This release contains 4 new features, 11 bug fixes, and several documentation improvements.
💥 Breaking changes
Return type of DocVec Optional Tensor (#1472)
Optional tensor fields in a DocVec now return None instead of a sequence of NaN values when the column does not hold any tensor.
This code snippet shows the breaking change:
from typing import Optional
from docarray import BaseDoc, DocVec
from docarray.typing import NdArray
class MyDoc(BaseDoc):
    tensor: Optional[NdArray[10]]
# Neither document sets `tensor`, so the column holds no tensor
docs = DocVec[MyDoc]([MyDoc() for _ in range(2)])
print(docs.tensor)
| Version | Return type |
| --- | --- |
| 0.30.0 | [nan nan] |
| 0.31.0 | None |
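Code that used to detect an empty optional column by checking for NaN values now needs to check for None. The following is a minimal migration sketch. `column_is_empty` is a hypothetical helper and not part of DocArray. It accepts both the 0.30.0 representation (one NaN per document) and the 0.31.0 representation (None).

```python
import math
from typing import Any, Optional


def column_is_empty(column: Optional[Any]) -> bool:
    """Hypothetical helper: True if an optional tensor column holds no tensor.

    Accepts both the 0.31.0 representation (None) and the 0.30.0 one
    (one NaN value per document).
    """
    if column is None:  # 0.31.0 behavior
        return True
    # 0.30.0 behavior: every entry is a NaN float (NumPy float64 is a float subclass)
    return all(isinstance(v, float) and math.isnan(v) for v in column)


print(column_is_empty(None))                          # True
print(column_is_empty([float("nan"), float("nan")]))  # True
print(column_is_empty([[0.0, 1.0], [2.0, 3.0]]))      # False: rows are real tensors
```

Once all callers run on 0.31.0 or later, a plain `docs.tensor is None` check is enough.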
🆕 Features
Add InMemoryDocIndex (#1441)
This version introduces InMemoryDocIndex, a Document Index that performs exact vector search in memory (as opposed to the approximate nearest neighbor search used by vector databases).
InMemoryDocIndex is intended for prototyping and for small collections of documents (roughly 1k to 10k). Vector databases scale to much larger collections, but they add performance overhead at small scales.
from docarray import BaseDoc, DocList
from docarray.index.backends.in_memory import InMemoryDocIndex
from docarray.typing import NdArray
import numpy as np
class MyDoc(BaseDoc):
    tensor: NdArray[512]
docs = DocList[MyDoc](MyDoc(tensor=i*np.ones(512)) for i in range(10))
doc_index = InMemoryDocIndex[MyDoc]()
doc_index.index(docs)
print(doc_index.find(3*np.ones(512), search_field='tensor', top_k=3))
FindResult(documents=<DocList[MyDoc] (length=10)>, scores=array([1., 1., 1., 1., 1., 1., 1., 1., 1., 1.]))
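In exact search, the query is scored against every stored vector rather than against a subset picked by an approximate index. The plain-Python sketch below illustrates that idea. It assumes cosine similarity as the metric and uses hypothetical helpers (`cosine`, `exact_search`). It is not InMemoryDocIndex's actual implementation.

```python
import heapq
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def exact_search(
    query: Sequence[float], vectors: Sequence[Sequence[float]], top_k: int
) -> List[Tuple[int, float]]:
    """Return (position, score) pairs for the top_k most similar vectors.

    The query is compared with every stored vector, so the result is exact,
    at a cost of O(n * d) per query for n vectors of dimension d.
    """
    scored = ((i, cosine(query, v)) for i, v in enumerate(vectors))
    return heapq.nlargest(top_k, scored, key=lambda pair: pair[1])


vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
print(exact_search([1.0, 0.2], vectors, top_k=2))  # positions 0 and 2 rank highest
```

The linear scan is why exact search suits small collections. Its cost grows with every added document, which is the trade-off that approximate indexes avoid.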
DocList inherits from Python list (#1457)
DocList is now a subclass of Python's list, so every method available on Python lists also works on DocList objects. For example, you can call len on a DocList, and tools such as Pydantic and FastAPI can handle DocList objects more easily.
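The sketch below uses a hypothetical stand-in class (`TypedList`, not DocArray code) to show in plain Python what subclassing list provides. Inherited methods work unchanged, and `isinstance(obj, list)` checks pass, which is what list-expecting code relies on.

```python
class TypedList(list):
    """Hypothetical list subclass standing in for DocList in this sketch."""


class Doc:
    def __init__(self, text: str) -> None:
        self.text = text


docs = TypedList([Doc("a"), Doc("b")])
docs.append(Doc("c"))                          # append is inherited from list
docs.sort(key=lambda d: d.text, reverse=True)  # so is sort
print(len(docs))                # 3
print(isinstance(docs, list))   # True: code that expects a list accepts it
print(type(docs[1:]).__name__)  # list: slicing a bare subclass returns a plain list
```

With a bare subclass like this one, operations that build a new list (slicing, `+`) return plain `list` objects. A subclass that wants to keep its own type has to override methods such as `__getitem__`.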
Add len to DocIndex (#1454)
You can now call len(vector_index), which is equivalent to vector_index.num_docs().
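In Python, `len(obj)` calls `obj.__len__()`, so supporting `len` on an index amounts to delegating `__len__` to the existing count. A minimal sketch with a hypothetical `ToyIndex` class (not DocArray's DocIndex):

```python
class ToyIndex:
    """Hypothetical index, used only to show how len() can delegate to num_docs()."""

    def __init__(self) -> None:
        self._docs: list = []

    def index(self, docs) -> None:
        self._docs.extend(docs)

    def num_docs(self) -> int:
        return len(self._docs)

    def __len__(self) -> int:
        # len(obj) calls obj.__len__(), so both spellings return the same count
        return self.num_docs()


idx = ToyIndex()
idx.index(["doc-1", "doc-2", "doc-3"])
print(len(idx), idx.num_docs())  # 3 3
```

A side effect of defining `__len__` without `__bool__` is that an empty `ToyIndex` is falsy, so `if idx:` behaves as an emptiness test.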
Other minor features
- Add to_json alias to BaseDoc (#1494)
🐞 Bug Fixes
Point to older versions when importing Document or DocumentArray (#1422)
Previously, importing Document or DocumentArray from DocArray raised an error saying that you needed to downgrade DocArray to use these two objects. This behavior has been fixed.
Fix AnyDoc.from_protobuf (#1437)
AnyDoc can now read the protobuf message of any BaseDoc. The same applies to DocList.
Other bug fixes
- Fix infinite recursion when extending a DocList with itself (#1493)
- Fix exclude handling in dict() on BaseDoc (#1481)
- Fix exclude handling in json() on BaseDoc (#1481)
- Use pd.concat() instead of df.append() in to_dataframe() to avoid a warning (#1478)
- Fix converting a torch tensor with gradients to ndarray (#1429)
- Save the index during creation for hnswlib (#1424)
📗 Documentation Improvements
- Fix DocIndex URLs (#1433)
- Add install instructions for the hnswlib and Elastic document indexes (#1431)
🤟 Contributors
We would like to thank all contributors to this release: