
feat: add pre-filtering support #331

Merged
alaeddine-13 merged 45 commits into main from feat-add-annlite-filtering
May 25, 2022
Conversation

@davidbp

@davidbp davidbp commented May 9, 2022

Contributor

This PR adds support for pre-filtering in storage backends.
closes: #263

TODOs:

  • modify the interface of find to add a filter parameter (None by default). Handle the following cases:

    • If filter is not None and query is a vector => perform filtered vector search if the backend supports it; otherwise raise an error (the backend raises the error).
      • Needs a test for the raised error
    • If filter is not None and query is None => perform a filter operation (use the backend's filter if it supports efficient filtering; otherwise fall back to exhaustive filtering).
    • If query is a dict and filter is None => perform a filter operation (to preserve compatibility).
    • If filter is not None and query is not None or vector, raise error.
  • use pre-filtering in _find:

    • use pre-filtering in _find for weaviate
    • use pre-filtering in _find for qdrant
    • use pre-filtering in _find for annlite
  • define the schema of tags in config for annlite, weaviate and qdrant:

    • The schema is based on columns, which is part of QdrantConfig, AnnliteConfig, ElasticSearchConfig, SqliteConfig
    • schema for annlite
    • schema for weaviate
    • schema for qdrant
  • modify set_doc_by_id and make sure to include tags specified in the schema definition

    • set_doc_by_id for annlite
    • set_doc_by_id for weaviate
    • set_doc_by_id for qdrant
    • same for set_docs_by_ids if implemented in the backend
  • Documentation for

    • annlite
    • weaviate
    • qdrant
  • Tests that leverage pre-filtering for

    • annlite
    • weaviate
    • qdrant
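
The cases listed under the first TODO can be read as a small dispatch table. The following is only a sketch of that reading, not the code merged in this PR: `resolve_find_mode`, `is_vector`, the `backend_supports_filtered_search` flag, the returned mode names and the example filter syntax are all hypothetical names chosen for illustration.

```python
from numbers import Number
from typing import Any, Optional


def is_vector(query: Any) -> bool:
    """Illustrative check: a non-empty list/tuple of numbers counts as a vector."""
    return (
        isinstance(query, (list, tuple))
        and len(query) > 0
        and all(isinstance(x, Number) for x in query)
    )


def resolve_find_mode(
    query: Any = None,
    filter: Optional[dict] = None,
    backend_supports_filtered_search: bool = True,
) -> str:
    """Return the operation the TODO list asks `find` to perform (hypothetical helper)."""
    if filter is not None:
        if query is None:
            # Filter only: backend filter if available, else exhaustive filtering.
            return 'filter'
        if is_vector(query):
            if not backend_supports_filtered_search:
                # The TODO says the backend raises when filtered search is unsupported.
                raise NotImplementedError('filtered vector search is not supported')
            return 'filtered_vector_search'
        # Query is a condition (dict), text, or anything else: reject the combination.
        raise ValueError('filter cannot be combined with a non-vector query')
    if isinstance(query, dict):
        # Backward compatibility: a dict query on its own is treated as a filter.
        return 'filter'
    # No filter and no dict query: existing find behaviour, outside the scope of the list.
    return 'unfiltered_find'
```

Under these assumptions, `resolve_find_mode([0.1, 0.2], {'price': {'$lte': 5}})` selects the filtered vector search, while passing a text query together with a filter raises a `ValueError`.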

@davidbp davidbp force-pushed the feat-add-annlite-filtering branch from f8b5d7b to 06be300 Compare May 9, 2022 13:02
@codecov

codecov Bot commented May 9, 2022


Codecov Report

Merging #331 (bbc1473) into main (4213553) will decrease coverage by 0.00%.
The diff coverage is 91.22%.

@@            Coverage Diff             @@
##             main     #331      +/-   ##
==========================================
- Coverage   86.21%   86.20%   -0.01%     
==========================================
  Files         134      134              
  Lines        6324     6366      +42     
==========================================
+ Hits         5452     5488      +36     
- Misses        872      878       +6     
Flag Coverage Δ
docarray 86.20% <91.22%> (-0.01%) ⬇️


Impacted Files Coverage Δ
docarray/array/storage/annlite/find.py 83.33% <ø> (ø)
docarray/array/storage/sqlite/backend.py 93.24% <ø> (ø)
docarray/array/mixins/find.py 87.35% <76.92%> (-1.26%) ⬇️
docarray/array/storage/weaviate/find.py 86.95% <83.33%> (-2.79%) ⬇️
docarray/array/storage/annlite/backend.py 95.58% <100.00%> (+0.42%) ⬆️
docarray/array/storage/base/backend.py 78.94% <100.00%> (+3.94%) ⬆️
docarray/array/storage/elastic/backend.py 94.39% <100.00%> (+0.05%) ⬆️
docarray/array/storage/elastic/find.py 89.58% <100.00%> (+0.45%) ⬆️
docarray/array/storage/memory/find.py 91.22% <100.00%> (+0.31%) ⬆️
docarray/array/storage/qdrant/backend.py 95.91% <100.00%> (+0.12%) ⬆️
... and 4 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@numb3r3 numb3r3 left a comment

Contributor


LGTM 👍

Comment thread tests/unit/array/mixins/test_find.py Outdated
@alaeddine-13 alaeddine-13 changed the title from "feat-add-annlite-filter" to "feat: add pre-filtering support" May 10, 2022
@github-actions github-actions Bot added size/m and removed size/s labels May 11, 2022
@davidbp davidbp self-assigned this May 11, 2022
@davidbp

davidbp commented May 11, 2022

Contributor Author

@alaeddine-13 by "If filter is not None and query is not None or vector, raise error." do you mean the following? "If filter is not None and ((query is not None) or (query is not vector)), raise error."

@alaeddine-13

Member


yes exactly, because applying a filter when the query is a condition or when the query is text does not really make sense

Comment thread docarray/array/mixins/find.py Outdated
@github-actions github-actions Bot added size/l and removed size/m labels May 24, 2022
Comment thread docs/advanced/document-store/weaviate.md Outdated
Comment thread docs/advanced/document-store/annlite.md Outdated
Comment thread docs/advanced/document-store/annlite.md Outdated
Comment thread docs/advanced/document-store/annlite.md Outdated
Comment thread docs/advanced/document-store/annlite.md
Comment thread docs/advanced/document-store/annlite.md Outdated
Comment thread docs/advanced/document-store/weaviate.md Outdated
Comment thread docs/advanced/document-store/weaviate.md Outdated
Comment thread docs/advanced/document-store/weaviate.md Outdated
Comment thread docs/advanced/document-store/weaviate.md
Comment thread docs/advanced/document-store/weaviate.md Outdated
@github-actions


📝 Docs are deployed on https://ft-feat-add-annlite-filtering--jina-docs.netlify.app 🎉

@davidbp davidbp requested a review from JoanFM May 25, 2022 06:13
@alaeddine-13 alaeddine-13 merged commit 1a3fb27 into main May 25, 2022
@alaeddine-13 alaeddine-13 deleted the feat-add-annlite-filtering branch May 25, 2022 07:03

Development

Successfully merging this pull request may close these issues.

support pre-filtering in docarray for annlite and qdrant

6 participants