Skip to content

feat: support filtering based on text keywords for qdrant#849

Merged
JoanFM merged 14 commits into
mainfrom
feat-qdrant-text-search
Dec 7, 2022
Merged

feat: support filtering based on text keywords for qdrant#849
JoanFM merged 14 commits into
mainfrom
feat-qdrant-text-search

Conversation

@AnneYang720

@AnneYang720 AnneYang720 commented Nov 25, 2022

Copy link
Copy Markdown
Contributor

Signed-off-by: AnneY evangeline-lun@foxmail.com

Goals:

This PR is related to issue #612

  • _find_by_text
  • add related tests
  • check and update documentation, if required. See guide

The qdrant find or filter can work as following:

from docarray import Document, DocumentArray
import numpy as np

n_dim = 128
da = DocumentArray(
    storage='qdrant',
    config={
        'n_dim': n_dim,
        'columns': {
            'price': 'float',
            'category': 'str',
            'info': 'text',
            'location': 'geo',
        },
    },
)

da.extend(
    [
        Document(
            id=f'r{i}',
            embedding=np.random.rand(n_dim),
            tags={
                'price': i + 0.5,
                'category': 'Shoes',
                'info': f'shoes {i}',
                'location': {"lon": -98.17 + i, "lat": 38.93 + i},
            },
        )
        for i in range(10)
    ]
)

filter = {
    'must': [
        {"key": "info", "match": {"text": "shoes"}},
        {
            "key": "location",
            "geo_radius": {
                "center": {"lon": -98.17, "lat": 38.71},
                "radius": 500.0 * 1000,  # unit is meter
            },
        },
    ]
}

results = da.find(np.random.rand(n_dim), filter=filter)

for doc in results:
    print(f" id={doc.id},\t info={doc.tags['info']},\t location={doc.tags['location']}")

Signed-off-by: AnneY <evangeline-lun@foxmail.com>
@codecov-commenter

codecov-commenter commented Nov 25, 2022

Copy link
Copy Markdown

Codecov Report

Base: 88.15% // Head: 88.06% // Decreases project coverage by -0.08% ⚠️

Coverage data is based on head (13414bf) compared to base (95d6c5f).
Patch coverage: 100.00% of modified lines in pull request are covered.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #849      +/-   ##
==========================================
- Coverage   88.15%   88.06%   -0.09%     
==========================================
  Files         138      138              
  Lines        7148     7154       +6     
==========================================
- Hits         6301     6300       -1     
- Misses        847      854       +7     
Flag Coverage Δ
docarray 88.06% <100.00%> (-0.09%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
docarray/array/storage/qdrant/backend.py 95.23% <100.00%> (+0.23%) ⬆️
docarray/array/storage/qdrant/find.py 78.43% <100.00%> (-13.73%) ⬇️

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

Signed-off-by: AnneY <evangeline-lun@foxmail.com>
Signed-off-by: AnneY <evangeline-lun@foxmail.com>
Signed-off-by: AnneY <evangeline-lun@foxmail.com>

@JoanFM JoanFM left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it needs to be clarified that the full_text_search here is not the same as in Elastic Search. I am not sure this implementation is correct actually.

According to Qdrant, this is just another filter conditions but not a full text search powered by BM25.

I would reject this PR and reimplement this as just raising the version and documenting that these kind of filters can be used

@JohannesMessner

JohannesMessner commented Dec 1, 2022

Copy link
Copy Markdown
Member

I think it needs to be clarified that the full_text_search here is not the same as in Elastic Search. I am not sure this implementation is correct actually.

According to Qdrant, this is just another filter conditions but not a full text search powered by BM25.

I would reject this PR and reimplement this as just raising the version and documenting that these kind of filters can be used

I think the way this implementation does the indexing etc. is correct, from briefly looking at the qdrant documentation it seems like this kind of filter requires a dedicated index to be built for it. But I agree that we should probably expose this feature through our filtering API, not expose it on the same level as vector similarity find().
Additional reference: https://qdrant.tech/documentation/filtering/#full-text-match

@JoanFM

JoanFM commented Dec 1, 2022

Copy link
Copy Markdown
Member

I think it needs to be clarified that the full_text_search here is not the same as in Elastic Search. I am not sure this implementation is correct actually.
According to Qdrant, this is just another filter conditions but not a full text search powered by BM25.
I would reject this PR and reimplement this as just raising the version and documenting that these kind of filters can be used

I think the way this implementation does the indexing etc. is correct, from briefly looking at the qdrant documentation it seems like this kind of filter requires a dedicated index to be built for it. But I agree that we should probably expose this feature through our filtering API, not expose it on the same level as vector similarity find(). Additional reference: https://qdrant.tech/documentation/filtering/#full-text-match

Exactly, I think it should be expose via filter

Signed-off-by: Anne Yang <evangeline-lun@foxmail.com>
This reverts commit 5fd7226.

Signed-off-by: AnneY <evangeline-lun@foxmail.com>
Signed-off-by: AnneY <evangeline-lun@foxmail.com>
Signed-off-by: AnneY <evangeline-lun@foxmail.com>
@github-actions

github-actions Bot commented Dec 6, 2022

Copy link
Copy Markdown

📝 Docs are deployed on https://ft-feat-qdrant-text-search--jina-docs.netlify.app 🎉

@JoanFM JoanFM changed the title feat: support find_by_text for qdrant feat: support filtering based on text keywords for qdrant Dec 7, 2022
@JoanFM JoanFM merged commit 30cc652 into main Dec 7, 2022
@JoanFM JoanFM deleted the feat-qdrant-text-search branch December 7, 2022 10:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants