From dee531c627d353cac1ccedc5e6083c4b4f2fd013 Mon Sep 17 00:00:00 2001 From: anna-charlotte Date: Fri, 28 Apr 2023 09:55:55 +0200 Subject: [PATCH 1/2] docs: change cosine similarity to distance for hnswlib doc index Signed-off-by: anna-charlotte --- docs/user_guide/storing/index_hnswlib.md | 20 ++++++++++---------- 1 file changed, 10 insertions(+), 10 deletions(-) diff --git a/docs/user_guide/storing/index_hnswlib.md b/docs/user_guide/storing/index_hnswlib.md index d873c059a57..61f13676bb6 100644 --- a/docs/user_guide/storing/index_hnswlib.md +++ b/docs/user_guide/storing/index_hnswlib.md @@ -123,16 +123,16 @@ In the example above you can see how to configure two different vector fields, w In this way, you can pass [all options that Hnswlib supports](https://github.com/nmslib/hnswlib#api-description): -| Keyword | Description | Default | -|-------------------|--------------------------------------------------------------------------------------------------------------------------------|---------| -| `max_elements` | Maximum number of vector that can be stored | 1024 | -| `space` | Vector space (similarity metric) the index operates in. Supports 'l2', 'ip', and 'cosine' | 'l2' | -| `index` | Whether or not an index should be built for this field. | True | -| `ef_construction` | defines a construction time/accuracy trade-off | 200 | -| `ef` | parameter controlling query time/accuracy trade-off | 10 | -| `M` | parameter that defines the maximum number of outgoing connections in the graph | 16 | -| `allow_replace_deleted` | enables replacing of deleted elements with new added ones | True | -| `num_threads` | sets the number of cpu threads to use | 1 | +| Keyword | Description | Default | +|-------------------|----------------------------------------------------------------------------------------------------|---------| +| `max_elements` | Maximum number of vector that can be stored | 1024 | +| `space` | Vector space (distance metric) the index operates in. 
Supports 'l2', 'ip', and 'cosine' (distance) | 'l2' | +| `index` | Whether or not an index should be built for this field. | True | +| `ef_construction` | defines a construction time/accuracy trade-off | 200 | +| `ef` | parameter controlling query time/accuracy trade-off | 10 | +| `M` | parameter that defines the maximum number of outgoing connections in the graph | 16 | +| `allow_replace_deleted` | enables replacing of deleted elements with new added ones | True | +| `num_threads` | sets the number of cpu threads to use | 1 | You can find more details on the parameters [here](https://github.com/nmslib/hnswlib/blob/master/ALGO_PARAMS.md). From c1b5659bfdebd6efb43edd4c97252630cd4ec0b4 Mon Sep 17 00:00:00 2001 From: anna-charlotte Date: Fri, 28 Apr 2023 10:35:38 +0200 Subject: [PATCH 2/2] fix: apply joans suggestion from code review Signed-off-by: anna-charlotte --- docs/user_guide/storing/index_hnswlib.md | 23 +++++++++++++---------- 1 file changed, 13 insertions(+), 10 deletions(-) diff --git a/docs/user_guide/storing/index_hnswlib.md b/docs/user_guide/storing/index_hnswlib.md index 61f13676bb6..d8f5ee633e8 100644 --- a/docs/user_guide/storing/index_hnswlib.md +++ b/docs/user_guide/storing/index_hnswlib.md @@ -123,16 +123,19 @@ In the example above you can see how to configure two different vector fields, w In this way, you can pass [all options that Hnswlib supports](https://github.com/nmslib/hnswlib#api-description): -| Keyword | Description | Default | -|-------------------|----------------------------------------------------------------------------------------------------|---------| -| `max_elements` | Maximum number of vector that can be stored | 1024 | -| `space` | Vector space (distance metric) the index operates in. Supports 'l2', 'ip', and 'cosine' (distance) | 'l2' | -| `index` | Whether or not an index should be built for this field. 
| True | -| `ef_construction` | defines a construction time/accuracy trade-off | 200 | -| `ef` | parameter controlling query time/accuracy trade-off | 10 | -| `M` | parameter that defines the maximum number of outgoing connections in the graph | 16 | -| `allow_replace_deleted` | enables replacing of deleted elements with new added ones | True | -| `num_threads` | sets the number of cpu threads to use | 1 | +| Keyword | Description | Default | +|-------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|---------| +| `max_elements` | Maximum number of vector that can be stored | 1024 | +| `space` | Vector space (distance metric) the index operates in. Supports 'l2', 'ip', and 'cosine'.
**Note:** In contrast to the other backends, for HnswDocumentIndex `'cosine'` refers to **cosine distance**, not cosine similarity. To transform one to the other, you can use: `cos_sim = 1 - cos_dist`. For more details see [here](https://en.wikipedia.org/wiki/Cosine_similarity#Cosine_Distance). | 'l2' | +| `index` | Whether or not an index should be built for this field. | True | +| `ef_construction` | defines a construction time/accuracy trade-off | 200 | +| `ef` | parameter controlling query time/accuracy trade-off | 10 | +| `M` | parameter that defines the maximum number of outgoing connections in the graph | 16 | +| `allow_replace_deleted` | enables replacing of deleted elements with new added ones | True | +| `num_threads` | sets the number of cpu threads to use | 1 | + +!!! note + In HnswLibDocIndex `space='cosine'` refers to cosine distance, not to cosine similarity, as it does for the other backends. You can find more details on the parameters [here](https://github.com/nmslib/hnswlib/blob/master/ALGO_PARAMS.md).
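The distance-versus-similarity distinction this patch documents can be checked with a small, dependency-free sketch. It re-implements in plain Python the three distance formulas that the hnswlib README lists for its `'l2'`, `'ip'`, and `'cosine'` spaces. It assumes those formulas and does not call hnswlib or docarray; all function names are hypothetical helpers, not part of either library. It then converts a cosine distance back to a cosine similarity with `cos_sim = 1 - cos_dist`.

```python
import math
from typing import Sequence


def l2_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared L2 distance, the formula hnswlib documents for space='l2' (no square root)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def ip_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner-product distance as documented for space='ip': 1 - dot(a, b)."""
    return 1.0 - sum(x * y for x, y in zip(a, b))


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine distance as documented for space='cosine': 1 - cosine similarity."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return 1.0 - dot / norms


def cosine_similarity_from_distance(cos_dist: float) -> float:
    """Hypothetical helper: undo the conversion, cos_sim = 1 - cos_dist."""
    return 1.0 - cos_dist


query = [1.0, 0.0]
doc = [1.0, 1.0]
dist = cosine_distance(query, doc)
sim = cosine_similarity_from_distance(dist)
print(f"cosine distance: {dist:.4f}, cosine similarity: {sim:.4f}")
# prints: cosine distance: 0.2929, cosine similarity: 0.7071
```

All three spaces produce distances, so smaller values indicate closer matches. With `space='cosine'`, a score of `0.0` means the two vectors point in the same direction, which corresponds to a cosine similarity of `1.0`.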