feat: add max_rel_per_label to support recall for labeled data by guenthermi · Pull Request #826 · docarray/docarray

guenthermi · 2022-11-22T08:07:06Z

Signed-off-by: Michael Guenther <guenthermi50@gmail.com>

Goals:

Support recall@k and F1 measure@k for labeled datasets.
Check and update documentation, if required. See guide

For labeled datasets, it is not trivial to calculate metrics like recall and F1 measure. These metrics require knowing, for each query, the number of relevant documents in the document collection, but neither the whole set of relevant documents nor the number of documents with a specific label is provided to the evaluate function.

To enable the calculation of recall and F1 measure, this PR:

adds a parameter `max_rel_per_label: Dict` to the `evaluate` function, which provides the number of relevant documents for each label, i.e., the number of documents in the collection with this label.
calculates those label counts for `max_rel_per_label` in the `embed_and_evaluate` function.
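The label-counting step can be sketched in isolation. This is a minimal sketch, assuming each document is a plain dict with a `tags` mapping (a stand-in for docarray `Document` objects) and that the label is stored under the `'label'` tag; `count_labels` is a hypothetical helper, not part of the docarray API. It follows the same `Counter`-based idea as the review diff further down.

```python
from collections import Counter
from typing import Any, Dict, Iterable, Mapping


def count_labels(docs: Iterable[Mapping[str, Any]], label_tag: str = 'label') -> Dict[Any, int]:
    """Return how many documents in the collection carry each label.

    Hypothetical helper: every document is assumed to be a dict with a 'tags' mapping.
    """
    return dict(Counter(doc['tags'][label_tag] for doc in docs))


collection = [
    {'text': 'a', 'tags': {'label': 'cat'}},
    {'text': 'b', 'tags': {'label': 'dog'}},
    {'text': 'c', 'tags': {'label': 'cat'}},
]
# number of relevant documents for a query with a given label
max_rel_per_label = count_labels(collection)
print(max_rel_per_label)  # {'cat': 2, 'dog': 1}
```

A query labeled `'cat'` then has two relevant documents in the collection, which is exactly the value recall needs as its denominator.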

Code Example:

```python
from docarray import Document, DocumentArray

# example DocumentArray with matches and labels for evaluation
da = DocumentArray([Document(text=str(i), tags={'label': i}) for i in range(3)])
for d in da:
    d.matches = da
# each label occurs once in the document collection (not provided to the evaluate function)
max_rel_per_label = {i: 1 for i in range(3)}
# evaluate matches
metrics = da.evaluate(['recall_at_k'], max_rel_per_label=max_rel_per_label)
print(metrics)
```

```
{'recall_at_k': 1.0}
```
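To show why the per-query number of relevant documents matters, here is a minimal pure-Python sketch of recall@k and F1@k over a binary relevance list. The function names echo the metric names above, but these are simplified stand-ins written for illustration, not docarray's implementation. Recall@k divides the number of relevant matches in the top k by `max_rel`, the number of relevant documents in the whole collection, so it cannot be computed without that count; precision@k only needs the matches themselves.

```python
from typing import Optional, Sequence


def recall_at_k(binary_relevance: Sequence[int], max_rel: int, k: Optional[int] = None) -> float:
    """Fraction of all relevant documents (max_rel) that appear in the top-k matches."""
    if max_rel <= 0:
        raise ValueError('max_rel must be a positive number of relevant documents')
    return sum(binary_relevance[:k]) / max_rel


def precision_at_k(binary_relevance: Sequence[int], k: Optional[int] = None) -> float:
    """Fraction of the top-k matches that are relevant (no max_rel needed)."""
    top_k = binary_relevance[:k]
    return sum(top_k) / len(top_k) if top_k else 0.0


def f1_score_at_k(binary_relevance: Sequence[int], max_rel: int, k: Optional[int] = None) -> float:
    """Harmonic mean of precision@k and recall@k."""
    precision = precision_at_k(binary_relevance, k)
    recall = recall_at_k(binary_relevance, max_rel, k)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Mirrors the example above: each query has 3 matches, exactly one shares the
# query's label, and that label occurs once in the collection (max_rel = 1).
relevance = [1, 0, 0]
print(recall_at_k(relevance, max_rel=1))    # 1.0
print(f1_score_at_k(relevance, max_rel=1))  # 0.5
```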


codecov-commenter · 2022-11-22T08:22:13Z

Codecov Report

❌ Patch coverage is 89.47368% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 77.81%. Comparing base (7a5b0bf) to head (66f6ee5).
⚠️ Report is 608 commits behind head on main.

Files with missing lines	Patch %	Lines
docarray/array/mixins/evaluation.py	94.11%	1 Missing ⚠️
docarray/math/evaluation.py	50.00%	1 Missing ⚠️

❗ There is a different number of reports uploaded between BASE (7a5b0bf) and HEAD (66f6ee5).

HEAD has 13 fewer uploads than BASE

Flag	BASE (7a5b0bf)	HEAD (66f6ee5)
docarray	26	13

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #826      +/-   ##
==========================================
- Coverage   86.38%   77.81%   -8.57%     
==========================================
  Files         138      138              
  Lines        7122     7137      +15     
==========================================
- Hits         6152     5554     -598     
- Misses        970     1583     +613

Flag	Coverage Δ
docarray	`77.81% <89.47%> (-8.57%)`	⬇️

Flags with carried forward coverage won't be shown.

gmastrapas · 2022-11-23T15:07:36Z

        caller_max_rel = kwargs.pop('max_rel', None)
        for d, gd in zip(self, ground_truth):
-            max_rel = caller_max_rel or len(gd.matches)
+            if caller_max_rel:


I think you need to refactor the if/else logic here a bit:

```python
for d, gd in zip(self, ground_truth):
    if caller_max_rel:
        max_rel = caller_max_rel
    if ground_truth_type == 'labels':
        if max_rel_per_label:
            max_rel = max_rel_per_label.get(d.tags[label_tag], None)
            if max_rel is None:
                raise ValueError(
                    '`max_rel_per_label` misses the label ' + str(d.tags[label_tag])
                )
        else:
            raise ValueError('max_rel is required or something')
    else:
        max_rel = len(gd.matches)
```

I think it is correct that caller_max_rel is used if the user provides a max_rel attribute explicitly.

The exception raised when max_rel is not set should also not be there, because most of the metrics do not require max_rel. However, setting it to None might be better than setting it to len(gd.matches). I will change this.
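The precedence discussed in this thread can be sketched as a standalone function. This is one reading of the discussion, not the merged code: `resolve_max_rel` is a hypothetical helper, the `'matches'` value for `ground_truth_type` and the `num_ground_truth_matches` parameter are assumptions, and the actual implementation in `docarray/array/mixins/evaluation.py` may differ. An explicit `max_rel` wins, labeled data falls back to the per-label count, and a missing count yields `None` instead of an exception.

```python
from typing import Any, Dict, Optional


def resolve_max_rel(
    caller_max_rel: Optional[int],
    ground_truth_type: str,
    label: Any = None,
    max_rel_per_label: Optional[Dict[Any, int]] = None,
    num_ground_truth_matches: Optional[int] = None,
) -> Optional[int]:
    """Hypothetical helper: choose max_rel for a single query."""
    # 1. An explicit max_rel passed by the caller applies to every query.
    if caller_max_rel:
        return caller_max_rel
    if ground_truth_type == 'labels':
        # 2. Labeled data: look up how many documents share the query's label.
        if max_rel_per_label:
            max_rel = max_rel_per_label.get(label)
            if max_rel is None:
                raise ValueError(f'`max_rel_per_label` misses the label {label}')
            return max_rel
        # 3. No label counts available: leave max_rel unset instead of
        #    raising, since most metrics do not need it.
        return None
    # 4. Explicit ground-truth matches (assumed 'matches' type): count them.
    return num_ground_truth_matches


print(resolve_max_rel(None, 'labels', label=2, max_rel_per_label={0: 1, 1: 1, 2: 1}))  # 1
print(resolve_max_rel(5, 'labels', label=2, max_rel_per_label={2: 1}))  # 5
```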


bwanglzu

left some comments

bwanglzu · 2022-11-28T10:24:56Z

+        if ground_truth and label_tag in ground_truth[0].tags:
+            max_rel_per_label = dict(Counter([d.tags[label_tag] for d in ground_truth]))
+        elif not ground_truth and label_tag in query_data[0].tags:
+            max_rel_per_label = dict(Counter([d.tags[label_tag] for d in query_data]))
+        else:
+            max_rel_per_label = None


I don't understand: max_rel_per_label is a variable you pass into the function, so what is this max_rel_per_label?

Okay, I see, these are two functions.

Can you elaborate a bit on the naming of max_rel_per_label? Why not num_relevant_documents_per_label or something like that?


JoanFM

I would like to see some changes in the documentation.

alexcg1 · 2022-11-29T13:43:43Z


 In this case, the keyword argument `k` is passed to all metric functions, even though it does not fulfill any specific function for the calculation of the reciprocal rank.

+### The max-rel parameter


Suggested change

### The max-rel parameter

### The max_rel parameter

alexcg1 · 2022-11-29T13:43:59Z

+### The max-rel parameter
+
+Some metric functions shown in the table above require a `max_rel` parameter.
+This parameter should be set to the number of relevant documents in the document collection.


Suggested change

This parameter should be set to the number of relevant documents in the document collection.

This parameter should be set to the number of relevant Documents in the Document collection.

alexcg1 · 2022-11-29T13:44:13Z

+
+Some metric functions shown in the table above require a `max_rel` parameter.
+This parameter should be set to the number of relevant documents in the document collection.
+Without the knowledge of this number, metrics like `recall_at_k` and `f1_score_at_k` can be calculated.


Can or cannot?

alexcg1 · 2022-11-29T13:44:27Z

+This parameter should be set to the number of relevant documents in the document collection.
+Without the knowledge of this number, metrics like `recall_at_k` and `f1_score_at_k` can be calculated.
+
+In the `evaluate` function, one can provide a keyword argument `max_rel`, which is then used for all queries.


Change all "one" to "you"

alexcg1 · 2022-11-29T13:44:47Z

+{'recall_at_k': 1.0}
+```
+
+Since all relevant documents are in the matches, the recall is one.


Capitalize Documents throughout text


alexcg1

LGTM 👍

feat: add max_rel_per_label to support recall for labeled data

81177b7

Signed-off-by: Michael Guenther <guenthermi50@gmail.com>

guenthermi marked this pull request as ready for review November 22, 2022 08:34

guenthermi mentioned this pull request Nov 23, 2022

Make the evaluator more robust, speed-up and handle datasets up to 100K docs jina-ai/finetuner#512

Closed

gmastrapas reviewed Nov 23, 2022


guenthermi requested a review from gmastrapas November 24, 2022 08:11

refactor: change logic of setting max_rel

29777b7

Signed-off-by: Michael Guenther <guenthermi50@gmail.com>

guenthermi force-pushed the feat-add-max_rel_per_label branch from d65eb05 to 29777b7 November 24, 2022 08:27

chore: merge main into branch

1400df5

Signed-off-by: Michael Guenther <guenthermi50@gmail.com>

LMMilliken approved these changes Nov 28, 2022


bwanglzu suggested changes Nov 28, 2022


guenthermi requested review from bwanglzu and removed request for gmastrapas November 28, 2022 10:35

guenthermi force-pushed the feat-add-max_rel_per_label branch 3 times, most recently from 802d8da to 62e70f8 November 28, 2022 11:04

guenthermi closed this Nov 28, 2022

guenthermi reopened this Nov 28, 2022

guenthermi added 2 commits November 28, 2022 13:08

refactor: change name of max_rel_per_label

0c10f35

Signed-off-by: Michael Guenther <guenthermi50@gmail.com>

fix: add missing s

22f0dc9

Signed-off-by: Michael Guenther <guenthermi50@gmail.com>

guenthermi force-pushed the feat-add-max_rel_per_label branch 2 times, most recently from 9dbabed to 22f0dc9 November 28, 2022 12:12

guenthermi added 2 commits November 28, 2022 13:14

Merge branch 'main' into feat-add-max_rel_per_label

985956c

Signed-off-by: Michael Guenther <guenthermi50@gmail.com>

fix: tests

5739b4f

Signed-off-by: Michael Guenther <guenthermi50@gmail.com>

guenthermi force-pushed the feat-add-max_rel_per_label branch from 68852b9 to 5739b4f November 28, 2022 13:57

JoanFM requested changes Nov 28, 2022


docs: add documentation for max_rel

218574e

Signed-off-by: Michael Guenther <guenthermi50@gmail.com>

guenthermi requested review from JoanFM and removed request for JoanFM and bwanglzu November 29, 2022 13:25

guenthermi requested review from JoanFM and bwanglzu and removed request for JoanFM and bwanglzu November 29, 2022 13:25

Merge branch 'main' into feat-add-max_rel_per_label

729c44b

JoanFM requested a review from alexcg1 November 29, 2022 13:32

alexcg1 suggested changes Nov 29, 2022


docs: implement review notes

f61e8ed

Signed-off-by: Michael Guenther <guenthermi50@gmail.com>

guenthermi requested review from alexcg1 and removed request for JoanFM November 29, 2022 14:35

alexcg1 approved these changes Nov 29, 2022


Merge branch 'main' into feat-add-max_rel_per_label

66f6ee5

JoanFM approved these changes Nov 30, 2022


JoanFM merged commit 8a4224d into docarray:main Nov 30, 2022

alexcg1 mentioned this pull request Dec 6, 2022

chore: draft release note v0.20 #894

Closed



7 participants
