Hi @AbhilashaRavichander,

I would like to ask a question regarding the F1 evaluation metric used in your paper (similar to #3). The paper mentions that the "average of the maximum F1 from each n−1 subset" is used to compute the F1 metric. I am not entirely sure how this works, but I think it could mean the following:
For each classification output, compare the predicted label against the labels from the annotators. Compute the maximum F1 per sample (which should be the same as accuracy), as shown in the example below:
| Sample | Predicted Label | Ann1 | Ann2 | Ann3 | Maximum F1 |
|--------|-----------------|------|------|------|------------|
| 1      | Relevant        | Irrelevant | None | Irrelevant | 0 |
| 2      | Relevant        | Relevant | Relevant | Relevant | 1 |
| 3      | Irrelevant      | None | Irrelevant | Relevant | 1 |
Take the average of all maximum F1 scores: (0 + 1 + 1) / 3 = 2/3 ≈ 0.67
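
In code, my interpretation would look roughly like the sketch below. This is just my reading of the metric, not the paper's reference implementation; it assumes that the per-sample F1 against a single annotator label reduces to exact match (1.0 on a match, 0.0 otherwise), with the maximum taken over annotators and the result averaged over samples.

```python
def max_f1_per_sample(predicted: str, annotator_labels: list[str]) -> float:
    # With one predicted label and one gold label, F1 is 1.0 on a match and
    # 0.0 otherwise, so the per-sample maximum is effectively "any match".
    return max(1.0 if predicted == gold else 0.0 for gold in annotator_labels)

def average_max_f1(predictions: list[str], annotations: list[list[str]]) -> float:
    # Average the per-sample maxima over the whole evaluation set.
    scores = [max_f1_per_sample(p, golds) for p, golds in zip(predictions, annotations)]
    return sum(scores) / len(scores)

# Values from the example table above.
predictions = ["Relevant", "Relevant", "Irrelevant"]
annotations = [
    ["Irrelevant", "None", "Irrelevant"],
    ["Relevant", "Relevant", "Relevant"],
    ["None", "Irrelevant", "Relevant"],
]
print(average_max_f1(predictions, annotations))  # (0 + 1 + 1) / 3 ≈ 0.67
```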
Is my understanding of the evaluation metric correct?
Thank you for your time.