Handle label imbalance in binary classification tasks on text benchmark #376

vid-koci · 2024-03-11T14:41:56Z

Labels in the text benchmarks are imbalanced, and weighting the positive labels improves performance.
Experiments were run on the fake dataset (5% positive labels) with text_embedded and RoBERTa encodings:

ResNet result changes 91.1% -> 93.4%
FTTransformer result remains unchanged
Trompt result changes 95.2% -> 95.8%

The differences were even more stark with distilled RoBERTa, but we aren't reporting those anywhere, so I didn't note them down.

More results are pending
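
For readers unfamiliar with the technique, here is a minimal sketch of positive-label weighting in a binary cross-entropy loss. It is an illustration only: the PR description does not state the exact weight or loss function used, so the negative-to-positive ratio and the function names below (`positive_class_weight`, `weighted_bce_with_logits`, `softplus`) are assumptions. In PyTorch, `torch.nn.BCEWithLogitsLoss` takes a `pos_weight` argument that applies the same kind of scaling.

```python
import math


def softplus(x):
    # log(1 + exp(x)), written to stay stable for large |x|.
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def positive_class_weight(labels):
    # Assumed heuristic: weight positives by the negative-to-positive ratio.
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0:
        raise ValueError("no positive labels to weight")
    return n_neg / n_pos


def weighted_bce_with_logits(logits, labels, pos_weight):
    # Mean binary cross-entropy on raw logits; only the positive term is scaled.
    total = 0.0
    for z, y in zip(logits, labels):
        log_p = -softplus(-z)      # log(sigmoid(z))
        log_not_p = -softplus(z)   # log(1 - sigmoid(z))
        total -= pos_weight * y * log_p + (1 - y) * log_not_p
    return total / len(labels)


# Toy labels with 5% positives, mirroring the imbalance mentioned above.
labels = [1] * 5 + [0] * 95
weight = positive_class_weight(labels)
loss = weighted_bce_with_logits([0.0] * len(labels), labels, weight)
```

With this weight, the 5 positive examples contribute as much to the loss as the 95 negative ones, so a model can no longer reach a low loss by predicting the negative class everywhere.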

codecov · 2024-03-11T14:46:18Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 93.52%. Comparing base (893678f) to head (3801ffd).

Additional details and impacted files

@@           Coverage Diff           @@
##           master     #376   +/-   ##
=======================================
  Coverage   93.52%   93.52%           
=======================================
  Files         124      124           
  Lines        6451     6451           
=======================================
  Hits         6033     6033           
  Misses        418      418

weihua916

Thanks!

Handle label imbalance in binary tasks

f03af4a

vid-koci added the 1 - Priority P1, benchmark, and skip-changelog labels Mar 11, 2024

vid-koci requested a review from zechengz March 11, 2024 14:41

vid-koci self-assigned this Mar 11, 2024

[pre-commit.ci] auto fixes from pre-commit.com hooks

3801ffd

for more information, see https://pre-commit.ci

weihua916 approved these changes Mar 11, 2024

weihua916 merged commit 96bdf12 into master Mar 11, 2024
14 checks passed

weihua916 deleted the vid_label_imbalance branch March 11, 2024 17:03
