Bug computing multiple predictions with the same classifier type #69

sarahyurick · 2024-08-07T21:08:40Z

I'm currently working on NVIDIA/NeMo-Curator#173 for NeMo Curator, which uses a multifold quality classifier to generate text quality predictions and their probabilities. The goal is to compute separate probabilities for each model fold and average them into a final prediction. However, only the results for the first quality model in the pipeline are being saved, even though each model writes to a different column name. See this notebook for an example.
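The fold-averaging step described above can be sketched in plain Python. This is a minimal, framework-agnostic illustration, not the CrossFit or NeMo Curator implementation: the label set (`High`, `Medium`, `Low`), the per-fold column names such as `quality_prob_fold_0`, and the helper `average_fold_probs` are all hypothetical. It shows why each fold needs its own surviving output column: the average is only correct if every fold's probabilities reach this step, rather than every column holding the first model's output.

```python
from statistics import fmean

# Hypothetical label set; the real labels depend on the classifier being used.
LABELS = ("High", "Medium", "Low")


def average_fold_probs(fold_probs):
    """Average per-fold probability vectors for one document and pick the top label.

    fold_probs: list of per-fold probability lists, each ordered like LABELS.
    Returns (predicted_label, averaged_probabilities).
    """
    if not fold_probs:
        raise ValueError("need at least one fold")
    n_labels = len(fold_probs[0])
    if n_labels != len(LABELS) or any(len(p) != n_labels for p in fold_probs):
        raise ValueError("every fold must score exactly the labels in LABELS")
    # Element-wise mean across folds: one averaged probability per label.
    avg = [fmean(p[i] for p in fold_probs) for i in range(n_labels)]
    # The final prediction is the label with the highest averaged probability.
    best = max(range(n_labels), key=avg.__getitem__)
    return LABELS[best], avg


# Usage with one document whose (hypothetical) per-fold columns were all kept.
row = {
    "quality_prob_fold_0": [0.6, 0.3, 0.1],
    "quality_prob_fold_1": [0.4, 0.5, 0.1],
    "quality_prob_fold_2": [0.5, 0.3, 0.2],
}
label, probs = average_fold_probs([row[c] for c in sorted(row)])
```

If the bug causes every fold column to contain the first model's probabilities, this averaging silently collapses to that single model's prediction, which matches the symptom reported in this issue.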

@VibhuJawa suggested that the bug might be caused by CrossFit modifying the same internal flag in the Dask DataFrame. Calling persist() on the Dask DataFrames does produce the correct results, but from my understanding this isn't desirable: persist() computes the data and keeps it in memory, while the intended use is to read, modify, and write very large JSONL files.


sarahyurick · 2024-11-10T22:46:46Z

Closed by #99.

sarahyurick changed the title ~~Bug computing multiple predictions of the same classifier type~~ Bug computing multiple predictions with the same classifier type Aug 7, 2024

sarahyurick mentioned this issue Aug 7, 2024

Add Multiple Model Classification example NVIDIA/NeMo-Curator#173

Merged

sarahyurick mentioned this issue Nov 7, 2024

Fix loading multiple models #99

Merged

sarahyurick closed this as completed Nov 10, 2024
