Currently the get_batch function performs fast inference and computes validation loss.
Create a new get_batch specifically for benchmarking.
This would consume a pretokenized JSON file with "question" and "answer" fields.
Most of the time we'd expect either a single multiple-choice answer (e.g. MMLU-Pro) or a set of correct answers, and we can score 1 or 0 depending on whether the correct answer appears in the network's top-1 logit, top-2 logits, etc.
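A minimal sketch of what this could look like, assuming a hypothetical JSON layout (a list of records whose "question" field holds token ids and whose "answer" field holds the correct answer token id, or a list of them) and a hypothetical top-k scoring helper — names and format here are illustrative, not the repo's actual API:

```python
import json
import numpy as np

def get_benchmark_batch(path, batch_size=8, start=0):
    """Load a slice of pretokenized benchmark examples.

    Assumed format: a JSON list of records with "question" (list of
    token ids) and "answer" (correct answer token id, or list of ids).
    """
    with open(path) as f:
        data = json.load(f)
    batch = data[start:start + batch_size]
    questions = [ex["question"] for ex in batch]
    answers = [ex["answer"] for ex in batch]
    return questions, answers

def top_k_correct(logits, answer_ids, k=1):
    """Score 1 if any correct answer id is among the top-k logits, else 0."""
    top_k = np.argsort(logits)[::-1][:k]  # indices of the k largest logits
    if not isinstance(answer_ids, (list, tuple)):
        answer_ids = [answer_ids]
    return int(any(a in top_k for a in answer_ids))
```

Accuracy over a benchmark would then just be the mean of `top_k_correct` across all examples, and varying `k` gives the top-1 / top-2 breakdown mentioned above.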
Benchmarks
This is going to be a huge contribution.