Fix scoring bug, properly handling `nan` values #780

fsschneider · 2024-08-29T11:34:29Z

When computing our benchmark scores, we want to "ignore" runs on a base workload if the submission doesn't hit the target on the corresponding held-out workload. This is implemented here:

algorithmic-efficiency/scoring/performance_profile.py

Lines 322 to 328 in c465e25

    
    # For each held-out workload if variant target was not hit set submission to inf
    for workload in df.keys():
      if workload not in BASE_WORKLOADS:
        # If variants do not have finite score set base_workload score to inf
        base_workload = get_base_workload_name(workload)
        df[base_workload] = df.apply(
            variant_criteria_filter(base_workload, workload), axis=1)

However, variant_criteria_filter() only checks for np.inf values:

algorithmic-efficiency/scoring/performance_profile.py

Lines 245 to 257 in c465e25

    def variant_criteria_filter(base_workload, variant_workload):

      def filter(x):
        try:
          if x[variant_workload] == np.inf:
            return np.inf
          else:
            return x[base_workload]
        except KeyError as e:
          print(x.keys())
          raise e

      return filter

Another invalid score that can occur is nan, for example when a run goes out of memory (OOM). Because nan never compares equal to np.inf, the check above lets it through. In this case, the base workload score should also be ignored.

This PR fixes this issue. To properly do so, it also needs to load the list of held-out workloads (to drop all other workload variants that have only been computed for the baseline).
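
As a rough illustration of the idea (not the actual diff of this PR, which is not shown here), a filter that treats any non-finite variant score as a miss could look like the sketch below. The inner function name `filter_row`, the column names `base` and `variant`, and the toy scores are all hypothetical; only `variant_criteria_filter` and the `df.apply(..., axis=1)` pattern come from the code quoted above.

```python
import numpy as np
import pandas as pd


def variant_criteria_filter(base_workload, variant_workload):
  """Build a row-wise filter that invalidates the base score whenever
  the variant score is not finite (inf or nan)."""

  def filter_row(row):
    # np.isfinite is False for both inf and nan, whereas `== np.inf`
    # is False for nan and therefore lets it slip through.
    if not np.isfinite(row[variant_workload]):
      return np.inf
    return row[base_workload]

  return filter_row


# Hypothetical scores: rows are submissions, columns are workloads.
df = pd.DataFrame(
    {
        'base': [10.0, 20.0, 30.0],
        'variant': [12.0, np.inf, np.nan],
    },
    index=['sub_a', 'sub_b', 'sub_c'])

# Same pattern as the scoring loop: overwrite the base column row by row.
df['base'] = df.apply(variant_criteria_filter('base', 'variant'), axis=1)
```

With this toy data, `sub_a` keeps its base score, while `sub_b` (variant is inf) and `sub_c` (variant is nan) both have their base score set to inf. The sketch does not cover the second part of the change, loading the list of held-out workloads and dropping the other variants.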

github-actions · 2024-08-29T11:34:42Z

MLCommons CLA bot All contributors have signed the MLCommons CLA ✍️ ✅

fsschneider added 2 commits August 29, 2024 13:22

Update gitignore

d6d6239

Fix scoring bug handling nan values

414e82e

fsschneider requested a review from priyakasimbeg August 29, 2024 11:34

fsschneider requested a review from a team as a code owner August 29, 2024 11:34

fsschneider changed the base branch from main to dev August 29, 2024 11:34

priyakasimbeg approved these changes Aug 29, 2024

priyakasimbeg merged commit 3b832f4 into mlcommons:dev Aug 29, 2024
16 of 19 checks passed

github-actions bot locked and limited conversation to collaborators Aug 29, 2024
