Fix ze_peak explicit scaling benchmark #88
Original execution flow for each subdevice ID
Suppose we have two subdevices, 0 and 1. For subdevice 0 there is no synchronization at all; because the cmdqueue is asynchronous, we only measure the submission time, which is very small. For subdevice 1, we call cmdqueue sync on subdevice 0 at step 4, before we actually run the benchmark on subdevice 1, so the two subdevices never overlap. At the end we sum all the time measurements and calculate the BW. Although there was no overlap, execution on subdevice 0 was never measured either, so we get half of the actual time and thus double the BW, which is why this bug was not discovered before. Additionally, for subdevice 0 we do submit-submit-...(500 times)...-submit-sync using the same cmdlist & cmdqueue pair, which violates the L0 spec's description of
So we saw command buffer GPU page faults when running ze_peak on PVC.
Corrected execution flow for each subdevice ID
Now we submit one cmdlist to each subdevice asynchronously and synchronize on all subdevices only after every cmdlist has been submitted. Some warmup and cmdlist operation overhead is mixed into the measurement, and the barriers add their own overhead, but the measured BW is still very close to 2x the performance of a single subdevice.
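To make the timing argument concrete, here is a toy model of the two flows for two subdevices. It is only a sketch of the arithmetic, not Level Zero code and not the code in this PR: the `Workload` struct and all three function names are hypothetical, and the model assumes perfect overlap in the corrected flow while ignoring warmup and barrier overhead.

```cpp
#include <cassert>

// Toy timing model for two subdevices. This is NOT Level Zero code; every
// name below is hypothetical and exists only to illustrate the arithmetic.
struct Workload {
    double bytes_per_subdevice;  // bytes moved by each subdevice
    double exec_seconds;         // GPU execution time of one subdevice's work
    double submit_seconds;       // host cost of one asynchronous submission
};

// Original flow: subdevice 0 is timed over submission only, subdevice 1 is
// timed over its full execution, and the two timings are summed.
double original_reported_bw(const Workload& w) {
    assert(w.exec_seconds > 0.0);
    const double measured = w.submit_seconds + w.exec_seconds;
    return 2.0 * w.bytes_per_subdevice / measured;
}

// What the original flow really achieved: the two executions ran back to
// back with no overlap, so the true elapsed time is two execution times.
double original_actual_bw(const Workload& w) {
    assert(w.exec_seconds > 0.0);
    return 2.0 * w.bytes_per_subdevice / (2.0 * w.exec_seconds);
}

// Corrected flow: submit to both subdevices, then synchronize on both.
// Assumption: the executions overlap fully, so the timer covers two
// submissions plus a single execution time.
double corrected_bw(const Workload& w) {
    assert(w.exec_seconds > 0.0);
    const double measured = 2.0 * w.submit_seconds + w.exec_seconds;
    return 2.0 * w.bytes_per_subdevice / measured;
}
```

With a negligible submission cost, `original_reported_bw` comes out at twice `original_actual_bw` and equal to `corrected_bw`. In other words, the old flow reported the number that true overlap would produce, which is consistent with the bug going unnoticed.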
The explicit scaling code for ze_peak violates L0 spec and has no overlap between sub-devices. This PR corrects these issues. Signed-off-by: Wenbin Lu <wenbin.lu@intel.com>