Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WGPUFuture #199

Open
3 tasks
kainino0x opened this issue Jul 25, 2023 · 19 comments
Open
3 tasks

WGPUFuture #199

kainino0x opened this issue Jul 25, 2023 · 19 comments
Labels
async Asynchronous operations and callbacks needs exploration Needs offline exploration (e.g. in an implementation)

Comments

@kainino0x
Copy link
Collaborator

kainino0x commented Jul 25, 2023

I've been working on this for a while, but haven't filed a tracking issue until now.

The main problem is that ProcessEvents is a polling-based API, so there's no way to get notified/woken when an event happens (like posix poll, or vkWaitForFences). A bunch of other issues also crop up around:

  • integration with other OS events (and therefore async runtimes like Tokio, Goroutines, browsers, etc.)
  • sync wasm, vs async wasm (jspi/asyncify), vs native, vs native-with-remoting
    • multithreaded usage
    • timing portability
  • can only poll all events, and not specific ones

Expect more from me on this soon.

This issue also subsumes:


TODO:

  • Add callback modes to the CallbackInfo structs (except UncapturedError)
  • A way to get the future for DeviceLost
  • ??? Need to go through this issue
@kainino0x
Copy link
Collaborator Author

kainino0x commented Aug 10, 2023

After a bunch of discussion in the webgpu.h meeting we've decided we'll tentatively go forward with my proposal, except for the OS-interop bits that are very tricky and may or may not turn out to be necessary. (Chromium will most likely end up with interesting answers about what we decide to do.)

Slide-deck version of the WGPUFuture proposal: https://docs.google.com/presentation/d/1_0lZlaja_9IJxBMZaQGKvAV8bPZoOosNouiF1WiuEBw/edit?usp=sharing
Full 14-page document with lots of implementation details, links about how things work in backend APIs and OSes, alternatives considered, etc.:
https://docs.google.com/document/d/1qJRTJRY318ZEqhK6ryw4P-91FumYQfOnNP6LpANYy6A/edit?usp=sharing

EDIT: Forgot to paste the meeting notes from aug 10:

  • Slides (continue from “ProcessEvents and Callback Modes”)
  • discussion
    • the future exclusive-use rule vs mutexing the futures (timed_mutex)
    • reactors / forwarding events into iocp etc.
    • CF: use cases:
      • ok with just doing it somewhere in loop (ProcessEvents)
      • don’t care and want event loop to handle it
      • need something done now and need to wait synchronously (WaitAny with one future)
    • looks good, tbd on all of the OS interop bits
    • better to have the embedder deal with threading where necessary than have the implementation use PostTask to forward events around
  • Future creation listener
    • don’t add this at least for now
  • device-level and instance-level userdata lifetimes
    • Each callback gets a separate userdata
    • Repeating callbacks get a special DeviceLost value the last time they’ll ever be called

@kainino0x kainino0x removed the !discuss Needs discussion (at meeting or online) label Aug 10, 2023
@kainino0x
Copy link
Collaborator Author

Making some tweaks to make it so we can choose not to polyfill and incrementally improve later:

  • “Future” is now one of the CallbackMode flags, if you don’t pass it then it returns a null future
  • Features that must be enabled at instance creation time. There’s no feature detectability because this is a property of an implementation and I figure applications can deal with it, but could add it.
    • Timeout > 0 (for sync WASM)
    • Timeout > 0 with more than 64 futures (for Win32). Unlikely to be needed much so won't implement this yet.
  • Device feature for whether its OnSubmittedWorkDone and MapAsync futures can be mixed with other sources (CPU, device1, device2), when timeout > 0 (for Vulkan). Implementing without support for now but might do something about it later - there are design ideas for how to implement it in the doc.

@kainino0x
Copy link
Collaborator Author

  • KN: update, I’ve been implementing in dawn, this has made me unhappy about doing thread-safe programming in C++ but it’s getting somewhere. Hopefully landing code well in advance of going on vacation
  • CF: Been thinking about naming. “Future” is confusing in Rust since it also has futures.
  • KN: Unsure if it’s going to appear in the Rust API. Maybe not if you’re using rust futures?
  • CF: Need to have a reactor to use futures. Expect people to mostly do one of two things: poll or wait. Neither requires a reactor, but does require an object to do that on.
  • KN: Can you return something that can do both those things and is a rust future?
  • CF: yes because future is just a trait, might be weird, will think about it
  • names: future, promise, event, fence, semaphore, …

@kainino0x
Copy link
Collaborator Author

kainino0x commented Aug 31, 2023

  • CF: Isn’t the limit really for 64 different submissions, not 64 futures?
  • KN: Yes, we can loosen the definition of the limit later if we want
  • CF: wgpu already has a bunch of crap in the instance descriptor. HLSL compiler, force GLSL minor version, etc.
  • AE: We have stuff like that in various places too, including the device. Toggles on instance, adapter, and device. Instance descriptor has a pointer to a “Platform” object
  • CF: Think that’s what wgpu-native does essentially
  • CF: Impl knows the order of the fences so can always wait the <=64 oldest.
  • AE: Async pipeline creation makes it so it can’t be reduced. Depending on number of threads in your thread pool.
  • Runtimes…
  • KN: Runtimes can’t interrupt WaitAny
  • CF: Runtime can create and signal its own event if it’s using WaitForMultipleObjects itself. Can’t do this with WaitAny (can’t provide a cpu-signaled event on vkWaitForFences)
  • KN: OK to just leave as is - have new members in the instance descriptor, but no way to feature detect? And crash if they’re not supported
  • Move these features/limits out to a substruct of instance descriptor, add a top-level function that can populate that struct with the supported features.
  • If you request things that aren’t supported, we return null, and it gets logged in an implementation-defined way.

@kainino0x
Copy link
Collaborator Author

kainino0x commented Aug 31, 2023

  • Move these features/limits out to a substruct of instance descriptor, add a top-level function that can populate that struct with the supported features.

Just a note: This struct has to be extensible, but WGPUInstanceDescriptor is also extensible, so we end up with:

typedef struct WGPUInstanceFeatures {
    WGPUChainedStruct const * nextInChain;
} WGPUInstanceFeatures WGPU_STRUCTURE_ATTRIBUTE;

typedef struct WGPUInstanceDescriptor {
    WGPUChainedStruct const * nextInChain;
    WGPUInstanceFeatures features;
} WGPUInstanceDescriptor WGPU_STRUCTURE_ATTRIBUTE;

WGPU_EXPORT WGPUSomeStatusEnum wgpuGetInstanceFeatures(WGPUInstanceFeatures* features) WGPU_FUNCTION_ATTRIBUTE;

This is kind of redundant but I guess it's fine? We could extend only one or the other, and/or make wgpuGetInstanceFeatures take a WGPUInstanceDescriptor, but those solutions seem worse.

@kainino0x
Copy link
Collaborator Author

kainino0x commented Sep 7, 2023

meeting:

  • error reporting for invalid CallbackModes

    • KN: Currently undefined behavior (assert), could be “call callback immediately with an error status, and then never again”
    • … actually we don’t like that because it would be very hard to wrap into a Result<WGPUFuture> in a higher level language
    • Return sentinel error Future?
    • Return a struct with status+future?
    • Make future an out param (and replace CallbackMode_Future with whether you passed that)? do this tentatively
    • Also require implementation-defined logging of the error
  • naming: CallbackMode / CallbackModeFlags

    • KN: ModeFlags is kind of redundant sounding. I wanted to call it CallbackFlag / CallbackFlags but this would be a special case among all the flags. Keep as is or change?
    • LK: Assuming we keep it as bitflags?
    • Current set of options is very confusing…
    • AE: I think we can make it work to mix ProcessEvents and Future
    • Likely no harder than mixing ProcessEvents and Spontaneous
    • CF: Bindings can strongly type queue vs non-queue events. Can dynamically check errors for queue events from different devices.
    • Remove the Future callback mode and always return a Future.
    • Make the callback info structs extensible. Change CallbackMode to an enum because now we don’t need extensibility and it only has 3 options (None, AllowProcessEvents, AllowSpontaneous).

@kainino0x
Copy link
Collaborator Author

We forgot to revisit the question of how to report errors but I think one of the points of always returning a future (as I did in an earlier version of the proposal already) was that now we can return the null future when there's an error, so I propose that being our error mechanism. (With implementation-defined logging for the details.)

@kainino0x
Copy link
Collaborator Author

Oh, Loko pointed out that if it's an enum (rather than flags that have invalid combinations) then we can probably just crash if you pass (WGPUCallbackMode)1337

@kainino0x
Copy link
Collaborator Author

Dec 21 meeting

  • Proposing tweaks to futures
    • LK/AE: instance logging callback and device uncaptured error callback always spontaneous; no uncaptured error callback mode. it’s weird that the uncaptured error callback has a callback mode like other futures - but it’s really not a future and doesn’t make sense to sort of look that way
      • add a getter for the device lost future id. you can indeed wait on it, but you would need to use a different thread. Makes it consistent with all other futures
    • CF: Any reason to have a future for DeviceLost other than consistency?
    • KN: can wait on a thread, can poll, can wait just after Destroy
    • LK: If you need to do that can just set the mode to AllowProcessEvents and call ProcessEvents
    • CF: Even if just for consistency, it’s fine
    • thumbs up let’s do this change
    • AE: Important thing is making UncapturedError (and logging callback) always spontaneous with no control.
    • LK: The DeviceLost future id can also be used as a unique identifier for the device (re: previous week conversation)
    • (recap of previous discussion about passing Device to the DeviceLost callback, and discussion of how it’s implemented)
    • CF: Appreciate how much easier it is to document this stuff in rust :)
  • confirming: we said we would have WGPUDevice in the lost callback. Will we have WGPUBuffer in the map callback?
    • Yes
    • related - the future event refs the buffer? so if you wgpuBufferRelease the “last” ref, the event doesn’t force-complete early. you need to call wgpuBufferDestroy or Unmap
    • Yes
    • (discussion of how auto-destroy will work in WASM)

@kainino0x kainino0x changed the title Problems with callbacks only being called in ProcessEvents WGPUFuture Jan 11, 2024
@kainino0x
Copy link
Collaborator Author

kainino0x commented Jan 11, 2024

Jan 11 meeting:

  • Loko: Initial implementation of Futures included a “Shutdown” semantic to guarantee that all callbacks are eventually called. Right now, we enforce this by calling all non-spontaneous callbacks with a “Shutdown” event completion type when the Instance is dropped. This does not work well with the DeviceLost event though… If user creates a device with a non-spontaneous device lost callback mode, dropping the instance would call the device lost callback?
  • Instead, (talking to Kai), maybe we should leak those callbacks? That would simplify stuff in that we don’t even need a “Shutdown” semantic for events anymore, but instead we would just leak callbacks that don’t have a way to be called… We could log and/or debug assert to make these easier to catch? Would this change our decision on calling the lost callback when we fail requestDevice? (We would already be leaking callbacks in other places anyways?) #251
  • With enum changes to reserve 0 and make sure no default for callback mode, how do we want to handle validation? For most of the future stuff we have a device so we could generate a validation error, but some don’t, i.e. adapter.requestDevice. How do we want to handle those?
    • Just return kNullFuture or something like that for all future returning entry points?
    • Use validation errors when possible, otherwise implementation defined logging?
  • Resolution: Add reserved enum for success/instance dropped for all callbacks. For popErrorScope, separate out its callback type from uncapturedErrorCallback. For validation return kNullFuture and implementation defined logging.

copybara-service bot pushed a commit to google/dawn that referenced this issue Feb 6, 2024
Bug: webgpu-native/webgpu-headers#199
Change-Id: Id30ec55b2eb727ddf6dd0bf09b612ddee99b88da
Reviewed-on: https://dawn-review.googlesource.com/c/dawn/+/173620
Commit-Queue: Austin Eng <enga@chromium.org>
Kokoro: Kokoro <noreply+kokoro@google.com>
Reviewed-by: Loko Kung <lokokung@google.com>
Reviewed-by: Kai Ninomiya <kainino@chromium.org>
@kainino0x
Copy link
Collaborator Author

Mar 7 meeting:

  • What was our answer to the question of whether we call UncapturedError one last time when the device is lost, or if the DeviceLost event is what is supposed to tell you that UncapturedError is not going to be called anymore? (#199 subtopic)
    • Last notes from Aug 10: #199 (comment)
    • KN: I thought we also discussed this on Jan 11, but no notes on that question: #199 (comment)
    • Would we do it for the logging callback too?
    • AE: That’ll be on the instance which doesn’t have an InstanceLost, so it would be useful
    • KN: Makes sense for UncapturedError to be consistent?
    • Maybe…
    • Prior art? Vulkan debug info callback?
      • Vulkan has explicit create and destroy
    • LK: Prefer having one userdata for both callbacks
    • Consistency with instance?
      • We could add InstanceLost when we add LoggingCallback
      • Or we could just guarantee LoggingCallback won’t be called after the last external reference to the instance is dropped (we have to track this anyway for Futures)
    • RM: Does Dawn need the logging callback on the instance? Could it be global?
      • In wgpu-native it’s global. Reason is Rust logs are kind of global, and everything in wgpu logs through that.
      • AE: In Chrome we need to route logs to the correct webpage for the device. Could do it by a token.
      • LK: Weren’t we planning to remove the device-level LoggingCallback?
      • For Instance-level logs should we send them to all pages?
      • KN: Are there any instance-level logs that we actually expect to happen in Chrome? Mostly incorrect usages of C API
      • AE: Things like couldn’t load Vulkan driver.
      • KN: Probably should cache that log and log it every time requestAdapter happens?
      • CF: Would be nice for logs to be capturable by Sentry etc. In WebGL error messages go to straight to the console and we can’t capture them.
    • Qs:
      • Should there be both a device and instance level LoggingCallback (in Dawn and/or if we eventually spec one)
        • AE: Think we only need instance-level one
      • Should the device LoggingCallback be called one last time (or DeviceLost be what tells you to free your userdata)
      • Should the instance LoggingCallback be called one last time (or should we just stop when the last external reference to the instance is dropped)
        • Stop when the last external reference is dropped.
        • AE: Can change later by adding a value to the LoggingType enum
      • Should UncapturedError be called one last time
        • No, it’s redundant and not really necessary
      • Should UncapturedError and DeviceLost get the same userdata
        • No, it’s less flexible
        • Most people won’t even pass a userdata to UncapturedError
        • But if they do they can deal with the lifetimes however they want
    • C++ lambdas for DeviceLost and UncapturedError. Won't match the C layout
      • We can make a new struct in C++ and convert it

@kainino0x kainino0x mentioned this issue Mar 8, 2024
@kainino0x
Copy link
Collaborator Author

kainino0x commented Mar 26, 2024

Issues @lokokung raised during implementation of DeviceLost:

  • what do we do when the RequestDevice fails (in the case where JS rejects - invalid feature/limit request)?
    • Loko added a FailedCreation value to the DeviceLostReason status - this seems simplest to me.
    • We could reuse the Unknown/Destroyed status with a null device (so it's still distinguishable).
    • We could just not call the callback - this would sort of "leak".
    • We could undo Pass DeviceLostCallback in at device creation time.  #173 and switch back to SetDeviceLostCallback, but change its behavior such that (like JS) it calls the callback if you set it after the device is already lost. This seems kind of painful because it would be the only case where a future can have its callback set after the fact.
  • now that DeviceLost is a future it needs an InstanceDropped status (no real question here).

@kainino0x
Copy link
Collaborator Author

Mar 28 meeting:

  • summary:
    • FailedCreation, or alternatives
    • InstanceDropped, because there’s a callback mode
  • LK: We could also need to call InstanceDropped with a null device
  • KN: Applications might not handle Unknown + null device and crash trying to use the device (but what are they using the device for?)
  • (discussion of adapters being single-use)
  • Add InstanceDropped (with device always null for simplicity)
  • Add FailedCreation (device always null because creation failed)

@kainino0x kainino0x added the needs docs Non-trivial API contract needs to be documented. Orthogonal to open vs closed; remove when doc'd. label Mar 28, 2024
@kainino0x kainino0x added the needs exploration Needs offline exploration (e.g. in an implementation) label Jun 6, 2024
@kainino0x
Copy link
Collaborator Author

Docs were added in #314!

@kainino0x kainino0x removed the needs docs Non-trivial API contract needs to be documented. Orthogonal to open vs closed; remove when doc'd. label Aug 31, 2024
@almarklein
Copy link

Docs were added in #314!

It looks like that pr also adds new functions to the spec. What part of WGPUFuture is still missing now? I'm curious to what extent I can already start implementing async support in wgpu-py 😊

@kainino0x
Copy link
Collaborator Author

I don't remember for sure, probably WGPUFuture is more or less landed, though there are some changes we need to make to the async functions' signatures for various reasons. I'll review at some point soon....

That said, this header doesn't yet match wgpu-native or Dawn, so please use the header that corresponds with your implementation. (Dawn has implemented WGPUFuture hasn't done all the signature changes yet either.)

@almarklein
Copy link

almarklein commented Oct 1, 2024

Thanks for your reply!

this header doesn't yet match wgpu-native or Dawn

It's on the way 🚀 gfx-rs/wgpu-native#427

Reading through some of the docs referenced here, I saw that if one uses the futures for all events (once its implemented) its not longer necessary to call wgpuInstanceProcessEvents(). But I also read somewhere that process events
"check in-flight GPU work and free resources no longer used by GPU". Is the latter no longer the case?

@PJB3005
Copy link

PJB3005 commented Oct 1, 2024

It's on the way 🚀 gfx-rs/wgpu-native#427

Just FYI, my PR doesn't really implement WGPUFuture as I don't think wgpu currently has the necessary infrastructure. wgpuInstanceWaitAny is just unimplemented!() right now.

@kainino0x
Copy link
Collaborator Author

But I also ready somewhere that process events
"check in-flight GPU work and free resources no longer used by GPU". Is the latter no longer the case?

IIRC we want to guarantee that stuff like this happens periodically as long as you're submitting work (e.g. in queue.submit()).

Though I did have an unused proposal for a function that would do this, if we find it's needed: https://docs.google.com/document/d/1qJRTJRY318ZEqhK6ryw4P-91FumYQfOnNP6LpANYy6A/edit#heading=h.l3149w38ebmt

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
async Asynchronous operations and callbacks needs exploration Needs offline exploration (e.g. in an implementation)
Projects
None yet
Development

No branches or pull requests

3 participants