Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding support for async task system processing #958

Merged
merged 1 commit into from
May 31, 2023

Conversation

gsnook
Copy link
Contributor

@gsnook gsnook commented Apr 6, 2023

** Updated April 10, 2023 **
This has been revised based on feedback.

Async tasks support is now far simpler and the setup leverages the same function definitions as threads. There is no additional ecs_os_api_t entries needed.

To use an external async task system to execute multithreaded system work, simply use ecs_set_task_threads instead of ecs_set_threads.

ecs_set_task_threads(world, 4, <create_task_callback>, <wait_task_callback>); // Create 4 worker task threads for the duration of each ecs_progress update
world.set_task_threads(4, <create_task_callback>, <wait_task_callback>);

For simplicity, these task callbacks use the same format as Flecs ecs_os_api_t thread APIs. In fact, you could provide your regular os thread api functions to create short-duration threads for multithreaded system processing.
Create multithreaded systems using the multi_threaded flag as with ecs_set_threads above.

When ecs_progress is called, your <create_task_callback> will be called once for every task thread needed to create task threads on demand. When ecs_progress is complete, your <wait_task_callback> function will be called to clean up each task thread.
By providing callback functions which create and remove tasks for your specific asynchronous task system, you can use Flecs with any kind of async task management scheme.

Internally, task threads are just regular threads with shorter lifespans. There is no duplication of code for updating tasks vs. threads.

Documentation & tests have been updated. Any test which used ecs_set_threads now has an ecs_set_task_threads variant.

@SanderMertens
Copy link
Owner

Can you provide more information on the difference between threads and tasks? Before I start reviewing the code I think I need a bit more context on which problem this solves. Some example code that uses the API would help for sure.

Also, since this is a non-trivial feature, can you add a few test cases that make sure it works correctly?

@gsnook
Copy link
Contributor Author

gsnook commented Apr 6, 2023

When using Flecs within a larger game engine like Unreal, it can be beneficial to have it use that engine's provided asynchronous task system rather than create its own threads. This allows Flecs to work within the expectations of the game engine rather than have its threads compete with the engine's task threads for processing time. Both Unity and Unreal have built-in job systems to handle async task requests.

The specific problem being solved is that those engines often generate worker threads for their job systems to saturate the available processing cores. Therefore, additional threads created by something like Flecs become competition for core time, leading to a large number of thread context switches. By allowing Flecs to use the engine's task system, it becomes a better 'citizen' of the engine and works in harmony with other asynchronous tasks and the engine's performance monitoring tools.

Setting up tasks looks identical to setting up threads. You supply your API hook functions for tasks to ecs_os_set_api and then inform Flecs that you wish to use tasks for multithreading by calling ecs_set_tasks instead of ecs_set_threads.

I struggled to add tests because the system relies on there being an external job scheduler in use. I would need to add one to the test harness and then make tests to utilize it. If I choose an open-source job system for that task, then the test becomes more about validating that job system than Flecs. We could perhaps have tests which use a dummy job system that executes work synchronously. That would test the task code paths in Flecs, but not actual parallel execution of work. Would that suffice?

@gsnook
Copy link
Contributor Author

gsnook commented Apr 7, 2023

I found a way to create tests for the task api. They have been added.

@SanderMertens
Copy link
Owner

Ok I see what the code is doing, I like the idea of being able to integrate with Unreal's task system! I do have a few reservations about how it is currently implemented:

  • task_new and task_wait are a bit too opinionated for the OS API. The goal of the interface is to provide a low-level abstraction for operating system features.
  • If you want to run existing Flecs pipelines from Unreal threads, it could be easier to just expose (and maybe reorganize) some of the internal functions that run a pipeline. That would give you more flexibility & probably be more performant as there would be fewer callbacks & allocations.
  • There is some code duplication between flecs_task and flecs_run_pipeline which would become annoying to maintain when building new pipeline features
  • The tests don't address edge cases like enabling/disabling systems, or a schedule that changes in the middle of a frame (check the Pipeline tests to see what I mean)

@gsnook
Copy link
Contributor Author

gsnook commented Apr 7, 2023

Thanks for the feedback! An alternative I was considering earlier was to skip adding task_new and task_wait to the OS API and instead make the task API an optional parameter to ecs_progress, so users could request 'progress with tasks'. I had just assumed you wanted all external API hooks together in the OS API. Conceptually, I don't mind it in the OS API. You can think of those hooks as not just low-level OS abstractions, but API hooks to the host platform. In my case, that's Unreal Engine so I provide OS API hooks to log, assert and perform async work through Unreal as my 'OS'.

I'd like to explore your idea for exposing pipeline work to tasks differently. The catch there is that a task should never block and wait for a signal to continue like Flecs threads do. Not all task systems are setup to shelve a task and allow other work to progress on the same worker thread. For the most part, you want to wait to create a task until the work is ready to go and then end it when the work is done. Creating tasks is usually low cost, so frequent creation/destruction is not a concern. It's part of the task system's intended design.

Refactoring flecs_task and flecs_run_pipeline to remove duplication sounds ideal. The last thing I want is to add maintenance work. I'll look into that asap along with expanding the test coverage.

In the meantime, what shall we do with this PR? Do you want to leave it open and iterate here, or should we shelve it and maybe keep the refactoring discussion going in a discord thread until the next revision is ready?

@gsnook
Copy link
Contributor Author

gsnook commented Apr 11, 2023

Revised based on feedback. See new description above.

@@ -639,7 +639,9 @@ struct ecs_world_t {
ecs_os_mutex_t sync_mutex; /* Mutex for job_cond */
int32_t workers_running; /* Number of threads running */
int32_t workers_waiting; /* Number of workers waiting on sync */

ecs_os_api_thread_new_t task_worker_new; /* override to create task workers. 0= unused */
ecs_os_api_thread_join_t task_worker_join; /* override to join task workers. 0= unused */
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can the OS API thread_new and thread_join callbacks be used for this? E.g. you'd set the OS API callbacks to your custom new/join functions first, then enable the task API.

The thread_new and thread_join aren't used for anything else by Flecs except the worker API, so it wouldn't conflict with anything.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had considered this, but the ecs_http_server uses ecs_os_thread_new and I didn't want to change that system. I figured it would be best to maintain the separate APIs for threads and tasks rather than force other systems (or future systems) from needing to be task-compliant.

@SanderMertens
Copy link
Owner

Looks good, thanks for the PR! 🎉

@SanderMertens SanderMertens merged commit a2b994b into SanderMertens:master May 31, 2023
47 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants