Extending DC to be Multithreaded #520
Comments
If that's ok, I would like to work on this when I'm back.
Great, the task is all yours.
Possibly related, but would this enable the export of completed results if a build were to fail halfway? E.g. if a build were to fail due to some resources not being available (e.g.
Description
DC currently executes tasks sequentially. This issue uses the London Cycle Traffic Air Quality recipe as a running example to explain the current approach and a possible approach to making DC multithreaded.
The recipe executes 8 tasks in total when run, listed below:
Task 1 -> Download LocalAuthority data from OaImporter
Task 2 -> Download TrafficCounts data from TrafficCountImporter
Task 3 -> Download airQuality data from LAQNImporter
Task 4 -> Geographic aggregation of NO2 40 ug/m3 and BicycleFraction in Fields
Task 5 -> Taking the mean of NO2 40 ug/m3 using LatestValueField
Task 6 -> Calculating BicycleFraction by dividing the sum of CountPedalCycles by the sum of CountCarsTaxis
Task 7 -> Adding CountPedalCycles using LatestValueField
Task 8 -> Adding CountCarsTaxis using LatestValueField
Current Approach
In the current approach DC executes one task at a time, so the order of execution would be:
Task 1, Task 2, Task 3, Task 5, Task 7, Task 8, Task 6, Task 4 (one at a time)
Proposed Approach
We could execute certain tasks in parallel, since they do not depend on the results of other tasks.
We could create a Dependency Graph, e.g.
Task -> Dependencies
Task 1 -> none
Task 2 -> none
Task 3 -> none
Task 4 -> Task 5, 6
Task 5 -> Task 1, 2, 3
Task 6 -> Task 7, 8
Task 7 -> Task 1, 2, 3
Task 8 -> Task 1, 2, 3
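As a rough illustration, the dependency table above could be held as a mapping from each task to the set of tasks it waits on. This is only a sketch in Python, not DC's actual code; the names `DEPENDENCIES` and `ready_tasks` are hypothetical and the task numbers simply mirror the list in this issue.

```python
# Hypothetical representation of the dependency table from this issue.
# Keys are task numbers; values are the tasks each one must wait for.
DEPENDENCIES = {
    1: set(),
    2: set(),
    3: set(),
    4: {5, 6},
    5: {1, 2, 3},
    6: {7, 8},
    7: {1, 2, 3},
    8: {1, 2, 3},
}


def ready_tasks(dependencies, done):
    """Return the tasks not yet run whose dependencies have all completed."""
    return sorted(
        task
        for task, deps in dependencies.items()
        if task not in done and deps <= done  # subset test: all deps finished
    )
```

Calling `ready_tasks(DEPENDENCIES, set())` gives the tasks that can start immediately, and passing a larger `done` set shows which tasks become unblocked next.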
We then execute, in parallel, only those tasks that have 0 remaining dependencies, and keep updating the Dependency Graph as tasks complete. For example:
We could execute Task 1, 2, 3 in parallel. Once they are done, we update the Dependency Graph and remove them from the dependencies of Task 5, 7, 8.
Then we execute Task 5, 7, 8 in parallel. Once they are done, we update the Dependency Graph again and remove them from the dependencies of Task 4 and 6. Note that Task 4 still can't be executed in parallel with Task 6, as it has Task 6 as a dependency, so we execute Task 6 and then Task 4 sequentially.
Making DC multithreaded could significantly improve run times.
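The scheduling idea described above could be sketched as follows. This is a minimal Python illustration under stated assumptions, not a proposal for DC's real implementation: `run_in_waves`, `run_task` and `max_workers` are hypothetical names, and each task is assumed to be a callable unit that is safe to run on a worker thread.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical dependency table mirroring the one in this issue.
DEPENDENCIES = {
    1: set(), 2: set(), 3: set(),
    4: {5, 6},
    5: {1, 2, 3},
    6: {7, 8},
    7: {1, 2, 3},
    8: {1, 2, 3},
}


def run_in_waves(dependencies, run_task, max_workers=4):
    """Run tasks wave by wave; tasks inside one wave run in parallel.

    Returns the list of waves in the order they were executed.
    """
    # Copy so the caller's graph is not modified.
    remaining = {task: set(deps) for task, deps in dependencies.items()}
    waves = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while remaining:
            # Tasks with no outstanding dependencies can run now.
            wave = sorted(t for t, deps in remaining.items() if not deps)
            if not wave:
                raise ValueError("dependency cycle detected")
            # list() waits for the whole wave and re-raises task errors.
            list(pool.map(run_task, wave))
            waves.append(wave)
            # Update the graph: drop finished tasks everywhere.
            for task in wave:
                del remaining[task]
            for deps in remaining.values():
                deps.difference_update(wave)
    return waves
```

With the table from this issue, the waves come out as Task 1, 2, 3, then Task 5, 7, 8, then Task 6, then Task 4. Waiting for a whole wave before starting the next keeps the sketch simple; a finer-grained scheduler could start a task as soon as its own dependencies finish.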
Error log
None