
Enable GPU->CPU transfers #5593

Merged: 9 commits merged into NVIDIA:main on Sep 11, 2024

Conversation

@mzient (Contributor) commented Aug 2, 2024

Category:

New feature
Refactoring

Description:

GPU->CPU transfer is made possible via the .cpu() function in DataNode (see the usage sketch below).
Some refactoring of the Pipeline class.
The GPU->CPU transfer checks have been removed from the Pipeline class. The old executor still raises an error when a GPU->CPU transfer occurs.
The check for GPU arguments to CPU ops has been removed from the Python front-end.

TODO: Extend InputDevice in Schema and use it for python-side checks.
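
A minimal usage sketch of the new capability (hypothetical code, not the PR's own; the `exec_dynamic` flag used here to opt into the new executor is an assumption, since the old executor still rejects GPU->CPU transfers):

from nvidia.dali import pipeline_def, fn

# Hypothetical sketch; `exec_dynamic=True` is an assumed flag name for
# selecting the new executor (the old one raises on GPU->CPU transfers).
@pipeline_def(batch_size=4, num_threads=2, device_id=0, exec_dynamic=True)
def pdef():
    data = fn.random.uniform(range=[0.0, 1.0], shape=[2, 3])
    g = data.gpu()    # CPU->GPU transfer (already supported)
    back = g.cpu()    # GPU->CPU transfer (enabled by this PR)
    return g, back

pipe = pdef()
pipe.build()
gpu_out, cpu_out = pipe.run()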

Additional information:

Affected modules and functionalities:

Pipeline
Python front-end
Copy operator

Key points relevant for the review:

Some old tests were removed or reduced to make room for the new capability.
New tests were added (sketched below), covering:
.cpu() for transfers (including with conditionals)
fn.shapes() with CPU backend and GPU input
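
A hedged sketch of what such tests might exercise (hypothetical code, not the PR's actual test code; `enable_conditionals` is the public conditionals switch, while `exec_dynamic` is an assumed flag name for the new executor):

from nvidia.dali import pipeline_def, fn, types

@pipeline_def(batch_size=8, num_threads=2, device_id=0,
              enable_conditionals=True, exec_dynamic=True)
def cond_pipe():
    data = fn.random.uniform(range=[0.0, 1.0], shape=[3]).gpu()
    predicate = fn.random.coin_flip(dtype=types.BOOL)
    if predicate:
        out = data.cpu()        # GPU->CPU transfer taken on one branch only
    else:
        out = data.cpu() * 2.0  # transfer followed by CPU-side arithmetic
    # CPU-backend shapes reading a GPU input, also enabled by this PR
    shapes = fn.shapes(data, device="cpu")
    return out, shapes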

Checklist

Tests

  • Existing tests apply
  • New tests added
    • Python tests
    • GTests
    • Benchmark
    • Other
  • N/A

Documentation

  • Existing documentation applies
  • Documentation updated
    • Docstring
    • Doxygen
    • RST
    • Jupyter
    • Other
  • N/A

DALI team only

Requirements

  • Implements new requirements
  • Affects existing requirements
  • N/A

REQ IDs: EXE.06

JIRA TASK: DALI-4030

Comment on removed lines 521 to 535:
// Creating the graph

for (auto &name_op_spec : op_specs_) {
  string &inst_name = name_op_spec.instance_name;
  OpSpec op_spec = name_op_spec.spec;
  PrepareOpSpec(&op_spec, name_op_spec.logical_id);
  try {
    graph_builder_.Add(inst_name, op_spec);
  } catch (...) {
    PropagateError({std::current_exception(),
                    "Critical error when building pipeline:\n" + GetErrorContextMessage(op_spec),
                    "\nCurrent pipeline object is no longer valid."});
  }
}

@mzient (author):

This part has been moved after the outputs because when processing outputs we add MakeContiguous operators.

@mzient force-pushed the enable_gpu2cpu branch 3 times, most recently from 016ef1e to 33fca28, on September 9, 2024 08:08
@@ -91,6 +94,20 @@ def gpu(self) -> DataNode:
            return transferred_node
        return DataNode(self.name, "gpu", self.source)

    def cpu(self) -> DataNode:
@mzient (author):

The logic here is copied from .gpu(), see above.

A collaborator:

If the code is identical it can be abstracted out to a function

@mzient (author, Sep 11, 2024):

Something like:

def cpu(self):
    return self._to_backend("cpu")

def gpu(self):
    return self._to_backend("gpu")

def _to_backend(self, backend) -> DataNode:
    ...

?
It could be done.

A collaborator:

Yes, this would be great! Thanks

A contributor:

I support the above. And maybe _to_backend should not be internal; people might want to use it directly (selecting the transfer backend by parameter rather than by function name).
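
For reference, a hedged sketch of what the shared helper could look like, assembled only from the .gpu() fragment visible in the diff above (this is not the PR's actual code, and the conditional-scope path that produces transferred_node is elided here, as it is in the diff):

def cpu(self) -> DataNode:
    return self._to_backend("cpu")

def gpu(self) -> DataNode:
    return self._to_backend("gpu")

def _to_backend(self, backend) -> DataNode:
    # A node already on the requested backend is returned as-is;
    # otherwise the node is relabeled onto the target backend.
    # (Conditional-scope handling would return the cached
    # transferred_node, as in the .gpu() fragment above.)
    if self.device == backend:
        return self
    return DataNode(self.name, backend, self.source)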

mzient and others added 5 commits September 10, 2024 13:51
Signed-off-by: Michał Zientkiewicz <mzient@gmail.com>
Signed-off-by: Michał Zientkiewicz <mzient@gmail.com>
Signed-off-by: Michal Zientkiewicz <michalz@nvidia.com>
Signed-off-by: Michal Zientkiewicz <michalz@nvidia.com>
Signed-off-by: Michal Zientkiewicz <michalz@nvidia.com>
…CPU `shapes` taking GPU input.

Signed-off-by: Michał Zientkiewicz <mzient@gmail.com>
Signed-off-by: Michał Zientkiewicz <mzient@gmail.com>
@mzient marked this pull request as ready for review on September 10, 2024 16:21
@dali-automaton (Collaborator):

CI MESSAGE: [18299035]: BUILD STARTED

Signed-off-by: Michał Zientkiewicz <mzient@gmail.com>
@dali-automaton (Collaborator):

CI MESSAGE: [18299035]: BUILD FAILED

@dali-automaton (Collaborator):

CI MESSAGE: [18301349]: BUILD STARTED

@dali-automaton (Collaborator):

CI MESSAGE: [18301349]: BUILD PASSED

 * If `separated_execution == 0`, this value is ignored
 * @param enable_memory_stats Enable memory stats.
 */
DLL_PUBLIC void
@mzient (author):

I added exec_flags so that we can (hopefully) get away without adding even more parameters when we add more flavors of the executor.

@mdabek-nvidia (Collaborator) left a review:

Non-critical comments; address them if needed.

dali/pipeline/executor/lowered_graph.cc (comment resolved)

pipe = pdef()
pipe.build()
for i in range(10):
    gpu, cpu = pipe.run()
A contributor:

Maybe you can add a check for the TL backend here as well; check_batch can handle anything.

@mzient (author):

Good point.
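
A minimal sketch of the suggested check (assuming the pdef pipeline above, and that TensorListCPU/TensorListGPU are importable from nvidia.dali.backend):

from nvidia.dali.backend import TensorListCPU, TensorListGPU

pipe = pdef()
pipe.build()
for _ in range(10):
    gpu, cpu = pipe.run()
    # verify the output backends explicitly, not just the contents
    assert isinstance(gpu, TensorListGPU)
    assert isinstance(cpu, TensorListCPU)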

pipe = pdef()
pipe.build()
for i in range(10):
    peek, gpu, cpu = pipe.run()
A contributor:

As above.

 * @param enable_memory_stats Enable memory stats.
 */
DLL_PUBLIC void
daliCreatePipeline3(daliPipelineHandle *pipe_handle, const char *serialized_pipeline, int length,
A contributor:

Shouldn't you add a test to c_api_test.cc?

@mzient (author):

Will do (perhaps in a follow-up).

@JanuszL self-assigned this on Sep 11, 2024
@JanuszL (Contributor) left a review:

Please add a backend check in the Python test; the other things are more questions than suggestions.

Signed-off-by: Michal Zientkiewicz <michalz@nvidia.com>
@dali-automaton (Collaborator):

CI MESSAGE: [18333257]: BUILD STARTED

@dali-automaton (Collaborator):

CI MESSAGE: [18333257]: BUILD PASSED

@mzient merged commit 408c18b into NVIDIA:main on Sep 11, 2024
6 checks passed