Add fake HPU mode to Habana components with dummy habana_frameworks module. #250
Conversation
Force-pushed from 923b070 to 4d08172 (compare)
Running bash format.sh throws [attr-defined] errors for the dummy modules; not sure what to do about that.
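One way to silence these, assuming the errors come from mypy: error codes can be suppressed per line with a targeted ignore comment. A hedged aside, not necessarily what this PR settled on; the attribute shown is just one of the dummy-module assignments from this diff:

# Hypothetical per-line suppression of mypy's attr-defined error:
habana_frameworks.torch.core.mark_step = lambda: None  # type: ignore[attr-defined]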
vllm/utils.py
Outdated
@lru_cache(maxsize=None)
def is_fake_hpu() -> bool:
    return os.environ.get('VLLM_USE_FAKE_HPU', '0') != '0' or (
        not _is_habana_frameworks_installed() and _is_built_for_hpu())
This is a bit risky. If for whatever reason we cannot find habana_frameworks, or there is some other environment issue, we shouldn't fall back to CPU by default; we should fail as soon as possible unless the CPU fallback was explicitly requested.
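A hedged sketch of the fail-fast behavior being requested here; the helper names _is_habana_frameworks_installed and _is_built_for_hpu come from the snippet above, but the guard itself is illustrative, not the PR's code:

# Illustrative fail-fast guard: abort early when the build expects HPU
# support but habana_frameworks cannot be found, unless the CPU fallback
# was explicitly requested via the fake-HPU flag.
if (_is_built_for_hpu() and not _is_habana_frameworks_installed()
        and not is_fake_hpu()):
    raise RuntimeError(
        'habana_frameworks not found; set VLLM_USE_FAKE_HPU=1 to '
        'explicitly request the CPU fallback.')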
Changed: is_fake_hpu now depends only on the VLLM_USE_FAKE_HPU flag.
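A minimal sketch of the simplified check (hedged; the final code in the PR may differ in detail):

import os
from functools import lru_cache

@lru_cache(maxsize=None)
def is_fake_hpu() -> bool:
    # Fake-HPU mode is strictly opt-in via the environment flag;
    # there is no implicit fallback when habana_frameworks is missing.
    return os.environ.get('VLLM_USE_FAKE_HPU', '0') != '0'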
vllm/utils.py
Outdated
@@ -1088,3 +1114,69 @@ async def _run_task_with_lock(task: Callable, lock: asyncio.Lock, *args,
    """Utility function to run async task in a lock"""
    async with lock:
        return await task(*args, **kwargs)


def _create_dummy_modules():
This is very brittle. Any time someone adds a new module import in any file, we'd need to remember to wrap it here. Couldn't we do it somehow differently?
I've done some research and asked a few people, and unfortunately I haven't found a different way of doing it. I'm open to suggestions, but so far I haven't found a more "elegant" approach.
What about using MagicMock?
https://stackoverflow.com/a/37126323
https://docs.python.org/3/library/unittest.mock.html
As far as I understand it should automatically mock everything in the hierarchy below. We could do it only for 'habana_frameworks'.
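A minimal sketch of that suggestion (hedged; this illustrates the reviewer's idea, not the PR's final code):

import sys
from unittest.mock import MagicMock

# Attribute access on a MagicMock returns another mock, so chains like
# habana_frameworks.torch.core.mark_step() resolve without the real package.
sys.modules['habana_frameworks'] = MagicMock()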
Unfortunately MagicMock doesn't solve the submodules issue, but it greatly improves visibility and makes adding further dummy modules much simpler -> changed the original dummy-module handling to MagicMock.
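To illustrate the submodules issue: a single entry for 'habana_frameworks' covers attribute access, but a statement like import habana_frameworks.torch.core still fails, because the import system looks each dotted name up in sys.modules separately. A hedged sketch of the per-submodule registration (the module list is illustrative, not the PR's exact set):

import sys
from unittest.mock import MagicMock

# Every dotted path that is imported directly needs its own entry.
for name in ('habana_frameworks', 'habana_frameworks.torch',
             'habana_frameworks.torch.core', 'habana_frameworks.torch.utils',
             'habana_frameworks.torch.utils.internal'):
    sys.modules[name] = MagicMock()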
Hmm... Perhaps something like this could work:
import sys
import builtins
from unittest.mock import MagicMock

builtin_import = builtins.__import__
def import_wrapper(name, *args, **kwargs):
    # Register a mock for any habana_frameworks (sub)module up front,
    # then let the regular import machinery pick it up from sys.modules.
    if 'habana_frameworks' in name:
        sys.modules[name] = MagicMock()
    return builtin_import(name, *args, **kwargs)
builtins.__import__ = import_wrapper
Could you please check if it works? (last thing, I promise! 😄 )
vllm/utils.py
Outdated
habana_frameworks.torch.core.mark_step = lambda: print(  # type: ignore
    'calling mark_step')
habana_frameworks.torch.utils.internal.is_lazy = lambda: print(  # type: ignore
    'calling is_lazy')
torch.hpu.synchronize = lambda: print('calling synchronize')  # type: ignore
How does this correspond to definitions in _migrate_to_cpu()?
Removed.
vllm/worker/habana_model_runner.py
Outdated
lora_logits_mask = torch.zeros(len(seq_group_metadata_list),
                               (self.lora_config.max_loras + 1) *
                               (self.lora_config.max_loras) *
This looks completely unrelated; most likely this PR needs to be updated with the recent changes from habana_main.
Resolved
vllm/worker/habana_worker.py
Outdated
@@ -138,6 +140,11 @@ def determine_num_available_blocks(self) -> Tuple[int, int]:

    # Execute a forward pass with dummy inputs to profile the memory usage
    # of the model.
    if is_fake_hpu():
        # self.model_runner.profile_run()
dead code
Removed
vllm/worker/habana_worker.py
Outdated
world_size=parallel_config.world_size,
rank=rank,
init_method=distributed_init_method,
)

# A small all_reduce for warmup & checking conformance.
dummy_tensor_hpu = torch.ones(1).to('hpu')
device = 'hpu' if not is_fake_hpu() else 'cpu'
I've seen this snippet before. Couldn't we wrap it in a helper function? Like hpu_device_str() and move the check inside?
Wrapped in a helper function.
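A minimal sketch of such a helper (hedged; hpu_device_str is the name suggested above, and the final code may differ):

def hpu_device_str() -> str:
    # Centralize the fake-HPU check: report 'cpu' when faking, else 'hpu'.
    return 'hpu' if not is_fake_hpu() else 'cpu'

Call sites then become, e.g., torch.ones(1).to(hpu_device_str()).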
Addressed most review comments. Since the original PR, there have been changes in habana_main that cause this code to fail; currently working on a fix.
Force-pushed …mczuk/fake_hpu_cpu from 1e4d079 to 73f213a (compare)
All review comments addressed. The cpu-test currently fails because of a bug in habana_main; waiting for the fix in PR #271 to be merged.
@madamczykhabana All review comments addressed. After merging the fixed habana_main, all checks pass.
@madamczykhabana Changed to MagicMock -> ready to merge.
@madamczykhabana It works! I made a small refactor of the dummy-module handling: all habana_frameworks submodules are now created automatically, so changing this should not be required in future development. By default, methods from dummy submodules do nothing; in case we need a function to actually do something (e.g. return False), see: https://github.com/HabanaAI/vllm-fork/pull/250/files#diff-dab7693bd00a09e22d39aee684a7e419aa358a47c4bd20df33d44f5adf60d304R1151-R1153
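For reference, MagicMock makes such overrides one-liners: a mocked function's call result can be pinned with return_value. A hedged sketch (the mock stands in for a dummy submodule; see the linked diff for the PR's actual overrides):

from unittest.mock import MagicMock

# Stand-in for a dummy submodule such as habana_frameworks.torch.utils.internal.
internal = MagicMock()

# By default every call on a mock returns another mock; pin a concrete
# result where callers need one (e.g. is_lazy() should return False).
internal.is_lazy.return_value = False
assert internal.is_lazy() is False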
LGTM
…odule. (HabanaAI#250) Co-authored-by: Konrad Zawora <kzawora@habana.ai>
Reverted PRs:
- #250
- #195

Signed-off-by: Russell Bryant <rbryant@redhat.com>
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
Co-authored-by: Jani Monoses <jani.monoses@gmail.com>, Daniele <36171005+dtrifiro@users.noreply.github.com>, youkaichao <youkaichao@126.com>, Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>, mgoin <michael@neuralmagic.com>, Divakar Verma <137818590+divakar-amd@users.noreply.github.com>, Tyler Michael Smith <tyler@neuralmagic.com>, Jee Jee Li <pandaleefree@gmail.com>, Russell Bryant <rbryant@redhat.com>, jiqing-feng <107918818+jiqing-feng@users.noreply.github.com>, Alexander Matveev <59768536+alexm-neuralmagic@users.noreply.github.com>, Hongxia Yang <62075498+hongxiayang@users.noreply.github.com>, sroy745 <142070531+sroy745@users.noreply.github.com>, youkaichao <youkaichao@gmail.com>, Brendan Wong <bjwpokemon@gmail.com>, Simon Mo <simon.mo@hey.com>, Cody Yu <hao.yu.cody@gmail.com>, Peter Salas <peter@fixie.ai>, Alex Brooks <alex.brooks@ibm.com>, Cyrus Leung <cyrus.tl.leung@gmail.com>, DarkLight1337 <tlleungac@connect.ust.hk>, Hanzhi Zhou <hanzhi713@gmail.com>, Kunshang Ji <kunshang.ji@intel.com>
Refactor and improvements for #180