Add fake HPU mode to Habana components with dummy habana_frameworks module. #250

Merged
merged 39 commits into habana_main from private/jmaksymczuk/fake_hpu_cpu on Sep 17, 2024

Conversation

@jmaksymczuk commented on Sep 6, 2024:

Refactor and improvements for #180

@jmaksymczuk force-pushed the private/jmaksymczuk/fake_hpu_cpu branch from 923b070 to 4d08172 on September 6, 2024, 14:01
@jmaksymczuk (Author):

Running bash format.sh throws errors regarding the dummy modules ([attr-defined]); I'm not sure what to do about that.
Other than that, the code is ready for review @kzawora-intel.

@jmaksymczuk changed the title from "[WIP] Add fake HPU mode to Habana components with dummy habana_frameworks module." to "Add fake HPU mode to Habana components with dummy habana_frameworks module." on Sep 6, 2024
vllm/utils.py Outdated
@lru_cache(maxsize=None)
def is_fake_hpu() -> bool:
    return os.environ.get('VLLM_USE_FAKE_HPU', '0') != '0' or (
        not _is_habana_frameworks_installed() and _is_built_for_hpu())


This is a bit risky. If, for whatever reason, we cannot find habana_frameworks, or there is some other environment issue, we shouldn't fall back to CPU by default; we should fail as soon as possible unless CPU fallback was explicitly requested.

@jmaksymczuk (Author):

Changed: is_fake_hpu now depends only on the VLLM_USE_FAKE_HPU flag.
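(For reference, a minimal sketch of the simplified check, assuming the helper keeps the same name, env var, and caching as in the snippet above; the PR's final code may differ in details:)

import os
from functools import lru_cache

@lru_cache(maxsize=None)
def is_fake_hpu() -> bool:
    # Fake HPU mode is now opt-in only, controlled by the VLLM_USE_FAKE_HPU flag.
    return os.environ.get('VLLM_USE_FAKE_HPU', '0') != '0'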

vllm/utils.py Outdated
@@ -1088,3 +1114,69 @@ async def _run_task_with_lock(task: Callable, lock: asyncio.Lock, *args,
"""Utility function to run async task in a lock"""
async with lock:
return await task(*args, **kwargs)


def _create_dummy_modules():


This is very brittle. Any time someone adds a new module in any file, we'd need to remember to wrap it here. Couldn't we do it somehow differently?

@jmaksymczuk (Author):

I've done some research and asked a few people, but unfortunately I haven't found a different way of doing it. I'm open to suggestions, but for now I haven't found a more "elegant" way of doing this.


What about using MagicMock?
https://stackoverflow.com/a/37126323
https://docs.python.org/3/library/unittest.mock.html

As far as I understand, it should automatically mock everything in the hierarchy below it. We could do it only for 'habana_frameworks'.

@jmaksymczuk (Author):

Unfortunately, MagicMock doesn't solve the submodules issue, but it greatly improves readability and makes adding further dummy modules much simpler. I've changed the original dummy-modules handling to MagicMock.
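(A minimal standalone sketch of the submodules issue, using module names that appear elsewhere in this PR; this is illustrative only, not the PR's code:)

import sys
from unittest.mock import MagicMock

# Registering only the top-level package is not enough: the import machinery
# still resolves each dotted submodule, so every habana_frameworks module that
# gets imported anywhere needs its own entry, and this list must be kept in
# sync by hand.
for name in (
    'habana_frameworks',
    'habana_frameworks.torch',
    'habana_frameworks.torch.core',
    'habana_frameworks.torch.utils',
    'habana_frameworks.torch.utils.internal',
):
    sys.modules[name] = MagicMock()

import habana_frameworks.torch.core  # resolves to the stubs registered above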


Hmm... Perhaps something like this could work:

import sys
from unittest.mock import MagicMock

builtin_import = __builtins__.__import__

def import_wrapper(name, *args, **kwargs):
    # Register a stub for any habana_frameworks module before delegating
    # to the real import machinery.
    if 'habana_frameworks' in name:
        sys.modules[name] = MagicMock()
    return builtin_import(name, *args, **kwargs)

__builtins__.__import__ = import_wrapper

Could you please check if it works? (last thing, I promise! 😄 )

vllm/utils.py Outdated
Comment on lines 1156 to 1160
habana_frameworks.torch.core.mark_step = lambda: print( # type: ignore
'calling mark_step')
habana_frameworks.torch.utils.internal.is_lazy = lambda: print( # type: ignore
'calling is_lazy')
torch.hpu.synchronize = lambda: print('calling synchronize')  # type: ignore


How does this correspond to definitions in _migrate_to_cpu()?

@jmaksymczuk (Author):

Removed.

lora_logits_mask = torch.zeros(len(seq_group_metadata_list),
(self.lora_config.max_loras + 1) *
(self.lora_config.max_loras) *


This looks completely unrelated; most likely this PR needs to be updated with the recent changes from habana_main.

@jmaksymczuk (Author):

Resolved

@@ -138,6 +140,11 @@ def determine_num_available_blocks(self) -> Tuple[int, int]:

# Execute a forward pass with dummy inputs to profile the memory usage
# of the model.
if is_fake_hpu():
# self.model_runner.profile_run()


dead code

@jmaksymczuk (Author):

Removed

world_size=parallel_config.world_size,
rank=rank,
init_method=distributed_init_method,
)

# A small all_reduce for warmup & checking conformance.
dummy_tensor_hpu = torch.ones(1).to('hpu')
device = 'hpu' if not is_fake_hpu() else 'cpu'


I've seen this snippet before. Couldn't we wrap it in a helper function? Like hpu_device_str() and move the check inside?

@jmaksymczuk (Author):

Wrapped in a helper function
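(A minimal sketch of such a helper, using the name hpu_device_str from the review suggestion and the is_fake_hpu() check shown earlier; the PR's implementation may differ:)

def hpu_device_str() -> str:
    # Report 'cpu' when running in fake HPU mode, 'hpu' otherwise.
    return 'hpu' if not is_fake_hpu() else 'cpu'

# Usage at the call site quoted above:
# dummy_tensor = torch.ones(1).to(hpu_device_str())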

@jmaksymczuk (Author):

Addressed most review comments. Since the original PR, there have been changes in habana_main that cause this code to fail; currently working on a fix.

@jmaksymczuk force-pushed the private/jmaksymczuk/fake_hpu_cpu branch from 1e4d079 to 73f213a on September 11, 2024, 12:50
@jmaksymczuk (Author):

All review comments addressed. Currently the cpu-test fails because of a bug in habana_main; waiting for the fix in PR #271 to be merged.

@jmaksymczuk (Author):

@madamczykhabana All review comments addressed. After merging the fixed habana_main, all checks pass.

@jmaksymczuk (Author):

@madamczykhabana Changed to MagicMock -> ready to merge.

@jmaksymczuk (Author):

@madamczykhabana It works! I made a small refactor of the dummy-modules handling: all habana_frameworks submodules are now created automatically, so changing this should not be required in future development. By default, methods from the dummy submodules do nothing; in case we need a function to actually do something (e.g. return False), see: https://github.com/HabanaAI/vllm-fork/pull/250/files#diff-dab7693bd00a09e22d39aee684a7e419aa358a47c4bd20df33d44f5adf60d304R1151-R1153
PR ready to be merged.
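(A minimal standalone sketch of the mechanism linked above, not the PR's exact code: with habana_frameworks stubbed out as a MagicMock, attribute access yields cached child mocks, so individual functions can still be pinned to concrete behavior where the default do-nothing stub isn't enough:)

import sys
from unittest.mock import MagicMock

# Under fake HPU mode the import wrapper would do this registration; it is
# done by hand here so the snippet runs on its own.
sys.modules['habana_frameworks'] = MagicMock()

import habana_frameworks

# Calls on the mock normally return another (truthy) MagicMock; override the
# few spots where a real return value matters, e.g. reporting eager mode.
habana_frameworks.torch.utils.internal.is_lazy = lambda: False

assert habana_frameworks.torch.utils.internal.is_lazy() is False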

@madamczykhabana left a comment:

LGTM

@madamczykhabana merged commit a9de5ba into habana_main on Sep 17, 2024
14 checks passed
zhouyu5 pushed a commit to zhouyu5/vllm-fork that referenced this pull request Sep 20, 2024
…odule. (HabanaAI#250)

Co-authored-by: Konrad Zawora <kzawora@habana.ai>
kzawora-intel added a commit that referenced this pull request Sep 23, 2024
michalkuligowski pushed a commit that referenced this pull request Sep 26, 2024
Reverted PRs:
- #250 
- #195

---------

Signed-off-by: Russell Bryant <rbryant@redhat.com>
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
Co-authored-by: Jani Monoses <jani.monoses@gmail.com>
Co-authored-by: Daniele <36171005+dtrifiro@users.noreply.github.com>
Co-authored-by: youkaichao <youkaichao@126.com>
Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
Co-authored-by: mgoin <michael@neuralmagic.com>
Co-authored-by: Divakar Verma <137818590+divakar-amd@users.noreply.github.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
Co-authored-by: Russell Bryant <rbryant@redhat.com>
Co-authored-by: jiqing-feng <107918818+jiqing-feng@users.noreply.github.com>
Co-authored-by: Alexander Matveev <59768536+alexm-neuralmagic@users.noreply.github.com>
Co-authored-by: Hongxia Yang <62075498+hongxiayang@users.noreply.github.com>
Co-authored-by: sroy745 <142070531+sroy745@users.noreply.github.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: Brendan Wong <bjwpokemon@gmail.com>
Co-authored-by: Simon Mo <simon.mo@hey.com>
Co-authored-by: Cody Yu <hao.yu.cody@gmail.com>
Co-authored-by: Peter Salas <peter@fixie.ai>
Co-authored-by: Alex Brooks <alex.brooks@ibm.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: Hanzhi Zhou <hanzhi713@gmail.com>
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>
@jmaksymczuk deleted the private/jmaksymczuk/fake_hpu_cpu branch on October 7, 2024, 08:36