Add fake HPU mode to Habana components with dummy habana_frameworks module. #250
Conversation
Force-pushed from 923b070 to 4d08172 (compare)
Running bash format.sh throws [attr-defined] errors for the dummy modules; not sure what to do about that.
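One way to silence these, assuming the errors come from mypy: error codes can be suppressed per line with a targeted ignore comment. A hedged aside, not necessarily what this PR settled on; the attribute shown is just one of the dummy-module assignments from this diff:

# Hypothetical per-line suppression of mypy's attr-defined error:
habana_frameworks.torch.core.mark_step = lambda: None  # type: ignore[attr-defined]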
vllm/utils.py
Outdated
@lru_cache(maxsize=None)
def is_fake_hpu() -> bool:
    return os.environ.get('VLLM_USE_FAKE_HPU', '0') != '0' or (
        not _is_habana_frameworks_installed() and _is_built_for_hpu())
This is a bit risky. If for whatever reason we cannot find habana_frameworks, or there is some other environment issue, we shouldn't fall back to CPU by default; we should fail as soon as possible unless the CPU fallback was explicitly requested.
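A hedged sketch of the fail-fast behavior being requested here; the helper names _is_habana_frameworks_installed and _is_built_for_hpu come from the snippet above, but the guard itself is illustrative, not the PR's code:

# Illustrative fail-fast guard: abort early when the build expects HPU
# support but habana_frameworks cannot be found, unless the CPU fallback
# was explicitly requested via the fake-HPU flag.
if (_is_built_for_hpu() and not _is_habana_frameworks_installed()
        and not is_fake_hpu()):
    raise RuntimeError(
        'habana_frameworks not found; set VLLM_USE_FAKE_HPU=1 to '
        'explicitly request the CPU fallback.')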
Changed: is_fake_hpu now depends only on the VLLM_USE_FAKE_HPU flag.
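A minimal sketch of the simplified check (hedged; the final code in the PR may differ in detail):

import os
from functools import lru_cache

@lru_cache(maxsize=None)
def is_fake_hpu() -> bool:
    # Fake-HPU mode is strictly opt-in via the environment flag;
    # there is no implicit fallback when habana_frameworks is missing.
    return os.environ.get('VLLM_USE_FAKE_HPU', '0') != '0'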
vllm/utils.py
Outdated
@@ -1088,3 +1114,69 @@ async def _run_task_with_lock(task: Callable, lock: asyncio.Lock, *args,
    """Utility function to run async task in a lock"""
    async with lock:
        return await task(*args, **kwargs)


def _create_dummy_modules():
This is very brittle. Any time someone adds a new module import in any file, we'd need to remember to wrap it here. Couldn't we do it somehow differently?
I've done some research and asked a few people, and unfortunately I haven't found a different way of doing it. I'm open to suggestions, but so far I haven't found a more "elegant" approach.
What about using MagicMock?
https://stackoverflow.com/a/37126323
https://docs.python.org/3/library/unittest.mock.html
As far as I understand it should automatically mock everything in the hierarchy below. We could do it only for 'habana_frameworks'.
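A minimal sketch of that suggestion (hedged; this illustrates the reviewer's idea, not the PR's final code):

import sys
from unittest.mock import MagicMock

# Attribute access on a MagicMock returns another mock, so chains like
# habana_frameworks.torch.core.mark_step() resolve without the real package.
sys.modules['habana_frameworks'] = MagicMock()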
Unfortunately MagicMock doesn't solve the submodules issue, but it greatly improves visibility and makes adding further dummy modules much simpler -> changed the original dummy-module handling to MagicMock.
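To illustrate the submodules issue: a single entry for 'habana_frameworks' covers attribute access, but a statement like import habana_frameworks.torch.core still fails, because the import system looks each dotted name up in sys.modules separately. A hedged sketch of the per-submodule registration (the module list is illustrative, not the PR's exact set):

import sys
from unittest.mock import MagicMock

# Every dotted path that is imported directly needs its own entry.
for name in ('habana_frameworks', 'habana_frameworks.torch',
             'habana_frameworks.torch.core', 'habana_frameworks.torch.utils',
             'habana_frameworks.torch.utils.internal'):
    sys.modules[name] = MagicMock()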
Hmm... Perhaps something like this could work:
import sys
import builtins
from unittest.mock import MagicMock

builtin_import = builtins.__import__
def import_wrapper(name, *args, **kwargs):
    # Register a mock for any habana_frameworks (sub)module up front,
    # then let the regular import machinery pick it up from sys.modules.
    if 'habana_frameworks' in name:
        sys.modules[name] = MagicMock()
    return builtin_import(name, *args, **kwargs)
builtins.__import__ = import_wrapper
Could you please check if it works? (last thing, I promise! 😄 )
vllm/utils.py
Outdated
habana_frameworks.torch.core.mark_step = lambda: print(  # type: ignore
    'calling mark_step')
habana_frameworks.torch.utils.internal.is_lazy = lambda: print(  # type: ignore
    'calling is_lazy')
torch.hpu.synchronize = lambda: print('calling synchronize')  # type: ignore
How does this correspond to definitions in _migrate_to_cpu()?
Removed.
vllm/worker/habana_model_runner.py
Outdated
lora_logits_mask = torch.zeros(len(seq_group_metadata_list),
                               (self.lora_config.max_loras + 1) *
                               (self.lora_config.max_loras) *
This looks completely unrelated; most likely this PR needs to be updated with the recent changes from habana_main.
Resolved
vllm/worker/habana_worker.py
Outdated
@@ -138,6 +140,11 @@ def determine_num_available_blocks(self) -> Tuple[int, int]:

    # Execute a forward pass with dummy inputs to profile the memory usage
    # of the model.
    if is_fake_hpu():
        # self.model_runner.profile_run()
dead code
Removed
vllm/worker/habana_worker.py
Outdated
world_size=parallel_config.world_size,
rank=rank,
init_method=distributed_init_method,
)

# A small all_reduce for warmup & checking conformance.
dummy_tensor_hpu = torch.ones(1).to('hpu')
device = 'hpu' if not is_fake_hpu() else 'cpu'
I've seen this snippet before. Couldn't we wrap it in a helper function? Like hpu_device_str() and move the check inside?
Wrapped in a helper function.
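A minimal sketch of such a helper (hedged; hpu_device_str is the name suggested above, and the final code may differ):

def hpu_device_str() -> str:
    # Centralize the fake-HPU check: report 'cpu' when faking, else 'hpu'.
    return 'hpu' if not is_fake_hpu() else 'cpu'

Call sites then become, e.g., torch.ones(1).to(hpu_device_str()).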
Addressed most review comments. Since the original PR, there have been changes in habana_main that cause this code to fail; currently working on a fix.
Force-pushed …mczuk/fake_hpu_cpu from 1e4d079 to 73f213a (compare)
All review comments addressed. The cpu-test currently fails because of a bug in habana_main; waiting for the fix in PR #271 to be merged.
@madamczykhabana All review comments addressed. After merging the fixed habana_main, all checks pass.
@madamczykhabana Changed to MagicMock -> ready to merge.
@madamczykhabana It works! I made a small refactor of the dummy-module handling: all habana_frameworks submodules are now created automatically, so changing this should not be required in future development. By default, methods from dummy submodules do nothing; in case we need a function to actually do something (e.g. return False), see: https://github.com/HabanaAI/vllm-fork/pull/250/files#diff-dab7693bd00a09e22d39aee684a7e419aa358a47c4bd20df33d44f5adf60d304R1151-R1153
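For reference, MagicMock makes such overrides one-liners: a mocked function's call result can be pinned with return_value. A hedged sketch (the mock stands in for a dummy submodule; see the linked diff for the PR's actual overrides):

from unittest.mock import MagicMock

# Stand-in for a dummy submodule such as habana_frameworks.torch.utils.internal.
internal = MagicMock()

# By default every call on a mock returns another mock; pin a concrete
# result where callers need one (e.g. is_lazy() should return False).
internal.is_lazy.return_value = False
assert internal.is_lazy() is False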
LGTM
…odule. (HabanaAI#250) Co-authored-by: Konrad Zawora <kzawora@habana.ai>
Reverted PRs:
- #250
- #195

Signed-off-by: Russell Bryant <rbryant@redhat.com>
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
Co-authored-by: Jani Monoses <jani.monoses@gmail.com>, Daniele <36171005+dtrifiro@users.noreply.github.com>, youkaichao <youkaichao@126.com>, Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>, mgoin <michael@neuralmagic.com>, Divakar Verma <137818590+divakar-amd@users.noreply.github.com>, Tyler Michael Smith <tyler@neuralmagic.com>, Jee Jee Li <pandaleefree@gmail.com>, Russell Bryant <rbryant@redhat.com>, jiqing-feng <107918818+jiqing-feng@users.noreply.github.com>, Alexander Matveev <59768536+alexm-neuralmagic@users.noreply.github.com>, Hongxia Yang <62075498+hongxiayang@users.noreply.github.com>, sroy745 <142070531+sroy745@users.noreply.github.com>, youkaichao <youkaichao@gmail.com>, Brendan Wong <bjwpokemon@gmail.com>, Simon Mo <simon.mo@hey.com>, Cody Yu <hao.yu.cody@gmail.com>, Peter Salas <peter@fixie.ai>, Alex Brooks <alex.brooks@ibm.com>, Cyrus Leung <cyrus.tl.leung@gmail.com>, DarkLight1337 <tlleungac@connect.ust.hk>, Hanzhi Zhou <hanzhi713@gmail.com>, Kunshang Ji <kunshang.ji@intel.com>
Refactor and improvements for #180