[Hardware][CPU] Add ARM CPU backend #9957

ShawnD200 · 2024-11-02T16:14:45Z

Add ARM CPU backend support

(1) Building on the groundbreaking x86 CPU backend support, implemented the vector types using ARM NEON
(2) Added test cases for explicit verification of the implementation
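NEON registers are 128 bits wide (four FP32 lanes), so a 16-lane FP32 vector can be expressed as a compound of four 4-lane chunks, the same shape as NEON's `float32x4x4_t`. The sketch below models that layout in portable C++ so it compiles anywhere. The type name `FP32Vec16Sketch` and its members are hypothetical and only loosely mirror the style of vLLM's `cpu_types`; the comments note which NEON intrinsic each chunk operation would roughly correspond to on real hardware.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>

// One 4-lane FP32 chunk, standing in for a NEON float32x4_t register.
struct Float32x4 {
  std::array<float, 4> lane{};
};

// Hypothetical 16-lane FP32 vector stored as four 4-lane chunks,
// mirroring the layout of NEON's float32x4x4_t.
struct FP32Vec16Sketch {
  static constexpr std::size_t kChunks = 4;
  static constexpr std::size_t kLanesPerChunk = 4;
  static constexpr std::size_t kElems = kChunks * kLanesPerChunk;

  std::array<Float32x4, kChunks> chunk{};

  FP32Vec16Sketch() = default;

  // Broadcast a scalar to all 16 lanes (vdupq_n_f32 per chunk on NEON).
  explicit FP32Vec16Sketch(float v) {
    for (auto& c : chunk) c.lane.fill(v);
  }

  // Load 16 contiguous floats (vld1q_f32 per chunk on NEON).
  explicit FP32Vec16Sketch(const float* ptr) {
    assert(ptr != nullptr);
    for (std::size_t i = 0; i < kChunks; ++i)
      for (std::size_t j = 0; j < kLanesPerChunk; ++j)
        chunk[i].lane[j] = ptr[i * kLanesPerChunk + j];
  }

  // Fused multiply-add, lane-wise acc + a * b with a single rounding
  // (vfmaq_f32 per chunk on NEON; std::fma here).
  static FP32Vec16Sketch fma(const FP32Vec16Sketch& acc,
                             const FP32Vec16Sketch& a,
                             const FP32Vec16Sketch& b) {
    FP32Vec16Sketch out;
    for (std::size_t i = 0; i < kChunks; ++i)
      for (std::size_t j = 0; j < kLanesPerChunk; ++j)
        out.chunk[i].lane[j] = std::fma(a.chunk[i].lane[j],
                                        b.chunk[i].lane[j],
                                        acc.chunk[i].lane[j]);
    return out;
  }

  // Horizontal sum of all 16 lanes (vaddvq_f32 per chunk on AArch64).
  float reduce_sum() const {
    float sum = 0.0f;
    for (const auto& c : chunk)
      for (float x : c.lane) sum += x;
    return sum;
  }

  // Store 16 contiguous floats (vst1q_f32 per chunk on NEON).
  void save(float* ptr) const {
    assert(ptr != nullptr);
    for (std::size_t i = 0; i < kChunks; ++i)
      for (std::size_t j = 0; j < kLanesPerChunk; ++j)
        ptr[i * kLanesPerChunk + j] = chunk[i].lane[j];
  }
};
```

Keeping the chunk count as a compile-time constant means every loop has a fixed trip count, which is what lets a NEON version unroll each operation into exactly four intrinsic calls.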

# cd tests/cpu
# pip install .
# pytest . -v

Tested on macOS on an M2 Pro chip using a Docker image build; it should also work on generic ARM64 Linux.

Future work:

Build from source on bare-metal macOS
Improve performance with advanced SIMD, among many other things
Test on ARM Linux and with more models

Tested with simple models; they just work. Have fun. Feedback is welcome.

Signed-off-by: Shawn Du <shawnd200@outlook.com>

github-actions · 2024-11-02T16:14:56Z

👋 Hi! Thank you for contributing to the vLLM project.
Just a reminder: PRs do not trigger a full CI run by default. Instead, they only run fastcheck CI, which runs a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build on the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can do one of these:

Add the ready label to the PR.
Enable auto-merge.

🚀


mergify · 2024-11-12T14:32:12Z

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @ShawnD200.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

Isotr0py

Thanks for implementing arm64 support! The CPU kernel tests look pretty good! I just have some questions.

(I don't have an arm64 device. Let's see if others with arm64 hardware can run the tests on native Linux.)

Isotr0py · 2024-11-12T16:46:22Z

requirements-arm.txt

+torch == 2.4.0
+torchvision  # required for the image processor of phi3v, this must be updated alongside torch


Why can't we use torch 2.5.0 or higher here? PyTorch has an aarch64 distribution for 2.5.1: https://download.pytorch.org/whl/cpu/torch-2.5.1-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl#sha256=269b10c34430aa8e9643dbe035dc525c4a9b1d671cd3dbc8ecbcaed280ae322d

ShawnD200 · 2024-11-13

Honestly, I haven't checked the versions, so I don't know which ones are supported. This torch version was simply copied from the x86 CPU backend. This could be future work.

Isotr0py · 2024-11-12T16:58:47Z

csrc/cpu/cpu_types_arm.hpp

+// FIXME: FP16 is not fully supported in Torch-CPU
+#define VLLM_DISPATCH_CASE_FLOATING_TYPES(...)                                 \
+  AT_DISPATCH_CASE(at::ScalarType::Float, __VA_ARGS__)                         \
+  AT_DISPATCH_CASE(at::ScalarType::BFloat16, __VA_ARGS__)
+


Since FP16 is not supported on ARM, we might need to add a fallback to BF16 for the ARM CPU backend when the user specifies dtype="float16" (just like the Intel CPU backend did before).

ShawnD200 · 2024-11-13

Yes, good point. x86 already added the switch case for the Half type, and ARM needs to add it too. This raises a general question: how can we automatically serve all ISAs? As a starting point, those data types should be interfaces. I will add this to the future work too. Thank you so much.
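A minimal sketch of the fallback suggested above: when a requested `float16` has no kernels on the target ISA, downgrade it to `bfloat16` and warn instead of failing. Everything here (`DType`, `CpuIsa`, `has_kernels`, `resolve_cpu_dtype`) is a hypothetical name chosen for illustration; it shows only the decision logic, not vLLM's actual code path. The capability table assumes the state described in this thread: the ARM dispatch macro covers Float and BFloat16 only, while x86 already handles Half.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical dtype and ISA enums, for illustration only.
enum class DType { Float32, Float16, BFloat16 };
enum class CpuIsa { X86, ArmNeon };

// Hypothetical capability table: are kernels for `dt` compiled in for `isa`?
constexpr bool has_kernels(CpuIsa isa, DType dt) {
  switch (dt) {
    case DType::Float32:
    case DType::BFloat16:
      return true;
    case DType::Float16:
      return isa == CpuIsa::X86;  // ARM dispatch lacks the Half case (FIXME)
  }
  return false;
}

// The ARM gap the FIXME describes, checked at compile time.
static_assert(!has_kernels(CpuIsa::ArmNeon, DType::Float16),
              "ARM sketch assumes no FP16 kernels");

// Choose the dtype actually used for inference: keep the request when it is
// supported; otherwise fall back from FP16 to BF16 (same 16-bit storage,
// FP32's exponent range, fewer mantissa bits) and emit a warning.
DType resolve_cpu_dtype(CpuIsa isa, DType requested) {
  if (has_kernels(isa, requested)) return requested;
  if (requested == DType::Float16 && has_kernels(isa, DType::BFloat16)) {
    std::fprintf(stderr,
                 "warning: float16 is not supported on this CPU ISA; "
                 "falling back to bfloat16\n");
    return DType::BFloat16;
  }
  return DType::Float32;  // last resort, always available
}
```

BF16 is the natural target because it keeps FP32's 8-bit exponent, so values that fit in FP16 cannot overflow after the cast; the trade-off is 8 significant bits instead of FP16's 11, so outputs may differ slightly from a true FP16 run.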


ShawnD200 · 2024-11-18T17:23:18Z

Pushed a few updates:

Inspired by @Isotr0py, I realized there is tremendous complexity arising from the combination of hardware support and model inference features. I therefore think we may need a generic layer that abstracts away the underlying hardware differences. To that end, here are CPU types interfaces that can be implemented by various ISAs and by feature sets within an ISA. I haven't sorted everything out yet, so your feedback is very welcome.
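One way to picture the "CPU types interfaces" idea is a shared compile-time (CRTP) base that every ISA-specific vector type derives from, plus kernels templated on the vector type so they are written once. The sketch below is hypothetical: `VecInterface`, `ScalarFP32Vec4`, and `dot` are illustrative names, not the interfaces in this PR, and only a scalar backend is shown. A NEON or AVX backend would expose the same members implemented with intrinsics.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical shared base: members common to every backend are defined
// once here in terms of the derived type's constants.
template <typename Derived>
struct VecInterface {
  static constexpr std::size_t size() { return Derived::kElems; }
};

// Hypothetical scalar backend implementing the members a kernel relies on.
struct ScalarFP32Vec4 : VecInterface<ScalarFP32Vec4> {
  static constexpr std::size_t kElems = 4;
  std::array<float, kElems> v{};

  static ScalarFP32Vec4 load(const float* p) {
    ScalarFP32Vec4 out;
    for (std::size_t i = 0; i < kElems; ++i) out.v[i] = p[i];
    return out;
  }

  ScalarFP32Vec4 operator+(const ScalarFP32Vec4& o) const {
    ScalarFP32Vec4 out;
    for (std::size_t i = 0; i < kElems; ++i) out.v[i] = v[i] + o.v[i];
    return out;
  }

  ScalarFP32Vec4 operator*(const ScalarFP32Vec4& o) const {
    ScalarFP32Vec4 out;
    for (std::size_t i = 0; i < kElems; ++i) out.v[i] = v[i] * o.v[i];
    return out;
  }

  float reduce_sum() const {
    float s = 0.0f;
    for (float x : v) s += x;
    return s;
  }
};

// ISA-agnostic kernel: dot product written only against the interface.
// Assumes n is a multiple of Vec::size().
template <typename Vec>
float dot(const float* a, const float* b, std::size_t n) {
  assert(n % Vec::size() == 0);
  Vec acc{};  // zero-initialized accumulator
  for (std::size_t i = 0; i < n; i += Vec::size())
    acc = acc + Vec::load(a + i) * Vec::load(b + i);
  return acc.reduce_sum();
}
```

Because the backend is a template parameter rather than a virtual base, the choice is resolved at compile time and adds no per-call dispatch cost, which matters for inner loops like this one.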

BTW, until @mgoin pointed me to #9228, which already implemented an ARM CPU backend in basically the same way, I had absolutely no idea that PR existed. I started this work with a very simple goal: to run vLLM on my MacBook. Kudos to @sanketkaleoss; I learned a lot from your work. Thanks.

ShawnD200 · 2024-11-18T17:27:06Z

@mgoin @bigPYJ1151 @tlrmchlsmth @sanketkaleoss @Isotr0py, your feedback would be very helpful. Thanks.


[Hardward][CPU] Add ARM CPU backend

2789754

Signed-off-by: Shawn Du <shawnd200@outlook.com>

mergify bot added the documentation and ci/build labels Nov 2, 2024

DarkLight1337 mentioned this pull request Nov 3, 2024

[Installation]: build on arm64 meet a error #9964 (open)

[Doc] Fix references

0350a3d

Signed-off-by: Shawn Du <shawnd200@outlook.com>

mgoin changed the title ~~[Hardward][CPU] Add ARM CPU backend~~ [Hardware][CPU] Add ARM CPU backend Nov 12, 2024

mergify bot added the needs-rebase label Nov 12, 2024

DarkLight1337 requested review from WoosukKwon and Isotr0py November 12, 2024 15:51

Isotr0py reviewed Nov 12, 2024


Resolve merge conflicts

c45979f

mergify bot removed the needs-rebase label Nov 14, 2024

ShawnD200 added 2 commits November 19, 2024 00:47

1. Add fp16 support

a346355

2. Use compound vector types
3. Refactor cpu_types
4. Improve fma and storeFP32
5. Clean up

Signed-off-by: Shawn Du <shawnd200@outlook.com>

Merge remote-tracking branch 'origin/main' into Add-Arm-CPU-backend

4b2fdef

ShawnD200 requested a review from Isotr0py November 18, 2024 17:23

ShawnD200 force-pushed the Add-Arm-CPU-backend branch from 1209280 to 856d93d November 19, 2024 02:11

Break long lines and adjust imports

8d554d6

Signed-off-by: Shawn Du <shawnd200@outlook.com>

ShawnD200 force-pushed the Add-Arm-CPU-backend branch from 856d93d to 8d554d6 November 19, 2024 02:39
