[Hardware][CPU] Add ARM CPU backend #9957
Signed-off-by: Shawn Du <shawnd200@outlook.com>
👋 Hi! Thank you for contributing to the vLLM project. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can do one of these:
This pull request has merge conflicts that must be resolved before it can be |
Thanks for implementing arm64 support! The CPU kernel tests look pretty good! I just have some questions.
(I don't have an arm64 device. Let's see if others who have one can run the tests on native Linux.)
requirements-arm.txt
Outdated
torch == 2.4.0
torchvision # required for the image processor of phi3v, this must be updated alongside torch
Why can't we use torch 2.5.0 or higher here? PyTorch has an aarch64 distribution for 2.5.1: https://download.pytorch.org/whl/cpu/torch-2.5.1-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl#sha256=269b10c34430aa8e9643dbe035dc525c4a9b1d671cd3dbc8ecbcaed280ae322d
Honestly, I haven't checked the versions, so I don't know which ones are supported. This torch version is just copied from the x86 CPU backend. Moving to a newer version could be future work.
csrc/cpu/cpu_types_arm.hpp
Outdated
// FIXME: FP16 is not fully supported in Torch-CPU
#define VLLM_DISPATCH_CASE_FLOATING_TYPES(...) \
  AT_DISPATCH_CASE(at::ScalarType::Float, __VA_ARGS__) \
  AT_DISPATCH_CASE(at::ScalarType::BFloat16, __VA_ARGS__)
Since FP16 is not supported on ARM, we might need to add a fallback to BF16 for ARM CPUs when the user specifies dtype="float16" (just like the Intel CPU backend did before).
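The fallback suggested here could look roughly like the sketch below. It is only an illustration under stated assumptions: `resolve_cpu_dtype` and `ARM_CPU_SUPPORTED_DTYPES` are hypothetical names, not vLLM APIs, and dtypes are plain strings so the snippet runs without torch.

```python
import platform
from typing import Optional

# Hypothetical: dtypes the ARM CPU kernels dispatch on, mirroring the Float
# and BFloat16 cases of VLLM_DISPATCH_CASE_FLOATING_TYPES in this PR.
ARM_CPU_SUPPORTED_DTYPES = {"float32", "bfloat16"}


def resolve_cpu_dtype(requested: str, machine: Optional[str] = None) -> str:
    """Return the dtype to run with, falling back from float16 on ARM CPUs."""
    machine = (machine or platform.machine()).lower()
    is_arm = machine in {"arm64", "aarch64"}
    if not is_arm or requested in ARM_CPU_SUPPORTED_DTYPES:
        return requested
    if requested == "float16":
        # Assumption: bfloat16 is the substitute, as proposed in the review.
        return "bfloat16"
    raise ValueError(f"dtype {requested!r} is not supported on ARM CPU")


print(resolve_cpu_dtype("float16", machine="aarch64"))
```

Passing `machine` explicitly keeps the function testable on any host; leaving it out falls back to `platform.machine()`.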
Yes, good point. x86 already added the switch case for the Half type, so ARM needs to add it too. This brings up a general question: how to automatically serve all ISAs. As a start, those data types should be interfaces. I will add this to future work too. Thank you so much.
2. Use compound vector types
3. Refactor cpu_types
4. Improve fma and storeFP32
5. Clean up
Signed-off-by: Shawn Du <shawnd200@outlook.com>
Pushed a few updates. Inspired by @Isotr0py, I realized that the combination of hardware support and model inference features induces tremendous complexity. I therefore think a generic layer is needed to abstract away the underlying hardware differences. To that end, I added CPU type interfaces that can be implemented by various ISAs and by features within an ISA. I haven't sorted everything out yet, so your feedback is very welcome.
BTW, I had absolutely no idea about #9228, which already implemented an ARM CPU backend in basically the same way, until @mgoin pointed me to it. I started this work with a very simple goal: to run vLLM on my MacBook. Kudos to @sanketkaleoss, I learnt a lot from your work. Thanks.
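The idea of a generic layer over ISA-specific vector types can be illustrated with a small sketch. This is not the C++ code from the PR: it is a Python model of the design only, and `FP32VecOps`, `NeonFP32VecOps`, `Avx512FP32VecOps`, and `dot` are hypothetical names. The lane counts assume one 128-bit NEON register and one 512-bit AVX-512 register.

```python
from abc import ABC, abstractmethod
from typing import List


class FP32VecOps(ABC):
    """Hypothetical interface that every ISA backend implements."""

    lanes: int  # float32 elements processed per vector operation

    @abstractmethod
    def fma(self, acc: List[float], a: List[float], b: List[float]) -> List[float]:
        """Return acc + a * b, element-wise."""


class NeonFP32VecOps(FP32VecOps):
    lanes = 4  # a 128-bit NEON register holds four float32 values

    def fma(self, acc, a, b):
        return [x + y * z for x, y, z in zip(acc, a, b)]


class Avx512FP32VecOps(FP32VecOps):
    lanes = 16  # a 512-bit AVX-512 register holds sixteen float32 values

    def fma(self, acc, a, b):
        return [x + y * z for x, y, z in zip(acc, a, b)]


def dot(ops: FP32VecOps, x: List[float], y: List[float]) -> float:
    """Kernel written once against the interface, not against one ISA."""
    n = len(x) - len(x) % ops.lanes  # portion that fills whole vectors
    acc = [0.0] * ops.lanes
    for i in range(0, n, ops.lanes):
        acc = ops.fma(acc, x[i:i + ops.lanes], y[i:i + ops.lanes])
    total = sum(acc)
    for i in range(n, len(x)):  # scalar tail for the leftover elements
        total += x[i] * y[i]
    return total


x = [float(i) for i in range(1, 11)]
y = [1.0] * 10
print(dot(NeonFP32VecOps(), x, y), dot(Avx512FP32VecOps(), x, y))
```

The kernel depends only on `lanes` and `fma`, so supporting another ISA means adding a backend class rather than touching the kernel.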
@mgoin @bigPYJ1151 @tlrmchlsmth @sanketkaleoss @Isotr0py, your feedback would be very helpful. Thanks.
1209280 to 856d93d
856d93d to 8d554d6
Add ARM CPU backend support
(1) Building on the groundbreaking x86 CPU backend support, implemented the vector types on ARM NEON
(2) Added test cases to explicitly verify the implementation
Tested on macOS with an M2 Pro chip using a Docker image build; should work on generic ARM64/Linux.
Future work:
Tested with simple models; it just works. Have fun. Feedback is welcome.