[Hardware][CPU] Add ARM CPU backend #9957

Open
wants to merge 6 commits into
base: main

ShawnD200
Contributor

@ShawnD200 ShawnD200 commented Nov 2, 2024

Add ARM CPU backend support

(1) Building on the groundbreaking x86 CPU backend support, implemented the vector types with ARM NEON
(2) Added test cases that explicitly verify the implementation

# cd tests/cpu
# pip install .
# pytest . -v

Tested on macOS with an M2 Pro chip using a Docker image build; it should also work on generic ARM64 Linux.

Future work:

  • Build from source on bare-metal macOS
  • Improve performance with advanced SIMD, plus a lot of other work
  • Test on ARM/Linux and with more models

Tested with simple models; it just works. Have fun. Feedback is welcome.

Signed-off-by: Shawn Du <shawnd200@outlook.com>

github-actions bot commented Nov 2, 2024

👋 Hi! Thank you for contributing to the vLLM project.
Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which covers a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build on the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can do one of these:

  • Add ready label to the PR
  • Enable auto-merge.

🚀

@mergify mergify bot added documentation Improvements or additions to documentation ci/build labels Nov 2, 2024
Signed-off-by: Shawn Du <shawnd200@outlook.com>
@mgoin mgoin changed the title [Hardward][CPU] Add ARM CPU backend [Hardware][CPU] Add ARM CPU backend Nov 12, 2024

mergify bot commented Nov 12, 2024

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @ShawnD200.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

Collaborator

@Isotr0py Isotr0py left a comment

Thanks for implementing arm64 support! The cpu kernel tests look pretty good! Just have some questions.

(I don't have an arm64 device. Let's see if others with arm64 hardware can run the tests on native Linux.)

Comment on lines 4 to 5
torch == 2.4.0
torchvision # required for the image processor of phi3v, this must be updated alongside torch
Collaborator

Contributor Author

Honestly, I haven't checked the versions, so I don't know which ones are supported. This torch version is simply copied from the x86 CPU backend. Checking the others could be future work.

Comment on lines 15 to 19
// FIXME: FP16 is not fully supported in Torch-CPU
#define VLLM_DISPATCH_CASE_FLOATING_TYPES(...) \
AT_DISPATCH_CASE(at::ScalarType::Float, __VA_ARGS__) \
AT_DISPATCH_CASE(at::ScalarType::BFloat16, __VA_ARGS__)

Collaborator

@Isotr0py Isotr0py Nov 12, 2024

Since FP16 is not supported on ARM, we might need to add a fallback to BF16 on ARM CPUs when the user specifies dtype="float16" (just as was done for Intel CPUs before).

Contributor Author

Yes, good point. x86 already added the switch case for the Half type, and ARM needs to add it too. This raises a more general question: how do we serve all ISAs automatically? As a start, those data types should be interfaces. I will add this to future work too. Thank you so much.
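
For illustration, the FP16-to-BF16 fallback discussed in this thread could look roughly like the sketch below. This is only a minimal sketch under assumptions: `resolve_cpu_dtype` and `ARM_MACHINES` are hypothetical names, not vLLM's actual API, and dtypes are plain strings so that the snippet runs without torch.

```python
import platform
from typing import Optional

# Hypothetical set of machine names treated as ARM64 (an assumption for
# this sketch; not taken from the PR).
ARM_MACHINES = {"aarch64", "arm64"}


def resolve_cpu_dtype(requested: str, machine: Optional[str] = None) -> str:
    """Return the dtype the CPU backend would run with.

    Hypothetical helper: falls back from float16 to bfloat16 on ARM,
    and leaves every other request untouched.
    """
    machine = (machine or platform.machine()).lower()
    if requested == "float16" and machine in ARM_MACHINES:
        # The dispatch macro only covers Float and BFloat16 here,
        # so degrade the request instead of failing later.
        return "bfloat16"
    return requested
```

Calling `resolve_cpu_dtype("float16", "aarch64")` returns `"bfloat16"`, while the same request with `"x86_64"` is passed through unchanged, matching the point that x86 already handles the Half type.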

@mergify mergify bot removed the needs-rebase label Nov 14, 2024
2. Use compound vector types
3. Refactor cpu_types
4. Improve fma and storeFP32
5. Clean up

Signed-off-by: Shawn Du <shawnd200@outlook.com>
@ShawnD200
Contributor Author

Pushed a few updates:

Inspired by @Isotr0py, I realized that combining the different hardware capabilities with the different model inference features creates tremendous complexity. I therefore think a generic layer is needed to abstract away the underlying hardware differences. To that end, this update introduces CPU type interfaces that can be implemented by various ISAs and by features within an ISA. I haven't sorted everything out yet, so your feedback is very welcome.
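
To make the idea of a generic layer concrete, here is a language-neutral sketch written in Python rather than the C++ used by the actual kernels. All names (`FP32Vec`, `GenericFP32Vec`, `IMPLEMENTATIONS`, `vec_type_for`) are hypothetical and are not taken from this PR; the sketch only shows an interface, one portable implementation, and a lookup that falls back when an ISA has no dedicated implementation.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Type


class FP32Vec(ABC):
    """Hypothetical ISA-neutral interface for a vector of FP32 lanes."""

    def __init__(self, values: List[float]):
        self.values = list(values)

    @abstractmethod
    def fma(self, a: "FP32Vec", b: "FP32Vec") -> "FP32Vec":
        """Return self + a * b, lane by lane."""


class GenericFP32Vec(FP32Vec):
    """Portable fallback: plain per-lane arithmetic, no SIMD assumed."""

    def fma(self, a: FP32Vec, b: FP32Vec) -> FP32Vec:
        return GenericFP32Vec(
            [acc + x * y for acc, x, y in zip(self.values, a.values, b.values)]
        )


# Registry mapping an ISA name to its implementation. A real backend would
# register NEON or AVX-512 classes here; this sketch only has the fallback.
IMPLEMENTATIONS: Dict[str, Type[FP32Vec]] = {"generic": GenericFP32Vec}


def vec_type_for(isa: str) -> Type[FP32Vec]:
    """Pick the implementation for `isa`, defaulting to the portable one."""
    return IMPLEMENTATIONS.get(isa, GenericFP32Vec)
```

Kernels written against `FP32Vec` would not need to know which ISA is underneath; adding a new ISA would mean registering one more class rather than touching every kernel.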

BTW, I had absolutely no idea about #9228, which already implemented the ARM CPU backend in basically the same way, until @mgoin pointed it out to me. I started this work with a very simple goal: to run vLLM on my MacBook. Kudos to @sanketkaleoss, I learnt a lot from your work. Thanks.

@ShawnD200
Contributor Author

@mgoin @bigPYJ1151 @tlrmchlsmth @sanketkaleoss @Isotr0py, your feedback would be very helpful. Thanks.

Signed-off-by: Shawn Du <shawnd200@outlook.com>
Labels
ci/build documentation Improvements or additions to documentation