When oneDNN is enabled, an unoptimized matmul is called by Tensorflow on aarch64 #168
Comments
Hi @lilh9598, Yes, you're entirely correct: there is currently no ACL acceleration for `dnnl_sgemm`. The integration of matmul support via ACL was motivated by the ops used in a small number of use cases we had been exposed to, for example the BERT-large network from MLCommons. This led to the integration of ACL matmul support via the fused matmul path. One issue with using ACL for sgemm support, though, is that ACL is not a "BLAS" library, and does not provide all the functionality expected of one. In particular, there's no support for transpose at present, so supporting `dnnl_sgemm` fully is not straightforward. Could I ask what models you're interested in, where you think the unoptimized matmul is being hit?
Hi @nSircombe , Thanks for your detailed reply!
I think the same problem exists in the transformer block. In its structure, the node following a matmul (with constant weight) is not always a BiasAdd, so it cannot be converted into `_FusedMatMul` (the ACL-optimized path). Maybe the unoptimized matmul could be addressed through graph optimization in TensorFlow?
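The graph-rewrite idea can be illustrated with a toy pass. This is plain Python with no TensorFlow dependency, and the single `MatMul -> BiasAdd` fusion rule is a deliberate simplification of what TensorFlow's remapper actually does; the op lists below are hypothetical stand-ins for real graphs:

```python
def fuse_matmul_biasadd(ops):
    """Toy rewrite: fuse only the exact MatMul -> BiasAdd pattern,
    mirroring the limitation discussed above."""
    fused, i = [], 0
    while i < len(ops):
        if i + 1 < len(ops) and ops[i] == "MatMul" and ops[i + 1] == "BiasAdd":
            fused.append("_FusedMatMul")
            i += 2
        else:
            fused.append(ops[i])
            i += 1
    return fused

# BERT-style block where the bias is present: fusion fires.
print(fuse_matmul_biasadd(["MatMul", "BiasAdd", "Gelu"]))
# → ['_FusedMatMul', 'Gelu']

# Transformer variant where MatMul feeds something else: no fusion,
# so the plain (unaccelerated) matmul kernel is still called.
print(fuse_matmul_biasadd(["MatMul", "Add", "Softmax"]))
# → ['MatMul', 'Add', 'Softmax']
```

The second case is exactly the situation described above: because the pattern matcher only recognizes one successor op, any other consumer of the matmul output leaves the slow path in place.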
Hi @lilh9598, Actually, one of the team has been looking at transformer models, and managed to spend a little time looking at the 'missing' cases.
Hi @nSircombe , Looks really good! I am looking forward to applying your new experiments to my cases.
Hi @lilh9598, |
Hi @nSircombe , Is this patch enabled in the image "armswdev/tensorflow-arm-neoverse:latest"? I would like to write a Python program for further performance profiling.
Could you give more details about what you're running? oneDNN verbose logs would be handy; they might help us spot missed opportunities to call ACL, or highlight shapes where ACL performance is worse than expected. This patch is not yet in the Docker Hub images; they're updated from this repo once a month, and the next update is due next week.
Our application is very complex, so I need to extract the ML part and analyze it first. The bottleneck there may be bandwidth.
Ok @lilh9598 - would be interested to hear what you find. On the matmul front, though, just a redacted version of the stdout log generated with the oneDNN verbose environment variable (`ONEDNN_VERBOSE=1`) set would be fine. Look forward to hearing from you. ...I'll leave this issue open for now.
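Capturing the log requested above could look something like the sketch below; `ONEDNN_VERBOSE=1` is oneDNN's verbose switch, and `model.py` is a placeholder for your own workload:

```shell
# Enable oneDNN primitive-level logging (must be set before the process starts).
export ONEDNN_VERBOSE=1

# Run the workload, capturing stdout/stderr to a file for later redaction.
python model.py 2>&1 | tee onednn.log

# Pull out the matmul-related lines to see which kernels were dispatched.
grep matmul onednn.log
```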
The code that calls oneDNN's matmul in TensorFlow:
The `dnnl_sgemm` code in oneDNN:
From the above code, we can see that if CBLAS is not enabled, an unoptimized matmul is called by TensorFlow on aarch64, which causes performance degradation. So I think a fully optimized ACL matmul should be added to `dnnl_sgemm`, to make full use of aarch64's ISA and improve the performance of `mkl_matmul` (a TF op) on aarch64.
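To make the performance concern concrete, here is a small NumPy sketch. The naive triple loop stands in for a reference `sgemm` kernel (the kind of unoptimized path described above), while `A @ B` stands in for an optimized BLAS/ACL-backed call; both compute the same result, so the difference between the two paths is purely speed:

```python
import numpy as np

def reference_sgemm(A, B):
    """Naive triple-loop matmul, standing in for an unoptimized
    reference gemm path (no vectorization, no blocking)."""
    M, K = A.shape
    K2, N = B.shape
    assert K == K2, "inner dimensions must match"
    C = np.zeros((M, N), dtype=np.float32)
    for i in range(M):
        for j in range(N):
            for k in range(K):
                C[i, j] += A[i, k] * B[k, j]
    return C

rng = np.random.default_rng(0)
A = rng.standard_normal((32, 16), dtype=np.float32)
B = rng.standard_normal((16, 8), dtype=np.float32)

# Both paths agree numerically; an optimized backend only changes speed.
assert np.allclose(reference_sgemm(A, B), A @ B, atol=1e-4)
```

Timing the two functions on larger shapes (e.g. 1024x1024) shows the gap that an ACL-backed `dnnl_sgemm` would close on aarch64.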