The Graphcore® C600 IPU-Processor PCIe Card is a high-performance acceleration server card targeted for machine learning inference and training. Powered by the Graphcore Mk2 IPU Processor with FP8 support, the C600 is a dual-slot, full height PCI Express Gen4 card designed for mounting in industry standard server chassis to accelerate machine intelligence workloads.
Up to eight C600 IPU-Processor PCIe Cards can be networked together using IPU-Link™ high-bandwidth interconnect cables, delivering enhanced IPU compute capability.
Name | Description |
---|---|
IPU Processor | Graphcore Mk2 IPU Processor with FP8 support |
IPU-Cores™ | 1,472 IPU-Cores, each one a high-performance processor capable of multi-thread, independent code execution |
In-Processor Memory™ | Each IPU-Core is paired with fast, local, tightly-coupled In-Processor Memory. The C600 accelerator includes 900MB of In-Processor Memory |
Compute | Up to 560 teraFLOPS of FP8 compute<br>Up to 280 teraFLOPS of FP16 compute<br>Up to 70 teraFLOPS of FP32 compute |
System Interface | Dual PCIe Gen4 8-lane interfaces |
Thermal Solution | Passive |
Form Factor | PCIe full-height/length; double-slot |
System Dimensions | Length: 267mm (10.50”); Height: 111mm (4.37”); Width: 27.6mm (1.09”); Mass: 1.27kg (2.8lbs) |
IPU-Link™ | 32 lanes, 128 GB/s total bandwidth (64 GB/s in each direction) |
TDP | 185W |
Auxiliary Power Supply | 8-pin |
Quality Level | Server grade |
For more information about the Graphcore® C600, refer to C600 cards.
PopRT is a high-performance inference framework designed specifically for Graphcore IPUs. It deeply optimizes trained models, generates executable programs that run on Graphcore IPUs, and performs low-latency, high-throughput inference.
You can get PopRT and related documents from graphcore/PopRT.
Docker images are provided at graphcorecn/poprt.
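As a rough illustration of the PopRT workflow, the sketch below converts an ONNX model to FP16 with the `poprt` CLI. The flag names (`--input_model`, `--output_model`, `--precision`) are assumptions based on typical PopRT usage; verify them against the graphcore/PopRT documentation before relying on them.

```shell
# Hedged sketch of a PopRT conversion step.
# NOTE: the flag names below are assumptions; confirm them against
# the graphcore/PopRT documentation for your PopRT version.
MODEL=model.onnx
if command -v poprt >/dev/null 2>&1; then
  poprt \
    --input_model "$MODEL" \
    --output_model model_optimized.onnx \
    --precision fp16
else
  echo "poprt not found - install PopRT (graphcore/PopRT) or use the graphcorecn/poprt Docker image"
fi
```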
Model name | Precision | QPS | Dataset | Metric name | Metric value | report |
---|---|---|---|---|---|---|
albert-torch-fp32 | FP16 | 3,280 | Open Squad 1.1 | F1 Score | 87.69675 | report |
bert-torch-fp32 | FP8 | 4,464 | Open Squad 1.1 | F1 Score | 85.71465 | report |
bert-torch-fp32 | FP16 | 3,134 | Open Squad 1.1 | F1 Score | 85.85797 | report |
clip-onnx-fp32 | FP16 | 7,305 | Fake Dataset | Mean Diff | 0.00426 | report |
conformer-encoder-onnx-fp32 | FP16 | 9,341 | Fake Dataset | Mean Diff | 0.00161 | report |
deberta-torch-fp32 | FP16 | 1,702 | Open Squad 1.1 | F1 Score | 81.24629 | report |
resnet50-torch-fp32 | FP8 | 18,851 | Open Imagenet | Top-1 | 0.76824 | report |
resnet50-torch-fp32 | FP16 | 13,499 | Open Imagenet | Top-1 | 0.76963 | report |
roberta-torch-fp32 | FP16 | 3,088 | Open Squad 1.1 | F1 Score | 83.1606 | report |
roformer-tf-fp32 | FP16 | 2,520 | OPEN_CAIL2019 | Top-1 | 0.64323 | report |
swin-large-torch-fp32 | FP8 | 480 | Open Imagenet | Top-1 | 0.8552 | report |
swin-large-torch-fp32 | FP16 | 315 | Open Imagenet | Top-1 | 0.8536 | report |
videobert-onnx-fp32 | FP16 | 3,691 | OPEN_CIFAR | Top-1 | 0.6169 | report |
widedeep-tf-fp32 | FP16 | 31,446,195 | Open Criteo Kaggle | Top-1 | 0.77392 | report |
```shell
wget -O 'poplar_sdk-ubuntu_20_04-3.3.0-208993bbb7.tar.gz' 'https://downloads.graphcore.ai/direct?package=poplar-poplar_sdk_ubuntu_20_04_3.3.0_208993bbb7-3.3.0&file=poplar_sdk-ubuntu_20_04-3.3.0-208993bbb7.tar.gz'
tar xzf poplar_sdk-ubuntu_20_04-3.3.0-208993bbb7.tar.gz
# note: the archive extracts to a directory that includes the full build suffix (+1403)
source poplar_sdk-ubuntu_20_04-3.3.0+1403-208993bbb7/enable
```
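After sourcing the enable script, the SDK exports environment variables into the current shell; a quick sanity check is sketched below (it assumes the `POPLAR_SDK_ENABLED` variable, which recent SDK releases set — confirm for your version).

```shell
# Verify that the Poplar SDK enable script has been sourced in this shell.
# POPLAR_SDK_ENABLED is assumed to be set by the enable script; confirm
# the variable name for your SDK release.
if [ -n "${POPLAR_SDK_ENABLED:-}" ]; then
  echo "Poplar SDK enabled at: ${POPLAR_SDK_ENABLED}"
else
  echo "Poplar SDK not enabled - source the enable script first"
fi
```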
```shell
docker pull graphcorecn/poprt:1.4.0
```
```shell
gc-docker -- -it \
           -v `pwd -P`:/workspace \
           -w /workspace \
           --entrypoint /bin/bash \
           graphcorecn/poprt:1.4.0
```
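`gc-docker` is Graphcore's wrapper around `docker run` that maps the host's IPU devices into the container. Once inside, you can confirm the IPUs are visible with `gc-info`, which ships with the Poplar SDK (the `--list-devices` flag is the usual way to enumerate devices; check `gc-info --help` on your install).

```shell
# Inside the container: list the IPU devices gc-docker has passed through.
if command -v gc-info >/dev/null 2>&1; then
  gc-info --list-devices
else
  echo "gc-info not found - run this inside the PopRT container (or after enabling the SDK)"
fi
```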
```shell
# inside the container, install dependencies needed by the benchmarks
apt-get update && \
apt-get install wget libglib2.0-0 -y
```
For example, to run the widedeep task:
```shell
python3 launch.py --task widedeep-tf-fp32 --hardware IPU
```
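To sweep several of the tasks from the results table, you can loop over task names. A minimal sketch, meant to be run from the ByteMLPerf repository root where `launch.py` lives:

```shell
# Run a batch of ByteMLPerf tasks on the IPU backend.
# Task names come from the results table above.
for task in widedeep-tf-fp32 resnet50-torch-fp32 bert-torch-fp32; do
  if [ -f launch.py ]; then
    python3 launch.py --task "$task" --hardware IPU
  else
    echo "would run: python3 launch.py --task $task --hardware IPU"
  fi
done
```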
For more information about running tasks, refer to ByteMLPerf.