BKM: Use CSRNet to Count People in Crowds
A common requirement is to know how many people are in an area, indoors or outdoors, and then to understand how they are distributed in order to get more accurate and comprehensive information. This can be critical for making correct decisions in high-risk situations such as stampedes and riots. Among the many DNN architectures, CSRNet is one that delivers state-of-the-art results on crowd-counting tasks.
CSRNet uses VGG-16 as the front-end, whose output is 1/8 the size of the original input. Dilated convolution layers then serve as the back-end, extracting deeper features and generating a heatmap with the same size as the front-end output. In the final stage, bilinear interpolation with a factor of 8 scales the output to the same resolution as the input.
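The size bookkeeping described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the network itself; the helper name csrnet_shapes is hypothetical, and the sketch assumes the 1/8 reduction behaves like integer division.

```python
def csrnet_shapes(height, width, downscale=8):
    """Return (front_end_hw, heatmap_hw, upsampled_hw) for a given input size."""
    # The front-end reduces each spatial dimension to 1/8 (assumed floor division).
    front_end = (height // downscale, width // downscale)
    # The dilated back-end keeps the front-end resolution.
    heatmap = front_end
    # Bilinear interpolation with a factor of 8 scales the heatmap back up.
    upsampled = (heatmap[0] * downscale, heatmap[1] * downscale)
    return front_end, heatmap, upsampled


print(csrnet_shapes(768, 1024))
# ((96, 128), (96, 128), (768, 1024))
```

Under this assumption, an input whose height or width is not a multiple of 8 would come back slightly smaller than the original, which is one reason to pick input sizes divisible by 8.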
The accompanying image shows CSRNet applied to three images from the ShanghaiTech dataset. The first row contains the original images, the second row contains the ground truth from the dataset, and the third row is the CSRNet output heatmap. Darker pixels indicate denser crowds.
To count the crowd, sum the values of the output matrix of the model.
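A minimal sketch of this counting step, assuming the model output has already been read back as a 2-D matrix (here a plain list of rows). The helper name count_people and the toy values are made up for illustration.

```python
def count_people(density_map):
    """Sum every value of a 2-D density map to estimate the number of people."""
    return sum(sum(row) for row in density_map)


# Toy 3x3 density map with illustrative values (not real model output).
toy_map = [
    [0.0, 0.25, 0.25],
    [0.0, 0.25, 0.25],
    [0.5, 0.5, 0.0],
]

estimated = count_people(toy_map)
print(round(estimated))  # 2
```

The sum is a floating-point estimate, so it is usually rounded before being reported as a head count.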
The author shares the PyTorch training code at https://github.com/leeyeehoo/CSRNet-pytorch but no pre-trained model weights are provided there. Another implementation, in Keras, is available at https://github.com/Neerajj9/CSRNet-keras and we use its weight files in our project. It is also trained on the ShanghaiTech dataset.
The Intel OpenVINO Model Optimizer (MO) tool can't convert the Keras model directly, so we first convert it to the TensorFlow format with the h5_to_pb.py Python script. Make sure the path to the input Keras model is correct. Then use the MO tool to convert the TensorFlow model (.pb) to IR files as follows:
cd ${OPENVINO_PATH}/deployment_tools/model_optimizer
python3 mo.py --framework tf --input_model <path_to_pb> --input_shape [1,768,1024,3] --output_dir <output_path> --mean_values [123.675,116.28,103.53] --scale_values [58.395,57.12,57.375]
The 2nd and 3rd numbers in --input_shape are the height and width of the input blob. They can be changed to balance performance against accuracy: a higher resolution gives better accuracy but lowers the throughput of the inference pipeline.
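When trying several resolutions, it can help to generate the conversion command instead of editing it by hand. The sketch below assembles the same mo.py arguments shown above for a chosen height and width; build_mo_command is a hypothetical helper, and the paths passed to it are placeholders.

```python
import shlex


def build_mo_command(pb_path, output_dir, height=768, width=1024):
    """Build the mo.py argument list for a given input height and width."""
    return [
        "python3", "mo.py",
        "--framework", "tf",
        "--input_model", pb_path,
        # Layout is [batch, height, width, channels], as in the command above.
        "--input_shape", f"[1,{height},{width},3]",
        "--output_dir", output_dir,
        # Mean and scale values are fixed by training and are left unchanged.
        "--mean_values", "[123.675,116.28,103.53]",
        "--scale_values", "[58.395,57.12,57.375]",
    ]


# Example: half the default resolution in each dimension (placeholder paths).
command = build_mo_command("model.pb", "ir_384x512", height=384, width=512)
print(shlex.join(command))
```

The resulting list can be passed to subprocess.run from the model_optimizer directory; only the two spatial dimensions vary between runs.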
The --mean_values and --scale_values settings are determined by the training process. Because we did not train the model ourselves, these settings should not be changed.
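To make the meaning of these two settings concrete: the Model Optimizer embeds a per-channel normalization into the IR that subtracts the mean and then divides by the scale. The sketch below reproduces that arithmetic in plain Python. The helper name normalize_pixel is hypothetical, and the channel order is assumed to match the order of the values on the command line.

```python
MEAN_VALUES = (123.675, 116.28, 103.53)
SCALE_VALUES = (58.395, 57.12, 57.375)


def normalize_pixel(pixel):
    """Apply (value - mean) / scale to each channel of one pixel."""
    return tuple(
        (value - mean) / scale
        for value, mean, scale in zip(pixel, MEAN_VALUES, SCALE_VALUES)
    )


# A pixel equal to the per-channel mean maps to zero in every channel.
print(normalize_pixel(MEAN_VALUES))  # (0.0, 0.0, 0.0)
```

Because the normalization is baked into the converted model, the application should feed it raw 0-255 pixel values rather than normalizing them a second time.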
You can then find the converted IR model in <output_path>.
The calibration tool in Intel OpenVINO can convert FP32/FP16 models to INT8 to achieve higher performance. The calibration process has two modes. We choose the simple mode to get a better performance improvement. See also the calibration reference.
cd ${OPENVINO_PATH}/deployment_tools
python3 ./tools/calibrate.py -sm -m <path_to_FP32_IR> -s <path_test_images> -p INT8 -td CPU -e ./inference_engine/lib/intel64/libcpu_extension_avx512.so -o <output_path>
Powered by Open Visual Cloud media and analytics software stacks.