- [STDNet] Spatiotemporal Dilated Convolution with Uncertain Matching for Video-based Crowd Estimation (TMM) [paper]
- [EPF] Estimating People Flows to Better Count them in Crowded Scenes (ECCV) [paper]
- [MLSTN] Multi-Level Feature Fusion Based Locality-Constrained Spatial Transformer Network for Video Crowd Counting (Neurocomputing) [paper] (extension of LSTN)
- [LSTN] Locality-Constrained Spatial Transformer Network for Video Crowd Counting (ICME(oral)) [paper]
- Fast Video Crowd Counting with a Temporal Aware Network [paper]
- [ConvLSTM] Spatiotemporal Modeling for Crowd Counting in Videos (ICCV) [paper]