King's College London

Research portal

Point in, Box out: Beyond Counting Persons in Crowds

Research output: Contribution to journal › Conference paper › peer-review

Yuting Liu, Miaojing Shi, Qijun Zhao, Xiaofang Wang

Original language: English
Pages (from-to): 6462-6471
Journal: IEEE Computer Society Conference on Computer Vision and Pattern Recognition
Early online date: 9 Jan 2019
E-pub ahead of print: 9 Jan 2019


  • Point in, Box out_LIU_Epub1Jan2020_GREEN AAM

    liu2019point.pdf, 2.32 MB, application/pdf

    Uploaded date: 11 May 2020

    Version: Accepted author manuscript

    © 2019 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.


Modern crowd counting methods usually employ deep neural networks (DNNs) to estimate crowd counts via density regression. Despite their significant improvements, regression-based methods cannot detect individuals in crowds. Detection-based methods, on the other hand, have been little explored in recent crowd counting work because they require expensive bounding box annotations. In this work, we instead propose a new deep detection network that requires only point supervision. It simultaneously detects the size and location of human heads and counts them in crowds. We first mine useful person size information from point-level annotations and use it to initialize pseudo ground truth bounding boxes. An online updating scheme refines the pseudo ground truth during training, while a locally-constrained regression loss provides additional constraints on the size of the predicted boxes in a local neighborhood. Finally, we propose a curriculum learning strategy that trains the network first on images with relatively accurate and easy pseudo ground truth. Extensive experiments on both detection and counting tasks across several standard benchmarks, e.g. ShanghaiTech, UCF_CC_50, WiderFace, and TRANCOS, show the superiority of our method over the state of the art.
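The abstract does not say how size information is mined from the point annotations. As a rough illustration only, the sketch below assumes one plausible heuristic: each head gets a square pseudo box whose side equals the distance to its nearest annotated neighbour. The function name `init_pseudo_boxes` and the `fallback_size` parameter are hypothetical and are not taken from the paper.

```python
import math


def init_pseudo_boxes(points, fallback_size=32.0):
    """Return one square box (x1, y1, x2, y2) per annotated head point.

    Assumption (not stated in the abstract): the side of each box is the
    distance from the point to its nearest annotated neighbour.
    `fallback_size` is a hypothetical default for an image with one point.
    """
    boxes = []
    for i, (x, y) in enumerate(points):
        # Distances from this point to every other annotated point.
        dists = [
            math.hypot(x - px, y - py)
            for j, (px, py) in enumerate(points)
            if j != i
        ]
        # A lone point has no neighbour, so use the fallback side length.
        side = min(dists) if dists else fallback_size
        half = side / 2.0
        boxes.append((x - half, y - half, x + half, y + half))
    return boxes


# Three annotated heads; the two close together get smaller boxes.
heads = [(0.0, 0.0), (10.0, 0.0), (10.0, 4.0)]
pseudo_boxes = init_pseudo_boxes(heads)
crowd_count = len(pseudo_boxes)  # one box per annotated head
```

Under this assumption, dense regions yield small boxes and sparse regions yield large ones, which is why such initial boxes would be only approximate and would benefit from the refinement during training that the abstract describes.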

