King's College London

Training object detectors from few weakly-labeled and many unlabeled images

Research output: Contribution to journal › Article › peer-review

Zhaohui Yang, Miaojing Shi, Chao Xu, Vittorio Ferrari, Yannis Avrithis

Original language: English
Article number: 108164
Published: Dec 2021

Bibliographical note

Funding Information: This work was partially supported by the National Natural Science Foundation of China (NSFC) under Grant Nos. 61828602 and 61876007.

Vittorio Ferrari is a Senior Staff Research Scientist at Google, where he leads a research group on visual learning. He received his PhD from ETH Zurich in 2004, then was a post-doc at INRIA Grenoble (2006–2007) and at the University of Oxford (2007–2008). Between 2008 and 2012 he was an Assistant Professor at ETH Zurich, funded by a Swiss National Science Foundation Professorship grant. From 2012 to 2018 he was on the faculty of the University of Edinburgh, where he became a Full Professor in 2016 (now an Honorary Professor). In 2012 he received the prestigious ERC Starting Grant and the best paper award from the European Conference on Computer Vision. He is the author of over 120 technical publications. He regularly serves as an Area Chair for the major computer vision conferences; he was a Program Chair for ECCV 2018 and a General Chair for ECCV 2020. He is an Associate Editor of the International Journal of Computer Vision, and formerly of IEEE Transactions on Pattern Analysis and Machine Intelligence. His current research interests are in learning visual models with minimal human supervision, human-machine collaboration, and 3D deep learning.

Publisher Copyright: © 2021 Elsevier Ltd
Copyright: Copyright 2021 Elsevier B.V., All rights reserved.

Weakly-supervised object detection attempts to limit the amount of supervision by dispensing with the need for bounding boxes, but still assumes image-level labels on the entire training set. In this work, we study the problem of training an object detector from one or a few images with image-level labels and a larger set of completely unlabeled images. This is an extreme case of semi-supervised learning, where the labeled data are not enough to bootstrap the learning of a detector. Our solution is to train a weakly-supervised student detector model from image-level pseudo-labels generated on the unlabeled set by a teacher classifier model, which is itself bootstrapped by region-level similarities to the labeled images. Building upon the recent representative weakly-supervised pipeline PCL [1], our method can use more unlabeled images to achieve performance competitive with or superior to many recent weakly-supervised detection solutions. Code will be made available at
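The pseudo-labeling idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the function names (`cosine`, `pseudo_label`), the use of cosine similarity, the max-over-regions rule, and the `threshold` value are all assumptions made for illustration. It only shows how region-level similarity to a few labeled images might yield image-level pseudo-labels for an unlabeled image.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def pseudo_label(region_feats, labeled_regions, threshold=0.8):
    """Assign image-level pseudo-labels to one unlabeled image.

    region_feats:    list of feature vectors, one per region of the
                     unlabeled image (hypothetical input format).
    labeled_regions: dict mapping class name -> list of region feature
                     vectors taken from the few labeled images.
    threshold:       assumed cut-off on the best region-to-region
                     similarity; not a value from the paper.
    """
    labels = set()
    for cls, exemplars in labeled_regions.items():
        # Best similarity between any region of the unlabeled image
        # and any exemplar region of this class.
        best = max(
            (cosine(r, e) for r in region_feats for e in exemplars),
            default=0.0,
        )
        if best >= threshold:
            labels.add(cls)
    return labels


# Toy usage with 2-D features (illustrative values only).
labeled = {"cat": [[1.0, 0.0]], "dog": [[0.0, 1.0]]}
unlabeled_image = [[0.9, 0.1], [0.5, 0.5]]
pseudo = pseudo_label(unlabeled_image, labeled)
```

In the setting the abstract describes, labels of this kind would serve as a starting signal for a teacher classifier, whose image-level predictions on the unlabeled set then supervise a weakly-supervised student detector; the sketch covers only the similarity step.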

