Long-tailed Object Recognition - Learning from Imbalanced Image Data

Student thesis: Doctoral Thesis › Doctor of Philosophy

Abstract

Recent advancements in computer vision enable machines to perform tasks such as image classification, object detection, and instance segmentation with high performance. However, the performance of deployed models is strongly affected by the availability and quality of the training data. Vast, curated image datasets have facilitated the creation of robust models, but such datasets are difficult to construct and require substantial cost and effort. Moreover, real-world data is not only difficult to acquire but also imbalanced: as a consequence of Zipf's law, the class distribution becomes long-tailed. In other words, real-world datasets follow a distribution in which a few frequent classes contain many samples, while many rare classes contain only a limited number of samples. When models are trained on long-tailed datasets, their performance is adequate for the frequent classes but very low for the rare classes, owing to the limited sample variance and to overfitting. As a consequence, models trained with imbalanced data work well only for the frequent classes and cannot generalize to the rare classes, making them unreliable and unusable in realistic applications.

This dissertation tackles this problem by developing novel algorithms for long-tailed image classification, object detection, and instance segmentation. The main challenge is how to learn from classes that have only a handful of samples. In the past, this was tackled by resampling the rare classes, i.e., repeatedly reusing the samples that contain rare classes, or by reweighting the rare classes, i.e., assigning a higher misclassification cost to the samples that contain them. Even though these techniques are effective, they cannot fully solve the long-tailed recognition problem, because class imbalance is not static; it changes drastically with sampling.
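For concreteness, the following minimal PyTorch sketch shows the classic reweighting baseline referred to above; it is a background technique rather than a contribution of this dissertation, and the class counts are illustrative.

    import torch
    import torch.nn.functional as F

    # Illustrative class counts for a long-tailed dataset.
    class_counts = torch.tensor([5000., 1200., 300., 40., 8.])

    # Inverse-frequency weights, normalized so the average weight is 1:
    # samples of rare classes incur a higher misclassification cost.
    weights = class_counts.sum() / (len(class_counts) * class_counts)

    logits = torch.randn(16, 5)               # model outputs for a batch
    targets = torch.randint(0, 5, (16,))      # ground-truth labels

    loss = F.cross_entropy(logits, targets, weight=weights)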

For example, in long-tailed object detection, an image can contain an arbitrary number of foreground classes. When resampling is applied at the image level, it does not guarantee an equal sampling rate for all classes, because detection images are scene-centric and images containing rare classes often co-occur with frequent classes. When resampling is applied at the object level, equal samples for all classes are possible; however, object-level sampling destroys the context of the image and may reduce performance. When reweighting is used, the rare classes may be under-weighted when the image contains a highly imbalanced set of objects, or over-weighted when the image contains a fairly balanced set. For these reasons, these techniques cannot fully reduce the class imbalance.

In contrast, this dissertation takes a different approach to the long-tailed image recognition problem and investigates the activation function, a crucial component of deep learning models. The activation function embeds into the model's architecture a prior belief about how the data is distributed; it therefore allows the model to learn more efficiently and converge faster.

The main idea is, first, to understand the impact of the activation function used in long-tailed classification and, second, to enhance it and make it robust to the long-tailed data distribution. This is realized using three novel algorithms: Inverse Image Frequency (IIF), Gumbel Optimised Loss (GOL), and Gumbel Channel Attention (GCA).

First, IIF is a novel multiplicative margin adjustment method that enforces dataset-dependent weights in the Softmax activation. These weights are inversely proportional to the class frequencies of the dataset; when used in imbalanced learning, they place greater emphasis on the rare classes and increase rare-class performance. What distinguishes IIF from previous margin adjustment methods is that it uses a multiplicative transformation rather than an additive one, which suits object detection, a highly imbalanced task, and decreases false positive predictions. IIF achieves excellent performance in long-tailed classification, instance segmentation, and object detection, and surpasses the state of the art on various benchmarks using a plethora of architectures and backbones.

Even though IIF is effective for both imbalanced classification and detection, its multiplicative margins are static, which limits its performance in highly imbalanced tasks such as imbalanced instance segmentation and object detection. For this reason, in the second technical chapter, we propose GOL and the novel Gumbel activation function as an alternative to the conventional Sigmoid and Softmax activations, one that can dynamically tackle the class imbalance. First, using the Kolmogorov-Smirnov test, we show that the class logit distributions in long-tailed object detection align better with the Gumbel distribution than with the Logistic distribution. Based on this, we develop the Gumbel activation, the cumulative distribution function of the Gumbel distribution. Gumbel is an asymmetric activation function applied in the classification layer of the neural network to predict the classes. It aligns with the long-tailed object distribution and enhances long-tail learning by producing exponential gradients for the rare classes, allowing the model to learn them efficiently. It can also be easily combined with previous long-tailed instance segmentation models, various backbones, and architectures, and it surpasses the state of the art on long-tailed detection and classification benchmarks.
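For concreteness, the following minimal PyTorch sketch illustrates both mechanisms. The inverse-frequency form shown here is one plausible instantiation and the counts are illustrative; the exact formulations are given in the corresponding chapters.

    import torch

    def gumbel_activation(z):
        # CDF of the standard Gumbel distribution, exp(-exp(-z)),
        # applied in the classification layer instead of Sigmoid.
        return torch.exp(-torch.exp(-z))

    def iif_weights(image_counts):
        # One plausible inverse image frequency form (illustrative):
        # w_c = log(N / n_c), with n_c the number of images with class c.
        return torch.log(image_counts.sum() / image_counts)

    image_counts = torch.tensor([9000., 700., 50.])   # illustrative counts
    logits = torch.randn(4, 3)                        # a batch of class logits

    iif_logits = logits * iif_weights(image_counts)   # multiplicative margins (IIF)
    probs = gumbel_activation(logits)                 # asymmetric activation (GOL)

Note that for a positive label trained with the loss -log(p), where p = exp(-exp(-z)), the loss reduces to exp(-z); the gradient is therefore exponential in the logit, which is the source of the strong learning signal for low-scoring rare classes.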

IIF and GOL tackle the imbalance only in the classification layer. However, as we show in Chapter 5, the imbalance affects not only the classification layer but also intermediate layers such as the channel attention layer. In more detail, we show that channel attention models trained with balanced data produce attention signals with higher variance than attention models trained with imbalanced data. We also show that attention models trained with imbalanced data produce low-entropy reweighting signals that resemble all-pass filters and do not affect the rare-class features.

These observations suggest that channel attention is not robust to imbalanced training, and to tackle this we propose the Gumbel Channel Attention (GCA) module. GCA extends the use of the Gumbel activation to channel attention networks. In practice, GCA promotes the low channel responses and, as we show quantitatively, retrieves better rare-class descriptors than Sigmoid channel attention. As a result, it significantly enhances the rare-class accuracy of many classification models on various long-tailed classification benchmarks, as shown in our experiments.
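For concreteness, the following sketch applies a Gumbel gate inside a squeeze-and-excitation-style channel attention block; the SE structure is assumed here for illustration, and the exact GCA design is detailed in the corresponding chapter.

    import torch
    import torch.nn as nn

    class GumbelChannelAttention(nn.Module):
        # SE-style attention with the Sigmoid gate replaced by the
        # Gumbel activation (an illustrative sketch, not the exact GCA).
        def __init__(self, channels, reduction=16):
            super().__init__()
            self.pool = nn.AdaptiveAvgPool2d(1)
            self.fc = nn.Sequential(
                nn.Linear(channels, channels // reduction),
                nn.ReLU(inplace=True),
                nn.Linear(channels // reduction, channels),
            )

        def forward(self, x):
            b, c, _, _ = x.shape
            s = self.pool(x).view(b, c)        # squeeze: global context per channel
            z = self.fc(s)                     # excite: channel interactions
            a = torch.exp(-torch.exp(-z))      # Gumbel gate instead of Sigmoid
            return x * a.view(b, c, 1, 1)      # reweight the channel responses

The asymmetric gate reshapes the reweighting signal relative to the symmetric Sigmoid; the thesis quantifies its effect on the rare-class channel descriptors.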

After introducing IIF, GOL, and GCA, we combine these methods, first, to understand how they generalize and, second, to further boost the state of the art. Our combinatorial experiments show promising results and further advance performance in both long-tailed classification and long-tailed instance segmentation.

In each technical chapter, the problem is introduced, followed by a short comparison with previous works, the methodology, the experimental results, the analysis, and the conclusion. Finally, the dissertation summarizes all the work and suggests possible new directions and potential applications.
Date of Award: 1 Jun 2024
Original language: English
Awarding Institution:
  • King's College London
Supervisors: Shan Luo & Anh Nguyen
