Abstract
Object detection is a fundamental computer vision task that focuses on identifying
and locating objects in an image. This process involves predicting both the classes
of the objects and their bounding box coordinates. Because it is a multi-task problem
that combines classification and localization, optimizing the performance of an object
detector remains non-trivial. Ranking-based loss functions, such
as Average Precision Loss, Average Localization-Recall-Precision Loss and Rank &
Sort Loss, outperform commonly used score-based losses in object detection. These
loss functions better align with the evaluation criteria, have fewer hyperparameters,
and offer robustness against the imbalance between positive and negative classes.
However, they require pairwise comparisons among P positive and N negative
predictions, introducing a time complexity of O(PN), which is prohibitive since N is
often large (e.g., 10^8 in the ATSS object detector). Despite their advantages, the
widespread adoption of ranking-based losses has been hindered by their high time
and space complexities.
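To make the O(PN) cost concrete, the following Python sketch computes, for a single class, the average precision error 1 - AP that AP Loss-style objectives aim to minimize, by explicitly comparing every positive with every negative. It is a simplified illustration under stated assumptions, not the implementation used in this thesis: it uses a hard step function (a negative counts as ranked above a positive only if its score is strictly higher), whereas practical ranking-based losses smooth this step to obtain gradients, and the function name `ap_error_pairwise` is hypothetical.

```python
from typing import List


def ap_error_pairwise(pos: List[float], neg: List[float]) -> float:
    """1 - AP for one class, computed with explicit pairwise comparisons.

    Simplifying assumption: a hard step function (strictly higher score ranks
    higher). Practical ranking-based losses smooth this step for gradients.
    """
    errors = []
    for p in pos:
        # Rank among negatives: P x N comparisons in total, hence O(PN).
        rank_neg = sum(1 for n in neg if n > p)
        # Rank among positives, counting p itself: P x P comparisons in total.
        rank_pos = sum(1 for q in pos if q >= p)
        # Precision error at this positive: the fraction of predictions ranked
        # at or above it that are negatives.
        errors.append(rank_neg / (rank_neg + rank_pos))
    # Averaging the per-positive precision errors gives 1 - AP.
    return sum(errors) / len(errors)
```

For example, with positives scored 0.9 and 0.7 and negatives scored 0.8 and 0.1, AP = (1 + 2/3)/2 = 5/6, so the function returns 1/6. The inner loop over `neg` is what becomes prohibitive when N is in the order of 10^8.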
In this thesis, we focus on improving the efficiency of ranking-based loss functions.
To this end, we propose Bucketed Ranking-based Losses, which group negative
predictions into B buckets (B ≪ N) to reduce the number of pairwise comparisons.
Thanks to bucketing, our method reduces the time complexity from O(PN) to
O(max(N log(N), P^2)). To validate our method and
demonstrate its generality, we conducted experiments on two different tasks, three
different datasets, and seven different detectors. We show that Bucketed Ranking-based
(BR) Losses yield the same accuracy as their unbucketed versions and provide 2x
faster training on average. Owing to the efficiency of BR Losses, we also train
transformer-based object detectors with ranking-based losses for the first time. When
we train CoDETR, a state-of-the-art transformer-based object detector, with our BR
Loss, it consistently outperforms the original CoDETR results across several backbones.
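The effect of bucketing on the number of comparisons can be sketched in Python. This sketch rests on a simplifying assumption that may differ from the thesis's exact scheme: a bucket is taken to be a maximal run of consecutive negatives in the score-sorted list, so there are at most P + 1 buckets and each positive lies entirely above or entirely below every bucket. It counts, for each positive, the negatives scored strictly higher (what the naive approach obtains with O(PN) comparisons), using one sort followed by P x B comparisons between positives and buckets, i.e. O(max(N log(N), P^2)) when P ≪ N. Gradient computation is omitted, and the function names are hypothetical.

```python
from typing import List, Tuple


def naive_negatives_above(pos: List[float], neg: List[float]) -> List[int]:
    """For each positive score, count negatives with a strictly higher score.
    Compares every positive with every negative: O(PN)."""
    return [sum(1 for n in neg if n > p) for p in pos]


def bucketed_negatives_above(pos: List[float], neg: List[float]) -> List[int]:
    """Same counts, comparing positives with buckets of negatives instead.

    Hypothetical bucketing: a bucket is a maximal run of consecutive
    negatives in descending-score order, so there are at most P + 1 buckets.
    """
    # Sort keys are (-score, label). Label 0 (positive) sorts before 1
    # (negative), so a negative tied with a positive lands below it and,
    # as in the naive version, is not counted as ranked above it.
    items = sorted([(-s, 0) for s in pos] + [(-s, 1) for s in neg])

    # One linear pass: summarize each run of negatives by (size, lowest score).
    buckets: List[Tuple[int, float]] = []
    size, lowest = 0, 0.0
    for minus_score, label in items:
        if label == 1:
            size += 1
            lowest = -minus_score  # descending order: the latest is the lowest
        elif size:
            buckets.append((size, lowest))
            size = 0
    if size:
        buckets.append((size, lowest))

    # P x B comparisons: a bucket lies above a positive exactly when even its
    # lowest-scoring negative scores strictly higher than that positive.
    return [sum(n for n, low in buckets if low > p) for p in pos]


pos = [0.9, 0.4, 0.7]
neg = [0.95, 0.8, 0.5, 0.3, 0.1, 0.7, 0.6]
print(naive_negatives_above(pos, neg))     # [1, 5, 2]
print(bucketed_negatives_above(pos, neg))  # [1, 5, 2]
```

Here the seven negatives collapse into four buckets, (1, 0.95), (1, 0.8), (3, 0.5) and (2, 0.1), so each positive is compared with four summaries instead of seven individual scores. With a smoothed step function the per-bucket comparison would need more than a count and a lowest score, which this sketch does not attempt to model.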