Deep learning has brought considerable improvements to computer vision, especially in recognition problems such as image classification, object detection, semantic segmentation, instance segmentation, and keypoint detection. These problems have critical real-world applications, particularly in the search, social media, and surveillance domains. Unfortunately, a substantial accuracy gap remains between research datasets and real-world deployments, caused by differences in data distribution. In particular, most detection methods suffer a noticeable accuracy drop on datasets with long-tailed distributions because they are biased towards frequent classes.
This thesis describes PriorBox, a method that learns calibration factors for long-tailed datasets using class distributions and a simple convolutional neural network. Because PriorBox relies on easy-to-collect distributional and spatial priors, it requires no additional data collection. Furthermore, the proposed method does not rely on typical class-rebalancing or loss-manipulation strategies and works well with existing object detection and instance segmentation models. Simple distributional class priors, such as the number of instances, size, and aspect ratio, are shown to improve detection results on rare classes without a significant impact on inference speed. We thoroughly evaluate the approach on the LVIS dataset, using Mask R-CNN as the baseline, on long-tailed object detection and instance segmentation tasks.
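To make the notion of distributional class priors concrete, the following minimal sketch computes the three quantities named above (number of instances, size, and aspect ratio) per class from a list of annotations. It is an illustration only and not the PriorBox implementation: the function name `class_priors` and the output keys are hypothetical, and the input is assumed to follow COCO-style annotations with `bbox` given as `[x, y, width, height]`. The learned calibration factors and the convolutional network are not shown.

```python
from collections import defaultdict


def class_priors(annotations):
    """Return per-class instance count, mean box area, and mean aspect ratio.

    Assumes COCO-style annotations: each item has a "category_id" and a
    "bbox" of the form [x, y, width, height]. All names here are hypothetical.
    """
    stats = defaultdict(lambda: {"count": 0, "area": 0.0, "aspect": 0.0})
    for ann in annotations:
        _, _, w, h = ann["bbox"]
        if w <= 0 or h <= 0:
            continue  # skip degenerate boxes so the aspect ratio is defined
        s = stats[ann["category_id"]]
        s["count"] += 1
        s["area"] += w * h
        s["aspect"] += w / h
    return {
        cid: {
            "num_instances": s["count"],
            "mean_area": s["area"] / s["count"],
            "mean_aspect_ratio": s["aspect"] / s["count"],
        }
        for cid, s in stats.items()
    }


# Toy example: class 1 is "frequent" (two boxes), class 7 is "rare" (one box).
annotations = [
    {"category_id": 1, "bbox": [0, 0, 10, 20]},
    {"category_id": 1, "bbox": [5, 5, 30, 20]},
    {"category_id": 7, "bbox": [2, 2, 8, 8]},
]
priors = class_priors(annotations)
print(priors[1])
print(priors[7])
```

Statistics of this kind require only the existing training annotations, which is consistent with the claim that no additional data collection is needed.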