Title

SELF-SUPERVISED LEARNING FOR UNSUPERVISED IMAGE CLASSIFICATION AND SUPERVISED LOCALIZATION TASKS

Abstract

Recent self-supervised learning methods, in which the instance discrimination task is a fundamental way of pretraining convolutional neural networks (CNNs), excel in transfer learning performance. Although the instance discrimination task, with its image-level learning, is a well-suited pretraining method for classification, its lack of dense representation learning makes it sub-optimal for localization tasks such as object detection. In the first part of this thesis, we aim to mitigate this shortcoming of the instance discrimination task by extending it to jointly learn dense representations alongside image-level representations. We add a segmentation branch, parallel to the image-level learning, that predicts class-agnostic masks, enhancing the location-awareness of the representations. We show the effectiveness of our pretraining approach on localization tasks by transferring the learned representations to object detection and segmentation, obtaining relative improvements of up to 1.7% AP on PASCAL VOC and 0.8% AP on COCO object detection, 0.8% AP on COCO instance segmentation, and 3.6% mIoU on PASCAL VOC semantic segmentation.
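The sketch below illustrates the kind of two-branch design described above: a shared backbone feeding an image-level projection head for instance discrimination and a parallel segmentation branch that predicts a class-agnostic mask. It is a minimal illustration under assumed choices (ResNet-50 backbone, PyTorch, module names such as JointSSLModel, proj_head, seg_head), not the thesis implementation.

```python
# Minimal sketch (not the thesis code): a backbone with an image-level
# projection head for instance discrimination and a parallel segmentation
# branch predicting class-agnostic masks. All module and variable names
# here are illustrative assumptions.
import torch
import torch.nn as nn
import torchvision.models as models


class JointSSLModel(nn.Module):
    def __init__(self, proj_dim=128):
        super().__init__()
        resnet = models.resnet50(weights=None)
        # Keep spatial features (drop avgpool and fc) for the dense branch.
        self.backbone = nn.Sequential(*list(resnet.children())[:-2])
        self.pool = nn.AdaptiveAvgPool2d(1)
        # Image-level projection head for the instance discrimination loss.
        self.proj_head = nn.Sequential(
            nn.Linear(2048, 2048), nn.ReLU(inplace=True), nn.Linear(2048, proj_dim)
        )
        # Parallel segmentation branch predicting a single class-agnostic mask.
        self.seg_head = nn.Sequential(
            nn.Conv2d(2048, 256, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(256, 1, kernel_size=1),
        )

    def forward(self, x):
        feats = self.backbone(x)                          # (B, 2048, H/32, W/32)
        z = self.proj_head(self.pool(feats).flatten(1))   # image-level embedding
        mask_logits = self.seg_head(feats)                # class-agnostic mask logits
        return z, mask_logits


# The embedding would feed a contrastive (instance discrimination) loss and
# the mask logits a pixel-wise loss against class-agnostic pseudo-masks.
model = JointSSLModel()
z, masks = model(torch.randn(2, 3, 224, 224))
print(z.shape, masks.shape)  # torch.Size([2, 128]) torch.Size([2, 1, 7, 7])
```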

In recent years, Vision Transformers (ViTs) have significantly enhanced the capabilities of deep learning models, boosting performance in traditional computer vision tasks and driving substantial progress in self-supervised learning methodologies. In the second part of this thesis, we build on the improvements provided by ViTs and propose UCLS, an unsupervised image classification framework that leverages the improved feature representation quality and superior nearest neighbor performance of self-supervised ViTs. We incrementally enhance baseline methods for unsupervised image classification and further propose a cluster ensembling methodology and a self-training step to make better use of multi-head classifiers. We validate the effectiveness of our framework through extensive experiments and demonstrate that UCLS achieves state-of-the-art performance on nine image classification benchmarks in unsupervised settings, reaching a clustering accuracy of 99.3% on CIFAR10 and 89% on CIFAR100, and being the first to surpass the 70% barrier on ImageNet in an unsupervised setting.
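As a rough illustration of the ensembling idea mentioned above, the sketch below aligns the label spaces of several clustering heads with the Hungarian algorithm and combines them by majority vote, producing pseudo-labels that could seed a self-training step. It is only a sketch under stated assumptions: KMeans on fixed features stands in for the trained multi-head classifiers, and the helper names (align_to_reference, ensemble_clusterings) are hypothetical rather than part of UCLS.

```python
# Illustrative sketch (not UCLS itself): align multiple clustering heads via
# the Hungarian algorithm and majority-vote their assignments.
import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.cluster import KMeans


def align_to_reference(reference, labels, n_clusters):
    """Permute `labels` so its cluster ids best agree with `reference`."""
    cost = np.zeros((n_clusters, n_clusters), dtype=np.int64)
    for r, l in zip(reference, labels):
        cost[r, l] += 1
    row, col = linear_sum_assignment(-cost)      # maximize agreement
    mapping = {c: r for r, c in zip(row, col)}
    return np.array([mapping[l] for l in labels])


def ensemble_clusterings(features, n_clusters=10, n_heads=5, seed=0):
    """Cluster `features` with several heads and majority-vote the result."""
    heads = [
        KMeans(n_clusters=n_clusters, n_init=10, random_state=seed + i).fit_predict(features)
        for i in range(n_heads)
    ]
    reference = heads[0]
    aligned = [reference] + [
        align_to_reference(reference, h, n_clusters) for h in heads[1:]
    ]
    votes = np.stack(aligned)                    # (n_heads, n_samples)
    # Majority vote per sample; these pseudo-labels could then drive
    # a self-training step.
    return np.apply_along_axis(
        lambda v: np.bincount(v, minlength=n_clusters).argmax(), 0, votes
    )


# Example on random features standing in for self-supervised ViT embeddings.
feats = np.random.randn(200, 64).astype(np.float32)
pseudo_labels = ensemble_clusterings(feats, n_clusters=10)
print(pseudo_labels.shape)  # (200,)
```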

Supervisor(s)

MELIH BAYDAR

Date and Location

2024-07-26 14:00:00

Category

PhD_Thesis