Title

AUGMENTING CLIP-BASED FEW SHOT RECOGNITION USING SHAPES AND CONCEPTS

Abstract

Contemporary deep learning models such as CLIP exhibit strong zero-shot recognition performance on a broad range of tasks, yet they still benefit substantially from limited supervision in few-shot regimes. This thesis investigates ways to inject external knowledge to make few-shot adaptation models more data-efficient and/or interpretable. We begin by studying the role of shape-only representations in object recognition and show that they are more data-efficient than raw RGB representations. Motivated by these findings and by the texture-versus-shape bias literature, we propose \emph{v1-shape}, which augments CLIP-based few-shot recognition with an additional shape-conditioned branch, yielding modest gains. We also introduce \emph{v1-concept}, a CLIP-based concept bottleneck model encouraged to base its decisions on more general semantic concepts, which improves few-shot accuracy in many settings. Finally, we explore a CLIP adaptation approach that adaptively blends zero-shot and linear-probe logits at inference time.

Supervisor(s)

Erce Guder

Date and Location

2025-08-25 09:00:00

Category

MSc_Thesis