Abstract
Professionals working in the legal domain routinely engage with extensive and complex legal passages. The volume and intricacy of these passages call for an efficient, high-performance legal information retrieval system, and a growing number of studies integrate Natural Language Processing (NLP) techniques for this purpose. In this thesis, we introduce a dense retrieval system that combines the ColBERT architecture with a multilingual BERT backbone, fine-tuned on a comprehensive legal dataset so that the model captures nuanced semantic relationships between long queries and passages. Experiments were carried out to examine the capability of different dense retrieval models, with the traditional sparse retriever BM25 serving as the baseline. The results show that ColBERTv2 excels at ranking relevant documents for queries in the Turkish legal domain, and the resulting system can be utilized for various legal tasks and for further research on Turkish passage retrieval.
Keywords: Information Retrieval, ColBERT, BM25, NLP, Turkish Legal Domain