Abstract
The deployment of Large Language Models (LLMs) in knowledge-intensive domains is currently challenged by two critical bottlenecks: the computational cost of inference and the scarcity of labeled data for domain-specific retrieval. This thesis presents two frameworks to address these challenges through architectural decomposition and data-efficient learning, optimizing the entire pipeline from information access to reasoning.
First, to optimize the retrieval stage, we introduce a label-efficient approach that integrates active learning with LLM-based synthetic data generation. By identifying and labeling only the most informative queries, this method significantly enhances unsupervised dense retrievers without requiring expensive manual annotation. Second, to optimize the knowledge-intensive reasoning stage, we propose a modular distillation framework that factorizes the cognitive process of LLMs into three specialized components: an \textit{Analyzer} for problem decomposition, an \textit{Informant} for context generation, and a \textit{Reasoner} for logical synthesis. This structural separation enables compact student models to emulate the complex reasoning capabilities of massive teacher models with greater efficiency and interpretability.
Extensive evaluations across the BEIR, OBQA, StrategyQA, and MedQA benchmarks demonstrate that these approaches outperform monolithic baselines and achieve performance competitive with fully supervised, resource-intensive systems. By combining data-efficient retrieval with modular reasoning, this work offers a scalable, transparent, and practical solution for deploying advanced knowledge systems in resource-constrained environments.