Title

Refurbished Embeddings with Integrated Guidance Networks for Efficient Context-Length Scaling

Abstract

This thesis introduces REIGN (Refurbished Embeddings with Integrated Guidance Networks), a novel framework for efficient representation learning over long-form textual documents. Unlike traditional Transformer-based approaches constrained by maximum token length, REIGN adopts a hierarchical strategy in which pre-trained Guidance Networks (GNs) generate fixed-size chunk embeddings. These chunk-level embeddings are then processed by a lightweight encoder trained with a contrastive objective inspired by SimCLR. This decoupled design enables semantic understanding of documents containing hundreds of thousands of tokens without relying on subword tokenization or end-to-end backpropagation through large models. REIGN is benchmarked on synthetic datasets of long-context documents and demonstrates strong document-level semantic retrieval performance while remaining computationally efficient and scalable. Additionally, a caching mechanism is proposed to precompute and reuse GN embeddings, offloading the heavy computation to a one-time pre-encoding stage; this substantially accelerates training and fine-tuning and reduces memory overhead, making REIGN particularly well-suited for resource-constrained or iterative experimentation settings.
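
The pipeline described above can be summarized in a short sketch. The following PyTorch snippet is a minimal, hypothetical illustration rather than the thesis implementation: the module names, dimensions, pooling scheme, view construction, and caching interface are all assumptions made for exposition. It shows the three ingredients named in the abstract: cached GN chunk embeddings, a lightweight encoder over them, and a SimCLR-style contrastive (NT-Xent) loss.

    # Minimal, hypothetical sketch of a REIGN-style pipeline. All names and
    # dimensions are illustrative assumptions, not the thesis implementation.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class LightweightEncoder(nn.Module):
        """Small trainable head over precomputed, fixed-size chunk embeddings."""
        def __init__(self, chunk_dim: int = 384, doc_dim: int = 256):
            super().__init__()
            self.proj = nn.Sequential(
                nn.Linear(chunk_dim, doc_dim),
                nn.ReLU(),
                nn.Linear(doc_dim, doc_dim),
            )

        def forward(self, chunks: torch.Tensor) -> torch.Tensor:
            # chunks: (batch, n_chunks, chunk_dim) -> (batch, doc_dim)
            return self.proj(chunks.mean(dim=1))  # mean-pool over chunks

    def nt_xent(z1: torch.Tensor, z2: torch.Tensor, tau: float = 0.1) -> torch.Tensor:
        """SimCLR-style NT-Xent loss between two views of the same batch."""
        b = z1.size(0)
        z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)   # (2b, d), unit norm
        sim = (z @ z.t()) / tau                              # cosine similarities
        sim = sim.masked_fill(torch.eye(2 * b, dtype=torch.bool), float("-inf"))
        # Positive pair for row i is row i+b (and vice versa).
        targets = torch.cat([torch.arange(b, 2 * b), torch.arange(b)])
        return F.cross_entropy(sim, targets)

    # Caching idea: GN outputs are computed once without gradients and reused,
    # so training only backpropagates through the lightweight encoder.
    cache: dict = {}
    def cached_gn_embeddings(doc_id, gn, doc_chunks):
        if doc_id not in cache:
            with torch.no_grad():
                cache[doc_id] = gn(doc_chunks)
        return cache[doc_id]

    # Toy usage: random tensors stand in for cached GN chunk embeddings.
    encoder = LightweightEncoder()
    chunks = torch.randn(8, 100, 384)                  # 8 docs x 100 chunks each
    view_a = chunks + 0.01 * torch.randn_like(chunks)  # lightly perturbed view
    loss = nt_xent(encoder(view_a), encoder(chunks))
    loss.backward()

In this sketch the cached GN outputs carry no gradients, so only the small encoder head is updated during training; this mirrors the abstract's point that the heavy computation is offloaded to the pre-encoding stage.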

Supervisor(s)

Ali Devrim Ekin Cavusoglu

Date and Location

2025-08-26 09:30:00

Category

MSc Thesis