Diversifying search results is a common mechanism in information retrieval for satisfying more users by surfacing documents that address different possible user intents. It produces a result list that is as relevant and as diverse as possible when queries are ambiguous or broad. In this thesis, we first address search result diversification as a useful method to support search as learning, since diversification ensures that all possible aspects of the query are covered in the final ranking. We argue that, in a search engine for the education domain, it is appropriate to diversify results across multiple dimensions, including the suitability of the content for different education levels and the type of the document, in addition to topical ambiguity. We introduce a framework that extends probabilistic and supervised diversification methods so that they can consider the aspects of multiple independent dimensions during ranking, and we demonstrate its effectiveness on a newly developed test collection.
Secondly, we propose three frameworks that exploit supervised learning methods to improve the effectiveness of explicit search result diversification, which presumes that query aspects are known during diversification. We also propose, for the first time in the literature, to learn the importance of aspects by leveraging query performance predictors (QPPs). We conduct extensive experiments on a commonly used benchmark dataset and show that explicit diversification performance can be considerably improved using supervised learning methods, without requiring large training sets or substantial computing resources.
Finally, we examine the impact of static index pruning on diversification performance. We introduce two novel pruning strategies that take the topical diversity of documents into account and preserve documents relevant to different aspects while the index is pruned. We show that the proposed strategies outperform existing approaches in terms of diversification measures.