Genomic studies identify genomic loci representing genetic variations, transcription factor occupancy, or histone modifications
through next-generation sequencing (NGS) technologies. Interpreting these loci requires evaluating them against known genomic and
epigenomic annotations. In this thesis, we develop tools and techniques to assess the functional relevance of a set of genomic intervals.
Towards this goal, we first introduce the Genomic Loci ANnotation and Enrichment Tool (GLANET) as a comprehensive annotation and
enrichment analysis tool. The input query to GLANET is a set of genomic intervals. GLANET annotates and performs enrichment analysis
on these loci with a rich library that includes: (i) gene-centric regions that encompass their non-coding neighborhood,
(ii) a large collection of regulatory regions from ENCODE, and
(iii) gene sets derived from pathways. As a key feature, users can easily extend this library with new gene sets and genomic intervals.
GLANET implements a sampling-based enrichment test that can account for the GC content and/or mappability biases inherent to NGS technologies;
this test shows high statistical power and a well-controlled Type-I error rate. Other key features of GLANET include assessing the impact of
single nucleotide variants on transcription factor (TF) binding sites when the input consists only of SNPs, and gene set enrichment analysis
that is not only exon-based but also regulation-based, considering the introns and proximal regions of the genes in a gene set. GLANET also allows
joint enrichment analysis for TF binding sites and KEGG pathways. With this option, users can evaluate whether the input set is enriched
concurrently with binding sites of TFs and the genes within a KEGG pathway. This joint enrichment analysis provides a detailed functional
interpretation of the input loci. As a second contribution, we design novel data-driven computational experiments for assessing the power
and Type-I error of enrichment procedures. These experiments enable detailed quantitative comparisons of GLANET
with other tools. Our results on these computational experiments showcase GLANET’s unique capabilities as well as its robustness,
speed, and accuracy. Finally, as a third contribution, we present an efficient algorithmic solution for finding common overlapping intervals
over n interval sets. Our strategy first constructs one segment tree for each interval set and then
converts each segment tree into an indexed segment tree forest by cutting the tree at a certain depth.
Experiments on real data show that this data structure decreases the search time. This novel representation also enables parallel computations
on each segment tree in the forest. We also extend this solution to solve the problem of finding at least k common overlapping intervals over
n interval sets. We hope that the tools and techniques developed herein will expedite genomic research and help improve our understanding of
the molecular biology of the cell and the mechanisms underlying diseases.
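
To make the interval problem described above concrete, the following is a minimal Python sketch of a plain sweep-line baseline. It is not the indexed segment tree forest developed in this thesis; it only illustrates the expected input and output. It assumes half-open integer intervals [start, end) and reads "at least k common overlapping intervals" as the regions covered by at least k of the n interval sets (so k = n gives the regions common to all sets). The names merge_set and regions_in_at_least_k are hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # assumed half-open: [start, end)


def merge_set(intervals: List[Interval]) -> List[Interval]:
    """Merge overlapping or touching intervals within one set,
    so that each set contributes at most 1 to the coverage depth."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if start >= end:
            continue  # skip empty intervals
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def regions_in_at_least_k(sets: List[List[Interval]], k: int) -> List[Interval]:
    """Return the regions covered by at least k of the given interval sets."""
    events: List[Tuple[int, int]] = []
    for intervals in sets:
        for start, end in merge_set(intervals):
            events.append((start, 1))   # one more set covers this point
            events.append((end, -1))    # one fewer set covers this point
    # At equal positions, -1 sorts before +1, which matches half-open ends.
    events.sort()

    result: List[Interval] = []
    depth = 0
    open_start = 0
    for pos, delta in events:
        previous = depth
        depth += delta
        if previous < k <= depth:
            open_start = pos  # coverage just reached k
        elif previous >= k > depth:
            if result and result[-1][1] == open_start:
                # Coalesce with an adjacent region that ended exactly here.
                result[-1] = (result[-1][0], pos)
            else:
                result.append((open_start, pos))
    return result


if __name__ == "__main__":
    example_sets = [
        [(1, 10), (20, 30)],
        [(5, 25)],
        [(8, 22), (28, 35)],
    ]
    print(regions_in_at_least_k(example_sets, 3))  # [(8, 10), (20, 22)]
    print(regions_in_at_least_k(example_sets, 2))  # [(5, 25), (28, 30)]
```

This baseline sorts all interval endpoints for every query, so it serves only as a reference for correctness. The approach summarized in the abstract instead builds a per-set index that can be searched repeatedly and in parallel.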