Abstract
Solving many fundamental problems, such as the traveling salesman problem (TSP), shortest path (SP), and graph matching (GM), requires the use of non-differentiable functions. Although using such functions inside deep neural networks is promising, integrating piecewise constant functions into these networks poses a significant challenge for gradient-based optimization: traditional backpropagation yields zero or undefined gradients at these functions, which stalls training. Various heuristic gradient approximations exist, but they are often task-specific, theoretically ungrounded, and fragmented.
This thesis addresses the need for a unified and theoretically sound methodology for gradient approximation through non-differentiable layers. First, it provides a comprehensive review and comparative analysis of existing techniques, highlighting that, despite their diversity, most methods share a common underlying principle. Building on these insights, the thesis introduces the Universal Update as a unifying framework capable of representing known approximations as special cases and inspiring the development of new variants.
The thesis also integrates Optimal Transport (OT) theory to replace non-differentiable label assignment in models like DEtection TRansformer (DETR), demonstrating how OT-based solutions can enhance tasks involving discrete decision-making.
Overall, the thesis empirically validates the effectiveness of the Universal Update across multiple domains, including object detection, combinatorial optimization, and quantization.
By closing the theoretical gap, offering a unified perspective, and validating the proposed approach in practice, this work provides a robust foundation for incorporating non-differentiable functions into deep learning pipelines, ultimately broadening the scope and applicability of gradient-based optimization methods.