Abstract
Conventional imaging sensors often struggle to capture the full dynamic range of real-world scenes, losing detail in shadows and highlights. This talk presents a holistic approach to overcoming these limitations by aligning alternative hardware designs, such as multi-exposure sensors, with robust machine-learning solutions. The research demonstrates that capturing varied exposures at the pixel level provides critical reference data that simplifies complex tasks such as denoising and high-dynamic-range (HDR) reconstruction. These benefits extend to the temporal domain, where motion-blur information is used to handle complex, non-uniform motion and to enable high-quality video frame interpolation for HDR content.
Complementing these reconstruction techniques is the challenge of perceptually meaningful quality evaluation, which can be advanced by leveraging visual masking, a fundamental characteristic of the human visual system. The presentation highlights how modeling masking effects significantly enhances the alignment of traditional metrics (PSNR, SSIM) and modern learning-based models (LPIPS, DISTS) with human perception. Building on this methodology, the talk introduces MILO, a lightweight, multiscale perceptual metric. Trained via pseudo-MOS supervision, MILO achieves state-of-the-art accuracy and serves as an effective perceptual loss for image and latent-space optimization within generative pipelines such as Stable Diffusion, markedly improving performance across various image restoration tasks.