UDAN-CLIP

Underwater Diffusion Attention Network with Contrastive Language-Image Joint Learning

Afrah Shaahid, Muzammil Behzad
King Fahd University of Petroleum and Minerals
Introduction figure

An overview of UDAN-CLIP and the underwater image degradation and enhancement challenges it addresses.

Abstract

Underwater images suffer from color distortion, light absorption, and scattering effects that significantly degrade visual quality. We propose UDAN-CLIP, a diffusion-based underwater image enhancement framework augmented with contrastive vision-language guidance. Our model integrates domain-adaptive diffusion modeling, CLIP-guided semantic alignment, and spatial attention to focus on severely degraded regions. Extensive experiments demonstrate superior performance over state-of-the-art underwater enhancement methods in both quantitative metrics and perceptual quality.

Method Overview

UDAN-CLIP combines diffusion modeling with contrastive language-image supervision to improve semantic consistency during underwater image enhancement.

Architecture overview 1

Overall architecture of UDAN-CLIP

Architecture overview 2

Detailed diffusion module structure

  • Domain-Adaptive Diffusion Module: Learns underwater degradation distributions and progressively restores clean images.
  • CLIP-Guided Classifier: Aligns enhanced images with semantic textual descriptions.
  • Spatial Attention Mechanism: Focuses restoration on heavily degraded regions (a minimal sketch follows this list).
  • CLIP-Diffusion Loss: Encourages semantic consistency during reverse diffusion (a minimal loss sketch also follows this list).
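
To make the spatial attention item above concrete, the following is a minimal PyTorch sketch of a CBAM-style spatial attention block. The pooling choices, 7x7 kernel size, and placement inside the network are illustrative assumptions; the page does not specify the exact attention design used in UDAN-CLIP.

import torch
import torch.nn as nn


class SpatialAttention(nn.Module):
    """CBAM-style spatial attention: reweights feature maps per pixel."""

    def __init__(self, kernel_size=7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2, bias=False)

    def forward(self, x):
        # Pool across channels to summarize where activations are strongest.
        avg_map = x.mean(dim=1, keepdim=True)       # (B, 1, H, W)
        max_map, _ = x.max(dim=1, keepdim=True)     # (B, 1, H, W)
        attn = torch.sigmoid(self.conv(torch.cat([avg_map, max_map], dim=1)))
        return x * attn                             # emphasize heavily degraded regions

Similarly, the CLIP-diffusion loss can be thought of as a semantic alignment term between the enhanced image and a natural-language description of a clean underwater scene. The sketch below assumes the OpenAI clip package, a ViT-B/32 backbone, and a hypothetical prompt string; the actual prompt, backbone, and loss weighting in UDAN-CLIP may differ.

import torch
import torch.nn.functional as F
import clip


class CLIPSemanticLoss(torch.nn.Module):
    def __init__(self, prompt="a clear, well-lit underwater photo", device="cuda"):
        super().__init__()
        # Frozen CLIP encoder; only the enhancement network receives gradients.
        self.model, _ = clip.load("ViT-B/32", device=device)
        self.model.eval()
        for p in self.model.parameters():
            p.requires_grad_(False)
        with torch.no_grad():
            tokens = clip.tokenize([prompt]).to(device)
            self.text_feat = F.normalize(self.model.encode_text(tokens).float(), dim=-1)
        # CLIP's expected input normalization statistics.
        self.register_buffer("mean", torch.tensor([0.48145466, 0.4578275, 0.40821073]).view(1, 3, 1, 1))
        self.register_buffer("std", torch.tensor([0.26862954, 0.26130258, 0.27577711]).view(1, 3, 1, 1))

    def forward(self, enhanced):
        # enhanced: (B, 3, H, W) tensor in [0, 1]; resize to CLIP's 224x224 input.
        x = F.interpolate(enhanced, size=(224, 224), mode="bicubic", align_corners=False)
        x = (x - self.mean.to(x.device)) / self.std.to(x.device)
        img_feat = F.normalize(self.model.encode_image(x).float(), dim=-1)
        # Penalize semantic mismatch between the enhanced image and the prompt.
        return (1.0 - img_feat @ self.text_feat.t()).mean()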

Comparison with State-of-the-Art

C60 dataset comparison

Results on C60 dataset

T200 dataset comparison

Results on T200 dataset

Color-Checker7 dataset comparison

Results on Color-Checker7 dataset

Performance heatmap

Performance heatmap analysis

Quantitative Results

Quantitative results table

Quantitative comparison on the T200, Color-Checker7, and C60 datasets. Reported values indicate improvements over baseline methods.

T200 quantitative results

Quantitative comparison on T200 dataset

Color-Checker7 quantitative results

Quantitative comparison on Color-Checker7 dataset

C60 quantitative results

Quantitative results on C60 dataset

Detailed Analysis

Zoomed analysis 1

Preservation of fine textures and structural details. Our UDAN-CLIP recovers intricate patterns (e.g., coral textures, fish scales) that are lost or blurred in competing approaches.

Zoomed analysis 2

Recovery of fine details in challenging low-light underwater conditions. Our UDAN-CLIP reveals hidden structural elements (e.g., facial features, coin engravings, fish scales, and pool textures) that remain completely obscured in the CLIP-UIE baseline due to severe light absorption and scattering.

BibTeX


@article{shaahid2026udanclip,
  title={UDAN-CLIP: Underwater Diffusion Attention Network with Contrastive Language-Image Joint Learning},
  author={Shaahid, Afrah and Behzad, Muzammil},
  journal={arXiv preprint arXiv:2601.xxxxx},
  year={2026}
}