To address the shortcomings of existing image inpainting methods in handling complex damage and their reliance on paired training samples, an end-to-end dual-stage self-supervised image inpainting method is proposed. The method comprises degradation simulation, edge restoration, and color reconstruction, and restores structure and color jointly through cooperative optimization. Experiments were conducted on a historical image dataset from archives, and the results were evaluated with four metrics: peak signal-to-noise ratio (PSNR), learned perceptual image patch similarity (LPIPS), Fréchet inception distance (FID), and colorfulness score. The experimental results show that the proposed method outperforms existing mainstream methods in restoration accuracy, perceptual quality, and color performance, and exhibits good practical value and robustness.
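As an important component of historical image restoration[15-16], image colorization has also attracted wide attention. Researchers have proposed a variety of deep-learning-based colorization methods: DeOldify[17] achieves semantics-aware color prediction through a self-attention mechanism, but is prone to anachronisms in historical scenes (for example, coloring a 1950s Zhongshan suit with the colors of a modern Western suit); DeepFill[18] uses gated convolution to generate plausible content, but its results are unstable for structural damage (such as large missing facial regions). A research team abroad proposed the text-guided dual attention inpainting network (TDA-Net)[19], which uses a dual-modal attention mechanism to extract semantic features of the missing regions from descriptive text and introduces an image-text matching loss to improve the consistency between the restored result and the text. In addition, the multimodal fusion learning (MMFL) method[20] builds an image-adaptive word demand module that filters for effective text features, so that the generated images have finer textures. Some research teams in China have also proposed the dual degradation network via adaptive feature fusion and U-Net (AFFU)[21]. This network uses a single network structure to address image degradation problems simultaneously: a self-guided module fuses multi-scale image information to remove specific defects, and an adaptive multi-feature fusion module together with an information transfer mechanism links these two main structures, adaptively selecting and retaining image features to prevent the loss of useful information.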
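Against this background, this paper proposes a dual-stage self-supervised framework based historical image restoration and colorization algorithm (DSS-HIRC). The algorithm adopts a dual-stage self-supervised learning framework that combines the image restoration and colorization tasks, aiming at high-quality restoration and realistic colorization of historical images. By introducing a self-supervision mechanism, it can mine the latent information in images without large-scale annotated data, improving the generalization ability and restoration quality of the model. This study aims to provide an efficient and robust solution for historical image restoration, so that the large body of precious historical images held in archives can be preserved through systematic restoration, promoting the digital protection and dissemination of cultural heritage.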
1 Algorithm Architecture
1.1 Overall Network Framework
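This study builds an end-to-end trainable dual-stage self-supervised image restoration framework whose complete pipeline comprises three steps: degradation simulation, edge restoration, and color recovery. Training samples are constructed through self-supervised degradation simulation, so no manually annotated damaged-intact image pairs are needed, and high-frequency edge details and low-frequency color distributions are reconstructed jointly. As shown in Algorithm 1, a controllable degradation function D(⋅) first generates multiple damage patterns dynamically on the original RGB (red, green, blue) or grayscale image, including edge breaks (randomly stripping pixel bands of a certain width), blur (convolution with a variable Gaussian kernel), and noise (a mixture of Poisson or Gaussian noise), forming pseudo damaged-intact data pairs (D( x ), x ). An online generation strategy is adopted to cover richer scene variations and prevent the model from overfitting to fixed patterns. The subsequent training consists of two stages.
Stage one focuses on recovering edge information. Building on the classic U-Net[9] architecture, this study introduces residual connections and a multi-scale attention module. The network input is the damaged edge map obtained from the degraded image; edge features are extracted through convolution and downsampling, and details are then progressively recovered through upsampling and skip connections until a complete, coherent binary edge contour is output. The self-supervision signal for this stage comes from the true edge map extracted from the original image with the Canny[22] algorithm. A binary cross-entropy loss and a structural similarity loss are optimized jointly so that the network focuses on the connectivity and accuracy of fine structures.
Stage two centers on color recovery. The restored edge map from stage one serves as input, and a pretrained vision Transformer (ViT) acts as a global semantic encoder that captures macro-level information such as object categories and spatial layout. An improved U-Net fused with the ViT features then models local structure, using a cross-layer attention mechanism to perform a weighted fusion of edge and semantic features and generate a preliminary RGB reconstruction. To keep colors natural and realistic, the loss function for this stage includes: a pixel-level mean-square error (MSE) to accurately reconstruct the global luminance and color distribution; a perceptual loss (matching high-level semantic features from a VGG (visual geometry group) network) to enhance visual coherence and realism; and an adversarial loss (discriminative training on local patches) to further refine texture detail and color transitions. During training, the two stage networks follow a cooperative optimization strategy: the edge restoration and color recovery modules share part of the feature extractor through gradient backpropagation, so that they compensate for and reinforce each other.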
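The loss terms named for the two stages can be sketched as plain NumPy functions. This is a minimal illustration only, not the authors' implementation: the function names, the loss weights (`ssim_weight`, `w_mse`, `w_perc`, `w_adv`), the single-window (global) SSIM, and the non-saturating form of the adversarial term are all assumptions, and the perceptual and adversarial terms take precomputed feature maps and patch logits as inputs rather than running a VGG network or a discriminator.

```python
import numpy as np

def bce_loss(pred, target, eps=1e-7):
    """Binary cross-entropy between predicted edge probabilities and a binary edge map."""
    p = np.clip(pred, eps, 1.0 - eps)
    return float(-np.mean(target * np.log(p) + (1.0 - target) * np.log(1.0 - p)))

def global_ssim(a, b, data_range=1.0):
    """SSIM computed once over the whole image (simplification: no sliding window)."""
    c1, c2 = (0.01 * data_range) ** 2, (0.03 * data_range) ** 2
    mu_a, mu_b = a.mean(), b.mean()
    var_a, var_b = a.var(), b.var()
    cov = ((a - mu_a) * (b - mu_b)).mean()
    num = (2 * mu_a * mu_b + c1) * (2 * cov + c2)
    den = (mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2)
    return float(num / den)

def stage_one_loss(pred_edges, true_edges, ssim_weight=1.0):
    """Edge stage: BCE plus (1 - SSIM); the weight is a hypothetical choice."""
    ssim_term = 1.0 - global_ssim(pred_edges, true_edges)
    return bce_loss(pred_edges, true_edges) + ssim_weight * ssim_term

def stage_two_loss(pred_rgb, true_rgb, pred_feats, true_feats, patch_logits_fake,
                   w_mse=1.0, w_perc=0.1, w_adv=0.01):
    """Color stage: MSE + perceptual + adversarial terms with hypothetical weights."""
    mse = np.mean((pred_rgb - true_rgb) ** 2)
    # Perceptual term: distance between (precomputed) high-level feature maps.
    perc = np.mean((pred_feats - true_feats) ** 2)
    # Non-saturating generator loss on patch logits: -log(sigmoid(z)) = softplus(-z).
    adv = np.mean(np.logaddexp(0.0, -patch_logits_fake))
    return float(w_mse * mse + w_perc * perc + w_adv * adv)

# Toy usage: a perfect edge prediction gives a near-zero stage-one loss.
rng = np.random.default_rng(0)
true_edges = (rng.random((32, 32)) > 0.8).astype(np.float64)
perfect = stage_one_loss(true_edges, true_edges)
inverted = stage_one_loss(1.0 - true_edges, true_edges)
```

In a real training loop these terms would be computed with an automatic-differentiation framework so that gradients can flow back through both stage networks; the NumPy version only shows how the terms combine.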
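This section proposes a self-supervised data generation scheme based on a controllable degradation function. Its core idea is to stack three steps that simulate the edge breaks, blur distortion, and noise interference that real images may suffer during acquisition or transmission, thereby constructing pseudo damaged-intact training pairs online. Specifically, an edge-mask corruption is first applied to the input image x : a binary mask is generated and, within the mask coverage range, three damage shapes are introduced at random to imitate natural damage structures: line breaks (straight segments of random length), block loss (rectangular regions of a given width and height), and irregular holes generated with Perlin noise. Next, zero-mean Gaussian noise with controllable variance is added to the masked image to reproduce the random interference caused by sensor or environmental noise. Finally, a Gaussian filter whose standard deviation is sampled from a uniform distribution is applied to the result, producing different degrees of blur. The three operations are combined according to Eq. (1) to generate diverse degraded images.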
$$D(x) = \mathrm{Blur}\big(\mathrm{Mask}(x) + \mathrm{Noise}(x)\big) \tag{1}$$
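Here, Blur denotes the Gaussian blur operation applied to the resulting intermediate image; Mask( x ) denotes masking, that is, randomly "erasing" or "covering" part of the pixels of the original image x ; and Noise( x ) denotes the added noise.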
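The three-step degradation can be sketched in NumPy as follows. This is a minimal sketch under stated assumptions rather than the paper's implementation: it follows the order mask, then additive Gaussian noise, then Gaussian blur; all function names and parameter values (number of lines and blocks, `hole_fraction`, `noise_std`, `sigma_range`) are hypothetical; erased pixels are set to zero; and the irregular holes are produced by thresholding a smoothed random field as a simple stand-in for Perlin noise.

```python
import numpy as np

def gaussian_blur(img, sigma):
    """Separable Gaussian blur with reflect padding on a 2-D array."""
    radius = max(int(3 * sigma + 0.5), 1)
    t = np.arange(-radius, radius + 1, dtype=np.float64)
    k = np.exp(-0.5 * (t / sigma) ** 2)
    k /= k.sum()
    padded = np.pad(img, radius, mode="reflect")
    # Filter along rows, then along columns; "valid" removes the padding again.
    rows = np.apply_along_axis(lambda v: np.convolve(v, k, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda v: np.convolve(v, k, mode="valid"), 0, rows)

def random_mask(shape, rng, n_lines=3, n_blocks=2, hole_fraction=0.05):
    """Binary keep-mask (1 = keep, 0 = erased) combining three damage shapes."""
    h, w = shape
    mask = np.ones(shape, dtype=np.float64)
    # 1) Line breaks: straight segments of random length and orientation.
    for _ in range(n_lines):
        y0, x0 = rng.integers(0, h), rng.integers(0, w)
        length = rng.integers(5, max(6, min(h, w) // 2))
        angle = rng.uniform(0.0, np.pi)
        t = np.arange(length)
        ys = np.clip(np.round(y0 + t * np.sin(angle)).astype(int), 0, h - 1)
        xs = np.clip(np.round(x0 + t * np.cos(angle)).astype(int), 0, w - 1)
        mask[ys, xs] = 0.0
    # 2) Block loss: rectangular regions.
    for _ in range(n_blocks):
        bh = rng.integers(2, max(3, h // 8))
        bw = rng.integers(2, max(3, w // 8))
        y0, x0 = rng.integers(0, h - bh + 1), rng.integers(0, w - bw + 1)
        mask[y0:y0 + bh, x0:x0 + bw] = 0.0
    # 3) Irregular holes: thresholded smooth random field (stand-in for Perlin noise).
    field = gaussian_blur(rng.standard_normal(shape), sigma=3.0)
    mask[field > np.quantile(field, 1.0 - hole_fraction)] = 0.0
    return mask

def degrade(x, rng, noise_std=0.05, sigma_range=(0.5, 2.0)):
    """Mask, add zero-mean Gaussian noise, then blur a grayscale image in [0, 1]."""
    masked = x * random_mask(x.shape, rng)
    noise = rng.normal(0.0, noise_std, size=x.shape)
    sigma = rng.uniform(*sigma_range)  # blur strength sampled uniformly
    return np.clip(gaussian_blur(masked + noise, sigma), 0.0, 1.0)

# Online pair generation: a fresh degradation is drawn every time a sample is used.
rng = np.random.default_rng(0)
x = rng.random((64, 64))
pair = (degrade(x, rng), x)
```

Because `degrade` draws a new mask, noise realization, and blur strength on every call, the same clean image yields a different damaged input each epoch, which is the property the online generation strategy relies on to avoid overfitting to fixed damage patterns.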
Chen Wen-xiang, Tian Qi-chuan, Lian Lu, et al. Research progress of image inpainting methods based on deep learning[J]. Computer Engineering and Applications, 2024, 60(22): 58-73.
[3] Anwar S, Tahir M, Li C Y, et al. Image colorization: a survey and dataset[J]. Information Fusion, 2025, 114: 102720.
[4] Chua L O. CNN: a vision of complexity[J]. International Journal of Bifurcation and Chaos, 1997, 7(10): 2219-2425.
[5] Isola P, Zhu J Y, Zhou T H, et al. Image-to-image translation with conditional adversarial networks[C]//2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Honolulu, 2017: 5967-5976.
[6] Parmar N, Vaswani A, Uszkoreit J, et al. Image Transformer[C]//Proceedings of the 35th International Conference on Machine Learning (ICML). Stockholm, 2018: 4055-4064.
Wei Yun, Wang Lu-lu, Xin Zi-hao, et al. Coarse to fine approach to content-consistent image inpainting[J]. Journal of Huazhong University of Science and Technology (Natural Science Edition), 2025, 53(5): 178-184.
[9] Tai Y, Yang J, Liu X M, et al. MemNet: a persistent memory network for image restoration[C]//2017 IEEE International Conference on Computer Vision (ICCV). Venice, 2017: 4549-4557.
Zhou Qi-xue, Yu Ying, Hu Jia-lyu. Ancient mural image restoration network using Involution cascade attention mechanism[J]. Computer Science, 2025, 52(12): 158-165.
[12] Wang Z D, Cun X D, Bao J M, et al. Uformer: a general U-shaped Transformer for image restoration[C]//2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). New Orleans, 2022: 17662-17672.
[13] Liang J Y, Cao J Z, Sun G L, et al. SwinIR: image restoration using swin transformer[C]//2021 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW). Montreal, 2021: 1833-1844.
[14] Karras T, Laine S, Aittala M, et al. Analyzing and improving the image quality of StyleGAN[C]//2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Seattle, 2020: 8107-8116.
[15] Vitoria P, Raad L, Ballester C. ChromaGAN: adversarial picture colorization with semantic class distribution[C]//2020 IEEE Winter Conference on Applications of Computer Vision (WACV). Snowmass Village, 2020: 2434-2443.
[16] Guo Y, Gao Y, Lu Y X, et al. OneRestore: a universal restoration framework for composite degradation[C]//Computer Vision-ECCV 2024. Cham: Springer, 2025: 255-272.
[17] Gaa D, Chizhov V, Peter P, et al. Connecting image inpainting with denoising in the homogeneous diffusion setting[J]. Advances in Continuous and Discrete Models, 2025, 2025(1): 74.
[18] Kim G, Kang K, Kim S, et al. BigColor: colorization using a generative color prior for natural images[M]//Computer Vision-ECCV 2022. Cham: Springer Nature Switzerland, 2022: 350-366.
[19] Kang X Y, Yang T, Ouyang W Q, et al. DDColor: towards photo-realistic image colorization via dual decoders[C]//2023 IEEE/CVF International Conference on Computer Vision (ICCV). Paris, 2024: 328-338.
[20] Salmona A, Bouza L, Delon J. DeOldify: a review and implementation of an automatic colorization method[J]. Image Processing on Line, 2022, 12: 347-368.
[21] Cui M Z, Jiang H, Li C Z. Progressive-augmented-based DeepFill for high-resolution image inpainting[J]. Information, 2023, 14(9): 512.
[22] Zhang H M, Wang M C, Zhang Y X, et al. TDA-Net: a novel transfer deep attention network for rapid response to building damage discovery[J]. Remote Sensing, 2022, 14(15): 3687.
[23] Yang F, Ning B, Li H Q. An overview of multimodal fusion learning[C]//Mobile Multimedia Communications. Cham: Springer, 2022: 259-268.
[24] Chen Y T, Xia R L, Yang K, et al. Dual degradation image inpainting method via adaptive feature fusion and U-Net network[J]. Applied Soft Computing, 2025, 174: 113010.
[25] Rong W B, Li Z J, Zhang W, et al. An improved Canny edge detection algorithm[C]//2014 IEEE International Conference on Mechatronics and Automation. Tianjin, 2014: 577-582.
Yang Xiao-yu, Wang Ai-xia, Yang Gang, et al. Progressive face age synthesis algorithm based on generative adversarial network[J]. Journal of Northeastern University (Natural Science), 2024, 45(7): 944-952.
[28] Almahairi A, Rajeswar S, Sordoni A, et al. Augmented CycleGAN: learning many-to-many mappings from unpaired data[C]//Proceedings of the 35th International Conference on Machine Learning (ICML). Stockholm, 2018: 195-204.
[29] Zamir S W, Arora A, Khan S, et al. Restormer: efficient transformer for high-resolution image restoration[C]//2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). New Orleans, 2022: 5718-5729.