To address the problem that supervised monocular depth estimation requires large amounts of labeled data, a semi-supervised depth estimation framework, AugDepth, was proposed based on a teacher-student model. It perturbs the input data and trains the model to keep depth predictions consistent before and after the perturbation. First, a smooth random intensity augmentation method samples augmentation strengths from a continuous domain; multiple operations are selected at random to increase data diversity, and strongly and weakly augmented outputs are mixed to prevent excessive perturbation. Then, considering that unlabeled samples differ in training difficulty, Cutout is applied to improve the model's reasoning about global context, and the Cutout strategy is adaptively adjusted according to the confidence of each unlabeled sample to strengthen the model's generalization and learning ability. Experimental results on the KITTI and NYU Depth datasets show that AugDepth significantly improves the accuracy of semi-supervised depth estimation and remains robust when labeled data are scarce.
Key words: computer application; semi-supervised learning; data augmentation; monocular image; depth estimation
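The perturbation-and-consistency pipeline summarized above can be sketched roughly as follows. This is a minimal illustration under assumed details: the operation set, magnitude ranges, confidence proxy, and the direction of the Cutout adjustment are placeholders, not the authors' AugDepth implementation. Here student and teacher stand for any depth networks that map a (C, H, W) image tensor to a depth map.

import random
import torch
import torch.nn.functional as F

def smooth_random_augment(img, n_ops=2):
    # Smooth random intensity augmentation (sketch): each operation's strength
    # is drawn from a continuous range rather than a fixed discrete level.
    ops = [
        lambda x, m: x * (1.0 + 0.5 * m),                     # brightness-like scaling
        lambda x, m: (x - x.mean()) * (1.0 + m) + x.mean(),   # contrast-like scaling
        lambda x, m: x + m * 0.05 * torch.randn_like(x),      # additive noise
    ]
    for op in random.sample(ops, k=n_ops):                    # random subset of operations
        m = random.uniform(0.0, 1.0)                          # continuous magnitude
        img = op(img, m)
    return img.clamp(0.0, 1.0)

def mix_strong_weak(weak, strong, alpha=0.5):
    # Blend the strongly augmented view with the weak view so the perturbation
    # does not drift too far from the original image.
    lam = random.uniform(alpha, 1.0)
    return lam * strong + (1.0 - lam) * weak

def adaptive_cutout(img, confidence, max_frac=0.4):
    # Confidence-adaptive Cutout (assumed direction): higher-confidence samples
    # receive larger masked regions; low-confidence samples are masked less.
    _, h, w = img.shape
    frac = max_frac * confidence
    ch, cw = int(h * frac), int(w * frac)
    if ch == 0 or cw == 0:
        return img
    y = random.randint(0, h - ch)
    x = random.randint(0, w - cw)
    out = img.clone()
    out[:, y:y + ch, x:x + cw] = 0.0
    return out

def consistency_step(student, teacher, unlabeled_img):
    # One consistency step on an unlabeled image: the teacher predicts on the
    # weak view, the student predicts on the strong view, and the L1 difference
    # between the two depth maps is the unsupervised loss.
    weak = unlabeled_img
    with torch.no_grad():
        teacher_depth = teacher(weak.unsqueeze(0))
        # Crude confidence proxy (assumption): lower prediction dispersion
        # is treated as higher confidence.
        confidence = float(1.0 / (1.0 + teacher_depth.std()))
    strong = mix_strong_weak(weak, smooth_random_augment(weak))
    strong = adaptive_cutout(strong, confidence)
    student_depth = student(strong.unsqueeze(0))
    return F.l1_loss(student_depth, teacher_depth)

In mean-teacher style training the teacher's weights are typically an exponential moving average of the student's; that update step is omitted from the sketch.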