To address the diverse sizes and spatial arrangements of land-cover objects, together with the high interclass similarity and intraclass variability found in remote sensing image classification, this paper proposes a lightweight multi-scale remote sensing scene classification network that fuses frequency and spatial features (FS-LMFFNet), designed for effective feature extraction and thorough integration of multi-scale features. First, to combine the strengths of CNNs and Transformers and to extract both local and global features adequately, a Frequency and Spatial MLP (FS-MLP) module is proposed; it introduces frequency-domain analysis to supply the global high-frequency texture features that conventional spatial operations fail to capture. Second, to handle the multi-scale nature of remote sensing scene images, a Lightweight Multi-layer Feature Fusion (LMFF) module is proposed, in which lightweight convolutional blocks efficiently fuse the multi-scale features of the first three stages. Finally, FS-LMFFNet is evaluated on three public datasets, UC_Merced, RSSCN7 and AID, where it achieves accuracies of 99.10%, 96.60% and 95.48%, respectively. The experimental results show that FS-LMFFNet extracts and fuses multi-scale features effectively and outperforms the other state-of-the-art models it is compared with.
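(1) To combine the strengths of CNNs and Transformers and to extract local and global features adequately, a frequency and spatial multilayer perceptron module (Frequency and Spatial MLP, FS-MLP) is designed. The module uses depthwise convolution to model the spatial information of the image, adopts a learnable global filter to supply the global high-frequency texture features that spatial operations such as convolution lose, and integrates the spatial and frequency information through a multilayer perceptron.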
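The three ingredients named above (depthwise convolution, a learnable global filter in the frequency domain, and an MLP that merges the two) can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function names, the 3×3 kernel size, the concatenation of the two branches, the tanh-approximated GELU and the residual connection are all assumptions made for illustration only.

```python
import numpy as np


def depthwise_conv3x3(x, kernels):
    """Depthwise 3x3 cross-correlation with zero padding.

    x: real array of shape (C, H, W); kernels: (C, 3, 3), one kernel per channel.
    """
    _, h, w = x.shape
    padded = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros_like(x)
    for i in range(3):
        for j in range(3):
            out += kernels[:, i, j][:, None, None] * padded[:, i:i + h, j:j + w]
    return out


def global_filter(x, weight):
    """Element-wise filtering in the 2-D frequency domain.

    weight: complex array of shape (C, H, W // 2 + 1); it would be a learnable
    parameter in a trained network (assumption: one filter per channel).
    """
    h, w = x.shape[-2:]
    spectrum = np.fft.rfft2(x, axes=(-2, -1))
    return np.fft.irfft2(spectrum * weight, s=(h, w), axes=(-2, -1))


def channel_mlp(x, w1, w2):
    """Two-layer pointwise MLP over channels with a tanh-approximated GELU."""
    hidden = np.einsum("kc,chw->khw", w1, x)
    hidden = 0.5 * hidden * (
        1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (hidden + 0.044715 * hidden ** 3))
    )
    return np.einsum("ck,khw->chw", w2, hidden)


def fs_mlp_block(x, dw_kernels, filt, w1, w2):
    """Hypothetical FS-MLP-style block: spatial branch + frequency branch -> MLP."""
    spatial = depthwise_conv3x3(x, dw_kernels)        # local spatial modelling
    frequency = global_filter(x, filt)                # global frequency modelling
    fused = np.concatenate([spatial, frequency], 0)   # assumed fusion: concatenation
    return x + channel_mlp(fused, w1, w2)             # assumed residual connection


# Usage with random, untrained weights.
rng = np.random.default_rng(0)
channels, height, width = 4, 8, 8
x = rng.standard_normal((channels, height, width))
dw = rng.standard_normal((channels, 3, 3))
filt = rng.standard_normal((channels, height, width // 2 + 1)) + 0j
w1 = rng.standard_normal((16, 2 * channels))
w2 = rng.standard_normal((channels, 16))
y = fs_mlp_block(x, dw, filt, w1, w2)
print(y.shape)
```

Because the filter multiplies every frequency component of the whole feature map, each output position depends on all input positions, which is what gives the frequency branch its global receptive field at the cost of one forward and one inverse FFT.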
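The experiments use evaluation criteria that are common in computer vision image classification tasks: the number of parameters (Parameters, Param) and the number of floating point operations (FLOPs) measure model complexity, and accuracy (Accuracy) measures model performance. Let $TP$ be the number of positive samples predicted correctly, $FN$ the number of positive samples predicted incorrectly, $TN$ the number of negative samples predicted correctly, and $FP$ the number of negative samples predicted incorrectly. The accuracy is then computed as:

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + FN + TN + FP}$$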
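The accuracy metric described above can be computed directly from the four counts. The sketch below is a minimal illustration; the function names are hypothetical, and the multi-class variant (correct predictions divided by all predictions) is the usual form for scene classification, assumed here rather than quoted from the paper.

```python
def accuracy_from_counts(tp, fn, tn, fp):
    """Accuracy = (TP + TN) / (TP + FN + TN + FP)."""
    total = tp + fn + tn + fp
    if total == 0:
        raise ValueError("at least one sample is required")
    return (tp + tn) / total


def overall_accuracy(y_true, y_pred):
    """Multi-class accuracy: fraction of samples whose predicted label is correct."""
    if len(y_true) != len(y_pred):
        raise ValueError("label lists must have the same length")
    if not y_true:
        raise ValueError("at least one sample is required")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Usage: 40 + 45 correct predictions out of 100 samples.
print(accuracy_from_counts(tp=40, fn=10, tn=45, fp=5))
print(overall_accuracy([0, 1, 2, 2], [0, 1, 1, 2]))
```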