Existing classification algorithms often suffer from large parameter counts, high computational complexity, and suboptimal performance when classifying skin lesion images, owing to the uneven class distribution of such images. To address these issues, this paper proposes a lightweight Transformer module and a novel strategy that combines a Convolutional Neural Network (CNN) with a Transformer to enhance classification performance. Additionally, we adopt an inverse class-frequency loss-weighting scheme to mitigate the impact of the imbalanced class distribution during training. The lightweight Transformer extracts essential features from input sequences and performs separable self-attention computations to capture global feature information from skin lesion regions, addressing the computational limitations of the traditional Transformer. Furthermore, our strategy effectively integrates shallow global detail features with deep semantic features, enhancing the network's expressive ability. Experimental results on the HAM10000 dataset demonstrate that our algorithm outperforms the comparative methods on the evaluation metrics, while maintaining a model size of only 2.3 million parameters, which holds significant promise for advancing automatic skin lesion classification tools.
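The abstract mentions an inverse class-frequency loss-weighting scheme but does not spell out its formula. A minimal sketch of one common inverse-frequency scheme, using the published per-class sample counts of HAM10000 (the exact scheme used in the paper is an assumption here):

```python
import numpy as np

# Per-class sample counts of the 7 HAM10000 classes
# (akiec, bcc, bkl, df, mel, nv, vasc); "nv" dominates the dataset.
counts = np.array([327, 514, 1099, 115, 1113, 6705, 142])

# One common inverse-frequency scheme (an assumption -- the paper does not
# give its exact formula): weight each class by the inverse of its relative
# frequency, normalized so the weights average to 1.
freq = counts / counts.sum()
weights = (1.0 / freq) / (1.0 / freq).mean()

print(weights.round(2))  # rare classes (e.g. "df") receive the largest weights
```

These weights would typically be passed to a weighted cross-entropy loss so that errors on rare classes contribute more to the training signal.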
Compared with CNN-based models, ViT incurs higher computational cost and latency. The main efficiency bottleneck is the Multi-headed Self-attention (MHA) mechanism, whose time complexity is O(k²) in the number k of tokens in the input sequence.
To extract global contextual information from skin lesion regions more efficiently, this paper proposes a lightweight Transformer module, as shown in Figure 3. It introduces a separable self-attention computation with O(k) time complexity [15], and before the feature information is fed into the LinearTransformer, salient features are extracted through 2×2 average pooling and max pooling with stride 2. After salient-feature extraction, the sequence length fed into the LinearTransformer becomes 1/4 of the original, effectively reducing the scale of the feature information and further cutting the computation. Because the feature formats expected by the CNN and by the LinearTransformer differ, the features must be converted between 2D feature maps and 1D sequences. Overall, compared with the standard Transformer, the proposed method has a time complexity of only O(k/4), making it more efficient in computational cost.
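The token-reduction step above can be sketched in NumPy: 2×2 average and max pooling with stride 2 quarter the token count before the attention branch, with the 2D-map ↔ 1D-sequence conversion the text describes. How the two pooled results are fused is not specified in the text; element-wise summation is an assumption of this sketch.

```python
import numpy as np

def pool2x2(x, mode):
    """2x2 pooling with stride 2 on an (H, W, d) feature map."""
    H, W, d = x.shape
    x = x.reshape(H // 2, 2, W // 2, 2, d)
    return x.mean(axis=(1, 3)) if mode == "avg" else x.max(axis=(1, 3))

# A toy 8x8 feature map with d = 16 channels, i.e. k = 64 tokens.
fmap = np.random.rand(8, 8, 16)

# Salient-feature extraction before the LinearTransformer: average and max
# pooling; fusing them by element-wise sum is an assumption of this sketch.
pooled = pool2x2(fmap, "avg") + pool2x2(fmap, "max")   # (4, 4, 16)

# 2D feature map -> 1D token sequence for the attention branch ...
tokens = pooled.reshape(-1, pooled.shape[-1])          # (16, 16): k/4 tokens

# ... and back to a 2D feature map for the CNN branch.
restored = tokens.reshape(pooled.shape)

print(fmap.shape[0] * fmap.shape[1], "tokens ->", tokens.shape[0])
```

The reshape round-trip is lossless, so the CNN branch sees exactly the pooled features the attention branch consumed.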
Similar to MHA, the structure of separable self-attention is shown in Figure 4(a). The input x is processed by three branches, namely L, the key K, and the value V. The L branch uses a linear layer with weights W_I ∈ R^d to map each d-dimensional token in x to a scalar; this linear mapping is an inner product. W_I serves as the latent node L in Figure 4(b): computing the distance between L and x yields a k-dimensional vector, and applying softmax to this vector produces the context scores c_s ∈ R^k. Unlike the Transformer, which computes each token's context scores with respect to all k tokens, separable self-attention computes context scores only with respect to the latent token L. This reduces the cost of computing the context scores from O(k²) to O(k).
To share the contextual information encoded in the context vector c_v (the sum of the key projections x_K = xW_K weighted by the context scores c_s, following [15]) with all tokens in x, the V branch, with weights W_V ∈ R^(d×d), projects x into a d-dimensional space and applies a ReLU activation to produce the output x_V ∈ R^(k×d). The contextual information in c_v is then propagated to x_V through a broadcasted element-wise multiplication. Finally, the result is fed into another linear layer with weights W_O ∈ R^(d×d) to produce the final output y ∈ R^(k×d). Mathematically, separable self-attention is defined as:
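The computation described above can be condensed into a short NumPy sketch, following the description and [15]; the weight matrices here are random stand-ins, not trained parameters.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def separable_self_attention(x, W_I, W_K, W_V, W_O):
    """Separable self-attention per [15]: O(k) in the token count k.

    x: (k, d) token sequence; W_I: (d,); W_K, W_V, W_O: (d, d).
    """
    # Branch L: inner product with W_I maps each token to a scalar;
    # softmax over the k scalars yields the context scores c_s in R^k.
    c_s = softmax(x @ W_I)                 # (k,)

    # Context vector c_v: context-score-weighted sum of the key projections.
    c_v = c_s @ (x @ W_K)                  # (d,)

    # Branch V: project x, apply ReLU, then broadcast-multiply by c_v
    # to propagate the contextual information to every token.
    x_V = np.maximum(x @ W_V, 0.0)         # (k, d)

    # Final linear layer W_O produces the output y in R^(k x d).
    return (x_V * c_v) @ W_O               # (k, d)

rng = np.random.default_rng(0)
k, d = 16, 8
x = rng.standard_normal((k, d))
y = separable_self_attention(
    x,
    rng.standard_normal(d),
    rng.standard_normal((d, d)),
    rng.standard_normal((d, d)),
    rng.standard_normal((d, d)),
)
print(y.shape)  # (16, 8)
```

Note that no k×k attention matrix is ever materialized: every intermediate is of size k or d, which is the source of the O(k) cost.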
HAN S S, PARK I, CHANG S E, et al. Augmented Intelligence Dermatology: Deep Neural Networks Empower Medical Professionals in Diagnosing Skin Cancer and Predicting Treatment Options for 134 Skin Disorders[J]. J Invest Dermatol, 2020, 140(9): 1753-1761. DOI: 10.1016/j.jid.2020.01.019.
SANDLER M, HOWARD A, ZHU M L, et al. MobileNetV2: Inverted Residuals and Linear Bottlenecks[C]//2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE, 2018: 4510-4520. DOI: 10.1109/CVPR.2018.00474.
[4] SRINIVASU P N, SIVASAI J G, IJAZ M F, et al. Classification of Skin Disease Using Deep Learning Neural Networks with MobileNet V2 and LSTM[J]. Sensors, 2021, 21(8): 2852. DOI: 10.3390/s21082852.
[5] HOANG L, LEE S H, LEE E J, et al. Multiclass Skin Lesion Classification Using a Novel Lightweight Deep Learning Framework for Smart Healthcare[J]. Appl Sci, 2022, 12(5): 2677. DOI: 10.3390/app12052677.
[6] WANG L T, ZHANG L, SHU X, et al. Intra-class Consistency and Inter-class Discrimination Feature Learning for Automatic Skin Lesion Classification[J]. Med Image Anal, 2023, 85: 102746. DOI: 10.1016/j.media.2023.102746.
[7] HE X Z, TAN E L, BI H W, et al. Fully Transformer Network for Skin Lesion Analysis[J]. Med Image Anal, 2022, 77: 102357. DOI: 10.1016/j.media.2022.102357.
[8] CHEN L C, PAPANDREOU G, KOKKINOS I, et al. DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs[J]. IEEE Trans Pattern Anal Mach Intell, 2018, 40(4): 834-848. DOI: 10.1109/TPAMI.2017.2699184.
[9] DOSOVITSKIY A, BEYER L, KOLESNIKOV A, et al. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale[EB/OL]. (2021-06-03)[2024-03-04].
[10] CHESLEREAN-BOGHIU T, FLEISCHMANN M E, WILLEM T, et al. Transformer-based Interpretable Multi-modal Data Fusion for Skin Lesion Classification[EB/OL]. (2023-08-31)[2024-03-04].
[11] PENG Z L, HUANG W, GU S Z, et al. Conformer: Local Features Coupling Global Representations for Visual Recognition[C]//2021 IEEE/CVF International Conference on Computer Vision (ICCV). New York: IEEE, 2021: 357-366. DOI: 10.1109/ICCV48922.2021.00042.
[12] HE K M, ZHANG X Y, REN S Q, et al. Deep Residual Learning for Image Recognition[C]//2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). New York: IEEE, 2016: 770-778. DOI: 10.1109/CVPR.2016.90.
[13] MEHTA S, RASTEGARI M. MobileViT: Light-weight, General-purpose, and Mobile-friendly Vision Transformer[EB/OL]. (2022-03-04)[2024-03-04].
[14] HOWARD A G, ZHU M L, CHEN B, et al. MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications[EB/OL]. (2017-04-17)[2024-03-04].
[15] MEHTA S, RASTEGARI M. Separable Self-attention for Mobile Vision Transformers[EB/OL]. (2022-06-06)[2024-03-04].
[16] TSCHANDL P, ROSENDAHL C, KITTLER H. The HAM10000 Dataset, a Large Collection of Multi-source Dermatoscopic Images of Common Pigmented Skin Lesions[J]. Sci Data, 2018, 5: 180161. DOI: 10.1038/sdata.2018.161.
[17] HUANG G, LIU Z, VAN DER MAATEN L, et al. Densely Connected Convolutional Networks[C]//2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). New York: IEEE, 2017: 2261-2269. DOI: 10.1109/CVPR.2017.243.
[18] MILTON M A A. Automated Skin Lesion Classification Using Ensemble of Deep Neural Networks in ISIC 2018: Skin Lesion Analysis towards Melanoma Detection Challenge[EB/OL]. (2019-01-30)[2024-03-04].
[19] GESSERT N, SENTKER T, MADESTA F, et al. Skin Lesion Diagnosis Using Ensembles, Unscaled Multi-crop Evaluation and Loss Weighting[EB/OL]. (2018-08-05)[2024-03-04].
[20] RAY S. Disease Classification within Dermascopic Images Using Features Extracted by ResNet50 and Classification through Deep Forest[EB/OL]. (2018-07-25)[2024-03-04].
[21] PEREZ F, AVILA S, VALLE E. Solo or Ensemble? Choosing a CNN Architecture for Melanoma Classification[C]//2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW). New York: IEEE, 2019: 2775-2783. DOI: 10.1109/CVPRW.2019.00336.
THURNHOFER-HEMSI K, LÓPEZ-RUBIO E, DOMÍNGUEZ E, et al. Skin Lesion Classification by Ensembles of Deep Convolutional Networks and Regularly Spaced Shifting[J]. IEEE Access, 2021, 9: 112193-112205. DOI: 10.1109/ACCESS.2021.3103410.
[24] WANG H, QI Q Q, SUN W J, et al. Classification of Skin Lesions with Generative Adversarial Networks and Improved MobileNetV2[J]. Int J Imaging Syst Tech, 2023, 33(5): 1561-1576. DOI: 10.1002/ima.22880.
[25] SELVARAJU R R, COGSWELL M, DAS A, et al. Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization[C]//2017 IEEE International Conference on Computer Vision (ICCV). New York: IEEE, 2017: 618-626. DOI: 10.1109/ICCV.2017.74.