Existing multimodal fine-grained sentiment analysis models suffer from insufficient extraction of textual and visual features and neglect the guiding role of aspects during modality fusion. To address these issues, an aspect-aware multimodal fine-grained sentiment analysis model was proposed. First, a semantic alignment module was designed to capture aspect clues in images and enhance aspect awareness. Second, an aspect-oriented syntactic dependency graph and a sentiment syntactic graph attention network were constructed to explore, from multiple perspectives, the complex aspect-related dependency relationships in the text. Third, a multi-feature encoding module was developed to extract rich visual feature clues. Finally, a dual cross-attention mechanism was introduced to capture bidirectional interaction information between the text and image modalities. The quality of aspect features was further improved with a gated semantic graph convolutional network and an aspect masking mechanism, while an attention mechanism was employed for aspect-aware modality fusion. Experimental results demonstrated that, compared with baseline models, the proposed model achieved average accuracy improvements of 3.00 and 2.78 percentage points on the Twitter-2015 and Twitter-2017 public datasets, respectively, along with average Macro-F1 increases of 3.50 and 2.53 percentage points. These findings confirmed the model's effectiveness in enhancing the performance of multimodal fine-grained sentiment analysis.
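As a rough illustration of the dual cross-attention idea mentioned in the abstract, the sketch below applies plain scaled dot-product attention in both directions, text-to-image and image-to-text. This is a minimal stand-in, not the paper's exact architecture: the token counts, feature dimension, and random features are illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(query, context, d_k):
    # Attend from `query` tokens to `context` tokens (scaled dot-product).
    scores = query @ context.T / np.sqrt(d_k)   # (n_query, n_context)
    weights = softmax(scores, axis=-1)          # each row sums to 1
    return weights @ context                    # context summarized per query token

rng = np.random.default_rng(0)
text = rng.standard_normal((6, 16))    # 6 text tokens, dim 16
image = rng.standard_normal((4, 16))   # 4 visual regions, dim 16

# Dual cross-attention: text queries the image, and the image queries the text.
text_enriched = cross_attention(text, image, d_k=16)    # (6, 16)
image_enriched = cross_attention(image, text, d_k=16)   # (4, 16)
print(text_enriched.shape, image_enriched.shape)
```

Running both directions is what makes the interaction bidirectional: each modality's representation is re-expressed in terms of the other modality's features.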
With the maturing application of deep learning across natural language processing, the Bi-directional Long Short-Term Memory (Bi-LSTM) model and pre-trained language models such as Bidirectional Encoder Representation from Transformers (BERT) have excelled at semantic analysis of text sequences [5]. Building on this, some fine-grained text analysis work has incorporated syntactic knowledge to capture the internal logic of text. For example, Wan Yujie et al. [6] used graph convolutional networks to build adjacency matrices that model dependency relations between nodes, effectively mining the intrinsic connections within text. Huang et al. [7] used a graph attention network to dynamically assign different weights to each node's neighbors and adaptively aggregate neighbor features. To further improve the quality of the syntactic dependency graph, Wang et al. [8] reshaped and pruned the ordinary dependency tree so that the model focuses on the connections between aspect words and potential opinion words. Xie Jun et al. [9] enhanced the syntactic dependency graph with sentiment commonsense knowledge and also built a denoising syntactic graph, improving the performance of fine-grained text sentiment analysis models. However, the above methods ignore the important role of images in sentiment analysis tasks.
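The dependency-graph modeling described above can be sketched minimally: build an adjacency matrix from a sentence's dependency edges, then aggregate neighbor features with one graph-convolution step. This is a simplified stand-in for the cited models; the toy sentence, edge list, and dimensions are illustrative assumptions.

```python
import numpy as np

def gcn_layer(adj, feats, weight):
    # One graph-convolution step: add self-loops, row-normalize the
    # adjacency, aggregate neighbor features, apply a linear map + ReLU.
    a = adj + np.eye(adj.shape[0])
    d_inv = 1.0 / a.sum(axis=1, keepdims=True)
    return np.maximum(d_inv * (a @ feats) @ weight, 0.0)

# Toy sentence: "the food is great"; dependency edges (made undirected):
# great -> food (nsubj), great -> is (cop), food -> the (det)
edges = [(3, 1), (3, 2), (1, 0)]
n = 4
adj = np.zeros((n, n))
for i, j in edges:
    adj[i, j] = adj[j, i] = 1.0

rng = np.random.default_rng(0)
feats = rng.standard_normal((n, 8))   # token embeddings, dim 8
w = rng.standard_normal((8, 8))
out = gcn_layer(adj, feats, w)
print(out.shape)
```

A graph attention network replaces the fixed normalized adjacency with learned, per-neighbor attention weights, which is the adaptive aggregation attributed to Huang et al. above.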
DeepSentiBank, proposed by Chen et al. [20], is a concept detector covering 2 048 adjective-noun pairs (ANPs); for an input image, DeepSentiBank uses a deep neural network to score how well each ANP matches the image. Considering that the nouns can provide aspect-word clues from another perspective, the model learns aspect-related features from the nouns through the semantic alignment module and fuses them into the aspect representation as additional clues. To avoid losing key information, the model keeps the top few highest-matching ANPs as auxiliary image information for subsequent computation; the procedure is as follows.
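The ANP-selection step above can be sketched as follows. The vocabulary, scores, and `k` below are hypothetical stand-ins, not DeepSentiBank's actual output; the real detector scores 2 048 ANPs per image.

```python
import numpy as np

# Hypothetical ANP vocabulary and detector matching scores for one image.
anps = ["happy dog", "sad face", "beautiful sky", "old car", "cute cat"]
scores = np.array([0.12, 0.05, 0.61, 0.02, 0.33])

k = 2
top = np.argsort(scores)[::-1][:k]         # indices of the k best-matching ANPs
top_pairs = [anps[i] for i in top]
nouns = [p.split()[1] for p in top_pairs]  # keep each noun as an aspect clue
print(top_pairs)  # ['beautiful sky', 'cute cat']
print(nouns)      # ['sky', 'cat']
```

The retained nouns would then feed the semantic alignment module as the additional aspect clues described above.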
[1] ZHAO H, YANG M Y, BAI X Y, et al. A Survey on Multimodal Aspect-based Sentiment Analysis[J]. IEEE Access, 2024, 12: 12039-12052. DOI: 10.1109/ccdc58219.2023.10326793.
[2] YU J F, JIANG J, XIA R. Entity-sensitive Attention and Fusion Network for Entity-level Multimodal Sentiment Classification[J]. IEEE/ACM Trans Audio Speech Lang Process, 2019, 28: 429-439. DOI: 10.1109/TASLP.2019.2957872.
[3] YU J F, WANG J M, XIA R, et al. Targeted Multimodal Sentiment Classification Based on Coarse-to-fine Grained Image-target Matching[C]//Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence. Vienna, Austria: IJCAI, 2022: 4482-4488. DOI: 10.24963/ijcai.2022/622.
[4] WANG S J, CAI G Y, LV G R. Aspect-level Multimodal Sentiment Analysis Based on Co-attention Fusion[J]. Int J Data Sci Anal, 2025, 20(2): 903-916. DOI: 10.1007/s41060-023-00497-3.
[5] ZHU C, DING Q. Aspect-based Sentiment Analysis via Dual Residual Networks with Sentiment Knowledge[J]. J Supercomput, 2024, 81(1): 131. DOI: 10.1007/s11227-024-06546-3.
HUANG B X, CARLEY K. Syntax-aware Aspect Level Sentiment Classification with Graph Attention Networks[C]//Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). Stroudsburg, PA, USA: ACL, 2019: 5468-5476. DOI: 10.18653/v1/d19-1549.
[9] WANG K, SHEN W Z, YANG Y Y, et al. Relational Graph Attention Network for Aspect-based Sentiment Analysis[C]//Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. Stroudsburg, PA, USA: ACL, 2020: 3229-3238. DOI: 10.18653/v1/2020.acl-main.295.
XIE J, GAO J, XU X Y, et al. Aspect-based Sentiment Analysis Model of Dual-transformer Network Based on Knowledge Enhancement[J]. Data Anal Knowl Discov, 2024, 8(11): 47-58. DOI: 10.12677/csa.2022.1212291.
[12] KHAN Z, FU Y. Exploiting BERT for Multimodal Target Sentiment Classification Through Input Space Translation[C]//Proceedings of the 29th ACM International Conference on Multimedia. New York: ACM, 2021: 3034-3042. DOI: 10.1145/3474085.3475692.
[13] WAN Y J, CHEN Y Z, LIN J L, et al. A Knowledge-augmented Heterogeneous Graph Convolutional Network for Aspect-level Multimodal Sentiment Analysis[J]. Comput Speech Lang, 2024, 85: 101587. DOI: 10.1016/j.csl.2023.101587.
[14] ZHAO F, LI C H, WU Z, et al. M2DF: Multi-grained Multi-curriculum Denoising Framework for Multimodal Aspect-based Sentiment Analysis[C]//Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing. Stroudsburg, PA, USA: ACL, 2023: 9057-9070. DOI: 10.18653/v1/2023.emnlp-main.561.
[15] ZHAO F, WU Z, LONG S Y, et al. Learning from Adjective-noun Pairs: A Knowledge-enhanced Framework for Target-oriented Multimodal Sentiment Classification[C]//Proceedings of the 29th International Conference on Computational Linguistics. Gyeongju, Republic of Korea: International Committee on Computational Linguistics, 2022: 6784-6794. DOI: 10.18653/v1/2023.findings-emnlp.403.
[16] BORTH D, CHEN T, JI R R, et al. SentiBank: Large-scale Ontology and Classifiers for Detecting Sentiment and Emotions in Visual Content[C]//Proceedings of the 21st ACM International Conference on Multimedia. New York: ACM, 2013: 459-460. DOI: 10.1145/2502081.2502268.
[17] WANG Z, LIU Y, YANG J N. BERT-based Multimodal Aspect-level Sentiment Analysis for Social Media[C]//Proceedings of the 2022 5th International Conference on Artificial Intelligence and Pattern Recognition. New York: ACM, 2022: 187-192. DOI: 10.1145/3573942.3573971.
WANG J W, LIU Z, SHENG V, et al. SaliencyBERT: Recurrent Attention Network for Target-oriented Multimodal Sentiment Classification[M]//Pattern Recognition and Computer Vision. Cham: Springer International Publishing, 2021: 3-15. DOI: 10.1007/978-3-030-88010-1_1.
[20] WANG Z Y, GUO J J. Self-adaptive Attention Fusion for Multimodal Aspect-based Sentiment Analysis[J]. Math Biosci Eng, 2024, 21(1): 1305-1320. DOI: 10.3934/mbe.2024056.
[21] ZHANG T Z, ZHOU G, LU J C, et al. Text-image Semantic Relevance Identification for Aspect-based Multimodal Sentiment Analysis[J]. PeerJ Comput Sci, 2024, 10: e1904. DOI: 10.7717/peerj-cs.1904.
[22] CHEN T, BORTH D, DARRELL T, et al. DeepSentiBank: Visual Sentiment Concept Classification with Deep Convolutional Neural Networks[EB/OL]. (2014-01-01) [2025-12-11].
[23] DOZAT T, MANNING C D. Deep Biaffine Attention for Neural Dependency Parsing[EB/OL]. (2016-01-01) [2025-12-11].
[24] CAMBRIA E, LIU Q, DECHERCHI S, et al. SenticNet 7: A Commonsense-based Neurosymbolic AI Framework for Explainable Sentiment Analysis[C]//Proceedings of the Thirteenth Language Resources and Evaluation Conference. Marseille, France: European Language Resources Association, 2022: 3829-3839. DOI: 10.20944/preprints202001.0163.v1.
[25] TSAI Y H, BAI S J, LIANG P P, et al. Multimodal Transformer for Unaligned Multimodal Language Sequences[J]. Proc Conf Assoc Comput Linguist Meet, 2019, 2019: 6558-6569. DOI: 10.18653/v1/p19-1656.
ZHANG Y J, GAN W Y, XIE B H, et al. Few-shot Object Detection Integrating Multi-scale Feature and Attention[J]. J Chin Comput Syst, 2025, 46(3): 689-696. DOI: 10.20009/j.cnki.21-1106/TP.2023-0509.
[28] WOO S, PARK J, LEE J Y, et al. CBAM: Convolutional Block Attention Module[C]//Proceedings of the European Conference on Computer Vision (ECCV). Cham: Springer, 2018: 3-19.
[29] PONTIKI M, GALANIS D, PAVLOPOULOS J, et al. SemEval-2014 Task 4: Aspect Based Sentiment Analysis[C]//Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014). Stroudsburg, PA, USA: ACL, 2014: 27-35. DOI: 10.3115/v1/s14-2004.
[30] XU N, MAO W J, CHEN G D. Multi-interactive Memory Network for Aspect Based Multimodal Sentiment Analysis[J]. Proc AAAI Conf Artif Intell, 2019, 33(1): 371-378. DOI: 10.1609/aaai.v33i01.3301371.
[31] YU J F, JIANG J. Adapting BERT for Target-oriented Multimodal Sentiment Classification[C]//Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence. Macau, China: Morgan Kaufmann, 2019: 5408-5414. DOI: 10.24963/ijcai.2019/751.
[32] YANG H, ZHAO Y Y, QIN B. Face-sensitive Image-to-emotional-text Cross-modal Translation for Multimodal Aspect-based Sentiment Analysis[C]//Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing. Stroudsburg, PA, USA: ACL, 2022: 3324-3335. DOI: 10.18653/v1/2022.emnlp-main.219.
[33] YANG J, XU M Y, XIAO Y L, et al. AMIFN: Aspect-guided Multi-view Interactions and Fusion Network for Multimodal Aspect-based Sentiment Analysis[J]. Neurocomputing, 2024, 573: 127222. DOI: 10.1016/j.neucom.2023.127222.