An Efficient Hard-Label Adversarial Attack Method Against Text Classification Models

Shilin QIU; Qihe LIU; Shijie ZHOU; Yi ZENG

doi:10.12178/1001-0548.2024295

Journal of University of Electronic Science and Technology of China, 2026, Vol. 55, Issue (1): 116-128. DOI: 10.12178/1001-0548.2024295

Computer Engineering and Applications


Efficient hard-label adversarial attacks against natural language processing models


Abstract

Assessing the adversarial robustness of natural language processing models in real-world application scenarios has drawn increasing attention to black-box adversarial attacks under the hard-label setting. However, because text is discrete, the victim model returns only limited feedback, and practical applications restrict the number of queries, existing hard-label adversarial attack methods usually require many queries and generate adversarial texts with low semantic consistency, which makes them inadequate for real-world use. To address this, an efficient hard-label adversarial attack method is proposed. The method introduces an attention mechanism in the adversarial text initialization stage and proposes two strategies for the adversarial text semantic optimization stage: semantic clustering-based synonym search and semantic gradient-based dynamic expansion synonym search. Experimental results show that the method generates natural, fluent, high-quality adversarial texts with high semantic consistency using only a small number of queries.
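To make the hard-label setting concrete, the following is a minimal Python sketch of a generic two-stage attack loop (initialize an adversarial text, then optimize it) in which the attacker sees only the predicted label and every query is counted against a budget. It is not the proposed method: the attention-based initialization and the two synonym search strategies are not reproduced here, because the abstract gives no details about them. The victim `toy_victim`, the `SYNONYMS` table, the class `HardLabelOracle`, and the function `two_stage_attack` are all hypothetical names introduced for illustration only.

```python
import random
from typing import Callable, Dict, List, Optional


class HardLabelOracle:
    """Exposes only the predicted label of a victim model and counts queries."""

    def __init__(self, predict: Callable[[List[str]], int], budget: int) -> None:
        self._predict = predict
        self.budget = budget
        self.queries = 0

    def label(self, tokens: List[str]) -> int:
        if self.queries >= self.budget:
            raise RuntimeError("query budget exhausted")
        self.queries += 1
        return self._predict(tokens)


def toy_victim(tokens: List[str]) -> int:
    """Hypothetical victim: predicts 1 (positive) only if at least two positive words appear."""
    positive = {"good", "great"}
    return 1 if sum(t in positive for t in tokens) >= 2 else 0


# Hypothetical synonym table; a real attack would use a proper synonym source.
SYNONYMS: Dict[str, List[str]] = {"good": ["fine", "decent"], "great": ["grand", "superb"]}


def two_stage_attack(
    tokens: List[str],
    oracle: HardLabelOracle,
    synonyms: Dict[str, List[str]],
    rng: random.Random,
) -> Optional[List[str]]:
    original_label = oracle.label(tokens)

    # Stage 1 (initialization): replace every word that has synonyms,
    # then check with a single query whether the label flipped.
    adv = [rng.choice(synonyms[t]) if t in synonyms else t for t in tokens]
    if oracle.label(adv) == original_label:
        return None  # initialization failed

    # Stage 2 (optimization): restore original words one by one, keeping a
    # restoration only if the text is still adversarial. Fewer changed words
    # is used here as a crude stand-in for higher semantic consistency.
    for i, tok in enumerate(tokens):
        if adv[i] != tok:
            trial = adv[:i] + [tok] + adv[i + 1:]
            if oracle.label(trial) != original_label:
                adv = trial
    return adv


oracle = HardLabelOracle(toy_victim, budget=20)
sentence = "the movie was good and great".split()
adversarial = two_stage_attack(sentence, oracle, SYNONYMS, random.Random(0))
print(adversarial, "queries used:", oracle.queries)
```

In this toy run the attack spends one query on the original text, one on the initialized text, and one per restoration attempt. The query counter is the quantity a query-efficient method tries to keep small, and the number of words left changed is only a rough proxy for the semantic consistency that the abstract refers to.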

Key words

adversarial attack / adversarial example / robustness / natural language processing / artificial intelligence

Cite this article

QIU Shilin, LIU Qihe, ZHOU Shijie, ZENG Yi. An efficient hard-label adversarial attack method against text classification models[J]. Journal of University of Electronic Science and Technology of China, 2026, 55(1): 116-128. DOI: 10.12178/1001-0548.2024295

Author information

Shilin QIU (corresponding author, shilinqiu@std.uestc.edu.cn), Qihe LIU, Shijie ZHOU, Yi ZENG: School of Information and Software Engineering, University of Electronic Science and Technology of China, Chengdu 610054, China.

Shilin QIU is a doctoral student whose research focuses on the adversarial security of artificial intelligence.

