To address the problem that traditional zero-shot stance detection (ZSSD) methods rely on large amounts of labeled data for model fine-tuning, a method based on in-context learning and semantic retrieval enhancement is proposed, which improves the stance reasoning ability of large language models (LLMs) on unseen targets without tuning model parameters. Semantically similar examples are retrieved from existing labeled data and, following the in-context learning paradigm of LLMs, the relevant examples and the test text are formatted into an input prompt together with a task description, enabling the model to perform stance classification in a more informative context. Experiments demonstrate that this method improves the ZSSD performance of the Flan-T5 model, significantly outperforming the original model on the fine-grained task of the SEM16 dataset. The results indicate that this method deepens the LLM's understanding of the task and encourages it to draw on the relevant knowledge in the examples to better comprehend the test text, thereby enabling more accurate inference of stances toward unseen targets.
Recently, advances in the foundational capabilities of large language models (LLMs), such as semantic understanding and instruction following, have opened a new methodological route for zero-shot stance detection. Prior work has designed prompts that let ChatGPT infer the stance toward unseen targets [12]. Although this achieves respectable performance on some targets, it ignores the examples available in existing stance-annotated data, which to some extent limits the LLM's ability to draw on related examples when inferring the stance toward unseen targets in ZSSD. Through in-context learning (ICL) [13], relevant examples can not only deepen the LLM's understanding of the task, but also help it extract relevant knowledge from those examples, allowing it to understand the test text more accurately and infer its stance toward the given target.
To illustrate this idea, consider the following example. A test text for the target "Hillary Clinton" reads: "so when all you brave patriots stop hillary, who you going to replace her with, jeb bush? lol". To judge the stance it expresses toward "Hillary Clinton", one needs to know as much as possible about the person "jeb bush" mentioned in the text. The annotated data for other targets contains a text that is semantically similar to the test text: "gop plan to stop trump? lol maybe they can put up loser candidates like romney, mccain and jeb!". From this related example, the knowledge that jeb bush is a losing Republican candidate can be obtained, which helps the LLM infer that the test text's stance toward "Hillary Clinton" is favor.
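To make the retrieval step concrete, the following is a minimal sketch assuming a Sentence-BERT encoder [17] accessed through the sentence-transformers library; the checkpoint name, the labeled pool, and the helper retrieve_examples are illustrative assumptions, not the released code of this work.

```python
from sentence_transformers import SentenceTransformer, util

# Illustrative pool of stance-annotated examples from other targets:
# (target, text, stance label). A real pool would be far larger.
labeled_pool = [
    ("Donald Trump",
     "gop plan to stop trump? lol maybe they can put up loser candidates "
     "like romney, mccain and jeb!",
     "against"),
]

# Any pretrained Sentence-BERT checkpoint could serve here; this one is an assumption.
encoder = SentenceTransformer("all-MiniLM-L6-v2")

def retrieve_examples(test_text, k=1):
    """Return the k labeled examples most semantically similar to test_text."""
    query_emb = encoder.encode(test_text, convert_to_tensor=True)
    pool_embs = encoder.encode([text for _, text, _ in labeled_pool],
                               convert_to_tensor=True)
    scores = util.cos_sim(query_emb, pool_embs)[0]   # cosine similarities
    top = scores.topk(min(k, len(labeled_pool))).indices
    return [labeled_pool[i] for i in top]

test_text = ("so when all you brave patriots stop hillary, "
             "who you going to replace her with, jeb bush? lol")
print(retrieve_examples(test_text, k=1))
```

In practice the pool embeddings would be computed once and cached; they are recomputed inside the function here only to keep the sketch short.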
To generate a clear prompt, a concise task description is first given to guide the LLM in stance detection: what's the attitude of the tweet delimited by triple quotes to the target "<target>". select from "favor, against or neutral". Here, <target> denotes the specific stance target, and the three classes favor, against, and neutral are specified for stance classification. Then, as shown in Eq. (1), the previously selected example set E is formatted, based on the above task description, into the in-context demonstrations C. Specifically, for each example in E, the following steps are performed: 1) insert the given stance target into the task description; 2) place the longer user text on its own after the task description, delimited by triple quotes to avoid semantic confusion; 3) append the example's stance label as a parenthetical remark. This completes the formatting of that example; a minimal code sketch follows.
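The sketch below implements steps 1)-3) and queries Flan-T5 [16] through the Hugging Face transformers library; the exact parenthetical label wording and the checkpoint size are assumptions for illustration, since the text above fixes only the template and the three class names.

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Task description from the text; {target} is filled per example (step 1).
TASK = ('what\'s the attitude of the tweet delimited by triple quotes to the '
        'target "{target}". select from "favor, against or neutral"')

def format_example(target, text, label=None):
    prompt = TASK.format(target=target)      # 1) insert the stance target
    prompt += f'\n"""{text}"""'              # 2) triple-quoted tweet, placed separately
    if label is not None:
        prompt += f"\n({label})"             # 3) label as a parenthetical (wording assumed)
    return prompt

def build_prompt(examples, test_target, test_text):
    """Concatenate the demonstrations C with the unlabeled test instance."""
    demos = [format_example(t, x, y) for t, x, y in examples]
    return "\n\n".join(demos + [format_example(test_target, test_text)])

examples = [  # retrieved demonstrations, e.g. from the retrieval sketch above
    ("Donald Trump",
     "gop plan to stop trump? lol maybe they can put up loser candidates "
     "like romney, mccain and jeb!",
     "against"),
]
test_text = ("so when all you brave patriots stop hillary, "
             "who you going to replace her with, jeb bush? lol")

# Checkpoint size is an assumption; the experiments use Flan-T5 models.
tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-large")
model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-large")

inputs = tokenizer(build_prompt(examples, "Hillary Clinton", test_text),
                   return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=5)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))  # e.g. "favor"
```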
MOHAMMAD S, KIRITCHENKO S, SOBHANI P, et al. SemEval-2016 task 6: detecting stance in tweets[C]∥Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016). Stroudsburg, USA: ACL, 2016. DOI: 10.18653/v1/S16-1003.
CAO R, LUO X, XI Y, et al. Stance detection for online public opinion awareness: an overview[J]. International Journal of Intelligent Systems, 2022, 37(12): 11944-11965.
[4] AUGENSTEIN I, ROCKTÄSCHEL T, VLACHOS A, et al. Stance detection with bidirectional conditional encoding[C]∥Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing. Stroudsburg, USA: ACL, 2016: 876-885.
XU C, PARIS C, NEPAL S, et al. Cross-target stance classification with self-attention networks[C]∥Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers). Stroudsburg, USA: ACL, 2018: 778-783.
[7] ALLAWAY E, MCKEOWN K. Zero-shot stance detection: a dataset and model using generalized topic representations[C]∥Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing. Stroudsburg, USA: ACL, 2020: 8913-8931.
[8] LIU R, LIN Z, TAN Y, et al. Enhancing zero-shot and few-shot stance detection with commonsense knowledge graph[C]∥Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021. Stroudsburg, USA: ACL, 2021: 3152-3157.
[9] LIANG B, CHEN Z, GUI L, et al. Zero-shot stance detection via contrastive learning[C]∥Proceedings of the ACM Web Conference. New York, USA: ACM, 2022: 2738-2747.
[10] JIANG Y, GAO J, SHEN H, et al. Zero-shot stance detection via multi-perspective contrastive learning with unlabeled data[J]. Information Processing & Management, 2023, 60(4): 3361-3375.
[11] ZHU Q, LIANG B, SUN J, et al. Enhancing zero-shot stance detection via targeted background knowledge[C]∥Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval. New York, USA: ACM, 2022: 2070-2075.
[12] ZHANG B, DING D J, JING L W, et al. How would stance detection techniques evolve after the launch of ChatGPT?[EB/OL]. (2023-04-10) [2024-03-02].
[13] LIU J C, SHEN D H, ZHANG Y Z, et al. What makes good in-context examples for GPT-3?[EB/OL]. (2021-01-17) [2024-03-01].
BROWN T, MANN B, RYDER N, et al. Language models are few-shot learners[C]∥Proceedings of the 34th International Conference on Neural Information Processing Systems. Red Hook, USA: Curran Associates Inc., 2020: 1877-1901.
[16] CHUNG H W, HOU L, LONGPRE S, et al. Scaling instruction-finetuned language models[EB/OL]. (2022-12-06) [2024-03-02].
[17] REIMERS N, GUREVYCH I. Sentence-BERT: sentence embeddings using Siamese BERT-networks[EB/OL]. (2019-08-27) [2024-03-12].