Entity Relation Extraction from Folk Literature Texts Based on Coreference Resolution
Abstract:
Folk literature is an important part of Chinese culture and has significant research value. With the rapid development of artificial intelligence, digital technology has become an important means of restoring incomplete folk literature works and building knowledge graphs for the folk literature domain. However, folk literature texts contain many demonstrative pronouns and overlapping entity relations, which makes relation extraction from these texts difficult. To address this, a joint entity and relation extraction method based on coreference resolution, CR_RSAN, is proposed. The method uses coreference resolution to obtain the positions of demonstrative pronouns and their corresponding entities, and uses this information to design a demonstrative pronoun replacement algorithm and to adjust the sequence labeling scheme, thereby strengthening the model's ability to capture the semantic features of the text. In addition, a sequence labeling scheme that encodes entity and relation information simultaneously is used to alleviate the overlapping entity relation problem. Models from current mainstream methods are selected as baselines, and experiments are conducted on folk literature texts; compared with the baselines, CR_RSAN improves precision, recall, and F1 by 13.39, 14.29, and 14.98 percentage points, respectively.
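The abstract does not give the details of the demonstrative pronoun replacement algorithm, so the following is only a minimal sketch of the general idea, not the authors' CR_RSAN implementation. It assumes that a coreference resolver has already produced character-offset links of the form (pronoun_start, pronoun_end, entity_start, entity_end); that input format, the function names `replace_pronouns` and `shift_span`, and the example sentence are all hypothetical. The sketch replaces each pronoun with its antecedent and returns an offset map so that existing span labels can be adjusted to the rewritten text.

```python
from typing import List, Tuple

# Hypothetical coreference output: (pronoun_start, pronoun_end,
# entity_start, entity_end), character offsets with exclusive ends.
CorefLink = Tuple[int, int, int, int]


def replace_pronouns(text: str, links: List[CorefLink]) -> Tuple[str, List[int]]:
    """Replace each pronoun span with the text of its antecedent entity.

    Returns the rewritten text and a list `old_to_new` of length
    len(text) + 1 that maps every original offset to its offset in the
    rewritten text. Offsets inside a replaced pronoun map to the start
    of the inserted entity text.
    """
    pieces: List[str] = []
    old_to_new = [0] * (len(text) + 1)
    cursor = 0   # next unread offset in the original text
    new_len = 0  # length of the rewritten text so far

    for p_start, p_end, e_start, e_end in sorted(links):
        if p_start < cursor:
            continue  # skip pronoun spans that overlap an earlier one
        # Copy the unchanged text before the pronoun.
        for i in range(cursor, p_start):
            old_to_new[i] = new_len + (i - cursor)
        pieces.append(text[cursor:p_start])
        new_len += p_start - cursor
        # Substitute the antecedent entity for the pronoun.
        entity = text[e_start:e_end]
        for i in range(p_start, p_end):
            old_to_new[i] = new_len
        pieces.append(entity)
        new_len += len(entity)
        cursor = p_end

    # Copy the remaining tail and map the end-of-text offset.
    for i in range(cursor, len(text) + 1):
        old_to_new[i] = new_len + (i - cursor)
    pieces.append(text[cursor:])
    return "".join(pieces), old_to_new


def shift_span(span: Tuple[int, int], old_to_new: List[int]) -> Tuple[int, int]:
    """Move a (start, end) label span from original to rewritten offsets."""
    return old_to_new[span[0]], old_to_new[span[1]]


# Made-up example sentence; "She" (13..16) refers to "Ashima" (0..6).
text = "Ashima sang. She left."
new_text, old_to_new = replace_pronouns(text, [(13, 16, 0, 6)])
left_span = shift_span((17, 21), old_to_new)  # span of "left" in the original
```

In this sketch the offset map plays the role of the label adjustment: any entity or relation tag defined on the original character positions can be carried over to the pronoun-free text before sequence labeling.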
Authors:
Wei Jing (魏静); Yue Kun (岳昆); Duan Liang (段亮); Wang Jiahui (王笳辉) (School of Information Science and Engineering, Key Lab of Intelligent Systems and Computing of Yunnan Province, Yunnan University, Kunming 650500, China)
Affiliation:
School of Information Science and Engineering, Yunnan University
Cite this article:
Journal of Henan Normal University (Natural Science Edition), CAS, 2024, No. 1, pp. 84-92 (9 pages)
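Funding:
Yunnan Provincial Major Science and Technology Special Project (202002AD080002); Scientific Research Fund of the Yunnan Provincial Department of Education (2002Y010); Yunnan University Graduate Research and Innovation Project (2021Y023, 2021Y174).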
Keywords:
folk literature; relation extraction; coreference resolution; attention; sequence tagging
Classification number:
TP391 [Automation and Computer Technology: Computer Application Technology]