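Abstract: This paper addresses how to build a database of semantic relations between Chinese characters and words within the National Science and Technology Infrastructure Platform, and how to use an initial semantic relation database to acquire syntactic patterns and new relations automatically. It adopts the concept of a syntactic pattern and proposes a method that uses existing relations to discover new patterns and existing patterns to discover new relations; an original model was designed accordingly and a Chinese semantic relation knowledge base system was implemented. Using this system together with natural language processing techniques, more than 200 effective relations were acquired automatically and at scale from the Sogou corpus and Baidu Baike page files, from which more than 1 000 effective new relations, such as inheritance and synonymy, were extracted. Experiments show an efficiency of about 40%, which depends mainly on the distance allowed between the query words in a relation and on the nature of the corpus itself.
Key words: natural language processing; information extraction; semantic relation extraction; syntactic pattern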
CLC number: TP301.2  Document code: A  Article ID: 1001-3695(2008)08-2295-04
Mutual-extraction between semantic relationships and
lexical patterns in natural language processing
LIANG Na1, GENG Guo-hua1, ZHOU Ming-quan2
(1. School of Information Science & Technology, Northwest University, Xi'an 710127, China; 2. College of Information Science & Technology, Beijing Normal University, Beijing 100875, China)
Abstract: This paper focuses on an automatic approach to building a semantic relationship database in the National Science and Technology Infrastructure Platform, identifying lexical patterns and extending existing semantic relationships with new ones drawn from a corpus. There are many potential relationships between words, and these relationships connect the words into a large network, so the problem is how to model this network and how to obtain relationships automatically. Using the concept of a lexical pattern, a new method is devised: new patterns are generalized from existing relationships, and new relationships are generalized from existing patterns. A Chinese semantic relationship knowledge base system is designed and implemented. Using this system and NLP technology, more than 200 effective relationships and more than 1 000 new relationships (such as inheritance and synonymy) are extracted from the Sogou corpus and Baidu Baike. The experimental results show that the precision of these relationships is around 40% and depends on the distance between the search words and on the type of articles in the corpus.......
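The mutual-extraction idea summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' system: the toy English corpus, the whitespace tokenization, the function names (`find_patterns`, `find_relations`, `bootstrap`) and the `max_gap` parameter are all assumptions made for illustration. A pattern is taken here to be the token sequence lying between the two words of a known relation, and `max_gap` stands in for the distance between the search words mentioned above.

```python
# Toy bootstrapping sketch (hypothetical names and data, not the paper's code).
CORPUS = [
    "a sparrow is a kind of bird",
    "a salmon is a kind of fish",
    "an oak is a kind of tree",
]
SEEDS = {("sparrow", "bird")}


def find_patterns(corpus, relations, max_gap=5):
    """Existing relations -> patterns: collect the tokens linking each known pair."""
    patterns = set()
    for sentence in corpus:
        tokens = sentence.split()
        for a, b in relations:
            if a not in tokens:
                continue
            i = tokens.index(a)
            if b not in tokens[i + 1:]:
                continue
            j = tokens.index(b, i + 1)
            gap = tuple(tokens[i + 1:j])
            # Keep only pairs whose distance is within the allowed window.
            if 1 <= len(gap) <= max_gap:
                patterns.add(gap)
    return patterns


def find_relations(corpus, patterns):
    """Existing patterns -> relations: pair the tokens on either side of a match."""
    relations = set()
    for sentence in corpus:
        tokens = sentence.split()
        for pattern in patterns:
            n = len(pattern)
            # k starts at 1 and stops early so both neighbours exist.
            for k in range(1, len(tokens) - n):
                if tuple(tokens[k:k + n]) == pattern:
                    relations.add((tokens[k - 1], tokens[k + n]))
    return relations


def bootstrap(corpus, seeds, rounds=3, max_gap=5):
    """Alternate the two steps until nothing new is found or rounds run out."""
    relations, patterns = set(seeds), set()
    for _ in range(rounds):
        new_patterns = find_patterns(corpus, relations, max_gap) - patterns
        patterns |= new_patterns
        new_relations = find_relations(corpus, patterns) - relations
        relations |= new_relations
        if not new_patterns and not new_relations:
            break
    return patterns, relations
```

Calling `bootstrap(CORPUS, SEEDS)` on this toy data learns one pattern from the seed pair and then applies it to the other sentences. A real system for Chinese text would need word segmentation and some filtering of noisy patterns, which this sketch leaves out.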