Unsupervised Open Relation Extraction

The paper proposes a new way to re-weight word embeddings, together with a reduction step for sparse word features.

Introduction

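Distant supervision cannot capture relation types that do not already appear in an existing knowledge base. Unsupervised methods can potentially overcome this limitation, so the authors propose an unsupervised approach.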

Proposed Method

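The sentence representation is built from the types of the two named entities and from the phrase that expresses the relation between them.
For this relation phrase, the model uses re-weighted word representations along the dependency path between the two entities.
These representations are then clustered.
The model consists of four steps: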
Preprocessing
Feature Extraction (produces entity features and sentence features)
Sparse Feature Reduction (dimensionality reduction with PCA)
Relation Clustering (hierarchical agglomerative clustering, HAC)


Preprocessing

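Named entities are extracted with DBpedia Spotlight.
Assumption: each sentence contains at least two entities.
Stanford CoreNLP is used to obtain the dependency parse tree, from which the dependency path between the entities is taken.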
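As a rough illustration of what "dependency path between entities" means, the sketch below finds the path between two tokens in a dependency tree. It assumes the parse is already available as a list of head indices (root = -1); the parse in the example is written by hand, not produced by Stanford CoreNLP, and the function name `dependency_path` is hypothetical.

```python
from collections import deque


def dependency_path(heads, start, end):
    """Token indices on the path from `start` to `end` in a dependency tree.

    `heads[i]` is the index of token i's head; the root has head -1.
    """
    # Treat head-dependent arcs as undirected edges.
    adj = [[] for _ in heads]
    for dep, head in enumerate(heads):
        if head >= 0:
            adj[dep].append(head)
            adj[head].append(dep)
    # Breadth-first search from `start`, recording each node's predecessor.
    prev = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == end:
            break
        for nxt in adj[node]:
            if nxt not in prev:
                prev[nxt] = node
                queue.append(nxt)
    if end not in prev:
        return []  # no path (e.g. a malformed parse with several roots)
    # Follow predecessors back from `end`, then reverse.
    path = []
    node = end
    while node is not None:
        path.append(node)
        node = prev[node]
    return path[::-1]


# Hand-written parse of "Obama was born in Hawaii" (not CoreNLP output).
tokens = ["Obama", "was", "born", "in", "Hawaii"]
heads = [2, 2, -1, 4, 2]
print([tokens[i] for i in dependency_path(heads, 0, 4)])  # ['Obama', 'born', 'Hawaii']
```

Because a dependency parse is a tree, the path between two tokens is unique, so a plain breadth-first search is enough; the words strictly between the two entities ("born" here) are the ones that carry the relation.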

Feature Extraction

For each sentence, the features are:

entity types of the two named entities
dependency path between the entities
word embeddings
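A minimal sketch of how the symbolic features (entity types and dependency path) could be turned into a binary matrix. The dictionary keys, the feature-name scheme and the DBpedia-style type names are illustrative assumptions, not the paper's exact feature templates. Even on three sentences, 8 of the 10 columns are active in only one row, which is the sparsity problem the reduction step addresses.

```python
import numpy as np


def sparse_features(instance):
    """Symbolic features for one sentence.

    `instance` uses hypothetical keys: 'e1_type', 'e2_type' and 'path'
    (the words on the dependency path between the two entities).
    """
    feats = {
        "e1_type=" + instance["e1_type"],
        "e2_type=" + instance["e2_type"],
        "path=" + " ".join(instance["path"]),  # whole path as one feature
    }
    feats.update("path_word=" + w for w in instance["path"])  # individual path words
    return feats


def vectorize(instances):
    """Binary sentence-by-feature matrix plus the sorted feature vocabulary."""
    feature_sets = [sparse_features(x) for x in instances]
    vocab = sorted(set().union(*feature_sets))
    index = {f: i for i, f in enumerate(vocab)}
    X = np.zeros((len(instances), len(vocab)))
    for row, feats in enumerate(feature_sets):
        for f in feats:
            X[row, index[f]] = 1.0
    return X, vocab


data = [
    {"e1_type": "Person", "e2_type": "Place", "path": ["born"]},
    {"e1_type": "Person", "e2_type": "Place", "path": ["died"]},
    {"e1_type": "Person", "e2_type": "Organisation", "path": ["works", "for"]},
]
X, vocab = vectorize(data)
print(X.shape)  # (3, 10)
```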

Not every word contributes equally to expressing the relation, so the authors propose re-weighting the pre-trained word embeddings. (Open question: how does this compare with updating, i.e. fine-tuning, the word vectors?)
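The notes do not spell out the exact re-weighting formula, so the sketch below uses an assumed scheme: each word vector is weighted by the word's IDF, multiplied by a larger factor (`on_path`) if the token lies on the dependency path between the entities and a smaller one (`off_path`) otherwise, and the sentence vector is the weighted average. The parameter values, function names and toy 2-dimensional embeddings are hypothetical.

```python
import math

import numpy as np


def idf(documents):
    """idf(w) = log(N / df(w)) over a list of tokenized sentences."""
    df = {}
    for doc in documents:
        for w in set(doc):
            df[w] = df.get(w, 0) + 1
    return {w: math.log(len(documents) / c) for w, c in df.items()}


def reweighted_embedding(tokens, path_indices, emb, idf_weights,
                         on_path=1.0, off_path=0.1):
    """Weighted average of word vectors (assumed weighting scheme).

    weight(word) = idf(word) * (on_path if the token is on the dependency
    path between the entities else off_path). Words missing from `emb`
    are skipped.
    """
    dim = len(next(iter(emb.values())))
    total = np.zeros(dim)
    weight_sum = 0.0
    for i, w in enumerate(tokens):
        if w not in emb:
            continue
        factor = on_path if i in path_indices else off_path
        weight = idf_weights.get(w, 0.0) * factor
        total += weight * np.asarray(emb[w], dtype=float)
        weight_sum += weight
    return total / weight_sum if weight_sum > 0 else total


corpus = [
    ["Obama", "was", "born", "in", "Hawaii"],
    ["Einstein", "was", "born", "in", "Ulm"],
    ["Curie", "died", "in", "Paris"],
]
emb = {"was": [0.0, 1.0], "born": [1.0, 0.0], "in": [0.0, 1.0]}  # toy vectors
weights = idf(corpus)
# Token 2 ("born") lies on the path between "Obama" and "Hawaii".
print(reweighted_embedding(corpus[0], {2}, emb, weights))
print(reweighted_embedding(corpus[0], {2}, emb, weights, on_path=1.0, off_path=1.0))
```

The first call returns roughly [0.91, 0.09], dominated by the path word "born"; with `on_path=off_path=1.0` the same function reduces to a plain IDF-weighted average and returns [0.5, 0.5]. Note that "in" occurs in every sentence, so its IDF is 0 and it contributes nothing in either case.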

Sparse Feature Reduction

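In supervised training, feature sparsity is less of a concern, because the algorithm learns from the labeled training data which features matter. In unsupervised learning, sparsity has to be handled explicitly, so PCA is applied to reduce the features (before they are merged).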
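A sketch of "reduce first, then merge", assuming the sparse symbolic features and the dense embedding features are already separate matrices with one row per sentence. PCA is implemented directly with NumPy's SVD to keep the example dependency-free; the function names and `n_components=2` are arbitrary choices for illustration, not values from the paper.

```python
import numpy as np


def pca(X, n_components):
    """Project the rows of X onto its top `n_components` principal components."""
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)
    # Rows of vt are the principal directions, sorted by decreasing singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


def merge_features(sparse_X, dense_X, n_components):
    """Reduce the sparse block with PCA first, then concatenate the dense block."""
    reduced = pca(sparse_X, n_components)
    return np.hstack([reduced, np.asarray(dense_X, dtype=float)])


# Toy data: 4 sentences, 6 binary symbolic features, 3-d sentence embeddings.
sparse_X = np.array([
    [1, 0, 0, 1, 0, 0],
    [1, 0, 0, 0, 1, 0],
    [0, 1, 0, 0, 0, 1],
    [0, 1, 1, 0, 0, 1],
])
dense_X = np.arange(12, dtype=float).reshape(4, 3)
merged = merge_features(sparse_X, dense_X, n_components=2)
print(merged.shape)  # (4, 5)
```

Reducing the sparse block on its own keeps the many near-empty columns from drowning out the dense embedding dimensions once the two are concatenated.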

Relation Clustering

HAC is used with Ward's linkage criterion.
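A naive implementation of HAC with Ward's criterion, for illustration only: it recomputes every pairwise merge cost at each step, so it only suits small toy inputs. The function name and the toy points are hypothetical.

```python
import numpy as np


def ward_hac(X, n_clusters):
    """Agglomerative clustering with Ward's criterion (naive version).

    Repeatedly merges the two clusters A, B with the smallest increase in
    total within-cluster sum of squares,
        delta(A, B) = |A| * |B| / (|A| + |B|) * ||mean(A) - mean(B)||^2,
    until `n_clusters` clusters remain. Returns one label per row of X.
    """
    X = np.asarray(X, dtype=float)
    clusters = [[i] for i in range(len(X))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                ca, cb = clusters[a], clusters[b]
                diff = X[ca].mean(axis=0) - X[cb].mean(axis=0)
                cost = len(ca) * len(cb) / (len(ca) + len(cb)) * float(diff @ diff)
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
    labels = np.empty(len(X), dtype=int)
    for label, members in enumerate(clusters):
        labels[members] = label
    return labels


points = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]])
print(ward_hac(points, n_clusters=2))  # [0 0 1 1]
```

For real data, SciPy provides the same criterion through `scipy.cluster.hierarchy.linkage(X, method="ward")`, and `fcluster(Z, t=k, criterion="maxclust")` cuts the resulting tree into at most k clusters.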

Evaluation

Feature	F1 (%)
TF-IDF	12.2
Word-Emb.	7.4
IDF-Emb.	10.3
Dependency Re-Weighted Emb.	19.5

Table 1: Comparison between different features for clustering.

Model	F1 (%)
Var. Autoencoder	35.8
Rel-LDA	29.6
HAC	28.3
Ours	41.6

Table 2: Pairwise F1 (%) scores of different models on the test set of the NYT-FB dataset.
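Table 2 reports pairwise F1. The sketch below uses a common definition of pairwise precision and recall for clusterings; the paper's exact evaluation protocol on NYT-FB may differ in details, and the relation labels in the example are made up.

```python
from itertools import combinations


def pairwise_f1(predicted, gold):
    """Pairwise precision, recall and F1 of a clustering against gold labels.

    A pair of instances is predicted positive if both fall in the same cluster,
    and is a true positive if they also share the same gold relation.
    """
    pred_pairs, gold_pairs = set(), set()
    for i, j in combinations(range(len(gold)), 2):
        if predicted[i] == predicted[j]:
            pred_pairs.add((i, j))
        if gold[i] == gold[j]:
            gold_pairs.add((i, j))
    correct = len(pred_pairs & gold_pairs)
    precision = correct / len(pred_pairs) if pred_pairs else 0.0
    recall = correct / len(gold_pairs) if gold_pairs else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


predicted = [0, 0, 0, 1, 1]
gold = ["born_in", "born_in", "died_in", "died_in", "died_in"]
print(pairwise_f1(predicted, gold))  # (0.5, 0.5, 0.5)
```

Here the clustering proposes 4 same-cluster pairs and the gold labels define 4 same-relation pairs; only (0, 1) and (3, 4) appear in both, giving precision and recall of 0.5.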
