An Introduction to String Re-Wr
Fan Bu1, Hang Li2
1,3 State Key Laboratory of Inte
1,3 Tsinghua National Laboratory f
1,3 Department of Computer Sci. an
2 Huawei Noah’s
1 bufan0000@gmail.com, 2 3 zxy

Abstract
Learning for sentence re-writing is a fundamental task in natural language processing and information retrieval. In this paper, we propose a new class of kernel functions, referred to as string re-writing kernels, to address the problem. A string re-writing kernel measures the similarity between two pairs of strings. It can capture the lexical and structural similarity between sentence pairs without the need to construct syntactic trees. We further propose an instance of the string re-writing kernel which can be computed efficiently. Experimental results on benchmark datasets show that our method achieves results comparable to state-of-the-art methods on two sentence re-writing learning tasks: paraphrase identification and recognizing textual entailment.

1 Introduction
Learning for sentence re-writing is a fundamental task in natural language processing and information retrieval, which includes paraphrasing, textual entailment, and transformation between query and document title in search. The key question here is how to represent the re-writing of sentences. In previous research on sentence re-writing learning, such as paraphrase identification and recognizing textual entailment, most representations are based on the lexicons [Zhang and Patrick, 2005; Lintean and Rus, 2011; de Marneffe et al., 2006] or the syntactic trees [Das and Smith, 2009; Heilman and Smith, 2010] of the sentence pairs. Motivated by previous work on paraphrase generation [Lin and Pantel, 2001; Barzilay and Lee, 2003], we represent a re-writing of a sentence by all possible re-writing rules that can be applied to it. For example, in Fig. 1, (A) is one re-writing rule that can be applied to the sentence re-writing (B).
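To make the rule-based representation concrete, the following is a minimal sketch of how two re-writings (sentence pairs) could be compared through the rules they share. It is not the kernel proposed in this paper: as an assumption made purely for illustration, a "rule" here is simply a pair (source n-gram, target n-gram) with no wildcards, and the names ngrams, rule_features, and toy_rewriting_kernel are hypothetical.

```python
from collections import Counter
from itertools import product


def ngrams(tokens, max_n):
    """Return all contiguous n-grams of length 1..max_n as tuples."""
    return [
        tuple(tokens[i:i + n])
        for n in range(1, max_n + 1)
        for i in range(len(tokens) - n + 1)
    ]


def rule_features(source, target, max_n=2):
    """Count toy rules (source n-gram -> target n-gram) for one re-writing.

    Assumption: a rule is just a pair of n-grams without wildcards,
    a deliberate simplification for illustration.
    """
    src = ngrams(source.lower().split(), max_n)
    tgt = ngrams(target.lower().split(), max_n)
    return Counter(product(src, tgt))


def toy_rewriting_kernel(pair_a, pair_b, max_n=2):
    """Inner product of the rule-count vectors of two re-writings."""
    feats_a = rule_features(*pair_a, max_n=max_n)
    feats_b = rule_features(*pair_b, max_n=max_n)
    # Counter returns 0 for rules that do not occur in pair_b.
    return sum(count * feats_b[rule] for rule, count in feats_a.items())


if __name__ == "__main__":
    # Made-up sentence pairs, not taken from the paper's figures or datasets.
    p1 = ("Tom wrote the report", "the report was written by Tom")
    p2 = ("Ann wrote the letter", "the letter was written by Ann")
    print(toy_rewriting_kernel(p1, p2))
```

Because the value is an inner product of explicit count vectors, the function is symmetric and positive semi-definite by construction. Enumerating every rule explicitly, as done here, grows quickly with sentence length, so the sketch is only suitable for short inputs.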
Specifically, we propose a new class of kernel functions, called the string re-writing kernel (SRK), which defines the similarity between two re-writings (pairs) of strings as the inner product between them in the feature space induced by all the

* The paper on which this extended abstract is based was the recipient of the best student paper award of the 50th Annual Meeting of the Association for Computational Linguistics [Bu et al., 201