summarization-dpp-capsnet

2020-03-27


Improving the Similarity Measure of Determinantal Point Processes for Extractive Multi-Document Summarization

We provide the source code for the paper "Improving the Similarity Measure of Determinantal Point Processes for Extractive Multi-Document Summarization", accepted at ACL'19. If you find the code useful, please cite the following paper.

@inproceedings{cho-lebanoff-foroosh-liu:2019,
 Author = {Sangwoo Cho and Logan Lebanoff and Hassan Foroosh and Fei Liu},
 Title = {Improving the Similarity Measure of Determinantal Point Processes for Extractive Multi-Document Summarization},
 Booktitle = {Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics (ACL)},
 Year = {2019}}

This repository contains the code for a similarity measure network built on a Capsule network.

Dependencies

This code was developed in the following environment:

Python 2.7
Keras 2.2.4
TensorFlow 1.12.0 backend

Install the dependencies with:

$ pip install -r requirements.txt

Train and evaluate on the CNN/DM summary pair dataset

Set up directory for training/testing data

$ git clone https://github.com/sangwoo3/summarization-dpp-capsnet.git && cd summarization-dpp-capsnet
$ mkdir data && cd data

Download the data

Download the CNN/DM summary pair dataset from HERE and extract it under the /data directory.

This summary dataset is pre-processed with the 50k most frequent vocabulary words in the CNN/DM summary pair dataset. The label is 1 for a positive sentence pair and 0 for a negative pair. A positive pair consists of a summary sentence and its most similar sentence in the source document, that is, the source sentence that yields the highest ROUGE scores. A negative pair consists of the same summary sentence and a random sentence from the same document.
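The pairing rule above can be illustrated with a minimal sketch. It is not the repository's preprocess.py: the function names `unigram_f1` and `make_pairs` are hypothetical, a simple unigram-overlap F1 stands in for the actual ROUGE scoring, and excluding the positive sentence when drawing the negative one is an assumption.

```python
import random
from collections import Counter


def unigram_f1(candidate, reference):
    """Unigram-overlap F1 on whitespace tokens (a rough stand-in for ROUGE)."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / float(sum(cand.values()))
    recall = overlap / float(sum(ref.values()))
    return 2 * precision * recall / (precision + recall)


def make_pairs(summary_sentence, document_sentences, rng):
    """Return (summary, source, label) tuples: one positive, one negative."""
    # Positive pair: the source sentence scoring highest against the summary.
    best = max(range(len(document_sentences)),
               key=lambda i: unigram_f1(document_sentences[i], summary_sentence))
    pairs = [(summary_sentence, document_sentences[best], 1)]
    # Negative pair: a random sentence from the same document.
    # Assumption: the positive sentence itself is excluded from the draw.
    others = [i for i in range(len(document_sentences)) if i != best]
    if others:
        pairs.append((summary_sentence, document_sentences[rng.choice(others)], 0))
    return pairs


doc = [
    "The city council approved the new budget on Monday.",
    "Residents gathered outside to protest the decision.",
    "Rain is expected later this week.",
]
pairs = make_pairs("The council approved the budget.", doc, random.Random(0))
```

Here the first document sentence shares the most words with the summary sentence, so it forms the positive pair (label 1), and one of the remaining two sentences is drawn for the negative pair (label 0).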

Download the GloVe word vectors for the 50k vocabulary from HERE and place them under the /data directory.

The 6B-token, 300-dimensional GloVe word vectors are used (LINK).
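As an illustration of how such vectors are typically consumed, the sketch below parses lines in the standard GloVe plain-text layout (a word followed by its space-separated values) and keeps only words in a given vocabulary. It assumes the downloaded file uses that layout; the function name `load_glove` is hypothetical and this is not the repository's loader.

```python
def load_glove(lines, vocab):
    """Parse GloVe text lines ("word v1 v2 ... vN") for the words in vocab."""
    vectors = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        word, values = parts[0], parts[1:]
        if word in vocab:
            vectors[word] = [float(v) for v in values]
    return vectors


# Tiny in-memory stand-in for a GloVe file (3-d here; the real vectors are 300-d).
sample = [
    "the 0.1 0.2 0.3",
    "summary -0.5 0.0 0.25",
    "unused 1.0 1.0 1.0",
]
embeddings = load_glove(sample, vocab={"the", "summary"})
```

With a real file, an open file handle can be passed as `lines`, since iterating over it yields one line at a time.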

If you want the raw CNN/DM summary dataset, download it from HERE.

This data contains candidate summary sentences for each document. It is pre-processed with the preprocess.py file to generate the CNN/DM summary pair dataset described above.

Training

$ python main_Capsnet.py

Testing

$ python main_Capsnet.py --testing

Testing on STS dataset

$ python main_Capsnet.py --testing --test_mode STS

Pre-trained Model

Download the pre-trained model from HERE and place it under the /result/capnet_sim directory.

/result/capnet_sim is the default directory for training results.

Download the model fine-tuned on the STS dataset from HERE.

This model is trained on the CNN/DM summary pair dataset and then fine-tuned on STS.
It can be used to evaluate STS prediction accuracy.

System summary

We provide our best system summaries for DUC04 and TAC11 in the system_summary directory. They were generated with DPP. Due to licensing restrictions, we do not provide the DPP code or the multi-document datasets. Please refer to the DPP code, and request access to the DUC 03/04 and TAC 08/09/10/11 datasets, which can be downloaded once your request is approved.
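Since the DPP code is not included here, the following is only a rough sketch of how a pairwise similarity measure can feed a DPP-based sentence selector. It is not the authors' implementation. It assumes the common quality/similarity decomposition L_ij = q_i * S_ij * q_j and uses plain greedy selection on the determinant; the names `determinant` and `greedy_dpp` and the toy scores are hypothetical.

```python
def determinant(matrix):
    """Determinant via Gaussian elimination with partial pivoting."""
    m = [list(row) for row in matrix]
    n = len(m)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            return 0.0
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det
        det *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= factor * m[col][c]
    return det


def greedy_dpp(quality, similarity, k):
    """Greedily pick up to k items, each time maximizing det(L_Y)."""
    n = len(quality)
    # Assumed kernel: L_ij = q_i * S_ij * q_j (quality times similarity).
    L = [[quality[i] * similarity[i][j] * quality[j] for j in range(n)]
         for i in range(n)]
    selected = []
    for _ in range(k):
        best_item, best_det = None, 0.0
        for i in range(n):
            if i in selected:
                continue
            idx = selected + [i]
            sub = [[L[a][b] for b in idx] for a in idx]
            d = determinant(sub)
            if d > best_det:
                best_item, best_det = i, d
        if best_item is None:
            break  # no remaining candidate gives a positive determinant
        selected.append(best_item)
    return selected


# Toy scores: sentences 0 and 1 are near-duplicates, sentence 2 is different.
quality = [1.0, 0.9, 0.5]
similarity = [
    [1.0, 0.95, 0.1],
    [0.95, 1.0, 0.1],
    [0.1, 0.1, 1.0],
]
chosen = greedy_dpp(quality, similarity, k=2)
```

In this toy case the selector picks sentence 0 first and then sentence 2 rather than the higher-quality but redundant sentence 1, which is the diversity effect a better similarity measure is meant to sharpen.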

License

This project is licensed under the BSD License - see the LICENSE.md file for details.
