
GGNN_text_summarizer


Structured Neural Summarization

A repository with the code for the paper of the same title. The experiments are built on the general-purpose graph neural network library OpenGNN, which you can install by following its README.md.

Experiments are based around the train_and_eval.py script. Besides the main experiments, this repo also contains the following folders:

  • Parsers: A collection of scripts to parse and process various datasets into the format used by the experiments

  • Data: A collection of scripts and utility functions to handle and analyse the formatted data

  • Models: Bash script wrappers around the main script with model/hyperparameter combinations for different experiments

Getting Started

As an example, we will show how to run a sequenced-graph-to-sequence model with attention on the CNN/DailyMail dataset. This assumes the processed data is located in /data/naturallanguage/cnn_dailymail/split/{train,valid,test}/{inputs,targets}.jsonl.gz.

For instructions on how to process the data into this format, see the corresponding parsers subfolder.
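If you want to inspect what the processed data looks like, the following is a minimal sketch, assuming each line of inputs.jsonl.gz is a JSON object with node_labels and edges fields (an assumption based on the --field_name flags used by the vocabulary commands below; adjust the path to your setup):

import gzip
import json

# Hypothetical path matching the example layout above.
path = "/data/naturallanguage/cnn_dailymail/split/train/inputs.jsonl.gz"

with gzip.open(path, "rt", encoding="utf-8") as f:
    record = json.loads(next(f))  # first graph in the training split

# Assumed field names, taken from the vocabulary-building flags below.
print("fields:", list(record.keys()))
print("first node labels:", record.get("node_labels", [])[:10])
print("first edges:", record.get("edges", [])[:5])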

Start by building vocabularies for the node and edge labels on the input side and for the tokens on the output side by running

ognn-build-vocab --field_name node_labels \
                 --save_vocab /data/naturallanguage/cnn_dailymail/node.vocab \
                 /data/naturallanguage/cnn_dailymail/split/train/inputs.jsonl.gz
ognn-build-vocab --no_pad_token --field_name edges --string_index 0 \
                 --save_vocab /data/naturallanguage/cnn_dailymail/edge.vocab \
                 /data/naturallanguage/cnn_dailymail/split/train/inputs.jsonl.gz
ognn-build-vocab --with_sequence_tokens \
                 --save_vocab /data/naturallanguage/cnn_dailymail/output.vocab \
                 /data/naturallanguage/cnn_dailymail/split/train/targets.jsonl.gz

Then run

python train_and_eval.py

This will create the model directory cnndailymail_summarizer, which contains TensorFlow checkpoint and event files that can be monitored in TensorBoard.

We can also directly pass the file we wish to run inference on by running

python train_and_eval.py --infer_source_file /data/naturallanguage/cnn_dailymail/split/test/inputs.jsonl.gz \
                         --infer_predictions_file /data/naturallanguage/cnn_dailymail/split/test/predictions.jsonl
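Before scoring, it can help to check that the predictions and reference summaries line up. A small sanity-check sketch, assuming both files hold one summary per line in the same order as the test inputs (if the lines are JSON-encoded strings, decode them with json.loads first):

import gzip

# Assumed file layout: one summary per line, same order in both files.
refs_path = "/data/naturallanguage/cnn_dailymail/split/test/summaries.jsonl.gz"
preds_path = "/data/naturallanguage/cnn_dailymail/split/test/predictions.jsonl"

with gzip.open(refs_path, "rt", encoding="utf-8") as f:
    references = [line.strip() for line in f]
with open(preds_path, encoding="utf-8") as f:
    predictions = [line.strip() for line in f]

assert len(references) == len(predictions), (len(references), len(predictions))
print(predictions[0][:200])  # eyeball the first prediction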

Then, to print the metrics on the predictions, run

python rouge_evaluator /data/naturallanguage/cnn_dailymail/split/test/summaries.jsonl.gz \
                       /data/naturallanguage/cnn_dailymail/split/test/predictions.jsonl
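The repo's rouge_evaluator is the reference scoring tool. As a rough cross-check, the independent rouge-score package can compute similar numbers; the sketch below assumes each line is a plain-text summary (decode JSON-encoded lines first if needed):

import gzip

from rouge_score import rouge_scorer  # pip install rouge-score

# Independent cross-check of ROUGE-L; the repo's rouge_evaluator remains the
# reference implementation and may use different preprocessing.
scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"], use_stemmer=True)

with gzip.open("/data/naturallanguage/cnn_dailymail/split/test/summaries.jsonl.gz",
               "rt", encoding="utf-8") as f:
    references = [line.strip() for line in f]
with open("/data/naturallanguage/cnn_dailymail/split/test/predictions.jsonl",
          encoding="utf-8") as f:
    predictions = [line.strip() for line in f]

scores = [scorer.score(ref, pred)["rougeL"].fmeasure
          for ref, pred in zip(references, predictions)]
print("mean ROUGE-L F1:", sum(scores) / len(scores))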




