fever-allennlp
This is an implementation of the FEVER baseline system. It is a heterogeneous pipeline with two stages: given a short sentence (called a claim), the system first finds evidence (information retrieval) and then assesses whether the claim is Supported or Refuted given that evidence (natural language inference).
The codebase contains three components: an end-to-end evidence retrieval system, an NLI classifier, and a training-data sampling script that generates new instances for training the NLI model.
The information retrieval system uses Facebook's DrQA implementation for TF-IDF based document similarity. The DrQA script runs in two phases: first, using the Wikipedia index, it selects the k=5 documents most similar to the claim. The sentences of those documents are then used to construct a new index over sentences, from which the l=5 sentences nearest to the claim are returned.
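To make the two-phase retrieval concrete, here is a minimal sketch that approximates it with scikit-learn's TfidfVectorizer. It is not the DrQA code used by this repository; the documents structure and the two_stage_retrieve helper are illustrative assumptions.

import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

def two_stage_retrieve(claim, documents, k=5, l=5):
    """Toy two-stage retrieval: pick the k documents closest to the claim,
    then the l sentences from those documents closest to the claim.
    `documents` maps a page title to its list of sentences (illustrative)."""
    titles = list(documents)
    doc_texts = [" ".join(documents[t]) for t in titles]

    # Stage 1: document-level TF-IDF similarity against the claim.
    doc_vec = TfidfVectorizer(ngram_range=(1, 2))
    doc_matrix = doc_vec.fit_transform(doc_texts)
    doc_scores = (doc_matrix @ doc_vec.transform([claim]).T).toarray().ravel()
    top_docs = [titles[i] for i in np.argsort(-doc_scores)[:k]]

    # Stage 2: build a sentence-level index over the top documents only.
    sentences = [(t, i, s) for t in top_docs for i, s in enumerate(documents[t])]
    sent_vec = TfidfVectorizer(ngram_range=(1, 2))
    sent_matrix = sent_vec.fit_transform([s for _, _, s in sentences])
    sent_scores = (sent_matrix @ sent_vec.transform([claim]).T).toarray().ravel()
    return [sentences[i] for i in np.argsort(-sent_scores)[:l]]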
We provide three NLI models: Decomposable Attention, ESIM, and ESIM with ELMo embeddings. Each model concatenates the evidence sentences retrieved in the IR phase and predicts one of the labels Supported, Refuted, or NotEnoughInfo.
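The sketch below shows how a claim and its retrieved evidence might be assembled into a single premise/hypothesis pair for such a classifier. The field names and helper are illustrative assumptions, not the repository's actual data reader.

def build_nli_instance(claim, evidence_sentences):
    """Concatenate retrieved evidence into one premise string and pair it
    with the claim as the hypothesis (field names are illustrative)."""
    premise = " ".join(evidence_sentences)
    return {"premise": premise, "hypothesis": claim}

instance = build_nli_instance(
    "The Eiffel Tower is in Berlin.",
    ["The Eiffel Tower is a wrought-iron lattice tower in Paris.",
     "It is named after the engineer Gustave Eiffel."])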
For claims labeled NotEnoughInfo, the dataset contains no annotated evidence. The evidence sampling script therefore finds the closest document for each NotEnoughInfo claim and samples one of its sentences uniformly at random to serve as pseudo-evidence for training the NLI classifier.
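A minimal sketch of that sampling strategy, assuming a retrieval helper that returns the sentences of the nearest page (the helper name is hypothetical):

import random

def sample_nei_evidence(claim, retrieve_nearest_doc, rng=random):
    """Illustrative NotEnoughInfo sampling: take the claim's closest document
    and pick one of its sentences uniformly at random as pseudo-evidence.
    `retrieve_nearest_doc` is assumed to return a list of sentences."""
    sentences = retrieve_nearest_doc(claim)
    return rng.choice(sentences)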
This model can be installed with either pip or Docker. For more information about the Docker image, see this repo: https://github.com/j6mes/fever2-sample
Create and activate fever conda environment
conda create -n fever
source activate fever
Install dependencies
pip install -r requirements.txt
If you are using the Docker version of this repo, the data will be mounted in a data folder. Otherwise it must be set up manually with the following scripts:
Download GloVe
mkdir -p data
wget http://nlp.stanford.edu/data/wordvecs/glove.6B.zip
unzip glove.6B.zip -d data/glove
gzip data/glove/*.txt
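If you want to inspect the downloaded vectors, a small sketch for reading one of the gzipped GloVe files into a dictionary is below; the 300d filename is an assumption, any of the unpacked files works the same way.

import gzip
import numpy as np

def load_glove(path="data/glove/glove.6B.300d.txt.gz"):
    """Read gzipped GloVe vectors into a {word: vector} dict."""
    vectors = {}
    with gzip.open(path, "rt", encoding="utf-8") as f:
        for line in f:
            word, *values = line.rstrip().split(" ")
            vectors[word] = np.asarray(values, dtype=np.float32)
    return vectors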
Download Wiki
mkdir -p data
mkdir -p data/index
mkdir -p data/fever
wget -O data/index/fever-tfidf-ngram=2-hash=16777216-tokenizer=simple.npz https://s3-eu-west-1.amazonaws.com/fever.public/wiki_index/fever-tfidf-ngram%3D2-hash%3D16777216-tokenizer%3Dsimple.npz
wget -O data/fever/fever.db https://s3-eu-west-1.amazonaws.com/fever.public/wiki_index/fever.db
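To sanity-check the downloaded Wikipedia database you can open it with Python's sqlite3 module. The schema queried here (a documents table with id and text columns, following DrQA's DocDB convention) is an assumption about fever.db, not something documented above.

import sqlite3

conn = sqlite3.connect("data/fever/fever.db")
cursor = conn.cursor()
cursor.execute("SELECT name FROM sqlite_master WHERE type='table'")
print(cursor.fetchall())                 # list the tables that actually exist
cursor.execute("SELECT id FROM documents LIMIT 5")
print(cursor.fetchall())                 # a few page titles, if the schema matches
conn.close()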
Download Data
mkdir -p data
mkdir -p data/fever-data
wget -O data/fever-data/train.jsonl https://s3-eu-west-1.amazonaws.com/fever.public/train.jsonl
wget -O data/fever-data/dev.jsonl https://s3-eu-west-1.amazonaws.com/fever.public/shared_task_dev.jsonl
wget -O data/fever-data/test.jsonl https://s3-eu-west-1.amazonaws.com/fever.public/shared_task_test.jsonl
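Each file is JSON Lines, one claim per line. A quick way to confirm the download is to print the first record; the exact fields vary by split (the test split, for example, carries no gold labels).

import json

with open("data/fever-data/train.jsonl") as f:
    first = json.loads(next(f))
print(sorted(first.keys()))   # inspect the available fields
print(first["claim"])         # the claim text itself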
The pretrained models can be used with the following scripts:
Find the 5 nearest sentences from the 5 nearest documents using the pre-computed TF-IDF index. The documents are in the database file fever.db.
export PYTHONPATH=src
export FEVER_ROOT=$(pwd)
mkdir -p work
export WORK_DIR=work
python -m fever_ir.evidence.retrieve --database $FEVER_ROOT/data/fever/fever.db --index $FEVER_ROOT/data/index/fever-tfidf-ngram=2-hash=16777216-tokenizer=simple.npz --in-file $FEVER_ROOT/data/fever-data/dev.jsonl --out-file $WORK_DIR/dev.sentences.p5.s5.jsonl --max-page 5 --max-sent 5
There are three available model files: https://jamesthorne.co.uk/fever/fever-esim-elmo.tar.gz, https://jamesthorne.co.uk/fever/fever-esim.tar.gz and https://jamesthorne.co.uk/fever/fever-da.tar.gz. If you are using a pretrained model, change $MODEL_FILE and $MODEL_NAME appropriately.
export CUDA_DEVICE=-1  # Set this to an appropriate value if using a GPU; -1 for CPU
export PYTHONPATH=src
export FEVER_ROOT=$(pwd)
mkdir -p work
export WORK_DIR=work
export MODEL_NAME=fever-esim-elmo
export MODEL_FILE=https://jamesthorne.co.uk/fever/$MODEL_NAME.tar.gz
python -m allennlp.run predict --output-file $WORK_DIR/$MODEL_NAME.predictions.jsonl --cuda-device $CUDA_DEVICE --include-package fever.reader $MODEL_FILE $WORK_DIR/dev.sentences.p5.s5.jsonl
The dev set can be scored locally with the FEVER scorer:
python -m fever_ir.submission.score --predicted_labels $WORK_DIR/$MODEL_NAME.predictions.jsonl --predicted_evidence $WORK_DIR/dev.sentences.p5.s5.jsonl
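As a reminder of what the scorer checks, the sketch below is a simplified re-implementation of the FEVER score for a single claim: the label must match and, for Supported/Refuted claims, at least one complete gold evidence group must be contained in the predicted evidence. It is illustrative only, not the official scorer.

def fever_score_instance(pred_label, pred_evidence, gold_label, gold_evidence_groups):
    """Simplified per-claim FEVER score (illustrative).
    pred_evidence: set of (page, sentence_id) pairs the system retrieved.
    gold_evidence_groups: list of gold evidence sets; any one complete set suffices."""
    if pred_label != gold_label:
        return False
    if gold_label == "NOT ENOUGH INFO":
        return True  # no evidence is required for NEI claims
    return any(set(group) <= set(pred_evidence) for group in gold_evidence_groups)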
The test set can be uploaded to the scoring server for scoring; the submission can be prepared with the following script:
python -m fever_ir.submission.prepare --predicted_labels $WORK_DIR/$MODEL_NAME.predictions.jsonl --predicted_evidence $WORK_DIR/test.sentences.p5.s5.jsonl --out_file $WORK_DIR/submission.jsonl
export PYTHONPATH=src
export FEVER_ROOT=$(pwd)
mkdir -p work
export WORK_DIR=work
python -m fever_ir.evidence.retrieve --index $FEVER_ROOT/data/index/fever-tfidf-ngram=2-hash=16777216-tokenizer=simple.npz --in-file $FEVER_ROOT/data/fever-data/dev.jsonl --out-file $FEVER_ROOT/data/fever/train.ns.pages.p1
Decomposable Attention Model
allennlp train configs/decomposable_attention.json -s log/fever_da --include-package fever
ESIM
allennlp train configs/esim.json -s log/fever_esim --include-package fever
ESIM+ELMo
allennlp train configs/esim_elmo.json -s log/fever_esim_elmo --include-package fever