ABSA-BERT-pair

2020-03-10

ABSA as a Sentence Pair Classification Task

Code and corpora for the paper "Utilizing BERT for Aspect-Based Sentiment Analysis via Constructing Auxiliary Sentence" (NAACL 2019).

Requirements

  • pytorch: 1.0.0

  • python: 3.7.1

  • tensorflow: 1.13.1 (only needed to convert the TensorFlow BERT checkpoint to a PyTorch model)

  • numpy: 1.15.4

  • nltk

  • sklearn

Step 1: prepare datasets

SentiHood

The link given in the paper that released the dataset no longer works, so we use the mirror listed in NLP-progress and fix some mistakes in it (several sentences contain duplicate aspect annotations). See directory: data/sentihood/.
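Removing duplicate aspect annotations could look something like the minimal sketch below. It is not the repository's cleaning code. It assumes the SentiHood JSON layout, where each record has an `opinions` list whose entries carry `target_entity`, `aspect` and `sentiment` fields. An opinion counts as a duplicate only when all three fields repeat. Conflicting labels for the same target-aspect pair are left untouched.

```python
from typing import Any, Dict, List


def dedupe_opinions(records: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Return copies of SentiHood-style records with repeated opinions removed.

    Assumed record shape:
      {"id": ..., "text": ..., "opinions": [{"target_entity": ..., "aspect": ..., "sentiment": ...}]}
    The first occurrence of each (target_entity, aspect, sentiment) triple is kept.
    """
    cleaned = []
    for record in records:
        seen = set()
        unique = []
        for op in record.get("opinions", []):
            key = (op["target_entity"], op["aspect"], op["sentiment"])
            if key not in seen:
                seen.add(key)
                unique.append(op)
        # Build a new dict so the caller's input is not modified.
        cleaned.append({**record, "opinions": unique})
    return cleaned
```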

Run the following commands to prepare the datasets for each task:

cd generate/
bash make.sh sentihood
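The paper's title and the bert-pair data directory suggest how the prepared data is shaped: each sentence is paired with an auxiliary sentence built from a (target, aspect) pair. The sketch below shows the four pair formats named in Step 3 (QA_M, NLI_M, QA_B, NLI_B). It is an illustration only, not the code behind make.sh. The templates follow the examples given in the paper. The label set ("none", "positive", "negative") and the "location - 1" rendering of masked targets are assumptions, and the exact strings the generate scripts write may differ.

```python
import re
from typing import List, Tuple

# Assumed SentiHood label set; "none" means the aspect is not discussed for that target.
LABELS = ("none", "positive", "negative")


def target_text(target: str) -> str:
    """Render a masked target such as "LOCATION1" as "location - 1"."""
    m = re.fullmatch(r"([A-Za-z]+)(\d+)", target)
    return f"{m.group(1).lower()} - {m.group(2)}" if m else target.lower()


def build_pairs(sentence: str, target: str, aspect: str, gold: str,
                style: str) -> List[Tuple[str, str, str]]:
    """Return (text_a, text_b, label) rows for one target-aspect pair."""
    t = target_text(target)
    if style == "QA_M":
        return [(sentence, f"what do you think of the {aspect} of {t} ?", gold)]
    if style == "NLI_M":
        return [(sentence, f"{t} - {aspect}", gold)]
    if style in ("QA_B", "NLI_B"):
        rows = []
        for label in LABELS:
            if style == "QA_B":
                aux = f"the polarity of the aspect {aspect} of {t} is {label}"
            else:
                aux = f"{t} - {aspect} - {label}"
            # Binary variants: one row per candidate polarity, "yes" only for the gold one.
            rows.append((sentence, aux, "yes" if label == gold else "no"))
        return rows
    raise ValueError(f"unknown style: {style}")
```

The "_M" formats produce one multi-class example per pair. The "_B" formats produce one yes/no example per candidate polarity, which triples the number of training rows.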

SemEval 2014

The training data is available from "SemEval-2014 ABSA Restaurant Reviews - Train Data", and the gold test data from "SemEval-2014 ABSA Test Data - Gold Annotations". See directory: data/semeval2014/.

Run the following commands to prepare the datasets for each task:

cd generate/
bash make.sh semeval
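For SemEval-2014, every sentence needs a label for each of the five aspect categories, including categories it does not mention. The sketch below reads the official XML with the standard library and fills absent categories with "none". It is not the repository's generate code. It assumes the SemEval-2014 layout (`<sentence id>` with `<text>` and `<aspectCategory category polarity>` children). It also shortens "anecdotes/miscellaneous" to the short category names used in Step 4.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

# The five restaurant aspect categories, using the short names from Step 4.
CATEGORIES = ("price", "anecdotes", "food", "ambience", "service")


def category_labels(root: ET.Element) -> List[Dict[str, str]]:
    """Give every sentence one label per aspect category.

    Categories not annotated on a sentence are labelled "none".
    """
    rows = []
    for sent in root.iter("sentence"):
        labels = {c: "none" for c in CATEGORIES}
        for ac in sent.iter("aspectCategory"):
            # "anecdotes/miscellaneous" in the XML becomes "anecdotes".
            name = ac.get("category", "").split("/")[0]
            if name in labels:
                labels[name] = ac.get("polarity", "none")
        rows.append({"id": sent.get("id", ""),
                     "text": sent.findtext("text", default=""),
                     **labels})
    return rows
```

Usage, with a placeholder file name: `category_labels(ET.parse("Restaurants_Train.xml").getroot())`.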

Step 2: prepare BERT-pytorch-model

Download BERT-Base (Google's pre-trained models), then convert the TensorFlow checkpoint to a PyTorch model.

For example:

python convert_tf_checkpoint_to_pytorch.py \
--tf_checkpoint_path uncased_L-12_H-768_A-12/bert_model.ckpt \
--bert_config_file uncased_L-12_H-768_A-12/bert_config.json \
--pytorch_dump_path uncased_L-12_H-768_A-12/pytorch_model.bin
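Before training, you can confirm that the model directory holds the three files Step 3 reads: vocab.txt, bert_config.json and pytorch_model.bin. The helper below is hypothetical and not part of the repository. It also assumes Google's directory naming scheme, in which `L-12_H-768_A-12` means 12 layers, hidden size 768 and 12 attention heads, and checks those numbers against the `num_hidden_layers`, `hidden_size` and `num_attention_heads` fields of bert_config.json.

```python
import json
import os
import re
from typing import List


def check_bert_dir(model_dir: str) -> List[str]:
    """Return a list of problems found in a converted BERT directory ([] if none)."""
    problems = []
    for name in ("vocab.txt", "bert_config.json", "pytorch_model.bin"):
        if not os.path.isfile(os.path.join(model_dir, name)):
            problems.append(f"missing {name}")
    # Assumed naming scheme: <casing>_L-<layers>_H-<hidden>_A-<heads>
    m = re.search(r"L-(\d+)_H-(\d+)_A-(\d+)", os.path.basename(os.path.normpath(model_dir)))
    cfg_path = os.path.join(model_dir, "bert_config.json")
    if m and os.path.isfile(cfg_path):
        with open(cfg_path) as f:
            cfg = json.load(f)
        expected = {"num_hidden_layers": int(m.group(1)),
                    "hidden_size": int(m.group(2)),
                    "num_attention_heads": int(m.group(3))}
        for key, value in expected.items():
            if cfg.get(key) != value:
                problems.append(f"{key}={cfg.get(key)} but directory name implies {value}")
    return problems
```

For example, `print(check_bert_dir("uncased_L-12_H-768_A-12") or "looks fine")`.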

Step 3: train

For example, to train the BERT-pair-NLI_M task on the SentiHood dataset:

CUDA_VISIBLE_DEVICES=0,1,2,3 python run_classifier_TABSA.py \
--task_name sentihood_NLI_M \
--data_dir data/sentihood/bert-pair/ \
--vocab_file uncased_L-12_H-768_A-12/vocab.txt \
--bert_config_file uncased_L-12_H-768_A-12/bert_config.json \
--init_checkpoint uncased_L-12_H-768_A-12/pytorch_model.bin \
--eval_test \
--do_lower_case \
--max_seq_length 512 \
--train_batch_size 24 \
--learning_rate 2e-5 \
--num_train_epochs 6.0 \
--output_dir results/sentihood/NLI_M \
--seed 42
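`--max_seq_length 512` matches BERT-Base's positional limit. It bounds the combined length of the sentence and its auxiliary sentence. The sketch below shows the standard BERT sentence-pair layout, `[CLS] A [SEP] B [SEP]`, with segment ids and padding, using the longest-first truncation of Google's original `run_classifier.py`. It assumes, but has not verified, that `run_classifier_TABSA.py` follows the same convention. Whitespace splitting stands in for WordPiece tokenization.

```python
from typing import List, Tuple


def truncate_pair(tokens_a: List[str], tokens_b: List[str], max_len: int) -> None:
    """Trim the longer list one token at a time (in place) until both fit in max_len."""
    while len(tokens_a) + len(tokens_b) > max_len:
        if len(tokens_a) > len(tokens_b):
            tokens_a.pop()
        else:
            tokens_b.pop()


def encode_pair(sentence: str, auxiliary: str,
                max_seq_length: int) -> Tuple[List[str], List[int], List[int]]:
    """Return (tokens, segment_ids, input_mask), all padded to max_seq_length."""
    tokens_a = sentence.lower().split()   # stand-in for WordPiece + --do_lower_case
    tokens_b = auxiliary.lower().split()
    truncate_pair(tokens_a, tokens_b, max_seq_length - 3)  # room for [CLS] and two [SEP]
    tokens = ["[CLS]"] + tokens_a + ["[SEP]"] + tokens_b + ["[SEP]"]
    segment_ids = [0] * (len(tokens_a) + 2) + [1] * (len(tokens_b) + 1)
    input_mask = [1] * len(tokens)
    pad = max_seq_length - len(tokens)
    tokens += ["[PAD]"] * pad
    segment_ids += [0] * pad
    input_mask += [0] * pad
    return tokens, segment_ids, input_mask
```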

Note:

  • For SentiHood, --task_name must be one of sentihood_NLI_M, sentihood_QA_M, sentihood_NLI_B, sentihood_QA_B and sentihood_single. For the sentihood_single task, 8 different tasks (using the datasets generated in Step 1; see directory data/sentihood/bert-single) must be trained separately and then evaluated together.

  • For SemEval-2014, --task_name must be one of semeval_NLI_M, semeval_QA_M, semeval_NLI_B, semeval_QA_B and semeval_single. For the semeval_single task, 5 different tasks (using the datasets generated in Step 1; see directory data/semeval2014/bert-single) must be trained separately and then evaluated together.
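The single-task runs can be scripted rather than typed by hand. The sketch below builds one training command per sub-task, reusing the flags from the example above. The sub-task names match the file names listed in Step 4. The per-sub-task `--data_dir` and `--output_dir` layout is hypothetical; check how the Step 1 scripts actually lay out data/*/bert-single before using it.

```python
from itertools import product
from typing import List

# 8 SentiHood and 5 SemEval-2014 sub-tasks, named as in the Step 4 file lists.
SENTIHOOD_SINGLE = [f"{loc}_{asp}" for loc, asp in
                    product(("loc1", "loc2"), ("general", "price", "safety", "transit"))]
SEMEVAL_SINGLE = ["price", "anecdotes", "food", "ambience", "service"]

# Flags copied from the Step 3 example; only data_dir and output_dir vary per sub-task.
SHARED_FLAGS = [
    "--vocab_file", "uncased_L-12_H-768_A-12/vocab.txt",
    "--bert_config_file", "uncased_L-12_H-768_A-12/bert_config.json",
    "--init_checkpoint", "uncased_L-12_H-768_A-12/pytorch_model.bin",
    "--eval_test", "--do_lower_case",
    "--max_seq_length", "512", "--train_batch_size", "24",
    "--learning_rate", "2e-5", "--num_train_epochs", "6.0", "--seed", "42",
]


def single_task_commands(dataset: str) -> List[List[str]]:
    """Build one argv list per sub-task (HYPOTHETICAL directory layout)."""
    if dataset == "sentihood":
        subtasks, data_root = SENTIHOOD_SINGLE, "data/sentihood/bert-single"
    elif dataset == "semeval":
        subtasks, data_root = SEMEVAL_SINGLE, "data/semeval2014/bert-single"
    else:
        raise ValueError(f"unknown dataset: {dataset}")
    return [
        ["python", "run_classifier_TABSA.py", "--task_name", f"{dataset}_single",
         "--data_dir", f"{data_root}/{sub}/", *SHARED_FLAGS,
         "--output_dir", f"results/{dataset}/single/{sub}"]
        for sub in subtasks
    ]


if __name__ == "__main__":
    for argv in single_task_commands("sentihood"):
        print(" ".join(argv))
```

Each argv list can be passed directly to `subprocess.run(argv, check=True)`, which avoids shell quoting issues.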

Step 4: evaluation

Evaluate the results on the test set (accuracy, F1, etc.).

For example, for the BERT-pair-NLI_M task on the SentiHood dataset:

python evaluation.py --task_name sentihood_NLI_M --pred_data_dir results/sentihood/NLI_M/test_ep_4.txt
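As a rough guide to what the reported numbers mean, the sketch below computes accuracy over all target-aspect labels and micro precision, recall and F1 for aspect detection. An aspect counts as detected whenever its label is not "none", regardless of polarity. This is a generic illustration, not the code in evaluation.py, which may compute additional or differently defined metrics per dataset.

```python
from typing import Dict, Sequence


def evaluate(gold: Sequence[str], pred: Sequence[str],
             none_label: str = "none") -> Dict[str, float]:
    """Label accuracy plus micro P/R/F1 for detecting that an aspect is present."""
    if len(gold) != len(pred) or not gold:
        raise ValueError("gold and pred must be non-empty and the same length")
    pairs = list(zip(gold, pred))
    accuracy = sum(g == p for g, p in pairs) / len(pairs)
    # Detection ignores polarity: any non-"none" label means "aspect present".
    tp = sum(g != none_label and p != none_label for g, p in pairs)
    fp = sum(g == none_label and p != none_label for g, p in pairs)
    fn = sum(g != none_label and p == none_label for g, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}
```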

Note:

  • As mentioned in Step 3, the 8 sentihood_single tasks are trained separately and then evaluated together, so --pred_data_dir should be a directory containing these 8 files: loc1_general.txt, loc1_price.txt, loc1_safety.txt, loc1_transit.txt, loc2_general.txt, loc2_price.txt, loc2_safety.txt and loc2_transit.txt.

  • As mentioned in Step 3, the 5 semeval_single tasks are trained separately and then evaluated together, so --pred_data_dir should be a directory containing these 5 files: price.txt, anecdotes.txt, food.txt, ambience.txt and service.txt.

  • For the remaining 8 tasks, --pred_data_dir should be a single file, as in the example above.
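A quick pre-flight check can catch a missing prediction file before running evaluation.py. The helper below is a hypothetical convenience, not part of the repository. It encodes the rules from the notes above: the two *_single tasks expect a directory with the listed files, and every other task expects a single file.

```python
import os
from typing import List

# Required prediction files for the two single-task settings (from the notes above).
EXPECTED_FILES = {
    "sentihood_single": [f"{loc}_{asp}.txt" for loc in ("loc1", "loc2")
                         for asp in ("general", "price", "safety", "transit")],
    "semeval_single": ["price.txt", "anecdotes.txt", "food.txt", "ambience.txt", "service.txt"],
}


def missing_predictions(task_name: str, pred_data_dir: str) -> List[str]:
    """Return what is missing for --pred_data_dir; an empty list means ready to evaluate."""
    if task_name not in EXPECTED_FILES:
        # The other 8 tasks take a single prediction file, not a directory.
        return [] if os.path.isfile(pred_data_dir) else [pred_data_dir]
    return [name for name in EXPECTED_FILES[task_name]
            if not os.path.isfile(os.path.join(pred_data_dir, name))]
```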

Citation

@inproceedings{sun-etal-2019-utilizing,
    title = "Utilizing {BERT} for Aspect-Based Sentiment Analysis via Constructing Auxiliary Sentence",
    author = "Sun, Chi  and
      Huang, Luyao  and
      Qiu, Xipeng",
    booktitle = "Proceedings of the 2019 Conference of the North {A}merican Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)",
    month = jun,
    year = "2019",
    address = "Minneapolis, Minnesota",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/N19-1035",
    pages = "380--385"
}

