Abstract
Open information extraction (IE) is the task of
extracting open-domain assertions from natural language sentences. A key step in open
IE is confidence modeling: ranking extractions by their estimated quality so that the precision and recall of the extracted assertions can be traded off. We found that the extraction likelihood,
a confidence measure used by current supervised open IE systems, is not well calibrated
when comparing the quality of assertions extracted from different sentences. We propose
an additional binary classification loss that calibrates the likelihood so that it is more globally
comparable, and an iterative learning process,
where extractions generated by the open IE
model are incrementally included as training
samples to help the model learn from trial and
error. Experiments on OIE2016 demonstrate
the effectiveness of our method.
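
The abstract leaves the combined objective implicit; one minimal way to write such a calibration term, purely as an illustrative sketch (the interpolation weight $\lambda$, confidence score $s$, and binary correctness label $y$ are our assumptions, not notation from the paper), is

$$\mathcal{L} = \mathcal{L}_{\text{gen}} + \lambda\,\mathcal{L}_{\text{bin}}, \qquad \mathcal{L}_{\text{bin}} = -\big[\,y\,\log\sigma(s) + (1-y)\,\log\big(1-\sigma(s)\big)\big],$$

where $\mathcal{L}_{\text{gen}}$ denotes the usual extraction-likelihood loss and $\sigma$ is the logistic sigmoid. Minimizing $\mathcal{L}_{\text{bin}}$ pushes the scores of correct and incorrect extractions apart on a common scale, which is what makes confidences comparable across sentences.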
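The iterative learning process is likewise only outlined above; the following Python sketch shows one plausible shape of such a loop under our own assumptions. The model interface (`train`, `extract`) and the matching function `is_correct` are hypothetical placeholders, not APIs from the paper.

```python
from typing import Callable, Iterable, List, Tuple

def iterative_training(
    model,
    sentences: Iterable[str],
    gold: List[Tuple[str, str]],
    is_correct: Callable[[str, List[Tuple[str, str]]], bool],
    n_iters: int = 3,
):
    """Sketch of iterative learning: retrain on the model's own extractions,
    labeled positive or negative by comparison against gold annotations."""
    # Start from the gold (sentence, extraction) pairs, all labeled positive.
    train_set = [(s, e, True) for s, e in gold]
    for _ in range(n_iters):
        # Fit the model on the current pool (generation + calibration losses).
        model.train(train_set)
        for s in sentences:
            for e in model.extract(s):
                # Incrementally include the model's own output as a training
                # sample, labeled by whether it matches a gold extraction,
                # so the model learns from trial and error.
                train_set.append((s, e, is_correct(e, gold)))
    return model
```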