Abstract
The information revolution brought with it
information pollution. Information retrieval
and extraction help us cope with abundant information from diverse sources. But some
sources are of anonymous authorship, and
some are of uncertain accuracy, so how can
we determine what we should actually believe? Not all information sources are equally
trustworthy, and simply accepting the majority
view is often wrong.
This paper develops a general framework for
estimating the trustworthiness of information
sources in an environment where multiple
sources provide claims and supporting evidence, and each claim may be asserted by multiple sources. We consider two
settings: one in which information sources
directly assert claims, and a more realistic
and challenging one, in which claims are inferred from evidence provided by sources, via
(possibly noisy) NLP techniques. Our key
contribution is a family of probabilistic models that jointly estimate the trustworthiness of sources and the credibility of
the claims they assert, while accounting for the possibly noisy NLP needed
to infer claims from the evidence that
sources supply. We evaluate our framework on several datasets, showing strong results and significant improvement over baselines.
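
To make the idea of joint estimation concrete, the following is a minimal sketch of a simple fixed-point heuristic. It is not the family of probabilistic models developed in this paper. The function name `estimate_trust`, the toy sources `g1`, `g2`, `b1` to `b3`, and the per-assertion `weight` (a stand-in for the confidence that a possibly noisy extraction step assigns to a claim) are all assumptions made for illustration.

```python
from collections import defaultdict


def estimate_trust(assertions, iterations=20):
    """Jointly score sources and claims by fixed-point iteration.

    assertions: list of (source, (question, answer), weight) triples.
    weight in (0, 1] is a hypothetical confidence that the source really
    made the claim (1.0 when the claim is asserted directly).
    """
    trust = {source: 0.5 for source, _, _ in assertions}
    belief = {}
    for _ in range(iterations):
        # A claim's raw score is the weighted trust of the sources behind it.
        score = defaultdict(float)
        for source, claim, weight in assertions:
            score[claim] += trust[source] * weight
        # Answers to the same question compete, so normalize per question.
        total = defaultdict(float)
        for (question, _), value in score.items():
            total[question] += value
        belief = {
            claim: (value / total[claim[0]] if total[claim[0]] > 0 else 0.0)
            for claim, value in score.items()
        }
        # A source's trust is the weighted mean belief in its claims.
        num, den = defaultdict(float), defaultdict(float)
        for source, claim, weight in assertions:
            num[source] += weight * belief[claim]
            den[source] += weight
        trust = {s: (num[s] / den[s] if den[s] > 0 else 0.5) for s in trust}
    return trust, belief


# Toy data (hypothetical): g1 and g2 agree on every question, while b1..b3
# disagree with each other on q1..q3 and only coincide on q4.
assertions = []
for q in ("q1", "q2", "q3"):
    assertions += [("g1", (q, "right"), 1.0), ("g2", (q, "right"), 1.0)]
    assertions += [(f"b{i}", (q, f"wrong{i}"), 1.0) for i in (1, 2, 3)]
assertions += [("g1", ("q4", "right"), 1.0), ("g2", ("q4", "right"), 1.0)]
assertions += [(f"b{i}", ("q4", "wrong"), 1.0) for i in (1, 2, 3)]

trust, belief = estimate_trust(assertions)
```

On `q4` a plain majority vote would choose `"wrong"` (three sources against two). In this toy setting the iteration ends up preferring `"right"`, because `g1` and `g2` earn higher trust from their agreement on the other questions.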