Abstract
Large-scale probabilistic knowledge bases are becoming increasingly important in academia and industry alike. They are constantly extended with
new data, powered by modern information extraction tools that associate probabilities with database
tuples. In this paper, we revisit the semantics underlying such systems. In particular, the closed-world
assumption of probabilistic databases, that facts not in the database have probability zero, clearly conflicts with their everyday use. To address this discrepancy, we propose an open-world probabilistic
database semantics, which relaxes the probabilities
of open facts to default intervals. For this open-world setting, we lift the existing data complexity
dichotomy of probabilistic databases, and propose
an efficient evaluation algorithm for unions of conjunctive queries. We also show that query evaluation can become harder for non-monotone queries.
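To make the contrast between the two semantics concrete, here is a minimal sketch. It makes several assumptions for illustration only: a single tuple-independent unary relation `R` over a small finite domain, the monotone Boolean query ∃x R(x), and a default interval of [0, λ] for every open fact (here `lam`). The names `prob_exists`, `open_world_bounds`, `facts` and `domain` are hypothetical. This is not the paper's evaluation algorithm. It only shows that, for a monotone query, the closed-world probability becomes the lower end of an interval, and the upper end is obtained by raising every open fact to λ.

```python
from math import prod


def prob_exists(probs):
    """P(exists x. R(x)) when the tuples R(a) are independent with the given probabilities."""
    return 1.0 - prod(1.0 - p for p in probs)


def open_world_bounds(facts, domain, lam):
    """Probability interval of exists x. R(x) under an assumed [0, lam] default for open facts.

    facts:  stored probability of R(a) for each constant a that appears in the database.
    domain: finite set of constants (an illustrative assumption).
    lam:    assumed upper bound on the probability of any open fact.
    """
    closed = [facts[a] for a in domain if a in facts]
    open_count = sum(1 for a in domain if a not in facts)
    # The query is monotone, so its probability grows with each tuple's probability:
    # the minimum sets every open fact to 0 (closed-world answer), the maximum sets it to lam.
    lower = prob_exists(closed)
    upper = prob_exists(closed + [lam] * open_count)
    return lower, upper


facts = {"alice": 0.9, "bob": 0.5}
domain = ["alice", "bob", "carol", "dave"]
lower, upper = open_world_bounds(facts, domain, lam=0.1)
print(f"[{lower:.4f}, {upper:.4f}]")  # prints [0.9500, 0.9595]
```

Under the closed-world reading the answer would be the single point 0.95. Under the assumed open-world default it widens to an interval, because the two open facts `R(carol)` and `R(dave)` may each hold with probability up to 0.1.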