Abstract:
Probabilistic databases are well suited to the requirements of a growing number of modern applications that produce large collections of uncertain data. We propose probabilistic cardinality constraints as a principled tool to control the occurrences of data patterns in probabilistic databases. Our constraints help balance the consistency and completeness targets for the quality of an organization's data, and can be used to predict the probability with which a given number of query answers will be returned, without actually querying the data. We unlock these target applications by developing algorithms that reason efficiently about probabilistic cardinality constraints and that help analysts acquire the marginal probability with which cardinality constraints should hold in a given application domain. For this purpose, we overcome technical challenges to compute Armstrong PC-sketches, succinct data samples that perfectly visualize any given perceptions about these marginal probabilities.