Abstract:
The data deluge is characterized by growing volumes of data with a growing degree
of uncertainty. In response, probabilistic databases are receiving a great deal of
interest from research and industry. One popular approach to probabilistic databases is to
extend traditional relational database technology to handle uncertainty. In this approach,
a probabilistic database is a probability distribution over a collection of possible worlds,
each of which is a relational database. On the one hand, research has seen various efforts to extend query
evaluation from relational to probabilistic databases. On the other hand, updates have
received little attention. In this paper we show that well-known syntactic normal
form conditions capture probabilistic databases with desirable update behavior. Such
behavior includes the absence of data redundancy and of insertion, deletion, and modification
anomalies. We further show that standard normalization procedures can be applied to
standard representations of probabilistic databases to obtain database schemata that satisfy
the normal form conditions and can thus be updated efficiently.
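
The possible-worlds view described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the toy relation over (employee, department, manager), the names `worlds`, `marginal`, `satisfies_fd`, and `fd_probability`, and the choice to measure a functional dependency by the total probability of the worlds in which it holds.

```python
from fractions import Fraction

# Hypothetical toy probabilistic database: each possible world is a relation,
# encoded as a frozenset of (employee, department, manager) tuples, and is
# mapped to its probability. The probabilities sum to 1.
worlds = {
    frozenset({("ann", "sales", "bob"), ("cat", "sales", "bob")}): Fraction(1, 2),
    frozenset({("ann", "sales", "bob")}): Fraction(3, 10),
    frozenset({("ann", "sales", "bob"), ("cat", "sales", "dan")}): Fraction(1, 5),
}


def marginal(worlds, row):
    """Total probability of the worlds that contain the given tuple."""
    return sum((p for w, p in worlds.items() if row in w), Fraction(0))


def satisfies_fd(relation, lhs, rhs):
    """Check a functional dependency lhs -> rhs (column indices) on one world."""
    seen = {}
    for row in relation:
        key = tuple(row[i] for i in lhs)
        val = tuple(row[i] for i in rhs)
        # setdefault stores val on first sight of key, else returns the stored one.
        if seen.setdefault(key, val) != val:
            return False
    return True


def fd_probability(worlds, lhs, rhs):
    """Total probability of the worlds in which lhs -> rhs holds (illustrative)."""
    return sum(
        (p for w, p in worlds.items() if satisfies_fd(w, lhs, rhs)), Fraction(0)
    )
```

In this toy instance, the dependency department -> manager (column 1 determines column 2) fails only in the third world, where the manager of "sales" is recorded twice with different values. That kind of repeated, potentially conflicting information is what normal form conditions are meant to rule out at the schema level.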