Abstract:
Graph databases are increasingly popular for data management and analytics. As with every data model, managing the integrity of entities is fundamental for data governance and also important for the efficiency of update and query operations. In response to shortcomings of uniqueness and existence constraints in graph databases, we propose a new principled class of constraints that separates the uniqueness dimension from the existence dimension and fully supports multiple labels and composite properties. Using real-world examples, we illustrate how the node integrity enforced by these constraints improves update and query performance. We establish axiomatic and algorithmic characterizations for reasoning about any set of constraints in our new class. We also give examples of small node samples that satisfy the same constraints as the original data set and are useful for eliciting business rules and identifying data quality problems. Finally, we briefly discuss the role of our constraints in design for data quality, and propose extensions for managing node integrity within graph database systems.