Abstract:
The Structured Query Language (SQL) has been the ANSI and ISO industry standard for defining and querying databases for more than three decades now. SQL still dominates the market and influences new evolving paradigms. The increasingly central role of data in business, science and society brings forward a new wave of database users that need to learn SQL to become or remain successful in their domain. Querying databases is a non-trivial exercise: Queries must be specified with respect to a schema which might be difficult to comprehend or not even available, and there is generally no feedback whether the query that users write is actually sound in terms of its semantics. Many new users cannot afford to attend courses to learn how to write sound queries, and even those who can, may still remain in doubt whether they have written a query that meets the target. Trainers of SQL may find it impossible to provide detailed feedback on the soundness of a trainee’s query. This is particularly challenging as there are many different ways by which the semantics of a query can be expressed syntactically in SQL. As humans learn a lot from good examples, ideally, users would like to see some example that illustrates the difference of their query from the correct one. Exploiting randomly generated databases or real-world databases for that purpose is doomed: one cannot expect for an SQL query Q to simply find a database db which produces different answers than Q for every SQL query Q′ that is semantically different from Q. Not surprisingly, previous research on this topic has identified severe limits to this approach. Nevertheless, in terms of learning how to write semantically sound SQL queries it would be great to have an algorithm that could somehow magically produce, for a given query Q, a database dbQ that produces answers to Q that are different from answers to any query Q′ that is semantically different from Q. It is widely accepted that most queries that database users ask in practice are conjunctive queries. This research has established foundations as well as an implementation and performance analysis of an algorithm that computes such databases dbQ for conjunctive SQL queries. More precisely, for each conjunctive SQL query Q the algorithm computes a database dbQ such that for all queries Q′ ∈ L(Q), Q and Q′ are semantically equivalent if and only if the evaluation of Q on dbQ produces the same answers as the evaluation of Q′ on dbQ. Here, L(Q) consists of all those conjunctive SQL queries that have the same SELECT and FROM clause as Q. A database dbQ with these properties is called an L(Q)-test database for Q. Having dbQ, reduces the query equivalence problem for the class L(Q) to the evaluation of such queries on dbQ. It is therefore not surprising that the existence and computation of test databases are non-trivial. Indeed, it is known that there are conjunctive queries under set semantics for which no test databases exist. It is shown that that the investigation of conjunctive queries under bag semantics, i.e. conjunctive SQL queries, is therefore a key enabler for the general existence of test databases. The design of the algorithm for computing a test database for Q is based on identifying tuples that represent Q and some queries that properly contain Q and are minimal with that property. The algorithm is implemented in form of a Graphical User Interface (GUI) that shows the results of both target query Q and user query Q′ ∈ L(Q), and highlights any existing differences in the results to help users understand either why their query is incorrect or assure them that they produced a correct answer. The GUI therefore provides considerable assistance in learning conjunctive SQL queries. In addition to developing the algorithm that computes dbQ and its implementation in form of the GUI, the research also conducts several experiments that evaluate the performance of the algorithm and analyzes the results. The experiments confirm the intuition that the computation of L(Q)-test databases for conjunctive queries is efficient in practice.