Abstract:
Program Analysis (PA) is the process of gathering deeper insights about
source code and analysing them to keep complex software correct according
to certain criteria. A modern approach to performing program analysis
is to translate the analysis logic into a standard database language such
as Datalog. Datalog-based program analysis tools have been successful in
improving performance. However, working with such proprietary tools is
complex and involves a steep learning curve: the processes are
non-standardised, analyses are not easily shareable, and tooling expertise
is required to perform efficient analysis. The key challenge in the program
analysis domain is the lack of a standardised and reusable framework that is both
simple and scalable. We approach these challenges from the perspective of
using a database as the solution and, in particular, utilising
state-of-the-art Semantic Web technologies. We use an ontology to model
and standardise the domain, the Resource Description Framework (RDF) to
format the data, a triplestore to store and process the data, and Datalog to
represent the rules. The Datalog rules are transformed into queries in
SPARQL, the de facto query language for triplestore databases, for execution in the
Semantic Web stack. To reduce the cost of fine-tuning an analysis execution,
we have designed a generalised and novel self-learning SPARQL optimiser
which can learn from each query execution and improve itself, and thereby
can improve rule execution as well. To handle evolving source code
data, we use an in-house temporal triplestore, PDStore (Parsimonious
Data Store). With these features, the Semantic Web stack enables program
analysis and source code querying, and further provides a mechanism to
check the adherence of source code to coding standards. Our approach makes it
possible for improvements in the Semantic Web stack to be cascaded to the
benefit of the program analysis community. In this thesis, we do not create
any new types of program analysis; rather, we investigate how a Semantic Web
stack can meet program analysis requirements in a way that makes
the analysis standardised, efficient, reusable, and portable (to any Semantic
Web stack).