Automated Documentation to Code Traceability Link Recovery and Visualization
Abstract
Documentation written in natural language and source code are two of the major artifacts of a system. Tracking the various traceability links between documentation and code helps developers comprehend, efficiently develop, and effectively manage a system. However, automated traceability systems to date face three major open research challenges. The first challenge is how to extract links with both high precision and high recall. We introduce an approach that combines three supporting techniques, Regular Expression, Key Phrases, and Clustering, with Information Retrieval (IR) models to improve the performance of automated traceability recovery between documents and source code. This combined approach exploits the strengths of the three techniques to offset the limitations of IR models. Our experimental results show that our approach improves the performance of IR models, increases the precision of retrieved links, and recovers more true links than IR alone. The second challenge is how to establish robust traceability benchmarks for evaluating traceability recovery techniques. We describe an approach and guidelines that enable researchers to establish affordable and robust traceability benchmarks. We have designed rigorous manual identification and verification strategies to determine whether a link is correct, and we have developed a formula to calculate the probability of errors in the resulting benchmarks. Our analysis of the error probabilities shows that our approach can build high-quality benchmarks and that our strategies significantly reduce the error probability in them. The third challenge is how to visualize links efficiently for complex systems, where scalability and visual clutter are persistent problems. We present a new approach that combines treemap and hierarchical tree techniques to reduce visual clutter and to show both the global structure of traces and a detailed overview of each trace, while remaining highly scalable and interactive. The usability evaluation results show that our approach can effectively and efficiently help software developers comprehend, browse, and maintain large numbers of links.