Abstract:
Accurate effort estimation enables project managers to make effective managerial decisions when embarking on a project, and this holds equally when managing Web projects. Web development has grown steadily over the years, so research into sound ways to improve the effort estimates made by thousands of Web companies worldwide would be valuable. Looking at the state of the art in the domain of Web resource estimation, we show that numerous effort estimation techniques have been investigated, none of which has been conclusively shown to be the best. Findings from a prior study showed that accurate estimation techniques can be combined to create ensembles that performed consistently well for general software effort estimation. This motivated us to replicate that study using Web project data from the Tukutuku dataset. Our results confirmed that such ensembles are also effective in the domain of Web effort estimation. Next, we extended the replicated methodology and applied it in conjunction with a technique called bagging to build Web effort estimation ensembles. Bagging enables multiple estimation runs to be made from a single dataset, which provides a more in-depth view of ensemble performance. Because bagging usually entails a loss of training data, we investigated three variants of bagging with different amounts of data loss, ranging from around a third of the training set to none. In doing so, we demonstrate the effect that bagging has on effort estimation techniques, particularly unstable ones. We analyzed these results in terms of the accuracy-diversity trade-off, taking an established mathematical formalization of this trade-off and applying it successfully to our ensemble results. This analysis explains our estimation results and lays the groundwork for building ensembles from estimation techniques selected for their diversity as well as for their accuracy.
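The figure of "around a third of the training set" is consistent with standard bootstrap sampling, in which each bagged training set is built by drawing n projects with replacement from the n available. A given project is then missed by all n draws with probability (1 - 1/n)^n, which approaches 1/e (about 0.368) as n grows. The following is a minimal sketch assuming plain bootstrap sampling; the function names and the training-set size are hypothetical and do not come from the thesis.

```python
import random


def bootstrap_indices(n, rng):
    """Draw n project indices uniformly with replacement (a standard bootstrap sample)."""
    return [rng.randrange(n) for _ in range(n)]


def out_of_bag_fraction(n, rng):
    """Fraction of the n original projects that never appear in one bootstrap sample."""
    distinct = set(bootstrap_indices(n, rng))
    return 1 - len(distinct) / n


def expected_out_of_bag_fraction(n):
    """Analytic expectation: a project is missed by all n draws with probability (1 - 1/n)**n."""
    return (1 - 1 / n) ** n


rng = random.Random(42)  # fixed seed so the run is repeatable
n = 100  # hypothetical training-set size, chosen for illustration only
trials = [out_of_bag_fraction(n, rng) for _ in range(2000)]
empirical = sum(trials) / len(trials)

print(round(expected_out_of_bag_fraction(n), 3))  # 0.366
print(round(empirical, 3))  # close to the analytic value; exact digits depend on the seed
```

The three variants studied in the thesis differ in how much of this loss they incur; their exact sampling schemes are not described in this abstract and are not reproduced here.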
In this thesis we demonstrate that while there may be no single best estimation technique, ensembles of techniques can provide consistently accurate effort estimates. Practitioners can combine the estimates from a small set of accurate techniques, perhaps ones they have used successfully in the past, to create effective effort estimation ensembles. From a research perspective, we demonstrate that bagging can be used with effort estimation and that the ensemble accuracy-diversity trade-off can be quantified. We hope that the findings of this thesis will encourage practitioners to use ensembles for Web effort estimation, while acting as a stepping stone for future research in this domain.
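The abstract does not name the formalization of the accuracy-diversity trade-off that was used. One well-known, exact result for ensembles that combine their members by a simple average is the ambiguity decomposition: for a single project with actual effort y and member estimates f_1, ..., f_M with mean f̄, (f̄ - y)² = (1/M) Σ (f_i - y)² - (1/M) Σ (f_i - f̄)². The first term reflects average member accuracy and the second, the ambiguity, reflects diversity. The sketch below assumes this decomposition and equal-weight averaging, either of which may differ from the formalization and combination rule used in the thesis; the function name and effort figures are hypothetical.

```python
def ambiguity_decomposition(estimates, actual):
    """Split the squared error of a simple-average ensemble into accuracy and diversity terms.

    estimates: effort estimates from each member technique for one project
    actual: the project's actual effort
    Returns (ensemble_error, avg_member_error, ambiguity), where
    ensemble_error == avg_member_error - ambiguity (up to floating-point rounding).
    """
    m = len(estimates)
    ensemble = sum(estimates) / m  # equal-weight average of member estimates
    ensemble_error = (ensemble - actual) ** 2
    avg_member_error = sum((f - actual) ** 2 for f in estimates) / m  # accuracy term
    ambiguity = sum((f - ensemble) ** 2 for f in estimates) / m  # diversity term
    return ensemble_error, avg_member_error, ambiguity


# Hypothetical estimates (person-hours) from three techniques for one project.
ens, avg_err, amb = ambiguity_decomposition([120.0, 150.0, 210.0], actual=180.0)
print(ens, avg_err, amb)  # 400.0 1800.0 1400.0
```

With these numbers, the ensemble's squared error (400) equals the members' average squared error (1800) minus their ambiguity (1400): disagreement among the techniques is what brings the combined estimate closer to the actual effort than the typical member.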