Abstract:
A least-squares tree is a phylogenetic tree fitted by the least-squares method to an observed matrix of pairwise distances in order to depict the evolutionary relationships among biological replicators. In biology, many studies use the binary tree as the primary graphical representation of phylogenetic relationships, and most available methods for constructing a tree from given distances aim to find the “best” binary tree. However, when multiway speciation events have occurred, a binary tree is not the “best” possible model. In this thesis, we apply a backward selection method based on information-theoretic criteria (ITC) to the tree proposed by any tree construction method, removing insignificant internal edges to obtain an optimal model that better describes the relationships among objects whose pairwise distances are known. In this project, we (i) investigate the use of ITC on least-squares trees and prove theoretical results on optimal tree selection; (ii) present the issues and problems that arise when applying the Bayesian information criterion (BIC), a very popular ITC in phylogenetics; (iii) build a tree selection process in R that applies ten ITC; (iv) evaluate its performance in experiments on both simulated and real-world data; and (v) implement the tree selection process as a web-based Shiny application with a user-friendly interface for public use.