Abstract:
Identifying the correct evolutionary tree is an essential and difficult biological problem. It is important neither to over-resolve nor to falsely resolve the tree's structure, a problem well suited to information criteria. Stochastic Complexity (SC) was introduced in [Rissanen(1978)], and various forms of it have been derived since then (see [Rissanen(2012)] for the newest developments on this topic). According to the Minimum Description Length (MDL) principle, SC is defined in the context of transmitting the existing data to a hypothesized decoder. The "encoding" is performed using mathematical models that belong to a pre-defined class, and the model that leads to the shortest code length is deemed the most suitable for describing the data [Grünwald(2007)]. In this work, we consider SC for assessing phylogenetic trees. To this end, we use SC to encode the parameters and the model (tree) structure. We perform a theoretical comparison of SC with the well-known Bayesian Information Criterion (BIC) and investigate their behavior as the size of the tree → ∞ and as the error → 0. Experiments are conducted with real-world and simulated data, in which we compare SC with various forms of BIC, AIC (Akaike Information Criterion) and KIC (Kullback Information Criterion).