The archives of history: a phylogenetic approach to the study of language
Abstract
Languages are the archives of history. They not only provide us with a system for communicating historical information, but their elements, such as lexicon and grammar, carry historical signal about the people who spoke these languages and their cultures. In this thesis, I apply computational Bayesian phylogenetic methods to language data to test hypotheses about the human settlement of the Pacific. This region was settled during the Holocene by the Austronesians, who spread from Taiwan, through Island South-East Asia, and into Oceania. During this period, the languages the Austronesians brought with them diversified into a vast language family, currently containing between 1,000 and 1,200 languages. Here I describe the construction and development of a large database of lexical items: the Austronesian Basic Vocabulary Database. This database now contains around 100,000 lexical items from over 500 languages in the Austronesian region. The lexical cognate information in this database is used to construct a phylogeny of 400 Austronesian languages using Bayesian phylogenetic methods. The results support a recent 5,500-year Taiwanese origin of the Austronesians in three critical aspects. First, the trees place the origin of the Austronesians in Taiwan, with a subsequent spread south through Island South-East Asia into Oceania and Polynesia. Second, divergence time estimates calculated on the phylogenies show an emergence of Austronesian around 5,200 years ago. Third, the trees reflect the same pattern of expansion pulses and settlement pauses as predicted by the Taiwanese-Origin scenario. To further explore the potential of these phylogenetic methods, I conduct a simulation study to establish the effect that borrowing of lexical items between languages has on phylogenetic reconstruction.
The results show that these methods are able to robustly estimate the true tree topology even when borrowing levels are high, but show a general tendency towards underestimating the true age. Finally, the phylogenetic potential of typological linguistic information is assessed. In contrast to arguments suggesting that typology is deeply stable, the analyses show that these data are evolving as rapidly as lexical information. I conclude that the application of these phylogenetic methods to language information provides a very powerful way of testing hypotheses about human prehistory and language change in general.