Abstract:
The aim of this thesis is to implement an additional distribution, the Zipf-MandelbrotII distribution (Mandelbrot 1961), as a piece of software on top of the existing Vector Generalized Linear and Additive Models (VGLMs/VGAMs) framework. The implemented model follows Zipf’s law and is particularly suitable for word frequency analysis, since it takes into account both the frequency and the rank of each word in a corpus. By passing the implemented model to the VGAM R package, one can obtain the maximum likelihood estimate (MLE) of the scaling parameter from empirical data sets. The associated dpqr-type functions (density, distribution function, quantile function, and random generation) are implemented alongside the model. All functions are exercised and validated on real-world data sets from the Gutenberg Database. The completed work provides greater flexibility in analysing linguistic data.