Statistical Learning with Mixtures


dc.contributor.advisor Wang, Y en
dc.contributor.author Hu, Shengwei en
dc.date.accessioned 2020-04-24T01:59:14Z en
dc.date.issued 2020 en
dc.identifier.uri http://hdl.handle.net/2292/50494 en
dc.description.abstract The motivation of this thesis is to provide a new general approach to solving statistical learning problems from the perspective of probability density as it has the capacity of observing a fuller view of data. For this purpose, we present a density estimator by modifying a semiparametric mixture-based density estimator recently proposed by Wang and Wang (2015). To optimize the trade-off between computational efficiency and estimation accuracy, the original full covariance matrix is replaced with a diagonal one so that it has the potential to be applied to high-dimensional scenarios. Numerical studies suggest its performance is competitive compared to other peer density estimators. On the basis of the presented density estimator, several practical areas in the field of statistical learning are then studied. Firstly, we investigate the classification problem. In particular, a new density-based classifier is developed. A novel point of the proposed classifier is its ability to perform density estimation on a data set with multiple classes while controlling the smoothness of the density estimates for different classes using a global bandwidth parameter. Secondly, we investigate cluster analysis and propose two modal clustering methods. Both of them start with density estimation using the proposed density estimator and allocate observations into several initial clusters. The first clustering method suggests a new technique called “mode-flattening” to sequentially reduce the number of clusters by locally modifying the smoothness of the density estimate. The second clustering method presents a “cluster-merging” approach based on the connectivities between clusters and the significance of each cluster. Simulated and real-world case studies show that both methods are able to deal with difficult clustering problems. Finally, we study feature selection for high-dimensional classification problems. A novel density-based feature selection method is proposed by ranking variables on the basis of their empirical misclassification errors. The proposed feature selection method is demonstrated to be efficient through several real-world high-dimensional classification applications. en
dc.publisher ResearchSpace@Auckland en
dc.relation.ispartof PhD Thesis - University of Auckland en
dc.relation.isreferencedby UoA en
dc.rights Items in ResearchSpace are protected by copyright, with all rights reserved, unless otherwise indicated. Previously published items are made available in accordance with the copyright policy of the publisher. en
dc.rights.uri https://researchspace.auckland.ac.nz/docs/uoa-docs/rights.htm en
dc.rights.uri http://creativecommons.org/licenses/by-nc-sa/3.0/nz/ en
dc.title Statistical Learning with Mixtures en
dc.type Thesis en
thesis.degree.discipline Statistics en
thesis.degree.grantor The University of Auckland en
thesis.degree.level Doctoral en
thesis.degree.name PhD en
dc.rights.holder Copyright: The author en
dc.rights.accessrights http://purl.org/eprint/accessRights/OpenAccess en
pubs.elements-id 799109 en
pubs.record-created-at-source-date 2020-04-24 en
dc.identifier.wikidata Q112952398

