Statistical Learning with Mixtures

Hu, Shengwei

dc.contributor.advisor	Wang, Y	en
dc.contributor.author	Hu, Shengwei	en
dc.date.accessioned	2020-04-24T01:59:14Z	en
dc.date.issued	2020	en
dc.identifier.uri	http://hdl.handle.net/2292/50494	en
dc.description.abstract	The motivation of this thesis is to provide a new general approach to solving statistical learning problems from the perspective of probability density as it has the capacity of observing a fuller view of data. For this purpose, we present a density estimator by modifying a semiparametric mixture-based density estimator recently proposed by Wang and Wang (2015). To optimize the trade-off between computational efficiency and estimation accuracy, the original full covariance matrix is replaced with a diagonal one so that it has the potential to be applied to high-dimensional scenarios. Numerical studies suggest its performance is competitive compared to other peer density estimators. On the basis of the presented density estimator, several practical areas in the field of statistical learning are then studied. Firstly, we investigate the classification problem. In particular, a new density-based classifier is developed. A novel point of the proposed classifier is its ability to perform density estimation on a data set with multiple classes while controlling the smoothness of the density estimates for different classes using a global bandwidth parameter. Secondly, we investigate cluster analysis and propose two modal clustering methods. Both of them start with density estimation using the proposed density estimator and allocate observations into several initial clusters. The first clustering method suggests a new technique called “mode-flattening” to sequentially reduce the number of clusters by locally modifying the smoothness of the density estimate. The second clustering method presents a “cluster-merging” approach based on the connectivities between clusters and the significance of each cluster. Simulated and real-world case studies show that both methods are able to deal with difficult clustering problems. Finally, we study feature selection for high-dimensional classification problems. A novel density-based feature selection method is proposed by ranking variables on the basis of their empirical misclassification errors. The proposed feature selection method is demonstrated to be efficient through several real-world high-dimensional classification applications.	en
dc.publisher	ResearchSpace@Auckland	en
dc.relation.ispartof	PhD Thesis - University of Auckland	en
dc.relation.isreferencedby	UoA	en
dc.rights	Items in ResearchSpace are protected by copyright, with all rights reserved, unless otherwise indicated. Previously published items are made available in accordance with the copyright policy of the publisher.	en
dc.rights.uri	https://researchspace.auckland.ac.nz/docs/uoa-docs/rights.htm	en
dc.rights.uri	http://creativecommons.org/licenses/by-nc-sa/3.0/nz/	en
dc.title	Statistical Learning with Mixtures	en
dc.type	Thesis	en
thesis.degree.discipline	Statistics	en
thesis.degree.grantor	The University of Auckland	en
thesis.degree.level	Doctoral	en
thesis.degree.name	PhD	en
dc.rights.holder	Copyright: The author	en
dc.rights.accessrights	http://purl.org/eprint/accessRights/OpenAccess	en
pubs.elements-id	799109	en
pubs.record-created-at-source-date	2020-04-24	en
dc.identifier.wikidata	Q112952398

Files in this item

Name: whole.pdf

Size: 5.822Mb

Format: PDF

This item appears in the following Collection(s)

Doctoral Theses [6929]
