dc.contributor.advisor |
Wang, Y |
en |
dc.contributor.author |
Hu, Shengwei |
en |
dc.date.accessioned |
2020-04-24T01:59:14Z |
en |
dc.date.issued |
2020 |
en |
dc.identifier.uri |
http://hdl.handle.net/2292/50494 |
en |
dc.description.abstract |
This thesis aims to provide a new general approach to solving statistical learning problems from the perspective of probability density, which offers a fuller view of the data. For this purpose, we present a density estimator obtained by modifying a semiparametric mixture-based density estimator recently proposed by Wang and Wang (2015). To balance computational efficiency against estimation accuracy, the original full covariance matrix is replaced with a diagonal one, so that the estimator has the potential to be applied in high-dimensional settings. Numerical studies suggest that its performance is competitive with that of other density estimators. On the basis of the presented density estimator, several practical areas of statistical learning are then studied. First, we investigate the classification problem and develop a new density-based classifier. A novel feature of the proposed classifier is its ability to perform density estimation on a data set with multiple classes while controlling the smoothness of the density estimates for the different classes using a global bandwidth parameter. Second, we investigate cluster analysis and propose two modal clustering methods. Both start with density estimation using the proposed density estimator and allocate observations to several initial clusters. The first clustering method introduces a new technique called “mode-flattening” to sequentially reduce the number of clusters by locally modifying the smoothness of the density estimate. The second clustering method presents a “cluster-merging” approach based on the connectivities between clusters and the significance of each cluster. Simulated and real-world case studies show that both methods are able to deal with difficult clustering problems. Finally, we study feature selection for high-dimensional classification problems. 
A novel density-based feature selection method is proposed by ranking variables on the basis of their empirical misclassification errors. The proposed feature selection method is demonstrated to be efficient through several real-world high-dimensional classification applications. |
en |
dc.publisher |
ResearchSpace@Auckland |
en |
dc.relation.ispartof |
PhD Thesis - University of Auckland |
en |
dc.relation.isreferencedby |
UoA |
en |
dc.rights |
Items in ResearchSpace are protected by copyright, with all rights reserved, unless otherwise indicated. Previously published items are made available in accordance with the copyright policy of the publisher. |
en |
dc.rights.uri |
https://researchspace.auckland.ac.nz/docs/uoa-docs/rights.htm |
en |
dc.rights.uri |
http://creativecommons.org/licenses/by-nc-sa/3.0/nz/ |
en |
dc.title |
Statistical Learning with Mixtures |
en |
dc.type |
Thesis |
en |
thesis.degree.discipline |
Statistics |
en |
thesis.degree.grantor |
The University of Auckland |
en |
thesis.degree.level |
Doctoral |
en |
thesis.degree.name |
PhD |
en |
dc.rights.holder |
Copyright: The author |
en |
dc.rights.accessrights |
http://purl.org/eprint/accessRights/OpenAccess |
en |
pubs.elements-id |
799109 |
en |
pubs.record-created-at-source-date |
2020-04-24 |
en |
dc.identifier.wikidata |
Q112952398 |
|