Abstract:
To improve the management of heart-disease-related hospital readmission, this thesis proposes a machine-learning-based framework for modeling patients who are at risk of preventable hospital readmission. Unlike traditional statistical methods, the proposed framework integrates Association Rule Mining (ARM) and clustering techniques to build at-risk identification models that offer new insights into at-risk populations. These insights allow general practitioners to view their patients in groups characterized by the predicted risk factors. We aim for these insights to complement conventional regression models, which are limited in scope to the significance and weight of specific expected predictors. The proposed framework is called the ‘Hybrid Clustering-ARM framework’ (HCA framework). To assess the feasibility of the HCA framework experimentally, we obtained approval to access two data sources: the Framingham Heart Study, a well-known historical dataset for heart events, and the New Zealand VIEW (Vascular Informatics Using Epidemiology and the Web) dataset, which is relatively new. We applied the HCA framework to both data sources with a series of sensitivity analyses. To some extent, the HCA framework can produce a model that identifies risk factors as well as traditional regression-based models do. Beyond the traditional focus of medical prediction models on detecting risk factors, the identification model derived by the HCA framework offers insight into ‘at-risk’ patients in clusters, as well as into ‘low-risk’ patients. The detected ‘at-risk’ patients can be mapped into multiple clusters, which brings the understanding of ‘at-risk’ patients closer to the natural distribution of the sampled patients. Theoretically, all sampled patients are at some risk of developing the cardiovascular disease (CVD) conditions of interest; however, some are more likely to develop the disease than others. Segmenting the sampled patients into multiple ‘at-risk’ clusters and one ‘low-risk’ group helps general practitioners (and others with an interest, e.g. public health physicians, cardiologists, or health policy and resource planners) to categorize their patients for better health management.