Abstract:
Missing data is a common problem for researchers and data analysts working with surveys and other questionnaires that produce ordinal data. Despite the frequency and relevance of this problem, many machine learning algorithms handle missing data rather naively. The standard approach first imputes the missing values and then passes the completed data to the learning algorithm. One advantage of this approach is that it lets the user select the most suitable imputation method for each dataset. However, the resulting classification accuracy is not promising. Su et al. proposed an algorithm called “Classifier-based Nominal Imputation” (CNI), which improves the classification performance of machine learning algorithms on incomplete nominal datasets, but its performance on ordinal data remains unknown. We applied CNI to ordinal data, and our experimental results show that using CNI to pre-process an incomplete ordinal dataset yields significantly higher classification accuracy than learners that apply no imputation method and than learners that use baseline imputation techniques such as most-common-value imputation. CNI is found to be helpful for many learners, such as K Nearest Neighbour, Naive Bayes and Multilayer Perceptron Neural Networks, on incomplete ordinal data.
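The impute-then-learn baseline mentioned in the abstract can be illustrated with a minimal sketch. This is not the CNI algorithm of Su et al.; it only shows most-common-value imputation followed by a simple nearest-neighbour learner. The toy Likert-style data, the use of `None` as the missing-value marker, the function names, and the choice of Manhattan distance on integer-coded ordinal levels are all assumptions made for illustration.

```python
from collections import Counter

MISSING = None  # assumed marker for an unanswered item


def most_common_value_impute(rows):
    """Fill each missing entry with the most frequent observed value in its column."""
    n_cols = len(rows[0])
    fill = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not MISSING]
        fill.append(Counter(observed).most_common(1)[0][0])
    return [[fill[j] if v is MISSING else v for j, v in enumerate(r)] for r in rows]


def knn_predict(train_x, train_y, query, k=3):
    """Majority vote among the k nearest rows (Manhattan distance on ordinal codes)."""
    order = sorted(
        range(len(train_x)),
        key=lambda i: sum(abs(a - b) for a, b in zip(train_x[i], query)),
    )
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


# Hypothetical survey answers on a 1-5 scale; None marks a missing answer.
X = [
    [5, 4, None],
    [4, None, 5],
    [5, 5, 5],
    [1, 2, None],
    [None, 1, 2],
    [2, 1, 1],
]
y = ["satisfied", "satisfied", "satisfied",
     "unsatisfied", "unsatisfied", "unsatisfied"]

# Step 1: impute. Step 2: hand the completed data to the learner.
X_completed = most_common_value_impute(X)
prediction = knn_predict(X_completed, y, [5, 4, 4], k=3)
print(X_completed)
print(prediction)
```

Note that the fourth respondent, who gave low answers elsewhere, receives the column mode 5 for the missing third item. Class-blind fills of this kind are one reason a baseline imputer may hurt downstream classification.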