Abstract:
The effect of clustering on tests of goodness-of-fit and homogeneity is reviewed. An extensive empirical study comparing the performance of the Rao-Scott modified chi-squared test, the Wald chi-squared test, the ordinary Pearson chi-squared test and the generalized Hotelling's T² test suggests that Hotelling's T² test outperforms the others. The Rao-Scott modified test and the Wald test work well for large surveys; in small or moderate surveys, however, the Wald test does not perform as well as the Rao-Scott modified test. We derive a statistic, T_G², analogous to the generalized Hotelling's T² statistic, for testing the hypothesis H0: P = f(θ). The Dirichlet-multinomial distribution is formulated as a model for contingency tables. We also carry out an extensive empirical study comparing the performance of Brier's modified chi-squared test, the Pearson chi-squared test and the T_G² test. The results show that the Pearson chi-squared test performs very poorly. Brier's modified chi-squared test is reasonably good even when the number of clusters is small. The T_G² test is preferable to Brier's modified test only when the cluster size is large. Clustering also has a strong influence on the Sign test, the One-Sample t-test, the Mann-Whitney-Wilcoxon test and the Two-Sample t-test. We suggest modified test statistics that lead to substantially better tests.
Since the Mann-Whitney-Wilcoxon test and the Two-Sample t-test involve two populations, several cluster sampling schemes could be considered. We show that different sampling schemes affect the tests differently.
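To make the idea of a "modified" chi-squared test concrete, the sketch below shows the usual textbook form of Brier's correction under the Dirichlet-multinomial model. In that model the pooled counts have covariance inflated by a factor C = (n + M)/(1 + M), with 1 ≤ C ≤ n, where n is the cluster size and M the Dirichlet precision. The Pearson X² on the pooled table is therefore divided by a moment estimate of C and referred approximately to χ² with K − 1 degrees of freedom. This is a minimal illustration, not the thesis's own implementation. It assumes equal cluster sizes and no empty pooled categories, and the function name `brier_adjusted_chi2` is hypothetical.

```python
from typing import Sequence, Tuple


def brier_adjusted_chi2(counts: Sequence[Sequence[int]],
                        p0: Sequence[float]) -> Tuple[float, float, int]:
    """Sketch of Brier's modified Pearson test of H0: pi = p0.

    counts: J rows (one per cluster), each holding K category counts that
            sum to the same cluster size n (equal sizes are an assumption
            of this sketch, not a requirement stated in the abstract).
    p0:     hypothesised category probabilities.
    Returns (X2 / C_hat, C_hat, K - 1); compare the first value with a
    chi-squared(K - 1) critical value.
    """
    J, K = len(counts), len(p0)
    n = sum(counts[0])
    if any(sum(row) != n for row in counts):
        raise ValueError("this sketch assumes equal cluster sizes")
    N = n * J

    # Pooled category totals and pooled proportions.
    totals = [sum(row[k] for row in counts) for k in range(K)]
    p_hat = [t / N for t in totals]

    # Ordinary Pearson X^2 on the pooled table, ignoring clustering.
    x2 = sum((totals[k] - N * p0[k]) ** 2 / (N * p0[k]) for k in range(K))

    # Moment estimate of C: between-cluster Pearson dispersion divided by
    # (J - 1)(K - 1); its expectation is about C under the DM model.
    c_hat = sum(
        (row[k] - n * p_hat[k]) ** 2 / (n * p_hat[k])
        for row in counts for k in range(K)
    ) / ((J - 1) * (K - 1))
    if c_hat == 0:
        raise ValueError("no between-cluster variation; C cannot be estimated")

    return x2 / c_hat, c_hat, K - 1
```

For example, `brier_adjusted_chi2([[4, 0], [0, 4], [4, 0]], [0.5, 0.5])` gives a pooled X² of 4/3 but an estimated C of 6, so the adjusted statistic drops to 2/9. This shows how strongly within-cluster correlation can inflate the unadjusted Pearson statistic.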