Abstract:
In addition to validity and reliability evidence, other psychometric qualities of the PE Metrics assessments needed to be examined. This article describes how those critical psychometric issues were addressed during construction of the PE Metrics assessment bank. Specifically, the issues were (a) the number of items or assessments needed, (b) the training protocol needed to achieve the required intra- and inter-rater objectivity, and (c) the development of a score scale. First, the impact of the number of assessments was examined using a subsample of data from the PE Metrics study in which students were assessed with four assessments. At least two assessments were found to be needed when PE Metrics is applied for high-stakes testing. A single assessment can still be used in teaching practice, but the results must be interpreted with caution. Second, satisfactory intra-rater objectivity can be achieved with the training protocol developed for PE Metrics. When two or more raters are involved, however, an additional monitoring protocol should be employed so that inter-rater objectivity can be monitored and controlled. Third, a score scale was developed to allow consistent interpretation and reporting of PE Metrics results. Other related issues, such as test fairness and the setting of performance standards, are discussed, and future directions for the maintenance and continuing development of PE Metrics are outlined.