The Influence of Item Discrimination on Misclassification of Test Takers
DOI: https://doi.org/10.35670/1667-4545.v21.n3.36294
Keywords: decision accuracy, item discrimination, item response theory, Rudner algorithm, information function
Abstract
It has been suggested that items with low discrimination can be included in a test with a criterion-referenced score interpretation as long as they measure highly relevant content. However, low item discrimination increases the standard error of measurement, which may in turn increase the expected proportion of misclassified test takers. To test this, responses from 2000 test takers to 100 items were simulated, varying item discrimination values and the number and location of cut scores, and classification inaccuracy was estimated. Results show that the expected proportion of misclassified test takers increased as item discrimination decreased and as the cut scores moved closer to the mean of the distribution of test takers. Therefore, a test should include as few items with low discrimination values as possible, or even none, in order to reduce the expected proportion of test takers classified into a wrong performance level.
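The logic of the simulation can be illustrated with a minimal base-R sketch. It assumes a 2PL model (logistic metric, D = 1), a single cut score at the mean of the ability distribution, and a standard error of measurement computed from the test information function at the true ability; the discrimination value, difficulty distribution, and use of true rather than estimated abilities are illustrative simplifications, not the study's exact procedure.

## Illustrative sketch: low-discrimination condition with one cut score at theta = 0
set.seed(123)

n_persons <- 2000                 # simulated test takers
n_items   <- 100                  # simulated items
a <- rep(0.4, n_items)            # hypothetical low discrimination values
b <- rnorm(n_items, 0, 1)         # item difficulties
theta <- rnorm(n_persons)         # true abilities
cut <- 0                          # cut score at the mean of the ability distribution

## 2PL response probabilities and simulated responses
## (in the full study, abilities would be estimated from these responses)
eta  <- sweep(outer(theta, b, "-"), 2, a, "*")   # a_j * (theta_i - b_j)
p    <- plogis(eta)
resp <- matrix(rbinom(length(p), 1, p), nrow = n_persons)

## Test information and standard error of measurement at each true theta
info <- sapply(theta, function(th) {
  pj <- plogis(a * (th - b))
  sum(a^2 * pj * (1 - pj))
})
sem <- 1 / sqrt(info)

## Rudner-style expected accuracy: probability that the ability estimate falls on
## the same side of the cut score as the true theta, assuming theta-hat ~ N(theta, SEM)
p_correct <- ifelse(theta >= cut,
                    1 - pnorm((cut - theta) / sem),
                    pnorm((cut - theta) / sem))
expected_misclassification <- 1 - mean(p_correct)
expected_misclassification

With lower discrimination values the test information decreases, the standard error of measurement grows, and the expected misclassification rate rises, especially for test takers near the cut score.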
References
Baker, F. B., & Kim, S.-H. (2017). The basics of Item Response Theory using R. New York, NY: Springer. doi: 10.1007/978-3-319-54205-8
Burton, R. F. (2001). Do item-discrimination indices really help us to improve our tests? Assessment & Evaluation in Higher Education, 26(3), 213-220. doi: 10.1080/02602930120052378
Cheng, Y., Liu, C., & Behrens, J. (2015). Standard error of ability estimates and the classification accuracy and consistency of binary decisions. Psychometrika, 80(3), 645-664. doi: 10.1007/s11336-014-9407-z
Clifford, R. (2016). A rationale for criterion-referenced proficiency testing. Foreign Language Annals, 49(2), 224-234. doi: 10.1111/flan.12201
DeMars, C. (2010). Item Response Theory. New York, NY: Oxford University Press.
Ercikan, K., & Julian, M. (2002). Classification accuracy of assigning student performance to proficiency levels: Guidelines for assessment design. Applied Measurement in Education, 15(3), 269-294. doi: 10.1207/S15324818AME1503_3
Frisbie, D. A. (2005). Measurement 101: Some fundamentals revisited. Educational Measurement: Issues and Practice, 24(3), 21-28. doi: 10.1111/j.1745-3992.2005.00016.x
Haladyna, T. M. (2016). Item analysis for selected-response test items. In S. Lane, M. R. Raymond, & T. M. Haladyna (Eds.), Handbook of test development (2nd ed., pp. 392-409). New York, NY: Routledge.
Haladyna, T. M., Rodriguez, M. C., & Stevens, C. (2019). Are multiple-choice items too fat? Applied Measurement in Education, 32(4), 350-364. doi: 10.1080/08957347.2019.1660348
Hambleton, R. K., & Jones, R. W. (1994). Item parameter estimation errors and their influence on test information functions. Applied Measurement in Education, 7(3), 171-186. doi: 10.1207/s15324818ame0703_1
Lathrop, Q. N. (2014). R package cacIRT: Estimation of classification accuracy and consistency under item response theory. Applied Psychological Measurement, 38(7), 581-582. doi: 10.1177/0146621614536465
Lathrop, Q. N. (2015). Practical issues in estimating classification accuracy and consistency with R package cacIRT. Practical Assessment, Research, and Evaluation, 20, Article 18. Retrieved from https://scholarworks.umass.edu/pare/vol20/iss1/18
Lathrop, Q. N., & Cheng, Y. (2013). Two approaches to estimation of classification accuracy rate under item response theory. Applied Psychological Measurement, 37(3), 226-241. doi: 10.1177/0146621612471888
Lathrop, Q. N., & Cheng, Y. (2014). A nonparametric approach to estimate classification accuracy and consistency. Journal of Educational Measurement, 51(3), 318-334. doi: 10.1111/jedm.12048
Lee, W.-C. (2010). Classification consistency and accuracy for complex assessments using item response theory. Journal of Educational Measurement, 47(1), 1-17. doi: 10.1111/j.1745-3984.2009.00096.x
Leydold, J., & Hörmann, W. (2021). Runuran: R interface to the ‘UNU.RAN’ random variate generators (Version 0.34) [R package]. Retrieved from https://cran.r-project.org/web/packages/Runuran/index.html
Lord, F. M. (1980). Applications of item response theory to practical testing problems. Hillsdale, NJ: Lawrence Erlbaum Associates.
Luecht, R. M. (2016). Applications of item response theory: Item and test information functions for designing and building mastery tests. In S. Lane, M. R. Raymond, & T. M. Haladyna (Eds.), Handbook of test development (2nd ed.). New York, NY: Routledge.
Martineau, J. A. (2007). An expansion and practical evaluation of expected classification accuracy. Applied Psychological Measurement, 31(3), 181-194. doi: 10.1177/0146621606291557
Paek, I., & Han, K. T. (2013). IRTPRO 2.1 for Windows (item response theory for patient-reported outcomes). Applied Psychological Measurement, 37(3), 242-252. doi: 10.1177/0146621612468223
Partchev, I., Maris, G., & Hattori, T. (2017). irtoys: A collection of functions related to item response theory (IRT) (Version 0.2.1) [R package]. Retrieved from https://cran.r-project.org/package=irtoys
Popham, W. J. (2014). Criterion-referenced measurement: Half a century wasted? Educational Leadership, 71(6), 62-66. Retrieved from http://www.ascd.org/publications/educational_leadership/mar14/vol71/num06/Criterion-Referenced_Measurement@_Half_a_Century_Wasted%C2%A2.aspx
Popham, W. J., & Husek, T. R. (1969). Implications of criterion-referenced measurement. Journal of Educational Measurement, 6(1), 1-9. doi: 10.1111/j.1745-3984.1969.tb00654.x
R Core Team. (2020). R: A language and environment for statistical computing (Version 4.0.2). [Computer software]. Retrieved from https://www.R-project.org
Ramírez-Benítez, Y., Jiménez-Morales, R. M., & Díaz-Bringas, M. (2015). Matrices progresivas de Raven: Punto de corte para preescolares 4 - 6 años. Revista Evaluar, 15(1), 123-133. doi: 10.35670/1667-4545.v15.n1.14911
Richaud de Minzi, M. C. (2008). Nuevas tendencias en psicometría. Revista Evaluar, 8(1), 1-19. doi: 10.35670/1667-4545.v8.n1.501
Rizopoulos, D. (2018). ltm: Latent trait models under IRT (Version 1.1-1) [R package]. Retrieved from https://CRAN.R-project.org/package=ltm
Rudner, L. M. (2001). Computing the expected proportions of misclassified examinees. Practical Assessment, Research, and Evaluation, 7, Article 14. doi: 10.7275/an9m-2035
Rudner, L. M. (2005). Expected classification accuracy. Practical Assessment, Research, and Evaluation, 10, Article 13. doi: 10.7275/56a5-6b14
Wickham, H. (2016). ggplot2: Elegant graphics for data analysis (2nd ed.). New York, NY: Springer. doi: 10.1007/978-3-319-24277-4
Wickham, H., Chang, W., Henry, L., Pedersen, T. L., Takahashi, K., Wilke, C., ... & RStudio. (2021). ggplot2: Create elegant data visualisations using the grammar of graphics (Version 3.3.5) [R package]. Retrieved from https://cran.r-project.org/web/packages/ggplot2/index.html
Wyse, A. E., & Hao, S. (2012). An evaluation of item response theory classification accuracy and consistency indices. Applied Psychological Measurement, 36(7), 602-624. doi: 10.1177/0146621612451522
Xing, D., & Hambleton, R. K. (2004). Impact of test design, item quality, and item bank size on the psychometric properties of computer-based credentialing examinations. Educational and Psychological Measurement, 64(1), 5-21. doi: 10.1177/0013164403258393
Yen, W. M. (1987). A comparison of the efficiency and accuracy of BILOG and LOGIST. Psychometrika, 52(2), 275-291. doi: 10.1007/BF02294241
License
Copyright (c) 2021 Raúl Emmanuel Trujano
This work is licensed under a Creative Commons Attribution 4.0 International License.
Revista Evaluar applies the Creative Commons Attribution License (CCAL). Under this license, authors retain copyright ownership of their articles but allow anyone to download and distribute the articles published in Evaluar without requiring permission from the author or the publisher. The only condition is that, always and in every case, the authors and the original source of publication (i.e., Evaluar) must be cited. Submitting articles to Evaluar and reading them is entirely free of charge.