DIFFERENTIAL ITEM-PERSON FUNCTIONING (DIPF) ON QUIZIZZ-ASSISTED PHYSICS MEASUREMENT QUESTIONS: A RASCH MODEL ANALYSIS

Authors

Mohd Zaidi Bin Amiruddin, Achmad Samsudin, Andi Suhandi, Nila Apriliyanti, Bayram Costu

DOI:

https://doi.org/10.22437/jiituj.v10i1.41234

Keywords:

Gender, Measurement, Physics, Rasch Model, Quizizz

Abstract

Assessment and measurement are central concerns in physics education, where accurate evaluation is needed to gauge students' understanding and mastery of the subject. This study tested the validity and reliability of physics measurement questions administered through the Quizizz platform and identified Differential Item-Person Functioning (DIPF) using the Rasch model with the Winsteps software. The research design was grounded in Item Response Theory (IRT). The study involved 34 high school students from Sidoarjo, East Java, Indonesia, and the instrument consisted of 15 multiple-choice questions on basic physics measurement. The results showed that the instrument had acceptable construct validity, with raw variance explained by measures of 22.8%, indicating that the items effectively gauge students' ability. Reliability analysis showed moderate internal consistency, with a Cronbach's alpha of 0.70, although person and item reliability were weaker at 0.63 and 0.46, respectively. DIF analysis showed no significant gender bias. Future research should improve the instrument's reliability and consider a broader range of external factors to understand student performance more comprehensively.
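The abstract summarizes three analyses (Rasch item calibration, reliability estimation, and a gender DIF check), which the study carried out in Winsteps. As an illustration only, the minimal sketch below shows how comparable summary statistics could be approximated from a scored response matrix in Python; the 34×15 response matrix and the gender vector are simulated placeholders, and the simple centered log-odds difficulty stands in for the joint maximum likelihood calibration that Winsteps actually performs.

```python
# Minimal illustrative sketch (not the study's Winsteps analysis).
# Assumes a persons x items matrix of dichotomous scores (1 = correct)
# and a gender label per person; both are simulated placeholders here.
import numpy as np

rng = np.random.default_rng(0)
n_persons, n_items = 34, 15                        # sizes reported in the abstract
X = rng.integers(0, 2, size=(n_persons, n_items))  # placeholder 0/1 responses
gender = rng.choice(["M", "F"], size=n_persons)    # placeholder gender labels

def cronbach_alpha(scores: np.ndarray) -> float:
    """Classical internal-consistency estimate (Cronbach's alpha)."""
    k = scores.shape[1]
    item_var = scores.var(axis=0, ddof=1).sum()
    total_var = scores.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_var / total_var)

def item_difficulty_logits(scores: np.ndarray) -> np.ndarray:
    """Crude Rasch-style difficulty: centered log-odds of an incorrect response.
    A rough proxy for the JMLE difficulties that Winsteps estimates."""
    p = scores.mean(axis=0).clip(0.01, 0.99)       # item proportion correct
    d = np.log((1 - p) / p)                        # harder item -> higher logit
    return d - d.mean()                            # center difficulties at 0

alpha = cronbach_alpha(X)
d_all = item_difficulty_logits(X)

# Naive gender DIF screen: calibrate item difficulties separately per group
# and inspect the contrast; large gaps would flag items for closer review.
d_male = item_difficulty_logits(X[gender == "M"])
d_female = item_difficulty_logits(X[gender == "F"])
dif_contrast = d_male - d_female

print(f"Cronbach's alpha: {alpha:.2f}")
for i, (d, c) in enumerate(zip(d_all, dif_contrast), start=1):
    print(f"Item {i:2d}: difficulty {d:+.2f} logits, M-F contrast {c:+.2f}")
```

In practice, the simulated matrix would be replaced with the exported Quizizz responses, and any large male-female difficulty contrast would be judged against the DIF significance statistics that Winsteps reports rather than the raw gap shown here.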

Author Biographies

Mohd Zaidi Bin Amiruddin, Universitas Pendidikan Indonesia

Faculty of Mathematics and Science Education, Universitas Pendidikan Indonesia, Jawa Barat, Indonesia

Achmad Samsudin, Universitas Pendidikan Indonesia

Faculty of Mathematics and Science Education, Universitas Pendidikan Indonesia, Jawa Barat, Indonesia

Andi Suhandi, Universitas Pendidikan Indonesia

Faculty of Mathematics and Science Education, Universitas Pendidikan Indonesia, Jawa Barat, Indonesia

Nila Apriliyanti, Al-Islam Krian High School

Al-Islam Krian High School, Sidoarjo, Jawa Timur, Indonesia

Bayram Costu, Yildiz Technical University

Department of Science Education, Yildiz Technical University, Istanbul, Turkey

Published

2026-02-19

How to Cite

Amiruddin, M. Z. B., Samsudin, A., Suhandi, A., Apriliyanti, N., & Costu, B. (2026). DIFFERENTIAL ITEM-PERSON FUNCTIONING (DIPF) ON QUIZIZZ-ASSISTED PHYSICS MEASUREMENT QUESTIONS: A RASCH MODEL ANALYSIS. Jurnal Ilmiah Ilmu Terapan Universitas Jambi, 10(1), 29–39. https://doi.org/10.22437/jiituj.v10i1.41234