The Mode Effects and Digital Divide Among Taiwanese Students Based on the TIMSS 2019 Computer-Based Testing
Authors: Kuan-Ming Chen (Research Center for Testing and Assessment, National Academy for Educational Research), Tian-Ming Sheu (Department of Education, National Taiwan Normal University), and Tsung-Hau Jen (Science Education Center, National Taiwan Normal University)
Vol. & No.: Vol. 70, No. 2
Date: June 2025
Pages: 1-41
DOI: https://doi.org/10.6209/JORIES.202506_70(2).0001
Abstract:
Introduction
To meet the demands and leverage the advantages of the digital age, most international large-scale assessments have transitioned students’ response modes from paper-and-pencil to computer-based testing (CBT). With the aid of digitalization, CBT can better align with the fundamental principles of authentic assessment (Herrington & Herrington, 1998) by incorporating multimedia presentations and capturing test-takers’ maneuvering behaviors in greater detail. However, when item characteristics, or the constructs an item is designed to assess, differ between the paper-and-pencil and CBT formats, a mode effect may arise from the variation in students’ response modes. If this mode effect correlates with factors related to the digital divide, such as disparities in students’ socioeconomic status (SES), the allocation of urban-rural resources, and gender (Cooper, 2006; Hsieh & Ming, 2022; Hu & Chang, 2023; Kuo, 2022), CBT raises concerns regarding educational equity. Therefore, addressing the digital divide in CBT implementation is crucial to ensuring educational equity and accessibility before full adoption.
The Trends in International Mathematics and Science Study (TIMSS) 2019 was the first assessment cycle to implement CBT: half of the participating countries, including Taiwan, administered CBT as the primary testing format (eTIMSS) for the main survey while concurrently conducting a bridge study in the paper-and-pencil format (paperTIMSS). The purpose of this bridge study was to identify and equate potential mode effects in TIMSS 2019, ensuring item equivalence across modes and supporting robust trend analyses across assessment cycles. Drawing on data from Taiwanese eighth-grade students in TIMSS 2019 and its accompanying bridge study, this research addresses concerns regarding educational equity by examining the relationships between mode effects and three student background factors: SES, school location, and gender. Specifically, this study seeks to determine whether Taiwanese eighth-grade students’ performance in mathematics and science differed between the two testing formats and whether a digital divide was evident based on SES, school location, or gender. Additionally, TIMSS 2019 incorporated innovative computerized items assessing problem-solving and inquiry (PSI) tasks (Mullis et al., 2021). Therefore, this study also explores how Taiwanese eighth-grade students from different backgrounds performed on these computerized PSI tasks (eTIMSS-PSI).
Method
This study analyzed the performance of Taiwanese eighth-grade students on mathematics and science trend items from eTIMSS 2019 and its bridge study, paperTIMSS. Mode effects were identified by comparing the average percent correct on each trend item between the paper-and-pencil and CBT formats. Subsequently, the digital divide related to demographic factors was examined by comparing scale-score performance on trend items between eTIMSS and paperTIMSS among students with varying SES, school locations, and genders. Finally, students’ performance on eTIMSS-PSI items was analyzed using scale scores to assess how Taiwanese eighth-grade students from different backgrounds performed on these tasks.
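As a schematic illustration of the first step, the sketch below compares the average percent correct on each trend item across modes. The file name and column names are hypothetical; the operational analysis used the weighted TIMSS 2019 database (e.g., via the IEA IDB Analyzer) rather than this simplification.

```python
# Minimal sketch of the item-level mode-effect comparison (hypothetical data).
import pandas as pd
from scipy import stats

# One row per trend item; columns hold the average percent correct in each mode.
items = pd.read_csv("trend_items.csv")  # columns: pct_correct_paper, pct_correct_e

# Mode effect per item: drop in average percent correct from paperTIMSS to eTIMSS.
items["mode_effect"] = items["pct_correct_paper"] - items["pct_correct_e"]

# Paired t-test across trend items: is the mean drop reliably greater than zero?
t_stat, p_value = stats.ttest_rel(items["pct_correct_paper"], items["pct_correct_e"])
print(f"mean mode effect = {items['mode_effect'].mean():.2f} percentage points, "
      f"t = {t_stat:.2f}, p = {p_value:.4f}")
```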
Results
Taiwanese eighth-grade students showed significant reductions in the average percent correct on trend items in eTIMSS relative to paperTIMSS in both mathematics and science, indicating substantial mode effects in both subjects. Based on these identified mode effects, scaling procedures were implemented for each subject using the international average mode effect and performance data from previous assessment cycles (which included only paperTIMSS) to generate scale scores. Consequently, the average scale scores for mathematics and science were adjusted to ensure comparability between eTIMSS and paperTIMSS, as well as between TIMSS 2019 and prior assessment cycles.
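Schematically, letting $\bar{p}_j^{\mathrm{paper}}$ and $\bar{p}_j^{\mathrm{e}}$ denote the average percent correct on trend item $j$ in each mode, the item-level and average mode effects can be written as below. This is a simplified mean-shift illustration with our own symbols; the operational TIMSS 2019 procedure instead estimated mode-specific item parameters within a concurrent IRT calibration (von Davier, 2020).

$$
\Delta_j = \bar{p}_j^{\mathrm{paper}} - \bar{p}_j^{\mathrm{e}},
\qquad
\bar{\Delta} = \frac{1}{J}\sum_{j=1}^{J}\Delta_j,
$$

where $J$ is the number of trend items. Conceptually, the linking step places eTIMSS proficiency estimates on the trend scale so that, after accounting for the international average mode effect $\bar{\Delta}$, eTIMSS and paperTIMSS scale scores are comparable.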
Comparing average scale-score performance between eTIMSS and paperTIMSS among Taiwanese eighth-grade students with varying SES, school locations, and genders revealed no significant relationship between mode effects and students’ background factors in either mathematics or science. Therefore, no digital divide attributable to mode effects was evident in the scale scores of either subject with respect to SES, school location, or gender.
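A hypothetical sketch of this group-level comparison is given below: it tests whether the eTIMSS-paperTIMSS scale-score gap differs between two groups. All means and standard errors are invented for illustration; the operational analysis combines the five plausible values with jackknife standard errors (Foy & LaRoche, 2020).

```python
# Does the eTIMSS-paperTIMSS scale-score gap differ between two student groups?
import math

def gap_difference_z(e1, se_e1, p1, se_p1, e2, se_e2, p2, se_p2):
    """z statistic for (gap in group 1) minus (gap in group 2),
    treating the four samples as independent."""
    gap1, gap2 = e1 - p1, e2 - p2
    se_diff = math.sqrt(se_e1**2 + se_p1**2 + se_e2**2 + se_p2**2)
    return (gap1 - gap2) / se_diff

# Example with fabricated higher- vs. lower-SES group means (eTIMSS, paperTIMSS).
z = gap_difference_z(630, 4.1, 633, 4.5, 560, 4.8, 561, 5.2)
print(f"z = {z:.2f}")  # |z| < 1.96: no significant mode-related divide at alpha = .05
```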
Furthermore, the analysis of eTIMSS-PSI scale scores for Taiwanese eighth-grade students from varying backgrounds revealed a relative decline in performance among students from higher SES backgrounds compared with those from lower SES backgrounds, as well as among students attending urban schools compared with those attending rural schools.
Conclusions
The performance differences between paper-and-pencil and CBT formats in TIMSS 2019 mathematics and science among Taiwanese eighth-grade students from different backgrounds reveal the following key findings:
1. Overall mode effects: Taiwanese eighth-grade students, like their counterparts in most other participating countries, demonstrated a lower average percent correct on items presented in the CBT format than on those in the paper-and-pencil format, in both mathematics and science. Substantial mode effects were observed, particularly in mathematics.
2. No widening digital divide: After applying scaling procedures to ensure the comparability of scale scores between the two testing formats, no evidence of a digital divide was found based on SES, school location, or gender in either mathematics or science. Students from different backgrounds performed similarly across both formats.
3. Performance disadvantage on PSI tasks: On the computerized PSI tasks, which captured test-takers’ maneuvering behaviors in greater detail, students from higher SES backgrounds or from urban areas demonstrated a performance disadvantage. It remains unclear whether this disadvantage reflects weaker PSI skills among Taiwanese students in general or the potential impact of the interactive nature of CBT items on specific student groups. Future research is needed to explore these factors.
In the digital age, the transition to CBT is inevitable as technology continues to transform education and assessment practices. The integration of digital tools into learning environments makes the adoption of CBT essential for meeting modern educational needs while offering more interactive and efficient assessment methods. The urgency of digitalization in education became even more apparent during the COVID-19 pandemic, highlighting the need for accessible and equitable digital learning solutions. While CBT offers benefits, concerns related to educational equity, including mode effects and the digital divide, remain. By analyzing the impact of CBT in TIMSS 2019, educators and policymakers can develop evidence-based strategies to ensure fair, reliable, and valid assessments for all students. It is reassuring that, once precise scaling procedures had removed the initially evident mode effects, no widening digital divide was identified. In conclusion, the transition to digital assessments in TIMSS 2019 did not compromise educational equity for Taiwanese eighth-grade students.
Keywords:
international large-scale survey, computer-based testing, digital divide, mode effect
References:
I. Chinese References
任宗浩、譚克平、張立民(2011)。二階段分層叢集抽樣的設計效應估計:以TIMSS 2007調查研究為例。教育科學研究期刊,56(1),33-65。https://doi.org/10.3966/2073753X2011035601002
【Jen, T.-H., Tam, H.-P., & Wu, M. (2011). An estimation of the design effect for the two-stage stratified cluster sampling design. Journal of Research in Education Sciences, 56(1), 33-65. https://doi.org/10.3966/2073753X2011035601002】
胡翠君、張存真(2023)。學校環境、教師背景與學生表現對偏鄉教師自我效能的影響。教育科學研究期刊,68(3),179-208。https://doi.org/10.6209/jories.202309_68(3).0006
【Hu, T.-C., & Chang, T.-J. (2023). The influence of school environment, teacher background and student performance on the self-efficacy of teachers in rural schools. Journal of Research in Education Sciences, 68(3), 179-208. https://doi.org/10.6209/jories.202309_68(3).0006】
教育部(2008)。教育部中小學資訊教育白皮書2008-2011。
【Ministry of Education. (2008). Ministry of Education white paper on information education for primary and secondary schools 2008-2011.】
教育部(2018)。十二年國民基本教育課程綱要(國民中小學暨普通型高級中等學校)-自然科學領域。https://ghresource.k12ea.gov.tw/uploads/1613715832381Ld8uk4KU.pdf
【Ministry of Education. (2018). Curriculum guidelines of 12-Year Basic Education for elementary school, junior high and general senior high schools: The domain of natural science. https://cirn.moe.edu.tw/Upload/file/38227/104346.pdf】
張俊彥(主編)(2021)。TIMSS 2019國際數學與科學教育成就趨勢調查國家報告。國立臺灣師範大學科學教育中心。
【Chang, C.-Y. (Ed.). (2021). Taiwan national report of TIMSS 2019. Science Education Center of National Taiwan Normal University.】
陳冠銘(2017)。TASA資料庫的二次分析。載於蕭儒棠、曾建銘、謝佩蓉、黃馨瑩、吳慧珉、陳冠銘、蔡明學、謝名娟、謝進昌(主編),大型教育調查研究實務-以TASA為例(頁129-168)。國家教育研究院。
【Chen, K.-M. (2017). The secondary analysis on TASA dataset. In J.-T. Hsiao, C.-M. Cheng, P.-J. Hsieh, H.-Y. Huang, H.-M. Wu, K.-M. Chen, M.-H. Tsai, M.-C. Hsieh, & J.-C. Hsieh (Eds.), Using large-scale assessment datasets for research: Taiwan Assessment of Student Achievement (TASA) (pp. 129-168). National Academy for Educational Research.】
郭晏輔(2022)。探討學校層級的社經文化背景對學習成就之影響及其制度性成因。臺灣教育社會學研究,22(2),1-46。https://doi.org/10.53106/168020042022122202001
【Kuo, Y.-F. (2022). The discussion for the influence from school-level SES on learning achievement and the institutional causes. Taiwan Journal of Sociology of Education, 22(2), 1-46. https://doi.org/10.53106/168020042022122202001】
許添明、商雅雯、陳冠銘(2018)。兼顧公平與卓越的資源分配-投資臺灣弱勢者教育。載於財團法人黃昆輝教授教育基金會(主編),繁榮與進步:教育的力量(頁271-304)。財團法人黃昆輝教授教育基金會。
【Sheu, T.-M., Shang, Y.-W., & Chen, K.-M. (2018). Between equity and excellence: Resource allocation in Taiwan’s education. In The Professor Huang Kun-huei Education Foundation (Ed.), Prosperity and progress: Power of education (pp. 271-304). The Professor Huang Kun-huei Education Foundation.】
謝卓君、閔詩紜(2022)。臺灣閱讀教育治理之城鄉差異探究。教育科學研究期刊,67(4),73-104。https://doi.org/10.6209/jories.202212_67(4).0003
【Hsieh, C.-C., & Ming, S.-Y. (2022). Urban-rural differences in governance of reading education in Taiwan. Journal of Research in Education Sciences, 67(4), 73-104. https://doi.org/10.6209/jories.202212_67(4).0003】
II. English References
Ben Youssef, A., Dahmani, M., & Ragni, L. (2022). ICT use, digital skills and students’ academic performance: Exploring the digital divide. Information, 13(3), 129. https://doi.org/10.3390/info13030129
Bennett, R. E., Braswell, J., Oranje, A., Sandene, B., Kaplan, B., & Yan, F. (2008). Does it matter if I take my mathematics test on computer? A second empirical study of mode effects in NAEP. Journal of Technology, Learning, and Assessment, 6(9), 1-39.
Breithaupt, K., & Hare, D. R. (2007). Automated simultaneous assembly of multistage testlets for a high-stakes licensing examination. Educational and Psychological Measurement, 67(1), 5-20. https://doi.org/10.1177/0013164406288162
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Routledge. https://doi.org/10.4324/9780203771587
Cooper, J. (2006). The digital divide: The special case of gender. Journal of Computer Assisted Learning, 22(5), 320-334. https://doi.org/10.1111/j.1365-2729.2006.00185.x
Crisp, G., Guàrdia, L., & Hillier, M. (2016). Using e-assessment to enhance student learning and evidence learning outcomes. International Journal of Educational Technology in Higher Education, 13(1), 18. https://doi.org/10.1186/s41239-016-0020-3
Facer, K. (2012). Taking the 21st century seriously: Young people, education and socio-technical futures. Oxford Review of Education, 38(1), 97-113. http://www.jstor.org/stable/23119474
Facer, K., & Sandford, R. (2010). The next 25 years?: Future scenarios and future directions for education and technology. Journal of Computer Assisted Learning, 26(1), 74-93. https://doi.org/10.1111/j.1365-2729.2009.00337.x
Fishbein, B., Foy, P., & Yin, L. (2021). TIMSS 2019 user guide for the international database (2nd ed.). TIMSS & PIRLS International Study Center, Lynch School of Education and Human Development, Boston College and International Association for the Evaluation of Educational Achievement.
Fishbein, B., Martin, M. O., Mullis, I. V. S., & Foy, P. (2018). The TIMSS 2019 item equivalence study: Examining mode effects for computer-based assessment and implications for measuring trends. Large-scale Assessments in Education, 6(1), 11. https://doi.org/10.1186/s40536-018-0064-z
Fisher, R. A. (1921). On the probable error of a coefficient of correlation deduced from a small sample. Metron, 1, 3-32.
Foy, P., Fishbein, B., von Davier, M., & Yin, L. (2020). Implementing the TIMSS 2019 scaling methodology. In M. O. Martin, M. von Davier, & I. V. S. Mullis (Eds.), Methods and procedures: TIMSS 2019 technical report (pp. 12.1-12.146). TIMSS & PIRLS International Study Center, Lynch School of Education and Human Development, Boston College and International Association for the Evaluation of Educational Achievement.
Foy, P., Galia, J., & Li, I. (2008). Scaling the data from the TIMSS 2007 mathematics and science assessments. In J. F. Olson, M. O. Martin, & I. V. S. Mullis (Eds.), TIMSS 2007 technical report (pp. 225-279). TIMSS & PIRLS International Study Center, Lynch School of Education, Boston College.
Foy, P., & LaRoche, S. (2020). Estimating standard errors in the TIMSS 2019 results. In M. O. Martin, M. von Davier, & I. V. S. Mullis (Eds.), Methods and procedures: TIMSS 2019 technical report (pp. 14.1-14.60). TIMSS & PIRLS International Study Center, Lynch School of Education and Human Development, Boston College and International Association for the Evaluation of Educational Achievement.
Foy, P., & Yin, L. (2016). Scaling the TIMSS 2015 achievement data. In M. O. Martin, I. V. S. Mullis, & M. Hooper (Eds.), Methods and procedures in TIMSS 2015 (pp. 13.1-13.62). TIMSS & PIRLS International Study Center, Lynch School of Education, Boston College and International Association for the Evaluation of Educational Achievement.
Gallagher, A., Bridgeman, B., & Cahalan, C. (2002). The effect of computer-based tests on racial-ethnic and gender groups. Journal of Educational Measurement, 39(2), 133-147. https://doi.org/10.1111/j.1745-3984.2002.tb01139.x
Hart, B., & Risley, T. R. (1995). Meaningful differences in the everyday experience of young American children. Paul H. Brookes.
Hattie, J. (2008). Visible learning: A synthesis of over 800 meta-analyses relating to achievement. Routledge. https://doi.org/10.4324/9780203887332
Hattie, J. (2023). Visible learning: The sequel. A synthesis of over 2,100 meta-analyses relating to achievement. Routledge. https://doi.org/10.4324/9781003380542
Hedges, L. V. (2007). Correcting a significance test for clustering. Journal of Educational and Behavioral Statistics, 32(2), 151-179. https://doi.org/10.3102/1076998606298040
Herrington, J., & Herrington, A. (1998). Authentic assessment and multimedia: How university students respond to a model of authentic assessment. Higher Education Research & Development, 17(3), 305-322. https://doi.org/10.1080/0729436980170304
Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6(2), 65-70.
Horkay, N., Bennett, R. E., Allen, N., Kaplan, B. A., & Yan, F. (2006). Does it matter if I take my writing test on computer? An empirical study of mode effects in NAEP. The Journal of Technology, Learning and Assessment, 5(2). https://ejournals.bc.edu/index.php/jtla/article/view/1641/1488
International Association for the Evaluation of Educational Achievement. (2022). Help manual for the IEA IDB Analyzer (Version 5.0). https://www.iea.nl/sites/default/files/2022-06/IDB-Analyzer-Manual-%28Version-5-0%29.pdf
IMD World Competitiveness Center. (2024). IMD world digital competitiveness ranking 2023. https://www.imd.org/centers/wcc/world-competitiveness-center/rankings/world-digital-competitiveness-ranking/
Kish, L. (1965). Survey sampling. John Wiley & Sons.
Kish, L., & Frankel, M. R. (1974). Inference from complex samples. Journal of the Royal Statistical Society. Series B (Methodological), 36(1), 1-37.
Luecht, R. M., & Nungester, R. J. (1998). Some practical examples of computer-adaptive sequential testing. Journal of Educational Measurement, 35(3), 229-249.
Lumley, T. (2004). Analysis of complex survey samples. Journal of Statistical Software, 9(8), 1-19. https://doi.org/10.18637/jss.v009.i08
Lumley, T. (2021). R package “survey”: Analysis of complex survey samples (Version 4.1-1) [Computer software]. https://CRAN.R-project.org/package=survey
Martin, M. O., von Davier, M., & Mullis, I. V. S. (Eds.). (2020). Methods and procedures: TIMSS 2019 technical report. TIMSS & PIRLS International Study Center, Lynch School of Education and Human Development, Boston College and International Association for the Evaluation of Educational Achievement.
Mislevy, R. J. (1991). Randomization-based inference about latent variables from complex samples. Psychometrika, 56(2), 177-196. https://doi.org/10.1007/BF02294457
Mullis, I. V. S., Martin, M. O., Fishbein, B., Foy, P., & Moncaleano, S. (2021). Findings from the TIMSS 2019 problem solving and inquiry tasks. TIMSS & PIRLS International Study Center, Lynch School of Education and Human Development, Boston College and International Association for the Evaluation of Educational Achievement.
Mullis, I. V. S., Martin, M. O., Foy, P., & Hooper, M. (2017a). ePIRLS 2016 international results in online informational reading. TIMSS & PIRLS International Study Center, Lynch School of Education, Boston College and International Association for the Evaluation of Educational Achievement.
Mullis, I. V. S., Martin, M. O., Foy, P., & Hooper, M. (2017b). PIRLS 2016 international results in reading. TIMSS & PIRLS International Study Center, Lynch School of Education, Boston College and International Association for the Evaluation of Educational Achievement.
Mullis, I. V. S., Martin, M. O., Foy, P., Kelly, D. L., & Fishbein, B. (2020). TIMSS 2019 international results in mathematics and science. TIMSS & PIRLS International Study Center, Lynch School of Education and Human Development, Boston College and International Association for the Evaluation of Educational Achievement.
O’Neill, R., & Wetherill, G. B. (1971). The present state of multiple comparison methods. Journal of the Royal Statistical Society. Series B (Methodological), 33(2), 218-250.
Organisation for Economic Co-operation and Development. (2010). PISA computer-based assessment of student skills in science. OECD Publishing. https://doi.org/10.1787/9789264082038-en
Organisation for Economic Co-operation and Development. (2017). PISA 2015 technical report. OECD Publishing.
Rawlins, P. (2022). E-Assessment. In M. A. Peters (Ed.), Encyclopedia of teacher education (pp. 561-566). Springer Nature Singapore. https://doi.org/10.1007/978-981-16-8679-5_110
Schenker, N., & Gentleman, J. F. (2001). On judging the significance of differences by examining the overlap between confidence intervals. The American Statistician, 55(3), 182-186. https://doi.org/10.1198/000313001317097960
Sirin, S. R. (2005). Socioeconomic status and academic achievement: A meta-analytic review of research. Review of Educational Research, 75(3), 417-453. https://doi.org/10.3102/00346543075003417
Timmis, S., Broadfoot, P., Sutherland, R., & Oldfield, A. (2016). Rethinking assessment in a digital age: Opportunities, challenges and risks. British Educational Research Journal, 42(3), 454-476. https://doi.org/10.1002/berj.3215
von Davier, M. (2020). TIMSS 2019 scaling methodology: Item response theory, population models, and linking across modes. In M. O. Martin, M. von Davier, & I. V. S. Mullis (Eds.), Methods and procedures: TIMSS 2019 technical report (pp. 11.1-11.25). TIMSS & PIRLS International Study Center, Lynch School of Education and Human Development, Boston College and International Association for the Evaluation of Educational Achievement.
von Davier, M., Foy, P., Martin, M. O., & Mullis, I. V. S. (2020). Examining eTIMSS country differences between eTIMSS data and bridge data: A look at country-level mode of administration effects. In M. O. Martin, M. von Davier, & I. V. S. Mullis (Eds.), Methods and procedures: TIMSS 2019 technical report (pp. 13.1-13.24). TIMSS & PIRLS International Study Center, Lynch School of Education, Boston College and International Association for the Evaluation of Educational Achievement.
Wainer, H., Bradlow, E. T., & Wang, X. (2007). Testlet response theory and its applications. Cambridge University Press. https://doi.org/10.1017/CBO9780511618765
Wainer, H., Kaplan, B., & Lewis, C. (1992). A comparison of the performance of simulated hierarchical and linear testlets. Journal of Educational Measurement, 29(3), 243-251.
Weiss, D. J., & Kingsbury, G. G. (1984). Application of computerized adaptive testing to educational problems. Journal of Educational Measurement, 21(4), 361-375. https://doi.org/10.1111/j.1745-3984.1984.tb01040.x
White, K. R. (1982). The relation between socioeconomic status and academic achievement. Psychological Bulletin, 91(3), 461-481. https://doi.org/10.1037/0033-2909.91.3.461
Yan, D., von Davier, A. A., & Lewis, C. (Eds.). (2014). Computerized multistage testing: Theory and applications. Chapman & Hall/CRC. https://doi.org/10.1201/b16858