題型對學生數學表現水準之影響─以相似形為例
作者:臺北市立大理高級中學陳建亨、國立臺灣師範大學數學系楊凱琳
卷期:66卷第3期
日期:2021年9月
頁碼:247-277
DOI:https://doi.org/10.6209/JORIES.202109_66(3).0008
摘要:
本研究的目的為探討不同題型對試題難度、試題鑑別度及九年級學生數學表現水準的影響,研究者以相似形的能力指標作為測驗內容,設計相同的題目敘述之選擇題、填空題及計算說理題,並編製成含有六題共同題的三種試卷,以進行測驗等化之用。每份試卷分別有361、411及378位的受試者,合計1,150位的受試者。研究設計採用共同題不等組設計,並以同時校準法(concurrent calibration method)進行測驗等化。本研究分別用單向度二參數部分計分模式與多向度潛在迴歸模式(multidimensional latent regression model),進行試題難度、鑑別度及學生在不同題型下所表現之能力參數的估計,並將學生在各題型中的能力估計值排序換算為百分等級。研究結果發現:一、選擇題的平均難度最低,計算說理題的平均難度最高;二、選擇題的平均鑑別度最低,計算說理題的平均鑑別度最高;三、單向度模式分析發現,有69.04%的學生在三種題型能力估計值的PR值(百分等級)相差20以上;四、多向度潛在迴歸模式分析發現,有12.61%的學生在三種題型能力估計值的PR值相差15以上。研究結果顯示,測驗題型與學生的數學表現水準有關,而且主要有兩種不同型態的影響。文末針對兩類學生在不同題型較具優勢的原因,做進一步的討論。
關鍵詞:相似形、測驗等化、試題反應理論、題目類型
參考文獻:
- 大學入學考試中心(2011)。指定科目考試數學考科考試說明(適用99課綱)。https://www.ceec.edu.tw/files/file_pool/1/0J052618615204795377/03-102指考數學考試說明_定稿_.pdf 【College Entrance Examination Center. (2011). Introduction of advanced subjects test mathematics (99 education curricula). https://www.ceec.edu.tw/files/file_pool/1/0J052618615204795377/03-102指考數學考試說明_定稿_.pdf】
- 王文中、呂金燮、吳毓瑩、張郁雯、張淑慧(1999)。教育測驗與評量─教室學習觀點。五南。 【Wang, W.-C., Lu, C.-H., Wu, Y.-Y., Chang, Y.-W., & Chang, S.-H. (1999). Educational testing and assessment: The point of view on class learning. Wu-Nan.】
- 余民寧(2002)。教育測驗與評量─成就測驗與教學評量。心理。 【Yu, M.-N. (2002). Educational testing and assessment: Achievement test and instructional evaluation. Psychological.】
- 余民寧(2009)。試題反應理論(IRT)及其應用。心理。 【Yu, M.-N. (2009). Item Response Theory and application. Psychological.】
- 吳宜靜(2005)。八二年版國一學生縮圖與放大圖繪製之概念與表現(未出版碩士論文)。國立臺南大學。 【Wu, Y.-C. (2005). The concept and performance on both reduced and enlarged graphs of seventh grade students [Unpublished master’s thesis]. National University of Tainan.】
- 胡詩菁、鍾靜(2015)。數學課室中應用建構反應題進行形成性評量之研究。臺灣數學教師,36(2),26-48。https://doi.org/10.6610/TJMT.20150806.01 【Hu, S.-C., & Chung, C. (2015). A research of formative assessment in mathematics classroom with constructed response items. Taiwan Journal of Mathematics Teachers, 36(2), 26-48. https://doi.org/10.6610/TJMT.20150806.01】
- 國立臺灣師範大學心理與教育測驗研究發展中心(2013a)。試題取材與命題原則。http://www.cap.rcpet.edu.tw/test_3.html 【Research Center for Psychological and Educational Testing, National Taiwan Normal University. (2013a). Themes and the principle of testing. http://www.cap.rcpet.edu.tw/test_3.html】
- 國立臺灣師範大學心理與教育測驗研究發展中心(2013b)。數學科(含非選擇題題型)考試內容。https://cap.rcpet.edu.tw/test_4_4.html 【Research Center for Psychological and Educational Testing, National Taiwan Normal University. (2013b). Mathematics examination content (including non-choice items). https://cap.rcpet.edu.tw/test_4_4.html】
- 康木村、柳賢(2004,12月)。國中學生「相似形」迷思概念之研究。中華民國第二十屆科學教育學術研討會,高雄縣,臺灣。 【Kang, M.-T., & Liu, H. (2004, December). Study of similar figure misconceptions in junior high school students. The 2004 International Conference of Science Education in Taiwan, Kaohsiung, Taiwan.】
- 教育部(2012)。97年國民中小學九年一貫課程綱要。https://www.k12ea.gov.tw/files/97_sid17/980424數學課程綱要修訂(單冊).pdf 【Ministry of Education. (2012). 2008 grade 1-9 curriculum guidelines. https://www.k12ea.gov.tw/files/97_sid17/980424數學課程綱要修訂(單冊).pdf】
- 郭生玉(2004)。教育測驗與評量。精華書局。 【Kuo, S.-Y. (2004). Educational testing and assessment. Jingwha.】
- 陳建亨、楊凱琳(2014,12月)。題型對學生解題表現的影響─以相似形內容為例。第30屆科學教育學術研討會,臺北市,臺灣。 【Chen, C.-H., & Yang, K.-L. (2014, December). Effect of item types on students’ performance in mathematical problem solving: Taking similar figures as an example. 2014 International Conference of Science Education, Taipei, Taiwan.】
- 陳映孜、何曉琪、劉昆夏、林煥祥、鄭英耀(2017)。從教師自編科學成就測驗之Rasch分析看教與學。教育科學研究期刊,62(3),1-23。https://doi.org/10.6209/JORIES.2017.62(3).01 【Chen, Y.-T., Ho, H.-C., Liu, K.-H., Lin, H.-S., & Cheng, Y.-Y. (2017). Glimpse into teaching and learning using Rasch analyses of a teacher-made science achievement test. Journal of Research in Education Sciences, 62(3), 1-23. https://doi.org/10.6209/JORIES.2017.62(3).01】
- 黃國展(2003)。國三學生解相似形問題之歷程分析研究(未出版碩士論文)。國立高雄師範大學。 【Huang, K.-C. (2003). Analysis and study of how grade 9 students process similar figures questions [Unpublished master’s thesis]. National Kaohsiung Normal University.】
- 趙子揚、黃嘉莉、宋曜廷、郭蕙寧、許明輝(2016)。教師情境判斷測驗之編製。教育科學研究期刊,61(2),85-117。https://doi.org/10.6209/JORIES.2016.61(2).04 【Chao, T.-Y., Huang, J.-L., Sung, Y.-T., Kuo, H.-N., & Shiu, M.-H. (2016). Construction of the teacher situational judgment test. Journal of Research in Education Sciences, 61(2), 85-117. https://doi.org/10.6209/JORIES.2016.61(2).04】
- 簡啟全(2011)。國中數學科「相似形」單元電腦化測驗與診斷模式研發(未出版碩士論文)。國立臺中教育大學。 【Chien, C.-C. (2011). The computerized diagnostic test and diagnostic mode research for similar figures of mathematics in junior high school [Unpublished master’s thesis]. National Taichung University of Education.】
- 藍珮君(2008,11月)。華語文能力測驗垂直等化研究。2008台灣華語文教學年會暨研討會,花蓮縣,臺灣。 【Lan, P.-J. (2008, November). Study of test vertical scaling in test of Chinese as a foreign language. 2008 International Annual Conference of Teaching Chinese as a Second Language, Hualien, Taiwan.】
- 藍珮君、陳柏熹(2014)。華語文閱讀測驗信度效度分析與垂直等化研究。華語文教學研究,11(1),99-125。 【Lan, P.-J., & Chen, P.-H. (2014). A reliability, validity and vertical equating study of the reading subtest of the test of Chinese as a foreign language. Journal of Chinese Language Teaching, 11(1), 99-125.】
- Ajideh, P., & Mozaffarzadeh, S. (2012). C-test vs. Multiple-choice cloze test as tests of reading comprehension in Iranian EFL context: Learners’ perspective. English Language Teaching, 5(11), 143-150. https://doi.org/10.5539/elt.v5n11p143
- Berg, C. A., & Smith, P. (1994). Assessing students’ abilities to construct and interpret line graphs: Disparities between multiple-choice and free-response instruments. Science Education, 78(6), 527-554. https://doi.org/10.1002/sce.3730780602
- Bridgeman, B. (1992). A comparison of quantitative questions in open-ended and multiple-choice formats. Journal of Educational Measurement, 29(3), 253-271. https://doi.org/10.2307/1435138
- Cox, D. C. (2013). Similarity in middle school mathematics: At the crossroads of geometry and number. Mathematical Thinking and Learning, 15(1), 3-23. https://doi.org/10.1080/10986065.2013.738377
- Eduwem, J. D., & Umoinyang, I. E. (2014). Item types and upper basic education students’ performance in mathematics in the Southern Senatorial District of Cross River State, Nigeria. Journal of Modern Education Review, 4(1), 57-73. https://doi.org/10.15341/jmer(2155-7993)/01.04.2014/008
- Evens, H., & Houssart, J. (2004). Categorizing pupils’ written answers to a mathematics test question: “I know but I can’t explain”. Educational Research, 46(3), 269-282. https://doi.org/10.1080/0013188042000277331
- Freedle, R., & Kostin, I. (1993). The prediction of TOEFL reading item difficulty: Implications for construct validity. Language Testing, 10(2), 133-170. https://doi.org/10.1177/026553229301000203
- Haladyna, T. M. (1997). Writing test items to evaluate higher order thinking. Allyn & Bacon.
- Hancock, G. R. (1994). Cognitive complexity and the comparability of multiple-choice and constructed-response test formats. The Journal of Experimental Education, 62(2), 143-157. https://doi.org/10.1080/00220973.1994.9943836
- Hollingworth, L., Beard, J. J., & Proctor, T. P. (2007). An investigation of item type in a standards-based assessment. Practical Assessment, Research & Evaluation, 12(18). https://doi.org/10.7275/6ggz-8837
- Koğar, E. Y., & Koğar, H. (2018). Examination of dimensionality and latent trait scores on mixed-format tests. PEOPLE: International Journal of Social Sciences, 4(1), 165-185. https://doi.org/10.20319/pijss.2018.41.165185
- Kolen, M. J., & Brennan, R. L. (2014). Test equating, scaling, and linking: Methods and practices (3rd ed.). Springer. https://doi.org/10.1007/978-1-4939-0317-7
- Martinez, M. E. (1991). A comparison of multiple-choice and constructed figural response items. Journal of Educational Measurement, 28(2), 131-145. https://doi.org/10.1111/j.1745-3984.1991.tb00349.x
- Mullis, I. V. S., & Martin, M. O. (Eds.). (2017). TIMSS 2019 assessment frameworks. http://timssandpirls.bc.edu/timss2019/frameworks/
- National Assessment Governing Board. (2002). Mathematics framework for the 2003 National Assessment of Educational Progress. https://files.eric.ed.gov/fulltext/ED470533.pdf
- Oosterhof, A. C., & Coats, P. K. (1984). Comparison of difficulties and reliabilities of quantitative word problems in completion and multiple-choice item formats. Applied Psychological Measurement, 8(3), 287-294. https://doi.org/10.1177/014662168400800305
- Rylander, J., LeBlanc, C., Lees, D., Schipperr, S., & Milne, D. (2018). Validating classroom assessments measuring learner knowledge of academic vocabulary. The Institute for Liberal Arts and Sciences Bulletin, Kyoto University, 1, 83-110. https://doi.org/10.14989/ILAS_1_83
- Tankersley, K. (2007). Tests that teach: Using standardized tests to improve instruction. Association for Supervision and Curriculum Development.
- Volodin, N. A., & Adams, R. J. (1995, April). Identifying and estimating a D-dimensional item response model. Paper presented at the International Objective Measurement Workshop, Berkeley, CA, USA.
- Wise, S. L., & Gao, L. (2017). A general approach to measuring test-taking effort on computer-based tests. Applied Measurement in Education, 30(4), 343-354. https://doi.org/10.1080/08957347.2017.1353992
- Wolf, D. F. (1993). A comparison of assessment tasks used to measure FL reading comprehension. The Modern Language Journal, 77(4), 473-489. https://doi.org/10.1111/j.1540-4781.1993.tb01995.x
- Wright, B. D., & Linacre, J. M. (1994). Reasonable mean-square fit values. Rasch Measurement Transactions, 8(3), 370.
Effects of Item Type on Student Mathematics Performance: Similar Figures as an Example
Authors: Chien-Heng Chen (Taipei Municipal DaLi High School), Kai-Lin Yang (Department of Mathematics, National Taiwan Normal University)
Vol. & No.: Vol. 66, No. 3
Date: September 2021
Pages: 247-277
DOI: https://doi.org/10.6209/JORIES.202109_66(3).0008
Abstract:
This study investigated the influence of item type on item difficulty, item discrimination, and ninth graders’ mathematics performance. Using the competence indicators for similar figures as the test content, the researchers designed multiple-choice, completion, and essay items that shared the same stems and compiled them into three test forms, each containing six common items for equating. The forms were administered to 361, 411, and 378 students, respectively, for a total of 1,150 examinees, and were equated using a common-item nonequivalent groups design with the concurrent calibration method.
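The equating design can be pictured as one stacked response matrix in which each group answers its own form plus the shared anchor items. The minimal sketch below is not the authors’ code: the number of unique items per form is a hypothetical choice, while the six common items and the group sizes come from the abstract.

```python
# Minimal sketch of a common-item nonequivalent groups design.
import numpy as np

rng = np.random.default_rng(0)
n_unique = 10                     # hypothetical unique items per form
n_common = 6                      # common (anchor) items on every form
group_sizes = [361, 411, 378]     # examinees per form (from the abstract)
n_items = 3 * n_unique + n_common

blocks = []
for g, n_g in enumerate(group_sizes):
    block = np.full((n_g, n_items), np.nan)   # NaN = item not administered
    # Simulated responses to this form's unique items ...
    block[:, g * n_unique:(g + 1) * n_unique] = rng.integers(0, 2, (n_g, n_unique))
    # ... and to the shared anchor items (last columns).
    block[:, -n_common:] = rng.integers(0, 2, (n_g, n_common))
    blocks.append(block)

# Concurrent calibration estimates all item and ability parameters from
# this one stacked matrix, so the anchor items link the three groups.
data = np.vstack(blocks)          # shape: (1150, n_items)
print(data.shape)
```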
Item difficulty, item discrimination, and the students’ ability parameters under each item type were estimated with a unidimensional two-parameter partial credit model and with a multidimensional latent regression model, and the ability estimates were converted to percentile ranks (PRs). The results were as follows: (1) The multiple-choice items had the lowest average difficulty, and the essay items had the highest. (2) The multiple-choice items had the lowest average discrimination, and the essay items had the highest. (3) Under the unidimensional model, the PRs of 69.04% of the students differed by 20 or more across the three item types. (4) Under the multidimensional latent regression model, the PRs of 12.61% of the students differed by 15 or more across the three item types.
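For reference, the unidimensional two-parameter partial credit model named above is commonly written in the following textbook form; the notation here is an assumption, not taken from the article.

```latex
% Two-parameter (generalized) partial credit model: probability that a
% student with ability theta scores in category k of item i, where a_i is
% the item discrimination, b_{iv} are the step difficulties, m_i is the
% top category, and the v = 0 term of each sum is defined to be 0.
P_{ik}(\theta) =
  \frac{\exp\Bigl(\sum_{v=0}^{k} a_i(\theta - b_{iv})\Bigr)}
       {\sum_{c=0}^{m_i} \exp\Bigl(\sum_{v=0}^{c} a_i(\theta - b_{iv})\Bigr)}
```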
The results indicate that item type is related to student mathematics performance through two main types of effect. The article concludes by discussing why the two groups of students held advantages under different item types: some performed better on the completion and essay items than on the multiple-choice items, whereas others performed better on the multiple-choice items than on the essay items.
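As a concrete illustration of the PR comparisons in results (3) and (4), the sketch below uses simulated abilities and an assumed PR formula, not the study’s data: it converts each item type’s ability estimates to PRs and computes each student’s largest cross-type gap.

```python
# Illustrative only: percentile-rank conversion and cross-type PR gaps.
import numpy as np
from scipy.stats import rankdata

rng = np.random.default_rng(1)
n_students = 1150
# Hypothetical correlated ability estimates for the three item types
# (multiple-choice, completion, essay), one column per type.
theta = rng.multivariate_normal(
    mean=[0, 0, 0],
    cov=[[1.0, 0.7, 0.7], [0.7, 1.0, 0.7], [0.7, 0.7, 1.0]],
    size=n_students,
)

# One common PR definition: 100 * rank / n, with ties averaged.
pr = np.empty_like(theta)
for j in range(3):                       # rank each item type separately
    pr[:, j] = 100.0 * rankdata(theta[:, j]) / n_students

gap = pr.max(axis=1) - pr.min(axis=1)    # largest pairwise PR difference
print(f"PR gap >= 20 for {(gap >= 20).mean():.2%} of students")
```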
Keywords: similar figures, test equating, item response theory, item type