Journal directory listing - Volume 68 (2023) - Journal of Research in Education Sciences【68(2)】June

Assessing the Validity of Standard-Setting for an English Language Assessment With a Hybrid Expert and Empirical Performance Model

Author: Jin-Chang Hsieh (Research Center for Testing and Assessment, National Academy for Educational Research)

Vol. & No.: Vol. 68, No. 2
Date: June 2023
Pages: 1-35
DOI: https://doi.org/10.6209/JORIES.202306_68(2).0001

Abstract:
Background and Purpose
The Taiwan Assessment of Student Achievement: Longitudinal Study (TASAL) was implemented to evaluate the effect of the new 12-year basic education curriculum on student performance in Taiwan. TASAL is a standards-based, large-scale assessment that aims to track the literacy growth of Taiwanese students, explore relevant factors, and collect empirical evidence to assist in the development of future curriculum guidelines. This study assessed the validity of standard-setting with a hybrid model that combines expert judgment with students’ empirical performance.
The hybrid model has three features: it is multidimensional, multisource, and cumulative over the long term. The multidimensional feature provides evidence for procedural, internal, and external validity and for setting appropriate standards (Kane, 1994, 2001; Pant et al., 2009). The multisource feature indicates that the evidence of validity is derived from various sources, such as expert opinions and students’ empirical performance. Finally, the long-term cumulative feature represents the process of accumulating evidence over a long period. Presenting every type of evidence in a single study is difficult because of time and resource constraints, and the burden placed on researchers and students must also be considered.
Method
1. Sampling
TASAL formally began evaluating seventh-grade students in 2019. In 2020, the same cohort, then in the eighth grade, was evaluated again. The sampling method was stratified two-stage cluster sampling. Initially, 256 junior high schools were selected to take part in the evaluation; ultimately, 246 schools with a total of 2,793 students were enrolled in the project. For the TASAL English test, 2,793 seventh-grade students took the test in 2019 and 2,893 eighth-grade students took it in 2020. Among the eighth-grade students, 2,554 took the English test in both years.
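As an illustration of what stratified two-stage cluster sampling involves, the following is a minimal Python sketch. The strata names, school and student identifiers, sample sizes, and the function `two_stage_cluster_sample` are all hypothetical, and the sketch uses simple random sampling at both stages; the abstract does not state the strata or the selection probabilities TASAL actually used.

```python
import random


def two_stage_cluster_sample(strata, schools_per_stratum, students_per_school, seed=0):
    """Draw a stratified two-stage cluster sample (illustrative only).

    strata: dict mapping stratum name -> dict mapping school id -> list of student ids.
    Stage 1 samples schools (clusters) within each stratum.
    Stage 2 samples students within each selected school.
    """
    rng = random.Random(seed)
    sample = []
    for stratum, schools in sorted(strata.items()):
        # Stage 1: simple random sample of schools within the stratum.
        n_schools = min(schools_per_stratum, len(schools))
        chosen_schools = rng.sample(sorted(schools), n_schools)
        for school in chosen_schools:
            students = schools[school]
            # Stage 2: simple random sample of students within the school.
            n_students = min(students_per_school, len(students))
            for student in rng.sample(students, n_students):
                sample.append((stratum, school, student))
    return sample


# Hypothetical sampling frame: two strata with a few schools each.
toy_frame = {
    "urban": {f"U{i}": [f"U{i}-s{j}" for j in range(30)] for i in range(5)},
    "rural": {f"R{i}": [f"R{i}-s{j}" for j in range(20)] for i in range(4)},
}
toy_sample = two_stage_cluster_sample(toy_frame, schools_per_stratum=2, students_per_school=10)
print(len(toy_sample))  # 2 strata x 2 schools x 10 students = 40
```

In practice, large-scale assessments often select schools with probability proportional to size and apply sampling weights; neither is modeled here.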
2. Materials
The TASAL English core competence assessment was developed through a standardized procedure, including purpose clarification, theory construction, assessment guidelines, performance level descriptor development, test item designation, test assembly, and data analysis. The TASAL English core competence assessment examines English reading comprehension according to the corresponding content in the 12-year basic education curriculum. Based on the concept of transforming verb-noun usage into cognitive processes and content knowledge, as proposed by Anderson et al. (2001), a separate set of assessment criteria and test items has been developed for the TASAL English core competence assessment to evaluate reading comprehension.
In the TASAL English core competence assessment, six levels of performance descriptors were initially proposed (Hsieh, 2023). However, no corresponding test items were available for the sixth (highest) level of the assessment, because the standard-setting process still focused on the seventh-grade test items. Therefore, this study focused on the first five levels, which included acquiring linguistic fluency, locating explicitly stated information, literal comprehension, implicit comprehension, and evaluation and reflection beyond text comprehension. Following a review of the literature, the text types used in the TASAL English core competence assessment are based on the OECD (2019) text types, modified to include descriptive, introductive, transactional, expository, commentary, persuasive, narrative, and literary texts. The assessment for seventh-grade students contained 182 test items, and the assessment for eighth-grade students contained 196 test items; 84 common items were included in both assessments. The response consistency was good: the expected a posteriori (EAP) reliability estimates were 0.85 and 0.91 for the seventh-grade and eighth-grade assessments, respectively.
3. Standard-setting
This study employed the extended Angoff method (Hambleton & Plake, 1995) to establish assessment standards. A total of 15 experts from various regions in Taiwan were trained and participated in the standard-setting meeting. Among these experts, 10 were women and 5 were men, with an average teaching experience of 18.25 years.
The standard-setting meeting was conducted in three rounds, and student ability and cutoff scores were estimated by weighted likelihood estimation (Warm, 1989). Statistical analyses were performed in R (R Core Team, 2022) with the TAM package (Robitzsch et al., 2020).
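To make the aggregation step of an Angoff-type procedure concrete, here is a minimal Python sketch. It assumes that each panelist rates, for every item, the expected score of a borderline student at a given level, that a panelist's cutoff is the sum of those ratings, and that the panel cutoff is the mean across panelists. The ratings and the function name `angoff_cut_score` are hypothetical. The sketch works on the raw-score scale only and does not reproduce the weighted likelihood estimation or the round-by-round feedback used in the study.

```python
from statistics import mean


def angoff_cut_score(ratings):
    """Aggregate Angoff-type ratings into a panel cutoff score.

    ratings: list of lists, one inner list per panelist, holding that
    panelist's expected item scores for a borderline student.
    Returns (panel_cut, panelist_cuts).
    """
    # Each panelist's cutoff is the sum of his or her item ratings.
    panelist_cuts = [sum(item_ratings) for item_ratings in ratings]
    # The panel cutoff is the mean of the panelists' cutoffs.
    return mean(panelist_cuts), panelist_cuts


# Hypothetical ratings from three panelists on four items.
toy_ratings = [
    [0.5, 0.5, 1.0, 1.0],
    [0.5, 1.0, 1.0, 1.0],
    [0.5, 0.5, 0.5, 1.0],
]
panel_cut, panelist_cuts = angoff_cut_score(toy_ratings)
print(panelist_cuts)  # [3.0, 3.5, 2.5]
print(panel_cut)      # 3.0
```

Running the same function on each round's ratings would show how the panelists' cutoffs converge as feedback is provided.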
Results and Conclusion
Feedback was collected using a questionnaire on standard-setting. Most of the experts rated the process and outcome of the standard-setting meeting as above average or well above average. The experts agreed or strongly agreed that the feedback provided and the performance level descriptor (PLD) procedures were helpful in establishing standards. In summary, this study provides satisfactory evidence for the procedural validity of standard-setting.
This study also provides evidence for the internal validity of standard-setting. In the first round, the standard errors of the cutoff scores across all experts and levels ranged from 2.03 to 11.58, and they decreased in subsequent rounds. Relative to the measurement error of 34.64, most standard errors fell within the acceptable ratio of 0.33, which is consistent with the results of Kaftandjieva (2010, p. 104).
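The arithmetic behind this criterion can be sketched in Python. The sketch reads the criterion as the ratio of a cutoff score's standard error to the measurement error being at most 0.33, which is an interpretation of the sentence above rather than a formula given in the abstract. Only the three figures reported above (2.03, 11.58, and 34.64) are taken from the source; the function names are hypothetical.

```python
MEASUREMENT_ERROR = 34.64  # measurement error reported in the abstract
ACCEPTABLE_RATIO = 0.33    # acceptable level reported in the abstract


def se_ratio(cut_score_se, measurement_error=MEASUREMENT_ERROR):
    """Ratio of a cutoff score's standard error to the measurement error."""
    return cut_score_se / measurement_error


def is_acceptable(cut_score_se, measurement_error=MEASUREMENT_ERROR,
                  threshold=ACCEPTABLE_RATIO):
    """True if the standard error is within the assumed ratio criterion."""
    return se_ratio(cut_score_se, measurement_error) <= threshold


# Smallest and largest first-round standard errors reported in the abstract.
for se in (2.03, 11.58):
    print(f"{se}: ratio = {se_ratio(se):.3f}, acceptable = {is_acceptable(se)}")
# 2.03: ratio = 0.059, acceptable = True
# 11.58: ratio = 0.334, acceptable = False
```

Under this reading, the largest first-round standard error sits just above the 0.33 ratio, which fits the statement that most, not all, standard errors were acceptable.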
With the English comprehension performance of eighth-grade students as the external criterion, the cutoff scores set from the seventh-grade assessment distinguished significantly between achievement levels. A partial η² of .506 was obtained, indicating a large effect size according to Cohen (1988). In conclusion, this study provides evidence for the external validity of standard-setting.
Several suggestions follow from the study results. For example, when changes in student performance are evaluated, regression toward the mean may be a crucial factor affecting the result of standard-setting during the vertical articulation of cutoff scores across grades. Additionally, continuously collecting evidence to support the validity of standard-setting is crucial in responding to educational policies and curriculum guidelines. Future research should therefore continue to accumulate validity evidence on an ongoing basis.
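For readers unfamiliar with the effect size, partial η² is the effect sum of squares divided by the sum of the effect and error sums of squares. The Python sketch below computes it for a one-way design, where it coincides with η². The one-way design, the toy scores, and the function name `partial_eta_squared` are assumptions for illustration; the abstract does not specify the model behind the reported .506. The threshold of .14 for a large effect is the benchmark commonly attributed to Cohen (1988).

```python
from statistics import mean


def partial_eta_squared(groups):
    """Partial eta squared for a one-way design.

    groups: list of lists of scores, one inner list per achievement level.
    Computes SS_effect / (SS_effect + SS_error).
    """
    all_scores = [score for group in groups for score in group]
    grand_mean = mean(all_scores)
    # Between-group (effect) sum of squares.
    ss_effect = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group (error) sum of squares.
    ss_error = sum((score - mean(g)) ** 2 for g in groups for score in g)
    return ss_effect / (ss_effect + ss_error)


# Hypothetical later scores for students placed in three achievement levels.
toy_groups = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
effect = partial_eta_squared(toy_groups)
print(round(effect, 3))  # 0.9
print(effect >= 0.14)    # True: large by the commonly cited benchmark
```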

Keywords: English comprehension, hybrid of expert and student empirical performance models, Taiwan Assessment of Student Achievement: Longitudinal study, standard-based large-scale assessment, standard setting


APA Format: Hsieh, J.-C. (2023). Assessing the validity of standard-setting for an English language assessment with a hybrid expert and empirical performance model. Journal of Research in Education Sciences, 68(2), 1-35. https://doi.org/10.6209/JORIES.202306_68(2).0001