Benefits of Integrated Assessment
Current interest in the profession in integrating skills in assessment stems from the apparent authenticity of such tasks. Particularly for specific purposes, such as assessing academic language, needs analyses of language use have shown that skills are used in tandem rather than in isolation (e.g., Leki & Carson, 1997). Including this integration in assessment therefore creates test tasks that appear authentic because they align with real language-use contexts. This connection between the test and the real world is intended to increase test users' confidence in the scores, raise test takers' motivation, and yield scores that are more predictive of future performance. Integrated assessments that provide test takers with content or ideas for their performances may also mitigate nonlanguage factors such as creativity, background knowledge, or prior education, or a combination of these (Read, 1990). Some research has reported that test takers prefer integrated tasks because they understand the task topic better than they do on single-skill tasks and can generate ideas from the sources given (Plakans, 2009). However, Huang and Hung (2013) found that actual performance and anxiety measures did not support test takers' perceptions that integrated tasks lower anxiety in comparison with independent speaking tasks.
Another advantage of this kind of assessment is its emphasis on the skills working together rather than viewing them as individual components of language ability. Integrated assessment may fit well with current language-teaching approaches, such as task-based language teaching (TBLT), which move away from teaching separate skills toward accomplishing tasks using language holistically. Such tests may also have positive washback, or impact, on classrooms that integrate skills, focus on content and language integrated learning (CLIL), or have goals for specific-purposes language use.
Challenges of Integrated Assessment
Although integrating skills in assessment offers clear benefits, a number of challenges remain, such as developing high-quality integrated tasks, rating learners' performance appropriately, and justifying the validity of score interpretations and uses.
Developing high-quality integrated prompts can be challenging because these tasks are often complex, involving multiple steps and texts. The development or selection of source texts requires decisions about the length and level of the text as well as about its content. For example, some tests include multiple texts that give different opinions on a topic, while others use one long text that describes a single phenomenon. When test developers aim to produce parallel items, these considerations about texts need to be taken into account. Carefully crafted instructions are another important consideration: test takers need a clear idea of what is expected and, as much as possible, of how to integrate the skills in their process and product. Studies have shown that test takers approach these tasks in a variety of ways, some using both skills to complete the tasks while others use only one skill and thus are not truly integrating. With more frequent use, test takers' confusion may decrease; however, those unfamiliar with this type of assessment may struggle to understand how to complete the task, which can affect their score regardless of their language ability.
Although several studies have found assessment of integrated skills tasks can lead to reliable rating (Ascención, 2005; Gebril, 2010), the issue of scoring these performance‐based tasks remains difficult. The rubric for integrated skills assessment needs to reflect skill integration in some way unless there is a clearly dominant skill that is of primary concern, such as with stimulus tasks or thematically linked tasks that do not require a content‐responsible response. Thus, a clear definition of the role of the integrated skills and what constitutes evidence for them in the performance is needed for meaningful scoring. The example below presents a detailed rubric checklist for assessing integrated reading and writing skills.
| Criterion | No evidence |  |  | Highly competent |
| --- | --- | --- | --- | --- |
| 1. Comprehends main idea in the text | 1 | 2 | 3 | 4 |
| 2. Distinguishes details and key ideas | 1 | 2 | 3 | 4 |
| 3. Paraphrases ideas from source text appropriately | 1 | 2 | 3 | 4 |
| 4. Selects ideas from the source text well | 1 | 2 | 3 | 4 |
| 5. Connects ideas from source text with own | 1 | 2 | 3 | 4 |
| 6. Clearly develops a thesis position | 1 | 2 | 3 | 4 |
| 7. Provides support for position | 1 | 2 | 3 | 4 |
| 8. Follows logical organization and cohesion | 1 | 2 | 3 | 4 |
| 9. Displays grammatical accuracy | 1 | 2 | 3 | 4 |
| 10. Uses clear specific vocabulary | 1 | 2 | 3 | 4 |
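To make the checklist concrete, the short Python sketch below is a purely illustrative, hypothetical rendering of how such a rubric might be represented and tallied. It stores the ten criteria as a data structure, sums one rater's 1–4 ratings into a total, and computes exact and adjacent agreement between two raters as a rough first check of rating consistency. The data structure, the invented ratings, and the agreement check are assumptions for illustration, not part of any operational test or of the studies cited above.

```python
# Hypothetical sketch: the integrated reading-writing rubric above as a data
# structure, with a total score for one rater and a simple two-rater agreement
# check. Criterion labels come from the checklist; all ratings are invented.

CRITERIA = [
    "Comprehends main idea in the text",
    "Distinguishes details and key ideas",
    "Paraphrases ideas from source text appropriately",
    "Selects ideas from the source text well",
    "Connects ideas from source text with own",
    "Clearly develops a thesis position",
    "Provides support for position",
    "Follows logical organization and cohesion",
    "Displays grammatical accuracy",
    "Uses clear specific vocabulary",
]

SCALE = (1, 4)  # 1 = "No evidence", 4 = "Highly competent"


def total_score(ratings: dict[str, int]) -> int:
    """Sum the 1-4 ratings across all ten criteria (possible range 10-40)."""
    assert set(ratings) == set(CRITERIA), "every criterion needs a rating"
    assert all(SCALE[0] <= r <= SCALE[1] for r in ratings.values())
    return sum(ratings.values())


def agreement(rater_a: dict[str, int], rater_b: dict[str, int]) -> tuple[float, float]:
    """Proportion of criteria on which two raters agree exactly,
    and within one scale point (adjacent agreement)."""
    exact = sum(rater_a[c] == rater_b[c] for c in CRITERIA) / len(CRITERIA)
    adjacent = sum(abs(rater_a[c] - rater_b[c]) <= 1 for c in CRITERIA) / len(CRITERIA)
    return exact, adjacent


if __name__ == "__main__":
    # Invented ratings for one test taker's integrated reading-writing response.
    rater_a = dict(zip(CRITERIA, [3, 3, 2, 3, 2, 4, 3, 3, 3, 2]))
    rater_b = dict(zip(CRITERIA, [3, 2, 2, 3, 3, 4, 3, 3, 2, 2]))

    print("Rater A total:", total_score(rater_a))  # 28
    print("Rater B total:", total_score(rater_b))  # 27
    exact, adjacent = agreement(rater_a, rater_b)
    print(f"Exact agreement: {exact:.0%}, adjacent agreement: {adjacent:.0%}")
```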
When test takers are required to draw on texts in their performance, they may copy strings of words verbatim from the sources or plagiarize (Cumming et al., 2005; Gebril & Plakans, 2013; Barkaoui, 2015), which makes the responses difficult to rate and complicates score interpretation. For some test takers, such borrowing may be a strategy necessitated by low reading comprehension, hesitancy in expressing themselves in writing, or a lack of experience with source-text integration (Yu, 2008; Wolfersberger, 2013). However, the skills required by some types of integrated reading–writing tasks include selecting ideas from texts to include in one's writing, choosing when to paraphrase or quote from the text, and providing appropriate citation. Test developers therefore need to consider how these skills figure in the construct that integrated tasks are intended to assess, and rating rubrics need to address how to score such writing; for example, the rubric above includes the descriptor, "Paraphrases ideas from source text appropriately," which could prompt raters to appraise direct source-text copying negatively.
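As a purely illustrative aside, and not a procedure drawn from the studies cited above, the degree of verbatim copying that raters must appraise can be approximated by checking how many word sequences in a response also appear word for word in the source text. The minimal Python sketch below, with invented example texts and an arbitrary five-word window, shows one way such an overlap ratio might be computed.

```python
import re


def ngrams(text: str, n: int = 5) -> set[tuple[str, ...]]:
    """Lowercased word n-grams of a text (n=5 is an arbitrary illustrative choice)."""
    words = re.findall(r"[a-z']+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def verbatim_overlap(response: str, source: str, n: int = 5) -> float:
    """Proportion of the response's n-grams that occur verbatim in the source text."""
    resp = ngrams(response, n)
    if not resp:
        return 0.0
    return len(resp & ngrams(source, n)) / len(resp)


if __name__ == "__main__":
    # Invented miniature source text and test-taker response.
    source = ("Urban green spaces reduce summer temperatures and give residents "
              "a place to exercise and to meet their neighbors.")
    response = ("The author argues that green spaces reduce summer temperatures and give "
                "residents a place to exercise, which I think is true of my own city.")
    ratio = verbatim_overlap(response, source)
    print(f"Share of 5-word sequences copied verbatim: {ratio:.0%}")
```

In practice, of course, raters judge borrowing qualitatively against descriptors such as the paraphrasing criterion above; a simple overlap ratio like this could at most flag responses for closer attention.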
The validity of test-score interpretation and use needs to be justified on the basis of evidence for a correspondence between test scores and the integrated ability that the test is intended to measure, as well as evidence for the utility of the test scores. Test developers and researchers need to consider how to elicit such evidence in order to conduct validation research. When language is divided into skill areas, defining the construct appears manageable; questions arise, however, about the construct underlying integrated assessment. Examining writing processes in a thematically linked integrated assessment, Esmaeili (2002) concluded that reading and writing could not be viewed as stand-alone constructs. In a study of non-native and native English speakers, Delaney (2008) found that reading-to-write tasks elicited processes attributable to a unique construct that was not merely a combination of reading ability and writing skill but also involved discourse synthesis. Plakans (2009) likewise found evidence of discourse synthesis in writers' composing processes for integrated writing assessment and concluded that this evidence supported interpreting such a construct from test scores. Using structural equation modeling (SEM) and qualitative methods, Yang and Plakans (2012) found that writers used complex interrelated strategies in reading–listening–writing tasks, further supporting the idea that processes related to discourse synthesis (selecting, connecting, and organizing) improved test performance. In a similar study focused on summarization tasks, Yang (2014) used SEM to provide evidence that the task required comprehension and construction strategies as well as planning, evaluating, source-use, and discourse synthesis strategies. While research into the validity of integrated assessment is building momentum, ongoing attention and research are needed to refine evolving construct definitions, to innovate in task types, and to develop approaches to scoring.