Other listening tasks used in tests today are sentence repetition, which requires test takers to orally repeat what they hear, and the analogous written task of dictation, which requires them to write what they hear. As with constructed response items, computer technology has made the scoring of sentence repetition and dictation objective and practical. Translation tasks, which require test takers to translate what they hear in the target language into their first language, are also popular for assessing listening, especially when everyone who is assessed has the same first language.
Limitations of Listening Assessment Tasks
Validly assessing second language listening comprehension presents a number of challenges. The process of listening comprehension is not completely understood, and there are currently no methods which allow a view into the listener's brain to see what has been comprehended. Instead, the listener must indicate what has been understood. The medium of this indication, along with other factors, can diminish the validity of listening assessments.
The majority of listening tasks require test takers to select responses from given choices or to use speaking, reading, or writing skills to demonstrate comprehension of the input. For instance, most forms of multiple‐choice, true/false, matching, short‐answer and long‐answer items require test takers to read the questions and make a selection or provide a written response, while other tasks—such as sentence repetition—require oral responses. The need for learners to use other language skills when their listening is assessed can lead to scores that are not representative of their listening abilities in isolation, such as when watching a movie.
Strategies and other abilities not generally defined as part of a listening comprehension construct may also lead test takers to achieve scores that are not representative of their listening abilities. For instance, some learners may be able to eliminate wrong answer options or even select the correct answer by using test‐taking strategies, such as choosing the longest option to increase their chances of answering a multiple‐choice item correctly. Another problem with sentence repetition and dictation tasks is that people with well‐developed sound recognition skills may be able to repeat the sounds they hear or write letters that correspond to the words they hear without comprehending the information.
Scores on listening assessments are compromised in various ways depending on the tasks that test developers choose to use. Therefore, listening assessment developers and users should take into account the abilities of the test takers and limitations of the tasks used to best ensure that the test provides a valid indication of learners' listening abilities.
Computers in Assessing Listening
Developments in computer technology expand the potential for different types of listening input and ways of determining comprehension of input. For instance, technology allows acoustic signals to be easily accompanied by various types of input such as visual stimuli, which can make tasks more realistic.
Computers also increase the potential for using test items that require short constructed responses. Developers create model answers and identify the key words and phrases in them, and then these key words and phrases, along with acceptable synonyms, can be used to create a scoring algorithm. Responses to items that contain part or all of the targeted information can be given partial or full credit (Carr, 2014).
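To make the logic of such keyword‐based scoring concrete, the sketch below shows one minimal way an algorithm of this general kind could be implemented. The key terms, synonym sets, and example responses are invented for illustration; they are not drawn from Carr (2014) or from any operational test, and real systems are considerably more sophisticated.

```python
# Illustrative sketch of keyword-based scoring for a short constructed response item.
# All key terms, synonyms, and responses below are hypothetical examples.

# Each set lists acceptable forms (key word/phrase plus synonyms) of one key idea
# taken from a model answer.
KEY_TERM_SETS = [
    {"drought", "dry spell"},
    {"harvest", "crop", "yield"},
    {"decline", "decrease", "drop"},
]

def score_response(response: str, key_term_sets=KEY_TERM_SETS) -> float:
    """Return the proportion of key ideas matched, from 0.0 to 1.0.

    A key idea counts as matched if any of its acceptable forms appears in the
    response (naive substring matching); partial credit is the fraction matched.
    """
    text = response.lower()
    matched = sum(
        1 for terms in key_term_sets
        if any(term in text for term in terms)
    )
    return matched / len(key_term_sets)

if __name__ == "__main__":
    print(score_response("The dry spell caused crop yields to drop."))  # 1.0 -> full credit
    print(score_response("Farmers produced less food that year."))      # 0.0 -> no credit
```

The second example response conveys the targeted idea through paraphrase, yet it receives no credit because none of the listed terms appears in it; this is the kind of limitation of computer scoring discussed below.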
Computer technology may, however, have a negative effect on the assessment of listening. The tendency to use technology because it is available and attractive may lead to assessments that do not validly measure listening comprehension (Ockey, 2009). For instance, including attractively colored interfaces or interesting sounds, which are not part of the message to be comprehended, may be distracting to some test takers. Such distractions could lead to invalid scores for affected individuals. Moreover, computer scoring systems cannot make human judgments that may be important when assessing language (Condon, 2006). For instance, in a summary task scored by a computer, a test taker who paraphrases may be assigned a low score because the computer does not recognize that the test taker has comprehended the input; the scoring algorithm only assigns points for vocabulary (or common synonyms of the vocabulary) found in the input. It is therefore important that test developers who use computers to assess listening recognize the strengths and limitations of this technology.
Current and Future Directions in Assessing Listening
An increasing number of researchers support listening assessments which include as much contextual information in the input as possible (Gruba, 1999; Buck, 2001; Ockey & Wagner, 2018). Of particular interest to most of these researchers is that the acoustic signal be accompanied by associated visual stimuli, such as a video which shows the speaker.
Studies which have investigated people's use of visual information when taking a listening test have been revealing. The results of these studies suggest that individuals make use of visual information to varying degrees. Ockey (2007) found that with context‐only video, which only establishes the context of the discourse (e.g., shows a woman giving a lecture in a university setting), test takers engaged to vastly different extents, with some reporting a great deal of use of the visuals, others reporting little or no use, and still others indicating that the visuals were distracting them from “listening.” Rhetorical signaling cues (movements or expressions of the speaker) have also been shown to be important in listening comprehension (Dunkel & Davis, 1994). Jung (2006) found that second language learners misinterpreted texts more commonly when rhetorical signaling cues were excluded from the input. Wagner (2010) found that test takers achieved higher scores on video than on audio‐only tests when actors used gestures to help with explanations. Ockey (2007) found that individual test takers report using various types of rhetorical signaling cues, including lip movements, facial gestures, and hand movements.
Research which has compared test takers' scores on audio‐only assessments with their scores on tests that include visual stimuli as well as the audio input has produced mixed results. Some studies have found that including visuals leads to increased scores (Shin, 1998; Sueyoshi & Hardison, 2005; Wagner, 2010), while others have failed to find a difference in scores for the two conditions (Gruba, 1989; Coniam, 2000). A recent study by Batty (2018) may help to explain these contradictory findings. He found that particular question types are impacted in different ways by including visuals. His research indicated that implicit items were made much easier by visuals, while explicit items were less affected by including visual stimuli. It may be that studies which had mostly explicit items failed to find a difference between the audio‐only condition and the condition in which audio was accompanied by visual information.
Eye‐tracking research has also provided increased understanding of the processes test takers engage in while attempting to comprehend listening input. Using eye‐tracking technology, Suvorov (2015) considered dwell time (how long eye gaze fixates on a particular visual stimulus) and found that test takers paid more attention to content videos than to context videos. Also using dwell time with eye‐tracking technology, Batty (2016) found that test takers spent over 80% of their time observing facial cues when watching videos.
Researchers increasingly argue that the aim of assessing listening should not necessarily be to attempt to isolate comprehension from other language abilities. These researchers contend that listening is commonly interactive, meaning most listening includes opportunities to ask for clarification and that listeners are typically expected to respond after listening (Douglas, 1997; Ockey & Wagner, 2018). Other research indicates that listening and speaking cannot be separated in interactive discussions among two or more individuals (Ducasse & Brown, 2009; Galaczi, 2014). As a result of these conceptualizations of listening and research findings, test developers have begun to create listen–speak tasks (and other integrated listening items), which require both listening and speaking. They contend that it may not be appropriate or even possible to measure listening as distinct from oral production in an interactive communication context. Such an approach limits concerns about measuring more than “listening” with listening assessments.