Methodologies and Challenges in Forensic Linguistic Casework

Discover more about Forensic Linguistics, a fascinating cross-disciplinary field, from an international team of renowned contributors.

One advantage for the analysis was that the texts were precisely time-stamped. Emails as a genre naturally form an ordered series of texts, and this structure in the data can assist both in devising a method and in forming and testing hypotheses. For example, a working hypothesis that a different writer took over the account at some point in the series provides an analytic advantage over a situation in which an email account might have been hacked and used only occasionally by a second author, since a single takeover predicts one sustained shift in style rather than scattered anomalies.

In the Starbuck case, TG was able to confirm with the police investigator that the account-takeover hypothesis was indeed central, and so he could take this into account when designing the analysis. This was an advantage because it allowed the texts to be divided into two sets. The first set consisted of known emails sent from Debbie’s account before any takeover could have occurred; it included the emails up to the last time Debbie had been seen alive and well. The second set consisted of the emails sent after that point, when any takeover may have occurred. If a style shift were to be found, it would most likely lie within this second group, with its later emails differing stylistically from those in the known set. This is not to say that the emails were not considered individually; rather, each was also considered in terms of its position in the time series. This means, for example, that the weight of evidence for a style shift can be assessed cumulatively after any identified break in style.
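The two-set design can be sketched as a split of the time-ordered series at a cutoff. The `Email` record, the `split_by_cutoff` helper, and the dates below are hypothetical illustrations, not the case data or the analysts' tooling; the cutoff stands in for the last time the account holder was known to be alive and well.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Email:
    """Hypothetical record for one email: when it was sent and its text."""
    sent: datetime
    text: str


def split_by_cutoff(emails, cutoff):
    """Split an email series at `cutoff` (an assumed last-known-safe time).

    Returns (known, disputed): emails sent at or before the cutoff, and
    emails sent after it, each kept in chronological order so that the
    disputed set can still be read as a time series.
    """
    ordered = sorted(emails, key=lambda e: e.sent)
    known = [e for e in ordered if e.sent <= cutoff]
    disputed = [e for e in ordered if e.sent > cutoff]
    return known, disputed


# Made-up dates and texts, purely for illustration.
emails = [
    Email(datetime(2000, 1, 5), "first known email"),
    Email(datetime(2000, 3, 1), "later email"),
    Email(datetime(2000, 2, 1), "second known email"),
]
known, disputed = split_by_cutoff(emails, cutoff=datetime(2000, 2, 15))
```

Keeping both lists sorted matters because, as the text notes, evidence for a style shift is weighed cumulatively along the series rather than email by email in isolation.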

A further point in TG’s preliminary evaluation was that each email was relatively short. At the most basic level, the problem with short texts is that they give the analyst less material than longer texts from which to identify distinctive and consistent features. Generally, more evidence is simply better. Slightly more technically, a shorter text will contain fewer instances of any given feature, so any generalization from those instances to a pattern of use will be less reliable.

For example, imagine trying to estimate the bias of a weighted coin: if you flipped it only a few times, you would be unlikely to estimate the bias correctly, but if you flipped it a few hundred times, you might obtain a very good estimate. The same holds when measuring the relative frequency of a word (i.e., its occurrences as a percentage of the total words in the text). In a single short sentence, the word ‘the’ might occur once in five words, but we would not want to conclude from this one observation that it occurs once every five words across the entire text. Only after seeing a sufficient number of tokens (instances) of a word can we begin to make such estimates. Texts of fewer than 500 words are therefore generally regarded as too short for stylometric approaches to authorship analysis (although this threshold has recently been falling; see Grieve et al., 2019), and in a forensic context the entire data set is often smaller than this.
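The coin analogy can be made concrete with the textbook binomial standard error of a proportion, SE = sqrt(p(1 − p)/n): the uncertainty of an estimated relative frequency shrinks only with the square root of the number of tokens. This sketch treats tokens as independent draws, which real text is not (words tend to cluster), so if anything it is optimistic about short texts. The function names and the sample sentence are illustrative assumptions.

```python
import math


def relative_frequency(tokens, word):
    """Proportion of tokens equal to `word` (case-insensitive)."""
    if not tokens:
        raise ValueError("need at least one token")
    hits = sum(1 for t in tokens if t.lower() == word.lower())
    return hits / len(tokens)


def standard_error(p, n):
    """Binomial standard error of a proportion p estimated from n tokens."""
    return math.sqrt(p * (1 - p) / n)


short = "The cat sat down quietly".split()
p = relative_frequency(short, "the")  # 1 of 5 tokens, so 0.2

# The same estimate is far less certain from 5 tokens than from 500 or 5000.
for n in (5, 500, 5000):
    print(n, round(standard_error(p, n), 4))
# 5 0.1789
# 500 0.0179
# 5000 0.0057
```

With only five tokens, the plausible range for the true rate of ‘the’ spans most of 0 to 0.5, which is the intuition behind treating very short texts with caution.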

Finally, one last complication was that, although the data consisted of emails, the police gave us access only to screenshots of them. Because these were images rather than text, they could not be analyzed computationally as they stood. We therefore had to convert them into text using optical character recognition (OCR) software, a relatively time-consuming process that required thorough checking against the image files to ensure that even minor punctuation features had been digitized correctly.
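Checking OCR output against the images is manual work. One way to narrow that check, sketched here as an assumption rather than as the procedure used in this case, is to run two different OCR engines over the same screenshot and flag only the places where their transcriptions disagree on punctuation. The function name and sample strings are hypothetical; only Python's standard `difflib` and `string` modules are used.

```python
import difflib
import string

# ASCII punctuation plus common typographic marks (curly quotes, ellipsis,
# en and em dashes), which string.punctuation alone does not include.
PUNCT = set(string.punctuation) | set("\u2018\u2019\u201c\u201d\u2026\u2013\u2014")


def punctuation_disagreements(ocr_a, ocr_b):
    """Compare two transcriptions of the same image character by character.

    Returns (position_in_a, text_in_a, text_in_b) for every differing region
    in which either side contains punctuation, so a human can check exactly
    those spots against the original screenshot.
    """
    matcher = difflib.SequenceMatcher(None, ocr_a, ocr_b, autojunk=False)
    flagged = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        a_part, b_part = ocr_a[i1:i2], ocr_b[j1:j2]
        if any(ch in PUNCT for ch in a_part + b_part):
            flagged.append((i1, a_part, b_part))
    return flagged


# Hypothetical outputs from two OCR engines for the same line of an email.
flags = punctuation_disagreements("Sorry. I thought I'd replied.",
                                  "Sorry, I thought Id replied.")
```

Agreement between two engines does not prove a transcription is right, so this only prioritizes the check against the images; it does not replace it.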

The outcome of TG’s evaluation phase was the judgment that the data set as a whole was well suited to analysis. Cases like this one, with small, closed sets of candidate authors, sufficient data, and control of register, occur with some regularity, despite claims to the contrary sometimes made, particularly in the stylometry literature (e.g., Luyckx & Daelemans, 2011). Law enforcement agencies can often supply problems of this type, especially now that online language use leaves essentially permanent records. Researchers with relatively little forensic experience, however, appear to focus their efforts on increasingly challenging problems, and such complex research projects are less relevant to practical casework. Academic authorship studies of this kind are, of course, important, but many issues around the “easier” sorts of cases have not yet been resolved. By sharing actual investigative casework with researchers and the public, the forensic linguistic community can help build a picture of the real landscape of forensic problems.

ANALYSIS

As noted already, the purpose of separating the analysis into stages was to allow TG to pass the case data to JG in a controlled way. Specifically, in line with the protocol published in Grant (2012), and given the time-series nature of the data, TG began by providing JG with only the two sets of known writings, from Debbie and from Jamie Starbuck. TG had asked the police contact not to tell him, either, of any particular suspected breakpoint in the data series. Despite this request, the emails were supplied to TG as two files, one of known and one of disputed emails, which in effect revealed where the police placed the break. To resolve this, TG moved the last few emails from Debbie’s known set into the disputed set, creating a blind test set for JG’s analysis. The advantage of having a second party manage the primary analyst’s access to the data is that practical issues such as this can be taken out of the hands of the police, who may not fully understand why data need to be provided in particular ways to support the analysis.
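The blinding step can be sketched as moving the last few known items to the front of the disputed series. This is a minimal illustration assuming both lists are already in time order; `make_blind_sets`, the parameter `k`, and the placeholder labels are hypothetical. The function also returns the hidden boundary, which the data manager would keep and the primary analyst would not see.

```python
def make_blind_sets(known, disputed, k):
    """Return (known_for_analyst, blind_disputed, hidden_boundary).

    The last k known items (assumed chronological) are moved to the start
    of the disputed series, so the analyst cannot tell where the supplier's
    original known/disputed boundary lay. `hidden_boundary` is the index in
    `blind_disputed` at which the originally disputed items begin.
    """
    if not 0 <= k <= len(known):
        raise ValueError("k must be between 0 and len(known)")
    cut = len(known) - k
    blind_disputed = known[cut:] + disputed
    return known[:cut], blind_disputed, k


# Placeholder labels: k* for known emails, d* for disputed ones.
analyst_known, blind_disputed, boundary = make_blind_sets(
    ["k1", "k2", "k3", "k4"], ["d1", "d2"], k=2)
```

If the analyst later places a style break at or after `boundary` without having been told where it was, the moved known emails act as a built-in check on the method.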

JG analyzed the known writings, primarily by hand, to identify a set of linguistic features showing pairwise distinctiveness between the two possible authors, that is, features that one author used consistently and the other did not (Table 2.1). Most importantly, because JG had not yet seen the disputed material, this approach protected against confirmation bias towards any hypothesis about who had written it. This is especially important in careful stylistic analysis in forensic linguistics, which relies almost entirely on the analyst’s judgment, unlike quantitative stylometric approaches (see, e.g., Grieve, 2007), which generally use preselected feature sets (e.g., function word frequencies).
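JG’s feature identification was done primarily by hand, but the selection criterion itself can be stated explicitly. The sketch below assumes each known text has already been coded as counts of candidate features; `distinctive_features`, the feature names, the made-up counts, and the `min_share` threshold standing in for “used consistently” are all hypothetical choices, not the published protocol.

```python
def distinctive_features(texts_a, texts_b, name_a="A", name_b="B", min_share=0.5):
    """Select features that separate two authors' known texts.

    texts_a, texts_b: lists of dicts mapping feature name -> count in one text.
    A feature is kept if it occurs in at least `min_share` of one author's
    texts and in none of the other author's texts. Returns a dict mapping
    each kept feature to the name of the author who uses it.
    """
    if not texts_a or not texts_b:
        raise ValueError("each author needs at least one known text")

    def share(texts, feature):
        """Fraction of texts in which the feature occurs at least once."""
        return sum(1 for t in texts if t.get(feature, 0) > 0) / len(texts)

    result = {}
    for feature in sorted(set().union(*texts_a, *texts_b)):
        share_a, share_b = share(texts_a, feature), share(texts_b, feature)
        if share_a >= min_share and share_b == 0:
            result[feature] = name_a
        elif share_b >= min_share and share_a == 0:
            result[feature] = name_b
    return result


# Made-up feature counts for two short sets of known texts.
debbie = [{"run_on": 2}, {"run_on": 1, "awhile": 0}]
jamie = [{"emoticon": 1, "one_word_sentence": 1}, {"emoticon": 2}]
selected = distinctive_features(debbie, jamie, "Debbie", "Jamie")
```

Because only the known texts are passed in, the selection cannot be steered by what the disputed emails contain, which is the point of the staged design.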

Table 2.1 Linguistic Feature Examples

Feature	Debbie	Jamie
Sentence length	Long sentences (average of 24 words per sentence): “I’m now back in Oz, after 5 weeks In NZ—had a good time, though it felt so much more remote than here (guess it is!) and I really felt that, being there.”	Short sentences (average of 10 words per sentence): “I knew I’d forget something. 2 things in fact.”
One-word sentences	No tokens	Occasional use: “Sorry. I thought I’d replied.”
Run-on sentences	Relatively common: “Are you enjoying your new car, what is it?”	No tokens
Awhile	No tokens	3 tokens: “Shouldv’e done that awhile ago.”
Inserts	Relatively uncommon: “ha ha—you’re entirely responsible for how or where it goes”	Relatively common: “Umm….you haven’t actully apologised for anthing despite your insistence otherwise.”
Emoticon usage	No tokens	9 tokens: “Its gorgeous:) hope you enjoyed your holiday .)”
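Several contrasts in Table 2.1 rest on simple surface counts. The following is a naive sketch of how such counts might be computed, assuming that sentences end at `.`, `!` or `?` followed by whitespace and that emoticons take a few common ASCII forms. The function names and regular expressions are illustrative assumptions; real email text (abbreviations, ellipses, unusual spacing) would need more careful handling, and the counts support rather than replace close reading.

```python
import re

# Naive assumption: a sentence ends at ., ! or ? followed by whitespace.
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")
# Naive assumption: emoticons look like :) :( ;) :-) :D :P and similar.
EMOTICON = re.compile(r"[:;]-?[)(DP]")


def sentences(text):
    """Split text into sentences using the naive boundary rule above."""
    return [s for s in SENTENCE_END.split(text.strip()) if s]


def mean_sentence_length(text):
    """Average number of whitespace-separated words per sentence."""
    sents = sentences(text)
    if not sents:
        raise ValueError("text contains no sentences")
    return sum(len(s.split()) for s in sents) / len(sents)


def one_word_sentences(text):
    """Number of sentences consisting of a single word."""
    return sum(1 for s in sentences(text) if len(s.split()) == 1)


def count_emoticons(text):
    """Number of emoticon matches in the text."""
    return len(EMOTICON.findall(text))


jamie_example = "I knew I’d forget something. 2 things in fact."
avg = mean_sentence_length(jamie_example)  # (5 + 4) / 2 words
```

Per-text figures like these would then be averaged across each author’s known emails to give values comparable to the averages reported in the table.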

One basic distinction between the two approaches is that stylistic analysis generally derives a case-specific feature set from the data, whereas stylometric analysis tends to rely on predesigned feature sets. The strength of one approach can be the weakness of the other. Feature sets arising from stylistic analysis resist generic validation studies, but they lead to explanation-rich outcomes that are easier to explain to non-specialists such as police officers, lawyers, or juries. Stylometric features, in contrast, can be validated in independent testing, so that researchers can apply them consistently and analysts need rely less on their own judgment, but the abstract nature of these analyses can resist informative explanation.
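For contrast, a stylometric feature set is fixed before the case data are seen and is applied identically to every text. Below is a minimal sketch assuming a short, hypothetical list of function words and plain Manhattan distance between relative-frequency profiles. Established measures such as Burrows’s Delta first standardize frequencies as z-scores across a reference corpus, which this sketch does not do.

```python
import re

# Hypothetical, deliberately short list chosen in advance; real studies use
# dozens to hundreds of function words.
FUNCTION_WORDS = ("the", "and", "of", "to", "a", "i", "you", "it", "that", "in")
TOKEN = re.compile(r"[a-z’']+")


def profile(text, words=FUNCTION_WORDS):
    """Relative frequency of each preselected word among the text's tokens."""
    tokens = TOKEN.findall(text.lower())
    n = len(tokens) or 1  # avoid dividing by zero on empty text
    return [tokens.count(w) / n for w in words]


def manhattan(p, q):
    """Sum of absolute differences between two equal-length profiles."""
    return sum(abs(x - y) for x, y in zip(p, q))


p1 = profile("The cat and the dog")
p2 = profile("A dog of the town")
distance = manhattan(p1, p2)
```

The same function applies unchanged to any pair of texts, which is what makes independent validation possible; what a distance of a given size means for a particular case is the part that can resist explanation.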