LibCat » Книги » Приключения » unrecognised » Sandrine Zufferey - Introduction to Corpus Linguistics

Sandrine Zufferey - Introduction to Corpus Linguistics

Здесь есть возможность читать онлайн «Sandrine Zufferey - Introduction to Corpus Linguistics» — ознакомительный отрывок электронной книги совершенно бесплатно, а после прочтения отрывка купить полную версию. В некоторых случаях можно слушать аудио, скачать через торрент в формате fb2 и присутствует краткое содержание. Жанр: unrecognised, на английском языке. Описание произведения, (предисловие) а так же отзывы посетителей доступны на портале библиотеки ЛибКат.

Читать книгу

Название:
Introduction to Corpus Linguistics
Автор:
Sandrine Zufferey
Жанр:
unrecognised / на английском языке
Год:
неизвестен
ISBN:
нет данных
Рейтинг книги:
4 / 5. Голосов: 1
Избранное:

Добавить в избранное
Отзывы:
Написать комментарий
Ваша оценка:
- 80
- 1
- 2
- 3
- 4
- 5

Introduction to Corpus Linguistics: краткое содержание, описание и аннотация

Предлагаем к чтению аннотацию, описание, краткое содержание или предисловие (зависит от того, что написал сам автор книги «Introduction to Corpus Linguistics»). Если вы не нашли необходимую информацию о книге — напишите в комментариях, мы постараемся отыскать её.

Over the past decades, the use of quantitative methods has become almost generalized in all domains of linguistics. However, using these methods requires a thorough understanding of the principles underlying them. Introduction to quantitative methods in linguistics aims at providing students with an up-to-date and accessible guide to both corpus linguistics and experimental linguistics. The objectives are to help students developing critical thinking about the way these methods are used in the literature and helping them to devise their own research projects using quantitative data analysis.

Introduction to Corpus Linguistics — читать онлайн ознакомительный отрывок

Ниже представлен текст книги, разбитый по страницам. Система сохранения места последней прочитанной страницы, позволяет с удобством читать онлайн бесплатно книгу «Introduction to Corpus Linguistics», без необходимости каждый раз заново искать на чём Вы остановились. Поставьте закладку, и сможете в любой момент перейти на страницу, на которой закончили чтение.

Тёмная тема

Шрифт:

↓

↑

Сбросить

Интервал:

↓

↑

Закладка:

Сделать

If sentences following such a syntactic structure never or almost never appear in the corpus, linguists might conclude that this sentence is only rarely used in English. Rationalist methodology, on the contrary, might respond to the same issue by relying on the intuitions of linguists. In this particular case, they might wonder whether they could produce such a sentence or not, whether it seems correct or incorrect depending on their knowledge of the language and might infer a grammaticality judgment from it. Grammaticality judgments are often classified into three types: correct , incorrect or marked , in the event that a sentence may seem possible, but sounds unnatural.

This example illustrates a fundamental difference between empirical and rationalist methodology. While the rationalist methodology leads to the formulation of categorical judgments, the empirical methodology provides a more refined answer to this question, since the observation of corpus data offers a precise indication of frequency, rather than a result in terms of absence or presence . This is one of the reasons why many linguists currently consider that the empirical methodology better matches a scientific approach (in the sense of confrontation against the facts) than a purely rationalist method for studying language.

Nonetheless, the choice between the use of empirical or rationalist methods is not limited to the field of linguistics. Certain scientific branches such as physics, chemistry, as well as sociology and history are essentially empirical disciplines. In fact, both physicists and historians base their insights on external data, which they collect in the world, in order to build a theory, test it and draw conclusions from it. On the other hand, other disciplines such as mathematics or philosophy are traditionally based on a rationalist approach, since mathematicians and philosophers use their own reasoning to build theories and to draw conclusions, rather than from the collection and observation of external data. Philosophers often resort to thought experiments, but these are not experiments in the empirical sense of the term, because they are based on the reflective abilities of researchers.

1.3. Chomsky’s arguments against empiricism in linguistics

Although corpus linguistics has experienced a strong growth over the past 20 years, the empirical grounding of linguistics is not new. Linguists have long used observational data. In the 19th Century, for example, linguists used to work on the comparison of Indo-European languages in an attempt to reconstruct their common origin. Research was based on existing data about the languages spoken in Europe such as German, French and English. Similarly, in the first half of the 20th Century in the United States, the so-called distributionist approach to syntax focused on the study of sentence formation in syntactic structures as they appeared in text corpora, and from there, tried to infer language’s general functioning. Around the late 1950s, the use of corpora in linguistics was almost completely interrupted in certain fields such as syntax, following the works of the American linguist Noam Chomsky. In fact, Chomsky defended a strictly rationalist methodological approach to linguistics, and fiercely opposed any use of external data. The objections made by Chomsky against the use of external data in linguistics have been numerous. We will briefly review them, to show in what ways most of them have lost their raison d’être in the context of current research.

Chomsky’s first objection to the use of corpora, which is also the most fundamental one, is that corpora contain language samples produced by speakers. According to him, linguistics should not focus on the linguistic performance of speakers, but on the competence they have in their mother tongue, something he calls their internal language. Now, here is the problem. When people speak, what they produce (their performance) does not necessarily reflect what they know about their language (their competence). For example, under the effect of stress or fatigue, speakers sometimes produce verbal slip-ups or make language mistakes. From time to time, almost everybody happens to badly conjugate an irregular verb and mistakenly produce the form “ he eated ” instead of “ he ate ”. However, if the person who produced this wrong form were recorded, and then asked whether he or she thought he or she had spoken correctly or not, we can almost be sure that he or she would realize his or her mistake and would be able to state the correct form, “ he ate ”. Conversely, a speaker could pronounce a word like “ serendipity ” after having heard it from somebody else’s lips, but without really knowing its meaning. These examples illustrate the fact that the words speakers “utter” are not always a true reflection of their linguistic competence. In this way, according to Chomsky, the fact of studying corpora places linguists on the wrong track, because they lead them to consider language from the point of view of “production”, which merely represents a biased reflection of the rules of language.

According to Chomsky, another problem related to corpus linguistics stems from the fact that corpora are not representative of the language as a whole. He illustrates this problem in an extreme way, by picking the case of an aphasic speaker recorded in a corpus. Linguists analyzing this corpus would draw totally incorrect conclusions about the language in question, since this person does not represent the linguistic competence of a typical speaker. Furthermore, even if we were not to include an atypical speaker, a corpus could never represent more than a tiny language sample when compared to all the oral and written productions in any language. It is for this very same reason that it is impossible to conclude that a word simply does not exist in a language just because it is absent from a corpus. It could simply never have been pronounced in such particular context, while it could exist in other language registers or have been mentioned by other speakers not included in the corpus. This problem is particularly acute in the case of rare linguistic phenomena, such as infrequent words or little used linguistic structures.

This limitation has led to Chomsky’s third criticism of corpora, namely the fact that a corpus can never contain the whole of a language and that, therefore, the above-mentioned biases are not solvable. According to him, this problem is all the more serious because even if a corpus were of a very large size and included a representative portion of the language, it would not be fully analyzable by linguists, given the fact that it is impossible to manually analyze the content of billions of sentences.

Chomsky’s last two objections have largely become obsolete due to the advances made in computer science. In fact, the size of corpora has increased exponentially over the past 20 years, and corpus analysis tools have also made considerable progress. It has thus become possible to analyze very large amounts of data, which represent a much more accurate mirror of the language than when Chomsky formulated his objections. We will return to this in section 1.4, devoted to the connections between computer science and corpus linguistics. In addition to these technological advances, theoretical and methodological advances have also largely made it possible to eliminate or control the other types of biases mentioned by Chomsky. For example, good practice for building a corpus is to accurately document the type of language it contains. This helps to avoid analyzing the language of a single aphasic subject by mistake, for example, as Chomsky might suggest. It is nonetheless true that a corpus can only show that which it contains, and therefore the absence of evidence that a word or a structure exists in a corpus cannot constitute definitive proof of their absence from the language. Thus, for certain research questions relating to rare or hardly observable phenomena in a corpus, it might be advisable to complement research with another empirical method, namely with the experimental method. As we will see later in this chapter, this method shares the use of a quantitative methodology with corpus linguistics.