Sandrine Zufferey - Introduction to Corpus Linguistics

Здесь есть возможность читать онлайн «Sandrine Zufferey - Introduction to Corpus Linguistics» — ознакомительный отрывок электронной книги совершенно бесплатно, а после прочтения отрывка купить полную версию. В некоторых случаях можно слушать аудио, скачать через торрент в формате fb2 и присутствует краткое содержание. Жанр: unrecognised, на английском языке. Описание произведения, (предисловие) а так же отзывы посетителей доступны на портале библиотеки ЛибКат.

Introduction to Corpus Linguistics: краткое содержание, описание и аннотация

Предлагаем к чтению аннотацию, описание, краткое содержание или предисловие (зависит от того, что написал сам автор книги «Introduction to Corpus Linguistics»). Если вы не нашли необходимую информацию о книге — напишите в комментариях, мы постараемся отыскать её.

Over the past decades, the use of quantitative methods has become almost generalized in all domains of linguistics. However, using these methods requires a thorough understanding of the principles underlying them. Introduction to quantitative methods in linguistics aims at providing students with an up-to-date and accessible guide to both corpus linguistics and experimental linguistics. The objectives are to help students developing critical thinking about the way these methods are used in the literature and helping them to devise their own research projects using quantitative data analysis.

Introduction to Corpus Linguistics — читать онлайн ознакомительный отрывок

Ниже представлен текст книги, разбитый по страницам. Система сохранения места последней прочитанной страницы, позволяет с удобством читать онлайн бесплатно книгу «Introduction to Corpus Linguistics», без необходимости каждый раз заново искать на чём Вы остановились. Поставьте закладку, и сможете в любой момент перейти на страницу, на которой закончили чтение.

Тёмная тема
Сбросить

Интервал:

Закладка:

Сделать

But corpus linguistics was not only developed thanks to the creation of such tools. Above all, it is the general development of computers and the digital revolution which have made the greatest advances possible. In fact, the increase in the computing power of machines – as well as in their memory – has made it possible to build ever larger corpora. Until the 1980s, a corpus of a million words was considered to be a very large corpus. For instance, the first reference corpora (such as the Brown corpus developed for American English in the early 1960s) were about this size. At the same time, the arrival of cassette recorders to the market enabled the first creations of oral corpora containing an exact transcription of spoken speech, rather than a synthesis taken in shorthand.

The marketing of scanners in the 1980s later made it possible to digitize a significant amount of data and corpora began to reach larger sizes, up to 20 million words. Then, with the democratization of computer use, the amount of digitally disseminated texts greatly accelerated the growth of corpora. Finally, since the beginning of 21st Century, the wide dissemination of documents online via the Internet has given another dimension to the size of corpora available to researchers. At present, the Google Books corpus, for example, contains more than 500 billion words, which represents approximately 4% of all the published books of all time (Michel et al . 2011). We will discuss the possible uses of such a corpus in the following chapters. In Chapter 6, we will also see that the Internet potentially offers an exceptional data resource for corpus linguistics, but that Internet research cannot be used without an additional processing step if we are to grant data quality.

1.5. Quantitative versus qualitative methods

We have seen that computers help us to work on very large corpora and automatically count word occurrences, find keywords, etc. The need to use a large amount of data and the desire to quantify the presence of linguistic elements in a corpus corresponds to a quantitative research methodology. This methodology involves observing or manipulating variables, as well as the use of statistical tests. The main objective is to test a limited number of variables, in a highly controlled environment whenever possible and on a language sample that can be representative of the phenomenon studied. This can later make it possible to generalize the results obtained to the whole language or to a part of the target language (e.g. journalistic language). These methods nonetheless imply a certain form of reductionism and a simplification of reality. Ultimately, the addition of studies with well-defined and properly controlled variables may provide a global and realistic picture of a phenomenon.

Let us take an example. Suppose we want to test the hypothesis that women talk more about their feelings than men. To test this hypothesis by means of a corpus study, we should first make sure that we are comparing records of men and women produced in the same context, for example, in the context of friendly discussions around a topic, or a face-to-face interview with a researcher. We will also need to make sure that the corpus collected in this way includes approximately the same speaking time or the same number of words pronounced by men and women. This control over the linguistic context and the duration of interactions helps us to ensure that men and women have had fairly equal motives to pronounce words related to emotions/feelings, and as many chances of doing so. Second, we would have to choose a list of words to search within the corpus, representative of the vocabulary related to emotions, for example verbs such as to annoy , adjectives like furious or nouns like anger . Then, by comparing the number of times these words have been produced by the two groups and by validating the significance of the differences observed between the groups through statistical tests, we would be able to provide an answer to the research question. In this study, we have sought to reduce the number of confounding variables by controlling the context of production of the statements, as well as by limiting the word choice in the examined vocabulary. It is precisely this limited and reductionist aspect that the opponents to quantitative methods criticize, thinking that the constructed and unnatural context in which structured interviews take place does not reflect the richness of natural and spontaneous exchanges between speakers.

The other major methodological paradigm includes so-called qualitative studies. The main objective of these studies is holistic: they aim to study a phenomenon understanding it as a whole, as detailed and as thoroughly as possible, but in a small number of people. Due to their nature, qualitative studies are interpretative. In linguistics, research paradigms involving a qualitative methodology typically resort to the administration of questionnaires with open questions, interviews, observations or introspective techniques, such as think-aloud protocols. For example, in order to study the differences in the way of expressing emotions between men and women, a qualitative methodology could involve asking a reduced number of speakers, for example three men and three women, to describe the way in which they express their emotions, either by talking freely with the experimenter or by talking to each other. The analysis would then require an in-depth study of some of the examples found interesting during the discussion.

One of the main criticisms aimed at qualitative methods is that they are very subjective in nature, insofar as they are largely based on the interpretations made by linguists and the subjective impressions of a few speakers. Thus, the specific cases they describe cannot often be generalized to a population, which, by the way, is not the aim pursued by such studies. Rather than the generalization of results, these studies are based on the possibility of making a transfer from a particular situation so as to understand another one with which it shares common traits. For example, an in-depth case study on the difficulties of expressing emotions in an aphasic patient may help to highlight similar difficulties existing in other patients with the same disorder.

To summarize, each of the two methodological paradigms introduced in this section has both advantages and disadvantages. Quantitative methods enable the generalization of results to the whole of a population, whereas qualitative methods offer a more detailed and nuanced panorama of a real case. Recently, the complementarity between these approaches has started to be broadly accepted in research and many studies are crossing the two types of methodologies, in order to benefit from their advantages and limit their disadvantages.

For example, if we want to know whether learners of French as a foreign language at an advanced level are able to use collocations as native speakers do (collocations such as “ prendre une décision ” – to make a decision – or “ pleuvoir à verse ” – to pour with rain), we can search for occurrences of these expressions in text corpora produced by learners and compare the number of times these expressions appear – and their frequency – in a corpus of similar textual productions made by native speakers. By comparing these frequencies through statistical tests, we will know whether learners actually use these expressions as often as native speakers do, or not. Even if we find a difference between the two groups, something which this study will not tell us is why learners do not use these expressions as often as native speakers do or which expressions they use instead. To find out, we can complete this study with a qualitative analysis, by observing, for example, which words often accompany the occurrences of the noun décision in French, which are not the verb prendre . If we observe that several times the verb used is faire (make), rather than prendre (take), a decision in English-speaking learners, but not in German-speaking learners, we will conclude that these errors could come from a problem of transfer from their mother tongue and, more specifically, from the expression to make a decision in English.

Читать дальше
Тёмная тема
Сбросить

Интервал:

Закладка:

Сделать

Похожие книги на «Introduction to Corpus Linguistics»

Представляем Вашему вниманию похожие книги на «Introduction to Corpus Linguistics» списком для выбора. Мы отобрали схожую по названию и смыслу литературу в надежде предоставить читателям больше вариантов отыскать новые, интересные, ещё непрочитанные произведения.


Отзывы о книге «Introduction to Corpus Linguistics»

Обсуждение, отзывы о книге «Introduction to Corpus Linguistics» и просто собственные мнения читателей. Оставьте ваши комментарии, напишите, что Вы думаете о произведении, его смысле или главных героях. Укажите что конкретно понравилось, а что нет, и почему Вы так считаете.

x