Big data and data science are partly technological phenomena, concerned with using computing power and algorithms to collect and analyse comparatively large datasets of, often, unstructured information. But they are also, and most prominently, cultural and political phenomena that come with the idea that huge unstructured datasets – often based on social media interactions and other digital traces left by people – can, when paired with methods like machine learning and natural language processing, offer a higher form of truth: one that is computationally distilled rather than interpretively achieved.
Such mythological beliefs are not new, however. Within the cultural and social sciences there has long been, if not a hierarchy, at least a strict division of research methods, where some methods – those that have come to be labelled ‘quantitative’, and that analyse data tables with statistical tools – have been vested with an ‘aura of truth, objectivity, and accuracy’ (boyd and Crawford, 2012, p. 663). Other methods – those commonly named ‘qualitative’, and involving close readings of textual data from interviews, observations, and documents – are seen as more interpretive and subjective, rendering richer but also (allegedly) more problematic results. This book rests on the belief that this distinction is not only annoying, but also wrong. We can get at approximations of ‘the truth’ by analysing social and cultural patterns, and those analyses are by definition interpretive, no matter the chosen methodological strategy. Especially in this day and age, when data – the bigger the better – are fetishised, it is high time to move on from the unproductive dichotomy of ‘qualitative’ versus ‘quantitative’.
Pure data science tends to focus narrowly on what is researchable. It goes for the issues for which there are data, regardless of whether those issues have any real-life urgency. The last decade has seen parts of the field of data science and parts of the social sciences become entangled in ways that risk a loss of theoretical grounding. In a seminal paper outlining the emerging discipline of ‘computational social science’, David Lazer and colleagues wrote that:
We live life in the network. We check our e-mails regularly, make mobile phone calls from almost any location, swipe transit cards to use public transportation, and make purchases with credit cards. Our movements in public places may be captured by video cameras, and our medical records stored as digital files. We may post blog entries accessible to anyone, or maintain friendships through online social networks. Each of these transactions leaves digital traces that can be compiled into comprehensive pictures of both individual and group behavior, with the potential to transform our understanding of our lives, organizations, and societies.
(Lazer et al., 2009, p. 721)
Furthermore, they argued that there was an inherent risk in the fact that existing social theories were ‘built mostly on a foundation of one-time “snapshot” data’ and that they therefore may not be fit to explain the ‘qualitatively new perspectives’ on human behaviour offered by the ‘vast, emerging data sets on how people interact’ (Lazer et al., 2009, p. 723). While I agree that social analysis must be rethought in light of these developments, I am not so sure that it is simply a matter of ‘compiling’ the data and then being prepared for the possibility that existing theories no longer work. Rather, I argue, we should trust a bit more that even though the size and dynamics of the data may be previously unseen, the social patterns that they can lay bare – if adequately analysed – can still largely be interpreted with the help of ‘old’ theories, and with an ‘old’ approach to theorising. After all, theories are not designed to understand particular forms of data, but rather the sociality to which those data bear witness.
My point is that data need theory – for considering the data, the methods, the ethics, and the results of the research alike. By extension, theories may always need to be updated, revised, discarded, or newly invented – but that has always been true. This book is therefore positioned within the broad field of ‘digital sociology’ as outlined by authors such as Deborah Lupton (2014) and Noortje Marres (2017). One strand within the debate about what digital sociology is, and what it entails, relates to the emergence of ‘digital methods’. In general, there is widespread disagreement about what such methods are, and whether the focus should be on continuity with established social research traditions or on revolutionary innovation. In a sense, this book can be read as one of many possible ventures in the direction pointed out by Noortje Marres when she writes:
The digitization of social life and social research opens up anew long-standing questions about the relations between different methodological traditions in social enquiry: what are the defining methods of sociological research? Are some methods better attuned to digital environments, devices and practices than others? Do interpretative and quantitative methods present distinct methodological frameworks, or can these be combined?
(Marres, 2017, p. 105)
With co-author Carolin Gerlitz, Marres suggests that we go beyond previous divisions of methods by thinking in terms of ‘interface methods’ (Marres and Gerlitz, 2016). This means highlighting that digital methods are dynamic and under-determined, and that a multitude of methodologies intersect in digital research. By recognising ‘the unstable identity of digital social research techniques’, we can ‘activate our methodological imagination’ (Marres, 2017, p. 106). Marres continues:
Rather than seeing the instability of digital data instruments and practices primarily as a methodological deficiency, i.e. as a threat to the robustness of sociological data, methods and findings, the dynamic nature of digital social life may also be understood as an enabling condition for social enquiry.
(Marres, 2017, p. 107)
In this book, I suggest a general stance through which more integrated methodologies can be developed and propagated. Writing from my own position as a social media researcher and cultural sociologist, I will argue that the data-drivenness of big data science need not, in essence, be conceived as different from the data-drivenness of ethnography and anthropology. My end goal is to outline a framework by which theoretical interpretation and a ‘qualitative’ approach to data are integrated with ‘quantitative’ analysis and data science techniques.
The book is, in the end, especially focused on what interpretive sociology can bring to the table here. With this concept, I refer to the classic notion of sociology as ‘a science concerning itself with the interpretive understanding of social action […] its course and consequences’ (Weber, [1921] 1978, p. 4). This kind of sociology is about the understanding (Verstehen) of social life, and focuses on the processes by which meaning is created through social activities. In other words, it is not a positivist and objectivist science. As Max Weber put it, ‘meaning’ never refers:
to an objectively ‘correct’ meaning or one which is ‘true’ in some metaphysical sense. It is this which distinguishes the empirical sciences of action, such as sociology and history, from the dogmatic disciplines in that area […] which seek to ascertain the ‘true’ and ‘valid’ meanings associated with the objects of their investigation.
(Weber, [1921] 1978, p. 4)
Still, he continued, interpretive sociology ‘like all scientific observations, strives for clarity and verifiable accuracy of insight and comprehension (Evidenz)’ (Weber, [1921] 1978, p. 4). The interpretive stance should entail moving back and forth between such evidence – data – and their iterative and cumulative interpretation – theory.