Monthly Archives: September 2016

Big data problems we face today can be traced to the social ordering practices of the 19th century

“This situation brings with it a socio-political dimension of interest to us, one in which our understanding of people and our actions on individuals, groups and populations are deeply implicated. The collection of social data had a purpose – understanding and controlling the population in a time of significant social change. To achieve this, new kinds of information and new methods for generating knowledge were required. Many ideas, concepts and categories developed during that first data revolution remain intact today, some uncritically accepted more now than when they were first developed. In this piece we draw out some connections between these two data ‘revolutions’ and the implications for the politics of information in contemporary society. It is clear that many of the problems in this first big data age and, more specifically, their solutions persist down to the present big data era.”
“These changes in knowledge were facilitated not only by large quantities of new information pouring in from around the world but by shifts in the production, processing and analysis of that information. Many of these methods are still with us including information taxonomies and knowledge trees to name but two. Hacking observed that while social categories are epistemic products their application can have marked ontological effects. Knowledge of the natural world was rapidly applied to the social world and the politicking of social identities began in earnest, supported by a rising tide of data and analytical methods. Conservatives and social critics alike relied on the production and dissemination of data, both large and small, to support repression and reform. The public inquiry emerged as another 19th century mechanism that persists in the present, with the same general focus – poverty, crime, health and systemic failures.

These new knowledge demands saw some contextual successes, such as in the demographic and statistical sciences, and some failures, such as Babbage’s analytical engine design which was conceived but not completed during his lifetime. In some ways growing academic specialisations created a situation in which what was gained through a narrowing of focus and growth in sub-disciplinary activity was also lost in generalisability. This distinctly Victorian problem endures to the present day despite interdisciplinary projects of various kinds. Floridi, writing on the philosophy of big data, has said quite specifically that the real big data problem we face today is less one of the quantity or quality of data or even technical skills but rather one of epistemology.”
“Much of the data collected about human beings by bureaucratic systems has a history not simply of description or even understanding but one of control. Foucault’s power/knowledge nexus is situated in a selection of bureaucratic and institutional forms for this reason. Every deviant or ‘underperforming’ social category is a warrant for action once documented. Consequently a great deal of social data is coercive in nature. Social data is rarely neutral and the persistence of ‘wicked’ social problems illustrates how regulation has been favoured in preference to their solution. That a census or a social survey is a snapshot of the way our societies are regulated is rarely remarked on and instead emphasis is given to the presumed objectivity of the categories and their data. This is the ideology of the small data era in action – the claim that it is science and not society that we are seeing through such instruments.


The targets of social policy interventions for more than two centuries have essentially been the same categories of people – groups marked as moral outsiders (deviants) in their societies. The collection of data about these categories of people, in particular, was a marked feature of the first big data environment. These categories were operationalised through society’s regulatory processes and institutions including education, the law and of course healthcare. These are the same locations where debates about structure, agency and morality continue to intersect and where the use of data and technology are represented as largely emancipatory. The risk is that ‘big data’ replicates the ideological underpinnings common to much of what has been produced under the small data paradigm.”
“Our question then is how do we go about re-writing the ideological inheritance of that first data revolution? Can we or will we unpack the ideological sequelae of that past revolution during this present one? The initial indicators are not good in that there is a pervasive assumption in this broad interdisciplinary field that reductive categories are both necessary and natural. Our social ordering practices have influenced our social epistemology. We run the risk in the social sciences of perpetuating the ideological victories of the first data revolution as we progress through the second. The need for critical analysis grows apace not just with the production of each new technique or technology but with the uncritical acceptance of the concepts, categories and assumptions that emerged from that first data revolution. That first data revolution proved to be a successful anti-revolutionary response to the numerous threats to social order posed by the incredible changes of the nineteenth century, rather than the Enlightenment emancipation that was promised.”
“Information is not new and nor is data – of whatever order of magnitude. We are in a period that can reasonably be seen as the second ‘big data’ revolution and it is revolutionary because it challenges our accepted understanding of the world and not simply because of the volumes and velocity of data generation in our new digital information technologies. Many social categories were designed to control, coerce and even oppress their targets. The poor, the unmarried mother, the illegitimate child, the black, the unemployed, the disabled, the dependent elderly – none of these social categories of person is a neutral framing of individual or collective circumstances. They are instead a judgement on their place in modernity and material grounds for research, analysis and policy interventions of various kinds. Two centuries after the first big data revolution many of these categories remain with us almost unchanged and, given what we know of their consequences, we have to ask what will be their situation when this second data revolution draws to a close?

Like that first data revolution, this present one also has ambitions for people and their interactions with the new media emerging in its wake. These discussions are useful and necessary because discussion and negotiation are essential in the face of revolution. The responses to revolution in the late 18th and 19th centuries were often violent but we now have better methods available for the maintenance of social order, such as Foucault’s technologies of the self and Bourdieu’s habitus. Where we see this becoming highly problematic is in the continuity of ideologically informed notions of ourselves and others and the reproduction of such ideologies in and through our new digital environments. Following Floridi, this is a significant epistemic and ethical problem in our current big data era.”

from The London School of Economics and Political Science

Making up people

by Ian Hacking

“I have long been interested in classifications of people, in how they affect the people classified, and how the effects on the people in turn change the classifications. We think of many kinds of people as objects of scientific inquiry. Sometimes to control them, as prostitutes, sometimes to help them, as potential suicides. Sometimes to organise and help, but at the same time keep ourselves safe, as the poor or the homeless. Sometimes to change them for their own good and the good of the public, as the obese. Sometimes just to admire, to understand, to encourage and perhaps even to emulate, as (sometimes) geniuses. We think of these kinds of people as definite classes defined by definite properties. As we get to know more about these properties, we will be able to control, help, change, or emulate them better. But it’s not quite like that. They are moving targets because our investigations interact with them, and change them. And since they are changed, they are not quite the same kind of people as before. The target has moved. I call this the ‘looping effect’. Sometimes, our sciences create kinds of people that in a certain sense did not exist before. I call this ‘making up people’.

What sciences? The ones I shall call the human sciences, which, thus understood, include many social sciences, psychology, psychiatry and, speaking loosely, a good deal of clinical medicine. I am only pointing, for not only is my definition vague, but specific sciences should never be defined except for administrative and educational purposes. Living sciences are always crossing borders and borrowing from each other.

The engines used in these sciences are engines of discovery but also engines for making up people. Statistical analysis of classes of people is a fundamental engine. We constantly try to medicalise: doctors tried to medicalise suicide as early as the 1830s. The brains of suicides were dissected to find the hidden cause. More generally, we try to biologise, to recognise a biological foundation for the problems that beset a class of people. More recently, we have hoped to geneticise as much as possible. Thus obesity, once regarded as a problem of incontinence, or weakness of the will, becomes the province of medicine, then of biology, and at present we search for inherited genetic tendencies. A similar story can be told in the search for the criminal personality.

These reflections on the classification of people are a species of nominalism. But traditional nominalism is static. Mine is dynamic; I am interested in how names interact with the named. The first dynamic nominalist may have been Nietzsche. An aphorism in The Gay Science begins: ‘There is something that causes me the greatest difficulty, and continues to do so without relief: unspeakably more depends on what things are called than on what they are.’ It ends: ‘Creating new names and assessments and apparent truths is enough to create new ‘things’.’ Making up people would be a special case of this phenomenon.”

from Generation Online

The Politics of Data: The rising prominence of a data-centric approach to scientific research

by Sabina Leonelli and Louise Bezuidenhout

“[…] contemporary manifestations of big data have distinctive features that relate to the technologies, institutions and governance structures of the contemporary scientific world.

For instance, this approach is typically associated to the emergence of large-scale, multi-national networks of scientists; to a strong emphasis on the importance of sharing data and regarding them as valuable research outputs in and of themselves, regardless of whether or not they have yet been used as evidence for a given discovery; the institutionalization of procedures and norms for data dissemination through the Open Science and Open Data movements, and policies such as those recently adopted by Research Councils UK and key research funders such as the European Research Council, the Wellcome Trust and the Gates Foundation; and the development of instruments, building on digital technologies and web services, that facilitate the production and dissemination of data with a speed and geographical reach as yet unseen in the history of science.

This peculiar conjuncture of institutional, socio-political, economic and technological developments has considerably increased international debate over processes of data production, dissemination and interpretation within science and beyond. This level of reflexivity over data practices is arguably the most novel and interesting aspect of contemporary debates over big data. What we are witnessing is thus not the emergence of a wholly new research paradigm dealing with hitherto unseen types of data, but rather the rising prominence of a data-centric approach to scientific research, where concerns over data sharing and use in the long term take precedence over immediate attempts to analyze data.

Thus conceptualized, data centrism raises fundamental epistemological issues, which are deeply intertwined with the political challenges posed by big data. […] Philosophical analysis can help to address these questions in ways that inform both current data practices and the ways in which they have been conceptualized within the social sciences and humanities, as well as by policy bodies and other institutions.”
“Scientific research is often presented as the most systematic set of efforts in the contemporary world aimed to critically explore and debate what constitutes acceptable and sufficient evidence for any given belief about reality. The very term ‘data’ comes from the Latin ‘given’, and indeed data are meant to document as faithfully and objectively as possible whatever entities or processes are being investigated. And yet, data collection is always steeped in a specific way of understanding the world and constrained by given material and social conditions, and the resulting data are therefore marked by the historical circumstances through which they were generated: what constitutes trustworthy or sufficient data changes across time and space, making it impossible to ever assemble a complete and intrinsically reliable dataset.”
“This landscape makes the study of data into an excellent entry point to reflect on the activities and claims associated to the idea of scientific knowledge, and the implications of existing conceptualisations of various forms of knowledge production and use.”
“From these interviews it became evident that there were a range of material and social aspects of their research environment that played significant roles in their overall data engagement activities.”
“Such research clearly demonstrates the importance of scrutinizing all processes involved in data engagement and of recognizing the role that research environments play in not only the creation of data, but also their selection, presentation and dissemination. How scientists perceive their research environments, what they recognize as strengths and limitations, and what in these environments pose material or social challenges to data engagement all influence what data travels in or out of any research context.”
“The types of data shared and valued, the longevity of these data, and the pathways through which they are disseminated and re-used all have complicated relationships to the research environments in which they are utilized.  In consequence, homogenized perceptions of key issues such as what data are, how raw data differs from processed data, and how data ownership can be understood reveal their limitations.”

from The London School of Economics and Political Science

A Few Notes for STS on Big Data

by David Banks

“Jurgenson writes:

Positivism’s intensity has waxed and waned over time, but it never entirely dies out, because its rewards are too seductive. The fantasy of a simple truth that can transcend the divisions that otherwise fragment a society riven by power and competing agendas is too powerful, and too profitable. To be able to assert convincingly that you have modeled the social world accurately is to know how to sell anything from a political position, a product, to one’s own authority. Big Data sells itself as a knowledge that equals power. But in fact, it relies on pre-existing power to equate data with knowledge.

“I am not sure if the resurgence of positivism in the guise of Big Data should be considered a failing of STS or the success of powerful and willfully ignorant technocratic elites. Probably equal portions of both, but I’m going to put the pressure on my fellow STS scholars to see this as a professional, collective failing. While we still are far from a world without misogyny, white supremacy, or empire, we as academics should take note of our own house: the internal fights at the level of institutions that we’ve let slip by us. Do we apply to Big Data grants and then use the funds for research that undermines the concept altogether? Do we participate in social media-funded conferences and research centers so that we may, from within, raise concerns early and often? Or do we confront positivism head-on as the force for command and control that we know that it is, in all of its forms, and insist on not legitimating Big Data by attaching our names to it? To all three I’d say “yes.””

from Cyborgology