“[…] contemporary manifestations of big data have distinctive features that relate to the technologies, institutions and governance structures of the contemporary scientific world.
For instance, this approach is typically associated to the emergence of large-scale, multi-national networks of scientists; to a strong emphasis on the importance of sharing data and regarding them as valuable research outputs in and of themselves, regardless of whether or not they have yet been used as evidence for a given discovery; the institutionalization of procedures and norms for data dissemination through the Open Science and Open Data movements, and policies such as those recently adopted by Research Councils UK and key research funders such as the European Research Council, the Wellcome Trust and the Gates Foundation; and the development of instruments, building on digital technologies and web services, that facilitate the production and dissemination of data with a speed and geographical reach as yet unseen in the history of science.
This peculiar conjuncture of institutional, socio-political, economic and technological developments [has] considerably increased international debate over processes of data production, dissemination and interpretation within science and beyond. This level of reflexivity over data practices is arguably the most novel and interesting aspect of contemporary debates over big data. What we are witnessing is thus not the emergence of a wholly new research paradigm dealing with hitherto unseen types of data, but rather the rising prominence of a data-centric approach to scientific research, where concerns over data sharing and use in the long term take precedence over immediate attempts to analyze data.
Thus conceptualized, data centrism raises fundamental epistemological issues, which are deeply intertwined with the political challenges posed by big data. […] Philosophical analysis can help to address these questions in ways that inform both current data practices and the ways in which [they] have been conceptualized within the social science and humanities, as well as by policy bodies and other institutions.”
“Scientific research is often presented as the most systematic set of efforts in the contemporary world aimed to critically explore and debate what constitutes acceptable and sufficient evidence for any given belief about reality. The very term ‘data’ comes from the Latin ‘given’, and indeed data are meant to document as faithfully and objectively as possible whatever entities or processes are being investigated. And yet, data collection is always steeped in a specific way of understanding the world and constrained by given material and social conditions, and the resulting data are therefore marked by the historical circumstances through which they were generated: what constitutes trustworthy or sufficient data changes across time and space, making it impossible to ever assemble a complete and intrinsically reliable dataset.”
“This landscape makes the study of data into an excellent entry point to reflect on the activities and claims associated to the idea of scientific knowledge, and the implications of existing conceptualisations of various forms of knowledge production and use.”
“From these interviews it became evident that there were a range of material and social aspects of their research environment that played significant roles in their overall data engagement activities.”
“Such research clearly demonstrates the importance of scrutinizing all processes involved in data engagement and to recognize the role that research environments play in not only the creation of data, but also their selection, presentation and dissemination. How scientists perceive their research environments, what they recognize as strengths and limitations, and what in these environments pose material or social challenges to data engagement all influence what data travels in or out of any research context.”
“The types of data shared and valued, the longevity of these data, and the pathways through which they are disseminated and re-used all have complicated relationships to the research environments in which they are utilized. In consequence, homogenized perceptions of key issues such as what data are, how raw data differs from processed data, and how data ownership can be understood reveal their limitations.”