Probably since forever, philosophers and mathematicians have dreamed of mechanizing thought, of removing judgment from thinking. The newest aspirant in this millennial quest is Big Data, and, not surprisingly, there is an eponymous book (excerpted here) with the following provided summary:
This revelatory exploration of big data, which refers to our newfound ability to crunch vast amounts of information, analyze it instantly and draw profound and surprising conclusions from it, discusses how it will change our lives and what we can do to protect ourselves from its hazards.
Big Data (BD) is the new New Thing, the method by which diligence can substitute for thought. The idea actually has a certain charm as it reverberates with our sense of justice. Collecting data is hard work, but it is generally the kind of work for which effort is rewarded. Work hard and you will do well. Put in the hours and the data will pile up. It’s an activity that rewards virtue.
In this it is entirely unlike coming up with a plausible analysis, aka thinking. This activity is totally unfair. Lazy people can have excellent ideas. Sloth is no bar to insight and profligacy no guarantee of intellectual stagnation. Here even the wicked, sloppy, and lazy can prosper. How unfair.
In a just world, virtue would be rewarded. In a just world, hard work would guarantee enlightenment. We don’t live in a just world. Big Data is the unfounded belief that this can be remedied: that the hard work of data gathering can substitute for the caprice of thought. It cannot, and, unfortunately, believing it can is likely to deform scientific practice. To see this, consider the following quote:
The era of big data challenges the way we live and interact with the world. Most strikingly, society will need to shed some of its obsession for causality in exchange for simple correlations: not knowing why but only what. This overturns centuries of established practices and challenges our most basic understanding of how to make decisions and comprehend reality (10).
And that is precisely the problem. Big Data is part of an enterprise aimed at reforming scientific practice: dump why, aim for what. However, contrary to the prevailing conception, without a model/theory it is not clear what it even means to just “look for correlations.” Data do not speak for themselves, so gathering lots of data will not result in eloquent models that understand the whats that matter. Big data sets cannot pull themselves up by the bootstraps (nothing can pull itself up by the bootstraps!) and thereby yield useful models. So, without explicit, thoughtful models to guide the enterprise, we will be saddled with implicit models that obscure (and trivialize) what we are doing (as noted here, without good models it is even difficult to separate good data from bad).
None of this would be worth mentioning were it not for the mesmerizing powers of Big Data. We have seen this before (here, and here for example). Big Data is the modern avatar of classical empiricist methodology. Its appeal is its promise to provide insight without intellectual sweat. This time, however, Empiricism has found a slogan attached to a technology, with Google as the all-powerful mantra. Not surprisingly, money-making slogans can be very enticing, and Google intellectuals (e.g. Peter Norvig) can gain powerful platforms. And though I am quite sure that, like all other (empiricist) attempts to circumvent thought, this too will ultimately fail, its demise may not come soon enough to prevent serious damage. So when you hear the siren calls of Big Data, I suggest the following prophylactic procedure: repeat Kant’s dictum to yourself, viz. data without theory is blind, data without theory is blind, data without theory is blind…and hope it soon goes away.