Sunday, January 13, 2013

Data-driven scientists are lazy

Ouch. So says petrkiel at R-bloggers, in a commentary on Hans Rosling in the BBC's Joy of Stats. He doesn't put it as concisely as 'lazy', but 'failure of imagination', when one is expected to be imaginative, is pretty damn close:

Data-driven scientists (data miners) ... believe that data can tell a story, that observation equals information, that the best way towards scientific progress is to collect data, visualize them and analyze them (data miners are not specific about what analyze means exactly). When you listen to Rosling carefully he sometimes makes data equivalent to statistics: a scientist collects statistics. He also claims that “if we can uncover the patterns in the data then we can understand”. I know this attitude: there are massive initiatives to mobilize data, integrate data, there are methods for data assimilation and data mining, and there is an enormous field of scientific data visualization. Data-driven scientists sometimes call themselves informaticians or data scientists. And they are all excited about big data: the larger is the number of observations (N) the better.
And the punchline:
Emphasizing data at the expense of hypothesis means that we ignore the actual thinking and we end up with trivial or arbitrary statements, spurious relationships emerging by chance, maybe even with plenty of publications, but with no real understanding. This is the ultimate and unfortunate fate of all data miners. I shall note that the opposite is similarly dangerous: Putting emphasis on hypotheses (the extreme case of hypothesis-driven science) can lead to lunatic abstractions disconnected from what we observe. Good science keeps in mind both the empirical observations (data) and theory (hypotheses, models).
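
The "spurious relationships emerging by chance" part is easy to try for yourself. Here is a minimal sketch of my own (not petrkiel's), assuming 40 unrelated variables of pure Gaussian noise with 20 observations each. The function names, the seed and the sizes are all illustrative choices. It scans every pair of variables and reports the strongest correlation it finds, which is roughly what hypothesis-free mining does:

```python
import random
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def strongest_chance_correlation(n_vars=40, n_obs=20, seed=0):
    """Generate independent noise variables and return the most correlated pair.

    Every variable is drawn independently, so any correlation found
    here is spurious by construction.
    """
    rng = random.Random(seed)
    data = [[rng.gauss(0, 1) for _ in range(n_obs)] for _ in range(n_vars)]
    # Compare all pairs with no hypothesis about which ones should be related.
    best_pair = max(
        combinations(range(n_vars), 2),
        key=lambda pair: abs(pearson(data[pair[0]], data[pair[1]])),
    )
    return best_pair, pearson(data[best_pair[0]], data[best_pair[1]])


if __name__ == "__main__":
    pair, r = strongest_chance_correlation()
    print(f"variables {pair[0]} and {pair[1]}: r = {r:.2f}")
```

With 40 variables there are 780 pairs to compare, so a respectable-looking correlation is nearly guaranteed to turn up even though nothing is related to anything. A hypothesis stated before looking would have tested one pair, not all of them.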