Comments on Faculty of Language: More on Big Data

Anonymous (2013-05-09):

I take it the [recursively repeated] point is that mindlessly gathering data is bad. I doubt that anyone questions this and goes on data-gathering expeditions without having any theory [hypothesis/conjecture] to [dis]prove. Far more interesting would be to be told how much data is "just about right" for, say, linguistic theories. Any proposals?

Also, you say: "past a certain point, your return on adding more data diminishes to the point that you're only wasting time gathering more". This is plausible for any finite set of data. But given that language is, at least on some views, a system with infinitely many hierarchically organized expressions [Chomsky, 2012], no matter how much data we gather, we only have a tiny subset of all possible data. So my question is: how do we know that so far we have looked at data that will tell us something interesting about the nature of language, and not at "correlations [that are] extremely high just by chance"?