He makes at least three important points.
First, there is a difference between data collection and scientific experimentation. The idea, implicit in most big data PR, is that one can collect data quite a-theoretically and expect to gain scientific insight. Chomsky notes that this runs against the accumulated wisdom of the last 200 years of scientific research. As he compactly puts it:
...theory-driven experimental investigation has been the nature of the sciences for the last 500 years.

Quite right. Experiments are not just looking. They are looking with an attitude, and the 'tude is a function of theory.
Second, much of what linguists study has NO relevant data in any conceivable corpus. He cites the ECP, but this is just the tip of a very large iceberg. If there are no relevant data, then big data collection is beside the point:
In linguistics we all know that the kind of phenomena that we inquire about are often exotic. They are phenomena that almost never occur. In fact, those are the most interesting phenomena, because they lead you directly to fundamental principles. You could look at data forever, and you’d never figure out the laws, the rules, that are structure dependent. Let alone figure out why. And somehow that’s missed by the Silicon Valley approach of just studying masses of data and hoping something will come out. It doesn’t work in the sciences, and it doesn’t work here.

Let me underline one point Chomsky makes: it is manufactured experimental data that is important for gaining insight. As in the other sciences, linguists create data not found in the wild and use this factitious data to understand what is happening. Real-life data is often (IMO, generally) useless because it is too complex. The aim of good data is to reduce the irrelevant interference effects that arise from the interaction of many component causes, and real-life data is exactly the product of such interactions. In linguistics, negative data are of particular importance: data showing that some structure is unacceptable or cannot have a specific meaning. Big Data cannot deliver this kind of data, because it is precisely what is missing from everyday language use. And yes, Poverty of the Stimulus (PoS) arguments are built from this kind of data, which is why they are so useful.
Third, I am still not sure what Chomsky's take on island effects is. One of the interesting debates in the Sprouse and Hornstein volume revolved around whether these were reducible to simple complexity effects. My read on this is that Sprouse and Wagers and Phillips got the better of the discussion and that reducing islands to complexity just wasn't going to fly. I'd be interested to know what others think.
At any rate, take a quick look, as it is short and interesting.
Chomsky's recent Sophia Lectures are another excellent source of Chomskyan syntactic speculation. The lectures (plus an excellent interview by Naoki Fukui and Mihoko Zushi) are contained in volume 64 of Sophia Linguistica. Unfortunately, I have no online link, but I recommend getting hold of the volume and reading it. Interesting stuff.