When I finished the proof of the Merge-MSO correspondence two years ago, I felt more than just the usual PPE (post-proof elation) that comes with having climbed a mountain you yourself divined into existence. Not only does the result make it a lot easier to prove things about Minimalist grammars, it also shows that we can throw in pretty much any proposal from syntax without altering the computational properties of the formalism. What's more, the correspondence between constraints and Merge solves a long-standing issue for all of Minimalism: Why are there constraints in the first place? Why isn't it all just Merge and Move? The correspondence between Merge and constraints provides a novel answer: maybe it really is all just Merge and Move, and maybe the constraints we find are merely epiphenomena of the feature system. If so, the question isn't why languages have odd restrictions such as the Person Case Constraint; that part we now get for free as long as we do not put any extra restrictions on the lexicon and the feature system. Instead, the real puzzle is why we haven't found more restrictions like that.
You see, MSO is capable of a lot more than just defining the kinds of constraints we linguists have come to love. The linguistic constraints are just a small subclass of what MSO can pull off if it flexes its muscles.
- You want reflexives to c-command their antecedent rather than the other way round? MSO can do it.
- You want adjectives to be masculine if their containing clause is tensed and feminine otherwise? MSO can do it.
- You want your verbs to select a CP only if it contains John or Mary? MSO can do it.
- You want to allow center embedding only if it involves at least three levels of embedding? MSO can do it.
- You want to allow only trees whose size is a multiple of 17? MSO can do it.
- You want to interpret every leaf as 0 or 1 depending on whether it is dominated by an odd number of nodes and block a tree if this kind of interpretation yields a binary encoding of a song in your mp3 collection? MSO... well, you know the drill.
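To drive home how little machinery these unnatural constraints require, here is a minimal sketch (purely illustrative, not from any MSO toolkit) of the "tree size is a multiple of 17" constraint from the list above. A single counter modulo 17, maintained bottom-up, does the job, and that kind of finite-state bookkeeping is exactly what MSO over trees can express.

```python
# Illustrative sketch: the "size is a multiple of 17" constraint.
# Trees are nested tuples: (label, child, child, ...); a leaf is (label,).
# All names here are ad hoc, chosen for this example only.

def size_mod(tree, m=17):
    """Return the size of `tree` modulo m, computed bottom-up.

    A bottom-up tree automaton needs only m states for this,
    which is why the constraint is trivially MSO-definable.
    """
    label, *children = tree
    total = 1  # count this node
    for child in children:
        total += size_mod(child, m)
    return total % m

def multiple_of_17(tree):
    return size_mod(tree) == 0

def chain(n):
    """A unary spine with n nodes, for demonstration."""
    return ("X",) if n == 1 else ("X", chain(n - 1))

# A toy three-node tree: not a multiple of 17.
t = ("S", ("NP",), ("VP",))
```

The point is not the code but its brevity: nothing about the constraint is hard to state, yet no natural language enforces anything like it.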
Adding insult to injury, the Merge-MSO correspondence is dubious from a typological perspective. If Merge is allowed to do everything MSO can do, then the intersection of two natural languages is a natural language, and so should be their union and their relative complement. So the fact that there are strictly head-final languages and strictly head-initial languages would entail that there is a language (the union of the two) where a sentence can freely alternate between all phrases being head-final or all phrases being head-initial while mixing the two is always blocked. There should also be a language that (modulo differences in the phonetic realizations of lexical items) consists of all trees that are well-formed in French and German but illicit in English.
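The closure argument rests on a standard fact: MSO-definable languages are closed under union, intersection, and relative complement, via the familiar product construction on automata. A hedged sketch over strings rather than trees (the construction for bottom-up tree automata is analogous; all names below are ad hoc):

```python
# Illustrative product construction on DFAs. A DFA is a dict with a
# start state, a set of final states, and a transition table `delta`.

def product_dfa(d1, d2, accept):
    """Run two DFAs in lockstep; `accept` combines the two verdicts,
    so one construction yields union, intersection, or relative
    complement depending on the Boolean it encodes."""
    def run(word):
        q1, q2 = d1["start"], d2["start"]
        for sym in word:
            q1 = d1["delta"][(q1, sym)]
            q2 = d2["delta"][(q2, sym)]
        return accept(q1 in d1["final"], q2 in d2["final"])
    return run

# Toy DFAs over {"a", "b"}: an even number of a's / of b's.
even_a = {"start": 0, "final": {0},
          "delta": {(0, "a"): 1, (1, "a"): 0, (0, "b"): 0, (1, "b"): 1}}
even_b = {"start": 0, "final": {0},
          "delta": {(0, "b"): 1, (1, "b"): 0, (0, "a"): 0, (1, "a"): 1}}

union = product_dfa(even_a, even_b, lambda x, y: x or y)
intersection = product_dfa(even_a, even_b, lambda x, y: x and y)
rel_complement = product_dfa(even_a, even_b, lambda x, y: x and not y)
```

Since every Boolean combination of verdicts is available for free, a class of grammars that can define arbitrary MSO constraints inherits all of these closure properties, whether typology likes it or not.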
In comparison to the freak show that is the class of MSO-definable constraints, our original worry about Merge voiding the Specifier Island Constraint (SPIC) seems rather moot. Sure, Merge punches so many holes in the SPIC that Swiss cheese looks like a convex set in comparison, but that's rather small potatoes now that our set of UG grammars includes even the "Anti-LF" grammar that generates only trees that contain at least one semantic type conflict. Moreover, maybe there are cases where displacement is mediated by Merge rather than Move. Greg Kobele has a semi-recent paper where he describes how such a system would work, and why language might actually work this way. Some instances of movement can be replaced by a more general version of GPSG-style slash feature percolation, and since this system is easily defined in terms of MSO, it can be handled by Merge. Greg then argues that this kind of split between Merge-displacement and Move-displacement could be used to explain the differences between A-movement and A'-movement. Of course the SPIC is severely weakened in such a system, but there is a nice pay-off. If we want that pay-off, the original SPIC has to be abandoned for a more general principle that applies to both kinds of displacement while also being immune to the feature-coding loopholes.
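The slash-percolation idea can be caricatured in a few lines: instead of moving a filler into place, each node records a SLASH feature for any gap below it, and that feature percolates upward until a matching filler is merged. A toy sketch (all names and representations here are hypothetical, not Kobele's actual formalization):

```python
# Toy caricature of GPSG-style slash feature percolation.
# A node is (label, slash, children); a gap is a leaf whose slash
# records the category of the missing material.

def percolate(label, *children, filler=None):
    """Build a node, passing any unresolved SLASH feature upward.
    If `filler` matches the percolated slash, the dependency is
    discharged at this node: displacement without Move."""
    slashes = [c[1] for c in children if c[1] is not None]
    assert len(slashes) <= 1, "one gap per path in this toy version"
    slash = slashes[0] if slashes else None
    if filler is not None and filler == slash:
        slash = None  # filler meets gap: dependency resolved by Merge
    return (label, slash, children)

gap = ("t", "DP", ())  # a DP gap at the extraction site
vp = percolate("VP", ("like", None, ()), gap)           # VP carries SLASH:DP
cp = percolate("CP", ("what", None, ()), vp, filler="DP")  # gap discharged
```

Since this bookkeeping is just finite feature passing along the tree, it falls squarely within MSO, which is why Merge can take over this kind of displacement.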
So what is the moral of the story? Feature coding, and the power it endows Merge with, is neither completely good nor completely evil. It has advantages and disadvantages. Yes, that a simple operation like Merge (or subcategorization outside Minimalism) can do all of the above is truly worrying. There's clearly something about language we are missing (well, there are many things, but this is one of them). Norbert suggested that the problem disappears if one assumes a fixed set of features. I don't think this is a good solution; in fact, I don't think it is a solution at all. But even if it worked as intended, it would throw out both the bad and the good aspects of feature coding. Ideally, we would like to get rid of the former while holding on to the latter. Nobody has a working solution for this yet, but we will look at some promising candidates in the next and final part of this ongoing series.
Strictly speaking, this might actually be the case. Maybe those are all possible natural languages and there are independent reasons why we do not find them in the wild --- learnability considerations being the prime suspect (then again, aren't there tons of learnable classes that are at least closed under intersection?).