ROA: | 997 |
---|---|
Title: | The VC Dimension of Constraint-Based Grammars |
Authors: | Max Bane, Jason Riggle, Morgan Sonderegger |
Comment: | Manuscript under review; draft of 10/29/2008 |
Length: | 15 pages |
Abstract: | We analyze the complexity of Harmonic Grammar (HG), a linguistic model in which licit underlying-to-surface-form mappings are determined by optimization over weighted constraints. We show that the Vapnik-Chervonenkis (VC) dimension of HG grammars with k constraints is k-1. This establishes a fundamental bound on the complexity of HG, in terms of its capacity to classify sets of linguistic data, that has significant ramifications for learnability. The VC dimension of HG is the same as that of Optimality Theory (OT), which is similar to HG but uses ranked rather than weighted constraints in optimization. That the two models have the same VC dimension is somewhat surprising, because OT defines finite classes of grammars (there are at most k! ways to rank k constraints), while HG can define infinite classes of grammars because the weights associated with constraints are real-valued. The equality is also surprising because HG permits groups of constraints that interact through so-called "gang effects" to generate languages that cannot be generated in OT (a toy illustration appears after this record). The fact that the VC dimension grows linearly with the number of constraints in both models means that, even in the worst case, the number of randomly chosen training samples needed to weight/rank a known set of constraints is a linear function of k. We conclude that, though there may be factors that favor one model or the other, the complexity of learning weightings/rankings is not one of them. |
Type: | Paper/tech report |
Area/Keywords: | Phonology, Computation, Learnability, Formal Analysis |
Article: | Version 1 |
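
The abstract's contrast between weighted and ranked constraints, and its mention of gang effects, can be made concrete with a small numerical example. The sketch below is not from the paper: the constraint names, candidates, violation counts, and weights are invented for illustration. It only shows the general mechanism, that HG picks the candidate minimizing a weighted sum of violations, while OT compares candidates lexicographically by constraint ranking, and that HG's cumulative scoring can produce winner patterns no ranking of the same constraints can reproduce.

```python
"""Toy illustration of Harmonic Grammar (HG) vs. Optimality Theory (OT)
evaluation. All constraints, candidates, violation counts, and weights
are hypothetical; they do not come from the paper under review.
"""
from itertools import permutations

# Each candidate maps to a tuple of violation counts, one per constraint (C1, C2).
TABLEAUX = {
    "input1": {"a": (1, 0), "b": (0, 1)},   # intended winner: b
    "input2": {"c": (1, 0), "d": (0, 3)},   # intended winner: c (gang effect)
}
WEIGHTS = (2.0, 1.5)  # hypothetical weights for C1 and C2


def hg_winner(candidates, weights):
    """HG: the winner minimizes the weighted sum of its violations."""
    return min(candidates,
               key=lambda c: sum(w * v for w, v in zip(weights, candidates[c])))


def ot_winner(candidates, ranking):
    """OT: the winner is decided by the highest-ranked constraint that
    distinguishes the candidates (lexicographic comparison of violations)."""
    return min(candidates,
               key=lambda c: tuple(candidates[c][i] for i in ranking))


# With the weights above, HG picks b for input1 (1.5 < 2.0) and c for
# input2 (2.0 < 4.5): d's three violations of the lower-weighted C2
# "gang up" and outweigh c's single violation of C1.
print("HG winners:", {inp: hg_winner(cands, WEIGHTS)
                      for inp, cands in TABLEAUX.items()})

# Neither ranking of C1 and C2 reproduces the pattern {b, c}:
# C1 >> C2 selects b and d, while C2 >> C1 selects a and c.
for ranking in permutations(range(2)):
    print("OT ranking", ranking, "winners:",
          {inp: ot_winner(cands, ranking) for inp, cands in TABLEAUX.items()})
```

Despite this extra expressive power of weighted constraints, the abstract's result is that HG's VC dimension matches OT's and grows only linearly in the number of constraints, so, via the standard PAC-learning relationship between VC dimension and sample complexity, the worst-case number of training samples needed to fit weights or rankings is likewise linear in k for both models.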