Title:The VC Dimension of Constraint-Based Grammars
Authors:Max Bane, Jason Riggle, Morgan Sonderegger
Comment:Manuscript under review; draft of 10/29/2008
Abstract:We analyze the complexity of Harmonic Grammar (HG), a linguistic
model in which licit underlying-to-surface-form mappings are
determined by optimization over weighted constraints. We show that
the Vapnik-Chervonenkis Dimension of HG grammars with k
constraints is k-1. This establishes a fundamental bound on the
complexity of HG, in terms of its capacity to classify sets of
linguistic data, and this bound has significant ramifications for
learnability. The VC dimension of HG is the same as that of
Optimality Theory (OT), which is similar to HG, but uses ranked
rather than weighted constraints in optimization. The equality of the
VC dimension across these two models is somewhat surprising because OT
defines finite classes of grammars (there are at most k! ways to
rank k constraints), while HG can define infinite classes of
grammars because the weights associated with constraints are
real-valued. The equality is also surprising because HG permits groups
of constraints that interact through so-called 'gang effects' to
generate languages that cannot be generated in OT. The fact that the
VC dimension grows linearly with the number of constraints in both
models means that, even in the worst case, the number of randomly
chosen training samples needed to weight/rank a known set of
constraints is a linear function of k. We conclude that, though
there may be factors favoring one model or the other, the
complexity of learning weightings/rankings is not one of them.
Type:Paper/tech report
Area/Keywords:Phonology,Computation,Learnability,Formal Analysis
Article:Version 1
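
The abstract's contrast between HG's weighted and OT's ranked constraint evaluation, including a gang effect, can be sketched as follows. This is a minimal illustration, not code from the paper; the candidate names, violation counts, and weights are invented for the example.

```python
def hg_winner(candidates, weights):
    """Harmonic Grammar: the winner minimizes the weighted sum of
    constraint violations (equivalently, maximizes harmony)."""
    return min(candidates,
               key=lambda c: sum(w * v for w, v in zip(weights, candidates[c])))

def ot_winner(candidates, ranking):
    """Optimality Theory: compare violation vectors lexicographically,
    with constraints reordered from highest- to lowest-ranked."""
    return min(candidates,
               key=lambda c: tuple(candidates[c][i] for i in ranking))

# Hypothetical violation profiles over constraints C1, C2, C3.
candidates = {"A": [1, 0, 0],   # A violates only C1
              "B": [0, 1, 1]}   # B violates C2 and C3

# HG gang effect: C2 and C3 together (1.0 + 1.0) outweigh C1 (1.5),
# so A's single C1 violation is the cheaper option.
print(hg_winner(candidates, weights=[1.5, 1.0, 1.0]))  # -> A

# OT with C1 ranked highest: A's violation of the top constraint is
# decisive, so B wins no matter how many lower violations it incurs.
print(ot_winner(candidates, ranking=[0, 1, 2]))        # -> B
```

No ranking of these three constraints makes A win here while HG's weighting does, which is the kind of language separation the abstract attributes to gang effects.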