Title: Naive parameter learning for Optimality Theory - The hidden structure problem
Comment: to appear in Proceedings of NELS 40
Abstract: There exist a number of provably correct learning algorithms for Optimality Theory and closely related theories. These include Constraint Demotion (CD; Tesar 1995, et seq.), a family of algorithms for classic OT. For Harmonic Grammar (Legendre, Miyata and Smolensky 1990; Smolensky and Legendre 2006) and related theories (e.g. maximum entropy), there is Stochastic Gradient Ascent (SGA; Soderstrom, Mathis and Smolensky 2006, Jäger 2007). There is also the Gradual Learning Algorithm for Stochastic OT (GLA; Boersma 1997), which works well in most cases but is known not to be correct in the general case (see, e.g., Pater 2008). The success of these algorithms (and the correctness proofs in the case of CD and SGA) relies on the assumption that learners are provided with full structural descriptions of the data, including prosodic structure and underlying representations; such descriptions are not available to the human learner.
The present paper describes a new online learning algorithm, the Naive Pairwise Ranking Learner (NPRL), an adaptation of Yang's (2002) Naive Parameter Learning to OT. Simulations on a large metrical phonology test set with hidden structure (Tesar and Smolensky 2000) show that NPRL outperforms existing algorithms, achieving a 100% success rate on the system. Like the earlier algorithms, NPRL is an online algorithm that processes a single data point and maintains a single grammatical hypothesis at a time. In contrast to all the earlier algorithms, NPRL sidesteps the issue of structural ambiguity altogether, because its updates do not depend on the hidden structure. NPRL is also not error-driven: it adjusts the current grammar hypothesis not only when that hypothesis results in an error but also when it results in an output that matches the data. Despite this apparent success, further investigation of the algorithm's performance reveals unusual behavior reminiscent of random search. This motivates investigation of a truly naive learner, random search, which provides a performance baseline on this test set. It turns out that the random baseline beats NPRL (reaching 100% accuracy in less time), as well as the GLA and CD, on this test set. The paper discusses implications for the evaluation of constraint-based learning algorithms.
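One plausible formulation of the random-search baseline can be sketched as follows. This is a toy illustration only: the constraint names, tableaux, and data below are invented stand-ins, not Tesar and Smolensky's metrical test set, and the resample-on-error loop is an assumption about how such a baseline might be implemented, not the paper's exact procedure.

```python
import random

# Toy OT system (hypothetical): each tableau maps candidates to
# violation vectors over the constraint set CONSTRAINTS.
CONSTRAINTS = ("C1", "C2", "C3")
TABLEAUX = {
    "in1": {"a": (1, 0, 0), "b": (0, 1, 0)},
    "in2": {"c": (0, 0, 1), "d": (0, 1, 0)},
}
# Observed surface forms (the learning data): "b" requires C1 >> C2,
# "c" requires C2 >> C3, so only C1 >> C2 >> C3 is consistent.
DATA = {"in1": "b", "in2": "c"}

def winner(ranking, tableau):
    """Classic OT evaluation: the candidate whose violation vector,
    reordered by the strict ranking, is lexicographically minimal."""
    def key(cand):
        return tuple(tableau[cand][CONSTRAINTS.index(c)] for c in ranking)
    return min(tableau, key=key)

def consistent(ranking):
    """Does this ranking reproduce every observed datum?"""
    return all(winner(ranking, TABLEAUX[i]) == DATA[i] for i in DATA)

def random_search(max_steps=1000, seed=0):
    """Truly naive learner: keep the current total order of constraints;
    whenever it fails on the data, resample a ranking uniformly at random."""
    rng = random.Random(seed)
    ranking = list(reversed(CONSTRAINTS))  # start from a wrong ranking
    for step in range(max_steps):
        if consistent(ranking):
            return ranking, step
        rng.shuffle(ranking)
    return None, max_steps

ranking, steps = random_search()
print(ranking, steps)
```

Because the only consistent ranking occupies 1 of the 6 possible total orders, uniform resampling finds it after a handful of shuffles on average, which is the sense in which such a baseline can be fast on small, deterministic systems.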