Abstract: | The HG literature has adopted so far the Perceptron reweighing rule because of its convergence guarantees. Yet, this rule is not suited to HG, as it fails at ensuring non-negativity of the weights. The first contribution of this paper is a solution to this impasse. I consider a variant of the Perceptron which truncates any update at zero, thus maintaining the weights non-negative in a principled way. And I show that the convergence guarantees for the original Perceptron extend to its truncated variant. Unfortunately, although convergent, HG error-driven learning (with both the original and the truncated Perceptron reweighing rule) is not efficient, contrary to error-driven learning in OT. Indeed, the second contribution of this paper is a counterexample with just ten constraints where the HG learner makes over five million errors before converging while the OT error-driven learner makes less than fifty errors, and yet the HG and OT typologies coincide! The superiority of OT over HG error-driven learning is shown to extend to the stochastic implementation. These results do not contradict the good performance of the Perceptron reported in the Machine Learning and Computational Linguistics literature, as the latter literature has focused on an implementation of the Perceptron which is not error-driven (the kernel dual Perceptron), precisely to cope with the inefficiency of the error-driven Perceptron adopted in the HG literature. |