Title:Learning From Paradigmatic Information
Authors:Bruce Tesar
Comment:Superseded by ROA-844.1. To appear in NELS 36.
Abstract:Paradigmatic information is information requiring knowledge of morphological identity across words. It consists of the phonological consequences of knowing that a morpheme must have a single phonological underlying form, even if it surfaces differently in different words. There are two basic forms of paradigmatic information. One is morphemic alternation: the surface realizations of a single morpheme in different morphological contexts (a context consists of the other morphemes used to form the word). The other is morphemic contrast: the surface realizations of two different morphemes in the same morphological context.

Paradigmatic information is necessary for phonological learning. This can be demonstrated with a constructed linguistic system in which several distinct languages, with distinct mappings, have identical inventories of surface phonological forms. To learn the full phonology, the learner must utilize paradigmatic information: that is the only information that can distinguish the different phonotactically identical phonologies.

Phonotactic learning, treating each word in isolation, can uncover some ranking information. Once morphological constituency information is grasped by the learner, some underlying feature values can be determined on the basis of the ranking information provided by phonotactic learning, by using inconsistency detection to test different values for the underlying forms of morphemes. However, this is insufficient for learning the full phonology when the relevant ranking information isn't phonotactically apparent. Paradigmatic information must be used.

Learning such phonologies is non-trivial because of the interdependence of the morphemic underlying forms with each other, and with the mapping. If a learner entertains several possible underlying forms for a root, the consequences (in terms of surface realizations) of each considered underlying form depend upon the underlying forms chosen for the affixes that the root combines with. The consequences of underlying form choice for each of those affixes depend upon the underlying forms of the other roots they can combine with, and so forth. All surface consequences of considered underlying forms are affected by knowledge of the ranking (or lack thereof). In short, everything seems to depend upon everything.

The learner needs to balance two considerations when learning from paradigmatic information: information content and computational efficiency. The learner needs to simultaneously process units of data that are large enough to contain the needed information: single words aren't sufficient. On the other hand, the units should be small enough to be reasonably processed. The entire data set certainly contains the desired paradigmatic information, but considering all possible lexica across the entire paradigm simultaneously will be intractable.
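
The scale of the problem can be illustrated with a rough count. The sketch below is only an illustration under stated assumptions: it supposes that every morpheme has a fixed number of underlying features and that each feature takes one of two values. Neither the function name nor the numbers come from the paper.

```python
def candidate_lexica(num_morphemes, features_per_morpheme, values_per_feature=2):
    """Count the lexica obtained by freely choosing a value for every
    underlying feature of every morpheme (hypothetical helper; assumes
    all morphemes have the same number of features)."""
    return values_per_feature ** (num_morphemes * features_per_morpheme)


# Assumed toy system: 4 roots + 4 suffixes, 2 binary features per morpheme.
whole_paradigm = candidate_lexica(num_morphemes=8, features_per_morpheme=2)

# Two root+suffix words that share one morpheme involve only 3 morphemes.
two_word_unit = candidate_lexica(num_morphemes=3, features_per_morpheme=2)
```

Under these assumptions the whole paradigm admits 65536 candidate lexica, while the two-word unit admits 64. The count grows exponentially with the number of morphemes considered at once, which is why the size of the unit of processing matters.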

Contrast pairs are pairs of words that differ in only a single morpheme, where the two differing morphemes surface non-identically. Contrast pairs provide paradigmatic information, and can determine the underlying values of certain alternating features. This is because morphemic contrast implicates a causal role for underlying feature values: if two morphemes surface differently in the same environment, they must have different underlying feature values that actually cause the surface differences. Contrast pairs also strike a balance between information content and computational cost: they contain paradigmatic information not available from single words in isolation, but only involve a small portion of the lexicon.
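
The definition of a contrast pair can be sketched as a simple check over morphologically analyzed words. This is a minimal sketch, not the paper's procedure: the word representation (a tuple of morpheme label and surface form per morpheme), the function names, and the toy forms are all assumptions made for illustration.

```python
from itertools import combinations


def is_contrast_pair(word_a, word_b):
    """Return True if the two words differ in exactly one morpheme and the
    two differing morphemes surface non-identically.

    Each word is a tuple of (morpheme_label, surface_form) pairs, listed in
    the same morphological order (an assumed representation).
    """
    if len(word_a) != len(word_b):
        return False
    # Positions where the two words use different morphemes.
    differing = [(a, b) for a, b in zip(word_a, word_b) if a[0] != b[0]]
    if len(differing) != 1:
        return False
    (_, surface_a), (_, surface_b) = differing[0]
    return surface_a != surface_b


def contrast_pairs(words):
    """Collect every contrast pair in a list of analyzed words."""
    return [(a, b) for a, b in combinations(words, 2) if is_contrast_pair(a, b)]


# Hypothetical toy data: three roots and two suffixes; ":" marks length.
w1 = (("r1", "pa"), ("s1", "ka"))
w2 = (("r2", "pa:"), ("s1", "ka"))
w3 = (("r3", "pa"), ("s1", "ka"))   # r3 surfaces like r1 in this context
w4 = (("r1", "pa"), ("s2", "ka:"))
words = [w1, w2, w3, w4]
pairs = contrast_pairs(words)
```

In this toy data, `w1` and `w2` form a contrast pair because `r1` and `r2` surface differently before the same suffix. `w1` and `w3` do not, since the two roots surface identically there, and `w2` and `w4` do not, since they differ in two morphemes. Each pair involves at most three morphemes, which reflects the small portion of the lexicon that a contrast pair draws on.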

Once contrast pairs have been used to set the underlying form for a morpheme, the learner can look at words where that underlying form is not faithfully realized on the surface, and learn non-phonotactic ranking information. That ranking information can then be used to determine the underlying values of additional features. The processes of setting the underlying feature values and determining ranking information feed each other, leading to the learning of the phonology. Contrast pairs provide key paradigmatic information, making it possible to set the underlying values of some key alternating features, and thus providing a crucial entry into the portions of the grammar that are not phonotactically visible.
Type:Paper/tech report
Article:This article has been withdrawn.