WPCc 2BPCourier 10cpi3|mle)CG Times Italic (Scalable)&PCG Times (Scalable)HPLASIII.PRSx  @,\,C0X@2 Z2XpqTHP LaserJet IIIHPLASIII.PRSiP,\,C0&P1S&cGcGBPr]~]2MF2cGcGI U12( U1cGt )$Document[8]cGBPrDocument Style2MF2cGcGW U12( U1cGt )$` ` ` Document[4]cGBPrDocument Style2MF2cGcGe U12( U1cGt )$  . 2ee\pTDocument[6]cGBPrDocument Style2MF2cGcGs U12( U1cGt )$  Document[5]cGBPrDocument Style2MF2cGcG U12( U1cGt )$   Document[2]cGBPrDocument Style2MF2cGcG U12( U1cGt )$*    Document[7]cGBPrDocument Style2MF2cGcG U12( U1cGt )$ ` ` ` 2&n d  BibliogrphycGBPrBibliography2MF2cGcG U12( U1cGt )$ Right Par[1]GBPrRight-Aligned Paragraph Numbers U12( U1cGt )$ 8@  Right Par[2]GBPrRight-Aligned Paragraph Numbers U12( U1cGt )$ A@` ` `  ` ` ` Document[3]cGBPrDocument Style2MF2cGcG U12( U1cGt )$ 0     2 X _ Right Par[3]GBPrRight-Aligned Paragraph Numbers U12( U1cGt )$ J` ` ` @  ` ` ` Right Par[4]GBPrRight-Aligned Paragraph Numbers U12( U1cGt )$ S` ` `  @  Right Par[5]GBPrRight-Aligned Paragraph Numbers U12( U1cGt )$\` ` `  @hhh hhh Right Par[6]GBPrRight-Aligned Paragraph Numbers  U12( U1cGt )$e` ` `  hhh@ hhh 2 P    Right Par[7]GBPrRight-Aligned Paragraph Numbers U12( U1cGt )$n ` ` `  hhh@  Right Par[8]GBPrRight-Aligned Paragraph Numbers) U12( U1cGt )$w!"` ` `  hhh@ppp ppp Document[1]cGBPrDocument Style2MF2cGcG7 U12( U1cGt )$F#$   ׃  Doc InitcGBPrInitialize Document StylecGE U12( U1cGt )$%&    I. 1. A. a.(1)(a) i) a)Document2 qa1Documentt )$Document StyleRN~]~]2MF2cGcGF'(   ׃  a2Documentt )$Document StyleRN~]~]2MF2cGcG*)*   a3Documentt )$Document StyleRN~]~]2MF2cGcG0+ ,    a4Documentt )$Document StyleRN~]~]2MF2cGcG- . . 2eQeppa5Documentt )$Document StyleRN~]~]2MF2cGcG /0 a6Documentt )$Document StyleRN~]~]2MF2cGcG 12 a7Documentt )$Document StyleRN~]~]2MF2cGcG34` ` ` a8Documentt )$Document StyleRN~]~]2MF2cGcG56` ` ` 2-}Tech InitcGBPrInitialize Technical StyleG U12( U1cGt )$78 1 .1 .1 .1 .1 .1 .1 .1 Technicala1Technical )$ RN~]~]2MF2cGcG$9: a2Technical )$ RN~]~]2MF2cGcG/;< a3Technical )$ RN~]~]2MF2cGcG:=> 2 !}"&#a4Technical )$ RN~]~]2MF2cGcG E?@ a5Technical )$ RN~]~]2MF2cGcG!PAB a6Technical )$ RN~]~]2MF2cGcG"[CD a7Technical )$ RN~]~]2MF2cGcG#fEF 2;$%&'a8Technical )$ RN~]~]2MF2cGcG$qGH Technical[5]GBPrTechnical Document StyleGcG U12( U1cGt )$%&IJ  . Technical[6]GBPrTechnical Document StyleGcG U12( U1cGt )$&&KL  . Technical[2]GBPrTechnical Document StyleGcG U12( U1cGt )$'*MN    2(m)*+FTechnical[3]GBPrTechnical Document StyleGcG U12( U1cGt )$('OP   Technical[4]GBPrTechnical Document StyleGcG U12( U1cGt )$)&QR   Technical[1]GBPrTechnical Document StyleGcG U12( U1cGt )$*4S$T     Technical[7]GBPrTechnical Document StyleGcG  U12( U1cGt )$+&UV  . 29.,-.-!/)Technical[8]GBPrTechnical Document StyleGcG U12( U1cGt )$,&WX  . PleadingcGBPrHeader for numbered pleading paper U12( U1cGt )$-YZ    X  y*dddyy*dddy Hdd1 Hdd2 Hdd3 Hdd4 Hdd5 Hdd6 Hdd7 Hdd8 Hdd9 H10 H11 H12 H13 H14 H15 H16 H17 H18 H19 H20 H21 H22 H23 H24 H25 H26 H27 H28   ӋSquib ReviewGBPrRequest for review of squibO U12( U1cGt )$.[?\#PP#,|Linguistic Inquiry Squibs and Discussion Editors: |||Chisato Kitagawa |||John McCarthy (mccarthy@cs.umass.edu) |||Margaret Speas Department of Linguistics University of Massachusetts Amherst, MA 01003 USA #PP# |||D3 1, 4D ,4 <DL444The enclosed manuscript has been submitted for publication in the Squibs and Discussion section of Linguistic Inquiry. We would appreciate it if you could look over the manuscript and send us a brief review of it. In order to ensure that we will be able to respond to the author within a reasonable amount of time, we request that you send us your review within three weeks. If you are not able to review it within three weeks, please return it immediately and, if possible, suggest an alternate reviewer. 444Thank you in advance for your help. For our convenience, please enclose this letter with your review. 444____ Publish as is. 444____ Do not publish. 444____ Publish with minor revisions. 444____ Publish only with major revisions. 444____ Inappropriate as a squib; subject matter requires greater length. 444Please enclose additional comments on a separate sheet. 444  <<<DDDSincerely, 444  <<<DDDJohn J. McCarthy #PP#Squib LetterGBPr]~]2MF2cGcGk U12( U1cGt )$/]^#PP#,|Linguistic Inquiry Squibs and Discussion Editors: |||Chisato Kitagawa |||John McCarthy (mccarthy@cs.umass.edu) |||Margaret Speas Department of Linguistics University of Massachusetts Amherst, MA 01003 USA #PP# |||D3 1, 4D ,4 <DL 444  <<<DDDSincerely, 444  <<<DDDJohn J. McCarthy #PP#230k.1X42V23LetterG3cGBPrLinguistics LetterMF2cGcG U12( U1cGt )$0_R`#PP#  =  #PP#UNIVERSITY OF MASSACHUSETTS`(#(#(##PP#Department of Linguistics #PP#AT AMHERST #PP#South College`(#(#(#mccarthy@cs.umass.edu Amherst, MA 01003 (413)5456830 #PP# X   `(#(#(#D3 1, 4D؃  444  <<<DDDSincerely, 444  <<<DDDJohn J. McCarthy 444  <<<DDDProfessor of Linguistics#PP#   Example;cGBPr]~]2MF2cGcG U12( U1cGt )$1ab I. A. 1. a.(1)(a) i) a) 1 a i a.(1)(a) i) a)X01Í ÍX0Í Í3|D2 ?48t='"m+O6^6=U\\===\====\\\\\\\\\\==Qs~sm=Gsizbsw===\\=Q\Q\Q=\\33\3\\\\DG3\\\\QQ\Q=\\\\\=\\\\\\\\\3QQQQQz~QsQsQsQsQ=3=3=3=3\\\\\\\\\\Q\\\\\i\QQQ~Q~Q~Q~Q\sQsQsQsQ\\\\\\\\=3=3=3=3fG\s3s3s3s3s3\m\\\\zDzDzDbGbGbGbGs3s3s3\\\\\\\wQwQwQ\s3\zDbGs3\\\\\=\\===WxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxN\\\=QKK\\\\\\A\\\\A==__\00\\pp\\\mff=_\A"\_\壣\==px=\\f\z\=\Q\iwUzpNmń\QQ====фpsfpfzQsGwQ\Q=3QzffQz\Qpi\p\\sQQzpfppppG\33QQQpQpppp==\\\\\\\pppppppppppppppppppGGGGGGG\\\\\\\\\\\\\\\\\\\\333333333333QQQQQQQQQQQQQQQQQQQpppppppppppppppppppp=is=p=\\Q=x"m+O6^+1DIIr111I1111IIIIIIIIII11Aj\dm\Wjm19g\mjTjbO\mjjj_111II1AIAIA1II))I)rIIII69)IIjIIAAIA1IIIII1IIIIIIIII)jAjAjAjAjAbdA\A\A\A\A1)1)1)1)mIjIjIjIjImImImImIjIjAmIjIjIjImITIjAjAjAdAdAdAdAmI\A\A\A\AjIjIjIjIjIjImImI1)1)1)1)kQ9gI\)\)\)\)\)mImWmImIjIjIjb6b6b6O9O9O9O9\)\)\)mImImImImImIjjI_A_A_AmI\)mIb6O9\)jIjImIjImI1II111WxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxNjjjIII1A<WjjjjjjjIjjjjjjjAjjjjAjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjj1jjj1jjj1jjj1jjjjjjjjjjjjjjZ\QZQbA\9_AmIjA1)gAbQQmAbIjAjZTIZII\AjAbZjQrZZZZ9I))AAAZAZZZZ11IIIIIIIZZZZZZZZZZZZZZZZZZZ9999999IIIIIIIIIIIIIIIIIIII))))))))))))AAAAAAAAAAAAAAAAAAAZZZZZZZZZZZZZZZZZZZZ1Tj\m1jjZ1jIIA1xCG Times (Scalable)CG Times Italic (Scalable)Univers (Scalable)Univers Bold (Scalable)Univers Italic (Scalable) P7P6k(V1+,/ոV&_ x$&7X.t==,/ &txP7&P2IR?B(DjDZ@I"m+O6^+1GIIr111I1111IIIIIIIIII11IZZbjZTjj1AbQzbjZjZIQjZzZQQ111II1IIAIA)II))A)jIIII99)IAbAA9AIA1IIIII1IIIIIIIII)ZIZIZIZIZIbbAZAZAZAZA1)1)1)1)bIjIjIjIjIjIjIjIjIQAZIjIjIjIQAjIZIZIZIZIbAbAbAbAjIZAZAZAZAjIjIjIjIjIjIjIjI1)1)1)1)wIAbAQ)Q)Q)Q)Q)bIbZbIbIjIjIbZ9Z9Z9I9I9I9I9Q)Q)Q)jIjIjIjIjIjIzbQAQ9Q9Q9jIQ)bIZ9I9Q)QAQAjIjIjI1II111WxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxNjjjIII1IDDIIIIII4IIII411III))IIZZoIIjIOooII1II4"jjIjIjjI傂I11jZjjjjjjjjjjx1IZIQIjbI1IjAITjr_jjjjjjjjjjDjjjjjjzbZrj>WjjjjjjjIjjjjjjjAjjjjAjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjj1jjj1jjj1jjj1jjjjjjjjjjjjjZZZQZQbAZ9QAjIjA1)bAbQzQbAbIjAjZZIZIIQAQAbZZQrZZZZ9I))AAAZAZZZZ11IIIIIIIZZZZZZZZZZZZZZZZZZZ9999999IIIIIIIIIIIIIIIIIIII))))))))))))AAAAAAAAAAAAAAAAAAAZZZZZZZZZZZZZZZZZZZZ1TZZj1jQZ1QIIA1xCG Times (Scalable)CG Times Italic (Scalable)Univers (Scalable)"m+O6^==\ss===s====ssssssssss==_sif3fzbmwws~p===\\=bibibDii,,b,iiiiA\DifffXR\R=sssss=ssssss\sp,bbbbbbibibibib3,3,3,3,iiiiiiiii~fbiii~fimibbbbbbbiibibibibiiiiiiii3,3,3,3,Xfzbb,b,b,bCb,i~iiiiwAwAwAw\w\w\w\sDsDsDiiiiii~fpXpXpXib,iwAw\sD~f~fiii=\\===WddddddddddddddddddddddddddddddddddddddddN\ss=_ffsssKKsGG\\sG==\\sDDsspp\\\~pp=\\G"\a\巷 \==mx=\sfs\=s\siwbzpNmń\RR====шfsbf\biUpNib3,z\_ffsNiimbmmQsU~bX~fUi,,ibbbf~~==sssssssfffffffffffffffffffUUUUUUUiiiiiiiiiiiiiiiiiiii,,,,,,,,,,,,iiiiiiibbbbbbbbbbbb=ii3~3~b\Q=xCG Times (Scalable)CG Times Italic (Scalable)Univers (Scalable)Univers Bold (Scalable)2ZIZNPU"m+O6^==_ss===s====ssssssssss==_sif3fzbmwws~p===\\=bibibDii,,b,iiiiA\DifffX\\\=sssss=ssssss\sp,bbbbbbibibibib3,3,3,3,iiiiiiiii~fbiii~fimibbbbbbbiibibibibiiiiiiii3,3,3,3,Xfzbb,b,b,bHb3iiiiiwAwAwAw\w\w\w\sDsDsDiiiiii~fpXpXpXib,iwAw\sD~f~fiii=\\===WddddddddddddddddddddddddddddddddddddddddN\ss=_ffsssKKsGG\\sG==\\sDDsspp\\\~pp=\\G"\f\巷 \==px=\smsX=sfsi~\zpNmń\\\====шismf\ii\pUip30z\\mfXipmipwUsU~mfi\i00immiw~==sssssssiiiiiiiiiiiiiiiiiii\\\\\\\iiiiiiiiiiiiiiiiiiii000000000000iiiiiiimmmmmmmmmmmm=pi3~3~i\Q=x3m=6,#{&m P7&Pj)W1+,ۏW P7P6k(V1+,/ոV&_ x$&7X.t==,/ &txP7&P.t==,B s&tp?7&7II,/LxP7P7II,B"p?7"m+O6^IInܧIIIIIIIIIrܣ~z=zvȟܟIIInnIv~v~vQ~~55v5~~~~MnQ~zzzjbnbIIn5vvvvvػv~v~v~v~v=5=5=5=5~~~~~~~~~zv~~~z~~vvvvvvv~~v~v~v~v~~~~~~~~=5=5=5=5jzvv5v5v5vPv5~~~~~ȏMMMnnnnQQQ~~~~~~ܿzjjj~v5~MnQzz~~~InnIIIWddddddddddddddddddddddddddddddddddddddddNnIrzzԋZZ܋VVnnVIInnQQ܋nnn̆InnV"ܟntn nIIܟܟxInzܟܣnIn~ܟvܟܟܷܟ^şnܟbbIIIIѣzvznv~f^~v=5nrzz^~~vbfvjzf~55~vvvzIIzzzzzzzzzzzzzzzzzzzfffffff~~~~~~~~~~~~~~~~~~~~555555555555~~~~~~~vvvvvvvvvvvvI~~==vnbIx"m+O6^IIrܧIIIIIIIIIrܣ~z=zvȟܟIIInnIv~v~vQ~~55v5~~~~MnQ~zzzjnnnI!In5vvvvvػv~v~v~v~v=5=5=5=5~~~~~~~~~zv~~~z~~vvvvvvv~~v~v~v~v~~~~~~~~=5=5=5=5jzvv5v5v5vVv=~~~~~ȏMMMnnnnQQQ~~~~~~ܿzjjj~v5~MnQzz~~~InnIIIWddddddddddddddddddddddddddddddddddddddddNnIrzzԋZZ܋VVnnVIInnQQ܋nnn̆InnV"ܟnzn nIIܟܟxInܳܣjIz~ܟnܟܟܿܟ^şnܟnnIIIIѣ~zn~~nf~=9nnȂzj~~ff̟z̟~n~99~~II~~~~~~~~~~~~~~~~~~~nnnnnnn~~~~~~~~~~~~~~~~~~~~999999999999~~~~~~~I~==~nbIx2Z\_` c"m+O6^==\ss===s====ssssssssss==_sif3fzbmwws~p===\\=bibibDii,,b,iiiiA\DifffXQ\Q=sssss=ssssss\sp,bbbbbbibibibib3,3,3,3,iiiiiiiii~fbiii~fimibbbbbbbiibibibibiiiiiiii3,3,3,3,Xfzbb,b,b,b:b,iiiiiwAwAwAw\w\w\w\sDsDsDiiiiii~fpXpXpXib,iwAw\sD~f~fiii=\\===WddddddddddddddddddddddddddddddddddddddddN\ss=_ffsssKKsGG\\sG==\\sDDsspp\\\~pp=\\G"\d\巷\==mx=\sfs\=s\siwbzpNmń\QQ====шfsbf\biUpNib3,z\_ffsNiimbmmQsU~bX~fUi,,ibbbf~~==sssssssfffffffffffffffffffUUUUUUUiiiiiiiiiiiiiiiiiiii,,,,,,,,,,,,iiiiiiibbbbbbbbbbbb=ii3~3~b\Q=x3m=6,#{&m P7&Pj)W1+,ۏW P7P6k(V1+,/ոV&_ x$&7X.t==,/ &txP7&P.t==,B s&tp?7&7II,/LxP7P7II,B"p?7d.t==,Z=a&tx7&XNnIrzzZ X =  ; kI  1 a i a.(1)(a) i) a) 1 a i a.(1)(a) i) a)Tesar & SmolenskyLearnability of Optimality Theoryă#gP &P#]  The Learnability of Optimality Theory:ă  O  An Algorithm and Some Basic Complexity Results ă  Bruce Tesar & Paul Smolensky  O]  ]  Department of Computer Science & Institute of Cognitive Science  O fUniversity of Colorado at Boulder#iP#{&P#у If Optimality Theory (Prince & Smolensky 1991, 1993) is correct, Universal Grammar provides a set of universal constraints which are highly general, inherently conflicting, and consequently rampantly violated in the surface forms of languages. A language's grammar ranks the universal constraints in a dominance hierarchy, higher-ranked constraints taking absolute priority over lower-ranked constraints, so that violations of a constraint occur in well-formed structures when, and only when, they are necessary to prevent violation of higher-ranked constraints. Languages differ principally in how they rank the universal constraints in their language-specific dominance hierarchies. The surface forms of a given language are structural descriptions of inputs which are optimal in the following sense: they satisfy the universal constraints, or, when these constraints are brought into conflict by an input, they satisfy the highest-ranked constraints possible. This notion of optimality is partly language-specific, since the ranking of constraints is language-particular, and partly universal, since the constraints which evaluate well-formedness are (at least to a considerable extent) universal. In many respects, ranking of universal constraints in Optimality Theory plays a role analogous to parameter-setting in principles-and-parameters theory. Evidence in favor of this Optimality-Theoretic characterization of Universal Grammar is provided elsewhere; most work to date addresses phonology: see Prince & Smolensky 1993 (henceforth, `P&S') and the several dozen works cited therein, notably McCarthy & Prince 1993; initial work addressing syntax includes Grimshaw 1993 and Legendre, Raymond & Smolensky 1993. Here, we investigate the learnability of grammars in Optimality Theory. Under the assumption of innate knowledge of the universal constraints, the primary task of the learner is the determination of the dominance ranking of these constraints which is particular to the target language. We will present a simple and efficient algorithm for solving this problem, assuming a given set of hypothesized underlying forms. (Concerning the problem of acquiring underlying forms, see the discussion of `optimality in the lexicon' in P & S 1993:9). The fact that surface forms are optimal means that every positive example entails a great number of implicit negative examples: for a given input, every candidate output other than the correct form is ill-formed.d See 4.1 below for a discussion of ties, when more than one form is optimal.d As a consequence, even a single positive example can greatly constrain the possible grammars for a target language, as we will see explicitly. In 1 we present the relevant principles of Optimality Theory and discuss the special nature of the learning problem in that theory. Readers familiar with the theory may wish to proceed directly to 1.3. In 2 we present the first version of our learning algorithm, initially, through a concrete example; we also consider its (low) computational complexity. Formal specification of the first version of the algorithm and proof of its correctness are taken up in the Appendix. In 3 we generalize the algorithm, identifying a more general core called Constraint Demotion (`CD') and then a family of CD algorithms which differ in how they apply this core to the acquisition data. We sketch a proof of the correctness and convergence of the CD algorithms, and of a bound on the number of examples needed to complete learning. In 4 we briefly consider the issue of ties in the ranking of constraints and the case of inconsistent data. Finally, we observe that the CD algorithm entails a Superset Principle for acquisition: as the learner refines the grammar, the set of wellformed structures shrinks. We offer three related contributions in this paper. First, the existence of the CD algorithms demonstrates formally that Optimality Theoretic grammars are efficiently learnable. Second, these algorithms provide a first suggestion of how to analyze the actual language acquisition process within Optimality Theory, and suggest that acquisition may respect a Superset Principle. Last but not least, the CD algorithms have a very real practical value for linguists working in Optimality Theory. Given language data and a hypothesized set of constraints, the algorithms quickly and easily provide a class of constraint rankings which account for the data, or directly determine that no such rankings exist. #gP&P# 1. The Learning Problem in Optimality Theory #iP&P# The novel features of the learnability problem, and the character of the learning algorithm, depend on a number of the detailed properties of Optimality Theory. We begin with a selfcontained presentation of the relevant principles of the theory, to the level of detail required for presentation of the learning algorithm. We consistently exemplify the principles, and in 2, the learning algorithm, using the Basic CV Syllable Structure Theory of P&S (6), which we develop concurrently. #gP&P# 1.1 Constraints and Their Violation #iP&P# (FUN) Grammars are functions. A grammar is a specification of a function which assigns to each input a unique structural description or output. (A grammar does not provide an algorithm for computing this function, e.g., by sequential derivation.) In CV theory, an input is a string of Cs and Vs, e.g., /VCVC/. An output is a parse of the string into syllables, which we notate as follows: ( FUN1 -). .V.CVC. = [% V] [% CVC] . V .CV. C  = V [% CV] C . V .CV.C]. = V [% CV] [% C]] . .]V.CV. C  = [% ]V] [% CV] C (These four forms will be referred to throughout the paper, and will be consistently labelled ad as in ( FUN1 -).) Possible output a is an onsetless open syllable followed by a closed syllable. Parse b contains only one, open, syllable: the initial V and final C of the input are not parsed into syllable structure; following ideas related to `Stray Erasure' (McCarthy 1979, Steriade 1982, It= 1986, 1989), we assume that these unparsed segments are not phonetically realized when the phonological output b is subjected to phonetic interpretation. Parse c consists of a pair of open syllables, in which the nucleus of the second syllable is not filled by underlying material. This is taken to mean that phonetic realization of the phonological structure c provides an epenthetic segment in the final nucleus, following a welldeveloped conception of epenthesis (established, e.g., by Selkirk 1981, LaPointe & Feinstein 1982, Broselow 1982, Archangeli 1984, Kaye & Lowenstamm 1984, Piggott & Singh 1985, It= 1986, 1989). As in parse b, the initial underlying V is unparsed in parse c. Parse d is also a pair of open syllables, but this time it is the onset of the first syllable which is unfilled by underlying material, and the final underlying C which is unparsed. In derivational terms, the first and second syllables of  FUN1 -.c respectively constitute the result of deletion and epenthesis rules. Optimality Theory lacks derivations and rules; only the output phonological structure  FUN1 -.c, and the input /VCVC/ (which is in fact contained in the output), are recognized. In place of deletion we have underparsing, in which underlying material is not parsed into syllable structure; in place of epenthesis we have overparsing, in which syllable positions in the output are not filled by underlying material. What ( FUN1 ) asserts is that the grammar assigns to the input /VCVC/ one structural description, such as one of the possibilities listed under ( FUN1 -.ad). Which one gets assigned is determined by subsequent principles. (GEN) Gen. Universal Grammar provides the set of possible inputs, the set of possible outputs, and a function Gen which, given any input i, generates the set of candidate outputs Gen(i), for i. The input i is a substructure contained within each of its candidate outputs in Gen(i). In the CV case, for any input i, the candidate outputs in Gen(i) consist in all possible parsings of the string into syllables, including the possible over and underparsing structures exemplified above in ( FUN1 -.bd). All syllables are assumed to contain a Nucleus position, with optional preceding Onset and following Coda positions. Here we adopt what P&S (6) dub the Basic CV Syllable Structure Theory, in which the simplifying assumption is made that the syllable positions Onset and Coda may each contain at most one C, and the position Nucleus may contain at most one V. The possibility of overparsing means that the candidate set for any input is in fact infinite. However, it can be shown by examining the constraints of the Basic CV theory that the distribution of unfilled syllable positions is highly limited, entailing that optimal candidates always reside in a particular finite subset of the infinite candidate set (P&S 6.2.3). ( CONSTR ) Universal constraints. Universal Grammar provides a set of universal constraints which assess the wellformedness of candidate outputs for a given input in parallel (i.e., simultaneously). Given a candidate output, each constraint assesses a set of marks each of which corresponds to one violation of the constraint; the collection of all marks assessed a candidate c is denoted marks(c). The CV constraints we will deal with here are as follows: ( CONSTR3 -) Universal CV Syllable Structure Constraints.  Ons Syllables have onsets.  Cod Syllables do not have codas.  Parse Underlying material is parsed into syllable structure.  FillNuc Nucleus positions are filled with underlying material.  FillOns Onset positions (when present) are filled with underlying material. We can illustrate how these constraints assess marks by considering the candidate outputs in ( FUN1 -.ad). Parse ( FUN1 -.a) = .V.CVC. violates Ons in its first syllable and Cod in its second; the remaining constraints are satisfied. We denote the single mark which Ons assesses .V.CVC. as *Ons; so the complete set of marks incurred by the candidate .V.CVC. is  ` ` ` marks(.V.CVC.) = {*Ons, *Cod}  The first candidate is a faithful parse: it involves neither under nor overparsing, and therefore satisfies the faithfulness constraints Parse and Fill. By contrast the underparsed candidate ( FUN1 -.b) = V .CV. C  violates Parse, and more than once:   ` ` ` marks( V .CV. C ) = {*Parse, *Parse}  As we will see, it is crucial that multiple marks be maintained (although the theory requires no counting). The third candidate ( FUN1 -.c) = V .CV.C]. combines underparsing with overparsing, violating both Parse a Fill constraint:   ` ` ` marks( V .CV.C]) = {*Parse, *FillNuc}  The final candidate ( FUN1 -.d) = .]V.CV. C  violates FillOns rather than FillNuc (as well as Parse):  ` ` ` marks(.]V.CV. C ) = {*Parse, *FillOns}  The marks incurred by these candidates are summarized in the following table: ( CONSTR3 .) Marks incurred by Four Candidate Parses of /VCVC/ | dd   zz  dd   zz | *PP*CandidatesOnsCodParseFillNucFillOns** /VCVC/ *uu*a..V.CVC.****b. V .CV. C * ***c. V .CV.C].****d..]V.CV. C ** (The order in which the constraints are listed here is arbitrary.) #gP&P# 1.2 Optimality and Harmonic Ordering #iP&P# The central notion of optimality now makes its appearance. The idea is that by examining the marks assigned by the universal constraints to all the candidate outputs for a given input, we can find the least marked, or optimal, one: this is the one and only wellformed parse that may be assigned to the input. The relevant notion of `least marked' is not the simplistic one of just counting numbers of violations. Rather, in a given language, different constraints have different strengths or priorities: they are not all equal in force. When a choice must be made between satisfying one constraint or another, the stronger must take priority. The result is that the weaker will be violated in a wellformed surface structure. The notion of harmonic ordering defined here is identical with that of P&S (e.g., 5) but the formulation differs; the direct link between the two is provided by the Cancellation/Domination Lemma of P&S (Appendix). ( RANKING ) Constraint Ranking, Harmonic Ordering, and Optimality. The grammar of each language ranks the universal constraints in a dominance hierarchy; when constraint 1 dominates another 2 in the hierarchy, we write 1 >  > 2. The ranking is total: the hierarchy determines the relative dominance of every pair of constraints:   ` ` `  1 >  > 2 >  >  >  > n   A grammar's constraint hierarchy induces on all the candidate outputs a harmonic ordering as follows. Let a and b be two candidate parses with sets of marks marks(a) and marks(b). To compare a and b, first cancel all the marks they have in common, getting lists of `uncancelled' marks: marks-(a) and marks-(b); now, no mark occuring in one list occurs in the other. Then determine which list of uncancelled marks contains the worst mark: a mark assessed by the highestranking constraint violated in the two lists. This candidate is less harmonic or has lower Harmony than the other; if it is a, then we write: a u b. (We also sometimes write marks(a) u marks(b).) The harmonic ordering u ranks all pairs of candidates. For a given input, the most harmonic of the candidate outputs provided by Gen is the optimal candidate: it is the one assigned to the input by the grammar. Only this optimal candidate is wellformed; all less harmonic candidates are illformed. If two candidates are assessed exactly the same set of marks by the universal constraints, then they are equally harmonic (regardless of the constraint ranking). If it should happen that such a tie should occur for the most harmonic candidate output for an input, then the two outputs are both optimal, both wellformed, with the interpretation of free alternation. We can illustrate harmonic ordering in the CV case by reexamining the table of marks ( CONSTR3 .) under the assumption that the universal constraints have now been ranked by a particular grammar as follows: (RANKING4.a) Constraint Hierachy for L1: ` ` ` Ons >  > Cod >  > FillNuc >  > Parse >  > FillOns   In the following constraint tableau, the most dominant constraints in L1 appear to the left: (RANKING4.b) Constraint Tableau for L1 = &CVep,del | dd z z  dd z z | *PP*CandidatesOnsCodFillNucParseFillOns*  * /VCVC/ *uu* d..]V.CV. C ***  *b. V .CV. C * **  *c. V .CV.C].***  *a..V.CVC.** The candidates in this tableau have been listed in harmonic order, from highest to lowest Harmony; the optimal candidate is marked manually. Determining that this candidate is optimal requires demonstrating that it is more harmonic than any of the infinitely many competing candidates. A general technique for such demonstration, the Method of Mark Eliminability (P&S 7.3), proceeds by showing that any attempt to avoid the marks incurred by the putatively optimal output leads to alternatives which incur worse marks. Starting at the bottom of the tableau, let us verify that a u c. The first step is to cancel common marks: here, there are none. The next step is to determine which candidate has the worst mark, i.e., violates the most highly ranked constraint: it is a, which violates Ons. Therefore a is the less harmonic. In determining that c u b, we first cancel the common mark *Parse; c then earns the worst mark of the two, *FillNuc. We see the importance of retaining multiple marks in the comparison of b to d: one *Parse mark cancels, leaving marks-(b) = {*Parse} and marks-(d) = {*FillOns}. The worst mark is the uncancelled *Parse incurred by b: so b u d. This constraint hierarchy given in (RANKING4.a) and illustrated in (RANKING4.b) is that of a language in which all syllables have the surface form CV: onsets are required, codas are forbidden. In case of problematic inputs such as /VCVC/ where a faithful parse into CV syllables is not possible, this language uses overparsing to provide missing onsets and underparsing to avoid codas (it is the language denoted &CVep,del in P&S:6.2.2.2). We will call this language L1, and explore how the course of learning it differs from that of another CVlanguage, L2. If in L1's constraint hierarchy we exchange the two Fill constraints, we get L2: (RANKING4.c) Constraint Hierachy for L2: ` ` ` Ons >  > Cod >  > FillOns >  > Parse >  > FillNuc Now the tableau corresponding to (RANKING4.b) becomes (RANKING4.d); the columns have been reordered to reflect the constraint reranking, and the candidates have been reordered to reflect the new harmonic ordering: (RANKING4.d) Constraint Tableau for L2 = &CVdel,ep | dd  z   dd  z  | *PP*CandidatesOnsCodFillOnsParseFillNuc*  * /VCVC/ *uu* c. V .CV.C].***  *b. V .CV. C * **  *d..]V.CV. C ***  *a..V.CVC.** Like L1, all syllables in L2 are CV; /VCVC/ gets syllabified differently, however. In L2, underparsing is used to avoid onsetless syllables, and overparsing to avoid codas (L2 is P&S's language &CVdel,ep). The relation between L1 and L2 illustrates the final central principle of Optimality Theory relevant here: (TYP) Typology by ReRanking. Crosslinguistic variation is principally due to variation in languagespecific rankings of the universal constraints. Analysis of the optimal forms arising from all possible rankings of the constraints provided by Universal Grammar gives the typology of possible human languages. Universal Grammar may impose restrictions on the possible rankings of its constraints. In the CV theory, Universal Grammar imposes no restrictions on the ranking of the constraints we have discussed here ( CONSTR3 ). Analysis of all possible rankings of these constraints reveals that the resulting typology of basic CV syllable structures is an instantiation of Jakobson's typological generalizations (Jacobson 1962, Clements & Keyser 1983): see P&S:6.( Other explicit OptimalityTheoretic typologies derived from reranking universal constraints include that of sonoritybased inventories of onset, nucleus and coda segments, including an explanation of the onset/coda licensing asymmetry (P&S:8); featurebased segmental inventories (P&S:9); and systems of case marking and grammatical voice (Legendre, Raymond, & Smolensky 1993). These phonological typologies all involve universal restrictions on constraint rankings, when the constraint set is enlarged beyond those of ( CONSTR3 ). By far the most striking case is McCarthy & Prince 1994's OptimalityTheoretic formulation of prosodic morphology, which consists largely in the principle that morphological constraints are universally dominated by prosodic constraints.( In this typology, syllable structures may have required or optional onsets, and, independently, forbidden or optional codas. In a principlesandparameters theory, such a typology might arise from a parameter which governs whether the principle `Syllables have onsets' does or does not apply (unviolated on the surface) in the language. Here, we have a corresponding principle, Ons ( CONSTR3 -), but it is a constraint present in all languages, violable when (and only when) dominated by conflicting constraints. By ranking the constraint differently in their hierarchies, languages differ in whether the constraint is surfaceunviolated. (But even when ranked low enough to be violated in wellformed surface forms, Ons still functions: as P&S emphasize, /CVCV/ is universally parsed .CV.CV. rather than *.CVC.V. even in languages where the syllables .CVC. and .V. of the latter parse are both wellformed; this is because Ons"and/or Cod"is still operative in the language, even though lowranked. A perhaps more dramatic case is provided by Fill; even in languages in which Fill is surfaceviolated"as manifest in epenthetic material"Fill still functions absolutely crucially to sharply limit possible epenthesis sites: P&S:6.2.3.) #gP&P# 1.3 The Learning Problem #iP&P# Substantive knowledge of language in Optimality Theory thus consists in knowledge of the universal set of possible inputs, outputs and Gen ( GEN2 ); of the universal constraints ( CONSTR3 ); of the ranking of those constraints particular to the language (RANKING4); and of the lexicon providing the particular underlying forms in the languageparticular inputs. Under the assumption that what is universal in this knowledge is not learned but innate, what remains then to be learned is the lexicon and the constraint ranking.D It may well be that the OptimalityTheoretic constraints of universal grammar will turn out to need parameters; in this case, parametersetting will also be part of the learning process. We limit our discussion to `typology by reranking' because that is what has figured most prominently thus far in OptimalityTheoretic analyses of crosslinguistic variation, and because that is the novel aspect of the learning problem in comparison with principlesandparameters theory. It should also be observed that, while Optimality Theoretic analyses have to date succeeded to a significant degree in avoiding idiosyncratic, languageparticular constraints, such constraints have stillsometimes proved necessary, and learning them may need to be part of a comprehensive learning theory for Optimality Theory.D Concerning acquisition of the lexicon, the reader is referred to discussion in P&S:9. Here we assume that the learner has a hypothesized lexicon in hand, and faces the problem of learning the ranking of known (or, for that matter, merely hypothesized) constraints. The initial data for the learning problem are pairs consisting of an input and its wellformed (optimal) parse. For example, the learner of the CV language L1 might have as an initial datum the input /VCVC/ together with its wellformed parse .]V.CV. C  (RANKING4.b). Together with this single piece of explicit positive evidence comes a large mass of implicit negative evidence. Every alternative parse of this input is known to be illformed; for example, the faithful parse *.V.CVC. is illformed. Thus the learner knows that, with respect to the unknown constraint hierarchy,  ` ` ` .V.CVC. u .]V.CV. C   Furthermore, corresponding harmonic comparisons must hold for every suboptimal parse. Unless the grammar leads two candidate outputs to tie for most harmonic, yielding two optimal outputs, all candidates other than the observed output can be inferred to be suboptimal. The case of ties is taken up in 4.1 below. Thus each single piece of positive initial data conveys a large amount of infered comparative data of the form:  ` ` ` [suboptimal parse subopt of input i] u [optimal parse opt of input i]  Such pairs are what feed our learning algorithm. Each pair carries the information that the constraints violated by the suboptimal parse subopt must outrank those violated by the optimal parse opt, that, in some sense, we must have marks(subopt) >  > marks(opt), or more informally, losermarks >  > winnermarks The algorithm we now present is nothing but a means of making this observation precise, and deducing its consequences. #gP&P# 2. The RCD Algorithm #iP&P# Before giving a general description of the algorithm, we first illustrate its use. In 2.1 we show how the algorithm learns the language L1 from the single positive example which we have discussed, /VCVC/  .]V.CV. C ; we then show how the algorithm learns the different language L2 from its (different) parse of the same input. Then in 2.2 we give a general but somewhat informal statement of the algorithm, and in 2.3 an informal summary of the proof of its convergence and correctness, and discuss its computational complexity. More formal treatment is provided in the Appendix. #gP&P# 2.1 An Example: Learning CV Syllable Structure #iP&P# In the first language we consider, L1, the correct parse of /VCVC/ is .]V.CV. C . This fact forms our starting point, one initial datum. Table ( CONSTR3 .) gives the marks incurred by the wellformed parse (labelled d) and by three illformed parses. From this table we form the following table of markdata pairs: ( MARKS ) Markdata Pairs (L1) h dd] "  dd] " h ",,"subopt u optlosermarks = marks(subopt)winnermarks = marks(opt)"" " "a u d.V.CVC.u .]V.CV. C {*Ons, *Cod}{*Parse, *FillOns}"`! `! "b u d V .CV. C u .]V.CV. C {*Parse, *Parse}{*Parse, *FillOns}"`! `! "c u d V .CV.C].u .]V.CV. C {*Parse, *FillNuc}{*Parse, *FillOns} In order that each suboptimal output subopt be less harmonic than the optimal output opt, the marks incurred by the former, marks(subopt) must collectively be worse than marks(opt). According to (RANKING4), what this means more precisely is that subopt must incur the worst uncancelled mark, compared to opt. So the first step in the learning algorithm is to cancel the common marks in ( MARKS6 ): ( MARKS' ) Markdata Pairs after Cancellation (L1) h dd]  dd] h ",,"subopt u optlosermarks = marks-(subopt)winnermarks = marks-(opt)"" " "a u d.V.CVC.u .]V.CV. C {*Ons, *Cod}{*Parse, *FillOns}"`! `! "b u d V .CV. C u .]V.CV. C { *parse *Parse}{ *parse *FillOns}"`! `! "c u d V .CV.C].u .]V.CV. C { *parse *FillNuc}{ *parse *FillOns} The cancelled marks have been struck out. Note that the cancellation operation which transforms marks to marks- is only welldefined on pairs of sets of marks; e.g., *Parse is cancelled in the pairs b u d and c u d, but not in the pair a u d. Note also that cancellation of marks is done tokenbytoken: in the row b u d, one but not the other mark *Parse in marks(b) is cancelled. As we will see, the algorithm is sensitive not to absolute numbers of marks, but only to whether subopt or opt incurs more marks of a given type. (Optimality Theory recognizes a relative distinction between more or lessviolation of a constraint, but not an absolute quantitative measure of degree of constraint violation.) The table ( MARKS'7 ) of markdata after cancellation is the data on which the rest of the algorithm operates. The second part of the algorithm proceeds recursively, finding first the constraints that may be ranked highest while being consistent with the markdata pairs, then eliminating those constraints from the problem and starting over again to rank the remaining, lower, constraints. Conceived as a sequence of passes, the first pass through the data determines the highestranking constraints, the next pass the nexthighest ranking constraints, and so forth down the hierarchy. If the data provide enough information to completely determine the total ranking, then only one constraint will be returned by each pass. In general, however, the result of the algorithm will be a stratified hierarchy of the form:   ( STRATHI ) Stratified Domination Hierarchy ` ` ` {1, 2, ..., 3} >  > {4, 5, ..., 6} >  > ... >  > {7, 8, ..., 9}  The constraints 1, 2, ..., 3 comprise the first stratum in the hierarchy: they are not ranked with respect to one another, but they each dominate all the remaining constraints. A stratified hierarchy may not totally harmonically order all candidate outputs for a given input, since it fails to specify the relative ranking of constraints which may conflict. This has the consequence, discussed below in 4.1, that the output of the algorithm may fail to decide between candidates for which relevant training data is unavailable.  N.B.: Unless qualified as `totally ranked,' henceforth, `hierarchy' will mean `stratified hierarchy' . The initial state of the learner can be taken to be the completely degenerate stratified hierarchy in which all the constraints are lumped together in a single stratum. Learning proceeds to refine this hierarchy into a sequence of smaller strata. Each pass of the algorithm begins with a set of notyetranked constraints, and ends by selecting some of them to constitute the nextlower stratum in the hierarchy. At each step, markdata pairs which are accounted for by the alreadyranked constrants are eliminated from the markdata table ( MARKS'7 ), so that only notyetranked constraints are left in the table. The key observation is this:   ( KEYOBS ) Key observation. For any given pair, if an uncancelled mark is incurred by the winner, then it must be dominated by a mark incurred by the loser: otherwise, the winner wouldn't win (this is the Cancellation/Domination Lemma of P&S:130,148,221). Therefore, any constraint assessing an uncancelled mark to the winner cannot be among the highest ranked constraints in the set, and cannot be output as part of the next stratum. Conversely, any constraint that does not assess a mark to any winner can enter the next stratum, since no evidence exists that it is dominated by any of the remaining constraints.  We illustrate how this process deduces the hierarchy for L1 from the data in ( MARKS'7 ). When the algorithm begins, the notyetranked constraints comprise the entire universal set ( CONSTR3 .ae):   ` ` ` notyetrankedconstraints = {Ons, Cod, Parse, FillNuc, FillOns}  Examining the rightmost column of the markdata table ( MARKS'7 ) we see that two marks, *Parse and *FillOns, appear in the list of uncancelled winner marks. So the two constraints Parse and FillOns must be dominated by other constraints (those violated by the corresponding losers); they cannot be the highestranked of the notyetrankedconstraints. This leaves:   ( KEYOBS9 .a)` ` ` highestrankedconstraints = {Ons, Cod, FillNuc}  which constitutes the output of the first pass: these three constraints form the highest stratum in the hierarchy. The data do not support any distinctions in ranking among the three, so none are made. Now   ( KEYOBS9 .b)` ` ` notyetrankedconstraints = {Parse, FillOns} Now that the highest ranked constraints have been determined, the list of markdata pairs can be trimmed down by removing any markdata pairs that are completely accounted for by the constraints selected as highest. This is the case if at least one of the marks incurred by the loser of a pair is among the highest ranked constraints. Such a mark is guaranteed to dominate all of the corresponding winner's marks, because all of the winner's marks were disqualified from being ranked highest. So we eliminate from the markdata table every row in which any of the highestranked constraints appear. In the current example, we eliminate the pair a u d because *Ons appears (or, alternatively, because *Cod appears), and also the pair c u d, because *FillNuc appears. The new markdata table is thus:   ( KEYOBS9 .c) Markdata Pairs (L1, After First Pass) h dd]  dd] h "RR"subopt u optlosermarkswinnermarks"" " "b u d V .CV. C u .]V.CV. C { *parse *Parse}{ *parse *FillOns}  At the end of the first pass, we now have the first (highest) stratum ( KEYOBS9 .a), a reduced list of notyetranked constraints ( KEYOBS9 .b), and a reduced markdata table ( KEYOBS9 .c). Crucially, the reduced markdata table involves only the notyetrankedconstraints, so we can now recursively invoke the same algorithm, using the remaining data to rank the remaining constraints. This initiates the next pass. Repeating this process with the reduced markdata table ( KEYOBS9 .c), we examine the rightmost column of the table, and observe that of the two notyetrankedconstraints ( KEYOBS9 .b), only one, FillOns, appears. The remaining constraint, then, is output as the next stratum of highestranked constraints:   ( KEYOBS9 .a-)` ` ` highestrankedconstraints = {Parse}  This leaves   ( KEYOBS9 .b-)` ` ` notyetrankedconstraints = {FillOns}  The final step of the second pass is to trim the markdata table, eliminating rows in which the highestrankedconstraints appear. This eliminates the only row in the table, so that the new markdata table is empty.   ( KEYOBS9 .c-) Markdata Pairs (L1, After Second Pass) : none The result of the first two passes is the highest segment of the stratified hierarchy: ` ` ` {Ons, Cod, FillNuc} >  > {Parse} The third pass operates on an empty markdata table. Since there are no marks in the rightmost column of such a table, no remaining constraints must be dominated: all the notyetranked constraints are output as the highestranked. In this case, that is the one remaining constraint FillOns.   ( KEYOBS9 .a.)` ` ` highestrankedconstraints = {FillOns}  This leaves   ( KEYOBS9 .b.)` ` ` notyetrankedconstraints = {} so the algorithm terminates, with the final result: ( LRNDL1 ) Learned Stratified Hierarchy for L1:  ` ` ` {Ons, Cod, FillNuc} >  > {Parse} >  > {FillOns} This result represents a class of all totallyranked constraint hierarchies which give rise to the target language L1: the same optimal outputs arise regardless of the ranking of the three highest constraints. One of these refinements of the learned stratified hierarchy (LRNDL110) is the particular total ranking given in (RANKING4.a): this is but one of the correct hierachies for the target language. It is easy to see how the course of learning L2 differs from that of L1, assuming the learner's initial datum is the parse of the same input, /VCVC/, which is now V .CV.C]. (candidate c in ( FUN1 -); see the tableau (RANKING4.d)). The markdata table used by the algorithm, containing the marks of subopt u opt pairs after cancellation, is now: ( MARKS'2 ) Markdata Pairs after Cancellation (L2) h dd]  dd] h "RR"subopt u optlosermarkswinnermarks"" " "a u c.V.CVC.u V .CV.C].{*Ons, *Cod}{*Parse, *FillNuc}"`! `! "b u c V .CV. C u V .CV.C].{ *parse *Parse}{ *parse *FillNuc}"`! `! "d u c.]V.CV. C u V .CV.C].{ *parse *FillOns}{ *parse *FillNuc} This table is identical to its L1 counterpart ( MARKS'7 ) except that the marks *FillNuc and *FillOns are interchanged. The result of the algorithm is therefore the same as before, (LRNDL110), with this exchange made. ( LRNDL2 ) Learned Stratified Hierarchy for L2:  ` ` ` {Ons, Cod, FillOns} >  > {Parse} >  > {FillNuc} Again, this stratified hierarchy is correct: its further refinements into totallyranked hierarchies, including the one we singled out in (RANKING4.d), all give rise to L2. That these CV languages can each be learned completely from a single positive example attests to the power of the implicit negative data which comes with each positive example in Optimality Theory. #gP&P# 2.2 General Statement of the RCD Algorithm (Informal Version)#iP&P# Given: universalconstraints = a set of universal constraints ` ` ` initialdata = a set of wellformed outputs of the target language L We assume that L arises from some (not necessarily unique) total ranking R of the universal constraints. To Find: ` ` ` A stratified hierarchy in which these are the optimal parses of their corresponding inputs. We proceed via an intermediate step in which initialdata is prepared for the algorithm. #gP&P# 2.2.1 Data Preparation #iP&P# Given: universalconstraints = a set of universal constraints ` ` ` initialdata = a set of wellformed outputs of the target language L Generate: ` ` ` a. For each output in initialdata (opt), choose a set of suboptimal competitors (subopt) ` ` ` b. Make a table of these pairs, subopt u opt, listing the marks incurred by the subopts in one column, losermarks, and the marks incurred by opts in another, winnermarks. c. The result is markdata =   T dd` o  dd` o T **subopt u optlosermarks = marks(subopt)winnermarks = marks(opt)  .........   #gP&P# 2.2.2 Recursive Constraint Demotion (RCD) #iP&P# Given: universalconstraints = a set of universal constraints ` ` ` markdata = a table of pairs of mark lists (losermarks, winnermarks) To Find: ` ` ` A stratified hierarchy with respect to which each of the losermarks is less harmonic than its corresponding winnermarks. RCD Algorithm: Set notyetrankedconstraints = universalconstraints  I. Mark Cancellation ` ` ` For each pair (losermarks, winnermarks) in markdata: ` ` `   a. For each occurence of a mark * in both losermarks and winnermarks in the same pair, remove that occurence of * from both. ` ` `   b. If, as a result, no winnermarks remain, remove the pair from markdata. II. Recursive Ranking ` ` ` ` ` ` a. Output highestrankedconstraints = all the constraints in notyetrankedconstraints which do not appear in the column winnermarks of markdata; these form the highestranked stratum of the notyetranked constraints. ` ` ` ` ` ` b. Remove the highestrankedconstraints from the notyetrankedconstraints. ` ` ` ` ` ` c. Remove all rows from markdata which contain any marks assessed by the highestrankedconstraints. ` ` ` ` ` ` d. Call Recursive Ranking again, with the reduced markdata and the reduced notyetrankedconstraints. Note that in step c of Recursive Ranking, the relevant marks (those assessed by the highestrankedconstraints) can only appear in the column losermarks; for any constraint contributing a mark to the column winnermarks is not, by step a, among the relevant constraints (those in highestrankedconstraints). #gP&P# 2.3 Informal Analysis of the Algorithm #iP&P# Observe first that multiple uncancelled tokens of the same type of mark in the markdata table, either for winner or loser, have the same effect as a single token. For in step a, we simply determine which constraints assess no marks at all in the winnermarks column: whether a single or multiple tokens of a mark appear makes no difference. Then in step b, a row is removed from markdata if it contains any marks at all assessed by the highestrankedconstraints; multiple tokens of a mark type have the same effect as a single token. Thus, for efficiency considerations below, we can assume that in Initialization step c, if a row of the markdata table contains multiple tokens of the same type of mark, duplicates are eliminated, leaving at most one token of each type. In other words, in the initial markdata table prior to cancellation, what really matters is, for each constraint, which of subopt or opt incurs more violations of the constraint ; if it is subopt, then a token of the mark * appears in the column losermarks; if it is opt, then the token of * appears instead in the column winnermarks. What matters in the assessment of optimality is only which of two candidates more seriously violates each constraint: not any absolute magnitude of violation. The correctness of the algorithm should be clear. The highestrankedconstraints output at each pass of the algorithm are exactly the constraints which need not be dominated in order to explain the available data; the remaining notyetrankedconstraints, by contrast, must be dominated and so cannot be highestranked. We now show that the algorithm must terminate. On each pass of the algorithm, at least one constraint must be output. For suppose not. That would mean that every one of the notyetrankedconstraints appears in the column winnermarks, i.e., as an uncancelled mark of an optimal form. But that would mean that every one of the notyetrankedconstraints must be dominated by one of the other notyetrankedconstraints: which means there is no ranking of these constraints which is consistent with the markdata, in contradiction to the basic assumption that the markdata derive from some ranking of the given constraints. So on each pass, at least one constraint is eliminated from the notyetrankedconstraints. Thus the number of passes required cannot be more than the number of universal constraints: call this Nconstr. The number of steps required for each pass cannot exceed the number of uncancelled marks in the markdata table: each mark in the column winnermarks is examined in step a to ensure that its corresponding constraint will not be output as a highestrankedconstraint, and, in the worst case, each mark in the column losermarks must be examined in step c to determine which rows must be eliminated from data. The number of uncancelled marks per row of the table can't exceed the number of constraints Nconstr, so the total number of steps per pass can't exceed NconstrNpairs, where Npairs is the number of rows in the initial markdata table, i.e., the number of pairs subopt u opt used. So the number of steps required for all the passes can't exceed ` ` ` (Nconstr)2Npairs In the worst case, the RCD algorithm is quadratic in the number of constraints, and linear in the number of markdata pairs. This makes the algorithm quite efficient, although further work is required to investigate efficiency issues in the choice of subopt u opt pairs. #gP&P# 3. General Constraint Demotion #iP&P# In this section, we pull out the core part of the RCD algorithm, which we call the Core CD Algorithm, and show how it can be used in a number of ways to yield a family of CD Algorithms which differ primarily in how much data they process at one time. #gP&P# 3.1 The Core CD Algorithm #iP&P# Given:  = a stratified hierarchy of constraints ` ` ` markdata = a table of pairs of mark lists losermarks u winnermarks To Find: ` ` ` A stratified hierarchy with respect to which each of the losermarks is less harmonic than its corresponding winnermarks. Prior to use of CD, as for RCD (see 2.2.2) markdata must be prepared as appropriate for the version of CD which will be used (see 3.2). Core CD Algorithm I. Mark Cancellation: As for RCD (2.2.2) II. Constraint Demotion (repeat this step until no demotions occur) ` ` ` for each pair losermarks u winnermarks in markdata ` ` `   i. find the mark *loser in losermarks which is highestranked in  ` ` `  ii. for each *winner in winnermarks ` ` `  hhhif loser does not dominate winner in , then demote constraint winner to the stratum in  immediately below that of loser (creating such a stratum if it does not already exist) Output  #gP&P# 3.2 Versions of the CD Algorithm #iP&P# Now we present several ways of using the Core CD Algorithm to learn a language. All start, like RCD, with the stratified hierarchy in which all the universal constraints are in the top stratum: we call this 0. The constraints in this stratified hierarchy are then demoted as demanded by the data. ( BATCHCD ) Batch CD a. Compile all available output forms into initialdata. b. Perform Data Preparation, generating markdata. ` ` ` c. Set  = 0 d. Execute Core CD on markdata and  (once). Batch CD is closely related to RCD (see 3.4): all the data is processed simultaneously by Core CD. ( ONLINECD ) Online CD ` ` ` a. Set  = 0  Execute repeatedly: ` ` ` ` ` ` b. Select one optimal output opt and one suboptimal competitor subopt. ` ` ` c. Set markdata = {marks(subopt) u mark(opt)}. ` ` ` ` ` ` d. Execute Core CD on markdata and current . Online CD operates by applying Core CD separately to each subopt u opt pair. ( IOCD ) I/O CD ` ` ` a. Set  = 0  Execute repeatedly: ` ` ` b. Select one optimal output opt. ` ` ` c. Perform Data Preparation, generating markdata. ` ` ` d. Execute Core CD on markdata and current .    I/O CD is one way of using Core CD in between the extremes of Batch CD (all data processed at once) and OnLine CD (one markdata pair processed at a time). One form opt from the language is processed at a time, but a number of competitors subopt to opt may be processed at once. #gP&P# 3.3 Example #iP&P# We exemplify OnLine CD using the language L1 above. Recall that OnLine CD operates on one losermarks u winnermarks pair at a time. Step a of OnLine CD has us set the initial stratified hierarchy to ` ` `  = 0 = {Parse, FillNuc, FillOns, Ons, Cod} Step b has us pick a subopt u opt pair; suppose we pick b u d of ( MARKS6 ). Step c has us generate the corresponding pair of mark lists, as shown in the second row of ( MARKS6 ). Step d has us execute Core CD. Step I of Core CD is Mark Cancellation; the result is: () Markpair for OnLine CD, Step 1 (L1) h dd]  dd] h "RR"subopt u optlosermarkswinnermarks"" " "b u d V .CV. C u .]V.CV. C { *parse *Parse}{ *parse *FillOns} Step II of Core CD is Constraint Demotion. We first find the highestranked (uncancelled) mark *loser in ; this is *Parse. We next check to see whether the winnermarks are dominated by *Parse. The only winner mark is *winner = *FillOns, which is not so dominated. So Core CD calls for demoting winner = FillOns to the stratum immediately below loser = Parse. Since no such stratum currently exists, we create it. The hierarchy is now ` ` `  = {Parse, FillNuc, Ons, Cod} >  > {FillOns} This completes the execution of Core CD, so we return to step b of the OnLine CD procedure: picking another subopt u opt pair. Suppose this is a u d of ( MARKS6 ): () Markpair for OnLine CD, Step 2 (L1) h dd]  dd] h "RR"subopt u optlosermarkswinnermarks"" " "a u d.V.CVC.u .]V.CV. C {*Ons, *Cod}{*Parse, *FillOns} We proceed to the Constraint Demotion step of CD, which calls first for finding the highestranked of the losermarks; since Ons and Cod are both top ranked, either will do: choose, say, Ons. We next check whether each of winnermarks is dominated by Ons. FillOns is so dominated, so no demotion results. Parse is not, however, so it is demoted to the stratum immediately below that of Ons; the hierarchy is now: ` ` `  = {FillNuc, Ons, Cod} >  > {Parse, FillOns} Suppose now that the next subopt u opt pair is: () Markpair for OnLine CD, Step 3 (L1) h dd]  dd] h "RR"subopt u optlosermarkswinnermarks"" " "c u d V .CV.C].u .]V.CV. C { *parse *FillNuc}{ *parse *FillOns} Since the uncancelled loser mark, *FillNuc already dominates the uncancelled winner mark, *FillOns, no demotion results, and  is unchanged. This is an example of an uninformative pair, given its location in the sequence of training pairs, since no demotions result. Suppose the next subopt u opt pair results from a new input, /VC/: () Markpair for OnLine CD, Step 4 (L1) h dd]  dd] h "RR"subopt u optlosermarkswinnermarks"  " VC u .]V. C { *parse *Parse}{ *parse *FillOns}Since the loser mark *Parse does not dominate the winner mark *FillOns, we must demote FillOns to the stratum immediately below Parse, resulting in the hierarchy: ` ` `  = {FillNuc, Ons, Cod} >  > {Parse} >  > {FillOns} This stratified hierarchy actually gives rise exactly to L1, so no further data will be informative. It is the same hierarchy as that learned by RCD, (LRNDL110). #gP&P# 3.4. Analysis #iP&P# Here's a sketch of the proof that the procedures of 3.3 based on the CD algorithm converge to a correct stratified hierarchy for the target language L. The basic idea is this: If at any point CD places a constraint 3 in the third stratum, it is only because the data seen so far entail a sequence of constraint dominations in which two other constraints 1 and 2 form a domination chain over 3: ` ` ` 1 >  > 2 >  > 3. Constraints start at the top of the hierarchy and are only demoted by CD when they have to be in order to accomodate the data currently being evaluated. Each constraint is demoted until, eventually, it arrives in the highest stratum permitted it by the language. Each demotion brings the constraint demoted at least one stratum closer to its final destination; since constraints are never promoted, the hierarchy converges monotonically, in a sense defined precisely below, to a correct hierarchy. We present a few definitions and then a series of propositions leading to the convergence result. In this section, we assume that all constraint hierarchies involve the same set of constraints. () Def. The offset of a constraint  in a hierarchy  is the number of strata above .  is in a lower stratum in 1 than in 2 if the offset of  in 1 is greater than in 2.  is in the same stratum in 1 and 2 if it has the same offset in both. () Def. A constraint hierarchy 1 hdominates 2 if every constraint is in the same or a lower stratum in 2 than in 1. This concept must be clearly distinguished from a more obvious relation between hierarchies: () Def. A constraint hierarchy 2 is a refinement of 1 if every domination relation  >  > - of 1 preserved in 2. () Prop. For any language L generated by an underlying total constraint ranking, there exists a unique stratified hierarchy L which generates L and which has the property that each constraint is ranked as high as possible; that is, for any other hierarchy - which generates L, every constraint is in the same stratum or a lower stratum in - relative to L. We term L the hdominant target stratified hierarchy for L, or simply the target for L. () Cor. The target for L hdominates every total constraint ranking which generates L. () Prop. The hierarchy output by CD hdominates the input hierarchy. This holds because CD only demotes constraints. () Prop. If the input hierarchy hdominates the target, so does the output hierarchy. This holds because CD will never demote a constraint lower than necessary; the constraint is as low as necessary in the target. () Def. The hdistance between a hierarchy 1 and a hierarchy 2 hdominated by 1 is the sum, over all constraints , of the difference between the offset of  in 1 and in 2. () Lemma. The hdistance between the current hierarchy and the target decreases by at least 1 for each demotion/informative pair. () Lemma. The hdistance from the maximal hierarchy to the target cannot exceed ` ` ` (Nconstr1)Nconstr. () Thm. Starting with 0 and presenting the CD algorithm with data according to any of the procedures Batch CD, OnLine CD, or I/O CD, the procedure converges on the target hierarchy L for L after at most (Nconstr1)Nconstr informative examples. () Prop. The relation of Batch CD to RCD is this. The first iteration of Batch CD runs once through all the data, demoting a certain number of constraints. The contraints which remain in the top stratum at the end of this first iteration can never be demoted in subsequent iterations through the data; and other constraints can never be promoted into this stratum. So after the first iteration, the top stratum has its final form. All markdata pairs containing these constraints can be removed and ignored in subsequent iterations. This same property then holds recursively for subsequent iterations. In general, nth iteration of Batch CD definitively determines the nth stratum from the top. This is the same at the corresponding stratum output by RCD on the nth call to RCD (a recursive call for n > 1). In other words, the real difference between Batch CD and RCD is slight: Batch CD demotes constraints to a number of levels simultaneously, never demoting any constraints in the top n strata after iteration n, while the nth call to RCD in effect just demotes constraints from the nth to the nĩ1st stratum. This is why we have used the term Recursive Constraint Domination even though, unlike the other algorithms we have discussed, RCD does not explicitly employ Core CD.   #gP&P# 4. Consequences and Extensions #iP&P# #gP&P# 4.1 Ties #iP&P# The way we have approached the learning problem assumes that all competitors to an observed output are suboptimal. This might fail to be the case in two different ways. First, a single input may give rise to multiple optimal outputs which all earn the same set of marks (i.e., the constraints fail to distinguish them). In this case, while it is not actually correct to assume that all competitors of a given observed output are suboptimal, no errors in the learning result. Recall that the first step in the CD algorithm is the cancellation of common marks. Thus if the `suboptimal' form selected is in fact a second optimal form, then all marks cancel and this pair is discarded. Two forms which do not incur identical marks can only tie for optimality if the underlying constraint hierarchy is not totally ranked, for two different marks must in this case have the same rank: neither can dominate the other. In this case, observing one of the optimal forms and assuming that all others are suboptimal can lead to errors. It is currently unknown whether it is possible to extend CD to learn languages in which some inputs have multiple optimal outputs which do not earn identical sets of marks. We might, for example, suppose that the target language derives from an underlying stratified hierarchy rather than a totally ranked hierarchy. We might also assume that the learner's initial data consists in a set of inputs, and for each input, all its optimal outputs, should there be more than one. Much of the analysis extends to this setting, although the Core CD Algorithm needs to be extended with an addition step to handle pairs opt1  opt2 of tying optima. In this step, each mark in marks-(opt1) must be placed in the same stratum as a corresponding mark in marks-(opt2): a somewhat delicate business. Indeed, achieving ties for optimality between forms which incur different marks is a tricky matter. It may be that learning languages which do not derive from a totallyranked hierarchy is much more difficult than the case we have studied here. In that case, demands of learnability would suggest that universal grammar rules out such languages. Even though the set of target languages is generated from the set of totally ranked hierarchies of the universal constraints, the hypothesis space employed by the CD learning algorithm is the larger space of stratified hierarchies. That learning can be much less difficult when the hypothesis space is larger than the target space is a theme of recent work in Computational Learning Theory. While we cannot at this point rule out the possibility of an equally efficient learning algorithm which operates purely in the smaller space of totallyranked hierachies, it seems quite unlikely to us. Learning in the context of initial data which contains outright errors is almost certainly a problem of much greater complexity than that addressed here. It is worth noting, however, that if the initial data presented to the RCD algorithm are not consistent with any total ranking of the given constraints, the algorithm discovers this fact: At some stage of its execution, it fails in its attempt to find a constraint which does not appear among the winnermarks (Step II.a; see the discussion of convergence in 2.3). #gP&P# 4.2 The Superset Principle #iP&P# Taken seriously as a model of language acquisition, the CD algorithm suggest a Superset Principle in which the languages hypothesized earlier in learning are larger than, or supersets of, later languages. The initial state 0 is a completely unrefined stratified hierarchy in which all constraints form a single stratum. In the CV syllable structure example considered above, in this initial state, all the candidate parses we considered are tied for optimality, therefore all are accepted initially as wellformed. Data is then used to successively refine this hierarchy " in a somewhat loose sense: the output of CD at any stage is not in general literally a refinement of its input, but the number of strata is strictly nondecreasing. The end result in the syllable structure example is that all but one of the initially accepted outputs is rejected by the final grammar.   #gP&P# Appendix. Formal Statement of the RCD Algorithm and Proof of Its Correctness #iP&P# #gP&P# A.1 The RCD Algorithm #iP&P# Given: A list of constraints  A list of markdata pairs: ` ` `  markdata = {(losermarksi , winnermarksi)} ` ` ` where each losermarksi and winnermarksi is a list of marks = {*m} Part I: Mark Cancellation for each pair i in markdata for each occurrence of a mark *m in both losermarksi and winnermarksi ` ` ` remove *m from winnermarksi; remove *m from losermarksi if winnermarksi is empty, remove (losermarksi, winnermarksi) from markdata endfor Part II: Recursive Ranking while notyetrankedconstraints not empty do set highestrankedconstraints to empty copy notyetrankedconstraints to highestrankedconstraints for each pair i in markdata ` ` ` for each constraint j in highestrankedconstraints matching a mark *m in winnermarksi ` ` ` remove j from highestrankedconstraints endfor for each constraint j in highestrankedconstraints ` ` ` remove j from notyetrankedconstraints endfor for each pair i in markdata ` ` ` if a mark *m in losermarksi matches a constraint j in highestrankedconstraints ` ` `  then remove (losermarksi, winnermarksi) from markdata endfor OUTPUT highestrankedconstraints endwhile This algorithm sequentially outputs lists of constraints, with the first list containing the constraints ranked the highest, the second list containing the constraints ranked second highest, etc. #gP&P# A.2 Proof of Correctness of the Algorithm #iP&P# The algorithm is correct if it outputs a partial ranking of the constraints which assigns strictly higher harmony to opt than to subopt for each pair subopt u opt. Thus, any totally ranked hierarchy consistent with the output (partially ranked) stratified hierarchy will correctly evaluate all of the data presented. Part I of the algorithm, where common marks are cancelled within markdata pairs, is justified by the Cancellation Lemma (P&S:139) which says precisely that the relative harmony of two structures will be correctly determined if all common marks are cancelled, and only the remaining marks are considered. Further, if cancellation results in the winner of a pair having no remaining marks, then the winner will be assigned higher harmony than the loser no matter which way the constraints are ranked. Consequently, the markdata pair is guaranteed to be satisfied by the algorithm's output, and it contains no useful information to constrain the ranking, so it may be discarded. Part II of the algorithm may be proven recursively by demonstrating that: ` ` ` ` ` ` (a) an iteration correctly finds and outputs the subset of the given constraints that may be ranked highest; ` ` ` ` ` ` (b) at the end of an iteration, the remaining markdata pairs may only be explained on the basis of the constraints not yet output. Thus, what is left over is itself a well-defined problem, and the next set of constraints to be output is precisely the set of constraints that may be highest ranked in the remaining problem; ` ` ` ` ` ` (c) each iteration is guaranteed to output a list with at least one constraint, so long as a solution exists that is consistent with all of the markdata pairs. This ensures that the algorithm converges. First, we prove that an iteration outputs those constraints which may be ranked highest (of the given constraints). From the set of constraints not yet output, a constraint is removed if a corresponding mark is found in at least one of the winners' lists of marks. The Cancellation/Domination Lemma (P&S:130,148,221) states that one parse is assigned higher relative Harmony than another if and only if every uncancelled mark incurred by the winner is dominated by an uncancelled mark of the loser. The first part of the algorithm insures that only uncancelled marks are considered. If an uncancelled mark incurred by the winner is dominated by one incurred by the loser, then the mark incurred by the winner cannot be among the highest ranked. Thus, all constraints eliminated by this step cannot be among the highest ranked. Each of the constraints remaining at the end of the step may or may not appear in any particular loser's list of marks. If it doesn't, then the constraint is neutral with respect to that markdata pair, so ranking it highest is (vacuously) consistent with that markdata pair. If it does, then ranking the constraint highest guarantees that the winner will be more harmonic, as the constraint's mark dominates all uncancelled marks of the winner, so ranking the constraint highests is (non-vacuously) consistent with the markdata pair. Thus, ranking all constraints remaining in the list highest is consistent with all of the markdata pairs. Next, we prove that, at the end of an iteration, the remaining markdata pairs may only be explained on the basis of the constraints not yet output. First observe that no constraints that were output during the iteration may have marks appearing in any of the winnermarks lists, because any constraint with a mark appearing in any winnermarks was automatically disqualified from being output. Further, the step of the algorithm following the output step removes all markdata pairs in which an outputted constraint has a mark in the Lose mark list. Thus, the remaining markdata pairs have no marks corresponding to constraints already output. The constraints and markdata pairs remaining at the end of an iteration may then be viewed as a problem set of their own. The next set of constraints to be output is exactly the set of constraints that may be ranked highest given the remaining constraints and the remaining markdata pairs. This recursion bottoms out when no markdata pairs are left. Because markdata pairs with empty winnermarks were removed in Part I of the algorithm, it is guaranteed that while there are markdata pairs in the datalist, there will be constraints not yet output, including each constraint with a mark in at least one of the win lists of the remaining markdata pairs. If no markdata pairs remain in the list, then no constraints will be removed from the list of remaining constraints, and all of them will be output. This is correct, because each of the constraints (vacuously) satisfies the remaining data (since there is none). Finally, we prove that the algorithm converges whenever a solution exists. Suppose, to the contrary, that during an iteration, all (not yet output) constraints were disqualified from being highest-ranked. This means that each of the constraints has a mark appearing in the winnermarks list of at least one markdata pair. This means that the markdata pairs collectively require each constraint to be dominated by another constraint, which violates the requirement that constraint domination be non-circular. Thus, no solution exists which will satisfy all of the markdata pairs. So, if a solution does exist, then each iteration will output a list containing at least one constraint, and reduce the list of constraints not yet output. This guarantees that the algorithm will converge. #gP&P# Acknowledgements #iP&P# We are most grateful for important comments and encouragement from Alan Prince, G)raldine Legendre, Jane Grimshaw, and Bruce Hayes. Tesar acknowledges the support of an NSF Graduate Fellowship, and both authors acknowledge the support of NSF grant IRI9213894. We are grateful to the Institute of Cognitive Science at the University of Colorado at Boulder for partial support of the First Rutgers Optimality Theory Workshop where this work is to be presented. #gP&P# References #iP&P# Archangeli, Diana. 1984. Underspecification in Yawelmani phonology and morphology. Doctoral Dissertation. MIT, Cambridge, MA. [Garland Press, 1988.] Broselow, Ellen. 1982. On the interaction of stress and epenthesis, Glossa 16, 115132. Grimshaw, Jane. 1993. Minimal projection, heads, and inversion. Ms. Rutgers University, New Brunswick, NJ. It=, Junko. 1986. Syllable Theory in Prosodic Phonology, Ph. D. Dissertation, UMass, Amherst. It=, Junko. 1989. A Prosodic Theory of Epenthesis, Natural Language & Linguistic Theory 7, 217260. Kaye, Jonathan and Jean Lowenstamm. 1984. De la syllabicit), in F. Dell, D. Hirst, and J.R. Vergnaud, ed., Forme sonore du langage, Hermann, Paris. LaPointe, Steven and Mark Feinstein. 1982. The Role of Vowel Deletion and Epenthesis in the Assignment of Syllable Structure, in The Structure of Phonological Representations.Part II, ed. H.v.d.Hulst and N. Smith, 69120. Foris, Dordrecht, . Legendre, G)raldine, William Raymond, and Paul Smolensky. 1993. Analytic typology of case marking and grammatical voice. Proceedings of the Berkeley Linguistics Society, 19. McCarthy, John. 1979. Formal Problems in Semitic Phonology and Morphology, Doctoral dissertation, MIT, Cambridge, MA. McCarthy, John and Alan Prince. 1993. Prosodic Morphology I: constraint interaction and satisfaction. Ms. University of Massachusetts, Amherst, and Rutgers University, New Brunswick, NJ. To appear as Linguistic Inquiry Monograph, MIT Press, Cambridge, MA. Piggott, Glyne and Raj Singh. 1985. The phonology of epenthetic segments. Canadian Journal of Linguistics 30, 415453. Prince, Alan and Paul Smolensky. 1991. Notes on connectionism and Harmony Theory in linguistics. Technical Report CU-CS-533-91. Department of Computer Science, Univ. of Colorado, Boulder. Prince, Alan and Paul Smolensky. 1993. Optimality Theory: Constraint Interaction in Generative Grammar. Ms. Rutgers University, New Brunswick, and University of Colorado, Boulder. NJ. To appear as Linguistic Inquiry Monograph, MIT Press, Cambridge, MA. Selkirk, Elisabeth. 1981. Epenthesis and degenerate syllables in Cairene Arabic. In Theoretical Issues in the Grammar of the Semitic Languages, ed. H. Borer and J. Aoun. MIT, Cambridge. Steriade, Donca. 1982. Greek prosodies and the nature of syllabification. Doctoral Dissertation. MIT, Cambridge, MA.