CLIN 2005 Abstracts
  • Induction of a Dependency Parser
    Yoav Seginer (ILLC, Universiteit van Amsterdam)
    In this talk I describe an unsupervised learning algorithm for the induction of an incremental dependency parser from raw text. The parser and learning algorithm work in tandem to bootstrap the parser: as an utterance is read from left to right, the parser incrementally assigns it a dependency structure based on parameters learned from previous examples. Simultaneously, the learning algorithm uses the resulting parse to improve its estimates of these (and other) parameters. The parser and learning algorithm were designed for, and applied to, the adult utterances in the CHILDES corpus. The input is therefore spoken language, with all its disfluencies, incompleteness and ungrammaticality. At the same time, it is usually syntactically simple, with a limited vocabulary and extensive use of pronouns. It also displays a balanced mixture of declaratives, imperatives and questions.

    I will conclude by discussing the evaluation of such a parser, especially when a relevant gold standard is not available. I will argue that a criterion for the success of an unsupervised learning algorithm is the stability of its output when trained on different corpora of the same language. To rule out trivial stability, the algorithm should also fail to generate the same results when trained on various corpora not taken from that language (but with the same set of words).
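
    One way the stability criterion could be made concrete is sketched below. This is an illustrative assumption, not the measure used in the talk: each parse is assumed to be a list of head indices (one per token, -1 for a token with no head), and the hypothetical function attachment_agreement reports the fraction of tokens that two parsers, trained on different corpora, attach to the same head on a shared set of test utterances.

```python
from typing import List, Sequence

# Assumed representation: one head index per token; -1 marks a token with no head.
Parse = List[int]


def attachment_agreement(parses_a: Sequence[Parse], parses_b: Sequence[Parse]) -> float:
    """Fraction of tokens given the same head by both parsers (hypothetical measure)."""
    if len(parses_a) != len(parses_b):
        raise ValueError("both parsers must be run on the same utterances")
    same = 0
    total = 0
    for a, b in zip(parses_a, parses_b):
        if len(a) != len(b):
            raise ValueError("two parses of one utterance must have equal length")
        # Count the tokens whose head is identical in both parses.
        same += sum(1 for head_a, head_b in zip(a, b) if head_a == head_b)
        total += len(a)
    return same / total if total else 0.0


# Toy parses of the same two utterances, standing in for the output of
# parsers trained on different corpora.
trained_on_corpus_1 = [[1, -1, 1], [-1, 0]]
trained_on_corpus_2 = [[1, -1, 1], [-1, 0]]   # same language: identical output
trained_on_control = [[-1, 0, 1], [1, -1]]    # control corpus: different output

same_language_score = attachment_agreement(trained_on_corpus_1, trained_on_corpus_2)
control_score = attachment_agreement(trained_on_corpus_1, trained_on_control)
```

    Under this sketch, the criterion would count as met when same_language_score is high while control_score is clearly lower; how large the gap must be is left open here.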