CLIN 2005 Abstracts
  • Stochastic Tree-Substitution Grammars: Descriptive Adequacy, Computational Complexity and Statistical Consistency
    Willem Zuidema (Institute for Logic, Language and Computation, University of Amsterdam)
    We study the formalism of Stochastic Tree-Substitution Grammars (STSGs), as used primarily in the Data-Oriented Parsing framework (Scha, 1990; Bod, 1998). We evaluate the advantages and limitations of the formalism for describing syntactic dependencies in natural language data. We also analyze results on the complexity of parsing with STSGs (Goodman, 1996; Sima'an, 2002) and on the problem of learning STSGs from treebank data (Johnson, 2002). We first show that some of these complexity-theoretic and estimation-theoretic results are presented as more problematic than they really are, and then identify some very real computational problems. We further show that proposed restrictions of the formalism (Goodman, 1996; Collins and Duffy, 2002), which alleviate the computational problems, in turn impair its descriptive coverage. We conclude with a new proposal for reconciling the descriptive desiderata with the computational constraints.