CLIN 2005 Abstracts
  • Using parallel text to guide parse selection
    Erwin Marsi (Communication & Cognition, Tilburg University)
    Parallel text corpora have proven to be very useful resources for inducing linguistic knowledge, ranging from basic NLP tasks like PoS tagging to full-blown applications like statistical machine translation. In general, these corpora become even more useful when enhanced with syntactic annotation. As syntactic parsers are by no means perfect, substantial manual correction is required. Our hypothesis is that parsing accuracy can be improved by exploiting the availability of a parallel text, potentially reducing the amount of correction labour required. In this talk, we explore to what extent PP attachment ambiguity, one of the typical parsing problems, can be resolved by aligning and reranking parallel parse trees. To give an example of this idea: "saw the second man with the binoculars" may be disambiguated by exploiting the parallel text "saw the man well using binoculars". Our approach is to rerank the candidate parses of one sentence according to how well each parse can be aligned to the independent parse of the parallel sentence. We explain the alignment and reranking procedure, and report empirical results on a small corpus of parallel Dutch text.
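
    The reranking idea can be illustrated with a minimal sketch. The abstract does not specify the alignment or scoring procedure, so everything here is an assumption made for illustration: parses are reduced to sets of (head, dependent) word pairs, a word alignment between the two sentences is taken as given, and the score simply counts the edges of a candidate parse that map onto edges of the parallel parse. The names `alignment_score` and `rerank` are hypothetical.

```python
from typing import Dict, FrozenSet, List, Tuple

Edge = Tuple[str, str]      # (head word, dependent word); an assumed simplification
Parse = FrozenSet[Edge]     # a parse reduced to its set of dependency edges


def alignment_score(candidate: Parse, reference: Parse,
                    word_alignment: Dict[str, str]) -> int:
    """Count candidate edges whose aligned words also form an edge in the reference."""
    score = 0
    for head, dependent in candidate:
        mapped = (word_alignment.get(head), word_alignment.get(dependent))
        if mapped in reference:
            score += 1
    return score


def rerank(candidates: List[Parse], reference: Parse,
           word_alignment: Dict[str, str]) -> List[Parse]:
    """Order candidate parses from best to worst alignment with the reference parse."""
    return sorted(candidates,
                  key=lambda parse: alignment_score(parse, reference, word_alignment),
                  reverse=True)


# Two candidate parses of "saw the man with the binoculars" (determiners omitted).
verb_attachment: Parse = frozenset(
    {("saw", "man"), ("saw", "with"), ("with", "binoculars")})
noun_attachment: Parse = frozenset(
    {("saw", "man"), ("man", "with"), ("with", "binoculars")})

# Assumed parse of the parallel sentence "saw the man well using binoculars".
parallel_parse: Parse = frozenset(
    {("saw", "man"), ("saw", "well"), ("saw", "using"), ("using", "binoculars")})

# Assumed word alignment from the ambiguous sentence to the parallel sentence.
alignment = {"saw": "saw", "man": "man", "with": "using", "binoculars": "binoculars"}

ranked = rerank([noun_attachment, verb_attachment], parallel_parse, alignment)
```

    In this toy case the verb-attachment reading shares three edges with the parallel parse and the noun-attachment reading only two, so the verb-attachment reading is ranked first. A realistic system would have to induce the alignment itself and work with full parse trees rather than bare word pairs.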