CLIN 2005 Abstracts
  • Parse tree disambiguation using Minimum Description Length
    Scott Martens (Centrum voor Computerlinguistiek, Katholieke Universiteit Leuven)
    The minimum description length (MDL) principle is a technique for inductive inference from data that has become increasingly widespread in data mining and machine learning applications, but is still relatively unknown in computational linguistics. It is similar in content to the techniques conventionally used in data compression, but represents a more general set of principles applicable to all sorts of information regardless of its structure. Furthermore, it offers a simple tool for quantitatively evaluating the productivity of any regularities in data. This presentation will briefly describe the MDL principle and apply it to the problem of selecting among different possible parses of English sentences extracted from the British National Corpus by evaluating the constituents of the different parses.
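    As a rough illustration of how MDL can score a regularity, the sketch below compares the total description length of a word sequence before and after a candidate constituent is treated as a single unit. It is not the method of the presentation: the two-part cost (a flat 8 bits per word to spell out each distinct unit, plus the Shannon code length of the sequence under its own unigram frequencies) and the names `data_bits`, `model_bits`, `merge`, and `mdl_gain` are assumptions made for the example.

```python
import math
from collections import Counter


def data_bits(tokens):
    """Code length in bits of the sequence under its empirical unigram distribution."""
    counts = Counter(tokens)
    n = len(tokens)
    return -sum(c * math.log2(c / n) for c in counts.values())


def model_bits(tokens, bits_per_word=8.0):
    """Assumed model cost: each distinct unit is spelled out once, word by word."""
    return sum(len(unit) for unit in set(tokens)) * bits_per_word


def description_length(tokens):
    """Two-part description length: model cost plus data cost given the model."""
    return model_bits(tokens) + data_bits(tokens)


def merge(tokens, pair):
    """Replace every adjacent occurrence of `pair` with one combined unit."""
    out, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == pair:
            out.append(tokens[i] + tokens[i + 1])  # tuple concatenation
            i += 2
        else:
            out.append(tokens[i])
            i += 1
    return out


def mdl_gain(words, left, right):
    """Bits saved by treating the word pair (left, right) as a constituent.

    A positive value means the candidate constituent shortens the description.
    """
    tokens = [(w,) for w in words]  # every unit is a tuple of words
    merged = merge(tokens, ((left,), (right,)))
    return description_length(tokens) - description_length(merged)


if __name__ == "__main__":
    toy = ["a", "b"] * 8  # hypothetical data, not from the British National Corpus
    print(mdl_gain(toy, "a", "b"))
```

    On the toy sequence the pair always occurs together, so merging it saves 16 bits under these assumed costs. Under this scheme, competing parses could be compared by summing such gains over their constituents, although a realistic model cost would need a more careful encoding.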