CLIN 2005 Abstracts
  • Multi-granular resources: OSLIN
    Maarten Janssen (ILTEC, Lisboa, Portugal)
    In the design of lexica for natural language processing, there is a mismatch between two requirements. On the one hand, the complexity of language requires a very fine-grained description of lexical items, providing all the lexical information needed to process a text. On the other hand, processing freely chosen texts requires at least some information on the full range of lexical items that appear in them. Given the enormous size of the actual lexicon of a language, especially of compounding languages such as Dutch, it is impossible to provide very fine-grained information for every word. The problem is aggravated by the occurrence of neologisms, which require a full-size lexicon not only to be large but also to be constantly updated. Many of the most successful shallow parsers, such as those developed in Pisa, are therefore based on the principle of combining different resources. The problem with this approach is that these resources are not designed to be compatible and often cannot be combined coherently. This paper explains the multi-granular design of OSLIN (an open-source lexical information network), a proposal for the construction of a network of lexical resources in which not all lexical items need to be described at the same level of granularity. OSLIN is based on the design of the MorDebe database, developed by the ILTEC institute in Lisbon: a full-scale (inflectional) morphological database for Portuguese, currently containing around 1.5M word forms. MorDebe is fully integrated with a semi-automatic neologism tracking system called NeoTrack, and hence provides a constantly updated resource. The idea behind OSLIN is to construct a network of coherent lexical resources around these basic morphological data. This paper focuses on the problems such a network faces, possible solutions to these problems, and the possibility of setting up a partially wiki-based framework.
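    To make the idea of multi-granularity concrete, here is a minimal Python sketch. It assumes a design in which a shared morphological base layer (word form to lemma, part of speech and inflectional tags, in the spirit of MorDebe) anchors optional, finer-grained layers per lemma. All class, method and layer names, and the tag strings, are hypothetical illustrations, not the actual OSLIN or MorDebe schema.

```python
from dataclasses import dataclass, field


@dataclass
class LexicalEntry:
    """Hypothetical lemma-level entry; finer layers are optional."""
    lemma: str
    pos: str
    # e.g. {"valency": [...], "semantics": {...}}; empty means base level only
    layers: dict = field(default_factory=dict)


class MultiGranularLexicon:
    """Sketch: every word form is covered at the coarsest (morphological)
    level, while richer layers exist only where some resource provides them."""

    def __init__(self):
        self.forms = {}    # word form -> set of (lemma, pos, tags)
        self.entries = {}  # (lemma, pos) -> LexicalEntry

    def add_form(self, form, lemma, pos, tags):
        # Base level: every word, including a newly tracked neologism,
        # receives at least a morphological analysis.
        self.forms.setdefault(form, set()).add((lemma, pos, tags))
        self.entries.setdefault((lemma, pos), LexicalEntry(lemma, pos))

    def enrich(self, lemma, pos, layer, info):
        # Finer granularity is attached to an existing lemma only.
        self.entries[(lemma, pos)].layers[layer] = info

    def lookup(self, form):
        # Return every analysis together with whatever layers are present.
        results = []
        for lemma, pos, tags in sorted(self.forms.get(form, ())):
            entry = self.entries[(lemma, pos)]
            results.append({"lemma": lemma, "pos": pos, "tags": tags,
                            "layers": dict(entry.layers)})
        return results


# Illustrative usage with made-up tags and layer contents.
lex = MultiGranularLexicon()
lex.add_form("comeu", "comer", "V", "pret.ind.3sg")
lex.enrich("comer", "V", "valency", ["NP_subj", "NP_obj"])
lex.add_form("blogar", "blogar", "V", "inf")  # neologism: base level only
print(lex.lookup("comeu"))
print(lex.lookup("blogar"))
```

    Because every word form is reachable through the base layer, a parser always gets at least morphological information, and it can consult richer layers where they happen to exist. In this sketch, a neologism picked up by a tracker such as NeoTrack would enter at the base level only and could be enriched later.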