CLIN 2005 Abstracts
  • Semantic Clustering in Dutch
    Tim Van de Cruys (CLCG, University of Groningen)
    Handcrafting semantic classes is difficult and time-consuming, and it depends on human interpretation. Machine learning techniques could be much faster, and they do not rely on human interpretation, because they stick to the data. The goal of this research is to present some machine learning techniques that make it possible to cluster Dutch words automatically. More specifically, vector space measures are used to compute the semantic similarity of nouns from the adjectives those nouns collocate with. Such similarity measures provide a solid basis for clustering nouns into semantic classes. Both partitional clustering algorithms, which produce stand-alone clusters, and agglomerative clustering algorithms, which produce hierarchical trees, are investigated. The clusters will be evaluated with frameworks that compare them to the hand-crafted Dutch EuroWordNet and the Interlingual Wordnet synsets. Additionally, the clustering of adjectives according to the nouns they collocate with has been investigated.
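
    The idea of comparing nouns through their collocating adjectives and then clustering them agglomeratively can be illustrated with a minimal sketch. The abstract does not name the similarity measure, the linkage criterion, or the data, so everything here is an assumption made for illustration: cosine similarity as the vector space measure, average linkage for merging, and a made-up table of noun-adjective counts (COUNTS). The function names are hypothetical as well.

```python
from math import sqrt

# Hypothetical toy counts (not from the study): noun -> {adjective: count}.
COUNTS = {
    "appel": {"rijp": 4, "zoet": 3, "rood": 2},
    "peer": {"rijp": 3, "zoet": 4},
    "auto": {"snel": 5, "rood": 1, "duur": 3},
    "fiets": {"snel": 3, "duur": 2},
}


def cosine(u, v):
    """Cosine similarity between two sparse adjective-count vectors."""
    dot = sum(u[a] * v[a] for a in u.keys() & v.keys())
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def average_link(c1, c2, vectors):
    """Mean pairwise cosine similarity between the members of two clusters."""
    sims = [cosine(vectors[a], vectors[b]) for a in c1 for b in c2]
    return sum(sims) / len(sims)


def agglomerate(vectors, k):
    """Merge the two most similar clusters repeatedly until k clusters remain."""
    clusters = [[noun] for noun in sorted(vectors)]
    while len(clusters) > k:
        pairs = (
            (i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        i, j = max(
            pairs,
            key=lambda p: average_link(clusters[p[0]], clusters[p[1]], vectors),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


clusters = agglomerate(COUNTS, 2)
print(clusters)
```

    On this toy table the fruit nouns and the vehicle nouns end up in separate clusters, because each pair shares most of its adjectives. Recording the order of the merges instead of stopping at k clusters would give the hierarchical tree that agglomerative methods produce.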