CLIN 2005 Abstracts
  • Criterions Aggregation for Best Flexible Document Categorization
    Jebari Chaker (King Saud University, College of Computer and Information Sciences, Computer Science Department)
    Ounalli Habib (Université de Tunis El'Manar, Faculté des Sciences de Tunis, Département d'Informatique)
    Document categorization can be used in various natural language processing applications, in particular document type identification. In this paper we analyse three types of criteria used to identify a document's type (or category): book, PhD thesis, master thesis, scientific report, scientific paper, call for papers, FAQ, or email. We also try to aggregate these criteria in order to improve categorization quality. We conduct various experiments on a corpus of 760 textual documents collected from the web, using FlexDC, a system that we have developed to identify document type. Our experiments show the usefulness of these three criteria and of a new technique for aggregating them.