CLIN 2005 Abstracts
  • A Pilot Study for a Corpus of Dutch Aphasic Speech (CoDAS)
    Eline Westerhout (Utrecht University)
    Paola Monachesi (Utrecht University)
    The Spoken Dutch Corpus (SDC) is an important resource for the study of contemporary standard Dutch as spoken by adults in the Netherlands and Flanders. However, it contains only speech from adults with intact speech abilities. There is therefore a need for specialized corpora that represent other types of speech, such as aphasic speech. We present the results of a pilot study for the development of such a corpus: CoDAS, a Corpus of Dutch Aphasic Speech.

    Given the special character of the speech contained in CoDAS, we cannot simply carry over the design and annotation protocols of existing corpora such as the SDC or CHILDES. They have, however, served as a starting point. In our pilot study, we established the basic requirements that CoDAS should fulfill with respect to text types, metadata and annotation levels. To this end, we investigated whether and how the SDC procedures and protocols for transcription and annotation should be adapted so that aphasic speech can be transcribed and annotated properly. In particular, we propose improvements to the existing protocols for orthographic transcription and part-of-speech tagging. The phonetic transcription procedure used in the SDC, on the other hand, can be adopted without major modifications.