Intelligent Systems Lab Amsterdam


Please see the new ISLA website at: http://isla.science.uva.nl

Audio visual data fusion

Our broader interest is in developing methods for fusing audio and video data in multi-modal information streams (such as movies, webcams, and surveillance cameras) in order to create a robust tool for detecting scene transitions and events.

In this project, the data we explore are multimedia streams in which different people talk to the camera. Such data has many real-world applications, since people are filmed talking in many widely available multimedia streams. Examples include news anchors, talk shows, and interviews. The objective of this work is to create a framework that can detect how many people appear in the video data, how many people speak in the accompanying audio data, and, most importantly, associate each person with the corresponding audio segments.
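
To make the association task concrete, the following is a minimal Python sketch, not the model used in this project. It assumes that some upstream analysis has already produced, for every audio segment, a log-likelihood per visible person from the audio features and another from the video features (for example, from mouth motion). The function names, the input layout, and the conditional-independence assumption between the two modalities are all hypothetical and chosen only for illustration.

```python
import math


def fuse_segment(audio_loglik, video_loglik, log_prior=None):
    """Posterior over persons for one audio segment.

    audio_loglik[k]: log p(audio features | person k speaks)  (hypothetical input)
    video_loglik[k]: log p(video features | person k speaks)  (hypothetical input)
    Assumption: the modalities are conditionally independent given the speaker,
    so their log-likelihoods simply add.
    """
    n = len(audio_loglik)
    if len(video_loglik) != n:
        raise ValueError("audio and video scores must cover the same persons")
    if log_prior is None:
        # Uniform prior over the persons detected in the video.
        log_prior = [-math.log(n)] * n

    joint = [a + v + p for a, v, p in zip(audio_loglik, video_loglik, log_prior)]

    # Normalise with the log-sum-exp trick for numerical stability.
    peak = max(joint)
    log_norm = peak + math.log(sum(math.exp(j - peak) for j in joint))
    return [math.exp(j - log_norm) for j in joint]


def associate(segments):
    """Assign each audio segment to its most probable person.

    segments: list of (audio_loglik, video_loglik) pairs, one per segment.
    Returns a list of (person_index, posterior_probability) pairs.
    """
    assignments = []
    for audio_loglik, video_loglik in segments:
        posterior = fuse_segment(audio_loglik, video_loglik)
        best = max(range(len(posterior)), key=posterior.__getitem__)
        assignments.append((best, posterior[best]))
    return assignments


# Made-up scores for two persons and two audio segments.
example_segments = [
    ([-1.0, -3.0], [-0.5, -2.5]),  # both modalities favour person 0
    ([-2.0, -1.9], [-4.0, -0.5]),  # audio is ambiguous, video favours person 1
]
example_assignments = associate(example_segments)
```

In the second example segment the audio scores alone barely separate the two persons, and the video scores settle the decision. This is the kind of case where combining modalities helps more than analyzing either one in isolation.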

Research Group Members

Drs A. Noulas
Dr Ben Kröse

Independent Modality Analysis

State-of-the-art techniques are used to analyze each modality independently. Algorithms will be implemented for efficient face detection, face recognition, optical flow extraction, and low-level visual and audio feature extraction. The outputs of these analyses will then be fused using probabilistic models. The generative model we use is shown in the next figure.

Funding

This project is part of the MultimediaN project.
Maintained by Bas Terwijn. Last edited on Mon, 25 Jan 2010 13:38:55 +0100