Audio-visual data fusion
Our broader interest is in developing methodologies for
fusing audio and video data in multi-modal information streams
(such as movies, webcams and surveillance cameras) in order to
create a robust tool for detecting scene transitions and events.
In this project, the data we will explore are multimedia streams showing
different people talking to the camera. Such data has many real-world
applications, since many widely available multimedia streams show
people being filmed while talking. Examples include news anchors,
talk shows and interviews. The objective of this work is to create
a framework able to detect how many people appear in the video data,
how many people speak in the accompanying audio data and, most
importantly, to associate each person with the corresponding audio
segments.
Research Group Members
Drs A. Noulas
Dr Ben Kröse
Independent Modality Analysis
State-of-the-art techniques are used to analyze each modality independently. Algorithms will be implemented to achieve efficient face detection, face recognition, optical flow extraction, and low-level visual and audio feature extraction. The outputs of these analyses will then be fused using probabilistic models. The generative model we use is shown in the next figure.
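To make the idea of probabilistic fusion concrete, the following is a minimal sketch of one possible way to associate audio segments with the people visible in the video. It is not the project's generative model. Everything in it is an assumption made for illustration: the per-person "mouth-region motion" feature, the Gaussian parameters, the single-speaker-per-segment assumption, and the function names `speaker_posteriors` and `assign_segments`.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a one-dimensional Gaussian evaluated at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def speaker_posteriors(motion, speaking=(1.0, 0.3), silent=(0.2, 0.3)):
    """Posterior probability that each person is the speaker of one audio segment.

    motion: dict mapping a person id to a hypothetical visual feature, e.g. the
            mean optical-flow magnitude around the mouth during the segment.
    speaking, silent: assumed (mean, std) of that feature for a speaking
            and a silent person; illustrative values, not learned ones.
    Assumes exactly one speaker per segment and a uniform prior over people.
    """
    people = list(motion)
    scores = {}
    for candidate in people:
        # Likelihood of all observed motions if `candidate` is the speaker.
        likelihood = 1.0 / len(people)  # uniform prior
        for person in people:
            mean, std = speaking if person == candidate else silent
            likelihood *= gaussian_pdf(motion[person], mean, std)
        scores[candidate] = likelihood
    total = sum(scores.values())
    return {person: score / total for person, score in scores.items()}


def assign_segments(segments):
    """Map each audio segment id to its most probable speaker."""
    assignment = {}
    for segment_id, motion in segments.items():
        posterior = speaker_posteriors(motion)
        assignment[segment_id] = max(posterior, key=posterior.get)
    return assignment


# Toy input: two audio segments, two detected faces "A" and "B".
segments = {
    "seg0": {"A": 0.9, "B": 0.1},
    "seg1": {"A": 0.15, "B": 1.1},
}
assignment = assign_segments(segments)
```

In this toy input, person "A" shows strong mouth motion during the first segment and person "B" during the second, so each segment is attributed to the person who moves most consistently with the assumed "speaking" distribution. A fuller model would also use audio features such as speaker voice characteristics, which this sketch leaves out.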
Funding
This project is part of the MultimediaN project.