Intelligent Systems Lab Amsterdam


Please see the new ISLA website at: http://isla.science.uva.nl

Looking at People

A key goal of the Human Perception and Modeling group is the development of sensing systems for the automatic recognition of humans and their activities, enabling a machine to interact intelligently and effortlessly with a human-inhabited environment. We are particularly interested in visual perception in the context of intelligent vehicles, surveillance, entertainment and home automation. Research can be roughly subdivided into three parts: person detection, pose recovery, and activity recognition.

Person Detection

To detect persons in the image, we consider appearance-based methods, which learn implicit models from training data. We investigate which particular combinations of features and pattern classifiers work well by means of large benchmark studies. We furthermore address the "curse of dimensionality": the number of training samples needed grows exponentially with the dimensionality of the feature space. We do this by developing interactive techniques to accelerate object labeling, which in turn require sophisticated generative models for describing person and background appearance. These models can also be used to generate "virtual" training samples. Finally, we consider non-linear dimensionality reduction and manifold mapping techniques. Pedestrian detection research is carried out in close collaboration with Daimler Research in Ulm, Germany.
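
The exponential growth mentioned above can be illustrated with a simple grid-covering argument. The sketch below is only an illustration, not the group's own analysis: it assumes each feature axis is split into a fixed number of bins and that every resulting cell should contain at least a given number of training samples. The function name samples_needed and its parameters are hypothetical.

```python
def samples_needed(bins_per_axis: int, dims: int, per_cell: int = 1) -> int:
    """Samples required to put `per_cell` samples into every cell of a
    regular grid with `bins_per_axis` bins along each of `dims` axes."""
    # The grid has bins_per_axis ** dims cells, so the count grows
    # exponentially with the number of feature dimensions.
    return per_cell * bins_per_axis ** dims


# Print how the requirement grows for 10 bins per axis.
for dims in (1, 2, 5, 10):
    print(dims, samples_needed(10, dims))
```

Under these assumptions, ten feature dimensions already call for ten billion samples, which is one way to see why labeling effort and dimensionality reduction both matter.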

Pose Recovery

In order to provide more meaningful features (e.g. viewpoint invariant) for a subsequent activity recognition component, we explore methods to identify human body pose by matching 2D or 3D graphical models. Challenges abound: how to efficiently (re)initialize the models, how to adapt generic models to particular persons, how to deal with sustained (self-)occlusion, and how to incorporate temporal information.

We are currently working on a system that estimates 3D human upper body pose from multiple cameras. Pose initialization is performed by hierarchical shape exemplar matching. Temporal integration computes the best pose trajectories by combining a motion model with the observations in a Viterbi-style maximum-likelihood approach. Recovered poses of high confidence are used to adapt a texture model.
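
The temporal integration step can be sketched with the textbook Viterbi recursion. This is a minimal illustration, not the system's actual implementation: it assumes a small discrete set of candidate poses, a motion model given as log transition probabilities, and per-frame log observation likelihoods. All names (viterbi, log_prior, log_trans, log_obs) and the toy numbers are hypothetical.

```python
import math
from typing import List, Sequence


def viterbi(log_prior: Sequence[float],
            log_trans: Sequence[Sequence[float]],
            log_obs: Sequence[Sequence[float]]) -> List[int]:
    """Return the most likely state (pose) index per frame.

    log_prior[s]    : log P(state_0 = s)
    log_trans[r][s] : log P(state_t = s | state_{t-1} = r)  (motion model)
    log_obs[t][s]   : log P(observation_t | state_t = s)    (image evidence)
    """
    n_states = len(log_prior)
    score = [log_prior[s] + log_obs[0][s] for s in range(n_states)]
    back: List[List[int]] = []  # back[t-1][s] = best predecessor of s at frame t
    for t in range(1, len(log_obs)):
        prev, score, pointers = score, [], []
        for s in range(n_states):
            best = max(range(n_states), key=lambda r: prev[r] + log_trans[r][s])
            pointers.append(best)
            score.append(prev[best] + log_trans[best][s] + log_obs[t][s])
        back.append(pointers)
    # Backtrack from the best final state to recover the whole trajectory.
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


def logs(rows):
    """Convert nested probabilities to log probabilities."""
    return [[math.log(p) for p in row] for row in rows]


# Toy example: two candidate poses, a "sticky" motion model, three frames.
prior = [math.log(0.6), math.log(0.4)]
trans = logs([[0.9, 0.1], [0.1, 0.9]])
obs = logs([[0.9, 0.1], [0.4, 0.6], [0.9, 0.1]])
print(viterbi(prior, trans, obs))  # [0, 0, 0]
```

In the toy example the middle frame on its own slightly favors pose 1, but the motion model makes the trajectory that stays in pose 0 more likely overall, which is the kind of smoothing temporal integration is meant to provide.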

Activity Recognition

The straightforward approach of recognizing activities by pattern classification does not scale up to multiple, unconstrained and/or long-term activities, all of which would require an unrealistic amount of training data. Thus the challenge is to derive a hierarchical representation that decomposes high-level activities (e.g. playing tennis) into smaller building blocks, i.e. movement primitives (e.g. "forehand", "backhand", "running to the net"), which are detectable from image data. Apart from the supervised learning case, where activity classes are labeled in advance, we are also interested in the unsupervised case, where the system automatically learns concepts; this latter case is especially relevant for the detection of what constitutes "abnormal" behavior.

We are developing an innovative surveillance system, CASSANDRA, aimed at detecting aggressive human behavior in public environments. A distinguishing aspect of CASSANDRA is the exploitation of the complementary nature of audio and video sensing to disambiguate scene activity in real-life, noisy and dynamic environments. At the lower level, independent analysis of the audio and video streams yields intermediate descriptors of a scene like "scream", "passing train" or "articulation energy". At the higher level, a Dynamic Bayesian Network is used as a fusion mechanism that produces an aggregate aggression indication for the current scene.
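
The idea of fusing audio and video evidence over time can be sketched with a very small dynamic Bayesian model. The structure of the actual CASSANDRA network is not described here, so the following is only an assumed simplification: a single binary hidden state (calm or aggressive), a first-order transition model, and audio and video descriptors treated as conditionally independent given that state. The function fuse_step and all of its parameters are hypothetical.

```python
from typing import Tuple


def fuse_step(prior_aggr: float,
              p_stay_aggr: float,
              p_become_aggr: float,
              audio_lik: Tuple[float, float],
              video_lik: Tuple[float, float]) -> float:
    """One forward-filtering step; returns P(aggressive | evidence so far).

    prior_aggr    : P(aggressive) after the previous time step
    p_stay_aggr   : P(aggressive now | aggressive before)
    p_become_aggr : P(aggressive now | calm before)
    audio_lik     : (P(audio obs | calm), P(audio obs | aggressive))
    video_lik     : (P(video obs | calm), P(video obs | aggressive))
    """
    # Prediction: propagate the previous belief through the transition model.
    pred_aggr = prior_aggr * p_stay_aggr + (1.0 - prior_aggr) * p_become_aggr
    pred_calm = 1.0 - pred_aggr
    # Update: multiply in both modalities (assumed conditionally independent).
    w_calm = pred_calm * audio_lik[0] * video_lik[0]
    w_aggr = pred_aggr * audio_lik[1] * video_lik[1]
    total = w_calm + w_aggr
    if total == 0.0:
        return pred_aggr  # evidence impossible under both states; keep prediction
    return w_aggr / total


# Run the filter over a short, made-up sequence of (audio, video) likelihoods.
belief = 0.05
for audio, video in [((0.5, 0.5), (0.6, 0.4)),
                     ((0.1, 0.9), (0.5, 0.5)),
                     ((0.1, 0.9), (0.2, 0.8))]:
    belief = fuse_step(belief, 0.9, 0.1, audio, video)
    print(round(belief, 3))
```

In this sketch, a cue that is ambiguous on its own (equal likelihoods under both states) leaves the belief to the other modality and the temporal prior, while agreeing cues reinforce each other.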

Participants

Vacancy (Ph.D. Student / PostDoctoral Researcher)

Maintained by Bas Terwijn. Last edited on Mon, 25 Jan 2010 13:39:04 +0100