FS2HSC - IEEE/RSJ IROS 2007 Workshop:
From sensors to human spatial concepts

November 2, 2007, San Diego, CA, USA


Call for Papers

Important Dates




Invited Speakers


Sponsored by: Workshop sponsored by COGNIRON project


Workshop sponsored by NXP

Invited speakers:

Laurent Itti, University of Southern California,USA

In recent years, a number of neurally-inspired computational models have emerged which demonstrate unparalleled performance, flexibility, and adaptability in coping with real-world inputs. In the visual domain, in particular, such models are achieving great strides in tasks including focusing attention onto the most important locations in a scene, recognizing attended objects, computing contextual information in the form of the "gist" of the scene, and planning/executing visually-guided motor actions, among many other functions. However, these models have not yet been able to demonstrate much higher-level or cognitive computation ability. On the other hand, symbolic models from artificial intelligence have reached significant maturity in their cognitive reasoning abilities, but the worlds in which they can operate have been necessarily simplified (e.g., a chess board, a virtual maze). In this talk I will present the latest developments in our and other laboratories which attempt to bridge the gap between these two disciplines, neural modeling and artificial intelligence, in developing the next generation of robots. I will briefly review a number of efforts which aim at building models that can both process real-world inputs in robust and flexible ways, and perform cognitive reasoning on the symbols extracted from these inputs. I will draw from examples in the biological/computer vision fields, including algorithms for complex scene understanding, and for robot navigation.

Dieter Fox, University of Washington, USA

Over the last decade, the mobile robotics community has developed highly efficient and robust solutions to estimation problems such as robot localization and map building. With the availability of various techniques for spatially consistent sensor integration, an important next goal is the extraction of high-level information from sensor data. Such information is often discrete, requiring techniques different from those typically applied to mapping and localization. In this talk I will describe how Conditional Random Fields (CRF) can be applied to tasks such as semantic place labeling, object recognition, and scan matching. CRFs are discriminative, undirected graphical models that were developed for labeling sequence data. Due to their ability to handle arbitrary dependencies between observation features, CRFs are extremely well suited for classification problems involving high-dimensional feature vectors. However, the adequate incorporation of continuous features into CRFs is not trivial, and I will discuss a combination of boosting and CRF training that increases the effectiveness of CRFs applied to continuous data.

Antonio Torralba, MIT CSAIL, Cambridge, USA.

Object detection and recognition is generally posed as a matching problem between the object representation and the image features (e.g., aligning pictorial cues, shape correspondence, constellations of parts, etc.) while rejecting the background features using an outlier process. In this talk, we take a different approach: we formulate the object detection problem as a problem of aligning elements of the entire scene. The background, instead of being treated as a set of outliers, is used to guide the detection process. Our approach relies on the observation that when we have a big enough database then we can find with high probability some images in the database very close to a query image, as in similar scenes with similar objects arranged in similar spatial configurations. If the images in the retrieval set are partially labeled, then we can transfer the knowledge of the labeling to the query image, and the problem of object recognition becomes a problem of aligning scene regions. But, can we find a dataset large enough to cover a large number of scene configurations? Given an input image, how do we find a good retrieval set, and, finally, how we do transfer the labels to the input image? We will use two datasets; 1) the LabelMe dataset, which contains more than 10,000 labeled images with over 180,000 annotated objects. 2) The tiny images dataset: A dataset of weakly labeled images with more than 79,000,000 images. Work in collaboration with Rob Fergus, Bryan Russell, Ce Liu and William T. Freeman

Contact: B.Terwijn@uva.nl