Object recognition by scene alignment

Antonio Torralba

Computer Science and Artificial Intelligence Laboratory
Dept. of Electrical Engineering and Computer Science
Massachusetts Institute of Technology, USA

Abstract. Object detection and recognition is generally posed as a matching problem between the object representation and the image features (e.g., aligning pictorial cues, shape correspondence, constellations of parts, etc.), while background features are rejected by an outlier process. In this talk, we take a different approach: we formulate object detection as the problem of aligning elements of the entire scene. Instead of being treated as a set of outliers, the background is used to guide the detection process. Our approach relies on the observation that, given a large enough database, we can find with high probability some images that are very close to a query image, that is, similar scenes containing similar objects arranged in similar spatial configurations. If the images in this retrieval set are partially labeled, we can transfer their labels to the query image, and object recognition becomes a problem of aligning scene regions. But can we find a dataset large enough to cover a wide range of scene configurations? Given an input image, how do we find a good retrieval set? And, finally, how do we transfer the labels to the input image? We will use two datasets: 1) the LabelMe dataset, which contains more than 10,000 labeled images with over 180,000 annotated objects; and 2) the tiny images dataset, which contains more than 79,000,000 weakly labeled images. Work in collaboration with Rob Fergus, Bryan Russell, Ce Liu and William T. Freeman.
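The retrieve-then-transfer idea can be illustrated with a toy sketch. It assumes that each image has already been reduced to a fixed-length feature vector (for instance, downsampled pixel intensities), uses the sum of squared differences as the distance, and transfers labels by a per-cell majority vote over label maps that are assumed to be already aligned with the query. The function names (`ssd`, `retrieve_neighbors`, `transfer_labels`) and the voting rule are hypothetical illustrations, not the method presented in the talk. In particular, the sketch skips scene-region alignment, which is the central problem the talk addresses.

```python
from collections import Counter
from typing import List, Sequence


def ssd(a: Sequence[float], b: Sequence[float]) -> float:
    """Sum of squared differences between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b))


def retrieve_neighbors(query: Sequence[float],
                       database: List[Sequence[float]],
                       k: int) -> List[int]:
    """Return indices of the k database entries closest to the query, closest first.

    Hypothetical helper: a brute-force nearest-neighbor search under SSD.
    sorted() is stable, so equal distances keep database order.
    """
    ranked = sorted(range(len(database)), key=lambda i: ssd(query, database[i]))
    return ranked[:k]


def transfer_labels(neighbor_label_maps: List[List[str]]) -> List[str]:
    """Per-cell majority vote over label maps ordered closest neighbor first.

    Hypothetical rule: ties go to the label seen first, i.e. the one coming
    from the closest neighbor. Assumes the maps are already aligned cell by
    cell with the query image.
    """
    n_cells = len(neighbor_label_maps[0])
    result = []
    for c in range(n_cells):
        votes = [label_map[c] for label_map in neighbor_label_maps]
        counts = Counter(votes)
        # Highest count wins; among equal counts, the earliest vote wins.
        best = max(counts, key=lambda lab: (counts[lab], -votes.index(lab)))
        result.append(best)
    return result


# Toy data: 2x2 "images" flattened to 4 values, each with a 2x2 label map.
database = [
    [0.9, 0.9, 0.1, 0.1],
    [0.8, 0.9, 0.2, 0.1],
    [0.1, 0.1, 0.9, 0.9],
    [0.9, 0.7, 0.1, 0.2],
]
labels = [
    ["sky", "sky", "road", "road"],
    ["sky", "sky", "road", "car"],
    ["road", "road", "sky", "sky"],
    ["sky", "building", "road", "road"],
]
query = [0.9, 0.9, 0.1, 0.1]

neighbors = retrieve_neighbors(query, database, k=3)
predicted = transfer_labels([labels[i] for i in neighbors])
print(neighbors, predicted)
```

Here the three retrieved neighbors are entries 0, 1 and 3, and the vote yields `["sky", "sky", "road", "road"]`: the single "building" and "car" labels are outvoted. Brute-force search is used only for clarity. For a database of tens of millions of images, an approximate nearest-neighbor index would normally replace the linear scan.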