Place and scene recognition from video


While navigating in an environment, a vision system has to be able to recognize where it is and what the main objects in the scene are. We present a context-based vision system for place and object recognition. The goal is to identify familiar locations (e.g., office 610, conference room 941, Main Street), to categorize new environments (office, corridor, street) and to use that information to provide contextual priors for object recognition (e.g., table, chair, car, computer). We have trained a system to recognize over 60 locations (indoors and outdoors) and to suggest the presence and locations of more than 20 different object types. The algorithm has been integrated into a mobile system that provides real-time feedback to the user.

As a test-bed for the proposed approach, we use a helmet-mounted mobile system. The system consists of a web-cam set to capture 4 images/second at a resolution of 120x160 pixels (color). The web-cam is mounted on a helmet so that it follows the head movements while the user explores their environment. The user receives feedback about system performance through a head-mounted display.

   

Kevin Murphy                         Antonio Torralba

We use a low-dimensional global image representation that captures the "gist" of the scene. This can be used as input to a Bayes net/HMM, as shown below. (See our ICCV03 paper for details.)

Below we show the performance of place recognition for a sequence that starts indoors and then goes outdoors (ICCV03 Figure 3). Top: the solid line represents the true location, and the dots represent the posterior probability associated with each location. There are 63 possible locations, but we only show those with non-negligible probability mass. Middle: estimated category of each location. Bottom: estimated probability of being indoors or outdoors.
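The posterior over locations described above is the kind of quantity a standard HMM forward filter produces. As a rough illustration only, the following Python sketch runs such a filter on a toy three-place model. The prior, transition matrix, likelihood values, and function names are all invented for this example; in the real system the per-frame likelihoods would be derived from the gist features, as detailed in the ICCV03 paper.

```python
def normalize(values):
    """Scale a list of non-negative numbers so that it sums to 1."""
    total = sum(values)
    if total <= 0:
        raise ValueError("vector has no probability mass")
    return [v / total for v in values]


def forward_filter(prior, transition, likelihoods):
    """Return P(place(t) | observations 1..t) for every frame t.

    prior[i]          : P(place(1) = i)
    transition[i][j]  : P(place(t+1) = j | place(t) = i)
    likelihoods[t][j] : p(observation at frame t | place = j), up to a constant
    """
    n = len(prior)
    belief = normalize([prior[j] * likelihoods[0][j] for j in range(n)])
    posteriors = [belief]
    for obs in likelihoods[1:]:
        # Predict: push the previous belief through the transition model.
        predicted = [sum(belief[i] * transition[i][j] for i in range(n))
                     for j in range(n)]
        # Update: weight by the current frame's likelihood and renormalize.
        belief = normalize([predicted[j] * obs[j] for j in range(n)])
        posteriors.append(belief)
    return posteriors


# Toy data (hypothetical): three places, four frames.
prior = [1 / 3, 1 / 3, 1 / 3]
transition = [[0.8, 0.2, 0.0],
              [0.1, 0.8, 0.1],
              [0.0, 0.2, 0.8]]
likelihoods = [[0.9, 0.05, 0.05],
               [0.6, 0.3, 0.1],
               [0.1, 0.8, 0.1],
               [0.1, 0.7, 0.2]]

posteriors = forward_filter(prior, transition, likelihoods)
best = [max(range(3), key=p.__getitem__) for p in posteriors]
print(best)
```

Each entry of `posteriors` is a distribution over places for one frame; plotting these over time would give a picture analogous to the dots in the top panel.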

Some images from the dataset.

Publications

Movies

  • AVI of place recognition using a wearable camera. If P(place-category(t)|vG(1:t)) > threshold, we print the category of the place (office, kitchen, etc.) in the top right corner (black = correct, red = incorrect). If P(place(t)|vG(1:t)) > threshold, we print the name of the specific place (office 101, kitchen #3, etc.) in the bottom right corner (black = correct, red = incorrect).

  • AVI of place recognition using a wearable camera. This one shows the HMM belief state superimposed on a topological map.
    Text output is the same as in the movie above. The bottom half shows a map of the 9th floor of the AI lab (NE43). The blue solid circle indicates P(place(t)|vG(1:t)) as computed using the HMM; the black hollow circle indicates P(place(t)|vG(t)) as computed using the instantaneous gist; the red/green cross marks the true location. The size of each circle is proportional to the probability. Notice how the HMM provides temporal smoothing. Nevertheless, there are discontinuous jumps that apparently violate the topological constraints; these occur because we apply Dirichlet smoothing to the transition matrix. This effect can be reduced (at the cost of increased latency upon moving to a new location) by down-weighting the likelihood by an exponential factor (see the equation for \tilde{b}_t on p4 of the ICCV paper).

  • WMV movie showing how Dan Roth ported our place recognition system to an ER1 mobile robot.
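The movie descriptions above mention three ingredients: printing a label only when the posterior exceeds a threshold, Dirichlet smoothing of the transition matrix, and down-weighting the likelihood by an exponential factor. The Python sketch below shows one plausible reading of each. It is not the authors' code: the function names, the pseudo-count `alpha`, and the interpretation of the exponential factor as raising the likelihood to a power `gamma` between 0 and 1 are assumptions; the exact form of \tilde{b}_t is given in the ICCV paper.

```python
def smooth_transitions(counts, alpha=1.0):
    """Add a pseudo-count alpha to every transition count, then row-normalize.

    Because every entry becomes strictly positive, the filter can jump
    between places that are not adjacent on the topological map.
    """
    smoothed = []
    for row in counts:
        total = sum(row) + alpha * len(row)
        smoothed.append([(c + alpha) / total for c in row])
    return smoothed


def temper(likelihood, gamma):
    """Flatten a likelihood vector by raising each entry to the power gamma.

    Assumed form of the exponential down-weighting: gamma = 1 leaves the
    likelihood unchanged, smaller values let the transition model dominate.
    """
    if not 0.0 < gamma <= 1.0:
        raise ValueError("gamma must lie in (0, 1]")
    return [b ** gamma for b in likelihood]


def label_if_confident(posterior, names, threshold):
    """Return the most probable name if its probability exceeds threshold."""
    best = max(range(len(posterior)), key=posterior.__getitem__)
    return names[best] if posterior[best] > threshold else None


# Hypothetical usage with made-up numbers.
names = ["office", "corridor", "street"]
print(label_if_confident([0.2, 0.7, 0.1], names, threshold=0.5))
print(label_if_confident([0.2, 0.7, 0.1], names, threshold=0.8))
print(smooth_transitions([[3, 1, 0], [0, 2, 0]], alpha=1.0))
print(temper([0.81, 0.01], gamma=0.5))
```

With a smaller `gamma`, a single frame's evidence has less power to override the transition model, which is consistent with the trade-off noted above: fewer spurious jumps, but slower reaction when the user actually moves to a new place.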

Data

  • The video data used to generate the results in Figure 3 of the ICCV03 paper is available as part of the MIT CSAIL database of objects and scenes. Look for the folder called "paperSequence".

  • The MATLAB file here contains the 80-dimensional gist vectors for the video sequence, and the place numbers and names:

    placeNames: {1x20 cell}
    placeNums: [1x3430 double]
    gists: [80x3430 double]

    If you type plot(foo.placeNums,'o-'), the results look slightly different from Figure 3, since the names of the places were changed somewhat, but they are qualitatively similar. Note that although we considered 63 places in the ICCV03 paper, only 20 occur in this particular sequence.

  • The file gistsICCV03.zip (14MB) contains 17 files, similar to the above, for the 17 video sequences used in the ICCV03 paper (see here for the list of files used for training and testing).
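As a complement to plotting placeNums frame by frame, a per-frame label sequence like this can be collapsed into contiguous visits. The Python sketch below assumes the labels have already been read into a plain list (loading the MATLAB file is not shown), and the function name `visits` is hypothetical. How the numbers in placeNums index into placeNames is not stated here, so the sketch works on the raw numbers only.

```python
def visits(place_nums):
    """Collapse per-frame labels into (label, first_frame, n_frames) runs.

    Frames are counted from 0 here; MATLAB would count them from 1.
    """
    runs = []
    for frame, label in enumerate(place_nums):
        if runs and runs[-1][0] == label:
            runs[-1][2] += 1          # same place as previous frame: extend run
        else:
            runs.append([label, frame, 1])  # place changed: start a new run
    return [tuple(run) for run in runs]


# Made-up label sequence standing in for placeNums.
example = [5, 5, 5, 2, 2, 7, 5]
for label, start, length in visits(example):
    print(f"place {label}: frames {start}..{start + length - 1}")
```

At 4 images/second, dividing a run's length by 4 gives the approximate time in seconds spent at that place.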