RGB-D Mapping: Using Depth Cameras for Dense 3D Modeling of Indoor Environments
1 Problem Statement and Related Work
RGB-D cameras are novel sensing systems that capture RGB images along with per-pixel depth information. RGB-D cameras rely on either structured light patterns combined with stereo sensing [6, 10] or time-of-flight laser sensing [1] to generate depth estimates that can be associated with RGB pixels.
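The per-pixel depth can be back-projected into a 3D point cloud with the pinhole camera model. The sketch below is a generic illustration, not the paper's code; the intrinsics `fx`, `fy`, `cx`, `cy` are placeholder values, not an actual PrimeSense calibration.

```python
import numpy as np

def depth_to_points(depth, fx, fy, cx, cy):
    """Back-project a depth image (meters) into a 3D point cloud
    using the pinhole model: X = (u - cx) * Z / fx, Y = (v - cy) * Z / fy."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # pixel coordinates
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=-1)  # shape (h, w, 3)

# Toy 2x2 depth image, everything at 1 m; intrinsics are illustrative.
depth = np.ones((2, 2))
pts = depth_to_points(depth, fx=525.0, fy=525.0, cx=0.5, cy=0.5)
```

Associating each back-projected point with the RGB value at the same pixel yields the colored point cloud used throughout the pipeline.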
Very soon, small, high-quality RGB-D cameras developed for computer gaming and home entertainment applications will become available at a cost below $100.
In this paper we investigate how such cameras can be used in the context of robotics, specifically for building dense 3D maps of indoor environments.
Such maps have applications in robot navigation, manipulation, semantic mapping, and telepresence. The robotics and computer vision communities have developed a variety of techniques for 3D mapping based on laser range scans [8, 11], stereo cameras [7], monocular cameras [3], and unsorted collections of photos [4].
While RGB-D cameras provide the opportunity to build 3D maps of unprecedented richness, they have drawbacks that make their application to 3D mapping difficult: they provide depth only up to a limited distance (typically less than 5 m), depth values are much noisier than those provided by laser scanners, and their field of view (∼60°) is far more constrained than that of specialized cameras or laser scanners typically used for 3D mapping (∼180°).
In our work, we use a camera developed by PrimeSense [10].
The key insights of this investigation are: first, that existing frame matching techniques are not sufficient to provide robust visual odometry with these cameras; second, that a tight integration of depth and color information can yield robust frame matching and loop closure detection; third, that building on best practice techniques in SLAM and computer graphics makes it possible to build and visualize accurate and extremely rich 3D maps with such cameras; and, fourth, that it will be feasible to build complete robot navigation and interaction systems solely based on cheap depth cameras.

2 Technical Approach

Following best practice in robot mapping, our RGB-D mapping technique consists of three key components: first, the spatial alignment of consecutive data frames; second, the detection of loop closures; and, third, the globally consistent alignment of the complete data sequence.
Alignment between successive frames is computed by jointly optimizing over both appearance and shape matching. Appearance-based alignment is done with RANSAC over SIFT features annotated with 3D position.
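The model fitted inside each RANSAC iteration here is a rigid transform between the 3D positions of matched SIFT keypoints. A minimal sketch of that closed-form fit (the Kabsch/Umeyama SVD solution, demonstrated on synthetic correspondences; not the paper's implementation):

```python
import numpy as np

def rigid_transform(src, dst):
    """Least-squares rigid transform (R, t) with dst ≈ R @ src + t,
    computed via SVD of the cross-covariance (Kabsch/Umeyama).
    This is the model fit inside each RANSAC iteration over
    matched 3D feature positions."""
    cs, cd = src.mean(axis=0), dst.mean(axis=0)
    H = (src - cs).T @ (dst - cd)          # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:               # guard against reflections
        Vt[-1] *= -1
        R = Vt.T @ U.T
    t = cd - R @ cs
    return R, t

# Sanity check: recover a known rotation about z plus a translation.
theta = 0.3
R_true = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                   [np.sin(theta),  np.cos(theta), 0.0],
                   [0.0, 0.0, 1.0]])
t_true = np.array([0.5, -0.2, 1.0])
src = np.random.default_rng(0).normal(size=(10, 3))
dst = src @ R_true.T + t_true
R, t = rigid_transform(src, dst)
```

RANSAC then counts how many matched pairs agree with the fitted (R, t) within a distance threshold and keeps the transform with the most inliers.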
The 3D SIFT matching requires no initial estimate of the relative pose, but can fail if a frame contains few distinctive visual feature points. Shape-based alignment is performed through ICP using a point-to-plane error metric [2].
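The point-to-plane metric of [2] penalizes, for each matched pair, the distance of the source point from the tangent plane at its destination point rather than the point-to-point distance, which lets flat regions slide along themselves during alignment. A minimal sketch of the residual (illustrative, not the paper's implementation):

```python
import numpy as np

def point_to_plane_residuals(src, dst, dst_normals):
    """Signed distance of each source point from the plane through
    its matched destination point with the given unit normal:
    r_i = (p_i - q_i) . n_i"""
    return np.einsum('ij,ij->i', src - dst, dst_normals)

# A point 0.3 m above a horizontal patch has residual 0.3 regardless
# of any in-plane offset -- the offset within the plane costs nothing.
src = np.array([[1.0, 2.0, 0.3]])
dst = np.array([[0.0, 0.0, 0.0]])
normals = np.array([[0.0, 0.0, 1.0]])
r = point_to_plane_residuals(src, dst, normals)
```

ICP alternates between re-matching nearest points and minimizing the sum of these squared residuals over the rigid transform.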
ICP alignment requires a good initial estimate of the relative pose, but allows the full 3D shape of the data to constrain alignment. In our joint optimization framework, we run 3D SIFT to obtain an initial alignment, and then the inliers from the RANSAC solution are included as fixed point-to-point constraints alongside the point-to-plane constraints from the full point cloud.
In this way our system can handle situations in which only RGB or shape alone would fail to generate good alignments. Our approach detects loop closures by matching data frames against a subset of previously collected frames using 3D SIFT.
To generate globally consistent alignments we use TORO, a pose-graph optimization tool developed for robotics SLAM [5]. The overall system can accurately align and map large indoor environments in near real time and is capable of handling extreme situations such as featureless corridors and completely dark rooms.
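TORO optimizes a graph whose nodes are camera poses and whose edges are the frame-to-frame and loop-closure constraints. The idea can be sketched on a toy 1D analogue; real pose graphs live in SE(3), and TORO uses stochastic gradient descent rather than the direct least-squares solve used here for brevity.

```python
import numpy as np

# Toy 1D pose graph. Pose x0 is fixed at 0. Three odometry edges
# each claim a step of +1.0 m, but a loop-closure edge claims the
# last pose coincides with the first (x3 = x0), so the accumulated
# drift must be distributed over the chain.
#
# Residuals: (x1 - 1), (x2 - x1 - 1), (x3 - x2 - 1), (x3 - 0)
A = np.array([[ 1.0,  0.0,  0.0],
              [-1.0,  1.0,  0.0],
              [ 0.0, -1.0,  1.0],
              [ 0.0,  0.0,  1.0]])
b = np.array([1.0, 1.0, 1.0, 0.0])
x, *_ = np.linalg.lstsq(A, b, rcond=None)
# The 3 m of claimed odometry is compromised against the loop
# closure: optimized poses come out x1=0.25, x2=0.5, x3=0.75.
```

Weighting each edge by the inverse covariance of its measurement gives the usual maximum-likelihood formulation; the unweighted solve above is the simplest instance.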
Once we have aligned the frames, we build a global map in the form of small colored surface patches called surfels [9]. This representation enables efficient reasoning about occlusions and color for each part of the environment, and provides good visualizations of the resulting model.
Furthermore, surfels automatically adapt the resolution of the representation to the quality and resolution of data available for each patch. A closeup of the surfel representation can be seen in Fig. 1(c).
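A surfel can be represented by a handful of fields. The sketch below is a hypothetical minimal version (the map in [9] and in the paper tracks more state, e.g. measurement confidence); the radius heuristic, which ties patch size to measurement depth as the text describes, uses a placeholder focal length.

```python
import numpy as np
from dataclasses import dataclass

@dataclass
class Surfel:
    """One small colored surface patch (hypothetical minimal fields)."""
    position: np.ndarray   # 3D center of the patch
    normal: np.ndarray     # unit surface normal
    radius: float          # patch extent, adapted to sensor resolution
    color: np.ndarray      # RGB in [0, 1]

def pixel_radius(z, fx=525.0):
    """A pixel at depth z with focal length fx covers roughly z / fx
    meters, so farther (coarser) measurements get larger surfels."""
    return z / fx

s = Surfel(position=np.zeros(3),
           normal=np.array([0.0, 0.0, 1.0]),
           radius=pixel_radius(2.0),
           color=np.array([0.8, 0.2, 0.2]))
```

When a new aligned frame arrives, each measurement either updates an existing surfel it lands on (refining color and shrinking the radius if the new view is closer) or spawns a new one, which is how the resolution adapts per patch.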
 
 

 
 

3 Experiments and Results

We collected data with a person carrying the camera through large single loops in two indoor environments, shown in Fig. 1. In Fig. 2, the resulting 3D maps are overlaid on 2D representations of our indoor environments.
These experiments were performed in static environments with plentiful visual features in normal daylight. We have also performed experiments in conditions under which SIFT fails catastrophically (e.g. dark rooms) and when ICP alone fails (e.g. panning across a flat wall), while our combined optimization succeeds.
Videos of our results can be found at
http://www.cs.washington.edu/robotics/projects/3d-mapping-videos/ .