Real-time 3D semantic mapping is often desired in a number of robotics applications, such as localization, semantic navigation and human-aware navigation. The semantic information provided with a 3D dense map is more useful than the geometric information itself in robot-human or robot-environment interaction. It enables robots to perform advanced tasks requiring high precision, such as nuclear waste classification and sorting or autonomous package delivery in warehouse environments. For intelligent mobile robotics applications, extending 3D mapping to 3D semantic mapping enables robots not only to localize themselves with respect to the scene's geometrical features, but also to simultaneously understand the higher-level semantic meaning of a complex scene.

A variety of well-known methods such as RGB-D SLAM, Kinect Fusion and ElasticFusion can generate a dense or semi-dense 3D map from RGB-D videos. However, these 3D maps contain no semantic-level understanding of the observed scenes. On the contrary, impressive results in semantic segmentation have been achieved with the advancement of convolutional neural networks (CNNs). RGB, RGB-D and point cloud data have all been successfully utilized for semantic segmentation. However, some of those methods are painfully slow due to their high computational demands. Thus, these methods are not yet integrated in real-time systems for robotics applications.

Compared to the well-investigated research on geometric 3D reconstruction and scene understanding, limited literature is available for 3D semantic mapping. To date, there are no existing methods that make use of both RGB and point cloud data for semantic mapping. In this paper, we propose a dense RGB-D semantic mapping system with a Pixel-Voxel neural network, which can perform dense 3D mapping while simultaneously recognizing and semantically labelling each point in the 3D map. The main contributions of this paper can be summarized as follows:

To the best of our knowledge, the online dense 3D semantic mapping methods can be grouped into three main sub-categories: semantic mapping based on 3D template matching, 2D/2.5D semantic segmentation, and RGB-D data association from multiple viewpoints. The first type of methods, such as SLAM++, can only recognize known 3D objects in a predefined database. This approach is limited to situations where repeated and identical objects are present for semantic mapping. For the second type of methods, two representative approaches adopt human-designed features with random decision forests to perform per-pixel label predictions on the incoming RGB videos. Then, all of the semantically-labelled images are associated together using visual odometry to generate the semantic map. Because of the state-of-the-art performance provided by CNN-based scene understanding, SemanticFusion integrates deconvolutional neural networks with ElasticFusion to obtain a real-time-capable (25 Hz) semantic mapping system. All of these three methods require fully connected CRF optimization as an offline post-processing stage, i.e., the best performing semantic mapping methods are not capable of online operation.
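The 2D-to-3D association mentioned above (per-pixel labels lifted into a common map frame using the depth image and the camera pose estimated by visual odometry) is not detailed in this excerpt. The snippet below is only a minimal, generic sketch of such back-projection, assuming a pinhole camera model with a known intrinsic matrix K and a camera-to-world pose T_wc; the function name and arguments are hypothetical, not taken from any of the cited systems.

```python
import numpy as np

def backproject_labels(label_img, depth_img, K, T_wc):
    """Lift per-pixel semantic labels into world-frame 3D points.

    label_img : (H, W) integer class labels from a 2D segmenter
    depth_img : (H, W) depth in metres (0 where invalid)
    K         : (3, 3) camera intrinsic matrix
    T_wc      : (4, 4) camera-to-world pose, e.g. from visual odometry
    Returns (N, 3) world points and their (N,) labels.
    """
    H, W = depth_img.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))   # pixel coordinates
    valid = depth_img > 0.0                           # ignore missing depth

    z = depth_img[valid]
    x = (u[valid] - K[0, 2]) * z / K[0, 0]            # pinhole back-projection
    y = (v[valid] - K[1, 2]) * z / K[1, 1]
    pts_cam = np.stack([x, y, z, np.ones_like(z)], axis=1)  # homogeneous camera-frame points

    pts_world = (T_wc @ pts_cam.T).T[:, :3]           # transform into the map/world frame
    return pts_world, label_img[valid]
```

Once labelled points from successive frames land in the same map frame, their predictions can be fused per map element, which is where the recursive Bayesian refinement discussed below comes in.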
The first system to perform simultaneous 3D mapping and pixel-wise material recognition has also been proposed. It integrates CRF-RNN with RGB-D SLAM, so a post-processing optimization stage is not required. In addition, a real-time dense monocular CNN-SLAM method has been proposed, which can perform depth prediction and semantic segmentation simultaneously from a single image using a deep neural network.

All of the above methods mainly focus on semantic segmentation using a single image and perform 3D label refinement through a recursive Bayesian update over a sequence of images. However, they do not take full advantage of the associated information provided by multiple viewpoints of a scene. A data-associated recurrent neural network (DA-RNN) integrated with Kinect Fusion has been proposed for 3D semantic mapping. DA-RNN employs a recurrent neural network to tightly combine the information contained in multiple viewpoints of an RGB-D video stream to improve the semantic segmentation performance. A multi-view consistency layer has also been proposed, which can use multi-view context information for object-class segmentation from multiple RGB-D views.
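The recursive Bayesian label refinement mentioned above is, in systems such as SemanticFusion, essentially a per-element multiplication of the stored class distribution with each new per-pixel prediction, followed by renormalization. The sketch below illustrates only that update step in isolation; the function name and the example numbers are invented for illustration, and the frame-to-map correspondences (as in the back-projection sketch earlier) are assumed to be given.

```python
import numpy as np

def bayesian_label_update(map_probs, new_probs, eps=1e-12):
    """Recursively fuse class probabilities for one map element (surfel/voxel).

    map_probs : (C,) current class distribution stored in the map
    new_probs : (C,) segmenter output for the pixel projecting onto this element
    Returns the renormalized posterior distribution.
    """
    posterior = map_probs * new_probs            # element-wise Bayesian update
    return posterior / (posterior.sum() + eps)   # renormalize so classes sum to 1

# Example: a 3-class map element observed twice as class 0 with 60% confidence.
p = np.full(3, 1.0 / 3.0)                        # uniform prior
for obs in [np.array([0.6, 0.3, 0.1])] * 2:
    p = bayesian_label_update(p, obs)
# p is now strongly peaked on class 0 (~0.78 after two observations).
```

The design choice here is that repeated consistent observations sharpen the per-point distribution over time, while a single noisy prediction cannot overwrite the accumulated evidence.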