Örebro University Publications (oru.se)
Hoang, Dinh-Cuong
Publications (5 of 5)
Hoang, D.-C., Stork, J. A. & Stoyanov, T. (2022). Context-Aware Grasp Generation in Cluttered Scenes. In: 2022 International Conference on Robotics and Automation (ICRA). Paper presented at IEEE International Conference on Robotics and Automation (ICRA 2022), Philadelphia, USA, May 23-27, 2022 (pp. 1492-1498). IEEE
Context-Aware Grasp Generation in Cluttered Scenes
2022 (English) In: 2022 International Conference on Robotics and Automation (ICRA), IEEE, 2022, p. 1492-1498. Conference paper, Published paper (Refereed)
Abstract [en]

Conventional approaches to autonomous grasping rely on a pre-computed database of known objects to synthesize grasps, which is not possible for novel objects. On the other hand, recently proposed deep learning-based approaches have demonstrated the ability to generalize grasps to unknown objects. However, grasp generation remains a challenging problem, especially in cluttered environments and under partial occlusion. In this work, we propose an end-to-end deep learning approach for generating 6-DOF collision-free grasps given a 3D scene point cloud. To build robustness to occlusion, the proposed model generates candidates by casting votes and accumulating evidence for feasible grasp configurations. We exploit contextual information by encoding the dependencies between objects in the scene into features to boost the performance of grasp generation. This contextual information increases the likelihood that the generated grasps are collision-free. Our experimental results confirm that the proposed system performs favorably in predicting object grasps in cluttered environments in comparison to current state-of-the-art methods.
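As an illustration of the vote-casting idea mentioned in the abstract, the following minimal sketch accumulates per-point votes into grasp-center candidates. It is not the authors' implementation; all names, shapes, and parameters (such as voxel_size and min_votes, and the voxel-based clustering step) are assumptions made for illustration only.

```python
# Illustrative only: accumulating per-point votes into grasp-center candidates.
import numpy as np

def accumulate_grasp_votes(points, offsets, voxel_size=0.02, min_votes=8):
    """points, offsets: (N, 3) arrays. Each scene point casts a vote for a
    grasp center at point + offset; votes are pooled into voxels, and voxels
    with enough supporting evidence yield grasp-center candidates."""
    votes = points + offsets                               # (N, 3) voted centers
    keys = np.floor(votes / voxel_size).astype(np.int64)   # discretize vote space
    _, inverse, counts = np.unique(keys, axis=0,
                                   return_inverse=True, return_counts=True)
    inverse = inverse.reshape(-1)                          # robust across numpy versions
    candidates = [votes[inverse == i].mean(axis=0)         # average well-supported cells
                  for i in np.flatnonzero(counts >= min_votes)]
    return np.asarray(candidates)
```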

Place, publisher, year, edition, pages
IEEE, 2022
National Category
Computer Sciences
Identifiers
urn:nbn:se:oru:diva-98437 (URN), 10.1109/ICRA46639.2022.9811371 (DOI), 000941265701005, 2-s2.0-85136323876 (Scopus ID), 9781728196824 (ISBN), 9781728196817 (ISBN)
Conference
IEEE International Conference on Robotics and Automation (ICRA 2022), Philadelphia, USA, May 23-27, 2022
Funder
EU, Horizon 2020, 101017274 (DARKO)
Available from: 2022-04-01. Created: 2022-04-01. Last updated: 2023-05-03. Bibliographically approved
Hoang, D.-C. (2021). Vision-based Perception For Autonomous Robotic Manipulation. (Doctoral dissertation). Örebro: Örebro University
Vision-based Perception For Autonomous Robotic Manipulation
2021 (English) Doctoral thesis, monograph (Other academic)
Abstract [en]

In order to safely and effectively operate in real-world unstructured environments, where a priori knowledge of the surroundings is not available, robots must have adequate perceptual capabilities. This thesis is concerned with several important aspects of vision-based perception for autonomous robotic manipulation. With a focus on topics related to scene reconstruction, object pose estimation, and grasp configuration generation, we aim to help robots better understand their surroundings, avoid undesirable contacts with the environment, and accurately grasp selected objects.

With the wide availability of affordable RGB-D cameras, research on visual SLAM (Simultaneous Localization and Mapping) and scene reconstruction has made great strides. As a key element of an RGB-D reconstruction system, a large number of registration algorithms have been proposed in the context of RGB-D Tracking and Mapping (TAM). The state-of-the-art methods rely on color and depth information to track camera poses. Besides depth and color images, semantic information is now often available thanks to advances in deep-learning-driven image segmentation. We are interested in exploring to what extent the use of semantic cues can increase the robustness of camera pose tracking. This leads to the first contribution of this dissertation: a method for reliable camera tracking using an objective function that combines geometric, appearance, and semantic cues with adaptive weights.
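As a reading aid only, an objective of this kind can be written schematically as a weighted sum of geometric, appearance, and semantic residuals over the camera pose; the symbols below are illustrative and are not the notation used in the thesis.

```latex
% Schematic form only; symbols are assumed for illustration.
% \xi: camera pose; w_g, w_a, w_s: adaptive weights for the three cues.
E(\xi) = w_g\, E_{\mathrm{geom}}(\xi) + w_a\, E_{\mathrm{appear}}(\xi) + w_s\, E_{\mathrm{sem}}(\xi)
```

The camera pose would then be obtained by minimizing such an objective, with the adaptive weights presumably down-weighting cues that are unreliable in a given frame.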

Beyond the purely geometric model of the environment produced by classical reconstruction systems, the inclusion of rich semantic information and 6D poses of object instances within a dense map is useful for robots to effectively operate and interact with objects. Therefore, the second contribution of this thesis is an approach for recognizing objects present in a scene and estimating their full pose by means of an accurate 3D semantic reconstruction. Our framework simultaneously deploys a 3D mapping algorithm to reconstruct a semantic model of the environment and an incremental 6D object pose recovery algorithm that carries out predictions using the reconstructed model. We demonstrate that we can exploit multiple viewpoints around the same object to achieve robust and stable 6D pose estimation in the presence of heavy clutter and occlusion.

Methods taking RGB-D images as input have achieved state-of-the-art performance on the object pose estimation task. However, in a number of cases color information may not be available, for example when the input is point cloud data from laser range finders or industrial high-resolution 3D sensors. Therefore, besides methods using RGB-D images, studies on recovering the 6D pose of rigid objects from 3D point clouds containing only geometric information are necessary. The third contribution of this dissertation is a novel deep learning architecture that estimates the 6D pose of multiple rigid objects in a cluttered scene using only a 3D point cloud of the scene as input. The proposed architecture pools geometric features together using a self-attention mechanism and adopts a deep Hough voting scheme for pose proposal generation. We show that by exploiting the correlation between the poses of object instances and object parts we can improve the performance of object pose estimation.
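To make the two architectural ingredients named above a little more concrete, the snippet below sketches how per-point geometric features could be pooled with a single self-attention layer before a voting head. This is an assumption-laden illustration, not the architecture from the thesis; the class name, feature dimension, and head count are hypothetical.

```python
# Illustrative only: self-attention pooling over per-point geometric features.
import torch
import torch.nn as nn

class GeometricAttentionPool(nn.Module):
    def __init__(self, dim=128, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, feats):
        # feats: (batch, num_points, dim) per-point descriptors
        pooled, _ = self.attn(feats, feats, feats)  # every point attends to all others
        return feats + pooled                       # residual keeps local geometry
```

Each point's pooled feature would then regress a vote (an offset toward an object center) that a Hough-voting head clusters into pose proposals.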

By applying a 6D object pose estimation algorithm, robots can grasp known objects for which a 3D model is available and a grasp database is pre-defined. What if we want to grasp novel objects? The fourth contribution of this thesis is a method for robust manipulation of novel objects in cluttered environments. We develop an end-to-end deep learning approach for generating grasp configurations for a two-finger parallel jaw gripper based on 3D point cloud observations of the scene. The proposed model generates candidates by casting votes to accumulate evidence for feasible grasp configurations. We exploit contextual information by encoding the dependencies between objects in the scene into features to boost the performance of grasp generation.

Place, publisher, year, edition, pages
Örebro: Örebro University, 2021. p. 140
Series
Örebro Studies in Technology, ISSN 1650-8580 ; 93
National Category
Computer Sciences
Identifiers
urn:nbn:se:oru:diva-95168 (URN), 9789175294148 (ISBN)
Public defence
2021-12-17, Örebro universitet, Långhuset, Hörsal L3, Fakultetsgatan 1, Örebro, 09:00 (English)
Available from: 2021-10-25. Created: 2021-10-25. Last updated: 2021-11-25. Bibliographically approved
Hoang, D.-C., Lilienthal, A. & Stoyanov, T. (2020). Object-RPE: Dense 3D Reconstruction and Pose Estimation with Convolutional Neural Networks. Robotics and Autonomous Systems, 133, Article ID 103632.
Object-RPE: Dense 3D Reconstruction and Pose Estimation with Convolutional Neural Networks
2020 (English) In: Robotics and Autonomous Systems, ISSN 0921-8890, E-ISSN 1872-793X, Vol. 133, article id 103632. Article in journal (Refereed), Published
Abstract [en]

We present an approach for recognizing objects present in a scene and estimating their full pose by means of an accurate 3D instance-aware semantic reconstruction. Our framework couples convolutional neural networks (CNNs) and a state-of-the-art dense Simultaneous Localisation and Mapping (SLAM) system, ElasticFusion [1], to achieve both high-quality semantic reconstruction and robust 6D pose estimation for relevant objects. We leverage the pipeline of ElasticFusion as a backbone and propose a joint geometric and photometric error function with per-pixel adaptive weights. While the main trend in CNN-based 6D pose estimation has been to infer an object's position and orientation from single views of the scene, our approach explores performing pose estimation from multiple viewpoints, under the conjecture that combining multiple predictions can improve the robustness of an object detection system. The resulting system is capable of producing high-quality instance-aware semantic reconstructions of room-sized environments, as well as accurately detecting objects and their 6D poses. The developed method has been verified through extensive experiments on different datasets. Experimental results confirm that the proposed system achieves improvements over state-of-the-art methods in terms of surface reconstruction and object pose prediction. Our code and video are available at https://sites.google.com/view/object-rpe.

Place, publisher, year, edition, pages
Elsevier, 2020
Keywords
Object pose estimation, 3D reconstruction, Semantic mapping, 3D registration
National Category
Computer Sciences
Research subject
Computer Science
Identifiers
urn:nbn:se:oru:diva-88209 (URN), 10.1016/j.robot.2020.103632 (DOI), 000558081900023, 2-s2.0-85090016097 (Scopus ID)
Funder
EU, Horizon 2020
Available from: 2020-12-29. Created: 2020-12-29. Last updated: 2021-04-22. Bibliographically approved
Hoang, D.-C., Lilienthal, A. & Stoyanov, T. (2020). Panoptic 3D Mapping and Object Pose Estimation Using Adaptively Weighted Semantic Information. IEEE Robotics and Automation Letters, 5(2), 1962-1969
Panoptic 3D Mapping and Object Pose Estimation Using Adaptively Weighted Semantic Information
2020 (English) In: IEEE Robotics and Automation Letters, E-ISSN 2377-3766, Vol. 5, no 2, p. 1962-1969. Article in journal (Refereed), Published
Abstract [en]

We present a system capable of reconstructing highly detailed object-level models and estimating the 6D pose of objects by means of an RGB-D camera. In this work, we integrate deep-learning-based semantic segmentation, instance segmentation, and 6D object pose estimation into a state-of-the-art RGB-D mapping system. We leverage the pipeline of ElasticFusion as a backbone and propose modifications of the registration cost function to make full use of the semantic class labels in the process. The proposed objective function features tunable weights for the depth, appearance, and semantic information channels, which are learned from data. A fast semantic segmentation and registration weight prediction convolutional neural network (Fast-RGBD-SSWP), suited to efficient computation, is introduced. In addition, our approach explores performing 6D object pose estimation from multiple viewpoints, supported by the high-quality reconstruction system. The developed method has been verified through experimental validation on the YCB-Video dataset and a dataset of warehouse objects. Our results confirm that the proposed system performs favorably in terms of surface reconstruction, segmentation quality, and accurate object pose estimation in comparison to other state-of-the-art systems. Our code and video are available at https://sites.google.com/view/panoptic-mope.

Place, publisher, year, edition, pages
IEEE, 2020
Keywords
RGB-D perception, object detection, segmentation and categorization, mapping
National Category
Computer Vision and Robotics (Autonomous Systems)
Identifiers
urn:nbn:se:oru:diva-81423 (URN), 10.1109/LRA.2020.2970682 (DOI), 000526520500038, 2-s2.0-85079819725 (Scopus ID)
Funder
EU, Horizon 2020
Available from: 2020-04-30. Created: 2020-04-30. Last updated: 2024-01-17. Bibliographically approved
Hoang, D.-C., Stoyanov, T. & Lilienthal, A. J. (2019). Object-RPE: Dense 3D Reconstruction and Pose Estimation with Convolutional Neural Networks for Warehouse Robots. In: 2019 European Conference on Mobile Robots, ECMR 2019: Proceedings. Paper presented at 2019 European Conference on Mobile Robots (ECMR), Prague, Czech Republic, September 4-6, 2019. IEEE, Article ID 152970.
Object-RPE: Dense 3D Reconstruction and Pose Estimation with Convolutional Neural Networks for Warehouse Robots
2019 (English) In: 2019 European Conference on Mobile Robots, ECMR 2019: Proceedings, IEEE, 2019, article id 152970. Conference paper, Published paper (Refereed)
Abstract [en]

We present a system for accurate 3D instance-aware semantic reconstruction and 6D pose estimation using an RGB-D camera. Our framework couples convolutional neural networks (CNNs) and a state-of-the-art dense Simultaneous Localisation and Mapping (SLAM) system, ElasticFusion, to achieve both high-quality semantic reconstruction and robust 6D pose estimation for relevant objects. The method presented in this paper extends a high-quality instance-aware semantic 3D mapping system from previous work [1] by adding a 6D object pose estimator. While the main trend in CNN-based 6D pose estimation has been to infer an object's position and orientation from single views of the scene, our approach explores performing pose estimation from multiple viewpoints, under the conjecture that combining multiple predictions can improve the robustness of an object detection system. The resulting system is capable of producing high-quality object-aware semantic reconstructions of room-sized environments, as well as accurately detecting objects and their 6D poses. The developed method has been verified through experimental validation on the YCB-Video dataset and a newly collected warehouse object dataset. Experimental results confirm that the proposed system achieves improvements over state-of-the-art methods in terms of surface reconstruction and object pose prediction. Our code and video are available at https://sites.google.com/view/object-rpe.

Place, publisher, year, edition, pages
IEEE, 2019
National Category
Robotics
Identifiers
urn:nbn:se:oru:diva-78295 (URN), 10.1109/ECMR.2019.8870927 (DOI), 000558081900023, 2-s2.0-85074398548 (Scopus ID), 978-1-7281-3605-9 (ISBN)
Conference
2019 European Conference on Mobile Robots (ECMR), Prague, Czech Republic, September 4-6, 2019
Available from: 2019-11-29. Created: 2019-11-29. Last updated: 2020-09-16. Bibliographically approved