Slide overview
2020/03/13
Deep Learning JP:
http://deeplearning.jp/seminar-2/
DL Paper Reading Group (DL輪読会) material
ClearGrasp 2020/03/13 Joji Toyama
Introduction
• Recognizing the 3D geometry of objects is important for automating human operations, e.g., grasping.
• However, recognizing the 3D geometry of transparent objects is difficult, because of specular highlights and reflections of the surface behind the object.
• ClearGrasp proposes a method to recognize the 3D geometry of transparent objects from a single RGB-D image, using Sim2Real techniques and a CNN architecture designed for the task.
Supplement: Errors in depth estimation for transparent objects
Transparent objects cause errors in depth estimation from an RGB-D camera.
– Type I: errors caused by specular highlights.
– Type II: errors caused by reflections of the surface behind the object.
Sim2Real: Learning from synthetic data
• Training a neural network for image recognition requires many labeled images, which are laborious, costly, and time consuming to collect.
• Image synthesis techniques can generate large numbers of images together with their labels.
(Figure: real data vs. synthetic data)
Related work on Sim2Real
• Synthetic data has been used for various tasks, but little of that research concerns transparent objects.
• No dataset containing transparent objects had been used for 3D reconstruction.
Synthetic data has been used for various tasks, but little of that research concerns transparent objects.
• A synthetic dataset was used for grasping objects: "Domain Randomization for Transferring Deep Neural Networks from Simulation to the Real World".
• A synthetic dataset was used for controlling a humanoid hand: "Learning Dexterous In-Hand Manipulation".
No dataset containing transparent objects had been used for 3D reconstruction.
Synthetic datasets containing transparent objects were used for refractive flow estimation, semantic segmentation, or relative depth:
• refractive flow estimation: "TOM-Net: Learning Transparent Object Matting from a Single Image"
• semantic segmentation: "Material-Based Segmentation of Objects"
• relative depth: "Single-Shot Analysis of Refractive Shape Using Convolutional Neural Networks"
Paper Information
• Title: ClearGrasp: 3D Shape Estimation of Transparent Objects for Manipulation
• Authors: Shreeyak S. Sajjan, Matthew Moore, Mike Pan, Ganesh Nagaraja, Johnny Lee, Andy Zeng, Shuran Song
• Institutions: Synthesis.ai, Google, Columbia University
• Research page: https://sites.google.com/view/cleargrasp
• Dataset: https://sites.google.com/view/cleargrasp/data?authuser=0
• Code: https://github.com/Shreeyak/cleargrasp
• Publication: ICRA 2020
• Blog: https://ai.googleblog.com/2020/02/learning-to-see-transparent-objects.html
Results of ClearGrasp
Abstract
• Created synthetic and real 3D geometry datasets of transparent objects.
• Proposed an architecture to infer accurate 3D geometry of transparent objects from a single RGB-D image.
ClearGrasp dataset: Synthetic dataset
• 50,000 photorealistic renders.
• Includes surface normals, segmentation masks, edges, and depth.
• Each image contains up to 5 transparent objects.
• Transparent objects sit on a flat ground plane or inside a tote, with various backgrounds and lighting.
• Synthetic images are generated with Blender's physics engine and rendering engine.
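As a rough illustration of this kind of pipeline (a sketch only, not the authors' actual generation scripts; the resolution, output path, and settings are placeholders), a Blender Python script can enable ground-truth depth and normal passes alongside the RGB render:

```python
# Hypothetical sketch of a Blender (bpy) render setup for labeled synthetic
# data. Not ClearGrasp's actual code; paths and settings are placeholders.
import bpy

scene = bpy.context.scene
scene.render.engine = 'CYCLES'        # physically based renderer
scene.render.resolution_x = 640
scene.render.resolution_y = 480

# Ask Blender to compute ground-truth depth and normal passes with the RGB.
view_layer = bpy.context.view_layer
view_layer.use_pass_z = True          # depth (Z) pass
view_layer.use_pass_normal = True     # surface normal pass

# Render one frame; the depth/normal passes would be routed to files
# via compositor nodes (omitted here for brevity).
scene.render.filepath = '/tmp/synthetic/rgb_0000.png'
bpy.ops.render.render(write_still=True)
```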
ClearGrasp dataset: Real dataset
• Real dataset
– 286 images.
– Includes RGB images and depth images.
– Each image contains up to 6 transparent objects, with an average of 2 objects per image.
– Transparent objects were placed in the scene along with various random opaque objects such as cardboard boxes, decorative mantelpieces, and fruits.
– To obtain ground-truth depth, each transparent object was first replaced by an identical spray-painted (opaque) copy in the same pose; a GUI app was used for positioning so that sub-millimeter accuracy could be achieved.
Overview of proposed CNN architecture
Method: RGB image → surface normals / mask / occlusion boundaries
• Transparent object segmentation
– output: pixel-wise masks of transparent objects.
• Surface normal estimation
– output: surface normals (3 channels), L2-normalized.
• Boundary detection
– output: a per-pixel label over the input image (Non-Edge / Occlusion Boundary / Contact Edge).
※ all three networks use the DeepLabv3 architecture with a DRN-D-54 backbone
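A minimal sketch of this three-branch setup in PyTorch, assuming torchvision's DeepLabV3 with a ResNet-50 backbone standing in for DRN-D-54 (which torchvision does not ship):

```python
# Sketch: three independent DeepLabV3 networks, one per sub-task.
# torchvision has no DRN-D-54 backbone, so ResNet-50 stands in here.
import torch
import torch.nn.functional as F
from torchvision.models.segmentation import deeplabv3_resnet50

# Output channels per task: binary mask, 3D normal vector, 3 edge classes.
seg_net    = deeplabv3_resnet50(num_classes=2)  # transparent vs. background
normal_net = deeplabv3_resnet50(num_classes=3)  # (nx, ny, nz) per pixel
edge_net   = deeplabv3_resnet50(num_classes=3)  # non-edge / occlusion / contact

rgb = torch.randn(1, 3, 480, 640)               # dummy RGB input

mask_logits = seg_net(rgb)['out']               # (1, 2, H, W)
normals     = normal_net(rgb)['out']            # (1, 3, H, W)
normals     = F.normalize(normals, dim=1)       # L2-normalize each pixel's normal
edge_logits = edge_net(rgb)['out']              # (1, 3, H, W)
```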
Method: Global optimization
• The output depth minimizes a weighted sum of squared error terms (a reconstruction is shown below).
• Notation
– E_D: the distance between the estimated depth D(p) and the observed raw depth D_0(p) at pixel p.
– E_N: measures the consistency between the estimated depth and the predicted surface normal N(p).
– E_S: encourages adjacent pixels to have the same depths.
– B ∈ [0, 1] down-weights the normal terms based on the predicted probability B(p) that pixel p lies on an occlusion boundary.
• The matrix form of the system of equations is sparse and symmetric positive definite, so it can be solved efficiently with a sparse Cholesky factorization.
※ this optimization method was proposed in "Deep Depth Completion of a Single RGB-D Image" (CVPR 2018)
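The slide's figure is not reproduced here; reconstructed from the Deep Depth Completion paper, the objective is approximately:

```latex
% Approximate reconstruction following "Deep Depth Completion of a
% Single RGB-D Image" (CVPR 2018); exact weights may differ.
E = \lambda_D E_D + \lambda_N E_N + \lambda_S E_S,
\qquad
\begin{aligned}
E_D &= \sum_{p \in T_{\text{obs}}} \bigl(D(p) - D_0(p)\bigr)^2 \\
E_N &= \sum_{(p,q) \in \mathcal{N}} B(p)\,\bigl\langle v(p,q),\, N(p) \bigr\rangle^2 \\
E_S &= \sum_{(p,q) \in \mathcal{N}} \bigl(D(p) - D(q)\bigr)^2
\end{aligned}
```

where N is the set of neighboring pixel pairs and v(p,q) is the vector between the back-projected 3D points at p and q. Since every term is quadratic in the unknown depths, the normal equations form a sparse symmetric positive definite linear system, which is why a sparse Cholesky factorization applies.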
Results
• Results on real-world images and novel object shapes.
• Robot manipulation.
Experiment Setting
• Dataset notation
– Syn-train: synthetic training set with 5 objects.
– Syn-known: synthetic validation set of the training objects.
– Syn-novel: synthetic test set of 4 novel objects.
– MP+SN: out-of-domain real-world RGB-D datasets of indoor scenes that contain no transparent objects (Matterport3D [7] and ScanNet [11]).
– Real-known: real-world test set of all 5 training objects.
– Real-novel: real-world test set of 5 novel objects, including 3 not present in the synthetic data.
• Metrics
– RMSE: root mean squared error, in meters.
– Rel: median error relative to the true depth.
– δ thresholds: percentage of pixels with predicted depths falling within an interval of the true depth ([δ = |predicted − true|/true], where δ is 1.05, 1.10, or 1.25).
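A small sketch of how these metrics could be computed with NumPy. Note the δ criterion below uses the common ratio convention max(pred/true, true/pred) < δ, which is an assumption about the paper's exact definition:

```python
# Sketch of the evaluation metrics, computed on valid (non-zero) depth pixels.
import numpy as np

def depth_metrics(pred, true, deltas=(1.05, 1.10, 1.25)):
    valid = true > 0                      # ignore pixels without ground truth
    p, t = pred[valid], true[valid]

    rmse = np.sqrt(np.mean((p - t) ** 2))       # RMSE, in meters
    rel  = np.median(np.abs(p - t) / t)         # median relative error
    ratio = np.maximum(p / t, t / p)            # assumed delta criterion
    thresh = {d: np.mean(ratio < d) for d in deltas}
    return rmse, rel, thresh
```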
Results on real-world images and novel object shapes (quantitative results)
• Generalization to real-world images
– achieves similar RMSE and Rel scores in the real-world domain.
• Generalization to novel object shapes
– able to generalize to previously unseen object shapes.
Results on real-world images and novel object shapes
• Qualitative results.
Robot manipulation
• Environment setting
– A pile of 3 to 5 transparent objects is presented on a table.
– Suction and a parallel-jaw gripper are tested as end-effectors.
– For each end-effector type, with and without ClearGrasp, a grasping algorithm is trained with 500 trial-and-error grasping attempts, then tested with 50 attempts.
– The picking algorithm is the same as in "Robotic Pick-and-Place of Novel Objects in Clutter with Multi-Affordance Grasping and Cross-Domain Image Matching" (ICRA 2018).
• Metrics
– success rate = # successful picks / # picking attempts
• Results
End-effector     w/o ClearGrasp    w/ ClearGrasp
suction          64%               86%
parallel-jaw     12%               72%
Picking algorithm
• An FCN infers pixel-wise suction or grasp success probabilities from rotated heightmaps generated from the RGB-D images.
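A schematic sketch of that inference step (the FCN is a stand-in module supplied by the caller, and the number of rotations is illustrative, not the paper's exact value):

```python
# Schematic sketch: score every (rotation, pixel) with an affordance FCN
# and pick the best grasp. The network and sizes are placeholders.
import torch
import torchvision.transforms.functional as TF

def best_grasp(fcn, heightmap, num_rotations=16):
    """heightmap: (C, H, W) tensor built from the RGB-D image."""
    best_prob, best_pose = -1.0, None
    for k in range(num_rotations):
        angle = k * 360.0 / num_rotations
        rotated = TF.rotate(heightmap, angle)        # orient gripper by rotating input
        scores = torch.sigmoid(fcn(rotated[None]))   # (1, 1, H, W) success probability
        prob, idx = scores.flatten().max(dim=0)
        if prob.item() > best_prob:
            y, x = divmod(idx.item(), scores.shape[-1])
            best_prob, best_pose = prob.item(), (angle, y, x)
    return best_prob, best_pose  # probability, (grasp angle, pixel y, pixel x)
```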
Conclusion
• ClearGrasp ("3D Shape Estimation of Transparent Objects for Manipulation") recovered the 3D geometry of transparent objects by
– creating synthetic and real 3D geometry datasets of transparent objects, and
– proposing an architecture to infer accurate 3D geometry of transparent objects from a single RGB-D image.
• We can utilize these ideas in our own research, especially in
– Sim2Real computer vision, and
– research or development that uses depth cameras.
References
• ClearGrasp: 3D Shape Estimation of Transparent Objects for Manipulation
– https://arxiv.org/abs/1910.02550
• Soccer On Your Tabletop
– http://grail.cs.washington.edu/projects/soccer/
• Semantic Scene Completion from a Single Depth Image
– https://arxiv.org/pdf/1611.08974.pdf
• TOM-Net: Learning Transparent Object Matting from a Single Image
– http://openaccess.thecvf.com/content_cvpr_2018/papers/Chen_TOMNet_Learning_Transparent_CVPR_2018_paper.pdf
• Material-Based Segmentation of Objects
– http://people.compute.dtu.dk/jerf/papers/matseg.pdf
• Single-Shot Analysis of Refractive Shape Using Convolutional Neural Networks
– https://cseweb.ucsd.edu/~mkchandraker/pdf/wacv19_transparent.pdf
• A Geodesic Active Contour Framework for Finding Glass
– http://isda.ncsa.illinois.edu/~kmchenry/documents/cvpr06a.pdf
References
• Friend or Foe: Exploiting Sensor Failures for Transparent Object Localization and Classification
– https://www.uni-koblenz.de/~agas/Public/Seib2017FOF.pdf
• Glass Object Localization by Joint Inference of Boundary and Depth
– https://xmhe.bitbucket.io/papers/icpr12.pdf
• A Fixed Viewpoint Approach for Dense Reconstruction of Transparent Objects
– https://www.cv-foundation.org/openaccess/content_cvpr_2015/app/2B_072.pdf
• Seeing Glassware: from Edge Detection to Pose Estimation and Shape Recovery
– http://www.roboticsproceedings.org/rss12/p21.pdf
• Learning Dexterous In-Hand Manipulation
– https://arxiv.org/abs/1808.00177
• Domain Randomization for Transferring Deep Neural Networks from Simulation to the Real World
– https://arxiv.org/abs/1703.06907