[DL Reading Group] Real-Time Semantic Stereo Matching

December 13, 2019

#deep learning #Stereo Matching #Segmentation #Real-Time Processing #Sugisaki Hiroaki #Semantic Stereo Matching

Slide Overview

2019/12/13
Deep Learning JP:
http://deeplearning.jp/seminar-2/

Slide Text

Real-Time Semantic Stereo Matching (slide 1)
Sugisaki Hiroaki (third-year undergraduate, Sophia University)

Contents (slide 2)
● Real-Time Semantic Stereo Matching
  ○ https://arxiv.org/abs/1910.00541
  ○ Pier Luigi Dovesi, Matteo Poggi, Lorenzo Andraghetti, Miquel Martí, Hedvig Kjellström, Alessandro Pieropan, Stefano Mattoccia
  ○ 2019/10
  ○ RTS2Net
    ■ A stereo matching model that incorporates segmentation
    ■ A segmentation model that incorporates stereo matching
  ○ Proposes a relatively simple model that is more computationally efficient than complex, costly SOTA models
  ○ The trade-off between model size and accuracy can be switched flexibly

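Stereo Matching Basics (slide 3)
● Stereo camera
  ○ A camera that captures an object simultaneously from multiple different viewpoints
● Stereo matching
  ○ Computing the disparity between the images captured by a stereo camera
  ○ Estimating the depth of the object from that disparity by triangulation
Figure sources: left, ステレオカメラ (Stereo camera) - Wikipedia: https://ja.wikipedia.org/wiki/%E3%82%B9%E3%83%86%E3%83%AC%E3%82%AA%E3%82%AB%E3%83%A1%E3%83%A9; right, 2003 Stereo Datasets: http://vision.middlebury.edu/stereo/data/scenes2003/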
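
The triangulation step can be made concrete for a rectified stereo pair, where depth is Z = f·B/d. This is a minimal sketch assuming a rectified pair, a focal length f in pixels, and a baseline B in meters; the numbers in the usage lines are hypothetical, not the calibration of any real dataset.

```python
def depth_from_disparity(disparity_px: float, focal_px: float, baseline_m: float) -> float:
    """Triangulate depth Z = f * B / d for a rectified stereo pair.

    disparity_px: horizontal offset x_left - x_right of a matched point, in pixels
    focal_px:     focal length, in pixels
    baseline_m:   distance between the two camera centers, in meters
    Returns depth in meters.
    """
    if disparity_px <= 0:
        # Zero disparity means a point at infinity; negative disparity is invalid.
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px


# Hypothetical rig: f = 700 px, B = 0.5 m.
print(depth_from_disparity(35, 700, 0.5))  # 10.0
print(depth_from_disparity(70, 700, 0.5))  # 5.0
```

Doubling the disparity halves the depth, which is why points with larger disparity are closer to the camera.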

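Disparity Map (slide 4)
● Disparity
  ○ The pixel offset between the corresponding points of an object in the left and right images
  ○ The larger a point's disparity, the closer it is to the camera
● Disparity map
  ○ A 2D image showing the magnitude of the disparity at each pixel
Figure: left image, right image, and disparity map (source: Disparity Map - 2003 Stereo Datasets: http://vision.middlebury.edu/stereo/data/scenes2003/)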
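
To show what a disparity map is computed from, here is a classical sum-of-absolute-differences (SAD) block matcher. It is not the learned method of the paper; it is a sketch that assumes rectified grayscale images given as nested lists, and the convention that a left pixel at column x matches the right pixel at column x - d.

```python
def block_matching_disparity(left, right, max_disp, radius=1):
    """Winner-take-all SAD block matching on rectified grayscale images.

    left, right: images as lists of rows of numbers (same size)
    max_disp:    largest disparity to test
    radius:      half-size of the square matching window
    Returns a disparity map (list of rows of ints).
    """
    h, w = len(left), len(left[0])

    def clamp(v, lo, hi):
        # Replicate border pixels when the window leaves the image.
        return min(max(v, lo), hi)

    disp = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            best_d, best_cost = 0, float("inf")
            # Only disparities that keep x - d inside the right image are tested.
            for d in range(min(max_disp, x) + 1):
                cost = 0
                for dy in range(-radius, radius + 1):
                    yy = clamp(y + dy, 0, h - 1)
                    for dx in range(-radius, radius + 1):
                        xl = clamp(x + dx, 0, w - 1)
                        xr = clamp(x + dx - d, 0, w - 1)
                        cost += abs(left[yy][xl] - right[yy][xr])
                if cost < best_cost:  # cheapest match wins; ties keep the smaller d
                    best_d, best_cost = d, cost
            disp[y][x] = best_d
    return disp


# Synthetic pair: the left image is the right image shifted 2 px to the right.
right = [[(7 * x * x + 3 * y) % 50 for x in range(12)] for y in range(3)]
left = [[row[max(x - 2, 0)] for x in range(len(row))] for row in right]
disp = block_matching_disparity(left, right, max_disp=4)
print(disp[1][3:10])  # [2, 2, 2, 2, 2, 2, 2]
```

Learned methods replace the raw pixel differences with learned features, but the search over candidate disparities per pixel is the same idea.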

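Background (slide 5)
● Stereo matching and segmentation are mutually related
  ○ Stereo matching struggles with appearance changes caused by, e.g., light reflections; giving it segmentation information tells it which pixels belong to the same object
● Models linking the two have been proposed in prior work [8], [9]
  ○ These models are complex, and their inference speed is not practical
● Goal: a lightweight, simple model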

Proposed Model (slide 6)

Encoder (slide 7)
● The encoder splits the input image into features at multiple resolutions
● The hyperparameter c adjusts the resolutions of this split
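
A rough sketch of the multi-resolution split, using 2x2 average pooling as a stand-in for the learned strided convolutions of a real encoder. The scales (1/4, 1/8, 1/16) are an assumption chosen for illustration, and the hyperparameter c is not modeled here.

```python
def avg_pool2(img):
    """Halve resolution by averaging each non-overlapping 2x2 block (odd edges dropped)."""
    h, w = len(img) // 2, len(img[0]) // 2
    return [
        [(img[2 * y][2 * x] + img[2 * y][2 * x + 1]
          + img[2 * y + 1][2 * x] + img[2 * y + 1][2 * x + 1]) / 4
         for x in range(w)]
        for y in range(h)
    ]


def build_pyramid(img, scales=(4, 8, 16)):
    """Return {s: image downsampled by factor s}; each s must be a power of two."""
    pyramid, current, factor = {}, img, 1
    for s in sorted(scales):
        while factor < s:
            current = avg_pool2(current)
            factor *= 2
        if factor != s:
            raise ValueError(f"scale {s} is not a power of two")
        pyramid[s] = current
    return pyramid


img = [[float(x) for x in range(16)] for _ in range(16)]
pyr = build_pyramid(img)
print({s: (len(m), len(m[0])) for s, m in pyr.items()})  # {4: (4, 4), 8: (2, 2), 16: (1, 1)}
print(pyr[16])  # [[7.5]]
```

Downstream stages can then work from the coarsest map upward, which is what keeps the low-resolution passes cheap.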

Segmentation (slide 8)
● Low-resolution outputs are upsampled and then corrected at higher resolutions
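
The coarse-to-fine idea can be sketched as: start from the coarsest class scores, upsample by 2, and add a correction computed at the finer resolution. Nearest-neighbor upsampling and the additive residual are assumptions for illustration; here the residual maps are given directly, whereas a network would predict them.

```python
def upsample_nearest2(m):
    """2x nearest-neighbor upsampling of a 2D map (list of rows)."""
    return [[v for v in row for _ in range(2)] for row in m for _ in range(2)]


def coarse_to_fine(coarse_logits, residuals):
    """coarse_logits: [C][h][w] class scores at the lowest resolution.
    residuals: list of [C][2^k h][2^k w] corrections, one per finer stage.
    Returns the class scores at the finest resolution."""
    logits = coarse_logits
    for res in residuals:
        logits = [
            [[u + r for u, r in zip(urow, rrow)]
             for urow, rrow in zip(upsample_nearest2(lc), rc)]
            for lc, rc in zip(logits, res)
        ]
    return logits


def argmax_labels(logits):
    """Per-pixel index of the class with the highest score."""
    num_classes, h, w = len(logits), len(logits[0]), len(logits[0][0])
    return [[max(range(num_classes), key=lambda c: logits[c][y][x]) for x in range(w)]
            for y in range(h)]


# Two classes; the 1x1 coarse prediction says "class 0" everywhere,
# and the 2x2 stage corrects the bottom-right pixel to class 1.
coarse = [[[1.0]], [[0.0]]]
residual = [[[0.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 2.0]]]
print(argmax_labels(coarse_to_fine(coarse, [residual])))  # [[0, 0], [0, 1]]
```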

Disparity (slide 9)
● Cost Convolution
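
The slide only names "Cost Convolution". One common formulation, assumed here rather than taken from the slide, is to build a cost volume from feature distances over candidate disparities and then regress a sub-pixel disparity with a soft argmin; the learned 3D convolutions that would filter the volume are omitted. The sketch works on a single image row of feature vectors.

```python
import math


def cost_volume_row(left_feats, right_feats, max_disp):
    """L1 matching cost for every pixel x and disparity d in [0, max_disp].

    left_feats, right_feats: one image row, as a list of feature vectors.
    Disparities that would fall outside the right image get cost +inf.
    """
    volume = []
    for x, fl in enumerate(left_feats):
        costs = []
        for d in range(max_disp + 1):
            if x - d < 0:
                costs.append(math.inf)
            else:
                fr = right_feats[x - d]
                costs.append(sum(abs(a - b) for a, b in zip(fl, fr)))
        volume.append(costs)
    return volume


def soft_argmin(costs, temperature=1.0):
    """Differentiable disparity estimate: sum_d d * softmax(-cost_d / T)."""
    best = min(costs)  # subtract the minimum for numerical stability
    weights = [0.0 if c == math.inf else math.exp(-(c - best) / temperature)
               for c in costs]
    total = sum(weights)
    return sum(d * w for d, w in enumerate(weights)) / total


# Right row of 1-D features; the left row is the same content shifted by 2 px.
right = [[0.0], [10.0], [20.0], [30.0], [40.0], [50.0]]
left = [right[max(x - 2, 0)] for x in range(len(right))]
volume = cost_volume_row(left, right, max_disp=4)
print(volume[4])                         # [20.0, 10.0, 0.0, 10.0, 20.0]
print(round(soft_argmin(volume[4]), 6))  # 2.0
```

Unlike a hard argmin, the soft argmin is differentiable and can return fractional disparities, which is why it is a popular choice for end-to-end training.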

Refinement (slide 10)
● The disparity map and the segmentation tensor are concatenated and upsampled to progressively refine the output
● Prior work reports that embedding the two jointly is effective [8], [9]
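
Preparing the refinement input can be sketched as follows. One detail worth making explicit: when a disparity map is upsampled by 2, its values must also be multiplied by 2, because disparity is measured in pixels of the current resolution, while segmentation scores are resampled without scaling. Nearest-neighbor upsampling and the channel layout [disparity, class scores...] are assumptions for illustration, and the refinement network itself is not shown.

```python
def upsample_nearest2(m):
    """2x nearest-neighbor upsampling of a 2D map (list of rows)."""
    return [[v for v in row for _ in range(2)] for row in m for _ in range(2)]


def refinement_input(disparity, seg_logits):
    """Stack [disparity, class_0 scores, ..., class_{C-1} scores] at 2x resolution.

    disparity:  2D map of disparities, in pixels at the current resolution
    seg_logits: [C][h][w] class scores at the same resolution
    """
    # Disparities double when the image width doubles.
    disp_up = [[2.0 * v for v in row] for row in upsample_nearest2(disparity)]
    # Class scores do not depend on resolution, so they are only resampled.
    seg_up = [upsample_nearest2(channel) for channel in seg_logits]
    return [disp_up] + seg_up  # channel-wise concatenation


channels = refinement_input([[3.0]], [[[1.0]], [[-1.0]]])
print(len(channels))  # 3
print(channels[0])    # [[6.0, 6.0], [6.0, 6.0]]
```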

Experiments (slide 11)
● Dataset
  ○ KITTI 2015
● Comparison with AnyNet
  ○ AnyNet: the model RTS2Net is based on, which has only a stereo matching component
  ○ Comparison while varying the hyperparameter c
● Comparison on the KITTI 2015 online benchmark
  ○ Stereo matching
  ○ Segmentation

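Results: Comparison with AnyNet (slide 12)
● Evaluation metrics
  ○ EPE (end-point error)
    ■ Mean absolute disparity error, in pixels
  ○ D1-all
    ■ Percentage of pixels whose disparity error exceeds 3 px (KITTI 2015 additionally requires the error to exceed 5% of the ground-truth disparity)
  ○ mIoU
    ■ Mean intersection over union
  ○ pAcc
    ■ Per-pixel accuracy
● Results
  ○ With c=1, accuracy is close to AnyNet
  ○ Moving c toward 16 yields higher accuracy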
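
The four metrics on this slide can be written out directly. This sketch works on flattened per-pixel lists; the D1 outlier rule (error > 3 px and > 5% of the ground truth) follows the KITTI 2015 stereo benchmark definition, and classes absent from both prediction and ground truth are skipped in mIoU, which is one common convention rather than the only one.

```python
def epe(pred, gt, valid):
    """End-point error: mean absolute disparity error over valid pixels."""
    errors = [abs(p - g) for p, g, v in zip(pred, gt, valid) if v]
    return sum(errors) / len(errors)


def d1_all(pred, gt, valid):
    """Percentage of valid pixels whose error is > 3 px and > 5% of the ground truth."""
    flags = [abs(p - g) > 3 and abs(p - g) > 0.05 * abs(g)
             for p, g, v in zip(pred, gt, valid) if v]
    return 100.0 * sum(flags) / len(flags)


def mean_iou(pred, gt, num_classes):
    """Mean intersection over union across classes that appear in pred or gt."""
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, g in zip(pred, gt) if p == c and g == c)
        union = sum(1 for p, g in zip(pred, gt) if p == c or g == c)
        if union > 0:
            ious.append(inter / union)
    return sum(ious) / len(ious)


def pixel_accuracy(pred, gt):
    """Fraction of pixels whose predicted label matches the ground truth."""
    return sum(p == g for p, g in zip(pred, gt)) / len(gt)


print(epe([10, 20, 30, 40], [10, 24, 30, 41], [True] * 4))     # 1.25
print(d1_all([10, 20, 30, 40], [10, 24, 30, 41], [True] * 4))  # 25.0
print(round(mean_iou([0, 0, 1, 1], [0, 1, 1, 1], 2), 4))       # 0.5833
print(pixel_accuracy([0, 0, 1, 1], [0, 1, 1, 1]))              # 0.75
```

EPE and D1-all score the disparity output (lower is better), while mIoU and pAcc score the segmentation output (higher is better).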

KITTI 2015 Online Benchmark (slide 13)
● Stereo
  ○ More accurate than the real-time frameworks MADNet and StereoNet
● Segmentation
  ○ 30 times faster than SegStereo, a semantic stereo method

Summary (slide 14)
● Changing the hyperparameter makes it possible to lighten the model flexibly
● Achieves accuracy close to real-time SOTA methods