2018/2/2
Deep Learning JP:
http://deeplearning.jp/seminar-2/
DL paper-reading seminar (輪読会) material
1 DEEP LEARNING JP [DL Papers] "Zero-Shot Visual Imitation" (ICLR 2018). Presenter: Iori Yanokura, JSK Lab. http://deeplearning.jp/
2 Paper information
• Accepted to ICLR 2018
• Project page: https://sites.google.com/view/zero-shot-visual-imitation/home
• Reviews: 8 (confidence: 4), 8 (confidence: 3), 7 (confidence: 5) = rank 986 / 1000 = top 2%
• Authors: Deepak Pathak, Parsa Mahmoudieh, Michael Luo, Pulkit Agrawal, Dian Chen, Fred Shentu, Evan Shelhamer, Jitendra Malik, Alexei A. Efros, Trevor Darrell
3 Background
• Reinforcement learning presupposes a reward function, but hand-designing rewards (reward engineering) is difficult in practice (Sutton & Barto, 1998).
4 Imitation Learning
• Two standard approaches: behavioral cloning and inverse reinforcement learning.
6 Related work
• One-shot imitation learning (Duan et al., 2017): meta-learns so that a new task can be imitated from a single demonstration.
7 Imitation from Visual Demonstration
• Prior work learns to imitate directly from visual observations: (Nair et al., 2017), (Sermanet et al., 2016), (Liu et al., 2017).
9 Goal-conditioned value functions and policies
• Universal value functions (Schaul et al., 2015): extend V(s; θ) to V(s, g; θ), so a single function generalizes across goals g.
• Hindsight experience replay (Andrychowicz et al., 2017): relabels trajectories with the goals actually reached, densifying the learning signal.
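To make the goal-conditioning idea concrete, here is a minimal sketch (my own, not from the slides; layer sizes and names are illustrative assumptions) of a value network that simply takes the goal as an extra input, in the spirit of universal value function approximators:

```python
# Minimal sketch (not from the paper): a goal-conditioned value function
# V(s, g; theta). All layer sizes are illustrative assumptions.
import torch
import torch.nn as nn

class GoalConditionedValue(nn.Module):
    def __init__(self, state_dim: int, goal_dim: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + goal_dim, hidden),  # concatenate state and goal
            nn.ReLU(),
            nn.Linear(hidden, 1),                     # scalar value estimate
        )

    def forward(self, state: torch.Tensor, goal: torch.Tensor) -> torch.Tensor:
        # Unlike a plain V(s), the goal g is an input, so one network
        # generalizes over many goals (Schaul et al., 2015).
        return self.net(torch.cat([state, goal], dim=-1))
```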
10 3. LEARNING TO IMITATE WITHOUT EXPERT SUPERVISION
11 Method overview
• Imitation Learning: the demonstration provides no expert actions; the agent must infer them.
• Visual Demonstration: the demonstration is given as a sequence of images (waypoints).
• Forward and Inverse Dynamics: dynamics models learned from self-supervised exploration yield a multi-step policy (see the model sketch after the next slide).
• Goal Conditioning: conditioning the policy on a goal image is what enables zero-shot imitation of new tasks.
12 Forward dynamics (architecture figure)
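As a rough illustration of the two self-supervised models this method relies on, here is a minimal PyTorch sketch (my own; the architectures and sizes are assumptions, and the paper works from convolutional image features rather than flat vectors): an inverse model h(s_t, s_{t+1}) → a_t, which plays the role of the one-step skill function, and a forward model f(s_t, a_t) → s_{t+1}:

```python
# Minimal sketch (my own; layer sizes are assumptions) of the two
# self-supervised dynamics models.
#   Inverse model  h: (s_t, s_{t+1}) -> a_t   (one-step skill function)
#   Forward model  f: (s_t, a_t)     -> s_{t+1}
import torch
import torch.nn as nn

class InverseModel(nn.Module):
    """Predicts which action produced the observed transition."""
    def __init__(self, state_dim: int, n_actions: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions),  # logits over discrete actions
        )

    def forward(self, s_t, s_next):
        return self.net(torch.cat([s_t, s_next], dim=-1))

class ForwardModel(nn.Module):
    """Predicts the next state features given state and action."""
    def __init__(self, state_dim: int, n_actions: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + n_actions, hidden), nn.ReLU(),
            nn.Linear(hidden, state_dim),
        )

    def forward(self, s_t, a_onehot):
        return self.net(torch.cat([s_t, a_onehot], dim=-1))
```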
13 3.1 LEARNING THE PARAMETRIC SKILL FUNCTION
14 3.1 LEARNING THE PARAMETRIC SKILL FUNCTION
• Trained on one-step transitions (s_t, a_t, s_{t+1}) collected by the agent's own exploration; the executed action a_t serves as the ground-truth label.
• Optimized with a cross-entropy loss via SGD.
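A minimal sketch of this one-step training update, assuming the illustrative InverseModel from the previous sketch and discrete actions; the action the agent actually executed is the label:

```python
# Minimal training-step sketch: the executed action from self-supervised
# exploration supervises the skill function via cross-entropy + SGD.
import torch
import torch.nn.functional as F

def psf_train_step(inverse_model, optimizer, s_t, s_next, a_t):
    """One SGD step on a batch of transitions.
    s_t, s_next: (B, state_dim) state features; a_t: (B,) action indices."""
    logits = inverse_model(s_t, s_next)   # predicted action distribution
    loss = F.cross_entropy(logits, a_t)   # executed action is the label
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```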
15 3.2 MODELING ACTION DISTRIBUTION VIA FORWARD CONSISTENCY
• Many different actions (or action sequences) can produce the same state transition, so predicting the action label alone is ambiguous.
• The multi-step PSF is a model-free RNN mapping states to action sequences; a learned forward dynamics model resolves the ambiguity by checking where each predicted action actually leads.
16 3.2 MODELING ACTION DISTRIBUTION VIA FORWARD CONSISTENCY
• The skill-function parameters θh and forward-dynamics parameters θf are trained jointly: an action predicted by the PSF is acceptable if the forward model f maps it to the true next state.
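My reconstruction of the joint forward-consistency objective of Section 3.2 (notation may differ slightly from the paper): the forward model is fit on the true action, the action predicted by the skill function must reach the same next state through f, and a standard action-prediction loss is retained:

```latex
% Reconstruction of the Sec. 3.2 objective; notation is approximate.
\begin{align}
\hat{a}_t &= h(x_t, x_{t+1};\, \theta_h)
  && \text{action proposed by the skill function} \\
\min_{\theta_h,\, \theta_f}\;
  &\; \| x_{t+1} - f(x_t, a_t;\, \theta_f) \|_2^2
  && \text{fit the forward model} \nonumber \\
  &+ \lambda\, \| x_{t+1} - f(x_t, \hat{a}_t;\, \theta_f) \|_2^2
  && \text{forward consistency of } \hat{a}_t \nonumber \\
  &+ \mathcal{L}(a_t, \hat{a}_t)
  && \text{action-prediction loss}
\end{align}
```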
17 3.3 GOAL SATISFACTION
• A learned goal recognizer classifies whether the current observation satisfies the current goal.
• When it fires, the agent advances to the next demonstration waypoint (see the loop sketch below).
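A minimal sketch of the resulting zero-shot imitation loop (my own pseudocode of Section 3.3; `psf`, `goal_reached`, `max_steps_per_goal`, and the gym-style `env` interface are illustrative assumptions): the demonstration supplies only images, and each frame is treated as a goal in sequence:

```python
# Sketch of the zero-shot imitation loop: follow a visual demonstration
# (images only, no expert actions) waypoint by waypoint.
def imitate(env, psf, goal_reached, demo_frames, max_steps_per_goal=50):
    obs = env.reset()
    for goal in demo_frames:                 # visit demo waypoints in order
        for _ in range(max_steps_per_goal):  # budget so we cannot get stuck
            if goal_reached(obs, goal):      # learned goal recognizer fires
                break                        # advance to the next waypoint
            action = psf(obs, goal)          # goal-conditioned skill function
            obs, _, done, _ = env.step(action)
            if done:
                return obs
    return obs
```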
18 4 Results
• Two testbeds: navigation in an indoor office environment and vision-based rope manipulation.
• The full PSF is compared against baseline variants.
19 4.1 NAVIGATION IN INDOOR OFFICE ENVIRONMENT
20 4.1 NAVIGATION IN INDOOR OFFICE ENVIRONMENT (Results)
https://www.youtube.com/watch?v=ynfVRM27YFU
https://www.youtube.com/watch?time_continue=3&v=OwvnqjgUqc8
21 4.2 VISION-BASED ROPE MANIPULATION
• Task: manipulate a rope into a demonstrated shape.
• State: RGB image. Action: pick point and drop point, following (Nair et al., 2017).
22 4.2 VISION-BASED ROPE MANIPULATION (Results)
https://www.youtube.com/watch?v=YlaojVXHagM
23 5 CONCLUSION
• An agent can acquire skills from its own self-supervised exploration, without rewards or expert actions, and then imitate a visual demonstration zero-shot.
• The goal-relabeling perspective is related to hindsight experience replay (Andrychowicz et al., 2017).
25 References
Ashvin Nair, Dian Chen, Pulkit Agrawal, Phillip Isola, Pieter Abbeel, Jitendra Malik, and Sergey Levine. Combining self-supervised learning and imitation for vision-based rope manipulation. In ICRA, 2017.
Deepak Pathak, Pulkit Agrawal, Alexei A. Efros, and Trevor Darrell. Curiosity-driven exploration by self-supervised prediction. In ICML, 2017.
Marcin Andrychowicz, Filip Wolski, Alex Ray, Jonas Schneider, Rachel Fong, Peter Welinder, Bob McGrew, Josh Tobin, Pieter Abbeel, and Wojciech Zaremba. Hindsight experience replay. arXiv preprint arXiv:1707.01495, 2017.
Tom Schaul, Daniel Horgan, Karol Gregor, and David Silver. Universal value function approximators. In ICML, pp. 1312-1320, 2015.
Yan Duan, Marcin Andrychowicz, Bradly Stadie, Jonathan Ho, Jonas Schneider, Ilya Sutskever, Pieter Abbeel, and Wojciech Zaremba. One-shot imitation learning. arXiv preprint arXiv:1703.07326, 2017.
Pierre Sermanet, Kelvin Xu, and Sergey Levine. Unsupervised perceptual rewards for imitation learning. arXiv preprint arXiv:1612.06699, 2016.
Bradly C. Stadie, Pieter Abbeel, and Ilya Sutskever. Third-person imitation learning. arXiv preprint arXiv:1703.01703, 2017.
YuXuan Liu, Abhishek Gupta, Pieter Abbeel, and Sergey Levine. Imitation from observation: Learning to imitate behaviors from raw video via context translation. arXiv preprint arXiv:1707.03374, 2017.
Manuel Watter, Jost Springenberg, Joschka Boedecker, and Martin Riedmiller. Embed to control: A locally linear latent dynamics model for control from raw images. In NIPS, 2015.
Dean A. Pomerleau. ALVINN: An autonomous land vehicle in a neural network. In Advances in Neural Information Processing Systems, pp. 305-313, 1989.
Richard S. Sutton and Andrew G. Barto. Reinforcement Learning: An Introduction. MIT Press, 1998.