[DL Reading Group] Composable Deep Reinforcement Learning for Robotic Manipulation


April 06, 18

#deep learning #Deep Learning #Robotic Manipulation #Reinforcement Learning #Maximum Entropy RL #Soft Q-Learning

Slide overview

2018/04/06
Deep Learning JP:
http://deeplearning.jp/seminar-2/

Deep Learning JP

@DeepLearning2023


Text of each slide

1. DEEP LEARNING JP [DL Papers] “Composable Deep Reinforcement Learning for Robotic Manipulation”, Iori Yanokura, Zero-Shot Visual Imitation (ICLR 2018), http://deeplearning.jp/


2.

• UC Berkeley (Sergey Levine)

• Project page: https://sites.google.com/view/composing-real-world-policies/
• Code: https://github.com/haarnoja/softqlearning

• Authors: Tuomas Haarnoja, Vitchyr Pong, Aurick Zhou, Murtaza Dalal, Pieter Abbeel, and Sergey Levine


3. Maximum Entropy RL: maximize the expected reward plus the entropy of the policy (Ziebart 2010). http://bair.berkeley.edu/blog/2017/10/06/soft-q-learning/

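For reference, the maximum entropy objective mentioned on this slide is usually written as below (Ziebart 2010; Haarnoja et al. 2017). The temperature α, which weights the entropy bonus against the reward, and the state-action marginal ρ_π are notation assumed here; they do not appear on the slide.

```latex
J(\pi) = \sum_{t} \mathbb{E}_{(s_t, a_t) \sim \rho_\pi}\left[ r(s_t, a_t) + \alpha\, \mathcal{H}\big(\pi(\cdot \mid s_t)\big) \right],
\qquad
\mathcal{H}\big(\pi(\cdot \mid s)\big) = -\mathbb{E}_{a \sim \pi(\cdot \mid s)}\left[ \log \pi(a \mid s) \right]
```

With α = 0 this reduces to the standard expected-return objective.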

4. Soft Q-Learning: learn the soft Q-function by iterating the Soft Bellman Equation (Haarnoja, 2017)
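To make the soft Bellman equation concrete, here is a minimal tabular sketch, assuming a tiny deterministic MDP with discrete actions. It is soft Q-iteration (exact dynamic programming), not the sample-based Soft Q-Learning algorithm with neural-network function approximators for continuous actions described in the paper; the MDP, the temperature `alpha`, and all function names are hypothetical.

```python
import math


def soft_value(q_row, alpha):
    """Soft value V(s) = alpha * log sum_a exp(Q(s, a) / alpha), computed stably."""
    m = max(q_row)
    return m + alpha * math.log(sum(math.exp((q - m) / alpha) for q in q_row))


def soft_q_iteration(rewards, next_state, gamma=0.9, alpha=1.0, iters=500):
    """Tabular soft Q-iteration on a small deterministic MDP (hypothetical helper).

    rewards[s][a]    : immediate reward r(s, a)
    next_state[s][a] : index of the successor state s' (deterministic for simplicity)
    Each sweep applies the soft Bellman backup
        Q(s, a) <- r(s, a) + gamma * V(s'),  V(s') = soft_value(Q(s', .), alpha)
    """
    n_states = len(rewards)
    q = [[0.0] * len(rewards[s]) for s in range(n_states)]
    for _ in range(iters):
        v = [soft_value(q[s], alpha) for s in range(n_states)]
        q = [[rewards[s][a] + gamma * v[next_state[s][a]]
              for a in range(len(rewards[s]))]
             for s in range(n_states)]
    return q


def boltzmann_policy(q_row, alpha):
    """Maximum entropy policy pi(a|s) = exp((Q(s, a) - V(s)) / alpha)."""
    v = soft_value(q_row, alpha)
    return [math.exp((q - v) / alpha) for q in q_row]


# Hypothetical 2-state, 2-action MDP: action 0 moves to state 0, action 1 to state 1
rewards = [[1.0, 0.0], [0.0, 2.0]]
next_state = [[0, 1], [0, 1]]
q = soft_q_iteration(rewards, next_state)
pi = [boltzmann_policy(q[s], alpha=1.0) for s in range(2)]
```

Because the backup is a γ-contraction, a few hundred sweeps are enough for this toy problem; the resulting policy puts more probability on higher-Q actions while keeping every action's probability nonzero.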

5. COMPOSITIONALITY OF MAXIMUM ENTROPY POLICIES
• Compositionality
• Maximum Entropy RL
• Policy
• Multi-objective settings

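6. COMPOSITIONALITY OF MAXIMUM ENTROPY POLICIES
• Q_c*: the optimal soft Q-function for constituent task c
• Q_Σ*: the optimal soft Q-function for the combined task with the averaged reward r_Σ = (1/C) Σ_c r_c
• Instead of running RL again to learn Q_Σ*, approximate it by composing the Q_c*: Q_Σ(s, a) = (1/C) Σ_c Q_c*(s, a)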
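The composition idea can be illustrated for a single state with discrete actions: averaging the constituent Q-functions and taking the Boltzmann policy of the result gives the same distribution as a normalized product (geometric mean) of the constituent Boltzmann policies. The sketch below assumes hypothetical Q-values for a "reaching" and an "obstacle avoidance" task and hypothetical helper names; the paper itself works with continuous torque actions, so this is only an illustration.

```python
import math


def boltzmann(q_row, alpha=1.0):
    """Boltzmann (maximum entropy) policy: pi(a) proportional to exp(Q(a) / alpha)."""
    m = max(q_row)
    w = [math.exp((q - m) / alpha) for q in q_row]
    z = sum(w)
    return [x / z for x in w]


def compose_q(q_rows):
    """Composed Q-function at one state: Q_Sigma(a) = (1/C) * sum_c Q_c*(a)."""
    c = len(q_rows)
    return [sum(col) / c for col in zip(*q_rows)]


def product_of_experts(policies):
    """Normalized geometric mean of the constituent policies: prod_c pi_c(a)^(1/C)."""
    c = len(policies)
    w = [math.exp(sum(math.log(p) for p in col) / c) for col in zip(*policies)]
    z = sum(w)
    return [x / z for x in w]


# Hypothetical Q-values at one state for 3 discrete actions
q_reach = [2.0, 0.5, -1.0]   # task 1: reaching (prefers action 0)
q_avoid = [-3.0, 1.0, 0.5]   # task 2: obstacle avoidance (prefers action 1)

alpha = 1.0
pi_sigma = boltzmann(compose_q([q_reach, q_avoid]), alpha)
poe = product_of_experts([boltzmann(q_reach, alpha), boltzmann(q_avoid, alpha)])
```

Here the composed policy favors action 1, which scores reasonably under both Q-functions, rather than the reaching policy's favorite action 0, which the avoidance Q-function penalizes heavily.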

7. Bounding the Sub-Optimality of Composed Policies
• Appendix A (KL-divergence): the composed Q_Σ, built from the Q_c*, overestimates Q_Σ*
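The overestimation claim on this slide can be checked numerically on a small example. The sketch below assumes a hypothetical 2-state, 2-action deterministic MDP and tabular soft Q-iteration (not the paper's setup), and compares the composed Q_Σ = (Q_1* + Q_2*)/2 with Q_Σ*, the optimal soft Q-function for the averaged reward. The gap Q_Σ − Q_Σ* should be nonnegative because the soft value α log Σ exp(Q/α) is convex in Q.

```python
import math


def soft_value(q_row, alpha):
    """V(s) = alpha * log sum_a exp(Q(s, a) / alpha), computed stably."""
    m = max(q_row)
    return m + alpha * math.log(sum(math.exp((q - m) / alpha) for q in q_row))


def soft_q_iteration(rewards, nxt, gamma=0.9, alpha=1.0, iters=500):
    """Tabular soft Bellman backups on a deterministic MDP (hypothetical helper)."""
    n = len(rewards)
    q = [[0.0] * len(rewards[s]) for s in range(n)]
    for _ in range(iters):
        v = [soft_value(q[s], alpha) for s in range(n)]
        q = [[rewards[s][a] + gamma * v[nxt[s][a]] for a in range(len(rewards[s]))]
             for s in range(n)]
    return q


def average(t1, t2):
    """Element-wise average of two tables indexed [s][a]."""
    return [[(a + b) / 2 for a, b in zip(x, y)] for x, y in zip(t1, t2)]


# Hypothetical MDP: action 0 moves to state 0, action 1 moves to state 1
nxt = [[0, 1], [0, 1]]
r1 = [[1.0, 0.0], [0.0, 0.0]]   # task 1 rewards staying in state 0
r2 = [[0.0, 0.0], [0.0, 1.0]]   # task 2 rewards staying in state 1

q1 = soft_q_iteration(r1, nxt)
q2 = soft_q_iteration(r2, nxt)
q_sigma_star = soft_q_iteration(average(r1, r2), nxt)  # optimal for averaged reward
q_sigma = average(q1, q2)                              # composed approximation

gap = [[a - b for a, b in zip(x, y)] for x, y in zip(q_sigma, q_sigma_star)]
```

The gap is strictly positive here because the two tasks prefer different actions; if the constituent Q-functions differed only by a per-state constant, the gap would vanish.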

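8. Bounding the Sub-Optimality of Composed Policies
• Lower bound via a correction term C*, defined recursively from the divergence D between the constituent policies and the discounted term γC*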
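Following our reading of the paper, the bound referred to on slides 7 and 8 has roughly the form below. Here D(s′) stands for the divergence between the constituent policies at s′, written D on the slide; its exact definition is left abstract and should be taken from the paper.

```latex
Q_\Sigma(s, a) \;\ge\; Q^*_\Sigma(s, a) \;\ge\; Q_\Sigma(s, a) - C^*(s, a),
\qquad
C^*(s, a) = \gamma\, \mathbb{E}_{s' \sim p(\cdot \mid s, a)}\Big[ D(s') + \max_{a'} C^*(s', a') \Big]
```

The upper bound is the overestimation property; the lower bound says the composed policy is close to optimal when the constituent policies agree, since C* then shrinks toward zero.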

9. Experiments
• Simulation: MuJoCo
• 7-DoF Sawyer robot
• Actions: torque command at each joint
• Observations
• Q-function and policy: 100 or 200 hidden units

10.

B. Composing Policies for Pushing in Simulation
• MuJoCo

11.

C. SQL for Real-World Manipulation
• Sawyer robot
• Policies: reaching, stacking, avoiding an obstacle
• Compositionality of soft policies: reaching, Lego stacking

12.

C. SQL for Real-World Manipulation
• Compositionality of soft policies: an "avoid a fixed obstacle" policy composed with a "Lego stacking" policy
• Video (Composing Policies): https://www.youtube.com/watch?time_continue=10&v=wdexoLS2cWU


13.


Conclusion
• Soft Q-learning
  - Compared with other model-free RL methods (DDPG, NAF)
• SQL: composing policies