April 06, 18
Slide overview
2018/04/06
Deep Learning JP:
http://deeplearning.jp/seminar-2/
DL reading group materials
DEEP LEARNING JP [DL Papers]
“Composable Deep Reinforcement Learning for Robotic Manipulation”
Iori Yanokura
Zero-Shot Visual Imitation (ICLR 2018)
http://deeplearning.jp/
• UC Berkeley, Sergey Levine
• Project page
• https://sites.google.com/view/composing-real-world-policies/
• https://github.com/haarnoja/softqlearning
• Authors
• Tuomas Haarnoja, Vitchyr Pong, Aurick Zhou, Murtaza Dalal, Pieter Abbeel, and Sergey Levine
Maximum Entropy RL
• Maximum Entropy (Ziebart 2010)
• http://bair.berkeley.edu/blog/2017/10/06/soft-q-learning/
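For reference, the maximum entropy objective is commonly written as below. This is a sketch in assumed notation, not taken from the slide: the temperature α, the entropy term H, and the state-action distribution ρ_π are symbols borrowed from the soft Q-learning literature.

```latex
\pi^{*}_{\mathrm{MaxEnt}}
  = \arg\max_{\pi} \sum_{t}
    \mathbb{E}_{(s_t, a_t) \sim \rho_{\pi}}
    \left[ r(s_t, a_t) + \alpha \, \mathcal{H}\big(\pi(\cdot \mid s_t)\big) \right]
```

Setting α = 0 would recover the standard expected-return objective.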
Soft Q-Learning
• Q
• Soft Bellman Equation (Haarnoja, 2017)
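A minimal sketch of the soft value function and the soft Bellman backup may help here. It assumes a small discrete action set so that the log-sum-exp can be computed exactly; the method itself targets continuous actions, where this integral is approximated by sampling. The function names and the `alpha` temperature parameter are hypothetical and chosen only for illustration.

```python
import numpy as np

def soft_value(q_values, alpha=1.0):
    """V(s) = alpha * log sum_a exp(Q(s, a) / alpha), computed stably.

    Assumes the last axis of q_values indexes a discrete action set.
    """
    q = np.asarray(q_values, dtype=float) / alpha
    m = q.max(axis=-1, keepdims=True)          # subtract the max for stability
    return alpha * (m.squeeze(-1) + np.log(np.exp(q - m).sum(axis=-1)))

def soft_policy(q_values, alpha=1.0):
    """pi(a | s) = exp((Q(s, a) - V(s)) / alpha)."""
    q = np.asarray(q_values, dtype=float)
    v = np.expand_dims(soft_value(q, alpha), -1)
    return np.exp((q - v) / alpha)

def soft_bellman_backup(reward, next_q_values, gamma=0.99, alpha=1.0):
    """Target for Q(s, a): r + gamma * V(s'), with V the soft value above."""
    return reward + gamma * soft_value(next_q_values, alpha)
```

As `alpha` approaches zero, `soft_value` approaches the ordinary max over actions, so the backup reduces to the standard Q-learning target.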
COMPOSITIONALITY OF MAXIMUM ENTROPY POLICIES
• Compositionality
• Maximum Entropy RL
• policy
• Multi-objective settings
COMPOSITIONALITY OF MAXIMUM ENTROPY POLICIES
• Q_C*
• Q_Σ, Q_C*
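The composition idea can be sketched as follows, under the assumption (taken from the paper by Haarnoja et al. as I understand it, not from the recoverable slide text) that the composed Q-function Q_Σ is the average of the constituent Q-functions and is used in place of the true optimum Q_C*. The discrete action set, the function names, and the example Q-values are all hypothetical.

```python
import numpy as np

def compose_q(q_list):
    """Q_Sigma(s, a): average of the constituent Q-functions (assumed form)."""
    return np.mean(np.stack([np.asarray(q, dtype=float) for q in q_list]), axis=0)

def boltzmann_policy(q_values, alpha=1.0):
    """pi(a | s) proportional to exp(Q(s, a) / alpha) over discrete actions."""
    q = np.asarray(q_values, dtype=float)
    z = (q - q.max(axis=-1, keepdims=True)) / alpha   # stabilised logits
    p = np.exp(z)
    return p / p.sum(axis=-1, keepdims=True)

# Hypothetical Q-values for one state and three actions.
q_task_1 = [1.0, 3.0, 0.0]   # e.g. a policy trained on one objective
q_task_2 = [3.0, 1.0, 0.0]   # e.g. a policy trained on another objective

q_sigma = compose_q([q_task_1, q_task_2])
pi_sigma = boltzmann_policy(q_sigma)
```

The composed policy places its mass on actions that score reasonably under both constituent Q-functions, without any additional training on the combined reward.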
Bounding the Sub-Optimality of Composed Policies
• Appendix A
• KL-divergence
• Q_C* overestimate
Bounding the Sub-Optimality of Composed Policies
• D, γC*
Experiments
• Simulation
- MuJoCo
- 7-DoF Sawyer Robot
• Actions
- Torque command at each joint
• Observations
• Q-function, policy
- 100 or 200 units
B. Composing Policies for Pushing in Simulation
• MuJoCo
C. SQL for Real-World Manipulation
• Sawyer
• Reaching, Lego Stacking
• Stack policy, Avoid an obstacle
• compositionality of soft policies
C. SQL for Real-World Manipulation
• compositionality of soft policies
• Avoid a fixed obstacle policy
• Lego Stacking policy
• Composing Policies
• https://www.youtube.com/watch?time_continue=10&v=wdexoLS2cWU
Conclusion
• Soft Q-learning
- Model-free RL (DDPG, NAF)
• SQL, Composing Policy