Slide overview
March 10, 2017
Deep Learning JP: http://deeplearning.jp/seminar-2/
DL reading group materials

Bridging the Gap Between Value and Policy Based Reinforcement Learning
初谷怜慈 (Reiji Hatsugai)

Paper information
• Google Brain
• Ofir Nachum

Classification of RL algorithms
• RL algorithms fall broadly into the following four categories
• On-policy or off-policy
• Value-based or policy-based

RL algorithm's matrix
             Value-based     Policy-based
On-policy    SARSA           A3C, TRPO
Off-policy   DQN, Retrace    DDPG, ACER, Off-PAC

What is on-policy?
• The agent is updated using only trajectories obtained with the current policy
• It usually estimates Q and V that depend on the current policy π
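
SARSA
Q^π: the Q function when the policy is fixed to π
Bellman equation:
$$ Q^{\pi}(s,a) = r(s,a) + \gamma \, \mathbb{E}_{\pi}\left[ Q^{\pi}(s',a') \right] $$
Minimize L with respect to θ:
$$ L = \left( r(s,a) + \gamma \, Q^{\pi}_{\theta}(s',a') - Q^{\pi}_{\theta}(s,a) \right)^2 $$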
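
A minimal sketch of the SARSA loss above for a single transition. The dictionary-based table `q`, the function name `sarsa_loss`, and the toy states and actions are assumptions made for illustration; they do not appear in the slides.

```python
def sarsa_loss(q, s, a, r, s_next, a_next, gamma=0.99):
    """Squared one-step SARSA error for one transition.

    q is a hypothetical tabular Q function: a dict keyed by (state, action).
    a_next is the action the current policy actually took in s_next,
    which is what makes this target on-policy.
    """
    target = r + gamma * q[(s_next, a_next)]
    return (target - q[(s, a)]) ** 2


# Toy values, chosen only for illustration.
q = {("s0", "left"): 0.5, ("s1", "left"): 1.0, ("s1", "right"): 2.0}
loss = sarsa_loss(q, "s0", "left", r=1.0, s_next="s1", a_next="left", gamma=0.9)
print(loss)
```

In a function-approximation setting the target is normally treated as a constant when differentiating with respect to θ; the tabular sketch sidesteps that detail.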
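
What is off-policy?
• The agent is updated using trajectories that may come from policies other than the current one
• It usually estimates Q and V for the optimal policy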
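
Q-learning (DQN)
Q^∘: the Q function for the optimal policy
Bellman equation:
$$ Q^{\circ}(s,a) = r(s,a) + \gamma \max_{a'} Q^{\circ}(s',a') $$
Minimize L with respect to θ:
$$ L = \left( r(s,a) + \gamma \max_{a'} Q^{\circ}_{\theta}(s',a') - Q^{\circ}_{\theta}(s,a) \right)^2 $$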
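
For contrast with SARSA, here is a hedged sketch of the Q-learning loss above. The only difference is that the target takes the maximum over next actions instead of using the action that was actually taken. The table `q`, the `actions` list, and the function name are illustrative assumptions.

```python
def q_learning_loss(q, actions, s, a, r, s_next, gamma=0.99):
    """Squared one-step Q-learning error for one transition.

    q is a hypothetical tabular Q function: a dict keyed by (state, action).
    The target uses max over next actions, so the transition may come
    from any behaviour policy (off-policy).
    """
    target = r + gamma * max(q[(s_next, b)] for b in actions)
    return (target - q[(s, a)]) ** 2


# Toy values, chosen only for illustration.
actions = ["left", "right"]
q = {("s0", "left"): 0.5, ("s1", "left"): 1.0, ("s1", "right"): 2.0}
loss = q_learning_loss(q, actions, "s0", "left", r=1.0, s_next="s1", gamma=0.9)
print(loss)
```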
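
On-policy vs off-policy
• On-policy methods
  ○ Multi-step returns can be plugged into the update rule directly
  × Fresh trajectories have to be sampled for learning
• Off-policy methods
  ○ Every trajectory collected so far can be used for learning
  × Because of the max operator, only a single step can be used in the update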
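
N-step SARSA
Bellman equation:
$$ Q^{\pi}(s,a) = r(s,a) + \gamma \, \mathbb{E}_{\pi}\left[ Q^{\pi}(s',a') \right] $$
Use an n-step trajectory:
$$ L = \left( \sum_{i=0}^{n-1} \gamma^{i} r(s_i,a_i) + \gamma^{n} Q^{\pi}_{\theta}(s_n,a_n) - Q^{\pi}_{\theta}(s_0,a_0) \right)^2 $$
Increasing the number of steps reduces the bias of the target estimate.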
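
The n-step loss above can be sketched as follows. The argument layout (`states` and `actions` of length n+1, `rewards` of length n) and the tabular `q` are assumptions made for this example.

```python
def n_step_sarsa_loss(q, states, actions, rewards, gamma=0.99):
    """Squared n-step SARSA error along one on-policy trajectory.

    states and actions hold (s_0, ..., s_n) and (a_0, ..., a_n);
    rewards holds r(s_0, a_0), ..., r(s_{n-1}, a_{n-1}).
    q is a hypothetical tabular Q function keyed by (state, action).
    """
    n = len(rewards)
    discounted_return = sum(gamma ** i * rewards[i] for i in range(n))
    target = discounted_return + gamma ** n * q[(states[n], actions[n])]
    return (target - q[(states[0], actions[0])]) ** 2


# Toy 2-step trajectory, chosen only for illustration.
q = {(0, 0): 1.0, (2, 1): 4.0}
loss = n_step_sarsa_loss(q, states=[0, 1, 2], actions=[0, 0, 1],
                         rewards=[1.0, 2.0], gamma=0.5)
print(loss)
```

With n = 1 this reduces to the one-step SARSA loss; larger n relies more on observed rewards and less on the bootstrapped estimate.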
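
Q-learning (DQN)
$$ L = \left( r(s,a) + \gamma \max_{a'} Q^{\circ}_{\theta}(s',a') - Q^{\circ}_{\theta}(s,a) \right)^2 $$
The a' selected by the max operator differs from the a' in the actual trajectory, so the update cannot be extended to multiple steps.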
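
Value-based and policy-based
• Value-based: find the optimal value function
  ★ Learning method: fitting the value function
• Policy-based: find the optimal policy
  ★ Learning method: policy gradient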

Contribution
• Proposes a generalized Bellman optimality equation for entropy-regularized policies
• Builds on it to propose PCL (Path Consistency Learning), an off-policy, multi-step algorithm
• Explains value-based and policy-based methods from a unified viewpoint
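
What is entropy regularization?
• Training maximizes the entropy of the policy so that its output does not collapse to one-hot
• It is already used as one part of the loss function in A3C and similar methods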
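
General Bellman equation
Q^*: the entropy-regularized optimal Q function
Bellman equation:
$$ Q^{*}(s,a) = r(s,a) + \gamma \tau \log \sum_{a'} \exp\left( Q^{*}(s',a') / \tau \right) $$
• Generalizes the Bellman optimality equation by replacing the max operator with log-sum-exp
• Coincides with the max operator as τ → 0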
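
When τ → 0
$$ \tau \log \sum_{a'} \exp\left( Q^{*}(s',a') / \tau \right) = \tau \log\left( \exp\left( Q^{*}(s_M,a_M) / \tau \right) \sum_{a'} \exp\left( (Q^{*}(s',a') - Q^{*}(s_M,a_M)) / \tau \right) \right) $$
$$ = \max_{a'} Q^{*}(s',a') + \tau \log\left( \sum_{a'} \exp\left( (Q^{*}(s',a') - Q^{*}(s_M,a_M)) / \tau \right) \right) $$
Here (s_M, a_M) is the maximizing pair, so every exponent in the last sum is ≤ 0. The sum therefore stays between 1 and the number of actions, and the second term vanishes as τ → 0.
• The Appendix of the paper shows that this equation holds for the entropy-regularized policy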
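
A small numerical sketch of the limit above: the log-sum-exp backup is evaluated with the same "subtract the maximum" rewriting used in the derivation, and approaches the max as τ shrinks. The function name `soft_max_value` and the sample Q values are assumptions for illustration.

```python
import math


def soft_max_value(q_values, tau):
    """tau * log(sum(exp(q / tau))), computed by factoring out the maximum.

    This mirrors the derivation: max + tau * log(sum of exp(non-positive)).
    """
    q_max = max(q_values)
    return q_max + tau * math.log(sum(math.exp((q - q_max) / tau) for q in q_values))


# Hypothetical next-state Q values.
q_next = [1.0, 2.0, 3.0]
for tau in (1.0, 0.1, 0.01):
    # The gap to max(q_next) shrinks as tau decreases.
    print(tau, soft_max_value(q_next, tau) - max(q_next))
```

Factoring out the maximum is also the numerically stable way to compute the backup, since every exponent is non-positive.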
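
Before going multi-step
• Estimate the current optimal state value by maximizing an objective function over π for a single step
• In short: can what is done when deriving the Bellman equation be viewed as maximizing an objective function?
The following slides explain why, for the entropy-regularized V^* and π^*,
$$ V^{*}(s) = -\tau \log \pi^{*}(a \mid s) + r(s,a) + \gamma V^{*}(s') $$
holds. The detailed proof is in the Appendix of the paper.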
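
The case of the optimal state value function
Current state: s_0 with value v_0
Actions: {a_1, ..., a_n}
Next states: {s_1, ..., s_n} with values {v_1, ..., v_n}
Objective function:
$$ O_{MR}(\pi) = \sum_{i=1}^{n} \pi(a_i) \left( r_i + \gamma v_i^{\circ} \right) $$
When the objective is maximized, π becomes one-hot:
$$ v_0^{\circ} = O_{MR}(\pi^{\circ}) = \max_{i} \left( r_i + \gamma v_i^{\circ} \right) $$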
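
The entropy-regularized case (1)
Objective function:
$$ O_{ENT}(\pi) = \sum_{i=1}^{n} \pi(a_i) \left( r_i + \gamma v_i^{*} - \tau \log \pi(a_i) \right) $$
Rewriting, with S = \sum_{i'=1}^{n} \exp((r_{i'} + \gamma v_{i'}^{*}) / \tau):
$$ O_{ENT}(\pi) = -\tau \sum_{i=1}^{n} \pi(a_i) \log \frac{\pi(a_i)}{\exp\left( (r_i + \gamma v_i^{*}) / \tau \right) / S} + \tau \log S $$
Maximizing the objective → minimizing a KL divergence:
$$ \pi^{*}(a_i) = \frac{\exp\left( (r_i + \gamma v_i^{*}) / \tau \right)}{\sum_{i'=1}^{n} \exp\left( (r_{i'} + \gamma v_{i'}^{*}) / \tau \right)} $$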
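
The entropy-regularized case (2)
$$ v_0^{*} = O_{ENT}(\pi^{*}) = \tau \log \sum_{i=1}^{n} \exp\left( (r_i + \gamma v_i^{*}) / \tau \right) $$
$$ \pi^{*}(a_i) = \frac{\exp\left( (r_i + \gamma v_i^{*}) / \tau \right)}{\sum_{i'=1}^{n} \exp\left( (r_{i'} + \gamma v_{i'}^{*}) / \tau \right)} $$
Taking the log of the second equation and substituting the first gives, for every i:
$$ v_0^{*} = -\tau \log \pi^{*}(a_i) + r_i + \gamma v_i^{*} $$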
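
The one-step relations above can be checked numerically. The sketch below computes v_0^* and π^* from given rewards and next-state values, then evaluates the residual of the consistency equation for every action. The function name `soft_backup` and the numbers are assumptions for illustration.

```python
import math


def soft_backup(rewards, next_values, gamma, tau):
    """Return (v0, pi) for the entropy-regularized one-step problem."""
    q = [r + gamma * v for r, v in zip(rewards, next_values)]
    q_max = max(q)
    weights = [math.exp((x - q_max) / tau) for x in q]  # stable softmax weights
    total = sum(weights)
    v0 = q_max + tau * math.log(total)                  # tau * log-sum-exp
    pi = [w / total for w in weights]                   # softmax policy
    return v0, pi


# Hypothetical rewards r_i and next-state values v_i for three actions.
rewards = [1.0, 0.0, 0.5]
next_values = [0.0, 2.0, 1.0]
gamma, tau = 0.9, 0.5

v0, pi = soft_backup(rewards, next_values, gamma, tau)

# Residual of v0 = -tau * log pi(a_i) + r_i + gamma * v_i, one per action.
residuals = [-tau * math.log(p) + r + gamma * v - v0
             for p, r, v in zip(pi, rewards, next_values)]

# The objective O_ENT evaluated at the softmax policy.
objective = sum(p * (r + gamma * v - tau * math.log(p))
                for p, r, v in zip(pi, rewards, next_values))
print(residuals, objective - v0)
```

The notable point is that the residual is zero for every action, not only for the most probable one; this is what later allows arbitrary (off-policy) paths to be used.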
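
Consistency
• In general, the following equation holds for all (s, a, r), where s' is the next state (proof in the Appendix):
$$ V^{*}(s) = -\tau \log \pi^{*}(a \mid s) + r(s,a) + \gamma V^{*}(s') $$
Applying it inductively along a path:
$$ -V^{*}(s_1) + \gamma^{t-1} V^{*}(s_t) + R(s_{1:t}) - \tau G(s_{1:t}, \pi^{*}) = 0 $$
where
$$ R(s_{m:n}) = \sum_{i=0}^{n-m-1} \gamma^{i} r(s_{m+i}, a_{m+i}), \qquad G(s_{m:n}, \pi) = \sum_{i=0}^{n-m-1} \gamma^{i} \log \pi(a_{m+i} \mid s_{m+i}) $$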
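
To illustrate the multi-step consistency above, the sketch below solves a tiny deterministic MDP by soft value iteration, derives π^* from V^*, and evaluates the path residual for an arbitrary action sequence. The MDP (`NEXT`, `REWARD`), the constants, and all function names are hypothetical and chosen only for this example.

```python
import math

GAMMA, TAU = 0.9, 0.5
# Hypothetical deterministic MDP with 3 states and 2 actions:
# NEXT[s][a] is the next state, REWARD[s][a] the reward.
NEXT = [[1, 2], [2, 0], [0, 1]]
REWARD = [[0.0, 1.0], [0.5, 0.0], [1.0, -0.5]]
N_STATES, N_ACTIONS = 3, 2


def q_value(v, s, a):
    return REWARD[s][a] + GAMMA * v[NEXT[s][a]]


def soft_value_iteration(iters=2000):
    """Iterate V(s) <- tau * log sum_a exp(Q(s, a) / tau) to a fixed point."""
    v = [0.0] * N_STATES
    for _ in range(iters):
        v = [TAU * math.log(sum(math.exp(q_value(v, s, a) / TAU)
                                for a in range(N_ACTIONS)))
             for s in range(N_STATES)]
    return v


def soft_policy(v):
    """pi(a|s) = exp((Q(s, a) - V(s)) / tau); normalized at the fixed point."""
    return [[math.exp((q_value(v, s, a) - v[s]) / TAU) for a in range(N_ACTIONS)]
            for s in range(N_STATES)]


def path_residual(v, pi, s1, actions):
    """-V(s_1) + gamma^(t-1) V(s_t) + R(s_{1:t}) - tau * G(s_{1:t}, pi)."""
    s, ret, logp = s1, 0.0, 0.0
    for i, a in enumerate(actions):
        ret += GAMMA ** i * REWARD[s][a]
        logp += GAMMA ** i * math.log(pi[s][a])
        s = NEXT[s][a]
    return -v[s1] + GAMMA ** len(actions) * v[s] + ret - TAU * logp


v_star = soft_value_iteration()
pi_star = soft_policy(v_star)
# Any action sequence works, including one the optimal policy rarely takes.
residual = path_residual(v_star, pi_star, s1=0, actions=[1, 0, 1, 1])
print(residual)
```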
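
PC: parameterization
$$ C_{\theta,\phi}(s_{1:t}) = -V_{\phi}(s_1) + \gamma^{t-1} V_{\phi}(s_t) + R(s_{1:t}) - \tau G(s_{1:t}, \pi_{\theta}) $$
• Using the square of C as the loss function reduces the problem to an optimization over θ and φ
• Policy-based and value-based learning are formulated in a unified way
$$ \Delta\theta \propto C_{\theta,\phi}(s_{1:t}) \, \nabla_{\theta} G(s_{1:t}, \pi_{\theta}) $$
$$ \Delta\phi \propto C_{\theta,\phi}(s_{1:t}) \left( \nabla_{\phi} V_{\phi}(s_1) - \gamma^{t-1} \nabla_{\phi} V_{\phi}(s_t) \right) $$

Comparison with A3C
Consistency:
$$ C_{\theta,\phi}(s_{1:t}) = -V_{\phi}(s_1) + \gamma^{t-1} V_{\phi}(s_t) + R(s_{1:t}) - \tau G(s_{1:t}, \pi_{\theta}) $$
A3C update:
$$ A_{\theta,\phi}(s_{1:d+1}) = -V_{\phi}(s_1) + \gamma^{d} V_{\phi}(s_{d+1}) + R(s_{1:d+1}) $$
$$ \Delta\theta \propto \mathbb{E}_{s_{0:T}}\left[ \sum_{i=0}^{T-1} A_{\theta,\phi}(s_{i:i+d}) \, \nabla_{\theta} \log \pi_{\theta}(a_i \mid s_i) \right] $$
$$ \Delta\phi \propto \mathbb{E}_{s_{0:T}}\left[ \sum_{i=0}^{T-1} A_{\theta,\phi}(s_{i:i+d}) \, \nabla_{\phi} V_{\phi}(s_i) \right] $$
With d = t − 1, the two quantities differ only by the log-probability term: C = A − τG.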
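
Comparison with DQN
• The max operator allowed only one step in the update rule; PCL extends this to multiple steps
• n-step Q-learning already worked well empirically; adding the log-probability term turns it into a theoretically grounded algorithm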
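
A hedged sketch of the PCL update for one path, using tabular parameters: V_φ(s) = `phi[s]` and π_θ a softmax over `theta[s]`. These parameterizations, the learning rates, and the toy path are assumptions for illustration and are not taken from the slides or the paper's implementation. The factor τ in the θ-gradient of ½C² is absorbed into `lr_pi`, consistent with the proportionality in the update rule.

```python
import math

GAMMA, TAU = 0.9, 0.5


def softmax(logits):
    m = max(logits)
    e = [math.exp(x - m) for x in logits]
    z = sum(e)
    return [x / z for x in e]


def consistency(theta, phi, states, actions, rewards):
    """C(s_{1:t}) for a path with len(states) == len(actions) + 1."""
    d = len(actions)
    ret = sum(GAMMA ** i * rewards[i] for i in range(d))
    logp = sum(GAMMA ** i * math.log(softmax(theta[states[i]])[actions[i]])
               for i in range(d))
    return -phi[states[0]] + GAMMA ** d * phi[states[-1]] + ret - TAU * logp


def pcl_update(theta, phi, states, actions, rewards, lr_pi=0.1, lr_v=0.1):
    """One in-place gradient step on 0.5 * C^2; returns C before the step."""
    c = consistency(theta, phi, states, actions, rewards)
    d = len(actions)
    # Gradient of G with respect to the softmax logits, accumulated first
    # so that revisited states use the pre-update policy.
    grad_g = [[0.0] * len(row) for row in theta]
    for i in range(d):
        s, a = states[i], actions[i]
        for b, p in enumerate(softmax(theta[s])):
            grad_g[s][b] += GAMMA ** i * ((1.0 if b == a else 0.0) - p)
    for s, row in enumerate(grad_g):
        for b, g in enumerate(row):
            theta[s][b] += lr_pi * c * g          # delta theta ∝ C * grad G
    phi[states[0]] += lr_v * c                    # C * grad V(s_1)
    phi[states[-1]] -= lr_v * c * GAMMA ** d      # -C * gamma^(t-1) * grad V(s_t)
    return c


# Hypothetical 2-state, 2-action problem and one stored path.
theta = [[0.0, 0.0], [0.0, 0.0]]
phi = [0.0, 0.0]
path = dict(states=[0, 1, 0], actions=[1, 0], rewards=[1.0, 0.5])

c_before = consistency(theta, phi, **path)
for _ in range(500):
    pcl_update(theta, phi, **path)
c_after = consistency(theta, phi, **path)
print(c_before, c_after)
```

Because the path is only replayed, never re-sampled from the current policy, the same loop could be fed from a replay buffer, which is the off-policy property the slides emphasize.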

Experiments
• Algorithmic tasks
• Compared against A3C and Prioritized DDQN
• On every task, PCL matched or beat DQN and A3C
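
Experiments: expert trajectories
• Expert trajectories are put into the replay buffer (RB) and used for training
• Unlike methods that rely on importance sampling, this works even when the expert's action probabilities are unknown
• Performance improved dramatically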
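
Summary
• Multi-step learning is possible off-policy
• Value function approximation and policy gradient are formulated as the minimization of a single, unified loss function
• PCL outperformed existing methods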
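
Personal thoughts
• Why do the experiments cover only toy-like tasks?
• Tried it on CartPole
• https://github.com/rarilurelo/pcl_keras
• The update rule looks usable as-is for continuous control tasks; a proof also seems feasible, so it seems worth trying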

CartPole (figure not preserved)