2020/08/28
Deep Learning JP: http://deeplearning.jp/seminar-2/
DL reading group (DL輪読会) material
GANs and Energy-Based Models
Shohei Taniguchi, Matsuo Lab (M2)
Overview
Based on the following three papers, this presentation summarizes the relationship between GANs and EBMs:
1. Deep Directed Generative Models with Energy-Based Probability Estimation — https://arxiv.org/abs/1606.03439
2. Maximum Entropy Generators for Energy-Based Models — https://arxiv.org/abs/1901.08508
3. Your GAN is Secretly an Energy-based Model and You Should Use Discriminator Driven Latent Sampling — https://arxiv.org/abs/2003.06060
Outline
Background
• Generative Adversarial Network
• Energy-based Model
The similarity between GANs and EBMs
Paper introductions
Generative Adversarial Network [Goodfellow et al., 2014]
A minimax game between a discriminator $D_\theta$ and a generator $G_\phi$:
$D_\theta: \mathbb{R}^{d_x} \to [0,1], \quad G_\phi: \mathbb{R}^{d_z} \to \mathbb{R}^{d_x}$
$\mathcal{L}(\theta, \phi) = \mathbb{E}_{p(x)}[\log D_\theta(x)] + \mathbb{E}_{p(z)}[\log(1 - D_\theta(G_\phi(z)))]$
The discriminator maximizes $\mathcal{L}$, while the generator minimizes $\mathcal{L}$.
Training GANs
The GAN update equations:
$\mathcal{L}(\theta, \phi) = \sum_{i=1}^{N} \log D_\theta(x_i) + \log(1 - D_\theta(G_\phi(z_i)))$
$\theta \leftarrow \theta + \eta_\theta \nabla_\theta \mathcal{L}(\theta, \phi)$
$\phi \leftarrow \phi - \eta_\phi \nabla_\phi \mathcal{L}(\theta, \phi)$
$z_i \sim \mathrm{Normal}(0, I)$
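In code, one training iteration implements these two updates directly. Below is a minimal sketch assuming PyTorch; the names `D`, `G`, `opt_d`, `opt_g`, and `dz` are illustrative, not from the slides:

```python
import torch

def gan_step(D, G, x_real, opt_d, opt_g, dz):
    N = x_real.size(0)
    z = torch.randn(N, dz)  # z_i ~ Normal(0, I)

    # Discriminator: ascend L = log D(x) + log(1 - D(G(z))) by minimizing -L
    x_fake = G(z).detach()  # do not backprop into G here
    loss_d = -(torch.log(D(x_real)).mean() + torch.log(1 - D(x_fake)).mean())
    opt_d.zero_grad()
    loss_d.backward()
    opt_d.step()

    # Generator: descend L, i.e. minimize log(1 - D(G(z)))
    loss_g = torch.log(1 - D(G(z))).mean()
    opt_g.zero_grad()
    loss_g.backward()
    opt_g.step()
```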
A common interpretation of GANs: the discriminator is a density-ratio estimator
The discriminator serves as an estimator of the ratio between the data density $p(x)$ and the generator's sample density $p_\phi(x) = \mathbb{E}_{p(z)}[\delta(x - G_\phi(z))]$,
i.e., when the discriminator is optimal,
$D_\theta^*(x) = \dfrac{p(x)}{p(x) + p_\phi(x)}$
A common interpretation of GANs: generator training minimizes the JS divergence
When the discriminator is optimal,
$\mathcal{L}(\theta^*, \phi) = 2\,\mathrm{JS}(p(x) \,\|\, p_\phi(x)) - 2 \log 2$
$\mathrm{JS}(p \,\|\, p_\phi) = \frac{1}{2} \mathrm{KL}\!\left(p \,\Big\|\, \frac{p + p_\phi}{2}\right) + \frac{1}{2} \mathrm{KL}\!\left(p_\phi \,\Big\|\, \frac{p + p_\phi}{2}\right)$
Thus the generator $G_\phi$ is trained by minimizing the Jensen-Shannon divergence to the data distribution.
The $-\log D$ trick
With the original loss, vanishing gradients occur easily, so a trick that replaces the second term as follows is often used:
$\mathcal{L}(\theta, \phi) = \mathbb{E}_{p(x)}[\log D_\theta(x)] - \mathbb{E}_{p(z)}[\log D_\theta(G_\phi(z))]$
Note, however, that the interpretation based on density-ratio estimation (JS minimization) no longer holds in this case.
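In the sketch above, only the generator loss changes; a minimal illustration of the swap:

```python
# With the -log D trick the generator minimizes -log D(G(z)) instead of
# log(1 - D(G(z))), which gives stronger gradients when D(G(z)) is near 0.
loss_g = -torch.log(D(G(z))).mean()
```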
GAN variants
By changing the measure of distance to the data distribution to something other than JS, various GAN variants can be constructed.
Example: Wasserstein GAN
$\mathcal{L}(\theta, \phi) = \mathbb{E}_{p(x)}[D_\theta(x)] - \mathbb{E}_{p(z)}[D_\theta(G_\phi(z))]$
where $D_\theta$ is a 1-Lipschitz function ($\mathbb{R}^{d_x} \to \mathbb{R}$).
In this case, generator training minimizes the 1-Wasserstein distance.
Energy-based Model
A probabilistic model expressed through an energy function $E_\theta(x)$:
$p_\theta(x) = \dfrac{\exp(-E_\theta(x))}{Z(\theta)}, \quad Z(\theta) = \int \exp(-E_\theta(x))\, dx$
$E_\theta(x)$ equals the negative log-likelihood $-\log p_\theta(x)$ up to an additive constant.
Training EBMs: Contrastive Divergence
The gradient of the EBM log-likelihood:
$\nabla_\theta \log p_\theta(x) = -\nabla_\theta E_\theta(x) + \nabla_\theta \log Z(\theta) = -\nabla_\theta E_\theta(x) + \mathbb{E}_{x' \sim p_\theta(x)}[\nabla_\theta E_\theta(x')]$
Following this gradient lowers the energy of training data and raises the energy of samples from the model.
Sampling from an EBM: Langevin dynamics
A gradient-based MCMC method, taking the form of gradient descent with added noise:
$x \leftarrow x - \eta \nabla_x E_\theta(x) + \epsilon, \quad \epsilon \sim \mathrm{Normal}(0, 2\eta I)$
Iterating this update, the samples converge in distribution to $p_\theta(x)$.
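A minimal sketch of this sampler, assuming PyTorch; `E` is any callable returning per-sample energies:

```python
import torch

def langevin_sample(E, x, n_steps=100, eta=0.01):
    for _ in range(n_steps):
        x = x.detach().requires_grad_(True)
        grad = torch.autograd.grad(E(x).sum(), x)[0]  # ∇_x E_θ(x)
        eps = torch.randn_like(x) * (2 * eta) ** 0.5  # ε ~ Normal(0, 2ηI)
        x = x - eta * grad + eps
    return x.detach()
```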
Training EBMs: Contrastive Divergence
Putting it together, the EBM update equations are:
$\mathcal{L}(\theta, x') = \sum_{i=1}^{N} \left[ -E_\theta(x_i) + E_\theta(x'_i) \right]$
$\theta \leftarrow \theta + \eta_\theta \nabla_\theta \mathcal{L}(\theta, x')$
$x'_i \leftarrow x'_i - \eta_{x'} \nabla_{x'_i} \mathcal{L}(\theta, x') + \epsilon, \quad \epsilon \sim \mathrm{Normal}(0, 2\eta I)$
Look closely: this resembles a GAN.
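A minimal sketch of one such update, assuming PyTorch and reusing the `langevin_sample` sketch above; `E_net` and `opt` are illustrative names:

```python
def cd_step(E_net, x_data, x_neg, opt, eta=0.01):
    # Refresh the negative samples x' with a few Langevin steps
    x_neg = langevin_sample(E_net, x_neg, n_steps=10, eta=eta)

    # Ascend L(θ, x') = Σ_i [-E_θ(x_i) + E_θ(x'_i)] by minimizing -L:
    # lower the energy of data, raise the energy of model samples
    loss = E_net(x_data).mean() - E_net(x_neg).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
    return x_neg
```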
Recall the GAN update equations:
$\mathcal{L}(\theta, \phi) = \sum_{i=1}^{N} \log D_\theta(x_i) + \log(1 - D_\theta(G_\phi(z_i)))$
$\theta \leftarrow \theta + \eta_\theta \nabla_\theta \mathcal{L}(\theta, \phi)$
$\phi \leftarrow \phi - \eta_\phi \nabla_\phi \mathcal{L}(\theta, \phi)$
$z_i \sim \mathrm{Normal}(0, I)$
GAN update equations with the $-\log D$ trick:
$\mathcal{L}(\theta, \phi) = \sum_{i=1}^{N} \log D_\theta(x_i) - \log D_\theta(G_\phi(z_i))$
$\theta \leftarrow \theta + \eta_\theta \nabla_\theta \mathcal{L}(\theta, \phi)$
$\phi \leftarrow \phi - \eta_\phi \nabla_\phi \mathcal{L}(\theta, \phi)$
$z_i \sim \mathrm{Normal}(0, I)$
GAN update equations with the $-\log D$ trick
Setting $E_\theta(x) = -\log D_\theta(x)$ gives:
$\mathcal{L}(\theta, \phi) = \sum_{i=1}^{N} \left[ -E_\theta(x_i) + E_\theta(G_\phi(z_i)) \right]$
$\theta \leftarrow \theta + \eta_\theta \nabla_\theta \mathcal{L}(\theta, \phi)$
$\phi \leftarrow \phi - \eta_\phi \nabla_\phi \mathcal{L}(\theta, \phi)$
$z_i \sim \mathrm{Normal}(0, I)$
The similarity between GANs and EBMs

GAN with the $-\log D$ trick:
$\mathcal{L}(\theta, \phi) = \sum_{i=1}^{N} \left[ -E_\theta(x_i) + E_\theta(G_\phi(z_i)) \right]$
$\theta \leftarrow \theta + \eta_\theta \nabla_\theta \mathcal{L}(\theta, \phi)$
$\phi \leftarrow \phi - \eta_\phi \nabla_\phi \mathcal{L}(\theta, \phi)$, $z_i \sim \mathrm{Normal}(0, I)$
→ updates the function $G_\phi$ that generates samples from noise

EBM:
$\mathcal{L}(\theta, x') = \sum_{i=1}^{N} \left[ -E_\theta(x_i) + E_\theta(x'_i) \right]$
$\theta \leftarrow \theta + \eta_\theta \nabla_\theta \mathcal{L}(\theta, x')$
$x'_i \leftarrow x'_i - \eta_{x'} \nabla_{x'_i} \mathcal{L}(\theta, x') + \epsilon$, $\epsilon \sim \mathrm{Normal}(0, 2\eta I)$
→ updates the samples themselves, with noise added to the update

Very similar, but slightly different.
Paper introductions
Can EBM training use a generator, the way a GAN does? ➡ Papers 1 and 2
If the GAN discriminator is viewed as an energy function, can the discriminator be used at generation time? ➡ Paper 3
Deep Directed Generative Models with Energy-Based Probability Estimation https://arxiv.org/abs/1606.03439 Taesup Kim, Yoshua Bengio (Université de Montréal)
Training the EBM: Contrastive Divergence with generator samples
The gradient of the EBM log-likelihood:
$\nabla_\theta \log p_\theta(x) = -\nabla_\theta E_\theta(x) + \mathbb{E}_{x' \sim p_\theta(x)}[\nabla_\theta E_\theta(x')] \approx -\nabla_\theta E_\theta(x) + \mathbb{E}_{z \sim p(z)}[\nabla_\theta E_\theta(G_\phi(z))]$
Sampling from $p_\theta(x)$ is replaced by sampling from $G_\phi(z)$.
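A minimal sketch of this approximation, assuming PyTorch; `E_net`, `G`, `opt_e`, and `dz` are illustrative names:

```python
def energy_step(E_net, G, x_data, opt_e, dz):
    z = torch.randn(x_data.size(0), dz)
    x_fake = G(z).detach()  # generator samples stand in for MCMC samples from p_θ

    # Lower E_θ on data, raise it on generator samples
    loss = E_net(x_data).mean() - E_net(x_fake).mean()
    opt_e.zero_grad()
    loss.backward()
    opt_e.step()
```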
Training the generator
Let $p_\phi(x) = \mathbb{E}_{p(z)}[\delta(x - G_\phi(z))]$. Since we want $p_\theta(x) = p_\phi(x)$, the generator is trained by minimizing the KL divergence between the two distributions:
$\mathrm{KL}(p_\phi \,\|\, p_\theta) = \mathbb{E}_{p_\phi}[-\log p_\theta(x)] - H(p_\phi)$
The first term lowers the energy of the generated samples; the second raises their entropy.
Training the generator: why the entropy term is needed
$\mathrm{KL}(p_\phi \,\|\, p_\theta) = \mathbb{E}_{p_\phi}[-\log p_\theta(x)] - H(p_\phi)$
(lower sample energy / raise sample entropy)
Without the entropy term, the generator would learn to produce only the samples with minimum energy (= maximum density),
‣ a phenomenon similar to mode collapse in GANs.
The entropy term is needed to prevent this.
Training the generator
The gradient of the first term can be computed simply:
$\nabla_\phi \, \mathbb{E}_{p_\phi}[-\log p_\theta(x)] = \mathbb{E}_{z \sim p(z)}[\nabla_\phi E_\theta(G_\phi(z))]$
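This identity holds because $\log Z(\theta)$ does not depend on $\phi$; a one-step derivation in the notation above:

```latex
\begin{align*}
\nabla_\phi \, \mathbb{E}_{p_\phi}\!\left[-\log p_\theta(x)\right]
  &= \nabla_\phi \, \mathbb{E}_{z \sim p(z)}\!\left[E_\theta(G_\phi(z)) + \log Z(\theta)\right] \\
  &= \mathbb{E}_{z \sim p(z)}\!\left[\nabla_\phi E_\theta(G_\phi(z))\right]
\end{align*}
```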
Training the generator
The entropy in the second term cannot be obtained analytically.
In the paper, the scale parameters of batch normalization are regarded as the variances of Gaussian activations, and the entropy is approximated by:
$H(p_\phi) \approx \sum_{a_i} H\!\left(\mathcal{N}(\mu_{a_i}, \sigma_{a_i}^2)\right) = \sum_{a_i} \frac{1}{2} \log\!\left(2 \pi e \sigma_{a_i}^2\right)$
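A minimal sketch of this heuristic, assuming PyTorch; reading the batch-norm scale $\gamma$ as a per-channel standard deviation (so $\gamma^2$ is the variance) is one plausible interpretation, not the paper's exact code:

```python
import math
import torch
import torch.nn as nn

def bn_entropy(model):
    """Sum of Gaussian entropies ½ log(2πeσ²) over batch-norm channels."""
    h = 0.0
    for m in model.modules():
        if isinstance(m, nn.BatchNorm2d):
            sigma2 = m.weight.pow(2)  # assumption: γ² as per-channel variance
            h = h + 0.5 * torch.log(2 * math.pi * math.e * sigma2).sum()
    return h
```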
Advantage over GANs
Since an energy function is learned in place of a discriminator, the model can also be used for (unnormalized) density estimation and similar tasks.
Generated samples [figure omitted]
Maximum Entropy Generators for Energy-Based Models https://arxiv.org/abs/1901.08508 Rithesh Kumar, Sherjil Ozair, Anirudh Goyal, Aaron Courville, Yoshua Bengio (Université de Montréal)
Computing the entropy
$\mathrm{KL}(p_\phi \,\|\, p_\theta) = \mathbb{E}_{p_\phi}[-\log p_\theta(x)] - H(p_\phi)$
Paper 1 computed the entropy $H(p_\phi)$ from the batch-normalization scale parameters, but that is a heuristic with no theoretical justification.
Computing the entropy
Consider the mutual information between the latent variable $z$ and the generator output $x = G_\phi(z)$:
$I(x, z) = H(x) - H(x \mid z) = H(G_\phi(z)) - H(G_\phi(z) \mid z)$
Computing the entropy
When $G_\phi$ is a deterministic function, $H(G_\phi(z) \mid z) = 0$, so
$H(p_\phi) = H(G_\phi(z)) = I(x, z)$
In other words, instead of the entropy, it suffices to maximize the mutual information.
Estimating the mutual information
Various mutual-information estimators have been proposed in recent years; here, the JS-divergence-based estimator proposed in Deep InfoMax is used:
$I_{\mathrm{JSD}}(x, z) = \sup_{T \in \mathcal{T}} \mathbb{E}_{p(x,z)}[-\mathrm{sp}(-T(x,z))] - \mathbb{E}_{p(x)p(z)}[\mathrm{sp}(T(x,z))]$
where $\mathrm{sp}(a) = \log(1 + e^a)$ is the softplus. The statistics network $T$, a discriminator that tells samples from $p(x,z)$ apart from samples from $p(x)p(z)$, is trained jointly.
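A minimal sketch of this estimator, assuming PyTorch; `T` is a trainable statistics network scoring $(x, z)$ pairs, and shuffling $z$ within the batch yields samples from $p(x)p(z)$:

```python
import torch
import torch.nn.functional as F

def jsd_mi(T, x, z):
    z_shuffled = z[torch.randperm(z.size(0))]       # break the (x, z) pairing
    joint = -F.softplus(-T(x, z)).mean()            # E_{p(x,z)}[-sp(-T(x,z))]
    marginal = F.softplus(T(x, z_shuffled)).mean()  # E_{p(x)p(z)}[sp(T(x,z))]
    return joint - marginal  # maximize w.r.t. both T and the generator
```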
Density estimation [figure omitted: the model closely approximates a complex distribution]
Mode collapse
An experiment comparing how many modes are captured when training on data with 1000 (or 10000) modes (StackedMNIST).
MEG (the proposed method) captures all modes, and mode collapse does not occur.
Image generation (CIFAR-10)
When sampling from the EBM with MCMC, both IS and FID are better than WGAN-GP.
Your GAN is Secretly an Energy-based Model and You Should Use Discriminator Driven Latent Sampling https://arxiv.org/abs/2003.06060 Tong Che, Ruixiang Zhang, Jascha Sohl-Dickstein, Hugo Larochelle, Liam Paull, Yuan Cao, Yoshua Bengio (Université de Montréal, Google Brain)
A common interpretation of GANs (recap): the discriminator is a density-ratio estimator
The discriminator serves as an estimator of the ratio between the data density $p(x)$ and the generator's sample density $p_\phi(x) = \mathbb{E}_{p(z)}[\delta(x - G_\phi(z))]$,
i.e., when the discriminator is optimal,
$D_\theta^*(x) = \dfrac{p(x)}{p(x) + p_\phi(x)}$
A common interpretation of GANs: the discriminator is a density-ratio estimator
Let $\sigma(\cdot)$ be the sigmoid function and $d_\theta(x) = \sigma^{-1}(D_\theta(x))$ the discriminator logit. Then
$D_\theta(x) = \dfrac{p(x)}{p(x) + p_\phi(x)} \;\Rightarrow\; d_\theta(x) = \log \dfrac{p(x)}{p_\phi(x)} \;\Rightarrow\; p(x) \propto p_\phi(x) \exp(d_\theta(x))$
The data distribution $p(x)$ is proportional to the product of the generator distribution $p_\phi(x)$ and $\exp(d_\theta(x))$.
➡ After training a GAN, sampling from this distribution should improve sample quality.
MCMC in latent space: Discriminator Driven Latent Sampling (DDLS)
We would like to sample from $p_\phi(x) \exp(d_\theta(x))$, but MCMC in data space is inefficient and difficult.
Instead, MCMC (Langevin dynamics) is run in the latent space of the generator $G_\phi(z)$:
$E(z) = -\log p(z) - d_\theta(G_\phi(z))$
$z \leftarrow z - \eta \nabla_z E(z) + \epsilon, \quad \epsilon \sim \mathrm{Normal}(0, 2\eta I)$
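A minimal sketch of DDLS, assuming PyTorch; `G` and `logit_d` (the discriminator logit $d_\theta$) are illustrative names, and $p(z)$ is taken to be a standard normal so that $-\log p(z) = \frac{1}{2}\|z\|^2$ up to a constant:

```python
import torch

def ddls_sample(G, logit_d, z, n_steps=100, eta=0.01):
    for _ in range(n_steps):
        z = z.detach().requires_grad_(True)
        # E(z) = -log p(z) - d_θ(G_ϕ(z)); the constant in -log p(z) drops out
        energy = 0.5 * (z ** 2).sum(dim=1) - logit_d(G(z))
        grad = torch.autograd.grad(energy.sum(), z)[0]
        z = z - eta * grad + torch.randn_like(z) * (2 * eta) ** 0.5
    return G(z).detach()
```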
Experiments
Simply applying DDLS to a pretrained GAN considerably improves both IS and FID.
Summary
GANs and EBMs are deeply related.
By drawing on the insights of both, approaches that take the best of each become possible:
• using a generator for EBM sampling
• using MCMC for GAN sampling
Similar approaches will likely keep appearing in future research.