[DL Reading Group] GANs and Energy-Based Models

May 19, 21

Slide overview

2020/08/28
Deep Learning JP:
http://deeplearning.jp/seminar-2/

Text of each page
1.

GANs and Energy-Based Models
Shohei Taniguchi, Matsuo Lab (M2)

2.

Overview
This talk summarizes the relationship between GANs and EBMs, based on the following three papers:
• Deep Directed Generative Models with Energy-Based Probability Estimation https://arxiv.org/abs/1606.03439
• Maximum Entropy Generators for Energy-Based Models https://arxiv.org/abs/1901.08508
• Your GAN is Secretly an Energy-based Model and You Should Use Discriminator Driven Latent Sampling https://arxiv.org/abs/2003.06060

3.

Outline
Background
• Generative Adversarial Network
• Energy-based Model
Similarities between GANs and EBMs
Paper introductions

4.

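Generative Adversarial Network [Goodfellow et al., 2014]
A minimax game between a discriminator $D_\theta$ and a generator $G_\phi$:
$$D_\theta : \mathbb{R}^{d_x} \to [0,1], \qquad G_\phi : \mathbb{R}^{d_z} \to \mathbb{R}^{d_x}$$
$$\mathcal{L}(\theta, \phi) = \mathbb{E}_{p(x)}\left[\log D_\theta(x)\right] + \mathbb{E}_{p(z)}\left[\log\left(1 - D_\theta(G_\phi(z))\right)\right]$$
The discriminator maximizes $\mathcal{L}$ and the generator minimizes it.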

5.

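Training a GAN
The GAN update rule is
$$\mathcal{L}(\theta, \phi) = \sum_{i=1}^{N} \left[\log D_\theta(x_i) + \log\left(1 - D_\theta(G_\phi(z_i))\right)\right], \qquad z_i \sim \mathrm{Normal}(0, I)$$
$$\theta \leftarrow \theta + \eta_\theta \nabla_\theta \mathcal{L}(\theta, \phi)$$
$$\phi \leftarrow \phi - \eta_\phi \nabla_\phi \mathcal{L}(\theta, \phi)$$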

6.

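The standard interpretation of GANs: the discriminator is a density ratio estimator
The discriminator acts as a density ratio estimator between the data distribution $p(x)$ and the distribution of generated samples $p_\phi(x) = \mathbb{E}_{p(z)}\left[\delta(x - G_\phi(z))\right]$.
That is, when the discriminator is optimal,
$$D_\theta^*(x) = \frac{p(x)}{p(x) + p_\phi(x)}$$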
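The optimal-discriminator formula can be checked pointwise: at a fixed $x$, the objective contributes $p(x)\log D + p_\phi(x)\log(1 - D)$, and the maximizing $D$ should be $p(x)/(p(x) + p_\phi(x))$. The following is a minimal sketch of that check; the density values `p_x` and `q_x` are made-up numbers for illustration, not taken from the slides.

```python
import math

def pointwise_objective(d, p, q):
    """Integrand of the GAN objective at one x: p*log(D) + q*log(1 - D)."""
    return p * math.log(d) + q * math.log(1.0 - d)

def argmax_on_grid(p, q, steps=9999):
    """Brute-force search for the D in (0, 1) that maximizes the integrand."""
    grid = [(k + 1) / (steps + 1) for k in range(steps)]
    return max(grid, key=lambda d: pointwise_objective(d, p, q))

# Hypothetical density values p(x) and p_phi(x) at some fixed x.
p_x, q_x = 0.3, 0.1
d_star = argmax_on_grid(p_x, q_x)
closed_form = p_x / (p_x + q_x)
print(d_star, closed_form)
```

The grid search and the closed form should agree up to the grid resolution.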

7.

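The standard interpretation of GANs: training the generator minimizes the JS divergence
When the discriminator is optimal,
$$\mathcal{L}(\theta, \phi) = 2\,\mathrm{JS}\left(p(x) \,\|\, p_\phi(x)\right) - 2\log 2$$
$$\mathrm{JS}\left(p(x) \,\|\, p_\phi(x)\right) = \frac{1}{2}\mathrm{KL}\left(p(x) \,\Big\|\, \frac{p(x) + p_\phi(x)}{2}\right) + \frac{1}{2}\mathrm{KL}\left(p_\phi(x) \,\Big\|\, \frac{p(x) + p_\phi(x)}{2}\right)$$
The generator $G_\phi$ is therefore trained by minimizing the Jensen-Shannon divergence to the data distribution.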
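The relation between the objective at the optimal discriminator and the JS divergence can be verified numerically on small discrete distributions. This is a sketch under the assumption that JS is defined with the two $\frac{1}{2}$ weights, in which case the value of the objective is $2\,\mathrm{JS} - 2\log 2$; the distributions `p` and `q` are arbitrary examples.

```python
import math

def kl(a, b):
    """KL(a || b) for discrete distributions given as lists."""
    return sum(ai * math.log(ai / bi) for ai, bi in zip(a, b) if ai > 0)

def js(a, b):
    """Jensen-Shannon divergence with the 1/2-weighted mixture."""
    m = [(ai + bi) / 2 for ai, bi in zip(a, b)]
    return 0.5 * kl(a, m) + 0.5 * kl(b, m)

def gan_value_at_optimal_d(p, q):
    """E_p[log D*] + E_q[log(1 - D*)] with D* = p / (p + q)."""
    total = 0.0
    for pi, qi in zip(p, q):
        d = pi / (pi + qi)
        total += pi * math.log(d) + qi * math.log(1.0 - d)
    return total

p = [0.5, 0.3, 0.2]   # stand-in for the data distribution
q = [0.2, 0.2, 0.6]   # stand-in for the generator distribution
lhs = gan_value_at_optimal_d(p, q)
rhs = 2 * js(p, q) - 2 * math.log(2)
print(lhs, rhs)
```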

8.

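$-\log D$ Trick
With the original loss, gradients tend to vanish, so a common trick is to replace the second term as follows:
$$\mathcal{L}(\theta, \phi) = \mathbb{E}_{p(x)}\left[\log D_\theta(x)\right] - \mathbb{E}_{p(z)}\left[\log D_\theta(G_\phi(z))\right]$$
In this case, however, the interpretation based on density ratio estimation no longer holds.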
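Why the original loss saturates can be seen from the derivative with respect to the discriminator logit. Writing $D = \sigma(t)$, the original generator term $\log(1 - \sigma(t))$ has derivative $-\sigma(t)$, which goes to zero when the discriminator confidently rejects a sample, while $-\log \sigma(t)$ has derivative $-(1 - \sigma(t))$, which stays near $-1$. The sketch below only illustrates this one-dimensional effect; the logit value is an arbitrary choice.

```python
import math

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def grad_saturating(logit):
    """d/dlogit of log(1 - sigmoid(logit)), the original generator term."""
    return -sigmoid(logit)

def grad_non_saturating(logit):
    """d/dlogit of -log(sigmoid(logit)), the -log D replacement."""
    return -(1.0 - sigmoid(logit))

# A generated sample that the discriminator confidently rejects (D close to 0).
logit = -10.0
print(grad_saturating(logit), grad_non_saturating(logit))
```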

9.

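GAN variants
Replacing JS with a different measure of distance to the data distribution yields various GAN variants.
Example: Wasserstein GAN
$$\mathcal{L}(\theta, \phi) = \mathbb{E}_{p(x)}\left[D_\theta(x)\right] - \mathbb{E}_{p(z)}\left[D_\theta(G_\phi(z))\right]$$
where $D_\theta$ is a 1-Lipschitz function ($\mathbb{R}^{d_x} \to \mathbb{R}$).
In this case, training the generator minimizes the 1-Wasserstein distance.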

10.

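Energy-based Model
A probabilistic model is expressed through an energy function $E_\theta(x)$:
$$p_\theta(x) = \frac{\exp\left(-E_\theta(x)\right)}{Z(\theta)}, \qquad Z(\theta) = \int \exp\left(-E_\theta(x)\right) dx$$
$E_\theta(x)$ equals the negative log-likelihood $-\log p_\theta(x)$ up to the additive constant $\log Z(\theta)$.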
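For a one-dimensional energy, the normalizer $Z(\theta)$ can be approximated on a grid, which makes the definition concrete. The sketch below uses a hypothetical double-well energy and a trapezoidal rule; both are illustrative choices, and in high dimensions this integral is exactly what becomes intractable.

```python
import math

def energy(x):
    """Hypothetical double-well energy, chosen only for illustration."""
    return (x * x - 1.0) ** 2

def normalized_density(energy_fn, lo=-4.0, hi=4.0, n=4001):
    """Evaluate exp(-E) on a grid and normalize with a trapezoidal estimate of Z."""
    h = (hi - lo) / (n - 1)
    xs = [lo + k * h for k in range(n)]
    unnorm = [math.exp(-energy_fn(x)) for x in xs]
    z = h * (sum(unnorm) - 0.5 * (unnorm[0] + unnorm[-1]))
    return xs, [u / z for u in unnorm], z, h

xs, density, z, h = normalized_density(energy)

# -log p(x) - E(x) should be the same constant, log Z, at every grid point.
offsets = [-math.log(density[i]) - energy(xs[i]) for i in (1000, 2000, 3000)]
print(z, offsets)
```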

11.

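Training an EBM: Contrastive Divergence
The gradient of the EBM log-likelihood is
$$\nabla_\theta \log p_\theta(x) = -\nabla_\theta E_\theta(x) - \nabla_\theta \log Z(\theta) = -\nabla_\theta E_\theta(x) + \mathbb{E}_{x' \sim p_\theta(x)}\left[\nabla_\theta E_\theta(x')\right]$$
Training lowers the energy of the training data and raises the energy of samples from the model.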
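The step from $\nabla_\theta \log Z(\theta)$ to an expectation under the model can be spelled out as follows, assuming differentiation and integration can be exchanged:

```latex
\begin{aligned}
\nabla_\theta \log Z(\theta)
  &= \frac{1}{Z(\theta)} \nabla_\theta \int \exp\left(-E_\theta(x')\right) dx' \\
  &= \int \frac{\exp\left(-E_\theta(x')\right)}{Z(\theta)} \left(-\nabla_\theta E_\theta(x')\right) dx' \\
  &= -\,\mathbb{E}_{x' \sim p_\theta(x)}\left[\nabla_\theta E_\theta(x')\right]
\end{aligned}
```

Since $\log p_\theta(x) = -E_\theta(x) - \log Z(\theta)$, substituting this into $-\nabla_\theta E_\theta(x) - \nabla_\theta \log Z(\theta)$ gives the positive expectation term in the gradient.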

12.

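Sampling from an EBM: Langevin dynamics
Langevin dynamics is a gradient-based MCMC method that has the form of gradient descent with added noise:
$$x \leftarrow x - \eta \nabla_x E_\theta(x) + \epsilon, \qquad \epsilon \sim \mathrm{Normal}(0, 2\eta I)$$
Sampling repeatedly with this update makes the distribution of the samples converge to $p_\theta(x)$.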
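The Langevin update can be tried on an energy whose target distribution is known. The sketch below assumes the quadratic energy $E(x) = x^2/2$, for which $p(x)$ is a standard normal; the step size, chain length, and starting point are arbitrary choices. With a finite step size the chain is only approximately correct, so the sample variance is expected to land slightly above 1.

```python
import math
import random

def grad_energy(x):
    """Gradient of the assumed energy E(x) = x^2 / 2 (standard normal target)."""
    return x

def langevin_sample(grad_fn, x0, step, n_steps, rng):
    """Run x <- x - step * grad E(x) + eps with eps ~ Normal(0, 2 * step)."""
    x = x0
    noise_std = math.sqrt(2.0 * step)
    for _ in range(n_steps):
        x = x - step * grad_fn(x) + rng.gauss(0.0, noise_std)
    return x

rng = random.Random(0)
# Independent chains, all started far from the mode at x0 = 3.
samples = [langevin_sample(grad_energy, 3.0, 0.05, 200, rng) for _ in range(2000)]
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
print(mean, var)
```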

13.

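Training an EBM: Contrastive Divergence
In summary, the EBM update rule is
$$\mathcal{L}(\theta, x') = \sum_{i=1}^{N} \left[-E_\theta(x_i) + E_\theta(x'_i)\right]$$
$$\theta \leftarrow \theta + \eta_\theta \nabla_\theta \mathcal{L}(\theta, x')$$
$$x'_i \leftarrow x'_i - \eta_{x'} \nabla_{x'} \mathcal{L}(\theta, x') + \epsilon, \qquad \epsilon \sim \mathrm{Normal}(0, 2\eta_{x'} I)$$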

14.

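Training an EBM: Contrastive Divergence
In summary, the EBM update rule is
$$\mathcal{L}(\theta, x') = \sum_{i=1}^{N} \left[-E_\theta(x_i) + E_\theta(x'_i)\right]$$
$$\theta \leftarrow \theta + \eta_\theta \nabla_\theta \mathcal{L}(\theta, x')$$
$$x'_i \leftarrow x'_i - \eta_{x'} \nabla_{x'} \mathcal{L}(\theta, x') + \epsilon, \qquad \epsilon \sim \mathrm{Normal}(0, 2\eta_{x'} I)$$
On closer inspection, this looks a lot like a GAN.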

15.

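The GAN update rule
$$\mathcal{L}(\theta, \phi) = \sum_{i=1}^{N} \left[\log D_\theta(x_i) + \log\left(1 - D_\theta(G_\phi(z_i))\right)\right], \qquad z_i \sim \mathrm{Normal}(0, I)$$
$$\theta \leftarrow \theta + \eta_\theta \nabla_\theta \mathcal{L}(\theta, \phi)$$
$$\phi \leftarrow \phi - \eta_\phi \nabla_\phi \mathcal{L}(\theta, \phi)$$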

16.

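The GAN update rule with the $-\log D$ trick
$$\mathcal{L}(\theta, \phi) = \sum_{i=1}^{N} \left[\log D_\theta(x_i) - \log D_\theta(G_\phi(z_i))\right], \qquad z_i \sim \mathrm{Normal}(0, I)$$
$$\theta \leftarrow \theta + \eta_\theta \nabla_\theta \mathcal{L}(\theta, \phi)$$
$$\phi \leftarrow \phi - \eta_\phi \nabla_\phi \mathcal{L}(\theta, \phi)$$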

17.

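The GAN update rule with the $-\log D$ trick
Setting $E_\theta(x) = -\log D_\theta(x)$ gives
$$\mathcal{L}(\theta, \phi) = \sum_{i=1}^{N} \left[-E_\theta(x_i) + E_\theta(G_\phi(z_i))\right], \qquad z_i \sim \mathrm{Normal}(0, I)$$
$$\theta \leftarrow \theta + \eta_\theta \nabla_\theta \mathcal{L}(\theta, \phi)$$
$$\phi \leftarrow \phi - \eta_\phi \nabla_\phi \mathcal{L}(\theta, \phi)$$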

18.

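Similarity between GANs and EBMs
GAN with the $-\log D$ trick:
$$\mathcal{L}(\theta, \phi) = \sum_{i=1}^{N} \left[-E_\theta(x_i) + E_\theta(G_\phi(z_i))\right], \qquad z_i \sim \mathrm{Normal}(0, I)$$
$$\theta \leftarrow \theta + \eta_\theta \nabla_\theta \mathcal{L}(\theta, \phi)$$
$$\phi \leftarrow \phi - \eta_\phi \nabla_\phi \mathcal{L}(\theta, \phi)$$
EBM:
$$\mathcal{L}(\theta, x') = \sum_{i=1}^{N} \left[-E_\theta(x_i) + E_\theta(x'_i)\right]$$
$$\theta \leftarrow \theta + \eta_\theta \nabla_\theta \mathcal{L}(\theta, x')$$
$$x'_i \leftarrow x'_i - \eta_{x'} \nabla_{x'} \mathcal{L}(\theta, x') + \epsilon, \qquad \epsilon \sim \mathrm{Normal}(0, 2\eta_{x'} I)$$
The two are very similar, but slightly different.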

19.

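Similarity between GANs and EBMs
GAN with the $-\log D$ trick:
$$\mathcal{L}(\theta, \phi) = \sum_{i=1}^{N} \left[-E_\theta(x_i) + E_\theta(G_\phi(z_i))\right], \qquad z_i \sim \mathrm{Normal}(0, I)$$
$$\theta \leftarrow \theta + \eta_\theta \nabla_\theta \mathcal{L}(\theta, \phi)$$
$$\phi \leftarrow \phi - \eta_\phi \nabla_\phi \mathcal{L}(\theta, \phi)$$
Instead of updating the samples directly, the GAN updates the function $G_\phi$ that generates samples from noise.
EBM:
$$\mathcal{L}(\theta, x') = \sum_{i=1}^{N} \left[-E_\theta(x_i) + E_\theta(x'_i)\right]$$
$$\theta \leftarrow \theta + \eta_\theta \nabla_\theta \mathcal{L}(\theta, x')$$
$$x'_i \leftarrow x'_i - \eta_{x'} \nabla_{x'} \mathcal{L}(\theta, x') + \epsilon, \qquad \epsilon \sim \mathrm{Normal}(0, 2\eta_{x'} I)$$
The EBM sample update includes noise.
The two are very similar, but slightly different.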

20.

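Paper introductions
Can an EBM be trained with a generator, the way a GAN is? ➡ Papers 1 and 2
If the GAN discriminator is viewed as an energy function, can the discriminator also be used at generation time? ➡ Paper 3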

21.

Deep Directed Generative Models with Energy-Based Probability Estimation https://arxiv.org/abs/1606.03439 Taesup Kim, Yoshua Bengio (Université de Montréal)

22.

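Training an EBM: Contrastive Divergence
The gradient of the EBM log-likelihood:
$$\nabla_\theta \log p_\theta(x) = -\nabla_\theta E_\theta(x) + \mathbb{E}_{x' \sim p_\theta(x)}\left[\nabla_\theta E_\theta(x')\right] \approx -\nabla_\theta E_\theta(x) + \mathbb{E}_{z \sim p(z)}\left[\nabla_\theta E_\theta(G_\phi(z))\right]$$
Sampling from $p_\theta(x)$ is replaced by sampling from $G_\phi(z)$.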

23.

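Training the generator
Let $p_\phi(x) = \mathbb{E}_{p(z)}\left[\delta(x - G_\phi(z))\right]$. The goal is $p_\theta(x) = p_\phi(x)$, so the generator is trained by minimizing the KL divergence between these two distributions:
$$\mathrm{KL}\left(p_\phi \,\|\, p_\theta\right) = \mathbb{E}_{p_\phi}\left[-\log p_\theta(x)\right] - H\left(p_\phi\right)$$
The first term lowers the energy of the samples; the second term raises the entropy of the samples.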
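The decomposition into an energy term and an entropy term follows directly from the definitions, using $p_\theta(x) = \exp(-E_\theta(x)) / Z(\theta)$:

```latex
\begin{aligned}
\mathrm{KL}\left(p_\phi \,\|\, p_\theta\right)
  &= \mathbb{E}_{p_\phi}\left[\log p_\phi(x) - \log p_\theta(x)\right] \\
  &= -H\left(p_\phi\right) + \mathbb{E}_{p_\phi}\left[-\log p_\theta(x)\right] \\
  &= -H\left(p_\phi\right) + \mathbb{E}_{p_\phi}\left[E_\theta(x)\right] + \log Z(\theta)
\end{aligned}
```

Because $\log Z(\theta)$ does not depend on $\phi$, minimizing over $\phi$ amounts to minimizing the expected energy of the generated samples minus their entropy.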

24.

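Training the generator: why the entropy term is needed
$$\mathrm{KL}\left(p_\phi \,\|\, p_\theta\right) = \mathbb{E}_{p_\phi}\left[-\log p_\theta(x)\right] - H\left(p_\phi\right)$$
The first term lowers the energy of the samples; the second term raises the entropy of the samples.
Without the entropy term, the generator would learn to produce only the samples with minimum energy (that is, maximum density).
‣ This is a phenomenon similar to mode collapse in GANs.
The entropy term is needed to prevent this.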

25.

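Training the generator
The gradient of the first term is easy to compute:
$$\nabla_\phi \mathbb{E}_{p_\phi}\left[-\log p_\theta(x)\right] = \mathbb{E}_{z \sim p(z)}\left[\nabla_\phi E_\theta(G_\phi(z))\right]$$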

26.

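Training the generator
The entropy in the second term cannot be computed analytically.
The paper substitutes for it by treating the scale parameters of batch normalization as the variances of normal distributions and computing their entropy:
$$H\left(p_\phi\right) \approx \sum_{a_i} H\left(\mathcal{N}\left(\mu_{a_i}, \sigma_{a_i}^2\right)\right) = \sum_{a_i} \frac{1}{2}\log\left(2e\pi\sigma_{a_i}^2\right)$$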
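The closed-form Gaussian entropy used in this proxy is straightforward to evaluate. The sketch below only computes the sum $\sum_{a_i} \frac{1}{2}\log(2e\pi\sigma_{a_i}^2)$ for a given list of scales; the name `gaussian_entropy_sum` and the example scale values are hypothetical, and how the scales are read out of a network's batch normalization layers is not shown.

```python
import math

def gaussian_entropy_sum(sigmas):
    """Sum of 0.5 * log(2 * e * pi * sigma^2) over per-activation scales."""
    return sum(0.5 * math.log(2.0 * math.e * math.pi * s * s) for s in sigmas)

# Hypothetical batch-normalization scale parameters.
narrow = gaussian_entropy_sum([0.1, 0.1, 0.1])
wide = gaussian_entropy_sum([1.0, 1.0, 1.0])
print(narrow, wide)
```

Larger scales give a larger proxy entropy, so maximizing this quantity pushes the activations to spread out.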

27.

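Advantage over GANs
Because an energy function is learned in place of a discriminator, the model can be used for tasks such as density ratio estimation.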

28.

Generated samples

29.

Maximum Entropy Generators for Energy-Based Models https://arxiv.org/abs/1901.08508 Rithesh Kumar, Sherjil Ozair, Anirudh Goyal, Aaron Courville, Yoshua Bengio (Université de Montréal)

30.

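Computing the entropy
$$\mathrm{KL}\left(p_\phi \,\|\, p_\theta\right) = \mathbb{E}_{p_\phi}\left[-\log p_\theta(x)\right] - H\left(p_\phi\right)$$
Paper 1 computed the entropy $H(p_\phi)$ from the batch normalization scale parameters, but this is a heuristic with no theoretical justification.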

31.

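Computing the entropy
Consider the mutual information between the latent variable $z$ and the generator output $x = G_\phi(z)$, with $z \sim p(z)$:
$$I(x, z) = H(x) - H(x \mid z) = H\left(G_\phi(z)\right) - H\left(G_\phi(z) \mid z\right)$$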

32.

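Computing the entropy
When $G_\phi$ is a deterministic function, $H\left(G_\phi(z) \mid z\right) = 0$, so
$$H\left(p_\phi\right) = H\left(G_\phi(z)\right) = I(x, z)$$
In other words, it suffices to maximize the mutual information instead of the entropy.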
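The identity $H(x) = I(x, z)$ for a deterministic map can be checked exactly in a discrete toy case, where all entropies are well defined. The sketch below assumes a four-valued latent and a hypothetical lookup-table generator that merges two latent values into one output; none of these values come from the paper.

```python
import math
from collections import defaultdict

def entropy(probs):
    """Shannon entropy of a discrete distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def marginals(joint):
    """Marginal distributions of x and z from a dict {(x, z): probability}."""
    p_x, p_z = defaultdict(float), defaultdict(float)
    for (x, z), p in joint.items():
        p_x[x] += p
        p_z[z] += p
    return p_x, p_z

def mutual_information(joint):
    """I(x; z) = sum p(x, z) * log(p(x, z) / (p(x) * p(z)))."""
    p_x, p_z = marginals(joint)
    return sum(p * math.log(p / (p_x[x] * p_z[z]))
               for (x, z), p in joint.items() if p > 0)

prior = {0: 0.1, 1: 0.2, 2: 0.3, 3: 0.4}      # hypothetical p(z)
generator = {0: "a", 1: "a", 2: "b", 3: "c"}  # hypothetical deterministic G
joint = {(generator[z], z): p for z, p in prior.items()}

h_x = entropy(marginals(joint)[0].values())
mi = mutual_information(joint)
print(h_x, mi)
```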

33.

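Estimating the mutual information
Many mutual information estimators have been proposed in recent years; here, the JS-divergence-based estimator proposed in Deep InfoMax is used:
$$I_{\mathrm{JSD}}(x, z) = \sup_{T \in \mathcal{T}} \; \mathbb{E}_{p(x, z)}\left[-\mathrm{sp}\left(-T(x, z)\right)\right] - \mathbb{E}_{p(x)p(z)}\left[\mathrm{sp}\left(T(x, z)\right)\right]$$
where $\mathrm{sp}$ is the softplus function. $T$ is a discriminator that distinguishes samples from $p(x, z)$ from samples from $p(x)p(z)$, and it is trained jointly.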
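Given the scores of a critic $T$ on joint pairs and on shuffled (marginal) pairs, the quantity inside the supremum is a difference of two softplus averages. The sketch below evaluates it for fixed score lists; the function name `jsd_mi_estimate` and the score values are illustrative assumptions, and training $T$ itself is not shown.

```python
import math

def softplus(t):
    """Numerically stable log(1 + exp(t))."""
    return max(t, 0.0) + math.log1p(math.exp(-abs(t)))

def jsd_mi_estimate(joint_scores, marginal_scores):
    """Monte Carlo value of E_joint[-sp(-T)] - E_marginal[sp(T)]."""
    pos = sum(-softplus(-t) for t in joint_scores) / len(joint_scores)
    neg = sum(softplus(t) for t in marginal_scores) / len(marginal_scores)
    return pos - neg

# An uninformative critic (T = 0 everywhere) versus a confident one.
uninformative = jsd_mi_estimate([0.0, 0.0], [0.0, 0.0])
confident = jsd_mi_estimate([20.0, 20.0], [-20.0, -20.0])
print(uninformative, confident)
```

An uninformative critic gives $-2\log 2$, and the value approaches its upper bound of 0 as the critic separates the two sets of pairs.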

34.

Density estimation
Even complex distributions are approximated well.

35.

Mode Collapse
An experiment that trains on data with 1000 (or 10000) modes (StackedMNIST) and compares how many modes each method captures.
MEG (the proposed method) captures all modes, and no mode collapse occurs.

36.

Image generation: CIFAR-10
When sampling from the EBM with MCMC, IS and FID are better than those of WGAN-GP.

37.

Your GAN is Secretly an Energy-based Model and You Should Use Discriminator Driven Latent Sampling https://arxiv.org/abs/2003.06060 Tong Che, Ruixiang Zhang, Jascha Sohl-Dickstein, Hugo Larochelle, Liam Paull, Yuan Cao, Yoshua Bengio (Université de Montréal, Google Brain)

38.

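The standard interpretation of GANs: the discriminator is a density ratio estimator
The discriminator acts as a density ratio estimator between the data distribution $p(x)$ and the distribution of generated samples $p_\phi(x) = \mathbb{E}_{p(z)}\left[\delta(x - G_\phi(z))\right]$.
That is, when the discriminator is optimal,
$$D_\theta^*(x) = \frac{p(x)}{p(x) + p_\phi(x)}$$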

39.

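The standard interpretation of GANs: the discriminator is a density ratio estimator
Let $\sigma(\cdot)$ be the sigmoid function and $d_\theta(x) = \sigma^{-1}\left(D_\theta(x)\right)$. Then
$$D_\theta(x) = \frac{p(x)}{p(x) + p_\phi(x)} \;\Rightarrow\; p(x) \propto p_\phi(x) \exp\left(d_\theta(x)\right)$$
The data distribution $p(x)$ is proportional to the product of the generator distribution $p_\phi(x)$ and $\exp\left(d_\theta(x)\right)$.
➡ Could sampling from this distribution with a trained GAN improve generation quality?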
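The relation $p(x) \propto p_\phi(x)\exp(d_\theta(x))$ can be checked at a single point when both densities are known. The sketch below assumes two unit-variance Gaussians as stand-ins for the data and generator distributions and an exactly optimal discriminator; a trained discriminator would only satisfy this approximately.

```python
import math

def normal_pdf(x, mean, std):
    """Density of Normal(mean, std^2) at x."""
    zscore = (x - mean) / std
    return math.exp(-0.5 * zscore * zscore) / (std * math.sqrt(2.0 * math.pi))

def logit(d):
    """Inverse sigmoid: the discriminator's pre-activation d_theta(x)."""
    return math.log(d) - math.log(1.0 - d)

x = 0.7
p = normal_pdf(x, 1.0, 1.0)   # assumed data density p(x)
q = normal_pdf(x, 0.0, 1.0)   # assumed generator density p_phi(x)
d_opt = p / (p + q)           # optimal discriminator output at x
recovered = q * math.exp(logit(d_opt))
print(p, recovered)
```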

40.

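MCMC in latent space: Discriminator Driven Latent Sampling (DDLS)
We want to sample from $p_\phi(x) \exp\left(d_\theta(x)\right)$, but running MCMC in data space is inefficient and difficult.
Instead, MCMC (Langevin dynamics) is run in the latent space of the generator $G_\phi(z)$:
$$E(z) = -\log p(z) - d_\theta\left(G_\phi(z)\right)$$
$$z \leftarrow z - \eta \nabla_z E(z) + \epsilon, \qquad \epsilon \sim \mathrm{Normal}(0, 2\eta I)$$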
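The latent-space Langevin update can be illustrated with a one-dimensional toy problem in which everything is known in closed form. The sketch below assumes an identity generator (so $p_\phi$ is a standard normal), data distributed as Normal(1, 1), and the corresponding optimal logit $d(x) = x - 0.5$; these choices, the finite-difference gradient, and the step size are all illustrative and are not the paper's implementation.

```python
import math
import random

def generator(z):
    """Hypothetical generator G(z) = z, so p_phi = Normal(0, 1)."""
    return z

def disc_logit(x):
    """Optimal logit log p(x) - log p_phi(x) when the data is Normal(1, 1)."""
    return x - 0.5

def latent_energy(z):
    """E(z) = -log p(z) - d(G(z)), dropping the constant of the normal prior."""
    return 0.5 * z * z - disc_logit(generator(z))

def grad(f, z, eps=1e-5):
    """Central finite-difference derivative, standing in for autodiff."""
    return (f(z + eps) - f(z - eps)) / (2.0 * eps)

def ddls_sample(rng, step=0.05, n_steps=200):
    """Start from the prior, run Langevin dynamics on E(z), then decode."""
    z = rng.gauss(0.0, 1.0)
    noise_std = math.sqrt(2.0 * step)
    for _ in range(n_steps):
        z = z - step * grad(latent_energy, z) + rng.gauss(0.0, noise_std)
    return generator(z)

rng = random.Random(0)
plain = [generator(rng.gauss(0.0, 1.0)) for _ in range(1000)]
refined = [ddls_sample(rng) for _ in range(1000)]
plain_mean = sum(plain) / len(plain)
refined_mean = sum(refined) / len(refined)
print(plain_mean, refined_mean)
```

In this toy setting the plain generator samples are centered near 0, while the refined samples move toward the data mean of 1.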

41.

Experiments
Simply applying DDLS to a pre-trained GAN improves IS and FID considerably.

42.

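Summary
GANs and EBMs are deeply related.
Drawing on insights from both makes it possible to build approaches that combine the best of each:
• Use a generator for sampling from an EBM
• Use MCMC for sampling from a GAN
More research along similar lines is likely to appear.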