2020/01/24
Deep Learning JP:
http://deeplearning.jp/seminar-2/
DL Reading Group Materials
Recent Advances in Energy-Based Models
Shohei Taniguchi, Matsuo Lab (M1)
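Background
• Energy-based models (EBMs) seem to be attracting attention again recently (?)
• This talk mainly introduces the following two papers:
  - Flow Contrastive Estimation of Energy-Based Models
    ‣ Trains two generative models, a flow and an energy-based model, at the same time
  - Your Classifier is Secretly an Energy Based Model and You Should Treat it Like One
    ‣ Uses an energy-based model to train a generative model and a classifier at the same time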
Outline
1. Background: Energy Based Model (EBM)
  - Main ways to train EBMs
    ‣ Contrastive Divergence Learning (CD)
    ‣ Noise Contrastive Estimation (NCE)
2. History of EBMs
  - Restricted Boltzmann Machine (RBM) and what came after
3. Flow Contrastive Estimation of Energy-Based Models
4. Your Classifier is Secretly an Energy Based Model and You Should Treat it Like One
Background: Energy Based Model
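What is an EBM?
• The probability density function pθ(x) of data x is defined as follows, using an energy function Eθ(x) that takes x as input and returns a scalar:
$$p_\theta(x) = \frac{\exp(-E_\theta(x))}{Z(\theta)}, \qquad Z(\theta) = \int \exp(-E_\theta(x))\,dx$$
  - Z(θ) is called the partition function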
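To make the definition concrete, here is a minimal 1-D sketch. It assumes a hypothetical quadratic energy, for which Z(θ) can still be approximated by a Riemann sum on a grid. In high dimensions this integral is exactly the part that becomes intractable.

```python
import numpy as np

def energy(x, mu=1.0):
    # Hypothetical example energy: a quadratic well centred at mu
    return 0.5 * (x - mu) ** 2

# Approximate Z(theta) = ∫ exp(-E(x)) dx with a Riemann sum on a wide 1-D grid
grid = np.linspace(-20.0, 20.0, 200_001)
dx = grid[1] - grid[0]
unnormalized = np.exp(-energy(grid))
Z = unnormalized.sum() * dx

# p(x) = exp(-E(x)) / Z integrates to one
density = unnormalized / Z
print(Z, np.sqrt(2 * np.pi))  # for this energy, Z equals sqrt(2*pi)
print(density.sum() * dx)
```

For this particular energy, pθ is simply N(1, 1), which is why Z matches sqrt(2π).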
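What EBMs are good for
• Density (ratio) estimation
  - The energy equals the negative log-density up to the additive constant log Z(θ), so the densities of different data points can be compared: $\log p_\theta(x_1) - \log p_\theta(x_2) = E_\theta(x_2) - E_\theta(x_1)$
    ‣ However, the density itself cannot be computed because of the partition function
    ‣ With NCE (described later), the density itself can also be estimated
  - Possibly useful for anomaly detection and similar tasks (?)
• Sampling data
  - If the energy function is differentiable, data can be sampled with HMC
    ‣ However, MCMC converges slowly in high dimensions, so this is hard in practice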
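The two uses above never need Z(θ). The density ratio uses only an energy difference, and HMC's Metropolis test also uses only energy differences. Below is a minimal HMC sketch that assumes a hypothetical 2-D Gaussian-well energy, chosen so the result is easy to check. The step size and number of leapfrog steps are illustrative values, not tuned recommendations.

```python
import numpy as np

CENTER = np.array([2.0, -1.0])

def energy(x):
    # Hypothetical differentiable energy: isotropic Gaussian well centred at CENTER
    return 0.5 * np.sum((x - CENTER) ** 2)

def grad_energy(x):
    return x - CENTER

def hmc_step(x, rng, step_size=0.1, n_leapfrog=20):
    p = rng.standard_normal(x.shape)  # resample momentum
    x_new, p_new = x.copy(), p.copy()
    # Leapfrog integration of H(x, p) = E(x) + |p|^2 / 2
    p_new -= 0.5 * step_size * grad_energy(x_new)
    for i in range(n_leapfrog):
        x_new += step_size * p_new
        if i != n_leapfrog - 1:
            p_new -= step_size * grad_energy(x_new)
    p_new -= 0.5 * step_size * grad_energy(x_new)
    # Metropolis correction: only energy differences appear, so Z is never needed
    h_old = energy(x) + 0.5 * p @ p
    h_new = energy(x_new) + 0.5 * p_new @ p_new
    if np.log(rng.uniform()) < h_old - h_new:
        return x_new
    return x

rng = np.random.default_rng(0)
x = np.zeros(2)
samples = []
for t in range(3000):
    x = hmc_step(x, rng)
    if t >= 500:  # discard burn-in
        samples.append(x)
samples = np.array(samples)
print(samples.mean(axis=0), samples.var(axis=0))

# Density ratio without Z: log p(a) - log p(b) = E(b) - E(a)
a, b = CENTER.copy(), np.zeros(2)
log_ratio = energy(b) - energy(a)
print(log_ratio)  # 2.5
```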
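Training EBMs
• Basically, the model is trained to maximize the log-likelihood log pθ(x) of the training data (maximum likelihood estimation)
  - However, in most cases the partition function Z(θ) contains an integral that cannot be computed
    ➡ the likelihood cannot be computed
  - Training an EBM therefore requires some trick to avoid computing the partition function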
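Contrastive Divergence Learning (CD)
• Gradient-based training such as SGD does not need the log-likelihood itself. It is enough to know its gradient ∂log pθ(x)/∂θ with respect to the parameters.
  - Averaged over the data, this gradient can be computed as follows. The first term comes from differentiating log Z(θ).
$$\mathbb{E}_{p_{\text{data}}(x)}\!\left[\frac{\partial \log p_\theta(x)}{\partial \theta}\right] = \mathbb{E}_{p_\theta(x)}\!\left[\frac{\partial E_\theta(x)}{\partial \theta}\right] - \mathbb{E}_{p_{\text{data}}(x)}\!\left[\frac{\partial E_\theta(x)}{\partial \theta}\right]$$
    ‣ In short, evaluate the gradient of the energy function on samples from the model and on samples from the data, then take the difference
    ‣ Samples from the model pθ(x) are obtained, with some effort, by MCMC or similar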
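The gradient estimator above can be sketched on a toy problem. The sketch assumes a hypothetical 1-D energy Eθ(x) = (x − θ)²/2. The model samples come from a few steps of unadjusted Langevin dynamics started at the data, which is the CD-k recipe. Because ∂E/∂θ = −(x − θ), the estimated gradient reduces to "data mean minus model-sample mean", so θ should drift toward the data mean.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(loc=3.0, scale=1.0, size=5000)  # toy training data

def dE_dx(x, theta):
    return x - theta  # gradient of the hypothetical energy (x - theta)^2 / 2 w.r.t. x

def dE_dtheta(x, theta):
    return -(x - theta)

def langevin(x, theta, n_steps=20, step=0.05):
    # Unadjusted Langevin dynamics targeting p_theta(x) ∝ exp(-E_theta(x))
    for _ in range(n_steps):
        x = x - step * dE_dx(x, theta) + np.sqrt(2 * step) * rng.standard_normal(x.shape)
    return x

theta, lr = 0.0, 0.5
for it in range(200):
    batch = rng.choice(data, size=256)
    model_samples = langevin(batch, theta)  # CD-k: chains start at the data
    # E_model[dE/dtheta] - E_data[dE/dtheta]
    grad = dE_dtheta(model_samples, theta).mean() - dE_dtheta(batch, theta).mean()
    theta += lr * grad  # gradient ascent on the log-likelihood
print(theta, data.mean())
```

Starting the chains at the data keeps each update cheap. The cost is that the short chains are only approximate samples from pθ, which is the usual CD trade-off.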
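Problems with CD
• Drawing samples from the model pθ(x) is cumbersome
  - MCMC takes a long time to converge when the data are high-dimensional
  - Running MCMC for samples at every parameter update takes a very long time
  ➡ We want a method that does not need samples from the model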
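Noise Contrastive Estimation (NCE)
• First, let a separate parameter c estimate the partition function itself (more precisely, log Z(θ)), and consider maximizing log pθ(x) = −Eθ(x) − c
  - It is known that if we train to maximize the objective below, θ is estimated consistently (as with maximum likelihood) and c matches log Z(θ):
$$J(\theta) = \mathbb{E}_{p_{\text{data}}(x)}\!\left[\log \frac{p_\theta(x)}{p_\theta(x) + q(x)}\right] + \mathbb{E}_{q(x)}\!\left[\log \frac{q(x)}{p_\theta(x) + q(x)}\right]$$
    ‣ Here q(x) is some noise distribution (e.g. Gaussian noise)
    ‣ Intuitively, the model learns to tell samples from the data apart from noise
    ‣ In fact this is somewhat related to GANs (see later)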
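The following is a minimal NCE sketch under stated assumptions. The data are N(1, 1). The hypothetical model is log pθ(x) = −(x − μ)²/2 − c. The noise is q = N(0, 2²). It uses the identity pθ/(pθ + q) = σ(h) with h = log pθ − log q, so J is a logistic-regression log-likelihood. Plain gradient ascent on (μ, c) should recover μ ≈ 1 and c ≈ log Z = log √(2π).

```python
import numpy as np

rng = np.random.default_rng(0)
x_data = rng.normal(1.0, 1.0, size=10_000)   # toy "data": N(1, 1)
x_noise = rng.normal(0.0, 2.0, size=10_000)  # noise samples from q(x) = N(0, 2^2)

def log_q(x):
    # log density of the noise distribution N(0, 2^2)
    return -x**2 / 8.0 - np.log(2.0 * np.sqrt(2.0 * np.pi))

def sigmoid(h):
    # numerically stable logistic function
    return np.exp(-np.logaddexp(0.0, -h))

mu, c, lr = 0.0, 0.0, 0.5
for _ in range(2000):
    # h(x) = log p_theta(x) - log q(x), where log p_theta(x) = -(x - mu)^2 / 2 - c
    h_d = -0.5 * (x_data - mu) ** 2 - c - log_q(x_data)
    h_n = -0.5 * (x_noise - mu) ** 2 - c - log_q(x_noise)
    # J = E_data[log sigmoid(h)] + E_q[log sigmoid(-h)]; gradients w.r.t. mu and c
    grad_mu = np.mean(sigmoid(-h_d) * (x_data - mu)) - np.mean(sigmoid(h_n) * (x_noise - mu))
    grad_c = -np.mean(sigmoid(-h_d)) + np.mean(sigmoid(h_n))
    mu += lr * grad_mu  # gradient ascent on J
    c += lr * grad_c

print(mu, c, np.log(np.sqrt(2 * np.pi)))  # c should approach log Z
```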
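Problems with NCE
• How should q(x) be chosen?
  - Conditions q(x) should satisfy:
    ① its density is easy to compute
    ② it is easy to sample from
    ③ ideally it is close to the data distribution pdata(x) (if it is far away, telling data from noise is too easy and the EBM learns little)
  - ① and ② are relatively easy, but ③ is hard
    ‣ Actually, if we already had a distribution close to the data distribution, there would be no need to bother training an EBM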
History of EBMs
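Why EBMs appeared: pre-training
• In early deep learning, pre-training was indispensable for training multi-layer models
• There were two main families of pre-training methods:
  - Reconstruction-based: trained layer by layer to minimize reconstruction error
    e.g. Autoencoder, Denoising AE
  - EBM-based: each neuron is treated as a binary random variable and the log-likelihood is maximized layer by layer
    e.g. Restricted Boltzmann Machine, Deep Boltzmann Machine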
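Restricted Boltzmann Machine (RBM)
• If each hidden unit hi is a binary random variable with $P(h_i = 1 \mid v) = \sigma(v^\top W_{:,i} + c_i)$, the energy function is
$$E(v, h) = -b^\top v - c^\top h - v^\top W h$$
• The model is trained to maximize the likelihood obtained by marginalizing over all h and normalizing, where Z is the normalizing constant (partition function):
$$p(v) = \frac{1}{Z}\sum_h \exp(-E(v, h))$$
  (because each hi is binary, this marginalization is easy to compute)
• Training is usually done with CD
• Stacking these into multiple layers gives the Deep Boltzmann Machine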
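"Marginalization is easy" can be made concrete. Summing out each binary hi gives the free energy F(v) = −b⊤v − Σi log(1 + exp(ci + v⊤W:,i)), with p(v) = exp(−F(v))/Z. The sketch below assumes binary visible units and tiny hypothetical sizes. It checks this closed form against brute-force enumeration of all h, then applies one textbook CD-1 update. It is an illustration, not a training recipe.

```python
import itertools
import numpy as np

def energy(v, h, W, b, c):
    # E(v, h) = -b^T v - c^T h - v^T W h
    return -(b @ v) - (c @ h) - v @ W @ h

def free_energy(v, W, b, c):
    # -log sum_h exp(-E(v, h)); each binary h_i is summed out in closed form
    return -(b @ v) - np.sum(np.logaddexp(0.0, c + v @ W))

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def cd1_update(v0, W, b, c, rng, lr=0.1):
    # One CD-1 step (in place): data statistics minus statistics after one Gibbs sweep
    ph0 = sigmoid(c + v0 @ W)  # P(h_i = 1 | v0)
    h0 = (rng.uniform(size=ph0.shape) < ph0).astype(float)
    pv1 = sigmoid(b + W @ h0)  # P(v_j = 1 | h0)
    v1 = (rng.uniform(size=pv1.shape) < pv1).astype(float)
    ph1 = sigmoid(c + v1 @ W)
    W += lr * (np.outer(v0, ph0) - np.outer(v1, ph1))
    b += lr * (v0 - v1)
    c += lr * (ph0 - ph1)

rng = np.random.default_rng(0)
n_vis, n_hid = 4, 3
W = 0.1 * rng.standard_normal((n_vis, n_hid))
b, c = np.zeros(n_vis), np.zeros(n_hid)
v = np.array([1.0, 0.0, 1.0, 1.0])

# Brute-force marginalization over all 2^3 hidden states agrees with the closed form
brute = -np.log(sum(np.exp(-energy(v, np.array(h, dtype=float), W, b, c))
                    for h in itertools.product([0, 1], repeat=n_hid)))
F_closed = free_energy(v, W, b, c)
print(brute, F_closed)

cd1_update(v, W, b, c, rng)  # one training step on a single binary example
```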
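EBMs after the RBM
• Pre-training with EBMs such as RBMs and DBMs was very effective at the time. However, with the arrival of ReLU and dropout and advances in initialization methods, pre-training fell out of use.
• As generative models, too, they lost their presence with the arrival of VAEs, GANs, etc.
• They are still widely used as models in computational neuroscience
• Why are they attracting attention again recently?
  ➡ The way the energy function is used has changed, so EBMs can now be applied to a variety of purposes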
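EBMs then and now
Old EBMs (RBMs etc.)
• Treat each neuron in the hidden layers as a binary random variable and define an energy function over all of them
$$E(v, h^{(1)}, h^{(2)}, h^{(3)}) = -v^\top W^{(1)} h^{(1)} - h^{(1)\top} W^{(2)} h^{(2)} - h^{(2)\top} W^{(3)} h^{(3)}$$
• Train using the likelihood marginalized over the hidden neurons
Recent EBMs
• Define the energy function itself with a NN (the whole NN is treated as one deterministic function)
• Train using the output of the energy function directly
$$E(v) = \mathrm{NN}(v) = w^{(n)\top} \phi\Big(\cdots \phi\big(W^{(2)} \phi(W^{(1)} v + b^{(1)}) + b^{(2)}\big) \cdots\Big) + b^{(n)}$$
Note that the training methods are shared, but the way the energy function is used is quite different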
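Here is a minimal sketch of the "recent" style, assuming a hypothetical one-hidden-layer tanh network as E(v). The point is that the energy is just a deterministic scalar function of v. Its input gradient, which Langevin or HMC sampling needs, follows from the chain rule and is checked against finite differences. Because this toy output is bounded, exp(−E) is only normalizable on a bounded input domain, such as normalized pixel values.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k = 3, 16  # hypothetical input and hidden sizes
W1, b1 = rng.standard_normal((k, d)), np.zeros(k)
w2, b2 = rng.standard_normal(k), 0.0

def energy(v):
    # E(v) = NN(v) = w2^T tanh(W1 v + b1) + b2: one deterministic scalar function
    return w2 @ np.tanh(W1 @ v + b1) + b2

def grad_energy(v):
    # dE/dv by the chain rule; this is what Langevin/HMC sampling needs
    a = np.tanh(W1 @ v + b1)
    return W1.T @ (w2 * (1.0 - a**2))

v = rng.standard_normal(d)
eps = 1e-6
fd = np.array([(energy(v + eps * e) - energy(v - eps * e)) / (2 * eps) for e in np.eye(d)])
print(grad_energy(v), fd)  # analytic gradient vs central finite differences
```

In practice an autodiff framework computes this gradient, but the manual version shows there is no hidden-unit marginalization anywhere.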
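Examples of recent EBMs
• Implicit Generation and Modeling with Energy-Based Models (NeurIPS 2019)
  - EBMs can now generate clean images
  - Training is CD-based
  - Regularization and similar techniques considerably improve sample quality
  - Not covered in detail this time
(Figure: 32x32 ImageNet samples)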
Flow Contrastive Estimation of Energy-Based Models
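Bibliographic information
• Authors
  - Ruiqi Gao, Erik Nijkamp, Diederik P. Kingma, Zhen Xu, Andrew M. Dai, Ying Nian Wu
• NeurIPS 2019 Bayesian Deep Learning Workshop
• A new work from Kingma
• Trains an EBM with NCE while training a flow model at the same time
• Many different threads of generative modeling come together here, which makes it extremely interesting
(Figure: generated samples (flow))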
Noise Contrastive Estimation (revisited)
• Recap: NCE fits the EBM log pθ(x) = −Eθ(x) − c by maximizing the objective J(θ), i.e. by training pθ(x) to tell samples from the data apart from samples of a noise distribution q(x); c then matches log Z(θ)
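Problems with NCE (revisited)
• The key difficulty remains the choice of q(x): it must have an easily computed density and be easy to sample from, yet ideally be close to pdata(x), and if we already had such a distribution there would be little reason to train an EBM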
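Flow Contrastive Estimation (FCE)
• The main idea is to use, as the noise q(x), a flow model that is trained at the same time
  - For readers unfamiliar with flows, see: https://www.slideshare.net/DeepLearningJP2016/dlflowbased-deepgenerative-models
• The flow model qα(x) could be trained by ordinary likelihood maximization. In FCE, however, it is trained to minimize the NCE objective, the opposite of the EBM, which maximizes it. Here gα maps samples z from the base distribution p(z) to data space.
$$V(\theta, \alpha) = \mathbb{E}_{p_{\text{data}}(x)}\!\left[\log \frac{p_\theta(x)}{p_\theta(x) + q_\alpha(x)}\right] + \mathbb{E}_{p(z)}\!\left[\log \frac{q_\alpha(g_\alpha(z))}{p_\theta(g_\alpha(z)) + q_\alpha(g_\alpha(z))}\right]$$
  - In other words, the EBM and the flow are trained adversarially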
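To see how V(θ, α) is estimated from samples, here is a Monte Carlo sketch. It does not reproduce the paper's setup. A hypothetical 1-D affine flow gα(z) = a·z + b with base N(0, 1) stands in for Glow, so qα(x) = N(x; b, |a|) by change of variables. The EBM is a quadratic energy with a learned constant c as in NCE. The EBM would take gradient ascent steps on V and the flow descent steps. As a sanity check, when both models define the same density, each log-ratio is log(1/2), so V = −log 4.

```python
import numpy as np

rng = np.random.default_rng(0)

def log_normal(x, mean, std):
    return -0.5 * ((x - mean) / std) ** 2 - np.log(std * np.sqrt(2 * np.pi))

# Hypothetical 1-D affine flow g_alpha(z) = a * z + b with base p(z) = N(0, 1)
def flow_sample(z, a, b):
    return a * z + b

def log_q(x, a, b):
    return log_normal(x, b, abs(a))  # change of variables gives N(b, |a|)

def log_p_ebm(x, mu, c):
    # Hypothetical EBM: E_theta(x) = (x - mu)^2 / 2, with c playing the role of log Z
    return -0.5 * (x - mu) ** 2 - c

def log_sigmoid(h):
    return -np.logaddexp(0.0, -h)

def fce_value(x_data, z, mu, c, a, b):
    # Monte Carlo estimate of V(theta, alpha): the EBM maximizes it, the flow minimizes it
    h_data = log_p_ebm(x_data, mu, c) - log_q(x_data, a, b)  # log p/(p+q) = log sigmoid(h)
    x_flow = flow_sample(z, a, b)
    h_flow = log_p_ebm(x_flow, mu, c) - log_q(x_flow, a, b)  # log q/(p+q) = log sigmoid(-h)
    return np.mean(log_sigmoid(h_data)) + np.mean(log_sigmoid(-h_flow))

x_data = rng.normal(1.0, 1.0, size=10_000)
z = rng.standard_normal(10_000)
v_equal = fce_value(x_data, z, mu=1.0, c=0.5 * np.log(2 * np.pi), a=1.0, b=1.0)
print(v_equal, -np.log(4))
```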
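What does the flow learn in FCE?
• Looked at closely, the value function V(θ, α) defined above closely resembles the GAN objective:
  - $\frac{p_\theta(x)}{p_\theta(x) + q_\alpha(x)}$: the probability that x is a sample from the EBM
  - $\frac{q_\alpha(g_\alpha(z))}{p_\theta(g_\alpha(z)) + q_\alpha(g_\alpha(z))}$: the probability that gα(z) is a sample from the flow
• Minimizing V therefore means training the flow so that samples from the EBM and from the flow can no longer be told apart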
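Minimizing V = minimizing the JSD
• As EBM training progresses, the EBM distribution approaches the data distribution, so in the end the flow is trained adversarially against the true data distribution
  ➡ same as a GAN
• As with GANs, when the EBM distribution matches the data distribution, minimizing V is equivalent to minimizing the Jensen-Shannon Divergence (JSD). In that case V = 2·JSD(qα∥pdata) − log 4, where
$$\mathrm{JSD}(q_\alpha \,\|\, p_{\text{data}}) = \tfrac{1}{2}\mathrm{KL}\big(p_{\text{data}} \,\|\, m\big) + \tfrac{1}{2}\mathrm{KL}\big(q_\alpha \,\|\, m\big), \qquad m = \tfrac{1}{2}\big(p_{\text{data}} + q_\alpha\big)$$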
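The identity V = 2·JSD − log 4 can be checked numerically. The sketch uses two hypothetical discrete distributions and assumes the EBM has already matched pdata. Each term Σ p log(p/(p + q)) equals KL(p∥m) − log 2, so the two terms together give 2·JSD − log 4. The minimum value −log 4 is reached when qα = pdata.

```python
import numpy as np

def kl(p, q):
    return np.sum(p * np.log(p / q))

def jsd(p, q):
    m = 0.5 * (p + q)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

p_data = np.array([0.5, 0.3, 0.2])  # hypothetical data distribution (EBM assumed equal to it)
q_flow = np.array([0.2, 0.2, 0.6])  # hypothetical flow distribution

V = np.sum(p_data * np.log(p_data / (p_data + q_flow))) \
    + np.sum(q_flow * np.log(q_flow / (p_data + q_flow)))
print(V, 2 * jsd(p_data, q_flow) - np.log(4))
```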
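Advantages of FCE
• The EBM and the flow model can be trained at the same time
  - Sampling data from a flow is easy. However, its architecture is constrained because the Jacobian determinant has to be computed efficiently, so its expressive power is low.
  - An EBM is highly expressive, but sampling data from it requires MCMC or similar, which is cumbersome
  ➡ Having both lets us take the best of each
    ‣ e.g. use the EBM for density estimation and the flow for sampling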
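Experiment 1: density estimation on synthetic 2D data
• Visualizes the densities learned by models trained on data from the distributions shown in the leftmost column
  - Glow-MLE: Glow trained by maximum likelihood
  - Glow-FCE: Glow trained with FCE
  - EBM-FCE: EBM trained with FCE
• The EBM trained with FCE estimates the density most cleanly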
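Experiment 1: density estimation on synthetic 2D data (continued)
• Learning curves of the EBM's density-estimation accuracy
• Surprisingly, joint FCE training from a random initialization (rand) converges faster than pre-training Glow by maximum likelihood and then applying FCE (trained)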
Experiment 2: real image data
(Figures: images generated by Glow trained with FCE; FID comparison; negative log-likelihood on test data)
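FCE summary
• Proposes Flow Contrastive Estimation (FCE): when training an EBM with NCE, a flow serving as the noise distribution is trained adversarially at the same time
• Makes it possible to exploit the strengths of both, e.g. sampling with the flow and estimating densities with the EBM
• The flow model is trained with the same criterion (JSD) as a GAN generator. In GANs, training becomes unstable because the generator's gradient vanishes once the discriminator can separate the two perfectly. FCE looks like it can be trained stably.
• A significant contribution as a method for training EBMs stably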
Your Classifier is Secretly an Energy Based Model and You Should Treat it Like One
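Bibliographic information
• Authors
  - Will Grathwohl, Kuan-Chieh Wang, Jörn-Henrik Jacobsen, David Duvenaud, Mohammad Norouzi, Kevin Swersky
• ICLR 2020 accepted (8, 8, 6)
• Shows that an energy function over the joint distribution of data x and label y yields a classifier p(y | x) and a generative model p(x) at the same time
• The kind of idea that seems obvious once stated, but that nobody had thought of
• Most experiments are omitted here, so please read the original paper if interested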
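Joint Energy based Model (JEM)
• The main idea is that a classifier's outputs before the final softmax (the logits fθ(x)) can be regarded as a negative energy function
$$p_\theta(y \mid x) = \frac{\exp(f_\theta(x)[y])}{\sum_{y'} \exp(f_\theta(x)[y'])}$$
• Using this, the joint distribution of x and y is
$$p_\theta(x, y) = \frac{\exp(f_\theta(x)[y])}{Z(\theta)}, \qquad Z(\theta) = \int \sum_{y'} \exp(f_\theta(x)[y'])\,dx$$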
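Joint Energy based Model (JEM, continued)
• The joint log-likelihood is the sum of the classifier's and the generative model's log-likelihoods, so we simply maximize both at once:
$$\log p_\theta(x, y) = \log p_\theta(x) + \log p_\theta(y \mid x)$$
• The second term is ordinary classifier training
• For the first term, the energy function Eθ(x) of x is easy to obtain as below, so it can be trained with ordinary CD (NCE should probably work too)
$$E_\theta(x) = -\mathrm{LogSumExp}_y\big(f_\theta(x)[y]\big) = -\log \sum_y \exp(f_\theta(x)[y])$$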
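The following sketch shows how the pieces come out of one set of logits. It assumes hypothetical logits fθ(x) for two inputs and three classes. The energies are Eθ(x, y) = −fθ(x)[y] and Eθ(x) = −LogSumExp_y fθ(x)[y], and p(y | x) is the ordinary softmax. Z(θ) is shared by log pθ(x, y) and log pθ(x), so the decomposition log p(x, y) = log p(x) + log p(y | x) can be checked on unnormalized quantities.

```python
import numpy as np

def logsumexp(a, axis=-1):
    # numerically stable log(sum(exp(a))) along one axis
    m = np.max(a, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(a - m), axis=axis))

logits = np.array([[2.0, -1.0, 0.5],
                   [0.0, 0.0, 3.0]])  # hypothetical f_theta(x): 2 inputs, 3 classes

E_xy = -logits                                          # E_theta(x, y) = -f_theta(x)[y]
E_x = -logsumexp(logits)                                # E_theta(x) = -LogSumExp_y f_theta(x)[y]
log_p_y_given_x = logits - logsumexp(logits)[:, None]   # ordinary softmax classifier

# Unnormalized: log p(x, y) + log Z = (log p(x) + log Z) + log p(y | x)
print(np.allclose(-E_xy, -E_x[:, None] + log_p_y_given_x))
```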
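Advantages of JEM
• A classifier and a generative model are obtained at the same time
  - Fixing the label in the energy function also allows class-conditional generation
• Easily extends to semi-supervised learning
  - For unlabeled data, simply train only the generative-model term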
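Class-conditional generation by fixing the label can be sketched as Langevin sampling on Eθ(x, y) = −fθ(x)[y] with y held fixed. JEM itself uses SGLD on a trained network. Here a hypothetical toy classifier with logits fθ(x)[y] = −‖x − μy‖²/2 stands in for the network, so the class-conditional distribution is N(μy, I) and the result is easy to check. Step size and step count are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
centers = np.array([[-2.0, 0.0], [2.0, 0.0], [0.0, 3.0]])  # hypothetical class prototypes

def logits(x):
    # Toy classifier: f(x)[y] = -||x - mu_y||^2 / 2 (stands in for a trained network)
    return -0.5 * np.sum((x[:, None, :] - centers[None, :, :]) ** 2, axis=-1)

def grad_energy_xy(x, y):
    # E(x, y) = -f(x)[y], so dE/dx = x - mu_y for this toy model
    return x - centers[y]

def sample_class(y, n=2000, n_steps=500, step=0.01):
    # Unadjusted Langevin dynamics on E(x, y) with the label y held fixed
    x = rng.standard_normal((n, 2))
    for _ in range(n_steps):
        x = x - step * grad_energy_xy(x, y) + np.sqrt(2 * step) * rng.standard_normal(x.shape)
    return x

samples = sample_class(y=2)
frac = np.mean(np.argmax(logits(samples), axis=1) == 2)  # share the classifier assigns to class 2
print(samples.mean(axis=0), frac)
```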
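Experiments: joint training of classifier and generative model
• Performance comparable to models trained for classification alone or generation alone
(Figures: CIFAR10 results; class-conditional generated images)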
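Problems with JEM
• As expected, training EBMs with CD is difficult on data such as images
  - The likelihood cannot be computed because of the partition function, so it is hard to verify whether training is going well
  - Training with MCMC sampling is, as expected, unstable
    ‣ Apparently quite sensitive to fine-grained tuning
    ‣ Could the FCE method from earlier solve this (?)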
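Overall summary
• Summarized recent progress in energy-based models
• Unlike the days of RBMs as pre-training models, EBMs are now used in much more flexible ways
• Training EBMs stably used to be hard, but I think the FCE method described above contributes substantially to stable EBM training
  - Personally, I am curious what would happen if FCE were used to train JEM
• Research is likely to progress in many directions, such as connections to mutual-information-based methods that also use NCE
• Will there really be an EBM boom (?)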