2020/05/08
Deep Learning JP:
http://deeplearning.jp/seminar-2/
DL輪読会 (Deep Learning JP paper reading group) material
The frontier of simulation-based inference
Shohei Taniguchi, Matsuo Lab (M1)
Paper Info (書誌情報)
• Authors
  • Kyle Cranmer, Johann Brehmer, and Gilles Louppe
  • New York University and University of Liège
• Accepted to PNAS
• Reason for selection
  • Interest in likelihood-free inference
Outline (発表概要)
1. Background
  • Likelihood-free inference
2. Recent advances
  • Developments in machine learning (especially deep learning)
  • Their contributions to likelihood-free inference
Statistical Inference (統計的推論)
• Estimating the parameters Θ of a statistical model p(X ∣ Θ) from data
• Various inference methods:
1. Maximum likelihood estimation: argmax_Θ p(X = x ∣ Θ)
2. MAP estimation: argmax_Θ p(X = x ∣ Θ) p(Θ)
3. Bayesian inference: p(Θ ∣ X = x) = p(X = x ∣ Θ) p(Θ) / ∫ p(X = x ∣ Θ) p(Θ) dΘ
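As a concrete illustration of the three estimators (not from the slides), here is a minimal NumPy example for a Gaussian-mean model, where all three have closed forms:

```python
# x ~ N(Θ, 1) with prior Θ ~ N(0, 1); the numbers are our own toy example.
import numpy as np

x = np.array([0.8, 1.2, 1.0])
n = len(x)

mle = x.mean()                          # argmax_Θ p(x | Θ)
map_ = n * x.mean() / (n + 1)           # argmax_Θ p(x | Θ) p(Θ) for a N(0, 1) prior
# Bayesian inference: by conjugacy the posterior is again Gaussian
post_mean, post_var = n * x.mean() / (n + 1), 1 / (n + 1)
print(mle, map_, post_mean, post_var)
```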
Likelihood (尤度)
On p(X ∣ Θ):
• Fixing Θ = θ and viewing it as a function of X → the probability density function p(X ∣ Θ = θ)
• Fixing X = x and viewing it as a function of Θ → the likelihood function p(X = x ∣ Θ)
• To distinguish the two, the likelihood is sometimes written explicitly as L(θ ∣ X = x)
• Machine learning papers often write p(x ∣ θ) without making the distinction
Markov Chain Monte Carlo (マルコフ連鎖モンテカルロ, MCMC)
With initial value θ0 and proposal distribution q(Θ′ ∣ Θ), repeat:
1. Sample θt ∼ q(Θ′ ∣ θt−1)
2. Accept θt with probability
min(1, p(x ∣ θt) p(θt) q(θt−1 ∣ θt) / [p(x ∣ θt−1) p(θt−1) q(θt ∣ θt−1)])
and reject otherwise
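A minimal NumPy sketch of this loop, using a symmetric random-walk proposal so the q terms cancel; the Gaussian toy posterior is our own example, not from the slides:

```python
import numpy as np

def metropolis_hastings(log_post, theta0, n_steps=5000, step=0.5, rng=None):
    """Random-walk Metropolis-Hastings; log_post(θ) = log p(x|θ) + log p(θ)."""
    rng = np.random.default_rng() if rng is None else rng
    theta, lp = theta0, log_post(theta0)
    samples = []
    for _ in range(n_steps):
        prop = theta + step * rng.standard_normal()   # symmetric proposal: q terms cancel
        lp_prop = log_post(prop)
        if np.log(rng.uniform()) < lp_prop - lp:      # accept with prob min(1, ratio)
            theta, lp = prop, lp_prop
        samples.append(theta)
    return np.array(samples)

# Toy posterior: Gaussian mean with N(0, 1) prior and one observation x = 1.0
x = 1.0
log_post = lambda th: -0.5 * (x - th) ** 2 - 0.5 * th ** 2
print(metropolis_hastings(log_post, theta0=0.0).mean())  # ≈ analytic posterior mean 0.5
```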
Likelihood-free Inference (尤度なし推論, Simulation-based Inference)
• Inference in models where p(x ∣ θ) cannot be written explicitly but sampling x ∼ p(X ∣ θ) is possible
(Example) Latent variable models: p(X ∣ θ) = ∫ p(X ∣ Z, θ) p(Z ∣ θ) dZ
• Any model whose sampling process can be written as x = f(z, θ) is a likelihood-free model
• In particular, this includes the case where p(X ∣ Z, Θ) itself is unknown
• The latent variables are sometimes inferred jointly with the parameters
Example 1: Population Genetics (集団遺伝学)
• From DNA data of a population, estimate quantities such as the mutation rate and the time back to the MRCA
• MRCA: most recent common ancestor
https://www.ism.ac.jp/~fukumizu/ABC2015/ABC_review.pdf
• Given the parameters, simulating the inheritance process is easy, but the likelihood often cannot be defined explicitly
Example 2: GAN (Generative Adversarial Networks)
• Instead of computing the likelihood, prepare a discriminator and train adversarially:
min_G max_D V(D, G) = 𝔼_{x∼p(X)}[log D(x)] + 𝔼_{z∼p(Z)}[log(1 − D(G(z)))]
• The discriminator can be viewed as estimating the density ratio between the data and model distributions:
r(x) = D(x) / (1 − D(x)) = p(x) / p(x ∣ θ)
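A small sketch of this density-ratio view, assuming scikit-learn is available: a logistic-regression discriminator is trained on samples from two known Gaussians, and D(x)/(1 − D(x)) is compared against the analytic ratio:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
x_data  = rng.normal(0.0, 1.0, size=(2000, 1))   # "data" distribution p(x)
x_model = rng.normal(1.0, 1.0, size=(2000, 1))   # "model" distribution p(x | θ)

# Train a discriminator: label 1 = data samples, 0 = model samples
X = np.vstack([x_data, x_model])
y = np.concatenate([np.ones(2000), np.zeros(2000)])
D = LogisticRegression().fit(X, y)

# Density-ratio estimate r(x) = D(x) / (1 − D(x)) ≈ p(x) / p(x | θ)
d = D.predict_proba(np.array([[0.0]]))[0, 1]
print(d / (1 - d))   # analytic ratio at x = 0 is exp(0.5) ≈ 1.65
```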
Traditional Methods (古典的な手法)
1. Approximate Bayesian Computation (ABC)
  • A sampling-based method
  • "Likelihood-free inference" most often refers to ABC
2. Surrogate model
  • Prepare a surrogate model that replaces the likelihood computation
  • Classically, kernel density estimation and the like are used
Approximate Bayesian Computation (近似ベイズ計算, ABC)
Repeat:
1. Sample θ ∼ p(Θ)
2. Sample xθ ∼ p(X ∣ θ)
3. With some distance metric d, accept θ if d(xθ, x) < ε, and reject otherwise
• As ε → 0, the accepted samples converge to the posterior p(Θ ∣ x)
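A minimal NumPy sketch of rejection ABC; the Gaussian simulator, the prior, and the sample mean as summary statistic are our own choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

def simulator(theta, n=50):
    """Black-box simulator: we only assume we can sample x ~ p(X | θ)."""
    return rng.normal(theta, 1.0, size=n)

x_obs = simulator(2.0)                       # observed data (true θ = 2)

def abc_rejection(x_obs, n_draws=100_000, eps=0.1):
    accepted = []
    for _ in range(n_draws):
        theta = rng.normal(0.0, 3.0)         # prior p(Θ) = N(0, 3²)
        x_sim = simulator(theta)
        # distance between summary statistics S(x) = sample mean
        if abs(x_sim.mean() - x_obs.mean()) < eps:
            accepted.append(theta)
    return np.array(accepted)

post = abc_rejection(x_obs)
print(post.mean(), post.std(), len(post))    # note the low acceptance rate for small eps
```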
ABC with MCMC (MCMCを用いたABC)
With initial value θ0 and proposal distribution q(Θ′ ∣ Θ), repeat:
1. Sample θt ∼ q(Θ′ ∣ θt−1)
2. Sample xθt ∼ p(X ∣ θt)
3. If d(xθt, x) < ε, go to step 4; otherwise reject
4. Accept θt with probability min(1, p(θt) q(θt−1 ∣ θt) / [p(θt−1) q(θt ∣ θt−1)]) and reject otherwise
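The same toy problem with ABC-MCMC, again a hedged NumPy sketch using a symmetric proposal so the q terms cancel:

```python
import numpy as np

rng = np.random.default_rng(0)
simulator = lambda th, n=50: rng.normal(th, 1.0, size=n)
x_obs = simulator(2.0)
log_prior = lambda th: -0.5 * (th / 3.0) ** 2            # N(0, 3²), up to a constant

def abc_mcmc(theta0, n_steps=20_000, step=0.5, eps=0.2):
    theta, samples = theta0, []
    for _ in range(n_steps):
        prop = theta + step * rng.standard_normal()      # symmetric proposal: q cancels
        x_sim = simulator(prop)
        # step 3: only proposals whose simulation lands near the data reach the MH test
        if abs(x_sim.mean() - x_obs.mean()) < eps:
            # step 4: accept with prob min(1, p(prop)/p(theta)) (prior ratio only)
            if np.log(rng.uniform()) < log_prior(prop) - log_prior(theta):
                theta = prop
        samples.append(theta)
    return np.array(samples)

print(abc_mcmc(0.0).mean())
```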
Challenges (ABCの課題)
• Trade-off between the threshold ε and accuracy
  • Smaller ε improves accuracy, but almost all samples get rejected
• Design of the distance metric d
  • Measuring distance directly in X space is ideal, but leads to frequent rejection
  • Distances between summary statistics S(x) are often used instead, but designing S is difficult
Surrogate model (代理モデル)
Prepare a surrogate model p̂(X ∣ Θ; w) that approximates the likelihood, then:
1. Train the surrogate on samples θ ∼ p(Θ), xθ ∼ p(X ∣ θ) as supervised training data
2. Use the surrogate to estimate θ from the data
3. (Optional) Run the simulator again at the estimated θ and further train the surrogate
➡ Repeat steps 2 and 3
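A sketch of the classical KDE-surrogate loop (steps 1 and 2), assuming SciPy; the toy Gaussian simulator and the grid over θ are our own additions:

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)
simulator = lambda th, n: rng.normal(th, 1.0, size=n)
x_obs = simulator(2.0, 50)

# Step 1: for each candidate θ, fit a KDE surrogate p̂(X | θ; w) to simulator draws
thetas = np.linspace(-1.0, 5.0, 61)
surrogate_loglik = []
for th in thetas:
    kde = gaussian_kde(simulator(th, 500))      # classical kernel density surrogate
    surrogate_loglik.append(np.log(kde(x_obs)).sum())

# Step 2: estimate θ by maximizing the surrogate likelihood
theta_hat = thetas[np.argmax(surrogate_loglik)]
print(theta_hat)   # step 3 (optional): simulate more near theta_hat and refit
```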
Challenges (代理モデルの課題)
• Estimation accuracy depends on the surrogate model
  • If the surrogate's expressive power is weak, accuracy suffers
• Does not scale to high dimensions
  • Kernel density estimation has traditionally been used as the surrogate, but it struggles in high dimensions
Frontiers of simulation-based inference (尤度なし推論の課題)
1. Sample efficiency
  • Estimate from as few simulator samples as possible
2. Estimation accuracy
  • Estimate with high accuracy
3. Data efficiency (amortization)
  • Infer efficiently even as the data grows
Revolution of Machine Learning (機械学習の発展)
• Machine learning, especially deep learning, has advanced rapidly in recent years
• Particularly remarkable results in supervised learning
• Deep learning techniques are now being applied to likelihood-free inference
  • Using neural density estimators as surrogate models
  • Inference methods based on density-ratio estimation, as in GANs
Quantities
Choose a method according to which of the following quantities can be computed:
I. p(x ∣ z, θ): the density of the observations given the latent variables
II. t(x, z ∣ θ) ≡ ∇θ log p(x, z ∣ θ): the gradient of the joint log density of observations and latents w.r.t. the parameters (the joint score)
III. ∇z log p(x, z ∣ θ): the gradient of the joint log density w.r.t. the latent variables
IV. r(x, z ∣ θ, θ′) ≡ p(x, z ∣ θ) / p(x, z ∣ θ′): the joint density ratio at different parameters
V. ∇θ(x, z): the gradient of the observations and latents w.r.t. the parameters
VI. ∇z x: the gradient of the observations w.r.t. the latent variables
Approximate Bayesian Computation with Monte Carlo Sampling (ABCでサンプリング)
• Plain ABC
• Repeat:
1. Sample θ ∼ p(Θ)
2. Sample xθ ∼ p(X ∣ θ)
3. With some distance metric d, accept θ if d(xθ, x) < ε, and reject otherwise
Approximate Bayesian Computation with Learned Summary Statistics (要約統計量を学習)
• Classically, summary statistics have been designed by experts with domain knowledge
• Instead, learn summary statistics with good properties
(Example) The score t(x ∣ θ) ≡ ∇θ log p(x ∣ θ) satisfies p(x) = p(x ∣ θ) ⇒ t(x ∣ θ) = 0
• If (V) ∇θ(x, z) is available, the score can be approximated from the simulator's gradients
• If (II) t(x, z ∣ θ) is available, a neural network approximating t(x ∣ θ) can be trained from samples
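A sketch of the last bullet, assuming scikit-learn: for a toy model whose joint score is tractable, MSE regression of t(x, z ∣ θ) on x recovers the marginal score t(x ∣ θ), because the regression minimizer is the conditional expectation E[t(x, z ∣ θ) ∣ x]:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
theta = 1.0

# Simulator with a tractable *joint* score: z ~ N(θ, 1), x ~ N(z, 1)
z = rng.normal(theta, 1.0, size=20_000)
x = rng.normal(z, 1.0)
t_joint = z - theta                  # (II) t(x, z | θ) = ∇θ log p(x, z | θ) = z − θ

# MSE regression of the joint score on x; the minimizer is E[t(x, z | θ) | x] = t(x | θ)
net = MLPRegressor(hidden_layer_sizes=(32,), max_iter=500).fit(x[:, None], t_joint)

# Analytic check: x | θ ~ N(θ, 2), so the marginal score is (x − θ) / 2
x_test = np.array([[2.0]])
print(net.predict(x_test), (x_test[0, 0] - theta) / 2)
```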
Probabilistic Programming with Monte Carlo sampling (確率的プログラミングでサンプリング)
• Probabilistic programming languages (PPLs) such as Stan have matured
• If (I) p(x ∣ z, θ) is available, sampling methods such as MCMC can be run efficiently
• ABC also becomes easy to implement with a PPL
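A minimal sketch assuming PyMC (a Python PPL) is installed; the model and data are a toy example of ours, and Stan via cmdstanpy would be analogous:

```python
import numpy as np
import pymc as pm

x_obs = np.random.default_rng(0).normal(2.0, 1.0, size=50)

with pm.Model():
    theta = pm.Normal("theta", mu=0.0, sigma=3.0)          # prior p(Θ)
    pm.Normal("x", mu=theta, sigma=1.0, observed=x_obs)    # (I) p(x | θ) stated explicitly
    idata = pm.sample(1000)                                # MCMC (NUTS) runs automatically

print(idata.posterior["theta"].mean())
```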
Probabilistic Programming with Inference Compilation (推論コンパイル)
• So-called amortized inference
• Define an approximate posterior q(z, θ ∣ x) with a neural network and train it using (I) p(x ∣ z, θ)
• Using q(z, θ ∣ x) as the proposal distribution, run sequential importance sampling to obtain the final inference result
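A NumPy sketch of the amortized-proposal idea: a fixed Gaussian q stands in for the trained inference network, and self-normalized importance sampling (a simplification of the sequential scheme) corrects it toward the exact posterior of a toy model:

```python
import numpy as np

rng = np.random.default_rng(0)
x_obs = 1.0
log_joint = lambda th: -0.5 * (x_obs - th) ** 2 - 0.5 * th ** 2   # log p(x|θ) + log p(θ)

# Stand-in for the inference network q(θ | x): deliberately slightly wrong
q_mu, q_sigma = 0.3, 1.5
th = rng.normal(q_mu, q_sigma, size=10_000)
log_q = -0.5 * ((th - q_mu) / q_sigma) ** 2 - np.log(q_sigma)

w = np.exp(log_joint(th) - log_q)
w /= w.sum()                         # self-normalized importance weights
print(np.sum(w * th))                # ≈ analytic posterior mean 0.5
```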
Amortized Likelihood (償却尤度)
• The usual surrogate-model approach to the likelihood
• Recently, neural density estimators such as normalizing flows are used instead of kernel density estimation
• Once the surrogate is trained, inference for new data is efficient and the cost is amortized
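A PyTorch sketch of a neural likelihood surrogate trained by maximum likelihood; a conditional Gaussian stands in for a normalizing flow to keep the example short, and the simulator is a toy of ours:

```python
import torch, torch.nn as nn

simulator = lambda th: th + torch.randn_like(th)          # toy simulator x = θ + ε

net = nn.Sequential(nn.Linear(1, 64), nn.Tanh(), nn.Linear(64, 2))  # θ → (μ, log σ)
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

for _ in range(2000):
    th = torch.rand(256, 1) * 6 - 3                       # θ ~ p(Θ) = U(−3, 3)
    x = simulator(th)
    mu, log_sigma = net(th).chunk(2, dim=1)
    # Gaussian negative log-likelihood (up to constants)
    nll = (log_sigma + 0.5 * ((x - mu) / log_sigma.exp()) ** 2).mean()
    opt.zero_grad(); nll.backward(); opt.step()

# Amortized: evaluating p̂(x | θ) for new data needs no further simulation
mu, log_sigma = net(torch.tensor([[1.0]])).chunk(2, dim=1)
print(mu.item(), log_sigma.exp().item())                  # should approach μ ≈ 1, σ ≈ 1
```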
Amortized Posterior (償却事後分布)
• Build a surrogate model p̂(Θ ∣ X; w) of the posterior p(Θ ∣ X)
• Feeding data x into the trained surrogate directly yields the posterior
• Building a surrogate of the posterior used to be difficult, but has become practical with expressive neural density estimators
Amortized Likelihood Ratio (償却尤度比)
• Build and train a surrogate model r̂(X, Θ, Θ′; w) for the likelihood ratio p(X ∣ Θ) / p(X ∣ Θ′)
• Almost the same as a GAN discriminator, except that the parameters also enter as inputs
• Use the surrogate in place of the likelihood ratio in the MCMC acceptance probability:
min(1, r̂(x, θt, θt−1; w) p(θt) q(θt−1 ∣ θt) / [p(θt−1) q(θt ∣ θt−1)])
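A NumPy sketch of this acceptance rule; `r_hat` is a hypothetical stand-in for a trained surrogate (here the exact ratio of a N(θ, 1) toy model, so the chain can be checked):

```python
import numpy as np

rng = np.random.default_rng(0)
x_obs = rng.normal(2.0, 1.0, size=50)

def r_hat(x, th1, th0):   # stand-in for r̂(x, θt, θt−1; w)
    return np.exp(np.sum(-0.5 * (x - th1) ** 2 + 0.5 * (x - th0) ** 2))

log_prior = lambda th: -0.5 * (th / 3.0) ** 2

theta, samples = 0.0, []
for _ in range(5000):
    prop = theta + 0.5 * rng.standard_normal()            # symmetric q: its terms cancel
    # accept with prob min(1, r̂(x, prop, theta) · p(prop) / p(theta))
    a = r_hat(x_obs, prop, theta) * np.exp(log_prior(prop) - log_prior(theta))
    if rng.uniform() < min(1.0, a):
        theta = prop
    samples.append(theta)
print(np.mean(samples))
```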
Amortized surrogates trained with augmented data (償却代理モデル)
When (II) t(x, z ∣ θ) and (IV) r(x, z ∣ θ, θ′) are available, surrogates for the likelihood and the likelihood ratio, p̂(X ∣ Θ; w) and r̂(X, Θ, Θ′; w), can be trained by minimizing the following losses:

L_ROLR[r̂] = (1/N) Σ_i [ y_i |r(x_i, z_i) − r̂(x_i)|² + (1 − y_i) |1/r(x_i, z_i) − 1/r̂(x_i)|² ]

L_SCANDAL[p̂] = L_MLE + α (1/N) Σ_i |t(x_i, z_i) − ∇θ log p̂(x_i)|²
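A PyTorch sketch of the ROLR objective; the toy simulator, the network, and the labeling convention (y = 1 for samples simulated from the denominator parameter θ′, which makes the regression consistent) are our additions:

```python
import torch, torch.nn as nn

def rolr_loss(r_hat, r_joint, y):
    # y = 1 terms regress r̂ onto r(x, z | θ, θ′); y = 0 terms regress 1/r̂ onto 1/r
    return (y * (r_joint - r_hat) ** 2
            + (1 - y) * (1 / r_joint - 1 / r_hat) ** 2).mean()

# Toy model with a tractable joint ratio: z ~ N(θ, 1), x ~ N(z, 1)
th0, th1 = 0.0, 1.0                                          # θ′ (denominator), θ (numerator)
z0 = th0 + torch.randn(1000); x0 = z0 + torch.randn(1000)    # simulated at θ′
z1 = th1 + torch.randn(1000); x1 = z1 + torch.randn(1000)    # simulated at θ
x = torch.cat([x0, x1]); z = torch.cat([z0, z1])
y = torch.cat([torch.ones(1000), torch.zeros(1000)])         # y = 1 ⇔ drawn from θ′
# The joint ratio reduces to p(z|θ)/p(z|θ′) because p(x|z) cancels
r_joint = torch.exp(-0.5 * (z - th1) ** 2 + 0.5 * (z - th0) ** 2)

net = nn.Sequential(nn.Linear(1, 64), nn.Tanh(), nn.Linear(64, 1), nn.Softplus())
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
for _ in range(2000):
    r_hat = net(x[:, None]).squeeze(1)
    opt.zero_grad(); rolr_loss(r_hat, r_joint, y).backward(); opt.step()
# net now approximates the intractable marginal ratio p(x | θ) / p(x | θ′)
```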
Asymptotically Exact Bayesian Inference (漸近的に厳密なベイズ推論)
• When (VI) ∇z x is available, exact Bayesian inference over the latent variables is possible
• Treating x = f(z, θ) as a constraint, a constrained MCMC method yields samples from the posterior
• With initial value z0 and proposal distribution q(Z′ ∣ Z), repeat:
1. Sample zt ∼ q(Z′ ∣ zt−1)
2. Solve d(f(zt, θ), x) = 0 with a quasi-Newton method
3. Accept zt with probability min(1, p(zt) q(zt−1 ∣ zt) / [p(zt−1) q(zt ∣ zt−1)]) and reject otherwise
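A SciPy sketch of this loop on a 2-D toy simulator of ours; projecting the proposal back onto the constraint manifold with a secant (quasi-Newton) solve is a simplification, and a fully general implementation would also include a Jacobian correction for the projection:

```python
import numpy as np
from scipy.optimize import root_scalar

rng = np.random.default_rng(0)
theta, x_obs = 0.0, 1.3
f = lambda z, th: th + z[0] + z[1]                      # toy simulator; (VI) ∇z x is trivial here
log_p = lambda z: -0.5 * np.sum(z ** 2)                 # prior p(Z) = N(0, I)

z = np.array([x_obs - theta, 0.0])                      # initial point on the manifold
samples = []
for _ in range(5000):
    prop = z + 0.3 * rng.standard_normal(2)             # symmetric proposal q
    # solve f(prop, θ) − x = 0 for prop[1] by the secant (quasi-Newton) method
    sol = root_scalar(lambda z2: f([prop[0], z2], theta) - x_obs,
                      x0=prop[1], x1=prop[1] + 0.1, method="secant")
    prop[1] = sol.root
    if np.log(rng.uniform()) < log_p(prop) - log_p(z):  # MH accept/reject
        z = prop
    samples.append(z.copy())
print(np.mean(samples, axis=0))                         # ≈ (0.65, 0.65) by symmetry
```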
Summary (尤度なし推論の発展)
• Likelihood-free inference can now be performed more efficiently and accurately using deep learning techniques:
  • Expert-designed summary statistics → summary statistics learned by neural networks
  • Kernel density estimation → neural density estimation (normalizing flows)
  • Advances in probabilistic programming
Impressions (感想)
• This paper mainly covers what deep learning contributes to likelihood-free inference; conversely, it would be interesting if the theory of likelihood-free inference yielded insights into deep learning
• Likelihood-free inference usually deals with few parameters; methods for likelihood-free inference over many parameters, as in deep learning, seem necessary
• GANs are one success story, but their connection to existing likelihood-free methods is still unclear
  • Perhaps related to using the JS/Wasserstein divergence as a kind of summary statistic (?)