[DL Reading Group] Recent Advances in Autoencoder-Based Representation Learning


April 01, 19


Recent Advances in Autoencoder-Based Representation Learning
Presenter: Tatsuya Matsushima @__tmats__ , Matsuo Lab


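About the paper
Recent Advances in Autoencoder-Based Representation Learning
• https://arxiv.org/abs/1812.05069 (Submitted on 12 Dec 2018)
• Michael Tschannen, Olivier Bachem, Mario Lucic
• ETH Zurich, Google Brain
• NeurIPS 2018 Workshop (Bayesian Deep Learning)
• http://bayesiandeeplearning.org/
• Incidentally, the paper runs to 19 pages before the references, but acceptance is apparently decided on the first 3 pages alone
• A survey paper on autoencoder-based representation learning
• Covers recent models very broadly
• There is a lot of material, so it is tough to read (it feels like a catalog...)
※ Figures and tables without a citation mark are taken from this paper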


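TL;DR
• Finding good representations with as little supervision as possible is a hard problem in machine learning
• This paper surveys methods that use autoencoders
• The discussion is organized around meta-priors, i.e. assumptions about what makes a representation good for tasks
• It focuses on three approaches: ① regularizing the (approximate) posterior, ② factorizing the encoder and decoder, ③ using a flexible distribution as the prior
• It also analyzes these methods through the Rate-Distortion tradeoff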


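Motivation
• In the previous reading group I presented a survey on state representation learning (SRL)
• [DL Reading Group] State Representation Learning for Reinforcement Learning: Toward Acquiring Better "World Models" https://www.slideshare.net/DeepLearningJP2016/dl-124128933
• VAE-based methods are used heavily in SRL, so I wanted to get an organized picture of how representation learning is done with VAEs in the first place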


Review of VAE


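VAE
Variational Autoencoder (VAE) [Kingma+ 2014a]
• To learn a latent variable model, the goal is to maximize the log-likelihood of the training data:
$$\mathbb{E}_{\hat{p}(x)}\left[-\log p_\theta(x)\right] = \mathcal{L}_{\mathrm{VAE}}(\theta, \phi) - \mathbb{E}_{\hat{p}(x)}\left[D_{\mathrm{KL}}\left(q_\phi(z \mid x) \,\|\, p_\theta(z \mid x)\right)\right]$$
• Since the KL divergence is non-negative, $-\mathcal{L}_{\mathrm{VAE}}$ is a lower bound on the log-likelihood $\mathbb{E}_{\hat{p}(x)}[\log p_\theta(x)]$ (the ELBO)
• So it suffices to maximize the ELBO, i.e. minimize the VAE loss $\mathcal{L}_{\mathrm{VAE}}$:
$$\mathcal{L}_{\mathrm{VAE}}(\theta, \phi) = \mathbb{E}_{\hat{p}(x)}\left[\mathbb{E}_{q_\phi(z \mid x)}\left[-\log p_\theta(x \mid z)\right]\right] + \mathbb{E}_{\hat{p}(x)}\left[D_{\mathrm{KL}}\left(q_\phi(z \mid x) \,\|\, p(z)\right)\right]$$
※ The average over the empirical data distribution $\hat{p}(x)$ is written out explicitly, which looks slightly unfamiliar, but this is the ordinary VAE ELBO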
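The identity on this slide can be checked in a few lines. The following is a sketch of the standard derivation for a single data point $x$, using only the symbols already defined on the slide; averaging both sides over $\hat{p}(x)$ and negating gives the slide's equation.

```latex
\begin{align*}
\log p_\theta(x)
  &= \mathbb{E}_{q_\phi(z \mid x)}\left[\log \frac{p_\theta(x \mid z)\, p(z)}{p_\theta(z \mid x)}\right] \\
  &= \mathbb{E}_{q_\phi(z \mid x)}\left[\log p_\theta(x \mid z)\right]
     - \mathbb{E}_{q_\phi(z \mid x)}\left[\log \frac{q_\phi(z \mid x)}{p(z)}\right]
     + \mathbb{E}_{q_\phi(z \mid x)}\left[\log \frac{q_\phi(z \mid x)}{p_\theta(z \mid x)}\right] \\
  &= \mathbb{E}_{q_\phi(z \mid x)}\left[\log p_\theta(x \mid z)\right]
     - D_{\mathrm{KL}}\left(q_\phi(z \mid x) \,\|\, p(z)\right)
     + D_{\mathrm{KL}}\left(q_\phi(z \mid x) \,\|\, p_\theta(z \mid x)\right)
\end{align*}
```

The first two terms are the per-sample negative VAE loss, and the last term is the non-negative gap between the ELBO and the true log-likelihood.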


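VAE
VAE loss:
$$\mathcal{L}_{\mathrm{VAE}}(\theta, \phi) = \mathbb{E}_{\hat{p}(x)}\left[\mathbb{E}_{q_\phi(z \mid x)}\left[-\log p_\theta(x \mid z)\right]\right] + \mathbb{E}_{\hat{p}(x)}\left[D_{\mathrm{KL}}\left(q_\phi(z \mid x) \,\|\, p(z)\right)\right]$$
• The first term is estimated with samples $z^{(i)} \sim q_\phi(z \mid x^{(i)})$, and gradients are backpropagated using the reparametrization trick
• The second term is either computed in closed form or estimated from samples
• With the approximate posterior $q_\phi(z \mid x) = \mathcal{N}\left(\mu_\phi(x), \mathrm{diag}(\sigma_\phi(x))\right)$ and the prior $p(z) = \mathcal{N}(0, I)$, it can be computed in closed form
• In other cases, the distance between the distributions has to be estimated from samples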
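As a small illustration of the two ways of handling the KL term, here is a minimal sketch in plain Python. It assumes that `sigma` holds per-dimension standard deviations (the slide does not say whether $\sigma_\phi$ is a standard deviation or a variance), and all function names are hypothetical helpers, not part of any library.

```python
import math
import random


def sample_z(mu, sigma, rng):
    """Reparameterization trick: z = mu + sigma * eps with eps ~ N(0, I)."""
    return [m + s * rng.gauss(0.0, 1.0) for m, s in zip(mu, sigma)]


def kl_diag_gaussian_to_standard(mu, sigma):
    """Closed-form KL( N(mu, diag(sigma^2)) || N(0, I) ); sigma = std devs (assumption)."""
    return 0.5 * sum(s * s + m * m - 1.0 - 2.0 * math.log(s)
                     for m, s in zip(mu, sigma))


def kl_monte_carlo(mu, sigma, n_samples, rng):
    """Sample-based estimate of the same KL: average of log q(z) - log p(z)."""
    total = 0.0
    for _ in range(n_samples):
        z = sample_z(mu, sigma, rng)
        log_q = sum(-0.5 * math.log(2 * math.pi) - math.log(s)
                    - 0.5 * ((zi - m) / s) ** 2
                    for zi, m, s in zip(z, mu, sigma))
        log_p = sum(-0.5 * math.log(2 * math.pi) - 0.5 * zi * zi for zi in z)
        total += log_q - log_p
    return total / n_samples


rng = random.Random(0)
mu, sigma = [0.5, -1.0], [0.8, 1.5]
kl_exact = kl_diag_gaussian_to_standard(mu, sigma)
kl_mc = kl_monte_carlo(mu, sigma, 20000, rng)
```

In a real VAE the sampling step would run inside an automatic-differentiation framework so that gradients flow through `mu` and `sigma`; this sketch only checks that the two KL computations agree.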


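Density-ratio estimation via adversarial learning
f-divergence
• Let $f$ be a convex function with $f(1) = 0$. The f-divergence between $p_x$ and $p_y$ is defined as
$$D_f(p_x \,\|\, p_y) = \int f\left(\frac{p_x(x)}{p_y(x)}\right) p_y(x)\, dx$$
• For $f(t) = t \log t$ it becomes the KL divergence: $D_f(p_x \,\|\, p_y) = D_{\mathrm{KL}}(p_x \,\|\, p_y)$
• Given samples from $p_x$ and $p_y$, the f-divergence can be estimated with the density-ratio trick
• The trick became widely known through GANs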


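Density-ratio estimation via adversarial learning
Estimating the KL divergence with the GAN-style density-ratio trick
• Express $p_x$ and $p_y$ as distributions conditioned on a label $c \in \{0, 1\}$
• That is, $p_x(x) = p(x \mid c = 1)$ and $p_y(x) = p(x \mid c = 0)$
• This reduces the problem to a binary classification task
• The discriminator $S_\eta$ predicts the probability that its input was drawn from $p_x(x)$
• Assuming equal class probabilities, the density ratio is then
$$\frac{p_x(x)}{p_y(x)} = \frac{p(x \mid c = 1)}{p(x \mid c = 0)} = \frac{p(c = 1 \mid x)}{p(c = 0 \mid x)} \approx \frac{S_\eta(x)}{1 - S_\eta(x)}$$
• Hence, given $N$ i.i.d. samples from $p_x$,
$$D_{\mathrm{KL}}(p_x \,\|\, p_y) \approx \frac{1}{N} \sum_{i=1}^{N} \log\left(\frac{S_\eta(x^{(i)})}{1 - S_\eta(x^{(i)})}\right)$$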
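The estimator on this slide can be sanity-checked on a case where the answer is known. The sketch below is an illustration only: instead of training a discriminator $S_\eta$, it plugs in the Bayes-optimal classifier for two hypothetical 1-D Gaussians, $p_x = \mathcal{N}(0, 1)$ and $p_y = \mathcal{N}(1, 1)$, whose true KL divergence is $0.5$. A learned discriminator would only approximate this.

```python
import math
import random


def gaussian_pdf(x, mean, std):
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2 * math.pi))


def ideal_discriminator(x, px, py):
    """Bayes-optimal S(x) = p(c=1 | x) under equal class priors (stand-in for S_eta)."""
    return px(x) / (px(x) + py(x))


def kl_from_discriminator(samples_x, discriminator):
    """D_KL(px || py) ~= mean over x ~ px of log( S(x) / (1 - S(x)) )."""
    log_ratios = []
    for x in samples_x:
        s = discriminator(x)
        log_ratios.append(math.log(s / (1.0 - s)))
    return sum(log_ratios) / len(log_ratios)


def px(x):
    return gaussian_pdf(x, 0.0, 1.0)


def py(x):
    return gaussian_pdf(x, 1.0, 1.0)


rng = random.Random(0)
samples = [rng.gauss(0.0, 1.0) for _ in range(20000)]  # i.i.d. draws from px
kl_estimate = kl_from_discriminator(samples, lambda x: ideal_discriminator(x, px, py))
kl_true = 0.5  # KL(N(0,1) || N(1,1)) = (mean difference)^2 / 2
```

With a trained classifier the quality of the estimate depends on how close $S_\eta$ is to the optimal discriminator, which is why the slide writes the ratio with $\approx$.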


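Maximum Mean Discrepancy (MMD)
Let $k : \mathcal{X} \times \mathcal{X} \to \mathbb{R}$ be a continuous, bounded, positive semi-definite kernel, $\mathcal{H}$ the corresponding reproducing kernel Hilbert space, and $\varphi : \mathcal{X} \to \mathcal{H}$ its feature map. The MMD between $p_x(x)$ and $p_y(x)$ is defined as
$$\mathrm{MMD}(p_x, p_y) = \left\| \mathbb{E}_{x \sim p_x}[\varphi(x)] - \mathbb{E}_{y \sim p_y}[\varphi(y)] \right\|_{\mathcal{H}}^{2}$$
• Intuitively, the distance between distributions is computed as the distance between the means of their feature embeddings
• Example: for $\mathcal{X} = \mathcal{H} = \mathbb{R}^d$ and $\varphi(x) = x$, the MMD is the difference of the means: $\mathrm{MMD}(p_x, p_y) = \| \mu_{p_x} - \mu_{p_y} \|_2^2$
• By choosing a suitable map $\varphi$, the divergence can be estimated in terms of higher-order moments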
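A minimal sketch of both cases mentioned on this slide, in plain Python: the linear feature map, where the MMD reduces to the squared distance between sample means, and a kernel-based estimate. The RBF kernel, its bandwidth, and the biased (V-statistic) estimator are illustrative choices, not something the slide prescribes.

```python
import math


def mmd_linear(xs, ys):
    """MMD with phi(x) = x: squared Euclidean distance between the sample means."""
    dim = len(xs[0])
    mean_x = [sum(x[d] for x in xs) / len(xs) for d in range(dim)]
    mean_y = [sum(y[d] for y in ys) / len(ys) for d in range(dim)]
    return sum((a - b) ** 2 for a, b in zip(mean_x, mean_y))


def rbf_kernel(a, b, bandwidth=1.0):
    """Gaussian RBF kernel (assumed choice; any bounded PSD kernel would do)."""
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mmd_kernel(xs, ys, kernel=rbf_kernel):
    """Biased estimate of the squared MMD: E[k(x,x')] + E[k(y,y')] - 2 E[k(x,y)]."""
    k_xx = sum(kernel(a, b) for a in xs for b in xs) / (len(xs) ** 2)
    k_yy = sum(kernel(a, b) for a in ys for b in ys) / (len(ys) ** 2)
    k_xy = sum(kernel(a, b) for a in xs for b in ys) / (len(xs) * len(ys))
    return k_xx + k_yy - 2.0 * k_xy


xs = [[0.0, 0.0], [2.0, 0.0]]   # samples standing in for p_x
ys = [[1.0, 1.0], [1.0, 3.0]]   # samples standing in for p_y
linear_mmd = mmd_linear(xs, ys)   # means are (1, 0) and (1, 2)
kernel_mmd = mmd_kernel(xs, ys)
```

The kernel version never forms $\varphi$ explicitly, which is what lets it compare higher-order moments implicitly.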


VAEs as a Way to Implement Meta-Priors


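What is a Meta-Prior?
Meta-prior [Bengio+ 2013]
• An assumption about properties of representations that are useful for many tasks at once
• To begin with, how "good" a representation is depends on what it is used for
• Labeling is expensive, so we would really like to reduce the number of labels
• But learning good representations without a supervision signal is difficult
• It would be great to find good representations without labels → make use of meta-priors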


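Types of Meta-Priors [Bengio+ 2013]
Disentanglement
• Assumes the data is generated from independent factors of variation
• Example: object orientation, lighting conditions
• Capturing these factors as separate representations should make them usable for many downstream tasks
Hierarchy of explanatory factors
• Assumes the world can be explained by a hierarchy of abstract concepts
• Example: an object can be described at various levels of granularity (adding attributes describes it more concretely)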


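Types of Meta-Priors [Bengio+ 2013]
Semi-supervised learning
• Assumes that sharing representations between supervised and unsupervised learning creates synergy
• Labeled data is generally scarce, so also using unlabeled data helps guide representation learning
Clusterability
• Assumes that much data has a structure of multiple categories and varies in a category-dependent way
• Such structure should be expressible with a mixture model in latent space in which each component corresponds to one category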


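Implementing Meta-Priors with Autoencoders
Many (unsupervised) representation learning algorithms are built on autoencoders
• The methods introduced in this paper each try to implement some meta-prior
Covering all of them here is not feasible, so I present a selection... (there were actually quite a few I did not know)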


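How to Introduce Meta-Priors into a Model
① Add a regularization term to the encoder
• Often used to learn disentangled representations
② Factorize the encoder and decoder
• Often used to obtain hierarchical representations
③ Use a flexible distribution as the prior
• Often used to represent clusters (example: mixture models)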


① Adding a Regularization Term to the Encoder


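VAE Regularization
To make the latent representation $z \sim q_\phi(z \mid x)$ reflect a meta-prior, terms involving the approximate posterior $q_\phi(z \mid x)$ or the aggregate (approximate) posterior
$$q_\phi(z) = \mathbb{E}_{\hat{p}(x)}\left[q_\phi(z \mid x)\right] = \frac{1}{N} \sum_{i=1}^{N} q_\phi(z \mid x^{(i)})$$
are added to the usual VAE objective:
$$\mathcal{L}_{\mathrm{VAE}}(\theta, \phi) + \lambda_1 \mathbb{E}_{\hat{p}(x)}\left[R_1\left(q_\phi(z \mid x)\right)\right] + \lambda_2 R_2\left(q_\phi(z)\right)$$
• The aggregate (approximate) posterior $q_\phi(z)$ depends on the whole dataset, so in theory mini-batch gradient methods cannot be used directly and an approximation is needed
• The resulting bound is looser than the original VAE bound $\mathcal{L}_{\mathrm{VAE}}$, so reconstruction quality may be lower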


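VAE Regularization
Main methods that regularize the approximate posterior:
$$\mathcal{L}_{\mathrm{VAE}}(\theta, \phi) + \lambda_1 \mathbb{E}_{\hat{p}(x)}\left[R_1\left(q_\phi(z \mid x)\right)\right] + \lambda_2 R_2\left(q_\phi(z)\right)$$
[Table of methods; legend: labels optional / labels required]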


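VAE Regularization
Regularization approaches
• Many methods regularize the aggregate (approximate) posterior $q_\phi(z)$
• Divergence-based regularization and moment-based regularization
[Table annotation: regularization terms on the aggregate (approximate) posterior]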


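Regularization for Disentanglement
The idea behind aiming for disentanglement through regularization
• Assume the data-generating process has conditionally independent variables $v$ and variables $w$ that are not conditionally independent:
$$x \sim p(x \mid v, w), \qquad p(v \mid x) = \prod_j p(v_j \mid x)$$
• Then it suffices to modify the loss so that the inference model $q_\phi(z \mid x)$ can predict $v$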


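Regularization for Disentanglement
Evaluating disentanglement
• Where possible, it is done by comparison with the ground-truth variables
• Many papers claim that their representations are disentangled, but in practice the precise definition of disentanglement, and how effective these methods are in the unsupervised setting, remain unclear
• So this paper only introduces the concepts (it does not assess how disentangled the results are)
• [Locatello+ 2018] validates this with large-scale experiments
• Characteristic approaches
• (a) Change the weighting of the ELBO
• (b) Use the mutual information between x and z
• (c) Assume independence between latent variables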


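(a) Changing the Weighting of the ELBO
β-VAE [Higgins+ 2017]
• Put extra weight on the second term of the usual VAE loss
$$\mathcal{L}_{\mathrm{VAE}}(\theta, \phi) = \mathbb{E}_{\hat{p}(x)}\left[\mathbb{E}_{q_\phi(z \mid x)}\left[-\log p_\theta(x \mid z)\right]\right] + \mathbb{E}_{\hat{p}(x)}\left[D_{\mathrm{KL}}\left(q_\phi(z \mid x) \,\|\, p(z)\right)\right]$$
which gives
$$\mathcal{L}_{\beta\text{-VAE}}(\theta, \phi) = \mathcal{L}_{\mathrm{VAE}}(\theta, \phi) + \lambda_1 \mathbb{E}_{\hat{p}(x)}\left[D_{\mathrm{KL}}\left(q_\phi(z \mid x) \,\|\, p(z)\right)\right]$$
• The approximate posterior $q_\phi(z \mid x)$ is pulled closer to the prior $p(z)$, so the factors can be expected to separate more easily
Figure source: [Higgins+ 2017]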


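(b) Using the Mutual Information between x and z
Decompose the second term of the usual VAE loss
$$\mathcal{L}_{\mathrm{VAE}}(\theta, \phi) = \mathbb{E}_{\hat{p}(x)}\left[\mathbb{E}_{q_\phi(z \mid x)}\left[-\log p_\theta(x \mid z)\right]\right] + \mathbb{E}_{\hat{p}(x)}\left[D_{\mathrm{KL}}\left(q_\phi(z \mid x) \,\|\, p(z)\right)\right]$$
as
$$\mathbb{E}_{\hat{p}(x)}\left[D_{\mathrm{KL}}\left(q_\phi(z \mid x) \,\|\, p(z)\right)\right] = I_{q_\phi}(x; z) + D_{\mathrm{KL}}\left(q_\phi(z) \,\|\, p(z)\right)$$
• This yields a mutual information term $I_{q_\phi}(x; z)$ between $x$ and $z$, plus a KL term between the aggregate (approximate) posterior $q_\phi(z)$ and the prior $p(z)$ [Hoffman+ 2016]
• FactorVAE [Kim+ 2018] is presented as an example
• Others include β-TCVAE [Chen+ 2018], InfoVAE [Zhao+ 2017a], and DIP-VAE [Kumar+ 2018]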
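The decomposition of the KL term can be verified exactly on a small discrete example. The sketch below uses a hypothetical toy problem (three data points, two latent states); the numbers are made up purely for illustration.

```python
import math


def kl(p, q):
    """KL divergence between two discrete distributions given as lists."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Hypothetical toy problem: 3 data points, 2 latent states.
p_data = [0.5, 0.3, 0.2]                              # empirical distribution over x
q_z_given_x = [[0.9, 0.1], [0.4, 0.6], [0.2, 0.8]]    # encoder q(z | x), one row per x
prior = [0.5, 0.5]                                    # prior p(z)

# Aggregate posterior q(z) = sum_x p(x) q(z | x).
q_z = [sum(px * qzx[k] for px, qzx in zip(p_data, q_z_given_x))
       for k in range(len(prior))]

# Left-hand side: E_x[ KL(q(z|x) || p(z)) ].
expected_kl = sum(px * kl(qzx, prior) for px, qzx in zip(p_data, q_z_given_x))

# Right-hand side: I(x; z) = E_x[ KL(q(z|x) || q(z)) ], plus KL(q(z) || p(z)).
mutual_information = sum(px * kl(qzx, q_z) for px, qzx in zip(p_data, q_z_given_x))
marginal_kl = kl(q_z, prior)
```

Both sides agree up to floating-point error, which makes explicit that shrinking the usual KL term also penalizes the mutual information between $x$ and $z$.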


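(b) Using the Mutual Information between x and z
FactorVAE [Kim+ 2018]
• The β-VAE loss $\mathcal{L}_{\beta\text{-VAE}}$ shrinks $D_{\mathrm{KL}}\left(q_\phi(z) \,\|\, p(z)\right)$ and therefore encourages factorization, but at the same time it adds a penalty through the $I_{q_\phi}(x; z)$ term
• FactorVAE therefore regularizes with the total correlation
$$\mathrm{TC}\left(q_\phi(z)\right) = D_{\mathrm{KL}}\left(q_\phi(z) \,\Big\|\, \prod_j q_\phi(z_j)\right)$$
$$\mathcal{L}_{\mathrm{FactorVAE}}(\theta, \phi) = \mathcal{L}_{\mathrm{VAE}}(\theta, \phi) + \lambda_2\, \mathrm{TC}\left(q_\phi(z)\right)$$
• To estimate it from samples, the density-ratio trick with a trained discriminator is used
• [DL Reading Group] Disentangling by Factorising https://www.slideshare.net/DeepLearningJP2016/dldisentangling-by-factorising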
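To make the total correlation concrete, here is a minimal sketch that evaluates the definition directly for a two-variable discrete joint distribution. This is only the definition, not FactorVAE's estimator (which works from samples with a discriminator); the tables are hypothetical.

```python
import math
from itertools import product


def total_correlation(joint):
    """TC(q) = KL( q(z1, z2) || q(z1) q(z2) ) for a 2-D discrete joint table."""
    rows, cols = len(joint), len(joint[0])
    q1 = [sum(joint[i][j] for j in range(cols)) for i in range(rows)]  # marginal of z1
    q2 = [sum(joint[i][j] for i in range(rows)) for j in range(cols)]  # marginal of z2
    return sum(joint[i][j] * math.log(joint[i][j] / (q1[i] * q2[j]))
               for i, j in product(range(rows), range(cols))
               if joint[i][j] > 0)


# Outer product of (0.4, 0.6) and (0.3, 0.7): the coordinates are independent.
independent = [[0.12, 0.28], [0.18, 0.42]]
# Mass concentrated on the diagonal: the coordinates are strongly dependent.
correlated = [[0.45, 0.05], [0.05, 0.45]]

tc_independent = total_correlation(independent)
tc_correlated = total_correlation(correlated)
```

The total correlation vanishes exactly when the joint factorizes into its marginals, which is the property the regularizer pushes $q_\phi(z)$ toward. For two variables it coincides with their mutual information.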


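(c) Assuming Independence between Latent Variables
HSIC-VAE [Lopez+ 2018]
• Uses the Hilbert-Schmidt independence criterion (HSIC) [Gretton+ 2005] to encourage independence between groups of latent variables $z_G = \{z_k\}_{k \in G}$:
$$\mathcal{L}_{\mathrm{HSIC\text{-}VAE}}(\theta, \phi) = \mathcal{L}_{\mathrm{VAE}}(\theta, \phi) + \lambda_2\, \mathrm{HSIC}\left(q_\phi(z_{G_1}), q_\phi(z_{G_2})\right)$$
• HSIC is a kernel-based measure of independence (explained in Appendix A of the paper)
• To remove sensitive information represented by a label $s$ from the latent representation, $\mathrm{HSIC}\left(q_\phi(z), p(s)\right)$ can also be used as the regularization term
• $p(s)$ is estimated from samples
Other methods that assume independence between latent variables include HFVAE [Esmaeili+ 2018]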


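Regularization to Prevent the Latent Representation from Being Ignored
PixelGAN-AE [Makhzani+ 2017]
• With a highly expressive decoder such as PixelCNN [van den Oord+ 2016], a small reconstruction error can be reached without relying on the latent variables
• The latent representation then carries little information and may not be a good representation
• The paper therefore proposes removing the mutual information term $I_{q_\phi}(x; z)$ from the KL term of the VAE loss,
$$\mathbb{E}_{\hat{p}(x)}\left[D_{\mathrm{KL}}\left(q_\phi(z \mid x) \,\|\, p(z)\right)\right] = I_{q_\phi}(x; z) + D_{\mathrm{KL}}\left(q_\phi(z) \,\|\, p(z)\right)$$
$$\mathcal{L}_{\mathrm{PixelGAN\text{-}AE}}(\theta, \phi) = \mathcal{L}_{\mathrm{VAE}}(\theta, \phi) - I_{q_\phi}(x; z)$$
• The remaining KL term $D_{\mathrm{KL}}\left(q_\phi(z) \,\|\, p(z)\right)$ is approximated with a GAN
Figure source: [Makhzani+ 2017]
Other regularizers that keep the latent variables from being ignored include VIB [Alemi+ 2016] and Information dropout [Achille+ 2018]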


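Regularization with Labels
Variational Fair Autoencoder (VFAE) [Louizos+ 2016]
• Removes sensitive information represented by a label $s$ from the latent representation $z$
• Adds an MMD-based regularization term to the VAE loss $\mathcal{L}_{\mathrm{VAE}}$ so that $z$ becomes independent of $s$, i.e. so that $q(z \mid s = k)$ and $q(z \mid s = k')$ match:
$$\mathcal{L}_{\mathrm{VFAE}}(\theta, \phi) = \mathcal{L}_{\mathrm{VAE}} + \lambda_2 \sum_{\ell=2}^{K} \mathrm{MMD}\left(q_\phi(z \mid s = \ell), q_\phi(z \mid s = 1)\right)$$
• Posterior for each label value:
$$q_\phi(z \mid s = \ell) = \sum_{i : s^{(i)} = \ell} \frac{1}{\left|\{i : s^{(i)} = \ell\}\right|}\, q_\phi(z \mid x^{(i)}, s^{(i)})$$
• HSIC-VAE [Lopez+ 2018], which uses HSIC instead of MMD, can also handle the case where the distribution of $s$ is not categorical
• When $s$ is binary, VFAE [Louizos+ 2016] and HSIC-VAE [Lopez+ 2018] are equivalent
Other regularization methods for the labeled case include Fader Network [Lample+ 2017] and DC-IGN [Kulkarni+ 2015]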


② Factorizing the Encoder and Decoder


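Factorizing the Distributions
Factorize the distributions through the design of the model architecture
• Example: explicitly build in a hierarchy of latent variables
[Table of methods; legend: labels required; H: hierarchical, N: multivariate Gaussian, A: autoregressive, C: categorical, L: learned prior]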


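Semi-Supervised VAE
M2 model [Kingma+ 2014b]
• The inference model has a hierarchical structure
• Assumes $x$ is generated by a latent variable $z$ and a hidden class variable $y$:
$$q_\phi(z, y \mid x) = q_\phi(z \mid y, x)\, q_\phi(y \mid x)$$
• During training, when $x$ has a corresponding label $y$, $q_\phi(z \mid y, x)$ is used with $\mathcal{L}_{\mathrm{VAE}}$ as the loss; when there is no label, the label is inferred via $q_\phi(z, y \mid x)$
• It can also be combined with the M1 model (M1+M2 model)
• References
• DL Hacks reading group: Semi-supervised Learning with Deep Generative Models https://www.slideshare.net/YuusukeIwasawa/dl-hacks2015-0421iwasawa
• An account of implementing Semi-Supervised Learning with Deep Generative Models in pixyz https://qiita.com/kogepan102/items/22b685ce7e9a51fbab98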


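VLAE
Variational Lossy Autoencoder (VLAE) [Chen+ 2017]
• When the decoder is a highly expressive model such as an autoregressive model, the latent representation carries little information and may not be a good representation
• To address this, the decoder $p_\theta(x \mid z)$ is designed so that it cannot itself capture the information we want stored in the latent representation $z$
• Example: so that global, high-level information is stored in $z$, the decoder $p_\theta(x \mid z)$ is made an autoregressive model with a window $W(j)$ centered on pixel $j$,
$$p_\theta(x \mid z) = \prod_j p_\theta\left(x_j \mid z, x_{W(j)}\right)$$
which keeps it from modeling long-range spatial dependencies
Other methods that factorize the distributions include PixelVAE [Gulrajani+ 2017], LadderVAE [Sønderby+ 2016], and VLaAE [Zhao+ 2017b]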


③ Using a Flexible Distribution as the Prior


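Choosing the Prior
Build in a meta-prior through the choice of the prior $p(z)$
• The most explicit way of building in a meta-prior
• Example: use both discrete and continuous latent variables to separate digit identity from writing style on MNIST
• Example: model the prior with a graphical model (SVAE) [Johnson+ 2016]
[Table of methods; legend: labels required; N: multivariate Gaussian, G: graphical model, C: categorical, M: mixture, L: learned prior]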


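Discrete Latent Variables
JointVAE [Dupont 2018]
• To disentangle different kinds of latent factors, introduces a continuous latent variable $z$ and a discrete latent variable $c$
• Factorizes the approximate posterior as $q_\phi(c \mid x)\, q_\phi(z \mid x)$
• Uses Gumbel-Softmax for the categorical distribution $q_\phi(c \mid x)$
• The KL term (the second term of the β-VAE objective $\mathcal{L}_{\beta\text{-VAE}}$) then becomes
$$D_{\mathrm{KL}}\left(q_\phi(z \mid x)\, q_\phi(c \mid x) \,\|\, p(z)\, p(c)\right) = D_{\mathrm{KL}}\left(q_\phi(z \mid x) \,\|\, p(z)\right) + D_{\mathrm{KL}}\left(q_\phi(c \mid x) \,\|\, p(c)\right)$$
Other methods that use discrete latent variables include VQ-VAE [van den Oord+ 2017]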
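The Gumbel-Softmax relaxation mentioned on this slide can be sketched in a few lines of plain Python. This is an illustration of the sampling step only, with a hypothetical three-class distribution and an arbitrary temperature; it is not JointVAE's implementation, which would run inside an automatic-differentiation framework.

```python
import math
import random


def gumbel_softmax_sample(logits, temperature, rng):
    """Relaxed one-hot sample: softmax((logits + Gumbel noise) / temperature)."""
    gumbels = []
    for _ in logits:
        # Clamp the uniform draw away from 0 and 1 to keep the logs finite.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        gumbels.append(-math.log(-math.log(u)))
    scores = [(l + g) / temperature for l, g in zip(logits, gumbels)]
    max_score = max(scores)                     # subtract max for numerical stability
    exps = [math.exp(s - max_score) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


probs = [0.2, 0.3, 0.5]                         # hypothetical q(c | x)
logits = [math.log(p) for p in probs]
rng = random.Random(0)

n_draws = 20000
counts = [0, 0, 0]
for _ in range(n_draws):
    sample = gumbel_softmax_sample(logits, temperature=0.5, rng=rng)
    counts[sample.index(max(sample))] += 1      # the argmax follows the categorical
frequencies = [c / n_draws for c in counts]
```

The largest coordinate of each relaxed sample is distributed according to the original categorical distribution, while the sample itself stays a smooth function of the logits. Lower temperatures give samples closer to one-hot vectors.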


Other Approaches


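Other Approaches
Robustness to noise
• Denoising Autoencoder (DAE) [Vincent+ 2008]
Time-series data
• Split the latent representation into time-varying and time-invariant variables [Yingzhen+ 2018] [Hsieh+ 2018]
• Split the latent representation into the pose and the content of the subject [Villegas+ 2017] [Denton+ 2017] [Fraccaro+ 2017]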


Other Approaches
Discriminator in pixel space
• Instead of training the encoder $q_\phi(z \mid x)$ and decoder $p_\theta(x \mid z)$ pair to minimize reconstruction error, match the joint distributions $q_\phi(z \mid x)\, \hat{p}(x)$ and $p_\theta(x \mid z)\, p(z)$
• Adversarially Learned Inference (ALI) [Dumoulin+ 2017]. Figure source: [Dumoulin+ 2017]
• Bidirectional GAN (BiGAN) [Donahue+ 2017]. Figure source: [Donahue+ 2017]


Rate-Distortion-Usefulness Tradeoff


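Rate-Distortion Tradeoff
There is a gap between methods that presuppose knowledge about the task and methods based on meta-priors
• Example: the unsupervised β-VAE [Higgins+ 2017] has only been validated on synthetic datasets and on structured, low-resolution real datasets, whereas the supervised Fader Network [Lample+ 2017] also scales to high-resolution data
This is examined through the "Rate-Distortion Tradeoff" [Alemi+ 2018a]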


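Rate-Distortion Tradeoff
Consider the following quantities
• Entropy:
$$H = -\int p(x) \log p(x)\, dx = \mathbb{E}_{p(x)}\left[-\log p(x)\right]$$
• Distortion: the negative log-likelihood of the reconstruction
$$D = -\iint p(x)\, q_\phi(z \mid x) \log p_\theta(x \mid z)\, dx\, dz = \mathbb{E}_{p(x)}\left[\mathbb{E}_{q_\phi(z \mid x)}\left[-\log p_\theta(x \mid z)\right]\right]$$
• Rate: the KL divergence between the posterior $q_\phi(z \mid x)$ and the prior $p(z)$
$$R = \iint p(x)\, q_\phi(z \mid x) \log \frac{q_\phi(z \mid x)}{p(z)}\, dx\, dz = \mathbb{E}_{p(x)}\left[D_{\mathrm{KL}}\left(q_\phi(z \mid x) \,\|\, p(z)\right)\right]$$
• The usual VAE ELBO is then
$$\mathrm{ELBO} = -\mathcal{L}_{\mathrm{VAE}} = -(D + R)$$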


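Rate-Distortion Tradeoff
Rate-Distortion Tradeoff [Alemi+ 2018a]
• Rate and Distortion satisfy the following tradeoff (see the paper for details):
$$H - D \leq R$$
• All points on the line $D = H - R$ have the same ELBO
• A model trained only to maximize the marginal likelihood has no control over where on this line it ends up; it can land at a Rate close to zero, where the latent code is ignored, and may therefore be unusable as a representation learning model
• This happens when a highly expressive decoder is used
• [Alemi+ 2018a] proposes optimizing with the Rate constrained to a target value $\sigma$
• That is, $\min_{\phi, \theta} D + |\sigma - R|$
Figure source: [Alemi+ 2018a]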
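The three quantities and the bound $H - D \leq R$ can be computed exactly for a tiny discrete model. The sketch below uses a hypothetical binary $x$ and binary $z$ with made-up encoder, decoder, and prior tables; it only illustrates the definitions.

```python
import math

# Hypothetical toy model: binary x, binary z.
p_x = [0.7, 0.3]                           # data distribution p(x)
q_z_given_x = [[0.8, 0.2], [0.3, 0.7]]     # encoder q(z | x), rows indexed by x
p_x_given_z = [[0.9, 0.1], [0.25, 0.75]]   # decoder p(x | z), rows indexed by z
p_z = [0.5, 0.5]                           # prior p(z)

# Entropy H = E_p(x)[ -log p(x) ].
entropy = -sum(px * math.log(px) for px in p_x)

# Distortion D = E_p(x) E_q(z|x)[ -log p(x | z) ].
distortion = -sum(p_x[x] * q_z_given_x[x][z] * math.log(p_x_given_z[z][x])
                  for x in range(2) for z in range(2))

# Rate R = E_p(x)[ KL(q(z|x) || p(z)) ].
rate = sum(p_x[x] * q_z_given_x[x][z] * math.log(q_z_given_x[x][z] / p_z[z])
           for x in range(2) for z in range(2))

elbo = -(distortion + rate)
gap = rate - (entropy - distortion)        # non-negative slack in H - D <= R
```

Since $\mathrm{ELBO} = -(D + R)$, the bound is equivalent to saying the ELBO can never exceed $-H$. Different encoder and decoder tables with the same $D + R$ sit at different points of the same ELBO level, which is the ambiguity the slide describes.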


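Rate-Distortion Tradeoff
However, even with the Rate fixed, the learned representation may not be useful for the task
• We cannot tell which part of the total information (entropy) was stored in the latent representation $z$ and which part was stored in the decoder
• Example: for an image classification task we want the salient features of the object to be stored, whereas for an object localization task we want the location to be stored
• Even if task-relevant information is stored in the latent representation $z$, there is no guarantee that it is stored in a form that is useful for solving the task
• Example: it is customary to solve tasks with a linear model on top of the learned representation, but there is no guarantee that the representation makes the task linearly solvable
In short, the Rate-Distortion Tradeoff does not tell us what information is stored, in what form, or how much of it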


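Rate-Distortion-Usefulness Tradeoff
The paper proposes the Rate-Distortion-Usefulness Tradeoff
• Adds "usefulness" as a third axis
• This axis can only be evaluated by actually running a task, so defining it for arbitrary tasks is difficult
• The regularizers and architectural choices covered in this paper can be interpreted as steering the model not only along the R-D curve but also in the direction of usefulness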


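Rate-Distortion-Usefulness Tradeoff
Defining a usefulness axis for arbitrary tasks is difficult
• So the formulation takes a subset of the conceivable tasks
• If an additional variable $y$ is known in advance and we try to predict it,
$$D_y = -\iiint p(x, y)\, q_\phi(z \mid x) \log p_\theta(y \mid z)\, dx\, dy\, dz = \mathbb{E}_{p(x, y)}\left[\mathbb{E}_{q_\phi(z \mid x)}\left[-\log p_\theta(y \mid z)\right]\right]$$
• This gives rise to an $R$-$D_y$ tradeoff
• This topic seems to be discussed in [Alemi+ 2018b], but it looks difficult...? (I have not read it)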


Conclusion


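Summary
• The paper considers meta-priors, i.e. assumptions about what makes a representation good for tasks, and discusses research directions that promote these properties
• It focuses on three approaches: ① regularizing the (approximate) posterior, ② factorizing the encoder and decoder, ③ using a flexible distribution as the prior
• These approaches can be combined
• There is a tradeoff between the degree of supervision and the usefulness of the resulting representation
• The Rate-Distortion tradeoff shows that maximizing likelihood alone does not guarantee a good representation
• The "usefulness" axis has to be taken into account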


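Impressions
• Rate-Distortion-Usefulness sounds obvious at first glance, but it is easy to overlook
• In discussions of world models there is a risk of concluding that one can just stuff anything and everything into z (e.g. GQN)
• It makes me want to learn the meta-priors themselves
• If you think of them as inductive biases, it starts to feel like this is entirely meta-learning
• [DL Reading Group] Meta-Learning Probabilistic Inference for Prediction https://www.slideshare.net/DeepLearningJP2016/dlmetalearning-probabilistic-inference-forprediction-126167192
• Having many tasks for evaluating usefulness would be nice (whether that is realistic is another matter)
• I would like to write up and publish a brief summary of the models skipped in this talk under "Others include ..."
• I would like to implement more and more of the presented models in Pixyz and add them to Pixyzoo (shameless plug)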


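Pixyz & Pixyzoo
Pixyz https://github.com/masa-su/pixyz
• A library for deep generative models (PyTorch-based), written by Suzuki-san of our lab
• Networks are hidden behind probabilistic-model objects, so operations between probability distributions can be reasoned about separately from the networks, which makes the code highly readable
Pixyzoo https://github.com/masa-su/pixyzoo
• A collection of implementations using Pixyz
• Currently includes implementations of GQN, VIB, and others
• [DLHacks] Implementing Generative Query Network with PyTorch and Pixyz https://www.slideshare.net/DeepLearningJP2016/dlhackspytorch-pixyzgenerative-querynetwork-126329901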


