[DL輪読会]Recent Advances in Autoencoder-Based Representation Learning


Slide overview

2019/01/18
Deep Learning JP:
http://deeplearning.jp/seminar-2/


Text of each page
1.

Recent Advances in Autoencoder-Based Representation Learning
Presenter: Tatsuya Matsushima @__tmats__, Matsuo Lab

2.

About this reading session
Recent Advances in Autoencoder-Based Representation Learning
• https://arxiv.org/abs/1812.05069 (Submitted on 12 Dec 2018)
• Michael Tschannen, Olivier Bachem, Mario Lucic
• ETH Zurich, Google Brain
• NeurIPS 2018 Workshop (Bayesian Deep Learning) http://bayesiandeeplearning.org/
• Incidentally, the paper runs to 19 pages before the references, though acceptance was apparently decided on the first 3 pages alone
• A survey paper on autoencoder-based representation learning
• Covers recent models very broadly
• The sheer volume makes it a tough read (it feels like a catalog...)
※ Figures and tables without a citation mark are taken from this paper

3.

TL;DR
• Finding good representations with as little supervision as possible is a hard problem in machine learning
• This paper surveys autoencoder-based methods for doing so
• The discussion is organized around meta-priors: assumptions about what makes a representation good for downstream tasks
• Three approaches are treated in particular: ① regularizing the (approximate) posterior, ② factorizing the encoder and decoder, and ③ using flexible priors
• The methods are analyzed through the lens of the rate-distortion tradeoff

4.

Motivation
• I presented a survey on state representation learning (SRL) at the previous reading session
• [DL輪読会] State Representation Learning for Reinforcement Learning: Toward Acquiring Better "World Models" https://www.slideshare.net/DeepLearningJP2016/dl-124128933
• VAE-based methods are used heavily in SRL, so I wanted to organize the methods that use VAEs for representation learning in the first place

5.

Recap of VAEs

6.

VAE
Variational Autoencoder (VAE) [Kingma+ 2014a]
• To learn a latent-variable model, we aim to maximize the log-likelihood of the training data:
  𝔼p̂(x)[−log pθ(x)] = ℒVAE(θ, ϕ) − 𝔼p̂(x)[DKL(qϕ(z|x) ∥ pθ(z|x))]
• Since the KL is non-negative, −ℒVAE is a lower bound on the log-likelihood 𝔼p̂(x)[log pθ(x)] (the ELBO)
• So it suffices to maximize the ELBO, i.e., to minimize the VAE loss ℒVAE:
  ℒVAE(θ, ϕ) = 𝔼p̂(x)[𝔼qϕ(z|x)[−log pθ(x|z)]] + 𝔼p̂(x)[DKL(qϕ(z|x) ∥ p(z))]
※ Taking the average over the empirical data distribution p̂(x) is made explicit here, so the notation may look slightly unfamiliar, but this is the ordinary VAE ELBO

7.

VAE
The VAE loss:
  ℒVAE(θ, ϕ) = 𝔼p̂(x)[𝔼qϕ(z|x)[−log pθ(x|z)]] + 𝔼p̂(x)[DKL(qϕ(z|x) ∥ p(z))]
• The first term is estimated with samples z(i) ∼ qϕ(z|x(i)); gradients are backpropagated through them with the reparametrization trick
• The second term is either computed in closed form or estimated from samples
• Choosing the approximate posterior qϕ(z|x) = 𝒩(μϕ(x), diag(σϕ(x))) and the prior p(z) = 𝒩(0, I) makes the KL term computable in closed form (see the sketch below)
• Otherwise, the distance between the distributions has to be estimated from samples
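As a concrete illustration of the two loss terms and the reparametrization trick, here is a minimal sketch, assuming a diagonal-Gaussian encoder, a Bernoulli decoder over inputs in [0, 1], and a standard normal prior; the architecture and dimensions are placeholders, not from the slides:

```python
# Minimal VAE loss sketch in PyTorch (illustration only).
import torch
import torch.nn as nn
import torch.nn.functional as F

class VAE(nn.Module):
    def __init__(self, x_dim=784, z_dim=20, h_dim=400):
        super().__init__()
        self.enc = nn.Linear(x_dim, h_dim)
        self.mu, self.logvar = nn.Linear(h_dim, z_dim), nn.Linear(h_dim, z_dim)
        self.dec = nn.Sequential(nn.Linear(z_dim, h_dim), nn.ReLU(),
                                 nn.Linear(h_dim, x_dim))

    def loss(self, x):
        h = F.relu(self.enc(x))
        mu, logvar = self.mu(h), self.logvar(h)
        # Reparametrization trick: z = mu + sigma * eps with eps ~ N(0, I),
        # so gradients flow back through mu and logvar.
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)
        # First term: -log p_theta(x|z), here a Bernoulli decoder.
        recon = F.binary_cross_entropy_with_logits(
            self.dec(z), x, reduction='none').sum(-1)
        # Second term: closed-form KL(q_phi(z|x) || N(0, I)) for a diagonal Gaussian.
        kl = 0.5 * (mu.pow(2) + logvar.exp() - logvar - 1).sum(-1)
        return (recon + kl).mean()  # L_VAE, averaged over the minibatch
```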

8.

Density-ratio estimation via adversarial learning
f-divergences
• Let f be a convex function with f(1) = 0. The f-divergence between px and py is defined as
  Df(px ∥ py) = ∫ f(px(x) / py(x)) py(x) dx
• For f(t) = t log t, it becomes the KL divergence: Df(px ∥ py) = DKL(px ∥ py)
• Given samples from px and py, the f-divergence can be estimated with the density-ratio trick
• This became widely known through GANs

9.

Density-ratio estimation via adversarial learning
Estimating the KL divergence with the GAN-style density-ratio trick
• Express px and py as distributions conditioned on a label c ∈ {0, 1}
• That is, px(x) = p(x | c = 1) and py(x) = p(x | c = 0)
• This reduces the problem to a binary classification task
• A discriminator Sη predicts the probability that its input was drawn from the distribution px(x)
• Assuming equal class priors, the density ratio becomes
  px(x) / py(x) = p(x | c = 1) / p(x | c = 0) = p(c = 1 | x) / p(c = 0 | x) ≈ Sη(x) / (1 − Sη(x))
• Therefore, given N i.i.d. samples from px,
  DKL(px ∥ py) ≈ (1/N) Σi=1..N log( Sη(x(i)) / (1 − Sη(x(i))) )
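A sketch of the final estimator; it assumes a discriminator `disc` that has already been trained with binary cross-entropy to output P(c = 1 | x), i.e. the probability that x came from px rather than py (all names are placeholders):

```python
# Density-ratio-trick KL estimate (illustration only).
import torch

def kl_estimate(disc, x_samples, eps=1e-6):
    """Monte Carlo estimate of KL(p_x || p_y) from samples x_samples ~ p_x."""
    s = disc(x_samples).clamp(eps, 1 - eps)   # S_eta(x) in (0, 1)
    return torch.log(s / (1 - s)).mean()      # (1/N) sum_i log S/(1-S)
```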

10.

Maximum Mean Discrepancy (MMD)
Let k: 𝒳 × 𝒳 → ℝ be a continuous, bounded, positive semi-definite kernel, ℋ the corresponding reproducing kernel Hilbert space, and φ: 𝒳 → ℋ its feature map. The MMD between px(x) and py(x) is defined as
  MMD(px, py) = ∥𝔼x∼px[φ(x)] − 𝔼y∼py[φ(y)]∥²ℋ
• Intuitively, the distance between the distributions is computed as the distance between the means of their feature embeddings
• Example: for 𝒳 = ℋ = ℝd and φ(x) = x, the MMD is the difference of the means: MMD(px, py) = ∥μpx − μpy∥²₂
• By choosing a suitable map φ, the divergence can be estimated in terms of higher-order moments
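A sketch of an empirical MMD² estimator; the RBF kernel and its bandwidth are assumptions for illustration, and this is the simple biased V-statistic rather than the unbiased variant:

```python
# Empirical MMD^2 with an RBF kernel (illustration only).
import torch

def rbf_kernel(a, b, sigma=1.0):
    # k(a, b) = exp(-||a - b||^2 / (2 sigma^2)) for all sample pairs
    d2 = torch.cdist(a, b).pow(2)
    return torch.exp(-d2 / (2 * sigma ** 2))

def mmd2(x, y, sigma=1.0):
    """MMD^2(p_x, p_y) estimated from samples x ~ p_x and y ~ p_y."""
    return (rbf_kernel(x, x, sigma).mean()
            - 2 * rbf_kernel(x, y, sigma).mean()
            + rbf_kernel(y, y, sigma).mean())
```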

11.

VAEs as a Way to Realize Meta-Priors

12.

What is a meta-prior?
Meta-prior [Bengio+ 2013]
• An assumption about the properties of representations that are useful for many tasks at once
• Fundamentally, the "goodness" of a representation depends on what it will be used for
• Labeling is costly, so ideally we want to reduce the number of labels
• But learning good representations without a supervision signal is hard
• It would be great if good representations could be found without labels → exploit meta-priors

13.

Types of meta-priors [Bengio+ 2013]
Disentanglement
• The assumption that data is generated from independently varying factors
• Example: object orientation, lighting conditions
• Capturing these factors as separate representations should be useful for many downstream tasks
Hierarchy of explanatory factors
• The assumption that the world can be explained by a hierarchy of abstract concepts
• Example: objects can be described at various granularities (adding attributes makes the description more concrete)

14.

Types of meta-priors [Bengio+ 2013]
Semi-supervised learning
• The assumption that sharing a representation between supervised and unsupervised learning creates synergy
• Labeled data is generally scarce, so unlabeled data can also be used to guide representation learning
Clusterability
• The assumption that much data has a multi-category structure, with category-dependent variation
• Such structure should be representable by a mixture model in latent space in which each component corresponds to one category

15.

Realizing meta-priors with autoencoders
Most algorithms for (unsupervised) representation learning are built on autoencoders
• The methods surveyed in this paper each try to realize some meta-prior
• Covering everything today is impossible, so just a few... (to be honest, there were quite a few I didn't know)

16.

How meta-priors are built into models
① Adding a regularization term to the encoder objective
• Often used to learn disentangled representations
② Factorizing the encoder and decoder
• Often used to obtain hierarchical representations
③ Using a flexible prior
• Often used to represent clusters; example: mixture models

17.

① Adding a Regularization Term to the Encoder Objective

18.

Regularizing the VAE
To make the latent representation z ∼ qϕ(z|x) reflect a meta-prior, a term involving the approximate posterior qϕ(z|x) or the aggregated (approximate) posterior
  qϕ(z) = 𝔼p̂(x)[qϕ(z|x)] = (1/N) Σi=1..N qϕ(z|x(i))
is added to the usual VAE objective:
  ℒVAE(θ, ϕ) + λ1 𝔼p̂(x)[R1(qϕ(z|x))] + λ2 R2(qϕ(z))
• The aggregated (approximate) posterior qϕ(z) depends on the entire dataset, so in theory minibatch gradient descent cannot be used and an approximation is required (see the sketch below)
• This yields a looser bound than the original VAE bound ℒVAE, so reconstruction quality may suffer
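A sketch of the standard minibatch workaround, assuming a Gaussian encoder: within a batch, qϕ(z) is approximated by the mixture of the per-example posteriors, and regularizers R2 can be estimated from its log-density (function names are placeholders):

```python
# Minibatch approximation of the aggregated posterior q_phi(z) (illustration only).
import math
import torch
from torch.distributions import Normal

def log_q_aggregated(z, mu, logvar):
    """z, mu, logvar: (B, z_dim) tensors for one minibatch.
    Returns log of the minibatch mixture (1/B) sum_i q_phi(z_j | x_i)
    evaluated at each z_j; R2(q_phi(z)) can be estimated from this."""
    comp = Normal(mu.unsqueeze(0), (0.5 * logvar).exp().unsqueeze(0))
    log_probs = comp.log_prob(z.unsqueeze(1)).sum(-1)   # (B, B): entry [j, i]
    return torch.logsumexp(log_probs, dim=1) - math.log(z.shape[0])
```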

19.

Regularizing the VAE
Main methods that regularize the (approximate) posterior, via
  ℒVAE(θ, ϕ) + λ1 𝔼p̂(x)[R1(qϕ(z|x))] + λ2 R2(qϕ(z))
[Table in the original slide: overview of methods, marking for each whether labels are optional or required]

20.

Regularizing the VAE
How the regularization is done
• Many methods regularize the aggregated (approximate) posterior qϕ(z)
• Both divergence-based and moment-based regularizers are used
[Figure in the original slide: where the regularization term on the aggregated (approximate) posterior acts]

21.

Regularization for disentanglement
The intent behind pursuing disentanglement through regularization
• Assume the data-generating process involves conditionally independent variables v and conditionally dependent variables w:
  x ∼ p(x | v, w),  p(v | x) = Πj p(vj | x)
• The loss should then be modified so that the inference model qϕ(z|x) can predict v

22.

Regularization for disentanglement
Evaluating disentanglement
• When possible, evaluation compares against the true factors
• Many papers claim to learn disentangled representations, but in practice there is no precise agreed-upon definition of disentanglement, and it is unclear how effective it is in the unsupervised setting
• This survey therefore only introduces the concepts (without worrying about how disentangled the results actually are)
• [Locatello+ 2018] investigates this with large-scale experiments
• Characteristic approaches:
• (a) Reweighting the ELBO
• (b) Using the mutual information between x and z
• (c) Assuming independence between latent variables

23.

(a) Reweighting the ELBO
β-VAE [Higgins+ 2017]
• Reweights the second term of the usual VAE loss
  ℒVAE(θ, ϕ) = 𝔼p̂(x)[𝔼qϕ(z|x)[−log pθ(x|z)]] + 𝔼p̂(x)[DKL(qϕ(z|x) ∥ p(z))]
giving
  ℒβ−VAE(θ, ϕ) = ℒVAE(θ, ϕ) + λ1 𝔼p̂(x)[DKL(qϕ(z|x) ∥ p(z))]
• Since the approximate posterior qϕ(z|x) is pushed closer to the prior p(z), one can hope that the factors become more easily separated (see the sketch below)
Figure source: [Higgins+ 2017]
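A sketch of the reweighted objective, reusing the hypothetical `VAE` class from the recap sketch above; `beta` here plays the role of 1 + λ1 in the slide's notation:

```python
# beta-VAE objective (illustration only; reuses the earlier VAE sketch).
import torch
import torch.nn.functional as F

def beta_vae_loss(model, x, beta=4.0):
    h = F.relu(model.enc(x))
    mu, logvar = model.mu(h), model.logvar(h)
    z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)
    recon = F.binary_cross_entropy_with_logits(
        model.dec(z), x, reduction='none').sum(-1)
    kl = 0.5 * (mu.pow(2) + logvar.exp() - logvar - 1).sum(-1)
    # beta = 1 recovers the ordinary VAE; beta > 1 up-weights the KL term
    return (recon + beta * kl).mean()
```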

24.

(b) Using the mutual information between x and z
Decompose the second term of the usual VAE loss
  ℒVAE(θ, ϕ) = 𝔼p̂(x)[𝔼qϕ(z|x)[−log pθ(x|z)]] + 𝔼p̂(x)[DKL(qϕ(z|x) ∥ p(z))]
as
  𝔼p̂(x)[DKL(qϕ(z|x) ∥ p(z))] = Iqϕ(x; z) + DKL(qϕ(z) ∥ p(z))
• This yields a mutual-information term Iqϕ(x; z) between x and z, and a KL term between the aggregated (approximate) posterior qϕ(z) and the prior p(z) [Hoffman+ 2016]
• The FactorVAE [Kim+ 2018] example is introduced next
• Other examples include β-TCVAE [Chen+ 2018], InfoVAE [Zhao+ 2017a], and DIP-VAE [Kumar+ 2018]

25.

(b) Using the mutual information between x and z
FactorVAE [Kim+ 2018]
• The β-VAE loss ℒβ−VAE pulls DKL(qϕ(z) ∥ p(z)) down, which has a factorizing effect, but at the same time it also penalizes the mutual-information term Iqϕ(x; z)
• FactorVAE therefore regularizes with the total correlation
  TC(qϕ(z)) = DKL(qϕ(z) ∥ Πj qϕ(zj))
giving
  ℒFactorVAE(θ, ϕ) = ℒVAE(θ, ϕ) + λ2 TC(qϕ(z))
• To estimate it from samples, the density-ratio trick with a learned discriminator is used (see the sketch below)
• [DL輪読会] Disentangling by Factorising https://www.slideshare.net/DeepLearningJP2016/dldisentangling-by-factorising
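A sketch of FactorVAE's total-correlation estimate; `disc` is assumed to be a classifier trained to distinguish samples of qϕ(z) (label 1) from samples of Πj qϕ(zj) (label 0), the latter produced by permuting each latent dimension independently across the batch:

```python
# FactorVAE-style TC estimate via the density-ratio trick (illustration only).
import torch

def permute_dims(z):
    """Sample from prod_j q_phi(z_j) by shuffling each dimension across the batch."""
    B, D = z.shape
    return torch.stack(
        [z[torch.randperm(B, device=z.device), j] for j in range(D)], dim=1)

def tc_estimate(disc, z, eps=1e-6):
    """Estimate TC(q_phi(z)) = KL(q(z) || prod_j q(z_j)) from posterior samples z."""
    s = disc(z).clamp(eps, 1 - eps)     # P(z came from q_phi(z))
    return torch.log(s / (1 - s)).mean()
```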

26.

(c) Assuming independence between latent variables
HSIC-VAE [Lopez+ 2018]
• Uses the Hilbert-Schmidt independence criterion (HSIC) [Gretton+ 2005] to encourage groups of latent variables zG = {zk}k∈G to be mutually independent:
  ℒHSIC−VAE(θ, ϕ) = ℒVAE(θ, ϕ) + λ2 HSIC(qϕ(zG1), qϕ(zG2))
• HSIC is a kernel-based measure of independence (explained in Appendix A of the paper; see the sketch below)
• HSIC(qϕ(z), p(s)) can also be used as a regularization term to remove sensitive information represented by a label s from the latent representation
• p(s) is estimated from samples
• Other methods that assume independence between latent variables include HFVAE [Esmaeili+ 2018]
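A sketch of a (biased) empirical HSIC estimator; the RBF kernels and bandwidths are assumptions for illustration:

```python
# Empirical HSIC ~ trace(K H L H) / (n - 1)^2 with RBF kernels (illustration only).
import torch

def rbf(a, sigma=1.0):
    return torch.exp(-torch.cdist(a, a).pow(2) / (2 * sigma ** 2))

def hsic(x, y, sigma=1.0):
    """x: (n, dx), y: (n, dy) paired samples; returns a scalar >= 0."""
    n = x.shape[0]
    H = torch.eye(n, device=x.device) - torch.ones(n, n, device=x.device) / n
    K, L = rbf(x, sigma), rbf(y, sigma)   # kernel matrices; H centers them
    return torch.trace(K @ H @ L @ H) / (n - 1) ** 2
```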

27.

Regularization to keep the latent representation from being ignored
PixelGAN-AE [Makhzani+ 2017]
• With a high-capacity decoder such as PixelCNN [van den Oord+ 2016], a small reconstruction error can be achieved without relying on the latent variables
• The latent representation then carries little information and may fail to be a good representation
• PixelGAN-AE therefore proposes removing the mutual-information term Iqϕ(x; z) from the KL term of the VAE loss
  𝔼p̂(x)[DKL(qϕ(z|x) ∥ p(z))] = Iqϕ(x; z) + DKL(qϕ(z) ∥ p(z))
giving
  ℒPixelGAN−AE(θ, ϕ) = ℒVAE(θ, ϕ) − Iqϕ(x; z)
• The remaining KL term DKL(qϕ(z) ∥ p(z)) is approximated with a GAN
Figure source: [Makhzani+ 2017]
Other regularizers that keep the latent variables from being ignored include VIB [Alemi+ 2016] and Information dropout [Achille+ 2018]

28.

Regularization with labels
Variational Fair Autoencoder (VFAE) [Louizos+ 2016]
• Removes sensitive information represented by a label s from the latent representation z
• To make z independent of s, i.e., so that q(z | s = k) and q(z | s = k′) match, an MMD-based regularization term is added to the VAE loss ℒVAE (see the sketch below):
  ℒVFAE(θ, ϕ) = ℒVAE + λ2 Σℓ=2..K MMD(qϕ(z | s = ℓ), qϕ(z | s = 1))
• where the label-conditional posterior is
  qϕ(z | s = ℓ) = (1 / |{i : s(i) = ℓ}|) Σi : s(i)=ℓ qϕ(z | x(i), s(i))
• HSIC-VAE [Lopez+ 2018], which uses HSIC instead of MMD, can also handle the case where the distribution of s is not categorical
• When s is binary, VFAE [Louizos+ 2016] and HSIC-VAE [Lopez+ 2018] are equivalent
Other regularization methods that use labels include Fader Networks [Lample+ 2017] and DC-IGN [Kulkarni+ 2015]
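A sketch of the VFAE-style penalty, reusing the hypothetical `mmd2()` estimator from the MMD section; `z` holds posterior samples for a minibatch and `s` their integer sensitive labels, with group 0 as the reference group, mirroring the sum over ℓ in the slide:

```python
# VFAE-style MMD penalty across label groups (illustration only).
import torch

def vfae_mmd_penalty(z, s, num_labels, sigma=1.0):
    ref = z[s == 0]                     # reference group (s = 1 in the slide)
    penalty = z.new_zeros(())
    for l in range(1, num_labels):
        group = z[s == l]
        # With small minibatches a group may be empty; a real implementation
        # would need a batching strategy that guarantees non-empty groups.
        if len(group) > 0 and len(ref) > 0:
            penalty = penalty + mmd2(group, ref, sigma)
    return penalty
```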

29.

② Factorizing the Encoder and Decoder

30.

Factorizing the distributions
Factorize the distributions by designing the model architecture
• Example: explicitly building in a hierarchy of latent variables
[Table in the original slide: overview of methods and whether labels are required, where H: hierarchical, N: multivariate Gaussian, A: autoregressive, C: categorical, L: learned prior]

31.

Semi-supervised VAE
The M2 model [Kingma+ 2014b]
• The inference model is hierarchical
• Assumes x is generated from a latent variable z and a hidden class variable y:
  qϕ(z, y | x) = qϕ(z | y, x) qϕ(y | x)
• When a label y corresponding to x is available at training time, qϕ(z | y, x) is used and ℒVAE serves as the loss; when the label is missing, it is inferred via qϕ(z, y | x)
• It can also be combined with the M1 model (the M1+M2 model)
• Related material:
• DL Hacks輪読 Semi-supervised Learning with Deep Generative Models https://www.slideshare.net/YuusukeIwasawa/dl-hacks2015-0421iwasawa
• Implementing Semi-Supervised Learning with Deep Generative Models in the much-discussed pixyz library https://qiita.com/kogepan102/items/22b685ce7e9a51fbab98

32.

VLAE
Variational Lossy Autoencoder (VLAE) [Chen+ 2017]
• When the decoder is a high-capacity model such as an autoregressive model, the latent representation may carry little information and fail to be a good representation
• To address this, the decoder pθ(x | z) is designed so that the information we want stored in the latent representation z cannot be captured by the decoder itself
• Example: so that global, high-level information is stored in z, the decoder is made an autoregressive model with a local window W(j) around each pixel j,
  pθ(x | z) = Πj pθ(xj | z, xW(j))
so that it cannot model long-range spatial dependencies
Other methods that factorize the distributions include PixelVAE [Gulrajani+ 2017], LadderVAE [Sønderby+ 2016], and VLaAE [Zhao+ 2017b]

33.

③ Using a Flexible Prior

34.

Choosing the prior
Build the meta-prior in through the choice of the prior p(z)
• The most explicit way to build in a meta-prior
• Example: using both discrete and continuous latent variables to separate digit identity from writing style on MNIST
• Example: modeling the prior with a graphical model (SVAE) [Johnson+ 2016]
[Table in the original slide: overview of methods and whether labels are required, where N: multivariate Gaussian, G: graphical model, C: categorical, M: mixture, L: learned prior]

35.

Discrete latent variables
JointVAE [Dupont 2018]
• To disentangle different kinds of latent variables, introduces a continuous latent variable z and a discrete latent variable c
• The approximate posterior is factorized as qϕ(c | x) qϕ(z | x)
• Gumbel-Softmax is used for the categorical distribution qϕ(c | x) (see the sketch below)
• The KL term (the second term of the β-VAE objective ℒβ−VAE) then splits additively:
  DKL(qϕ(z|x) qϕ(c|x) ∥ p(z) p(c)) = DKL(qϕ(z|x) ∥ p(z)) + DKL(qϕ(c|x) ∥ p(c))
Other methods using discrete latent variables include VQ-VAE [van den Oord+ 2017]
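A sketch of the Gumbel-Softmax step for the discrete posterior; the logits are placeholders, and the KL is computed against a uniform prior p(c), which is an assumption for illustration:

```python
# Gumbel-Softmax relaxation of q_phi(c|x) and its KL term (illustration only).
import math
import torch
import torch.nn.functional as F

logits = torch.randn(8, 10)                    # placeholder logits, 10 categories
# Differentiable, approximately one-hot samples from the categorical posterior:
c_relaxed = F.gumbel_softmax(logits, tau=0.5)  # shape (8, 10)
# Closed-form KL(q_phi(c|x) || p(c)) with a uniform prior over K = 10 categories:
q = logits.softmax(-1)
kl_c = (q * q.clamp_min(1e-8).log()).sum(-1) + math.log(10)
```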

36.

Other Approaches

37.

Other approaches
Robustness to noise
• Denoising Autoencoder (DAE) [Vincent+ 2008]
Time-series data
• Splitting the latent representation into time-varying and time-invariant variables [Yingzhen+ 2018] [Hsieh+ 2018]
• Splitting the latent representation into the pose and the content of the subject [Villegas+ 2017] [Denton+ 2017] [Fraccaro+ 2017]

38.

Other approaches
Discriminators in pixel space
• Instead of training the encoder qϕ(z|x) and decoder pθ(x|z) pair to minimize reconstruction error, match the joint distributions qϕ(z|x) p̂(x) and pθ(x|z) p(z)
• Adversarially Learned Inference (ALI) [Dumoulin+ 2017]  Figure source: [Dumoulin+ 2017]
• Bidirectional GAN (BiGAN) [Donahue+ 2017]  Figure source: [Donahue+ 2017]

39.

Rate-Distortion-Usefulness Tradeoff

40.

Rate-Distortion Tradeoff
There is a gap between methods that assume task knowledge and methods based on meta-priors
• Example: the unsupervised β-VAE [Higgins+ 2017] has only been validated on synthetic datasets and structured, low-resolution real datasets, whereas the supervised Fader Networks [Lample+ 2017] scales to high-resolution data
This is examined through the lens of the "Rate-Distortion Tradeoff" [Alemi+ 2018a]

41.

Rate-Distortion Tradeoff
Consider the following quantities:
• Entropy:
  H = −∫ p(x) log p(x) dx = 𝔼p(x)[−log p(x)]
• Distortion: the negative log-likelihood of the reconstruction
  D = −∬ p(x) qϕ(z|x) log pθ(x|z) dx dz = 𝔼p(x)[𝔼qϕ(z|x)[−log pθ(x|z)]]
• Rate: the KL between the posterior qϕ(z|x) and the prior p(z)
  R = ∬ p(x) qϕ(z|x) log( qϕ(z|x) / p(z) ) dx dz = 𝔼p(x)[DKL(qϕ(z|x) ∥ p(z))]
• The usual VAE ELBO is then ELBO = −ℒVAE = −(D + R)

42.

Rate-Distortion Tradeoff
Rate-Distortion Tradeoff [Alemi+ 2018a]
• Rate and distortion obey the tradeoff H − D ≤ R (see the paper for details)
• Points on the line D = H − R all have the same ELBO
• A model trained only to maximize the marginal likelihood does not control where on this line it lands, so it may be useless as a representation-learning model; in particular, the rate can collapse toward zero so that the latent code is ignored
• This happens when a high-capacity decoder is used
• [Alemi+ 2018a] proposes optimizing with the rate constrained to a target value σ, i.e.
  min over ϕ, θ of D + |σ − R|  (see the sketch below)
Figure source: [Alemi+ 2018a]
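A sketch of the rate-constrained objective, computed from the per-example distortion and rate terms of a Gaussian VAE such as the one in the recap sketch; the target rate value is a placeholder:

```python
# Rate-constrained objective min D + |sigma - R| from [Alemi+ 2018a]
# (illustration only).
import torch

def rate_constrained_loss(recon_nll, kl, target_rate=10.0):
    """recon_nll: per-example -log p_theta(x|z) (distortion D);
    kl: per-example KL(q_phi(z|x) || p(z)) (rate R);
    target_rate: the target rate sigma."""
    D = recon_nll.mean()
    R = kl.mean()
    return D + (target_rate - R).abs()
```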

43.

Rate-Distortion Tradeoff
However, even with the rate fixed, the learned representation may not be useful for the task
• Of the total information (entropy), we cannot tell which part was stored in the latent representation z and which part in the decoder
• Example: for image classification we want salient object features to be stored, but for recognizing object layout we want locations to be stored
• Even if task-relevant information is stored in the latent representation z, there is no guarantee it is stored in a form useful for solving the task
• Example: it is customary to solve tasks with a linear model on top of the learned representation, but there is no guarantee the representation is linearly decodable
In short, the rate-distortion tradeoff does not tell us what information is stored, in what form, or how much of it

44.

Rate-Distortion-Usefulness Tradeoff
Proposes the Rate-Distortion-Usefulness Tradeoff
• Adds "usefulness" as a third axis
• This axis cannot be evaluated without actually trying a task, so defining it for arbitrary tasks is difficult
• The regularizers and architectural devices seen in this paper can be interpreted as trying not only to move models along the R-D curve but also to push them in the direction of usefulness

45.

Rate-Distortion-Usefulness Tradeoff
Defining a usefulness axis for arbitrary tasks is difficult
• So it is formalized for a subset of conceivable tasks
• If an additional variable y is known in advance and we try to predict it,
  Dy = −∭ p(x, y) qϕ(z|x) log pθ(y|z) dx dy dz = 𝔼p(x,y)[𝔼qϕ(z|x)[−log pθ(y|z)]]
• This gives rise to a tradeoff between R and Dy
• This seems to be discussed in [Alemi+ 2018b], but it looks hard...? (I haven't managed to read it)

46.

Closing Remarks

47.

Summary
• Considered meta-priors, assumptions about what makes a representation good for tasks, and discussed research directions that promote these properties
• Focused on three approaches: ① regularizing the (approximate) posterior, ② factorizing the encoder and decoder, and ③ using flexible priors
• These approaches can be used in combination
• There is a tradeoff between the degree of supervision and the usefulness of the resulting representation
• The rate-distortion tradeoff shows that likelihood maximization alone does not guarantee a good representation
• The "usefulness" axis must be taken into account

48.

Impressions
• Rate-Distortion-Usefulness may sound obvious at first glance, but it is easy to overlook
• In discussions around world models there is a risk of concluding that everything can just be stuffed into z, e.g., GQN
• It makes me want to learn the meta-priors themselves
• If you think of them as inductive biases, this feels very much like meta-learning
• [DL輪読会] Meta-Learning Probabilistic Inference for Prediction https://www.slideshare.net/DeepLearningJP2016/dlmetalearning-probabilistic-inference-forprediction-126167192
• Having many tasks for evaluating usefulness would be nice (whether that is realistic is another matter)
• I would like to write up and publish short summaries of the models I skipped with "other methods include..." in this talk
• I want to keep implementing the models introduced here in Pixyz and adding them to Pixyzoo (shameless plug)

49.

Pixyz & Pixyzoo
Pixyz https://github.com/masa-su/pixyz
• A library for deep generative models (PyTorch-based) written by Suzuki-san of our lab
• Because networks are hidden behind probability-distribution objects, operations between probability distributions can be reasoned about separately from the networks, making the code highly readable
Pixyzoo https://github.com/masa-su/pixyzoo
• A collection of implementations in Pixyz
• Currently includes implementations of GQN, VIB, and others
• [DLHacks] Implementing the Generative Query Network with PyTorch and Pixyz https://www.slideshare.net/DeepLearningJP2016/dlhackspytorch-pixyzgenerative-querynetwork-126329901

50.

Appendix

51.

References
[Achille+ 2018] A. Achille and S. Soatto, "Information dropout: Learning optimal representations through noisy computation," IEEE Transactions on Pattern Analysis and Machine Intelligence, 2018. https://ieeexplore.ieee.org/document/8253482
[Alemi+ 2016] A. A. Alemi, I. Fischer, J. V. Dillon, and K. Murphy, "Deep variational information bottleneck," in International Conference on Learning Representations, 2016. https://openreview.net/forum?id=HyxQzBceg
[Alemi+ 2018a] A. Alemi, B. Poole, I. Fischer, J. Dillon, R. A. Saurous, and K. Murphy, "Fixing a broken ELBO," in Proc. of the International Conference on Machine Learning, 2018, pp. 159–168. http://proceedings.mlr.press/v80/alemi18a.html
[Alemi+ 2018b] A. A. Alemi and I. Fischer, "TherML: Thermodynamics of machine learning," arXiv:1807.04162, 2018. https://arxiv.org/abs/1807.04162
[Bengio+ 2013] Y. Bengio, A. Courville, and P. Vincent, "Representation learning: A review and new perspectives," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 35, no. 8, pp. 1798–1828, 2013. https://ieeexplore.ieee.org/document/6472238
[Chen+ 2017] X. Chen, D. P. Kingma, T. Salimans, Y. Duan, P. Dhariwal, J. Schulman, I. Sutskever, and P. Abbeel, "Variational lossy autoencoder," in International Conference on Learning Representations, 2017. https://openreview.net/forum?id=BysvGP5ee
[Chen+ 2018] T. Q. Chen, X. Li, R. Grosse, and D. Duvenaud, "Isolating sources of disentanglement in variational autoencoders," in Advances in Neural Information Processing Systems, 2018. http://papers.nips.cc/paper/7527-isolating-sources-of-disentanglement-in-variational-autoencoders

52.

References
[Denton+ 2017] E. L. Denton and V. Birodkar, "Unsupervised learning of disentangled representations from video," in Advances in Neural Information Processing Systems, 2017, pp. 4414–4423. https://papers.nips.cc/paper/7028-unsupervised-learning-of-disentangled-representations-from-video
[Donahue+ 2017] J. Donahue, P. Krähenbühl, and T. Darrell, "Adversarial feature learning," in International Conference on Learning Representations, 2017. https://openreview.net/forum?id=BJtNZAFgg
[Dumoulin+ 2017] V. Dumoulin, I. Belghazi, B. Poole, O. Mastropietro, A. Lamb, M. Arjovsky, and A. Courville, "Adversarially learned inference," in International Conference on Learning Representations, 2017. https://openreview.net/forum?id=B1ElR4cgg
[Dupont 2018] E. Dupont, "Learning disentangled joint continuous and discrete representations," in Advances in Neural Information Processing Systems, 2018. http://papers.nips.cc/paper/7351-learning-disentangled-joint-continuous-and-discrete-representations
[Esmaeili+ 2018] B. Esmaeili, H. Wu, S. Jain, A. Bozkurt, N. Siddharth, B. Paige, D. H. Brooks, J. Dy, and J.-W. van de Meent, "Structured disentangled representations," arXiv:1804.02086, 2018. https://arxiv.org/abs/1804.02086
[Fraccaro+ 2017] M. Fraccaro, S. Kamronn, U. Paquet, and O. Winther, "A disentangled recognition and nonlinear dynamics model for unsupervised learning," in Advances in Neural Information Processing Systems, 2017, pp. 3601–3610. https://papers.nips.cc/paper/6951-a-disentangled-recognition-and-nonlinear-dynamics-model-for-unsupervised-learning
[Gretton+ 2005] A. Gretton, O. Bousquet, A. Smola, and B. Schölkopf, "Measuring statistical dependence with Hilbert-Schmidt norms," in International Conference on Algorithmic Learning Theory. Springer, 2005, pp. 63–77. https://link.springer.com/chapter/10.1007/11564089_7
[Gulrajani+ 2017] I. Gulrajani, K. Kumar, F. Ahmed, A. A. Taiga, F. Visin, D. Vazquez, and A. Courville, "PixelVAE: A latent variable model for natural images," in International Conference on Learning Representations, 2017. https://openreview.net/forum?id=BJKYvt5lg

53.

References
[Higgins+ 2017] I. Higgins, L. Matthey, A. Pal, C. Burgess, X. Glorot, M. Botvinick, S. Mohamed, and A. Lerchner, "beta-VAE: Learning basic visual concepts with a constrained variational framework," in International Conference on Learning Representations, 2017. https://openreview.net/forum?id=Sy2fzU9gl
[Hoffman+ 2016] M. D. Hoffman and M. J. Johnson, "ELBO surgery: yet another way to carve up the variational evidence lower bound," in Workshop in Advances in Approximate Bayesian Inference, NIPS, 2016. http://approximateinference.org/accepted/HoffmanJohnson2016.pdf
[Hsieh+ 2018] J.-T. Hsieh, B. Liu, D.-A. Huang, L. Fei-Fei, and J. C. Niebles, "Learning to decompose and disentangle representations for video prediction," in Advances in Neural Information Processing Systems, 2018. http://papers.nips.cc/paper/7333-learning-to-decompose-and-disentangle-representations-for-video-prediction
[Johnson+ 2016] M. Johnson, D. K. Duvenaud, A. Wiltschko, R. P. Adams, and S. R. Datta, "Composing graphical models with neural networks for structured representations and fast inference," in Advances in Neural Information Processing Systems, 2016, pp. 2946–2954. https://papers.nips.cc/paper/6379-composing-graphical-models-with-neural-networks-for-structured-representations-and-fast-inference
[Kim+ 2018] H. Kim and A. Mnih, "Disentangling by factorising," in Proc. of the International Conference on Machine Learning, 2018, pp. 2649–2658. http://proceedings.mlr.press/v80/kim18b.html
[Kingma+ 2014a] D. P. Kingma and M. Welling, "Auto-encoding variational bayes," in International Conference on Learning Representations, 2014. https://openreview.net/forum?id=33X9fd2-9FyZd
[Kingma+ 2014b] D. P. Kingma, S. Mohamed, D. J. Rezende, and M. Welling, "Semi-supervised learning with deep generative models," in Advances in Neural Information Processing Systems, 2014, pp. 3581–3589. https://papers.nips.cc/paper/5352-semi-supervised-learning-with-deep-generative-models

54.

References
[Kulkarni+ 2015] T. D. Kulkarni, W. F. Whitney, P. Kohli, and J. Tenenbaum, "Deep convolutional inverse graphics network," in Advances in Neural Information Processing Systems, 2015, pp. 2539–2547. https://papers.nips.cc/paper/5851-deep-convolutional-inverse-graphics-network
[Kumar+ 2018] A. Kumar, P. Sattigeri, and A. Balakrishnan, "Variational inference of disentangled latent concepts from unlabeled observations," in International Conference on Learning Representations, 2018. https://openreview.net/forum?id=H1kG7GZAW
[Lample+ 2017] G. Lample, N. Zeghidour, N. Usunier, A. Bordes, L. Denoyer et al., "Fader networks: Manipulating images by sliding attributes," in Advances in Neural Information Processing Systems, 2017, pp. 5967–5976. https://papers.nips.cc/paper/7178-fader-networksmanipulating-images-by-sliding-attributes
[Locatello+ 2018] F. Locatello, S. Bauer, M. Lucic, S. Gelly, B. Schölkopf, and O. Bachem, "Challenging common assumptions in the unsupervised learning of disentangled representations," arXiv:1811.12359, 2018. https://arxiv.org/abs/1811.12359
[Lopez+ 2018] R. Lopez, J. Regier, M. I. Jordan, and N. Yosef, "Information constraints on auto-encoding variational bayes," in Advances in Neural Information Processing Systems, 2018. https://papers.nips.cc/paper/7850-information-constraints-on-autoencoding-variational-bayes
[Louizos+ 2016] C. Louizos, K. Swersky, Y. Li, M. Welling, and R. Zemel, "The variational fair autoencoder," in International Conference on Learning Representations, 2016. https://arxiv.org/abs/1511.00830
[Makhzani+ 2017] A. Makhzani and B. J. Frey, "PixelGAN autoencoders," in Advances in Neural Information Processing Systems, 2017, pp. 1975–1985. https://papers.nips.cc/paper/6793-pixelgan-autoencoders
[Sønderby+ 2016] C. K. Sønderby, T. Raiko, L. Maaløe, S. K. Sønderby, and O. Winther, "Ladder variational autoencoders," in Advances in Neural Information Processing Systems, 2016, pp. 3738–3746. https://papers.nips.cc/paper/6275-ladder-variational-autoencoders

55.

References
[van den Oord+ 2016] A. van den Oord, N. Kalchbrenner, L. Espeholt, O. Vinyals, and A. Graves, "Conditional image generation with PixelCNN decoders," in Advances in Neural Information Processing Systems, 2016, pp. 4790–4798. https://papers.nips.cc/paper/6527-conditional-image-generation-with-pixelcnn-decoders
[van den Oord+ 2017] A. van den Oord, O. Vinyals et al., "Neural discrete representation learning," in Advances in Neural Information Processing Systems, 2017, pp. 6306–6315. https://papers.nips.cc/paper/7210-neural-discrete-representation-learning
[Villegas+ 2017] R. Villegas, J. Yang, S. Hong, X. Lin, and H. Lee, "Decomposing motion and content for natural video sequence prediction," in International Conference on Learning Representations, 2017. https://openreview.net/forum?id=rkEFLFqee
[Vincent+ 2008] P. Vincent, H. Larochelle, Y. Bengio, and P.-A. Manzagol, "Extracting and composing robust features with denoising autoencoders," in Proc. of the International Conference on Machine Learning, 2008, pp. 1096–1103. https://dl.acm.org/citation.cfm?id=1390294
[Yingzhen+ 2018] L. Yingzhen and S. Mandt, "Disentangled sequential autoencoder," in Proc. of the International Conference on Machine Learning, 2018, pp. 5656–5665. http://proceedings.mlr.press/v80/yingzhen18a.html
[Zhao+ 2017a] S. Zhao, J. Song, and S. Ermon, "InfoVAE: Information maximizing variational autoencoders," arXiv:1706.02262, 2017. https://arxiv.org/abs/1706.02262
[Zhao+ 2017b] S. Zhao, J. Song, and S. Ermon, "Learning hierarchical features from deep generative models," in Proc. of the International Conference on Machine Learning, 2017, pp. 4091–4099. http://proceedings.mlr.press/v70/zhao17c.html