2019/01/18
Deep Learning JP:
http://deeplearning.jp/seminar-2/
DL輪読会 (DL reading group) material
Recent Advances in Autoencoder-Based Representation Learning
Presenter: Tatsuya Matsushima @__tmats__, Matsuo Lab
About the paper
Recent Advances in Autoencoder-Based Representation Learning
• https://arxiv.org/abs/1812.05069 (submitted 12 Dec 2018)
• Michael Tschannen, Olivier Bachem, Mario Lucic
• ETH Zurich, Google Brain
• NeurIPS 2018 Workshop (Bayesian Deep Learning) • http://bayesiandeeplearning.org/
  • Incidentally, the paper runs 19 pages before the references, but acceptance was apparently decided on the first 3 pages alone
• A survey of autoencoder-based representation learning
  • Covers recent models broadly
  • A lot of material, so it is a tough read (feels like a catalog...)
※ Figures and tables without a citation mark are taken from this paper
TL;DR
• Finding the best possible representations without supervision is a central problem in machine learning
• This paper surveys autoencoder-based methods for representation learning
• The discussion is organized around meta-priors: assumptions about what makes a representation good for downstream tasks
• It focuses on three approaches: ① regularizing the (approximate) posterior, ② factorizing the encoder and decoder, ③ using flexible priors
• It also analyzes these methods through the Rate-Distortion tradeoff
Why this paper
• I presented a survey on state representation learning (SRL) at a previous reading group:
  • [DL輪読会] State Representation Learning for Reinforcement Learning — Towards Better "World Models" https://www.slideshare.net/DeepLearningJP2016/dl-124128933
• VAE-based methods are heavily used in SRL, so I wanted to organize the methods that use VAEs for representation learning in the first place
Recap of VAEs
VAE
Variational Autoencoder (VAE) [Kingma+ 2014a]
• To learn a latent-variable model, we aim to maximize the log-likelihood of the training data:
𝔼p̂(x)[−log pθ(x)] = ℒVAE(θ, ϕ) − 𝔼p̂(x)[DKL(qϕ(z|x) ∥ pθ(z|x))]
• Since the KL term is non-negative, −ℒVAE is a lower bound on the log-likelihood 𝔼p̂(x)[log pθ(x)] (the ELBO)
• So it suffices to maximize the ELBO, i.e., to minimize the VAE loss ℒVAE:
ℒVAE(θ, ϕ) = 𝔼p̂(x)[𝔼qϕ(z|x)[−log pθ(x|z)]] + 𝔼p̂(x)[DKL(qϕ(z|x) ∥ p(z))]
※ Averaging over the empirical data distribution p̂(x) is written out explicitly, which may look slightly unfamiliar, but this is the ordinary VAE ELBO
VAE
The VAE loss:
ℒVAE(θ, ϕ) = 𝔼p̂(x)[𝔼qϕ(z|x)[−log pθ(x|z)]] + 𝔼p̂(x)[DKL(qϕ(z|x) ∥ p(z))]
• The first term is estimated with samples z(i) ∼ qϕ(z|x(i)), and gradients are backpropagated through the sampling with the reparametrization trick (see the sketch below)
• The second term is either computed in closed form or estimated from samples
  • When the approximate posterior is qϕ(z|x) = 𝒩(μϕ(x), diag(σϕ(x))) and the prior is p(z) = 𝒩(0, I), it can be computed in closed form
  • Otherwise, the distance between the distributions must be estimated from samples
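As a concrete reference (not from the paper), here is a minimal PyTorch sketch of this loss, assuming a diagonal-Gaussian encoder and a Bernoulli decoder; `encoder` and `decoder` are hypothetical modules returning the shapes described in the comments:

```python
# Minimal VAE loss sketch (assumed architecture; PyTorch).
# encoder(x) -> (mu, logvar) of q_phi(z|x); decoder(z) -> Bernoulli logits of p_theta(x|z).
import torch
import torch.nn.functional as F

def reparameterize(mu, logvar):
    # z = mu + sigma * eps, so gradients flow through mu and logvar
    eps = torch.randn_like(mu)
    return mu + torch.exp(0.5 * logvar) * eps

def vae_loss(x, encoder, decoder):
    mu, logvar = encoder(x)                      # parameters of q_phi(z|x)
    z = reparameterize(mu, logvar)               # z^(i) ~ q_phi(z|x^(i))
    x_logits = decoder(z)                        # parameters of p_theta(x|z)
    recon = F.binary_cross_entropy_with_logits(  # E_q[-log p_theta(x|z)]
        x_logits, x, reduction="sum") / x.size(0)
    # Closed-form KL(q_phi(z|x) || N(0, I)) for a diagonal Gaussian posterior
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp()) / x.size(0)
    return recon + kl                            # L_VAE = distortion + rate
```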
Density-ratio estimation via adversarial learning
f-divergences
• Assuming f is a convex function with f(1) = 0, the f-divergence between px and py is defined as
Df(px ∥ py) = ∫ f(px(x)/py(x)) py(x) dx
• For f(t) = t log t, this recovers the KL divergence: Df(px ∥ py) = DKL(px ∥ py)
• Given samples from px and py, the f-divergence can be estimated with the density-ratio trick
• This trick became widely known through GANs
Density-ratio estimation via adversarial learning
Estimating the KL divergence with the GAN density-ratio trick
• Represent px and py as distributions conditioned on a label c ∈ {0,1}:
  • i.e., px(x) = p(x|c = 1), py(x) = p(x|c = 0)
• This reduces the problem to binary classification
• A discriminator Sη(x) predicts the probability that its input was drawn from px(x)
• Assuming equal class probabilities, the density ratio becomes
px(x)/py(x) = p(x|c = 1)/p(x|c = 0) = p(c = 1|x)/p(c = 0|x) ≈ Sη(x)/(1 − Sη(x))
• Hence, given N i.i.d. samples from px (see the sketch below),
DKL(px ∥ py) ≈ (1/N) Σi=1..N log(Sη(x(i))/(1 − Sη(x(i))))
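A minimal sketch of this estimator, under assumed toy settings (2-d inputs, a small MLP discriminator); note that log(S/(1−S)) is exactly the discriminator's logit:

```python
# Density-ratio trick sketch (assumed setup). Train a discriminator to tell
# samples of p_x (label 1) from samples of p_y (label 0); its output then
# gives a pointwise log density-ratio estimate.
import torch
import torch.nn as nn

disc = nn.Sequential(nn.Linear(2, 64), nn.ReLU(), nn.Linear(64, 1))  # toy D

def train_step(opt, x_p, x_q):
    # Standard binary cross-entropy: label 1 = p_x samples, 0 = p_y samples
    logits = torch.cat([disc(x_p), disc(x_q)])
    labels = torch.cat([torch.ones(len(x_p), 1), torch.zeros(len(x_q), 1)])
    loss = nn.functional.binary_cross_entropy_with_logits(logits, labels)
    opt.zero_grad(); loss.backward(); opt.step()

def kl_estimate(x_p):
    # KL(p_x || p_y) ≈ mean of log(S/(1-S)) over samples of p_x,
    # which is exactly the discriminator's logit
    return disc(x_p).mean()
```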
Maximum Mean Discrepancy (MMD)
• Let k : 𝒳 × 𝒳 → ℝ be a continuous, bounded, positive-definite kernel, ℋ the corresponding reproducing kernel Hilbert space, and φ : 𝒳 → ℋ its feature map. The MMD between px(x) and py(x) is defined as
MMD(px, py) = ∥𝔼x∼px[φ(x)] − 𝔼y∼py[φ(y)]∥²ℋ
• Intuitively, the distance between the distributions is computed as the distance between the means of the feature embeddings (see the sketch below)
• Example: for 𝒳 = ℋ = ℝd and φ(x) = x, the MMD is the difference of the means: MMD(px, py) = ∥μpx − μpy∥²₂
• By choosing a suitable feature map φ, the divergence can be estimated in terms of higher-order moments
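For reference, a sketch of the standard (biased) empirical MMD² with an RBF kernel; the bandwidth `sigma` is an assumed hyperparameter:

```python
# Biased MMD^2 estimator sketch with an RBF kernel (assumed bandwidth).
import torch

def rbf_kernel(a, b, sigma=1.0):
    # k(a, b) = exp(-||a - b||^2 / (2 sigma^2)) for all pairs
    d2 = torch.cdist(a, b).pow(2)
    return torch.exp(-d2 / (2 * sigma**2))

def mmd2(x, y, sigma=1.0):
    # ||E[phi(x)] - E[phi(y)]||^2_H expanded via the kernel trick:
    # E[k(x,x')] - 2 E[k(x,y)] + E[k(y,y')]
    return (rbf_kernel(x, x, sigma).mean()
            - 2 * rbf_kernel(x, y, sigma).mean()
            + rbf_kernel(y, y, sigma).mean())
```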
VAEs as a way to realize meta-priors
What is a meta-prior?
Meta-prior [Bengio+ 2013]
• An assumption about properties of representations that are useful for many tasks at once
• Fundamentally, the "goodness" of a representation depends on what it is used for
• Labeling is expensive, so we want to reduce the number of labels needed
• But learning good representations without a supervision signal is hard
• It would be great if good representations could be found without labels → exploit meta-priors
Types of meta-priors [Bengio+ 2013]
Disentanglement
• The assumption that data is generated from independently varying factors
• Example: object orientation, lighting conditions
• Capturing these factors in separate representations should be useful for many downstream tasks
Hierarchy of explanatory factors
• The assumption that the world can be explained by a hierarchy of abstract concepts
• Example: objects can be described at various granularities (adding attributes makes the description more concrete)
Types of meta-priors [Bengio+ 2013]
Semi-supervised learning
• The assumption that sharing a representation between supervised and unsupervised learning creates synergy
• Labeled data is generally scarce, so unlabeled data can guide representation learning
Clustering structure
• The assumption that much data has a multi-category structure, with category-dependent variation
• Such structure should be representable by a mixture model in latent space, with each component corresponding to one category
Realizing meta-priors with autoencoders
Most algorithms for (unsupervised) representation learning are built on autoencoders
• The methods covered in this paper all try to realize some meta-prior
Covering everything today is impossible, so only a selection... (honestly, there were quite a few I did not know)
How meta-priors are built into models
① Add a regularization term on the encoder
• Often used to learn disentangled representations
② Factorize the encoder and decoder
• Often used to obtain hierarchical representations
③ Use a flexible prior
• Often used to represent clusters; example: mixture models
① Adding a regularization term to the encoder
Regularizing the VAE
To make the latent representation z ∼ qϕ(z|x) express a meta-prior, a term on the approximate posterior qϕ(z|x) or on the aggregated (approximate) posterior
qϕ(z) = 𝔼p̂(x)[qϕ(z|x)] = (1/N) Σi=1..N qϕ(z|x(i))
is added to the usual VAE objective (see the template sketch below):
ℒVAE(θ, ϕ) + λ1 𝔼p̂(x)[R1(qϕ(z|x))] + λ2 R2(qϕ(z))
• The aggregated (approximate) posterior qϕ(z) depends on the whole dataset, so in principle minibatch gradient methods cannot be used and an approximation is needed
• This gives a looser bound than the original VAE bound ℒVAE, so reconstruction quality may suffer
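A hedged template for this objective, assuming the `reparameterize` helper and Bernoulli decoder from the VAE sketch above; `R1` and `R2` are hypothetical callables (per-sample posterior regularizer and aggregate regularizer), and the minibatch of z samples stands in for qϕ(z), matching the approximation noted above:

```python
# Template sketch of L_VAE + lam1 * E[R1(q(z|x))] + lam2 * R2(q(z)).
def regularized_vae_loss(x, encoder, decoder, R1, R2, lam1=1.0, lam2=1.0):
    mu, logvar = encoder(x)
    z = reparameterize(mu, logvar)  # one z per x: minibatch approx. of q_phi(z)
    recon = F.binary_cross_entropy_with_logits(decoder(z), x,
                                               reduction="sum") / x.size(0)
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp()) / x.size(0)
    # R1 acts on the per-sample posterior parameters; R2 on aggregate samples
    return recon + kl + lam1 * R1(mu, logvar).mean() + lam2 * R2(z)
```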
Regularizing the VAE
Main methods that regularize the (approximate) posterior:
ℒVAE(θ, ϕ) + λ1 𝔼p̂(x)[R1(qϕ(z|x))] + λ2 R2(qϕ(z))
(Table from the paper: for each method, the choice of R1/R2 and whether labels are optional or required)
Regularizing the VAE
How the regularizers work
• Most methods regularize the aggregated (approximate) posterior qϕ(z)
• Divergence-based regularization and moment-based regularization
(Figure from the paper: taxonomy of regularization terms on the aggregated (approximate) posterior)
Regularization for disentanglement
The intent behind pursuing disentanglement via regularization
• Assume the data-generating process has conditionally independent factors v and conditionally dependent factors w:
x ∼ p(x|v, w), p(v|x) = Πj p(vj|x)
• The loss should then be modified so that the inference model qϕ(z|x) can predict v
Regularization for disentanglement
Evaluating disentanglement
• When possible, it is done by comparison with the true factors
• Many papers claim to obtain disentangled representations, but in fact there is no precise definition of disentanglement, and it is unclear how effective it is in the unsupervised setting
  • So this paper only introduces the concepts and does not dwell on how disentangled the representations really are
  • [Locatello+ 2018] examines this with large-scale experiments
• Representative approaches:
  • (a) Reweighting the ELBO
  • (b) Using the mutual information between x and z
  • (c) Assuming independence between latent variables
(a) Reweighting the ELBO
β-VAE [Higgins+ 2017]
• Reweights the second term of the usual VAE loss
ℒVAE(θ, ϕ) = 𝔼p̂(x)[𝔼qϕ(z|x)[−log pθ(x|z)]] + 𝔼p̂(x)[DKL(qϕ(z|x) ∥ p(z))]
giving (see the sketch below)
ℒβ-VAE(θ, ϕ) = ℒVAE(θ, ϕ) + λ1 𝔼p̂(x)[DKL(qϕ(z|x) ∥ p(z))]
• The approximate posterior qϕ(z|x) is pushed closer to the prior p(z), so the factors can be expected to decompose more easily
Figure source: [Higgins+ 2017]
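A minimal sketch: β-VAE simply upweights the KL term, with β = 1 + λ1 matching the ℒβ-VAE form above (reusing the imports and helpers from the VAE sketch; β = 4 is an assumed, typical value):

```python
# beta-VAE loss sketch (assumed setup, helpers from the VAE sketch above).
def beta_vae_loss(x, encoder, decoder, beta=4.0):
    mu, logvar = encoder(x)
    z = reparameterize(mu, logvar)
    recon = F.binary_cross_entropy_with_logits(decoder(z), x,
                                               reduction="sum") / x.size(0)
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp()) / x.size(0)
    return recon + beta * kl   # beta > 1 pushes q_phi(z|x) toward p(z)
```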
(b) Using the mutual information between x and z
Decompose the second term of the usual VAE loss:
𝔼p̂(x)[DKL(qϕ(z|x) ∥ p(z))] = Iqϕ(x; z) + DKL(qϕ(z) ∥ p(z))
• This yields a mutual-information term Iqϕ(x; z) between x and z, and a KL term between the aggregated (approximate) posterior qϕ(z) and the prior p(z) [Hoffman+ 2016]
• FactorVAE [Kim+ 2018] is presented as an example
• Other methods include β-TCVAE [Chen+ 2018], InfoVAE [Zhao+ 2017a], and DIP-VAE [Kumar+ 2018]
(b) Using the mutual information between x and z
FactorVAE [Kim+ 2018]
• The β-VAE loss ℒβ-VAE pulls DKL(qϕ(z) ∥ p(z)) down, which encourages factorization, but it simultaneously penalizes the mutual-information term Iqϕ(x; z)
• FactorVAE therefore regularizes the total correlation
TC(qϕ(z)) = DKL(qϕ(z) ∥ Πj qϕ(zj))
giving ℒFactorVAE(θ, ϕ) = ℒVAE(θ, ϕ) + λ2 TC(qϕ(z))
• To estimate it from samples, the density-ratio trick with a learned discriminator is used (see the sketch below)
• [DL輪読会] Disentangling by Factorising https://www.slideshare.net/DeepLearningJP2016/dldisentangling-by-factorising
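A sketch of the TC estimate via the density-ratio trick, under assumed settings: samples of Πj qϕ(zj) are obtained by permuting each latent dimension independently across the batch, and `tc_disc` is a hypothetical 2-class discriminator trained with cross-entropy to separate z from its permuted version:

```python
# FactorVAE-style total-correlation term sketch (assumed setup).
import torch

def permute_dims(z):
    # z: [batch, dim] -> independently shuffle each column across the batch,
    # yielding approximate samples of prod_j q(z_j)
    return torch.stack([col[torch.randperm(z.size(0))]
                        for col in z.t()], dim=1)

def tc_term(z, tc_disc):
    # tc_disc outputs 2-class logits [logit_q, logit_perm]; their difference
    # estimates log q(z) / prod_j q(z_j), so its mean approximates the TC
    logits = tc_disc(z)
    return (logits[:, 0] - logits[:, 1]).mean()
```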
(c) Assuming independence between latent variables
HSIC-VAE [Lopez+ 2018]
• Uses the Hilbert-Schmidt independence criterion (HSIC) [Gretton+ 2005] to encourage groups of latent variables zG = {zk}k∈G to be independent of each other:
ℒHSIC-VAE(θ, ϕ) = ℒVAE(θ, ϕ) + λ2 HSIC(qϕ(zG1), qϕ(zG2))
• HSIC is a kernel-based measure of independence, sketched below (explained in Appendix A of the paper)
• It can also be used to remove sensitive information represented by a label s from the latent representation, with HSIC(qϕ(z), p(s)) as the regularizer
  • p(s) is estimated from samples
• HFVAE [Esmaeili+ 2018] is another method that assumes independence between latent variables
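For reference, a sketch of the standard (biased) empirical HSIC with RBF kernels; the shared bandwidth is an assumed simplification:

```python
# Biased empirical HSIC sketch: HSIC(X, Y) ≈ trace(K H L H) / (n-1)^2,
# where K, L are Gram matrices of the two sample sets and H centers them.
import torch

def hsic(x, y, sigma=1.0):
    n = x.size(0)
    K = torch.exp(-torch.cdist(x, x).pow(2) / (2 * sigma**2))
    L = torch.exp(-torch.cdist(y, y).pow(2) / (2 * sigma**2))
    H = torch.eye(n) - torch.ones(n, n) / n   # centering matrix
    return torch.trace(K @ H @ L @ H) / (n - 1) ** 2
```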
Regularization to keep the latent representation from being ignored
PixelGAN-AE [Makhzani+ 2017]
• With a high-capacity decoder such as PixelCNN [van den Oord+ 2016], a small reconstruction error can be achieved without relying on the latent variables
  • The information content of the latent representation then shrinks, and it may not become a good representation
• The proposal is therefore to drop the mutual-information term Iqϕ(x; z) from the KL term of the VAE loss
𝔼p̂(x)[DKL(qϕ(z|x) ∥ p(z))] = Iqϕ(x; z) + DKL(qϕ(z) ∥ p(z))
giving ℒPixelGAN-AE(θ, ϕ) = ℒVAE(θ, ϕ) − Iqϕ(x; z)
• The remaining KL term DKL(qϕ(z) ∥ p(z)) is approximated with a GAN
Figure source: [Makhzani+ 2017]
Other regularizers that keep latent variables from being ignored include VIB [Alemi+ 2016] and Information dropout [Achille+ 2018]
Regularization with labels
Variational Fair Autoencoder (VFAE) [Louizos+ 2016]
• Removes sensitive information represented by a label s from the latent representation z
• To make q(z|s = k) and q(z|s = k′) independent, an MMD-based regularizer is added to the VAE loss ℒVAE (see the sketch below):
ℒVFAE(θ, ϕ) = ℒVAE + λ2 Σℓ=2..K MMD(qϕ(z|s = ℓ), qϕ(z|s = 1))
• The label-conditional posterior is qϕ(z|s = ℓ) = (1/|{i : s(i) = ℓ}|) Σi:s(i)=ℓ qϕ(z|x(i), s(i))
• HSIC-VAE [Lopez+ 2018], which uses HSIC instead of MMD, can handle non-categorical s
  • When s is binary, VFAE [Louizos+ 2016] and HSIC-VAE [Lopez+ 2018] are equivalent
Other regularization methods that use labels include Fader Networks [Lample+ 2017] and DC-IGN [Kulkarni+ 2015]
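A short sketch of the group penalty, reusing the `mmd2` estimator from the MMD sketch above; `z` holds latent samples and `s` integer group labels (assumed encoding 1..K):

```python
# VFAE group-MMD penalty sketch (assumed setup; mmd2 from the MMD sketch).
def vfae_penalty(z, s, num_groups):
    # z: [batch, dim] latent samples; s: [batch] integer labels in 1..K
    ref = z[s == 1]                     # reference group s = 1
    return sum(mmd2(z[s == l], ref) for l in range(2, num_groups + 1))
```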
② Factorizing the encoder and decoder
Factorizing the distributions
Factorize the distributions by designing the model architecture
• Example: explicitly building in a hierarchy of latent variables
(Table from the paper; legend — Labels: required; H: hierarchical, N: multivariate Gaussian, A: autoregressive, C: categorical, L: learned prior)
Semi-supervised VAE
The M2 model [Kingma+ 2014b]
• The inference model is hierarchical
• Assumes x is generated from a latent variable z and a hidden class variable y:
qϕ(z, y|x) = qϕ(z|y, x) qϕ(y|x)
• When a label y corresponding to x is available at training time, qϕ(z|y, x) is used and the loss is ℒVAE
• When no label is available, the label is inferred via qϕ(z, y|x)
• It can also be combined with the M1 model (the M1+M2 model)
• References:
  • DL Hacks輪読 Semi-supervised Learning with Deep Generative Models https://www.slideshare.net/YuusukeIwasawa/dl-hacks2015-0421iwasawa
  • Semi-Supervised Learning with Deep Generative Models implemented in Pixyz https://qiita.com/kogepan102/items/22b685ce7e9a51fbab98
VLAE
Variational Lossy Autoencoder (VLAE) [Chen+ 2017]
• When the decoder is a high-capacity model such as an autoregressive model, the information content of the latent representation can shrink, and it may not become a good representation
• To fix this, the decoder pθ(x|z) is designed so that the information we want stored in the latent representation z cannot be captured by the decoder
• Example: so that global information is stored in the latent representation z, the decoder is made an autoregressive model with a window W(j) centered on each pixel j,
pθ(x|z) = Πj pθ(xj | z, xW(j))
so that it cannot model long-range spatial dependencies
Other methods that factorize the distributions include PixelVAE [Gulrajani+ 2017], LadderVAE [Sønderby+ 2016], and VLaAE [Zhao+ 2017b]
③ Using a flexible prior
Choosing the prior
Build in the meta-prior through the choice of prior p(z)
• The most explicit way to build in a meta-prior
• Example: on MNIST, use both discrete and continuous latent variables to separate digit identity from writing style
• Example: model the prior with a graphical model (SVAE) [Johnson+ 2016]
(Table from the paper; legend — Labels: required; N: multivariate Gaussian, G: graphical model, C: categorical, M: mixture, L: learned prior)
Discrete latent variables
JointVAE [Dupont 2018]
• Introduces a continuous latent variable z and a discrete latent variable c to disentangle different kinds of factors
• The approximate posterior is factorized as qϕ(c|x) qϕ(z|x)
• Gumbel-Softmax is used for the categorical distribution qϕ(c|x) (see the sketch below)
• The KL term (the second term of the β-VAE objective ℒβ-VAE) then decomposes as
DKL(qϕ(z|x)qϕ(c|x) ∥ p(z)p(c)) = DKL(qϕ(z|x) ∥ p(z)) + DKL(qϕ(c|x) ∥ p(c))
VQ-VAE [van den Oord+ 2017] is another method that uses discrete latent variables
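A sketch of the discrete branch of such a posterior, under assumed settings (uniform prior p(c), temperature τ = 0.67); `c_logits` are the hypothetical encoder outputs for qϕ(c|x):

```python
# Discrete-branch sketch: Gumbel-Softmax sample + closed-form KL to uniform.
import torch
import torch.nn.functional as F

def discrete_branch(c_logits, tau=0.67):
    # Differentiable (relaxed) sample of the categorical q_phi(c|x)
    c = F.gumbel_softmax(c_logits, tau=tau, hard=False)
    # KL(q(c|x) || Uniform(K)) = sum_k q_k log q_k + log K
    q = F.softmax(c_logits, dim=-1)
    K = c_logits.size(-1)
    kl = (q * q.clamp_min(1e-8).log()).sum(-1).mean() \
        + torch.log(torch.tensor(float(K)))
    return c, kl
```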
Other approaches
Other approaches
Robustness to noise
• Denoising Autoencoder (DAE) [Vincent+ 2008]
Sequential data
• Split the latent representation into time-varying and time-invariant variables [Yingzhen+ 2018] [Hsieh+ 2018]
• Split the latent representation into the pose and content of the target [Villegas+ 2017] [Denton+ 2017] [Fraccaro+ 2017]
Other approaches
Discriminators in pixel space
• Instead of training an encoder qϕ(z|x) and decoder pθ(x|z) pair to minimize reconstruction error, match the joint distributions qϕ(z|x)p̂(x) and pθ(x|z)p(z)
• Adversarially Learned Inference (ALI) [Dumoulin+ 2017] (figure source: [Dumoulin+ 2017])
• Bidirectional GAN (BiGAN) [Donahue+ 2017] (figure source: [Donahue+ 2017])
Rate-Distortion-Usefulness Tradeoff
Rate-Distortion Tradeoff
There is a gap between methods that assume task knowledge and methods based on meta-priors
• Example: the unsupervised β-VAE [Higgins+ 2017] has only been validated on synthetic datasets or structured, low-resolution real datasets, while the supervised Fader Networks [Lample+ 2017] scale to high-resolution data
• This gap is examined through the "Rate-Distortion Tradeoff" [Alemi+ 2018a]
Rate-Distortion Tradeoff
Consider the following quantities (see the sketch below):
• Entropy: H = −∫ p(x) log p(x) dx = 𝔼p(x)[−log p(x)]
• Distortion: the negative log-likelihood of the reconstruction
D = −∬ p(x) qϕ(z|x) log pθ(x|z) dx dz = 𝔼p(x)[𝔼qϕ(z|x)[−log pθ(x|z)]]
• Rate: the KL between the posterior qϕ(z|x) and the prior p(z)
R = ∬ p(x) qϕ(z|x) log (qϕ(z|x)/p(z)) dx dz = 𝔼p(x)[DKL(qϕ(z|x) ∥ p(z))]
• The usual VAE ELBO is then ELBO = −ℒVAE = −(D + R)
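As a concrete reference, Monte Carlo estimates of Rate and Distortion for a batch, assuming the Gaussian-encoder/Bernoulli-decoder setup of the earlier VAE sketch:

```python
# Per-batch Rate / Distortion estimate sketch (assumed architecture).
import torch
import torch.nn.functional as F

def rate_distortion(x, encoder, decoder):
    mu, logvar = encoder(x)
    z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)
    # D: E_q[-log p_theta(x|z)], per data point
    D = F.binary_cross_entropy_with_logits(decoder(z), x,
                                           reduction="sum") / x.size(0)
    # R: closed-form KL(q_phi(z|x) || N(0, I)), per data point
    R = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp()) / x.size(0)
    return R, D          # note: ELBO = -(D + R)
```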
Rate-Distortion Tradeoff
Rate-Distortion Tradeoff [Alemi+ 2018a]
• The following tradeoff holds between Rate and Distortion (see the paper for details):
H − D ≤ R
• Every point on the line D = H − R attains the same ELBO
• A model trained only to maximize the marginal likelihood therefore does not control where it lands on this line, so it may be useless as a representation-learning model
  • This happens in particular with high-capacity decoders (which can drive the Rate toward zero)
• [Alemi+ 2018a] proposes to optimize with the Rate constrained to a target value σ (see the sketch below), i.e.
min(ϕ,θ) D + |σ − R|
Figure source: [Alemi+ 2018a]
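A minimal sketch of this target-rate objective, reusing `rate_distortion` from the sketch above; the target σ = 10 nats is an assumed value:

```python
# Target-rate objective sketch: min D + |sigma - R| (assumed setup).
def target_rate_loss(x, encoder, decoder, sigma=10.0):
    R, D = rate_distortion(x, encoder, decoder)
    return D + (sigma - R).abs()   # pins the Rate near the target sigma
```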
Rate-Distortion Tradeoff
However, even with the Rate fixed, the learned representation may not be useful for the task
• Of the total information (entropy), we do not know which part was stored in the latent representation z and which part in the decoder
  • Example: for an image-classification task we want salient object features stored, but for an object-localization task we want positions stored
• Even if task-relevant information is stored in the latent representation z, there is no guarantee it is stored in a form that is useful for solving the task
  • Example: it is customary to solve tasks with a linear model on top of the learned representation, but there is no guarantee the representation is solvable by a linear model
In short, the Rate-Distortion Tradeoff does not tell us what information is stored, in what form, or how much
Rate-Distortion-Usefulness Tradeoff
The paper proposes a Rate-Distortion-Usefulness Tradeoff
• "Usefulness" is added as a third axis
• This axis cannot be evaluated without actually trying tasks, so defining it for arbitrary tasks is difficult
• The regularizers and architectures covered in this paper can be interpreted as not just moving along the R-D curve but steering toward usefulness
Rate-Distortion-Usefulness Tradeoff
Usefulness is hard to define as an axis for arbitrary tasks
• So it is formalized for a subset of conceivable tasks
• If an additional variable y is known in advance and we try to predict it:
Dy = −∬ p(x, y) qϕ(z|x) log pθ(y|z) dx dy dz = 𝔼p(x,y)[𝔼qϕ(z|x)[−log pθ(y|z)]]
• This creates an R − Dy tradeoff
• This seems to be discussed in [Alemi+ 2018b], apparently...? (I have not read it)
Closing
Summary
• The paper considers meta-priors, assumptions about what makes a representation good for tasks, and discusses research directions that promote these properties
• In particular, it examines three approaches: ① regularizing the (approximate) posterior, ② factorizing the encoder and decoder, ③ using flexible priors
  • These approaches can also be combined
• There is a tradeoff between the degree of supervision and the usefulness of the resulting representation
• The Rate-Distortion tradeoff shows that likelihood maximization alone gives no guarantee of a good representation
  • The "usefulness" axis must be taken into account
Impressions
• Rate-Distortion-Usefulness may sound obvious at first, but it is easy to overlook
  • In world-model discussions there is a risk of concluding that everything can simply be stuffed into z, e.g., GQN
• It makes me want to learn the meta-prior itself
  • Viewed as an inductive bias, it feels exactly like meta-learning
  • [DL輪読会] Meta-Learning Probabilistic Inference for Prediction https://www.slideshare.net/DeepLearningJP2016/dlmetalearning-probabilistic-inference-forprediction-126167192
• It would be good to have many tasks for evaluating usefulness (whether that is realistic is another matter)
• I would like to write up and publish a short summary of the models I skipped with "other methods include..."
• I would like to implement the models introduced here in Pixyz and add them to Pixyzoo (shameless plug)
Pixyz & Pixyzoo
Pixyz https://github.com/masa-su/pixyz
• A library for deep generative models (PyTorch-based) written by Suzuki-san of our lab
• Networks are hidden behind probability distributions, so operations between distributions can be treated as networks, which makes the code highly readable
Pixyzoo https://github.com/masa-su/pixyzoo
• A collection of implementations in Pixyz
• Currently includes GQN, VIB, and others
• [DLHacks] A PyTorch/Pixyz implementation of the Generative Query Network https://www.slideshare.net/DeepLearningJP2016/dlhackspytorch-pixyzgenerative-querynetwork-126329901
Appendix
References
[Achille+ 2018] A. Achille and S. Soatto, "Information dropout: Learning optimal representations through noisy computation," IEEE Transactions on Pattern Analysis and Machine Intelligence, 2018. https://ieeexplore.ieee.org/document/8253482
[Alemi+ 2016] A. A. Alemi, I. Fischer, J. V. Dillon, and K. Murphy, "Deep variational information bottleneck," in International Conference on Learning Representations, 2016. https://openreview.net/forum?id=HyxQzBceg
[Alemi+ 2018a] A. Alemi, B. Poole, I. Fischer, J. Dillon, R. A. Saurous, and K. Murphy, "Fixing a broken ELBO," in Proc. of the International Conference on Machine Learning, 2018, pp. 159–168. http://proceedings.mlr.press/v80/alemi18a.html
[Alemi+ 2018b] A. A. Alemi and I. Fischer, "TherML: Thermodynamics of machine learning," arXiv:1807.04162, 2018. https://arxiv.org/abs/1807.04162
[Bengio+ 2013] Y. Bengio, A. Courville, and P. Vincent, "Representation learning: A review and new perspectives," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 35, no. 8, pp. 1798–1828, 2013. https://ieeexplore.ieee.org/document/6472238
[Chen+ 2017] X. Chen, D. P. Kingma, T. Salimans, Y. Duan, P. Dhariwal, J. Schulman, I. Sutskever, and P. Abbeel, "Variational lossy autoencoder," in International Conference on Learning Representations, 2017. https://openreview.net/forum?id=BysvGP5ee
[Chen+ 2018] T. Q. Chen, X. Li, R. Grosse, and D. Duvenaud, "Isolating sources of disentanglement in variational autoencoders," in Advances in Neural Information Processing Systems, 2018. http://papers.nips.cc/paper/7527-isolating-sources-of-disentanglement-in-variational-autoencoders
References
[Denton+ 2017] E. L. Denton and V. Birodkar, "Unsupervised learning of disentangled representations from video," in Advances in Neural Information Processing Systems, 2017, pp. 4414–4423. https://papers.nips.cc/paper/7028-unsupervised-learning-of-disentangled-representations-from-video
[Donahue+ 2017] J. Donahue, P. Krähenbühl, and T. Darrell, "Adversarial feature learning," in International Conference on Learning Representations, 2017. https://openreview.net/forum?id=BJtNZAFgg
[Dumoulin+ 2017] V. Dumoulin, I. Belghazi, B. Poole, O. Mastropietro, A. Lamb, M. Arjovsky, and A. Courville, "Adversarially learned inference," in International Conference on Learning Representations, 2017. https://openreview.net/forum?id=B1ElR4cgg
[Dupont 2018] E. Dupont, "Learning disentangled joint continuous and discrete representations," in Advances in Neural Information Processing Systems, 2018. http://papers.nips.cc/paper/7351-learning-disentangled-joint-continuous-and-discrete-representations
[Esmaeili+ 2018] B. Esmaeili, H. Wu, S. Jain, A. Bozkurt, N. Siddharth, B. Paige, D. H. Brooks, J. Dy, and J.-W. van de Meent, "Structured disentangled representations," arXiv:1804.02086, 2018. https://arxiv.org/abs/1804.02086
[Fraccaro+ 2017] M. Fraccaro, S. Kamronn, U. Paquet, and O. Winther, "A disentangled recognition and nonlinear dynamics model for unsupervised learning," in Advances in Neural Information Processing Systems, 2017, pp. 3601–3610. https://papers.nips.cc/paper/6951-a-disentangled-recognition-and-nonlinear-dynamics-model-for-unsupervised-learning
[Gretton+ 2005] A. Gretton, O. Bousquet, A. Smola, and B. Schölkopf, "Measuring statistical dependence with Hilbert-Schmidt norms," in International Conference on Algorithmic Learning Theory. Springer, 2005, pp. 63–77. https://link.springer.com/chapter/10.1007/11564089_7
[Gulrajani+ 2017] I. Gulrajani, K. Kumar, F. Ahmed, A. A. Taiga, F. Visin, D. Vazquez, and A. Courville, "PixelVAE: A latent variable model for natural images," in International Conference on Learning Representations, 2017. https://openreview.net/forum?id=BJKYvt5lg
References
[Higgins+ 2017] I. Higgins, L. Matthey, A. Pal, C. Burgess, X. Glorot, M. Botvinick, S. Mohamed, and A. Lerchner, "beta-VAE: Learning basic visual concepts with a constrained variational framework," in International Conference on Learning Representations, 2017. https://openreview.net/forum?id=Sy2fzU9gl
[Hoffman+ 2016] M. D. Hoffman and M. J. Johnson, "ELBO surgery: yet another way to carve up the variational evidence lower bound," in Workshop in Advances in Approximate Bayesian Inference, NIPS, 2016. http://approximateinference.org/accepted/HoffmanJohnson2016.pdf
[Hsieh+ 2018] J.-T. Hsieh, B. Liu, D.-A. Huang, L. Fei-Fei, and J. C. Niebles, "Learning to decompose and disentangle representations for video prediction," in Advances in Neural Information Processing Systems, 2018. http://papers.nips.cc/paper/7333-learning-to-decompose-and-disentangle-representations-for-video-prediction
[Johnson+ 2016] M. Johnson, D. K. Duvenaud, A. Wiltschko, R. P. Adams, and S. R. Datta, "Composing graphical models with neural networks for structured representations and fast inference," in Advances in Neural Information Processing Systems, 2016, pp. 2946–2954. https://papers.nips.cc/paper/6379-composing-graphical-models-with-neural-networks-for-structured-representations-and-fast-inference
[Kim+ 2018] H. Kim and A. Mnih, "Disentangling by factorising," in Proc. of the International Conference on Machine Learning, 2018, pp. 2649–2658. http://proceedings.mlr.press/v80/kim18b.html
[Kingma+ 2014a] D. P. Kingma and M. Welling, "Auto-encoding variational bayes," in International Conference on Learning Representations, 2014. https://openreview.net/forum?id=33X9fd2-9FyZd
[Kingma+ 2014b] D. P. Kingma, S. Mohamed, D. J. Rezende, and M. Welling, "Semi-supervised learning with deep generative models," in Advances in Neural Information Processing Systems, 2014, pp. 3581–3589. https://papers.nips.cc/paper/5352-semi-supervised-learning-with-deep-generative-models
References
[Kulkarni+ 2015] T. D. Kulkarni, W. F. Whitney, P. Kohli, and J. Tenenbaum, "Deep convolutional inverse graphics network," in Advances in Neural Information Processing Systems, 2015, pp. 2539–2547. https://papers.nips.cc/paper/5851-deep-convolutional-inverse-graphics-network
[Kumar+ 2018] A. Kumar, P. Sattigeri, and A. Balakrishnan, "Variational inference of disentangled latent concepts from unlabeled observations," in International Conference on Learning Representations, 2018. https://openreview.net/forum?id=H1kG7GZAW
[Lample+ 2017] G. Lample, N. Zeghidour, N. Usunier, A. Bordes, L. Denoyer et al., "Fader networks: Manipulating images by sliding attributes," in Advances in Neural Information Processing Systems, 2017, pp. 5967–5976. https://papers.nips.cc/paper/7178-fader-networksmanipulating-images-by-sliding-attributes
[Locatello+ 2018] F. Locatello, S. Bauer, M. Lucic, S. Gelly, B. Schölkopf, and O. Bachem, "Challenging common assumptions in the unsupervised learning of disentangled representations," arXiv:1811.12359, 2018. https://arxiv.org/abs/1811.12359
[Lopez+ 2018] R. Lopez, J. Regier, M. I. Jordan, and N. Yosef, "Information constraints on auto-encoding variational bayes," in Advances in Neural Information Processing Systems, 2018. https://papers.nips.cc/paper/7850-information-constraints-on-auto-encoding-variational-bayes
[Louizos+ 2016] C. Louizos, K. Swersky, Y. Li, M. Welling, and R. Zemel, "The variational fair autoencoder," in International Conference on Learning Representations, 2016. https://arxiv.org/abs/1511.00830
[Makhzani+ 2017] A. Makhzani and B. J. Frey, "PixelGAN autoencoders," in Advances in Neural Information Processing Systems, 2017, pp. 1975–1985. https://papers.nips.cc/paper/6793-pixelgan-autoencoders
[Sønderby+ 2016] C. K. Sønderby, T. Raiko, L. Maaløe, S. K. Sønderby, and O. Winther, "Ladder variational autoencoders," in Advances in Neural Information Processing Systems, 2016, pp. 3738–3746. https://papers.nips.cc/paper/6275-ladder-variational-autoencoders
References
[van den Oord+ 2016] A. van den Oord, N. Kalchbrenner, L. Espeholt, O. Vinyals, and A. Graves, "Conditional image generation with PixelCNN decoders," in Advances in Neural Information Processing Systems, 2016, pp. 4790–4798. https://papers.nips.cc/paper/6527-conditional-image-generation-with-pixelcnn-decoders
[van den Oord+ 2017] A. van den Oord, O. Vinyals et al., "Neural discrete representation learning," in Advances in Neural Information Processing Systems, 2017, pp. 6306–6315. https://papers.nips.cc/paper/7210-neural-discrete-representation-learning
[Villegas+ 2017] R. Villegas, J. Yang, S. Hong, X. Lin, and H. Lee, "Decomposing motion and content for natural video sequence prediction," in International Conference on Learning Representations, 2017. https://openreview.net/forum?id=rkEFLFqee
[Vincent+ 2008] P. Vincent, H. Larochelle, Y. Bengio, and P.-A. Manzagol, "Extracting and composing robust features with denoising autoencoders," in Proc. of the International Conference on Machine Learning, 2008, pp. 1096–1103. https://dl.acm.org/citation.cfm?id=1390294
[Yingzhen+ 2018] L. Yingzhen and S. Mandt, "Disentangled sequential autoencoder," in Proc. of the International Conference on Machine Learning, 2018, pp. 5656–5665. http://proceedings.mlr.press/v80/yingzhen18a.html
[Zhao+ 2017a] S. Zhao, J. Song, and S. Ermon, "InfoVAE: Information maximizing variational autoencoders," arXiv:1706.02262, 2017. https://arxiv.org/abs/1706.02262
[Zhao+ 2017b] S. Zhao, J. Song, and S. Ermon, "Learning hierarchical features from deep generative models," in Proc. of the International Conference on Machine Learning, 2017, pp. 4091–4099. http://proceedings.mlr.press/v70/zhao17c.html