[DL Hacks]Self-Attention Generative Adversarial Networks

July 30, 2018

#deep learning #GAN #SAGAN #Self-Attention #Spectral Normalization #Image Generation

Slide overview

2018/07/30
Deep Learning JP:
http://deeplearning.jp/hacks/

Text of each slide

2018.07.30 m.yokoo Self-Attention Generative Adversarial Networks

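Reason for selection
• Why I chose this paper and implementation: it starts from GAN, builds on DCGAN and Wasserstein GAN, implements spectral normalization and TTUR, and completes the model by adding self-attention, so I expected it to give hands-on experience with a wide range of techniques.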

GPU usage
• The implementation required a GPU, so I used the AWS Deep Learning AMI.
• EC2 instance region: US East (N. Virginia); primary instance type: p2.xlarge
• One NVIDIA K80 GPU was used.

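• Training on the GPU took 5 days, 3 hours, and 57 minutes.
• Downloading the 10,000 generated images from AWS took a further 5 hours and 20 minutes. (One image was saved every 100 steps over 1,000,000 iterations, which gives 10,000 images.)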

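• Problems with GANs
• Training is difficult
• Vanishing gradients occur
• The quality of the generated results is hard to judge from the loss function
• Mode collapse occurs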

• Achieved SOTA in class-conditional image generation with GANs

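• Spectral normalization, which SNGAN-projection (Miyato et al., 2018) applied to the discriminator, is also applied to the generator.
• SNGAN-projection (Miyato et al., 2018) controlled the Lipschitz constant of the discriminator to improve stability.
• The paper shows that the generator also benefits from spectral normalization.
• This makes it possible to reduce the number of discriminator updates per generator update, which lowers the computational cost. The paper also shows that training becomes more stable.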

• The two-timescale update rule (TTUR) (Heusel et al., 2017) is applied.
  - It can be shown that training converges to a local Nash equilibrium when the generator's learning rate is smaller than the discriminator's.
  - Learning rates: generator 0.0001, discriminator 0.0004
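As a rough illustration of what "two learning rates" means in practice, the sketch below runs simultaneous gradient steps on a toy scalar game. This is not the training code from the slides: the objective f(g, d) = g*d - 0.5*d**2, the starting point, the step count, and the function name `ttur_toy` are all assumptions chosen to keep the example tiny. Only the two learning rates are taken from the slide.

```python
def ttur_toy(lr_g=0.0001, lr_d=0.0004, steps=100_000, g=1.0, d=1.0):
    """Simultaneous gradient steps on the toy game f(g, d) = g*d - 0.5*d**2.

    The "generator" parameter g takes descent steps on f with learning rate lr_g,
    the "discriminator" parameter d takes ascent steps on f with learning rate lr_d.
    """
    for _ in range(steps):
        grad_g = d        # df/dg
        grad_d = g - d    # df/dd
        # Both players update from the same (old) values.
        g, d = g - lr_g * grad_g, d + lr_d * grad_d
    return g, d


g_final, d_final = ttur_toy()
print(g_final, d_final)
```

With these settings both parameters drift toward the stationary point (0, 0) of the toy game. In a real GAN the same idea usually appears as two separate optimizers, one per network, each with its own learning rate.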

10.

Self-Attention Generative Adversarial Networks
• Paper: arXiv preprint by Zhang et al.
• Han Zhang, Ian Goodfellow, Dimitris Metaxas, Augustus Odena
• (Submitted on 21 May 2018)
• Affiliations: Han Zhang (Rutgers University), Ian Goodfellow (Google Brain), Dimitris Metaxas (Rutgers University), Augustus Odena (Google Brain)

11.

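• Existing GANs are CNN-based and therefore depend on local features; they cannot refer to information at distant locations. This method introduces an attention mechanism so that distant local features can be referenced with learned weights. A coefficient controls how much the local features and the attention information each contribute.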
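The mechanism described above can be sketched in NumPy as follows. This is a simplified reading of the SAGAN attention block, not the code used in the slides: the matrices `w_f`, `w_g`, `w_h` stand in for 1x1 convolutions, the feature map is assumed to be already flattened to shape (C, N), and all names and sizes are illustrative. The scalar `gamma` is the coefficient that balances the attention output against the original local features.

```python
import numpy as np


def self_attention(x, w_f, w_g, w_h, gamma):
    """Self-attention over N spatial locations.

    x:        (C, N) feature map, flattened over space
    w_f, w_g: (C_bar, C) projections used to compute attention scores
    w_h:      (C, C) projection applied to the attended features
    gamma:    scalar weight of the attention branch
    Returns (y, beta), where beta[i, j] is how much location j attends to i.
    """
    f = w_f @ x                                # (C_bar, N)
    g = w_g @ x                                # (C_bar, N)
    h = w_h @ x                                # (C, N)
    s = f.T @ g                                # s[i, j] = f(x_i) . g(x_j)
    s = s - s.max(axis=0, keepdims=True)       # stabilize the softmax
    beta = np.exp(s)
    beta /= beta.sum(axis=0, keepdims=True)    # softmax over i for each j
    o = h @ beta                               # o[:, j] = sum_i beta[i, j] * h(x_i)
    return gamma * o + x, beta


rng = np.random.default_rng(0)
C, C_bar, N = 8, 2, 16
x = rng.normal(size=(C, N))
w_f = rng.normal(size=(C_bar, C))
w_g = rng.normal(size=(C_bar, C))
w_h = rng.normal(size=(C, C))

y0, beta = self_attention(x, w_f, w_g, w_h, gamma=0.0)
y1, _ = self_attention(x, w_f, w_g, w_h, gamma=0.5)
```

With `gamma=0.0` the block returns the input unchanged, so a network that starts from zero relies only on local features at first and can then learn how much non-local information to mix in.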

12.

• In this paper, we proposed Self-Attention Generative Adversarial Networks (SAGANs), which incorporate a self-attention mechanism into the GAN framework.
• The self-attention module is effective in modeling long-range dependencies.
• In addition, we show that spectral normalization applied to the generator stabilizes GAN training and that TTUR speeds up training of regularized discriminators.
• SAGAN achieves the state-of-the-art performance on class-conditional image generation on ImageNet.

13.

14.

15.

16.

17.

18.

19.

20.

21.

The following loss function is used.
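The formula on this slide was an image and is not preserved in the text. As a hedged reference point, the SAGAN paper trains with the hinge version of the adversarial loss, and the sketch below assumes that form. It takes raw discriminator scores as plain arrays. The function names `d_hinge_loss` and `g_hinge_loss` are illustrative, and this may differ from the loss actually used in the run described in these slides.

```python
import numpy as np


def d_hinge_loss(d_real, d_fake):
    """Discriminator hinge loss from raw (unbounded) scores.

    L_D = E[max(0, 1 - D(x))] + E[max(0, 1 + D(G(z)))]
    """
    d_real = np.asarray(d_real, dtype=float)
    d_fake = np.asarray(d_fake, dtype=float)
    loss_real = np.maximum(0.0, 1.0 - d_real).mean()
    loss_fake = np.maximum(0.0, 1.0 + d_fake).mean()
    return float(loss_real + loss_fake)


def g_hinge_loss(d_fake):
    """Generator loss paired with the hinge discriminator loss: L_G = -E[D(G(z))]."""
    return float(-np.asarray(d_fake, dtype=float).mean())


print(d_hinge_loss([2.0, 3.0], [-2.0, -5.0]))  # all scores beyond the margin
print(d_hinge_loss([0.0], [0.0]))              # undecided discriminator
print(g_hinge_loss([1.0, 3.0]))
```

The discriminator stops receiving gradient from samples it already scores beyond the margin of 1, while the generator simply tries to raise the score of its samples.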

22.

23.

24.

25.

26.

27.

Evaluation metrics

28.

29.

30.

Implementation notes
• No official code was released with the paper,
• but a PyTorch implementation appeared on GitHub, so I based my implementation on it.

31.

• Heykeetae: this is the SAGAN implementation I tried this time.
• After that, christiancosgrove's SAGAN implementation also appeared.

32.

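Implementation notes
• The paper uses ImageNet, a dataset of more than 14 million tagged images,
• whereas this implementation uses the CelebFaces Attributes Dataset (about 200K images, each with 40 attribute annotations).
• In the paper, the SN on G/D+TTUR setting updates the weights for 1M iterations and reaches an FID of 22.96.
• Following the paper, this implementation also updates the weights for 1M iterations with SN on G/D+TTUR. wgan-gp is used as the adversarial loss.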

33.

• Meta overview
• This repository provides a PyTorch implementation of SAGAN. Both wgan-gp and wgan-hinge loss are ready, but note that wgan-gp is somehow not compatible with the spectral normalization. Remove all the spectral normalization at the model for the adoption of wgan-gp.
• Self-attentions are applied to later two layers of both discriminator and generator

https://arxiv.org/abs/1805.08318

34.

Hyperparameters used this time (parameter.py)

35.

Training settings (parameter.py)

36.

37.

38.

39.

40.

Model (sagan_models.py)

41.

42.

Generator

43.

44.

45.

Discriminator

46.

47.

Training (trainer.py)

48.

49.

50.

51.

52.

53.

54.

55.

56.

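Implementation results: SN on G/D+TTUR, 100 iterations

57.

Implementation results: SN on G/D+TTUR, 100 and 200 iterations

58.

Implementation results: SN on G/D+TTUR, 300 and 400 iterations

59.

Implementation results: SN on G/D+TTUR, 500 and 600 iterations

60.

Implementation results: SN on G/D+TTUR, 700 and 800 iterations

61.

Implementation results: SN on G/D+TTUR, 100,000 iterations

62.

Implementation results: SN on G/D+TTUR, 500,000 iterations

63.

Implementation results: SN on G/D+TTUR, 800,000 iterations

64.

Implementation results: SN on G/D+TTUR, 1,000,000 iterations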

65.

TTUR
GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium
https://arxiv.org/abs/1706.08500
This is the TTUR paper. It introduces two things: TTUR training, and FID as an evaluation method for GANs. It also reports that TTUR training improved DCGANs and Improved Wasserstein GANs (WGAN-GP).

66.

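GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium
• Martin Heusel, Hubert Ramsauer, Thomas Unterthiner, Bernhard Nessler, Sepp Hochreiter
• (Submitted on 26 Jun 2017 (v1), last revised 12 Jan 2018 (this version, v6))
• In GAN training, if the generator's learning rate is smaller than the discriminator's, convergence to a Nash equilibrium can be shown using ODE theory. The paper also proposes FID, a measure of generated image quality that is better than the Inception Score.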

67.

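• Generative Adversarial Networks (GANs) excel at creating realistic images with complex models for which maximum likelihood is infeasible. However, the convergence of GAN training has still not been proved.
• We propose a two time-scale update rule (TTUR) for training GANs with stochastic gradient descent on arbitrary GAN loss functions.
• TTUR has an individual learning rate for both the discriminator and the generator.
• Using the theory of stochastic approximation, we prove that the TTUR converges under mild assumptions to a stationary local Nash equilibrium. The convergence carries over to the popular Adam optimization, for which we prove that it follows the dynamics of a heavy ball with friction.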

68.

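• To evaluate the performance of GANs at image generation, we introduce the "Fréchet Inception Distance" (FID), which captures the similarity of generated images to real ones better than the Inception Score.
• In experiments, TTUR improves learning for DCGANs and Wasserstein GANs (WGAN-GP), outperforming conventional GAN training on CelebA, CIFAR-10, SVHN, LSUN Bedrooms, and the One Billion Word Benchmark.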

69.

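How FID is computed
• http://bluewidz.blogspot.com/2017/12/frechet-inceptiondistance.html
• FID cannot be computed that way. Instead, one computes the distance between a set of images drawn from the true distribution that the GAN is meant to reproduce and a set of images generated from the distribution the GAN has learned. The smaller the distance, the better the images are judged to be. FID was also used as an evaluation metric in the large-scale evaluation of GANs carried out by Google Brain.
• https://github.com/bioinf-jku/TTUR/blob/master/fid.py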
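To make the "distance between two sets of images" concrete: FID fits a Gaussian to the feature vectors of each set and evaluates d^2 = ||mu_1 - mu_2||^2 + Tr(Sigma_1 + Sigma_2 - 2 (Sigma_1 Sigma_2)^(1/2)). The NumPy sketch below assumes the feature vectors have already been extracted (in practice by an Inception network; here random vectors are used as stand-ins). The function names are illustrative and this is not the linked fid.py. The trace of the matrix square root is obtained from the eigenvalues of Sigma_1 Sigma_2, which avoids a SciPy dependency.

```python
import numpy as np


def frechet_distance(mu1, sigma1, mu2, sigma2):
    """Squared Frechet distance between N(mu1, sigma1) and N(mu2, sigma2)."""
    mu1, mu2 = np.asarray(mu1, dtype=float), np.asarray(mu2, dtype=float)
    sigma1 = np.asarray(sigma1, dtype=float)
    sigma2 = np.asarray(sigma2, dtype=float)
    diff = mu1 - mu2
    # Tr((sigma1 sigma2)^(1/2)) is the sum of the square roots of the
    # eigenvalues of sigma1 @ sigma2, which are real and non-negative for
    # positive semi-definite covariances (up to numerical noise).
    eigvals = np.linalg.eigvals(sigma1 @ sigma2)
    tr_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    return float(diff @ diff + np.trace(sigma1) + np.trace(sigma2) - 2.0 * tr_sqrt)


def fid_from_features(feats_real, feats_fake):
    """FID from two (num_images, feature_dim) arrays of feature vectors."""
    mu_r, mu_f = feats_real.mean(axis=0), feats_fake.mean(axis=0)
    sigma_r = np.cov(feats_real, rowvar=False)
    sigma_f = np.cov(feats_fake, rowvar=False)
    return frechet_distance(mu_r, sigma_r, mu_f, sigma_f)


rng = np.random.default_rng(0)
feats = rng.normal(size=(200, 4))           # stand-in for real-image features
same = fid_from_features(feats, feats)      # identical sets
shifted = fid_from_features(feats, feats + 1.0)  # every feature shifted by 1
print(same, shifted)
```

Identical sets give a distance of (numerically) zero, and shifting every feature by a constant changes only the mean term, which is why FID needs reference statistics from real images and cannot be computed from generated images alone.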

70.

Inception score
• This score is sometimes used to evaluate images generated by a GAN (Generative Adversarial Network).
• It is designed to be higher the more easily the images are classified by the Inception model and the more varied the predicted labels are.
• http://bluewidz.blogspot.com/2017/12/inception-score.html
• https://github.com/hvy/chainer-inception-score
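The two design goals above (confident per-image predictions, varied labels overall) correspond to the usual definition IS = exp(E_x[KL(p(y|x) || p(y))]). The sketch below assumes the class probabilities p(y|x) have already been produced by a classifier and are passed in as an array. The function name and the `eps` smoothing constant are illustrative choices, not taken from the linked repositories.

```python
import numpy as np


def inception_score(probs, eps=1e-12):
    """Inception score from a (num_images, num_classes) array of p(y|x).

    Each row is assumed to sum to 1.
    """
    probs = np.asarray(probs, dtype=float)
    p_y = probs.mean(axis=0, keepdims=True)   # marginal label distribution p(y)
    # Per-image KL divergence KL(p(y|x) || p(y)); eps guards against log(0).
    kl = (probs * (np.log(probs + eps) - np.log(p_y + eps))).sum(axis=1)
    return float(np.exp(kl.mean()))


uniform = np.full((4, 4), 0.25)   # classifier is unsure about every image
one_hot = np.eye(4)               # confident, and every class appears once
print(inception_score(uniform), inception_score(one_hot))
```

The uniform case gives the minimum score of 1, and the confident, evenly spread case gives a score equal to the number of classes, which is the maximum for this setup.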

71.

How FID is computed

72.

Computing the Inception score

73.

74.

Adam Follows an HBF ODE and Ensures TTUR Convergence
• In our experiments, we aim at using Adam stochastic approximation to avoid mode collapsing. GANs suffer from “mode collapsing” where large masses of probability are mapped onto a few modes that cover only small regions. While these regions represent meaningful samples, the variety of the real world data is lost and only few prototype samples are generated. Different methods have been proposed to avoid mode collapsing [11, 43]. We obviate mode collapsing by using Adam stochastic approximation [29]. Adam can be described as Heavy Ball with Friction (HBF) (see below), since it averages over past gradients.

75.

Spectral Normalization
• https://arxiv.org/abs/1802.05957 (the Spectral Normalization paper)
• The paper proposes a new weight normalization technique called spectral normalization and reports that it improves the stability of discriminator training. It discusses only the constraint on the discriminator during training.

76.

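• In a nutshell
• The paper treats the Lipschitz constraint on the discriminator as a key requirement for GANs and applies spectral normalization to every layer of the discriminator, which lets the generator produce higher-quality outputs.
• Applying spectral normalization to each discriminator layer stabilizes training and gives good results without the various regularization techniques used previously (batch normalization, weight decay, feature matching, gradient penalty).
• Introducing spectral normalization into the DCGAN discriminator gives an Inception score of 7.41 on CIFAR-10, better than existing GANs such as WGAN-GP, DFM, and Cramer GAN.
• That said, this paper rests on quite a few premises, and covering them as far as possible helps the reader. Specifically, it assumes two things: first, that the behavior of the discriminator is the key to GAN performance; second, that the Lipschitz constant (roughly, an upper bound on how many times larger a distance can become in the mapped space compared with the original space) is useful as a regularizer for controlling that behavior. On that basis, the paper appears to constrain the spectral norm in order to control the Lipschitz constant.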
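For a linear layer, the Lipschitz constant with respect to the Euclidean norm is the largest singular value of its weight matrix, so dividing the weights by that value bounds the layer's constant at 1. The sketch below estimates the largest singular value by power iteration in NumPy. It is a minimal illustration under stated assumptions, not the code from the paper or from the repositories mentioned in these slides; the function name, the example matrix, and the iteration count are all made up for the example.

```python
import numpy as np


def spectral_normalize(w, n_iters=1, u=None, seed=0):
    """Return (w / sigma, sigma, u), where sigma estimates the spectral norm of w.

    w: 2-D weight matrix. A convolution kernel would first be reshaped to
       (out_channels, -1).
    u: optional left-singular-vector estimate carried over from a previous call.
    """
    w = np.asarray(w, dtype=float)
    if u is None:
        u = np.random.default_rng(seed).normal(size=w.shape[0])
    u = u / np.linalg.norm(u)
    for _ in range(n_iters):
        v = w.T @ u
        v = v / np.linalg.norm(v)
        u = w @ v
        u = u / np.linalg.norm(u)
    sigma = float(u @ w @ v)   # Rayleigh-quotient estimate of the top singular value
    return w / sigma, sigma, u


w = np.array([[3.0, 1.0],
              [1.0, 3.0],
              [0.0, 2.0]])
w_sn, sigma, u = spectral_normalize(w, n_iters=50)
print(sigma, np.linalg.norm(w, 2))   # estimate vs. exact largest singular value
```

During training it is common to run only one power-iteration step per update and to reuse `u` from the previous step, since the weights change slowly; the 50 iterations here are only so the standalone example converges.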

77.

• we used the Adam optimizer
• (1) the number of updates of the discriminator per one update of the generator and (2) learning rate α and the first and second order momentum parameters (β1, β2) of Adam

78.

• pytorch-spectral-normalization-gan
• Main.py: uses CIFAR10 as the dataset.
• get resnet model working with wasserstein and hinge losses
• Model.py: builds a DCGAN-like generator and discriminator.
• Model_resnet.py: builds a ResNet generator and discriminator.
• Spectral_normalization.py
• Spectral_normalization_nondiff.py

https://github.com/christiancosgrove/pytorch-spectral-normalization-gan

79.

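• Spectral Normalization for Generative Adversarial Networks
• Takeru Miyato, Toshiki Kataoka, Masanori Koyama, Yuichi Yoshida
• (Submitted on 16 Feb 2018)
• One of the challenges in the study of generative adversarial networks is the instability of their training. This paper proposes a new weight normalization technique called spectral normalization to stabilize the training of the discriminator. The new normalization technique is computationally light and easy to incorporate into existing implementations. The authors tested the efficacy of spectral normalization on the CIFAR10, STL-10, and ILSVRC2012 datasets and confirmed experimentally that spectrally normalized GANs (SN-GANs) can generate images of better or equal quality compared with previous training stabilization techniques.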

80.

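• Wasserstein GAN
• Martin Arjovsky, Soumith Chintala, Léon Bottou
• (Submitted on 26 Jan 2017 (v1), last revised 6 Dec 2017 (this version, v3))
• We introduce a new algorithm named WGAN, an alternative to traditional GAN training. In this new model, we show that we can improve the stability of learning, get rid of problems like mode collapse, and provide meaningful learning curves useful for debugging and hyperparameter searches. Furthermore, we show that the corresponding optimization problem is sound, and provide extensive theoretical work highlighting the deep connections to other distances between distributions.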

81.

• Wasserstein GAN (WGAN) proposes an entirely new way of training GANs that minimizes the Earth Mover's Distance (also called the Wasserstein distance).
• Wasserstein GAN and Kantorovich-Rubinstein duality
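For reference, the two forms of the distance mentioned here can be written as below. The notation (P_r for the real distribution, P_g for the generator distribution, \Pi for the set of joint distributions with those marginals, f for a 1-Lipschitz critic) follows the WGAN paper and does not appear in the slides themselves.

```latex
W(P_r, P_g)
  = \inf_{\gamma \in \Pi(P_r, P_g)} \mathbb{E}_{(x, y) \sim \gamma}\bigl[\lVert x - y \rVert\bigr]
  = \sup_{\lVert f \rVert_L \le 1} \Bigl( \mathbb{E}_{x \sim P_r}[f(x)] - \mathbb{E}_{x \sim P_g}[f(x)] \Bigr)
```

The first expression is the Earth Mover's Distance as an optimal transport cost. The second is its Kantorovich-Rubinstein dual, which is the form a WGAN critic approximates, and it is the reason the critic has to be kept (approximately) 1-Lipschitz.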

82.

• The only passage set in bold in this paper reads:
• In no experiment did we see evidence of mode collapse for the WGAN algorithm.
• Indeed, WGAN does appear to avoid mode collapse.

83.

84.

spectral_norm is available in PyTorch

85.

GAN evaluation metrics (ICML 2018): Precision (quality) and Recall (diversity) for Distributions
• Assessing Generative Models via Precision and Recall
• Recent advances in generative modeling have led to an increased interest in the study of statistical divergences as means of model comparison. Commonly used evaluation methods, such as Fréchet Inception Distance (FID), correlate well with the perceived quality of samples and are sensitive to mode dropping. However, these metrics are unable to distinguish between different failure cases since they yield one-dimensional scores. We propose a novel definition of precision and recall for distributions which disentangles the divergence into two separate dimensions. The proposed notion is intuitive, retains desirable properties, and naturally leads to an efficient algorithm that can be used to evaluate generative models. We relate this notion to total variation as well as to recent evaluation metrics such as Inception Score and FID. To demonstrate the practical utility of the proposed approach we perform an empirical study on several variants of Generative Adversarial Networks and the Variational Autoencoder. In an extensive set of experiments we show that the proposed metric is able to disentangle the quality of generated samples from the coverage of the target distribution.

86.

GAN evaluation metrics (ICML 2018): using persistent homology
• Geometry Score: A Method For Comparing Generative Adversarial Networks