[DL Reading Group] DL Hacks reading session

November 25, 2016

#deep learning #Deep Learning #Neural Network Architecture Search #Hyperparameter Optimization #Meta Networks #ResNet

Slide overview

2016/11/25
Deep Learning JP:
http://deeplearning.jp/seminar-2/

Text of each slide

DL Hacks reading session, 2016/11/25, 黒滝紘生

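Purpose
- Can the structure of a network be determined more or less automatically?
- Introduces four ICLR 2017 papers, among others
- Categories
  - Hyperparameter estimation ("HyperBand", "Neural Architecture Search with RL")
  - Generation by a meta-network ("HyperNetworks")
  - Layer skipping ≈ ResNet family ("DCNN Design Pattern")
  - Other (pruning/adding, etc.)
2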

Contents
- Hyperparameter estimation ("HyperBand", "Neural Architecture Search with RL")
- Generation by a meta-network ("HyperNetworks")
- Layer skipping ≈ ResNet family ("DCNN Design Pattern")
- Other (pruning, etc.)
3

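Hyperband: A Novel Bandit-Based Approach to Hyperparameter Optimization
- https://arxiv.org/abs/1603.06560 , https://openreview.net/forum?id=ry18Ww5ee , ICLR2017 under review (the shorter OpenReview version is easier to read)
Task: tuning the hyperparameters of networks for SVHN and CIFAR-10.
Formulates tuning as a bandit problem: allocate limited resources (amount of data, number of batches, etc.) among hyperparameter configurations.
The earlier method, "Successive Halving", could not tune the trade-off between allocating broadly and shallowly (many configurations, little resource each) and narrowly and deeply (few configurations, much resource each).
Hyperband additionally grid-searches over Successive Halving's hyper-hyperparameter, and matched or outperformed the state-of-the-art method (SMAC_early).
4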
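
The bracket structure described on this slide can be sketched in a few lines of Python. This is a minimal sketch under assumptions, not the authors' code: `sample_config` and `evaluate` are hypothetical callbacks, the resource `r` is abstract (iterations, data, etc.), and `max_resource=81`, `eta=3` are arbitrary example values. Each value of `s` is one Successive Halving bracket; a large `s` means many configurations with little resource each, a small `s` means few configurations with a lot each.

```python
import math
import random


def hyperband(sample_config, evaluate, max_resource=81, eta=3):
    """Sketch of Hyperband: several Successive Halving brackets, each with a
    different trade-off between #configs (breadth) and resource per config (depth)."""
    s_max = int(math.log(max_resource) / math.log(eta) + 1e-9)
    best_loss, best_config = float("inf"), None
    for s in range(s_max, -1, -1):                         # one bracket per s
        n = math.ceil((s_max + 1) / (s + 1) * eta ** s)    # configs in this bracket
        r = max_resource * eta ** (-s)                     # initial resource per config
        configs = [sample_config() for _ in range(n)]
        for i in range(s + 1):                             # Successive Halving
            r_i = r * eta ** i
            losses = [evaluate(c, r_i) for c in configs]
            ranked = sorted(zip(losses, configs), key=lambda t: t[0])
            if ranked[0][0] < best_loss:
                best_loss, best_config = ranked[0]
            keep = max(1, int(len(configs) / eta))         # keep the best 1/eta
            configs = [c for _, c in ranked[:keep]]
    return best_config, best_loss


# Toy usage (hypothetical): a config is a single number; its loss is the squared
# distance to 1.0 plus a penalty that shrinks as more resource is spent on it.
random.seed(0)
best_config, best_loss = hyperband(
    sample_config=lambda: random.uniform(-5.0, 5.0),
    evaluate=lambda c, r: (c - 1.0) ** 2 + 1.0 / r,
)
print(best_config, best_loss)
```

The bracket with `s = s_max` behaves like plain Successive Halving with many cheap evaluations, while `s = 0` is close to random search with full budgets; Hyperband simply runs all of them.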

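Speeding up automatic hyperparameter optimization of deep neural networks by extrapolation of learning curves
- IJCAI 2015 https://pdfs.semanticscholar.org/044f/0b1d5d0b421abbc7569ba4cc4bf859fd9801.pdf
The paper that proposed SMAC_early, the baseline used by Hyperband on the previous slide.
At the time of this paper, there were three approaches to hyperparameter search:
- Spearmint, based on Bayesian optimization
- SMAC, based on random forests
- Tree Parzen Estimator (TPE), based on density estimation
This paper added early stopping that mimics a human expert (abandoning runs whose extrapolated learning curves look unpromising) to SMAC and TPE, and achieved good performance.
5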
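
A rough sketch of the early-stopping idea on this slide. It assumes a single log-linear curve model in place of the paper's ensemble of parametric learning-curve models and its probabilistic stopping rule; `fit_log_curve` and `should_stop` are hypothetical names, and the curve in the usage lines is synthetic.

```python
import math


def fit_log_curve(epochs, accs):
    """Least-squares fit of acc ~ a + b * log(epoch).
    A deliberately simple stand-in (assumption) for the paper's curve models."""
    xs = [math.log(e) for e in epochs]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(accs) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, accs))
    b = sxy / sxx
    return mean_y - b * mean_x, b


def should_stop(epochs, accs, max_epoch, best_so_far):
    """Stop a run early if its extrapolated final accuracy does not beat the best run."""
    a, b = fit_log_curve(epochs, accs)
    predicted_final = min(1.0, a + b * math.log(max_epoch))
    return predicted_final < best_so_far, predicted_final


# Hypothetical partial learning curve after 5 of 100 epochs.
epochs = [1, 2, 3, 4, 5]
accs = [0.3 + 0.1 * math.log(e) for e in epochs]
stop, predicted = should_stop(epochs, accs, max_epoch=100, best_so_far=0.85)
print(stop, round(predicted, 3))
```

The saving comes from not spending the remaining 95 epochs on a run whose curve already predicts a worse result than the incumbent.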

Neural Network Architecture Optimization through Submodularity and Supermodularity
- http://arxiv.org/abs/1609.00074 , Sep 2016
State of the art in architecture optimization via Bayesian optimization
6

Neural Architecture Search with Reinforcement Learning
- Google Brain, ICLR2017 under review https://arxiv.org/abs/1611.01578
Uses reinforcement learning and an RNN to decide the architecture one item at a time (figure on the slide).
Generated networks for CIFAR-10 and Penn Treebank.
7
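
A minimal REINFORCE sketch of choosing an architecture one decision at a time, as on this slide. Assumptions: the RNN controller is replaced by independent softmax logits per decision, `SEARCH_SPACE` is a tiny hypothetical list of filter sizes and filter counts, and `toy_reward` stands in for the validation accuracy of a trained child network. A moving-average baseline reduces gradient variance.

```python
import math
import random

# Hypothetical search space: two layers, each with a filter size and a filter count.
SEARCH_SPACE = [
    ("filter_size", [1, 3, 5, 7]),
    ("num_filters", [24, 36, 48, 64]),
    ("filter_size", [1, 3, 5, 7]),
    ("num_filters", [24, 36, 48, 64]),
]
TARGET = [3, 48, 5, 64]  # hypothetical "best" architecture for the toy reward


def toy_reward(arch):
    """Stand-in for the validation accuracy of a trained child network."""
    return sum(a == t for a, t in zip(arch, TARGET)) / len(TARGET)


def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_search(steps=3000, lr=0.1, baseline_decay=0.9, seed=0):
    rng = random.Random(seed)
    logits = [[0.0] * len(options) for _, options in SEARCH_SPACE]
    baseline = 0.0
    for _ in range(steps):
        # Sample the architecture one decision at a time.
        choices = [rng.choices(range(len(row)), weights=softmax(row))[0] for row in logits]
        arch = [options[c] for (_, options), c in zip(SEARCH_SPACE, choices)]
        reward = toy_reward(arch)
        advantage = reward - baseline
        baseline = baseline_decay * baseline + (1 - baseline_decay) * reward
        # REINFORCE update: d log softmax(row)[c] / d row[k] = 1[k == c] - p_k.
        for row, c in zip(logits, choices):
            probs = softmax(row)
            for k in range(len(row)):
                row[k] += lr * advantage * ((k == c) - probs[k])
    # Return the most probable option for each decision.
    return [options[max(range(len(row)), key=row.__getitem__)]
            for (_, options), row in zip(SEARCH_SPACE, logits)]


best = reinforce_search()
print(best, toy_reward(best))
```

In the real method each reward requires training a child network, which is why the search is so expensive; here the reward is instant.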

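Online Adaptation of Deep Architectures with Reinforcement Learning
- ECAI 2016 https://arxiv.org/abs/1608.02292
Learns the structure of a denoising autoencoder with reinforcement learning.
(The figure is from the baseline paper. The merge and increment operations shown there are treated as actions for RL.)
8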

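HyperNetworks
- https://arxiv.org/abs/1609.09106 , Sep 2016, ICLR2017 under review
http://blog.otoro.net/2016/09/28/hyper-networks/
RNNs are constrained to use the same weights at every time step. HyperNetworks remove this constraint by having a small LSTM output the weights of the main LSTM at every time step.
10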
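
A minimal sketch of the idea on this slide, assuming plain tanh RNNs instead of LSTMs and no training loop. It roughly follows the paper's weight-scaling formulation: the small hyper-RNN emits a vector `d_t` that rescales the rows of the main recurrent matrix, so the effective weight is `diag(d_t) W` and changes every step. `HyperRNN`, the offset of 1 added to `d_t`, and the random initialization are assumptions of this sketch.

```python
import math
import random


def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def rand_matrix(rows, cols, rng, scale=0.5):
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


class HyperRNN:
    """Main tanh RNN whose recurrent matrix is rescaled at every step by a small hyper-RNN."""

    def __init__(self, n_in, n_main, n_hyper, seed=0):
        rng = random.Random(seed)
        self.W = rand_matrix(n_main, n_main, rng)          # main recurrent weights
        self.U = rand_matrix(n_main, n_in, rng)            # main input weights
        self.W_hyper = rand_matrix(n_hyper, n_hyper, rng)  # hyper-RNN recurrent weights
        self.U_hyper = rand_matrix(n_hyper, n_in + n_main, rng)
        self.P = rand_matrix(n_main, n_hyper, rng)         # hyper state -> scaling vector
        self.n_main, self.n_hyper = n_main, n_hyper

    def run(self, xs):
        h, z = [0.0] * self.n_main, [0.0] * self.n_hyper
        outputs, scales = [], []
        for x in xs:
            # 1. The hyper-RNN reads the current input and the previous main state.
            pre = [a + b for a, b in zip(matvec(self.W_hyper, z), matvec(self.U_hyper, x + h))]
            z = [math.tanh(v) for v in pre]
            # 2. It emits a per-row scaling vector d_t (offset by 1, an assumption of
            #    this sketch, so that W is not scaled toward zero at initialization).
            d = [1.0 + v for v in matvec(self.P, z)]
            # 3. Main step with the time-varying weight diag(d_t) W: (diag(d) W) h = d * (W h).
            rec = matvec(self.W, h)
            inp = matvec(self.U, x)
            h = [math.tanh(di * r + u) for di, r, u in zip(d, rec, inp)]
            outputs.append(h)
            scales.append(d)
        return outputs, scales


model = HyperRNN(n_in=2, n_main=4, n_hyper=3)
outputs, scales = model.run([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])
print(len(outputs), len(outputs[0]), scales[0] != scales[1])
```

Scaling rows of a shared `W` instead of emitting the full matrix keeps the hyper-network's output small; both parts would be trained jointly by backpropagation.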

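HyperNetworks
- Applications: text generation, generating the weights of a large ResNet, and handwriting generation (the 2D Gaussian mixture is produced step by step via the HyperNetwork)
- Can be used as an ordinary TensorFlow RNNCell.
The idea of generating a network's weights from another network comes from HyperNEAT (covered later).
State of the art on Character-Level Penn Treebank and Hutter Prize Wikipedia
11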

Evolving Neural Networks through Augmenting Topologies
- Evolutionary Computation 2002 Vol.10-2 http://dx.doi.org/10.1162/106365602320169811
Uses a genetic algorithm, plus some extensions, to evolve the branching structure between input and output nodes.
12

A Hypercube-based Encoding for Evolving Large-scale Neural Networks
- Artificial Life 2009 Vol.15-2 http://www.mitpressjournals.org/doi/abs/10.1162/artl.2009.15.2.15202#.WDd3JKJ95TY
Feeding (the source and target of an edge) into a meta-network outputs (the weight of that edge).
A small network (CPPN) can thus express a wide variety of main-network structures.
13
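
A sketch of the indirect encoding on this slide, assuming a hand-written CPPN (`cppn` is hypothetical; in HyperNEAT the CPPN is itself evolved with NEAT). Each node of the main network (the substrate) has 2-D coordinates, and the weight of every edge is obtained by querying the CPPN with the source and target coordinates. The `threshold` below which a connection is left unexpressed is an assumed value.

```python
import math


def cppn(x1, y1, x2, y2):
    """Hypothetical hand-written CPPN: a composition of simple functions (Gaussian, cosine)
    of source (x1, y1) and target (x2, y2) coordinates."""
    dx, dy = x2 - x1, y2 - y1
    return math.exp(-2.0 * (dx * dx + dy * dy)) * math.cos(math.pi * (x1 + y2))


def grid(n):
    """Coordinates of an n x n layer of nodes spread over [-1, 1]^2."""
    coords = [-1.0 + 2.0 * i / (n - 1) for i in range(n)]
    return [(x, y) for y in coords for x in coords]


def substrate_weights(sources, targets, threshold=0.2):
    """Query the CPPN once per (source, target) pair to build the full weight matrix.
    Weights with |w| < threshold are treated as unexpressed connections (set to 0)."""
    matrix = []
    for tx, ty in targets:
        row = []
        for sx, sy in sources:
            w = cppn(sx, sy, tx, ty)
            row.append(w if abs(w) >= threshold else 0.0)
        matrix.append(row)
    return matrix


# The same small CPPN encodes connectivity for substrates of any resolution.
small = substrate_weights(grid(3), grid(3))      # 9 x 9 weight matrix
large = substrate_weights(grid(10), grid(10))    # 100 x 100 weight matrix
print(len(small), len(small[0]), len(large), len(large[0]))
```

The point of the encoding is visible in the usage lines: the CPPN has a fixed size, while the generated network grows from 81 to 10,000 connections.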

Convolution by Evolution
- http://mlanctot.info/files/papers/gecco16-dppn.pdf Google DeepMind
Proposes "DPPN", which makes CPPNs trainable by differentiation.
The structure is evolved, but the weight values are learned by backpropagation.
14

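Other NEAT variants and uses
- Allocating more network branches to pixel-dense regions
- Use as preprocessing for CNNs
- Use on Atari tasks
- Use on control tasks
However, the heavy computational cost of the GA was a bottleneck.
HyperNetwork solved this by training everything with backpropagation and by changing the application domain.
15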

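Deep Convolutional Neural Network Design Patterns
- https://arxiv.org/abs/1611.00847 , ICLR2017 under review
A survey of recent papers that modify CNN architectures.
It also proposes design patterns for these architectural ideas.
- Design pattern: a frequently used technique given a name so that it is easier to discuss.
Based on the design patterns, it proposes several new networks.
17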

Training Very Deep Networks
- https://arxiv.org/abs/1507.06228 , ICML 2015 DL Workshop -> NIPS 2015 Highlighted Paper
The Highway Networks paper.
Where ResNet uses a plain identity shortcut, Highway Networks put a gate on it.
18
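
The gating on this slide can be written as y = H(x)·T(x) + x·(1 - T(x)), with transform gate T and carry gate 1 - T. A minimal single-layer sketch in plain Python follows; the weights and gate biases in the usage lines are hypothetical, chosen to show the two extremes of the gate.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def highway_layer(x, W_H, b_H, W_T, b_T):
    """y = H(x) * T(x) + x * (1 - T(x)), element-wise.
    H: ordinary nonlinear transform; T: transform gate; 1 - T: carry gate.
    (ResNet's shortcut instead always adds x unchanged: y = F(x) + x.)"""
    h = [math.tanh(v + b) for v, b in zip(matvec(W_H, x), b_H)]
    t = [sigmoid(v + b) for v, b in zip(matvec(W_T, x), b_T)]
    return [hi * ti + xi * (1.0 - ti) for hi, ti, xi in zip(h, t, x)]


identity = [[1.0, 0.0], [0.0, 1.0]]
x = [0.5, -1.0]
carry = highway_layer(x, identity, [0.0, 0.0], identity, [-50.0, -50.0])      # gate ~0: y ~ x
transform = highway_layer(x, identity, [0.0, 0.0], identity, [50.0, 50.0])    # gate ~1: y ~ H(x)
print(carry, transform)
```

The paper initializes the gate bias `b_T` to a negative value so that layers start out close to the carry behavior, which is what lets very deep stacks train.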

Deep Networks with Stochastic Depth
- https://arxiv.org/abs/1603.09382
Randomly drops ResNet blocks, during training only. All blocks are used at test time.
In effect, Dropout along the depth dimension.
19
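
A sketch of the forward pass described on this slide, assuming the paper's linear decay rule for survival probabilities (p_L = 0.5 for the last block) and a hypothetical residual branch `constant_branch`. During training each branch is either used or skipped; at test time all branches are used, each scaled by its survival probability.

```python
import random


def survival_probs(num_blocks, p_last=0.5):
    """Linear decay rule: p_l = 1 - (l / L) * (1 - p_L) for blocks l = 1..L."""
    return [1.0 - (l / num_blocks) * (1.0 - p_last) for l in range(1, num_blocks + 1)]


def relu(v):
    return [max(0.0, a) for a in v]


def stochastic_depth_forward(x, blocks, probs, training, rng=random):
    """Training: branch f_l is kept with probability p_l, else only the identity path.
    Test: every branch is used, scaled by its survival probability p_l."""
    h = x
    for f, p in zip(blocks, probs):
        if training:
            if rng.random() < p:
                h = relu([a + b for a, b in zip(f(h), h)])
            else:
                h = relu(h)  # branch dropped: identity path only
        else:
            h = relu([p * a + b for a, b in zip(f(h), h)])
    return h


def constant_branch(v):
    """Hypothetical residual branch f_l that outputs 1.0 for every unit."""
    return [1.0 for _ in v]


probs = survival_probs(4)
blocks = [constant_branch] * 4
test_out = stochastic_depth_forward([1.0, 2.0], blocks, probs, training=False)
train_out = stochastic_depth_forward([1.0, 2.0], blocks, probs, training=True,
                                     rng=random.Random(0))
print(probs, test_out, train_out)
```

Scaling by `p_l` at test time matches the expected contribution of each branch during training, the same reasoning as weight scaling in standard Dropout.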

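Densely Connected Convolutional Networks
- https://arxiv.org/abs/1608.06993
Same first author as ResNet with Stochastic Depth.
The output of each conv layer is fed not only to the next layer but to every later layer (a so-called concat layer).
Later layers therefore get wider and wider, but two measures keep the parameter count from growing too much:
- 1. Reset the concatenation at the end of each group of layers (e.g., every 4 layers) (Dense Block).
- 2. Keep the number of channels each layer adds (Growth Rate) small.
The parameter efficiency is thought to come from reusing features from lower layers.
State of the art on SVHN and CIFAR-{10,100}
20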
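
A sketch of the channel bookkeeping inside one Dense Block: with k0 input channels and growth rate k, layer l receives k0 + (l - 1)·k channels. Each "channel" here is a single float, and `make_layer` is a hypothetical stand-in for the BN-ReLU-Conv composite layer.

```python
def input_channels_per_layer(k0, growth_rate, num_layers):
    """Channels seen by layer l (1-indexed) inside a Dense Block: k0 + (l - 1) * k."""
    return [k0 + (l - 1) * growth_rate for l in range(1, num_layers + 1)]


def make_layer(growth_rate):
    """Hypothetical stand-in for BN-ReLU-Conv: emits `growth_rate` new channels,
    each equal to the mean of all channels it receives."""
    def layer(inputs):
        m = sum(inputs) / len(inputs)
        return [m] * growth_rate
    return layer


def dense_block(x, layers):
    """Each layer receives the concatenation of the block input and all earlier outputs;
    its new channels are concatenated (not added, as in ResNet) onto that list."""
    features = list(x)
    for layer in layers:
        features = features + layer(features)
    return features


print(input_channels_per_layer(k0=16, growth_rate=12, num_layers=4))
out = dense_block([1.0] * 16, [make_layer(12) for _ in range(4)])
print(len(out))  # block output: k0 + num_layers * k channels
```

Because each layer only adds `k` channels, the width grows linearly within a block, and the block boundary is where it is reset.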

Resnet in Resnet: Generalizing Resnet Architectures
- ICLR 2016 Workshop http://arxiv.org/abs/1603.08029
21

Residual Networks are Exponential Ensembles of Relatively Shallow Networks
- NIPS 2016 https://arxiv.org/abs/1605.06431
Shows that the ResNet on the left of the figure is equivalent to the unrolled view on the right.
22
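
The unrolled view on this slide can be reproduced by counting paths: each residual block y = x + f(x) is either skipped or passed through, so n blocks give 2^n paths whose lengths follow a binomial distribution. This sketch only counts paths; it does not build a network. The 54-block case corresponds to the 110-layer CIFAR ResNet.

```python
from itertools import product
from math import comb


def unrolled_paths(n_blocks):
    """Each residual block y = x + f(x) can be skipped (0) or passed through (1),
    so n blocks unroll into 2**n paths."""
    return list(product((0, 1), repeat=n_blocks))


def path_length_histogram(n_blocks):
    """Number of paths that pass through exactly k residual branches, for k = 0..n."""
    counts = [0] * (n_blocks + 1)
    for path in unrolled_paths(n_blocks):
        counts[sum(path)] += 1
    return counts


print(path_length_histogram(3))   # binomial coefficients C(3, k)

# For 54 blocks, use the closed form C(54, k) instead of enumerating 2**54 paths:
# most paths are of medium length.
share = sum(comb(54, k) for k in range(19, 36)) / 2 ** 54
print(round(share, 3))
```

The concentration around the middle length is why the network behaves like an ensemble of relatively shallow paths rather than one very deep one.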

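FractalNet: Ultra-Deep Neural Networks without Residuals
- http://arxiv.org/abs/1605.07648
Unlike ResNet, which bypasses with an identity mapping, FractalNet joins a shallow path with a path that composes the same function (layer block) twice; the join is an element-wise mean rather than a concatenation.
As a result, starting from a few layers, the depth doubles with each expansion.
In addition, like ResNet with stochastic depth, it proposes "drop-path", which randomly drops layers (paths) during training so that plain feed-forward routes are trained.
Achieves roughly the same accuracy as VGG-16 or ResNet.
23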
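
A sketch of the FractalNet expansion rule, assuming a single hypothetical `conv` function reused at every position (in the real network each position has its own weights). The join is an element-wise mean, as in the FractalNet paper, and the optional drop-path drops join inputs at random while always keeping at least one.

```python
import random


def fractal(C, conv, drop_prob=0.0, rng=random):
    """Build f_C from the expansion rule:
        f_1(z)     = conv(z)
        f_{C+1}(z) = join( f_C(f_C(z)), conv(z) )
    join = element-wise mean over the surviving inputs. With drop_prob > 0, each join
    input is dropped with that probability (local drop-path), keeping at least one.
    The longest path through f_C has 2**(C - 1) conv layers."""
    if C == 1:
        return conv
    inner = fractal(C - 1, conv, drop_prob, rng)

    def f(z):
        branches = [lambda: inner(inner(z)), lambda: conv(z)]
        kept = [b for b in branches if rng.random() >= drop_prob]
        if not kept:
            kept = [rng.choice(branches)]
        outputs = [b() for b in kept]
        return sum(outputs) / len(outputs)

    return f


calls = [0]


def counting_conv(z):
    """Hypothetical conv layer: identity that counts how often it is applied."""
    calls[0] += 1
    return z


fractal(4, counting_conv)(1.0)
print(calls[0])                              # conv applications: 2**C - 1
print(fractal(3, lambda z: 2.0 * z)(1.0))    # f_1 = 2z, f_2 = 3z, f_3 = mean(9z, 2z)
```

Without drop-path every conv position is evaluated; with it, each forward pass trains a random sub-fractal, which is the regularization effect the slide compares to stochastic depth.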

Xception: Deep Learning with Depthwise Separable Convolutions
- https://arxiv.org/abs/1610.02357
Generalizes and extends Inception.
24
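
The savings from depthwise separable convolutions can be checked with a parameter count (biases ignored). The function names are hypothetical. Xception itself applies the 1x1 pointwise convolution before the per-channel spatial convolution, which changes the count only slightly.

```python
def standard_conv_params(k, c_in, c_out):
    """k x k kernel for every (input channel, output channel) pair; bias ignored."""
    return k * k * c_in * c_out


def depthwise_separable_params(k, c_in, c_out):
    """k x k depthwise conv per input channel, then a 1 x 1 pointwise conv; bias ignored."""
    return k * k * c_in + c_in * c_out


def pointwise_first_params(k, c_in, c_out):
    """Xception's ordering: 1 x 1 conv first, then a k x k conv per output channel."""
    return c_in * c_out + k * k * c_out


std = standard_conv_params(3, 256, 256)
sep = depthwise_separable_params(3, 256, 256)
print(std, sep, round(std / sep, 2))
```

For a 3x3 layer with 256 input and output channels, the separable version needs roughly 8.7 times fewer parameters.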

Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding
- ICLR 2016 Best Paper Award https://arxiv.org/abs/1510.00149 , https://github.com/songhan/Deep-Compression-AlexNet
Compresses networks by combining pruning, quantization, and Huffman coding.
26
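
A toy sketch of the three stages on this slide, applied to a flat list of weights. Assumptions: no retraining after pruning, a plain 1-D k-means with linear centroid initialization for weight sharing (the paper additionally fine-tunes the shared centroids), and Huffman coding applied only to the cluster indices. Sparse-index storage and the codebook size are ignored.

```python
import heapq
from collections import Counter


def prune(weights, threshold):
    """Magnitude pruning: weights with |w| < threshold become 0."""
    return [w if abs(w) >= threshold else 0.0 for w in weights]


def nearest(v, centroids):
    return min(range(len(centroids)), key=lambda j: abs(v - centroids[j]))


def kmeans_1d(values, k, iters=20):
    """Weight sharing: 1-D k-means with centroids initialized linearly between min and max."""
    lo, hi = min(values), max(values)
    centroids = [lo + (hi - lo) * i / (k - 1) for i in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for v in values:
            clusters[nearest(v, centroids)].append(v)
        centroids = [sum(c) / len(c) if c else centroids[j] for j, c in enumerate(clusters)]
    return centroids, [nearest(v, centroids) for v in values]


def huffman_code_lengths(symbols):
    """Bits per symbol for a Huffman code built from symbol frequencies."""
    freq = Counter(symbols)
    if len(freq) == 1:
        return {next(iter(freq)): 1}
    heap = [(count, i, {sym: 0}) for i, (sym, count) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie = len(heap)  # unique tie-breaker so dicts are never compared
    while len(heap) > 1:
        n1, _, a = heapq.heappop(heap)
        n2, _, b = heapq.heappop(heap)
        merged = {sym: depth + 1 for sym, depth in {**a, **b}.items()}
        heapq.heappush(heap, (n1 + n2, tie, merged))
        tie += 1
    return heap[0][2]


weights = [0.9, -0.05, 0.31, -0.72, 0.02, 0.88, -0.7, 0.3, 0.01, -0.04]  # toy values
pruned = prune(weights, threshold=0.1)
nonzero = [w for w in pruned if w != 0.0]          # stored sparsely in the real scheme
centroids, codes = kmeans_1d(nonzero, k=3)         # 3 shared values -> small indices
lengths = huffman_code_lengths(codes)
bits = sum(lengths[c] for c in codes)
print(centroids, codes, bits, "bits vs", 32 * len(nonzero), "bits as float32")
```

Each stage shrinks a different part of the storage: pruning reduces the number of weights, sharing reduces the bits per weight, and Huffman coding exploits the uneven distribution of the remaining indices.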

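Blockout: Dynamic Model Selection for Hierarchical Deep Networks
- https://arxiv.org/abs/1512.05246
A generalization of Dropout and DropConnect.
Interprets them as "stochastic assignment of connections to groups of nodes" (Dropout corresponds to 1 group, DropConnect to N groups).
For a fixed number of groups, the "connection probability to the i-th group" is learned by backpropagation.
Good performance on CIFAR.
27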

Deconstructing the Ladder Network Architecture
- https://arxiv.org/abs/1511.06430 , ICML2016, last author Y. Bengio
Examines various open questions about the original Ladder paper, "Semi-Supervised Learning with Ladder Networks (Rasmus, 2015)", and improves the architecture.
Resembles an autoencoder.
28

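Using Fast Weights to Attend to the Recent Past
- https://arxiv.org/abs/1610.06258 , second author G. Hinton
Introducing "fast weights", which are updated at a speed between that of activations and that of ordinary weights, improved performance.
Fast weights are computed from the hidden states h(t), can be viewed as a kind of attention, and are also biologically motivated.
- Concretely, between h(t) and h(t+1) of the RNN, S intermediate hidden-state transitions h_s (s = 0..S) are considered (Eq. 2, Figure 1).
- These transitions mix the connection A built from h(t)h(t)^T (fast weights, Eq. 1) with the ordinary connection W (slow weights).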

29
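
A sketch of one time step of the fast-weights RNN on this slide, assuming f = tanh and omitting the layer normalization the paper applies inside the inner loop. It also checks the "attention" reading: when A starts at zero, A(t)·q equals a decayed, similarity-weighted sum of past hidden states. `W`, `C` and the inputs in the usage lines are hypothetical.

```python
import math


def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def fast_weights_step(h, x, A, W, C, lam=0.95, eta=0.5, S=1):
    """One time step (f = tanh, no layer normalization):
        A(t)    = lam * A(t-1) + eta * h(t) h(t)^T     (fast weights)
        h_0     = f(W h(t) + C x(t))                    (slow weights only)
        h_{s+1} = f(W h(t) + C x(t) + A(t) h_s)         (inner loop, s = 0..S-1)
    Returns h(t+1) = h_S and the updated A(t)."""
    A = [[lam * a + eta * hi * hj for a, hj in zip(row, h)] for row, hi in zip(A, h)]
    base = [u + v for u, v in zip(matvec(W, h), matvec(C, x))]
    hs = [math.tanh(b) for b in base]
    for _ in range(S):
        hs = [math.tanh(b + a) for b, a in zip(base, matvec(A, hs))]
    return hs, A


def attention_readout(past_states, query, lam=0.95, eta=0.5):
    """Same quantity as A(t) @ query when A starts at zero:
    eta * sum_tau lam**(t - tau) * h(tau) * <h(tau), query>,
    i.e. past hidden states weighted by their similarity to the query."""
    t = len(past_states)
    out = [0.0] * len(query)
    for tau, h_past in enumerate(past_states, start=1):
        weight = eta * lam ** (t - tau) * sum(a * b for a, b in zip(h_past, query))
        out = [o + weight * v for o, v in zip(out, h_past)]
    return out


# Toy usage with hypothetical 2-unit slow weights.
W = [[0.5, -0.2], [0.1, 0.3]]
C = [[1.0, 0.0], [0.0, 1.0]]
A = [[0.0, 0.0], [0.0, 0.0]]
h = [0.0, 0.0]
history = []
for x in ([1.0, 0.0], [0.0, 1.0], [0.5, -0.5]):
    history.append(h)
    h, A = fast_weights_step(h, x, A, W, C, S=2)

print(matvec(A, h), attention_readout(history, h))
```

The two printed vectors agree, which is the sense in which fast weights "attend to the recent past": no past states are stored explicitly, yet A(t) reproduces a similarity-weighted recall of them.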
