[DL Reading Group] Learning to Combine Top-Down and Bottom-Up Signals in Recurrent Neural Networks with Attention over Modules

February 5, 2021

#deep learning #machine learning #neuroscience #attention mechanism #reinforcement learning #bidirectional recurrent independent mechanisms

Slide overview

2021/02/05
Deep Learning JP:
http://deeplearning.jp/seminar-2/

Text of each page

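Bibliographic information
● ICML 2020
  ○ http://proceedings.mlr.press/v119/mittal20a.html
● Affiliations: MILA, Google Brain, and others
● Authors: Sarthak Mittal, Alex Lamb, Anirudh Goyal, Vikram Voleti, Murray Shanahan, Guillaume Lajoie, Michael Mozer, Yoshua Bengio
● An extension of Recurrent Independent Mechanisms (Goyal, 2019)
● Implementation: https://github.com/sarthmit/BRIMs
● In a nutshell
  ○ Proposes BRIMs, which combine bottom-up and top-down information through attention and build the hidden state out of multiple modules. By exploiting top-down information, the model is robust to noise and outperforms existing models on tasks where the data distribution changes between training and evaluation.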

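Background: neuroscience findings on the interaction between early and late processing regions in the brain
● Vision: later-stage processing modulates the activity of earlier-stage processing
  ○ Similar effects are observed in other sensory modalities
● Global Workspace Theory:
  ○ > The idea that the results computed by functionally specialized modules are broadcast globally, and that being able to use them flexibly and freely is what gives rise to diverse cognitive functions
    ■ Quoted from: https://note.com/kanair/n/n3c7c6d20288a
● How bottom-up and top-down information interact and are exchanged between neurons is not understood in detail
● Top-down:
  ○ Helps in processing noisy or ambiguous sensory data
    ■ e.g., a familiar room can still be made sense of after it goes dark
● Bottom-up:
  ○ Lets an agent react to unexpected stimuli
● Would a top-down structure also help in machine learning?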

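Background: bidirectional layered models have not been sufficiently explored
● Prior work
  ○ Dayan et al., 1995; Larochelle & Bengio, 2008; Salakhutdinov & Hinton, 2009
  ○ Studied far less than feedforward models
    ■ Ordinary deep models have a bottom-up-only structure
  ○ Simply adding bidirectionality to an existing model is not enough
    ■ Iuzzolino et al., 2019
  ○ The model structure should be designed so that it actually yields results
● Mechanisms we would like to incorporate (intuition)
  ○ Select top-down signals effectively
  ○ Modulate bottom-up signals
  ○ Have the two kinds of signals mix well
● → Modularity and Attention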

Proposed method: Bidirectional Recurrent Independent Mechanisms

Components of the proposed method: Key-Value Attention
● Used when updating the hidden state
● Q: queries, K: keys (d-dimensional), V: values
● AS: attention scores, AR: attention-modulated results
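The symbols on this slide can be made concrete with a small sketch. It assumes the standard scaled dot-product form, AS = softmax(QKᵀ/√d) and AR = AS·V, which the slide text itself does not spell out. The function name `key_value_attention` and the toy matrices are illustrative only, and the learned projections that would produce Q, K, and V are omitted.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def key_value_attention(Q, K, V):
    """Q: n_q x d, K: n_k x d, V: n_k x d_v, all as lists of lists.

    Returns (AS, AR): attention scores and attention-modulated results.
    Assumes the scaled dot-product form; not taken from the slide text.
    """
    d = len(K[0])
    AS = [
        softmax([sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in K])
        for q in Q
    ]
    AR = [
        [sum(a * v[j] for a, v in zip(row, V)) for j in range(len(V[0]))]
        for row in AS
    ]
    return AS, AR

# Toy example: one query that matches the first key more closely.
Q = [[1.0, 0.0]]
K = [[1.0, 0.0], [0.0, 1.0]]
V = [[10.0, 0.0], [0.0, 10.0]]
AS, AR = key_value_attention(Q, K, V)
```

Each row of AS sums to 1, so AR is a convex combination of the rows of V, weighted toward the values whose keys best match the query.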

Components of the proposed method: supplementary note on attention, from https://web.eecs.umich.edu/~justincj/slides/eecs498/498_FA2019_lecture13.pdf

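Components of the proposed method: Modularity
● Physical processes are largely independent and interact only infrequently
  ○ When modeling the motion of two balls, each ball's motion can be modeled as a separate, independent mechanism
    ■ Strictly speaking, each ball is attracted by the Earth and by the other ball
    ■ But collisions between the balls are not that frequent
● If phenomena are composed of modules that each have an independent mechanism, then
  ○ At each time step, attention can be narrowed to the important modules that are changing
    ■ Learning becomes easier
  ○ The model generalizes to other tasks
    ■ The mechanism of a particular module may change, but the other previously learned mechanisms can be reused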

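Components of the proposed method: Recurrent Independent Mechanisms (RIMs)
● The hidden state h consists of n modules
  ○ Only a few of the n modules are activated at each time step
● Hidden-state update procedure
  ○ 1. Selective Activation: select the modules to activate based on their relevance to the input
  ○ 2. Independent Dynamics: each activated module performs its own update (the same procedure as an ordinary RNN hidden-state update)
  ○ 3. Communication: integrate information from the other modules and update the hidden state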

Components of the proposed method: RIMs step (1): Selective Activation
● Attention scores are computed with Key-Value Attention
  ○ The m modules with the highest scores are activated
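A minimal sketch of top-m selection follows. It makes assumptions the slide does not state: each module attends over the real input and a null (all-zero) entry, and modules are ranked by the attention weight they place on the real input. The function name `select_active_modules` and the toy queries are hypothetical, and the learned query/key projections are omitted.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def select_active_modules(module_queries, input_key, null_key, m):
    """Return (sorted indices of the m selected modules, per-module scores).

    Assumption: a module's score is its attention weight on the real input
    when it attends over [real input, null input].
    """
    scale = math.sqrt(len(input_key))
    scores = []
    for q in module_queries:
        weights = softmax([dot(q, input_key) / scale, dot(q, null_key) / scale])
        scores.append(weights[0])  # weight on the real input
    ranked = sorted(range(len(scores)), key=lambda k: scores[k], reverse=True)
    return sorted(ranked[:m]), scores

# Four toy modules; modules 0 and 2 have queries aligned with the input key.
queries = [[2.0, 0.0], [0.0, 2.0], [1.0, 1.0], [-1.0, 0.0]]
active, scores = select_active_modules(
    queries, input_key=[1.0, 0.0], null_key=[0.0, 0.0], m=2
)
```

With these toy values the selected set is modules 0 and 2; the other two modules keep their state unchanged for this time step.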

Components of the proposed method: RIMs step (2): Independent Dynamics
● S_t: the set of indices (keys) of the modules to activate
● F_k: the function responsible for the update (GRU, LSTM, etc.)
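The roles of S_t and F_k can be sketched as below: modules in S_t are updated by their own F_k, and all other modules carry their previous state forward. The elementwise tanh cell is a hypothetical stand-in for a GRU or LSTM, chosen only to keep the example short; `make_tanh_cell` and `independent_dynamics` are illustrative names.

```python
import math

def make_tanh_cell(w_in, w_rec):
    """Hypothetical stand-in for F_k: an elementwise tanh RNN cell with
    scalar weights (a real model would use a GRU or LSTM here)."""
    def cell(x, h):
        return [math.tanh(w_in * xi + w_rec * hi) for xi, hi in zip(x, h)]
    return cell

def independent_dynamics(hidden, attended_inputs, cells, active):
    """Update only the modules whose index is in `active` (the set S_t).

    Each active module k applies its own cells[k]; inactive modules
    simply copy their previous hidden state.
    """
    new_hidden = []
    for k, h in enumerate(hidden):
        if k in active:
            new_hidden.append(cells[k](attended_inputs[k], h))
        else:
            new_hidden.append(list(h))
    return new_hidden

hidden = [[0.0, 0.0], [0.5, -0.5], [1.0, 1.0]]
inputs = [[1.0, 1.0], [1.0, 1.0], [1.0, 1.0]]
cells = [make_tanh_cell(1.0, 0.5), make_tanh_cell(0.5, 1.0), make_tanh_cell(2.0, 0.1)]
new_hidden = independent_dynamics(hidden, inputs, cells, active={0, 2})
```

Because each module has its own parameters and sees only its own state and attended input, no information flows between modules in this step.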

Components of the proposed method: RIMs step (3): Communication
● The active modules issue queries
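A rough sketch of this step, under assumptions that go beyond the slide: only active modules issue queries, every module (active or not) supplies keys and values, and the attended result is added residually to the querying module's state. Identity projections are used in place of learned query/key/value maps, and `communicate` is a hypothetical name.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def communicate(hidden, active):
    """Active modules read from all modules via attention.

    Simplifications (assumptions, not from the slide): hidden states serve
    directly as queries, keys, and values, and the read is added residually.
    """
    d = len(hidden[0])
    out = []
    for k, h in enumerate(hidden):
        if k not in active:
            out.append(list(h))  # inactive modules are left unchanged
            continue
        weights = softmax(
            [sum(a * b for a, b in zip(h, other)) / math.sqrt(d) for other in hidden]
        )
        read = [sum(w * other[j] for w, other in zip(weights, hidden)) for j in range(d)]
        out.append([hi + ri for hi, ri in zip(h, read)])
    return out

hidden = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
updated = communicate(hidden, active={0})
```

Inactive modules still contribute information through their keys and values, but their own states are not modified.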

Proposed method: Bidirectional Recurrent Independent Mechanisms (BRIMs)

Proposed method: Communication between layers
● When selecting which modules of layer L to activate, information from the hidden states of layers L-1 and L+1 is used
  ○ Corresponds to Selective Activation in RIMs
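The following sketch shows how a single module in layer L might weigh bottom-up against top-down information. Assumptions not stated on the slide: the module attends over three entries (a null all-zero vector, the layer L-1 state, and the layer L+1 state from the previous time step), identity projections replace the learned ones, and the activation score is the attention mass placed on non-null entries. `read_between_layers` is a hypothetical name.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def read_between_layers(query, bottom_up, top_down):
    """Attend over [null, bottom-up, top-down] for one module in layer L.

    bottom_up: state from layer L-1; top_down: state from layer L+1
    (assumed here to come from the previous time step).
    Returns (attention weights, attended read, activation score).
    """
    null = [0.0] * len(query)
    sources = [null, bottom_up, top_down]
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, s) / scale for s in sources])
    read = [sum(w * s[j] for w, s in zip(weights, sources)) for j in range(len(query))]
    activation_score = 1.0 - weights[0]  # attention mass on non-null entries
    return weights, read, activation_score

# Toy case: the top-down signal matches the module's query better
# than the bottom-up signal does.
weights, read, score = read_between_layers(
    query=[1.0, 1.0], bottom_up=[1.0, 0.0], top_down=[2.0, 2.0]
)
```

Ranking the modules of a layer by such a score and keeping the top m would then play the role that Selective Activation plays in RIMs.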

Proposed method: Sparse Activation
● Corresponds to Independent Dynamics in RIMs

Proposed method: Communication within layers
● Corresponds to Communication in RIMs

Experiments: Sequential MNIST, Sequential CIFAR
Does top-down information lead to better generalization?
A: Attention, H: Hierarchy, M: Modularity, B: Bidirectional

Experiments: Sequential CIFAR with noise
As the noise in the observations increases, top-down information becomes more important

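Experiments: Stochastic Moving MNIST video prediction
Improved transition prediction, and the usefulness of higher-layer representations
● The proposed method outperforms existing methods
● Is top-down information what helps?
● → The models were also evaluated on a task of predicting the digit location
  ○ 1-layer RIMs
    ■ 82% acc
  ○ Higher layer of LSTM (2 layers)
    ■ 76% acc
  ○ Higher layer of BRIMs (2 layers)
    ■ 88% acc
  ○ Lower layer of BRIMs (2 layers)
    ■ 58% acc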

Experiments: Atari
Long-term planning (higher layer) and escaping unexpected danger (lower layer)

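Summary
● A bidirectional structure is expected to improve generalization
  ○ Top-down information is used to select which bottom-up information to attend to
    ■ It acts like a prior
  ○ Bidirectionality alone is not sufficient
    ■ Attention works better than simply concatenating the signals
    ■ Modularity (Independent Dynamics) also matters
      ● Without modularity (LSTM) < with modularity (RIMs)
● BRIMs can replace existing recurrent structures
  ○ No additional loss is required
  ○ For example, the LSTM part of a model can be swapped for BRIMs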