[DL Reading Group] Fast and Slow Learning of Recurrent Independent Mechanisms

June 4, 2021

#deep learning #Deep Learning #Recurrent Independent Mechanisms #RIM #Modular Network #Machine Learning

Slide overview

2021/06/04
Deep Learning JP:
http://deeplearning.jp/seminar-2/

Text of each page

DEEP LEARNING JP [DL Papers] Fast and Slow Learning of Recurrent Independent Mechanisms XIN ZHANG, Matsuo Lab http://deeplearning.jp/

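Bibliographic information
● Title:
  ○ Fast and Slow Learning of Recurrent Independent Mechanisms
● Authors:
  ○ Kanika Madan, Rosemary Nan Ke, Anirudh Goyal, Bernhard Schölkopf, Yoshua Bengio
● ICLR 2021
● Overview:
  ○ Modular networks are an attempt to realize the functionally independent parts found in the brain...
  ○ Recurrent Independent Mechanisms (RIM) are one such architecture.
  ○ This work improves on RIM by proposing a scheme that trains its components at different step intervals.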

Introduction

Introduction: Modular Networks
➢ VQA: a parser selects reusable modules and assembles them into a network.
Deep Compositional Question Answering with Neural Module Networks, 2016

https://arxiv.org/pdf/1511.02799.pdf

Introduction: Modular Networks
➢ Generate a surplus of networks and, in the spirit of evolution, keep the modules that prove useful.

Introduction: Modular Networks
➢ Learn robot modules and task modules separately so that the policy generalizes to new combinations.
Learning Modular Neural Network Policies for Multi-Task and Multi-Robot Transfer, 2016

https://arxiv.org/pdf/1609.07088.pdf

Meta Learning of Recurrent Independent Mechanisms

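RIM: Recurrent Independent Mechanisms
➢ The input is encoded into a latent space and passed through the RIM, which outputs memory relevant to the input.
  ○ The output is split into a value head and a policy head and used for PPO training.
➢ A RIM consists of N independent modules; attention selects the K modules most relevant to the input, and only those are updated.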
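The selection step described on this slide can be sketched roughly as follows. This is a minimal illustration, assuming a simplified dot-product score between each module's query and a single key computed from the encoded input. The names `select_active_modules`, `rim_step`, `make_toy_params`, `w_query`, `w_key`, `w_rec`, and `w_in` are hypothetical and do not come from the slides or the paper. The original RIM also attends over a null input and uses learned recurrent cells, both of which are omitted here.

```python
import numpy as np


def select_active_modules(hidden, encoded_input, w_query, w_key, k):
    """Pick the k modules whose queries best match the encoded input.

    hidden:        (N, d_h) array, one hidden state per module
    encoded_input: (d_in,) latent encoding of the observation
    w_query:       (N, d_h, d_att) per-module query projections (assumed)
    w_key:         (d_in, d_att) key projection of the input (assumed)
    """
    queries = np.einsum("nh,nha->na", hidden, w_query)  # (N, d_att)
    key = encoded_input @ w_key                          # (d_att,)
    scores = queries @ key / np.sqrt(key.shape[0])       # (N,)
    active = np.sort(np.argsort(-scores)[:k])            # indices of top-k scores
    return active, scores


def rim_step(hidden, encoded_input, params, k):
    """Update only the k selected modules; the rest keep their state."""
    active, _ = select_active_modules(
        hidden, encoded_input, params["w_query"], params["w_key"], k
    )
    new_hidden = hidden.copy()
    for i in active:
        # Toy recurrent update standing in for the per-module RNN cell.
        new_hidden[i] = np.tanh(
            hidden[i] @ params["w_rec"][i] + encoded_input @ params["w_in"][i]
        )
    return new_hidden, active


def make_toy_params(n_modules, d_in, d_h, d_att, seed=0):
    """Random parameters for illustration only."""
    rng = np.random.default_rng(seed)
    return {
        "w_query": rng.normal(size=(n_modules, d_h, d_att)),
        "w_key": rng.normal(size=(d_in, d_att)),
        "w_rec": rng.normal(size=(n_modules, d_h, d_h)),
        "w_in": rng.normal(size=(n_modules, d_in, d_h)),
    }
```

Calling `rim_step` with N=4 modules and k=2 leaves two hidden states untouched, which is the sparsity the slide refers to.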

Meta Learning of RIM
➢ Fast (inner loop): RIM modules, policy head.
➢ Slow (outer loop): input attention and communication attention, value head.

Proposed method: MIR
➢ PPO loss.
➢ With separate parameter sets θM and θA, the module and attention updates are carried out at different step intervals.
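The idea of updating two parameter groups at different step intervals can be illustrated with a toy schedule. This is only a sketch under stated assumptions: plain gradient descent on a toy quadratic loss stands in for the PPO loss, and `fast_slow_updates`, `toy_grad`, `slow_every`, `lr_fast`, and `lr_slow` are hypothetical names, not the paper's algorithm or hyperparameters.

```python
import numpy as np


def toy_grad(theta_m, theta_a):
    """Gradient of the toy loss 0.5*||theta_m||^2 + 0.5*||theta_a||^2."""
    return theta_m, theta_a


def fast_slow_updates(theta_m, theta_a, grad_fn, num_steps, slow_every,
                      lr_fast=0.1, lr_slow=0.1):
    """Update theta_m every step and theta_a only every `slow_every` steps."""
    theta_m = theta_m.copy()
    theta_a = theta_a.copy()
    slow_updates = 0
    for step in range(1, num_steps + 1):
        g_m, g_a = grad_fn(theta_m, theta_a)
        # Fast loop: module parameters change on every step.
        theta_m = theta_m - lr_fast * g_m
        # Slow loop: attention parameters change only occasionally.
        if step % slow_every == 0:
            theta_a = theta_a - lr_slow * g_a
            slow_updates += 1
    return theta_m, theta_a, slow_updates
```

With `num_steps=8` and `slow_every=4`, the module parameters receive eight updates while the attention parameters receive two.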

Related Work
- Modular Networks (Introduction)
- Meta Learning
- Modular meta-learning 2018
- Meta-Learning to Disentangle Causal Mechanisms
- A Meta-Transfer Objective for Learning to Disentangle Causal Mechanisms
- Learning neural causal models from unknown interventions

Experiment

a: Does it improve sample efficiency?
➢ Yes. The red line is the proposed method; the horizontal axis is the number of frames.

b: Does it lead to policies that generalize better?
➢ Yes. The "More Difficult" setting is zero-shot transfer, where the method leads the baseline by a wide margin.

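c: Does it adapt quickly to new distributions?
➢ Pre-train on simple environments, then measure the success rate on the target environment.
  ○ The results suggest that it reuses pieces of knowledge more efficiently.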

Ablation: Is the meta-learning setup important?
➢ Does this show the importance of meta-learning? The figure shows Meta-LSTM outperforming vanilla.

Ablation: Sparsity, Slow-factor of Outer loop
Example with n=4, k=2.
➢ Compared with using all modules, sparsity improves the functionality of the modules.

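Ablation: Value function Visualization
➢ In the left figure the value rises and falls; it is high when the goal is in view.
➢ At frame 12 the agent is right in front of the goal, so the value is very high; at frame 13 the task has ended, so it drops.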

Ablation: Visualizing Module Activations
➢ Shows which modules are activated for the inputs on the left. n=5, k=3.
➢ At F7 the green dot on the left comes into view and M5 is activated.

Ablation: Importance of Fast and Slow Update Loops
➢ Swapping the roles of the inner loop and the outer loop lowers performance to about the same level as vanilla.
➢ Merely lowering the learning rate of the attention parameters does not work either (slowLR).

Ablation: Roles of the Active Modules
➢ With fewer active modules, the agent took longer to complete an episode.

Conclusion

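Summary & Thoughts
Summary:
- A study of the architecture needed to decompose and reuse knowledge.
- An interesting study that neatly connects many related areas (meta RL, HRL, time scales in RL, attention). (OpenReview.)
- Concretely, it implements RIM using a meta-learning approach.
- Meta-learning holds promise for improving generalization performance.
Thoughts:
- Research on modular networks is interesting, and RIM is an important line of work championed by Prof. Bengio.
- There seem to be ways to make each module take on a distinct role more clearly.
- DADS の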

https://openreview.net/pdf?id=HJgLZR4KvH

Appendix
- Related work:
  - Learning to Combine Top-Down and Bottom-Up Signals in Recurrent Neural Networks with Attention over Modules
- Blog posts on RIM:
  - https://www.zhihu.com/search?type=content&q=Recurrent%20independent%20mechanisms

Appendix: PPO
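The content of this appendix slide is not included in the extracted text. For reference, the standard clipped surrogate objective of PPO can be sketched as below. The function name `ppo_clip_loss` and the default `clip_eps=0.2` are assumptions for illustration; the slides do not state which PPO variant or hyperparameters were used.

```python
import numpy as np


def ppo_clip_loss(new_logp, old_logp, advantages, clip_eps=0.2):
    """Clipped surrogate policy loss of PPO (to be minimized).

    new_logp:   log-probabilities of the taken actions under the current policy
    old_logp:   log-probabilities under the policy that collected the data
    advantages: advantage estimates for the same actions
    """
    ratio = np.exp(new_logp - old_logp)
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # Take the pessimistic (smaller) objective, then negate to get a loss.
    return -np.mean(np.minimum(unclipped, clipped))
```

When the probability ratio moves outside the interval [1 - clip_eps, 1 + clip_eps] in the direction that would increase the objective, the clipped term takes over and the update gains nothing from moving further.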