[DL Reading Group] A Closer Look at Few-shot Classification

March 04, 19

#deep learning #Deep Learning #Few-shot Classification #Baseline++ #Domain shift #Evaluation methods

Slide overview

2019/03/01
Deep Learning JP:
http://deeplearning.jp/seminar-2/

Text of each page

DEEP LEARNING JP
[DL Papers] A Closer Look at Few-shot Classification (ICLR2019)
Kazuki Fujikawa, DeNA
http://deeplearning.jp/

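Summary
• Bibliographic information
  – A Closer Look at Few-shot Classification
    • ICLR2019 (to appear)
    • Wei-Yu Chen, Yen-Cheng Liu, Zsolt Kira, Yu-Chiang Frank Wang, Jia-Bin Huang
• Overview
  – Points out problems with the standard evaluation protocol for few-shot classification
    • Under the experimental setting commonly used in few-shot learning, the Baseline++ defined in this paper is comparable to SOTA
    • With a larger NN, Baseline is comparable to SOTA on both benchmark datasets, CUB and mini-ImageNet
    • When the data involves a domain shift, Baseline performs better
  – The experiment code is released so that experiments can be run consistently under properly comparable settings
    • The hope is that this advances the still-unsolved problem of few-shot learning under domain shift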

Outline
• Background
• Compared methods
• Experiments and results

Background
• Research on few-shot classification has been increasing in recent years
  – One direction: when the target task has little labeled data, extract knowledge from large-scale data that transfers to other tasks
[Figure: Meta Learning Models Taxonomy (adapted from Finn '17). Figure source: Vinyals, Oriol. NIPS 2017 Meta-Learning symposium.]
  Model Based: Santoro et al. '16; Duan et al. '17; Wang et al. '17; Munkhdalai & Yu '17; Mishra et al. '17
  Metric Based: Koch '15; Vinyals et al. '16; Snell et al. '17; Shyam et al. '17; Sung et al. '17
  Optimization Based: Schmidhuber '87, '92; Bengio et al. '90, '92; Hochreiter et al. '01; Li & Malik '16; Andrychowicz et al. '16; Ravi & Larochelle '17; Finn et al. '17
• However, prior work has issues in the following respects
  – Baseline performance is unfairly underestimated, so comparisons are not fair
    • e.g., no data augmentation is applied
  – Domain shift in the data is not considered
    • Novel (unseen) classes are sampled from the same dataset for evaluation

Baseline
• Problem setup
  – Training stage
    • Train a feature extractor f_θ and a classifier C(·|W_b) on base-class data
    • Base class: data from classes different from those to be classified few-shot (assumes abundant labeled data)
  – Fine-tuning stage
    • Keep f_θ fixed and train a classifier C(·|W_n) on novel-class data
    • Novel class: data from the classes to be classified few-shot (assumes only a few labeled examples)
Published as a conference paper at ICLR 2019
[Figure 1: Baseline and Baseline++ few-shot classification methods. Training stage: base class data (many) → feature extractor → classifier. Fine-tuning stage: novel class data (few) → fixed feature extractor → classifier. Classifier: linear layer + softmax (Baseline); cosine distance + softmax (Baseline++).]
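The fine-tuning stage described above can be sketched as softmax regression on features from a frozen extractor. This is a minimal, dependency-free illustration and not the authors' code: it assumes the features f_θ(x) are already computed as plain lists of floats, and the names `fine_tune_classifier` and `predict`, the learning rate, and the full-batch updates are all hypothetical choices.

```python
import math

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def fine_tune_classifier(features, labels, num_classes, lr=0.5, iterations=100):
    """Train only the classifier weights W_n on frozen features f_theta(x)."""
    dim = len(features[0])
    # W[c] is the weight vector of novel class c; the extractor is never updated.
    W = [[0.0] * dim for _ in range(num_classes)]
    for _ in range(iterations):
        grad = [[0.0] * dim for _ in range(num_classes)]
        for x, y in zip(features, labels):
            probs = softmax([sum(w_j * x_j for w_j, x_j in zip(w, x)) for w in W])
            for c in range(num_classes):
                # Gradient of cross-entropy w.r.t. logit c is (p_c - 1[c == y]).
                coeff = probs[c] - (1.0 if c == y else 0.0)
                for j in range(dim):
                    grad[c][j] += coeff * x[j] / len(features)
        for c in range(num_classes):
            for j in range(dim):
                W[c][j] -= lr * grad[c][j]
    return W

def predict(W, x):
    scores = [sum(w_j * x_j for w_j, x_j in zip(w, x)) for w in W]
    return scores.index(max(scores))
```

Because only `W` is updated, the handful of novel-class examples cannot distort the representation learned from the base classes.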

Baseline
• Model
  – Baseline
    • Minimizes the cross-entropy loss based on the inner product between f_θ(x_i) and W ∈ ℝ^{d×c}
  – Baseline++
    • Minimizes the cross-entropy loss based on the cosine distance between f_θ(x_i) and W_b = [w_1, w_2, …, w_c] ∈ ℝ^{d×c}
    • Aims to reduce intra-class variation compared with Baseline
    • Also introduced in [Hu+, CVPR2015] and [Gidaris & Komodakis, CVPR2018]
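The difference between the two classifiers comes down to how the logits are computed. The sketch below is an illustration under assumptions, not the paper's implementation: `W` is stored as a list of c class weight vectors of length d (the columns of the d×c matrix), and the `scale` argument is a hypothetical knob that is not mentioned in the slides.

```python
import math

def linear_logits(feature, W):
    """Baseline: the logit for class j is the inner product w_j . f_theta(x)."""
    return [sum(w_k * f_k for w_k, f_k in zip(w, feature)) for w in W]

def cosine_logits(feature, W, scale=1.0):
    """Baseline++: the logit for class j is the cosine similarity of w_j and f_theta(x)."""
    f_norm = math.sqrt(sum(f_k * f_k for f_k in feature))
    logits = []
    for w in W:
        w_norm = math.sqrt(sum(w_k * w_k for w_k in w))
        dot = sum(w_k * f_k for w_k, f_k in zip(w, feature))
        # Small epsilon avoids division by zero for all-zero vectors.
        logits.append(scale * dot / (f_norm * w_norm + 1e-12))
    return logits
```

With the cosine form, neither the norm of a class weight vector nor the norm of the feature affects the logit; only the angle between them does, which is one way to read the stated goal of reducing intra-class variation.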

Few-shot classification
• Problem setup (N-way K-shot)
  – Meta-testing stage
    • Using the support set (K labeled examples for each of N classes) as a reference, classify the query set (unlabeled data) into one of the N classes
  – Meta-training stage
    • Sample support and query sets from the base classes to mimic the meta-testing condition
    • Train the feature extractor f_θ so that the sampled query set is classified correctly with reference to the support set
[Figure 2: Meta-learning few-shot classification algorithms. The meta-learning classifier M(·|S) is conditioned on the support set S. (Top) In the meta-train stage, the support set S_b and the query …]
  Panels: meta-training stage (N classes sampled from base class data, base support set S_b, base query set); meta-testing stage (novel support set from novel class data); support-set-conditioned models: MatchingNet (cosine distance), ProtoNet (class mean, Euclidean distance), RelationNet (class mean, Relation Module), MAML (linear, gradient).
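Episode sampling for the N-way K-shot setting can be sketched as follows. This is a minimal example under assumptions: `data_by_class` is a hypothetical dict mapping a class name to its list of examples, and `n_query` (the number of query examples per class) is a parameter the slides do not specify.

```python
import random

def sample_episode(data_by_class, n_way, k_shot, n_query, rng=random):
    """Sample one N-way K-shot episode as (support, query) lists of (example, episode_label)."""
    classes = rng.sample(sorted(data_by_class), n_way)
    support, query = [], []
    for episode_label, cls in enumerate(classes):
        # Draw support and query examples together so they never overlap.
        examples = rng.sample(data_by_class[cls], k_shot + n_query)
        support += [(x, episode_label) for x in examples[:k_shot]]
        query += [(x, episode_label) for x in examples[k_shot:]]
    return support, query
```

During meta-training `data_by_class` would hold base classes, and during meta-testing novel classes; the labels are re-indexed to 0..N-1 within each episode.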

Few-shot classification
• Models
  – MatchingNet [Vinyals+, NIPS2016]
    • Extract features with f_θ from both the support set (N classes × K examples) and the query set
    • Minimize the cross-entropy loss based on cosine distance
  – ProtoNet [Snell+, NIPS2017]
    • Extract features with f_θ from both the support set (N classes × K examples) and the query set
    • Average the support-set feature vectors per class to form N prototypes (vectors)
    • Minimize the cross-entropy loss based on the Euclidean distance between query vectors and the prototypes
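The ProtoNet steps above (per-class averaging, then distance-based classification) can be sketched in a few lines. This is an illustrative sketch, not the reference implementation: it assumes features are already extracted as lists of floats, uses the squared Euclidean distance as the (assumed) distance, and the function names are hypothetical.

```python
import math

def prototypes(support_features, support_labels, n_way):
    """Average the support features of each class into one prototype vector."""
    dim = len(support_features[0])
    sums = [[0.0] * dim for _ in range(n_way)]
    counts = [0] * n_way
    for f, y in zip(support_features, support_labels):
        counts[y] += 1
        for j in range(dim):
            sums[y][j] += f[j]
    return [[s / counts[c] for s in sums[c]] for c in range(n_way)]

def protonet_probs(query_feature, protos):
    """Softmax over negative squared Euclidean distances to each prototype."""
    logits = [-sum((q - p) ** 2 for q, p in zip(query_feature, proto)) for proto in protos]
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]
```

The cross-entropy loss for a query example with label y is then `-math.log(protonet_probs(q, protos)[y])`, which is what gets minimized with respect to the feature extractor.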

Few-shot classification
• Models
  – RelationNet [Sung+, CVPR2018]
    • Same overall framework as ProtoNet
    • Minimize the cross-entropy loss based on scores from a Relation Module parameterized by an NN
  – MAML [Finn+, ICML2017]
    • Learn initial model parameters such that, after fine-tuning on the support set (a small number of labeled examples), the prediction error on the query set is small
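The MAML idea of learning an initialization that adapts well can be illustrated on a toy problem. The sketch below makes several assumptions that are not in the slides: the model is a scalar regression y = θ·x with mean squared error, adaptation is a single gradient step, and the outer update uses the first-order approximation (it ignores the derivative of the adapted parameter with respect to θ) rather than full second-order MAML.

```python
def mse(theta, data):
    """Mean squared error of the model y = theta * x."""
    return sum((theta * x - y) ** 2 for x, y in data) / len(data)

def mse_grad(theta, data):
    """Gradient of the mean squared error with respect to theta."""
    return sum(2.0 * x * (theta * x - y) for x, y in data) / len(data)

def adapt(theta, support, inner_lr=0.1):
    """Inner loop: one gradient step on the support set."""
    return theta - inner_lr * mse_grad(theta, support)

def meta_train(tasks, theta=0.0, inner_lr=0.1, outer_lr=0.05, steps=200):
    """Outer loop: move the initialization so adapted parameters fit the query sets."""
    for _ in range(steps):
        meta_grad = 0.0
        for support, query in tasks:
            adapted = adapt(theta, support, inner_lr)
            # First-order approximation: ignore d(adapted)/d(theta).
            meta_grad += mse_grad(adapted, query) / len(tasks)
        theta -= outer_lr * meta_grad
    return theta
```

For two tasks with true slopes 1 and 3, the learned initialization settles between them, and a single `adapt` step on either task's support set then lowers that task's query error.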

Outline
• Background
• Related work
• Compared methods
• Experiments and results

Datasets
• mini-ImageNet
  – A dataset built from ImageNet with reduced resolution and number of classes to cut computational cost
    • Resolution: 84x84
    • Number of images (60,000 in total)
      – train: 64 classes x 600 images
      – valid: 16 classes x 600 images
      – test: 20 classes x 600 images
  – Russakovsky, Olga, et al. "Imagenet large scale visual recognition challenge." International journal of computer vision 115.3 (2015): 211-252.
• CUB
  – An image dataset of birds with fine-grained labels
    • Number of images (11,788 in total)
      – train: 100 classes
      – valid: 50 classes
      – test: 50 classes
  – Wah C., Branson S., Welinder P., Perona P., Belongie S. "The Caltech-UCSD Birds-200-2011 Dataset." Computation & Neural Systems Technical Report, CNS-TR-2011-001.
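The split sizes above can be checked with a little arithmetic, and the key property (train, valid, and test classes are disjoint) can be shown with a small partition helper. This is only a sketch: the contiguous assignment of class ids to splits is a hypothetical choice, and the slides do not say how classes were actually assigned.

```python
MINI_IMAGENET = {"train": 64, "valid": 16, "test": 20}
CUB = {"train": 100, "valid": 50, "test": 50}

def total_images(classes_per_split, images_per_class):
    """Total image count when every class has the same number of images."""
    return sum(classes_per_split.values()) * images_per_class

def split_classes(class_ids, sizes):
    """Partition class ids into disjoint splits so novel classes are never seen in training."""
    splits, start = {}, 0
    for name, size in sizes.items():
        splits[name] = class_ids[start:start + size]
        start += size
    return splits
```

For mini-ImageNet, (64 + 16 + 20) classes x 600 images gives the stated 60,000 images; for CUB, 100 + 50 + 50 covers its 200 classes.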

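Experiment overview
• Tasks
  – Experiment 1: Re-evaluate every method consistently under the standard setting most often reported in papers
  – Experiment 2: Run every method with deeper feature-extractor networks
  – Experiment 3: Run every method under a domain-shift setting (mini-ImageNet → CUB)
• Hyperparameters
  – Method-specific hyperparameters are set as follows
    • baseline, baseline++
      – training stage: batch size: 16, epochs: 400
      – testing stage: batch size: 4, iterations: 100
    • meta-learning
      – 1-shot: 60,000 episodes, 5-shot: 40,000 episodes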

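Experiment 1: Re-evaluation under the standard setting
• Re-run every compared method under a unified standard setting
  – Dataset: mini-ImageNet, feature extractor f_θ: 4-layer CNN
  – Per-method settings
    • Baseline ⇔ Baseline*: with ⇔ without data augmentation
    • ProtoNet ⇔ ProtoNet#: meta-trained with 5-way ⇔ 30-way (1-shot), 20-way (5-shot)
• Discussion
  – Baseline improves with data augmentation, so the previously reported numbers underestimate it
  – When included in the comparison, Baseline++ is comparable to the SOTA methods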

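Experiment 2: Deeper feature-extractor networks
• Compare the performance of every method as the feature-extractor network is made deeper
  – Datasets: CUB, mini-ImageNet
  – Feature extractor f_θ: 4-layer CNN, 6-layer CNN, ResNet-10, ResNet-18, ResNet-34
• Discussion
  – On CUB, the gap between methods shrinks as the network gets deeper
  – On mini-ImageNet, some methods fall behind Baseline as the network gets deeper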

Experiment 3: Experiments with domain shift
• Compare the performance of every method under domain shift
  – Dataset: mini-ImageNet (meta-training) → CUB (meta-testing)
  – Feature extractor f_θ: ResNet-18
• Discussion
  – Baseline outperforms all meta-learning methods
  – As the difference between domains grows, meta-learning methods become relatively less effective
Table 3: 5-shot accuracy under the cross-domain scenario with a ResNet-18 backbone. Baseline outperforms all other …
  Method        mini-ImageNet → CUB
  Baseline      65.57 ± 0.70
  Baseline++    62.04 ± 0.76
  MatchingNet   53.07 ± 0.74
  ProtoNet      62.02 ± 0.70
  MAML          51.34 ± 0.72
  RelationNet   57.71 ± 0.73
[Figure 4: 5-shot accuracy in different scenarios with a ResNet-18 backbone. The Baseline model performs relatively well with larger domain … Scenarios on the x-axis, ordered from small to large domain difference: CUB, miniImageNet, miniImageNet -> CUB; y-axis: accuracy, 40% to 90%; one series per method.]
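Accuracies of the form "65.57 ± 0.70" are typically a mean over many test episodes with an uncertainty band. The slides do not state how the ± term is computed, so the sketch below rests on an assumption: it treats it as a 95% confidence interval under a normal approximation, and the function name `mean_and_ci95` is hypothetical.

```python
import math

def mean_and_ci95(accuracies):
    """Return (mean, half-width of the 95% confidence interval) over episode accuracies."""
    n = len(accuracies)
    mean = sum(accuracies) / n
    # Unbiased sample variance, then the standard error of the mean.
    variance = sum((a - mean) ** 2 for a in accuracies) / (n - 1)
    half_width = 1.96 * math.sqrt(variance / n)
    return mean, half_width
```

Under this reading, two methods whose intervals overlap heavily (for example Baseline++ and ProtoNet in Table 3) should not be ranked on the basis of the point estimates alone.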

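Conclusion
• Points out problems with the standard evaluation protocol for few-shot classification
  – Under the experimental setting commonly used in few-shot learning, Baseline++ is comparable to SOTA
  – With a larger NN, Baseline is comparable to SOTA on both benchmark datasets, CUB and mini-ImageNet
  – When the data involves a domain shift, Baseline performs better
• The experiment code is released so that experiments can be run consistently under properly comparable settings
  – The hope is that this advances the still-unsolved problem of few-shot learning under domain shift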

References
• Wei-Yu Chen, Yen-Cheng Liu, Zsolt Kira, Yu-Chiang Frank Wang, Jia-Bin Huang. A Closer Look at Few-shot Classification. In ICLR 2019.
• Spyros Gidaris and Nikos Komodakis. Dynamic few-shot visual learning without forgetting. In CVPR 2018.
• Junlin Hu, Jiwen Lu, and Yap-Peng Tan. Deep transfer metric learning. In CVPR 2015.
• Oriol Vinyals, Charles Blundell, Tim Lillicrap, Daan Wierstra, et al. Matching networks for one shot learning. In NIPS 2016.
• Jake Snell, Kevin Swersky, and Richard Zemel. Prototypical networks for few-shot learning. In NIPS 2017.
• Flood Sung, Yongxin Yang, Li Zhang, Tao Xiang, Philip HS Torr, and Timothy M Hospedales. Learning to compare: Relation network for few-shot learning. In CVPR 2018.
• Chelsea Finn, Pieter Abbeel, and Sergey Levine. Model-agnostic meta-learning for fast adaptation of deep networks. In ICML 2017.