[DL Reading Group] Parity Models: A General Framework for Coding-Based Resilience in ML Inference (SOSP'19)

November 15, 19

Slide overview

2019/11/15
Deep Learning JP:
http://deeplearning.jp/seminar-2/

Text of each slide

DEEP LEARNING JP [DL Papers] Parity Models: A General Framework for Coding-Based Resilience in ML Inference (SOSP’19) Presenter: Masanori Misono (The University of Tokyo) http://deeplearning.jp/ 2019-11-15 1

Bibliographic information
• Parity Models: Erasure-Coded Resilience for Prediction Serving Systems
• Jack Kosaian, K. V. Rashmi (CMU), Shivaram Venkataraman (University of Wisconsin-Madison)
• SOSP’19
• https://sosp19.rcs.uwaterloo.ca/program.html
• https://github.com/Thesys-lab/parity-models
• Uses erasure codes to speed up ML inference

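About SOSP
• ACM Symposium on Operating Systems Principles
• Held every other year
• Alongside OSDI (Operating Systems Design and Implementation), one of the top conferences in systems; for example, TensorFlow appeared at OSDI’16
• ML-related applications have been on the rise recently
• Only about 30 papers are accepted each time, so the count is far smaller than at ML conferences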

This year's ML-related papers
• PipeDream: Generalized Pipeline Parallelism for DNN Training (Stanford, Microsoft, CMU): pipelines DNN training to get the best of both model parallelism and data parallelism
• A Generic Communication Scheduler for Distributed DNN Training Acceleration (used at ByteDance): a scheduling method for data transfers under data parallelism
• TASO: Optimizing Deep Learning Computation with Automated Generation of Graph Substitutions (Stanford): a framework that optimizes DNN graphs automatically
• Privacy Accounting and Quality Control in the Sage Differentially Private ML Platform (Columbia University): differential privacy in a setting where data keeps growing

This year's ML-related papers (continued)
• Nexus: A GPU Cluster Engine for Accelerating DNN-Based Video Analysis (Microsoft): a DNN serving system that supports multiple applications
• Optimizing Data-Intensive Computations in Existing Libraries with Split Annotations (Stanford): annotating how matrix data is held avoids unnecessary copies during computation
• Parity Models: Erasure-Coded Resilience for Prediction Serving Systems: the paper presented here

AI Systems
• http://learningsys.org/sosp19/index.html
• A workshop co-located with SOSP
• Plays the same role as the NeurIPS Systems for ML workshop and USENIX OpML
• Topics
  • What are the Unique Challenges and Opportunities in Systems for ML?
  • A View of Programming Languages & Software Engineering for ML Software
  • Asynchrony and Quantization for Efficient and Scalable Learning
  • Learning Based Coded-Computation: A Novel Approach for Resilient Computation in ML Inference Systems
  • Building Scalable Systems for Reinforcement Learning and Using Reinforcement Learning for Better Systems
  • Challenges and Progress in Scaling ML Fairness
• Detailed report: https://syncedreview.com/2019/11/01/visiting-the-sosp-2019-ai-system-workshop/

Motivation
• We want inference in ML applications to be faster

Existing approach
• Load balancing

Problem
• Tail latency

Erasure Coding
• A redundancy technique
• Data is encoded into k data blocks and m parity blocks
• Any k of these k + m blocks are enough to recover the data
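As a minimal sketch of the idea above, assume the simplest possible code: k data blocks plus m = 1 parity block computed by byte-wise XOR. The Python below shows that any k of the k + 1 blocks recover the data. The function names are illustrative only; tolerating more than one erasure (m > 1) needs a stronger code such as Reed-Solomon.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equal-length byte strings."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def encode(data_blocks):
    """Return the k data blocks followed by one XOR parity block (m = 1)."""
    return list(data_blocks) + [xor_blocks(data_blocks)]


def recover(blocks):
    """Rebuild the single missing block (marked None) from the k survivors."""
    missing = [i for i, block in enumerate(blocks) if block is None]
    if len(missing) > 1:
        raise ValueError("a single parity block tolerates only one erasure")
    blocks = list(blocks)
    if missing:
        survivors = [block for block in blocks if block is not None]
        # XOR of all surviving blocks equals the missing block.
        blocks[missing[0]] = xor_blocks(survivors)
    return blocks


stored = encode([b"abcd", b"efgh"])   # k = 2 data blocks + 1 parity block
stored[0] = None                      # simulate losing one data block
restored = recover(stored)            # restored[0] is rebuilt from the other two
```

The same `recover` call also rebuilds the parity block if that is the one lost, since XOR treats all k + 1 blocks symmetrically.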

Idea: apply erasure codes to inference (Coded-Computation)
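To make the idea concrete, here is a small sketch under a strong simplifying assumption: the deployed model is linear, so running it on the sum of two queries yields exactly the sum of their outputs. The weights and the function names `model`, `encode`, and `decode` are hypothetical. For real neural networks this identity does not hold, which is what motivates the questions on the following slides.

```python
def model(x):
    """Hypothetical deployed model: a fixed linear map from 2 inputs to 2 outputs."""
    w = [[1.0, 2.0], [-0.5, 3.0]]
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def encode(queries):
    """Addition encoder: the parity query is the element-wise sum of the queries."""
    return [sum(values) for values in zip(*queries)]


def decode(parity_output, available_outputs):
    """Subtraction decoder: remove the available outputs from the parity output."""
    result = list(parity_output)
    for output in available_outputs:
        result = [r - o for r, o in zip(result, output)]
    return result


x1, x2 = [1.0, 2.0], [3.0, -1.0]
parity_query = encode([x1, x2])        # sent to a redundant server
y1 = model(x1)                         # this prediction arrives on time
parity_output = model(parity_query)    # so does the prediction on the parity query
# The server handling x2 is slow or has failed: reconstruct its output instead.
y2_reconstructed = decode(parity_output, [y1])
```

Because the stand-in model is linear, `y2_reconstructed` matches `model(x2)`; with a nonlinear model the same encoder and decoder would only give a wrong answer unless something else absorbs the nonlinearity.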

What should the encoder and decoder be?

Approximate them with neural networks?

Approximating them with neural networks → too slow!

Proposed method [figure: diagram labels not recoverable]

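Training the parity model
• The queries x1, x2, x3 are encoded into a single parity query x = x1 + x2 + x3
• The deployed model returns y1, y2, y3 for x1, x2, x3; the parity model returns p for x
• The parity model is trained so that p ≈ y1 + y2 + y3, so a missing output can be reconstructed, e.g. y3 ≈ p - y1 - y2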
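The training setup can be sketched in plain Python, following the paper's addition encoder and subtraction decoder: the parity model sees the parity query x = x1 + x2 + x3 and is fitted with an MSE loss so that its output approximates y1 + y2 + y3. Everything concrete here is an assumption for illustration: the scalar `deployed_model`, the quadratic `parity_model`, and the hand-written gradient descent stand in for the real networks and optimizer.

```python
import random

random.seed(0)
K = 3  # number of queries encoded together, as in x = x1 + x2 + x3


def deployed_model(x):
    """Hypothetical stand-in for the deployed model (nonlinear on purpose)."""
    return x * x


def parity_model(params, s):
    """Hypothetical parity model: a quadratic in the parity query s."""
    a, b, c = params
    return a * s * s + b * s + c


def make_example():
    """One training pair: (parity query, sum of the deployed model's outputs)."""
    xs = [random.uniform(-1.0, 1.0) for _ in range(K)]
    return sum(xs), sum(deployed_model(x) for x in xs)


def mse(params, data):
    """Mean squared error between parity model outputs and the summed targets."""
    return sum((parity_model(params, s) - t) ** 2 for s, t in data) / len(data)


def train(data, steps=500, lr=0.05):
    """Full-batch gradient descent on the MSE loss."""
    params = [0.0, 0.0, 0.0]
    for _ in range(steps):
        grads = [0.0, 0.0, 0.0]
        for s, t in data:
            err = parity_model(params, s) - t
            for i, feature in enumerate((s * s, s, 1.0)):
                grads[i] += 2.0 * err * feature / len(data)
        params = [p - lr * g for p, g in zip(params, grads)]
    return params


data = [make_example() for _ in range(500)]
initial_loss = mse([0.0, 0.0, 0.0], data)
params = train(data)
final_loss = mse(params, data)

# Decoding: if y3 is unavailable, approximate it from p, y1, and y2.
x1, x2, x3 = 0.2, -0.4, 0.7
p = parity_model(params, x1 + x2 + x3)
y3_approx = p - deployed_model(x1) - deployed_model(x2)
```

The reconstruction is only approximate: the sum of outputs is not a function of the parity query alone, so some error remains however long the parity model is trained, which is the accuracy side of the tradeoff discussed later.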

Implementation
• The serving system is built on top of Clipper (NSDI’17)
  https://github.com/Thesys-lab/parity-models
  http://clipper.ai
• Models are implemented in PyTorch
• Trained with an MSE loss

Experiment: parity model accuracy
• k=2, r=1, ResNet-18
• k=2,3,4

Experiment: parity model accuracy
• CIFAR-10, ResNet-18

Experiment: inference speed
• latency-accuracy tradeoff

Experiment: inference speed

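Discussion
• Which architecture should the parity model use?
  • In principle it does not have to be the same as the inference model
  • The paper does not explore this choice and uses the same model as for inference
  • With the same model, inference time is also roughly the same
• Which encoder/decoder scheme should be used?
  • Doing the following improves accuracy on CIFAR-10 by a few percent to a few tens of percent [figure not recoverable]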

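Summary
• Proposes an inference system that applies erasure coding to reduce tail latency
• The key point is that the encoder and decoder for the parity computation stay simple, while a dedicated model (the parity model) learns the actual approximation
• latency-accuracy tradeoff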