"Analytics4NN: Accelerating Neural Architecture Search through Modeling and High-Performance Computing Techniques"


Slide Summary

Science of Computing: Classical, AI/ML Invited Talk (DAY-2 : Jan 30, 2024)
Michela Taufer (University of Tennessee)
"Analytics4NN: Accelerating Neural Architecture Search through Modeling and High-Performance Computing Techniques"

The 6th R-CCS International Symposium
https://www.r-ccs.riken.jp/R-CCS-Symposium/2024/


Text of Each Page
1.

Analytics4NN: Accelerating Neural Architecture Search through Modeling and High-Performance Computing Techniques
Georgia Channing*, Ria Patel*, Paula Olaya*, Ariel Keller Rorabaugh*, Silvina Caino-Lores*, Osamu Miyashita‡, Catherine Schuman*, Florence Tama‡†, and Michela Taufer*
*University of Tennessee, Knoxville, USA; ‡RIKEN Center for Computational Science, Kobe, Japan; †Nagoya University, Nagoya, Japan
Supported by NSF award #2103845, OLCF CSC427 allocation, and IBM Global University Program

2.

Analytics for Neural Networks (A4NN)
Design efficient workflows to reduce the time and resources required to generate accurate and efficient neural network architectures.
University of Tennessee: Georgia Channing, Paula Olaya, Silvina Caino-Lores, Ria Patel, Ariel Rorabaugh, Seoyoung An, Michela Taufer, Catherine Schuman
RIKEN: Osamu Miyashita, Florence Tama

3.

An Explosion of Scientific Data from Experiments

4.

Generating Data through XFEL Experiments
X-ray Free Electron Laser (XFEL) beams create 2D protein diffraction (PD) patterns that reveal properties of the 3D protein structure. The experiments generate large image datasets that embed protein structures.

5.

Extracting Conformational Structural Properties
A conformation is the shape adopted by a protein; it arises from the rotation of the protein's atoms around one or more single bonds.
Conformation A: Φ, θ, Ψ = 24°, 151°, 346°
Conformation B: Φ, θ, Ψ = 34°, 139°, 106°
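The following toy example (not from the talk; a minimal sketch under the assumption that a conformation can be summarized by its three dihedral angles) shows how the two conformations above could be represented and compared with a circular angle distance:

```python
import math

# Dihedral-angle triples (phi, theta, psi) in degrees, taken from the slide.
conformation_a = (24.0, 151.0, 346.0)
conformation_b = (34.0, 139.0, 106.0)

def circular_diff(a_deg, b_deg):
    """Smallest absolute difference between two angles, in degrees."""
    d = abs(a_deg - b_deg) % 360.0
    return min(d, 360.0 - d)

def conformation_distance(c1, c2):
    """Root-mean-square circular distance over the three dihedral angles."""
    diffs = [circular_diff(a, b) for a, b in zip(c1, c2)]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))

print(f"{conformation_distance(conformation_a, conformation_b):.1f} degrees")  # ~69.9
```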

6.

Different Beam Intensity and Noise in XFEL Images
Images have different granularity depending on the beam intensity; different intensities embed different amounts of noise.
● Low beam intensity: 1 × 10¹⁴ photons/μm²/pulse
● Medium beam intensity: 1 × 10¹⁵ photons/μm²/pulse
● High beam intensity: 1 × 10¹⁶ photons/μm²/pulse
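As a hedged illustration of why higher intensity means less noise (a sketch assuming an idealized photon-counting detector with Poisson shot noise, not the talk's simulation pipeline):

```python
import numpy as np

rng = np.random.default_rng(0)

# Relative shot noise of a pixel with mean photon count N scales as 1/sqrt(N),
# so scaling up the beam intensity reduces the relative noise in the image.
ideal_pattern = np.full((64, 64), 1.0)  # normalized, noise-free intensity

for scale, label in [(1e0, "low"), (1e1, "medium"), (1e2, "high")]:
    expected_counts = ideal_pattern * scale
    noisy = rng.poisson(expected_counts)
    relative_noise = noisy.std() / expected_counts.mean()
    print(f"{label:6s} beam: relative noise ~ {relative_noise:.3f}")
```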

7.

Using Neural Networks and Neural Architecture Search
Neural networks (NNs) can extract information from scientific data: protein images → protein properties:
● Protein type
● Orientation
● Structure
Olaya et al., "Identifying Structural Properties of Proteins from X-ray Free Electron Laser Diffraction Patterns". In Proceedings of the 18th IEEE International Conference on e-Science (eScience), pages 1–10, Salt Lake City, Utah, USA, October 2022. IEEE Computer Society.

8.

Using Neural Networks and Neural Architecture Search
Neural networks (NNs) can extract information from scientific data: protein images → protein properties:
● Protein type
● Orientation
● Structure
Custom NNs are needed for each dataset.
Olaya et al., "Identifying Structural Properties of Proteins from X-ray Free Electron Laser Diffraction Patterns". In Proceedings of the 18th IEEE International Conference on e-Science (eScience), pages 1–10, Salt Lake City, Utah, USA, October 2022. IEEE Computer Society.

9.

Using Neural Networks and Neural Architecture Search
Neural networks (NNs) can extract information from scientific data: protein images → protein properties:
● Protein type
● Orientation
● Structure
Custom NNs are needed for each dataset.
Neural Architecture Search (NAS) can automatically find an optimal NN for a given dataset.
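To make the NAS idea concrete, here is a minimal sketch of the generic search loop (illustrative only; it is not NSGA-Net, and the toy search space and scoring function are invented for the example):

```python
import random

def random_architecture():
    """Toy search space: depth and width of a network."""
    return {"layers": random.randint(1, 8), "units": random.choice([32, 64, 128])}

def estimate_performance(arch):
    """Stand-in estimator; a real NAS trains and evaluates the candidate NN."""
    return 1.0 / (1 + abs(arch["layers"] - 4)) + arch["units"] / 1000.0

def naive_nas(budget=20):
    best_arch, best_score = None, float("-inf")
    for _ in range(budget):
        arch = random_architecture()        # search strategy proposes a candidate
        score = estimate_performance(arch)  # estimation strategy scores it
        if score > best_score:
            best_arch, best_score = arch, score
    return best_arch, best_score

print(naive_nas())
```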

10.

Monolithic Implementations
● NAS algorithms and their implementations are monolithic and couple the search and estimation strategies → reduces the possibility of modular optimizations [1]
(Figure: search strategy coupled with the performance estimation strategy.)
[1] C. White et al. Neural Architecture Search: Insights from 1000 Papers. arXiv:2301.08727. 2023.
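A sketch of the decoupling that A4NN argues for (an assumption about the design, not the project's actual code): putting the search and estimation strategies behind separate interfaces lets either side be optimized or swapped independently.

```python
from typing import Any, Protocol

class SearchStrategy(Protocol):
    def propose(self, history: list) -> Any: ...

class PerformanceEstimator(Protocol):
    def score(self, architecture: Any) -> float: ...

def run_nas(search: SearchStrategy, estimator: PerformanceEstimator, budget: int):
    """Generic loop: any search strategy can pair with any estimator
    (full training, early stopping, or a learning-curve predictor)."""
    history: list = []
    for _ in range(budget):
        arch = search.propose(history)
        fitness = estimator.score(arch)
        history.append((arch, fitness))
    return max(history, key=lambda pair: pair[1])
```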

11.

Enormous Energy Consumption
● NAS algorithms and their implementations are monolithic and couple the search and estimation strategies → reduces the possibility of modular optimizations [1]
● NAS workflows consume enormous amounts of energy and time by training non-optimal networks for long training periods → limits the accessibility of NAS for researchers with compute limitations [1]
(Figure: search strategy coupled with the performance estimation strategy.)
[1] C. White et al. Neural Architecture Search: Insights from 1000 Papers. arXiv:2301.08727. 2023.

12.

Ever-Growing Energy Consumption
(Figure: training compute (FLOPs) of milestone machine learning systems by publication year, 1952–2016, spanning the pre-deep-learning, deep-learning, and large-scale-model eras.)
Sevilla, Jaime, Heim, Lennart, Ho, Anson, Besiroglu, Tamay, Hobbhahn, Marius, and Villalobos, Pablo (2022). Compute Trends Across Three Eras of Machine Learning.

13.

Obscured NN Evolution and Metadata
● NAS algorithms and their implementations are monolithic and couple the search and estimation strategies → reduces the possibility of modular optimizations [1]
● NAS workflows consume enormous amounts of energy and time by training non-optimal networks for long training periods → limits the accessibility of NAS for researchers with compute limitations [1]
● Search strategies obscure the evolution of NN architectures and their learning histories → hinders the explainability of resulting NNs [1]
(Figure: search strategy coupled with the performance estimation strategy.)
[1] C. White et al. Neural Architecture Search: Insights from 1000 Papers. arXiv:2301.08727. 2023.

14.

Analytics for Neural Networks (A4NN)
We propose to augment NAS with our A4NN workflow:
● Transform NAS implementations from monolithic software tools into a flexible, modular workflow
● Generate adaptable NN fitness predictions
● Build an open-access NN data commons
● Assess A4NN with a dataset of simulated XFEL images
(Figure: XFEL images at high, medium, and low beam intensity.)

15.

Analytics for Neural Networks (A4NN)
We propose to augment NAS with our A4NN workflow:
● Transform NAS implementations from monolithic software tools into a flexible, modular workflow
● Generate adaptable NN fitness predictions
● Build an open-access NN data commons
● Assess A4NN with a dataset of simulated XFEL images
(Figure: XFEL image at low beam intensity.)

16.

The A4NN Workflow: Input Data
Conformations from protein diffraction patterns generated through simulations of XFEL experiments.
Balanced conformation classes for each beam intensity → 80/20 train-test split of 63,508/15,876 images
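A minimal sketch of a class-balanced 80/20 split like the one above (assuming labeled image file names; this is not the A4NN data pipeline):

```python
import random
from collections import defaultdict

def stratified_split(samples, labels, train_frac=0.8, seed=0):
    """Split samples into train/test while preserving per-class balance."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for sample, label in zip(samples, labels):
        by_class[label].append(sample)
    train, test = [], []
    for label, items in by_class.items():
        rng.shuffle(items)
        cut = int(len(items) * train_frac)
        train += [(s, label) for s in items[:cut]]
        test += [(s, label) for s in items[cut:]]
    return train, test

# Example with two hypothetical conformation classes:
samples = [f"img_{i}.png" for i in range(10)]
labels = ["A"] * 5 + ["B"] * 5
train, test = stratified_split(samples, labels)
print(len(train), len(test))  # 8 2
```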

17.

The A4NN Workflow: NSGA-Net (NAS)
Select a NAS to train NNs from a specified search space.
*NSGA-Net optimizes for minimal FLOPs

18.

The A4NN Workflow: NSGA-Net (NAS)
Select a NAS to train NNs from a specified search space.
*NSGA-Net optimizes for minimal FLOPs
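Since NSGA-Net performs multi-objective optimization, candidates are compared on more than one objective at once. The sketch below shows the Pareto-dominance comparison that underlies NSGA-style selection (illustrative only; NSGA-Net additionally performs non-dominated sorting and crowding-distance selection over its own objectives):

```python
def dominates(a, b):
    """a dominates b if it is no worse on every objective and strictly
    better on at least one. Candidates are (error, flops); lower is better."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(candidates):
    """Keep only candidates that no other candidate dominates."""
    return [c for c in candidates
            if not any(dominates(other, c) for other in candidates if other != c)]

# Hypothetical population: (classification error, FLOPs in millions)
population = [(0.05, 400), (0.08, 120), (0.05, 500), (0.20, 100)]
print(pareto_front(population))  # [(0.05, 400), (0.08, 120), (0.20, 100)]
```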

19.

The A4NN Workflow: Prediction Engine
Pull partially trained NNs and curate a parametric function to model the NNs' fitness learning curves.
*NSGA-Net optimizes for minimal FLOPs
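A minimal sketch of this kind of learning-curve modeling, assuming a power-law parametric form y(t) = a - b·t⁻ᶜ (one of the curve shapes surveyed by Viering and Loog; the engine's actual functions may differ):

```python
import numpy as np
from scipy.optimize import curve_fit

def power_law(t, a, b, c):
    """Saturating power-law learning curve: fitness approaches a as t grows."""
    return a - b * np.power(t, -c)

epochs = np.arange(1, 8)  # a partially trained NN: only 7 epochs observed
accuracy = np.array([0.42, 0.55, 0.62, 0.66, 0.69, 0.71, 0.72])

params, _ = curve_fit(power_law, epochs, accuracy, p0=[0.9, 0.5, 1.0], maxfev=5000)
predicted_final = power_law(20, *params)  # extrapolate to the end of training
print(f"predicted fitness at epoch 20: {predicted_final:.3f}")
```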

20.

The A4NN Workflow: NN Data Commons
Record the NN's behavior throughout training so that the optimal model is reproducible and explainable.
*NSGA-Net optimizes for minimal FLOPs
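A hypothetical sketch of such per-epoch record-keeping (the file layout and field names are assumptions, not the A4NN data commons schema):

```python
import json
import time
from pathlib import Path

def record_epoch(commons_dir, nn_id, epoch, metrics, architecture):
    """Append one epoch's metadata to an NN's record in the data commons."""
    Path(commons_dir).mkdir(exist_ok=True)
    record_path = Path(commons_dir) / f"{nn_id}.json"
    history = (json.loads(record_path.read_text()) if record_path.exists()
               else {"nn_id": nn_id, "architecture": architecture, "epochs": []})
    history["epochs"].append({"epoch": epoch, "timestamp": time.time(), **metrics})
    record_path.write_text(json.dumps(history, indent=2))

record_epoch("nn_commons", "nn_0042", epoch=3,
             metrics={"val_accuracy": 0.91, "loss": 0.31},
             architecture={"phases": "110101"})
```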

21.

Analytics for Neural Networks (A4NN)
We propose to augment NAS with our A4NN workflow:
● Transform NAS implementations from monolithic software tools into a flexible, modular workflow
● Generate adaptable NN fitness predictions
● Build an open-access NN data commons
● Assess A4NN with a dataset of simulated XFEL images

22.

Predictive Engine for NNs (PENGUIN)
Predict NN performance to enable early stopping and expedite architecture optimization.

23.

Predictive Engine for NNs (PENGUIN)
● Generate fast, dynamic fitness predictions to inform the NAS
● Flexible learning-curve modeling with parametric functions
● Prediction validation over multiple epochs
Components: Prediction Engine, Parametric Modeling, Prediction Analyzer
Users can plug in any parametric function to use for modeling, even a custom function!
Tom Viering and Marco Loog, "The Shape of Learning Curves: A Review," 2021, arXiv preprint.
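A sketch of what that pluggability could look like (an assumption about the interface, not PENGUIN's API): the engine only needs a callable of the form f(t, *params) plus an initial guess, so a user-supplied curve drops in as easily as a built-in one.

```python
import numpy as np
from scipy.optimize import curve_fit

def fit_and_extrapolate(f, p0, epochs, fitness, target_epoch):
    """Fit any parametric learning curve and extrapolate to target_epoch."""
    params, _ = curve_fit(f, epochs, fitness, p0=p0, maxfev=5000)
    return f(target_epoch, *params)

def custom_curve(t, plateau, scale, rate):
    """A user-supplied (hypothetical) curve: exponential saturation."""
    return plateau - scale * np.exp(-rate * t)

epochs = np.arange(1, 6)
fitness = np.array([0.50, 0.63, 0.70, 0.74, 0.76])
print(fit_and_extrapolate(custom_curve, [0.8, 0.5, 0.5], epochs, fitness, 20))
```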

24.

Predictive Engine: An Example
Prediction engine converges to a fitness prediction for epoch 20 (i.e., end of training).
Ariel Keller Rorabaugh, Silvina Caino-Lores, Travis Johnston, and Michela Taufer. Building High-Throughput Neural Architecture Search Workflows via a Decoupled Fitness Prediction Engine. IEEE Trans. Parallel Distributed Syst. (TPDS), 33(11):2913–2926, 2022.

25.

Predictive Engine: An Example
Prediction engine converges to a fitness prediction for epoch 20 (i.e., end of training). With this prediction, the workflow can:
● Terminate training of an NN
● Generate more NNs based on a top-performing NN
● Save time and resources
Ariel Keller Rorabaugh, Silvina Caino-Lores, Travis Johnston, and Michela Taufer. Building High-Throughput Neural Architecture Search Workflows via a Decoupled Fitness Prediction Engine. IEEE Trans. Parallel Distributed Syst. (TPDS), 33(11):2913–2926, 2022.
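One way such a convergence check could drive early termination (the window size and tolerance here are invented for illustration, not PENGUIN's actual rule):

```python
def should_terminate(predictions, window=3, tol=0.005):
    """predictions[i] is the end-of-training fitness predicted after epoch i+1.
    Stop once the last `window` predictions agree to within `tol`."""
    if len(predictions) < window:
        return False
    recent = predictions[-window:]
    return max(recent) - min(recent) <= tol

preds = [0.81, 0.86, 0.842, 0.845, 0.846]  # predictions after epochs 1..5
if should_terminate(preds):
    print("stop training early; predicted final fitness:", preds[-1])
```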

26.

Analytics for Neural Networks (A4NN)
We propose to augment NAS with our A4NN workflow:
● Transform NAS implementations from monolithic software tools into a flexible, modular workflow
● Generate adaptable NN fitness predictions
● Build an open-access NN data commons
● Assess A4NN with a dataset of simulated XFEL images

27.

Tracker: Collecting Metadata
Analyze discarded NNs for similarities → understanding NN evolution and reproducibility

28.

Visualizing Similarities
NN architectures generated with NSGA-Net are represented as a graphic and as a binary array of connectivity phases → store matching subsequences and their lengths.
(Figure: scatter plot of matching network subsequences from the NN commons; axes: subsequence length vs. validation accuracy; counts of unique NN structures with each subsequence.)
Seoyoung An, Georgia Channing, Catherine Schuman, and Michela Taufer. Visual Analytics Interactive Tool for Neural Network Archaeology. In Proceedings of the IEEE Cluster Conference (CLUSTER), 2023.
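As a hedged illustration of the matching-subsequence idea (a standard dynamic-programming approach, not necessarily the paper's exact algorithm), the sketch below finds the longest matching contiguous subsequence between two hypothetical binary connectivity encodings:

```python
def longest_common_substring(a: str, b: str):
    """Longest contiguous subsequence shared by a and b, via dynamic programming."""
    best_len, best_end = 0, 0
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
                if dp[i][j] > best_len:
                    best_len, best_end = dp[i][j], i
    return a[best_end - best_len:best_end], best_len

nn1 = "110100111010"  # hypothetical connectivity-phase encodings
nn2 = "100111010001"
print(longest_common_substring(nn1, nn2))  # ('100111010', 9)
```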

29.

Visualizing Similarities
NN architectures generated with NSGA-Net are represented as a graphic and as a binary array of connectivity phases → store matching subsequences and their lengths.
(Figure: scatter plot of matching network subsequences from the NN commons; axes: subsequence length vs. validation accuracy; counts of unique NN structures with each subsequence. Second panel: analysis of subsequences in highly accurate NNs; the network structures of selected networks in our NN commons, in ascending validation accuracy; dashed lines represent matching subsequences.)
Seoyoung An, Georgia Channing, Catherine Schuman, and Michela Taufer. Visual Analytics Interactive Tool for Neural Network Archaeology. In Proceedings of the IEEE Cluster Conference (CLUSTER), 2023.

30.

Data Commons: Accessing Metadata

31.

Analytics for Neural Networks (A4NN)
We propose to augment NAS with our A4NN workflow:
● Transform NAS implementations from monolithic software tools into a flexible, modular workflow
● Generate adaptable NN fitness predictions
● Build an open-access NN data commons
● Assess A4NN with a dataset of simulated XFEL images
(Figure: XFEL images at high, medium, and low beam intensity.)

32.

Performance Results: Metrics of Success
With A4NN, we strive for equal task performance with improved efficiency. We evaluate our workflow by:
● Accuracy (maximize): achieving results on par with SOTA
● FLOPs (minimize): proxy for energy consumption per model
● Wall-time (minimize): proxy for energy consumption of the workflow
Balanced conformation classes for each beam intensity → 80/20 train-test split of 63,508/15,876 images
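A sketch of how the three metrics could be computed under simple assumptions: accuracy from label agreement, FLOPs from the standard 2·m·n multiply-add count per dense layer (a coarse analytical proxy, not a profiler measurement), and wall-time from a monotonic clock.

```python
import time

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def dense_flops(layer_sizes):
    """Approximate multiply-add FLOPs for one forward pass of dense layers."""
    return sum(2 * m * n for m, n in zip(layer_sizes, layer_sizes[1:]))

start = time.perf_counter()
# ... run the workflow under evaluation here ...
wall_time = time.perf_counter() - start

print(accuracy([0, 1, 1, 0], [0, 1, 0, 0]))  # 0.75
print(dense_flops([784, 128, 10]))           # 203264
```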

33.

Performance Results: Accuracy and FLOPs
(Figure: accuracy, the higher the better, and FLOPs, the lower the better, for A4NN vs. standalone NAS.)
Comparable accuracy and FLOPs between standalone NAS and A4NN

34.

Performance Results: Saved Wall-Time
Hardware in use:
● DARWIN cluster at U. Delaware
● 1 and 4 NVIDIA V100 GPUs
Configurations:
● NSGA-Net
● A4NN (1 GPU)
● A4NN (4 GPUs)
Data:
● Low intensity
● Medium intensity
● High intensity
5.5x faster wall-times when using A4NN compared to standalone NAS

35.

A4NN Results Compared to State of the Art
We compare our results on 4 GPUs to a SOTA workflow for this dataset, called XPSI.

Beam   | Metric    | XPSI   | A4NN
Low    | Wall-time | 15.5 h | 12.1 h
Low    | Accuracy  | 92%    | 97.8%
Medium | Wall-time | 15.5 h | 9.2 h
Medium | Accuracy  | 99%    | 99.9%
High   | Wall-time | 15.5 h | 9.5 h
High   | Accuracy  | 100%   | 100%

A4NN trains faster than XPSI, and its accuracies match or outperform XPSI's.
Paula Olaya, Silvina Caino-Lores, Vanessa Lama, Ria Patel, Ariel Rorabaugh, Osamu Miyashita, Florence Tama, and Michela Taufer. Identifying Structural Properties of Proteins from X-ray Free Electron Laser Diffraction Patterns. In Proceedings of the 18th IEEE International Conference on e-Science (eScience), pages 1–10, Salt Lake City, Utah, USA, October 2022. IEEE Computer Society.

36.

Lessons Learned
Strategies for successful optimization of NN training include:
● Decouple search and estimation strategies
● Minimize energy and training time
● Explain NN performance by examining NNs' training histories
With the A4NN workflow, we deliver:
● A composable, reusable deep-learning workflow for scientific datasets
● An efficient prediction methodology for any NAS
● 54 GB of metadata and model checkpoints for future study

37.

Data Commons and Reproducibility
Contact: [email protected]