Slide Overview
Iiguni Laboratory, Department of Systems Innovation, Graduate School of Engineering Science, Osaka University. Second-year master's student, mainly interested in machine learning and deep learning.
Structure Learning via Deep Unfolding for Compressed Sensing Algorithms
Koshi Nagahisa, Iiguni Lab, Graduate School of Engineering Science, Osaka University
Outline • Introduction • Related work • Proposed method • Experiment • Conclusion 2/17
Introduction | Compressed Sensing [1] 3/17
Estimate a sparse signal $\boldsymbol{x}^* \in \mathbb{R}^N$ (where most elements are zero) from the data $\boldsymbol{y} \in \mathbb{R}^M$ obtained by the linear observation $\boldsymbol{y} = \boldsymbol{A}\boldsymbol{x}^* + \boldsymbol{e}$ ($M < N$), where $\boldsymbol{y}$ is the observation vector and $\boldsymbol{x}^*$ is the sparse vector we want to estimate.
Application: MRI (Magnetic Resonance Imaging) [2]
[Figure: an image recovered by compressed sensing from 1/4 of the data compared with an image recovered from the complete data.]
[1] D. L. Donoho, "Compressed sensing," IEEE Transactions on Information Theory, vol. 52, no. 4, pp. 1289-1306, April 2006, doi: 10.1109/TIT.2006.871582.
[2] M. Lustig, D. L. Donoho, J. M. Santos and J. M. Pauly, "Compressed Sensing MRI," IEEE Signal Processing Magazine, vol. 25, no. 2, pp. 72-82, March 2008, doi: 10.1109/MSP.2007.914728.
Optimization problem for Compressed Sensing 4/17
Compressed sensing can be performed by solving the following optimization problem:

$\hat{\boldsymbol{x}} = \operatorname*{argmin}_{\boldsymbol{x} \in \mathbb{R}^N} \; \frac{1}{2}\|\boldsymbol{y} - \boldsymbol{A}\boldsymbol{x}\|_2^2 + \lambda \|\boldsymbol{x}\|_1$

where $\boldsymbol{y} \in \mathbb{R}^M$, $\boldsymbol{A} \in \mathbb{R}^{M \times N}$, $\boldsymbol{x} \in \mathbb{R}^N$, and $M < N$.
• Data fidelity term $\frac{1}{2}\|\boldsymbol{y} - \boldsymbol{A}\boldsymbol{x}\|_2^2$: the aim is to minimize the difference between $\boldsymbol{y}$ and $\boldsymbol{A}\boldsymbol{x}$ so that $\boldsymbol{y} \approx \boldsymbol{A}\boldsymbol{x}$.
• Regularization term $\lambda\|\boldsymbol{x}\|_1$: we estimate an $\boldsymbol{x}$ that is sparse by minimizing its L1 norm, $\|\boldsymbol{x}\|_1 = \sum_{i=1}^{N} |x_i|$.
• The regularization parameter $\lambda$ adjusts the influence of the data fidelity term and the regularization term.
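To make the objective concrete, here is a minimal NumPy sketch that evaluates it for a candidate $\boldsymbol{x}$; the function name `lasso_objective` and its argument names are my own, not from the slides.

```python
import numpy as np

def lasso_objective(y, A, x, lam):
    """Evaluate (1/2)||y - Ax||_2^2 + lam * ||x||_1 for a candidate x."""
    data_fidelity = 0.5 * np.sum((y - A @ x) ** 2)
    regularization = lam * np.sum(np.abs(x))
    return data_fidelity + regularization
```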
ISTA (Iterative Shrinkage Thresholding Algorithm) 5/17
ISTA gives the solution to the optimization problem through iterative computations:

$\boldsymbol{r}^{(t)} = \boldsymbol{x}^{(t)} - \gamma \boldsymbol{A}^{\top}(\boldsymbol{A}\boldsymbol{x}^{(t)} - \boldsymbol{y})$  ← gradient descent step, derived from $\frac{1}{2}\|\boldsymbol{y} - \boldsymbol{A}\boldsymbol{x}\|_2^2$
$\boldsymbol{x}^{(t+1)} = S_{\lambda\gamma}(\boldsymbol{r}^{(t)})$  ← shrinkage step, derived from $\lambda\|\boldsymbol{x}\|_1$

where $\gamma$ is the step size and $S_{\lambda\gamma}(\cdot)$ is the soft-thresholding function.
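Below is a minimal NumPy sketch of the two ISTA steps above with a fixed step size, as on this slide; the helper names (`soft_threshold`, `ista`) and the default iteration count are my own choices.

```python
import numpy as np

def soft_threshold(r, tau):
    """Soft-thresholding S_tau(r) = sign(r) * max(|r| - tau, 0), applied element-wise."""
    return np.sign(r) * np.maximum(np.abs(r) - tau, 0.0)

def ista(y, A, lam, gamma, n_iter=100):
    """Plain ISTA with a fixed step size gamma."""
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        r = x - gamma * A.T @ (A @ x - y)   # gradient descent step
        x = soft_threshold(r, lam * gamma)  # shrinkage step
    return x
```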
ISTA (Iterative Shrinkage Thresholding Algorithm) 6/17
The convergence properties of ISTA are greatly affected by the step size $\gamma$.
[Figure: convergence curves of ISTA for $\lambda = 10$ with different step sizes; some choices converge faster, while others diverge.]
Related Work | Deep Unfolding [3] 7/17
Applying deep learning to signal reconstruction algorithms can yield significantly better convergence properties than conventional methods.
[Figure: the signal-flow graph of an iterative algorithm (process A → process B → process C) viewed as a neural network.]
[3] K. Gregor and Y. LeCun, "Learning fast approximations of sparse coding," in Proc. 27th International Conference on Machine Learning, Jun. 2010, pp. 399–406.
Deep Unfolding 8/17
By unfolding the iterative algorithm in the time direction, we can use deep learning techniques to properly learn the parameters $\gamma^{(0)}, \gamma^{(1)}, \ldots$
[Figure: ISTA unfolded into a feed-forward network, LISTA (Learned ISTA), with input $\boldsymbol{y}$, output $\hat{\boldsymbol{x}}$, and a loss function.]
We can learn the step size for each iteration through backpropagation.
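As an illustration of this unfolding, the following PyTorch sketch unrolls ISTA over T layers and makes only the per-layer step sizes learnable, which is how LISTA is used in these slides; the class name `LISTA`, the initial value, and the single-vector interface are my assumptions, not the exact network of [3].

```python
import torch
import torch.nn as nn

class LISTA(nn.Module):
    """ISTA unfolded over T iterations with a learnable step size per layer.
    Only the step sizes gamma^(0), ..., gamma^(T-1) are trained here."""
    def __init__(self, A, lam, T=100, gamma_init=1e-3):
        super().__init__()
        self.register_buffer("A", torch.as_tensor(A, dtype=torch.float32))
        self.lam = lam
        self.gamma = nn.Parameter(torch.full((T,), gamma_init))

    def forward(self, y):
        x = torch.zeros(self.A.shape[1], device=y.device)
        for t in range(self.gamma.shape[0]):
            g = self.gamma[t]
            r = x - g * self.A.t() @ (self.A @ x - y)                          # gradient step
            x = torch.sign(r) * torch.clamp(r.abs() - self.lam * g, min=0.0)   # shrinkage step
        return x
```

Training the step sizes by backpropagation then amounts to minimizing a loss on the network output with a standard optimizer.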
Deep Unfolding 9/17
The step sizes in LISTA take appropriate values for each iteration, resulting in faster convergence than ISTA with a fixed step size.
[Figure: convergence comparison of LISTA and ISTA for $\lambda = 10$.]
Deep Unfolding 10/17
Problem: conventional deep unfolding has a fixed network structure with no flexibility, and it is unclear whether this fixed structure has the best convergence properties.
[Figure: conventional deep unfolding with a fixed structure versus an ideal deep unfolding in which the structure is learned together with the step sizes; a structure different from the original one may perform better.]
NAS (Neural Architecture Search) [4] 11/17
NAS learns the structure of the neural network itself along with its parameters.
[Figure: the conventional workflow (design the network structure → set hyperparameters → train the parameters) versus the NAS workflow, in which the structure design is part of the learning.]
Naive NAS constructs many models and selects the optimal one, achieving high performance but at an enormous computational cost.
[4] B. Zoph and Q. V. Le, "Neural architecture search with reinforcement learning," arXiv preprint arXiv:1611.01578, 2016.
DARTS (Differentiable Architecture Search) [5] 12/17
Making the search space differentiable enables efficient execution of NAS. Each node computes a weighted mixture of candidate operations (e.g., conv 3×3, pooling, ReLU):

$\bar{o}(x) = \sum_{k=1}^{3} w_k \, o_k(x), \qquad w_k = \frac{\exp(\alpha_k)}{\sum_{k'=1}^{3}\exp(\alpha_{k'})} \quad (w_1 + w_2 + w_3 = 1)$

where $\alpha_k$ is the structure parameter for operation $o_k$.
[5] H. Liu, K. Simonyan, and Y. Yang, "DARTS: Differentiable Architecture Search," International Conference on Learning Representations, 2018.
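A minimal PyTorch sketch of this softmax-weighted mixture of operations follows; the class name `MixedOp` and its zero-initialized structure parameters are illustrative assumptions, not the full DARTS cell of [5].

```python
import torch
import torch.nn as nn

class MixedOp(nn.Module):
    """DARTS-style mixed operation: the output is a softmax-weighted sum of
    candidate operations, with weights derived from structure parameters alpha."""
    def __init__(self, ops):
        super().__init__()
        self.ops = nn.ModuleList(ops)
        self.alpha = nn.Parameter(torch.zeros(len(ops)))  # structure parameters alpha_k

    def forward(self, x):
        w = torch.softmax(self.alpha, dim=0)  # w_k = exp(alpha_k) / sum_k' exp(alpha_k')
        return sum(w_k * op(x) for w_k, op in zip(w, self.ops))
```

Because the weights are a softmax of the alpha parameters, the structure choice becomes differentiable and can be trained by gradient descent alongside the ordinary network parameters.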
Proposed method 13/17
We apply DARTS to deep unfolding in order to learn structures more suitable for compressed sensing.
Proposed algorithm: AS-ISTA (Architecture Searched ISTA)

$\boldsymbol{r}^{(t)} = w_{r,1}^{(t)} f(\boldsymbol{x}^{(t)}) + w_{r,2}^{(t)} g(\boldsymbol{x}^{(t)})$
$\boldsymbol{x}^{(t+1)} = w_{x,1}^{(t)} f(\boldsymbol{r}^{(t)}) + w_{x,2}^{(t)} g(\boldsymbol{r}^{(t)})$

$w_{r,k}^{(t)} = \frac{\exp(\beta_{r,k}^{(t)})}{\sum_{k'=1}^{2}\exp(\beta_{r,k'}^{(t)})}, \qquad w_{x,k}^{(t)} = \frac{\exp(\beta_{x,k}^{(t)})}{\sum_{k'=1}^{2}\exp(\beta_{x,k'}^{(t)})} \quad (k = 1, 2)$

Gradient descent step: $f(\boldsymbol{x}) = \boldsymbol{x} - \gamma^{(t)} \boldsymbol{A}^{\top}(\boldsymbol{A}\boldsymbol{x} - \boldsymbol{y})$
Shrinkage step: $g(\boldsymbol{x}) = S_{\lambda\gamma^{(t)}}(\boldsymbol{x})$
Structure parameters: $\beta_{r,1}^{(t)}, \beta_{r,2}^{(t)}, \beta_{x,1}^{(t)}, \beta_{x,2}^{(t)}$; step size: $\gamma^{(t)}$
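The update above can be sketched as a single PyTorch layer as follows; the class name `ASISTALayer` and the initial values are my own, and this is an illustrative reading of the slide's equations rather than a definitive implementation.

```python
import torch
import torch.nn as nn

class ASISTALayer(nn.Module):
    """One AS-ISTA iteration: the gradient step f and the shrinkage step g are
    combined in parallel with softmax weights derived from structure parameters beta."""
    def __init__(self, A, lam, gamma_init=1e-3):
        super().__init__()
        self.register_buffer("A", torch.as_tensor(A, dtype=torch.float32))
        self.lam = lam
        self.gamma = nn.Parameter(torch.tensor(gamma_init))  # step size gamma^(t)
        self.beta_r = nn.Parameter(torch.zeros(2))            # beta_{r,1}, beta_{r,2}
        self.beta_x = nn.Parameter(torch.zeros(2))            # beta_{x,1}, beta_{x,2}

    def f(self, x, y):
        """Gradient descent step."""
        return x - self.gamma * self.A.t() @ (self.A @ x - y)

    def g(self, x):
        """Shrinkage (soft-thresholding) step."""
        tau = self.lam * self.gamma
        return torch.sign(x) * torch.clamp(x.abs() - tau, min=0.0)

    def forward(self, x, y):
        w_r = torch.softmax(self.beta_r, dim=0)
        w_x = torch.softmax(self.beta_x, dim=0)
        r = w_r[0] * self.f(x, y) + w_r[1] * self.g(x)
        x_next = w_x[0] * self.f(r, y) + w_x[1] * self.g(r)
        return x_next
```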
Proposed method 14/17
Structure of AS-ISTA: in each layer the gradient descent step $f$ and the shrinkage step $g$ are connected in parallel, with input $\boldsymbol{y}$ and a loss function at the output. We learn the structure parameters and the step sizes alternately.
[Figure: unrolled AS-ISTA network with structure parameters and step sizes marked.]
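A hedged sketch of this alternating training loop is shown below; the function name `train_alternating`, the choice of Adam, and the assumption that `model` is a full unrolled network mapping $\boldsymbol{y}$ directly to an estimate $\hat{\boldsymbol{x}}$ are all my own.

```python
import torch

def train_alternating(model, data_loader, loss_fn, n_epochs=10, lr=1e-3):
    """Alternate between updating the step sizes (gamma) and the structure
    parameters (beta) of an unrolled AS-ISTA-style model."""
    betas  = [p for n, p in model.named_parameters() if "beta" in n]
    gammas = [p for n, p in model.named_parameters() if "gamma" in n]
    opt_beta  = torch.optim.Adam(betas,  lr=lr)
    opt_gamma = torch.optim.Adam(gammas, lr=lr)
    for _ in range(n_epochs):
        for y in data_loader:
            for opt in (opt_gamma, opt_beta):  # alternate: step sizes, then structure
                opt.zero_grad()
                x_hat = model(y)
                loss_fn(y, x_hat).backward()
                opt.step()
```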
Experiment | Simulation Settings 15/17
Generation of training data
• $\boldsymbol{y} = \boldsymbol{A}\boldsymbol{x}^* + \boldsymbol{e}$ with $\boldsymbol{A} \in \mathbb{R}^{75 \times 150}$, entries of $\boldsymbol{A} \sim N(0,1)$
• Noise $\boldsymbol{e} \sim N(0, \sigma^2)$ (signal-to-noise ratio 10 dB)
• Percentage of non-zero components: 8%; non-zero elements $\sim N(0,1)$
Settings of deep learning
• Maximum number of iterations: 100
• Loss function: $\frac{1}{2}\|\boldsymbol{y} - \boldsymbol{A}\hat{\boldsymbol{x}}\|_2^2 + \lambda\|\hat{\boldsymbol{x}}\|_1$ (the objective function of ISTA)
• Initial values of the structure parameters: $\beta_{r,1}^{(t)} = \beta_{r,2}^{(t)} = \beta_{x,1}^{(t)} = \beta_{x,2}^{(t)} = 0$, so that $w_{r,1}^{(t)} = w_{r,2}^{(t)} = w_{x,1}^{(t)} = w_{x,2}^{(t)} = 0.5$
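The data-generation settings above might be reproduced with a sketch like the following; the helper name `make_training_sample` and the exact convention for deriving the noise variance from the 10 dB SNR are my assumptions.

```python
import numpy as np

def make_training_sample(M=75, N=150, sparsity=0.08, snr_db=10.0, rng=None):
    """Generate (y, A, x_star) following the slide's settings: A ~ N(0,1),
    8% non-zero entries ~ N(0,1), additive Gaussian noise at 10 dB SNR."""
    rng = np.random.default_rng() if rng is None else rng
    A = rng.standard_normal((M, N))
    x_star = np.zeros(N)
    support = rng.choice(N, size=int(sparsity * N), replace=False)
    x_star[support] = rng.standard_normal(len(support))
    signal = A @ x_star
    noise_var = np.mean(signal ** 2) / (10.0 ** (snr_db / 10.0))  # assumed SNR convention
    e = rng.standard_normal(M) * np.sqrt(noise_var)
    return signal + e, A, x_star
```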
Result 16/17
We compare the estimation accuracy (MSE: mean squared error) at each iteration of ISTA, LISTA (only the step size $\gamma$ learned by conventional deep unfolding), and AS-ISTA (proposed).
[Figure: MSE versus iteration for $\lambda = 10$, $\lambda = 1.0$, and $\lambda = 0.1$.]
AS-ISTA converges most quickly to the ISTA estimated solution. It also converges within 100 iterations for small regularization parameters $\lambda$. This shows that the proposed deep unfolding can learn parameters with better convergence properties than conventional deep unfolding.
Conclusion 17/17
Summary
• Deep learning techniques can be applied to ISTA, an algorithm for compressed sensing, to learn parameters with good convergence properties.
• DARTS achieves structure learning by connecting operations in parallel and learning their weights.
• Compared to conventional deep unfolding, the proposed method incorporating structure learning (DARTS) converges faster and shows good convergence properties for various $\lambda$.
Future work
• I will apply the proposed method to other compressed sensing algorithms to see whether proper structure learning can be obtained there as well.
• I will examine whether the results obtained on this sparse signal reconstruction problem can be applied to actual image processing problems.
Appendix | Comparison of learned step sizes 18/17
The step size of AS-ISTA exhibits a zigzag pattern, gradually decreasing as the iteration converges.
[Figure: learned step sizes per iteration for $\lambda = 10$, $\lambda = 1$, and $\lambda = 0.1$.]
Appendix | Learned structure of AS-ISTA 19/17
These graphs show the learned weights of the gradient descent step and the shrinkage step at each iteration.
[Figure: learned weights for $\lambda = 10$, $\lambda = 1$, and $\lambda = 0.1$.]