March 18, 15
Slide overview
Presented at the IEEE 18th International Conference on Digital Signal Processing (DSP 2013).
Daichi Kitamura, Hiroshi Saruwatari, Yusuke Iwao, Kiyohiro Shikano, Kazunobu Kondo, Yu Takahashi, "Superresolution-based stereo signal separation via supervised nonnegative matrix factorization," Proceedings of IEEE 18th International Conference on Digital Signal Processing (DSP 2013), T3C-2, Santorini, Greece, July 2013.
http://d-kitamura.net/links_en.html
18th International Conference on Digital Signal Processing 2013
Superresolution-Based Stereo Signal Separation via Supervised Nonnegative Matrix Factorization
Daichi Kitamura, Hiroshi Saruwatari, Yusuke Iwao, Kiyohiro Shikano (Nara Institute of Science and Technology, Nara, Japan)
Kazunobu Kondo, Yu Takahashi (Yamaha Corporation Research & Development Center, Shizuoka, Japan)
Outline
• 1. Research background
• 2. Conventional methods
  – Nonnegative matrix factorization
  – Penalized supervised nonnegative matrix factorization
  – Directional clustering
  – Multichannel NMF
  – Hybrid method
• 3. Proposed method
  – Regularized superresolution-based nonnegative matrix factorization
• 4. Experiments
• 5. Conclusions
Background
• Music signal separation technologies have received much attention.
  Applications: automatic music transcription, 3D audio systems, etc.
• Music signal separation based on nonnegative matrix factorization (NMF) has been a very active area of research.
• The extraction performance of NMF markedly degrades for mixtures of many sources.
We propose a new method for multichannel signal separation with NMF that utilizes both the spectral and the spatial cues included in mixtures of multiple instruments.
NMF
• NMF is a type of sparse representation algorithm that decomposes a nonnegative matrix into two nonnegative matrices. [D. D. Lee, et al., 2001]
  $\mathbf{Y} \approx \mathbf{F}\mathbf{G}$
  𝒀: Observed matrix (spectrogram, Ω × 𝑇)
  𝑭: Basis matrix (spectral bases, Ω × 𝐾)
  𝑮: Activation matrix (time-varying gains, 𝐾 × 𝑇)
  Ω: Number of frequency bins, 𝑇: Number of frames, 𝐾: Number of bases
[Figure: the observed spectrogram (frequency × time) is decomposed into spectral bases (frequency × amplitude) and activations (amplitude × time)]
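To make the decomposition concrete, here is a minimal sketch of NMF in pure Python. It uses the multiplicative update rules for the generalized KL divergence from the cited Lee and Seung paper. The helper names (`matmul`, `kl_div`, `nmf_kl`), the random initialisation and the fixed iteration count are illustrative choices and do not come from the slides.

```python
import math
import random


def matmul(A, B):
    """Matrix product of two list-of-lists matrices."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def kl_div(Y, X, eps=1e-12):
    """Generalized KL divergence D(Y | X), summed over all entries."""
    total = 0.0
    for y_row, x_row in zip(Y, X):
        for y, x in zip(y_row, x_row):
            x = max(x, eps)
            total += (y * math.log(y / x) if y > 0 else 0.0) - y + x
    return total


def nmf_kl(Y, K, n_iter=200, seed=0, eps=1e-12):
    """Factorize a nonnegative Omega x T matrix Y as F (Omega x K) times G (K x T)
    with Lee-Seung multiplicative updates for the KL divergence."""
    rng = random.Random(seed)
    n_freq, n_frames = len(Y), len(Y[0])
    F = [[rng.random() + 0.1 for _ in range(K)] for _ in range(n_freq)]
    G = [[rng.random() + 0.1 for _ in range(n_frames)] for _ in range(K)]
    for _ in range(n_iter):
        # Activation update: ratio Y / (FG) weighted by the bases.
        X = matmul(F, G)
        R = [[y / max(x, eps) for y, x in zip(yr, xr)] for yr, xr in zip(Y, X)]
        col_sum_F = [sum(F[w][k] for w in range(n_freq)) for k in range(K)]
        G = [[G[k][t] * sum(F[w][k] * R[w][t] for w in range(n_freq)) / max(col_sum_F[k], eps)
              for t in range(n_frames)] for k in range(K)]
        # Basis update: recompute the ratio with the new activations.
        X = matmul(F, G)
        R = [[y / max(x, eps) for y, x in zip(yr, xr)] for yr, xr in zip(Y, X)]
        row_sum_G = [sum(G[k]) for k in range(K)]
        F = [[F[w][k] * sum(G[k][t] * R[w][t] for t in range(n_frames)) / max(row_sum_G[k], eps)
              for k in range(K)] for w in range(n_freq)]
    return F, G
```

For a real spectrogram you would use NumPy arrays. Nested lists only keep this sketch dependency-free. With `n_iter=0` the function returns its initial factors, which makes it easy to check that the divergence decreases.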
Penalized Supervised NMF (PSNMF)
• In PSNMF, the decomposition $\mathbf{Y} \approx \mathbf{F}\mathbf{G} + \mathbf{H}\mathbf{U}$ is addressed under the condition that the supervised basis matrix 𝑭 is known in advance, where 𝑯 and 𝑼 are the bases and activations of the other sources. [Yagi, et al., 2012]
  – Training process: the supervised bases 𝑭 of the target sound are trained from a supervision sound.
  – Separation process: the trained bases 𝑭 are fixed, and 𝑮, 𝑯, and 𝑼 are updated; 𝑯 is forced to become uncorrelated with 𝑭.
Problem of PSNMF: When the signal includes many sources, the extraction performance markedly degrades.
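The separation step can be sketched as follows, assuming the KL divergence and plain multiplicative updates. Here 𝑯 and 𝑼 denote the bases and activations of the non-target sources (notation assumed). `psnmf_separate` is a hypothetical helper. It keeps the trained bases 𝑭 fixed and updates 𝑮, 𝑯 and 𝑼. It deliberately omits the penalty that forces 𝑯 to become uncorrelated with 𝑭, so it is plain supervised NMF rather than full PSNMF.

```python
import random


def matmul(A, B):
    """Matrix product of two list-of-lists matrices."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def kl_ratio(Y, F, G, H, U, eps=1e-12):
    """Element-wise ratio Y / (F G + H U) used by the KL multiplicative updates."""
    X1, X2 = matmul(F, G), matmul(H, U)
    return [[y / max(a + b, eps) for y, a, b in zip(yr, r1, r2)]
            for yr, r1, r2 in zip(Y, X1, X2)]


def psnmf_separate(Y, F, n_other, n_iter=200, seed=0, eps=1e-12):
    """Y ~ F G + H U with the trained bases F held fixed; G, H, U are updated.
    NOTE: the decorrelation penalty between H and F is omitted in this sketch."""
    rng = random.Random(seed)
    n_freq, n_frames, K = len(Y), len(Y[0]), len(F[0])
    G = [[rng.random() + 0.1 for _ in range(n_frames)] for _ in range(K)]
    H = [[rng.random() + 0.1 for _ in range(n_other)] for _ in range(n_freq)]
    U = [[rng.random() + 0.1 for _ in range(n_frames)] for _ in range(n_other)]
    for _ in range(n_iter):
        # Target activations G (F itself is never modified).
        R = kl_ratio(Y, F, G, H, U, eps)
        G = [[G[k][t] * sum(F[w][k] * R[w][t] for w in range(n_freq))
              / max(sum(F[w][k] for w in range(n_freq)), eps)
              for t in range(n_frames)] for k in range(K)]
        # Activations U of the other bases.
        R = kl_ratio(Y, F, G, H, U, eps)
        U = [[U[j][t] * sum(H[w][j] * R[w][t] for w in range(n_freq))
              / max(sum(H[w][j] for w in range(n_freq)), eps)
              for t in range(n_frames)] for j in range(n_other)]
        # Other bases H.
        R = kl_ratio(Y, F, G, H, U, eps)
        H = [[H[w][j] * sum(U[j][t] * R[w][t] for t in range(n_frames))
              / max(sum(U[j]), eps)
              for j in range(n_other)] for w in range(n_freq)]
    # Target estimate: the part of the model explained by the supervised bases.
    return matmul(F, G), G, H, U
```

The target spectrogram estimate is the term 𝑭𝑮. Any frequency bin where the supervised bases are all zero therefore gets no target energy.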
Directional Clustering
• Directional clustering can estimate sources and their directions in a multichannel signal. [Araki, et al., 2007] [Miyabe, et al., 2009]
[Figure: source components of the L-ch and R-ch input signals grouped into left, center, and right clusters around centroid vectors]
Problem of directional clustering: This method cannot separate sources located in the same direction.
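A minimal sketch of directional clustering on a stereo spectrogram follows. It assumes spherical k-means on the unit-normalised amplitude vector (|L|, |R|) of each time-frequency bin, with three clusters initialised at the left, center and right directions. The cited methods may differ in feature definition and cluster count. `directional_clustering` and `binary_mask` are hypothetical helpers.

```python
import math


def directional_clustering(left, right, n_iter=10):
    """Spherical k-means on the amplitude vector (|L|, |R|) of every bin.

    Assumption: three clusters initialised at the left, center and right
    directions. Returns {(w, t): cluster} for non-silent bins and the centroids.
    """
    centroids = [(1.0, 0.0), (math.sqrt(0.5), math.sqrt(0.5)), (0.0, 1.0)]
    vecs = {}
    for w, (l_row, r_row) in enumerate(zip(left, right)):
        for t, (l_val, r_val) in enumerate(zip(l_row, r_row)):
            a, b = abs(l_val), abs(r_val)  # abs() also works on complex STFT values
            norm = math.hypot(a, b)
            if norm > 0:
                vecs[(w, t)] = (a / norm, b / norm)
    labels = {}
    for _ in range(n_iter):
        # Assignment step: most similar centroid by cosine similarity.
        for key, (a, b) in vecs.items():
            sims = [a * cx + b * cy for cx, cy in centroids]
            labels[key] = sims.index(max(sims))
        # Update step: re-normalised mean of the members; empty clusters keep their centroid.
        new_centroids = []
        for c, old in enumerate(centroids):
            members = [vecs[k] for k, lab in labels.items() if lab == c]
            sa, sb = sum(m[0] for m in members), sum(m[1] for m in members)
            norm = math.hypot(sa, sb)
            new_centroids.append((sa / norm, sb / norm) if norm > 0 else old)
        centroids = new_centroids
    return labels, centroids


def binary_mask(labels, n_freq, n_frames, cluster):
    """Binary mask of one cluster: 1 for its bins, 0 elsewhere (silent bins are 0)."""
    return [[1 if labels.get((w, t)) == cluster else 0 for t in range(n_frames)]
            for w in range(n_freq)]
```

Cluster index 1 starts at the center direction, so `binary_mask(labels, ..., cluster=1)` plays the role of the center-cluster mask in this sketch.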
Multichannel NMF
• Multichannel NMF has also been proposed. [Ozerov, et al., 2010] [Sawada, et al., 2012]
  – A natural extension of NMF to multichannel signals
  – Uses spectral and spatial cues to achieve unsupervised separation
Problem of multichannel NMF: This unified method poses a mathematically very difficult optimization problem, because many variables must be optimized using only one cost function. Multichannel NMF depends strongly on initial values and lacks robustness.
Hybrid method
• The conventional hybrid method applies PSNMF after directional clustering. [Iwao, et al., 2012]
• This method consists of two techniques:
  – Directional clustering (spatial separation)
  – PSNMF (source separation)
[Figure: conventional hybrid method; L/R input → directional clustering (spatial separation) → PSNMF (source separation)]
Problem of hybrid method
• The signal extracted by the hybrid method has considerable distortion.
• There are many spectral chasms in the spectrogram obtained by directional clustering.
• The resolution of the spectrogram is degraded.
[Figure: input spectrogram ⊙ binary mask from directional clustering (1: target direction, 0: other direction) = separated cluster containing chasms, where ⊙ is the Hadamard product (element-wise product)]
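The masking step is just an element-wise product. The sketch below applies a binary mask to a magnitude spectrogram and lists the resulting chasms: bins that carried energy in the input but are zeroed because they belong to another direction. `hadamard` and `chasms` are hypothetical helper names.

```python
def hadamard(A, B):
    """Element-wise (Hadamard) product of two equally sized matrices."""
    return [[a * b for a, b in zip(row_a, row_b)] for row_a, row_b in zip(A, B)]


def chasms(spectrogram, mask):
    """Positions (w, t) that carry energy in the input but are removed by the mask."""
    return [(w, t)
            for w, (s_row, m_row) in enumerate(zip(spectrogram, mask))
            for t, (s, m) in enumerate(zip(s_row, m_row))
            if s > 0 and m == 0]
```

Every entry that `chasms` reports is a hole in the separated cluster. This is the loss of resolution that the slide describes.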
Proposed hybrid method
• Conventional hybrid method: input stereo signal (L-ch, R-ch) → STFT → directional clustering → center component (L-ch, R-ch) → PSNMF for each channel → ISTFT → mixing → extracted signal
• Proposed hybrid method: input stereo signal (L-ch, R-ch) → STFT → directional clustering → center component (L-ch, R-ch) and index of the center cluster → superresolution-based SNMF for each channel → ISTFT → mixing → extracted signal
We employ a new supervised NMF algorithm in place of the conventional PSNMF in the hybrid method.
Superresolution-based supervised NMF
• In the proposed supervised NMF, the spectral chasms are treated as unseen observations by using an index matrix.
  Index matrix 𝑰: 1 = grid of a separated component, 0 = grid of a chasm (hole)
[Figure: separated cluster (frequency × time) with chasms, and the corresponding index matrix]
Superresolution-based supervised NMF
• The components of the target sound lost after directional clustering can be extrapolated using the supervised bases.
[Figure: separated cluster with chasms → superresolution using supervised bases → reconstructed spectrogram]
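The extrapolation idea can be seen on a single frame with a single supervised basis. This sketch assumes the squared Euclidean distance. Under that assumption, the activation fitted on the observed bins only has the closed form $g = \sum_\omega i_\omega f_\omega y_\omega / \sum_\omega i_\omega f_\omega^2$. The model $f_\omega g$ then supplies values for the chasms. `extrapolate_frame` is a hypothetical helper, not the authors' algorithm.

```python
def extrapolate_frame(y, f, index):
    """Fit one activation g on observed bins only (index == 1) by minimising
    sum_w i_w * (y_w - f_w * g)^2, then rebuild every bin as f_w * g."""
    num = sum(i * fw * yw for i, fw, yw in zip(index, f, y))
    den = sum(i * fw * fw for i, fw in zip(index, f))
    g = num / den if den > 0 else 0.0  # no observed support: nothing to extrapolate from
    model = [fw * g for fw in f]
    return g, model
```

If no bins are observed, the denominator vanishes and the sketch simply returns zero. That is the degenerate end of the almost-unseen-frame problem on the next slide.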
Superresolution-based supervised NMF
• Signal flow of the proposed hybrid method
  (a) Observed spectra: source components spread over the left, center, and right directions, including the target source.
  (b) After directional clustering: the target direction (center) is selected, and the center sources lose some of their components.
  (c) After superresolution-based SNMF: the lost components of the target source are extrapolated.
[Figure: frequency of source components vs. direction (left, center, right) for stages (a) to (c)]
Superresolution-based supervised NMF
• The basis extrapolation has an underlying problem.
• If the time-frequency spectra of a frame are almost unseen in the spectrogram, i.e., the indexes in that frame are almost all zero, a large extrapolation error may occur.
• It is therefore necessary to regularize the extrapolation.
[Figure: separated cluster (frequency [kHz] vs. time [s]) in which an almost unseen frame produces an extrapolation error by incorrectly modifying the activation]
Superresolution-based supervised NMF
• We propose introducing a regularization term into the cost function.
• The intensity of the regularization is proportional to the number of chasms in each frame.
Regularization: norm minimization
𝑰: Index matrix
𝑖𝜔,𝑡: Entry of index matrix 𝑰
𝑓𝜔,𝑘: Entry of matrix 𝑭
𝑔𝑘,𝑡: Entry of matrix 𝑮
$\bar{\cdot}$: Binary complement (e.g., $\bar{i}_{\omega,t} = 1 - i_{\omega,t}$)
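The slide states only that the regularization strength is proportional to the number of chasms in each frame. The sketch below computes that count as the column sum of the binary complement of the index matrix. The normalised per-frame weight (`lam`) and the squared-ℓ2 activation penalty are hypothetical stand-ins, because the exact regularization term is not given in the text.

```python
def chasm_counts(index):
    """Chasms per frame: sum over frequency of the binary complement 1 - i[w][t]."""
    n_frames = len(index[0])
    return [sum(1 - row[t] for row in index) for t in range(n_frames)]


def frame_weights(index, lam):
    """HYPOTHETICAL per-frame regularization strength, proportional to the
    number of chasms in the frame (normalised by the number of frequency bins)."""
    n_freq = len(index)
    return [lam * c / n_freq for c in chasm_counts(index)]


def regularization(G, weights):
    """HYPOTHETICAL term: sum_t w_t * sum_k g[k][t]^2 (squared L2 norm per frame)."""
    return sum(w * sum(G[k][t] ** 2 for k in range(len(G))) for t, w in enumerate(weights))
```

A fully observed frame gets weight zero, so only frames with chasms pay a price for large activations.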
Superresolution-based supervised NMF
• The cost function of regularized superresolution-based NMF is defined using the index matrix and consists of a divergence term, a regularization term, and a penalty term.
• Since the divergence is defined only on grids whose index is one, the chasms in the spectrogram are ignored.
Divergence: an arbitrary divergence function
Weighting parameter
Penalty term: forces the supervised bases and the other bases to become uncorrelated with each other
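The divergence is evaluated only where the index is one. The data term of the cost can therefore be sketched with an element-wise divergence and the index matrix as a mask. The two element-wise divergences here, generalized KL and squared Euclidean, correspond to the two variants on the following slides. The regularization and penalty terms are omitted, and all names are hypothetical.

```python
import math


def kl_elem(y, x, eps=1e-12):
    """Generalized KL divergence d(y | x) for one entry."""
    x = max(x, eps)
    return (y * math.log(y / x) if y > 0 else 0.0) - y + x


def sq_elem(y, x):
    """Squared Euclidean distance for one entry."""
    return (y - x) ** 2


def masked_divergence(Y, X, index, elem=kl_elem):
    """Sum d(y | x) only over grids whose index is 1, so chasms (index 0)
    do not contribute to the cost."""
    return sum(elem(y, x)
               for y_row, x_row, i_row in zip(Y, X, index)
               for y, x, i in zip(y_row, x_row, i_row) if i == 1)
```

Without the mask, a chasm (observed value 0) would pull the model toward zero at that grid. With the mask, the model is free to extrapolate there.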
Superresolution-based supervised NMF
• The update rules that minimize the cost function based on the KL divergence are derived.
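The sketch below illustrates how the index matrix enters KL-based multiplicative updates. It uses the standard weighted (masked) KL update for the activations with 𝑭 fixed: $g_{k,t} \leftarrow g_{k,t} \frac{\sum_\omega i_{\omega,t} f_{\omega,k} y_{\omega,t} / x_{\omega,t}}{\sum_\omega i_{\omega,t} f_{\omega,k}}$, where $x_{\omega,t} = \sum_k f_{\omega,k} g_{k,t}$. It omits the non-target bases, the regularization and the penalty. It is therefore a simplified relative of the proposed rules, not the rules themselves. `masked_kl_activations` is a hypothetical helper.

```python
def matmul(A, B):
    """Matrix product of two list-of-lists matrices."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def masked_kl_activations(Y, F, index, n_iter=1000, eps=1e-12):
    """Estimate G in Y ~ F G with the supervised bases F held fixed.

    KL multiplicative updates in which every sum over frequency is weighted by
    the index matrix, so chasms (index 0) are ignored. Regularization and
    penalty terms are omitted in this sketch.
    """
    n_freq, n_frames, K = len(Y), len(Y[0]), len(F[0])
    G = [[1.0] * n_frames for _ in range(K)]
    for _ in range(n_iter):
        X = matmul(F, G)
        G = [[G[k][t]
              * sum(index[w][t] * F[w][k] * Y[w][t] / max(X[w][t], eps) for w in range(n_freq))
              / max(sum(index[w][t] * F[w][k] for w in range(n_freq)), eps)
              for t in range(n_frames)] for k in range(K)]
    return G
```

After fitting, `matmul(F, G)` gives values at every grid, including the chasms. When each frame still has enough observed bins to determine its activations, those values match the hidden spectrum. A frame with few observed bins is exactly where the regularization becomes necessary.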
Superresolution-based supervised NMF
• The update rules that minimize the cost function based on the Euclidean distance are derived.
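The squared Euclidean counterpart can be sketched the same way. It uses the standard weighted multiplicative update $g_{k,t} \leftarrow g_{k,t} \frac{\sum_\omega i_{\omega,t} f_{\omega,k} y_{\omega,t}}{\sum_\omega i_{\omega,t} f_{\omega,k} x_{\omega,t}}$ with 𝑭 fixed. As with any simplified sketch here, the non-target bases, regularization and penalty are left out. `masked_eu_activations` is a hypothetical helper, not the authors' rule.

```python
def matmul(A, B):
    """Matrix product of two list-of-lists matrices."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def masked_eu_activations(Y, F, index, n_iter=2000, eps=1e-12):
    """Estimate G in Y ~ F G (F fixed) by minimising the masked squared error
    sum_{w,t} i[w][t] * (Y[w][t] - (F G)[w][t])^2 with multiplicative updates."""
    n_freq, n_frames, K = len(Y), len(Y[0]), len(F[0])
    G = [[1.0] * n_frames for _ in range(K)]
    for _ in range(n_iter):
        X = matmul(F, G)
        G = [[G[k][t]
              * sum(index[w][t] * F[w][k] * Y[w][t] for w in range(n_freq))
              / max(sum(index[w][t] * F[w][k] * X[w][t] for w in range(n_freq)), eps)
              for t in range(n_frames)] for k in range(K)]
    return G
```

The Euclidean update tends to converge more slowly than the KL version on the same data, hence the larger default iteration count in this sketch.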
Evaluation experiment
• We compared five methods:
  – Simple directional clustering
  – Simple PSNMF
  – Multichannel NMF based on the Itakura-Saito (IS) divergence
  – Conventional hybrid method using PSNMF
  – Proposed hybrid method using superresolution-based SNMF
[Figure: block diagrams of the conventional hybrid method (directional clustering → PSNMF for each channel) and the proposed hybrid method (directional clustering → superresolution-based SNMF for each channel, using the index of the center cluster), each followed by ISTFT and mixing to obtain the extracted signal]
Evaluation experiment
• We used stereo-panning signals.
• Mixture of four instruments (Ob., Fl., Tb., and Pf.) generated by a MIDI synthesizer
• We used MIDI sounds of the same type as the target instruments as supervision for the training process.
  Supervision sound: two-octave notes that cover all notes of the target signal
[Figure: placement of the four sources (1 to 4) in the left, center, and right directions, with the target source marked]
Experimental results
• Average SDR, SIR, and SAR scores for each method, averaged over 12 combinations in which the four instruments are shuffled.
  SDR: quality of the separated target sound
  SIR: degree of separation between the target and the other sounds
  SAR: absence of artificial distortion
[Chart: SDR, SIR, and SAR for each method; higher values are better]
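For intuition about the SDR axis, a simplified SDR compares the energy of the reference signal with the energy of the estimation error. This is not the BSS Eval definition of SDR. BSS Eval decomposes the estimate into target, interference and artifact components, and SIR and SAR are also derived from that decomposition. `simple_sdr` is a hypothetical helper for illustration only.

```python
import math


def simple_sdr(reference, estimate):
    """Simplified SDR in dB: 10 * log10(||s||^2 / ||s - s_hat||^2).
    NOTE: not the BSS Eval SDR (no target/interference/artifact decomposition)."""
    signal = sum(s * s for s in reference)
    error = sum((s - e) ** 2 for s, e in zip(reference, estimate))
    if error == 0:
        return math.inf  # perfect reconstruction
    return 10 * math.log10(signal / error)
```

An estimate whose error energy is 1/100 of the reference energy scores 20 dB. An estimate of pure silence scores 0 dB.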
Conclusions
• We proposed a new supervised NMF algorithm for the hybrid method to separate stereo or multichannel signals.
• The proposed supervised method uses supervised basis extrapolation to recover the resolution of the spectrogram, which is degraded by the binary masking in directional clustering.
• The proposed hybrid method separates the target signal with higher performance than the conventional methods.
Thank you for your attention!