November 25, 24
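Slide overview
Kitahara Laboratory, Department of Information Science, College of Humanities and Sciences, Nihon University. Under the motto "Technology Makes Music More Fun", we research and develop technologies that help advance entertainment, starting with music.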
A System for Retrieving Video Game Music
Nihon University
Ryusei Hayashi, Tetsuro Kitahara
Intro > Background
● Video game music (VGM) expresses the features of the game and of the scene at the same time
● Okami / CAPCOM CO., LTD. (game feature: Japanese style)
  ○ Prologue scene (low tension): Prologue / Masami Ueda
  ○ Last boss scene (high tension): Rising Sun / Rei Kondoh
● Undertale / Toby Fox (game feature: chiptune style)
  ○ Prologue scene (low tension): Once Upon a Time / Toby Fox
  ○ Last boss scene (high tension): Hopes And Dreams / Toby Fox
Intro > Background
● It is difficult to retrieve VGM that takes the game feature and the scene feature into account at the same time
● Example: a developer looking for music
  ○ "Let's develop a Yakuza game!"
  ○ "Found a Yakuza fight scene VGM!"
  ○ "Can't find a Yakuza love scene VGM..."
EgGMAn - Engine of Game Music Analysis
Intro > Purpose
● Purpose: Retrieve VGM that takes the game feature and the scene feature into account at the same time
● Premise: One VGM has already been decided for the game under development
● Conditions
  ○ Maintain the game feature of the decided VGM
  ○ Change the scene feature of the decided VGM
[Figure: Game Feature: Maintain 😊 / Scene Feature: Change 😊]
Intro > Problem
● Conditions
  ○ Maintain the game feature of the decided VGM to be included in the game
  ○ Change the scene feature of the decided VGM to be included in the game
● → The two conditions partially contradict each other
[Figure: Game Feature: Maintain 🙁 / Scene Feature: Change 🙁]
Intro > Solution
● Vectorize VGM with a VAE
● Assumptions
  ○ VGM vectors form a cluster for each scene
  ○ The difference between a VGM vector and the center of its scene is constant across scenes
[Figure: clusters for Scene1, Scene2, ..., SceneM with centers $c_1, c_2, \dots, c_m$ and vectors $v_i^1, v_i^2, \dots, v_i^n$ in each scene $i$; one build of the slide highlights the vectors labeled Game1.]
Method > Input/Output
● Input
  ○ Source Music
    ■ VGM for the game under development
  ○ Source Scene
    ■ Scene in which Source Music is used
  ○ Target Scene
    ■ Another scene that needs VGM
● Output
  ○ Target Music
    ■ VGM to attach to Target Scene
[Figure: Yakuza Fight VGM (Source Music), Fight (Source Scene), and Love (Target Scene) go into EgGMAn, which outputs Yakuza Love VGM (Target Music).]
Method > Vectorization
● Source Music
  ○ Convert Source Music to a vector $z$ with the VAE
● Source Scene
  ○ Create the set $P$ of VGM used in Source Scene
  ○ Convert the set $P$ to the set of vectors $P^z$ with the VAE
  ○ Compute the center $p^c$ of the set $P^z$
● Target Scene
  ○ Create the set $Q$ of VGM used in Target Scene
  ○ Convert the set $Q$ to the set of vectors $Q^z$ with the VAE
  ○ Compute the center $q^c$ of the set $Q^z$
[Figure: $z$ together with the Source Scene vectors $p_1^z, p_2^z, \dots, p_m^z$ around $p^c$ and the Target Scene vectors $q_1^z, q_2^z, \dots, q_n^z$ around $q^c$.]
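The slides do not say how the center of a scene is computed. A minimal sketch, assuming the center is the element-wise arithmetic mean of the scene's latent vectors; the function name `centroid` and the toy 2-D numbers are hypothetical stand-ins for real VAE outputs:

```python
from typing import List, Sequence

Vector = List[float]


def centroid(vectors: Sequence[Sequence[float]]) -> Vector:
    """Return the element-wise mean of a non-empty set of equal-length vectors.

    Assumption: "center" on the slide means the arithmetic mean.
    """
    if not vectors:
        raise ValueError("need at least one vector")
    dim = len(vectors[0])
    if any(len(v) != dim for v in vectors):
        raise ValueError("all vectors must have the same dimension")
    n = len(vectors)
    return [sum(v[d] for v in vectors) / n for d in range(dim)]


# Toy 2-D vectors standing in for the set P^z (made-up numbers).
P_z = [[0.0, 1.0], [2.0, 1.0], [1.0, 4.0]]
p_c = centroid(P_z)
print(p_c)
```

The same function would be applied to the Target Scene set to obtain the second center.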
Method > Compute Vector
● Target Music
  ○ Predict the Target Music vector $z'$
  ○ Assumption: $v_i^k - c_i = v_j^k - c_j$
  ○ Applied to Source Scene and Target Scene: $z - p^c = z' - q^c$
  ○ Therefore: $z' = z + q^c - p^c$
[Figure: $z$ offset from $p^c$, and $z'$ at the same offset from $q^c$.]
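The prediction step is plain vector arithmetic. A small sketch of it follows; the function name `predict_target_vector` and the example numbers are illustrative assumptions, not taken from the slides:

```python
from typing import List, Sequence


def predict_target_vector(
    z: Sequence[float], p_c: Sequence[float], q_c: Sequence[float]
) -> List[float]:
    """Shift z by the offset between the two scene centers: z' = z + q_c - p_c."""
    if not (len(z) == len(p_c) == len(q_c)):
        raise ValueError("z, p_c and q_c must have the same dimension")
    return [zi + qi - pi for zi, pi, qi in zip(z, p_c, q_c)]


# Toy 2-D example (made-up numbers).
z = [1.0, 2.0]      # Source Music vector
p_c = [0.5, 0.5]    # center of Source Scene
q_c = [3.0, -1.0]   # center of Target Scene
z_prime = predict_target_vector(z, p_c, q_c)
print(z_prime)
```

By construction the offset of `z_prime` from `q_c` equals the offset of `z` from `p_c`, which is the assumption the method relies on.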
Method > Retrieval
● Target Music
  ○ Predict the Target Music vector $z'$
  ○ Compute the distance from $z'$ to each VGM vector
  ○ Sort the VGM by distance
[Figure: $z'$ among the Target Scene vectors $q_1^z, q_2^z, \dots, q_n^z$, which are sorted by distance.]
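The ranking step can be sketched as a nearest-neighbor sort. The slides only say "distance", so Euclidean distance is an assumption here, and the function name `rank_by_distance` and the track IDs are hypothetical:

```python
import math
from typing import Dict, List, Sequence, Tuple


def rank_by_distance(
    z_prime: Sequence[float], candidates: Dict[str, Sequence[float]]
) -> List[Tuple[str, float]]:
    """Return (track_id, distance) pairs sorted from nearest to farthest.

    Assumption: Euclidean distance in the latent space.
    """
    scored = [
        (track_id, math.dist(z_prime, vec)) for track_id, vec in candidates.items()
    ]
    return sorted(scored, key=lambda item: item[1])


# Toy candidates standing in for VGM vectors (made-up numbers).
candidates = {"a": [3.0, 4.0], "b": [1.0, 0.0], "c": [0.0, 2.0]}
ranking = rank_by_distance([0.0, 0.0], candidates)
print([track_id for track_id, _ in ranking])
```

The first entries of the ranking would be presented to the user as Target Music candidates.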
Method > Preprocessing
● Pass a spectrogram of each VGM to the VAE
● Method
  ○ Extract the 10 to 30 second segment of the VGM
  ○ Detect beats in the extracted segment
  ○ Select beats
    ■ The first beat
    ■ The beat farthest from it that is less than 10 seconds away
  ○ Extract the segment bounded by the selected beats
  ○ Convert the extracted segment to a spectrogram
[Figure: timeline marked 0, 10, and 30 seconds with a segment of up to 10 seconds; spectrogram size Hz: 1024, Time: 256.]
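The beat-selection rule can be sketched independently of any particular beat tracker. The sketch below assumes the beat times (in seconds) have already been detected by some tool, which the slides do not name; `select_beat_segment` and `max_length` are hypothetical names:

```python
from typing import Sequence, Tuple


def select_beat_segment(
    beat_times: Sequence[float], max_length: float = 10.0
) -> Tuple[float, float]:
    """Pick the first beat and the farthest later beat less than max_length after it.

    Returns (start, end) in seconds; the audio between them is what would be
    converted to a spectrogram.
    """
    if len(beat_times) < 2:
        raise ValueError("need at least two beats")
    beats = sorted(beat_times)
    start = beats[0]
    # Later beats that lie strictly less than max_length seconds after the first.
    within = [t for t in beats[1:] if t - start < max_length]
    if not within:
        raise ValueError("no second beat within max_length of the first beat")
    return start, within[-1]


# Toy beat times in seconds (made-up numbers).
start, end = select_beat_segment([0.5, 1.0, 5.0, 10.4, 10.6, 12.0])
print(start, end)
```

Cutting on beat boundaries keeps every excerpt under 10 seconds while aligning its start and end with the music's pulse.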
Method > Dataset
● Tag Table
  ○ Created from Audiostock
  ○ Links and stores each ID with its Tags
  ○ Links and stores each ID with its MP3
● Scene Set
  ○ About 170 Scenes frequently used in games, collected as words
  ○ Time, Weather, and 7 other types
Method > Dataset > Similarity Table
● Method
  ○ Store Scene in row 1, Tag in column 1
  ○ Store the i, j-th similarity between the i-th Scene and the j-th Tag
● Similarity
  ○ Vectorize Tag and Scene with Word2vec
  ○ Compute the cosine similarity between Tag and Scene
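A sketch of how such a table could be filled in, assuming the word vectors are already available. The toy 2-D embeddings below are made up and merely stand in for Word2vec vectors; `build_similarity_table` is a hypothetical name:

```python
import math
from typing import Dict, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


def build_similarity_table(
    scene_vecs: Dict[str, Sequence[float]], tag_vecs: Dict[str, Sequence[float]]
) -> Dict[str, Dict[str, float]]:
    """Map scene -> tag -> cosine similarity of their word vectors."""
    return {
        scene: {tag: cosine_similarity(sv, tv) for tag, tv in tag_vecs.items()}
        for scene, sv in scene_vecs.items()
    }


# Made-up embeddings; real ones would come from a Word2vec model.
scene_vecs = {"night": [1.0, 0.0], "morning": [0.0, 1.0]}
tag_vecs = {"dark": [1.0, 0.0], "sunny": [0.0, 2.0]}
table = build_similarity_table(scene_vecs, tag_vecs)
print(table["night"]["dark"], table["night"]["sunny"])
```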
Method > Dataset > Scene Table
● Replace each Tag in the Tag Table with Scenes
● Method
  ○ Extract a Tag from the Tag Table
  ○ Extract the similarities of that Tag from the Similarity Table
  ○ Extract the Scenes whose similarity is greater than a threshold
  ○ Store the extracted Scenes in the Scene Table
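The thresholding step can be sketched as below. The data layout (dictionaries keyed by track ID, scene, and tag), the name `build_scene_table`, and the threshold value 0.6 are all assumptions for illustration; the slides do not state the threshold that was used:

```python
from typing import Dict, List


def build_scene_table(
    tag_table: Dict[str, List[str]],
    similarity: Dict[str, Dict[str, float]],
    threshold: float,
) -> Dict[str, List[str]]:
    """Map each track ID to the scenes similar enough to any of its tags.

    tag_table:  track ID -> tags
    similarity: scene -> tag -> similarity
    """
    scene_table: Dict[str, List[str]] = {}
    for track_id, tags in tag_table.items():
        scenes = {
            scene
            for scene, row in similarity.items()
            for tag in tags
            if row.get(tag, float("-inf")) > threshold
        }
        scene_table[track_id] = sorted(scenes)
    return scene_table


# Made-up tags and similarities.
tag_table = {"id1": ["dark", "calm"], "id2": ["sunny"]}
similarity = {
    "night": {"dark": 0.8, "calm": 0.4, "sunny": 0.1},
    "morning": {"dark": 0.1, "calm": 0.5, "sunny": 0.9},
}
scene_table = build_scene_table(tag_table, similarity, threshold=0.6)
print(scene_table)
```

Raising the threshold yields fewer but more reliable scene labels per track; lowering it does the opposite.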
Method > VAE
VAE (Variational Auto-Encoder)
● Structure
  ○ Encoder: Converts data to a vector
  ○ Decoder: Reconstructs data from the vector
● Loss Function
  ○ MSE: Error between the input data and the reconstructed data
  ○ KLD: Error between the vector's distribution and the normal distribution
[Figure: Input Data → Encoder → z → Decoder → Reconstructed Data; MSE compares Input Data with Reconstructed Data, and KLD compares z with the Normal Distribution.]
Method > VAE > Structure > Encoder
● Converts data to a vector
● Implemented with convolution (→ in the figure)
[Figure: input spectrogram (Hz: 1024, Time: 256); tensor shapes labeled 1024*256*1, 1024*256*32, 1024*16*32, 1024*1*32, 1*256*560, 1*16*560, 1*1*560; then FCN 560 and Sampling to a 32-dimensional vector.]
Method > VAE > Structure > Decoder
● Reconstructs data from the vector
● Implemented with deconvolution (→ in the figure)
[Figure: 32-dimensional vector, FCN 32, FCN 560; tensor shapes labeled 1024*1*32, 1024*16*32, 1024*256*32, 1*1*560, 1*16*560, 1*256*560, 1024*256*1; output spectrogram (Hz: 1024, Time: 256).]
Method > VAE > Loss Function > MSE
MSE (Mean Squared Error)
● Error between the input data and the reconstructed data
● Ensures that the vector reflects the data
[Figure: MSE computed between Input Data and Reconstructed Data.]
Method > VAE > Loss Function > KLD
KLD (Kullback-Leibler Divergence)
● Error between the vector's distribution and the normal distribution
● Ensures continuity of the vector space
[Figure: Encoder → 𝛍, 𝛔 → Sampling → Decoder; KLD compares 𝛍 and 𝛔 with the Normal Distribution.]
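The two loss terms can be written out in a few lines. This is a dependency-free sketch under common VAE conventions that the slides do not confirm: a diagonal Gaussian encoder output parameterized by mean and log-variance, a standard normal prior, and a hypothetical weight `beta` on the KLD term. All function names are illustrative:

```python
import math
import random
from typing import List, Sequence


def mse(x: Sequence[float], x_hat: Sequence[float]) -> float:
    """Mean squared error between the input and its reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)


def kld_to_standard_normal(mu: Sequence[float], log_var: Sequence[float]) -> float:
    """Closed-form KL( N(mu, diag(exp(log_var))) || N(0, I) )."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv) for m, lv in zip(mu, log_var))


def sample_latent(
    mu: Sequence[float], log_var: Sequence[float], rng: random.Random
) -> List[float]:
    """Reparameterized sample: z = mu + sigma * eps with eps ~ N(0, 1)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]


def vae_loss(x, x_hat, mu, log_var, beta: float = 1.0) -> float:
    """Reconstruction term plus beta-weighted KLD term (beta is an assumption)."""
    return mse(x, x_hat) + beta * kld_to_standard_normal(mu, log_var)


# Toy values; a real model would use 1024*256 inputs and a 32-dimensional latent.
z = sample_latent([0.0, 0.0], [0.0, 0.0], random.Random(0))
loss = vae_loss([1.0, 2.0], [1.0, 4.0], mu=[1.0], log_var=[0.0])
print(len(z), loss)
```

The KLD term is zero exactly when the encoder outputs the standard normal itself, which is why minimizing it pulls the latent vectors toward a smooth, continuous space.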
Experiment > Preliminary Experiment
● Purpose: Confirm whether the VAE can be trained on VGM
● Prepare
  ○ Randomly extract 5120 MP3s from the Tag Table
  ○ Split the extracted MP3s 3:1 into training and validation data
● Execute
  ○ Train the VAE with the training data
  ○ Visualize the loss function for the training and validation data
  ○ Reconstruct the training and validation data with the trained VAE
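A 3:1 split of 5120 items gives 3840 training and 1280 validation items. A minimal sketch of such a split, assuming a simple seeded shuffle (the slides do not describe how the split was drawn); `split_train_validation` and the integer track IDs are hypothetical:

```python
import random
from typing import List, Sequence, Tuple


def split_train_validation(
    items: Sequence, train_parts: int = 3, val_parts: int = 1, seed: int = 0
) -> Tuple[List, List]:
    """Shuffle the items and split them train_parts:val_parts."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train = len(shuffled) * train_parts // (train_parts + val_parts)
    return shuffled[:n_train], shuffled[n_train:]


# Integer IDs standing in for the 5120 sampled MP3s.
track_ids = list(range(5120))
train, val = split_train_validation(track_ids)
print(len(train), len(val))
```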
Experiment > Preliminary Experiment
● Loss Function
  ○ Training data: the loss continued to decrease
  ○ Validation data: the loss stopped decreasing
  ○ The VAE overfit the training data
● Reconstruction
  ○ Another way to take tempo into account is needed
Experiment > Operational Experiment
● Purpose: Confirm the practicality of EgGMAn in game development
● Prepare
  ○ Develop the front end of EgGMAn
  ○ Distribute EgGMAn and the Survey
● Execute
  ○ Ask developers to use EgGMAn during development
  ○ Ask them to fill out the Survey after development
● Survey
  ○ How predictable was the retrieval?
  ○ How suitable was the retrieval for the game?
  ○ How suitable was the retrieval for the scene?
Experiment > Operational Experiment
● Global Game Jam 2024
  ○ Information
    ■ Place: Tokyo Univ. of Tech.
    ■ Term: 2 days
    ■ Team: 5 teams (30 people)
  ○ Result
    ■ Participant: 5 people
    ■ Good Review: 3 people
[Figure: survey responses on a five-point scale (Not suitable at all, Not suitable, Even, Suitable, Very suitable) for three questions: How predictable was the retrieval? How suitable was the retrieval for game? How suitable was the retrieval for scene?]
Outro > Conclusion
● Intro
  ○ Purpose: Retrieve VGM that takes the game feature and the scene feature into account at the same time
  ○ Premise: One VGM to be included in the game has been decided
● Method
  ○ Convert Source Music to a vector $z$ with the VAE
  ○ Compute the centers $p^c$, $q^c$ of Source Scene and Target Scene
  ○ Compute the Target Music vector with $z' = z + q^c - p^c$
● Experiment
  ○ Preliminary Experiment: Confirm whether the VAE can be trained on VGM
  ○ Operational Experiment: Confirm the practicality of EgGMAn in game development
Outro > Future
● Objective Evaluation
  ○ Evaluate the validity of the assumptions
    ■ VGM vectors form a cluster for each scene
    ■ The difference between a VGM vector and the center of its scene is constant
  ○ Evaluate the retrieval accuracy
● Subjective Evaluation
  ○ Evaluate the retrieval accuracy
Outro > Acknowledgment
● Thank you for your valuable advice and feedback
  ○ Prof. Shigeyuki Hirai / Kyoto Sangyo Univ.
  ○ Mr. Kenji Kojima / CAPCOM CO., LTD.
  ○ Mr. Tomoya Kishi / CAPCOM CO., LTD.
  ○ Mr. Takaaki Ichijo / HEAD-HIGH CO., LTD.
  ○ Dr. Akinori Ito / Tokyo Univ. of Tech.
  ○ Prof. Koji Mikami / Tokyo Univ. of Tech.