December 29, 24
Slide overview
Department of Electrical, Information and Physics Engineering, School of Engineering, Tohoku University
Graduation Study Lab Seminar Midterm Presentation
General-Purpose Infrared Image Coloring Model for Various Recognition Tasks
Hiroto TANIUCHI (谷内寛人), 4th year undergraduate, Omachi-Miyazaki Laboratory, Laboratory for Image Information Communications (IICLab), Tohoku University
Your feedback is welcome!
2024/12/20(Fri) IIC Lab B4 Hiroto Taniuchi
Outline
1. Introduction
2. Ugawa’s Model
3. Possibility and Motivation
4. Approach and Proposal
5. Experiment Design
6. Conclusion
7. Topic
1. Introduction
“Color” is an interesting research topic!
Thermal infrared cameras are robust sensors that can withstand environmental changes.
Features
・Capture the heat of objects
・No lighting required
・Usable in bad weather
Applications
・Autonomous driving [1] (image: https://www.flir.com/globalassets/news/1200x628_autonomousvehicle.jpg)
・Rescue [1] (image: https://www.flir.jp/globalassets/defense/solution-andlanding-pages/uis/search-rescue-benner.jpg/constrain1130x0-2010510668.jpg)
・Crime prevention (image: https://shopping.wtw.jp/cdn/shop/files/000000000776_kZHOT5k.png?v=1687514776)
[Figure: a visible light image and a thermal infrared (TIR) image. Source: https://www.mi.t.u-tokyo.ac.jp/static/projects/mil_multispectral/det_result.png]
1. Introduction
Visible light (color, texture)
・Influenced by the environment
・Easy to grasp the situation
Thermal infrared (heat information)
・Robust to changes in the environment
・Low visibility
Fake visible light image generated from a TIR image
・Easy to grasp the situation
・Robust to the environment
[Figure source: https://www.mi.t.u-tokyo.ac.jp/static/projects/mil_multispectral/det_result.png]
2. Ugawa’s Model
・Proposed by Kei Ugawa, a graduate of IIC Lab
・Uses TICC-GAN [3] as the baseline
・Generates colored images that properly reflect the meaning of objects
・Refers to feature maps from the segmentation module
[Photo: 2023/11/25, memory of Ekiden]
2. Ugawa’s Model
Previous Coloring Model Using GAN
[Diagram: Generator, Discriminator; TIR image $I$ (original); fake visible light image $\hat{V}$ (generated); visible light image $V$ (ground truth)]
Notation: $I$ = infrared ray, $V$ = visible light.
Losses: content loss, perceptual loss, total variation loss, adversarial loss.
2. Ugawa’s Model
Ugawa’s Model (※ conceptual diagram)
[Diagram: segmentation module, class mask $Mask$, $V_{Mask}$, class discriminator, coloring module, discriminator; TIR image $I$ (original); fake visible light image $\hat{V}$ (generated); visible light image $V$ (ground truth)]
Notation: $I$ = infrared ray, $V$ = visible light.
Losses: content loss, perceptual loss, total variation loss, adversarial loss, class adversarial loss.
2. Ugawa’s Model
Previous Coloring Model Using GAN -Loss-
[Diagram: Generator, Discriminator; TIR image $I$ (original); fake visible light image $\hat{V}$ (generated); visible light image $V$ (ground truth)]
Notation: $I$ = infrared ray, $V$ = visible light.
Loss symbols: $L_{adv}$ (adversarial), $L_{D}$ (discriminator), $L_{con}$ (content), $L_{per}$ (perceptual), $L_{tv}$ (total variation).
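As a rough illustration of two of the pixel-level terms named above, the sketch below computes a content loss and a total variation loss on tiny single-channel images stored as nested lists. The slides do not give the formulas, so the L1 form of the content loss and the anisotropic form of the total variation loss are assumptions, and the function names are hypothetical.

```python
from typing import List

Image = List[List[float]]  # single-channel image as rows of pixel values


def content_loss(fake: Image, real: Image) -> float:
    """Mean absolute (L1) difference between generated and ground-truth pixels.

    Assumption: an L1 form is used here; the slides do not state the formula.
    """
    total = 0.0
    count = 0
    for fake_row, real_row in zip(fake, real):
        for f, r in zip(fake_row, real_row):
            total += abs(f - r)
            count += 1
    return total / count


def total_variation_loss(img: Image) -> float:
    """Sum of absolute differences between vertically and horizontally adjacent pixels.

    Assumption: anisotropic total variation; it penalizes noisy, non-smooth output.
    """
    height, width = len(img), len(img[0])
    tv = 0.0
    for y in range(height):
        for x in range(width):
            if y + 1 < height:
                tv += abs(img[y + 1][x] - img[y][x])
            if x + 1 < width:
                tv += abs(img[y][x + 1] - img[y][x])
    return tv


# Toy example: a checkerboard-like "generated" image against a flat ground truth.
fake = [[0.0, 1.0], [1.0, 0.0]]
real = [[0.0, 0.0], [0.0, 0.0]]
print(content_loss(fake, real), total_variation_loss(fake))
```

In an actual training setup these terms would be computed on image tensors inside a deep learning framework; the nested-list version only shows what each term measures.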
2. Ugawa’s Model
Ugawa’s Model -Loss-
[Diagram: segmentation module, class mask $Mask$, $V_{Mask}$, class discriminator, coloring module, discriminator; TIR image $I$ (original); fake visible light image $\hat{V}$ (generated); visible light image $V$ (ground truth)]
Notation: $I$ = infrared ray, $V$ = visible light.
Loss symbols: $L_{class}$, $L_{Dclass}$, $L_{adv}$, $L_{D}$, $L_{con}$, $L_{per}$, $L_{tv}$.
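One common way to combine the loss terms named on the loss slides is a weighted sum. The slides do not give the exact form, so the equations below are only a sketch: the weighted-sum structure and the weights $\lambda$ are assumptions, and the discriminators would be trained separately with $L_{D}$ and $L_{Dclass}$.

```latex
% Assumed weighted-sum form; the \lambda weights are hypothetical, not from the slides.
\begin{align}
L_{G}^{\mathrm{baseline}} &= \lambda_{adv} L_{adv} + \lambda_{con} L_{con}
                            + \lambda_{per} L_{per} + \lambda_{tv} L_{tv} \\
L_{G}^{\mathrm{Ugawa}}    &= L_{G}^{\mathrm{baseline}} + \lambda_{class} L_{class}
\end{align}
```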
3. Possibility and Motivation
Viewpoint
・As part of an experiment, the fake visible light image $\hat{V}$ was input to a segmentation model $Model_S$ trained on visible light images.
・The coloring was not accurate enough, so the model could not classify accurately. (The original goal of the model is to improve image quality.)
・If the output of the model works as input to a model trained on visible light images, TIR images can be applied to publicly available large-scale pretrained models, e.g. ResNet, BiT, CLIP.
・In short, we would like to convert TIR images so that they look natural not only to humans but also to recognition models.
Motivation
Using the output image $\hat{V}$ from Ugawa’s model as an input for recognition models trained on visible light images
4. Approach and Proposal
The dataset should have a visible light image, a TIR image and an annotation aligned on the same scene.
MFNet Dataset
・Natural images of parking lots and roads
・Night and day
・640x480 pixels
・1606 images in total
・Created at the University of Tokyo [4]
[Image sources: https://m.media-amazon.com/images/I/91TCHCgvJ2L._SY425_.jpg, https://www.mi.t.u-tokyo.ac.jp/static/projects/mil_multispectral/predictionExamples_good.png]
4. Approach and Proposal
Proposed Method (※ conceptual diagram)
[Diagram: MFNet dataset; segmentation module, class mask $Mask$, $V_{Mask}$, class discriminator, coloring module, discriminator; TIR image $I$ (original); fake visible light image $\hat{V}$ (generated); segmentation model trained on visible light images, with output $\hat{S}$; visible light image $V$ (GT①); segmentation $S$ (GT②)]
Notation: $I$ = infrared ray, $V$ = visible light, $S$ = segmentation.
Losses: content loss, perceptual loss, total variation loss, adversarial loss, class adversarial loss, segmentation loss.
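The segmentation loss in the proposed method compares the segmentation predicted from the generated image with the ground-truth segmentation. A minimal sketch follows, assuming a mean pixel-wise cross-entropy and a simple weighted addition to the existing loss. The slides specify neither, and the names `segmentation_loss`, `proposed_total_loss` and `w_seg` are hypothetical.

```python
import math
from typing import List

ProbMap = List[List[List[float]]]  # probs[y][x][c]: predicted probability of class c
LabelMap = List[List[int]]         # labels[y][x]: ground-truth class index


def segmentation_loss(probs: ProbMap, labels: LabelMap) -> float:
    """Mean pixel-wise cross-entropy between predicted class probabilities and labels.

    Assumption: cross-entropy is used; the slides only name a "segmentation loss".
    """
    total = 0.0
    count = 0
    for prob_row, label_row in zip(probs, labels):
        for pixel_probs, label in zip(prob_row, label_row):
            # Clamp to avoid log(0) when the true class gets zero probability.
            total += -math.log(max(pixel_probs[label], 1e-12))
            count += 1
    return total / count


def proposed_total_loss(l_existing: float, l_seg: float, w_seg: float = 1.0) -> float:
    """Add the weighted segmentation term to the existing loss (w_seg is hypothetical)."""
    return l_existing + w_seg * l_seg


# Toy example: a 1x2 image with two classes.
probs = [[[1.0, 0.0], [0.5, 0.5]]]
labels = [[0, 1]]
print(proposed_total_loss(2.0, segmentation_loss(probs, labels), w_seg=0.5))
```

A small `w_seg` would keep the image-quality terms dominant, while a large one would push the output toward what the recognition model prefers; the right balance is an open experimental question.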
5. Experiment Design
Run Ugawa’s model stored on the server
□ Run a follow-up experiment under the same conditions as in Ugawa’s paper and reproduce values close to the reported results ourselves.
[Milestone: midterm presentation in Dec.]
□ (Implement Ugawa’s model on our own)
□ Input the infrared images $I$ of the MFNet dataset to the trained Ugawa’s model $Model_U$ and obtain fake visible light images $\hat{V}$
□ Input $\hat{V}$ to the segmentation model $Model_S$ trained on visible light images and compare the classification accuracy with that of the real visible light version
= Confirm that the output $S_{\hat{V}}$ obtained with $\hat{V}$ as input is less accurate than the output $S_{V}$ obtained with $V$ as input.
[Milestone: end of Jan.]
□ Train $Model_U'$ with the “classification accuracy at $\hat{V}$ input” added to the loss
= Attempt to maintain “naturalness as seen by humans” while making the output “work well even when input to a model pretrained on visible light images”
[Milestone: final presentation in Feb.]
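The comparison step above can be sketched as follows: segment both the real and the fake visible light image, score each prediction against the ground truth, and look at the gap. The slides speak of "classification accuracy" without naming a metric, so mean IoU is used here purely as an assumed example, and the label maps are toy data rather than MFNet results.

```python
from typing import List

LabelMap = List[List[int]]  # label_map[y][x]: class index of each pixel


def mean_iou(pred: LabelMap, truth: LabelMap, num_classes: int) -> float:
    """Mean intersection-over-union over the classes present in pred or truth."""
    intersection = [0] * num_classes
    union = [0] * num_classes
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p == t:
                intersection[t] += 1
                union[t] += 1
            else:
                # A mismatched pixel enlarges the union of both classes involved.
                union[p] += 1
                union[t] += 1
    ious = [i / u for i, u in zip(intersection, union) if u > 0]
    return sum(ious) / len(ious) if ious else 0.0


# Toy label maps (hypothetical): prediction from the real image V and from the fake image.
truth = [[0, 0], [1, 1]]
pred_from_real = [[0, 0], [1, 1]]
pred_from_fake = [[0, 1], [1, 1]]

miou_real = mean_iou(pred_from_real, truth, num_classes=2)
miou_fake = mean_iou(pred_from_fake, truth, num_classes=2)
print(f"real: {miou_real:.3f}, fake: {miou_fake:.3f}, gap: {miou_real - miou_fake:.3f}")
```

A positive gap would support the expectation stated above that the fake image is currently harder for a visible-light model to segment; shrinking that gap is what the retraining step aims for.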
6. Conclusion
Motivation
Using the output image $\hat{V}$ from Ugawa’s model as an input for recognition models trained on visible light images
Approach
・Add the gap between the ground truth $S$ and the “segmented $\hat{V}$” $\hat{S}$ (= unnaturalness for the recognition model) to the loss function of the discriminator
・Introduce MFNet, a dataset with a visible light image, a TIR image and an annotation aligned on the same scene
・Finally, generalize to object detection tasks
7. Topic
・Is the novelty of this research theme recognized?
・Resolving errors is painful. (Crying.)
・How to manage experimental conditions?
・How to connect and recombine existing models?
・How to verify what the image recognition model is looking at (Grad-CAM?)
・How to explore repositories on GitHub?
・How to do version control with GitHub?
...etc.
Thank you for your attention!
End. Produced and copyrighted by IIC
Your feedback is welcome!
References
[1] FLIR Systems homepage. https://www.flir.jp/
[2] Kei Ugawa. A Study on Thermal Infrared Image Colorization Based on Semantic Information, 2024.
[3] X. Kuang et al. Thermal infrared colorization via conditional generative adversarial network. Infrared Physics & Technology, vol. 107, p. 103338, Jun. 2020, doi: 10.1016/j.infrared.2020.103338.
[4] Qishen Ha, Kohei Watanabe, Takumi Karasawa, Yoshitaka Ushiku, Tatsuya Harada. MFNet: Towards Real-Time Semantic Segmentation for Autonomous Vehicles with Multi-Spectral Scenes. The 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2017), 2017. https://www.mi.t.u-tokyo.ac.jp/static/projects/mil_multispectral/
Prospects and Ideas for Master’s Degree
Continue undergraduate research
・Generalization to other tasks = use of the object detection data contained in MFNet
・Make Ugawa’s model more lightweight
[Image source: https://www.mi.t.u-tokyo.ac.jp/static/projects/mil_multispectral/det_result.png]
Other themes related to color
・Evaluate the amount of semantic information that is lost when an image is converted to monochrome.
・Quantitatively evaluate how easy it is for colorblind people to read the labeling on a package, etc.
・Implement a service that suggests color schemes for slides from feelings and themes entered as words.
Feedback Obtained
・Diffusion models may be more effective than GANs
Supplementary Material: Ugawa’s Model, Generator Loss Functions