October 19, 2017
Slide Overview
Deep Learning JP:
http://deeplearning.jp/hacks/
DL Paper Reading Group Material
Image-to-Image Translation with Conditional Adversarial Nets (Pix2Pix) & Perceptual Adversarial Networks for Image-to-Image Transformation (PAN) 2017/10/2 DLHacks Otsubo
Topic: image-to-image “translation”
Info
Pix2Pix [CVPR 2017]
• Phillip Isola, Jun-Yan Zhu, Tinghui Zhou, Alexei A. Efros
- Related work by the authors: iGAN [ECCV 2016], interactive-deep-colorization [SIGGRAPH 2017], Context Encoders [CVPR 2016], Image Quilting [SIGGRAPH 2001], Texture Synthesis by Non-parametric Sampling [ICCV 1999]
• University of California
• 178 citations
PAN [arXiv 2017]
• Chaoyue Wang, Chang Xu, Chaohui Wang, Dacheng Tao
• University of Technology Sydney, The University of Sydney, Université Paris-Est
Background
• Many tasks can be regarded as “translation” from an input image to an output image
- Diverse methods exist for each of them
Is there a single framework that can achieve them all?
Overview
Pix2Pix
• A general-purpose solution to image-to-image translation using a single framework
- Single framework: conditional GAN (cGAN)
PAN
• PAN = Pix2Pix − (per-pixel loss) + (perceptual adversarial loss)
Naive Implementation: U-Net (①)
① per-pixel loss (L1/L2)
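For reference, a minimal PyTorch sketch of this naive baseline: a generator (assumed here to be a U-Net that returns an image of the same size as its input; `generator` and `naive_step` are illustrative names) trained with the per-pixel L1 loss alone.

```python
import torch
import torch.nn as nn

l1_loss = nn.L1Loss()

def naive_step(generator, optimizer, x, y):
    """One training step of the naive baseline: minimize the
    per-pixel L1 distance (①) between output and target."""
    optimizer.zero_grad()
    fake = generator(x)       # U-Net output, same size as x
    loss = l1_loss(fake, y)   # ① per-pixel loss (L1)
    loss.backward()
    optimizer.step()
    return loss.item()
```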
Pix2Pix (① + ②)
② adversarial loss
Pix2Pix’s loss (① + ②)

$\mathcal{L}_{cGAN}(G, D) = \mathbb{E}_{x,y}[\log D(x, y)] + \mathbb{E}_{x,z}[\log(1 - D(x, G(x, z)))]$ … ②
$\mathcal{L}_{L1}(G) = \mathbb{E}_{x,y,z}[\lVert y - G(x, z) \rVert_1]$ … ①
$G^* = \arg\min_G \max_D \mathcal{L}_{cGAN}(G, D) + \lambda\,\mathcal{L}_{L1}(G)$
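A minimal PyTorch sketch of this combined objective, assuming a conditional discriminator `D(x, y)` that returns a logit and a generator `G(x)` (both names illustrative); λ = 100 follows the pix2pix paper.

```python
import torch
import torch.nn as nn

bce = nn.BCEWithLogitsLoss()
l1 = nn.L1Loss()
lam = 100.0  # weight of the L1 term (λ in the paper)

def d_loss(D, G, x, y):
    """② discriminator side: real (x, y) pairs vs. generated (x, G(x))."""
    fake = G(x).detach()
    real_logit = D(x, y)
    fake_logit = D(x, fake)
    return (bce(real_logit, torch.ones_like(real_logit))
            + bce(fake_logit, torch.zeros_like(fake_logit)))

def g_loss(D, G, x, y):
    """Generator side: fool D (②) and stay close per pixel (①)."""
    fake = G(x)
    fake_logit = D(x, fake)
    adv = bce(fake_logit, torch.ones_like(fake_logit))  # ② adversarial
    pix = l1(fake, y)                                   # ① per-pixel
    return adv + lam * pix
```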
PAN (② + ③)
③ perceptual adversarial loss
PAN’s loss (② + ③)

Perceptual adversarial loss (③), an L1 distance on the discriminator’s hidden-layer features $P_i$:
$\mathcal{L}_P(y, T(x)) = \sum_i \lambda_i \lVert P_i(y) - P_i(T(x)) \rVert_1$

Generator (② + ③): $\mathcal{L}_T = \log(1 - D(T(x))) + \mathcal{L}_P(y, T(x))$
Discriminator (② + ③): $\mathcal{L}_D = -\log D(y) - \log(1 - D(T(x))) + [m - \mathcal{L}_P(y, T(x))]^+$

m: a positive margin constant; $[\cdot]^+ = \max(0, \cdot)$
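A sketch of this objective in PyTorch, under the assumption that the discriminator returns its hidden-layer feature maps alongside the final logit (the `(features, logit)` return signature and the per-layer weights `lambdas` are illustrative):

```python
import torch
import torch.nn as nn

bce = nn.BCEWithLogitsLoss()
m = 2.0  # positive margin (hyperparameter)

def perceptual(D, fake, real, lambdas):
    """③ weighted L1 distance between D's hidden features of fake/real."""
    fake_feats, _ = D(fake)   # assumed: D returns (features, logit)
    real_feats, _ = D(real)
    return sum(w * torch.mean(torch.abs(f - r))
               for w, f, r in zip(lambdas, fake_feats, real_feats))

def g_loss(D, T, x, y, lambdas):
    """Generator T: fool D (②) + match D's features of the target (③)."""
    fake = T(x)
    _, fake_logit = D(fake)
    adv = bce(fake_logit, torch.ones_like(fake_logit))  # ②
    return adv + perceptual(D, fake, y, lambdas)        # ③

def d_loss(D, T, x, y, lambdas):
    """Discriminator: classify (②) + keep perceptual distance >= m (③)."""
    fake = T(x).detach()
    _, real_logit = D(y)
    _, fake_logit = D(fake)
    adv = (bce(real_logit, torch.ones_like(real_logit))
           + bce(fake_logit, torch.zeros_like(fake_logit)))          # ②
    hinge = torch.clamp(m - perceptual(D, fake, y, lambdas), min=0)  # ③
    return adv + hinge
```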
Example 1: Image De-Raining
• Removing rain from single images via a deep detail network [Fu, CVPR 2017]
• ID-GAN (cGAN) [Zhang, arXiv 2017]
- per-pixel loss
- adversarial loss
- pre-trained VGG perceptual loss (cf. PAN uses the discriminator’s perceptual loss)
[Figure: Input / Output (Ground Truth)]
Example 2: Image Inpainting
• Globally and Locally Consistent Image Completion [Iizuka, SIGGRAPH 2017]
• Context Encoders (cGAN) [Pathak, CVPR 2016]
- per-pixel loss
- adversarial loss
[Figure: Input / Output (Ground Truth)]
Example 3: Semantic Segmentation
Cityscapes / Pascal VOC
• DeepLabv3 [Chen, arXiv 2017]
• PSPNet [Zhao, CVPR 2017]
http://host.robots.ox.ac.uk:8080/leaderboard/displaylb.php?cls=mean&challengeid=11&compid=6
Cell Tracking / CREMI
• Learned Watershed [Wolf, ICCV 2017]
• U-Net [Ronneberger, MICCAI 2015]
http://www.codesolorzano.com/Challenges/CTC/Welcome.html
[Figure: Input / Output (Ground Truth)]
Result 1: Image De-Raining
[Figure: de-raining comparison; the “≒ pix2pix” annotations mark the baseline comparable to pix2pix]
Result 2: Image Inpainting
Result 3: Semantic Segmentation
Discussion
Why is the perceptual adversarial loss so effective?
vs. no perceptual loss (Pix2Pix)
- The perceptual loss lets D detect more discrepancies between true/false images
vs. pre-trained VGG perceptual loss (ID-GAN)
- VGG features tend to focus on content
- PAN features tend to focus on discrepancies
- PAN’s loss may also help avoid adversarial examples [Goodfellow, ICLR 2015] (?)
Minor Difference
• Pix2Pix uses PatchGAN
- A small (70×70) patch discriminator
- The final output of D is the average of the patch discriminator’s responses (applied convolutionally)
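A sketch of such a 70×70 PatchGAN discriminator (layer widths follow the common pix2pix implementation; the six input channels assume the input and target images are concatenated):

```python
import torch
import torch.nn as nn

class PatchDiscriminator(nn.Module):
    """70x70 PatchGAN sketch: a small conv net whose output is a grid of
    logits, each judging one 70x70 patch of the (input, target) pair."""
    def __init__(self, in_ch=6):  # input image + target image, concatenated
        super().__init__()
        def block(i, o, stride):
            return [nn.Conv2d(i, o, 4, stride, 1),
                    nn.BatchNorm2d(o),
                    nn.LeakyReLU(0.2, inplace=True)]
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, 64, 4, 2, 1),  # no norm on the first layer
            nn.LeakyReLU(0.2, inplace=True),
            *block(64, 128, 2),
            *block(128, 256, 2),
            *block(256, 512, 1),
            nn.Conv2d(512, 1, 4, 1, 1))     # one logit per patch

    def forward(self, x, y):
        patch_logits = self.net(torch.cat([x, y], dim=1))  # (N, 1, H', W')
        # final output of D = average of the patch responses
        return patch_logits.mean(dim=[1, 2, 3])
```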
To Do
• Implement
1. Pix2Pix (patch discriminator)
2. PAN (patch discriminator)
3. PAN (normal discriminator)
Wang et al. may have compared 1 with 3.
Implementation 2017/10/17 DLHacks Otsubo
My Implementation
• https://github.com/DLHacks/pix2pix_PAN
• pix2pix
- https://github.com/junyanz/pytorch-CycleGAN-and-pix2pix
• PAN
- per-pixel loss → perceptual adversarial loss
- not the same architecture as the paper’s original
- the number of parameters is the same as pix2pix
My Experiments
• Facade (label → picture)
• Map (picture → Google map)
• Cityscapes (picture → label)
Result (Facade, pix2pix)
Result (Facade, PAN)
Result (Map, pix2pix)
Result (Map, PAN)
Result (Cityscapes, pix2pix)
Result (Cityscapes, PAN)
Result (PSNR [dB])
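For reference, the PSNR reported here can be computed as follows (a standard definition; assumes image tensors scaled to [0, max_val]):

```python
import torch

def psnr(output, target, max_val=1.0):
    """Peak signal-to-noise ratio in dB between two images in [0, max_val]."""
    mse = torch.mean((output - target) ** 2)
    return (10 * torch.log10(max_val ** 2 / mse)).item()
```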
Discussion – Why pix2pix > PAN?
• Is the per-pixel loss needed after all?
• Is the patch discriminator unsuited to PAN?
• The positive margin m?
• (A poor pix2pix implementation in the PAN paper…?)