IEEESM’23 IEEE International Conference on Smart Mobility
Benchmark of Deep Learning Visual and Far-Infrared Videos Toward Weather-tolerant Pedestrian Traffic Monitoring
Takumi Fukuda, Ismail Arai, Arata Endo, Masatoshi Kakiuchi, Kazutoshi Fujikawa
Nara Institute of Science and Technology, Japan
Contents
1. Introduction
2. Experimental Method
3. Result
  3.1 Error rate vs. hour when using visible video
  3.2 Error rate vs. temperature when using far-infrared video
  3.3 Average error rate by time and weather
4. Conclusion
1.1 Motivation
• Sidewalks contribute to the economy and the environment.
  • Increased revenue for nearby stores [1].
  • Active walking reduces vehicle emissions and noise [2].
• Developing sidewalks requires data on how they are used.
  • In particular, the number of pedestrians (head count).
[1] Yuji Yoshimura, Yusuke Kumakoshi, Yichun Fan, Sebastiano Milardo, Hideki Koizumi, Paolo Santi, Juan Murillo Arias, Siqi Zheng, and Carlo Ratti, "Street pedestrianization in urban districts: Economic impacts in Spanish cities," Cities, Vol. 120, p. 103468, January 2022.
[2] Shuhana Shamsuddin, Rasyiqah Hassan, and Siti Bilyamin, "Walkable environment in increasing the liveability of a city," Procedia - Social and Behavioral Sciences, Vol. 50, pp. 167–178, December 2012.
1.2 Previous Study
• CCTV cameras on highways are used for head counts.
• Percentage of CCTV cameras that achieved the target head-count accuracy (compared with hourly traffic; target: within ±5% [3]):

                1 hour within 7–9 am   2 hours within 9:00–16:00   1 hour within 20:00–22:00
  Vehicle       77.6%                  75.5%                       24.1%
  Pedestrian    1.0%                   -                           -

• Vehicle counts are difficult to estimate at night, so we expect pedestrian head counts to be even less accurate at night.
• Accuracy has not been measured in various environments.
  → We investigate estimation accuracy at various times of day and in various weather conditions.
[3] Fowzia Akhter, Sam Khadivizand, Hasin Reza Siddiquei, Md Eshrat E. Alahi, and Subhas Mukhopadhyay, "IoT enabled intelligent sensor node for smart city: Pedestrian counting and ambient monitoring," Sensors, Vol. 19, No. 15, 2019.
1.3 Aim (toward accurate head counts)
• This study investigates the factors that reduce head-count accuracy when using visible and far-infrared videos.
• Why use far-infrared video?
  • It may detect objects that are not detectable in visible video.
2.1 Capturing method
• Visible and far-infrared cameras capture synchronously.
  • Visible: 10 min/video
  • Far-infrared: 10 min/video

                          Visible    Far-infrared
  Resolution (pixels)     600×800    514×640
  Frame rate (fps)        10         10
  Camera angle (degrees)  31–108     32
2.2 Head count method
• STEP 1 Object detection: detect pedestrians in each frame.
• STEP 2 Object tracking: assign IDs to the pedestrians in each frame.
• STEP 3 Estimate head count: the number of unique IDs is the head count.
[Figure: pedestrians tracked across frames with IDs 1–5, giving a head count of 5 people.]
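STEP 3 can be sketched in a few lines. This is a minimal illustration, not the authors' code: it assumes, hypothetically, that the tracker from STEP 2 emits one (frame, track_id) pair per tracked pedestrian box.

```python
from typing import Iterable, List, Tuple

# Hypothetical tracker output format: (frame number, track ID) per tracked box.
TrackRow = Tuple[int, int]


def head_count(tracks: Iterable[TrackRow]) -> int:
    """STEP 3: the head count is the number of distinct track IDs."""
    return len({track_id for _frame, track_id in tracks})


# Five pedestrians whose IDs persist over several frames.
rows: List[TrackRow] = [
    (1, 1), (1, 2),
    (2, 1), (2, 2), (2, 4),
    (3, 2), (3, 4), (3, 3),
    (4, 4), (4, 3), (4, 5),
]
print(head_count(rows))  # 5
```

Because every new ID counts as a new person, an ID switch (for example after an occlusion) inflates the estimate, while a pedestrian who is never detected or tracked lowers it.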
2.2.2 Pedestrian detection using YOLOX (STEP 1)
• In visible video, pedestrians can be detected with high accuracy by using a pre-trained model.
  • YOLOX [4]
  • Pre-trained models, trained on the large MS COCO image dataset [5], are available.
  • No need to prepare a large image dataset.
• MS COCO contains no far-infrared images.
  • For far-infrared video, detection accuracy is low when using pre-trained models.
[4] Z. Ge, S. Liu, F. Wang, Z. Li, and J. Sun, "YOLOX: Exceeding YOLO series in 2021," July 2021, https://github.com/Megvii-BaseDetection/YOLOX
[5] T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollar, and L. Zitnick, "Microsoft COCO: Common objects in context," in European Conference on Computer Vision (ECCV), Sep. 2014.
2.2.3 Transfer learning in YOLOX (STEP 1)
Transfer learning: a method that starts from a model pre-trained on one dataset and further trains it on new data.
• Train/valid images
  • About 8,000 far-infrared frames captured in March were annotated.
  • Train: approx. 5,600 images
  • Valid: approx. 2,400 images
• Pre-trained model
  • The highest-accuracy model in the YOLOX repository is used.
• Detection category
  • Only the 'person' label is used for training.
  • Categories such as 'bicycle' and 'dog' are not used.
[Figure: pre-trained model + far-infrared images and annotation data → transfer learning.]
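The data preparation above can be sketched as follows. This is an illustration under assumptions: annotations are taken to be COCO-style dicts with a `category_id` key (1 = person in MS COCO), and the 70/30 split is assumed to be a random frame-level split, which the slide does not specify. The helper names are hypothetical.

```python
import random
from typing import Dict, List, Tuple

PERSON_ID = 1  # 'person' category ID in MS COCO


def keep_person(annotations: List[Dict]) -> List[Dict]:
    """Drop every annotation whose category is not 'person' (e.g. bicycle, dog)."""
    return [a for a in annotations if a["category_id"] == PERSON_ID]


def split_frames(frame_ids: List[int], train_ratio: float = 0.7,
                 seed: int = 0) -> Tuple[List[int], List[int]]:
    """Shuffle frame IDs reproducibly and split them into train/valid subsets."""
    ids = list(frame_ids)
    random.Random(seed).shuffle(ids)
    n_train = round(len(ids) * train_ratio)
    return ids[:n_train], ids[n_train:]


train, valid = split_frames(list(range(8000)))
print(len(train), len(valid))  # 5600 2400
```

Splitting by frame keeps the sketch short; since consecutive frames of one video are nearly identical, splitting by video or by day would give a less optimistic validation score.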
2.2.4 Object tracking using BYTE (STEP 2)
• BYTE [6]
  • Predicts each tracked object's position with a Kalman filter and compares the prediction with the detection results.
  • Associates high- and low-confidence detections separately to cope with occlusion.
• Parameters
  • Category: "person"
  • Minimum confidence value to track: 0.25
  • Threshold of IoU (overlap between predicted and detected boxes)
  • Threshold of confidence: 0.8
[Figure: in frames n−2 and n−1, a person tracked as ID 1 is detected with confidences 0.92 and 0.89; in frame n, a person detected with confidence 0.88 overlaps the predicted box with IoU 0.7. Should it be ID 1 or 2?]
[6] Y. Zhang, P. Sun, Y. Jiang, D. Yu, F. Weng, Z. Yuan, P. Luo, W. Liu, and X. Wang, "ByteTrack: Multi-object tracking by associating every detection box," in Computer Vision – ECCV 2022, Cham: Springer Nature Switzerland, Oct. 2022, pp. 1–21.
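The two-stage association idea of BYTE can be illustrated with a deliberately simplified sketch. It is not ByteTrack's implementation: there is no Kalman filter (each track's last box stands in for the prediction), matching is greedy rather than Hungarian, one IoU threshold is used for both stages, and unmatched tracks are dropped at once instead of being kept for a few frames. Mapping the slide's parameters to `high=0.8` and `low=0.25` is an assumption, and `iou_thr=0.7` is also assumed (0.7 is the IoU shown in the slide's figure).

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)
Detection = Tuple[Box, float]            # (box, confidence)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two corner-format boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def greedy_match(tracks: Dict[int, Box], boxes: List[Box], iou_thr: float):
    """Pair tracks with boxes by descending IoU (a stand-in for Hungarian matching)."""
    candidates = sorted(
        ((iou(tb, db), tid, j) for tid, tb in tracks.items() for j, db in enumerate(boxes)),
        reverse=True,
    )
    used_tracks, used_boxes, matches = set(), set(), []
    for score, tid, j in candidates:
        if score < iou_thr:
            break  # all remaining candidates overlap even less
        if tid in used_tracks or j in used_boxes:
            continue
        used_tracks.add(tid)
        used_boxes.add(j)
        matches.append((tid, j))
    unmatched_tracks = [t for t in tracks if t not in used_tracks]
    unmatched_boxes = [j for j in range(len(boxes)) if j not in used_boxes]
    return matches, unmatched_tracks, unmatched_boxes


def byte_step(tracks: Dict[int, Box], detections: List[Detection], next_id: int,
              high: float = 0.8, low: float = 0.25, iou_thr: float = 0.7):
    """One frame of simplified BYTE-style association; returns (tracks, next_id).

    Each track's previous box is used as its 'prediction' (no Kalman filter).
    """
    high_boxes = [b for b, s in detections if s >= high]
    low_boxes = [b for b, s in detections if low <= s < high]
    # Stage 1: every track against the high-confidence detections.
    m1, left_tracks, new_boxes = greedy_match(tracks, high_boxes, iou_thr)
    updated = {tid: high_boxes[j] for tid, j in m1}
    # Stage 2: leftover tracks against low-confidence detections (often occluded people).
    m2, _, _ = greedy_match({t: tracks[t] for t in left_tracks}, low_boxes, iou_thr)
    updated.update({tid: low_boxes[j] for tid, j in m2})
    # Unmatched high-confidence detections start new IDs; unmatched low ones are discarded.
    for j in new_boxes:
        updated[next_id] = high_boxes[j]
        next_id += 1
    return updated, next_id


tracks, next_id = byte_step({1: (0, 0, 10, 20)},
                            [((1, 1, 11, 21), 0.88), ((50, 50, 60, 70), 0.9)], next_id=2)
print(tracks)  # {1: (1, 1, 11, 21), 2: (50, 50, 60, 70)}
```

The second stage is what keeps ID 1 alive when an occluded pedestrian is detected only with low confidence, which matters for head counting because a lost track that restarts under a new ID is counted twice.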
2.3 Data on various conditions
• Conditions
  • Time: daytime or night
  • Weather: sunny or rainy
  • Temperature: 11–34°C
• 14 videos are used for each condition.
Number of videos used in the experiment:
                  Daytime          Night
                  Sunny   Rainy    Sunny   Rainy
  Visible         14      14       14      14
  Far-infrared    14      14       14      14
2.4 Evaluation Metric
$$\delta = \frac{100}{N}\sum_{i=1}^{N}\frac{\mathit{Predict}_i - \mathit{Correct}_i}{\mathit{Correct}_i}$$
• δ: error rate (signed, %)
• N: number of videos
• Correct_i: true head count for the i-th video
• Predict_i: estimated head count for the i-th video
• The error rate is calculated for each condition.
• A +/− sign indicates that more/fewer people than the true head count were estimated.
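The metric above translates directly into code. A minimal sketch (the function name is ours), checked against the two examples used later in the appendix: 103 estimated vs. 100 correct gives +3%, and 97 vs. 100 gives −3%.

```python
from typing import Sequence


def signed_error_rate(predicted: Sequence[int], correct: Sequence[int]) -> float:
    """delta = 100/N * sum((Predict_i - Correct_i) / Correct_i), in percent."""
    if len(predicted) != len(correct) or not correct:
        raise ValueError("need one predicted and one correct count per video")
    n = len(correct)
    return 100.0 / n * sum((p - c) / c for p, c in zip(predicted, correct))


print(round(signed_error_rate([103], [100]), 1))  # 3.0
print(round(signed_error_rate([97], [100]), 1))   # -3.0
```

Because the relative error is averaged per video, over- and underestimates in different videos can partly cancel, and videos with small true counts weigh as much as busy ones.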
3.1 Error rate trends when using visible video
• Daytime: the error rate was within ±5% in 19 of 28 videos.
• Night: the error rate was within ±5% in only 6 of 28 videos.
  • In low light, pedestrians may not be detected or tracked.
[Figure: error rate vs. hour; panels Daytime/Sunny and Night/Rainy.]
3.1.1 Error cause on sunny days
• A person on a bicycle cannot be detected or tracked at night.
  • People on bicycles are captured as blurred images.
3.1.2 Error cause on rainy days
• People using rain gear cannot be detected or tracked.
  • Rain gear hides a person's features.
3.2 Error rate trends when using far-infrared video
• The head count is underestimated at certain temperatures.
[Figure: error rate vs. temperature; panels temperature < 23°C, 23°C ≤ temperature < 30°C, and 30°C ≤ temperature.]
3.2.1 Far-infrared image at 11°C
[Figure: far-infrared images at 11°C, 27°C, and 34°C.]
• Pedestrians are warmer than the background.
• The pedestrian is detected successfully.
3.2.2 Far-infrared image at 27°C
• Background and pedestrian temperatures are almost the same.
• The pedestrian is not detected.
3.2.3 Far-infrared image at 34°C
• The background is warmer than the pedestrians.
• The pedestrian is detected successfully.
3.2.4 Complementary relationship between visible and far-infrared videos
[Figure: example frames, Night/Rainy at 19°C and Night/Sunny at 27°C.]
• Depending on the time, weather, and temperature, either visible or far-infrared video is the better choice.
4.1 Conclusion and future work
• Conclusion
  • We estimated head counts using visible and far-infrared videos.
  • Causes of underestimation in visible video:
    • no lighting, use of rain gear, clothing of the same color as the background
  • Causes of underestimation in far-infrared video:
    • temperatures of 23–30°C, use of rain gear
• Future work
  • Build a system that dynamically switches between visible and far-infrared video for head counting throughout the day.
  • Add more classes, such as bicycles and pets.
Thank you
Appendix
Video Transmission
• The Raspberry Pi is always connected to the 4G LTE network through an M2M router.
• Videos are transmitted at regular intervals over the SSH protocol to the NAIST campus server.
[Figure: flow of transmitting video: cameras → Raspberry Pi → M2M router → LTE → SSH → NAIST's server.]
IoU (Intersection over Union) score
• Evaluates the overlap between the correct area and the predicted area.
$$\mathrm{IoU} = \frac{\text{Overlapping area}}{\text{Total area}} = \frac{81}{119} \approx 0.68, \qquad 0 \le \mathrm{IoU} \le 1$$
• The total area is the union of the two areas: 100 + 100 − 81 = 119.
[Figure: a 10×10 correct area and a 10×10 predicted area shifted by 1 in each direction; IoU = 0.68.]
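The IoU definition can be checked numerically. A minimal sketch, assuming boxes are given as corner coordinates (x1, y1, x2, y2); the two boxes reproduce the slide's example, where the overlap is 9 × 9 = 81 and the union is 119.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def area(b: Box) -> float:
    return max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])


def iou(correct: Box, predicted: Box) -> float:
    """Overlapping area divided by total (union) area."""
    ix = max(0.0, min(correct[2], predicted[2]) - max(correct[0], predicted[0]))
    iy = max(0.0, min(correct[3], predicted[3]) - max(correct[1], predicted[1]))
    overlap = ix * iy
    total = area(correct) + area(predicted) - overlap  # union, not the plain sum
    return overlap / total if total > 0 else 0.0


print(round(iou((0, 0, 10, 10), (1, 1, 11, 11)), 2))  # 0.68
```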
Comparison of head count estimation methods (Good ○ > △ > ✕ Bad)
• RFID tag: all pedestrians carry RFID tags; the head count is estimated by reading the ID when a tag approaches the RFID reader.
• Wi-Fi & BLE: the head count is estimated by acquiring Bluetooth beacons and Wi-Fi probe requests emitted by devices.
• Laser scan: a laser scanner is placed about 20 cm above the ground, and the head count is estimated by measuring the laser reflection time.
• 3D LiDAR: the head count is estimated by capturing sidewalks and pedestrians as 3D point cloud data.
• Camera video: the head count is estimated by detecting and tracking pedestrians in the video.

                                RFID tag   Wi-Fi & BLE   Laser scan   3D LiDAR   Camera video
  Accuracy                      ○          △             △            ○          ○
  Direction of movement         ○          ✕             ○            ○          ○
  Installation cost             ✕          ○             ○            ○          ○
  Calculation cost              ○          ○             ○            ✕          ○
  Change of measurement target  ○          ✕             ✕            ○          ○
  Privacy                       ○          △             ○            ○          ✕
Background of Digital Signage Installation
• Installation conditions for cameras and computers:
  • stable power supply
  • scattered locations
  • protection of the cameras and computers
• The cameras and computers are installed in digital signage.
[Figure: digital signage unit, 1950 mm.]
Near-infrared and far-infrared rays
• Near-infrared rays
  • Wavelength: 0.7–2.5 μm.
  • Near-infrared rays from sunlight and other sources are reflected by objects and captured.
  • Different materials absorb near-infrared rays differently.
  • In an unlit environment, near-infrared rays must be emitted by a light source.
• Far-infrared rays
  • Wavelength: 8–14 μm.
  • The camera captures far-infrared rays emitted by the objects themselves.
  • The far-infrared radiation an object emits depends on its temperature.
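Why do both pedestrians and the background show up in the 8–14 μm band, so that detection depends on their temperature difference (3.2.1–3.2.3)? Wien's displacement law, a standard physics relation not stated on the slide, gives the peak emission wavelength as λ_max = b / T. The sketch below assumes ideal blackbody surfaces (real materials have emissivity below 1) and evaluates the temperatures from Section 3.2.

```python
WIEN_B_UM_K = 2897.77  # Wien's displacement constant b, in micrometre-kelvin


def peak_wavelength_um(temp_c: float) -> float:
    """Peak blackbody emission wavelength (um) for a temperature in Celsius."""
    return WIEN_B_UM_K / (temp_c + 273.15)


for t in (11, 27, 34):
    lam = peak_wavelength_um(t)
    print(f"{t} C: peak at {lam:.2f} um, inside 8-14 um band: {8 <= lam <= 14}")
```

Ambient objects and people in the 11–34°C range all peak near 9.4–10.2 μm, well inside the far-infrared band, so the camera sees contrast only when their temperatures differ.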
Selection of sheets to be used for capturing windows
• A sheet that transmits both visible light and far-infrared radiation is needed.
• The far-infrared transmissive transparent sheet GAT [7] is used.
(Good ◎ > ○ > △ > ✕ Bad)

                               Acrylic   Quartz glass               Polyethylene   GAT
  Visible light transmittance  ◎         ◎                          △–✕            ○
  Infrared transmittance       ✕         △ (far-infrared: ✕)        ○              ○
  Cost                         ◎         ✕                          ◎              ○
  Specific gravity             △         ✕                          ◎              ◎
  Ease of processing           ○         ✕                          ◎              ○

[Figure: wavelength range and transmittance of GAT (adapted from the Asahi Kasei Advance Corporation web page).]
[7] https://www.asahi-kasei.co.jp/advance/jp/gat/index.html
Video resolution adjustment
• The visible video (600×800) and the far-infrared video (514×640) differ in resolution and angle of view, so the visible video is adjusted to match the far-infrared video:
  1. Shrink: 600×800 → 543×724.
  2. Trim: remove 29 pixels (14 + 15) along the 543-pixel side and 84 pixels (56 + 28) along the 724-pixel side → 514×640.
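The two steps above can be sketched with NumPy. Several details are assumptions, not the authors' procedure: the first dimension (600) is taken as image rows, nearest-neighbour resizing stands in for whatever interpolation was used, and the 14-pixel and 56-pixel margins are assumed to be trimmed from the top and left (the slide does not say which edge gets which margin). The function name is hypothetical.

```python
import numpy as np


def align_visible_to_fir(frame: np.ndarray,
                         scaled_hw=(543, 724),
                         crop_top_left=(14, 56),
                         out_hw=(514, 640)) -> np.ndarray:
    """Shrink a 600x800 visible frame to 543x724, then trim it to 514x640."""
    h, w = frame.shape[:2]
    sh, sw = scaled_hw
    # Nearest-neighbour resize via integer index arrays (no OpenCV dependency).
    rows = np.arange(sh) * h // sh
    cols = np.arange(sw) * w // sw
    shrunk = frame[rows[:, None], cols[None, :]]
    # Trim: 14 rows off the top (15 remain below), 56 columns off the left (28 remain right).
    top, left = crop_top_left
    oh, ow = out_hw
    return shrunk[top:top + oh, left:left + ow]


visible = np.zeros((600, 800, 3), dtype=np.uint8)
print(align_visible_to_fir(visible).shape)  # (514, 640, 3)
```

Aligning the visible frames to the far-infrared geometry first is what lets the two videos be compared frame by frame (and fused, as in the DenseFuse appendix).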
Output of object detector
• For each detection, YOLOX outputs:
  • the video frame
  • the category number (the number used in the MS COCO dataset)
  • the confidence
  • the bounding box (x, y, w, h)
Output of YOLOX:
  Frame  Category  Confidence  x    y    w    h
  1      1         0.87        60   240  50   200
  1      1         0.75        150  280  30   180
  2      1         0.85        65   230  40   190
  2      1         0.71        160  300  28   170
  3      1         0.88        68   220  37   185
  ・・・   ・・・      ・・・        ・・・  ・・・  ・・・  ・・・
Bounding box
• Four values indicate the position of an object:
  • x: x-coordinate of the box center
  • y: y-coordinate of the box center
  • w: width of the box (relative to an image width of 1)
  • h: height of the box (relative to an image height of 1)
[Figure: bounding box with center (x, y), width w, and height h, in image coordinates with origin (0,0).]
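A small sketch of how detector rows in the format above might be filtered to pedestrians and converted from center format (x, y, w, h) to corner format (x1, y1, x2, y2), which IoU-based trackers typically consume. The `Detection` type and the helper names are hypothetical, and reusing the 0.25 minimum confidence from the BYTE parameters is an assumption. The conversion is the same whether coordinates are in pixels or normalized to the image size.

```python
from typing import List, NamedTuple, Tuple

PERSON = 1  # MS COCO category number for 'person'


class Detection(NamedTuple):
    frame: int
    category: int
    confidence: float
    x: float  # box center x
    y: float  # box center y
    w: float  # box width
    h: float  # box height


def to_corners(d: Detection) -> Tuple[float, float, float, float]:
    """Convert center format (x, y, w, h) to corner format (x1, y1, x2, y2)."""
    return (d.x - d.w / 2, d.y - d.h / 2, d.x + d.w / 2, d.y + d.h / 2)


def person_boxes(dets: List[Detection], min_conf: float = 0.25):
    """Keep confident 'person' detections as (frame, corner box) pairs."""
    return [(d.frame, to_corners(d)) for d in dets
            if d.category == PERSON and d.confidence >= min_conf]


rows = [Detection(1, 1, 0.87, 60, 240, 50, 200), Detection(1, 1, 0.75, 150, 280, 30, 180)]
print(person_boxes(rows)[0])  # (1, (35.0, 140.0, 85.0, 340.0))
```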
Results: correct counts and estimated head counts
• The total number of moving objects was estimated using visible (RGB) video and far-infrared (FIR) video.
• Head count of hourly traffic (including bicyclists and wheelchair users).
Correct counts and estimated head counts for each pattern:
                  Daytime          Night
                  Sunny   Rainy    Sunny   Rainy
  Correct         1108    1399     884     629
  Visible         1162    1443     822     567
  Far-infrared    1027    1172     758     554
  Fused           1146    1470     837     574
Frame rate and shutter speed
• The optimal shutter speed is between 1/(frame rate) and 1/(2 × frame rate) seconds.
[Figure: shutter speed relative to the frame interval at a frame rate of 4 fps.]
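The rule of thumb above is simple arithmetic. This sketch reads "1/2 frame rate" as 1/(2 × frame rate), which is an interpretation: a longer exposure than 1/(frame rate) would not fit inside one frame interval. Exact fractions keep the result readable.

```python
from fractions import Fraction


def shutter_range(fps: int):
    """Return (fastest, slowest) recommended exposure times in seconds:
    from 1/(2*fps) up to 1/fps."""
    return Fraction(1, 2 * fps), Fraction(1, fps)


print(shutter_range(4))   # (Fraction(1, 8), Fraction(1, 4))
print(shutter_range(10))  # the 10 fps capture rate used in 2.1
```

At 4 fps this gives 1/8 to 1/4 s; at the 10 fps used for the experiment videos it gives 1/20 to 1/10 s.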
Shutter speed
• When the shutter speed is fast: the subject appears to stand still, but the image is dark.
• When the shutter speed is slow: the image is bright, but subjects appear blurred.
Average error rate for each condition
Error rate for estimating head count:
                     Daytime              Night
                     Sunny     Rainy      Sunny     Rainy
  Visible (%)        4.9 (+)   3.1 (+)    7.0 (−)   7.6 (−)
  Far-infrared (%)   7.3 (−)   16.2 (−)   14.3 (−)  14.0 (−)
• In all conditions, the visible video was more accurate.
Error rate for temperatures below 24°C
Error rate for estimating head count:
                     Daytime              Night
                     Sunny     Rainy      Sunny     Rainy
  Visible (%)        2.1 (+)   1.2 (+)    1.0 (−)   10.7 (−)
  Far-infrared (%)   7.2 (−)   10.7 (−)   1.8 (−)   7.5 (−)
• Far-infrared video is more accurate on rainy nights.
Error rate for temperatures of 24–30°C
Error rate for estimating head count:
                     Daytime              Night
                     Sunny     Rainy      Sunny     Rainy
  Visible (%)        6.6 (+)   4.3 (+)    12.7 (−)  10.7 (−)
  Far-infrared (%)   9.6 (−)   16.5 (−)   18.2 (−)  36.9 (−)
• Visible video is more accurate.
Error rate when temperature exceeds 30°C
Error rate for estimating head count:
                     Daytime              Night
                     Sunny     Rainy      Sunny     Rainy
  Visible (%)        12.7 (+)  -          -         -
  Far-infrared (%)   2.7 (−)   -          -         -
• Use far-infrared video when temperatures exceed 30°C.
Effect of water droplets
• With visible video, raindrops on the capture window cause diffuse reflection of light, so pedestrians are captured as blurred images.
• With far-infrared video, the image is clear and unaffected by raindrops on the capture window.
Proposal of a pedestrian count estimation method that minimizes the effect of weather changes by fusing visible and far-infrared videos
Average temperature in Kobe (°C)
  Jan.  Feb.  Mar.  Apr.  May   Jun.  Jul.  Aug.  Sep.  Oct.  Nov.  Dec.
  4.9   5.2   8.2   13.9  18.4  22.1  26.1  27.5  23.9  18.1  12.7  7.6
Average number of pedestrians
Average number of pedestrians per video:
                  Daytime          Night
                  Sunny   Rainy    Sunny   Rainy
                  79.1    100.0    63.1    48.4
• The average head count varies with the time of day and weather conditions.
Fusion of visible video and far-infrared video
• A fused video is created using DenseFuse.
[Figure: visible video (pedestrians confused with the background) + far-infrared video (small temperature difference) → fused video (expected to detect all pedestrians).]
Results of head count estimation using fused video
Error rate for estimating head count:
                     Daytime              Night
                     Sunny     Rainy      Sunny     Rainy
  Visible (%)        4.9 (+)   3.1 (+)    7.0 (−)   7.6 (−)
  Far-infrared (%)   7.3 (−)   16.2 (−)   14.3 (−)  14.0 (−)
  Fused (%)          3.4 (+)   5.1 (+)    6.7 (−)   6.6 (−)
• Compared with far-infrared video, the fused video has a smaller error rate under all conditions.
• Compared with visible video, the fused video has a smaller error rate in daytime/sunny weather and at night, but a larger one in daytime/rainy weather.
• The symbols (+, −) indicate a positive or negative error.
  • Example 1: 103 estimated, 100 correct → 3% (+)
  • Example 2: 97 estimated, 100 correct → 3% (−)
Average error rate for each condition
                     Daytime              Night
                     Sunny     Rainy      Sunny     Rainy
  Visible (%)        4.9 (+)   3.1 (+)    7.0 (−)   7.6 (−)
  Far-infrared (%)   7.3 (−)   16.2 (−)   14.3 (−)  14.0 (−)
  Fused (%)          3.4 (+)   5.1 (+)    6.7 (−)   6.6 (−)
• Daytime: estimate the head count using visible video.
• Night: estimate the head count using fused video.
• Switching the video source according to the time of day further improves head count accuracy.
Error rate for temperatures below 24°C
                     Daytime              Night
                     Sunny     Rainy      Sunny     Rainy
  Visible (%)        2.1 (+)   1.2 (+)    1.0 (−)   10.7 (−)
  Far-infrared (%)   7.2 (−)   10.7 (−)   1.8 (−)   7.5 (−)
  Fused (%)          1.0 (−)   2.0 (+)    4.0 (−)   6.5 (−)
• With fused video, the error rate is within ±5% except on rainy nights.
Error rate for temperatures of 24–30°C
                     Daytime              Night
                     Sunny     Rainy      Sunny     Rainy
  Visible (%)        6.6 (+)   4.3 (+)    12.7 (−)  10.7 (−)
  Far-infrared (%)   9.6 (−)   16.5 (−)   18.2 (−)  36.9 (−)
  Fused (%)          9.9 (+)   7.6 (+)    11.3 (−)  16.4 (−)
• The fused video contains far-infrared information, which increases the error rate.
Error rate when temperature exceeds 30°C
                     Daytime              Night
                     Sunny     Rainy      Sunny     Rainy
  Visible (%)        12.7 (+)  -          -         -
  Far-infrared (%)   2.7 (−)   -          -         -
  Fused (%)          7.3 (+)   -          -         -
• Far-infrared videos are effective when temperatures are high.
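The future work in 4.1 proposes switching video sources dynamically. One hypothetical starting point, not the authors' system, is a lookup that picks the source with the smallest measured absolute error for the current condition. The sketch below encodes the appendix tables; the band labels, the treatment of exactly 30°C as part of the 24–30°C band, and the function names are our assumptions. Conditions without measurements (e.g. above 30°C at night) raise an error instead of guessing.

```python
from typing import Dict, Tuple

# Absolute average error rates (%) from the appendix tables, keyed by
# (temperature band, time of day, weather).
ERRORS: Dict[Tuple[str, str, str], Dict[str, float]] = {
    ("<24", "daytime", "sunny"):   {"visible": 2.1,  "far-infrared": 7.2,  "fused": 1.0},
    ("<24", "daytime", "rainy"):   {"visible": 1.2,  "far-infrared": 10.7, "fused": 2.0},
    ("<24", "night", "sunny"):     {"visible": 1.0,  "far-infrared": 1.8,  "fused": 4.0},
    ("<24", "night", "rainy"):     {"visible": 10.7, "far-infrared": 7.5,  "fused": 6.5},
    ("24-30", "daytime", "sunny"): {"visible": 6.6,  "far-infrared": 9.6,  "fused": 9.9},
    ("24-30", "daytime", "rainy"): {"visible": 4.3,  "far-infrared": 16.5, "fused": 7.6},
    ("24-30", "night", "sunny"):   {"visible": 12.7, "far-infrared": 18.2, "fused": 11.3},
    ("24-30", "night", "rainy"):   {"visible": 10.7, "far-infrared": 36.9, "fused": 16.4},
    (">30", "daytime", "sunny"):   {"visible": 12.7, "far-infrared": 2.7,  "fused": 7.3},
}


def temperature_band(temp_c: float) -> str:
    """Map a temperature to the appendix bands (30°C itself is put in 24-30)."""
    if temp_c < 24:
        return "<24"
    if temp_c <= 30:
        return "24-30"
    return ">30"


def select_video(temp_c: float, time_of_day: str, weather: str) -> str:
    """Return the video source with the lowest measured error for this condition."""
    errors = ERRORS.get((temperature_band(temp_c), time_of_day, weather))
    if errors is None:
        raise KeyError("no measurement for this condition")
    return min(errors, key=errors.get)


print(select_video(20, "night", "rainy"))    # fused
print(select_video(32, "daytime", "sunny"))  # far-infrared
```

A table lookup keeps every decision traceable to a measured number; with only 14 videos per condition, a real switching system would need more data before trusting small differences such as 1.0% vs. 1.8%.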