March 27, 23
Improving Bus Arrival Time Prediction Accuracy with Daily Periodic Based Transportation Data Imputation
IEEE SM'23 @ KAUST
Takumi Niwa, Ismail Arai, Arata Endo, Masatoshi Kakiuchi, Kazutoshi Fujikawa
Nara Institute of Science and Technology (NAIST), Japan
Bus Arrival Time (BAT) prediction
• BAT prediction is important for improving the quality of route bus services.
  • Users can ride route buses with less waiting time.
  • Bus operators can manage and evaluate bus schedules.
[Figure: Predicted bus arrival times. Next: 16:21 (+1 min); 2nd: 16:38 (-2 min); 3rd: 17:25 (+5 min)]
• Existing studies on BAT prediction have used deep learning models in recent years [1].
• Several prediction models [2,3] can predict BATs for multiple trips.
[1] N. Singh and K. Kumar, “A review of bus arrival time prediction using artificial intelligence,” WIREs Data Mining and Knowledge Discovery, vol. 12, no. 4, p. e1457.
[2] N. C. Petersen, F. Rodrigues, and F. C. Pereira, “Multi-output bus travel time prediction with convolutional LSTM neural network,” Expert Systems with Applications, vol. 120, pp. 426–435, 2019.
[3] A. Ishinaga, I. Arai, M. Kakiuchi, and K. Fujikawa, “Bus arrival time prediction method by convolution of operation and weather information,” Research Report: Intelligent Transportation Systems and Smart Communities, vol. 2021-ITS-84, no. 6, pp. 1–8, 2021.
Mar. 20, 2023 ― Improving Bus Arrival Time Prediction Accuracy with Daily Periodic Based Transportation Data Imputation
Bus operation data
• Existing BAT prediction methods use bus operation data,
  • including the running times of links and the stopping times at bus stops.
• Several prediction models require consecutive data for multiple trips.
  • Ishinaga's model [3] requires bus operation data for the last 8 trips as input.
  • Even with a missing rate of only 10%, the probability of extracting 8 consecutive complete trips is as low as 43%.
• Missing bus operation data therefore require imputation.
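The 43% figure follows if each trip is treated as missing independently with probability p, so that 8 consecutive trips are all complete with probability (1 − p)^8. A minimal sketch under that independence assumption (the function name is hypothetical):

```python
def prob_consecutive_complete(missing_rate: float, window: int = 8) -> float:
    """Probability that `window` consecutive trips are all observed.

    Assumes each trip is missing independently with probability `missing_rate`.
    """
    if not 0.0 <= missing_rate <= 1.0:
        raise ValueError("missing_rate must be between 0 and 1")
    return (1.0 - missing_rate) ** window


for rate in (0.05, 0.10, 0.20):
    print(f"missing rate {rate:.0%}: {prob_consecutive_complete(rate):.1%}")
```

With a 10% missing rate this gives 0.9^8 ≈ 0.43. Real missing data need not be independent across trips, so the formula is only a rough guide.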
Missing data in the actual bus operation dataset
[Figure: Missing-data map of 9863 trips (390 days, 2021-09-01 to 2022-09-25); white: normal data, black: missing data]
• Number of missing trips: 743 / 9863 trips (7.53%)
• The percentage of data obtained for 8 consecutive trips is 67.2%.
• It is difficult to eliminate missing data caused by trouble with onboard bus equipment, packet loss, and GPS sensor failures.
• Existing studies [2,3] have used simple imputation methods such as LOCF.
  • Last observation carried forward (LOCF) replaces a missing value with the last value observed before the missing value occurred.
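LOCF can be sketched for one column of bus operation data (for example, the running times of one link in trip order), with None marking a missing value. This is a plain-Python illustration, not the code used in [2,3]; with pandas, `Series.ffill()` performs the same forward fill.

```python
from typing import List, Optional


def locf(values: List[Optional[float]]) -> List[Optional[float]]:
    """Last observation carried forward.

    Each missing value (None) is replaced with the most recent observed value
    before it. Leading missing values stay None because nothing precedes them.
    """
    result: List[Optional[float]] = []
    last: Optional[float] = None
    for v in values:
        if v is None:
            result.append(last)
        else:
            last = v
            result.append(v)
    return result


print(locf([222.5, 250.0, None, None, 209.5]))
# [222.5, 250.0, 250.0, 250.0, 209.5]
```

Note how a run of gaps is filled with one repeated value, which is why LOCF ignores any trend across the missing trips.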
Related works of imputation in time series data prediction
• Shin et al. [4] reduced the prediction error of LSTM-based traffic congestion prediction by using an imputation method focused on the characteristics of traffic data. It combines three imputations:
  • Spatial imputation: uses the road conditions adjacent to the missing location.
  • Temporal imputation: uses the mean of the data from the previous 𝑛 time steps at the missing location.
  • Pattern imputation: uses pattern data generated in advance for each day of the week.
[4] D.-H. Shin, K. Chung, and R. C. Park, “Prediction of traffic congestion based on LSTM through correction of missing temporal and spatial data,” IEEE Access, vol. 8, pp. 150784–150796, 2020.
Related works of imputation in time series data prediction (cont.)
Would an imputation method focused on the characteristics of bus operation data help reduce BAT prediction errors?
Bus operation dataset used in this study
Index | Date | Trip ID | Running time (sec), links 1 ⋯ 5 | Stopping time (sec), stops 1 ⋯ 6 | Timetable difference (sec), stops 1 ⋯ 6
1 | 2022-06-01 | 1 | 93.5 ⋯ 1355.5 | 28.0 ⋯ 36.0 | -18.5 ⋯ 692.9
2 | 2022-06-01 | 2 | 129.0 ⋯ NA | 121.0 ⋯ NA | -120.5 ⋯ NA
⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮
26 | 2022-06-01 | 26 | 105.5 ⋯ 479.5 | 118.0 ⋯ 404.5 | -119.7 ⋯ -284.7
27 | 2022-06-02 | 1 | 125.0 ⋯ 456.0 | 60.0 ⋯ 16.0 | -58.9 ⋯ 109.6
⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮
• Trip ID is the sequence number of the trip within a day.
• Running time, stopping time, and timetable difference are recorded for each link/bus stop.
• If even one value is missing, the whole trip is treated as missing.
• Imputation is performed column by column.
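The rule that one missing value makes the whole trip missing, together with the "percentage of data obtained for 8 consecutive trips", can be sketched as below. Rows are trips in chronological order with NaN for a missing value. Reading that percentage as the share of sliding 8-trip windows with no missing trip, and letting windows span day boundaries, are assumptions; the function names are hypothetical.

```python
import math
from typing import List, Sequence


def trip_is_missing(row: Sequence[float]) -> bool:
    """A trip is treated as missing if any of its values is NaN."""
    return any(math.isnan(v) for v in row)


def complete_window_ratio(rows: List[Sequence[float]], window: int = 8) -> float:
    """Share of sliding `window`-trip windows that contain no missing trip.

    `rows` are trips in chronological order (one row per trip, one column per
    link or bus stop value). Windows may span day boundaries (an assumption).
    """
    missing = [trip_is_missing(r) for r in rows]
    n_windows = len(missing) - window + 1
    if n_windows <= 0:
        return 0.0
    complete = sum(1 for i in range(n_windows) if not any(missing[i:i + window]))
    return complete / n_windows


nan = float("nan")
# Illustrative rows built from a few values in the table (not full columns,
# and not consecutive trips in the real data).
rows = [
    [93.5, 1355.5, 28.0, 36.0],
    [129.0, nan, 121.0, nan],
    [105.5, 479.5, 118.0, 404.5],
    [125.0, 456.0, 60.0, 16.0],
]
print([trip_is_missing(r) for r in rows])
print(complete_window_ratio(rows, window=2))
```

Only the last 2-trip window is complete here, so the ratio is 1/3.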
Proposed methods
• We propose three imputation methods based on Shin's method [4].
• We do not use spatial imputation because it is not applicable given the nature of bus operation data.
  • Temporal imputation: uses the mean running and stopping times of several trips before the missing data.
  • Pattern imputation: uses the mean running and stopping times at the same hour as the bus service where the missing data occurs.
  • Combined imputation: uses pattern imputation for consecutive missing data and temporal imputation for non-consecutive missing data.
Proposed methods: Temporal imputation
• Use the mean of the 𝑁mean trips before the missing value.
[Figure: Example with 𝑁mean = 3. Running time 1 for trips 1–5 is 222.5, 250.0, 125.0, NA, 209.5; the missing value of trip 4 is imputed with the mean of trips 1–3, 199.2.]
• We expect this method to reflect recent trip disruptions, such as delays caused by rain or traffic congestion.
• This method does not take daily periodicity into account.
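Temporal imputation for one column can be sketched as follows, reproducing the slide's example. How the method treats consecutive gaps is not stated on the slide; here, as an assumption, values imputed earlier count as preceding trips, so a run of gaps is filled left to right.

```python
from typing import List, Optional


def temporal_impute(values: List[Optional[float]], n_mean: int = 3) -> List[Optional[float]]:
    """Fill each missing value (None) with the mean of the `n_mean` preceding trips.

    Assumptions: values already imputed earlier in the loop count as preceding
    trips, so consecutive gaps are filled left to right; if fewer than `n_mean`
    preceding values exist, the available ones are averaged; if none exist, the
    gap stays None.
    """
    out: List[Optional[float]] = list(values)
    for i in range(len(out)):
        if out[i] is not None:
            continue
        previous = [v for v in out[max(0, i - n_mean):i] if v is not None]
        if previous:
            out[i] = sum(previous) / len(previous)
    return out


filled = temporal_impute([222.5, 250.0, 125.0, None, 209.5], n_mean=3)
print(round(filled[3], 1))  # 199.2
```

Because each gap depends only on the trips just before it, the result tracks recent disruptions but carries no information about the time of day.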
Proposed methods: Pattern imputation
• Use pattern data generated from the non-missing part of the bus operation dataset.
• The pattern data is generated by calculating the mean of the bus operation data with matching trip IDs.
[Figure: Example for running time 2. Observed values for trips 1 and 2 are 102.5 and 114.0 (2022-06-02), 93.5 and 125.0 (2022-06-03), and 82.5 and 85.0 (2022-06-04). Averaging by trip ID gives pattern data of 92.8 (trip 1) and 108.0 (trip 2). In the target data, the missing value of trip 1 is imputed with 92.8, while trip 2 keeps its observed value of 103.5.]
• We expect this method to incorporate daily periodicity, such as morning and evening congestion periods.
• This method ignores recent trip disruptions.
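A sketch of pattern imputation, grouping by trip ID as this slide describes and reusing its example values. The function names are hypothetical, and the assignment of the flattened figure's values to trips is a reading of the slide that reproduces its pattern data (92.8 and 108.0).

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple


def build_pattern(records: List[Tuple[int, Optional[float]]]) -> Dict[int, float]:
    """Mean of the observed values for each trip ID; missing values (None) are skipped."""
    sums: Dict[int, float] = defaultdict(float)
    counts: Dict[int, int] = defaultdict(int)
    for trip_id, value in records:
        if value is not None:
            sums[trip_id] += value
            counts[trip_id] += 1
    return {trip_id: sums[trip_id] / counts[trip_id] for trip_id in counts}


def pattern_impute(trip_ids: List[int], values: List[Optional[float]],
                   pattern: Dict[int, float]) -> List[Optional[float]]:
    """Replace each missing value with the pattern value for the same trip ID."""
    return [pattern.get(t) if v is None else v for t, v in zip(trip_ids, values)]


# (trip ID, running time 2) observed on 2022-06-02, 06-03 and 06-04
training = [(1, 102.5), (2, 114.0), (1, 93.5), (2, 125.0), (1, 82.5), (2, 85.0)]
pattern = build_pattern(training)
print({t: round(v, 1) for t, v in pattern.items()})   # {1: 92.8, 2: 108.0}
print(pattern_impute([1, 2], [None, 103.5], pattern))  # trip 1 filled, trip 2 kept
```

The pattern is computed once from training data, so imputed values repeat the typical profile of the day regardless of what happened on the previous trips.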
Proposed methods: Combined imputation
• Use a combination of temporal imputation and pattern imputation:
  • temporal imputation for non-consecutive missing values,
  • pattern imputation for consecutive missing values.
[Figure: Example with 𝑁mean = 3. Running time 1 for trips 1–6 is 222.5, 250.0, 125.0, NA, 209.5, NA, ⋯. The isolated missing value of trip 4 is imputed with the mean of trips 1–3 (199.2); the missing value of trip 6, part of a consecutive run, is imputed with the pattern data for trip 6 (212.4; pattern data for trips 5–7: 185.6, 212.4, 200.8).]
• With this combination, temporal imputation and pattern imputation compensate for each other's weaknesses.
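Combined imputation can be sketched by scanning for runs of missing values. The slide does not define "consecutive" precisely; this sketch assumes a gap is consecutive when it belongs to a run of two or more missing trips, and it assumes trip 7 is also missing so that trip 6 starts such a run. The function name is hypothetical.

```python
from typing import Dict, List, Optional


def combined_impute(trip_ids: List[int], values: List[Optional[float]],
                    pattern: Dict[int, float], n_mean: int = 3) -> List[Optional[float]]:
    """Temporal imputation for isolated gaps, pattern imputation for runs of gaps.

    Assumption: a gap is "consecutive" when it belongs to a run of two or more
    missing trips in a row; a run of length one is "non-consecutive".
    """
    out: List[Optional[float]] = list(values)
    i, n = 0, len(out)
    while i < n:
        if values[i] is not None:
            i += 1
            continue
        end = i
        while end < n and values[end] is None:  # find the run [i, end)
            end += 1
        if end - i == 1:
            # Isolated gap: mean of up to n_mean preceding values.
            previous = [v for v in out[max(0, i - n_mean):i] if v is not None]
            out[i] = sum(previous) / len(previous) if previous else None
        else:
            # Consecutive gaps: pattern value for each trip ID.
            for k in range(i, end):
                out[k] = pattern.get(trip_ids[k])
        i = end
    return out


pattern = {5: 185.6, 6: 212.4, 7: 200.8}  # pattern data shown on the slide
trip_ids = [1, 2, 3, 4, 5, 6, 7]
values = [222.5, 250.0, 125.0, None, 209.5, None, None]  # trip 7 assumed missing
print(combined_impute(trip_ids, values, pattern))
```

Trip 4 gets the temporal mean of about 199.2, while trips 6 and 7 take the pattern values 212.4 and 200.8.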
Analysis of bus operation dataset
[Figure: Autocorrelation of running time for a given link; lags 0 to 182 trips, with ticks every 26 lags (1 day)]
• There is daily periodicity.
  • This route has 26 trips per day.
  • The other running and stopping times show the same periodicity.
※ Lag is the number of trips by which the data is shifted from the original series.
Is imputation focused on daily periodicity effective?
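The autocorrelation plotted here can be computed with the standard sample estimator. The sketch below applies it to a synthetic series with a 26-trip cycle (not the real data) to show why a daily pattern produces peaks at lags that are multiples of 26. The function name is hypothetical; `statsmodels.tsa.stattools.acf` with default settings should give the same values for many lags at once.

```python
import math
from typing import Sequence


def autocorrelation(x: Sequence[float], lag: int) -> float:
    """Sample autocorrelation of `x` at `lag`, where lag is a shift in trips."""
    n = len(x)
    if not 0 <= lag < n:
        raise ValueError("lag must satisfy 0 <= lag < len(x)")
    mean = sum(x) / n
    denom = sum((v - mean) ** 2 for v in x)
    num = sum((x[t] - mean) * (x[t + lag] - mean) for t in range(n - lag))
    return num / denom


# Synthetic series with a 26-trip daily cycle over 10 days (not real data).
series = [math.sin(2 * math.pi * t / 26) for t in range(26 * 10)]
for lag in (13, 26, 52):
    print(lag, round(autocorrelation(series, lag), 2))
```

On this idealized series the value is strongly negative at half a day (13 trips) and positive at whole-day lags, decaying slowly because fewer overlapping terms remain at larger lags.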
Evaluation
How does the prediction error change when different imputation methods are applied to bus operation data?
→ Evaluate whether each imputation method is suitable for BAT prediction.
• We input test data with varying missing rates into trained prediction models.
  • We assumed the case where input data are missing at the time of prediction.
• We predicted the BATs of one, two, and three trips ahead, then compared the prediction errors of each imputation method.
  • We used Ishinaga's model [3] for BAT prediction.
• We used the Mean Absolute Error (MAE) as the error measure for predicted BATs:
  $\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| \hat{t}_i - t_i \right|$
  where $\hat{t}_i$ is the predicted BAT, $t_i$ is the actual BAT, and $n$ is the number of instances.
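The MAE formula translates directly into code. The BAT values below are hypothetical and only illustrate the calculation.

```python
from typing import Sequence


def mean_absolute_error(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """MAE = (1/n) * sum(|t_hat_i - t_i|), with BATs given in seconds."""
    if len(predicted) != len(actual) or len(predicted) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)


# Hypothetical predicted and actual BATs (seconds after departure)
print(mean_absolute_error([960.0, 2280.0, 5100.0], [900.0, 2400.0, 5100.0]))  # 60.0
```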
Methods used in experiments
Targets for comparison with the proposed methods:
• Baseline: Historical Average (HA)
  • HA uses the mean of previous trips as the predicted BAT (it does not use Ishinaga's model).
• Existing method: LOCF
  • Last observation carried forward (LOCF) replaces a missing value with the last value observed before the missing value occurred.
  • Existing studies [2,3] have used it.
• Temporal imputation
  • 𝑁mean = 5, based on an analysis of the training data.
• Pattern imputation
  • Pattern data were created from the training data.
• Combined imputation
  • 𝑁mean = 5, based on an analysis of the training data.
  • Pattern data were created from the training data.
Target bus route
Line No.21 (inbound) of Kobe Minato Kanko Bus
[Map: 1: Kobe international univ. → 2: West Court 7 bangai → 3: Rokko island Konan hospital → 4: Kobe Bay Sheraton hotel → 5: Kobe-Sannomiya sta. → 6: Shin-Kobe sta. (© OpenStreetMap, https://openstreetmap.org/copyright)]
• Estimated trip time: 35 min.
• Number of trips per day: 26 (6:00–22:00)
• Data collection period: from 2021-09-01 to 2022-09-25
• Total number of trips: 9863
Results – MAE of one-trip-ahead prediction
[Figure: MAE of one-trip-ahead prediction (sec, 140–170) vs. missing rate of test data (0–90%) for HA, LOCF, temporal imputation, pattern imputation, and combined imputation]
• LOCF had the lowest MAE when the missing rate of test data was below 30%.
• As the missing rate increased, the MAE of LOCF increased significantly.
• Temporal imputation had a higher MAE.
Results – MAE of two-trips-ahead prediction
[Figure: MAE of two-trips-ahead prediction (sec, 140–170) vs. missing rate of test data (0–90%) for HA, LOCF, temporal imputation, pattern imputation, and combined imputation]
• The MAE increased compared to one-trip-ahead prediction, mainly when LOCF was applied.
• Pattern imputation had a lower MAE.
Results – MAE of three-trips-ahead prediction
[Figure: MAE of three-trips-ahead prediction (sec, 140–170) vs. missing rate of test data (0–90%) for HA, LOCF, temporal imputation, pattern imputation, and combined imputation]
• The MAE increased further compared to two-trips-ahead prediction.
• Pattern imputation had a lower MAE.
Results – Focusing on missing rates of 30% or less
[Figure: MAE (sec) vs. missing rate of test data (0–30%) for one, two, and three trips ahead; legend: HA, LOCF, temporal imputation, pattern imputation, combined imputation]
• LOCF had a lower MAE when predicting one trip ahead.
• Imputation focused on daily periodicity is effective when predicting multiple trips.
Conclusion
• BAT prediction requires the imputation of bus operation data.
  • Even a missing rate of a few percent in a bus operation dataset makes it difficult to prepare inputs for the prediction model.
• We proposed three imputation methods focused on the characteristics of bus operation data:
  • temporal imputation, pattern imputation, and combined imputation.
• Pattern imputation, which focuses on daily periodicity, is particularly effective for BAT prediction of multiple trips ahead.
Future work
• Evaluating whether imputation focused on daily periodicity can also reduce BAT prediction errors on other bus routes.
• Improving our imputation methods, especially temporal imputation.
  • Temporal imputation results cannot reflect daily periodicity, so the prediction model cannot learn it.
  • We need another approach that incorporates the disruptions of previous trips.
  • E.g., the degree of delay of the last trip could be calculated relative to the pattern data and applied as a bias.
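One possible reading of the bias idea, sketched as a hypothetical: measure how far the most recent observed trip deviated from its pattern value, then add that offset to the pattern value of the missing trip. The function name and the use of a single previous trip are assumptions, not part of the proposed methods.

```python
from typing import Dict


def bias_corrected_pattern(pattern: Dict[int, float], last_trip_id: int,
                           last_observed: float, missing_trip_id: int) -> float:
    """Pattern value for the missing trip, shifted by the last trip's deviation.

    Hypothetical sketch: bias = last observed value - pattern value of that trip.
    """
    bias = last_observed - pattern[last_trip_id]
    return pattern[missing_trip_id] + bias


pattern = {5: 185.6, 6: 212.4, 7: 200.8}
# Trip 5 ran 209.5 s, 23.9 s slower than its pattern, so trip 6 is shifted up too.
print(round(bias_corrected_pattern(pattern, 5, 209.5, 6), 1))  # 236.3
```

This keeps the daily shape of the pattern data while carrying over the recent disruption that pure pattern imputation ignores.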
Thank you for listening!
• Takumi Niwa (Nara Institute of Science and Technology (NAIST), Japan)
Appendix
Existing BAT prediction method (Ishinaga et al.) [3]
[Figure: Architecture diagram. Bus operation data (running time, stopping time, timetable difference) pass through missing data imputation and value scaling and are bound with weather data (temperature, precipitation, sunny/cloudy/rainy day flags). Separate BiConvLSTM encoder–decoder models predict running times and stopping times, from which the predicted BAT is calculated.]
• Bus operation and weather data for the last 8 trips are input to the prediction model.
• Arrival times for the next 3 trips are predicted.
[3] A. Ishinaga, I. Arai, M. Kakiuchi, and K. Fujikawa, “Bus arrival time prediction method by convolution of operation and weather information,” Research Report: Intelligent Transportation Systems and Smart Communities, vol. 2021-ITS-84, no. 6, pp. 1–8, 2021.
Evaluation experiment
1. Divided the bus operation dataset into training, validation, and test data
2. Prepared several sets of test data with artificially increased missing values
3. Created prediction models for each imputation method
4. Predicted BATs by applying the same imputation method as the prediction model to each set of test data
5. Compared the MAE of predicted BATs
Evaluation experiment (1/5)
1. Divided the bus operation dataset into training, validation, and test data
Kind | Start date | End date | # of trips | # of error trips | Error rate (%)
Training data | Sep. 1, 2021 | Sep. 3, 2022 | 9291 | 720 | 7.75
Validation data | Sep. 4, 2022 | Sep. 10, 2022 | 182 | 11 | 6.04
Test data | Sep. 11, 2022 | Sep. 25, 2022 | 390 | 12 | 3.08
• We divided the dataset by period to prevent data leakage.
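Dividing by period means each trip is assigned to a split by its service date alone, so later days never leak into training. A minimal sketch using the periods in the table (inclusive start and end dates); the names are hypothetical.

```python
from datetime import date

# Periods from the table (inclusive start and end dates).
PERIODS = [
    ("training", date(2021, 9, 1), date(2022, 9, 3)),
    ("validation", date(2022, 9, 4), date(2022, 9, 10)),
    ("test", date(2022, 9, 11), date(2022, 9, 25)),
]


def assign_split(day: date) -> str:
    """Return the split a trip belongs to, based only on its service date."""
    for name, start, end in PERIODS:
        if start <= day <= end:
            return name
    raise ValueError(f"{day} is outside the data collection period")


print(assign_split(date(2022, 9, 3)), assign_split(date(2022, 9, 4)), assign_split(date(2022, 9, 25)))
```

A random trip-level split would instead mix trips from the same days into training and test data, which is the leakage the table's design avoids.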
Evaluation experiment (2/5)
2. Prepared several sets of test data with artificially increased missing values
[Figure: From the original test data (error rate 3.08%), ten test sets each were generated with error rates of 10%, 20%, ⋯, 90% (90 sets with artificial missing values; 91 test sets including the original).]
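The 91 test sets can be sketched as below. The slides do not say how artificial missing values were placed; this sketch assumes whole trips are marked missing uniformly at random, that originally missing trips stay missing, and that the 12 originally missing trips sit at the end of the list (a hypothetical layout). The function name is hypothetical.

```python
import random
from typing import List


def add_artificial_missing(missing: List[bool], target_rate: float, seed: int) -> List[bool]:
    """Mark randomly chosen observed trips as missing until `target_rate` is reached.

    Trips that are already missing stay missing; whole trips are removed.
    """
    out = list(missing)
    target = round(target_rate * len(out))
    observed = [i for i, m in enumerate(out) if not m]
    n_extra = target - (len(out) - len(observed))
    if n_extra > 0:
        for i in random.Random(seed).sample(observed, n_extra):
            out[i] = True
    return out


# Hypothetical layout: 12 of the 390 test trips are missing (3.08%).
original = [False] * 378 + [True] * 12
test_sets = [original]
for rate in range(10, 100, 10):          # 10%, 20%, ..., 90%
    for seed in range(10):               # ten random patterns per rate
        test_sets.append(add_artificial_missing(original, rate / 100, seed))
print(len(test_sets))  # 91
```

Fixing the seed per pattern makes every imputation method see exactly the same missing trips, so their MAEs are directly comparable.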
Evaluation experiment (3/5)
3. Created prediction models for each imputation method
[Figure: Each imputation method is applied to the training and validation data, and one of Ishinaga's prediction models [3] is created from each result.]
• One prediction model for each imputation method:
  • LOCF
  • Temporal imputation
  • Pattern imputation
  • Combined imputation
  = four prediction models
Evaluation experiment (4/5)
4. Predicted BATs by applying the same imputation method as the prediction model to each set of test data
[Figure: Each imputation method (LOCF ⋯ combined imputation) is applied to the 91 test sets, which are input to the corresponding model to predict 91 sets of BATs.]
Evaluation experiment (5/5)
5. Compared the MAE of predicted BATs
[Figure: The MAE is calculated for each of the 91 sets of predicted BATs and plotted as MAE of one-trip-ahead prediction (sec) vs. missing rate of test data (%) for HA, LOCF, temporal imputation, pattern imputation, and combined imputation.]
How to view experiment results
[Figure: MAE of one-trip-ahead prediction (sec) vs. missing rate of test data (%), with the result for the original test data (missing rate: 3.08%) marked]
• The MAE varies because BATs were predicted for ten missing patterns at each missing rate.
  • Plot: mean of MAE
  • Bars: standard deviation of MAE
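The plotted points and bars can be reproduced by summarizing the ten MAE values per missing rate. Whether the slides use the sample or population standard deviation is not stated; this sketch assumes the sample version, and the MAE values are hypothetical, not results from the study.

```python
import statistics
from typing import Dict, List, Tuple


def summarize_mae(mae_by_rate: Dict[int, List[float]]) -> Dict[int, Tuple[float, float]]:
    """Mean and standard deviation of the MAE values obtained at each missing rate.

    Uses the sample standard deviation; a single run (e.g. the original test
    data) gets a standard deviation of 0.
    """
    summary = {}
    for rate, maes in mae_by_rate.items():
        std = statistics.stdev(maes) if len(maes) > 1 else 0.0
        summary[rate] = (statistics.mean(maes), std)
    return summary


# Hypothetical MAE values (sec), not results from the study.
print(summarize_mae({3: [150.0], 10: [150.0, 152.0, 148.0]}))
```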
The need for accurate BAT prediction
[Figure: Rate (%) vs. allowable BAT prediction error (min), from the survey by Gooze et al. [5]]
• According to a survey by Gooze et al. [5], users are dissatisfied when the BAT prediction error is 4–5 minutes or more.
• In the study by Ishinaga et al. [3], the MAE is around 2 min, but errors of more than 10 minutes occur on rainy days.
[5] A. Gooze, K. E. Watkins, and A. Borning, “Benefits of real-time transit information and impacts of data accuracy on rider experience,” Transportation Research Record, vol. 2351, no. 1, pp. 95–103, 2013.
Related works of imputation in time series data prediction
• Shin et al. [4] reduced the prediction error of LSTM-based traffic congestion prediction by using an imputation method focused on the characteristics of traffic data.
  • Training and evaluation with varying missing rates
  • Compared with conventional simple imputation methods:
    • Historical imputation method (HIM)
    • Nearest neighbor imputation method (NIM)
  • Lower mean absolute percentage error (MAPE)
[4] D.-H. Shin, K. Chung, and R. C. Park, “Prediction of traffic congestion based on LSTM through correction of missing temporal and spatial data,” IEEE Access, vol. 8, pp. 150784–150796, 2020.
Bus route
[Figure: Example of a route with 𝑩 bus stops. Bus stops 1, 2, 3, ⋯, B are connected by links 1, 2, ⋯, B-1, and each bus stop has a timetable entry (e.g., 6:40, 6:42, 6:45, ⋯, 7:15).]
• Running time $r_b$: at link $b$
• Stopping time $s_b$: at bus stop $b$
• Timetable difference $d_b$: at bus stop $b$
  • Seconds behind schedule
  • Negative for early arrival
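The three quantities can be related in a small sketch. It assumes, without confirmation from the slides, that the arrival at stop b+1 equals the arrival at stop b plus $s_b$ plus $r_b$, and then computes $d_b$ as actual minus scheduled arrival, so a positive value means behind schedule. The timetable entries are the first three from the figure; the running and stopping times are hypothetical.

```python
from datetime import datetime, timedelta
from typing import List


def arrival_times(first_arrival: datetime, stopping: List[float],
                  running: List[float]) -> List[datetime]:
    """Arrival time at bus stops 1..B.

    Assumption: arrival at stop b+1 = arrival at stop b + s_b + r_b, where
    `stopping[b-1]` is s_b (sec) and `running[b-1]` is r_b (sec).
    """
    arrivals = [first_arrival]
    for s_b, r_b in zip(stopping, running):
        arrivals.append(arrivals[-1] + timedelta(seconds=s_b + r_b))
    return arrivals


def timetable_differences(arrivals: List[datetime], timetable: List[datetime]) -> List[float]:
    """d_b in seconds: positive when behind schedule, negative when early."""
    return [(a - t).total_seconds() for a, t in zip(arrivals, timetable)]


day = datetime(2022, 6, 1)
timetable = [day.replace(hour=6, minute=m) for m in (40, 42, 45)]
arrivals = arrival_times(timetable[0], stopping=[30.0, 20.0], running=[120.0, 150.0])
print(timetable_differences(arrivals, timetable))  # [0.0, 30.0, 20.0]
```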
Ishinaga's prediction model [3]
[Figure: Encoder–decoder architecture. Input: bus operation data for past trips. Encoder: stacked ConvLSTM layers with batch normalization (BN) and dropout. Decoder: stacked ConvLSTM layers with BN and dropout, followed by a TimeDistributed (Dense) layer. Output: predictions for the following trips.]
Daily periodicity of running time
[Figure: Mean + std of running time (sec) per estimated trip ID for link 4]
• Morning and evening buses have shorter running times.
  • Fewer passengers
  • Less traffic
• Trips 6–10 have longer running times.
  • Many passengers
Links and estimated travel time of the target route
Link No. | Departure | Arrival | Estimated travel time (min.)
1 | Kobe international univ. | West Court 7 bangai | 2
2 | West Court 7 bangai | Rokko island Konan hosp. | 1
3 | Rokko island Konan hosp. | Kobe Bay Sheraton hotel | 2
4 | Kobe Bay Sheraton hotel | Kobe-Sannomiya sta. | 20
5 | Kobe-Sannomiya sta. | Shin-Kobe sta. | 10
Total: 35
• Daily periodicity is small for links 1–3 because of their short running times.
Autocorrelation – Running time of link 1 [Figure: autocorrelation vs. lags]
Autocorrelation – Running time of link 2 [Figure: autocorrelation vs. lags]
Autocorrelation – Running time of link 3 [Figure: autocorrelation vs. lags]
Autocorrelation – Running time of link 4 [Figure: autocorrelation vs. lags]
Autocorrelation – Running time of link 5 [Figure: autocorrelation vs. lags]
Autocorrelation – Stopping time of bus stop 1 [Figure: autocorrelation vs. lags]
Autocorrelation – Stopping time of bus stop 2 [Figure: autocorrelation vs. lags]
Autocorrelation – Stopping time of bus stop 3 [Figure: autocorrelation vs. lags]
Autocorrelation – Stopping time of bus stop 4 [Figure: autocorrelation vs. lags]
Autocorrelation – Stopping time of bus stop 5 [Figure: autocorrelation vs. lags]
Autocorrelation – Stopping time of bus stop 6 [Figure: autocorrelation vs. lags]
Determination of 𝑵mean for temporal imputation and combined imputation
[Figure: Autocorrelation of running time of link 5]
• The prediction of the bus arrival time at the last bus stop is influenced mainly by links 4 and 5.
• The autocorrelations of the running times of links 4 and 5 showed correlations within 5 trips before and after each peak.
• We therefore used 𝑁mean = 5.
MAE of imputation (running time, error rate = 30%)
[Figure: MAE of imputation (sec) per link for temporal, pattern, and combined imputation]
MAE of imputation (stopping time, error rate = 30%)
[Figure: MAE of imputation (sec) per bus stop for temporal, pattern, and combined imputation]
Analysis of imputation results of temporal imputation
[Figure: Running time of link 4 (sec) per trip on Sep. 17th, 2021, showing artificial missing values and temporal imputation (𝑁mean = 5); the red dotted line is the estimated running time (1200 sec).]
• Trips 12 and 13 were imputed higher than the actual values.
• Trips 17 and 18 were imputed lower than the actual values.
• Temporal imputation may move in the opposite direction to the actual increase or decrease.
Analysis of imputation results of pattern imputation
[Figure: Running time of link 4 (sec) per trip on Sep. 17th, 2021, showing artificial missing values, pattern data, and pattern imputation; the red dotted line is the estimated running time (1200 sec).]
• Pattern imputation has a smaller imputation error than temporal imputation.
• Pattern imputation has a larger error when trips deviate from the pattern (trips 17 and 18).
Analysis of imputation results of combined imputation
[Figure: Running time of link 4 (sec) per trip on Sep. 15th, 2021, showing artificial missing values and the results of LOCF, temporal, pattern, and combined imputation]
• Except for trip 9, the imputation results are the same as for pattern imputation.
• Trip 9 has a larger error than with pattern imputation.
Prediction result – one trip ahead, stable operation
[Figure: Total trip time (sec) per trip; series: linear interpolation, LOCF, temporal imputation, pattern imputation, combined imputation, and HA]
Prediction result – two trips ahead, stable operation
[Figure: Total trip time (sec) per trip; same series]
Prediction result – three trips ahead, stable operation
[Figure: Total trip time (sec) per trip; same series]
Prediction result – one trip ahead, unstable operation
[Figure: Total trip time (sec) per trip; same series]
Prediction result – one trip ahead, unstable operation
[Figure: Total trip time (sec) per trip; same series]
Prediction result – one trip ahead, unstable operation
[Figure: Total trip time (sec) per trip; same series]
Which imputation is used in combined imputation?
[Figure: Utilization (%) of temporal and pattern imputation within combined imputation vs. error rate of test data]
• The higher the missing rate of test data, the more the imputation results are identical to those of pattern imputation.