Article

Seoul bike trip duration prediction using data mining techniques

Journal

IET INTELLIGENT TRANSPORT SYSTEMS
Volume 14, Issue 11, Pages 1465-1474

Publisher

WILEY
DOI: 10.1049/iet-its.2019.0796

Keywords

data mining; feature extraction; mean square error methods; regression analysis; traffic information systems; intelligent transportation systems; random forests; nearest neighbour methods; Seoul bike trip duration prediction; data mining techniques; trip distance; Seoul bike data; Seoul bike sharing system; intelligent transport systems; traveller information systems; trip-time prediction; rental bikes; feature engineering; statistical models; linear regression; gradient boosting machines; k nearest neighbour; Random Forest; root mean squared error; coefficient of variance; mean absolute error; median absolute error

Funding

  1. National Research Foundation of Korea [5199990214660] Funding Source: Korea Institute of Science & Technology Information (KISTI), National Science & Technology Information Service (NTIS)

Trip duration is the most fundamental measure in all modes of transportation. Hence, it is crucial to predict trip time precisely for the advancement of Intelligent Transport Systems and traveller information systems. This study employs data mining techniques to predict the trip duration of rental bikes in the Seoul Bike sharing system. The prediction combines Seoul Bike data with weather data. The data used include trip duration, trip distance, pickup and dropoff latitude and longitude, temperature, precipitation, wind speed, humidity, solar radiation, snowfall, ground temperature and 1-hour average dust concentration. Feature engineering is applied to extract additional features from the data. Four statistical models are used to predict the trip duration: (a) linear regression, (b) gradient boosting machines, (c) k nearest neighbour and (d) Random Forest (RF). Four performance metrics (root mean squared error, coefficient of variance, mean absolute error and median absolute error) are used to assess the models. The best model, RF, explains 93% of the variance (R²) in the testing set and 98% in the training set, outperforming the other models. These results show that RF is effective for predicting trip duration.
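
The four performance metrics named in the abstract, together with R², can be sketched in a few lines of Python. This is a minimal illustration and not the authors' code: the function name `regression_metrics` and the sample values are hypothetical, and the coefficient of variance is assumed here to be the RMSE divided by the mean observed value, expressed as a percentage, since the abstract does not give its formula.

```python
from math import sqrt
from statistics import mean, median


def regression_metrics(y_true, y_pred):
    """Return RMSE, CV (%), MAE, MedAE and R^2 for paired observations.

    Assumption: CV is computed as 100 * RMSE / mean(y_true).
    The mean of y_true must be non-zero and y_true must not be constant,
    otherwise CV or R^2 would divide by zero.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")
    errors = [t - p for t, p in zip(y_true, y_pred)]
    abs_errors = [abs(e) for e in errors]
    y_bar = mean(y_true)
    ss_res = sum(e * e for e in errors)             # residual sum of squares
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)  # total sum of squares
    rmse = sqrt(ss_res / len(errors))
    return {
        "rmse": rmse,
        "cv_percent": 100.0 * rmse / y_bar,
        "mae": mean(abs_errors),
        "medae": median(abs_errors),
        "r2": 1.0 - ss_res / ss_tot,
    }


# Made-up trip durations in minutes, for illustration only.
observed = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 39.0]
metrics = regression_metrics(observed, predicted)
```

Evaluating the same function on both the training and the testing predictions of each model would give the kind of side-by-side comparison the abstract reports for RF.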
