Article

Seoul bike trip duration prediction using data mining techniques

IET INTELLIGENT TRANSPORT SYSTEMS (2020)

Journal

IET INTELLIGENT TRANSPORT SYSTEMS

Volume 14, Issue 11, Pages 1465-1474

Publisher

WILEY

DOI: 10.1049/iet-its.2019.0796

Keywords

data mining; feature extraction; mean square error methods; regression analysis; traffic information systems; intelligent transportation systems; random forests; nearest neighbour methods; Seoul bike trip duration prediction; data mining techniques; trip distance; Seoul bike data; Seoul bike sharing system; intelligent transport systems; traveller information systems; trip-time prediction; rental bikes; feature engineering; statistical models; linear regression; gradient boosting machines; k nearest neighbour; Random Forest; root mean squared error; coefficient of variance; mean absolute error; median absolute error

Funding

National Research Foundation of Korea [5199990214660] Funding Source: Korea Institute of Science & Technology Information (KISTI), National Science & Technology Information Service (NTIS)

Abstract

Trip duration is the most fundamental measure in all modes of transportation. Hence, it is crucial to predict trip time precisely for the advancement of Intelligent Transport Systems and traveller information systems. In this study, data mining techniques are employed to predict the trip duration of rental bikes in the Seoul Bike sharing system. The prediction is carried out using a combination of Seoul Bike data and weather data. The data used include trip duration, trip distance, pickup and dropoff latitude and longitude, temperature, precipitation, wind speed, humidity, solar radiation, snowfall, ground temperature and 1-hour average dust concentration. Feature engineering is done to extract additional features from the data. Four statistical models are used to predict the trip duration: (a) linear regression, (b) gradient boosting machines, (c) k nearest neighbour and (d) Random Forest (RF). Four performance metrics (root mean squared error, coefficient of variance, mean absolute error and median absolute error) are used to determine the efficiency of the models. The best model, RF, explains 93% of the variance (R²) in the testing set and 98% in the training set. The outcome shows that RF is effective for predicting trip duration.
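For readers who want to see how the evaluation metrics named in the abstract relate to one another, the following is a minimal sketch using only the Python standard library. It is not the authors' code. The function name `regression_metrics`, the sample values, and in particular the definition of the coefficient of variance (taken here as RMSE expressed as a percentage of the mean observed value) are assumptions made for illustration; the abstract does not state the formula used.

```python
import math
from statistics import mean, median


def regression_metrics(y_true, y_pred):
    """Return RMSE, CV (%), MAE, MedAE and R^2 for paired observations.

    Hypothetical helper for illustration; not taken from the paper.
    Assumes the observed values are not all equal and have a non-zero mean.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")

    errors = [t - p for t, p in zip(y_true, y_pred)]
    abs_errors = [abs(e) for e in errors]
    y_mean = mean(y_true)

    ss_res = sum(e * e for e in errors)              # residual sum of squares
    ss_tot = sum((t - y_mean) ** 2 for t in y_true)  # total sum of squares
    rmse = math.sqrt(ss_res / len(errors))

    return {
        "rmse": rmse,
        # Assumed definition: RMSE as a percentage of the mean observed value.
        "cv_percent": 100.0 * rmse / y_mean,
        "mae": mean(abs_errors),
        "medae": median(abs_errors),
        "r2": 1.0 - ss_res / ss_tot,
    }


# Made-up trip durations in minutes, for illustration only.
observed = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 37.0]
metrics = regression_metrics(observed, predicted)
```

RMSE penalises large errors more heavily than MAE, while the median absolute error is robust to a few very long trips, which is one reason a study might report all of them together.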
