Article

Revealing representative day-types in transport networks using traffic data clustering

Publisher

TAYLOR & FRANCIS INC
DOI: 10.1080/15472450.2023.2205020

Keywords

Cluster validity; clustering; day clustering; dimensionality reduction; external indices; internal indices; network-wide; prediction

Recognition of spatio-temporal traffic patterns is crucial for intelligent transport systems (ITS). Common practice relies on general unsupervised machine-learning methods for clustering and selects the number of day-types using internal evaluation indices alone, which are limited because they use only the information in the clustered dataset. This paper compares internal validation with external validation based on short-term prediction, finding that internal evaluation tends to underestimate the number of representative day-types needed. The paper also explores dimensionality reduction before clustering, which achieves very similar performance at lower computational cost.
Recognition of spatio-temporal traffic patterns at the network-wide level plays an important role in data-driven intelligent transport systems (ITS) and is a basis for applications such as short-term prediction and scenario-based traffic management. Common practice in the transport literature is to rely on well-known general unsupervised machine-learning methods (e.g., k-means, hierarchical, spectral, DBSCAN) to select the most representative structure and number of day-types based solely on internal evaluation indices. These are easy to calculate but are limited since they only use information in the clustered dataset itself. In addition, the quality of clustering should ideally be demonstrated by external validation criteria, by expert assessment or the performance in its intended application. The main contribution of this paper is to test and compare the common practice of internal validation with external validation criteria represented by the application to short-term prediction, which also serves as a proxy for more general traffic management applications. When compared to external evaluation using short-term prediction, internal evaluation methods have a tendency to underestimate the number of representative day-types needed for the application. Additionally, the paper investigates the impact of using dimensionality reduction. By using just 0.1% of the original dataset dimensions, very similar clustering and prediction performance can be achieved, with up to 20 times lower computational costs, depending on the clustering method. K-means and agglomerative clustering may be the most scalable methods, using up to 60 times fewer computational resources for very similar prediction performance to the p-median clustering.
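The contrast between internal and external validation described above can be illustrated with a small, dependency-free sketch. It is not the paper's implementation: the synthetic day profiles, the farthest-point k-means initialisation, the silhouette index as the internal criterion, and the "observe the first few time steps, predict the rest from the nearest centroid" predictor are all assumptions chosen for illustration, and every function name below is hypothetical. On such cleanly separated toy data the two criteria may well agree, so the sketch shows only how the two selections are computed, not the paper's finding.

```python
import math
import random

def make_days(n_per_type=20, seed=0):
    """Synthetic daily flow profiles (12 time steps) for three hypothetical day-types."""
    rng = random.Random(seed)
    profiles = [
        [100, 300, 500, 300, 200, 200, 200, 300, 500, 300, 150, 100],  # weekday-like
        [80, 120, 200, 300, 400, 450, 400, 300, 250, 200, 150, 100],   # Saturday-like
        [60, 80, 100, 150, 200, 220, 200, 180, 150, 120, 100, 80],     # Sunday-like
    ]
    days = [[v + rng.gauss(0, 10) for v in p] for p in profiles for _ in range(n_per_type)]
    rng.shuffle(days)
    return days

def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def assign(days, centroids):
    """Index of the nearest centroid for every day."""
    return [min(range(len(centroids)), key=lambda j: dist(d, centroids[j])) for d in days]

def kmeans(days, k, iters=50):
    """Plain k-means with deterministic farthest-point initialisation (an assumption)."""
    centroids = [days[0]]
    while len(centroids) < k:
        centroids.append(max(days, key=lambda d: min(dist(d, c) for c in centroids)))
    for _ in range(iters):
        labels = assign(days, centroids)
        updated = []
        for j in range(k):
            members = [d for d, l in zip(days, labels) if l == j]
            # Keep the old centroid if a cluster happens to become empty.
            updated.append([sum(c) / len(members) for c in zip(*members)] if members else centroids[j])
        if updated == centroids:
            break
        centroids = updated
    return assign(days, centroids), centroids

def silhouette(days, labels):
    """Internal index: mean silhouette width, using only the clustered data."""
    clusters = {}
    for d, l in zip(days, labels):
        clusters.setdefault(l, []).append(d)
    scores = []
    for d, l in zip(days, labels):
        own = clusters[l]
        if len(own) == 1:
            scores.append(0.0)  # convention for singleton clusters
            continue
        a = sum(dist(d, o) for o in own) / (len(own) - 1)  # distance to self is 0
        b = min(sum(dist(d, o) for o in other) / len(other)
                for m, other in clusters.items() if m != l)
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

def prediction_mae(centroids, test_days, observed=4):
    """External criterion: match a day on its first `observed` steps, predict the rest."""
    errors = []
    for day in test_days:
        best = min(centroids, key=lambda c: dist(day[:observed], c[:observed]))
        errors += [abs(x - y) for x, y in zip(day[observed:], best[observed:])]
    return sum(errors) / len(errors)

days = make_days()
train, test = days[:45], days[45:]

results = {}  # k -> (silhouette on training days, prediction MAE on held-out days)
for k in range(2, 7):
    labels, centroids = kmeans(train, k)
    results[k] = (silhouette(train, labels), prediction_mae(centroids, test))

best_internal = max(results, key=lambda k: results[k][0])
best_external = min(results, key=lambda k: results[k][1])
print("k chosen by silhouette:", best_internal)
print("k chosen by prediction error:", best_external)
```

The point of the design is that the silhouette score sees only the training days, whereas the prediction error is measured on held-out days in the intended application, so the two criteria can select different numbers of day-types on real network data.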