Article

Heter-Train: A Distributed Training Framework Based on Semi-Asynchronous Parallel Mechanism for Heterogeneous Intelligent Transportation Systems

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/TITS.2023.3286400

Keywords

Transportation big data; intelligent transportation; distributed training; edge computing; heterogeneous systems

This paper introduces Heter-Train, a distributed training framework that addresses parallel training on heterogeneous cloud-edge-vehicle clusters in intelligent transportation systems. The framework comprises a communication-efficient semi-asynchronous parallel mechanism and a solution for heterogeneous communication. Experimental results demonstrate significant speedups in training time without sacrificing accuracy.

Transportation big data (TBD) are increasingly combined with artificial intelligence to mine novel patterns and information, owing to the powerful representational capabilities of deep neural networks (DNNs), especially for anti-COVID-19 applications. The distributed cloud-edge-vehicle training architecture has been applied to accelerate DNN training while ensuring low latency and high privacy for TBD processing. However, the multiple intelligent devices (e.g., intelligent vehicles and edge computing chips at base stations) and different networks in intelligent transportation systems lead to computing-power and communication heterogeneity among distributed nodes. Existing parallel training mechanisms perform poorly on heterogeneous cloud-edge-vehicle clusters. The synchronous parallel mechanism may force fast workers to wait for the slowest worker at each synchronization barrier, wasting their computing power. The asynchronous mechanism suffers from communication bottlenecks and can exacerbate the straggler problem, causing increased training iterations and even incorrect convergence. In this paper, we introduce a distributed training framework, Heter-Train. First, a communication-efficient semi-asynchronous parallel mechanism (SAP-SGD) is proposed, which takes full advantage of the acceleration effect of the asynchronous strategy on heterogeneous training while constraining the straggler problem through global interval synchronization. Second, considering differences in node bandwidth, we design a solution for heterogeneous communication. Moreover, a novel weighted aggregation strategy is proposed to aggregate model parameters of different versions. Finally, experimental results show that the proposed strategy achieves speedups of up to $6.74\times$ in training time with almost no decrease in accuracy.
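The semi-asynchronous mechanism and version-weighted aggregation described above can be illustrated with a toy simulation. The Python sketch below is a minimal single-process model of the idea, not the paper's implementation: the constants SYNC_INTERVAL and STALENESS_DECAY, the exponential staleness-weighting rule, and the random stand-in gradients are all illustrative assumptions. Fast workers take local steps without waiting for stragglers, a global synchronization fires at a fixed interval, and worker models based on older parameter versions receive smaller aggregation weights.

```python
# Hypothetical sketch of a semi-asynchronous scheme with global interval
# synchronization and staleness-weighted aggregation. All constants and the
# weighting rule are assumptions for illustration, not the paper's SAP-SGD.
import numpy as np

SYNC_INTERVAL = 4      # assumed global synchronization interval (local steps)
STALENESS_DECAY = 0.5  # assumed weight decay per version of staleness

def staleness_weights(global_version, worker_versions):
    """Give smaller aggregation weight to models built on older global versions."""
    w = np.array([STALENESS_DECAY ** (global_version - v) for v in worker_versions])
    return w / w.sum()

def aggregate(worker_params, worker_versions, global_version):
    """Weighted aggregation of worker models holding different parameter versions."""
    weights = staleness_weights(global_version, worker_versions)
    return (weights[:, None] * np.stack(worker_params)).sum(axis=0)

def train(num_workers=4, dim=8, total_steps=24, lr=0.1, seed=0):
    rng = np.random.default_rng(seed)
    # Heterogeneous computing power: a worker needs `speeds[w]` steps per update.
    speeds = rng.integers(1, 4, size=num_workers)
    global_params = np.zeros(dim)
    global_version = 0
    params = [global_params.copy() for _ in range(num_workers)]
    versions = [0] * num_workers
    for step in range(1, total_steps + 1):
        # Asynchronous phase: fast workers do not wait for stragglers.
        for w in range(num_workers):
            if step % speeds[w] == 0:
                grad = rng.normal(size=dim)  # stand-in for a mini-batch gradient
                params[w] -= lr * grad
        # Global interval synchronization constrains the straggler problem.
        if step % SYNC_INTERVAL == 0:
            global_params = aggregate(params, versions, global_version)
            global_version += 1
            for w in range(num_workers):
                # Stragglers still mid-computation keep their stale copy and
                # are down-weighted at the next aggregation.
                if step % speeds[w] == 0:
                    params[w] = global_params.copy()
                    versions[w] = global_version
    return global_params

if __name__ == "__main__":
    print(train())
```

In this toy setup the synchronization interval trades off the two failure modes the abstract names: a shorter interval behaves like synchronous training (fast workers idle), while a longer one behaves like fully asynchronous training (staleness grows and convergence degrades).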
