4.7 Article

High-performance flow classification using hybrid clusters in software defined mobile edge computing

Journal

COMPUTER COMMUNICATIONS
Volume 160, Pages 643-660

Publisher

ELSEVIER
DOI: 10.1016/j.comcom.2020.07.002

Keywords

Flow classification; Graphics Processing Unit (GPU); GPU cluster; Mobile edge computing; Software Defined Networking; Software Defined Mobile Edge Computing (SDMEC)


Mobile Edge Computing (MEC) provides storage and computing capabilities within the access range of mobile devices. This reduces the burden of offloading the compute- and storage-intensive processes of mobile devices to centralized cloud data centers. As a result, network latency is reduced and the quality of service for mobile end users is improved. Many applications benefit from large-scale deployments of MEC servers. However, managing such large-scale deployments, with a vast number of applications serving millions of mobile devices, is highly complex. Recently, Software Defined Networking (SDN) has been leveraged to address this problem by providing unified, programmable interfaces for managing network devices. Most current SDN packet processing services depend heavily on the packet classification service. This primary service classifies each incoming packet by matching a set of specific header fields against a flow table. Accelerating this basic process considerably increases the performance of SDN-based MEC. In this paper, the hierarchical trie algorithm, a packet classification method, is parallelized using popular platforms on a cluster of Graphics Processing Units (GPUs), a cluster of Central Processing Units (CPUs), and a hybrid cluster. On the CPU cluster, the best-performing parallel implementation combines OpenMP and MPI, reaching a classifier throughput of 4.2 million packets per second (MPPS). On the GPU cluster, two scenarios are used. In the first, the rules and the hierarchical trie of the classifier are stored in global memory; in the second, the filter set is split so that the hierarchical trie of each subset fits in the GPU's shared memory.
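To make the data structure concrete, here is a minimal sequential Python sketch of a two-field hierarchical trie. It assumes rules match only a source and a destination prefix (written as bit strings, with "" as a wildcard) and that a lower `priority` value wins; real SDN flow tables match more header fields, and this is not the paper's parallel GPU/MPI code. The `partition_rules` function and its `max_nodes` budget are hypothetical: the budget stands in for GPU shared-memory capacity, and the greedy split only illustrates the idea of breaking the filter set into subsets whose tries fit a size limit.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

@dataclass
class Rule:
    priority: int  # lower value = higher priority
    src: str       # source prefix as a bit string; "" is a wildcard
    dst: str       # destination prefix as a bit string; "" is a wildcard
    action: str

@dataclass
class Node:
    children: Dict[str, "Node"] = field(default_factory=dict)
    next_trie: Optional["Node"] = None  # source-trie nodes: link to a destination trie
    rules: List[Rule] = field(default_factory=list)  # destination-trie nodes: stored rules

def _insert_path(root: Node, prefix: str) -> Node:
    # Walk (creating nodes as needed) the path spelled by `prefix`.
    node = root
    for bit in prefix:
        node = node.children.setdefault(bit, Node())
    return node

def _count(node: Node) -> int:
    # Count nodes in this trie, including every linked destination trie.
    total = 1
    if node.next_trie is not None:
        total += _count(node.next_trie)
    return total + sum(_count(c) for c in node.children.values())

def _search_dst(root: Node, dst: str, best: Optional[Rule]) -> Optional[Rule]:
    # Walk a destination trie along `dst`, keeping the highest-priority rule seen.
    node: Optional[Node] = root
    for depth in range(len(dst) + 1):
        if node is None:
            break
        for rule in node.rules:
            if best is None or rule.priority < best.priority:
                best = rule
        if depth < len(dst):
            node = node.children.get(dst[depth])
    return best

class HierarchicalTrie:
    def __init__(self, rules: List[Rule]) -> None:
        self.root = Node()
        for rule in rules:
            # Source prefix selects a node; the rule lives in that node's destination trie.
            src_node = _insert_path(self.root, rule.src)
            if src_node.next_trie is None:
                src_node.next_trie = Node()
            _insert_path(src_node.next_trie, rule.dst).rules.append(rule)

    def node_count(self) -> int:
        return _count(self.root)

    def classify(self, src: str, dst: str) -> Optional[Rule]:
        # Every source-trie node on the path of `src` is a matching source prefix,
        # so each linked destination trie must be searched.
        best: Optional[Rule] = None
        node: Optional[Node] = self.root
        for depth in range(len(src) + 1):
            if node is None:
                break
            if node.next_trie is not None:
                best = _search_dst(node.next_trie, dst, best)
            if depth < len(src):
                node = node.children.get(src[depth])
        return best

def partition_rules(rules: List[Rule], max_nodes: int) -> List[List[Rule]]:
    # Hypothetical greedy split: start a new subset when the trie would exceed
    # `max_nodes`; a single oversized rule still gets its own subset.
    subsets: List[List[Rule]] = []
    current: List[Rule] = []
    for rule in rules:
        candidate = current + [rule]
        if current and HierarchicalTrie(candidate).node_count() > max_nodes:
            subsets.append(current)
            current = [rule]
        else:
            current = candidate
    if current:
        subsets.append(current)
    return subsets

def classify_partitioned(tries: List[HierarchicalTrie], src: str, dst: str) -> Optional[Rule]:
    # Query every subset trie and keep the overall highest-priority match.
    best: Optional[Rule] = None
    for trie in tries:
        match = trie.classify(src, dst)
        if match is not None and (best is None or match.priority < best.priority):
            best = match
    return best

rules = [Rule(0, "10", "01", "drop"), Rule(1, "1", "", "fwd1"),
         Rule(2, "", "0", "fwd2"), Rule(3, "", "", "default")]
print(HierarchicalTrie(rules).classify("1001", "0110").action)  # drop
```

With a budget of 5 nodes, `partition_rules` splits this toy set into `[R0]` and `[R1, R2, R3]`, and `classify_partitioned` over the two smaller tries returns the same rule as the single full trie. In the GPU setting, that equivalence is what lets each subset's trie be placed in fast shared memory without changing the classification result.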
According to the results, although the first GPU cluster scenario achieves a throughput of 29.19 MPPS and a speedup of 58 over the serial mode, the second scenario is 12 times faster owing to its use of shared memory. The best performance, however, belongs to the hybrid cluster, which achieves a throughput of 30.59 MPPS, 1.4 MPPS more than the GPU cluster.
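As a quick consistency check on these figures, assume the reported speedup $S$ is the ratio of the first GPU-cluster scenario's throughput $T_{\text{GPU}}$ to the serial throughput $T_{\text{serial}}$ (the abstract does not define it explicitly), and let $T_{\text{hybrid}}$ denote the hybrid-cluster throughput. The stated numbers then imply roughly:

```latex
T_{\text{serial}} \approx \frac{T_{\text{GPU}}}{S} = \frac{29.19}{58} \approx 0.50\ \text{MPPS}

\Delta T = T_{\text{hybrid}} - T_{\text{GPU}} = 30.59 - 29.19 = 1.40\ \text{MPPS},
\qquad \frac{T_{\text{hybrid}}}{T_{\text{GPU}}} = \frac{30.59}{29.19} \approx 1.048
```

Under this reading, the hybrid cluster's advantage over the GPU cluster is about 4.8% in relative terms.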
