Article

Extended E-N-DIST Algorithm for Alias Detection

IEEE ACCESS (2021)

Journal

IEEE ACCESS

Volume 9, Pages 7952-7959

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC

DOI: 10.1109/ACCESS.2020.3048755

Keywords

Alias detection; edit distance (ED); Levenshtein distance (LD); E-N-DIST; dynamic programming

Funding

Deanship of Scientific Research, Qassim University

Automated Summary

Nowadays, people refer to celebrities and experts not only by their real names but also by their aliases on the web. This research proposes a reliable algorithm to detect aliases resulting from the transliteration of Arabic names into English, with improvements in calculating substitution and transposition costs. Testing shows that this algorithm outperforms others in achieving a better average percentage of similarity.

Abstract

Nowadays, personal names are not the only way to refer to celebrities and experts from different fields; they can also be referred to by their aliases on the web. Associated aliases are of considerable importance in retrieving information about a personal name from websites. Disclosing aliases can therefore play an important role in overcoming many real-world challenges. The aim of this research is to explore and propose a reliable algorithm that can detect aliases arising from the transliteration of Arabic names into English. This paper introduces an extension to the previously published Enhanced N-gram distance algorithm (E-N-DIST). The proposed algorithm is called the Extended Enhanced N-gram distance algorithm (E-E-N-DIST). E-E-N-DIST differs from E-N-DIST in two main ways, both in how the costs of substitution and transposition are calculated. First, E-E-N-DIST is computed based on 2(n+1) - 1 states. Second, it uses an edit operation called the 'Exchange of Vowels' to account for the common spelling errors that arise in transliteration from one language to another. The idea of the exchange of vowels is to search for the vowels (viz. 'a', 'e', 'i', 'o', and 'u') and the non-vowel character 'y', which has a vowel sound or part of one in other languages, in order to estimate the operation costs of insertion and deletion. The proposed algorithm was tested using a dataset from the literature, and the results were compared with those of other state-of-the-art algorithms. The proposed algorithm outperforms the others: it achieved a better average percentage of similarity than all other compared algorithms.
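
To make the idea of a vowel-sensitive edit cost concrete, the following is a minimal sketch and not the E-E-N-DIST algorithm itself. It assumes a plain character-level Levenshtein distance computed by dynamic programming, in which replacing one vowel-like letter ('a', 'e', 'i', 'o', 'u', or 'y') with another costs less than an ordinary substitution. The function names, the reduced cost of 0.5, and the similarity normalization by the longer string's length are all illustrative assumptions; the abstract does not specify them, and the n-gram states and transposition handling of the published algorithm are not modeled here.

```python
# Illustrative sketch only: a Levenshtein variant with a cheaper
# vowel-for-vowel substitution. This is NOT the published E-E-N-DIST.

VOWEL_LIKE = set("aeiouy")  # vowels plus 'y', as listed in the abstract


def vowel_aware_distance(s, t, vowel_cost=0.5):
    """Edit distance where swapping one vowel-like letter for another
    costs `vowel_cost` (an assumed value) instead of 1."""
    s, t = s.lower(), t.lower()
    # prev[j] holds the distance between s[:i-1] and t[:j]
    prev = [float(j) for j in range(len(t) + 1)]
    for i in range(1, len(s) + 1):
        curr = [float(i)] + [0.0] * len(t)
        for j in range(1, len(t) + 1):
            a, b = s[i - 1], t[j - 1]
            if a == b:
                sub = 0.0
            elif a in VOWEL_LIKE and b in VOWEL_LIKE:
                sub = vowel_cost  # hypothetical "exchange of vowels" cost
            else:
                sub = 1.0
            curr[j] = min(
                prev[j] + 1.0,      # deletion from s
                curr[j - 1] + 1.0,  # insertion into s
                prev[j - 1] + sub,  # match or substitution
            )
        prev = curr
    return prev[-1]


def similarity(s, t, vowel_cost=0.5):
    """Similarity in [0, 1]; normalizing by the longer length is an assumption."""
    longest = max(len(s), len(t))
    if longest == 0:
        return 1.0
    return 1.0 - vowel_aware_distance(s, t, vowel_cost) / longest
```

Under these assumptions, `vowel_aware_distance("Mohamed", "Mohamad")` is 0.5 rather than the 1 that plain Levenshtein distance would give, so two transliterations that differ only in a vowel score as more similar than two names that differ in a consonant.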
