Article

SPWalk: Similar Property Oriented Feature Learning for Phishing Detection

Journal

IEEE ACCESS
Volume 8, Pages 87031-87045

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/ACCESS.2020.2992381

Keywords

Feature learning; network embedding; phishing detection; similar property

Funding

  1. National Science Foundation of China [61972297, U1636107]

Detecting phishing webpages is an essential task that protects legitimate websites and their users from various malicious activities. Classifying a suspect webpage as phishing or legitimate requires robust and effective features. However, recent phishing attacks usually make phishing webpages resemble legitimate webpages in both visual and functional aspects, which makes feature extraction more difficult. We herein propose SPWalk, an unsupervised feature learning algorithm for phishing detection. In SPWalk, similar property nodes refer to a collection of phishing webpages or of legitimate webpages. We first construct a weblink network with nodes representing webpages. The edges between nodes represent the reference relationships that connect webpages through hyperlinks or similar textual content. Then, SPWalk applies a network embedding technique to map nodes into a low-dimensional vector space. A biased random walk procedure efficiently integrates both the structural information between nodes and the URL information of each node. The effectiveness and robustness of SPWalk come from three points: (1) phishing attackers do not have full control over reference relationships; (2) the structural regularities generated by diverse reference relationships can be exploited to discriminate between phishing and legitimate webpages; (3) node URL information makes the learned node representations better suited for phishing detection. Using the learned node representations as numeric features, we conduct experiments to classify webpages as legitimate or phishing. We demonstrate the superiority of SPWalk over state-of-the-art techniques on phishing detection, especially in terms of precision (over 95%). Even when phishing webpages are well camouflaged by attackers to evade detection, SPWalk consistently exhibits better classification efficacy.
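The abstract does not specify how the random walk is biased, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes a toy weblink network stored as an adjacency dictionary keyed by URL, and a hypothetical `url_similarity` function (character overlap via Python's `difflib`) that tilts each step toward neighbors whose URL resembles the current node's URL. The graph, the function names, and the `bias` parameter are all illustrative assumptions.

```python
import random
from difflib import SequenceMatcher


def url_similarity(a, b):
    """Hypothetical URL similarity in [0, 1], based on character overlap."""
    return SequenceMatcher(None, a, b).ratio()


def biased_walk(graph, start, length, rng, bias=1.0):
    """Return a walk of up to `length` nodes starting at `start`.

    Each step picks a neighbor with probability proportional to
    1 + bias * url_similarity(current, neighbor), so structure (who is
    linked) and URL information (how alike the URLs are) both matter.
    """
    walk = [start]
    while len(walk) < length:
        current = walk[-1]
        neighbors = sorted(graph.get(current, ()))
        if not neighbors:
            break  # dead end: stop the walk early
        weights = [1.0 + bias * url_similarity(current, n) for n in neighbors]
        walk.append(rng.choices(neighbors, weights=weights, k=1)[0])
    return walk


# Toy undirected weblink network (assumed data): nodes are URLs,
# edges stand for reference relationships such as hyperlinks.
graph = {
    "http://bank.example/login": [
        "http://bank.example/help",
        "http://bank-secure.example.net/login",
    ],
    "http://bank.example/help": ["http://bank.example/login"],
    "http://bank-secure.example.net/login": [
        "http://bank.example/login",
        "http://bank-secure.example.net/verify",
    ],
    "http://bank-secure.example.net/verify": [
        "http://bank-secure.example.net/login",
    ],
}

rng = random.Random(0)
walks = [biased_walk(graph, node, 5, rng) for node in graph]
```

In DeepWalk-style embedding methods, walks like these are typically treated as sentences and passed to a skip-gram model to obtain the low-dimensional node vectors; whether SPWalk follows exactly that recipe is not stated in the abstract.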
