4.6 Article

Web2Vec: Phishing Webpage Detection Method Based on Multidimensional Features Driven by Deep Learning

Journal

IEEE ACCESS
Volume 8, Issue -, Pages 221214-221224

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/ACCESS.2020.3043188

Keywords

Feature extraction; Phishing; Uniform resource locators; Deep learning; Semantics; Licenses; Learning systems; Attention mechanism; deep learning; phishing; representation learning

Funding

  1. Shaanxi Provincial Natural Science Foundation [2020JM-533, 2018JQ5095]
  2. Chinese Postdoctoral Science Foundation [2020M673446]

Ask authors/readers for more resources

Phishing is a kind of online attack that attempts to defraud sensitive information of network users. Current phishing webpage detection methods mainly use manual feature collection, and there are problems that feature extraction is complicated and the possible correlation between features cannot be avoided. To solve the problems, a new phishing webpage detection model is proposed, among which the main components are automatic learning representations from multi-aspects features through representation learning and extracting features by hybrid deep learning network. Firstly, the model treats URL, HTML page content, and DOM (Document Object Model) structure of webpages as character sequences respectively, and uses representation learning technology to automatically learn the representation of the webpages; then, sends multiple representations to a hybrid deep learning network composed of a convolutional neural network and a bidirectional long and short-term memory network through different channels to extract local and global features, and use the attention mechanism to strengthen the influence of important features; finally, the output of multiple channels is fused to realize classification prediction. Through four sets of experiments to verify the detection effect of the model, the results show that the overall classification effect of the model is better than the existing classic phishing webpage detection methods, the accuracy reaches 99.05%, and the false positive rate is only 0.25%. It is proved that the strategies of extracting webpage features from all aspects through representation learning and hybrid deep learning network can effectively improve the detection effect of phishing webpages.

Authors

I am an author on this paper
Click your name to claim this paper and add it to your profile.

Reviews

Primary Rating

4.6
Not enough ratings

Secondary Ratings

Novelty
-
Significance
-
Scientific rigor
-
Rate this paper

Recommended

No Data Available
No Data Available