Article

PhiKitA: Phishing Kit Attacks Dataset for Phishing Websites Identification

IEEE ACCESS (2023)

Journal

IEEE ACCESS

Volume 11, Pages 40779-40789

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC

DOI: 10.1109/ACCESS.2023.3268027

Keywords

Phishing; Internet; Classification algorithms; Feature extraction; Computer security; Computer crime; Uniform resource locators; Social engineering (security); Cyber threat intelligence; Cybersecurity; cybercrime; cyber threats; phishing; social engineering; phishing kits

Abstract

Recent studies have shown that phishers are using phishing kits to deploy phishing attacks faster, more easily and on a larger scale. Detecting phishing kits in deployed websites might help to detect phishing campaigns earlier. To the best of our knowledge, no existing dataset provides phishing kits together with the phishing websites deployed from them. In this work, we propose PhiKitA, a novel dataset that contains phishing kits as well as phishing websites generated using these kits. We have applied MD5 hashes, fingerprints, and DOM graph-representation algorithms to obtain baseline results on PhiKitA in three experiments: familiarity analysis of phishing kit samples, phishing website detection, and identification of the source of a phishing website. In the familiarity analysis, we find evidence of different types of phishing kits and of a small phishing campaign. In the binary classification problem for phishing detection, the graph representation algorithm achieved an accuracy of 92.50%, showing that the phishing kit data contain useful information for classifying phishing. Finally, in the source identification experiment, the MD5 hash representation obtained an F1 score of 39.54%, which means that this algorithm does not extract enough information to properly link phishing websites to their phishing kit sources.
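The abstract does not state how the MD5 representation is compared across samples. The following is a minimal sketch under two assumptions that are not taken from the paper: each website or kit is represented as the set of MD5 digests of its files, and the predicted source is the kit whose digest set has the highest Jaccard overlap with the website's. The helper names `md5_set`, `jaccard` and `identify_source`, and the toy file contents, are hypothetical.

```python
import hashlib
from typing import Dict, Set


def md5_set(files: Dict[str, bytes]) -> Set[str]:
    """Hypothetical representation: the set of MD5 hex digests of all file contents."""
    return {hashlib.md5(content).hexdigest() for content in files.values()}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity |a & b| / |a | b|; defined as 0.0 when both sets are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def identify_source(site: Dict[str, bytes], kits: Dict[str, Dict[str, bytes]]) -> str:
    """Return the name of the kit whose digest set overlaps most with the site's (assumed rule)."""
    site_hashes = md5_set(site)
    return max(kits, key=lambda name: jaccard(site_hashes, md5_set(kits[name])))


# Toy, made-up data: two kits and one deployed site derived from the first kit.
kits = {
    "kit_bank": {"index.html": b"<form action='post.php'>", "style.css": b"body{color:red}"},
    "kit_mail": {"login.html": b"<form action='send.php'>", "logo.png": b"\x89PNG"},
}
# The phisher reused index.html unchanged but edited one value in style.css.
site = {"index.html": b"<form action='post.php'>", "style.css": b"body{color:blue}"}

print(identify_source(site, kits))  # kit_bank
print(jaccard(md5_set(site), md5_set(kits["kit_bank"])))  # only 1 of 3 distinct digests is shared
```

Because MD5 digests of even slightly different inputs are unrelated, the single edited CSS value already removes that file from the match, leaving only one shared digest out of three. This brittleness to small edits is one plausible reason an exact-hash representation struggles to link websites to their kit sources, whereas a structural representation such as a DOM graph can tolerate minor changes.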