Article

Detection of DGA-Generated Domain Names with TF-IDF

Journal

ELECTRONICS
Volume 11, Issue 3

Publisher

MDPI
DOI: 10.3390/electronics11030414

Keywords

DGA; botnet; TF-IDF; machine learning; deep learning

This paper addresses the detection of domain names generated by domain name generation algorithms (DGAs) using machine learning and deep learning. The authors propose the use of TF-IDF to measure the frequencies of relevant n-grams in domain names and utilize them as features in learning algorithms. Experimental results show that a deep MLP model achieves the best performance, with an AUC of 0.995 and an average F1-score of 0.891.
Botnets often apply domain name generation algorithms (DGAs) to evade detection by generating large numbers of pseudo-random domain names, only a few of which are registered by cybercriminals. In this paper, we address how DGA-generated domain names can be detected by means of machine learning and deep learning. We first present an extensive literature review of recent work in which machine learning and deep learning have been applied to detecting DGA-generated domain names. We observe that a common methodology is still missing, and that the use of different datasets makes experimental results hard to compare. We next propose the use of TF-IDF to measure the frequencies of the most relevant n-grams in domain names, and use these as features in learning algorithms. We perform experiments with various machine-learning and deep-learning models using TF-IDF features, of which a deep MLP model yields the best results. For comparison, we also apply an LSTM model with an embedding layer that converts domain names from a sequence of characters into a vector representation. The performance of our LSTM and MLP models is rather similar: they achieve AUCs of 0.994 and 0.995 and average F1-scores of 0.907 and 0.891, respectively.
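To make the TF-IDF feature idea concrete, the following is a minimal sketch of how character n-grams of a domain name could be turned into a TF-IDF vector. It is an illustration only, not the authors' implementation: the function names (`char_ngrams`, `fit_idf`, `tfidf_vector`), the n-gram range of 1 to 3, the smoothed IDF formula, and the toy corpus are all assumptions, and the selection of the "most relevant" n-grams described in the abstract is not shown.

```python
import math
from collections import Counter

def char_ngrams(domain, n_min=1, n_max=3):
    """Return all character n-grams of a domain label for n in [n_min, n_max]."""
    s = domain.lower()
    return [s[i:i + n] for n in range(n_min, n_max + 1) for i in range(len(s) - n + 1)]

def fit_idf(domains, n_min=1, n_max=3):
    """Compute a smoothed inverse document frequency per n-gram (assumed variant)."""
    df = Counter()
    for d in domains:
        # Count each n-gram at most once per domain (document frequency).
        df.update(set(char_ngrams(d, n_min, n_max)))
    total = len(domains)
    return {g: math.log((1 + total) / (1 + c)) + 1.0 for g, c in df.items()}

def tfidf_vector(domain, idf, n_min=1, n_max=3):
    """Map a domain to an L2-normalised sparse TF-IDF vector (n-gram -> weight)."""
    # N-grams never seen when fitting the IDF table are ignored.
    tf = Counter(g for g in char_ngrams(domain, n_min, n_max) if g in idf)
    vec = {g: c * idf[g] for g, c in tf.items()}
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {g: w / norm for g, w in vec.items()} if norm else vec

# Toy corpus, made up for illustration: three benign-looking and two random-looking labels.
corpus = ["google", "facebook", "wikipedia", "xjkqzvbw", "qzxvkjwp"]
idf = fit_idf(corpus)
vec = tfidf_vector("xjkqzvbw", idf)
```

Vectors of this kind, once mapped to a fixed n-gram vocabulary, could serve as input features for a classifier such as an MLP. N-grams that are rare across the corpus receive a higher IDF weight than common ones, which is what lets unusual character combinations stand out.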
