☆ 4.7 Article

Algorithmically generated malicious domain names detection based on n-grams features

EXPERT SYSTEMS WITH APPLICATIONS (2021)

Journal

EXPERT SYSTEMS WITH APPLICATIONS

Volume 170, Issue -, Pages -

Publisher

PERGAMON-ELSEVIER SCIENCE LTD

DOI: 10.1016/j.eswa.2020.114551

Keywords

Domain generation algorithm; Botnet; Machine learning; DNS query; Kullback-Leibner divergence; Jaccard Index

Ask authors/readers for more resources

Protocol

Community support

Reagent

Community support

Automated Summary New
Abstract

This paper presents a methodology for detecting DGA generated domain names using supervised machine learning, which achieves good accuracy and effectiveness in classifying previously unseen domains. The approach is based on lexical features and outperforms some state-of-the-art featureless classification methods based on deep learning.

Botnets are one of the major cyber infections used in several criminal activities. In most botnets, a Domain Generation Algorithm (DGA) is used by bots to make DNS queries aimed at establishing the connection with the Command and Control (C&C) server. The identification of such queries by monitoring the network DNS traffic is then crucial for bot detection. In this paper we present a methodology to detect DGA generated domain names based on a supervised machine learning process, trained with a dataset of known benign and malicious domain names. The proposed approach represents the domain names through a set of features which express the similarity between the 2-grams and 3-grams in a single unclassified domain name and those in domain names known as malicious or benign. We used the Kullback-Leibner divergence and the Jaccard Index to estimate the similarity, and we tested different machine learning algorithms to classify each domain name as benign or DGA-based (with both binary and multi-class approach). The results of our experiments demonstrate that the proposed methodology, which only exploits lexical features of domain names, attains a good level of accuracy and results in a general model able to classify previously unseen domains in an effective way. It is also able to outperform some of the state-of-the-art featur eless classification methods based on deep learning.

Algorithmically generated malicious domain names detection based on n-grams features

Journal

EXPERT SYSTEMS WITH APPLICATIONS

Publisher

PERGAMON-ELSEVIER SCIENCE LTD

Keywords

Categories

Ask authors/readers for more resources

Protocol

Reagent

Authors

I am an author on this paper

Reviews

Primary Rating

Secondary Ratings

Novelty

Significance

Scientific rigor

Rate this paper

Recommended

Algorithmically generated malicious domain names detection based on n-grams features

Journal

EXPERT SYSTEMS WITH APPLICATIONS

Publisher

PERGAMON-ELSEVIER SCIENCE LTD

Keywords

Categories

Ask authors/readers for more resources

Protocol

Reagent

Authors

I am an author on this paper

Reviews

Primary Rating

Secondary Ratings

Novelty

Significance

Scientific rigor

Rate this paper

Recommended

Export Citation

Share Paper