Article

Vulcan: Automatic extraction and analysis of cyber threat intelligence from unstructured text

COMPUTERS & SECURITY (2022)

Journal

COMPUTERS & SECURITY

Volume 120

Publisher

ELSEVIER ADVANCED TECHNOLOGY

DOI: 10.1016/j.cose.2022.102763

Keywords

Cyber threat intelligence; CTI; Cybersecurity; Information extraction; Language model

Funding

Engineering Research Center Program through the National Research Foundation of Korea (NRF) - Korean Government MSIT [NRF-2018R1A5A1059921]

Automated Summary

To address evolving cyber threats, researchers have developed CTI systems that extract intelligence from publicly available sources. However, reliance on indicators of compromise (IOCs) has limited the ability of these systems to understand and detect threats. In this study, the authors propose Vulcan, a novel CTI system that extracts descriptive or static CTI data from unstructured text and determines the semantic relationships among them. Experimental results show high accuracy on both extraction tasks, and Vulcan also supports the development of threat analysis applications.

Abstract

To counteract rapidly evolving cyber threats, many research efforts have been made to design cyber threat intelligence (CTI) systems that extract CTI data from publicly available sources. Specifically, indicators of compromise (IOCs), such as file hashes and IP addresses, receive the most attention among security researchers. However, the ability of IOC-centric CTI systems to understand and detect threats remains questionable for two reasons. First, IOCs are forensic artifacts that indicate that an endpoint or network has been compromised; they cannot depict the technical details of threats. Second, attackers frequently change infrastructure and static indicators, which gives IOCs a very short lifespan. Therefore, when designing a CTI system, we should turn our attention to other types of CTI data that are helpful in threat understanding and detection (e.g., attack vector, tool). In this work, we propose Vulcan, a novel CTI system that extracts descriptive or static CTI data from unstructured text and determines their semantic relationships. To do this, we design neural language model-based named entity recognition (NER) and relation extraction (RE) models tailored to the cybersecurity domain. The experimental results confirm that Vulcan is highly accurate, with average F1-scores of 0.972 and 0.985 for the NER and RE tasks, respectively. Vulcan also provides an environment where security practitioners can develop applications for threat analysis. To demonstrate the applicability of Vulcan, we introduce two applications, evolution identification and threat profiling. The applications save the time and labor costs of analyzing cyber threats and show the detailed characteristics of the threats. (c) 2022 Elsevier Ltd. All rights reserved.
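As a reading aid for the F1-scores reported in the abstract, the following is a minimal sketch of how an entity-level F1-score is commonly computed for an NER task. It assumes exact-match scoring over labeled spans; the abstract does not state the evaluation protocol the authors used, and the function name `f1_score`, the `Span` type, and the entity labels below are hypothetical illustrations rather than Vulcan's actual schema.

```python
from typing import Set, Tuple

# A labeled span: (start_token, end_token, label). The labels used below are
# hypothetical examples, not the entity types defined by the paper.
Span = Tuple[int, int, str]


def f1_score(gold: Set[Span], pred: Set[Span]) -> float:
    """Exact-match F1: a prediction counts only if span and label both match."""
    true_positives = len(gold & pred)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(pred)  # share of predictions that are correct
    recall = true_positives / len(gold)     # share of gold entities that were found
    return 2 * precision * recall / (precision + recall)


# Toy data (invented for illustration): three gold entities, four predictions.
gold = {(0, 1, "MALWARE"), (5, 6, "TOOL"), (9, 11, "ATTACK_VECTOR")}
pred = {(0, 1, "MALWARE"), (5, 6, "TOOL"), (9, 10, "ATTACK_VECTOR"), (14, 15, "TOOL")}

score = f1_score(gold, pred)
print(f"F1 = {score:.3f}")
```

In this toy case two of four predictions are correct (precision 0.5) and two of three gold entities are found (recall 2/3), so the harmonic mean is 4/7. The same scoring can be applied to relation triples for an RE task, again under the exact-match assumption.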
