Article

Vulcan: Automatic extraction and analysis of cyber threat intelligence from unstructured text

Journal

COMPUTERS & SECURITY
Volume 120

Publisher

ELSEVIER ADVANCED TECHNOLOGY
DOI: 10.1016/j.cose.2022.102763

Keywords

Cyber threat intelligence; CTI; Cybersecurity; Information extraction; Language model

Funding

  1. Engineering Research Center Program through the National Research Foundation of Korea (NRF) - Korean Government MSIT [NRF-2018R1A5A1059921]

To address the evolving cyber threats, researchers have developed CTI systems to extract intelligence from publicly available sources. However, the reliance on indicators of compromise (IOC) has limited their ability to understand and detect threats. In this study, the authors propose Vulcan, a novel CTI system that extracts descriptive or static CTI data from unstructured text and determines their semantic relationships. Experimental results show high accuracy, and Vulcan enables the development of threat analysis applications.
To counteract rapidly evolving cyber threats, many research efforts have been made to design cyber threat intelligence (CTI) systems that extract CTI data from publicly available sources. Specifically, indicators of compromise (IOCs), such as file hashes and IP addresses, receive the most attention among security researchers. However, the ability of IOC-centric CTI systems to understand and detect threats remains questionable for two reasons. First, IOCs are forensic artifacts that indicate that an endpoint or network has been compromised; they cannot depict the technical details of threats. Second, attackers frequently change infrastructure and static indicators, which gives IOCs a very short lifespan. Therefore, when designing a CTI system, we should turn our attention to other types of CTI data that are helpful in threat understanding and detection (e.g., attack vector, tool). In this work, we propose Vulcan, a novel CTI system that extracts descriptive or static CTI data from unstructured text and determines their semantic relationships. To do this, we design neural language model-based named entity recognition (NER) and relation extraction (RE) models tailored for the cybersecurity domain. The experimental results confirm that Vulcan is highly accurate, with average F1-scores of 0.972 and 0.985 for the NER and RE tasks, respectively. Vulcan also provides an environment where security practitioners can develop applications for threat analysis. To prove the applicability of Vulcan, we introduce two applications, evolution identification and threat profiling. The applications reduce the time and labor costs of analyzing cyber threats and show the detailed characteristics of the threats. (c) 2022 Elsevier Ltd. All rights reserved.
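
The contrast the abstract draws between IOC extraction and descriptive CTI extraction can be illustrated with a toy sketch. This is not Vulcan's implementation: the abstract describes neural language model-based NER and RE models, whereas the code below substitutes a hypothetical keyword gazetteer for NER and simple same-sentence pairing for RE candidates. The names `GAZETTEER`, `extract_iocs`, `extract_entities`, and `candidate_relations`, the entity labels, and the example sentence are all assumptions made for illustration.

```python
import re
from dataclasses import dataclass

# Hypothetical gazetteer standing in for a trained NER model (assumption).
GAZETTEER = {
    "emotet": "MALWARE",
    "spear-phishing": "ATTACK_VECTOR",
    "powershell": "TOOL",
}

# IOCs such as IP addresses and file hashes follow fixed patterns,
# so plain regular expressions are enough to find them.
IOC_PATTERNS = {
    "ipv4": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
    "md5": re.compile(r"\b[a-fA-F0-9]{32}\b"),
}


@dataclass(frozen=True)
class Entity:
    text: str
    label: str
    start: int


def extract_iocs(sentence):
    """Return (value, kind) pairs for pattern-based indicators."""
    return [
        (match.group(), kind)
        for kind, pattern in IOC_PATTERNS.items()
        for match in pattern.finditer(sentence)
    ]


def extract_entities(sentence):
    """Tag descriptive entities by case-insensitive gazetteer lookup."""
    lowered = sentence.lower()
    found = []
    for term, label in GAZETTEER.items():
        for match in re.finditer(re.escape(term), lowered):
            span = sentence[match.start():match.end()]
            found.append(Entity(span, label, match.start()))
    return sorted(found, key=lambda entity: entity.start)


def candidate_relations(entities):
    """Pair up entities from one sentence; an RE model would classify each pair."""
    return [
        (first, second)
        for index, first in enumerate(entities)
        for second in entities[index + 1:]
    ]


if __name__ == "__main__":
    text = "Emotet spreads via spear-phishing and runs PowerShell to contact 10.0.0.5."
    print(extract_iocs(text))
    for first, second in candidate_relations(extract_entities(text)):
        print(first.label, first.text, "<->", second.label, second.text)
```

The pattern-matched IP address says only that some host was contacted, while the tagged entities and their pairings describe how the threat operates, which is the kind of information the abstract argues a CTI system should target.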
