Article

SeqMask: Behavior Extraction Over Cyber Threat Intelligence Via Multi-Instance Learning

Journal

COMPUTER JOURNAL

Publisher

OXFORD UNIV PRESS
DOI: 10.1093/comjnl/bxac172

Keywords

Cyber Threat Intelligence; Behavior Analysis; Information Extraction; Tactics, Techniques and Procedures (TTPs); Multi-Instance Learning

Funding

  1. National Key Research and Development Program [2019QY1400]
  2. National Natural Science Foundation of China [U2133208, 62101368]
  3. Sichuan Youth Science and Technology Innovation Team [2022JDTD0014]
  4. Basic Research Program of China [2020-JCJQ-ZD-021]

This paper introduces a multi-instance learning approach called SeqMask for extracting TTPs and behavior keywords from Cyber Threat Intelligence, as well as predicting and verifying the validity of TTPs labels.
Identifying and extracting Tactics, Techniques and Procedures (TTPs) from Cyber Threat Intelligence (CTI) restores the full picture of cyber attacks and guides analysts in assessing system risk. Existing frameworks can hardly provide uniform and complete mechanisms for TTP information extraction without adequate background knowledge. This paper proposes a multi-instance learning approach named SeqMask as a solution. SeqMask extracts behavior keywords from CTI, ranked by their semantic impact, and predicts TTP labels by conditional probabilities. The framework also has two mechanisms to determine the validity of keywords: one relies on verification by expert experience; the other measures how much the classification result degrades when existing keywords are blocked. In the experiments, SeqMask reached F1 scores of 86.07% and 73.99% for TTP classification. For the top 20% of keywords, the expert approval rating is 92.20%, and the average repetition of keywords whose scores fall between 90% and 100% is 60.02%. In particular, when the top 65% of the keywords were blocked, the F1 decreased to about 50%; when the top 50% were removed, the F1 was under 31%. We also validate the feasibility of extracting TTPs from full-size CTI and malware, for which the F1 scores improve by 2.16% and 0.81%.
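
The general idea of ranking keywords by their impact on a classifier, and then blocking the top-ranked ones to see how far the prediction degrades, can be illustrated with a small sketch. This is not the SeqMask model itself: the abstract does not give its architecture, so the classifier below is a hypothetical toy (a sigmoid over hand-picked word weights), and the names `TOY_WEIGHTS`, `toy_predict_proba`, `mask_impact_scores` and `block_top_keywords` are assumptions made purely for illustration.

```python
import math
from typing import Callable, List, Sequence, Tuple

MASK = "[MASK]"

# Hypothetical weights standing in for a trained classifier (not from the paper).
TOY_WEIGHTS = {"phishing": 2.0, "macro": 1.5, "attachment": 1.0}


def toy_predict_proba(tokens: Sequence[str]) -> float:
    """Toy probability that a sentence carries one (hypothetical) TTP label."""
    z = sum(TOY_WEIGHTS.get(t, 0.0) for t in tokens) - 1.0
    return 1.0 / (1.0 + math.exp(-z))


def mask_impact_scores(
    tokens: Sequence[str],
    predict_proba: Callable[[Sequence[str]], float],
    mask: str = MASK,
) -> List[Tuple[str, float]]:
    """Score each token by the probability drop observed when it is masked."""
    base = predict_proba(tokens)
    scores = []
    for i, tok in enumerate(tokens):
        masked = list(tokens[:i]) + [mask] + list(tokens[i + 1:])
        scores.append((tok, base - predict_proba(masked)))
    # Largest drop first: these are the candidate behavior keywords.
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


def block_top_keywords(
    tokens: Sequence[str],
    ranked: Sequence[Tuple[str, float]],
    fraction: float,
    mask: str = MASK,
) -> List[str]:
    """Replace the top `fraction` of ranked keywords with the mask token."""
    k = math.ceil(len(ranked) * fraction)
    blocked = {tok for tok, _ in ranked[:k]}
    return [mask if t in blocked else t for t in tokens]


if __name__ == "__main__":
    sentence = "the actor sent a phishing email with one macro attachment".split()
    ranked = mask_impact_scores(sentence, toy_predict_proba)
    print([tok for tok, _ in ranked[:3]])
    blocked = block_top_keywords(sentence, ranked, fraction=0.2)
    print(toy_predict_proba(sentence), toy_predict_proba(blocked))
```

In this toy setting, blocking the top 20% of ranked tokens lowers the predicted probability, which mirrors in miniature the blocking-based validity check the abstract describes. A real evaluation would measure F1 over a labeled dataset rather than a single probability.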
