Proceedings Paper

A crawler architecture for harvesting the clear, social, and dark web for IoT-related cyber-threat intelligence

Publisher

IEEE COMPUTER SOC
DOI: 10.1109/SERVICES.2019.00016

Keywords

IoT; cyber-security; cyber-threat intelligence; crawling architecture; machine learning; language models

Funding

  1. European Union [786698]

The clear, social, and dark web have lately been identified as rich sources of valuable cyber-security information that, given the appropriate tools and methods, may be identified, crawled, and subsequently turned into actionable cyber-threat intelligence. In this work, we focus on the information-gathering task and present a novel crawling architecture for transparently harvesting data from security websites in the clear web, security forums in the social web, and hacker forums/marketplaces in the dark web. The proposed architecture adopts a two-phase approach to data harvesting. In the first phase, a machine learning-based crawler directs the harvesting towards websites of interest; in the second phase, state-of-the-art statistical language modelling techniques represent the harvested information in a latent low-dimensional feature space and rank it by its potential relevance to the task at hand. The proposed architecture is realised using exclusively open-source tools, and a preliminary evaluation with crowdsourced results demonstrates its effectiveness.
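
The ranking step of the second phase can be illustrated with a minimal sketch. The abstract does not name the language model used, so this example swaps in plain term-frequency vectors and cosine similarity as a stand-in for the latent low-dimensional representation. The function names (`tf_vector`, `cosine`, `rank_by_relevance`) and the idea of a free-text `topic_query` are assumptions made for illustration, not part of the paper.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tf_vector(text):
    """Sparse term-frequency vector; a stand-in for a learned embedding."""
    return Counter(tokenize(text))


def cosine(a, b):
    """Cosine similarity between two sparse vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_relevance(documents, topic_query):
    """Return (score, document) pairs sorted from most to least relevant."""
    query_vec = tf_vector(topic_query)
    scored = [(cosine(tf_vector(doc), query_vec), doc) for doc in documents]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)
```

A call such as `rank_by_relevance(harvested_pages, "iot exploit")` would place pages sharing vocabulary with the query first. A learned embedding would additionally capture related terms that do not match literally, which raw term counts cannot do.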
