Article

CUPID: A labeled dataset with Pentesting for evaluation of network intrusion detection

Journal

JOURNAL OF SYSTEMS ARCHITECTURE
Volume 129

Publisher

ELSEVIER
DOI: 10.1016/j.sysarc.2022.102621

Keywords

Network intrusion detection; Feature selection

Funding

  1. National Science Foundation, United States grant [OAC-2001789, OAC-1920462, OAC-2115134]
  2. Silicon Valley Foundation
  3. Cisco Research Center
  4. Colorado State Bill [18-086]

Abstract

This article introduces the CUPID dataset, which aims to address the limitations of existing datasets in network intrusion detection research. The CUPID dataset includes human-generated traffic with accurate labels, providing a valuable resource for training and testing machine learning algorithms used in network intrusion detection systems.
Reproducible network intrusion detection research requires widely available datasets that represent real-world scenarios. A key gap in the existing datasets used for empirical evaluation of network intrusion detection is the lack of human-generated traffic with accurate labels that distinguish benign from malicious behavior. Using an emulated network environment with a vulnerable web application, we collected baseline traffic, human-generated normal user traffic, automated attacks, and the attacks of ten human penetration testers of varying abilities. We preprocessed the collected data to produce a new dataset named the Colorado University Pentesting Intrusion Dataset (CUPID). The attacks range from reconnaissance activities to delivery of an exploit payload. To our knowledge, this is the first publicly available collection that provides labeled, Institutional Review Board-approved benign and attacker data. The CUPID dataset can be used to train classification-based machine learning algorithms used in network intrusion detection systems and to test their limits.
