☆ 3.9 Article

Clustering digital forensic string search output

DIGITAL INVESTIGATION (2014)

Journal

DIGITAL INVESTIGATION

Volume 11, Issue 4, Pages 314-322

Publisher

ELSEVIER SCI LTD

DOI: 10.1016/j.diin.2014.10.002

Keywords

Digital forensics; Text string search; Clustering; k-means; SOM; LDA

Funding

Naval Postgraduate School [N00244-11-1-0011]
Naval Supply Systems Command (NAVSUP) Fleet Logistics Center San Diego (NAVSUP FLC San Diego)

Ask authors/readers for more resources

Protocol

Community support

Reagent

Community support

Abstract

This research comparatively evaluates four competing clustering algorithms for thematically clustering digital forensic text string search output. It does so in a more realistic context, respecting data size and heterogeneity, than has been researched in the past. In this study, we used physical-level text string search output, consisting of over two million search hits found in nearly 50,000 allocated files and unallocated blocks. Holding the data set constant, we comparatively evaluated k-Means, Kohonen SOM, Latent Dirichlet Allocation (LDA) followed by k-Means, and LDA followed by SOM. This enables true cross-algorithm evaluation, whereas past studies evaluated singular algorithms using unique, non-reproducible datasets. Our research shows an LDA + k-Means using a linear, centroid-based user navigation procedure produces optimal results. The winning approach increased information retrieval effectiveness, from the baseline random walk absolute precision rate of 0.04, to an average precision rate of 0.67. We also explored a variety of algorithms for user navigation of search hit results, finding that the performance of k-means clustering can be greatly improved with a non-linear, non-centroid-based cluster and document navigation procedure, which has potential implications for digital forensic tools and use thereof, particularly given the popularity and speed of k-means clustering. (C) 2014 Elsevier Ltd. All rights reserved.

Clustering digital forensic string search output

Journal

DIGITAL INVESTIGATION

Publisher

ELSEVIER SCI LTD

Keywords

Categories

Funding

Ask authors/readers for more resources

Protocol

Reagent

Authors

I am an author on this paper

Reviews

Primary Rating

Secondary Ratings

Novelty

Significance

Scientific rigor

Rate this paper

Recommended

Clustering digital forensic string search output

Journal

DIGITAL INVESTIGATION

Publisher

ELSEVIER SCI LTD

Keywords

Categories

Funding

Ask authors/readers for more resources

Protocol

Reagent

Authors

I am an author on this paper

Reviews

Primary Rating

Secondary Ratings

Novelty

Significance

Scientific rigor

Rate this paper

Recommended

Export Citation

Share Paper