☆ 3.8 Proceedings Paper

Text Mining in Unclean, Noisy or Scrambled Datasets for Digital Forensics Analytics

2017 EUROPEAN INTELLIGENCE AND SECURITY INFORMATICS CONFERENCE (EISIC) (2017)

期刊

2017 EUROPEAN INTELLIGENCE AND SECURITY INFORMATICS CONFERENCE (EISIC)

卷 -, 期 -, 页码 76-83

出版社

IEEE

DOI: 10.1109/EISIC.2017.19

关键词

text mining; digital forensics; email analysis; ARPaD; LERP-RSA; pattern detection

类别

Computer Science, Information Systems Computer Science, Theory & Methods Engineering, Electrical & Electronic

向作者/读者索取更多资源

Protocol

社区支持

Reagent

社区支持

摘要

In our era, most of the communication between people is realized in the form of electronic messages and especially through smart mobile devices. As such, the written text exchanged suffers from bad use of punctuation, misspelling words, continuous chunk of several words without spaces, tables, internet addresses etc. which make traditional text analytics methods difficult or impossible to be applied without serious effort to clean the dataset. Our proposed method in this paper can work in massive noisy and scrambled texts with minimal preprocessing by removing special characters and spaces in order to create a continuous string and detect all the repeated patterns very efficiently using the Longest Expected Repeated Pattern Reduced Suffix Array (LERP-RSA) data structure and a variant of All Repeated Patterns Detection (ARPaD) algorithm. Meta-analyses of the results can further assist a digital forensics investigator to detect important information to the chunk of text analyzed.

Text Mining in Unclean, Noisy or Scrambled Datasets for Digital Forensics Analytics

期刊

2017 EUROPEAN INTELLIGENCE AND SECURITY INFORMATICS CONFERENCE (EISIC)

出版社

IEEE

关键词

类别

向作者/读者索取更多资源

Protocol

Reagent

作者

我是这篇论文的作者

评论

主要评分

次要评分

新颖性

重要性

科学严谨性

评价这篇论文

推荐

Text Mining in Unclean, Noisy or Scrambled Datasets for Digital Forensics Analytics

期刊

2017 EUROPEAN INTELLIGENCE AND SECURITY INFORMATICS CONFERENCE (EISIC)

出版社

IEEE

关键词

类别

向作者/读者索取更多资源

Protocol

Reagent

作者

我是这篇论文的作者

评论

主要评分

次要评分

新颖性

重要性

科学严谨性

评价这篇论文

推荐

导出引文

分享论文