Article

MiFI-Outlier: Minimal infrequent itemset-based outlier detection approach on uncertain data stream

Journal

KNOWLEDGE-BASED SYSTEMS
Volume 191

Publisher

ELSEVIER
DOI: 10.1016/j.knosys.2019.105268

Keywords

Outlier detection; Minimal infrequent itemset mining; Uncertain data stream; Deviation indices; Data mining

Funding

  1. Chinese Universities Scientific Fund [2017XD001]
  2. Fundamental Research Funds for the Central Universities, China [2018XD004]

Numerous outlier detection approaches have been proposed for static datasets over the past twenty years, and they have achieved good results. In practice, uncertain data streams are increasingly common, but most existing outlier detection approaches are not suited to the uncertain data stream environment. In addition, many outlier detection approaches do not consider how frequently each element appears, so the outliers they detect do not match the definition of an outlier. Itemset-based outlier detection approaches offer a good solution to this problem and have attracted growing attention in recent years. In this paper, a novel two-step minimal infrequent itemset-based outlier detection approach called MiFI-Outlier is proposed to effectively detect outliers in uncertain data streams. In the itemset mining phase, a matrix-based method called MiFI-UDSM is proposed to mine the minimal infrequent itemsets (MiFIs) from an uncertain data stream, and an improved approach called MiFI-UDSM* is then proposed to mine these minimal infrequent itemsets more effectively using the ideas of item cap and support cap. In the outlier detection phase, three deviation indices are defined on the mined MiFIs to measure the deviation degree of each transaction: the minimal infrequent itemset deviation index (MiFIDI), the similarity deviation index (SDI), and the transaction deviation index (TDI). MiFI-Outlier then uses these indices to identify the outliers in the uncertain data stream. Several experimental studies are conducted on public and synthetic datasets, and the results show that the proposed approaches deliver superior performance in both the infrequent itemset mining phase and the outlier detection phase. © 2019 Elsevier B.V. All rights reserved.
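To make the notion of a minimal infrequent itemset on uncertain data concrete, the following is a minimal brute-force sketch and not the paper's matrix-based MiFI-UDSM or MiFI-UDSM* method. It assumes the commonly used expected-support model, in which each item in a transaction carries an existence probability and the support of an itemset is the sum, over transactions, of the product of its items' probabilities. The function names, the `max_len` bound, the threshold, and the sample data are all hypothetical and chosen only for illustration; the abstract does not define the deviation indices, so they are not sketched here.

```python
from itertools import combinations
from math import prod


def expected_support(itemset, transactions):
    """Expected support under the (assumed) independent-item model:
    sum over transactions of the product of the items' probabilities.
    An item missing from a transaction contributes probability 0."""
    return sum(prod(t.get(item, 0.0) for item in itemset) for t in transactions)


def minimal_infrequent_itemsets(transactions, min_support, max_len=3):
    """Brute-force search (illustrative only): an itemset is minimal
    infrequent if its expected support is below min_support while every
    proper subset is frequent. Candidates are visited by increasing size,
    so any candidate containing an already-found result is not minimal."""
    items = sorted({item for t in transactions for item in t})
    minimal = []
    for k in range(1, max_len + 1):
        for candidate in combinations(items, k):
            cand = frozenset(candidate)
            if any(m <= cand for m in minimal):
                continue  # has an infrequent proper subset, so not minimal
            if expected_support(cand, transactions) < min_support:
                minimal.append(cand)
    return minimal


# Hypothetical uncertain transactions: item -> existence probability.
transactions = [
    {"a": 0.9, "b": 0.8},
    {"a": 0.7, "c": 0.4},
    {"a": 0.8, "b": 0.6, "c": 0.5},
]
mifis = minimal_infrequent_itemsets(transactions, min_support=0.8)
print(mifis)
```

The exhaustive enumeration is exponential in the number of items and processes a fixed batch rather than a stream; it is meant only to pin down the definition that the mining phase targets.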
