Article

BSig: evaluating the statistical significance of biclustering solutions

Journal

DATA MINING AND KNOWLEDGE DISCOVERY
Volume 32, Issue 1, Pages 124-161

Publisher

SPRINGER
DOI: 10.1007/s10618-017-0521-2

Keywords

Biclustering; Statistical significance; Pattern mining

Funding

  1. Fundação para a Ciência e a Tecnologia (FCT) [PTDC/EEI-SII/1937/2014, SFRH/BD/75924/2011, UID/CEC/50021/2013, UID/CEC/00408/2013]

Statistical evaluation of biclustering solutions is essential to rule out spurious relations and to validate the many scientific claims that are inferred from unsupervised data analysis without proper statistical grounding. Most biclustering methods rely on merit functions to discover biclusters that satisfy specific homogeneity criteria. However, strong homogeneity does not guarantee the statistical significance of a biclustering solution. Furthermore, although some biclustering methods test the statistical significance of specific types of biclusters, there are no methods to assess the significance of flexible biclustering models. This work proposes a method to evaluate the statistical significance of biclustering solutions. It integrates state-of-the-art statistical views on the significance of local patterns and extends them with new principles to assess the significance of biclusters with additive, multiplicative, symmetric, order-preserving and plaid coherencies. The proposed statistical tests provide an unprecedented means to minimize the number of false positive biclusters without incurring false negatives, and to compare state-of-the-art biclustering algorithms according to the statistical significance of their outputs. Results on synthetic and real data support the soundness and relevance of the proposed contributions, and stress the need to combine significance and homogeneity criteria to guide the search for biclusters.
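
To make the idea of testing a bicluster's significance concrete, the following is a minimal sketch, not the procedure defined in the paper. It assumes discretized data, a bicluster whose rows all show the same symbol on each selected column, and independence between columns under the null model. The function names (`binomial_tail`, `bicluster_p_value`) and the example numbers are hypothetical, and no correction for multiple testing is applied.

```python
from math import comb


def binomial_tail(n, k, p):
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def bicluster_p_value(n_rows, support, symbol_probs):
    """Probability of seeing the pattern in at least `support` of `n_rows` rows by chance.

    symbol_probs holds, for each column of the bicluster, the probability
    that a random row shows the expected symbol (assumed null model:
    columns are independent, rows are independent).
    """
    # Probability that a single row matches the whole pattern by chance.
    p_pattern = 1.0
    for p in symbol_probs:
        p_pattern *= p
    return binomial_tail(n_rows, support, p_pattern)


# Hypothetical example: 1000 rows, a 3-column pattern over 4 equiprobable
# symbols, observed in 20 rows.
p_value = bicluster_p_value(n_rows=1000, support=20, symbol_probs=[0.25, 0.25, 0.25])
print(f"p-value: {p_value:.4f}")
```

A small p-value under this simplified null suggests the bicluster is unlikely to occur by chance. A homogeneous bicluster with few rows or few columns can still yield a large p-value, which is the gap between homogeneity and significance that the abstract points to.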