Article

Conceptual clustering of heterogeneous gene expression sequences

Journal

ARTIFICIAL INTELLIGENCE REVIEW
Volume 20, Issue 1-2, Pages 53-73

Publisher

KLUWER ACADEMIC PUBL
DOI: 10.1023/A:1026036631075

Keywords

bioinformatics; clustering; knowledge discovery; schema mapping; sequence processing

We are concerned with clustering and characterising gene expression sequences that have been classified according to heterogeneous classification schemes. We adopt a model-based approach built on a Hidden Markov Model (HMM) whose states are the stages of the underlying process that generates the gene sequences, which allows us to handle complex and heterogeneous data. Each cluster is described in terms of an HMM, and we seek schema mappings between the states of the original sequences and the states of the HMM. The general solution that we propose involves several distinct tasks. Firstly, there is a clustering problem where we seek to group similar sequences; for this we use mutual entropy to identify associations between sequence states. Secondly, because we are concerned with clustering heterogeneous sequences, we must determine the mappings between the states of each sequence in a cluster and the states of an underlying hidden process; for this we compute the most probable mapping. Thirdly, using these mappings we employ maximum likelihood techniques to learn the probabilistic description of the hidden Markov process for each cluster. Fourthly, we use these descriptions to characterise the clusters, using dynamic programming to determine the most probable pathway for each cluster. Finally, we derive linguistic labels to describe the clusters in a user-friendly manner. Such an approach provides an intuitive way of describing the underlying shape of the process by explicitly modelling the temporal aspects of the data. Non-time-homogeneous HMMs are used to capture the full temporal semantics.
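The abstract does not spell out the dynamic programming step, so the following is only a minimal sketch of how a most probable pathway could be computed for one cluster. It assumes the cluster's learned process is given as an initial state distribution plus one transition matrix per time step (the non-time-homogeneous case). The function name `most_probable_path` and its parameters `initial` and `transitions` are hypothetical and do not come from the paper.

```python
import math


def most_probable_path(initial, transitions):
    """Return (path, probability) of the most probable state sequence.

    initial: list of initial state probabilities (assumed input format).
    transitions: list of matrices, one per time step; transitions[t][i][j]
        is the probability of moving from state i at step t to state j at
        step t + 1. Using a different matrix per step is what makes the
        chain non-time-homogeneous.
    """
    n = len(initial)

    def log(p):
        # Log space avoids underflow on long sequences; log(0) is -inf.
        return math.log(p) if p > 0 else float("-inf")

    score = [log(p) for p in initial]
    back = []  # back[t][j] = best predecessor of state j at step t + 1
    for matrix in transitions:
        new_score = []
        pointers = []
        for j in range(n):
            best_i = max(range(n), key=lambda i: score[i] + log(matrix[i][j]))
            new_score.append(score[best_i] + log(matrix[best_i][j]))
            pointers.append(best_i)
        score = new_score
        back.append(pointers)

    # Trace back from the best final state to recover the full pathway.
    last = max(range(n), key=lambda j: score[j])
    path = [last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, math.exp(score[last])


# Illustrative numbers only, not data from the paper.
initial = [0.6, 0.4]
transitions = [
    [[0.7, 0.3], [0.4, 0.6]],  # step 0 -> 1
    [[0.1, 0.9], [0.5, 0.5]],  # step 1 -> 2
]
path, prob = most_probable_path(initial, transitions)
print(path, round(prob, 3))
```

Under these assumptions the recursion keeps, for every state at each step, only the best-scoring way of reaching it, so the cost grows linearly with the number of steps rather than exponentially with the number of candidate pathways.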
