3.8 Article

Genetic algorithm guided selection: Variable selection and subset selection

Ask authors/readers for more resources

A novel Genetic Algorithm guided Selection method, GAS, has been described. The method utilizes a simple encoding scheme which can represent both compounds and variables used to construct a QSAR/QSPR model. A genetic algorithm is then utilized to simultaneously optimize the encoded variables that include both descriptors and compound subsets. The GAS method generates multiple models each applying to a subset of the compounds. Typically the subsets represent clusters with different chemotypes. Also a procedure based on molecular similarity is presented to determine which model should be applied to a given test set compound. The variable selection method implemented in GAS has been tested and compared using the Selwood data set (n = 31 compounds; v = 53 descriptors). The results showed that the method is comparable to other published methods. The subset selection method implemented in GAS has been first tested using an artificial data set (n = 100 points; v = 1 descriptor) to examine its ability to subset data points and second applied to analyze the XLOGP data set (n = 1831 compounds; v = 126 descriptors). The method is able to correctly identify artificial data points belonging to various subsets. The analysis of the XLOGP data set shows that the subset selection method can be useful in improving a QSAR/QSPR model when the variable selection method fails.

Authors

I am an author on this paper
Click your name to claim this paper and add it to your profile.

Reviews

Primary Rating

3.8
Not enough ratings

Secondary Ratings

Novelty
-
Significance
-
Scientific rigor
-
Rate this paper

Recommended

No Data Available
No Data Available