Article

Relevance-diversity algorithm for feature selection and modified Bayes for prediction

Journal

ALEXANDRIA ENGINEERING JOURNAL
Volume 66, Pages 329-342

Publisher

ELSEVIER
DOI: 10.1016/j.aej.2022.11.002

Keywords

Naive Bayes; Feature Selection; Relevance; Attributes Selection; Classification

Big data analytics uncovers hidden patterns through classification, prediction and reinforcement of big datasets. Relevant, important and informative features are selected using different filtration techniques. A new feature selection technique called Relevance-diversity algorithm and a new supervised classification algorithm based on Naive Bayes classification are proposed. The performance of these techniques is evaluated using various datasets, and the results show improvements in terms of feature selection, accuracy, and time complexity.
Big data analytics uncovers hidden patterns through classification, prediction and reinforcement of big datasets. In these datasets, some features have a negligible connection with other features, and some are insignificant because their presence does not affect the results of big data analytics. Big data analytics algorithms generate better classification models when supplied with a dataset consisting of relevant, important and informative features. Features can thus be classified as important or unimportant. To select important features, different filtration techniques are used. These techniques filter features on different bases, such as information gain, information dispersion and the Gini index, and have a few drawbacks that are reviewed in this paper. The first contribution of this paper is a new feature selection technique, the Relevance-diversity algorithm, which selects important features based on two measures, relevance and diversity, in order to keep the number of selected features as low as possible and to reduce the search time used in feature selection. The second contribution is a new supervised classification algorithm based on Naive Bayes classification. The naive assumption, i.e. feature independence, is discarded from the Naive Bayes algorithm: the features are considered to be dependent on each other, and their combined impact on the class value is evaluated. The newly proposed classification algorithm is then applied to the features selected through the relevance-diversity based feature selection technique. The Weather, Tic-Tac-Toe, Lenses, Balance-scale and CarEval datasets are used to evaluate both techniques. The results of the proposed feature selection method are compared with existing methods, and the results of Modified-Bayes are compared with the existing Naive Bayes algorithm. The analysis revealed that the proposed method performed better in terms of the number of features, accuracy and time complexity.
(c) 2022 THE AUTHORS. Published by Elsevier BV on behalf of Faculty of Engineering, Alexandria University. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
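
The abstract does not give the Modified-Bayes formulas, so the following is only a minimal sketch of the general idea it describes: dropping the feature-independence assumption and scoring each class by the combined (joint) occurrence of the feature values. The function names, the Laplace smoothing parameter `alpha` and the `n_combinations` argument are all assumptions made for illustration and are not taken from the paper.

```python
from collections import Counter, defaultdict


def fit_joint_bayes(rows, labels):
    """Count class frequencies and, per class, whole feature tuples.

    Hypothetical helper: the tuple of feature values is treated as one
    joint event instead of a product of independent per-feature events.
    """
    class_counts = Counter(labels)
    joint_counts = defaultdict(Counter)
    for row, label in zip(rows, labels):
        joint_counts[label][tuple(row)] += 1
    return class_counts, joint_counts


def predict_joint_bayes(model, row, alpha=1.0, n_combinations=1):
    """Return the class maximising P(class) * P(feature tuple | class).

    alpha is an assumed Laplace smoothing constant; n_combinations is the
    assumed number of distinct feature tuples, used in the smoothing
    denominator so that unseen tuples get a small non-zero probability.
    """
    class_counts, joint_counts = model
    total = sum(class_counts.values())
    best_label, best_score = None, -1.0
    for label, n_c in class_counts.items():
        prior = n_c / total
        seen = joint_counts[label][tuple(row)]
        likelihood = (seen + alpha) / (n_c + alpha * n_combinations)
        score = prior * likelihood
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# XOR-style toy data: neither feature is informative on its own, but the
# pair determines the class, so a joint estimate can separate the classes.
rows = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = ["no", "yes", "yes", "no"]
model = fit_joint_bayes(rows, labels)
print(predict_joint_bayes(model, (0, 1), n_combinations=4))
```

The toy data illustrates why feature dependence can matter: a standard Naive Bayes model sees identical per-feature statistics for both classes here, while the joint counts do not. The trade-off is that joint counting needs many more training examples as the number of features grows, which is one reason to select a small feature subset first.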
