Purpose: to improve the accuracy of a classifier while using the minimum number of features. Multiple feature selection techniques have been proposed to help identify the most important features. In this paper, three feature selection methods (Correlation-based Feature Selection, the Wrapper method, and Information Gain) are applied before supervised learning based classification, and their impact is shown by analyzing and comparing the performance of the different classification methods used in this experiment.
In this paper, the Naïve Bayes (NB), J48, and Support Vector Machine (SVM) classifiers are used for the classification of the Glass Identification data set. The Glass Identification data set is intended for the study of criminological investigation. It is a multiclass data set with 11 attributes and seven different classes according to the type of glass.
The overall flow of the experiment is shown in Figure 1. First, the classification results are recorded without applying any feature selection technique to the data set. Then, using the three feature selection techniques (Correlation-based Feature Selection, Wrapper, and Information Gain), different feature subsets are passed to each classifier and the results are recorded.
Figure 1: System Architecture
Using the methodology discussed earlier, the experiment is first performed on the glass data set without feature selection. The results are analyzed using the following evaluation metrics.
There are various parameters to measure performance; among them, only Accuracy, True Positive Rate (TPR), and False Positive Rate (FPR) are considered in this paper. Their formulas are shown below, with variable values taken from the confusion matrix:

Accuracy = (TP + TN) / (TP + TN + FP + FN)
TPR = TP / (TP + FN)
FPR = FP / (FP + TN)
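As a concrete illustration, the metrics can be computed for a single class from its confusion-matrix counts. This is a minimal sketch with made-up counts, not results from the paper's experiment:

```python
# Evaluation metrics from binary (per-class) confusion-matrix counts.
def accuracy(tp, tn, fp, fn):
    return (tp + tn) / (tp + tn + fp + fn)

def true_positive_rate(tp, fn):
    return tp / (tp + fn)            # also called recall or sensitivity

def false_positive_rate(fp, tn):
    return fp / (fp + tn)

# Hypothetical counts for one class, chosen for illustration only.
tp, tn, fp, fn = 50, 40, 5, 5
print(accuracy(tp, tn, fp, fn))      # 0.9
print(true_positive_rate(tp, fn))    # about 0.909
print(false_positive_rate(fp, tn))   # about 0.111
```

For a multiclass data set such as Glass, these quantities are typically computed per class (one-vs-rest) and then averaged.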
Without Feature Selection
The glass data set is used to train three different classifiers: Naive Bayes (NB), J48, and Support Vector Machine (SVM). The results obtained are shown in Table 1 and are compared based on the evaluation measures mentioned above.
Table 1: Performance of classifiers for the Glass data set
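The baseline comparison can be sketched as below. This is an assumption-laden sketch, not the paper's setup: the Glass data set is not bundled with scikit-learn, so the bundled wine data set stands in for it, and `DecisionTreeClassifier` stands in for J48 (WEKA's implementation of C4.5):

```python
# Baseline comparison of the three classifiers without feature selection.
# Sketch only: sklearn's wine data stands in for the Glass data set, and
# DecisionTreeClassifier approximates J48 (WEKA's C4.5 implementation).
from sklearn.datasets import load_wine
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = load_wine(return_X_y=True)

classifiers = {
    "NB":  GaussianNB(),
    "J48": DecisionTreeClassifier(random_state=0),
    "SVM": SVC(kernel="rbf", gamma="scale"),
}

# 10-fold cross-validated accuracy for each classifier.
for name, clf in classifiers.items():
    acc = cross_val_score(clf, X, y, cv=10, scoring="accuracy").mean()
    print(f"{name}: {acc:.3f}")
```

Running the same loop on the Glass Identification data (loaded from the UCI repository) would reproduce the structure of Table 1.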
Result: SVM shows higher accuracy than the other two classifiers. To improve the results, feature selection techniques are applied to the glass data set.
With Feature Selection
Here, the results after applying feature selection techniques are compared across the three classifiers. The data set is passed through the CFS, Wrapper, and IG feature selection techniques.
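As a rough sketch of how these selectors could be reproduced outside WEKA (assumptions: the wine data set stands in for Glass, mutual information approximates Information Gain, and scikit-learn has no direct CFS implementation, so only the IG and Wrapper strategies are shown; the subset size of 5 is arbitrary):

```python
# Two of the three selection strategies in scikit-learn terms:
#   Information Gain -> univariate mutual-information filter (SelectKBest)
#   Wrapper          -> greedy forward selection around a classifier
from sklearn.datasets import load_wine
from sklearn.feature_selection import (SelectKBest, SequentialFeatureSelector,
                                       mutual_info_classif)
from sklearn.naive_bayes import GaussianNB

X, y = load_wine(return_X_y=True)

# Information Gain: keep the 5 features with the highest mutual information.
ig = SelectKBest(score_func=mutual_info_classif, k=5).fit(X, y)

# Wrapper: forward selection scored by cross-validating a Naive Bayes model.
wrapper = SequentialFeatureSelector(GaussianNB(), n_features_to_select=5,
                                    direction="forward", cv=5).fit(X, y)

print("IG features:     ", ig.get_support(indices=True))
print("Wrapper features:", wrapper.get_support(indices=True))
```

The wrapper is tied to the classifier it wraps, so in the paper's setup it would be run once per classifier (NB, J48, SVM), whereas the IG filter produces one subset for all of them.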
Figures 2-4 show the results on the glass data set after applying feature selection.
Figure 2: Accuracy of classifiers using FS techniques for the Glass data set
Figure 3: TPR of classifiers using FS techniques for the Glass data set
Figure 4: FPR of classifiers using FS techniques for the Glass data set
Result: From the figures it is observed that SVM gives the highest TPR and accuracy, meaning more instances are correctly classified with minimal misclassification.
The features selected for the glass data set are shown in Table 2. The table shows that the Wrapper method selects the fewest features while improving the accuracy and TPR of all three classifiers.
Table 2: Selected features for the Glass data set
In this paper, the impact of feature selection on supervised learning based classifiers is compared, using Accuracy, TPR, and FPR as evaluation metrics. The experimental results show that the Information Gain and Wrapper methods improve accuracy and True Positive Rate while minimizing False Positive Rate.