Feature Selection
Feature selection is the process of automatically selecting the features in your data that contribute most to the prediction variable or output you are interested in. Irrelevant features in your data can decrease the accuracy of many models, especially linear algorithms such as linear and logistic regression. Three benefits of performing feature selection before modeling your data are:
- Reduces Overfitting: Less redundant data means less opportunity to make decisions based on noise.
- Improves Accuracy: Less misleading data means modeling accuracy improves.
- Reduces Training Time: Less data means that algorithms train faster.
Here is a comparison of the three families of feature selection methods:
Filter Methods | Wrapper Methods | Embedded Methods |
---|---|---|
Generic methods that do not rely on a specific machine learning algorithm. | Evaluate feature subsets using a specific machine learning algorithm to find the optimal features. | Select features during the model building process itself, by observing each iteration of the training phase. |
Much faster than wrapper methods in terms of time complexity. | High computation time for a dataset with many features. | Sit between filter and wrapper methods in terms of time complexity. |
Less prone to overfitting. | High chance of overfitting, because they involve training models on many different combinations of features. | Generally used to reduce overfitting by penalizing model coefficients that grow too large. |
Examples: correlation, chi-squared test, ANOVA, information gain, etc. | Examples: forward selection, backward elimination, bi-directional elimination (stepwise selection), etc. | Examples: LASSO, Elastic Net, Ridge Regression, etc. |
Note: The examples use the Pima Indians onset of diabetes dataset to demonstrate the feature selection methods. This is a binary classification problem where all of the attributes are numeric.
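
As a rough illustration, the sketch below applies one method from each family to the Pima Indians dataset using scikit-learn: SelectKBest with the chi-squared test (filter), recursive feature elimination with logistic regression (wrapper), and SelectFromModel with an L1-penalized, LASSO-style logistic regression (embedded). The dataset URL and column names are assumptions based on a commonly used public copy of the data, not part of the original text.

```python
# A minimal sketch, assuming scikit-learn and pandas are installed and the
# Pima Indians diabetes data is available at the (assumed) URL below.
import pandas as pd
from sklearn.feature_selection import RFE, SelectFromModel, SelectKBest, chi2
from sklearn.linear_model import LogisticRegression

# Load the dataset; the URL and column names are assumptions based on the
# widely mirrored public copy of the Pima Indians diabetes data.
url = "https://raw.githubusercontent.com/jbrownlee/Datasets/master/pima-indians-diabetes.data.csv"
names = ["preg", "plas", "pres", "skin", "test", "mass", "pedi", "age", "class"]
data = pd.read_csv(url, names=names)
X, y = data.iloc[:, :-1], data.iloc[:, -1]

# Filter method: score each feature independently with the chi-squared test
# (valid here because all attributes are non-negative) and keep the top 4.
filter_selector = SelectKBest(score_func=chi2, k=4)
filter_selector.fit(X, y)
print("Filter (chi2):   ", list(X.columns[filter_selector.get_support()]))

# Wrapper method: recursive feature elimination repeatedly fits a logistic
# regression model and discards the weakest feature until 4 remain.
wrapper_selector = RFE(estimator=LogisticRegression(solver="liblinear"),
                       n_features_to_select=4)
wrapper_selector.fit(X, y)
print("Wrapper (RFE):   ", list(X.columns[wrapper_selector.get_support()]))

# Embedded method: an L1-penalized (LASSO-style) logistic regression drives
# some coefficients to zero during training; keep the features that survive.
l1_model = LogisticRegression(penalty="l1", C=0.1, solver="liblinear")
embedded_selector = SelectFromModel(l1_model)
embedded_selector.fit(X, y)
print("Embedded (L1):   ", list(X.columns[embedded_selector.get_support()]))
```

Each selector exposes get_support(), so the feature subsets chosen by the three approaches can be compared directly on the same data.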