towardsdatascience.com/feature-selection-with-pandas-e3690ad8504b
Top Highlights
predicting the “MEDV” column
The filtering here is done using a correlation matrix, most commonly with Pearson correlation.
has a correlation above 0.5 (in absolute value) with the output variable.
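The thresholding step described in this highlight can be sketched roughly as follows. This is not the article's code: the data is synthetic, the column names RM, LSTAT and MEDV are only borrowed from the highlights, NOISE is a hypothetical extra column, and the coefficients are arbitrary assumptions chosen so that the effect is visible.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in data (an assumption; not the dataset used in the article).
rng = np.random.default_rng(0)
n = 500
rm = rng.normal(size=n)
lstat = -0.6 * rm + 0.8 * rng.normal(size=n)   # negatively related to RM
noise = rng.normal(size=n)                      # hypothetical unrelated feature
medv = 0.5 * rm - 0.8 * lstat + 0.5 * rng.normal(size=n)

df = pd.DataFrame({"RM": rm, "LSTAT": lstat, "NOISE": noise, "MEDV": medv})

# Pearson correlation matrix of all columns.
cor = df.corr(method="pearson")

# Absolute correlation of each feature with the output variable.
cor_target = cor["MEDV"].drop("MEDV").abs()

# Keep only the features whose absolute correlation exceeds 0.5.
relevant_features = cor_target[cor_target > 0.5]
print(relevant_features)
```

On this synthetic data, RM and LSTAT pass the 0.5 threshold and NOISE does not. The threshold itself is a judgment call rather than a fixed rule.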
If these variables are correlated with each other, then we need to keep only one of them and drop the rest.
The output of the above code shows that the variables RM and LSTAT are highly correlated with each other (-0.613808). Hence we keep only one of them and drop the other. We keep LSTAT, since its correlation with MEDV is higher than that of RM.
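The rule in this highlight (of two mutually correlated features, keep the one more strongly correlated with the target) could be automated along these lines. This is a minimal sketch under stated assumptions: `prune_correlated` is a hypothetical helper, not a pandas function or the article's code, the 0.5 pairwise threshold is an assumed value, and the data is synthetic with column names borrowed from the highlights.

```python
import numpy as np
import pandas as pd


def prune_correlated(df, target, pair_threshold=0.5):
    """Hypothetical helper: greedily keep features, strongest first,
    skipping any feature too correlated with one already kept."""
    cor = df.corr(method="pearson")
    target_cor = cor[target].drop(target).abs()
    kept = []
    # Visit features from most to least correlated with the target.
    for feature in target_cor.sort_values(ascending=False).index:
        if all(abs(cor.loc[feature, k]) <= pair_threshold for k in kept):
            kept.append(feature)
    return kept


# Synthetic stand-in data (an assumption; not the article's dataset).
rng = np.random.default_rng(0)
n = 2000
rm = rng.normal(size=n)
lstat = -0.6 * rm + 0.8 * rng.normal(size=n)   # correlated with RM (about -0.6)
ptratio = rng.normal(size=n)                    # independent of RM and LSTAT
medv = -lstat - 0.5 * ptratio + 0.5 * rng.normal(size=n)

df = pd.DataFrame({"RM": rm, "LSTAT": lstat, "PTRATIO": ptratio, "MEDV": medv})

kept = prune_correlated(df, target="MEDV")
print(kept)
```

Here RM is dropped because it is correlated with LSTAT, which has the stronger correlation with MEDV, while PTRATIO is kept because it is not correlated with LSTAT. A greedy pass like this is only one possible design and its result depends on the chosen threshold.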
The filter method is less accurate. It is useful during EDA, and it can also be used to check for multicollinearity in the data.
Wrapper and embedded methods give more accurate results, but because they are computationally expensive, these methods are best suited to cases where you have fewer features (~20).