Feature selection
Filter Methods: assess the relevance of features by looking only at the intrinsic properties of the data.
•Principal Component Analysis
•Independent Component Analysis
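A minimal sketch of the two transforms named above, assuming scikit-learn; consistent with the filter definition, both look only at the data matrix itself (the labels play no role):

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA, FastICA

# Only the data matrix X is used; the labels y are ignored by both transforms.
X, y = make_classification(n_samples=200, n_features=10, n_informative=4,
                           random_state=0)
X_pca = PCA(n_components=3).fit_transform(X)                      # top-3 principal components
X_ica = FastICA(n_components=3, random_state=0).fit_transform(X)  # 3 independent components
print(X_pca.shape, X_ica.shape)                                   # (200, 3) (200, 3)
```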
Wrapper Methods: embed the model hypothesis search within the feature subset search.
•Deterministic
•Randomized
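A minimal sketch of a deterministic wrapper, assuming scikit-learn; RFE (recursive feature elimination) is used here only as an illustration of wrapping the model fit inside the subset search, not as the specific algorithm the bullets refer to. A randomized wrapper would instead sample candidate subsets (e.g. with a genetic algorithm).

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=300, n_features=20, n_informative=5,
                           random_state=0)
# RFE repeatedly fits the wrapped model and drops the weakest features,
# so the feature-subset search is driven by the model itself.
rfe = RFE(estimator=LogisticRegression(max_iter=1000), n_features_to_select=5)
rfe.fit(X, y)
print("selected feature indices:",
      [i for i, keep in enumerate(rfe.support_) if keep])
```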
Embedded Methods: assess feature relevance using the learning classifier itself.
•The MDA (mean decrease in accuracy) error estimation embedded in the RF classifier
•The feature selection algorithm embedded in Support Vector Machines
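A minimal sketch of embedded selection, assuming scikit-learn; an L1-penalized linear SVM is shown as one concrete example (not necessarily the exact algorithm the bullet refers to): the penalty zeroes out coefficients of irrelevant features during training, so selection is part of fitting the classifier.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectFromModel
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=300, n_features=20, n_informative=5,
                           random_state=0)
# The L1 penalty drives the coefficients of irrelevant features to zero
# while the SVM is trained, so selection happens inside the classifier.
selector = SelectFromModel(LinearSVC(penalty="l1", dual=False, C=0.1,
                                     max_iter=5000))
selector.fit(X, y)
print("kept", selector.get_support().sum(), "of", X.shape[1], "features")
```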
Feature importance
Feature importance can be estimated with the random forest algorithm, in two ways:
- permutation importance: permute feature m, then let each tree re-predict its OOB (out-of-bag) samples; the count of correctly predicted samples before vs. after the permutation differs by some Δ. The mean of Δ over all trees is taken as the importance of feature m (see the sketch after this list).
- gini importance: the Gini impurity decrease produced by each tree's splits on feature m, accumulated over the forest (usage sketch below).
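A minimal sketch of the OOB permutation procedure described above. It assumes a small hand-rolled bagged forest of scikit-learn decision trees so that each tree's OOB indices are known exactly; the forest construction is illustrative, not RandomForestClassifier's internals.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

rng = np.random.RandomState(0)
X, y = make_classification(n_samples=500, n_features=8, n_informative=3,
                           random_state=0)
n_samples, n_features = X.shape
n_trees = 100

trees, oob_masks = [], []
for _ in range(n_trees):
    # Bootstrap sample; the samples never drawn are this tree's OOB set.
    idx = rng.randint(0, n_samples, n_samples)
    oob = np.ones(n_samples, dtype=bool)
    oob[idx] = False
    tree = DecisionTreeClassifier(max_features="sqrt", random_state=rng)
    tree.fit(X[idx], y[idx])
    trees.append(tree)
    oob_masks.append(oob)

# Importance of feature m: mean over trees of
# (correct OOB predictions before permuting m) - (correct after permuting m).
perm_importance = np.zeros(n_features)
for m in range(n_features):
    deltas = []
    for tree, oob in zip(trees, oob_masks):
        X_oob = X[oob]
        correct_before = np.sum(tree.predict(X_oob) == y[oob])
        X_perm = X_oob.copy()
        X_perm[:, m] = rng.permutation(X_perm[:, m])
        correct_after = np.sum(tree.predict(X_perm) == y[oob])
        deltas.append(correct_before - correct_after)
    perm_importance[m] = np.mean(deltas)

print("permutation importance (mean Δ of correct OOB predictions):",
      perm_importance.round(2))
```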
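For the Gini importance, scikit-learn's RandomForestClassifier exposes it directly as feature_importances_ (the impurity decrease from splits on each feature, averaged over the trees); a minimal usage sketch:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=8, n_informative=3,
                           random_state=0)
rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
# Impurity-based (Gini) importance, normalized to sum to 1.
print("gini importance:", rf.feature_importances_.round(3))
```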