Feature Selection

Table of Contents
  1. Feature importance

Feature selection

  • Filter Methods: assess the relevance of features by looking only at the intrinsic properties of the data.

    • Principal Component Analysis

    • Independent Component Analysis

  • Wrapper Methods: embed the model hypothesis search within the feature subset search.

    • Deterministic

    • Randomized

  • Embedded Methods: feature relevance is assessed using the learning classifier itself.

    • The MDA error estimation embedded in the RF classifier

    • The feature selection algorithm embedded in Support Vector Machines
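The three families can be sketched with scikit-learn. This is a minimal illustration; the particular estimators and the synthetic dataset are assumptions for demonstration, not taken from the text above:

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.feature_selection import RFE, SelectFromModel
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=200, n_features=10,
                           n_informative=3, random_state=0)

# Filter: PCA looks only at intrinsic properties of X, ignoring the label y.
X_pca = PCA(n_components=3).fit_transform(X)

# Wrapper: RFE repeatedly fits a model and drops the weakest features,
# wrapping the model search around the feature-subset search.
rfe = RFE(LogisticRegression(max_iter=1000), n_features_to_select=3).fit(X, y)

# Embedded: selection driven by the classifier's own importance scores.
sfm = SelectFromModel(RandomForestClassifier(random_state=0)).fit(X, y)

print(X_pca.shape)         # (200, 3)
print(rfe.support_.sum())  # 3 features kept
```

Note the trade-off: filter methods are cheap and model-agnostic, while wrapper and embedded methods are tied to (and tuned for) a specific classifier.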

Feature importance

Feature importance can be estimated with the random forest algorithm:

  • Permutation importance: shuffle feature m, then have each tree re-predict its out-of-bag (OOB) samples. Shuffling changes the number of correctly predicted samples by some Δ; the mean of Δ across all trees is taken as the importance of feature m.
  • Gini importance: the total Gini decrease brought by each tree's splits on feature m.
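Both scores are available in scikit-learn: `feature_importances_` on a fitted forest gives the Gini (mean-decrease-in-impurity) importance, and `sklearn.inspection.permutation_importance` implements the shuffling scheme. One caveat: scikit-learn's permutation importance scores a dataset you pass in (here a held-out split) rather than each tree's OOB samples, so this is an approximation of the OOB procedure described above. The dataset below is a synthetic stand-in:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=300, n_features=6,
                           n_informative=2, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)

# Gini importance (MDI): impurity decrease at splits on each feature,
# averaged over trees and normalized to sum to 1.
gini_imp = rf.feature_importances_

# Permutation importance: shuffle one feature at a time and measure the
# drop in accuracy (on a held-out set here, not the OOB samples).
perm = permutation_importance(rf, X_te, y_te, n_repeats=10, random_state=0)
perm_imp = perm.importances_mean
```

Permutation importance is generally preferred when features have many categories or very different scales, since Gini importance is known to be biased toward high-cardinality features.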