But how exactly is this variable importance computed?
http://scikit-learn.org/stable/modules/tree.html explains how the Gini impurity and information entropy are computed.
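
To make the two impurity measures concrete, here is a minimal sketch in plain Python. The function names `gini` and `entropy` are my own for illustration and are not scikit-learn APIs. It assumes the input is simply the list of class labels that reach a node.

```python
import math
from collections import Counter


def gini(labels):
    """Gini impurity of a node: 1 - sum_k p_k^2."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def entropy(labels):
    """Information entropy of a node in bits: -sum_k p_k * log2(p_k)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((count / n) * math.log2(count / n)
                for count in Counter(labels).values())


if __name__ == "__main__":
    mixed = [0, 0, 1, 1]   # evenly mixed node, the most impure two-class case
    pure = [1, 1, 1]       # pure node, impurity is zero under both measures
    print(gini(mixed), entropy(mixed))
    print(gini(pure), entropy(pure))
```

Both measures are zero for a pure node and largest when the classes are evenly mixed, which is why a split that separates the classes well produces a large drop in impurity.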

http://stackoverflow.com/questions/15810339/how-are-feature-importances-in-randomforestclassifier-determined discusses how these importances are determined.

Following that discussion, I found this paper: http://papers.nips.cc/paper/4928-understanding-variable-importances-in-forests-of-randomized-trees.pdf

A worked example is given here: http://stats.stackexchange.com/questions/92419/relative-importance-of-a-set-of-predictors-in-a-random-forests-classification-in
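
As a rough sketch of the impurity-based importance (often called mean decrease in impurity), the snippet below sums, for each feature, the weighted impurity decrease over every node that splits on that feature, then normalizes the totals to sum to 1. The tree is a hand-made toy, the feature names `x1` and `x2` are hypothetical, and the helper names are mine rather than library functions. In a forest, the per-tree values would additionally be averaged over all trees.

```python
from collections import Counter, defaultdict


def gini(labels):
    """Gini impurity of a node: 1 - sum_k p_k^2."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def impurity_decrease(parent, left, right, n_total):
    """Weighted impurity decrease of one split.

    (N_t / N) * (impurity(parent) - weighted impurity of the two children)
    """
    n = len(parent)
    children = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
    return (n / n_total) * (gini(parent) - children)


def feature_importances(splits, n_total):
    """Sum the decreases per feature and normalize so they add up to 1."""
    totals = defaultdict(float)
    for feature, parent, left, right in splits:
        totals[feature] += impurity_decrease(parent, left, right, n_total)
    norm = sum(totals.values())
    if norm == 0:
        return dict(totals)
    return {feature: value / norm for feature, value in totals.items()}


# Toy tree on 8 samples: the root splits on x1, both children split on x2.
# Each entry is (feature, labels at the node, labels sent left, labels sent right).
toy_splits = [
    ("x1", [0, 0, 0, 0, 1, 1, 1, 1], [0, 0, 0, 1], [0, 1, 1, 1]),
    ("x2", [0, 0, 0, 1], [0, 0, 0], [1]),
    ("x2", [0, 1, 1, 1], [0], [1, 1, 1]),
]

if __name__ == "__main__":
    print(feature_importances(toy_splits, n_total=8))
```

In this toy tree the root split on `x1` reduces the Gini impurity by 0.125, while each of the two `x2` splits contributes 0.1875, so the normalized importances come out as 0.25 for `x1` and 0.75 for `x2`.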

Now it finally makes sense.
Here is one more example:
http://blog.datadive.net/selecting-good-features-part-iii-random-forests/