An improved random forest algorithm based on decision tree clustering reduction
  
DOI:
Keywords: random forest; classification accuracy; similarity; clustering
Funding:
Author affiliations
WANG Cheng, College of Telecommunications and Information Engineering, Nanjing University of Posts and Telecommunications, Nanjing 210003, China
WANG Kai, College of Telecommunications and Information Engineering, Nanjing University of Posts and Telecommunications, Nanjing 210003, China
Abstract:
      To improve classification accuracy, the traditional random forest algorithm often has to build a large number of decision tree models. Owing to the complexity of the training data set and the randomness introduced while the forest is being constructed, some of the trees produced during training have poor classification performance or are highly similar to one another, which degrades the overall classification performance of the model. To address this problem, an improved random forest algorithm based on decision tree clustering, Trees Clustering Random Forest (TCRF), is proposed; it removes unqualified decision trees from the perspectives of classification accuracy and similarity. A relatively high-accuracy sub-forest is first extracted from the original forest according to the AUC values of the individual trees, the sub-forest is then clustered with a distance measure based on the Kappa statistic, and representative trees are selected from the resulting clusters to form a forest with high accuracy and low similarity. Experimental results show that the improved algorithm outperforms the traditional random forest algorithm in both ensemble accuracy and classification efficiency.
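A minimal sketch of the reduction pipeline described in the abstract is given below, assuming scikit-learn and SciPy rather than the authors' own implementation: the base forest is a RandomForestClassifier, the per-tree accuracy filter uses roc_auc_score on a held-out validation split, the pairwise tree distance is 1 minus Cohen's Kappa between the trees' predictions, and average-linkage hierarchical clustering stands in for the paper's clustering step; the AUC cut-off and the number of clusters are illustrative parameters, not values from the paper.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score, cohen_kappa_score
from sklearn.model_selection import train_test_split
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

# Toy binary classification data; the held-out split plays the role of the
# validation set used to score individual trees and measure their agreement.
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.3, random_state=0)

forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)

# Step 1: keep the relatively accurate trees, ranked by per-tree AUC.
aucs = np.array([roc_auc_score(y_val, t.predict_proba(X_val)[:, 1])
                 for t in forest.estimators_])
auc_threshold = np.median(aucs)          # illustrative cut-off, not from the paper
keep = aucs >= auc_threshold
sub_forest = [t for t, k in zip(forest.estimators_, keep) if k]
sub_aucs = aucs[keep]

# Step 2: pairwise distance between trees as 1 - Kappa of their predictions
# (strong agreement => small distance).
preds = [t.predict(X_val) for t in sub_forest]
m = len(sub_forest)
dist = np.zeros((m, m))
for i in range(m):
    for j in range(i + 1, m):
        d = 1.0 - cohen_kappa_score(preds[i], preds[j])
        dist[i, j] = dist[j, i] = d

# Step 3: cluster the sub-forest and keep the highest-AUC tree of each cluster.
n_clusters = 20                          # illustrative number of retained trees
labels = fcluster(linkage(squareform(dist), method="average"),
                  t=n_clusters, criterion="maxclust")
representatives = []
for c in np.unique(labels):
    idx = np.where(labels == c)[0]
    representatives.append(sub_forest[idx[np.argmax(sub_aucs[idx])]])

# "representatives" is the reduced, high-accuracy, low-similarity forest;
# a final prediction is the majority vote of these trees.

Any clustering method over the Kappa-based distance matrix could replace the hierarchical clustering used here; the essential point is that trees with high mutual agreement fall into the same cluster, so keeping one tree per cluster lowers similarity while the AUC filter preserves accuracy.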