I. Overview of Ensemble Learning

  • 1.Ensemble methods, also called meta-algorithms, are ways of combining other algorithms; this post focuses mainly on the AdaBoost meta-algorithm. Combining different classifiers in this way yields what is called an ensemble method (meta-algorithm). An ensemble can take many forms, for example (a minimal sketch of the first form follows this list):

    • combining different algorithms
    • combining the same algorithm under different settings
    • assigning different parts of the dataset to different classifiers and combining them
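
    As a minimal sketch of the first form (assuming scikit-learn is available; the dataset and the three estimator choices are arbitrary illustrations, not part of the original post), three different algorithms can be combined by majority vote:

from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=200, random_state=0)
ensemble = VotingClassifier(
    estimators=[("lr", LogisticRegression(max_iter=1000)),
                ("dt", DecisionTreeClassifier()),
                ("knn", KNeighborsClassifier())],
    voting="hard")               # hard voting = majority vote over the three models
ensemble.fit(X, y)
print(ensemble.predict(X[:5]))   # ensemble prediction for the first five samples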
  • 2.Strengths and weaknesses of AdaBoost
    • Strengths: low generalization error; easy to implement; works with most classifiers; no parameters to tune
    • Weaknesses: sensitive to outliers
    • Applicable data types: numeric and nominal values

II. Classifiers Based on Resampling the Dataset

  • 1.Bagging (bootstrap aggregating)

    • Main idea:

      • (1). Draw new training sets from the original dataset. Each draw samples n points with replacement (so some points may appear several times in a new set while others never appear at all). Repeat k times to obtain k new datasets, each the same size as the original; the k bootstrap sets are drawn independently of one another.
      • (2). Train one model on each new training set, giving k models in total.
      • (3). For classification, combine the k models from (2) by majority vote; for regression, average the outputs of the models from (2). All models carry the same weight. (A minimal sketch of these three steps follows this list.)
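
      The three steps map directly onto code. A minimal hand-rolled sketch (bagging_predict is our own illustrative name; we assume numpy arrays, ±1 labels, and scikit-learn decision trees as the base models):

import numpy as np
from sklearn.tree import DecisionTreeClassifier

def bagging_predict(X_train, y_train, X_test, k=11, seed=0):
    rng = np.random.default_rng(seed)
    n = len(X_train)
    votes = []
    for _ in range(k):
        idx = rng.integers(0, n, size=n)   # (1) draw n samples with replacement
        model = DecisionTreeClassifier().fit(X_train[idx], y_train[idx])  # (2) one model per bootstrap set
        votes.append(model.predict(X_test))
    return np.sign(np.sum(votes, axis=0))  # (3) majority vote, all models weighted equally (odd k avoids ties)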
  • 2.Boosting
    • Whether in bagging or boosting, the multiple classifiers used are all of the same type. In boosting, however, the classifiers are trained sequentially: each new classifier is trained in light of the performance of the classifiers already trained. Boosting obtains new classifiers by concentrating on the data that the existing classifiers misclassify. Boosting comes in many versions; the most popular one, the AdaBoost algorithm, is introduced below.
    • Main idea:
      • (1). Every training sample is assigned a weight, and the weight distribution of each round depends on the previous round's classification results.
      • (2). The base classifiers are combined sequentially as a weighted linear sum (the combination rule is written out after this list).
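
      Concretely, if h_t(x) is the t-th base classifier and α_t its weight, the boosted classifier predicts with the sign of the weighted sum

          H(x) = sign( α_1·h_1(x) + α_2·h_2(x) + … + α_T·h_T(x) )

      which is exactly what the AdaBoost code in section IV accumulates in aggClassEst.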

III. Bagging vs. Boosting

  • 1.Sample selection:

    • bagging: each new training set is drawn from the original training set by sampling with replacement; the new training sets are independent of one another.
    • boosting: the training set itself never changes; only the weight of each sample changes from round to round, and the weights are adjusted according to the previous round's classification results.
  • 2.Sample weights:
    • bagging: samples are drawn uniformly, so every sample carries the same weight.
    • boosting: sample weights are adjusted round after round according to the error; the more a sample is misclassified, the larger its weight becomes.
  • 3.Prediction functions:
    • bagging: all prediction functions carry equal weight.
    • boosting: each weak classifier has its own weight, and classifiers with smaller classification error receive larger weights.
  • 4.Parallel computation:
    • bagging: the individual prediction functions can be computed in parallel.
    • boosting: the prediction functions can only be generated sequentially, because each model needs the output of the model before it.

IV. Common Applications of Ensemble Learning

  • 1.Common algorithms (a hedged scikit-learn sketch follows this list)

    • Bagging + decision trees = random forest
    • AdaBoost + decision trees = boosted trees
    • Gradient Boosting + decision trees = GBDT
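
    A hedged sketch of how the three combinations look in practice (the estimator classes below are real scikit-learn APIs; the hyperparameters are arbitrary):

from sklearn.ensemble import (AdaBoostClassifier, GradientBoostingClassifier,
                              RandomForestClassifier)
from sklearn.tree import DecisionTreeClassifier

rf = RandomForestClassifier(n_estimators=100)        # bagging + trees (+ random feature subsets)
ada = AdaBoostClassifier(DecisionTreeClassifier(max_depth=1),
                         n_estimators=50)            # AdaBoost + tree stumps = boosted trees
gbdt = GradientBoostingClassifier(n_estimators=100)  # gradient boosting + trees = GBDT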
  • 2.Improving classifier performance based on the error rate (how AdaBoost works)
    • 2.1 Introduction to the AdaBoost algorithm
      The idea of ensemble learning here: build a strong classifier out of multiple weak classifiers trained on the samples. AdaBoost is short for adaptive boosting, and it runs as follows. First, every sample in the training set is assigned a weight; these weights form a vector D and are all equal at the start. A weak classifier is then trained on the data and its error rate is computed. Next, another weak classifier is trained on the same data, but with the sample weights readjusted: samples the first classifier classified correctly have their weights lowered, and samples it misclassified have their weights raised. Finally, to combine all the weak classifiers into one result, AdaBoost assigns each weak classifier a weight alpha, computed from that classifier's error rate.
    • 2.3 The error rate ε is defined as the fraction of samples that are misclassified:

        ε = (number of misclassified samples) / (total number of samples)
    • 2.3 alpha is computed from the error rate as:

        α = (1/2) · ln((1 − ε) / ε)
    • 2.4 The AdaBoost training flow
    • 2.5 Step by step, the flow works like this:
      • First, every sample in the training set is given an initial weight; at this stage all the weights are equal and together they form the weight vector D. After the first weak classifier is trained, the weights of the training samples change: the classifier's error rate ε is computed from its classification results, and alpha is then computed from ε. With alpha known, the weight vector D can be updated so that samples the first classifier misclassified have their weights raised, while correctly classified samples have their weights lowered. D is updated as follows:

        • 2.5.1 If a sample was classified correctly by the first weak classifier, its weight is decreased:

            D_i^(t+1) = D_i^(t) · e^(−α) / Sum(D)

        • 2.5.2 If a sample was misclassified by the first weak classifier, its weight is increased:

            D_i^(t+1) = D_i^(t) · e^(α) / Sum(D)

      • After D has been recomputed, AdaBoost moves on to the next round, repeatedly training and reweighting until the training error reaches 0 or the number of weak classifiers reaches the user-specified limit.
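
      • A worked example of one round (numbers taken from the run in section 3 below): the first stump misclassifies 1 of the 5 samples, so ε = 1/5 = 0.2 and α = 0.5·ln((1 − 0.2)/0.2) = 0.5·ln 4 = ln 2 ≈ 0.6931 (the ±0.69314718 that appears in aggClassEst in the output). All five weights start at 0.2; the misclassified sample's weight becomes 0.2·e^α = 0.4, each correctly classified sample's becomes 0.2·e^(−α) = 0.1, and dividing by the sum 0.8 gives D = (0.5, 0.125, 0.125, 0.125, 0.125). That is why the best weighted error in the second round of the log is 0.125.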
  • 3. AdaBoost in practice (building weak classifiers from decision stumps)
  • 3.1 As the scatter plot drawn by the code below shows, it is impossible to pick a single value on either axis (that is, a single line parallel to a coordinate axis) that separates all the blue points from all the orange points. This is a well-known case that a single decision stump cannot handle. By combining several decision stumps, though, we can build a classifier that classifies this dataset perfectly.
################# Dataset visualization #################
import numpy as np
import matplotlib.pyplot as plt


def loadSimData():
    """Create the dataset for the decision stump."""
    dataMat = np.matrix([[1., 2.1],
                         [1.5, 1.6],
                         [1.3, 1.],
                         [1., 1.],
                         [2., 1.]])
    classLabels = [1.0, 1.0, -1.0, -1.0, 1.0]
    return dataMat, classLabels


def showDataSet(dataMat, labelMat):
    """Visualize the dataset."""
    data_plus = []   # positive samples
    data_minus = []  # negative samples
    for i in range(len(dataMat)):
        if labelMat[i] > 0:
            data_plus.append(dataMat[i])
        else:
            data_minus.append(dataMat[i])
    data_plus_np = np.array(data_plus)    # convert to numpy arrays
    data_minus_np = np.array(data_minus)
    plt.scatter(np.transpose(data_plus_np)[0], np.transpose(data_plus_np)[1])
    plt.scatter(np.transpose(data_minus_np)[0], np.transpose(data_minus_np)[1])
    plt.title("Dataset Visualize")
    plt.xlabel("x1")
    plt.ylabel("x2")
    plt.show()


if __name__ == '__main__':
    data_Arr, classLabels = loadSimData()
    showDataSet(data_Arr, classLabels)

  • 3.2 Points above the blue horizontal line form one class and points below it form the other. Clearly one blue point is then misclassified, giving a classification error rate of 1/5 = 0.2. Where that horizontal line crosses the y axis is the threshold we set; by varying the threshold we look for the value that minimizes the stump's classification error. A vertical line works the same way: find the threshold that classifies best, and you have found the best decision stump.
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# @Date    : 2019-05-12 21:31:41
# @Author  : cdl (1217096231@qq.com)
# @Link    : https://github.com/cdlwhm1217096231/python3_spider
# @Version : $Id$

import numpy as np
import matplotlib.pyplot as plt


# Dataset visualization
def loadSimpleData():
    dataMat = np.matrix([[1., 2.1],
                         [1.5, 1.6],
                         [1.3, 1.],
                         [1., 1.],
                         [2., 1.]])
    classLabels = [1.0, 1.0, -1.0, -1.0, 1.0]
    return dataMat, classLabels


def showDataSet(dataMat, labelMat):
    data_plus = []
    data_minus = []
    for i in range(len(dataMat)):
        if labelMat[i] > 0:
            data_plus.append(dataMat[i])
        else:
            data_minus.append(dataMat[i])
    data_plus_np = np.array(data_plus)
    data_minus_np = np.array(data_minus)
    plt.scatter(np.transpose(data_plus_np)[0], np.transpose(data_plus_np)[1])
    plt.scatter(np.transpose(data_minus_np)[0], np.transpose(data_minus_np)[1])
    plt.title("dataset visualize")
    plt.xlabel("x")
    plt.ylabel("y")
    plt.show()


# Build the decision-stump classification function
def stumpClassify(dataMat, dimen, threshval, threshIneq):
    """
    dataMat: data matrix
    dimen: column index, i.e. which feature to split on
    threshval: threshold
    threshIneq: inequality flag ("lt" or "gt")
    Returns retArray: the classification result
    """
    retArray = np.ones((np.shape(dataMat)[0], 1))        # initialize retArray to all 1
    if threshIneq == "lt":
        retArray[dataMat[:, dimen] <= threshval] = -1.0  # samples at or below the threshold get -1
    else:
        retArray[dataMat[:, dimen] > threshval] = -1.0   # samples above the threshold get -1
    return retArray
# Find the best decision stump on the dataset. A decision stump considers only a single
# feature and classifies by thresholding it, so we just search for the threshold with the
# lowest classification error. In this post's example, taking the first feature with
# threshold 1.3, labeling samples > 1.3 as -1 and samples < 1.3 as +1, yields a binary classifier.
def buildStump(dataMat, classLabels, D):
    """
    dataMat: data matrix
    classLabels: data labels
    D: sample weight vector
    Returns: bestStump: info about the best decision stump;
             minError: the minimum weighted error;
             bestClasEst: the best classification result
    """
    dataMat = np.matrix(dataMat)
    labelMat = np.matrix(classLabels).T
    m, n = np.shape(dataMat)
    numSteps = 10.0
    bestStump = {}                          # dict storing the best stump's parameters
    bestClasEst = np.mat(np.zeros((m, 1)))  # best classification result
    minError = float("inf")
    for i in range(n):                      # iterate over all features
        rangeMin = dataMat[:, i].min()
        rangeMax = dataMat[:, i].max()
        stepSize = (rangeMax - rangeMin) / numSteps  # step size of the threshold search
        for j in range(-1, int(numSteps) + 1):
            for inequal in ["lt", "gt"]:
                threshval = (rangeMin + float(j) * stepSize)  # candidate threshold
                predictVals = stumpClassify(dataMat, i, threshval, inequal)  # classify with this stump
                errArr = np.mat(np.ones((m, 1)))     # error indicator vector
                errArr[predictVals == labelMat] = 0  # 0 where the prediction is correct
                # The classifier is evaluated with the weight vector D rather than a plain error
                # count, so the same stump can score differently as the weights change.
                weightedError = D.T * errArr         # weighted error rate of this weak classifier
                print("split: dim %d, thresh %.2f, thresh ineqal: %s, the weighted error is %.3f" % (i, threshval, inequal, weightedError))
                if weightedError < minError:
                    minError = weightedError
                    bestClasEst = predictVals.copy()
                    bestStump["dim"] = i
                    bestStump["thresh"] = threshval
                    bestStump["ineq"] = inequal
    return bestStump, minError, bestClasEst
  • 3.3 By iterating over the candidate thresholds and computing the classification error of each split, we find the split with the smallest error, and that is the best decision stump. Here lt means less than: samples at or below the threshold are assigned -1; gt means greater than: samples above the threshold are assigned -1. The search finds that the best decision stump has a minimum weighted classification error of 0.2; for this dataset, no single decision stump can do better than 0.2. This is our trained weak classifier. Next we use the AdaBoost algorithm to boost the classifier's performance and drive the classification error down to 0; let's see how AdaBoost achieves this. (A short usage check of buildStump follows.)
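
A hedged usage check (assuming the functions above are in scope; the expected values are read off the log under 3.5):

dataMat, classLabels = loadSimpleData()
D = np.mat(np.ones((5, 1)) / 5.0)   # uniform initial sample weights
bestStump, minError, bestClasEst = buildStump(dataMat, classLabels, D)
print(bestStump, minError)  # expect {'dim': 0, 'thresh': 1.3, 'ineq': 'lt'} with weighted error 0.2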
# Boost the weak classifiers with the AdaBoost algorithm
def adbBoostTrainDS(dataMat, classLabels, numIt=40):
    """
    dataMat: data matrix
    classLabels: label matrix
    numIt: maximum number of iterations
    Returns: weakClassArr: the trained weak classifiers; aggClassEst: accumulated class estimates
    """
    weakClassArr = []
    m = np.shape(dataMat)[0]
    D = np.mat(np.ones((m, 1)) / m)         # initialize the sample weights D uniformly
    aggClassEst = np.mat(np.zeros((m, 1)))
    for i in range(numIt):
        bestStump, error, classEst = buildStump(dataMat, classLabels, D)  # build one decision stump
        # Compute the weak classifier's weight alpha; max(error, 1e-16) keeps the denominator nonzero
        alpha = float(0.5 * np.log((1.0 - error) / max(error, 1e-16)))
        bestStump["alpha"] = alpha          # store this weak classifier's weight alpha
        weakClassArr.append(bestStump)      # store the decision stump
        print("classEst: ", classEst.T)
        expon = np.multiply(-1 * alpha * np.mat(classLabels).T, classEst)  # exponent for updating D
        D = np.multiply(D, np.exp(expon))
        D = D / D.sum()
        # Compute the AdaBoost training error; stop early once it reaches 0
        aggClassEst += alpha * classEst     # accumulated estimate over every weak classifier trained so far
        print("aggClassEst: ", aggClassEst.T)
        aggErrors = np.multiply(np.sign(aggClassEst) != np.mat(classLabels).T,
                                np.ones((m, 1)))  # errors of the current ensemble
        errorRate = aggErrors.sum() / m     # ensemble error rate; training stops when it hits 0
        print("total error: ", errorRate)
        if errorRate == 0.0:
            break
    return weakClassArr, aggClassEst
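
Training on the toy dataset is a single call; a hedged sketch of its use (dataMat and classLabels from loadSimpleData), with the outcome read off the log under 3.5:

weakClassArr, aggClassEst = adbBoostTrainDS(dataMat, classLabels)
# The run under 3.5 stops after 3 stumps; their alphas are roughly
# 0.6931 (error 0.2), 0.9730 (error 0.125) and 0.8958 (error 0.143),
# and the total training error goes 0.2 -> 0.2 -> 0.0.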
  • 3.4 Classifying new data with the boosted strong classifier
# AdaBoost classification function
def adaClassify(dataToClass, classifier):
    """
    dataToClass: samples to classify
    classifier: the trained strong classifier (list of weak classifiers)
    """
    dataMat = np.mat(dataToClass)
    m = np.shape(dataMat)[0]
    aggClassEst = np.mat(np.zeros((m, 1)))
    for i in range(len(classifier)):   # run every weak classifier and accumulate its weighted vote
        classEst = stumpClassify(dataMat, classifier[i]["dim"],
                                 classifier[i]["thresh"], classifier[i]["ineq"])
        aggClassEst += classifier[i]["alpha"] * classEst
        print(aggClassEst)
    return np.sign(aggClassEst)
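
Applied to the two test points [0, 0] and [5, 5] in the driver below, the accumulated score aggClassEst moves from ±0.693 to ±1.666 to ±2.562 as each stump casts its weighted vote, and the final signs give the predictions -1 and +1 (see the last lines of the output).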
  • 3.5 The complete AdaBoost program is the script header, loadSimpleData, showDataSet, stumpClassify, buildStump, adbBoostTrainDS and adaClassify exactly as listed above, plus the following driver:

if __name__ == "__main__":
    dataMat, classLabels = loadSimpleData()
    showDataSet(dataMat, classLabels)
    weakClassArr, aggClassEst = adbBoostTrainDS(dataMat, classLabels)
    print(adaClassify([[0, 0], [5, 5]], weakClassArr))
  • The results are as follows:

split: dim 0, thresh 0.90, thresh ineqal: lt, the weighted error is 0.400
split: dim 0, thresh 0.90, thresh ineqal: gt, the weighted error is 0.400
split: dim 0, thresh 1.00, thresh ineqal: lt, the weighted error is 0.400
split: dim 0, thresh 1.00, thresh ineqal: gt, the weighted error is 0.400
split: dim 0, thresh 1.10, thresh ineqal: lt, the weighted error is 0.400
split: dim 0, thresh 1.10, thresh ineqal: gt, the weighted error is 0.400
split: dim 0, thresh 1.20, thresh ineqal: lt, the weighted error is 0.400
split: dim 0, thresh 1.20, thresh ineqal: gt, the weighted error is 0.400
split: dim 0, thresh 1.30, thresh ineqal: lt, the weighted error is 0.200
split: dim 0, thresh 1.30, thresh ineqal: gt, the weighted error is 0.400
split: dim 0, thresh 1.40, thresh ineqal: lt, the weighted error is 0.200
split: dim 0, thresh 1.40, thresh ineqal: gt, the weighted error is 0.400
split: dim 0, thresh 1.50, thresh ineqal: lt, the weighted error is 0.400
split: dim 0, thresh 1.50, thresh ineqal: gt, the weighted error is 0.400
split: dim 0, thresh 1.60, thresh ineqal: lt, the weighted error is 0.400
split: dim 0, thresh 1.60, thresh ineqal: gt, the weighted error is 0.400
split: dim 0, thresh 1.70, thresh ineqal: lt, the weighted error is 0.400
split: dim 0, thresh 1.70, thresh ineqal: gt, the weighted error is 0.400
split: dim 0, thresh 1.80, thresh ineqal: lt, the weighted error is 0.400
split: dim 0, thresh 1.80, thresh ineqal: gt, the weighted error is 0.400
split: dim 0, thresh 1.90, thresh ineqal: lt, the weighted error is 0.400
split: dim 0, thresh 1.90, thresh ineqal: gt, the weighted error is 0.400
split: dim 0, thresh 2.00, thresh ineqal: lt, the weighted error is 0.600
split: dim 0, thresh 2.00, thresh ineqal: gt, the weighted error is 0.400
split: dim 1, thresh 0.89, thresh ineqal: lt, the weighted error is 0.400
split: dim 1, thresh 0.89, thresh ineqal: gt, the weighted error is 0.400
split: dim 1, thresh 1.00, thresh ineqal: lt, the weighted error is 0.200
split: dim 1, thresh 1.00, thresh ineqal: gt, the weighted error is 0.400
split: dim 1, thresh 1.11, thresh ineqal: lt, the weighted error is 0.200
split: dim 1, thresh 1.11, thresh ineqal: gt, the weighted error is 0.400
split: dim 1, thresh 1.22, thresh ineqal: lt, the weighted error is 0.200
split: dim 1, thresh 1.22, thresh ineqal: gt, the weighted error is 0.400
split: dim 1, thresh 1.33, thresh ineqal: lt, the weighted error is 0.200
split: dim 1, thresh 1.33, thresh ineqal: gt, the weighted error is 0.400
split: dim 1, thresh 1.44, thresh ineqal: lt, the weighted error is 0.200
split: dim 1, thresh 1.44, thresh ineqal: gt, the weighted error is 0.400
split: dim 1, thresh 1.55, thresh ineqal: lt, the weighted error is 0.200
split: dim 1, thresh 1.55, thresh ineqal: gt, the weighted error is 0.400
split: dim 1, thresh 1.66, thresh ineqal: lt, the weighted error is 0.400
split: dim 1, thresh 1.66, thresh ineqal: gt, the weighted error is 0.400
split: dim 1, thresh 1.77, thresh ineqal: lt, the weighted error is 0.400
split: dim 1, thresh 1.77, thresh ineqal: gt, the weighted error is 0.400
split: dim 1, thresh 1.88, thresh ineqal: lt, the weighted error is 0.400
split: dim 1, thresh 1.88, thresh ineqal: gt, the weighted error is 0.400
split: dim 1, thresh 1.99, thresh ineqal: lt, the weighted error is 0.400
split: dim 1, thresh 1.99, thresh ineqal: gt, the weighted error is 0.400
split: dim 1, thresh 2.10, thresh ineqal: lt, the weighted error is 0.600
split: dim 1, thresh 2.10, thresh ineqal: gt, the weighted error is 0.400
classEst:  [[-1.  1. -1. -1.  1.]]
aggClassEst:  [[-0.69314718  0.69314718 -0.69314718 -0.69314718  0.69314718]]
total error:  0.2
split: dim 0, thresh 0.90, thresh ineqal: lt, the weighted error is 0.250
split: dim 0, thresh 0.90, thresh ineqal: gt, the weighted error is 0.250
split: dim 0, thresh 1.00, thresh ineqal: lt, the weighted error is 0.625
split: dim 0, thresh 1.00, thresh ineqal: gt, the weighted error is 0.250
split: dim 0, thresh 1.10, thresh ineqal: lt, the weighted error is 0.625
split: dim 0, thresh 1.10, thresh ineqal: gt, the weighted error is 0.250
split: dim 0, thresh 1.20, thresh ineqal: lt, the weighted error is 0.625
split: dim 0, thresh 1.20, thresh ineqal: gt, the weighted error is 0.250
split: dim 0, thresh 1.30, thresh ineqal: lt, the weighted error is 0.500
split: dim 0, thresh 1.30, thresh ineqal: gt, the weighted error is 0.250
split: dim 0, thresh 1.40, thresh ineqal: lt, the weighted error is 0.500
split: dim 0, thresh 1.40, thresh ineqal: gt, the weighted error is 0.250
split: dim 0, thresh 1.50, thresh ineqal: lt, the weighted error is 0.625
split: dim 0, thresh 1.50, thresh ineqal: gt, the weighted error is 0.250
split: dim 0, thresh 1.60, thresh ineqal: lt, the weighted error is 0.625
split: dim 0, thresh 1.60, thresh ineqal: gt, the weighted error is 0.250
split: dim 0, thresh 1.70, thresh ineqal: lt, the weighted error is 0.625
split: dim 0, thresh 1.70, thresh ineqal: gt, the weighted error is 0.250
split: dim 0, thresh 1.80, thresh ineqal: lt, the weighted error is 0.625
split: dim 0, thresh 1.80, thresh ineqal: gt, the weighted error is 0.250
split: dim 0, thresh 1.90, thresh ineqal: lt, the weighted error is 0.625
split: dim 0, thresh 1.90, thresh ineqal: gt, the weighted error is 0.250
split: dim 0, thresh 2.00, thresh ineqal: lt, the weighted error is 0.750
split: dim 0, thresh 2.00, thresh ineqal: gt, the weighted error is 0.250
split: dim 1, thresh 0.89, thresh ineqal: lt, the weighted error is 0.250
split: dim 1, thresh 0.89, thresh ineqal: gt, the weighted error is 0.250
split: dim 1, thresh 1.00, thresh ineqal: lt, the weighted error is 0.125
split: dim 1, thresh 1.00, thresh ineqal: gt, the weighted error is 0.250
split: dim 1, thresh 1.11, thresh ineqal: lt, the weighted error is 0.125
split: dim 1, thresh 1.11, thresh ineqal: gt, the weighted error is 0.250
split: dim 1, thresh 1.22, thresh ineqal: lt, the weighted error is 0.125
split: dim 1, thresh 1.22, thresh ineqal: gt, the weighted error is 0.250
split: dim 1, thresh 1.33, thresh ineqal: lt, the weighted error is 0.125
split: dim 1, thresh 1.33, thresh ineqal: gt, the weighted error is 0.250
split: dim 1, thresh 1.44, thresh ineqal: lt, the weighted error is 0.125
split: dim 1, thresh 1.44, thresh ineqal: gt, the weighted error is 0.250
split: dim 1, thresh 1.55, thresh ineqal: lt, the weighted error is 0.125
split: dim 1, thresh 1.55, thresh ineqal: gt, the weighted error is 0.250
split: dim 1, thresh 1.66, thresh ineqal: lt, the weighted error is 0.250
split: dim 1, thresh 1.66, thresh ineqal: gt, the weighted error is 0.250
split: dim 1, thresh 1.77, thresh ineqal: lt, the weighted error is 0.250
split: dim 1, thresh 1.77, thresh ineqal: gt, the weighted error is 0.250
split: dim 1, thresh 1.88, thresh ineqal: lt, the weighted error is 0.250
split: dim 1, thresh 1.88, thresh ineqal: gt, the weighted error is 0.250
split: dim 1, thresh 1.99, thresh ineqal: lt, the weighted error is 0.250
split: dim 1, thresh 1.99, thresh ineqal: gt, the weighted error is 0.250
split: dim 1, thresh 2.10, thresh ineqal: lt, the weighted error is 0.750
split: dim 1, thresh 2.10, thresh ineqal: gt, the weighted error is 0.250
classEst:  [[ 1.  1. -1. -1. -1.]]
aggClassEst:  [[ 0.27980789  1.66610226 -1.66610226 -1.66610226 -0.27980789]]
total error:  0.2
split: dim 0, thresh 0.90, thresh ineqal: lt, the weighted error is 0.143
split: dim 0, thresh 0.90, thresh ineqal: gt, the weighted error is 0.143
split: dim 0, thresh 1.00, thresh ineqal: lt, the weighted error is 0.357
split: dim 0, thresh 1.00, thresh ineqal: gt, the weighted error is 0.143
split: dim 0, thresh 1.10, thresh ineqal: lt, the weighted error is 0.357
split: dim 0, thresh 1.10, thresh ineqal: gt, the weighted error is 0.143
split: dim 0, thresh 1.20, thresh ineqal: lt, the weighted error is 0.357
split: dim 0, thresh 1.20, thresh ineqal: gt, the weighted error is 0.143
split: dim 0, thresh 1.30, thresh ineqal: lt, the weighted error is 0.286
split: dim 0, thresh 1.30, thresh ineqal: gt, the weighted error is 0.143
split: dim 0, thresh 1.40, thresh ineqal: lt, the weighted error is 0.286
split: dim 0, thresh 1.40, thresh ineqal: gt, the weighted error is 0.143
split: dim 0, thresh 1.50, thresh ineqal: lt, the weighted error is 0.357
split: dim 0, thresh 1.50, thresh ineqal: gt, the weighted error is 0.143
split: dim 0, thresh 1.60, thresh ineqal: lt, the weighted error is 0.357
split: dim 0, thresh 1.60, thresh ineqal: gt, the weighted error is 0.143
split: dim 0, thresh 1.70, thresh ineqal: lt, the weighted error is 0.357
split: dim 0, thresh 1.70, thresh ineqal: gt, the weighted error is 0.143
split: dim 0, thresh 1.80, thresh ineqal: lt, the weighted error is 0.357
split: dim 0, thresh 1.80, thresh ineqal: gt, the weighted error is 0.143
split: dim 0, thresh 1.90, thresh ineqal: lt, the weighted error is 0.357
split: dim 0, thresh 1.90, thresh ineqal: gt, the weighted error is 0.143
split: dim 0, thresh 2.00, thresh ineqal: lt, the weighted error is 0.857
split: dim 0, thresh 2.00, thresh ineqal: gt, the weighted error is 0.143
split: dim 1, thresh 0.89, thresh ineqal: lt, the weighted error is 0.143
split: dim 1, thresh 0.89, thresh ineqal: gt, the weighted error is 0.143
split: dim 1, thresh 1.00, thresh ineqal: lt, the weighted error is 0.500
split: dim 1, thresh 1.00, thresh ineqal: gt, the weighted error is 0.143
split: dim 1, thresh 1.11, thresh ineqal: lt, the weighted error is 0.500
split: dim 1, thresh 1.11, thresh ineqal: gt, the weighted error is 0.143
split: dim 1, thresh 1.22, thresh ineqal: lt, the weighted error is 0.500
split: dim 1, thresh 1.22, thresh ineqal: gt, the weighted error is 0.143
split: dim 1, thresh 1.33, thresh ineqal: lt, the weighted error is 0.500
split: dim 1, thresh 1.33, thresh ineqal: gt, the weighted error is 0.143
split: dim 1, thresh 1.44, thresh ineqal: lt, the weighted error is 0.500
split: dim 1, thresh 1.44, thresh ineqal: gt, the weighted error is 0.143
split: dim 1, thresh 1.55, thresh ineqal: lt, the weighted error is 0.500
split: dim 1, thresh 1.55, thresh ineqal: gt, the weighted error is 0.143
split: dim 1, thresh 1.66, thresh ineqal: lt, the weighted error is 0.571
split: dim 1, thresh 1.66, thresh ineqal: gt, the weighted error is 0.143
split: dim 1, thresh 1.77, thresh ineqal: lt, the weighted error is 0.571
split: dim 1, thresh 1.77, thresh ineqal: gt, the weighted error is 0.143
split: dim 1, thresh 1.88, thresh ineqal: lt, the weighted error is 0.571
split: dim 1, thresh 1.88, thresh ineqal: gt, the weighted error is 0.143
split: dim 1, thresh 1.99, thresh ineqal: lt, the weighted error is 0.571
split: dim 1, thresh 1.99, thresh ineqal: gt, the weighted error is 0.143
split: dim 1, thresh 2.10, thresh ineqal: lt, the weighted error is 0.857
split: dim 1, thresh 2.10, thresh ineqal: gt, the weighted error is 0.143
classEst:  [[1. 1. 1. 1. 1.]]
aggClassEst:  [[ 1.17568763  2.56198199 -0.77022252 -0.77022252  0.61607184]]
total error:  0.0
[[-0.69314718]
 [ 0.69314718]]
[[-1.66610226]
 [ 1.66610226]]
[[-2.56198199]
 [ 2.56198199]]
[[-1.]
 [ 1.]]
[Finished in 2.5s]
    
