




KNN has several advantages. First, it is simple and effective. Second, the cost of retraining is low (changes to the category system and to the training set are common in Web environments and e-commerce applications). Third, its time and space cost is linear in the size of the training set (which in some cases is not too large). Fourth, because KNN relies mainly on a limited number of neighboring samples rather than on discriminating class domains, it is better suited than other methods to sample sets whose class domains cross or overlap. Fifth, the algorithm is better suited to the automatic classification of class domains with large sample sizes; class domains with small sample sizes are more prone to misclassification.


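KNN also has drawbacks:

• The estimate of the regression function can be highly unstable, as it is an average of only a few points. This is the price that we pay for flexibility.

• Curse of dimensionality: in a high-dimensional feature space the nearest neighbors are no longer close, so local averaging degrades.

• Generating predictions is computationally expensive, since every query has to be compared against the stored training set.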


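Python implementation of KNN:

(taking k = 2 and k = 50 as examples)

The K that gives the lowest RMSE on the test dataset can be found by enumerating candidate values of K. (The RMSE on the training dataset generally increases as K increases.)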
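As a rough illustration of this procedure, here is a minimal from-scratch sketch of KNN regression in plain Python. The synthetic data (`y = sin(x)` plus noise), the 200/100 train/test split, the range of K values, and the helper names `knn_predict` and `rmse` are all assumptions made for the example, not the original listing.

```python
import math
import random

def knn_predict(train_x, train_y, query, k):
    """Average the targets of the k training points nearest to query (1-D inputs)."""
    order = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - query))
    nearest = order[:k]
    return sum(train_y[i] for i in nearest) / len(nearest)

def rmse(train_x, train_y, xs, ys, k):
    """Root mean squared error of KNN predictions on the points (xs, ys)."""
    errors = [(knn_predict(train_x, train_y, x, k) - y) ** 2 for x, y in zip(xs, ys)]
    return math.sqrt(sum(errors) / len(errors))

# Synthetic data (an assumption for illustration): y = sin(x) + Gaussian noise
rng = random.Random(0)
xs = [rng.uniform(0, 10) for _ in range(300)]
ys = [math.sin(x) + rng.gauss(0, 0.3) for x in xs]
train_x, train_y = xs[:200], ys[:200]
test_x, test_y = xs[200:], ys[200:]

# Compare a very flexible model (k = 2) with a very smooth one (k = 50)
for k in (2, 50):
    print("k =", k,
          "train RMSE:", round(rmse(train_x, train_y, train_x, train_y, k), 3),
          "test RMSE:", round(rmse(train_x, train_y, test_x, test_y, k), 3))

# Enumerate K and keep the value with the lowest test RMSE
test_rmse = {k: rmse(train_x, train_y, test_x, test_y, k) for k in range(1, 61)}
best_k = min(test_rmse, key=test_rmse.get)
print("best k:", best_k, "test RMSE:", round(test_rmse[best_k], 3))
```

With k = 1 every training point is its own nearest neighbor, so the training RMSE is zero; this is why the test set, not the training set, has to be used to choose K.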




2.1 Ridge regression

The second term is also called L2 regularisation.
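For reference, the ridge objective in its standard textbook form is written out here. The symbols are assumptions for this sketch: $y_i$ is the response, $x_i$ the feature vector, $\beta$ the coefficient vector, and $\lambda \ge 0$ the penalty strength.

$$
\hat{\beta}^{\text{ridge}} = \arg\min_{\beta} \; \sum_{i=1}^{n} \left( y_i - x_i^{\top}\beta \right)^2 + \lambda \sum_{j=1}^{p} \beta_j^2
$$

The first term is the usual residual sum of squares; the second term is the L2 penalty.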


Handling multicollinearity is one of the advantages of ridge regression, and using the ridge model can improve predictive performance. Another advantage is that the penalty term substantially reduces overfitting. The coefficients of unimportant features are shrunk arbitrarily close to zero by the regularization, which reduces the variance and improves the performance of the prediction model.


A drawback is that the penalty can shrink coefficients arbitrarily close to zero but never exactly to zero. Ridge therefore keeps every feature in the model, which makes the result harder to interpret.


Python implementation:
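A minimal sketch of the multicollinearity point, written from scratch rather than with a library. It assumes the penalized least-squares objective (residual sum of squares plus `lam` times the sum of squared coefficients), two features, and no intercept; the function name `ridge_two_features` and the toy data are hypothetical.

```python
def ridge_two_features(x1, x2, y, lam):
    """Solve (X^T X + lam * I) b = X^T y for two features, no intercept."""
    a11 = sum(v * v for v in x1) + lam
    a22 = sum(v * v for v in x2) + lam
    a12 = sum(u * v for u, v in zip(x1, x2))
    c1 = sum(u * t for u, t in zip(x1, y))
    c2 = sum(v * t for v, t in zip(x2, y))
    det = a11 * a22 - a12 * a12
    if det == 0:
        raise ValueError("X^T X is singular: least squares has no unique solution")
    # Cramer's rule for the 2x2 linear system
    return (a22 * c1 - a12 * c2) / det, (a11 * c2 - a12 * c1) / det

x1 = [1.0, 2.0, 3.0, 4.0]
x2 = [1.0, 2.0, 3.0, 4.0]   # exact copy of x1: perfectly collinear
y = [2.0, 4.0, 6.0, 8.0]    # y = x1 + x2

# Without a penalty the system is singular
try:
    ridge_two_features(x1, x2, y, lam=0.0)
except ValueError as err:
    print("lam = 0:", err)

# With a penalty the solution is unique, and it shrinks as lam grows
for lam in (1.0, 10.0):
    print("lam =", lam, ridge_two_features(x1, x2, y, lam))
```

With perfectly collinear features, ordinary least squares cannot decide how to divide the weight between them. The penalty makes the system solvable and splits the weight evenly, with both coefficients pulled slightly below 1.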

2.2 Lasso regression

The second term is also called L1 regularisation.
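For reference, the lasso objective in its standard textbook form is written out here. The symbols are assumptions for this sketch: $y_i$ is the response, $x_i$ the feature vector, $\beta$ the coefficient vector, and $\lambda \ge 0$ the penalty strength.

$$
\hat{\beta}^{\text{lasso}} = \arg\min_{\beta} \; \sum_{i=1}^{n} \left( y_i - x_i^{\top}\beta \right)^2 + \lambda \sum_{j=1}^{p} \left| \beta_j \right|
$$

The first term is the residual sum of squares; the second term is the L1 penalty.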

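Its advantages and disadvantages are similar to those of ridge regression, except that the L1 penalty can shrink coefficients exactly to zero.

Python implementation: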
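A minimal from-scratch sketch of lasso fitted by coordinate descent with soft-thresholding. It assumes the objective "residual sum of squares plus `lam` times the sum of absolute coefficients" with no intercept; the names `soft_threshold` and `lasso_cd` and the toy data are hypothetical, chosen so that one feature matters and the other barely does.

```python
def soft_threshold(rho, t):
    """Shrink rho toward zero by t, returning exactly 0.0 inside [-t, t]."""
    if rho > t:
        return rho - t
    if rho < -t:
        return rho + t
    return 0.0

def lasso_cd(columns, y, lam, n_iter=100):
    """Coordinate descent for sum((y - X b)^2) + lam * sum(|b_j|), no intercept."""
    n = len(y)
    beta = [0.0] * len(columns)
    for _ in range(n_iter):
        for j, col in enumerate(columns):
            # Partial residual: remove the contribution of every feature except j
            partial = [
                y[i] - sum(beta[m] * columns[m][i] for m in range(len(columns)) if m != j)
                for i in range(n)
            ]
            rho = sum(col[i] * partial[i] for i in range(n))
            z = sum(v * v for v in col)
            beta[j] = soft_threshold(rho, lam / 2) / z
    return beta

# Two orthogonal features; y depends strongly on x1 and only weakly on x2
x1 = [1.0, -1.0, 1.0, -1.0]
x2 = [1.0, 1.0, -1.0, -1.0]
y = [3.1, -2.9, 2.9, -3.1]   # y = 3 * x1 + 0.1 * x2

beta = lasso_cd([x1, x2], y, lam=2.0)
print(beta)
```

The weak feature falls inside the threshold, so its coefficient is set to exactly zero, while the strong feature's coefficient is only shrunk. A ridge penalty on the same data would leave the weak coefficient small but nonzero.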



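XGBoost is, to date, one of the most effective methods for classification and prediction on data. Its accuracy stands out among the methods discussed here, which gives it great practical value.

XGBoost is based on decision trees: the final prediction is the sum of the outputs of many trees. An ordinary decision tree is first grown as large as possible and then pruned greedily. XGBoost differs in that each newly added tree is chosen as the best addition to the trees built so far, so that the final result approaches an optimum. In addition, XGBoost adds a complexity penalty, that is, a regularization term, which contains the number of leaf nodes of the tree and the sum of squares (the squared L2 norm) of the scores output at the leaf nodes. (I do not yet fully understand the underlying details.)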



1. Compared with plain gradient boosting, XGBoost is faster: its leaf weights come from a Newton step, which needs no line search because the step length is naturally 1.

2. Feature sorting: XGBoost sorts the data by feature value before training and stores the result in blocks, and these blocks can be reused in every later boosting iteration.

3. XGBoost handles the bias-variance tradeoff: the regularization term controls model complexity and helps avoid overfitting.
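The Newton step and the regularization term can be made concrete with a small sketch. It assumes the formulation from the XGBoost paper: a leaf with gradient sum G and Hessian sum H gets weight -G / (H + lambda), and a split is scored by the resulting drop in the regularized objective minus a per-leaf cost gamma. The squared-error loss, the function names, and the toy numbers are assumptions for illustration.

```python
def leaf_weight(grads, hessians, lam):
    """Newton step for one leaf: w = -G / (H + lam), with no line search."""
    return -sum(grads) / (sum(hessians) + lam)

def split_gain(g_left, h_left, g_right, h_right, lam, gamma):
    """Objective reduction from splitting one leaf into two, minus the cost of the extra leaf."""
    def score(g, h):
        return g * g / (h + lam)
    parent = score(g_left + g_right, h_left + h_right)
    return 0.5 * (score(g_left, h_left) + score(g_right, h_right) - parent) - gamma

# Squared-error loss 0.5 * (y - pred)^2 gives gradient = pred - y and Hessian = 1
targets = [1.0, 1.0, 3.0, 3.0]
preds = [0.0] * len(targets)          # current ensemble prediction
grads = [p - t for p, t in zip(preds, targets)]
hessians = [1.0] * len(targets)
lam, gamma = 1.0, 0.0

print("single leaf weight:", leaf_weight(grads, hessians, lam))
print("left leaf weight:", leaf_weight(grads[:2], hessians[:2], lam))
print("right leaf weight:", leaf_weight(grads[2:], hessians[2:], lam))
print("gain of the split:", split_gain(sum(grads[:2]), 2.0, sum(grads[2:]), 2.0, lam, gamma))
```

A larger `lam` pulls the leaf weights toward zero, and a larger `gamma` makes a split less likely to be accepted; these are the two ways in which the regularization term limits tree complexity.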

In summary, XGBoost is a very practical algorithm.




4 LightGBM

LightGBM is an algorithm released by Microsoft in 2016 as an improvement on XGBoost. It mainly improves training speed; the corresponding cost is some loss of accuracy.


The algorithm is similar to XGBoost except for the direction in which trees grow. LightGBM grows trees leaf-wise, whereas most traditional algorithms grow trees depth-wise. Leaf-wise growth may overfit when the data is small, so the tree depth can be limited. The feature-parallel procedure, which is where LightGBM differs most from other implementations, is as follows (Sphinx):

1. Workers find local best split point {feature, threshold} on local feature set.

2. Communicate local best splits with each other and get the best one.

3. Perform the optimum split.
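The three steps above can be sketched in a single process, with each "worker" simulated by a list of the feature indices it owns. This is only an illustration, not LightGBM's code: the variance-reduction gain, the helper names, and the toy data are assumptions, and every worker is assumed to hold the full data so that the chosen split can be applied locally.

```python
def sse(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def local_best_split(rows, targets, feature_ids):
    """Step 1: scan only the worker's own features; return (gain, feature, threshold)."""
    best = (0.0, None, None)
    total = sse(targets)
    for f in feature_ids:
        # Every distinct value except the largest is a candidate threshold
        for threshold in sorted({row[f] for row in rows})[:-1]:
            left = [t for row, t in zip(rows, targets) if row[f] <= threshold]
            right = [t for row, t in zip(rows, targets) if row[f] > threshold]
            gain = total - sse(left) - sse(right)
            if gain > best[0]:
                best = (gain, f, threshold)
    return best

rows = [(1, 5), (2, 1), (3, 6), (4, 2)]     # two features per row
targets = [10.0, 0.0, 10.0, 0.0]
worker_features = [[0], [1]]                # worker 0 owns feature 0, worker 1 owns feature 1

# Step 1: each worker finds its local best split
local_splits = [local_best_split(rows, targets, fs) for fs in worker_features]
# Step 2: the workers exchange their local results and keep the best one
gain, feature, threshold = max(local_splits, key=lambda s: s[0])
# Step 3: perform the chosen split
left_rows = [i for i, row in enumerate(rows) if row[feature] <= threshold]
right_rows = [i for i, row in enumerate(rows) if row[feature] > threshold]
print("split on feature", feature, "at", threshold, "->", left_rows, right_rows)
```

Only the small `(gain, feature, threshold)` tuples are exchanged between workers, which is what keeps the communication cost of this scheme low.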


1. Optimization in speed and memory usage, especially when training on large amounts of data.

2. Optimization in accuracy: unlike most tree-learning algorithms, LightGBM does not grow trees depth-wise but leaf-wise. When the data is small this can overfit, so the tree depth should be limited.

3. Optimal split for categorical features: LightGBM sorts the histogram of a categorical feature according to its accumulated values and then finds the best split on the sorted histogram.

Python implementation:
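As a rough illustration of leaf-wise growth, the sketch below repeatedly splits whichever current leaf offers the largest gain until a leaf budget is reached. It is a from-scratch toy, not LightGBM's implementation: the single feature, the variance-reduction gain, and the names `best_split` and `grow_leaf_wise` are assumptions; `num_leaves` only mirrors the idea of a leaf budget.

```python
def sse(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def best_split(points):
    """Return (gain, threshold) of the best split of a leaf; points are (x, y) pairs."""
    points = sorted(points)
    ys = [y for _, y in points]
    best = (0.0, None)
    for i in range(1, len(points)):
        if points[i - 1][0] == points[i][0]:
            continue  # cannot split between equal x values
        gain = sse(ys) - sse(ys[:i]) - sse(ys[i:])
        if gain > best[0]:
            best = (gain, (points[i - 1][0] + points[i][0]) / 2)
    return best

def grow_leaf_wise(points, num_leaves):
    """Split the leaf with the largest gain until num_leaves leaves exist."""
    leaves = [points]
    while len(leaves) < num_leaves:
        candidates = [(best_split(leaf), idx) for idx, leaf in enumerate(leaves)]
        (gain, threshold), idx = max(candidates, key=lambda c: c[0][0])
        if threshold is None:
            break  # no leaf can be improved any further
        leaf = leaves.pop(idx)
        leaves.append([p for p in leaf if p[0] <= threshold])
        leaves.append([p for p in leaf if p[0] > threshold])
    return leaves

# Flat on the left, rising on the right
data = [(1, 0.0), (2, 0.0), (3, 0.0), (4, 0.0), (5, 10.0), (6, 12.0), (7, 20.0), (8, 22.0)]
for leaf in grow_leaf_wise(data, num_leaves=3):
    print([x for x, _ in leaf])
```

On this data the flat left region is never split again, because all of the remaining gain is in the right-hand leaf. A depth-wise learner would split both children of the root at the second level regardless of how much each split helps.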


