from sklearn import tree
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
wine = load_wine()
wine

{'data': array([[1.423e+01, 1.710e+00, 2.430e+00, ..., 1.040e+00, 3.920e+00,
1.065e+03],
[1.320e+01, 1.780e+00, 2.140e+00, ..., 1.050e+00, 3.400e+00,
1.050e+03],
[1.316e+01, 2.360e+00, 2.670e+00, ..., 1.030e+00, 3.170e+00,
1.185e+03],
...,
[1.327e+01, 4.280e+00, 2.260e+00, ..., 5.900e-01, 1.560e+00,
8.350e+02],
[1.317e+01, 2.590e+00, 2.370e+00, ..., 6.000e-01, 1.620e+00,
8.400e+02],
[1.413e+01, 4.100e+00, 2.740e+00, ..., 6.100e-01, 1.600e+00,
5.600e+02]]),
'target': array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2]),
'frame': None,
'target_names': array(['class_0', 'class_1', 'class_2'], dtype='<U7'),
'DESCR': '.. _wine_dataset:\n\nWine recognition dataset\n------------------------\n\nData Set Characteristics:\n\n :Number of Instances: 178 (50 in each of three classes)\n :Number of Attributes: 13 numeric, predictive attributes and the class\n :Attribute Information:\n \t\t- Alcohol\n \t\t- Malic acid\n \t\t- Ash\n\t\t- Alcalinity of ash \n \t\t- Magnesium\n\t\t- Total phenols\n \t\t- Flavanoids\n \t\t- Nonflavanoid phenols\n \t\t- Proanthocyanins\n\t\t- Color intensity\n \t\t- Hue\n \t\t- OD280/OD315 of diluted wines\n \t\t- Proline\n\n - class:\n - class_0\n - class_1\n - class_2\n\t\t\n :Summary Statistics:\n \n ============================= ==== ===== ======= =====\n Min Max Mean SD\n ============================= ==== ===== ======= =====\n Alcohol: 11.0 14.8 13.0 0.8\n Malic Acid: 0.74 5.80 2.34 1.12\n Ash: 1.36 3.23 2.36 0.27\n Alcalinity of Ash: 10.6 30.0 19.5 3.3\n Magnesium: 70.0 162.0 99.7 14.3\n Total Phenols: 0.98 3.88 2.29 0.63\n Flavanoids: 0.34 5.08 2.03 1.00\n Nonflavanoid Phenols: 0.13 0.66 0.36 0.12\n Proanthocyanins: 0.41 3.58 1.59 0.57\n Colour Intensity: 1.3 13.0 5.1 2.3\n Hue: 0.48 1.71 0.96 0.23\n OD280/OD315 of diluted wines: 1.27 4.00 2.61 0.71\n Proline: 278 1680 746 315\n ============================= ==== ===== ======= =====\n\n :Missing Attribute Values: None\n :Class Distribution: class_0 (59), class_1 (71), class_2 (48)\n :Creator: R.A. Fisher\n :Donor: Michael Marshall (MARSHALL%PLU@io.arc.nasa.gov)\n :Date: July, 1988\n\nThis is a copy of UCI ML Wine recognition datasets.\nhttps://archive.ics.uci.edu/ml/machine-learning-databases/wine/wine.data\n\nThe data is the results of a chemical analysis of wines grown in the same\nregion in Italy by three different cultivators. There are thirteen different\nmeasurements taken for different constituents found in the three types of\nwine.\n\nOriginal Owners: \n\nForina, M. et al, PARVUS - \nAn Extendible Package for Data Exploration, Classification and Correlation. \nInstitute of Pharmaceutical and Food Analysis and Technologies,\nVia Brigata Salerno, 16147 Genoa, Italy.\n\nCitation:\n\nLichman, M. (2013). UCI Machine Learning Repository\n[https://archive.ics.uci.edu/ml]. Irvine, CA: University of California,\nSchool of Information and Computer Science. \n\n.. topic:: References\n\n (1) S. Aeberhard, D. Coomans and O. de Vel, \n Comparison of Classifiers in High Dimensional Settings, \n Tech. Rep. no. 92-02, (1992), Dept. of Computer Science and Dept. of \n Mathematics and Statistics, James Cook University of North Queensland. \n (Also submitted to Technometrics). \n\n The data was used with many others for comparing various \n classifiers. The classes are separable, though only RDA \n has achieved 100% correct classification. \n (RDA : 100%, QDA 99.4%, LDA 98.9%, 1NN 96.1% (z-transformed data)) \n (All results using the leave-one-out technique) \n\n (2) S. Aeberhard, D. Coomans and O. de Vel, \n "THE CLASSIFICATION PERFORMANCE OF RDA" \n Tech. Rep. no. 92-01, (1992), Dept. of Computer Science and Dept. of \n Mathematics and Statistics, James Cook University of North Queensland. \n (Also submitted to Journal of Chemometrics).\n',
'feature_names': ['alcohol',
'malic_acid',
'ash',
'alcalinity_of_ash',
'magnesium',
'total_phenols',
'flavanoids',
'nonflavanoid_phenols',
'proanthocyanins',
'color_intensity',
'hue',
'od280/od315_of_diluted_wines',
'proline']}

wine.data

array([[1.423e+01, 1.710e+00, 2.430e+00, ..., 1.040e+00, 3.920e+00,
1.065e+03],
[1.320e+01, 1.780e+00, 2.140e+00, ..., 1.050e+00, 3.400e+00,
1.050e+03],
[1.316e+01, 2.360e+00, 2.670e+00, ..., 1.030e+00, 3.170e+00,
1.185e+03],
...,
[1.327e+01, 4.280e+00, 2.260e+00, ..., 5.900e-01, 1.560e+00,
8.350e+02],
[1.317e+01, 2.590e+00, 2.370e+00, ..., 6.000e-01, 1.620e+00,
8.400e+02],
[1.413e+01, 4.100e+00, 2.740e+00, ..., 6.100e-01, 1.600e+00,
5.600e+02]])

wine.target

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2])

wine.data.shape

(178, 13)

import pandas as pd
pd.concat([pd.DataFrame(wine.data),pd.DataFrame(wine.target)],axis=1)
0 1 2 3 4 5 6 7 8 9 10 11 12 0
0 14.23 1.71 2.43 15.6 127.0 2.80 3.06 0.28 2.29 5.64 1.04 3.92 1065.0 0
1 13.20 1.78 2.14 11.2 100.0 2.65 2.76 0.26 1.28 4.38 1.05 3.40 1050.0 0
2 13.16 2.36 2.67 18.6 101.0 2.80 3.24 0.30 2.81 5.68 1.03 3.17 1185.0 0
3 14.37 1.95 2.50 16.8 113.0 3.85 3.49 0.24 2.18 7.80 0.86 3.45 1480.0 0
4 13.24 2.59 2.87 21.0 118.0 2.80 2.69 0.39 1.82 4.32 1.04 2.93 735.0 0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
173 13.71 5.65 2.45 20.5 95.0 1.68 0.61 0.52 1.06 7.70 0.64 1.74 740.0 2
174 13.40 3.91 2.48 23.0 102.0 1.80 0.75 0.43 1.41 7.30 0.70 1.56 750.0 2
175 13.27 4.28 2.26 20.0 120.0 1.59 0.69 0.43 1.35 10.20 0.59 1.56 835.0 2
176 13.17 2.59 2.37 20.0 120.0 1.65 0.68 0.53 1.46 9.30 0.60 1.62 840.0 2
177 14.13 4.10 2.74 24.5 96.0 2.05 0.76 0.56 1.35 9.20 0.61 1.60 560.0 2

178 rows × 14 columns
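
The concatenated frame above keeps bare integer column labels. A small optional sketch that rebuilds the same table with the real feature names as headers (the label column name "target" is our own choice):

df = pd.DataFrame(wine.data, columns=wine.feature_names)  # features with named columns
df["target"] = wine.target                                # append the class labels
df.head()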

wine.feature_names

['alcohol',
'malic_acid',
'ash',
'alcalinity_of_ash',
'magnesium',
'total_phenols',
'flavanoids',
'nonflavanoid_phenols',
'proanthocyanins',
'color_intensity',
'hue',
'od280/od315_of_diluted_wines',
'proline']

wine.target_names

array(['class_0', 'class_1', 'class_2'], dtype='<U7')

train_test_split randomly splits the data into a training set and a test set. Here the test set takes 30% of the data and the training set 70%:

Xtrain, Xtest, Ytrain, Ytest = train_test_split(wine.data, wine.target, test_size=0.3)
Xtrain.shape

(124, 13)

Xtest.shape

(54, 13)

Ytrain

array([0, 2, 0, 0, 1, 2, 2, 1, 1, 1, 1, 0, 1, 1, 1, 0, 2, 0, 0, 2, 2, 0,
1, 0, 1, 1, 2, 1, 1, 0, 0, 0, 1, 2, 1, 0, 1, 1, 2, 2, 2, 2, 2, 1,
0, 1, 0, 1, 0, 1, 0, 1, 1, 1, 1, 2, 0, 1, 2, 2, 1, 1, 1, 2, 2, 1,
2, 0, 2, 2, 1, 1, 1, 0, 1, 0, 0, 0, 2, 1, 0, 2, 0, 1, 2, 1, 0, 2,
1, 0, 1, 1, 0, 1, 1, 1, 1, 0, 0, 0, 1, 0, 0, 2, 2, 2, 0, 1, 2, 0,
1, 0, 0, 1, 1, 0, 2, 0, 2, 2, 2, 0, 1, 0])

Ytest

array([0, 2, 1, 1, 2, 2, 1, 0, 0, 1, 0, 0, 0, 1, 1, 2, 0, 1, 1, 0, 0, 2,
2, 1, 1, 1, 0, 2, 2, 2, 1, 2, 1, 1, 0, 2, 0, 2, 2, 1, 1, 0, 2, 1,
1, 0, 1, 0, 0, 0, 0, 1, 2, 1])
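
Because the split is random, every run produces a different partition and hence a slightly different score below. A minimal sketch of a reproducible split, assuming we fix the seed ourselves (420 is an arbitrary choice):

Xtrain, Xtest, Ytrain, Ytest = train_test_split(wine.data
                                                ,wine.target
                                                ,test_size=0.3
                                                ,random_state=420)  # any fixed integer repeats the same split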

clf = tree.DecisionTreeClassifier(criterion="entropy",random_state = 30)
"""
entropy为“不纯度”指标。
random_state是用来设置分支中的随机模式的参数。输入任意整数,会长出同一棵树,使结果具有一定的稳定性。
决策树是随机的,在高维度时随机性会表现得比较明显,在低维度时随机性几乎没什么变化(比如鸢尾花数据集,只有3个特征,而通常只会用到它的三个特征)
30没有什么特殊的意义,可以换成任意数字,可以使score每一次都是一致的。
"""clf = clf.fit(Xtrain,Ytrain)
score = clf.score(Xtest,Ytest)#返回预测的准确度
score

0.9259259259259259

feature_name = ['alcohol','malic acid','ash','alcalinity of ash','magnesium','total phenols','flavanoids','nonflavanoid phenols','proanthocyanins','color intensity','hue','od280/od315 of diluted wines','proline']
import graphviz
dot_data = tree.export_graphviz(clf
                                ,feature_names=feature_name
                                ,class_names=["Gin","Sherry","Vermouth"]
                                ,filled=True    # filled=True colors the nodes; the darker the color, the lower the impurity
                                ,rounded=True)  # rounded=True draws rounded boxes; without it the boxes are rectangular, with square corners
graph = graphviz.Source(dot_data)
graph

clf.feature_importances_
# Shows which features were used: unused features have an importance of 0,
# and each feature that was used gets a corresponding importance value.

array([0.34731406, 0. , 0. , 0. , 0. ,
0. , 0.44736729, 0. , 0. , 0.11003237,
0. , 0. , 0.09528628])

[*zip(feature_name,clf.feature_importances_)]
# Pair each feature name with its importance. The feature at the root node
# usually contributes the most to the tree.

[('alcohol', 0.3473140578439304),
('malic acid', 0.0),
('ash', 0.0),
('alcalinity of ash', 0.0),
('magnesium', 0.0),
('total phenols', 0.0),
('flavanoids', 0.44736728516836155),
('nonflavanoid phenols', 0.0),
('proanthocyanins', 0.0),
('color intensity', 0.1100323740621365),
('hue', 0.0),
('od280/od315 of diluted wines', 0.0),
('proline', 0.0952862829255716)]
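
To read these more easily, the pairs can be sorted by importance; a small sketch using the pandas imported earlier:

pd.Series(clf.feature_importances_, index=feature_name).sort_values(ascending=False)
# the features actually used for splitting come out on top; the rest are 0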

clf = tree.DecisionTreeClassifier(criterion="entropy",random_state=30,splitter="random""""#增加随机性,使树变大变宽。如果使用random后准确率变高,那么就保留这一行代码;#如果用random后准确率变低了,就注掉这一行代码,默认的模式就是best#一切以追求“score越大”为目标""")
clf = clf.fit(Xtrain, Ytrain)
score = clf.score(Xtest, Ytest)
score

0.9074074074074074

import graphviz
dot_data = tree.export_graphviz(clf,feature_names=feature_name,class_names=["Gin","Sherry","Vermouth"],filled=True,rounded=True)
graph = graphviz.Source(dot_data)
graph

How well does the tree fit the training set:

score_train = clf.score(Xtrain,Ytrain)
score_train

1.0

clf = tree.DecisionTreeClassifier(criterion="entropy",random_state=30,splitter="random",max_depth=3"""#多于三层的部分全部都会被砍掉.如果砍掉之后score没有什么变化,说明被砍掉的这些对于结果确实没什么帮助#如果砍掉之后score变小了,说明砍多了""",min_samples_leaf=10,min_samples_split=25 #如果小于25就不会分支)
clf = clf.fit(Xtrain, Ytrain)
dot_data = tree.export_graphviz(clf,feature_names=feature_name,class_names=["Gin","Sherry","Vermouth"],filled=True,rounded=True)
graph = graphviz.Source(dot_data)
graph

score = clf.score(Xtest,Ytest)
score

0.9259259259259259

import matplotlib.pyplot as plt  # for plotting
test = []
for i in range(10):
    clf = tree.DecisionTreeClassifier(max_depth=i+1
                                      ,criterion="entropy"
                                      ,random_state=30
                                      ,splitter="random")
    clf = clf.fit(Xtrain, Ytrain)
    score = clf.score(Xtest, Ytest)
    test.append(score)
plt.plot(range(1,11),test,color="red",label="max_depth")
plt.legend()
plt.show()
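
To see where overfitting sets in, the same loop can also track the training score; the gap between the two curves widens once the tree starts memorizing the training set. A sketch under the same settings:

train, test = [], []
for i in range(10):
    clf = tree.DecisionTreeClassifier(max_depth=i+1
                                      ,criterion="entropy"
                                      ,random_state=30
                                      ,splitter="random")
    clf = clf.fit(Xtrain, Ytrain)
    train.append(clf.score(Xtrain, Ytrain))  # fit to the training set
    test.append(clf.score(Xtest, Ytest))     # generalization to the test set
plt.plot(range(1,11), train, color="blue", label="train")
plt.plot(range(1,11), test, color="red", label="test")
plt.legend()
plt.show()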

clf.apply(Xtest)  # returns the index of the leaf node each test sample falls into

array([30,  9, 22, 22,  9,  9, 22, 30, 26, 16, 30, 30, 26, 26, 16,  9, 30,
        4, 22, 30, 26,  4,  9, 16, 22, 16, 30,  9,  9,  9, 16,  9,  9, 22,
       30,  8, 30,  9,  9, 16, 16, 30,  9, 16, 16, 24, 16, 30, 30, 30, 30,
       22,  9, 16], dtype=int64)
clf.predict(Xtest)

array([0, 2, 1, 1, 2, 2, 1, 0, 0, 1, 0, 0, 0, 0, 1, 2, 0, 1, 1, 0, 0, 1,
2, 1, 1, 1, 0, 2, 2, 2, 1, 2, 2, 1, 0, 1, 0, 2, 2, 1, 1, 0, 2, 1,
1, 1, 1, 0, 0, 0, 0, 1, 2, 1])
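
predict returns one class label per test sample, so comparing it against Ytest reproduces the accuracy that score reported. A one-line sanity check (numpy is only imported further down, so import it here if running this cell first):

import numpy as np
np.mean(clf.predict(Xtest) == Ytest)  # fraction of correct predictions, i.e. the accuracy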

from sklearn.datasets import load_boston
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeRegressor
boston = load_boston()
regressor = DecisionTreeRegressor(random_state=0)  # instantiate
cross_val_score(regressor,boston.data,boston.target,cv=10,scoring="neg_mean_squared_error")
# cross_val_score's key parameters: cv=10 splits the data into 10 folds and
# runs 10 rounds of validation.
# scoring selects the metric for each fold: without it, regression defaults
# to R²; with "neg_mean_squared_error" the negative mean squared error is used instead.

array([-18.08941176, -10.61843137, -16.31843137, -44.97803922,
       -17.12509804, -49.71509804, -12.9986    , -88.4514    ,
       -55.7914    , -25.0816    ])
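
Since the scores come back negated, flipping the sign and averaging gives a single MSE figure for the whole 10-fold run; a minimal sketch:

(-cross_val_score(regressor, boston.data, boston.target
                  ,cv=10, scoring="neg_mean_squared_error")).mean()
# average mean squared error across the 10 folds
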
boston  # continuous target values: a regression problem

{'data': array([[6.3200e-03, 1.8000e+01, 2.3100e+00, ..., 1.5300e+01, 3.9690e+02,
4.9800e+00],
[2.7310e-02, 0.0000e+00, 7.0700e+00, ..., 1.7800e+01, 3.9690e+02,
9.1400e+00],
[2.7290e-02, 0.0000e+00, 7.0700e+00, ..., 1.7800e+01, 3.9283e+02,
4.0300e+00],
...,
[6.0760e-02, 0.0000e+00, 1.1930e+01, ..., 2.1000e+01, 3.9690e+02,
5.6400e+00],
[1.0959e-01, 0.0000e+00, 1.1930e+01, ..., 2.1000e+01, 3.9345e+02,
6.4800e+00],
[4.7410e-02, 0.0000e+00, 1.1930e+01, ..., 2.1000e+01, 3.9690e+02,
7.8800e+00]]),
'target': array([24. , 21.6, 34.7, 33.4, 36.2, 28.7, 22.9, 27.1, 16.5, 18.9, 15. ,
18.9, 21.7, 20.4, 18.2, 19.9, 23.1, 17.5, 20.2, 18.2, 13.6, 19.6,
15.2, 14.5, 15.6, 13.9, 16.6, 14.8, 18.4, 21. , 12.7, 14.5, 13.2,
13.1, 13.5, 18.9, 20. , 21. , 24.7, 30.8, 34.9, 26.6, 25.3, 24.7,
21.2, 19.3, 20. , 16.6, 14.4, 19.4, 19.7, 20.5, 25. , 23.4, 18.9,
35.4, 24.7, 31.6, 23.3, 19.6, 18.7, 16. , 22.2, 25. , 33. , 23.5,
19.4, 22. , 17.4, 20.9, 24.2, 21.7, 22.8, 23.4, 24.1, 21.4, 20. ,
20.8, 21.2, 20.3, 28. , 23.9, 24.8, 22.9, 23.9, 26.6, 22.5, 22.2,
23.6, 28.7, 22.6, 22. , 22.9, 25. , 20.6, 28.4, 21.4, 38.7, 43.8,
33.2, 27.5, 26.5, 18.6, 19.3, 20.1, 19.5, 19.5, 20.4, 19.8, 19.4,
21.7, 22.8, 18.8, 18.7, 18.5, 18.3, 21.2, 19.2, 20.4, 19.3, 22. ,
20.3, 20.5, 17.3, 18.8, 21.4, 15.7, 16.2, 18. , 14.3, 19.2, 19.6,
23. , 18.4, 15.6, 18.1, 17.4, 17.1, 13.3, 17.8, 14. , 14.4, 13.4,
15.6, 11.8, 13.8, 15.6, 14.6, 17.8, 15.4, 21.5, 19.6, 15.3, 19.4,
17. , 15.6, 13.1, 41.3, 24.3, 23.3, 27. , 50. , 50. , 50. , 22.7,
25. , 50. , 23.8, 23.8, 22.3, 17.4, 19.1, 23.1, 23.6, 22.6, 29.4,
23.2, 24.6, 29.9, 37.2, 39.8, 36.2, 37.9, 32.5, 26.4, 29.6, 50. ,
32. , 29.8, 34.9, 37. , 30.5, 36.4, 31.1, 29.1, 50. , 33.3, 30.3,
34.6, 34.9, 32.9, 24.1, 42.3, 48.5, 50. , 22.6, 24.4, 22.5, 24.4,
20. , 21.7, 19.3, 22.4, 28.1, 23.7, 25. , 23.3, 28.7, 21.5, 23. ,
26.7, 21.7, 27.5, 30.1, 44.8, 50. , 37.6, 31.6, 46.7, 31.5, 24.3,
31.7, 41.7, 48.3, 29. , 24. , 25.1, 31.5, 23.7, 23.3, 22. , 20.1,
22.2, 23.7, 17.6, 18.5, 24.3, 20.5, 24.5, 26.2, 24.4, 24.8, 29.6,
42.8, 21.9, 20.9, 44. , 50. , 36. , 30.1, 33.8, 43.1, 48.8, 31. ,
36.5, 22.8, 30.7, 50. , 43.5, 20.7, 21.1, 25.2, 24.4, 35.2, 32.4,
32. , 33.2, 33.1, 29.1, 35.1, 45.4, 35.4, 46. , 50. , 32.2, 22. ,
20.1, 23.2, 22.3, 24.8, 28.5, 37.3, 27.9, 23.9, 21.7, 28.6, 27.1,
20.3, 22.5, 29. , 24.8, 22. , 26.4, 33.1, 36.1, 28.4, 33.4, 28.2,
22.8, 20.3, 16.1, 22.1, 19.4, 21.6, 23.8, 16.2, 17.8, 19.8, 23.1,
21. , 23.8, 23.1, 20.4, 18.5, 25. , 24.6, 23. , 22.2, 19.3, 22.6,
19.8, 17.1, 19.4, 22.2, 20.7, 21.1, 19.5, 18.5, 20.6, 19. , 18.7,
32.7, 16.5, 23.9, 31.2, 17.5, 17.2, 23.1, 24.5, 26.6, 22.9, 24.1,
18.6, 30.1, 18.2, 20.6, 17.8, 21.7, 22.7, 22.6, 25. , 19.9, 20.8,
16.8, 21.9, 27.5, 21.9, 23.1, 50. , 50. , 50. , 50. , 50. , 13.8,
13.8, 15. , 13.9, 13.3, 13.1, 10.2, 10.4, 10.9, 11.3, 12.3, 8.8,
7.2, 10.5, 7.4, 10.2, 11.5, 15.1, 23.2, 9.7, 13.8, 12.7, 13.1,
12.5, 8.5, 5. , 6.3, 5.6, 7.2, 12.1, 8.3, 8.5, 5. , 11.9,
27.9, 17.2, 27.5, 15. , 17.2, 17.9, 16.3, 7. , 7.2, 7.5, 10.4,
8.8, 8.4, 16.7, 14.2, 20.8, 13.4, 11.7, 8.3, 10.2, 10.9, 11. ,
9.5, 14.5, 14.1, 16.1, 14.3, 11.7, 13.4, 9.6, 8.7, 8.4, 12.8,
10.5, 17.1, 18.4, 15.4, 10.8, 11.8, 14.9, 12.6, 14.1, 13. , 13.4,
15.2, 16.1, 17.8, 14.9, 14.1, 12.7, 13.5, 14.9, 20. , 16.4, 17.7,
19.5, 20.2, 21.4, 19.9, 19. , 19.1, 19.1, 20.1, 19.9, 19.6, 23.2,
29.8, 13.8, 13.3, 16.7, 12. , 14.6, 21.4, 23. , 23.7, 25. , 21.8,
20.6, 21.2, 19.1, 20.6, 15.2, 7. , 8.1, 13.6, 20.1, 21.8, 24.5,
23.1, 19.7, 18.3, 21.2, 17.5, 16.8, 22.4, 20.6, 23.9, 22. , 11.9]),
'feature_names': array(['CRIM', 'ZN', 'INDUS', 'CHAS', 'NOX', 'RM', 'AGE', 'DIS', 'RAD',
'TAX', 'PTRATIO', 'B', 'LSTAT'], dtype='<U7'),
'DESCR': ".. _boston_dataset:\n\nBoston house prices dataset\n---------------------------\n\nData Set Characteristics: \n\n :Number of Instances: 506 \n\n :Number of Attributes: 13 numeric/categorical predictive. Median Value (attribute 14) is usually the target.\n\n :Attribute Information (in order):\n - CRIM per capita crime rate by town\n - ZN proportion of residential land zoned for lots over 25,000 sq.ft.\n - INDUS proportion of non-retail business acres per town\n - CHAS Charles River dummy variable (= 1 if tract bounds river; 0 otherwise)\n - NOX nitric oxides concentration (parts per 10 million)\n - RM average number of rooms per dwelling\n - AGE proportion of owner-occupied units built prior to 1940\n - DIS weighted distances to five Boston employment centres\n - RAD index of accessibility to radial highways\n - TAX full-value property-tax rate per $10,000\n - PTRATIO pupil-teacher ratio by town\n - B 1000(Bk - 0.63)^2 where Bk is the proportion of blacks by town\n - LSTAT % lower status of the population\n - MEDV Median value of owner-occupied homes in $1000's\n\n :Missing Attribute Values: None\n\n :Creator: Harrison, D. and Rubinfeld, D.L.\n\nThis is a copy of UCI ML housing dataset.\nhttps://archive.ics.uci.edu/ml/machine-learning-databases/housing/\n\n\nThis dataset was taken from the StatLib library which is maintained at Carnegie Mellon University.\n\nThe Boston house-price data of Harrison, D. and Rubinfeld, D.L. 'Hedonic\nprices and the demand for clean air', J. Environ. Economics & Management,\nvol.5, 81-102, 1978. Used in Belsley, Kuh & Welsch, 'Regression diagnostics\n...', Wiley, 1980. N.B. Various transformations are used in the table on\npages 244-261 of the latter.\n\nThe Boston house-price data has been used in many machine learning papers that address regression\nproblems. \n \n.. topic:: References\n\n - Belsley, Kuh & Welsch, 'Regression diagnostics: Identifying Influential Data and Sources of Collinearity', Wiley, 1980. 244-261.\n - Quinlan,R. (1993). Combining Instance-Based and Model-Based Learning. In Proceedings on the Tenth International Conference of Machine Learning, 236-243, University of Massachusetts, Amherst. Morgan Kaufmann.\n",
'filename': 'D:\anaconda\lib\site-packages\sklearn\datasets\data\boston_house_prices.csv'}

# fit a sine curve with a regression tree
import numpy as np  # numpy generates the points of the sine curve
from sklearn.tree import DecisionTreeRegressor
import matplotlib.pyplot as plt
rng = np.random.RandomState(1)  # a seeded numpy random-number generator
5*rng.rand(80,1)
# rng.rand() draws uniform random numbers in [0, 1); e.g. rng.rand(10) gives
# ten of them. Those values are too small for our x range, so multiply by 5.

array([[7.20324493e-01],
[1.14374817e-04],
[3.02332573e-01],
[1.46755891e-01],
[9.23385948e-02],
[1.86260211e-01],
[3.45560727e-01],
[3.96767474e-01],
[5.38816734e-01],
[4.19194514e-01],
[6.85219500e-01],
[2.04452250e-01],
[8.78117436e-01],
[2.73875932e-02],
[6.70467510e-01],
[4.17304802e-01],
[5.58689828e-01],
[1.40386939e-01],
[1.98101489e-01],
[8.00744569e-01],
[9.68261576e-01],
[3.13424178e-01],
[6.92322616e-01],
[8.76389152e-01],
[8.94606664e-01],
[8.50442114e-02],
[3.90547832e-02],
[1.69830420e-01],
[8.78142503e-01],
[9.83468338e-02],
[4.21107625e-01],
[9.57889530e-01],
[5.33165285e-01],
[6.91877114e-01],
[3.15515631e-01],
[6.86500928e-01],
[8.34625672e-01],
[1.82882773e-02],
[7.50144315e-01],
[9.88861089e-01],
[7.48165654e-01],
[2.80443992e-01],
[7.89279328e-01],
[1.03226007e-01],
[4.47893526e-01],
[9.08595503e-01],
[2.93614148e-01],
[2.87775339e-01],
[1.30028572e-01],
[1.93669579e-02],
[6.78835533e-01],
[2.11628116e-01],
[2.65546659e-01],
[4.91573159e-01],
[5.33625451e-02],
[5.74117605e-01],
[1.46728575e-01],
[5.89305537e-01],
[6.99758360e-01],
[1.02334429e-01],
[4.14055988e-01],
[6.94400158e-01],
[4.14179270e-01],
[4.99534589e-02],
[5.35896406e-01],
[6.63794645e-01],
[5.14889112e-01],
[9.44594756e-01],
[5.86555041e-01],
[9.03401915e-01],
[1.37474704e-01],
[1.39276347e-01],
[8.07391289e-01],
[3.97676837e-01],
[1.65354197e-01],
[9.27508580e-01],
[3.47765860e-01],
[7.50812103e-01],
[7.25997985e-01],
[8.83306091e-01]])

np.sort(5*rng.rand(80,1),axis = 0)

array([[2.01012446e-03],
[6.97578649e-02],
[7.50949037e-02],
[7.76663778e-02],
[1.07624026e-01],
[2.69546360e-01],
[3.29805453e-01],
[3.31674172e-01],
[3.32682407e-01],
[6.31647597e-01],
[6.54984224e-01],
[6.75395790e-01],
[8.69778333e-01],
[8.80981278e-01],
[9.20051008e-01],
[9.67171413e-01],
[1.05087005e+00],
[1.17181043e+00],
[1.30157549e+00],
[1.31648385e+00],
[1.32459779e+00],
[1.42859426e+00],
[1.57622402e+00],
[1.66031787e+00],
[1.72368326e+00],
[1.85042099e+00],
[1.88290157e+00],
[1.95003857e+00],
[1.97437806e+00],
[2.10096840e+00],
[2.16838174e+00],
[2.29940133e+00],
[2.42995334e+00],
[2.45126761e+00],
[2.52831083e+00],
[2.62335155e+00],
[2.73173408e+00],
[2.74773961e+00],
[2.78326594e+00],
[2.87355752e+00],
[2.88928608e+00],
[2.91007090e+00],
[2.96740704e+00],
[2.99555154e+00],
[3.02155241e+00],
[3.02358050e+00],
[3.06015589e+00],
[3.08389179e+00],
[3.14858754e+00],
[3.19730440e+00],
[3.20783104e+00],
[3.39534418e+00],
[3.67532982e+00],
[3.76377777e+00],
[3.86089015e+00],
[3.93964617e+00],
[3.99301796e+00],
[4.02377282e+00],
[4.03680264e+00],
[4.04745346e+00],
[4.13557736e+00],
[4.14422904e+00],
[4.22367223e+00],
[4.39415992e+00],
[4.46444354e+00],
[4.52696159e+00],
[4.53907926e+00],
[4.57803175e+00],
[4.59300889e+00],
[4.59366718e+00],
[4.62403985e+00],
[4.63090713e+00],
[4.65986035e+00],
[4.70053741e+00],
[4.73985106e+00],
[4.74508160e+00],
[4.75088060e+00],
[4.81631264e+00],
[4.86891769e+00],
[4.88379575e+00]])

X = np.sort(5*rng.rand(80,1),axis = 0)  # generate x values between 0 and 5
# sort orders the generated numbers
# in Python, a one-dimensional array has no notion of rows and columns
y = np.sin(X).ravel()  # the sine curve, flattened to one dimension
plt.figure()
plt.scatter(X,y,s=20,edgecolor="black",c="darkorange",label="data")

<matplotlib.collections.PathCollection at 0x22cd4354490>

"""
对正弦曲线的值加上或者减去一个随机数,产生噪声
对y进行切片,5表示步长。生成16个随机数
0.5 - rng.rand(16)为[-0.5,0.5]
"""
y[::5] += 3*(0.5 - rng.rand(16))
y

array([-1.01790064e+00, 4.67258703e-02, 5.17981159e-02, 9.15349072e-02,
1.36013223e-01, -1.12214917e+00, 1.54128182e-01, 1.84246556e-01,
3.27470284e-01, 3.81976939e-01, 1.35315808e+00, 5.62399817e-01,
5.87694942e-01, 6.57213155e-01, 6.65222035e-01, 1.90737005e+00,
7.09402680e-01, 7.62488632e-01, 8.06470557e-01, 9.44798863e-01,
1.76681973e+00, 9.69083168e-01, 9.71905000e-01, 9.92771229e-01,
9.99840513e-01, 1.55677801e+00, 9.99998816e-01, 9.98250987e-01,
9.65877280e-01, 9.59472363e-01, 1.89541201e+00, 9.07470751e-01,
9.04254287e-01, 8.46416049e-01, 8.40252513e-01, 5.05650397e-01,
7.77800663e-01, 7.70463889e-01, 7.69031842e-01, 7.50404333e-01,
9.56504420e-01, 6.82538309e-01, 5.48131997e-01, 5.27159720e-01,
5.06685960e-01, -4.93205468e-01, 4.33574943e-01, 3.41967566e-01,
2.54966423e-01, 4.90051834e-02, 9.73717835e-01, 9.11173438e-03,
4.74659830e-05, -3.04626900e-01, -4.02292594e-01, -1.03410129e+00,
-4.94630294e-01, -5.58934116e-01, -5.91336180e-01, -6.65213004e-01,
3.54645177e-02, -7.67045894e-01, -7.96485442e-01, -8.15650825e-01,
-8.29636728e-01, -1.13779682e+00, -9.67082046e-01, -9.82200459e-01,
-9.82866499e-01, -9.83239974e-01, -5.33229129e-01, -9.87461231e-01,
-9.90763889e-01, -9.91515198e-01, -9.98591407e-01, 3.29500095e-01,
-9.99838785e-01, -9.99050544e-01, -9.80126479e-01, -9.71955796e-01])

#np.random.random(shape) generates a random array with the given shape
#note the use of ravel(), which flattens an array down to one dimension
np.random.random((2,1))
np.random.random((2,1)).ravel()
np.random.random((2,1)).ravel().shape

(2,)

regr_1 = DecisionTreeRegressor(max_depth = 2)
regr_2 = DecisionTreeRegressor(max_depth = 5)
regr_1.fit(X,y)
regr_2.fit(X,y)

DecisionTreeRegressor(max_depth=5)

X_test = np.arange(0.0,5.0,0.01)[:,np.newaxis]
#np.arange(start, stop, step) covers the half-open interval [start, stop)
y_1 = regr_1.predict(X_test)
y_2 = regr_2.predict(X_test)
np.arange(0.0,5.0,0.01)[:,]

array([0. , 0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.07, 0.08, 0.09, 0.1 ,
0.11, 0.12, 0.13, 0.14, 0.15, 0.16, 0.17, 0.18, 0.19, 0.2 , 0.21,
0.22, 0.23, 0.24, 0.25, 0.26, 0.27, 0.28, 0.29, 0.3 , 0.31, 0.32,
0.33, 0.34, 0.35, 0.36, 0.37, 0.38, 0.39, 0.4 , 0.41, 0.42, 0.43,
0.44, 0.45, 0.46, 0.47, 0.48, 0.49, 0.5 , 0.51, 0.52, 0.53, 0.54,
0.55, 0.56, 0.57, 0.58, 0.59, 0.6 , 0.61, 0.62, 0.63, 0.64, 0.65,
0.66, 0.67, 0.68, 0.69, 0.7 , 0.71, 0.72, 0.73, 0.74, 0.75, 0.76,
0.77, 0.78, 0.79, 0.8 , 0.81, 0.82, 0.83, 0.84, 0.85, 0.86, 0.87,
0.88, 0.89, 0.9 , 0.91, 0.92, 0.93, 0.94, 0.95, 0.96, 0.97, 0.98,
0.99, 1. , 1.01, 1.02, 1.03, 1.04, 1.05, 1.06, 1.07, 1.08, 1.09,
1.1 , 1.11, 1.12, 1.13, 1.14, 1.15, 1.16, 1.17, 1.18, 1.19, 1.2 ,
1.21, 1.22, 1.23, 1.24, 1.25, 1.26, 1.27, 1.28, 1.29, 1.3 , 1.31,
1.32, 1.33, 1.34, 1.35, 1.36, 1.37, 1.38, 1.39, 1.4 , 1.41, 1.42,
1.43, 1.44, 1.45, 1.46, 1.47, 1.48, 1.49, 1.5 , 1.51, 1.52, 1.53,
1.54, 1.55, 1.56, 1.57, 1.58, 1.59, 1.6 , 1.61, 1.62, 1.63, 1.64,
1.65, 1.66, 1.67, 1.68, 1.69, 1.7 , 1.71, 1.72, 1.73, 1.74, 1.75,
1.76, 1.77, 1.78, 1.79, 1.8 , 1.81, 1.82, 1.83, 1.84, 1.85, 1.86,
1.87, 1.88, 1.89, 1.9 , 1.91, 1.92, 1.93, 1.94, 1.95, 1.96, 1.97,
1.98, 1.99, 2. , 2.01, 2.02, 2.03, 2.04, 2.05, 2.06, 2.07, 2.08,
2.09, 2.1 , 2.11, 2.12, 2.13, 2.14, 2.15, 2.16, 2.17, 2.18, 2.19,
2.2 , 2.21, 2.22, 2.23, 2.24, 2.25, 2.26, 2.27, 2.28, 2.29, 2.3 ,
2.31, 2.32, 2.33, 2.34, 2.35, 2.36, 2.37, 2.38, 2.39, 2.4 , 2.41,
2.42, 2.43, 2.44, 2.45, 2.46, 2.47, 2.48, 2.49, 2.5 , 2.51, 2.52,
2.53, 2.54, 2.55, 2.56, 2.57, 2.58, 2.59, 2.6 , 2.61, 2.62, 2.63,
2.64, 2.65, 2.66, 2.67, 2.68, 2.69, 2.7 , 2.71, 2.72, 2.73, 2.74,
2.75, 2.76, 2.77, 2.78, 2.79, 2.8 , 2.81, 2.82, 2.83, 2.84, 2.85,
2.86, 2.87, 2.88, 2.89, 2.9 , 2.91, 2.92, 2.93, 2.94, 2.95, 2.96,
2.97, 2.98, 2.99, 3. , 3.01, 3.02, 3.03, 3.04, 3.05, 3.06, 3.07,
3.08, 3.09, 3.1 , 3.11, 3.12, 3.13, 3.14, 3.15, 3.16, 3.17, 3.18,
3.19, 3.2 , 3.21, 3.22, 3.23, 3.24, 3.25, 3.26, 3.27, 3.28, 3.29,
3.3 , 3.31, 3.32, 3.33, 3.34, 3.35, 3.36, 3.37, 3.38, 3.39, 3.4 ,
3.41, 3.42, 3.43, 3.44, 3.45, 3.46, 3.47, 3.48, 3.49, 3.5 , 3.51,
3.52, 3.53, 3.54, 3.55, 3.56, 3.57, 3.58, 3.59, 3.6 , 3.61, 3.62,
3.63, 3.64, 3.65, 3.66, 3.67, 3.68, 3.69, 3.7 , 3.71, 3.72, 3.73,
3.74, 3.75, 3.76, 3.77, 3.78, 3.79, 3.8 , 3.81, 3.82, 3.83, 3.84,
3.85, 3.86, 3.87, 3.88, 3.89, 3.9 , 3.91, 3.92, 3.93, 3.94, 3.95,
3.96, 3.97, 3.98, 3.99, 4. , 4.01, 4.02, 4.03, 4.04, 4.05, 4.06,
4.07, 4.08, 4.09, 4.1 , 4.11, 4.12, 4.13, 4.14, 4.15, 4.16, 4.17,
4.18, 4.19, 4.2 , 4.21, 4.22, 4.23, 4.24, 4.25, 4.26, 4.27, 4.28,
4.29, 4.3 , 4.31, 4.32, 4.33, 4.34, 4.35, 4.36, 4.37, 4.38, 4.39,
4.4 , 4.41, 4.42, 4.43, 4.44, 4.45, 4.46, 4.47, 4.48, 4.49, 4.5 ,
4.51, 4.52, 4.53, 4.54, 4.55, 4.56, 4.57, 4.58, 4.59, 4.6 , 4.61,
4.62, 4.63, 4.64, 4.65, 4.66, 4.67, 4.68, 4.69, 4.7 , 4.71, 4.72,
4.73, 4.74, 4.75, 4.76, 4.77, 4.78, 4.79, 4.8 , 4.81, 4.82, 4.83,
4.84, 4.85, 4.86, 4.87, 4.88, 4.89, 4.9 , 4.91, 4.92, 4.93, 4.94,
4.95, 4.96, 4.97, 4.98, 4.99])

l = np.array([1,2,3,4])
l
#a one-dimensional array

array([1, 2, 3, 4])

l.shape #check the shape: l is a 1-D array with four elements; 1-D arrays have no rows or columns

(4,)

l[:,np.newaxis]#increase the dimension

array([[1],
[2],
[3],
[4]])

l[:,np.newaxis].shape #now a 2-D array with four rows and one column

(4, 1)

l[np.newaxis,:].shape #one row, four columns: the transpose of the above

(1, 4)

X_test = np.arange(0.0,5.0,0.01)[:,np.newaxis]
y_1 = regr_1.predict(X_test)
y_2 = regr_2.predict(X_test)
y_1

array([-0.18445037, -0.18445037, -0.18445037, -0.18445037, -0.18445037,
-0.18445037, -0.18445037, -0.18445037, -0.18445037, -0.18445037,
-0.18445037, -0.18445037, -0.18445037, -0.18445037, -0.18445037,
-0.18445037, -0.18445037, -0.18445037, -0.18445037, -0.18445037,
-0.18445037, -0.18445037, -0.18445037, -0.18445037, -0.18445037,
-0.18445037, 0.76897391, 0.76897391, 0.76897391, 0.76897391,
0.76897391, 0.76897391, 0.76897391, 0.76897391, 0.76897391,
0.76897391, 0.76897391, 0.76897391, 0.76897391, 0.76897391,
0.76897391, 0.76897391, 0.76897391, 0.76897391, 0.76897391,
0.76897391, 0.76897391, 0.76897391, 0.76897391, 0.76897391,
0.76897391, 0.76897391, 0.76897391, 0.76897391, 0.76897391,
0.76897391, 0.76897391, 0.76897391, 0.76897391, 0.76897391,
0.76897391, 0.76897391, 0.76897391, 0.76897391, 0.76897391,
0.76897391, 0.76897391, 0.76897391, 0.76897391, 0.76897391,
0.76897391, 0.76897391, 0.76897391, 0.76897391, 0.76897391,
0.76897391, 0.76897391, 0.76897391, 0.76897391, 0.76897391,
0.76897391, 0.76897391, 0.76897391, 0.76897391, 0.76897391,
0.76897391, 0.76897391, 0.76897391, 0.76897391, 0.76897391,
0.76897391, 0.76897391, 0.76897391, 0.76897391, 0.76897391,
0.76897391, 0.76897391, 0.76897391, 0.76897391, 0.76897391,
0.76897391, 0.76897391, 0.76897391, 0.76897391, 0.76897391,
0.76897391, 0.76897391, 0.76897391, 0.76897391, 0.76897391,
0.76897391, 0.76897391, 0.76897391, 0.76897391, 0.76897391,
0.76897391, 0.76897391, 0.76897391, 0.76897391, 0.76897391,
0.76897391, 0.76897391, 0.76897391, 0.76897391, 0.76897391,
0.76897391, 0.76897391, 0.76897391, 0.76897391, 0.76897391,
0.76897391, 0.76897391, 0.76897391, 0.76897391, 0.76897391,
0.76897391, 0.76897391, 0.76897391, 0.76897391, 0.76897391,
0.76897391, 0.76897391, 0.76897391, 0.76897391, 0.76897391,
0.76897391, 0.76897391, 0.76897391, 0.76897391, 0.76897391,
0.76897391, 0.76897391, 0.76897391, 0.76897391, 0.76897391,
0.76897391, 0.76897391, 0.76897391, 0.76897391, 0.76897391,
0.76897391, 0.76897391, 0.76897391, 0.76897391, 0.76897391,
0.76897391, 0.76897391, 0.76897391, 0.76897391, 0.76897391,
0.76897391, 0.76897391, 0.76897391, 0.76897391, 0.76897391,
0.76897391, 0.76897391, 0.76897391, 0.76897391, 0.76897391,
0.76897391, 0.76897391, 0.76897391, 0.76897391, 0.76897391,
0.76897391, 0.76897391, 0.76897391, 0.76897391, 0.76897391,
0.76897391, 0.76897391, 0.76897391, 0.76897391, 0.76897391,
0.76897391, 0.76897391, 0.76897391, 0.76897391, 0.76897391,
0.76897391, 0.76897391, 0.76897391, 0.76897391, 0.76897391,
0.76897391, 0.76897391, 0.76897391, 0.76897391, 0.76897391,
0.76897391, 0.76897391, 0.76897391, 0.76897391, 0.76897391,
0.76897391, 0.76897391, 0.76897391, 0.76897391, 0.76897391,
0.76897391, 0.76897391, 0.76897391, 0.76897391, 0.76897391,
0.76897391, 0.76897391, 0.76897391, 0.76897391, 0.76897391,
0.76897391, 0.76897391, 0.76897391, 0.76897391, 0.76897391,
0.76897391, 0.76897391, 0.76897391, 0.76897391, 0.76897391,
0.76897391, 0.76897391, 0.76897391, 0.76897391, 0.76897391,
0.76897391, 0.76897391, 0.76897391, 0.76897391, 0.76897391,
0.76897391, 0.76897391, 0.76897391, 0.76897391, 0.76897391,
0.76897391, 0.76897391, 0.76897391, 0.76897391, 0.76897391,
0.76897391, 0.76897391, 0.76897391, 0.76897391, 0.76897391,
0.76897391, 0.76897391, 0.76897391, 0.76897391, 0.76897391,
0.76897391, 0.76897391, 0.76897391, 0.76897391, 0.76897391,
0.76897391, 0.76897391, 0.76897391, 0.76897391, 0.76897391,
0.76897391, 0.76897391, 0.76897391, 0.76897391, 0.76897391,
0.76897391, 0.76897391, 0.76897391, 0.76897391, 0.76897391,
0.76897391, 0.76897391, 0.76897391, 0.76897391, 0.76897391,
0.76897391, 0.76897391, 0.76897391, 0.76897391, 0.76897391,
0.76897391, 0.76897391, 0.76897391, 0.76897391, 0.76897391,
0.76897391, 0.76897391, 0.76897391, 0.76897391, 0.76897391,
0.76897391, 0.76897391, 0.76897391, 0.76897391, 0.76897391,
0.76897391, 0.76897391, 0.76897391, 0.76897391, 0.76897391,
0.76897391, 0.76897391, 0.76897391, 0.76897391, 0.76897391,
0.76897391, 0.76897391, 0.76897391, 0.76897391, 0.76897391,
-0.50195873, -0.50195873, -0.50195873, -0.50195873, -0.50195873,
-0.50195873, -0.50195873, -0.50195873, -0.50195873, -0.50195873,
-0.50195873, -0.50195873, -0.50195873, -0.50195873, -0.50195873,
-0.50195873, -0.50195873, -0.50195873, -0.50195873, -0.50195873,
-0.50195873, -0.50195873, -0.50195873, -0.50195873, -0.50195873,
-0.50195873, -0.50195873, -0.50195873, -0.50195873, -0.50195873,
-0.50195873, -0.50195873, -0.50195873, -0.50195873, -0.50195873,
-0.50195873, -0.50195873, -0.50195873, -0.50195873, -0.50195873,
-0.50195873, -0.50195873, -0.50195873, -0.50195873, -0.50195873,
-0.50195873, -0.50195873, -0.50195873, -0.50195873, -0.50195873,
-0.50195873, -0.50195873, -0.50195873, -0.50195873, -0.50195873,
-0.50195873, -0.50195873, -0.50195873, -0.50195873, -0.50195873,
-0.50195873, -0.50195873, -0.50195873, -0.50195873, -0.50195873,
-0.50195873, -0.50195873, -0.50195873, -0.50195873, -0.50195873,
-0.86237037, -0.86237037, -0.86237037, -0.86237037, -0.86237037,
-0.86237037, -0.86237037, -0.86237037, -0.86237037, -0.86237037,
-0.86237037, -0.86237037, -0.86237037, -0.86237037, -0.86237037,
-0.86237037, -0.86237037, -0.86237037, -0.86237037, -0.86237037,
-0.86237037, -0.86237037, -0.86237037, -0.86237037, -0.86237037,
-0.86237037, -0.86237037, -0.86237037, -0.86237037, -0.86237037,
-0.86237037, -0.86237037, -0.86237037, -0.86237037, -0.86237037,
-0.86237037, -0.86237037, -0.86237037, -0.86237037, -0.86237037,
-0.86237037, -0.86237037, -0.86237037, -0.86237037, -0.86237037,
-0.86237037, -0.86237037, -0.86237037, -0.86237037, -0.86237037,
-0.86237037, -0.86237037, -0.86237037, -0.86237037, -0.86237037,
-0.86237037, -0.86237037, -0.86237037, -0.86237037, -0.86237037,
-0.86237037, -0.86237037, -0.86237037, -0.86237037, -0.86237037,
-0.86237037, -0.86237037, -0.86237037, -0.86237037, -0.86237037,
-0.86237037, -0.86237037, -0.86237037, -0.86237037, -0.86237037,
-0.86237037, -0.86237037, -0.86237037, -0.86237037, -0.86237037,
-0.86237037, -0.86237037, -0.86237037, -0.86237037, -0.86237037,
-0.86237037, -0.86237037, -0.86237037, -0.86237037, -0.86237037,
-0.86237037, -0.86237037, -0.86237037, -0.86237037, -0.86237037,
-0.86237037, -0.86237037, -0.86237037, -0.86237037, -0.86237037])

y_2

array([-1.01790064, -1.01790064, -1.01790064, -1.01790064, 0.04926199,
0.04926199, 0.04926199, 0.04926199, 0.11377406, 0.11377406,
0.11377406, 0.11377406, 0.11377406, 0.11377406, -1.12214917,
0.16918737, 0.16918737, 0.16918737, 0.16918737, 0.16918737,
0.16918737, 0.16918737, 0.16918737, 0.16918737, 0.16918737,
0.16918737, 0.32747028, 0.32747028, 0.32747028, 0.32747028,
0.32747028, 0.32747028, 0.32747028, 0.32747028, 0.32747028,
0.32747028, 0.32747028, 0.38197694, 0.38197694, 0.38197694,
0.38197694, 0.38197694, 0.38197694, 0.38197694, 0.38197694,
0.38197694, 0.38197694, 0.38197694, 1.04916323, 1.04916323,
1.04916323, 1.04916323, 1.04916323, 1.04916323, 1.04916323,
1.04916323, 1.04916323, 1.04916323, 1.04916323, 1.04916323,
1.04916323, 1.04916323, 1.04916323, 1.04916323, 1.04916323,
1.04916323, 1.04916323, 1.04916323, 1.04916323, 1.04916323,
1.04916323, 1.04916323, 1.04916323, 1.04916323, 1.04916323,
1.04916323, 1.04916323, 1.04916323, 1.04916323, 1.04916323,
1.04916323, 1.04916323, 1.04916323, 1.04916323, 1.04916323,
1.04916323, 1.04916323, 1.04916323, 1.04916323, 1.04916323,
1.04916323, 1.04916323, 1.04916323, 1.04916323, 1.04916323,
1.04916323, 1.04916323, 1.04916323, 1.04916323, 1.04916323,
1.04916323, 1.04916323, 1.04916323, 1.04916323, 1.04916323,
1.04916323, 1.04916323, 1.04916323, 1.04916323, 1.04916323,
1.04916323, 1.04916323, 1.04916323, 1.04916323, 1.04916323,
1.04916323, 1.04916323, 1.04916323, 1.04916323, 1.04916323,
1.04916323, 1.04916323, 1.04916323, 1.04916323, 1.04916323,
1.04916323, 1.04916323, 1.04916323, 1.04916323, 1.04916323,
1.04916323, 1.04916323, 1.04916323, 1.04916323, 1.04916323,
1.04916323, 1.04916323, 1.04916323, 1.04916323, 1.04916323,
1.04916323, 1.04916323, 1.04916323, 1.04916323, 1.04916323,
1.04916323, 1.04916323, 1.04916323, 1.04916323, 1.04916323,
1.04916323, 1.04916323, 1.04916323, 1.04916323, 1.04916323,
1.04916323, 1.04916323, 1.04916323, 1.04916323, 1.04916323,
1.04916323, 1.04916323, 1.04916323, 1.04916323, 1.04916323,
1.04916323, 1.04916323, 1.04916323, 1.04916323, 1.04916323,
1.04916323, 1.04916323, 1.04916323, 1.04916323, 1.04916323,
1.04916323, 1.04916323, 1.04916323, 1.04916323, 1.04916323,
1.04916323, 1.04916323, 1.04916323, 1.04916323, 1.04916323,
1.04916323, 1.04916323, 1.04916323, 1.04916323, 1.04916323,
1.04916323, 1.04916323, 1.04916323, 1.04916323, 1.04916323,
1.04916323, 1.04916323, 1.04916323, 1.04916323, 0.79188977,
0.79188977, 0.79188977, 0.79188977, 0.79188977, 0.79188977,
0.79188977, 0.79188977, 0.79188977, 0.79188977, 0.79188977,
0.79188977, 0.79188977, 0.79188977, 0.79188977, 0.79188977,
0.79188977, 0.79188977, 0.79188977, 0.79188977, 0.79188977,
0.79188977, 0.79188977, 0.79188977, 0.79188977, 0.79188977,
0.79188977, 0.79188977, 0.79188977, 0.79188977, 0.79188977,
0.79188977, 0.79188977, 0.79188977, 0.79188977, 0.79188977,
0.79188977, 0.79188977, 0.79188977, 0.79188977, 0.79188977,
0.79188977, 0.79188977, 0.79188977, 0.79188977, 0.79188977,
0.79188977, 0.79188977, 0.79188977, 0.548132 , 0.548132 ,
0.548132 , 0.548132 , 0.548132 , 0.548132 , 0.548132 ,
0.548132 , 0.548132 , 0.548132 , 0.51692284, 0.51692284,
0.51692284, 0.51692284, 0.51692284, 0.51692284, -0.49320547,
-0.49320547, -0.49320547, -0.49320547, -0.49320547, 0.29462731,
0.29462731, 0.29462731, 0.29462731, 0.29462731, 0.29462731,
0.29462731, 0.29462731, 0.29462731, 0.29462731, 0.29462731,
0.29462731, 0.29462731, 0.29462731, 0.29462731, 0.29462731,
0.29462731, 0.29462731, 0.29462731, 0.29462731, 0.29462731,
0.29462731, 0.29462731, 0.29462731, 0.29462731, 0.29462731,
0.29462731, 0.29462731, 0.29462731, 0.29462731, 0.29462731,
0.29462731, 0.29462731, 0.29462731, 0.29462731, 0.29462731,
0.29462731, 0.29462731, 0.29462731, 0.29462731, 0.29462731,
0.29462731, 0.29462731, 0.29462731, 0.29462731, 0.29462731,
0.29462731, 0.29462731, 0.29462731, 0.29462731, 0.29462731,
0.29462731, 0.29462731, 0.29462731, 0.29462731, 0.29462731,
0.29462731, 0.29462731, 0.29462731, 0.29462731, 0.29462731,
-0.3046269 , -0.3046269 , -0.3046269 , -0.3046269 , -0.3046269 ,
-0.3046269 , -0.3046269 , -0.3046269 , -0.3046269 , -0.3046269 ,
-0.3046269 , -0.3046269 , -0.3046269 , -0.3046269 , -0.3046269 ,
-0.3046269 , -0.3046269 , -0.3046269 , -0.3046269 , -0.3046269 ,
-0.3046269 , -0.40229259, -0.40229259, -0.40229259, -0.40229259,
-0.40229259, -0.40229259, -0.40229259, -1.03410129, -1.03410129,
-1.03410129, -1.03410129, -1.03410129, -0.5775284 , -0.5775284 ,
-0.5775284 , -0.5775284 , -0.5775284 , -0.5775284 , -0.5775284 ,
-0.5775284 , -0.5775284 , -0.5775284 , -0.5775284 , -0.5775284 ,
-0.5775284 , -0.5775284 , -0.5775284 , -0.5775284 , -0.5775284 ,
-0.5775284 , -0.5775284 , -0.5775284 , -0.5775284 , -0.5775284 ,
-0.5775284 , -0.5775284 , -0.5775284 , -0.5775284 , -0.5775284 ,
-0.5775284 , -0.5775284 , -0.5775284 , 0.03546452, 0.03546452,
0.03546452, 0.03546452, 0.03546452, 0.03546452, 0.03546452,
-0.78176567, -0.78176567, -0.78176567, -0.78176567, -0.78176567,
-0.78176567, -0.78176567, -0.78176567, -0.82264378, -0.82264378,
-0.82264378, -0.82264378, -0.82264378, -0.82264378, -0.82264378,
-0.82264378, -0.82264378, -0.82264378, -0.82264378, -0.82264378,
-0.82264378, -1.13779682, -1.13779682, -1.13779682, -1.13779682,
-1.13779682, -1.13779682, -1.13779682, -1.13779682, -1.13779682,
-1.13779682, -1.13779682, -1.13779682, -1.13779682, -1.13779682,
-1.13779682, -1.13779682, -1.13779682, -0.93521665, -0.93521665,
-0.93521665, -0.93521665, -0.93521665, -0.93521665, -0.93521665,
-0.93521665, -0.93521665, -0.93521665, -0.93521665, -0.93521665,
-0.93521665, -0.93521665, -0.93521665, -0.93521665, -0.93521665,
-0.93521665, -0.93521665, -0.93521665, -0.93521665, -0.93521665,
-0.93521665, -0.93521665, -0.93521665, -0.93521665, -0.93521665,
-0.93521665, -0.93521665, -0.93521665, 0.32950009, -0.99944466,
-0.99944466, -0.99944466, -0.99944466, -0.99944466, -0.99944466,
-0.99944466, -0.99944466, -0.99944466, -0.99944466, -0.99944466,
-0.99944466, -0.99944466, -0.99944466, -0.99944466, -0.97604114,
-0.97604114, -0.97604114, -0.97604114, -0.97604114, -0.97604114,
-0.97604114, -0.97604114, -0.97604114, -0.97604114, -0.97604114,
-0.97604114, -0.97604114, -0.97604114, -0.97604114, -0.97604114])

plt.figure()
#scatter draws the scatter plot: point size 20, black edges, dark-orange points, label "data"
plt.scatter(X,y,s=20,edgecolor="black",c="darkorange",label="data")
#plt.plot draws the lines: (x, y, color, label, linewidth)
plt.plot(X_test,y_1,color="cornflowerblue",label="max_depth=2",linewidth=2)
plt.plot(X_test,y_2,color="yellowgreen",label="max_depth=5",linewidth=2)
#x-axis label
plt.xlabel("data")
#y-axis label
plt.ylabel("target")
#title
plt.title("Decision Tree Regression")
#show the legend
plt.legend()
#render the figure
plt.show()

Compare the fits at different tree depths: the max_depth=5 curve clearly overfits, visibly chasing the noise, while the max_depth=2 fit stays reasonably close to the sine curve.
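
The visual impression can also be checked numerically by scoring both trees against the noiseless sine on the test grid; a hedged sketch (using np.sin(X_test) as ground truth is our own choice):

from sklearn.metrics import mean_squared_error
true_y = np.sin(X_test).ravel()             # noiseless ground truth on the grid
print(mean_squared_error(true_y, y_1))      # deviation of the max_depth=2 fit
print(mean_squared_error(true_y, y_2))      # deviation of the max_depth=5 fit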
