I came across the auto_ml automated machine learning framework a while ago, but never found the time to write up a short summary for future study. As machine learning becomes more widespread, I believe automated machine learning will play an increasingly important role: a large share of the time in machine learning and deep learning projects goes into feature engineering, model selection, ensembling, and hyperparameter tuning, and auto_ml offers a good approach to that problem. There are many AutoML frameworks today, and learning them all thoroughly takes considerable time, so here I will simply record my earlier experience with auto_ml.

Since I am not free to publish the dataset I normally work with, I will simply use the official demo to practice. To use your own dataset later, you only need a small amount of preprocessing to normalize it into the expected format.
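The format unification mentioned above mainly means getting your data into a pandas DataFrame and marking the target column as `'output'` in `column_descriptions`. A minimal sketch, where the column names, the toy data, and the 80/20 split are all illustrative assumptions rather than anything auto_ml requires:

```python
# Hypothetical sketch: shaping your own data for auto_ml.
# Everything here (column names, toy values, split ratio) is illustrative.
import pandas as pd

def prepare_dataset(df, target_col, train_frac=0.8, seed=42):
    """Rename the target column and split into train/test frames."""
    df = df.rename(columns={target_col: 'MEDV'})  # match the demo's target name
    train_data = df.sample(frac=train_frac, random_state=seed)
    test_data = df.drop(train_data.index)
    return train_data, test_data

# Toy data standing in for a real dataset
raw = pd.DataFrame({
    'price': [10.0, 20.0, 30.0, 40.0, 50.0],
    'rooms': [2, 3, 4, 5, 6],
})
train_data, test_data = prepare_dataset(raw, target_col='price')

# This dict is what you would then pass to Predictor(column_descriptions=...)
column_descriptions = {'MEDV': 'output'}
```

The resulting `train_data`/`test_data` pair plays the same role as what `get_boston_dataset()` returns in the demo below.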

Here is a minimal example using the Boston housing data:

def bostonSimpleFunc():
    '''A simple example on the Boston housing data'''
    train_data, test_data = get_boston_dataset()
    column_descriptions = {'MEDV': 'output', 'CHAS': 'categorical'}
    ml_predictor = Predictor(type_of_estimator='regressor',
                             column_descriptions=column_descriptions)
    ml_predictor.train(train_data)
    ml_predictor.score(test_data, test_data.MEDV)

The output is as follows:

Welcome to auto_ml! We're about to go through and make sense of your data using machine learning, and give you a production-ready pipeline to get predictions with.
If you have any issues, or new feature ideas, let us know at http://auto.ml
You are running on version 2.9.10
Now using the model training_params that you passed in:
{}
After overwriting our defaults with your values, here are the final params that will be used to initialize the model:
{'presort': False, 'warm_start': True, 'learning_rate': 0.1}
Running basic data cleaning
Fitting DataFrameVectorizer
Now using the model training_params that you passed in:
{}
After overwriting our defaults with your values, here are the final params that will be used to initialize the model:
{'presort': False, 'warm_start': True, 'learning_rate': 0.1}
********************************************************************************************
About to fit the pipeline for the model GradientBoostingRegressor to predict MEDV
Started at:
2019-06-12 09:14:59
[1] random_holdout_set_from_training_data's score is: -9.82
[2] random_holdout_set_from_training_data's score is: -9.054
[3] random_holdout_set_from_training_data's score is: -8.48
[4] random_holdout_set_from_training_data's score is: -7.925
[5] random_holdout_set_from_training_data's score is: -7.424
[6] random_holdout_set_from_training_data's score is: -7.051
[7] random_holdout_set_from_training_data's score is: -6.608
[8] random_holdout_set_from_training_data's score is: -6.315
[9] random_holdout_set_from_training_data's score is: -6.0
[10] random_holdout_set_from_training_data's score is: -5.728
[11] random_holdout_set_from_training_data's score is: -5.499
[12] random_holdout_set_from_training_data's score is: -5.288
[13] random_holdout_set_from_training_data's score is: -5.126
[14] random_holdout_set_from_training_data's score is: -4.918
[15] random_holdout_set_from_training_data's score is: -4.775
[16] random_holdout_set_from_training_data's score is: -4.625
[17] random_holdout_set_from_training_data's score is: -4.513
[18] random_holdout_set_from_training_data's score is: -4.365
[19] random_holdout_set_from_training_data's score is: -4.281
[20] random_holdout_set_from_training_data's score is: -4.196
[21] random_holdout_set_from_training_data's score is: -4.133
[22] random_holdout_set_from_training_data's score is: -4.033
[23] random_holdout_set_from_training_data's score is: -4.004
[24] random_holdout_set_from_training_data's score is: -3.945
[25] random_holdout_set_from_training_data's score is: -3.913
[26] random_holdout_set_from_training_data's score is: -3.852
[27] random_holdout_set_from_training_data's score is: -3.844
[28] random_holdout_set_from_training_data's score is: -3.795
[29] random_holdout_set_from_training_data's score is: -3.824
[30] random_holdout_set_from_training_data's score is: -3.795
[31] random_holdout_set_from_training_data's score is: -3.778
[32] random_holdout_set_from_training_data's score is: -3.748
[33] random_holdout_set_from_training_data's score is: -3.739
[34] random_holdout_set_from_training_data's score is: -3.72
[35] random_holdout_set_from_training_data's score is: -3.721
[36] random_holdout_set_from_training_data's score is: -3.671
[37] random_holdout_set_from_training_data's score is: -3.644
[38] random_holdout_set_from_training_data's score is: -3.639
[39] random_holdout_set_from_training_data's score is: -3.617
[40] random_holdout_set_from_training_data's score is: -3.62
[41] random_holdout_set_from_training_data's score is: -3.614
[42] random_holdout_set_from_training_data's score is: -3.643
[43] random_holdout_set_from_training_data's score is: -3.647
[44] random_holdout_set_from_training_data's score is: -3.624
[45] random_holdout_set_from_training_data's score is: -3.589
[46] random_holdout_set_from_training_data's score is: -3.578
[47] random_holdout_set_from_training_data's score is: -3.565
[48] random_holdout_set_from_training_data's score is: -3.555
[49] random_holdout_set_from_training_data's score is: -3.549
[50] random_holdout_set_from_training_data's score is: -3.539
[52] random_holdout_set_from_training_data's score is: -3.571
[54] random_holdout_set_from_training_data's score is: -3.545
[56] random_holdout_set_from_training_data's score is: -3.588
[58] random_holdout_set_from_training_data's score is: -3.587
[60] random_holdout_set_from_training_data's score is: -3.584
[62] random_holdout_set_from_training_data's score is: -3.585
[64] random_holdout_set_from_training_data's score is: -3.589
[66] random_holdout_set_from_training_data's score is: -3.59
[68] random_holdout_set_from_training_data's score is: -3.558
[70] random_holdout_set_from_training_data's score is: -3.587
[72] random_holdout_set_from_training_data's score is: -3.583
[74] random_holdout_set_from_training_data's score is: -3.58
[76] random_holdout_set_from_training_data's score is: -3.578
[78] random_holdout_set_from_training_data's score is: -3.577
[80] random_holdout_set_from_training_data's score is: -3.591
[82] random_holdout_set_from_training_data's score is: -3.592
[84] random_holdout_set_from_training_data's score is: -3.586
[86] random_holdout_set_from_training_data's score is: -3.58
[88] random_holdout_set_from_training_data's score is: -3.562
[90] random_holdout_set_from_training_data's score is: -3.561
The number of estimators that were the best for this training dataset: 50
The best score on the holdout set: -3.539421497275334
Finished training the pipeline!
Total training time:
0:00:01
Here are the results from our GradientBoostingRegressor
predicting MEDV
Calculating feature responses, for advanced analytics.
The printed list will only contain at most the top 100 features.
+----+----------------+--------------+----------+-------------------+-------------------+-----------+-----------+-----------+-----------+
|    | Feature Name   |   Importance |    Delta |   FR_Decrementing |   FR_Incrementing |   FRD_abs |   FRI_abs |   FRD_MAD |   FRI_MAD |
|----+----------------+--------------+----------+-------------------+-------------------+-----------+-----------+-----------+-----------|
|  1 | ZN             |       0.0001 |  11.5619 |           -0.0027 |            0.0050 |    0.0027 |    0.0050 |    0.0000 |    0.0000 |
| 13 | CHAS=1.0       |       0.0011 | nan      |          nan      |          nan      |  nan      |  nan      |  nan      |  nan      |
| 12 | CHAS=0.0       |       0.0012 | nan      |          nan      |          nan      |  nan      |  nan      |  nan      |  nan      |
|  2 | INDUS          |       0.0013 |   3.4430 |            0.0070 |           -0.0539 |    0.0070 |    0.0539 |    0.0000 |    0.0000 |
|  7 | RAD            |       0.0029 |   4.2895 |           -0.7198 |            0.0463 |    0.7198 |    0.0463 |    0.3296 |    0.0000 |
|  5 | AGE            |       0.0145 |  13.9801 |            0.0757 |           -0.0292 |    0.2862 |    0.2393 |    0.0000 |    0.0000 |
|  8 | TAX            |       0.0160 |  82.9834 |            0.9411 |           -0.3538 |    0.9691 |    0.3538 |    0.0398 |    0.0000 |
| 10 | B              |       0.0171 |  45.7266 |           -0.1144 |            0.0896 |    0.1746 |    0.1200 |    0.1503 |    0.0000 |
|  3 | NOX            |       0.0193 |   0.0588 |            0.1792 |           -0.1584 |    0.1996 |    0.2047 |    0.0000 |    0.0000 |
|  9 | PTRATIO        |       0.0247 |   1.1130 |            0.5625 |           -0.2905 |    0.5991 |    0.2957 |    0.4072 |    0.1155 |
|  0 | CRIM           |       0.0252 |   4.4320 |           -0.0986 |           -0.4012 |    0.3789 |    0.4623 |    0.0900 |    0.0900 |
|  6 | DIS            |       0.0655 |   1.0643 |            3.4743 |           -0.2346 |    3.5259 |    0.5256 |    0.5473 |    0.2233 |
| 11 | LSTAT          |       0.3086 |   3.5508 |            1.5328 |           -1.6693 |    1.5554 |    1.6703 |    1.3641 |    1.6349 |
|  4 | RM             |       0.5026 |   0.3543 |           -1.1450 |            1.7191 |    1.1982 |    1.8376 |    0.4338 |    0.8010 |
+----+----------------+--------------+----------+-------------------+-------------------+-----------+-----------+-----------+-----------+
*******
Legend:
Importance = Feature Importance
Explanation: A weighted measure of how much of the variance the model is able to explain is due to this column
FR_delta = Feature Response Delta Amount
Explanation: Amount this column was incremented or decremented by to calculate the feature reponses
FR_Decrementing = Feature Response From Decrementing Values In This Column By One FR_delta
Explanation: Represents how much the predicted output values respond to subtracting one FR_delta amount from every value in this column
FR_Incrementing = Feature Response From Incrementing Values In This Column By One FR_delta
Explanation: Represents how much the predicted output values respond to adding one FR_delta amount to every value in this column
FRD_MAD = Feature Response From Decrementing- Median Absolute Delta
Explanation: Takes the absolute value of all changes in predictions, then takes the median of those. Useful for seeing if decrementing this feature provokes strong changes that are both positive and negative
FRI_MAD = Feature Response From Incrementing- Median Absolute Delta
Explanation: Takes the absolute value of all changes in predictions, then takes the median of those. Useful for seeing if incrementing this feature provokes strong changes that are both positive and negative
FRD_abs = Feature Response From Decrementing Avg Absolute Change
Explanation: What is the average absolute change in predicted output values to subtracting one FR_delta amount to every value in this column. Useful for seeing if output is sensitive to a feature, but not in a uniformly positive or negative way
FRI_abs = Feature Response From Incrementing Avg Absolute Change
Explanation: What is the average absolute change in predicted output values to adding one FR_delta amount to every value in this column. Useful for seeing if output is sensitive to a feature, but not in a uniformly positive or negative way
*******None***********************************************
Advanced scoring metrics for the trained regression model on this particular dataset:
Here is the overall RMSE for these predictions:
2.9415706036925924
Here is the average of the predictions:
21.3944468736
Here is the average actual value on this validation set:
21.4882352941
Here is the median prediction:
20.688959488015513
Here is the median actual value:
20.15
Here is the mean absolute error:
2.011340247445387
Here is the median absolute error (robust to outliers):
1.4717184675805761
Here is the explained variance:
0.8821274319123865
Here is the R-squared value:
0.882007483541501
Count of positive differences (prediction > actual):
51
Count of negative differences:
51
Average positive difference:
1.91755182694
Average negative difference:
-2.10512866795
***********************************************
[Finished in 2.8s]
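Note that the holdout "score" values in the training log are negative. This follows the scikit-learn convention that scorers must be "higher is better", so error metrics are reported negated: a score of -3.539 means an error of 3.539, and values closer to zero are better. (Exactly which error metric auto_ml uses here is not shown in the log.) A tiny illustration of the convention:

```python
# Sketch of scikit-learn's negated-error scoring convention,
# using mean absolute error computed by hand with numpy.
import numpy as np

y_true = np.array([3.0, 5.0, 7.0])
y_pred = np.array([2.5, 5.0, 8.0])

mae = float(np.mean(np.abs(y_true - y_pred)))  # plain error: 0.5
neg_score = -mae                               # what a neg_* scorer reports: -0.5
```

With this convention, a model whose score rises from -9.82 toward -3.54, as in the log above, is steadily reducing its holdout error.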

As the author notes, auto_ml is built for production use and covers the full workflow. Here I use the Boston housing data for a more complete walkthrough covering train/test splitting, model training, model persistence, model loading, and prediction:

def bostonWholeFunc():
    '''A fuller example on the Boston housing data, covering:
    train/test splitting, model training, model persistence,
    model loading, and prediction'''
    train_data, test_data = get_boston_dataset()
    column_descriptions = {'MEDV': 'output', 'CHAS': 'categorical'}
    ml_predictor = Predictor(type_of_estimator='regressor',
                             column_descriptions=column_descriptions)
    ml_predictor.train(train_data)
    test_score = ml_predictor.score(test_data, test_data.MEDV)
    file_name = ml_predictor.save()
    trained_model = load_ml_model(file_name)
    predictions = trained_model.predict(test_data)
    print('=====================predictions===========================')
    print(predictions)
    predictions = trained_model.predict_proba(test_data)
    print('=====================predictions===========================')
    print(predictions)

The output is as follows:

Welcome to auto_ml! We're about to go through and make sense of your data using machine learning, and give you a production-ready pipeline to get predictions with.
If you have any issues, or new feature ideas, let us know at http://auto.ml
You are running on version 2.9.10
Now using the model training_params that you passed in:
{}
After overwriting our defaults with your values, here are the final params that will be used to initialize the model:
{'presort': False, 'warm_start': True, 'learning_rate': 0.1}
Running basic data cleaning
Fitting DataFrameVectorizer
Now using the model training_params that you passed in:
{}
After overwriting our defaults with your values, here are the final params that will be used to initialize the model:
{'presort': False, 'warm_start': True, 'learning_rate': 0.1}
********************************************************************************************
About to fit the pipeline for the model GradientBoostingRegressor to predict MEDV
Started at:
2019-06-12 09:21:21
[1] random_holdout_set_from_training_data's score is: -9.93
[2] random_holdout_set_from_training_data's score is: -9.281
[3] random_holdout_set_from_training_data's score is: -8.683
[4] random_holdout_set_from_training_data's score is: -8.03
[5] random_holdout_set_from_training_data's score is: -7.494
[6] random_holdout_set_from_training_data's score is: -7.074
[7] random_holdout_set_from_training_data's score is: -6.649
[8] random_holdout_set_from_training_data's score is: -6.374
[9] random_holdout_set_from_training_data's score is: -6.115
[10] random_holdout_set_from_training_data's score is: -5.877
[11] random_holdout_set_from_training_data's score is: -5.566
[12] random_holdout_set_from_training_data's score is: -5.391
[13] random_holdout_set_from_training_data's score is: -5.088
[14] random_holdout_set_from_training_data's score is: -4.911
[15] random_holdout_set_from_training_data's score is: -4.692
[16] random_holdout_set_from_training_data's score is: -4.566
[17] random_holdout_set_from_training_data's score is: -4.379
[18] random_holdout_set_from_training_data's score is: -4.296
[19] random_holdout_set_from_training_data's score is: -4.14
[20] random_holdout_set_from_training_data's score is: -4.009
[21] random_holdout_set_from_training_data's score is: -3.92
[22] random_holdout_set_from_training_data's score is: -3.856
[23] random_holdout_set_from_training_data's score is: -3.81
[24] random_holdout_set_from_training_data's score is: -3.72
[25] random_holdout_set_from_training_data's score is: -3.632
[26] random_holdout_set_from_training_data's score is: -3.601
[27] random_holdout_set_from_training_data's score is: -3.538
[28] random_holdout_set_from_training_data's score is: -3.487
[29] random_holdout_set_from_training_data's score is: -3.459
[30] random_holdout_set_from_training_data's score is: -3.458
[31] random_holdout_set_from_training_data's score is: -3.422
[32] random_holdout_set_from_training_data's score is: -3.408
[33] random_holdout_set_from_training_data's score is: -3.356
[34] random_holdout_set_from_training_data's score is: -3.335
[35] random_holdout_set_from_training_data's score is: -3.323
[36] random_holdout_set_from_training_data's score is: -3.313
[37] random_holdout_set_from_training_data's score is: -3.262
[38] random_holdout_set_from_training_data's score is: -3.236
[39] random_holdout_set_from_training_data's score is: -3.207
[40] random_holdout_set_from_training_data's score is: -3.214
[41] random_holdout_set_from_training_data's score is: -3.198
[42] random_holdout_set_from_training_data's score is: -3.188
[43] random_holdout_set_from_training_data's score is: -3.174
[44] random_holdout_set_from_training_data's score is: -3.164
[45] random_holdout_set_from_training_data's score is: -3.122
[46] random_holdout_set_from_training_data's score is: -3.122
[47] random_holdout_set_from_training_data's score is: -3.109
[48] random_holdout_set_from_training_data's score is: -3.11
[49] random_holdout_set_from_training_data's score is: -3.119
[50] random_holdout_set_from_training_data's score is: -3.113
[52] random_holdout_set_from_training_data's score is: -3.113
[54] random_holdout_set_from_training_data's score is: -3.099
[56] random_holdout_set_from_training_data's score is: -3.102
[58] random_holdout_set_from_training_data's score is: -3.097
[60] random_holdout_set_from_training_data's score is: -3.069
[62] random_holdout_set_from_training_data's score is: -3.061
[64] random_holdout_set_from_training_data's score is: -3.024
[66] random_holdout_set_from_training_data's score is: -2.999
[68] random_holdout_set_from_training_data's score is: -2.999
[70] random_holdout_set_from_training_data's score is: -2.984
[72] random_holdout_set_from_training_data's score is: -2.978
[74] random_holdout_set_from_training_data's score is: -2.96
[76] random_holdout_set_from_training_data's score is: -2.943
[78] random_holdout_set_from_training_data's score is: -2.947
[80] random_holdout_set_from_training_data's score is: -2.938
[82] random_holdout_set_from_training_data's score is: -2.921
[84] random_holdout_set_from_training_data's score is: -2.914
[86] random_holdout_set_from_training_data's score is: -2.91
[88] random_holdout_set_from_training_data's score is: -2.901
[90] random_holdout_set_from_training_data's score is: -2.906
[92] random_holdout_set_from_training_data's score is: -2.892
[94] random_holdout_set_from_training_data's score is: -2.885
[96] random_holdout_set_from_training_data's score is: -2.884
[98] random_holdout_set_from_training_data's score is: -2.894
[100] random_holdout_set_from_training_data's score is: -2.88
[103] random_holdout_set_from_training_data's score is: -2.893
[106] random_holdout_set_from_training_data's score is: -2.889
[109] random_holdout_set_from_training_data's score is: -2.886
[112] random_holdout_set_from_training_data's score is: -2.869
[115] random_holdout_set_from_training_data's score is: -2.875
[118] random_holdout_set_from_training_data's score is: -2.852
[121] random_holdout_set_from_training_data's score is: -2.855
[124] random_holdout_set_from_training_data's score is: -2.848
[127] random_holdout_set_from_training_data's score is: -2.854
[130] random_holdout_set_from_training_data's score is: -2.86
[133] random_holdout_set_from_training_data's score is: -2.857
[136] random_holdout_set_from_training_data's score is: -2.854
[139] random_holdout_set_from_training_data's score is: -2.856
[142] random_holdout_set_from_training_data's score is: -2.854
[145] random_holdout_set_from_training_data's score is: -2.845
[148] random_holdout_set_from_training_data's score is: -2.84
[151] random_holdout_set_from_training_data's score is: -2.838
[154] random_holdout_set_from_training_data's score is: -2.838
[157] random_holdout_set_from_training_data's score is: -2.839
[160] random_holdout_set_from_training_data's score is: -2.837
[163] random_holdout_set_from_training_data's score is: -2.838
[166] random_holdout_set_from_training_data's score is: -2.838
[169] random_holdout_set_from_training_data's score is: -2.84
[172] random_holdout_set_from_training_data's score is: -2.828
[175] random_holdout_set_from_training_data's score is: -2.836
[178] random_holdout_set_from_training_data's score is: -2.834
[181] random_holdout_set_from_training_data's score is: -2.836
[184] random_holdout_set_from_training_data's score is: -2.837
[187] random_holdout_set_from_training_data's score is: -2.86
[190] random_holdout_set_from_training_data's score is: -2.862
[193] random_holdout_set_from_training_data's score is: -2.856
[196] random_holdout_set_from_training_data's score is: -2.855
[199] random_holdout_set_from_training_data's score is: -2.857
[202] random_holdout_set_from_training_data's score is: -2.856
[205] random_holdout_set_from_training_data's score is: -2.86
[208] random_holdout_set_from_training_data's score is: -2.859
[211] random_holdout_set_from_training_data's score is: -2.857
[214] random_holdout_set_from_training_data's score is: -2.855
[217] random_holdout_set_from_training_data's score is: -2.852
[220] random_holdout_set_from_training_data's score is: -2.849
[223] random_holdout_set_from_training_data's score is: -2.853
[226] random_holdout_set_from_training_data's score is: -2.845
[229] random_holdout_set_from_training_data's score is: -2.846
[232] random_holdout_set_from_training_data's score is: -2.849
The number of estimators that were the best for this training dataset: 172
The best score on the holdout set: -2.827876248876794
Finished training the pipeline!
Total training time:
0:00:01
Here are the results from our GradientBoostingRegressor
predicting MEDV
Calculating feature responses, for advanced analytics.
The printed list will only contain at most the top 100 features.
+----+----------------+--------------+----------+-------------------+-------------------+-----------+-----------+-----------+-----------+
|    | Feature Name   |   Importance |    Delta |   FR_Decrementing |   FR_Incrementing |   FRD_abs |   FRI_abs |   FRD_MAD |   FRI_MAD |
|----+----------------+--------------+----------+-------------------+-------------------+-----------+-----------+-----------+-----------|
| 12 | CHAS=0.0       |       0.0000 | nan      |          nan      |          nan      |  nan      |  nan      |  nan      |  nan      |
|  1 | ZN             |       0.0004 |  11.5619 |           -0.0194 |            0.0204 |    0.0205 |    0.0230 |    0.0000 |    0.0000 |
| 13 | CHAS=1.0       |       0.0005 | nan      |          nan      |          nan      |  nan      |  nan      |  nan      |  nan      |
|  2 | INDUS          |       0.0031 |   3.4430 |            0.1103 |            0.0494 |    0.1565 |    0.1543 |    0.0597 |    0.0000 |
|  7 | RAD            |       0.0059 |   4.2895 |           -0.3558 |            0.0537 |    0.3620 |    0.1431 |    0.3727 |    0.0000 |
|  5 | AGE            |       0.0105 |  13.9801 |            0.2805 |           -0.3050 |    0.5735 |    0.4734 |    0.3615 |    0.2435 |
| 10 | B              |       0.0118 |  45.7266 |           -0.1885 |            0.1507 |    0.3139 |    0.2903 |    0.1688 |    0.0582 |
|  8 | TAX            |       0.0167 |  82.9834 |            1.1477 |           -0.4399 |    1.2920 |    0.4563 |    0.2671 |    0.2617 |
|  9 | PTRATIO        |       0.0247 |   1.1130 |            0.5095 |           -0.2323 |    0.5599 |    0.4590 |    0.2984 |    0.3357 |
|  0 | CRIM           |       0.0284 |   4.4320 |           -0.4701 |           -0.2061 |    0.7788 |    0.4938 |    0.5027 |    0.2806 |
|  3 | NOX            |       0.0298 |   0.0588 |            0.3083 |           -0.1691 |    0.4285 |    0.3968 |    0.0745 |    0.0745 |
|  6 | DIS            |       0.0608 |   1.0643 |            3.4966 |           -0.3628 |    3.5823 |    0.8045 |    0.9935 |    0.3655 |
|  4 | RM             |       0.3571 |   0.3543 |           -1.2174 |            1.4995 |    1.3628 |    1.7090 |    0.7740 |    1.0375 |
| 11 | LSTAT          |       0.4504 |   3.5508 |            1.9849 |           -1.8635 |    2.0343 |    1.9289 |    1.8354 |    1.5375 |
+----+----------------+--------------+----------+-------------------+-------------------+-----------+-----------+-----------+-----------+
*******
Legend:
Importance = Feature Importance
Explanation: A weighted measure of how much of the variance the model is able to explain is due to this column
FR_delta = Feature Response Delta Amount
Explanation: Amount this column was incremented or decremented by to calculate the feature reponses
FR_Decrementing = Feature Response From Decrementing Values In This Column By One FR_delta
Explanation: Represents how much the predicted output values respond to subtracting one FR_delta amount from every value in this column
FR_Incrementing = Feature Response From Incrementing Values In This Column By One FR_delta
Explanation: Represents how much the predicted output values respond to adding one FR_delta amount to every value in this column
FRD_MAD = Feature Response From Decrementing- Median Absolute Delta
Explanation: Takes the absolute value of all changes in predictions, then takes the median of those. Useful for seeing if decrementing this feature provokes strong changes that are both positive and negative
FRI_MAD = Feature Response From Incrementing- Median Absolute Delta
Explanation: Takes the absolute value of all changes in predictions, then takes the median of those. Useful for seeing if incrementing this feature provokes strong changes that are both positive and negative
FRD_abs = Feature Response From Decrementing Avg Absolute Change
Explanation: What is the average absolute change in predicted output values to subtracting one FR_delta amount to every value in this column. Useful for seeing if output is sensitive to a feature, but not in a uniformly positive or negative way
FRI_abs = Feature Response From Incrementing Avg Absolute Change
Explanation: What is the average absolute change in predicted output values to adding one FR_delta amount to every value in this column. Useful for seeing if output is sensitive to a feature, but not in a uniformly positive or negative way
*******None***********************************************
Advanced scoring metrics for the trained regression model on this particular dataset:
Here is the overall RMSE for these predictions:
2.4474947386663786
Here is the average of the predictions:
21.2925792927
Here is the average actual value on this validation set:
21.4882352941
Here is the median prediction:
20.457423442279662
Here is the median actual value:
20.15
Here is the mean absolute error:
1.844793596155306
Here is the median absolute error (robust to outliers):
1.3340192567295777
Here is the explained variance:
0.9188375538746201
Here is the R-squared value:
0.9183155397464807
Count of positive differences (prediction > actual):
51
Count of negative differences:
51
Average positive difference:
1.64913759477
Average negative difference:
-2.04044959754
***********************************************
We have saved the trained pipeline to a filed called "auto_ml_saved_pipeline.dill"
It is saved in the directory:
C:\Users\18706\Desktop\myBlogs\auto_ml_use
To use it to get predictions, please follow the following flow (adjusting for your own uses as necessary:
`from auto_ml.utils_models import load_ml_model`
`trained_ml_pipeline = load_ml_model("auto_ml_saved_pipeline.dill")`
`trained_ml_pipeline.predict(data)`
Note that this pickle/dill file can only be loaded in an environment with the same modules installed, and running the same Python version.
This version of Python is:
sys.version_info(major=2, minor=7, micro=13, releaselevel='final', serial=0)
When passing in new data to get predictions on, columns that were not present (or were not found to be useful) in the training data will be silently ignored.
It is worthwhile to make sure that you feed in all the most useful data points though, to make sure you can get the highest quality predictions.
=====================predictions===========================
[23.503099796820333, 32.63486484873551, 17.607843570794248, 22.96364141712182, 18.037259790025, 22.154154350077157, 18.157171399351753, 14.490724400217747, 20.91569106207268, 21.371745165599958, 19.978460029298827, 17.617959317911595, 6.657480263073871, 21.259425283809687, 19.30470390603625, 23.54754498054679, 20.616057833202493, 8.569816325663448, 45.01902942229479, 15.319975928505148, 23.84765254861352, 24.49050663723932, 12.344561585629016, 23.24874551694055, 15.137348894013865, 15.067038653704085, 21.674735923166942, 12.88017013620315, 19.43339890697579, 20.933210490656045, 20.235546222120107, 22.99264652948031, 20.45638944287541, 20.50831821637611, 14.026411558432988, 17.14000803427353, 34.322736768893236, 19.82116882409099, 20.757084718131125, 23.523990773770624, 17.92101235838185, 30.745980540024213, 45.09505946725109, 18.76719301853909, 23.69250732281568, 14.627546717865679, 15.404318347865019, 23.856332667077602, 18.597560915078148, 28.295069087679007, 20.335783749261154, 35.49551328178157, 17.049478769941757, 27.36240739278428, 49.168123673644864, 21.919364008618228, 16.431621230418827, 32.50614954154076, 22.60486571683311, 17.190717714534216, 24.86659240393153, 34.726632201151446, 32.56154963374883, 17.991423510542266, 23.19139847589728, 16.3827778391806, 13.763406903575234, 23.041746542718485, 28.897952087920405, 15.16115409656009, 20.54704218671605, 27.630784534960636, 9.265217126500687, 20.218468086624206, 22.678130640115423, 3.978712919679104, 20.458457441683915, 44.47945990229906, 12.603336785642627, 11.482102006681343, 21.066151218556975, 13.559181962607349, 21.19973222974325, 10.447704116792627, 20.110776756244167, 28.928923567731772, 15.527462244687818, 23.24725371877329, 25.743821297087276, 18.04671684265537, 22.950747524482065, 9.088864852661203, 19.075035374223955, 18.42257896844079, 23.564483816162195, 19.647455910849818, 44.12778583727594, 11.427374611849514, 12.040264853009598, 16.998049081305517, 20.25692214075818, 22.80453061159547]
=====================predictions===========================
[[1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0]]
[Finished in 3.3s]
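As the log points out, the saved pipeline is a dill file that can only be reloaded in an environment with the same modules and the same Python version, because dill (like pickle) serializes references to the classes that built the object. The underlying round-trip idea is ordinary pickling; here is a minimal stdlib sketch, where the dict is just a stand-in for a trained pipeline:

```python
# Serialization round-trip sketch using the stdlib pickle module.
# The dict stands in for a trained pipeline; auto_ml itself uses dill,
# which follows the same dump/load pattern.
import io
import pickle

model = {'n_estimators': 172, 'learning_rate': 0.1}  # stand-in object

buf = io.BytesIO()
pickle.dump(model, buf)      # "save" the model to a byte stream
buf.seek(0)
restored = pickle.load(buf)  # "load" it back in the same environment
```

Loading the same bytes under a different Python version or without the defining libraries installed is exactly where such files break, which is why the log prints that warning.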

On top of the original official example, I added one extra layer of output: the class-style (predict_proba) predictions.
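Note that `predict_proba` is a classification API: it returns one probability per class for each row, which is why calling it on this regression pipeline yields the degenerate `[1, 0]` rows in the output above. A small sketch of what it returns on a genuine classifier, using scikit-learn here with made-up toy data (since this post's dataset is a regression task):

```python
# predict_proba on a real classifier: one probability per class per row.
# Toy data and model are illustrative, not from the post's dataset.
import numpy as np
from sklearn.linear_model import LogisticRegression

X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0, 0, 1, 1])

clf = LogisticRegression().fit(X, y)
proba = clf.predict_proba(X)  # shape (4, 2): P(class 0), P(class 1) per row
```

Each row of `proba` sums to 1; with a `type_of_estimator='classifier'` Predictor, auto_ml's `predict_proba` would return probabilities of this kind rather than the placeholder `[1, 0]` pairs seen here.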

The complete program is as follows:

#!/usr/bin/env python
#encoding:utf-8
from __future__ import division
'''
__Author__: 沂水寒城
Purpose: hands-on practice with the auto_ml framework
GitHub:   https://github.com/yishuihanhan/auto_ml
Docs:     https://auto-ml.readthedocs.io/en/latest/formatting_data.html
'''
from auto_ml import Predictor
from auto_ml.utils import get_boston_dataset
from auto_ml.utils_models import load_ml_model


def bostonSimpleFunc():
    '''A simple example on the Boston housing data'''
    train_data, test_data = get_boston_dataset()
    column_descriptions = {'MEDV': 'output', 'CHAS': 'categorical'}
    ml_predictor = Predictor(type_of_estimator='regressor',
                             column_descriptions=column_descriptions)
    ml_predictor.train(train_data)
    ml_predictor.score(test_data, test_data.MEDV)


def bostonWholeFunc():
    '''A fuller example on the Boston housing data, covering:
    train/test splitting, model training, model persistence,
    model loading, and prediction'''
    train_data, test_data = get_boston_dataset()
    column_descriptions = {'MEDV': 'output', 'CHAS': 'categorical'}
    ml_predictor = Predictor(type_of_estimator='regressor',
                             column_descriptions=column_descriptions)
    ml_predictor.train(train_data)
    test_score = ml_predictor.score(test_data, test_data.MEDV)
    file_name = ml_predictor.save()
    trained_model = load_ml_model(file_name)
    predictions = trained_model.predict(test_data)
    print('=====================predictions===========================')
    print(predictions)
    predictions = trained_model.predict_proba(test_data)
    print('=====================predictions===========================')
    print(predictions)


if __name__ == '__main__':
    bostonSimpleFunc()
    bostonWholeFunc()

The corresponding GitHub repository and official documentation links are given at the top of the code; if you are interested, they are worth a look. Recorded here for future study!
