1. Cross Validation

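The basic idea of cross validation: split the raw data into train_data, which is used to fit the model, and test_data, which is used to measure accuracy. The result measured on test_data is called the validation_error. To judge an algorithm on a dataset, a single random split into train_data and test_data is not enough, because the resulting validation_error depends on which samples happened to land in each part. Instead, we split the data randomly many times and compute a validation_error on each split. The resulting set of validation_error values gives a much more reliable picture of how good the algorithm is.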
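The repeated-split procedure described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the "classifier" is a trivial majority-class predictor, and the function names (`majority_label`, `repeated_split_errors`) and the toy data are hypothetical, not part of scikit-learn.

```python
import random

def majority_label(labels):
    """Return the most frequent label (a deliberately trivial 'classifier')."""
    return max(set(labels), key=labels.count)

def repeated_split_errors(data, target, n_rounds=10, test_fraction=0.3, seed=0):
    """Collect one validation error per random train/test split."""
    rng = random.Random(seed)
    indices = list(range(len(data)))
    n_test = max(1, int(len(data) * test_fraction))
    errors = []
    for _ in range(n_rounds):
        rng.shuffle(indices)
        test_idx, train_idx = indices[:n_test], indices[n_test:]
        # "Train" on train_data: remember the majority class.
        prediction = majority_label([target[i] for i in train_idx])
        # Evaluate on test_data: fraction of misclassified samples.
        wrong = sum(1 for i in test_idx if target[i] != prediction)
        errors.append(wrong / n_test)
    return errors

# Hypothetical toy data: 20 samples, 14 of class 0 and 6 of class 1.
toy_data = list(range(20))
toy_target = [0] * 14 + [1] * 6
errors = repeated_split_errors(toy_data, toy_target)
print(errors)                     # one validation error per split
print(sum(errors) / len(errors))  # their average
```

The individual errors differ from split to split, which is exactly why a single split is a poor basis for judging an algorithm.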

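Cross validation is a very good way to evaluate performance when the amount of data is limited. There are many ways to split the raw data into train data and test data, which is why there are many variants of cross validation.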

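The most important function in sklearn's cross validation module is the following:
sklearn.cross_validation.cross_val_score, which is called as scores = cross_validation.cross_val_score(clf, raw_data, raw_target, cv=5, score_func=None). Note that in later scikit-learn releases the module was renamed sklearn.model_selection, and the metric is chosen with the scoring argument instead of score_func.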

Parameters:

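clf: the classifier to evaluate. Any classifier can be used, for example a support vector machine: clf = svm.SVC(kernel='linear', C=1);
raw_data: the raw feature data;
raw_target: the raw class labels;
cv: selects the cross-validation strategy. Quoting the scikit-learn documentation: "When the cv argument is an integer, cross_val_score uses the KFold or StratifiedKFold strategies by default, the latter being used if the estimator derives from ClassifierMixin." In other words, when cv is an int, KFold or StratifiedKFold is used by default, and StratifiedKFold is chosen when the estimator is a classifier.
Return value of cross_val_score: one score per split of raw_data, measured on that split's test_data. The metric can be set with the score_func parameter; if it is not set, the classifier's own default scoring method is used.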
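To make the difference between the two default strategies concrete, here is a minimal pure-Python sketch of how plain k-fold and stratified k-fold index splits could be produced. It is not scikit-learn's implementation; the function names and the toy labels are hypothetical.

```python
from collections import defaultdict

def kfold_indices(n_samples, k):
    """Yield (train_idx, test_idx) for k contiguous folds, without shuffling."""
    # The first n_samples % k folds get one extra sample.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size

def stratified_kfold_indices(labels, k):
    """Yield (train_idx, test_idx) so each fold roughly keeps the class ratio."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    # Deal the samples of each class round-robin across the folds.
    for members in by_class.values():
        for position, idx in enumerate(members):
            folds[position % k].append(idx)
    for i in range(k):
        test_idx = sorted(folds[i])
        train_idx = sorted(idx for j in range(k) if j != i for idx in folds[j])
        yield train_idx, test_idx

labels = [0] * 6 + [1] * 4  # hypothetical imbalanced labels, sorted by class
for train_idx, test_idx in kfold_indices(len(labels), 5):
    print("KFold test labels:", [labels[i] for i in test_idx])
for train_idx, test_idx in stratified_kfold_indices(labels, 2):
    print("Stratified test labels:", [labels[i] for i in test_idx])
```

With labels sorted by class, the plain k-fold test folds each contain a single class, while the stratified folds keep the 3:2 class ratio. This is the motivation for preferring stratification when the estimator is a classifier.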

Cross-validation code in scikit-learn:

<code class="hljs python has-numbering" style="display: block; padding: 0px; color: inherit; box-sizing: border-box; font-family: 'Source Code Pro', monospace;font-size:undefined; white-space: pre; border-top-left-radius: 0px; border-top-right-radius: 0px; border-bottom-right-radius: 0px; border-bottom-left-radius: 0px; word-wrap: normal; background: transparent;"><span class="hljs-prompt" style="color: rgb(0, 102, 102); box-sizing: border-box;">>>> </span><span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">from</span> sklearn <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">import</span> cross_validation
<span class="hljs-prompt" style="color: rgb(0, 102, 102); box-sizing: border-box;">>>> </span><span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">from</span> sklearn <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">import</span> svm
<span class="hljs-prompt" style="color: rgb(0, 102, 102); box-sizing: border-box;">>>> </span>clf = svm.SVC(kernel=<span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">'linear'</span>, C=<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">1</span>)
<span class="hljs-prompt" style="color: rgb(0, 102, 102); box-sizing: border-box;">>>> </span>scores = cross_validation.cross_val_score(clf, iris.data, iris.target, cv=<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">5</span>)<span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;">#5-fold cv</span>
<span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;"># change metrics</span>
<span class="hljs-prompt" style="color: rgb(0, 102, 102); box-sizing: border-box;">>>> </span><span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">from</span> sklearn <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">import</span> metrics
<span class="hljs-prompt" style="color: rgb(0, 102, 102); box-sizing: border-box;">>>> </span>cross_validation.cross_val_score(clf, iris.data, iris.target, cv=<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">5</span>, score_func=metrics.f1_score)
<span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;">#f1 score: http://en.wikipedia.org/wiki/F1_score</span></code>

Note: to use logistic regression (LR) instead, set clf = LogisticRegression().

Generate a small dataset to use for cross validation:

<code class="hljs lua has-numbering" style="display: block; padding: 0px; color: inherit; box-sizing: border-box; font-family: 'Source Code Pro', monospace;font-size:undefined; white-space: pre; border-top-left-radius: 0px; border-top-right-radius: 0px; border-bottom-right-radius: 0px; border-bottom-left-radius: 0px; word-wrap: normal; background: transparent;">>>> import numpy as np
>>> from sklearn.cross_validation import train_test_split
>>> X, y = np.arange(<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">10</span>).reshape((<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">5</span>, <span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">2</span>)), range(<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">5</span>)
>>> X
array(<span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">[[0, 1],[2, 3],[4, 5],[6, 7],[8, 9]]</span>)
>>> list(y)
[<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">0</span>, <span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">1</span>, <span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">2</span>, <span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">3</span>, <span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">4</span>]</code>

Split the data into a training set and a test set:

<code class="hljs lua has-numbering" style="display: block; padding: 0px; color: inherit; box-sizing: border-box; font-family: 'Source Code Pro', monospace;font-size:undefined; white-space: pre; border-top-left-radius: 0px; border-top-right-radius: 0px; border-bottom-right-radius: 0px; border-bottom-left-radius: 0px; word-wrap: normal; background: transparent;">>>> X_train, X_test, y_train, y_test = train_test_split(
...     X, y, test_size=<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">0.33</span>, random_state=<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">42</span>)
...
>>> X_train
array(<span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">[[4, 5],[0, 1],[6, 7]]</span>)
>>> y_train
[<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">2</span>, <span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">0</span>, <span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">3</span>]
>>> X_test
array(<span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">[[2, 3],[8, 9]]</span>)
>>> y_test
[<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">1</span>, <span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">4</span>]</code>
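The split above puts 2 of the 5 samples into the test set even though 0.33 * 5 = 1.65. A minimal sketch of a shuffle-and-cut split that reproduces those sizes is shown below. Rounding the test size up is an assumption chosen to match the 3/2 split in the output above, the function name `simple_train_test_split` is hypothetical, and the shuffle will not reproduce scikit-learn's exact row order.

```python
import math
import random

def simple_train_test_split(X, y, test_size=0.33, seed=42):
    """Shuffle indices once, then cut off ceil(test_size * n) samples for testing."""
    n_samples = len(X)
    n_test = math.ceil(test_size * n_samples)  # assumption: round the test size up
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    X_train = [X[i] for i in train_idx]
    X_test = [X[i] for i in test_idx]
    y_train = [y[i] for i in train_idx]
    y_test = [y[i] for i in test_idx]
    return X_train, X_test, y_train, y_test

X = [[0, 1], [2, 3], [4, 5], [6, 7], [8, 9]]
y = list(range(5))
X_train, X_test, y_train, y_test = simple_train_test_split(X, y)
print(len(X_train), len(X_test))  # 3 2
```

Fixing the seed plays the same role as random_state: the same split is produced on every run, which keeps experiments reproducible.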

Using cross validation

The following splits the data into a training set and a test set by hand. Enter the code below in the console to try it:

<code class="hljs python has-numbering" style="display: block; padding: 0px; color: inherit; box-sizing: border-box; font-family: 'Source Code Pro', monospace;font-size:undefined; white-space: pre; border-top-left-radius: 0px; border-top-right-radius: 0px; border-bottom-right-radius: 0px; border-bottom-left-radius: 0px; word-wrap: normal; background: transparent;"><span class="hljs-prompt" style="color: rgb(0, 102, 102); box-sizing: border-box;">>>> </span><span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">import</span> numpy <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">as</span> np
<span class="hljs-prompt" style="color: rgb(0, 102, 102); box-sizing: border-box;">>>> </span><span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">from</span> sklearn <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">import</span> cross_validation
<span class="hljs-prompt" style="color: rgb(0, 102, 102); box-sizing: border-box;">>>> </span><span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">from</span> sklearn <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">import</span> datasets
<span class="hljs-prompt" style="color: rgb(0, 102, 102); box-sizing: border-box;">>>> </span><span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">from</span> sklearn <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">import</span> svm
<span class="hljs-prompt" style="color: rgb(0, 102, 102); box-sizing: border-box;">>>> </span>iris = datasets.load_iris()
<span class="hljs-prompt" style="color: rgb(0, 102, 102); box-sizing: border-box;">>>> </span>iris.data.shape, iris.target.shape
((<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">150</span>, <span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">4</span>), (<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">150</span>,))
<span class="hljs-prompt" style="color: rgb(0, 102, 102); box-sizing: border-box;">>>> </span>X_train, X_test, y_train, y_test = cross_validation.train_test_split(
<span class="hljs-prompt" style="color: rgb(0, 102, 102); box-sizing: border-box;">... </span>    iris.data, iris.target, test_size=<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">0.4</span>, random_state=<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">0</span>)
<span class="hljs-prompt" style="color: rgb(0, 102, 102); box-sizing: border-box;">>>> </span>X_train.shape, y_train.shape
((<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">90</span>, <span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">4</span>), (<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">90</span>,))
<span class="hljs-prompt" style="color: rgb(0, 102, 102); box-sizing: border-box;">>>> </span>X_test.shape, y_test.shape
((<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">60</span>, <span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">4</span>), (<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">60</span>,))
<span class="hljs-prompt" style="color: rgb(0, 102, 102); box-sizing: border-box;">>>> </span>clf = svm.SVC(kernel=<span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">'linear'</span>, C=<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">1</span>).fit(X_train, y_train)
<span class="hljs-prompt" style="color: rgb(0, 102, 102); box-sizing: border-box;">>>> </span>clf.score(X_test, y_test)
<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">0.96</span>...</code>

Below is a cross-validation example:

<code class="hljs r has-numbering" style="display: block; padding: 0px; color: inherit; box-sizing: border-box; font-family: 'Source Code Pro', monospace;font-size:undefined; white-space: pre; border-top-left-radius: 0px; border-top-right-radius: 0px; border-bottom-right-radius: 0px; border-bottom-left-radius: 0px; word-wrap: normal; background: transparent;">>>> clf = svm.SVC(kernel=<span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">'linear'</span>, C=<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">1</span>)
>>> scores = cross_validation.cross_val_score(
<span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">...</span>    clf, iris.data, iris.target, cv=<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">5</span>)
<span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">...</span>
>>> scores
array([ <span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">0.96</span><span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">...</span>,  <span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">1.</span>  <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">...</span>,  <span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">0.96</span><span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">...</span>,  <span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">0.96</span><span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">...</span>,  <span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">1.</span>        ])</code>

Calling cross_val_score with cv=5 performs 5-fold cross validation and returns scores, an array holding the accuracy obtained on each fold.

<code class="hljs perl has-numbering" style="display: block; padding: 0px; color: inherit; box-sizing: border-box; font-family: 'Source Code Pro', monospace;font-size:undefined; white-space: pre; border-top-left-radius: 0px; border-top-right-radius: 0px; border-bottom-right-radius: 0px; border-bottom-left-radius: 0px; word-wrap: normal; background: transparent;">>>> <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">print</span>(<span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">"Accuracy: <span class="hljs-variable" style="color: rgb(102, 0, 102); box-sizing: border-box;">%0</span>.2f (+/- <span class="hljs-variable" style="color: rgb(102, 0, 102); box-sizing: border-box;">%0</span>.2f)"</span> % (scores.mean(), scores.std() * <span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">2</span>))
Accuracy: <span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">0</span>.<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">98</span> (+<span class="hljs-regexp" style="color: rgb(0, 136, 0); box-sizing: border-box;">/- 0.03)</span></code>
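The summary line above is the mean of the fold scores plus or minus two standard deviations. The same arithmetic can be reproduced without NumPy, as a rough sketch. The scores below are hypothetical stand-ins, not the truncated values printed above.

```python
import statistics

# Hypothetical per-fold accuracies, not the truncated values printed above.
scores = [0.9, 1.0, 0.9, 0.9, 1.0]

mean = statistics.mean(scores)
# pstdev is the population standard deviation, the same convention as
# NumPy's default scores.std() (ddof=0).
spread = 2 * statistics.pstdev(scores)
print("Accuracy: %0.2f (+/- %0.2f)" % (mean, spread))
```

With only five folds, the two-standard-deviation band is best read as a rough indication of how much the score varies between folds rather than as a strict confidence interval.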

scores.mean() gives the mean accuracy across the folds. The metric can also be changed by passing scoring:

<code class="hljs r has-numbering" style="display: block; padding: 0px; color: inherit; box-sizing: border-box; font-family: 'Source Code Pro', monospace;font-size:undefined; white-space: pre; border-top-left-radius: 0px; border-top-right-radius: 0px; border-bottom-right-radius: 0px; border-bottom-left-radius: 0px; word-wrap: normal; background: transparent;">>>> from sklearn import metrics
>>> scores = cross_validation.cross_val_score(clf, iris.data, iris.target,
<span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">...</span>     cv=<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">5</span>, scoring=<span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">'f1_weighted'</span>)
>>> scores
array([ <span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">0.96</span><span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">...</span>,  <span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">1.</span>  <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">...</span>,  <span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">0.96</span><span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">...</span>,  <span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">0.96</span><span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">...</span>,  <span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">1.</span>        ])</code>
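For a multi-class problem such as iris, the 'f1_weighted' metric computes an F1 score per class and averages them with weights proportional to each class's number of true samples. A hand-rolled sketch of that calculation follows. The function below is an illustration only, not scikit-learn's code, and the labels are hypothetical.

```python
from collections import Counter

def f1_weighted(y_true, y_pred):
    """Per-class F1, averaged with weights equal to each class's support."""
    support = Counter(y_true)
    total = 0.0
    for label, count in support.items():
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        if precision + recall:
            f1 = 2 * precision * recall / (precision + recall)
        else:
            f1 = 0.0
        total += f1 * count  # weight each class by its support
    return total / len(y_true)

# Hypothetical labels for three classes.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 1, 2, 2, 2]
print(f1_weighted(y_true, y_pred))
```

Here the per-class F1 values are 1.0, 2/3 and 0.8, so the weighted average comes out at about 0.82.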

Importing data in libsvm format:

<code class="hljs r has-numbering" style="display: block; padding: 0px; color: inherit; box-sizing: border-box; font-family: 'Source Code Pro', monospace;font-size:undefined; white-space: pre; border-top-left-radius: 0px; border-top-right-radius: 0px; border-bottom-right-radius: 0px; border-bottom-left-radius: 0px; word-wrap: normal; background: transparent;">>>> from sklearn.datasets import load_svmlight_file
>>> X_train, y_train = load_svmlight_file(<span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">"/path/to/train_dataset.txt"</span>)
<span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">...</span>
>>> X_train.todense()  <span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;"># convert the sparse matrix to a dense feature matrix</span></code>
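For readers unfamiliar with the libsvm text format, each line holds a label followed by sparse index:value pairs, and features that are not listed are zero. The sketch below parses such lines into dense rows. It assumes one-based feature indices and does not handle qid: fields; the function name and the sample lines are hypothetical.

```python
def parse_svmlight_line(line, n_features):
    """Parse one 'label index:value ...' line into (label, dense_row)."""
    parts = line.split("#", 1)[0].split()  # drop a trailing comment, if any
    label = float(parts[0])
    row = [0.0] * n_features  # features not listed are implicitly zero
    for item in parts[1:]:
        index, value = item.split(":")
        row[int(index) - 1] = float(value)  # assumes one-based indices
    return label, row

# Hypothetical two-line file content.
sample = ["1 1:0.5 3:2.0", "-1 2:1.5 # a comment"]
for line in sample:
    print(parse_svmlight_line(line, n_features=3))
```

This also shows why the loader returns a sparse matrix: only the non-zero entries are stored in the file, and todense() fills in the zeros.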

2. Handling class imbalance

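For imbalanced problems, where the ratio of positive to negative samples is heavily skewed, one way to tune the classifier is to modify its training data, either by undersampling or by oversampling. Oversampling duplicates examples, while undersampling deletes examples. Oversampling may eventually lead to overfitting; with undersampling, the deleted samples may contain important information, which can lead to underfitting. When positive examples are scarce, the usual approach is to combine undersampling of the negative class with oversampling of the positive class.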
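The mixed strategy described above can be sketched as follows. This is a minimal illustration, assuming a binary problem in which the positive class is the minority and both classes are non-empty. The function name `rebalance`, its parameters, and the "meet in the middle" target size are hypothetical choices, not a standard API.

```python
import random

def rebalance(samples, labels, positive=1, negative=0, target_per_class=None, seed=0):
    """Oversample the positives and undersample the negatives to equal counts."""
    rng = random.Random(seed)
    pos = [s for s, l in zip(samples, labels) if l == positive]
    neg = [s for s, l in zip(samples, labels) if l != positive]
    if target_per_class is None:
        # Assumption: meet in the middle between the two class sizes.
        target_per_class = (len(pos) + len(neg)) // 2
    # Oversampling: draw positives with replacement, which duplicates examples.
    pos_resampled = [rng.choice(pos) for _ in range(target_per_class)]
    # Undersampling: keep a random subset of negatives, which deletes examples.
    neg_resampled = rng.sample(neg, min(target_per_class, len(neg)))
    new_samples = pos_resampled + neg_resampled
    new_labels = [positive] * len(pos_resampled) + [negative] * len(neg_resampled)
    return new_samples, new_labels

# Hypothetical data: 2 positives and 18 negatives.
samples = list(range(20))
labels = [1] * 2 + [0] * 18
new_samples, new_labels = rebalance(samples, labels)
print(new_labels.count(1), new_labels.count(0))  # 10 10
```

Resampling should be applied to the training portion only. If duplicated positives leak into the test folds, the cross-validation scores will look better than they really are.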


3. Learning SVM with scikit-learn

<code class="hljs r has-numbering" style="display: block; padding: 0px; color: inherit; box-sizing: border-box; font-family: 'Source Code Pro', monospace;font-size:undefined; white-space: pre; border-top-left-radius: 0px; border-top-right-radius: 0px; border-bottom-right-radius: 0px; border-bottom-left-radius: 0px; word-wrap: normal; background: transparent;">>>> from sklearn import datasets
>>> iris = datasets.load_iris()
>>> digits = datasets.load_digits()
>>> print(digits.data)
[[  0.   0.   5. ...,   0.   0.   0.]
 [  0.   0.   0. ...,  10.   0.   0.]
 [  0.   0.   0. ...,  16.   9.   0.]
 ...,
 [  0.   0.   1. ...,   6.   0.   0.]
 [  0.   0.   2. ...,  12.   0.   0.]
 [  0.   0.  10. ...,  12.   1.   0.]]
>>> digits.target
array([<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">0</span>, <span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">1</span>, <span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">2</span>, <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">...</span>, <span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">8</span>, <span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">9</span>, <span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">8</span>])
>>> digits.images[<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">0</span>]
array([[  0.,   0.,   5.,  13.,   9.,   1.,   0.,   0.],
       [  0.,   0.,  13.,  15.,  10.,  15.,   5.,   0.],
       [  0.,   3.,  15.,   2.,   0.,  11.,   8.,   0.],
       [  0.,   4.,  12.,   0.,   0.,   8.,   8.,   0.],
       [  0.,   5.,   8.,   0.,   0.,   9.,   8.,   0.],
       [  0.,   4.,  11.,   0.,   1.,  12.,   7.,   0.],
       [  0.,   2.,  14.,   5.,  10.,  12.,   0.,   0.],
       [  0.,   0.,   6.,  13.,  10.,   0.,   0.,   0.]])
>>> from sklearn import svm
>>> clf = svm.SVC(gamma=<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">0.001</span>, C=<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">100.</span>)
>>> clf.fit(digits.data[:-<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">1</span>],digits.target[:-<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">1</span>])
SVC(C=<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">100.0</span>, cache_size=<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">200</span>, class_weight=None, coef0=<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">0.0</span>, degree=<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">3</span>,gamma=<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">0.001</span>, kernel=<span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">'rbf'</span>, max_iter=-<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">1</span>, probability=False,random_state=None, shrinking=True, tol=<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">0.001</span>, verbose=False)
>>> clf.predict(digits.data[-<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">1</span>:])
array([<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">8</span>])
>>> </code><ul class="pre-numbering" style="box-sizing: border-box; position: absolute; width: 50px; top: 0px; left: 0px; margin: 0px; padding: 6px 0px 40px; border-right-width: 1px; border-right-style: solid; border-right-color: rgb(221, 221, 221); list-style: none; text-align: right; background-color: rgb(238, 238, 238);"><li style="box-sizing: border-box; padding: 0px 5px;">1</li><li style="box-sizing: border-box; padding: 0px 5px;">2</li><li style="box-sizing: border-box; padding: 0px 5px;">3</li><li style="box-sizing: border-box; padding: 0px 5px;">4</li><li style="box-sizing: border-box; padding: 0px 5px;">5</li><li style="box-sizing: border-box; padding: 0px 5px;">6</li><li style="box-sizing: border-box; padding: 0px 5px;">7</li><li style="box-sizing: border-box; padding: 0px 5px;">8</li><li style="box-sizing: border-box; padding: 0px 5px;">9</li><li style="box-sizing: border-box; padding: 0px 5px;">10</li><li style="box-sizing: border-box; padding: 0px 5px;">11</li><li style="box-sizing: border-box; padding: 0px 5px;">12</li><li style="box-sizing: border-box; padding: 0px 5px;">13</li><li style="box-sizing: border-box; padding: 0px 5px;">14</li><li style="box-sizing: border-box; padding: 0px 5px;">15</li><li style="box-sizing: border-box; padding: 0px 5px;">16</li><li style="box-sizing: border-box; padding: 0px 5px;">17</li><li style="box-sizing: border-box; padding: 0px 5px;">18</li><li style="box-sizing: border-box; padding: 0px 5px;">19</li><li style="box-sizing: border-box; padding: 0px 5px;">20</li><li style="box-sizing: border-box; padding: 0px 5px;">21</li><li style="box-sizing: border-box; padding: 0px 5px;">22</li><li style="box-sizing: border-box; padding: 0px 5px;">23</li><li style="box-sizing: border-box; padding: 0px 5px;">24</li><li style="box-sizing: border-box; padding: 0px 5px;">25</li><li style="box-sizing: border-box; padding: 0px 5px;">26</li><li style="box-sizing: border-box; padding: 0px 5px;">27</li><li style="box-sizing: border-box; 
padding: 0px 5px;">28</li><li style="box-sizing: border-box; padding: 0px 5px;">29</li><li style="box-sizing: border-box; padding: 0px 5px;">30</li><li style="box-sizing: border-box; padding: 0px 5px;">31</li></ul>
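28</li>">
The session above fits on every sample except the last and predicts that one sample. As a minimal sketch of the same idea with a larger held-out set, the block below keeps the last 100 digits aside for testing. The split size `n_test` and the variable names are assumptions made for illustration and do not come from the original session.

```python
from sklearn import datasets, svm

# Load the 8x8 handwritten digits dataset used in the session above.
digits = datasets.load_digits()

# Same hyperparameters as in the session above.
clf = svm.SVC(gamma=0.001, C=100.0)

# Assumption for illustration: hold out the last 100 samples for testing.
n_test = 100
X_train, y_train = digits.data[:-n_test], digits.target[:-n_test]
X_test, y_test = digits.data[-n_test:], digits.target[-n_test:]

clf.fit(X_train, y_train)

# predict expects a 2D array, so slice with [-1:] to keep the sample axis.
last_pred = clf.predict(digits.data[-1:])

# Mean accuracy on the held-out samples.
accuracy = clf.score(X_test, y_test)
print(last_pred, accuracy)
```

Slicing with `[-1:]` rather than indexing with `[-1]` keeps the input two-dimensional, which is the shape `predict` expects.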

3. Learning RandomForest with scikit-learn

Usage example

<code class="hljs lua has-numbering" style="display: block; padding: 0px; color: inherit; box-sizing: border-box; font-family: 'Source Code Pro', monospace;font-size:undefined; white-space: pre; border-top-left-radius: 0px; border-top-right-radius: 0px; border-bottom-right-radius: 0px; border-bottom-left-radius: 0px; word-wrap: normal; background: transparent;">>>> from sklearn.ensemble import RandomForestClassifier
>>> X = <span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">[[0, 0], [1, 1]]</span>
>>> Y = [<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">0</span>, <span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">1</span>]
>>> clf = RandomForestClassifier(n_estimators=<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">10</span>)
>>> clf = clf.fit(X, Y)</code>

Method

Default constructor arguments of RandomForestClassifier

<code class="hljs python has-numbering" style="display: block; padding: 0px; color: inherit; box-sizing: border-box; font-family: 'Source Code Pro', monospace;font-size:undefined; white-space: pre; border-top-left-radius: 0px; border-top-right-radius: 0px; border-bottom-right-radius: 0px; border-bottom-left-radius: 0px; word-wrap: normal; background: transparent;"><span class="hljs-function" style="box-sizing: border-box;"><span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">def</span> <span class="hljs-title" style="box-sizing: border-box;">__init__</span><span class="hljs-params" style="color: rgb(102, 0, 102); box-sizing: border-box;">(self,n_estimators=<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">10</span>,criterion=<span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">"gini"</span>,max_depth=None,min_samples_split=<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">2</span>,min_samples_leaf=<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">1</span>,min_weight_fraction_leaf=<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">0.</span>,max_features=<span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">"auto"</span>,max_leaf_nodes=None,bootstrap=True,oob_score=False,n_jobs=<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">1</span>,random_state=None,verbose=<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">0</span>,warm_start=False,
</span></span></code><p><code class="hljs python has-numbering" style="display: block; padding: 0px; color: inherit; box-sizing: border-box; font-family: 'Source Code Pro', monospace;font-size:undefined; white-space: pre; border-top-left-radius: 0px; border-top-right-radius: 0px; border-bottom-right-radius: 0px; border-bottom-left-radius: 0px; word-wrap: normal; background: transparent;"><span class="hljs-function" style="box-sizing: border-box;"><span class="hljs-params" style="color: rgb(102, 0, 102); box-sizing: border-box;">     class_weight=None)</span>:</span></code><code class="hljs python has-numbering" style="display: block; padding: 0px; color: inherit; box-sizing: border-box; font-family: 'Source Code Pro', monospace;font-size:undefined; white-space: pre; border-top-left-radius: 0px; border-top-right-radius: 0px; border-bottom-right-radius: 0px; border-bottom-left-radius: 0px; word-wrap: normal; background: transparent;"><span class="hljs-function" style="box-sizing: border-box;">
</span></code><code class="hljs python has-numbering" style="display: block; padding: 0px; color: inherit; box-sizing: border-box; font-family: 'Source Code Pro', monospace;font-size:undefined; white-space: pre; border-top-left-radius: 0px; border-top-right-radius: 0px; border-bottom-right-radius: 0px; border-bottom-left-radius: 0px; word-wrap: normal; background: transparent;"><span class="hljs-function" style="box-sizing: border-box;">
</span></code><code class="hljs python has-numbering" style="display: block; padding: 0px; color: inherit; box-sizing: border-box; font-family: 'Source Code Pro', monospace;font-size:undefined; white-space: pre; border-top-left-radius: 0px; border-top-right-radius: 0px; border-bottom-right-radius: 0px; border-bottom-left-radius: 0px; word-wrap: normal; background: transparent;"><span class="hljs-function" style="box-sizing: border-box;">http://www.360doc.com/content/16/0626/16/20558639_570898095.shtml
</span></code></p>
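To show how a few of these constructor arguments are passed in practice, here is a minimal sketch on a toy dataset. The four training points, the choice of `random_state=0`, and the variable names are assumptions for illustration only. `max_features` is left at the library default, since the `"auto"` value shown in the signature above may not be accepted by newer scikit-learn releases.

```python
from sklearn.ensemble import RandomForestClassifier

# Toy data (an assumption for illustration): the label equals the first feature.
X = [[0, 0], [1, 1], [0, 1], [1, 0]]
Y = [0, 1, 0, 1]

clf = RandomForestClassifier(
    n_estimators=10,      # number of trees in the forest
    criterion="gini",     # split quality measure
    max_depth=None,       # grow each tree until its leaves are pure
    min_samples_split=2,  # minimum samples needed to split a node
    bootstrap=True,       # each tree is fit on a bootstrap sample
    random_state=0,       # fixed seed so repeated runs give the same forest
)
clf.fit(X, Y)

# Class prediction and per-class probabilities for one new sample.
pred = clf.predict([[1, 1]])
proba = clf.predict_proba([[1, 1]])

# One importance value per input feature.
importances = clf.feature_importances_
print(pred, proba, importances)
```

After fitting, `clf.estimators_` holds the individual decision trees, so its length matches `n_estimators`.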
