[Translated from: How to Manually Optimize Neural Network Models]

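[Note: I am a big fan of Jason Brownlee PhD's articles, so in my spare time I translate some of them and work through the examples. This post is a record of that practice, and I hope it helps anyone who needs it.]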











# define a binary classification dataset
from sklearn.datasets import make_classification
# define dataset
X, y = make_classification(n_samples=1000, n_features=5, n_informative=2, n_redundant=1, random_state=1)
# summarize the shape of the dataset
print(X.shape, y.shape)


(1000, 5) (1000,)

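Next, we need to define a Perceptron model. The Perceptron model has a single node with one input weight for each column in the dataset. Each input is multiplied by its corresponding weight to give a weighted sum, and a bias weight is then added, like the intercept coefficient in a regression model. This weighted sum is called the activation. Finally, the activation is interpreted and used to predict the class label: 1 for a positive activation and 0 for a negative activation.

Before optimizing the model weights, we must develop the model and build confidence in how it works. Let's start by defining a function that interprets the activation of the model. This is called the activation function or the transfer function. The latter name is more traditional and is my preference. The transfer() function below takes the activation of the model and returns a class label: class=1 for a positive or zero activation and class=0 for a negative activation. This is called a step transfer function.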

# transfer function
def transfer(activation):
    if activation >= 0.0:
        return 1
    return 0

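Next, we can develop a function that calculates the activation of the model for a given input row of data from the dataset. This function takes the row of data and the weights for the model and calculates the weighted sum of the inputs plus the bias weight. The activate() function below implements this.

Note: we are intentionally using simple Python lists and an imperative programming style instead of NumPy arrays or list comprehensions, to make the code more readable for Python beginners. Feel free to optimize it and post your code in the comments below.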

# activation function
def activate(row, weights):
    # add the bias, the last weight
    activation = weights[-1]
    # add the weighted input
    for i in range(len(row)):
        activation += weights[i] * row[i]
    return activation

Next, we can use the activate() and transfer() functions together to generate a prediction for a given row of data. The predict_row() function below implements this.

# use model weights to predict 0 or 1 for a given row of data
def predict_row(row, weights):
    # activate for input
    activation = activate(row, weights)
    # transfer for activation
    return transfer(activation)

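Next, we can call the predict_row() function for each row in a given dataset. The predict_dataset() function below implements this. Again, we are intentionally using a simple imperative coding style for readability instead of list comprehensions.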

# use model weights to generate predictions for a dataset of rows
def predict_dataset(X, weights):
    yhats = list()
    for row in X:
        yhat = predict_row(row, weights)
        yhats.append(yhat)
    return yhats
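For readers who are comfortable with NumPy, the same prediction can be written without explicit loops. The following is only a sketch of that idea, not part of the original tutorial: the name predict_dataset_vectorized and the small demo arrays are made up for illustration, and it assumes the same weight layout as above (input weights first, bias last).

```python
import numpy as np

def predict_dataset_vectorized(X, weights):
    """Vectorized step-function perceptron; the last weight is the bias."""
    X = np.asarray(X, dtype=float)
    weights = np.asarray(weights, dtype=float)
    # weighted sum of the inputs plus the bias, for every row at once
    activations = X @ weights[:-1] + weights[-1]
    # step transfer: class 1 for a zero or positive activation, else class 0
    return (activations >= 0.0).astype(int)

# hypothetical demo data: three rows with two inputs each
X_demo = np.array([[1.0, 2.0], [-1.0, -2.0], [0.0, 0.0]])
w_demo = np.array([0.5, 0.25, -0.1])
preds = predict_dataset_vectorized(X_demo, w_demo)
print(preds)
```

The loop-based version in the article is easier to step through; the vectorized form mainly pays off in speed when the objective function is evaluated thousands of times during a search.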


# define dataset
X, y = make_classification(n_samples=1000, n_features=5, n_informative=2, n_redundant=1, random_state=1)
# determine the number of weights
n_weights = X.shape[1] + 1
# generate random weights
weights = rand(n_weights)


# generate predictions for dataset
yhat = predict_dataset(X, weights)


# calculate accuracy
score = accuracy_score(y, yhat)

We can tie all of this together and demonstrate our simple Perceptron model for classification. The complete example is listed below.

# simple perceptron model for binary classification
from numpy.random import rand
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score

# transfer function
def transfer(activation):
    if activation >= 0.0:
        return 1
    return 0

# activation function
def activate(row, weights):
    # add the bias, the last weight
    activation = weights[-1]
    # add the weighted input
    for i in range(len(row)):
        activation += weights[i] * row[i]
    return activation

# use model weights to predict 0 or 1 for a given row of data
def predict_row(row, weights):
    # activate for input
    activation = activate(row, weights)
    # transfer for activation
    return transfer(activation)

# use model weights to generate predictions for a dataset of rows
def predict_dataset(X, weights):
    yhats = list()
    for row in X:
        yhat = predict_row(row, weights)
        yhats.append(yhat)
    return yhats

# define dataset
X, y = make_classification(n_samples=1000, n_features=5, n_informative=2, n_redundant=1, random_state=1)
# determine the number of weights
n_weights = X.shape[1] + 1
# generate random weights
weights = rand(n_weights)
# generate predictions for dataset
yhat = predict_dataset(X, weights)
# calculate accuracy
score = accuracy_score(y, yhat)


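Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and comparing the average outcome.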
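One way to act on the advice about repeated runs is to evaluate the random-weight model several times and report the mean and standard deviation of the scores. The sketch below is an illustration under assumptions: the helper names, the seed values, and the small stand-in dataset are all hypothetical and are not the article's make_classification data.

```python
import numpy as np

def random_weight_accuracy(X, y, rng):
    """Accuracy of a step-function perceptron with uniform random weights in [0, 1)."""
    weights = rng.random(X.shape[1] + 1)
    activations = X @ weights[:-1] + weights[-1]
    yhat = (activations >= 0.0).astype(int)
    return float(np.mean(yhat == y))

def repeated_scores(X, y, n_repeats=10, seed=1):
    """Evaluate the random-weight model n_repeats times using one seeded generator."""
    rng = np.random.default_rng(seed)
    return [random_weight_accuracy(X, y, rng) for _ in range(n_repeats)]

# stand-in dataset (hypothetical): the label depends only on the sign of the first column
data_rng = np.random.default_rng(0)
X_demo = data_rng.normal(size=(200, 5))
y_demo = (X_demo[:, 0] > 0).astype(int)

scores = repeated_scores(X_demo, y_demo)
print('mean=%.3f std=%.3f' % (np.mean(scores), np.std(scores)))
```

Fixing the seed makes the whole set of repeats reproducible, while the spread across repeats shows how much a single run can mislead.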




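First, we need to split the dataset into train and test sets. It is important to hold back some data that is not used in optimizing the model, so that we can prepare a reasonable estimate of the model's performance when it is used to make predictions on new data.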


# split into train test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33)


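The optimization algorithm requires an objective function to optimize. It must take a set of weights and return a score to be minimized or maximized, where a better score corresponds to a better model. Here the score is classification accuracy, so it is maximized.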



# objective function
def objective(X, y, weights):
    # generate predictions for dataset
    yhat = predict_dataset(X, weights)
    # calculate accuracy
    score = accuracy_score(y, yhat)
    return score


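The algorithm requires an initial solution (e.g. random weights) and iteratively makes small changes to the solution, checking whether each change results in a better performing model. The amount of change made to the current solution is controlled by a step_size hyperparameter. This process continues for a fixed number of iterations, also provided as a hyperparameter.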


# hill climbing local search algorithm
def hillclimbing(X, y, objective, solution, n_iter, step_size):
    # evaluate the initial point
    solution_eval = objective(X, y, solution)
    # run the hill climb
    for i in range(n_iter):
        # take a step
        candidate = solution + randn(len(solution)) * step_size
        # evaluate candidate point
        candidate_eval = objective(X, y, candidate)
        # check if we should keep the new point
        if candidate_eval >= solution_eval:
            # store the new point
            solution, solution_eval = candidate, candidate_eval
            # report progress
            print('>%d %.5f' % (i, solution_eval))
    return [solution, solution_eval]
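To see the "keep the candidate if it is at least as good" rule in isolation, it can help to run the same loop on a problem whose answer is known in advance. The sketch below is not from the article: toy_objective, toy_hillclimbing, and the chosen start, step size, and seed are all assumptions made for illustration. It maximizes a one-dimensional function with a single peak at x = 3.0.

```python
import random

def toy_objective(x):
    # single peak at x = 3.0 with a maximum value of 0.0
    return -(x - 3.0) ** 2

def toy_hillclimbing(objective, start, n_iter, step_size, seed=1):
    rng = random.Random(seed)
    solution, solution_eval = start, objective(start)
    for _ in range(n_iter):
        # take a Gaussian step scaled by step_size
        candidate = solution + rng.gauss(0.0, 1.0) * step_size
        candidate_eval = objective(candidate)
        # keep the candidate only if it is at least as good
        if candidate_eval >= solution_eval:
            solution, solution_eval = candidate, candidate_eval
    return solution, solution_eval

best_x, best_eval = toy_hillclimbing(toy_objective, start=0.0, n_iter=2000, step_size=0.1)
print('x=%.3f f(x)=%.5f' % (best_x, best_eval))
```

Because worse candidates are never accepted, the score can only stay flat or improve. That is also why the search can stall on a plateau or a local optimum when the objective is less well behaved than this one.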


# define the total iterations
n_iter = 1000
# define the maximum step size
step_size = 0.05
# determine the number of weights
n_weights = X.shape[1] + 1
# define the initial solution
solution = rand(n_weights)
# perform the hill climbing search
weights, score = hillclimbing(X_train, y_train, objective, solution, n_iter, step_size)
print('f(%s) = %f' % (weights, score))


# generate predictions for the test dataset
yhat = predict_dataset(X_test, weights)
# calculate accuracy
score = accuracy_score(y_test, yhat)
print('Test Accuracy: %.5f' % (score * 100))


# hill climbing to optimize weights of a perceptron model for classification
from numpy import asarray
from numpy.random import randn
from numpy.random import rand
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# transfer function
def transfer(activation):
    if activation >= 0.0:
        return 1
    return 0

# activation function
def activate(row, weights):
    # add the bias, the last weight
    activation = weights[-1]
    # add the weighted input
    for i in range(len(row)):
        activation += weights[i] * row[i]
    return activation

# use model weights to predict 0 or 1 for a given row of data
def predict_row(row, weights):
    # activate for input
    activation = activate(row, weights)
    # transfer for activation
    return transfer(activation)

# use model weights to generate predictions for a dataset of rows
def predict_dataset(X, weights):
    yhats = list()
    for row in X:
        yhat = predict_row(row, weights)
        yhats.append(yhat)
    return yhats

# objective function
def objective(X, y, weights):
    # generate predictions for dataset
    yhat = predict_dataset(X, weights)
    # calculate accuracy
    score = accuracy_score(y, yhat)
    return score

# hill climbing local search algorithm
def hillclimbing(X, y, objective, solution, n_iter, step_size):
    # evaluate the initial point
    solution_eval = objective(X, y, solution)
    # run the hill climb
    for i in range(n_iter):
        # take a step
        candidate = solution + randn(len(solution)) * step_size
        # evaluate candidate point
        candidate_eval = objective(X, y, candidate)
        # check if we should keep the new point
        if candidate_eval >= solution_eval:
            # store the new point
            solution, solution_eval = candidate, candidate_eval
            # report progress
            print('>%d %.5f' % (i, solution_eval))
    return [solution, solution_eval]

# define dataset
X, y = make_classification(n_samples=1000, n_features=5, n_informative=2, n_redundant=1, random_state=1)
# split into train test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33)
# define the total iterations
n_iter = 1000
# define the maximum step size
step_size = 0.05
# determine the number of weights
n_weights = X.shape[1] + 1
# define the initial solution
solution = rand(n_weights)
# perform the hill climbing search
weights, score = hillclimbing(X_train, y_train, objective, solution, n_iter, step_size)
print('f(%s) = %f' % (weights, score))
# generate predictions for the test dataset
yhat = predict_dataset(X_test, weights)
# calculate accuracy
score = accuracy_score(y_test, yhat)
print('Test Accuracy: %.5f' % (score * 100))



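Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and comparing the average outcome.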


>111 0.88060
>119 0.88060
>126 0.88209
>134 0.88209
>205 0.88209
>262 0.88209
>280 0.88209
>293 0.88209
>297 0.88209
>336 0.88209
>373 0.88209
>437 0.88358
>463 0.88507
>630 0.88507
>701 0.88507
f([ 0.0097317 0.13818088 1.17634326 -0.04296336 0.00485813 -0.14767616]) = 0.885075
Test Accuracy: 81.81818



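A Multilayer Perceptron (MLP) model is a neural network with one or more layers, where each layer has one or more nodes. It is an extension of the Perceptron model and is perhaps the most widely used neural network (deep learning) model. In this section, we will build on what we learned in the previous section to optimize the weights of MLP models with an arbitrary number of layers and nodes per layer.

First, we will develop the model and test it with random weights, then use stochastic hill climbing to optimize the model weights. When using MLPs for binary classification, it is common to use a sigmoid transfer function (also called the logistic function) instead of the step transfer function used in the Perceptron. This function outputs a real value between 0 and 1 that represents a binomial probability distribution, e.g. the probability that an example belongs to class=1. The transfer() function below implements this.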

# transfer function
def transfer(activation):
    # sigmoid transfer function
    return 1.0 / (1.0 + exp(-activation))
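One practical caveat with this form of the sigmoid: math.exp() raises OverflowError when its argument is very large, which happens here for strongly negative activations (roughly below -709). With random steps on small weights this is unlikely, but it can occur. The sketch below shows one common workaround, not the article's code; the name stable_transfer is made up for illustration.

```python
from math import exp

def stable_transfer(activation):
    """Sigmoid that never calls exp() on a large positive argument."""
    if activation >= 0.0:
        return 1.0 / (1.0 + exp(-activation))
    # for negative activations use the algebraically equivalent form
    z = exp(activation)
    return z / (1.0 + z)

print(stable_transfer(0.0))
print(stable_transfer(-1000.0))
print(stable_transfer(1000.0))
```

Both branches compute the same mathematical function; the split only decides which side of the fraction the exponential sits on, so that exp() is always called with a non-positive argument.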

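We can use the same activate() function from the previous section. Here, we will use it to calculate the activation for each node in a given layer. The predict_row() function must be replaced with a more elaborate version. The function takes a row of data and the network and returns the output of the network.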

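We will define our network as a list of lists. Each layer will be a list of nodes, and each node will be a list or array of weights. To calculate the prediction of the network, we simply enumerate the layers, then enumerate the nodes, then calculate the activation and transfer output for each node. In this case, we will use the same transfer function for all nodes in the network, although this does not have to be the case. For networks with more than one layer, the output from the previous layer is used as input to each node in the next layer. The output from the final layer in the network is then returned. The predict_row() function below implements this.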

# activation function for a network
def predict_row(row, network):
    inputs = row
    # enumerate the layers in the network from input to output
    for layer in network:
        new_inputs = list()
        # enumerate nodes in the layer
        for node in layer:
            # activate the node
            activation = activate(inputs, node)
            # transfer activation
            output = transfer(activation)
            # store output
            new_inputs.append(output)
        # output from this layer is input to the next layer
        inputs = new_inputs
    return inputs[0]
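A forward pass is easier to trust after checking it by hand on a tiny network. The sketch below is an illustration with hand-picked weights (the network, the input row, and the name tiny_network are all hypothetical). It repeats the helper functions in condensed form so that it runs on its own.

```python
from math import exp

def transfer(activation):
    # sigmoid transfer function
    return 1.0 / (1.0 + exp(-activation))

def activate(row, weights):
    # bias is the last weight, followed by the weighted inputs
    activation = weights[-1]
    for i in range(len(row)):
        activation += weights[i] * row[i]
    return activation

def predict_row(row, network):
    inputs = row
    for layer in network:
        # the outputs of this layer become the inputs of the next one
        inputs = [transfer(activate(inputs, node)) for node in layer]
    return inputs[0]

# two hidden nodes with all-zero weights: each outputs sigmoid(0) = 0.5
hidden = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
# output node: 1.0 * 0.5 + 1.0 * 0.5 - 1.0 = 0.0, so the output is sigmoid(0) = 0.5
output = [[1.0, 1.0, -1.0]]
tiny_network = [hidden, output]

prediction = predict_row([3.0, -2.0], tiny_network)
print(prediction)
```

Note that each node in the output layer has one weight per hidden node plus a bias, which is why the output node has three weights here rather than one per dataset column.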


# create a one node network
node = rand(n_inputs + 1)
layer = [node]
network = [layer]

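Although it has a sigmoid transfer function, this is really just a Perceptron. Pretty boring. Let's define an MLP with one hidden layer and one output layer. The first hidden layer will have 10 nodes, and each node will take the input pattern from the dataset (e.g. five inputs). The output layer will have a single node that takes its inputs from the outputs of the first hidden layer and then outputs a prediction.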

# one hidden layer and an output layer
n_hidden = 10
hidden1 = [rand(n_inputs + 1) for _ in range(n_hidden)]
output1 = [rand(n_hidden + 1)]
network = [hidden1, output1]
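It can be useful to know how many weights the search has to tune, since every one of them is a dimension of the search space. The helper below is a small sketch that is not part of the article (count_weights is a made-up name); it uses zero-filled lists in place of random weights so that it runs without NumPy.

```python
def count_weights(network):
    """Total number of weights, including one bias per node."""
    return sum(len(node) for layer in network for node in layer)

n_inputs, n_hidden = 5, 10
# same shape as the network above, but filled with zeros for illustration
hidden1 = [[0.0] * (n_inputs + 1) for _ in range(n_hidden)]
output1 = [[0.0] * (n_hidden + 1)]
network = [hidden1, output1]

n_total = count_weights(network)
print(n_total)  # 10 * (5 + 1) + (10 + 1) = 71
```

Compared with the six weights of the Perceptron, the hill climber now has to search a 71-dimensional space.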


# generate predictions for dataset
yhat = predict_dataset(X, network)


# round the predictions
yhat = [round(y) for y in yhat]
# calculate accuracy
score = accuracy_score(y, yhat)


# develop an mlp model for classification
from math import exp
from numpy.random import rand
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score

# transfer function
def transfer(activation):
    # sigmoid transfer function
    return 1.0 / (1.0 + exp(-activation))

# activation function
def activate(row, weights):
    # add the bias, the last weight
    activation = weights[-1]
    # add the weighted input
    for i in range(len(row)):
        activation += weights[i] * row[i]
    return activation

# activation function for a network
def predict_row(row, network):
    inputs = row
    # enumerate the layers in the network from input to output
    for layer in network:
        new_inputs = list()
        # enumerate nodes in the layer
        for node in layer:
            # activate the node
            activation = activate(inputs, node)
            # transfer activation
            output = transfer(activation)
            # store output
            new_inputs.append(output)
        # output from this layer is input to the next layer
        inputs = new_inputs
    return inputs[0]

# use model weights to generate predictions for a dataset of rows
def predict_dataset(X, network):
    yhats = list()
    for row in X:
        yhat = predict_row(row, network)
        yhats.append(yhat)
    return yhats

# define dataset
X, y = make_classification(n_samples=1000, n_features=5, n_informative=2, n_redundant=1, random_state=1)
# determine the number of inputs
n_inputs = X.shape[1]
# one hidden layer and an output layer
n_hidden = 10
hidden1 = [rand(n_inputs + 1) for _ in range(n_hidden)]
output1 = [rand(n_hidden + 1)]
network = [hidden1, output1]
# generate predictions for dataset
yhat = predict_dataset(X, network)
# round the predictions
yhat = [round(y) for y in yhat]
# calculate accuracy
score = accuracy_score(y, yhat)


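Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and comparing the average outcome.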




# take a step in the search space
def step(network, step_size):
    new_net = list()
    # enumerate layers in the network
    for layer in network:
        new_layer = list()
        # enumerate nodes in this layer
        for node in layer:
            # mutate the node
            new_node = node.copy() + randn(len(node)) * step_size
            # store node in layer
            new_layer.append(new_node)
        # store layer in network
        new_net.append(new_layer)
    return new_net

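Modifying all the weights in the network is aggressive. A less aggressive step in the search space might be to make a small change to a subset of the weights in the model, perhaps controlled by a hyperparameter. This is left as an extension. We can then call this new step() function from the hillclimbing() function.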
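As one possible take on the extension mentioned above, the sketch below perturbs each weight only with some probability. It is a guess at how such a step could look, not the author's implementation: the function name step_subset, the p_mutate hyperparameter, and the explicit rng argument are all assumptions introduced here.

```python
import numpy as np

def step_subset(network, step_size, p_mutate, rng):
    """Return a copy of the network where each weight is perturbed with probability p_mutate."""
    new_net = []
    for layer in network:
        new_layer = []
        for node in layer:
            node = np.asarray(node, dtype=float)
            # choose which weights of this node to perturb
            mask = rng.random(len(node)) < p_mutate
            # Gaussian noise scaled by step_size, applied only where mask is True
            noise = rng.standard_normal(len(node)) * step_size
            new_layer.append(node + mask * noise)
        new_net.append(new_layer)
    return new_net

rng = np.random.default_rng(1)
demo_net = [[np.zeros(3), np.zeros(3)], [np.zeros(3)]]
unchanged = step_subset(demo_net, 0.1, 0.0, rng)
all_changed = step_subset(demo_net, 0.1, 1.0, rng)
print(unchanged[0][0], all_changed[0][0])
```

Setting p_mutate to 1.0 recovers the behaviour of the step() function above, while smaller values change fewer weights per iteration. Whether that helps on this dataset would need to be tested.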

# hill climbing local search algorithm
def hillclimbing(X, y, objective, solution, n_iter, step_size):
    # evaluate the initial point
    solution_eval = objective(X, y, solution)
    # run the hill climb
    for i in range(n_iter):
        # take a step
        candidate = step(solution, step_size)
        # evaluate candidate point
        candidate_eval = objective(X, y, candidate)
        # check if we should keep the new point
        if candidate_eval >= solution_eval:
            # store the new point
            solution, solution_eval = candidate, candidate_eval
            # report progress
            print('>%d %f' % (i, solution_eval))
    return [solution, solution_eval]


# stochastic hill climbing to optimize a multilayer perceptron for classification
from math import exp
from numpy.random import randn
from numpy.random import rand
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# transfer function
def transfer(activation):
    # sigmoid transfer function
    return 1.0 / (1.0 + exp(-activation))

# activation function
def activate(row, weights):
    # add the bias, the last weight
    activation = weights[-1]
    # add the weighted input
    for i in range(len(row)):
        activation += weights[i] * row[i]
    return activation

# activation function for a network
def predict_row(row, network):
    inputs = row
    # enumerate the layers in the network from input to output
    for layer in network:
        new_inputs = list()
        # enumerate nodes in the layer
        for node in layer:
            # activate the node
            activation = activate(inputs, node)
            # transfer activation
            output = transfer(activation)
            # store output
            new_inputs.append(output)
        # output from this layer is input to the next layer
        inputs = new_inputs
    return inputs[0]

# use model weights to generate predictions for a dataset of rows
def predict_dataset(X, network):
    yhats = list()
    for row in X:
        yhat = predict_row(row, network)
        yhats.append(yhat)
    return yhats

# objective function
def objective(X, y, network):
    # generate predictions for dataset
    yhat = predict_dataset(X, network)
    # round the predictions
    yhat = [round(y) for y in yhat]
    # calculate accuracy
    score = accuracy_score(y, yhat)
    return score

# take a step in the search space
def step(network, step_size):
    new_net = list()
    # enumerate layers in the network
    for layer in network:
        new_layer = list()
        # enumerate nodes in this layer
        for node in layer:
            # mutate the node
            new_node = node.copy() + randn(len(node)) * step_size
            # store node in layer
            new_layer.append(new_node)
        # store layer in network
        new_net.append(new_layer)
    return new_net

# hill climbing local search algorithm
def hillclimbing(X, y, objective, solution, n_iter, step_size):
    # evaluate the initial point
    solution_eval = objective(X, y, solution)
    # run the hill climb
    for i in range(n_iter):
        # take a step
        candidate = step(solution, step_size)
        # evaluate candidate point
        candidate_eval = objective(X, y, candidate)
        # check if we should keep the new point
        if candidate_eval >= solution_eval:
            # store the new point
            solution, solution_eval = candidate, candidate_eval
            # report progress
            print('>%d %f' % (i, solution_eval))
    return [solution, solution_eval]

# define dataset
X, y = make_classification(n_samples=1000, n_features=5, n_informative=2, n_redundant=1, random_state=1)
# split into train test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33)
# define the total iterations
n_iter = 1000
# define the maximum step size
step_size = 0.1
# determine the number of inputs
n_inputs = X.shape[1]
# one hidden layer and an output layer
n_hidden = 10
hidden1 = [rand(n_inputs + 1) for _ in range(n_hidden)]
output1 = [rand(n_hidden + 1)]
network = [hidden1, output1]
# perform the hill climbing search
network, score = hillclimbing(X_train, y_train, objective, network, n_iter, step_size)
print('Best: %f' % (score))
# generate predictions for the test dataset
yhat = predict_dataset(X_test, network)
# round the predictions
yhat = [round(y) for y in yhat]
# calculate accuracy
score = accuracy_score(y_test, yhat)
print('Test Accuracy: %.5f' % (score * 100))



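Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and comparing the average outcome.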


>55 0.755224
>56 0.765672
>59 0.794030
>66 0.805970
>77 0.835821
>120 0.838806
>165 0.840299
>188 0.841791
>218 0.846269
>232 0.852239
>237 0.852239
>239 0.855224
>292 0.867164
>368 0.868657
>823 0.868657
>852 0.871642
>889 0.871642
>892 0.871642
>992 0.873134
Best: 0.873134
Test Accuracy: 85.15152


