https://blog.csdn.net/Linli522362242/article/details/126863601

Deep feed forward (DFF)

Deep Feedforward Networks are also called feedforward neural networks or Multilayer Perceptrons (MLPs).

The Multilayer Perceptron and Backpropagation (back-propagation, also known as the B-P network; the term can be used to refer to a type of neural-network algorithm)

Figure 10-7. Architecture of a Multilayer Perceptron with two inputs, one hidden layer of four neurons, and three output neurons (the bias neurons are shown here, but usually they are implicit)
     An MLP is composed of one (passthrough) input layer, one or more layers of TLUs (threshold logic units, also called linear threshold units, LTUs), called hidden layers, and one final layer of TLUs called the output layer (see Figure 10-7). The layers close to the input layer are usually called the lower layers, and the ones close to the outputs are usually called the upper layers. Every layer except the output layer includes a bias neuron and is fully connected to the next layer.
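To make this layer structure concrete, here is a minimal NumPy sketch of the forward pass through the MLP of Figure 10-7 (two inputs, one hidden layer of four TLUs, three output TLUs). The random weight values and the step activation are illustrative assumptions for the sketch, not trained parameters.

import numpy as np

def step(z):                          # threshold activation of a TLU
    return (z >= 0).astype(np.float64)

rng = np.random.default_rng(42)       # illustrative, untrained weights
W_h = rng.normal(size=(2, 4))         # input layer (2 units) -> hidden layer (4 TLUs)
b_h = np.zeros(4)                     # weights of the hidden layer's bias neuron
W_o = rng.normal(size=(4, 3))         # hidden layer (4 units) -> output layer (3 TLUs)
b_o = np.zeros(3)                     # weights of the output layer's bias inputs

X = np.array([[1.0, 0.0]])            # one sample with two input features
hidden = step(X @ W_h + b_h)          # every input unit feeds every hidden TLU
output = step(hidden @ W_o + b_o)     # every hidden unit feeds every output TLU
print(output)                         # shape (1, 3): one activation per output neuron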

######################################
     The signal flows only in one direction (from the inputs to the outputs), so this architecture is an example of a feedforward neural network (FNN or FFNN).

In their 1969 monograph titled Perceptrons, Marvin Minsky and Seymour Papert highlighted a number of serious weaknesses of Perceptrons, in particular the fact that they are incapable of solving some trivial problems (e.g., the Exclusive OR (XOR) classification problem; see the left side of Figure 10-6). Of course this is true of any other linear classification model as well (such as Logistic Regression classifiers), but researchers had expected much more from Perceptrons, and their disappointment was great: as a result, many researchers dropped connectionism altogether (i.e., the study of neural networks) in favor of higher-level problems such as logic, problem solving, and search.

The XOR function (“exclusive or”) is an operation on two binary values, $x_1$ and $x_2$. When exactly one of these binary values is equal to 1, the XOR function returns 1. Otherwise, it returns 0 (same inputs give 0, different inputs give 1). The XOR function provides the target function $y = f^*(x)$ that we want to learn. Our model provides a function $y = f(x; \theta)$, and our learning algorithm will adapt the parameters $\theta$ to make $f$ as similar as possible to $f^*$.
sigmoid: $\sigma(z) = \dfrac{1}{1 + e^{-z}}$
OR heaviside (step): $\mathrm{step}(z) = 1$ if $z \ge 0$, else $0$, where $z$ is the weighted sum of the inputs.

import matplotlib.pyplot as plt
import numpy as np

def sigmoid(z):
    return 1/(1+np.exp(-z))  # >0.5 ==> positive, <0.5 ==> negative

def heaviside(z):
    # if z>=0 ==> True ==> 1, otherwise ==> False ==> 0
    # arr = np.array([1,2,3,4,5])
    # (arr>=0).astype( arr.dtype ) ==> array([1, 1, 1, 1, 1])
    return (z>=0).astype(z.dtype)  # >=0 ==> class #1, <0 ==> class #0

def mlp_xor(x1, x2, activation=heaviside):
    #      activation( W^T*x )  <== output TLU
    #                         activation( W^T*x )           activation( W^T*x )  <== hidden TLUs
    return activation( -1*activation( x1+x2-1.5 ) + activation( x1+x2-0.5 ) - 0.5 )

x1s = np.linspace(-0.2, 1.2, 100)
x2s = np.linspace(-0.2, 1.2, 100)
x1, x2 = np.meshgrid(x1s,  # horizontal direction, filled row by row
                     x2s)  # vertical direction, filled column by column

z1 = mlp_xor(x1, x2, activation=heaviside)  # output (or prediction)
z2 = mlp_xor(x1, x2, activation=sigmoid)    # output (or prediction)

# from matplotlib.colors import ListedColormap
#              [  'b',     'g',   'r',   'c',       'm',      'y',   'k']
# colorTuple = ('blue', 'green', 'red', 'cyan', 'magenta', 'yellow', 'black')

plt.figure( figsize=(10,5) )

plt.subplot(121)
# OR cmap = ListedColormap( colorTuple[:3] )  # colormap object built from a list of colors
plt.contourf(x1, x2, z1, cmap='jet')
plt.plot([0,1], [0,1], "gs", markersize=20)
plt.plot([0,1], [1,0], "y^", markersize=20)
plt.title("Activation function: heaviside", fontsize=14)
plt.grid(True)

plt.subplot(122)
plt.contourf(x1, x2, z2, cmap='jet')
plt.plot([0,1], [0,1], "gs", markersize=20)
plt.plot([0,1], [1,0], "y^", markersize=20)
plt.title("Activation function: sigmoid", fontsize=14)
plt.show()

However, it turns out that some of the limitations of Perceptrons can be eliminated by stacking multiple Perceptrons. The resulting ANN is called a Multi-Layer Perceptron (MLP). In particular, an MLP can solve the XOR problem, as you can verify by computing the output of the MLP represented on the right of Figure 10-6, for each combination of inputs: with inputs (0, 0) or (1, 1) the network outputs 0, and with inputs (0, 1) or (1, 0) it outputs 1. All connections have a weight equal to 1, except the four connections where the weight is shown. Try verifying that this network indeed solves the XOR problem!
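As a quick way to do that verification in code (reusing mlp_xor and heaviside from the snippet above), the four binary input combinations can be pushed through the network; a minimal sketch:

# Quick check: feed the four binary input combinations through the XOR MLP above
x1_test = np.array([0., 0., 1., 1.])
x2_test = np.array([0., 1., 0., 1.])
print(mlp_xor(x1_test, x2_test, activation=heaviside))  # expected: [0. 1. 1. 0.]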
######################################


$a_i^{(l)}$: the $i$th activation unit in the $l$th layer (for a given sample). To make the notation more intuitive, we will use the $(in)$ superscript for the input layer, the $(h)$ superscript for the hidden layer, and the $(out)$ superscript for the output layer.

For instance,

  • $a_i^{(in)}$ refers to the $i$th unit in the input layer,
  • $a_i^{(h)}$ refers to the $i$th unit in the hidden layer, and
  • $a_i^{(out)}$ refers to the $i$th unit in the output layer.
  • Here, the activation units $a_0^{(in)}$ and $a_0^{(h)}$ are the bias units, which we set equal to 1.

The activation of the units in the input layer is just its input plus the bias unit:

$\mathbf{a}^{(in)} = \begin{bmatrix} a_0^{(in)} \\ a_1^{(in)} \\ \vdots \\ a_m^{(in)} \end{bmatrix} = \begin{bmatrix} 1 \\ x_1^{(in)} \\ \vdots \\ x_m^{(in)} \end{bmatrix}$

     Each unit in layer $l$ is connected to all units in the next layer, $l+1$, via a weight coefficient.
     For example, the connection between the $k$th unit in layer $l$ and the $j$th unit in layer $l+1$ will be written as $w_{j,k}^{(l)}$. Referring back to the previous figure, we denote the weight matrix that connects the input to the hidden layer as $\mathbf{W}^{(h)}$, and we write the matrix that connects the hidden layer to the output layer as $\mathbf{W}^{(out)}$.
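As a rough illustration of this notation in code (the layer sizes, random weights, and sigmoid activation here are arbitrary assumptions for the sketch; the bias units are folded into column 0 of each weight matrix, matching $a_0^{(in)} = a_0^{(h)} = 1$ above):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

n_features, n_hidden, n_output = 4, 3, 2       # illustrative layer sizes

x = np.array([0.5, -1.0, 2.0, 0.1])            # one sample
a_in = np.concatenate(([1.0], x))              # prepend the bias unit a_0^(in) = 1

# W_h[j, k] = w_{j,k}^{(h)}: weight connecting unit k of the input layer
# to unit j of the hidden layer (column 0 holds the bias weights)
W_h   = np.random.randn(n_hidden, n_features + 1)
W_out = np.random.randn(n_output, n_hidden + 1)

z_h   = W_h @ a_in                             # net input of the hidden layer
a_h   = np.concatenate(([1.0], sigmoid(z_h)))  # prepend the bias unit a_0^(h) = 1
z_out = W_out @ a_h                            # net input of the output layer
a_out = sigmoid(z_out)
print(a_out)                                   # one activation per output unit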
Let's run through this algorithm in a bit more detail: