TensorFlow.js: Activation Functions

Overview

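An activation function determines a neuron's final output. Suppose, for example, that a neuron's ideal output is 0 or 1 but its actual raw output is 0.85. The function that decides whether the output becomes 0 or 1 is the activation function. (Seen from another angle, some activation functions resemble the discretization of a continuous signal in digital signal processing.)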
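As a minimal sketch of the 0.85 example, here is a step-style activation in plain JavaScript (no TensorFlow.js involved). The function name `step` and the threshold of 0.5 are assumptions made for illustration; real networks usually use smoother functions such as the ones discussed later in this article.

```javascript
// Hypothetical helper: a step activation with an assumed threshold of 0.5.
function step(x, threshold = 0.5) {
  return x >= threshold ? 1 : 0;
}

const raw = 0.85;            // the neuron's raw output
const decided = step(raw);   // the activation decides between 0 and 1
console.log(decided);        // 1
```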

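An activation function sits either at the end of a network, where it adjusts the output, or between layers. This article covers activation functions used with a Dense layer.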

Image source:
https://cdn-images-1.medium.com/max/1400/0*44z992IXd9rqyIWk.png

Dense layer

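A Dense layer computes the following function:
output = activation(dot(input, kernel) + bias)
Here kernel is not a convolution filter kernel but the layer's weights matrix, so its size has to match the size of the input: its shape is [input size, units].
Seen through this formula, a Dense layer fits a linear (first-degree) equation in several variables. With a single input and a single unit it reduces to y = kx + b.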
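The formula can be traced by hand with a short plain-JavaScript sketch. `denseForward` is a hypothetical helper written for this illustration, not a TensorFlow.js API, and the weights and biases are made-up values; it assumes the kernel is stored as an array of rows with shape [input size, units].

```javascript
// Plain-JavaScript sketch of: output = activation(dot(input, kernel) + bias)
// denseForward is a hypothetical helper, not part of TensorFlow.js.
function denseForward(input, kernel, bias, activation = (v) => v) {
  const units = bias.length;
  const output = new Array(units).fill(0);
  for (let j = 0; j < units; j++) {
    let sum = bias[j];
    for (let i = 0; i < input.length; i++) {
      sum += input[i] * kernel[i][j]; // kernel shape: [input size, units]
    }
    output[j] = activation(sum);
  }
  return output;
}

// One input, one unit: y = kx + b with k = 2 and b = -1.
const y1 = denseForward([20], [[2]], [-1]);
console.log(y1); // [ 39 ]

// Two inputs, one unit: y = 1*x1 + 3*x2 + 0.5
const y2 = denseForward([2, 4], [[1], [3]], [0.5]);
console.log(y2); // [ 14.5 ]
```

The default activation here is the identity, which matches a Dense layer with no activation specified.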

tfjs-examples/getting-started:

async function run() {
  // Create a simple model.
  const model = tf.sequential();
  model.add(tf.layers.dense({units: 1, inputShape: [1]}));

  // Prepare the model for training: Specify the loss and the optimizer.
  model.compile({loss: 'meanSquaredError', optimizer: 'sgd'});

  /* Observed prediction for x = 20 with this data: 38.2320556640625
  // Generate some synthetic data for training. (y = 2x - 1)
  const xs = tf.tensor2d([-1, 0, 1, 2, 3, 4], [6, 1]);
  const ys = tf.tensor2d([-3, -1, 1, 3, 5, 7], [6, 1]);
  */

  // Observed prediction for x = 20 with this data: 19.857912063598633
  // Generate some synthetic data for training. (y = x)
  const xs = tf.tensor2d([-1, 0, 1, 2, 3, 4], [6, 1]);
  const ys = tf.tensor2d([-1, 0, 1, 2, 3, 4], [6, 1]);

  // Train the model using the data.
  await model.fit(xs, ys, {epochs: 250});

  // Use the model to do inference on a data point the model hasn't seen.
  // Should print approximately 20 for y = x (approximately 39 for y = 2x - 1).
  document.getElementById('micro-out-div').innerText =
      model.predict(tf.tensor2d([20], [1, 1])).dataSync();
}

run();

References:
https://keras.io/layers/core/#dense

linear

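linear does nothing to its input, and it is also the default. So if every value that reaches the activation is greater than 0, relu and linear give the same result.
When the input is 1x1, the code below can be used to try out activation functions such as linear, relu, and softmax.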

import * as tf from '@tensorflow/tfjs';

function createModel() {
  // Build and compile model.
  const model = tf.sequential();

  // Each comment lists the observed predictions after 200 epochs.
  // No activation: predict(5) = 8.4533968, predict(2) = 3.2494676
  // model.add(tf.layers.dense({units: 1, inputShape: [1]}));
  // relu: predict(5) = 8.5429592, predict(2) = 3.2085927
  // model.add(tf.layers.dense({units: 1, inputShape: [1], activation: 'relu'}));
  // softmax: predict(5) = 1, predict(2) = 1
  // model.add(tf.layers.dense({units: 1, inputShape: [1], activation: 'softmax'}));
  // linear: predict(5) = 8.4282207, predict(2) = 3.260958
  model.add(tf.layers.dense({units: 1, inputShape: [1], activation: 'linear'}));

  model.compile({optimizer: 'sgd', loss: 'meanSquaredError'});
  return model;
}

async function predict(model) {
  // Generate some synthetic data for training. (y = 2x - 1)
  const xs = tf.tensor2d([[1], [2], [3], [4]], [4, 1]);
  const ys = tf.tensor2d([[1], [3], [5], [7]], [4, 1]);

  // Train model with fit().
  await model.fit(xs, ys, {epochs: 200});

  // Run inference with predict().
  model.predict(tf.tensor2d([[5]], [1, 1])).print();
  model.predict(tf.tensor2d([[2]], [1, 1])).print();
}

function main() {
  const model = createModel();
  predict(model);
}

main();

relu

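relu is also simple: it turns every negative number into 0 and leaves positive numbers unchanged, that is, relu(x) = max(0, x). It suits data whose values are all positive, such as images.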
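The behavior is easy to check on plain numbers. The sketch below uses plain JavaScript rather than TensorFlow.js; `linear` and `relu` are stand-in functions defined here for illustration, and the sample values are arbitrary. It also shows the point made in the linear section: on values that are all positive, the two functions agree.

```javascript
// Stand-in definitions for illustration; these are not TensorFlow.js calls.
const linear = (x) => x;
const relu = (x) => Math.max(0, x);

// Negative values are clamped to 0, positive values pass through.
const mixed = [-2, -0.5, 0, 1.5, 3];
console.log(mixed.map(relu)); // [ 0, 0, 0, 1.5, 3 ]

// On strictly positive values, relu and linear give the same result.
const positive = [0.2, 1, 4];
const same = positive.every((x) => relu(x) === linear(x));
console.log(same); // true
```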

softmax

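Wikipedia describes softmax as follows: it normalizes a vector, emphasizing the largest value and suppressing the components that are far below the maximum. softmax normalizes across a layer's outputs, so if a layer that uses the softmax activation has only one output (units: 1, as in the 1x1 example above), its output is always 1, because there is only one number to normalize.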

The example from Wikipedia makes softmax easy to grasp:

import math
z = [1.0, 2.0, 3.0, 4.0, 1.0, 2.0, 3.0]
z_exp = [math.exp(i) for i in z]
print(z_exp)  # Result: [2.72, 7.39, 20.09, 54.6, 2.72, 7.39, 20.09]
sum_z_exp = sum(z_exp)
print(sum_z_exp)  # Result: 114.98
softmax = [round(i / sum_z_exp, 3) for i in z_exp]
print(softmax)  # Result: [0.024, 0.064, 0.175, 0.475, 0.024, 0.064, 0.175]
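For readers following along in JavaScript, the same calculation can be sketched without TensorFlow.js. The `softmax` function below is a hypothetical helper written for this illustration; subtracting the maximum before exponentiating is a common numerical-stability step that does not change the result. The last two lines show why a single-element input always yields 1.

```javascript
// Hypothetical helper for illustration; not the TensorFlow.js implementation.
function softmax(values) {
  const max = Math.max(...values);                    // for numerical stability
  const exps = values.map((v) => Math.exp(v - max));
  const total = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / total);
}

const probs = softmax([1.0, 2.0, 3.0, 4.0, 1.0, 2.0, 3.0]);
console.log(probs.map((p) => p.toFixed(3)));

// With a single element there is nothing to compare against,
// so the normalized value is always 1, whatever the input.
console.log(softmax([0.85])); // [ 1 ]
console.log(softmax([-42]));  // [ 1 ]
```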

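In http://cs231n.github.io/linear-classify/, however, the author presents softmax as a classifier. On a closer reading, the softmax in cs231n and softmax as an activation function are the same concept: both post-process the raw output to obtain the desired output.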

Image source: http://cs231n.github.io/assets/svmvssoftmax.png

softmax can also be applied directly to a Tensor:

import * as tf from '@tensorflow/tfjs';

function testSoftmax() {
  const a = tf.tensor2d([2, 4, 6, 1, 2, 3], [2, 3]);
  a.softmax().print();  // or tf.softmax(a)
}

function main() {
  testSoftmax();
}

main();

The output is:

Tensor
    [[0.0158762, 0.1173104, 0.8668135],
     [0.0900306, 0.2447284, 0.6652408]]

softmax can also be combined with a layer. The following example shows softmax used with a Dense layer:

import * as tf from '@tensorflow/tfjs';

function createModel() {
  // Build and compile model.
  const model = tf.sequential();
  model.add(tf.layers.dense({units: 2, inputShape: [2], activation: 'softmax'}));
  model.compile({optimizer: 'sgd', loss: 'meanSquaredError'});
  return model;
}

async function predict(model) {
  // Generate some synthetic data for training.
  const xs = tf.tensor2d([1, 2], [1, 2]);
  const ys = tf.tensor2d([0.2, 0.8], [1, 2]);

  // Train model with fit().
  await model.fit(xs, ys, {epochs: 200});

  // Run inference with predict().
  model.predict(tf.tensor2d([2, 3], [1, 2])).print();
  model.predict(tf.tensor2d([2, 1.5], [1, 2])).print();
}

function main() {
  const model = createModel();
  predict(model);
}

main();

Unlike the earlier examples, the input here has 2 elements. The output is:

Tensor
     [[0.1261015, 0.8738984],]
Tensor
     [[0.1033671, 0.8966329],]

References:
https://zh.wikipedia.org/wiki/Softmax函数
https://medium.com/@udemeudofia01/basic-overview-of-convolutional-neural-network-cnn-4fcc7dbb4f17
https://www.quora.com/What-is-activation-in-convolutional-neural-networks