Writing a Deep Learning Library from Scratch (4): Learning Eigen::Tensor and Refactoring the Code

Blog: http://blog.csdn.net/hjimce

Weibo: 黄锦池-hjimce  QQ: 1393852684

I. Constructor (1), resizable tensor: class Tensor<data_type, rank>

// Create a tensor of rank 3 of sizes 2, 3, 4.  This tensor owns
// memory to hold 24 floating point values (24 = 2 x 3 x 4).
Tensor<float, 3> t_3d(2, 3, 4);  // a rank-3 float tensor whose dimensions have lengths (2, 3, 4)
// Resize t_3d by assigning a tensor of different sizes, but same rank.
t_3d = Tensor<float, 3>(3, 4, 3);

II. Constructor (2), fixed-size tensor: class TensorFixedSize<data_type, Sizes<size0, size1, ...>>

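Here the sizes must be fixed when the code is written. They cannot be given by variables, because the shape is baked in at compile time. Compared with the resizable Tensor, a fixed-size tensor is faster to compute with.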

// Create a 4 x 3 tensor of floats.
TensorFixedSize<float, Sizes<4, 3>> t_4x3;

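III. Constructor (3), mapping existing data: TensorMap<Tensor<data_type, rank>>(data, size0, size1, ...). Here data is the address of the array that holds the values, (size0, size1, ...) are the lengths of each dimension, and rank is the number of dimensions.

Note that this constructor does not make a copy of data in memory. It only maps the pointer, so once constructed, the size of the tensor cannot be changed either.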

// Map a tensor of ints on top of stack-allocated storage.
int storage[128];  // 2 x 4 x 2 x 8 = 128
TensorMap<Tensor<int, 4>> t_4d(storage, 2, 4, 2, 8);  // a rank-4 int tensor of sizes (2, 4, 2, 8) mapped onto storage; no data is copied
// The same storage can be viewed as a different tensor.
// You can also pass the sizes as an array.
TensorMap<Tensor<int, 2>> t_2d(storage, 16, 8);
// You can also map fixed-size tensors.  Here we get a 1d view of
// the 2d fixed-size tensor.
TensorFixedSize<float, Sizes<4, 3>> t_4x3;
TensorMap<Tensor<float, 1>> t_12(t_4x3.data(), 12);

IV. Element access: <data_type> tensor(index0, index1...). This one is simple: index directly with parentheses.

// Set the value of the element at position (0, 1, 0);
Tensor<float, 3> t_3d(2, 3, 4);
t_3d(0, 1, 0) = 12.0f;
// Initialize all elements to random values.
for (int i = 0; i < 2; ++i) {
  for (int j = 0; j < 3; ++j) {
    for (int k = 0; k < 4; ++k) {
      t_3d(i, j, k) = ...some random value...;
    }
  }
}
// Print elements of a tensor.
for (int i = 0; i < 2; ++i) {
  LOG(INFO) << t_3d(i, 0, 0);
}

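V. Special behavior of auto: a variable declared with auto holds an unevaluated expression, so it cannot be used to read element values. Evaluation is deferred, which resembles symbolic programming in common deep learning libraries. For example: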

Tensor<float, 3> t3 = t1 + t2;
auto t4 = t1 + t2;

The values of t4 have not actually been computed, and no memory holds them. So if we try to print a specific value of t4:

Tensor<float, 3> t3 = t1 + t2;
cout << t3(0, 0, 0);  // OK prints the value of t1(0, 0, 0) + t2(0, 0, 0)
auto t4 = t1 + t2;
cout << t4(0, 0, 0);  // Compilation error!

we get a compilation error. To obtain the actual values of t4, assign it to a variable with a concrete tensor type, in the same way t3 is defined.

auto t4 = t1 + t2;
Tensor<float, 3> result = t4;  // Could also be: result(t4); this yields the values of t4
cout << result(0, 0, 0);

So if you want the concrete result only at the end of a series of computations, you can use auto:

// One way to compute exp((t1 + t2) * 0.2f);
auto t3 = t1 + t2;
auto t4 = t3 * 0.2f;
auto t5 = t4.exp();
Tensor<float, 3> result = t5;
// Another way, exactly as efficient as the previous one:
Tensor<float, 3> result = ((t1 + t2) * 0.2f).exp();

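VI. Efficiency of auto expressions:

With lazy expressions, intermediate results are not computed, which is certainly more convenient to write. Sometimes this improves efficiency and sometimes it reduces it, depending on the expression. If you want an intermediate result to be computed, the following methods are available: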

Assignment to a Tensor, TensorFixedSize, or TensorMap.
Use of the eval() method.
Assignment to a TensorRef.

(1)Assigning to a Tensor, TensorFixedSize, or TensorMap.

If the size of the final output is already known, using TensorFixedSize is recommended because it is more efficient. For example:

auto t3 = t1 + t2;             // t3 is an Operation.
auto t4 = t3 * 0.2f;           // t4 is an Operation.
auto t5 = t4.exp();            // t5 is an Operation.
Tensor<float, 3> result = t5;  // The operations are evaluated.

The last statement should be changed to the following:

TensorFixedSize<float, Sizes<4, 4, 2>> result = t5;

(2) Using eval() to compute an intermediate result is sometimes more efficient. For example, the following is relatively inefficient:

Tensor<...> X ...;
Tensor<...> Y = ((X - X.maximum(depth_dim).reshape(dims2d).broadcast(bcast))* beta).exp();

Computing the maximum of X first avoids repeated work. It guarantees that maximum() is computed only once, which greatly improves efficiency:

Tensor<...> Y =((X - X.maximum(depth_dim).eval().reshape(dims2d).broadcast(bcast))* beta).exp();

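VII. Avoiding unnecessary computation by evaluating only specific elements: TensorRef.

Sometimes we do not need the entire output tensor. We may only want the value of a single element. Computing the whole tensor and then picking out that element wastes computation.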

// Create a TensorRef for the expression.  The expression is not
// evaluated yet.
TensorRef<Tensor<float, 3> > ref = ((t1 + t2) * 0.2f).exp();
// Use "ref" to access individual elements.  The expression is evaluated
// on the fly.
float at_0 = ref(0, 0, 0);
cout << ref(0, 1, 0);

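This is similar in spirit to a sparse matrix. If you need the whole tensor, TensorRef is not recommended, because it is actually less efficient.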

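VIII. Devices: hardware, multithreading, and instruction-set acceleration. By default, evaluation runs on a single CPU thread, as in the following code: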

Tensor<float, 2> a(30, 40);
Tensor<float, 2> b(30, 40);
Tensor<float, 2> c = a + b;

Here c is computed on a single CPU thread by default. You can choose where the computation runs by setting a device:

DefaultDevice my_device;
c.device(my_device) = a + b;

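The device can be an object of one of three classes: DefaultDevice, ThreadPoolDevice, or GpuDevice. To set a device, the size of c must be known beforehand.

Multithreading with a thread pool: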

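// Create the Eigen ThreadPoolDevice (EIGEN_USE_THREADS must be defined
// before the Tensor header is included).
Eigen::ThreadPool pool(4 /* number of threads in the pool */);
Eigen::ThreadPoolDevice my_device(&pool, 4 /* number of threads to use */);
// Now just use the device when evaluating expressions.
Eigen::Tensor<float, 2> c(30, 50);
c.device(my_device) = a.contract(b, dot_product_dims);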

IX. Some commonly used API functions:

1. Number of dimensions: int NumDimensions

Eigen::Tensor<float, 2> a(3, 4);
cout << "Dims " << a.NumDimensions;
=> Dims 2

2. Shape: Dimensions dimensions()

Eigen::Tensor<float, 2> a(3, 4);
const Eigen::Tensor<float, 2>::Dimensions& d = a.dimensions();
cout << "Dim size: " << d.size() << ", dim 0: " << d[0] << ", dim 1: " << d[1];
=> Dim size: 2, dim 0: 3, dim 1: 4

3. Size of a given dimension: Index dimension(Index n)

Eigen::Tensor<float, 2> a(3, 4);
int dim1 = a.dimension(1);
cout << "Dim 1: " << dim1;
=> Dim 1: 4

4. Number of elements: Index size()

Eigen::Tensor<float, 2> a(3, 4);
cout << "Size: " << a.size();
=> Size: 12

X. Initialization API

1. Setting all elements: setConstant(const Scalar& val) sets every element of a tensor to the given constant.

a.setConstant(12.3f);
cout << "Constant: " << endl << a << endl << endl;
=>
Constant:
12.3 12.3 12.3 12.3
12.3 12.3 12.3 12.3
12.3 12.3 12.3 12.3

Eigen::Tensor<string, 2> a(2, 3);
a.setConstant("yolo");
cout << "String tensor: " << endl << a << endl << endl;
=>
String tensor:
yolo yolo yolo
yolo yolo yolo

2. Setting all elements to zero: setZero()

3. Initializing from a list of values: setValues({..initializer_list})

Eigen::Tensor<float, 2> a(2, 3);
a.setValues({{0.0f, 1.0f, 2.0f}, {3.0f, 4.0f, 5.0f}});
cout << "a" << endl << a << endl << endl;
=>
a
0 1 2
3 4 5

If the list holds fewer values than the tensor has elements, the remaining elements keep their previous values:

Eigen::Tensor<int, 2> a(2, 3);
a.setConstant(1000);
a.setValues({{10, 20, 30}});
cout << "a" << endl << a << endl << endl;
=>
a
10   20   30
1000 1000 1000

4. Random initialization: setRandom()

a.setRandom();
cout << "Random: " << endl << a << endl << endl;
=>
Random:
  0.680375    0.59688  -0.329554    0.10794
 -0.211234   0.823295   0.536459 -0.0452059
  0.566198  -0.604897  -0.444451   0.257742

You can also specify the random generator to use, similar to setting a random seed in Python, and in this way choose how the values are drawn:

UniformRandomGenerator
NormalRandomGenerator

5. Data pointer: Scalar* data() and const Scalar* data() const. Typically used when converting to and from the data types of other libraries, such as OpenCV's Mat.

Eigen::Tensor<float, 2> a(3, 4);
float* a_data = a.data();
a_data[0] = 123.45f;
cout << "a(0, 0): " << a(0, 0);
=> a(0, 0): 123.45

XI. Commonly used Tensor member functions

1. Build a tensor of the same shape with every value set to val: constant(const Scalar& val)

Eigen::Tensor<float, 2> a(2, 3);
a.setConstant(1.0f);
Eigen::Tensor<float, 2> b = a + a.constant(2.0f);
Eigen::Tensor<float, 2> c = b * b.constant(0.2f);
cout << "a" << endl << a << endl << endl;
cout << "b" << endl << b << endl << endl;
cout << "c" << endl << c << endl << endl;
=>
a
1 1 1
1 1 1

b
3 3 3
3 3 3

c
0.6 0.6 0.6
0.6 0.6 0.6

2. Build a tensor of the same shape with random values: random()

Eigen::Tensor<float, 2> a(2, 3);
a.setConstant(1.0f);
Eigen::Tensor<float, 2> b = a + a.random();
cout << "a" << endl << a << endl << endl;
cout << "b" << endl << b << endl << endl;
=>
a
1 1 1
1 1 1

b
1.68038   1.5662  1.82329
0.788766  1.59688 0.395103

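XII. Operators: all of these operate element-wise

Besides addition, subtraction, multiplication, and division, there are logical and comparison operators: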

operator&&(const OtherDerived& other)
operator||(const OtherDerived& other)
operator<(const OtherDerived& other)
operator<=(const OtherDerived& other)
operator>(const OtherDerived& other)
operator>=(const OtherDerived& other)
operator==(const OtherDerived& other)
operator!=(const OtherDerived& other)

Logical and comparison operations on two tensors all return a tensor of booleans.

XIII. Selection: select(const ThenDerived& thenTensor, const ElseDerived& elseTensor)

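// "if" and "else" are C++ keywords, so the variables need other names.
Tensor<bool, 3> cond = ...;
Tensor<float, 3> then_t = ...;
Tensor<float, 3> else_t = ...;
Tensor<float, 3> result = cond.select(then_t, else_t);

Where an element of cond is true, the corresponding element of the result is taken from then_t. Otherwise it is taken from else_t.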
