This article introduces the Vector API for developing machine learning and deep learning software in Java.
Original English article: https://software.intel.com/en-us/articles/vector-api-developer-program-for-java

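Vector API Tutorial

Introduction
What is SIMD?
What is the Vector API?
- Vector interface
- Vector type
- Vector operations
Performance gains in machine learning
- Basic Linear Algebra Subprograms (BLAS)
- Image processing filters
Writing vector code
- Using the Vector API in Java*
- A simple vector loop
- Tutorial: Writing own-vector algorithms
- Tutorial: All about the Vector API
Getting started
- Building the Vector API
  - Setting JDK 8 binaries as JAVA_HOME
  - Downloading and compiling the Panama sources
  - Building your own application with the Panama JDK
  - Running your application
- IDE configuration
  - Configuring IntelliJ for OpenJDK Panama development
Vector examples
- BLAS machine learning
  - BLAS-I
  - BLAS-II (DSPR)
  - BLAS-II (DSYR)
  - BLAS-III (DSYR2K)
  - BLAS-III (DGEMM)
- Financial services (FSI) algorithms
  - GetOptionPrice
  - BinomialOptions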

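Introduction

Today, big data applications, distributed deep learning, and artificial intelligence solutions can run directly on top of existing Apache Spark* or Apache Hadoop* clusters and benefit from efficient scale-out. To achieve the desired data parallelism in these applications, OpenJDK Project Panama provides the Vector API. The Vector API Developer Program for Java* software offers a broad set of methods that can enrich the machine learning and deep learning experience for Java developers.

Video: https://www.youtube.com/embed/X49ucwtwuU0?feature=oembed&enablejsapi=1

This article introduces the Vector API to Java developers, shows how to start using the API in a Java program, and provides examples of vector algorithms. It gives step-by-step details on how to build the Vector API and how to use it to build Java applications. In addition, it provides a detailed tutorial on how to implement vector code for your own algorithms in Java to improve performance.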

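What is SIMD?

Single Instruction, Multiple Data (SIMD) allows the same operation to be executed on multiple data points at once, which exploits data-level parallelism in an application. Modern CPUs have advanced support for SIMD operations, such as AVX2 and AVX3 (AVX-512), which provide SIMD instruction acceleration.

Big data applications (such as Apache Flink, the Apache Spark machine learning libraries, and Intel BigDL), data analytics, and deep learning training workloads run highly data-parallel algorithms. Strong SIMD support in Java would open a path for Java to expand into some of these areas.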

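What is the Vector API?

The Vector API Developer Program for Java* software makes it possible to write compute-intensive applications, machine learning algorithms, and artificial intelligence algorithms in Java without the performance overhead of the Java Native Interface (JNI) and without the ongoing maintenance burden of non-portable native code. The API introduces a set of methods for data-parallel operations on sized vector types, so that you can program directly in Java without any knowledge of the underlying CPU. The JVM JIT compiler then maps these low-level APIs efficiently onto SIMD instructions on modern CPUs to achieve the desired speedup; where that is not possible, the default VM implementation is used to map the Java bytecode to hardware instructions.

Vector interface

The Vector API interface looks as follows: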

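Vector type

A vector type (Vector<E, S>) takes 'E' for the element type and 'S' for the shape, that is, the bit size of the vector. As of the latest progress, Project Panama supports creating vectors with the following element types and shapes.

- Element types: Byte, Short, Integer, Long, Float, and Double
- Shape types (bit size): 128, 256, and 512

Vector shapes are chosen so that they map closely onto the largest SIMD registers available on the CPU platform.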
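
To make the relationship between element type and shape concrete: the number of elements (lanes) a vector holds is the shape's bit size divided by the element's bit size. The following is a minimal plain-Java sketch that does not use the Vector API itself; the names LaneCount and lanes() are illustrative assumptions, not part of the API.

```java
// Illustrative only: computes how many elements ("lanes") fit in a vector
// of a given shape. LaneCount and lanes() are hypothetical names.
class LaneCount {

    static int lanes(int shapeBits, int elementBits) {
        if (elementBits <= 0 || shapeBits % elementBits != 0) {
            throw new IllegalArgumentException("shape must be a multiple of the element size");
        }
        return shapeBits / elementBits;
    }

    public static void main(String[] args) {
        int[] shapes = {128, 256, 512};
        for (int bits : shapes) {
            System.out.println(bits + "-bit shape: "
                    + lanes(bits, Float.SIZE) + " floats, "
                    + lanes(bits, Double.SIZE) + " doubles");
        }
    }
}
```

Under this model a 256-bit shape holds eight float lanes, so a single vector add would replace eight scalar additions.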

Vector operations

Basic vector-vector functionality is available for all of these vector types. Typical arithmetic and trigonometric vector operations are also provided in masked form. A mask is used for if-else style conditional operations.

The examples section shows how to use a vector mask in a program.

public abstract class DoubleVector<S extends Vector.Shape<Vector<?,?>>> implements Vector<Double,S> {
    Vector<Double, S> add(Vector<Double, S> v2);
    Vector<Double, S> add(Vector<Double, S> o, Mask<Double, S> m);
    Vector<Double, S> mul(Vector<Double, S> v2);
    Vector<Double, S> mul(Vector<Double, S> o, Mask<Double, S> m);
    ...
    Vector<Double, S> sin();
    Vector<Double, S> sin(Mask<Double, S> m);
    Vector<Double, S> sqrt();
    ...
}

The Vector API also provides the more advanced vector operations that are often needed in financial services industry (FSI) and machine learning applications.

public abstract class IntVector<S extends Vector.Shape<Vector<?,?>>> implements Vector<Integer,S> {
    int sumAll();
    void intoArray(int[] a, int ix);
    void intoArray(int[] is, int ix, Mask<Integer, S> m);
    Vector<Integer, S> fromArray(int[] fs, int ix);
    Vector<Integer, S> blend(Vector<Integer, S> o, Mask<Integer, S> m);
    Vector<Integer, S> shuffle(Vector<Integer, S> o, Shuffle<Integer, S> s);
    Vector<Integer, S> fromByte(byte f);
    ...
}
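
As a rough mental model, the reduction and blend operations listed above can be described lane by lane with ordinary arrays. The sketch below is plain Java and assumes the conventional meaning of these operations (sumAll adds all lanes; blend takes the lane from the other vector where the mask is set). The class LaneModel and its methods are hypothetical and are not the API's implementation.

```java
// Plain-Java model of two operations, assuming their conventional semantics.
// LaneModel is a hypothetical helper, not part of the Vector API.
class LaneModel {

    // Models sumAll(): add every lane of the vector.
    static int sumAll(int[] v) {
        int sum = 0;
        for (int lane : v) {
            sum += lane;
        }
        return sum;
    }

    // Models v.blend(o, m): lane i comes from o where m[i] is set, else from v.
    static int[] blend(int[] v, int[] o, boolean[] m) {
        int[] r = new int[v.length];
        for (int i = 0; i < v.length; i++) {
            r[i] = m[i] ? o[i] : v[i];
        }
        return r;
    }

    public static void main(String[] args) {
        int[] v = {1, 2, 3, 4};
        int[] o = {9, 9, 9, 9};
        boolean[] m = {true, false, false, true};
        System.out.println(sumAll(v));
        System.out.println(java.util.Arrays.toString(blend(v, o, m)));
    }
}
```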

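Performance gains in machine learning

Basic Linear Algebra Subprograms (BLAS)

Implementing BLAS I, II, and III routines with vectors can improve performance by 3-4x.

BLAS I and II routines are commonly used in the Apache Spark machine learning libraries. They apply to classification and regression with linear models and decision trees, to collaborative filtering and clustering, and to dimensionality reduction problems. BLAS-III routines such as GEMM are widely used to solve the deep learning and neural network problems found in artificial intelligence.

*OpenJDK Project Panama source build 0918201709182017. Java HotSpot 64-Bit Server VM (mixed mode). OS version: CentOS 7.3 64-bit.

Intel® Xeon® Platinum 8180 processor (using floating-point data chunks of 512 bytes and 1024 bytes).

JVM options: -XX:+UnlockDiagnosticVMOptions -XX:-CheckIntrinsics -XX:TypeProfileLevel=121 -XX:+UseVectorApiIntrinsics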

Image processing filters

With the Vector API, sepia filtering can be up to 6x faster.
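
The article does not show the filter itself. As a hedged illustration of why it vectorizes well, here is a scalar sepia sketch in plain Java that uses a commonly quoted set of sepia coefficients; these coefficients, the packed 0xRRGGBB pixel layout, and the names SepiaSketch, sepia, and apply are assumptions and not necessarily what was benchmarked. Every pixel is transformed independently with the same multiply-add pattern, which is exactly the shape of work SIMD handles well.

```java
// Scalar sepia sketch. Coefficients are a commonly used set and are an
// assumption here; SepiaSketch is a hypothetical class name.
class SepiaSketch {

    // Transforms one packed 0xRRGGBB pixel.
    static int sepia(int rgb) {
        int r = (rgb >> 16) & 0xFF;
        int g = (rgb >> 8) & 0xFF;
        int b = rgb & 0xFF;
        // Each output channel is a weighted sum of the inputs, clamped to 255.
        int tr = Math.min(255, (int) (0.393 * r + 0.769 * g + 0.189 * b));
        int tg = Math.min(255, (int) (0.349 * r + 0.686 * g + 0.168 * b));
        int tb = Math.min(255, (int) (0.272 * r + 0.534 * g + 0.131 * b));
        return (tr << 16) | (tg << 8) | tb;
    }

    // Applies the filter in place; iterations are independent of each other.
    static void apply(int[] pixels) {
        for (int i = 0; i < pixels.length; i++) {
            pixels[i] = sepia(pixels[i]);
        }
    }

    public static void main(String[] args) {
        int[] pixels = {0x000000, 0xFFFFFF, 0x336699};
        apply(pixels);
        for (int p : pixels) {
            System.out.println(String.format("%06X", p));
        }
    }
}
```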

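Writing vector code

Using the Vector API in Java*

The Vector interface is part of the jdk.incubator.vector package. To start with the Vector API, import the following in your program. Depending on the vector type, you can choose to import FloatVector, IntVector, and so on.

import jdk.incubator.vector.FloatVector;
import jdk.incubator.vector.Vector;
import jdk.incubator.vector.Shapes;

The vector type (Vector<E, S>) takes two parameters.

'E': the element type; the primitive types int, float, and double are broadly supported.

'S': the shape, or bit size, of the vector.

Before using vector operations, the programmer must first create an instance (the species) that captures the element type and the vector shape. Vectors of that particular size and shape can then be created from it.

private static final FloatVector.FloatSpecies<Shapes.S256Bit> species = (FloatVector.FloatSpecies<Shapes.S256Bit>) Vector.speciesInstance(Float.class, Shapes.S_256_BIT);
IntVector.IntSpecies<Shapes.S512Bit> ispec = (IntVector.IntSpecies<Shapes.S512Bit>) Vector.speciesInstance(Integer.class, Shapes.S_512_BIT);

From then on, you can create vector instances of type FloatVector<Shapes.S256Bit> and IntVector<Shapes.S512Bit>.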

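A simple vector loop

This section gives a flavor of programming with the Vector API. The Vector API white paper, "Writing own-vector algorithms in Java for performance", provides detailed tips and tricks on how to write vector algorithms. Sample vector code for BLAS and FSI routines can be found in later sections.

The first example shows vector addition of two arrays. The program uses vector operations such as fromArray() and intoArray() to load vectors from arrays and store them back.

The vector add() operation performs the arithmetic.

public static void AddArrays(float[] left, float[] right, float[] res, int i) {
    FloatVector.FloatSpecies<Shapes.S256Bit> species = (FloatVector.FloatSpecies<Shapes.S256Bit>)
            Vector.speciesInstance(Float.class, Shapes.S_256_BIT);
    FloatVector<Shapes.S256Bit> l = species.fromArray(left, i);
    FloatVector<Shapes.S256Bit> r = species.fromArray(right, i);
    FloatVector<Shapes.S256Bit> lr = l.add(r);
    lr.intoArray(res, i);
}

Write a vector loop by querying the vector size with species.length(). Consider the scalar loop below, which adds arrays A and B and stores the result in array C.

for (int i = 0; i < C.length; i++) {
    C[i] = A[i] + B[i];
}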

The vectorized loop looks like this:

public static void add(int[] C, int[] A, int[] B) {
    IntVector.IntSpecies<Shapes.S256Bit> species =
            (IntVector.IntSpecies<Shapes.S256Bit>) Vector.speciesInstance(Integer.class, Shapes.S_256_BIT);
    int i;
    // Main loop: one full vector of species.length() elements per iteration
    for (i = 0; (i + species.length()) < C.length; i += species.length()) {
        IntVector<Shapes.S256Bit> av = species.fromArray(A, i);
        IntVector<Shapes.S256Bit> bv = species.fromArray(B, i);
        av.add(bv).intoArray(C, i);
    }
    for (; i < C.length; i++) { // Cleanup loop
        C[i] = A[i] + B[i];
    }
}
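
One detail of the loop condition is worth spelling out. The listing above keeps going while i + species.length() < C.length, so when the array length is an exact multiple of the vector length, the last full chunk is left to the cleanup loop. That is still correct, only slightly less vectorized than a condition with <=. The plain-Java sketch below counts how many elements each variant would hand to the vector loop; LoopBounds and its method names are hypothetical helpers.

```java
// Counts how many elements the main (vector) loop would cover for a given
// array length and lane count. LoopBounds is a hypothetical helper.
class LoopBounds {

    // Condition used in the listing: i + lanes < n.
    static int coveredStrict(int n, int lanes) {
        int i = 0;
        while (i + lanes < n) {
            i += lanes;
        }
        return i;
    }

    // Alternative condition: i + lanes <= n, which also takes a final full chunk.
    static int coveredInclusive(int n, int lanes) {
        int i = 0;
        while (i + lanes <= n) {
            i += lanes;
        }
        return i;
    }

    public static void main(String[] args) {
        int lanes = 8; // e.g. a 256-bit shape with int elements
        for (int n : new int[] {7, 16, 20}) {
            System.out.println("n=" + n
                    + " strict=" + coveredStrict(n, lanes)
                    + " inclusive=" + coveredInclusive(n, lanes));
        }
    }
}
```

Either way, the cleanup loop picks up whatever the main loop did not cover, so both variants produce the same result.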

The program can also be written in a length-agnostic way, independent of the vector size. The following program parameterizes the vector code by shape.

public class AddClass<S extends Vector.Shape<Vector<?, ?>>> {
    private final FloatVector.FloatSpecies<S> spec;

    AddClass(FloatVector.FloatSpecies<S> v) { spec = v; }

    // vector routine for add
    void add(float[] A, float[] B, float[] C) {
        int i = 0;
        for (; i + spec.length() < C.length; i += spec.length()) {
            FloatVector<S> av = spec.fromArray(A, i);
            FloatVector<S> bv = spec.fromArray(B, i);
            av.add(bv).intoArray(C, i);
        }
        // clean up loop
        for (; i < C.length; i++) C[i] = A[i] + B[i];
    }
}

Operations inside conditional statements can be written in vector form using a mask.
The scalar routine is as follows:

for (int i = 0; i < SIZE; i++) {
    float res = b[i];
    if (a[i] > 1.0) {
        res = res * a[i];
    }
    c[i] = res;
}

The vector routine using a mask is as follows.

public void useMask(float[] a, float[] b, float[] c, int SIZE) {
    FloatVector.FloatSpecies<Shapes.S256Bit> species = (FloatVector.FloatSpecies<Shapes.S256Bit>) Vector.speciesInstance(Float.class, Shapes.S_256_BIT);
    FloatVector<Shapes.S256Bit> tv = species.broadcast(1.0f);
    int i = 0;
    for (; i + species.length() < SIZE; i += species.length()) {
        FloatVector<Shapes.S256Bit> rv = species.fromArray(b, i);
        FloatVector<Shapes.S256Bit> av = species.fromArray(a, i);
        // One mask bit per lane: set where a[i] > 1.0
        Vector.Mask<Float, Shapes.S256Bit> mask = av.greaterThan(tv);
        // Masked multiply, then store the result into c
        rv.mul(av, mask).intoArray(c, i);
    }
    // Scalar tail for the remaining elements
    for (; i < SIZE; i++) {
        float res = b[i];
        if (a[i] > 1.0f) {
            res = res * a[i];
        }
        c[i] = res;
    }
}
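
To see what the masked routine computes without needing a Panama build, the same logic can be modelled with plain arrays. This is only a sketch: it assumes that a masked mul multiplies the lanes where the mask is set and passes the original lane through elsewhere, which is what the scalar routine implies. MaskSketch and useMaskModel are hypothetical names.

```java
// Plain-Java model of the masked routine, processing 'lanes' elements per step.
// MaskSketch is a hypothetical class and not part of the Vector API.
class MaskSketch {

    static void useMaskModel(float[] a, float[] b, float[] c, int lanes) {
        boolean[] mask = new boolean[lanes];
        int i = 0;
        for (; i + lanes <= a.length; i += lanes) {
            // Models av.greaterThan(tv): one comparison result per lane.
            for (int l = 0; l < lanes; l++) {
                mask[l] = a[i + l] > 1.0f;
            }
            // Models rv.mul(av, mask): multiply where set, keep b[i] otherwise.
            for (int l = 0; l < lanes; l++) {
                c[i + l] = mask[l] ? b[i + l] * a[i + l] : b[i + l];
            }
        }
        // Scalar tail for the elements that do not fill a whole chunk.
        for (; i < a.length; i++) {
            c[i] = a[i] > 1.0f ? b[i] * a[i] : b[i];
        }
    }

    public static void main(String[] args) {
        float[] a = {0.5f, 2f, 3f, 1f, 4f};
        float[] b = {10f, 10f, 10f, 10f, 10f};
        float[] c = new float[a.length];
        useMaskModel(a, b, c, 4);
        System.out.println(java.util.Arrays.toString(c));
    }
}
```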

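Tutorial: Writing own-vector algorithms

The white paper "Vector API: Writing own-vector algorithms in Oracle Java* for performance" provides tips and tricks for writing Java code with the Vector API, and also describes some ways to improve performance.

These examples should give you guidelines and best practices for vector programming in Oracle Java*, to help you successfully write vector versions of your own compute-intensive algorithms.

For more information, see the attached PDF.

Tutorial: All about the Vector API

URL 1: https://www.youtube.com/embed/jRyD1EIOOis?feature=oembed&enablejsapi=1
URL 2: https://www.youtube.com/embed/videoseries?list=PLX8CzqL3ArzXJ2EGftrmz4SzS6NRr6p2n&enablejsapi=1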

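Getting started

Building the Vector API

This section assumes that you are familiar with basic Linux utilities.

Setting JDK 8 binaries as JAVA_HOME

Project Panama requires JDK 8 on the system. The JDK can be downloaded from this location.

# export JAVA_HOME=/pathto/jdk1.8-u91
# export PATH=$JAVA_HOME/bin:$PATH

Downloading and compiling the Panama sources

The Project Panama sources can be downloaded with the Mercurial source control management tool.

# hg clone http://hg.openjdk.java.net/panama/panama/
# source get_source.sh
# ./configure
# make all

Building your own application with the Panama JDK

You need to copy the vector.jar file from the parent directory that contains the Panama sources to the location of your Java application.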

import jdk.incubator.vector.IntVector;
import jdk.incubator.vector.Shapes;
import jdk.incubator.vector.Vector;

public class HelloVectorApi {
    public static void main(String[] args) {
        IntVector.IntSpecies<Shapes.S128Bit> species =
                (IntVector.IntSpecies<Shapes.S128Bit>) Vector.speciesInstance(
                        Integer.class, Shapes.S_128_BIT);
        int val = 1;
        // Fill every lane with val, then check that the lane sum matches
        IntVector<Shapes.S128Bit> hello = species.broadcast(val);
        if (hello.sumAll() == val * species.length()) {
            System.out.println("Hello Vector API!");
        }
    }
}

Running your application

/pathto/panama/build/linux-x86_64-normal-server-release/images/jdk/bin/java --add-modules=jdk.incubator.vector -XX:TypeProfileLevel=121 HelloVectorApi

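IDE configuration

Configuring IntelliJ for OpenJDK Panama development

1) Create a new project. If you have just installed IntelliJ or have no project open, click "Create New Project" in the window that appears.

Otherwise, File > New > Project… has the same effect.

2) In the "New Project" window that appears, make sure Java is selected on the left. Select the Panama build as the Project SDK.

If the Panama build has not yet been set up as a Project SDK, press the "New…" button on the right. Otherwise, go to step 4.

3) The window that pops up is called "Select Home Directory for JDK". The path to select is /path/to/panama/build/linux-x86_64-normal-server-release/images/jdk. Click OK.

4) Click Next. At this point you can choose to create the project from a template. Go ahead and select "Command Line App", then click Next again.

5) Give your project a name and a location, then click "Finish".

6) Once the project is created, a few more steps are needed to use the Vector API successfully. Go to File > Project Structure…

7) Make sure "Project" is selected in the left pane. Change "Project language level:" to "9 - Modules, private methods in interfaces etc.". Finally, click OK.

8) In the left pane that shows the directory structure, right-click the "src" folder. Navigate to New > module-info.java

9) In this file, add the line "requires jdk.incubator.vector;". Save the file.

10) Go back to Main.java. Add the code that uses the API. See HelloVectorApi.java for an example.

11) Before running the application, edit the run configuration. Press the button with the class name next to the "play" button. You should see "Edit Configurations…". Click it.

12) In the VM options, add "-XX:TypeProfileLevel=121 -XX:+UseVectorApiIntrinsics". Both may become optional in the future. If you want to turn off the optimization that converts Vector API calls into optimized x86 intrinsics (for stability reasons), add this instead:

"-XX:-UseVectorApiIntrinsics".

13) Press the "play" button to compile and run the application. In the terminal window at the bottom of the screen you should see "Hello Vector API!" or whatever output your application prints.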

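Vector examples

BLAS machine learning

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

This software is provided by the copyright holders and contributors "as is" and any express or implied warranties, including, but not limited to, the implied warranties of merchantability and fitness for a particular purpose are disclaimed. In no event shall the copyright holder or contributors be liable for any direct, indirect, incidental, special, exemplary, or consequential damages (including, but not limited to, procurement of substitute goods or services; loss of use, data, or profits; or business interruption) however caused and on any theory of liability, whether in contract, strict liability, or tort (including negligence or otherwise) arising in any way out of the use of this software, even if advised of the possibility of such damage.

BLAS-I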

import jdk.incubator.vector.DoubleVector;
import jdk.incubator.vector.FloatVector;
import jdk.incubator.vector.Shapes;
import jdk.incubator.vector.Vector;

public class BLAS {

    // b := alpha * a + b (double precision)
    static void VecDaxpy(double[] a, int a_offset, double[] b, int b_offset, double alpha) {
        DoubleVector.DoubleSpecies<Shapes.S512Bit> spec = (DoubleVector.DoubleSpecies<Shapes.S512Bit>) Vector.speciesInstance(Double.class, Shapes.S_512_BIT);
        DoubleVector<Shapes.S512Bit> alphaVec = spec.broadcast(alpha);
        int i = 0;
        for (; (i + a_offset + spec.length()) < a.length && (i + b_offset + spec.length()) < b.length; i += spec.length()) {
            DoubleVector<Shapes.S512Bit> bv = spec.fromArray(b, i + b_offset);
            DoubleVector<Shapes.S512Bit> av = spec.fromArray(a, i + a_offset);
            bv.add(av.mul(alphaVec)).intoArray(b, i + b_offset);
        }
        // scalar tail
        for (; i + a_offset < a.length && i + b_offset < b.length; i++) b[i + b_offset] += alpha * a[i + a_offset];
    }

    // b := alpha * a + b (single precision)
    static void VecDaxpyFloat(float[] a, int a_offset, float[] b, int b_offset, float alpha) {
        FloatVector.FloatSpecies<Shapes.S256Bit> spec = (FloatVector.FloatSpecies<Shapes.S256Bit>) Vector.speciesInstance(Float.class, Shapes.S_256_BIT);
        // Broadcast alpha once, outside the loop
        FloatVector<Shapes.S256Bit> alphaVec = spec.broadcast(alpha);
        int i = 0;
        for (; (i + a_offset + spec.length()) < a.length && (i + b_offset + spec.length()) < b.length; i += spec.length()) {
            FloatVector<Shapes.S256Bit> bv = spec.fromArray(b, i + b_offset);
            FloatVector<Shapes.S256Bit> av = spec.fromArray(a, i + a_offset);
            bv.add(av.mul(alphaVec)).intoArray(b, i + b_offset);
        }
        // scalar tail
        for (; i + a_offset < a.length && i + b_offset < b.length; i++) b[i + b_offset] += alpha * a[i + a_offset];
    }

    // Returns the dot product of a and b (double precision)
    static double VecDdot(double[] a, int a_offset, double[] b, int b_offset) {
        DoubleVector.DoubleSpecies<Shapes.S512Bit> spec = (DoubleVector.DoubleSpecies<Shapes.S512Bit>) Vector.speciesInstance(Double.class, Shapes.S_512_BIT);
        int i = 0;
        double sum = 0;
        for (; (i + a_offset + spec.length()) < a.length && (i + b_offset + spec.length()) < b.length; i += spec.length()) {
            DoubleVector<Shapes.S512Bit> l = spec.fromArray(a, i + a_offset);
            DoubleVector<Shapes.S512Bit> r = spec.fromArray(b, i + b_offset);
            sum += l.mul(r).sumAll();
        }
        // scalar tail
        for (; (i + a_offset < a.length) && (i + b_offset < b.length); i++) sum += a[i + a_offset] * b[i + b_offset];
        return sum;
    }

    // Returns the dot product of a and b (single precision)
    static float VecDdotFloat(float[] a, int a_offset, float[] b, int b_offset) {
        FloatVector.FloatSpecies<Shapes.S256Bit> spec = (FloatVector.FloatSpecies<Shapes.S256Bit>) Vector.speciesInstance(Float.class, Shapes.S_256_BIT);
        int i = 0;
        float sum = 0;
        for (; i + a_offset + spec.length() < a.length && i + b_offset + spec.length() < b.length; i += spec.length()) {
            FloatVector<Shapes.S256Bit> l = spec.fromArray(a, i + a_offset);
            FloatVector<Shapes.S256Bit> r = spec.fromArray(b, i + b_offset);
            sum += l.mul(r).sumAll();
        }
        // scalar tail
        for (; i + a_offset < a.length && i + b_offset < b.length; i++) sum += a[i + a_offset] * b[i + b_offset];
        return sum;
    }
}
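
When porting routines like these, it helps to keep a scalar reference around to compare results against. The following is a minimal plain-Java sketch of the two BLAS-I operations (daxpy: b := alpha * a + b; ddot: the dot product). The class Blas1Reference and its method signatures are illustrative assumptions; offsets are omitted for brevity.

```java
// Scalar reference versions of the two BLAS-I operations, useful for
// checking a vectorized implementation. Blas1Reference is a hypothetical name.
class Blas1Reference {

    // b := alpha * a + b over the common length of a and b.
    static void daxpy(double alpha, double[] a, double[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            b[i] += alpha * a[i];
        }
    }

    // Returns sum(a[i] * b[i]) over the common length of a and b.
    static double ddot(double[] a, double[] b) {
        int n = Math.min(a.length, b.length);
        double sum = 0.0;
        for (int i = 0; i < n; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        double[] a = {1, 2, 3};
        double[] b = {1, 1, 1};
        daxpy(2.0, a, b);
        System.out.println(java.util.Arrays.toString(b));
        System.out.println(ddot(new double[] {1, 2, 3}, new double[] {4, 5, 6}));
    }
}
```

Note that a vectorized dot product sums lanes in a different order than this loop, so floating-point results may differ in the last bits; comparing with a small tolerance is a reasonable choice.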

BLAS-II (DSPR)

import jdk.incubator.vector.DoubleVector;
import jdk.incubator.vector.Vector;
import jdk.incubator.vector.Shapes;

public class BLAS_II {

    // Symmetric packed rank-1 update: A := alpha * x * x**T + A,
    // where A is stored column by column in packed form in ap.
    public static void VecDspr(String uplo, int n, double alpha, double[] x, int _x_offset, int incx, double[] ap, int _ap_offset) {
        DoubleVector.DoubleSpecies<Shapes.S512Bit> spec = (DoubleVector.DoubleSpecies<Shapes.S512Bit>) Vector.speciesInstance(Double.class, Shapes.S_512_BIT);

        double temp = 0.0;
        int i = 0;
        int j = 0;
        int k = 0;
        int kk = 0; // start of the current column in the packed array (0-based)
        if (uplo.equals("U")) {
            // Form A when upper triangle is stored in AP.
            if (incx == 1) {
                for (j = 0; j < n; j++) {
                    if (x[j + _x_offset] != 0.0) {
                        temp = alpha * x[j + _x_offset];
                        DoubleVector<Shapes.S512Bit> tv = spec.broadcast(temp);
                        for (i = 0, k = kk; i + spec.length() <= j && i + _x_offset + spec.length() < x.length && k + _ap_offset + spec.length() < ap.length; i += spec.length(), k += spec.length()) {
                            DoubleVector<Shapes.S512Bit> av = spec.fromArray(ap, k + _ap_offset);
                            DoubleVector<Shapes.S512Bit> xv = spec.fromArray(x, i + _x_offset);
                            av.add(xv.mul(tv)).intoArray(ap, k + _ap_offset);
                        }
                        // scalar tail, up to and including the diagonal
                        for (; i <= j && i + _x_offset < x.length && k + _ap_offset < ap.length; i++, k++) {
                            ap[k + _ap_offset] = ap[k + _ap_offset] + x[i + _x_offset] * temp;
                        }
                    }
                    kk = kk + j + 1; // column j of the upper triangle holds j + 1 elements
                }
            }
        } else {
            // Form A when lower triangle is stored in AP.
            if (incx == 1) {
                for (j = 0; j < n; j++) {
                    if (x[j + _x_offset] != 0.0) {
                        temp = alpha * x[j + _x_offset];
                        DoubleVector<Shapes.S512Bit> tv = spec.broadcast(temp);
                        k = kk;
                        for (i = j; i + spec.length() < n && i + _x_offset + spec.length() < x.length && k + _ap_offset + spec.length() < ap.length; i += spec.length(), k += spec.length()) {
                            DoubleVector<Shapes.S512Bit> av = spec.fromArray(ap, k + _ap_offset);
                            DoubleVector<Shapes.S512Bit> xv = spec.fromArray(x, i + _x_offset);
                            av.add(xv.mul(tv)).intoArray(ap, k + _ap_offset);
                        }
                        // scalar tail
                        for (; i < n && i + _x_offset < x.length && k + _ap_offset < ap.length; i++, k++) {
                            ap[k + _ap_offset] = ap[k + _ap_offset] + x[i + _x_offset] * temp;
                        }
                    }
                    kk = kk + n - j; // column j of the lower triangle holds n - j elements
                }
            }
        }
    }
}
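
The packed layout is the part of DSPR that is easiest to get wrong, so a small scalar model may help. With 0-based indices and column-wise packing of the upper triangle, element (i, j) with i <= j sits at position i + j(j+1)/2. The plain-Java sketch below implements that index and a scalar upper-triangle update A := alpha * x * x^T + A; the class PackedUpperSketch and its methods are hypothetical names used for illustration only.

```java
// Scalar model of the packed upper-triangle update performed by DSPR.
// PackedUpperSketch is a hypothetical helper, not code from the article.
class PackedUpperSketch {

    // Position of element (i, j), i <= j, in column-wise packed upper storage.
    static int index(int i, int j) {
        return i + j * (j + 1) / 2;
    }

    // A := alpha * x * x^T + A, touching only the upper triangle stored in ap.
    static void dsprUpper(int n, double alpha, double[] x, double[] ap) {
        for (int j = 0; j < n; j++) {
            double temp = alpha * x[j];
            for (int i = 0; i <= j; i++) {
                ap[index(i, j)] += x[i] * temp;
            }
        }
    }

    public static void main(String[] args) {
        double[] x = {1, 2};
        double[] ap = new double[3]; // packed 2 x 2 upper triangle, initially zero
        dsprUpper(2, 1.0, x, ap);
        System.out.println(java.util.Arrays.toString(ap));
    }
}
```

Because the i-loop walks consecutive positions of ap within a column, it is the natural loop to vectorize, which is what the routine above does.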

BLAS-II (DSYR)

import jdk.incubator.vector.DoubleVector;
import jdk.incubator.vector.Shapes;
import jdk.incubator.vector.Vector;

public class BLAS2DSYR {

    // Symmetric rank-1 update: A := alpha * x * x**T + A, with A stored
    // column-major (element (i, j) at a[i + j * lda + _a_offset]).
    public static void VecDsyr(String uplo, int n, double alpha, double[] x, int _x_offset, int incx, double[] a, int _a_offset, int lda) {
        DoubleVector.DoubleSpecies<Shapes.S512Bit> spec = (DoubleVector.DoubleSpecies<Shapes.S512Bit>) Vector.speciesInstance(Double.class, Shapes.S_512_BIT);
        double temp = 0.0;
        int i = 0;
        int j = 0;

        if (uplo.equals("U") && incx == 1) {
            for (j = 0; j < n; j++) {
                if (x[j + _x_offset] != 0.0) {
                    temp = alpha * x[j + _x_offset];
                    DoubleVector<Shapes.S512Bit> tv = spec.broadcast(temp);
                    for (i = 0; (i + spec.length()) <= j && i + _x_offset + spec.length() < x.length && i + j * lda + _a_offset + spec.length() < a.length; i += spec.length()) {
                        DoubleVector<Shapes.S512Bit> xv = spec.fromArray(x, i + _x_offset);
                        DoubleVector<Shapes.S512Bit> av = spec.fromArray(a, i + j * lda + _a_offset);
                        av.add(xv.mul(tv)).intoArray(a, i + j * lda + _a_offset);
                    }
                    // scalar tail
                    for (; i <= j && i + j * lda + _a_offset < a.length && i + _x_offset < x.length; i++) {
                        a[i + j * lda + _a_offset] = a[i + j * lda + _a_offset] + x[i + _x_offset] * temp;
                    }
                }
            }
        } else if (uplo.equals("L") && incx == 1) {
            for (j = 0; j < n; j++) {
                if (x[j + _x_offset] != 0.0) {
                    temp = alpha * x[j + _x_offset];
                    DoubleVector<Shapes.S512Bit> tv = spec.broadcast(temp);
                    for (i = j; (i + spec.length()) < n && i + _x_offset + spec.length() < x.length && i + j * lda + _a_offset + spec.length() < a.length; i += spec.length()) {
                        DoubleVector<Shapes.S512Bit> xv = spec.fromArray(x, i + _x_offset);
                        DoubleVector<Shapes.S512Bit> av = spec.fromArray(a, i + j * lda + _a_offset);
                        av.add(xv.mul(tv)).intoArray(a, i + j * lda + _a_offset);
                    }
                    // scalar tail
                    for (; i < n && i + j * lda + _a_offset < a.length && i + _x_offset < x.length; i++) {
                        a[i + j * lda + _a_offset] = a[i + j * lda + _a_offset] + x[i + _x_offset] * temp;
                    }
                }
            }
        }
    }
}

BLAS-III (DSYR2K)

import jdk.incubator.vector.FloatVector;
import jdk.incubator.vector.DoubleVector;
import jdk.incubator.vector.Shapes;
import jdk.incubator.vector.Vector;

public class BLAS3DSYR2K {

    // Symmetric rank-2k update on column-major arrays.
    //   trans = "N": C := alpha*A*B**T + alpha*B*A**T + beta*C
    //   otherwise:   C := alpha*A**T*B + alpha*B**T*A + beta*C
    // Only the upper (uplo = "U") or lower triangle of C is updated.
    public void VecDsyr2k(String uplo, String trans, int n, int k, double alpha, double[] a, int _a_offset, int lda, double[] b, int _b_offset, int ldb, double beta, double[] c, int _c_offset, int Ldc) {
        DoubleVector.DoubleSpecies<Shapes.S512Bit> spec = (DoubleVector.DoubleSpecies<Shapes.S512Bit>) Vector.speciesInstance(Double.class, Shapes.S_512_BIT);
        double temp1 = 0.0;
        double temp2 = 0.0;
        int i = 0;
        int j = 0;
        int l = 0;
        DoubleVector<Shapes.S512Bit> zeroVec = spec.broadcast(0.0D);
        DoubleVector<Shapes.S512Bit> betaVec = spec.broadcast(beta);
        boolean upper = uplo.equals("U");
        if (alpha == 0.0) {
            // With alpha == 0 the update reduces to C := beta*C.
            if (upper) {
                if (beta == 0.0) {
                    for (j = 0; j < n; j++) {
                        i = 0;
                        for (; (i + spec.length()) < j; i += spec.length()) {
                            zeroVec.intoArray(c, i + j * Ldc + _c_offset);
                        }
                        for (; i <= j; i++) { // scalar tail, including the diagonal
                            c[i + j * Ldc + _c_offset] = 0.0;
                        }
                    }
                } else {
                    for (j = 0; j < n; j++) {
                        i = 0;
                        for (; (i + spec.length()) < j; i += spec.length()) {
                            DoubleVector<Shapes.S512Bit> cV = spec.fromArray(c, i + j * Ldc + _c_offset);
                            cV.mul(betaVec).intoArray(c, i + j * Ldc + _c_offset);
                        }
                        for (; i <= j; i++) {
                            c[i + j * Ldc + _c_offset] = beta * c[i + j * Ldc + _c_offset];
                        }
                    }
                }
            } else {
                // lower
                if (beta == 0.0) {
                    for (j = 0; j < n; j++) {
                        i = j;
                        for (; i + spec.length() < n; i += spec.length()) {
                            zeroVec.intoArray(c, i + j * Ldc + _c_offset);
                        }
                        for (; i < n; i++) {
                            c[i + j * Ldc + _c_offset] = 0.0;
                        }
                    }
                } else {
                    for (j = 0; j < n; j++) {
                        i = j;
                        for (; i + spec.length() < n; i += spec.length()) {
                            DoubleVector<Shapes.S512Bit> cV = spec.fromArray(c, i + j * Ldc + _c_offset);
                            cV.mul(betaVec).intoArray(c, i + j * Ldc + _c_offset);
                        }
                        // scalar tail for this column
                        for (; i < n; i++) {
                            c[i + j * Ldc + _c_offset] = beta * c[i + j * Ldc + _c_offset];
                        }
                    }
                }
            }
            return;
        }
        // start operations
        if (trans.equals("N")) {
            // Form  C := alpha*A*B**T + alpha*B*A**T + C.
            if (upper) {
                for (j = 0; j < n; j++) {
                    if (beta == 0.0) {
                        i = 0;
                        for (; i + spec.length() < j; i += spec.length()) {
                            zeroVec.intoArray(c, i + j * Ldc + _c_offset);
                        }
                        for (; i <= j; i++) {
                            c[i + j * Ldc + _c_offset] = 0.0;
                        }
                    } else if (beta != 1.0) {
                        i = 0;
                        for (; i + spec.length() < j; i += spec.length()) {
                            DoubleVector<Shapes.S512Bit> cV = spec.fromArray(c, i + j * Ldc + _c_offset);
                            cV.mul(betaVec).intoArray(c, i + j * Ldc + _c_offset);
                        }
                        for (; i <= j; i++) {
                            c[i + j * Ldc + _c_offset] = beta * c[i + j * Ldc + _c_offset];
                        }
                    }

                    for (l = 0; l < k; l++) {
                        if ((a[j + l * lda + _a_offset] != 0.0) || (b[j + l * ldb + _b_offset] != 0.0)) {
                            temp1 = alpha * b[j + l * ldb + _b_offset];
                            DoubleVector<Shapes.S512Bit> tv1 = spec.broadcast(temp1);
                            temp2 = alpha * a[j + l * lda + _a_offset];
                            DoubleVector<Shapes.S512Bit> tv2 = spec.broadcast(temp2);
                            i = 0;
                            for (; (i + spec.length()) < j; i += spec.length()) {
                                DoubleVector<Shapes.S512Bit> cV = spec.fromArray(c, i + j * Ldc + _c_offset);
                                DoubleVector<Shapes.S512Bit> bV = spec.fromArray(b, i + l * ldb + _b_offset);
                                DoubleVector<Shapes.S512Bit> aV = spec.fromArray(a, i + l * lda + _a_offset);
                                cV.add(aV.mul(tv1)).add(bV.mul(tv2)).intoArray(c, i + j * Ldc + _c_offset);
                            }
                            for (; i <= j; i++) {
                                c[i + j * Ldc + _c_offset] = c[i + j * Ldc + _c_offset] + a[i + l * lda + _a_offset] * temp1 + b[i + l * ldb + _b_offset] * temp2;
                            }
                        }
                    }
                }
            } else {
                for (j = 0; j < n; j++) {
                    if (beta == 0.0) {
                        i = j;
                        for (; (i + spec.length()) < n; i += spec.length()) {
                            zeroVec.intoArray(c, i + j * Ldc + _c_offset);
                        }
                        for (; i < n; i++) {
                            c[i + j * Ldc + _c_offset] = 0.0;
                        }
                    } else if (beta != 1.0) {
                        i = j;
                        for (; (i + spec.length()) < n; i += spec.length()) {
                            DoubleVector<Shapes.S512Bit> cV = spec.fromArray(c, i + j * Ldc + _c_offset);
                            cV.mul(betaVec).intoArray(c, i + j * Ldc + _c_offset);
                        }
                        for (; i < n; i++) {
                            c[i + j * Ldc + _c_offset] = beta * c[i + j * Ldc + _c_offset];
                        }
                    }
                    for (l = 0; l < k; l++) {
                        if ((a[j + l * lda + _a_offset] != 0.0) || (b[j + l * ldb + _b_offset] != 0.0)) {
                            temp1 = alpha * b[j + l * ldb + _b_offset];
                            DoubleVector<Shapes.S512Bit> tv1 = spec.broadcast(temp1);
                            temp2 = alpha * a[j + l * lda + _a_offset];
                            DoubleVector<Shapes.S512Bit> tv2 = spec.broadcast(temp2);
                            i = j;
                            for (; i + spec.length() < n; i += spec.length()) {
                                DoubleVector<Shapes.S512Bit> cV = spec.fromArray(c, i + j * Ldc + _c_offset);
                                DoubleVector<Shapes.S512Bit> aV = spec.fromArray(a, i + l * lda + _a_offset);
                                DoubleVector<Shapes.S512Bit> bV = spec.fromArray(b, i + l * ldb + _b_offset);
                                cV.add(aV.mul(tv1)).add(bV.mul(tv2)).intoArray(c, i + j * Ldc + _c_offset);
                            }
                            for (; i < n; i++) {
                                c[i + j * Ldc + _c_offset] = c[i + j * Ldc + _c_offset] + a[i + l * lda + _a_offset] * temp1 + b[i + l * ldb + _b_offset] * temp2;
                            }
                        }
                    }
                }
            }
        } else {
            // Form  C := alpha*A**T*B + alpha*B**T*A + C.
            if (upper) {
                for (j = 0; j < n; j++) {
                    for (i = 0; i <= j; i++) { // upper triangle, including the diagonal
                        temp1 = 0.0;
                        temp2 = 0.0;
                        l = 0;
                        for (; l + spec.length() < k; l += spec.length()) {
                            DoubleVector<Shapes.S512Bit> aV1 = spec.fromArray(a, l + i * lda + _a_offset);
                            DoubleVector<Shapes.S512Bit> bV1 = spec.fromArray(b, l + j * ldb + _b_offset);
                            DoubleVector<Shapes.S512Bit> aV2 = spec.fromArray(a, l + j * lda + _a_offset);
                            DoubleVector<Shapes.S512Bit> bV2 = spec.fromArray(b, l + i * ldb + _b_offset);
                            temp1 += aV1.mul(bV1).sumAll();
                            temp2 += aV2.mul(bV2).sumAll();
                        }
                        for (; l < k; l++) {
                            temp1 = temp1 + a[l + i * lda + _a_offset] * b[l + j * ldb + _b_offset];
                            temp2 = temp2 + b[l + i * ldb + _b_offset] * a[l + j * lda + _a_offset];
                        }
                        if (beta == 0.0) {
                            c[i + j * Ldc + _c_offset] = alpha * temp1 + alpha * temp2;
                        } else {
                            c[i + j * Ldc + _c_offset] = beta * c[i + j * Ldc + _c_offset] + alpha * temp1 + alpha * temp2;
                        }
                    }
                }
            } else {
                for (j = 0; j < n; j++) {
                    for (i = j; i < n; i++) {
                        temp1 = 0.0;
                        temp2 = 0.0;
                        l = 0;
                        for (; l + spec.length() < k; l += spec.length()) {
                            DoubleVector<Shapes.S512Bit> aV1 = spec.fromArray(a, l + i * lda + _a_offset);
                            DoubleVector<Shapes.S512Bit> bV1 = spec.fromArray(b, l + j * ldb + _b_offset);
                            DoubleVector<Shapes.S512Bit> bV2 = spec.fromArray(b, l + i * ldb + _b_offset);
                            DoubleVector<Shapes.S512Bit> aV2 = spec.fromArray(a, l + j * lda + _a_offset);
                            temp1 += aV1.mul(bV1).sumAll();
                            temp2 += aV2.mul(bV2).sumAll();
                        }
                        for (; l < k; l++) {
                            temp1 = temp1 + a[l + i * lda + _a_offset] * b[l + j * ldb + _b_offset];
                            temp2 = temp2 + b[l + i * ldb + _b_offset] * a[l + j * lda + _a_offset];
                        }
                        if (beta == 0.0) {
                            c[i + j * Ldc + _c_offset] = alpha * temp1 + alpha * temp2;
                        } else {
                            c[i + j * Ldc + _c_offset] = beta * c[i + j * Ldc + _c_offset] + alpha * temp1 + alpha * temp2;
                        }
                    }
                }
            }
        }
    }
217218static public void VecDsyr2kFloat(String uplo, String trans, int n, int k, float alpha, float[] a, int _a_offset, int lda, float[] b, int _b_offset, int ldb, float beta, float[] c, int _c_offset, int Ldc) {
219FloatVector.FloatSpecies<Shapes.S256Bit> spec= (FloatVector.FloatSpecies<Shapes.S256Bit>) Vector.speciesInstance(Float.class, Shapes.S_256_BIT);
220float temp1 = 0.0f;
221float temp2 = 0.0f;
222int i = 0;
223int info = 0;
224int j = 0;
225int l = 0;
226int nrowa = 0;
227boolean upper = false;
228if (trans.equals("N")) {
229nrowa = n;
230} else {
231nrowa = k;
232}              //  Close else.
233//FloatVector<Shapes.S256Bit> zeroVec = spec.broadcast(0.0f);
234// FloatVector<Shapes.S256Bit> betaVec = spec.broadcast(beta);
235upper = uplo.equals("U");
236if (alpha == 0.0) {
237if (upper) {
238if (beta == 0.0) {
239for (j = 0; j < n; j++) {
240i = 0;
241for (; (i + spec.length()) < j; i += spec.length()) {
242FloatVector<Shapes.S256Bit> zeroVec = spec.broadcast(0.0f);
243zeroVec.intoArray(c, i + j * Ldc + _c_offset);
244}
245for (; i <= j; i++) {
246c[i + j * Ldc + _c_offset] = 0.0f;
247}
248}
249} else {
250for (j = 0; j < n; j++) {
251i = 0;
252for (; (i + spec.length()) <= j; i += spec.length()) {
253FloatVector<Shapes.S256Bit> cV = spec.fromArray(c, i + j * Ldc + _c_offset);
254FloatVector<Shapes.S256Bit> betaVec = spec.broadcast(beta);
255cV.mul(betaVec).intoArray(c, i + j * Ldc + _c_offset);
256}
257for (; i <= j; i++) {
258c[i + j * Ldc + _c_offset] = beta * c[i + j * Ldc + _c_offset];
259}
260}
261}
262}
263264//lower
265else {
266if (beta == 0.0) {
267for (j = 0; j < n; j++) {
268i = j;
269for (; i + spec.length() < n; i += spec.length()) {
270FloatVector<Shapes.S256Bit> zeroVec = spec.broadcast(0.0f);
271zeroVec.intoArray(c, i + j * Ldc + _c_offset);
272}
273for (; i < n; i++) {
274c[i + j * Ldc + _c_offset] = 0.0f;
275}
276}
277} else {
278for (j = 0; j < n; j++) {
279i = j;
280for (; i + spec.length() < n; i += spec.length()) {
281FloatVector<Shapes.S256Bit> cV = spec.fromArray(c, i + j * Ldc + _c_offset);
282FloatVector<Shapes.S256Bit> betaVec = spec.broadcast(beta);
283cV.mul(betaVec).intoArray(c, i + j * Ldc + _c_offset);
284}
285}
286for (; i < n; i++) {
287c[i + j * Ldc + _c_offset] = beta * c[i + j * Ldc + _c_offset];
288}
289}
290}
291}
292//start operations
293if (trans.equals("N")) {
294// *        Form  C := alpha*A*B**T + alpha*B*A**T + C.
295if (upper) {
296for (j = 0; j < n; j++) {
297if (beta == 0.0) {
298i = 0;
299for (; i + spec.length() <= j; i += spec.length()) {
300FloatVector<Shapes.S256Bit> zeroVec = spec.broadcast(0.0f);
301zeroVec.intoArray(c, i + j * Ldc + _c_offset);
302}
303for (; i <= j; i++) {
304c[i + j * Ldc + _c_offset] = 0.0f;
305}
306307} else if (beta != 1.0) {
308i = 0;
309for (; i + spec.length() <= j; i += spec.length()) {
310FloatVector<Shapes.S256Bit> cV = spec.fromArray(c, i + j * Ldc + _c_offset);
311FloatVector<Shapes.S256Bit> betaVec = spec.broadcast(beta);
312cV.mul(betaVec).intoArray(c, i + j * Ldc + _c_offset);
313}
314for (; i <= j; i++) {
315c[i + j * Ldc + _c_offset] = beta * c[i + j * Ldc + _c_offset];
316}
317}
318319for (l = 0; l < k; l++) {
320if ((a[j + l * lda + _a_offset] != 0.0) || (b[j + l * ldb + _b_offset] != 0.0)) {
321temp1 = alpha * b[j + l * ldb + _b_offset];
322temp2 = alpha * a[j + l * lda + _a_offset];
323i = 0;
324for (; (i + spec.length()) <= j; i += spec.length()) {
325FloatVector<Shapes.S256Bit> cV = spec.fromArray(c, i + j * Ldc + _c_offset);
326FloatVector<Shapes.S256Bit> bV = spec.fromArray(b, i + l * ldb + _b_offset);
327FloatVector<Shapes.S256Bit> aV = spec.fromArray(a, i + l * lda + _a_offset);
328FloatVector<Shapes.S256Bit> tv1 = spec.broadcast(temp1);
329FloatVector<Shapes.S256Bit> tv2 = spec.broadcast(temp2);
330cV.add(aV.mul(tv1)).add(bV.mul(tv2)).intoArray(c, i + j * Ldc + _c_offset);
331}
332for (; i <= j; i++) {
333c[i + j * Ldc + _c_offset] = c[i + j * Ldc + _c_offset] + a[i + l * lda + _a_offset] * temp1 + b[i + l * ldb + _b_offset] * temp2;
334}
335}
336}
337}
338} else {
339340for (j = 0; j < n; j++) {
341if (beta == 0.0) {
342i = j;
343for (; (i + spec.length()) < n; i += spec.length()) {
344FloatVector<Shapes.S256Bit> zeroVec = spec.broadcast(0.0f);
345zeroVec.intoArray(c, i + j * Ldc + _c_offset);
346}
347for (; i < n; i++) {
348c[i + j * Ldc + _c_offset] = 0.0f;
349}
350} else if (beta != 1.0) {
351i = j;
352for (; (i + spec.length()) < n; i += spec.length()) {
353FloatVector<Shapes.S256Bit> cV = spec.fromArray(c, i + j * Ldc + _c_offset);
354FloatVector<Shapes.S256Bit> betaVec = spec.broadcast(beta);
355cV.mul(betaVec).intoArray(c, i + j * Ldc + _c_offset);
356}
357for (; i < n; i++) {
358c[i + j * Ldc + _c_offset] = beta * c[i + j * Ldc + _c_offset];
359}
360}
361for (l = 0; l < k; l++) {
362if ((a[j + l * lda + _a_offset] != 0.0) || (b[j + l * ldb + _b_offset] != 0.0)) {
363temp1 = alpha * b[j + l * ldb + _b_offset];
364temp2 = alpha * a[j + l * lda + _a_offset];
365i = j;
366for (; i + spec.length() < n; i += spec.length()) {
367FloatVector<Shapes.S256Bit> cV = spec.fromArray(c, i + j * Ldc + _c_offset);
368FloatVector<Shapes.S256Bit> aV = spec.fromArray(a, i + l * lda + _a_offset);
369FloatVector<Shapes.S256Bit> bV = spec.fromArray(b, i + l * ldb + _b_offset);
370FloatVector<Shapes.S256Bit> tv1 = spec.broadcast(temp1);
371FloatVector<Shapes.S256Bit> tv2 = spec.broadcast(temp2);
372cV.add(aV.mul(tv1)).add(bV.mul(tv2)).intoArray(c, i + j * Ldc + _c_offset);
373}
374for (; i < n; i++) {
375c[i + j * Ldc + _c_offset] = c[i + j * Ldc + _c_offset] + a[i + l * lda + _a_offset] * temp1 + b[i + l * ldb + _b_offset] * temp2;
376}
377}
378}
379}
380}
381} else {
382383
// *        Form  C := alpha*A**T*B + alpha*B**T*A + C.
384if (upper) {
385for (j = 0; j < n; j++) {
386for (i = 0; i < j; i++) {
387temp1 = 0.0f;
388temp2 = 0.0f;
389l = 0;
390for (; l + spec.length() < k; l += spec.length()) {
391FloatVector<Shapes.S256Bit> aV1 = spec.fromArray(a, l + i * lda + _a_offset);
392FloatVector<Shapes.S256Bit> bV1 = spec.fromArray(b, l + j * ldb + _b_offset);
393FloatVector<Shapes.S256Bit> aV2 = spec.fromArray(a, l + j * lda + _a_offset);
394FloatVector<Shapes.S256Bit> bV2 = spec.fromArray(b, l + i * ldb + _b_offset);
395temp1 += aV1.mul(bV1).sumAll();
396temp2 += aV2.mul(bV2).sumAll();
397}
398for (; l < k; l++) {
399temp1 = temp1 + a[l + i * lda + _a_offset] * b[l + j * ldb + _b_offset];
400temp2 = temp2 + b[l + i * ldb + _b_offset] * a[l + j * lda + _a_offset];
401}
402if (beta == 0.0) {
403c[i + j * Ldc + _c_offset] = alpha * temp1 + alpha * temp2;
404} else {
405c[i + j * Ldc + _c_offset] = beta * c[i + j * Ldc + _c_offset] + alpha * temp1 + alpha * temp2;
406}
407}
408}
409} else {
                for (j = 0; j < n; j++) {
                    for (i = j; i < n; i++) {
                        temp1 = 0.0f;
                        temp2 = 0.0f;
                        l = 0;
                        for (; l + spec.length() < k; l += spec.length()) {
                            FloatVector<Shapes.S256Bit> aV1 = spec.fromArray(a, l + i * lda + _a_offset);
                            FloatVector<Shapes.S256Bit> bV1 = spec.fromArray(b, l + j * ldb + _b_offset);
                            FloatVector<Shapes.S256Bit> bV2 = spec.fromArray(b, l + i * ldb + _b_offset);
                            FloatVector<Shapes.S256Bit> aV2 = spec.fromArray(a, l + j * lda + _a_offset);
                            temp1 += aV1.mul(bV1).sumAll();
                            temp2 += aV2.mul(bV2).sumAll();
                        }
                        for (; l < k; l++) {
                            temp1 = temp1 + a[l + i * lda + _a_offset] * b[l + j * ldb + _b_offset];
                            temp2 = temp2 + b[l + i * ldb + _b_offset] * a[l + j * lda + _a_offset];
                        }
                        if (beta == 0.0) {
                            c[i + j * Ldc + _c_offset] = alpha * temp1 + alpha * temp2;
                        } else {
                            c[i + j * Ldc + _c_offset] = beta * c[i + j * Ldc + _c_offset] + alpha * temp1 + alpha * temp2;
                        }
                    }
                }
            }
        }
    }

} // End class.
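
The transposed branches above reduce each entry of C to two dot products, computed in vector-width blocks with a scalar loop for the leftover elements. The following is a minimal plain-Java sketch of that blocking pattern, not the Vector API itself: `BlockedDot`, `LANES`, and `dot` are hypothetical names, the inner loop stands in for `mul(...).sumAll()`, and the lane count of 8 assumes 32-bit floats in a 256-bit shape. It uses `<=` in the block-loop bound; the listing's `<` bound is also correct and, when `k` is an exact multiple of the lane count, simply leaves the final full block to the scalar tail.

```java
// Hypothetical illustration of the "vector body + scalar tail" loop shape.
final class BlockedDot {
    // Assumed lane count: 256 bits / 32-bit float.
    static final int LANES = 8;

    static float dot(float[] x, int xOff, float[] y, int yOff, int k) {
        float sum = 0.0f;
        int l = 0;
        // Block loop: processes LANES elements per iteration.
        for (; l + LANES <= k; l += LANES) {
            float partial = 0.0f;
            for (int t = 0; t < LANES; t++) {
                partial += x[xOff + l + t] * y[yOff + l + t];
            }
            sum += partial; // stands in for the horizontal reduction of one vector
        }
        // Scalar tail: fewer than LANES elements remain.
        for (; l < k; l++) {
            sum += x[xOff + l] * y[yOff + l];
        }
        return sum;
    }

    public static void main(String[] args) {
        float[] x = new float[10];
        float[] y = new float[10];
        for (int i = 0; i < 10; i++) {
            x[i] = i + 1;
            y[i] = 1.0f;
        }
        System.out.println(dot(x, 0, y, 0, 10)); // one block of 8, then a tail of 2
    }
}
```

Because the block loop and the tail share the index `l`, every element is visited exactly once regardless of whether `k` is a multiple of the lane count.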

BLAS-III (DGEMM)

import jdk.incubator.vector.DoubleVector;
import jdk.incubator.vector.Shapes;
import jdk.incubator.vector.Vector;

public class BLAS3GEMM {

    void VecDgemm(String transa, String transb, int m, int n, int k, double alpha, double[] a, int a_offset, int lda, double[] b, int b_offset, int ldb, double beta, double[] c, int c_offset, int ldc) {
        DoubleVector.DoubleSpecies<Shapes.S512Bit> spec = (DoubleVector.DoubleSpecies<Shapes.S512Bit>) Vector.speciesInstance(Double.class, Shapes.S_512_BIT);
        double temp = 0.0;
        int i = 0;
        int info = 0;
        int j = 0;
        int l = 0;
        int ncola = 0;
        int nrowa = 0;
        int nrowb = 0;
        // Assumption: "N" selects the untransposed operand, as in reference BLAS.
        // With both flags left at false, only the A**T*B**T branch would ever run.
        boolean nota = transa.equalsIgnoreCase("N");
        boolean notb = transb.equalsIgnoreCase("N");
        DoubleVector<Shapes.S512Bit> zeroVec = spec.broadcast(0.0);

        // Quick return if there is nothing to compute.
        if (m == 0 || n == 0 || ((alpha == 0 || k == 0) && beta == 1.0))
            return;
        // alpha == 0: only C := beta*C is needed.
        if (alpha == 0.0) {
            if (beta == 0.0) {
                for (j = 0; j < n; j++) {
                    for (i = 0; (i + spec.length()) < m && (i + j * ldc + c_offset + spec.length()) < c.length; i += spec.length()) {
                        zeroVec.intoArray(c, i + j * ldc + c_offset);
                    }
                    for (; i < m && i + j * ldc + c_offset < c.length; i++) {
                        c[i + j * ldc + c_offset] = 0.0;
                    }
                }
            }
            // beta != 0.0
            else {
                for (j = 0; j < n; j++) {
                    DoubleVector<Shapes.S512Bit> bv = spec.broadcast(beta);
                    for (i = 0; (i + spec.length()) < m && (i + j * ldc + c_offset + spec.length()) < c.length; i += spec.length()) {
                        DoubleVector<Shapes.S512Bit> cv = spec.fromArray(c, i + j * ldc + c_offset);
                        cv.mul(bv).intoArray(c, i + j * ldc + c_offset);
                    }
                    for (; i < m && i + j * ldc + c_offset < c.length; i++) c[i + j * ldc + c_offset] = beta * c[i + j * ldc + c_offset];
                }
            }
            // Return here, as reference DGEMM does; falling through would scale C by beta a second time.
            return;
        }

        if (notb) {
            if (nota) {

                // Form  C := alpha*A*B + beta*C.
                for (j = 0; j < n; j++) {
                    if (beta == 0.0) {
                        for (i = 0; (i + spec.length()) < m && (i + j * ldc + c_offset + spec.length()) < c.length; i += spec.length()) {
                            zeroVec.intoArray(c, i + j * ldc + c_offset);
                        }
                        for (; i < m; i++) {
                            c[i + j * ldc + c_offset] = 0.0;
                        }
                    } else if (beta != 1.0) {
                        DoubleVector<Shapes.S512Bit> bv = spec.broadcast(beta);
                        for (i = 0; (i + spec.length()) < m && (i + j * ldc + c_offset + spec.length()) < c.length; i += spec.length()) {
                            DoubleVector<Shapes.S512Bit> cv = spec.fromArray(c, i + j * ldc + c_offset);
                            cv.mul(bv).intoArray(c, i + j * ldc + c_offset);
                        }
                        for (; i < m && (i + j * ldc + c_offset) < c.length; i++) c[i + j * ldc + c_offset] = beta * c[i + j * ldc + c_offset];
                    }

                    for (l = 0; l < k; l++) {
                        if (b[l + j * ldb + b_offset] != 0.0) {
                            temp = alpha * b[l + j * ldb + b_offset];
                            DoubleVector<Shapes.S512Bit> tv = spec.broadcast(temp);
                            for (i = 0; (i + spec.length()) < m && (i + l * lda + a_offset + spec.length()) < a.length && (i + j * ldc + c_offset + spec.length()) < c.length; i += spec.length()) {
                                DoubleVector<Shapes.S512Bit> av = spec.fromArray(a, i + l * lda + a_offset);
                                DoubleVector<Shapes.S512Bit> cv = spec.fromArray(c, i + j * ldc + c_offset);
                                cv.add(av.mul(tv)).intoArray(c, i + j * ldc + c_offset); // tv.fma(av, cv).toDoubleArray(c, i+j*ldc+c_offset);
                            }
                            for (; i < m && (i + l * lda + a_offset) < a.length && (i + j * ldc + c_offset) < c.length; i++)
                                c[i + j * ldc + c_offset] = c[i + j * ldc + c_offset] + temp * a[i + l * lda + a_offset];
                        }
                    }
                }
            } else {
                // Form  C := alpha*A**T*B + beta*C
                for (j = 0; j < n; j++) {
                    for (i = 0; i < m; i++) {
                        temp = 0.0;
                        // Dot product of column i of A with column j of B; both are contiguous in l.
                        for (l = 0; (l + spec.length()) < k && (l + i * lda + a_offset + spec.length()) < a.length && (l + j * ldb + b_offset + spec.length()) < b.length; l += spec.length()) {
                            DoubleVector<Shapes.S512Bit> av = spec.fromArray(a, l + i * lda + a_offset);
                            DoubleVector<Shapes.S512Bit> bv = spec.fromArray(b, l + j * ldb + b_offset);
                            temp += av.mul(bv).sumAll();
                        }
                        for (; l < k && l + i * lda + a_offset < a.length && l + j * ldb + b_offset < b.length; l++) temp = temp + a[l + i * lda + a_offset] * b[l + j * ldb + b_offset];

                        if (beta == 0.0) {
                            c[i + j * ldc + c_offset] = alpha * temp;
                        } else {
                            c[i + j * ldc + c_offset] = alpha * temp + beta * c[i + j * ldc + c_offset];
                        }
                    }
                }

            }
        } else {
            if (nota) {
                // Form  C := alpha*A*B**T + beta*C
                for (j = 0; j < n; j++) {
                    if (beta == 0.0) {
                        for (i = 0; (i + spec.length()) < m && (i + j * ldc + c_offset + spec.length()) < c.length; i += spec.length()) {
                            zeroVec.intoArray(c, i + j * ldc + c_offset);
                        }
                        for (; i < m && (i + j * ldc + c_offset) < c.length; i++) {
                            c[i + j * ldc + c_offset] = 0.0;
                        }
                    } else if (beta != 1.0) {
                        DoubleVector<Shapes.S512Bit> bv = spec.broadcast(beta);
                        for (i = 0; (i + spec.length()) < m && (i + j * ldc + c_offset + spec.length()) < c.length; i += spec.length()) {
                            DoubleVector<Shapes.S512Bit> cv = spec.fromArray(c, i + j * ldc + c_offset);
                            cv.mul(bv).intoArray(c, i + j * ldc + c_offset);
                        }
                        for (; i < m && i + j * ldc + c_offset < c.length; i++) {
                            c[i + j * ldc + c_offset] = beta * c[i + j * ldc + c_offset];
                        }
                    }

                    for (l = 0; l < k; l++) {
                        if (b[j + l * ldb + b_offset] != 0.0) {
                            temp = alpha * b[j + l * ldb + b_offset];
                            DoubleVector<Shapes.S512Bit> tv = spec.broadcast(temp);
                            for (i = 0; (i + spec.length()) < m && (i + j * ldc + c_offset + spec.length()) < c.length && (i + l * lda + a_offset + spec.length()) < a.length; i += spec.length()) {
                                DoubleVector<Shapes.S512Bit> cv = spec.fromArray(c, i + j * ldc + c_offset);
                                DoubleVector<Shapes.S512Bit> av = spec.fromArray(a, i + l * lda + a_offset);
                                cv.add(tv.mul(av)).intoArray(c, i + j * ldc + c_offset); // tv.fma(av, cv).toDoubleArray(c, i + j * ldc + c_offset);
                            }
                            for (; i < m && (i + j * ldc + c_offset) < c.length && (i + l * lda + a_offset) < a.length; i++)
                                c[i + j * ldc + c_offset] = c[i + j * ldc + c_offset] + temp * a[i + l * lda + a_offset];
                        }
                    }
                }
            } else {
                // Form  C := alpha*A**T*B**T + beta*C
                for (j = 0; j < n; j++) {
                    for (i = 0; i < m; i++) {
                        temp = 0.0;
                        // NOTE: the scalar tail reads b with stride ldb (b[j + l * ldb]), while fromArray
                        // loads spec.length() contiguous elements, so this vector loop agrees with the
                        // tail only when ldb == 1.
                        for (l = 0; (l + spec.length()) < k && (l + i * lda + a_offset + spec.length()) < a.length && (j + l * ldb + b_offset + spec.length()) < b.length; l += spec.length()) {
                            DoubleVector<Shapes.S512Bit> av = spec.fromArray(a, l + i * lda + a_offset);
                            DoubleVector<Shapes.S512Bit> bv = spec.fromArray(b, j + l * ldb + b_offset);
                            temp += av.mul(bv).sumAll();
                        }
                        for (; l < k && (l + i * lda + a_offset) < a.length && (j + l * ldb + b_offset) < b.length; l++) {
                            temp = temp + a[l + i * lda + a_offset] * b[j + l * ldb + b_offset];
                        }

                        if (beta == 0.0) {
                            c[i + j * ldc + c_offset] = alpha * temp;
                        } else {
                            c[i + j * ldc + c_offset] = alpha * temp + beta * c[i + j * ldc + c_offset];
                        }

                    }
                }
            }
        }

    }

}
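
When experimenting with a vectorized routine such as `VecDgemm`, a plain scalar version is handy as a correctness baseline. Below is a minimal sketch for the untransposed case only (C := alpha*A*B + beta*C), assuming column-major storage and zero offsets. `DgemmReference` and `dgemmNN` are hypothetical names and are not part of the original article.

```java
// Hypothetical scalar baseline for the untransposed DGEMM case.
final class DgemmReference {
    // A is m x k, B is k x n, C is m x n, all column-major with leading
    // dimensions lda, ldb, ldc (assumed >= the respective row counts).
    static void dgemmNN(int m, int n, int k, double alpha,
                        double[] a, int lda, double[] b, int ldb,
                        double beta, double[] c, int ldc) {
        for (int j = 0; j < n; j++) {
            for (int i = 0; i < m; i++) {
                double sum = 0.0;
                for (int l = 0; l < k; l++) {
                    sum += a[i + l * lda] * b[l + j * ldb];
                }
                // beta == 0 means C is not read, matching the BLAS convention.
                double old = (beta == 0.0) ? 0.0 : beta * c[i + j * ldc];
                c[i + j * ldc] = alpha * sum + old;
            }
        }
    }

    public static void main(String[] args) {
        double[] a = {1, 3, 2, 4};   // [[1, 2], [3, 4]] in column-major order
        double[] b = {1, 0, 0, 1};   // 2 x 2 identity
        double[] c = {2, 2, 2, 2};
        dgemmNN(2, 2, 2, 2.0, a, 2, b, 2, 0.5, c, 2);
        System.out.println(java.util.Arrays.toString(c));
    }
}
```

Comparing the vectorized result against such a baseline element by element, within a small tolerance, is one way to catch indexing mistakes in the vector loops.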

Financial Services (FSI) Algorithms

GetOptionPrice

import jdk.incubator.vector.DoubleVector;
import jdk.incubator.vector.Shapes;
import jdk.incubator.vector.Vector;

public class FSI_getOptionPrice {

    public static double getOptionPrice(double Sval, double Xval, double T, double[] z, int numberOfPaths, double riskFree, double volatility)
    {
        double val = 0.0, val2 = 0.0;
        double VBySqrtT = volatility * Math.sqrt(T);
        double MuByT = (riskFree - 0.5 * volatility * volatility) * T;

        // Simulate paths
        for (int path = 0; path < numberOfPaths; path++)
        {
            double callValue = Sval * Math.exp(MuByT + VBySqrtT * z[path]) - Xval;
            callValue = (callValue > 0) ? callValue : 0;
            val += callValue;
            val2 += callValue * callValue;
        }

        double optPrice = 0.0;
        optPrice = val / numberOfPaths;
        return (optPrice);
    }

    public static double VecGetOptionPrice(double Sval, double Xval, double T, double[] z, int numberOfPaths, double riskFree, double volatility) {
        DoubleVector.DoubleSpecies<Shapes.S512Bit> spec = (DoubleVector.DoubleSpecies<Shapes.S512Bit>) Vector.speciesInstance(Double.class, Shapes.S_512_BIT);
        double val = 0.0, val2 = 0.0;

        double VBySqrtT = volatility * Math.sqrt(T);
        DoubleVector<Shapes.S512Bit> VByVec = spec.broadcast(VBySqrtT);
        double MuByT = (riskFree - 0.5 * volatility * volatility) * T;
        DoubleVector<Shapes.S512Bit> MuVec = spec.broadcast(MuByT);
        DoubleVector<Shapes.S512Bit> SvalVec = spec.broadcast(Sval);
        DoubleVector<Shapes.S512Bit> XvalVec = spec.broadcast(Xval);
        DoubleVector<Shapes.S512Bit> zeroVec = spec.broadcast(0.0D);

        // Simulate paths
        int path = 0;
        for (; (path + spec.length()) < numberOfPaths; path += spec.length()) {
            DoubleVector<Shapes.S512Bit> zv = spec.fromArray(z, path);
            DoubleVector<Shapes.S512Bit> tv = MuVec.add(VByVec.mul(zv)).exp(); // Math.exp(MuByT + VBySqrtT * z[path])
            DoubleVector<Shapes.S512Bit> callValVec = SvalVec.mul(tv).sub(XvalVec);
            callValVec = callValVec.blend(zeroVec, callValVec.greaterThan(zeroVec));
            val += callValVec.sumAll();
            val2 += callValVec.mul(callValVec).sumAll();
        }
        // Tail
        for (; path < numberOfPaths; path++) {
            double callValue = Sval * Math.exp(MuByT + VBySqrtT * z[path]) - Xval;
            callValue = (callValue > 0) ? callValue : 0;
            val += callValue;
            val2 += callValue * callValue;
        }
        double optPrice = 0.0;
        optPrice = val / numberOfPaths;
        return (optPrice);
    }

}
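
One possible way to drive the pricing routine is sketched below. It assumes that `z` holds independent standard normal draws (the listing does not say how `z` is produced) and uses `java.util.Random` to generate them. `OptionPriceDemo` and `meanCallPayoff` are hypothetical names; the scalar loop is repeated here only so that the example is self-contained. Like the listing, it returns the undiscounted average payoff.

```java
import java.util.Random;

// Hypothetical driver; mirrors the scalar loop of getOptionPrice.
final class OptionPriceDemo {
    static double meanCallPayoff(double sVal, double xVal, double t, double[] z,
                                 double riskFree, double volatility) {
        double vBySqrtT = volatility * Math.sqrt(t);
        double muByT = (riskFree - 0.5 * volatility * volatility) * t;
        double sum = 0.0;
        for (double zi : z) {
            // Terminal price minus strike, floored at zero (call payoff).
            double payoff = sVal * Math.exp(muByT + vBySqrtT * zi) - xVal;
            sum += Math.max(payoff, 0.0);
        }
        return sum / z.length;
    }

    public static void main(String[] args) {
        Random rng = new Random(42L);          // fixed seed for repeatable runs
        double[] z = new double[100_000];      // assumed: one normal draw per path
        for (int i = 0; i < z.length; i++) {
            z[i] = rng.nextGaussian();
        }
        System.out.println(meanCallPayoff(100.0, 100.0, 1.0, z, 0.02, 0.3));
    }
}
```

With zero volatility and a zero rate every path ends at the spot price, which gives an easy sanity check: the result is then simply max(Sval - Xval, 0).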

BinomialOptions

import jdk.incubator.oracle.vector.*;

public class FSI_BinomialOptions {

    public static void VecBinomialOptions(double[] stepsArray, int STEPS_CACHE_SIZE, double vsdt, double x, double s, int numSteps, int NUM_STEPS_ROUND, double pdByr, double puByr) {
        DoubleVector.DoubleSpecies<Shapes.S512Bit> spec = (DoubleVector.DoubleSpecies<Shapes.S512Bit>) Vector.speciesInstance(Double.class, Shapes.S_512_BIT);
        IntVector.IntSpecies<Shapes.S512Bit> ispec = (IntVector.IntSpecies<Shapes.S512Bit>) Vector.speciesInstance(Integer.class, Shapes.S_512_BIT);

        //   double stepsArray [STEPS_CACHE_SIZE];
        DoubleVector<Shapes.S512Bit> sv = spec.broadcast(s);
        DoubleVector<Shapes.S512Bit> vsdtVec = spec.broadcast(vsdt);
        DoubleVector<Shapes.S512Bit> xv = spec.broadcast(x);
        DoubleVector<Shapes.S512Bit> pdv = spec.broadcast(pdByr);
        DoubleVector<Shapes.S512Bit> puv = spec.broadcast(puByr);
        DoubleVector<Shapes.S512Bit> zv = spec.broadcast(0.0D);
        IntVector<Shapes.S512Bit> inc = ispec.fromArray(new int[]{0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15}, 0);
        IntVector<Shapes.S512Bit> nSV = ispec.broadcast(numSteps);
        int j;
        for (j = 0; (j + spec.length()) < STEPS_CACHE_SIZE; j += spec.length()) {
            IntVector<Shapes.S512Bit> jv = ispec.broadcast(j);
            Vector<Double,Shapes.S512Bit> tv = jv.add(inc).cast(Double.class).mul(spec.broadcast(2.0D)).sub(nSV.cast(Double.class));
            DoubleVector<Shapes.S512Bit> pftVec = sv.mul(vsdtVec.mul(tv).exp()).sub(xv);
            pftVec.blend(zv, pftVec.greaterThan(zv)).intoArray(stepsArray, j);
        }
        for (; j < STEPS_CACHE_SIZE; j++) {
            double profit = s * Math.exp(vsdt * (2.0D * j - numSteps)) - x;
            stepsArray[j] = profit > 0.0D ? profit : 0.0D;
        }

        for (j = 0; j < numSteps; j++) {
            int k;
            for (k = 0; k + spec.length() < NUM_STEPS_ROUND; k += spec.length()) {
                DoubleVector<Shapes.S512Bit> sv0 = spec.fromArray(stepsArray, k);
                DoubleVector<Shapes.S512Bit> sv1 = spec.fromArray(stepsArray, k + 1);
                pdv.mul(sv1).add(puv.mul(sv0)).intoArray(stepsArray, k); // sv0 = pdv.fma(sv1, puv.mul(sv0)); sv0.intoArray(stepsArray,k);
            }
            for (; k < NUM_STEPS_ROUND; ++k) {
                stepsArray[k] = pdByr * stepsArray[k + 1] + puByr * stepsArray[k];
            }
        }
    }

}
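
The two phases of `VecBinomialOptions` (fill the leaf payoffs, then repeatedly fold neighbouring nodes) can be sketched in scalar form as follows. This is a simplified illustration rather than the article's code: `BinomialScalarSketch` and `rollBack` are hypothetical names, the array holds exactly `numSteps + 1` nodes instead of the padded `STEPS_CACHE_SIZE` / `NUM_STEPS_ROUND` sizes, and the inner range shrinks by one node per step. The coefficients `pdByr` and `puByr` are taken as inputs and applied exactly as in the listing.

```java
// Hypothetical scalar sketch of the leaf-fill and roll-back phases.
final class BinomialScalarSketch {
    static double rollBack(double s, double x, double vsdt, int numSteps,
                           double pdByr, double puByr) {
        double[] steps = new double[numSteps + 1];

        // Phase 1: payoff at each leaf node, floored at zero.
        for (int j = 0; j <= numSteps; j++) {
            double profit = s * Math.exp(vsdt * (2.0 * j - numSteps)) - x;
            steps[j] = Math.max(profit, 0.0);
        }

        // Phase 2: fold adjacent nodes until a single value remains.
        for (int j = numSteps; j > 0; j--) {
            for (int k = 0; k < j; k++) {
                // steps[k + 1] is read before it is overwritten, so an
                // in-place forward sweep is safe.
                steps[k] = pdByr * steps[k + 1] + puByr * steps[k];
            }
        }
        return steps[0];
    }

    public static void main(String[] args) {
        // One step, leaves at 100/1.1 and 110 with strike 100.
        System.out.println(rollBack(100.0, 100.0, Math.log(1.1), 1, 0.5, 0.5));
    }
}
```

The same read-before-write property is what lets the vector loop in the listing load `stepsArray` at `k` and `k + 1` and store back at `k` within one iteration.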

For more complete information about compiler optimizations, see the Optimization Notice.

Attachment	Size
Vector API writing own-vector final 9-27-17.pdf	1.03 MB
