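Getting Started with Graphics: ComputeShader

Here we take a look at the Compute Shader which, as the name suggests, is a shader used for computation.

Normally the shaders we write for the GPU serve a particular rendering effect, while general computation is handled in C# on the CPU. But the GPU's arithmetic throughput, floating-point throughput in particular, is more than ten times that of the CPU, whereas the CPU is stronger at branching logic. So for workloads with little branching and heavy arithmetic, moving the computation to the GPU can save a lot of time. AI model training, for example, runs almost entirely on GPUs; doing the convolutions for video and image recognition on a CPU brings it to its knees, while a GPU (especially one with hardware acceleration for deep learning) handles the load with ease.

Back to the topic: let's learn Unity's compute shader (CS for short). First, the official documentation: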

unity computeshader

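Because the GPU is very good at parallel computation, a CS can speed up certain computations for us.

PS: an ordinary unlit shader can actually do the same job, as follows: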

Shader "Compute/ComputeUnlitShader"
{
    Properties
    {
        _MainTex ("Texture", 2D) = "white" {}
        // The default value has to lie inside the declared range
        _BinaThreshold ("Binaryzation Threshold", Range(0,0.01)) = 0.005
    }
    SubShader
    {
        Tags { "RenderType"="Opaque" }
        LOD 100

        Pass
        {
            CGPROGRAM
            #pragma vertex vert
            #pragma fragment frag

            #include "UnityCG.cginc"

            struct appdata
            {
                float4 vertex : POSITION;
                float2 uv : TEXCOORD0;
            };

            struct v2f
            {
                float2 uv : TEXCOORD0;
                float4 vertex : SV_POSITION;
            };

            sampler2D _MainTex;
            float4 _MainTex_ST;
            float _BinaThreshold;

            v2f vert (appdata v)
            {
                v2f o;
                o.vertex = UnityObjectToClipPos(v.vertex);
                o.uv = TRANSFORM_TEX(v.uv, _MainTex);
                return o;
            }

            // Desaturate: convert the color to a gray value
            fixed4 decolor(fixed4 col)
            {
                fixed g = 0.299 * col.r + 0.587 * col.g + 0.114 * col.b;
                return fixed4(g,g,g,1);
            }

            // Edge detection: threshold the screen-space derivatives of the gray value
            fixed4 edgecolor(fixed4 gcol)
            {
                fixed x = ddx(gcol.r);
                fixed y = ddy(gcol.r);
                fixed w = (x+y)/2;
                if(w>_BinaThreshold)
                {
                    return fixed4(1,1,1,1);
                }
                return fixed4(0,0,0,1);
            }

            fixed4 frag (v2f i) : SV_Target
            {
                fixed4 col = tex2D(_MainTex, i.uv);
                fixed4 gcol = decolor(col);
                fixed4 ecol = edgecolor(gcol);
                return ecol;
            }
            ENDCG
        }
    }
}

C# code:

using UnityEngine;
using UnityEngine.UI;

public class TestComputeUnlitShader : MonoBehaviour
{
    public RawImage sourceImg;
    public RawImage destImg;
    public Material cptUnlitMat;
    public Texture2D sourceTex;
    public Texture2D destTex;

    void Start()
    {
        sourceImg.texture = sourceTex;
        // Run the fragment shader over the source texture on the GPU
        RenderTexture tempRt = new RenderTexture(sourceTex.width, sourceTex.height, 0);
        Graphics.Blit(sourceTex, tempRt, cptUnlitMat);
        // Read the result back into a CPU-side Texture2D
        destTex = RT2Tex2D(tempRt);
        destImg.texture = destTex;
    }

    private Texture2D RT2Tex2D(RenderTexture rt)
    {
        Texture2D tex = new Texture2D(rt.width, rt.height, TextureFormat.RGB24, false);
        // ReadPixels reads from the active render texture; restore the previous one afterwards
        RenderTexture previous = RenderTexture.active;
        RenderTexture.active = rt;
        tex.ReadPixels(new Rect(0, 0, rt.width, rt.height), 0, 0);
        tex.Apply();
        RenderTexture.active = previous;
        return tex;
    }
}

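We store the data in a texture2d, process it in the fragment function, and fetch the computed texture2d on the C# side through Graphics.Blit, which amounts to completing one round of data computation on the GPU.

The result is that the shader performs a binarization pass over the image.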
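To make the arithmetic of that fragment shader easier to follow, here is a minimal CPU sketch of the same idea in plain C#. It is an approximation under stated assumptions: the GPU computes ddx/ddy per 2x2 pixel quad, while this sketch uses a simple forward difference between neighboring pixels, and the names BinarizeSketch, Gray and Edges are hypothetical, not part of the original project.

```csharp
using System;

public static class BinarizeSketch
{
    // Same luminance weights as decolor() in the shader.
    public static float Gray(float r, float g, float b)
        => 0.299f * r + 0.587f * g + 0.114f * b;

    // gray[y, x] holds gray values; returns 1 where an edge is detected, else 0.
    // Forward differences stand in for ddx/ddy (an approximation, see lead-in).
    public static int[,] Edges(float[,] gray, float threshold)
    {
        int h = gray.GetLength(0), w = gray.GetLength(1);
        var result = new int[h, w];
        for (int y = 0; y < h - 1; y++)
        {
            for (int x = 0; x < w - 1; x++)
            {
                float dx = gray[y, x + 1] - gray[y, x];
                float dy = gray[y + 1, x] - gray[y, x];
                result[y, x] = (dx + dy) / 2 > threshold ? 1 : 0;
            }
        }
        return result;
    }

    public static void Main()
    {
        // A vertical step from black to white between columns 1 and 2.
        var gray = new float[,] { { 0, 0, 1 }, { 0, 0, 1 }, { 0, 0, 1 } };
        int[,] e = Edges(gray, 0.005f);
        Console.WriteLine(e[0, 1]); // 1: the step is detected
        Console.WriteLine(e[0, 0]); // 0: flat region
    }
}
```

Like the shader, the sketch thresholds the signed average of the two differences, so only transitions from dark to bright (left to right, top to bottom) register as edges.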

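Next let's try the CS. Typically the input to a computation done in a GPU shader is a texture2d and so is the output; after all, a texture is data.

First, look at the default CS code: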

// Each #kernel tells which function to compile; you can have many kernels
#pragma kernel CSMain

// Create a RenderTexture with enableRandomWrite flag and set it
// with cs.SetTexture
RWTexture2D<float4> Result;

[numthreads(8,8,1)]
void CSMain (uint3 id : SV_DispatchThreadID)
{
    // TODO: insert actual code here!

    Result[id.xy] = float4(id.x & id.y, (id.x & 15)/15.0, (id.y & 15)/15.0, 0.0);
}

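Let's go through it line by line (although the built-in English comments already explain things fairly well):

#pragma kernel CSMain (declares the entry function, similar to the Main function in C# or to #pragma vertex vert in an ordinary shader; a CS may declare several kernels)

RWTexture2D<float4> Result; (short for read/write texture2d, a texture the kernel can both read from and write to; as mentioned above, shaders carry their input and output data in texture2d textures)

[numthreads(8,8,1)] (one thread group of 8*8*1 threads, i.e. an 8-by-8 thread matrix, used to process the RGBA data of the texture2d in parallel)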

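MSDN documentation on numthreads

Dispatch(5,3,2): launches a 5*3*2 three-dimensional array of thread groups, that is, a 3D array of <3D thread arrays>

numthreads(10,8,3): gives each thread group a 10*8*3 <3D thread array>

PS: the product x*y*z of numthreads is capped by the shader model rather than by the number of stream processors on the GPU: at most 1024 threads per group in cs_5_0 (768 in cs_4_x), with z no larger than 64 in cs_5_0.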
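The Direct3D documentation for numthreads gives fixed per-group limits for cs_5_0: at most 1024 threads in total and a z dimension of at most 64. A small sketch in plain C# of checking a group size against those limits; the class and method names are hypothetical, and other graphics APIs or mobile GPUs may impose tighter limits than the ones assumed here.

```csharp
using System;

public static class NumThreadsCheck
{
    // Limits assumed from the Direct3D cs_5_0 profile.
    public const int MaxThreadsPerGroup = 1024;
    public const int MaxZ = 64;

    public static bool IsValidCs50(int x, int y, int z)
    {
        if (x < 1 || y < 1 || z < 1 || z > MaxZ) return false;
        // long avoids overflow for absurdly large inputs.
        return (long)x * y * z <= MaxThreadsPerGroup;
    }

    public static void Main()
    {
        Console.WriteLine(IsValidCs50(10, 8, 3));  // True  (240 threads)
        Console.WriteLine(IsValidCs50(16, 32, 2)); // True  (exactly 1024 threads)
        Console.WriteLine(IsValidCs50(32, 32, 2)); // False (2048 threads)
    }
}
```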

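uint3 id : SV_DispatchThreadID (a semantic that binds the ID of the current thread. The ID holds three uint values x, y, z, which give the thread's 3D index within <the whole array of thread arrays>. Since a huge number of threads run in parallel, this ID lets each thread determine exactly which piece of data it processes and where its result goes)

The 3D index SV_DispatchThreadID is computed as: SV_DispatchThreadID = SV_GroupID * (numthreads.x, numthreads.y, numthreads.z) + SV_GroupThreadID, where SV_GroupID is the index of the thread group within the Dispatch(x,y,z) grid, SV_GroupThreadID is the index of the thread within its group, and the multiplication is component-wise.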
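The SV_DispatchThreadID computation can be reproduced on the CPU with a few lines of plain C#. This is only an illustration of the arithmetic, using numthreads(10,8,3) as in the example above; the class and method names are hypothetical.

```csharp
using System;

public static class ThreadIdDemo
{
    // groupId plays the role of SV_GroupID, local the role of SV_GroupThreadID.
    public static (int x, int y, int z) DispatchThreadId(
        (int x, int y, int z) groupId,
        (int x, int y, int z) numThreads,
        (int x, int y, int z) local)
    {
        return (groupId.x * numThreads.x + local.x,
                groupId.y * numThreads.y + local.y,
                groupId.z * numThreads.z + local.z);
    }

    public static void Main()
    {
        var numThreads = (x: 10, y: 8, z: 3);
        // Thread (7,5,0) inside group (2,1,0).
        var id = DispatchThreadId((2, 1, 0), numThreads, (7, 5, 0));
        Console.WriteLine(id); // (27, 13, 0)
    }
}
```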

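Result[id.xy] = float4(id.x & id.y, (id.x & 15)/15.0, (id.y & 15)/15.0, 0.0); (id.xy is the x/y index into the texture2d's 2D color matrix handled by the current thread, which gives us the exact pixel coordinate. There is one precondition: the texture2d's width and height must correspond one-to-one with the thread counts Dispatch(x,y)*numthreads(x,y); put simply, one thread handles one pixel)

Note: I keep using the phrase <array of arrays> to emphasize how parallel the GPU is. numthreads defines a <3D thread array> (or, flattened, a 1D array of length x*y*z), and the C# call Dispatch(x,y,z) in turn defines a 3D array of such <3D thread arrays>. With the numbers above, the texture2d data is "evenly split" among 5*3*2=30 thread groups, each containing 10*8*3=240 threads, for 7200 threads in total.

That may not feel very intuitive yet, so let's write a demo: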

#pragma kernel CSMain

RWTexture2D<float4> xResult;
RWTexture2D<float4> yResult;
RWTexture2D<float4> zResult;

[numthreads(4,8,2)]
void CSMain (uint3 id : SV_DispatchThreadID)
{
    // Convert to float for the arithmetic
    float x = id.x;
    float y = id.y;
    float z = id.z;
    // Derive a gray value from each component of the id
    float xcol = x/512;
    float ycol = y/512;
    float zcol = z/2;
    // Write the gray values to the three output textures
    xResult[id.xy] = float4(xcol,xcol,xcol,1);
    yResult[id.xy] = float4(ycol,ycol,ycol,1);
    zResult[id.xy] = float4(zcol,zcol,zcol,1);
}

C# code:

using UnityEngine;
using UnityEngine.UI;

public class DemoCSCall : MonoBehaviour
{
    public int texWidth = 512;
    public int texHeight = 512;
    public RawImage imgx;
    public RawImage imgy;
    public RawImage imgz;
    public ComputeShader theCS;

    private RenderTexture xTex;
    private RenderTexture yTex;
    private RenderTexture zTex;

    void Start()
    {
    }

    private void Update()
    {
        if (Input.GetKeyDown(KeyCode.R))
        {
            // Create the x, y, z display textures
            xTex = new RenderTexture(texWidth, texHeight, 0, RenderTextureFormat.ARGB32);
            xTex.enableRandomWrite = true;
            xTex.Create();
            yTex = new RenderTexture(texWidth, texHeight, 0, RenderTextureFormat.ARGB32);
            yTex.enableRandomWrite = true;
            yTex.Create();
            zTex = new RenderTexture(texWidth, texHeight, 0, RenderTextureFormat.ARGB32);
            zTex.enableRandomWrite = true;
            zTex.Create();
            // Look up the kernel id
            int kl = theCS.FindKernel("CSMain");
            // Bind the x, y, z textures as the kernel's read/write parameters
            theCS.SetTexture(kl, "xResult", xTex);
            theCS.SetTexture(kl, "yResult", yTex);
            theCS.SetTexture(kl, "zResult", zTex);
            // Set the number of thread groups and run the CS.
            // The kernel uses numthreads(4,8,2), so the group counts are
            // width/4, height/8 and 1, which covers the whole texture.
            theCS.Dispatch(kl, texWidth / 4, texHeight / 8, 1);
            // Show the three textures
            imgx.texture = xTex;
            imgy.texture = yTex;
            imgz.texture = zTex;
        }
    }
}

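Running it and looking at zTex together with the code above shows that CS threads do not execute in a fixed order. With numthreads(4,8,2) and a z group count of 1, id.z is either 0 or 1, so two threads write to the same zResult[id.xy]; which of the two writes lands last is not defined, and so the black and gray blocks in zTex keep changing from run to run.

Changing the arguments of Dispatch also produces some visible effects: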
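With numthreads(4,8,2) and a z group count of 1, id.z takes the values 0 and 1, so every id.xy is shared by two threads. The following plain C# sketch simply enumerates the thread IDs of one group and counts how many of them target each pixel; it simulates the indexing only, not the GPU scheduling, and the class and method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;

public static class WriteCollisionDemo
{
    // Counts how many threads of one nx*ny*nz group write to each (x, y) pixel
    // when the kernel indexes its output with id.xy.
    public static Dictionary<(int x, int y), int> CountWrites(int nx, int ny, int nz)
    {
        var writes = new Dictionary<(int x, int y), int>();
        for (int z = 0; z < nz; z++)
            for (int y = 0; y < ny; y++)
                for (int x = 0; x < nx; x++)
                {
                    writes.TryGetValue((x, y), out int n); // n is 0 if the key is missing
                    writes[(x, y)] = n + 1;
                }
        return writes;
    }

    public static void Main()
    {
        var writes = CountWrites(4, 8, 2);
        Console.WriteLine(writes.Count);   // 32 distinct pixels per group
        Console.WriteLine(writes[(0, 0)]); // 2 threads write pixel (0,0)
    }
}
```

Since the two threads write different values (z/2 is 0 or 0.5), whichever write happens last decides the pixel, which is consistent with the flickering blocks in zTex.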

theCS.Dispatch(kl, texWidth / 8, texHeight / 8, 1);

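This halves the number of thread groups along the texture's x axis. With 4 threads per group in x, the 64 groups cover only 256 of the 512 columns, so the CS processes just "half" of the texture data. If we go the other way and dispatch more groups than needed, nothing visibly changes: the extra threads get ids that fall outside the texture (on Direct3D 11 such out-of-range writes are discarded), so they only waste some hardware resources.

A CS can also process plain data such as arrays. That makes sense: a CS is a shader built for computation, and if all data had to be packed into images, just preparing the inputs would be costly. A CS handling ordinary data looks like this: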
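A common way to make sure the dispatched groups cover the whole texture, including sizes that are not a multiple of the group size, is to round the group count up. A minimal sketch in plain C#; DispatchMath and GroupCount are hypothetical helper names, not Unity API.

```csharp
using System;

public static class DispatchMath
{
    // Smallest number of groups such that groups * threadsPerGroup >= size.
    public static int GroupCount(int size, int threadsPerGroup)
    {
        if (size <= 0 || threadsPerGroup <= 0)
            throw new ArgumentOutOfRangeException();
        return (size + threadsPerGroup - 1) / threadsPerGroup;
    }

    public static void Main()
    {
        Console.WriteLine(GroupCount(512, 4)); // 128
        Console.WriteLine(GroupCount(512, 8)); // 64
        Console.WriteLine(GroupCount(500, 8)); // 63 (the last group is only partly used)
    }
}
```

In the demo this would read theCS.Dispatch(kl, DispatchMath.GroupCount(texWidth, 4), DispatchMath.GroupCount(texHeight, 8), 1). When the size is not an exact multiple, it is safer for the kernel to return early for ids beyond the texture bounds, which assumes the width and height are passed to the shader as extra parameters.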

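#pragma kernel CSMain

RWStructuredBuffer<float2> Float2s;
RWTexture2D<float4> Result;
int Width;
int Height;

// Distance from point p to point c
float getDistance(float2 p,float2 c)
{
    float2 d = p-c;
    return sqrt(d.x*d.x+d.y*d.y);
}

// Largest possible distance: from the center c to a corner
float getMaxDistance(float2 c)
{
    return sqrt(c.x*c.x+c.y*c.y);
}

// Gray value proportional to the distance from the texture center
float4 getTexRGBA(float2 p)
{
    float2 center = float2(Width/2,Height/2);
    float dist = getDistance(p,center);
    float mdist = getMaxDistance(center);
    float4 col = float4(dist/mdist,dist/mdist,dist/mdist,1);
    return col;
}

// z = 2 with a z group count of 1 means each pixel is computed twice with the same value
[numthreads(16,32,2)]
void CSMain (uint3 id : SV_DispatchThreadID)
{
    // Row-major index into the flat buffer
    int index = id.y*Width+id.x;
    float2 p = Float2s[index];
    Result[id.xy] = getTexRGBA(p);
}

C# caller: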

using UnityEngine;
using UnityEngine.UI;

public class BufferCSCall : MonoBehaviour
{
    public int texWidth = 8192;
    public int texHeight = 8192;
    public ComputeShader theCS;
    public RawImage img;

    private ComputeBuffer csBuffer;
    private Vector2[] csFloats;
    private RenderTexture csTex;

    void Start()
    {
        // Suppose we have a 2D matrix of Vector2 values, stored row by row
        int bufferlen = texWidth * texHeight;
        csFloats = new Vector2[bufferlen];
        for (int y = 0; y < texHeight; y++)
        {
            for (int x = 0; x < texWidth; x++)
            {
                // Row-major, matching id.y*Width+id.x in the kernel
                int index = y * texWidth + x;
                csFloats[index] = new Vector2(x, y);
            }
        }
#if UNITY_EDITOR
        Debug.LogFormat("cs start time = {0}", Time.realtimeSinceStartup);
#endif
        // Initialize the output texture
        csTex = new RenderTexture(texWidth, texHeight, 0, RenderTextureFormat.ARGB32);
        csTex.enableRandomWrite = true;
        csTex.Create();
        // Draw the image:
        // pass the data to the GPU through the Set functions
        int kl = theCS.FindKernel("CSMain");
        // The stride is the size of one element: a Vector2 is two floats, i.e. 8 bytes
        csBuffer = new ComputeBuffer(bufferlen, sizeof(float) * 2);
        csBuffer.SetData(csFloats);
        theCS.SetBuffer(kl, "Float2s", csBuffer);
        theCS.SetTexture(kl, "Result", csTex);
        theCS.SetInt("Width", texWidth);
        theCS.SetInt("Height", texHeight);
        theCS.Dispatch(kl, texWidth / 16, texHeight / 32, 1);
        img.texture = csTex;
#if UNITY_EDITOR
        Debug.LogFormat("cs stop time = {0}", Time.realtimeSinceStartup);
#endif
    }

    void OnDestroy()
    {
        // Free the GPU buffer when the component goes away
        if (csBuffer != null)
        {
            csBuffer.Release();
        }
    }
}
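Two details of the buffer setup are easy to get wrong: the stride passed to ComputeBuffer should equal the size of one element, and the CPU-side index has to use the same row-major formula as the kernel (id.y*Width+id.x). A minimal sketch in plain C# that checks both; Float2 is a hypothetical stand-in for UnityEngine.Vector2 (two 32-bit floats), and BufferLayout is a hypothetical helper.

```csharp
using System;
using System.Runtime.InteropServices;

// Stand-in for UnityEngine.Vector2 / HLSL float2: two 32-bit floats.
[StructLayout(LayoutKind.Sequential)]
public struct Float2
{
    public float x;
    public float y;
}

public static class BufferLayout
{
    // Size in bytes of one buffer element.
    public static int Stride => Marshal.SizeOf<Float2>();

    // Row-major flat index, the same formula the kernel uses.
    public static int Index(int x, int y, int width) => y * width + x;

    public static void Main()
    {
        Console.WriteLine(Stride);            // 8
        Console.WriteLine(Index(3, 2, 8192)); // 16387
    }
}
```

For a square texture and a distance-to-center image, swapping x and y in the index produces the same picture because the result is symmetric, which makes a mismatch easy to miss; with non-square data it would scramble the rows.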

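The log shows that generating an 8K image, interpolated by each pixel's distance from the center, took about 0.4 s between the two timestamps. Now let's try the same thing on the CPU: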

using UnityEngine;
using UnityEngine.UI;

public class K8TexGenerator : MonoBehaviour
{
    public RawImage img;
    public int texWidth = 8192;
    public int texHeight = 8192;

    void Start()
    {
#if UNITY_EDITOR
        Debug.LogFormat("cpu start time = {0}", Time.realtimeSinceStartup);
#endif
        Texture2D tex = new Texture2D(texWidth, texHeight, TextureFormat.ARGB32, false);
        for (int x = 0; x < texWidth; x++)
        {
            for (int y = 0; y < texHeight; y++)
            {
                Vector2 p = new Vector2(x, y);
                tex.SetPixel(x, y, getTexRGBA(p));
            }
        }
        tex.Apply();
        img.texture = tex;
#if UNITY_EDITOR
        Debug.LogFormat("cpu stop time = {0}", Time.realtimeSinceStartup);
#endif
    }

    float GetDistance(Vector2 p, Vector2 c)
    {
        Vector2 d = p - c;
        return Mathf.Sqrt(d.x * d.x + d.y * d.y);
    }

    float GetMaxDistance(Vector2 c)
    {
        return Mathf.Sqrt(c.x * c.x + c.y * c.y);
    }

    Color getTexRGBA(Vector2 p)
    {
        Vector2 center = new Vector2(texWidth / 2, texHeight / 2);
        float dist = GetDistance(p, center);
        float mdist = GetMaxDistance(center);
        Color col = new Color(dist / mdist, dist / mdist, dist / mdist, 1);
        return col;
    }
}

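The CPU version needed a full 22.5 s to generate an image with the same algorithm; the gap is hard to imagine. So the CS really is powerful when it comes to data processing.

That's it for the CS for now. When I run into a place to use one later, I'll come back with a few more usage examples.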
