GPU Command Buffer

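Original link:
http://www.chromium.org/developers/design-documents/gpu-command-buffer
These are mostly just notes on the GPU command buffer.
The GPU Command Buffer system is the way Chrome talks to the GPU, through either OpenGL or OpenGL ES (or OpenGL ES emulated through ANGLE).
It exposes an API that emulates the OpenGL ES 2.0 API, and that emulated API hides the incompatibilities between drivers and platforms.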
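Goals:
The first goal of the command buffer system is security. The graphics systems in operating systems have security holes.
Two simple examples:
You can allocate a texture or a buffer, and the memory handed back is often left over from another application;
it may contain passwords, images, or other data that the calling program should not be able to see.
Similarly, many APIs are buggy or poorly designed, and calling them inappropriately can easily crash the browser.
The first goal of the GPU process is to prevent these problems.
The second goal of the command buffer system is cross-system compatibility.
From the client's point of view, behavior should not differ at all from one system to another.
In some cases this means enforcing restrictions that do not exist on the actual system,
for example disallowing some advanced GLSL features, or working around bugs by rewriting shaders or with other techniques.
The third goal of the command buffer system is speed, and speed is the reason a command buffer implementation was chosen.
The client can write commands very quickly with little or no communication with the service,
and only later tells the service that it has written more commands.
An alternative implementation could use a separate IPC for each OpenGL ES 2.0 function, but that would be quite slow.
Another reason the command buffer is fast is that the calls into the OS graphics API effectively run in parallel with the client.
Calls such as glUniform and glDrawArrays are normally expensive, but with the command buffer
the client is done as soon as it has written a few bytes into the command buffer. The GPU process, running as a separate process, makes the
actual OpenGL calls concurrently.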
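Implementation:
The basic implementation is a "command buffer". The client (the renderer process or a Pepper plugin) writes commands into some shared memory.
The client updates a 'put' pointer and tells the GPU process over IPC how far it has written into the buffer.
The GPU process, or service, reads commands from the buffer and validates each one: the command itself, its arguments, and whether the arguments are appropriate for the current state of the OS graphics API.
Only after validation does it actually call into the OS.
This means that even a compromised renderer process running native code and writing its own commands cannot get the GPU process to call the graphics system in a way that
endangers the system. When writing new service-side code, remember never to design a command that requires the client to be well behaved.
Assume the client is hostile; for example, make sure that nothing the client does can corrupt the service's bookkeeping.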
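The put/get protocol and the "validate before you call" rule can be sketched in a few lines. This is a minimal illustration, not Chromium's code: the `Ring` struct, the two-entry command format, the command ids, and the argument limit of 100 are all assumptions made up for the example, and a plain vector stands in for shared memory.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Made-up command ids for this sketch; they are not Chromium's real ids.
enum CommandId : uint32_t { kNoop = 0, kSetValue = 1, kNumCommands = 2 };

// A ring of 32-bit entries standing in for the shared memory region.
struct Ring {
  explicit Ring(uint32_t n) : entries(n, 0) { assert(n >= 2); }
  std::vector<uint32_t> entries;
  uint32_t put = 0;  // advanced by the client
  uint32_t get = 0;  // advanced by the service
};

// Client side: append one entry without talking to the service.
// Returns false when the ring is full (one slot is always kept free).
bool ClientWrite(Ring& ring, uint32_t value) {
  const uint32_t size = static_cast<uint32_t>(ring.entries.size());
  const uint32_t next = (ring.put + 1) % size;
  if (next == ring.get) return false;
  ring.entries[ring.put] = value;
  ring.put = next;
  return true;
}

// Service side: consume commands up to the put offset the client reported.
// Every command is two entries here: [id][argument]. Nothing is trusted:
// the offset, the id and the argument are all checked before use.
// Returns the number of commands executed, or -1 on the first invalid input.
int ServiceProcess(Ring& ring, uint32_t reported_put, uint32_t* state) {
  const uint32_t size = static_cast<uint32_t>(ring.entries.size());
  if (reported_put >= size) return -1;  // bogus offset from the client
  int executed = 0;
  while (ring.get != reported_put) {
    const uint32_t id = ring.entries[ring.get];
    const uint32_t arg_index = (ring.get + 1) % size;
    if (id >= kNumCommands) return -1;         // unknown command
    if (arg_index == reported_put) return -1;  // truncated command
    const uint32_t arg = ring.entries[arg_index];
    if (id == kSetValue) {
      if (arg > 100) return -1;  // argument out of the allowed range
      *state = arg;              // the "real" call happens only now
    }
    ring.get = (arg_index + 1) % size;
    ++executed;
  }
  return executed;
}
```

The client can call `ClientWrite` many times and report `ring.put` once, which is where the batching win over one IPC per GL call comes from.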
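API Layers:
The life of a GL call in Chrome
In simple terms:
gl2.h->gles2_c_lib.cc->GLES2Implementation->GLES2CmdHelper...SharedMemory...->GLES2DecoderImpl->
ui/gfx/gl/gl_bindings->OpenGL
The CommandBuffer interface coordinates the interaction between GLES2CmdHelper and GLES2DecoderImpl.
CommandBuffer has methods to create and destroy shared memory and methods to communicate the current state.
In particular, the client sends the latest 'put' pointer through AsyncFlush() or Flush() and gets the latest 'get' pointer back from the result of Flush().
CommandBufferService is one implementation of CommandBuffer, and it talks directly to GLES2DecoderImpl.
In a single-threaded, single-process Chrome you pass an instance of CommandBufferService to GLES2CmdHelper
and the system just works. In real multi-process Chrome, CommandBufferProxy uses IPC to go through GpuCommandBufferStub
to GpuScheduler and then to CommandBufferService, completing the conversation from the client side to the service side.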
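The value of putting an interface between the helper and the decoder is that the client code does not care whether the other end is in-process or behind IPC. A rough sketch of that shape follows; `CommandBufferSketch`, `InProcessService` and `HelperSketch` are hypothetical names, and the single `Flush` method is a simplification of the real interface described above.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified interface. The real CommandBuffer also manages
// shared memory and reports state; only the put/get exchange is shown.
class CommandBufferSketch {
 public:
  virtual ~CommandBufferSketch() = default;
  // Publishes the client's put offset and returns the latest get offset.
  virtual int32_t Flush(int32_t put_offset) = 0;
};

// Stand-in for the in-process case: the "service" consumes everything
// synchronously, so get catches up with put on every flush.
class InProcessService : public CommandBufferSketch {
 public:
  int32_t Flush(int32_t put_offset) override {
    assert(put_offset >= 0);
    get_offset_ = put_offset;  // pretend the decoder ran to completion
    ++flush_count_;
    return get_offset_;
  }
  int flush_count() const { return flush_count_; }

 private:
  int32_t get_offset_ = 0;
  int flush_count_ = 0;
};

// Client-side helper that depends only on the interface, so an IPC-backed
// proxy could be swapped in without changing this class.
class HelperSketch {
 public:
  explicit HelperSketch(CommandBufferSketch* cb) : cb_(cb) {}
  void WriteEntries(int32_t n) { put_ += n; }  // no communication here
  void Flush() { last_get_ = cb_->Flush(put_); }
  bool Idle() const { return last_get_ == put_; }

 private:
  CommandBufferSketch* cb_;
  int32_t put_ = 0;
  int32_t last_get_ = 0;
};
```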
Client-side code:
Note: all code in src/gpu/command_buffer/client and src/gpu/command_buffer/common must compile without any other libraries,
because this code is used in Pepper plugins.
These define the public OpenGL ES 2.0 interface:
src/third_party/khronos/GLES2/gl2.h
src/third_party/khronos/GLES2/gl2ext.h
These define the C interface. They are mostly auto-generated:
src/gpu/command_buffer/client/gles2_c_lib.cc
src/gpu/command_buffer/client/gles2_c_lib_autogen.h
These are the actual client-side implementation that writes commands into the command buffer. They are mostly auto-generated:
src/gpu/command_buffer/client/gles2_implementation.cc
src/gpu/command_buffer/client/gles2_implementation_autogen.h
These are auto-generated helpers for formatting commands:
src/gpu/command_buffer/client/gles2_cmd_helper.h
src/gpu/command_buffer/client/gles2_cmd_helper_autogen.h
These define the actual command formats:
src/gpu/command_buffer/common/cmd_buffer_common.h
src/gpu/command_buffer/common/gles2_cmd_format.h
src/gpu/command_buffer/common/gles2_cmd_format_autogen.h
Service-side code:
This code reads the commands, validates them, and calls OpenGL:
src/gpu/command_buffer/service/gles2_cmd_decoder.cc
src/gpu/command_buffer/service/gles2_cmd_decoder_autogen.cc
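Three ways to transfer data
There are three ways to transfer data through the command buffer.
1) In the command
Commands can be fixed length (for example, glUniform4f takes a fixed location and 4 floats) or variable length (glUniform4fv takes N sets of 4 floats).
The data is placed in the command buffer right after the command, and the command's size is updated to include the data.
Pros:
* Easy. Fire and forget.
Cons:
* Commands have a maximum size of 1meg - 1.
* A command can only be as large as the command buffer.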
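As an illustration of an "immediate" command whose data rides along in the command buffer, here is a minimal sketch. The layout (`[id][size][location][count][floats...]`), the command id value, and the reading of "1meg - 1" as a count of 32-bit entries are all assumptions for the example; the real formats live in the gles2_cmd_format files listed above.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

static_assert(sizeof(float) == sizeof(uint32_t), "floats must fit one entry");

constexpr uint32_t kUniform4fvImmediate = 7;           // made-up command id
constexpr uint64_t kMaxCommandEntries = (1u << 20) - 1;  // assumed unit: entries

// Appends one variable-length command to `buffer` and returns its size in
// 32-bit entries, or 0 if it would exceed the per-command limit or the
// remaining capacity of the command buffer.
size_t AppendUniform4fvImmediate(std::vector<uint32_t>& buffer,
                                 size_t capacity_entries,
                                 int32_t location,
                                 uint32_t count,
                                 const float* values) {
  const uint64_t total = 4 + 4ull * count;  // 4 header entries + N vec4s
  if (total > kMaxCommandEntries) return 0;
  if (buffer.size() > capacity_entries ||
      total > capacity_entries - buffer.size()) {
    return 0;
  }
  const size_t start = buffer.size();
  buffer.resize(start + static_cast<size_t>(total));
  buffer[start + 0] = kUniform4fvImmediate;
  buffer[start + 1] = static_cast<uint32_t>(total);  // size includes the data
  buffer[start + 2] = static_cast<uint32_t>(location);
  buffer[start + 3] = count;
  if (count > 0) {
    std::memcpy(buffer.data() + start + 4, values,
                sizeof(float) * 4 * static_cast<size_t>(count));
  }
  return static_cast<size_t>(total);
}
```

Once the function returns, the caller's `values` array can be reused immediately, which is the "fire and forget" property; the price is that the whole payload has to fit in the command buffer.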
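2) In shared memory
Some commands pass their data through shared memory.
TexImage2D is an example. The client puts the data into shared memory, and the command itself carries only a shared memory id, an offset into that shared memory,
and an explicit or implicit size. For TexImage2D the size is implicit.
Pros:
* Can pass data of any size.
* Shared memory can be allocated ahead of time and filled in at any point (glMapTexSubImage2D for example).
Cons:
* The client has to check with the service to find out when the shared memory contents have actually been used.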
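On the service side, a command that carries a shared memory id, an offset and a size has to be checked before the data is touched. The following is a minimal sketch of such a check; `ShmRegistry` and `GetValidatedRange` are hypothetical names, and byte vectors stand in for mapped shared memory.

```cpp
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical registry mapping shared memory ids to mapped regions.
using ShmRegistry = std::unordered_map<int32_t, std::vector<uint8_t>>;

// Returns a pointer to [offset, offset + size) inside the region named by
// shm_id, or nullptr if the id is unknown or the range does not fit.
const uint8_t* GetValidatedRange(const ShmRegistry& registry,
                                 int32_t shm_id,
                                 uint32_t offset,
                                 uint32_t size) {
  const auto it = registry.find(shm_id);
  if (it == registry.end()) return nullptr;
  const std::vector<uint8_t>& region = it->second;
  // Compare against the remaining space instead of computing offset + size,
  // which a hostile client could make overflow.
  if (offset > region.size()) return nullptr;
  if (size > region.size() - offset) return nullptr;
  return region.data() + offset;
}
```

When the size is implicit, as the text says it is for TexImage2D, the service would first have to derive `size` from the image parameters and only then run a check like this one.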
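3) In buckets
Buckets are an abstraction over the shared memory of method 2.
You define a bucket by giving it a size (1 command), then pass data into the bucket through shared memory (n commands),
and finally, when you actually want to issue the command (ShaderSource, CompressedTexImage2D, ...), you send it with a reference to the bucket.
Buckets try to solve the following problem.
Imagine you are implementing TexImage2D. You have only 1M of shared memory but need to pass a 3M texture.
With only 1M of shared memory you cannot call TexImage2D on 3M of data.
So you first call TexImage2D without any source data and then call TexSubImage2D 3 times to send the data;
this is what GLES2Implementation does.
Now imagine you are implementing ShaderSource. You have 1M of shared memory but need to pass a 3M string.
There is no ShaderSubSource function, so the previous solution does not work.
Instead, create a 3M bucket, send the data into the bucket 1M at a time, then send the ShaderSource command with a reference to the bucket.
Pros:
* No need to implement "SubData" commands.
* Can handle data larger than the shared memory.
Cons:
* Slow. The data has to be copied from shared memory into the bucket.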
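The bucket flow (set the size, fill it in chunks through a small transfer buffer, then reference it) can be sketched as follows. The `Bucket` class and `UploadThroughBucket` are hypothetical stand-ins written for this example; each `SetData` call corresponds to one command, and a local vector plays the role of the shared memory.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical service-side bucket: a sized byte array filled piecewise.
class Bucket {
 public:
  void SetSize(size_t size) { data_.assign(size, 0); }

  // Copies one chunk into the bucket; rejects ranges outside of it.
  bool SetData(const void* src, size_t offset, size_t size) {
    if (offset > data_.size() || size > data_.size() - offset) return false;
    if (size > 0) std::memcpy(data_.data() + offset, src, size);
    return true;
  }

  std::string AsString() const {
    return std::string(data_.begin(), data_.end());
  }

 private:
  std::vector<char> data_;
};

// Client-side loop: stages `source` through a transfer buffer of `shm_size`
// bytes, one chunk per command. Returns the number of chunk commands sent,
// or -1 on error. The extra copy per chunk is the cost named in the text.
int UploadThroughBucket(Bucket& bucket, const std::string& source,
                        size_t shm_size) {
  if (shm_size == 0) return -1;
  std::vector<char> shm(shm_size);  // stands in for shared memory
  bucket.SetSize(source.size());
  int commands = 0;
  for (size_t offset = 0; offset < source.size(); offset += shm_size) {
    const size_t n = std::min(shm_size, source.size() - offset);
    std::memcpy(shm.data(), source.data() + offset, n);
    if (!bucket.SetData(shm.data(), offset, n)) return -1;
    ++commands;
  }
  return commands;
}
```

After the loop, a command such as ShaderSource would only need to name the bucket; the service reads the full string from it.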
Adding a command:
Things to take care of when adding a new command:
* Add your function to src/gpu/command_buffer/cmd_buffer_functions.txt.
    Note: look at the existing examples. For new enums use GLenumTypeOfEnum; for GLint use GLintptr or GLsizei.
    If negative values are not allowed, use GLxxxNotNegative. For resources use GLidTypeOfResource.
* Add your function to _FUNCTION_INFO in src/gpu/command_buffer/build_gles2_cmd_buffer.py.
    Copy the entry of a function similar to yours.
* Run build_gles2_cmd_buffer.py from the command_buffer directory.
    Note: we don't currently run this as part of the build, as this lets us easily see the changes during code review.
* If you want your function to be callable like an OpenGL function, add it to src/third_party/gl2ext.h.
    If you only need to call it from WebGraphicsContext3DCommandBufferImpl, you can skip this step.
* If your function adds new GL enums to other functions (typically glGetInteger, glTexImage2D, etc.),
    add them in src/gpu/command_buffer/service/function_info.cc.
    In particular, add them only when your function is available and has been requested. See the code in function_info.cc.
* Add them to src/gpu/command_buffer/common/gles2_cmd_utils.cc:
    for glGetInteger enums, add them to GLES2Util::GLGetNumValuesReturned;
    for texture and buffer formats, see ElementsPerGroup and BytesPerElement;
    for texture and renderbuffer formats, see GetChannelsForFormat.
* See these CLs as examples:
    http://codereview.chromium.org/8772033/

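Texture issues
To prevent user programs from reading uninitialized VRAM, all textures must be cleared before they are used.
To keep programs that upload many textures fast, the clear is done lazily.
If you call glTexImage(..., null), the command buffer creates the texture level and marks it as uncleared.
Before any read from or write to that texture, the command buffer clears it.
If you add a function that updates textures, you need to call TextureManager::ClearTextureLevel to clear any uncleared level.
See GLES2DecoderImpl::DoTexImage2D, GLES2DecoderImpl::DoCopyTexSubImage2D, etc.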
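The lazy-clear bookkeeping can be pictured with a small tracker. This is only a sketch under stated assumptions: `LevelTracker` is a hypothetical class, levels are plain byte vectors, and the 0xCD fill simulates whatever stale data freshly allocated video memory might hold. The real logic lives in TextureManager.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical tracker for the cleared/uncleared state of texture levels.
class LevelTracker {
 public:
  // Models glTexImage2D: with data the level is fully defined; without data
  // (the null case) storage is allocated but the clear is deferred.
  void DefineLevel(int level, size_t bytes, bool has_data) {
    Level& l = levels_[level];
    l.bytes.assign(bytes, has_data ? 0xFF : 0xCD);  // 0xCD = simulated junk
    l.cleared = has_data;
  }

  // Must run before any access to the level. Returns true if a clear
  // was actually performed.
  bool ClearIfNeeded(int level) {
    const auto it = levels_.find(level);
    if (it == levels_.end() || it->second.cleared) return false;
    std::fill(it->second.bytes.begin(), it->second.bytes.end(), 0);
    it->second.cleared = true;
    ++clears_;
    return true;
  }

  // Reads one byte, clearing first if needed; returns 0 when out of range.
  uint8_t Read(int level, size_t i) {
    ClearIfNeeded(level);
    const auto it = levels_.find(level);
    if (it == levels_.end() || i >= it->second.bytes.size()) return 0;
    return it->second.bytes[i];
  }

  int clears() const { return clears_; }

 private:
  struct Level {
    std::vector<uint8_t> bytes;
    bool cleared = false;
  };
  std::map<int, Level> levels_;
  int clears_ = 0;
};
```

A program that defines many levels and then uploads real data to all of them never pays for a clear; only levels that are touched while still uncleared do.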
OpenGL issues to watch out for
    There are N * M texture binding points.
    OpenGL ES 2.0 inherits from OpenGL 1.0, written in 1993, which back then probably allowed only 1 texture. There are 2 binding points,
    GL_TEXTURE_2D and GL_TEXTURE_CUBE_MAP.
    But glActiveTexture selects the active texture unit, and each unit has its own set of these binding points. To give an example:
        GLuint textures[2];
        glGenTextures(2, textures);
        GLuint texture0 = textures[0];
        GLuint texture1 = textures[1];
        glBindTexture(GL_TEXTURE_2D, texture0);  // bound on unit 0, the default
        glActiveTexture(GL_TEXTURE1);
        glBindTexture(GL_TEXTURE_2D, texture1);  // bound on unit 1
        glActiveTexture(GL_TEXTURE0);
        // This command will affect texture0, not texture1.
        glTexImage2D(GL_TEXTURE_2D, ....
    As more texture targets are added there are more binding points.
    glGenXXX only reserves a name; it does not create a resource.
    In other words:
        GLuint tex;
        glGenTextures(1, &tex);
        printf("%s\n", glIsTexture(tex) ? "true" : "false"); // prints false
        glBindTexture(GL_TEXTURE_2D, tex);
        printf("%s\n", glIsTexture(tex) ? "true" : "false"); // prints true
    This is arguably a bug in the command buffer right now, though there is code to work around it.
    You'll see we create internal tracking objects on glGenXXX, but they are marked as invalid until
    glBindXXX time (except for Queries).
    glBindXXX creates objects.
    You do not need to call glGenXXX before calling glBindXXX.
    You can make up your own ids for buffers, framebuffers, renderbuffers and textures; when you call glBindXXX, GL creates the object automatically.
    Note: Queries, Programs and Shaders are exceptions.
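    Resources are reference counted.
    This causes a lot of problems. Generally, calling glDeleteXXX does two things:
    it frees the id, so that if the id is used again a new resource is created;
    it releases references held by the state of the current context, meaning the objects currently bound in the current context, BUT NOWHERE ELSE.
    So glDeleteTextures frees the ids of those textures and clears references to them from the current context's texture bindings and from the currently bound framebuffer object.
    It does not clear references from other framebuffer objects or from other contexts.
    Programs and Shaders are far more quirky.
    Framebuffer objects cannot be shared across contexts.
    All objects are declared to be shareable, but Appendix C of the OpenGL spec says that FBOs are in fact not shareable.
    Texture id 0 is the default texture.
    In OpenGL ES 2.0 there is no such thing as turning textures off.
    Binding texture id 0 binds the default texture, and all texture commands then operate on the default texture.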
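The naming rules from this section (glGenXXX only reserves a name, glBindXXX creates the object, and a name never returned by glGenXXX still works) can be modelled with a small table. `TextureNames` is a hypothetical class written for illustration; it mimics the semantics described above and is not how the command buffer stores its tracking objects.

```cpp
#include <cstdint>
#include <unordered_set>

// Hypothetical name table mimicking GL texture name semantics.
class TextureNames {
 public:
  // Like glGenTextures: reserves an unused name but creates no object.
  uint32_t Gen() {
    while (reserved_.count(next_) != 0 || objects_.count(next_) != 0) {
      ++next_;
    }
    reserved_.insert(next_);
    return next_++;
  }

  // Like glBindTexture: creates the object on first bind for any nonzero
  // name, whether or not the name came from Gen(). Name 0 is the default
  // texture, which always exists and is not tracked here.
  void Bind(uint32_t name) {
    if (name != 0) objects_.insert(name);
    bound_ = name;
  }

  // Like glIsTexture: true only once the name has been bound.
  bool IsTexture(uint32_t name) const { return objects_.count(name) != 0; }

  uint32_t bound() const { return bound_; }

 private:
  uint32_t next_ = 1;
  uint32_t bound_ = 0;
  std::unordered_set<uint32_t> reserved_;
  std::unordered_set<uint32_t> objects_;
};
```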
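OpenGL ES 2.0 incompatibilities
Client-side arrays are the ability to keep vertex data in client memory and have OpenGL reference it directly.
The command buffer does not support this. All vertex data must be put into OpenGL buffers.
For compatibility, the client-side class GLES2Implementation emulates client-side arrays by tracking OpenGL attribute state; at draw time it copies the client-side arrays into
buffers, updates the vertex attributes, issues the draw, and then restores the vertex attribute state.
This is slow: because there is no way to know when the client has changed the data, the buffers have to be updated on every draw call.
For this reason, and because more modern versions of OpenGL require it, client-side array emulation is
compiled out for everything except Native Client.
GL_FIXED
OpenGL ES 2.0 requires support for GL_FIXED as an attribute type (a parameter to glVertexAttribPointer). Desktop GPUs do not support it.
The command buffer provides optional support for GL_FIXED.
To get GL_FIXED support you have to call glEnableFeatureCHROMIUM before any other GL call.
This makes the command buffer keep a copy of every GL buffer.
When DrawXXX is called, every attribute of type GL_FIXED is pulled out of its buffer, converted to float, and copied into a temporary buffer.
The attributes are pointed at the temporary buffer, the draw is issued, and the attributes are then reset to their original state.
Obviously this is slow and needs a lot of memory.
GL_FIXED support exists specifically to help port OpenGL ES 2.0 apps to NaCl and to pass the OpenGL ES 2.0 conformance tests.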
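The conversion step itself is simple arithmetic: a GL_FIXED value is a signed 16.16 fixed-point number, so dividing by 65536 gives the float. A minimal sketch of converting one attribute array into a temporary float buffer follows; the function names are made up for this example, and the surrounding buffer shadowing and attribute rebinding are not shown.

```cpp
#include <cstdint>
#include <vector>

// GL_FIXED is signed 16.16 fixed point: 65536 represents 1.0.
inline float FixedToFloat(int32_t fixed) {
  return static_cast<float>(fixed) / 65536.0f;
}

// Builds the temporary float buffer for one GL_FIXED attribute array,
// as would have to happen on every draw call.
std::vector<float> ConvertFixedAttribs(const std::vector<int32_t>& fixed) {
  std::vector<float> out;
  out.reserve(fixed.size());
  for (int32_t v : fixed) out.push_back(FixedToFloat(v));
  return out;
}
```

Doing this for every GL_FIXED attribute on every draw, on top of keeping a shadow copy of each buffer, is where the speed and memory cost mentioned above comes from.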
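Refactoring ideas:
Separating decoding the command buffer from emulating OpenGL ES 2.0
Currently these two things are mixed together in GLES2DecoderImpl.
There has been some discussion about separating them. The first issues that come to mind are:
#1) Validating shared memory
A command such as TexImage2D is passed a shared memory id, an offset and a size.
Before the real glTexImage2D is called, the service must verify that the id is a valid shared memory id and that the offset and size fall entirely inside that shared memory,
so that the call to glTexImage2D touches only the specified memory and not the whole shared memory region.
To do that potentially requires knowing various state that would normally not be efficiently
queryable given a separated OpenGL ES emulation.
It's possible the needed state could be easily exposed through separate functions, or else maybe by
changing TexImage2D and similar commands so that the size is an explicit 'size' instead of an implicit
'width * height * type * format'.
#2) Handling resource ids
OpenGL ES 2.0 uses ints as resource ids. The client uses one set of ids and the service uses another. The service maintains a mapping from client ids to service ids.
To avoid the client-to-service round trips that managing these ids would otherwise need, in a client context that does not share resources the resource ids are managed entirely by the client,
and the service only needs to be told about them when they are used.
The service creates the service id associated with a client id when it is needed. This works under the current design.
If the command buffer code were separated from the OpenGL ES 2.0 emulation code, a new way to manage these ids would have to be added, possibly a two-stage mapping:
client id to command buffer service id, and command buffer service id to OpenGL ES 2.0 emulation id.
There are probably ways to solve these problems.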
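The client-id to service-id mapping with on-demand creation can be sketched like this. `IdMap` is a hypothetical class; service ids come from a counter here, where real code would presumably obtain them from the driver, and the reverse lookup is included only to show what a split into two layers might need.

```cpp
#include <cstdint>
#include <unordered_map>

// Hypothetical translator between client-chosen ids and service ids.
class IdMap {
 public:
  // Returns the service id for a client id, creating it on first use so
  // the client never has to wait for a round trip to allocate an id.
  uint32_t GetOrCreate(uint32_t client_id) {
    const auto it = client_to_service_.find(client_id);
    if (it != client_to_service_.end()) return it->second;
    const uint32_t service_id = next_service_id_++;
    client_to_service_[client_id] = service_id;
    service_to_client_[service_id] = client_id;
    return service_id;
  }

  // Reverse lookup. Returns false if the service id is unknown.
  bool GetClientId(uint32_t service_id, uint32_t* client_id) const {
    const auto it = service_to_client_.find(service_id);
    if (it == service_to_client_.end()) return false;
    *client_id = it->second;
    return true;
  }

  // Drops the mapping, so a reused client id gets a fresh service id.
  void Remove(uint32_t client_id) {
    const auto it = client_to_service_.find(client_id);
    if (it == client_to_service_.end()) return;
    service_to_client_.erase(it->second);
    client_to_service_.erase(it);
  }

 private:
  uint32_t next_service_id_ = 1;
  std::unordered_map<uint32_t, uint32_t> client_to_service_;
  std::unordered_map<uint32_t, uint32_t> service_to_client_;
};
```

A two-stage design would chain two such maps, one per layer, which is the extra bookkeeping the text is worried about.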
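Moving functionality from GLES2DecoderImpl into the resource managers
GLES2DecoderImpl is big, nearly 8000 lines. There has been discussion about moving some of the larger pieces of functionality into the various resource managers.
For example, moving all the functions that deal with textures (TexImage2D, TexSubImage2D, TexParameter, CopyTexImage2D,
CopyTexSubImage2D, CompressedTexImage2D, CompressedTexSubImage2D, GenTexture, DeleteTexture,
IsTexture, TexStorage2DEXT) from GLES2DecoderImpl to TextureManager.
I would be happy to do this, but it is a fair amount of work, especially adapting all the unit tests.
Still, it would be a considerably cleaner implementation.
Separating command generation from OpenGL ES
build_gles2_cmd_buffer.py has a one-to-one mapping from OpenGL ES functions to commands.
Ideally the commands in the command buffer would be separate from the OpenGL API so that it would be
easier to add any command needed and not have to expose it as an OpenGL ES extension.
Removing legacy code
Originally the command buffer commands, which wrap the OpenGL ES API, were meant to be the public API of the GPU process.
Many game consoles let you use command buffers directly for performance reasons.
Being able to manipulate command buffers directly means you can precompute command buffers and patch them on the fly as needed,
which means your code does very little work and can be much faster.
In the end it was decided not to expose the command buffer commands directly, but the code based on the original design has not been removed yet.
low-level commands
src/gpu/command_buffer/client/cmd_buffer_helper.cc, src/gpu/command_buffer/common/cmd_buffer_common.h and
src/gpu/command_buffer/service/common_decoder.cc implement the JUMP, CALL and RETURN commands.
These commands put design constraints on the rest of the system.
Unless the command buffer is a public interface, this code is not needed and can be removed.
3 types of commands
build_gles2_cmd_buffer.py generates 3 versions of many functions,
one for each of the data transfer methods discussed above.
For example, TexImage2D, TexImage2DImmediate and TexImage2DBucket are, respectively, the shared memory version of TexImage2D,
the version that puts the data directly in the command buffer, and the bucket version.
When the command buffer was a public interface, all 3 versions mattered, each with its own strengths and weaknesses.
Now only the commands actually used by GLES2Implementation are needed. The code that generates the 3 versions could probably be removed.
remove _CMD_ID_TABLE
The only purpose of _CMD_ID_TABLE in build_gles2_cmd_buffer.py is to keep command ids stable.
That mattered when the commands were a public API.
It no longer matters, so _CMD_ID_TABLE can be removed and commands can change ids at any time.
size in entries
Left over from the O3D code, the command buffer works on CommandBufferEntry units.
Each unit is 32 bits, and the sizes of commands and command data are calculated in those units.
There's a lot of superfluous math involved in converting to and from those units.
If instead the code were refactored so that the size of commands was in bytes, all of that extra math code could disappear.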
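The conversion math in question looks roughly like the following. The helper names are made up for this sketch; only the 32-bit entry size comes from the text.

```cpp
#include <cstddef>
#include <cstdint>

constexpr size_t kEntrySize = sizeof(uint32_t);  // one 32-bit entry

// Bytes to entries, rounding up so that partial entries still get a slot.
constexpr size_t BytesToEntries(size_t bytes) {
  return bytes / kEntrySize + (bytes % kEntrySize != 0 ? 1 : 0);
}

// Entries back to bytes.
constexpr size_t EntriesToBytes(size_t entries) {
  return entries * kEntrySize;
}
```

Every place that sizes a command or its payload needs a round trip through helpers like these, for example 13 bytes become 4 entries and 16 bytes again; with byte-based sizes only an alignment step would remain.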
