Parallel Rendering Overview

On this page:
  • Stage 1
  • Stage 2
  • Stage 3
  • Synchronization
  • Debugging

Before parallel rendering was an option, there were only the GameThread and the RenderThread. The GameThread would enqueue commands for the RenderThread to execute later. These commands called directly into the RHI layer, which serves as our cross-platform interface to the different graphics APIs. This means that the RenderThread would directly make calls such as Lock, DrawPrimitive, etc. on D3D11. Unfortunately, this did not parallelize very well, so the new parallel renderer was brought online in the following stages.

Stage 1

The purpose of Stage 1 was to separate the renderer into a front-end and a back-end, so that the RenderThread no longer makes direct calls through the RHI. Instead, the RenderThread generates a cross-platform command list derived from FRHICommandList. There are multiple types of command list for different purposes, e.g. FRHICommandListImmediate and FRHIAsyncComputeCommandList. When the RenderThread wants to perform a DrawPrimitive, it takes an RHICommandList object and enqueues a DrawPrimitive command. These commands are small classes with an overridden execute function; each one stores the data it needs to execute at the time it is created. For DrawPrimitive, the command stores NumPrims, NumInstances, etc., is appended to the end of the command list, and the RenderThread goes on its way.
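The record-now, execute-later pattern described above can be sketched in a few lines of standalone C++. This is only an illustration under stated assumptions: FakeBackend, CommandBase, DrawPrimitiveCommand and CommandList are hypothetical names invented for this example, not the engine's classes, and the real FRHICommandList is considerably more involved.

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for a platform API. It only logs the calls it
// receives so the example can be inspected without a graphics device.
struct FakeBackend {
    std::vector<std::string> Calls;
    void DrawPrimitive(std::uint32_t BaseVertex, std::uint32_t NumPrims, std::uint32_t NumInstances) {
        Calls.push_back("Draw(" + std::to_string(BaseVertex) + "," +
                        std::to_string(NumPrims) + "," +
                        std::to_string(NumInstances) + ")");
    }
};

// A recorded command: a small object with an overridden Execute function.
struct CommandBase {
    virtual ~CommandBase() = default;
    virtual void Execute(FakeBackend& Backend) const = 0;
};

// Stores everything it needs at creation time, so it can run later.
struct DrawPrimitiveCommand final : CommandBase {
    std::uint32_t BaseVertex;
    std::uint32_t NumPrims;
    std::uint32_t NumInstances;

    DrawPrimitiveCommand(std::uint32_t InBaseVertex, std::uint32_t InNumPrims, std::uint32_t InNumInstances)
        : BaseVertex(InBaseVertex), NumPrims(InNumPrims), NumInstances(InNumInstances) {}

    void Execute(FakeBackend& Backend) const override {
        Backend.DrawPrimitive(BaseVertex, NumPrims, NumInstances);
    }
};

// Front-end view: recording only appends a command, nothing is executed.
class CommandList {
public:
    void DrawPrimitive(std::uint32_t BaseVertex, std::uint32_t NumPrims, std::uint32_t NumInstances) {
        Commands.push_back(std::make_unique<DrawPrimitiveCommand>(BaseVertex, NumPrims, NumInstances));
    }

    std::size_t Num() const { return Commands.size(); }

    // Back-end view: run the commands in recording order, then reset the list.
    void ExecuteAll(FakeBackend& Backend) {
        for (const auto& Command : Commands) {
            Command->Execute(Backend);
        }
        Commands.clear();
    }

private:
    std::vector<std::unique_ptr<CommandBase>> Commands;
};
```

Recording two draws and then calling ExecuteAll produces the two backend calls in the order they were recorded; until ExecuteAll runs, the backend sees nothing.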

These RHI command lists are then 'translated' (i.e. executed) on a separate thread called the RHI Thread. This is where we now make calls into the actual platform-level APIs. There are still some non-command-list operations, such as Lock/Unlock, that may need to be executed immediately by the RenderThread. In these cases we either flush the RHI Thread and wait, or copy the data and queue the operation. Which approach is used depends on the platform implementation and on the operation. Submission of commands to the GPU can be controlled per platform by heuristics (submit multiple times per frame, queue all commands until the end of the frame, etc.). Finally, we added a new RHISubmitCommandsHint command to indicate to the RHI that it should submit now if possible.
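To make the "flush the RHI Thread and wait" option concrete, here is a minimal sketch of a worker thread that executes queued commands in order and exposes a blocking Flush. It assumes nothing about the engine's implementation: TranslationThread and its members are hypothetical names, and the commands are plain std::function objects rather than RHI commands.

```cpp
#include <condition_variable>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>

// Hypothetical stand-in for a dedicated translation thread.
class TranslationThread {
public:
    TranslationThread() : Worker([this] { Run(); }) {}

    ~TranslationThread() {
        {
            std::lock_guard<std::mutex> Lock(Mutex);
            bQuit = true;
        }
        WorkAvailable.notify_all();
        Worker.join();  // remaining commands are drained before the thread exits
    }

    // Called by the producer: queue a command and return immediately.
    void Enqueue(std::function<void()> Command) {
        {
            std::lock_guard<std::mutex> Lock(Mutex);
            Pending.push_back(std::move(Command));
        }
        WorkAvailable.notify_one();
    }

    // Called by the producer when it needs a result right away:
    // block until every queued command has finished executing.
    void Flush() {
        std::unique_lock<std::mutex> Lock(Mutex);
        Idle.wait(Lock, [this] { return Pending.empty() && !bBusy; });
    }

private:
    void Run() {
        std::unique_lock<std::mutex> Lock(Mutex);
        for (;;) {
            WorkAvailable.wait(Lock, [this] { return bQuit || !Pending.empty(); });
            if (Pending.empty()) {
                return;  // quit was requested and the queue is drained
            }
            std::function<void()> Command = std::move(Pending.front());
            Pending.pop_front();
            bBusy = true;
            Lock.unlock();
            Command();  // execute outside the lock so the producer can keep queuing
            Lock.lock();
            bBusy = false;
            if (Pending.empty()) {
                Idle.notify_all();
            }
        }
    }

    std::mutex Mutex;
    std::condition_variable WorkAvailable;
    std::condition_variable Idle;
    std::deque<std::function<void()>> Pending;
    bool bBusy = false;
    bool bQuit = false;
    std::thread Worker;  // declared last so the other members exist before the thread starts
};
```

The alternative mentioned above, copying the data and queuing the operation, would correspond to capturing a copy of the buffer contents in the queued command instead of calling Flush. It avoids the stall at the cost of the copy.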

Stage 2

Now that command generation is separated from command execution, we can parallelize either one independently. This means that even on DX11, where parallelization does not work very well on the backend, we can now easily generate command lists in parallel. Our parallelization is data-parallel rather than task-parallel. In other words, we break each individual pass up into chunks rather than running the different passes concurrently as long tasks. We do this in all the major passes, such as BasePass, DepthPass, VelocityPass, etc.
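As an illustration of the data-parallel split, the helper below divides the draws of a single pass into contiguous chunks of nearly equal size, one per worker. The function name and the even-split policy are assumptions made for this sketch; the engine's own chunking and load balancing are not described here and may work differently.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Splits NumDraws items into at most MaxChunks contiguous [Begin, End)
// ranges whose sizes differ by at most one. Hypothetical helper.
std::vector<std::pair<std::size_t, std::size_t>>
SplitIntoChunks(std::size_t NumDraws, std::size_t MaxChunks) {
    std::vector<std::pair<std::size_t, std::size_t>> Chunks;
    if (NumDraws == 0 || MaxChunks == 0) {
        return Chunks;
    }

    // Never create more chunks than there are draws.
    const std::size_t NumChunks = std::min(NumDraws, MaxChunks);
    const std::size_t BaseSize = NumDraws / NumChunks;
    const std::size_t Remainder = NumDraws % NumChunks;

    std::size_t Begin = 0;
    for (std::size_t Index = 0; Index < NumChunks; ++Index) {
        // The first Remainder chunks take one extra draw each.
        const std::size_t Size = BaseSize + (Index < Remainder ? 1 : 0);
        Chunks.emplace_back(Begin, Begin + Size);
        Begin += Size;
    }
    return Chunks;
}
```

For 10 draws and 4 workers this yields the ranges [0,3), [3,6), [6,8) and [8,10). An even split by count is only a starting point, since draws can differ widely in recording cost.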

The mechanism for this is the pure virtual FParallelCommandListSet class. You can find a derivation of this class for every pass that is parallelized, e.g. FBasePassParallelCommandListSet. These classes are responsible for creating an RHICommandList for each thread, submitting the results in the proper order, setting any necessary state at the start of each partial command list, etc. Load balancing is important here, so that no worker thread ends up with far less or far more work than the others. UE4 will automatically do its best to load balance properly. Special submission commands are inserted into the RHICommandList to ensure that GPU submission happens in the correct order, and that translation is finished before submitting.
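The responsibilities listed above (one list per worker, state set at the start of each partial list, results submitted in the proper order) can be sketched as follows. This is a simplified model, not FParallelCommandListSet: RecordPassInParallel is a hypothetical function, a command list is modeled as a vector of strings, and "SetPassState" stands in for whatever state a real pass would need to re-establish.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <thread>
#include <vector>

// Records one pass on NumWorkers threads, each into its own list, then
// concatenates the lists in chunk order so the final command order is
// the same as a single-threaded recording would produce.
std::vector<std::string> RecordPassInParallel(const std::vector<std::string>& Draws,
                                              std::size_t NumWorkers) {
    if (NumWorkers == 0) {
        NumWorkers = 1;
    }
    const std::size_t PerChunk = (Draws.size() + NumWorkers - 1) / NumWorkers;

    // One private list per worker: no locking is needed while recording.
    std::vector<std::vector<std::string>> PerWorker(NumWorkers);
    std::vector<std::thread> Workers;
    for (std::size_t Index = 0; Index < NumWorkers; ++Index) {
        Workers.emplace_back([&Draws, &PerWorker, Index, PerChunk] {
            const std::size_t Begin = std::min(Index * PerChunk, Draws.size());
            const std::size_t End = std::min(Begin + PerChunk, Draws.size());
            if (Begin == End) {
                return;  // nothing to record for this worker
            }
            // Each partial list starts by re-establishing the pass state,
            // because it cannot rely on what an earlier list has set.
            PerWorker[Index].push_back("SetPassState");
            for (std::size_t Draw = Begin; Draw < End; ++Draw) {
                PerWorker[Index].push_back("Draw " + Draws[Draw]);
            }
        });
    }
    for (std::thread& Worker : Workers) {
        Worker.join();
    }

    // Submission order is fixed by chunk index, not by which worker finished first.
    std::vector<std::string> Submitted;
    for (const std::vector<std::string>& List : PerWorker) {
        Submitted.insert(Submitted.end(), List.begin(), List.end());
    }
    return Submitted;
}
```

Unlike this sketch, which joins its workers before returning, the renderer described here does not block the RenderThread while the workers run.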

Once all the worker threads for generating a given pass are kicked off, the rendering thread continues on. It does not wait for the tasks to finish. Thus the renderer is generally no longer allowed to modify state shared with these workers such as the View and Projection Matrices.

Stage 3

Stage 3 brings support for backend parallelization on platforms that support it, such as consoles, DX12 and Vulkan. In this case we actually do the translation in parallel where we can. Basically, anything generated in parallel on the frontend is translated in parallel on the backend. The main interface used in translation is IRHICommandContext. There is a derived RHICommandContext for each platform and API. During translation, the RHICommandList is given an RHICommandContext to operate on. Each command's execute function calls into the RHICommandContext API. The CommandContext is responsible for state shadowing, validation, and any API-specific details necessary to perform the given operation.
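A rough picture of a command context that performs state shadowing is given below. The interface and class names are hypothetical and the single blend flag is a deliberately tiny stand-in for real pipeline state; IRHICommandContext itself has a much larger surface.

```cpp
#include <string>
#include <vector>

// Hypothetical, heavily reduced context interface that commands call into.
struct ICommandContextSketch {
    virtual ~ICommandContextSketch() = default;
    virtual void SetBlendEnabled(bool bEnabled) = 0;
    virtual void DrawPrimitive(unsigned NumPrims) = 0;
};

// One implementation per platform/API would exist; this one only logs
// the API calls it would have made.
class LoggingContext final : public ICommandContextSketch {
public:
    void SetBlendEnabled(bool bEnabled) override {
        // State shadowing: remember the last value sent to the API and
        // skip the call when the requested value is already current.
        if (bHasShadow && bShadowBlend == bEnabled) {
            return;
        }
        bHasShadow = true;
        bShadowBlend = bEnabled;
        ApiCalls.push_back(bEnabled ? "Blend on" : "Blend off");
    }

    void DrawPrimitive(unsigned NumPrims) override {
        ApiCalls.push_back("Draw " + std::to_string(NumPrims));
    }

    std::vector<std::string> ApiCalls;

private:
    bool bHasShadow = false;
    bool bShadowBlend = false;
};
```

Because each context keeps its own shadow state, several contexts can translate different command lists at the same time without sharing that state, which is one reason a partial command list has to set the state it depends on at its start.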

Synchronization

Synchronization of the renderer between the GameThread, RenderThread, RHI Thread, and the GPU is a complex topic. At the highest level, UE4 is normally configured as a single-frame-behind renderer. This means that while the GameThread is processing Frame N+1, the RenderThread may be processing Frame N, or Frame N+1 (as commands come in) if it is running faster than the GameThread. The addition of the RHI Thread complicates this slightly, in that we allow the RenderThread to move ahead of the RHI Thread by about half a frame. Specifically, the RenderThread is allowed to complete the visibility calculations for Frame N+1 before waiting for the RHI Thread to complete Frame N. Thus, for a GameThread on Frame N+1, the RenderThread may be processing commands for Frame N or Frame N+1, and the RHI Thread may likewise be translating commands from Frame N or Frame N+1, depending on execution times. These guarantees are arbitrated by FFrameEndSync and FRHICommandListImmediate::RHIThreadFence.

Another useful guarantee is that no matter how the parallelization is configured, the order of Submission of commands to the GPU is unchanged from the order the commands would have been submitted in a single-threaded renderer. This is required for correctness and must be maintained during any code refactoring.

Debugging

There are various CVars to control this behavior. Because many of these stages are orthogonal, they can be disabled independently for testing, and new platforms can be brought up in stages as time allows. For example:

Command                          Description
r.rhicmdusedeferredcontexts      Controls parallelization of the backend.
r.rhicmduseparallelalgorithms    Controls parallelization of the frontend.
r.rhithread.enable               Enables or disables the RHI Thread. When it is disabled, command lists are still generated, but they are translated directly on the RenderThread at certain points.
r.rhicmdbypass                   Can completely disable command list generation and make the renderer behave as it originally did, bypassing the command list and calling the RHI commands directly on the RenderThread. This only takes effect after you have also disabled the RHI Thread.

Original: https://docs.unrealengine.com/latest/CHN/Programming/Rendering/ParallelRendering/index.html

Reposted from: https://www.cnblogs.com/wodehao0808/p/8110541.html
