Current state of Roofline code:
- CS Roofline Toolkit is the implementation of Roofline Model Toolkit: A Practical Tool for Architectural and Program Analysis; uo-cdux/ert-mirror is a mirror of it on GitHub;
- cyanguwa/nersc-roofline is the code for Hierarchical Roofline Analysis: How to Collect Data using Performance Tools on Intel CPUs and NVIDIA GPUs, and contains GPP as well as ERT kernels written in C;
- NERSC/roofline-on-nvidia-gpus is the code for 8 Steps to 3.7 TFLOP/s on NVIDIA V100 GPU: Roofline Analysis and Other Tricks; its data collection method is improved, but it covers only GPP;
- NERSC/timemory is the code for Timemory: Modular Performance Analysis for HPC, and is more systematic and standardized.
The rest of this post introduces NERSC/roofline-on-nvidia-gpus.
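NERSC/roofline-on-nvidia-gpus demonstrates the Roofline analysis methodology on NVIDIA GPUs, in particular the V100. The repository is organized as follows.
/example-codes contains a few toy kernels (kernel_abc.cu) and a real HPC mini-application, GPP, extracted from the materials science code BerkeleyGW.
/ncu-section-files contains the default Speed of Light section file shipped with Nsight Compute in CUDA 11, plus several custom section files for hierarchical Roofline analysis that cover double-precision, single-precision, half-precision, and Tensor Core operations. These section files are meant for collecting and visualizing Roofline data automatically with Nsight Compute (ncu). run.ncu shows how to run Nsight Compute in CUDA 11, and run.gpp.ncu is a Slurm job script that runs five versions of the GPP example on Cori GPU.
/custom-scripts provides a set of job launch, post-processing, and visualization scripts for manual Roofline data collection and visualization. The aim is to make it easier for users to integrate Roofline analysis into their own workflows.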
Customized ncu-based Roofline Workflow
To integrate with users' other workflows, /custom-scripts provides a set of scripts for manual metric collection and Roofline visualization: run.gpp.customized, postprocess.py, and roofline.py.
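The run.gpp.customized script uses GPP as an example to show the list of Nsight Compute metrics required for Roofline analysis. These metrics are collected with the Nsight Compute command-line utility ncu (or nv-nsight-cu-cli) and written to .csv files in /custom-scripts.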
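postprocess.py then post-processes the results with Pandas to compute the arithmetic intensity (AI) and the FLOP/s throughput of each profiled kernel.
When processing is done, postprocess.py calls the Matplotlib-based roofline.py to draw the Roofline chart and saves the chart to a .png file.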
The data collection methodology used in these scripts is detailed below. It is a new feature of Nsight Compute in CUDA 11.
Time:
- sm__cycles_elapsed.avg / sm__cycles_elapsed.avg.per_second
FLOPs:
- DP: sm__sass_thread_inst_executed_op_dadd_pred_on.sum + 2 x sm__sass_thread_inst_executed_op_dfma_pred_on.sum + sm__sass_thread_inst_executed_op_dmul_pred_on.sum
- SP: sm__sass_thread_inst_executed_op_fadd_pred_on.sum + 2 x sm__sass_thread_inst_executed_op_ffma_pred_on.sum + sm__sass_thread_inst_executed_op_fmul_pred_on.sum
- HP: sm__sass_thread_inst_executed_op_hadd_pred_on.sum + 2 x sm__sass_thread_inst_executed_op_hfma_pred_on.sum + sm__sass_thread_inst_executed_op_hmul_pred_on.sum
- Tensor Core: 512 x sm__inst_executed_pipe_tensor.sum
Bytes:
- DRAM: dram__bytes.sum
- L2: lts__t_bytes.sum
- L1: l1tex__t_bytes.sum
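As a rough illustration of how these formulas combine, the following Python sketch derives the run time, the double-precision FLOP count, and the arithmetic intensity at each memory level from a dictionary of metric values. The helper name roofline_point and the sample numbers are made up for this example; only the metric names come from the list above.

```python
def roofline_point(m):
    """Combine raw metric values of one kernel invocation into
    time, DP FLOPs, arithmetic intensity, and FLOP/s.

    `m` maps metric names to numbers. Hypothetical helper, not part of the repo.
    """
    # Time [s] = elapsed cycles / cycles per second
    time_s = m["sm__cycles_elapsed.avg"] / m["sm__cycles_elapsed.avg.per_second"]

    # DP FLOPs = add + 2 x fma + mul (an FMA counts as two operations)
    flops = (m["sm__sass_thread_inst_executed_op_dadd_pred_on.sum"]
             + 2 * m["sm__sass_thread_inst_executed_op_dfma_pred_on.sum"]
             + m["sm__sass_thread_inst_executed_op_dmul_pred_on.sum"])

    return {
        "time_s": time_s,
        "flops": flops,
        "ai_dram": flops / m["dram__bytes.sum"],
        "ai_l2": flops / m["lts__t_bytes.sum"],
        "ai_l1": flops / m["l1tex__t_bytes.sum"],
        "flops_per_s": flops / time_s,
    }


# Made-up sample values, chosen only to keep the arithmetic easy to follow.
sample = {
    "sm__cycles_elapsed.avg": 2_000_000,
    "sm__cycles_elapsed.avg.per_second": 1_000_000_000,
    "sm__sass_thread_inst_executed_op_dadd_pred_on.sum": 1_000,
    "sm__sass_thread_inst_executed_op_dfma_pred_on.sum": 4_000,
    "sm__sass_thread_inst_executed_op_dmul_pred_on.sum": 1_000,
    "dram__bytes.sum": 2_000,
    "lts__t_bytes.sum": 5_000,
    "l1tex__t_bytes.sum": 10_000,
}
point = roofline_point(sample)
print(point)
```

The same kernel gets a different arithmetic intensity at each memory level because the FLOP count is fixed while the byte count changes, which is what makes the Roofline hierarchical.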
[Workflow diagram with the nodes run.gpp.customized and postprocess.py]
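The Environment Modules tool is commonly used to manage environment configuration on HPC clusters. It brings compilers, MPI libraries, math libraries, and application software (computational software, analysis software, and so on) under a single framework as modules, so that users can switch environment variables dynamically.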
module load cuda/11.0.2
module load pgi/19.10
Set the metrics that the Nsight Compute CLI should collect.
# Time
metrics="sm__cycles_elapsed.avg,\
sm__cycles_elapsed.avg.per_second,"

# DP
metrics+="sm__sass_thread_inst_executed_op_dadd_pred_on.sum,\
sm__sass_thread_inst_executed_op_dfma_pred_on.sum,\
sm__sass_thread_inst_executed_op_dmul_pred_on.sum,"

# SP
metrics+="sm__sass_thread_inst_executed_op_fadd_pred_on.sum,\
sm__sass_thread_inst_executed_op_ffma_pred_on.sum,\
sm__sass_thread_inst_executed_op_fmul_pred_on.sum,"

# HP
metrics+="sm__sass_thread_inst_executed_op_hadd_pred_on.sum,\
sm__sass_thread_inst_executed_op_hfma_pred_on.sum,\
sm__sass_thread_inst_executed_op_hmul_pred_on.sum,"

# Tensor Core
metrics+="sm__inst_executed_pipe_tensor.sum,"

# DRAM, L2 and L1
metrics+="dram__bytes.sum,\
lts__t_bytes.sum,\
l1tex__t_bytes.sum"
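Slurm is an open-source, fault-tolerant, highly scalable cluster management and job scheduling system for Linux clusters large and small.
srun submits a job for immediate execution or launches a job step.
Change to the GPP directory, then compile and run.
The -k option filters kernels by matching a regular expression against the kernel name.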
cd ../example-codes/GPP/

input=gpp214unformatted.dat
dir=../../custom-scripts/

# Baseline
output=output.csv
profilestr="ncu -k sigma_gpp_gpu --metrics $metrics --csv"
echo Baseline version
git checkout gpp.f90
make clean
make
srun -n1 $profilestr ./gpp.x $input > $dir/$output 2>&1
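For readers who drive the profiler from Python rather than from a shell script, the command line above could be assembled as an argument list, roughly as in the sketch below. The helper build_ncu_command is hypothetical and relies only on the options already used in the script (-k, --metrics, --csv); the metric list is shortened for readability.

```python
import shlex


def build_ncu_command(kernel, metrics, app, app_args=()):
    """Assemble the profiling command as a list of arguments.

    Hypothetical helper; it only uses the options that appear in the
    shell script (-k, --metrics, --csv).
    """
    return ["ncu", "-k", kernel, "--metrics", ",".join(metrics), "--csv",
            app, *app_args]


# Shortened metric list, for illustration only.
metrics = [
    "sm__cycles_elapsed.avg",
    "sm__cycles_elapsed.avg.per_second",
    "dram__bytes.sum",
]
cmd = build_ncu_command("sigma_gpp_gpu", metrics, "./gpp.x",
                        ["gpp214unformatted.dat"])
print(shlex.join(cmd))  # shell-quoted form, handy for logging
```

Keeping the command as a list avoids the word splitting that the shell version depends on when it expands the unquoted $profilestr. The list could then be passed to subprocess.run, prefixed with srun -n1 where Slurm is in use.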
Switch to each of the four optimized implementations and run it.
# Four optimization steps
for n in `seq 1 4`
do
    output=output$n.csv
    profilestr="ncu -k sigma_gpp_gpu --metrics $metrics --csv"
    echo Patch version: $n
    git checkout gpp.f90
    patch gpp.f90 step$n.patch
    make clean
    make
    srun -n1 $profilestr ./gpp.x $input > $dir/$output 2>&1
done
Call postprocess.py to generate the Roofline chart.
module load python/3.7-anaconda-2019.10
cd $dir
srun -n1 python postprocess.py
[Workflow diagram with the nodes postprocess.py and roofline]
files is the list of CSV files in the current directory whose names start with "output".
datadir='.'
files=[x for x in os.listdir(datadir) if x.endswith('.csv') and x.startswith('output')]
files.sort()
files=[os.path.join(datadir,file) for file in files]
Using file as a variable name is not ideal, since it shadows the built-in file type in Python 2.
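One way to sidestep the naming issue is to wrap the listing in a small function and use a more neutral loop variable. This is only a sketch: the function name list_output_csvs and the file names in the demonstration are invented, and the selection rule (names starting with "output" and ending with ".csv") is the one from the snippet above.

```python
import os
import tempfile


def list_output_csvs(datadir="."):
    """Return sorted paths of files named output*.csv inside datadir."""
    names = sorted(
        name for name in os.listdir(datadir)
        if name.startswith("output") and name.endswith(".csv")
    )
    return [os.path.join(datadir, name) for name in names]


# Demonstration in a throwaway directory with made-up file names.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("output.csv", "output2.csv", "output1.csv",
                 "notes.txt", "pd_output.csv"):
        open(os.path.join(tmp, name), "w").close()
    found = [os.path.basename(p) for p in list_output_csvs(tmp)]
    print(found)
```

Files such as pd_output.csv are left out because the name must start with "output", which keeps any exported intermediate tables from being picked up on a later run.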
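The loop first counts the lines up to and including the CSV header, which is the first line containing 'Host Name'.
pandas.read_csv then skips the cnt-1 lines that precede the header (skiprows=cnt-1).
pandas.DataFrame.groupby groups a DataFrame using a mapper or a series of columns and returns a pandas.core.groupby.DataFrameGroupBy object.
pandas.pivot_table creates a spreadsheet-style pivot table as a DataFrame.
The data is grouped by the two columns 'Kernel Name' and 'Metric Name' and summed.
pandas.DataFrame.shape returns a tuple representing the dimensions of the DataFrame.
The computed result is stored in dfs[tag].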
dfs={}
for file in files:
    tag, ext = os.path.splitext(os.path.basename(file))
    dfs[tag]=pd.DataFrame()
    with open(file,'r') as f:
        cnt=0
        while True:
            ln=f.readline()
            if not ln:
                break
            cnt+=1
            if 'Host Name' in ln:
                break
        df = pd.read_csv(file, skiprows=cnt-1)
        dft=df.groupby(['Kernel Name','Metric Name']).sum()
        dfmetric=pd.pivot_table(dft, index='Kernel Name', columns='Metric Name', values='Metric Value')
        dfmetric['Count']=df.groupby(['Kernel Name']).count()['ID'].div(dfmetric.shape[1])
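To make the header detection and the grouping step easier to follow, here is a Pandas-free sketch of the same idea using only the standard library. The input text is synthetic and has fewer columns than real profiler output, and the function name parse_profile is made up. Unlike the script, it derives Count from the number of distinct IDs per kernel, which should give the same value as dividing the row count by the number of metrics when every invocation reports every metric.

```python
import csv
from collections import defaultdict

# Synthetic text imitating profiler output: a few log lines, then a CSV table.
# The columns are a reduced, made-up subset of what a real run would contain.
raw = """==PROF== Connected to process
==PROF== Disconnected from process
"ID","Host Name","Kernel Name","Metric Name","Metric Value"
"0","node1","kern_a","dram__bytes.sum","100"
"0","node1","kern_a","lts__t_bytes.sum","400"
"1","node1","kern_a","dram__bytes.sum","300"
"1","node1","kern_a","lts__t_bytes.sum","600"
"2","node1","kern_b","dram__bytes.sum","50"
"2","node1","kern_b","lts__t_bytes.sum","70"
"""


def parse_profile(text):
    lines = text.splitlines()
    # Find the header row: the first line that mentions 'Host Name'.
    header = next(i for i, ln in enumerate(lines) if "Host Name" in ln)
    rows = list(csv.DictReader(lines[header:]))

    sums = defaultdict(lambda: defaultdict(float))  # kernel -> metric -> sum
    ids = defaultdict(set)                          # kernel -> invocation IDs
    for row in rows:
        kernel = row["Kernel Name"]
        sums[kernel][row["Metric Name"]] += float(row["Metric Value"])
        ids[kernel].add(row["ID"])

    # One entry per kernel: summed metrics plus the number of invocations.
    return {k: dict(v, Count=len(ids[k])) for k, v in sums.items()}


table = parse_profile(raw)
print(table)
```

Each kernel ends up with one row of summed metrics, which mirrors the pivot table that the script builds with one row per kernel name and one column per metric.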
$$\mathrm{time} = \frac{\mathrm{cycles}}{\mathrm{rate}}$$
        dfmetric['Time']=dfmetric['sm__cycles_elapsed.avg'] \
                        / (dfmetric['sm__cycles_elapsed.avg.per_second'] /dfmetric['Count'] )
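The division by Count deserves a short check. Because the metrics were summed over all invocations of a kernel, the summed per_second value is roughly N times the clock rate for N invocations, so it has to be averaged before it is used as a rate. The sketch below verifies this with made-up numbers, under the assumption that the clock rate is the same for every launch.

```python
# Made-up per-invocation measurements for one kernel launched three times.
cycles_per_launch = [1_000_000, 1_200_000, 800_000]  # sm__cycles_elapsed.avg
clock_hz = 1_250_000_000                 # sm__cycles_elapsed.avg.per_second

# What the grouped sum() yields for this kernel.
cycles_sum = sum(cycles_per_launch)
rate_sum = clock_hz * len(cycles_per_launch)
count = len(cycles_per_launch)

# Formula from the script: summed cycles over the *average* rate.
time_script = cycles_sum / (rate_sum / count)

# Reference: add up the duration of each launch separately.
time_reference = sum(c / clock_hz for c in cycles_per_launch)

print(time_script, time_reference)
```

Both values agree, which shows that Time is the total time spent in the kernel across all of its invocations, not the time of a single launch.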
$$\mathrm{add} + 2\times \mathrm{fma} + \mathrm{mul}$$
        dfmetric['CC FLOPs']= 2 * dfmetric['sm__sass_thread_inst_executed_op_dfma_pred_on.sum'] \
                            + dfmetric['sm__sass_thread_inst_executed_op_dmul_pred_on.sum'] \
                            + dfmetric['sm__sass_thread_inst_executed_op_dadd_pred_on.sum'] \
                            + 2 * dfmetric['sm__sass_thread_inst_executed_op_ffma_pred_on.sum'] \
                            + dfmetric['sm__sass_thread_inst_executed_op_fmul_pred_on.sum'] \
                            + dfmetric['sm__sass_thread_inst_executed_op_fadd_pred_on.sum'] \
                            + 2 * dfmetric['sm__sass_thread_inst_executed_op_hfma_pred_on.sum'] \
                            + dfmetric['sm__sass_thread_inst_executed_op_hmul_pred_on.sum'] \
                            + dfmetric['sm__sass_thread_inst_executed_op_hadd_pred_on.sum']
$$\mathrm{FLOP_{tc}} = \mathrm{Inst_{tc}}\times 512$$
dfmetric['TC FLOPs']= 512 * dfmetric['sm__inst_executed_pipe_tensor.sum']dfmetric['all FLOPs']= dfmetric['CC FLOPs'] + dfmetric['TC FLOPs']
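The two FLOP formulas can be checked by hand on a plain dictionary of summed metrics. This is only a sketch: count_flops is a hypothetical helper, the sample counts are made up, and unlike the script it treats a missing metric as 0 instead of raising an error.

```python
PREFIX = "sm__sass_thread_inst_executed_op_"

def count_flops(metrics):
    """Return (cuda_core_flops, tensor_core_flops) from a dict of summed metrics."""
    cc = 0
    for p in ("d", "f", "h"):  # double, single and half precision
        cc += (2 * metrics.get(f"{PREFIX}{p}fma_pred_on.sum", 0)  # an FMA counts as 2 FLOPs
               + metrics.get(f"{PREFIX}{p}mul_pred_on.sum", 0)
               + metrics.get(f"{PREFIX}{p}add_pred_on.sum", 0))
    # Each tensor-pipe instruction is counted as 512 FLOPs, as in the script.
    tc = 512 * metrics.get("sm__inst_executed_pipe_tensor.sum", 0)
    return cc, tc

# Made-up counts for a single kernel.
sample = {
    PREFIX + "dfma_pred_on.sum": 1000,
    PREFIX + "dadd_pred_on.sum": 200,
    PREFIX + "dmul_pred_on.sum": 100,
    "sm__inst_executed_pipe_tensor.sum": 4,
}
cc, tc = count_flops(sample)
```

For the sample, the CUDA-core count is 2×1000 + 100 + 200 = 2300 and the tensor-core count is 4×512 = 2048.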
        dfmetric['AI HBM'] = dfmetric['all FLOPs'].div(dfmetric['dram__bytes.sum'])
        dfmetric['AI L2'] = dfmetric['all FLOPs'].div(dfmetric['lts__t_bytes.sum'])
        dfmetric['AI L1'] = dfmetric['all FLOPs'].div(dfmetric['l1tex__t_bytes.sum'])
        dfmetric['GFLOP/s'] = dfmetric['all FLOPs']/ dfmetric['Time'] /1024/1024/1024
        dfmetric['TC GFLOP/s'] = dfmetric['TC FLOPs']/ dfmetric['Time'] /1024/1024/1024
        # dfmetric.to_csv('pd_'+tag+'.csv')
        dfs[tag]=dfmetric
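As a worked example of the last step, the sketch below computes the three arithmetic intensities and the throughput for one kernel. The function name roofline_point and all input values are hypothetical. Note that the script scales by 1024³ rather than 10⁹, and the sketch follows that choice.

```python
def roofline_point(flops, seconds, dram_bytes, l2_bytes, l1_bytes):
    """Arithmetic intensity per memory level and throughput for one kernel."""
    scale = 1024 ** 3  # same scaling as the script (1024**3, not 10**9)
    return {
        "AI HBM": flops / dram_bytes,
        "AI L2": flops / l2_bytes,
        "AI L1": flops / l1_bytes,
        "GFLOP/s": flops / seconds / scale,
    }

# Hypothetical kernel: 2**31 FLOPs in 0.5 s, with more traffic at the faster levels.
point = roofline_point(
    flops=2 ** 31, seconds=0.5,
    dram_bytes=2 ** 28, l2_bytes=2 ** 29, l1_bytes=2 ** 30,
)
```

The same kernel gets three different x positions (8, 4 and 2 FLOPs/Byte) at a single y value, which is why the hierarchical Roofline chart shows three markers per kernel on one horizontal line.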
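For each file's result, the index and the needed columns are converted to plain Python lists:
pandas.Index.tolist returns the index values as a list.
pandas.Series.tolist returns the column values as a list.
This way the roofline function does not need to call any Pandas functions.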
tags=dfs.keys()
flags=['all'] #'HBM','L2','L1' or 'all'
for tag in tags:
    for flag in flags:
        dfm=dfs[tag]
        LABELS = dfm.index.tolist()
        AIL1 = dfm['AI L1'].tolist()
        AIL2 = dfm['AI L2'].tolist()
        AIHBM = dfm['AI HBM'].tolist()
        FLOPS = dfm['GFLOP/s'].tolist()

        roofline(tag, FLOPS, AIHBM, AIL2, AIL1, LABELS, flag)
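The function first validates its input arguments: the lists must not be empty, their lengths must match, and flag must be one of the supported values.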
def roofline(filename, FLOPS, AIHBM, AIL2=None, AIL1=None, LABELS=None, flag='HBM'):

    if not FLOPS:
        print('FLOPS can not be empty!')
        return
    if max(FLOPS)==0:
        print('FLOPS are all 0s!')
        return
    if (not AIHBM) and (not AIL2) and (not AIL1):
        print('AIHBM, AIL2 and AIL1 can not all be empty!')
        return
    if (len(FLOPS) != len(AIHBM)) or (len(FLOPS) != len(AIL2)) or (len(FLOPS) != len(AIL1)):
        print('FLOPS needs to have the same length as AI!')
        return
    if (flag != 'HBM') and (flag != 'L2') and (flag != 'L1') and (flag != 'all'):
        print('flag needs to be one of HBM, L2, L1, and all!')
        return
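One caveat in the checks above: AIL2 and AIL1 default to None, but the length comparison calls len() on them unconditionally, which raises a TypeError when they are omitted. A possible variant is sketched below. It is not the repository's code; validate_inputs is a hypothetical helper that only requires the series that the chosen flag will actually plot, and it returns a message instead of printing.

```python
def validate_inputs(FLOPS, AIHBM, AIL2=None, AIL1=None, flag="HBM"):
    """Return an error message, or None when the inputs are usable."""
    if not FLOPS:
        return "FLOPS can not be empty!"
    if max(FLOPS) == 0:
        return "FLOPS are all 0s!"
    if flag not in ("HBM", "L2", "L1", "all"):
        return "flag needs to be one of HBM, L2, L1, and all!"
    series = {"HBM": AIHBM, "L2": AIL2, "L1": AIL1}
    # Only the series that will be plotted must be present and well-sized.
    needed = list(series) if flag == "all" else [flag]
    for name in needed:
        values = series[name]
        if not values:
            return f"AI{name} is required for flag={flag!r}!"
        if len(values) != len(FLOPS):
            return f"AI{name} needs to have the same length as FLOPS!"
    return None
```

With this variant, validate_inputs([1.0], [0.5]) passes for the default flag even though AIL2 and AIL1 are missing.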
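memRoofs and cmpRoofs are predetermined values: the memory bandwidths in GB/s and the compute peaks in TFLOP/s.
matplotlib.pyplot.figure creates a new figure or activates an existing one; figsize is the width and height in inches.
matplotlib.pyplot.clf clears the current figure.
matplotlib.figure.Figure.gca gets the current axes.
matplotlib.axes.Axes.set_xscale sets the x-axis scale.
matplotlib.axes.Axes.set_xlabel sets the x-axis label.
matplotlib.axes.Axes.set_xlim sets the x-axis view limits.
matplotlib.axes.Axes.get_xlim returns the x-axis view limits.
Both axes use a log scale, and the visible range of the x axis is $[10^{x_{min}}, 10^{x_{max}}]$.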
    LABELS = [x[:maxchar] for x in LABELS]

    memRoofs = [('L1', 54000.), ('L2', 2996.77), ('HBM', 828.76)]
    cmpRoofs = [('Tensor', 96.9),('DP', 7.8)]

    fig = plt.figure(1,figsize=(10.67,6.6))
    plt.clf()
    ax = fig.gca()
    ax.set_xscale('log')
    ax.set_yscale('log')
    ax.set_xlabel('Arithmetic Intensity [FLOPs/Byte]')
    ax.set_ylabel('Performance [GFLOP/sec]')

    nx = 10000
    xmin = -3
    xmax = 3
    ymin = 1
    ymax = 200000

    ax.set_xlim(10**xmin, 10**xmax)
    ax.set_ylim(ymin, ymax)

    ixx = int(nx*0.02)
    xlim = ax.get_xlim()
    ylim = ax.get_ylim()
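numpy.logspace returns nx numbers evenly spaced on a log scale over the closed interval [10**xmin, 10**xmax] (the endpoint is included by default).
Multiplying x by a bandwidth in memRoofs gives the bandwidth-bound performance on the y axis.
For each entry in cmpRoofs, the loop looks for the first grid point where the L1 bandwidth bound reaches that compute ceiling while the bound at the previous point is still below it, and stores the previous point in scomp_x_elbow and scomp_ix_elbow.
For each entry in memRoofs, the loop looks for the first grid point where that bandwidth bound reaches the Tensor ceiling while the bound at the previous point is still below it, and stores the previous point in smem_x_elbow and smem_ix_elbow.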
    scomp_x_elbow = []
    scomp_ix_elbow = []
    smem_x_elbow = []
    smem_ix_elbow = []

    x = np.logspace(xmin,xmax,nx)
    for roof in cmpRoofs:
        for ix in range(1,nx):
            if float(memRoofs[0][1] * x[ix]) >= roof[1]*1024 and (memRoofs[0][1] * x[ix-1]) < roof[1]*1024:
                scomp_x_elbow.append(x[ix-1])
                scomp_ix_elbow.append(ix-1)
                break

    for roof in memRoofs:
        for ix in range(1,nx):
            if (cmpRoofs[0][1]*1024 <= roof[1] * x[ix] and cmpRoofs[0][1]*1024 > roof[1] * x[ix-1]):
                smem_x_elbow.append(x[ix-1])
                smem_ix_elbow.append(ix-1)
                break
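The elbow search is a grid approximation of a simple intersection: a bandwidth line y = B·x meets a compute ceiling P at x = P/B. The following standard-library sketch compares the two for the DP ceiling and the L1 bandwidth. The helpers logspace and grid_elbow are illustrative stand-ins, not the script's functions.

```python
def logspace(lo, hi, n):
    """n points evenly spaced in log10 from 10**lo to 10**hi, both ends included."""
    step = (hi - lo) / (n - 1)
    return [10 ** (lo + i * step) for i in range(n)]

def grid_elbow(x, bandwidth, ceiling):
    """Index of the last grid point whose bandwidth bound is below the ceiling, or None."""
    for ix in range(1, len(x)):
        if bandwidth * x[ix] >= ceiling and bandwidth * x[ix - 1] < ceiling:
            return ix - 1
    return None

x = logspace(-3, 3, 10000)
l1_bw = 54000.0        # GB/s, the L1 entry of memRoofs
dp_peak = 7.8 * 1024   # GFLOP/s, the DP entry of cmpRoofs with the script's 1024 scaling

ix = grid_elbow(x, l1_bw, dp_peak)
exact = dp_peak / l1_bw  # closed-form intersection, about 0.148 FLOPs/Byte
```

With 10000 points over six decades, neighbouring grid points differ by about 0.14%, so x[ix] lies just below the exact intersection.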
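Next, the Roofline lines are drawn.
For each entry in cmpRoofs, the horizontal part after the elbow is drawn.
For each entry in memRoofs, the sloped part before the elbow is drawn.
Iterating over len(cmpRoofs) and len(memRoofs) here can raise an IndexError when an elbow was not found inside the x range; iterating over len(scomp_ix_elbow) and len(smem_ix_elbow) would be more appropriate.
matplotlib.axes.Axes.plot draws points, lines, or other markers in XY coordinates.
Here the color is black (c='k'), the line style is solid (ls='-'), and the line width is 2 points (lw='2').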
    for i in range(len(cmpRoofs)):
        roof = cmpRoofs[i][1]*1024
        y = np.ones(len(x)) * roof
        ax.plot(x[scomp_ix_elbow[i]:],y[scomp_ix_elbow[i]:],c='k',ls='-',lw='2')

    for i in range(len(memRoofs)):
        roof = memRoofs[i][1]
        y = x * roof
        ax.plot(x[:smem_ix_elbow[i]+1],y[:smem_ix_elbow[i]+1],c='k',ls='-',lw='2')
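To illustrate how the index error can arise, the sketch below repeats the memory-roof elbow search on a narrower, hypothetical x range (up to 10 FLOPs/Byte instead of 1000). The helper names are illustrative. Recording None for a missing elbow keeps the result aligned with the roof list, so a drawing loop can skip those entries.

```python
def logspace(lo, hi, n):
    """n points evenly spaced in log10 from 10**lo to 10**hi, both ends included."""
    step = (hi - lo) / (n - 1)
    return [10 ** (lo + i * step) for i in range(n)]

def memory_elbows(x, bandwidths, ceiling):
    """One elbow index per bandwidth, or None when the line never reaches the ceiling."""
    elbows = []
    for bw in bandwidths:
        found = None
        for ix in range(1, len(x)):
            if bw * x[ix] >= ceiling and bw * x[ix - 1] < ceiling:
                found = ix - 1
                break
        elbows.append(found)
    return elbows

mem_bw = [54000.0, 2996.77, 828.76]  # L1, L2, HBM in GB/s, from memRoofs
tensor_peak = 96.9 * 1024            # GFLOP/s, with the script's 1024 scaling

wide = memory_elbows(logspace(-3, 3, 1000), mem_bw, tensor_peak)
narrow = memory_elbows(logspace(-3, 1, 1000), mem_bw, tensor_peak)

# Only roofs that actually have an elbow would be drawn.
drawable = [(bw, ix) for bw, ix in zip(mem_bw, narrow) if ix is not None]
```

On the narrow range only the L1 line reaches the Tensor ceiling (at about 1.84 FLOPs/Byte), so an append-only list would hold one entry while the drawing loop indexes three.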
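The kernel performance data is then plotted on the chart: L1 points use circles, L2 points use squares, and HBM points use downward triangles.
The loop iterates over the length of AIHBM, which assumes that AIHBM is always present and that the other lists match its length.
flag decides which of the results are drawn.
Each kernel takes a different color from the colors list.
LABELS provides the labels used in the legend.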
    for i in range(len(AIHBM)):
        if flag == 'L1':
            ax.plot(float(AIL1[i]),float(FLOPS[i]),c=colors[i%10],marker=styles[0],
                    linestyle='None',ms=markersize,markerfacecolor='none',
                    markeredgewidth=markerwidth,label=LABELS[i] if LABELS else "unknown")
        elif flag == 'L2':
            ax.plot(float(AIL2[i]),float(FLOPS[i]),c=colors[i%10],marker=styles[1],
                    linestyle='None',ms=markersize,markerfacecolor='none',
                    markeredgewidth=markerwidth,label=LABELS[i] if LABELS else "unknown")
        elif flag == 'HBM':
            ax.plot(float(AIHBM[i]),float(FLOPS[i]),c=colors[i%10],marker=styles[2],
                    linestyle='None',ms=markersize,markerfacecolor='none',
                    markeredgewidth=markerwidth,label=LABELS[i] if LABELS else "unknown")
        elif flag == 'all':
            ax.plot(float(AIL1[i]),float(FLOPS[i]),c=colors[i%10],marker=styles[0],
                    linestyle='None',ms=markersize,markerfacecolor='none',
                    markeredgewidth=markerwidth,label=LABELS[i] if LABELS else "unknown")
            ax.plot(float(AIL2[i]),float(FLOPS[i]),c=colors[i%10],marker=styles[1],
                    linestyle='None',ms=markersize,markerfacecolor='none',
                    markeredgewidth=markerwidth,label=LABELS[i] if LABELS else "unknown")
            ax.plot(float(AIHBM[i]),float(FLOPS[i]),c=colors[i%10],marker=styles[2],
                    linestyle='None',ms=markersize,markerfacecolor='none',
                    markeredgewidth=markerwidth,label=LABELS[i] if LABELS else "unknown")
matplotlib.axes.Axes.plot returns a list of matplotlib.lines.Line2D objects, so the code takes element [0] of each call to get a handle for the marker legend.
    marker_handles = []

    if flag == 'L1':
        marker_handles.append(ax.plot([],[],c='k',marker=styles[0],linestyle='None',ms=markersize,
                              markerfacecolor='none',markeredgewidth=markerwidth,label=memRoofs[0][0])[0])
    elif flag == 'L2':
        marker_handles.append(ax.plot([],[],c='k',marker=styles[1],linestyle='None',ms=markersize,
                              markerfacecolor='none',markeredgewidth=markerwidth,label=memRoofs[1][0])[0])
    elif flag == 'HBM':
        marker_handles.append(ax.plot([],[],c='k',marker=styles[2],linestyle='None',ms=markersize,
                              markerfacecolor='none',markeredgewidth=markerwidth,label=memRoofs[2][0])[0])
    elif flag == 'all':
        for i in range(len(memRoofs)):
            marker_handles.append(ax.plot([],[],c='k',marker=styles[i],linestyle='None',ms=markersize,
                                  markerfacecolor='none',markeredgewidth=markerwidth,label=memRoofs[i][0])[0])
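matplotlib.axes.Axes.text adds the compute peak and memory bandwidth labels to the axes; the bandwidth labels are rotated to follow the sloped lines.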
    for roof in cmpRoofs:
        ax.text(x[-ixx],roof[1]*1024,
                roof[0] + ': ' + '{0:.1f}'.format(roof[1]) + ' TFLOP/s',
                horizontalalignment='right',
                verticalalignment='bottom')

    for roof in memRoofs:
        ang = np.arctan(np.log10(xlim[1]/xlim[0]) / np.log10(ylim[1]/ylim[0])
                        * fig.get_size_inches()[1]/fig.get_size_inches()[0] )
        if x[ixx]*roof[1] >ymin:
            ax.text(x[ixx],x[ixx]*roof[1]*(1+0.25*np.sin(ang)**2),
                    roof[0] + ': ' + '{0:.1f}'.format(float(roof[1])) + ' GB/s',
                    horizontalalignment='left',
                    verticalalignment='bottom',
                    rotation=180/np.pi*ang)
        else:
            ymin_ix_elbow=list()
            ymin_x_elbow=list()
            for ix in range(1,nx):
                if (ymin <= roof[1] * x[ix] and ymin > roof[1] * x[ix-1]):
                    ymin_x_elbow.append(x[ix-1])
                    ymin_ix_elbow.append(ix-1)
                    break
            ax.text(x[ixx+ymin_ix_elbow[0]],x[ixx+ymin_ix_elbow[0]]*roof[1]*(1+0.25*np.sin(ang)**2),
                    roof[0] + ': ' + '{0:.1f}'.format(float(roof[1])) + ' GB/s',
                    horizontalalignment='left',
                    verticalalignment='bottom',
                    rotation=180/np.pi*ang)
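The rotation angle of the bandwidth labels comes from the on-screen slope of a line that rises one decade in y per decade in x. One x decade occupies width/decades_x of the figure and one y decade occupies height/decades_y, so the slope on screen is (decades_x/decades_y)·(height/width). A small sketch with the values used in this script (roof_label_angle_deg is an illustrative name):

```python
import math

def roof_label_angle_deg(xlim, ylim, fig_width_in, fig_height_in):
    """On-screen angle, in degrees, of a slope-1 line on log-log axes."""
    decades_x = math.log10(xlim[1] / xlim[0])
    decades_y = math.log10(ylim[1] / ylim[0])
    slope_on_screen = decades_x / decades_y * fig_height_in / fig_width_in
    return math.degrees(math.atan(slope_on_screen))

# Axis limits and figure size as set earlier in roofline().
angle = roof_label_angle_deg((1e-3, 1e3), (1, 200000), 10.67, 6.6)
```

This gives an angle of roughly 35 degrees. The formula uses the figure size rather than the size of the axes box, so the labels will only approximately follow the lines when the margins are not proportional.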
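matplotlib.pyplot.legend places a legend for the memory levels, built from marker_handles, in the lower right corner.
matplotlib.axes.Axes.add_artist adds an Artist to the axes. It is needed here because the second plt.legend call would otherwise replace the first legend.
matplotlib.patches.Patch is a 2D Artist with a face color and an edge color.
The loc=4 used for leg2 is hard to read; it is the numeric code for 'lower right'.
matplotlib.pyplot.savefig saves the current figure.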
    leg1 = plt.legend(handles = marker_handles,loc='lower right', ncol=len(flag[0]) if 'all' not in flag else 3,bbox_to_anchor = (1,0))
    ax.add_artist(leg1)

    patch_handles = list()
    for i in range(0,len(AIHBM)):
        if FLOPS[i] > 0:
            patch_handles.append(mpatches.Patch(color=colors[i%10],label = LABELS[i] if LABELS else "unknown"))

    leg2 = plt.legend(handles = patch_handles,loc=4,ncol=1,bbox_to_anchor = (1,0.1),scatterpoints = 1)

    ax.text(xlim[0]*1.1,ylim[1]/1.1, '-'.join([filename,flag]), horizontalalignment='left',verticalalignment='top')

    # plt.title('-'.join([filename,flag]))
    plt.savefig('_'.join([filename,flag])+'.png')
    # plt.savefig('_'.join([filename,flag])+'.eps')
    # plt.show()
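For reference, Matplotlib's numeric legend location codes map to location strings as shown below, so loc=4 could be written as loc='lower right'. The dictionary name LEGEND_LOC_CODES is only for illustration; Matplotlib accepts either form directly.

```python
# Numeric legend location codes and their string equivalents in Matplotlib.
LEGEND_LOC_CODES = {
    0: "best",
    1: "upper right",
    2: "upper left",
    3: "lower left",
    4: "lower right",
    5: "right",
    6: "center left",
    7: "center right",
    8: "lower center",
    9: "upper center",
    10: "center",
}

readable_loc = LEGEND_LOC_CODES[4]
```

Using the string form makes it clear that leg2 sits in the same corner as leg1 and is only shifted upwards by its bbox_to_anchor of (1, 0.1).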
Expected result: [figure: hierarchical Roofline chart]