Debugging notes; rough content, do not read

Table of contents

  • Commands for generating SSA IR
  • DominatorTreeWrapperPass
    • compute dominator tree & dominance frontier
    • The SEMI-NCA algorithm
    • Debug Process
      • DominatorTreeBase::recalculate
        • runDFS
        • runSemiNCA
  • mem2reg
    • place ϕ-function
      • Collecting alloca information
        • `DefiningBlocks.size()=1`
        • `OnlyUsedInOneBlock`
        • The general case
          • `ComputeLiveInBlocks`
        • Inserting ϕ nodes
          • DJ-graphs (Dominator edge, Join edge)
          • Iterated Dominance Frontier `IDF.calculate(PHIBlocks)`
          • `PromoteMem2Reg::QueuePhiNode`
    • rename
      • `PromoteMem2Reg::RenamePass()`
        • Handling store & load instructions
        • Filling in the ϕ-node
        • The whole iteration
    • Cleanup

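In my article "Constructing SSA" (《构造SSA》) I described how SSA is built: placing ϕ-functions, renaming, and later SSA destruction. This article steps through LLVM in a debugger to show how it constructs the final SSA form.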

int fac(int num) {
  if (num == 1)
    return 1;
  return num * fac(num - 1);
}
int main() {
  fac(10);
}

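Before describing how LLVM builds SSA, I first show how to generate IR that contains ϕ-instructions. If you are not familiar with the IR, the talk "2019 EuroLLVM Developers’ Meeting: V. Bridgers & F. Piovezan “LLVM IR Tutorial - Phis, GEPs …”" is the best introductory video on LLVM IR.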

Clang itself does not produce optimized LLVM IR. It produces fairly straightforward IR wherein locals are kept in memory (using allocas). The optimizations are done by opt on LLVM IR level, and one of the most important optimizations is indeed mem2reg which makes sure that locals are represented in LLVM’s SSA values instead of memory. - 《How to get “phi” instruction in llvm without optimization》

// test.c
int foo(int a, int b) {
  int r;
  if (a > b)
    r = a;
  else
    r = b;
  return r;
}

For the code above, the IR that clang emits directly is shown below. As we can see, it is still very raw.

// clang -S -emit-llvm test.c -o test_original.ll
; ModuleID = 'test.c'
source_filename = "test.c"
target datalayout = "e-m:o-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-apple-macosx10.15.0"
; Function Attrs: noinline nounwind optnone ssp uwtable
define i32 @foo(i32 %a, i32 %b) #0 {
entry:
  %a.addr = alloca i32, align 4
  %b.addr = alloca i32, align 4
  %r = alloca i32, align 4
  store i32 %a, i32* %a.addr, align 4
  store i32 %b, i32* %b.addr, align 4
  %0 = load i32, i32* %a.addr, align 4
  %1 = load i32, i32* %b.addr, align 4
  %cmp = icmp sgt i32 %0, %1
  br i1 %cmp, label %if.then, label %if.else

if.then:                                          ; preds = %entry
  %2 = load i32, i32* %a.addr, align 4
  store i32 %2, i32* %r, align 4
  br label %if.end

if.else:                                          ; preds = %entry
  %3 = load i32, i32* %b.addr, align 4
  store i32 %3, i32* %r, align 4
  br label %if.end

if.end:                                           ; preds = %if.else, %if.then
  %4 = load i32, i32* %r, align 4
  ret i32 %4
}

attributes #0 = { noinline nounwind optnone ssp uwtable "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "frame-pointer"="all" "less-precise-fpmad"="false" "min-legal-vector-width"="0" "no-infs-fp-math"="false" "no-jump-tables"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="penryn" "target-features"="+cx16,+cx8,+fxsr,+mmx,+sahf,+sse,+sse2,+sse3,+sse4.1,+ssse3,+x87" "unsafe-fp-math"="false" "use-soft-float"="false" }

!llvm.module.flags = !{!0, !1, !2}
!llvm.ident = !{!3}

!0 = !{i32 2, !"SDK Version", [2 x i32] [i32 10, i32 15]}
!1 = !{i32 1, !"wchar_size", i32 4}
!2 = !{i32 7, !"PIC Level", i32 2}
!3 = !{!"clang version 10.0.0 (https://github.com/llvm/llvm-project.git 36663d506e31a43934f10dff5a3020d3aad41ef1)"}

LLVM uses -mem2reg to remove the alloca, store, and load instructions from the IR above and to convert the code into SSA IR.

This file promotes memory references to be register references. It promotes alloca instructions which only have loads and stores as uses. An alloca is transformed by using dominator frontiers to place phi nodes, then traversing the function in depth-first order to rewrite loads and stores as appropriate. This is just the standard SSA construction algorithm to construct “pruned” SSA form. - mem2reg: Promote Memory to Register

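Commands for generating SSA IR

The commands that produce IR containing ϕ-instructions are:

$ clang -S -emit-llvm -Xclang -disable-O0-optnone test.c   # emit human-readable IR (test.ll)
$ opt -mem2reg test.ll -o test.bc                          # convert the IR to SSA form
$ llvm-dis test.bc                                         # disassemble the bitcode into human-readable form

The -disable-O0-optnone option in the commands above removes the optnone attribute so that opt is able to run passes on the function. The first command produces the following: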

; ModuleID = 'test.c'
source_filename = "test.c"
target datalayout = "e-m:o-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-apple-macosx10.15.0"
; Function Attrs: noinline nounwind ssp uwtable
define i32 @foo(i32 %a, i32 %b) #0 {
entry:
  %a.addr = alloca i32, align 4
  %b.addr = alloca i32, align 4
  %r = alloca i32, align 4
  store i32 %a, i32* %a.addr, align 4
  store i32 %b, i32* %b.addr, align 4
  %0 = load i32, i32* %a.addr, align 4
  %1 = load i32, i32* %b.addr, align 4
  %cmp = icmp sgt i32 %0, %1
  br i1 %cmp, label %if.then, label %if.else

if.then:                                          ; preds = %entry
  %2 = load i32, i32* %a.addr, align 4
  store i32 %2, i32* %r, align 4
  br label %if.end

if.else:                                          ; preds = %entry
  %3 = load i32, i32* %b.addr, align 4
  store i32 %3, i32* %r, align 4
  br label %if.end

if.end:                                           ; preds = %if.else, %if.then
  %4 = load i32, i32* %r, align 4
  ret i32 %4
}

attributes #0 = { noinline nounwind ssp uwtable "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "frame-pointer"="all" "less-precise-fpmad"="false" "min-legal-vector-width"="0" "no-infs-fp-math"="false" "no-jump-tables"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="penryn" "target-features"="+cx16,+cx8,+fxsr,+mmx,+sahf,+sse,+sse2,+sse3,+sse4.1,+ssse3,+x87" "unsafe-fp-math"="false" "use-soft-float"="false" }

!llvm.module.flags = !{!0, !1, !2}
!llvm.ident = !{!3}

!0 = !{i32 2, !"SDK Version", [2 x i32] [i32 10, i32 15]}
!1 = !{i32 1, !"wchar_size", i32 4}
!2 = !{i32 7, !"PIC Level", i32 2}
!3 = !{!"clang version 10.0.0 (https://github.com/llvm/llvm-project.git 36663d506e31a43934f10dff5a3020d3aad41ef1)"}

The second command produces the following (shown after disassembling with llvm-dis):

; ModuleID = 'test.bc'
source_filename = "test.c"
target datalayout = "e-m:o-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-apple-macosx10.15.0"
; Function Attrs: noinline nounwind ssp uwtable
define i32 @foo(i32 %a, i32 %b) #0 {
entry:
  %cmp = icmp sgt i32 %a, %b
  br i1 %cmp, label %if.then, label %if.else

if.then:                                          ; preds = %entry
  br label %if.end

if.else:                                          ; preds = %entry
  br label %if.end

if.end:                                           ; preds = %if.else, %if.then
  %r.0 = phi i32 [ %a, %if.then ], [ %b, %if.else ]
  ret i32 %r.0
}

attributes #0 = { noinline nounwind ssp uwtable "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "frame-pointer"="all" "less-precise-fpmad"="false" "min-legal-vector-width"="0" "no-infs-fp-math"="false" "no-jump-tables"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="penryn" "target-features"="+cx16,+cx8,+fxsr,+mmx,+sahf,+sse,+sse2,+sse3,+sse4.1,+ssse3,+x87" "unsafe-fp-math"="false" "use-soft-float"="false" }

!llvm.module.flags = !{!0, !1, !2}
!llvm.ident = !{!3}

!0 = !{i32 2, !"SDK Version", [2 x i32] [i32 10, i32 15]}
!1 = !{i32 1, !"wchar_size", i32 4}
!2 = !{i32 7, !"PIC Level", i32 2}
!3 = !{!"clang version 10.0.0 (https://github.com/llvm/llvm-project.git 36663d506e31a43934f10dff5a3020d3aad41ef1)"}

IRGen: Add optnone attribute on function during O0
[llvm-dev] Clang/LLVM 5.0 optnone attribute with -O0
LLVM opt mem2reg has no effect
Assignment 1: Introduction to LLVM
-O0 is not a recommended option for clang
opt is defunct when code built without optimizations

DominatorTreeWrapperPass

Dominator information is computed by DominatorTreeWrapperPass, which is also the first pass that the command opt -mem2reg test.ll -o test.bc runs on this module.

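compute dominator tree & dominance frontier

In 2017 LLVM replaced the traditional Lengauer-Tarjan (LT) algorithm with the SEMI-NCA algorithm for computing dominator information; see the talk "2017 LLVM Developers’ Meeting: J. Kuderski “Dominator Trees and incremental updates that transcend time”".

First, generate the CFG of the example code with the command opt -dot-cfg ...:

[Figure: CFG of the example code]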

The SEMI-NCA algorithm

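The SEMI-NCA algorithm was brought into LLVM by the patch "[Dominators] Use Semi-NCA instead of SLT to calculate dominators".
Note: for the details of the SEMI-NCA algorithm, see the article "Revisiting the computation of the Dominator Tree" (再谈Dominator Tree的计算).

Debug Process

[Figure: call graph leading up to the entry of DominatorTreeWrapperPass]

The figure above shows the call chain up to the entry of DominatorTreeWrapperPass. We can see that the dominator pass is only a small part of the many passes. The inheritance relationships of the classes involved are as follows:

The corresponding code, with its comments, is: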

// LegacyPassManager.cpp

// PassManager manages ModulePassManagers
class PassManager : public PassManagerBase {
  // ...
};

// run - Execute all of the passes scheduled for execution. Keep track of
// whether any of the passes modifies the module, and if so, return true.
bool PassManager::run(Module &M) {
  return PM->run(M);
}

//===----------------------------------------------------------------------===//
// MPPassManager
//
// MPPassManager manages ModulePasses and function pass managers.
// It batches all Module passes and function pass managers together and
// sequences them to process one module.
class MPPassManager : public Pass, public PMDataManager {
  // ...
};

// Execute all of the passes scheduled for execution by invoking
// runOnModule method. Keep track of whether any of the passes modifies
// the module, and if so, return true.
bool MPPassManager::runOnModule(Module &M) {
  // ...
  for (unsigned Index = 0; Index < getNumContainedPasses(); ++Index) {
    // ...
    LocalChanged |= MP->runOnModule(M);
    // ...
  }
}

// LegacyPassManager.cpp::FPPassManager::runOnFunction

// FPPassManager manages BBPassManagers and FunctionPasses.
// It batches all function passes and basic block pass managers together and
// sequence them to process one function at a time before processing next
// function.
class FPPassManager : public ModulePass, public PMDataManager {
  // ...
};

/// Execute all of the passes scheduled for execution by invoking
/// runOnFunction method. Keep track of whether any of the passes modifies
/// the function, and if so, return true.
bool FPPassManager::runOnFunction(Function &F) {
  // ...
  for (unsigned Index = 0; Index < getNumContainedPasses(); ++Index) {
    FunctionPass *FP = getContainedPass(Index);
    bool LocalChanged = false;
    {
      // ...
      LocalChanged |= FP->runOnFunction(F);
      // ...
    }
  }
  // ...
}

bool DominatorTreeWrapperPass::runOnFunction(Function &F) {
  DT.recalculate(F);
  return false;
}

DominatorTreeBase::recalculate

From here we enter the actual dominator tree computation; SemiNCAInfo<DomTreeT>::CalculateFromScratch performs the real work.

/// recalculate - compute a dominator tree for the given function
void recalculate(ParentType &Func) {
  Parent = &Func;
  DomTreeBuilder::Calculate(*this);
}
// ...
template <class DomTreeT>
void Calculate(DomTreeT &DT) {
  SemiNCAInfo<DomTreeT>::CalculateFromScratch(DT, nullptr);
}

SemiNCAInfo<DomTreeT>::CalculateFromScratch is a typical implementation of the SEMI-NCA algorithm: the first step is doFullDFSWalk, and the second step runs runSemiNCA.

static void CalculateFromScratch(DomTreeT &DT, BatchUpdatePtr BUI) {
  auto *Parent = DT.Parent;
  DT.reset();
  DT.Parent = Parent;
  SemiNCAInfo SNCA(nullptr);  // Since we are rebuilding the whole tree,
                              // there is no point doing it incrementally.

  // Step #0: Number blocks in depth-first order and initialize variables used
  // in later stages of the algorithm.
  DT.Roots = FindRoots(DT, nullptr);
  SNCA.doFullDFSWalk(DT, AlwaysDescend);

  SNCA.runSemiNCA(DT);
  if (BUI) {
    BUI->IsRecalculated = true;
    LLVM_DEBUG(
        dbgs() << "DomTree recalculated, skipping future batch updates\n");
  }

  if (DT.Roots.empty()) return;

  // Add a node for the root. If the tree is a PostDominatorTree it will be
  // the virtual exit (denoted by (BasicBlock *) nullptr) which postdominates
  // all real exits (including multiple exit blocks, infinite loops).
  NodePtr Root = IsPostDom ? nullptr : DT.Roots[0];

  DT.RootNode = (DT.DomTreeNodes[Root] =
                     std::make_unique<DomTreeNodeBase<NodeT>>(Root, nullptr))
                    .get();
  SNCA.attachNewSubTree(DT, DT.RootNode);
}

runDFS

runDFS is a typical depth-first traversal implemented with an explicit stack. It assigns a DFS number to every BasicBlock and records the reverse-children (predecessor) relation. I will not expand on it here.

// Custom DFS implementation which can skip nodes based on a provided
// predicate. It also collects ReverseChildren so that we don't have to spend
// time getting predecessors in SemiNCA.
//
// If IsReverse is set to true, the DFS walk will be performed backwards
// relative to IsPostDom -- using reverse edges for dominators and forward
// edges for postdominators.
template <bool IsReverse = false, typename DescendCondition>
unsigned runDFS(NodePtr V, unsigned LastNum, DescendCondition Condition,
                unsigned AttachToNum) {
  // ...
}
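Since the body of runDFS is elided above, here is a minimal sketch of the same idea on a toy CFG: a stack-based walk that assigns DFS numbers, remembers the spanning-tree parent, and collects reverse children. The names ToyDFSInfo and numberBlocksDFS and the integer block ids are assumptions made for this illustration; they are not LLVM's data structures, and the descend predicate and the IsReverse mode are left out.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified per-block record (not LLVM's InfoRec).
struct ToyDFSInfo {
  unsigned DFSNum = 0;               // 0 means "not visited yet"
  unsigned Parent = 0;               // DFS number of the spanning-tree parent
  std::vector<int> ReverseChildren;  // CFG predecessors recorded during the walk
};

// Numbers the blocks reachable from Entry in depth-first order, starting at 1.
// Succs[b] lists the CFG successors of block b.
std::vector<ToyDFSInfo> numberBlocksDFS(
    const std::vector<std::vector<int>> &Succs, int Entry) {
  assert(Entry >= 0 && static_cast<std::size_t>(Entry) < Succs.size());
  std::vector<ToyDFSInfo> Info(Succs.size());
  std::vector<int> Stack;
  Stack.push_back(Entry);
  unsigned LastNum = 0;
  while (!Stack.empty()) {
    int BB = Stack.back();
    Stack.pop_back();
    if (Info[BB].DFSNum != 0)
      continue;  // pushed more than once and already numbered
    Info[BB].DFSNum = ++LastNum;
    for (int Succ : Succs[BB]) {
      // Always remember the reverse edge, even if Succ was numbered earlier.
      Info[Succ].ReverseChildren.push_back(BB);
      if (Info[Succ].DFSNum == 0) {
        Stack.push_back(Succ);
        Info[Succ].Parent = LastNum;  // may be overwritten by a later push
      }
    }
  }
  return Info;
}
```

With the example function encoded as entry=0, if.then=1, if.else=2, if.end=3, this particular stack discipline numbers entry as 1, if.else as 2, if.end as 3 and if.then as 4, and if.end ends up with two reverse children.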

After runDFS, the original CFG looks like this:

[Figure: CFG annotated with DFS numbers]

runSemiNCA

runSemiNCA consists of the two classic steps: the first computes the sdom values in reverse preorder, and the second computes the idom values in preorder via NCA.

// This function requires DFS to be run before calling it.
void runSemiNCA(DomTreeT &DT, const unsigned MinLevel = 0) {
  const unsigned NextDFSNum(NumToNode.size());
  // Initialize IDoms to spanning tree parents.
  for (unsigned i = 1; i < NextDFSNum; ++i) {
    const NodePtr V = NumToNode[i];
    auto &VInfo = NodeToInfo[V];
    VInfo.IDom = NumToNode[VInfo.Parent];
  }

  // Step #1: Calculate the semidominators of all vertices.
  SmallVector<InfoRec *, 32> EvalStack;
  for (unsigned i = NextDFSNum - 1; i >= 2; --i) {
    NodePtr W = NumToNode[i];
    auto &WInfo = NodeToInfo[W];

    // Initialize the semi dominator to point to the parent node.
    WInfo.Semi = WInfo.Parent;
    for (const auto &N : WInfo.ReverseChildren) {
      if (NodeToInfo.count(N) == 0)  // Skip unreachable predecessors.
        continue;

      const TreeNodePtr TN = DT.getNode(N);
      // Skip predecessors whose level is above the subtree we are processing.
      if (TN && TN->getLevel() < MinLevel)
        continue;

      unsigned SemiU = NodeToInfo[eval(N, i + 1, EvalStack)].Semi;
      if (SemiU < WInfo.Semi) WInfo.Semi = SemiU;
    }
  }

  // Step #2: Explicitly define the immediate dominator of each vertex.
  //          IDom[i] = NCA(SDom[i], SpanningTreeParent(i)).
  // Note that the parents were stored in IDoms and later got invalidated
  // during path compression in Eval.
  for (unsigned i = 2; i < NextDFSNum; ++i) {
    const NodePtr W = NumToNode[i];
    auto &WInfo = NodeToInfo[W];
    const unsigned SDomNum = NodeToInfo[NumToNode[WInfo.Semi]].DFSNum;
    NodePtr WIDomCandidate = WInfo.IDom;
    while (NodeToInfo[WIDomCandidate].DFSNum > SDomNum)
      WIDomCandidate = NodeToInfo[WIDomCandidate].IDom;

    WInfo.IDom = WIDomCandidate;
  }
}
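To make the two steps concrete, the following is a small sketch of Semi-NCA that works directly on DFS numbers. It is an illustration under simplifying assumptions, not LLVM's implementation: blocks are named by their DFS number, every predecessor is reachable, and eval walks the spanning tree naively instead of using path compression. The name toySemiNCA is hypothetical.

```cpp
#include <cassert>
#include <vector>

// Blocks are identified by their DFS number 1..N; index 0 is unused.
// Parent[w] is the DFS spanning-tree parent of w (Parent[1] == 0) and
// Preds[w] lists the CFG predecessors of w. Returns IDom[w] for each w;
// the root keeps 0.
std::vector<unsigned> toySemiNCA(
    const std::vector<unsigned> &Parent,
    const std::vector<std::vector<unsigned>> &Preds) {
  assert(!Parent.empty() && Parent.size() == Preds.size());
  const unsigned N = static_cast<unsigned>(Parent.size()) - 1;
  std::vector<unsigned> Semi(N + 1);
  std::vector<unsigned> IDom(Parent);  // start from spanning-tree parents
  for (unsigned W = 0; W <= N; ++W)
    Semi[W] = W;

  // Naive eval: among V and its ancestors numbered >= LastLinked, return the
  // vertex with the smallest semidominator. No path compression.
  auto Eval = [&](unsigned V, unsigned LastLinked) {
    unsigned Best = V;
    for (unsigned U = V; U >= LastLinked; U = Parent[U])
      if (Semi[U] < Semi[Best])
        Best = U;
    return Best;
  };

  // Step 1: semidominators, in reverse preorder.
  for (unsigned W = N; W >= 2; --W) {
    Semi[W] = Parent[W];
    for (unsigned V : Preds[W]) {
      unsigned SemiU = Semi[Eval(V, W + 1)];
      if (SemiU < Semi[W])
        Semi[W] = SemiU;
    }
  }

  // Step 2: IDom[w] = NCA(Semi[w], Parent[w]), in preorder.
  for (unsigned W = 2; W <= N; ++W)
    while (IDom[W] > Semi[W])
      IDom[W] = IDom[IDom[W]];
  return IDom;
}
```

For the example function, assuming the DFS numbering entry=1, if.else=2, if.end=3, if.then=4, the spanning-tree parent of if.end is if.else, but its semidominator is entry, so step 2 lifts its immediate dominator up to entry.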

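After Step #1 finishes, the CFG looks like this:

[Figure: CFG after Step #1, annotated with sdom values]

After Step #2 finishes, the CFG looks like this:

[Figure: CFG after Step #2, annotated with idom values]

mem2reg

The mem2reg pass lives in llvm/lib/Transforms/Utils/Mem2Reg.cpp. I set a breakpoint inside the body of Mem2Reg.cpp::PromoteLegacyPass::runOnFunction; the call stack is shown below.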

// commit 36663d506e31a43934f10dff5a3020d3aad41ef1
// vscode lldb
// Call Stack
(anonymous namespace)::PromoteLegacyPass::runOnFunction(llvm::Function&)   Mem2Reg.cpp
llvm::FPPassManager::runOnFunction(llvm::Function&)                        LegacyPassManager.cpp
llvm::FPPassManager::runOnModule(llvm::Module&)                            LegacyPassManager.cpp
(anonymous namespace)::MPPassManager::runOnModule(llvm::Module&)           LegacyPassManager.cpp
llvm::legacy::PassManagerImpl::run(llvm::Module&)                          LegacyPassManager.cpp
llvm::legacy::PassManager::run(llvm::Module&)                              LegacyPassManager.cpp
main                                                                       opt.cpp

The body of runOnFunction is:

// runOnFunction - To run this pass, first we calculate the alloca
// instructions that are safe for promotion, then we promote each one.
bool runOnFunction(Function &F) override {
  if (skipFunction(F))
    return false;

  DominatorTree &DT = getAnalysis<DominatorTreeWrapperPass>().getDomTree();
  AssumptionCache &AC =
      getAnalysis<AssumptionCacheTracker>().getAssumptionCache(F);
  return promoteMemoryToRegister(F, DT, AC);
}

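The execution of the whole program forms a call tree, but when the debugger hits a breakpoint it only shows the current path through that tree. The run of DominatorTreeWrapperPass, for example, already completed earlier.

place ϕ-function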

In LLVM the transformation from stack variables to register values is performed in optimization passes. Running a mem2reg optimization pass on the IR will transform memory objects to register values whenever possible (or the heuristics say so). The optimization pass is implemented in PromoteMemoryToRegister.cpp which analyzes the BasicBlocks and the alloca instructions for PHINode placement. The PHINode placement is calculated with algorithm by Sreedhar and Gao that has been modified to not use the DJ (Dominator edge, Join edge) graphs. According to Sreedhar and Gao the algorithm is approximately five times faster on average than the Cytron et al. algorithm. The speed gain results from calculating dominance frontiers for only nodes that potentially need phi nodes and well designed data structures. LLVM SSA

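As we know, SSA construction takes three steps:

  • compute dominance information
  • insert ϕ-instructions
  • rename

Once the dominance information has been computed, the next step is inserting ϕ-instructions. This is done by PromoteMem2Reg::run(), which has two major parts: placing ϕ-instructions and renaming.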

// PromoteMemoryToRegister.cpp
// This file promotes memory references to be register references. It promotes
// alloca instructions which only have loads and stores as uses. An alloca is
// transformed by using iterated dominator frontiers to place PHI nodes, then
// traversing the function in depth-first order to rewrite loads and stores as
// appropriate.

struct PromoteMem2Reg {
  // The alloca instructions being promoted.
  std::vector<AllocaInst *> Allocas;

  DominatorTree &DT;

  const SimplifyQuery SQ;

  // Reverse mapping of Allocas.
  DenseMap<AllocaInst *, unsigned> AllocaLookup;

  // The PhiNodes we're adding.
  //
  // That map is used to simplify some Phi nodes as we iterate over it, so
  // it should have deterministic iterators. We could use MapVector, but
  // since we already maintain a map from BasicBlock* to a stable numbering
  // (BBNumbers), the DenseMap is more efficient (also supports removal).
  DenseMap<std::pair<unsigned, unsigned>, PHINode *> NewPhiNodes;

  // For each PHI node, keep track of which entry in Allocas it corresponds
  // to.
  DenseMap<PHINode *, unsigned> PhiToAllocaMap;

  // The set of basic blocks the renamer has already visited.
  SmallPtrSet<BasicBlock *, 16> Visited;

  // Contains a stable numbering of basic blocks to avoid non-deterministic
  // behavior.
  DenseMap<BasicBlock *, unsigned> BBNumbers;

  // Lazily compute the number of predecessors a block has.
  DenseMap<const BasicBlock *, unsigned> BBNumPreds;

  void run();

private:
  void ComputeLiveInBlocks(AllocaInst *AI, AllocaInfo &Info,
                           const SmallPtrSetImpl<BasicBlock *> &DefBlocks,
                           SmallPtrSetImpl<BasicBlock *> &LiveInBlocks);
  void RenamePass(BasicBlock *BB, BasicBlock *Pred,
                  RenamePassData::ValVector &IncVals,
                  RenamePassData::LocationVector &InstLocs,
                  std::vector<RenamePassData> &WorkList);
  bool QueuePhiNode(BasicBlock *BB, unsigned AllocaIdx, unsigned &Version);
};

void PromoteMem2Reg::run() {
  Function &F = *DT.getRoot()->getParent();

  AllocaDbgDeclares.resize(Allocas.size());

  AllocaInfo Info;
  LargeBlockInfo LBI;
  ForwardIDFCalculator IDF(DT);

  // Part 1: place phi nodes
  for (unsigned AllocaNum = 0; AllocaNum != Allocas.size(); ++AllocaNum) {
    AllocaInst *AI = Allocas[AllocaNum];

    if (AI->use_empty()) {
      // If there are no uses of the alloca, just delete it now.
      AI->eraseFromParent();

      // Remove the alloca from the Allocas list, since it has been processed
      RemoveFromAllocasList(AllocaNum);
      ++NumDeadAlloca;
      continue;
    }

    // Calculate the set of read and write-locations for each alloca. This is
    // analogous to finding the 'uses' and 'definitions' of each variable.
    Info.AnalyzeAlloca(AI);

    // If there is only a single store to this value, replace any loads of
    // it that are directly dominated by the definition with the value stored.
    if (Info.DefiningBlocks.size() == 1) {
      if (rewriteSingleStoreAlloca(AI, Info, LBI, SQ.DL, DT, AC)) {
        // The alloca has been processed, move on.
        RemoveFromAllocasList(AllocaNum);
        ++NumSingleStore;
        continue;
      }
    }

    // If the alloca is only read and written in one basic block, just perform a
    // linear sweep over the block to eliminate it.
    if (Info.OnlyUsedInOneBlock &&
        promoteSingleBlockAlloca(AI, Info, LBI, SQ.DL, DT, AC)) {
      // The alloca has been processed, move on.
      RemoveFromAllocasList(AllocaNum);
      continue;
    }

    // ...

    // Unique the set of defining blocks for efficient lookup.
    SmallPtrSet<BasicBlock *, 32> DefBlocks(Info.DefiningBlocks.begin(),
                                            Info.DefiningBlocks.end());

    // Determine which blocks the value is live in. These are blocks which lead
    // to uses.
    SmallPtrSet<BasicBlock *, 32> LiveInBlocks;
    ComputeLiveInBlocks(AI, Info, DefBlocks, LiveInBlocks);

    // At this point, we're committed to promoting the alloca using IDF's, and
    // the standard SSA construction algorithm. Determine which blocks need PHI
    // nodes and see if we can optimize out some work by avoiding insertion of
    // dead phi nodes.
    IDF.setLiveInBlocks(LiveInBlocks);
    IDF.setDefiningBlocks(DefBlocks);
    SmallVector<BasicBlock *, 32> PHIBlocks;
    IDF.calculate(PHIBlocks);
    llvm::sort(PHIBlocks, [this](BasicBlock *A, BasicBlock *B) {
      return BBNumbers.find(A)->second < BBNumbers.find(B)->second;
    });

    unsigned CurrentVersion = 0;
    for (BasicBlock *BB : PHIBlocks)
      QueuePhiNode(BB, AllocaNum, CurrentVersion);
  }

  // Part 2: rename pass
  // ...
}

The first part of run() is a for loop that processes each alloca instruction and computes the ϕ-instructions it needs. Looking back at the original IR, there are three alloca instructions, and each store instruction can be regarded as a def.

define i32 @foo(i32 %a, i32 %b) #0 {
entry:
  %a.addr = alloca i32, align 4           ; first alloca: %a.addr
  %b.addr = alloca i32, align 4           ; second alloca: %b.addr
  %r = alloca i32, align 4                ; third alloca: %r
  store i32 %a, i32* %a.addr, align 4     ; def of %a.addr
  store i32 %b, i32* %b.addr, align 4     ; def of %b.addr
  %0 = load i32, i32* %a.addr, align 4    ; read of %a.addr
  %1 = load i32, i32* %b.addr, align 4    ; read of %b.addr
  %cmp = icmp sgt i32 %0, %1
  br i1 %cmp, label %if.then, label %if.else

if.then:                                          ; preds = %entry
  %2 = load i32, i32* %a.addr, align 4
  store i32 %2, i32* %r, align 4          ; first def of %r
  br label %if.end

if.else:                                          ; preds = %entry
  %3 = load i32, i32* %b.addr, align 4
  store i32 %3, i32* %r, align 4          ; second def of %r
  br label %if.end

if.end:                                           ; preds = %if.else, %if.then
  %4 = load i32, i32* %r, align 4         ; read of %r
  ret i32 %4
}

Collecting alloca information

This part collects information about each alloca instruction, such as which stores and which loads touch it, and then filters out the allocas that need no ϕ-instruction at all. When collecting AllocaInfo, the points of interest are:

  • the BasicBlocks that contain the store instructions
  • the BasicBlocks that contain the load instructions
  • whether they all sit in the same BasicBlock

// PromoteMemoryToRegister.cpp
struct AllocaInfo {
  // Scan the uses of the specified alloca, filling in the AllocaInfo used
  // by the rest of the pass to reason about the uses of this alloca.
  void AnalyzeAlloca(AllocaInst *AI) {
    // As we scan the uses of the alloca instruction, keep track of stores,
    // and decide whether all of the loads and stores to the alloca are within
    // the same basic block.
    for (auto UI = AI->user_begin(), E = AI->user_end(); UI != E;) {
      // ...
    }
  }
};
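The loop body is elided above, so here is a rough sketch of the kind of bookkeeping it performs, written against a toy description of the users of one alloca. ToyUser, ToyAllocaInfo and analyzeToyAlloca are hypothetical names for this illustration; the real AnalyzeAlloca works on llvm::User objects and tracks more state.

```cpp
#include <cassert>
#include <vector>

// Hypothetical description of one user of an alloca.
struct ToyUser {
  bool IsStore;  // true for a store (a def), false for a load (a use)
  int Block;     // index of the block that contains the instruction
};

// Hypothetical counterpart of the fields the pass later inspects.
struct ToyAllocaInfo {
  std::vector<int> DefiningBlocks;  // blocks containing stores
  std::vector<int> UsingBlocks;     // blocks containing loads
  bool OnlyUsedInOneBlock = true;
  int OnlyBlock = -1;               // -1 means "no user seen yet"
};

ToyAllocaInfo analyzeToyAlloca(const std::vector<ToyUser> &Users) {
  ToyAllocaInfo Info;
  for (const ToyUser &U : Users) {
    if (U.IsStore)
      Info.DefiningBlocks.push_back(U.Block);
    else
      Info.UsingBlocks.push_back(U.Block);

    // Track whether every load and store lives in one and the same block.
    if (Info.OnlyUsedInOneBlock) {
      if (Info.OnlyBlock == -1)
        Info.OnlyBlock = U.Block;
      else if (Info.OnlyBlock != U.Block)
        Info.OnlyUsedInOneBlock = false;
    }
  }
  return Info;
}
```

In the example IR (entry=0, if.then=1, if.else=2, if.end=3), %a.addr has one store in entry and loads in entry and if.then, so it has a single defining block, while %r has two defining blocks and one using block.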

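Each of these situations is handled differently:

  • DefiningBlocks.size()=1
  • OnlyUsedInOneBlock
  • the general case

DefiningBlocks.size()=1

%a.addr in the example IR falls into this case. It is handled mainly by the function rewriteSingleStoreAlloca(). The core of this function is to drop the store/load round trip and substitute the stored value directly wherever the loads are used, which avoids inserting ϕ-nodes altogether. The one thing that puzzled me was: if there is only this single store, can it really fail to dominate all the loads? The source comment below answers it: a load may sit on a path that never executes the store (for example, the store is inside one branch of an if and the load comes after the merge), so such a load is not dominated by the store and would have to be phi-ed with undef.

[Figure: IR after %a.addr has been processed]

As the figure shows, after this step every instruction related to %a.addr has been deleted, and the value %a that was stored to %a.addr has been substituted at every place that used a value loaded from %a.addr.

// Rewrite as many loads as possible given a single store
//
// When there is only a single store, we can use the domtree to trivially
// replace all of the dominated loads with the stored value. Do so, and return
// true if this has successfully promoted the alloca entirely. If this returns
// false there were some loads which were not dominated by the single store
// and thus must be phi-ed with undef. We fall back to the standard alloca
// promotion algorithm in that case.
static bool rewriteSingleStoreAlloca(AllocaInst *AI, AllocaInfo &Info,
                                     LargeBlockInfo &LBI, const DataLayout &DL,
                                     DominatorTree &DT, AssumptionCache *AC) {
  // ... (code omitted)
}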
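The dominance question can be illustrated with a small sketch that checks, block by block, whether a single store covers each load. This is a simplification made for illustration: it works at block granularity, assumes loads in the store's own block come after the store, and represents the dominator tree as an idom array in which the entry block is its own immediate dominator. The names toyDominates and uncoveredLoadBlocks are hypothetical.

```cpp
#include <cassert>
#include <vector>

// IDom[b] is the immediate dominator of block b. In this sketch the entry
// block is its own immediate dominator (an assumption made for brevity).
bool toyDominates(const std::vector<int> &IDom, int A, int B) {
  while (A != B) {
    if (IDom[B] == B)
      return false;  // reached the entry block without meeting A
    B = IDom[B];
  }
  return true;
}

// Returns the load blocks that the single store in StoreBlock cannot cover.
// Loads in StoreBlock itself are assumed to come after the store.
std::vector<int> uncoveredLoadBlocks(const std::vector<int> &IDom,
                                     int StoreBlock,
                                     const std::vector<int> &LoadBlocks) {
  std::vector<int> Uncovered;
  for (int LB : LoadBlocks)
    if (!toyDominates(IDom, StoreBlock, LB))
      Uncovered.push_back(LB);
  return Uncovered;
}
```

For %a.addr the store is in entry, which dominates every block, so nothing is left uncovered. If the only store were in if.then and a load sat in if.end, that load would remain uncovered, which is the situation where the pass falls back to the standard algorithm.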

OnlyUsedInOneBlock

The general case

In the general case, the first step is to compute the BasicBlocks at whose entry the AllocaInst is live.

ComputeLiveInBlocks

One drawback of minimal SSA form is that it may place φ-functions for a variable
at a point in the control-flow graph where the variable was not actually live prior
to SSA. - Static Single Assignment Book

One possible way to do this is to perform liveness analysis prior to SSA construction, and then use the liveness information to suppress the placement of φ-functions as described above; another approach is to construct minimal SSA and then remove the dead φ-functions using dead code elimination. - Static Single Assignment Book

This is pruned SSA form: BasicBlocks where an inserted ϕ-instruction would be dead anyway are excluded from ϕ placement.

// Determine which blocks the value is live in.
//
// These are blocks which lead to uses. Knowing this allows us to avoid
// inserting PHI nodes into blocks which don't lead to uses (thus, the
// inserted phi nodes would be dead).
void PromoteMem2Reg::ComputeLiveInBlocks(
    AllocaInst *AI, AllocaInfo &Info,
    const SmallPtrSetImpl<BasicBlock *> &DefBlocks,
    SmallPtrSetImpl<BasicBlock *> &LiveInBlocks) {
  // To determine liveness, we must iterate through the predecessors of blocks
  // where the def is live. Blocks are added to the worklist if we need to
  // check their predecessors. Start with all the using blocks.
  SmallVector<BasicBlock *, 64> LiveInBlockWorklist(Info.UsingBlocks.begin(),
                                                    Info.UsingBlocks.end());

  // If any of the using blocks is also a definition block, check to see if the
  // definition occurs before or after the use. If it happens before the use,
  // the value isn't really live-in.
  // ...
}
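The remainder of the function is a backward worklist walk over predecessors. A minimal sketch of that walk on a toy CFG could look like the following; it assumes the using blocks have already been filtered so that they only contain blocks where the value is read before being written, and the name toyLiveInBlocks and the integer block ids are hypothetical.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Preds[b] lists the CFG predecessors of block b. UsingBlocks are the blocks
// that read the variable before any write of their own; DefBlocks are the
// blocks that write it. Returns the blocks on whose entry the variable is live.
std::set<int> toyLiveInBlocks(const std::vector<std::vector<int>> &Preds,
                              const std::vector<int> &UsingBlocks,
                              const std::set<int> &DefBlocks) {
  std::set<int> LiveIn;
  std::vector<int> Worklist(UsingBlocks.begin(), UsingBlocks.end());
  while (!Worklist.empty()) {
    int BB = Worklist.back();
    Worklist.pop_back();
    if (!LiveIn.insert(BB).second)
      continue;  // already known to be live-in
    for (int P : Preds[BB]) {
      // The value is not live into a predecessor that defines it.
      if (DefBlocks.count(P))
        continue;
      Worklist.push_back(P);
    }
  }
  return LiveIn;
}
```

For %r in the example (defs in if.then=1 and if.else=2, use in if.end=3), only if.end is live-in, because both of its predecessors define the value.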

Inserting ϕ nodes

// Calculate iterated dominance frontiers
//
// This uses the linear-time phi algorithm based on DJ-graphs mentioned in
// the file-level comment. It performs DF->IDF pruning using the live-in
// set, to avoid computing the IDF for blocks where an inserted PHI node
// would be dead.
void calculate(SmallVectorImpl<NodeTy*> &IDFBlocks);

DJ-graphs (Dominator edge, Join edge)

For details on the DJ-graph, see the reading notes on the paper "A Linear Time Algorithm for Placing phi-Nodes".

With dominance frontiers, the compiler can determine more precisely where ϕ-functions might be needed. The basic idea is simple. A definition of x in block b forces a ϕ-function at every node in DF(b). Since that ϕ-function is a new definition of x, it may, in turn, force the insertion of additional ϕ-functions.

Iterated Dominance Frontier IDF.calculate(PHIBlocks)

The core of computing ϕ-nodes is the IDFCalculatorBase class. IDF stands for iterated dominance frontier, and the core algorithm is based on DJ-graphs. In PromoteMem2Reg::run(), for a single alloca instruction, we have already executed IDF.setLiveInBlocks(LiveInBlocks) and IDF.setDefiningBlocks(DefBlocks). The next step is to compute the BasicBlocks into which ϕ-nodes are inserted, and the core of that step is IDF.calculate(PHIBlocks).

Using the example code together with the DJ-graph, let us walk through the code below.

template <class NodeTy, bool IsPostDom>
void IDFCalculatorBase<NodeTy, IsPostDom>::calculate(
    SmallVectorImpl<NodeTy *> &PHIBlocks) {
  // Use a priority queue keyed on dominator tree level so that inserted nodes
  // are handled from the bottom of the dominator tree upwards. We also augment
  // the level with a DFS number to ensure that the blocks are ordered in a
  // deterministic way.
  IDFPriorityQueue PQ;

  DT.updateDFSNumbers();

  for (NodeTy *BB : *DefBlocks) {
    if (DomTreeNodeBase<NodeTy> *Node = DT.getNode(BB)) {
      PQ.push({Node, std::make_pair(Node->getLevel(), Node->getDFSNumIn())});
    }
  }

  // ...

  while (!PQ.empty()) {
    DomTreeNodePair RootPair = PQ.top();
    PQ.pop();
    DomTreeNodeBase<NodeTy> *Root = RootPair.first;
    unsigned RootLevel = RootPair.second.first;

    // Walk all dominator tree children of Root, inspecting their CFG edges with
    // targets elsewhere on the dominator tree. Only targets whose level is at
    // most Root's level are added to the iterated dominance frontier of the
    // definition set.
    Worklist.clear();
    Worklist.push_back(Root);
    VisitedWorklist.insert(Root);

    while (!Worklist.empty()) {
      DomTreeNodeBase<NodeTy> *Node = Worklist.pop_back_val();
      NodeTy *BB = Node->getBlock();
      // Succ is the successor in the direction we are calculating IDF, so it is
      // successor for IDF, and predecessor for Reverse IDF.
      auto DoWork = [&](NodeTy *Succ) {
        DomTreeNodeBase<NodeTy> *SuccNode = DT.getNode(Succ);

        const unsigned SuccLevel = SuccNode->getLevel();
        if (SuccLevel > RootLevel)
          return;

        if (!VisitedPQ.insert(SuccNode).second)
          return;

        NodeTy *SuccBB = SuccNode->getBlock();
        if (useLiveIn && !LiveInBlocks->count(SuccBB))
          return;

        PHIBlocks.emplace_back(SuccBB);
        if (!DefBlocks->count(SuccBB))
          PQ.push(std::make_pair(
              SuccNode, std::make_pair(SuccLevel, SuccNode->getDFSNumIn())));
      };

      for (auto Succ : ChildrenGetter.get(BB))
        DoWork(Succ);

      for (auto DomChild : *Node) {
        if (VisitedWorklist.insert(DomChild).second)
          Worklist.push_back(DomChild);
      }
    }
  }
}


In the CFG, two nodes, if.then and if.else, define %r, and the resulting ϕ block is if.end. Note that the original DefiningBlocks does not contain if.end, but because a phi-instruction has to be inserted in if.end, and that phi is a new def, if.end must be pushed onto PQ as well.
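The level-based walk can be reproduced on plain vectors to see why if.end is the only ϕ block. The sketch below follows the shape of calculate() above but is only an illustration: ToyFunction and toyIDF are hypothetical names, blocks are integers, the dominator tree is supplied as precomputed children and levels, and ties in the priority queue are broken by block id rather than by DFS number.

```cpp
#include <cassert>
#include <queue>
#include <set>
#include <utility>
#include <vector>

// Hypothetical container for the inputs the walk needs.
struct ToyFunction {
  std::vector<std::vector<int>> Succs;        // CFG successors
  std::vector<std::vector<int>> DomChildren;  // dominator-tree children
  std::vector<unsigned> Level;                // dominator-tree depth, root = 0
};

std::vector<int> toyIDF(const ToyFunction &F, const std::set<int> &DefBlocks,
                        const std::set<int> &LiveInBlocks) {
  // Max-heap on (level, block): deepest dominator-tree nodes come first.
  std::priority_queue<std::pair<unsigned, int>> PQ;
  for (int BB : DefBlocks)
    PQ.push(std::make_pair(F.Level[BB], BB));

  std::vector<int> PHIBlocks;
  std::set<int> VisitedPQ, VisitedWorklist;
  while (!PQ.empty()) {
    const unsigned RootLevel = PQ.top().first;
    const int Root = PQ.top().second;
    PQ.pop();

    std::vector<int> Worklist;
    Worklist.push_back(Root);
    VisitedWorklist.insert(Root);
    while (!Worklist.empty()) {
      int BB = Worklist.back();
      Worklist.pop_back();
      for (int Succ : F.Succs[BB]) {
        if (F.Level[Succ] > RootLevel)
          continue;  // target is deeper than Root: not in Root's frontier
        if (!VisitedPQ.insert(Succ).second)
          continue;  // already handled as a frontier candidate
        if (!LiveInBlocks.count(Succ))
          continue;  // a phi here would be dead
        PHIBlocks.push_back(Succ);
        if (!DefBlocks.count(Succ))
          PQ.push(std::make_pair(F.Level[Succ], Succ));  // the phi is a new def
      }
      for (int Child : F.DomChildren[BB])
        if (VisitedWorklist.insert(Child).second)
          Worklist.push_back(Child);
    }
  }
  return PHIBlocks;
}
```

With entry=0 at level 0 and the other three blocks as its dominator-tree children at level 1, the defs in if.then and if.else both reach if.end through a join edge whose target is not deeper than the root, so if.end is reported once and then queued itself.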

PromoteMem2Reg::QueuePhiNode

After the blocks that need ϕ-nodes have been computed, LLVM creates a new PHINode object for each of them and records it in PhiToAllocaMap.

// Queue a phi-node to be added to a basic-block for a specific Alloca.
//
// Returns true if there wasn't already a phi-node for that variable.
bool PromoteMem2Reg::QueuePhiNode(BasicBlock *BB, unsigned AllocaNo,
                                  unsigned &Version) {
  // ...
}

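After the first part of run() finishes, %a.addr, %b.addr and %r should be in the state shown below. At this point the ϕ-nodes have been constructed and the BasicBlocks they will be inserted into have been collected.

[Figure: state of the IR after the first part of run()]

Note: since %a.addr and %b.addr are simple cases, the red color in the figure above indicates that the related instructions have already been processed.

The in-memory LLVM IR has not been fully processed yet; the textual IR in the figure above was written by hand and only conveys the rough idea.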

rename

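Once PhiToAllocaMap has been collected, the next step is renaming. First, be clear that the IR being processed is the in-memory IR: LLVM IR values are linked to each other through users and uses, so in memory it is a graph of pointers going back and forth. In "Constructing SSA" I may have given the impression that rename literally means renaming, but the core of rename is linking each def to the ϕ-instructions. The "name" is only the surface meaning; a name is simply a def. In LLVM IR, a def is the value that a store instruction writes.

So the key points for understanding LLVM's rename are:

  • pick out each store instruction and associate the value being stored with the alloca instruction, so that it can later be used as an operand of a ϕ-instruction
  • pick out each load instruction and replace it, depending on the situation, with the value stored by an earlier store instruction or with a ϕ-instruction
  • all of this has to be done in the order in which values flow
  • finally, delete the store and load instructions

void PromoteMem2Reg::run() {
  Function &F = *DT.getRoot()->getParent();

  AllocaDbgDeclares.resize(Allocas.size());

  AllocaInfo Info;
  LargeBlockInfo LBI;
  ForwardIDFCalculator IDF(DT);

  // Part 1: place phi nodes
  // ...

  // Part 2: rename pass
  // ... (Values and Locations are initialized here, one entry per alloca)

  // Walks all basic blocks in the function performing the SSA rename algorithm
  // and inserting the phi nodes we marked as necessary.
  std::vector<RenamePassData> RenamePassWorkList;
  RenamePassWorkList.emplace_back(&F.front(), nullptr, std::move(Values),
                                  std::move(Locations));
  do {
    RenamePassData RPD = std::move(RenamePassWorkList.back());
    RenamePassWorkList.pop_back();
    // RenamePass may add new worklist entries.
    RenamePass(RPD.BB, RPD.Pred, RPD.Values, RPD.Locations, RenamePassWorkList);
  } while (!RenamePassWorkList.empty());
}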

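The code above pre-initializes the data associated with the alloca instructions. Only one alloca instruction remains to be processed (the other two have already been handled), so there is only one entry. RenamePassWorkList is then initialized with the first BasicBlock of the Function, and control moves on to the core of the whole rename process, RenamePass().

PromoteMem2Reg::RenamePass()

A central structure in the rename pass is IncomingVals, whose type is the ValVector in the structure below.

struct RenamePassData {
  using ValVector = std::vector<Value *>;

  BasicBlock *BB;
  BasicBlock *Pred;
  ValVector Values;
};

Through the worklist, IncomingVals plays a role similar to the stack in the rename procedure of "Constructing SSA": it stores the def that should currently be used.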

Handling store & load instructions

// Recursively traverse the CFG of the function, renaming loads and
// stores to the allocas which we are promoting.
//
// IncomingVals indicates what value each Alloca contains on exit from the
// predecessor block Pred.
void PromoteMem2Reg::RenamePass(BasicBlock *BB, BasicBlock *Pred,
                                RenamePassData::ValVector &IncomingVals,
                                RenamePassData::LocationVector &IncomingLocs,
                                std::vector<RenamePassData> &Worklist) {
NextIteration:
  // If we are inserting any phi nodes into this BB, they will already be in the
  // block.
  // Part 1: fill in the phi-nodes

  // Part 2: handle store instructions & load instructions
  // Don't revisit blocks
  if (!Visited.insert(BB).second)
    return;

  for (BasicBlock::iterator II = BB->begin(); !II->isTerminator();) {
    Instruction *I = &*II++; // get the instruction, increment iterator

    if (LoadInst *LI = dyn_cast<LoadInst>(I)) {
      AllocaInst *Src = dyn_cast<AllocaInst>(LI->getPointerOperand());
      if (!Src)
        continue;

      DenseMap<AllocaInst *, unsigned>::iterator AI = AllocaLookup.find(Src);
      if (AI == AllocaLookup.end())
        continue;

      Value *V = IncomingVals[AI->second];

      // If the load was marked as nonnull we don't want to lose
      // that information when we erase this Load. So we preserve
      // it with an assume.
      // ...

      // Anything using the load now uses the current value.
      LI->replaceAllUsesWith(V);
      BB->getInstList().erase(LI);
    } else if (StoreInst *SI = dyn_cast<StoreInst>(I)) {
      // Delete this instruction and mark the name as the current holder of the
      // value
      AllocaInst *Dest = dyn_cast<AllocaInst>(SI->getPointerOperand());
      if (!Dest)
        continue;

      DenseMap<AllocaInst *, unsigned>::iterator ai = AllocaLookup.find(Dest);
      if (ai == AllocaLookup.end())
        continue;

      // What value were we writing?
      unsigned AllocaNo = ai->second;
      IncomingVals[AllocaNo] = SI->getOperand(0);
      BB->getInstList().erase(SI);
    }
  }

  // Part 3: update the iteration data
}

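For a load instruction, every use of the load is replaced with the current value recorded for its source alloca, that is, the current def, and the load itself is then deleted.

For a store instruction, the current def of the destination alloca is updated to the stored value, and the store is then deleted.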
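
The load/store handling can be illustrated with a minimal sketch on a toy IR. ToyInst and renameBlock are hypothetical names invented for this illustration and are not part of LLVM; values are represented as plain strings, and the vector incoming stands in for IncomingVals.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical toy IR, not LLVM's classes. An instruction is either a load
// from a slot (producing `name`) or a store of the value `name` into a slot.
struct ToyInst {
  enum Kind { Load, Store } kind;
  unsigned slot;    // index of the promoted alloca, like AllocaNo
  std::string name; // Load: result name; Store: name of the stored value
};

// Walks one block. incoming[slot] is the current def of that slot, in the
// same spirit as IncomingVals. Returns, for every load, the value that
// replaces it (the effect of replaceAllUsesWith in the real pass).
std::map<std::string, std::string>
renameBlock(const std::vector<ToyInst> &block,
            std::vector<std::string> &incoming) {
  std::map<std::string, std::string> replaced;
  for (const ToyInst &inst : block) {
    assert(inst.slot < incoming.size() && "slot must be a promoted alloca");
    if (inst.kind == ToyInst::Load)
      replaced[inst.name] = incoming[inst.slot]; // load uses the current def
    else
      incoming[inst.slot] = inst.name; // store becomes the current def
  }
  return replaced;
}
```

In this sketch nothing is erased from the block; the returned map only records which value each load would be replaced with, and incoming holds the defs that would be passed on to the successors.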

Filling in the ϕ-nodes

void PromoteMem2Reg::RenamePass(BasicBlock *BB, BasicBlock *Pred,
                                RenamePassData::ValVector &IncomingVals,
                                RenamePassData::LocationVector &IncomingLocs,
                                std::vector<RenamePassData> &Worklist) {
NextIteration:
  // If we are inserting any phi nodes into this BB, they will already be in the
  // block.

  // Part 1: fill in the phi nodes
  if (PHINode *APN = dyn_cast<PHINode>(BB->begin())) {
    // If we have PHI nodes to update, compute the number of edges from Pred to
    // BB.
    if (PhiToAllocaMap.count(APN)) {
      // We want to be able to distinguish between PHI nodes being inserted by
      // this invocation of mem2reg from those phi nodes that already existed in
      // the IR before mem2reg was run. We determine that APN is being inserted
      // because it is missing incoming edges. All other PHI nodes being
      // inserted by this pass of mem2reg will have the same number of incoming
      // operands so far. Remember this count.
      unsigned NewPHINumOperands = APN->getNumOperands();

      unsigned NumEdges = std::count(succ_begin(Pred), succ_end(Pred), BB);

      // Add entries for all the phis.
      BasicBlock::iterator PNI = BB->begin();
      do {
        unsigned AllocaNo = PhiToAllocaMap[APN];

        // Update the location of the phi node.
        updateForIncomingValueLocation(APN, IncomingLocs[AllocaNo],
                                       APN->getNumIncomingValues() > 0);

        // Add N incoming values to the PHI node.
        for (unsigned i = 0; i != NumEdges; ++i)
          APN->addIncoming(IncomingVals[AllocaNo], Pred);

        // The currently active variable for this block is now the PHI.
        IncomingVals[AllocaNo] = APN;

        // Get the next phi node.
        ++PNI;
        APN = dyn_cast<PHINode>(PNI);
        if (!APN)
          break;
      } while (APN->getNumOperands() == NewPHINumOperands);
    }
  }

  // Part 2: process load and store instructions
  // Part 3: update the iteration state
}

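When the traversal reaches a ϕ-node, it must have arrived from a predecessor. The IncomingVals array holds the defs passed down from that predecessor, and each one is added to the corresponding ϕ-node as an operand in the form of a <def, pred> pair. The ϕ-node itself then becomes the current def for the rest of the block. The do{}while() form is used because a block usually contains several ϕ-nodes, although our example has only one.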
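
The ϕ-filling step can be sketched on a toy model as follows. ToyPhi and fillPhis are hypothetical names made up for this illustration, values are plain strings, and the name %r.0 used for the ϕ-node is an assumption. As in the code above, one <def, pred> pair is added per edge from the predecessor, since a predecessor may reach the block along more than one edge (for example, a switch with two cases that target the same block).

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy phi node, not LLVM's PHINode.
struct ToyPhi {
  std::string name; // name of the phi's result
  unsigned slot;    // which promoted alloca it belongs to
  std::vector<std::pair<std::string, std::string>> incoming; // <def, pred>
};

// Called when `block` is reached from `pred`. predSuccs lists the successors
// of `pred`, so counting `block` in it gives the number of edges pred->block.
void fillPhis(std::vector<ToyPhi> &phis, const std::string &block,
              const std::string &pred,
              const std::vector<std::string> &predSuccs,
              std::vector<std::string> &incomingVals) {
  auto numEdges = std::count(predSuccs.begin(), predSuccs.end(), block);
  for (ToyPhi &phi : phis) {
    assert(phi.slot < incomingVals.size() && "phi must map to a known slot");
    // Add one <def, pred> pair per edge.
    for (auto i = numEdges; i > 0; --i)
      phi.incoming.emplace_back(incomingVals[phi.slot], pred);
    // From here on, the phi is the current def of the slot.
    incomingVals[phi.slot] = phi.name;
  }
}
```

For the foo example, calling fillPhis once on arrival from if.then and once on arrival from if.else would leave the single ϕ-node in if.end with two pairs, one per predecessor.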

The whole iteration

// Recursively traverse the CFG of the function, renaming loads and
// stores to the allocas which we are promoting.
//
// IncomingVals indicates what value each Alloca contains on exit from the
// predecessor block Pred.
void PromoteMem2Reg::RenamePass(BasicBlock *BB, BasicBlock *Pred,
                                RenamePassData::ValVector &IncomingVals,
                                RenamePassData::LocationVector &IncomingLocs,
                                std::vector<RenamePassData> &Worklist) {
NextIteration:
  // If we are inserting any phi nodes into this BB, they will already be in the
  // block.

  // Part 1: fill in the phi nodes
  // Part 2: process load and store instructions
  // Part 3: update the iteration state

  // 'Recurse' to our successors.
  succ_iterator I = succ_begin(BB), E = succ_end(BB);
  if (I == E)
    return;

  // Keep track of the successors so we don't visit the same successor twice
  SmallPtrSet<BasicBlock *, 8> VisitedSuccs;

  // Handle the first successor without using the worklist.
  VisitedSuccs.insert(*I);
  Pred = BB;
  BB = *I;

  for (; I != E; ++I)
    if (VisitedSuccs.insert(*I).second)
      Worklist.emplace_back(*I, Pred, IncomingVals, IncomingLocs);

  goto NextIteration;
}

RenamePass() is driven by an outer do{}while() loop that processes the entries in Worklist. For our example code, the whole process is shown in the figure below:
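
The traversal order can also be reproduced with a small sketch. Everything here (Arrival, WorkItem, Graph, renameWalk) is a hypothetical toy written for this illustration, tracking a single promoted variable with string values. It assumes the worklist is consumed from the back and that the first successor is followed directly while the others are queued with a snapshot of the current def.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy types, not LLVM classes.
struct Arrival {
  std::string block, pred, value; // def of the single slot on arrival
};
struct WorkItem {
  std::string block, pred, value; // value is a snapshot of the current def
};
using Graph = std::map<std::string, std::vector<std::string>>;

// `stores` maps a block to the value it stores into the slot, if any.
std::vector<Arrival> renameWalk(const Graph &succs,
                                const std::map<std::string, std::string> &stores,
                                const std::string &entry) {
  assert(succs.count(entry) && "entry block must be in the graph");
  std::vector<Arrival> arrivals;
  std::set<std::string> visited;
  std::vector<WorkItem> worklist;
  worklist.push_back({entry, "", "undef"});
  while (!worklist.empty()) {
    WorkItem item = std::move(worklist.back());
    worklist.pop_back();
    std::string bb = item.block, pred = item.pred, value = item.value;
    for (;;) { // plays the role of `goto NextIteration`
      // Phi filling would happen here, on every arrival, visited or not.
      arrivals.push_back({bb, pred, value});
      if (!visited.insert(bb).second)
        break; // don't revisit blocks
      auto st = stores.find(bb);
      if (st != stores.end())
        value = st->second; // a store updates the current def
      auto it = succs.find(bb);
      if (it == succs.end() || it->second.empty())
        break;
      const std::vector<std::string> &next = it->second;
      std::set<std::string> seen{next.front()};
      for (std::size_t i = 1; i < next.size(); ++i)
        if (seen.insert(next[i]).second)
          worklist.push_back({next[i], bb, value}); // queue with a snapshot
      pred = bb;
      bb = next.front(); // follow the first successor directly
    }
  }
  return arrivals;
}
```

On the CFG of foo (entry branching to if.then and if.else, both jumping to if.end), this sketch reaches if.end twice: once from if.then carrying the value stored there, and once from if.else carrying the other value. Those two arrivals correspond to the two <def, pred> pairs of the ϕ-node. The snapshot queued for if.else is taken in entry, so it is unaffected by the store in if.then.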

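Cleanup

The final cleanup is simple and consists of the following steps:

  • Delete the alloca instructions
  • Merge ϕ-nodes whose incoming values are all the same
  • Complete the ϕ-nodes that are missing incoming values because of unreachable basic blocks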
void PromoteMem2Reg::run() {
  // Cleanup

  // Remove the allocas themselves from the function
  for (Instruction *A : Allocas) {
    // If there are any uses of the alloca instructions left, they must be in
    // unreachable basic blocks that were not processed by walking the dominator
    // tree. Just delete the users now.
    if (!A->use_empty())
      A->replaceAllUsesWith(UndefValue::get(A->getType()));
    A->eraseFromParent();
  }

  // Loop over all of the PHI nodes and see if there are any that we can get
  // rid of because they merge all of the same incoming values. This can
  // happen due to undef values coming into the PHI nodes. This process is
  // iterative, because eliminating one PHI node can cause others to be removed.
  // ...

  // At this point, the renamer has added entries to PHI nodes for all reachable
  // code. Unfortunately, there may be unreachable blocks which the renamer
  // hasn't traversed. If this is the case, the PHI nodes may not
  // have incoming values for all predecessors. Look over all PHI nodes we have
  // created, inserting undef values if they are missing any incoming values.
  // ...
}
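
The second cleanup step, removing ϕ-nodes that merge a single value, can be sketched roughly as below. mergesSingleValue is a hypothetical helper written for this illustration, with values as plain strings; it assumes that undef entries and references to the ϕ-node itself can be ignored. The real pass applies further checks that are not modeled here.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper, not an LLVM API. Returns true if the toy phi named
// phiName merges only one distinct value, and writes that value to `common`.
// Entries equal to "undef" or to the phi itself are skipped; if nothing else
// remains, the phi is treated as merging "undef".
bool mergesSingleValue(const std::string &phiName,
                       const std::vector<std::string> &incoming,
                       std::string &common) {
  assert(!incoming.empty() && "a phi needs at least one incoming value");
  std::string found;
  for (const std::string &v : incoming) {
    if (v == "undef" || v == phiName)
      continue;
    if (found.empty())
      found = v;
    else if (found != v)
      return false; // two different values: the phi is really needed
  }
  common = found.empty() ? "undef" : found;
  return true;
}
```

A ϕ-node for which this returns true could have its uses replaced by common and then be deleted. Because that replacement may turn another ϕ-node into a single-value merge, the check has to be repeated until nothing changes, which matches the "iterative" remark in the comment above.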

At this point the whole process is complete, and the pass sets its status variable LocalChanged to true. Since we ran the command opt -mem2reg test.ll -o test.bc, a BitcodeWriterPass runs afterwards.
