Before diving into the inference source code, let's review the overall structure of MNN.

Inference is split into three major parts:

  • Engine
  • Backends
  • Runtime Optimize

So the question is: where do we start, and how? My experience is that you should not dive head-first into the source code, or you will quickly get lost.

Two principles should be followed:

  1. From the outside in: start from the external SDK interfaces and follow the thread inward.
  2. From the whole to the parts: first get an overall picture, for example by reading the official documentation to learn the overall framework, which modules exist, and what each of them does.

I recommend building, installing and running the Android demo that ships with MNN as described earlier, and briefly walking through the calling flow. Here we start the analysis from the outside in.

Running inference with the MNN SDK on Android

The following content is based on the MNN reference documentation.

MNN has two core concepts: the interpreter (Interpreter) and the session (Session).

The interpreter loads the model and holds the model data.

A session is created by an interpreter and holds the data of one inference. Multiple inferences can share the same model data; in other words, multiple sessions can share one interpreter.
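A quick sketch of this relationship (the model file name is just a placeholder):

    // one interpreter holds the model data; several sessions can be created from it
    auto interpreter = MNN::Interpreter::createFromFile("model.mnn");
    MNN::ScheduleConfig config;
    config.type = MNN_FORWARD_CPU;
    auto sessionA = interpreter->createSession(config); // both sessions share
    auto sessionB = interpreter->createSession(config); // the same model data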

Session

To create a session you first need an interpreter, and the interpreter is what loads the model file, so naturally there is an API like this:

Create an Interpreter from a given MNN model file:

    /**
     * @brief create net from file.
     * @param file  given file.
     * @return created net if success, NULL otherwise.
     */
    static Interpreter* createFromFile(const char* file);

The interpreter provides Interpreter::createSession to create a session; a ScheduleConfig is passed in to configure it.

    /**
     * @brief create session with schedule config. created session will be managed in net.
     * @param config session schedule config.
     * @return created session if success, NULL otherwise.
     */
    Session* createSession(const ScheduleConfig& config);

Session schedule configuration (ScheduleConfig)

    /** session schedule config */
    struct ScheduleConfig {
        /** which tensor should be kept */
        std::vector<std::string> saveTensors;
        /** forward type: CPU or GPU */
        MNNForwardType type = MNN_FORWARD_CPU;
        /** number of threads in parallel */
        int numThread = 4;

        /** subpath to run */
        struct Path {
            std::vector<std::string> inputs;
            std::vector<std::string> outputs;

            enum Mode {
                /**
                 * Op Mode
                 * - inputs means the source op, can NOT be empty.
                 * - outputs means the sink op, can be empty.
                 * The path will start from source op, then flow when encounter the sink op.
                 * The sink op will not be compute in this path.
                 */
                Op = 0,

                /**
                 * Tensor Mode (NOT supported yet)
                 * - inputs means the inputs tensors, can NOT be empty.
                 * - outputs means the outputs tensors, can NOT be empty.
                 * It will find the pipeline that compute outputs from inputs.
                 */
                Tensor = 1
            };

            /** running mode */
            Mode mode = Op;
        };
        Path path;

        /** backup backend used to create execution when designated backend do NOT support any op */
        MNNForwardType backupType = MNN_FORWARD_CPU;

        /** extra backend config */
        BackendConfig* backendConfig = nullptr;
    };

During inference, the main backend used for computation is specified by type; the default is CPU. The backup backend specified by backupType is used when the chosen main backend does not support some operator in the model.

The inference path refers to the operators involved in computing the outputs from the inputs. If it is not specified, it is deduced automatically from the model structure. To save memory, MNN reuses tensor memory (except for output tensors), so if you need to keep the result of an intermediate tensor, pass its name in saveTensors to prevent its memory from being reused.
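For example, a minimal configuration sketch (the backend choice and the tensor name "conv1_out" are illustrative):

    MNN::ScheduleConfig config;
    config.type       = MNN_FORWARD_OPENCL;        // main backend
    config.backupType = MNN_FORWARD_CPU;           // used when the main backend lacks an op
    config.numThread  = 4;
    config.saveTensors.push_back("conv1_out");     // keep this intermediate tensor's result
    auto session = interpreter->createSession(config);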

When running inference, numThread can be used to change the number of threads, but the actual number depends on the deployment environment:

• On iOS, GCD is used and the configuration is ignored;

• When MNN_USE_THREAD_POOL is enabled, the thread count depends on the number configured the first time;

• With OpenMP, the thread count is set globally, and the actual count depends on the number configured most recently;

In addition, backend parameters can be set through backendConfig; see below.

Backend Configuration

The definition of BackendConfig:

    struct BackendConfig {
        enum MemoryMode {
            Memory_Normal = 0,
            Memory_High,
            Memory_Low
        };
        MemoryMode memory = Memory_Normal;

        enum PowerMode {
            Power_Normal = 0,
            Power_High,
            Power_Low
        };
        PowerMode power = Power_Normal;

        enum PrecisionMode {
            Precision_Normal = 0,
            Precision_High,
            Precision_Low
        };
        PrecisionMode precision = Precision_Normal;

        /** user defined context */
        void* sharedContext = nullptr;
    };

MemoryMode, PowerMode and PrecisionMode configure the memory footprint, power consumption and precision, respectively.
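A minimal sketch of setting these options (the chosen values are illustrative):

    MNN::BackendConfig backendConfig;
    backendConfig.memory    = MNN::BackendConfig::Memory_Low;
    backendConfig.power     = MNN::BackendConfig::Power_Low;
    backendConfig.precision = MNN::BackendConfig::Precision_Low;

    MNN::ScheduleConfig config;
    config.backendConfig = &backendConfig;  // only the pointer is stored, keep backendConfig alive
    auto session = interpreter->createSession(config);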

Input Data

Get the input tensor

There are two interfaces: getSessionInput returns one input tensor; when the model has multiple input tensors, use the second interface, getSessionInputAll.

    /**
     * @brief get input tensor for given name.
     * @param session   given session.
     * @param name      given name. if NULL, return first input.
     * @return tensor if found, NULL otherwise.
     */
    Tensor* getSessionInput(const Session* session, const char* name);

    /**
     * @brief get all input tensors.
     * @param session   given session.
     * @return all input tensors mapped with name.
     */
    const std::map<std::string, Tensor*>& getSessionInputAll(const Session* session) const;

Fill Data

    auto inputTensor = interpreter->getSessionInput(session, NULL);
    inputTensor->host<float>()[0] = 1.f;

When the backend is CPU, you can write directly into the input tensor's host pointer.

Copy Data

For non-CPU backends, refer to the following example:

    auto inputTensor = interpreter->getSessionInput(session, NULL);
    auto nchwTensor = new Tensor(inputTensor, Tensor::CAFFE);
    // nchwTensor->host<float>()[x] = ...
    inputTensor->copyFromHostTensor(nchwTensor);
    delete nchwTensor;

Image Processing

    struct Config {
        Filter filterType = NEAREST;
        ImageFormat sourceFormat = RGBA;
        ImageFormat destFormat = RGBA;
        // Only valid if the dest type is float
        float mean[4]   = {0.0f, 0.0f, 0.0f, 0.0f};
        float normal[4] = {1.0f, 1.0f, 1.0f, 1.0f};
    };

CV::ImageProcess::Config

  • Specify input and output formats by sourceFormat and destFormat, currently supports RGBA, RGB, BGR, GRAY, BGRA and YUV_NV21
  • Specify the type of interpolation by filterType, currently supports NEAREST, BILINEAR and BICUBIC
  • Specify the mean normalization by mean and normal, but the setting is ignored when the data type is not a floating point type (see the usage sketch below)
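A rough usage sketch (the pixel buffer, image size and normalization constants below are made-up placeholders; check the exact ImageProcess API against the headers of your MNN version):

    MNN::CV::ImageProcess::Config config;
    config.sourceFormat = MNN::CV::RGBA;      // layout of the raw pixel buffer
    config.destFormat   = MNN::CV::BGR;       // layout the model expects
    config.filterType   = MNN::CV::BILINEAR;
    float mean[3]   = {103.94f, 116.78f, 123.68f};           // illustrative values
    float normal[3] = {1.0f / 58.0f, 1.0f / 58.0f, 1.0f / 58.0f};
    ::memcpy(config.mean, mean, sizeof(mean));
    ::memcpy(config.normal, normal, sizeof(normal));

    std::shared_ptr<MNN::CV::ImageProcess> process(MNN::CV::ImageProcess::create(config));
    // pixels: pointer to an iw x ih RGBA image; inputTensor: the session's input tensor
    process->convert(pixels, iw, ih, 0, inputTensor);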

Run Session

Call the interpreter's runSession method:

    /**
     * @brief run session.
     * @param session   given session.
     * @return result of running.
     */
    ErrorCode runSession(Session* session) const;

Run Session with Callbacks

    typedef std::function<bool(const std::vector<Tensor*>&, const std::string& /*opName*/)> TensorCallBack;

    /**
     * @brief run session.
     * @param session   given session.
     * @param before    callback before each op. return true to run the op; return false to skip the op.
     * @param after     callback after each op. return true to continue running; return false to interrupt the session.
     * @param sync      synchronously wait for finish of execution or not.
     * @return result of running.
     */
    ErrorCode runSessionWithCallBack(const Session* session, const TensorCallBack& before,
                                     const TensorCallBack& end, bool sync = false) const;

Compared to runSession, runSessionWithCallBack additionally provides the following (see the sketch after this list):

  • Callbacks before each op, which could be used to skip the execution;
  • Callback after each op, which could be used to interrupt the inference;
  • Synchronization option, defaults off; when enabled, all backends will wait for inference to complete, ie the function time cost is equal to the inference time cost;
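A minimal sketch of the callback form (the logging is illustrative):

    MNN::TensorCallBack before = [](const std::vector<MNN::Tensor*>& tensors, const std::string& opName) {
        MNN_PRINT("enter op %s\n", opName.c_str());
        return true;   // returning false would skip this op
    };
    MNN::TensorCallBack after = [](const std::vector<MNN::Tensor*>& tensors, const std::string& opName) {
        return true;   // returning false would interrupt the session here
    };
    interpreter->runSessionWithCallBack(session, before, after);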

Get Output

Get Output Tensor

    /**
     * @brief get output tensor for given name.
     * @param session   given session.
     * @param name      given name. if NULL, return first output.
     * @return tensor if found, NULL otherwise.
     */
    Tensor* getSessionOutput(const Session* session, const char* name);

    /**
     * @brief get all output tensors.
     * @param session   given session.
     * @return all output tensors mapped with name.
     */
    const std::map<std::string, Tensor*>& getSessionOutputAll(const Session* session) const;

Interpreter provides two ways to get the output Tensor: getSessionOutput for getting a single output tensor and getSessionOutputAll for getting the output tensor map.

When there is only one output tensor, you can pass NULL to get the tensor when calling getSessionOutput.

Read Data

    auto outputTensor = interpreter->getSessionOutput(session, NULL);
    auto score = outputTensor->host<float>()[0];
    auto index = outputTensor->host<float>()[1];
    // ...

The simplest way to read data is to read from the tensor's host pointer directly, but this only works for the CPU backend; on other backends the data lives behind deviceId. Users also have to handle the differences between the NC4HW4 and NHWC data layouts themselves.

For non-CPU backends, or users who are not familiar with data layout, copy data interfaces should be used.

Copy Data

NCHW example:

    auto outputTensor = interpreter->getSessionOutput(session, NULL);
    auto nchwTensor = new Tensor(outputTensor, Tensor::CAFFE);
    outputTensor->copyToHostTensor(nchwTensor);
    auto score = nchwTensor->host<float>()[0];
    auto index = nchwTensor->host<float>()[1];
    // ...
    delete nchwTensor;

NC4HW4 example:

    auto outputTensor = interpreter->getSessionOutput(session, NULL);
    auto nc4hw4Tensor = new Tensor(outputTensor, Tensor::CAFFE_C4);
    outputTensor->copyToHostTensor(nc4hw4Tensor);
    auto score = nc4hw4Tensor->host<float>()[0];
    auto index = nc4hw4Tensor->host<float>()[1];
    // ...
    delete nc4hw4Tensor;

NHWC example:

    auto outputTensor = interpreter->getSessionOutput(session, NULL);
    auto nhwcTensor = new Tensor(outputTensor, Tensor::TENSORFLOW);
    outputTensor->copyToHostTensor(nhwcTensor);
    auto score = nhwcTensor->host<float>()[0];
    auto index = nhwcTensor->host<float>()[1];
    // ...
    delete nhwcTensor;

When copying data this way, the only thing you need to pay attention to is the data layout of the tensor created with new; copyToHostTensor takes care of the layout conversion (if needed) and the copy between backends (if needed).

Analyzing the Interpreter execution flow

From the SDK usage above, the central object is Interpreter, so open Interpreter.hpp and look for its main methods:

    class MNN_PUBLIC Interpreter {
    public:
        static Interpreter* createFromFile(const char* file);
        Session* createSession(const ScheduleConfig& config);
        bool releaseSession(Session* session);
        ErrorCode runSession(Session* session) const;
        ErrorCode runSessionWithCallBack(const Session* session, const TensorCallBack& before,
                                         const TensorCallBack& end, bool sync = false) const;
        Tensor* getSessionInput(const Session* session, const char* name);
        Tensor* getSessionOutput(const Session* session, const char* name);
        const std::map<std::string, Tensor*>& getSessionOutputAll(const Session* session) const;
        const std::map<std::string, Tensor*>& getSessionInputAll(const Session* session) const;

    private:
        static Interpreter* createFromBufferInternal(Content* net);

        Content* mNet = nullptr;
        Interpreter(Content* net);
    };

Open Interpreter.cpp to see the implementation.

Creating the interpreter

Let's focus on the static method createFromFile:

    Interpreter* Interpreter::createFromFile(const char* file) {
        // check arguments
        if (nullptr == file) {
            MNN_PRINT("NULL file for create interpreter");
            return nullptr;
        }
        // read the model file with FileLoader
        std::unique_ptr<FileLoader> loader(new FileLoader(file));
        if (!loader->valid()) {
            MNN_PRINT("Create interpreter failed, open %s error\n", file);
            return nullptr;
        }
        bool result = loader->read();
        if (!result) {
            MNN_PRINT("Read file error\n");
            return nullptr;
        }
        if (loader->size() == 0) {
            MNN_PRINT("Create interpreter failed, %s is empty\n", file);
            return nullptr;
        }
        auto net     = new Content; // load the file content into a Content
        bool success = loader->merge(net->buffer);
        if (!success) {
            return nullptr;
        }
        loader.reset();
        return createFromBufferInternal(net);  // create the interpreter
    }

    struct Content {
        AutoStorage<uint8_t> buffer;                        // raw data
        const Net* net = nullptr;                           // MNN Net object
        std::vector<std::unique_ptr<Session>> sessions;     // session collection
        std::map<const Tensor*, const Session*> tensorMap;  // map from tensor to session
    };

Next, createFromBufferInternal:

    Interpreter* Interpreter::createFromBufferInternal(Content* net) {
        if (nullptr == net) {
            MNN_PRINT("Buffer is null for create interpreter\n");
            return nullptr;
        }
        // deserialize with flatbuffers; first verify that the buffer matches the schema
        flatbuffers::Verifier verify((const uint8_t*)(net->buffer.get()), net->buffer.size());
        if (false == VerifyNetBuffer(verify)) {
            MNN_PRINT("Invalidate buffer to create interpreter\n");
            delete net;
            return nullptr;
        }
        net->net = GetNet(net->buffer.get());  // deserialize
        if (nullptr == net->net->oplists()) {
            MNN_ERROR("Model has no oplist\n");
            delete net;
            return nullptr;
        }
        return new Interpreter(net); // construct the interpreter
    }

The Interpreter constructor simply stores the Content:

    Interpreter::Interpreter(Content* net) {
        mNet = net;
    }

GetNet uses flatbuffers to get the Net root:

    inline const MNN::Net *GetNet(const void *buf) {
        return flatbuffers::GetRoot<MNN::Net>(buf);
    }

To sum up the interpreter-creation flow: createFromFile loads the model file into a Content buffer through FileLoader, createFromBufferInternal verifies and deserializes it with flatbuffers, and the Interpreter constructor takes ownership of the Content.

Creating a Session

The source of the session-creation function is shown below; there are two steps:

  • build the schedule info (what a schedule is will be analyzed next);
  • construct the Session object.
    Session* Interpreter::createMultiPathSession(const std::vector<ScheduleConfig>& configs) {
        if (nullptr == mNet->buffer.get()) {
            MNN_ERROR("The model buffer has been released. Can't create session\n");
            return nullptr;
        }
        auto info       = Schedule::schedule(mNet->net, configs);       // build the schedule info
        auto newSession = std::unique_ptr<Session>(new Session(info));  // create the session
        if (!newSession->valid()) {
            MNN_PRINT("Invalide Session!!\n");
            return nullptr;
        }
        auto result = newSession.get();
        if (info.validForResize) {
            result->resize(); // resize: the preparation work for inference
        }
        mNet->sessions.emplace_back(std::move(newSession));
        return result;
    }

    Session* Interpreter::createSession(const ScheduleConfig& config) {
        // a single config makes up the configs list
        return createMultiPathSession({config});
    }

The most complex part is building the schedule info. The return value is a ScheduleInfo, so first understand what it contains:

    /** schedule info */
    struct ScheduleInfo {
        /** pipelines with backend info: the ops involved in the computation, in order */
        std::vector<std::pair<Backend::Info, std::vector<PipelineInfo>>> pipelineInfo;
        /** input tensors map */
        std::map<std::string, Tensor*> inputTensors;
        /** output tensors map */
        std::map<std::string, Tensor*> outputTensor;
        /** all tensors map */
        std::vector<std::pair<int, std::shared_ptr<Tensor>>> allTensors;
        /** input valid for resize */
        bool validForResize;
    };

Next, the schedule function itself:

    Schedule::ScheduleInfo Schedule::schedule(const Net* net, const std::vector<ScheduleConfig>& configs) {
        std::vector<std::shared_ptr<Tensor>> allTensors;

        ScheduleInfo schedule;
        if (nullptr == net->oplists()) {
            MNN_PRINT("Error net for schedule\n");
            return schedule;
        }
        bool valid = _setUpTensorInfo(allTensors, net);  // initialize tensor info
        schedule.validForResize = valid;

        // build the pipelines and their backend info
        std::vector<std::pair<Backend::Info, std::vector<PipelineInfo>>> result;
        for (auto& config : configs) {
            Backend::Info compute;
            compute.type      = _getApprociateType(config, net, allTensors, valid);
            compute.numThread = config.numThread;
            compute.user      = config.backendConfig;
            auto oplists      = _scheduleUnit(net, config, allTensors);
            result.emplace_back(std::make_pair(compute, std::move(oplists)));
        }
        schedule.pipelineInfo = std::move(result);

        // get all used op's output, drop unused op, won't change op order. always insert all Input Ops
        // i.e. drop the ops that do not need to be computed
        std::set<const Op*> oplists;
        {
            for (std::pair<Backend::Info, vector<PipelineInfo>>& pipeline : schedule.pipelineInfo) {
                for (auto& info : pipeline.second) {
                    oplists.insert(info.op);
                }
            }
        }
        // collect the input tensors and output tensors of all ops
        std::set<int> outputIndexes;
        std::set<int> inputIndexes;
        for (auto op : oplists) {
            if (nullptr != op->outputIndexes()) {
                auto data = op->outputIndexes()->data();
                for (int j = 0; j < op->outputIndexes()->size(); ++j) {
                    outputIndexes.insert(data[j]);
                }
            }
            if (nullptr != op->inputIndexes()) {
                auto data = op->inputIndexes()->data();
                for (int j = 0; j < op->inputIndexes()->size(); ++j) {
                    inputIndexes.insert(data[j]);
                }
            }
            MNN_ASSERT(OpType_Input != op->type());
        }
        // Get All Output and Input: deduplicate between the input and output sets
        std::set<int> inputIndexDiff;
        std::set<int> outputIndexesDiff;
        // if an op's output tensor is not another op's input, it is an output of the whole pipeline
        std::set_difference(outputIndexes.begin(), outputIndexes.end(), inputIndexes.begin(), inputIndexes.end(),
                            std::inserter(outputIndexesDiff, outputIndexesDiff.begin()));
        // if an op's input tensor is not produced by another op, it is an input of the whole pipeline
        std::set_difference(inputIndexes.begin(), inputIndexes.end(), outputIndexes.begin(), outputIndexes.end(),
                            std::inserter(inputIndexDiff, inputIndexDiff.begin()));
        std::unordered_map<std::string, int> tensorNameIndexMap;
        for (int i = 0; i < net->tensorName()->size(); ++i) {
            tensorNameIndexMap[net->tensorName()->Get(i)->str()] = i;
        }
        for (auto& config : configs) {
            // saveTensors is empty by default; if the caller wants intermediate results, the names are passed in here
            for (const auto& name : config.saveTensors) {
                // saveTensors are also treated as output tensors
                if (tensorNameIndexMap.count(name)) {
                    outputIndexesDiff.insert(tensorNameIndexMap[name]);
                } else {
                    MNN_PRINT("Bad outputname: %s\n", name.c_str());
                }
            }
        }
        if (net->outputName()) {  // add the model's own output tensors
            for (int i = 0; i < net->outputName()->size(); ++i) {
                std::string name = net->outputName()->Get(i)->str();
                if (tensorNameIndexMap.count(name)) {
                    outputIndexesDiff.insert(tensorNameIndexMap[name]);
                }
            }
        }
        for (auto index : inputIndexDiff) {  // the final input tensors
            schedule.inputTensors.insert(
                std::make_pair(net->tensorName()->GetAsString(index)->c_str(), allTensors[index].get()));
            TensorUtils::getDescribe(allTensors[index].get())->usage = TensorUsage::INPUT;
        }
        for (auto index : outputIndexesDiff) { // the final output tensors
            schedule.outputTensor.insert(
                std::make_pair(net->tensorName()->GetAsString(index)->c_str(), allTensors[index].get()));
        }
        for (auto& t : allTensors) {  // all tensors
            schedule.allTensors.emplace_back(std::make_pair(0, std::move(t)));
        }
        for (int i = 0; i < net->oplists()->size(); ++i) {
            auto op = net->oplists()->GetAs<Op>(i);
            if (nullptr != op->inputIndexes()) {
                auto data = op->inputIndexes()->data();
                for (int j = 0; j < op->inputIndexes()->size(); ++j) {
                    auto index = data[j];
                    schedule.allTensors[index].first += 1; // how many ops reference this tensor as input
                }
            }
        }
        for (auto outputIndex : outputIndexesDiff) {
            TensorUtils::getDescribe(schedule.allTensors[outputIndex].second.get())->usage = TensorUsage::OUTPUT;
            schedule.allTensors[outputIndex].first += 1; // also referenced as an output tensor
        }
        return schedule;
    }

Since schedule is fairly complex and its data structures matter for the later computation, let's break it down and analyze each step.

1. Initialize tensor info

The corresponding function is _setUpTensorInfo; its main job is to allocate the tensors and to set the data sizes for the input layers.

    static bool _setUpTensorInfo(std::vector<std::shared_ptr<Tensor>>& allTensors, const Net* net) {
        bool valid = true;
        auto& tensors = allTensors;
        tensors.resize(net->tensorName()->size());
        for (int i = 0; i < tensors.size(); ++i) {  // create each tensor object and allocate it
            tensors[i].reset(new Tensor(4)); // NCHW, TODO
            tensors[i]->setType(DataType_DT_FLOAT);
        }
        // walk the ops, only handle Input nodes
        for (int opIndex = 0; opIndex < net->oplists()->size(); ++opIndex) {
            auto op = net->oplists()->GetAs<Op>(opIndex);
            if (OpType_Input == op->type()) {
                MNN_ASSERT(nullptr != op->outputIndexes());
                auto index      = op->outputIndexes()->data()[0];
                auto tensor     = tensors[index].get();
                auto& tb        = tensor->buffer();
                auto inputParam = op->main_as_Input();
                if (auto idims = inputParam->dims()) {
                    for (int i = 0; i < idims->size(); ++i) {
                        tb.dim[i].min = 0;
                        int extent    = idims->data()[i];
                        // dim-0 is batch (when input batch is -1, set it to be 1, ignore other dim)
                        if (i == 0 && extent == -1) { // if the batch size is unspecified, default to 1
                            extent = 1;
                        }
                        if (extent < 0) { // a negative size on this dimension means the model is broken
                            valid = false;
                        }
                        tb.dim[i].extent = extent;  // record the size of this dimension in the buffer
                    }
                    tb.dimensions = idims->size();
                } else {
                    tb.dimensions = 0;
                }
                tensor->setType(inputParam->dtype());
                TensorUtils::getDescribe(tensor)->dimensionFormat = inputParam->dformat(); // data layout, e.g. NCHW
            }
        }
        return valid;
    }

2. Build the PipelineInfo

The code in the schedule function that involves PipelineInfo is as follows:

    // build the pipelines and their Backend::Info
    std::vector<std::pair<Backend::Info, std::vector<PipelineInfo>>> result;
    for (auto& config : configs) {
        Backend::Info compute;
        compute.type      = _getApprociateType(config, net, allTensors, valid);  // decide the backend type
        compute.numThread = config.numThread;
        compute.user      = config.backendConfig;
        auto oplists      = _scheduleUnit(net, config, allTensors); // build the pipeline info list
        result.emplace_back(std::make_pair(compute, std::move(oplists)));
    }
    schedule.pipelineInfo = std::move(result);

The code above involves two types, Backend::Info and PipelineInfo. Let's look at their definitions first, then at the logic:

    struct Info {
        /** forward type: whether to run on CPU, GPU, etc. */
        MNNForwardType type = MNN_FORWARD_CPU;
        /** for CPU only. number of threads. */
        int numThread = 4;
        /** user data. */
        BackendConfig* user = NULL;
        enum Mode {
            // The Op will be run in execution->onExecute
            DIRECT = 0,
            // The Op will be recorded. Run in onExecuteBegin and Wait in onExecuteEnd
            INDIRECT = 1
        };
        Mode mode = DIRECT;
    };

    /** pipeline info: describes one compute node, i.e. an op plus its input and output tensors */
    struct PipelineInfo {
        /** op */
        const Op* op;
        /** input tensors */
        std::vector<Tensor*> inputs;
        /** output tensors */
        std::vector<Tensor*> outputs;
    };

The function that decides the backend type, _getApprociateType:

    static MNNForwardType _getApprociateType(const ScheduleConfig& config, const Net* net,
                                             const std::vector<std::shared_ptr<Tensor>>& allTensors,
                                             bool inputShapeValid) {
        MNNForwardType type = config.type;
        if (MNN_FORWARD_AUTO == config.type) {
            // Search Backend Exclude MNN_FORWARD_CPU
            for (int i = 1; i < MNN_FORWARD_ALL; ++i) {  // check which backend types have a registered creator
                if (MNNGetExtraBackendCreator((MNNForwardType)i) != nullptr) {
                    type = (MNNForwardType)i;
                    break;
                }
            }
        }
        // look up the backend creator for this type; backend creation itself is analyzed later
        auto creator = MNNGetExtraBackendCreator(type);
        if (nullptr == creator) {
            MNN_PRINT("Can't Find type=%d backend, use %d instead\n", type, config.backupType);
            type = config.backupType;
        }
        return type;
    }

The function that builds the pipeline info list, _scheduleUnit:

    static vector<Schedule::PipelineInfo> _scheduleUnit(const Net* net, const ScheduleConfig& configs,
                                                        const vector<shared_ptr<Tensor>>& allTensors) {
        vector<Schedule::PipelineInfo> oplists;
        vector<const Op*> ops;
        generateScheduleGraph(ops, net, configs, allTensors);  // find all ops that take part in the computation
        for (const Op* op : ops) {
            Schedule::PipelineInfo opInfo; // create one pipeline info per participating op
            opInfo.op = op;
            if (nullptr != op->outputIndexes()) { // outputs of this pipeline info
                auto data = op->outputIndexes()->data();
                for (int j = 0; j < op->outputIndexes()->size(); ++j) {
                    opInfo.outputs.push_back(allTensors[data[j]].get());
                }
            }
            if (nullptr != op->inputIndexes()) { // inputs of this pipeline info
                auto data = op->inputIndexes()->data();
                for (int j = 0; j < op->inputIndexes()->size(); ++j) {
                    opInfo.inputs.push_back(allTensors[data[j]].get());
                }
            }
            oplists.emplace_back(opInfo);
        }

        return oplists; // return the pipeline info list
    }

The generateScheduleGraph function inside it finds the path needed to compute the outputs: not every node in the graph has to be computed, so it collects only the nodes that the output nodes depend on, and only those nodes are computed.

    static void generateScheduleGraph(vector<const Op*>& ops, const Net* net, const ScheduleConfig& configs,
                                      const vector<shared_ptr<Tensor>>& allTensors) {
        if (configs.path.inputs.empty() && configs.path.outputs.empty()) {
            // no input/output specified: by default every op takes part in the computation
            // Use Default Linear schedule
            ops.clear();
            ops.reserve(net->oplists()->size());
            for (int i = 0; i < net->oplists()->size(); ++i) {
                auto op = net->oplists()->GetAs<Op>(i);
                if (op->type() != OpType_Input) {
                    ops.emplace_back(op);
                }
            }
            return;
        }
        // the caller specified input/output nodes, so find the paths that involve them
        vector<vector<Op*>> paths = generateSchedulePath(net, configs, allTensors);
        // build a new graph from the involved nodes
        unique_ptr<DirectedAcyclicGraph<Op*>> graph(new DirectedAcyclicGraph<Op*>());

        // add the nodes to the graph
        unordered_map<Op*, shared_ptr<Node<Op*>>> opMaps;
        for (vector<Op*> path : paths) {
            for (Op* op : path) {
                if (opMaps.find(op) == opMaps.end()) {
                    OpNodeDef def(op);
                    shared_ptr<Node<Op*>> n = graph->AddNode(def);
                    opMaps.insert(make_pair(op, n));
                }
            }
        }

        // add edges between the nodes in the graph
        for (vector<Op*> path : paths) {
            shared_ptr<Node<Op*>> pre = nullptr;
            for (Op* op : path) {
                shared_ptr<Node<Op*>> n = opMaps[op];
                if (nullptr == pre) {
                    pre = n;
                } else {
                    graph->AddEdge(pre, n);
                    pre = n;
                }
            }
        }
        ops.clear();
        vector<shared_ptr<Node<Op*>>> order;
        if (graph->GetPostOrder(order)) {  // topological sort so that dependencies come first
            for (shared_ptr<Node<Op*>> n : order) {
                ops.emplace_back(n->getData());
            }
        } else {
            MNN_PRINT("op graph have cycle,schedule failed\n");
        }
    }

3. Sort out the inputs/outputs of the ops in the PipelineInfo

Comments have already been added to the schedule function above; roughly, the steps are:

  • walk the ops in the PipelineInfo and find the pipeline's input and output tensors;
  • add the save tensors passed in by the caller to the output tensor set as well;
  • build a map from tensor name to tensor.

4. The Session constructor

The constructor simply takes the contents of the ScheduleInfo and creates the backends.

    Session::Session(const Schedule::ScheduleInfo& info) {
        if (info.pipelineInfo.empty()) {  // invalid arguments
            mValid = false;
            return;
        }

        mTensors = info.allTensors;
        for (auto& iter : info.pipelineInfo) { // take the backend info and create the backend instance
            if (mBackends.find(iter.first.type) == mBackends.end()) {
                auto newBn = BackendFactory::create(iter.first); // create the backend instance
                if (nullptr == newBn) {
                    mValid = false;
                    return;
                }
                mBackends[iter.first.type].reset(newBn);
            }
            auto backend    = mBackends.find(iter.first.type)->second.get();
            auto cpuBackend = _getDefaultBackend();
            // build a pipeline from the pipeline info and the backends
            std::shared_ptr<Pipeline> newPipeline(new Pipeline(iter.second, backend, cpuBackend));
            mPipelines.emplace_back(std::move(newPipeline));
        }
        mInputs  = info.inputTensors; // take the input info
        mOutputs = info.outputTensor; // take the output info
    }

The Pipeline definition and constructor follow. A Pipeline records the whole compute path from the input nodes to the output nodes; every node on the path is represented by a Unit object.

A Unit consists of an op plus its input and output tensors.

    /** pipeline. one session may contains multiple pipeline, and one pipeline may contains more than one unit. */
    class Pipeline : public NonCopyable {
    public:
        /**
         * @brief initialize with pipeline info, major backend and backup backend (usually CPU).
         * @param info      given pipeline info.
         * @param major     given major backend used to create execution.
         * @param backup    given backend backend if op is not supported by major backend.
         */
        Pipeline(const std::vector<Schedule::PipelineInfo>& info, Backend* major, Backend* backup);

    public:
        /**
         * @brief prepare all units.
         * @return result code.
         */
        ErrorCode prepare();
        /**
         * @brief execute all units.
         * @return result code.
         */
        ErrorCode execute();
        /**
         * @brief execute all units with callbacks.
         * @param before    callback before execute each op.
         * @param after     callback after execute each op.
         * @return result code.
         */
        ErrorCode executeCallBack(const TensorCallBackWithInfo& before, const TensorCallBackWithInfo& after);
        /**
         * @brief the Pipline need not prepare any more, release all cache used for resize.
         * @return errorcode
         */
        ErrorCode releaseCache();

        /** op unit in pipeline */
        class Unit : public NonCopyable, public OperatorInfo {
        public:
            /**
             * @brief initialize with given op and its in-out tensors.
             * @param op        given op.
             * @param inputs    execution input tensors.
             * @param outputs   execution output tensors.
             */
            Unit(const Op* op, const std::vector<Tensor*>& inputs, const std::vector<Tensor*>& outputs);

            /**
             * @brief prepare unit.
             * @return result code.
             */
            ErrorCode prepare(Backend* major, Backend* backup);
            /**
             * @brief execute unit.
             * @return result code.
             */
            ErrorCode execute();
            /**
             * @brief execute unit with callbacks.
             * @param before    callback before execute each op.
             * @param after     callback after execute each op.
             * @return result code.
             */
            ErrorCode executeCallBack(const TensorCallBackWithInfo& before, const TensorCallBackWithInfo& after);

        public:
            /** op execution */
            std::shared_ptr<Execution> mExecution;
            /** op type */
            OpType mType;
            /** input tensors */
            std::vector<Tensor*> mInputs;
            /** output tensors */
            std::vector<Tensor*> mOutputs;
            /** op */
            const Op* mOriginOp;

        private:
            bool _createExecution(Backend* bn, Backend* cpuBn);
            bool _allocTensors(Backend* bn, const std::vector<Tensor*>& tensors);

        private:
            bool mConst = false;
        };

    protected:
        /* Used for Unit Test */
        const std::vector<std::shared_ptr<Unit>>& getUnit() const {
            return this->mUnits;
        }

    private:
        Backend* mBackend;
        Backend* mBackupBackend;
        std::vector<std::shared_ptr<Unit>> mUnits;
    };

In the Pipeline constructor the main work is creating the array of Unit objects; this array represents the compute path.

    Pipeline::Pipeline(const std::vector<Schedule::PipelineInfo>& infos, Backend* backend, Backend* cpuBackend) {
        SizeComputerSuite::init();
        MNN_ASSERT(nullptr != backend);
        MNN_ASSERT(nullptr != cpuBackend);
        mBackupBackend = cpuBackend;
        mBackend       = backend;

        for (auto& info : infos) {
            std::shared_ptr<Unit> unit(new Unit(info.op, info.inputs, info.outputs));
            mUnits.emplace_back(unit);
        }
    }

The Unit constructor:

    Pipeline::Unit::Unit(const Op* op, const std::vector<Tensor*>& inputs, const std::vector<Tensor*>& outputs) {
        MNN_ASSERT(nullptr != op);
        mOriginOp = op;          // op
        mType     = op->type();  // op type
        mInputs   = inputs;      // inputs
        mOutputs  = outputs;     // outputs
        if (nullptr != op->name()) {
            mContent->name = op->name()->str();
        }
        auto typeStr = EnumNameOpType(mType);
        if (nullptr != typeStr) {
            mContent->type = typeStr;
        }
    }

5. The Session resize step

The code of Session::resize is below; at its core it 1) runs prepare on every unit of every pipeline and 2) prepares the memory.

    ErrorCode Session::resize() {
        _clearCache();
        for (auto& b : mBackends) { // clear the buffers
            b.second->onClearBuffer();
        }

        for (auto& iter : mPipelines) { // prepare each pipeline
            auto error = iter->prepare();
            if (NO_ERROR != error) {
                return error;
            }
        }
        mNeedResize = false;
        for (auto& b : mBackends) { // re-allocate the buffers
            b.second->onAllocateBuffer();
        }

        return NO_ERROR;
    }

What does Unit::prepare do?

  • Check that the unit's inputs are valid, then allocate memory for them;
  • Call computeOutputSize to compute the shapes of the input and output tensors;
  • Create the Execution for the chosen backend and allocate memory for the outputs.

    ErrorCode Pipeline::Unit::prepare(Backend* bn, Backend* cpuBn) {
        for (auto t : mInputs) {
            bool valid = true;
            for (int i = 0; i < t->dimensions(); ++i) {
                // no input dimension may be <= 0
                if (t->length(i) <= 0) {
                    valid = false;
                    break;
                }
            }
            if (!valid) {
                MNN_ERROR("The %s's input is not ready\n", mContent->name.c_str());
                return COMPUTE_SIZE_ERROR;
            }
        }
        { // allocate memory for the input tensors
            auto success = _allocTensors(bn, mInputs);
            if (!success) {
                return OUT_OF_MEMORY;
            }
        }
        // compute the shapes
        bool ready = SizeComputer::computeOutputSize(mOriginOp, mInputs, mOutputs);
        for (auto o : mOutputs) {
            if (o->size() <= 0) {
                ready = false;
            }
            if (o->dimensions() < 4 && TensorUtils::getDescribe(o)->dimensionFormat == MNN_DATA_FORMAT_NC4HW4) {
                for (auto index = o->dimensions(); index < 4; ++index) {
                    o->setLength(index, 1);
                }
            }
        }
        // estimate the flops required
        mContent->flops = SizeComputer::computeFlops(mOriginOp, mInputs, mOutputs);
        if (!ready) {
            return COMPUTE_SIZE_ERROR;
        }
        // Check const
        mConst = true;
        for (int i = 0; i < mInputs.size(); ++i) {
            if (SizeComputer::opNeedContent(mOriginOp->type(), i) &&
                (TensorUtils::getDescribe(mInputs[i])->usage != TensorUsage::CONST)) {
                mConst = false;
                break;
            }
        }
        if (mType == OpType_TrainableParam) {
            for (auto t : mOutputs) {
                TensorUtils::getDescribe(t)->usage = TensorUsage::TRAINABLE;
            }
            mConst = false;
        }

        if (mConst) {
            for (auto t : mOutputs) {
                TensorUtils::getDescribe(t)->usage = TensorUsage::CONST;
            }
            bn = cpuBn;
        }

        // create the Execution
        if (nullptr == mExecution) {
            auto sucess = _createExecution(bn, cpuBn);
            if (!sucess || mExecution == nullptr) {
                return NOT_SUPPORT;
            }
        }
        bn = mExecution->backend(); // the backend actually used
        { // allocate memory for the outputs
            auto success = _allocTensors(bn, mOutputs);
            if (!success) {
                return OUT_OF_MEMORY;
            }
        }
        // let the execution adjust its buffers to the new sizes
        auto code = mExecution->onResize(mInputs, mOutputs);
        // if the backend can't handle it, release the memory and recreate the execution on the CPU backend
        if (TENSOR_NOT_SUPPORT == code || TENSOR_NEED_DIVIDE == code) {
            // TODO
            mExecution.reset();
            for (auto t : mOutputs) {
                auto des = TensorUtils::getDescribe(t);
                des->backend->onReleaseBuffer(t, _getTensorReleaseStorageType(t));
                des->backend = nullptr;
            }
            auto sucess = _createExecution(cpuBn, cpuBn);
            MNN_ASSERT(NO_ERROR == sucess);
            auto success = _allocTensors(mExecution->backend(), mOutputs);
            if (!success) {
                return OUT_OF_MEMORY;
            }
            code = mExecution->onResize(mInputs, mOutputs);
        }
        if (NO_ERROR != code) {
            mExecution.reset();
            return code;
        }
        if (mConst) {
            code = mExecution->onExecute(mInputs, mOutputs);
        }
        for (auto t : mInputs) {
            auto des = TensorUtils::getDescribe(t);
            des->useCount -= 1;
            if (0 == des->useCount) {
                des->backend->onReleaseBuffer(t, _getTensorReleaseStorageType(t));
            }
        }
        return code;
    }

Now let's summarize the session creation process:

  • Constructing a Session takes a ScheduleInfo as its argument
  • The schedule function that builds the ScheduleInfo is fairly complex:
    • initialize all tensors
    • build the Backend::Info according to the config
    • if the config specifies inputs/outputs, only the related nodes are used to build the pipeline; otherwise the model's own input/output nodes are used
    • for every op involved in the pipeline, sort out its input and output tensors
  • Construct the Session object
  • Do the session's preparation work (resize)

Feeding input data into the Session

The input tensor

First, get the session's input tensor; this is just a lookup by name in a map. The construction of the input tensors was analyzed above.

    Tensor* Session::getInput(const char* name) const {
        MNN_ASSERT(!mInputs.empty());
        if (nullptr == name) {
            return mInputs.begin()->second;
        }
        auto iter = mInputs.find(name);
        if (iter == mInputs.end()) {
            MNN_PRINT("Error: can't find input: %s\n", name);
            return nullptr;
        }
        return iter->second;
    }

What a tensor is

A tensor simply represents a block of data in the neural network.

Software is, in essence, data plus algorithms. In a neural network the tensors are the data, and the ops and the graph they form are the algorithm.

Tensors come in two kinds: host tensors, whose data lives in main memory, and device tensors, whose storage is allocated by the backend.
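A rough sketch of the two kinds (the shape and dimension type below are illustrative):

    // a host tensor owns a buffer in main memory and is written through host<T>()
    auto hostTensor = MNN::Tensor::create<float>({1, 3, 224, 224}, nullptr, MNN::Tensor::CAFFE);
    hostTensor->host<float>()[0] = 1.0f;

    // a device tensor only describes shape and type; its storage is managed by the backend,
    // so data moves in and out via copyFromHostTensor / copyToHostTensor
    auto deviceTensor = MNN::Tensor::createDevice<float>({1, 3, 224, 224});

    delete deviceTensor;
    delete hostTensor;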

Filling data

For a host tensor, data can be written by direct assignment:

    auto inputTensor = interpreter->getSessionInput(session, NULL);
    inputTensor->host<float>()[0] = 1.f;

For non-host tensors, the data has to be copied:

    auto inputTensor = interpreter->getSessionInput(session, NULL);
    auto nchwTensor = new Tensor(inputTensor, Tensor::CAFFE);
    // nchwTensor->host<float>()[x] = ...
    inputTensor->copyFromHostTensor(nchwTensor);
    delete nchwTensor;

  1. copyFromHostTensor analysis
    The data is copied from the host tensor into the (possibly device-side) input tensor; the actual work is done by the backend's onCopyBuffer function. Backend is the concrete backend implementation, so let's keep digging.
  2. About Backend
    Backend is an abstract class that defines many interfaces without concrete implementations. Each interface is easy to understand from its name and the comments, so they are not explained one by one here; the concrete backend implementations will be covered later. A rough sketch of the interface, reconstructed from the calls seen in this post, follows below.
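Based only on the calls that appear in this article (onCreate, onAcquireBuffer/onReleaseBuffer, onClearBuffer/onAllocateBuffer, onCopyBuffer, onExecuteBegin/onExecuteEnd), the interface looks roughly like this simplified sketch; the exact signatures should be checked against Backend.hpp:

    // Simplified sketch of the Backend interface, reconstructed from the calls seen above:
    class Backend {
    public:
        /** create an Execution that runs the given op on this backend */
        virtual Execution* onCreate(const std::vector<Tensor*>& inputs,
                                    const std::vector<Tensor*>& outputs, const MNN::Op* op) = 0;
        /** buffer management for tensors owned by this backend */
        virtual bool onAcquireBuffer(const Tensor* tensor, StorageType storageType) = 0;
        virtual bool onReleaseBuffer(const Tensor* tensor, StorageType storageType) = 0;
        virtual bool onClearBuffer() = 0;
        virtual bool onAllocateBuffer() { return true; }
        /** copy data between host and device tensors (used by copyFromHostTensor / copyToHostTensor) */
        virtual void onCopyBuffer(const Tensor* srcTensor, const Tensor* dstTensor) const = 0;
        /** hooks fired around pipeline execution */
        virtual void onExecuteBegin() const = 0;
        virtual void onExecuteEnd() const = 0;
    };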

Running the session

Find Session::run; it executes the pipelines one by one by calling their execute functions. Remember the definitions of Pipeline and Unit?

    ErrorCode Session::run() const {
        if (mNeedResize) {
            MNN_ERROR("Can't run session because not resized");
            return COMPUTE_SIZE_ERROR;
        }
        for (auto& iter : mPipelines) {
            auto error = iter->execute();
            if (NO_ERROR != error) {
                return error;
            }
        }
        return NO_ERROR;
    }

Pipeline::execute: its core is to run execute on every unit of the pipeline.

    ErrorCode Pipeline::execute() {
        mBackend->onExecuteBegin();  // hook, does not affect the main logic
        for (int i = 0; i < mUnits.size(); ++i) {
            auto& u = mUnits[i];
            auto code = u->execute();
            if (code != NO_ERROR) {
                mBackend->onExecuteEnd();   // hook, does not affect the main logic
                return code;
            }
        }
        mBackend->onExecuteEnd(); // hook, does not affect the main logic
        return NO_ERROR;
    }

Unit::execute just wraps Execution::onExecute:

    ErrorCode Pipeline::Unit::execute() {
        if (nullptr == mExecution) {
            return NO_EXECUTION;
        }
        if (mConst) {
            return NO_ERROR;
        }
        auto code = mExecution->onExecute(mInputs, mOutputs);
        if (NO_ERROR != code) {
            MNN_ERROR("Execute Error for %s, code=%d\n", mContent->name.c_str(), code);
        }
        return code;
    }

Now let's get to the bottom of what Execution is and how it is created. Looking at where Execution is created, it turns out to be provided by a backend to execute an op, taking the inputs, outputs and the op as parameters.

    bool Pipeline::Unit::_createExecution(Backend* bn, Backend* cpuBn) {
        mExecution.reset(bn->onCreate(mInputs, mOutputs, mOriginOp)); // the chosen backend creates the execution
        if (nullptr == mExecution) {
            mExecution.reset(cpuBn->onCreate(mInputs, mOutputs, mOriginOp)); // fall back to the CPU backend
        }
        if (nullptr == mExecution) {
            return false;
        }
        bool needWrap = false;

        // add a wrapper around the Execution (not yet entirely clear why this is needed)
        auto executionBackend = mExecution->backend();
        for (int i = 0; i < mInputs.size(); ++i) {
            auto t   = mInputs[i];
            auto des = TensorUtils::getDescribe(t);
            if (des->backend != executionBackend && SizeComputer::opNeedContent(mOriginOp->type(), i)) {
                needWrap = true;
            }
        }
        if (needWrap) {
            // FUNC_PRINT_ALL(mOriginOp->name()->c_str(), s);
            auto tempExecution = mExecution;
            mExecution.reset(new WrapExecution(cpuBn, tempExecution));
        }
        return mExecution->valid();
    }

CPUBackend为例,分析Execution创建过程.

    /// get execution
    Execution* CPUBackend::onCreate(const std::vector<Tensor*>& inputs, const std::vector<Tensor*>& outputs,
                                    const MNN::Op* op) {
        // Executions are created by creators; each op type has its own creator, and all creators live in a map
        auto map  = getCreatorMap();
        auto iter = map->find(op->type()); // find the matching creator
        if (iter == map->end()) {
            MNN_PRINT("Don't support type %d, %s\n", op->type(), op->name()->c_str());
            return nullptr;
        }
        auto exe = iter->second->onCreate(inputs, outputs, op, this); // do the creation
        if (nullptr == exe) {
            MNN_PRINT("The Creator Don't support type %d, %s\n", op->type(), op->name()->c_str());
            return nullptr;
        }
        . . .
        return exe;
    }

Follow getCreatorMap:

    // the map is a singleton
    static inline std::map<OpType, CPUBackend::Creator*>* getCreatorMap() {
        static std::once_flag of;
        static std::map<OpType, CPUBackend::Creator*>* ret = nullptr;
        std::call_once(of, [&]() { ret = new std::map<OpType, CPUBackend::Creator*>; });
        return ret;
    }
    // register an execution creator into the map; the key is the op type, the value is the Execution creator
    bool CPUBackend::addCreator(OpType t, Creator* c) {
        auto map = getCreatorMap();
        if (map->find(t) != map->end()) {
            MNN_PRINT("Error: %d type has be added\n", t);
            return false;
        }
        map->insert(std::make_pair(t, c));
        return true;
    }

Searching for callers of addCreator turns up the macro REGISTER_CPU_OP_CREATOR, which makes registering a creator convenient:

    template <class T>
    class CPUCreatorRegister {
    public:
        CPUCreatorRegister(OpType type) {
            CPUBackend::addCreator(type, new T);
        }
    };

    #define REGISTER_CPU_OP_CREATOR(name, opType) static CPUCreatorRegister<name> _Create##opType(opType)

Where are the Execution creators registered? A global search finds more than a hundred call sites; they will be analyzed in a separate article. A typical registration looks roughly like the sketch below.
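For reference, a hypothetical registration (the creator and execution class names are made up; only the CPUBackend::Creator interface and the macro come from the code above):

    // Hypothetical creator for a ReLU-style op on the CPU backend:
    class MyReluCreator : public CPUBackend::Creator {
    public:
        virtual Execution* onCreate(const std::vector<Tensor*>& inputs, const std::vector<Tensor*>& outputs,
                                    const MNN::Op* op, Backend* backend) const override {
            return new MyReluExecution(backend);  // MyReluExecution: an Execution subclass, not shown here
        }
    };
    // expands to: static CPUCreatorRegister<MyReluCreator> _CreateOpType_ReLU(OpType_ReLU);
    // i.e. CPUBackend::addCreator(OpType_ReLU, new MyReluCreator) runs at library load time
    REGISTER_CPU_OP_CREATOR(MyReluCreator, OpType_ReLU);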

Summary

A Session contains at least one pipeline (determined by the ScheduleConfig used when the session is created; usually there is exactly one).

A pipeline contains an array of units; every unit stands for one op, and that op is executed by a concrete Execution instance. Execution implementations are device-specific: the CPU and GPU versions differ, and they will be analyzed later.

Getting the output

When the session has finished running, the results are in the output tensors and can be read directly.

Find the tensor of the desired output node:

    Tensor* Session::getOutput(const char* name) const {
        MNN_ASSERT(!mOutputs.empty());
        if (nullptr == name) {
            return mOutputs.begin()->second;
        }

        auto iter = mOutputs.find(name);
        if (iter == mOutputs.end()) {
            MNN_PRINT("Error: can't find output: %s\n", name);
            return nullptr;
        }
        return iter->second;
    }

Read the data from the tensor; for a tensor on the CPU backend, the data in its buffer can be used directly:

    auto outputTensor = interpreter->getSessionOutput(session, NULL);
    auto score = outputTensor->host<float>()[0];
    auto index = outputTensor->host<float>()[1];
    // ...

