Before diving into the inference source code, let's review the overall structure of MNN.

Inference is split into three major parts:

  • Engine
  • Backends
  • Runtime Optimize

So the question is: where do we start, and how? My experience is that you should not dive head-first into the source code, or you will quickly get lost.

Two principles should be followed:

  1. From the outside in: start from the external SDK interfaces and follow the thread inward.
  2. From the whole to the parts: first get an overall picture, for example by reading the official documentation to learn the overall framework, which modules exist, and what each of them does.

I recommend building, installing and running the Android demo that ships with MNN as described earlier, and briefly walking through the calling flow. Here we start the analysis from the outside in.

Running inference with the MNN SDK on Android

The following content is based on the MNN reference documentation.

MNN has two core concepts: the interpreter (Interpreter) and the session (Session).

The interpreter loads the model and holds the model data.

A session is created by an interpreter and holds the data of one inference. Multiple inferences can share the same model data; in other words, multiple sessions can share one interpreter.
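A quick sketch of this relationship (the model file name is just a placeholder):

    // one interpreter holds the model data; several sessions can be created from it
    auto interpreter = MNN::Interpreter::createFromFile("model.mnn");
    MNN::ScheduleConfig config;
    config.type = MNN_FORWARD_CPU;
    auto sessionA = interpreter->createSession(config); // both sessions share
    auto sessionB = interpreter->createSession(config); // the same model data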

Session

To create a session you first need an interpreter, and the interpreter is what loads the model file, so naturally there is an API like this:

Create an Interpreter from a given MNN model file:

    /**
     * @brief create net from file.
     * @param file  given file.
     * @return created net if success, NULL otherwise.
     */
    static Interpreter* createFromFile(const char* file);

The interpreter provides Interpreter::createSession to create a session; a ScheduleConfig is passed in to configure it.

    /**
     * @brief create session with schedule config. created session will be managed in net.
     * @param config session schedule config.
     * @return created session if success, NULL otherwise.
     */
    Session* createSession(const ScheduleConfig& config);

Session schedule configuration (ScheduleConfig)

    /** session schedule config */
    struct ScheduleConfig {
        /** which tensor should be kept */
        std::vector<std::string> saveTensors;
        /** forward type: CPU or GPU */
        MNNForwardType type = MNN_FORWARD_CPU;
        /** number of threads in parallel */
        int numThread = 4;

        /** subpath to run */
        struct Path {
            std::vector<std::string> inputs;
            std::vector<std::string> outputs;

            enum Mode {
                /**
                 * Op Mode
                 * - inputs means the source op, can NOT be empty.
                 * - outputs means the sink op, can be empty.
                 * The path will start from source op, then flow when encounter the sink op.
                 * The sink op will not be compute in this path.
                 */
                Op = 0,

                /**
                 * Tensor Mode (NOT supported yet)
                 * - inputs means the inputs tensors, can NOT be empty.
                 * - outputs means the outputs tensors, can NOT be empty.
                 * It will find the pipeline that compute outputs from inputs.
                 */
                Tensor = 1
            };

            /** running mode */
            Mode mode = Op;
        };
        Path path;

        /** backup backend used to create execution when designated backend do NOT support any op */
        MNNForwardType backupType = MNN_FORWARD_CPU;

        /** extra backend config */
        BackendConfig* backendConfig = nullptr;
    };

During inference, the main backend used for computation is specified by type; the default is CPU. The backup backend specified by backupType is used when the chosen main backend does not support some operator in the model.

The inference path refers to the operators involved in computing the outputs from the inputs. If it is not specified, it is deduced automatically from the model structure. To save memory, MNN reuses tensor memory (except for output tensors), so if you need to keep the result of an intermediate tensor, pass its name in saveTensors to prevent its memory from being reused.
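For example, a minimal configuration sketch (the backend choice and the tensor name "conv1_out" are illustrative):

    MNN::ScheduleConfig config;
    config.type       = MNN_FORWARD_OPENCL;        // main backend
    config.backupType = MNN_FORWARD_CPU;           // used when the main backend lacks an op
    config.numThread  = 4;
    config.saveTensors.push_back("conv1_out");     // keep this intermediate tensor's result
    auto session = interpreter->createSession(config);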

When running inference, numThread can be used to change the number of threads, but the actual number depends on the deployment environment:

• On iOS, GCD is used and the configuration is ignored;

• When MNN_USE_THREAD_POOL is enabled, the thread count depends on the number configured the first time;

• With OpenMP, the thread count is set globally, and the actual count depends on the number configured most recently;

In addition, backend parameters can be set through backendConfig; see below.

Backend Configuration

The definition of BackendConfig:

    struct BackendConfig {
        enum MemoryMode {
            Memory_Normal = 0,
            Memory_High,
            Memory_Low
        };
        MemoryMode memory = Memory_Normal;

        enum PowerMode {
            Power_Normal = 0,
            Power_High,
            Power_Low
        };
        PowerMode power = Power_Normal;

        enum PrecisionMode {
            Precision_Normal = 0,
            Precision_High,
            Precision_Low
        };
        PrecisionMode precision = Precision_Normal;

        /** user defined context */
        void* sharedContext = nullptr;
    };

MemoryMode, PowerMode and PrecisionMode configure the memory footprint, power consumption and precision, respectively.
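A minimal sketch of setting these options (the chosen values are illustrative):

    MNN::BackendConfig backendConfig;
    backendConfig.memory    = MNN::BackendConfig::Memory_Low;
    backendConfig.power     = MNN::BackendConfig::Power_Low;
    backendConfig.precision = MNN::BackendConfig::Precision_Low;

    MNN::ScheduleConfig config;
    config.backendConfig = &backendConfig;  // only the pointer is stored, keep backendConfig alive
    auto session = interpreter->createSession(config);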

Input Data

Get the input tensor

There are two interfaces: getSessionInput returns one input tensor; when the model has multiple input tensors, use the second interface, getSessionInputAll.

    /**
     * @brief get input tensor for given name.
     * @param session   given session.
     * @param name      given name. if NULL, return first input.
     * @return tensor if found, NULL otherwise.
     */
    Tensor* getSessionInput(const Session* session, const char* name);

    /**
     * @brief get all input tensors.
     * @param session   given session.
     * @return all input tensors mapped with name.
     */
    const std::map<std::string, Tensor*>& getSessionInputAll(const Session* session) const;

Fill Data

    auto inputTensor = interpreter->getSessionInput(session, NULL);
    inputTensor->host<float>()[0] = 1.f;

When the backend is CPU, you can write directly into the input tensor's host pointer.

Copy Data

For non-CPU backends, refer to the following example:

    auto inputTensor = interpreter->getSessionInput(session, NULL);
    auto nchwTensor = new Tensor(inputTensor, Tensor::CAFFE);
    // nchwTensor->host<float>()[x] = ...
    inputTensor->copyFromHostTensor(nchwTensor);
    delete nchwTensor;

Image Processing

    struct Config {
        Filter filterType = NEAREST;
        ImageFormat sourceFormat = RGBA;
        ImageFormat destFormat = RGBA;
        // Only valid if the dest type is float
        float mean[4]   = {0.0f, 0.0f, 0.0f, 0.0f};
        float normal[4] = {1.0f, 1.0f, 1.0f, 1.0f};
    };

CV::ImageProcess::Config

  • Specify input and output formats by sourceFormat and destFormat, currently supports RGBA, RGB, BGR, GRAY, BGRA and YUV_NV21
  • Specify the type of interpolation by filterType, currently supports NEAREST, BILINEAR and BICUBIC
  • Specify the mean normalization by mean and normal, but the setting is ignored when the data type is not a floating point type (see the usage sketch below)
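A rough usage sketch (the pixel buffer, image size and normalization constants below are made-up placeholders; check the exact ImageProcess API against the headers of your MNN version):

    MNN::CV::ImageProcess::Config config;
    config.sourceFormat = MNN::CV::RGBA;      // layout of the raw pixel buffer
    config.destFormat   = MNN::CV::BGR;       // layout the model expects
    config.filterType   = MNN::CV::BILINEAR;
    float mean[3]   = {103.94f, 116.78f, 123.68f};           // illustrative values
    float normal[3] = {1.0f / 58.0f, 1.0f / 58.0f, 1.0f / 58.0f};
    ::memcpy(config.mean, mean, sizeof(mean));
    ::memcpy(config.normal, normal, sizeof(normal));

    std::shared_ptr<MNN::CV::ImageProcess> process(MNN::CV::ImageProcess::create(config));
    // pixels: pointer to an iw x ih RGBA image; inputTensor: the session's input tensor
    process->convert(pixels, iw, ih, 0, inputTensor);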

Run Session

Call the interpreter's runSession method:

    /**
     * @brief run session.
     * @param session   given session.
     * @return result of running.
     */
    ErrorCode runSession(Session* session) const;

Run Session with Callbacks

    typedef std::function<bool(const std::vector<Tensor*>&, const std::string& /*opName*/)> TensorCallBack;

    /**
     * @brief run session.
     * @param session   given session.
     * @param before    callback before each op. return true to run the op; return false to skip the op.
     * @param after     callback after each op. return true to continue running; return false to interrupt the session.
     * @param sync      synchronously wait for finish of execution or not.
     * @return result of running.
     */
    ErrorCode runSessionWithCallBack(const Session* session, const TensorCallBack& before,
                                     const TensorCallBack& end, bool sync = false) const;

Compared to runSession, runSessionWithCallBack additionally provides the following (see the sketch after this list):

  • Callbacks before each op, which could be used to skip the execution;
  • Callback after each op, which could be used to interrupt the inference;
  • Synchronization option, defaults off; when enabled, all backends will wait for inference to complete, ie the function time cost is equal to the inference time cost;
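A minimal sketch of the callback form (the logging is illustrative):

    MNN::TensorCallBack before = [](const std::vector<MNN::Tensor*>& tensors, const std::string& opName) {
        MNN_PRINT("enter op %s\n", opName.c_str());
        return true;   // returning false would skip this op
    };
    MNN::TensorCallBack after = [](const std::vector<MNN::Tensor*>& tensors, const std::string& opName) {
        return true;   // returning false would interrupt the session here
    };
    interpreter->runSessionWithCallBack(session, before, after);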

Get Output

Get Output Tensor

    /**
     * @brief get output tensor for given name.
     * @param session   given session.
     * @param name      given name. if NULL, return first output.
     * @return tensor if found, NULL otherwise.
     */
    Tensor* getSessionOutput(const Session* session, const char* name);

    /**
     * @brief get all output tensors.
     * @param session   given session.
     * @return all output tensors mapped with name.
     */
    const std::map<std::string, Tensor*>& getSessionOutputAll(const Session* session) const;

Interpreter provides two ways to get the output Tensor: getSessionOutput for getting a single output tensor and getSessionOutputAll for getting the output tensor map.

When there is only one output tensor, you can pass NULL to get the tensor when calling getSessionOutput.

Read Data

    auto outputTensor = interpreter->getSessionOutput(session, NULL);
    auto score = outputTensor->host<float>()[0];
    auto index = outputTensor->host<float>()[1];
    // ...

The simplest way to read data is to read from the tensor's host pointer directly, but this only works for the CPU backend; on other backends the data lives behind deviceId. Users also have to handle the differences between the NC4HW4 and NHWC data layouts themselves.

For non-CPU backends, or users who are not familiar with data layout, copy data interfaces should be used.

Copy Data

NCHW example:

    auto outputTensor = interpreter->getSessionOutput(session, NULL);
    auto nchwTensor = new Tensor(outputTensor, Tensor::CAFFE);
    outputTensor->copyToHostTensor(nchwTensor);
    auto score = nchwTensor->host<float>()[0];
    auto index = nchwTensor->host<float>()[1];
    // ...
    delete nchwTensor;

NC4HW4 example:

    auto outputTensor = interpreter->getSessionOutput(session, NULL);
    auto nc4hw4Tensor = new Tensor(outputTensor, Tensor::CAFFE_C4);
    outputTensor->copyToHostTensor(nc4hw4Tensor);
    auto score = nc4hw4Tensor->host<float>()[0];
    auto index = nc4hw4Tensor->host<float>()[1];
    // ...
    delete nc4hw4Tensor;

NHWC example:

    auto outputTensor = interpreter->getSessionOutput(session, NULL);
    auto nhwcTensor = new Tensor(outputTensor, Tensor::TENSORFLOW);
    outputTensor->copyToHostTensor(nhwcTensor);
    auto score = nhwcTensor->host<float>()[0];
    auto index = nhwcTensor->host<float>()[1];
    // ...
    delete nhwcTensor;

When copying data this way, the only thing you need to pay attention to is the data layout of the tensor created with new; copyToHostTensor takes care of the layout conversion (if needed) and the copy between backends (if needed).

Analyzing the Interpreter execution flow

From the SDK usage above, the central object is Interpreter, so open Interpreter.hpp and look for its main methods:

    class MNN_PUBLIC Interpreter {
    public:
        static Interpreter* createFromFile(const char* file);
        Session* createSession(const ScheduleConfig& config);
        bool releaseSession(Session* session);
        ErrorCode runSession(Session* session) const;
        ErrorCode runSessionWithCallBack(const Session* session, const TensorCallBack& before,
                                         const TensorCallBack& end, bool sync = false) const;
        Tensor* getSessionInput(const Session* session, const char* name);
        Tensor* getSessionOutput(const Session* session, const char* name);
        const std::map<std::string, Tensor*>& getSessionOutputAll(const Session* session) const;
        const std::map<std::string, Tensor*>& getSessionInputAll(const Session* session) const;

    private:
        static Interpreter* createFromBufferInternal(Content* net);

        Content* mNet = nullptr;
        Interpreter(Content* net);
    };

Open Interpreter.cpp to see the implementation.

Creating the interpreter

Let's focus on the static method createFromFile:

    Interpreter* Interpreter::createFromFile(const char* file) {
        // check arguments
        if (nullptr == file) {
            MNN_PRINT("NULL file for create interpreter");
            return nullptr;
        }
        // read the model file with FileLoader
        std::unique_ptr<FileLoader> loader(new FileLoader(file));
        if (!loader->valid()) {
            MNN_PRINT("Create interpreter failed, open %s error\n", file);
            return nullptr;
        }
        bool result = loader->read();
        if (!result) {
            MNN_PRINT("Read file error\n");
            return nullptr;
        }
        if (loader->size() == 0) {
            MNN_PRINT("Create interpreter failed, %s is empty\n", file);
            return nullptr;
        }
        auto net     = new Content; // load the file content into a Content
        bool success = loader->merge(net->buffer);
        if (!success) {
            return nullptr;
        }
        loader.reset();
        return createFromBufferInternal(net);  // create the interpreter
    }

    struct Content {
        AutoStorage<uint8_t> buffer;                        // raw data
        const Net* net = nullptr;                           // MNN Net object
        std::vector<std::unique_ptr<Session>> sessions;     // session collection
        std::map<const Tensor*, const Session*> tensorMap;  // map from tensor to session
    };

Next, createFromBufferInternal:

    Interpreter* Interpreter::createFromBufferInternal(Content* net) {
        if (nullptr == net) {
            MNN_PRINT("Buffer is null for create interpreter\n");
            return nullptr;
        }
        // deserialize with flatbuffers; first verify that the buffer matches the schema
        flatbuffers::Verifier verify((const uint8_t*)(net->buffer.get()), net->buffer.size());
        if (false == VerifyNetBuffer(verify)) {
            MNN_PRINT("Invalidate buffer to create interpreter\n");
            delete net;
            return nullptr;
        }
        net->net = GetNet(net->buffer.get());  // deserialize
        if (nullptr == net->net->oplists()) {
            MNN_ERROR("Model has no oplist\n");
            delete net;
            return nullptr;
        }
        return new Interpreter(net); // construct the interpreter
    }

The Interpreter constructor simply stores the Content:

    Interpreter::Interpreter(Content* net) {
        mNet = net;
    }

GetNet uses flatbuffers to get the Net root:

    inline const MNN::Net *GetNet(const void *buf) {
        return flatbuffers::GetRoot<MNN::Net>(buf);
    }

To sum up the interpreter-creation flow: createFromFile loads the model file into a Content buffer through FileLoader, createFromBufferInternal verifies and deserializes it with flatbuffers, and the Interpreter constructor takes ownership of the Content.

Creating a Session

The source of the session-creation function is shown below; there are two steps:

  • build the schedule info (what a schedule is will be analyzed next);
  • construct the Session object.
    Session* Interpreter::createMultiPathSession(const std::vector<ScheduleConfig>& configs) {
        if (nullptr == mNet->buffer.get()) {
            MNN_ERROR("The model buffer has been released. Can't create session\n");
            return nullptr;
        }
        auto info       = Schedule::schedule(mNet->net, configs);       // build the schedule info
        auto newSession = std::unique_ptr<Session>(new Session(info));  // create the session
        if (!newSession->valid()) {
            MNN_PRINT("Invalide Session!!\n");
            return nullptr;
        }
        auto result = newSession.get();
        if (info.validForResize) {
            result->resize(); // resize: the preparation work for inference
        }
        mNet->sessions.emplace_back(std::move(newSession));
        return result;
    }

    Session* Interpreter::createSession(const ScheduleConfig& config) {
        // a single config makes up the configs list
        return createMultiPathSession({config});
    }

The most complex part is building the schedule info. The return value is a ScheduleInfo, so first understand what it contains:

    /** schedule info */
    struct ScheduleInfo {
        /** pipelines with backend info: the ops involved in the computation, in order */
        std::vector<std::pair<Backend::Info, std::vector<PipelineInfo>>> pipelineInfo;
        /** input tensors map */
        std::map<std::string, Tensor*> inputTensors;
        /** output tensors map */
        std::map<std::string, Tensor*> outputTensor;
        /** all tensors map */
        std::vector<std::pair<int, std::shared_ptr<Tensor>>> allTensors;
        /** input valid for resize */
        bool validForResize;
    };

Next, the schedule function itself:

    Schedule::ScheduleInfo Schedule::schedule(const Net* net, const std::vector<ScheduleConfig>& configs) {
        std::vector<std::shared_ptr<Tensor>> allTensors;

        ScheduleInfo schedule;
        if (nullptr == net->oplists()) {
            MNN_PRINT("Error net for schedule\n");
            return schedule;
        }
        bool valid = _setUpTensorInfo(allTensors, net);  // initialize tensor info
        schedule.validForResize = valid;

        // build the pipelines and their backend info
        std::vector<std::pair<Backend::Info, std::vector<PipelineInfo>>> result;
        for (auto& config : configs) {
            Backend::Info compute;
            compute.type      = _getApprociateType(config, net, allTensors, valid);
            compute.numThread = config.numThread;
            compute.user      = config.backendConfig;
            auto oplists      = _scheduleUnit(net, config, allTensors);
            result.emplace_back(std::make_pair(compute, std::move(oplists)));
        }
        schedule.pipelineInfo = std::move(result);

        // get all used op's output, drop unused op, won't change op order. always insert all Input Ops
        // i.e. drop the ops that do not need to be computed
        std::set<const Op*> oplists;
        {
            for (std::pair<Backend::Info, vector<PipelineInfo>>& pipeline : schedule.pipelineInfo) {
                for (auto& info : pipeline.second) {
                    oplists.insert(info.op);
                }
            }
        }
        // collect the input tensors and output tensors of all ops
        std::set<int> outputIndexes;
        std::set<int> inputIndexes;
        for (auto op : oplists) {
            if (nullptr != op->outputIndexes()) {
                auto data = op->outputIndexes()->data();
                for (int j = 0; j < op->outputIndexes()->size(); ++j) {
                    outputIndexes.insert(data[j]);
                }
            }
            if (nullptr != op->inputIndexes()) {
                auto data = op->inputIndexes()->data();
                for (int j = 0; j < op->inputIndexes()->size(); ++j) {
                    inputIndexes.insert(data[j]);
                }
            }
            MNN_ASSERT(OpType_Input != op->type());
        }
        // Get All Output and Input: deduplicate between the input and output sets
        std::set<int> inputIndexDiff;
        std::set<int> outputIndexesDiff;
        // if an op's output tensor is not another op's input, it is an output of the whole pipeline
        std::set_difference(outputIndexes.begin(), outputIndexes.end(), inputIndexes.begin(), inputIndexes.end(),
                            std::inserter(outputIndexesDiff, outputIndexesDiff.begin()));
        // if an op's input tensor is not produced by another op, it is an input of the whole pipeline
        std::set_difference(inputIndexes.begin(), inputIndexes.end(), outputIndexes.begin(), outputIndexes.end(),
                            std::inserter(inputIndexDiff, inputIndexDiff.begin()));
        std::unordered_map<std::string, int> tensorNameIndexMap;
        for (int i = 0; i < net->tensorName()->size(); ++i) {
            tensorNameIndexMap[net->tensorName()->Get(i)->str()] = i;
        }
        for (auto& config : configs) {
            // saveTensors is empty by default; if the caller wants intermediate results, the names are passed in here
            for (const auto& name : config.saveTensors) {
                // saveTensors are also treated as output tensors
                if (tensorNameIndexMap.count(name)) {
                    outputIndexesDiff.insert(tensorNameIndexMap[name]);
                } else {
                    MNN_PRINT("Bad outputname: %s\n", name.c_str());
                }
            }
        }
        if (net->outputName()) {  // add the model's own output tensors
            for (int i = 0; i < net->outputName()->size(); ++i) {
                std::string name = net->outputName()->Get(i)->str();
                if (tensorNameIndexMap.count(name)) {
                    outputIndexesDiff.insert(tensorNameIndexMap[name]);
                }
            }
        }
        for (auto index : inputIndexDiff) {  // the final input tensors
            schedule.inputTensors.insert(
                std::make_pair(net->tensorName()->GetAsString(index)->c_str(), allTensors[index].get()));
            TensorUtils::getDescribe(allTensors[index].get())->usage = TensorUsage::INPUT;
        }
        for (auto index : outputIndexesDiff) { // the final output tensors
            schedule.outputTensor.insert(
                std::make_pair(net->tensorName()->GetAsString(index)->c_str(), allTensors[index].get()));
        }
        for (auto& t : allTensors) {  // all tensors
            schedule.allTensors.emplace_back(std::make_pair(0, std::move(t)));
        }
        for (int i = 0; i < net->oplists()->size(); ++i) {
            auto op = net->oplists()->GetAs<Op>(i);
            if (nullptr != op->inputIndexes()) {
                auto data = op->inputIndexes()->data();
                for (int j = 0; j < op->inputIndexes()->size(); ++j) {
                    auto index = data[j];
                    schedule.allTensors[index].first += 1; // how many ops reference this tensor as input
                }
            }
        }
        for (auto outputIndex : outputIndexesDiff) {
            TensorUtils::getDescribe(schedule.allTensors[outputIndex].second.get())->usage = TensorUsage::OUTPUT;
            schedule.allTensors[outputIndex].first += 1; // also referenced as an output tensor
        }
        return schedule;
    }

Since schedule is fairly complex and its data structures matter for the later computation, let's break it down and analyze each step.

1. Initialize tensor info

The corresponding function is _setUpTensorInfo; its main job is to allocate the tensors and to set the data sizes for the input layers.

    static bool _setUpTensorInfo(std::vector<std::shared_ptr<Tensor>>& allTensors, const Net* net) {
        bool valid = true;
        auto& tensors = allTensors;
        tensors.resize(net->tensorName()->size());
        for (int i = 0; i < tensors.size(); ++i) {  // create each tensor object and allocate it
            tensors[i].reset(new Tensor(4)); // NCHW, TODO
            tensors[i]->setType(DataType_DT_FLOAT);
        }
        // walk the ops, only handle Input nodes
        for (int opIndex = 0; opIndex < net->oplists()->size(); ++opIndex) {
            auto op = net->oplists()->GetAs<Op>(opIndex);
            if (OpType_Input == op->type()) {
                MNN_ASSERT(nullptr != op->outputIndexes());
                auto index      = op->outputIndexes()->data()[0];
                auto tensor     = tensors[index].get();
                auto& tb        = tensor->buffer();
                auto inputParam = op->main_as_Input();
                if (auto idims = inputParam->dims()) {
                    for (int i = 0; i < idims->size(); ++i) {
                        tb.dim[i].min = 0;
                        int extent    = idims->data()[i];
                        // dim-0 is batch (when input batch is -1, set it to be 1, ignore other dim)
                        if (i == 0 && extent == -1) { // if the batch size is unspecified, default to 1
                            extent = 1;
                        }
                        if (extent < 0) { // a negative size on this dimension means the model is broken
                            valid = false;
                        }
                        tb.dim[i].extent = extent;  // record the size of this dimension in the buffer
                    }
                    tb.dimensions = idims->size();
                } else {
                    tb.dimensions = 0;
                }
                tensor->setType(inputParam->dtype());
                TensorUtils::getDescribe(tensor)->dimensionFormat = inputParam->dformat(); // data layout, e.g. NCHW
            }
        }
        return valid;
    }

2. Build the PipelineInfo

The code in the schedule function that involves PipelineInfo is as follows:

    // build the pipelines and their Backend::Info
    std::vector<std::pair<Backend::Info, std::vector<PipelineInfo>>> result;
    for (auto& config : configs) {
        Backend::Info compute;
        compute.type      = _getApprociateType(config, net, allTensors, valid);  // decide the backend type
        compute.numThread = config.numThread;
        compute.user      = config.backendConfig;
        auto oplists      = _scheduleUnit(net, config, allTensors); // build the pipeline info list
        result.emplace_back(std::make_pair(compute, std::move(oplists)));
    }
    schedule.pipelineInfo = std::move(result);

The code above involves two types, Backend::Info and PipelineInfo. Let's look at their definitions first, then at the logic:

    struct Info {
        /** forward type: whether to run on CPU, GPU, etc. */
        MNNForwardType type = MNN_FORWARD_CPU;
        /** for CPU only. number of threads. */
        int numThread = 4;
        /** user data. */
        BackendConfig* user = NULL;
        enum Mode {
            // The Op will be run in execution->onExecute
            DIRECT = 0,
            // The Op will be recorded. Run in onExecuteBegin and Wait in onExecuteEnd
            INDIRECT = 1
        };
        Mode mode = DIRECT;
    };

    /** pipeline info: describes one compute node, i.e. an op plus its input and output tensors */
    struct PipelineInfo {
        /** op */
        const Op* op;
        /** input tensors */
        std::vector<Tensor*> inputs;
        /** output tensors */
        std::vector<Tensor*> outputs;
    };

The function that decides the backend type, _getApprociateType:

    static MNNForwardType _getApprociateType(const ScheduleConfig& config, const Net* net,
                                             const std::vector<std::shared_ptr<Tensor>>& allTensors,
                                             bool inputShapeValid) {
        MNNForwardType type = config.type;
        if (MNN_FORWARD_AUTO == config.type) {
            // Search Backend Exclude MNN_FORWARD_CPU
            for (int i = 1; i < MNN_FORWARD_ALL; ++i) {  // check which backend types have a registered creator
                if (MNNGetExtraBackendCreator((MNNForwardType)i) != nullptr) {
                    type = (MNNForwardType)i;
                    break;
                }
            }
        }
        // look up the backend creator for this type; backend creation itself is analyzed later
        auto creator = MNNGetExtraBackendCreator(type);
        if (nullptr == creator) {
            MNN_PRINT("Can't Find type=%d backend, use %d instead\n", type, config.backupType);
            type = config.backupType;
        }
        return type;
    }

The function that builds the pipeline info list, _scheduleUnit:

    static vector<Schedule::PipelineInfo> _scheduleUnit(const Net* net, const ScheduleConfig& configs,
                                                        const vector<shared_ptr<Tensor>>& allTensors) {
        vector<Schedule::PipelineInfo> oplists;
        vector<const Op*> ops;
        generateScheduleGraph(ops, net, configs, allTensors);  // find all ops that take part in the computation
        for (const Op* op : ops) {
            Schedule::PipelineInfo opInfo; // create one pipeline info per participating op
            opInfo.op = op;
            if (nullptr != op->outputIndexes()) { // outputs of this pipeline info
                auto data = op->outputIndexes()->data();
                for (int j = 0; j < op->outputIndexes()->size(); ++j) {
                    opInfo.outputs.push_back(allTensors[data[j]].get());
                }
            }
            if (nullptr != op->inputIndexes()) { // inputs of this pipeline info
                auto data = op->inputIndexes()->data();
                for (int j = 0; j < op->inputIndexes()->size(); ++j) {
                    opInfo.inputs.push_back(allTensors[data[j]].get());
                }
            }
            oplists.emplace_back(opInfo);
        }

        return oplists; // return the pipeline info list
    }

The generateScheduleGraph function inside it finds the path needed to compute the outputs: not every node in the graph has to be computed, so it collects only the nodes that the output nodes depend on, and only those nodes are computed.

    static void generateScheduleGraph(vector<const Op*>& ops, const Net* net, const ScheduleConfig& configs,
                                      const vector<shared_ptr<Tensor>>& allTensors) {
        if (configs.path.inputs.empty() && configs.path.outputs.empty()) {
            // no input/output specified: by default every op takes part in the computation
            // Use Default Linear schedule
            ops.clear();
            ops.reserve(net->oplists()->size());
            for (int i = 0; i < net->oplists()->size(); ++i) {
                auto op = net->oplists()->GetAs<Op>(i);
                if (op->type() != OpType_Input) {
                    ops.emplace_back(op);
                }
            }
            return;
        }
        // the caller specified input/output nodes, so find the paths that involve them
        vector<vector<Op*>> paths = generateSchedulePath(net, configs, allTensors);
        // build a new graph from the involved nodes
        unique_ptr<DirectedAcyclicGraph<Op*>> graph(new DirectedAcyclicGraph<Op*>());

        // add the nodes to the graph
        unordered_map<Op*, shared_ptr<Node<Op*>>> opMaps;
        for (vector<Op*> path : paths) {
            for (Op* op : path) {
                if (opMaps.find(op) == opMaps.end()) {
                    OpNodeDef def(op);
                    shared_ptr<Node<Op*>> n = graph->AddNode(def);
                    opMaps.insert(make_pair(op, n));
                }
            }
        }

        // add edges between the nodes in the graph
        for (vector<Op*> path : paths) {
            shared_ptr<Node<Op*>> pre = nullptr;
            for (Op* op : path) {
                shared_ptr<Node<Op*>> n = opMaps[op];
                if (nullptr == pre) {
                    pre = n;
                } else {
                    graph->AddEdge(pre, n);
                    pre = n;
                }
            }
        }
        ops.clear();
        vector<shared_ptr<Node<Op*>>> order;
        if (graph->GetPostOrder(order)) {  // topological sort so that dependencies come first
            for (shared_ptr<Node<Op*>> n : order) {
                ops.emplace_back(n->getData());
            }
        } else {
            MNN_PRINT("op graph have cycle,schedule failed\n");
        }
    }

3. Sort out the inputs/outputs of the ops in the PipelineInfo

Comments have already been added to the schedule function above; roughly, the steps are:

  • walk the ops in the PipelineInfo and find the pipeline's input and output tensors;
  • add the save tensors passed in by the caller to the output tensor set as well;
  • build a map from tensor name to tensor.

4. The Session constructor

The constructor simply takes the contents of the ScheduleInfo and creates the backends.

    Session::Session(const Schedule::ScheduleInfo& info) {
        if (info.pipelineInfo.empty()) {  // invalid arguments
            mValid = false;
            return;
        }

        mTensors = info.allTensors;
        for (auto& iter : info.pipelineInfo) { // take the backend info and create the backend instance
            if (mBackends.find(iter.first.type) == mBackends.end()) {
                auto newBn = BackendFactory::create(iter.first); // create the backend instance
                if (nullptr == newBn) {
                    mValid = false;
                    return;
                }
                mBackends[iter.first.type].reset(newBn);
            }
            auto backend    = mBackends.find(iter.first.type)->second.get();
            auto cpuBackend = _getDefaultBackend();
            // build a pipeline from the pipeline info and the backends
            std::shared_ptr<Pipeline> newPipeline(new Pipeline(iter.second, backend, cpuBackend));
            mPipelines.emplace_back(std::move(newPipeline));
        }
        mInputs  = info.inputTensors; // take the input info
        mOutputs = info.outputTensor; // take the output info
    }

The Pipeline definition and constructor follow. A Pipeline records the whole compute path from the input nodes to the output nodes; every node on the path is represented by a Unit object.

A Unit consists of an op plus its input and output tensors.

    /** pipeline. one session may contains multiple pipeline, and one pipeline may contains more than one unit. */
    class Pipeline : public NonCopyable {
    public:
        /**
         * @brief initialize with pipeline info, major backend and backup backend (usually CPU).
         * @param info      given pipeline info.
         * @param major     given major backend used to create execution.
         * @param backup    given backend backend if op is not supported by major backend.
         */
        Pipeline(const std::vector<Schedule::PipelineInfo>& info, Backend* major, Backend* backup);

    public:
        /**
         * @brief prepare all units.
         * @return result code.
         */
        ErrorCode prepare();
        /**
         * @brief execute all units.
         * @return result code.
         */
        ErrorCode execute();
        /**
         * @brief execute all units with callbacks.
         * @param before    callback before execute each op.
         * @param after     callback after execute each op.
         * @return result code.
         */
        ErrorCode executeCallBack(const TensorCallBackWithInfo& before, const TensorCallBackWithInfo& after);
        /**
         * @brief the Pipline need not prepare any more, release all cache used for resize.
         * @return errorcode
         */
        ErrorCode releaseCache();

        /** op unit in pipeline */
        class Unit : public NonCopyable, public OperatorInfo {
        public:
            /**
             * @brief initialize with given op and its in-out tensors.
             * @param op        given op.
             * @param inputs    execution input tensors.
             * @param outputs   execution output tensors.
             */
            Unit(const Op* op, const std::vector<Tensor*>& inputs, const std::vector<Tensor*>& outputs);

            /**
             * @brief prepare unit.
             * @return result code.
             */
            ErrorCode prepare(Backend* major, Backend* backup);
            /**
             * @brief execute unit.
             * @return result code.
             */
            ErrorCode execute();
            /**
             * @brief execute unit with callbacks.
             * @param before    callback before execute each op.
             * @param after     callback after execute each op.
             * @return result code.
             */
            ErrorCode executeCallBack(const TensorCallBackWithInfo& before, const TensorCallBackWithInfo& after);

        public:
            /** op execution */
            std::shared_ptr<Execution> mExecution;
            /** op type */
            OpType mType;
            /** input tensors */
            std::vector<Tensor*> mInputs;
            /** output tensors */
            std::vector<Tensor*> mOutputs;
            /** op */
            const Op* mOriginOp;

        private:
            bool _createExecution(Backend* bn, Backend* cpuBn);
            bool _allocTensors(Backend* bn, const std::vector<Tensor*>& tensors);

        private:
            bool mConst = false;
        };

    protected:
        /* Used for Unit Test */
        const std::vector<std::shared_ptr<Unit>>& getUnit() const {
            return this->mUnits;
        }

    private:
        Backend* mBackend;
        Backend* mBackupBackend;
        std::vector<std::shared_ptr<Unit>> mUnits;
    };

In the Pipeline constructor the main work is creating the array of Unit objects; this array represents the compute path.

    Pipeline::Pipeline(const std::vector<Schedule::PipelineInfo>& infos, Backend* backend, Backend* cpuBackend) {
        SizeComputerSuite::init();
        MNN_ASSERT(nullptr != backend);
        MNN_ASSERT(nullptr != cpuBackend);
        mBackupBackend = cpuBackend;
        mBackend       = backend;

        for (auto& info : infos) {
            std::shared_ptr<Unit> unit(new Unit(info.op, info.inputs, info.outputs));
            mUnits.emplace_back(unit);
        }
    }

The Unit constructor:

    Pipeline::Unit::Unit(const Op* op, const std::vector<Tensor*>& inputs, const std::vector<Tensor*>& outputs) {
        MNN_ASSERT(nullptr != op);
        mOriginOp = op;          // op
        mType     = op->type();  // op type
        mInputs   = inputs;      // inputs
        mOutputs  = outputs;     // outputs
        if (nullptr != op->name()) {
            mContent->name = op->name()->str();
        }
        auto typeStr = EnumNameOpType(mType);
        if (nullptr != typeStr) {
            mContent->type = typeStr;
        }
    }

5. The Session resize step

The code of Session::resize is below; at its core it 1) runs prepare on every unit of every pipeline and 2) prepares the memory.

    ErrorCode Session::resize() {
        _clearCache();
        for (auto& b : mBackends) { // clear the buffers
            b.second->onClearBuffer();
        }

        for (auto& iter : mPipelines) { // prepare each pipeline
            auto error = iter->prepare();
            if (NO_ERROR != error) {
                return error;
            }
        }
        mNeedResize = false;
        for (auto& b : mBackends) { // re-allocate the buffers
            b.second->onAllocateBuffer();
        }

        return NO_ERROR;
    }

What does Unit::prepare do?

  • Check that the unit's inputs are valid, then allocate memory for them;
  • Call computeOutputSize to compute the shapes of the input and output tensors;
  • Create the Execution for the chosen backend and allocate memory for the outputs.

    ErrorCode Pipeline::Unit::prepare(Backend* bn, Backend* cpuBn) {
        for (auto t : mInputs) {
            bool valid = true;
            for (int i = 0; i < t->dimensions(); ++i) {
                // no input dimension may be <= 0
                if (t->length(i) <= 0) {
                    valid = false;
                    break;
                }
            }
            if (!valid) {
                MNN_ERROR("The %s's input is not ready\n", mContent->name.c_str());
                return COMPUTE_SIZE_ERROR;
            }
        }
        { // allocate memory for the input tensors
            auto success = _allocTensors(bn, mInputs);
            if (!success) {
                return OUT_OF_MEMORY;
            }
        }
        // compute the shapes
        bool ready = SizeComputer::computeOutputSize(mOriginOp, mInputs, mOutputs);
        for (auto o : mOutputs) {
            if (o->size() <= 0) {
                ready = false;
            }
            if (o->dimensions() < 4 && TensorUtils::getDescribe(o)->dimensionFormat == MNN_DATA_FORMAT_NC4HW4) {
                for (auto index = o->dimensions(); index < 4; ++index) {
                    o->setLength(index, 1);
                }
            }
        }
        // estimate the flops required
        mContent->flops = SizeComputer::computeFlops(mOriginOp, mInputs, mOutputs);
        if (!ready) {
            return COMPUTE_SIZE_ERROR;
        }
        // Check const
        mConst = true;
        for (int i = 0; i < mInputs.size(); ++i) {
            if (SizeComputer::opNeedContent(mOriginOp->type(), i) &&
                (TensorUtils::getDescribe(mInputs[i])->usage != TensorUsage::CONST)) {
                mConst = false;
                break;
            }
        }
        if (mType == OpType_TrainableParam) {
            for (auto t : mOutputs) {
                TensorUtils::getDescribe(t)->usage = TensorUsage::TRAINABLE;
            }
            mConst = false;
        }

        if (mConst) {
            for (auto t : mOutputs) {
                TensorUtils::getDescribe(t)->usage = TensorUsage::CONST;
            }
            bn = cpuBn;
        }

        // create the Execution
        if (nullptr == mExecution) {
            auto sucess = _createExecution(bn, cpuBn);
            if (!sucess || mExecution == nullptr) {
                return NOT_SUPPORT;
            }
        }
        bn = mExecution->backend(); // the backend actually used
        { // allocate memory for the outputs
            auto success = _allocTensors(bn, mOutputs);
            if (!success) {
                return OUT_OF_MEMORY;
            }
        }
        // let the execution adjust its buffers to the new sizes
        auto code = mExecution->onResize(mInputs, mOutputs);
        // if the backend can't handle it, release the memory and recreate the execution on the CPU backend
        if (TENSOR_NOT_SUPPORT == code || TENSOR_NEED_DIVIDE == code) {
            // TODO
            mExecution.reset();
            for (auto t : mOutputs) {
                auto des = TensorUtils::getDescribe(t);
                des->backend->onReleaseBuffer(t, _getTensorReleaseStorageType(t));
                des->backend = nullptr;
            }
            auto sucess = _createExecution(cpuBn, cpuBn);
            MNN_ASSERT(NO_ERROR == sucess);
            auto success = _allocTensors(mExecution->backend(), mOutputs);
            if (!success) {
                return OUT_OF_MEMORY;
            }
            code = mExecution->onResize(mInputs, mOutputs);
        }
        if (NO_ERROR != code) {
            mExecution.reset();
            return code;
        }
        if (mConst) {
            code = mExecution->onExecute(mInputs, mOutputs);
        }
        for (auto t : mInputs) {
            auto des = TensorUtils::getDescribe(t);
            des->useCount -= 1;
            if (0 == des->useCount) {
                des->backend->onReleaseBuffer(t, _getTensorReleaseStorageType(t));
            }
        }
        return code;
    }

Now let's summarize the session creation process:

  • Constructing a Session takes a ScheduleInfo as its argument
  • The schedule function that builds the ScheduleInfo is fairly complex:
    • initialize all tensors
    • build the Backend::Info according to the config
    • if the config specifies inputs/outputs, only the related nodes are used to build the pipeline; otherwise the model's own input/output nodes are used
    • for every op involved in the pipeline, sort out its input and output tensors
  • Construct the Session object
  • Do the session's preparation work (resize)

Feeding input data into the Session

The input tensor

First, get the session's input tensor; this is just a lookup by name in a map. The construction of the input tensors was analyzed above.

    Tensor* Session::getInput(const char* name) const {
        MNN_ASSERT(!mInputs.empty());
        if (nullptr == name) {
            return mInputs.begin()->second;
        }
        auto iter = mInputs.find(name);
        if (iter == mInputs.end()) {
            MNN_PRINT("Error: can't find input: %s\n", name);
            return nullptr;
        }
        return iter->second;
    }

What a tensor is

A tensor simply represents a block of data in the neural network.

Software is, in essence, data plus algorithms. In a neural network the tensors are the data, and the ops and the graph they form are the algorithm.

Tensors come in two kinds: host tensors, whose data lives in main memory, and device tensors, whose storage is allocated by the backend.
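A rough sketch of the two kinds (the shape and dimension type below are illustrative):

    // a host tensor owns a buffer in main memory and is written through host<T>()
    auto hostTensor = MNN::Tensor::create<float>({1, 3, 224, 224}, nullptr, MNN::Tensor::CAFFE);
    hostTensor->host<float>()[0] = 1.0f;

    // a device tensor only describes shape and type; its storage is managed by the backend,
    // so data moves in and out via copyFromHostTensor / copyToHostTensor
    auto deviceTensor = MNN::Tensor::createDevice<float>({1, 3, 224, 224});

    delete deviceTensor;
    delete hostTensor;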

Filling data

For a host tensor, data can be written by direct assignment:

    auto inputTensor = interpreter->getSessionInput(session, NULL);
    inputTensor->host<float>()[0] = 1.f;

For non-host tensors, the data has to be copied:

    auto inputTensor = interpreter->getSessionInput(session, NULL);
    auto nchwTensor = new Tensor(inputTensor, Tensor::CAFFE);
    // nchwTensor->host<float>()[x] = ...
    inputTensor->copyFromHostTensor(nchwTensor);
    delete nchwTensor;

  1. copyFromHostTensor analysis
    The data is copied from the host tensor into the (possibly device-side) input tensor; the actual work is done by the backend's onCopyBuffer function. Backend is the concrete backend implementation, so let's keep digging.
  2. About Backend
    Backend is an abstract class that defines many interfaces without concrete implementations. Each interface is easy to understand from its name and the comments, so they are not explained one by one here; the concrete backend implementations will be covered later. A rough sketch of the interface, reconstructed from the calls seen in this post, follows below.
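Based only on the calls that appear in this article (onCreate, onAcquireBuffer/onReleaseBuffer, onClearBuffer/onAllocateBuffer, onCopyBuffer, onExecuteBegin/onExecuteEnd), the interface looks roughly like this simplified sketch; the exact signatures should be checked against Backend.hpp:

    // Simplified sketch of the Backend interface, reconstructed from the calls seen above:
    class Backend {
    public:
        /** create an Execution that runs the given op on this backend */
        virtual Execution* onCreate(const std::vector<Tensor*>& inputs,
                                    const std::vector<Tensor*>& outputs, const MNN::Op* op) = 0;
        /** buffer management for tensors owned by this backend */
        virtual bool onAcquireBuffer(const Tensor* tensor, StorageType storageType) = 0;
        virtual bool onReleaseBuffer(const Tensor* tensor, StorageType storageType) = 0;
        virtual bool onClearBuffer() = 0;
        virtual bool onAllocateBuffer() { return true; }
        /** copy data between host and device tensors (used by copyFromHostTensor / copyToHostTensor) */
        virtual void onCopyBuffer(const Tensor* srcTensor, const Tensor* dstTensor) const = 0;
        /** hooks fired around pipeline execution */
        virtual void onExecuteBegin() const = 0;
        virtual void onExecuteEnd() const = 0;
    };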

Running the session

Find Session::run; it executes the pipelines one by one by calling their execute functions. Remember the definitions of Pipeline and Unit?

    ErrorCode Session::run() const {
        if (mNeedResize) {
            MNN_ERROR("Can't run session because not resized");
            return COMPUTE_SIZE_ERROR;
        }
        for (auto& iter : mPipelines) {
            auto error = iter->execute();
            if (NO_ERROR != error) {
                return error;
            }
        }
        return NO_ERROR;
    }

Pipeline::execute: its core is to run execute on every unit of the pipeline.

    ErrorCode Pipeline::execute() {
        mBackend->onExecuteBegin();  // hook, does not affect the main logic
        for (int i = 0; i < mUnits.size(); ++i) {
            auto& u = mUnits[i];
            auto code = u->execute();
            if (code != NO_ERROR) {
                mBackend->onExecuteEnd();   // hook, does not affect the main logic
                return code;
            }
        }
        mBackend->onExecuteEnd(); // hook, does not affect the main logic
        return NO_ERROR;
    }

Unit::execute just wraps Execution::onExecute:

    ErrorCode Pipeline::Unit::execute() {
        if (nullptr == mExecution) {
            return NO_EXECUTION;
        }
        if (mConst) {
            return NO_ERROR;
        }
        auto code = mExecution->onExecute(mInputs, mOutputs);
        if (NO_ERROR != code) {
            MNN_ERROR("Execute Error for %s, code=%d\n", mContent->name.c_str(), code);
        }
        return code;
    }

Now let's get to the bottom of what Execution is and how it is created. Looking at where Execution is created, it turns out to be provided by a backend to execute an op, taking the inputs, outputs and the op as parameters.

    bool Pipeline::Unit::_createExecution(Backend* bn, Backend* cpuBn) {
        mExecution.reset(bn->onCreate(mInputs, mOutputs, mOriginOp)); // the chosen backend creates the execution
        if (nullptr == mExecution) {
            mExecution.reset(cpuBn->onCreate(mInputs, mOutputs, mOriginOp)); // fall back to the CPU backend
        }
        if (nullptr == mExecution) {
            return false;
        }
        bool needWrap = false;

        // add a wrapper around the Execution (not yet entirely clear why this is needed)
        auto executionBackend = mExecution->backend();
        for (int i = 0; i < mInputs.size(); ++i) {
            auto t   = mInputs[i];
            auto des = TensorUtils::getDescribe(t);
            if (des->backend != executionBackend && SizeComputer::opNeedContent(mOriginOp->type(), i)) {
                needWrap = true;
            }
        }
        if (needWrap) {
            // FUNC_PRINT_ALL(mOriginOp->name()->c_str(), s);
            auto tempExecution = mExecution;
            mExecution.reset(new WrapExecution(cpuBn, tempExecution));
        }
        return mExecution->valid();
    }

CPUBackend为例,分析Execution创建过程.

    /// get execution
    Execution* CPUBackend::onCreate(const std::vector<Tensor*>& inputs, const std::vector<Tensor*>& outputs,
                                    const MNN::Op* op) {
        // Executions are created by creators; each op type has its own creator, and all creators live in a map
        auto map  = getCreatorMap();
        auto iter = map->find(op->type()); // find the matching creator
        if (iter == map->end()) {
            MNN_PRINT("Don't support type %d, %s\n", op->type(), op->name()->c_str());
            return nullptr;
        }
        auto exe = iter->second->onCreate(inputs, outputs, op, this); // do the creation
        if (nullptr == exe) {
            MNN_PRINT("The Creator Don't support type %d, %s\n", op->type(), op->name()->c_str());
            return nullptr;
        }
        . . .
        return exe;
    }

Follow getCreatorMap:

    // the map is a singleton
    static inline std::map<OpType, CPUBackend::Creator*>* getCreatorMap() {
        static std::once_flag of;
        static std::map<OpType, CPUBackend::Creator*>* ret = nullptr;
        std::call_once(of, [&]() { ret = new std::map<OpType, CPUBackend::Creator*>; });
        return ret;
    }
    // register an execution creator into the map; the key is the op type, the value is the Execution creator
    bool CPUBackend::addCreator(OpType t, Creator* c) {
        auto map = getCreatorMap();
        if (map->find(t) != map->end()) {
            MNN_PRINT("Error: %d type has be added\n", t);
            return false;
        }
        map->insert(std::make_pair(t, c));
        return true;
    }

Searching for callers of addCreator turns up the macro REGISTER_CPU_OP_CREATOR, which makes registering a creator convenient:

    template <class T>
    class CPUCreatorRegister {
    public:
        CPUCreatorRegister(OpType type) {
            CPUBackend::addCreator(type, new T);
        }
    };

    #define REGISTER_CPU_OP_CREATOR(name, opType) static CPUCreatorRegister<name> _Create##opType(opType)

Where are the Execution creators registered? A global search finds more than a hundred call sites; they will be analyzed in a separate article. A typical registration looks roughly like the sketch below.
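For reference, a hypothetical registration (the creator and execution class names are made up; only the CPUBackend::Creator interface and the macro come from the code above):

    // Hypothetical creator for a ReLU-style op on the CPU backend:
    class MyReluCreator : public CPUBackend::Creator {
    public:
        virtual Execution* onCreate(const std::vector<Tensor*>& inputs, const std::vector<Tensor*>& outputs,
                                    const MNN::Op* op, Backend* backend) const override {
            return new MyReluExecution(backend);  // MyReluExecution: an Execution subclass, not shown here
        }
    };
    // expands to: static CPUCreatorRegister<MyReluCreator> _CreateOpType_ReLU(OpType_ReLU);
    // i.e. CPUBackend::addCreator(OpType_ReLU, new MyReluCreator) runs at library load time
    REGISTER_CPU_OP_CREATOR(MyReluCreator, OpType_ReLU);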

Summary

A Session contains at least one pipeline (determined by the ScheduleConfig used when the session is created; usually there is exactly one).

A pipeline contains an array of units; every unit stands for one op, and that op is executed by a concrete Execution instance. Execution implementations are device-specific: the CPU and GPU versions differ, and they will be analyzed later.

Getting the output

When the session has finished running, the results are in the output tensors and can be read directly.

Find the tensor of the desired output node:

    Tensor* Session::getOutput(const char* name) const {
        MNN_ASSERT(!mOutputs.empty());
        if (nullptr == name) {
            return mOutputs.begin()->second;
        }

        auto iter = mOutputs.find(name);
        if (iter == mOutputs.end()) {
            MNN_PRINT("Error: can't find output: %s\n", name);
            return nullptr;
        }
        return iter->second;
    }

Read the data from the tensor; for a tensor on the CPU backend, the data in its buffer can be used directly:

    auto outputTensor = interpreter->getSessionOutput(session, NULL);
    auto score = outputTensor->host<float>()[0];
    auto index = outputTensor->host<float>()[1];
    // ...

