NVIDIA NCCL 源码学习（五）- 路径计算

上节NCCL完成了对机器PCI系统拓扑的建图，其中建好的图如下所示，其中GPU之间是通过NVLink连接起来的

为了方便之后的搜索channel，接下来NCCL会先计算GPU和NIC节点到其他任意节点之间的最优路径，以及对应的带宽，即最优路径上所有边的带宽的最小值。

那么抽象一下，这个问题可以建模为给定一个无向图，每条边有一个权值，给定查询(u, v)，求节点u到节点v的路径，使得路径上的最小边的权值最大，类似无向图的最小瓶颈路，可以用生成树+LCA的方法解决；如果查询中的u是固定的，那么也可以使用类似SPFA的方法解决，将松弛方法改一下即可。

上节忘记介绍图的数据结构，这里补一下

图中的边由ncclTopoLink表示，type区分边的类型，比如NVLink，PCI；width表示带宽；remNode表示当前边连接的对端节点

最后计算出来节点之间的路径由ncclTopoLinkList表示，路径一共有count条边，这个路径的带宽是width，即count条边中带宽最小为width，list为具体的边

struct ncclTopoLink {int type;float width;struct ncclTopoNode* remNode;
};struct ncclTopoLinkList {struct ncclTopoLink* list[NCCL_TOPO_MAX_HOPS];int count;float width;int type;
};

其中type为路径的类型，一共有如下几种枚举值

#define PATH_LOC 0
#define PATH_NVL 1
#define PATH_PIX 2
#define PATH_PXB 3
#define PATH_PHB 4
#define PATH_SYS 5
#define PATH_NET 6

PATH_LOC为节点到自己，PATH_NVL表示路径上的边都是NVLink，PATH_PIX表示经过最多一个PCIe switch，PATH_PXB表示经过了多个PCIe witch，但是没有经过CPU，PATH_PHB表示经过了CPU，PATH_SYS表示不同numa之间的路径。

每个节点由ncclTopoNode表示，nlinks表示该节点有几条边，links存储了具体连接的边，paths存储了到其他节点的路径，node1中的paths[type][id]就是node1到type类型的第id个node的路径

ncclTopoNodeSet表示某种类型的所有节点，比如GPU，PCI，NIC等，ncclTopoSystem存储了全局所有类型的节点

struct ncclTopoNodeSet {int count;struct ncclTopoNode nodes[NCCL_TOPO_MAX_NODES];
};struct ncclTopoSystem {struct ncclTopoNodeSet nodes[NCCL_TOPO_NODE_TYPES];float maxWidth;
};struct ncclTopoNode {int type;int64_t id;// Type specific dataunion {struct {int dev; // NVML dev numberint rank;int cudaCompCap;int gdrSupport;}gpu;struct {uint64_t asic;int port;float width;int gdrSupport;int collSupport;int maxChannels;}net;struct {int arch;int vendor;int model;cpu_set_t affinity;}cpu;};int nlinks;struct ncclTopoLink links[NCCL_TOPO_MAX_LINKS];// Pre-computed paths to GPUs and NICsstruct ncclTopoLinkList* paths[NCCL_TOPO_NODE_TYPES];// Used during searchuint64_t used;
};

然后看下NCCL路径计算的过程，主要是这三步

  NCCLCHECK(ncclTopoComputePaths(comm->topo, comm->peerInfo));// Remove inaccessible GPUs and unused NICsNCCLCHECK(ncclTopoTrimSystem(comm->topo, comm));// Recompute paths after trimmingNCCLCHECK(ncclTopoComputePaths(comm->topo, comm->peerInfo));

其中ncclTopoComputePaths就是执行路径的计算，ncclTopoTrimSystem是删除用不到的节点，接下来详细看下

ncclResult_t ncclTopoComputePaths(struct ncclTopoSystem* system, struct ncclPeerInfo* peerInfos) {// Precompute paths between GPUs/NICs.// Remove everything in case we're re-computingfor (int t=0; t<NCCL_TOPO_NODE_TYPES; t++) ncclTopoRemovePathType(system, t);// Set direct paths from/to CPUs. We need them in many cases.for (int c=0; c<system->nodes[CPU].count; c++) {NCCLCHECK(ncclTopoSetPaths(system->nodes[CPU].nodes+c, system));}...
}

首先通过ncclTopoRemovePathType将所有node中的paths清空

ncclTopoSetPaths作用就是计算出其他所有节点到baseNode的path，这里遍历所有的CPU节点，计算出其他所有节点到所有CPU节点的路径

ncclTopoSetPaths实现类似SPFA，由于这个版本的NCCL不允许GPU作为路径的中间节点，所以在SPFA的过程中不会将GPU节点添加到队列中更新其他节点，相当于这个无向图没有环，因此这个场景下的SPFA过程也就相当于BFS

这里baseNode就是CPU节点，先分配CPU到CPU path的空间，nodeList和nextNodeList就是队列的作用，先将baseNode入队列

getPath函数是获取node中到type为t的第id个节点的路径path

static ncclResult_t getPath(struct ncclTopoSystem* system, struct ncclTopoNode* node, int t, int64_t id, struct ncclTopoLinkList** pat
h);

通过getPath获取到CPU节点到自己的path，然后设置count为0，带宽为LOC_WIDTH，type为PATH_LOC

然后每次从nodeList中拿出一个节点node，获取node到baseNode的路径path，然后用node去更新和node相连的节点，遍历node的边link，获取link对端节点remNode，获取remNode到baseNode的路径remPath，此时需要比较两个路径哪个更优，一个路径是原来的remPath，另一个是path+link这个新路径，新路径的带宽width是path和link的带宽取个min，如果width大于remPath->width，那么remPath更新为path+link

static ncclResult_t ncclTopoSetPaths(struct ncclTopoNode* baseNode, struct ncclTopoSystem* system) {if (baseNode->paths[baseNode->type] == NULL) {NCCLCHECK(ncclCalloc(baseNode->paths+baseNode->type, system->nodes[baseNode->type].count));}// breadth-first search to set all paths to that node in the systemstruct ncclTopoNodeList nodeList;struct ncclTopoNodeList nextNodeList;nodeList.count = 1; nodeList.list[0] = baseNode;nextNodeList.count = 0;struct ncclTopoLinkList* basePath;NCCLCHECK(getPath(system, baseNode, baseNode->type, baseNode->id, &basePath));basePath->count = 0;basePath->width = LOC_WIDTH;basePath->type = PATH_LOC;while (nodeList.count) {nextNodeList.count = 0;for (int n=0; n<nodeList.count; n++) {struct ncclTopoNode* node = nodeList.list[n];struct ncclTopoLinkList* path;NCCLCHECK(getPath(system, node, baseNode->type, baseNode->id, &path));for (int l=0; l<node->nlinks; l++) {struct ncclTopoLink* link = node->links+l;struct ncclTopoNode* remNode = link->remNode;if (remNode->paths[baseNode->type] == NULL) {NCCLCHECK(ncclCalloc(remNode->paths+baseNode->type, system->nodes[baseNode->type].count));}struct ncclTopoLinkList* remPath;NCCLCHECK(getPath(system, remNode, baseNode->type, baseNode->id, &remPath));float width = std::min(path->width, link->width);if (remPath->width < width) {// Find reverse linkfor (int l=0; l<remNode->nlinks; l++) {if (remNode->links[l].remNode == node) {remPath->list[0] = remNode->links+l;break;}}if (remPath->list[0] == NULL) {WARN("Failed to find reverse path from remNode %d/%lx nlinks %d to node %d/%lx",remNode->type, remNode->id, remNode->nlinks, node->type, node->id);return ncclInternalError;}// Copy the rest of the pathfor (int i=0; i<path->count; i++) remPath->list[i+1] = path->list[i];remPath->count = path->count + 1;remPath->width = width;// Start with path type = link type. PATH and LINK types are supposed to match.// Don't consider LINK_NET as we only care about the NIC->GPU path.int type = link->type == LINK_NET ? 0 : link->type;// Differentiate between one and multiple PCI switchesif (type == PATH_PIX && (node->type == PCI || link->remNode->type == PCI) && remPath->count > 3) type = PATH_PXB;// Consider a path going through the CPU as PATH_PHBif (link->type == LINK_PCI && (node->type == CPU || link->remNode->type == CPU)) type = PATH_PHB;// Ignore Power CPU in an NVLink pathif (path->type == PATH_NVL && type == PATH_SYS && link->remNode->type == CPU &&link->remNode->cpu.arch == NCCL_TOPO_CPU_ARCH_POWER) type = 0;remPath->type = std::max(path->type, type);// Add to the list for the next iteration if not already in the list// Disallow GPUs as intermediate steps for nowif (remNode->type != GPU) {int i;for (i=0; i<nextNodeList.count; i++) if (nextNodeList.list[i] == remNode) break;if (i == nextNodeList.count) nextNodeList.list[nextNodeList.count++] = remNode;}}}}memcpy(&nodeList, &nextNodeList, sizeof(nodeList));}return ncclSuccess;
}

路径更新后需要计算remPath的type，这里有个取巧的地方是上节设置边type和本节设置路径type是对应的，比如LINK_PCI等于PATH_PIX，然后可以看到之前说的各种路径的type是怎么计算出来的

首先计算当前link作为一条路径的type，初始化为link的type，比如这个边是LINK_PCI，那么就是LINK_PIX，如果remPath的count大于3的话type就会更新为PATH_PXB（但是这里有个疑问是大于3可能也跨过了两个PCIe switch），如果link有一端是CPU，那么type进一步更新为PATH_PHB，最后取个max，remPath->type = std::max(path->type, type)

#define LINK_LOC 0
#define LINK_NVL 1
#define LINK_PCI 2
// Skipping 3 for PATH_PXB
// Skipping 4 for PATH_PHB
#define LINK_SYS 5
#define LINK_NET 6#define PATH_LOC 0
#define PATH_NVL 1
#define PATH_PIX 2
#define PATH_PXB 3
#define PATH_PHB 4
#define PATH_SYS 5
#define PATH_NET 6

如果remNode不是GPU，那么将remNode添加到nextNodeList，等nodeList遍历完之后，将nextNodeList赋给nodeList继续遍历。

然后回到ncclTopoComputePaths，还是使用ncclTopoSetPaths计算GPU节点到其他所有节点的距离

ncclResult_t ncclTopoComputePaths(struct ncclTopoSystem* system, struct ncclPeerInfo* peerInfos) {...// Set direct paths from/to GPUs.for (int g=0; g<system->nodes[GPU].count; g++) {// Compute paths to GPU gNCCLCHECK(ncclTopoSetPaths(system->nodes[GPU].nodes+g, system));// Update path when we don't want to / can't use GPU Direct P2Pfor (int p=0; p<system->nodes[GPU].count; p++) {int p2p, read;NCCLCHECK(ncclTopoCheckP2p(system, system->nodes[GPU].nodes[p].id, system->nodes[GPU].nodes[g].id, &p2p, &read));if (p2p == 0) {// Divert all traffic through the CPUint cpu;NCCLCHECK(getLocalCpu(system, g, &cpu));NCCLCHECK(addCpuStep(system, cpu, GPU, p, GPU, g));}}if (peerInfos == NULL) continue;// Remove GPUs we can't talk to because of containers.struct ncclPeerInfo* dstInfo = peerInfos+system->nodes[GPU].nodes[g].gpu.rank;for (int p=0; p<system->nodes[GPU].count; p++) {if (p == g) continue;struct ncclPeerInfo* srcInfo = peerInfos+system->nodes[GPU].nodes[p].gpu.rank;int shm;NCCLCHECK(ncclTransports[TRANSPORT_SHM].canConnect(&shm, system, NULL, srcInfo, dstInfo));if (shm == 0) {// Mark this peer as inaccessible. We'll trim it later.system->nodes[GPU].nodes[p].paths[GPU][g].count = 0;}}}// Set direct paths from/to NICs.for (int n=0; n<system->nodes[NET].count; n++) {struct ncclTopoNode* netNode = system->nodes[NET].nodes+n;NCCLCHECK(ncclTopoSetPaths(netNode, system));for (int g=0; g<system->nodes[GPU].count; g++) {// Update path when we dont want to / can't use GPU Direct RDMA.int gdr;NCCLCHECK(ncclTopoCheckGdr(system, system->nodes[GPU].nodes[g].id, netNode->id, 0, &gdr));if (gdr == 0) {// We cannot use GPU Direct RDMA, divert all traffic through the CPU local to the GPUint localCpu;NCCLCHECK(getLocalCpu(system, g, &localCpu));NCCLCHECK(addCpuStep(system, localCpu, NET, n, GPU, g));NCCLCHECK(addCpuStep(system, localCpu, GPU, g, NET, n));}}}return ncclSuccess;
}

然后通过ncclTopoCheckP2p检查当前GPU节点和其他所有的GPU节点之间是否可以使用p2p通信，其实就是判断gpu1到gpu2的路径type是否满足p2pLevel的限制，默认p2pLevel是PATH_SYS，如果用户没有通过环境变量设置的话就相当于没有限制，任意gpu之间都是支持p2p通信，另外如果路径类型为PATH_NVL的话，那么还支持p2p read。

ncclResult_t ncclTopoCheckP2p(struct ncclTopoSystem* system, int64_t id1, int64_t id2, int* p2p, int *read) {*p2p = 0;*read = 0;// Get GPUs from topologyint g1, g2;NCCLCHECK(ncclTopoIdToIndex(system, GPU, id1, &g1));struct ncclTopoNode* gpu1 = system->nodes[GPU].nodes+g1;if (ncclTopoIdToIndex(system, GPU, id2, &g2) == ncclInternalError) {// GPU not found, we can't use p2p.return ncclSuccess;}struct ncclTopoLinkList* path = gpu1->paths[GPU]+g2;// In general, use P2P whenever we can.int p2pLevel = PATH_SYS;// User overrideif (ncclTopoUserP2pLevel == -1)NCCLCHECK(ncclGetLevel(&ncclTopoUserP2pLevel, "NCCL_P2P_DISABLE", "NCCL_P2P_LEVEL"));if (ncclTopoUserP2pLevel != -2) {p2pLevel = ncclTopoUserP2pLevel;goto compare;}// Don't use P2P through ARM CPUsint arch, vendor, model;NCCLCHECK(ncclTopoCpuType(system, &arch, &vendor, &model));if (arch == NCCL_TOPO_CPU_ARCH_ARM) p2pLevel = PATH_PXB;if (arch == NCCL_TOPO_CPU_ARCH_X86 && vendor == NCCL_TOPO_CPU_VENDOR_INTEL) {if (model == NCCL_TOPO_CPU_TYPE_BDW) p2pLevel = PATH_PXB;else p2pLevel = PATH_PHB;}compare:// Compute the PCI distance and compare with the p2pLevel.if (path->type <= p2pLevel) *p2p = 1;if (path->type == PATH_NVL) {struct ncclTopoNode* gpu2 = system->nodes[GPU].nodes+g2;// Enable P2P Read for Ampere/NVLink onlyif ((gpu1->gpu.cudaCompCap == gpu2->gpu.cudaCompCap) && (gpu1->gpu.cudaCompCap == 80)) *read = 1;}return ncclSuccess;
}

然后判断当前GPU和其他GPU是否可以通过shm通信，因为在docker环境中如果shm挂载的不一样就无法通信，如果无法通过shm通信的话就将path的count设置为0，之后会删除掉对应节点（但是这里有个疑问，shm不通的话为什么没有继续判断p2p是否可用）。

最后类似GPU，然后对所有的NIC执行ncclTopoSetPaths计算出路径，然后遍历每个NIC和每个GPU，判断是否支持gdr。

ncclResult_t ncclTopoCheckGdr(struct ncclTopoSystem* system, int64_t busId, int netDev, int read, int* useGdr) {*useGdr = 0;// Get GPU and NETint n, g;NCCLCHECK(ncclTopoIdToIndex(system, NET, netDev, &n));struct ncclTopoNode* net = system->nodes[NET].nodes+n;NCCLCHECK(ncclTopoIdToIndex(system, GPU, busId, &g));struct ncclTopoNode* gpu = system->nodes[GPU].nodes+g;// Check that both the NIC and GPUs support itif (net->net.gdrSupport == 0) return ncclSuccess;if (gpu->gpu.gdrSupport == 0) return ncclSuccess;if (read) { // For reads (sends) only enable under certain conditionsint gdrReadParam = ncclParamNetGdrRead();if (gdrReadParam == 0) return ncclSuccess;if (gdrReadParam < 0) {int nvlink = 0;// Since we don't know whether there are other communicators,// it's better to keep things local if we have a single GPU.if (system->nodes[GPU].count == 1) nvlink = 1;for (int i=0; i<system->nodes[GPU].count; i++) {if (i == g) continue;if (gpu->paths[GPU][i].type == PATH_NVL) {nvlink = 1;break;}}if (!nvlink) return ncclSuccess;}}// Check if we are close enough that it makes sense to enable GDRint netGdrLevel = PATH_PXB;NCCLCHECK(ncclGetLevel(&ncclTopoUserGdrLevel, NULL, "NCCL_NET_GDR_LEVEL"));if (ncclTopoUserGdrLevel != -2) netGdrLevel = ncclTopoUserGdrLevel;int distance = gpu->paths[NET][n].type;if (distance > netGdrLevel) {INFO(NCCL_NET,"GPU Direct RDMA Disabled for GPU %lx / HCA %d (distance %d > %d)", busId, netDev, distance, netGdrLevel);return ncclSuccess;}*useGdr = 1;INFO(NCCL_NET,"GPU Direct RDMA Enabled for GPU %lx / HCA %d (distance %d <= %d), read %d", busId, netDev, distance, netGdrLevel, read);return ncclSuccess;
}

这里除了看之前判断是否支持gdr之外，还要看GPU和NIC之间的距离是否小于netGdrLevel，netGdrLevel默认是PATH_PXB，用户也可以自定义，默认值为PXB的原因可见官方文档：

Even though the only theoretical requirement for GPUDirect RDMA to work between a third-party device and an NVIDIA GPU is that they share the same root complex, there exist bugs (mostly in chipsets) causing it to perform badly, or not work at all in certain setups.

We can distinguish between three situations, depending on what is on the path between the GPU and the third-party device:

PCIe switches only

single CPU/IOH

CPU/IOH <-> QPI/HT <-> CPU/IOH

The first situation, where there are only PCIe switches on the path, is optimal and yields the best performance. The second one, where a single CPU/IOH is involved, works, but yields worse performance ( especially peer-to-peer read bandwidth has been shown to be severely limited on some processor architectures ). Finally, the third situation, where the path traverses a QPI/HT link, may be extremely performance-limited or even not work reliably.

可以看到在只有经过PCIe switch的时候性能最好，在经过CPU的时候性能较差，在跨numa的时候性能很差，甚至不可用。

当p2p或者gdr不支持的时候，会通过CPU进行中转，通过getLocalCpu找到最近的CPU c。


static ncclResult_t getLocalCpu(struct ncclTopoSystem* system, int gpu, int* retCpu) {// Find the closest CPU to a GPUint minHops = 0;int localCpu = -1; struct ncclTopoLinkList* paths = system->nodes[GPU].nodes[gpu].paths[CPU];for (int c=0; c<system->nodes[CPU].count; c++) {int hops = paths[c].count;if (minHops == 0 || hops < minHops) {localCpu = c;minHops = hops;}   }if (localCpu == -1) {WARN("Error : could not find CPU close to GPU %d", gpu);return ncclInternalError;}*retCpu = localCpu;return ncclSuccess;
}

然后addCpuStep将 i1 到 i2 的路径修改为 i1 到 c 的路径 + cpu到 i2 的路径。

static ncclResult_t addCpuStep(struct ncclTopoSystem* system, int c, int t1, int i1, int t2, int i2) {struct ncclTopoNode* cpuNode = system->nodes[CPU].nodes+c;struct ncclTopoNode* srcNode = system->nodes[t1].nodes+i1;int l=0;// Node 1 -> CPUfor (int i=0; i<srcNode->paths[CPU][c].count; i++) srcNode->paths[t2][i2].list[l++] = srcNode->paths[CPU][c].list[i];// CPU -> Node 2for (int i=0; i<cpuNode->paths[t2][i2].count; i++) srcNode->paths[t2][i2].list[l++] = cpuNode->paths[t2][i2].list[i];// Update path characteristicssrcNode->paths[t2][i2].count = l;srcNode->paths[t2][i2].type = std::max(srcNode->paths[CPU][c].type, cpuNode->paths[t2][i2].type);srcNode->paths[t2][i2].width = std::min(srcNode->paths[CPU][c].width, cpuNode->paths[t2][i2].width);return ncclSuccess;
}

到这里ncclTopoComputePaths就完成了，接下来会通过ncclTopoTrimSystem删除图中不可达的GPU节点和用不到的NIC。

ncclResult_t ncclTopoTrimSystem(struct ncclTopoSystem* system, struct ncclComm* comm) {int *domains;int64_t *ids;NCCLCHECK(ncclCalloc(&domains, system->nodes[GPU].count));NCCLCHECK(ncclCalloc(&ids, system->nodes[GPU].count));int myDomain = 0;for (int g=0; g<system->nodes[GPU].count; g++) {struct ncclTopoNode* gpu = system->nodes[GPU].nodes+g;domains[g] = g;ids[g] = gpu->id;for (int p=0; p<g; p++) {if (gpu->paths[GPU][p].count > 0) {domains[g] = std::min(domains[g], domains[p]);}}if (gpu->gpu.rank == comm->rank) myDomain = domains[g];}int ngpus = system->nodes[GPU].count;for (int i=0; i<ngpus; i++) {if (domains[i] == myDomain) continue;struct ncclTopoNode* gpu = NULL;int g;for (g=0; g<system->nodes[GPU].count /* This one varies over the loops */; g++) {gpu = system->nodes[GPU].nodes+g;if (gpu->id == ids[i]) break; else gpu=NULL;}if (gpu == NULL) {WARN("Could not find id %lx", ids[i]);free(domains);free(ids);return ncclInternalError;}NCCLCHECK(ncclTopoRemoveNode(system, GPU, g));}comm->localRanks = system->nodes[GPU].count;if (system->nodes[GPU].count == comm->nRanks) {for (int n=system->nodes[NET].count-1; n>=0; n--)NCCLCHECK(ncclTopoRemoveNode(system, NET, n));}free(domains);free(ids);return ncclSuccess;
}

首先通过类似并查集的思路将多个GPU节点合并成多个集合，myDomain为当前rank的GPU所对应的集合号，然后将不属于myDomain集合的GPU节点在图中删除掉，最后判断下如果comm的rank数等于当前图中的gpu节点数，那么说明不需要网卡，所以也将网卡从图中删除。

得到新的图结构后再重新执行一次ncclTopoComputePaths就得到最终各个节点之间的路径了。

NVIDIA NCCL 源码学习（五）- 路径计算相关推荐

NVIDIA NCCL 源码学习（九）- 单机内ncclSend和ncclRecv的过程
上节介绍了通信链路的建立过程,本节介绍下单机内部ncclSend和ncclRecv的运行过程. 单机内的通信都是通过kernel来进行的,所以整个通信的过程可以分为两步,第一步是准备kernel相关的 ...
NVIDIA NCCL 源码学习（六）- channel搜索
上节讲到我们已经计算出GPU和NIC节点到其他任意节点的最优路径了,本节看下nccl中channel的搜索过程. nccl中channel的概念表示一个通信路径,为了更好的利用带宽和网卡,以及同一块数 ...
NVIDIA NCCL 源码学习（八）- 数据通信链路transport的建立
上节以ringGraph为例介绍了机器间channel的连接过程,现在环里每个rank都知道了从哪个rank接收数据以及将数据发送给哪个rank,本节具体介绍下P2P和rdma NET场景下数据通信链 ...
NVIDIA NCCL 源码学习（一）- 初始化及ncclUniqueId的产生
NCCL是英伟达开源的GPU通信库,支持集合通信和点对点通信看下官方给的一个demo #include <stdio.h> #include "cuda_runtime.h&q ...
NVIDIA NCCL 源码学习（七）- 机器间channel连接
上节中完成了单机内部的channel搜索,仍然以ringGraph为例的话,相当于在单台机器内部搜索出来了一系列的环,接下来需要将机器之间的环连接起来. 为了方便理解假设两机十六卡的情况下第一台机器的 ...
NVIDIA NCCL 源码学习（四）- 建图过程
上次分析到nccl对机器PCI系统进行拓扑分析的过程,产出的结果为xml格式,接下来,nccl会根据这个xml进图的建立过程以便之后进行路径搜索. ncclTopoGetSystem的最后会执行ncc ...
NVIDIA NCCL 源码学习（二）- bootstrap网络连接的建立
上次介绍到rank0的机器生成了ncclUniqueId,并完成了机器的bootstrap网络和通信网络的初始化,这节接着看下所有节点间bootstrap的连接是如何建立的 rank0节点执行nccl ...
OptaPlanner源码学习-VRPTW问题计算得分
问题定义车辆路线规划问题是一个经典的组合优化问题,也是旅行商问题的泛化.该问题的定义为: 有给定数量的客户有运输需求: 为从某个固定地点出发和返回的车辆寻找一个最优路线试的可以服务所有的客户车辆路 ...
jieba源码学习------TF-IDF方法计算词权重
# encoding=utf-8 from __future__ import absolute_import import os import jieba import jieba.posseg f ...

NVIDIA NCCL 源码学习（五）- 路径计算

NVIDIA NCCL 源码学习（五）- 路径计算相关推荐

最新文章

热门文章