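The RAPIDS cuGraph library is a collection of graph analytics that process data stored in GPU DataFrames (see cuDF). cuGraph aims to provide a NetworkX-like API that data scientists already know, so they can build GPU-accelerated workflows more easily.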

Official documentation:
rapidsai/cugraph
cuGraph API Reference

Supported models:

Related articles:

nvidia-rapids︱cuDF: a pandas-like DataFrame library
NVIDIA's Python GPU algorithm ecosystem︱RAPIDS 0.10
nvidia-rapids︱cuML: a machine learning acceleration library
nvidia-rapids︱cuGraph (NetworkX-like) graph analytics

Contents

1 Installation and background
- 1.1 Installation
- 1.2 Background
2 A simple demo
3 PageRank

1 Installation and background

1.1 Installation

Install with Conda (see https://github.com/rapidsai/cugraph):

# CUDA 10.0
conda install -c nvidia -c rapidsai -c numba -c conda-forge -c defaults cugraph cudatoolkit=10.0

# CUDA 10.1
conda install -c nvidia -c rapidsai -c numba -c conda-forge -c defaults cugraph cudatoolkit=10.1

# CUDA 10.2
conda install -c nvidia -c rapidsai -c numba -c conda-forge -c defaults cugraph cudatoolkit=10.2

For the Docker image, see: https://rapids.ai/start.html#prerequisites

docker pull rapidsai/rapidsai:cuda10.1-runtime-ubuntu16.04-py3.7
docker run --gpus all --rm -it -p 8888:8888 -p 8787:8787 -p 8786:8786 \
    rapidsai/rapidsai:cuda10.1-runtime-ubuntu16.04-py3.7

1.2 Background

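cuGraph has taken a new step toward integrating leading graph frameworks behind a single, easy-to-use interface. A few months ago, RAPIDS received a copy of Hornet from the Georgia Institute of Technology, refactored it, and renamed it cuHornet. The name change signals that the source code has diverged from the Georgia Tech baseline and that its API and data structures have been adapted to match RAPIDS cuGraph. cuHornet adds a frontier-based programming model, dynamic data structures, and a set of existing analytics. Besides the core number functions, the first two cuHornet algorithms being made available are Katz centrality and K-Cores.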

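cuGraph is the graph analytics library of RAPIDS. For cuGraph we introduced a multi-GPU PageRank algorithm backed by two new primitives: a multi-GPU COO-to-CSR data converter and a function that computes vertex degrees. These primitives convert the source and destination edge columns of a Dask DataFrame into graph format, which lets PageRank scale across multiple GPUs.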
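The two primitives described above can be illustrated on a single CPU. The sketch below is a NumPy illustration only, not cuGraph's multi-GPU implementation: it counts each vertex's out-degree from a COO edge list and uses the prefix sum of those degrees as the CSR row offsets. The function name `coo_to_csr` is hypothetical.

```python
import numpy as np


def coo_to_csr(src, dst, num_vertices):
    """Convert a COO edge list (src[i] -> dst[i]) into CSR arrays.

    Returns (offsets, indices, out_degree); the neighbours of vertex v
    are indices[offsets[v]:offsets[v + 1]].
    """
    src = np.asarray(src, dtype=np.int64)
    dst = np.asarray(dst, dtype=np.int64)
    # vertex-degree primitive: out-degree = number of edges leaving each vertex
    out_degree = np.bincount(src, minlength=num_vertices)
    # row offsets are the exclusive prefix sum of the out-degrees
    offsets = np.zeros(num_vertices + 1, dtype=np.int64)
    offsets[1:] = np.cumsum(out_degree)
    # a stable sort by source groups each vertex's destinations together
    order = np.argsort(src, kind="stable")
    indices = dst[order]
    return offsets, indices, out_degree


# usage: a 4-vertex graph whose edges are not sorted by source
offsets, indices, deg = coo_to_csr([2, 0, 1, 0, 2, 2], [0, 1, 2, 2, 1, 3], 4)
print(offsets, indices, deg)
```

In the multi-GPU setting each GPU would hold a slice of the edge columns, so the counting and prefix sum need a cross-GPU step; this sketch deliberately leaves that part out.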

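The figure below shows the performance of the new multi-GPU PageRank algorithm. Unlike previous PageRank benchmark runtimes, which measured only the performance of the PageRank solver, these runtimes also include the Dask DataFrame-to-CSR conversion, the PageRank execution, and the conversion of the results from CSR back to a DataFrame. On average, the new multi-GPU PageRank analytic is more than 10x faster than a 100-node Spark cluster.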

Figure 1: Time taken by cuGraph PageRank to compute over varying numbers of edges on NVIDIA Tesla V100

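The next chart focuses solely on the Bigdata dataset (50 million vertices and 1.98 billion edges) and runs the HiBench end-to-end test. The HiBench benchmark runtime covers reading the data, running PageRank, and then retrieving the scores of all vertices. HiBench was previously run on Google GCP with 10, 20, 50, and 100 nodes.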

Figure 2: End-to-end PageRank runtime on 50 million edges, cuGraph PageRank vs. Spark Graph (lower is better)

2 A simple demo

Reference: https://github.com/rapidsai/cugraph

import cudf
import cugraph

# assuming that the data is loaded into a cuDF DataFrame (using read_csv)
gdf = cudf.read_csv("graph_data.csv", names=["src", "dst"], dtype=["int32", "int32"])

# create a Graph using the source (src) and destination (dst) vertex pairs in the DataFrame
G = cugraph.Graph()
G.from_cudf_edgelist(gdf, source='src', destination='dst')

# call cugraph.pagerank to get the PageRank scores
gdf_page = cugraph.pagerank(G)

# copy the (small) result to host memory before iterating over it row by row
for vertex, score in gdf_page.to_pandas()[['vertex', 'pagerank']].itertuples(index=False):
    print("vertex " + str(vertex) + " PageRank is " + str(score))

3 PageRank

cugraph.pagerank(G, alpha=0.85, max_iter=100, tol=1.0e-5)

G: cugraph.Graph object.
alpha: float. The damping factor, i.e. the probability of following an outgoing edge. Default is 0.85.
max_iter: int. The maximum number of iterations before an answer is returned. This can be used to limit execution time and exit early, before the solver reaches the convergence tolerance. If this value is less than or equal to 0, cuGraph uses the default value, 100.
tol: float. The tolerance of the approximation; it should be a small-magnitude value. The lower the tolerance, the better the approximation. If this value is 0.0f, cuGraph uses the default value, 0.00001. Setting the tolerance too small can lead to non-convergence due to numerical round-off. Values between 0.01 and 0.00001 are usually acceptable.

Returns:

df: a cudf.DataFrame object with two columns:
- df['vertex']: The vertex identifier
- df['pagerank']: The PageRank score of the vertex
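To show what `alpha`, `max_iter`, and `tol` control, here is a small CPU reference of PageRank by power iteration. It is a NumPy sketch, not cuGraph's GPU solver. Several details are assumptions: dangling vertices (no out-edges) spread their score uniformly, convergence is declared when the L1 change falls below `n * tol` (the criterion NetworkX uses), and the defaulting rules are copied from the parameter list above. The name `pagerank_power` is hypothetical.

```python
import numpy as np


def pagerank_power(src, dst, num_vertices, alpha=0.85, max_iter=100, tol=1.0e-5):
    """Reference PageRank by power iteration on a directed edge list.

    Returns (vertex, pagerank) arrays, mirroring the two result columns.
    """
    src = np.asarray(src, dtype=np.int64)
    dst = np.asarray(dst, dtype=np.int64)
    n = num_vertices
    # defaulting rules as described for max_iter and tol above
    if max_iter <= 0:
        max_iter = 100
    if tol == 0.0:
        tol = 1.0e-5
    out_degree = np.bincount(src, minlength=n).astype(np.float64)
    dangling = out_degree == 0
    x = np.full(n, 1.0 / n)  # start from a uniform distribution
    for _ in range(max_iter):
        # each edge u -> v carries x[u] / out_degree[u] to v
        contrib = np.zeros(n)
        np.add.at(contrib, dst, x[src] / out_degree[src])
        # follow an edge with probability alpha, otherwise teleport uniformly
        x_new = alpha * (contrib + x[dangling].sum() / n) + (1.0 - alpha) / n
        converged = np.abs(x_new - x).sum() < n * tol  # assumed L1 criterion
        x = x_new
        if converged:
            break
    return np.arange(n), x


# usage: three leaves pointing at a hub (vertex 0, which is dangling)
vertex, pr = pagerank_power([1, 2, 3], [0, 0, 0], 4)
print(dict(zip(vertex.tolist(), pr.round(4).tolist())))
```

A smaller `tol` means more iterations before the L1 change drops below the threshold, and `max_iter` caps that count, which is the early-exit behaviour described for cuGraph.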

Installation:

# The notebook compares cuGraph to NetworkX,
# so some additional non-RAPIDS Python libraries need to be installed.
# Run this cell if you need the additional libraries.
!pip install networkx
!pip install scipy

Code:

# Import needed libraries
import cugraph
import cudf
from collections import OrderedDict

# NetworkX libraries
import networkx as nx
from scipy.io import mmread

# define the parameters
max_iter = 100  # The maximum number of iterations
tol = 0.00001   # tolerance
alpha = 0.85    # alpha

# Define the path to the test data
datafile = '../data/karate-data.csv'

# NetworkX
# Read the data; this also creates a NetworkX Graph.
# nodetype=int keeps vertex IDs as integers, matching the int32 IDs cuGraph reads below
Gnx = nx.read_edgelist(datafile, nodetype=int)
pr_nx = nx.pagerank(Gnx, alpha=alpha, max_iter=max_iter, tol=tol)

cuGraph version:

# cuGraph
# Read the data
gdf = cudf.read_csv(datafile, names=["src", "dst"], delimiter='\t', dtype=["int32", "int32"])

# create a Graph using the source (src) and destination (dst) vertex pairs from the DataFrame
G = cugraph.Graph()
G.from_cudf_edgelist(gdf, source='src', destination='dst')

# Call cugraph.pagerank to get the PageRank scores
gdf_page = cugraph.pagerank(G)

# Find the most important vertex using the scores
# This method should only be used for small graphs
bestScore = gdf_page['pagerank'][0]
bestVert = gdf_page['vertex'][0]
for i in range(len(gdf_page)):
    if gdf_page['pagerank'][i] > bestScore:
        bestScore = gdf_page['pagerank'][i]
        bestVert = gdf_page['vertex'][i]

print("Best vertex is " + str(bestVert) + " with score of " + str(bestScore))

# A better way is to find the max and then use that value in a query
pr_max = gdf_page['pagerank'].max()

def print_pagerank_threshold(_df, t=0):
    filtered = _df.query('pagerank >= @t')
    # query() keeps the original index labels, so iterate over the rows
    # (copied to host) instead of indexing positions 0..len-1
    for vertex, score in filtered.to_pandas()[['vertex', 'pagerank']].itertuples(index=False):
        print("Best vertex is " + str(vertex) + " with score of " + str(score))

print_pagerank_threshold(gdf_page, pr_max)

sort_pr = gdf_page.sort_values('pagerank', ascending=False)

# vertex degrees, highest out-degree first
d = G.degrees()
d.sort_values('out_degree', ascending=False).head(4)
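cuDF mirrors much of the pandas API, so vectorized alternatives to the element-by-element loop above can be prototyped on the CPU with pandas. The sketch below uses pandas and a hand-made toy score table in the same `vertex`/`pagerank` layout (hypothetical data, not real PageRank output); `top_vertices` and `vertices_at_threshold` are hypothetical helper names. cuDF DataFrames also offer `nlargest`, but check the cuDF API reference for your version before relying on it.

```python
import pandas as pd


def top_vertices(scores: pd.DataFrame, k: int = 1) -> pd.DataFrame:
    """Return the k rows with the highest 'pagerank' values, best first."""
    return scores.nlargest(k, "pagerank")


def vertices_at_threshold(scores: pd.DataFrame, t: float) -> pd.DataFrame:
    """Vectorized counterpart of print_pagerank_threshold: filter instead of looping."""
    return scores[scores["pagerank"] >= t].reset_index(drop=True)


# toy scores (hypothetical) in the two-column layout returned by cugraph.pagerank
scores = pd.DataFrame({"vertex": [1, 2, 3, 4], "pagerank": [0.1, 0.4, 0.2, 0.3]})
best = top_vertices(scores).iloc[0]
print("Best vertex is", int(best["vertex"]), "with score of", best["pagerank"])
```

Both helpers avoid a Python-level loop over rows, which is what makes the loop-based version unsuitable for large graphs.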
