Python Clustering Coefficient: Computing the Clustering Coefficient with NetworkX in Python

For a first-hand introduction to the clustering coefficient, see the original small-world networks paper [1]:

The clustering coefficient C(p) is defined as follows. Suppose that a vertex v has k_v neighbours; then at most k_v(k_v − 1)/2 edges can exist between them (this occurs when every neighbour of v is connected to every other neighbour of v). Let C_v denote the fraction of these allowable edges that actually exist. Define C as the average of C_v over all v. For friendship networks, these statistics have intuitive meanings: L is the average number of friendships in the shortest chain connecting two people; C_v reflects the extent to which friends of v are also friends of each other; and thus C measures the cliquishness of a typical friendship circle.

The data shown in the figure are averages over 20 random realizations of the rewiring process described in Fig. 1, and have been normalized by the values L(0), C(0) for a regular lattice. All the graphs have n = 1,000 vertices and an average degree of k = 10 edges per vertex. We note that a logarithmic horizontal scale has been used to resolve the rapid drop in L(p), corresponding to the onset of the small-world phenomenon. During this drop, C(p) remains almost constant at its value for the regular lattice, indicating that the transition to a small world is almost undetectable at the local level.

In plain terms:

Definition of the clustering coefficient: the fraction of all paths of length 2 in the network that are closed [2].
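As a minimal sketch of this definition, the snippet below counts every path of length 2 (u - v - w with u != w) and checks whether its two endpoints are adjacent. The helper name closed_path_fraction is made up for illustration, and the graph is assumed to be simple and undirected (no self-loops, no parallel edges). The result is compared with nx.transitivity, which computes the same ratio.

```python
import networkx as nx

def closed_path_fraction(G):
    """Fraction of length-2 paths u - v - w whose endpoints u and w are adjacent."""
    paths = 0
    closed = 0
    for v in G:                          # v is the middle node of the path
        nbrs = list(G[v])
        for i, u in enumerate(nbrs):
            for w in nbrs[i + 1:]:       # each unordered pair of neighbors once
                paths += 1
                if G.has_edge(u, w):
                    closed += 1
    return closed / paths if paths else 0.0

# A triangle 0-1-2 with one extra node 3 hanging off node 2.
G = nx.Graph([(0, 1), (0, 2), (1, 2), (2, 3)])
fraction = closed_path_fraction(G)
print(fraction)
print(nx.transitivity(G))
```

In this graph there are 5 paths of length 2 and 3 of them are closed, so both lines print the same value.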

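The clustering coefficient of a node measures the average probability that two neighbors of that node are also neighbors of each other [2]. In other words, taking the node as the center, it is the number of edges that actually exist among its neighbors divided by the total number of neighbor pairs. The numerator counts only pairs joined by an edge; the denominator counts every pair, joined or not. For a node of degree k there are k(k − 1)/2 unordered pairs, or equivalently k(k − 1) ordered pairs if each edge in the numerator is counted once per direction.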

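The clustering coefficient of a node [2, 3] is defined as:

\[C_i = \frac{\text{number of directly connected pairs among the neighbors of node } i}{\text{total number of neighbor pairs of node } i}\]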
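To make the fraction concrete, here is a small sketch that evaluates it literally with unordered neighbor pairs. The function name node_clustering and the example graph are illustrative assumptions; the graph is assumed to be simple and undirected.

```python
from itertools import combinations
import networkx as nx

def node_clustering(G, i):
    """C_i = (connected neighbor pairs) / (all neighbor pairs) for node i."""
    pairs = list(combinations(G[i], 2))      # every unordered pair of neighbors
    if not pairs:                            # degree 0 or 1: defined as 0
        return 0.0
    linked = sum(1 for a, b in pairs if G.has_edge(a, b))
    return linked / len(pairs)

# Node 0 has neighbors 1, 2, 3; only the pair (1, 2) is connected.
G = nx.Graph([(0, 1), (0, 2), (0, 3), (1, 2)])
c0 = node_clustering(G, 0)
print(c0)                    # 1 connected pair out of 3
print(nx.clustering(G, 0))   # library value for comparison
```

Node 3 has a single neighbor, so there are no pairs to test and the sketch returns 0 for it, matching the convention that the coefficient is 0 when the degree is below 2.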

It can be computed in Python as follows:

Method 1:

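def cal_ci(G, u):
    """Compute the clustering coefficient of node u in an undirected graph G.

    Each edge between two neighbors of u is counted twice (once per
    direction), which matches the denominator k * (k - 1).
    """
    k = G.degree[u]
    if k < 2:
        return 0
    links = sum(1 for x in G[u] for y in G[x] if y in G[u])
    return links / (k * (k - 1))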
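A quick way to gain confidence in the ordered-pair counting is to compare it against nx.clustering on a random graph. The sketch below restates the idea under a hypothetical name, ordered_pair_clustering, so that it runs on its own; the random graph, its size, and the seed are arbitrary choices, and the check assumes a simple undirected graph without self-loops.

```python
import networkx as nx

def ordered_pair_clustering(G, u):
    """Hypothetical helper: ordered-pair count divided by k * (k - 1)."""
    k = G.degree[u]
    if k < 2:
        return 0.0
    # Every edge x - y among the neighbors is seen twice: as (x, y) and as (y, x).
    links = sum(1 for x in G[u] for y in G[x] if y in G[u])
    return links / (k * (k - 1))

G = nx.gnp_random_graph(30, 0.2, seed=1)   # simple graph, no self-loops
reference = nx.clustering(G)               # dict: node -> coefficient
max_diff = max(abs(ordered_pair_clustering(G, u) - reference[u]) for u in G)
print(max_diff)
```

If the graph contained self-loops, the node itself would appear in its own neighbor set and the hand-written count would no longer agree with the library, which ignores self-loops.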

Method 2:

def get_ci(G, z):
    """Return the clustering coefficient of node z."""
    k = len(G[z])
    if k < 2:
        return 0
    # Ordered pairs (u, v) of distinct neighbors that are joined by an edge.
    links = sum(1 for u in G[z] for v in G[z] if u != v and (u, v) in G.edges())
    return links / (k * (k - 1))

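Note: one might wonder whether (u, v) in G.edges() can give a wrong answer, for example when the edge is listed as (v, u). Although G.edges() prints like a list, it is actually a NetworkX built-in type whose membership test ignores the order of the endpoints in an undirected graph. The following code verifies this: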

import networkx as nx

G = nx.path_graph(5)
print(G.edges())
print(type(G.edges()))

# Membership test on the NetworkX edge view vs. on a plain list of tuples.
a = (3, 2) in G.edges()
b = (3, 2) in [(0, 1), (1, 2), (2, 3), (3, 4)]
print(a)
print(b)

Output:

[(0, 1), (1, 2), (2, 3), (3, 4)]
<class 'networkx.classes.reportviews.EdgeView'>
True
False
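The order-insensitive membership test only holds for undirected graphs. The following sketch, which assumes NetworkX 2.x or later, contrasts an undirected path with a directed one and shows G.has_edge as an equivalent, more explicit spelling.

```python
import networkx as nx

G = nx.path_graph(5)                             # undirected 0 - 1 - 2 - 3 - 4
D = nx.path_graph(5, create_using=nx.DiGraph)    # directed 0 -> 1 -> 2 -> 3 -> 4

undirected_reversed = (3, 2) in G.edges()   # order of endpoints is ignored
undirected_has_edge = G.has_edge(3, 2)      # same test, spelled explicitly
directed_reversed = (3, 2) in D.edges()     # only 2 -> 3 exists, not 3 -> 2
directed_forward = (2, 3) in D.edges()

print(undirected_reversed, undirected_has_edge)
print(directed_reversed, directed_forward)
```

For the node-level functions above this means they are only meaningful as written on undirected graphs; a directed graph needs the separate definition described in the NetworkX docstring.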

Method 3: the function provided by the NetworkX library:

nx.clustering(G,u)

Its source code:

def clustering(G, nodes=None, weight=None):
    r"""Compute the clustering coefficient for nodes.

    For unweighted graphs, the clustering of a node :math:`u`
    is the fraction of possible triangles through that node that exist,

    .. math::

      c_u = \frac{2 T(u)}{deg(u)(deg(u)-1)},

    where :math:`T(u)` is the number of triangles through node :math:`u` and
    :math:`deg(u)` is the degree of :math:`u`.

    For weighted graphs, there are several ways to define clustering [1]_.
    the one used here is defined
    as the geometric average of the subgraph edge weights [2]_,

    .. math::

       c_u = \frac{1}{deg(u)(deg(u)-1))}
             \sum_{vw} (\hat{w}_{uv} \hat{w}_{uw} \hat{w}_{vw})^{1/3}.

    The edge weights :math:`\hat{w}_{uv}` are normalized by the maximum weight
    in the network :math:`\hat{w}_{uv} = w_{uv}/\max(w)`.

    The value of :math:`c_u` is assigned to 0 if :math:`deg(u) < 2`.

    For directed graphs, the clustering is similarly defined as the fraction
    of all possible directed triangles or geometric average of the subgraph
    edge weights for unweighted and weighted directed graph respectively [3]_.

    .. math::

       c_u = \frac{1}{deg^{tot}(u)(deg^{tot}(u)-1) - 2deg^{\leftrightarrow}(u)}
             T(u),

    where :math:`T(u)` is the number of directed triangles through node
    :math:`u`, :math:`deg^{tot}(u)` is the sum of in degree and out degree of
    :math:`u` and :math:`deg^{\leftrightarrow}(u)` is the reciprocal degree of
    :math:`u`.

    Parameters
    ----------
    G : graph

    nodes : container of nodes, optional (default=all nodes in G)
       Compute clustering for nodes in this container.

    weight : string or None, optional (default=None)
       The edge attribute that holds the numerical value used as a weight.
       If None, then each edge has weight 1.

    Returns
    -------
    out : float, or dictionary
       Clustering coefficient at specified nodes

    Examples
    --------
    >>> G=nx.complete_graph(5)
    >>> print(nx.clustering(G,0))
    1.0
    >>> print(nx.clustering(G))
    {0: 1.0, 1: 1.0, 2: 1.0, 3: 1.0, 4: 1.0}

    Notes
    -----
    Self loops are ignored.

    References
    ----------
    .. [1] Generalizations of the clustering coefficient to weighted
       complex networks by J. Saramäki, M. Kivelä, J.-P. Onnela,
       K. Kaski, and J. Kertész, Physical Review E, 75 027105 (2007).
       http://jponnela.com/web_documents/a9.pdf
    .. [2] Intensity and coherence of motifs in weighted complex
       networks by J. P. Onnela, J. Saramäki, J. Kertész, and K. Kaski,
       Physical Review E, 71(6), 065103 (2005).
    .. [3] Clustering in complex directed networks by G. Fagiolo,
       Physical Review E, 76(2), 026107 (2007).
    """
    if G.is_directed():
        if weight is not None:
            td_iter = _directed_weighted_triangles_and_degree_iter(
                G, nodes, weight)
            clusterc = {v: 0 if t == 0 else t / ((dt * (dt - 1) - 2 * db) * 2)
                        for v, dt, db, t in td_iter}
        else:
            td_iter = _directed_triangles_and_degree_iter(G, nodes)
            clusterc = {v: 0 if t == 0 else t / ((dt * (dt - 1) - 2 * db) * 2)
                        for v, dt, db, t in td_iter}
    else:
        if weight is not None:
            td_iter = _weighted_triangles_and_degree_iter(G, nodes, weight)
            clusterc = {v: 0 if t == 0 else t / (d * (d - 1)) for
                        v, d, t in td_iter}
        else:
            td_iter = _triangles_and_degree_iter(G, nodes)
            clusterc = {v: 0 if t == 0 else t / (d * (d - 1)) for
                        v, d, t, _ in td_iter}
    if nodes in G:
        # Return the value of the sole entry in the dictionary.
        return clusterc[nodes]
    return clusterc
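The following usage sketch exercises the three calling styles described in the docstring (a single node, a container of nodes, and all nodes) on an unweighted undirected graph, and re-derives the value for one node from the formula c_u = 2 T(u) / (deg(u)(deg(u) − 1)) using nx.triangles. The example graph and the variable names are illustrative assumptions.

```python
import networkx as nx

# A triangle 0-1-2 with one extra node 3 hanging off node 2.
G = nx.Graph([(0, 1), (0, 2), (1, 2), (2, 3)])

c2 = nx.clustering(G, 2)            # single node -> a single number
some = nx.clustering(G, [0, 3])     # container of nodes -> dict
everything = nx.clustering(G)       # default -> dict over all nodes

# Re-derive the value for node 2 from the docstring formula.
T = nx.triangles(G, 2)              # number of triangles through node 2
d = G.degree[2]
by_formula = 2 * T / (d * (d - 1))

print(c2, by_formula)
print(some)
print(everything)
```

Node 3 has degree 1, so its coefficient is 0, in line with the docstring's rule for deg(u) < 2.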

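The clustering coefficient of a whole network is defined as three times the number of triangles in the network divided by the number of connected triples. In practice, however, it is most often computed as the average of the clustering coefficients of all nodes. The two quantities generally give different values.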
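NetworkX exposes both network-level quantities: nx.transitivity for the triangle-based ratio and nx.average_clustering for the mean of the node coefficients. The sketch below compares them on a small hand-picked graph (an illustrative assumption) in which the two disagree.

```python
import networkx as nx

# A triangle 0-1-2, plus two degree-1 nodes (3 and 4) attached to node 2.
G = nx.Graph([(0, 1), (0, 2), (1, 2), (2, 3), (2, 4)])

# 3 * (number of triangles) / (number of connected triples)
global_c = nx.transitivity(G)

# Mean of the per-node coefficients: nodes 0 and 1 give 1, node 2 gives 1/6,
# and nodes 3 and 4 give 0.
average_c = nx.average_clustering(G)

print(global_c)
print(average_c)
```

Here the graph has one triangle and eight connected triples, so the triangle-based ratio is 3/8, while the node average is (1 + 1 + 1/6) / 5. High-degree nodes weigh more heavily in the first quantity, which is why the two can diverge.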

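References:

[1] D. J. Watts and S. H. Strogatz, “Collective dynamics of ‘small-world’ networks,” Nature, vol. 393, no. 6684, pp. 440–442, Jun. 1998.

[2] M. E. J. Newman, 网络科学引论 (Networks: An Introduction, Chinese edition), 2014.

[3] X. Wang, X. Li, and G. Chen, 网络科学导论 (Introduction to Network Science), Higher Education Press (高等教育出版社), 2012.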
