For the original description of the clustering coefficient, see the small-world network paper [1]:

The clustering coefficient C(p) is defined as follows. Suppose that a vertex v has k_v neighbours; then at most k_v(k_v − 1)/2 edges can exist between them (this occurs when every neighbour of v is connected to every other neighbour of v). Let C_v denote the fraction of these allowable edges that actually exist.

Define C as the average of C_v over all v. For friendship networks, these statistics have intuitive meanings: L is the average number of friendships in the shortest chain connecting two people; C_v reflects the extent to which friends of v are also friends of each other; and thus C measures the cliquishness of a typical friendship circle.

The data shown in the figure are averages over 20 random realizations of the rewiring process described in Fig. 1, and have been normalized by the values L(0), C(0) for a regular lattice. All the graphs have n = 1,000 vertices and an average degree of k = 10 edges per vertex. We note that a logarithmic horizontal scale has been used to resolve the rapid drop in L(p), corresponding to the onset of the small-world phenomenon. During this drop, C(p) remains almost constant at its value for the regular lattice, indicating that the transition to a small world is almost undetectable at the local level.
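The normalized quantities L(p)/L(0) and C(p)/C(0) described in the quote can be reproduced approximately with NetworkX. The following is a minimal sketch, not the authors' code. It assumes NetworkX's `watts_strogatz_graph` (ring lattice at p = 0) and `connected_watts_strogatz_graph` (which resamples until the graph is connected, so the average path length is always defined) are acceptable stand-ins for the rewiring procedure of [1]. The helper name `small_world_ratios` is hypothetical.

```python
import networkx as nx


def small_world_ratios(n, k, p, trials=20, seed=0):
    """Return (L(p)/L(0), C(p)/C(0)) averaged over `trials` rewired graphs.

    L is the average shortest path length and C the average of the local
    clustering coefficients C_v. The p = 0 graph is NetworkX's regular ring
    lattice in which every node is joined to its k nearest neighbours.
    """
    lattice = nx.watts_strogatz_graph(n, k, 0)  # p = 0: no edge is rewired
    L0 = nx.average_shortest_path_length(lattice)
    C0 = nx.average_clustering(lattice)

    L_sum = C_sum = 0.0
    for t in range(trials):
        # Assumption: resample until connected so that L(p) is always defined.
        G = nx.connected_watts_strogatz_graph(n, k, p, seed=seed + t)
        L_sum += nx.average_shortest_path_length(G)
        C_sum += nx.average_clustering(G)
    return L_sum / trials / L0, C_sum / trials / C0
```

Calling `small_world_ratios(1000, 10, p)` for p on a logarithmic grid from 0.0001 to 1 mirrors the quoted setup. This is slow in pure Python, because `average_shortest_path_length` runs a breadth-first search from every node. For the ring lattice, C(0) = 3(k − 2)/(4(k − 1)), which is 2/3 for k = 10.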

In plain terms:

Definition of the clustering coefficient: the fraction of all paths of length 2 in the network that are closed [2].
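This network-level definition can be checked directly. The sketch below enumerates every length-2 path u-v-w by its centre v and an unordered pair {u, w} of v's neighbours, then counts how many of those paths are closed by an edge u-w. The helper name `closed_path_fraction` is hypothetical. It is compared against NetworkX's `nx.transitivity`, which computes the same ratio.

```python
import networkx as nx


def closed_path_fraction(G):
    """Fraction of length-2 paths u-v-w whose end points u and w are adjacent.

    Each path is counted once, via its centre v and the unordered pair {u, w}.
    Self-loops are skipped, matching NetworkX's convention of ignoring them.
    """
    closed = total = 0
    for v in G:
        nbrs = [u for u in G[v] if u != v]
        for i in range(len(nbrs)):
            for j in range(i + 1, len(nbrs)):
                total += 1  # one length-2 path centred at v
                if G.has_edge(nbrs[i], nbrs[j]):
                    closed += 1  # the path is closed into a triangle
    return closed / total if total else 0.0
```

Every triangle closes three such paths, one centred at each of its corners. This is why the same quantity is also written as 3 × (number of triangles) / (number of connected triples).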

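The clustering coefficient of a node measures the average probability that two neighbours of that node are also neighbours of each other [2]. Centred on the node, it is the number of pairs of its neighbours that are joined by an edge, divided by the total number of pairs of its neighbours. The numerator counts only pairs that are actually connected. The denominator counts all pairs, connected or not: for a node of degree k this is k(k − 1)/2 unordered pairs. Equivalently, one can use k(k − 1) ordered pairs, in which case each connected pair is also counted twice in the numerator.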

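Definition of the node clustering coefficient [2, 3]:

\[C_i = \frac{\text{number of pairs of neighbours of } i \text{ that are directly connected}}{\text{total number of pairs of neighbours of } i}\]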
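Write k_i for the degree of node i and E_i for the number of edges that actually exist among its neighbours. The definition above then takes the form below, which agrees with the 2T(u)/(deg(u)(deg(u)−1)) formula in the NetworkX docstring further down. As a worked example (the numbers are purely illustrative), take a node with 4 neighbours among which 3 edges exist:

\[
C_i = \frac{E_i}{k_i(k_i-1)/2} = \frac{2E_i}{k_i(k_i-1)}, \qquad
k_i = 4,\; E_i = 3 \;\Rightarrow\; C_i = \frac{2 \cdot 3}{4 \cdot 3} = \frac{1}{2}.
\]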

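It can be computed in Python as follows.

Method 1:

```python
def cal_ci(G, u):
    '''
    @description: compute the clustering coefficient of node u
    @param : G, u
    @return: the clustering coefficient of u (0 if deg(u) < 2)
    '''
    # For every neighbour x of u, count the neighbours y of x that are also
    # neighbours of u. Each edge among u's neighbours is counted twice (x-y and
    # y-x), which matches the ordered-pair denominator deg(u) * (deg(u) - 1).
    return sum([1 for x in G[u] for y in G[x] if y in G[u]]) / (G.degree[u] * (G.degree[u] - 1)) if G.degree[u] > 1 else 0
```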

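Method 2:

```python
def get_ci(G, z):
    ''' Get the clustering coefficient of node z '''
    # Ordered pairs (u, v) of distinct neighbours of z that are joined by an edge,
    # divided by the number of ordered neighbour pairs; 0 if z has fewer than 2 neighbours.
    return sum([1 for u in G[z] for v in G[z] if not u == v and (u, v) in G.edges()]) / (len(G[z]) * (len(G[z]) - 1)) if len(G[z]) != 0 and len(G[z]) != 1 else 0
```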
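As an alternative to Methods 1 and 2, the sketch below counts each unordered pair of neighbours once, using `itertools.combinations` and `G.has_edge`. It also drops a self-loop before counting. Methods 1 and 2 do not drop self-loops, so on graphs that contain them they can disagree with `nx.clustering`, which ignores self-loops. The helper name `local_clustering` is hypothetical.

```python
from itertools import combinations

import networkx as nx


def local_clustering(G, z):
    """Clustering coefficient of node z, counting each unordered neighbour pair once."""
    nbrs = [v for v in G[z] if v != z]  # drop a self-loop, as nx.clustering does
    k = len(nbrs)
    if k < 2:
        return 0.0
    # Number of edges that actually exist among the neighbours of z.
    links = sum(1 for u, v in combinations(nbrs, 2) if G.has_edge(u, v))
    # links out of k * (k - 1) / 2 possible pairs.
    return 2 * links / (k * (k - 1))
```

For example, `local_clustering(nx.karate_club_graph(), 0)` should match `nx.clustering(nx.karate_club_graph(), 0)`.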

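Note: you might wonder whether `(u, v) in G.edges()` in Method 2 can give wrong results. `G.edges()` prints like a list, but it is actually a built-in NetworkX view type. For an undirected graph, its membership test ignores the order of the two endpoints, unlike a plain list. You can verify this with the following code:

```python
import networkx as nx

G = nx.path_graph(5)
print(G.edges())
print(type(G.edges()))
a = (3, 2) in G.edges()  # edge view: order of endpoints does not matter
b = (3, 2) in [(0, 1), (1, 2), (2, 3), (3, 4)]  # plain list: exact tuple match only
print(a)
print(b)
```

Output:

```
[(0, 1), (1, 2), (2, 3), (3, 4)]
<class 'networkx.classes.reportviews.EdgeView'>
True
False
```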
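The order-insensitivity above applies to undirected graphs only. For a directed graph, the edge view's membership test respects direction, so Method 2 would count only edges u → v. This short sketch contrasts the two cases. `G.has_edge` behaves the same way as the `in` test.

```python
import networkx as nx

# Undirected graph: EdgeView membership ignores the order of the endpoints.
G = nx.path_graph(5)
print((3, 2) in G.edges(), G.has_edge(3, 2))  # True True

# Directed graph: membership respects direction, so (3, 2) is absent.
D = nx.path_graph(5, create_using=nx.DiGraph())
print((3, 2) in D.edges(), D.has_edge(3, 2))  # False False
print((2, 3) in D.edges())                    # True
```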

Method 3: the built-in NetworkX function:

```python
nx.clustering(G, u)
```

The corresponding NetworkX source code:

```python
def clustering(G, nodes=None, weight=None):
    r"""Compute the clustering coefficient for nodes.

    For unweighted graphs, the clustering of a node :math:`u`
    is the fraction of possible triangles through that node that exist,

    .. math::

      c_u = \frac{2 T(u)}{deg(u)(deg(u)-1)},

    where :math:`T(u)` is the number of triangles through node :math:`u` and
    :math:`deg(u)` is the degree of :math:`u`.

    For weighted graphs, there are several ways to define clustering [1]_.
    The one used here is defined
    as the geometric average of the subgraph edge weights [2]_,

    .. math::

       c_u = \frac{1}{deg(u)(deg(u)-1)}
             \sum_{vw} (\hat{w}_{uv} \hat{w}_{uw} \hat{w}_{vw})^{1/3}.

    The edge weights :math:`\hat{w}_{uv}` are normalized by the maximum weight
    in the network :math:`\hat{w}_{uv} = w_{uv}/\max(w)`.

    The value of :math:`c_u` is assigned to 0 if :math:`deg(u) < 2`.

    For directed graphs, the clustering is similarly defined as the fraction
    of all possible directed triangles or geometric average of the subgraph
    edge weights for unweighted and weighted directed graph respectively [3]_.

    .. math::

       c_u = \frac{1}{deg^{tot}(u)(deg^{tot}(u)-1) - 2deg^{\leftrightarrow}(u)}
             T(u),

    where :math:`T(u)` is the number of directed triangles through node
    :math:`u`, :math:`deg^{tot}(u)` is the sum of in degree and out degree of
    :math:`u` and :math:`deg^{\leftrightarrow}(u)` is the reciprocal degree of
    :math:`u`.

    Parameters
    ----------
    G : graph

    nodes : container of nodes, optional (default=all nodes in G)
       Compute clustering for nodes in this container.

    weight : string or None, optional (default=None)
       The edge attribute that holds the numerical value used as a weight.
       If None, then each edge has weight 1.

    Returns
    -------
    out : float, or dictionary
       Clustering coefficient at specified nodes

    Examples
    --------
    >>> G=nx.complete_graph(5)
    >>> print(nx.clustering(G,0))
    1.0
    >>> print(nx.clustering(G))
    {0: 1.0, 1: 1.0, 2: 1.0, 3: 1.0, 4: 1.0}

    Notes
    -----
    Self loops are ignored.

    References
    ----------
    .. [1] Generalizations of the clustering coefficient to weighted
       complex networks by J. Saramäki, M. Kivelä, J.-P. Onnela,
       K. Kaski, and J. Kertész, Physical Review E, 75 027105 (2007).
       http://jponnela.com/web_documents/a9.pdf
    .. [2] Intensity and coherence of motifs in weighted complex
       networks by J. P. Onnela, J. Saramäki, J. Kertész, and K. Kaski,
       Physical Review E, 71(6), 065103 (2005).
    .. [3] Clustering in complex directed networks by G. Fagiolo,
       Physical Review E, 76(2), 026107 (2007).
    """
    if G.is_directed():
        if weight is not None:
            td_iter = _directed_weighted_triangles_and_degree_iter(
                G, nodes, weight)
            clusterc = {v: 0 if t == 0 else t / ((dt * (dt - 1) - 2 * db) * 2)
                        for v, dt, db, t in td_iter}
        else:
            td_iter = _directed_triangles_and_degree_iter(G, nodes)
            clusterc = {v: 0 if t == 0 else t / ((dt * (dt - 1) - 2 * db) * 2)
                        for v, dt, db, t in td_iter}
    else:
        if weight is not None:
            td_iter = _weighted_triangles_and_degree_iter(G, nodes, weight)
            clusterc = {v: 0 if t == 0 else t / (d * (d - 1)) for
                        v, d, t in td_iter}
        else:
            td_iter = _triangles_and_degree_iter(G, nodes)
            clusterc = {v: 0 if t == 0 else t / (d * (d - 1)) for
                        v, d, t, _ in td_iter}
    if nodes in G:
        # Return the value of the sole entry in the dictionary.
        return clusterc[nodes]
    return clusterc
```
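The weighted formula in the docstring can be implemented from scratch for undirected graphs and cross-checked against `nx.clustering(G, weight="weight")`. The sketch below uses the hypothetical helper name `weighted_clustering` and does three things:

- It divides every weight by the largest weight in the graph.
- It sums the cube root of each triangle's weight product over unordered neighbour pairs.
- It doubles that sum, because the docstring's \sum_{vw} runs over ordered pairs while the denominator is deg(u)(deg(u) − 1).

```python
from itertools import combinations

import networkx as nx


def weighted_clustering(G, u, weight="weight"):
    """Geometric-mean weighted clustering of node u in an undirected graph."""
    # Normalize by the largest edge weight in the whole graph (missing weight = 1).
    max_w = max((d.get(weight, 1) for _, _, d in G.edges(data=True)), default=1)
    nbrs = [v for v in G[u] if v != u]  # self-loops are ignored
    k = len(nbrs)
    if k < 2:
        return 0.0

    def w_hat(a, b):
        return G[a][b].get(weight, 1) / max_w

    # Sum over unordered neighbour pairs {v, x} that close a triangle with u.
    s = sum((w_hat(u, v) * w_hat(u, x) * w_hat(v, x)) ** (1 / 3)
            for v, x in combinations(nbrs, 2) if G.has_edge(v, x))
    # Ordered-pair sum is 2 * s, divided by deg(u) * (deg(u) - 1).
    return 2 * s / (k * (k - 1))
```

When all weights are equal, every normalized weight is 1 and the result reduces to the unweighted c_u.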

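The clustering coefficient of a whole network is defined as 3 × (number of triangles) / (number of connected triples), that is, the fraction of closed length-2 paths. In practice, however, the average of all nodes' clustering coefficients is more commonly used. The two quantities are generally not equal.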
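A small illustration of how far apart the two network-level quantities can be, using NetworkX's `nx.average_clustering` (mean of the local C_i) and `nx.transitivity` (3 × triangles / connected triples). The graph is an arbitrary example: a triangle 0-1-2 with a pendant node 3 attached to node 0.

```python
import networkx as nx

# A triangle 0-1-2 with a pendant node 3 attached to node 0.
G = nx.Graph([(0, 1), (1, 2), (0, 2), (0, 3)])

# Local values: C_0 = 1/3, C_1 = C_2 = 1, C_3 = 0, so the mean is 7/12.
avg_c = nx.average_clustering(G)
# One triangle; connected triples = 3 (centre 0) + 1 + 1 + 0 = 5, so 3/5.
trans = nx.transitivity(G)
print(avg_c, trans)
```

The average weights every node equally, including the low-degree nodes 1, 2 and 3. Transitivity instead weights each node by its number of neighbour pairs, so high-degree nodes dominate it.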

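References:

[1] D. J. Watts and S. H. Strogatz, "Collective dynamics of 'small-world' networks," Nature, vol. 393, no. 6684, pp. 440–442, Jun. 1998.

[2] M. E. J. Newman, 网络科学引论 (Chinese edition of Networks: An Introduction) [M], 2014.

[3] 汪小帆, 李翔, 陈关荣 (Wang Xiaofan, Li Xiang, Chen Guanrong), 网络科学导论 (Introduction to Network Science) [M], 高等教育出版社 (Higher Education Press), 2012.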
