贝叶斯网络R语言学习笔记1

2021年7月19日19:54:12

一、创建贝叶斯网络结构

贝叶斯网络的图结构存储在bn对象中，可以通过三种表示来创建bn对象，即the arc set of the graph, its adjacency matrix or a model formula（边集、邻接矩阵、模型公式）。此外，可以通过empty.graph() 和random.graph()函数创建空网络结构或随机网络结构。

1、创建空网络

空网络应该是最简单的形式了，先从简单的入门。

# 产生空网络
library(bnlearn)
e = empty.graph(LETTERS[1:6])
e
class(e)
# 结果Random/Generated Bayesian networkmodel:[A][B][C][D][E][F] nodes:                                 6 arcs:                                  0 undirected arcs:                     0 directed arcs:                       0 average markov blanket size:           0.00 average neighbourhood size:            0.00 average branching factor:              0.00 generation algorithm:                  Empty > class(e)
[1] "bn"

通常empty.graph()默认只产生一个bn对象，可以指定参数num，使之产生一批网络对象。

empty.graph(LETTERS[1:6], num = 2) #此处指定num = 2，产生两个bn对象

2、创建特定网络结构

①特定的边集

要产生边集，首先需要指定一个边集的矩阵。以下代码产生一个矩阵，两列，按行填充，行名为from，to。

arc.set = matrix(c("A", "C", "B", "F", "C", "F"),ncol = 2, byrow = TRUE,dimnames = list(NULL, c("from", "to")))
arc.setfrom to
[1,] "A"  "C"
[2,] "B"  "F"
[3,] "C"  "F"

将产生的边集arc.set传给bn对象使用arcs()函数

arcs(e) = arc.set
eRandom/Generated Bayesian networkmodel:[A][B][D][E][C|A][F|B:C] nodes:                                 6 arcs:                                  3 undirected arcs:                     0 directed arcs:                       3 average markov blanket size:           1.33 average neighbourhood size:            1.00 average branching factor:              0.50 generation algorithm:                  Empty

对于arcs()函数，其会对边集做一定的检查，来看是否符合上下条件。

边的标签要和图里的标签一致

bogus = matrix(c("X", "Y", "W", "Z"),ncol = 2, byrow = TRUE,dimnames = list(NULL, c("from", "to")))
bogusfrom to
[1,] "X"  "Y"
[2,] "W"  "Z"

将bogus传给bn对象e，会出现报错

arcs(e) = bogusError in check.arcs(value, nodes = names(x$nodes)) : node(s) 'X' 'W' 'Y' 'Z' not present in the graph.

不能引入环(e.g. A → B → C → A) ，除非我们设置ignore.cycles = TURE

cycle = matrix(c("A", "C", "C", "B", "B", "A"),ncol = 2, byrow = TRUE,dimnames = list(NULL, c("from", "to")))
cyclefrom to
[1,] "A"  "C"
[2,] "C"  "B"
[3,] "B"  "A"arcs(e) = cycleError in `arcs<-`(`*tmp*`, value = c("A", "C", "B", "C", "B", "A")) : the specified network contains cycles.arcs(e,ignore.cycles = TRUE) = cycleError in `arcs<-`(`*tmp*`, ignore.cycles = TRUE, value = c("A", "C", "B",  : 参数没有用(ignore.cycles = TRUE)> acyclic(e) # check whether the graph is acyclic/completely directed.检测是否无环
[1] TRUE

不能引入圈 (e.g. A → A).

> loops = matrix(c("A", "A", "B", "B", "C", "D"),
-                ncol = 2, byrow = TRUE,
-                dimnames = list(NULL, c("from", "to")))
> loopsfrom to
[1,] "A"  "A"
[2,] "B"  "B"
[3,] "C"  "D"
> arcs(e) = loops
Error in check.arcs(value, nodes = names(x$nodes)) : invalid arcs that are actually loops:A -> A B -> B

可以通过在边集中包括双向边引入非直接边(e.g. A → B and B → A)，保证双向均未引入环，设置ignore.cycles会覆盖这个检查。

> edges = matrix(c("A", "B", "B", "A", "C", "D"),
+                  ncol = 2, byrow = TRUE,
+                  dimnames = list(NULL, c("from", "to")))
> edgesfrom to
[1,] "A"  "B"
[2,] "B"  "A"
[3,] "C"  "D"
> arcs(e) = edges
> eRandom/Generated Bayesian networkmodel:[partially directed graph]nodes:                                 6 arcs:                                  2 undirected arcs:                     1 #这里有个非直接环弧 directed arcs:                       1 average markov blanket size:           0.67 average neighbourhood size:            0.67 average branching factor:              0.17 generation algorithm:                  Empty

②特定的的邻接矩阵

使用amat()函数可以用矩阵来产生图结构。
我们首先需要做的是产生一个矩阵，矩阵的元素为0/1整数（注意这里使用0L和1L，0和1也是可以的，但是会占更多的内存，这在大数据集尤为重要），1L就代表存在一条弧。

> adj = matrix(0L, ncol = 6, nrow = 6,
+              dimnames = list(LETTERS[1:6], LETTERS[1:6]))
> adjA B C D E F
A 0 0 0 0 0 0
B 0 0 0 0 0 0
C 0 0 0 0 0 0
D 0 0 0 0 0 0
E 0 0 0 0 0 0
F 0 0 0 0 0 0
> adj["A", "C"] = 1L
> adj["B", "F"] = 1L
> adj["C", "F"] = 1L
> adj["D", "E"] = 1L
> adj["A", "E"] = 1L
> adjA B C D E F
A 0 0 1 0 1 0
B 0 0 0 0 0 1
C 0 0 0 0 0 1
D 0 0 0 0 1 0
E 0 0 0 0 0 0
F 0 0 0 0 0 0

接着使用amat()函数将弧（arcs）引入到图中

> amat(e) = adj
> eRandom/Generated Bayesian networkmodel:[A][B][D][C|A][E|A:D][F|B:C] nodes:                                 6 arcs:                                  5 undirected arcs:                     0 directed arcs:                       5 average markov blanket size:           2.33 average neighbourhood size:            1.67 average branching factor:              0.83 generation algorithm:                  Empty

注意：amat()函数和arcs()函数一样，都会检查错误的图结构

③特定的模型公式

模型公式的格式来源于deal包，定义如下

每个节点在方括号中
每个节点的父节点在列在管道符后（级竖杆符号|），父节点间用冒号:隔开，但是都是在一个方括号中
可以采用两种方式来定义图结构

使用`model2network()`函数，不用先产生空图

> model2network("[A][C][B|A][D|C][F|A:B:C][E|F]")Random/Generated Bayesian networkmodel:[A][C][B|A][D|C][F|A:B:C][E|F] nodes:                                 6 arcs:                                  6 undirected arcs:                     0 directed arcs:                       6 average markov blanket size:           2.67 average neighbourhood size:            2.00 average branching factor:              1.00 generation algorithm:                  Empty

使用`modelstring()`函数和一个已存在的`bn`对象，类似之前的`arcs()` 和`amat()`

> modelstring(e) = "[A][C][B|A][D|C][F|A:B:C][E|F]"
> eRandom/Generated Bayesian networkmodel:[A][C][B|A][D|C][F|A:B:C][E|F] nodes:                                 6 arcs:                                  6 undirected arcs:                     0 directed arcs:                       6 average markov blanket size:           2.67 average neighbourhood size:            2.00 average branching factor:              1.00 generation algorithm:                  Empty

注意：model2network()和modelstring()函数同amat()函数、arcs()函数一样，都会检查错误的图结构。

3、创建一个或多个随机结构

①特定节点顺序

默认情况下，random.graph()函数产生的图和提供的节点顺序一致；在每个弧中，尾部节点排在头部节点的前面。弧是独立的样本，其包含概率由prob参数指定，prob参数的默认值设置为使图中平均有与节点一样多的弧。

> random.graph(LETTERS[1:6], prob = 0.1) #顺序是从A到FRandom/Generated Bayesian networkmodel:[A][B][D][E][C|B][F|C]   #这里可以看出，[B]是排在[C|B]前面的nodes:                                 6 arcs:                                  2 undirected arcs:                     0 directed arcs:                       2 average markov blanket size:           0.67 average neighbourhood size:            0.67 average branching factor:              0.33 generation algorithm:                  Full Ordering arc sampling probability:              0.1

和empty.graph()函数一样，我们也可以指定num 参数来产生一批图，这样就会产生一个列表，可以使用lapply() 来很好地批量计算。

②以均匀概率从连通的有向无环图空间中抽样

除了使用默认的方法method = "ordered"，还可以使用method = "ic-dag"来从具有均匀概率的连通的有向无环图空间中抽样，使用 Ide & Cozman MCMC sampler
请注意，建议的老化长度（length of the burn-in scales ？）与节点数量成二次函数关系，因此使用num批量生成多个图要比一次生成一个图高效得多。

> random.graph(LETTERS[1:6], num = 2, method = "ic-dag")
[[1]]Random/Generated Bayesian networkmodel:[C][E][A|E][B|A:E][D|B:E][F|A:C:D] nodes:                                 6 arcs:                                  8 undirected arcs:                     0 directed arcs:                       8 average markov blanket size:           3.67 average neighbourhood size:            2.67 average branching factor:              1.33 generation algorithm:                  Ide & Cozman's Multiconnected DAGs burn in length:                        216 maximum in-degree:                     Inf maximum out-degree:                    Inf maximum degree:                        Inf [[2]]Random/Generated Bayesian networkmodel:[E][A|E][B|A:E][C|A][D|B:E][F|A:C:D] nodes:                                 6 arcs:                                  9 undirected arcs:                     0 directed arcs:                       9 average markov blanket size:           3.67 average neighbourhood size:            3.00 average branching factor:              1.50 generation algorithm:                  Ide & Cozman's Multiconnected DAGs burn in length:                        216 maximum in-degree:                     Inf maximum out-degree:                    Inf maximum degree:                        Inf

此方法对生成的图的结构接受几个可选的约束，如下面的参数所指定:

burn.in: the number of iterations for the algorithm to converge to a stationary (and uniform) probability distribution.收敛迭代的次数
every: return only one graph every number of steps instead of all the generated graphs. High values of every result in a more diverse set.每个步骤只返回一个图，而不是所有生成的图。every 的高值会导致更多样化的集合。
max.degree: the maximum degree of a node.
max.in.degree: the maximum in-degree.
max.out.degree: the maximum out-degree.
举例

random.graph(LETTERS[1:6], num = 2, method = "ic-dag", burn.in = 10^4,
+              every = 50, max.degree = 3)
[[1]]Random/Generated Bayesian networkmodel:[B][F|B][A|F][C|A:F][D|A:B][E|B:C] nodes:                                 6 arcs:                                  8 undirected arcs:                     0 directed arcs:                       8 average markov blanket size:           3.33 average neighbourhood size:            2.67 average branching factor:              1.33 generation algorithm:                  Ide & Cozman's Multiconnected DAGs burn in length:                        10000 maximum in-degree:                     Inf maximum out-degree:                    Inf maximum degree:                        3 [[2]]Random/Generated Bayesian networkmodel:[A][C|A][F|C][D|F][B|D:F][E|B:C] nodes:                                 6 arcs:                                  7 undirected arcs:                     0 directed arcs:                       7 average markov blanket size:           2.67 average neighbourhood size:            2.33 average branching factor:              1.17 generation algorithm:                  Ide & Cozman's Multiconnected DAGs burn in length:                        10000 maximum in-degree:                     Inf maximum out-degree:                    Inf maximum degree:                        3

③以均匀概率从有向无环图空间抽样

Melançon的 MCMC算法以均匀概率从有向无环图空间抽样（不需要连接not necessarily connected），指定method = "melancon"即可。

> random.graph(LETTERS[1:6], method = "melancon")Random/Generated Bayesian networkmodel:[A][B|A][F|A][C|A:B][E|B][D|B:C:E] nodes:                                 6 arcs:                                  8 undirected arcs:                     0 directed arcs:                       8 average markov blanket size:           3.00 average neighbourhood size:            2.67 average branching factor:              1.33 generation algorithm:                  Melancon's Uniform Probability DAGs burn in length:                        216 maximum in-degree:                     Inf maximum out-degree:                    Inf maximum degree:                        Inf

至于可选的参数，它与Ide & Cozman算法是相同的。

4、总结

创建三种网络：空网络、特定网络、随机网络
创建特定网络网络的三种方法：边集、邻接矩阵、模型公式
存在的问题：对随机网络不太懂，为什么要创建随机网络，是给我们看看有这些节点可以产生什么样的结构吗？
随机网络的取样是啥意思？