kohonen | SOM: Clustering with Self-Organizing Maps (1)

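The self-organizing map (SOM) is a neural network algorithm that can be used for cluster analysis. It was proposed by the Finnish researcher Teuvo Kohonen, and the corresponding R package is kohonen.

I first came across this method in a paper in the International Journal of Health Geographics: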

Crespo, R., Alvarez, C., Hernandez, I. et al. A spatially explicit analysis of chronic diseases in small areas: a case study of diabetes in Santiago, Chile. Int J Health Geogr 19, 24 (2020). https://doi.org/10.1186/s12942-020-00217-1

The clustering result looks like this:

1 The `som()` function

The kohonen package has three core functions:

som()
xyf()
supersom()

This post focuses on som(), which implements the single-layer SOM algorithm. Its syntax is as follows:

som(X, grid = somgrid(), rlen = 100, ...)

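X: the input data; it must be a matrix, with one row per sample, and should be standardized by column;

grid: the layout of the network, specified with somgrid();

rlen: the number of iterations, that is, how many times the complete data set is presented to the network.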
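Before training, it can be worth checking that the input really is a column-standardized matrix. The following is a minimal sketch using only base R and the same five mtcars variables used later in this post; the names col_means and col_sds are introduced here for illustration only.

```r
# Select the five numeric variables and standardize each column.
X <- as.matrix(scale(mtcars[, c("mpg", "hp", "drat", "wt", "qsec")]))

# After scale(), every column should have mean ~0 and standard deviation 1.
col_means <- colMeans(X)
col_sds <- apply(X, 2, sd)

print(is.matrix(X))          # som() expects a matrix, not a data frame
print(round(col_means, 10))
print(col_sds)
```

If a column had mean far from 0 or a standard deviation far from 1, variables measured on larger scales (such as hp) would dominate the distance calculations.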

The syntax of somgrid() is as follows:

somgrid(xdim = 8, ydim = 6, topo = c("rectangular", "hexagonal"), neighbourhood.fct = c("bubble", "gaussian"), toroidal = FALSE)

xdim, ydim: the number of columns and rows of the grid;

topo: the topology of the grid; the paper cited above uses the hexagonal layout.

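library(kohonen)
# Use five numeric variables from the built-in mtcars data set
data <- mtcars
X <- subset(data, select = c(mpg, hp, drat, wt, qsec))
# Standardize by column and convert to a matrix, as som() requires
X <- as.matrix(scale(X))
# Train a SOM on a 4 x 4 hexagonal grid
model.som <- som(X = X, grid = somgrid(4, 4, "hexagonal"), rlen = 20000)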

somgrid(4, 4, "hexagonal") arranges the network units in a hexagonal layout with 4 columns and 4 rows, for 16 units in total.
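The grid can also be inspected on its own, before any training. This is a small sketch that assumes the object returned by somgrid() stores the unit coordinates in an element called pts, as it does in recent versions of kohonen; the names g and n_units are illustrative.

```r
library(kohonen)

# Build the same 4 x 4 hexagonal grid used in this post.
g <- somgrid(xdim = 4, ydim = 4, topo = "hexagonal")

# Each row of g$pts holds the (x, y) position of one unit.
n_units <- nrow(g$pts)
print(n_units)
print(head(g$pts))
```

In the hexagonal layout, alternate rows are shifted by half a unit, which is why the x coordinates are not all integers.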

2 `plot.kohonen`

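The results of the SOM algorithm are usually displayed with plot(). The kohonen package provides its own plot method, whose type argument accepts the following options: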

plot(x, type = c("codes", "changes", "counts", "dist.neighbours", "mapping", "property", "quality"), ...)

By default, type = "codes", which shows the codebook vector of each unit:

plot(model.som)

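type = "changes" shows the training progress, that is, the mean distance to the closest codebook vector over the iterations:

plot(model.som, type = "changes")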
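The values behind the training-progress plot can also be read directly from the fitted object. The sketch below assumes the model stores them in an element called changes, as current kohonen versions do. It refits the model with a fixed seed and a shorter rlen = 500 so that it runs quickly; both choices are assumptions made here, not settings from the example above.

```r
library(kohonen)

set.seed(1)  # assumption: fixed seed for reproducibility
X <- as.matrix(scale(mtcars[, c("mpg", "hp", "drat", "wt", "qsec")]))
model.som <- som(X, grid = somgrid(4, 4, "hexagonal"), rlen = 500)

# One value per iteration: mean distance to the closest codebook vector.
changes <- as.numeric(model.som$changes)
print(head(changes, 3))
print(tail(changes, 3))
```

If the values at the end of training are still dropping noticeably, a larger rlen may be needed.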

type = "counts" shows the number of samples mapped to each unit:

plot(model.som, type = "counts")
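The same counts can be tabulated numerically from unit.classif. This is a sketch under the same assumptions as the rest of this post's data (five standardized mtcars variables, 4 x 4 hexagonal grid), with a fixed seed and rlen = 500 chosen here only to keep the run short; counts is an illustrative name.

```r
library(kohonen)

set.seed(1)  # assumption: fixed seed for reproducibility
X <- as.matrix(scale(mtcars[, c("mpg", "hp", "drat", "wt", "qsec")]))
model.som <- som(X, grid = somgrid(4, 4, "hexagonal"), rlen = 500)

# Count samples per unit; factor() keeps units that received no samples.
counts <- table(factor(model.som$unit.classif, levels = 1:16))
print(counts)
```

Units with a count of 0 are empty; many empty units can be a sign that the grid is too large for the data.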

For the meaning of each value of type, see the plot.kohonen help page in the kohonen package.

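3 Clustering

The network units returned by som() can themselves be clustered with the usual hierarchical clustering functions. The distances between units are computed with object.distances().

For example, to group the original 16 units into 5 clusters with hierarchical clustering: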

som.hc <- cutree(hclust(object.distances(model.som, "codes")), 5)
plot(model.som)
add.cluster.boundaries(model.som, som.hc)

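To find out which cluster a sample belongs to, first look at unit.classif in model.som, which gives the index of the unit each sample is mapped to, and then look at som.hc, which gives the cluster each unit belongs to.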

model.som$unit.classif
##  [1]  2  2  6 14 16 14  8  9 13  7  7 11 11 11 12 12 12  5  1  5  9 15 15  4 16
## [26]  5  1  1  4  3  4  6
som.hc
##  [1] 1 2 2 3 1 2 2 3 2 2 2 4 5 2 2 2
##  [1] 1 2 2 3 1 2 2 3 2 2 2 4 5 2 2 2
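The two lookups can be combined into a single step by indexing som.hc with unit.classif. The following is a minimal sketch rather than the only way to do it: it refits the model with a fixed seed and rlen = 500 (both assumptions made here for speed and reproducibility), so the unit and cluster numbers will not match the output printed above. The names sample.cluster and result are illustrative.

```r
library(kohonen)

set.seed(1)  # assumption: fixed seed for reproducibility
X <- as.matrix(scale(mtcars[, c("mpg", "hp", "drat", "wt", "qsec")]))
model.som <- som(X, grid = somgrid(4, 4, "hexagonal"), rlen = 500)

# Cluster the 16 units into 5 groups, as in the text.
som.hc <- cutree(hclust(object.distances(model.som, "codes")), 5)

# Look up the cluster of the unit each sample is mapped to.
sample.cluster <- unname(som.hc[model.som$unit.classif])

result <- data.frame(
  car = rownames(X),
  unit = model.som$unit.classif,
  cluster = sample.cluster
)
print(head(result))
print(table(result$cluster))
```

Each row of result gives a car, the unit it was mapped to, and the cluster of that unit, so the table at the end shows how many samples fall into each of the 5 clusters.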