The merge() function in R

1. Function purpose

Merge two data frames by common columns or row names,
or do other versions of database join operations.

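In short: merge() joins two data frames on their shared columns or on their row names, and it also covers the other database-style joins.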
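As a minimal sketch of the row-name case mentioned above, the following merges two small data frames on their row names. The data frames `heights` and `weights` are hypothetical and not part of the original examples; `by = "row.names"` is the documented way to ask for row names as the key.

```r
# Hypothetical toy data: two data frames identified only by their row names
heights <- data.frame(height = c(170, 165, 180), row.names = c("a", "b", "c"))
weights <- data.frame(weight = c(60, 75), row.names = c("b", "c"))

# by = "row.names" (or by = 0) tells merge() to match on the row names
by_rownames <- merge(heights, weights, by = "row.names")
print(by_rownames)
```

The matched row names come back in an extra column called Row.names at the left of the result.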

2. Function syntax

merge(x, y, by = intersect(names(x), names(y)),
      by.x = by, by.y = by, all = FALSE, all.x = all, all.y = all,
      sort = TRUE, suffixes = c(".x", ".y"), no.dups = TRUE,
      incomparables = NULL, ...)

3. Function arguments

1) x, y

x, y
data frames, or objects to be coerced to one.

These are the data frames to merge, or objects that can be coerced to data frames.

2) by, by.x, by.y

by, by.x, by.y
specifications of the columns used for merging.

These name the key columns that the merge matches on.

By default the data frames are merged on the columns with names they both have,
but separate specifications of the columns can be given by by.x and by.y.
The rows in the two data frames that match on the specified columns are extracted, and joined together.
If there is more than one match, all possible matches contribute one row each.

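In other words, the two data frames are merged on their shared columns by default, and by.x and by.y override that choice. Rows that agree on the key columns are extracted and joined.
If a key value has more than one match, every possible pairing contributes one row.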
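The "all possible matches" rule is easiest to see with a repeated key. This is a small sketch on hypothetical data (`x`, `y`, and their columns are made up for illustration): the key value 1 occurs twice on each side, so it yields 2 x 2 = 4 rows, while unmatched keys are dropped.

```r
# Hypothetical toy data: the key "id" is repeated on both sides
x <- data.frame(id = c(1, 1, 2), left = c("a", "b", "c"))
y <- data.frame(id = c(1, 1, 3), right = c("p", "q", "r"))

# by defaults to intersect(names(x), names(y)), which is "id" here
matched <- merge(x, y)
print(matched)
```

Each of the two x rows with id 1 is paired with each of the two y rows with id 1; ids 2 and 3 have no partner and do not appear.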

When the two data frames share a column name:

# Column names: 名称 = name, 性别 = sex, 亲属 = relative
名称 <- c('郭靖','黄蓉','华筝','梅超风','杨康','穆念慈')
性别 <- c('M','F','F','F','M','F')
亲属 <- c('郭啸天','黄药师','铁木真','陈玄风','完颜洪烈','杨铁心')
data <- data.frame(名称, 性别, 亲属, stringsAsFactors = FALSE)
data

# Column names: 身份 = title, 武功 = martial art
名称 <- c('郭靖','黄蓉','王重阳','梅超风','欧阳锋','一灯大师')
身份 <- c('侠之大者','女中诸葛','全真教掌门','黑风双煞','白驼山庄主','大理高僧')
武功 <- c('降龙十八掌','落英神剑掌','全真剑法','九阴白骨爪','蛤蟆功','一阳指')
pd <- data.frame(名称, 身份, 武功, stringsAsFactors = FALSE)
pd

# Both data frames have a 名称 column, so by defaults to it
merge(data, pd)

When the key column has a different name in each data frame:

# Data for the merge() examples: the key is 名称 in data but 姓名 in pd (both mean "name")
名称 <- c('郭靖','黄蓉','华筝','梅超风','杨康','穆念慈')
性别 <- c('M','F','F','F','M','F')
亲属 <- c('郭啸天','黄药师','铁木真','陈玄风','完颜洪烈','杨铁心')
data <- data.frame(名称, 性别, 亲属, stringsAsFactors = FALSE)
data

姓名 <- c('郭靖','黄蓉','王重阳','梅超风','欧阳锋','一灯大师')
身份 <- c('侠之大者','女中诸葛','全真教掌门','黑风双煞','白驼山庄主','大理高僧')
武功 <- c('降龙十八掌','落英神剑掌','全真剑法','九阴白骨爪','蛤蟆功','一阳指')
pd <- data.frame(姓名, 身份, 武功, stringsAsFactors = FALSE)
pd

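by.x and by.y tell merge() to take the rows whose by.x column in x (the first data frame) and by.y column in y (the second data frame) hold the same value, and to join them; with the default all = FALSE, every other row is dropped. If only one of the two is given, the other falls back to by, that is, to the column names the two data frames share. Here they share none, so the two specifications differ in length and merge() stops with an error. In practice, specify both.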
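The following sketch illustrates both points on hypothetical data (`x`, `y`, and the column names `name`, `person`, `age`, `city` are invented for this example). It first names both keys, then names only by.x and captures the resulting error message with tryCatch() instead of letting the script stop.

```r
# Hypothetical toy data: the key is called "name" in x and "person" in y
x <- data.frame(name = c("a", "b", "c"), age = c(30, 25, 41))
y <- data.frame(person = c("b", "c", "d"), city = c("X", "Y", "Z"))

# Naming both keys works; the key column keeps x's name in the result
both <- merge(x, y, by.x = "name", by.y = "person")
print(both)

# Naming only by.x leaves by.y at its default, intersect(names(x), names(y)),
# which is empty here, so the two key specifications differ in length
only_x <- tryCatch(
  merge(x, y, by.x = "name"),
  error = function(e) conditionMessage(e)
)
print(only_x)
```

Note that the error depends on the data: if the two data frames happened to share exactly one other column name, by.y would silently default to that column, which is another reason to always give both arguments.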

3) all, all.x, all.y

all
logical; all = L is shorthand for all.x = L and all.y = L,
where L is either TRUE or FALSE.

A logical value.
all = TRUE is equivalent to all.x = TRUE and all.y = TRUE.
all = FALSE is equivalent to all.x = FALSE and all.y = FALSE.

all.x
logical; if TRUE, then extra rows will be added to the output,
one for each row in x that has no matching row in y.
These rows will have NAs in those columns that are usually filled with values from y.
The default is FALSE, so that only rows with data from both x and y are included in the output.

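all.x
A logical value. With all.x = TRUE, the rows of x that have no match in y are also added to the output, with NA in the columns that would otherwise come from y. The default is FALSE, so only rows found in both x and y appear in the output.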

all.y
logical; analogous to all.x.

A logical value, analogous to all.x but applied to the rows of y.
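To make the NA filling concrete, here is a small sketch on hypothetical data (`x`, `y`, and the columns `a`, `b` are made up): id 1 exists only in x and id 4 only in y.

```r
# Hypothetical toy data: id 1 is only in x, id 4 is only in y
x <- data.frame(id = 1:3, a = c("x1", "x2", "x3"))
y <- data.frame(id = 2:4, b = c("y2", "y3", "y4"))

left  <- merge(x, y, by = "id", all.x = TRUE)  # keep every row of x
right <- merge(x, y, by = "id", all.y = TRUE)  # keep every row of y
print(left)   # the row for id 1 has NA in column b
print(right)  # the row for id 4 has NA in column a
```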

Case 1: the default merge takes the intersection

# Default merge: intersection (only keys present in both data frames)
merge(data,pd,by.x = '名称',by.y = '姓名')
merge(data,pd,by.x = '名称',by.y = '姓名',all=FALSE)
merge(data,pd,by.x = '名称',by.y = '姓名',all.x=FALSE,all.y=FALSE)

Case 2: all = TRUE takes the union

# all = TRUE: union (every row from both data frames)
merge(data,pd,by.x = '名称',by.y = '姓名',all=TRUE)
merge(data,pd,by.x = '名称',by.y = '姓名',all.x=TRUE,all.y=TRUE)

Case 3: all.x = TRUE, all.y = FALSE: every row of x, plus the matching y data

merge(data,pd,by.x = '名称',by.y = '姓名',all.x=TRUE,all.y=FALSE)

Case 4: all.x = FALSE, all.y = TRUE: every row of y, plus the matching x data

merge(data,pd,by.x = '名称',by.y = '姓名',all.x=FALSE,all.y=TRUE)

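The four merge modes can be summarized as follows (summary taken from a cited article):
all = FALSE (default): intersection, an inner join; only keys present in both x and y
all = TRUE: union, a full outer join; every row of x and of y
all.x = TRUE: a left join; every row of x plus the matching y data
all.y = TRUE: a right join; every row of y plus the matching x data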
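The four cases can be checked quickly by counting rows. This sketch uses hypothetical data (`x`, `y`, and the columns `key`, `x_val`, `y_val` are invented for illustration) in which two keys are shared, two exist only in x, and one exists only in y.

```r
# Hypothetical toy data: keys "c" and "d" are shared; "a", "b" only in x; "e" only in y
x <- data.frame(key = c("a", "b", "c", "d"), x_val = 1:4)
y <- data.frame(key = c("c", "d", "e"), y_val = 1:3)

rows <- c(
  inner = nrow(merge(x, y, by = "key")),                # shared keys only
  full  = nrow(merge(x, y, by = "key", all = TRUE)),    # every key from either side
  left  = nrow(merge(x, y, by = "key", all.x = TRUE)),  # every key of x
  right = nrow(merge(x, y, by = "key", all.y = TRUE))   # every key of y
)
print(rows)
```

With these inputs the counts are 2, 5, 4 and 3 respectively.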

4) sort: whether the result is sorted on the key columns

sort
logical. Should the result be sorted on the by columns?

A logical value: whether to sort the result on the key columns. The default is TRUE (sorted).

merge(data,pd,by.x = '名称',by.y = '姓名',all=TRUE,sort=TRUE)
merge(data,pd,by.x = '名称',by.y = '姓名',all=TRUE,sort=FALSE)
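A small sketch of the difference, on hypothetical data with integer keys (`x`, `y`, and their columns are made up). With sort = TRUE the result is ordered by the key; with sort = FALSE the R documentation leaves the row order unspecified, so the example only checks that the same rows come back.

```r
# Hypothetical toy data with keys deliberately out of order
x <- data.frame(id = c(3L, 1L, 2L), a = c("c", "a", "b"))
y <- data.frame(id = c(2L, 3L, 1L), b = c("B", "C", "A"))

sorted   <- merge(x, y, by = "id", sort = TRUE)   # rows ordered by id
unsorted <- merge(x, y, by = "id", sort = FALSE)  # same rows, order not guaranteed
print(sorted)
print(unsorted)
```

If a later step depends on row order, it is safer to sort explicitly than to rely on whatever order sort = FALSE happens to return.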

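5) suffixes: when x and y have non-key columns with the same name, a suffix is appended to each to show which data frame it came from. The defaults are .x and .y.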

suffixes
a character vector of length 2 specifying the suffixes
to be used for making unique the names of columns in the result
which are not used for merging (appearing in by etc).

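That is, a character vector of length 2 whose elements are appended to clashing non-key column names so that they stay unique in the result.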

# Data for the suffixes example: both data frames now have a 武功 (martial art) column
名称 <- c('郭靖','黄蓉','华筝','梅超风','杨康','穆念慈')
性别 <- c('M','F','F','F','M','F')
亲属 <- c('郭啸天','黄药师','铁木真','陈玄风','完颜洪烈','杨铁心')
武功 <- c('空明拳','兰花拂穴手','无','摧心掌','九阴白骨爪','逍遥游拳法')
data <- data.frame(名称, 性别, 亲属, 武功, stringsAsFactors = FALSE)
data

姓名 <- c('郭靖','黄蓉','王重阳','梅超风','欧阳锋','一灯大师')
身份 <- c('侠之大者','女中诸葛','全真教掌门','黑风双煞','白驼山庄主','大理高僧')
武功 <- c('降龙十八掌','落英神剑掌','全真剑法','九阴白骨爪','蛤蟆功','一阳指')
pd <- data.frame(姓名, 身份, 武功, stringsAsFactors = FALSE)
pd

merge(data,pd,by.x = '名称',by.y = '姓名',all=TRUE,sort=TRUE)
merge(data,pd,by.x = '名称',by.y = '姓名',all=TRUE,sort=TRUE,suffixes = c('.x','.y'))
merge(data,pd,by.x = '名称',by.y = '姓名',all=TRUE,sort=TRUE,suffixes = c('.data','.pd'))

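With the default suffixes (the first two calls), the shared 武功 column appears in the result as 武功.x and 武功.y.

With the modified suffixes (the third call), it appears as 武功.data and 武功.pd.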
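The naming rule can also be checked on a tiny example. This sketch uses hypothetical data (`x`, `y`, and the column `score` are invented) and compares the default suffixes, a custom pair, and a pair whose first element is empty so that x's column keeps its original name.

```r
# Hypothetical toy data: both data frames have a non-key column called "score"
x <- data.frame(id = 1:2, score = c(90, 80))
y <- data.frame(id = 1:2, score = c(70, 60))

default_names <- names(merge(x, y, by = "id"))
custom_names  <- names(merge(x, y, by = "id", suffixes = c(".left", ".right")))
keep_x_names  <- names(merge(x, y, by = "id", suffixes = c("", ".new")))

print(default_names)  # "id" "score.x" "score.y"
print(custom_names)   # "id" "score.left" "score.right"
print(keep_x_names)   # "id" "score" "score.new"
```

The key column id never receives a suffix, because suffixes apply only to columns that are not used for merging.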

6) no.dups

no.dups
logical indicating that suffixes are appended in more cases to
avoid duplicated column names in the result.
This was implicitly false before R version 3.5.0.

A logical value: whether the suffixes argument is also applied in further cases so that no column name is duplicated in the result.
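One such "further case" arises when the key column of x has the same name as a non-key column of y. The sketch below assumes R 3.5.0 or later and uses hypothetical data (`x`, `y`, and their columns are made up); suppressWarnings() is used only to silence the duplicated-name warning that merge() issues in the unprotected call.

```r
# Hypothetical toy data: x is keyed by "id"; y is keyed by "key" but ALSO has
# an ordinary, non-key column that happens to be called "id"
x <- data.frame(id = 1:2, a = c("p", "q"))
y <- data.frame(key = 1:2, id = c(10, 20))

# Default (no.dups = TRUE): y's "id" column gets the y suffix
protected <- merge(x, y, by.x = "id", by.y = "key")

# no.dups = FALSE: the result ends up with two columns named "id"
unprotected <- suppressWarnings(
  merge(x, y, by.x = "id", by.y = "key", no.dups = FALSE)
)

print(names(protected))
print(names(unprotected))
```

Duplicated column names make later access by name ambiguous, which is presumably why the protection is on by default.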

7) incomparables

incomparables
values which cannot be matched.
This is intended to be used for merging on one column,
so these are incomparable values of that column.
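A common use is to stop NA keys from matching each other. The sketch below uses hypothetical data (`x`, `y`, and their columns are invented for illustration) with a single key column, as the description above requires.

```r
# Hypothetical toy data: each side has one real key and one missing key
x <- data.frame(k = c(1, NA), a = c("x1", "x2"))
y <- data.frame(k = c(1, NA), b = c("y1", "y2"))

default_match <- merge(x, y, by = "k")                      # NA is matched with NA
no_na_match   <- merge(x, y, by = "k", incomparables = NA)  # NA keys never match

print(default_match)
print(no_na_match)
```

By default the two NA keys are treated as equal and produce a joined row; with incomparables = NA only the row for key 1 remains.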

8) ...

arguments to be passed to or from methods.

Reference: the R help page for merge, "Merge Two Data Frames"