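Contents

Stacking data along an axis: the concat() function
- Horizontal stacking with an outer join
- Vertical stacking with an inner join
Merging data on keys: the merge() function
- Inner join
- Outer join
- Left join
- Right join
- Other cases
Merging data on the row index: the join() method
- The four join types
- Joining a column against a row index
Combining overlapping data: the combine_first() method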

Stacking data along an axis: the concat() function

pandas.concat(
    objs: Union[
        Iterable[FrameOrSeriesUnion], Mapping[Optional[Hashable], FrameOrSeriesUnion]
    ],
    axis=0,
    join="outer",
    ignore_index: bool = False,
    keys=None,
    levels=None,
    names=None,
    verify_integrity: bool = False,
    sort: bool = False,
    copy: bool = True,
)

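The commonly used parameters of this function are:

join: the join type. inner keeps only the labels shared by all objects, outer keeps the union of the labels. The default is outer.

ignore_index: a Boolean, False by default. When True, the existing index along the concatenation axis is discarded and replaced with a fresh 0, 1, ..., n-1 index.

keys: a sequence used to add an outermost index level.

levels: the specific levels (unique values) used to build the MultiIndex.

names: names for the levels of the resulting hierarchical index, used once keys and levels are set.

verify_integrity: a Boolean, False by default. When True, the new concatenation axis is checked for duplicates and an error is raised if any are found.

Depending on the axis parameter, stacking is either horizontal (axis=1) or vertical (axis=0); vertical stacking is the default. The default join type is the outer join (join="outer").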
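As a minimal sketch of the keys, names, and verify_integrity parameters described above, the following example stacks two small frames vertically. The frames a and b, their row labels, and the level names "source" and "row" are made up for illustration and are not part of the original article.

```python
import pandas as pd

# Hypothetical frames; the label "r1" is repeated on purpose.
a = pd.DataFrame({"x": [1, 2]}, index=["r0", "r1"])
b = pd.DataFrame({"x": [3, 4]}, index=["r1", "r2"])

# keys adds an outermost index level recording which input each row came from;
# names labels the levels of the resulting MultiIndex.
stacked = pd.concat([a, b], keys=["a", "b"], names=["source", "row"])
print(stacked)

# verify_integrity=True refuses to build a result whose concatenation axis
# contains duplicate labels.
try:
    pd.concat([a, b], verify_integrity=True)
    duplicate_detected = False
except ValueError as exc:
    duplicate_detected = True
    print("ValueError:", exc)
```

With keys set, the repeated label "r1" stays addressable as ("a", "r1") and ("b", "r1"), which is one way to keep duplicates apart without renumbering the rows.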

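Horizontal stacking with an outer join

When concat() is called with axis=1 and join='outer', the objects are stacked horizontally using an outer join.

Test objects:

left:
    A   B
a  A0  B0
b  A1  B1
right:
    C   D
c  C0  D0
d  C1  D1

Code:

import pandas as pd

left = pd.DataFrame({'A': ['A0', 'A1'], 'B': ['B0', 'B1']}, index=['a', 'b'])
right = pd.DataFrame({'C': ['C0', 'C1'], 'D': ['D0', 'D1']}, index=['c', 'd'])
# axis=1 places the frames side by side; join='outer' keeps every row label
print(pd.concat([left, right], join='outer', axis=1))

Output: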

     A    B    C    D
a   A0   B0  NaN  NaN
b   A1   B1  NaN  NaN
c  NaN  NaN   C0   D0
d  NaN  NaN   C1   D1

Positions that have no data after concat() combines the objects are filled with NaN.
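For contrast, here is a small sketch of how join='inner' changes horizontal stacking: only the row labels present in every input survive, so no NaN padding is needed. The frames below are hypothetical and share the single label "b"; they are not the left and right objects used above.

```python
import pandas as pd

# Hypothetical frames that overlap on the row label "b" only.
left = pd.DataFrame({"A": ["A0", "A1"]}, index=["a", "b"])
right = pd.DataFrame({"C": ["C0", "C1"]}, index=["b", "c"])

outer = pd.concat([left, right], axis=1, join="outer")  # union of row labels
inner = pd.concat([left, right], axis=1, join="inner")  # intersection of row labels

print(outer)
print(inner)
```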

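Vertical stacking with an inner join

When concat() is called with axis=0 and join='inner', the objects are stacked vertically using an inner join, so only the columns shared by all objects are kept.

Test objects:

df1:
    A   B   C
0  A0  B0  C0
1  A1  B1  C1
2  A2  B2  C2
df2:
    B   C   D
0  B3  C3  D3
1  B4  C4  D4
2  B5  C5  D5

Code:

df1 = pd.DataFrame({'A': ['A0', 'A1', 'A2'], 'B': ['B0', 'B1', 'B2'], 'C': ['C0', 'C1', 'C2']})
df2 = pd.DataFrame({'B': ['B3', 'B4', 'B5'], 'C': ['C3', 'C4', 'C5'], 'D': ['D3', 'D4', 'D5']})
# axis=0 stacks the rows; join='inner' keeps only the shared columns B and C
print(pd.concat([df1, df2], join='inner', axis=0))

Output: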

    B   C
0  B0  C0
1  B1  C1
2  B2  C2
0  B3  C3
1  B4  C4
2  B5  C5
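The result above keeps the original row labels, so 0, 1 and 2 each appear twice. A short sketch of how ignore_index=True renumbers the rows instead, using smaller made-up frames rather than the article's df1 and df2:

```python
import pandas as pd

# Hypothetical two-row frames that share only column B.
df1 = pd.DataFrame({"A": ["A0", "A1"], "B": ["B0", "B1"]})
df2 = pd.DataFrame({"B": ["B2", "B3"], "D": ["D2", "D3"]})

# Default: each input keeps its own row labels, so labels repeat.
kept = pd.concat([df1, df2], join="inner")

# ignore_index=True drops the old labels and numbers the rows from 0.
renumbered = pd.concat([df1, df2], join="inner", ignore_index=True)

print(kept.index.tolist(), renumbered.index.tolist())
```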

Merging data on keys: the merge() function

A key-based merge joins different DataFrame objects on one or more keys. In most cases the columns that the two DataFrame objects have in common serve as the merge keys.

merge(
    left,
    right,
    how: str = "inner",
    on=None,
    left_on=None,
    right_on=None,
    left_index: bool = False,
    right_index: bool = False,
    sort: bool = False,
    suffixes=("_x", "_y"),
    copy: bool = True,
    indicator: bool = False,
    validate=None,
)

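Some of the parameters of this function are:

left: the left DataFrame taking part in the merge
right: the right DataFrame taking part in the merge
how: the join type, inner by default. The supported values are:
· left: use only the keys of the left DataFrame, like a SQL left outer join
· right: use only the keys of the right DataFrame, like a SQL right outer join
· outer: use the union of the keys of both DataFrames, like a SQL full outer join
· inner: use the intersection of the keys of both DataFrames, like a SQL inner join
on: the column name(s) to join on; they must exist in both DataFrames
left_on: the column(s) of the left DataFrame to use as join keys
right_on: the column(s) of the right DataFrame to use as join keys
left_index: use the row index of the left DataFrame as the join key
right_index: use the row index of the right DataFrame as the join key
sort: a Boolean, False by default; whether to sort the result by the join keys
suffixes: the suffixes appended to overlapping column names, ("_x", "_y") by default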
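The parameters left_on, right_on and suffixes are easiest to see when the key columns have different names and a non-key column is shared. The sketch below assumes two hypothetical tables, orders and customers, whose column names are invented for this example.

```python
import pandas as pd

# Hypothetical tables: the key is "cust_id" on the left and "id" on the right,
# and both tables carry a non-key column called "amount".
orders = pd.DataFrame({"cust_id": [1, 2, 3], "amount": [10, 20, 30]})
customers = pd.DataFrame({"id": [1, 2, 4], "amount": [100, 200, 400]})

merged = pd.merge(
    orders,
    customers,
    left_on="cust_id",               # key column in the left frame
    right_on="id",                   # key column in the right frame
    suffixes=("_order", "_credit"),  # tell the two "amount" columns apart
)
print(merged)
```

Because how defaults to inner, only the keys 1 and 2, which occur on both sides, appear in the result.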

Inner join

merge() uses how='inner' by default.

Test objects:

df1:
    A   B   C
0  A0  B0  C0
1  A1  B1  C1
2  A2  B2  C2
df3:
    B   C   D
0  B0  C0  D3
1  B2  C2  D4
2  B4  C4  D5

Code:

df1 = pd.DataFrame({'A': ['A0', 'A1', 'A2'], 'B': ['B0', 'B1', 'B2'], 'C': ['C0', 'C1', 'C2']})
df3 = pd.DataFrame({'B': ['B0', 'B2', 'B4'], 'C': ['C0', 'C2', 'C4'], 'D': ['D3', 'D4', 'D5']})
# Join on both shared columns; only rows whose (B, C) pair occurs in both frames survive
print("pd.merge:\n", pd.merge(df1, df3, on=['B', 'C']))

Output:

pd.merge:
     A   B   C   D
0  A0  B0  C0  D3
1  A2  B2  C2  D4

Outer join

Outer join (how='outer'): rows whose keys match in left and right are combined into a single row, and positions with no data are filled with NaN.

Test objects:

df1:
    A   B   C
0  A0  B0  C0
1  A1  B1  C1
2  A2  B2  C2
df3:
    B   C   D
0  B0  C0  D3
1  B2  C2  D4
2  B4  C4  D5

Code:

print("pd.merge(how=outer):\n", pd.merge(df1, df3, on=['B', 'C'], how='outer'))

Output:

pd.merge(how=outer):
      A   B   C    D
0   A0  B0  C0   D3
1   A1  B1  C1  NaN
2   A2  B2  C2   D4
3  NaN  B4  C4   D5

Left join

Left join (how='left'): the left table is the reference. Every row of left is kept, and a row of right is included only if its key values match a row of left. Missing values in the merged table are filled with NaN.

Test objects:

df1:
    A   B   C
0  A0  B0  C0
1  A1  B1  C1
2  A2  B2  C2
df3:
    B   C   D
0  B0  C0  D3
1  B2  C2  D4
2  B4  C4  D5

Code:

print("pd.merge(how=left):\n", pd.merge(df1, df3, on=['B', 'C'], how='left'))

Output:

pd.merge(how=left):
     A   B   C    D
0  A0  B0  C0   D3
1  A1  B1  C1  NaN
2  A2  B2  C2   D4

Right join

Right join (how='right'): the right table is the reference. Every row of right is kept, and a row of left is included only if its key values match a row of right. Missing values in the merged table are filled with NaN.

Test objects:

df1:
    A   B   C
0  A0  B0  C0
1  A1  B1  C1
2  A2  B2  C2
df3:
    B   C   D
0  B0  C0  D3
1  B2  C2  D4
2  B4  C4  D5

Code:

print("pd.merge(how=right):\n", pd.merge(df1, df3, on=['B', 'C'], how='right'))

Output:

pd.merge(how=right):
      A   B   C   D
0   A0  B0  C0  D3
1   A2  B2  C2  D4
2  NaN  B4  C4  D5
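To see which table each merged row came from, the indicator parameter in the merge() signature can be switched on. The following is a sketch that rebuilds df1 and df3 as defined earlier, so that it runs on its own, and adds the _merge column to an outer join.

```python
import pandas as pd

df1 = pd.DataFrame({"A": ["A0", "A1", "A2"],
                    "B": ["B0", "B1", "B2"],
                    "C": ["C0", "C1", "C2"]})
df3 = pd.DataFrame({"B": ["B0", "B2", "B4"],
                    "C": ["C0", "C2", "C4"],
                    "D": ["D3", "D4", "D5"]})

# indicator=True appends a "_merge" column whose value is
# "left_only", "right_only" or "both" for every row.
tagged = pd.merge(df1, df3, on=["B", "C"], how="outer", indicator=True)
print(tagged)

# Map each key in column B to the side(s) it was found on.
origin = dict(zip(tagged["B"], tagged["_merge"].astype(str)))
```

Rows tagged left_only are the ones a right join drops, and rows tagged right_only are the ones a left join drops.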

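Other cases

Two tables can be merged with merge() even when neither their row indexes nor their column names overlap: set both left_index and right_index to True. Because the two tables below share no index label, how='outer' is needed to keep their rows.

Test objects:

left:
    A   B
a  A0  B0
b  A1  B1
right:
    C   D
c  C0  D0
d  C1  D1

Code:

print("pd.merge(left_index=right_index=True):\n",
      pd.merge(left, right, how='outer', left_index=True, right_index=True))

Output:

pd.merge(left_index=right_index=True):
      A    B    C    D
a   A0   B0  NaN  NaN
b   A1   B1  NaN  NaN
c  NaN  NaN   C0   D0
d  NaN  NaN   C1   D1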
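The choice of how matters in this situation. As a sketch, the same left and right frames are rebuilt below and merged on their indexes twice, once with the default inner join and once with an outer join.

```python
import pandas as pd

left = pd.DataFrame({"A": ["A0", "A1"], "B": ["B0", "B1"]}, index=["a", "b"])
right = pd.DataFrame({"C": ["C0", "C1"], "D": ["D0", "D1"]}, index=["c", "d"])

# Default how="inner": no index label is shared, so no row survives.
inner = pd.merge(left, right, left_index=True, right_index=True)

# how="outer": every label from both frames is kept and padded with NaN.
outer = pd.merge(left, right, how="outer", left_index=True, right_index=True)

print(len(inner), len(outer))
```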

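Merging data on the row index: the join() method

join(self, other, on=None, how="left", lsuffix="", rsuffix="", sort=False)

The commonly used parameters of this method are:

on: the column name(s) of the calling DataFrame to match against the index of other
how: one of {'left', 'right', 'outer', 'inner'}; the default is 'left'
lsuffix: a string appended to the overlapping column names of the left DataFrame
rsuffix: a string appended to the overlapping column names of the right DataFrame
sort: a Boolean, False by default; whether to sort the merged data by the join key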
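Since lsuffix and rsuffix both default to an empty string, join() cannot rename overlapping columns unless at least one suffix is given. The sketch below, with two hypothetical single-column frames, shows the error raised in that case and a call that avoids it; the suffix strings are arbitrary choices.

```python
import pandas as pd

# Hypothetical frames that both contain a column named "B".
data1 = pd.DataFrame({"B": ["B0", "B1"]}, index=["a", "b"])
data2 = pd.DataFrame({"B": ["B1", "B2"]}, index=["b", "c"])

# Without a suffix, pandas cannot produce two distinct column names.
try:
    data1.join(data2)
    raised = False
except ValueError as exc:
    raised = True
    print("ValueError:", exc)

# Supplying suffixes resolves the clash; how defaults to "left".
joined = data1.join(data2, lsuffix="_left", rsuffix="_right")
print(joined)
```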

The four join types

Test objects:

data1:
    A   B   C
a  A0  B0  C0
b  A1  B1  C1
c  A2  B2  C2
data2:
    B   C   D
b  B1  C1  D1
c  B2  C2  D2
d  B3  C3  D3

Code:

data1 = pd.DataFrame({'A': ['A0', 'A1', 'A2'], 'B': ['B0', 'B1', 'B2'], 'C': ['C0', 'C1', 'C2']}, index=['a', 'b', 'c'])
data2 = pd.DataFrame({'B': ['B1', 'B2', 'B3'], 'C': ['C1', 'C2', 'C3'], 'D': ['D1', 'D2', 'D3']}, index=['b', 'c', 'd'])
# Columns B and C exist in both frames, so a suffix is needed to tell them apart
print("data1.join(data2, how='outer', lsuffix='one'):\n",
      data1.join(data2, how='outer', lsuffix='one'))
print("data1.join(data2, how='inner', rsuffix='two'):\n",
      data1.join(data2, how='inner', rsuffix='two'))
print("data1.join(data2, how='left', lsuffix='one'):\n",
      data1.join(data2, how='left', lsuffix='one'))
print("data1.join(data2, how='right', rsuffix='two'):\n",
      data1.join(data2, how='right', rsuffix='two'))

Output:

data1.join(data2, how='outer', lsuffix='one'):
      A Bone Cone    B    C    D
a   A0   B0   C0  NaN  NaN  NaN
b   A1   B1   C1   B1   C1   D1
c   A2   B2   C2   B2   C2   D2
d  NaN  NaN  NaN   B3   C3   D3
data1.join(data2, how='inner', rsuffix='two'):
     A   B   C Btwo Ctwo   D
b  A1  B1  C1   B1   C1  D1
c  A2  B2  C2   B2   C2  D2
data1.join(data2, how='left', lsuffix='one'):
     A Bone Cone    B    C    D
a  A0   B0   C0  NaN  NaN  NaN
b  A1   B1   C1   B1   C1   D1
c  A2   B2   C2   B2   C2   D2
data1.join(data2, how='right', rsuffix='two'):
      A    B    C Btwo Ctwo   D
b   A1   B1   C1   B1   C1  D1
c   A2   B2   C2   B2   C2  D2
d  NaN  NaN  NaN   B3   C3  D3

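Joining a column against a row index

When the values of a column in the calling DataFrame match the row index of the other DataFrame, pass that column name as on.

Test objects:

join1:
    A   B key
0  A0  B0  K0
1  A1  B1  K1
2  A2  B2  K2
join2:
     C   D
K0  C0  D0
K1  C1  D1
K2  C2  D2

Code:

join1 = pd.DataFrame({'A': ['A0', 'A1', 'A2'], 'B': ['B0', 'B1', 'B2'], 'key': ['K0', 'K1', 'K2']})
join2 = pd.DataFrame({'C': ['C0', 'C1', 'C2'], 'D': ['D0', 'D1', 'D2']}, index=['K0', 'K1', 'K2'])
# on='key' matches the values of join1['key'] against the row index of join2
print("join1.join(join2, on='key'):\n", join1.join(join2, on='key'))

Output:

join1.join(join2, on='key'):
     A   B key   C   D
0  A0  B0  K0  C0  D0
1  A1  B1  K1  C1  D1
2  A2  B2  K2  C2  D2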
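The same result can be written with merge(), which may help relate the two APIs. The sketch below rebuilds join1 and join2 and compares join(on='key') with a merge() call that uses left_on together with right_index; how='left' is passed explicitly because that is the default of join() but not of merge().

```python
import pandas as pd

join1 = pd.DataFrame({"A": ["A0", "A1", "A2"],
                      "B": ["B0", "B1", "B2"],
                      "key": ["K0", "K1", "K2"]})
join2 = pd.DataFrame({"C": ["C0", "C1", "C2"],
                      "D": ["D0", "D1", "D2"]},
                     index=["K0", "K1", "K2"])

# Match the "key" column of join1 against the row index of join2.
via_join = join1.join(join2, on="key")

# Equivalent merge(): a column on the left, the index on the right.
via_merge = pd.merge(join1, join2, left_on="key", right_index=True, how="left")

same = via_join.equals(via_merge)
print(same)
```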

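Combining overlapping data: the combine_first() method

combine_first() fills a missing value only where the row and column labels of the two DataFrame objects overlap, so the two objects should share at least part of their row and column indexes.

combine_first(self, other: "DataFrame")

The method takes a single parameter, other: the DataFrame whose values are used to fill the missing values.

Test objects:

test1:
     A    B
0  NaN   B0
1   A1  NaN
2   A2   B2
3   A3  NaN
test2:
    A   B
1  C0  D0
0  C1  D1
2  C2  D2

Code:

import numpy as np

test1 = pd.DataFrame({'A': [np.nan, 'A1', 'A2', 'A3'], 'B': ['B0', np.nan, 'B2', np.nan]})
test2 = pd.DataFrame({'A': ['C0', 'C1', 'C2'], 'B': ['D0', 'D1', 'D2']}, index=[1, 0, 2])
# Values of test1 take priority; test2 only fills the positions where test1 is NaN
print("test1.combine_first(test2):\n", test1.combine_first(test2))

Output:

test1.combine_first(test2):
     A    B
0  C1   B0
1  A1   D0
2  A2   B2
3  A3  NaN

As shown above, although the row index of test2 is ordered differently from that of test1, values are matched by index label rather than by position when test2 fills the NaN values of test1. For example, the NaN at row 0, column A of test1 is replaced by "C1", the value at row 0, column A of test2. Row 3, column B stays NaN because test2 has no row labelled 3.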
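A further sketch of the label-based matching, using numeric frames invented for this example: the values of the caller win wherever they are present, and labels that exist only in other are carried into the result.

```python
import numpy as np
import pandas as pd

# Hypothetical frames: t2 lists its rows in a different order and has an
# extra row (label 2) and an extra column (B) that t1 lacks.
t1 = pd.DataFrame({"A": [1.0, np.nan]}, index=[0, 1])
t2 = pd.DataFrame({"A": [10.0, 20.0, 30.0], "B": [5.0, 6.0, 7.0]}, index=[1, 0, 2])

out = t1.combine_first(t2)
print(out)
```

Here t1's value at row 0 is kept even though t2 also has a value for that label, while the NaN at row 1 is filled from the row of t2 labelled 1, not from its second row.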
