postgresql分页用法_postgresql分页数据重复问题的深入理解

问题背景

许多开发和测试人员都可能遇到过列表的数据翻下一页的时候显示了上一页的数据，也就是翻页会有重复的数据。

如何处理？

这个问题出现的原因是因为选择的排序字段有重复，常见的处理办法就是排序的时候加上唯一字段，这样在分页的过程中数据就不会重复了。关于这个问题文档也有解释并非是一个bug。而是排序时需要选择唯一字段来做排序，不然返回的结果不确定

排序返回数据重复的根本原因是什么呢？

经常优化sql的同学可能会发现，执行计划里面会有Sort Method这个关键字，而这个关键字就是排序选择的方法。abase的排序分为三种

quicksort 快速排序

top-N heapsort Memory 堆排序

external merge Disk 归并排序

推测

分页重复的问题和执行计划选择排序算法的稳定性有关。

简单介绍下这三种排序算法的场景：

在有索引的情况下：排序可以直接走索引。在没有索引的情况下：当表的数据量较小的时候选择快速排序(排序所需必须内存小于work_mem)，当排序有limit，且耗费的内存不超过work_mem时选择堆排序，当work_mem不够时选择归并排序。

验证推测

1.创建表，初始化数据

abase=# create table t_sort(n_int int,c_id varchar(300));

CREATE TABLE

abase=# insert into t_sort(n_int,c_id) select 100,generate_series(1,9);

INSERT 0 9

abase=# insert into t_sort(n_int,c_id) select 200,generate_series(1,9);

INSERT 0 9

abase=# insert into t_sort(n_int,c_id) select 300,generate_series(1,9);

INSERT 0 9

abase=# insert into t_sort(n_int,c_id) select 400,generate_series(1,9);

INSERT 0 9

abase=# insert into t_sort(n_int,c_id) select 500,generate_series(1,9);

INSERT 0 9

abase=# insert into t_sort(n_int,c_id) select 600,generate_series(1,9);

INSERT 0 9

三种排序

--快速排序 quicksort

abase=# explain analyze select ctid,n_int,c_id from t_sort order by n_int asc;

QUERY PLAN

------------------------------------------------------------

Sort (cost=3.09..3.23 rows=54 width=12) (actual time=0.058..0.061 rows=54 loops=1)

Sort Key: n_int

Sort Method: quicksort Memory: 27kB

-> Seq Scan on t_sort (cost=0.00..1.54 rows=54 width=12) (actual time=0.021..0.032 rows=54 loops=1)

Planning time: 0.161 ms

Execution time: 0.104 ms

(6 rows)

--堆排序 top-N heapsort

abase=# explain analyze select ctid,n_int,c_id from t_sort order by n_int asc limit 10;

QUERY PLAN

------------------------------------------------------------

Limit (cost=2.71..2.73 rows=10 width=12) (actual time=0.066..0.068 rows=10 loops=1)

-> Sort (cost=2.71..2.84 rows=54 width=12) (actual time=0.065..0.066 rows=10 loops=1)

Sort Key: n_int

Sort Method: top-N heapsort Memory: 25kB

-> Seq Scan on t_sort (cost=0.00..1.54 rows=54 width=12) (actual time=0.022..0.031 rows=54 loops=1

)

Planning time: 0.215 ms

Execution time: 0.124 ms

(7 rows)

--归并排序 external sort Disk

--插入大量值为a的数据

abase=# insert into t_sort(n_int,c_id) select generate_series(1000,2000),'a';

INSERT 0 1001

abase=# set work_mem = '64kB';

SET

abase=# explain analyze select ctid,n_int,c_id from t_sort order by n_int asc;

QUERY PLAN

-------------------------------------------------------------

Sort (cost=18.60..19.28 rows=270 width=12) (actual time=1.235..1.386 rows=1055 loops=1)

Sort Key: n_int

Sort Method: external sort Disk: 32kB

-> Seq Scan on t_sort (cost=0.00..7.70 rows=270 width=12) (actual time=0.030..0.247 rows=1055 loops=1)

Planning time: 0.198 ms

Execution time: 1.663 ms

(6 rows)

快速排序

--快速排序

abase=# explain analyze select ctid,n_int,c_id from t_sort order by n_int asc;

QUERY PLAN

------------------------------------------------------------

Sort (cost=3.09..3.23 rows=54 width=12) (actual time=0.058..0.061 rows=54 loops=1)

Sort Key: n_int

Sort Method: quicksort Memory: 27kB

-> Seq Scan on t_sort (cost=0.00..1.54 rows=54 width=12) (actual time=0.021..0.032 rows=54 loops=1)

Planning time: 0.161 ms

Execution time: 0.104 ms

(6 rows)

--获取前20条数据

abase=# select ctid,n_int,c_id from t_sort order by n_int asc limit 20;

ctid | n_int | c_id

--------+-------+------

(0,7) | 100 | 7

(0,2) | 100 | 2

(0,4) | 100 | 4

(0,8) | 100 | 8

(0,3) | 100 | 3

(0,6) | 100 | 6

(0,5) | 100 | 5

(0,9) | 100 | 9

(0,1) | 100 | 1

(0,14) | 200 | 5

(0,13) | 200 | 4

(0,12) | 200 | 3

(0,10) | 200 | 1

(0,15) | 200 | 6

(0,16) | 200 | 7

(0,17) | 200 | 8

(0,11) | 200 | 2

(0,18) | 200 | 9

(0,20) | 300 | 2

(0,19) | 300 | 1

(20 rows) --分页获取前10条数据

abase=# select ctid,n_int,c_id from t_sort order by n_int asc limit 10 offset 0;

ctid | n_int | c_id

--------+-------+------

(0,1) | 100 | 1

(0,3) | 100 | 3

(0,4) | 100 | 4

(0,2) | 100 | 2

(0,6) | 100 | 6

(0,7) | 100 | 7

(0,8) | 100 | 8

(0,9) | 100 | 9

(0,5) | 100 | 5

(0,10) | 200 | 1

(10 rows)

--分页从第10条开始获取10条

abase=# select ctid,n_int,c_id from t_sort order by n_int asc limit 10 offset 10;

ctid | n_int | c_id

--------+-------+------

(0,13) | 200 | 4

(0,12) | 200 | 3

(0,10) | 200 | 1

(0,15) | 200 | 6

(0,16) | 200 | 7

(0,17) | 200 | 8

(0,11) | 200 | 2

(0,18) | 200 | 9

(0,20) | 300 | 2

(0,19) | 300 | 1

(10 rows)

limit 10 offset 0,limit 10 offset 10，连续取两页数据

此处可以看到limit 10 offset 10结果中，第三条数据重复了第一页的最后一条： (0,10) | 200 | 1

并且n_int = 200 and c_id = 5这条数据被“遗漏”了。

堆排序

abase=# select count(*) from t_sort;

count

-------

1055

(1 row)

--设置work_mem 4MB

abase=# show work_mem ;

work_mem

----------

4MB

(1 row)

--top-N heapsort

abase=# explain analyze select * from ( select ctid,n_int,c_id from test order by n_int asc limit 1001 offset 0) td limit 10;

QUERY PLAN

-------------------------------------------------------------------------------------------------------------

Limit (cost=2061.33..2061.45 rows=10 width=13) (actual time=15.247..15.251 rows=10 loops=1)

-> Limit (cost=2061.33..2063.83 rows=1001 width=13) (actual time=15.245..15.247 rows=10 loops=1)

-> Sort (cost=2061.33..2135.72 rows=29757 width=13) (actual time=15.244..15.245 rows=10 loops=1)

Sort Key: test.n_int

Sort Method: top-N heapsort Memory: 95kB

-> Seq Scan on test (cost=0.00..429.57 rows=29757 width=13) (actual time=0.042..7.627 rows=2

9757 loops=1)

Planning time: 0.376 ms

Execution time: 15.415 ms

(8 rows)

--获取limit 1001 offset 0，然后取10前10条数据

abase=# select * from ( select ctid,n_int,c_id from test order by n_int asc limit 1001 offset 0) td limit 10;

ctid | n_int | c_id

----------+-------+------

(0,6) | 100 | 6

(0,2) | 100 | 2

(0,5) | 100 | 5

(87,195) | 100 | 888

(0,3) | 100 | 3

(0,1) | 100 | 1

(0,8) | 100 | 8

(0,55) | 100 | 888

(44,12) | 100 | 888

(0,9) | 100 | 9

(10 rows)

---获取limit 1001 offset 1，然后取10前10条数据

abase=# select * from ( select ctid,n_int,c_id from test order by n_int asc limit 1001 offset 1) td limit 10;

ctid | n_int | c_id

----------+-------+------

(44,12) | 100 | 888

(0,8) | 100 | 8

(0,1) | 100 | 1

(0,5) | 100 | 5

(0,9) | 100 | 9

(87,195) | 100 | 888

(0,7) | 100 | 7

(0,6) | 100 | 6

(0,3) | 100 | 3

(0,4) | 100 | 4

(10 rows)

---获取limit 1001 offset 2，然后取10前10条数据

abase=# select * from ( select ctid,n_int,c_id from test order by n_int asc limit 1001 offset 2) td limit 10;

ctid | n_int | c_id

----------+-------+------

(0,5) | 100 | 5

(0,55) | 100 | 888

(0,1) | 100 | 1

(0,9) | 100 | 9

(0,2) | 100 | 2

(0,3) | 100 | 3

(44,12) | 100 | 888

(0,7) | 100 | 7

(87,195) | 100 | 888

(0,4) | 100 | 4

(10 rows)

堆排序使用内存： Sort Method: top-N heapsort Memory: 95kB

当offset从0变成1后，以及变成2后，会发现查询出的10条数据不是有顺序的。

归并排序

--将work_mem设置为64kb让其走归并排序。

abase=# set work_mem ='64kB';

SET

abase=# show work_mem;

work_mem

----------

64kB

(1 row)

-- external merge Disk

abase=# explain analyze select * from ( select ctid,n_int,c_id from test order by n_int asc limit 1001 offset 0) td limit 10;

QUERY PLAN

---------------------------------------------------------------------------------------------------------------------------

Limit (cost=2061.33..2061.45 rows=10 width=13) (actual time=27.912..27.916 rows=10 loops=1)

-> Limit (cost=2061.33..2063.83 rows=1001 width=13) (actual time=27.910..27.913 rows=10 loops=1)

-> Sort (cost=2061.33..2135.72 rows=29757 width=13) (actual time=27.909..27.911 rows=10 loops=1)

Sort Key: test.n_int

Sort Method: external merge Disk: 784kB

-> Seq Scan on test (cost=0.00..429.57 rows=29757 width=13) (actual time=0.024..6.730 rows=29757 loops=1)

Planning time: 0.218 ms

Execution time: 28.358 ms

(8 rows)

--同堆排序一样，获取limit 1001 offset 0，然后取10前10条数据

abase=# select * from ( select ctid,n_int,c_id from test order by n_int asc limit 1001 offset 0) td limit 10;

ctid | n_int | c_id

--------+-------+------

(0,1) | 100 | 1

(0,2) | 100 | 2

(0,4) | 100 | 4

(0,8) | 100 | 8

(0,9) | 100 | 9

(0,5) | 100 | 5

(0,3) | 100 | 3

(0,6) | 100 | 6

(0,55) | 100 | 888

(0,7) | 100 | 7

(10 rows)

--同堆排序一样，获取limit 1001 offset 1，然后取10前10条数据

abase=# select * from ( select ctid,n_int,c_id from test order by n_int asc limit 1001 offset 1) td limit 10;

ctid | n_int | c_id

----------+-------+------

(0,2) | 100 | 2

(0,4) | 100 | 4

(0,8) | 100 | 8

(0,9) | 100 | 9

(0,5) | 100 | 5

(0,3) | 100 | 3

(0,6) | 100 | 6

(0,55) | 100 | 888

(0,7) | 100 | 7

(87,195) | 100 | 888

(10 rows)

--同堆排序一样，获取limit 1001 offset 2，然后取10前10条数据

abase=# select * from ( select ctid,n_int,c_id from test order by n_int asc limit 1001 offset 2) td limit 10;

ctid | n_int | c_id

----------+-------+------

(0,4) | 100 | 4

(0,8) | 100 | 8

(0,9) | 100 | 9

(0,5) | 100 | 5

(0,3) | 100 | 3

(0,6) | 100 | 6

(0,55) | 100 | 888

(0,7) | 100 | 7

(87,195) | 100 | 888

(44,12) | 100 | 888

(10 rows)

减小work_mem使用归并排序的时候，offset从0变成1后以及变成2后，任然有序。

还有一种情况，那就是在查询前面几页的时候会有重复，但是越往后面翻就不会重复了，现在也可以解释清楚。

如果每页10条数据，当offse较小的时候使用的内存较少。当offse不断增大，所耗费的内存也就越多。

--设置work_mem =64kb

abase=# show work_mem;

work_mem

----------

64kB

(1 row)

--查询limit 10 offset 10

abase=# explain analyze select * from ( select ctid,n_int,c_id from test order by n_int asc limit 10 offset 10) td limit 10;

QUERY PLAN

---------------------------------------------------------------------------------------------------------------------------

Limit (cost=1221.42..1221.54 rows=10 width=13) (actual time=12.881..12.884 rows=10 loops=1)

-> Limit (cost=1221.42..1221.44 rows=10 width=13) (actual time=12.879..12.881 rows=10 loops=1)

-> Sort (cost=1221.39..1295.79 rows=29757 width=13) (actual time=12.877..12.879 rows=20 loops=1)

Sort Key: test.n_int

Sort Method: top-N heapsort Memory: 25kB

-> Seq Scan on test (cost=0.00..429.57 rows=29757 width=13) (actual time=0.058..6.363 rows=29757 loops=1)

Planning time: 0.230 ms

Execution time: 12.976 ms

(8 rows)

--查询limit 10 offset 1000

abase=# explain analyze select * from ( select ctid,n_int,c_id from test order by n_int asc limit 10 offset 1000) td limit 10;

QUERY PLAN

---------------------------------------------------------------------------------------------------------------------------

Limit (cost=2065.75..2065.88 rows=10 width=13) (actual time=27.188..27.192 rows=10 loops=1)

-> Limit (cost=2065.75..2065.78 rows=10 width=13) (actual time=27.186..27.188 rows=10 loops=1)

-> Sort (cost=2063.25..2137.64 rows=29757 width=13) (actual time=26.940..27.138 rows=1010 loops=1)

Sort Key: test.n_int

Sort Method: external merge Disk: 784kB

-> Seq Scan on test (cost=0.00..429.57 rows=29757 width=13) (actual time=0.026..6.374 rows=29757 loops=1)

Planning time: 0.207 ms

Execution time: 27.718 ms

(8 rows)

可以看到当offset从10增加到1000的时候，使用的内存增加，排序的方法从堆排序变成了归并排序。而归并排序为稳定排序，所以后面的分页不会再有后一页出现前一页数据的情况。

结语

1.关于分页重复数据的问题主要是排序字段不唯一并且执行计划走了快速排序和堆排序导致。

2.当排序有重复字段，但是如果查询是归并排序，便不会存在有重复数据的问题。

3.当用重复字段排序，前面的页重复，随着offset的增大导致work_mem不足以后使用归并排序，就不存在重复的数据了。

4.排序和算法的稳定性有关，当执行计划选择不同的排序算法时，返回的结果不一样。

5.处理重复数据的常见手段就是，排序的时候可以在排序字段d_larq(立案日期)后面加上c_bh(唯一字段)来排序。

order by d_larq,c_bh;

总结

以上就是这篇文章的全部内容了，希望本文的内容对大家的学习或者工作具有一定的参考学习价值，谢谢大家对脚本之家的支持。

postgresql分页用法_postgresql分页数据重复问题的深入理解相关推荐

如何解决MySQL order by limit语句的分页数据重复问题？
文章来源:https://www.jianshu.com/p/544c319fd838 0 问题描述在MySQL中我们通常会采用limit来进行翻页查询,比如limit(0,10)表示列出第一页的1 ...
layui分页limit不显示_小心避坑：MySQL分页时使用 limit+order by 会出现数据重复问题...
20大进阶架构专题每日送达来源:www.jianshu.com/p/544c319fd838 进入主题前先插一下,当当优惠码福利来一波!当当全场自营图书5折,用优惠码:J2JYFK(长按复制),满2 ...
关于 Oracle分页数据重复的问题
2019独角兽企业重金招聘Python工程师标准>>> 先说问题吧.最近在测试一个新的模块,发现列表数据的前三页数据竟然是一样的.第一反应是 pageNo 的问题,debug一看,p ...
mybatis分页数据重复
今天测试的时候遇到个bug:分页查询出来的数据是乱序的(第一页查过的数据也会跑第二页去) 将mybatis 日志中的sql ,拿出来单独执行,发现结果是正确,为什么mybatis查出来的数据是乱序的? ...
mysql scrapy 重复数据_小心避坑：MySQL分页时使用 limit+order by 会出现数据重复问题...
作者:猿码道http://www.jianshu.com/p/544c319fd838 0 问题描述在MySQL中我们通常会采用limit来进行翻页查询,比如limit(0,10)表示列出第一页的1 ...
小心避坑：MySQL分页时使用 limit+order by 会出现数据重复问题
点击上方"Java知音",选择"置顶公众号" 技术文章第一时间送达! 作者:猿码道 www.jianshu.com/p/544c319fd838 0 问题描述 ...
mysql limit分页_MySQL order by limit 分页数据重复问题
黑客技术点击右侧关注,了解黑客的世界! Linux编程点击右侧关注,免费入门到精通! 程序员严选甄选正品好物,程序员生活指南! 作者丨猿码道 https://www.jianshu.com/p/544 ...
数据库不断有新数据插入, 导致分页查询数据重复的问题
问题描述首先, 查询数据时是按照数据的录入时间分页查询的, 最新的数据一直是第1页; 同时, 库表不断地有新数据写入, 这就导致了分页查询数据请求出现重复问题. 例如19:31分时分页查询请求第1页 ...
排序后分页第一页数据和第二页数据重复问题
问题描述数据分页时需要根据值班时间F_Start 字段倒序,即使用OrderByDescending(t => t.F_Start),经过调试发现返回的总条数records,总页数to ...

postgresql分页用法_postgresql分页数据重复问题的深入理解

postgresql分页用法_postgresql分页数据重复问题的深入理解相关推荐

最新文章

热门文章