PostgreSQL pg_stats used to estimate top N freps values and explain rows
标签
PostgreSQL , 数组 , pg_stats , 统计信息 , TOP N , 词频
背景
本文要聊的是如何利用统计信息规避一些复杂的精确统计.
去年写过一篇关于 PostgreSQL 9.2 新增array元素统计收集的相关文章. 本文也会讲到.
《PostgreSQL 9.2 add array elements statistics》
Release 9.2 Release Date: 2012-09-10 * Move the frequently accessed members of the PGPROC shared memory array to a separate array (Pavan Deolasee, Heikki Linnakangas, Robert Haas)
在日常的数据库统计中, count(*), 排名这类的统计非常多, 同时这类统计的开销也非常大, 特别是当表的数据量巨大时.
接下来模拟几个场景。
1、统计某条件下的记录条数.
digoal=# create table test_1 (id serial4 primary key, info text, appid int, crt_time timestamp);
CREATE TABLE
digoal=# insert into test_1 (info,appid,crt_time) select md5(random()::text),round(10000*random())::int,clock_timestamp() from generate_series(1,2000000);
INSERT 0 2000000
默认的统计目标值为100
digoal=# show default_statistics_target ;
-[ RECORD 1 ]-------------+----
default_statistics_target | 100
digoal=# analyze test_1;
ANALYZE
为了得到准确的统计信息, 如果没有打开autovacuum, 最好手动收集一次统计信息.
digoal=# select * from pg_stat_all_tables where relname='test_1';
-[ RECORD 1 ]-----+------------------------------
relid | 91368
schemaname | public
relname | test_1
seq_scan | 1
seq_tup_read | 0
idx_scan | 0
idx_tup_fetch | 0
n_tup_ins | 2000000
n_tup_upd | 0
n_tup_del | 0
n_tup_hot_upd | 0
n_live_tup | 2000000
n_dead_tup | 0
last_vacuum |
last_autovacuum |
last_analyze | 2013-08-11 11:15:04.975523+08
last_autoanalyze | 2013-08-11 11:14:53.663951+08
vacuum_count | 0
autovacuum_count | 0
analyze_count | 1
autoanalyze_count | 1
查询appid=1的记录数有多少
digoal=# select count(*) from test_1 where appid=1; count
------- 189
(1 row)
使用explain输出appid=1的记录数有多少, 这里显示为197. 和使用count(*)得到的存在一点差异.
digoal=# explain select * from test_1 where appid=1; QUERY PLAN
------------------------------------------------------------- Seq Scan on test_1 (cost=0.00..45619.00 rows=197 width=49) Filter: (appid = 1)
(2 rows)
如果将统计目标调整为10000, 采样行数会大大增加, 消耗的资源加大.
digoal=# alter table test_1 alter column appid SET STATISTICS 10000;
ALTER TABLE
digoal=# analyze verbose test_1;
INFO: analyzing "public.test_1"
INFO: "test_1": scanned 20619 of 20619 pages, containing 2000000 live rows and 0 dead rows; 2000000 rows in sample, 2000000 estimated total rows
ANALYZE
但是统计行数更加准确了, 现在为188. 只相差1行.
digoal=# explain select * from test_1 where appid=1; QUERY PLAN
------------------------------------------------------------- Seq Scan on test_1 (cost=0.00..45619.00 rows=188 width=49) Filter: (appid = 1)
(2 rows)
范围查询看看是否准确呢?
digoal=# explain select * from test_1 where appid>1000; QUERY PLAN
----------------------------------------------------------------- Seq Scan on test_1 (cost=0.00..45619.00 rows=1800419 width=49) Filter: (appid > 1000)
(2 rows)
也非常准确.
digoal=# select count(*) from test_1 where appid>1000; count
--------- 1800263
(1 row)
误差为 : 0.00008665
组合条件的输出行评估 :
digoal=# explain select * from test_1 where appid>1000 and crt_time>now(); QUERY PLAN
------------------------------------------------------------- Seq Scan on test_1 (cost=0.00..55619.00 rows=180 width=49) Filter: ((appid > 1000) AND (crt_time > now()))
(2 rows)
digoal=# select * from test_1 where appid>1000 and crt_time>now(); id | info | appid | crt_time
----+------+-------+----------
(0 rows)
2、分组排行, 例如要查询哪个appid的记录条数最多.
digoal=# alter table test_1 alter column appid SET STATISTICS 100;
ALTER TABLE
digoal=# analyze verbose test_1;
INFO: analyzing "public.test_1"
INFO: "test_1": scanned 20619 of 20619 pages, containing 2000000 live rows and 0 dead rows; 30000 rows in sample, 2000000 estimated total rows
ANALYZE
digoal=# select most_common_vals,most_common_freqs from pg_stats where tablename='test_1' and attname='appid';
-[ RECORD 1 ]-----+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
most_common_vals | {3677,6460,1210,1291,2052,3374,3633,4386,4729,4756,5602,320,383,479,906,1003,1018,1102,1243,2594,2625,2762,3092,3243,3376,3511,4842,5595,5967,6135,6412,6821,6824,6966,7828,7984,8118,8310,8378,8952,9012,9840,9922,22,68,359,632,899,933,1034,1227,1369,1554,1615,1706,1744,1824,1995,2034,2056,2215,2412,2770,2988,3488,3722,3780,3834,3937,4079,4124,4224,4424,4723,4811,4870,5287,5490,5596,5609,5665,5751,5881,6236,6562,6656,6694,6827,6865,6980,6996,7008,7021,7097,7274,7285,7289,7330,7367,7449}
most_common_freqs | {0.0004,0.000366667,0.000333333,0.000333333,0.000333333,0.000333333,0.000333333,0.000333333,0.000333333,0.000333333,0.000333333,0.0003,0.0003,0.0003,0.0003,0.0003,0.0003,0.0003,0.0003,0.0003,0.0003,0.0003,0.0003,0.0003,0.0003,0.0003,0.0003,0.0003,0.0003,0.0003,0.0003,0.0003,0.0003,0.0003,0.0003,0.0003,0.0003,0.0003,0.0003,0.0003,0.0003,0.0003,0.0003,0.000266667,0.000266667,0.000266667,0.000266667,0.000266667,0.000266667,0.000266667,0.000266667,0.000266667,0.000266667,0.000266667,0.000266667,0.000266667,0.000266667,0.000266667,0.000266667,0.000266667,0.000266667,0.000266667,0.000266667,0.000266667,0.000266667,0.000266667,0.000266667,0.000266667,0.000266667,0.000266667,0.000266667,0.000266667,0.000266667,0.000266667,0.000266667,0.000266667,0.000266667,0.000266667,0.000266667,0.000266667,0.000266667,0.000266667,0.000266667,0.000266667,0.000266667,0.000266667,0.000266667,0.000266667,0.000266667,0.000266667,0.000266667,0.000266667,0.000266667,0.000266667,0.000266667,0.000266667,0.000266667,0.000266667,0.000266667,0.000266667}
值太均匀, 所以这个不准确.
digoal=# select appid,count(*) from test_1 group by appid order by count(*) desc limit 100; appid | count
-------+------- 9853 | 253 6502 | 249 1688 | 249 464 | 249 9974 | 248 1540 | 248 6622 | 247 6669 | 247 4643 | 247 1046 | 246 3051 | 246 6359 | 246 9103 | 246 348 | 246 2213 | 244 138 | 244 8135 | 244 3980 | 244 5870 | 243 9349 | 243 6210 | 243 4575 | 243 3421 | 242 207 | 242 3224 | 242 7056 | 242 4561 | 242 8770 | 241 3011 | 241 3731 | 241 4951 | 241 1066 | 240 5501 | 240 9354 | 240 7430 | 240 7621 | 240 2058 | 240 5460 | 240 6578 | 239 7431 | 239 5473 | 239 7305 | 239 9563 | 239 3275 | 239 2968 | 239 8825 | 239 3426 | 238 3850 | 238 6835 | 238 5928 | 238 8567 | 238 4083 | 238 1137 | 238 4862 | 238 4238 | 238 1058 | 238 6745 | 237 5854 | 237 3196 | 237 3165 | 237 724 | 237 9643 | 237 7326 | 237 6661 | 237 3685 | 236 2590 | 236 9685 | 236 8366 | 236 3931 | 236 7074 | 236 6140 | 236 4402 | 236 4635 | 236 7628 | 236 5967 | 236 24 | 236 987 | 236 2472 | 236 8724 | 236 6404 | 236 9504 | 235 5816 | 235 1261 | 235 5551 | 235 874 | 235 1880 | 235 5248 | 235 404 | 235 5738 | 235 583 | 235 7799 | 235 2362 | 235 1789 | 235 7707 | 235 3091 | 234 9245 | 234 6107 | 234 8657 | 234 7460 | 234 2252 | 234
(100 rows)
当倾斜较大时(符合现实世界理论,二八原则), 就特别准.
digoal=# insert into test_1 (info,appid,crt_time) select 'test',1,now() from generate_series(1,100000);
INSERT 0 100000
digoal=# analyze verbose test_1;
INFO: analyzing "public.test_1"
INFO: "test_1": scanned 21262 of 21262 pages, containing 2101000 live rows and 0 dead rows; 30000 rows in sample, 2101000 estimated total rows
ANALYZE
digoal=# select appid,count(*) from test_1 group by appid order by count(*) desc limit 5; appid | count
-------+-------- 1 | 101189 9853 | 253 6502 | 249 464 | 249 1688 | 249
(5 rows)
3、数组的元素值排行统计
例如有一个表中记录了客户允许的应用程序信息, 应用程序存储为一个数组, 程序以id形式存到数组中.
如果要统计全国终端中最火爆的程序排行.
digoal=# create table test_2(id serial primary key, appid int[], crt_time timestamp);
CREATE TABLE
假设appid为0-10的程序比较火爆, 模拟100秒插入请求.
vi test.sql
insert into test_2(appid) select array_agg(appid) appid_agg from (select round(10*random())::int as appid from generate_series(1,20)) t;
pg93@db-172-16-3-33-> pgbench -M prepared -n -r -f ./test.sql -c 16 -j 4 -T 100 digoal
transaction type: Custom query
scaling factor: 1
query mode: prepared
number of clients: 16
number of threads: 4
duration: 100 s
number of transactions actually processed: 389701
tps = 3896.755035 (including connections establishing)
tps = 3897.686073 (excluding connections establishing)
statement latencies in milliseconds: 4.103117 insert into test_2(appid) select array_agg(appid) appid_agg from (select round(10*random())::int as appid from generate_series(1,20)) t;
假设appid为10以上的程序不火爆, 模拟10秒插入请求.
vi test.sql
insert into test_2(appid) select array_agg(appid) appid_agg from (select round(1000*random())::int as appid from generate_series(1,20)) t;
pg93@db-172-16-3-33-> pgbench -M prepared -n -r -f ./test.sql -c 16 -j 4 -T 10 digoal
transaction type: Custom query
scaling factor: 1
query mode: prepared
number of clients: 16
number of threads: 4
duration: 10 s
number of transactions actually processed: 37256
tps = 3718.362208 (including connections establishing)
tps = 3725.838361 (excluding connections establishing)
statement latencies in milliseconds: 4.290334 insert into test_2(appid) select array_agg(appid) appid_agg from (select round(1000*random())::int as appid from generate_series(1,20)) t;
总记录数如下 :
digoal=# select count(*) from test_2; count
-------- 472583
(1 row)
收集统计信息
digoal=# analyze verbose test_2;
INFO: analyzing "public.test_2"
INFO: "test_2": scanned 8184 of 8184 pages, containing 472583 live rows and 0 dead rows; 30000 rows in sample, 472583 estimated total rows
ANALYZE
通过group by和order by获取实际的排名 :
digoal=# select appid,count(*) from (select unnest(appid) as appid from test_2) t group by appid order by count(*) desc limit 20; appid | count
-------+-------- 9 | 872831 6 | 871908 3 | 871867 7 | 871551 8 | 871436 4 | 871391 1 | 871051 5 | 870770 2 | 870692 10 | 435583 0 | 435342 387 | 831 69 | 824 665 | 822 703 | 816 651 | 811 520 | 809 435 | 809 783 | 806 671 | 806
(20 rows)
接下来要用9.2新增的array 统计信息查看排名了 :
注意元素的收集个数为default_statistics_target 的10倍, 这个可以参考源码 :
+ /*
+ * Set up static pointer for use by subroutines. We wait till here in
+ * case std_compute_stats somehow recursively invokes us (probably not
+ * possible, but ...)
+ */
+ array_extra_data = extra_data;
+
+ /*
+ * We want statistics_target * 10 elements in the MCELEM array. This
+ * multiplier is pretty arbitrary, but is meant to reflect the fact that
+ * the number of individual elements tracked in pg_statistic ought to be
+ * more than the number of values for a simple scalar column.
+ */
+ num_mcelem = stats->attr->attstattarget * 10;
所以默认default_statistics_target=100的情况下, 将会收集1000个数组型元素的出现概率.
digoal=# select array_length(most_common_elems,1),array_length(most_common_elem_freqs,1) from pg_stats where tablename='test_2' and attname='appid';
-[ RECORD 1 ]+-----
array_length | 1000
array_length | 1003
most_common_elem_freqs中最后3个的值代表min, max, null_freqs. 具体见
《PostgreSQL 9.2 add array elements statistics》
如果把统计信息修改为50, 那么统计信息的元素个数将变成500.
digoal=# alter table test_2 alter column appid set statistics 50;
ALTER TABLE
digoal=# analyze verbose test_2;
INFO: analyzing "public.test_2"
INFO: "test_2": scanned 8184 of 8184 pages, containing 472583 live rows and 0 dead rows; 30000 rows in sample, 472583 estimated total rows
ANALYZE
digoal=# select array_length(most_common_elems,1),array_length(most_common_elem_freqs,1) from pg_stats where tablename='test_2' and attname='appid';
-[ RECORD 1 ]+----
array_length | 500
array_length | 503
下面是统计信息 :
digoal=# select most_common_elems,most_common_elem_freqs from pg_stats where tablename='test_2' and attname='appid';
most_common_elems | {0,1,2,3,4,5,6,7,8,9,10,12,13,15,16,17,18,22,23,24,27,30,31,32,34,35,36,37,40,41,42,44,49,50,52,55,56,60,63,67,69,73,77,78,81,85,89,90,94,95,97,100,101,103,104,108,109,114,116,117,120,121,123,125,128,132,133,134,136,139,140,143,144,146,149,150,151,152,153,154,155,158,159,162,165,167,170,172,173,174,175,180,182,183,184,186,189,192,196,197,200,202,208,209,210,211,214,217,221,222,226,227,232,233,234,239,241,242,243,245,246,249,250,253,254,259,260,263,264,265,268,269,272,273,275,276,277,281,283,287,288,289,291,294,295,296,297,300,301,304,305,307,311,315,316,317,318,322,324,325,333,334,335,338,340,342,344,346,348,351,352,353,354,355,357,359,362,363,365,366,367,368,370,374,376,377,379,382,384,387,389,391,393,397,398,400,401,402,406,407,410,415,418,419,420,422,423,424,425,428,430,431,432,433,434,436,437,438,441,442,443,445,446,448,449,450,451,455,456,457,458,460,463,464,465,467,470,471,472,474,476,478,481,483,484,486,488,490,491,492,494,495,496,498,499,503,504,505,510,512,513,514,515,518,520,522,523,529,530,532,538,539,542,543,545,553,556,559,560,562,565,567,568,569,571,572,574,575,576,579,580,582,587,588,589,593,596,597,599,601,605,606,607,608,609,610,612,616,617,619,621,622,624,626,628,631,632,637,638,642,644,646,648,655,656,657,660,661,663,664,665,666,667,669,670,672,675,678,679,681,682,683,685,687,689,691,692,693,696,697,698,699,701,702,703,707,708,709,712,714,716,718,719,720,722,723,724,725,727,728,729,734,735,736,738,741,742,743,744,746,747,752,753,759,760,761,762,763,767,769,770,772,773,775,776,777,778,779,782,784,785,787,788,789,790,791,792,795,797,799,800,801,804,805,807,809,811,815,816,818,819,824,825,827,828,829,830,831,832,833,836,838,840,842,843,844,847,848,850,851,854,858,860,861,862,864,865,868,869,870,873,874,875,880,881,884,886,893,894,895,898,900,904,906,907,912,913,914,922,924,928,931,933,939,941,942,944,946,947,954,955,956,957,960,961,962,963,965,967,968,970,972,975,978,987,988,991,994,996,997}
most_common_elem_freqs | {0.590833,0.810967,0.809233,0.808433,0.807467,0.806667,0.808067,0.8102,0.806233,0.808367,0.588033,0.00186667,0.0019,0.0018,0.0018,0.00183333,0.00193333,0.00176667,0.002,0.00176667,0.00223333,0.00163333,0.00213333,0.0019,0.00193333,0.0017,0.00166667,0.00166667,0.00163333,0.00186667,0.00176667,0.0022,0.00206667,0.00196667,0.00183333,0.00163333,0.0018,0.002,0.0017,0.0017,0.00166667,0.00183333,0.00173333,0.0017,0.0017,0.00163333,0.00183333,0.00173333,0.00193333,0.0017,0.0017,0.0018,0.00173333,0.00176667,0.00166667,0.00213333,0.0021,0.00173333,0.00193333,0.00186667,0.00166667,0.0019,0.00166667,0.00213333,0.0021,0.00176667,0.00163333,0.00173333,0.00176667,0.00176667,0.00186667,0.00193333,0.00176667,0.00173333,0.0019,0.00193333,0.00176667,0.002,0.00176667,0.00173333,0.00173333,0.00186667,0.00173333,0.00166667,0.0021,0.00163333,0.00193333,0.00176667,0.00166667,0.00176667,0.00163333,0.00193333,0.00166667,0.00166667,0.00193333,0.00176667,0.0017,0.00163333,0.00176667,0.00196667,0.002,0.00196667,0.00213333,0.0017,0.00193333,0.00173333,0.00176667,0.00166667,0.0018,0.00163333,0.0021,0.00216667,0.00203333,0.00193333,0.0019,0.00176667,0.00193333,0.00176667,0.00193333,0.00176667,0.00166667,0.00186667,0.00196667,0.00173333,0.0019,0.002,0.00193333,0.0021,0.00166667,0.0017,0.00166667,0.00196667,0.0019,0.0019,0.00163333,0.0017,0.00163333,0.0023,0.00173333,0.00186667,0.00193333,0.00173333,0.0017,0.00166667,0.00166667,0.00173333,0.0018,0.00166667,0.00183333,0.0018,0.00173333,0.00163333,0.0017,0.00176667,0.00176667,0.00166667,0.0019,0.00163333,0.00166667,0.00163333,0.0017,0.00196667,0.0021,0.00166667,0.0019,0.00173333,0.00166667,0.002,0.00233333,0.0017,0.00163333,0.00183333,0.0018,0.00186667,0.00183333,0.00186667,0.0019,0.0018,0.00166667,0.00196667,0.0018,0.00176667,0.0017,0.00166667,0.00163333,0.00166667,0.00163333,0.0017,0.00166667,0.0018,0.00166667,0.00173333,0.0017,0.0017,0.002,0.00176667,0.00166667,0.00186667,0.00166667,0.00183333,0.00163333,0.00176667,0.00183333,0.00183333,0.00173333,0.0018,0.0018,0.0017,0.00166667,0.0018,0.0018,0.0017,0.00186667,0.00166667,0.00193333,0.0018,0.0017,0.0019,0.00163333,0.00173333,0.00203333,0.0017,0.00166667,0.002,0.00163333,0.00186667,0.002,0.0019,0.00163333,0.0017,0.00186667,0.00163333,0.00173333,0.00166667,0.0018,0.00203333,0.00166667,0.00166667,0.00186667,0.0018,0.0018,0.00163333,0.0019,0.00186667,0.00166667,0.00203333,0.00176667,0.00166667,0.002,0.00163333,0.00186667,0.00183333,0.00186667,0.00176667,0.00166667,0.00196667,0.0021,0.0018,0.00186667,0.00193333,0.00196667,0.0017,0.00176667,0.0018,0.0017,0.0017,0.00183333,0.00173333,0.0021,0.002,0.0019,0.00193333,0.0018,0.00166667,0.00173333,0.00183333,0.0017,0.00176667,0.0018,0.00196667,0.00206667,0.0018,0.00173333,0.00186667,0.00173333,0.00186667,0.00166667,0.0017,0.00183333,0.00173333,0.0018,0.00163333,0.00203333,0.0018,0.00166667,0.00216667,0.00183333,0.0017,0.00166667,0.0018,0.00163333,0.00183333,0.0019,0.00163333,0.00183333,0.0018,0.00206667,0.0017,0.00176667,0.00196667,0.00163333,0.00166667,0.00206667,0.0017,0.0018,0.00196667,0.00166667,0.00193333,0.00173333,0.00166667,0.0017,0.00173333,0.0018,0.00206667,0.00166667,0.00166667,0.00186667,0.00163333,0.0017,0.00163333,0.00186667,0.00173333,0.00166667,0.00183333,0.0018,0.002,0.00163333,0.00176667,0.00206667,0.00173333,0.00183333,0.00173333,0.00163333,0.00173333,0.00183333,0.00163333,0.00163333,0.00203333,0.00173333,0.0017,0.00196667,0.00176667,0.00166667,0.00173333,0.0018,0.0018,0.00173333,0.00183333,0.0019,0.002,0.00176667,0.002,0.00173333,0.00203333,0.0018,0.00163333,0.00173333,0.00166667,0.00166667,0.00213333,0.00173333,0.00163333,0.00183333,0.002,0.00203333,0.00166667,0.0019,0.00173333,0.00166667,0.00193333,0.0017,0.00166667,0.00176667,0.0017,0.0017,0.00166667,0.00166667,0.0018,0.00163333,0.0017,0.00203333,0.00193333,0.00176667,0.00176667,0.0019,0.0018,0.00176667,0.00166667,0.0018,0.0019,0.00196667,0.00173333,0.00173333,0.0018,0.0018,0.00166667,0.00166667,0.00213333,0.0018,0.00206667,0.00173333,0.00163333,0.00166667,0.00166667,0.00176667,0.00166667,0.0017,0.00186667,0.00176667,0.00183333,0.00176667,0.002,0.00163333,0.0017,0.0018,0.00206667,0.00186667,0.0018,0.00173333,0.00196667,0.0017,0.0017,0.0018,0.00173333,0.0017,0.0017,0.00203333,0.0018,0.0018,0.00163333,0.00173333,0.00176667,0.00163333,0.0017,0.00213333,0.0017,0.00186667,0.00206667,0.00163333,0.0017,0.00176667,0.00163333,0.0021,0.0018,0.0021,0.00173333,0.00196667,0.00166667,0.00183333,0.0017,0.0018,0.00196667,0.00166667,0.00166667,0.00206667,0.00166667,0.00163333,0.00186667,0.0018,0.00186667,0.0017,0.00203333,0.00206667,0.00233333,0.0018,0.00216667,0.0017,0.00163333,0.0019,0.00173333,0.0018,0.00196667,0.00186667,0.0018,0.00176667,0.0017,0.0017,0.0018,0.0017,0.00176667,0.00166667,0.00166667,0.00196667,0.00186667,0.0021,0.00163333,0.0019,0.00176667,0.0018,0.00163333,0.00163333,0.810967,0}
注意这里不是按照appid频率排序的, 所以还需要处理一下.
digoal=# select * from
(select row_number() over(partition by r) as rn,ele from (select unnest(most_common_elems::text::int[]) ele,2 as r from pg_stats where tablename='test_2' and attname='appid') t) t1
join
(select row_number() over(partition by r) as rn,freq from (select unnest(most_common_elem_freqs) freq,2 as r from pg_stats where tablename='test_2' and attname='appid') t) t2
on (t1.rn=t2.rn) order by t2.freq desc limit 20; rn | ele | rn | freq
-----+-----+-----+------------ 2 | 1 | 2 | 0.810967 8 | 7 | 8 | 0.8102 3 | 2 | 3 | 0.809233 4 | 3 | 4 | 0.808433 10 | 9 | 10 | 0.808367 7 | 6 | 7 | 0.808067 5 | 4 | 5 | 0.807467 6 | 5 | 6 | 0.806667 9 | 8 | 9 | 0.806233 1 | 0 | 1 | 0.590833 11 | 10 | 11 | 0.588033 474 | 939 | 474 | 0.00233333 169 | 348 | 169 | 0.00233333 138 | 281 | 138 | 0.0023 21 | 27 | 21 | 0.00223333 32 | 44 | 32 | 0.0022 476 | 942 | 476 | 0.00216667 296 | 593 | 296 | 0.00216667 112 | 227 | 112 | 0.00216667 56 | 108 | 56 | 0.00213333
(20 rows)
前10完全准确, 但是由于前8的记录数偏差太小, 所以前8的排名顺序可能不准确.
例如从统计信息中取出的排名 :
rn | ele | rn | freq
-----+-----+-----+------------ 2 | 1 | 2 | 0.810967 8 | 7 | 8 | 0.8102 3 | 2 | 3 | 0.809233 4 | 3 | 4 | 0.808433 10 | 9 | 10 | 0.808367 7 | 6 | 7 | 0.808067 5 | 4 | 5 | 0.807467 6 | 5 | 6 | 0.806667 9 | 8 | 9 | 0.806233 1 | 0 | 1 | 0.590833 11 | 10 | 11 | 0.588033
实际排名 :
appid | count
-------+-------- 9 | 872831 6 | 871908 3 | 871867 7 | 871551 8 | 871436 4 | 871391 1 | 871051 5 | 870770 2 | 870692 10 | 435583 0 | 435342
非常OK , 这对于大数据的统计来说, 无疑是非常重要的参考.
注意
- 数据统计信息, 占用空间超过ARRAY_WIDTH_THRESHOLD的数组不会进入统计范畴.
src/backend/utils/adt/array_typanalyze.c
+/*
+ * To avoid consuming too much memory, IO and CPU load during analysis, and/or
+ * too much space in the resulting pg_statistic rows, we ignore arrays that
+ * are wider than ARRAY_WIDTH_THRESHOLD (after detoasting!). Note that this
+ * number is considerably more than the similar WIDTH_THRESHOLD limit used
+ * in analyze.c's standard typanalyze code.
+ */
+#define ARRAY_WIDTH_THRESHOLD 0x10000
....
+ /* Skip too-large values. */
+ if (toast_raw_datum_size(value) > ARRAY_WIDTH_THRESHOLD)
+ continue;
+ else
+ analyzed_rows++;
PostgreSQL pg_stats used to estimate top N freps values and explain rows相关推荐
- PostgreSQL技术周刊第8期:用PostgreSQL 做实时高效搜索引擎
PostgreSQL(简称PG)的开发者们: 云栖社区已有5000位PG开发者,发布了3000+PG文章(文章列表),沉淀了700+的PG精品问答(问答列表). PostgreSQL技术周刊将会为大家 ...
- 数据库案例集锦 - 开发者的《如来神掌》
标签 PostgreSQL , PG DBA cookbook , PG Oracle兼容性 , PG 架构师 cookbook , PG 开发者 cookbook , PG 应用案例 背景 「剑魔独 ...
- PostgreSQL 统计信息pg_statistic格式及导入导出dump_stat - 兼容Oracle
标签 PostgreSQL , dump_stat , 统计信息 , 导出导入 背景 <PostgreSQL 规格评估 - 微观.宏观.精准 多视角估算数据库性能(选型.做预算不求人)> ...
- 将您的SQL Server工作负载迁移到PostgreSQL –第2部分
目录 SQL Server vs PostgreSQL 并发控制 SQL Server并发控制 PostgreSQL并发控制 数据库结构 T-SQL与PL/pgSQL PostgreSQL具有更完善的 ...
- Indexes in PostgreSQL — 7 (GIN)
We have already got acquainted with PostgreSQL indexing engine and the interface of access methods a ...
- Postgresql GIN索引
GIN概念介绍: GIN是Generalized Inverted Index的缩写.就是所谓的倒排索引.它处理的数据类型的值不是原子的,而是由元素构成.我们称之为复合类型.如('hank', '15 ...
- Postgresql数据库介绍8——使用
函数和操作(Functions and Operators) 逻辑操作符(Logical Operators) The usual logical operators are available: A ...
- PostgreSQL pgmetrics - 多版本、健康监控指标采集、报告
背景 pgmetrics,GO写的一款PostgreSQL 多版本.健康监控指标采集.报告开源软件. https://github.com/rapidloop/pgmetrics 结合pgdash,可 ...
- postgresql存图片字段类型_PostgreSQL 入门 | Linux 中国
安装.设置.创建和开始使用 PostgreSQL 数据库.-- Greg Pittman 每个人或许都有需要在数据库中保存的东西.即使你执着于使用纸质文件或电子文件,它们也会变得很麻烦.纸质文档可能会 ...
- PostgreSQL日期函数备忘
PostgreSQL日期函数备忘 http://www.postgresql.org/docs/current/static/functions-formatting.html Date/Time F ...
最新文章
- 放眼2018年,我们对虚拟现实产品有五大疑问
- 【Maven学习】Nexus OSS私服仓库的备份与迁移
- [ BZOJ 4668 ] 冷战
- python定义复数矩阵_python矩阵运算,转置,逆运算,共轭矩阵实例
- 同花顺如何切换k线_同花顺k线图怎么切换(aud和人民币的汇率)
- 51单片机开发板介绍
- Python selenium爬虫抓取船舶网站数据(动态页面)
- python数组增加维度_python – 为xarray DataArray添加维度
- XDMCP远程连接Linux桌面(lightdm) ps -ef |grep lightdm
- 普普通通的算法期末考试试卷
- STM32+PN532写UltraLight标签
- 从零开始系统化学Android,手慢无
- Deci and Centi Seconds parsing in java
- 棋圣高调搬弄名人日本棋圣挟五冠搬弄对手
- 无法同时打开多个PPT,并排显示
- PWorld2016大会演讲PPT+访谈视频大合集,总有一款是你在找的!
- (转载)成都移动IPV6光猫设置教程
- 必应输入法VS搜狗拼音
- 淘宝短视频直播怎么去运营丨国仁网络资讯
- 【DOE】--方差、自由度、回归分析