多维数据查询效率分析(1)
有时需求需要我们把系统做成灵活的。最常见的形式是,属性不能是固定的,要用户可以自定义。这样的需求往往会在数据库中建模成一个一对多的关系。
create table person { ... }
create table person_attribute { person_id ... attribute_name ... attribute_value ... ... }
这样的建模在没有查询需要的时候,还是蛮不错的。但是一旦需要对扩充的属性值进行查询,速度往往惨不忍睹。曾经在新加坡做过一个电信的遗留系统的前端,其数据库的建模就是这样的。对于中间的属性表,一个简单的查询都需要join好几次,速度非常慢。好在那次只是做ETL,并不是直接把这样的数据库做后端,要不然肯定死的很惨。当时的做法是把所有的数据读入到内存中,针对属性的查询用内存集合遍历来实现。这样做的前提是集合的元素数量非常少(几百而已),总数据量也非常少。但是如果我们需要处理的数据量非常大,那么我们就必须在数据库中能够对多维数据进行高效查询。
为了搞清楚这个问题,我们需要做一系列实验。先来介绍一下我们实验的对象。
假定我们有一张contacts表,然后对每个联系人有一个contact_categories的表,简称cc
create table cc( contact_id integer, cad_id integer, value integer);
cad_id代表字段的id,value是cateogry的值。数据量是500万。实验用的数据库分别是PostgreSQL(原始数据大小250M),和使用MYISAM引擎的MySQL(原始数据大小73M)。使用的磁盘是普通的笔记本硬盘,没有raid,普通的ext4分区,峰值传输率大概是70M/s。假设没有资源的争抢,而且数据库总是以最快的顺序读的方式从磁盘中加载数据,那么PostgreSQL得用4s,而MySQL也需要1s才能把所有的磁盘内容读到内存中。
所以如果我们使用基于磁盘的解决方案的话,无论如何也无法把查询压缩到1s以内。因为我们需要提供一个Reponsive的界面前端,所以数据必须能够在内存中被查询,可能不是所有的数据都能放入内存,但是最起码被查询到的数据得一直在内存中。最简单的办法把数据库移到内存中的方式不是改数据库的设置,而是直接把内存映射成文件夹:
sudo mount -t ramfs -o size=200000m ramfs /mnt/memory
然后把数据库的数据目录移动到/mnt/memory之中。但是数据库的缓存设置也是必须修改的,比如PostgreSQL的work_mem如果设置过小的话,在做对一个很大的表做count(distinct xxx)时就会导致中间结果被写入到临时表之中。所以我们还是把所有的缓存搞大一些吧。
work_mem = 1000MBshared_buffer = 1000MBtemp_buffer = 1000MBeffective_cache_size = 1000MBwal_buffers = 1000MBauto_vacuum = off
让我们来看看最基本的一个查询能有多快吧
taowen@dmright-perf:~$ time psql postgres -c 'select count(*) from cc;' count --------- 5904385(1 row) real 0m0.448suser 0m0.024ssys 0m0.000s
count(*)和count(contact_id)是一样的么,让我们来试验一下
taowen@dmright-perf:~$ time psql postgres -c 'select count(contact_id) from cc;' count --------- 5904385(1 row) real 0m0.537suser 0m0.016ssys 0m0.008s
有意思!居然比count(*)还要慢。可能是因为我们没有给contact_id字段加索引的原因。好吧,加上索引。
taowen@dmright-perf:~$ time psql postgres -c 'create index contact_id_idx on cc(contact_id);'CREATE INDEX real 0m4.848suser 0m0.016ssys 0m0.008staowen@dmright-perf:~$ time psql postgres -c 'analyze cc;'ANALYZE real 0m0.197suser 0m0.024ssys 0m0.004s
再试试看
taowen@dmright-perf:~$ time psql postgres -c 'select count(contact_id) from cc;' count --------- 5904385(1 row) real 0m0.534suser 0m0.020ssys 0m0.008s taowen@dmright-perf:~$ time psql postgres -c 'select count(*) from cc;' count --------- 5904385(1 row) real 0m0.447suser 0m0.028ssys 0m0.008s
基本上没有变化……count(*)貌似就是比count(contact_id)要快。不管啦,既然小于500ms,也算是够快了。让我们给查询加上个条件吧。
taowen@dmright-perf:~$ time psql postgres -c 'select count(*) from cc where cad_id = 101 and value = 5;' count -------- 998839(1 row) real 0m0.686suser 0m0.024ssys 0m0.000s taowen@dmright-perf:~$ time psql postgres -c 'select count(contact_id) from cc where cad_id = 101 and value = 5;' count -------- 998839(1 row) real 0m0.660suser 0m0.024ssys 0m0.000s
我们可以看到,在count(*)和count(contact_id)之间没有特别大的区别。但是我们还没有给cad_id和value加索引,所以让我们加上看看如何
taowen@dmright-perf:~$ time psql postgres -c 'create index cad_id_value_idx on cc(cad_id, value);'CREATE INDEX real 0m10.069suser 0m0.020ssys 0m0.008s taowen@dmright-perf:~$ time psql postgres -c 'analyze cc;'ANALYZE real 0m0.199suser 0m0.016ssys 0m0.012s and try again. taowen@dmright-perf:~$ time psql postgres -c 'select count(contact_id) from cc where cad_id = 101 and value = 5;' count -------- 998839(1 row) real 0m0.283suser 0m0.020ssys 0m0.012s
真是很快很快耶!让我们来回顾一下,我们现在对于contact_id和(cad_id, value)都建立了索引。
接下来我们把查询弄得更复杂一些,加上AND条件。有三种可能的方式:
1、INTERSECT
2、INNER JOIN
3、IN + SUB QUERY
我们每种做法都试试
taowen@dmright-perf:~$ time psql postgres -c 'select count(contact_id) from (select contact_id from cc where cad_id = 101 and value = 5 intersect select contact_id from cc where cad_id = 102 and value = 7) as temp' count -------- 164788(1 row) real 0m1.159suser 0m0.028ssys 0m0.008s taowen@dmright-perf:~$ time psql postgres -c 'select count(*) from (select contact_id from cc where cad_id = 101 and value = 5 intersect select contact_id from cc where cad_id = 102 and value = 7) as temp' count -------- 164788(1 row) real 0m1.148suser 0m0.032ssys 0m0.000s
显而易见,这么做很慢。那么INNER JOIN是不是更快一些呢?
taowen@dmright-perf:~$ time psql postgres -c 'select count(a1.contact_id) from cc as a1 inner join cc as a2 on a1.contact_id = a2.contact_id and a2.cad_id = 102 and a2.value = 7 where a1.cad_id = 101 and a1.value = 5;' count -------- 164788(1 row) real 0m1.162suser 0m0.036ssys 0m0.000s
这大概要慢上个200ms了。那么IN + SUB QUERY呢?
taowen@dmright-perf:~$ time psql postgres -c 'select count(a1.contact_id) from cc as a1 where a1.cad_id = 101 and a1.value = 5 and a1.contact_id in (select contact_id from cc as a2 where a2.cad_id = 102 and a2.value = 7)' count -------- 164788(1 row) real 0m2.645suser 0m0.024ssys 0m0.004s
靠,居然更慢了。总结就是,在有两个条件的情况下,INTERSECT似乎是最快的。但是即便如此,它也超过了一秒钟了。为什么会这样呢?时间都花哪里去了?
taowen@dmright-perf:~$ time psql postgres -c 'explain analyze select count(contact_id) from (select contact_id from cc where cad_id = 101 and value = 5 intersect select contact_id from cc where cad_id = 102 and value = 7) as temp' QUERY PLAN ----------------------------------------------------------------------------------------------------------------------------------------------------------------------- Aggregate (cost=107853.91..107853.92 rows=1 width=4) (actual time=1471.907..1471.907 rows=1 loops=1) -> Subquery Scan on temp (cost=7467.63..107375.65 rows=191301 width=4) (actual time=1366.131..1459.419 rows=164788 loops=1) -> HashSetOp Intersect (cost=7467.63..105462.64 rows=191301 width=4) (actual time=1366.129..1439.781 rows=164788 loops=1) -> Append (cost=7467.63..103326.69 rows=854380 width=4) (actual time=69.436..797.478 rows=1829408 loops=1) -> Subquery Scan on "*SELECT* 2" (cost=7467.63..48180.53 rows=351876 width=4) (actual time=69.435..350.031 rows=830569 loops=1) -> Bitmap Heap Scan on cc (cost=7467.63..44661.77 rows=351876 width=4) (actual time=69.434..264.538 rows=830569 loops=1) Recheck Cond: ((cad_id = 102) AND (value = 7)) -> Bitmap Index Scan on cad_id_value_idx (cost=0.00..7379.66 rows=351876 width=0) (actual time=64.162..64.162 rows=830569 loops=1) Index Cond: ((cad_id = 102) AND (value = 7)) -> Subquery Scan on "*SELECT* 1" (cost=10667.56..55146.16 rows=502504 width=4) (actual time=69.846..331.544 rows=998839 loops=1) -> Bitmap Heap Scan on cc (cost=10667.56..50121.12 rows=502504 width=4) (actual time=69.845..233.794 rows=998839 loops=1) Recheck Cond: ((cad_id = 101) AND (value = 5)) -> Bitmap Index Scan on cad_id_value_idx (cost=0.00..10541.94 rows=502504 width=0) (actual time=64.501..64.501 rows=998839 loops=1) Index Cond: ((cad_id = 101) AND (value = 5)) Total runtime: 1477.484 ms(15 rows)
首先,INTERSECT不是并行执行的。两个子查询分别花费了300ms以上的时间,加起来有800ms是用在搜集contact_id上了。其次,集合之间的并集操作花费了600多ms。其余的时间都花在了数集合的成员个数上了。有一点值得注意的是,根据 http://postgresql.1045698.n5.nabble.com/ANTI-JOIN-needs-table-index-scan-not-possible-td3425340.html index scan和heap scan实际上都用上了索引。然而,SELECT还是在其之上额外耗费了100ms,我猜测它可能是回到原始的表结构中把行取出来,以获得contact_id的值。这在磁盘上的话速度会更慢,因为会是random seek操作。
看完了INTERSECT,让我们再来分析分析Join:
taowen@dmright-perf:~$ time psql postgres -c 'explain analyze select count(a1.contact_id) from cc as a1 inner join cc as a2 on a1.contact_id = a2.contact_id and a2.cad_id = 102 and a2.value = 7 where a1.cad_id = 101 and a1.value = 5;' QUERY PLAN ----------------------------------------------------------------------------------------------------------------------------------------------------------- Aggregate (cost=180366.31..180366.32 rows=1 width=4) (actual time=1280.964..1280.964 rows=1 loops=1) -> Merge Join (cost=174783.00..180104.02 rows=104912 width=4) (actual time=1043.879..1270.197 rows=164788 loops=1) Merge Cond: (a1.contact_id = a2.contact_id) -> Sort (cost=97705.18..98961.44 rows=502504 width=4) (actual time=586.735..626.292 rows=998839 loops=1)Sort Key: a1.contact_idSort Method: quicksort Memory: 71397kB -> Bitmap Heap Scan on cc a1 (cost=10667.56..50121.12 rows=502504 width=4) (actual time=75.377..257.403 rows=998839 loops=1) Recheck Cond: ((cad_id = 101) AND (value = 5)) -> Bitmap Index Scan on cad_id_value_idx (cost=0.00..10541.94 rows=502504 width=0) (actual time=69.565..69.565 rows=998839 loops=1) Index Cond: ((cad_id = 101) AND (value = 5)) -> Sort (cost=77077.83..77957.52 rows=351876 width=4) (actual time=457.131..492.428 rows=830569 loops=1)Sort Key: a2.contact_idSort Method: quicksort Memory: 63509kB -> Bitmap Heap Scan on cc a2 (cost=7467.63..44661.77 rows=351876 width=4) (actual time=60.039..186.422 rows=830569 loops=1) Recheck Cond: ((cad_id = 102) AND (value = 7)) -> Bitmap Index Scan on cad_id_value_idx (cost=0.00..7379.66 rows=351876 width=0) (actual time=54.929..54.929 rows=830569 loops=1) Index Cond: ((cad_id = 102) AND (value = 7)) Total runtime: 1286.846 ms(18 rows)
貌似大部分时间都花在了排序上。而且它也不是并行执行的。如果后台同时执行top命令的话,就会发现只有一个核是被实际占用着的。
要是再创建更多的索引呢?会不会有帮助?
taowen@dmright-perf:~$ time psql postgres -c 'create index cad_id_value_contact_id on cc(cad_id, value, contact_id);'CREATE INDEX real 0m10.683suser 0m0.020ssys 0m0.004staowen@dmright-perf:~$ time psql postgres -c 'explain analyze select count(a1.contact_id) from cc as a1 inner join cc as a2 on a1.contact_id = a2.contact_id and a2.cad_id = 102 and a2.value = 7 where a1.cad_id = 101 and a1.value = 5;' QUERY PLAN ----------------------------------------------------------------------------------------------------------------------------------------------------------- Aggregate (cost=180366.31..180366.32 rows=1 width=4) (actual time=1338.478..1338.478 rows=1 loops=1) -> Merge Join (cost=174783.00..180104.02 rows=104912 width=4) (actual time=1090.136..1327.312 rows=164788 loops=1) Merge Cond: (a1.contact_id = a2.contact_id) -> Sort (cost=97705.18..98961.44 rows=502504 width=4) (actual time=615.843..657.438 rows=998839 loops=1)Sort Key: a1.contact_idSort Method: quicksort Memory: 71397kB -> Bitmap Heap Scan on cc a1 (cost=10667.56..50121.12 rows=502504 width=4) (actual time=80.926..275.431 rows=998839 loops=1) Recheck Cond: ((cad_id = 101) AND (value = 5)) -> Bitmap Index Scan on cad_id_value_idx (cost=0.00..10541.94 rows=502504 width=0) (actual time=75.816..75.816 rows=998839 loops=1) Index Cond: ((cad_id = 101) AND (value = 5)) -> Sort (cost=77077.83..77957.52 rows=351876 width=4) (actual time=474.279..510.866 rows=830569 loops=1)Sort Key: a2.contact_idSort Method: quicksort Memory: 63509kB -> Bitmap Heap Scan on cc a2 (cost=7467.63..44661.77 rows=351876 width=4) (actual time=65.335..198.655 rows=830569 loops=1) Recheck Cond: ((cad_id = 102) AND (value = 7)) -> Bitmap Index Scan on cad_id_value_idx (cost=0.00..7379.66 rows=351876 width=0) (actual time=60.314..60.314 rows=830569 loops=1) Index Cond: ((cad_id = 102) AND (value = 7)) Total runtime: 1346.587 ms(18 rows)
还是不行!Merge Join慢的话,升级到9.1然后强制使用hash join会不会好一点?
postgres@dmright-perf:/mnt/memory/pg$ time psql -c 'explain analyze select count(a1.contact_id) from cc as a1 inner join cc as a2 on a1.contact_id = a2.contact_id where a2.cad_id = 102 and a2.value = 7 and a1.cad_id = 101 and a1.value = 5'; QUERY PLAN ---------------------------------------------------------------------------------------------------------------------------------------------------------- Aggregate (cost=902356.85..902356.86 rows=1 width=4) (actual time=1693.137..1693.137 rows=1 loops=1) -> Hash Join (cost=59599.78..902135.88 rows=88389 width=4) (actual time=461.788..1682.718 rows=164788 loops=1) Hash Cond: (a1.contact_id = a2.contact_id) -> Bitmap Heap Scan on cc a1 (cost=10833.24..50406.32 rows=510472 width=4) (actual time=76.337..224.571 rows=998839 loops=1) Recheck Cond: ((cad_id = 101) AND (value = 5)) -> Bitmap Index Scan on cad_id_value_idx (cost=0.00..10705.62 rows=510472 width=0) (actual time=71.028..71.028 rows=998839 loops=1) Index Cond: ((cad_id = 101) AND (value = 5)) -> Hash (cost=44444.15..44444.15 rows=345792 width=4) (actual time=385.256..385.256 rows=830569 loops=1) Buckets: 65536 Batches: 1 Memory Usage: 29200kB -> Bitmap Heap Scan on cc a2 (cost=7341.27..44444.15 rows=345792 width=4) (actual time=64.778..258.059 rows=830569 loops=1) Recheck Cond: ((cad_id = 102) AND (value = 7)) -> Bitmap Index Scan on cad_id_value_idx (cost=0.00..7254.82 rows=345792 width=0) (actual time=59.675..59.675 rows=830569 loops=1) Index Cond: ((cad_id = 102) AND (value = 7)) Total runtime: 1698.207 ms(14 rows)
有两个原因造成这样的情况:
1、两个条件意味着扫描两次,而且不是并行扫描
2、join自身很耗费时间,无论是sort merge join还是hash join
两个条件都这熊样了,三个条件呢?看看吧,先上INTERSECT:
postgres@dmright-perf:/mnt/memory/pg$ time psql postgres -c 'select count(contact_id) from (select contact_id from cc where cad_id = 101 and value = 5 intersect select contact_id from cc where cad_id = 102 and value = 7 intersect select contact_id from cc where cad_id = 6 and value = 1) as temp' count ------- 6748(1 row) real 0m1.350suser 0m0.020ssys 0m0.008s
然后是INNER JOIN
postgres@dmright-perf:/mnt/memory/pg$ time psql -c 'select count(a1.contact_id) from cc as a1 inner join cc as a2 on a1.contact_id = a2.contact_id inner join cc as a3 on a1.contact_id = a3.contact_id where a2.cad_id = 102 and a2.value = 7 and a1.cad_id = 101 and a1.value = 5 and a3.cad_id = 6 and a3.value = 1'; count ------- 6748(1 row) real 0m0.756suser 0m0.028ssys 0m0.000s
然后是IN + SUBQUERY:
postgres@dmright-perf:/mnt/memory/pg$ time psql -c 'select count(a1.contact_id) from cc as a1 where a1.cad_id = 101 and a1.value = 5 and a1.contact_id in (select contact_id from cc as a2 where a2.cad_id = 102 and a2.value = 7 and a2.contact_id in (select contact_id from cc as a3 where a3.cad_id = 6 and a3.value = 1))'; count ------- 6748(1 row) real 0m7.320suser 0m0.024ssys 0m0.004s
我们可以看到这回INNER JOIN是最快的了。为什么?
postgres@dmright-perf:/mnt/memory/pg$ time psql postgres -c 'explain analyze select count(contact_id) from (select contact_id from cc where cad_id = 101 and value = 5 intersect select contact_id from cc where cad_id = 102 and value = 7 intersect select contact_id from cc where cad_id = 6 and value = 1) as temp' QUERY PLAN ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- Aggregate (cost=132357.21..132357.22 rows=1 width=4) (actual time=1659.666..1659.666 rows=1 loops=1) -> Subquery Scan on temp (cost=306.85..132335.71 rows=8602 width=4) (actual time=1648.818..1659.243 rows=6748 loops=1) -> HashSetOp Intersect (cost=306.85..132249.69 rows=8602 width=4) (actual time=1648.817..1658.543 rows=6748 loops=1) -> Append (cost=306.85..131691.89 rows=223118 width=4) (actual time=33.797..1547.811 rows=372016 loops=1) -> Subquery Scan on "*SELECT* 3" (cost=306.85..26138.13 rows=14239 width=4) (actual time=33.797..153.230 rows=207228 loops=1) -> Bitmap Heap Scan on cc (cost=306.85..25995.74 rows=14239 width=4) (actual time=33.795..134.292 rows=207228 loops=1) Recheck Cond: ((cad_id = 6) AND (value = 1)) -> Bitmap Index Scan on cad_id_value_idx (cost=0.00..303.29 rows=14239 width=0) (actual time=28.558..28.558 rows=207228 loops=1) Index Cond: ((cad_id = 6) AND (value = 1)) -> Result (cost=7341.27..105553.76 rows=208879 width=4) (actual time=1282.426..1371.255 rows=164788 loops=1) -> HashSetOp Intersect (cost=7341.27..105553.76 rows=208879 width=4) (actual time=1282.418..1353.422 rows=164788 loops=1) -> Append (cost=7341.27..103413.10 rows=856264 width=4) (actual time=57.550..734.282 rows=1829408 loops=1) -> Subquery Scan on "*SELECT* 2" (cost=7341.27..47902.07 rows=345792 width=4) (actual time=57.550..292.595 rows=830569 loops=1) -> Bitmap Heap Scan on cc (cost=7341.27..44444.15 rows=345792 width=4) (actual time=57.549..218.162 rows=830569 loops=1) Recheck Cond: ((cad_id = 102) AND (value = 7)) -> Bitmap Index Scan on cad_id_value_idx (cost=0.00..7254.82 rows=345792 width=0) (actual time=52.606..52.606 rows=830569 loops=1) Index Cond: ((cad_id = 102) AND (value = 7)) -> Subquery Scan on "*SELECT* 1" (cost=10833.24..55511.04 rows=510472 width=4) (actual time=69.129..330.737 rows=998839 loops=1) -> Bitmap Heap Scan on cc (cost=10833.24..50406.32 rows=510472 width=4) (actual time=69.128..242.416 rows=998839 loops=1) Recheck Cond: ((cad_id = 101) AND (value = 5)) -> Bitmap Index Scan on cad_id_value_idx (cost=0.00..10705.62 rows=510472 width=0) (actual time=64.161..64.161 rows=998839 loops=1) Index Cond: ((cad_id = 101) AND (value = 5)) Total runtime: 1665.691 ms(23 rows)
对于INTERSECT来说,过程和两个条件是差不多的,只是集合更大一些罢了。
postgres@dmright-perf:/mnt/memory/pg$ time psql -c 'explain analyze select count(a1.contact_id) from cc as a1 inner join cc as a2 on a1.contact_id = a2.contact_id inner join cc as a3 on a1.contact_id = a3.contact_id where a2.cad_id = 102 and a2.value = 7 and a1.cad_id = 101 and a1.value = 5 and a3.cad_id = 6 and a3.value = 1'; QUERY PLAN ---------------------------------------------------------------------------------------------------------------------------------------------------------------- Aggregate (cost=118979.47..118979.48 rows=1 width=4) (actual time=771.393..771.393 rows=1 loops=1) -> Nested Loop (cost=49073.39..118977.89 rows=630 width=4) (actual time=399.835..770.587 rows=6748 loops=1) -> Hash Join (cost=49073.39..97159.97 rows=2466 width=8) (actual time=399.814..691.519 rows=34219 loops=1) Hash Cond: (a3.contact_id = a2.contact_id) -> Bitmap Heap Scan on cc a3 (cost=306.85..25995.74 rows=14239 width=4) (actual time=25.655..83.526 rows=207228 loops=1) Recheck Cond: ((cad_id = 6) AND (value = 1)) -> Bitmap Index Scan on cad_id_value_idx (cost=0.00..303.29 rows=14239 width=0) (actual time=20.570..20.570 rows=207228 loops=1) Index Cond: ((cad_id = 6) AND (value = 1)) -> Hash (cost=44444.15..44444.15 rows=345792 width=4) (actual time=373.969..373.969 rows=830569 loops=1) Buckets: 65536 Batches: 1 Memory Usage: 29200kB -> Bitmap Heap Scan on cc a2 (cost=7341.27..44444.15 rows=345792 width=4) (actual time=59.271..250.932 rows=830569 loops=1) Recheck Cond: ((cad_id = 102) AND (value = 7)) -> Bitmap Index Scan on cad_id_value_idx (cost=0.00..7254.82 rows=345792 width=0) (actual time=54.030..54.030 rows=830569 loops=1) Index Cond: ((cad_id = 102) AND (value = 7)) -> Index Scan using cad_id_value_contact_id_idx on cc a1 (cost=0.00..8.83 rows=1 width=4) (actual time=0.002..0.002 rows=0 loops=34219) Index Cond: ((cad_id = 101) AND (value = 5) AND (contact_id = a2.contact_id)) Total runtime: 774.588 ms(17 rows)
对于INNER JOIN,索引都被利用上了。它不需要取得contact_id然后再来做集合操作。结论是对rowid做hash操作比集合操作更快。
不显示用INNER JOIN,让Planner决定Join顺序也是一样的:
postgres@dmright-perf:/mnt/memory/pg$ time psql -c 'explain analyze select count(a1.contact_id) from cc as a1, cc as a2, cc as a3 where a1.contact_id = a2.contact_id and a2.contact_id = a3.contact_id and a2.cad_id = 102 and a2.value = 7 and a1.cad_id = 101 and a1.value = 5 and a3.cad_id = 6 and a3.value = 1'; QUERY PLAN ---------------------------------------------------------------------------------------------------------------------------------------------------------------- Aggregate (cost=118979.47..118979.48 rows=1 width=4) (actual time=762.969..762.970 rows=1 loops=1) -> Nested Loop (cost=49073.39..118977.89 rows=630 width=4) (actual time=398.554..762.206 rows=6748 loops=1) -> Hash Join (cost=49073.39..97159.97 rows=2466 width=8) (actual time=398.531..684.425 rows=34219 loops=1) Hash Cond: (a3.contact_id = a2.contact_id) -> Bitmap Heap Scan on cc a3 (cost=306.85..25995.74 rows=14239 width=4) (actual time=34.802..91.672 rows=207228 loops=1) Recheck Cond: ((cad_id = 6) AND (value = 1)) -> Bitmap Index Scan on cad_id_value_idx (cost=0.00..303.29 rows=14239 width=0) (actual time=29.524..29.524 rows=207228 loops=1) Index Cond: ((cad_id = 6) AND (value = 1)) -> Hash (cost=44444.15..44444.15 rows=345792 width=4) (actual time=363.537..363.537 rows=830569 loops=1) Buckets: 65536 Batches: 1 Memory Usage: 29200kB -> Bitmap Heap Scan on cc a2 (cost=7341.27..44444.15 rows=345792 width=4) (actual time=57.799..245.467 rows=830569 loops=1) Recheck Cond: ((cad_id = 102) AND (value = 7)) -> Bitmap Index Scan on cad_id_value_idx (cost=0.00..7254.82 rows=345792 width=0) (actual time=52.349..52.349 rows=830569 loops=1) Index Cond: ((cad_id = 102) AND (value = 7)) -> Index Scan using cad_id_value_contact_id_idx on cc a1 (cost=0.00..8.83 rows=1 width=4) (actual time=0.002..0.002 rows=0 loops=34219) Index Cond: ((cad_id = 101) AND (value = 5) AND (contact_id = a2.contact_id)) Total runtime: 766.107 ms(17 rows)
结果是一样的。
基本上对PostgreSQL的实验就到这里了,差不多也就这样了,提高空间不大。Google之后发现,PostgreSQL使用的MVCC机制导致其甚至在做SELECT COUNT这样的操作的时候也会去更新hint bit。也许这就是其慢的重要原因。MySQL的MYISAM引擎以完全不负责事务和著称,应该会比PostgreSQL有更大的提升,下一篇中,我们将对MySQL重复同样的实验。
转载于:https://www.cnblogs.com/taowen/archive/2012/02/26/2367724.html
多维数据查询效率分析(1)相关推荐
- 以太坊数据查询与分析工具:以太坊浏览器Etherscan、ETHERQL
第八章 文章目录 第八章 一.以太坊浏览器Etherscan 1.Ethersan的基本功能 2.其他功能 3.API 4.ENS域名查询 二.ETHERQL 1.同步管理器 2.处理程序链 3.持久 ...
- 基于springboot项目中使用docker-compose+es+kibana+logstash+mysql 提高数据查询效率
基于springboot项目中使用docker-compose+es+kibana+logstash+mysql 提高数据查询效率 1.拉取logstash,kibana,es,mysql镜像 #命令 ...
- 阿里云图数据库GDB公测,高度连接数据查询效率提升10倍
3月21日,阿里云图数据库GDB正式公测,用于处理高度连接数据查询与存储,支持属性图模型,并且全面兼容TinkerPop Gremlin 查询语言. 图数据库覆盖的应用场景非常广泛,比较典型的有社交网 ...
- Mysql 多表联合查询效率分析及优化
1. 多表连接类型 1. 笛卡尔积(交叉连接) 在MySQL中可以为CROSS JOIN或者省略CROSS即JOIN,或者使用',' 如: [sql] view plaincopy print? S ...
- mysql 多表查询 优化_Mysql 多表联合查询效率分析及优化
1. 多表连接类型 1. 笛卡尔积(交叉连接)在MySQL中可以为CROSS JOIN或者省略CROSS即JOIN,或者使用',' 如: SELECT * FROM table1 CROSS JOI ...
- mysql多表 性能_Mysql 多表联合查询效率分析及优化
1. 多表连接类型 1. 笛卡尔积(交叉连接)在MySQL中可以为CROSS JOIN或者省略CROSS即JOIN,或者使用',' 如: SELECT*FROMtable1CROSSJOINtabl ...
- mysql 多表 查询慢_详解Mysql多表联合查询效率分析及优化
1. 多表连接类型1. 笛卡尔积(交叉连接) 在MySQL中可以为CROSS JOIN或者省略CROSS即JOIN,或者使用',' 如: SELECT * FROM table1 CROSS JOI ...
- oracle sql语句执行顺序、查询效率分析、步骤解析
语句 执行顺序 select –5 from –1 where –2 group by –3 having –4 order by –6 查询语句中select from where group by ...
- mysql 多表联查的效率_Mysql 多表联合查询效率分析
1 多表连接类型 1. 笛卡尔积(交叉连接) 在MySQL中可以为CROSS JOIN或者省略CROSS即JOIN,或者使用',' 如: SELECT * FROM table1 CROSS JOIN ...
最新文章
- html5 直接获取当前位置,HTML5调用百度地图API获取当前位置并直接导航目的地的方法...
- linux下几种反弹Shell方法的总结与理解
- ubuntu中使用VsCode+Eigen创建Eiegn应用程序
- 老男孩Linux运维第41期20171016第六周学习重点课堂记录
- 美团DSP广告策略实践
- Java基础常见的面试题
- django路由层 虚拟环境 视图层
- Android 图像按钮ImageButton
- RTKLIB学习:数据下载和数据转换
- 铺铜需要把agnd和dgnd分开_AGNDDGND 分析
- 2021-11-08 RAID磁盘阵列的分级与结构 总结
- iOS 加载本地gif缓慢
- win10 ,jkd8 查看线程状态
- SQL中Round函数没有四舍五入原因及处理方法
- 功放管php和NPN的区别,一张图搞懂PNP型和NPN型三极管区别
- 计算机开机最快,你电脑开机要多久?4招让你的电脑开机速度比之前快几倍
- CSS制作大风车动画及花朵
- 船模、富斯遥控器、电调联调心得记录
- python如何读pst文件_Python 学习 第11篇:文件读写
- Cocos Creator 2.0.10极限跳跃实现