The column-store query tests in this article use the ClickHouse database. Since I am still studying ClickHouse internals, the analysis of the exact execution flow is incomplete; it will be filled in by the later ClickHouse write-ups. For the row-store vs column-store comparison itself, Parts I through III are sufficient.

I. Row-Oriented Storage

​ Typical transactional (OLTP) databases mostly insert, delete, update, and query whole rows of data, so most of them use row-oriented storage: all of a record's fields are laid out together in one row, and the row can be located quickly through the primary key. The logical layout looks like the table below.

ID | Name      | Age | City      | Degree
1  | Zhang San | 20  | Beijing   | Bachelor's
2  | Li Si     | 21  | Shanghai  | Master's
3  | Wang Wu   | 22  | Guangzhou | PhD

​ The physical layout is roughly as follows, with complete records stored one after another:

[1, Zhang San, 20, Beijing, Bachelor's] [2, Li Si, 21, Shanghai, Master's] [3, Wang Wu, 22, Guangzhou, PhD]

II. Column-Oriented Storage

​ OLAP systems generally query large volumes of data while only caring about a few of the columns, so they usually use column-oriented storage. The same data stored by column looks like:

1          | 2        | 3
Zhang San  | Li Si    | Wang Wu
20         | 21       | 22
Beijing    | Shanghai | Guangzhou
Bachelor's | Master's | PhD

​ The physical layout is roughly as follows, with each column's values stored contiguously:

[1, 2, 3] [Zhang San, Li Si, Wang Wu] [20, 21, 22] [Beijing, Shanghai, Guangzhou] [Bachelor's, Master's, PhD]

III. Read and Write Comparison

1. Writes

  1. With the structure shown above, the row store completes the write of one record in a single pass, and the record's integrity is easy to guarantee
  2. To write the same record, the column store must split it into five columns and save each separately: five times as many write operations as the row store, so the actual write takes longer (see the sketch at the end of this section)

2. Updates

  1. To modify a record, the row store seeks to the record's location and writes once, while the column store must locate and write at several places; with the sample data above, the column store again needs 5 writes for every 1 the row store needs

3. Reads

  1. When reading a row, the row store reads every field of that record; if only a few columns are needed, the rest is redundant, and memory is usually spent stripping those redundant columns out
  2. The column store reads exactly the column segments that are needed, with no redundant columns
  3. Data read by the row store mixes several types, so parsing it may involve type conversions; every segment the column store reads has a single type, so no conversion is needed, and a different compression method can be applied per type. This makes column storage much better suited to large-scale analytics (see the sketch below)
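
A minimal generic-SQL sketch of the comparison (the person table and its data are hypothetical, mirroring the sample above; the comments describe the generic row-store vs column-store model, not any particular engine):

-- Hypothetical 5-column table mirroring the sample data
CREATE TABLE person (
    id     INT,
    name   VARCHAR(32),
    age    INT,
    city   VARCHAR(32),
    degree VARCHAR(32)
);

-- Write one record:
--   row store:    a single sequential write of the whole tuple
--   column store: five writes, one into each of the five column files
INSERT INTO person VALUES (1, 'Zhang San', 20, 'Beijing', 'Bachelor');

-- Update one record:
--   row store:    locate the row once and rewrite it in place
--   column store: locate and rewrite the matching position in each affected column file
UPDATE person SET city = 'Shenzhen' WHERE id = 1;

-- Analytical read over a single column:
--   row store:    reads whole rows, then discards 4 of the 5 fields
--   column store: scans only the age column file (roughly 1/5 of the data here),
--                 and same-typed values compress far better
SELECT avg(age) FROM person;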

IV. ClickHouse Query and Write Tests

1. Importing the Test Data

​ These ClickHouse tests use the official Yandex.Metrica sample dataset, which contains two tables: visits_v1 (visit/session data) and hits_v1 (hit data provided by Yandex.Metrica). The procedure follows the official "ClickHouse test data import" guide. The official instructions assume ClickHouse is installed directly on the host, whereas my setup has always run it via docker, so this section briefly describes how to import the test data in a docker environment.

# The clickhouse image is packaged with docker-compose
## 1. Download and unpack the test data
[root@xxxx clickhouse]# curl https://clickhouse-datasets.s3.yandex.net/hits/tsv/hits_v1.tsv.xz | unxz --threads=`nproc` > hits_v1.tsv
[root@xxxx clickhouse]# curl https://clickhouse-datasets.s3.yandex.net/visits/tsv/visits_v1.tsv.xz | unxz --threads=`nproc` > visits_v1.tsv
## 2. In the docker-compose configuration, add a volume entry mapping a host directory into the container
[root@xxxx docker_compose]# ll
total 28
-rw-r--r--  1 root root 14907 Apr 30 15:49 config.xml
drwxr-xr-x 10  101  101   204 Apr 26 18:56 data
-rw-r--r--  1 root root   733 Apr 26 18:46 docker-compose.yml
drwxr-xr-x  2 root root    80 Apr 30 15:53 log
drwxr-xr-x  2  101  101    58 Apr 30 15:41 tmp
-rw-r--r--  1 root root  4532 Mar 30 17:43 users.xml
[root@xxxx docker_compose]# vim docker-compose.yml
volumes:
  ...
  - ./tmp:/var/lib/clickhouse/tmp
# The tmp directory holds the downloaded test data and is mapped to /var/lib/clickhouse/tmp inside the container
[root@xxxx docker_compose]# ll tmp/
total 10197044
-rw-r--r-- 1  101  101 7784351125 Apr 26 18:32 hits_v1.tsv
-rw-r--r-- 1 root root 2657415178 Apr 30 15:41 visits_v1.tsv
# The clickhouse container is already running; enter it and import the data
[root@xxxx docker_compose]# docker ps
CONTAINER ID        IMAGE                                                     COMMAND                  CREATED             STATUS                 PORTS                                                                              NAMES
968b926da80b        clickhouse-server-demo:1.0                                "/entrypoint.sh"         3 days ago          Up 3 days              0.0.0.0:8123->8123/tcp, 0.0.0.0:9000->9000/tcp, 0.0.0.0:9004->9004/tcp, 9009/tcp   clickhouse-server_1
## 3. Import the test data
[root@xxxx docker_compose]# docker exec -it clickhouse-server_1 /bin/bash
root@clickhouse-server_1:/# ll /var/lib/clickhouse/tmp/
total 10197044
drwxr-xr-x  2 clickhouse clickhouse         58 Apr 30 07:41 ./
drwxr-xr-x 10 clickhouse clickhouse        204 Apr 26 10:56 ../
-rw-r--r--  1 clickhouse clickhouse 7784351125 Apr 26 10:32 hits_v1.tsv
-rw-r--r--  1 root       root       2657415178 Apr 30 07:41 visits_v1.tsv
root@clickhouse-server_1:/# clickhouse-client --query "CREATE DATABASE IF NOT EXISTS datasets"
... # The CREATE TABLE statements can be taken straight from the official documentation
## Import visits_v1.tsv; the other table, hits_v1, is imported the same way
root@clickhouse-server_1:/# cat visits_v1.tsv | clickhouse-client --query "INSERT INTO datasets.visits_v1 FORMAT TSV" --max_insert_block_size=100000
## 4. Verify the imported row counts
root@clickhouse-server_1:/# clickhouse-client
ClickHouse client version 20.3.4.10 (official build).
Connecting to localhost:9000 as user default.
Connected to ClickHouse server version 20.3.4 revision 54433.

clickhouse-server_1 :) use datasets

USE datasets

Ok.

0 rows in set. Elapsed: 0.002 sec.

clickhouse-server_1 :) select count() from hits_v1;

SELECT count()
FROM hits_v1

┌─count()─┐
│ 8873898 │
└─────────┘

1 rows in set. Elapsed: 0.008 sec.

clickhouse-server_1 :) select count() from visits_v1;

SELECT count()
FROM visits_v1

┌─count()─┐
│ 1676861 │
└─────────┘

1 rows in set. Elapsed: 0.002 sec.
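
Beyond row counts, the on-disk footprint of the imported tables can be checked through the system.parts system table (a sketch; these column names are the usual ones, but the exact set can vary slightly across versions):

-- Total compressed size per table in the datasets database, active parts only
SELECT table, formatReadableSize(sum(bytes_on_disk)) AS size_on_disk
FROM system.parts
WHERE database = 'datasets' AND active
GROUP BY table;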

2. Query Tests

​ Since this version of clickhouse provides no command for viewing a query's execution plan, the only way to see how a statement executes is indirectly, through the server log. The tests below run the official sample queries against the hits_v1 table and read the resulting log entries; later ClickHouse articles will continue testing against these two tables.

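Two hedged side notes: newer ClickHouse releases added an EXPLAIN statement, and even on this version the server-side log lines can be streamed straight into the client session by raising the send_logs_level setting, instead of tailing the log file:

-- Stream server-side log messages for subsequent queries into this client session
SET send_logs_level = 'trace';

SELECT count() FROM datasets.hits_v1;
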
1. A brief look at the hits_v1 table structure

## A brief look at the hits_v1 table structure for the tests that follow; clickhouse SQL in detail is left for later articles
CREATE TABLE datasets.hits_v1 (`WatchID` UInt64, ...`RequestNum` UInt32, `RequestTry` UInt8) ENGINE = MergeTree() PARTITION BY toYYYYMM(EventDate) ORDER BY (CounterID, EventDate, intHash32(UserID)) SAMPLE BY intHash32(UserID) SETTINGS index_granularity = 8192

The basic CREATE TABLE syntax in clickhouse is:

CREATE TABLE [IF NOT EXISTS] [db.]table_name [ON CLUSTER cluster]
(
    name1 [type1] [DEFAULT|MATERIALIZED|ALIAS expr1],
    name2 [type2] [DEFAULT|MATERIALIZED|ALIAS expr2],
    ...
    INDEX index_name1 expr1 TYPE type1(...) GRANULARITY value1,
    INDEX index_name2 expr2 TYPE type2(...) GRANULARITY value2
) ENGINE = engine_name()
[PARTITION BY expr]
[ORDER BY expr]
[PRIMARY KEY expr]
[SAMPLE BY expr]
[SETTINGS name=value, ...];

# Option descriptions
- db: the target database; if omitted, the current database is used
- cluster: clickhouse has a cluster concept, and data can be sharded across a cluster to build distributed tables; ON CLUSTER creates this local table on every shard (default cluster: default)
- type: the column's type, e.g. UInt32; clickhouse is case-sensitive, and type names must be written exactly
- MATERIALIZED: the column cannot be written directly; its value is computed from other columns and stored in the table, so no computation is needed at query time. INSERT must not supply it, and SELECT * does not return it
- ALIAS: similar to MATERIALIZED, but the value is not stored in the table; it is computed each time it is needed. Likewise, INSERT cannot write it and SELECT * does not return it
# Since the hits_v1 test table uses the MergeTree engine, the remaining options are described as they apply to MergeTree; other engines may not support all of them. The individual engines and their use cases will be covered in detail in separate later articles
- PARTITION BY: the partition key; the test table partitions by month of EventDate (toYYYYMM(EventDate))
- ORDER BY: the sort key
- PRIMARY KEY: the primary key; by default identical to ORDER BY
- SAMPLE BY: the sampling expression
- SETTINGS: additional parameters
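
A minimal, hedged example of this syntax (the database, table, and column names here are invented; MergeTree requires at least an ORDER BY, and the MATERIALIZED/ALIAS columns illustrate the two derived-column options above):

CREATE TABLE IF NOT EXISTS test.user_events
(
    EventDate  Date,
    EventTime  DateTime,
    UserID     UInt64,
    Url        String,
    -- computed from EventDate at insert time and stored; not writable, hidden from SELECT *
    EventMonth UInt32 MATERIALIZED toYYYYMM(EventDate),
    -- never stored; recomputed on every access
    UrlDomain  String ALIAS domain(Url)
)
ENGINE = MergeTree()
PARTITION BY toYYYYMM(EventDate)
ORDER BY (UserID, EventDate)
SETTINGS index_granularity = 8192;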

2. A group-by / order-by query

# SQL:
clickhouse-server_1 :) select CounterID, count() AS c from hits_v1 group by CounterID order by c desc limit 10

SELECT CounterID, count() AS c
FROM hits_v1
GROUP BY CounterID
ORDER BY c DESC
LIMIT 10

┌─CounterID─┬──────c─┐
│   1704509 │ 523264 │
│    732797 │ 475698 │
│    598875 │ 337212 │
│    792887 │ 252197 │
│   3807842 │ 196036 │
│  25703952 │ 147211 │
│    716829 │  90109 │
│     59183 │  85379 │
│  33010362 │  77807 │
│    800784 │  77492 │
└───────────┴────────┘

10 rows in set. Elapsed: 0.026 sec. Processed 8.87 million rows, 35.50 MB (339.77 million rows/s., 1.36 GB/s.)

# log
## 1. The query statement
2020.05.03 22:31:49.042192 [ 81 ] {d06708d7-2f0c-489f-96fe-e42012f641f0} <Debug> executeQuery: (from 127.0.0.1:59736) SELECT CounterID, count() AS c FROM hits_v1 GROUP BY CounterID ORDER BY c DESC LIMIT 10
## 2. User privilege check
2020.05.03 22:31:49.042790 [ 81 ] {d06708d7-2f0c-489f-96fe-e42012f641f0} <Trace> AccessRightsContext (default): Access granted: SELECT(CounterID) ON datasets.hits_v1
## 3. This query does not use the primary key index
2020.05.03 22:31:49.042971 [ 81 ] {d06708d7-2f0c-489f-96fe-e42012f641f0} <Debug> datasets.hits_v1 (SelectExecutor): Key condition: unknown
## 4. This query does not use the partition (minmax) index
2020.05.03 22:31:49.042986 [ 81 ] {d06708d7-2f0c-489f-96fe-e42012f641f0} <Debug> datasets.hits_v1 (SelectExecutor): MinMax index condition: unknown
## 5. The query scans 1 data part, 1092 marks (MarkRanges) in total (I am still working through what parts and MarkRanges mean in MergeTree; they will be described in detail in the MergeTree article)
2020.05.03 22:31:49.043021 [ 81 ] {d06708d7-2f0c-489f-96fe-e42012f641f0} <Debug> datasets.hits_v1 (SelectExecutor): Selected 1 parts by date, 1 parts by key, 1092 marks to read from 1 ranges
## 6. Full table scan: approx. 8873898 rows, processed with 20 streams
2020.05.03 22:31:49.043381 [ 81 ] {d06708d7-2f0c-489f-96fe-e42012f641f0} <Trace> datasets.hits_v1 (SelectExecutor): Reading approx. 8873898 rows with 20 streams
2020.05.03 22:31:49.044209 [ 81 ] {d06708d7-2f0c-489f-96fe-e42012f641f0} <Trace> InterpreterSelectQuery: FetchColumns -> Complete
2020.05.03 22:31:49.045627 [ 100 ] {d06708d7-2f0c-489f-96fe-e42012f641f0} <Trace> AggregatingTransform: Aggregating
...
...
## 7. Per-stream statistics: rows aggregated, elapsed time, throughput
2020.05.03 22:31:49.050780 [ 100 ] {d06708d7-2f0c-489f-96fe-e42012f641f0} <Trace> AggregatingTransform: Aggregated. 572078 to 5872 rows (from 2.182 MiB) in 0.006 sec. (88841051.020 rows/sec., 338.902 MiB/sec.)
2020.05.03 22:31:49.050814 [ 193 ] {d06708d7-2f0c-489f-96fe-e42012f641f0} <Trace> AggregatingTransform: Aggregated. 516096 to 7617 rows (from 1.969 MiB) in 0.006 sec. (80547357.953 rows/sec., 307.264 MiB/sec.)
2020.05.03 22:31:49.051022 [ 191 ] {d06708d7-2f0c-489f-96fe-e42012f641f0} <Trace> AggregatingTransform: Aggregated. 450560 to 8195 rows (from 1.719 MiB) in 0.007 sec. (68053391.462 rows/sec., 259.603 MiB/sec.)
## 8. One of the streams merges the aggregated data
2020.05.03 22:31:49.051046 [ 191 ] {d06708d7-2f0c-489f-96fe-e42012f641f0} <Trace> Aggregator: Merging aggregated data
## 9. 8873898 rows were read in total, 33.85 MiB
2020.05.03 22:31:49.066350 [ 81 ] {d06708d7-2f0c-489f-96fe-e42012f641f0} <Information> executeQuery: Read 8873898 rows, 33.85 MiB in 0.024 sec., 368641453 rows/sec., 1.37 GiB/sec.
2020.05.03 22:31:49.066415 [ 81 ] {d06708d7-2f0c-489f-96fe-e42012f641f0} <Debug> MemoryTracker: Peak memory usage (for query): 0.00 B.
## 10. Peak memory usage reported for the query: 0.00 B
2020.05.03 22:31:49.066786 [ 81 ] {} <Debug> MemoryTracker: Peak memory usage (total): 0.00 B.
## 11. The query took 0.025 s in total
2020.05.03 22:31:49.066814 [ 81 ] {} <Information> TCPHandler: Processed in 0.025 sec.
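
A quick sanity check on item 5: with index_granularity = 8192, the table's 8873898 rows split into roughly 8873898 / 8192 ≈ 1084 granules, the same order as the 1092 marks reported in the log (exact mark counts also depend on part boundaries). The parts and their mark counts can be inspected directly; a sketch against system.parts (exact columns vary by version):

-- Rows and marks per active part of hits_v1
SELECT partition, name, rows, marks
FROM system.parts
WHERE database = 'datasets' AND table = 'hits_v1' AND active;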

3. A query using the primary key index

# SQL:
clickhouse-server_1 :) SELECT WatchID FROM hits_v1 WHERE CounterID = 67141

SELECT WatchID
FROM hits_v1
WHERE CounterID = 67141

┌─────────────WatchID─┐
│ 9092821867385297764 │
│ 6698385316098085730 │
│ 8281386980251669809 │
│ 7804373457861079090 │
│ 5352419935083292124 │
│ 7522442961486322437 │
│ 4926733399374529578 │
│ 8651569660825010330 │
│ 7777215115402859170 │
│ 5488491440763342147 │
│ 7016898938173798841 │
│ 7512073271455311672 │
│ 7675183452718991621 │
│ 7698094942612287474 │
│ 7229580476946423672 │
│ 8265472689024610766 │
│ 7397061429050334296 │
│ 5642502882079177996 │
│ 5521967617262710331 │
│ 6045376808846148744 │
│ 5223813301270698276 │
│ 5891294304736742075 │
│ 7473702977877450342 │
│ 7131227524298036078 │
│ 6397036526472438783 │
│ 5452801867475832050 │
│ 8203620973862900075 │
│ 8228211160680219393 │
│ 5669672267661574263 │
│ 6447542723619820343 │
│ 5609776647750491151 │
│ 5937976217944527938 │
│ 8559139126342788142 │
│ 6731577587255153490 │
│ 7541590813574755789 │
│ 6736741087826610411 │
│ 5750208933466385975 │
│ 6501641543222310031 │
│ 6817897199087131799 │
│ 8775895600472212626 │
│ 7276707177012917444 │
│ 7841417239216625313 │
│ 6708893161493789316 │
│ 5161987475887808662 │
│ 5167052428932424884 │
│ 8512404755681004329 │
│ 5407707620324494582 │
│ 7664508369041326595 │
│ 6437220034025745400 │
│ 5074053444698312956 │
│ 5698931552063656743 │
│ 8826145146896127905 │
└─────────────────────┘

52 rows in set. Elapsed: 0.003 sec. Processed 8.19 thousand rows, 98.30 KB (2.77 million rows/s., 33.19 MB/s.)

# log
2020.05.03 22:58:53.011895 [ 82 ] {55122f0e-83a7-454f-abed-870934d0f0a4} <Debug> executeQuery: (from 127.0.0.1:50236) SELECT WatchID FROM hits_v1 WHERE CounterID = 67141
2020.05.03 22:58:53.012824 [ 82 ] {55122f0e-83a7-454f-abed-870934d0f0a4} <Trace> AccessRightsContext (default): Access granted: SELECT(WatchID, CounterID) ON datasets.hits_v1
## The Key condition line shows this query uses the primary key index
2020.05.03 22:58:53.012985 [ 82 ] {55122f0e-83a7-454f-abed-870934d0f0a4} <Debug> datasets.hits_v1 (SelectExecutor): Key condition: (column 0 in [67141, 67141])
2020.05.03 22:58:53.013000 [ 82 ] {55122f0e-83a7-454f-abed-870934d0f0a4} <Debug> datasets.hits_v1 (SelectExecutor): MinMax index condition: unknown
2020.05.03 22:58:53.013032 [ 82 ] {55122f0e-83a7-454f-abed-870934d0f0a4} <Debug> datasets.hits_v1 (SelectExecutor): Selected 1 parts by date, 1 parts by key, 1 marks to read from 1 ranges
2020.05.03 22:58:53.013210 [ 82 ] {55122f0e-83a7-454f-abed-870934d0f0a4} <Trace> datasets.hits_v1 (SelectExecutor): Reading approx. 8192 rows with 1 streams
2020.05.03 22:58:53.013291 [ 82 ] {55122f0e-83a7-454f-abed-870934d0f0a4} <Trace> InterpreterSelectQuery: FetchColumns -> Complete
## 8192 rows were read in total, 96 KiB
2020.05.03 22:58:53.013902 [ 82 ] {55122f0e-83a7-454f-abed-870934d0f0a4} <Information> executeQuery: Read 8192 rows, 96.00 KiB in 0.002 sec., 4226200 rows/sec., 48.37 MiB/sec.
2020.05.03 22:58:53.013964 [ 82 ] {55122f0e-83a7-454f-abed-870934d0f0a4} <Debug> MemoryTracker: Peak memory usage (for query): 0.00 B.
2020.05.03 22:58:53.014059 [ 82 ] {} <Debug> MemoryTracker: Peak memory usage (total): 0.00 B.
## The query took 0.002 s in total
2020.05.03 22:58:53.014082 [ 82 ] {} <Information> TCPHandler: Processed in 0.002 sec.
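
These numbers show how coarse-grained the primary index is: only 52 rows actually match CounterID = 67141, yet 8192 rows are read, because the index locates granules rather than individual rows:

rows_read = selected_marks × index_granularity = 1 × 8192 = 8192

The WHERE filter then discards the non-matching rows inside that single granule, which is why the result contains 52 rows while the log reports 8192 rows read.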

3. Insert Tests

​ Here we create a temporary table, insert some rows into it, and use the log to examine the column database's write path. ClickHouse's official documentation does not currently spell out the query/write execution flow, and the write logic differs between engines; the write path and its logs will be described in detail in the later per-engine tests.

# Create the temporary table:
## SQL:
clickhouse-server_1 :) CREATE TABLE mixed_granularity_table (`WatchID` UInt64, `JavaEnable` UInt8, `Title` String, `GoodEvent` Int16, `EventTime` DateTime, `EventDate` Date, `CounterID` UInt32, `ClientIP` UInt32, `ClientIP6` FixedString(16), `RegionID` UInt32, `UserID` UInt64, `CounterClass` Int8, `OS` UInt8, `UserAgent` UInt8, `URL` String, `Referer` String, `URLDomain` String, `RefererDomain` String, `Refresh` UInt8, `IsRobot` UInt8, `RefererCategories` Array(UInt16), `URLCategories` Array(UInt16), `URLRegions` Array(UInt32), `RefererRegions` Array(UInt32), `ResolutionWidth` UInt16, `ResolutionHeight` UInt16, `ResolutionDepth` UInt8, `FlashMajor` UInt8, `FlashMinor` UInt8, `FlashMinor2` String, `NetMajor` UInt8, `NetMinor` UInt8, `UserAgentMajor` UInt16, `UserAgentMinor` FixedString(2), `CookieEnable` UInt8, `JavascriptEnable` UInt8, `IsMobile` UInt8, `MobilePhone` UInt8, `MobilePhoneModel` String, `Params` String, `IPNetworkID` UInt32, `TraficSourceID` Int8, `SearchEngineID` UInt16, `SearchPhrase` String, `AdvEngineID` UInt8, `IsArtifical` UInt8, `WindowClientWidth` UInt16, `WindowClientHeight` UInt16, `ClientTimeZone` Int16, `ClientEventTime` DateTime, `SilverlightVersion1` UInt8, `SilverlightVersion2` UInt8, `SilverlightVersion3` UInt32, `SilverlightVersion4` UInt16, `PageCharset` String, `CodeVersion` UInt32, `IsLink` UInt8, `IsDownload` UInt8, `IsNotBounce` UInt8, `FUniqID` UInt64, `HID` UInt32, `IsOldCounter` UInt8, `IsEvent` UInt8, `IsParameter` UInt8, `DontCountHits` UInt8, `WithHash` UInt8, `HitColor` FixedString(1), `UTCEventTime` DateTime, `Age` UInt8, `Sex` UInt8, `Income` UInt8, `Interests` UInt16, `Robotness` UInt8, `GeneralInterests` Array(UInt16), `RemoteIP` UInt32, `RemoteIP6` FixedString(16), `WindowName` Int32, `OpenerName` Int32, `HistoryLength` Int16, `BrowserLanguage` FixedString(2), `BrowserCountry` FixedString(2), `SocialNetwork` String, `SocialAction` String, `HTTPError` UInt16, `SendTiming` Int32, `DNSTiming` Int32, `ConnectTiming` Int32, `ResponseStartTiming` Int32, `ResponseEndTiming` Int32, `FetchTiming` Int32, `RedirectTiming` Int32, `DOMInteractiveTiming` Int32, `DOMContentLoadedTiming` Int32, `DOMCompleteTiming` Int32, `LoadEventStartTiming` Int32, `LoadEventEndTiming` Int32, `NSToDOMContentLoadedTiming` Int32, `FirstPaintTiming` Int32, `RedirectCount` Int8, `SocialSourceNetworkID` UInt8, `SocialSourcePage` String, `ParamPrice` Int64, `ParamOrderID` String, `ParamCurrency` FixedString(3), `ParamCurrencyID` UInt16, `GoalsReached` Array(UInt32), `OpenstatServiceName` String, `OpenstatCampaignID` String, `OpenstatAdID` String, `OpenstatSourceID` String, `UTMSource` String, `UTMMedium` String, `UTMCampaign` String, `UTMContent` String, `UTMTerm` String, `FromTag` String, `HasGCLID` UInt8, `RefererHash` UInt64, `URLHash` UInt64, `CLID` UInt32, `YCLID` UInt64, `ShareService` String, `ShareURL` String, `ShareTitle` String, `ParsedParams.Key1` Array(String), `ParsedParams.Key2` Array(String), `ParsedParams.Key3` Array(String), `ParsedParams.Key4` Array(String), `ParsedParams.Key5` Array(String), `ParsedParams.ValueDouble` Array(Float64), `IslandID` FixedString(16), `RequestNum` UInt32, `RequestTry` UInt8) ENGINE = MergeTree() PARTITION BY toYYYYMM(EventDate) ORDER BY (CounterID, EventDate, intHash32(UserID)) SAMPLE BY intHash32(UserID) SETTINGS index_granularity=8192, enable_mixed_granularity_parts=1; -- same with hits, but enabled mixed granularity

...

Ok.

0 rows in set. Elapsed: 0.008 sec.
## log
### 1. The CREATE TABLE statement as received by the server
2020.05.03 23:06:08.921921 [ 82 ] {0db543f9-58b2-4765-af63-a5f881246d41} <Debug> executeQuery: (from 127.0.0.1:50236) CREATE TABLE mixed_granularity_table (`WatchID` UInt64, `JavaEnable` UInt8, `Title` String, `GoodEvent` Int16, `EventTime` DateTime, `EventDate` Date, `CounterID` UInt32, `ClientIP` UInt32, `ClientIP6` FixedString(16), `RegionID` UInt32, `UserID` UInt64, `CounterClass` Int8, `OS` UInt8, `UserAgent` UInt8, `URL` String, `Referer` String, `URLDomain` String, `RefererDomain` String, `Refresh` UInt8, `IsRobot` UInt8, `RefererCategories` Array(UInt16), `URLCategories` Array(UInt16), `URLRegions` Array(UInt32), `RefererRegions` Array(UInt32), `ResolutionWidth` UInt16, `ResolutionHeight` UInt16, `ResolutionDepth` UInt8, `FlashMajor` UInt8, `FlashMinor` UInt8, `FlashMinor2` String, `NetMajor` UInt8, `NetMinor` UInt8, `UserAgentMajor` UInt16, `UserAgentMinor` FixedString(2), `CookieEnable` UInt8, `JavascriptEnable` UInt8, `IsMobile` UInt8, `MobilePhone` UInt8, `MobilePhoneModel` String, `Params` String, `IPNetworkID` UInt32, `TraficSourceID` Int8, `SearchEngineID` UInt16, `SearchPhrase` String, `AdvEngineID` UInt8, `IsArtifical` UInt8, `WindowClientWidth` UInt16, `WindowClientHeight` UInt16, `ClientTimeZone` Int16, `ClientEventTime` DateTime, `SilverlightVersion1` UInt8, `SilverlightVersion2` UInt8, `SilverlightVersion3` UInt32, `SilverlightVersion4` UInt16, `PageCharset` String, `CodeVersion` UInt32, `IsLink` UInt8, `IsDownload` UInt8, `IsNotBounce` UInt8, `FUniqID` UInt64, `HID` UInt32, `IsOldCounter` UInt8, `IsEvent` UInt8, `IsParameter` UInt8, `DontCountHits` UInt8, `WithHash` UInt8, `HitColor` FixedString(1), `UTCEventTime` DateTime, `Age` UInt8, `Sex` UInt8, `Income` UInt8, `Interests` UInt16, `Robotness` UInt8, `GeneralInterests` Array(UInt16), `RemoteIP` UInt32, `RemoteIP6` FixedString(16), `WindowName` Int32, `OpenerName` Int32, `HistoryLength` Int16, `BrowserLanguage` FixedString(2), `BrowserCountry` FixedString(2), `SocialNetwork` String, `SocialAction` String, `HTTPError` UInt16, `SendTiming` Int32, `DNSTiming` Int32, `ConnectTiming` Int32, `ResponseStartTiming` Int32, `ResponseEndTiming` Int32, `FetchTiming` Int32, `RedirectTiming` Int32, `DOMInteractiveTiming` Int32, `DOMContentLoadedTiming` Int32, `DOMCompleteTiming` Int32, `LoadEventStartTiming` Int32, `LoadEventEndTiming` Int32, `NSToDOMContentLoadedTiming` Int32, `FirstPaintTiming` Int32, `RedirectCount` Int8, `SocialSourceNetworkID` UInt8, `SocialSourcePage` String, `ParamPrice` Int64, `ParamOrderID` String, `ParamCurrency` FixedString(3), `ParamCurrencyID` UInt16, `GoalsReached` Array(UInt32), `OpenstatServiceName` String, `OpenstatCampaignID` String, `OpenstatAdID` String, `OpenstatSourceID` String, `UTMSource` String, `UTMMedium` String, `UTMCampaign` String, `UTMContent` String, `UTMTerm` String, `FromTag` String, `HasGCLID` UInt8, `RefererHash` UInt64, `URLHash` UInt64, `CLID` UInt32, `YCLID` UInt64, `ShareService` String, `ShareURL` String, `ShareTitle` String, `ParsedParams.Key1` Array(String), `ParsedParams.Key2` Array(String), `ParsedParams.Key3` Array(String), `ParsedParams.Key4` Array(String), `ParsedParams.Key5` Array(String), `ParsedParams.ValueDouble` Array(Float64), `IslandID` FixedString(16), `RequestNum` UInt32, `RequestTry` UInt8) ENGINE = MergeTree() PARTITION BY toYYYYMM(EventDate) ORDER BY (CounterID, EventDate, intHash32(UserID)) SAMPLE BY intHash32(UserID) SETTINGS index_granularity = 8192, enable_mixed_granularity_parts = 1
### 2. Privilege check
2020.05.03 23:06:08.922323 [ 82 ] {0db543f9-58b2-4765-af63-a5f881246d41} <Trace> AccessRightsContext (default): Access granted: CREATE TABLE ON datasets.mixed_granularity_table
2020.05.03 23:06:08.925036 [ 82 ] {0db543f9-58b2-4765-af63-a5f881246d41} <Debug> datasets.mixed_granularity_table: Loading data parts
2020.05.03 23:06:08.925135 [ 82 ] {0db543f9-58b2-4765-af63-a5f881246d41} <Debug> datasets.mixed_granularity_table: Loaded data parts (0 items)
2020.05.03 23:06:08.925992 [ 82 ] {0db543f9-58b2-4765-af63-a5f881246d41} <Debug> MemoryTracker: Peak memory usage (for query): 0.00 B.
2020.05.03 23:06:08.926057 [ 82 ] {} <Debug> MemoryTracker: Peak memory usage (total): 0.00 B.
2020.05.03 23:06:08.926089 [ 82 ] {} <Information> TCPHandler: Processed in 0.006 sec.

# Insert data:
## SQL:
clickhouse-server_1 :) INSERT INTO mixed_granularity_table SELECT * FROM hits_v1 LIMIT 10;

INSERT INTO mixed_granularity_table SELECT *
FROM hits_v1
LIMIT 10

Ok.

0 rows in set. Elapsed: 0.418 sec.
## log:
2020.05.03 23:09:53.387837 [ 82 ] {321f3cb7-467a-4744-8371-29d536f78908} <Debug> executeQuery: (from 127.0.0.1:50236) INSERT INTO mixed_granularity_table SELECT * FROM hits_v1 LIMIT 10
### Two privilege checks are needed: INSERT on the new table mixed_granularity_table and SELECT on hits_v1
2020.05.03 23:09:53.388113 [ 82 ] {321f3cb7-467a-4744-8371-29d536f78908} <Trace> AccessRightsContext (default): Access granted: INSERT(WatchID, JavaEnable, ..., RequestTry) ON datasets.mixed_granularity_table
2020.05.03 23:09:53.389910 [ 82 ] {321f3cb7-467a-4744-8371-29d536f78908} <Trace> AccessRightsContext (default): Access granted: SELECT(WatchID, JavaEnable, ..., RequestTry) ON datasets.hits_v1
2020.05.03 23:09:53.390381 [ 82 ] {321f3cb7-467a-4744-8371-29d536f78908} <Debug> datasets.hits_v1 (SelectExecutor): Key condition: unknown
2020.05.03 23:09:53.390396 [ 82 ] {321f3cb7-467a-4744-8371-29d536f78908} <Debug> datasets.hits_v1 (SelectExecutor): MinMax index condition: unknown
2020.05.03 23:09:53.390421 [ 82 ] {321f3cb7-467a-4744-8371-29d536f78908} <Debug> datasets.hits_v1 (SelectExecutor): Selected 1 parts by date, 1 parts by key, 1092 marks to read from 1 ranges
2020.05.03 23:09:53.390615 [ 82 ] {321f3cb7-467a-4744-8371-29d536f78908} <Trace> MergeTreeSelectProcessor: Reading 1 ranges from part 201403_1_32_2, approx. 8873898 rows starting from 0
2020.05.03 23:09:53.390711 [ 82 ] {321f3cb7-467a-4744-8371-29d536f78908} <Trace> InterpreterSelectQuery: FetchColumns -> Complete
2020.05.03 23:09:53.391523 [ 82 ] {321f3cb7-467a-4744-8371-29d536f78908} <Debug> executeQuery: Query pipeline:
NullAndDoCopy
 Converting
  Limit
   Expression
    Expression
     TreeExecutor

2020.05.03 23:09:53.791641 [ 102 ] {321f3cb7-467a-4744-8371-29d536f78908} <Debug> DiskLocal: Reserving 1.00 MiB on disk `default`, having unreserved 3.44 TiB.
2020.05.03 23:09:53.804394 [ 102 ] {321f3cb7-467a-4744-8371-29d536f78908} <Trace> datasets.mixed_granularity_table: Renaming temporary part tmp_insert_201403_1_1_0 to 201403_1_1_0.
2020.05.03 23:09:53.804871 [ 82 ] {321f3cb7-467a-4744-8371-29d536f78908} <Information> executeQuery: Read 10 rows, 10.59 KiB in 0.417 sec., 23 rows/sec., 25.40 KiB/sec.
2020.05.03 23:09:53.804923 [ 82 ] {321f3cb7-467a-4744-8371-29d536f78908} <Debug> MemoryTracker: Peak memory usage (for query): 263.95 MiB.
2020.05.03 23:09:53.806135 [ 82 ] {} <Debug> MemoryTracker: Peak memory usage (total): 263.95 MiB.
2020.05.03 23:09:53.806169 [ 82 ] {} <Information> TCPHandler: Processed in 0.419 sec.
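
The "Renaming temporary part tmp_insert_201403_1_1_0 to 201403_1_1_0" line is the key detail of the MergeTree write path: each INSERT is first written out as a new immutable part and then atomically renamed into place, and background merges combine small parts later, which is why many tiny inserts are generally discouraged. A hedged way to observe this from SQL:

-- Each INSERT leaves behind a new part
SELECT name, rows, active
FROM system.parts
WHERE database = 'datasets' AND table = 'mixed_granularity_table';

-- Force an unscheduled merge of the table's parts
-- (a real statement, but expensive on large tables)
OPTIMIZE TABLE datasets.mixed_granularity_table FINAL;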
