Big Data Applications: Testing Apache Doris with SSB

  1. Download Doris's ssb-tools

    https://github.com/apache/doris

    Upload doris-master\tools\ssb-tools to lsyk01:/softw

  2. Download the ssb-dbgen package (the VM has no internet access)

    https://palo-cloud-repo-bd.bd.bcebos.com/baidu-doris-release/ssb-dbgen-linux.tar.gz

    Upload it to lsyk01:/softw/ssb-tools

  3. Modify the script /softw/ssb-tools/build-ssb-dbgen.sh
vi /softw/ssb-tools/build-ssb-dbgen.sh
# Change it as follows: skip the download and extract the package uploaded earlier
# download ssb-dbgen first
if [[ -d $SSB_DBGEN_DIR ]]; then
    echo "Dir $CURDIR/ssb-dbgen/ already exists. No need to download."
    echo "If you want to download ssb-dbgen again, please delete this dir first."
else
    #curl https://palo-cloud-repo-bd.bd.bcebos.com/baidu-doris-release/ssb-dbgen-linux.tar.gz | tar xz -C $CURDIR/
    tar -zxvf $CURDIR/ssb-dbgen-linux.tar.gz -C $CURDIR/
fi
  4. Build ssb-dbgen
cd /softw/ssb-tools
sh build-ssb-dbgen.sh
  5. Generate the test data
sh gen-ssb-data.sh -s 40    # scale factor 40
du -sh *
110M    customer.tbl
228K    date.tbl
2.4G    lineorder.tbl.1
2.4G    lineorder.tbl.10
2.4G    lineorder.tbl.2
2.4G    lineorder.tbl.3
2.4G    lineorder.tbl.4
2.4G    lineorder.tbl.5
2.4G    lineorder.tbl.6
2.4G    lineorder.tbl.7
2.4G    lineorder.tbl.8
2.4G    lineorder.tbl.9
99M     part.tbl
6.5M    supplier.tbl
wc -l *
1200000 customer.tbl
2556 date.tbl
23996604 lineorder.tbl.1
24001837 lineorder.tbl.10
23992403 lineorder.tbl.2
23996070 lineorder.tbl.3
24003563 lineorder.tbl.4
24005968 lineorder.tbl.5
24005179 lineorder.tbl.6
23998304 lineorder.tbl.7
24002460 lineorder.tbl.8
24009902 lineorder.tbl.9
1200000 part.tbl
80000 supplier.tbl
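Before loading, it can help to total the lineorder rows so the figure can be compared with select count(1) from ssb.lineorder afterwards. The following is a minimal bash sketch; `count_lineorder_rows` is a hypothetical helper, and it assumes the .tbl files sit in the directory you pass in (the exact output directory of gen-ssb-data.sh is not stated above).

```shell
#!/usr/bin/env bash
# Hypothetical helper: total the rows across the split lineorder.tbl.* chunks.
# Assumption: the chunks live in the directory given as $1 (default: current dir).
count_lineorder_rows() {
    local dir="${1:-.}"
    local files=( "$dir"/lineorder.tbl.* )
    # If the glob matched nothing, bash leaves the pattern itself in files[0].
    if [[ ! -e "${files[0]}" ]]; then
        echo "no lineorder.tbl.* files in $dir" >&2
        return 1
    fi
    # Each generated row is one newline-terminated line; tr strips the
    # padding some wc implementations print around the number.
    cat "${files[@]}" | wc -l | tr -d ' '
}
```

For the wc -l figures above, the ten chunks add up to 240012290, which matches the sum(cnt) that Q5.1 returns on lineorder_flat in the results further down.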
  6. Configure doris-cluster.conf
# Any of FE host
export FE_HOST='lsyk01'
# http_port in fe.conf
export FE_HTTP_PORT=8030
# query_port in fe.conf
export FE_QUERY_PORT=9030
# Doris username
export USER='root'
# Doris password
export PASSWORD='fa'
# The database where SSB tables located
export DB='ssb'
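Since every ssb-tools script reads its connection settings from this file, a quick check before running them can save a failed run. A minimal bash sketch; `check_doris_conf` is a hypothetical helper, not part of ssb-tools. PASSWORD is deliberately not required, because it may legitimately be empty.

```shell
#!/usr/bin/env bash
# Hypothetical helper: verify the variables doris-cluster.conf exports are non-empty.
check_doris_conf() {
    local missing=0 var
    for var in FE_HOST FE_HTTP_PORT FE_QUERY_PORT USER DB; do
        # ${!var} is bash indirect expansion: the value of the variable named by $var.
        if [[ -z "${!var}" ]]; then
            echo "missing: $var" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

Usage: source doris-cluster.conf && check_doris_conf. Note that the file exports USER, so sourcing it in an interactive shell overwrites your login $USER for that session.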
  7. Create the tables
sh ./create-ssb-tables.sh
sh ./create-ssb-flat-table.sh
  8. Load the data
sh ./load-ssb-dimension-data.sh
sh ./load-ssb-fact-data.sh -c 5

It is very memory-hungry:

It took 8 minutes. The loaded data is about 6.8G; the raw files were 24G.
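The size reduction can be expressed as a ratio. A small sketch that uses awk for the floating-point division (bash arithmetic is integer-only); the 24 and 6.8 are the figures quoted above, and `size_ratio` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: raw size divided by stored size, one decimal place.
size_ratio() {
    awk -v raw="$1" -v stored="$2" 'BEGIN { printf "%.1f\n", raw / stored }'
}

size_ratio 24 6.8   # 24G of .tbl files vs ~6.8G loaded; prints 3.5
```

So the loaded tables take roughly 3.5 times less space than the raw text. How the 6.8G was measured is not stated; Doris's SHOW DATA statement is one way to read per-table sizes.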

mysql> select count(1) from ssb.lineorder


Clearly, Apache Doris's caching is impressive...

  9. Load the flat (wide) table
sh ./load-ssb-flat-data.sh

It failed with an error:

Checking the script, I found that it does not pass the password:

So I added -p with the password.
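The exact mysql command inside load-ssb-flat-data.sh is not shown here, so the following is only a sketch of the fix, assuming bash and the variables from doris-cluster.conf; `run_sql` is a hypothetical helper. It adds -p only when a password is set, because a bare -p makes the mysql client prompt interactively.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run one SQL statement using the doris-cluster.conf settings.
run_sql() {
    local args=(-h"$FE_HOST" -P"$FE_QUERY_PORT" -u"$USER")
    # The password must be glued to -p (no space); skip -p entirely when empty.
    if [[ -n "$PASSWORD" ]]; then
        args+=(-p"$PASSWORD")
    fi
    mysql "${args[@]}" -D"$DB" -e "$1"
}
```

With the configuration above, run_sql "select count(*) from lineorder_flat" runs against the ssb database as root. Passing the password on the command line makes the mysql client print an insecure-usage warning, which is expected.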

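This time it ran for 25 minutes and still failed, because it ran out of memory:

I pulled the statement out and ran it on half a year of data at a time; each run took about 100 seconds, faster than the official script.

It crashed.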

INSERT INTO ssb.lineorder_flat
SELECT
    LO_ORDERDATE, LO_ORDERKEY, LO_LINENUMBER, LO_CUSTKEY, LO_PARTKEY, LO_SUPPKEY,
    LO_ORDERPRIORITY, LO_SHIPPRIORITY, LO_QUANTITY, LO_EXTENDEDPRICE, LO_ORDTOTALPRICE,
    LO_DISCOUNT, LO_REVENUE, LO_SUPPLYCOST, LO_TAX, LO_COMMITDATE, LO_SHIPMODE,
    C_NAME, C_ADDRESS, C_CITY, C_NATION, C_REGION, C_PHONE, C_MKTSEGMENT,
    S_NAME, S_ADDRESS, S_CITY, S_NATION, S_REGION, S_PHONE,
    P_NAME, P_MFGR, P_CATEGORY, P_BRAND, P_COLOR, P_TYPE, P_SIZE, P_CONTAINER
FROM (
    SELECT
        lo_orderkey, lo_linenumber, lo_custkey, lo_partkey, lo_suppkey, lo_orderdate,
        lo_orderpriority, lo_shippriority, lo_quantity, lo_extendedprice, lo_ordtotalprice,
        lo_discount, lo_revenue, lo_supplycost, lo_tax, lo_commitdate, lo_shipmode
    FROM ssb.lineorder
    -- WHERE ${con}
) l
INNER JOIN ssb.customer c
ON (c.c_custkey = l.lo_custkey)
INNER JOIN ssb.supplier s
ON (s.s_suppkey = l.lo_suppkey)
INNER JOIN ssb.part p
ON (p.p_partkey = l.lo_partkey);
select 'part',count(*) from ssb.part union all
select 'customer',count(*) from ssb.customer union all
select 'supplier',count(*) from ssb.supplier union all
select 'date',count(*) from ssb.dates union all
select 'lineorder',count(*) from ssb.lineorder union all
select 'lineorder_flat',count(*) from ssb.lineorder_flat;
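The commented-out "-- WHERE ${con}" in the INSERT above is where a slicing condition goes. A minimal sketch of generating half-year conditions, assuming bash, the integer yyyymmdd LO_ORDERDATE used throughout, and SSB order dates within 1992 to 1998; `half_year_conditions` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print one lo_orderdate range per half year, first..last.
# Assumption: lo_orderdate is an integer yyyymmdd (as in the queries here).
half_year_conditions() {
    local first="${1:-1992}" last="${2:-1998}" y
    for (( y = first; y <= last; y++ )); do
        echo "lo_orderdate >= ${y}0101 AND lo_orderdate <= ${y}0630"
        echo "lo_orderdate >= ${y}0701 AND lo_orderdate <= ${y}1231"
    done
}
```

Each emitted line can replace ${con} in turn (with the leading "--" removed), for example inside a while read -r con loop, giving 14 smaller INSERTs instead of one large join.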

  10. Test results (the number after each query label is its run time in seconds)
set global enable_vectorized_engine=1;
set global parallel_fragment_exec_instance_num=8;
set global exec_mem_limit=48G;
set global batch_size=4096;
set global enable_projection=true;
set global runtime_filter_mode=global;

--Q1.1   0.68
SELECT SUM(LO_EXTENDEDPRICE * LO_DISCOUNT) AS revenue
FROM ssb.lineorder_flat
WHERE LO_ORDERDATE >= 19930101
  AND LO_ORDERDATE <= 19931231
  AND LO_DISCOUNT BETWEEN 1 AND 3
  AND LO_QUANTITY < 25;

--Q1.2    0.12
SELECT SUM(LO_EXTENDEDPRICE * LO_DISCOUNT) AS revenue
FROM ssb.lineorder_flat
WHERE LO_ORDERDATE >= 19940101
  AND LO_ORDERDATE <= 19940131
  AND LO_DISCOUNT BETWEEN 4 AND 6
  AND LO_QUANTITY BETWEEN 26 AND 35;

--Q1.3   0.78
SELECT SUM(LO_EXTENDEDPRICE * LO_DISCOUNT) AS revenue
FROM ssb.lineorder_flat
WHERE weekofyear(LO_ORDERDATE) = 6
  AND LO_ORDERDATE >= 19940101
  AND LO_ORDERDATE <= 19941231
  AND LO_DISCOUNT BETWEEN 5 AND 7
  AND LO_QUANTITY BETWEEN 26 AND 35;

--Q2.1   4.47
SELECT SUM(LO_REVENUE), (LO_ORDERDATE DIV 10000) AS YEAR, P_BRAND
FROM ssb.lineorder_flat
WHERE P_CATEGORY = 'MFGR#12' AND S_REGION = 'AMERICA'
GROUP BY YEAR, P_BRAND
ORDER BY YEAR, P_BRAND;

--Q2.2    2.69
SELECT SUM(LO_REVENUE), (LO_ORDERDATE DIV 10000) AS YEAR, P_BRAND
FROM ssb.lineorder_flat
WHERE P_BRAND >= 'MFGR#2221' AND P_BRAND <= 'MFGR#2228' AND S_REGION = 'ASIA'
GROUP BY YEAR, P_BRAND
ORDER BY YEAR, P_BRAND;

--Q2.3   2.07
SELECT SUM(LO_REVENUE), (LO_ORDERDATE DIV 10000) AS YEAR, P_BRAND
FROM ssb.lineorder_flat
WHERE P_BRAND = 'MFGR#2239' AND S_REGION = 'EUROPE'
GROUP BY YEAR, P_BRAND
ORDER BY YEAR, P_BRAND;

--Q3.1   4.10
SELECT C_NATION, S_NATION, (LO_ORDERDATE DIV 10000) AS YEAR, SUM(LO_REVENUE) AS revenue
FROM ssb.lineorder_flat
WHERE C_REGION = 'ASIA'
  AND S_REGION = 'ASIA'
  AND LO_ORDERDATE >= 19920101
  AND LO_ORDERDATE <= 19971231
GROUP BY C_NATION, S_NATION, YEAR
ORDER BY YEAR ASC, revenue DESC;

--Q3.2   3.99
SELECT C_CITY, S_CITY, (LO_ORDERDATE DIV 10000) AS YEAR, SUM(LO_REVENUE) AS revenue
FROM ssb.lineorder_flat
WHERE C_NATION = 'UNITED STATES'
  AND S_NATION = 'UNITED STATES'
  AND LO_ORDERDATE >= 19920101
  AND LO_ORDERDATE <= 19971231
GROUP BY C_CITY, S_CITY, YEAR
ORDER BY YEAR ASC, revenue DESC;

--Q3.3   1.76
SELECT C_CITY, S_CITY, (LO_ORDERDATE DIV 10000) AS YEAR, SUM(LO_REVENUE) AS revenue
FROM ssb.lineorder_flat
WHERE C_CITY IN ('UNITED KI1', 'UNITED KI5')
  AND S_CITY IN ('UNITED KI1', 'UNITED KI5')
  AND LO_ORDERDATE >= 19920101
  AND LO_ORDERDATE <= 19971231
GROUP BY C_CITY, S_CITY, YEAR
ORDER BY YEAR ASC, revenue DESC;

--Q3.4   0.1
SELECT C_CITY, S_CITY, (LO_ORDERDATE DIV 10000) AS YEAR, SUM(LO_REVENUE) AS revenue
FROM ssb.lineorder_flat
WHERE C_CITY IN ('UNITED KI1', 'UNITED KI5')
  AND S_CITY IN ('UNITED KI1', 'UNITED KI5')
  AND LO_ORDERDATE >= 19971201
  AND LO_ORDERDATE <= 19971231
GROUP BY C_CITY, S_CITY, YEAR
ORDER BY YEAR ASC, revenue DESC;

--Q4.1   5.97
SELECT (LO_ORDERDATE DIV 10000) AS YEAR, C_NATION, SUM(LO_REVENUE - LO_SUPPLYCOST) AS profit
FROM ssb.lineorder_flat
WHERE C_REGION = 'AMERICA'
  AND S_REGION = 'AMERICA'
  AND P_MFGR IN ('MFGR#1', 'MFGR#2')
GROUP BY YEAR, C_NATION
ORDER BY YEAR ASC, C_NATION ASC;

--Q4.2   1.48
SELECT (LO_ORDERDATE DIV 10000) AS YEAR, S_NATION, P_CATEGORY, SUM(LO_REVENUE - LO_SUPPLYCOST) AS profit
FROM ssb.lineorder_flat
WHERE C_REGION = 'AMERICA'
  AND S_REGION = 'AMERICA'
  AND LO_ORDERDATE >= 19970101
  AND LO_ORDERDATE <= 19981231
  AND P_MFGR IN ('MFGR#1', 'MFGR#2')
GROUP BY YEAR, S_NATION, P_CATEGORY
ORDER BY YEAR ASC, S_NATION ASC, P_CATEGORY ASC;

--Q4.3   1.13
SELECT (LO_ORDERDATE DIV 10000) AS YEAR, S_CITY, P_BRAND, SUM(LO_REVENUE - LO_SUPPLYCOST) AS profit
FROM ssb.lineorder_flat
WHERE S_NATION = 'UNITED STATES'
  AND LO_ORDERDATE >= 19970101
  AND LO_ORDERDATE <= 19981231
  AND P_CATEGORY = 'MFGR#14'
GROUP BY YEAR, S_CITY, P_BRAND
ORDER BY YEAR ASC, S_CITY ASC, P_BRAND ASC;

--Q5.1   58.79
select count(1), sum(cnt)
from (
    select LO_ORDERPRIORITY, LO_SHIPMODE, P_COLOR, P_BRAND, count(1) as cnt, sum(LO_SUPPLYCOST)
    from ssb.lineorder_flat
    group by LO_ORDERPRIORITY, LO_SHIPMODE, P_COLOR, P_BRAND
    order by LO_ORDERPRIORITY, LO_SHIPMODE, P_COLOR, P_BRAND
) t;
--3218808  240012290

--Q5.2  5.43
select count(1), sum(cnt)
from (
    select LO_ORDERPRIORITY, LO_SHIPMODE, P_COLOR, P_BRAND, count(1) as cnt, sum(LO_SUPPLYCOST)
    from ssb.lineorder_flat
    where S_NATION = 'UNITED STATES' AND P_CATEGORY = 'MFGR#14'
    group by LO_ORDERPRIORITY, LO_SHIPMODE, P_COLOR, P_BRAND
    order by LO_ORDERPRIORITY, LO_SHIPMODE, P_COLOR, P_BRAND
) t;
--117571

--Q6.1   58.79
select count(1), sum(cnt)
from (
    select LO_ORDERPRIORITY, LO_SHIPMODE, P_COLOR, P_BRAND, count(1) as cnt,
           sum(LO_SUPPLYCOST) as sm, count(distinct S_NAME) as dcnt
    from ssb.lineorder_flat
    group by LO_ORDERPRIORITY, LO_SHIPMODE, P_COLOR, P_BRAND
    order by LO_ORDERPRIORITY, LO_SHIPMODE, P_COLOR, P_BRAND
) t;
-- Error: out of memory

--Q6.2  10.81
select count(1), sum(cnt)
from (
    select LO_ORDERPRIORITY, LO_SHIPMODE, P_COLOR, P_BRAND, count(1) as cnt,
           sum(LO_SUPPLYCOST) as sm, count(distinct S_NAME) as dcnt
    from ssb.lineorder_flat
    where S_NATION = 'UNITED STATES' AND P_CATEGORY = 'MFGR#14'
    group by LO_ORDERPRIORITY, LO_SHIPMODE, P_COLOR, P_BRAND
    order by LO_ORDERPRIORITY, LO_SHIPMODE, P_COLOR, P_BRAND
) t;
--117571 386092
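To summarize the standard part of the run, the times recorded for Q1.1 through Q4.3 can be added up. A short sketch with awk; Q5 and Q6 are the author's extra queries and are left out, and `sum_times` and `SSB_TIMES` are hypothetical names.

```shell
#!/usr/bin/env bash
# Hypothetical helper: sum the second field of each "query seconds" line on stdin.
sum_times() {
    awk '{ total += $2 } END { printf "%.2f\n", total }'
}

# Times copied from the results above (the 13 standard SSB flat queries).
SSB_TIMES='Q1.1 0.68
Q1.2 0.12
Q1.3 0.78
Q2.1 4.47
Q2.2 2.69
Q2.3 2.07
Q3.1 4.10
Q3.2 3.99
Q3.3 1.76
Q3.4 0.1
Q4.1 5.97
Q4.2 1.48
Q4.3 1.13'

printf '%s\n' "$SSB_TIMES" | sum_times   # prints 29.34
```

So the 13 standard flat-table queries took about 29.3 seconds in total on this setup.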
