Requirement: use ODPS to run statistical analysis on logs and import the results into MySQL.

Solution: combine the ODPS command-line client odpscmd with mysqldump, mysql, and crontab to do the job.
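The post names crontab but does not show the schedule. A minimal sketch of what the entry might look like, assuming the script is saved as /root/shenl/odps2mysql.sh and should run daily at 02:00; the script path, the log path, and the time are all assumptions, not taken from the original:

```shell
#!/bin/bash
# Hypothetical script location and log file; adjust to the real paths.
SCRIPT=/root/shenl/odps2mysql.sh
CRON_LOG=/root/shenl/odps2mysql.cron.log

# Build the crontab entry: minute hour day-of-month month day-of-week command.
cron_entry() {
    printf '0 2 * * * /bin/bash %s >> %s 2>&1\n' "$SCRIPT" "$CRON_LOG"
}

# Print the entry. To install it, append it to the current user's crontab:
#   ( crontab -l 2>/dev/null; cron_entry ) | crontab -
cron_entry
```

cron starts jobs with a minimal environment, which is presumably why the script sets PATH itself so that odpscmd and the JDK can be found.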

Shell:

#!/bin/bash
PORT="3306" # port
USERNAME="biuser" # user name
PASSWORD="!#123date" # password
DBNAME="bitest" # database name
RUNDATE=`date +%Y-%m-%d`
period=`date +"%Y%m" -d  "-1days"`
PATH=/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/root/shenl/odpscmd_public/bin:/root/jdk1.7.0_75/bin
### 1 Sync the related data from the other database
mysqldump -umdshop -p'xdata123' -h'192.168.128.33' testdb tb_carry tb_city --set-gtid-purged=OFF > /root/testdb.sql
mysql -u$USERNAME -p$PASSWORD -D$DBNAME < '/root/testdb.sql' 1>/root/syscODPS.log
### 2 Delete the existing category, time-span, channel, and goods data files
rm -rf /tmp/timespan.txt
rm -rf /tmp/timearea.txt
rm -rf /tmp/seekarea.txt
rm -rf /tmp/tb_good.txt
### 3 Generate the category, time-span, channel, and goods data in MySQL
mysql -u$USERNAME -p$PASSWORD -D$DBNAME <'/root/shenl/timespan.sql' 1>/root/shenl/synctimespan.log
### 4 Drop the category, time-span, channel, and goods tables in ODPS
odpscmd -e "DROP TABLE tb_bi_meta_timespan;"
odpscmd -e "DROP TABLE tb_bi_meta_timearea;"
odpscmd -e "DROP TABLE tb_bi_meta_seekarea;"
odpscmd -e "DROP TABLE tb_good;"
### 5 Create the category, time-span, channel, and goods tables in ODPS for later processing
odpscmd -e "CREATE TABLE tb_bi_meta_timespan(catalogid int,catalogname string,span int, channel int,action int);"
odpscmd -e "CREATE TABLE tb_bi_meta_timearea(catalogid int,catalogname string,province string,province_id int,channel int,action int);"
odpscmd -e "CREATE TABLE tb_bi_meta_seekarea(province string,province_id int,channel int);"
odpscmd -e "create table tb_good(gid string,g_title string,g_catalog_id string,g_status string);"
### 6 Upload the category, time-span, channel, and goods data to the corresponding ODPS tables
odpscmd -e "tunnel upload /tmp/timespan.txt tb_bi_meta_timespan;"
odpscmd -e "tunnel upload /tmp/timearea.txt tb_bi_meta_timearea;"
odpscmd -e "tunnel upload /tmp/seekarea.txt tb_bi_meta_seekarea;"
odpscmd -e "tunnel upload /tmp/tb_good.txt tb_good;"
### 7 Insert the log data by partition
odpscmd -e "INSERT into table tb_bi_goodbrowse partition(periodsplit=$period) SELECT channel,province,city,itemid,'' as goodclassify,datetrunc(requesttime,'DD') AS period,geid,userid,requesttime,action,getdate() as inserttime FROM tb_bi_marketlog A WHERE isdate(requesttime,'yyyy-mm-dd hh:mi:ss') AND A.period=to_char(dateadd(getdate(), -1, 'dd'),'yyyymm') AND datetrunc(A.requesttime,'DD') = datetrunc(dateadd(getdate(), -1,'dd'),'dd');"
### 8 Compute the report data in ODPS, ready for export
odpscmd -f "/root/shenl/odpsstat0320.sql"
### 9 Delete the existing report data files
rm -rf /var/log/mysql/timespanout.txt
rm -rf /var/log/mysql/timeareaout.txt
rm -rf /var/log/mysql/seekareaout.txt
### 10 Download the ODPS report data into /var/log/mysql
odpscmd -e "tunnel download max_compute.tb_bi_report_timespans /var/log/mysql/timespanout.txt"
chown mysql:mysql /var/log/mysql/timespanout.txt
odpscmd -e "tunnel download max_compute.tb_bi_report_timeareas /var/log/mysql/timeareaout.txt"
chown mysql:mysql /var/log/mysql/timeareaout.txt
odpscmd -e "tunnel download -ni '' max_compute.tb_bi_report_seekareas /var/log/mysql/seekareaout.txt"
chown mysql:mysql /var/log/mysql/seekareaout.txt
### 11 Import the ODPS statistics into MySQL
mysql -u$USERNAME -p$PASSWORD -D$DBNAME <'/root/shenl/load2mysql.sql' 1>/root/shenl/load2mysql.log
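
The script above does not check whether a step succeeded, so a failed download would still be followed by the MySQL load. One possible guard is sketched below; `run_step` is a hypothetical helper and is not part of the original script:

```shell
#!/bin/bash
# run_step LABEL COMMAND [ARGS...]
# Runs the command, logs start and result to stderr, and returns the
# command's exit status so the caller can stop the pipeline.
run_step() {
    local label=$1
    shift
    echo "[$(date '+%F %T')] start: $label" >&2
    if "$@"; then
        echo "[$(date '+%F %T')] ok: $label" >&2
        return 0
    else
        local status=$?
        echo "[$(date '+%F %T')] FAILED ($status): $label" >&2
        return "$status"
    fi
}

# Demo with harmless commands. In the real script each odpscmd or mysql
# call would be wrapped the same way, for example:
#   run_step "run report SQL" odpscmd -f /root/shenl/odpsstat0320.sql || exit 1
run_step "always succeeds" true
run_step "always fails" false || echo "pipeline would stop here"
```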

1) Contents of /root/shenl/timespan.sql:

use bi;

SELECT A.id AS catalogid,A.c_name catalogname,B.i AS span,C.i AS channel,D.i AS action
INTO OUTFILE '/tmp/timespan.txt' FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n'
FROM (SELECT id,c_name FROM shenl_catalog WHERE c_parent_id = 0) A
CROSS JOIN tb_incr B
CROSS JOIN (SELECT * FROM tb_incr WHERE i>0 AND i<4) C
CROSS JOIN (SELECT * FROM tb_incr WHERE i>0 AND i<5) D;

SELECT A.id AS catalogid,A.c_name catalogname,B.province,B.province_id,C.i AS channel,D.i AS action
INTO OUTFILE '/tmp/timearea.txt' FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n'
FROM (SELECT id,c_name FROM shenl_catalog WHERE c_parent_id = 0) A
CROSS JOIN (SELECT DISTINCT province,province_id FROM tb_bi_area) B
CROSS JOIN (SELECT * FROM tb_incr WHERE i>0 AND i<4) C
CROSS JOIN (SELECT * FROM tb_incr WHERE i>0 AND i<5) D;

SELECT B.province,B.province_id,C.i AS channel
INTO OUTFILE '/tmp/seekarea.txt' FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n'
FROM (SELECT DISTINCT province, province_id FROM tb_bi_area) B
CROSS JOIN (SELECT * FROM tb_incr WHERE i > 0 AND i < 4) C;

-- Strip line feeds, carriage returns, and commas from g_title so each row stays one 4-field line.
-- The output path is the file the shell script removes in step 2 and uploads in step 6.
SELECT gid,REPLACE(REPLACE(REPLACE(g_title,CHAR(10),''),CHAR(13),''),',','') AS g_title,g_catalog_id,g_status
INTO OUTFILE '/tmp/tb_good.txt' FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n'
FROM shenl_goods;
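
The export files are comma-delimited, which is why the goods query strips commas and line breaks from g_title: each row must keep exactly four fields. A small check that could be run before `tunnel upload` is sketched here; `check_fields` is a hypothetical helper, and the sample rows are made up for the demo:

```shell
#!/bin/bash
# check_fields FILE EXPECTED
# Succeeds only if every line of FILE has exactly EXPECTED comma-separated
# fields; otherwise reports the first offending line and fails.
check_fields() {
    local file=$1 expected=$2
    awk -F',' -v n="$expected" '
        NF != n { printf "line %d has %d fields, expected %d\n", NR, NF, n; bad = 1; exit }
        END { exit bad }
    ' "$file"
}

# Demo on a throwaway file (made-up rows, not real data).
sample=$(mktemp)
printf '1,Phone case,12,1\n2,Cable,13,0\n' > "$sample"
check_fields "$sample" 4 && echo "sample ok"
rm -f "$sample"
```

In the pipeline this would sit between steps 3 and 6, for example `check_fields /tmp/tb_good.txt 4 || exit 1`.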

2) Contents of the statistics script /root/shenl/odpsstat0320.sql:

insert OVERWRITE table tb_bi_report_timespans
SELECT datetrunc(dateadd(getdate(), -1, 'dd'),'dd') as period,A.catalogname,A.span,
case A.channel WHEN 1 THEN "tel" WHEN 2 THEN "smell" WHEN 3 THEN "bigscreen" END AS channel,
A.action,case when D.catalogid IS NULL then 0 else browsetimes END as stattimes
FROM tb_bi_meta_timespan A
LEFT OUTER JOIN (
  SELECT C.g_catalog_id as catalogid,channel,datepart(browsetime,'hh') as spanid,COUNT(DISTINCT browsetime) as browsetimes,action
  from tb_bi_goodbrowse A
  JOIN shenl_goods C
  ON A.good = C.gid AND A.period=datetrunc(dateadd(getdate(), -1, 'dd'),'dd') AND A.periodsplit=to_char(dateadd(getdate(), -1, 'dd'),'yyyymm')
  GROUP BY C.g_catalog_id,datepart(browsetime,'hh'),channel,action
) D
ON A.catalogid = D.catalogid AND A.span = D.spanid AND A.channel = D.channel AND A.action = D.action;

insert OVERWRITE table tb_bi_report_timeareas
SELECT datetrunc(dateadd(getdate(), -1, 'dd'),'dd') as period,A.catalogname,A.province,
case A.channel WHEN 1 THEN "tel" WHEN 2 THEN "smell" WHEN 3 THEN "bigscreen" END AS channel,
A.action,case when D.catalogid IS NULL then 0 else browsetimes END as stattimes
FROM tb_bi_meta_timearea A
LEFT OUTER JOIN (
  SELECT B.province_id,period,channel,COUNT(1) as browsetimes,C.g_catalog_id as catalogid,action
  from tb_bi_goodbrowse A
  JOIN (SELECT distinct province,province_id FROM tb_bi_area) B
  ON A.province = B.province_id
  JOIN shenl_goods C
  ON A.good = C.gid AND A.period=datetrunc(dateadd(getdate(), -1, 'dd'),'dd') AND A.periodsplit=to_char(dateadd(getdate(), -1, 'dd'),'yyyymm')
  GROUP BY B.province_id,C.g_catalog_id,period,channel,action
) D
ON A.catalogid = D.catalogid AND A.province_id = D.province_id AND A.channel=D.channel AND A.action = D.action;

insert overwrite table tb_bi_report_seekareas
SELECT datetrunc(dateadd(getdate(), -1, 'dd'),'dd') as period,A.province,D.item as item,
case A.channel WHEN 1 THEN "tel" WHEN 2 THEN "smell" WHEN 3 THEN "bigscreen" END AS channel,
case when D.province_id IS NULL then 0 else browsetimes END as stattimes
FROM tb_bi_meta_seekarea A
LEFT OUTER JOIN (
  SELECT B.province_id,datetrunc(dateadd(getdate(), -1, 'dd'),'dd') as period,channel,item,COUNT(DISTINCT requesttime) as browsetimes
  from tb_bi_marketlog A
  JOIN (SELECT distinct province,province_id FROM tb_bi_area) B
  ON A.province = B.province_id AND A.action =2 AND isdate(A.requesttime,'yyyy-mm-dd hh:mi:ss') AND A.period=to_char(dateadd(getdate(), -1, 'dd'),'yyyymm') AND datetrunc(A.requesttime,'DD') = datetrunc(dateadd(getdate(), -1, 'dd'),'dd')
  GROUP BY B.province_id,period,channel,item
) D
ON A.province_id = D.province_id AND A.channel=D.channel;
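
All three statements key on "yesterday": the day itself via datetrunc(dateadd(getdate(), -1, 'dd'),'dd'), and the monthly partition via to_char(dateadd(getdate(), -1, 'dd'),'yyyymm'), which matches the `$period` value computed in the shell script. To illustrate how the two values behave at a month boundary, here is a sketch that assumes GNU date; the function names are hypothetical:

```shell
#!/bin/bash
# Day that the report covers: the day before the given date (YYYY-MM-DD).
report_day() {
    date -d "$1 1 day ago" +%Y-%m-%d
}

# Monthly partition (yyyymm) that holds that day.
report_partition() {
    date -d "$1 1 day ago" +%Y%m
}

# On the first of a month the job reads the previous month's partition.
echo "$(report_day 2015-03-01) -> partition $(report_partition 2015-03-01)"
```

Deriving the partition from yesterday's date rather than today's is what keeps a run on the 1st pointed at the previous month's partition.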

3) Contents of /root/shenl/load2mysql.sql:

load data infile '/var/log/mysql/timespanout.txt' into table tb_bi_report_timespans fields terminated by ',' lines terminated by '\n';
load data infile '/var/log/mysql/timeareaout.txt' into table tb_bi_report_timeareas fields terminated by ',' lines terminated by '\n';
load data infile '/var/log/mysql/seekareaout.txt' into table tb_bi_report_seekareas fields terminated by ',' lines terminated by '\n';
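
LOAD DATA INFILE (without LOCAL) reads the file on the MySQL server host, which is presumably why the shell script hands the downloaded files to mysql:mysql. A possible pre-flight check before running load2mysql.sql is sketched below; `require_nonempty` is a hypothetical helper, not part of the original script:

```shell
#!/bin/bash
# require_nonempty FILE...
# Returns non-zero if any listed file is missing or empty.
require_nonempty() {
    local f missing=0
    for f in "$@"; do
        if [ ! -s "$f" ]; then
            echo "missing or empty: $f" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Intended use in the main script (paths as in the original):
#   require_nonempty /var/log/mysql/timespanout.txt \
#                    /var/log/mysql/timeareaout.txt \
#                    /var/log/mysql/seekareaout.txt || exit 1
demo=$(mktemp)
echo "row" > "$demo"
require_nonempty "$demo" && echo "ready to load"
rm -f "$demo"
```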

Code walkthrough: see the comments in the scripts.
