Running SQL from a shell script with Spark SQL

[Author]: kwu

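Spark SQL can run SQL from a shell script. Like hive, the spark-sql CLI provides the -e, -f and -i options: -e runs a SQL string given on the command line, -f runs the SQL in a file, and -i runs an initialization SQL file first.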
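The following is a minimal sketch of the three options, wrapped in small shell functions. It assumes `spark-sql` is on the PATH. The `SPARK_SQL` variable and the function names `run_inline`, `run_file` and `run_with_init` are hypothetical and exist only for illustration. `SPARK_SQL` can be overridden with a full install path, or with a stub when you want to inspect the generated command lines.

```shell
#!/bin/sh
# Sketch of the three spark-sql CLI modes (-e, -f, -i).
# SPARK_SQL defaults to "spark-sql" on the PATH; override it to use a
# full install path, or a stub when testing.
SPARK_SQL="${SPARK_SQL:-spark-sql}"

# -e: execute the SQL passed as a string argument
run_inline() {
    "$SPARK_SQL" -e "$1"
}

# -f: execute the SQL statements stored in a file
run_file() {
    "$SPARK_SQL" -f "$1"
}

# -i: run an initialization file first (e.g. ADD JAR and
# CREATE TEMPORARY FUNCTION statements), then the query given with -e
run_with_init() {
    "$SPARK_SQL" -i "$1" -e "$2"
}

# Example usage (file names are hypothetical):
#   run_inline "show databases;"
#   run_file /path/to/query.sql
#   run_with_init /path/to/init.sql "select udf_md5('abc');"
```

Because the functions only forward their arguments, the options can be combined in the same way on a plain spark-sql command line.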

1. Invoking the script on a schedule

#!/bin/sh
# upload logs to hdfs

# yesterday's date as yyyyMMdd (GNU date syntax)
yesterday=`date --date='1 days ago' +%Y%m%d`

# compute today's top-20 stocks by clicks and compare with the same time window yesterday
/opt/modules/spark/bin/spark-sql -i /opt/bin/spark_opt/init.sql \
  --master spark://10.130.2.20:7077 \
  --executor-memory 6g --total-executor-cores 45 \
  --conf spark.ui.port=4075 \
  -e "\
insert overwrite table st.stock_realtime_analysis PARTITION (DTYPE='01')
select t1.stockId as stockId,
       t1.url as url,
       t1.clickcnt as clickcnt,
       0,
       round((t1.clickcnt / (case when t2.clickcntyesday is null then 0 else t2.clickcntyesday end) - 1) * 100, 2) as LPcnt,
       '01' as type,
       t1.analysis_date as analysis_date,
       t1.analysis_time as analysis_time
from (select stock_code stockId,
             concat('http://stockdata.stock.hexun.com/', stock_code, '.shtml') url,
             count(1) clickcnt,
             substr(from_unixtime(unix_timestamp(), 'yyyy-MM-dd HH:mm:ss'), 1, 10) analysis_date,
             substr(from_unixtime(unix_timestamp(), 'yyyy-MM-dd HH:mm:ss'), 12, 8) analysis_time
      from dms.tracklog_5min
      where stock_type = 'STOCK'
        and day = substr(from_unixtime(unix_timestamp(), 'yyyyMMdd'), 1, 8)
      group by stock_code
      order by clickcnt desc
      limit 20) t1
left join (select stock_code stockId, count(1) clickcntyesday
           from dms.tracklog_5min a
           where stock_type = 'STOCK'
             and substr(datetime, 1, 10) = date_sub(from_unixtime(unix_timestamp(), 'yyyy-MM-dd HH:mm:ss'), 1)
             and substr(datetime, 12, 5) < substr(from_unixtime(unix_timestamp(), 'yyyy-MM-dd HH:mm:ss'), 12, 5)
             and day = '${yesterday}'
           group by stock_code) t2
on t1.stockId = t2.stockId;"

# export the result partition from HDFS to MySQL
sqoop export --connect jdbc:mysql://10.130.2.245:3306/charts \
  --username guojinlian --password Abcd1234 \
  --table stock_realtime_analysis \
  --fields-terminated-by '\001' \
  --columns "stockid,url,clickcnt,splycnt,lpcnt,type" \
  --export-dir /dw/st/stock_realtime_analysis/dtype=01
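As written, the export step does not check whether the spark-sql step succeeded, so a failed query can still be followed by an export. Below is a minimal sketch of chaining the two so that sqoop only runs after spark-sql exits with status 0. It assumes the query has been moved into a file passed with -f. The function name `run_pipeline` and the `SPARK_SQL`/`SQOOP` override variables are hypothetical.

```shell
#!/bin/sh
# Sketch: run the Spark SQL step, then export only if it succeeded.
# SPARK_SQL and SQOOP default to the real CLIs and can be overridden
# (e.g. with stubs when testing).
SPARK_SQL="${SPARK_SQL:-/opt/modules/spark/bin/spark-sql}"
SQOOP="${SQOOP:-sqoop}"

# Usage: run_pipeline INIT_SQL QUERY_SQL [sqoop export arguments...]
run_pipeline() {
    init_sql="$1"
    query_sql="$2"
    shift 2
    # a non-zero exit status from spark-sql stops the pipeline here
    if ! "$SPARK_SQL" -i "$init_sql" -f "$query_sql"; then
        echo "spark-sql failed; skipping sqoop export" >&2
        return 1
    fi
    # remaining arguments are passed through to "sqoop export"
    "$SQOOP" export "$@"
}
```

A scheduler such as cron would then call the script containing `run_pipeline`, and the non-zero return status makes a failed run visible instead of exporting stale data.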

init.sql loads the UDFs:

add jar /opt/bin/UDF/hive-udf.jar;
create temporary function udtf_stockidxfund as 'com.hexun.hive.udf.stock.UDTFStockIdxFund';
create temporary function udf_getbfhourstime as 'com.hexun.hive.udf.time.UDFGetBfHoursTime';
create temporary function udf_getbfhourstime2 as 'com.hexun.hive.udf.time.UDFGetBfHoursTime2';
create temporary function udf_stockidxfund as 'com.hexun.hive.udf.stock.UDFStockIdxFund';
create temporary function udf_md5 as 'com.hexun.hive.udf.common.HashMD5UDF';
create temporary function udf_murhash as 'com.hexun.hive.udf.common.HashMurUDF';
create temporary function udf_url as 'com.hexun.hive.udf.url.UDFUrl';
create temporary function url_host as 'com.hexun.hive.udf.url.UDFHost';
create temporary function udf_ip as 'com.hexun.hive.udf.url.UDFIP';
create temporary function udf_site as 'com.hexun.hive.udf.url.UDFSite';
create temporary function udf_UrlDecode as 'com.hexun.hive.udf.url.UDFUrlDecode';
create temporary function udtf_url as 'com.hexun.hive.udf.url.UDTFUrl';
create temporary function udf_ua as 'com.hexun.hive.udf.useragent.UDFUA';
create temporary function udf_ssh as 'com.hexun.hive.udf.useragent.UDFSSH';
create temporary function udtf_ua as 'com.hexun.hive.udf.useragent.UDTFUA';
create temporary function udf_kw as 'com.hexun.hive.udf.url.UDFKW';
create temporary function udf_chdecode as 'com.hexun.hive.udf.url.UDFChDecode';

Setting the UI port

--conf spark.ui.port=4075

The default is 4040, which conflicts with other jobs that are already running, so it is changed to 4075 here.
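When the configured port is busy, Spark retries the following ports (bounded by `spark.port.maxRetries`). Even so, a fixed port per job makes each job's UI address predictable. The sketch below shows one hypothetical convention for this, where each scheduled job gets a fixed offset from a base port. The function `ui_port_conf` and the `SPARK_UI_BASE_PORT` variable are illustrative names, not part of Spark.

```shell
# Sketch of a hypothetical convention: each scheduled job adds its own
# offset to a base port, so concurrent jobs get distinct Spark UI ports.
SPARK_UI_BASE_PORT="${SPARK_UI_BASE_PORT:-4040}"

# Usage: ui_port_conf JOB_OFFSET  -> prints "spark.ui.port=<port>"
ui_port_conf() {
    echo "spark.ui.port=$((SPARK_UI_BASE_PORT + $1))"
}

# Example: offset 35 gives the port used in the script above
#   spark-sql --conf "$(ui_port_conf 35)" ...
```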

Setting the memory and CPU resources used by the job

--executor-memory 6g --total-executor-cores 45
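The helper below gives a rough idea of what these two flags reserve on the cluster, assuming every executor gets the same number of cores. The cores-per-executor value is an assumption (it would be set with `--executor-cores`, which the original script does not use). In standalone mode without it, each executor takes all free cores on its worker. The function name `executor_plan` is hypothetical.

```shell
# Sketch: estimate executors and total executor memory from the flags.
# Assumes a uniform cores-per-executor value (e.g. from --executor-cores,
# which the original script does not set).
executor_plan() {
    total_cores="$1"        # --total-executor-cores
    cores_per_executor="$2" # --executor-cores (assumed)
    mem_gb="$3"             # --executor-memory, in GB
    executors=$((total_cores / cores_per_executor))
    echo "executors=$executors total_memory=$((executors * mem_gb))g"
}

# Example: 45 cores, 5 cores per executor, 6g each
#   executor_plan 45 5 6   # prints: executors=9 total_memory=54g
```

The actual number of executors also depends on how many cores and how much memory each worker has free, so this is an upper-bound style estimate.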

The statement was originally run with hive -e. Switching to Spark made it much faster.

It used to take 15 min; after the switch it takes 45 s.
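Expressed as a ratio, the reported timings amount to roughly a 20x speedup:

```latex
15\,\text{min} = 900\,\text{s}, \qquad \frac{900\,\text{s}}{45\,\text{s}} = 20
```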
