Parsing HiveServer2 Logs with Java: Parsing HiveSQL to Get Table Usage Counts (Hotness)

First, read the HiveServer2 log line by line.
In the log, every SQL statement is preceded by the keyword Executing command,
so we first match on that keyword:

if (str.contains("Executing command"))

Each matching line starts with a fixed log header, which we strip off with a regular expression, leaving only the SQL text:

String sqlRegex = ".*Executing command\\(queryId=[\\w-]*\\):";
String sql = str.replaceAll(sqlRegex, " ").trim();
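
Putting the two snippets together, here is a minimal sketch of the extraction step. The class name HiveLogReader and the log path used later are my own placeholders, and the sketch assumes each statement fits on a single log line; a multi-line SQL would need its continuation lines appended as well.

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class HiveLogReader {

    // Extracts the SQL text from every "Executing command" line of a HiveServer2 log.
    public static List<String> extractSqls(String logPath) throws IOException {
        String sqlRegex = ".*Executing command\\(queryId=[\\w-]*\\):";
        List<String> sqls = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new FileReader(logPath))) {
            String str;
            while ((str = reader.readLine()) != null) {
                if (str.contains("Executing command")) {
                    // strip the fixed log header, keeping only the SQL
                    sqls.add(str.replaceAll(sqlRegex, " ").trim());
                }
            }
        }
        return sqls;
    }
}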

Next comes the utility class that parses the extracted SQL. First, import the dependencies:

<properties>
    <hive.version>1.1.0</hive.version>
    <hadoop.version>2.6.0</hadoop.version>
    <poi.version>4.1.2</poi.version>
</properties>

<dependencies>
    <dependency>
        <groupId>org.apache.poi</groupId>
        <artifactId>poi</artifactId>
        <version>${poi.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.poi</groupId>
        <artifactId>poi-ooxml</artifactId>
        <version>${poi.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.poi</groupId>
        <artifactId>poi-ooxml-schemas</artifactId>
        <version>${poi.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.poi</groupId>
        <artifactId>poi-scratchpad</artifactId>
        <version>${poi.version}</version>
    </dependency>
    <dependency>
        <groupId>junit</groupId>
        <artifactId>junit</artifactId>
        <version>4.12</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hive</groupId>
        <artifactId>hive-exec</artifactId>
        <version>${hive.version}</version>
        <scope>compile</scope>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-common</artifactId>
        <version>${hadoop.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-mapreduce-client-core</artifactId>
        <version>${hadoop.version}</version>
    </dependency>
    <!-- https://mvnrepository.com/artifact/com.alibaba/fastjson -->
    <dependency>
        <groupId>com.alibaba</groupId>
        <artifactId>fastjson</artifactId>
        <version>1.2.58</version>
    </dependency>
    <dependency>
        <groupId>commons-httpclient</groupId>
        <artifactId>commons-httpclient</artifactId>
        <version>3.1</version>
    </dependency>
    <dependency>
        <groupId>mysql</groupId>
        <artifactId>mysql-connector-java</artifactId>
        <version>5.1.36</version>
    </dependency>
</dependencies>

Dependencies such as the MySQL connector can be removed as needed; some of them are only used in later posts, but I list them all here in one place.
Below is the code of the Hive log parsing utility class.

import org.apache.hadoop.hive.ql.parse.*;

import java.util.*;

/**
 * Purpose: extract the tables and columns appearing in the AST, together with the
 * operations performed on them, such as SELECT and INSERT.
 * Focus: table/column resolution for SELECT operations; other operations are only
 * resolved down to table level.
 * Approach: depth-first traversal of the AST. When an operation token is encountered,
 * the current operation is recorded; TOK_TAB or TOK_TABREF identifies the table being
 * operated on; entering a subclause pushes the current state onto a stack, which is
 * popped again once the subclause has been processed.
 */
public class HiveParseUtils {

    private static final String UNKNOWN = "UNKNOWN";
    private static Map<String, String> alias = new HashMap<>();
    private static Map<String, String> cols = new TreeMap<>();
    private static Map<String, String> colAlais = new TreeMap<>();
    public static List<String> tables = new ArrayList<>();
    private static Stack<String> tableNameStack = new Stack<>();
    private static Stack<Oper> operStack = new Stack<>();
    // Definition and handling are somewhat fuzzy; a set of tables per query/from node
    // might be cleaner, since more than one table may be under processing at a time.
    private static String nowQueryTable = "";
    private static Oper oper;
    private static boolean joinClause = false;

    private enum Oper {
        SELECT, INSERT, DROP, TRUNCATE, LOAD, CREATETABLE, ALTER
    }

    public static Set<String> parseIteral(ASTNode ast) {
        Set<String> set = new HashSet<>(); // tables referenced by the current query
        prepareToParseCurrentNodeAndChilds(ast);
        set.addAll(parseChildNodes(ast));
        set.addAll(parseCurrentNode(ast, set));
        endParseCurrentNode(ast);
        return set;
    }

    private static void endParseCurrentNode(ASTNode ast) {
        if (ast.getToken() != null) {
            switch (ast.getToken().getType()) {
                // end of a join clause
                case HiveParser.TOK_RIGHTOUTERJOIN:
                case HiveParser.TOK_LEFTOUTERJOIN:
                case HiveParser.TOK_JOIN:
                    joinClause = false;
                    break;
                case HiveParser.TOK_QUERY:
                    break;
                case HiveParser.TOK_INSERT:
                case HiveParser.TOK_SELECT:
                    nowQueryTable = tableNameStack.pop();
                    oper = operStack.pop();
                    break;
            }
        }
    }

    private static Set<String> parseCurrentNode(ASTNode ast, Set<String> set) {
        if (ast.getToken() != null) {
            switch (ast.getToken().getType()) {
                case HiveParser.TOK_TABLE_PARTITION:
                    if (ast.getChildCount() != 2) {
                        String table = BaseSemanticAnalyzer.getUnescapedName((ASTNode) ast.getChild(0));
                        if (oper == Oper.SELECT) {
                            nowQueryTable = table;
                        }
                        tables.add(table + "  " + oper);
                    }
                    break;
                case HiveParser.TOK_TAB: // output table
                    String tableTab = BaseSemanticAnalyzer.getUnescapedName((ASTNode) ast.getChild(0));
                    if (oper == Oper.SELECT) {
                        nowQueryTable = tableTab;
                    }
                    tables.add(tableTab + "  " + oper);
                    break;
                case HiveParser.TOK_TABREF: // input table
                    ASTNode tabTree = (ASTNode) ast.getChild(0);
                    String tableName = (tabTree.getChildCount() == 1)
                            ? BaseSemanticAnalyzer.getUnescapedName((ASTNode) tabTree.getChild(0))
                            : BaseSemanticAnalyzer.getUnescapedName((ASTNode) tabTree.getChild(0))
                                    + "." + tabTree.getChild(1);
                    if (oper == Oper.SELECT) {
                        if (joinClause && !"".equals(nowQueryTable)) {
                            nowQueryTable += "&" + tableName;
                        } else {
                            nowQueryTable = tableName;
                        }
                        set.add(tableName);
                    }
                    tables.add(tableName + "  " + oper);
                    if (ast.getChild(1) != null) {
                        String alia = ast.getChild(1).getText().toLowerCase();
                        alias.put(alia, tableName);
                    }
                    break;
                case HiveParser.TOK_TABLE_OR_COL:
                    if (ast.getParent().getType() != HiveParser.DOT) {
                        String col = ast.getChild(0).getText().toLowerCase();
                        if (alias.get(col) == null
                                && colAlais.get(nowQueryTable + "." + col) == null) {
                            if (nowQueryTable.indexOf("&") > 0) { // sql23
                                cols.put(UNKNOWN + "." + col, "");
                            } else {
                                cols.put(nowQueryTable + "." + col, "");
                            }
                        }
                    }
                    break;
                case HiveParser.TOK_ALLCOLREF:
                    cols.put(nowQueryTable + ".*", "");
                    break;
                case HiveParser.TOK_SUBQUERY:
                    if (ast.getChildCount() == 2) {
                        String tableAlias = unescapeIdentifier(ast.getChild(1).getText());
                        String aliaReal = "";
                        for (String table : set) {
                            aliaReal += table + "&";
                        }
                        if (aliaReal.length() != 0) {
                            aliaReal = aliaReal.substring(0, aliaReal.length() - 1);
                        }
                        alias.put(tableAlias, aliaReal);
                    }
                    break;
                case HiveParser.TOK_SELEXPR:
                    if (ast.getChild(0).getType() == HiveParser.TOK_TABLE_OR_COL) {
                        String column = ast.getChild(0).getChild(0).getText().toLowerCase();
                        if (nowQueryTable.indexOf("&") > 0) {
                            cols.put(UNKNOWN + "." + column, "");
                        } else if (colAlais.get(nowQueryTable + "." + column) == null) {
                            cols.put(nowQueryTable + "." + column, "");
                        }
                    } else if (ast.getChild(1) != null) {
                        String columnAlia = ast.getChild(1).getText().toLowerCase();
                        colAlais.put(nowQueryTable + "." + columnAlia, "");
                    }
                    break;
                case HiveParser.DOT:
                    if (ast.getType() == HiveParser.DOT) {
                        if (ast.getChildCount() == 2) {
                            if (ast.getChild(0).getType() == HiveParser.TOK_TABLE_OR_COL
                                    && ast.getChild(0).getChildCount() == 1
                                    && ast.getChild(1).getType() == HiveParser.Identifier) {
                                String alia = BaseSemanticAnalyzer.unescapeIdentifier(
                                        ast.getChild(0).getChild(0).getText().toLowerCase());
                                String column = BaseSemanticAnalyzer.unescapeIdentifier(
                                        ast.getChild(1).getText().toLowerCase());
                                String realTable = null;
                                // e.g. [b  SELECT, a  SELECT]
                                if (!tables.contains(alia + "  " + oper)
                                        && alias.get(alia) == null) {
                                    alias.put(alia, nowQueryTable);
                                }
                                if (tables.contains(alia + "  " + oper)) {
                                    realTable = alia;
                                } else if (alias.get(alia) != null) {
                                    realTable = alias.get(alia);
                                }
                                if (realTable == null || realTable.length() == 0 || realTable.indexOf("&") > 0) {
                                    realTable = UNKNOWN;
                                }
                                cols.put(realTable + "." + column, "");
                            }
                        }
                    }
                    break;
                case HiveParser.TOK_ALTERTABLE_ADDPARTS:
                case HiveParser.TOK_ALTERTABLE_RENAME:
                case HiveParser.TOK_ALTERTABLE_ADDCOLS:
                    ASTNode alterTableName = (ASTNode) ast.getChild(0);
                    tables.add(alterTableName.getText() + "  " + oper);
                    break;
            }
        }
        return set;
    }

    private static Set<String> parseChildNodes(ASTNode ast) {
        Set<String> set = new HashSet<>();
        int numCh = ast.getChildCount();
        if (numCh > 0) {
            for (int num = 0; num < numCh; num++) {
                ASTNode child = (ASTNode) ast.getChild(num);
                set.addAll(parseIteral(child));
            }
        }
        return set;
    }

    private static void prepareToParseCurrentNodeAndChilds(ASTNode ast) {
        if (ast.getToken() != null) {
            switch (ast.getToken().getType()) {
                // start of a join clause
                case HiveParser.TOK_RIGHTOUTERJOIN:
                case HiveParser.TOK_LEFTOUTERJOIN:
                case HiveParser.TOK_JOIN:
                    joinClause = true;
                    break;
                case HiveParser.TOK_QUERY:
                    tableNameStack.push(nowQueryTable);
                    operStack.push(oper);
                    nowQueryTable = ""; // sql22
                    oper = Oper.SELECT;
                    break;
                case HiveParser.TOK_INSERT:
                    tableNameStack.push(nowQueryTable);
                    operStack.push(oper);
                    oper = Oper.INSERT;
                    break;
                case HiveParser.TOK_SELECT:
                    tableNameStack.push(nowQueryTable);
                    operStack.push(oper);
                    oper = Oper.SELECT;
                    break;
                case HiveParser.TOK_DROPTABLE:
                    oper = Oper.DROP;
                    break;
                case HiveParser.TOK_TRUNCATETABLE:
                    oper = Oper.TRUNCATE;
                    break;
                case HiveParser.TOK_LOAD:
                    oper = Oper.LOAD;
                    break;
                case HiveParser.TOK_CREATETABLE:
                    oper = Oper.CREATETABLE;
                    break;
            }
            if (ast.getToken() != null
                    && ast.getToken().getType() >= HiveParser.TOK_ALTERDATABASE_PROPERTIES
                    && ast.getToken().getType() <= HiveParser.TOK_ALTERVIEW_RENAME) {
                oper = Oper.ALTER;
            }
        }
    }

    public static String unescapeIdentifier(String val) {
        if (val == null) {
            return null;
        }
        if (val.charAt(0) == '`' && val.charAt(val.length() - 1) == '`') {
            val = val.substring(1, val.length() - 1);
        }
        return val;
    }

    // print the collected columns and their aliases
    private static void output(Map<String, String> map) {
        Iterator<String> it = map.keySet().iterator();
        while (it.hasNext()) {
            String key = it.next();
            System.out.println(key + "\t" + map.get(key));
        }
    }

    // split a script on ';', ignoring semicolons inside single-quoted strings
    public static List<String> splitSql(String sql) {
        sql = sql.replaceAll("^\\s*|\\s*$", "");
        char[] cs = sql.toCharArray();
        int quotTimes = 0;
        List<Integer> marksSplit = new ArrayList<>();
        for (int i = 0; i < cs.length; i++) {
            char c = cs[i];
            if (c == '\'') {
                quotTimes++;
            }
            if (c == ';' && quotTimes % 2 == 0) {
                marksSplit.add(i);
            }
            if (i == cs.length - 1 && c != ';') {
                marksSplit.add(i + 1);
            }
        }
        List<String> sqls = new ArrayList<>();
        if (!marksSplit.isEmpty()) {
            for (int i = 0; i < marksSplit.size(); i++) {
                if (i == 0) {
                    sqls.add(sql.substring(0, marksSplit.get(i)));
                } else {
                    sqls.add(sql.substring(marksSplit.get(i - 1) + 1, marksSplit.get(i)));
                }
            }
        } else {
            sqls.add(sql);
        }
        return sqls;
    }

    public static Map<String, Hashtable<String, Integer>> sqlParse(String sql) throws ParseException {
        ParseDriver pd = new ParseDriver();
        List<String> sqlList = splitSql(sql);
        Map<String, Hashtable<String, Integer>> map = new HashMap<>();
        for (String s : sqlList) {
            ASTNode ast = pd.parse(s);
            parseIteral(ast);
            Hashtable<String, Integer> hashtable = new Hashtable<>();
            for (String table : tables) {
                String[] split = table.split("  ");
                String tableName = split[0];
                if (hashtable.containsKey(tableName)) {
                    hashtable.put(tableName, hashtable.get(tableName) + 1);
                } else {
                    hashtable.put(tableName, 1);
                }
            }
            map.put("tables", hashtable);
        }
        return map;
    }

    /**
     * Parse the SQL and keep only the tables touched by the given operation type.
     * @param sql
     * @param type
     * @return
     * @throws Exception
     */
    public static Map<String, Hashtable<String, Integer>> sqlParseAssignType(String sql, String type) throws Exception {
        ParseDriver pd = new ParseDriver();
        List<String> sqlList = splitSql(sql);
        Map<String, Hashtable<String, Integer>> map = new HashMap<>();
        for (String s : sqlList) {
            ASTNode ast = pd.parse(s);
            parseIteral(ast);
            Hashtable<String, Integer> hashtable = new Hashtable<>();
            for (String table : tables) {
                String[] split = table.split("  ");
                String tableName = split[0];
                String tableType = split[1];
                if (tableType.equalsIgnoreCase(type)) { // only tables matching the requested type are kept
                    if (hashtable.containsKey(tableName)) {
                        hashtable.put(tableName, hashtable.get(tableName) + 1);
                    } else {
                        hashtable.put(tableName, 1);
                    }
                }
            }
            map.put("tables", hashtable);
        }
        return map;
    }

    public static Map<String, Hashtable<String, Integer>> sqlParseAssignType(String sql) throws Exception {
        ParseDriver pd = new ParseDriver();
        List<String> sqlList = splitSql(sql);
        Map<String, Hashtable<String, Integer>> map = new HashMap<>();
        for (String s : sqlList) {
            ASTNode ast = pd.parse(s);
            parseIteral(ast);
            Hashtable<String, Integer> hashtable = new Hashtable<>();
            for (String table : tables) {
                String[] split = table.split("  ");
                String tableName = split[0];
                if (hashtable.containsKey(tableName)) {
                    hashtable.put(tableName, hashtable.get(tableName) + 1);
                } else {
                    hashtable.put(tableName, 1);
                }
            }
            map.put("tables", hashtable);
        }
        return map;
    }
}

Define a map to collect the parsed results:

Map<String, Hashtable<String, Integer>> sqlParse = new HashMap<>();

Parse each extracted SQL statement:

HiveParseUtils.tables = new ArrayList<>();
sqlParse = HiveParseUtils.sqlParseAssignType(line);
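
Two caveats when running this over a whole log: HiveParseUtils keeps its state in static fields, so tables must be re-created before every statement (as above), and not every extracted line is guaranteed to be SQL that Hive's ParseDriver accepts, so the call is safer wrapped in a guard. A minimal sketch, assuming it sits inside the loop over extracted lines:

try {
    HiveParseUtils.tables = new ArrayList<>(); // reset static state between statements
    sqlParse = HiveParseUtils.sqlParseAssignType(line);
} catch (Exception e) {
    continue; // skip lines the Hive parser cannot handle, e.g. "set ..." commands
}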

Compute the hotness:

// collect the results
Map<String, Integer> result = new HashMap<>();
Hashtable<String, Integer> hashtable = sqlParse.get("tables");
// iterate: key is the table name, value is the number of times the table was used
for (String key : hashtable.keySet()) {
    String table = key.trim().toLowerCase();
    if (key.contains(".")) {
        // keep only the table name, dropping the database name; adapt as needed
        table = key.split("\\.")[1];
    }
    int value = hashtable.get(key);
    // accumulate counts across statements
    result.merge(table, value, Integer::sum);
}
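
To turn the accumulated counts into a hotness ranking, sort result by usage count in descending order. A minimal sketch using the Java 8 stream API:

// print tables from hottest to coldest
result.entrySet().stream()
        .sorted(Map.Entry.<String, Integer>comparingByValue().reversed())
        .forEach(e -> System.out.println(e.getKey() + "\t" + e.getValue()));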

Done!
