Parsing the op.log log file
The contents of op.log look like this (one record per line, in the form timestamp|JSON):
1593136280858|{"cm":{"ln":"-55.0","sv":"V2.9.6","os":"8.0.4","g":"C6816QZ0@gmail.com","mid":"489","nw":"3G","l":"es","vc":"4","hw":"640*960","ar":"MX","uid":"489","t":"1593123253541","la":"5.2","md":"sumsung-18","vn":"1.3.4","ba":"Sumsung","sr":"I"},"ap":"app","et":[{"ett":"1593050051366","en":"loading","kv":{"extend2":"","loading_time":"14","action":"3","extend1":"","type":"2","type1":"201","loading_way":"1"}},{"ett":"1593108791764","en":"ad","kv":{"activityId":"1","displayMills":"78522","entry":"1","action":"1","contentType":"0"}},{"ett":"1593111271266","en":"notification","kv":{"ap_time":"1593097087883","action":"1","type":"1","content":""}},{"ett":"1593066033562","en":"active_background","kv":{"active_source":"3"}},{"ett":"1593135644347","en":"comment","kv":{"p_comment_id":1,"addtime":"1593097573725","praise_count":973,"other_id":5,"comment_id":9,"reply_count":40,"userid":7,"content":"辑赤蹲慰鸽抿肘捎"}}]}
1593136280858|{"cm":{"ln":"-114.9","sv":"V2.7.8","os":"8.0.4","g":"NW0S962J@gmail.com","mid":"490","nw":"3G","l":"pt","vc":"8","hw":"640*1136","ar":"MX","uid":"490","t":"1593121224789","la":"-44.4","md":"Huawei-8","vn":"1.0.1","ba":"Huawei","sr":"O"},"ap":"app","et":[{"ett":"1593063223807","en":"loading","kv":{"extend2":"","loading_time":"0","action":"3","extend1":"","type":"1","type1":"102","loading_way":"1"}},{"ett":"1593095105466","en":"ad","kv":{"activityId":"1","displayMills":"1966","entry":"3","action":"2","contentType":"0"}},{"ett":"1593051718208","en":"notification","kv":{"ap_time":"1593095336265","action":"2","type":"3","content":""}},{"ett":"1593100021275","en":"comment","kv":{"p_comment_id":4,"addtime":"1593098946009","praise_count":220,"other_id":4,"comment_id":9,"reply_count":151,"userid":4,"content":"抄应螟皮釉倔掉汉蛋蕾街羡晶"}},{"ett":"1593105344120","en":"praise","kv":{"target_id":9,"id":7,"type":1,"add_time":"1593098545976","userid":8}}]}
- Start spark-shell and load the data from HDFS
val fileRDD = sc.textFile("hdfs://192.168.129.100:9000/kb09file/op.log")
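- Clean the data: split off the leading 15XXXXX timestamp as a separate id, splice it back into the JSON string as an "id" field, and convert the result to a DataFrame
// Split each line into (timestamp, json)
val jsonStrRDD = fileRDD.map(x=>x.split('|')).map(x=>(x(0),x(1)))
// Drop the closing "}" of the JSON and append ,"id":"<timestamp>"}
val jsonRDD = jsonStrRDD.map(x=>{var jsonStr=x._2; jsonStr = jsonStr.substring(0,jsonStr.length-1); jsonStr+",\"id\":\""+x._1+"\"}" })
// A single-column DataFrame; the column is named "value"
val jsonDF = jsonRDD.toDF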
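The splice step above can be tried on a single line without Spark. The following is a minimal sketch in plain Scala; `attachId` is a hypothetical helper name, not part of the original post. It assumes each record is `timestamp|{...}` with a non-empty JSON object, and it splits only on the first `|` so that a `|` inside the JSON would not truncate the record.

```scala
// Hypothetical helper: turns "timestamp|{json}" into the same JSON with an extra "id" field.
// Returns None for lines that do not match the expected shape instead of throwing.
def attachId(line: String): Option[String] = {
  val sep = line.indexOf('|')            // first separator only
  if (sep <= 0) None
  else {
    val id = line.substring(0, sep)
    val json = line.substring(sep + 1).trim
    if (!json.startsWith("{") || !json.endsWith("}")) None
    // Drop the closing brace, then append the id field and close the object again
    else Some(json.substring(0, json.length - 1) + ",\"id\":\"" + id + "\"}")
  }
}

// Shortened sample record (assumption: real records have the same outer shape)
val sample = """1593136280858|{"cm":{"uid":"489"},"ap":"app","et":[]}"""
println(attachId(sample))
```

In spark-shell this could probably replace the two map calls with something like `fileRDD.flatMap(line => attachId(line))`, which also skips malformed lines rather than failing on them.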
- Import the required packages
import org.apache.spark.sql.types._
import org.apache.spark.sql.functions._
- Structure the JSON string {"cm":"a1","ap":"b1","et":"c1","id":"d1"} into the columns cm, ap, et and id
val jsonDF2 = jsonDF.select(get_json_object($"value","$.cm").alias("cm"),get_json_object($"value","$.ap").alias("ap"),get_json_object($"value","$.et").alias("et"),get_json_object($"value","$.id").alias("id")
)
- Structure the fields inside cm
val jsonDF3 = jsonDF2.select($"id",$"ap",get_json_object($"cm","$.ln").alias("ln"),get_json_object($"cm","$.sv").alias("sv"),get_json_object($"cm","$.os").alias("os"),get_json_object($"cm","$.g").alias("g"),get_json_object($"cm","$.mid").alias("mid"),get_json_object($"cm","$.nw").alias("nw"),get_json_object($"cm","$.l").alias("l"),get_json_object($"cm","$.vc").alias("vc"),get_json_object($"cm","$.hw").alias("hw"),get_json_object($"cm","$.ar").alias("ar"),get_json_object($"cm","$.uid").alias("uid"),get_json_object($"cm","$.t").alias("t"),get_json_object($"cm","$.la").alias("la"),get_json_object($"cm","$.md").alias("md"),get_json_object($"cm","$.vn").alias("vn"),get_json_object($"cm","$.ba").alias("ba"),get_json_object($"cm","$.sr").alias("sr"),$"et")
jsonDF3.select(from_json($"et",ArrayType(StructType(StructField("ett",StringType)::StructField("en",StringType)::StructField("kv",StringType)::Nil))).alias("event")
).show(false)
- from_json turns a string of the following form into an array of structs:
[
{"ett":"a1","en":"a2","kv":"a3"},
{"ett":"b1","en":"b2","kv":"b3"},
{"ett":"c1","en":"c2","kv":"c3"}
]
val jsonDF4 = jsonDF3.select($"id",$"ap",$"ln",$"sv",$"os",$"g",$"mid",$"l",$"vc",$"hw",
$"ar",$"uid",$"t",$"la",$"md",$"vn",$"ba",$"sr",from_json($"et",ArrayType(StructType(StructField("ett",StringType)::StructField("en",StringType)
::StructField("kv",StringType)
::Nil))).as("event"))
// explode produces one row per element of the event array
val jsonDF5 = jsonDF4.select($"id",$"ap",$"ln",$"sv",$"os",$"g",$"mid",$"l",$"vc",$"hw",$"ar",$"uid",$"t",$"la",$"md",$"vn",$"ba",$"sr",explode($"event").alias("event"))
// Flatten the event struct into ett, en and kv; df5 is the table used in the steps below
val df5 = jsonDF5.select($"id",$"ap",$"ln",$"sv",$"os",$"g",$"mid",$"l",$"vc",$"hw",$"ar",$"uid",$"t",$"la",$"md",$"vn",$"ba",$"sr",$"event.ett",$"event.en",$"event.kv")
df5.show(false)
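To see what from_json and explode do in isolation, here is a small sketch on in-memory data. It assumes a spark-shell session (so `spark` is already defined); the two-event sample row and the names `eventSchema`, `demo` and `exploded` are made up for illustration. Declaring `kv` as StringType relies on Spark keeping the nested object as a JSON string, which is what the later get_json_object calls on `kv` depend on.

```scala
import org.apache.spark.sql.functions.{explode, from_json}
import org.apache.spark.sql.types.{ArrayType, StringType, StructField, StructType}
import spark.implicits._ // already in scope in spark-shell; repeated here for completeness

// Schema of the et array: each element has ett, en and kv (kv kept as a raw JSON string)
val eventSchema = ArrayType(StructType(Seq(
  StructField("ett", StringType),
  StructField("en", StringType),
  StructField("kv", StringType))))

// One hypothetical record with two events in its et array
val demo = Seq(
  ("1", """[{"ett":"100","en":"ad","kv":{"entry":"1"}},{"ett":"200","en":"praise","kv":{"id":7}}]""")
).toDF("id", "et")

// from_json parses the string into an array of structs; explode emits one row per element
val exploded = demo
  .select($"id", explode(from_json($"et", eventSchema)).as("event"))
  .select($"id", $"event.ett", $"event.en", $"event.kv")

exploded.show(false)
```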
7. Split the events by type (en) and structure the kv field of each type
val df6 = df5.filter($"en"==="praise").select(
$"id",$"ap",$"ln",$"sv",$"os",$"g",$"mid",$"l",$"vc",$"hw"
,$"ar",$"uid",$"t",$"la",$"md",$"vn",$"ba",$"sr",$"ett",$"en",
get_json_object($"kv","$.target_id").as("target_id"),
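get_json_object($"kv","$.id").as("praise_id"), // renamed: a second column called "id" would clash with the log id selected above when saving the table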
get_json_object($"kv","$.type").as("type"),
get_json_object($"kv","$.add_time").as("add_time"),
get_json_object($"kv","$.userid").as("userid"))
val df7 = df5.filter($"en"==="notification").select(
$"id",$"ap",$"ln",$"sv",$"os",$"g",$"mid",$"l",$"vc",$"hw"
,$"ar",$"uid",$"t",$"la",$"md",$"vn",$"ba",$"sr",$"ett",$"en",
get_json_object($"kv","$.ap_time").as("ap_time"),
get_json_object($"kv","$.action").as("action"),
get_json_object($"kv","$.type").as("type"),
get_json_object($"kv","$.content").as("content"))
val df8 = df5.filter($"en"==="comment").select(
$"id",$"ap",$"ln",$"sv",$"os",$"g",$"mid",$"l",$"vc",$"hw"
,$"ar",$"uid",$"t",$"la",$"md",$"vn",$"ba",$"sr",$"ett",$"en",
get_json_object($"kv","$.p_comment_id").as("p_comment_id"),
get_json_object($"kv","$.addtime").as("addtime"),
get_json_object($"kv","$.praise_count").as("praise_count"),
get_json_object($"kv","$.other_id").as("other_id"),
get_json_object($"kv","$.comment_id").as("comment_id"),
get_json_object($"kv","$.reply_count").as("reply_count"),
get_json_object($"kv","$.userid").as("userid"),
get_json_object($"kv","$.content").as("content"))
![Image](https://img-blog.csdnimg.cn/20201122224358972.png#pic_center)
val df9 = df5.filter($"en"==="ad").select(
$"id",$"ap",$"ln",$"sv",$"os",$"g",$"mid",$"l",$"vc",$"hw"
,$"ar",$"uid",$"t",$"la",$"md",$"vn",$"ba",$"sr",$"ett",$"en",
get_json_object($"kv","$.activityId").as("activityId"),
get_json_object($"kv","$.displayMills").as("displayMills"),
get_json_object($"kv","$.entry").as("entry"),
get_json_object($"kv","$.action").as("action"),
get_json_object($"kv","$.contentType").as("contentType"))
val df10 = df5.filter($"en"==="active_background").select(
$"id",$"ap",$"ln",$"sv",$"os",$"g",$"mid",$"l",$"vc",$"hw"
,$"ar",$"uid",$"t",$"la",$"md",$"vn",$"ba",$"sr",$"ett",$"en",
get_json_object($"kv","$.active_source").as("active_source"))
val df11 = df5.filter($"en"==="loading").select(
$"id",$"ap",$"ln",$"sv",$"os",$"g",$"mid",$"l",$"vc",$"hw"
,$"ar",$"uid",$"t",$"la",$"md",$"vn",$"ba",$"sr",$"ett",$"en",
get_json_object($"kv","$.extend2").as("extend2"),
get_json_object($"kv","$.loading_time").as("loading_time"),
get_json_object($"kv","$.action").as("action"),
get_json_object($"kv","$.extend1").as("extend1"),
get_json_object($"kv","$.type").as("type"),
get_json_object($"kv","$.type1").as("type1"),
get_json_object($"kv","$.loading_way").as("loading_way"))
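The six selects in this step differ only in the event name and the list of kv keys, so they could be generated by one function. The following is a sketch of that idea, not the author's code: `eventTable`, `commonCols` and the `renames` parameter are hypothetical names. `renames` lets a kv key be stored under a different column name, which is useful for the praise event, whose `id` key would otherwise share a name with the log `id` column.

```scala
import org.apache.spark.sql.{Column, DataFrame}
import org.apache.spark.sql.functions.{col, get_json_object}

// Columns shared by every per-event table (the same list as in the selects of this step)
val commonCols: Seq[String] = Seq("id", "ap", "ln", "sv", "os", "g", "mid", "l", "vc", "hw",
  "ar", "uid", "t", "la", "md", "vn", "ba", "sr", "ett", "en")

// Keeps the rows of one event type and turns each listed kv key into its own column
def eventTable(df: DataFrame,
               eventName: String,
               kvKeys: Seq[String],
               renames: Map[String, String] = Map.empty,
               common: Seq[String] = commonCols): DataFrame = {
  val kvCols: Seq[Column] = kvKeys.map { k =>
    get_json_object(col("kv"), "$." + k).as(renames.getOrElse(k, k))
  }
  val allCols: Seq[Column] = common.map(c => col(c)) ++ kvCols
  df.filter(col("en") === eventName).select(allCols: _*)
}

// Usage sketch, assuming df5 is the flattened DataFrame with ett, en and kv columns:
// val adDF     = eventTable(df5, "ad", Seq("activityId", "displayMills", "entry", "action", "contentType"))
// val praiseDF = eventTable(df5, "praise", Seq("target_id", "id", "type", "add_time", "userid"),
//                           renames = Map("id" -> "praise_id"))
```

Whether this is preferable to the explicit selects is a matter of taste: the explicit form is easier to read for a one-off script, while the function keeps the common column list in one place.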
8. Save to Hive
nohup hive --service metastore & # run in a Linux shell: start the Hive metastore
create database logproject; -- run in Hive: create the target database
df6.write.mode("overwrite").saveAsTable("logproject.praiseDF")
df7.write.mode("overwrite").saveAsTable("logproject.notificationDF")
df8.write.mode("overwrite").saveAsTable("logproject.commentDF")
df9.write.mode("overwrite").saveAsTable("logproject.adDF")
df10.write.mode("overwrite").saveAsTable("logproject.active_backgroundDF")
df11.write.mode("overwrite").saveAsTable("logproject.loadingDF")