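Impala queries occasionally fail with an out-of-memory error. The failures persist for a while and then clear up on their own. The log at the time of the error looks like this: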

org.apache.hive.service.cli.HiveSQLException: ExecQueryFInstances rpc query_id=834c3b2376181f0e:a901620f00000000 failed: Failed to get minimum memory reservation of 204.00 MB on daemon 192.168.0.1:22000 for query 834c3b2376181f0e:a901620f00000000 due to following error: Failed to increase reservation by 204.00 MB because it would exceed the applicable reservation limit for the "Process" ReservationTracker: reservation_limit=8.50 GB reservation=8.50 GB used_reservation=0 child_reservations=8.50 GB

The top 5 queries that allocated memory under this tracker are:

Query(404fa28252334daf:5bb8cef500000000): Reservation=974.00 MB ReservationLimit=8.00 GB OtherMemory=31.14 MB Total=1005.14 MB Peak=1.00 GB

Query(3443bd95eb37b73a:885bbe6f00000000): Reservation=784.00 MB ReservationLimit=8.00 GB OtherMemory=31.14 MB Total=815.14 MB Peak=826.21 MB

Query(894b4caf9b397385:16df941300000000): Reservation=558.00 MB ReservationLimit=8.00 GB OtherMemory=15.77 MB Total=573.77 MB Peak=595.57 MB

Query(6c42b50ef6e5ba57:42aa5de200000000): Reservation=558.00 MB ReservationLimit=8.00 GB OtherMemory=15.77 MB Total=573.77 MB Peak=595.57 MB

Query(6142bb9f76823b8a:97e6451900000000): Reservation=508.00 MB ReservationLimit=8.00 GB OtherMemory=15.77 MB Total=523.77 MB Peak=534.67 MB

Memory is likely oversubscribed. Reducing query concurrency or configuring admission control may help avoid this error.

at org.apache.hive.jdbc.Utils.verifySuccess(Utils.java:231)

at org.apache.hive.jdbc.Utils.verifySuccessWithInfo(Utils.java:217)

at org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:254)

at org.apache.hive.jdbc.HiveStatement.executeQuery(HiveStatement.java:392)

at org.apache.hive.jdbc.HivePreparedStatement.executeQuery(HivePreparedStatement.java:109)

at com.alibaba.druid.pool.DruidPooledPreparedStatement.executeQuery(DruidPooledPreparedStatement.java:227)

at com.dataone.xishaoye.backend.dataoneapi.config.impala.RoundRobinPattern.winning(RoundRobinPattern.java:56)

at com.dataone.xishaoye.backend.dataoneapi.common.util.ConvertQueryUtils.convertcustomerexecQueryMapList(ConvertQueryUtils.java:92)

at com.dataone.xishaoye.backend.dataoneapi.businessAnalysis_v2.timingTasks.RealTimeTotalYear.realTimeTotalYear(RealTimeTotalYear.java:48)

at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)

at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)

at java.lang.reflect.Method.invoke(Method.java:498)

at com.dataone.xishaoye.backend.dataoneapi.config.quartz.SystemJobFactory.execute(SystemJobFactory.java:29)

at org.quartz.core.JobRunShell.run(JobRunShell.java:202)

at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:573)
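The reservation figures in the error message above can be tallied with a few lines of Python. This is only a sketch: the function names `to_mb` and `summarize_reservations` are made up for illustration, and the units are assumed to be 1024-based, which matches the Total figures printed in the message.

```python
import re

# Assumption: sizes in the message use 1024-based units.
UNITS = {"B": 1, "KB": 1024, "MB": 1024 ** 2, "GB": 1024 ** 3}
SIZE = r"([\d.]+) (B|KB|MB|GB)"


def to_mb(value, unit):
    """Convert a printed size such as ('8.50', 'GB') to megabytes."""
    return float(value) * UNITS[unit] / UNITS["MB"]


def summarize_reservations(error_text):
    """Return (limit_mb, listed_mb, share) for a reservation error message.

    limit_mb  - the Process tracker's reservation_limit
    listed_mb - sum of the Reservation= values of the listed queries
    share     - listed_mb as a fraction of limit_mb
    """
    limit = re.search(r"reservation_limit=" + SIZE, error_text)
    # "Reservation=" does not match "ReservationLimit=", so only the
    # per-query reservation is picked up here.
    listed = re.findall(r"Query\([0-9a-f:]+\): Reservation=" + SIZE, error_text)
    limit_mb = to_mb(*limit.groups())
    listed_mb = sum(to_mb(value, unit) for value, unit in listed)
    return limit_mb, listed_mb, listed_mb / limit_mb
```

For the message above, the five listed reservations add up to 3382 MB, about 39% of the 8.50 GB limit, which suggests that queries not shown in the top-5 list were holding the rest.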

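The error lists the five queries with the largest memory reservations on this daemon. With the process-wide reservation limit of 8.50 GB fully allocated, the current query cannot get its minimum 204 MB. Next, check the Impala log, /var/log/impalad/impalad.INFO.

A condensed timeline for each of these five queries: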

Query 1

I0731 13:15:03.254120 11667 impala-server.cc:972] Registered query query_id=404fa28252334daf:5bb8cef500000000 session_id=ea422a1e110dec91:7de05e982c2b4c90

...

I0731 13:15:04.135802 11720 query-exec-mgr.cc:155] ReleaseQueryState(): query_id=404fa28252334daf:5bb8cef500000000 refcnt=4

I0731 13:21:35.669724 22421 coordinator.cc:956] Backend completed: host=192.168.0.1:22000 remaining=3 query_id=404fa28252334daf:5bb8cef500000000

...

I0731 13:22:09.125800 11667 query-exec-mgr.cc:155] ReleaseQueryState(): query_id=404fa28252334daf:5bb8cef500000000 refcnt=1

Query 2

I0731 13:09:02.447206 22390 impala-server.cc:972] Registered query query_id=3443bd95eb37b73a:885bbe6f00000000 session_id=ff4e12e38ce19b8d:56bef858ad9def93

...

I0731 13:09:02.657199 22396 query-exec-mgr.cc:155] ReleaseQueryState(): query_id=3443bd95eb37b73a:885bbe6f00000000 refcnt=4

I0731 13:21:30.155854 9378 coordinator.cc:956] Backend completed: host=192.168.0.1:22000 remaining=3 query_id=3443bd95eb37b73a:885bbe6f00000000

...

I0731 13:22:13.168628 22390 query-exec-mgr.cc:155] ReleaseQueryState(): query_id=3443bd95eb37b73a:885bbe6f00000000 refcnt=1

Query 3

I0731 13:14:02.140969 3886 impala-server.cc:972] Registered query query_id=894b4caf9b397385:16df941300000000 session_id=6e47d4c6c94a8e19:b6384942e957718c

...

I0731 13:14:02.598412 3927 query-exec-mgr.cc:155] ReleaseQueryState(): query_id=894b4caf9b397385:16df941300000000 refcnt=7

I0731 13:21:18.426283 3931 krpc-data-stream-mgr.cc:293] DeregisterRecvr(): fragment_instance_id=894b4caf9b397385:16df941300000011, node=18

...

I0731 13:22:06.124980 3886 query-exec-mgr.cc:155] ReleaseQueryState(): query_id=894b4caf9b397385:16df941300000000 refcnt=1

Query 4

I0731 13:07:01.442411 20737 impala-server.cc:972] Registered query query_id=6c42b50ef6e5ba57:42aa5de200000000 session_id=ba4cab33df74b6e4:94e56665c8bf36a1

...

I0731 13:07:01.692301 20752 query-exec-mgr.cc:155] ReleaseQueryState(): query_id=6c42b50ef6e5ba57:42aa5de200000000 refcnt=7

I0731 13:21:18.492522 20755 krpc-data-stream-mgr.cc:293] DeregisterRecvr(): fragment_instance_id=6c42b50ef6e5ba57:42aa5de200000011, node=18

...

I0731 13:22:03.314045 20737 query-exec-mgr.cc:155] ReleaseQueryState(): query_id=6c42b50ef6e5ba57:42aa5de200000000 refcnt=1

Query 5

I0731 13:12:02.287282 29199 impala-server.cc:972] Registered query query_id=6142bb9f76823b8a:97e6451900000000 session_id=6b497f1f4c2a1000:2fca9a9feb8a268d

...

I0731 13:12:02.876965 29214 query-exec-mgr.cc:155] ReleaseQueryState(): query_id=6142bb9f76823b8a:97e6451900000000 refcnt=7

I0731 13:21:18.524863 29219 krpc-data-stream-mgr.cc:293] DeregisterRecvr(): fragment_instance_id=6142bb9f76823b8a:97e6451900000011, node=18

...

I0731 13:22:08.481279 29199 query-exec-mgr.cc:155] ReleaseQueryState(): query_id=6142bb9f76823b8a:97e6451900000000 refcnt=1

The SQL text of each query can be found in impalad.INFO on one of the impalad nodes; it is not listed here.
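The timelines above can also be checked mechanically. The sketch below groups log lines by query_id and reports, for each query, the time between its first and last line and the longest stretch with no output. The function name `query_timelines` is hypothetical, the `year` default comes from the dates in the DataNode log later in this post (the impalad log prefix carries no year), and lines without a `query_id=` field, such as the DeregisterRecvr() lines, are simply skipped.

```python
import re
from datetime import datetime, timedelta

# Log prefix as seen above: severity letter, MMDD, time with microseconds,
# thread id, file:line], then the message.
GLOG_QUERY = re.compile(
    r"^[IWEF](\d{4} \d{2}:\d{2}:\d{2}\.\d{6})\s.*?query_id=([0-9a-f]+:[0-9a-f]+)"
)


def query_timelines(lines, year=2020):
    """Map query_id -> (total duration, longest gap between log lines)."""
    stamps = {}
    for line in lines:
        match = GLOG_QUERY.match(line)
        if match is None:
            continue  # no query_id on this line
        when = datetime.strptime(f"{year}{match.group(1)}", "%Y%m%d %H:%M:%S.%f")
        stamps.setdefault(match.group(2), []).append(when)

    report = {}
    for query_id, times in stamps.items():
        times.sort()
        gaps = [later - earlier for earlier, later in zip(times, times[1:])]
        longest = max(gaps, default=timedelta(0))
        report[query_id] = (times[-1] - times[0], longest)
    return report
```

Run against the full impalad.INFO rather than the condensed excerpts, a large "longest gap" relative to the total duration is the signature of a query that was stalled rather than busy.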

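A pattern emerges: each of the five queries ran for roughly 7 to 15 minutes, each has a long silent stretch in the middle of execution (no log output at all), and all of them started making progress again from 13:21 onward. The logs covering the silent stretch contain many errors like this: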

java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/192.168.0.1:42194 remote=/192.168.0.2:50010]

In the Impala log, the SocketTimeoutException errors span 13:06 to 13:21:

# tail -400000 impalad.INFO|grep SocketTimeoutException -B 2

W0731 13:06:00.506495 31507 BlockReaderFactory.java:570] BlockReaderFactory(fileName=/part-00001-a55afd98-d28f-4493-985d-0cf1f8577cfe.c000, block=BP-692799849-192.168.0.2-1556106550816:blk_1084633665_10894033): I/O error requesting file descriptors. Disabling domain socket DomainSocket(fd=800,path=/var/run/hdfs-sockets/dn)

Java exception follows:

java.net.SocketTimeoutException: read(2) error: Resource temporarily unavailable

...

W0731 13:21:09.401382 31508 DFSInputStream.java:704] Failed to connect to /192.168.0.1:50010 for block BP-692799849-192.168.0.2-1556106550816:blk_1084742948_11003322, add to deadNodes and continue.

Java exception follows:

java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/192.168.0.2:35019 remote=/192.168.0.1:50010]
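To see which DataNode the timeouts point at, the same grep can be turned into a per-address count. This is a rough sketch with a hypothetical function name, `timeouts_by_datanode`. The exception line carries no timestamp of its own, so the sketch attributes it to the most recent timestamped line before it, and it only counts timeouts that include a `remote=` address (the `read(2) error` variant above does not).

```python
import re

# Time of day from the log prefix, e.g. "W0731 13:21:09.401382 31508 ..."
GLOG_TIME = re.compile(r"^[IWEF]\d{4} (\d{2}:\d{2}:\d{2})\.\d{6} ")
# Remote address from the exception text, e.g. "remote=/192.168.0.1:50010]"
TIMEOUT_REMOTE = re.compile(r"SocketTimeoutException: .*remote=/([\d.]+:\d+)")


def timeouts_by_datanode(lines):
    """Map remote address -> [count, first time seen, last time seen]."""
    result = {}
    last_seen = None  # time of the latest timestamped line
    for line in lines:
        stamp = GLOG_TIME.match(line)
        if stamp:
            last_seen = stamp.group(1)
        hit = TIMEOUT_REMOTE.search(line)
        if hit:
            entry = result.setdefault(hit.group(1), [0, last_seen, last_seen])
            entry[0] += 1
            entry[2] = last_seen
    return result
```

An address that dominates the counts for the whole window is the DataNode worth looking at next.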

The DataNode on 192.168.0.1 was responding abnormally during this window, so check its log, /var/log/hadoop-hdfs/hadoop-cmf-hdfs-DATANODE-192.168.0.1.log.out

It contains many JVM pause warnings from the same period:

2020-07-31 13:02:50,431 INFO org.apache.hadoop.util.JvmPauseMonitor: Detected pause in JVM or host machine (eg GC): pause of approximately 1306ms

In the DataNode log, the pauses span 13:02 to 13:21:

# tail -347000 hadoop-cmf-hdfs-DATANODE-192.168.0.1.log.out.1|grep JvmPauseMonitor

2020-07-31 13:02:50,431 INFO org.apache.hadoop.util.JvmPauseMonitor: Detected pause in JVM or host machine (eg GC): pause of approximately 1306ms

...

2020-07-31 13:21:16,102 INFO org.apache.hadoop.util.JvmPauseMonitor: Detected pause in JVM or host machine (eg GC): pause of approximately 2399ms
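The grep output can be condensed into a short summary of the pause window. The sketch below, with the hypothetical function name `summarize_pauses`, assumes the line format shown above and nothing else about the DataNode log.

```python
import re

# Matches lines such as:
# 2020-07-31 13:02:50,431 INFO org.apache.hadoop.util.JvmPauseMonitor:
#   Detected pause in JVM or host machine (eg GC): pause of approximately 1306ms
PAUSE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d{3} \w+ "
    r"org\.apache\.hadoop\.util\.JvmPauseMonitor: "
    r".*pause of approximately (\d+)ms"
)


def summarize_pauses(lines):
    """Return first/last timestamp, count, longest and total pause in ms."""
    pauses = []
    for line in lines:
        match = PAUSE.match(line)
        if match:
            pauses.append((match.group(1), int(match.group(2))))
    if not pauses:
        return None  # no JvmPauseMonitor lines in the input
    return {
        "first": pauses[0][0],
        "last": pauses[-1][0],
        "count": len(pauses),
        "max_ms": max(ms for _, ms in pauses),
        "total_ms": sum(ms for _, ms in pauses),
    }
```

Comparing the first and last timestamps with the SocketTimeoutException window from the Impala log makes the overlap between the two easy to confirm.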

Increasing the DataNode heap (Java Heap Size of DataNode in Bytes) resolved the problem.

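Summary: the DataNode was short of heap and kept pausing for GC. Impala queries that read data from that DataNode stalled, the stalled queries kept holding large amounts of Impala memory, and new queries then failed with the out-of-memory error.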
