Flume exception: Closing file: log.xxxtmp failed. Will retry again in 180 seconds
Contents
- Symptom
- Cause
- Solution
Symptom
21/12/10 16:24:13 WARN hdfs.BucketWriter: Closing file: hdfs://xxx/origin_data/kafka/topicBoxLauncher/2021/12/10/09/log.1639099219793.tmp failed. Will retry again in 180 seconds.
java.nio.channels.ClosedChannelException
	at org.apache.hadoop.hdfs.DFSOutputStream.checkClosed(DFSOutputStream.java:1993)
	at org.apache.hadoop.hdfs.DFSOutputStream.flushOrSync(DFSOutputStream.java:2404)
	at org.apache.hadoop.hdfs.DFSOutputStream.hflush(DFSOutputStream.java:2349)
	at org.apache.hadoop.fs.FSDataOutputStream.hflush(FSDataOutputStream.java:130)
	at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.flume.sink.hdfs.AbstractHDFSWriter.hflushOrSync(AbstractHDFSWriter.java:266)
	at org.apache.flume.sink.hdfs.HDFSDataStream.close(HDFSDataStream.java:134)
	at org.apache.flume.sink.hdfs.BucketWriter$3.call(BucketWriter.java:319)
	at org.apache.flume.sink.hdfs.BucketWriter$3.call(BucketWriter.java:316)
	at org.apache.flume.sink.hdfs.BucketWriter$8$1.run(BucketWriter.java:727)
	at org.apache.flume.auth.SimpleAuthenticator.execute(SimpleAuthenticator.java:50)
	at org.apache.flume.sink.hdfs.BucketWriter$8.call(BucketWriter.java:724)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
21/12/10 16:54:34 ERROR hdfs.HDFSEventSink: process failed
java.lang.InterruptedException: Timed out before HDFS call was made. Your hdfs.callTimeout might be set too low or HDFS calls are taking too long.
	at org.apache.flume.sink.hdfs.BucketWriter.checkAndThrowInterruptedException(BucketWriter.java:708)
	at org.apache.flume.sink.hdfs.BucketWriter.flush(BucketWriter.java:477)
	at org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:441)
	at org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:67)
	at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:145)
	at java.lang.Thread.run(Thread.java:748)
21/12/10 16:54:34 ERROR flume.SinkRunner: Unable to deliver event. Exception follows.
org.apache.flume.EventDeliveryException: java.lang.InterruptedException: Timed out before HDFS call was made. Your hdfs.callTimeout might be set too low or HDFS calls are taking too long.
	at org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:464)
	at org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:67)
	at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:145)
	at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.InterruptedException: Timed out before HDFS call was made. Your hdfs.callTimeout might be set too low or HDFS calls are taking too long.
	at org.apache.flume.sink.hdfs.BucketWriter.checkAndThrowInterruptedException(BucketWriter.java:708)
	at org.apache.flume.sink.hdfs.BucketWriter.flush(BucketWriter.java:477)
	at org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:441)
	... 3 more
Cause
When HDFSWriter.close() on .aaa.lzo.tmp hit a TimeoutException, the call was cancelled (raising the InterruptedException above), and the close was retried on a thread pool. But because the close retry is handed to the pool asynchronously via submit(), the subsequent rename step still runs regardless of whether the close ever succeeds.
In other words, the file has already been renamed from .aaa.lzo.tmp to aaa.lzo, yet the background task keeps retrying close() against the stale name .aaa.lzo.tmp, which can never succeed.
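The race can be sketched outside Flume. The toy class below (not Flume's actual code; the class name and file names are illustrative) shows that a task handed to a thread pool with submit() does not block the caller, so the caller's "rename" proceeds while the background "close retry" still holds the old .tmp name:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Minimal sketch (NOT Flume's actual code) of the failure mode: the close
// retry is scheduled via submit(), which returns immediately, so the caller
// renames the file while the background task keeps targeting the old name.
public class AsyncCloseSketch {

    static String renamedName;          // what the file is called after rename
    static Future<String> closeRetry;   // what the background close still targets

    public static void simulate() throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();

        // Name captured when the close retry is scheduled (the stale .tmp name).
        final String staleName = ".aaa.lzo.tmp";
        closeRetry = pool.submit(() -> staleName);

        // submit() does not block, so the rename happens regardless of
        // whether the close above ever succeeds.
        renamedName = "aaa.lzo";
        pool.shutdown();
    }

    public static void main(String[] args) throws Exception {
        simulate();
        System.out.println("renamed to:           " + renamedName);
        System.out.println("close retry still on: " + closeRetry.get());
    }
}
```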
Solution
Manually close the files
hdfs fsck /xxx/20190813 -openforwrite | grep OPENFORWRITE | awk -F ' ' '{print $1}'
hdfs debug recoverLease -path /xxx/20190813/aaa.lzo
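To see what the grep/awk pipeline above actually extracts, here is a sketch run against a hypothetical fsck output line (the path, size, and block count are made up for illustration; real fsck output varies by version):

```shell
# One hypothetical line of 'hdfs fsck ... -openforwrite' output:
sample='/xxx/20190813/aaa.lzo 42 bytes, 1 block(s), OPENFORWRITE:  OK'

# The pipeline keeps only lines flagged OPENFORWRITE and prints the
# first whitespace-separated field, i.e. the path of the still-open file.
echo "$sample" | grep OPENFORWRITE | awk -F ' ' '{print $1}'
# prints: /xxx/20190813/aaa.lzo
```

Each extracted path can then be passed to `hdfs debug recoverLease -path`, for example via `xargs -I{} hdfs debug recoverLease -path {}`.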
Adjust the configuration
# Milliseconds allowed for each HDFS call (open/write/flush/close);
# raise it so slow close() calls are not cancelled prematurely.
app.sinks.k3.hdfs.callTimeout = 600000
# Number of times the sink retries closing a file.
app.sinks.k3.hdfs.closeTries = 20
# Seconds of inactivity after which an idle file is closed automatically.
app.sinks.k3.hdfs.idleTimeout = 10
Root-cause fix: patch the source code
The patch below bounds the close retries and performs the rename in the same synchronous path. Note one fix to the flattened original: failedToClose is reset at the top of each loop iteration, otherwise a retry that succeeds after an earlier failure would still be treated as failed.

public synchronized void close(boolean callCloseCallback)
    throws IOException, InterruptedException {
  checkAndThrowInterruptedException();
  try {
    flush();
  } catch (IOException e) {
    LOG.warn("pre-close flush failed", e);
  }
  boolean failedToClose = false;
  LOG.info("Closing {}", bucketPath);
  CallRunner<Void> closeCallRunner = createCloseCallRunner();
  int tryTime = 1;
  while (isOpen && tryTime <= 5) {
    failedToClose = false; // reset so a successful retry can exit the loop
    try {
      callWithTimeout(closeCallRunner);
      sinkCounter.incrementConnectionClosedCount();
    } catch (IOException e) {
      LOG.warn("failed to close() HDFSWriter for file (try times:"
          + tryTime + "): " + bucketPath + ". Exception follows.", e);
      sinkCounter.incrementConnectionFailedCount();
      failedToClose = true;
    }
    if (failedToClose) {
      isOpen = true;
      tryTime++;
      Thread.sleep(this.callTimeout);
    } else {
      isOpen = false;
    }
  }
  // if the file is still open after all retries, give up and log it
  if (isOpen) {
    LOG.error("failed to close file: " + bucketPath + " after " + tryTime + " tries.");
  } else {
    LOG.info("HDFSWriter is already closed: {}", bucketPath);
  }
  // NOTE: timed rolls go through this codepath as well as other roll types
  if (timedRollFuture != null && !timedRollFuture.isDone()) {
    timedRollFuture.cancel(false); // do not cancel myself if running!
    timedRollFuture = null;
  }
  if (idleFuture != null && !idleFuture.isDone()) {
    idleFuture.cancel(false); // do not cancel myself if running!
    idleFuture = null;
  }
  if (bucketPath != null && fileSystem != null) {
    // could block or throw IOException
    try {
      renameBucket(bucketPath, targetPath, fileSystem);
    } catch (Exception e) {
      LOG.warn("failed to rename() file (" + bucketPath + "). Exception follows.", e);
      sinkCounter.incrementConnectionFailedCount();
      final Callable<Void> scheduledRename = createScheduledRenameCallable();
      timedRollerPool.schedule(scheduledRename, retryInterval, TimeUnit.SECONDS);
    }
  }
  if (callCloseCallback) {
    runCloseAction();
    closed = true;
  }
}