Contents

  • Symptom
  • Cause
  • Solution

Symptom

21/12/10 16:24:13 WARN hdfs.BucketWriter: Closing file: hdfs://xxx/origin_data/kafka/topicBoxLauncher/2021/12/10/09/log.1639099219793.tmp failed. Will retry again in 180 seconds.
java.nio.channels.ClosedChannelException
    at org.apache.hadoop.hdfs.DFSOutputStream.checkClosed(DFSOutputStream.java:1993)
    at org.apache.hadoop.hdfs.DFSOutputStream.flushOrSync(DFSOutputStream.java:2404)
    at org.apache.hadoop.hdfs.DFSOutputStream.hflush(DFSOutputStream.java:2349)
    at org.apache.hadoop.fs.FSDataOutputStream.hflush(FSDataOutputStream.java:130)
    at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.flume.sink.hdfs.AbstractHDFSWriter.hflushOrSync(AbstractHDFSWriter.java:266)
    at org.apache.flume.sink.hdfs.HDFSDataStream.close(HDFSDataStream.java:134)
    at org.apache.flume.sink.hdfs.BucketWriter$3.call(BucketWriter.java:319)
    at org.apache.flume.sink.hdfs.BucketWriter$3.call(BucketWriter.java:316)
    at org.apache.flume.sink.hdfs.BucketWriter$8$1.run(BucketWriter.java:727)
    at org.apache.flume.auth.SimpleAuthenticator.execute(SimpleAuthenticator.java:50)
    at org.apache.flume.sink.hdfs.BucketWriter$8.call(BucketWriter.java:724)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
21/12/10 16:54:34 ERROR hdfs.HDFSEventSink: process failed
java.lang.InterruptedException: Timed out before HDFS call was made. Your hdfs.callTimeout might be set too low or HDFS calls are taking too long.
    at org.apache.flume.sink.hdfs.BucketWriter.checkAndThrowInterruptedException(BucketWriter.java:708)
    at org.apache.flume.sink.hdfs.BucketWriter.flush(BucketWriter.java:477)
    at org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:441)
    at org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:67)
    at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:145)
    at java.lang.Thread.run(Thread.java:748)
21/12/10 16:54:34 ERROR flume.SinkRunner: Unable to deliver event. Exception follows.
org.apache.flume.EventDeliveryException: java.lang.InterruptedException: Timed out before HDFS call was made. Your hdfs.callTimeout might be set too low or HDFS calls are taking too long.
    at org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:464)
    at org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:67)
    at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:145)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.InterruptedException: Timed out before HDFS call was made. Your hdfs.callTimeout might be set too low or HDFS calls are taking too long.
    at org.apache.flume.sink.hdfs.BucketWriter.checkAndThrowInterruptedException(BucketWriter.java:708)
    at org.apache.flume.sink.hdfs.BucketWriter.flush(BucketWriter.java:477)
    at org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:441)
    ... 3 more

Cause

When HDFSWriter.close() on the temporary file (.aaa.lzo.tmp) hits a TimeoutException, the call is cancelled (which produces the InterruptedException above) and the close is handed to a thread pool for retry. The rename step still executes, however, because that close retry is placed into the thread pool asynchronously via submit(), and close() does not wait for it to finish.
In other words, .aaa.lzo.tmp has already been renamed to aaa.lzo, yet the retry keeps attempting to close the file under its old name .aaa.lzo.tmp, which can never succeed.
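The rename-vs-retry race can be sketched in plain Java. The class and file names below are illustrative, not Flume's actual code; the point is that submit() returns immediately, so the rename proceeds while the pending retry still holds the stale .tmp name:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch of the BucketWriter race described above.
class CloseRenameRaceSketch {
    static String closeAndRename() throws InterruptedException {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        AtomicReference<String> bucketPath = new AtomicReference<>(".aaa.lzo.tmp");

        // The retry captures the old name before any rename happens.
        final String staleName = bucketPath.get();
        // submit() does not block, so close() never waits for this retry.
        pool.submit(() -> System.out.println("retrying close of " + staleName));

        // close() falls through to the rename regardless of the pending retry.
        bucketPath.set("aaa.lzo");

        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.SECONDS);
        return bucketPath.get(); // file is now aaa.lzo; the retry never sees it
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("final name: " + closeAndRename());
    }
}
```

Because submit() hands the retry off asynchronously, nothing forces the rename to wait for a successful close, which is exactly the ordering the log above shows.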

Solution

Manual close

List the files HDFS still considers open for write, then release each file's lease:
hdfs fsck /xxx/20190813 -openforwrite | grep OPENFORWRITE | awk -F ' ' '{print $1}'
hdfs debug recoverLease -path /xxx/20190813/aaa.lzo

Adjust the configuration

Raise hdfs.callTimeout (milliseconds allowed for each HDFS open/write/flush/close call), allow more close attempts via hdfs.closeTries, and close idle files sooner via hdfs.idleTimeout (seconds):

app.sinks.k3.hdfs.callTimeout = 600000
app.sinks.k3.hdfs.closeTries = 20
app.sinks.k3.hdfs.idleTimeout = 10

Root fix: patch the source

Modify BucketWriter.close() so the close is retried synchronously (up to 5 attempts here) before the rename runs, rather than being handed off to the thread pool:

public synchronized void close(boolean callCloseCallback)
    throws IOException, InterruptedException {
  checkAndThrowInterruptedException();
  try {
    flush();
  } catch (IOException e) {
    LOG.warn("pre-close flush failed", e);
  }
  LOG.info("Closing {}", bucketPath);
  CallRunner<Void> closeCallRunner = createCloseCallRunner();
  int tryTime = 1;
  while (isOpen && tryTime <= 5) {
    // reset per attempt so a successful retry actually exits the loop
    boolean failedToClose = false;
    try {
      callWithTimeout(closeCallRunner);
      sinkCounter.incrementConnectionClosedCount();
    } catch (IOException e) {
      LOG.warn("failed to close() HDFSWriter for file (try times:" + tryTime
          + "): " + bucketPath + ". Exception follows.", e);
      sinkCounter.incrementConnectionFailedCount();
      failedToClose = true;
    }
    if (failedToClose) {
      isOpen = true;
      tryTime++;
      Thread.sleep(this.callTimeout);
    } else {
      isOpen = false;
    }
  }
  // still open after all retries
  if (isOpen) {
    LOG.error("failed to close file: " + bucketPath + " after " + tryTime + " tries.");
  } else {
    LOG.info("HDFSWriter is already closed: {}", bucketPath);
  }
  // NOTE: timed rolls go through this codepath as well as other roll types
  if (timedRollFuture != null && !timedRollFuture.isDone()) {
    timedRollFuture.cancel(false); // do not cancel myself if running!
    timedRollFuture = null;
  }
  if (idleFuture != null && !idleFuture.isDone()) {
    idleFuture.cancel(false); // do not cancel myself if running!
    idleFuture = null;
  }
  if (bucketPath != null && fileSystem != null) {
    // could block or throw IOException
    try {
      renameBucket(bucketPath, targetPath, fileSystem);
    } catch (Exception e) {
      LOG.warn("failed to rename() file (" + bucketPath + "). Exception follows.", e);
      sinkCounter.incrementConnectionFailedCount();
      final Callable<Void> scheduledRename = createScheduledRenameCallable();
      timedRollerPool.schedule(scheduledRename, retryInterval, TimeUnit.SECONDS);
    }
  }
  if (callCloseCallback) {
    runCloseAction();
    closed = true;
  }
}
