A Flink job submitted to YARN failed with the following error after running for several days:

2022-01-05 15:09:26,288 INFO  org.apache.flink.runtime.checkpoint.CheckpointCoordinator    [] - Completed checkpoint 89574 for job cc0abb4a3cd870b2a9e1abc7235ceb91 (3528 bytes in 610 ms).
2022-01-05 15:09:29,544 WARN  akka.remote.ReliableDeliverySupervisor                       [] - Association with remote system [akka.tcp://flink@prod-bigdata-pc3:42636] has failed, address is now gated for [50] ms. Reason: [Disassociated]
2022-01-05 15:09:30,678 INFO  org.apache.flink.runtime.checkpoint.CheckpointCoordinator    [] - Triggering checkpoint 89575 (type=CHECKPOINT) @ 1641366570678 for job cc0abb4a3cd870b2a9e1abc7235ceb91.
2022-01-05 15:09:30,729 WARN  akka.remote.transport.netty.NettyTransport                   [] - Remote connection to [null] failed with java.net.ConnectException: Connection refused: prod-bigdata-pc3/10.5.2.133:42636
2022-01-05 15:09:30,729 WARN  akka.remote.ReliableDeliverySupervisor                       [] - Association with remote system [akka.tcp://flink@prod-bigdata-pc3:42636] has failed, address is now gated for [50] ms. Reason: [Association failed with [akka.tcp://flink@prod-bigdata-pc3:42636]] Caused by: [java.net.ConnectException: Connection refused: prod-bigdata-pc3/10.5.2.133:42636]
2022-01-05 15:09:31,482 INFO  org.apache.flink.yarn.YarnResourceManager                    [] - Closing TaskExecutor connection container_e27_1640598151061_2774_01_000002 because: Container released on a *lost* node
2022-01-05 15:09:31,495 WARN  akka.remote.transport.netty.NettyTransport                   [] - Remote connection to [null] failed with java.net.ConnectException: Connection refused: prod-bigdata-pc3/10.5.2.133:42636
2022-01-05 15:09:31,496 WARN  akka.remote.ReliableDeliverySupervisor                       [] - Association with remote system [akka.tcp://flink@prod-bigdata-pc3:42636] has failed, address is now gated for [50] ms. Reason: [Association failed with [akka.tcp://flink@prod-bigdata-pc3:42636]] Caused by: [java.net.ConnectException: Connection refused: prod-bigdata-pc3/10.5.2.133:42636]
2022-01-05 15:09:31,492 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph       [] - Source: Custom Source -> (Filter -> Map -> Sink: Unnamed, Filter -> Map -> Filter -> Map -> Sink: Unnamed) (1/1) (7d96c28b36c1b80514f188d59a885ca4) switched from RUNNING to FAILED on container_e27_1640598151061_2774_01_000002 @ prod-bigdata-pc3 (dataPort=44807).
java.lang.Exception: Container released on a *lost* node
	at org.apache.flink.yarn.YarnResourceManager.lambda$onContainersCompleted$0(YarnResourceManager.java:370) ~[flink-dist_2.12-1.11.3.jar:1.11.3]
	at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRunAsync(AkkaRpcActor.java:404) ~[flink-runtime_2.11-1.11.3.jar:1.11.3]
	at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:197) ~[flink-runtime_2.11-1.11.3.jar:1.11.3]
	at org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.handleRpcMessage(FencedAkkaRpcActor.java:74) ~[flink-runtime_2.11-1.11.3.jar:1.11.3]
	at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleMessage(AkkaRpcActor.java:154) ~[flink-runtime_2.11-1.11.3.jar:1.11.3]
	at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:26) [akka-actor_2.11-2.5.21.jar:2.5.21]
	at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:21) [akka-actor_2.11-2.5.21.jar:2.5.21]
	at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123) [scala-library-2.11.12.jar:?]
	at akka.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:21) [akka-actor_2.11-2.5.21.jar:2.5.21]
	at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:170) [scala-library-2.11.12.jar:?]
	at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171) [scala-library-2.11.12.jar:?]
	at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171) [scala-library-2.11.12.jar:?]
	at akka.actor.Actor$class.aroundReceive(Actor.scala:517) [akka-actor_2.11-2.5.21.jar:2.5.21]
	at akka.actor.AbstractActor.aroundReceive(AbstractActor.scala:225) [akka-actor_2.11-2.5.21.jar:2.5.21]
	at akka.actor.ActorCell.receiveMessage(ActorCell.scala:592) [akka-actor_2.11-2.5.21.jar:2.5.21]
	at akka.actor.ActorCell.invoke(ActorCell.scala:561) [akka-actor_2.11-2.5.21.jar:2.5.21]
	at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:258) [akka-actor_2.11-2.5.21.jar:2.5.21]
	at akka.dispatch.Mailbox.run(Mailbox.scala:225) [akka-actor_2.11-2.5.21.jar:2.5.21]
	at akka.dispatch.Mailbox.exec(Mailbox.scala:235) [akka-actor_2.11-2.5.21.jar:2.5.21]
	at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) [akka-actor_2.11-2.5.21.jar:2.5.21]
	at akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) [akka-actor_2.11-2.5.21.jar:2.5.21]
	at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) [akka-actor_2.11-2.5.21.jar:2.5.21]
	at akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) [akka-actor_2.11-2.5.21.jar:2.5.21]
2022-01-05 15:09:31,533 INFO  org.apache.flink.runtime.executiongraph.failover.flip1.RestartPipelinedRegionFailoverStrategy [] - Calculating tasks to restart to recover the failed task 4081cf0163fcce7fe6af0cf07ad2d43c_0.
2022-01-05 15:09:31,539 INFO  org.apache.flink.runtime.executiongraph.failover.flip1.RestartPipelinedRegionFailoverStrategy [] - 1 tasks should be restarted to recover the failed task 4081cf0163fcce7fe6af0cf07ad2d43c_0.
2022-01-05 15:09:31,547 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph       [] - Job MainKafka2CK (cc0abb4a3cd870b2a9e1abc7235ceb91) switched from state RUNNING to RESTARTING.
2022-01-05 15:09:31,556 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph       [] - Discarding the results produced by task execution 7d96c28b36c1b80514f188d59a885ca4.
2022-01-05 15:09:31,883 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph       [] - Source: Custom Source -> (Filter -> Map -> Sink: Unnamed, Filter -> Map -> Filter -> Map -> Sink: Unnamed) (1/1) (d556e4eb39bea674f6e10f51b009a535) switched from RUNNING to FAILED on container_e27_1640598151061_2774_01_000002 @ prod-bigdata-pc3 (dataPort=44807).
java.lang.Exception: Container released on a *lost* node
	at org.apache.flink.yarn.YarnResourceManager.lambda$onContainersCompleted$0(YarnResourceManager.java:370) ~[flink-dist_2.12-1.11.3.jar:1.11.3]
	at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRunAsync(AkkaRpcActor.java:404) ~[flink-runtime_2.11-1.11.3.jar:1.11.3]
	at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:197) ~[flink-runtime_2.11-1.11.3.jar:1.11.3]
	at org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.handleRpcMessage(FencedAkkaRpcActor.java:74) ~[flink-runtime_2.11-1.11.3.jar:1.11.3]
	at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleMessage(AkkaRpcActor.java:154) ~[flink-runtime_2.11-1.11.3.jar:1.11.3]

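This problem is most likely caused by one of the NodeManager nodes running short of resources. The ResourceManager marks a NodeManager as LOST when it stops receiving heartbeats from it within yarn.nm.liveness-monitor.expiry-interval-ms (10 minutes by default), and high CPU load or heavy disk usage on that host can be enough to stall the NodeManager. YARN then completes every container on that node with the diagnostic "Container released on a *lost* node", which Flink reports as the task failure above. To find the specific cause, check the NodeManager log on the node that was marked lost (prod-bigdata-pc3 in the log above). A NodeManager marked lost usually needs to be restarted to recover.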
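To confirm which NodeManager YARN currently considers lost (and read any health report it recorded), the node list can be pulled from the ResourceManager; the CLI command yarn node -list -all shows the same information. The sketch below is a minimal Python example, assuming the ResourceManager web address is reachable (port 8088 by default) and exposes the standard Cluster Nodes REST endpoint /ws/v1/cluster/nodes. The helper names and the rm_url value are hypothetical. The states query parameter asks the server to filter; the client-side filter is kept as a fallback in case a given Hadoop version ignores it.

```python
import json
import urllib.request
from typing import List, Tuple

# Node states worth investigating when Flink reports
# "Container released on a *lost* node".
SUSPECT_STATES = {"LOST", "UNHEALTHY"}


def suspect_nodes(payload: dict) -> List[Tuple[str, str, str]]:
    """Return (hostname, state, healthReport) for every node in a suspect state.

    `payload` is the parsed JSON body of GET /ws/v1/cluster/nodes, shaped like
    {"nodes": {"node": [{...}, ...]}}; "nodes" may be null when no node matches.
    """
    nodes = (payload.get("nodes") or {}).get("node") or []
    return [
        (n.get("nodeHostName", "?"), n.get("state", ""), n.get("healthReport", ""))
        for n in nodes
        if n.get("state") in SUSPECT_STATES
    ]


def fetch_nodes(rm_url: str) -> dict:
    """Query the ResourceManager REST API for LOST/UNHEALTHY nodes as JSON."""
    url = f"{rm_url}/ws/v1/cluster/nodes?states=LOST,UNHEALTHY"
    req = urllib.request.Request(url, headers={"Accept": "application/json"})
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)


def print_suspect_nodes(rm_url: str) -> None:
    """Print one tab-separated line per suspect node: host, state, health report."""
    for host, state, report in suspect_nodes(fetch_nodes(rm_url)):
        print(f"{host}\t{state}\t{report}")
```

For example, print_suspect_nodes("http://resourcemanager-host:8088") (hypothetical host) lists each LOST or UNHEALTHY node; log in to each listed host and read its NodeManager log before restarting the NodeManager.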
