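Flink on YARN error: Container released on a *lost* node
A Flink job submitted to YARN failed with the error below after running for several days. (The `拒绝连接` in the log is the JVM's localized "Connection refused" message.)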
2022-01-05 15:09:26,288 INFO org.apache.flink.runtime.checkpoint.CheckpointCoordinator [] - Completed checkpoint 89574 for job cc0abb4a3cd870b2a9e1abc7235ceb91 (3528 bytes in 610 ms).
2022-01-05 15:09:29,544 WARN akka.remote.ReliableDeliverySupervisor [] - Association with remote system [akka.tcp://flink@prod-bigdata-pc3:42636] has failed, address is now gated for [50] ms. Reason: [Disassociated]
2022-01-05 15:09:30,678 INFO org.apache.flink.runtime.checkpoint.CheckpointCoordinator [] - Triggering checkpoint 89575 (type=CHECKPOINT) @ 1641366570678 for job cc0abb4a3cd870b2a9e1abc7235ceb91.
2022-01-05 15:09:30,729 WARN akka.remote.transport.netty.NettyTransport [] - Remote connection to [null] failed with java.net.ConnectException: 拒绝连接: prod-bigdata-pc3/10.5.2.133:42636
2022-01-05 15:09:30,729 WARN akka.remote.ReliableDeliverySupervisor [] - Association with remote system [akka.tcp://flink@prod-bigdata-pc3:42636] has failed, address is now gated for [50] ms. Reason: [Association failed with [akka.tcp://flink@prod-bigdata-pc3:42636]] Caused by: [java.net.ConnectException: 拒绝连接: prod-bigdata-pc3/10.5.2.133:42636]
2022-01-05 15:09:31,482 INFO org.apache.flink.yarn.YarnResourceManager [] - Closing TaskExecutor connection container_e27_1640598151061_2774_01_000002 because: Container released on a *lost* node
2022-01-05 15:09:31,495 WARN akka.remote.transport.netty.NettyTransport [] - Remote connection to [null] failed with java.net.ConnectException: 拒绝连接: prod-bigdata-pc3/10.5.2.133:42636
2022-01-05 15:09:31,496 WARN akka.remote.ReliableDeliverySupervisor [] - Association with remote system [akka.tcp://flink@prod-bigdata-pc3:42636] has failed, address is now gated for [50] ms. Reason: [Association failed with [akka.tcp://flink@prod-bigdata-pc3:42636]] Caused by: [java.net.ConnectException: 拒绝连接: prod-bigdata-pc3/10.5.2.133:42636]
2022-01-05 15:09:31,492 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Source: Custom Source -> (Filter -> Map -> Sink: Unnamed, Filter -> Map -> Filter -> Map -> Sink: Unnamed) (1/1) (7d96c28b36c1b80514f188d59a885ca4) switched from RUNNING to FAILED on container_e27_1640598151061_2774_01_000002 @ prod-bigdata-pc3 (dataPort=44807).
java.lang.Exception: Container released on a *lost* node
    at org.apache.flink.yarn.YarnResourceManager.lambda$onContainersCompleted$0(YarnResourceManager.java:370) ~[flink-dist_2.12-1.11.3.jar:1.11.3]
    at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRunAsync(AkkaRpcActor.java:404) ~[flink-runtime_2.11-1.11.3.jar:1.11.3]
    at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:197) ~[flink-runtime_2.11-1.11.3.jar:1.11.3]
    at org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.handleRpcMessage(FencedAkkaRpcActor.java:74) ~[flink-runtime_2.11-1.11.3.jar:1.11.3]
    at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleMessage(AkkaRpcActor.java:154) ~[flink-runtime_2.11-1.11.3.jar:1.11.3]
    at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:26) [akka-actor_2.11-2.5.21.jar:2.5.21]
    at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:21) [akka-actor_2.11-2.5.21.jar:2.5.21]
    at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123) [scala-library-2.11.12.jar:?]
    at akka.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:21) [akka-actor_2.11-2.5.21.jar:2.5.21]
    at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:170) [scala-library-2.11.12.jar:?]
    at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171) [scala-library-2.11.12.jar:?]
    at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171) [scala-library-2.11.12.jar:?]
    at akka.actor.Actor$class.aroundReceive(Actor.scala:517) [akka-actor_2.11-2.5.21.jar:2.5.21]
    at akka.actor.AbstractActor.aroundReceive(AbstractActor.scala:225) [akka-actor_2.11-2.5.21.jar:2.5.21]
    at akka.actor.ActorCell.receiveMessage(ActorCell.scala:592) [akka-actor_2.11-2.5.21.jar:2.5.21]
    at akka.actor.ActorCell.invoke(ActorCell.scala:561) [akka-actor_2.11-2.5.21.jar:2.5.21]
    at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:258) [akka-actor_2.11-2.5.21.jar:2.5.21]
    at akka.dispatch.Mailbox.run(Mailbox.scala:225) [akka-actor_2.11-2.5.21.jar:2.5.21]
    at akka.dispatch.Mailbox.exec(Mailbox.scala:235) [akka-actor_2.11-2.5.21.jar:2.5.21]
    at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) [akka-actor_2.11-2.5.21.jar:2.5.21]
    at akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) [akka-actor_2.11-2.5.21.jar:2.5.21]
    at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) [akka-actor_2.11-2.5.21.jar:2.5.21]
    at akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) [akka-actor_2.11-2.5.21.jar:2.5.21]
2022-01-05 15:09:31,533 INFO org.apache.flink.runtime.executiongraph.failover.flip1.RestartPipelinedRegionFailoverStrategy [] - Calculating tasks to restart to recover the failed task 4081cf0163fcce7fe6af0cf07ad2d43c_0.
2022-01-05 15:09:31,539 INFO org.apache.flink.runtime.executiongraph.failover.flip1.RestartPipelinedRegionFailoverStrategy [] - 1 tasks should be restarted to recover the failed task 4081cf0163fcce7fe6af0cf07ad2d43c_0.
2022-01-05 15:09:31,547 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Job MainKafka2CK (cc0abb4a3cd870b2a9e1abc7235ceb91) switched from state RUNNING to RESTARTING.
2022-01-05 15:09:31,556 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Discarding the results produced by task execution 7d96c28b36c1b80514f188d59a885ca4.
2022-01-05 15:09:31,883 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Source: Custom Source -> (Filter -> Map -> Sink: Unnamed, Filter -> Map -> Filter -> Map -> Sink: Unnamed) (1/1) (d556e4eb39bea674f6e10f51b009a535) switched from RUNNING to FAILED on container_e27_1640598151061_2774_01_000002 @ prod-bigdata-pc3 (dataPort=44807).
java.lang.Exception: Container released on a *lost* node
    at org.apache.flink.yarn.YarnResourceManager.lambda$onContainersCompleted$0(YarnResourceManager.java:370) ~[flink-dist_2.12-1.11.3.jar:1.11.3]
    at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRunAsync(AkkaRpcActor.java:404) ~[flink-runtime_2.11-1.11.3.jar:1.11.3]
    at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:197) ~[flink-runtime_2.11-1.11.3.jar:1.11.3]
    at org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.handleRpcMessage(FencedAkkaRpcActor.java:74) ~[flink-runtime_2.11-1.11.3.jar:1.11.3]
    at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleMessage(AkkaRpcActor.java:154) ~[flink-runtime_2.11-1.11.3.jar:1.11.3]
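This error most likely means that one NodeManager node ran short of resources. YARN marks a NodeManager as "lost" when the ResourceManager stops receiving its heartbeats, and high CPU or disk usage on the node is a likely reason for that. In the log above, the released container ran on prod-bigdata-pc3. To find the exact cause, check the NodeManager log on the node that was marked lost. A NodeManager that has been marked lost usually needs to be restarted to recover.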
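To find out which node to inspect, the container ID and host can be pulled out of the JobManager log. The following is a minimal sketch, not part of Flink or YARN: the names `find_lost_containers`, `CLOSE_RE`, `HOST_RE` and `SAMPLE` are hypothetical, and the regular expressions assume the same log layout as the excerpt above (Flink 1.11.3), which may differ in other versions or logging configurations.

```python
import re

# Assumed layout: "<timestamp> INFO <logger> [] - Closing TaskExecutor connection
# <container id> because: Container released on a *lost* node"
CLOSE_RE = re.compile(
    r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) INFO\s+\S*YarnResourceManager\s+\[\] - "
    r"Closing TaskExecutor connection (container_\w+) because: "
    r"Container released on a \*lost\* node"
)
# The ExecutionGraph message ties a container ID to the host it ran on.
HOST_RE = re.compile(r"FAILED on (container_\w+) @ (\S+)")


def find_lost_containers(log_text):
    """Return (timestamp, container_id, host) for each lost-node release.

    host is None when the log contains no matching FAILED message.
    """
    hosts = {m.group(1): m.group(2) for m in HOST_RE.finditer(log_text)}
    return [
        (m.group(1), m.group(2), hosts.get(m.group(2)))
        for m in CLOSE_RE.finditer(log_text)
    ]


# Two messages copied from the log in this post (the operator chain is shortened).
SAMPLE = (
    "2022-01-05 15:09:31,482 INFO org.apache.flink.yarn.YarnResourceManager [] - "
    "Closing TaskExecutor connection container_e27_1640598151061_2774_01_000002 "
    "because: Container released on a *lost* node\n"
    "2022-01-05 15:09:31,492 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph [] - "
    "Source: Custom Source (1/1) (7d96c28b36c1b80514f188d59a885ca4) switched from RUNNING "
    "to FAILED on container_e27_1640598151061_2774_01_000002 @ prod-bigdata-pc3 (dataPort=44807).\n"
)

for timestamp, container_id, host in find_lost_containers(SAMPLE):
    print(timestamp, container_id, host)
```

The host in the result is the machine whose NodeManager log should be checked around the reported timestamp. The sketch uses `finditer` over the whole text rather than reading line by line, so it also works on a log whose line breaks were lost when it was copied.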