Flink on YARN: troubleshooting and fixing "The main method caused an error: Could not deploy Yarn job cluster"
Reproducing the error:
flink run -m yarn-cluster -p 2 -yjm 700m -ytm 1024m -c WordCount target/bbb-1.0-SNAPSHOT.jar
The full error output:
The program finished with the following exception:

org.apache.flink.client.program.ProgramInvocationException: The main method caused an error: Could not deploy Yarn job cluster.
    at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:335)
    at org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:205)
    at org.apache.flink.client.ClientUtils.executeProgram(ClientUtils.java:138)
    at org.apache.flink.client.cli.CliFrontend.executeProgram(CliFrontend.java:662)
    at org.apache.flink.client.cli.CliFrontend.run(CliFrontend.java:210)
    at org.apache.flink.client.cli.CliFrontend.parseParameters(CliFrontend.java:893)
    at org.apache.flink.client.cli.CliFrontend.lambda$main$10(CliFrontend.java:966)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1836)
    at org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
    at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:966)
Caused by: org.apache.flink.client.deployment.ClusterDeploymentException: Could not deploy Yarn job cluster.
    at org.apache.flink.yarn.YarnClusterDescriptor.deployJobCluster(YarnClusterDescriptor.java:398)
    at org.apache.flink.client.deployment.executors.AbstractJobClusterExecutor.execute(AbstractJobClusterExecutor.java:70)
    at org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.executeAsync(StreamExecutionEnvironment.java:1733)
    at org.apache.flink.streaming.api.environment.StreamContextEnvironment.executeAsync(StreamContextEnvironment.java:94)
    at org.apache.flink.streaming.api.environment.StreamContextEnvironment.execute(StreamContextEnvironment.java:63)
    at org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.execute(StreamExecutionEnvironment.java:1620)
    at WordCount.main(WordCount.java:47)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:321)
    ... 11 more
Caused by: org.apache.flink.yarn.YarnClusterDescriptor$YarnDeploymentException: The YARN application unexpectedly switched to state FAILED during deployment.
Diagnostics from YARN: Application application_1591614969089_0002 failed 1 times (global limit =2; local limit is =1) due to AM Container for appattempt_1591614969089_0002_000001 exited with exitCode: 1
Failing this attempt.Diagnostics: [2020-06-08 19:18:12.457]Exception from container-launch.
Container id: container_1591614969089_0002_01_000001
Exit code: 1
[2020-06-08 19:18:12.466]Container exited with a non-zero exit code 1. Error file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
[2020-06-08 19:18:12.467]Container exited with a non-zero exit code 1. Error file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
For more detailed output, check the application tracking page: http://Desktop:8188/applicationhistory/app/application_1591614969089_0002 Then click on links to logs of each attempt.
. Failing the application.
If log aggregation is enabled on your cluster, use this command to further investigate the issue:
yarn logs -applicationId application_1591614969089_0002
    at org.apache.flink.yarn.YarnClusterDescriptor.startAppMaster(YarnClusterDescriptor.java:999)
    at org.apache.flink.yarn.YarnClusterDescriptor.deployInternal(YarnClusterDescriptor.java:488)
    at org.apache.flink.yarn.YarnClusterDescriptor.deployJobCluster(YarnClusterDescriptor.java:391)
    ... 22 more
2020-06-08 19:18:12,659 INFO org.apache.flink.yarn.YarnClusterDescriptor - Cancelling deployment from Deployment Failure Hook
2020-06-08 19:18:12,660 INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at Desktop/192.168.0.103:8032
2020-06-08 19:18:12,661 INFO org.apache.hadoop.yarn.client.AHSProxy - Connecting to Application History server at Desktop/192.168.0.103:10201
2020-06-08 19:18:12,661 INFO org.apache.flink.yarn.YarnClusterDescriptor - Killing YARN application
2020-06-08 19:18:12,668 INFO org.apache.hadoop.yarn.client.api.impl.YarnClientImpl - Killed application application_1591614969089_0002
2020-06-08 19:18:12,769 INFO org.apache.flink.yarn.YarnClusterDescriptor - Deleting files in hdfs://Desktop:9000/user/appleyuchi/.flink/application_1591614969089_0002.
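The client output above names the failed application and suggests running yarn logs -applicationId against it. As a minimal sketch, the following Python helper pulls the application ID out of a pasted error message and builds that command. The function names are hypothetical and only for illustration; the command only returns logs if log aggregation is enabled, as the message itself notes.

```python
import re
from typing import List

# YARN application IDs have the form application_<clusterTimestamp>_<sequence>.
APP_ID_PATTERN = re.compile(r"application_\d+_\d+")


def extract_application_ids(client_output: str) -> List[str]:
    """Return the distinct application IDs found in the text, in first-seen order."""
    seen: List[str] = []
    for app_id in APP_ID_PATTERN.findall(client_output):
        if app_id not in seen:
            seen.append(app_id)
    return seen


def yarn_logs_command(app_id: str) -> List[str]:
    """Build the argument list for the command suggested in the YARN diagnostics."""
    return ["yarn", "logs", "-applicationId", app_id]


if __name__ == "__main__":
    sample = (
        "Diagnostics from YARN: Application application_1591614969089_0002 "
        "failed 1 times (global limit =2; local limit is =1)"
    )
    for app_id in extract_application_ids(sample):
        # Print the command instead of running it, so the sketch works without a cluster.
        print(" ".join(yarn_logs_command(app_id)))
```

The argument list can be passed to subprocess.run on a machine where the yarn CLI is on the PATH.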
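This error is fairly hard to track down. First make sure Hadoop's history (log) server is running, i.e. that the output of jps includes:
JobHistoryServer. It is started with:
"$HADOOP_HOME/bin/mapred" --daemon start historyserver
Then start the timeline server:
yarn timelineserver
After these steps, the pages on each port of the YARN web UI should be reachable.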
#######################################################################################
The log shown in the YARN web UI then contains the following error:
2020-06-08 19:21:02,071 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Shutting YarnJobClusterEntrypoint down with application status FAILED. Diagnostics org.apache.flink.util.FlinkException: Could not create the DispatcherResourceManagerComponent.
    at org.apache.flink.runtime.entrypoint.component.DefaultDispatcherResourceManagerComponentFactory.create(DefaultDispatcherResourceManagerComponentFactory.java:261)
    at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.runCluster(ClusterEntrypoint.java:215)
    at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.lambda$startCluster$0(ClusterEntrypoint.java:169)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1836)
    at org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
    at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.startCluster(ClusterEntrypoint.java:168)
    at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.runClusterEntrypoint(ClusterEntrypoint.java:518)
    at org.apache.flink.yarn.entrypoint.YarnJobClusterEntrypoint.main(YarnJobClusterEntrypoint.java:119)
Caused by: java.net.BindException: Could not start rest endpoint on any port in port range 8082
    at org.apache.flink.runtime.rest.RestServerEndpoint.start(RestServerEndpoint.java:228)
    at org.apache.flink.runtime.entrypoint.component.DefaultDispatcherResourceManagerComponentFactory.create(DefaultDispatcherResourceManagerComponentFactory.java:165)
    ... 9 more
.
2020-06-08 19:21:02,076 INFO org.apache.flink.runtime.blob.BlobServer - Stopped BLOB server at 0.0.0.0:37633
2020-06-08 19:21:02,077 INFO org.apache.flink.runtime.rpc.akka.AkkaRpcService - Stopping Akka RPC service.
2020-06-08 19:21:02,082 INFO org.apache.flink.runtime.rpc.akka.AkkaRpcService - Stopping Akka RPC service.
2020-06-08 19:21:02,087 INFO akka.remote.RemoteActorRefProvider$RemotingTerminator - Shutting down remote daemon.
2020-06-08 19:21:02,088 INFO akka.remote.RemoteActorRefProvider$RemotingTerminator - Remote daemon shut down; proceeding with flushing remote transports.
2020-06-08 19:21:02,095 INFO akka.remote.RemoteActorRefProvider$RemotingTerminator - Shutting down remote daemon.
2020-06-08 19:21:02,095 INFO akka.remote.RemoteActorRefProvider$RemotingTerminator - Remote daemon shut down; proceeding with flushing remote transports.
2020-06-08 19:21:02,110 INFO akka.remote.RemoteActorRefProvider$RemotingTerminator - Remoting shut down.
2020-06-08 19:21:02,110 INFO akka.remote.RemoteActorRefProvider$RemotingTerminator - Remoting shut down.
2020-06-08 19:21:02,130 INFO org.apache.flink.runtime.rpc.akka.AkkaRpcService - Stopped Akka RPC service.
2020-06-08 19:21:02,131 INFO org.apache.flink.runtime.rpc.akka.AkkaRpcService - Stopped Akka RPC service.
2020-06-08 19:21:02,132 ERROR org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Could not start cluster entrypoint YarnJobClusterEntrypoint.
org.apache.flink.runtime.entrypoint.ClusterEntrypointException: Failed to initialize the cluster entrypoint YarnJobClusterEntrypoint.
    at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.startCluster(ClusterEntrypoint.java:187)
    at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.runClusterEntrypoint(ClusterEntrypoint.java:518)
    at org.apache.flink.yarn.entrypoint.YarnJobClusterEntrypoint.main(YarnJobClusterEntrypoint.java:119)
Caused by: org.apache.flink.util.FlinkException: Could not create the DispatcherResourceManagerComponent.
    at org.apache.flink.runtime.entrypoint.component.DefaultDispatcherResourceManagerComponentFactory.create(DefaultDispatcherResourceManagerComponentFactory.java:261)
    at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.runCluster(ClusterEntrypoint.java:215)
    at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.lambda$startCluster$0(ClusterEntrypoint.java:169)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1836)
    at org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
    at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.startCluster(ClusterEntrypoint.java:168)
    ... 2 more
Caused by: java.net.BindException: Could not start rest endpoint on any port in port range 8082
    at org.apache.flink.runtime.rest.RestServerEndpoint.start(RestServerEndpoint.java:228)
    at org.apache.flink.runtime.entrypoint.component.DefaultDispatcherResourceManagerComponentFactory.create(DefaultDispatcherResourceManagerComponentFactory.java:165)
    ... 9 more
##############################################################
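So it is a port problem (the BindException on 8082), yet nothing was actually occupying that port, which left me confused for a while.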
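To rule out a genuinely occupied port, one option is to try binding to it directly. The sketch below is a generic check, not something taken from Flink; can_bind is a hypothetical helper name. It has to run on the host where the YARN container for the JobManager starts, and a passing check only excludes one possible cause. In this post the port was free and the BindException still appeared.

```python
import socket


def can_bind(port: int, host: str = "0.0.0.0") -> bool:
    """Return True if a TCP socket can currently be bound to host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            # Typically "address already in use" or a permission error.
            return False
    return True


if __name__ == "__main__":
    # 8081 is Flink's default REST port; 8082 is the value used in this post.
    for port in (8081, 8082):
        state = "free" if can_bind(port) else "not bindable"
        print(f"port {port}: {state}")
```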
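Cause of the mistake:
The port must be the same in two files, flink-conf.yaml and masters. I forgot to update the masters file, and that led to the convoluted errors above.
The default port 8081 had to be changed to 8082 because 8081 was already taken by Spark, and once I had edited flink-conf.yaml I got carried away and forgot the rest.
Final solution: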
flink-conf.yaml:
    rest.port: 8082
masters:
    Desktop:8082
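Then remember to sync both files to the other nodes in the cluster.
Close all open terminals and start a new one; in my case the configuration change only took effect in a newly opened terminal.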
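Since the mistake was a mismatch between the two files, a small check can catch it before the next submission. The following is a minimal sketch, assuming flink-conf.yaml contains a plain rest.port: <number> line and masters contains host:port entries as shown above. The function names are hypothetical, and the parsing is deliberately naive rather than a full YAML parser.

```python
import re
from typing import List, Optional


def rest_port_from_flink_conf(conf_text: str) -> Optional[int]:
    """Return the rest.port value from flink-conf.yaml text, or None if it is not set."""
    for line in conf_text.splitlines():
        line = line.split("#", 1)[0].strip()  # ignore YAML comments
        match = re.fullmatch(r"rest\.port\s*:\s*(\d+)", line)
        if match:
            return int(match.group(1))
    return None


def ports_from_masters(masters_text: str) -> List[int]:
    """Return the port of every host:port line in masters text; other lines are skipped."""
    ports = []
    for line in masters_text.splitlines():
        match = re.fullmatch(r"\S+:(\d+)", line.strip())
        if match:
            ports.append(int(match.group(1)))
    return ports


def ports_consistent(conf_text: str, masters_text: str) -> bool:
    """True only if rest.port is set and every masters entry uses the same port."""
    rest_port = rest_port_from_flink_conf(conf_text)
    master_ports = ports_from_masters(masters_text)
    if rest_port is None or not master_ports:
        return False
    return all(port == rest_port for port in master_ports)


if __name__ == "__main__":
    conf = "rest.port: 8082\n"
    masters = "Desktop:8081\n"  # the stale value that caused the error in this post
    print(ports_consistent(conf, masters))
```

In practice the two strings would be read from the conf directory of the Flink installation on each node, which also verifies that the files were synced.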