Submitting a job to a Spark cluster from a local machine fails: Initial job has not accepted any resources

The error messages are as follows:

18/04/17 18:18:14 INFO TaskSchedulerImpl: Adding task set 0.0 with 2 tasks

18/04/17 18:18:29 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources

Copying the same Python file onto a cluster machine and submitting it to Spark there works fine. I then tried running one of Spark's bundled examples from my local machine, and the problem persisted.

Although this is only a WARN, the job never actually executes, and it stays in a running state in the Spark web UI. The commands I ran on my local machine and on the cluster were, respectively:

bin\spark-submit --master spark://192.168.3.207:7077 examples\src\main\python\pi.py

./spark-submit --master spark://192.168.3.207:7077 ../examples/src/main/python/pi.py

Both commands run the same example that ships with Spark.

Searching online turned up roughly two suggested fixes. Neither worked for me, but I'll record them here:

1) Increase the driver and executor memory:

bin\spark-submit --driver-memory 2000M --executor-memory 2000M --master spark://192.168.3.207:7077 examples\src\main\python\pi.py

2) Adjust the firewall to allow Spark's traffic through, or temporarily disable the firewall.

I went on to check the master and slave logs; neither showed any errors. Then I opened the master's web UI at http://192.168.3.207:8080/ and clicked into the job I had just submitted:

Clicking the stderr link of one of the workers showed the following:

18/04/17 18:55:54 INFO CoarseGrainedExecutorBackend: Started daemon with process name: 23412@he-200

18/04/17 18:55:54 INFO SignalUtils: Registered signal handler for TERM

18/04/17 18:55:54 INFO SignalUtils: Registered signal handler for HUP

18/04/17 18:55:54 INFO SignalUtils: Registered signal handler for INT

18/04/17 18:55:55 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

18/04/17 18:55:55 INFO SecurityManager: Changing view acls to: he,shaowei.liu

18/04/17 18:55:55 INFO SecurityManager: Changing modify acls to: he,shaowei.liu

18/04/17 18:55:55 INFO SecurityManager: Changing view acls groups to:

18/04/17 18:55:55 INFO SecurityManager: Changing modify acls groups to:

18/04/17 18:55:55 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(he, shaowei.liu); groups with view permissions: Set(); users  with modify permissions: Set(he, shaowei.liu); groups with modify permissions: Set()

Exception in thread "main" java.lang.reflect.UndeclaredThrowableException

at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1713)

at org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:66)

at org.apache.spark.executor.CoarseGrainedExecutorBackend$.run(CoarseGrainedExecutorBackend.scala:188)

at org.apache.spark.executor.CoarseGrainedExecutorBackend$.main(CoarseGrainedExecutorBackend.scala:284)

at org.apache.spark.executor.CoarseGrainedExecutorBackend.main(CoarseGrainedExecutorBackend.scala)

Caused by: org.apache.spark.rpc.RpcTimeoutException: Cannot receive any reply from 192.168.56.1:51378 in 120 seconds. This timeout is controlled by spark.rpc.askTimeout

at org.apache.spark.rpc.RpcTimeout.org$apache$spark$rpc$RpcTimeout$$createRpcTimeoutException(RpcTimeout.scala:47)

at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:62)

at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:58)

...

Caused by: java.util.concurrent.TimeoutException: Cannot receive any reply from 192.168.56.1:51378 in 120 seconds

... 8 more

18/04/17 18:57:55 ERROR RpcOutboxMessage: Ask timeout before connecting successfully

The log shows a connection timeout to 192.168.56.1:51378. But where did this IP come from? Running ipconfig on my local machine answered the question: 192.168.56.1 is the IP of the VirtualBox virtual network adapter that Docker created on my machine. Apparently, when the job was submitted from the local machine, Spark did not pick the correct local IP address as the driver address, so the cluster nodes kept timing out while trying to connect back. The fix is simple: disable that network adapter.
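The underlying question here is: given several network interfaces, which local IP actually routes toward the cluster master? The OS routing table can answer this directly. Below is a minimal Python sketch of that lookup; it is only an illustration of the idea, not Spark's actual address-resolution code:

```python
import socket

def outbound_ip(remote_host, remote_port=7077):
    """Return the local IP the OS routing table selects to reach remote_host.

    Connecting a UDP socket sends no packets; it only asks the kernel
    which local interface/address would be used for that route.
    """
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        s.connect((remote_host, remote_port))
        return s.getsockname()[0]
    finally:
        s.close()

# e.g. outbound_ip("192.168.3.207") on the machine in this post should
# return the real LAN address (192.168.0.138), whereas a naive hostname
# lookup could pick the VirtualBox adapter (192.168.56.1) instead.
print(outbound_ip("127.0.0.1"))  # loopback routes to itself: 127.0.0.1
```

As an alternative to disabling the adapter, Spark also lets you pin the advertised address explicitly, e.g. by passing --conf spark.driver.host=192.168.0.138 to spark-submit, or by setting the SPARK_LOCAL_IP environment variable before submitting.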

Trying again, the job finished quickly this time.

bin\spark-submit --master spark://192.168.3.207:7077 examples\src\main\python\pi.py

Looking at the web UI logs again, I can see that a cluster node connects back to my machine, fetches my job file pi.py into a temporary directory /tmp/spark-xxx/ on the node, and copies it into $SPARK_HOME/work/ before actually executing it. I'll study the exact flow when I have time. For reference, here is the log:

18/04/17 19:13:11 INFO TransportClientFactory: Successfully created connection to /192.168.0.138:51843 after 3 ms (0 ms spent in bootstraps)

18/04/17 19:13:11 INFO DiskBlockManager: Created local directory at /tmp/spark-67d75b11-65e7-4bc7-89b5-c07fb159470f/executor-b8ce41a3-7c6e-49f6-95ef-7ed6cdef8e53/blockmgr-030eb78d-e46b-4feb-b7b7-108f9e61ec85

18/04/17 19:13:11 INFO MemoryStore: MemoryStore started with capacity 366.3 MB

18/04/17 19:13:12 INFO CoarseGrainedExecutorBackend: Connecting to driver: spark://CoarseGrainedScheduler@192.168.0.138:51843

18/04/17 19:13:12 INFO WorkerWatcher: Connecting to worker spark://Worker@192.168.3.102:34041

18/04/17 19:13:12 INFO TransportClientFactory: Successfully created connection to /192.168.3.102:34041 after 0 ms (0 ms spent in bootstraps)

18/04/17 19:13:12 INFO WorkerWatcher: Successfully connected to spark://Worker@192.168.3.102:34041

18/04/17 19:13:12 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(1, 192.168.3.102, 44683, None)

18/04/17 19:13:12 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(1, 192.168.3.102, 44683, None)

18/04/17 19:13:12 INFO BlockManager: Initialized BlockManager: BlockManagerId(1, 192.168.3.102, 44683, None)

18/04/17 19:13:14 INFO CoarseGrainedExecutorBackend: Got assigned task 0

18/04/17 19:13:14 INFO Executor: Running task 0.0 in stage 0.0 (TID 0)

18/04/17 19:13:14 INFO Executor: Fetching spark://192.168.0.138:51843/files/pi.py with timestamp 1523963609005

18/04/17 19:13:14 INFO TransportClientFactory: Successfully created connection to /192.168.0.138:51843 after 1 ms (0 ms spent in bootstraps)

18/04/17 19:13:14 INFO Utils: Fetching spark://192.168.0.138:51843/files/pi.py to /tmp/spark-67d75b11-65e7-4bc7-89b5-c07fb159470f/executor-b8ce41a3-7c6e-49f6-95ef-7ed6cdef8e53/spark-98745f3b-2f70-47b2-8c56-c5b9f6eac496/fetchFileTemp2255624304256249008.tmp

18/04/17 19:13:14 INFO Utils: Copying /tmp/spark-67d75b11-65e7-4bc7-89b5-c07fb159470f/executor-b8ce41a3-7c6e-49f6-95ef-7ed6cdef8e53/spark-98745f3b-2f70-47b2-8c56-c5b9f6eac496/-11088979641523963609005_cache to /home/ubutnu/spark_2_2_1/work/app-20180417191311-0005/1/./pi.py

……

18/04/17 19:13:14 INFO TransportClientFactory: Successfully created connection to /192.168.0.138:51866 after 5 ms (0 ms spent in bootstraps)

18/04/17 19:13:14 INFO Executor: Finished task 0.0 in stage 0.0 (TID 0). 1803 bytes result sent to driver

……

18/04/17 19:13:16 INFO CoarseGrainedExecutorBackend: Driver commanded a shutdown

18/04/17 19:13:16 INFO MemoryStore: MemoryStore cleared

18/04/17 19:13:16 INFO ShutdownHookManager: Shutdown hook called

18/04/17 19:13:16 INFO ShutdownHookManager: Deleting directory /tmp/spark-67d75b11-65e7-4bc7-89b5-c07fb159470f/executor-b8ce41a3-7c6e-49f6-95ef-7ed6cdef8e53/spark-98745f3b-2f70-47b2-8c56-c5b9f6eac496
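The fetch-then-copy pair of log lines above ("Fetching ... fetchFileTemp..." followed by "Copying ... to work/.../pi.py") follows a common pattern: download to a temporary name first, and only then move the file to its final name, so that a half-fetched file is never visible under the real name. The sketch below mimics that pattern; it is my own simplified illustration, not Spark's actual fetch implementation:

```python
import os
import shutil
import tempfile
from pathlib import Path

def fetch_to_work_dir(src, work_dir):
    """Copy src into work_dir via a temporary file, mimicking the
    executor's two-step fetch seen in the log above. Simplified:
    the real implementation also caches files and checks timestamps."""
    work_dir = Path(work_dir)
    work_dir.mkdir(parents=True, exist_ok=True)
    # 1) fetch into a temp file, so a partially written file never
    #    appears under the final name
    fd, tmp = tempfile.mkstemp(prefix="fetchFileTemp", dir=work_dir)
    os.close(fd)
    shutil.copyfile(src, tmp)
    # 2) move it to its final name inside the app's work directory
    dest = work_dir / Path(src).name
    shutil.move(tmp, dest)
    return dest
```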
