Spark on a cluster: Lost task 0.0 in stage 10.0 (TID 17, 10.28.23.202): java.io.FileNotFoundException
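Loading a file from the current directory in Spark fails with "Lost task 0.0 in stage 10.0 (TID 17, 10.28.23.202): java.io.FileNotFoundException". The message plainly says that a local file cannot be found, yet the file does exist on the machine where I started the shell. Note in the log below that the failing task ran on the worker 10.28.23.202, and that the retry on 10.28.23.201 succeeded.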
scala> val file = sc.textFile("test.txt")
15/12/09 13:22:36 INFO MemoryStore: ensureFreeSpace(191856) called with curMem=717340, maxMem=277877882
15/12/09 13:22:36 INFO MemoryStore: Block broadcast_14 stored as values in memory (estimated size 187.4 KB, free 264.1 MB)
15/12/09 13:22:36 INFO MemoryStore: ensureFreeSpace(19750) called with curMem=909196, maxMem=277877882
15/12/09 13:22:36 INFO MemoryStore: Block broadcast_14_piece0 stored as bytes in memory (estimated size 19.3 KB, free 264.1 MB)
15/12/09 13:22:36 INFO BlockManagerInfo: Added broadcast_14_piece0 in memory on 10.28.23.201:60179 (size: 19.3 KB, free: 264.9 MB)
15/12/09 13:22:36 INFO SparkContext: Created broadcast 14 from textFile at <console>:21
file: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[10] at textFile at <console>:21
scala> file foreach println
15/12/09 13:22:38 INFO FileInputFormat: Total input paths to process : 1
15/12/09 13:22:38 INFO SparkContext: Starting job: foreach at <console>:24
15/12/09 13:22:38 INFO DAGScheduler: Got job 10 (foreach at <console>:24) with 2 output partitions (allowLocal=false)
15/12/09 13:22:38 INFO DAGScheduler: Final stage: ResultStage 10(foreach at <console>:24)
15/12/09 13:22:38 INFO DAGScheduler: Parents of final stage: List()
15/12/09 13:22:38 INFO DAGScheduler: Missing parents: List()
15/12/09 13:22:38 INFO DAGScheduler: Submitting ResultStage 10 (MapPartitionsRDD[10] at textFile at <console>:21), which has no missing parents
15/12/09 13:22:38 INFO MemoryStore: ensureFreeSpace(3080) called with curMem=928946, maxMem=277877882
15/12/09 13:22:38 INFO MemoryStore: Block broadcast_15 stored as values in memory (estimated size 3.0 KB, free 264.1 MB)
15/12/09 13:22:38 INFO MemoryStore: ensureFreeSpace(1795) called with curMem=932026, maxMem=277877882
15/12/09 13:22:38 INFO MemoryStore: Block broadcast_15_piece0 stored as bytes in memory (estimated size 1795.0 B, free 264.1 MB)
15/12/09 13:22:38 INFO BlockManagerInfo: Added broadcast_15_piece0 in memory on 10.28.23.201:60179 (size: 1795.0 B, free: 264.9 MB)
15/12/09 13:22:38 INFO SparkContext: Created broadcast 15 from broadcast at DAGScheduler.scala:874
15/12/09 13:22:38 INFO DAGScheduler: Submitting 2 missing tasks from ResultStage 10 (MapPartitionsRDD[10] at textFile at <console>:21)
15/12/09 13:22:38 INFO TaskSchedulerImpl: Adding task set 10.0 with 2 tasks
15/12/09 13:22:38 INFO TaskSetManager: Starting task 0.0 in stage 10.0 (TID 17, 10.28.23.202, PROCESS_LOCAL, 1397 bytes)
15/12/09 13:22:38 INFO TaskSetManager: Starting task 1.0 in stage 10.0 (TID 18, 10.28.23.203, PROCESS_LOCAL, 1397 bytes)
15/12/09 13:22:38 INFO BlockManagerInfo: Added broadcast_15_piece0 in memory on 10.28.23.203:57813 (size: 1795.0 B, free: 265.0 MB)
15/12/09 13:22:38 INFO BlockManagerInfo: Added broadcast_15_piece0 in memory on 10.28.23.202:50706 (size: 1795.0 B, free: 265.0 MB)
15/12/09 13:22:38 INFO BlockManagerInfo: Added broadcast_14_piece0 in memory on 10.28.23.202:50706 (size: 19.3 KB, free: 264.9 MB)
15/12/09 13:22:38 INFO BlockManagerInfo: Added broadcast_14_piece0 in memory on 10.28.23.203:57813 (size: 19.3 KB, free: 265.0 MB)
15/12/09 13:22:38 INFO TaskSetManager: Finished task 1.0 in stage 10.0 (TID 18) in 156 ms on 10.28.23.203 (1/2)
15/12/09 13:22:38 WARN TaskSetManager: Lost task 0.0 in stage 10.0 (TID 17, 10.28.23.202): java.io.FileNotFoundException: File file:/usr/spark/test.txt does not exist
at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:534)
at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:747)
at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:524)
at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:409)
at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.<init>(ChecksumFileSystem.java:140)
at org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:341)
at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:766)
at org.apache.hadoop.mapred.LineRecordReader.<init>(LineRecordReader.java:108)
at org.apache.hadoop.mapred.TextInputFormat.getRecordReader(TextInputFormat.java:67)
at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:239)
at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:216)
at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:101)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:63)
at org.apache.spark.scheduler.Task.run(Task.scala:70)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
15/12/09 13:22:38 INFO TaskSetManager: Starting task 0.1 in stage 10.0 (TID 19, 10.28.23.201, PROCESS_LOCAL, 1397 bytes)
15/12/09 13:22:38 INFO BlockManagerInfo: Added broadcast_15_piece0 in memory on 10.28.23.201:51294 (size: 1795.0 B, free: 264.9 MB)
15/12/09 13:22:38 INFO BlockManagerInfo: Added broadcast_14_piece0 in memory on 10.28.23.201:51294 (size: 19.3 KB, free: 264.9 MB)
15/12/09 13:22:39 INFO TaskSetManager: Finished task 0.1 in stage 10.0 (TID 19) in 304 ms on 10.28.23.201 (2/2)
15/12/09 13:22:39 INFO TaskSchedulerImpl: Removed TaskSet 10.0, whose tasks have all completed, from pool
15/12/09 13:22:39 INFO DAGScheduler: ResultStage 10 (foreach at <console>:24) finished in 0.613 s
15/12/09 13:22:39 INFO DAGScheduler: Job 10 finished: foreach at <console>:24, took 0.620210 s
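Solution
1. Check the file's permissions.
2. If you are running on a cluster, you must make sure the file exists in the same directory on every node (this was my problem). Each executor opens the path on its own local filesystem, so a node that lacks file:/usr/spark/test.txt throws the exception. Alternatively, put the file on HDFS and the problem does not occur.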
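The two ways around the error can be sketched as follows. This is only a minimal sketch, not necessarily how your setup should do it: `readLocalLines` is a hypothetical helper name, and the HDFS host, port and path in the comments are placeholders that must be replaced with your own. The first option suits small files only, because the whole file is read into driver memory before being distributed.

```scala
import scala.io.Source

// Hypothetical helper: read a small local file on the driver only,
// so that no executor ever has to open the local path.
def readLocalLines(path: String): List[String] = {
  val src = Source.fromFile(path, "UTF-8")
  try src.getLines().toList   // materialize the lines before the file is closed
  finally src.close()
}

// Option 1, in spark-shell (where `sc` already exists):
//   val file = sc.parallelize(readLocalLines("test.txt"))
//   file.collect().foreach(println)
//
// Option 2, read from shared storage instead of a local path
// (host, port and path below are placeholders, not from the original post):
//   val file = sc.textFile("hdfs://namenode:8020/user/spark/test.txt")
//   file.collect().foreach(println)
```

With either option the input no longer depends on which worker a task is scheduled on, which is what caused the intermittent failure in the log above.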