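Spark fails when loading a file from the current directory: Lost task 0.0 in stage 10.0 (TID 17, 10.28.23.202): java.io.FileNotFoundException. The executor clearly cannot find the local file, even though the file does exist locally.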
scala> val file = sc.textFile("test.txt")
15/12/09 13:22:36 INFO MemoryStore: ensureFreeSpace(191856) called with curMem=717340, maxMem=277877882
15/12/09 13:22:36 INFO MemoryStore: Block broadcast_14 stored as values in memory (estimated size 187.4 KB, free 264.1 MB)
15/12/09 13:22:36 INFO MemoryStore: ensureFreeSpace(19750) called with curMem=909196, maxMem=277877882
15/12/09 13:22:36 INFO MemoryStore: Block broadcast_14_piece0 stored as bytes in memory (estimated size 19.3 KB, free 264.1 MB)
15/12/09 13:22:36 INFO BlockManagerInfo: Added broadcast_14_piece0 in memory on 10.28.23.201:60179 (size: 19.3 KB, free: 264.9 MB)
15/12/09 13:22:36 INFO SparkContext: Created broadcast 14 from textFile at <console>:21
file: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[10] at textFile at <console>:21

scala> file foreach println
15/12/09 13:22:38 INFO FileInputFormat: Total input paths to process : 1
15/12/09 13:22:38 INFO SparkContext: Starting job: foreach at <console>:24
15/12/09 13:22:38 INFO DAGScheduler: Got job 10 (foreach at <console>:24) with 2 output partitions (allowLocal=false)
15/12/09 13:22:38 INFO DAGScheduler: Final stage: ResultStage 10(foreach at <console>:24)
15/12/09 13:22:38 INFO DAGScheduler: Parents of final stage: List()
15/12/09 13:22:38 INFO DAGScheduler: Missing parents: List()
15/12/09 13:22:38 INFO DAGScheduler: Submitting ResultStage 10 (MapPartitionsRDD[10] at textFile at <console>:21), which has no missing parents
15/12/09 13:22:38 INFO MemoryStore: ensureFreeSpace(3080) called with curMem=928946, maxMem=277877882
15/12/09 13:22:38 INFO MemoryStore: Block broadcast_15 stored as values in memory (estimated size 3.0 KB, free 264.1 MB)
15/12/09 13:22:38 INFO MemoryStore: ensureFreeSpace(1795) called with curMem=932026, maxMem=277877882
15/12/09 13:22:38 INFO MemoryStore: Block broadcast_15_piece0 stored as bytes in memory (estimated size 1795.0 B, free 264.1 MB)
15/12/09 13:22:38 INFO BlockManagerInfo: Added broadcast_15_piece0 in memory on 10.28.23.201:60179 (size: 1795.0 B, free: 264.9 MB)
15/12/09 13:22:38 INFO SparkContext: Created broadcast 15 from broadcast at DAGScheduler.scala:874
15/12/09 13:22:38 INFO DAGScheduler: Submitting 2 missing tasks from ResultStage 10 (MapPartitionsRDD[10] at textFile at <console>:21)
15/12/09 13:22:38 INFO TaskSchedulerImpl: Adding task set 10.0 with 2 tasks
15/12/09 13:22:38 INFO TaskSetManager: Starting task 0.0 in stage 10.0 (TID 17, 10.28.23.202, PROCESS_LOCAL, 1397 bytes)
15/12/09 13:22:38 INFO TaskSetManager: Starting task 1.0 in stage 10.0 (TID 18, 10.28.23.203, PROCESS_LOCAL, 1397 bytes)
15/12/09 13:22:38 INFO BlockManagerInfo: Added broadcast_15_piece0 in memory on 10.28.23.203:57813 (size: 1795.0 B, free: 265.0 MB)
15/12/09 13:22:38 INFO BlockManagerInfo: Added broadcast_15_piece0 in memory on 10.28.23.202:50706 (size: 1795.0 B, free: 265.0 MB)
15/12/09 13:22:38 INFO BlockManagerInfo: Added broadcast_14_piece0 in memory on 10.28.23.202:50706 (size: 19.3 KB, free: 264.9 MB)
15/12/09 13:22:38 INFO BlockManagerInfo: Added broadcast_14_piece0 in memory on 10.28.23.203:57813 (size: 19.3 KB, free: 265.0 MB)
15/12/09 13:22:38 INFO TaskSetManager: Finished task 1.0 in stage 10.0 (TID 18) in 156 ms on 10.28.23.203 (1/2)
15/12/09 13:22:38 WARN TaskSetManager: Lost task 0.0 in stage 10.0 (TID 17, 10.28.23.202): java.io.FileNotFoundException: File file:/usr/spark/test.txt does not exist
at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:534)
at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:747)
at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:524)
at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:409)
at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.<init>(ChecksumFileSystem.java:140)
at org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:341)
at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:766)
at org.apache.hadoop.mapred.LineRecordReader.<init>(LineRecordReader.java:108)
at org.apache.hadoop.mapred.TextInputFormat.getRecordReader(TextInputFormat.java:67)
at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:239)
at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:216)
at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:101)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:63)
at org.apache.spark.scheduler.Task.run(Task.scala:70)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)

15/12/09 13:22:38 INFO TaskSetManager: Starting task 0.1 in stage 10.0 (TID 19, 10.28.23.201, PROCESS_LOCAL, 1397 bytes)
15/12/09 13:22:38 INFO BlockManagerInfo: Added broadcast_15_piece0 in memory on 10.28.23.201:51294 (size: 1795.0 B, free: 264.9 MB)
15/12/09 13:22:38 INFO BlockManagerInfo: Added broadcast_14_piece0 in memory on 10.28.23.201:51294 (size: 19.3 KB, free: 264.9 MB)
15/12/09 13:22:39 INFO TaskSetManager: Finished task 0.1 in stage 10.0 (TID 19) in 304 ms on 10.28.23.201 (2/2)
15/12/09 13:22:39 INFO TaskSchedulerImpl: Removed TaskSet 10.0, whose tasks have all completed, from pool 
15/12/09 13:22:39 INFO DAGScheduler: ResultStage 10 (foreach at <console>:24) finished in 0.613 s
15/12/09 13:22:39 INFO DAGScheduler: Job 10 finished: foreach at <console>:24, took 0.620210 s

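Solution
1. Check the file's permissions.
2. If you are running on a cluster, make sure the file exists at the same path on every node (this was my problem). In the log above, the task failed on 10.28.23.202 and succeeded when retried on 10.28.23.201. Alternatively, put the file on HDFS, which every node can read, and the problem does not occur.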
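
The two workarounds can be sketched roughly as follows. This is only a minimal sketch: readLocalLines is a hypothetical helper name, the HDFS host, port and path are placeholders, and it assumes the file is small enough to fit in the driver's memory. The Spark calls are shown as comments because they rely on the sc value that spark-shell predefines.

```scala
import scala.io.Source

// Hypothetical helper: read a small local file completely on the driver,
// so that no executor ever has to open the path on its own machine.
def readLocalLines(path: String): List[String] = {
  val src = Source.fromFile(path, "UTF-8")
  try src.getLines().toList finally src.close()
}

// In spark-shell, where `sc` is predefined:
//
//   // Workaround A: ship the lines from the driver to the cluster.
//   val file = sc.parallelize(readLocalLines("/usr/spark/test.txt"))
//   file.foreach(println)
//
//   // Workaround B: read from storage that every node can reach.
//   // "namenode:8020" and the path are placeholders for your own cluster.
//   val file = sc.textFile("hdfs://namenode:8020/user/spark/test.txt")
```

Workaround A only suits small files, since everything passes through the driver; for anything larger, shared storage such as HDFS is the more natural choice.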
