Pyspark Python worker exited unexpectedly (crashed) java.io.EOFException
Symptoms
Error on macOS:
Multiprocessing causes the Python worker to crash, with an error like "may have been in progress in another thread when fork() was called".
PyCharm console output:
/Users/wuyumo/PycharmProjects/spark_study/venv/bin/python /Users/wuyumo/PycharmProjects/User-Profile-Spark/recommend_user_follow.py
20/09/14 14:07:52 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
-------------------------------------------
Time: 2020-09-14 14:08:00
-------------------------------------------
Time: 2020-09-14 14:08:30
-------------------------------------------
Time: 2020-09-14 14:09:00
-------------------------------------------
objc[9808]: +[__NSPlaceholderDate initialize] may have been in progress in another thread when fork() was called.
objc[9808]: +[__NSPlaceholderDate initialize] may have been in progress in another thread when fork() was called. We cannot safely call it or ignore it in the fork() child process. Crashing instead. Set a breakpoint on objc_initializeAfterForkError to debug.
20/09/14 14:09:30 ERROR Executor: Exception in task 2.0 in stage 28.0 (TID 102)
org.apache.spark.SparkException: Python worker exited unexpectedly (crashed)
    at org.apache.spark.api.python.BasePythonRunner$ReaderIterator$$anonfun$3.applyOrElse(PythonRunner.scala:490)
    at org.apache.spark.api.python.BasePythonRunner$ReaderIterator$$anonfun$3.applyOrElse(PythonRunner.scala:479)
    at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
    at org.apache.spark.api.python.PythonRunner$$anon$1.read(PythonRunner.scala:597)
    at org.apache.spark.api.python.PythonRunner$$anon$1.read(PythonRunner.scala:575)
    at org.apache.spark.api.python.BasePythonRunner$ReaderIterator.hasNext(PythonRunner.scala:410)
    at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
    at scala.collection.Iterator$GroupedIterator.fill(Iterator.scala:1124)
    at scala.collection.Iterator$GroupedIterator.hasNext(Iterator.scala:1130)
    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
    at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:125)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:55)
    at org.apache.spark.scheduler.Task.run(Task.scala:123)
    at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.EOFException
    at java.io.DataInputStream.readInt(DataInputStream.java:392)
    at org.apache.spark.api.python.PythonRunner$$anon$1.read(PythonRunner.scala:582)
    ... 16 more
20/09/14 14:09:30 WARN TaskSetManager: Lost task 2.0 in stage 28.0 (TID 102, localhost, executor driver): org.apache.spark.SparkException: Python worker exited unexpectedly (crashed)
    [same stack trace as above]
Caused by: java.io.EOFException
    ... 16 more
20/09/14 14:09:30 ERROR TaskSetManager: Task 2 in stage 28.0 failed 1 times; aborting job
20/09/14 14:09:30 ERROR JobScheduler: Error running job streaming job 1600063770000 ms.0
org.apache.spark.SparkException: An exception was raised by Python:
Traceback (most recent call last):
  File "/Users/wuyumo/PycharmProjects/spark_study/venv/lib/python3.7/site-packages/pyspark/streaming/util.py", line 68, in call
    r = self.func(t, *rdds)
  File "/Users/wuyumo/PycharmProjects/spark_study/venv/lib/python3.7/site-packages/pyspark/streaming/dstream.py", line 173, in takeAndPrint
    taken = rdd.take(num + 1)
  File "/Users/wuyumo/PycharmProjects/spark_study/venv/lib/python3.7/site-packages/pyspark/rdd.py", line 1360, in take
    res = self.context.runJob(self, takeUpToNumLeft, p)
  File "/Users/wuyumo/PycharmProjects/spark_study/venv/lib/python3.7/site-packages/pyspark/context.py", line 1069, in runJob
    sock_info = self._jvm.PythonRDD.runJob(self._jsc.sc(), mappedRDD._jrdd, partitions)
  File "/Users/wuyumo/PycharmProjects/spark_study/venv/lib/python3.7/site-packages/py4j/java_gateway.py", line 1257, in __call__
    answer, self.gateway_client, self.target_id, self.name)
  File "/Users/wuyumo/PycharmProjects/spark_study/venv/lib/python3.7/site-packages/py4j/protocol.py", line 328, in get_return_value
    format(target_id, ".", name), value)
py4j.protocol.Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.runJob.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 2 in stage 28.0 failed 1 times, most recent failure: Lost task 2.0 in stage 28.0 (TID 102, localhost, executor driver): org.apache.spark.SparkException: Python worker exited unexpectedly (crashed)
    [same executor stack trace as above]
Caused by: java.io.EOFException
    at java.io.DataInputStream.readInt(DataInputStream.java:392)
    at org.apache.spark.api.python.PythonRunner$$anon$1.read(PythonRunner.scala:582)
    ... 16 more
Driver stacktrace:
    at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1891)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1879)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1878)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
    at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1878)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:927)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:927)
    at scala.Option.foreach(Option.scala:257)
    at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:927)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2112)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2061)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2050)
    at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)
    at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:738)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:2061)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:2082)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:2101)
    at org.apache.spark.api.python.PythonRDD$.runJob(PythonRDD.scala:153)
    at org.apache.spark.api.python.PythonRDD.runJob(PythonRDD.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
    at py4j.Gateway.invoke(Gateway.java:282)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:238)
    at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.spark.SparkException: Python worker exited unexpectedly (crashed)
    [same executor stack trace as above]
    ... 1 more
Caused by: java.io.EOFException
    at java.io.DataInputStream.readInt(DataInputStream.java:392)
    at org.apache.spark.api.python.PythonRunner$$anon$1.read(PythonRunner.scala:582)
    ... 16 more
    at org.apache.spark.streaming.api.python.TransformFunction.callPythonTransformFunction(PythonDStream.scala:95)
    at org.apache.spark.streaming.api.python.TransformFunction.apply(PythonDStream.scala:78)
    at org.apache.spark.streaming.api.python.PythonDStream$$anonfun$callForeachRDD$1.apply(PythonDStream.scala:179)
    at org.apache.spark.streaming.api.python.PythonDStream$$anonfun$callForeachRDD$1.apply(PythonDStream.scala:179)
    at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ForEachDStream.scala:51)
    at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1$$anonfun$apply$mcV$sp$1.apply(ForEachDStream.scala:51)
    at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1$$anonfun$apply$mcV$sp$1.apply(ForEachDStream.scala:51)
    at org.apache.spark.streaming.dstream.DStream.createRDDWithLocalProperties(DStream.scala:416)
    at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply$mcV$sp(ForEachDStream.scala:50)
    at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply(ForEachDStream.scala:50)
    at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply(ForEachDStream.scala:50)
    at scala.util.Try$.apply(Try.scala:192)
    at org.apache.spark.streaming.scheduler.Job.run(Job.scala:39)
    at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$$anonfun$run$1.apply$mcV$sp(JobScheduler.scala:257)
    at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$$anonfun$run$1.apply(JobScheduler.scala:257)
    at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$$anonfun$run$1.apply(JobScheduler.scala:257)
    at scala.util.DynamicVariable.withValue(DynamicVariable.scala:58)
    at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler.run(JobScheduler.scala:256)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Traceback (most recent call last):
  File "/Users/wuyumo/PycharmProjects/User-Profile-Spark/recommend_user_follow.py", line 398, in <module>
    start()
  File "/Users/wuyumo/PycharmProjects/User-Profile-Spark/recommend_user_follow.py", line 394, in start
    ssc.awaitTermination()
  File "/Users/wuyumo/PycharmProjects/spark_study/venv/lib/python3.7/site-packages/pyspark/streaming/context.py", line 192, in awaitTermination
    self._jssc.awaitTermination()
  File "/Users/wuyumo/PycharmProjects/spark_study/venv/lib/python3.7/site-packages/py4j/java_gateway.py", line 1257, in __call__
    answer, self.gateway_client, self.target_id, self.name)
  File "/Users/wuyumo/PycharmProjects/spark_study/venv/lib/python3.7/site-packages/py4j/protocol.py", line 328, in get_return_value
    format(target_id, ".", name), value)
py4j.protocol.Py4JJavaError: An error occurred while calling o23.awaitTermination.
: org.apache.spark.SparkException: An exception was raised by Python:
[same Python traceback and driver stack trace as above]

Process finished with exit code 1
Cause
This is due to Apple changing macOS fork() behavior since High Sierra (10.13). The OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES variable turns off the immediate-crash behavior that the newer Objective-C runtime now enforces by default. The problem can affect any language that does multithreading or multiprocessing via fork() on macOS >= 10.13, especially when native (C) extensions are used.
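If you prefer not to touch the shell environment, a sketch of applying this variable from the driver script itself (this assumes local-mode PySpark, where the forked Python workers inherit the driver's environment; it must run before any SparkContext is created):

```python
import os

# Assumption: local-mode PySpark. Forked worker processes inherit the
# driver's environment, so setting the variable here, at the very top of
# the script and before any SparkContext exists, disables the macOS
# Objective-C fork-safety crash for those workers.
os.environ["OBJC_DISABLE_INITIALIZE_FORK_SAFETY"] = "YES"

# ... then build the SparkContext / StreamingContext as usual.
```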
Solution
Add the environment variable OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES on macOS.
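For example, a sketch of setting it from the terminal (which profile file is right depends on your shell: ~/.zshrc for zsh, ~/.bash_profile for bash):

```shell
# Sets the variable for the current shell session; add the same line to
# ~/.zshrc or ~/.bash_profile to make it permanent, then restart the
# terminal (and PyCharm, so the IDE inherits the updated environment).
export OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES
```

Alternatively, PyCharm run configurations have an "Environment variables" field (Run → Edit Configurations) where the same variable can be set for just this project.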