Symptoms

Error on macOS:
Multiprocessing causes Python to crash with the error "may have been in progress in another thread when fork() was called".
Error output in PyCharm:
/Users/wuyumo/PycharmProjects/spark_study/venv/bin/python /Users/wuyumo/PycharmProjects/User-Profile-Spark/recommend_user_follow.py
20/09/14 14:07:52 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
-------------------------------------------
Time: 2020-09-14 14:08:00
-------------------------------------------

-------------------------------------------
Time: 2020-09-14 14:08:30
-------------------------------------------

-------------------------------------------
Time: 2020-09-14 14:09:00
-------------------------------------------
objc[9808]: +[__NSPlaceholderDate initialize] may have been in progress in another thread when fork() was called. We cannot safely call it or ignore it in the fork() child process. Crashing instead. Set a breakpoint on objc_initializeAfterForkError to debug.
20/09/14 14:09:30 ERROR Executor: Exception in task 2.0 in stage 28.0 (TID 102)
org.apache.spark.SparkException: Python worker exited unexpectedly (crashed)
	at org.apache.spark.api.python.BasePythonRunner$ReaderIterator$$anonfun$3.applyOrElse(PythonRunner.scala:490)
	at org.apache.spark.api.python.BasePythonRunner$ReaderIterator$$anonfun$3.applyOrElse(PythonRunner.scala:479)
	at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
	at org.apache.spark.api.python.PythonRunner$$anon$1.read(PythonRunner.scala:597)
	at org.apache.spark.api.python.PythonRunner$$anon$1.read(PythonRunner.scala:575)
	at org.apache.spark.api.python.BasePythonRunner$ReaderIterator.hasNext(PythonRunner.scala:410)
	at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
	at scala.collection.Iterator$GroupedIterator.fill(Iterator.scala:1124)
	at scala.collection.Iterator$GroupedIterator.hasNext(Iterator.scala:1130)
	at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
	at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:125)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:55)
	at org.apache.spark.scheduler.Task.run(Task.scala:123)
	at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.EOFException
	at java.io.DataInputStream.readInt(DataInputStream.java:392)
	at org.apache.spark.api.python.PythonRunner$$anon$1.read(PythonRunner.scala:582)
	... 16 more
20/09/14 14:09:30 WARN TaskSetManager: Lost task 2.0 in stage 28.0 (TID 102, localhost, executor driver): org.apache.spark.SparkException: Python worker exited unexpectedly (crashed)
	[stack trace identical to the one above, omitted]
20/09/14 14:09:30 ERROR TaskSetManager: Task 2 in stage 28.0 failed 1 times; aborting job
20/09/14 14:09:30 ERROR JobScheduler: Error running job streaming job 1600063770000 ms.0
org.apache.spark.SparkException: An exception was raised by Python:
Traceback (most recent call last):
  File "/Users/wuyumo/PycharmProjects/spark_study/venv/lib/python3.7/site-packages/pyspark/streaming/util.py", line 68, in call
    r = self.func(t, *rdds)
  File "/Users/wuyumo/PycharmProjects/spark_study/venv/lib/python3.7/site-packages/pyspark/streaming/dstream.py", line 173, in takeAndPrint
    taken = rdd.take(num + 1)
  File "/Users/wuyumo/PycharmProjects/spark_study/venv/lib/python3.7/site-packages/pyspark/rdd.py", line 1360, in take
    res = self.context.runJob(self, takeUpToNumLeft, p)
  File "/Users/wuyumo/PycharmProjects/spark_study/venv/lib/python3.7/site-packages/pyspark/context.py", line 1069, in runJob
    sock_info = self._jvm.PythonRDD.runJob(self._jsc.sc(), mappedRDD._jrdd, partitions)
  File "/Users/wuyumo/PycharmProjects/spark_study/venv/lib/python3.7/site-packages/py4j/java_gateway.py", line 1257, in __call__
    answer, self.gateway_client, self.target_id, self.name)
  File "/Users/wuyumo/PycharmProjects/spark_study/venv/lib/python3.7/site-packages/py4j/protocol.py", line 328, in get_return_value
    format(target_id, ".", name), value)
py4j.protocol.Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.runJob.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 2 in stage 28.0 failed 1 times, most recent failure: Lost task 2.0 in stage 28.0 (TID 102, localhost, executor driver): org.apache.spark.SparkException: Python worker exited unexpectedly (crashed)
	[worker stack trace identical to the one above, omitted]
Driver stacktrace:
	at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1891)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1879)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1878)
	at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
	at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1878)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:927)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:927)
	at scala.Option.foreach(Option.scala:257)
	at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:927)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2112)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2061)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2050)
	at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)
	at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:738)
	at org.apache.spark.SparkContext.runJob(SparkContext.scala:2061)
	at org.apache.spark.SparkContext.runJob(SparkContext.scala:2082)
	at org.apache.spark.SparkContext.runJob(SparkContext.scala:2101)
	at org.apache.spark.api.python.PythonRDD$.runJob(PythonRDD.scala:153)
	at org.apache.spark.api.python.PythonRDD.runJob(PythonRDD.scala)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
	at py4j.Gateway.invoke(Gateway.java:282)
	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
	at py4j.commands.CallCommand.execute(CallCommand.java:79)
	at py4j.GatewayConnection.run(GatewayConnection.java:238)
	at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.spark.SparkException: Python worker exited unexpectedly (crashed)
	[stack trace identical to the one above, omitted]
Caused by: java.io.EOFException
	at java.io.DataInputStream.readInt(DataInputStream.java:392)
	at org.apache.spark.api.python.PythonRunner$$anon$1.read(PythonRunner.scala:582)
	... 16 more
	at org.apache.spark.streaming.api.python.TransformFunction.callPythonTransformFunction(PythonDStream.scala:95)
	at org.apache.spark.streaming.api.python.TransformFunction.apply(PythonDStream.scala:78)
	at org.apache.spark.streaming.api.python.PythonDStream$$anonfun$callForeachRDD$1.apply(PythonDStream.scala:179)
	at org.apache.spark.streaming.api.python.PythonDStream$$anonfun$callForeachRDD$1.apply(PythonDStream.scala:179)
	at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ForEachDStream.scala:51)
	at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1$$anonfun$apply$mcV$sp$1.apply(ForEachDStream.scala:51)
	at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1$$anonfun$apply$mcV$sp$1.apply(ForEachDStream.scala:51)
	at org.apache.spark.streaming.dstream.DStream.createRDDWithLocalProperties(DStream.scala:416)
	at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply$mcV$sp(ForEachDStream.scala:50)
	at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply(ForEachDStream.scala:50)
	at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply(ForEachDStream.scala:50)
	at scala.util.Try$.apply(Try.scala:192)
	at org.apache.spark.streaming.scheduler.Job.run(Job.scala:39)
	at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$$anonfun$run$1.apply$mcV$sp(JobScheduler.scala:257)
	at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$$anonfun$run$1.apply(JobScheduler.scala:257)
	at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$$anonfun$run$1.apply(JobScheduler.scala:257)
	at scala.util.DynamicVariable.withValue(DynamicVariable.scala:58)
	at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler.run(JobScheduler.scala:256)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
Traceback (most recent call last):
  File "/Users/wuyumo/PycharmProjects/User-Profile-Spark/recommend_user_follow.py", line 398, in <module>
    start()
  File "/Users/wuyumo/PycharmProjects/User-Profile-Spark/recommend_user_follow.py", line 394, in start
    ssc.awaitTermination()
  File "/Users/wuyumo/PycharmProjects/spark_study/venv/lib/python3.7/site-packages/pyspark/streaming/context.py", line 192, in awaitTermination
    self._jssc.awaitTermination()
  File "/Users/wuyumo/PycharmProjects/spark_study/venv/lib/python3.7/site-packages/py4j/java_gateway.py", line 1257, in __call__
    answer, self.gateway_client, self.target_id, self.name)
  File "/Users/wuyumo/PycharmProjects/spark_study/venv/lib/python3.7/site-packages/py4j/protocol.py", line 328, in get_return_value
    format(target_id, ".", name), value)
py4j.protocol.Py4JJavaError: An error occurred while calling o23.awaitTermination.
: org.apache.spark.SparkException: An exception was raised by Python:
	[same Python traceback and Java stack traces as above, omitted]

Process finished with exit code 1

Cause

This is due to Apple changing the fork() behavior of macOS since High Sierra (macOS 10.13). Setting the OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES environment variable turns off the immediate-crash behavior that the newer Objective-C runtime now enforces by default. The problem can affect any language that does multithreading or multiprocessing via fork() on macOS >= 10.13, especially when native (C) extensions are used.
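As an aside, the other commonly cited workaround is to avoid fork() altogether. A minimal sketch (standard library only; this is a generic illustration, not part of the original fix) using Python's "spawn" start method, which launches fresh interpreter processes instead of forking:

```python
import multiprocessing as mp

def work(x):
    # Trivial task function; must be importable for the "spawn" method
    return x * x

if __name__ == "__main__":
    # "spawn" starts clean child interpreters rather than calling fork(),
    # sidestepping the Objective-C fork-safety check on macOS >= 10.13
    ctx = mp.get_context("spawn")
    with ctx.Pool(2) as pool:
        print(pool.map(work, [1, 2, 3]))  # [1, 4, 9]
```

The trade-off is that spawned workers re-import the main module, so startup is slower than fork() and all task functions must be picklable.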

Solution

Set the macOS environment variable OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES before launching the program.
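For example (a sketch assuming a zsh/bash shell; when running from PyCharm, the same variable can instead be added in Run → Edit Configurations → Environment variables):

```shell
# Disable the Objective-C runtime's fork-safety crash for any process
# started from this shell session
export OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES

# Verify that the variable is set
echo "$OBJC_DISABLE_INITIALIZE_FORK_SAFETY"   # prints YES
```

To make the setting permanent, add the export line to your shell profile (e.g. ~/.zshrc) and open a new terminal.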
