Fixing the "Spark context stopped while waiting for backend" issue
On a 4-core/16 GB virtual machine I installed the full Hadoop ecosystem stack, then added Spark 2 using community edition 2.3.
After installation, I ran the bundled example org.apache.spark.examples.SparkPi as a smoke test, and it failed with the following error:
Spark context stopped while waiting for backend
The full error log follows:
[root@cdh-001 conf]# spark-submit --class org.apache.spark.examples.SparkPi --master yarn --deploy-mode client --driver-memory 600m --executor-memory 600m --executor-cores 2 $SPARK_HOME/examples/jars/spark-examples*.jar 10
2021-03-12 15:05:46 WARN NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2021-03-12 15:05:46 INFO SparkContext:54 - Running Spark version 2.3.0
2021-03-12 15:05:46 INFO SparkContext:54 - Submitted application: Spark Pi
2021-03-12 15:05:46 INFO SecurityManager:54 - Changing view acls to: root
2021-03-12 15:05:46 INFO SecurityManager:54 - Changing modify acls to: root
2021-03-12 15:05:46 INFO SecurityManager:54 - Changing view acls groups to:
2021-03-12 15:05:46 INFO SecurityManager:54 - Changing modify acls groups to:
2021-03-12 15:05:46 INFO SecurityManager:54 - SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); groups with view permissions: Set(); users with modify permissions: Set(root); groups with modify permissions: Set()
2021-03-12 15:05:46 INFO Utils:54 - Successfully started service 'sparkDriver' on port 40299.
2021-03-12 15:05:46 INFO SparkEnv:54 - Registering MapOutputTracker
2021-03-12 15:05:46 INFO SparkEnv:54 - Registering BlockManagerMaster
2021-03-12 15:05:46 INFO BlockManagerMasterEndpoint:54 - Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
2021-03-12 15:05:46 INFO BlockManagerMasterEndpoint:54 - BlockManagerMasterEndpoint up
2021-03-12 15:05:46 INFO DiskBlockManager:54 - Created local directory at /tmp/blockmgr-b49e6294-3bf9-42f3-956e-897aa442cf58
2021-03-12 15:05:46 INFO MemoryStore:54 - MemoryStore started with capacity 140.1 MB
2021-03-12 15:05:47 INFO SparkEnv:54 - Registering OutputCommitCoordinator
2021-03-12 15:05:47 INFO log:192 - Logging initialized @1691ms
2021-03-12 15:05:47 INFO Server:346 - jetty-9.3.z-SNAPSHOT
2021-03-12 15:05:47 INFO Server:414 - Started @1762ms
2021-03-12 15:05:47 INFO AbstractConnector:278 - Started ServerConnector@2a551a63{HTTP/1.1,[http/1.1]}{0.0.0.0:4040}
2021-03-12 15:05:47 INFO Utils:54 - Successfully started service 'SparkUI' on port 4040.
2021-03-12 15:05:47 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@7cd1ac19{/jobs,null,AVAILABLE,@Spark}
2021-03-12 15:05:47 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@2a2bb0eb{/jobs/json,null,AVAILABLE,@Spark}
2021-03-12 15:05:47 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@3c291aad{/jobs/job,null,AVAILABLE,@Spark}
2021-03-12 15:05:47 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@733037{/jobs/job/json,null,AVAILABLE,@Spark}
2021-03-12 15:05:47 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@7728643a{/stages,null,AVAILABLE,@Spark}
2021-03-12 15:05:47 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@320e400{/stages/json,null,AVAILABLE,@Spark}
2021-03-12 15:05:47 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@5167268{/stages/stage,null,AVAILABLE,@Spark}
2021-03-12 15:05:47 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@2c444798{/stages/stage/json,null,AVAILABLE,@Spark}
2021-03-12 15:05:47 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@1af7f54a{/stages/pool,null,AVAILABLE,@Spark}
2021-03-12 15:05:47 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@6ebd78d1{/stages/pool/json,null,AVAILABLE,@Spark}
2021-03-12 15:05:47 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@436390f4{/storage,null,AVAILABLE,@Spark}
2021-03-12 15:05:47 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@4d157787{/storage/json,null,AVAILABLE,@Spark}
2021-03-12 15:05:47 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@68ed96ca{/storage/rdd,null,AVAILABLE,@Spark}
2021-03-12 15:05:47 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@6d1310f6{/storage/rdd/json,null,AVAILABLE,@Spark}
2021-03-12 15:05:47 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@3228d990{/environment,null,AVAILABLE,@Spark}
2021-03-12 15:05:47 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@54e7391d{/environment/json,null,AVAILABLE,@Spark}
2021-03-12 15:05:47 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@50b8ae8d{/executors,null,AVAILABLE,@Spark}
2021-03-12 15:05:47 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@255990cc{/executors/json,null,AVAILABLE,@Spark}
2021-03-12 15:05:47 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@51c929ae{/executors/threadDump,null,AVAILABLE,@Spark}
2021-03-12 15:05:47 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@3c8bdd5b{/executors/threadDump/json,null,AVAILABLE,@Spark}
2021-03-12 15:05:47 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@29d2d081{/static,null,AVAILABLE,@Spark}
2021-03-12 15:05:47 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@60afd40d{/,null,AVAILABLE,@Spark}
2021-03-12 15:05:47 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@28a2a3e7{/api,null,AVAILABLE,@Spark}
2021-03-12 15:05:47 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@33a2499c{/jobs/job/kill,null,AVAILABLE,@Spark}
2021-03-12 15:05:47 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@e72dba7{/stages/stage/kill,null,AVAILABLE,@Spark}
2021-03-12 15:05:47 INFO SparkUI:54 - Bound SparkUI to 0.0.0.0, and started at http://cdh-001:4040
2021-03-12 15:05:47 INFO SparkContext:54 - Added JAR file:/data/my_bdc_apps/spark/examples/jars/spark-examples_2.11-2.3.0.jar at spark://cdh-001:40299/jars/spark-examples_2.11-2.3.0.jar with timestamp 1615532747291
2021-03-12 15:05:47 INFO RMProxy:98 - Connecting to ResourceManager at cdh-002/10.6.2.245:8032
2021-03-12 15:05:48 INFO Client:54 - Requesting a new application from cluster with 4 NodeManagers
2021-03-12 15:05:48 INFO Client:54 - Verifying our application has not requested more than the maximum memory capability of the cluster (8192 MB per container)
2021-03-12 15:05:48 INFO Client:54 - Will allocate AM container, with 896 MB memory including 384 MB overhead
2021-03-12 15:05:48 INFO Client:54 - Setting up container launch context for our AM
2021-03-12 15:05:48 INFO Client:54 - Setting up the launch environment for our AM container
2021-03-12 15:05:48 INFO Client:54 - Preparing resources for our AM container
2021-03-12 15:05:49 WARN Client:66 - Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
2021-03-12 15:05:50 INFO Client:54 - Uploading resource file:/tmp/spark-bf6b7bbf-3512-4522-8b31-47a6a65187ac/__spark_libs__6875100117796151482.zip -> hdfs://nameservice1/user/root/.sparkStaging/application_1615455919840_0004/__spark_libs__6875100117796151482.zip
2021-03-12 15:05:51 INFO Client:54 - Uploading resource file:/tmp/spark-bf6b7bbf-3512-4522-8b31-47a6a65187ac/__spark_conf__38977155164719923.zip -> hdfs://nameservice1/user/root/.sparkStaging/application_1615455919840_0004/__spark_conf__.zip
2021-03-12 15:05:51 INFO SecurityManager:54 - Changing view acls to: root
2021-03-12 15:05:51 INFO SecurityManager:54 - Changing modify acls to: root
2021-03-12 15:05:51 INFO SecurityManager:54 - Changing view acls groups to:
2021-03-12 15:05:51 INFO SecurityManager:54 - Changing modify acls groups to:
2021-03-12 15:05:51 INFO SecurityManager:54 - SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); groups with view permissions: Set(); users with modify permissions: Set(root); groups with modify permissions: Set()
2021-03-12 15:05:51 INFO Client:54 - Submitting application application_1615455919840_0004 to ResourceManager
2021-03-12 15:05:51 INFO YarnClientImpl:273 - Submitted application application_1615455919840_0004
2021-03-12 15:05:51 INFO SchedulerExtensionServices:54 - Starting Yarn extension services with app application_1615455919840_0004 and attemptId None
2021-03-12 15:05:52 INFO Client:54 - Application report for application_1615455919840_0004 (state: ACCEPTED)
2021-03-12 15:05:52 INFO Client:54 -
	 client token: N/A
	 diagnostics: N/A
	 ApplicationMaster host: N/A
	 ApplicationMaster RPC port: -1
	 queue: default
	 start time: 1615532751522
	 final status: UNDEFINED
	 tracking URL: http://cdh-002:8088/proxy/application_1615455919840_0004/
	 user: root
2021-03-12 15:05:53 INFO Client:54 - Application report for application_1615455919840_0004 (state: ACCEPTED)
2021-03-12 15:05:54 INFO Client:54 - Application report for application_1615455919840_0004 (state: ACCEPTED)
2021-03-12 15:05:55 INFO Client:54 - Application report for application_1615455919840_0004 (state: ACCEPTED)
2021-03-12 15:05:55 INFO YarnClientSchedulerBackend:54 - Add WebUI Filter. org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter, Map(PROXY_HOSTS -> cdh-002, PROXY_URI_BASES -> http://cdh-002:8088/proxy/application_1615455919840_0004), /proxy/application_1615455919840_0004
2021-03-12 15:05:55 INFO JettyUtils:54 - Adding filter: org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter
2021-03-12 15:05:56 INFO YarnSchedulerBackend$YarnSchedulerEndpoint:54 - ApplicationMaster registered as NettyRpcEndpointRef(spark-client://YarnAM)
2021-03-12 15:05:56 INFO Client:54 - Application report for application_1615455919840_0004 (state: RUNNING)
2021-03-12 15:05:56 INFO Client:54 -
	 client token: N/A
	 diagnostics: N/A
	 ApplicationMaster host: 10.6.2.248
	 ApplicationMaster RPC port: 0
	 queue: default
	 start time: 1615532751522
	 final status: UNDEFINED
	 tracking URL: http://cdh-002:8088/proxy/application_1615455919840_0004/
	 user: root
2021-03-12 15:05:56 INFO YarnClientSchedulerBackend:54 - Application application_1615455919840_0004 has started running.
2021-03-12 15:05:56 INFO Utils:54 - Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 46294.
2021-03-12 15:05:56 INFO NettyBlockTransferService:54 - Server created on cdh-001:46294
2021-03-12 15:05:56 INFO BlockManager:54 - Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
2021-03-12 15:05:56 INFO BlockManagerMaster:54 - Registering BlockManager BlockManagerId(driver, cdh-001, 46294, None)
2021-03-12 15:05:56 INFO BlockManagerMasterEndpoint:54 - Registering block manager cdh-001:46294 with 140.1 MB RAM, BlockManagerId(driver, cdh-001, 46294, None)
2021-03-12 15:05:56 INFO BlockManagerMaster:54 - Registered BlockManager BlockManagerId(driver, cdh-001, 46294, None)
2021-03-12 15:05:56 INFO BlockManager:54 - Initialized BlockManager: BlockManagerId(driver, cdh-001, 46294, None)
2021-03-12 15:05:56 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@626b639e{/metrics/json,null,AVAILABLE,@Spark}
2021-03-12 15:05:59 INFO YarnClientSchedulerBackend:54 - Add WebUI Filter. org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter, Map(PROXY_HOSTS -> cdh-002, PROXY_URI_BASES -> http://cdh-002:8088/proxy/application_1615455919840_0004), /proxy/application_1615455919840_0004
2021-03-12 15:05:59 INFO JettyUtils:54 - Adding filter: org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter
2021-03-12 15:05:59 INFO YarnSchedulerBackend$YarnSchedulerEndpoint:54 - ApplicationMaster registered as NettyRpcEndpointRef(spark-client://YarnAM)
2021-03-12 15:06:03 ERROR YarnClientSchedulerBackend:70 - Yarn application has already exited with state FINISHED!
2021-03-12 15:06:03 INFO AbstractConnector:318 - Stopped Spark@2a551a63{HTTP/1.1,[http/1.1]}{0.0.0.0:4040}
2021-03-12 15:06:03 INFO SparkUI:54 - Stopped Spark web UI at http://cdh-001:4040
2021-03-12 15:06:03 ERROR TransportClient:233 - Failed to send RPC 8719934078843937066 to /10.6.2.248:50270: java.nio.channels.ClosedChannelException
java.nio.channels.ClosedChannelException
	at io.netty.channel.AbstractChannel$AbstractUnsafe.write(...)(Unknown Source)
2021-03-12 15:06:03 ERROR YarnSchedulerBackend$YarnSchedulerEndpoint:91 - Sending RequestExecutors(0,0,Map(),Set()) to AM was unsuccessful
java.io.IOException: Failed to send RPC 8719934078843937066 to /10.6.2.248:50270: java.nio.channels.ClosedChannelException
	at org.apache.spark.network.client.TransportClient.lambda$sendRpc$2(TransportClient.java:237)
	at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:507)
	at io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:481)
	at io.netty.util.concurrent.DefaultPromise.access$000(DefaultPromise.java:34)
	at io.netty.util.concurrent.DefaultPromise$1.run(DefaultPromise.java:431)
	at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:163)
	at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:403)
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:463)
	at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858)
	at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:138)
	at java.lang.Thread.run(Thread.java:748)
Caused by: java.nio.channels.ClosedChannelException
	at io.netty.channel.AbstractChannel$AbstractUnsafe.write(...)(Unknown Source)
2021-03-12 15:06:03 INFO SchedulerExtensionServices:54 - Stopping SchedulerExtensionServices
(serviceOption=None,services=List(),started=false)
2021-03-12 15:06:03 ERROR Utils:91 - Uncaught exception in thread Yarn application state monitor
org.apache.spark.SparkException: Exception thrown in awaitResult:
	at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:205)
	at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75)
	at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.requestTotalExecutors(CoarseGrainedSchedulerBackend.scala:566)
	at org.apache.spark.scheduler.cluster.YarnSchedulerBackend.stop(YarnSchedulerBackend.scala:95)
	at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.stop(YarnClientSchedulerBackend.scala:155)
	at org.apache.spark.scheduler.TaskSchedulerImpl.stop(TaskSchedulerImpl.scala:508)
	at org.apache.spark.scheduler.DAGScheduler.stop(DAGScheduler.scala:1752)
	at org.apache.spark.SparkContext$$anonfun$stop$8.apply$mcV$sp(SparkContext.scala:1924)
	at org.apache.spark.util.Utils$.tryLogNonFatalError(Utils.scala:1357)
	at org.apache.spark.SparkContext.stop(SparkContext.scala:1923)
	at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend$MonitorThread.run(YarnClientSchedulerBackend.scala:112)
Caused by: java.io.IOException: Failed to send RPC 8719934078843937066 to /10.6.2.248:50270: java.nio.channels.ClosedChannelException
	at org.apache.spark.network.client.TransportClient.lambda$sendRpc$2(TransportClient.java:237)
	at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:507)
	at io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:481)
	at io.netty.util.concurrent.DefaultPromise.access$000(DefaultPromise.java:34)
	at io.netty.util.concurrent.DefaultPromise$1.run(DefaultPromise.java:431)
	at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:163)
	at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:403)
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:463)
	at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858)
	at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:138)
	at java.lang.Thread.run(Thread.java:748)
Caused by: java.nio.channels.ClosedChannelException
	at io.netty.channel.AbstractChannel$AbstractUnsafe.write(...)(Unknown Source)
2021-03-12 15:06:03 INFO MapOutputTrackerMasterEndpoint:54 - MapOutputTrackerMasterEndpoint stopped!
2021-03-12 15:06:03 INFO MemoryStore:54 - MemoryStore cleared
2021-03-12 15:06:03 INFO BlockManager:54 - BlockManager stopped
2021-03-12 15:06:03 INFO BlockManagerMaster:54 - BlockManagerMaster stopped
2021-03-12 15:06:03 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint:54 - OutputCommitCoordinator stopped!
2021-03-12 15:06:03 ERROR SparkContext:91 - Error initializing SparkContext.
java.lang.IllegalStateException: Spark context stopped while waiting for backend
	at org.apache.spark.scheduler.TaskSchedulerImpl.waitBackendReady(TaskSchedulerImpl.scala:669)
	at org.apache.spark.scheduler.TaskSchedulerImpl.postStartHook(TaskSchedulerImpl.scala:177)
	at org.apache.spark.SparkContext.<init>(SparkContext.scala:558)
	at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2486)
	at org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:930)
	at org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:921)
	at scala.Option.getOrElse(Option.scala:121)
	at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:921)
	at org.apache.spark.examples.SparkPi$.main(SparkPi.scala:31)
	at org.apache.spark.examples.SparkPi.main(SparkPi.scala)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
	at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:879)
	at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:197)
	at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:227)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:136)
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
2021-03-12 15:06:03 INFO SparkContext:54 - SparkContext already stopped.
2021-03-12 15:06:03 INFO SparkContext:54 - Successfully stopped SparkContext
Exception in thread "main" java.lang.IllegalStateException: Spark context stopped while waiting for backend
	at org.apache.spark.scheduler.TaskSchedulerImpl.waitBackendReady(TaskSchedulerImpl.scala:669)
	at org.apache.spark.scheduler.TaskSchedulerImpl.postStartHook(TaskSchedulerImpl.scala:177)
	at org.apache.spark.SparkContext.<init>(SparkContext.scala:558)
	at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2486)
	at org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:930)
	at org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:921)
	at scala.Option.getOrElse(Option.scala:121)
	at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:921)
	at org.apache.spark.examples.SparkPi$.main(SparkPi.scala:31)
	at org.apache.spark.examples.SparkPi.main(SparkPi.scala)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
	at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:879)
	at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:197)
	at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:227)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:136)
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
2021-03-12 15:06:03 INFO ShutdownHookManager:54 - Shutdown hook called
2021-03-12 15:06:03 INFO ShutdownHookManager:54 - Deleting directory /tmp/spark-ad5a6ab4-8aa4-4c06-a783-02195cd8c569
2021-03-12 15:06:03 INFO ShutdownHookManager:54 - Deleting directory /tmp/spark-bf6b7bbf-3512-4522-8b31-47a6a65187ac
My guess: after the full set of Hadoop ecosystem components was installed, Spark was added to the same server; during spark-submit the available system memory is checked, and when it is found insufficient, the error above is raised.
Because the Hadoop stack left little free memory, I had configured the following Spark parameters:
- --driver-memory 600m
- --executor-memory 600m
By pushing these two parameters to different extreme values, the logs reveal how much memory the program needs versus the server's current limits. The test results follow:
① First test
[root@cdh-001 conf]# spark-submit --class org.apache.spark.examples.SparkPi --master yarn --deploy-mode client --driver-memory 500m --executor-memory 100m --executor-cores 2 $SPARK_HOME/examples/jars/spark-examples*.jar 10
2021-03-12 15:05:17 WARN NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2021-03-12 15:05:18 INFO SparkContext:54 - Running Spark version 2.3.0
2021-03-12 15:05:18 INFO SparkContext:54 - Submitted application: Spark Pi
2021-03-12 15:05:18 INFO SecurityManager:54 - Changing view acls to: root
2021-03-12 15:05:18 INFO SecurityManager:54 - Changing modify acls to: root
2021-03-12 15:05:18 INFO SecurityManager:54 - Changing view acls groups to:
2021-03-12 15:05:18 INFO SecurityManager:54 - Changing modify acls groups to:
2021-03-12 15:05:18 INFO SecurityManager:54 - SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); groups with view permissions: Set(); users with modify permissions: Set(root); groups with modify permissions: Set()
2021-03-12 15:05:18 INFO Utils:54 - Successfully started service 'sparkDriver' on port 43552.
2021-03-12 15:05:18 INFO SparkEnv:54 - Registering MapOutputTracker
2021-03-12 15:05:18 ERROR SparkContext:91 - Error initializing SparkContext.
java.lang.IllegalArgumentException: System memory 466092032 must be at least 471859200. Please increase heap size using the --driver-memory option or spark.driver.memory in Spark configuration.
	at org.apache.spark.memory.UnifiedMemoryManager$.getMaxMemory(UnifiedMemoryManager.scala:217)
	at org.apache.spark.memory.UnifiedMemoryManager$.apply(UnifiedMemoryManager.scala:199)
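A minimal sketch of where that 471859200 comes from: in Spark 2.x, UnifiedMemoryManager reserves 300 MiB for internal use and requires the heap to be at least 1.5 times that reserved amount. (The constants match the Spark source; the exact heap figure 466092032 is what the JVM reported for -Xmx500m after subtracting a survivor space, which is JVM-dependent.)

```python
# Spark 2.x minimum-heap check, reconstructed from
# UnifiedMemoryManager.getMaxMemory in the Spark source.
RESERVED_SYSTEM_MEMORY = 300 * 1024 * 1024        # 300 MiB reserved for Spark internals
MIN_SYSTEM_MEMORY = int(RESERVED_SYSTEM_MEMORY * 1.5)  # hard floor for the driver heap

print(MIN_SYSTEM_MEMORY)   # 471859200, the number in the error message

# With --driver-memory 500m the JVM reports slightly less than 500 MiB
# (466092032 bytes here), which falls below the floor and aborts startup;
# 600m clears it.
assert 466092032 < MIN_SYSTEM_MEMORY < 600 * 1024 * 1024
```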
② Second test
[root@cdh-001 conf]# spark-submit --class org.apache.spark.examples.SparkPi --master yarn --deploy-mode client --driver-memory 600m --executor-memory 100m --executor-cores 2 $SPARK_HOME/examples/jars/spark-examples*.jar 10
2021-03-12 15:05:31 WARN NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2021-03-12 15:05:31 INFO SparkContext:54 - Running Spark version 2.3.0
2021-03-12 15:05:31 INFO SparkContext:54 - Submitted application: Spark Pi
2021-03-12 15:05:31 INFO SecurityManager:54 - Changing view acls to: root
2021-03-12 15:05:31 INFO SecurityManager:54 - Changing modify acls to: root
2021-03-12 15:05:31 INFO SecurityManager:54 - Changing view acls groups to:
2021-03-12 15:05:31 INFO SecurityManager:54 - Changing modify acls groups to:
2021-03-12 15:05:31 INFO SecurityManager:54 - SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); groups with view permissions: Set(); users with modify permissions: Set(root); groups with modify permissions: Set()
2021-03-12 15:05:31 INFO Utils:54 - Successfully started service 'sparkDriver' on port 39530.
2021-03-12 15:05:31 INFO SparkEnv:54 - Registering MapOutputTracker
2021-03-12 15:05:31 ERROR SparkContext:91 - Error initializing SparkContext.
java.lang.IllegalArgumentException: Executor memory 104857600 must be at least 471859200. Please increase executor memory using the --executor-memory option or spark.executor.memory in Spark configuration.
	at org.apache.spark.memory.UnifiedMemoryManager$.getMaxMemory(UnifiedMemoryManager.scala:225)
	at org.apache.spark.memory.UnifiedMemoryManager$.apply(UnifiedMemoryManager.scala:199)
	at org.apache.spark.SparkEnv$.create(SparkEnv.scala:330)
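The same 450 MiB floor applies to executors, which explains this second failure, since 100 MiB is far below it:

```python
# Executor-side version of the Spark 2.x minimum-memory check (a sketch;
# the 300 MiB reserved constant and 1.5x factor come from the Spark source).
EXECUTOR_MEMORY = 100 * 1024 * 1024               # what --executor-memory 100m provides
MIN_MEMORY = int(300 * 1024 * 1024 * 1.5)         # the 471859200-byte floor

assert EXECUTOR_MEMORY == 104857600               # the value quoted in the error
assert EXECUTOR_MEMORY < MIN_MEMORY == 471859200  # hence the IllegalArgumentException
```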
On the surface, the server simply cannot spare the requested memory when running Spark.
The machine has 16 GB of RAM, and even with the full stack installed, free -g shows more available memory than the limits quoted in the errors.
But since the job runs as Spark on YARN, could a YARN configuration limit or misconfiguration be the cause?
It turns out the YARN NodeManager checks whether memory can be allocated for each submitted task (here, Spark on YARN), covering both physical and virtual memory; on a machine without much memory headroom, these checks can fail.
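How the NodeManager's virtual-memory check works can be sketched as follows. The 2.1 factor is YARN's default yarn.nodemanager.vmem-pmem-ratio; the 896 MB figure comes from the AM container line in the log above ("896 MB memory including 384 MB overhead").

```python
# NodeManager vmem check, sketched: a container is killed when its
# virtual-memory footprint exceeds pmem * yarn.nodemanager.vmem-pmem-ratio.
def vmem_limit_mb(container_pmem_mb, vmem_pmem_ratio=2.1):
    """Virtual-memory ceiling (MB) the NodeManager enforces per container."""
    return container_pmem_mb * vmem_pmem_ratio

# The AM container from the log above was allocated 896 MB physical memory.
limit = vmem_limit_mb(896)
print(round(limit, 1))   # 1881.6 MB; a JVM whose vmem usage exceeds this is killed
```

JVMs routinely reserve far more virtual memory than physical memory (code cache, thread stacks, mapped libraries), which is why this check trips on small machines even when free -g looks healthy.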
Of course, you can try disabling these checks so the program starts without the memory pre-check (constrained by physical memory, it may still fail later).
Solution:
Add the following to yarn-site.xml and restart YARN; with this in place the job runs successfully with --driver-memory 600m --executor-memory 600m:
<property>
    <name>yarn.nodemanager.pmem-check-enabled</name>
    <value>false</value>
</property>
<property>
    <name>yarn.nodemanager.vmem-check-enabled</name>
    <value>false</value>
</property>
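If disabling both checks outright feels too blunt, a softer alternative worth testing on your own cluster (not something verified in the log above) is to keep the checks on but raise the virtual-memory allowance per container:

```xml
<!-- Alternative: keep the memory checks but allow more virtual memory
     per unit of physical memory (YARN's default ratio is 2.1). -->
<property>
    <name>yarn.nodemanager.vmem-pmem-ratio</name>
    <value>4</value>
</property>
```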