更多代码请见:https://github.com/xubo245/SparkLearning

Spark代码2之Transformation:union,distinct,join

代码:

package LocalSpark/*** Created by xubo on 2016/3/3.*/
import org.apache.spark._
import org.apache.spark.network.netty.SparkTransportConfobject Transformation1 {def main(args:Array[String]): Unit ={val conf =new SparkConf().setAppName("Transformation1").setMaster("local")val spark=new SparkContext(conf)//unionvar a1=spark.parallelize(List(('a',1),('b',1)))var a2=spark.parallelize(List(('c',1),('d',1)))val result= a1.union(a2)println(result.count())println(result.collect().length)for(i<-0 until result.collect().length) (result.collect())(i)for((i,j)<- result.collect()) println(i+":"+j)//distinctvar a3=spark.parallelize(List(('a',1),('b',1),('a',1)))var a4=spark.parallelize(List(('c',1),('d',1),('b',1),('b',2),('b',3),('a',1),('a',2)))val r2=a3.distinct()for((i,j)<-a3) println("a3:"+i+":"+j)for((i,j)<-r2) println("r2:"+i+":"+j)//var j1=a3.join(a4)for((i,(j,k))<-j1) println("j1:"+i+":"+j+":"+k)}
}

运行结果:

D:\1win7\java\jdk\bin\java -Didea.launcher.port=7533 "-Didea.launcher.bin.path=D:\1win7\idea\IntelliJ IDEA Community Edition 15.0.4\bin" -Dfile.encoding=UTF-8 -classpath "D:\1win7\java\jdk\jre\lib\charsets.jar;D:\1win7\java\jdk\jre\lib\deploy.jar;D:\1win7\java\jdk\jre\lib\ext\access-bridge-64.jar;D:\1win7\java\jdk\jre\lib\ext\dnsns.jar;D:\1win7\java\jdk\jre\lib\ext\jaccess.jar;D:\1win7\java\jdk\jre\lib\ext\localedata.jar;D:\1win7\java\jdk\jre\lib\ext\sunec.jar;D:\1win7\java\jdk\jre\lib\ext\sunjce_provider.jar;D:\1win7\java\jdk\jre\lib\ext\sunmscapi.jar;D:\1win7\java\jdk\jre\lib\ext\zipfs.jar;D:\1win7\java\jdk\jre\lib\javaws.jar;D:\1win7\java\jdk\jre\lib\jce.jar;D:\1win7\java\jdk\jre\lib\jfr.jar;D:\1win7\java\jdk\jre\lib\jfxrt.jar;D:\1win7\java\jdk\jre\lib\jsse.jar;D:\1win7\java\jdk\jre\lib\management-agent.jar;D:\1win7\java\jdk\jre\lib\plugin.jar;D:\1win7\java\jdk\jre\lib\resources.jar;D:\1win7\java\jdk\jre\lib\rt.jar;D:\1win7\scala;D:\1win7\scala\lib;D:\all\idea\scala2\out\production\scala2;G:\149\spark-assembly-1.5.2-hadoop2.6.0.jar;D:\1win7\scala\lib\scala-actors-migration.jar;D:\1win7\scala\lib\scala-actors.jar;D:\1win7\scala\lib\scala-library.jar;D:\1win7\scala\lib\scala-reflect.jar;D:\1win7\scala\lib\scala-swing.jar;D:\1win7\idea\IntelliJ IDEA Community Edition 15.0.4\lib\idea_rt.jar" com.intellij.rt.execution.application.AppMain LocalSpark.Transformation1
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
16/03/03 22:19:55 INFO SparkContext: Running Spark version 1.5.2
16/03/03 22:19:55 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/03/03 22:19:56 INFO SecurityManager: Changing view acls to: xubo
16/03/03 22:19:56 INFO SecurityManager: Changing modify acls to: xubo
16/03/03 22:19:56 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(xubo); users with modify permissions: Set(xubo)
16/03/03 22:19:57 INFO Slf4jLogger: Slf4jLogger started
16/03/03 22:19:57 INFO Remoting: Starting remoting
16/03/03 22:19:57 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriver@202.38.84.241:50601]
16/03/03 22:19:57 INFO Utils: Successfully started service 'sparkDriver' on port 50601.
16/03/03 22:19:57 INFO SparkEnv: Registering MapOutputTracker
16/03/03 22:19:57 INFO SparkEnv: Registering BlockManagerMaster
16/03/03 22:19:57 INFO DiskBlockManager: Created local directory at C:\Users\xubo\AppData\Local\Temp\blockmgr-d0b75ef9-8481-448f-8f00-149fde368635
16/03/03 22:19:57 INFO MemoryStore: MemoryStore started with capacity 730.6 MB
16/03/03 22:19:57 INFO HttpFileServer: HTTP File server directory is C:\Users\xubo\AppData\Local\Temp\spark-c2b62943-1802-4092-a678-704b95d428ca\httpd-38a16729-4063-4ff9-a3e9-336e27bd386f
16/03/03 22:19:57 INFO HttpServer: Starting HTTP Server
16/03/03 22:19:57 INFO Utils: Successfully started service 'HTTP file server' on port 50602.
16/03/03 22:19:57 INFO SparkEnv: Registering OutputCommitCoordinator
16/03/03 22:19:57 INFO Utils: Successfully started service 'SparkUI' on port 4040.
16/03/03 22:19:57 INFO SparkUI: Started SparkUI at http://202.38.84.241:4040
16/03/03 22:19:58 WARN MetricsSystem: Using default name DAGScheduler for source because spark.app.id is not set.
16/03/03 22:19:58 INFO Executor: Starting executor ID driver on host localhost
16/03/03 22:19:58 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 50609.
16/03/03 22:19:58 INFO NettyBlockTransferService: Server created on 50609
16/03/03 22:19:58 INFO BlockManagerMaster: Trying to register BlockManager
16/03/03 22:19:58 INFO BlockManagerMasterEndpoint: Registering block manager localhost:50609 with 730.6 MB RAM, BlockManagerId(driver, localhost, 50609)
16/03/03 22:19:58 INFO BlockManagerMaster: Registered BlockManager
16/03/03 22:19:58 INFO SparkContext: Starting job: count at Transformation1.scala:17
16/03/03 22:19:58 INFO DAGScheduler: Got job 0 (count at Transformation1.scala:17) with 2 output partitions
16/03/03 22:19:58 INFO DAGScheduler: Final stage: ResultStage 0(count at Transformation1.scala:17)
16/03/03 22:19:58 INFO DAGScheduler: Parents of final stage: List()
16/03/03 22:19:58 INFO DAGScheduler: Missing parents: List()
16/03/03 22:19:58 INFO DAGScheduler: Submitting ResultStage 0 (UnionRDD[2] at union at Transformation1.scala:16), which has no missing parents
16/03/03 22:19:59 INFO MemoryStore: ensureFreeSpace(1784) called with curMem=0, maxMem=766075207
16/03/03 22:19:59 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 1784.0 B, free 730.6 MB)
16/03/03 22:19:59 INFO MemoryStore: ensureFreeSpace(1202) called with curMem=1784, maxMem=766075207
16/03/03 22:19:59 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 1202.0 B, free 730.6 MB)
16/03/03 22:19:59 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on localhost:50609 (size: 1202.0 B, free: 730.6 MB)
16/03/03 22:19:59 INFO SparkContext: Created broadcast 0 from broadcast at DAGScheduler.scala:861
16/03/03 22:19:59 INFO DAGScheduler: Submitting 2 missing tasks from ResultStage 0 (UnionRDD[2] at union at Transformation1.scala:16)
16/03/03 22:19:59 INFO TaskSchedulerImpl: Adding task set 0.0 with 2 tasks
16/03/03 22:19:59 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, localhost, PROCESS_LOCAL, 2322 bytes)
16/03/03 22:19:59 INFO Executor: Running task 0.0 in stage 0.0 (TID 0)
16/03/03 22:19:59 INFO Executor: Finished task 0.0 in stage 0.0 (TID 0). 953 bytes result sent to driver
16/03/03 22:19:59 INFO TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, localhost, PROCESS_LOCAL, 2322 bytes)
16/03/03 22:19:59 INFO Executor: Running task 1.0 in stage 0.0 (TID 1)
16/03/03 22:19:59 INFO Executor: Finished task 1.0 in stage 0.0 (TID 1). 953 bytes result sent to driver
16/03/03 22:19:59 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 89 ms on localhost (1/2)
16/03/03 22:19:59 INFO TaskSetManager: Finished task 1.0 in stage 0.0 (TID 1) in 17 ms on localhost (2/2)
16/03/03 22:19:59 INFO DAGScheduler: ResultStage 0 (count at Transformation1.scala:17) finished in 0.110 s
16/03/03 22:19:59 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
16/03/03 22:19:59 INFO DAGScheduler: Job 0 finished: count at Transformation1.scala:17, took 0.619731 s
4
16/03/03 22:19:59 INFO SparkContext: Starting job: collect at Transformation1.scala:18
16/03/03 22:19:59 INFO DAGScheduler: Got job 1 (collect at Transformation1.scala:18) with 2 output partitions
16/03/03 22:19:59 INFO DAGScheduler: Final stage: ResultStage 1(collect at Transformation1.scala:18)
16/03/03 22:19:59 INFO DAGScheduler: Parents of final stage: List()
16/03/03 22:19:59 INFO DAGScheduler: Missing parents: List()
16/03/03 22:19:59 INFO DAGScheduler: Submitting ResultStage 1 (UnionRDD[2] at union at Transformation1.scala:16), which has no missing parents
16/03/03 22:19:59 INFO MemoryStore: ensureFreeSpace(1928) called with curMem=2986, maxMem=766075207
16/03/03 22:19:59 INFO MemoryStore: Block broadcast_1 stored as values in memory (estimated size 1928.0 B, free 730.6 MB)
16/03/03 22:19:59 INFO MemoryStore: ensureFreeSpace(1246) called with curMem=4914, maxMem=766075207
16/03/03 22:19:59 INFO MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 1246.0 B, free 730.6 MB)
16/03/03 22:19:59 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on localhost:50609 (size: 1246.0 B, free: 730.6 MB)
16/03/03 22:19:59 INFO SparkContext: Created broadcast 1 from broadcast at DAGScheduler.scala:861
16/03/03 22:19:59 INFO DAGScheduler: Submitting 2 missing tasks from ResultStage 1 (UnionRDD[2] at union at Transformation1.scala:16)
16/03/03 22:19:59 INFO TaskSchedulerImpl: Adding task set 1.0 with 2 tasks
16/03/03 22:19:59 INFO TaskSetManager: Starting task 0.0 in stage 1.0 (TID 2, localhost, PROCESS_LOCAL, 2322 bytes)
16/03/03 22:19:59 INFO Executor: Running task 0.0 in stage 1.0 (TID 2)
16/03/03 22:19:59 INFO Executor: Finished task 0.0 in stage 1.0 (TID 2). 1057 bytes result sent to driver
16/03/03 22:19:59 INFO TaskSetManager: Starting task 1.0 in stage 1.0 (TID 3, localhost, PROCESS_LOCAL, 2322 bytes)
16/03/03 22:19:59 INFO Executor: Running task 1.0 in stage 1.0 (TID 3)
16/03/03 22:19:59 INFO TaskSetManager: Finished task 0.0 in stage 1.0 (TID 2) in 15 ms on localhost (1/2)
16/03/03 22:19:59 INFO Executor: Finished task 1.0 in stage 1.0 (TID 3). 1057 bytes result sent to driver
16/03/03 22:19:59 INFO TaskSetManager: Finished task 1.0 in stage 1.0 (TID 3) in 13 ms on localhost (2/2)
16/03/03 22:19:59 INFO TaskSchedulerImpl: Removed TaskSet 1.0, whose tasks have all completed, from pool
16/03/03 22:19:59 INFO DAGScheduler: ResultStage 1 (collect at Transformation1.scala:18) finished in 0.030 s
16/03/03 22:19:59 INFO DAGScheduler: Job 1 finished: collect at Transformation1.scala:18, took 0.047917 s
4
16/03/03 22:19:59 INFO SparkContext: Starting job: collect at Transformation1.scala:19
16/03/03 22:19:59 INFO DAGScheduler: Got job 2 (collect at Transformation1.scala:19) with 2 output partitions
16/03/03 22:19:59 INFO DAGScheduler: Final stage: ResultStage 2(collect at Transformation1.scala:19)
16/03/03 22:19:59 INFO DAGScheduler: Parents of final stage: List()
16/03/03 22:19:59 INFO DAGScheduler: Missing parents: List()
16/03/03 22:19:59 INFO DAGScheduler: Submitting ResultStage 2 (UnionRDD[2] at union at Transformation1.scala:16), which has no missing parents
16/03/03 22:19:59 INFO MemoryStore: ensureFreeSpace(1928) called with curMem=6160, maxMem=766075207
16/03/03 22:19:59 INFO MemoryStore: Block broadcast_2 stored as values in memory (estimated size 1928.0 B, free 730.6 MB)
16/03/03 22:19:59 INFO MemoryStore: ensureFreeSpace(1246) called with curMem=8088, maxMem=766075207
16/03/03 22:19:59 INFO MemoryStore: Block broadcast_2_piece0 stored as bytes in memory (estimated size 1246.0 B, free 730.6 MB)
16/03/03 22:19:59 INFO BlockManagerInfo: Added broadcast_2_piece0 in memory on localhost:50609 (size: 1246.0 B, free: 730.6 MB)
16/03/03 22:19:59 INFO SparkContext: Created broadcast 2 from broadcast at DAGScheduler.scala:861
16/03/03 22:19:59 INFO DAGScheduler: Submitting 2 missing tasks from ResultStage 2 (UnionRDD[2] at union at Transformation1.scala:16)
16/03/03 22:19:59 INFO TaskSchedulerImpl: Adding task set 2.0 with 2 tasks
16/03/03 22:19:59 INFO TaskSetManager: Starting task 0.0 in stage 2.0 (TID 4, localhost, PROCESS_LOCAL, 2322 bytes)
16/03/03 22:19:59 INFO Executor: Running task 0.0 in stage 2.0 (TID 4)
16/03/03 22:19:59 INFO Executor: Finished task 0.0 in stage 2.0 (TID 4). 1057 bytes result sent to driver
16/03/03 22:19:59 INFO TaskSetManager: Starting task 1.0 in stage 2.0 (TID 5, localhost, PROCESS_LOCAL, 2322 bytes)
16/03/03 22:19:59 INFO Executor: Running task 1.0 in stage 2.0 (TID 5)
16/03/03 22:19:59 INFO TaskSetManager: Finished task 0.0 in stage 2.0 (TID 4) in 15 ms on localhost (1/2)
16/03/03 22:19:59 INFO Executor: Finished task 1.0 in stage 2.0 (TID 5). 1057 bytes result sent to driver
16/03/03 22:19:59 INFO TaskSetManager: Finished task 1.0 in stage 2.0 (TID 5) in 11 ms on localhost (2/2)
16/03/03 22:19:59 INFO DAGScheduler: ResultStage 2 (collect at Transformation1.scala:19) finished in 0.022 s
16/03/03 22:19:59 INFO TaskSchedulerImpl: Removed TaskSet 2.0, whose tasks have all completed, from pool
16/03/03 22:19:59 INFO DAGScheduler: Job 2 finished: collect at Transformation1.scala:19, took 0.038511 s
16/03/03 22:19:59 INFO SparkContext: Starting job: collect at Transformation1.scala:19
16/03/03 22:19:59 INFO DAGScheduler: Got job 3 (collect at Transformation1.scala:19) with 2 output partitions
16/03/03 22:19:59 INFO DAGScheduler: Final stage: ResultStage 3(collect at Transformation1.scala:19)
16/03/03 22:19:59 INFO DAGScheduler: Parents of final stage: List()
16/03/03 22:19:59 INFO DAGScheduler: Missing parents: List()
16/03/03 22:19:59 INFO DAGScheduler: Submitting ResultStage 3 (UnionRDD[2] at union at Transformation1.scala:16), which has no missing parents
16/03/03 22:19:59 INFO MemoryStore: ensureFreeSpace(1928) called with curMem=9334, maxMem=766075207
16/03/03 22:19:59 INFO MemoryStore: Block broadcast_3 stored as values in memory (estimated size 1928.0 B, free 730.6 MB)
16/03/03 22:19:59 INFO MemoryStore: ensureFreeSpace(1246) called with curMem=11262, maxMem=766075207
16/03/03 22:19:59 INFO MemoryStore: Block broadcast_3_piece0 stored as bytes in memory (estimated size 1246.0 B, free 730.6 MB)
16/03/03 22:19:59 INFO BlockManagerInfo: Added broadcast_3_piece0 in memory on localhost:50609 (size: 1246.0 B, free: 730.6 MB)
16/03/03 22:19:59 INFO SparkContext: Created broadcast 3 from broadcast at DAGScheduler.scala:861
16/03/03 22:19:59 INFO DAGScheduler: Submitting 2 missing tasks from ResultStage 3 (UnionRDD[2] at union at Transformation1.scala:16)
16/03/03 22:19:59 INFO TaskSchedulerImpl: Adding task set 3.0 with 2 tasks
16/03/03 22:19:59 INFO TaskSetManager: Starting task 0.0 in stage 3.0 (TID 6, localhost, PROCESS_LOCAL, 2322 bytes)
16/03/03 22:19:59 INFO Executor: Running task 0.0 in stage 3.0 (TID 6)
16/03/03 22:19:59 INFO Executor: Finished task 0.0 in stage 3.0 (TID 6). 1057 bytes result sent to driver
16/03/03 22:19:59 INFO TaskSetManager: Starting task 1.0 in stage 3.0 (TID 7, localhost, PROCESS_LOCAL, 2322 bytes)
16/03/03 22:19:59 INFO TaskSetManager: Finished task 0.0 in stage 3.0 (TID 6) in 17 ms on localhost (1/2)
16/03/03 22:19:59 INFO Executor: Running task 1.0 in stage 3.0 (TID 7)
16/03/03 22:19:59 INFO Executor: Finished task 1.0 in stage 3.0 (TID 7). 1057 bytes result sent to driver
16/03/03 22:19:59 INFO TaskSetManager: Finished task 1.0 in stage 3.0 (TID 7) in 15 ms on localhost (2/2)
16/03/03 22:19:59 INFO TaskSchedulerImpl: Removed TaskSet 3.0, whose tasks have all completed, from pool
16/03/03 22:19:59 INFO DAGScheduler: ResultStage 3 (collect at Transformation1.scala:19) finished in 0.026 s
16/03/03 22:19:59 INFO DAGScheduler: Job 3 finished: collect at Transformation1.scala:19, took 0.126468 s
16/03/03 22:19:59 INFO SparkContext: Starting job: collect at Transformation1.scala:19
16/03/03 22:19:59 INFO DAGScheduler: Got job 4 (collect at Transformation1.scala:19) with 2 output partitions
16/03/03 22:19:59 INFO DAGScheduler: Final stage: ResultStage 4(collect at Transformation1.scala:19)
16/03/03 22:19:59 INFO DAGScheduler: Parents of final stage: List()
16/03/03 22:19:59 INFO DAGScheduler: Missing parents: List()
16/03/03 22:19:59 INFO DAGScheduler: Submitting ResultStage 4 (UnionRDD[2] at union at Transformation1.scala:16), which has no missing parents
16/03/03 22:19:59 INFO MemoryStore: ensureFreeSpace(1928) called with curMem=12508, maxMem=766075207
16/03/03 22:19:59 INFO MemoryStore: Block broadcast_4 stored as values in memory (estimated size 1928.0 B, free 730.6 MB)
16/03/03 22:20:00 INFO MemoryStore: ensureFreeSpace(1246) called with curMem=14436, maxMem=766075207
16/03/03 22:20:00 INFO MemoryStore: Block broadcast_4_piece0 stored as bytes in memory (estimated size 1246.0 B, free 730.6 MB)
16/03/03 22:20:00 INFO BlockManagerInfo: Added broadcast_4_piece0 in memory on localhost:50609 (size: 1246.0 B, free: 730.6 MB)
16/03/03 22:20:00 INFO SparkContext: Created broadcast 4 from broadcast at DAGScheduler.scala:861
16/03/03 22:20:00 INFO DAGScheduler: Submitting 2 missing tasks from ResultStage 4 (UnionRDD[2] at union at Transformation1.scala:16)
16/03/03 22:20:00 INFO TaskSchedulerImpl: Adding task set 4.0 with 2 tasks
16/03/03 22:20:00 INFO TaskSetManager: Starting task 0.0 in stage 4.0 (TID 8, localhost, PROCESS_LOCAL, 2322 bytes)
16/03/03 22:20:00 INFO Executor: Running task 0.0 in stage 4.0 (TID 8)
16/03/03 22:20:00 INFO Executor: Finished task 0.0 in stage 4.0 (TID 8). 1057 bytes result sent to driver
16/03/03 22:20:00 INFO TaskSetManager: Starting task 1.0 in stage 4.0 (TID 9, localhost, PROCESS_LOCAL, 2322 bytes)
16/03/03 22:20:00 INFO Executor: Running task 1.0 in stage 4.0 (TID 9)
16/03/03 22:20:00 INFO TaskSetManager: Finished task 0.0 in stage 4.0 (TID 8) in 13 ms on localhost (1/2)
16/03/03 22:20:00 INFO Executor: Finished task 1.0 in stage 4.0 (TID 9). 1057 bytes result sent to driver
16/03/03 22:20:00 INFO TaskSetManager: Finished task 1.0 in stage 4.0 (TID 9) in 7 ms on localhost (2/2)
16/03/03 22:20:00 INFO TaskSchedulerImpl: Removed TaskSet 4.0, whose tasks have all completed, from pool
16/03/03 22:20:00 INFO DAGScheduler: ResultStage 4 (collect at Transformation1.scala:19) finished in 0.020 s
16/03/03 22:20:00 INFO DAGScheduler: Job 4 finished: collect at Transformation1.scala:19, took 0.193107 s
16/03/03 22:20:00 INFO SparkContext: Starting job: collect at Transformation1.scala:19
16/03/03 22:20:00 INFO DAGScheduler: Got job 5 (collect at Transformation1.scala:19) with 2 output partitions
16/03/03 22:20:00 INFO DAGScheduler: Final stage: ResultStage 5(collect at Transformation1.scala:19)
16/03/03 22:20:00 INFO DAGScheduler: Parents of final stage: List()
16/03/03 22:20:00 INFO DAGScheduler: Missing parents: List()
16/03/03 22:20:00 INFO DAGScheduler: Submitting ResultStage 5 (UnionRDD[2] at union at Transformation1.scala:16), which has no missing parents
16/03/03 22:20:00 INFO MemoryStore: ensureFreeSpace(1928) called with curMem=15682, maxMem=766075207
16/03/03 22:20:00 INFO MemoryStore: Block broadcast_5 stored as values in memory (estimated size 1928.0 B, free 730.6 MB)
16/03/03 22:20:00 INFO MemoryStore: ensureFreeSpace(1246) called with curMem=17610, maxMem=766075207
16/03/03 22:20:00 INFO MemoryStore: Block broadcast_5_piece0 stored as bytes in memory (estimated size 1246.0 B, free 730.6 MB)
16/03/03 22:20:00 INFO BlockManagerInfo: Added broadcast_5_piece0 in memory on localhost:50609 (size: 1246.0 B, free: 730.6 MB)
16/03/03 22:20:00 INFO SparkContext: Created broadcast 5 from broadcast at DAGScheduler.scala:861
16/03/03 22:20:00 INFO DAGScheduler: Submitting 2 missing tasks from ResultStage 5 (UnionRDD[2] at union at Transformation1.scala:16)
16/03/03 22:20:00 INFO TaskSchedulerImpl: Adding task set 5.0 with 2 tasks
16/03/03 22:20:00 INFO TaskSetManager: Starting task 0.0 in stage 5.0 (TID 10, localhost, PROCESS_LOCAL, 2322 bytes)
16/03/03 22:20:00 INFO Executor: Running task 0.0 in stage 5.0 (TID 10)
16/03/03 22:20:00 INFO Executor: Finished task 0.0 in stage 5.0 (TID 10). 1057 bytes result sent to driver
16/03/03 22:20:00 INFO TaskSetManager: Starting task 1.0 in stage 5.0 (TID 11, localhost, PROCESS_LOCAL, 2322 bytes)
16/03/03 22:20:00 INFO Executor: Running task 1.0 in stage 5.0 (TID 11)
16/03/03 22:20:00 INFO TaskSetManager: Finished task 0.0 in stage 5.0 (TID 10) in 17 ms on localhost (1/2)
16/03/03 22:20:00 INFO Executor: Finished task 1.0 in stage 5.0 (TID 11). 1057 bytes result sent to driver
16/03/03 22:20:00 INFO TaskSetManager: Finished task 1.0 in stage 5.0 (TID 11) in 15 ms on localhost (2/2)
16/03/03 22:20:00 INFO DAGScheduler: ResultStage 5 (collect at Transformation1.scala:19) finished in 0.026 s
16/03/03 22:20:00 INFO TaskSchedulerImpl: Removed TaskSet 5.0, whose tasks have all completed, from pool
16/03/03 22:20:00 INFO DAGScheduler: Job 5 finished: collect at Transformation1.scala:19, took 0.107037 s
16/03/03 22:20:00 INFO SparkContext: Starting job: collect at Transformation1.scala:19
16/03/03 22:20:00 INFO DAGScheduler: Got job 6 (collect at Transformation1.scala:19) with 2 output partitions
16/03/03 22:20:00 INFO DAGScheduler: Final stage: ResultStage 6(collect at Transformation1.scala:19)
16/03/03 22:20:00 INFO DAGScheduler: Parents of final stage: List()
16/03/03 22:20:00 INFO DAGScheduler: Missing parents: List()
16/03/03 22:20:00 INFO DAGScheduler: Submitting ResultStage 6 (UnionRDD[2] at union at Transformation1.scala:16), which has no missing parents
16/03/03 22:20:00 INFO MemoryStore: ensureFreeSpace(1928) called with curMem=18856, maxMem=766075207
16/03/03 22:20:00 INFO MemoryStore: Block broadcast_6 stored as values in memory (estimated size 1928.0 B, free 730.6 MB)
16/03/03 22:20:00 INFO MemoryStore: ensureFreeSpace(1246) called with curMem=20784, maxMem=766075207
16/03/03 22:20:00 INFO MemoryStore: Block broadcast_6_piece0 stored as bytes in memory (estimated size 1246.0 B, free 730.6 MB)
16/03/03 22:20:00 INFO BlockManagerInfo: Added broadcast_6_piece0 in memory on localhost:50609 (size: 1246.0 B, free: 730.6 MB)
16/03/03 22:20:00 INFO SparkContext: Created broadcast 6 from broadcast at DAGScheduler.scala:861
16/03/03 22:20:00 INFO DAGScheduler: Submitting 2 missing tasks from ResultStage 6 (UnionRDD[2] at union at Transformation1.scala:16)
16/03/03 22:20:00 INFO TaskSchedulerImpl: Adding task set 6.0 with 2 tasks
16/03/03 22:20:00 INFO TaskSetManager: Starting task 0.0 in stage 6.0 (TID 12, localhost, PROCESS_LOCAL, 2322 bytes)
16/03/03 22:20:00 INFO Executor: Running task 0.0 in stage 6.0 (TID 12)
16/03/03 22:20:00 INFO Executor: Finished task 0.0 in stage 6.0 (TID 12). 1057 bytes result sent to driver
16/03/03 22:20:00 INFO TaskSetManager: Starting task 1.0 in stage 6.0 (TID 13, localhost, PROCESS_LOCAL, 2322 bytes)
16/03/03 22:20:00 INFO Executor: Running task 1.0 in stage 6.0 (TID 13)
16/03/03 22:20:00 INFO TaskSetManager: Finished task 0.0 in stage 6.0 (TID 12) in 10 ms on localhost (1/2)
16/03/03 22:20:00 INFO Executor: Finished task 1.0 in stage 6.0 (TID 13). 1057 bytes result sent to driver
16/03/03 22:20:00 INFO TaskSetManager: Finished task 1.0 in stage 6.0 (TID 13) in 8 ms on localhost (2/2)
16/03/03 22:20:00 INFO DAGScheduler: ResultStage 6 (collect at Transformation1.scala:19) finished in 0.016 s
16/03/03 22:20:00 INFO TaskSchedulerImpl: Removed TaskSet 6.0, whose tasks have all completed, from pool
16/03/03 22:20:00 INFO DAGScheduler: Job 6 finished: collect at Transformation1.scala:19, took 0.137714 s
16/03/03 22:20:00 INFO SparkContext: Starting job: collect at Transformation1.scala:20
16/03/03 22:20:00 INFO DAGScheduler: Got job 7 (collect at Transformation1.scala:20) with 2 output partitions
16/03/03 22:20:00 INFO DAGScheduler: Final stage: ResultStage 7(collect at Transformation1.scala:20)
16/03/03 22:20:00 INFO DAGScheduler: Parents of final stage: List()
16/03/03 22:20:00 INFO DAGScheduler: Missing parents: List()
16/03/03 22:20:00 INFO DAGScheduler: Submitting ResultStage 7 (UnionRDD[2] at union at Transformation1.scala:16), which has no missing parents
16/03/03 22:20:00 INFO MemoryStore: ensureFreeSpace(1928) called with curMem=22030, maxMem=766075207
16/03/03 22:20:00 INFO MemoryStore: Block broadcast_7 stored as values in memory (estimated size 1928.0 B, free 730.6 MB)
16/03/03 22:20:00 INFO MemoryStore: ensureFreeSpace(1246) called with curMem=23958, maxMem=766075207
16/03/03 22:20:00 INFO MemoryStore: Block broadcast_7_piece0 stored as bytes in memory (estimated size 1246.0 B, free 730.6 MB)
16/03/03 22:20:00 INFO BlockManagerInfo: Added broadcast_7_piece0 in memory on localhost:50609 (size: 1246.0 B, free: 730.6 MB)
16/03/03 22:20:00 INFO SparkContext: Created broadcast 7 from broadcast at DAGScheduler.scala:861
16/03/03 22:20:00 INFO DAGScheduler: Submitting 2 missing tasks from ResultStage 7 (UnionRDD[2] at union at Transformation1.scala:16)
16/03/03 22:20:00 INFO TaskSchedulerImpl: Adding task set 7.0 with 2 tasks
16/03/03 22:20:00 INFO TaskSetManager: Starting task 0.0 in stage 7.0 (TID 14, localhost, PROCESS_LOCAL, 2322 bytes)
16/03/03 22:20:00 INFO Executor: Running task 0.0 in stage 7.0 (TID 14)
16/03/03 22:20:00 INFO Executor: Finished task 0.0 in stage 7.0 (TID 14). 1057 bytes result sent to driver
16/03/03 22:20:00 INFO TaskSetManager: Starting task 1.0 in stage 7.0 (TID 15, localhost, PROCESS_LOCAL, 2322 bytes)
16/03/03 22:20:00 INFO Executor: Running task 1.0 in stage 7.0 (TID 15)
16/03/03 22:20:00 INFO TaskSetManager: Finished task 0.0 in stage 7.0 (TID 14) in 9 ms on localhost (1/2)
16/03/03 22:20:00 INFO Executor: Finished task 1.0 in stage 7.0 (TID 15). 1057 bytes result sent to driver
16/03/03 22:20:00 INFO TaskSetManager: Finished task 1.0 in stage 7.0 (TID 15) in 13 ms on localhost (2/2)
16/03/03 22:20:00 INFO DAGScheduler: ResultStage 7 (collect at Transformation1.scala:20) finished in 0.020 s
16/03/03 22:20:00 INFO TaskSchedulerImpl: Removed TaskSet 7.0, whose tasks have all completed, from pool
16/03/03 22:20:00 INFO DAGScheduler: Job 7 finished: collect at Transformation1.scala:20, took 0.032942 s
a:1
b:1
c:1
d:1
16/03/03 22:20:00 INFO SparkContext: Starting job: foreach at Transformation1.scala:26
16/03/03 22:20:00 INFO DAGScheduler: Got job 8 (foreach at Transformation1.scala:26) with 1 output partitions
16/03/03 22:20:00 INFO DAGScheduler: Final stage: ResultStage 8(foreach at Transformation1.scala:26)
16/03/03 22:20:00 INFO DAGScheduler: Parents of final stage: List()
16/03/03 22:20:00 INFO DAGScheduler: Missing parents: List()
16/03/03 22:20:00 INFO DAGScheduler: Submitting ResultStage 8 (MapPartitionsRDD[8] at filter at Transformation1.scala:26), which has no missing parents
16/03/03 22:20:00 INFO MemoryStore: ensureFreeSpace(1816) called with curMem=25204, maxMem=766075207
16/03/03 22:20:00 INFO MemoryStore: Block broadcast_8 stored as values in memory (estimated size 1816.0 B, free 730.6 MB)
16/03/03 22:20:00 INFO MemoryStore: ensureFreeSpace(1143) called with curMem=27020, maxMem=766075207
16/03/03 22:20:00 INFO MemoryStore: Block broadcast_8_piece0 stored as bytes in memory (estimated size 1143.0 B, free 730.6 MB)
16/03/03 22:20:00 INFO BlockManagerInfo: Added broadcast_8_piece0 in memory on localhost:50609 (size: 1143.0 B, free: 730.6 MB)
16/03/03 22:20:00 INFO SparkContext: Created broadcast 8 from broadcast at DAGScheduler.scala:861
16/03/03 22:20:00 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 8 (MapPartitionsRDD[8] at filter at Transformation1.scala:26)
16/03/03 22:20:00 INFO TaskSchedulerImpl: Adding task set 8.0 with 1 tasks
16/03/03 22:20:00 INFO TaskSetManager: Starting task 0.0 in stage 8.0 (TID 16, localhost, PROCESS_LOCAL, 2227 bytes)
16/03/03 22:20:00 INFO Executor: Running task 0.0 in stage 8.0 (TID 16)
16/03/03 22:20:00 INFO Executor: Finished task 0.0 in stage 8.0 (TID 16). 915 bytes result sent to driver
16/03/03 22:20:00 INFO TaskSetManager: Finished task 0.0 in stage 8.0 (TID 16) in 15 ms on localhost (1/1)
16/03/03 22:20:00 INFO TaskSchedulerImpl: Removed TaskSet 8.0, whose tasks have all completed, from pool
16/03/03 22:20:00 INFO DAGScheduler: ResultStage 8 (foreach at Transformation1.scala:26) finished in 0.028 s
a3:a:1
a3:b:1
a3:a:1
16/03/03 22:20:00 INFO DAGScheduler: Job 8 finished: foreach at Transformation1.scala:26, took 0.097583 s
16/03/03 22:20:00 INFO SparkContext: Starting job: foreach at Transformation1.scala:27
16/03/03 22:20:00 INFO DAGScheduler: Registering RDD 5 (distinct at Transformation1.scala:25)
16/03/03 22:20:00 INFO DAGScheduler: Got job 9 (foreach at Transformation1.scala:27) with 1 output partitions
16/03/03 22:20:00 INFO DAGScheduler: Final stage: ResultStage 10(foreach at Transformation1.scala:27)
16/03/03 22:20:00 INFO DAGScheduler: Parents of final stage: List(ShuffleMapStage 9)
16/03/03 22:20:00 INFO DAGScheduler: Missing parents: List(ShuffleMapStage 9)
16/03/03 22:20:00 INFO DAGScheduler: Submitting ShuffleMapStage 9 (MapPartitionsRDD[5] at distinct at Transformation1.scala:25), which has no missing parents
16/03/03 22:20:00 INFO MemoryStore: ensureFreeSpace(2560) called with curMem=28163, maxMem=766075207
16/03/03 22:20:00 INFO MemoryStore: Block broadcast_9 stored as values in memory (estimated size 2.5 KB, free 730.6 MB)
16/03/03 22:20:00 INFO MemoryStore: ensureFreeSpace(1523) called with curMem=30723, maxMem=766075207
16/03/03 22:20:00 INFO MemoryStore: Block broadcast_9_piece0 stored as bytes in memory (estimated size 1523.0 B, free 730.6 MB)
16/03/03 22:20:00 INFO BlockManagerInfo: Added broadcast_9_piece0 in memory on localhost:50609 (size: 1523.0 B, free: 730.6 MB)
16/03/03 22:20:00 INFO SparkContext: Created broadcast 9 from broadcast at DAGScheduler.scala:861
16/03/03 22:20:00 INFO DAGScheduler: Submitting 1 missing tasks from ShuffleMapStage 9 (MapPartitionsRDD[5] at distinct at Transformation1.scala:25)
16/03/03 22:20:00 INFO TaskSchedulerImpl: Adding task set 9.0 with 1 tasks
16/03/03 22:20:00 INFO TaskSetManager: Starting task 0.0 in stage 9.0 (TID 17, localhost, PROCESS_LOCAL, 2216 bytes)
16/03/03 22:20:00 INFO Executor: Running task 0.0 in stage 9.0 (TID 17)
16/03/03 22:20:00 INFO Executor: Finished task 0.0 in stage 9.0 (TID 17). 1158 bytes result sent to driver
16/03/03 22:20:00 INFO TaskSetManager: Finished task 0.0 in stage 9.0 (TID 17) in 62 ms on localhost (1/1)
16/03/03 22:20:00 INFO TaskSchedulerImpl: Removed TaskSet 9.0, whose tasks have all completed, from pool
16/03/03 22:20:00 INFO DAGScheduler: ShuffleMapStage 9 (distinct at Transformation1.scala:25) finished in 0.065 s
16/03/03 22:20:00 INFO DAGScheduler: looking for newly runnable stages
16/03/03 22:20:00 INFO DAGScheduler: running: Set()
16/03/03 22:20:00 INFO DAGScheduler: waiting: Set(ResultStage 10)
16/03/03 22:20:00 INFO DAGScheduler: failed: Set()
16/03/03 22:20:00 INFO DAGScheduler: Missing parents for ResultStage 10: List()
16/03/03 22:20:00 INFO DAGScheduler: Submitting ResultStage 10 (MapPartitionsRDD[9] at filter at Transformation1.scala:27), which is now runnable
16/03/03 22:20:00 INFO MemoryStore: ensureFreeSpace(2920) called with curMem=32246, maxMem=766075207
16/03/03 22:20:00 INFO MemoryStore: Block broadcast_10 stored as values in memory (estimated size 2.9 KB, free 730.6 MB)
16/03/03 22:20:00 INFO MemoryStore: ensureFreeSpace(1656) called with curMem=35166, maxMem=766075207
16/03/03 22:20:00 INFO MemoryStore: Block broadcast_10_piece0 stored as bytes in memory (estimated size 1656.0 B, free 730.6 MB)
16/03/03 22:20:00 INFO BlockManagerInfo: Added broadcast_10_piece0 in memory on localhost:50609 (size: 1656.0 B, free: 730.6 MB)
16/03/03 22:20:00 INFO SparkContext: Created broadcast 10 from broadcast at DAGScheduler.scala:861
16/03/03 22:20:00 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 10 (MapPartitionsRDD[9] at filter at Transformation1.scala:27)
16/03/03 22:20:00 INFO TaskSchedulerImpl: Adding task set 10.0 with 1 tasks
16/03/03 22:20:00 INFO TaskSetManager: Starting task 0.0 in stage 10.0 (TID 18, localhost, PROCESS_LOCAL, 1901 bytes)
16/03/03 22:20:00 INFO Executor: Running task 0.0 in stage 10.0 (TID 18)
16/03/03 22:20:00 INFO ShuffleBlockFetcherIterator: Getting 1 non-empty blocks out of 1 blocks
16/03/03 22:20:00 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 9 ms
16/03/03 22:20:00 INFO Executor: Finished task 0.0 in stage 10.0 (TID 18). 1165 bytes result sent to driver
16/03/03 22:20:00 INFO TaskSetManager: Finished task 0.0 in stage 10.0 (TID 18) in 101 ms on localhost (1/1)
16/03/03 22:20:00 INFO DAGScheduler: ResultStage 10 (foreach at Transformation1.scala:27) finished in 0.102 s
16/03/03 22:20:00 INFO TaskSchedulerImpl: Removed TaskSet 10.0, whose tasks have all completed, from pool
r2:b:1
r2:a:1
16/03/03 22:20:00 INFO DAGScheduler: Job 9 finished: foreach at Transformation1.scala:27, took 0.215763 s
16/03/03 22:20:01 INFO SparkContext: Starting job: foreach at Transformation1.scala:31
16/03/03 22:20:01 INFO DAGScheduler: Registering RDD 3 (parallelize at Transformation1.scala:23)
16/03/03 22:20:01 INFO DAGScheduler: Registering RDD 4 (parallelize at Transformation1.scala:24)
16/03/03 22:20:01 INFO DAGScheduler: Got job 10 (foreach at Transformation1.scala:31) with 1 output partitions
16/03/03 22:20:01 INFO DAGScheduler: Final stage: ResultStage 13(foreach at Transformation1.scala:31)
16/03/03 22:20:01 INFO DAGScheduler: Parents of final stage: List(ShuffleMapStage 12, ShuffleMapStage 11)
16/03/03 22:20:01 INFO DAGScheduler: Missing parents: List(ShuffleMapStage 12, ShuffleMapStage 11)
16/03/03 22:20:01 INFO DAGScheduler: Submitting ShuffleMapStage 11 (ParallelCollectionRDD[3] at parallelize at Transformation1.scala:23), which has no missing parents
16/03/03 22:20:01 INFO MemoryStore: ensureFreeSpace(1520) called with curMem=36822, maxMem=766075207
16/03/03 22:20:01 INFO MemoryStore: Block broadcast_11 stored as values in memory (estimated size 1520.0 B, free 730.5 MB)
16/03/03 22:20:01 INFO MemoryStore: ensureFreeSpace(980) called with curMem=38342, maxMem=766075207
16/03/03 22:20:01 INFO MemoryStore: Block broadcast_11_piece0 stored as bytes in memory (estimated size 980.0 B, free 730.5 MB)
16/03/03 22:20:01 INFO BlockManagerInfo: Added broadcast_11_piece0 in memory on localhost:50609 (size: 980.0 B, free: 730.6 MB)
16/03/03 22:20:01 INFO SparkContext: Created broadcast 11 from broadcast at DAGScheduler.scala:861
16/03/03 22:20:01 INFO DAGScheduler: Submitting 1 missing tasks from ShuffleMapStage 11 (ParallelCollectionRDD[3] at parallelize at Transformation1.scala:23)
16/03/03 22:20:01 INFO TaskSchedulerImpl: Adding task set 11.0 with 1 tasks
16/03/03 22:20:01 INFO DAGScheduler: Submitting ShuffleMapStage 12 (ParallelCollectionRDD[4] at parallelize at Transformation1.scala:24), which has no missing parents
16/03/03 22:20:01 INFO MemoryStore: ensureFreeSpace(1520) called with curMem=39322, maxMem=766075207
16/03/03 22:20:01 INFO MemoryStore: Block broadcast_12 stored as values in memory (estimated size 1520.0 B, free 730.5 MB)
16/03/03 22:20:01 INFO MemoryStore: ensureFreeSpace(984) called with curMem=40842, maxMem=766075207
16/03/03 22:20:01 INFO MemoryStore: Block broadcast_12_piece0 stored as bytes in memory (estimated size 984.0 B, free 730.5 MB)
16/03/03 22:20:01 INFO BlockManagerInfo: Added broadcast_12_piece0 in memory on localhost:50609 (size: 984.0 B, free: 730.6 MB)
16/03/03 22:20:01 INFO TaskSetManager: Starting task 0.0 in stage 11.0 (TID 19, localhost, PROCESS_LOCAL, 2216 bytes)
16/03/03 22:20:01 INFO Executor: Running task 0.0 in stage 11.0 (TID 19)
16/03/03 22:20:01 INFO SparkContext: Created broadcast 12 from broadcast at DAGScheduler.scala:861
16/03/03 22:20:01 INFO DAGScheduler: Submitting 1 missing tasks from ShuffleMapStage 12 (ParallelCollectionRDD[4] at parallelize at Transformation1.scala:24)
16/03/03 22:20:01 INFO TaskSchedulerImpl: Adding task set 12.0 with 1 tasks
16/03/03 22:20:01 INFO Executor: Finished task 0.0 in stage 11.0 (TID 19). 1158 bytes result sent to driver
16/03/03 22:20:01 INFO TaskSetManager: Starting task 0.0 in stage 12.0 (TID 20, localhost, PROCESS_LOCAL, 2272 bytes)
16/03/03 22:20:01 INFO Executor: Running task 0.0 in stage 12.0 (TID 20)
16/03/03 22:20:01 INFO TaskSetManager: Finished task 0.0 in stage 11.0 (TID 19) in 54 ms on localhost (1/1)
16/03/03 22:20:01 INFO TaskSchedulerImpl: Removed TaskSet 11.0, whose tasks have all completed, from pool
16/03/03 22:20:01 INFO DAGScheduler: ShuffleMapStage 11 (parallelize at Transformation1.scala:23) finished in 0.060 s
16/03/03 22:20:01 INFO DAGScheduler: looking for newly runnable stages
16/03/03 22:20:01 INFO DAGScheduler: running: Set(ShuffleMapStage 12)
16/03/03 22:20:01 INFO DAGScheduler: waiting: Set(ResultStage 13)
16/03/03 22:20:01 INFO DAGScheduler: failed: Set()
16/03/03 22:20:01 INFO DAGScheduler: Missing parents for ResultStage 13: List(ShuffleMapStage 12)
16/03/03 22:20:01 INFO Executor: Finished task 0.0 in stage 12.0 (TID 20). 1158 bytes result sent to driver
16/03/03 22:20:01 INFO TaskSetManager: Finished task 0.0 in stage 12.0 (TID 20) in 24 ms on localhost (1/1)
16/03/03 22:20:01 INFO TaskSchedulerImpl: Removed TaskSet 12.0, whose tasks have all completed, from pool
16/03/03 22:20:01 INFO DAGScheduler: ShuffleMapStage 12 (parallelize at Transformation1.scala:24) finished in 0.061 s
16/03/03 22:20:01 INFO DAGScheduler: looking for newly runnable stages
16/03/03 22:20:01 INFO DAGScheduler: running: Set()
16/03/03 22:20:01 INFO DAGScheduler: waiting: Set(ResultStage 13)
16/03/03 22:20:01 INFO DAGScheduler: failed: Set()
16/03/03 22:20:01 INFO DAGScheduler: Missing parents for ResultStage 13: List()
16/03/03 22:20:01 INFO DAGScheduler: Submitting ResultStage 13 (MapPartitionsRDD[13] at filter at Transformation1.scala:31), which is now runnable
16/03/03 22:20:01 INFO MemoryStore: ensureFreeSpace(2960) called with curMem=41826, maxMem=766075207
16/03/03 22:20:01 INFO MemoryStore: Block broadcast_13 stored as values in memory (estimated size 2.9 KB, free 730.5 MB)
16/03/03 22:20:01 INFO MemoryStore: ensureFreeSpace(1618) called with curMem=44786, maxMem=766075207
16/03/03 22:20:01 INFO MemoryStore: Block broadcast_13_piece0 stored as bytes in memory (estimated size 1618.0 B, free 730.5 MB)
16/03/03 22:20:01 INFO BlockManagerInfo: Added broadcast_13_piece0 in memory on localhost:50609 (size: 1618.0 B, free: 730.6 MB)
16/03/03 22:20:01 INFO SparkContext: Created broadcast 13 from broadcast at DAGScheduler.scala:861
16/03/03 22:20:01 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 13 (MapPartitionsRDD[13] at filter at Transformation1.scala:31)
16/03/03 22:20:01 INFO TaskSchedulerImpl: Adding task set 13.0 with 1 tasks
16/03/03 22:20:01 INFO TaskSetManager: Starting task 0.0 in stage 13.0 (TID 21, localhost, PROCESS_LOCAL, 1974 bytes)
16/03/03 22:20:01 INFO Executor: Running task 0.0 in stage 13.0 (TID 21)
16/03/03 22:20:01 INFO ShuffleBlockFetcherIterator: Getting 1 non-empty blocks out of 1 blocks
16/03/03 22:20:01 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 1 ms
16/03/03 22:20:01 INFO ShuffleBlockFetcherIterator: Getting 1 non-empty blocks out of 1 blocks
16/03/03 22:20:01 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms
j1:a:1:1
j1:a:1:2
j1:a:1:1
j1:a:1:2
j1:b:1:1
j1:b:1:2
j1:b:1:3
16/03/03 22:20:01 INFO Executor: Finished task 0.0 in stage 13.0 (TID 21). 1165 bytes result sent to driver
16/03/03 22:20:01 INFO TaskSetManager: Finished task 0.0 in stage 13.0 (TID 21) in 46 ms on localhost (1/1)
16/03/03 22:20:01 INFO DAGScheduler: ResultStage 13 (foreach at Transformation1.scala:31) finished in 0.046 s
16/03/03 22:20:01 INFO TaskSchedulerImpl: Removed TaskSet 13.0, whose tasks have all completed, from pool
16/03/03 22:20:01 INFO DAGScheduler: Job 10 finished: foreach at Transformation1.scala:31, took 0.154299 s
16/03/03 22:20:01 INFO SparkContext: Invoking stop() from shutdown hook
16/03/03 22:20:01 INFO SparkUI: Stopped Spark web UI at http://202.38.84.241:4040
16/03/03 22:20:01 INFO DAGScheduler: Stopping DAGScheduler
16/03/03 22:20:01 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
16/03/03 22:20:01 INFO MemoryStore: MemoryStore cleared
16/03/03 22:20:01 INFO BlockManager: BlockManager stopped
16/03/03 22:20:01 INFO BlockManagerMaster: BlockManagerMaster stopped
16/03/03 22:20:01 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
16/03/03 22:20:01 INFO SparkContext: Successfully stopped SparkContext
16/03/03 22:20:01 INFO ShutdownHookManager: Shutdown hook called
16/03/03 22:20:01 INFO ShutdownHookManager: Deleting directory C:\Users\xubo\AppData\Local\Temp\spark-c2b62943-1802-4092-a678-704b95d428caProcess finished with exit code 0

Spark代码2之Transformation:union,distinct,join相关推荐

  1. 从零开始,手把手教会你5分钟用SPARK对PM2.5数据进行分析(包括环境准备和SPARK代码)...

    2019独角兽企业重金招聘Python工程师标准>>> 要从零开始,五分钟做完一个基于SPARK的PM2.5分析项目,你是不是会问 1. PM2.5的数据在哪里? 2. SPARK的 ...

  2. Union、Join语句

    Union.Join语句 Union 定义 语法 Join 示例表 定义 语法 连接属性 [outer]join(内连接) 定义 例子 Left[outer]join (左[外]连接) 定义 例子 R ...

  3. mysql join union_MySQL中union和join语句使用区别的辨析教程

    union和join是需要联合多张表时常见的关联词,具体概念我就不说了,想知道上网查就行,因为我也记不准确. 先说差别:union对两张表的操作是合并数据条数,等于是纵向的,要求是两张表字段必须是相同 ...

  4. spark代码连接hive_spark连接Hive

    作者是通过metastore方式实现spark连接hive数据库,所以首先启动metastore: hive --service metastore 另外需要将core-site.xml.hdfs-s ...

  5. Spark Standalone -- 独立集群模式、Spark 提交任务的两种模式、spark在yarn上运行的环境搭建、自己写的spark代码如何提交到yarn上并运行...

    目录 Spark Standalone -- 独立集群模式 Standalone 架构图 Standalone 的搭建 1.上传.解压.重命名 2.配置环境变量 3.修改配置文件 conf 4.同步到 ...

  6. spark代码 spark-submit提交yarn-cluster模式

    worldcount yarn-cluster集群作业运行 之前写的是一个windows本地的worldcount的代码,当然这种功能简单 代码量少的 也可以直接在spark-shell中直接输sca ...

  7. Spark广播变量之超大表left join小表时如何进行优化以及小表的正确位置

    Spark广播变量之大表left join小表时如何进行优化以及小表的正确位置放置,带着这个目标我们一探究竟. 项目场景: 最近工作中遇到一个场景: 有一个超大表3.5T和一个小表963K 需要做关联 ...

  8. 《Spark系列-SparkCore》IDEA运行Spark代码异常 -> Error:scalac: IO error while decoding \Demo2.scala with UTF-8

    IDEA运行Spark代码异常 -> Error:scalac: IO error while decoding \Demo2.scala with UTF-8 IDEA异常 Error:sca ...

  9. Spark RDD算子(transformation + action)

    概念 RDD(Resilient Distributed Dataset)叫做弹性分布式数据集,是Spark中最基本的数据抽象,它代表一个不可变.可分区.里面的元素可并行计算的集合.RDD具有数据流模 ...

  10. Spark Java API:Transformation

    mapPartitions 官方文档描述: Return a new RDD by applying a function to each partition of this RDD. mapPart ...

最新文章

  1. python命令之m参数 局域网传输
  2. c语言将ascii码存入eeprom,微机原理复习题答案+_Fixed
  3. 原创 | 常见损失函数和评价指标总结(附公式代码)
  4. NeHe教程Qt实现——lesson15
  5. 团队软件开发第一次冲刺(四)
  6. 【Qt】QObject详解
  7. coroutine资源索引
  8. jQuery 源码系列(四)Tokens 词法分析
  9. 世界卫生组织高血压防治指南_建立对团队和组织的信任的指南
  10. map文件分析 stm32_浅谈STM32的启动过程
  11. 判定是否支持XHTML
  12. mysql-proxy读写分离,负载均衡
  13. 软件测试行业到底有没有前景和出路?(最全面)
  14. 怎样将PDF中指定页面方向进行旋转
  15. 第十部分 项目风险管理
  16. 【数据仓库】大数据定义
  17. 相随与欢-彩色泡泡机的设计与实现
  18. [android源码下载索引贴】微信+二维码那都不是事......
  19. 视觉slam14讲ch5 opencv安装 ubuntu20.04
  20. FFmpeg解码视频帧为jpg图片保存到本地

热门文章

  1. Unity 之 ShaderGraph 实现火焰效果入门级教程
  2. status(c语言)
  3. 一杯免费咖啡引发的ERP上云思考
  4. 计算机技术在材料物理专业的应用,东北大学材料物理专业要学哪些课程,好学吗?...
  5. 在Flutter的项目中AndroidX Compatibility(AndroidX兼容性)配置
  6. Java获取四分位数
  7. 一个线程OOM,进程里其他线程还能运行么?
  8. 【书影观后感 八】《周期》万事皆周期
  9. 详解矩阵算法在电商sku组件中的应用一
  10. 乒乓球十一分制比赛规则_乒乓球比赛规则