Previously, we built a Hadoop cluster environment with Docker containers on Windows WSL. Now let's use that existing cluster environment for programming and development.

1 Start the SSH service automatically when a container starts

Following the approach in docker中安装Ubuntu无法在启动时运行ssh服务的解决方案 - llCoding - 博客园 (cnblogs.com), add the following to our hadoop-spark cluster image:

vim /root/startup_run.sh
chmod +x /root/startup_run.sh

Write the contents of the startup_run.sh script.

#!/bin/bash
LOGTIME=$(date "+%Y-%m-%d %H:%M:%S")
echo "[$LOGTIME] startup run..." >>/root/startup_run.log
/bin/systemctl status sshd.service |grep "active (running)" > /dev/null 2>&1
if [ "$?" -ne "0" ]; then
    /bin/systemctl start sshd.service
fi
#service mysql start >>/root/startup_run.log

Add the following at the end of /root/.bashrc.

vim /root/.bashrc
# startup run
if [ -f /root/startup_run.sh ]; then
    . /root/startup_run.sh
fi

Similarly, modify the hadoop-client image, rewriting the startup_run.sh script as:

#!/bin/bash
LOGTIME=$(date "+%Y-%m-%d %H:%M:%S")
echo "[$LOGTIME] startup run..." >>/root/startup_run.log
/etc/init.d/ssh start >>/root/startup_run.log
#service mysql start >>/root/startup_run.log

The other containers are handled in the same way. Commit the updated container as a new image.

wslu@LAPTOP-ERJ3P24M:~$ sudo docker commit -m "hadoop spark ssh" hadoop-spark centos/hadoop-spark:v3
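
The client container is committed the same way; a minimal sketch, assuming the modified container is named hadoop-client1 and we reuse the centos/hadoop-client:v1 tag referenced below:

# bake the SSH startup changes of the client container into its image
sudo docker commit -m "hadoop client ssh" hadoop-client1 centos/hadoop-client:v1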

2 Hadoop MapReduce Development

Create and run the Hadoop cluster containers.

sudo docker run -it --name hadoop-master -p 60070:50070 -p 2222:22 -h master centos/hadoop-spark:v3 /bin/bash
sudo docker run -it --name hadoop-node02 -p 50070 -p 22 -h node02 centos/hadoop-spark:v3 /bin/bash
sudo docker run -it --name hadoop-node03 -p 50070 -p 22 -h node03 centos/hadoop-spark:v3 /bin/bash
sudo docker run -it --name hadoop-client1 -p 22222:22 -h client1 centos/hadoop-client:v1 /bin/bash
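
These docker run commands create fresh containers. If the containers already exist from an earlier session and have simply exited, they can be restarted and reattached instead; a minimal sketch with the container names used above:

# restart the existing containers, then reattach to the master shell
sudo docker start hadoop-master hadoop-node02 hadoop-node03 hadoop-client1
sudo docker attach hadoop-master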

Start the SSH service inside the containers. On the master, node02 and node03 node containers run the first command below; on the client container run the second.

service sshd start
/etc/init.d/ssh start
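
To confirm SSH is reachable from the WSL side, the published ports can be tested; a minimal sketch, assuming the 2222:22 (master) and 22222:22 (client1) mappings used above:

# from WSL: log into the master and client containers through the mapped ports
ssh -p 2222 root@localhost
ssh -p 22222 root@localhost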

Start the Hadoop cluster. In the master node container, run:

[root@master ~]# hadoop/hadoop-2.7.7/sbin/start-dfs.sh
Starting namenodes on [master]
master: starting namenode, logging to /root/hadoop/hadoop-2.7.7/logs/hadoop-root-namenode-master.out
node02: starting datanode, logging to /root/hadoop/hadoop-2.7.7/logs/hadoop-root-datanode-node02.out
node03: starting datanode, logging to /root/hadoop/hadoop-2.7.7/logs/hadoop-root-datanode-node03.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /root/hadoop/hadoop-2.7.7/logs/hadoop-root-secondarynamenode-master.out
[root@master ~]# hadoop/hadoop-2.7.7/sbin/start-yarn.sh
starting yarn daemons
starting resourcemanager, logging to /root/hadoop/hadoop-2.7.7/logs/yarn--resourcemanager-master.out
node02: starting nodemanager, logging to /root/hadoop/hadoop-2.7.7/logs/yarn-root-nodemanager-node02.out
node03: starting nodemanager, logging to /root/hadoop/hadoop-2.7.7/logs/yarn-root-nodemanager-node03.out

Use jps to check the cluster's running processes.

[root@master ~]# jps
4195 ResourceManager
3798 NameNode
4487 Jps
4008 SecondaryNameNode

Open http://localhost:60070 in a browser.
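
The HDFS state can also be checked from the command line with the standard dfsadmin report; a minimal sketch (run on the master or the client):

# lists live datanodes and capacity; node02 and node03 should both appear
hdfs dfsadmin -report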

In the client container, run HDFS commands and check the file system.

root@client1:~# hdfs dfs -mkdir /tmp
root@client1:~# hdfs dfs -ls /
Found 1 items
drwxr-xr-x   - root supergroup          0 2022-04-13 16:32 /tmp

Download and install Apache Maven. From Maven – Download Apache Maven, download apache-maven-3.8.5-bin.tar.gz, copy it into the data directory of the hadoop-client1 container, extract it, and move it to /root/apache-maven-3.8.5. Set the MAVEN_HOME and PATH environment variables.

root@client1:~# cd
root@client1:~# cd data
root@client1:~/data# ls
apache-maven-3.8.5-bin.tar.gz
root@client1:~/data# tar xzvf apache-maven*
root@client1:~/data# mv apache-maven-3.8.5 /root/apache-maven-3.8.5
root@client1:~# vi .bashrc
MAVEN_HOME=/root/apache-maven-3.8.5
PATH=$PATH:$MAVEN_HOME/bin
export JAVA_HOME JRE_HOME CLASSPATH PATH
export HADOOP_HOME SPARK_HOME HIVE_HOME ZOOKEEPER_HOME
export FLUME_HOME FLUME_CONF_DIR
export KAFKA_HOME FLINK_HOME
export MAVEN_HOME
root@client1:~# source .bashrc
root@client1:~# mvn --version
Apache Maven 3.8.5 (3599d3414f046de2324203b78ddcf9b5e4388aa0)
Maven home: /root/apache-maven-3.8.5
Java version: 1.8.0_312, vendor: Private Build, runtime: /usr/lib/jvm/java-8-openjdk-amd64/jre
Default locale: en_US, platform encoding: ANSI_X3.4-1968
OS name: "linux", version: "5.10.102.1-microsoft-standard-wsl2", arch: "amd64", family: "unix"
root@client1:~#

Write a WordCount test, following 使用IDEA+Maven实现MapReduce的WordCount功能_Java_Lioop的博客-CSDN博客. We use MobaXterm to open an SSH terminal into the hadoop-client1 container and launch IDEA from there.

Create a new Maven project HadoopWordCount and edit pom.xml.

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>org.example</groupId>
    <artifactId>HadoopWordCount</artifactId>
    <version>1.0-SNAPSHOT</version>

    <properties>
        <maven.compiler.source>8</maven.compiler.source>
        <maven.compiler.target>8</maven.compiler.target>
    </properties>

    <dependencies>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-common</artifactId>
            <version>2.7.7</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-client</artifactId>
            <version>2.7.7</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-hdfs</artifactId>
            <version>2.7.7</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-mapreduce-client-core</artifactId>
            <version>2.7.7</version>
        </dependency>
    </dependencies>
</project>

Set up the project file structure.

devz
├── .idea
└── src
    ├── main
    │   ├── java
    │   │   └── org.example.HadoopWordCount
    │   │       ├── WordCountMain
    │   │       ├── WordCountMapper
    │   │       └── WordCountReducer
    │   └── resources
    └── test

Write the three Java classes WordCountMain, WordCountMapper, and WordCountReducer.

package org.example.HadoopWordCount;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCountMain {
    public static void main(String[] args) throws Exception {
        // 1. Create a job and the task entry point
        Job job = Job.getInstance(new Configuration());
        job.setJarByClass(WordCountMain.class);         // class containing the main method

        // 2. Specify the job's mapper and its output types <k2, v2>
        job.setMapperClass(WordCountMapper.class);      // the Mapper class
        job.setMapOutputKeyClass(Text.class);           // type of k2
        job.setMapOutputValueClass(IntWritable.class);  // type of v2

        // 3. Specify the job's reducer and its output types <k4, v4>
        job.setReducerClass(WordCountReducer.class);    // the Reducer class
        job.setOutputKeyClass(Text.class);              // type of k4
        job.setOutputValueClass(IntWritable.class);     // type of v4

        // 4. Specify the job's input and output paths
        FileInputFormat.setInputPaths(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        // 5. Run the job
        job.waitForCompletion(true);
    }
}

package org.example.HadoopWordCount;

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

//                                          generics:   k1          v1    k2    v2
public class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    @Override
    protected void map(LongWritable key1, Text value1, Context context)
            throws IOException, InterruptedException {
        // input line, e.g.: I like MapReduce
        String data = value1.toString();
        // split the line into words on spaces
        String[] words = data.split(" ");
        // emit k2, v2
        for (String w : words) {
            context.write(new Text(w), new IntWritable(1));
        }
    }
}

package org.example.HadoopWordCount;

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

//                                              k3    v3           k4    v4
public class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text k3, Iterable<IntWritable> v3, Context context)
            throws IOException, InterruptedException {
        // sum the values in v3
        int total = 0;
        for (IntWritable v : v3) {
            total += v.get();
        }
        // emit k4 (the word) and v4 (its count)
        context.write(k3, new IntWritable(total));
    }
}

Compile it into a jar package in the IDEA terminal.

root@client1:~/devz# mvn clean package
[INFO] Scanning for projects...
[INFO]
[INFO] --------------------< org.example:HadoopWordCount >---------------------
[INFO] Building HadoopWordCount 1.0-SNAPSHOT
[INFO] --------------------------------[ jar ]---------------------------------
[INFO]
[INFO] --- maven-clean-plugin:2.5:clean (default-clean) @ HadoopWordCount ---
[INFO]
[INFO] --- maven-resources-plugin:2.6:resources (default-resources) @ HadoopWordCount ---
[WARNING] Using platform encoding (ANSI_X3.4-1968 actually) to copy filtered resources, i.e. build is platform dependent!
[INFO] Copying 0 resource
[INFO]
[INFO] --- maven-compiler-plugin:3.1:compile (default-compile) @ HadoopWordCount ---
[INFO] Changes detected - recompiling the module!
[WARNING] File encoding has not been set, using platform encoding ANSI_X3.4-1968, i.e. build is platform dependent!
[INFO] Compiling 3 source files to /root/devz/target/classes
[INFO]
[INFO] --- maven-resources-plugin:2.6:testResources (default-testResources) @ HadoopWordCount ---
[WARNING] Using platform encoding (ANSI_X3.4-1968 actually) to copy filtered resources, i.e. build is platform dependent!
[INFO] skip non existing resourceDirectory /root/devz/src/test/resources
[INFO]
[INFO] --- maven-compiler-plugin:3.1:testCompile (default-testCompile) @ HadoopWordCount ---
[INFO] Nothing to compile - all classes are up to date
[INFO]
[INFO] --- maven-surefire-plugin:2.12.4:test (default-test) @ HadoopWordCount ---
[INFO] No tests to run.
[INFO]
[INFO] --- maven-jar-plugin:2.4:jar (default-jar) @ HadoopWordCount ---
[INFO] Building jar: /root/devz/target/HadoopWordCount-1.0-SNAPSHOT.jar
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time:  1.284 s
[INFO] Finished at: 2022-04-13T18:15:53+08:00
[INFO] ------------------------------------------------------------------------

Check that HadoopWordCount-1.0-SNAPSHOT.jar now exists under devz/target.
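
Optionally, the contents of the jar can be listed to confirm that the three classes were packaged; a minimal sketch using the JDK's jar tool:

# WordCountMain, WordCountMapper and WordCountReducer should all be listed
jar tf target/HadoopWordCount-1.0-SNAPSHOT.jar | grep WordCount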

Create a new file word.txt.

root@client1:~# vi word.txt
hello hdfs
hdfs hello
This is MapReduce
reduce word
word great

Create an /input directory in the HDFS file system and upload word.txt to /input.

root@client1:~# hdfs dfs -mkdir /input
root@client1:~# hdfs dfs -put word.txt /input
root@client1:~# hdfs dfs -ls /input
Found 1 items
-rw-r--r--   3 root supergroup         63 2022-04-13 18:19 /input/word.txt
root@client1:~# hdfs dfs -cat /input/word.txt
hello hdfs
hdfs hello
This is MapReduce
reduce word
word great

Next, run the job with hadoop jar.

root@client1:~# hadoop jar ./devz/target/HadoopWordCount-1.0-SNAPSHOT.jar org/example/HadoopWordCount/WordCountMain /input/word.txt /output
22/04/13 18:28:33 INFO client.RMProxy: Connecting to ResourceManager at master/172.17.0.3:8032
22/04/13 18:28:33 WARN mapreduce.JobResourceUploader: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
22/04/13 18:28:33 INFO input.FileInputFormat: Total input paths to process : 1
22/04/13 18:28:33 INFO mapreduce.JobSubmitter: number of splits:1
22/04/13 18:28:33 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1649839271892_0001
22/04/13 18:28:34 INFO impl.YarnClientImpl: Submitted application application_1649839271892_0001
22/04/13 18:28:34 INFO mapreduce.Job: The url to track the job: http://master:8088/proxy/application_1649839271892_0001/
22/04/13 18:28:34 INFO mapreduce.Job: Running job: job_1649839271892_0001
22/04/13 18:28:39 INFO mapreduce.Job: Job job_1649839271892_0001 running in uber mode : false
22/04/13 18:28:39 INFO mapreduce.Job:  map 0% reduce 0%
22/04/13 18:28:43 INFO mapreduce.Job:  map 100% reduce 0%
22/04/13 18:28:47 INFO mapreduce.Job:  map 100% reduce 100%
22/04/13 18:28:48 INFO mapreduce.Job: Job job_1649839271892_0001 completed successfully
22/04/13 18:28:48 INFO mapreduce.Job: Counters: 49
        File System Counters
                FILE: Number of bytes read=135
                FILE: Number of bytes written=245597
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
                HDFS: Number of bytes read=161
                HDFS: Number of bytes written=63
                HDFS: Number of read operations=6
                HDFS: Number of large read operations=0
                HDFS: Number of write operations=2
        Job Counters
                Launched map tasks=1
                Launched reduce tasks=1
                Data-local map tasks=1
                Total time spent by all maps in occupied slots (ms)=1992
                Total time spent by all reduces in occupied slots (ms)=1582
                Total time spent by all map tasks (ms)=1992
                Total time spent by all reduce tasks (ms)=1582
                Total vcore-milliseconds taken by all map tasks=1992
                Total vcore-milliseconds taken by all reduce tasks=1582
                Total megabyte-milliseconds taken by all map tasks=2039808
                Total megabyte-milliseconds taken by all reduce tasks=1619968
        Map-Reduce Framework
                Map input records=5
                Map output records=11
                Map output bytes=107
                Map output materialized bytes=135
                Input split bytes=98
                Combine input records=0
                Combine output records=0
                Reduce input groups=8
                Reduce shuffle bytes=135
                Reduce input records=11
                Reduce output records=8
                Spilled Records=22
                Shuffled Maps =1
                Failed Shuffles=0
                Merged Map outputs=1
                GC time elapsed (ms)=57
                CPU time spent (ms)=630
                Physical memory (bytes) snapshot=454873088
                Virtual memory (bytes) snapshot=3882061824
                Total committed heap usage (bytes)=344457216
        Shuffle Errors
                BAD_ID=0
                CONNECTION=0
                IO_ERROR=0
                WRONG_LENGTH=0
                WRONG_MAP=0
                WRONG_REDUCE=0
        File Input Format Counters
                Bytes Read=63
        File Output Format Counters
                Bytes Written=63

Check the results.

root@client1:~# hdfs dfs -ls /
Found 3 items
drwxr-xr-x   - root supergroup          0 2022-04-13 18:19 /input
drwxr-xr-x   - root supergroup          0 2022-04-13 18:28 /output
drwxr-xr-x   - root supergroup          0 2022-04-13 18:28 /tmp
root@client1:~# hdfs dfs -ls /output
Found 2 items
-rw-r--r--   3 root supergroup          0 2022-04-13 18:28 /output/_SUCCESS
-rw-r--r--   3 root supergroup         63 2022-04-13 18:28 /output/part-r-00000
root@client1:~# hdfs dfs -cat /output/part-r-00000
MapReduce       1
This    1
great   1
hdfs    2
hello   2
is      1
reduce  1
word    2
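
Note that MapReduce refuses to start if the output directory already exists, so /output has to be removed before the job is rerun; a minimal sketch:

# delete the previous output directory before resubmitting the job
hdfs dfs -rm -r /output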

3 Spark Applications

Start the Hadoop cluster and the Spark cluster.

As before, start the hadoop-master, hadoop-node02 and hadoop-node03 containers as well as the hadoop-client container, and fix the /etc/hosts files so that master, node02, node03 and client1 map to the correct IP addresses.

We simplify this setup with two shell scripts. In WSL, run hadoop-master-hosts.sh, which uses docker inspect to obtain each container's IP address and copies the result back into every container.

#!/bin/bash
echo -e "`docker inspect --format='{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}' hadoop-master`\tmaster" > hadoop-hosts
echo -e "`docker inspect --format='{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}' hadoop-node02`\tnode02" >> hadoop-hosts
echo -e "`docker inspect --format='{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}' hadoop-node03`\tnode03" >> hadoop-hosts
echo -e "`docker inspect --format='{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}' hadoop-client1`\tclient1" >> hadoop-hosts
#echo -e "`docker inspect --format='{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}' hadoop-mysql`\tmysqls" > hadoop-hosts
sudo docker cp hadoop-hosts hadoop-master:/root/hosts1
sudo docker cp hadoop-hosts hadoop-node02:/root/hosts1
sudo docker cp hadoop-hosts hadoop-node03:/root/hosts1
sudo docker cp hadoop-hosts hadoop-client1:/root/hosts1
#sudo docker cp hadoop-hosts hadoop-mysql:/root/hosts1

Run re-hosts.sh inside each container.

#!/bin/bash
echo "$(sed '/172.17.0/d' /etc/hosts)" > /etc/hosts
cat hosts1 >> /etc/hosts
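
Rather than logging into every container to run it, the script can also be pushed and executed from WSL with docker cp and docker exec; a minimal sketch, assuming the containers are running and re-hosts.sh lives in the current WSL directory:

#!/bin/bash
# copy re-hosts.sh into each container and run it there (hosts1 was already copied by hadoop-master-hosts.sh)
for c in hadoop-master hadoop-node02 hadoop-node03 hadoop-client1; do
    sudo docker cp re-hosts.sh $c:/root/re-hosts.sh
    sudo docker exec $c bash -c "cd /root && bash re-hosts.sh"
done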

On master, run /root/hadoop/hadoop-2.7.7/sbin/start-dfs.sh and /root/hadoop/hadoop-2.7.7/sbin/start-yarn.sh to start Hadoop.

On master, run /root/hadoop/spark-2.1.1-bin-hadoop2.7/sbin/start-master.sh and /root/hadoop/spark-2.1.1-bin-hadoop2.7/sbin/start-slaves.sh to start the Spark cluster.
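
A quick sanity check that the daemons came up is to run jps on each node; a minimal sketch (the exact process list depends on which services were started):

# on master, the Spark Master daemon should now be listed alongside the Hadoop daemons
jps | grep -E "Master|NameNode|ResourceManager"
# on node02/node03, a Spark Worker should be listed alongside DataNode and NodeManager
jps | grep -E "Worker|DataNode|NodeManager"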

Run spark-shell in the hadoop-client1 container.

root@client1:~# spark-shell --master master:7077
Error: Master must either be yarn or start with spark, mesos, local
Run with --help for usage help or --verbose for debug output
root@client1:~# spark-shell --master spark://master:7077
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/root/hadoop/spark-2.1.1-bin-hadoop2.7/jars/slf4j-log4j12-1.7.16.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/root/hadoop/hadoop-2.7.7/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
22/04/14 13:01:24 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
22/04/14 13:01:29 WARN metastore.ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0
22/04/14 13:01:30 WARN metastore.ObjectStore: Failed to get database default, returning NoSuchObjectException
22/04/14 13:01:30 WARN metastore.ObjectStore: Failed to get database global_temp, returning NoSuchObjectException
Spark context Web UI available at http://172.17.0.5:4040
Spark context available as 'sc' (master = spark://master:7077, app id = app-20220414050124-0000).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.1.1
      /_/

Using Scala version 2.11.8 (OpenJDK 64-Bit Server VM, Java 1.8.0_312)
Type in expressions to have them evaluated.
Type :help for more information.

scala>

Type :quit to exit.

For cluster testing, see 在集群上运行Spark应用程序_厦大数据库实验室博客 (xmu.edu.cn). To run an application JAR on the cluster, pass spark://master:7077 as the master argument to spark-submit and run SparkPi, one of the sample programs shipped with Spark, which computes an approximate value of pi (3.1415926...).

root@client1:~# spark-submit --class org.apache.spark.examples.SparkPi --master spark://master:7077 hadoop/spark-2.1.1-bin-hadoop2.7/examples/jars/spark-examples_2.11-2.1.1.jar 100 2>&1 | grep "Pi is roughly"
Pi is roughly 3.1420243142024313

Enter the following code in spark-shell.

scala> val textFile = sc.textFile("hdfs://master:9000/input/word.txt")
textFile: org.apache.spark.rdd.RDD[String] = hdfs://master:9000/input/word.txt MapPartitionsRDD[1] at textFile at <console>:24

scala> textFile.count()
res0: Long = 6

scala> textFile.first()
res1: String = hello hdfs

Spark on YARN. To run the application JAR on the cluster, submit the application to the Hadoop YARN cluster manager by passing yarn-cluster as the master argument to spark-submit.

root@client1:~# spark-submit --class org.apache.spark.examples.SparkPi --master yarn-cluster hadoop/spark-2.1.1-bin-hadoop2.7/examples/jars/spark-examples_2.11-2.1.1.jar
Warning: Master yarn-cluster is deprecated since 2.0. Please use master "yarn" with specified deploy mode instead.
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/root/hadoop/spark-2.1.1-bin-hadoop2.7/jars/slf4j-log4j12-1.7.16.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/root/hadoop/hadoop-2.7.7/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
22/04/14 13:27:24 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
22/04/14 13:27:24 INFO client.RMProxy: Connecting to ResourceManager at master/172.17.0.2:8032
22/04/14 13:27:25 INFO yarn.Client: Requesting a new application from cluster with 2 NodeManagers
22/04/14 13:27:25 INFO yarn.Client: Verifying our application has not requested more than the maximum memory capability of the cluster (8192 MB per container)
22/04/14 13:27:25 INFO yarn.Client: Will allocate AM container, with 1408 MB memory including 384 MB overhead
22/04/14 13:27:25 INFO yarn.Client: Setting up container launch context for our AM
22/04/14 13:27:25 INFO yarn.Client: Setting up the launch environment for our AM container
22/04/14 13:27:25 INFO yarn.Client: Preparing resources for our AM container
22/04/14 13:27:25 WARN yarn.Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
22/04/14 13:27:26 INFO yarn.Client: Uploading resource file:/tmp/spark-b31ba74b-bd24-419a-95db-f74a457290c9/__spark_libs__7573270795616141427.zip -> hdfs://master:9000/user/root/.sparkStaging/application_1649911721270_0001/__spark_libs__7573270795616141427.zip
22/04/14 13:27:27 INFO yarn.Client: Uploading resource file:/root/hadoop/spark-2.1.1-bin-hadoop2.7/examples/jars/spark-examples_2.11-2.1.1.jar -> hdfs://master:9000/user/root/.sparkStaging/application_1649911721270_0001/spark-examples_2.11-2.1.1.jar
22/04/14 13:27:27 INFO yarn.Client: Uploading resource file:/tmp/spark-b31ba74b-bd24-419a-95db-f74a457290c9/__spark_conf__3756889894852026633.zip -> hdfs://master:9000/user/root/.sparkStaging/application_1649911721270_0001/__spark_conf__.zip
22/04/14 13:27:27 INFO spark.SecurityManager: Changing view acls to: root
22/04/14 13:27:27 INFO spark.SecurityManager: Changing modify acls to: root
22/04/14 13:27:27 INFO spark.SecurityManager: Changing view acls groups to:
22/04/14 13:27:27 INFO spark.SecurityManager: Changing modify acls groups to:
22/04/14 13:27:27 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(root); groups with view permissions: Set(); users  with modify permissions: Set(root); groups with modify permissions: Set()
22/04/14 13:27:27 INFO yarn.Client: Submitting application application_1649911721270_0001 to ResourceManager
22/04/14 13:27:28 INFO impl.YarnClientImpl: Submitted application application_1649911721270_0001
22/04/14 13:27:29 INFO yarn.Client: Application report for application_1649911721270_0001 (state: ACCEPTED)
22/04/14 13:27:29 INFO yarn.Client:
         client token: N/A
         diagnostics: N/A
         ApplicationMaster host: N/A
         ApplicationMaster RPC port: -1
         queue: default
         start time: 1649914047921
         final status: UNDEFINED
         tracking URL: http://master:8088/proxy/application_1649911721270_0001/
         user: root
22/04/14 13:27:30 INFO yarn.Client: Application report for application_1649911721270_0001 (state: ACCEPTED)
22/04/14 13:27:31 INFO yarn.Client: Application report for application_1649911721270_0001 (state: ACCEPTED)
22/04/14 13:27:32 INFO yarn.Client: Application report for application_1649911721270_0001 (state: RUNNING)
22/04/14 13:27:32 INFO yarn.Client:
         client token: N/A
         diagnostics: N/A
         ApplicationMaster host: 172.17.0.4
         ApplicationMaster RPC port: 0
         queue: default
         start time: 1649914047921
         final status: UNDEFINED
         tracking URL: http://master:8088/proxy/application_1649911721270_0001/
         user: root
22/04/14 13:27:33 INFO yarn.Client: Application report for application_1649911721270_0001 (state: RUNNING)
22/04/14 13:27:34 INFO yarn.Client: Application report for application_1649911721270_0001 (state: RUNNING)
22/04/14 13:27:35 INFO yarn.Client: Application report for application_1649911721270_0001 (state: RUNNING)
22/04/14 13:27:36 INFO yarn.Client: Application report for application_1649911721270_0001 (state: RUNNING)
22/04/14 13:27:37 INFO yarn.Client: Application report for application_1649911721270_0001 (state: FINISHED)
22/04/14 13:27:37 INFO yarn.Client:
         client token: N/A
         diagnostics: N/A
         ApplicationMaster host: 172.17.0.4
         ApplicationMaster RPC port: 0
         queue: default
         start time: 1649914047921
         final status: SUCCEEDED
         tracking URL: http://master:8088/proxy/application_1649911721270_0001/
         user: root
22/04/14 13:27:37 INFO util.ShutdownHookManager: Shutdown hook called
22/04/14 13:27:37 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-b31ba74b-bd24-419a-95db-f74a457290c9
root@client1:~#

Connect spark-shell to the YARN cluster manager.

root@client1:~# spark-shell --master yarn
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/root/hadoop/spark-2.1.1-bin-hadoop2.7/jars/slf4j-log4j12-1.7.16.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/root/hadoop/hadoop-2.7.7/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
22/04/14 13:41:02 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
22/04/14 13:41:03 WARN yarn.Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
22/04/14 13:41:12 WARN metastore.ObjectStore: Failed to get database global_temp, returning NoSuchObjectException
Spark context Web UI available at http://172.17.0.5:4040
Spark context available as 'sc' (master = yarn, app id = application_1649911721270_0002).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.1.1
      /_/

Using Scala version 2.11.8 (OpenJDK 64-Bit Server VM, Java 1.8.0_312)
Type in expressions to have them evaluated.
Type :help for more information.

scala> val textFile = sc.textFile("hdfs://master:9000/input/word.txt")
textFile: org.apache.spark.rdd.RDD[String] = hdfs://master:9000/input/word.txt MapPartitionsRDD[1] at textFile at <console>:24

scala> textFile.count()
res0: Long = 6

scala> textFile.first()
res2: String = hello hdfs

scala>

Use IDEA and install the Scala plugin in IDEA.

Following 大数据,Spark_厦大数据库实验室博客 (xmu.edu.cn), create a new Maven project, create a scala directory under src/main in the project structure, and mark the scala directory as Sources Root.

Create a new Scala class, WordCountLocal.scala, and enter the program.

import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf
import org.apache.log4j.{Level, Logger}

object WordCountLocal {
  def main(args: Array[String]) {
    // suppress logging
    //    Logger.getLogger("org.apache.spark").setLevel(Level.WARN)
    //    Logger.getLogger("org.eclipse.jetty.server").setLevel(Level.OFF)
    val inputFile = "hdfs://master:9000/input/word.txt"
    val conf = new SparkConf().setAppName("WordCount").setMaster("local[2]")
    val sc = new SparkContext(conf)
    val textFile = sc.textFile(inputFile)
    val wordCount = textFile.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey((a, b) => a + b)
    wordCount.foreach(println)
  }
}

Add the following pom.xml.

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>org.example</groupId>
    <artifactId>SparkWordCount</artifactId>
    <version>1.0-SNAPSHOT</version>

    <properties>
        <maven.compiler.source>8</maven.compiler.source>
        <maven.compiler.target>8</maven.compiler.target>
        <spark.version>2.1.1</spark.version>
        <scala.version>2.11</scala.version>
        <hadoop.version>2.7.7</hadoop.version>
    </properties>

    <dependencies>
        <dependency>
            <groupId>org.scala-lang</groupId>
            <artifactId>scala-library</artifactId>
            <version>2.11.12</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-core_2.11</artifactId>
            <version>${spark.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-sql_2.11</artifactId>
            <version>${spark.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-mllib_2.11</artifactId>
            <version>${spark.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-streaming_${scala.version}</artifactId>
            <version>${spark.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-hive_${scala.version}</artifactId>
            <version>${spark.version}</version>
        </dependency>
    </dependencies>
</project>

Build the jar artifact in IDEA and run it with spark-submit; the results follow.

root@client1:~# spark-submit --master local[2] --class WordCountLocal /root/devz/SparkWordCount/out/artifacts/SparkWordCount_jar/SparkWordCount.jar
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/root/hadoop/spark-2.1.1-bin-hadoop2.7/jars/slf4j-log4j12-1.7.16.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/root/hadoop/hadoop-2.7.7/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
22/04/14 18:35:59 INFO spark.SparkContext: Running Spark version 2.1.1
22/04/14 18:35:59 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
22/04/14 18:35:59 INFO spark.SecurityManager: Changing view acls to: root
22/04/14 18:35:59 INFO spark.SecurityManager: Changing modify acls to: root
22/04/14 18:35:59 INFO spark.SecurityManager: Changing view acls groups to:
22/04/14 18:35:59 INFO spark.SecurityManager: Changing modify acls groups to:
22/04/14 18:35:59 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(root); groups with view permissions: Set(); users  with modify permissions: Set(root); groups with modify permissions: Set()
22/04/14 18:35:59 INFO util.Utils: Successfully started service 'sparkDriver' on port 46193.
22/04/14 18:35:59 INFO spark.SparkEnv: Registering MapOutputTracker
22/04/14 18:35:59 INFO spark.SparkEnv: Registering BlockManagerMaster
22/04/14 18:35:59 INFO storage.BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
22/04/14 18:35:59 INFO storage.BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
22/04/14 18:36:00 INFO storage.DiskBlockManager: Created local directory at /tmp/blockmgr-4bec553e-97cf-450d-a324-b7e8deb92dd2
22/04/14 18:36:00 INFO memory.MemoryStore: MemoryStore started with capacity 366.3 MB
22/04/14 18:36:00 INFO spark.SparkEnv: Registering OutputCommitCoordinator
22/04/14 18:36:00 INFO util.log: Logging initialized @1195ms
22/04/14 18:36:00 INFO server.Server: jetty-9.2.z-SNAPSHOT
22/04/14 18:36:00 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@3aee3976{/jobs,null,AVAILABLE,@Spark}
22/04/14 18:36:00 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@5ef8df1e{/jobs/json,null,AVAILABLE,@Spark}
22/04/14 18:36:00 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@27cf3151{/jobs/job,null,AVAILABLE,@Spark}
22/04/14 18:36:00 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@127e70c5{/jobs/job/json,null,AVAILABLE,@Spark}
22/04/14 18:36:00 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@5910de75{/stages,null,AVAILABLE,@Spark}
22/04/14 18:36:00 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@4108fa66{/stages/json,null,AVAILABLE,@Spark}
22/04/14 18:36:00 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@1f130eaf{/stages/stage,null,AVAILABLE,@Spark}
22/04/14 18:36:00 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@7e0aadd0{/stages/stage/json,null,AVAILABLE,@Spark}
22/04/14 18:36:00 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@21362712{/stages/pool,null,AVAILABLE,@Spark}
22/04/14 18:36:00 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@27eb3298{/stages/pool/json,null,AVAILABLE,@Spark}
22/04/14 18:36:00 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@200a26bc{/storage,null,AVAILABLE,@Spark}
22/04/14 18:36:00 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@bc57b40{/storage/json,null,AVAILABLE,@Spark}
22/04/14 18:36:00 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@1b5bc39d{/storage/rdd,null,AVAILABLE,@Spark}
22/04/14 18:36:00 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@655a5d9c{/storage/rdd/json,null,AVAILABLE,@Spark}
22/04/14 18:36:00 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@1494b84d{/environment,null,AVAILABLE,@Spark}
22/04/14 18:36:00 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@34abdee4{/environment/json,null,AVAILABLE,@Spark}
22/04/14 18:36:00 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@71a9b4c7{/executors,null,AVAILABLE,@Spark}
22/04/14 18:36:00 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@4628b1d3{/executors/json,null,AVAILABLE,@Spark}
22/04/14 18:36:00 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@77cf3f8b{/executors/threadDump,null,AVAILABLE,@Spark}
22/04/14 18:36:00 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@1df98368{/executors/threadDump/json,null,AVAILABLE,@Spark}
22/04/14 18:36:00 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@21ca139c{/static,null,AVAILABLE,@Spark}
22/04/14 18:36:00 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@226f885f{/,null,AVAILABLE,@Spark}
22/04/14 18:36:00 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@2cd2c8fe{/api,null,AVAILABLE,@Spark}
22/04/14 18:36:00 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@7d61eccf{/jobs/job/kill,null,AVAILABLE,@Spark}
22/04/14 18:36:00 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@cc6460c{/stages/stage/kill,null,AVAILABLE,@Spark}
22/04/14 18:36:00 INFO server.ServerConnector: Started Spark@324c64cd{HTTP/1.1}{0.0.0.0:4040}
22/04/14 18:36:00 INFO server.Server: Started @1295ms
22/04/14 18:36:00 INFO util.Utils: Successfully started service 'SparkUI' on port 4040.
22/04/14 18:36:00 INFO ui.SparkUI: Bound SparkUI to 0.0.0.0, and started at http://172.17.0.5:4040
22/04/14 18:36:00 INFO spark.SparkContext: Added JAR file:/root/devz/SparkWordCount/out/artifacts/SparkWordCount_jar/SparkWordCount.jar at spark://172.17.0.5:46193/jars/SparkWordCount.jar with timestamp 1649932560268
22/04/14 18:36:00 INFO executor.Executor: Starting executor ID driver on host localhost
22/04/14 18:36:00 INFO util.Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 36177.
22/04/14 18:36:00 INFO netty.NettyBlockTransferService: Server created on 172.17.0.5:36177
22/04/14 18:36:00 INFO storage.BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
22/04/14 18:36:00 INFO storage.BlockManagerMaster: Registering BlockManager BlockManagerId(driver, 172.17.0.5, 36177, None)
22/04/14 18:36:00 INFO storage.BlockManagerMasterEndpoint: Registering block manager 172.17.0.5:36177 with 366.3 MB RAM, BlockManagerId(driver, 172.17.0.5, 36177, None)
22/04/14 18:36:00 INFO storage.BlockManagerMaster: Registered BlockManager BlockManagerId(driver, 172.17.0.5, 36177, None)
22/04/14 18:36:00 INFO storage.BlockManager: Initialized BlockManager: BlockManagerId(driver, 172.17.0.5, 36177, None)
22/04/14 18:36:00 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@23eee4b8{/metrics/json,null,AVAILABLE,@Spark}
22/04/14 18:36:00 INFO memory.MemoryStore: Block broadcast_0 stored as values in memory (estimated size 240.9 KB, free 366.1 MB)
22/04/14 18:36:00 INFO memory.MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 23.2 KB, free 366.0 MB)
22/04/14 18:36:00 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on 172.17.0.5:36177 (size: 23.2 KB, free: 366.3 MB)
22/04/14 18:36:00 INFO spark.SparkContext: Created broadcast 0 from textFile at WordCountLocal.scala:14
22/04/14 18:36:01 INFO mapred.FileInputFormat: Total input paths to process : 1
22/04/14 18:36:01 INFO spark.SparkContext: Starting job: foreach at WordCountLocal.scala:16
22/04/14 18:36:01 INFO scheduler.DAGScheduler: Registering RDD 3 (map at WordCountLocal.scala:15)
22/04/14 18:36:01 INFO scheduler.DAGScheduler: Got job 0 (foreach at WordCountLocal.scala:16) with 2 output partitions
22/04/14 18:36:01 INFO scheduler.DAGScheduler: Final stage: ResultStage 1 (foreach at WordCountLocal.scala:16)
22/04/14 18:36:01 INFO scheduler.DAGScheduler: Parents of final stage: List(ShuffleMapStage 0)
22/04/14 18:36:01 INFO scheduler.DAGScheduler: Missing parents: List(ShuffleMapStage 0)
22/04/14 18:36:01 INFO scheduler.DAGScheduler: Submitting ShuffleMapStage 0 (MapPartitionsRDD[3] at map at WordCountLocal.scala:15), which has no missing parents
22/04/14 18:36:01 INFO memory.MemoryStore: Block broadcast_1 stored as values in memory (estimated size 4.7 KB, free 366.0 MB)
22/04/14 18:36:01 INFO memory.MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 2.8 KB, free 366.0 MB)
22/04/14 18:36:01 INFO storage.BlockManagerInfo: Added broadcast_1_piece0 in memory on 172.17.0.5:36177 (size: 2.8 KB, free: 366.3 MB)
22/04/14 18:36:01 INFO spark.SparkContext: Created broadcast 1 from broadcast at DAGScheduler.scala:996
22/04/14 18:36:01 INFO scheduler.DAGScheduler: Submitting 2 missing tasks from ShuffleMapStage 0 (MapPartitionsRDD[3] at map at WordCountLocal.scala:15)
22/04/14 18:36:01 INFO scheduler.TaskSchedulerImpl: Adding task set 0.0 with 2 tasks
22/04/14 18:36:01 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, localhost, executor driver, partition 0, ANY, 6028 bytes)
22/04/14 18:36:01 INFO scheduler.TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, localhost, executor driver, partition 1, ANY, 6028 bytes)
22/04/14 18:36:01 INFO executor.Executor: Running task 0.0 in stage 0.0 (TID 0)
22/04/14 18:36:01 INFO executor.Executor: Running task 1.0 in stage 0.0 (TID 1)
22/04/14 18:36:01 INFO executor.Executor: Fetching spark://172.17.0.5:46193/jars/SparkWordCount.jar with timestamp 1649932560268
22/04/14 18:36:01 INFO client.TransportClientFactory: Successfully created connection to /172.17.0.5:46193 after 15 ms (0 ms spent in bootstraps)
22/04/14 18:36:01 INFO util.Utils: Fetching spark://172.17.0.5:46193/jars/SparkWordCount.jar to /tmp/spark-50f639b7-f1b7-463d-ba1d-5c705d617a93/userFiles-52407d5a-ad86-48dd-ae54-6b3966c72435/fetchFileTemp1220641872057524506.tmp
22/04/14 18:36:01 INFO executor.Executor: Adding file:/tmp/spark-50f639b7-f1b7-463d-ba1d-5c705d617a93/userFiles-52407d5a-ad86-48dd-ae54-6b3966c72435/SparkWordCount.jar to class loader
22/04/14 18:36:01 INFO rdd.HadoopRDD: Input split: hdfs://master:9000/input/word.txt:35+35
22/04/14 18:36:01 INFO rdd.HadoopRDD: Input split: hdfs://master:9000/input/word.txt:0+35
22/04/14 18:36:01 INFO Configuration.deprecation: mapred.tip.id is deprecated. Instead, use mapreduce.task.id
22/04/14 18:36:01 INFO Configuration.deprecation: mapred.task.is.map is deprecated. Instead, use mapreduce.task.ismap
22/04/14 18:36:01 INFO Configuration.deprecation: mapred.task.partition is deprecated. Instead, use mapreduce.task.partition
22/04/14 18:36:01 INFO Configuration.deprecation: mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id
22/04/14 18:36:01 INFO Configuration.deprecation: mapred.job.id is deprecated. Instead, use mapreduce.job.id
22/04/14 18:36:01 INFO executor.Executor: Finished task 1.0 in stage 0.0 (TID 1). 1816 bytes result sent to driver
22/04/14 18:36:01 INFO executor.Executor: Finished task 0.0 in stage 0.0 (TID 0). 1816 bytes result sent to driver
22/04/14 18:36:01 INFO scheduler.TaskSetManager: Finished task 1.0 in stage 0.0 (TID 1) in 269 ms on localhost (executor driver) (1/2)
22/04/14 18:36:01 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 294 ms on localhost (executor driver) (2/2)
22/04/14 18:36:01 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
22/04/14 18:36:01 INFO scheduler.DAGScheduler: ShuffleMapStage 0 (map at WordCountLocal.scala:15) finished in 0.306 s
22/04/14 18:36:01 INFO scheduler.DAGScheduler: looking for newly runnable stages
22/04/14 18:36:01 INFO scheduler.DAGScheduler: running: Set()
22/04/14 18:36:01 INFO scheduler.DAGScheduler: waiting: Set(ResultStage 1)
22/04/14 18:36:01 INFO scheduler.DAGScheduler: failed: Set()
22/04/14 18:36:01 INFO scheduler.DAGScheduler: Submitting ResultStage 1 (ShuffledRDD[4] at reduceByKey at WordCountLocal.scala:15), which has no missing parents
22/04/14 18:36:01 INFO memory.MemoryStore: Block broadcast_2 stored as values in memory (estimated size 3.1 KB, free 366.0 MB)
22/04/14 18:36:01 INFO memory.MemoryStore: Block broadcast_2_piece0 stored as bytes in memory (estimated size 1954.0 B, free 366.0 MB)
22/04/14 18:36:01 INFO storage.BlockManagerInfo: Added broadcast_2_piece0 in memory on 172.17.0.5:36177 (size: 1954.0 B, free: 366.3 MB)
22/04/14 18:36:01 INFO spark.SparkContext: Created broadcast 2 from broadcast at DAGScheduler.scala:996
22/04/14 18:36:01 INFO scheduler.DAGScheduler: Submitting 2 missing tasks from ResultStage 1 (ShuffledRDD[4] at reduceByKey at WordCountLocal.scala:15)
22/04/14 18:36:01 INFO scheduler.TaskSchedulerImpl: Adding task set 1.0 with 2 tasks
22/04/14 18:36:01 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 1.0 (TID 2, localhost, executor driver, partition 0, ANY, 5808 bytes)
22/04/14 18:36:01 INFO scheduler.TaskSetManager: Starting task 1.0 in stage 1.0 (TID 3, localhost, executor driver, partition 1, ANY, 5808 bytes)
22/04/14 18:36:01 INFO executor.Executor: Running task 0.0 in stage 1.0 (TID 2)
22/04/14 18:36:01 INFO executor.Executor: Running task 1.0 in stage 1.0 (TID 3)
22/04/14 18:36:01 INFO storage.ShuffleBlockFetcherIterator: Getting 2 non-empty blocks out of 2 blocks
22/04/14 18:36:01 INFO storage.ShuffleBlockFetcherIterator: Getting 2 non-empty blocks out of 2 blocks
22/04/14 18:36:01 INFO storage.ShuffleBlockFetcherIterator: Started 0 remote fetches in 8 ms
22/04/14 18:36:01 INFO storage.ShuffleBlockFetcherIterator: Started 0 remote fetches in 8 ms
(spark,2)
(are,2)
(you,2)
(hive,1)
(OK,1)
(hadoop,1)
(hdfs,1)
(how,1)
(hello,3)
22/04/14 18:36:01 INFO executor.Executor: Finished task 1.0 in stage 1.0 (TID 3). 1809 bytes result sent to driver
22/04/14 18:36:01 INFO executor.Executor: Finished task 0.0 in stage 1.0 (TID 2). 1809 bytes result sent to driver
22/04/14 18:36:01 INFO scheduler.TaskSetManager: Finished task 1.0 in stage 1.0 (TID 3) in 52 ms on localhost (executor driver) (1/2)
22/04/14 18:36:01 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 1.0 (TID 2) in 56 ms on localhost (executor driver) (2/2)
22/04/14 18:36:01 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 1.0, whose tasks have all completed, from pool
22/04/14 18:36:01 INFO scheduler.DAGScheduler: ResultStage 1 (foreach at WordCountLocal.scala:16) finished in 0.057 s
22/04/14 18:36:01 INFO scheduler.DAGScheduler: Job 0 finished: foreach at WordCountLocal.scala:16, took 0.553104 s
22/04/14 18:36:01 INFO spark.SparkContext: Invoking stop() from shutdown hook
22/04/14 18:36:01 INFO server.ServerConnector: Stopped Spark@324c64cd{HTTP/1.1}{0.0.0.0:4040}
22/04/14 18:36:01 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@cc6460c{/stages/stage/kill,null,UNAVAILABLE,@Spark}
22/04/14 18:36:01 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@7d61eccf{/jobs/job/kill,null,UNAVAILABLE,@Spark}
22/04/14 18:36:01 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@2cd2c8fe{/api,null,UNAVAILABLE,@Spark}
22/04/14 18:36:01 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@226f885f{/,null,UNAVAILABLE,@Spark}
22/04/14 18:36:01 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@21ca139c{/static,null,UNAVAILABLE,@Spark}
22/04/14 18:36:01 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@1df98368{/executors/threadDump/json,null,UNAVAILABLE,@Spark}
22/04/14 18:36:01 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@77cf3f8b{/executors/threadDump,null,UNAVAILABLE,@Spark}
22/04/14 18:36:01 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@4628b1d3{/executors/json,null,UNAVAILABLE,@Spark}
22/04/14 18:36:01 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@71a9b4c7{/executors,null,UNAVAILABLE,@Spark}
22/04/14 18:36:01 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@34abdee4{/environment/json,null,UNAVAILABLE,@Spark}
22/04/14 18:36:01 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@1494b84d{/environment,null,UNAVAILABLE,@Spark}
22/04/14 18:36:01 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@655a5d9c{/storage/rdd/json,null,UNAVAILABLE,@Spark}
22/04/14 18:36:01 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@1b5bc39d{/storage/rdd,null,UNAVAILABLE,@Spark}
22/04/14 18:36:01 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@bc57b40{/storage/json,null,UNAVAILABLE,@Spark}
22/04/14 18:36:01 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@200a26bc{/storage,null,UNAVAILABLE,@Spark}
22/04/14 18:36:01 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@27eb3298{/stages/pool/json,null,UNAVAILABLE,@Spark}
22/04/14 18:36:01 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@21362712{/stages/pool,null,UNAVAILABLE,@Spark}
22/04/14 18:36:01 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@7e0aadd0{/stages/stage/json,null,UNAVAILABLE,@Spark}
22/04/14 18:36:01 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@1f130eaf{/stages/stage,null,UNAVAILABLE,@Spark}
22/04/14 18:36:01 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@4108fa66{/stages/json,null,UNAVAILABLE,@Spark}
22/04/14 18:36:01 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@5910de75{/stages,null,UNAVAILABLE,@Spark}
22/04/14 18:36:01 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@127e70c5{/jobs/job/json,null,UNAVAILABLE,@Spark}
22/04/14 18:36:01 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@27cf3151{/jobs/job,null,UNAVAILABLE,@Spark}
22/04/14 18:36:01 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@5ef8df1e{/jobs/json,null,UNAVAILABLE,@Spark}
22/04/14 18:36:01 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@3aee3976{/jobs,null,UNAVAILABLE,@Spark}
22/04/14 18:36:01 INFO ui.SparkUI: Stopped Spark web UI at http://172.17.0.5:4040
22/04/14 18:36:01 INFO spark.MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
22/04/14 18:36:01 INFO memory.MemoryStore: MemoryStore cleared
22/04/14 18:36:01 INFO storage.BlockManager: BlockManager stopped
22/04/14 18:36:01 INFO storage.BlockManagerMaster: BlockManagerMaster stopped
22/04/14 18:36:01 INFO scheduler.OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
22/04/14 18:36:01 INFO spark.SparkContext: Successfully stopped SparkContext
22/04/14 18:36:01 INFO util.ShutdownHookManager: Shutdown hook called
22/04/14 18:36:01 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-50f639b7-f1b7-463d-ba1d-5c705d617a93

This runs Spark locally. Running on the cluster requires some changes to the program: add the setJars and setIfMissing properties.

import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf
import org.apache.log4j.{Level, Logger}

object WordCount {
  def main(args: Array[String]) {
    // suppress logging
    //    Logger.getLogger("org.apache.spark").setLevel(Level.WARN)
    //    Logger.getLogger("org.eclipse.jetty.server").setLevel(Level.OFF)
    val inputFile = "hdfs://master:9000/input/word.txt"
    val conf = new SparkConf()
      .setAppName("WordCount")
      .setMaster("spark://master:7077")
      .setJars(List("/root/devz/SparkWordCount/out/artifacts/SparkWordCount_jar/SparkWordCount.jar"))
      .setIfMissing("spark.driver.host", "client1")
    val sc = new SparkContext(conf)
    val textFile = sc.textFile(inputFile)
    val wordCount = textFile.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey((a, b) => a + b)
    wordCount.foreach(println)
  }
}

root@client1:~# spark-submit --master spark://master:7077 --class WordCountLocal /root/devz/SparkWordCount/out/artifacts/SparkWordCount_jar/SparkWordCount.jar
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/root/hadoop/spark-2.1.1-bin-hadoop2.7/jars/slf4j-log4j12-1.7.16.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/root/hadoop/hadoop-2.7.7/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
22/04/14 18:36:31 INFO spark.SparkContext: Running Spark version 2.1.1
22/04/14 18:36:31 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
22/04/14 18:36:31 INFO spark.SecurityManager: Changing view acls to: root
22/04/14 18:36:31 INFO spark.SecurityManager: Changing modify acls to: root
22/04/14 18:36:31 INFO spark.SecurityManager: Changing view acls groups to:
22/04/14 18:36:31 INFO spark.SecurityManager: Changing modify acls groups to:
22/04/14 18:36:31 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(root); groups with view permissions: Set(); users  with modify permissions: Set(root); groups with modify permissions: Set()
22/04/14 18:36:31 INFO util.Utils: Successfully started service 'sparkDriver' on port 34369.
22/04/14 18:36:31 INFO spark.SparkEnv: Registering MapOutputTracker
22/04/14 18:36:32 INFO spark.SparkEnv: Registering BlockManagerMaster
22/04/14 18:36:32 INFO storage.BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
22/04/14 18:36:32 INFO storage.BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
22/04/14 18:36:32 INFO storage.DiskBlockManager: Created local directory at /tmp/blockmgr-935470d1-e6eb-464d-949b-d19c6b49bcc8
22/04/14 18:36:32 INFO memory.MemoryStore: MemoryStore started with capacity 366.3 MB
22/04/14 18:36:32 INFO spark.SparkEnv: Registering OutputCommitCoordinator
22/04/14 18:36:32 INFO util.log: Logging initialized @1181ms
22/04/14 18:36:32 INFO server.Server: jetty-9.2.z-SNAPSHOT
22/04/14 18:36:32 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@3aee3976{/jobs,null,AVAILABLE,@Spark}
22/04/14 18:36:32 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@5ef8df1e{/jobs/json,null,AVAILABLE,@Spark}
22/04/14 18:36:32 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@27cf3151{/jobs/job,null,AVAILABLE,@Spark}
22/04/14 18:36:32 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@127e70c5{/jobs/job/json,null,AVAILABLE,@Spark}
22/04/14 18:36:32 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@5910de75{/stages,null,AVAILABLE,@Spark}
22/04/14 18:36:32 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@4108fa66{/stages/json,null,AVAILABLE,@Spark}
22/04/14 18:36:32 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@1f130eaf{/stages/stage,null,AVAILABLE,@Spark}
22/04/14 18:36:32 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@7e0aadd0{/stages/stage/json,null,AVAILABLE,@Spark}
22/04/14 18:36:32 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@21362712{/stages/pool,null,AVAILABLE,@Spark}
22/04/14 18:36:32 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@27eb3298{/stages/pool/json,null,AVAILABLE,@Spark}
22/04/14 18:36:32 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@200a26bc{/storage,null,AVAILABLE,@Spark}
22/04/14 18:36:32 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@bc57b40{/storage/json,null,AVAILABLE,@Spark}
22/04/14 18:36:32 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@1b5bc39d{/storage/rdd,null,AVAILABLE,@Spark}
22/04/14 18:36:32 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@655a5d9c{/storage/rdd/json,null,AVAILABLE,@Spark}
22/04/14 18:36:32 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@1494b84d{/environment,null,AVAILABLE,@Spark}
22/04/14 18:36:32 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@34abdee4{/environment/json,null,AVAILABLE,@Spark}
22/04/14 18:36:32 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@71a9b4c7{/executors,null,AVAILABLE,@Spark}
22/04/14 18:36:32 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@4628b1d3{/executors/json,null,AVAILABLE,@Spark}
22/04/14 18:36:32 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@77cf3f8b{/executors/threadDump,null,AVAILABLE,@Spark}
22/04/14 18:36:32 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@1df98368{/executors/threadDump/json,null,AVAILABLE,@Spark}
22/04/14 18:36:32 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@21ca139c{/static,null,AVAILABLE,@Spark}
22/04/14 18:36:32 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@226f885f{/,null,AVAILABLE,@Spark}
22/04/14 18:36:32 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@2cd2c8fe{/api,null,AVAILABLE,@Spark}
22/04/14 18:36:32 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@7d61eccf{/jobs/job/kill,null,AVAILABLE,@Spark}
22/04/14 18:36:32 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@cc6460c{/stages/stage/kill,null,AVAILABLE,@Spark}
22/04/14 18:36:32 INFO server.ServerConnector: Started Spark@324c64cd{HTTP/1.1}{0.0.0.0:4040}
22/04/14 18:36:32 INFO server.Server: Started @1273ms
22/04/14 18:36:32 INFO util.Utils: Successfully started service 'SparkUI' on port 4040.
22/04/14 18:36:32 INFO ui.SparkUI: Bound SparkUI to 0.0.0.0, and started at http://172.17.0.5:4040
22/04/14 18:36:32 INFO spark.SparkContext: Added JAR file:/root/devz/SparkWordCount/out/artifacts/SparkWordCount_jar/SparkWordCount.jar at spark://172.17.0.5:34369/jars/SparkWordCount.jar with timestamp 1649932592287
22/04/14 18:36:32 INFO executor.Executor: Starting executor ID driver on host localhost
22/04/14 18:36:32 INFO util.Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 36669.
22/04/14 18:36:32 INFO netty.NettyBlockTransferService: Server created on 172.17.0.5:36669
22/04/14 18:36:32 INFO storage.BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
22/04/14 18:36:32 INFO storage.BlockManagerMaster: Registering BlockManager BlockManagerId(driver, 172.17.0.5, 36669, None)
22/04/14 18:36:32 INFO storage.BlockManagerMasterEndpoint: Registering block manager 172.17.0.5:36669 with 366.3 MB RAM, BlockManagerId(driver, 172.17.0.5, 36669, None)
22/04/14 18:36:32 INFO storage.BlockManagerMaster: Registered BlockManager BlockManagerId(driver, 172.17.0.5, 36669, None)
22/04/14 18:36:32 INFO storage.BlockManager: Initialized BlockManager: BlockManagerId(driver, 172.17.0.5, 36669, None)
22/04/14 18:36:32 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@23eee4b8{/metrics/json,null,AVAILABLE,@Spark}
22/04/14 18:36:32 INFO memory.MemoryStore: Block broadcast_0 stored as values in memory (estimated size 240.9 KB, free 366.1 MB)
22/04/14 18:36:32 INFO memory.MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 23.2 KB, free 366.0 MB)
22/04/14 18:36:32 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on 172.17.0.5:36669 (size: 23.2 KB, free: 366.3 MB)
22/04/14 18:36:32 INFO spark.SparkContext: Created broadcast 0 from textFile at WordCountLocal.scala:14
22/04/14 18:36:33 INFO mapred.FileInputFormat: Total input paths to process : 1
22/04/14 18:36:33 INFO spark.SparkContext: Starting job: foreach at WordCountLocal.scala:16
22/04/14 18:36:33 INFO scheduler.DAGScheduler: Registering RDD 3 (map at WordCountLocal.scala:15)
22/04/14 18:36:33 INFO scheduler.DAGScheduler: Got job 0 (foreach at WordCountLocal.scala:16) with 2 output partitions
22/04/14 18:36:33 INFO scheduler.DAGScheduler: Final stage: ResultStage 1 (foreach at WordCountLocal.scala:16)
22/04/14 18:36:33 INFO scheduler.DAGScheduler: Parents of final stage: List(ShuffleMapStage 0)
22/04/14 18:36:33 INFO scheduler.DAGScheduler: Missing parents: List(ShuffleMapStage 0)
22/04/14 18:36:33 INFO scheduler.DAGScheduler: Submitting ShuffleMapStage 0 (MapPartitionsRDD[3] at map at WordCountLocal.scala:15), which has no missing parents
22/04/14 18:36:33 INFO memory.MemoryStore: Block broadcast_1 stored as values in memory (estimated size 4.7 KB, free 366.0 MB)
22/04/14 18:36:33 INFO memory.MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 2.8 KB, free 366.0 MB)
22/04/14 18:36:33 INFO storage.BlockManagerInfo: Added broadcast_1_piece0 in memory on 172.17.0.5:36669 (size: 2.8 KB, free: 366.3 MB)
22/04/14 18:36:33 INFO spark.SparkContext: Created broadcast 1 from broadcast at DAGScheduler.scala:996
22/04/14 18:36:33 INFO scheduler.DAGScheduler: Submitting 2 missing tasks from ShuffleMapStage 0 (MapPartitionsRDD[3] at map at WordCountLocal.scala:15)
22/04/14 18:36:33 INFO scheduler.TaskSchedulerImpl: Adding task set 0.0 with 2 tasks
22/04/14 18:36:33 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, localhost, executor driver, partition 0, ANY, 6028 bytes)
22/04/14 18:36:33 INFO scheduler.TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, localhost, executor driver, partition 1, ANY, 6028 bytes)
22/04/14 18:36:33 INFO executor.Executor: Running task 0.0 in stage 0.0 (TID 0)
22/04/14 18:36:33 INFO executor.Executor: Running task 1.0 in stage 0.0 (TID 1)
22/04/14 18:36:33 INFO executor.Executor: Fetching spark://172.17.0.5:34369/jars/SparkWordCount.jar with timestamp 1649932592287
22/04/14 18:36:33 INFO client.TransportClientFactory: Successfully created connection to /172.17.0.5:34369 after 16 ms (0 ms spent in bootstraps)
22/04/14 18:36:33 INFO util.Utils: Fetching spark://172.17.0.5:34369/jars/SparkWordCount.jar to /tmp/spark-8c576105-bc8d-4b13-a37e-a38faa682d04/userFiles-fea4e5af-7e35-4ec6-a896-d0ed474cc4e9/fetchFileTemp430920042879287618.tmp
22/04/14 18:36:33 INFO executor.Executor: Adding file:/tmp/spark-8c576105-bc8d-4b13-a37e-a38faa682d04/userFiles-fea4e5af-7e35-4ec6-a896-d0ed474cc4e9/SparkWordCount.jar to class loader
22/04/14 18:36:33 INFO rdd.HadoopRDD: Input split: hdfs://master:9000/input/word.txt:35+35
22/04/14 18:36:33 INFO rdd.HadoopRDD: Input split: hdfs://master:9000/input/word.txt:0+35
22/04/14 18:36:33 INFO Configuration.deprecation: mapred.tip.id is deprecated. Instead, use mapreduce.task.id
22/04/14 18:36:33 INFO Configuration.deprecation: mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id
22/04/14 18:36:33 INFO Configuration.deprecation: mapred.task.is.map is deprecated. Instead, use mapreduce.task.ismap
22/04/14 18:36:33 INFO Configuration.deprecation: mapred.task.partition is deprecated. Instead, use mapreduce.task.partition
22/04/14 18:36:33 INFO Configuration.deprecation: mapred.job.id is deprecated. Instead, use mapreduce.job.id
22/04/14 18:36:33 INFO executor.Executor: Finished task 1.0 in stage 0.0 (TID 1). 1816 bytes result sent to driver
22/04/14 18:36:33 INFO executor.Executor: Finished task 0.0 in stage 0.0 (TID 0). 1816 bytes result sent to driver
22/04/14 18:36:33 INFO scheduler.TaskSetManager: Finished task 1.0 in stage 0.0 (TID 1) in 262 ms on localhost (executor driver) (1/2)
22/04/14 18:36:33 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 290 ms on localhost (executor driver) (2/2)
22/04/14 18:36:33 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
22/04/14 18:36:33 INFO scheduler.DAGScheduler: ShuffleMapStage 0 (map at WordCountLocal.scala:15) finished in 0.304 s
22/04/14 18:36:33 INFO scheduler.DAGScheduler: looking for newly runnable stages
22/04/14 18:36:33 INFO scheduler.DAGScheduler: running: Set()
22/04/14 18:36:33 INFO scheduler.DAGScheduler: waiting: Set(ResultStage 1)
22/04/14 18:36:33 INFO scheduler.DAGScheduler: failed: Set()
22/04/14 18:36:33 INFO scheduler.DAGScheduler: Submitting ResultStage 1 (ShuffledRDD[4] at reduceByKey at WordCountLocal.scala:15), which has no missing parents
22/04/14 18:36:33 INFO memory.MemoryStore: Block broadcast_2 stored as values in memory (estimated size 3.1 KB, free 366.0 MB)
22/04/14 18:36:33 INFO memory.MemoryStore: Block broadcast_2_piece0 stored as bytes in memory (estimated size 1954.0 B, free 366.0 MB)
22/04/14 18:36:33 INFO storage.BlockManagerInfo: Added broadcast_2_piece0 in memory on 172.17.0.5:36669 (size: 1954.0 B, free: 366.3 MB)
22/04/14 18:36:33 INFO spark.SparkContext: Created broadcast 2 from broadcast at DAGScheduler.scala:996
22/04/14 18:36:33 INFO scheduler.DAGScheduler: Submitting 2 missing tasks from ResultStage 1 (ShuffledRDD[4] at reduceByKey at WordCountLocal.scala:15)
22/04/14 18:36:33 INFO scheduler.TaskSchedulerImpl: Adding task set 1.0 with 2 tasks
22/04/14 18:36:33 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 1.0 (TID 2, localhost, executor driver, partition 0, ANY, 5808 bytes)
22/04/14 18:36:33 INFO scheduler.TaskSetManager: Starting task 1.0 in stage 1.0 (TID 3, localhost, executor driver, partition 1, ANY, 5808 bytes)
22/04/14 18:36:33 INFO executor.Executor: Running task 0.0 in stage 1.0 (TID 2)
22/04/14 18:36:33 INFO executor.Executor: Running task 1.0 in stage 1.0 (TID 3)
22/04/14 18:36:33 INFO storage.ShuffleBlockFetcherIterator: Getting 2 non-empty blocks out of 2 blocks
22/04/14 18:36:33 INFO storage.ShuffleBlockFetcherIterator: Getting 2 non-empty blocks out of 2 blocks
22/04/14 18:36:33 INFO storage.ShuffleBlockFetcherIterator: Started 0 remote fetches in 7 ms
22/04/14 18:36:33 INFO storage.ShuffleBlockFetcherIterator: Started 0 remote fetches in 7 ms
(are,2)
(spark,2)
(hive,1)
(you,2)
(hadoop,1)
(hdfs,1)
(OK,1)
(how,1)
(hello,3)
22/04/14 18:36:33 INFO executor.Executor: Finished task 1.0 in stage 1.0 (TID 3). 1809 bytes result sent to driver
22/04/14 18:36:33 INFO executor.Executor: Finished task 0.0 in stage 1.0 (TID 2). 1809 bytes result sent to driver
22/04/14 18:36:33 INFO scheduler.TaskSetManager: Finished task 1.0 in stage 1.0 (TID 3) in 46 ms on localhost (executor driver) (1/2)
22/04/14 18:36:33 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 1.0 (TID 2) in 49 ms on localhost (executor driver) (2/2)
22/04/14 18:36:33 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 1.0, whose tasks have all completed, from pool
22/04/14 18:36:33 INFO scheduler.DAGScheduler: ResultStage 1 (foreach at WordCountLocal.scala:16) finished in 0.050 s
22/04/14 18:36:33 INFO scheduler.DAGScheduler: Job 0 finished: foreach at WordCountLocal.scala:16, took 0.540589 s
22/04/14 18:36:33 INFO spark.SparkContext: Invoking stop() from shutdown hook
22/04/14 18:36:33 INFO server.ServerConnector: Stopped Spark@324c64cd{HTTP/1.1}{0.0.0.0:4040}
22/04/14 18:36:33 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@cc6460c{/stages/stage/kill,null,UNAVAILABLE,@Spark}
22/04/14 18:36:33 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@7d61eccf{/jobs/job/kill,null,UNAVAILABLE,@Spark}
22/04/14 18:36:33 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@2cd2c8fe{/api,null,UNAVAILABLE,@Spark}
22/04/14 18:36:33 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@226f885f{/,null,UNAVAILABLE,@Spark}
22/04/14 18:36:33 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@21ca139c{/static,null,UNAVAILABLE,@Spark}
22/04/14 18:36:33 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@1df98368{/executors/threadDump/json,null,UNAVAILABLE,@Spark}
22/04/14 18:36:33 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@77cf3f8b{/executors/threadDump,null,UNAVAILABLE,@Spark}
22/04/14 18:36:33 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@4628b1d3{/executors/json,null,UNAVAILABLE,@Spark}
22/04/14 18:36:33 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@71a9b4c7{/executors,null,UNAVAILABLE,@Spark}
22/04/14 18:36:33 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@34abdee4{/environment/json,null,UNAVAILABLE,@Spark}
22/04/14 18:36:33 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@1494b84d{/environment,null,UNAVAILABLE,@Spark}
22/04/14 18:36:33 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@655a5d9c{/storage/rdd/json,null,UNAVAILABLE,@Spark}
22/04/14 18:36:33 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@1b5bc39d{/storage/rdd,null,UNAVAILABLE,@Spark}
22/04/14 18:36:33 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@bc57b40{/storage/json,null,UNAVAILABLE,@Spark}
22/04/14 18:36:33 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@200a26bc{/storage,null,UNAVAILABLE,@Spark}
22/04/14 18:36:33 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@27eb3298{/stages/pool/json,null,UNAVAILABLE,@Spark}
22/04/14 18:36:33 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@21362712{/stages/pool,null,UNAVAILABLE,@Spark}
22/04/14 18:36:33 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@7e0aadd0{/stages/stage/json,null,UNAVAILABLE,@Spark}
22/04/14 18:36:33 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@1f130eaf{/stages/stage,null,UNAVAILABLE,@Spark}
22/04/14 18:36:33 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@4108fa66{/stages/json,null,UNAVAILABLE,@Spark}
22/04/14 18:36:33 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@5910de75{/stages,null,UNAVAILABLE,@Spark}
22/04/14 18:36:33 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@127e70c5{/jobs/job/json,null,UNAVAILABLE,@Spark}
22/04/14 18:36:33 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@27cf3151{/jobs/job,null,UNAVAILABLE,@Spark}
22/04/14 18:36:33 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@5ef8df1e{/jobs/json,null,UNAVAILABLE,@Spark}
22/04/14 18:36:33 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@3aee3976{/jobs,null,UNAVAILABLE,@Spark}
22/04/14 18:36:33 INFO ui.SparkUI: Stopped Spark web UI at http://172.17.0.5:4040
22/04/14 18:36:33 INFO spark.MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
22/04/14 18:36:33 INFO memory.MemoryStore: MemoryStore cleared
22/04/14 18:36:33 INFO storage.BlockManager: BlockManager stopped
22/04/14 18:36:33 INFO storage.BlockManagerMaster: BlockManagerMaster stopped
22/04/14 18:36:33 INFO scheduler.OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
22/04/14 18:36:33 INFO spark.SparkContext: Successfully stopped SparkContext
22/04/14 18:36:33 INFO util.ShutdownHookManager: Shutdown hook called

For Spark on YARN, a few further changes are needed.

Import the YARN and HDFS configuration files. Spark on YARN depends on YARN and HDFS, so obtaining their configuration files is the first requirement: copy core-site.xml, hdfs-site.xml and yarn-site.xml into the resources directory of your IDEA project. In addition, add the following Hadoop dependencies to pom.xml.

        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-common</artifactId>
            <version>2.7.7</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-hdfs</artifactId>
            <version>2.7.7</version>
        </dependency>

You also need to add the spark-yarn package (spark-yarn_2.11-2.1.1.jar) to your dependencies.
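The Maven coordinates should look roughly like the following (a sketch; the version must match the Spark 2.1.1 / Scala 2.11 build used here):

        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-yarn_2.11</artifactId>
            <version>2.1.1</version>
        </dependency>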

Modify the program as follows.

import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf
import org.apache.log4j.{Level, Logger}

object WordCountYarn {
  def main(args: Array[String]) {
    // Suppress logging
    //    Logger.getLogger("org.apache.spark").setLevel(Level.WARN)
    //    Logger.getLogger("org.eclipse.jetty.server").setLevel(Level.OFF)
    val inputFile = "hdfs://master:9000/input/word.txt"
    val conf = new SparkConf()
      .setAppName("WordCount")
      .setMaster("yarn")
      // ResourceManager host
      .set("yarn.resourcemanager.hostname", "master")
      // Number of executors
      .set("spark.executor.instances", "2")
      // Executor memory
      .set("spark.executor.memory", "1024M")
      // YARN queue to submit the job to
      .set("spark.yarn.queue", "spark")
      // Driver host
      .set("spark.driver.host", "client1")
      .setJars(List("/root/devz/SparkWordCount/out/artifacts/SparkWordCount_jar/SparkWordCount.jar"))
      .setIfMissing("spark.driver.host", "client1")
    val sc = new SparkContext(conf)
    val textFile = sc.textFile(inputFile)
    val wordCount = textFile.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey((a, b) => a + b)
    wordCount.foreach(println)
  }
}
root@client1:~# spark-submit --master yarn --class WordCountLocal /root/devz/SparkWordCount/out/artifacts/SparkWordC
ount_jar/SparkWordCount.jar
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/root/hadoop/spark-2.1.1-bin-hadoop2.7/jars/slf4j-log4j12-1.7.16.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/root/hadoop/hadoop-2.7.7/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
22/04/14 18:36:53 INFO spark.SparkContext: Running Spark version 2.1.1
22/04/14 18:36:53 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
22/04/14 18:36:54 INFO spark.SecurityManager: Changing view acls to: root
22/04/14 18:36:54 INFO spark.SecurityManager: Changing modify acls to: root
22/04/14 18:36:54 INFO spark.SecurityManager: Changing view acls groups to:
22/04/14 18:36:54 INFO spark.SecurityManager: Changing modify acls groups to:
22/04/14 18:36:54 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(root); groups with view permissions: Set(); users  with modify permissions: Set(root); groups with modify permissions: Set()
22/04/14 18:36:54 INFO util.Utils: Successfully started service 'sparkDriver' on port 37081.
22/04/14 18:36:54 INFO spark.SparkEnv: Registering MapOutputTracker
22/04/14 18:36:54 INFO spark.SparkEnv: Registering BlockManagerMaster
22/04/14 18:36:54 INFO storage.BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
22/04/14 18:36:54 INFO storage.BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
22/04/14 18:36:54 INFO storage.DiskBlockManager: Created local directory at /tmp/blockmgr-f91343d9-8843-4d55-9680-4f5a2cf108f3
22/04/14 18:36:54 INFO memory.MemoryStore: MemoryStore started with capacity 366.3 MB
22/04/14 18:36:54 INFO spark.SparkEnv: Registering OutputCommitCoordinator
22/04/14 18:36:54 INFO util.log: Logging initialized @1139ms
22/04/14 18:36:54 INFO server.Server: jetty-9.2.z-SNAPSHOT
22/04/14 18:36:54 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@bc57b40{/jobs,null,AVAILABLE,@Spark}
22/04/14 18:36:54 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@1b5bc39d{/jobs/json,null,AVAILABLE,@Spark}
22/04/14 18:36:54 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@655a5d9c{/jobs/job,null,AVAILABLE,@Spark}
22/04/14 18:36:54 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@1494b84d{/jobs/job/json,null,AVAILABLE,@Spark}
22/04/14 18:36:54 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@34abdee4{/stages,null,AVAILABLE,@Spark}
22/04/14 18:36:54 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@71a9b4c7{/stages/json,null,AVAILABLE,@Spark}
22/04/14 18:36:54 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@4628b1d3{/stages/stage,null,AVAILABLE,@Spark}
22/04/14 18:36:54 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@77cf3f8b{/stages/stage/json,null,AVAILABLE,@Spark}
22/04/14 18:36:54 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@1df98368{/stages/pool,null,AVAILABLE,@Spark}
22/04/14 18:36:54 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@21ca139c{/stages/pool/json,null,AVAILABLE,@Spark}
22/04/14 18:36:54 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@226f885f{/storage,null,AVAILABLE,@Spark}
22/04/14 18:36:54 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@2cd2c8fe{/storage/json,null,AVAILABLE,@Spark}
22/04/14 18:36:54 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@7d61eccf{/storage/rdd,null,AVAILABLE,@Spark}
22/04/14 18:36:54 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@cc6460c{/storage/rdd/json,null,AVAILABLE,@Spark}
22/04/14 18:36:54 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@52350abb{/environment,null,AVAILABLE,@Spark}
22/04/14 18:36:54 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@681aad3b{/environment/json,null,AVAILABLE,@Spark}
22/04/14 18:36:54 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@1a6f2363{/executors,null,AVAILABLE,@Spark}
22/04/14 18:36:54 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@2427e004{/executors/json,null,AVAILABLE,@Spark}
22/04/14 18:36:54 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@5ebd56e9{/executors/threadDump,null,AVAILABLE,@Spark}
22/04/14 18:36:54 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@63f34b70{/executors/threadDump/json,null,AVAILABLE,@Spark}
22/04/14 18:36:54 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@641856{/static,null,AVAILABLE,@Spark}
22/04/14 18:36:54 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@1b58ff9e{/,null,AVAILABLE,@Spark}
22/04/14 18:36:54 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@2f66e802{/api,null,AVAILABLE,@Spark}
22/04/14 18:36:54 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@56b78e55{/jobs/job/kill,null,AVAILABLE,@Spark}
22/04/14 18:36:54 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@76318a7d{/stages/stage/kill,null,AVAILABLE,@Spark}
22/04/14 18:36:54 INFO server.ServerConnector: Started Spark@7c18432b{HTTP/1.1}{0.0.0.0:4040}
22/04/14 18:36:54 INFO server.Server: Started @1237ms
22/04/14 18:36:54 INFO util.Utils: Successfully started service 'SparkUI' on port 4040.
22/04/14 18:36:54 INFO ui.SparkUI: Bound SparkUI to 0.0.0.0, and started at http://172.17.0.5:4040
22/04/14 18:36:54 INFO spark.SparkContext: Added JAR file:/root/devz/SparkWordCount/out/artifacts/SparkWordCount_jar/SparkWordCount.jar at spark://172.17.0.5:37081/jars/SparkWordCount.jar with timestamp 1649932614539
22/04/14 18:36:54 INFO executor.Executor: Starting executor ID driver on host localhost
22/04/14 18:36:54 INFO util.Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 42659.
22/04/14 18:36:54 INFO netty.NettyBlockTransferService: Server created on 172.17.0.5:42659
22/04/14 18:36:54 INFO storage.BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
22/04/14 18:36:54 INFO storage.BlockManagerMaster: Registering BlockManager BlockManagerId(driver, 172.17.0.5, 42659, None)
22/04/14 18:36:54 INFO storage.BlockManagerMasterEndpoint: Registering block manager 172.17.0.5:42659 with 366.3 MB RAM, BlockManagerId(driver, 172.17.0.5, 42659, None)
22/04/14 18:36:54 INFO storage.BlockManagerMaster: Registered BlockManager BlockManagerId(driver, 172.17.0.5, 42659, None)
22/04/14 18:36:54 INFO storage.BlockManager: Initialized BlockManager: BlockManagerId(driver, 172.17.0.5, 42659, None)
22/04/14 18:36:54 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@9f6e406{/metrics/json,null,AVAILABLE,@Spark}
22/04/14 18:36:55 INFO memory.MemoryStore: Block broadcast_0 stored as values in memory (estimated size 240.9 KB, free 366.1 MB)
22/04/14 18:36:55 INFO memory.MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 23.2 KB, free 366.0 MB)
22/04/14 18:36:55 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on 172.17.0.5:42659 (size: 23.2 KB, free: 366.3 MB)
22/04/14 18:36:55 INFO spark.SparkContext: Created broadcast 0 from textFile at WordCountLocal.scala:14
22/04/14 18:36:55 INFO mapred.FileInputFormat: Total input paths to process : 1
22/04/14 18:36:55 INFO spark.SparkContext: Starting job: foreach at WordCountLocal.scala:16
22/04/14 18:36:55 INFO scheduler.DAGScheduler: Registering RDD 3 (map at WordCountLocal.scala:15)
22/04/14 18:36:55 INFO scheduler.DAGScheduler: Got job 0 (foreach at WordCountLocal.scala:16) with 2 output partitions
22/04/14 18:36:55 INFO scheduler.DAGScheduler: Final stage: ResultStage 1 (foreach at WordCountLocal.scala:16)
22/04/14 18:36:55 INFO scheduler.DAGScheduler: Parents of final stage: List(ShuffleMapStage 0)
22/04/14 18:36:55 INFO scheduler.DAGScheduler: Missing parents: List(ShuffleMapStage 0)
22/04/14 18:36:55 INFO scheduler.DAGScheduler: Submitting ShuffleMapStage 0 (MapPartitionsRDD[3] at map at WordCountLocal.scala:15), which has no missing parents
22/04/14 18:36:55 INFO memory.MemoryStore: Block broadcast_1 stored as values in memory (estimated size 4.7 KB, free 366.0 MB)
22/04/14 18:36:55 INFO memory.MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 2.8 KB, free 366.0 MB)
22/04/14 18:36:55 INFO storage.BlockManagerInfo: Added broadcast_1_piece0 in memory on 172.17.0.5:42659 (size: 2.8 KB, free: 366.3 MB)
22/04/14 18:36:55 INFO spark.SparkContext: Created broadcast 1 from broadcast at DAGScheduler.scala:996
22/04/14 18:36:55 INFO scheduler.DAGScheduler: Submitting 2 missing tasks from ShuffleMapStage 0 (MapPartitionsRDD[3] at map at WordCountLocal.scala:15)
22/04/14 18:36:55 INFO scheduler.TaskSchedulerImpl: Adding task set 0.0 with 2 tasks
22/04/14 18:36:55 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, localhost, executor driver, partition 0, ANY, 6028 bytes)
22/04/14 18:36:55 INFO scheduler.TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, localhost, executor driver, partition 1, ANY, 6028 bytes)
22/04/14 18:36:55 INFO executor.Executor: Running task 1.0 in stage 0.0 (TID 1)
22/04/14 18:36:55 INFO executor.Executor: Running task 0.0 in stage 0.0 (TID 0)
22/04/14 18:36:55 INFO executor.Executor: Fetching spark://172.17.0.5:37081/jars/SparkWordCount.jar with timestamp 1649932614539
22/04/14 18:36:55 INFO client.TransportClientFactory: Successfully created connection to /172.17.0.5:37081 after 17 ms (0 ms spent in bootstraps)
22/04/14 18:36:55 INFO util.Utils: Fetching spark://172.17.0.5:37081/jars/SparkWordCount.jar to /tmp/spark-1b07077a-700b-470e-a47c-abd5e74d5450/userFiles-d82b1ab5-2f29-4e28-8aeb-8eb5941283e6/fetchFileTemp4388474812972783818.tmp
22/04/14 18:36:55 INFO executor.Executor: Adding file:/tmp/spark-1b07077a-700b-470e-a47c-abd5e74d5450/userFiles-d82b1ab5-2f29-4e28-8aeb-8eb5941283e6/SparkWordCount.jar to class loader
22/04/14 18:36:55 INFO rdd.HadoopRDD: Input split: hdfs://master:9000/input/word.txt:0+35
22/04/14 18:36:55 INFO rdd.HadoopRDD: Input split: hdfs://master:9000/input/word.txt:35+35
22/04/14 18:36:55 INFO Configuration.deprecation: mapred.tip.id is deprecated. Instead, use mapreduce.task.id
22/04/14 18:36:55 INFO Configuration.deprecation: mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id
22/04/14 18:36:55 INFO Configuration.deprecation: mapred.task.partition is deprecated. Instead, use mapreduce.task.partition
22/04/14 18:36:55 INFO Configuration.deprecation: mapred.task.is.map is deprecated. Instead, use mapreduce.task.ismap
22/04/14 18:36:55 INFO Configuration.deprecation: mapred.job.id is deprecated. Instead, use mapreduce.job.id
22/04/14 18:36:56 INFO executor.Executor: Finished task 1.0 in stage 0.0 (TID 1). 1743 bytes result sent to driver
22/04/14 18:36:56 INFO executor.Executor: Finished task 0.0 in stage 0.0 (TID 0). 1743 bytes result sent to driver
22/04/14 18:36:56 INFO scheduler.TaskSetManager: Finished task 1.0 in stage 0.0 (TID 1) in 243 ms on localhost (executor driver) (1/2)
22/04/14 18:36:56 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 264 ms on localhost (executor driver) (2/2)
22/04/14 18:36:56 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
22/04/14 18:36:56 INFO scheduler.DAGScheduler: ShuffleMapStage 0 (map at WordCountLocal.scala:15) finished in 0.279 s
22/04/14 18:36:56 INFO scheduler.DAGScheduler: looking for newly runnable stages
22/04/14 18:36:56 INFO scheduler.DAGScheduler: running: Set()
22/04/14 18:36:56 INFO scheduler.DAGScheduler: waiting: Set(ResultStage 1)
22/04/14 18:36:56 INFO scheduler.DAGScheduler: failed: Set()
22/04/14 18:36:56 INFO scheduler.DAGScheduler: Submitting ResultStage 1 (ShuffledRDD[4] at reduceByKey at WordCountLocal.scala:15), which has no missing parents
22/04/14 18:36:56 INFO memory.MemoryStore: Block broadcast_2 stored as values in memory (estimated size 3.1 KB, free 366.0 MB)
22/04/14 18:36:56 INFO memory.MemoryStore: Block broadcast_2_piece0 stored as bytes in memory (estimated size 1954.0 B, free 366.0 MB)
22/04/14 18:36:56 INFO storage.BlockManagerInfo: Added broadcast_2_piece0 in memory on 172.17.0.5:42659 (size: 1954.0 B, free: 366.3 MB)
22/04/14 18:36:56 INFO spark.SparkContext: Created broadcast 2 from broadcast at DAGScheduler.scala:996
22/04/14 18:36:56 INFO scheduler.DAGScheduler: Submitting 2 missing tasks from ResultStage 1 (ShuffledRDD[4] at reduceByKey at WordCountLocal.scala:15)
22/04/14 18:36:56 INFO scheduler.TaskSchedulerImpl: Adding task set 1.0 with 2 tasks
22/04/14 18:36:56 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 1.0 (TID 2, localhost, executor driver, partition 0, ANY, 5808 bytes)
22/04/14 18:36:56 INFO scheduler.TaskSetManager: Starting task 1.0 in stage 1.0 (TID 3, localhost, executor driver, partition 1, ANY, 5808 bytes)
22/04/14 18:36:56 INFO executor.Executor: Running task 0.0 in stage 1.0 (TID 2)
22/04/14 18:36:56 INFO executor.Executor: Running task 1.0 in stage 1.0 (TID 3)
22/04/14 18:36:56 INFO storage.ShuffleBlockFetcherIterator: Getting 2 non-empty blocks out of 2 blocks
22/04/14 18:36:56 INFO storage.ShuffleBlockFetcherIterator: Getting 2 non-empty blocks out of 2 blocks
22/04/14 18:36:56 INFO storage.ShuffleBlockFetcherIterator: Started 0 remote fetches in 5 ms
22/04/14 18:36:56 INFO storage.ShuffleBlockFetcherIterator: Started 0 remote fetches in 5 ms
(spark,2)
(are,2)
(you,2)
(hive,1)
(hadoop,1)
(OK,1)
(how,1)
(hdfs,1)
(hello,3)
22/04/14 18:36:56 INFO executor.Executor: Finished task 0.0 in stage 1.0 (TID 2). 1809 bytes result sent to driver
22/04/14 18:36:56 INFO executor.Executor: Finished task 1.0 in stage 1.0 (TID 3). 1809 bytes result sent to driver
22/04/14 18:36:56 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 1.0 (TID 2) in 49 ms on localhost (executor driver) (1/2)
22/04/14 18:36:56 INFO scheduler.TaskSetManager: Finished task 1.0 in stage 1.0 (TID 3) in 47 ms on localhost (executor driver) (2/2)
22/04/14 18:36:56 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 1.0, whose tasks have all completed, from pool
22/04/14 18:36:56 INFO scheduler.DAGScheduler: ResultStage 1 (foreach at WordCountLocal.scala:16) finished in 0.050 s
22/04/14 18:36:56 INFO scheduler.DAGScheduler: Job 0 finished: foreach at WordCountLocal.scala:16, took 0.522530 s
22/04/14 18:36:56 INFO spark.SparkContext: Invoking stop() from shutdown hook
22/04/14 18:36:56 INFO server.ServerConnector: Stopped Spark@7c18432b{HTTP/1.1}{0.0.0.0:4040}
22/04/14 18:36:56 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@76318a7d{/stages/stage/kill,null,UNAVAILABLE,@Spark}
22/04/14 18:36:56 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@56b78e55{/jobs/job/kill,null,UNAVAILABLE,@Spark}
22/04/14 18:36:56 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@2f66e802{/api,null,UNAVAILABLE,@Spark}
22/04/14 18:36:56 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@1b58ff9e{/,null,UNAVAILABLE,@Spark}
22/04/14 18:36:56 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@641856{/static,null,UNAVAILABLE,@Spark}
22/04/14 18:36:56 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@63f34b70{/executors/threadDump/json,null,UNAVAILABLE,@Spark}
22/04/14 18:36:56 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@5ebd56e9{/executors/threadDump,null,UNAVAILABLE,@Spark}
22/04/14 18:36:56 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@2427e004{/executors/json,null,UNAVAILABLE,@Spark}
22/04/14 18:36:56 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@1a6f2363{/executors,null,UNAVAILABLE,@Spark}
22/04/14 18:36:56 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@681aad3b{/environment/json,null,UNAVAILABLE,@Spark}
22/04/14 18:36:56 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@52350abb{/environment,null,UNAVAILABLE,@Spark}
22/04/14 18:36:56 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@cc6460c{/storage/rdd/json,null,UNAVAILABLE,@Spark}
22/04/14 18:36:56 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@7d61eccf{/storage/rdd,null,UNAVAILABLE,@Spark}
22/04/14 18:36:56 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@2cd2c8fe{/storage/json,null,UNAVAILABLE,@Spark}
22/04/14 18:36:56 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@226f885f{/storage,null,UNAVAILABLE,@Spark}
22/04/14 18:36:56 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@21ca139c{/stages/pool/json,null,UNAVAILABLE,@Spark}
22/04/14 18:36:56 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@1df98368{/stages/pool,null,UNAVAILABLE,@Spark}
22/04/14 18:36:56 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@77cf3f8b{/stages/stage/json,null,UNAVAILABLE,@Spark}
22/04/14 18:36:56 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@4628b1d3{/stages/stage,null,UNAVAILABLE,@Spark}
22/04/14 18:36:56 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@71a9b4c7{/stages/json,null,UNAVAILABLE,@Spark}
22/04/14 18:36:56 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@34abdee4{/stages,null,UNAVAILABLE,@Spark}
22/04/14 18:36:56 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@1494b84d{/jobs/job/json,null,UNAVAILABLE,@Spark}
22/04/14 18:36:56 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@655a5d9c{/jobs/job,null,UNAVAILABLE,@Spark}
22/04/14 18:36:56 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@1b5bc39d{/jobs/json,null,UNAVAILABLE,@Spark}
22/04/14 18:36:56 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@bc57b40{/jobs,null,UNAVAILABLE,@Spark}
22/04/14 18:36:56 INFO ui.SparkUI: Stopped Spark web UI at http://172.17.0.5:4040
22/04/14 18:36:56 INFO spark.MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
22/04/14 18:36:56 INFO memory.MemoryStore: MemoryStore cleared
22/04/14 18:36:56 INFO storage.BlockManager: BlockManager stopped
22/04/14 18:36:56 INFO storage.BlockManagerMaster: BlockManagerMaster stopped
22/04/14 18:36:56 INFO scheduler.OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
22/04/14 18:36:56 INFO spark.SparkContext: Successfully stopped SparkContext
22/04/14 18:36:56 INFO util.ShutdownHookManager: Shutdown hook called
22/04/14 18:36:56 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-1b07077a-700b-470e-a47c-abd5e74d5450
root@client1:~#

4 Hive Applications

Make sure the Hadoop cluster containers, the client container and the MySQL container are all running, and that /etc/hosts in the master, node02, node03 and client1 containers maps each hostname to the correct IP address.
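For example, the /etc/hosts entries should look something like the following (example values only; the actual addresses depend on what Docker assigned to each container and can be checked with hostname -i inside each container):

172.17.0.2      master
172.17.0.3      node02
172.17.0.4      node03
172.17.0.5      client1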

Enter the client1 container and type hive.

root@client1:~# hive
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/root/hadoop/apache-hive-2.3.4-bin/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/root/hadoop/hadoop-2.7.7/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]

Logging initialized using configuration in jar:file:/root/hadoop/apache-hive-2.3.4-bin/lib/hive-common-2.3.4.jar!/hive-log4j2.properties Async: true
Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.

Test Hive.

hive> show databases;
Thu Apr 14 19:17:04 CST 2022 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
Thu Apr 14 19:17:05 CST 2022 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
Thu Apr 14 19:17:05 CST 2022 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
Thu Apr 14 19:17:05 CST 2022 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
Thu Apr 14 19:17:06 CST 2022 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
Thu Apr 14 19:17:06 CST 2022 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
Thu Apr 14 19:17:06 CST 2022 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
Thu Apr 14 19:17:06 CST 2022 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
OK
default
Time taken: 2.462 seconds, Fetched: 1 row(s)
hive>

Create the table t_test and insert some data.

hive> show tables;
OK
Time taken: 0.031 seconds
hive> create table t_test(id int,name string);
OK
Time taken: 0.493 seconds
hive> show tables;
OK
t_test
Time taken: 0.024 seconds, Fetched: 1 row(s)
hive> insert into table t_test values (1,'One');
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = root_20220414190118_9dc66a34-5bff-4a88-b9c5-b9b07dd16bf2
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1649911721270_0014, Tracking URL = http://master:8088/proxy/application_1649911721270_0014/
Kill Command = /root/hadoop/hadoop-2.7.7/bin/hadoop job  -kill job_1649911721270_0014
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2022-04-14 19:01:23,865 Stage-1 map = 0%,  reduce = 0%
2022-04-14 19:01:28,085 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.75 sec
MapReduce Total cumulative CPU time: 1 seconds 750 msec
Ended Job = job_1649911721270_0014
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to directory hdfs://master:9000/root/hadoop/apache-hive-2.3.4-bin/warehouse/t_test/.hive-staging_hive_2022-04-14_19-01-18_633_3873363413686710193-1/-ext-10000
Loading data to table default.t_test
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1   Cumulative CPU: 1.75 sec   HDFS Read: 4239 HDFS Write: 76 SUCCESS
Total MapReduce CPU Time Spent: 1 seconds 750 msec
OK
Time taken: 11.982 seconds
hive> select * form t_test;
FAILED: ParseException line 1:9 missing EOF at 'form' near '*'
hive> select * from t_test;
OK
1       One
Time taken: 0.125 seconds, Fetched: 1 row(s)
hive> insert into table t_test values (2,'Two');
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = root_20220414190209_5f9a4cbd-ba99-40c8-a88b-b82a1b8c75f3
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1649911721270_0015, Tracking URL = http://master:8088/proxy/application_1649911721270_0015/
Kill Command = /root/hadoop/hadoop-2.7.7/bin/hadoop job  -kill job_1649911721270_0015
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2022-04-14 19:02:13,878 Stage-1 map = 0%,  reduce = 0%
2022-04-14 19:02:17,015 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.75 sec
MapReduce Total cumulative CPU time: 1 seconds 750 msec
Ended Job = job_1649911721270_0015
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to directory hdfs://master:9000/root/hadoop/apache-hive-2.3.4-bin/warehouse/t_test/.hive-staging_hive_2022-04-14_19-02-09_297_950085809313642205-1/-ext-10000
Loading data to table default.t_test
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1   Cumulative CPU: 1.75 sec   HDFS Read: 4248 HDFS Write: 76 SUCCESS
Total MapReduce CPU Time Spent: 1 seconds 750 msec
OK
Time taken: 10.141 seconds
hive> insert into table t_test values (3,'Three');
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = root_20220414190230_7c827a36-8dd8-41f5-865c-b63954b35de6
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1649911721270_0016, Tracking URL = http://master:8088/proxy/application_1649911721270_0016/
Kill Command = /root/hadoop/hadoop-2.7.7/bin/hadoop job  -kill job_1649911721270_0016
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2022-04-14 19:02:34,473 Stage-1 map = 0%,  reduce = 0%
2022-04-14 19:02:38,615 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.9 sec
MapReduce Total cumulative CPU time: 1 seconds 900 msec
Ended Job = job_1649911721270_0016
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to directory hdfs://master:9000/root/hadoop/apache-hive-2.3.4-bin/warehouse/t_test/.hive-staging_hive_2022-04-14_19-02-30_916_5301811094785365177-1/-ext-10000
Loading data to table default.t_test
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1   Cumulative CPU: 1.9 sec   HDFS Read: 4255 HDFS Write: 78 SUCCESS
Total MapReduce CPU Time Spent: 1 seconds 900 msec
OK
Time taken: 9.046 seconds
hive> select * from t_test;
OK
1       One
2       Two
3       Three
Time taken: 0.081 seconds, Fetched: 3 row(s)
hive> exit;

Spark Hive programming

First, on each node, copy the MySQL JDBC driver jar that was placed in Hive's lib directory during the Hive setup into the Spark jars directory, and copy Hive's configuration file hive-site.xml into the corresponding Spark conf directory.

[root@master ~]# cp hadoop/apache-hive-2.3.4-bin/lib/mysql-connector* hadoop/spark-2.1.1-bin-hadoop2.7/jars
[root@master ~]# cp hadoop/apache-hive-2.3.4-bin/conf/hive-site.xml hadoop/spark-2.1.1-bin-hadoop2.7/conf
[root@node02 ~]# cp hadoop/apache-hive-2.3.4-bin/lib/mysql-connector* hadoop/spark-2.1.1-bin-hadoop2.7/jars
[root@node02 ~]# cp hadoop/apache-hive-2.3.4-bin/conf/hive-site.xml hadoop/spark-2.1.1-bin-hadoop2.7/conf
[root@node03 ~]# cp hadoop/apache-hive-2.3.4-bin/lib/mysql-connector* hadoop/spark-2.1.1-bin-hadoop2.7/jars
[root@node03 ~]# cp hadoop/apache-hive-2.3.4-bin/conf/hive-site.xml hadoop/spark-2.1.1-bin-hadoop2.7/conf
[root@client1 ~]# cp hadoop/apache-hive-2.3.4-bin/lib/mysql-connector* hadoop/spark-2.1.1-bin-hadoop2.7/jars
[root@client1 ~]# cp hadoop/apache-hive-2.3.4-bin/conf/hive-site.xml hadoop/spark-2.1.1-bin-hadoop2.7/conf

进入client1客户端容器,键入spark-shell,查看hive数据表状况。

root@client1:~# spark-shell --master spark://master:7077
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/root/hadoop/spark-2.1.1-bin-hadoop2.7/jars/slf4j-log4j12-1.7.16.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/root/hadoop/hadoop-2.7.7/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
22/04/14 20:54:14 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Thu Apr 14 20:54:16 CST 2022 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
Thu Apr 14 20:54:17 CST 2022 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
Thu Apr 14 20:54:17 CST 2022 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
Thu Apr 14 20:54:17 CST 2022 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
Thu Apr 14 20:54:17 CST 2022 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
Thu Apr 14 20:54:17 CST 2022 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
Thu Apr 14 20:54:17 CST 2022 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
Thu Apr 14 20:54:17 CST 2022 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
22/04/14 20:54:18 ERROR metastore.ObjectStore: Version information found in metastore differs 2.3.0 from expected schema version 1.2.0. Schema verififcation is disabled hive.metastore.schema.verification so setting version.
22/04/14 20:54:18 WARN metastore.ObjectStore: Failed to get database global_temp, returning NoSuchObjectException
Spark context Web UI available at http://172.17.0.5:4040
Spark context available as 'sc' (master = spark://master:7077, app id = app-20220414125414-0018).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.1.1
      /_/

Using Scala version 2.11.8 (OpenJDK 64-Bit Server VM, Java 1.8.0_312)
Type in expressions to have them evaluated.
Type :help for more information.

scala> spark.sql("show tables").show
+--------+---------+-----------+
|database|tableName|isTemporary|
+--------+---------+-----------+
| default|   t_test|      false|
+--------+---------+-----------+

scala> spark.sql("select * from t_test").show
+---+-----+
| id| name|
+---+-----+
|  1|  One|
|  2|  Two|
|  3|Three|
+---+-----+

scala>

IDEA Spark Hive programming

Create a new Maven Scala project.

Add the MySQL dependency to pom.xml.

        <dependency>
            <groupId>mysql</groupId>
            <artifactId>mysql-connector-java</artifactId>
            <version>5.1.49</version>
        </dependency>
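Note that SparkSession's enableHiveSupport() also needs the spark-hive module on the classpath; if the Spark dependencies set up for the project earlier do not already include it, a dependency along the following lines (an assumption about the project setup, matching Spark 2.1.1 / Scala 2.11) should work:

        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-hive_2.11</artifactId>
            <version>2.1.1</version>
        </dependency>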

Create a new Scala class and write the code.

import org.apache.spark.sql.SparkSession

object SparkHive {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("SparkHive")
      .enableHiveSupport()
      .getOrCreate()
    //      .setJars(List("/root/devz/SparkWordCount/out/artifacts/SparkWordCount_jar/SparkWordCount.jar"))
    //      .setIfMissing("spark.driver.host", "client1")
    spark.sql("select * from t_test").show()
  }
}

Run it.

Build the jar and run it with spark-submit in local mode.

root@client1:~# spark-submit --master local[2] --class SparkHive /root/devz/SparkHive/out/artifacts/SparkHive_jar/Sp
arkHive.jar
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/root/hadoop/spark-2.1.1-bin-hadoop2.7/jars/slf4j-log4j12-1.7.16.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/root/hadoop/hadoop-2.7.7/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
22/04/14 21:41:24 INFO spark.SparkContext: Running Spark version 2.1.1
22/04/14 21:41:24 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
22/04/14 21:41:24 INFO spark.SecurityManager: Changing view acls to: root
22/04/14 21:41:24 INFO spark.SecurityManager: Changing modify acls to: root
22/04/14 21:41:24 INFO spark.SecurityManager: Changing view acls groups to:
22/04/14 21:41:24 INFO spark.SecurityManager: Changing modify acls groups to:
22/04/14 21:41:24 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(root); groups with view permissions: Set(); users  with modify permissions: Set(root); groups with modify permissions: Set()
22/04/14 21:41:24 INFO util.Utils: Successfully started service 'sparkDriver' on port 44821.
22/04/14 21:41:24 INFO spark.SparkEnv: Registering MapOutputTracker
22/04/14 21:41:24 INFO spark.SparkEnv: Registering BlockManagerMaster
22/04/14 21:41:24 INFO storage.BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
22/04/14 21:41:24 INFO storage.BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
22/04/14 21:41:24 INFO storage.DiskBlockManager: Created local directory at /tmp/blockmgr-c777d8af-fca9-4d0a-bacd-ddba22f12166
22/04/14 21:41:24 INFO memory.MemoryStore: MemoryStore started with capacity 366.3 MB
22/04/14 21:41:24 INFO spark.SparkEnv: Registering OutputCommitCoordinator
22/04/14 21:41:24 INFO util.log: Logging initialized @1285ms
22/04/14 21:41:24 INFO server.Server: jetty-9.2.z-SNAPSHOT
22/04/14 21:41:25 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@29d2d081{/jobs,null,AVAILABLE,@Spark}
22/04/14 21:41:25 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@40e4ea87{/jobs/json,null,AVAILABLE,@Spark}
22/04/14 21:41:25 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@58783f6c{/jobs/job,null,AVAILABLE,@Spark}
22/04/14 21:41:25 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@3a7b503d{/jobs/job/json,null,AVAILABLE,@Spark}
22/04/14 21:41:25 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@512d92b{/stages,null,AVAILABLE,@Spark}
22/04/14 21:41:25 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@62c5bbdc{/stages/json,null,AVAILABLE,@Spark}
22/04/14 21:41:25 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@7bdf6bb7{/stages/stage,null,AVAILABLE,@Spark}
22/04/14 21:41:25 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@1bc53649{/stages/stage/json,null,AVAILABLE,@Spark}
22/04/14 21:41:25 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@88d6f9b{/stages/pool,null,AVAILABLE,@Spark}
22/04/14 21:41:25 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@47d93e0d{/stages/pool/json,null,AVAILABLE,@Spark}
22/04/14 21:41:25 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@475b7792{/storage,null,AVAILABLE,@Spark}
22/04/14 21:41:25 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@751e664e{/storage/json,null,AVAILABLE,@Spark}
22/04/14 21:41:25 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@160c3ec1{/storage/rdd,null,AVAILABLE,@Spark}
22/04/14 21:41:25 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@182b435b{/storage/rdd/json,null,AVAILABLE,@Spark}
22/04/14 21:41:25 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@4d0402b{/environment,null,AVAILABLE,@Spark}
22/04/14 21:41:25 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@2fa7ae9{/environment/json,null,AVAILABLE,@Spark}
22/04/14 21:41:25 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@7577b641{/executors,null,AVAILABLE,@Spark}
22/04/14 21:41:25 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@3704122f{/executors/json,null,AVAILABLE,@Spark}
22/04/14 21:41:25 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@3153ddfc{/executors/threadDump,null,AVAILABLE,@Spark}
22/04/14 21:41:25 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@60afd40d{/executors/threadDump/json,null,AVAILABLE,@Spark}
22/04/14 21:41:25 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@28a2a3e7{/static,null,AVAILABLE,@Spark}
22/04/14 21:41:25 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@3f2049b6{/,null,AVAILABLE,@Spark}
22/04/14 21:41:25 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@10b3df93{/api,null,AVAILABLE,@Spark}
22/04/14 21:41:25 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@ea27e34{/jobs/job/kill,null,AVAILABLE,@Spark}
22/04/14 21:41:25 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@33a2499c{/stages/stage/kill,null,AVAILABLE,@Spark}
22/04/14 21:41:25 INFO server.ServerConnector: Started Spark@3a71c100{HTTP/1.1}{0.0.0.0:4040}
22/04/14 21:41:25 INFO server.Server: Started @1385ms
22/04/14 21:41:25 INFO util.Utils: Successfully started service 'SparkUI' on port 4040.
22/04/14 21:41:25 INFO ui.SparkUI: Bound SparkUI to 0.0.0.0, and started at http://172.17.0.5:4040
22/04/14 21:41:25 INFO spark.SparkContext: Added JAR file:/root/devz/SparkHive/out/artifacts/SparkHive_jar/SparkHive.jar at spark://172.17.0.5:44821/jars/SparkHive.jar with timestamp 1649943685064
22/04/14 21:41:25 INFO executor.Executor: Starting executor ID driver on host localhost
22/04/14 21:41:25 INFO util.Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 39815.
22/04/14 21:41:25 INFO netty.NettyBlockTransferService: Server created on 172.17.0.5:39815
22/04/14 21:41:25 INFO storage.BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
22/04/14 21:41:25 INFO storage.BlockManagerMaster: Registering BlockManager BlockManagerId(driver, 172.17.0.5, 39815, None)
22/04/14 21:41:25 INFO storage.BlockManagerMasterEndpoint: Registering block manager 172.17.0.5:39815 with 366.3 MB RAM, BlockManagerId(driver, 172.17.0.5, 39815, None)
22/04/14 21:41:25 INFO storage.BlockManagerMaster: Registered BlockManager BlockManagerId(driver, 172.17.0.5, 39815, None)
22/04/14 21:41:25 INFO storage.BlockManager: Initialized BlockManager: BlockManagerId(driver, 172.17.0.5, 39815, None)
22/04/14 21:41:25 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@3703bf3c{/metrics/json,null,AVAILABLE,@Spark}
22/04/14 21:41:25 INFO internal.SharedState: spark.sql.warehouse.dir is not set, but hive.metastore.warehouse.dir is set. Setting spark.sql.warehouse.dir to the value of hive.metastore.warehouse.dir ('/root/hadoop/apache-hive-2.3.4-bin/warehouse').
22/04/14 21:41:25 INFO internal.SharedState: Warehouse path is '/root/hadoop/apache-hive-2.3.4-bin/warehouse'.
22/04/14 21:41:25 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@6edaa77a{/SQL,null,AVAILABLE,@Spark}
22/04/14 21:41:25 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@62ddd21b{/SQL/json,null,AVAILABLE,@Spark}
22/04/14 21:41:25 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@29314cc9{/SQL/execution,null,AVAILABLE,@Spark}
22/04/14 21:41:25 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@35f8a9d3{/SQL/execution/json,null,AVAILABLE,@Spark}
22/04/14 21:41:25 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@1280851e{/static/sql,null,AVAILABLE,@Spark}
22/04/14 21:41:25 INFO hive.HiveUtils: Initializing HiveMetastoreConnection version 1.2.1 using Spark classes.
22/04/14 21:41:25 INFO metastore.HiveMetaStore: 0: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
22/04/14 21:41:25 INFO metastore.ObjectStore: ObjectStore, initialize called
22/04/14 21:41:25 INFO DataNucleus.Persistence: Property hive.metastore.integral.jdo.pushdown unknown - will be ignored
22/04/14 21:41:25 INFO DataNucleus.Persistence: Property datanucleus.cache.level2 unknown - will be ignored
Thu Apr 14 21:41:26 CST 2022 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
Thu Apr 14 21:41:26 CST 2022 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
Thu Apr 14 21:41:26 CST 2022 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
Thu Apr 14 21:41:26 CST 2022 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
22/04/14 21:41:26 INFO metastore.ObjectStore: Setting MetaStore object pin classes with hive.metastore.cache.pinobjtypes="Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order"
Thu Apr 14 21:41:26 CST 2022 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
Thu Apr 14 21:41:26 CST 2022 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
Thu Apr 14 21:41:26 CST 2022 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
Thu Apr 14 21:41:26 CST 2022 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
22/04/14 21:41:27 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
22/04/14 21:41:27 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
22/04/14 21:41:27 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
22/04/14 21:41:27 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
22/04/14 21:41:27 INFO DataNucleus.Query: Reading in results for query "org.datanucleus.store.rdbms.query.SQLQuery@0" since the connection used is closing
22/04/14 21:41:27 INFO metastore.MetaStoreDirectSql: Using direct SQL, underlying DB is MYSQL
22/04/14 21:41:27 INFO metastore.ObjectStore: Initialized ObjectStore
22/04/14 21:41:27 INFO metastore.HiveMetaStore: Added admin role in metastore
22/04/14 21:41:27 INFO metastore.HiveMetaStore: Added public role in metastore
22/04/14 21:41:27 INFO metastore.HiveMetaStore: No user is added in admin role, since config is empty
22/04/14 21:41:27 INFO metastore.HiveMetaStore: 0: get_all_databases
22/04/14 21:41:27 INFO HiveMetaStore.audit: ugi=root    ip=unknown-ip-addr      cmd=get_all_databases
22/04/14 21:41:27 INFO metastore.HiveMetaStore: 0: get_functions: db=default pat=*
22/04/14 21:41:27 INFO HiveMetaStore.audit: ugi=root    ip=unknown-ip-addr      cmd=get_functions: db=default pat=*
22/04/14 21:41:27 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MResourceUri" is tagged as "embedded-only" so does not have its own datastore table.
22/04/14 21:41:28 INFO session.SessionState: Created local directory: /tmp/7f4c3ae7-8046-41fe-810f-ba929632c72f_resources
22/04/14 21:41:28 INFO session.SessionState: Created HDFS directory: /tmp/hive/root/7f4c3ae7-8046-41fe-810f-ba929632c72f
22/04/14 21:41:28 INFO session.SessionState: Created local directory: /tmp/root/7f4c3ae7-8046-41fe-810f-ba929632c72f
22/04/14 21:41:28 INFO session.SessionState: Created HDFS directory: /tmp/hive/root/7f4c3ae7-8046-41fe-810f-ba929632c72f/_tmp_space.db
22/04/14 21:41:28 INFO client.HiveClientImpl: Warehouse location for Hive client (version 1.2.1) is /root/hadoop/apache-hive-2.3.4-bin/warehouse
22/04/14 21:41:28 INFO metastore.HiveMetaStore: 0: get_database: default
22/04/14 21:41:28 INFO HiveMetaStore.audit: ugi=root    ip=unknown-ip-addr      cmd=get_database: default
22/04/14 21:41:28 INFO metastore.HiveMetaStore: 0: get_database: global_temp
22/04/14 21:41:28 INFO HiveMetaStore.audit: ugi=root    ip=unknown-ip-addr      cmd=get_database: global_temp
22/04/14 21:41:28 WARN metastore.ObjectStore: Failed to get database global_temp, returning NoSuchObjectException
22/04/14 21:41:28 INFO execution.SparkSqlParser: Parsing command: select * from t_test
22/04/14 21:41:28 INFO metastore.HiveMetaStore: 0: get_table : db=default tbl=t_test
22/04/14 21:41:28 INFO HiveMetaStore.audit: ugi=root    ip=unknown-ip-addr      cmd=get_table : db=default tbl=t_test
22/04/14 21:41:28 INFO parser.CatalystSqlParser: Parsing command: int
22/04/14 21:41:28 INFO parser.CatalystSqlParser: Parsing command: string
22/04/14 21:41:29 INFO memory.MemoryStore: Block broadcast_0 stored as values in memory (estimated size 296.5 KB, free 366.0 MB)
22/04/14 21:41:29 INFO memory.MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 24.2 KB, free 366.0 MB)
22/04/14 21:41:29 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on 172.17.0.5:39815 (size: 24.2 KB, free: 366.3 MB)
22/04/14 21:41:29 INFO spark.SparkContext: Created broadcast 0 from show at SparkHive.scala:13
22/04/14 21:41:30 INFO mapred.FileInputFormat: Total input paths to process : 3
22/04/14 21:41:30 INFO spark.SparkContext: Starting job: show at SparkHive.scala:13
22/04/14 21:41:30 INFO scheduler.DAGScheduler: Got job 0 (show at SparkHive.scala:13) with 1 output partitions
22/04/14 21:41:30 INFO scheduler.DAGScheduler: Final stage: ResultStage 0 (show at SparkHive.scala:13)
22/04/14 21:41:30 INFO scheduler.DAGScheduler: Parents of final stage: List()
22/04/14 21:41:30 INFO scheduler.DAGScheduler: Missing parents: List()
22/04/14 21:41:30 INFO scheduler.DAGScheduler: Submitting ResultStage 0 (MapPartitionsRDD[4] at show at SparkHive.scala:13), which has no missing parents
22/04/14 21:41:30 INFO memory.MemoryStore: Block broadcast_1 stored as values in memory (estimated size 8.0 KB, free 366.0 MB)
22/04/14 21:41:30 INFO memory.MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 4.4 KB, free 366.0 MB)
22/04/14 21:41:30 INFO storage.BlockManagerInfo: Added broadcast_1_piece0 in memory on 172.17.0.5:39815 (size: 4.4 KB, free: 366.3 MB)
22/04/14 21:41:30 INFO spark.SparkContext: Created broadcast 1 from broadcast at DAGScheduler.scala:996
22/04/14 21:41:30 INFO scheduler.DAGScheduler: Submitting 1 missing tasks from ResultStage 0 (MapPartitionsRDD[4] at show at SparkHive.scala:13)
22/04/14 21:41:30 INFO scheduler.TaskSchedulerImpl: Adding task set 0.0 with 1 tasks
22/04/14 21:41:30 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, localhost, executor driver, partition 0, ANY, 6024 bytes)
22/04/14 21:41:30 INFO executor.Executor: Running task 0.0 in stage 0.0 (TID 0)
22/04/14 21:41:30 INFO executor.Executor: Fetching spark://172.17.0.5:44821/jars/SparkHive.jar with timestamp 1649943685064
22/04/14 21:41:30 INFO client.TransportClientFactory: Successfully created connection to /172.17.0.5:44821 after 23 ms (0 ms spent in bootstraps)
22/04/14 21:41:30 INFO util.Utils: Fetching spark://172.17.0.5:44821/jars/SparkHive.jar to /tmp/spark-ac4c49c2-4e3a-4e49-a6d9-b17d5204c285/userFiles-6769ba50-206f-4843-b6ff-b499091200e2/fetchFileTemp808959395596279395.tmp
22/04/14 21:41:30 INFO executor.Executor: Adding file:/tmp/spark-ac4c49c2-4e3a-4e49-a6d9-b17d5204c285/userFiles-6769ba50-206f-4843-b6ff-b499091200e2/SparkHive.jar to class loader
22/04/14 21:41:30 INFO rdd.HadoopRDD: Input split: hdfs://master:9000/root/hadoop/apache-hive-2.3.4-bin/warehouse/t_test/000000_0:0+6
22/04/14 21:41:30 INFO Configuration.deprecation: mapred.tip.id is deprecated. Instead, use mapreduce.task.id
22/04/14 21:41:30 INFO Configuration.deprecation: mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id
22/04/14 21:41:30 INFO Configuration.deprecation: mapred.task.is.map is deprecated. Instead, use mapreduce.task.ismap
22/04/14 21:41:30 INFO Configuration.deprecation: mapred.task.partition is deprecated. Instead, use mapreduce.task.partition
22/04/14 21:41:30 INFO Configuration.deprecation: mapred.job.id is deprecated. Instead, use mapreduce.job.id
22/04/14 21:41:30 INFO codegen.CodeGenerator: Code generated in 150.8142 ms
22/04/14 21:41:30 INFO executor.Executor: Finished task 0.0 in stage 0.0 (TID 0). 1356 bytes result sent to driver
22/04/14 21:41:30 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 517 ms on localhost (executor driver) (1/1)
22/04/14 21:41:30 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
22/04/14 21:41:30 INFO scheduler.DAGScheduler: ResultStage 0 (show at SparkHive.scala:13) finished in 0.532 s
22/04/14 21:41:30 INFO scheduler.DAGScheduler: Job 0 finished: show at SparkHive.scala:13, took 0.624659 s
22/04/14 21:41:30 INFO spark.SparkContext: Starting job: show at SparkHive.scala:13
22/04/14 21:41:30 INFO scheduler.DAGScheduler: Got job 1 (show at SparkHive.scala:13) with 2 output partitions
22/04/14 21:41:30 INFO scheduler.DAGScheduler: Final stage: ResultStage 1 (show at SparkHive.scala:13)
22/04/14 21:41:30 INFO scheduler.DAGScheduler: Parents of final stage: List()
22/04/14 21:41:30 INFO scheduler.DAGScheduler: Missing parents: List()
22/04/14 21:41:30 INFO scheduler.DAGScheduler: Submitting ResultStage 1 (MapPartitionsRDD[4] at show at SparkHive.scala:13), which has no missing parents
22/04/14 21:41:30 INFO memory.MemoryStore: Block broadcast_2 stored as values in memory (estimated size 8.0 KB, free 366.0 MB)
22/04/14 21:41:30 INFO memory.MemoryStore: Block broadcast_2_piece0 stored as bytes in memory (estimated size 4.4 KB, free 366.0 MB)
22/04/14 21:41:30 INFO storage.BlockManagerInfo: Added broadcast_2_piece0 in memory on 172.17.0.5:39815 (size: 4.4 KB, free: 366.3 MB)
22/04/14 21:41:30 INFO spark.SparkContext: Created broadcast 2 from broadcast at DAGScheduler.scala:996
22/04/14 21:41:30 INFO scheduler.DAGScheduler: Submitting 2 missing tasks from ResultStage 1 (MapPartitionsRDD[4] at show at SparkHive.scala:13)
22/04/14 21:41:30 INFO scheduler.TaskSchedulerImpl: Adding task set 1.0 with 2 tasks
22/04/14 21:41:30 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 1.0 (TID 1, localhost, executor driver, partition 1, ANY, 6031 bytes)
22/04/14 21:41:30 INFO scheduler.TaskSetManager: Starting task 1.0 in stage 1.0 (TID 2, localhost, executor driver, partition 2, ANY, 6031 bytes)
22/04/14 21:41:30 INFO executor.Executor: Running task 0.0 in stage 1.0 (TID 1)
22/04/14 21:41:30 INFO executor.Executor: Running task 1.0 in stage 1.0 (TID 2)
22/04/14 21:41:30 INFO rdd.HadoopRDD: Input split: hdfs://master:9000/root/hadoop/apache-hive-2.3.4-bin/warehouse/t_test/000000_0_copy_1:0+6
22/04/14 21:41:30 INFO rdd.HadoopRDD: Input split: hdfs://master:9000/root/hadoop/apache-hive-2.3.4-bin/warehouse/t_test/000000_0_copy_2:0+8
22/04/14 21:41:30 INFO executor.Executor: Finished task 0.0 in stage 1.0 (TID 1). 1356 bytes result sent to driver
22/04/14 21:41:30 INFO executor.Executor: Finished task 1.0 in stage 1.0 (TID 2). 1356 bytes result sent to driver
22/04/14 21:41:30 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 1.0 (TID 1) in 19 ms on localhost (executor driver) (1/2)
22/04/14 21:41:30 INFO scheduler.TaskSetManager: Finished task 1.0 in stage 1.0 (TID 2) in 16 ms on localhost (executor driver) (2/2)
22/04/14 21:41:30 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 1.0, whose tasks have all completed, from pool
22/04/14 21:41:30 INFO scheduler.DAGScheduler: ResultStage 1 (show at SparkHive.scala:13) finished in 0.020 s
22/04/14 21:41:30 INFO scheduler.DAGScheduler: Job 1 finished: show at SparkHive.scala:13, took 0.031408 s
22/04/14 21:41:30 INFO codegen.CodeGenerator: Code generated in 15.4031 ms
+---+-----+
| id| name|
+---+-----+
|  1|  One|
|  2|  Two|
|  3|Three|
+---+-----+
22/04/14 21:41:30 INFO spark.SparkContext: Invoking stop() from shutdown hook
22/04/14 21:41:30 INFO server.ServerConnector: Stopped Spark@3a71c100{HTTP/1.1}{0.0.0.0:4040}
22/04/14 21:41:30 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@33a2499c{/stages/stage/kill,null,UNAVAILABLE,@Spark}
22/04/14 21:41:30 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@ea27e34{/jobs/job/kill,null,UNAVAILABLE,@Spark}
22/04/14 21:41:30 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@10b3df93{/api,null,UNAVAILABLE,@Spark}
22/04/14 21:41:30 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@3f2049b6{/,null,UNAVAILABLE,@Spark}
22/04/14 21:41:30 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@28a2a3e7{/static,null,UNAVAILABLE,@Spark}
22/04/14 21:41:30 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@60afd40d{/executors/threadDump/json,null,UNAVAILABLE,@Spark}
22/04/14 21:41:30 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@3153ddfc{/executors/threadDump,null,UNAVAILABLE,@Spark}
22/04/14 21:41:30 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@3704122f{/executors/json,null,UNAVAILABLE,@Spark}
22/04/14 21:41:30 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@7577b641{/executors,null,UNAVAILABLE,@Spark}
22/04/14 21:41:30 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@2fa7ae9{/environment/json,null,UNAVAILABLE,@Spark}
22/04/14 21:41:30 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@4d0402b{/environment,null,UNAVAILABLE,@Spark}
22/04/14 21:41:30 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@182b435b{/storage/rdd/json,null,UNAVAILABLE,@Spark}
22/04/14 21:41:30 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@160c3ec1{/storage/rdd,null,UNAVAILABLE,@Spark}
22/04/14 21:41:30 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@751e664e{/storage/json,null,UNAVAILABLE,@Spark}
22/04/14 21:41:30 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@475b7792{/storage,null,UNAVAILABLE,@Spark}
22/04/14 21:41:30 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@47d93e0d{/stages/pool/json,null,UNAVAILABLE,@Spark}
22/04/14 21:41:30 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@88d6f9b{/stages/pool,null,UNAVAILABLE,@Spark}
22/04/14 21:41:30 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@1bc53649{/stages/stage/json,null,UNAVAILABLE,@Spark}
22/04/14 21:41:30 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@7bdf6bb7{/stages/stage,null,UNAVAILABLE,@Spark}
22/04/14 21:41:30 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@62c5bbdc{/stages/json,null,UNAVAILABLE,@Spark}
22/04/14 21:41:30 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@512d92b{/stages,null,UNAVAILABLE,@Spark}
22/04/14 21:41:30 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@3a7b503d{/jobs/job/json,null,UNAVAILABLE,@Spark}
22/04/14 21:41:30 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@58783f6c{/jobs/job,null,UNAVAILABLE,@Spark}
22/04/14 21:41:30 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@40e4ea87{/jobs/json,null,UNAVAILABLE,@Spark}
22/04/14 21:41:30 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@29d2d081{/jobs,null,UNAVAILABLE,@Spark}
22/04/14 21:41:30 INFO ui.SparkUI: Stopped Spark web UI at http://172.17.0.5:4040
22/04/14 21:41:30 INFO spark.MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
22/04/14 21:41:30 INFO memory.MemoryStore: MemoryStore cleared
22/04/14 21:41:30 INFO storage.BlockManager: BlockManager stopped
22/04/14 21:41:30 INFO storage.BlockManagerMaster: BlockManagerMaster stopped
22/04/14 21:41:30 INFO scheduler.OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
22/04/14 21:41:30 INFO spark.SparkContext: Successfully stopped SparkContext
22/04/14 21:41:30 INFO util.ShutdownHookManager: Shutdown hook called
22/04/14 21:41:30 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-ac4c49c2-4e3a-4e49-a6d9-b17d5204c285
root@client1:~#

Standalone mode: submit the same job to the Spark standalone master at spark://master:7077.
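The standalone master and workers must already be running before the submission below will succeed. A minimal sketch for bringing them up, assuming SPARK_HOME points to /root/hadoop/spark-2.1.1-bin-hadoop2.7 on every node (these start commands are not shown elsewhere in this walkthrough):

# On the master container: start the standalone master (listens on spark://master:7077)
$SPARK_HOME/sbin/start-master.sh
# On each worker container (node02, node03): register a worker with that master
$SPARK_HOME/sbin/start-slave.sh spark://master:7077

The standalone master's web UI (port 8080 by default) should then list both workers; after that the job can be submitted as follows.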

root@client1:~# spark-submit --master spark://master:7077 --class SparkHive /root/devz/SparkHive/out/artifacts/SparkHive_jar/SparkHive.jar
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/root/hadoop/spark-2.1.1-bin-hadoop2.7/jars/slf4j-log4j12-1.7.16.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/root/hadoop/hadoop-2.7.7/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
22/04/14 21:46:10 INFO spark.SparkContext: Running Spark version 2.1.1
22/04/14 21:46:10 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
22/04/14 21:46:10 INFO spark.SecurityManager: Changing view acls to: root
22/04/14 21:46:10 INFO spark.SecurityManager: Changing modify acls to: root
22/04/14 21:46:10 INFO spark.SecurityManager: Changing view acls groups to:
22/04/14 21:46:10 INFO spark.SecurityManager: Changing modify acls groups to:
22/04/14 21:46:10 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(root); groups with view permissions: Set(); users  with modify permissions: Set(root); groups with modify permissions: Set()
22/04/14 21:46:10 INFO util.Utils: Successfully started service 'sparkDriver' on port 34059.
22/04/14 21:46:10 INFO spark.SparkEnv: Registering MapOutputTracker
22/04/14 21:46:10 INFO spark.SparkEnv: Registering BlockManagerMaster
22/04/14 21:46:10 INFO storage.BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
22/04/14 21:46:10 INFO storage.BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
22/04/14 21:46:10 INFO storage.DiskBlockManager: Created local directory at /tmp/blockmgr-afed9ed6-e636-4322-9fa1-722140a5e637
22/04/14 21:46:10 INFO memory.MemoryStore: MemoryStore started with capacity 366.3 MB
22/04/14 21:46:10 INFO spark.SparkEnv: Registering OutputCommitCoordinator
22/04/14 21:46:10 INFO util.log: Logging initialized @1286ms
22/04/14 21:46:10 INFO server.Server: jetty-9.2.z-SNAPSHOT
22/04/14 21:46:10 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@29d2d081{/jobs,null,AVAILABLE,@Spark}
22/04/14 21:46:10 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@40e4ea87{/jobs/json,null,AVAILABLE,@Spark}
22/04/14 21:46:10 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@58783f6c{/jobs/job,null,AVAILABLE,@Spark}
22/04/14 21:46:10 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@3a7b503d{/jobs/job/json,null,AVAILABLE,@Spark}
22/04/14 21:46:10 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@512d92b{/stages,null,AVAILABLE,@Spark}
22/04/14 21:46:10 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@62c5bbdc{/stages/json,null,AVAILABLE,@Spark}
22/04/14 21:46:10 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@7bdf6bb7{/stages/stage,null,AVAILABLE,@Spark}
22/04/14 21:46:10 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@1bc53649{/stages/stage/json,null,AVAILABLE,@Spark}
22/04/14 21:46:10 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@88d6f9b{/stages/pool,null,AVAILABLE,@Spark}
22/04/14 21:46:10 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@47d93e0d{/stages/pool/json,null,AVAILABLE,@Spark}
22/04/14 21:46:10 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@475b7792{/storage,null,AVAILABLE,@Spark}
22/04/14 21:46:10 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@751e664e{/storage/json,null,AVAILABLE,@Spark}
22/04/14 21:46:10 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@160c3ec1{/storage/rdd,null,AVAILABLE,@Spark}
22/04/14 21:46:10 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@182b435b{/storage/rdd/json,null,AVAILABLE,@Spark}
22/04/14 21:46:10 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@4d0402b{/environment,null,AVAILABLE,@Spark}
22/04/14 21:46:10 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@2fa7ae9{/environment/json,null,AVAILABLE,@Spark}
22/04/14 21:46:10 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@7577b641{/executors,null,AVAILABLE,@Spark}
22/04/14 21:46:10 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@3704122f{/executors/json,null,AVAILABLE,@Spark}
22/04/14 21:46:10 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@3153ddfc{/executors/threadDump,null,AVAILABLE,@Spark}
22/04/14 21:46:10 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@60afd40d{/executors/threadDump/json,null,AVAILABLE,@Spark}
22/04/14 21:46:10 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@28a2a3e7{/static,null,AVAILABLE,@Spark}
22/04/14 21:46:10 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@3f2049b6{/,null,AVAILABLE,@Spark}
22/04/14 21:46:10 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@10b3df93{/api,null,AVAILABLE,@Spark}
22/04/14 21:46:10 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@ea27e34{/jobs/job/kill,null,AVAILABLE,@Spark}
22/04/14 21:46:10 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@33a2499c{/stages/stage/kill,null,AVAILABLE,@Spark}
22/04/14 21:46:10 INFO server.ServerConnector: Started Spark@3a71c100{HTTP/1.1}{0.0.0.0:4040}
22/04/14 21:46:10 INFO server.Server: Started @1380ms
22/04/14 21:46:10 INFO util.Utils: Successfully started service 'SparkUI' on port 4040.
22/04/14 21:46:10 INFO ui.SparkUI: Bound SparkUI to 0.0.0.0, and started at http://172.17.0.5:4040
22/04/14 21:46:10 INFO spark.SparkContext: Added JAR file:/root/devz/SparkHive/out/artifacts/SparkHive_jar/SparkHive.jar at spark://172.17.0.5:34059/jars/SparkHive.jar with timestamp 1649943970932
22/04/14 21:46:10 INFO executor.Executor: Starting executor ID driver on host localhost
22/04/14 21:46:10 INFO util.Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 38273.
22/04/14 21:46:10 INFO netty.NettyBlockTransferService: Server created on 172.17.0.5:38273
22/04/14 21:46:10 INFO storage.BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
22/04/14 21:46:10 INFO storage.BlockManagerMaster: Registering BlockManager BlockManagerId(driver, 172.17.0.5, 38273, None)
22/04/14 21:46:10 INFO storage.BlockManagerMasterEndpoint: Registering block manager 172.17.0.5:38273 with 366.3 MB RAM, BlockManagerId(driver, 172.17.0.5, 38273, None)
22/04/14 21:46:11 INFO storage.BlockManagerMaster: Registered BlockManager BlockManagerId(driver, 172.17.0.5, 38273, None)
22/04/14 21:46:11 INFO storage.BlockManager: Initialized BlockManager: BlockManagerId(driver, 172.17.0.5, 38273, None)
22/04/14 21:46:11 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@3703bf3c{/metrics/json,null,AVAILABLE,@Spark}
22/04/14 21:46:11 INFO internal.SharedState: spark.sql.warehouse.dir is not set, but hive.metastore.warehouse.dir is set. Setting spark.sql.warehouse.dir to the value of hive.metastore.warehouse.dir ('/root/hadoop/apache-hive-2.3.4-bin/warehouse').
22/04/14 21:46:11 INFO internal.SharedState: Warehouse path is '/root/hadoop/apache-hive-2.3.4-bin/warehouse'.
22/04/14 21:46:11 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@6edaa77a{/SQL,null,AVAILABLE,@Spark}
22/04/14 21:46:11 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@62ddd21b{/SQL/json,null,AVAILABLE,@Spark}
22/04/14 21:46:11 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@29314cc9{/SQL/execution,null,AVAILABLE,@Spark}
22/04/14 21:46:11 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@35f8a9d3{/SQL/execution/json,null,AVAILABLE,@Spark}
22/04/14 21:46:11 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@1280851e{/static/sql,null,AVAILABLE,@Spark}
22/04/14 21:46:11 INFO hive.HiveUtils: Initializing HiveMetastoreConnection version 1.2.1 using Spark classes.
22/04/14 21:46:11 INFO metastore.HiveMetaStore: 0: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
22/04/14 21:46:11 INFO metastore.ObjectStore: ObjectStore, initialize called
22/04/14 21:46:11 INFO DataNucleus.Persistence: Property hive.metastore.integral.jdo.pushdown unknown - will be ignored
22/04/14 21:46:11 INFO DataNucleus.Persistence: Property datanucleus.cache.level2 unknown - will be ignored
Thu Apr 14 21:46:12 CST 2022 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
Thu Apr 14 21:46:12 CST 2022 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
Thu Apr 14 21:46:12 CST 2022 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
Thu Apr 14 21:46:12 CST 2022 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
22/04/14 21:46:12 INFO metastore.ObjectStore: Setting MetaStore object pin classes with hive.metastore.cache.pinobjtypes="Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order"
Thu Apr 14 21:46:12 CST 2022 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
Thu Apr 14 21:46:12 CST 2022 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
Thu Apr 14 21:46:12 CST 2022 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
Thu Apr 14 21:46:12 CST 2022 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
22/04/14 21:46:13 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
22/04/14 21:46:13 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
22/04/14 21:46:13 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
22/04/14 21:46:13 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
22/04/14 21:46:13 INFO DataNucleus.Query: Reading in results for query "org.datanucleus.store.rdbms.query.SQLQuery@0" since the connection used is closing
22/04/14 21:46:13 INFO metastore.MetaStoreDirectSql: Using direct SQL, underlying DB is MYSQL
22/04/14 21:46:13 INFO metastore.ObjectStore: Initialized ObjectStore
22/04/14 21:46:13 INFO metastore.HiveMetaStore: Added admin role in metastore
22/04/14 21:46:13 INFO metastore.HiveMetaStore: Added public role in metastore
22/04/14 21:46:13 INFO metastore.HiveMetaStore: No user is added in admin role, since config is empty
22/04/14 21:46:13 INFO metastore.HiveMetaStore: 0: get_all_databases
22/04/14 21:46:13 INFO HiveMetaStore.audit: ugi=root    ip=unknown-ip-addr      cmd=get_all_databases
22/04/14 21:46:13 INFO metastore.HiveMetaStore: 0: get_functions: db=default pat=*
22/04/14 21:46:13 INFO HiveMetaStore.audit: ugi=root    ip=unknown-ip-addr      cmd=get_functions: db=default pat=*
22/04/14 21:46:13 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MResourceUri" is tagged as "embedded-only" so does not have its own datastore table.
22/04/14 21:46:14 INFO session.SessionState: Created local directory: /tmp/359847ab-dbaf-496b-bcd7-79455f9aa46b_resources
22/04/14 21:46:14 INFO session.SessionState: Created HDFS directory: /tmp/hive/root/359847ab-dbaf-496b-bcd7-79455f9aa46b
22/04/14 21:46:14 INFO session.SessionState: Created local directory: /tmp/root/359847ab-dbaf-496b-bcd7-79455f9aa46b
22/04/14 21:46:14 INFO session.SessionState: Created HDFS directory: /tmp/hive/root/359847ab-dbaf-496b-bcd7-79455f9aa46b/_tmp_space.db
22/04/14 21:46:14 INFO client.HiveClientImpl: Warehouse location for Hive client (version 1.2.1) is /root/hadoop/apache-hive-2.3.4-bin/warehouse
22/04/14 21:46:14 INFO metastore.HiveMetaStore: 0: get_database: default
22/04/14 21:46:14 INFO HiveMetaStore.audit: ugi=root    ip=unknown-ip-addr      cmd=get_database: default
22/04/14 21:46:14 INFO metastore.HiveMetaStore: 0: get_database: global_temp
22/04/14 21:46:14 INFO HiveMetaStore.audit: ugi=root    ip=unknown-ip-addr      cmd=get_database: global_temp
22/04/14 21:46:14 WARN metastore.ObjectStore: Failed to get database global_temp, returning NoSuchObjectException
22/04/14 21:46:14 INFO execution.SparkSqlParser: Parsing command: select * from t_test
22/04/14 21:46:14 INFO metastore.HiveMetaStore: 0: get_table : db=default tbl=t_test
22/04/14 21:46:14 INFO HiveMetaStore.audit: ugi=root    ip=unknown-ip-addr      cmd=get_table : db=default tbl=t_test
22/04/14 21:46:14 INFO parser.CatalystSqlParser: Parsing command: int
22/04/14 21:46:14 INFO parser.CatalystSqlParser: Parsing command: string
22/04/14 21:46:15 INFO memory.MemoryStore: Block broadcast_0 stored as values in memory (estimated size 296.5 KB, free 366.0 MB)
22/04/14 21:46:15 INFO memory.MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 24.2 KB, free 366.0 MB)
22/04/14 21:46:15 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on 172.17.0.5:38273 (size: 24.2 KB, free: 366.3 MB)
22/04/14 21:46:15 INFO spark.SparkContext: Created broadcast 0 from show at SparkHive.scala:13
22/04/14 21:46:16 INFO mapred.FileInputFormat: Total input paths to process : 3
22/04/14 21:46:16 INFO spark.SparkContext: Starting job: show at SparkHive.scala:13
22/04/14 21:46:16 INFO scheduler.DAGScheduler: Got job 0 (show at SparkHive.scala:13) with 1 output partitions
22/04/14 21:46:16 INFO scheduler.DAGScheduler: Final stage: ResultStage 0 (show at SparkHive.scala:13)
22/04/14 21:46:16 INFO scheduler.DAGScheduler: Parents of final stage: List()
22/04/14 21:46:16 INFO scheduler.DAGScheduler: Missing parents: List()
22/04/14 21:46:16 INFO scheduler.DAGScheduler: Submitting ResultStage 0 (MapPartitionsRDD[4] at show at SparkHive.scala:13), which has no missing parents
22/04/14 21:46:16 INFO memory.MemoryStore: Block broadcast_1 stored as values in memory (estimated size 8.0 KB, free 366.0 MB)
22/04/14 21:46:16 INFO memory.MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 4.4 KB, free 366.0 MB)
22/04/14 21:46:16 INFO storage.BlockManagerInfo: Added broadcast_1_piece0 in memory on 172.17.0.5:38273 (size: 4.4 KB, free: 366.3 MB)
22/04/14 21:46:16 INFO spark.SparkContext: Created broadcast 1 from broadcast at DAGScheduler.scala:996
22/04/14 21:46:16 INFO scheduler.DAGScheduler: Submitting 1 missing tasks from ResultStage 0 (MapPartitionsRDD[4] at show at SparkHive.scala:13)
22/04/14 21:46:16 INFO scheduler.TaskSchedulerImpl: Adding task set 0.0 with 1 tasks
22/04/14 21:46:16 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, localhost, executor driver, partition 0, ANY, 6024 bytes)
22/04/14 21:46:16 INFO executor.Executor: Running task 0.0 in stage 0.0 (TID 0)
22/04/14 21:46:16 INFO executor.Executor: Fetching spark://172.17.0.5:34059/jars/SparkHive.jar with timestamp 1649943970932
22/04/14 21:46:16 INFO client.TransportClientFactory: Successfully created connection to /172.17.0.5:34059 after 28 ms (0 ms spent in bootstraps)
22/04/14 21:46:16 INFO util.Utils: Fetching spark://172.17.0.5:34059/jars/SparkHive.jar to /tmp/spark-00fdf7fa-f26c-4edb-8e87-0511ae0fcbb9/userFiles-32198d63-b555-4295-8f23-0c48ce6251a0/fetchFileTemp6081690347703811084.tmp
22/04/14 21:46:16 INFO executor.Executor: Adding file:/tmp/spark-00fdf7fa-f26c-4edb-8e87-0511ae0fcbb9/userFiles-32198d63-b555-4295-8f23-0c48ce6251a0/SparkHive.jar to class loader
22/04/14 21:46:16 INFO rdd.HadoopRDD: Input split: hdfs://master:9000/root/hadoop/apache-hive-2.3.4-bin/warehouse/t_test/000000_0:0+6
22/04/14 21:46:16 INFO Configuration.deprecation: mapred.tip.id is deprecated. Instead, use mapreduce.task.id
22/04/14 21:46:16 INFO Configuration.deprecation: mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id
22/04/14 21:46:16 INFO Configuration.deprecation: mapred.task.is.map is deprecated. Instead, use mapreduce.task.ismap
22/04/14 21:46:16 INFO Configuration.deprecation: mapred.task.partition is deprecated. Instead, use mapreduce.task.partition
22/04/14 21:46:16 INFO Configuration.deprecation: mapred.job.id is deprecated. Instead, use mapreduce.job.id
22/04/14 21:46:16 INFO codegen.CodeGenerator: Code generated in 141.8799 ms
22/04/14 21:46:16 INFO executor.Executor: Finished task 0.0 in stage 0.0 (TID 0). 1356 bytes result sent to driver
22/04/14 21:46:16 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 525 ms on localhost (executor driver) (1/1)
22/04/14 21:46:16 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
22/04/14 21:46:16 INFO scheduler.DAGScheduler: ResultStage 0 (show at SparkHive.scala:13) finished in 0.543 s
22/04/14 21:46:16 INFO scheduler.DAGScheduler: Job 0 finished: show at SparkHive.scala:13, took 0.640807 s
22/04/14 21:46:16 INFO spark.SparkContext: Starting job: show at SparkHive.scala:13
22/04/14 21:46:16 INFO scheduler.DAGScheduler: Got job 1 (show at SparkHive.scala:13) with 2 output partitions
22/04/14 21:46:16 INFO scheduler.DAGScheduler: Final stage: ResultStage 1 (show at SparkHive.scala:13)
22/04/14 21:46:16 INFO scheduler.DAGScheduler: Parents of final stage: List()
22/04/14 21:46:16 INFO scheduler.DAGScheduler: Missing parents: List()
22/04/14 21:46:16 INFO scheduler.DAGScheduler: Submitting ResultStage 1 (MapPartitionsRDD[4] at show at SparkHive.scala:13), which has no missing parents
22/04/14 21:46:16 INFO memory.MemoryStore: Block broadcast_2 stored as values in memory (estimated size 8.0 KB, free 366.0 MB)
22/04/14 21:46:16 INFO memory.MemoryStore: Block broadcast_2_piece0 stored as bytes in memory (estimated size 4.4 KB, free 366.0 MB)
22/04/14 21:46:16 INFO storage.BlockManagerInfo: Added broadcast_2_piece0 in memory on 172.17.0.5:38273 (size: 4.4 KB, free: 366.3 MB)
22/04/14 21:46:16 INFO spark.SparkContext: Created broadcast 2 from broadcast at DAGScheduler.scala:996
22/04/14 21:46:16 INFO scheduler.DAGScheduler: Submitting 2 missing tasks from ResultStage 1 (MapPartitionsRDD[4] at show at SparkHive.scala:13)
22/04/14 21:46:16 INFO scheduler.TaskSchedulerImpl: Adding task set 1.0 with 2 tasks
22/04/14 21:46:16 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 1.0 (TID 1, localhost, executor driver, partition 1, ANY, 6031 bytes)
22/04/14 21:46:16 INFO scheduler.TaskSetManager: Starting task 1.0 in stage 1.0 (TID 2, localhost, executor driver, partition 2, ANY, 6031 bytes)
22/04/14 21:46:16 INFO executor.Executor: Running task 0.0 in stage 1.0 (TID 1)
22/04/14 21:46:16 INFO executor.Executor: Running task 1.0 in stage 1.0 (TID 2)
22/04/14 21:46:16 INFO rdd.HadoopRDD: Input split: hdfs://master:9000/root/hadoop/apache-hive-2.3.4-bin/warehouse/t_test/000000_0_copy_2:0+8
22/04/14 21:46:16 INFO rdd.HadoopRDD: Input split: hdfs://master:9000/root/hadoop/apache-hive-2.3.4-bin/warehouse/t_test/000000_0_copy_1:0+6
22/04/14 21:46:16 INFO executor.Executor: Finished task 1.0 in stage 1.0 (TID 2). 1356 bytes result sent to driver
22/04/14 21:46:16 INFO executor.Executor: Finished task 0.0 in stage 1.0 (TID 1). 1356 bytes result sent to driver
22/04/14 21:46:16 INFO scheduler.TaskSetManager: Finished task 1.0 in stage 1.0 (TID 2) in 21 ms on localhost (executor driver) (1/2)
22/04/14 21:46:16 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 1.0 (TID 1) in 25 ms on localhost (executor driver) (2/2)
22/04/14 21:46:16 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 1.0, whose tasks have all completed, from pool
22/04/14 21:46:16 INFO scheduler.DAGScheduler: ResultStage 1 (show at SparkHive.scala:13) finished in 0.026 s
22/04/14 21:46:16 INFO scheduler.DAGScheduler: Job 1 finished: show at SparkHive.scala:13, took 0.042148 s
22/04/14 21:46:16 INFO codegen.CodeGenerator: Code generated in 16.8351 ms
+---+-----+
| id| name|
+---+-----+
|  1|  One|
|  2|  Two|
|  3|Three|
+---+-----+
22/04/14 21:46:16 INFO spark.SparkContext: Invoking stop() from shutdown hook
22/04/14 21:46:16 INFO server.ServerConnector: Stopped Spark@3a71c100{HTTP/1.1}{0.0.0.0:4040}
22/04/14 21:46:16 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@33a2499c{/stages/stage/kill,null,UNAVAILABLE,@Spark}
22/04/14 21:46:16 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@ea27e34{/jobs/job/kill,null,UNAVAILABLE,@Spark}
22/04/14 21:46:16 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@10b3df93{/api,null,UNAVAILABLE,@Spark}
22/04/14 21:46:16 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@3f2049b6{/,null,UNAVAILABLE,@Spark}
22/04/14 21:46:16 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@28a2a3e7{/static,null,UNAVAILABLE,@Spark}
22/04/14 21:46:16 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@60afd40d{/executors/threadDump/json,null,UNAVAILABLE,@Spark}
22/04/14 21:46:16 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@3153ddfc{/executors/threadDump,null,UNAVAILABLE,@Spark}
22/04/14 21:46:16 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@3704122f{/executors/json,null,UNAVAILABLE,@Spark}
22/04/14 21:46:16 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@7577b641{/executors,null,UNAVAILABLE,@Spark}
22/04/14 21:46:16 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@2fa7ae9{/environment/json,null,UNAVAILABLE,@Spark}
22/04/14 21:46:16 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@4d0402b{/environment,null,UNAVAILABLE,@Spark}
22/04/14 21:46:16 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@182b435b{/storage/rdd/json,null,UNAVAILABLE,@Spark}
22/04/14 21:46:16 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@160c3ec1{/storage/rdd,null,UNAVAILABLE,@Spark}
22/04/14 21:46:16 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@751e664e{/storage/json,null,UNAVAILABLE,@Spark}
22/04/14 21:46:16 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@475b7792{/storage,null,UNAVAILABLE,@Spark}
22/04/14 21:46:16 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@47d93e0d{/stages/pool/json,null,UNAVAILABLE,@Spark}
22/04/14 21:46:16 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@88d6f9b{/stages/pool,null,UNAVAILABLE,@Spark}
22/04/14 21:46:16 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@1bc53649{/stages/stage/json,null,UNAVAILABLE,@Spark}
22/04/14 21:46:16 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@7bdf6bb7{/stages/stage,null,UNAVAILABLE,@Spark}
22/04/14 21:46:16 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@62c5bbdc{/stages/json,null,UNAVAILABLE,@Spark}
22/04/14 21:46:16 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@512d92b{/stages,null,UNAVAILABLE,@Spark}
22/04/14 21:46:16 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@3a7b503d{/jobs/job/json,null,UNAVAILABLE,@Spark}
22/04/14 21:46:16 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@58783f6c{/jobs/job,null,UNAVAILABLE,@Spark}
22/04/14 21:46:16 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@40e4ea87{/jobs/json,null,UNAVAILABLE,@Spark}
22/04/14 21:46:16 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@29d2d081{/jobs,null,UNAVAILABLE,@Spark}
22/04/14 21:46:16 INFO ui.SparkUI: Stopped Spark web UI at http://172.17.0.5:4040
22/04/14 21:46:16 INFO spark.MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
22/04/14 21:46:16 INFO memory.MemoryStore: MemoryStore cleared
22/04/14 21:46:16 INFO storage.BlockManager: BlockManager stopped
22/04/14 21:46:16 INFO storage.BlockManagerMaster: BlockManagerMaster stopped
22/04/14 21:46:16 INFO scheduler.OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
22/04/14 21:46:16 INFO spark.SparkContext: Successfully stopped SparkContext
22/04/14 21:46:16 INFO util.ShutdownHookManager: Shutdown hook called
22/04/14 21:46:16 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-00fdf7fa-f26c-4edb-8e87-0511ae0fcbb9
root@client1:~#

Spark on YARN: submit the same job to the Hadoop YARN cluster.
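The submission below uses --master yarn without a --deploy-mode option, which defaults to client mode, so the driver runs inside the client1 container and its output is printed straight to this terminal. A cluster-mode variant, where the driver itself also runs in a YARN container, would look roughly like this (illustrative sketch only, not executed in this walkthrough):

root@client1:~# spark-submit --master yarn --deploy-mode cluster --class SparkHive /root/devz/SparkHive/out/artifacts/SparkHive_jar/SparkHive.jar
root@client1:~# yarn application -list

In cluster mode the table printed by show() goes to the driver's YARN container log rather than this terminal, so it would have to be retrieved with yarn logs -applicationId <appId>.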

root@client1:~# spark-submit --master yarn --class SparkHive /root/devz/SparkHive/out/artifacts/SparkHive_jar/SparkHive.jar
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/root/hadoop/spark-2.1.1-bin-hadoop2.7/jars/slf4j-log4j12-1.7.16.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/root/hadoop/hadoop-2.7.7/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
22/04/14 21:47:47 INFO spark.SparkContext: Running Spark version 2.1.1
22/04/14 21:47:48 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
22/04/14 21:47:48 INFO spark.SecurityManager: Changing view acls to: root
22/04/14 21:47:48 INFO spark.SecurityManager: Changing modify acls to: root
22/04/14 21:47:48 INFO spark.SecurityManager: Changing view acls groups to:
22/04/14 21:47:48 INFO spark.SecurityManager: Changing modify acls groups to:
22/04/14 21:47:48 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(root); groups with view permissions: Set(); users  with modify permissions: Set(root); groups with modify permissions: Set()
22/04/14 21:47:48 INFO util.Utils: Successfully started service 'sparkDriver' on port 42613.
22/04/14 21:47:48 INFO spark.SparkEnv: Registering MapOutputTracker
22/04/14 21:47:48 INFO spark.SparkEnv: Registering BlockManagerMaster
22/04/14 21:47:48 INFO storage.BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
22/04/14 21:47:48 INFO storage.BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
22/04/14 21:47:48 INFO storage.DiskBlockManager: Created local directory at /tmp/blockmgr-910cac25-f52d-4937-bb09-c7747191c519
22/04/14 21:47:48 INFO memory.MemoryStore: MemoryStore started with capacity 366.3 MB
22/04/14 21:47:48 INFO spark.SparkEnv: Registering OutputCommitCoordinator
22/04/14 21:47:48 INFO util.log: Logging initialized @1232ms
22/04/14 21:47:48 INFO server.Server: jetty-9.2.z-SNAPSHOT
22/04/14 21:47:48 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@751e664e{/jobs,null,AVAILABLE,@Spark}
22/04/14 21:47:48 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@160c3ec1{/jobs/json,null,AVAILABLE,@Spark}
22/04/14 21:47:48 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@182b435b{/jobs/job,null,AVAILABLE,@Spark}
22/04/14 21:47:48 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@4d0402b{/jobs/job/json,null,AVAILABLE,@Spark}
22/04/14 21:47:48 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@2fa7ae9{/stages,null,AVAILABLE,@Spark}
22/04/14 21:47:48 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@7577b641{/stages/json,null,AVAILABLE,@Spark}
22/04/14 21:47:48 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@3704122f{/stages/stage,null,AVAILABLE,@Spark}
22/04/14 21:47:48 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@3153ddfc{/stages/stage/json,null,AVAILABLE,@Spark}
22/04/14 21:47:48 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@60afd40d{/stages/pool,null,AVAILABLE,@Spark}
22/04/14 21:47:48 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@28a2a3e7{/stages/pool/json,null,AVAILABLE,@Spark}
22/04/14 21:47:48 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@3f2049b6{/storage,null,AVAILABLE,@Spark}
22/04/14 21:47:48 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@10b3df93{/storage/json,null,AVAILABLE,@Spark}
22/04/14 21:47:48 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@ea27e34{/storage/rdd,null,AVAILABLE,@Spark}
22/04/14 21:47:48 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@33a2499c{/storage/rdd/json,null,AVAILABLE,@Spark}
22/04/14 21:47:48 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@e72dba7{/environment,null,AVAILABLE,@Spark}
22/04/14 21:47:48 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@33c2bd{/environment/json,null,AVAILABLE,@Spark}
22/04/14 21:47:48 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@1dfd5f51{/executors,null,AVAILABLE,@Spark}
22/04/14 21:47:48 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@3c321bdb{/executors/json,null,AVAILABLE,@Spark}
22/04/14 21:47:48 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@24855019{/executors/threadDump,null,AVAILABLE,@Spark}
22/04/14 21:47:48 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@3abd581e{/executors/threadDump/json,null,AVAILABLE,@Spark}
22/04/14 21:47:48 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@4d4d8fcf{/static,null,AVAILABLE,@Spark}
22/04/14 21:47:48 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@610db97e{/,null,AVAILABLE,@Spark}
22/04/14 21:47:48 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@6f0628de{/api,null,AVAILABLE,@Spark}
22/04/14 21:47:48 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@3fabf088{/jobs/job/kill,null,AVAILABLE,@Spark}
22/04/14 21:47:48 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@1e392345{/stages/stage/kill,null,AVAILABLE,@Spark}
22/04/14 21:47:48 INFO server.ServerConnector: Started Spark@51abf713{HTTP/1.1}{0.0.0.0:4040}
22/04/14 21:47:48 INFO server.Server: Started @1330ms
22/04/14 21:47:48 INFO util.Utils: Successfully started service 'SparkUI' on port 4040.
22/04/14 21:47:48 INFO ui.SparkUI: Bound SparkUI to 0.0.0.0, and started at http://172.17.0.5:4040
22/04/14 21:47:48 INFO spark.SparkContext: Added JAR file:/root/devz/SparkHive/out/artifacts/SparkHive_jar/SparkHive.jar at spark://172.17.0.5:42613/jars/SparkHive.jar with timestamp 1649944068615
22/04/14 21:47:48 INFO executor.Executor: Starting executor ID driver on host localhost
22/04/14 21:47:48 INFO util.Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 42113.
22/04/14 21:47:48 INFO netty.NettyBlockTransferService: Server created on 172.17.0.5:42113
22/04/14 21:47:48 INFO storage.BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
22/04/14 21:47:48 INFO storage.BlockManagerMaster: Registering BlockManager BlockManagerId(driver, 172.17.0.5, 42113, None)
22/04/14 21:47:48 INFO storage.BlockManagerMasterEndpoint: Registering block manager 172.17.0.5:42113 with 366.3 MB RAM, BlockManagerId(driver, 172.17.0.5, 42113, None)
22/04/14 21:47:48 INFO storage.BlockManagerMaster: Registered BlockManager BlockManagerId(driver, 172.17.0.5, 42113, None)
22/04/14 21:47:48 INFO storage.BlockManager: Initialized BlockManager: BlockManagerId(driver, 172.17.0.5, 42113, None)
22/04/14 21:47:48 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@4a1c0752{/metrics/json,null,AVAILABLE,@Spark}
22/04/14 21:47:48 INFO internal.SharedState: spark.sql.warehouse.dir is not set, but hive.metastore.warehouse.dir is set. Setting spark.sql.warehouse.dir to the value of hive.metastore.warehouse.dir ('/root/hadoop/apache-hive-2.3.4-bin/warehouse').
22/04/14 21:47:48 INFO internal.SharedState: Warehouse path is '/root/hadoop/apache-hive-2.3.4-bin/warehouse'.
22/04/14 21:47:48 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@6b1e7ad3{/SQL,null,AVAILABLE,@Spark}
22/04/14 21:47:48 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@13a37e2a{/SQL/json,null,AVAILABLE,@Spark}
22/04/14 21:47:48 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@5972d253{/SQL/execution,null,AVAILABLE,@Spark}
22/04/14 21:47:48 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@31e32ea2{/SQL/execution/json,null,AVAILABLE,@Spark}
22/04/14 21:47:48 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@6ef81f31{/static/sql,null,AVAILABLE,@Spark}
22/04/14 21:47:48 INFO hive.HiveUtils: Initializing HiveMetastoreConnection version 1.2.1 using Spark classes.
22/04/14 21:47:49 INFO metastore.HiveMetaStore: 0: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
22/04/14 21:47:49 INFO metastore.ObjectStore: ObjectStore, initialize called
22/04/14 21:47:49 INFO DataNucleus.Persistence: Property hive.metastore.integral.jdo.pushdown unknown - will be ignored
22/04/14 21:47:49 INFO DataNucleus.Persistence: Property datanucleus.cache.level2 unknown - will be ignored
Thu Apr 14 21:47:49 CST 2022 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
Thu Apr 14 21:47:49 CST 2022 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
Thu Apr 14 21:47:49 CST 2022 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
Thu Apr 14 21:47:49 CST 2022 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
22/04/14 21:47:50 INFO metastore.ObjectStore: Setting MetaStore object pin classes with hive.metastore.cache.pinobjtypes="Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order"
Thu Apr 14 21:47:50 CST 2022 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
Thu Apr 14 21:47:50 CST 2022 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
Thu Apr 14 21:47:50 CST 2022 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
Thu Apr 14 21:47:50 CST 2022 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
22/04/14 21:47:50 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
22/04/14 21:47:50 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
22/04/14 21:47:50 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
22/04/14 21:47:50 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
22/04/14 21:47:51 INFO DataNucleus.Query: Reading in results for query "org.datanucleus.store.rdbms.query.SQLQuery@0" since the connection used is closing
22/04/14 21:47:51 INFO metastore.MetaStoreDirectSql: Using direct SQL, underlying DB is MYSQL
22/04/14 21:47:51 INFO metastore.ObjectStore: Initialized ObjectStore
22/04/14 21:47:51 INFO metastore.HiveMetaStore: Added admin role in metastore
22/04/14 21:47:51 INFO metastore.HiveMetaStore: Added public role in metastore
22/04/14 21:47:51 INFO metastore.HiveMetaStore: No user is added in admin role, since config is empty
22/04/14 21:47:51 INFO metastore.HiveMetaStore: 0: get_all_databases
22/04/14 21:47:51 INFO HiveMetaStore.audit: ugi=root    ip=unknown-ip-addr      cmd=get_all_databases
22/04/14 21:47:51 INFO metastore.HiveMetaStore: 0: get_functions: db=default pat=*
22/04/14 21:47:51 INFO HiveMetaStore.audit: ugi=root    ip=unknown-ip-addr      cmd=get_functions: db=default pat=*
22/04/14 21:47:51 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MResourceUri" is tagged as "embedded-only" so does not have its own datastore table.
22/04/14 21:47:51 INFO session.SessionState: Created local directory: /tmp/b440033e-4fcc-46f4-bb4c-b9780b3986b9_resources
22/04/14 21:47:51 INFO session.SessionState: Created HDFS directory: /tmp/hive/root/b440033e-4fcc-46f4-bb4c-b9780b3986b9
22/04/14 21:47:51 INFO session.SessionState: Created local directory: /tmp/root/b440033e-4fcc-46f4-bb4c-b9780b3986b9
22/04/14 21:47:51 INFO session.SessionState: Created HDFS directory: /tmp/hive/root/b440033e-4fcc-46f4-bb4c-b9780b3986b9/_tmp_space.db
22/04/14 21:47:51 INFO client.HiveClientImpl: Warehouse location for Hive client (version 1.2.1) is /root/hadoop/apache-hive-2.3.4-bin/warehouse
22/04/14 21:47:51 INFO metastore.HiveMetaStore: 0: get_database: default
22/04/14 21:47:51 INFO HiveMetaStore.audit: ugi=root    ip=unknown-ip-addr      cmd=get_database: default
22/04/14 21:47:51 INFO metastore.HiveMetaStore: 0: get_database: global_temp
22/04/14 21:47:51 INFO HiveMetaStore.audit: ugi=root    ip=unknown-ip-addr      cmd=get_database: global_temp
22/04/14 21:47:51 WARN metastore.ObjectStore: Failed to get database global_temp, returning NoSuchObjectException
22/04/14 21:47:51 INFO execution.SparkSqlParser: Parsing command: select * from t_test
22/04/14 21:47:52 INFO metastore.HiveMetaStore: 0: get_table : db=default tbl=t_test
22/04/14 21:47:52 INFO HiveMetaStore.audit: ugi=root    ip=unknown-ip-addr      cmd=get_table : db=default tbl=t_test
22/04/14 21:47:52 INFO parser.CatalystSqlParser: Parsing command: int
22/04/14 21:47:52 INFO parser.CatalystSqlParser: Parsing command: string
22/04/14 21:47:53 INFO memory.MemoryStore: Block broadcast_0 stored as values in memory (estimated size 296.5 KB, free 366.0 MB)
22/04/14 21:47:53 INFO memory.MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 24.2 KB, free 366.0 MB)
22/04/14 21:47:53 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on 172.17.0.5:42113 (size: 24.2 KB, free: 366.3 MB)
22/04/14 21:47:53 INFO spark.SparkContext: Created broadcast 0 from show at SparkHive.scala:13
22/04/14 21:47:53 INFO mapred.FileInputFormat: Total input paths to process : 3
22/04/14 21:47:53 INFO spark.SparkContext: Starting job: show at SparkHive.scala:13
22/04/14 21:47:53 INFO scheduler.DAGScheduler: Got job 0 (show at SparkHive.scala:13) with 1 output partitions
22/04/14 21:47:53 INFO scheduler.DAGScheduler: Final stage: ResultStage 0 (show at SparkHive.scala:13)
22/04/14 21:47:53 INFO scheduler.DAGScheduler: Parents of final stage: List()
22/04/14 21:47:53 INFO scheduler.DAGScheduler: Missing parents: List()
22/04/14 21:47:53 INFO scheduler.DAGScheduler: Submitting ResultStage 0 (MapPartitionsRDD[4] at show at SparkHive.scala:13), which has no missing parents
22/04/14 21:47:53 INFO memory.MemoryStore: Block broadcast_1 stored as values in memory (estimated size 8.0 KB, free 366.0 MB)
22/04/14 21:47:53 INFO memory.MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 4.4 KB, free 366.0 MB)
22/04/14 21:47:53 INFO storage.BlockManagerInfo: Added broadcast_1_piece0 in memory on 172.17.0.5:42113 (size: 4.4 KB, free: 366.3 MB)
22/04/14 21:47:53 INFO spark.SparkContext: Created broadcast 1 from broadcast at DAGScheduler.scala:996
22/04/14 21:47:53 INFO scheduler.DAGScheduler: Submitting 1 missing tasks from ResultStage 0 (MapPartitionsRDD[4] at show at SparkHive.scala:13)
22/04/14 21:47:53 INFO scheduler.TaskSchedulerImpl: Adding task set 0.0 with 1 tasks
22/04/14 21:47:53 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, localhost, executor driver, partition 0, ANY, 6024 bytes)
22/04/14 21:47:53 INFO executor.Executor: Running task 0.0 in stage 0.0 (TID 0)
22/04/14 21:47:53 INFO executor.Executor: Fetching spark://172.17.0.5:42613/jars/SparkHive.jar with timestamp 1649944068615
22/04/14 21:47:54 INFO client.TransportClientFactory: Successfully created connection to /172.17.0.5:42613 after 26 ms (0 ms spent in bootstraps)
22/04/14 21:47:54 INFO util.Utils: Fetching spark://172.17.0.5:42613/jars/SparkHive.jar to /tmp/spark-0398d23e-d125-4b80-9f9e-1ae1929d8304/userFiles-567b128b-072c-44e5-b38b-a6acc1e260ae/fetchFileTemp6723243436101928199.tmp
22/04/14 21:47:54 INFO executor.Executor: Adding file:/tmp/spark-0398d23e-d125-4b80-9f9e-1ae1929d8304/userFiles-567b128b-072c-44e5-b38b-a6acc1e260ae/SparkHive.jar to class loader
22/04/14 21:47:54 INFO rdd.HadoopRDD: Input split: hdfs://master:9000/root/hadoop/apache-hive-2.3.4-bin/warehouse/t_test/000000_0:0+6
22/04/14 21:47:54 INFO Configuration.deprecation: mapred.tip.id is deprecated. Instead, use mapreduce.task.id
22/04/14 21:47:54 INFO Configuration.deprecation: mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id
22/04/14 21:47:54 INFO Configuration.deprecation: mapred.task.is.map is deprecated. Instead, use mapreduce.task.ismap
22/04/14 21:47:54 INFO Configuration.deprecation: mapred.task.partition is deprecated. Instead, use mapreduce.task.partition
22/04/14 21:47:54 INFO Configuration.deprecation: mapred.job.id is deprecated. Instead, use mapreduce.job.id
22/04/14 21:47:54 INFO codegen.CodeGenerator: Code generated in 142.4007 ms
22/04/14 21:47:54 INFO executor.Executor: Finished task 0.0 in stage 0.0 (TID 0). 1356 bytes result sent to driver
22/04/14 21:47:54 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 531 ms on localhost (executor driver) (1/1)
22/04/14 21:47:54 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
22/04/14 21:47:54 INFO scheduler.DAGScheduler: ResultStage 0 (show at SparkHive.scala:13) finished in 0.548 s
22/04/14 21:47:54 INFO scheduler.DAGScheduler: Job 0 finished: show at SparkHive.scala:13, took 0.639079 s
22/04/14 21:47:54 INFO spark.SparkContext: Starting job: show at SparkHive.scala:13
22/04/14 21:47:54 INFO scheduler.DAGScheduler: Got job 1 (show at SparkHive.scala:13) with 2 output partitions
22/04/14 21:47:54 INFO scheduler.DAGScheduler: Final stage: ResultStage 1 (show at SparkHive.scala:13)
22/04/14 21:47:54 INFO scheduler.DAGScheduler: Parents of final stage: List()
22/04/14 21:47:54 INFO scheduler.DAGScheduler: Missing parents: List()
22/04/14 21:47:54 INFO scheduler.DAGScheduler: Submitting ResultStage 1 (MapPartitionsRDD[4] at show at SparkHive.scala:13), which has no missing parents
22/04/14 21:47:54 INFO memory.MemoryStore: Block broadcast_2 stored as values in memory (estimated size 8.0 KB, free 366.0 MB)
22/04/14 21:47:54 INFO memory.MemoryStore: Block broadcast_2_piece0 stored as bytes in memory (estimated size 4.4 KB, free 366.0 MB)
22/04/14 21:47:54 INFO storage.BlockManagerInfo: Added broadcast_2_piece0 in memory on 172.17.0.5:42113 (size: 4.4 KB, free: 366.3 MB)
22/04/14 21:47:54 INFO spark.SparkContext: Created broadcast 2 from broadcast at DAGScheduler.scala:996
22/04/14 21:47:54 INFO scheduler.DAGScheduler: Submitting 2 missing tasks from ResultStage 1 (MapPartitionsRDD[4] at show at SparkHive.scala:13)
22/04/14 21:47:54 INFO scheduler.TaskSchedulerImpl: Adding task set 1.0 with 2 tasks
22/04/14 21:47:54 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 1.0 (TID 1, localhost, executor driver, partition 1, ANY, 6031 bytes)
22/04/14 21:47:54 INFO scheduler.TaskSetManager: Starting task 1.0 in stage 1.0 (TID 2, localhost, executor driver, partition 2, ANY, 6031 bytes)
22/04/14 21:47:54 INFO executor.Executor: Running task 0.0 in stage 1.0 (TID 1)
22/04/14 21:47:54 INFO executor.Executor: Running task 1.0 in stage 1.0 (TID 2)
22/04/14 21:47:54 INFO rdd.HadoopRDD: Input split: hdfs://master:9000/root/hadoop/apache-hive-2.3.4-bin/warehouse/t_test/000000_0_copy_1:0+6
22/04/14 21:47:54 INFO rdd.HadoopRDD: Input split: hdfs://master:9000/root/hadoop/apache-hive-2.3.4-bin/warehouse/t_test/000000_0_copy_2:0+8
22/04/14 21:47:54 INFO executor.Executor: Finished task 1.0 in stage 1.0 (TID 2). 1356 bytes result sent to driver
22/04/14 21:47:54 INFO executor.Executor: Finished task 0.0 in stage 1.0 (TID 1). 1356 bytes result sent to driver
22/04/14 21:47:54 INFO scheduler.TaskSetManager: Finished task 1.0 in stage 1.0 (TID 2) in 21 ms on localhost (executor driver) (1/2)
22/04/14 21:47:54 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 1.0 (TID 1) in 25 ms on localhost (executor driver) (2/2)
22/04/14 21:47:54 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 1.0, whose tasks have all completed, from pool
22/04/14 21:47:54 INFO scheduler.DAGScheduler: ResultStage 1 (show at SparkHive.scala:13) finished in 0.026 s
22/04/14 21:47:54 INFO scheduler.DAGScheduler: Job 1 finished: show at SparkHive.scala:13, took 0.040328 s
22/04/14 21:47:54 INFO codegen.CodeGenerator: Code generated in 15.9474 ms
+---+-----+
| id| name|
+---+-----+
|  1|  One|
|  2|  Two|
|  3|Three|
+---+-----+
22/04/14 21:47:54 INFO spark.SparkContext: Invoking stop() from shutdown hook
22/04/14 21:47:54 INFO server.ServerConnector: Stopped Spark@51abf713{HTTP/1.1}{0.0.0.0:4040}
22/04/14 21:47:54 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@1e392345{/stages/stage/kill,null,UNAVAILABLE,@Spark}
22/04/14 21:47:54 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@3fabf088{/jobs/job/kill,null,UNAVAILABLE,@Spark}
22/04/14 21:47:54 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@6f0628de{/api,null,UNAVAILABLE,@Spark}
22/04/14 21:47:54 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@610db97e{/,null,UNAVAILABLE,@Spark}
22/04/14 21:47:54 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@4d4d8fcf{/static,null,UNAVAILABLE,@Spark}
22/04/14 21:47:54 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@3abd581e{/executors/threadDump/json,null,UNAVAILABLE,@Spark}
22/04/14 21:47:54 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@24855019{/executors/threadDump,null,UNAVAILABLE,@Spark}
22/04/14 21:47:54 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@3c321bdb{/executors/json,null,UNAVAILABLE,@Spark}
22/04/14 21:47:54 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@1dfd5f51{/executors,null,UNAVAILABLE,@Spark}
22/04/14 21:47:54 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@33c2bd{/environment/json,null,UNAVAILABLE,@Spark}
22/04/14 21:47:54 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@e72dba7{/environment,null,UNAVAILABLE,@Spark}
22/04/14 21:47:54 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@33a2499c{/storage/rdd/json,null,UNAVAILABLE,@Spark}
22/04/14 21:47:54 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@ea27e34{/storage/rdd,null,UNAVAILABLE,@Spark}
22/04/14 21:47:54 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@10b3df93{/storage/json,null,UNAVAILABLE,@Spark}
22/04/14 21:47:54 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@3f2049b6{/storage,null,UNAVAILABLE,@Spark}
22/04/14 21:47:54 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@28a2a3e7{/stages/pool/json,null,UNAVAILABLE,@Spark}
22/04/14 21:47:54 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@60afd40d{/stages/pool,null,UNAVAILABLE,@Spark}
22/04/14 21:47:54 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@3153ddfc{/stages/stage/json,null,UNAVAILABLE,@Spark}
22/04/14 21:47:54 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@3704122f{/stages/stage,null,UNAVAILABLE,@Spark}
22/04/14 21:47:54 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@7577b641{/stages/json,null,UNAVAILABLE,@Spark}
22/04/14 21:47:54 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@2fa7ae9{/stages,null,UNAVAILABLE,@Spark}
22/04/14 21:47:54 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@4d0402b{/jobs/job/json,null,UNAVAILABLE,@Spark}
22/04/14 21:47:54 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@182b435b{/jobs/job,null,UNAVAILABLE,@Spark}
22/04/14 21:47:54 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@160c3ec1{/jobs/json,null,UNAVAILABLE,@Spark}
22/04/14 21:47:54 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@751e664e{/jobs,null,UNAVAILABLE,@Spark}
22/04/14 21:47:54 INFO ui.SparkUI: Stopped Spark web UI at http://172.17.0.5:4040
22/04/14 21:47:54 INFO spark.MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
22/04/14 21:47:54 INFO memory.MemoryStore: MemoryStore cleared
22/04/14 21:47:54 INFO storage.BlockManager: BlockManager stopped
22/04/14 21:47:54 INFO storage.BlockManagerMaster: BlockManagerMaster stopped
22/04/14 21:47:54 INFO scheduler.OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
22/04/14 21:47:54 INFO spark.SparkContext: Successfully stopped SparkContext
22/04/14 21:47:54 INFO util.ShutdownHookManager: Shutdown hook called
22/04/14 21:47:54 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-0398d23e-d125-4b80-9f9e-1ae1929d8304
root@client1:~#
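The console output above ends with the contents of the Hive table t_test, confirming that the Spark job connected to the Hive metastore, read the table from the warehouse directory on HDFS, and printed its three rows. For reference, a minimal SparkHive.scala consistent with this log would look roughly like the sketch below; it is only an illustration reconstructed from the log (the query "select * from t_test" and the show() call reported at SparkHive.scala:13), not necessarily the exact program used.

import org.apache.spark.sql.SparkSession

object SparkHive {
  def main(args: Array[String]): Unit = {
    // enableHiveSupport() makes Spark use the Hive metastore configured in hive-site.xml,
    // which is why the log reports the warehouse path /root/hadoop/apache-hive-2.3.4-bin/warehouse.
    val spark = SparkSession.builder()
      .appName("SparkHive")
      .enableHiveSupport()
      .getOrCreate()

    // The query recorded in the log; show() triggers the two small jobs seen above.
    spark.sql("select * from t_test").show()

    spark.stop()
  }
}

Packaged as SparkHive.jar (the log shows the executor fetching .../jars/SparkHive.jar from the driver) and launched with spark-submit, a program of this shape produces the table printed above.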
