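Standalone Mode Deployment in Practice

Spark has several run modes. This time we pick standalone mode, in which Spark handles everything except file storage by itself: cluster management, task scheduling, computation and so on. This mode suits smaller applications, because you do not need YARN or similar components that complicate the deployment. Put more formally, the Spark cluster runs on Spark's built-in resource manager and scheduler in a Master/Slave architecture; to avoid a single point of failure, ZooKeeper can be used to provide high availability (HA). Let's begin.
First, prepare the following:

The application to deploy, packaged as a jar; if you do not have one, use a bundled example
Four Linux machines, physical or virtual, that can ping each other and already have a JDK installed; mine is jdk-1.8.0_101
The Spark package; my version is spark-2.3-hadoop-2.7
The ZooKeeper package
A Scala environment on every machine; my version is scala-2.11.12

I. Operating System Preparation

The four machines have the following IPs: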

10.1.161.91
10.1.161.92
10.1.161.94
10.1.161.95

1. Changing the hostnames

To make later steps easier, rename the hosts to a uniform format. My machines map as follows:

10.1.161.91  --->   test01.yuyin
10.1.161.92  --->   test02.yuyin
10.1.161.94  --->   test04.yuyin
10.1.161.95  --->   test05.yuyin

2. Mapping hostnames to IPs in the hosts file

Run on every machine:

vi /etc/hosts

Append the following entries to the file:

10.1.161.95 test05.yuyin
10.1.161.94 test04.yuyin
10.1.161.91 test01.yuyin
10.1.161.92 test02.yuyin
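
Editing the file by hand on four machines is easy to get wrong, so the entries can also be appended by a small script. The following is only a sketch: add_host_entry is a hypothetical helper, and by default it writes to a scratch file rather than /etc/hosts so it can be tried safely.

```shell
# Append "ip name" to a hosts-style file unless the name is already present.
# add_host_entry is a hypothetical helper, not a standard tool.
add_host_entry() {
    file=$1; ip=$2; name=$3
    # -q quiet, -w whole-word match, -F treat the name as a fixed string
    if grep -qwF "$name" "$file" 2>/dev/null; then
        return 0
    fi
    printf '%s %s\n' "$ip" "$name" >> "$file"
}

# Assumption: a scratch file is used unless HOSTS_FILE is set.
# Set HOSTS_FILE=/etc/hosts (as root) once the result looks right.
HOSTS_FILE=${HOSTS_FILE:-$(mktemp)}
add_host_entry "$HOSTS_FILE" 10.1.161.91 test01.yuyin
add_host_entry "$HOSTS_FILE" 10.1.161.92 test02.yuyin
add_host_entry "$HOSTS_FILE" 10.1.161.94 test04.yuyin
add_host_entry "$HOSTS_FILE" 10.1.161.95 test05.yuyin
cat "$HOSTS_FILE"
```

Because existing names are skipped, running the script a second time does not duplicate entries.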

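After saving, test with ping, for example by pinging test02.yuyin from test01.yuyin.

Try this on every machine to make sure the mappings work and to avoid errors later.
Also test connectivity to the external network.

If that fails, check whether DNS resolution is the problem:

vi /etc/resolv.conf

Try changing the nameserver entries. If the cause is something else, you will need to investigate it yourself.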

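3. Passwordless access

If OpenSSH is not installed on a machine, install it with:

yum install openssh-server

Install it on every machine, then generate a key pair:

ssh-keygen -t rsa

Just press Enter at each prompt. Because I had already generated a key, it asked me whether to overwrite it.

Once the key is generated, copy the public key to the other machines with: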

ssh-copy-id -i test02.yuyin
ssh-copy-id -i test04.yuyin
ssh-copy-id -i test05.yuyin

On test01 I ran:

ssh-copy-id -i test02.yuyin

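You are asked for the login password while the command runs. Run it on every machine, so that any two machines can log in to each other without a password; only then can they work together as a cluster.
Next, test whether passwordless login works by running: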

ssh test02.yuyin

test01.yuyin can now access test02.yuyin without a password, and test02.yuyin can access test01.yuyin in the same way.

The same goes for the other machines:

ssh test04.yuyin
ssh test05.yuyin

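Just make sure any two machines can reach each other without a password. You can also use scp to check that files can be copied between them without a password.
That completes this step.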
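
The key distribution and the login check can be wrapped in one loop that is run on each machine. This is a sketch under assumptions: the run and push_key_to_all helpers are hypothetical, the public key is assumed to be at the default path ~/.ssh/id_rsa.pub, and commands are only printed unless DRY_RUN=0 is set.

```shell
# Host list taken from the hosts file above.
HOSTS="test01.yuyin test02.yuyin test04.yuyin test05.yuyin"

# Hypothetical helper: print the command in dry-run mode (the default),
# execute it when DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Push this machine's public key to every host, then verify the login.
# BatchMode=yes makes ssh fail instead of prompting for a password.
push_key_to_all() {
    for host in $HOSTS; do
        run ssh-copy-id -i "$HOME/.ssh/id_rsa.pub" "$host"
        run ssh -o BatchMode=yes "$host" true
    done
}

push_key_to_all
```

The list includes the local machine as well; copying the key to itself is harmless.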

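II. Environment Installation

1. Sharing the installation packages

The JDK, Scala, Spark, Hadoop and ZooKeeper packages are shared here:

https://pan.baidu.com/s/1wq77i-EB5kh5j3ZncBEa6g   password: 1v2p

I suggest downloading all packages into one directory, for example /usr/local/sparksoft. Whether you upload them over FTP or use a shared folder on a virtual machine does not matter; use whatever is convenient.
Install everything on one machine first, for example test01.yuyin, and then copy the result to the other machines, so that the installation and configuration are identical everywhere.

2. Installing the base environment

Installing the JDK

Copy the package to the current directory, then unpack it: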

    tar -zxvf jdk-8u161-linux-x64.tar.gz

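After unpacking, configure the environment variables:

    vi /etc/profile

Append the following to the file (adjust the path to the directory your JDK was actually unpacked into):

    export JAVA_HOME=/usr/local/java/jdk1.8.0_131
    export JRE_HOME=${JAVA_HOME}/jre
    export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib

Save, then apply the change immediately:

    source /etc/profile

Check whether the installation succeeded, for example with java -version.

If a version is shown, it worked.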

Installing Scala

This works the same way as the Java installation, so I won't repeat it.

3. Installing Spark

mkdir /usr/local/spark-2.3-hadoop-2.7
cd /usr/local/spark-2.3-hadoop-2.7
cp /usr/local/sparksoft/spark-2.3-bin-hadoop2.7.tgz .
tar -zxvf spark-2.3-bin-hadoop2.7.tgz

After unpacking, SPARK_HOME needs to be configured in the same way:

export SPARK_HOME=/usr/local/spark-2.3-hadoop-2.7/spark-2.3.2-bin-hadoop2.7
export PATH=$PATH:${JAVA_HOME}/bin:${SPARK_HOME}/bin:${SPARK_HOME}/sbin

This way Spark's shell scripts can be run from any directory, without going to the installation directory to find them.
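
A quick way to confirm that the PATH change took effect is to check that the expected commands resolve. This is a minimal sketch; missing_tools is a hypothetical helper, and the list of commands is only an assumption about what you want on the PATH.

```shell
# Print every command from the argument list that cannot be found on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

missing=$(missing_tools java spark-submit start-master.sh)
if [ -n "$missing" ]; then
    echo "not on PATH:" $missing
else
    echo "all tools found"
fi
```

If anything is reported as missing, re-run source /etc/profile in the current shell and check the paths in the export lines.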
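Because Spark manages resources itself in this mode, nothing like YARN needs to be installed. The Master process plays the role of the resource manager, and the Worker processes do the actual work. Let's also assume that our machines are sturdy enough not to fail, so we ignore the single-point-of-failure problem for now and simply get the cluster running.
Next comes the startup stage.

III. Configuration and Startup

The environment on test01.yuyin is now more or less complete, so it has to be copied to the other machines: the environment file, Java, Scala, Spark and so on. If you installed Hadoop, copy that too. The point is that all machines end up with the same environment: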

scp /etc/profile test02.yuyin:/etc/profile
scp -r /usr/local/java/jdk1.8.0_131 test02.yuyin:/usr/local/java/jdk1.8.0_131/

The remaining commands are similar, so I won't repeat them.
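
The repeated scp commands can be generated by a loop. The sketch below is an assumption about how one might do it, not the author's procedure: run and sync_path are hypothetical helpers, each path is copied into the same parent directory on every other machine, and commands are only printed unless DRY_RUN=0 is set.

```shell
# The machines that receive a copy of the test01.yuyin setup.
WORKERS="test02.yuyin test04.yuyin test05.yuyin"

# Hypothetical helper: print in dry-run mode (default), execute when DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Copy one file or directory to the same location on every other machine.
sync_path() {
    path=$1
    parent=$(dirname "$path")
    for host in $WORKERS; do
        # Make sure the parent directory exists on the remote side first.
        run ssh "$host" mkdir -p "$parent"
        # -r copies directories recursively and is harmless for single files.
        run scp -r "$path" "$host:$parent/"
    done
}

sync_path /etc/profile
sync_path /usr/local/java/jdk1.8.0_131
```

Copying into the parent directory avoids ending up with a nested jdk1.8.0_131/jdk1.8.0_131 when the target directory already exists.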
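Standalone mode is the resource-scheduling framework that Spark itself implements. Its main nodes are the Client node, the Master node and the Worker nodes. The Driver can run either in the local Client process or inside the cluster. When you use the interactive spark-shell, or run a job from an IDE such as Eclipse or IDEA with new SparkConf().setMaster("spark://master:7077"), the Driver runs in the local Client process (client mode). spark-submit also defaults to client mode; with --deploy-mode cluster the Driver is launched on one of the Worker nodes instead (cluster mode).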

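How many Executors a Worker process spawns and how many cores each Executor uses can be configured in spark-env.sh, although this is optional. The file lives in /usr/local/spark-2.3-hadoop-2.7/spark-2.3.2-bin-hadoop2.7/conf; if there is no spark-env.sh yet, create one by copying spark-env.sh.template.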
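
Creating the file from the template can be done so that an existing spark-env.sh is never overwritten. A minimal sketch, assuming SPARK_HOME is set as configured earlier; ensure_spark_env is a hypothetical helper.

```shell
# Create spark-env.sh from the shipped template unless it already exists.
ensure_spark_env() {
    conf_dir=$1
    if [ ! -f "$conf_dir/spark-env.sh" ]; then
        cp "$conf_dir/spark-env.sh.template" "$conf_dir/spark-env.sh"
    fi
}

# Only act when SPARK_HOME points at an installation that has the template.
if [ -n "${SPARK_HOME:-}" ] && [ -f "$SPARK_HOME/conf/spark-env.sh.template" ]; then
    ensure_spark_env "$SPARK_HOME/conf"
fi
```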

The following settings can be added:

[root@test01 conf]# vi spark-env.sh
#!/usr/bin/env bash

#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements.  See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License.  You may obtain a copy of the License at
#
#    http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

# This file is sourced when running various Spark programs.
# Copy it as spark-env.sh and edit that to configure Spark for your site.

# Options read when launching programs locally with
# ./bin/run-example or ./bin/spark-submit
# - HADOOP_CONF_DIR, to point Spark towards Hadoop configuration files
# - SPARK_LOCAL_IP, to set the IP address Spark binds to on this node
# - SPARK_PUBLIC_DNS, to set the public dns name of the driver program

# Options read by executors and drivers running inside the cluster
# - SPARK_LOCAL_IP, to set the IP address Spark binds to on this node
# - SPARK_PUBLIC_DNS, to set the public DNS name of the driver program
# - SPARK_LOCAL_DIRS, storage directories to use on this node for shuffle and RDD data
# - MESOS_NATIVE_JAVA_LIBRARY, to point to your libmesos.so if you use Mesos

# Options read in YARN client/cluster mode
# - SPARK_CONF_DIR, Alternate conf dir. (Default: ${SPARK_HOME}/conf)
# - HADOOP_CONF_DIR, to point Spark towards Hadoop configuration files
# - YARN_CONF_DIR, to point Spark towards YARN configuration files when you use YARN
# - SPARK_EXECUTOR_CORES, Number of cores for the executors (Default: 1).
# - SPARK_EXECUTOR_MEMORY, Memory per Executor (e.g. 1000M, 2G) (Default: 1G)
# - SPARK_DRIVER_MEMORY, Memory for Driver (e.g. 1000M, 2G) (Default: 1G)

# Options for the daemons used in the standalone deploy mode
# - SPARK_MASTER_HOST, to bind the master to a different IP address or hostname
# - SPARK_MASTER_PORT / SPARK_MASTER_WEBUI_PORT, to use non-default ports for the master
# - SPARK_MASTER_OPTS, to set config properties only for the master (e.g. "-Dx=y")
# - SPARK_WORKER_CORES, to set the number of cores to use on this machine
# - SPARK_WORKER_MEMORY, to set how much total memory workers have to give executors (e.g. 1000m, 2g)
# - SPARK_WORKER_PORT / SPARK_WORKER_WEBUI_PORT, to use non-default ports for the worker
# - SPARK_WORKER_DIR, to set the working directory of worker processes
# - SPARK_WORKER_OPTS, to set config properties only for the worker (e.g. "-Dx=y")
# - SPARK_DAEMON_MEMORY, to allocate to the master, worker and history server themselves (default: 1g).
# - SPARK_HISTORY_OPTS, to set config properties only for the history server (e.g. "-Dx=y")
# - SPARK_SHUFFLE_OPTS, to set config properties only for the external shuffle service (e.g. "-Dx=y")
# - SPARK_DAEMON_JAVA_OPTS, to set config properties for all daemons (e.g. "-Dx=y")
# - SPARK_DAEMON_CLASSPATH, to set the classpath for all daemons
# - SPARK_PUBLIC_DNS, to set the public dns name of the master or workers

# Generic options for the daemons used in the standalone deploy mode
# - SPARK_CONF_DIR      Alternate conf dir. (Default: ${SPARK_HOME}/conf)
# - SPARK_LOG_DIR       Where log files are stored.  (Default: ${SPARK_HOME}/logs)
# - SPARK_PID_DIR       Where the pid file is stored. (Default: /tmp)
# - SPARK_IDENT_STRING  A string representing this instance of spark. (Default: $USER)
# - SPARK_NICENESS      The scheduling priority for daemons. (Default: 0)
# - SPARK_NO_DAEMONIZE  Run the proposed command in the foreground. It will not output a PID file.
# Options for native BLAS, like Intel MKL, OpenBLAS, and so on.
# You might get better performance to enable these options if using native BLAS (see SPARK-21305).
# - MKL_NUM_THREADS=1        Disable multi-threading of Intel MKL
# - OPENBLAS_NUM_THREADS=1   Disable multi-threading of OpenBLAS
export JAVA_HOME=/usr/local/java/jdk1.8.0_131
export SPARK_HOME=/usr/local/spark-2.3-hadoop-2.7/spark-2.3.2-bin-hadoop2.7
export SPARK_EXECUTOR_MEMORY=5G
export SPARK_EXECUTOR_CORES=2
export SPARK_WORKER_CORES=2

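Note that JAVA_HOME and SPARK_HOME must be set. The other settings depend on your own situation and can be left out.
Copy the configuration to the other machines: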

scp spark-env.sh test02.yuyin:/usr/local/spark-2.3-hadoop-2.7/spark-2.3.2-bin-hadoop2.7/conf/
scp spark-env.sh test04.yuyin:/usr/local/spark-2.3-hadoop-2.7/spark-2.3.2-bin-hadoop2.7/conf/
scp spark-env.sh test05.yuyin:/usr/local/spark-2.3-hadoop-2.7/spark-2.3.2-bin-hadoop2.7/conf/

Next, assign roles to the machines (as you see fit):

Machine	Role
test01.yuyin	master
test02.yuyin	worker
test04.yuyin	worker
test05.yuyin	worker

test01.yuyin takes the master role and all other machines take the worker role. Record this decision in Spark's configuration, in the slaves file under the conf directory:

cp slaves.template ./slaves
vi slaves

Edit it so that it looks like this:

#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements.  See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License.  You may obtain a copy of the License at
#
#    http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

# A Spark Worker will be started on each of the machines listed below.
test02.yuyin
test04.yuyin
test05.yuyin

Save, then copy this configuration to the other machines as well:

scp slaves test02.yuyin:/usr/local/spark-2.3-hadoop-2.7/spark-2.3.2-bin-hadoop2.7/conf/
scp slaves test04.yuyin:/usr/local/spark-2.3-hadoop-2.7/spark-2.3.2-bin-hadoop2.7/conf/
scp slaves test05.yuyin:/usr/local/spark-2.3-hadoop-2.7/spark-2.3.2-bin-hadoop2.7/conf/
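
If the worker list changes often, the slaves file can be generated instead of edited by hand. This is a sketch only: write_slaves is a hypothetical helper, it replaces the whole file (dropping the license comment, which Spark does not need), and it writes to a scratch file unless SLAVES_FILE is set.

```shell
# Write one worker hostname per line into a slaves-style file,
# replacing whatever the file contained before.
write_slaves() {
    file=$1
    shift
    : > "$file"
    for host in "$@"; do
        printf '%s\n' "$host" >> "$file"
    done
}

# Assumption: a scratch file is used unless SLAVES_FILE is set, for example
# to "$SPARK_HOME/conf/slaves".
SLAVES_FILE=${SLAVES_FILE:-$(mktemp)}
write_slaves "$SLAVES_FILE" test02.yuyin test04.yuyin test05.yuyin
cat "$SLAVES_FILE"
```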

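With that, the preparation is about done, so let's start things up.

Standalone cluster mode differs from client mode as follows:

The client-side SparkSubmit process exits once the application has been submitted to the cluster
The Master picks a Worker in the cluster, and that Worker spawns a DriverWrapper child process to start the driver program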
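
The two deploy modes differ only in one spark-submit option. The sketch below builds and prints the command for either mode; submit_cmd is a hypothetical helper, and the master URL and example jar path are the ones used elsewhere in this article.

```shell
MASTER_URL=spark://test01.yuyin:7077
EXAMPLES_JAR=/usr/local/spark-2.3-hadoop-2.7/spark-2.3.2-bin-hadoop2.7/examples/jars/spark-examples_2.11-2.3.2.jar

# Print the spark-submit command for a deploy mode ("client" or "cluster").
submit_cmd() {
    mode=$1
    case $mode in
        client|cluster) ;;
        *) echo "unknown deploy mode: $mode" >&2; return 1 ;;
    esac
    echo "spark-submit --class org.apache.spark.examples.SparkPi --master $MASTER_URL --deploy-mode $mode $EXAMPLES_JAR"
}

submit_cmd client    # driver runs in the local SparkSubmit process
submit_cmd cluster   # driver is started on one of the workers
```

In cluster mode the jar must be readable at the same path on the worker that runs the driver, which holds here because every machine has an identical installation.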

Let's look at start-master.sh under sbin:

#!/usr/bin/env bash

#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements.  See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License.  You may obtain a copy of the License at
#
#    http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

# Starts the master on the machine this script is executed on.

if [ -z "${SPARK_HOME}" ]; then
  export SPARK_HOME="$(cd "`dirname "$0"`"/..; pwd)"
fi

# NOTE: This exact class name is matched downstream by SparkSubmit.
# Any changes need to be reflected there.
CLASS="org.apache.spark.deploy.master.Master"

if [[ "$@" = *--help ]] || [[ "$@" = *-h ]]; then
  echo "Usage: ./sbin/start-master.sh [options]"
  pattern="Usage:"
  pattern+="\|Using Spark's default log4j profile:"
  pattern+="\|Registered signal handlers for"

  "${SPARK_HOME}"/bin/spark-class $CLASS --help 2>&1 | grep -v "$pattern" 1>&2
  exit 1
fi

ORIGINAL_ARGS="$@"

. "${SPARK_HOME}/sbin/spark-config.sh"

. "${SPARK_HOME}/bin/load-spark-env.sh"

if [ "$SPARK_MASTER_PORT" = "" ]; then
  SPARK_MASTER_PORT=7077
fi

if [ "$SPARK_MASTER_HOST" = "" ]; then
  case `uname` in
      (SunOS)
          SPARK_MASTER_HOST="`/usr/sbin/check-hostname | awk '{print $NF}'`"
          ;;
      (*)
          SPARK_MASTER_HOST="`hostname -f`"
          ;;
  esac
fi

if [ "$SPARK_MASTER_WEBUI_PORT" = "" ]; then
  SPARK_MASTER_WEBUI_PORT=8080
fi

"${SPARK_HOME}/sbin"/spark-daemon.sh start $CLASS 1 \
  --host $SPARK_MASTER_HOST --port $SPARK_MASTER_PORT --webui-port $SPARK_MASTER_WEBUI_PORT \
  $ORIGINAL_ARGS

Three pieces of information can be read from this:

Starts the master on the machine this script is executed on.
SPARK_MASTER_PORT=7077
SPARK_MASTER_WEBUI_PORT=8080

The machine that runs this script becomes the master node, so I start the script on test01.yuyin. There are two ports, which we will try to access shortly.
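
The defaulting logic in the script determines the addresses to use later. The following sketch mirrors that logic to print them; master_url and web_ui_url are hypothetical helpers (the real script passes the values to the Master class rather than printing them), and the host is set explicitly here as an assumption.

```shell
# Mirror start-master.sh: fall back to `hostname -f`, 7077 and 8080
# when the corresponding variables are unset or empty.
master_url() {
    host=${SPARK_MASTER_HOST:-$(hostname -f)}
    port=${SPARK_MASTER_PORT:-7077}
    echo "spark://$host:$port"
}

web_ui_url() {
    host=${SPARK_MASTER_HOST:-$(hostname -f)}
    port=${SPARK_MASTER_WEBUI_PORT:-8080}
    echo "http://$host:$port"
}

# Assumption: the master runs on test01.yuyin, as in this article.
SPARK_MASTER_HOST=test01.yuyin
master_url
web_ui_url
```

The first address is what goes into --master when submitting jobs, and the second is the one to open in a browser.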
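First, start the script.

When it finishes, look at the running processes (for example with jps).

Sure enough, there is a Master process (the jar process belongs to another program and can be ignored), so the master node is up. Now try the two ports.

This shows that port 8080 serves the web management UI and port 7077 is the master URL, which we will use when submitting jobs.

Next, start the worker nodes
by running the start-slaves.sh script.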

Then look at the processes on each machine.

Every machine has picked up its role and is now waiting for tasks to be submitted.
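
Checking the process list by eye can be replaced with a small filter over jps output. This is a sketch: has_process is a hypothetical helper that expects lines of the form "pid name", and the sample input below is made up for demonstration.

```shell
# Succeed when jps-style output on stdin lists the given process name.
has_process() {
    awk -v name="$1" '$2 == name { found = 1 } END { exit !found }'
}

# On a real cluster one would pipe live output into it, for example:
#   jps | has_process Master
#   ssh test02.yuyin jps | has_process Worker
# Made-up sample input, used here so the sketch runs anywhere:
printf '2101 Master\n2345 Jps\n' | has_process Master && echo "master is up"
```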
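Now visit the management UI again.

The difference from before is that the worker nodes are listed as well. There is no application yet, so the next step is to submit a job.
We'll use the official example directly. Note that --num-executors only takes effect on YARN; on a standalone cluster, executor resources are controlled with options such as --total-executor-cores and --executor-cores. The submit command is: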

spark-submit --class org.apache.spark.examples.SparkPi --master spark://test01.yuyin:7077 --num-executors  2 /usr/local/spark-2.3-hadoop-2.7/spark-2.3.2-bin-hadoop2.7/examples/jars/spark-examples_2.11-2.3.2.jar

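After submitting, look at the printed output.

The address shown in the output can be used to watch the Spark application while it runs, but once the application ends, that page goes away too.

The computed result appears in the output. Now look at the management UI again.

One application has run and is now finished, having taken only two seconds. You can replace the official example with your own application in the same way.
Overall, the initial goal has been reached. There are a number of details and other issues that were not covered here, but you can work them out by experimenting; after going through the process a few times it becomes familiar. That concludes this article.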
