Data Analysis: Installing a Big Data Environment (docker + docker-compose + spark + hadoop + hive)

Big Data Environment Installation

VirtualBox download: https://www.virtualbox.org/wiki/Downloads
Vagrant download: https://www.vagrantup.com/downloads.html
For the GUI steps, see: http://drupalchina.cn/book/export/html/6389
Ubuntu: https://ubuntu.com/#download

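1. Creating a virtual machine in VirtualBox (leave all other steps at their defaults)

Set the location where the VM should be stored.

Keep the memory size within the green range of the slider.

The disk is dynamically allocated, so there is no need to worry much about the space it takes up; choosing a generous size avoids running out of space later.

Select the downloaded Ubuntu image.

Set the network adapter to Bridged Adapter and verify connectivity with the ping command.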

2. Creating a virtual machine with Vagrant

2.1 Importing a box from the command line

Create the VM with the box commands given on the official site; see https://app.vagrantup.com/ubuntu/boxes/trusty64

vagrant init ubuntu/trusty64
vagrant up

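If the download is slow, a download manager such as Xunlei can be used instead: the output of vagrant up contains the download URL, which can be added to the download manager.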

 Downloading: https://vagrantcloud.com/ubuntu/boxes/trusty64/versions/20190514.0.0/providers/virtualbox.box
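The URL can also be pulled out of the output mechanically. The following is a minimal sketch, assuming the output of vagrant up has been saved to a file or is piped in; extract_box_url is a helper name made up for this example, not a vagrant command.

```shell
# Print the URL that follows "Downloading:" on each matching line of stdin.
# extract_box_url is a hypothetical helper name, not part of vagrant.
extract_box_url() {
    sed -n 's/^.*Downloading: *\(https\{0,1\}:[^ ]*\).*$/\1/p'
}
```

For example, after vagrant up 2>&1 | tee up.log, running extract_box_url < up.log prints the address to hand to the download manager.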

After the download completes, add the box (mind the path of the downloaded file), then check whether the import succeeded.

vagrant box add --name ubuntu/trusty64 D:\VirtualBoxVMS\trusty-server-cloudimg-amd64-vagrant-disk1.box
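One way to check the import is to look for the box name in the output of vagrant box list. A small sketch, assuming each line of that output starts with the box name followed by the provider and version in parentheses; box_is_imported is a hypothetical helper name.

```shell
# Succeed if a box named $1 appears in `vagrant box list`.
# box_is_imported is a hypothetical helper name for this sketch.
box_is_imported() {
    # The first whitespace-separated field of each line is the box name.
    vagrant box list | awk -v name="$1" '$1 == name { found = 1 } END { exit !found }'
}
```

Usage: box_is_imported ubuntu/trusty64 && echo "box is available".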

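Once the import has succeeded, run vagrant up again.

2.2 Connecting with Xshell once the VM is up

After a successful start, the corresponding VM is visible in VirtualBox.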

  config.vm.synced_folder '/host/path', '/guest/path', SharedFoldersEnableSymlinksCreate: false
==> default: Clearing any previously set network interfaces...
==> default: Preparing network interfaces based on configuration...
    default: Adapter 1: nat
==> default: Forwarding ports...
    default: 22 (guest) => 2222 (host) (adapter 1)
==> default: Booting VM...
==> default: Waiting for machine to boot. This may take a few minutes...
    default: SSH address: 127.0.0.1:2222
    default: SSH username: vagrant
    default: SSH auth method: private key
    default: Warning: Connection reset. Retrying...
    default: Warning: Connection aborted. Retrying...
    default: Warning: Remote connection disconnect. Retrying...
    default: Warning: Connection reset. Retrying...
    default: Warning: Connection aborted. Retrying...
    default:
    default: Vagrant insecure key detected. Vagrant will automatically replace
    default: this with a newly generated keypair for better security.
    default:
    default: Inserting generated public key within guest...
    default: Removing insecure key from the guest if it's present...
    default: Key inserted! Disconnecting and reconnecting using new SSH key...
==> default: Machine booted and ready!
==> default: Checking for guest additions in VM...
    default: The guest additions on this VM do not match the installed version of
    default: VirtualBox! In most cases this is fine, but in rare cases it can
    default: prevent things such as shared folders from working properly. If you see
    default: shared folder errors, please make sure the guest additions within the
    default: virtual machine match the version of VirtualBox you have installed on
    default: your host and reload your VM.
    default:
    default: Guest Additions Version: 4.3.40
    default: VirtualBox Version: 6.0
==> default: Mounting shared folders...
    default: /vagrant => D:/VirtualBoxVagrant

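This output shows that port 22 in the VM is forwarded to port 2222 on the host. When connecting with Xshell, connect to host 127.0.0.1 on port 2222 (not 22) and choose Public Key as the authentication method, using the private key that Vagrant generated.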
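The values to enter in Xshell can be read from vagrant ssh-config, which prints the SSH settings Vagrant itself uses. A minimal sketch that filters out the three relevant fields; show_ssh_target is a hypothetical helper name, and a key path containing spaces would be cut at the first space.

```shell
# Print the host, port, and private key path that Vagrant uses for SSH.
# show_ssh_target is a hypothetical helper; it only filters `vagrant ssh-config`.
show_ssh_target() {
    vagrant ssh-config |
        awk '$1 == "HostName" || $1 == "Port" || $1 == "IdentityFile" { print $1 ": " $2 }'
}
```

Run it in the folder that contains the Vagrantfile. The IdentityFile line is the key file to import into Xshell.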

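3. Configuring the VM in detail with the Vagrantfile

Settings changed in the VirtualBox GUI do not stick: whenever the VM is restarted, it comes up again with the configuration in the Vagrantfile, so that file is the place to change the settings.

3.1 Which box the VM is based on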

config.vm.box = "ubuntu/trusty64"

3.2 Whether to check the official repository for box updates on every start

config.vm.box_check_update = false

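3.3 Network configuration

guest is the port inside the VM (80) and host is the corresponding port on the host machine (8080). With host_ip set to 127.0.0.1, the VM's port 80 can be reached only through port 8080 on 127.0.0.1.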

config.vm.network "forwarded_port", guest: 80, host: 8080, host_ip: "127.0.0.1"

Assign a static IP to the VM

config.vm.network "private_network", ip: "192.168.33.10"

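3.4 Synced folders

config.vm.synced_folder "../data", "/vagrant_data"

Change the directories here and create the corresponding directory on the host; the /vm_test directory in the VM is created automatically.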

config.vm.synced_folder "./local_test", "/vm_test"
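A quick way to confirm the mapping is to create a file on the host side and list the folder from inside the VM. This is a sketch under the assumption that it is run from the folder containing the Vagrantfile and that the VM has been reloaded since the edit; check_synced_folder and the file name sync-check.txt are made up for this example.

```shell
# Create a marker file in the host folder, then list the synced folder in the VM.
# check_synced_folder and sync-check.txt are hypothetical names.
check_synced_folder() {
    mkdir -p ./local_test &&
        touch ./local_test/sync-check.txt &&
        vagrant ssh -c 'ls /vm_test'
}
```

If the folder is synced, sync-check.txt shows up in the listing.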

3.5 GUI display and memory

  # config.vm.provider "virtualbox" do |vb|
  #   # Display the VirtualBox GUI when booting the machine
  #   vb.gui = true
  #
  #   # Customize the amount of memory on the VM:
  #   vb.memory = "1024"
  # end

Change it as follows to give the VM 2 GB of memory and two CPU cores

  config.vm.provider "virtualbox" do |vb|
    # Display the VirtualBox GUI when booting the machine
    # vb.gui = true
    # Customize the amount of memory on the VM:
    vb.memory = "2048"
    vb.cpus = 2
  end

3.6 Running a shell script automatically

  # config.vm.provision "shell", inline: <<-SHELL
  #   apt-get update
  #   apt-get install -y apache2
  # SHELL

3.7 Restarting

vagrant reload
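Since a typo in the Vagrantfile makes the reload fail, it can help to validate the file first. A minimal sketch using the vagrant validate subcommand; reload_if_valid is a hypothetical wrapper name.

```shell
# Reload the VM only if the Vagrantfile passes validation.
# reload_if_valid is a hypothetical wrapper around two vagrant subcommands.
reload_if_valid() {
    vagrant validate && vagrant reload
}
```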

3. Installing Docker and Docker-compose

3.1 Installation command

sudo apt-get install docker.io

3.2 Checking the version

docker --version

3.3 Pulling the hello-world image

sudo docker pull hello-world

3.4 Running the hello-world image

sudo docker run hello-world

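3.5 Running docker without sudo

Add the vagrant user to the docker group with the command below; the machine must be restarted for the change to take effect.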

sudo gpasswd -a vagrant docker
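Whether the user was added can be checked with id, which reads the group database when given a user name. A small sketch; in_group is a hypothetical helper name. Note that it reports what is recorded in the group database, while an already running login session picks up the new group only after logging in again.

```shell
# Succeed if user $1 is a member of group $2, according to `id`.
# in_group is a hypothetical helper name.
in_group() {
    id -nG "$1" | tr ' ' '\n' | grep -qxF -- "$2"
}
```

Usage: in_group vagrant docker && echo "vagrant is in the docker group".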

3.6 Installing docker-compose

Link: https://docs.docker.com/compose/install/ (choose the Linux version)

sudo curl -L "https://github.com/docker/compose/releases/download/1.25.5/docker-compose-$(uname -s)-$(uname -m)" -o /usr/local/bin/docker-compose

3.7 Making docker-compose executable

sudo chmod +x /usr/local/bin/docker-compose
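The download URL above is assembled from the release version plus the output of uname. On a 64-bit Ubuntu VM, uname -s prints Linux and uname -m prints x86_64, so the file fetched is docker-compose-Linux-x86_64. The sketch below only prints the URL for a given version so it can be inspected before downloading; compose_url is a hypothetical helper name.

```shell
# Print the release URL for docker-compose version $1 on this machine.
# compose_url is a hypothetical helper; 1.25.5 is the version used in the text.
compose_url() {
    echo "https://github.com/docker/compose/releases/download/$1/docker-compose-$(uname -s)-$(uname -m)"
}
```

Usage: compose_url 1.25.5. After installing, docker-compose --version confirms that the binary runs.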

4. Installing ZSH

4.1 Installation

sudo apt-get install zsh

4.2 Changing Ubuntu's default interactive shell to zsh

chsh -s $(which zsh)
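chsh changes the login shell field of the user's entry in /etc/passwd, so the result can be checked by reading that field. A minimal sketch, assuming the user is a local account listed in /etc/passwd; login_shell is a hypothetical helper name.

```shell
# Print the login shell recorded for user $1 in /etc/passwd (the field chsh edits).
# login_shell is a hypothetical helper name.
login_shell() {
    awk -F: -v user="$1" '$1 == user { print $7 }' /etc/passwd
}
```

Usage: login_shell vagrant should print the path reported by which zsh.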

4.3 After restarting, choose option 2

(q)  Quit and do nothing.  The function will be run again next time.

(0)  Exit, creating the file ~/.zshrc containing just a comment.
     That will prevent this function being run again.

(1)  Continue to the main menu.

(2)  Populate your ~/.zshrc with the configuration recommended
     by the system administrator and exit (you will need to edit
     the file by hand, if so desired).

4.4 Managing the zsh configuration with Oh My Zsh

https://github.com/ohmyzsh/ohmyzsh

Install via curl

sh -c "$(curl -fsSL https://raw.githubusercontent.com/ohmyzsh/ohmyzsh/master/tools/install.sh)"

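5. DEMO

Use Docker and Docker Compose to manage a Spring Boot project together with a Redis server, so that the microservice that counts the site's users can be started, stopped, and operated as one unit.

Run mvn clean install in the Spring Boot project to build a jar.
Put the jar into a docker folder that also contains docker-compose.yml and the Dockerfile.
Copy that folder into local_test, the folder synced with the VM that Vagrant created.
In the VM, go to /vm_test and check that the files are there.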

Dockerfile

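# Base image to build on
FROM livingobjects/jre8
# Declare /tmp as a volume (mount point) in the container
VOLUME /tmp
# Add the jar to the image as app.jar
ADD spark-es-tag-1.0.jar app.jar
# Command executed when the container starts
ENTRYPOINT ["java","-Djava.security.egd=file:/dev/./urandom","-jar","/app.jar"]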

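docker-compose.yml is the file that describes and configures the container orchestration.
Run it with: dcup (the alias for docker-compose up provided by the Oh My Zsh docker-compose plugin, if that plugin is enabled)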
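The demo's docker-compose.yml is not reproduced in this post. As an illustration only, the sketch below writes a small compose file of the kind described: one service built from the Dockerfile in the same folder and one Redis service. The helper name write_demo_compose, the service names app and redis, and the port 8080 (Spring Boot's default) are assumptions, not the original file.

```shell
# Write a small docker-compose.yml into directory $1 (default: current directory).
# write_demo_compose and the service names "app" and "redis" are hypothetical.
write_demo_compose() {
    cat > "${1:-.}/docker-compose.yml" <<'EOF'
version: '2'
services:
  app:
    build: .              # build the image from the Dockerfile in this folder
    ports:
      - "8080:8080"       # assumes the Spring Boot app listens on its default port
    depends_on:
      - redis             # start the redis container before the app
  redis:
    image: redis          # official Redis image from Docker Hub
EOF
}
```

With a file like this, Compose puts both services on one default network, so the application would reach Redis under the host name redis.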

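6. Installing and verifying the big data environment

Download the big data environment scripts: https://pan.baidu.com/s/1oYAmlIltC6H5vpFJ2vPx5Q, extraction code: mjux

6.1 Big data environment

Create a docker-env folder on the Ubuntu VM.
In Xshell, press Ctrl+Alt+F to start a file transfer (this requires Xftp) and copy the big data environment scripts into the docker-env folder.
Make the .sh files executable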

chmod +x run.sh
chmod +x copy-jar.sh
chmod +x stop.sh

Run the run.sh script

./run.sh

In Firefox inside the VM, open localhost:50070 (namenode), localhost:8080 (spark), and localhost:8088 (hadoop resourcemanager). If the pages load, the installation succeeded.
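The same check can be done from a terminal in the VM. A minimal sketch that asks curl for the HTTP status of each page; check_web_uis is a hypothetical helper name, and the ports are the ones listed above.

```shell
# Report the HTTP status of each web UI listed above.
# check_web_uis is a hypothetical helper; the ports come from the text.
check_web_uis() {
    for port in 50070 8080 8088; do
        code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 5 "http://localhost:$port")
        echo "localhost:$port -> $code"
    done
}
```

A status of 200, or a redirect code such as 302, means the service is answering; 000 means curl could not connect.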

7. docker-compose.yml

version: '2'    # the syntax differs between versions
services:       # root node
  namenode:
    image: bde2020/hadoop-namenode:1.1.0-hadoop2.8-java8   # image to use
    container_name: namenode   # container name
    volumes:                   # mounts between the host and the container
      - ./data/namenode:/hadoop/dfs/name
    environment:               # environment variables, key=value
      - CLUSTER_NAME=test
    env_file:                  # file that holds the environment variables
      - ./hadoop-hive.env
    ports:                     # host:container port mappings
      - 50070:50070
      - 8020:8020
  resourcemanager:
    image: bde2020/hadoop-resourcemanager:1.1.0-hadoop2.8-java8
    container_name: resourcemanager
    environment:
      - CLUSTER_NAME=test
    env_file:
      - ./hadoop-hive.env
    ports:
      - 8088:8088
  historyserver:
    image: bde2020/hadoop-historyserver:1.1.0-hadoop2.8-java8
    container_name: historyserver
    environment:
      - CLUSTER_NAME=test
    env_file:
      - ./hadoop-hive.env
    ports:
      - 8188:8188
  datanode:
    image: bde2020/hadoop-datanode:1.1.0-hadoop2.8-java8
    depends_on:       # dependency between containers: datanode is started after namenode
      - namenode
    volumes:
      - ./data/datanode:/hadoop/dfs/data
    env_file:
      - ./hadoop-hive.env
    ports:            # reachable at localhost:50075 from a browser in Ubuntu
      - 50075:50075
  nodemanager:
    image: bde2020/hadoop-nodemanager:1.1.0-hadoop2.8-java8
    container_name: nodemanager
    hostname: nodemanager
    environment:
      - CLUSTER_NAME=test
    env_file:
      - ./hadoop-hive.env
    ports:
      - 8042:8042
  hive-server:
    image: bde2020/hive:2.1.0-postgresql-metastore
    container_name: hive-server
    env_file:
      - ./hadoop-hive.env
    environment:
      - "HIVE_CORE_CONF_javax_jdo_option_ConnectionURL=jdbc:postgresql://hive-metastore/metastore"
    ports:
      - "10000:10000"
  hive-metastore:
    image: bde2020/hive:2.1.0-postgresql-metastore
    container_name: hive-metastore
    env_file:
      - ./hadoop-hive.env
    command: /opt/hive/bin/hive --service metastore
    ports:
      - 9083:9083
  hive-metastore-postgresql:
    image: bde2020/hive-metastore-postgresql:2.1.0
    ports:
      - 5432:5432
    volumes:
      - ./data/postgresql/:/var/lib/postgresql/data
  spark-master:
    image: bde2020/spark-master:2.1.0-hadoop2.8-hive-java8
    container_name: spark-master
    hostname: spark-master     # with hostname set, the host name inside the container is no longer a hash
    volumes:
      - ./copy-jar.sh:/copy-jar.sh
    ports:
      - 8080:8080
      - 7077:7077
    env_file:
      - ./hadoop-hive.env
  spark-worker:
    image: bde2020/spark-worker:2.1.0-hadoop2.8-hive-java8
    depends_on:
      - spark-master
    environment:
      - SPARK_MASTER=spark://spark-master:7077
    ports:
      - "8081:8081"
    env_file:
      - ./hadoop-hive.env
  mysql-server:
    image: mysql:5.7
    container_name: mysql-server
    ports:
      - "3306:3306"
    environment:
      - MYSQL_ROOT_PASSWORD=zhangyang517
    volumes:
      - ./data/mysql:/var/lib/mysql
  elasticsearch:
    image: elasticsearch:6.5.3
    environment:
      - discovery.type=single-node
    ports:
      - "9200:9200"
      - "9300:9300"
    networks:
      - es_network
  kibana:
    image: kibana:6.5.3
    ports:
      - "5601:5601"
    networks:
      - es_network
networks:
  es_network:
    external: true
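Because es_network is declared with external: true, Compose does not create it and expects the network to exist before the services start. Whether run.sh already takes care of this is not shown here, so the following is only a sketch of creating the network by hand when it is missing; ensure_network is a hypothetical helper name.

```shell
# Create the Docker network $1 unless it already exists.
# ensure_network is a hypothetical helper around two docker subcommands.
ensure_network() {
    docker network inspect "$1" >/dev/null 2>&1 || docker network create "$1"
}
```

Usage: ensure_network es_network, run once before starting the services.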