
  • Install Vagrant, the virtual machine management tool:
  • Install VirtualBox, the provider Vagrant uses by default:
  • Use Windows PowerShell:
  • Log in to CentOS and install the JDK:
  • Install Spark:
  • Run Spark:
  • Test Spark:



Install VirtualBox, the provider Vagrant uses by default:




Use Windows PowerShell:

PS C:\Users\geng\env> vagrant init centos/7
PS C:\Users\geng\env> vagrant up
PS C:\Users\geng> cd .\env\
PS C:\Users\geng\env>
PS C:\Users\geng\env>
PS C:\Users\geng\env> ls
PS C:\Users\geng\env> vagrant centos/7
Usage: vagrant [options] <command> [<args>]

    -v, --version                    Print the version and exit.
    -h, --help                       Print this help.

Common commands:
     box             manages boxes: installation, removal, etc.
     cloud           manages everything related to Vagrant Cloud
     destroy         stops and deletes all traces of the vagrant machine
     global-status   outputs status Vagrant environments for this user
     halt            stops the vagrant machine
     help            shows the help for a subcommand
     init            initializes a new Vagrant environment by creating a Vagrantfile
     login
     package         packages a running vagrant environment into a box
     plugin          manages plugins: install, uninstall, update, etc.
     port            displays information about guest port mappings
     powershell      connects to machine via powershell remoting
     provision       provisions the vagrant machine
     push            deploys code in this environment to a configured destination
     rdp             connects to machine via RDP
     reload          restarts vagrant machine, loads new Vagrantfile configuration
     resume          resume a suspended vagrant machine
     snapshot        manages snapshots: saving, restoring, etc.
     ssh             connects to machine via SSH
     ssh-config      outputs OpenSSH valid configuration to connect to the machine
     status          outputs status of the vagrant machine
     suspend         suspends the machine
     up              starts and provisions the vagrant environment
     upload          upload to machine via communicator
     validate        validates the Vagrantfile
     version         prints current and latest Vagrant version
     winrm           executes commands on a machine via WinRM
     winrm-config    outputs WinRM configuration to connect to the machine

For help on any individual command run `vagrant COMMAND -h`

Additional subcommands are available, but are either more advanced
or not commonly used. To see all subcommands, run the command
`vagrant list-commands`.
PS C:\Users\geng\env> vagrant init centos/7
A `Vagrantfile` has been placed in this directory. You are now
ready to `vagrant up` your first virtual environment! Please read
the comments in the Vagrantfile as well as documentation on
`vagrantup.com` for more information on using Vagrant.
PS C:\Users\geng\env> vagrant up
Bringing machine 'default' up with 'virtualbox' provider...
==> default: Box 'centos/7' could not be found. Attempting to find and install...
    default: Box Provider: virtualbox
    default: Box Version: >= 0
==> default: Loading metadata for box 'centos/7'
    default: URL: https://vagrantcloud.com/centos/7
==> default: Adding box 'centos/7' (v1901.01) for provider: virtualbox
    default: Downloading: https://vagrantcloud.com/centos/boxes/7/versions/1901.01/providers/virtualbox.box
    default: Download redirected to host: cloud.centos.org
    default:
==> default: Successfully added box 'centos/7' (v1901.01) for 'virtualbox'!
==> default: Importing base box 'centos/7'...
==> default: Matching MAC address for NAT networking...
==> default: Checking if box 'centos/7' version '1901.01' is up to date...
==> default: Setting the name of the VM: env_default_1551574680676_41983
==> default: Clearing any previously set network interfaces...
==> default: Preparing network interfaces based on configuration...
    default: Adapter 1: nat
==> default: Forwarding ports...
    default: 22 (guest) => 2222 (host) (adapter 1)
==> default: Booting VM...
==> default: Waiting for machine to boot. This may take a few minutes...
    default: SSH address:
    default: SSH username: vagrant
    default: SSH auth method: private key
    default:
    default: Vagrant insecure key detected. Vagrant will automatically replace
    default: this with a newly generated keypair for better security.
    default:
    default: Inserting generated public key within guest...
    default: Removing insecure key from the guest if it's present...
    default: Key inserted! Disconnecting and reconnecting using new SSH key...
==> default: Machine booted and ready!
==> default: Checking for guest additions in VM...
    default: No guest additions were detected on the base box for this VM! Guest
    default: additions are required for forwarded ports, shared folders, host only
    default: networking, and more. If SSH fails on this machine, please install
    default: the guest additions and repackage the box to continue.
    default:
    default: This is not an error message; everything may continue to work properly,
    default: in which case you may ignore this message.
==> default: Rsyncing folder: /cygdrive/c/Users/geng/env/ => /vagrant
PS C:\Users\geng\env>
PS C:\Users\geng\env>
PS C:\Users\geng\env> vagrant ssh
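
The `vagrant up` log above reports that guest port 22 is forwarded to host port 2222. As a small illustration, the sketch below pulls such mappings out of the log text with a regular expression. The function name `forwarded_ports` is made up for this example, and the pattern assumes the log wording shown in this session; other Vagrant versions may word it differently.

```python
import re

# Matches lines such as "22 (guest) => 2222 (host) (adapter 1)".
FORWARD_RE = re.compile(r"(\d+) \(guest\) => (\d+) \(host\)")

def forwarded_ports(log_text):
    """Return a {guest_port: host_port} dict parsed from `vagrant up` output."""
    return {int(guest): int(host) for guest, host in FORWARD_RE.findall(log_text)}

# Excerpt copied from the session above.
log = """==> default: Forwarding ports...
    default: 22 (guest) => 2222 (host) (adapter 1)"""

print(forwarded_ports(log))  # {22: 2222}
```

The host port matters when connecting with an SSH client other than `vagrant ssh`.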

Log in to CentOS and install the JDK:


PS C:\Users\geng\env> vagrant ssh


[vagrant@localhost ~]$ sudo yum update


[vagrant@localhost ~]$ sudo yum install java-1.8.0-openjdk
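
After the package installs, it can help to confirm that the Java on the path really is version 8, since Spark 2.4.0 is being used with Java 8 here. The sketch below parses the text that `java -version` prints. The helper names and the sample string are assumptions made for illustration and are not taken from the original session.

```python
import re

def parse_java_version(version_output):
    """Return (major, minor) parsed from `java -version` text, or None if absent."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        return None
    major = int(match.group(1))
    minor = int(match.group(2)) if match.group(2) is not None else 0
    return major, minor

def is_java_8(version_output):
    # Java 8 reports itself with the legacy "1.8" numbering scheme.
    return parse_java_version(version_output) == (1, 8)

# Hypothetical sample output; the real build string on the guest may differ.
sample = 'openjdk version "1.8.0_201"\nOpenJDK Runtime Environment (build 1.8.0_201-b09)'
print(is_java_8(sample))  # True
```

Note that `java -version` writes to standard error, so a script that captures it through `subprocess` needs to read stderr rather than stdout.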



Install Spark:

[vagrant@localhost ~]$ curl -O https://archive.apache.org/dist/spark/spark-2.4.0/spark-2.4.0-bin-hadoop2.7.tgz
[vagrant@localhost ~]$ tar zxvf spark-2.4.0-bin-hadoop2.7.tgz
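
Before extracting, it can be worth confirming that the downloaded file is really a gzip archive and not an HTML page saved under a `.tgz` name, which is what a mirror-selection URL may return. A minimal sketch, assuming a check of the two gzip magic bytes is enough for this purpose; the helper name `looks_like_gzip` and the temporary demo files are invented for the example.

```python
import gzip
import os
import tempfile

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream

def looks_like_gzip(path):
    """Return True if the file at `path` starts with the gzip magic bytes."""
    with open(path, "rb") as f:
        return f.read(2) == GZIP_MAGIC

# Self-contained demo: one real gzip file and one HTML page with a .tgz name.
tmp_dir = tempfile.mkdtemp()
real_archive = os.path.join(tmp_dir, "real.tgz")
with gzip.open(real_archive, "wb") as f:
    f.write(b"payload")
fake_archive = os.path.join(tmp_dir, "fake.tgz")
with open(fake_archive, "wb") as f:
    f.write(b"<!DOCTYPE html><html>mirror list</html>")

print(looks_like_gzip(real_archive))  # True
print(looks_like_gzip(fake_archive))  # False
```

On the guest, the same function could be pointed at `spark-2.4.0-bin-hadoop2.7.tgz` before running `tar`.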


Run Spark:

[vagrant@localhost opt]$ cd spark-2.4.0-bin-hadoop2.7/
[vagrant@localhost spark-2.4.0-bin-hadoop2.7]$ bin/pyspark
Python 2.7.5 (default, Oct 30 2018, 23:45:53)
[GCC 4.8.5 20150623 (Red Hat 4.8.5-36)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
2019-03-03 09:21:13 WARN  Utils:66 - Your hostname, localhost.localdomain resolves to a loopback address:; using instead (on interface eth0)
2019-03-03 09:21:13 WARN  Utils:66 - Set SPARK_LOCAL_IP if you need to bind to another address
2019-03-03 09:21:14 WARN  NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 2.4.0
      /_/

Using Python version 2.7.5 (default, Oct 30 2018 23:45:53)
SparkSession available as 'spark'.


Test Spark:

>>> rdd = sc.parallelize([1,2,3,4,5])
>>> rdd.map(lambda x: x+1).reduce(lambda x,y : x+y)
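
To know what the test should return, the same computation can be reproduced in plain Python without Spark. This sketch uses the built-in `map` and `functools.reduce` as stand-ins for the RDD methods; it mirrors the logic only and says nothing about how Spark distributes the work.

```python
from functools import reduce

data = [1, 2, 3, 4, 5]

# Same two steps as the RDD pipeline: add 1 to each element, then sum.
incremented = map(lambda x: x + 1, data)          # 2, 3, 4, 5, 6
total = reduce(lambda x, y: x + y, incremented)

print(total)  # 20
```

If the PySpark session prints the same value, the local Spark installation is able to run a basic map/reduce job.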

