A brief overview of Spark:
Original:
Apache Spark is a fast and general-purpose cluster computing system. It provides high-level APIs in Java, Scala, Python and R, and an optimized engine that supports general execution graphs. It also supports a rich set of higher-level tools including Spark SQL for SQL and structured data processing, MLlib for machine learning, GraphX for graph processing, and Spark Streaming.
Translation:
Apache Spark is a fast, general-purpose cluster computing system. It provides high-level APIs in Java, Scala, Python and R, along with an optimized engine that supports general execution graphs. It also supports a rich set of higher-level tools, including Spark SQL for SQL and structured data processing, MLlib for machine learning, GraphX for graph processing, and Spark Streaming for stream processing.
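To make those high-level APIs concrete, here is a minimal sketch (not from the original page; it assumes a Spark 2.4.x spark-shell session, where the `spark` session and `sc` context are pre-defined) touching both the Spark SQL/DataFrame API and the core RDD API:

val df = spark.range(1, 100)                   // Dataset via the Spark SQL API
println(df.filter(df("id") % 2 === 0).count()) // even ids between 1 and 99: 49

val rdd = sc.parallelize(1 to 100)             // the core RDD API
println(rdd.map(_ * 2).sum())                  // 10100.0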

Security:
Original:
Security in Spark is OFF by default. This could mean you are vulnerable to attack by default. Please see Spark Security before downloading and running Spark.
Translation:
Security in Spark is off by default, which may mean you are vulnerable to attack out of the box. Please see Spark Security (http://spark.apache.org/docs/latest/security.html) before downloading and running Spark.

Downloading and notes:
Original:
Get Spark from the downloads page of the project website. This documentation is for Spark version 2.4.5. Spark uses Hadoop’s client libraries for HDFS and YARN. Downloads are pre-packaged for a handful of popular Hadoop versions. Users can also download a “Hadoop free” binary and run Spark with any Hadoop version by augmenting Spark’s classpath. Scala and Java users can include Spark in their projects using its Maven coordinates and in the future Python users can also install Spark from PyPI.
If you’d like to build Spark from source, visit Building Spark.
Spark runs on both Windows and UNIX-like systems (e.g. Linux, Mac OS). It’s easy to run locally on one machine — all you need is to have java installed on your system PATH, or the JAVA_HOME environment variable pointing to a Java installation.
Spark runs on Java 8, Python 2.7+/3.4+ and R 3.1+. For the Scala API, Spark 2.4.5 uses Scala 2.12. You will need to use a compatible Scala version (2.12.x).
Note that support for Java 7, Python 2.6 and old Hadoop versions before 2.6.5 were removed as of Spark 2.2.0. Support for Scala 2.10 was removed as of 2.3.0. Support for Scala 2.11 is deprecated as of Spark 2.4.1 and will be removed in Spark 3.0.
Translation:
Get Spark from the downloads page of the project website (https://spark.apache.org/downloads.html). This documentation is for Spark version 2.4.5. Spark uses Hadoop's client libraries for HDFS and YARN, and the downloads are pre-packaged for a handful of popular Hadoop versions. You can also download a "Hadoop free" binary and run Spark with any Hadoop version by augmenting Spark's classpath (http://spark.apache.org/docs/latest/hadoop-provided.html). Scala and Java users can include Spark in their projects using its Maven coordinates, and in the future Python users will also be able to install Spark from PyPI.
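As a hedged illustration of those Maven coordinates, a minimal sbt build for Spark 2.4.5 might look like the sketch below (spark-core and spark-sql under the org.apache.spark group are the standard artifact names, but verify the exact coordinates and Scala-version suffix on Maven Central):

// build.sbt -- a minimal sketch; confirm artifact names and versions on Maven Central
scalaVersion := "2.12.10"
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "2.4.5",
  "org.apache.spark" %% "spark-sql"  % "2.4.5"  // Spark SQL / DataFrames
)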

If you'd like to build Spark from source, visit Building Spark (http://spark.apache.org/docs/latest/building-spark.html).

Spark runs on both Windows and UNIX-like systems (e.g. Linux, Mac OS). It is easy to run locally on one machine; all you need is to have java on your system PATH, or the JAVA_HOME environment variable pointing to a Java installation.

Spark runs on Java 8, Python 2.7+/3.4+ and R 3.1+. For the Scala API, Spark 2.4.5 uses Scala 2.12; you will need to use a compatible Scala version (2.12.x).

Note: support for Java 7, Python 2.6 and Hadoop versions before 2.6.5 was removed as of Spark 2.2.0. Support for Scala 2.10 was removed as of Spark 2.3.0, and support for Scala 2.11 is deprecated as of Spark 2.4.1 and will be removed in Spark 3.0.

Running the examples and shells:
Original:
Spark comes with several sample programs. Scala, Java, Python and R examples are in the examples/src/main directory. To run one of the Java or Scala sample programs, use bin/run-example [params] in the top-level Spark directory. (Behind the scenes, this invokes the more general spark-submit script for launching applications). For example,

./bin/run-example SparkPi 10

Translation:
Spark comes with several sample programs; Scala, Java, Python and R examples are in the examples/src/main directory. To run one of the Java or Scala sample programs, use bin/run-example [params] in the top-level Spark directory. (Behind the scenes, this invokes the more general spark-submit script (http://spark.apache.org/docs/latest/submitting-applications.html) for launching applications.) For example:

./bin/run-example SparkPi 10

Original:

You can also run Spark interactively through a modified version of the Scala shell. This is a great way to learn the framework.

./bin/spark-shell --master local[2]

Translation:

You can also run Spark interactively through a modified version of the Scala shell. This is a great way to learn the framework.

./bin/spark-shell --master local[2]
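For instance, once the shell is up you might try something like this (a sketch, not from the original page; spark-shell pre-defines sc, the SparkContext, and spark, the SparkSession):

scala> val data = sc.parallelize(1 to 1000)   // distribute a local range
scala> data.filter(_ % 3 == 0).count()        // multiples of 3: res0: Long = 333
scala> data.map(_ * 2).reduce(_ + _)          // res1: Int = 1001000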

Original:

The --master option specifies the master URL for a distributed cluster, or local to run locally with one thread, or local[N] to run locally with N threads. You should start by using local for testing. For a full list of options, run Spark shell with the --help option.

Translation:
The --master option specifies the master URL for a distributed cluster (http://spark.apache.org/docs/latest/submitting-applications.html#master-urls), or local to run locally with one thread, or local[N] to run locally with N threads. You should start by using local for testing. For a full list of options, run the Spark shell with the --help option.
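The same choice can also be made programmatically when building a session. A minimal sketch (SparkSession.builder is the standard Spark 2.x entry point; the appName and the spark://... address below are hypothetical, for illustration only):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("MasterUrlDemo")  // hypothetical application name
  .master("local[2]")        // mirrors --master: "local" = one thread,
                             // "local[N]" = N threads, or a cluster URL
                             // such as "spark://host:7077"
  .getOrCreate()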

Original:
Spark also provides a Python API. To run Spark interactively in a Python interpreter, use bin/pyspark:

./bin/pyspark --master local[2]

Translation:
Spark also provides a Python API. To run Spark interactively in a Python interpreter, use bin/pyspark:

./bin/pyspark --master local[2]

Original:
Example applications are also provided in Python. For example,

./bin/spark-submit examples/src/main/python/pi.py 10

Translation:
Example applications are also provided in Python. For example:

./bin/spark-submit examples/src/main/python/pi.py 10

Original:

Spark also provides an experimental R API since 1.4 (only DataFrames APIs included). To run Spark interactively in a R interpreter, use bin/sparkR:

./bin/sparkR --master local[2]

Translation:
Since version 1.4, Spark also provides an experimental R API (only the DataFrames APIs are included). To run Spark interactively in an R interpreter, use bin/sparkR:

./bin/sparkR --master local[2]

Original:

Example applications are also provided in R. For example,

./bin/spark-submit examples/src/main/r/dataframe.R

Translation:
Example applications are also provided in R. For example:

./bin/spark-submit examples/src/main/r/dataframe.R

Launching on a cluster:

Original:
The Spark cluster mode overview explains the key concepts in running on a cluster. Spark can run both by itself, or over several existing cluster managers. It currently provides several options for deployment:

Translation:
The Spark cluster mode overview (http://spark.apache.org/docs/latest/cluster-overview.html) explains the key concepts in running on a cluster. Spark can run by itself, or over several existing cluster managers. It currently provides several options for deployment:

Original:
Standalone Deploy Mode: simplest way to deploy Spark on a private cluster

Translation: Standalone Deploy Mode (http://spark.apache.org/docs/latest/spark-standalone.html): the simplest way to deploy Spark on a private cluster

Original:
Apache Mesos

Translation: Apache Mesos (http://spark.apache.org/docs/latest/running-on-mesos.html)

Original:
Hadoop YARN

Translation: Hadoop YARN (http://spark.apache.org/docs/latest/running-on-yarn.html)

Original:
Kubernetes
Translation: Kubernetes (http://spark.apache.org/docs/latest/running-on-kubernetes.html)

Where to go from here (contents):
Programming Guides:

Quick Start (http://spark.apache.org/docs/latest/quick-start.html): a quick introduction to the Spark API; start here!

RDD Programming Guide (http://spark.apache.org/docs/latest/rdd-programming-guide.html): overview of Spark basics - RDDs (core but old API), accumulators, and broadcast variables

Spark SQL, Datasets, and DataFrames (http://spark.apache.org/docs/latest/sql-programming-guide.html): processing structured data with relational queries (newer API than RDDs)

Structured Streaming (http://spark.apache.org/docs/latest/structured-streaming-programming-guide.html): processing structured data streams with relational queries (using Datasets and DataFrames, newer API than DStreams)

Spark Streaming (http://spark.apache.org/docs/latest/streaming-programming-guide.html): processing data streams using DStreams (old API)

MLlib (http://spark.apache.org/docs/latest/ml-guide.html): applying machine learning algorithms

GraphX (http://spark.apache.org/docs/latest/graphx-programming-guide.html): processing graphs

API Docs:

Spark Scala API (Scaladoc) (http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.package)
Spark Java API (Javadoc) (http://spark.apache.org/docs/latest/api/java/index.html)
Spark Python API (Sphinx) (http://spark.apache.org/docs/latest/api/python/index.html)
Spark R API (Roxygen2) (http://spark.apache.org/docs/latest/api/R/index.html)
Spark SQL, Built-in Functions (MkDocs) (http://spark.apache.org/docs/latest/api/sql/index.html)

Deployment Guides:

Cluster Overview (http://spark.apache.org/docs/latest/cluster-overview.html): overview of concepts and components when running on a cluster

Submitting Applications (http://spark.apache.org/docs/latest/submitting-applications.html): packaging and deploying applications

Deployment modes:

Amazon EC2 (https://github.com/amplab/spark-ec2): scripts that let you launch a cluster on EC2 in about 5 minutes

Standalone Deploy Mode (http://spark.apache.org/docs/latest/spark-standalone.html): launch a standalone cluster quickly without a third-party cluster manager

Mesos (http://spark.apache.org/docs/latest/running-on-mesos.html): deploy a private cluster using Apache Mesos

YARN (http://spark.apache.org/docs/latest/running-on-yarn.html): deploy Spark on top of Hadoop NextGen (YARN)

Kubernetes (http://spark.apache.org/docs/latest/running-on-kubernetes.html): deploy Spark on top of Kubernetes

Other Documents:

Configuration (http://spark.apache.org/docs/latest/configuration.html): customize Spark via its configuration system

Monitoring (http://spark.apache.org/docs/latest/monitoring.html): track the behavior of your applications

Tuning Guide (http://spark.apache.org/docs/latest/tuning.html): best practices to optimize performance and memory use

Job Scheduling (http://spark.apache.org/docs/latest/job-scheduling.html): scheduling resources across and within Spark applications

Security (http://spark.apache.org/docs/latest/security.html): Spark security support

Hardware Provisioning (http://spark.apache.org/docs/latest/hardware-provisioning.html): recommendations for cluster hardware

Integration with other storage systems:

Cloud Infrastructures (http://spark.apache.org/docs/latest/cloud-integration.html)

OpenStack Swift (http://spark.apache.org/docs/latest/storage-openstack-swift.html)

Building Spark (http://spark.apache.org/docs/latest/building-spark.html): build Spark using the Maven system

Contributing to Spark (https://spark.apache.org/contributing.html)

Third Party Projects (https://spark.apache.org/third-party-projects.html): related third party Spark projects

External Resources:

Spark Homepage (https://spark.apache.org)

Spark Community resources (https://spark.apache.org/community.html), including local meetups

StackOverflow tag apache-spark (https://stackoverflow.com/questions/tagged/apache-spark)

Mailing Lists (https://spark.apache.org/mailing-lists.html): ask questions about Spark here

AMP Camps (http://ampcamp.berkeley.edu): a series of training camps at UC Berkeley that featured talks and exercises about Spark, Spark Streaming, Mesos, and more. Videos, slides and exercises are available online for free.

Code Examples (https://spark.apache.org/examples.html): more are also available in the examples subfolder of Spark (Scala, Java, Python, R)
