A brief overview of Spark:
Apache Spark is a fast and general-purpose cluster computing system. It provides high-level APIs in Java, Scala, Python and R, and an optimized engine that supports general execution graphs. It also supports a rich set of higher-level tools, including Spark SQL for SQL and structured data processing, MLlib for machine learning, GraphX for graph processing, and Spark Streaming for stream processing.

Security:
Security in Spark is OFF by default. This could mean you are vulnerable to attack by default. Please see Spark Security (http://spark.apache.org/docs/latest/security.html) before downloading and running Spark.

Downloading and notes:
Get Spark from the downloads page of the project website (https://spark.apache.org/downloads.html). This documentation is for Spark version 2.4.5. Spark uses Hadoop's client libraries for HDFS and YARN. Downloads are pre-packaged for a handful of popular Hadoop versions. Users can also download a "Hadoop free" binary and run Spark with any Hadoop version by augmenting Spark's classpath (http://spark.apache.org/docs/latest/hadoop-provided.html). Scala and Java users can include Spark in their projects using its Maven coordinates, and in the future Python users will also be able to install Spark from PyPI.
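A "Hadoop free" build has to be told where to find Hadoop's classes at runtime. A minimal sketch of the classpath augmentation in conf/spark-env.sh, assuming a working `hadoop` command is on the PATH:

```shell
# conf/spark-env.sh (sketch): point a "Hadoop free" Spark build
# at an existing Hadoop installation by exporting its classpath.
export SPARK_DIST_CLASSPATH=$(hadoop classpath)
```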
If you'd like to build Spark from source, visit Building Spark (http://spark.apache.org/docs/latest/building-spark.html).
Spark runs on both Windows and UNIX-like systems (e.g. Linux, Mac OS). It’s easy to run locally on one machine — all you need is to have java installed on your system PATH, or the JAVA_HOME environment variable pointing to a Java installation.
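To check which of those two conditions holds, one might run something like the following (the JDK path below is purely illustrative):

```shell
# Either java is already on the PATH...
java -version
# ...or point JAVA_HOME at an installed JDK (path is illustrative):
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
export PATH="$JAVA_HOME/bin:$PATH"
```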
Spark runs on Java 8, Python 2.7+/3.4+ and R 3.1+. For the Scala API, Spark 2.4.5 uses Scala 2.12. You will need to use a compatible Scala version (2.12.x).
Note that support for Java 7, Python 2.6 and old Hadoop versions before 2.6.5 was removed as of Spark 2.2.0. Support for Scala 2.10 was removed as of 2.3.0. Support for Scala 2.11 is deprecated as of Spark 2.4.1 and will be removed in Spark 3.0.

Running the examples and shells:
Spark comes with several sample programs. Scala, Java, Python and R examples are in the examples/src/main directory. To run one of the Java or Scala sample programs, use bin/run-example <class> [params] in the top-level Spark directory. (Behind the scenes, this invokes the more general spark-submit script (http://spark.apache.org/docs/latest/submitting-applications.html) for launching applications.) For example,

./bin/run-example SparkPi 10

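As a sketch of what run-example does behind the scenes, the same SparkPi run could be expressed directly with spark-submit; the jar name below assumes the Spark 2.4.5 binary distribution and is illustrative:

```shell
# Roughly equivalent to: ./bin/run-example SparkPi 10
./bin/spark-submit --class org.apache.spark.examples.SparkPi \
  examples/jars/spark-examples_2.12-2.4.5.jar 10
```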

You can also run Spark interactively through a modified version of the Scala shell. This is a great way to learn the framework.

./bin/spark-shell --master local[2]


The --master option specifies the master URL for a distributed cluster (http://spark.apache.org/docs/latest/submitting-applications.html#master-urls), or local to run locally with one thread, or local[N] to run locally with N threads. You should start by using local for testing. For a full list of options, run the Spark shell with the --help option.

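For illustration, some common --master values (the host and port in the cluster URL are placeholders):

```shell
./bin/spark-shell --master local              # one thread
./bin/spark-shell --master local[4]           # four threads
./bin/spark-shell --master local[*]           # one thread per core
./bin/spark-shell --master spark://host:7077  # standalone cluster (placeholder URL)
./bin/spark-shell --master yarn               # Hadoop YARN cluster
```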

Spark also provides a Python API. To run Spark interactively in a Python interpreter, use bin/pyspark:

./bin/pyspark --master local[2]

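If you prefer a richer REPL, pyspark honors the PYSPARK_DRIVER_PYTHON environment variable; a sketch, assuming IPython is installed:

```shell
# Run the PySpark shell under IPython instead of the plain interpreter
PYSPARK_DRIVER_PYTHON=ipython ./bin/pyspark --master local[2]
```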

Example applications are also provided in Python. For example,

./bin/spark-submit examples/src/main/python/pi.py 10


Spark also provides an experimental R API since 1.4 (only DataFrames APIs included). To run Spark interactively in an R interpreter, use bin/sparkR:

./bin/sparkR --master local[2]


Example applications are also provided in R. For example,

./bin/spark-submit examples/src/main/r/dataframe.R


Launching on a cluster:

The Spark cluster mode overview (http://spark.apache.org/docs/latest/cluster-overview.html) explains the key concepts in running on a cluster. Spark can run both by itself, or over several existing cluster managers. It currently provides several options for deployment:


Standalone Deploy Mode (http://spark.apache.org/docs/latest/spark-standalone.html): the simplest way to deploy Spark on a private cluster

Apache Mesos (http://spark.apache.org/docs/latest/running-on-mesos.html)

Hadoop YARN (http://spark.apache.org/docs/latest/running-on-yarn.html)

Kubernetes (http://spark.apache.org/docs/latest/running-on-kubernetes.html)
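To make the contrast with local mode concrete, the earlier pi.py example could be submitted to a YARN cluster roughly like this (the HADOOP_CONF_DIR path is illustrative):

```shell
# Submit the Python Pi example to YARN instead of running locally
export HADOOP_CONF_DIR=/etc/hadoop/conf   # illustrative path
./bin/spark-submit --master yarn --deploy-mode cluster \
  examples/src/main/python/pi.py 10
```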

Where to go from here (contents):
Programming Guides:

Quick Start (http://spark.apache.org/docs/latest/quick-start.html): a quick introduction to the Spark API; start here!

RDD Programming Guide (http://spark.apache.org/docs/latest/rdd-programming-guide.html): overview of Spark basics - RDDs (core but old API), accumulators, and broadcast variables

Spark SQL, Datasets, and DataFrames (http://spark.apache.org/docs/latest/sql-programming-guide.html): processing structured data with relational queries (newer API than RDDs)

Structured Streaming (http://spark.apache.org/docs/latest/structured-streaming-programming-guide.html): processing structured data streams with relational queries (using Datasets and DataFrames, newer API than DStreams)

Spark Streaming (http://spark.apache.org/docs/latest/streaming-programming-guide.html): processing data streams using DStreams (old API)

MLlib (http://spark.apache.org/docs/latest/ml-guide.html): applying machine learning algorithms

GraphX (http://spark.apache.org/docs/latest/graphx-programming-guide.html): processing graphs

API Docs:

Spark Scala API (Scaladoc) (http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.package)
Spark Java API (Javadoc) (http://spark.apache.org/docs/latest/api/java/index.html)
Spark Python API (Sphinx) (http://spark.apache.org/docs/latest/api/python/index.html)
Spark R API (Roxygen2) (http://spark.apache.org/docs/latest/api/R/index.html)
Spark SQL, Built-in Functions (MkDocs) (http://spark.apache.org/docs/latest/api/sql/index.html)

Deployment Guides:

Cluster Overview (http://spark.apache.org/docs/latest/cluster-overview.html): overview of concepts and components when running on a cluster

Submitting Applications (http://spark.apache.org/docs/latest/submitting-applications.html): packaging and deploying applications

Deployment modes:

Amazon EC2 (https://github.com/amplab/spark-ec2): scripts that let you launch a cluster on EC2 in about 5 minutes

Standalone Deploy Mode (http://spark.apache.org/docs/latest/spark-standalone.html): launch a standalone cluster quickly without a third-party cluster manager

Mesos (http://spark.apache.org/docs/latest/running-on-mesos.html): deploy a private cluster using Apache Mesos

YARN (http://spark.apache.org/docs/latest/running-on-yarn.html): deploy Spark on top of Hadoop NextGen (YARN)

Kubernetes (http://spark.apache.org/docs/latest/running-on-kubernetes.html): deploy Spark on top of Kubernetes

Other Documents:

Configuration (http://spark.apache.org/docs/latest/configuration.html): customize Spark via its configuration system

Monitoring (http://spark.apache.org/docs/latest/monitoring.html): track the behavior of your applications

Tuning Guide (http://spark.apache.org/docs/latest/tuning.html): best practices to optimize performance and memory use

Job Scheduling (http://spark.apache.org/docs/latest/job-scheduling.html): scheduling resources across and within Spark applications

Security (http://spark.apache.org/docs/latest/security.html): Spark security support

Hardware Provisioning (http://spark.apache.org/docs/latest/hardware-provisioning.html): recommendations for cluster hardware

Integration with other storage systems:

Cloud Infrastructures (http://spark.apache.org/docs/latest/cloud-integration.html)

OpenStack Swift (http://spark.apache.org/docs/latest/storage-openstack-swift.html)

Building Spark (http://spark.apache.org/docs/latest/building-spark.html): build Spark using the Maven system

Contributing to Spark (https://spark.apache.org/contributing.html)

Third Party Projects (https://spark.apache.org/third-party-projects.html): related third party Spark projects

External Resources:

Spark Homepage (https://spark.apache.org)

Spark Community (https://spark.apache.org/community.html) resources, including local meetups

StackOverflow tag apache-spark (https://stackoverflow.com/questions/tagged/apache-spark)

Mailing Lists (https://spark.apache.org/mailing-lists.html): ask questions about Spark here

AMP Camps (http://ampcamp.berkeley.edu): a series of training camps at UC Berkeley that featured talks and exercises about Spark, Spark Streaming, Mesos, and more. Videos, slides and exercises are available online for free.

Code Examples (https://spark.apache.org/examples.html): more are also available in the examples subfolder of Spark (Scala, Java, Python, R)
