Reposted from: http://atlas.apache.org/InstallationSteps.html

Building & Installing Apache Atlas

Building Atlas

git clone https://git-wip-us.apache.org/repos/asf/incubator-atlas.git atlas
cd atlas
export MAVEN_OPTS="-Xmx1536m -XX:MaxPermSize=512m" && mvn clean install

Once the build successfully completes, artifacts can be packaged for deployment.

mvn clean package -Pdist

NOTE:
  1. Use option '-DskipTests' to skip running unit and integration tests
  2. Use option '-P perf' to instrument Atlas to collect performance metrics
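For example, to package the distribution while skipping tests, combine the flags above:

  mvn clean package -Pdist -DskipTests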

To build a distribution that configures Atlas for external HBase and Solr, build with the external-hbase-solr profile.

mvn clean package -Pdist,external-hbase-solr

Note that when the external-hbase-solr profile is used, the following steps need to be completed to make Atlas functional (a configuration sketch follows the list).

  • Configure atlas.graph.storage.hostname (see "Graph persistence engine - HBase" in the Configuration section).
  • Configure atlas.graph.index.search.solr.zookeeper-url (see "Graph Search Index - Solr" in the Configuration section).
  • Set HBASE_CONF_DIR to point to a valid HBase config directory (see "Graph persistence engine - HBase" in the Configuration section).
  • Create the SOLR indices (see "Graph Search Index - Solr" in the Configuration section).
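As an illustrative sketch, the first three items correspond to settings like the following; every hostname, port, and path shown is a placeholder for your environment, not a shipped default:

  # in conf/atlas-application.properties
  # Graph persistence engine - HBase: ZooKeeper quorum used by the HBase client (placeholder hosts)
  atlas.graph.storage.hostname=zk-host1,zk-host2,zk-host3
  # Graph Search Index - Solr: ZooKeeper ensemble of the SolrCloud cluster (placeholder hosts)
  atlas.graph.index.search.solr.zookeeper-url=zk-host1:2181,zk-host2:2181,zk-host3:2181

  # in the environment (e.g. conf/atlas-env.sh); placeholder path
  export HBASE_CONF_DIR=/etc/hbase/conf

The Solr collections themselves are created as described under "Configuring SOLR as the Indexing Backend for the Graph Repository" below.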

To build a distribution that packages HBase and Solr, build with the embedded-hbase-solr profile.

mvn clean package -Pdist,embedded-hbase-solr

Using the embedded-hbase-solr profile will configure Atlas so that an HBase instance and a Solr instance will be started and stopped along with the Atlas server by default.

Atlas also supports building a distribution that can use BerkeleyDB and Elasticsearch as the graph and index backends. To build a distribution that is configured for these backends, build with the berkeley-elasticsearch profile.

mvn clean package -Pdist,berkeley-elasticsearch

An additional step is required for the binary built using this profile to be used along with the Atlas distribution. Due to licensing requirements, Atlas does not bundle the BerkeleyDB Java Edition in the tarball.

You can download the Berkeley DB jar file from the URL: http://download.oracle.com/otn/berkeley-db/je-5.0.73.zip and copy the je-5.0.73.jar to the ${atlas_home}/libext directory.
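A sketch of that manual step follows; the directory layout inside the zip (je-5.0.73/lib/) is an assumption based on the usual Berkeley DB JE distribution, and the Oracle download may require accepting a license in a browser first:

  curl -O http://download.oracle.com/otn/berkeley-db/je-5.0.73.zip
  unzip je-5.0.73.zip
  mkdir -p ${atlas_home}/libext
  cp je-5.0.73/lib/je-5.0.73.jar ${atlas_home}/libext/   # jar location inside the zip is assumed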

The resulting tarball can be found at atlas/distro/target/apache-atlas-${project.version}-bin.tar.gz

The tarball is structured as follows:

|- bin
   |- atlas_start.py
   |- atlas_stop.py
   |- atlas_config.py
   |- quick_start.py
   |- cputil.py
|- conf
   |- atlas-application.properties
   |- atlas-env.sh
   |- hbase
      |- hbase-site.xml.template
   |- log4j.xml
   |- solr
      |- currency.xml
      |- lang
         |- stopwords_en.txt
      |- protowords.txt
      |- schema.xml
      |- solrconfig.xml
      |- stopwords.txt
      |- synonyms.txt
|- docs
|- hbase
   |- bin
   |- conf
   ...
|- server
   |- webapp
      |- atlas.war
|- solr
   |- bin
   ...
|- README
|- NOTICE
|- LICENSE
|- DISCLAIMER.txt
|- CHANGES.txt

Note that if the embedded-hbase-solr profile is specified for the build, then HBase and Solr are included in the distribution.

In this case, a standalone instance of HBase can be started as the default storage backend for the graph repository. During Atlas installation the conf/hbase/hbase-site.xml.template gets expanded and moved to hbase/conf/hbase-site.xml for the initial standalone HBase configuration. To configure ATLAS graph persistence for a different HBase instance, please see "Graph persistence engine - HBase" in the Configuration section.

Also, a standalone instance of Solr can be started as the default search indexing backend. To configure ATLAS search indexing for a different Solr instance please see "Graph Search Index - Solr" in the Configuration section.

Installing & Running Atlas

Installing Atlas
tar -xzvf apache-atlas-${project.version}-bin.tar.gz
cd atlas-${project.version}

Configuring Atlas

By default, the config directory used by Atlas is {package dir}/conf. To override this, set the environment variable ATLAS_CONF to the path of the conf dir.

atlas-env.sh has been added to the Atlas conf. This file can be used to set various environment variables that you need for your services. In addition, you can set any other environment variables you might need. This file will be sourced by Atlas scripts before any commands are executed. The following environment variables are available to set.

# The java implementation to use. If JAVA_HOME is not found we expect java and jar to be in path
#export JAVA_HOME=

# any additional java opts you want to set. This will apply to both client and server operations
#export ATLAS_OPTS=

# any additional java opts that you want to set for client only
#export ATLAS_CLIENT_OPTS=

# java heap size we want to set for the client. Default is 1024MB
#export ATLAS_CLIENT_HEAP=

# any additional opts you want to set for atlas service.
#export ATLAS_SERVER_OPTS=

# java heap size we want to set for the atlas server. Default is 1024MB
#export ATLAS_SERVER_HEAP=

# What is considered as atlas home dir. Default is the base location of the installed software
#export ATLAS_HOME_DIR=

# Where log files are stored. Default is logs directory under the base install location
#export ATLAS_LOG_DIR=

# Where pid files are stored. Default is logs directory under the base install location
#export ATLAS_PID_DIR=

# where the atlas titan db data is stored. Default is logs/data directory under the base install location
#export ATLAS_DATA_DIR=

# Where do you want to expand the war file. By default it is in /server/webapp dir under the base install dir.
#export ATLAS_EXPANDED_WEBAPP_DIR=

Settings to support large number of metadata objects

If you plan to store several tens of thousands of metadata objects, it is recommended that you use values tuned for better GC performance of the JVM.

The following values are common server side options:

export ATLAS_SERVER_OPTS="-server -XX:SoftRefLRUPolicyMSPerMB=0 -XX:+CMSClassUnloadingEnabled -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:+PrintTenuringDistribution -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=dumps/atlas_server.hprof -Xloggc:logs/gc-worker.log -verbose:gc -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=1m -XX:+PrintGCDetails -XX:+PrintHeapAtGC -XX:+PrintGCTimeStamps"

The -XX:SoftRefLRUPolicyMSPerMB option was found to be particularly helpful to regulate GC performance for query heavy workloads with many concurrent users.

The following values are recommended for JDK 7:

export ATLAS_SERVER_HEAP="-Xms15360m -Xmx15360m -XX:MaxNewSize=3072m -XX:PermSize=100M -XX:MaxPermSize=512m"

The following values are recommended for JDK 8:

export ATLAS_SERVER_HEAP="-Xms15360m -Xmx15360m -XX:MaxNewSize=5120m -XX:MetaspaceSize=100M -XX:MaxMetaspaceSize=512m"

NOTE for Mac OS users: If you are using Mac OS, you will need to configure ATLAS_SERVER_OPTS (explained above).

In {package dir}/conf/atlas-env.sh uncomment the following line

#export ATLAS_SERVER_OPTS=

and change it to look as below

export ATLAS_SERVER_OPTS="-Djava.awt.headless=true -Djava.security.krb5.realm= -Djava.security.krb5.kdc="

HBase as the Storage Backend for the Graph Repository

By default, Atlas uses Titan as the graph repository; it is currently the only graph repository implementation available. The HBase versions currently supported are 1.1.x. For configuring ATLAS graph persistence on HBase, please see "Graph persistence engine - HBase" in the Configuration section for more details.

Pre-requisites for running HBase as a distributed cluster

  • 3 or 5 ZooKeeper nodes
  • At least 3 RegionServer nodes. It would be ideal to run the DataNodes on the same hosts as the RegionServers for data locality.

HBase tablename in Titan can be set using the following configuration in ATLAS_HOME/conf/atlas-application.properties:

atlas.graph.storage.hbase.table=apache_atlas_titan
atlas.audit.hbase.tablename=apache_atlas_entity_audit

Configuring SOLR as the Indexing Backend for the Graph Repository

By default, Atlas uses Titan as the graph repository; it is currently the only graph repository implementation available. For configuring Titan to work with Solr, please follow the instructions below.

  • Install Solr if it is not already running. The supported Solr version is 5.2.1. It can be downloaded from http://archive.apache.org/dist/lucene/solr/5.2.1/solr-5.2.1.tgz
  • Start solr in cloud mode.

SolrCloud mode uses a ZooKeeper service as a highly available, central location for cluster management. For a small cluster, running with an existing ZooKeeper quorum should be fine. For larger clusters, you would want to run a separate ZooKeeper quorum with at least 3 servers. Note: Atlas currently supports Solr in "cloud" mode only. "http" mode is not supported. For more information, refer to the Solr documentation - https://cwiki.apache.org/confluence/display/solr/SolrCloud

  • For e.g., to bring up a Solr node listening on port 8983 on a machine, you can use the command:
      $SOLR_HOME/bin/solr start -c -z <zookeeper_host:port> -p 8983

  • Run the following commands from the SOLR_BIN (e.g. $SOLR_HOME/bin) directory to create collections in Solr corresponding to the indexes that Atlas uses. If the ATLAS and SOLR instances are on two different hosts, first copy the required configuration files from ATLAS_HOME/conf/solr on the ATLAS instance host to the Solr instance host. SOLR_CONF in the commands below refers to the directory where the Solr configuration files have been copied to on the Solr host:

  $SOLR_BIN/solr create -c vertex_index -d SOLR_CONF -shards #numShards -replicationFactor #replicationFactor
  $SOLR_BIN/solr create -c edge_index -d SOLR_CONF -shards #numShards -replicationFactor #replicationFactor
  $SOLR_BIN/solr create -c fulltext_index -d SOLR_CONF -shards #numShards -replicationFactor #replicationFactor

Note: If numShards and replicationFactor are not specified, they default to 1, which suffices if you are trying out Solr with ATLAS on a single node instance. Otherwise specify numShards according to the number of hosts that are in the Solr cluster and the maxShardsPerNode configuration. The number of shards cannot exceed the total number of Solr nodes in your SolrCloud cluster.

The number of replicas (replicationFactor) can be set according to the redundancy required.
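For instance, on a hypothetical three-node SolrCloud cluster where two-way redundancy is wanted, the vertex index collection could be created like this (the counts are illustrative, not recommendations):

  $SOLR_BIN/solr create -c vertex_index -d SOLR_CONF -shards 3 -replicationFactor 2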

Also note that Solr will automatically be called to create the indexes when the Atlas server is started, if the SOLR_BIN and SOLR_CONF environment variables are set and the search indexing backend is set to 'solr5'.

  • Change ATLAS configuration to point to the Solr instance setup. Please make sure the following configurations are set to the below values in ATLAS_HOME/conf/atlas-application.properties
  atlas.graph.index.search.backend=solr5
  atlas.graph.index.search.solr.mode=cloud
  atlas.graph.index.search.solr.zookeeper-url=<the ZK quorum setup for solr as comma separated value> eg: 10.1.6.4:2181,10.1.6.5:2181
  atlas.graph.index.search.solr.zookeeper-connect-timeout=<SolrCloud Zookeeper Connection Timeout>. Default value is 60000 ms
  atlas.graph.index.search.solr.zookeeper-session-timeout=<SolrCloud Zookeeper Session Timeout>. Default value is 60000 ms

  • Restart Atlas

For more information on Titan Solr configuration, please refer to http://s3.thinkaurelius.com/docs/titan/0.5.4/solr.htm

Pre-requisites for running Solr in cloud mode

  • Memory - Solr is both memory and CPU intensive. Make sure the server running Solr has adequate memory, CPU and disk. Solr works well with 32GB RAM. Plan to provide as much memory as possible to the Solr process.
  • Disk - If the number of entities that need to be stored is large, plan to have at least 500 GB free space in the volume where Solr is going to store the index data.
  • SolrCloud has support for replication and sharding. It is highly recommended to use SolrCloud with at least two Solr nodes running on different servers with replication enabled. If using SolrCloud, then you also need ZooKeeper installed and configured with 3 or 5 ZooKeeper nodes.

Configuring Kafka Topics

Atlas uses Kafka to ingest metadata from other components at runtime. This is described in the Architecture page in more detail. Depending on the configuration of Kafka, sometimes you might need to set up the topics explicitly before using Atlas. To do so, Atlas provides a script, bin/atlas_kafka_setup.py, which can be run from the Atlas server. In some environments, the hooks might start getting used before the Atlas server itself is set up. In such cases, the topic setup can be run on the hosts where hooks are installed, using a similar script hook-bin/atlas_kafka_setup_hook.py. Both of these use the configuration in atlas-application.properties for setting up the topics. Please refer to the Configuration page for these details.
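If you need to create the topics by hand instead, a minimal sketch with the standard Kafka CLI might look like the following; the topic names ATLAS_HOOK and ATLAS_ENTITIES are assumed to be the Atlas defaults, and the ZooKeeper address, partition and replication counts are placeholders for your cluster:

  # assumed default Atlas topic names; adjust ZooKeeper address and counts for your cluster
  $KAFKA_HOME/bin/kafka-topics.sh --create --zookeeper localhost:2181 --topic ATLAS_HOOK --partitions 1 --replication-factor 1
  $KAFKA_HOME/bin/kafka-topics.sh --create --zookeeper localhost:2181 --topic ATLAS_ENTITIES --partitions 1 --replication-factor 1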

Setting up Atlas

There are a few steps that set up dependencies of Atlas. One such example is setting up the Titan schema in the storage backend of choice. In a simple single server setup, these are automatically set up with default configuration when the server first accesses these dependencies.

However, there are scenarios when we may want to run setup steps explicitly as one-time operations. For example, in a multiple server scenario using High Availability, it is preferable to run setup steps from one of the server instances the first time, and then start the services.

To run these steps one time, execute the command bin/atlas_start.py -setup from a single Atlas server instance.

However, the Atlas server does take care of parallel executions of the setup steps, and running the setup steps multiple times is idempotent. Therefore, if one chooses to run the setup steps as part of server startup for convenience, they should enable the configuration option atlas.server.run.setup.on.start by defining it with the value true in the atlas-application.properties file.
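In properties form, opting in to setup-at-startup looks like this in conf/atlas-application.properties:

  # run the one-time setup steps during server startup (idempotent)
  atlas.server.run.setup.on.start=true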

Starting Atlas Server
bin/atlas_start.py [-port <port>]

By default:

  • The Atlas server starts on port 21000 (see the example after this list); to change the port, use the -port option.
  • The Atlas server starts with conf from {package dir}/conf. To override this (to use the same conf with multiple Atlas upgrades), set the environment variable ATLAS_CONF to the path of the conf dir.
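For example (the conf path is illustrative; 21000 is the default port):

  export ATLAS_CONF=/etc/atlas/conf   # optional: conf dir kept outside the package dir
  bin/atlas_start.py -port 21000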

Using Atlas

  • Quick start model - sample model and data
  bin/quick_start.py [<atlas endpoint>]

  • Verify if the server is up and running
  curl -v http://localhost:21000/api/atlas/admin/version

  {"Version":"v0.1"}

  • List the types in the repository
  curl -v http://localhost:21000/api/atlas/types

  {"results":["Process","Infrastructure","DataSet"],"count":3,"requestId":"1867493731@qtp-262860041-0 - 82d43a27-7c34-4573-85d1-a01525705091"}

  • List the instances for a given type
  curl -v http://localhost:21000/api/atlas/entities?type=hive_table

  {"requestId":"788558007@qtp-44808654-5","list":["cb9b5513-c672-42cb-8477-b8f3e537a162","ec985719-a794-4c98-b98f-0509bd23aac0","48998f81-f1d3-45a2-989a-223af5c1ed6e","a54b386e-c759-4651-8779-a099294244c4"]}

  curl -v http://localhost:21000/api/atlas/entities/list/hive_db

  • Search for entities (instances) in the repository
  curl -v http://localhost:21000/api/atlas/discovery/search/dsl?query="from hive_table"
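Note that the space in the DSL query has to survive both shell quoting and URL encoding; quoting the whole URL and encoding the space, as below, is more robust than quoting only the query value:

  curl -v "http://localhost:21000/api/atlas/discovery/search/dsl?query=from%20hive_table"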

Dashboard

Once Atlas is started, you can view the status of Atlas entities using the web-based dashboard. Open your browser at the corresponding port (21000 by default, as above) to use the web UI.

Stopping Atlas Server

bin/atlas_stop.py

Known Issues

Setup

If the setup of the Atlas service fails for any reason, the next run of setup (either by an explicit invocation of atlas_start.py -setup, or by enabling the configuration option atlas.server.run.setup.on.start) will fail with a message such as 'A previous setup run may not have completed cleanly.' In such cases, you would need to manually ensure the setup can run, and delete the ZooKeeper node at /apache_atlas/setup_in_progress before attempting to run setup again.
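A minimal sketch of that cleanup with the standard ZooKeeper CLI, assuming a ZooKeeper server at localhost:2181 (adjust for your quorum):

  # remove the stale setup marker so setup can run again
  $ZOOKEEPER_HOME/bin/zkCli.sh -server localhost:2181 delete /apache_atlas/setup_in_progress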

If the setup failed due to HBase Titan schema setup errors, it may be necessary to repair the HBase schema. If no data has been stored, one can also disable and drop the 'titan' schema in HBase to let setup run again.
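For example, from the HBase shell; 'titan' is the default table name (or apache_atlas_titan if you set atlas.graph.storage.hbase.table as shown earlier). Only do this if no data needs to be preserved:

  disable 'titan'
  drop 'titan'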
