Deploying Apache Atlas

Compiling Atlas

You can either download the official stable 2.0 release or clone the latest source and build it yourself; this article uses a build from the latest cloned source as the example:

git clone https://github.com/apache/atlas.git
cd atlas
# change the component versions to match the component versions in your cluster
vim pom.xml
export MAVEN_OPTS="-Xms2g -Xmx4g"
# this build does not embed HBase and Solr (uses external HBase and Solr)
mvn clean -DskipTests package -Pdist
# build with embedded HBase and Solr
mvn clean -DskipTests package -Pdist,embedded-hbase-solr
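The component versions live in the <properties> block of the root pom.xml. A minimal sketch of the entries to align (these property names appear in the Atlas pom; the values below are placeholders to replace with your cluster's versions):

<!-- excerpt from the root pom.xml <properties> block; values are placeholders -->
<properties>
    <hadoop.version>3.0.0</hadoop.version>
    <hbase.version>2.1.0</hbase.version>
    <solr.version>7.5.0</solr.version>
    <hive.version>3.1.0</hive.version>
    <kafka.version>2.0.0</kafka.version>
    <zookeeper.version>3.4.6</zookeeper.version>
</properties>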

A successful build ends with Maven reporting BUILD SUCCESS.

Deployment
Go to the apache-atlas-2.0.0/distro directory, copy the built package to your installation directory, and extract it there.
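A minimal sketch of that step, assuming a 2.0.0 build (the tarball name and the /opt target directory are illustrative; the build drops its packages under distro/target):

cd apache-atlas-2.0.0/distro/target
# extract the built package to the installation directory (/opt is an example)
tar -xzvf apache-atlas-2.0.0-bin.tar.gz -C /opt
cd /opt/apache-atlas-2.0.0   # extracted directory name may differ per build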
Modify the configuration
vim atlas-application.properties
Adjust the settings according to the comments. If you use the embedded HBase and Solr, the HBase- and Solr-related settings can be left unchanged.

# HBase address (the corresponding ZooKeeper address); the embedded HBase starts a ZK instance on this port
atlas.graph.storage.hostname=localhost:2181  # for an external HBase, use your external ZooKeeper address

# Solr address
atlas.graph.index.search.solr.http-urls=http://localhost:8984/solr

# Kafka settings
atlas.notification.embedded=true  # set to false to use an external Kafka
# the embedded Kafka starts a ZK instance on this port
atlas.kafka.zookeeper.connect=localhost:9026  # for an external Kafka, use your external ZooKeeper address
atlas.kafka.bootstrap.servers=localhost:9027  # for an external Kafka, list your external broker addresses

Complete configuration file for external HBase and Solr

#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements.  See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership.  The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License.  You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
##########  Graph Database Configs  ##########

# Graph Database

#Configures the graph database to use.  Defaults to JanusGraph
#atlas.graphdb.backend=org.apache.atlas.repository.graphdb.janus.AtlasJanusGraphDatabase

# Graph Storage
# Set atlas.graph.storage.backend to the correct value for your desired storage
# backend. Possible values:
#
# hbase
# cassandra
# embeddedcassandra - Should only be set by building Atlas with  -Pdist,embedded-cassandra-solr
# berkeleyje
#
# See the configuration documentation for more information about configuring the various  storage backends.
#
atlas.graph.storage.backend=hbase
atlas.graph.storage.hbase.table=atlas_janus

#Hbase
#For standalone mode , specify localhost
#for distributed mode, specify zookeeper quorum here
atlas.graph.storage.hostname=pre-kafka01:2181,pre-kafka02:2181,pre-kafka03:2181
atlas.graph.storage.hbase.regions-per-server=1
atlas.graph.storage.lock.wait-time=10000

#In order to use Cassandra as a backend, comment out the hbase specific properties above, and uncomment
#the following properties
#atlas.graph.storage.clustername=
#atlas.graph.storage.port=

# Gremlin Query Optimizer
#
# Enables rewriting gremlin queries to maximize performance. This flag is provided as
# a possible way to work around any defects that are found in the optimizer until they
# are resolved.
#atlas.query.gremlinOptimizerEnabled=true

# Delete handler
#
# This allows the default behavior of doing "soft" deletes to be changed.
#
# Allowed Values:
# org.apache.atlas.repository.store.graph.v1.SoftDeleteHandlerV1 - all deletes are "soft" deletes
# org.apache.atlas.repository.store.graph.v1.HardDeleteHandlerV1 - all deletes are "hard" deletes
#
#atlas.DeleteHandlerV1.impl=org.apache.atlas.repository.store.graph.v1.SoftDeleteHandlerV1

# Entity audit repository
#
# This allows the default behavior of logging entity changes to hbase to be changed.
#
# Allowed Values:
# org.apache.atlas.repository.audit.HBaseBasedAuditRepository - log entity changes to hbase
# org.apache.atlas.repository.audit.CassandraBasedAuditRepository - log entity changes to cassandra
# org.apache.atlas.repository.audit.NoopEntityAuditRepository - disable the audit repository
#
atlas.EntityAuditRepository.impl=org.apache.atlas.repository.audit.HBaseBasedAuditRepository

# if Cassandra is used as a backend for audit from the above property, uncomment and set the following
# properties appropriately. If using the embedded cassandra profile, these properties can remain
# commented out.
# atlas.EntityAuditRepository.keyspace=atlas_audit
# atlas.EntityAuditRepository.replicationFactor=1

# Graph Search Index
atlas.graph.index.search.backend=solr

#Solr
#Solr cloud mode properties
atlas.graph.index.search.solr.mode=cloud
atlas.graph.index.search.solr.zookeeper-url=pre-kafka01:2181/solr,pre-kafka02:2181/solr,pre-kafka03:2181/solr
atlas.graph.index.search.solr.zookeeper-connect-timeout=60000
atlas.graph.index.search.solr.zookeeper-session-timeout=60000
atlas.graph.index.search.solr.wait-searcher=true

#Solr http mode properties
#atlas.graph.index.search.solr.mode=http
#atlas.graph.index.search.solr.http-urls=http://localhost:8983/solr

# ElasticSearch support (Tech Preview)
# Comment out above solr configuration, and uncomment the following two lines. Additionally, make sure the
# hostname field is set to a comma delimited set of elasticsearch master nodes, or an ELB that fronts the masters.
#
# Elasticsearch does not provide authentication out of the box, but does provide an option with the X-Pack product
# https://www.elastic.co/products/x-pack/security
#
# Alternatively, the JanusGraph documentation provides some tips on how to secure Elasticsearch without additional
# plugins: https://docs.janusgraph.org/latest/elasticsearch.html
#atlas.graph.index.search.hostname=localhost
#atlas.graph.index.search.elasticsearch.client-only=true

# Solr-specific configuration property
atlas.graph.index.search.max-result-set-size=150

#########  Import Configs  #########
#atlas.import.temp.directory=/temp/import

#########  Notification Configs  #########
# Setup the following configurations only in test deployments where Kafka is started within Atlas in embedded mode
atlas.notification.embedded=false
atlas.kafka.data=${sys:atlas.home}/data/kafka
atlas.kafka.zookeeper.connect=pre-kafka01:2181,pre-kafka02:2181,pre-kafka03:2181
atlas.kafka.bootstrap.servers=pre-kafka01:9092,pre-kafka02:9092,pre-kafka03:9092
atlas.kafka.zookeeper.session.timeout.ms=400
atlas.kafka.zookeeper.connection.timeout.ms=200
atlas.kafka.zookeeper.sync.time.ms=20
atlas.kafka.auto.commit.interval.ms=1000
atlas.kafka.hook.group.id=atlas
atlas.kafka.enable.auto.commit=false
atlas.kafka.auto.offset.reset=earliest
atlas.kafka.session.timeout.ms=30000
atlas.kafka.offsets.topic.replication.factor=1
atlas.kafka.poll.timeout.ms=1000

atlas.notification.create.topics=true
atlas.notification.replicas=1
atlas.notification.topics=ATLAS_HOOK,ATLAS_ENTITIES
atlas.notification.log.failed.messages=true
atlas.notification.consumer.retry.interval=500
atlas.notification.hook.retry.interval=1000
# Enable for Kerberized Kafka clusters
#atlas.notification.kafka.service.principal=kafka/_HOST@EXAMPLE.COM
#atlas.notification.kafka.keytab.location=/etc/security/keytabs/kafka.service.keytab

## Server port configuration
atlas.server.http.port=22440
#atlas.server.https.port=21443

#########  Security Properties  #########

# SSL config
atlas.enableTLS=false

#truststore.file=/path/to/truststore.jks
#cert.stores.credential.provider.path=jceks://file/path/to/credentialstore.jceks

#following only required for 2-way SSL
#keystore.file=/path/to/keystore.jks

# Authentication config
atlas.authentication.method.kerberos=false
atlas.authentication.method.file=true

#### ldap.type= LDAP or AD
atlas.authentication.method.ldap.type=none

#### user credentials file
atlas.authentication.method.file.filename=${sys:atlas.home}/conf/users-credentials.properties

### groups from UGI
#atlas.authentication.method.ldap.ugi-groups=true

######## LDAP properties #########
#atlas.authentication.method.ldap.url=ldap://<ldap server url>:389
#atlas.authentication.method.ldap.userDNpattern=uid={0},ou=People,dc=example,dc=com
#atlas.authentication.method.ldap.groupSearchBase=dc=example,dc=com
#atlas.authentication.method.ldap.groupSearchFilter=(member=uid={0},ou=Users,dc=example,dc=com)
#atlas.authentication.method.ldap.groupRoleAttribute=cn
#atlas.authentication.method.ldap.base.dn=dc=example,dc=com
#atlas.authentication.method.ldap.bind.dn=cn=Manager,dc=example,dc=com
#atlas.authentication.method.ldap.bind.password=<password>
#atlas.authentication.method.ldap.referral=ignore
#atlas.authentication.method.ldap.user.searchfilter=(uid={0})
#atlas.authentication.method.ldap.default.role=<default role>

######### Active directory properties #######
#atlas.authentication.method.ldap.ad.domain=example.com
#atlas.authentication.method.ldap.ad.url=ldap://<AD server url>:389
#atlas.authentication.method.ldap.ad.base.dn=(sAMAccountName={0})
#atlas.authentication.method.ldap.ad.bind.dn=CN=team,CN=Users,DC=example,DC=com
#atlas.authentication.method.ldap.ad.bind.password=<password>
#atlas.authentication.method.ldap.ad.referral=ignore
#atlas.authentication.method.ldap.ad.user.searchfilter=(sAMAccountName={0})
#atlas.authentication.method.ldap.ad.default.role=<default role>

#########  JAAS Configuration #########
#atlas.jaas.KafkaClient.loginModuleName = com.sun.security.auth.module.Krb5LoginModule
#atlas.jaas.KafkaClient.loginModuleControlFlag = required
#atlas.jaas.KafkaClient.option.useKeyTab = true
#atlas.jaas.KafkaClient.option.storeKey = true
#atlas.jaas.KafkaClient.option.serviceName = kafka
#atlas.jaas.KafkaClient.option.keyTab = /etc/security/keytabs/atlas.service.keytab
#atlas.jaas.KafkaClient.option.principal = atlas/_HOST@EXAMPLE.COM

#########  Server Properties  #########
atlas.rest.address=http://localhost:22440
# If enabled and set to true, this will run setup steps when the server starts
atlas.server.run.setup.on.start=false
# Client Configs
atlas.client.readTimeoutMSecs=60000
atlas.client.connectTimeoutMSecs=60000

#########  Entity Audit Configs  #########
atlas.audit.hbase.tablename=apache_atlas_entity_audit
atlas.audit.zookeeper.session.timeout.ms=1000
atlas.audit.hbase.zookeeper.quorum=pre-kafka01:2181,pre-kafka02:2181,pre-kafka03:2181

#########  High Availability Configuration ########
atlas.server.ha.enabled=false
#### Enabled the configs below as per need if HA is enabled #####
#atlas.server.ids=id1
#atlas.server.address.id1=localhost:21000
#atlas.server.ha.zookeeper.connect=localhost:2181
#atlas.server.ha.zookeeper.retry.sleeptime.ms=1000
#atlas.server.ha.zookeeper.num.retries=3
#atlas.server.ha.zookeeper.session.timeout.ms=20000
## if ACLs need to be set on the created nodes, uncomment these lines and set the values ##
#atlas.server.ha.zookeeper.acl=<scheme>:<id>
#atlas.server.ha.zookeeper.auth=<scheme>:<authinfo>

######### Atlas Authorization #########
atlas.authorizer.impl=simple
atlas.authorizer.simple.authz.policy.file=atlas-simple-authz-policy.json

#########  Type Cache Implementation ########
# A type cache class which implements
# org.apache.atlas.typesystem.types.cache.TypeCache.
# The default implementation is org.apache.atlas.typesystem.types.cache.DefaultTypeCache which is a local in-memory type cache.
#atlas.TypeCache.impl=

#########  Performance Configs  #########
atlas.graph.storage.lock.retries=10
atlas.graph.storage.cache.db-cache-time=120000
# Minimum number of threads in the atlas web server
atlas.webserver.minthreads=10
# Maximum number of threads in the atlas web server
atlas.webserver.maxthreads=100
# Keepalive time in secs for the thread pool of the atlas web server
atlas.webserver.keepalivetimesecs=60
# Queue size for the requests(when max threads are busy) for the atlas web server
atlas.webserver.queuesize=100
# Set this property to true to enable a warning when no relationships are defined between entities on a particular attribute
# Not having relationships defined can lead to performance loss while adding new entities
atlas.relationships.warnOnNoRelationships=false

#########  CSRF Configs  #########
atlas.rest-csrf.enabled=true
atlas.rest-csrf.browser-useragents-regex=^Mozilla.*,^Opera.*,^Chrome.*
atlas.rest-csrf.methods-to-ignore=GET,OPTIONS,HEAD,TRACE
atlas.rest-csrf.custom-header=X-XSRF-HEADER

############ KNOX Configs ################
#atlas.sso.knox.browser.useragent=Mozilla,Chrome,Opera
#atlas.sso.knox.enabled=true
#atlas.sso.knox.providerurl=https://<knox gateway ip>:8443/gateway/knoxsso/api/v1/websso
#atlas.sso.knox.publicKey=

############ Atlas Metric/Stats configs ################
# Format: atlas.metric.query.<key>.<name>
atlas.metric.query.cache.ttlInSecs=900
#atlas.metric.query.general.typeCount=
#atlas.metric.query.general.typeUnusedCount=
#atlas.metric.query.general.entityCount=
#atlas.metric.query.general.tagCount=
#atlas.metric.query.general.entityDeleted=
#
#atlas.metric.query.entity.typeEntities=
#atlas.metric.query.entity.entityTagged=
#
#atlas.metric.query.tags.entityTags=

#########  Compiled Query Cache Configuration  #########

# The size of the compiled query cache.  Older queries will be evicted from the cache
# when we reach the capacity.
#atlas.CompiledQueryCache.capacity=1000

# Allows notifications when items are evicted from the compiled query
# cache because it has become full.  A warning will be issued when
# the specified number of evictions have occurred.  If the eviction
# warning threshold <= 0, no eviction warnings will be issued.
#atlas.CompiledQueryCache.evictionWarningThrottle=0

#########  Full Text Search Configuration  #########

#Set to false to disable full text search.
#atlas.search.fulltext.enable=true

#########  Gremlin Search Configuration  #########

#Set to false to disable gremlin search.
atlas.search.gremlin.enable=false

########## Add http headers ###########

#atlas.headers.Access-Control-Allow-Origin=*
#atlas.headers.Access-Control-Allow-Methods=GET,OPTIONS,HEAD,PUT,POST
#atlas.headers.<headerName>=<headerValue>

#########  UI Configuration #########
atlas.ui.default.version=v2

# Hive hook integration: any operation performed in Hive is picked up by the hook,
# which sends a corresponding event to the Kafka topic that Atlas subscribes to;
# Atlas then generates and stores the metadata.
######### Hive Hook Configs #######
atlas.hook.hive.synchronous=false
atlas.hook.hive.numRetries=3
atlas.hook.hive.queueSize=10000
atlas.cluster.name=primary

######### sqoop Hook Configs #######
atlas.hook.sqoop.synchronous=false
atlas.hook.sqoop.numRetries=3
atlas.hook.sqoop.queueSize=10000
Integrating Solr

1. Copy the Solr configuration from the Atlas conf directory to the configuration directory on the Solr node:

scp -r /atlas/conf/solr pre-tool:/etc/solr/
# rename the directory (so it is not deleted by mistake later)
mv /etc/solr/solr /etc/solr/atlas-solr

2. Create the collections:

cd /opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/lib/solr
./bin/solr create -c vertex_index -d /etc/solr/atlas-solr -shards 1 -replicationFactor 1 -force
./bin/solr create -c fulltext_index -d /etc/solr/atlas-solr -shards 1 -replicationFactor 1 -force
./bin/solr create -c edge_index -d /etc/solr/atlas-solr -shards 1 -replicationFactor 1 -force
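To confirm the collections came up, Solr's CLI ships a healthcheck subcommand; a minimal sketch, assuming the same ZooKeeper address configured above:

# verify a collection is healthy (repeat for fulltext_index and edge_index)
./bin/solr healthcheck -c vertex_index -z pre-kafka01:2181/solr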

3. The newly created collections are now visible in the Solr web UI.

Integrating Hive
Modify hive-site.xml

<property>
  <name>hive.exec.post.hooks</name>
  <value>org.apache.atlas.hive.hook.HiveHook</value>
</property>
Modify hive-env.sh

HIVE_AUX_JARS_PATH=/opt/atlas/hook/hive
Set the Hive Auxiliary JARs Directory

# the directory containing atlas-plugin-classloader-3.0.0-SNAPSHOT.jar
/opt/atlas/hook/hive
Modify: HiveServer2 Advanced Configuration Snippet (Safety Valve) for hive-site.xml

<property>
  <name>hive.exec.post.hooks</name>
  <value>org.apache.atlas.hive.hook.HiveHook</value>
</property>
<property>
  <name>hive.reloadable.aux.jars.path</name>
  <value>/opt/atlas/hook/hive</value>
</property>
Modify: HiveServer2 Environment Advanced Configuration Snippet

HIVE_AUX_JARS_PATH=/opt/atlas/hook/hive
Copy the required JARs to the node where Hive runs.

Copy the atlas-application.properties configuration file to the HIVE_CONF_DIR directory on the Hive node:

cp /opt/atlas/conf/atlas-application.properties /etc/hive/conf
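To sanity-check that the hook is wired up, one option is to watch the ATLAS_HOOK topic while running a DDL statement in Hive; a sketch, with an illustrative broker address (on a vanilla Kafka install the tool is kafka-console-consumer.sh):

# start a consumer on the hook topic, then run a simple DDL in Hive,
# e.g. CREATE TABLE atlas_hook_test (id INT); the hook event should appear here
kafka-console-consumer --bootstrap-server pre-kafka01:9092 --topic ATLAS_HOOK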
Startup
Starting with embedded HBase and Solr
Start HBase
Go to the hbase directory:
./bin/start-hbase.sh
Start Solr
Go to the solr directory:
./bin/solr start -c -z dev-kafka01:2181 -p 8983 -force
# create the initial collections
bin/solr create -c vertex_index -shards 1 -replicationFactor 1 -force
bin/solr create -c edge_index -shards 1 -replicationFactor 1 -force
bin/solr create -c fulltext_index -shards 1 -replicationFactor 1 -force
Visit the Solr web UI:
http://dev-tool:8983/solr/#/~collections/edge_index
Start Atlas
Go to the atlas directory and start Atlas with the embedded HBase and Solr; the following settings make HBase and Solr start and stop together with Atlas:
export MANAGE_LOCAL_HBASE=true
export MANAGE_LOCAL_SOLR=true
./bin/atlas_start.py
Starting with external HBase and Solr
# go to the atlas directory
./bin/atlas_start.py
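Startup can take a few minutes. One way to check readiness is the admin script bundled in the Atlas bin directory (a sketch; run from the atlas install directory):

# prints ACTIVE once the server is up (PASSIVE on a standby node when HA is enabled)
./bin/atlas_admin.py -status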
Access the web UI

http://ip:port/

Username: admin  Password: admin
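As a quick REST-level check, using the HTTP port and default credentials configured above:

# should return the Atlas version as JSON once the server is fully up
curl -u admin:admin http://localhost:22440/api/atlas/admin/version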
