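Installation Path A - Automated Installation by Cloudera Manager

Requires every machine to have Internet access, and the overseas download sites are not very stable. If the installation fails, reinstalling is very painful.
Installation Path B - Manual Installation Using Cloudera Manager Packages
Set up RedHat/CentOS or Debian/Ubuntu, then download and install the system packages; a large number of packages has to be downloaded.
Installation Path C - Manual Installation Using Tarballs and Parcels (installation steps)
This method is the least intrusive to the system. Its biggest advantage is that it allows a fully offline installation, and reinstalling is very convenient. Later cluster-wide package upgrades are also easy.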
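(1) Prerequisites (all nodes)
  • Turn off the firewall:
    service iptables stop (temporary, takes effect immediately)
    chkconfig iptables off (takes effect after a reboot)
  • Turn off SELinux:
    setenforce 0 (temporary)
    Set SELINUX=disabled in /etc/selinux/config (permanent after a reboot)
    sestatus (check the status)
  • Passwordless SSH login between the Cloudera Manager Server and the Cloudera Manager Agents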
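
To confirm that the SELinux change will survive a reboot, the config file can be checked with a small helper. This is only a sketch: the function names selinux_config_mode and selinux_disabled_in_config are made up for illustration, and the file layout is assumed to be the usual KEY=value format of /etc/selinux/config.

```shell
# Print the mode set by the last uncommented SELINUX= line of a config file.
# Usage: selinux_config_mode [FILE]   (FILE defaults to /etc/selinux/config)
selinux_config_mode() {
    config_file="${1:-/etc/selinux/config}"
    # SELINUXTYPE= lines do not match because "=" must follow "SELINUX" directly.
    sed -n 's/^[[:space:]]*SELINUX=[[:space:]]*\([A-Za-z]*\).*$/\1/p' "$config_file" | tail -n 1
}

# Succeed only when the file sets SELINUX=disabled.
selinux_disabled_in_config() {
    [ "$(selinux_config_mode "$1")" = "disabled" ]
}

# Example call (not run here):
#   selinux_disabled_in_config /etc/selinux/config && echo "stays disabled after reboot"
```

The helper only reads the file; the running state still has to be checked with sestatus as described above.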

(2) Base components (all nodes)

  • Install JDK 1.7
  • Install Python 2.6 or 2.7

(3) Configure the installation user and directories (all nodes)

$ mkdir /opt/cloudera-manager
$ useradd --system --home=/opt/cloudera-manager/cm-5.0.0/run/cloudera-scm-server --no-create-home --shell=/bin/false --comment "Cloudera SCM User" cloudera-scm
$ chown -R cloudera-scm:cloudera-scm /opt/cloudera-manager

$ mkdir -p /opt/cloudera/parcel-repo
$ chown cloudera-scm:cloudera-scm /opt/cloudera

(4) Download and configure the Cloudera Manager Server and Cloudera Manager Agent (all nodes)
http://www.cloudera.com/content/cloudera-content/cloudera-docs/CM5/latest/Cloudera-Manager-Version-and-Download-Information/Cloudera-Manager-Version-and-Download-Information.html?scroll=cmvd_topic_1
The tarball file for RHEL6/CentOS6 is: cloudera-manager-el6-cm5.0.0_x86_64.tar.gz
$ tar -xzvf cloudera-manager*.tar.gz -C /opt/cloudera-manager
  (/opt/cloudera-manager/cm-5.0.0 will be the CDH root directory)
$ vim /etc/cloudera-scm-agent/config.ini
  Edit server_host=hadoop-1 (host name or IP of the master node)
(5) Download and configure the local Parcel installation source (master node)
$ yum install httpd

Download the CDH Parcels from http://archive-primary.cloudera.com/cdh5/parcels/ ; the latest one for RHEL6/CentOS6 is CDH-5.0.0-1.cdh5.0.0.p0.47-el6.parcel
$ mkdir /var/www/html/cdh5.0
$ mv CDH-5.0.0-1.cdh5.0.0.p0.47-el6.parcel /var/www/html/cdh5.0

Download the CDH manifest.json
$ mv manifest.json /var/www/html/cdh5.0
$ chmod -R ugo+rX /var/www/html/cdh5.0
$ service httpd start
Browse to the Apache httpd server to verify the packages: http://hostname/cdh5.0
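
Before pointing Cloudera Manager at the repository, it can help to confirm that the directory really holds both files. The following is a minimal sketch; parcel_repo_ready is a hypothetical helper name, and the only assumption is the layout described above (one manifest.json plus at least one *.parcel file in the same directory).

```shell
# Succeed only if DIR holds manifest.json and at least one *.parcel file.
# Usage: parcel_repo_ready DIR
parcel_repo_ready() {
    repo_dir="$1"
    if [ ! -f "$repo_dir/manifest.json" ]; then
        echo "missing manifest.json in $repo_dir" >&2
        return 1
    fi
    for parcel in "$repo_dir"/*.parcel; do
        # When the glob matches nothing, $parcel is the literal pattern and -f fails.
        if [ -f "$parcel" ]; then
            return 0
        fi
    done
    echo "no .parcel file in $repo_dir" >&2
    return 1
}

# Example call (not run here):
#   parcel_repo_ready /var/www/html/cdh5.0 && echo "repository looks complete"
```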
(6) Configure the Cloudera Configuration Service database (MySQL database on the master node)

  • Install an external database (Oracle/MySQL/PostgreSQL). The RDBMS character set must support UTF-8; for Oracle, set it to AL32UTF8.
  • Download and install the database driver (master node)
$ cp /tmp/mysql-connector-java-5.1.30.jar /usr/share/java/mysql-connector-java.jar

  • Create the Cloudera CDH configuration database user and grant privileges
$ mysql -hhadoop-1 -uroot -p
mysql> CREATE USER cdhusr IDENTIFIED BY 'cdhpwd';

mysql> CREATE DATABASE cdh_cfg DEFAULT CHARACTER SET utf8;
mysql> GRANT ALL ON cdh_cfg.* TO 'cdhusr'@'%' IDENTIFIED BY 'cdhpwd';
  • Run the script that creates the Cloudera CDH configuration database automatically
$ /share/cmf/schema/scm_prepare_database.sh -hhadoop-1 -uroot -proot --scm-host hadoop-10 mysql cdh_cfg cdhusr cdhpwd
  • Create the Cloudera CDH databases and user accounts
  1. Reports Manager (required)
  2. Hive Metastore (required)
  3. Activity Monitor (needed only for MRv1)
  4. Cloudera Navigator (available with the Data Hub Edition Trial or Cloudera Enterprise)

$ mysql -hhadoop-1 -uroot -p
mysql> CREATE USER reports IDENTIFIED BY 'reports';
mysql> CREATE DATABASE cdh_reports DEFAULT CHARACTER SET utf8;
mysql> GRANT ALL ON cdh_reports.* TO 'reports'@'%' IDENTIFIED BY 'reports';

mysql> CREATE USER hive IDENTIFIED BY 'hive';
mysql> CREATE DATABASE cdh_hive DEFAULT CHARACTER SET utf8;
mysql> GRANT ALL ON cdh_hive.* TO 'hive'@'%' IDENTIFIED BY 'hive';
mysql> CREATE USER activity@172.16.36.191 IDENTIFIED BY 'activity';
mysql> CREATE DATABASE cdh_activity DEFAULT CHARACTER SET utf8;
mysql> GRANT ALL ON cdh_activity.* TO 'activity'@'%' IDENTIFIED BY 'activity';
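
The same three statements repeat for every service database, so they can be generated instead of typed. A rough sketch follows; cdh_db_sql is a hypothetical helper, the names and passwords are the example values used above, and the GRANT ... IDENTIFIED BY form simply mirrors the statements above (it is accepted by MySQL 5.x and was removed in MySQL 8.0).

```shell
# Print the CREATE USER / CREATE DATABASE / GRANT statements for one database.
# Usage: cdh_db_sql DB_NAME DB_USER DB_PASSWORD
cdh_db_sql() {
    db_name="$1"
    db_user="$2"
    db_pass="$3"
    printf "CREATE USER %s IDENTIFIED BY '%s';\n" "$db_user" "$db_pass"
    printf "CREATE DATABASE %s DEFAULT CHARACTER SET utf8;\n" "$db_name"
    # %% prints a literal % so the user is allowed to connect from any host.
    printf "GRANT ALL ON %s.* TO '%s'@'%%' IDENTIFIED BY '%s';\n" "$db_name" "$db_user" "$db_pass"
}

# Example call (not run here):
#   cdh_db_sql cdh_hive hive hive | mysql -hhadoop-1 -uroot -p
```

Printing the SQL first makes it easy to review the statements before piping them into the mysql client.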

(7) Start the Cloudera Manager Server and Cloudera Manager Agent
Start the Cloudera Manager Server (master node)
$ vim /etc/init.d/cloudera-scm-server
Change CMF_DEFAULTS from ${CMF_DEFAULTS:-/etc/default} to /etc/default
$ cp /etc/init.d/cloudera-scm-server /etc/init.d/cloudera-scm-server
$ /etc/init.d/cloudera-scm-server start
$ chkconfig cloudera-scm-server on
Start the Cloudera Manager Agent (worker nodes)
$ vim /etc/init.d/cloudera-scm-agent
Change CMF_DEFAULTS from ${CMF_DEFAULTS:-/etc/default} to /etc/default
$ cp /etc/init.d/cloudera-scm-agent /etc/init.d/cloudera-scm-agent
$ /etc/init.d/cloudera-scm-agent start
$ chkconfig cloudera-scm-agent on

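(8) Install by following the wizard
Open the Cloudera Manager Admin Console at http://hadoop-10:7180 (login admin/admin) and choose Cloudera Express for the installation.

Add the local Parcel repository http://hadoop-1/cdh5.0/, then select custom components as needed. In general, apart from ZooKeeper, which should be changed to deploy on three hosts, installing with the default configuration is recommended unless there is a specific reason not to. It is also recommended to mount the disks that will hold the HDFS files, the Hive warehouse, the ZooKeeper data, and the other data directories before installing.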

BTW: after a successful installation, the local Parcel repository can also be configured or updated with the following steps:

  1. Do one of the following to open the parcel settings page:

      • From the top navigation bar:
          1. Click the parcel icon in the top navigation bar.
          2. Click the Edit Settings button.
      • From the Administration menu:
          1. Select Administration > Settings.
          2. Click the Parcels category.
      • From the Hosts tab:
          1. Click the Hosts tab.
          2. Select Configuration > View and Edit.
          3. Click the Parcels category.
          4. Click the Edit Settings button.
  2. In the Remote Parcel Repository URLs list, click the add (+) button to open an additional row.
  3. Enter the path to the parcel. For example, http://hostname:80/cdh5.0/.
  4. Click Save Changes to commit the changes.

(9) Summary of the default Cloudera system installation

Installed component versions

CDH Packaging and Tarball Information

Component                     Version
Apache Hadoop                 2.3.0-cdh5.0.0
Apache Hadoop MRv1            2.3.0-mr1-cdh5.0.0
Apache Hive                   0.12.0-cdh5.0.0
Apache HBase                  0.96.1.1-cdh5.0.0
Apache ZooKeeper              3.4.5-cdh5.0.0
Apache Sqoop 1                1.4.4-cdh5.0.0
Apache Sqoop2                 1.99.3-cdh5.0.0
Apache Pig                    0.12.0-cdh5.0.0
Apache Flume                  1.4.0-cdh5.0.0
Apache Oozie                  4.0.0-cdh5.0.0
Apache Mahout                 0.8-cdh5.0.0
Apache Whirr                  0.9.0-cdh5.0.0
DataFu                        1.1.0-cdh5.0.0
Apache Sentry (incubating)    1.2.0-cdh5.0.0
Parquet                       1.2.5-cdh5.0.0
Llama                         1.0.0-cdh5.0.0
Apache Spark                  0.9.0-cdh5.0.0
Apache Crunch                 0.9.0-cdh5.0.0
Apache Avro                   1.7.5-cdh5.0.0
Kite SDK                      0.10.0-cdh5.0.0
Apache Solr                   4.4.0-cdh5.0.0
Cloudera Search               1.0.0-cdh5.0.0
Lily HBase Indexer            1.3-cdh5.0.0

Cloudera Manager

Service             Instance    Description            Path
cloudera-manager    Server      Component directory    /opt/cloudera-manager/cm-5.0.0/lib/cloudera-scm-server
cloudera-manager    Server      Startup directory      /opt/cloudera-manager/cm-5.0.0/etc/init.d/cloudera-scm-server
cloudera-manager    Server      Log directory          /opt/cloudera-manager/cm-5.0.0/log/cloudera-scm-server
cloudera-manager    Agent       Component directory    /opt/cloudera-manager/cm-5.0.0/lib/cloudera-scm-agent
cloudera-manager    Agent       Startup directory      /opt/cloudera-manager/cm-5.0.0/etc/init.d/cloudera-scm-agent
cloudera-manager    Agent       Log directory          /opt/cloudera-manager/cm-5.0.0/log/cloudera-scm-agent

Management Service

Service    Instance          Description                            Path
mgmt       alertpublisher    Alert Publisher component directory    /var/lib/cloudera-scm-alertpublisher
mgmt       alertpublisher    Alert Publisher log directory          /var/log/cloudera-scm-alertpublisher
mgmt       eventserver       Event Server component directory       /var/lib/cloudera-scm-eventserver
mgmt       eventserver       Event Server log directory             /var/log/cloudera-scm-eventserver
mgmt       hostmonitor       Host Monitor component directory       /var/lib/cloudera-host-monitor
mgmt       hostmonitor       Host Monitor log directory             /var/log/cloudera-scm-firehose
mgmt       servicemonitor    Service Monitor storage directory      /var/lib/cloudera-service-monitor
mgmt       servicemonitor    Service Monitor log directory          /var/log/cloudera-scm-firehose
mgmt       headlamp          headlamp storage directory             /var/lib/cloudera-scm-headlamp
mgmt       headlamp          headlamp log directory                 /var/log/cloudera-scm-headlamp

Component Lib

Service             Path
zookeeper           /opt/cloudera/parcels/CDH-5.0.0-1.cdh5.0.0.p0.47/lib/
hadoop              /opt/cloudera/parcels/CDH-5.0.0-1.cdh5.0.0.p0.47/lib/hadoop
hadoop-hdfs         /opt/cloudera/parcels/CDH-5.0.0-1.cdh5.0.0.p0.47/lib/hadoop-hdfs
hadoop-mapreduce    /opt/cloudera/parcels/CDH-5.0.0-1.cdh5.0.0.p0.47/lib/hadoop-mapreduce
hadoop-yarn         /opt/cloudera/parcels/CDH-5.0.0-1.cdh5.0.0.p0.47/lib/hadoop-yarn
hbase               /opt/cloudera/parcels/CDH-5.0.0-1.cdh5.0.0.p0.47/lib/hbase
hive                /opt/cloudera/parcels/CDH-5.0.0-1.cdh5.0.0.p0.47/lib/hive
spark               /opt/cloudera/parcels/CDH-5.0.0-1.cdh5.0.0.p0.47/lib/spark

Component Config

Service             Path
zookeeper           /etc/zookeeper/conf
hadoop              /etc/hadoop/conf
hadoop-hdfs         /etc/hadoop/conf.cloudera.hdfs
hadoop-mapreduce    /etc/hadoop/conf.cloudera.mapreduce1
hadoop-yarn         /etc/hadoop/conf.cloudera.yarn
hbase               /etc/hbase/conf
hive                /etc/hive/conf
spark               /etc/spark/conf

Component Shell

Service          Path
zookeeper        /usr/bin/zookeeper-client -> /etc/alternatives/zookeeper-client
zookeeper        /usr/bin/zookeeper-server -> /etc/alternatives/zookeeper-server
hadoop           /usr/bin/hadoop -> /etc/alternatives/hadoop
hadoop-hdfs      /usr/bin/hdfs -> /etc/alternatives/hdfs
hadoop-mapred    /usr/bin/mapred -> /etc/alternatives/mapred
hadoop-yarn      /usr/bin/yarn -> /etc/alternatives/yarn
spark            /usr/bin/spark-shell -> /etc/alternatives/spark-shell
spark            /usr/bin/spark-executor -> /etc/alternatives/spark-executor
hbase            /usr/bin/hbase -> /etc/alternatives/hbase
hive             /usr/bin/hive -> /etc/alternatives/hive

Component Log

Service          Path
zookeeper        /var/log/zookeeper/
hadoop-hdfs      /var/log/hadoop-hdfs
hadoop-mapred    /var/log/hadoop-mapreduce
hadoop-yarn      /var/log/hadoop-yarn
spark            /var/log/spark/
hbase            /var/log/hbase
hive             /var/log/hive

(10) Verify the installation
After the installation completes, first confirm which framework runs MapReduce jobs; it can be switched by changing the Linux alternatives.
[root@hadoop-2 hadoop]# ll /etc/hadoop/conf
lrwxrwxrwx 1 root root 29 Apr 30 17:23 /etc/hadoop/conf -> /etc/alternatives/hadoop-conf
[root@hadoop-2 hadoop]# ll /etc/alternatives/hadoop-conf
lrwxrwxrwx 1 root root 30 Apr 30 17:23 /etc/alternatives/hadoop-conf -> /etc/hadoop/conf.cloudera.yarn
This shows that MapReduce on the cluster currently runs on YARN (MRv2).
[root@hadoop-9 hadoop]# ll /etc/hadoop/conf
lrwxrwxrwx 1 root root 29 Apr 30 17:23 /etc/hadoop/conf -> /etc/alternatives/hadoop-conf
[root@hadoop-9 hadoop]# ll /etc/alternatives/hadoop-conf
lrwxrwxrwx 1 root root 36 Apr 30 17:23 /etc/alternatives/hadoop-conf -> /etc/hadoop/conf.cloudera.mapreduce1
This shows that MapReduce on the cluster currently runs on MRv1.
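
The two-step symlink check above can be wrapped in a small script. This is a sketch under the assumption that only the two targets shown above occur; mr_framework_for and current_mr_framework are hypothetical names, and readlink -f is the GNU coreutils option that follows the whole symlink chain.

```shell
# Map a hadoop-conf alternatives target to the framework it selects.
# Usage: mr_framework_for TARGET_PATH
mr_framework_for() {
    case "$1" in
        */conf.cloudera.yarn)       echo "YARN (MRv2)" ;;
        */conf.cloudera.mapreduce1) echo "MRv1" ;;
        *)                          echo "unknown" ;;
    esac
}

# Resolve /etc/hadoop/conf (or the given link) to its final target and report it.
current_mr_framework() {
    mr_framework_for "$(readlink -f "${1:-/etc/hadoop/conf}")"
}

# Example call on a cluster node (not run here):
#   current_mr_framework
```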
$ sudo -u hdfs hadoop jar /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar pi 10 100
Depending on whether MapReduce jobs are configured to run on YARN or on the MapReduce service, log in to the corresponding console page to view the job:
  • Clusters > ClusterName > yarn Applications
  • Clusters > ClusterName > mapreduce Activities
(11) Ports Used by Components of CDH 5
http://www.cloudera.com/content/cloudera-content/cloudera-docs/CM5/latest/Cloudera-Manager-Installation-Guide/cm5ig_ports_cdh5.html
Upgrade issues:
   After upgrading from CDH4 to CDH5, checking the configuration shows that hadoop still runs MRv1:
# ll /etc/alternatives/hadoop-conf
lrwxrwxrwx 1 root root 36 May 9 14:02 /etc/alternatives/hadoop-conf -> /etc/hadoop/conf.cloudera.mapreduce1
CDH manages the MR distributed computing framework through Linux alternatives, so adjust it as follows:
# alternatives --set hadoop-conf /etc/hadoop/conf.cloudera.yarn;
# alternatives --remove hadoop-conf /etc/hadoop/conf.cloudera.mapreduce1;
# alternatives --remove hadoop-conf /etc/hadoop/conf.cloudera.hdfs1;
# rm -rf /etc/hadoop/conf*1;
Go back to the Cloudera Manager Admin Console, redeploy the client configuration, and check again:
# ll /etc/alternatives/hadoop-conf
lrwxrwxrwx 1 root root 30 May 12 09:17 /etc/alternatives/hadoop-conf -> /etc/hadoop/conf.cloudera.yarn
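Configuration issues:
1) Cloudera recommends setting /proc/sys/vm/swappiness to 0. The current setting is 60. Use the sysctl command to change the setting at runtime and edit /etc/sysctl.conf to keep it after a reboot. You can continue with the installation, but you may run into problems: Cloudera Manager reports that your hosts are in bad health because of swapping. The following hosts are affected:
Solution:
$ echo 0 > /proc/sys/vm/swappiness
$ vim /etc/sysctl.conf
Add vm.swappiness = 0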
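
When the same fix has to be applied on many hosts, editing the file by hand gets tedious. The sketch below makes the edit repeatable; persist_swappiness_zero is a hypothetical helper name, and it assumes the file uses plain "key = value" lines. Run it as root, and try it on a copy of the file first.

```shell
# Ensure FILE contains exactly one "vm.swappiness = 0" line.
# Usage: persist_swappiness_zero [FILE]   (FILE defaults to /etc/sysctl.conf)
persist_swappiness_zero() {
    conf_file="${1:-/etc/sysctl.conf}"
    tmp_file="$(mktemp)" || return 1
    # Copy every line except existing vm.swappiness settings;
    # grep exits non-zero when nothing is copied, which is fine here.
    grep -v '^[[:space:]]*vm\.swappiness[[:space:]]*=' "$conf_file" > "$tmp_file" || true
    echo "vm.swappiness = 0" >> "$tmp_file"
    # Write back with cat so the original file keeps its owner and mode.
    cat "$tmp_file" > "$conf_file" && rm -f "$tmp_file"
}

# Example call (not run here):
#   persist_swappiness_zero /etc/sysctl.conf && sysctl -p
```

Running the helper twice leaves a single entry, so it is safe to include in a provisioning script.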
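2) Running in non-interactive mode, and data appears to exist in Storage Directory /dfs/nn. Not formatting
Solution:
Check whether the NameNode, SecondaryNameNode, and DataNode data directories are empty; if they contain files, back them up and then clear them. The cause is most likely junk data from repeated installations or historical data left over from an upgrade.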
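3) Clock offset
Solution: set up NTP time synchronization and install the NTP service on all nodes (yum install ntp):
  • Master node
  1. Set the time zone by following the prompts: $ tzselect  [ 5) Asia -> 9) China -> 1) east China -> 1) Yes ]
  2. Check the system time: $ date ; set the system time: $ date --set "04/25/09 10:19" (month/day/year hour:minute:second)
  3. Check the hardware clock: $ hwclock --show
  4. Synchronize the hardware clock: $ clock -w ; $ hwclock --hctosys (hc stands for the hardware clock, sys for the system time)
  5. Edit the NTP configuration: $ vim /etc/ntp.conf
       restrict 172.16.66.0 mask 255.255.255.0
       server 127.127.1.0
       fudge 127.127.1.0
  6. Restart the service and enable it at boot: $ service ntpd restart ; $ chkconfig ntpd on
  • Worker nodes
  1. Edit the NTP configuration: $ vim /etc/ntp.conf ; comment out the other server lines and add a line that synchronizes with the master node:
       server 172.16.66.138
  2. Restart the service and enable it at boot: $ service ntpd restart ; $ chkconfig ntpd on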

4) 401 Unauthorized: ERROR Failed to connect to newly launched supervisor. Agent will exit
Solution:
$ /etc/init.d/cloudera-scm-agent hard_stop
$ kill -9 $(pgrep -f supervisord)
$ /etc/init.d/cloudera-scm-agent start
5) Transparent Huge Pages is enabled and may cause significant performance problems. The kernel with release "Red Hat Enterprise Linux Server release 6.4 (Santiago)" and version "2.6.32-358.el6.x86_64" has enabled set to "[always] never" and defrag set to "[always] never". Run "echo never > /sys/kernel/mm/redhat_transparent_hugepage/defrag" to disable this setting, then add the same command to an init script such as /etc/rc.local so that it is applied when the system reboots. Alternatively, upgrade to RHEL 6.5 or later, which does not have this bug.
Solution: follow the instructions in the message.

6) The host inspector failed to inspect hosts: Inspector failed on the following hosts...
Solution:
Edit /etc/hosts, for example:
127.0.0.1 localhost
::1 localhost6
172.16.66.129 hadoop-1.certus.com hadoop-1
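
A quick way to see which nodes are still missing from the hosts file is to look each name up before running the inspector again. This is a sketch only: host_in_hosts_file is a hypothetical helper, and it assumes the usual "address name [alias...]" layout with whole-line comments.

```shell
# Succeed if HOSTS_FILE has an uncommented entry naming HOST (as FQDN or alias).
# Usage: host_in_hosts_file HOST [HOSTS_FILE]   (defaults to /etc/hosts)
host_in_hosts_file() {
    host="$1"
    hosts_file="${2:-/etc/hosts}"
    awk -v h="$host" '
        $1 ~ /^#/ { next }    # skip comment lines
        { for (i = 2; i <= NF; i++) if ($i == h) found = 1 }
        END { exit found ? 0 : 1 }
    ' "$hosts_file"
}

# Example call (not run here):
#   host_in_hosts_file hadoop-1 || echo "hadoop-1 is missing from /etc/hosts"
```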
7) java.io.IOException: the path component: '/' is world-writable. Its permissions are 0777. Please fix this or select a different socket path
Solution:
The root directory (/) on the DataNode host has permissions 0777, which is too permissive and therefore insecure; change it to 755 or the default permissions.
8) java.io.IOException: Cannot run program "/etc/hadoop/conf.cloudera.yarn/topology.py" (in directory "/root"): error=13, Permission denied
Most likely the /root directory lacks execute permission; try chmod +x /root
Uninstalling Cloudera:
http://www.cloudera.com/content/cloudera-content/cloudera-docs/CM5/latest/Cloudera-Manager-Installation-Guide/cm5ig_uninstall_cm.html#cmig_topic_18
