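1. Background

A Greenplum (GP) database consisting of a master, a standby, and segments was being installed across two hosts, and the installation failed during the initialization phase.

The GP start/stop logs under /home/gpadmin/gpAdminLogs showed the following errors: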

20180317:08:31:39:087050 gpsegstart.py_host77:gpadmin:host77:gpadmin-[INFO]:-Postmaster /data2/primary/gpseg3 is running (pid 87089)
20180317:08:31:39:087050 gpsegstart.py_host77:gpadmin:host77:gpadmin-[INFO]:-Transitioning segments, mirroringMode is quiescent...
20180317:08:52:04:087050 gpsegstart.py_host77:gpadmin:host77:gpadmin-[INFO]:-Marking failed /data2/primary/gpseg3, Start failed; check segment logfile.  "peer shut down connection before response was fully received  Retrying no 1  peer shut down connection before response was fully received  Retrying no 2  peer shut down connection before response was fully received  Retrying no 3  peer shut down connection before response was fully received  Retrying no 4  peer shut down connection before response was fully received  Retrying no 5  peer shut down connection before response was fully received  Retrying no 6  peer shut down connection before response was fully received  Retrying no 7  peer shut down connection before response was fully received  Retrying no 8  peer shut down connection before response was fully received  Retrying no 9  peer shut down connection before response was fully received  Retrying no 10  peer shut down connection before response was fully received  Retrying no 11  peer shut down connection before response was fully received  Retrying no 12  peer shut down connection before response was fully received  Retrying no 13  peer shut down connection before response was fully received  Retrying no 14  peer shut down connection before response was fully received  Retrying no 15  peer shut down connection before response was fully received  Retrying no 16  peer shut down connection before response was fully received  Retrying no 17  peer shut down connection before response was fully received  Retrying no 18  peer shut down connection before response was fully received  Retrying no 19  peer shut down connection before response was fully received", 1000
20180317:08:52:04:087050 gpsegstart.py_host77:gpadmin:host77:gpadmin-[INFO]:-Marking failed /data1/primary/gpseg2, Start failed; check segment logfile.  "peer shut down connection before response was fully received  Retrying no 1  peer shut down connection before response was fully received  Retrying no 2  peer shut down connection before response was fully received  Retrying no 3  peer shut down connection before response was fully received  Retrying no 4  peer shut down connection before response was fully received  Retrying no 5  peer shut down connection before response was fully received  Retrying no 6  peer shut down connection before response was fully received  Retrying no 7  peer shut down connection before response was fully received  Retrying no 8  peer shut down connection before response was fully received  Retrying no 9  peer shut down connection before response was fully received  Retrying no 10  peer shut down connection before response was fully received  Retrying no 11  peer shut down connection before response was fully received  Retrying no 12  peer shut down connection before response was fully received  Retrying no 13  peer shut down connection before response was fully received  Retrying no 14  peer shut down connection before response was fully received  Retrying no 15  peer shut down connection before response was fully received  Retrying no 16  peer shut down connection before response was fully received  Retrying no 17  peer shut down connection before response was fully received  Retrying no 18  peer shut down connection before response was fully received  Retrying no 19  peer shut down connection before response was fully received", 1000
20180317:08:52:04:087050 gpsegstart.py_host77:gpadmin:host77:gpadmin-[INFO]:-Stopping segment /data2/primary/gpseg3, 40001 because of failure sending transition
20180317:08:52:05:087050 gpsegstart.py_host77:gpadmin:host77:gpadmin-[INFO]:-Stop of segment succeeded
20180317:08:52:05:087050 gpsegstart.py_host77:gpadmin:host77:gpadmin-[INFO]:-Stopping segment /data1/primary/gpseg2, 40000 because of failure sending transition
20180317:08:52:06:087050 gpsegstart.py_host77:gpadmin:host77:gpadmin-[INFO]:-Stop of segment succeeded
20180317:08:52:06:087050 gpsegstart.py_host77:gpadmin:host77:gpadmin-[INFO]:-Checking segment postmasters... (must_be_running True)
20180317:08:52:06:087050 gpsegstart.py_host77:gpadmin:host77:gpadmin-[INFO]:-Postmaster /data1/mirror/gpseg0 is running (pid 87084)
20180317:08:52:06:087050 gpsegstart.py_host77:gpadmin:host77:gpadmin-[INFO]:-Postmaster /data2/mirror/gpseg1 is running (pid 87085)
20180317:08:52:06:087050 gpsegstart.py_host77:gpadmin:host77:gpadmin-[INFO]:-Validating segment locales...
20180317:08:52:06:087050 gpsegstart.py_host77:gpadmin:host77:gpadmin-[INFO]:-Checking segment postmasters... (must_be_running True)
20180317:08:52:06:087050 gpsegstart.py_host77:gpadmin:host77:gpadmin-[INFO]:-Postmaster /data1/mirror/gpseg0 is running (pid 87084)
20180317:08:52:06:087050 gpsegstart.py_host77:gpadmin:host77:gpadmin-[INFO]:-Postmaster /data2/mirror/gpseg1 is running (pid 87085)
20180317:08:52:06:087050 gpsegstart.py_host77:gpadmin:host77:gpadmin-[INFO]:-

COMMAND RESULTS
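
The failed segments and their ports can be picked out of a log like the one above with a short script. The following is a minimal sketch that assumes the "Stopping segment <dir>, <port> because of failure" line format seen in this excerpt. The function name failed_segments is made up for illustration and is not part of the Greenplum tooling.

```python
import re

# Pattern inferred from the log excerpt above, e.g.
# "...-Stopping segment /data2/primary/gpseg3, 40001 because of failure sending transition"
STOP_RE = re.compile(r"Stopping segment ([^,\s]+), (\d+) because of failure")


def failed_segments(log_text):
    """Return (data_directory, port) pairs for segments stopped after a failure."""
    return [(m.group(1), int(m.group(2))) for m in STOP_RE.finditer(log_text)]


if __name__ == "__main__":
    sample = (
        "20180317:08:52:04:087050 gpsegstart.py_host77:gpadmin:host77:gpadmin-[INFO]:-"
        "Stopping segment /data2/primary/gpseg3, 40001 because of failure sending transition\n"
    )
    for directory, port in failed_segments(sample):
        print(f"{directory} failed on port {port}")
```

Knowing the ports up front (40000 and 40001 here) narrows the later checks to the resources those segments actually need.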

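2. Troubleshooting

Several reinstall attempts all got stuck at the same point, with initialization failing every time. Only after a colleague mentioned that the machine was not a fresh install did I change my approach to tracking the problem down.

1) Checked disk space (df) and memory (free -g), and found that free memory on one of the machines was close to 0.

2) After freeing up resources, the reinstall still failed. Only then did I look at the GP database's own internal logs, which reported a port failure.

Use netstat -apn|grep <port> to check whether the port is already in use. If it is, use ps -aux|grep <PID> to inspect the owning process, then kill that process. (With this procps version, ps aux without the dash avoids the syntax warning shown in the session below.)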

[root@host77 gpAdminLogs]# netstat -anp|grep  40000
tcp        0      0 0.0.0.0:40000               0.0.0.0:*                   LISTEN      8330/emsent         
[root@host77 gpAdminLogs]# netstat -anp|grep  40001
tcp        0      0 0.0.0.0:40001               0.0.0.0:*                   LISTEN      8330/emsent         
[root@host77 gpAdminLogs]# ps -aux|grep 8330
Warning: bad syntax, perhaps a bogus '-'? See /usr/share/doc/procps-3.2.8/FAQ
root      8330  0.0  0.5 567784 137140 ?       Ssl  Feb05  54:45 /ubas/ZXUN-UBAS/server/emsent/bin/emsent
root     53626  0.0  0.0   6392   724 pts/5    S+   09:16   0:00 grep 8330

[root@host77 gpAdminLogs]# kill -9 8330

3) Reinstalled and retested; the installation succeeded.
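
The port check from step 2 can also be scripted and run before initialization. The sketch below is one possible approach, not something the GP tools provide: it tries to bind each port and treats a bind failure as "in use". The names port_is_free and busy_ports are hypothetical, and the port numbers in the example are the segment ports from the session above.

```python
import socket


def port_is_free(port, host="0.0.0.0"):
    """Return True if a TCP socket can be bound to (host, port) right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            # Typically EADDRINUSE; a permission error on a privileged
            # port is also reported as "not free" by this simple check.
            return False
    return True


def busy_ports(ports, host="0.0.0.0"):
    """Return the subset of ports that cannot be bound."""
    return [p for p in ports if not port_is_free(p, host)]


if __name__ == "__main__":
    # 40000 and 40001 are the segment ports seen in the failure log.
    for port in busy_ports([40000, 40001]):
        print(f"port {port} is already in use")
```

The check only says that a port is taken. Finding out which process holds it still needs netstat -anp or a similar tool run with sufficient privileges.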

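3. Lessons learned

When a problem appears and it is not immediately obvious that your own script logic is at fault, rule out environmental factors first: disk space, memory usage, port conflicts, and so on. Only once the environment is known to be clean should you analyze the logic itself. Also, for a mature product, the system's own logs are an important diagnostic tool and should be used fully during troubleshooting.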
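
As an illustration of checking the environment first, the following sketch reports available memory and free disk space on a Linux host. It assumes /proc/meminfo is readable. The helper names are hypothetical, and the /data1 and /data2 directories are taken from the segment paths in the log.

```python
import os
import shutil


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: integer value}."""
    info = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            info[name.strip()] = int(parts[0])
    return info


def available_memory_gib(meminfo_text):
    """Return available memory in GiB (values in /proc/meminfo are in kB)."""
    info = parse_meminfo(meminfo_text)
    # MemAvailable is missing on older kernels; fall back to MemFree.
    kib = info.get("MemAvailable", info.get("MemFree", 0))
    return kib / (1024 * 1024)


def free_disk_gib(path):
    """Return free space in GiB on the filesystem that contains path."""
    return shutil.disk_usage(path).free / 1024**3


if __name__ == "__main__":
    try:
        with open("/proc/meminfo") as f:
            print(f"available memory: {available_memory_gib(f.read()):.1f} GiB")
    except OSError:
        print("cannot read /proc/meminfo (non-Linux host?)")
    for directory in ("/data1", "/data2"):
        if os.path.isdir(directory):
            print(f"{directory}: {free_disk_gib(directory):.1f} GiB free")
```

Running a check like this on every host before initialization would have surfaced the low-memory machine without a failed install.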
