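1. Preparation

Before installing Torque you must set the Linux hostname. Most servers default to localhost, and using localhost as the hostname is not recommended.

How to change the Linux hostname: http://www.cnblogs.com/smbin/p/8488909.html

OS: CentOS 7

Hostname: master

System user: root

Torque official download page: http://www.adaptivecomputing.com/support/download-center/torque-download/

Version downloaded by the author: http://wpfilebase.s3.amazonaws.com/torque/torque-6.1.2.tar.gz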
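Before going further, it helps to confirm that the hostname is usable. The sketch below is a minimal check that rests on one assumption: "unusable" means empty or one of the usual loopback names. The function name `is_usable_hostname` is hypothetical. `hostnamectl set-hostname` and `getent hosts` are standard CentOS 7 commands.

```shell
#!/usr/bin/env bash
# Hypothetical helper: returns 1 for hostnames Torque should not be set up with.
# Assumption: "unusable" means empty or one of the common loopback names.
is_usable_hostname() {
  local name="$1"
  case "$name" in
    ""|localhost|localhost.localdomain|localhost4|localhost6) return 1 ;;
    *) return 0 ;;
  esac
}

# Usage on the real machine (as root):
#   is_usable_hostname "$(hostname)" || hostnamectl set-hostname master
#   getent hosts master   # the new name should resolve, e.g. through /etc/hosts
```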

2. Installing and configuring Torque

First create a directory named torque under /opt, then download the tarball into it and extract it:

[root@mastar ]# cd /opt
[root@mastar ]# mkdir torque
[root@mastar ]# cd torque
[root@mastar torque]# wget http://wpfilebase.s3.amazonaws.com/torque/torque-6.1.2.tar.gz
...... (download output omitted)
[root@mastar torque]# tar -zxvf torque-6.1.2.tar.gz
...... (extraction output omitted)
[root@mastar torque]# cd torque-6.1.2/
[root@mastar torque-6.1.2]#

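Install the dependencies, then configure, build and install Torque for the host master. The configure step links PBS to this host: master is the hostname.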

[root@master torque-6.1.2]# yum install libxml2-devel openssl-devel gcc gcc-c++ boost-devel libtool -y          //note the space before -y; without it yum looks for a package named "libtool-y", as in the output below
Loaded plugins: fastestmirror, langpacks
base                                   | 3.6 kB  00:00:00
extras                                 | 3.4 kB  00:00:00
mysql-connectors-community             | 2.5 kB  00:00:00
mysql-tools-community                  | 2.5 kB  00:00:00
mysql56-community                      | 2.5 kB  00:00:00
updates                                | 3.4 kB  00:00:00
Determining fastest mirrors
 * base: mirrors.cn99.com
 * extras: mirrors.tuna.tsinghua.edu.cn
 * updates: mirrors.tuna.tsinghua.edu.cn
Package libxml2-devel-2.9.1-6.el7_2.3.x86_64 already installed and latest version
Package 1:openssl-devel-1.0.2k-8.el7.x86_64 already installed and latest version
Package gcc-4.8.5-16.el7_4.1.x86_64 already installed and latest version
Package gcc-c++-4.8.5-16.el7_4.1.x86_64 already installed and latest version
Package boost-devel-1.53.0-27.el7.x86_64 already installed and latest version
No package libtool-y available.
Nothing to do
[root@master torque-6.1.2]# ./configure --prefix=/usr/local/torque --with-scp --with-default-server=master
...... (configure output omitted)
Building components: server=yes mom=yes clients=yes gui=no drmaa=no pam=no
PBS Machine type    : linux
Remote copy         : /bin/scp -rpB
PBS home            : /var/spool/torque
Default server      : master
Unix Domain sockets :
Linux cpusets       : no
Tcl                 : disabled
Tk                  : disabled
Authentication      : trqauthd
configure: WARNING: This compilation has strict compiler options enabled that cause
the build to fail if any compiler warnings are emitted.  If this build fails
because of a harmless warning, please report the problem to torqueusers@supercluster.org
and run configure again without --enable-gcc-warnings.
Ready for 'make'.
[root@master torque-6.1.2]# make
...... (build output omitted)
[root@master torque-6.1.2]# make install
...... (install output omitted)
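A missing space can merge two configure flags into one unrecognized option. One way to prevent that is to keep the flags in a bash array, with one flag per element. This is a minimal sketch. `configure_args`, `TORQUE_PREFIX` and `TORQUE_SERVER` are hypothetical names, and the values mirror the configure line used in this article.

```shell
#!/usr/bin/env bash
# Hypothetical settings mirroring this article's configure line.
TORQUE_PREFIX=/usr/local/torque
TORQUE_SERVER=master

# Print one configure flag per line; each array element stays a separate argument.
configure_args() {
  local args=(
    "--prefix=${TORQUE_PREFIX}"
    "--with-scp"
    "--with-default-server=${TORQUE_SERVER}"
  )
  printf '%s\n' "${args[@]}"
}

# Usage inside the unpacked torque-6.1.2 directory:
#   mapfile -t ARGS < <(configure_args)
#   ./configure "${ARGS[@]}" && make && make install
```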
[root@master torque-6.1.2]# make packages
  Building packages from /opt/torque/torque-6.1.2/tpackages
  rm -rf /opt/torque/torque-6.1.2/tpackages
  mkdir /opt/torque/torque-6.1.2/tpackages
  Building ./torque-package-server-linux-x86_64.sh ...
  libtool: install: warning: remember to run `libtool --finish /usr/local/torque/lib'          //you need to run: libtool --finish /usr/local/torque/lib
  Building ./torque-package-mom-linux-x86_64.sh ...
  libtool: install: warning: remember to run `libtool --finish /usr/local/torque/lib'
  Building ./torque-package-clients-linux-x86_64.sh ...
  libtool: install: warning: remember to run `libtool --finish /usr/local/torque/lib'
  Building ./torque-package-devel-linux-x86_64.sh ...
  libtool: install: warning: remember to run `libtool --finish /usr/local/torque/lib'
  Building ./torque-package-doc-linux-x86_64.sh ...
  Done.

  The package files are self-extracting packages that can be copied
  and executed on your production machines. Use --help for options.
[root@master torque-6.1.2]# libtool --finish /usr/local/torque/lib
  libtool: finish: PATH="/usr/lib/jvm/java-1.7.0-openjdk/bin:/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/root/bin:/usr/local/torque/bin:/usr/local/torque/sbin:/root/bin:/sbin" ldconfig -n /usr/local/torque/lib
  ----------------------------------------------------------------------
  Libraries have been installed in:
     /usr/local/torque/lib

  If you ever happen to want to link against installed libraries
  in a given directory, LIBDIR, you must either use libtool, and
  specify the full pathname of the library, or use the `-LLIBDIR'
  flag during linking and do at least one of the following:
     - add LIBDIR to the `LD_LIBRARY_PATH' environment variable
       during execution
     - add LIBDIR to the `LD_RUN_PATH' environment variable
       during linking
     - use the `-Wl,-rpath -Wl,LIBDIR' linker flag
     - have your system administrator add LIBDIR to `/etc/ld.so.conf'

  See any operating system documentation about shared libraries for
  more information, such as the ld(1) and ld.so(8) manual pages.
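libtool lists several ways to make /usr/local/torque/lib visible to programs. Below is a sketch of the `/etc/ld.so.conf` option. It assumes the system reads drop-in files from /etc/ld.so.conf.d, which CentOS 7 does. The function name and the `torque.conf` file name are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: register the Torque library directory with the dynamic linker.
# Assumption: the linker reads drop-in files from /etc/ld.so.conf.d (true on CentOS 7).
register_torque_libdir() {
  local libdir="${1:-/usr/local/torque/lib}"
  local confdir="${2:-/etc/ld.so.conf.d}"
  local conf="${confdir}/torque.conf"
  # Overwrite rather than append, so rerunning the step stays idempotent.
  printf '%s\n' "$libdir" > "$conf"
  # Only refresh the linker cache when writing to the real system directory.
  if [ "$confdir" = /etc/ld.so.conf.d ]; then
    ldconfig
  fi
}

# Usage (as root):
#   register_torque_libdir
#   ldconfig -p | grep torque   # the Torque libraries should now be listed
```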

 

Install and register the services: pbs_server, pbs_sched, pbs_mom and trqauthd

[root@master torque-6.1.2]# cp contrib/init.d/{pbs_{server,sched,mom},trqauthd} /etc/init.d/
[root@master torque-6.1.2]# for i in pbs_server pbs_sched pbs_mom trqauthd; do chkconfig --add $i; chkconfig $i on; done      //if prompted y/n, enter y and press Enter

Set the Torque environment variables

[root@master torque-6.1.2]# TORQUE=/usr/local/torque
[root@master torque-6.1.2]# echo "TORQUE=$TORQUE" >> /etc/profile
[root@master torque-6.1.2]# echo "export PATH=\$PATH:$TORQUE/bin:$TORQUE/sbin" >> /etc/profile
[root@master torque-6.1.2]# source /etc/profile
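Appending to /etc/profile adds duplicate lines every time the step is repeated. The sketch below writes a separate file instead. It assumes login shells source /etc/profile.d/*.sh, which is the CentOS 7 default. `write_torque_profile` and the torque.sh file name are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write the Torque PATH setup to its own file.
# Assumption: login shells source /etc/profile.d/*.sh (CentOS 7 default).
write_torque_profile() {
  local torque="${1:-/usr/local/torque}"
  local target="${2:-/etc/profile.d/torque.sh}"
  # ">" overwrites, so running this twice leaves exactly one copy of each line.
  # \$PATH and \$TORQUE stay literal and are expanded when the file is sourced.
  cat > "$target" <<EOF
TORQUE=${torque}
export PATH=\$PATH:\$TORQUE/bin:\$TORQUE/sbin
EOF
}

# Usage (as root):
#   write_torque_profile && source /etc/profile.d/torque.sh
#   command -v qsub    # expected: /usr/local/torque/bin/qsub
```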

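Initialize Torque as root. This failed with an error saying that the hostname the service points to does not match the current hostname. I found no fix during the installation. A fix found after the installation is described at the end of this article.

[root@master torque-6.1.2]# ./torque.setup root          //try to initialize as root; error: pbs_server is already running
initializing TORQUE (admin: root)
pbs_server already running... run 'qterm' to stop pbs_server and rerun          //run qterm to stop the server
[root@master torque-6.1.2]# qterm                        //the server points to a hostname different from the one displayed, so qterm cannot stop it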
Can not resolve name for server mastar. (rc = -2 - )
Cannot resolve specified server host 'mastar'.
qterm: could not connect to server '' (15010) Access from host not allowed, or unknown host
[root@master mom_priv]# ps -e | grep pbs          //look up the process and stop it with kill -9
30505 ?        00:00:00 pbs_server
[root@master mom_priv]# kill -9 30505
[root@master mom_priv]# ps -e | grep pbs
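[root@master torque-6.1.2]# ./torque.setup root        //even after stopping the service, initialization still fails: the server points to a hostname that differs from the current one. I checked that the configure line above (./configure --prefix=/usr/local/torque --with-scp --with-default-server=master) was correct. No fix found; possibly a cached setting on the system.
initializing TORQUE (admin: root)              //for now, the only workaround is to edit /etc/hosts
You have selected to start pbs_server in create mode.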
If the server database exists it will be overwritten.
do you wish to continue y/(n)?y
Can not resolve name for server mastar. (rc = -2 - )
Cannot resolve specified server host 'mastar'.
qmgr: cannot connect to server  (errno=15010) Access from host not allowed, or unknown host
ERROR: cannot set root@master in operators list
Can not resolve name for server mastar. (rc = -2 - )
Cannot resolve specified server host 'mastar'.
qterm: could not connect to server '' (15010) Access from host not allowed, or unknown host
[root@master torque-6.1.2]# vi /etc/hosts            //edit /etc/hosts
10.131.101.142   master
10.131.101.142   mastar        //add this line
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
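The /etc/hosts workaround can be scripted so that rerunning it never adds a duplicate line. This is a minimal sketch. `add_hosts_entry` is a hypothetical helper, and the IP address and names are the ones used in this article.

```shell
#!/usr/bin/env bash
# Hypothetical helper: append "IP name" to a hosts file unless the name is already mapped.
add_hosts_entry() {
  local ip="$1" name="$2" hosts="${3:-/etc/hosts}"
  # Ignore comment lines; look for the name in any hostname/alias column.
  if awk -v n="$name" '$1 !~ /^#/ { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
                       END { exit !found }' "$hosts"; then
    return 0
  fi
  printf '%s   %s\n' "$ip" "$name" >> "$hosts"
}

# Usage (as root):
#   add_hosts_entry 10.131.101.142 master
#   add_hosts_entry 10.131.101.142 mastar   # the workaround used above
#   getent hosts mastar
```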

[root@master torque-6.1.2]# ./torque.setup root            //this time it succeeds
initializing TORQUE (admin: root)

You have selected to start pbs_server in create mode.
If the server database exists it will be overwritten.
do you wish to continue y/(n)?y          //enter y

Start the pbs_server, pbs_sched, pbs_mom and trqauthd services

[root@master torque-6.1.2]# qterm          //stop pbs_server
[root@master torque-6.1.2]# for i in pbs_server pbs_sched pbs_mom trqauthd; do service $i start; done
Starting pbs_server (via systemctl):                       [  OK  ]
Starting pbs_sched (via systemctl):                        [  OK  ]
Starting pbs_mom (via systemctl):                          [  OK  ]
Starting trqauthd (via systemctl):                         [  OK  ]

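Specify the compute node

Add the compute node "master" and set its number of CPUs (np).

Check the CPU count with the lscpu or nproc command.

[root@master torque-6.1.2]# vi /var/spool/torque/server_priv/nodes
master np=8          //add this line; no spaces around the equals sign. master is the hostname
[root@master torque-6.1.2]# vi /var/spool/torque/mom_priv/config
$pbsserver master        //add these two lines; master is the hostname. MOM parameters are written with a leading $
$logevent 255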
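The two files above can also be generated by a script. This sketch assumes PBS home is /var/spool/torque, as the configure summary reports. `write_node_config` is hypothetical. np defaults to `nproc`, which counts all available processing units. qnodes reports ncpus=56 on this machine, so pass a smaller value such as 8 if you want to reserve cores. Torque MOM parameters in mom_priv/config are written with a leading `$`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write server_priv/nodes and mom_priv/config for one node.
# Assumption: PBS home is /var/spool/torque unless a third argument is given.
write_node_config() {
  local host="${1:-$(hostname)}"
  local np="${2:-$(nproc)}"
  local pbs_home="${3:-/var/spool/torque}"
  # No spaces around "=" in the nodes file.
  printf '%s np=%s\n' "$host" "$np" > "${pbs_home}/server_priv/nodes"
  # Single quotes keep the literal "$" prefix of the MOM parameters.
  printf '$pbsserver %s\n$logevent 255\n' "$host" > "${pbs_home}/mom_priv/config"
}

# Usage (as root): write_node_config master 8
```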

Check the PBS processes, then restart the services

[root@master torque-6.1.2]# ps -e | grep pbs
11188 ?        00:00:00 pbs_sched
11215 ?        00:00:00 pbs_mom
29683 ?        00:00:00 pbs_server
[root@master torque-6.1.2]# for i in pbs_server pbs_sched pbs_mom trqauthd; do service $i restart; done
Restarting pbs_server (via systemctl):                     [  OK  ]
Restarting pbs_sched (via systemctl):                      [  OK  ]
Restarting pbs_mom (via systemctl):                        [  OK  ]
Restarting trqauthd (via systemctl):                       [  OK  ]

Create the default queue

[root@master torque-6.1.2]# qmgr -c 'create queue master'
[root@master torque-6.1.2]# qmgr -c 'set queue master queue_type= execution'
[root@master torque-6.1.2]# qmgr -c 'set queue master started= true'
[root@master torque-6.1.2]# qmgr -c 'set queue master enabled= true'
[root@master torque-6.1.2]# qmgr -c 'set queue master resources_default.walltime= 240:00:00'
[root@master torque-6.1.2]# qmgr -c 'set queue master resources_default.nodes= 1'
[root@master torque-6.1.2]# qmgr -c 'set server default_queue= master'
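The seven qmgr calls can be sent as one batch, because qmgr reads directives from standard input when no -c option is given. This is a sketch. `queue_directives` is hypothetical, and the queue name defaults to master as above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the queue setup above as qmgr directives.
queue_directives() {
  local q="${1:-master}"
  cat <<EOF
create queue ${q}
set queue ${q} queue_type = execution
set queue ${q} started = true
set queue ${q} enabled = true
set queue ${q} resources_default.walltime = 240:00:00
set queue ${q} resources_default.nodes = 1
set server default_queue = ${q}
EOF
}

# Usage (as root, with pbs_server and trqauthd running):
#   queue_directives master | qmgr
#   qmgr -c 'print queue master'    # review the result
```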

Submit a test job:

[root@master torque-6.1.2]# qnodes      //check the status of the compute node
master
state = free
power_state = Running
np = 8
ntype = cluster
status = opsys=linux,uname=Linux master 3.10.0-514.el7.x86_64 #1 SMP Tue Nov 22 16:42:41 UTC 2016 x86_64,sessions=3154 3489 41105 41699,nsessions=4,nusers=3,idletime=3198,totmem=94868512kb,availmem=92195284kb,physmem=32367652kb,ncpus=56,loadave=0.85,gres=,netload=4005925534,state=free,varattr= ,cpuclock=Fixed,macaddr=68:cc:6e:c3:cf:87,version=6.1.2,rectime=1519980694,jobs=
mom_service_port = 15002
mom_manager_port = 15003

[root@master torque-6.1.2]# su master        //switch user: here master is the name of a user account, not the hostname
[master@master torque-6.1.2]$ echo sleep 10 | qsub
0.master
[master@master torque-6.1.2]$ qstat        //check the job status
Job ID                    Name             User            Time Use S Queue
------------------------- ---------------- --------------- -------- - -----
0.master                   STDIN            master                 0 R master
[master@master torque-6.1.2]$ qstat -a -n      //show job status and the CPU cores used by each job
master:
                                                                                  Req'd       Req'd       Elap
Job ID                  Username    Queue    Jobname          SessID  NDS   TSK   Memory      Time    S   Time
----------------------- ----------- -------- ---------------- ------ ----- ------ --------- --------- - ---------
0.master                master      master   STDIN             12470     1      1       --  240:00:00 C       --
   master/0
[master@master torque-6.1.2]$ 
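Instead of piping a command into qsub, you can write a test job as a script with its resources spelled out as #PBS directives. This is a sketch. The job name, walltime and file name are hypothetical, while the queue name matches the queue created above. Submit it as an ordinary user, as in the test above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write a job script equivalent to "echo sleep 10 | qsub".
write_test_job() {
  local out="${1:-sleep_test.pbs}"
  # Quoted 'EOF': $PBS_O_WORKDIR and $(hostname) are expanded when the job runs.
  cat > "$out" <<'EOF'
#!/bin/bash
#PBS -N sleep_test
#PBS -q master
#PBS -l nodes=1:ppn=1
#PBS -l walltime=00:05:00
#PBS -j oe
cd "$PBS_O_WORKDIR"
sleep 10
echo "ran on $(hostname)"
EOF
}

# Usage (as an ordinary user):
#   write_test_job && qsub sleep_test.pbs
#   qstat -a -n
```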

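Fix for the hostname mismatch:

I never found the cause of this problem. I suspect the previous Torque installation was not removed completely, so state cached from the "create the default queue" step was still present.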
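One way to test the stale-state suspicion is to compare the server name Torque has recorded with the current hostname. This sketch assumes that Torque keeps the default server name in the server_name file under PBS home (/var/spool/torque/server_name here). A leftover "mastar" in that file would be consistent with the errors above, but this is not confirmed. `check_server_name` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: compare the recorded default server name with the hostname.
# Assumption: the name is stored in $PBS_HOME/server_name.
check_server_name() {
  local file="${1:-/var/spool/torque/server_name}"
  local host="${2:-$(hostname)}"
  local recorded
  # First line only, with whitespace stripped; empty if the file is missing.
  recorded=$(head -n1 "$file" 2>/dev/null | tr -d '[:space:]')
  if [ "$recorded" = "$host" ]; then
    echo "ok: $recorded"
    return 0
  fi
  echo "mismatch: server_name='${recorded}' hostname='${host}'"
  return 1
}

# Usage (as root, with the services stopped):
#   check_server_name || echo master > /var/spool/torque/server_name
```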

After Torque has been installed successfully, stop it:

[root@master torque-6.1.2]# for i in pbs_server pbs_sched pbs_mom trqauthd; do service $i stop; done        //stop the Torque services (the same loop with start changed to stop)
Stopping pbs_server (via systemctl):                       [  OK  ]
Stopping pbs_sched (via systemctl):                        [  OK  ]
Stopping pbs_mom (via systemctl):                          [  OK  ]
Stopping trqauthd (via systemctl):                         [  OK  ]
[root@master torque-6.1.2]# ./torque.setup root        //rerun this step
hostname: master
Currently no servers active. Default server will be listed as active server. Error  15133
Active server name: master  pbs_server port is: 15001
trqauthd daemonized - port /tmp/trqauthd-unix
trqauthd successfully started
initializing TORQUE (admin: root)
You have selected to start pbs_server in create mode.
If the server database exists it will be overwritten.
do you wish to continue y/(n)?y          //enter y
[root@master torque-6.1.2]# vi /var/spool/torque/server_priv/nodes
master np=8           //no spaces around the equals sign
[root@master torque-6.1.2]# qterm          //stop pbs_server
[root@master torque-6.1.2]# for i in pbs_server pbs_sched pbs_mom trqauthd; do service $i start; done        //start the services again
Starting pbs_server (via systemctl): [ OK ]
Starting pbs_sched (via systemctl): [ OK ]
Starting pbs_mom (via systemctl): [ OK ]
Starting trqauthd (via systemctl): [ OK ]

[root@master torque-6.1.2]# qnodes          //check status; error: trqauthd is not running
socket_connect_unix failed: 15137
qnodes: cannot connect to server master, error=15137 (could not connect to trqauthd)
[root@master torque-6.1.2]# for i in pbs_server pbs_sched pbs_mom trqauthd; do service $i restart; done        //restart the services
Restarting pbs_server (via systemctl): [ OK ]
Restarting pbs_sched (via systemctl): [ OK ]
Restarting pbs_mom (via systemctl): [ OK ]
Restarting trqauthd (via systemctl): [ OK ]
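Error 15137 means qnodes could not reach trqauthd. The sketch below starts trqauthd first and then polls qnodes for a while, instead of restarting everything again. The start order, the retry counts, and the helper names `manage_torque` and `retry` are assumptions, not part of Torque.

```shell
#!/usr/bin/env bash
# Assumed order: trqauthd first, because client commands such as qnodes go through it.
TORQUE_SERVICES=(trqauthd pbs_server pbs_mom pbs_sched)

# Hypothetical helper: run "service <name> <action>" for each daemon, stopping on failure.
manage_torque() {
  local action="$1" s
  for s in "${TORQUE_SERVICES[@]}"; do
    service "$s" "$action" || return 1
  done
}

# Hypothetical helper: run a command up to $1 times, sleeping $2 seconds between tries.
retry() {
  local tries="$1" delay="$2" n
  shift 2
  for ((n = 1; n <= tries; n++)); do
    "$@" && return 0
    [ "$n" -lt "$tries" ] && sleep "$delay"
  done
  return 1
}

# Usage (as root):
#   manage_torque restart && retry 10 2 qnodes
```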

[root@master torque-6.1.2]# qnodes      //check status: success
master
state = free
power_state = Running
np = 8
ntype = cluster
status = opsys=linux,uname=Linux master 3.10.0-514.el7.x86_64 #1 SMP Tue Nov 22 16:42:41 UTC 2016 x86_64,sessions=3154 3489 10903 41105 41699,nsessions=5,nusers=4,idletime=5287,totmem=94868512kb,availmem=92236268kb,physmem=32367652kb,ncpus=56,loadave=0.01,gres=,netload=8920006882,state=free,varattr= ,cpuclock=Fixed,macaddr=68:cc:6e:c3:cf:87,version=6.1.2,rectime=1519982783,jobs=
mom_service_port = 15002
mom_manager_port = 15003
