12: NodeManager
Health Checker Service
Disk Checker
Configuration Name | Allowed Values | Description |
---|---|---|
yarn.nodemanager.disk-health-checker.enable | true, false | Enable or disable the disk health checker service |
yarn.nodemanager.disk-health-checker.interval-ms | Positive integer | The interval, in milliseconds, at which the disk checker should run; the default value is 2 minutes |
yarn.nodemanager.disk-health-checker.min-healthy-disks | Float between 0-1 | The minimum fraction of disks that must pass the check for the NodeManager to mark the node as healthy; the default is 0.25 |
yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage | Float between 0-100 | The maximum percentage of disk space that may be utilized before a disk is marked as unhealthy by the disk checker service. This check is run for every disk used by the NodeManager. The default value is 90 i.e. 90% of the disk can be used. |
yarn.nodemanager.disk-health-checker.min-free-space-per-disk-mb | Integer | The minimum amount of free space that must be available on the disk for the disk checker service to mark the disk as healthy. This check is run for every disk used by the NodeManager. The default value is 0 i.e. the entire disk can be used. |
Source: http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/NodeManager.html
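To make the interaction of the three thresholds concrete, here is a minimal Python sketch of the decision they describe. It is a simplified model, not the NodeManager's actual implementation: the `Disk` class and both function names are hypothetical, the handling of values exactly at a threshold is an assumption, and the real disk checker also verifies things this model ignores, such as directory permissions.

```python
from dataclasses import dataclass


@dataclass
class Disk:
    """Hypothetical record describing one disk used by the NodeManager."""
    path: str
    total_mb: int
    used_mb: int


def disk_is_healthy(disk, max_util_pct=90.0, min_free_mb=0):
    """Apply the two per-disk thresholds (defaults mirror the table)."""
    utilization_pct = 100.0 * disk.used_mb / disk.total_mb
    free_mb = disk.total_mb - disk.used_mb
    # Assumption: a disk exactly at a threshold still counts as healthy.
    return utilization_pct <= max_util_pct and free_mb >= min_free_mb


def node_is_healthy(disks, min_healthy_fraction=0.25, **thresholds):
    """The node is healthy if a large enough fraction of disks pass."""
    if not disks:
        return False
    healthy = sum(disk_is_healthy(d, **thresholds) for d in disks)
    return healthy / len(disks) >= min_healthy_fraction
```

With the default `min-healthy-disks` of 0.25, a node with four disks stays healthy in this model as long as at least one disk passes both per-disk checks.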
External Health Script
Configuration Name | Allowed Values | Description |
---|---|---|
yarn.nodemanager.health-checker.interval-ms | Positive integer | The interval, in milliseconds, at which health checker service runs; the default value is 10 minutes. |
yarn.nodemanager.health-checker.script.timeout-ms | Positive integer | The timeout for the health script that’s executed; the default value is 20 minutes. |
yarn.nodemanager.health-checker.script.path | String | Absolute path to the health check script to be run. |
yarn.nodemanager.health-checker.script.opts | String | Arguments to be passed to the script when the script is executed. |
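The script at `yarn.nodemanager.health-checker.script.path` can be any executable. According to the Hadoop documentation, it reports an unhealthy node by printing a line to standard output that begins with `ERROR`. Below is a minimal sketch of such a script in Python. The specific check (whether a temporary file can be created in each directory) and the function names are assumptions chosen for illustration. The directories are taken from the command line, so they would be supplied through `yarn.nodemanager.health-checker.script.opts`.

```python
#!/usr/bin/env python3
import sys
import tempfile


def check_writable(directory):
    """Return an ERROR line if no temp file can be created, else None."""
    try:
        with tempfile.TemporaryFile(dir=directory):
            pass
    except OSError as exc:
        return f"ERROR cannot write to {directory}: {exc}"
    return None


def run_checks(directories):
    """Collect the ERROR lines produced by all directory checks."""
    results = (check_writable(d) for d in directories)
    return [line for line in results if line is not None]


if __name__ == "__main__":
    # Arguments come from yarn.nodemanager.health-checker.script.opts.
    for line in run_checks(sys.argv[1:]):
        print(line)
```

A script like this should finish well within `yarn.nodemanager.health-checker.script.timeout-ms`, so it is best to keep each check cheap.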
NodeManager Restart
Step 1. To enable NM Restart functionality, set the following property in conf/yarn-site.xml to true.
Property | Value |
---|---|
yarn.nodemanager.recovery.enabled | true (the default value is false) |
Step 2. Configure a path to the local file-system directory where the NodeManager can save its run state. This directory is the state store.
Property | Description |
---|---|
yarn.nodemanager.recovery.dir | The local filesystem directory in which the node manager will store state when recovery is enabled. The default value is set to $hadoop.tmp.dir/yarn-nm-recovery. |
Step 3. Configure a valid RPC address for the NodeManager. After a restart the NM may otherwise bind to a different port, which breaks existing client connections, so replace the random (ephemeral) port with a fixed one.
Property | Description |
---|---|
yarn.nodemanager.address | Ephemeral ports (port 0, which is the default) cannot be used for the NodeManager’s RPC server specified via yarn.nodemanager.address, because the NM could then use different ports before and after a restart. This would break any previously running clients that were communicating with the NM before the restart. Explicitly setting yarn.nodemanager.address to an address with a specific port number (e.g. 0.0.0.0:45454) is a precondition for enabling NM restart. |
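Steps 1 to 3 can be checked mechanically. The following Python sketch parses a Hadoop-style configuration document and lists what would still prevent NM restart from working. The function names are hypothetical, and the recovery directory in the sample is an illustrative path, not a recommended value. Only the port `45454` comes from the text above.

```python
import xml.etree.ElementTree as ET


def load_properties(xml_text):
    """Parse Hadoop-style <configuration> XML into a name -> value dict."""
    root = ET.fromstring(xml_text)
    return {
        prop.findtext("name").strip(): (prop.findtext("value") or "").strip()
        for prop in root.iter("property")
    }


def restart_problems(props):
    """List the reasons this configuration would not support NM restart."""
    problems = []
    enabled = props.get("yarn.nodemanager.recovery.enabled", "false")
    if enabled.lower() != "true":
        problems.append("yarn.nodemanager.recovery.enabled is not true")
    # A missing address or port 0 means an ephemeral port would be used.
    port = props.get("yarn.nodemanager.address", "").rpartition(":")[2]
    if not port.isdigit() or int(port) == 0:
        problems.append("yarn.nodemanager.address must use a fixed, non-zero port")
    return problems


# Sample yarn-site.xml content; the recovery.dir path is an assumption.
SAMPLE = """
<configuration>
  <property>
    <name>yarn.nodemanager.recovery.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>yarn.nodemanager.recovery.dir</name>
    <value>/var/lib/hadoop-yarn/nm-recovery</value>
  </property>
  <property>
    <name>yarn.nodemanager.address</name>
    <value>0.0.0.0:45454</value>
  </property>
</configuration>
"""

print(restart_problems(load_properties(SAMPLE)))
```

For the sample above the list of problems is empty. Changing the address to `0.0.0.0:0`, or omitting it, would produce the fixed-port complaint.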
Step 4. Auxiliary services. Any auxiliary service running in the NM should itself support restart.
NodeManagers in a YARN cluster can be configured to run auxiliary services. For a completely functional NM restart, YARN relies on any auxiliary service configured to also support recovery. This usually includes (1) avoiding usage of ephemeral ports so that previously running clients (in this case, usually containers) are not disrupted after restart and (2) having the auxiliary service itself support recoverability by reloading any previous state when NodeManager restarts and reinitializes the auxiliary service.
A simple example of the above is the auxiliary service ‘ShuffleHandler’ for MapReduce (MR). ShuffleHandler already meets both requirements, so users/admins don’t have to do anything for it to support NM restart: (1) The configuration property mapreduce.shuffle.port controls which port the ShuffleHandler on a NodeManager host binds to, and it defaults to a non-ephemeral port. (2) The ShuffleHandler service also already supports recovery of previous state after NM restarts. In short, ShuffleHandler supports NM restart out of the box.
Reposted from: https://www.cnblogs.com/skyrim/p/7455990.html