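So what exactly is a hotspot?

Note that this is not "hot spot" in the general sense.

It is hotspotting, a concept in HBase.

The row key design section of the official documentation describes it explicitly: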

Hotspotting

Rows in HBase are sorted lexicographically by row key. This design optimizes for scans, allowing you to store related rows, or rows that will be read together, near each other. However, poorly designed row keys are a common source of hotspotting. Hotspotting occurs when a large amount of client traffic is directed at one node, or only a few nodes, of a cluster. This traffic may represent reads, writes, or other operations. The traffic overwhelms the single machine responsible for hosting that region, causing performance degradation and potentially leading to region unavailability. This can also have adverse effects on other regions hosted by the same region server as that host is unable to service the requested load. It is important to design data access patterns such that the cluster is fully and evenly utilized.

To prevent hotspotting on writes, design your row keys such that rows that truly do need to be in the same region are, but in the bigger picture, data is being written to multiple regions across the cluster, rather than one at a time. Some common techniques for avoiding hotspotting are described below, along with some of their advantages and drawbacks.
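To make the effect concrete, here is a minimal Python sketch; it is not HBase client code. It assumes a hypothetical table that is pre-split into one region per leading letter of the row key, and it counts how many writes each region would receive. The names `region_for` and `writes_per_region` are illustrative only and do not come from HBase.

```python
from collections import Counter

def region_for(row_key: str) -> str:
    """Hypothetical split: one region per leading character of the row key."""
    return row_key[0]

def writes_per_region(row_keys):
    """Count how many writes land in each (hypothetical) region."""
    return Counter(region_for(k) for k in row_keys)

# Sequential keys that share a prefix sort next to each other,
# so every write is routed to the same region.
keys = [f"foo{i:04d}" for i in range(1, 1001)]
load = writes_per_region(keys)
print(load)  # Counter({'f': 1000})
```

All 1000 writes hit the single region for 'f' while every other region sits idle, which is the hotspotting pattern described above.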

Salting

Salting in this sense has nothing to do with cryptography, but refers to adding random data to the start of a row key. In this case, salting refers to adding a randomly-assigned prefix to the row key to cause it to sort differently than it otherwise would. The number of possible prefixes corresponds to the number of regions you want to spread the data across. Salting can be helpful if you have a few "hot" row key patterns which come up over and over amongst other more evenly-distributed rows. Consider the following example, which shows that salting can spread write load across multiple RegionServers, and illustrates some of the negative implications for reads.

Example 11. Salting Example

Suppose you have the following list of row keys, and your table is split such that there is one region for each letter of the alphabet. Prefix 'a' is one region, prefix 'b' is another. In this table, all rows starting with 'f' are in the same region. This example focuses on rows with keys like the following:
foo0001
foo0002
foo0003
foo0004
Now, imagine that you would like to spread these across four different regions. You decide to use four different salts: a, b, c, and d. In this scenario, each of these letter prefixes will be on a different region. After applying the salts, you have the following rowkeys instead. Since you can now write to four separate regions, you theoretically have four times the throughput when writing that you would have if all the writes were going to the same region.
a-foo0003
b-foo0001
c-foo0004
d-foo0002
Then, if you add another row, it will randomly be assigned one of the four possible salt values and end up near one of the existing rows.
a-foo0003
b-foo0001
c-foo0003
c-foo0004
d-foo0002
Since this assignment will be random, you will need to do more work if you want to retrieve the rows in lexicographic order. In this way, salting attempts to increase throughput on writes, but has a cost during reads.
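The salting example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `a-` style prefixes follow the example, while the helper names `salt_key` and `candidate_keys` and the in-memory set standing in for the table are hypothetical.

```python
import random

SALTS = ["a", "b", "c", "d"]  # one salt per target region, as in the example

def salt_key(row_key: str, rng: random.Random) -> str:
    """Prepend a randomly chosen salt so the row sorts into a different region."""
    return f"{rng.choice(SALTS)}-{row_key}"

def candidate_keys(row_key: str) -> list:
    """A reader does not know which salt was used, so it must try every one."""
    return [f"{salt}-{row_key}" for salt in SALTS]

rng = random.Random()
# A set stands in for the table; each original key is stored under one salt.
stored = {salt_key(k, rng) for k in ["foo0001", "foo0002", "foo0003", "foo0004"]}

# Reading foo0003 back may take up to len(SALTS) lookups instead of one.
hits = [k for k in candidate_keys("foo0003") if k in stored]
print(hits)
```

The write path is a single cheap prefix choice, while the read path fans out across every salt, which is the read cost the example describes.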

Hashing

Instead of a random assignment, you could use a one-way hash that would cause a given row to always be "salted" with the same prefix, in a way that would spread the load across the RegionServers, but allow for predictability during reads. Using a deterministic hash allows the client to reconstruct the complete rowkey and use a Get operation to retrieve that row as normal.

Example 12. Hashing Example

Given the same situation in the salting example above, you could instead apply a one-way hash that would cause the row with key foo0003 to always, and predictably, receive the a prefix. Then, to retrieve that row, you would already know the key. You could also optimize things so that certain pairs of keys were always in the same region, for instance.
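A deterministic prefix can be sketched as follows. The choice of SHA-256, the use of the first digest byte, and the four-prefix list are assumptions made for illustration; the documentation does not prescribe a particular hash, and under this particular scheme foo0003 will not necessarily map to the a prefix.

```python
import hashlib

PREFIXES = ["a", "b", "c", "d"]  # assumed: one prefix per target region

def hashed_prefix(row_key: str) -> str:
    """Derive the prefix from the key itself, so it is always the same."""
    digest = hashlib.sha256(row_key.encode("utf-8")).digest()
    return PREFIXES[digest[0] % len(PREFIXES)]

def hashed_key(row_key: str) -> str:
    """Build the full stored row key from the original key alone."""
    return f"{hashed_prefix(row_key)}-{row_key}"

# Writer and reader compute the same stored key independently,
# so a single point lookup is enough.
written = hashed_key("foo0003")
looked_up = hashed_key("foo0003")
print(written == looked_up)  # True
```

Because the prefix is a pure function of the original key, the reader needs one lookup rather than one per prefix, at the price of still losing the original lexicographic ordering across prefixes.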

Reversing the Key

A third common trick for preventing hotspotting is to reverse a fixed-width or numeric row key so that the part that changes the most often (the least significant digit) is first. This effectively randomizes row keys, but sacrifices row ordering properties.
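As a rough illustration of key reversal, the Python sketch below reverses fixed-width numeric keys. The eight-digit width and the helper name `reverse_key` are assumptions chosen for the example.

```python
def reverse_key(row_key: str) -> str:
    """Reverse the key so the fastest-changing digit comes first."""
    return row_key[::-1]

# Ten consecutive fixed-width keys: 00000100 .. 00000109.
keys = [f"{i:08d}" for i in range(100, 110)]
reversed_keys = [reverse_key(k) for k in keys]

# Before reversal every key starts with '0'; afterwards the leading
# character cycles through all ten digits.
leading = {k[0] for k in reversed_keys}
print(sorted(leading))  # ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']
```

Reversal is its own inverse, so the original key can always be recovered, but consecutive keys are no longer stored next to each other, which is the ordering trade-off noted above.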

See https://communities.intel.com/community/itpeernetwork/datastack/blog/2013/11/10/discussion-on-designing-hbase-tables, the article on Salted Tables from the Phoenix project, and the discussion in the comments of HBASE-11682 for more information about avoiding hotspotting.
