The official documentation, from https://www.elastic.co/guide/en/elasticsearch/reference/2.2/index-modules.html#_static_index_settings:

index.codec: The default value compresses stored data with LZ4 compression, but this can be set to best_compression, which uses DEFLATE for a higher compression ratio, at the expense of slower stored fields performance.

Note: in 2.1 and earlier this is an experimental feature! It is only stable from 2.2 onward!

Now you can also enable better compression on the cold nodes by setting index.codec: best_compression in their config/elasticsearch.yml file in order to be able to archive more data with the same amount of disk space.

Excerpted from: https://www.elastic.co/blog/store-compression-in-lucene-and-elasticsearch
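Besides the node-level config/elasticsearch.yml setting quoted above, the codec can also be given per index in the settings body of a create-index request, since index.codec is listed under the static index settings. The following is a minimal sketch that only builds that JSON body. The helper name build_index_settings is hypothetical, and the accepted values ("default" and "best_compression") are an assumption based on the 2.2 documentation quoted above.

```python
import json

# Codec values assumed valid for Elasticsearch 2.2 (assumption, see lead-in).
ALLOWED_CODECS = ("default", "best_compression")


def build_index_settings(codec="best_compression"):
    """Build the settings body for a create-index request (hypothetical helper)."""
    if codec not in ALLOWED_CODECS:
        raise ValueError("unsupported index.codec value: %r" % (codec,))
    return {"settings": {"index": {"codec": codec}}}


if __name__ == "__main__":
    # Print the body that would be sent when creating the index.
    print(json.dumps(build_index_settings(), indent=2))
```

Because the setting is static, it has to be supplied when the index is created or while the index is closed. It cannot be changed on an open index.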

The data below is taken from: https://www.elastic.co/blog/elasticsearch-storage-the-true-story-2.0

The test methodology hasn’t changed so you can check out the old blog post or the README in the Github repo for the details.

Structured data file. Original file size: 67644119

| Test | String fields | _all | Index size w/ LZ4 | Index size w/ DEFLATE | Expansion ratio w/ LZ4 | Expansion ratio w/ DEFLATE | Impact of DEFLATE |
|---|---|---|---|---|---|---|---|
| 1 | analyzed and not_analyzed | enabled | 63047579 | 53131592 | 0.932 | 0.785 | -0.157 |
| 2 | analyzed and not_analyzed | disabled | 48271433 | 38327106 | 0.713 | 0.566 | -0.206 |
| 3 | not_analyzed | disabled | 38920800 | 29014796 | 0.575 | 0.428 | -0.254 |
| 3b | not_analyzed, except for 'message' field which is retained and analyzed | disabled | 65382872 | 49532858 | 0.966 | 0.732 | -0.242 |
| 4 | not_analyzed, except for 'agent' field which is analyzed | disabled | 43083702 | 32063602 | 0.636 | 0.474 | -0.255 |

Semi-structured data file. Original file size: 75037027

| Test | String fields | _all | Index size w/ LZ4 | Index size w/ DEFLATE | Expansion ratio w/ LZ4 | Expansion ratio w/ DEFLATE | Impact of DEFLATE |
|---|---|---|---|---|---|---|---|
| 1 | analyzed and not_analyzed | enabled | 100478376 | 82132782 | 1.339 | 1.094 | -0.182 |
| 2 | analyzed and not_analyzed | disabled | 75238480 | 56911638 | 1.002 | 0.758 | -0.243 |
| 3 | not_analyzed | disabled | 71866672 | 53553561 | 0.957 | 0.713 | -0.254 |
| 3b | not_analyzed, except for 'message' field which is retained and analyzed | disabled | 104638750 | 83824398 | 1.394 | 1.117 | -0.198 |
| 4 | not_analyzed, except for 'agent' field which is analyzed | disabled | 72925624 | 54603882 | 0.971 | 0.727 | -0.251 |

With the standard LZ4-based compression, the indexed data size to raw data size ratio ranged from 0.575 to 1.394. After enabling DEFLATE-based compression using the best_compression index.codec option, the indexed data size to raw data size ratio range came down to 0.429 to 1.117. Enabling the best_compression option resulted in a 15.7% to 25.6% reduction in indexed data size depending on the test parameters.

As you can see, the ratio of index size to raw data size can vary greatly based on your mapping configuration, what fields you decide to create/retain, and the characteristics of the data set itself. We encourage you to run similar tests yourself to determine what the data compression/expansion factor is for your data set and application requirements.
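For running a similar comparison on your own data, the two derived columns can be computed from three measured sizes. This is a rough sketch: the function names are hypothetical, and it assumes the "Impact of DEFLATE" column is the relative change in index size from LZ4 to DEFLATE, which matches the figures for test 1 on the structured data file.

```python
def expansion_ratio(index_size, raw_size):
    """Index size divided by the original file size."""
    return index_size / raw_size


def relative_change(lz4_size, deflate_size):
    """Relative change in index size when moving from LZ4 to DEFLATE."""
    return (deflate_size - lz4_size) / lz4_size


if __name__ == "__main__":
    # Figures from test 1 on the structured data file.
    raw, lz4, deflate = 67644119, 63047579, 53131592
    print(round(expansion_ratio(lz4, raw), 3))      # 0.932
    print(round(expansion_ratio(deflate, raw), 3))  # 0.785
    print(round(relative_change(lz4, deflate), 3))  # -0.157
```

A negative relative change means the DEFLATE index is smaller than the LZ4 one.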

Conclusion

There were many amazing features added to Elasticsearch 2.0 worth considering. As we’ve discussed, two of these new features in particular can reduce the hardware footprint required for an Elasticsearch cluster by 15-25% or more: 1) the addition of a best_compression option and 2) enabling doc_values by default. This allows us to get to compression ratios between 0.429 and 1.117.

Reposted from: https://www.cnblogs.com/bonelee/p/6269582.html
