Crunch compression of ETC textures

This blog post describes the basics of Crunch compression and explains in detail how the original Crunch algorithm was modified in order to be able to compress ETC1 and ETC2 textures.

Introduction to Crunch: Compressing DXT textures

Crunch is an open source texture compression library © Richard Geldreich, Jr. and Binomial LLC, available on GitHub. The library was originally designed for compression of DXT textures. The following section describes the main ideas used in the original algorithm.

DXT Encoding

DXT is a block-based texture compression format. The image is split up into 4×4 blocks, and each block is encoded using a fixed number of bits. In the case of the DXT1 format (used for compression of RGB images), each block is encoded using 64 bits. Information about each block is stored using two 16-bit color endpoint values (color0 and color1) and 16 2-bit selector values (one selector value per pixel), which determine how the color of each pixel is computed (it can be either one of the two endpoint colors or a blend between them). According to the DXT1 compression format, there are two different ways to blend the endpoint colors, depending on which endpoint color has the higher value. However, the Crunch algorithm uses a subset of DXT1 encoding (endpoint colors are always ordered in such a way that color0 >= color1). Therefore, when using Crunch compression, the endpoint colors are always blended in the following way:

selector value    pixel color
0                 color0
1                 color1
2                 (2 * color0 + color1) / 3
3                 (color0 + 2 * color1) / 3
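
As a minimal sketch (not the library's actual decoder), the blend table above maps to a small helper like the following, assuming the endpoint colors have already been expanded from RGB565 to 8-bit components:

#include <cstdint>

struct RGB { uint8_t r, g, b; };

// Decode one pixel of a DXT1 block under the Crunch subset (color0 >= color1),
// so the four palette entries are always: color0, color1 and the two 1/3 blends.
RGB decode_dxt1_pixel(RGB color0, RGB color1, unsigned selector)
{
    auto blend = [](uint8_t a, uint8_t b, int wa, int wb) -> uint8_t {
        return static_cast<uint8_t>((wa * a + wb * b) / 3);
    };
    switch (selector & 3) {
        case 0:  return color0;
        case 1:  return color1;
        case 2:  return { blend(color0.r, color1.r, 2, 1),
                          blend(color0.g, color1.g, 2, 1),
                          blend(color0.b, color1.b, 2, 1) };
        default: return { blend(color0.r, color1.r, 1, 2),
                          blend(color0.g, color1.g, 1, 2),
                          blend(color0.b, color1.b, 1, 2) };
    }
}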

DXT encoding can therefore be visually represented in the following way:

[Figure: DXT1 block layout — color0 (RGB565), 16 bits/block; color1 (RGB565), 16 bits/block; selectors, 2 bits/pixel; decoded DXT, 4 bits/pixel]

Each pixel can be decoded by blending the color0 and color1 values according to its selector value.

For simplicity, the information about color0 and color1 can be displayed on the same image (with the upper part of every 4×4 block filled with color0 and the lower part filled with color1). Then all the information necessary for decoding the final texture can be represented in the form of the following two images (the 4×4 blocks are displayed slightly separated from each other):

[Figure: color endpoints, 32 bits/block; color selectors, 32 bits/block]

Tiling

For an average texture it is quite common for neighboring blocks to have similar endpoints. This property can be used to improve the compression ratio. In order to achieve this, Crunch introduces the concept of "chunks". All the texture blocks are split into "chunks" of 2×2 blocks (the size of each chunk is 8×8 pixels), and each chunk is associated with one of the following 8 chunk types:

Of course, the described chunk types don't cover all the possible combinations of matching endpoints, but in exchange the information about matching endpoints can be encoded very efficiently. Specifically, encoding the chunk type requires 3 bits per 4 blocks (0.75 bits per block, uncompressed).

The Crunch algorithm can force neighboring blocks within a chunk to share identical endpoints when the extra accuracy of the encoded colors isn't worth spending extra bits on additional endpoints. This is achieved in the following way. First, each chunk is encoded in 8 different ways, corresponding to the 8 chunk types described above (instead of running DXT1 optimization for each block, the algorithm runs DXT1 optimization for each tile). The quality of each encoding is then evaluated as the PSNR multiplied by a coefficient associated with the used chunk type, and the optimal encoding is selected. The trick here is that chunk types with a higher number of matching endpoints also have higher quality coefficients. In other words, if using the same endpoint for two neighboring blocks within a chunk doesn't reduce the PSNR much, then the algorithm will most likely select a chunk type where those neighboring blocks belong to the same tile. The described process is referred to as "tiling".

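A rough sketch of this selection step might look as follows; encode_fn and the per-type quality weights are stand-ins for the library's internal tile optimizer and tuning constants, not the actual Crunch API:

#include <functional>

// Result of encoding one 8x8 chunk with a particular chunk type (tile layout).
struct ChunkEncoding { double psnr = 0.0; /* tile endpoints, selectors, ... */ };

// encode_fn stands in for the per-tile DXT1 optimizer: given a chunk type it
// returns the encoding and its PSNR. quality_weight holds the per-type
// coefficients; types with more shared endpoints get larger weights, which
// biases the choice towards fewer endpoints whenever the PSNR loss is small.
int pick_chunk_type(const std::function<ChunkEncoding(int)>& encode_fn,
                    const double quality_weight[8],
                    ChunkEncoding* best_out)
{
    int best_type = 0;
    double best_score = -1.0;
    for (int type = 0; type < 8; ++type) {
        ChunkEncoding enc = encode_fn(type);
        double score = enc.psnr * quality_weight[type];
        if (score > best_score) {
            best_score = score;
            best_type = type;
            *best_out = enc;
        }
    }
    return best_type;
}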

Quantization

The basic idea of Crunch compression is to quantize the determined endpoint and selector blocks in order to encode them more efficiently. This is achieved using vector quantization. The idea is similar to color quantization, where a color image is represented using a color palette and palette indices defined for each pixel.

In order to perform vector quantization, each endpoint pair should be represented with a vector. For example, it is possible to represent a tile endpoint pair with the vector (color0.r, color0.g, color0.b, color1.r, color1.g, color1.b), where color0 and color1 are obtained from DXT1 optimization. However, such a representation doesn't reflect the continuity properties of the source texture very well (for example, in the case of a solid block, a small change of the block color might result in a significant change of the optimal color0 and color1 used to encode it). Instead, the Crunch algorithm uses a different representation. The source pixels of each tile, represented by their (r, g, b) vectors, are split into 2 clusters using vector quantization, providing two centroids for each tile: low_color and high_color. The endpoints of each tile are then represented with the vector (low_color.r, low_color.g, low_color.b, high_color.r, high_color.g, high_color.b). Such a representation of the tile endpoints doesn't depend on the DXT1 optimization result, but at the same time performs quite well.

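The clustering itself can be sketched as a tiny 2-means pass over the tile pixels. This is only an illustration of the idea under simple assumptions (luminance-based initialization, a fixed number of iterations, a non-empty tile), not Crunch's exact clusterizer:

#include <array>
#include <utility>
#include <vector>

struct Vec3 { float r, g, b; };

// Split the tile pixels into two clusters and return their centroids as
// (low_color, high_color), ordered by luminance. Assumes a non-empty tile.
std::array<Vec3, 2> tile_low_high_colors(const std::vector<Vec3>& pixels)
{
    auto luma = [](const Vec3& c) { return 0.299f * c.r + 0.587f * c.g + 0.114f * c.b; };

    // Initialize the centroids from the darkest and brightest pixels.
    Vec3 lo = pixels[0], hi = pixels[0];
    for (const Vec3& p : pixels) {
        if (luma(p) < luma(lo)) lo = p;
        if (luma(p) > luma(hi)) hi = p;
    }

    for (int iter = 0; iter < 8; ++iter) {
        Vec3 sum[2] = { {0, 0, 0}, {0, 0, 0} };
        int count[2] = { 0, 0 };
        for (const Vec3& p : pixels) {
            auto d2 = [&](const Vec3& c) {
                float dr = p.r - c.r, dg = p.g - c.g, db = p.b - c.b;
                return dr * dr + dg * dg + db * db;
            };
            int k = d2(lo) <= d2(hi) ? 0 : 1;   // assign the pixel to the nearest centroid
            sum[k].r += p.r; sum[k].g += p.g; sum[k].b += p.b;
            ++count[k];
        }
        if (count[0]) lo = { sum[0].r / count[0], sum[0].g / count[0], sum[0].b / count[0] };
        if (count[1]) hi = { sum[1].r / count[1], sum[1].g / count[1], sum[1].b / count[1] };
    }
    if (luma(lo) > luma(hi)) std::swap(lo, hi);
    return { lo, hi };
}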

Note that after quantization all the blocks within a tile will be associated with the same endpoint codebook element, so they will be assigned the same endpoint index. This means that the initially determined chunk types will still be valid after endpoint quantization.

Selectors of each 4×4 block can be represented with a vector of 16 components, corresponding to the selector values of each block pixel. To improve the result of the quantization, the selector values are reordered in the following way, so that they better reflect the continuity of the selected color values:

linear selector value    pixel color
0                        color0
1                        (2 * color0 + color1) / 3
2                        (color0 + 2 * color1) / 3
3                        color1
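
For reference, the reordering is just a fixed permutation between the DXT1 selector values and the linear order shown above; a possible pair of lookup tables:

#include <cstdint>

// DXT1 selector values in "hardware" order are {color0, color1, 2/3 blend, 1/3 blend}.
// Reordering them to the linear order {color0, 2/3 blend, 1/3 blend, color1} makes
// neighboring selector values correspond to neighboring colors, which helps quantization.
constexpr uint8_t kDxtToLinear[4] = { 0, 3, 1, 2 };  // DXT selector -> linear selector
constexpr uint8_t kLinearToDxt[4] = { 0, 2, 3, 1 };  // linear selector -> DXT selector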

The vector quantization algorithm splits all the input vectors into separate groups (clusters) in such a way that the vectors in each group are more or less similar. Each group is represented by its centroid, which is computed as the average of all the vectors in the group according to the selected metric. The computed centroid vectors are then used to generate the codebook (centroid vector components are clipped and rounded to integers in order to represent valid endpoints or selectors). The original texture elements are then replaced with elements of the computed codebooks (the endpoints of each source 4×4 block are replaced with the closest endpoint pair from the generated endpoint codebook, and the selectors of each source 4×4 block are replaced with the selector values of the closest selector codebook element).

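The final assignment step can be illustrated with a straightforward nearest-neighbor search over the codebook (a sketch that ignores the acceleration structures a real implementation would use):

#include <cstddef>
#include <limits>
#include <vector>

// Return the index of the closest codebook element to v under squared Euclidean
// distance. v can be a 6-component endpoint vector or a 16-component selector
// vector; every codebook element is assumed to have the same dimension as v.
size_t nearest_codebook_index(const std::vector<float>& v,
                              const std::vector<std::vector<float>>& codebook)
{
    size_t best = 0;
    float best_d2 = std::numeric_limits<float>::max();
    for (size_t i = 0; i < codebook.size(); ++i) {
        float d2 = 0.0f;
        for (size_t c = 0; c < v.size(); ++c) {
            float d = v[c] - codebook[i][c];
            d2 += d * d;
        }
        if (d2 < best_d2) { best_d2 = d2; best = i; }
    }
    return best;
}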

The result of vector quantization performed for both endpoints and selectors can be represented in the following way:

[Images: endpoint codebook; selector codebook]

After quantization, it is sufficient to store the following information in order to decode the image: the endpoint and selector codebooks, an endpoint index and a selector index for each block, and the chunk type of each chunk.

The quality parameter provided to the Crunch compressor directly controls the size of the generated endpoint and selector codebooks. The higher the quality value, the larger the endpoint and selector codebooks, the wider the range of possible indices, and consequently, the bigger the size of the compressed texture.

Encoding of the DXT Alpha channel

DXT encoding of the alpha channel is very similar to the DXT encoding of the color information. Information about the alpha channel of each block is stored using 64 bits: two 8-bit alpha endpoint values (alpha0 and alpha1), and 16 3-bit selector values (one selector value per pixel) which determine how the alpha of each pixel is computed (it can be either one of the two alpha endpoint values or a blend between them). As mentioned before, the Crunch algorithm uses a subset of DXT encoding, so the possible alpha values are always blended in the following way:

selector value    pixel alpha
0                 alpha0
1                 alpha1
2                 (6 * alpha0 + 1 * alpha1) / 7
3                 (5 * alpha0 + 2 * alpha1) / 7
4                 (4 * alpha0 + 3 * alpha1) / 7
5                 (3 * alpha0 + 4 * alpha1) / 7
6                 (2 * alpha0 + 5 * alpha1) / 7
7                 (1 * alpha0 + 6 * alpha1) / 7
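
As with the color table, the alpha blending rule can be illustrated with a small helper; this sketch assumes the Crunch subset described above (the 8-value alpha mode only):

#include <cstdint>

// Decode one pixel's alpha: selectors 0 and 1 pick the endpoints, selectors 2..7
// pick the six evenly spaced blends between them.
uint8_t decode_dxt5_alpha(uint8_t alpha0, uint8_t alpha1, unsigned selector)
{
    switch (selector & 7) {
        case 0: return alpha0;
        case 1: return alpha1;
        default: {
            int w1 = static_cast<int>(selector & 7) - 1;  // 1..6 for selectors 2..7
            int w0 = 7 - w1;                              // 6..1
            return static_cast<uint8_t>((w0 * alpha0 + w1 * alpha1) / 7);
        }
    }
}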

Vector quantization for the alpha channel is performed exactly the same way as for the color components, except that the vectors representing the alpha endpoints of each tile consist of 2 components (low_alpha, high_alpha) and are obtained by clustering the alpha values of all the tile pixels.

Note that the chunk type, determined during the tiling step, is shared by the color and alpha endpoints. So for textures using an alpha channel, the chunk type is determined based on the combined PSNR computed for the color and alpha components.

Compression step

The main idea used in the Crunch algorithm for improving the compression ratio is based on the fact that changing the order of the elements in a codebook doesn't affect the decompression result (provided that the indices are reassigned accordingly). In other words, the elements of the generated codebooks can be reordered in such a way that the dictionary elements and indices acquire specific properties which allow them to be compressed more efficiently. Specifically, if neighboring encoded elements appear to be similar, then each element can be used to predict the following element, which significantly improves the compression ratio.

Following this scheme, the Crunch algorithm uses zero-order prediction when encoding codebook elements and indices. Instead of encoding endpoint and selector indices directly, the algorithm encodes the deltas between the indices of neighboring encoded blocks. The codebook elements are encoded using per-component prediction. Specifically, each endpoint codebook element (represented by two RGB565 colors) is encoded as 6 per-component deltas from the previous dictionary element. Each selector codebook element (represented by 16 2-bit selector values) is encoded as 16 per-component deltas from the previous dictionary element.

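The zero-order prediction itself is nothing more than coding differences from the previously coded value; a minimal sketch of the idea (the actual library feeds these deltas into its Huffman coder):

#include <cstddef>
#include <vector>

// Encode a sequence as deltas from the previously coded value. For indices the
// "value" is the whole index; for codebook elements the same prediction is
// applied per component (6 components for an endpoint, 16 for a selector element).
std::vector<int> delta_encode(const std::vector<int>& values)
{
    std::vector<int> deltas(values.size());
    int prev = 0;
    for (size_t i = 0; i < values.size(); ++i) {
        deltas[i] = values[i] - prev;
        prev = values[i];
    }
    return deltas;
}

// Inverse transform used during decompression.
std::vector<int> delta_decode(const std::vector<int>& deltas)
{
    std::vector<int> values(deltas.size());
    int prev = 0;
    for (size_t i = 0; i < deltas.size(); ++i) {
        values[i] = prev + deltas[i];
        prev = values[i];
    }
    return values;
}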

On the one hand, the endpoint indices of neighboring blocks should be similar, as the encoder compresses the deltas between the indices of neighboring blocks. On the other hand, the neighboring codebook elements should also be similar, as the encoder compresses the deltas between the components of those neighboring codebook elements. The combined optimization is based on Zeng's technique, using a weighted function which takes into account both the similarity of the indices of neighboring blocks and the similarity of neighboring elements in the codebook. Such reordering optimization is performed for both the endpoint and selector codebooks.

Finally, the reordered codebooks and indices, along with the chunk type information, are encoded with Huffman coding (using zero-order prediction for indices and codebook components). Each type of encoded data uses its own Huffman table or multiple tables. For performance reasons, adaptive Huffman coding isn't used.

Improving the Crunch compression library

We performed a comprehensive analysis of the algorithms and techniques used in the original version of Crunch and introduced several modifications which allowed us to significantly improve the compression performance. The updated Crunch library, introduced in Unity 2017.3, can compress DXT textures up to 2.5 times faster, while providing about 10% better compression ratio. At the same time, decompressed textures, generated by both libraries, are identical bit by bit. The latest version of the library, which will reach Beta builds soon, will be able to perform Crunch compression of DXT textures about 5 times faster than the original version. The latest version of the Crunch library can be found in the following GitHub repository.

The main modifications of the original Crunch library are described below. The improvement in compressed size and compression time introduced by each modification is given as the portion of the compressed size and compression time of the original library that is saved. It has been evaluated on the Kodak image test set. When compressing real-world textures, the improvement in compressed size is normally higher.

  1. Replace chunk encoding scheme with block encoding scheme (improvement in compressed size: 2.1%, improvement in compression time: 7%)

As described above, in the original version of the Crunch algorithm all the blocks are grouped into chunks of 2×2 blocks. Each chunk is associated with one of 8 different chunk types. The type of the chunk determines which blocks inside the chunk share the same endpoint indices. This scheme performs quite well, because it is often more efficient to compress information about endpoint equality rather than compress duplicate endpoint indices. However, this scheme can be improved. The modified Crunch algorithm no longer uses the concept of chunks. Instead, for each block it can encode a reference to a previously processed neighboring block from which the endpoint can be copied. Considering that the texture is decompressed from left to right, top to bottom, the endpoints of each decoded block can either be decoded from the input stream, copied from the nearest block to the left (reference to the left), or copied from the nearest block above (reference to the top):

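A sketch of how a decoder could consume such references is shown below; read_ref and read_index are hypothetical stand-ins for the entropy decoder, not actual Crunch functions:

#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

enum class EndpointRef : uint8_t { FromStream = 0, CopyLeft = 1, CopyTop = 2 };

// Blocks are visited left-to-right, top-to-bottom; each block either decodes a
// fresh endpoint index from the stream or copies the index of its left or top
// neighbor. A copy reference on the image border falls back to decoding.
std::vector<uint32_t> decode_endpoint_indices(int blocks_x, int blocks_y,
                                              const std::function<EndpointRef()>& read_ref,
                                              const std::function<uint32_t()>& read_index)
{
    std::vector<uint32_t> idx(static_cast<size_t>(blocks_x) * blocks_y, 0);
    for (int y = 0; y < blocks_y; ++y) {
        for (int x = 0; x < blocks_x; ++x) {
            size_t i = static_cast<size_t>(y) * blocks_x + x;
            EndpointRef ref = read_ref();
            if (ref == EndpointRef::CopyLeft && x > 0)
                idx[i] = idx[i - 1];
            else if (ref == EndpointRef::CopyTop && y > 0)
                idx[i] = idx[i - blocks_x];
            else
                idx[i] = read_index();   // decode a new endpoint index from the stream
        }
    }
    return idx;
}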

The following example shows quantized texture endpoints with the references:

Note that the modified Crunch encoding is a superset of the original encoding, so all the images previously encoded with the original Crunch algorithm can be losslessly transcoded into the new format, but not vice versa. Even though the new endpoint equality encoding is more expensive (about 1.58 bits per block, uncompressed), it provides more flexibility for endpoint matching inside the previously used "chunks", and, more importantly, it allows endpoints to be copied from one "chunk" to another (which isn't possible with the original chunk encoding). The blocks are no longer grouped together and are encoded in the same order as they appear in the image, which significantly simplifies the algorithm and eliminates extra levels of indirection.

  2. Encode selector indices without prediction (improvement in compressed size: 1.8%, improvement in compression time: 10%)

The original version of Crunch encodes the deltas between neighboring indices in order to take advantage of their similarity. The efficiency of such an approach highly depends on the continuity of the encoded data. While neighboring color and alpha endpoints are usually similar, this is often not the case for selectors. Of course, in some situations encoding the deltas for selector indices makes sense, for example, when an image contains a lot of regular patterns aligned to the 4×4 block boundaries. In practice, however, such situations are relatively rare, so it usually appears to be more efficient to encode raw selector indices without prediction. Note that when selector indices are encoded without prediction, the reordering of the selector indices no longer affects the size of the encoded selector index stream (at least when using Huffman coding). This makes the Zeng optimization of selector indices unnecessary; it's sufficient to simply optimize the size of the packed selector codebook.

  3. Remove duplicate endpoints and selectors from the codebooks (improvement in compressed size: 1.7%)

By default, the size of the endpoint and selector codebooks is calculated based on the total number of blocks in the image and the quality parameter, while the actual complexity of the image isn't evaluated and isn't taken into account. The target codebook size is selected in such a way that even complex images can be approximated well enough. At the same time, the lower the complexity of the image, the higher the density of the quantized vectors. Considering that vector quantization is performed using floating-point computations, and the quantized endpoints have integer components, a high density of quantized vectors will result in a large number of duplicate endpoints. As a result, some identical endpoints are represented with multiple different indices, which affects the compression ratio. Note that this isn't the case for selectors, as their corresponding vector components are rounded after quantization; instead it leads to some duplicate selectors in the codebook being unused. In the modified version of the algorithm, all duplicate codebook entries are merged together, unused entries are removed from the codebooks, and the endpoint and selector indices are updated accordingly.

  4. Use XOR-deltas for encoding of the selector codebook (improvement in compressed size: 0.9%)

In the original version of Crunch, the selector codebook is encoded with Huffman coding applied to the raw deltas between corresponding pixel selectors of neighboring codebook elements. However, using Huffman coding for raw deltas has a downside. Specifically, for each individual pixel selector, only about half of all the possible raw deltas are valid. Indeed, once the value of the current selector is determined, the selector delta depends only on the next selector value, so only n out of 2 * n – 1 total raw delta values are possible at any specific point (where n is the number of possible selector values). This means that on each step the impossible raw delta values are being encoded with a non-zero probability, as the probability table is calculated only once for the whole codebook. The situation can be improved by using modulo-deltas instead of raw deltas (modulo 4 for color selectors and modulo 8 for alpha selectors). This eliminates the mentioned implicit restriction on the values of the decoded selector deltas, and therefore improves the compression ratio. Interestingly, the compression ratio can be improved even further if XOR-deltas are used instead of modulo-deltas (an XOR-delta is computed by simply XOR-ing two selector values). At first it might seem counterintuitive that XOR-deltas can perform better than modulo-deltas, as they don't reflect the continuity properties of the data as well. The trick here is that the encoded selectors are first sorted according to the used delta operation and the corresponding metric.

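The three delta variants discussed here can be summarized in a few lines; this is an illustration of the operations, not the library's bit-exact packing:

#include <cstdint>

// Per-pixel selector deltas between consecutive selector codebook elements.
// Raw deltas span 2*n-1 values of which only n are reachable at any point;
// modulo- and XOR-deltas always stay in [0, n) and, with the codebook
// pre-sorted for the chosen metric, compress better under Huffman coding.
uint8_t raw_delta(uint8_t prev, uint8_t cur)                 { return static_cast<uint8_t>(cur - prev); }
uint8_t modulo_delta(uint8_t prev, uint8_t cur, uint8_t n)   { return static_cast<uint8_t>((cur - prev + n) % n); }
uint8_t xor_delta(uint8_t prev, uint8_t cur)                 { return static_cast<uint8_t>(prev ^ cur); }

// Decoding inverts the chosen operation, e.g. for XOR-deltas:
uint8_t undo_xor_delta(uint8_t prev, uint8_t delta)          { return static_cast<uint8_t>(prev ^ delta); }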

  5. Improve Zeng reordering algorithm (improvement in compressed size: 0.7%, improvement in compression time: 5%)

After the endpoint codebook has been computed, the endpoints are reordered to improve the compression ratio. As described above, the optimization is based on Zeng's technique, using a weighted function which takes into account both the similarity of the indices in neighboring blocks and the similarity of neighboring elements in the codebook.

The ordered list of endpoints is built starting from a single endpoint; on each iteration one of the remaining endpoints is added to the beginning or to the end of the list, using a greedy strategy controlled by the optimization function. The similarity of the endpoint indices is evaluated as the combined neighborhood frequency of the candidate endpoint and all the endpoints in the ordered list. The similarity of the neighboring endpoints in the codebook is evaluated as the Euclidean distance from the candidate endpoint to the extremity of the ordered list. The original optimization function for an endpoint candidate p can be represented as:

F(p) = (endpoint_similarity(p) + 1) * (neighborhood_frequency(p) + 1)

The problem with this approach is the following. While endpoint_similarity(p) has a limited range of values, neighborhood_frequency(p) grows rapidly with the increasing size of the ordered list of endpoints. With each iteration this introduces additional imbalance into the weighted optimization function. In order to minimize this effect, it is proposed to normalize neighborhood_frequency(p) on each iteration. For computational simplicity, the normalizer is computed as the optimal neighborhood_frequency value from the previous iteration, multiplied by a constant. The modified optimization function can be represented as:

F(p) = (endpoint_similarity(p) + 1) * (neighborhood_frequency(p) + neighborhood_frequency_normalizer)

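As a sketch, the modified objective could be computed like this, where the similarity and frequency terms are assumed to be supplied by the surrounding greedy loop and normalizer_scale is an unspecified tuning constant:

// Weighted objective used during greedy Zeng reordering (sketch). The
// normalizer is the best neighborhood frequency from the previous iteration
// scaled by a constant, which keeps the two factors balanced as the ordered
// list of endpoints grows.
double candidate_score(double endpoint_similarity,
                       double neighborhood_frequency,
                       double prev_best_neighborhood_frequency,
                       double normalizer_scale)
{
    double normalizer = prev_best_neighborhood_frequency * normalizer_scale;
    return (endpoint_similarity + 1.0) * (neighborhood_frequency + normalizer);
}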

Other improvements

Additional improvement in compression speed has been achieved by optimizing the original algorithms, reducing the total amount of computation by caching intermediate computation results, and distributing the computations between threads more efficiently.

Crunch encoding vs. general purpose compression

The described modifications of the Crunch algorithm don't change the result of the quantization step, which means that the decompressed textures generated by both libraries will be identical bit by bit. In other words, the improvement in compression ratio has been achieved by using a different lossless encoding of the quantized images. It might therefore be interesting to compare Crunch encoding with alternative ways of compressing the quantized textures. For example, quantized textures can be stored in a raw DXT format and compressed with LZMA. The following table displays the difference in compression ratio when using different approaches:

Test set                                           | DXT       | Quantized DXT + LZMA | Quantized DXT + original Crunch encoding | Quantized DXT + improved Crunch encoding
Kodak image set                                    | 6147.4 KB | 2227.0 KB            | 2016.8 KB                                | 1869.9 KB
Adam Character Pack: Adam, Guard, Lu (93 textures) | 652.7 MB  | 155.8 MB             | 142.8 MB                                 | 128.7 MB
Adam Exterior Environment (227 textures)           | 717.8 MB  | 162.6 MB             | 156.3 MB                                 | 138.1 MB

According to the test results, it seems to be more efficient to use Crunch encoding of the computed codebooks and indices, rather than compress the quantized texture with LZMA. Not to mention that Crunch decompression is also significantly faster than LZMA decompression.

Modifying the Crunch algorithm to support the ETC texture format

Even though the Crunch algorithm was originally designed for compression of DXT textures, it is in fact much more powerful. With some minor adjustments it can be used to compress other texture formats. This section will describe in detail how the original Crunch algorithm was modified in order to be able to compress ETC and ETC2 textures.

ETC encoding

ETC is a block-based texture compression format. The image is split up into 4×4 blocks, and each block is encoded using a fixed number of bits. In case of ETC1 format (used for compression of RGB images), each block is encoded using 64 bits.

The first 32 bits contain information about the colors used within the 4×4 block. Each 4×4 block is split either vertically or horizontally into two 2×4 or 4×2 subblocks (the orientation of each block is controlled by the “flip” bit). Each subblock is assigned its own base color and its own modifier table index.

The two base colors of a 4×4 block can be encoded either individually as RGB444, or differentially (the first base color is encoded as RGB555, and the second base color is encoded as an RGB333 signed offset from the first base color). The type of the base color encoding for each block is controlled by the "diff" bit.

The modifier table index of each subblock is referencing one of the 8 possible rows in the following modifier table:

modifier table index    modifier0   modifier1   modifier2   modifier3
0                       -8          -2          2           8
1                       -17         -5          5           17
2                       -29         -9          9           29
3                       -42         -13         13          42
4                       -60         -18         18          60
5                       -80         -24         24          80
6                       -106        -33         33          106
7                       -183        -47         47          183

The intensity modifier set (modifier0, modifier1, modifier2, modifier3) defined by the modifier table index, along with the base color, determines 4 possible color values for each subblock:

base_color + RGB(modifier0, modifier0, modifier0)
base_color + RGB(modifier1, modifier1, modifier1)
base_color + RGB(modifier2, modifier2, modifier2)
base_color + RGB(modifier3, modifier3, modifier3)

Note that the higher the value of the modifier table index, the more spread out the subblock colors are along the intensity axis.

The other 32 bits of an encoded ETC1 block describe 16 2-bit selector values (each pixel in the block can take one of the 4 possible color values described above).

ETC1 encoding can therefore be visually represented in the following way:

[Figure: ETC1 block layout — base colors + block orientation, 26 bits/block; modifier table index, 3 bits/subblock; selectors, 2 bits/pixel; decoded ETC1, 4 bits/pixel]

Each pixel color of an ETC1 block can be decoded by adding together the base color and the modifier value defined by the modifier table index and selector value (the resulting color should be clamped).

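A minimal sketch of this decode step is shown below; it uses the modifier table quoted above and assumes the selector value directly indexes a table column (the mapping of the stored ETC1 pixel-index bits to modifiers is handled by a real decoder):

#include <algorithm>
#include <cstdint>

struct RGB8 { uint8_t r, g, b; };

// The ETC1 intensity modifier table (one row per modifier table index,
// one column per selector value, in the order of the table above).
static const int kEtc1Modifiers[8][4] = {
    { -8,   -2,  2,  8   },
    { -17,  -5,  5,  17  },
    { -29,  -9,  9,  29  },
    { -42,  -13, 13, 42  },
    { -60,  -18, 18, 60  },
    { -80,  -24, 24, 80  },
    { -106, -33, 33, 106 },
    { -183, -47, 47, 183 },
};

// Decode one ETC1 pixel: add the selected modifier to every component of the
// subblock base color (already expanded to 8 bits) and clamp to [0, 255].
RGB8 decode_etc1_pixel(RGB8 base, unsigned table_index, unsigned selector)
{
    int m = kEtc1Modifiers[table_index & 7][selector & 3];
    auto clamp8 = [](int v) { return static_cast<uint8_t>(std::min(255, std::max(0, v))); };
    return { clamp8(base.r + m), clamp8(base.g + m), clamp8(base.b + m) };
}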

For simplicity, the information about the base colors, block orientations and modifier table indices can be displayed on the same image. The upper or the left part of each 2×4 or 4×2 subblock (depending on the block orientation) is filled with the base color, and the rest is filled with the modifier table index color. Then all the information necessary for decoding the final texture can be represented in the form of the following two images (subblocks on the left image and blocks on the right image are displayed slightly separated from each other):

[Figure: base colors + block orientation + modifier table index, 32 bits/block; color selectors, 32 bits/block]

A detailed description of the ETC1 format can be found on this Khronos Group page.

Using the Crunch algorithm for compression of ETC1 textures

Even though the DXT1 and ETC1 encodings seem to be quite different, they also have a lot in common. Each pixel of an ETC1 texture can take one of four possible color values, which means that ETC1 selector encoding is equivalent to DXT1 selector encoding, and therefore ETC1 selectors can be quantized exactly the same way as DXT1 selectors. The main difference between the encodings is that in the case of ETC1, each half of a 4×4 block has its own set of possible color values. But even though ETC1 subblock colors are encoded using a base color and a modifier table index, the four computed subblock colors normally lie on the same line and are more or less evenly distributed along that line, which highly resembles DXT1 block colors. The described similarities allow Crunch compression to be used for ETC1 textures, with some modifications.

As has been described above, Crunch compression involves the following main steps: tiling, quantization of the endpoints and selectors, reordering of the generated codebooks, and entropy coding of the codebooks and indices.

When applying the Crunch algorithm to a new texture format, it is necessary to first define the codebook elements. In the context of Crunch, this means that the whole image consists of smaller non-overlapping blocks, while the contents of each individual block are determined by an endpoint and a selector from the corresponding codebooks. For example, in the case of the DXT format, each endpoint and selector codebook element corresponds to a 4×4 pixel block. In general, the size of the blocks which form the encoded image depends on the texture format and quality considerations.

It’s proposed to define codebook elements according to the following limitations:

Endpoint codebook

In the case of ETC1, the texture format itself determines the minimal size of an image block defined by an endpoint: it can be either a 2×4 or a 4×2 rectangle, aligned to the borders of the 4×4 grid. It isn't possible to use higher granularity, because each of those rectangles can have only one base color according to the ETC1 format. For the same reason, any image block defined by an endpoint codebook element should represent a combination of ETC1 subblocks.

At the same time, each ETC1 subblock has its own base color and modifier table index, which approximately determine the high and the low colors of the subblock (even though the ETC1 encoding implies some limitations on the positions of those high and low colors). If an endpoint codebook element were defined in such a way that it contained information about more than one ETC1 base color, then such a dictionary would become incompatible with the existing tile quantization algorithm for the following reason. The Crunch tiling algorithm first quantizes all the tile pixel colors down to just 2 colors. It then quantizes all the color pairs generated by the different tiles. This approach works quite well for 4×4 DXT blocks, as those 2 colors approximately represent the principal component of the tile pixel colors. In the case of ETC1, however, mixing together pixels which correspond to different base colors doesn't make much sense, because each group of those pixels has its own low and high color values independent from the other groups. If those pixels were mixed together, the information about the original principal components of each subblock would get lost.

The described limitations suggest that an ETC1 endpoint codebook element should represent the area of a single ETC1 subblock (either 2×4 or 4×2). This means that an ETC1 endpoint codebook element should contain information about the subblock base color (RGB444 or RGB555) and the modifier table index (3 bits). It is therefore proposed to encode an ETC1 "endpoint" as 3555 (3 bits for the modifier table index and 5 bits for each component of the base color).

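A possible packing of such an 18-bit endpoint is sketched below; the exact field order is an assumption for illustration, not the layout used by the library:

#include <cstdint>

// Illustrative 3555 packing: 3 bits of modifier table index plus a 5:5:5 base
// color in the low 18 bits of an integer. One codebook element fully describes
// one ETC1 subblock.
uint32_t pack_etc1_endpoint(uint32_t table_index, uint32_t r5, uint32_t g5, uint32_t b5)
{
    return (table_index & 0x7u) << 15 | (r5 & 0x1Fu) << 10 | (g5 & 0x1Fu) << 5 | (b5 & 0x1Fu);
}

void unpack_etc1_endpoint(uint32_t packed, uint32_t* table_index,
                          uint32_t* r5, uint32_t* g5, uint32_t* b5)
{
    *table_index = (packed >> 15) & 0x7u;
    *r5 = (packed >> 10) & 0x1Fu;
    *g5 = (packed >> 5) & 0x1Fu;
    *b5 = packed & 0x1Fu;
}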

Selector codebook

In the case of the DXT format, both endpoint codebook elements and selector codebook elements correspond to the same size of decoded block (for DXT it is 4×4). So it would be reasonable to try the same scheme for ETC1 encoding (i.e. to use 2×4 or 4×2 blocks for selector codebooks, matching the blocks which are defined by endpoint codebook elements). Nevertheless, after additional research we made a very interesting observation. Specifically, endpoint blocks and selector blocks don't have to be of the same size in order to be compatible with the existing Crunch algorithm. Indeed, the selector codebook and selector indices are defined after the endpoint optimization is complete. At this point each image pixel is already associated with a specific endpoint. At the same time, the selector computation step uses those per-pixel endpoint associations as the only input information, so the size and shape of the blocks defined by selector codebook elements don't depend in any way on the size or shape of the blocks defined by endpoint codebook elements.

In other words, the endpoint space of the texture can be split into one set of blocks, defined by the endpoint codebook and endpoint indices, while the selector space of the texture can be split into a completely different set of blocks, defined by the selector codebook and selector indices. The endpoint blocks can differ in size from the selector blocks, and they can overlap with the selector blocks in an arbitrary way; such a setup will still be fully compatible with the existing Crunch algorithm. The discovered property of the Crunch algorithm opens another dimension for optimization of the compression ratio. Specifically, the quality of the compressed selectors can now be adjusted in two ways: by changing the size of the selector codebook and by changing the size of the selector block. Note that both the DXT and ETC formats have selectors encoded as plain bits in the output format, so there is no limitation on the size or shape of the selector block (though, for performance reasons, non-power-of-two selector blocks might require some specific optimizations in the decoder).

Several performance tests have been conducted using different selector block sizes, and the results suggest that 4×4 selector blocks perform quite well.

Tiling

As has been described above, each element of an ETC1 endpoint codebook should correspond to an ETC1 subblock (i.e. to a 2×4 or a 4×2 pixel block, depending on the block orientation). In the case of DXT encoding, the size of the encoded block is 4×4 pixels, and tiling is performed in an 8×8 pixel area (covering 4 blocks). In the case of ETC1, however, tiling can be performed either in a 4×4 pixel area (covering 2 subblocks) or in an 8×8 pixel area (covering 8 subblocks), while other possibilities are either not symmetrical or too complex. For performance reasons and simplicity, it is proposed to use a 4×4 pixel area for tiling. There are therefore 3 possible block types: the block isn't split (the whole block is encoded using a single endpoint), the block is split horizontally, or the block is split vertically:

The following example shows computed tiles for the texture endpoints:

Endpoint references

At first, it might look like ETC1 block flipping can bring some complications for Crunch, as the subblock structure doesn’t look like a grid. This, however, can be easily resolved by flipping all the “horizontal” ETC1 blocks across the main diagonal of the block after the tiling step, so that all the ETC1 subblocks will become 2×4 and form a regular grid:

[Images: flipped color endpoints; flipped color selectors]

Note that decoded selectors should be flipped back according to the block orientation during decompression (this can be efficiently implemented by precomputing a codebook of flipped selectors).

Endpoint references for the ETC1 format are encoded in a similar way to the DXT1 format. There are, however, two modifications specific to the ETC1 encoding:

The primary ETC1 subblock has the reference value of 0 if the endpoint is decoded from the input stream, the value of 1 if the endpoint is copied from the secondary subblock of the left neighbour ETC1 block, the value of 2 if the endpoint is copied from the primary subblock of the top neighbour ETC1 block, and the value of 3 if the endpoint is copied from the secondary subblock of the top-left neighbour ETC1 block:

The reference value of secondary ETC1 subblock contains information about the block tiling and flipping. It has the reference value of 0 if the endpoint is copied from the primary subblock (note that in this case flipping doesn’t need to be encoded, as endpoints are equal), the value of 1 if the endpoint is decoded from the input stream and the corresponding ETC1 block is split horizontally, and the value of 2 if the endpoint is decoded from the input stream and the corresponding ETC1 block is split vertically:

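Putting the two reference types together, a decoder sketch for one row of ETC1 blocks might look like the following; read_ref and read_index are again hypothetical stand-ins for the entropy decoder, and references to neighbors that don't exist fall back to decoding from the stream:

#include <cstdint>
#include <functional>
#include <vector>

struct BlockEndpoints {
    uint32_t primary = 0;     // endpoint index of the primary subblock
    uint32_t secondary = 0;   // endpoint index of the secondary subblock
    bool flipped = false;     // true if the ETC1 block is split horizontally
};

// Resolve the primary and secondary endpoint references described above for one
// row of blocks; prev_row holds the already decoded row above (empty for row 0).
std::vector<BlockEndpoints> decode_block_row(int blocks_x,
                                             const std::vector<BlockEndpoints>& prev_row,
                                             const std::function<unsigned()>& read_ref,
                                             const std::function<uint32_t()>& read_index)
{
    std::vector<BlockEndpoints> row(blocks_x);
    for (int x = 0; x < blocks_x; ++x) {
        BlockEndpoints& b = row[x];

        switch (read_ref()) {   // primary subblock reference
            case 1:  b.primary = (x > 0) ? row[x - 1].secondary : read_index(); break;
            case 2:  b.primary = !prev_row.empty() ? prev_row[x].primary : read_index(); break;
            case 3:  b.primary = (x > 0 && !prev_row.empty()) ? prev_row[x - 1].secondary
                                                              : read_index(); break;
            default: b.primary = read_index(); break;   // decode from the input stream
        }

        switch (read_ref()) {   // secondary subblock reference
            case 1:  b.secondary = read_index(); b.flipped = true;  break;  // horizontal split
            case 2:  b.secondary = read_index(); b.flipped = false; break;  // vertical split
            default: b.secondary = b.primary;    b.flipped = false; break;  // copy primary
        }
    }
    return row;
}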

Quantization

Considering that each endpoint codebook element corresponds to a single ETC1 base color, the original endpoint quantization algorithm works almost the same way for the ETC1 encoding as for the DXT1 encoding. An endpoint of an ETC1 tile can be represented with a (low_color.r, low_color.g, low_color.b, high_color.r, high_color.g, high_color.b) vector, where low_color and high_color are generated by the tile palettizer, exactly the same way as for the DXT1 encoding.

Note that low_color and high_color, computed for a tile, implicitly contain information about the base color and the modifier table index, computed for this tile. Indeed, the base color normally lies somewhere in the middle between low_color and high_color, while the modifier table index corresponds to the distance between low_color and high_color. Vectors which represent tiles with close values of low_color and high_color, will most likely get into the same cluster after vector quantization. But this also means that for the tiles from the same cluster, the average values of low_color and high_color, and distances between low_color and high_color should be also pretty close. In other words, the original endpoint quantization algorithm will generate tile clusters with close values of the base color and the modifier table index.

Selectors of each 4×4 block can be represented with a vector of 16 components, corresponding to the selector values of each block pixel. This means that ETC1 selector quantization step is identical to the DXT1 selector quantization step.

The result of the vector quantization performed for both ETC1 endpoints and selectors can be represented in the following way:

[Images: endpoint codebook; selector codebook]

Note that according to the ETC1 format, the base colors within an ETC1 block can be encoded either individually as RGB444 and RGB444, or differentially as RGB555 and RGB333. For simplicity, this aspect is currently not taken into account (all the quantized endpoints are encoded as 3555 in the codebook). If the base colors in the resulting ETC1 block cannot be encoded differentially, the decoder converts both base colors from RGB555 to RGB444 during decompression.

Compression of ETC2 textures

The Crunch algorithm doesn’t yet support ETC2 specific modes (T, H or P), but it’s capable of efficiently encoding the ETC2 Alpha channel. This means that the current ETC2 + Alpha compression format is equivalent to ETC1 + Alpha. Note that ETC2 encoding is a superset of ETC1, so any texture, which consists of ETC1 color blocks and ETC2 Alpha blocks, can be correctly decoded by an ETC2_RGBA8 decoder.

ETC2 encoding for the alpha channel is very similar to the ETC1 encoding of the color information. Information about the alpha channel of each block is stored using 64 bits: 8-bit base alpha, 4-bit modifier table index, 4-bit multiplier and 16 3-bit selector values (one selector value per pixel).

The modifier table index and selector value determine a modifier value for a pixel, which is selected from the ETC2 alpha modifier table. For performance reasons, ETC2 Crunch compressor is currently using only the following subset of the modifier table:

modifier table index    modifier0   modifier1   modifier2   modifier3   modifier4   modifier5   modifier6   modifier7
11                      -2          -5          -7          -10         1           4           6           9
13                      -1          -2          -3          -10         0           1           2           9

The final alpha value for each pixel is calculated as base_alpha + modifier * multiplier, which is then clamped.

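A sketch of this alpha decode, restricted to the two modifier table rows listed above (with the selector assumed to index the columns in the order shown):

#include <algorithm>
#include <cstdint>

// The two rows of the ETC2 alpha modifier table that the compressor currently
// uses (modifier indices 11 and 13), in selector order 0..7.
static const int kEtc2AlphaModifiers[2][8] = {
    /* modifier index 11 */ { -2, -5, -7, -10, 1, 4, 6, 9 },
    /* modifier index 13 */ { -1, -2, -3, -10, 0, 1, 2, 9 },
};

// Decode one pixel's alpha: base_alpha + modifier * multiplier, clamped to [0, 255].
// table_row is 0 for modifier index 11 and 1 for modifier index 13 in this sketch.
uint8_t decode_etc2_alpha(uint8_t base_alpha, int table_row, unsigned multiplier, unsigned selector)
{
    int modifier = kEtc2AlphaModifiers[table_row & 1][selector & 7];
    int value = static_cast<int>(base_alpha) + modifier * static_cast<int>(multiplier);
    return static_cast<uint8_t>(std::min(255, std::max(0, value)));
}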

Note that unlike ETC1 color, ETC2 Alpha is encoded using a single base alpha value per 4×4 pixel block. This means that each element of the alpha endpoint dictionary should correspond to a 4×4 pixel block, covering both primary and secondary ETC1 subblocks. For this reason, alpha channel can be ignored when performing color endpoint tiling.

The compression scheme for ETC2 Alpha blocks is equivalent to the compression scheme for DXT5 Alpha blocks. As has been shown before, vector representation of alpha endpoints doesn’t depend on the used encoding. This means that all the initial processing steps, including alpha endpoint quantization, will be almost identical for DXT5 and ETC2 Alpha channels. The only part which is actually different for the ETC2 Alpha encoding is the final Alpha endpoint optimization step.

In order to perform ETC2 Alpha endpoint optimization, the existing DXT5 Alpha endpoint optimization algorithm is run to obtain an initial approximate solution. The approximate solution is then refined based on the ETC2 Alpha modifier table values. Note that the ETC2 format supports 16 different Alpha modifier indices, but for performance reasons only 2 Alpha modifier indices are currently used: modifier index 13, which allows precise approximation on short Alpha intervals, and modifier index 11, which has more or less regularly distributed values and is used for large Alpha intervals.

At first it might seem that the different sizes of the color and alpha blocks could bring some complications for Crunch, as according to the original algorithm both color and alpha endpoints should share the same endpoint references. This, however, is easily resolved in the following way: each alpha block uses the endpoint reference of the corresponding primary color subblock (which allows the alpha endpoint to be copied from the left, top, top-left or from the input stream), while the endpoint reference of the secondary color subblock is simply ignored when decoding the alpha channel.

Closing summary

The performed research demonstrates that the Crunch compression algorithm is not limited to the DXT format and, with some modifications, can be used on different GPU texture formats. We see research potential to expand this work to cover further texture formats in the future.

Translated from: https://blogs.unity3d.com/2017/12/15/crunch-compression-of-etc-textures/
