Analysis of the JPEG Image Encoding Format
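Displaying an image requires a three-dimensional array in BGR order (rows × columns × channels). Encoding an image means turning that BGR array into a format that can be stored in a file; decoding is the reverse. Common formats today are JPG, PNG and GIF; newer ones include WebP and HEIC.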
BMP
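Let's start simple. BMP is the simplest of these encodings: a basic encoder and decoder take only a few dozen lines of code.
A BMP file consists of a file header (14 bytes), a bitmap info header (40 bytes for the common BITMAPINFOHEADER) and the pixel data: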
```python
import struct

import numpy as np

BITMAP_FILE_HEADER_FMT = '<2sI4xI'  # bfType, bfSize, 4 reserved bytes, bfOffBits
BITMAP_FILE_HEADER_SIZE = struct.calcsize(BITMAP_FILE_HEADER_FMT)  # 14
BITMAP_INFO_FMT = '<I2i2H6I'  # BITMAPINFOHEADER
BITMAP_INFO_SIZE = struct.calcsize(BITMAP_INFO_FMT)  # 40


def row_stride(width, channel):
    # Every stored pixel row is padded to a multiple of 4 bytes
    return (width * channel + 3) & ~3


class BmpHeader:
    def __init__(self):
        self.bf_type = None
        self.bf_size = 0
        self.bf_off_bits = 0
        self.bi_size = 0
        self.bi_width = 0
        self.bi_height = 0
        self.bi_planes = 1  # number of color planes, always 1
        self.bi_bit_count = 0
        self.bi_compression = 0
        self.bi_size_image = 0
        self.bi_x_pels_per_meter = 0
        self.bi_y_pels_per_meter = 0
        self.bi_clr_used = 0
        self.bi_clr_important = 0


class BmpDecoder:
    def __init__(self, data):
        self.__header = BmpHeader()
        self.__data = data

    def read_header(self):
        h = self.__header
        if h.bf_type is None:
            # BMP file header
            h.bf_type, h.bf_size, h.bf_off_bits = struct.unpack_from(BITMAP_FILE_HEADER_FMT, self.__data)
            if h.bf_type == b'BM':
                # Bitmap info header
                (h.bi_size, h.bi_width, h.bi_height, h.bi_planes, h.bi_bit_count, h.bi_compression,
                 h.bi_size_image, h.bi_x_pels_per_meter, h.bi_y_pels_per_meter, h.bi_clr_used,
                 h.bi_clr_important) = struct.unpack_from(BITMAP_INFO_FMT, self.__data, BITMAP_FILE_HEADER_SIZE)
        return h if h.bf_type == b'BM' else None

    def read_data(self):
        header = self.read_header()
        if header is None:
            return None
        # Only the common 24-bit and 32-bit bitmaps are handled
        if header.bi_bit_count not in (24, 32):
            return None
        # Only uncompressed (BI_RGB) data is handled
        if header.bi_compression != 0:
            return None
        channel = header.bi_bit_count // 8
        height = abs(header.bi_height)
        stride = row_stride(header.bi_width, channel)
        img = np.zeros([height, header.bi_width, channel], np.uint8)
        # Positive height: rows are stored bottom-up; negative height: top-down
        y_axis = range(height - 1, -1, -1) if header.bi_height > 0 else range(height)
        offset = header.bf_off_bits
        for y in y_axis:
            row = np.frombuffer(self.__data, np.uint8, header.bi_width * channel, offset)
            img[y] = row.reshape(header.bi_width, channel)
            offset += stride
        return img


class BmpEncoder:
    def __init__(self, img):
        self.__img = img

    def write_data(self):
        image_height, image_width, channel = self.__img.shape
        # Only 3-channel (BGR) or 4-channel (BGRA) images are supported
        if channel not in (3, 4):
            return False
        stride = row_stride(image_width, channel)
        header = BmpHeader()
        header.bf_type = b'BM'
        header.bi_bit_count = channel * 8
        header.bi_width = image_width
        header.bi_height = image_height
        header.bi_size = BITMAP_INFO_SIZE
        header.bi_size_image = stride * image_height
        header.bf_off_bits = header.bi_size + BITMAP_FILE_HEADER_SIZE
        header.bf_size = header.bf_off_bits + header.bi_size_image
        buffer = bytearray(header.bf_size)
        # BMP file header
        struct.pack_into(BITMAP_FILE_HEADER_FMT, buffer, 0, header.bf_type, header.bf_size, header.bf_off_bits)
        # Bitmap info header
        struct.pack_into(BITMAP_INFO_FMT, buffer, BITMAP_FILE_HEADER_SIZE, header.bi_size, header.bi_width,
                         header.bi_height, header.bi_planes, header.bi_bit_count, header.bi_compression,
                         header.bi_size_image, header.bi_x_pels_per_meter, header.bi_y_pels_per_meter,
                         header.bi_clr_used, header.bi_clr_important)
        # Pixel rows, written bottom-up as usual; the padding bytes stay 0
        offset = header.bf_off_bits
        for y in range(image_height - 1, -1, -1):
            row = np.ascontiguousarray(self.__img[y], dtype=np.uint8).tobytes()
            buffer[offset:offset + len(row)] = row
            offset += stride
        return buffer
```
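The rows of a BMP image are normally stored upside down: with a positive height, the first row of pixel data is the bottom row of the image (a negative height means the rows are stored top-down). Each stored row is also padded to a multiple of 4 bytes.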
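The bottom-up row order and the 4-byte row padding can be isolated in a small helper. This is a minimal sketch for illustration only: `bmp_row_stride` and `bmp_pixels_to_array` are hypothetical names, and it assumes uncompressed 24-bit or 32-bit pixel data starting at `bfOffBits`.

```python
import numpy as np


def bmp_row_stride(width: int, bits_per_pixel: int) -> int:
    """Bytes per stored row: each row is padded up to a multiple of 4 bytes."""
    return ((width * bits_per_pixel + 31) // 32) * 4


def bmp_pixels_to_array(pixel_data: bytes, width: int, height: int, channels: int) -> np.ndarray:
    """Turn raw BMP pixel data into a top-down (rows, cols, channels) BGR(A) array.

    A positive height means rows are stored bottom-up, a negative one top-down.
    """
    rows = abs(height)
    stride = bmp_row_stride(width, channels * 8)
    raw = np.frombuffer(pixel_data, np.uint8, count=stride * rows).reshape(rows, stride)
    img = raw[:, :width * channels].reshape(rows, width, channels)  # drop the padding
    return img[::-1] if height > 0 else img
```

For a 1-pixel-wide 24-bit image each row occupies 4 bytes (3 pixel bytes plus 1 padding byte), which is exactly the case the original per-pixel loop would get wrong without a stride.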
JPEG
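JPEG is a compression method; the format that actually describes how the image is stored in a file is JFIF (JPEG File Interchange Format), although in everyday use people simply say "JPEG file". Due to limited time, I only worked through the JPEG decoding steps.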
Background
DCT
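The discrete cosine transform (DCT) converts a signal from the spatial domain to the frequency domain and has good energy compaction: most of the energy ends up in a few low-frequency coefficients. For an M×N block A (JPEG uses M = N = 8) the transforms are:
DCT: $$B_{pq}=\alpha_p\alpha_q\sum_{m=0}^{M-1}\sum_{n=0}^{N-1}A_{mn}\cos\frac{\pi(2m+1)p}{2M}\cos\frac{\pi(2n+1)q}{2N},\quad 0\le p\le M-1,\ 0\le q\le N-1$$
where $$\alpha_p=\begin{cases}\frac{1}{\sqrt{M}}, & p=0\\ \sqrt{\frac{2}{M}}, & 1\le p\le M-1\end{cases}\qquad \alpha_q=\begin{cases}\frac{1}{\sqrt{N}}, & q=0\\ \sqrt{\frac{2}{N}}, & 1\le q\le N-1\end{cases}$$
IDCT: $$A_{mn}=\sum_{p=0}^{M-1}\sum_{q=0}^{N-1}\alpha_p\alpha_q B_{pq}\cos\frac{\pi(2m+1)p}{2M}\cos\frac{\pi(2n+1)q}{2N},\quad 0\le m\le M-1,\ 0\le n\le N-1$$
with the same $\alpha_p$ and $\alpha_q$.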
For details, see the MATLAB documentation page "离散余弦变换 - MATLAB & Simulink - MathWorks 中国" (Discrete Cosine Transform), or the CSDN blog post "离散余弦变换(DCT)的来龙去脉" (The origins of the DCT) by 独孤呆博.
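Because the cosine basis is separable, the formulas above can be written as two matrix products, B = C A Cᵀ and A = Cᵀ B C, where C is the 8×8 matrix with entries α_p cos(π(2m+1)p/16). A minimal NumPy sketch with made-up helper names; with this normalization it should agree with SciPy's `dctn`/`idctn` using `norm='ortho'`, which the decoder further down relies on.

```python
import numpy as np


def dct_matrix(n: int = 8) -> np.ndarray:
    """Orthonormal DCT-II matrix: C[p, m] = alpha_p * cos(pi * (2m + 1) * p / (2n))."""
    p = np.arange(n).reshape(-1, 1)
    m = np.arange(n).reshape(1, -1)
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * m + 1) * p / (2 * n))
    c[0, :] = 1.0 / np.sqrt(n)  # alpha_0 = 1 / sqrt(n)
    return c


C = dct_matrix(8)


def dct2(block: np.ndarray) -> np.ndarray:
    """2-D DCT of an 8x8 block: B = C A C^T."""
    return C @ block @ C.T


def idct2(coeffs: np.ndarray) -> np.ndarray:
    """2-D inverse DCT: A = C^T B C (C is orthogonal, so its inverse is C^T)."""
    return C.T @ coeffs @ C
```

A flat block of value 10 transforms to a single DC coefficient of 10 × 64 / 8 = 80, with all other coefficients (numerically) zero.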
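Huffman coding
Based on how often each symbol occurs, Huffman coding gives shorter codes to more frequent symbols. For an illustrated explanation, see the CSDN blog post "详细图解哈夫曼Huffman编码树" (Huffman coding trees explained with diagrams) by 无鞋童鞋.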
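The textbook greedy construction repeatedly merges the two least frequent subtrees. A sketch with a hypothetical helper `huffman_code`; note that a JPEG file does not store such a tree itself. The DHT segment only records how many codes of each length exist plus the symbols, and the decoder rebuilds canonical codes from that (see DHT below).

```python
import heapq
import itertools


def huffman_code(freqs: dict) -> dict:
    """Build a Huffman code {symbol: bit string} from {symbol: frequency}."""
    tie = itertools.count()  # tie-breaker so heapq never has to compare dicts
    heap = [(f, next(tie), {s: ''}) for s, f in freqs.items()]
    heapq.heapify(heap)
    if len(heap) == 1:  # a lone symbol still needs a 1-bit code
        return {s: '0' for s in freqs}
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)   # least frequent subtree
        f2, _, right = heapq.heappop(heap)  # second least frequent subtree
        merged = {s: '0' + c for s, c in left.items()}
        merged.update({s: '1' + c for s, c in right.items()})
        heapq.heappush(heap, (f1 + f2, next(tie), merged))
    return heap[0][2]
```

With the frequencies a:45, b:13, c:12, d:16, e:9, f:5, the most frequent symbol `a` gets a 1-bit code and the total encoded length is 224 bits.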
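Color-difference signals (YCbCr)
YCbCr describes an image with a luma component (Y) plus blue-difference (Cb) and red-difference (Cr) chroma offsets; for the conversion formulas to and from RGB see https://en.wikipedia.org/wiki/YCbCr. JPEG uses YCbCr because the human eye is more sensitive to differences in brightness than to differences in color. Once luma is separated out, luma and chroma can be given different quantization tables and sampling factors, which raises the compression ratio with little visible loss.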
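A vectorized sketch of the inverse conversion for JFIF's full-range YCbCr, using the same coefficients as the decoder later in this article (`ycbcr_to_bgr` is a hypothetical name). Converting to float first matters: subtracting 128 from `uint8` values would wrap around.

```python
import numpy as np


def ycbcr_to_bgr(ycc: np.ndarray) -> np.ndarray:
    """Convert an (H, W, 3) full-range YCbCr uint8 image to BGR uint8."""
    ycc = ycc.astype(np.float64)  # avoid uint8 wrap-around below
    y, cb, cr = ycc[..., 0], ycc[..., 1] - 128.0, ycc[..., 2] - 128.0
    r = y + 1.402 * cr
    g = y - 0.344136 * cb - 0.714136 * cr
    b = y + 1.772 * cb
    bgr = np.stack([b, g, r], axis=-1)  # OpenCV expects BGR channel order
    return np.clip(np.round(bgr), 0, 255).astype(np.uint8)
```

Neutral chroma (Cb = Cr = 128) leaves a pure gray: Y = 128 maps to BGR (128, 128, 128).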
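Reading the JPEG file header
The JPEG specification defines a file as a sequence of markers and segments. Every marker is two bytes: 0xFF followed by a byte that is not 0x00. The common markers are: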
marker | value | description |
---|---|---|
SOI | 0xFFD8 | Start Of Image |
APP0 | 0xFFE0 | Image parameters (JFIF) |
APP1 | 0xFFE1 | EXIF |
APP2 | 0xFFE2 | |
APP12 | 0xFFEC | Picture quality and similar information |
APP13 | 0xFFED | Information stored by Photoshop (Photoshop Tags) |
SOF0 | 0xFFC0 | Start Of Frame; SOF0 is baseline DCT |
SOF2 | 0xFFC2 | Start Of Frame; SOF2 is progressive DCT |
DHT | 0xFFC4 | Define Huffman Table; there can be several. How to rebuild the Huffman tree is described below |
DQT | 0xFFDB | Define Quantization Table; there can be several. The quantization tables determine the compression quality of the image |
DRI | 0xFFDD | Define Restart Interval: the DC predictors are reset every given number of MCUs |
SOS | 0xFFDA | Start Of Scan |
image data | | Any 0xFF byte in the compressed data is written as 0xFF00; the decoder must drop the stuffed 0x00 |
EOI | 0xFFD9 | End Of Image |
For more markers, see the exiftool documentation page "JPEG Tags".
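Since every segment except the standalone markers starts with a 2-byte big-endian length that counts itself, walking the header is a simple loop. A minimal sketch (the function and constant names are hypothetical) that stops at SOS, because the entropy-coded data after SOS has no length prefix:

```python
import struct

# Markers that stand alone and carry no length field: TEM, RST0-RST7, SOI, EOI
STANDALONE = {0x01, 0xD8, 0xD9} | set(range(0xD0, 0xD8))


def list_segments(data: bytes) -> list:
    """Return [(marker_byte, payload)] for each segment, up to and including SOS."""
    segments = []
    pos = 0
    while pos + 1 < len(data):
        if data[pos] != 0xFF:
            raise ValueError(f'expected 0xFF at offset {pos}')
        marker = data[pos + 1]
        if marker == 0xFF:  # 0xFF fill byte before a marker: skip it
            pos += 1
            continue
        pos += 2
        if marker in STANDALONE:
            segments.append((marker, b''))
            if marker == 0xD9:  # EOI
                break
            continue
        (length,) = struct.unpack_from('>H', data, pos)  # includes its own 2 bytes
        segments.append((marker, data[pos + 2:pos + length]))
        pos += length
        if marker == 0xDA:  # SOS: entropy-coded data follows
            break
    return segments
```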
APP0
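field | size (bytes) | description |
---|---|---|
Length | 2 | Length of the whole segment, including this field |
Identifier | 5 | Encoding identifier, "JFIF\0" or "JFXX\0" etc.; the fields below use JFIF as the example |
JFIF
field | size (bytes) | description |
---|---|---|
JFIF version | 2 | The first byte is the major version, the second the minor version (01 02 means 1.02) |
Density units | 1 | Unit of the pixel density fields below: 0 = no unit (aspect ratio only), 1 = dots per inch, 2 = dots per cm |
X density | 2 | Horizontal pixel density. Must not be zero. |
Y density | 2 | Vertical pixel density. Must not be zero. |
Thumbnail width | 1 | Horizontal pixel count of the embedded RGB thumbnail. May be zero. |
Thumbnail height | 1 | Vertical pixel count of the embedded RGB thumbnail. May be zero. |
Thumbnail data | 3n | Uncompressed 24-bit RGB (8 bits per channel) thumbnail raster data in the order R0, G0, B0, ..., R(n-1), G(n-1), B(n-1), where n = Xthumbnail × Ythumbnail. |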
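Parsing these JFIF fields takes a single `struct` call. Sketch only: `parse_jfif_app0` is a hypothetical helper that receives the segment payload without the 2-byte length field.

```python
import struct


def parse_jfif_app0(payload: bytes) -> dict:
    """Parse a JFIF APP0 payload (the bytes after the length field)."""
    if not payload.startswith(b'JFIF\x00'):
        raise ValueError('not a JFIF APP0 segment')
    # version major/minor, density units, X/Y density (big-endian), thumbnail size
    major, minor, units, xd, yd, xt, yt = struct.unpack_from('>BBBHHBB', payload, 5)
    return {
        'version': f'{major}.{minor:02d}',
        'units': units,  # 0: aspect ratio only, 1: dots per inch, 2: dots per cm
        'density': (xd, yd),
        'thumbnail': (xt, yt),
        'thumbnail_rgb': payload[14:14 + 3 * xt * yt],
    }
```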
APP12
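field | size (bytes) | description |
---|---|---|
Length | 2 | Length of the whole segment, including this field |
Identifier | variable | "Ducky", etc. |
Ducky
field | size (bytes) | description |
---|---|---|
tag | 2 | 0x0001: compression quality (uint32); 0x0002: comment (string); 0x0003: copyright (string); 0x0000 ends the list |
Length | 2 | Length of the content that follows |
Content | Length | |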
SOF0
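field | size (bytes) | description |
---|---|---|
Length | 2 | Length of the whole segment, including this field |
Precision | 1 | Bits per sample, usually 8 |
Height | 2 | Image height |
Width | 2 | Image width |
Number of components | 1 | 01: grayscale; 03: YCbCr (the usual case); 04: CMYK |
Component info | number of components × 3 | 3 bytes per component: component ID (1 byte); sampling factors (1 byte, high 4 bits horizontal, low 4 bits vertical); quantization table ID (1 byte) |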
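A sketch of reading SOF0 (`parse_sof0` is a hypothetical helper; its input is the payload after the length field). Note the field order: height is stored before width, which is easy to get backwards.

```python
import struct


def parse_sof0(payload: bytes) -> dict:
    """Parse a SOF0 payload: precision, height, width, then 3 bytes per component."""
    precision, height, width, count = struct.unpack_from('>BHHB', payload, 0)
    components = []
    for i in range(count):
        cid, factors, qt_id = struct.unpack_from('>BBB', payload, 6 + 3 * i)
        components.append({
            'id': cid,
            'h': factors >> 4,    # horizontal sampling factor (high nibble)
            'v': factors & 0x0F,  # vertical sampling factor (low nibble)
            'qt': qt_id,          # quantization table ID
        })
    return {'precision': precision, 'height': height, 'width': width, 'components': components}
```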
DHT
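field | size (bytes) | description |
---|---|---|
Length | 2 | Length of the whole segment, including this field |
Table class and table ID | 1 | High 4 bits: table class, 0 for DC, 1 for AC. Low 4 bits: Huffman table ID |
Code counts per length | 16 | Number of codes of each length from 1 to 16 bits (see below) |
Symbols | sum of the 16 counts | The symbol of each code, in order; used to build the Huffman tree |
A single DHT segment may also contain several such tables one after another.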
For example, suppose the number of codes of each length is:
1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | 5 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
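Codes are assigned in canonical order: within one length the codes count upward, and moving to the next length appends a 0 bit. The resulting codes and symbols are:
bit_length | code | symbol |
---|---|---|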
2 | 00 | 0 |
3 | 010 | 1 |
3 | 011 | 2 |
3 | 100 | 3 |
3 | 101 | 4 |
3 | 110 | 5 |
4 | 1110 | 6 |
5 | 11110 | 7 |
6 | 111110 | 8 |
7 | 1111110 | 9 |
8 | 11111110 | 10 |
9 | 111111110 | 11 |
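The table follows mechanically from the 16 counts: start at code 0, hand out consecutive codes within each length, and shift left by one bit when moving to the next length. A sketch with hypothetical helper names (string keys are used for readability, not speed):

```python
def canonical_codes(counts, symbols) -> dict:
    """Build {code bit string: symbol} from a DHT's 16 counts and its symbol list."""
    table = {}
    code = 0
    k = 0  # index of the next unused symbol
    for length, n in enumerate(counts, start=1):
        for _ in range(n):
            table[format(code, f'0{length}b')] = symbols[k]
            code += 1
            k += 1
        code <<= 1  # next length: append a 0 bit
    return table


def decode_bits(bits: str, table: dict) -> list:
    """Decode a bit string; the codes are prefix-free, so the first match is correct."""
    out, current = [], ''
    for b in bits:
        current += b
        if current in table:
            out.append(table[current])
            current = ''
    return out


counts = [0, 1, 5, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
table = canonical_codes(counts, list(range(12)))
```

`table` reproduces the listing above, from `'00'` (symbol 0) to `'111111110'` (symbol 11).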
DQT
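field | size (bytes) | description |
---|---|---|
Length | 2 | Length of the whole segment, including this field |
Precision and table ID | 1 | High 4 bits: precision, 0 for 8-bit, 1 for 16-bit entries. Low 4 bits: quantization table ID. |
Table data | 64 × (precision + 1) | The 64 entries of the 8x8 quantization table, stored in zig-zag order |
One DQT segment may contain several tables back to back.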
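A sketch of parsing a DQT payload that handles 16-bit tables and several tables in one segment (`parse_dqt` is a hypothetical helper). It returns the entries in the zig-zag order in which they are stored; they must be rearranged into natural order before being multiplied with de-zig-zagged coefficients.

```python
import struct

import numpy as np


def parse_dqt(payload: bytes) -> dict:
    """Parse a DQT payload (after the length field) into {table_id: 64 entries, zig-zag order}."""
    tables = {}
    pos = 0
    while pos < len(payload):
        precision, table_id = payload[pos] >> 4, payload[pos] & 0x0F
        fmt = '>64H' if precision == 1 else '>64B'  # 16-bit or 8-bit entries
        tables[table_id] = np.array(struct.unpack_from(fmt, payload, pos + 1))
        pos += 1 + struct.calcsize(fmt)
    return tables
```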
DRI
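field | size (bytes) | description |
---|---|---|
Length | 2 | Length of the whole segment, including this field |
Restart interval | 2 | Every N MCUs, the DC predictors (prev) of all components are reset to 0. At each such point the data stream contains an RST marker whose number increases with every restart: the first is 0xFFD0, the second 0xFFD1, and so on up to 0xFFD7, after which it wraps back to 0xFFD0. |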
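Because each restart interval starts with fresh DC predictors and byte-aligned data, the scan can be split at the RST markers before Huffman decoding. A sketch (`split_restart_intervals` is a hypothetical helper) that also checks the RST0 to RST7 sequence; it assumes the 0xFF00 byte stuffing has not been removed yet, so every 0xFF followed by 0xD0 to 0xD7 is a genuine marker.

```python
def split_restart_intervals(scan: bytes) -> list:
    """Split entropy-coded scan data at RSTn markers, checking that n cycles 0..7."""
    intervals, start, expected, i = [], 0, 0, 0
    while i < len(scan) - 1:
        if scan[i] == 0xFF and 0xD0 <= scan[i + 1] <= 0xD7:
            if scan[i + 1] != 0xD0 + expected:
                raise ValueError(f'expected RST{expected}, got RST{scan[i + 1] - 0xD0}')
            intervals.append(scan[start:i])
            expected = (expected + 1) % 8  # RST0 ... RST7, then back to RST0
            start = i + 2
            i += 2
        else:
            i += 1
    intervals.append(scan[start:])
    return intervals
```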
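Walkthrough: decoding a JPEG file
Initialization
Read the number of components and the image width and height, build the quantization and Huffman tables, and for each component look up its Huffman table IDs, its quantization table, and its horizontal and vertical sampling factors.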
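Sampling factors
The image is stored as YCbCr, and since the eye is more sensitive to luma than to chroma, chroma can be sampled more sparsely while luma is kept at full resolution. The unit of decoding in JPEG is the MCU (Minimum Coded Unit), whose size depends on the sampling factors of the components: the MCU width is max(horizontal sampling factors) × 8 and its height is max(vertical sampling factors) × 8. A common case is Y with sampling factors 2x2 and Cb, Cr with 1x1. Each MCU then holds six 8x8 blocks, Y1, Y2, Y3, Y4, Cb1 and Cr1, and covers 16x16 pixels. The four Y blocks are laid out as below, while Cb1 and Cr1 each cover the whole 16x16 area (upsampled 2x in both directions):
Y1 | Y2 |
Y3 | Y4 |
If Y, Cb and Cr all have sampling factors 1x1, the MCU is 8x8.
Common chroma subsampling ratios are 4:4:4, 4:2:2, 4:2:0 (the 2x2/1x1 case above) and 4:1:1.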
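The MCU geometry follows directly from the sampling factors. A sketch (`mcu_layout` is a hypothetical helper); it assumes an interleaved scan with several components, since a single-component scan always uses plain 8x8 units.

```python
import math


def mcu_layout(width: int, height: int, factors: dict) -> dict:
    """Compute MCU size, MCU counts and upsampling from {component: (h, v)}."""
    h_max = max(h for h, _ in factors.values())
    v_max = max(v for _, v in factors.values())
    mcu_w, mcu_h = 8 * h_max, 8 * v_max
    return {
        'mcu_size': (mcu_w, mcu_h),
        # partial MCUs at the right/bottom edge still have to be decoded
        'mcus': (math.ceil(width / mcu_w), math.ceil(height / mcu_h)),
        'blocks_per_mcu': sum(h * v for h, v in factors.values()),
        # how much each component must be stretched (horizontally, vertically)
        'upsample': {c: (h_max // h, v_max // v) for c, (h, v) in factors.items()},
    }
```

For a 100×50 image with Y at 2x2 and chroma at 1x1, this gives 16×16 MCUs arranged 7 across and 4 down, six blocks per MCU, and a 2×2 upsampling factor for Cb and Cr.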
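Huffman tables
Huffman tables come in DC and AC kinds; there are usually four: luma DC, luma AC, chroma DC and chroma AC. They are used for the final entropy coding of the image data. The luma DC table of the example image is:
bit_length | code | symbol |
---|---|---|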
2 | 00 | 0 |
3 | 010 | 1 |
3 | 011 | 2 |
3 | 100 | 3 |
3 | 101 | 4 |
3 | 110 | 5 |
4 | 1110 | 6 |
5 | 11110 | 7 |
6 | 111110 | 8 |
7 | 1111110 | 9 |
8 | 11111110 | 10 |
9 | 111111110 | 11 |
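Quantization tables
There are usually two quantization tables, one for luma and one for chroma. During encoding, each DCT coefficient is divided by the corresponding table entry and rounded, which drives most high-frequency coefficients to 0. In the best case only the DC coefficient is nonzero and all 63 AC coefficients are 0, so the whole AC part can be coded with a single EOB symbol, which can take as little as 2 bits. For example, the luma quantization table of the example image, in the zig-zag order in which it is stored in the file, is: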
16 | 11 | 12 | 14 | 12 | 10 | 16 | 14 |
13 | 14 | 18 | 17 | 16 | 19 | 24 | 40 |
26 | 24 | 22 | 22 | 24 | 49 | 35 | 37 |
29 | 40 | 58 | 51 | 61 | 60 | 57 | 51 |
56 | 55 | 64 | 72 | 92 | 78 | 64 | 68 |
87 | 69 | 55 | 56 | 80 | 109 | 81 | 87 |
95 | 98 | 103 | 104 | 103 | 62 | 77 | 113 |
121 | 112 | 100 | 120 | 92 | 101 | 103 | 99 |
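Quantization and its inverse are element-wise operations, but the table must be in the same order as the coefficients. The sketch below (hypothetical helper names; the 64 values are the table above) rearranges the zig-zag-ordered table into natural row-major order using the same index matrix as the decoder's `ZIGZAGINVERSE`. The first row comes out as 16, 11, 10, 16, 24, 40, 51, 61, the first row of the standard luminance table.

```python
import numpy as np

# ZIGZAG_INDEX[r, c] = position of coefficient (r, c) in the zig-zag sequence
ZIGZAG_INDEX = np.array([
    [0, 1, 5, 6, 14, 15, 27, 28],
    [2, 4, 7, 13, 16, 26, 29, 42],
    [3, 8, 12, 17, 25, 30, 41, 43],
    [9, 11, 18, 24, 31, 40, 44, 53],
    [10, 19, 23, 32, 39, 45, 52, 54],
    [20, 22, 33, 38, 46, 51, 55, 60],
    [21, 34, 37, 47, 50, 56, 59, 61],
    [35, 36, 48, 49, 57, 58, 62, 63],
])

# The example luminance table exactly as stored in the file (zig-zag order)
LUMA_ZIGZAG = [16, 11, 12, 14, 12, 10, 16, 14, 13, 14, 18, 17, 16, 19, 24, 40,
               26, 24, 22, 22, 24, 49, 35, 37, 29, 40, 58, 51, 61, 60, 57, 51,
               56, 55, 64, 72, 92, 78, 64, 68, 87, 69, 55, 56, 80, 109, 81, 87,
               95, 98, 103, 104, 103, 62, 77, 113, 121, 112, 100, 120, 92, 101, 103, 99]


def to_natural_order(zigzag_values) -> np.ndarray:
    """64 values in zig-zag order -> 8x8 array in natural (row-major) order."""
    return np.asarray(zigzag_values)[ZIGZAG_INDEX]


def quantize(coeffs, table):
    """Encoder side: divide each DCT coefficient by its table entry and round."""
    return np.round(np.asarray(coeffs) / table).astype(int)


def dequantize(levels, table):
    """Decoder side: multiply each quantized level by its table entry."""
    return np.asarray(levels) * table
```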
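Decoding an MCU
Decoding an MCU means reading several 8x8 blocks in turn. Each block consists of 1 DC coefficient and 63 AC coefficients; the DC coefficient is read first, then the AC coefficients. To recap, for a baseline DCT image each color component is tied to its tables as follows: the SOF0 segment gives the component's quantization table (qt) ID, and the SOS segment gives its DC and AC Huffman table (ht) IDs.
In other words, each color component uses 1 DC Huffman table, 1 AC Huffman table and 1 quantization table.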
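Decoding a block
Decoding the DC coefficient
Because the DC coefficients of neighboring blocks differ only slightly, DC is coded with DPCM (Differential Pulse Code Modulation): only the difference from the previous block's DC is stored. To read it, decode a symbol with the DC Huffman table; the symbol is the bit length of the difference, so read that many bits, convert them to a signed value, and add the result to the previous DC. For example, the four luma blocks of the first MCU in the example yield differences of -37, 1, -1 and -1, so their DC coefficients are -37, -36, -37 and -38. Each component keeps its own independent DPCM predictor.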
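Both steps fit in a few lines: turning the raw bits into a signed number (the EXTEND procedure of ITU-T T.81, where a leading 1 bit means positive and a leading 0 means negative) and then accumulating the differences. A sketch with hypothetical names; for instance, -37 falls in the 6-bit category and is stored as the bits 011010.

```python
from itertools import accumulate


def extend(bits: int, size: int) -> int:
    """Map a `size`-bit field read from the stream to a signed value."""
    if size == 0:
        return 0
    if bits >= 1 << (size - 1):  # leading 1 bit: positive value, used as-is
        return bits
    return bits - (1 << size) + 1  # leading 0 bit: negative value


def dc_values(diffs) -> list:
    """DPCM decoding: each DC = previous DC + difference, starting from 0."""
    return list(accumulate(diffs))
```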
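Decoding the AC coefficients
AC coefficients are run-length encoded (RLE). Decode a symbol with the AC Huffman table: its high 4 bits give the number of zeros that come first, and its low 4 bits give the bit length of the nonzero value that follows, which is then read from the stream. Two symbols are special: 0x00 (EOB) means all remaining coefficients in the block are 0, and 0xF0 (ZRL) means a run of 16 zeros.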
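Once each run/size symbol and its value have been read from the stream, expanding them into the 63 AC coefficients is straightforward. A sketch with a hypothetical input format: a list of `(symbol, value)` pairs where `value` is already a signed integer (and is ignored for EOB and ZRL).

```python
def expand_ac(pairs) -> list:
    """Expand (run/size symbol, value) pairs into the 63 AC coefficients, zig-zag order."""
    ac = [0] * 63
    k = 0
    for rs, value in pairs:
        if rs == 0x00:  # EOB: the rest of the block is zero
            break
        if rs == 0xF0:  # ZRL: a run of 16 zeros
            k += 16
            continue
        k += rs >> 4  # skip the run of zeros given by the high nibble
        ac[k] = value
        k += 1
    return ac
```

For example, `[(0x01, 5), (0x21, -3), (0x00, None)]` yields 5, then two zeros, then -3, and zeros for the remaining positions.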
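Zig-zag decoding
The 8x8 coefficient matrix is not encoded row by row, so decoding must put the coefficients back in place. They are read in zig-zag order; each entry below is the position of that coefficient in the zig-zag sequence:
0 | 1 | 5 | 6 | 14 | 15 | 27 | 28 |
2 | 4 | 7 | 13 | 16 | 26 | 29 | 42 |
3 | 8 | 12 | 17 | 25 | 30 | 41 | 43 |
9 | 11 | 18 | 24 | 31 | 40 | 44 | 53 |
10 | 19 | 23 | 32 | 39 | 45 | 52 | 54 |
20 | 22 | 33 | 38 | 46 | 51 | 55 | 60 |
21 | 34 | 37 | 47 | 50 | 56 | 59 | 61 |
35 | 36 | 48 | 49 | 57 | 58 | 62 | 63 |
The reason for zig-zag order is that most of the energy (amplitude) is concentrated in the top-left, low-frequency corner. Since AC coefficients are run-length encoded, longer runs give better compression. High-frequency coefficients are usually 0, and the zig-zag order gathers them at the end of the sequence, where a single EOB symbol stands for all of them.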
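Rather than hard-coding the order, it can be generated by walking the anti-diagonals r + c = s, going down-left on odd diagonals and up-right on even ones. A sketch with hypothetical helper names that reproduces the decoder's `ZIGZAGINVERSE` matrix and uses it to undo the scan:

```python
import numpy as np


def zigzag_order(n: int = 8) -> list:
    """(row, col) positions of an n x n block in zig-zag scan order."""
    def key(rc):
        r, c = rc
        s = r + c  # index of the anti-diagonal
        # odd diagonals run with the row increasing, even ones with the column increasing
        return (s, r if s % 2 else c)
    return sorted(((r, c) for r in range(n) for c in range(n)), key=key)


def zigzag_index_matrix(n: int = 8) -> np.ndarray:
    """idx[r, c] = position of coefficient (r, c) in the zig-zag sequence."""
    idx = np.zeros((n, n), dtype=int)
    for k, (r, c) in enumerate(zigzag_order(n)):
        idx[r, c] = k
    return idx


def unzigzag(seq) -> np.ndarray:
    """64 values in zig-zag order -> 8x8 block in natural order."""
    return np.asarray(seq)[zigzag_index_matrix(8)]
```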
Dequantization
After zig-zag decoding, the block matrix looks like this:
-37 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
It is then multiplied element-wise by the quantization table (whose first entry is 16), giving:
-592 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
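Only the DC coefficient -592 remains, which means that before the DCT all 64 pixels of this 8x8 block had the same value (there may have been tiny differences, but they were rounded away during quantization).
IDCT
The IDCT converts the block from the frequency domain back to the spatial domain; in practice it is computed with matrix multiplications for speed. Finally, the level shift applied by the encoder is undone by adding 128 and clamping the result to the range 0 to 255. Here every pixel becomes -592 / 8 + 128 = 54:
54 | 54 | 54 | 54 | 54 | 54 | 54 | 54 |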
54 | 54 | 54 | 54 | 54 | 54 | 54 | 54 |
54 | 54 | 54 | 54 | 54 | 54 | 54 | 54 |
54 | 54 | 54 | 54 | 54 | 54 | 54 | 54 |
54 | 54 | 54 | 54 | 54 | 54 | 54 | 54 |
54 | 54 | 54 | 54 | 54 | 54 | 54 | 54 |
54 | 54 | 54 | 54 | 54 | 54 | 54 | 54 |
54 | 54 | 54 | 54 | 54 | 54 | 54 | 54 |
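The whole path from decoded levels to pixels can be checked numerically. A sketch with hypothetical helper names, using the matrix form of the IDCT (A = Cᵀ B C with the orthonormal DCT matrix C):

```python
import numpy as np


def dct_matrix(n: int = 8) -> np.ndarray:
    """Orthonormal DCT-II matrix: C[p, m] = alpha_p * cos(pi * (2m + 1) * p / (2n))."""
    p = np.arange(n).reshape(-1, 1)
    m = np.arange(n).reshape(1, -1)
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * m + 1) * p / (2 * n))
    c[0, :] = 1.0 / np.sqrt(n)
    return c


C = dct_matrix()


def reconstruct_block(levels: np.ndarray, qtable: np.ndarray) -> np.ndarray:
    """Dequantize, apply the 2-D IDCT, add back 128 and clamp to 0..255."""
    coeffs = levels * qtable
    pixels = C.T @ coeffs @ C
    return np.clip(np.round(pixels + 128), 0, 255).astype(np.uint8)


levels = np.zeros((8, 8))
levels[0, 0] = -37   # the example block: only the DC level is nonzero
qtable = np.ones((8, 8))
qtable[0, 0] = 16    # DC entry of the example table; AC entries do not matter here
block = reconstruct_block(levels, qtable)  # every pixel: -592 / 8 + 128 = 54
```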
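Example: decoding a JPEG file
The code below needs numpy, scipy and opencv-python (cv2 is only used to display the result). It handles baseline (SOF0) three-component YCbCr images with a single scan.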
```python
import itertools
import struct
import sys
import typing

import cv2
import numpy as np
from scipy import fft

FRAME_SOF0 = b'\xc0'  # Start Of Frame N
FRAME_SOF1 = b'\xc1'  # N indicates which compression process
FRAME_SOF2 = b'\xc2'  # Only SOF0-SOF2 are now in common use
FRAME_SOF3 = b'\xc3'
FRAME_SOF5 = b'\xc5'  # NB: codes C4 and CC are NOT SOF markers
FRAME_SOF6 = b'\xc6'
FRAME_SOF7 = b'\xc7'
FRAME_SOF8 = b'\xc8'
FRAME_SOF9 = b'\xc9'
FRAME_SOF10 = b'\xca'
FRAME_SOF11 = b'\xcb'
FRAME_SOF13 = b'\xcd'
FRAME_SOF14 = b'\xce'
FRAME_SOF15 = b'\xcf'
FRAME_SOI = b'\xd8'
FRAME_EOI = b'\xd9'  # End Of Image (end of datastream)
FRAME_SOS = b'\xda'  # Start Of Scan (begins compressed data)
FRAME_APP0 = b'\xe0'
FRAME_APP1 = b'\xe1'
FRAME_APP2 = b'\xe2'
FRAME_APP3 = b'\xe3'
FRAME_APP4 = b'\xe4'
FRAME_APP5 = b'\xe5'
FRAME_APP6 = b'\xe6'
FRAME_APP7 = b'\xe7'
FRAME_APP8 = b'\xe8'
FRAME_APP9 = b'\xe9'
FRAME_APP10 = b'\xea'
FRAME_APP11 = b'\xeb'
FRAME_APP12 = b'\xec'
FRAME_APP13 = b'\xed'
FRAME_APP14 = b'\xee'
FRAME_APP15 = b'\xef'
FRAME_DQT = b'\xdb'
FRAME_DRI = b'\xdd'
FRAME_DHT = b'\xc4'

# ZIGZAGINVERSE[r, c] = position of natural-order coefficient (r, c) in the zig-zag sequence
ZIGZAGINVERSE = np.array([[0, 1, 5, 6, 14, 15, 27, 28],
                          [2, 4, 7, 13, 16, 26, 29, 42],
                          [3, 8, 12, 17, 25, 30, 41, 43],
                          [9, 11, 18, 24, 31, 40, 44, 53],
                          [10, 19, 23, 32, 39, 45, 52, 54],
                          [20, 22, 33, 38, 46, 51, 55, 60],
                          [21, 34, 37, 47, 50, 56, 59, 61],
                          [35, 36, 48, 49, 57, 58, 62, 63]])
ZIGZAGFLATINVERSE = ZIGZAGINVERSE.flatten()
ZIGZAGFLAT = np.argsort(ZIGZAGFLATINVERSE)


def zigzag_single(block):
    """8x8 natural-order block -> its 64 coefficients in zig-zag order (as 8x8)."""
    return block.flatten()[ZIGZAGFLAT].reshape([8, 8])


def inverse_zigzag_single(array):
    """64 coefficients in zig-zag order -> 8x8 natural-order block."""
    return array.flatten()[ZIGZAGFLATINVERSE].reshape([8, 8])


def decode_number(bit_length, bits):
    """Turn a bit_length-bit field into a signed value (the EXTEND step of ITU-T T.81)."""
    if bit_length == 0:
        return 0
    b = 1 << (bit_length - 1)
    if bits >= b:
        return bits
    return bits - (2 * b - 1)


class BitStream:
    def __init__(self, data):
        self.data = data
        self.pos = 0

    def get_bit(self):
        byte = self.data[self.pos >> 3]
        offset = 7 - (self.pos & 0x07)
        self.pos += 1
        return (byte >> offset) & 0x01

    def get_bits(self, n):
        val = 0
        for _ in range(n):
            val = val << 1 | self.get_bit()
        return val

    def align(self):
        # Skip to the next byte boundary (RST markers are byte-aligned)
        self.pos = (self.pos + 7) & ~7


class HuffmanCodec:
    def __init__(self):
        self._root = []
        self._symbol = []

    def _travel_root(self, root, code, bit_length, result):
        if len(root) == 0:
            return
        if isinstance(root[0], list):
            self._travel_root(root[0], code << 1, bit_length + 1, result)
        else:
            result.append((code << 1, bit_length, root[0]))
        if len(root) == 2:
            if isinstance(root[1], list):
                self._travel_root(root[1], code << 1 | 1, bit_length + 1, result)
            else:
                result.append((code << 1 | 1, bit_length, root[1]))

    def print_code_table(self, out=sys.stdout):
        result = []
        self._travel_root(self._root, 0, 1, result)
        columns = list(zip(*itertools.chain(
            [('Bits', 'Code', 'Value', 'Symbol')],
            ((str(v[1]), bin(v[0])[2:].rjust(v[1], '0'), str(v[0]), repr(v[2])) for v in result))))
        widths = tuple(max(len(s) for s in col) for col in columns)
        template = '{0:>%d} {1:%d} {2:>%d} {3}\n' % widths[:3]
        for row in zip(*columns):
            out.write(template.format(*row))

    def _add_symbol(self, root, symbol, pos):
        # Put the symbol at depth pos + 1, as far left as possible (yields canonical codes)
        if isinstance(root, list):
            if pos == 0:
                if len(root) < 2:
                    root.append(symbol)
                    return True
                return False
            for i in [0, 1]:
                if len(root) == i:
                    root.append([])
                if self._add_symbol(root[i], symbol, pos - 1):
                    return True
        return False

    def build_from_bits(self, bits_length_seq, symbol_seq):
        self._symbol = symbol_seq
        symbol_index = 0  # index of the next unused symbol
        for i, count in enumerate(bits_length_seq):
            for _ in range(count):
                self._add_symbol(self._root, symbol_seq[symbol_index], i)
                symbol_index += 1

    def get_symbol(self, bit_stream: BitStream) -> int:
        # Walk the tree one bit at a time until a leaf (a symbol) is reached
        node = self._root
        while isinstance(node, list):
            node = node[bit_stream.get_bit()]
        return node


class JpegMarker:
    identify = ''

    def __init__(self, marker):
        self.__marker = marker
        self.raw = b''

    def marker(self):
        return self.__marker

    def encode(self):
        raise NotImplementedError('only decoding is implemented')

    def decode(self, data):
        # Default: keep the raw payload of segments that are not parsed
        self.raw = bytes(data)


class JpegApp0Marker(JpegMarker):
    def __init__(self, marker):
        super().__init__(marker)
        self.identify = ''
        self.__main_version = 0
        self.__sub_version = 0
        self.__density = 0
        self.__x_density = 0
        self.__y_density = 0
        self.__x_thumbnail = 0
        self.__y_thumbnail = 0

    def decode(self, data):
        offset = 0
        while offset < len(data) - 1 and data[offset] != 0:
            self.identify += chr(data[offset])
            offset += 1
        if self.identify != 'JFIF':
            return  # JFXX and other extensions are not parsed
        (self.__main_version, self.__sub_version, self.__density, self.__x_density, self.__y_density,
         self.__x_thumbnail, self.__y_thumbnail) = struct.unpack_from('>3B2H2B', data, offset + 1)


class JpegApp1Marker(JpegMarker):
    """EXIF: not parsed, the raw payload is kept."""


class JpegApp12Marker(JpegMarker):
    def __init__(self, marker):
        super().__init__(marker)
        self.quality = None
        self.comment = None
        self.copyright = None

    def decode(self, data):
        data_size = len(data)
        tag_name = ''
        offset = 0
        while offset < data_size and data[offset] != 0:
            tag_name += chr(data[offset])
            offset += 1
        if tag_name != 'Ducky':
            return
        while offset + 4 <= data_size:
            tag, value_size = struct.unpack_from('>2H', data, offset)
            if tag == 0:
                break
            offset += 4
            if tag == 1:  # quality
                self.quality, = struct.unpack_from('>I', data, offset)
            elif tag == 2:  # comment
                self.comment = bytes(data[offset:offset + value_size])
            elif tag == 3:  # copyright
                self.copyright = bytes(data[offset:offset + value_size])
            offset += value_size


class JpegApp13Marker(JpegMarker):
    """Photoshop data: not parsed, the raw payload is kept."""


class JpegApp14Marker(JpegMarker):
    def __init__(self, marker):
        super().__init__(marker)
        self.dct_encode_version = None
        self.flags_0 = None
        self.flags_1 = None
        self.color_transform = None

    def decode(self, data):
        if len(data) < 12:
            return
        tag_name, version, flag0, flag1, color = struct.unpack_from('>5sH2HB', data)
        if tag_name == b'Adobe':
            self.dct_encode_version = version
            self.flags_0 = flag0
            self.flags_1 = flag1
            self.color_transform = color


class JpegDQTMarker(JpegMarker):
    def __init__(self, marker):
        super().__init__(marker)
        self.tables = {}  # table id -> 8x8 table in natural (row-major) order

    def decode(self, data):
        # One DQT segment may hold several tables
        offset = 0
        while offset < len(data):
            tmp = data[offset]
            precision = (tmp & 0xF0) >> 4  # 0: 8-bit, 1: 16-bit entries
            table_id = tmp & 0x0F
            fmt = '>64H' if precision else '64B'
            values = struct.unpack_from(fmt, data, offset + 1)
            # Entries are stored in zig-zag order: convert to natural order
            self.tables[table_id] = inverse_zigzag_single(np.array(values, np.int32))
            offset += 1 + struct.calcsize(fmt)


class JpegSof0Marker(JpegMarker):
    def __init__(self, marker):
        super().__init__(marker)
        self.precision = 0
        self.width = 0
        self.height = 0
        self.channel_count = 0
        self.channel_info = {}

    def decode(self, data):
        # Height comes before width in SOF
        self.precision, self.height, self.width, self.channel_count = struct.unpack_from('>B2HB', data)
        offset = 6
        for _ in range(self.channel_count):
            channel_id, sampling_factor, dqt_id = struct.unpack_from('>3B', data, offset)
            self.channel_info[channel_id] = {
                'horizontal_factor': (sampling_factor & 0xF0) >> 4,
                'vertical_factor': sampling_factor & 0x0F,
                'dqt_id': dqt_id,
            }
            offset += 3


class JpegDhtMarker(JpegMarker):
    def __init__(self, marker):
        super().__init__(marker)
        self.tables = []  # (type, table_id, bits_length_seq, symbol_seq)

    def decode(self, data):
        # One DHT segment may hold several tables
        offset = 0
        while offset < len(data):
            tmp = data[offset]
            table_type = (tmp & 0xF0) >> 4  # 0: DC, 1: AC
            table_id = tmp & 0x0F
            bits_length_seq = struct.unpack_from('16B', data, offset + 1)
            count = sum(bits_length_seq)
            symbol_seq = struct.unpack_from('%dB' % count, data, offset + 17)
            self.tables.append((table_type, table_id, bits_length_seq, symbol_seq))
            offset += 17 + count


class JpegDriMarker(JpegMarker):
    def __init__(self, marker):
        super().__init__(marker)
        self.restart_interval = 0

    def decode(self, data):
        self.restart_interval, = struct.unpack_from('>H', data)


class JpegImageMarker(JpegMarker):
    def __init__(self, marker):
        super().__init__(marker)
        self.channel_count = None  # 1: grayscale, 3: YCbCr, 4: CMYK
        self.huffman_map = {}
        self.image_data = None

    def decode(self, data):
        info_size, = struct.unpack_from('>H', data)
        # Entropy-coded data follows the SOS header; undo the 0xFF00 byte stuffing
        self.image_data = bytes(data[info_size:]).replace(b'\xff\x00', b'\xff')
        self.channel_count, = struct.unpack_from('>B', data, 2)
        offset = 3
        for _ in range(self.channel_count):
            channel_id, huffman_table = struct.unpack_from('>2B', data, offset)
            offset += 2
            # High 4 bits: DC table ID, low 4 bits: AC table ID
            self.huffman_map[channel_id] = {
                'DC': (huffman_table & 0xF0) >> 4,
                'AC': huffman_table & 0x0F,
            }


class JpegMarkerList:
    def __init__(self):
        self.__markers = []

    def add(self, marker):
        """Add a marker (None is ignored).

        :param None|JpegMarker marker:
        """
        if marker is not None:
            self.__markers.append(marker)

    def dump(self):
        for marker in self.__markers:
            print(marker.__dict__)

    def get_markers(self, marker_identify) -> typing.List[JpegMarker]:
        return [marker for marker in self.__markers if marker.marker() == marker_identify]

    def get_image(self) -> typing.Optional[JpegImageMarker]:
        markers = self.get_markers(FRAME_SOS)
        return markers[0] if len(markers) == 1 else None

    def get_quantization_table(self) -> typing.Optional[typing.Dict[int, np.ndarray]]:
        markers = self.get_markers(FRAME_DQT)
        if len(markers) == 0:
            return None
        quantization_table = {}
        for marker in markers:
            quantization_table.update(marker.tables)
        return quantization_table

    def get_image_info(self) -> typing.Optional[JpegSof0Marker]:
        markers = self.get_markers(FRAME_SOF0)
        return markers[0] if len(markers) == 1 else None

    def get_restart_interval(self) -> int:
        markers = self.get_markers(FRAME_DRI)
        return markers[0].restart_interval if len(markers) == 1 else 0

    def get_huffman_table(self) -> typing.Optional[typing.Dict[str, typing.Dict[int, HuffmanCodec]]]:
        markers = self.get_markers(FRAME_DHT)
        if len(markers) == 0:
            return None
        huffman_table = {'DC': {}, 'AC': {}}
        for marker in markers:
            for table_type, table_id, bits_length_seq, symbol_seq in marker.tables:
                codec = HuffmanCodec()
                codec.build_from_bits(bits_length_seq, symbol_seq)
                huffman_table['DC' if table_type == 0 else 'AC'][table_id] = codec
        return huffman_table


class JpegDecoder:
    APP_CLASSES = {
        FRAME_APP0: JpegApp0Marker,
        FRAME_APP1: JpegApp1Marker,
        FRAME_APP12: JpegApp12Marker,
        FRAME_APP13: JpegApp13Marker,
        FRAME_APP14: JpegApp14Marker,
    }

    def __init__(self, data):
        self.__data = data
        self.__offset = 0

    def _read_bytes(self, fmt='c'):
        ret = struct.unpack_from(fmt, self.__data, self.__offset)
        self.__offset += struct.calcsize(fmt)
        return ret[0]

    @staticmethod
    def _is_app_marker(marker):
        return FRAME_APP0 <= marker <= FRAME_APP15

    def _read_segment(self, ret):
        """Read the length field, let `ret` decode the payload, and skip past it."""
        marker_size = self._read_bytes('>H')
        if marker_size < 2:
            return None
        ret.decode(self.__data[self.__offset:self.__offset + marker_size - 2])
        self.__offset += marker_size - 2
        return ret

    def _read_image(self):
        # Simplification: a single scan that runs up to the EOI marker at the end of the file
        if self.__data[-2:] != b'\xff' + FRAME_EOI:
            return None
        marker = JpegImageMarker(FRAME_SOS)
        marker.decode(self.__data[self.__offset:-2])
        self.__offset = len(self.__data) - 2
        return marker

    def read_markers(self):
        marker_list = JpegMarkerList()
        while self.__offset < len(self.__data):
            if self._read_bytes() != b'\xff':
                return None
            marker = self._read_bytes()
            while marker == b'\xff':  # fill bytes may precede a marker
                marker = self._read_bytes()
            if marker == FRAME_SOI:
                continue
            if marker == FRAME_EOI:
                return marker_list
            if marker == FRAME_SOS:
                image = self._read_image()
                if image is None:
                    return None
                marker_list.add(image)
            elif self._is_app_marker(marker):
                marker_list.add(self._read_segment(self.APP_CLASSES.get(marker, JpegMarker)(marker)))
            elif marker == FRAME_SOF0:
                marker_list.add(self._read_segment(JpegSof0Marker(marker)))
            elif marker == FRAME_DQT:
                marker_list.add(self._read_segment(JpegDQTMarker(marker)))
            elif marker == FRAME_DHT:
                marker_list.add(self._read_segment(JpegDhtMarker(marker)))
            elif marker == FRAME_DRI:
                marker_list.add(self._read_segment(JpegDriMarker(marker)))
            else:
                # Other SOFn, COM, etc.: keep the raw payload
                marker_list.add(self._read_segment(JpegMarker(marker)))
        return None

    def read_data(self):
        markers = self.read_markers()
        if markers is None:
            return None
        image_marker = markers.get_image()
        image_info = markers.get_image_info()  # SOF0: image size and per-component info
        quantization_table = markers.get_quantization_table()
        huffman_table = markers.get_huffman_table()
        if None in (image_marker, image_info, quantization_table, huffman_table):
            return None
        # Only 3-component YCbCr images are handled
        if image_marker.channel_count != 3 or image_info.channel_count != 3:
            return None
        components = list(image_info.channel_info.items())
        h_max = max(info['horizontal_factor'] for _, info in components)
        v_max = max(info['vertical_factor'] for _, info in components)
        mcu_w, mcu_h = 8 * h_max, 8 * v_max
        mcus_x = (image_info.width + mcu_w - 1) // mcu_w
        mcus_y = (image_info.height + mcu_h - 1) // mcu_h
        # DC predictors, one per component
        dc_prev = {channel_id: 0 for channel_id, _ in components}
        restart_interval = markers.get_restart_interval()
        restart_count = 0
        image_stream = BitStream(image_marker.image_data)
        img = np.zeros([mcus_y * mcu_h, mcus_x * mcu_w, 3], np.uint8)
        for mcu_index in range(mcus_x * mcus_y):
            # Every restart_interval MCUs an RSTn marker appears and the DC predictors reset
            if restart_interval > 0 and mcu_index > 0 and mcu_index % restart_interval == 0:
                image_stream.align()
                if image_stream.get_bits(16) != 0xFFD0 + (restart_count & 0x07):
                    raise UserWarning('error restart marker')
                restart_count += 1
                dc_prev = dict.fromkeys(dc_prev, 0)
            y = (mcu_index // mcus_x) * mcu_h
            x = (mcu_index % mcus_x) * mcu_w
            for plane, (channel_id, channel_info) in enumerate(components):
                channel_huffman_map = image_marker.huffman_map[channel_id]
                dc_codec = huffman_table['DC'][channel_huffman_map['DC']]
                ac_codec = huffman_table['AC'][channel_huffman_map['AC']]
                quantization = quantization_table[channel_info['dqt_id']]
                # Upsampling factors for components sampled more sparsely than the MCU
                scale_x = h_max // channel_info['horizontal_factor']
                scale_y = v_max // channel_info['vertical_factor']
                for y_block in range(channel_info['vertical_factor']):
                    for x_block in range(channel_info['horizontal_factor']):
                        idct, dc_prev[channel_id] = self.process_huffman_data_unit(
                            image_stream, dc_codec, ac_codec, quantization, dc_prev[channel_id])
                        # Nearest-neighbour upsampling of subsampled blocks
                        block = np.repeat(np.repeat(idct, scale_y, axis=0), scale_x, axis=1)
                        top = y + y_block * 8 * scale_y
                        left = x + x_block * 8 * scale_x
                        img[top:top + block.shape[0], left:left + block.shape[1], plane] = block
        img = img[:image_info.height, :image_info.width]
        # YCbCr -> BGR, computed in float to avoid uint8 overflow
        ycc = img.astype(np.float64)
        luma, cb, cr = ycc[..., 0], ycc[..., 1] - 128, ycc[..., 2] - 128
        bgr = np.stack([luma + 1.772 * cb,
                        luma - 0.344136 * cb - 0.714136 * cr,
                        luma + 1.402 * cr], axis=-1)
        return np.clip(np.round(bgr), 0, 255).astype(np.uint8)

    @staticmethod
    def process_huffman_data_unit(image_stream: BitStream,
                                  dc_codec: HuffmanCodec,
                                  ac_codec: HuffmanCodec,
                                  quantization: np.ndarray,
                                  prev_dc_symbol: int):
        dct = np.zeros([64])
        # DC: the symbol is the bit length of the DPCM difference
        huff_symbol = dc_codec.get_symbol(image_stream)
        prev_dc_symbol += decode_number(huff_symbol, image_stream.get_bits(huff_symbol))
        dct[0] = prev_dc_symbol
        # AC: high 4 bits = run of zeros, low 4 bits = bit length of the next value
        index = 1
        while index < 64:
            huff_symbol = ac_codec.get_symbol(image_stream)
            if huff_symbol == 0x00:  # EOB: the rest of the block is zero
                break
            if huff_symbol == 0xF0:  # ZRL: sixteen zeros
                index += 16
                continue
            index += huff_symbol >> 4
            size = huff_symbol & 0x0F
            dct[index] = decode_number(size, image_stream.get_bits(size))
            index += 1
        # Undo zig-zag, dequantize, 2-D IDCT, then undo the -128 level shift
        coefficients = inverse_zigzag_single(dct) * quantization
        idct = fft.idctn(coefficients, norm='ortho') + 128
        return np.clip(np.round(idct), 0, 255).astype(np.uint8), prev_dc_symbol


if __name__ == '__main__':
    with open('test.jpg', 'rb') as fp:
        decoder = JpegDecoder(fp.read())
    img = decoder.read_data()
    cv2.imshow('img', img)
    cv2.waitKey(0)
    cv2.destroyAllWindows()
```
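References
- Wikipedia (Chinese), JPEG文件交换格式 (JPEG File Interchange Format): https://zh.wikipedia.org/wiki/JPEG%E6%96%87%E4%BB%B6%E4%BA%A4%E6%8D%A2%E6%A0%BC%E5%BC%8F
- Wikipedia (Chinese), 色度抽样 (Chroma subsampling): https://zh.wikipedia.org/wiki/%E8%89%B2%E5%BA%A6%E6%8A%BD%E6%A0%B7
- 独孤呆博, 离散余弦变换(DCT)的来龙去脉 (The origins of the DCT), CSDN blog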