1. Error-correcting codes




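An error-correcting code is a code that lets the receiver detect or correct, on its own, errors that occur during transmission. Error-correcting codes are commonly used to ensure reliable transmission of information over a noisy channel and reliable storage of information on a medium that may become partially corrupted over time (or whose reading device is subject to errors).
For a code to detect or correct errors, redundant symbols must be added to the original codeword to enlarge the differences between codewords. That is, the original codeword is transformed by some rule into a codeword with a certain amount of redundancy (see source coding), so that the symbols of each codeword satisfy certain relations. Establishing these relations is called encoding. When a codeword reaches the receiver, the receiver checks whether the encoding rules are satisfied to decide whether an error has occurred. If they are not satisfied, it locates the error by a fixed rule and corrects it. The process of correcting errors and recovering the original codeword is called decoding.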

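A typical way to use error-correcting codes is to split the message into small blocks and encode each block separately; when only part of the information is of interest, only the corresponding block needs to be decoded. The advantage of this strategy is efficient random-access retrieval of information. The drawback is weak noise tolerance: if even a single block is completely corrupted, the information it carries is lost entirely.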

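One way to improve noise tolerance is to encode the whole message into a single codeword of an error-correcting code. This does strengthen noise tolerance, but anyone interested in only part of the information must still recover the whole message. For modern large datasets, the resulting decoding complexity is prohibitive.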

1.1 Linear codes

Here $k$ is the dimension of the code and $n$ is the block length of the code. A linear code uses $n$ symbols to transmit a message of length $k$, so its rate is $R = k/n$.

A linear code of length $n$, dimension $k$, and minimum distance $d$ is called an $[n,k,d]$ code, where $d = \min \operatorname{dist}(\vec{u},\vec{v}) = \min \operatorname{wt}(\vec{u}-\vec{v})$, taken over all codewords $\vec{u} \neq \vec{v}$.
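The definition can be checked numerically on a small example. The sketch below assumes the binary [7,4] Hamming code (an illustrative choice, not taken from the text above) and computes the minimum distance both as the smallest pairwise Hamming distance and as the smallest nonzero codeword weight; for a linear code the two agree.

```python
from itertools import product

# Generator matrix of the binary [7,4] Hamming code in systematic form.
# This particular code is an assumption made for illustration only.
G = [
    [1, 0, 0, 0, 1, 1, 0],
    [0, 1, 0, 0, 1, 0, 1],
    [0, 0, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
]

def encode(msg, G):
    """Multiply the message row vector by G over GF(2)."""
    n = len(G[0])
    return tuple(sum(m * row[j] for m, row in zip(msg, G)) % 2 for j in range(n))

def hamming_distance(u, v):
    """Number of positions in which u and v differ."""
    return sum(a != b for a, b in zip(u, v))

def weight(u):
    """Number of nonzero positions of u."""
    return sum(1 for a in u if a != 0)

k, n = len(G), len(G[0])
codewords = [encode(msg, G) for msg in product((0, 1), repeat=k)]

# Minimum over all pairs of distinct codewords.
min_dist = min(hamming_distance(u, v) for u in codewords for v in codewords if u != v)
# Minimum over all nonzero codewords.
min_wt = min(weight(c) for c in codewords if any(c))
rate = k / n

print(f"[n,k,d] = [{n},{k},{min_dist}], min weight = {min_wt}, rate = {rate:.3f}")
```

The pairwise search costs on the order of $4^k$ comparisons, while the weight search costs only $2^k$, which is one practical reason the weight formulation is convenient for linear codes.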

The minimum distance $d$ is related to the number of correctable errors as follows:
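As a reminder of the standard textbook result (stated here on the assumption that it is the relation this section refers to), a code with minimum distance $d$ can detect up to $d-1$ errors and correct up to $t$ errors, where

$$
t = \left\lfloor \frac{d-1}{2} \right\rfloor .
$$

For example, a code with $d = 3$ corrects any single error, since Hamming balls of radius 1 around distinct codewords do not overlap.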

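Maximum likelihood decoding: decode to the codeword most likely to have been sent, given the received vector.
Nearest neighbor decoding: decode to the codeword closest to the received vector in Hamming distance.
Because the error vector $\vec{e}$ is unknown, when $k$ is small one can decode by brute force: compare the received vector $\vec{y}$ with all $2^k$ possible codewords $\vec{x}$ and pick the closest one. When $k$ is large, brute-force decoding is no longer practical.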
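A minimal sketch of brute-force nearest neighbor decoding follows. It again assumes the binary [7,4] Hamming code as a stand-in example; the function names are hypothetical and chosen only for this illustration.

```python
from itertools import product

# Binary [7,4] Hamming code in systematic form (illustrative assumption).
G = [
    [1, 0, 0, 0, 1, 1, 0],
    [0, 1, 0, 0, 1, 0, 1],
    [0, 0, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
]
K, N = len(G), len(G[0])

def encode(msg):
    """Multiply the message row vector by G over GF(2)."""
    return tuple(sum(m * row[j] for m, row in zip(msg, G)) % 2 for j in range(N))

def hamming_distance(u, v):
    """Number of positions in which u and v differ."""
    return sum(a != b for a, b in zip(u, v))

def nearest_neighbor_decode(y):
    """Try all 2^K messages and return the one whose codeword is closest to y."""
    return min(product((0, 1), repeat=K),
               key=lambda m: hamming_distance(encode(m), y))

msg = (1, 0, 1, 1)
x = encode(msg)

# Simulate a channel that flips one symbol (the error vector has weight 1).
y = list(x)
y[2] ^= 1
decoded = nearest_neighbor_decode(tuple(y))

print("sent    :", msg)
print("decoded :", decoded)
```

The loop visits every one of the $2^k$ messages, so the running time doubles with each additional message bit, which is why this approach only suits small $k$.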

1.2 Nonlinear codes


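Linear and nonlinear codes are written differently. A code written with parentheses () may be linear or nonlinear, while a code written with square brackets [] is a linear code.
A binary linear code $[n,k,d]$ can also be written as $(n, 2^k, d)$.
For a given $d$, if the number of codewords $M$ should be as large as possible, a nonlinear code can be used.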

A common nonlinear code is the Hadamard code.

1.2.1 Hadamard code

A Hadamard matrix is defined as follows:

An example of a normalized Hadamard matrix:

If a Hadamard matrix $H$ of order $n$ exists, then $n$ is 1, 2, or a multiple of 4.
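For a concrete instance, recall the standard definition: a Hadamard matrix of order $n$ is an $n \times n$ matrix with entries $\pm 1$ satisfying $H H^T = nI$, and it is normalized when its first row and first column are all $+1$. The sketch below uses Sylvester's doubling construction, which is one well-known way to build such matrices but only reaches orders that are powers of 2; the function names are hypothetical.

```python
def sylvester_hadamard(m):
    """Return a 2^m x 2^m Hadamard matrix via Sylvester's doubling construction."""
    H = [[1]]
    for _ in range(m):
        # New matrix is [[H, H], [H, -H]].
        top = [row + row for row in H]
        bottom = [row + [-x for x in row] for row in H]
        H = top + bottom
    return H

def is_hadamard(H):
    """Check that entries are +1/-1 and that distinct rows are orthogonal."""
    n = len(H)
    if any(len(row) != n or any(x not in (1, -1) for x in row) for row in H):
        return False
    for i in range(n):
        for j in range(n):
            dot = sum(H[i][t] * H[j][t] for t in range(n))
            if dot != (n if i == j else 0):
                return False
    return True

H4 = sylvester_hadamard(2)
for row in H4:
    print(" ".join(f"{x:+d}" for x in row))
print("is Hadamard:", is_hadamard(H4))
```

The orders produced here (1, 2, 4, 8, ...) are consistent with the necessary condition above, but the condition also allows orders such as 12 or 20 that this construction cannot produce.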

The classical Hadamard code encodes $n$-bit messages into $2^n$-bit codewords.
Thus the Hadamard code has query complexity 2, and its codeword length is exponential in the message length.
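The two-query property can be illustrated with a short sketch. It assumes the usual formulation in which the codeword bit at index $\vec{v} \in \{0,1\}^n$ is the inner product $\langle \vec{x}, \vec{v} \rangle \bmod 2$; indices are stored as integers whose bits represent $\vec{v}$, and the function names are hypothetical.

```python
import random

def hadamard_encode(x):
    """Return the 2^n-bit codeword whose bit at index v is <x, v> mod 2."""
    n = len(x)
    word = []
    for v in range(2 ** n):
        bit = 0
        for j in range(n):
            if (v >> j) & 1:
                bit ^= x[j]
        word.append(bit)
    return word

def local_decode(word, n, i, rng=random):
    """Recover x[i] with two queries: positions v and v XOR e_i for a random v."""
    v = rng.randrange(2 ** n)
    return word[v] ^ word[v ^ (1 << i)]

x = [1, 0, 1, 1, 0]
n = len(x)
word = hadamard_encode(x)

# On an uncorrupted codeword every choice of v gives the right answer.
recovered = [local_decode(word, n, i) for i in range(n)]
print("message  :", x)
print("recovered:", recovered)
```

The decoder works because $\langle \vec{x}, \vec{v} \rangle + \langle \vec{x}, \vec{v} + \vec{e}_i \rangle = \langle \vec{x}, \vec{e}_i \rangle = x(i)$ over $\mathbb{F}_2$. On a corrupted codeword, a single run can fail if either queried position was flipped.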


$C_{HAD}$ is a 2-query $(2,\delta,2\delta)$-LDC. The proof is as follows:
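The standard argument can be sketched as follows, under the assumption that $(r,\delta,\epsilon)$ denotes query count, tolerated fraction of corrupted positions, and failure probability. The symbols $\vec{v}$ (a random index) and $\vec{e}_i$ (the $i$-th unit vector) are introduced here for illustration. Let $\vec{y}$ differ from $C_{HAD}(\vec{x})$ in at most a $\delta$ fraction of positions. The decoder picks $\vec{v} \in \mathbb{F}_2^n$ uniformly at random and outputs $\vec{y}_{\vec{v}} + \vec{y}_{\vec{v}+\vec{e}_i}$. Each of the two queried positions is individually uniformly distributed, so by the union bound

$$
\Pr\big[\vec{y}_{\vec{v}} \neq C_{HAD}(\vec{x})_{\vec{v}} \ \text{or}\ \vec{y}_{\vec{v}+\vec{e}_i} \neq C_{HAD}(\vec{x})_{\vec{v}+\vec{e}_i}\big] \le \delta + \delta = 2\delta .
$$

When both queried positions are uncorrupted, the output is correct:

$$
\vec{y}_{\vec{v}} + \vec{y}_{\vec{v}+\vec{e}_i} = \langle \vec{x}, \vec{v} \rangle + \langle \vec{x}, \vec{v}+\vec{e}_i \rangle = \langle \vec{x}, \vec{e}_i \rangle = x(i) \pmod 2 .
$$

Hence the decoder returns $x(i)$ with probability at least $1 - 2\delta$ using 2 queries.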

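The decoding procedure in the figure above recovers the entire message. The need to recover a single message symbol, rather than the whole message, is what motivates LDCs (locally decodable codes).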

2. LDC

A locally decodable code (LDC) is a kind of error-correcting code. LDCs meet the noise-tolerance requirement while also providing efficient random-access retrieval (allowing reliable reconstruction of an arbitrary bit of the message from looking at only a small number of randomly chosen codeword bits).

LDCs are useful not only for reliable transmission and reliable storage but also in other areas, such as cryptography, complexity theory, data structures, derandomization, and the theory of fault-tolerant computation.

Locally decodable codes can be seen as the combinatorial analogs of self-correctors [70, 21] that were studied in complexity theory in the late 1980s. LDCs were also explicitly discussed in the PCP literature in the early 1990s, most notably in [6, 88, 80]. However, the first formal definition of LDCs was given only in 2000 by Katz and Trevisan [64]. See also Sudan et al. [90]. Since then the study of LDCs has grown into a fairly broad field.

2.1 Definition of LDC



An $r$-query locally decodable code $C$ encodes $k$-bit messages $\vec{x}$ in such a way that one can probabilistically recover any bit $x(i)$ of the message by querying only $r$ bits of the (possibly corrupted) codeword $C(\vec{x})$, where $r$ can be as small as 2.

The main parameters of interest in an LDC are the codeword length and the query complexity. How to trade off codeword length against query complexity is an active topic in current LDC research.

  • The length of the code measures the amount of redundancy that is introduced into the message by the encoder.
  • The query complexity counts the number of bits that need to be read from the (corrupted) codeword in order to recover a single bit of the message.
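For reference, the usual formal statement can be sketched as follows. It is given on the assumption that the $(r,\delta,\epsilon)$ notation follows the common formulation; the symbols $N$ (codeword length), $\mathcal{A}$ (decoder), and $\Delta$ (relative Hamming distance) are introduced here for illustration. A $q$-ary code $C: \mathbb{F}_q^k \to \mathbb{F}_q^N$ is $(r,\delta,\epsilon)$-locally decodable if there is a randomized decoding algorithm $\mathcal{A}$ that makes at most $r$ queries to its input word and satisfies

$$
\forall\, \vec{x} \in \mathbb{F}_q^k,\ \forall\, i \in [k],\ \forall\, \vec{y} \in \mathbb{F}_q^N \text{ with } \Delta(C(\vec{x}), \vec{y}) \le \delta:\quad \Pr\big[\mathcal{A}^{\vec{y}}(i) = x(i)\big] \ge 1 - \epsilon ,
$$

where the probability is taken over the random choices of $\mathcal{A}$.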

2.2 Definition of smooth LDC

Classified by the number of queries (query complexity) and the codeword length (upper and lower bounds), the main results to date are:

2.3 Technical classification of LDCs


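  • First-generation LDCs: polynomial interpolation. This generation captures codes that are based on the idea of polynomial interpolation. Messages are encoded as evaluations of low-degree multivariate polynomials over a finite field. The typical representative is the Reed-Muller (RM) LDC. For message length $k$ and query complexity $r \geq 2$, the codeword length of an RM LDC is $\exp(k^{\frac{1}{r-1}})$.
  • Second-generation LDCs: polynomial interpolation plus recursion. The construction is indirect and proceeds in two steps: 1) one obtains certain cryptographic protocols called Private Information Retrieval schemes, or PIRs, that on their own are objects of interest; 2) one turns PIRs into LDCs. Second-generation LDCs can tolerate a certain fraction of errors. For message length $k$ and query complexity $r$, the codeword length is $\exp(k^{O(\frac{\log \log r}{r \log r})})$.
  • Third-generation LDCs: algebraic and combinatorial ideas. The typical representative is the Matching Vector (MV) LDC. MV codes can be designed to tolerate the optimal fraction of errors (for example, an error rate of $1/2-\epsilon$ over large alphabets and $1/4-\epsilon$ over the binary alphabet).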

2.3.1 First-generation LDC: Reed-Muller LDC

A Reed-Muller (RM) LDC is determined mainly by three parameters:

  • a prime power (alphabet size) $q$;
  • a number of variables $n$;
  • a degree $d < q-1$.

(A $D$-evaluation of a function $h$ defined over a domain $D$ is the vector of values of $h$ at all points of $D$.)

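The $q$-ary RM code consists of the following:
the $\mathbb{F}_q^n$-evaluations of all polynomials of total degree at most $d$ in the ring $\mathbb{F}_q[z_1,...,z_n]$, that is, the values of each such polynomial at all points of $\mathbb{F}_q^n$.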

An RM LDC encodes messages of length $k=\binom{n+d}{d}$ into codewords of length $q^n$.
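The parameter count can be checked with a small encoder sketch. It assumes $q$ is prime so that arithmetic modulo $q$ realizes $\mathbb{F}_q$ (general prime powers would need proper field arithmetic), and it treats the message as the coefficient vector of the polynomial, which is one possible convention rather than the only one. Function names and the sample message are hypothetical.

```python
from itertools import product
from math import comb

def monomials(n, d):
    """Exponent vectors (a_1, ..., a_n) with a_1 + ... + a_n <= d."""
    return [e for e in product(range(d + 1), repeat=n) if sum(e) <= d]

def rm_encode(message, q, n, d):
    """Evaluate the polynomial with coefficients `message` at every point of F_q^n.

    Assumes q is prime, so integers modulo q form the field F_q.
    """
    exps = monomials(n, d)
    assert len(message) == len(exps) == comb(n + d, d)
    codeword = []
    for point in product(range(q), repeat=n):
        value = 0
        for coeff, e in zip(message, exps):
            term = coeff
            for z, a in zip(point, e):
                term = term * pow(z, a, q) % q
            value = (value + term) % q
        codeword.append(value)
    return codeword

q, n, d = 5, 2, 2            # satisfies d < q - 1
k = comb(n + d, d)           # message length
message = [1, 0, 2, 3, 0, 4]
codeword = rm_encode(message, q, n, d)

print("message length k  =", k)
print("codeword length   =", len(codeword), "(q^n =", q ** n, ")")
```

With these sample parameters, 6 message symbols become 25 codeword symbols; the redundancy grows quickly as $n$ increases for fixed $d$.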

3. LDC vs PIR (Private Information Retrieval)

