LDCP 02 - 奇偶校验码发展史

前言：

LDPC码是麻省理工学院Robert Gallager于1963年在博士论文中提出的一种具有稀疏校验矩阵的分组纠错码。

为什么要稀疏校验呢，这个需要从校验码历史发展开看。

这段是LDPC发明者原话：

I invented LDPC about 60 years ago.there are Nelson those cellphones
then data transmission was painfully slow expensive and error-prone.
my main computer at a computational power,halfway between a modern cell
phone and a coffee machine . error correction coding was fascinating theoretical
problem that studied by many mathematicians and engineers.
simple schemes were ineffective but complex schemes were too expensive for
1960 technology.
LDPC was an interesting midpoint cute and interesting to theoreticians but thirty five years before technological feasibility.
that's the way research is and the way it should be hard research problem take
years to solve and should not be overly dependent on details of current technological.
technological capabilities and research projects both proceed at their own pace
and applications ,resolve when both are ready .

thank you

上世纪60s年代发明了LDPC 码。那个时候数据传输一方面特别慢，另一方面容易出错。

在60年代,简单的调度没有什么用,复杂的调度又太昂贵. LDPC理论码是特别有意思的，但是直到35年后才真正应用起来。

Polar 思想是: 信息传输的时候不要出错

LDPC 思想：信息出错了不要紧,我把它纠正过来。

一问题起源

通常我们传输数据的时候，是肯定会出错的。发送的bits流

可能会在某一位0 变成1 ,1变成0，也可能被erase掉了,变成一个不确定的状态。

这里主要讨论erase状态，也就是BEC信道

often we need to transmit sensitive data, such as banking information where
every symbol in the message is vital . one mistake can corrupt, the entire message and
no matter how we communicate . we face this problem one kind of corruption occurs when a bit flips the transmitter sends a 1 but recived as 0 ,or a 0 which is received as a 1 .

these are called errors, another kind of corruption is an erasure which occurs when the bits are so distorted ，they're considered unknowns or blanks in the message for this video we will discuss this problem in terms of erasers ，because it simplifies the main ideas while some communication suffer from less corruption than others .

it's important to note that corruptions are bound to occur, no channels are truly noiseless . so the problem communication engineers will always face ,is how can we transmit our messages perfectly when we know some symbols will be lost in transmission

二 repetion code

最早使用repetion code方式

如场景1：

一个bits copy 一份，同时发送两份:。

接收方收到bit流时,逐个比特对比，当其中一个被erase掉了，就用另一个的值。

, 但是也会发生场景2的情况，同一个位置，同时被erase掉了。

也可以理解每个bit都需要一个bit 做奇偶校验。

如场景2：

如果两个copy里面同一个bit 都发生错了，上面依然会校验失败.

解决方法：

也可以重复发3或者4次，但是又带来另一个问题，码率降低了

$rate=\frac{k}{n*k}$

k:信息bit ,n 为重复次数。

这里面就有个矛盾：

码率和降低解码失败概率

realize we all face this problem during ordinary voice communication,when a word
is misheard in a conversation to recover from this corruption .we simply repeat ourselfs.

technically this is called a repetion code, it's simplest error correction code, for example if we have a binary message ,it would be encoded before transmission into two copies of itself . this might received, for example,if we have a binary message it would be encoded before transmission into two copies itself this might be received. for example with some erasers as follows,these erasers can easily be decoded by comparing the two copies bit by bit.

thus the message would be decoded correctly this compare operation is simple and thus can easily keep us with the speed of communication ,unfortunately simple repetion fails when the same bit is erased in both copies and because this quite common this strategy leaves us very vulnerable to failure ,also it doubles the message length no matter the number erasers we express this efficiency [ɪˈfɪʃnsi] .

issue in terms of code rate where code rate is the orignial divied by the total number of transmitted bits thus the code rate in this repetition scheme is 50% that means half of the bis are need to describle our original message and the other half are there
to protect the message if we use triple repetion [ˈtrɪpl] 三部分的; the code rate would drop to 1/3. as of the rate drops the protection increases but there are fewer message sent so a lower code rate is a more expensive design since bit transmit has some cost so there is a balance to strike between code rate and probality of failure we need a coding strategy with just enough protection bits to prevent failure for any channel whether we expect a high or lowe erasure any channel whether we expect a high or low erasure rate while also maintaining fast code and decode operations to solve this problem

三 parity-check 奇偶校验和二项分布

repetion code ,每个bit 只会保护一个bit.

效率比较低，后来到50年底出现了奇偶校验码

3.1 单奇偶校验

如上图,校验码为1个bit, $p=a_1+a_2+a_3+a_4$ （或者求出其中1的个数,如果为奇数 p=1）

最后组成一个5bit的code word 发送出去

很多外文把 $[a_1,a_2,a_3,a_4]$ 集合叫做一个set

例子：

编码过程：

要发送的信息为 1 000 11 ，加上一个奇偶校验位1（1的个数和为3,为奇数）

解码过程：

当接收方收到数据 100?11 1 时. 因为其 $r_1+r_2+r_3+r_4+r_5=0$ ,

可以直接解码出原码为 1000 11

$code rate =\frac{n}{n+1}$ ,是不是高很多。

问题：

当超过一个bit 被 erase ,就无法解码了

结合二项分布，p为0.5, 这种只能纠正10%的场景。（传输7个bit,只有一个出错，其它都正确的概率为10%）

# -*- coding: utf-8 -*-
"""
Created on Wed Apr 27 15:39:14 2022@author: chengxf2
"""
import numpy as np
from scipy.stats import binom
import matplotlib.pyplot as pltn = 6
x = np.arange(n)
p =0.5#计算每次观测的二项分布概率密度函数为: stats.binom.pmf(k,n,p) 其中n 表示测量次数，
# k 是目标结果的次数， p 为抽样概率。输出为k个元素的数组，数组的每个元素表示出现k次目标结果的概率。
y = binom.pmf(x,n,p)
plt.plot(x, y, 'o-', label='p=0.5')plt.legend()
plt.title( 'p' )
plt.show()
print('当p不同时，成功m次的能性的最大值都出现在均值处，对应概率为n*p')

3.2 多奇偶校验位

这里举例解码 $a_1$ ，当其被erase掉：

单奇偶校验简单的理解：

$[a_1,a_2,a_3,a_4,p]$ 只受到一个set的保护.

如果要解码成功这个，需要 $a_2,a_3,a_4,p$ 全部传输成功，

假设条件独立:

假设都是50%的概率erase,解码成的概率：

$p(a_2)p(a_3)p(a_4)p(P)=\frac{1}{2}^4=\frac{1}{16}$

多奇偶校验：

$[a_1,a_2,a_3,p_1]$

$[a_1,a_2,a_4,p_3]$

$[a_1,a_3,a_4,p_4]$

同样其解码成功的概率为

$p= 1- (\frac{7}{8})^3$ =0.669

其中

$\frac{7}{8}=1-\frac{1}{2}^3$ : 是任意的一个set中，其解码

失败的概率

可以看到增加其set的数量,会提高其解码成功率，

但也会带来码率的降低

need to be smarter about how we use the protection bits first realize with repetition codes each bit is only protecting one bit of information instead we could define a single protection bit that protects the entire message。

from a single erasure no matter where it occurs to to this we use an encoder which counts the number of ones in the message and then chooses a protection bit to ensure that the total message is even .

this way we call this protection bit a parity bit the sender includes this
parity at the end of the message now consider what happens when a single erasure during transmission.the receiving decoder can correct the erasure with a simple operation it counts the number of ones in the received message ，and so it know the erased bit must have been a 1.

notice that a single parity-check bit yields quite an improvement to the code rate compared repetition coding,however the chance of failure has increased compared to repetition with a single parity-check bit.
we could only correct a single erasure and if two of more erasers occur the code fails because there isn't enough information to reconstruct the bits to decrease the chances of failure we can divied up the message into distinct sets and include multiple parity check bits where now each parity check bit protects a different set of bits from a single erasure .we call each of these sets a parity check set and as we add more parity check bits we increase the number of erasure as we can potentially correct ,at the cost of gradually decreasing ,the code rate in general.

the number of parity should be at least equal to the number of erases if we want to have a chance of correcting them all
however we face the same problem with this approach [əˈproʊtʃ] 方法 the code fails if two or more erasers occur in any parity check set in that case the bits can't be recoverd.

四 Hamming 码

1950 年 Richard Hamming 找到了一种简单的方法纠错多个错误。当传输一个序列,

消息bit为4个,3个为校验码，可以纠正任意两个erasers.

这是一种重叠法解码思想（overlapping approach)

先解码集合A

解码出的bit 集合可能有部分跟集合B 重叠

去更新集合B重叠的部分

这个时候多个集合B中的子集就有可能变成集合A中的子集

每次解码时候，里面只能有一个bit是erase

如下图 $a_1,a_2$ 被erase掉后，可以基于 $p_2,p_3$ 校验子集分别纠正出 $a_1,a_2$

erase 状态。

问题：

重叠码思想后来有大量的数学家以及工程师在这个方向做了研究

但是把这个扩展到数以千计的bit,我们遇到新的问题，

这是因为对于较长的消息，大多数人认为我们可以使用更大的重叠子集，但是有许多大的重叠子集将奇偶校验位的解码过程从一个简单的操作变成了一个复杂的谜题，这是因为当子集非常大时，得到只有一个擦除的子集的机会非常小.

这防止了级联效应的开始或继续，相反，奇偶校验位必须使用更复杂的解码操作来解决，因为现在你要解决一个方程组，这个方程组随着问题数量的增加而变得越来越复杂，但这些更复杂。解码操作会将通信过程减慢到不再复杂的程度

实际使用我们需要的是一个具有非常快速解码操作的代码，所以我们剩下的最后一个问题是，我们如何扩展这种重叠子集方法，以便在非常长的消息上快速工作，从而能够处理我们遇到的任何擦除率，同时保持良好的代码速率。

是不是跟Polar 码一样，要提高解码成功率，用SCL，但是又会影响解码速度，带来一系列的问题。

1950 Richard Hamming looking at simple ways of correcting multiple errors when transmitting a sequence of bits。 his trick was to use overlapping parity check sets which made it possible to correct any two erasers as a 4-bit message and three parity check bits each parity check bit covers three message bits now no matter where to erasers occur the code will always be able to correct in this case.
we can't recover the first erasure right away since both both black and blue sets contain
two erasers .the second erasure can be recoverd first beacuse the read set has one erasure and this provides the information we need to recover the first bit this casecade
( effect where one correction allows us to solve another is very important insight ,
because instead of correcting all erasers in one step we can remove erasers in multiple passsesor iterations it's important to note that this kind of cascade effect only works if we have at least one set with only one erasure at each step ,otherwise it would fail .

this led to a decade of excellent research by many people built on this idea of overlapping parity sets much of this work was aimed at finding the right structure for the parity check set arrangement 。

but when we extended this overlapping approach to very long messages with many thousands of bits we ran into a problem。this because with longer messages most assumed that we could use larger overlapping subsets,but having many large overlapping subsets turns the process of decoding the parity check bits from a simple operation into a complex puzzle。

this is because when the subsets are very large the chance of getting a subset in which there is only one erasure is very small ,and this prevents that cascade effect from starting or continuing, instead the parity check bits must be solved using a more complex decoding operation because now you are faced with solving a system of equations which becomes more and more complex as the number of questions grows ,but these more complex。 decoding operations slow down the communication process to the point where it's no longer
practical to use what we need is a code with very fast decode operations ,so we are left with a final problem how can we extend this overlapping subset approach to work quickly over very long messages in a way that can handle any erasure rate we encounter while maintaining a good code Rate .

五 LDPC code

1960 Robert Gallagher 发明了一种新的方法解决通讯编码问题（码率和重叠码冲突问题)

在数据存储以及无线网络中广泛使用。

算法的核心是使用小的奇偶校验集,小的奇偶校验集相对大的奇偶校验集里面的

只有一个erase概率更高（可以通过二项分布计算），可以充分利用硬件的并行计算能力，

同时解决多个简单的问题，而不是一个复杂的问题。

一个消息bit 关联的奇偶校验位越多,密度越高.后面也用Tanner图来表示 ,他的研究受到 Peter Elias的启发。

LDPC 的模型，编码，解码后面会系统的介绍一下。

in 1960 a doctoral thesis by Robert Gallagher discovered a new approach to the communication coding problem,which is in wide use today particularyly in cellular networks and data storage applications.
Gallagher combined to key in the first was to use smaller parity-check sets instead of larger ones this improves the chances that many parity-check sets will contain only a single erasure and enables the casecade effct to happen this was key to keeping the decoding speed fast over long message because it's faster for a computer to solve many simple problems instead of a single complex problem. which solves all of them at once.
Gallaher realized that we need an amount of overlap that is large
enough to protect each bit well, but not so large that the correction problem
becomes very complex he measures the amount of overlap by something he calls
density that is how densely the message bits are connected to parity
check bits with larger subsets we have a high density that is each message bit is connected to many parity check bits 。

with smaller subsets we have a lower density that is
each message bit is connected to fewer parity check bits and this is where the name
low-density parity-check codes comes from,。

but aside from the size of the sets there is also the problem of how to structure the connection between parity and message bits and that leads to his secode insight which was inspired by the work
of Peter Elias， who had recently shown that looking for the right structure
for large subsets was relatively unimportant instead a random subset arrangement
this simplified the problem compared to trying to design an idea structure for
the subsets ，and remarkably it did very well and that's key idea behind low-density
parity-check codes use small overlapping randomly assigned parity check sets this
allows us to solve eraser in a rapid casecade of simple operation no matter
how long the meessage or what the erasure rate is while maintaining a fast decoding
speed ,but there is one last insight to understand when it came to the actual
construction of these codes recall that Hammonds construction foucused on
protecting each message bit with multiple parity check in order to better protect
those bits in case of multiple erasers the problem is this dosent's proivde
the same protection to parity bits which can also be lost during transmission
notice that each parity check bit is itself only connected to one parity check
set since any bit that is connected to only one parity check set is vulnerable,
ˈvʌlnərəbl] 脆弱的，this leaves all parity check bits vulnerable.
this can make correction impossible,for example this would happen when one message
bit is erased and two parity bits it's connected to is also erased,
in this case we are stuck。

because the casecade can't happen and over very long messages
with many small parity check sets this kind of failure would become more and
more likely to happen, the solution to this dilemma ,is to instead
view things in a more general way and protect the parity bits and the
message bits equally,that is we make sure all bits message and parity
are contained in multiple check sets and now we need to understand the basics of
low-density parity-check codes or LDPC codes to see this in action , lets's do a
very simple example now say our message is 101,recall that we chose our
subsets randomly,to generate the code ,encoding is now done by finding a
collection of parity bits so all the check sets have an even number of ones this
involves solving a simple set of linear equations ,
now imagine our message is received with 3 erasers as follows to decode

this message we begin with the check sets that have a single erasure
first such as this one and we repeat this process for every check bit
until there are no more erasers left that's one way iterative can work using
low-density parity-check codes it's simple fast reliable,and can be extended
to messages of any length ,today many communication devices use LDPC codes
and versions of LDPC codes are being proposed for fifth generation wireless
system known as 5g but important to note that when these coats came out they
did not have a big impact at first and were almost forgotten for many years

LDCP 02 - 奇偶校验码发展史相关推荐

2020软件测试自学全套教程-基于python自动化软件测试-2020新版软件测试中级程序员学习路线
不知不觉间,在软件测试行业野蛮的折腾了七年之久.七年之痒也即将过去,但我还是热爱着软件测试这一份工作,一路坚持,走到现在.经历过各种难题,有过迷茫,有过焦虑失眠.也踩过无数的坑,深知行业的不易.自从9 ...
2022软件测试自学全套教程-基于python自动化软件测试-2022新版软件测试中级程序员学习路线
不知不觉间,在软件测试行业野蛮的折腾了七年之久.七年之痒也即将过去,但我还是热爱着软件测试这一份工作,一路坚持,走到现在.经历过各种难题,有过迷茫,有过焦虑失眠.也踩过无数的坑,深知行业的不易.自从9 ...
一篇有点长的HTTP协议详解
一.HTTP简介 01.什么是HTTP HTTP(HyperText Transfer Protocol ,超文本传输协议),是一个基于请求与响应的,无状态的,应用层的协议,常基于TCP/IP协议传输 ...
艾永亮：一瓶酱油市值超过中石化！海天味业到底有多牛？
他成功地把"一瓶酱油",变成了一款超级产品. 01 酱油界的茅台 2020年9月3号,海天味业股价到达203元高位,市值成功突破6500亿,远远超过4800亿的中国石化,逼近780 ...
【Linux】一步一步学Linux——Unix发展史(02)
目录 00. 目录 01. 请参考Unix传奇 02. 03. 00. 目录 @ 本博客后面会更新 01. 请参考Unix传奇链接: https://coolshell.cn/articles/23 ...
数仓建模—数仓架构发展史(02)
发展史时代的变迁,生死的轮回,历史长河滔滔,没有什么是永恒的,只有变化才是不变的,技术亦是如此,当你选择互联网的那一刻,你就相当于乘坐了一个滚滚向前的时代列车,开往未知的方向,不论什么样的技术架构只 ...
02，计算机组成与操作系统的理解、对程序的理解、OS发展史、Linux常见版本以及一些杂项...
这里马哥讲了好多,能听懂,但需要记得东西好少,不过我还是按照我对计算机的理解画了一张图, 不一定准确 , 但是差不多把, 计算机底层是由运算器.控制器.存储器.输入设备 ...
电脑的发展史_新时代新征程，新华电脑教育母公司中国东方教育成功港股上市！...
中国最大的职业技能教育提供商中国东方教育控股有限公司(股票代码:http://667.HK),6月12日上午9点30分在港交所成功上市,成为港股教育板块中最大的职业技能培训企业. 作为中国东方教育旗下 ...
必知！4张图看尽AI发展史重大里程碑
导读:本文主要是人工智能的基本概述,包括人工智能的起源和发展,以及基于机器学习和深度学习的下一代人工智能. 4张配图(图1至图4)是本文重点,是了解人工智能必备的基础知识. 作者:王健宗瞿晓阳来源 ...

LDCP 02 - 奇偶校验码发展史

LDCP 02 - 奇偶校验码发展史相关推荐

最新文章

热门文章