Lab 3: Implementing and Analyzing the LZW Encoding and Decoding Algorithm

LZW Overview

This part draws on Wikipedia: https://en.wikipedia.org/wiki/Lempel%E2%80%93Ziv%E2%80%93Welch

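LZW is named after Abraham Lempel, Jacob Ziv, and Terry Welch. Welch published it in 1984 as an improved implementation of the LZ78 algorithm that Lempel and Ziv had published in 1978. It was the first universal data compression method to come into wide use on computers.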

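LZW is an adaptive dictionary coder: it analyzes the strings in the input, builds a dictionary that maps strings to codes as it goes, and encodes and decodes against that dictionary. It differs from a static dictionary coder in one respect. A static dictionary coder makes two passes over the file, the first to build the dictionary and the second to encode, and it has to transmit the dictionary along with the data. An adaptive dictionary coder makes a single pass, building the dictionary while encoding, and never transmits the dictionary.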

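LZW Encoding and Decoding: A Worked Example

A worked example makes the LZW encoding and decoding principle easier to follow.

LZW Encoding

The basic steps of LZW encoding are:

1. Initially the dictionary holds only the default entries, for example 0->a, 1->b, 2->c. Both P and C are empty.
2. Read the next character C and append it to P to form the string P+C.
3. Look up P+C in the dictionary:
   - If P+C is in the dictionary, set P = P+C.
   - If P+C is not in the dictionary, output the code of P, add a new code for P+C to the dictionary, and set P = C.
4. Go back to step 2 and repeat until every character of the input has been read, then output the code of the final P.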

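The default dictionary is usually the set of single characters, each mapped to its ASCII code.

Suppose we LZW-encode this string:

ababbcbacb

Assume the initial dictionary is: 'a'->0; 'b'->1; 'c'->2

The process runs as follows: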

Step	P	C	P+C	P+C in dict?	Action	Output
1	-	a	a	Y	next P = 'a'	-
2	a	b	ab	N	next P = 'b'; 3 <- 'ab'	0
3	b	a	ba	N	next P = 'a'; 4 <- 'ba'	1
4	a	b	ab	Y	next P = 'ab'	-
5	ab	b	abb	N	next P = 'b'; 5 <- 'abb'	3
6	b	c	bc	N	next P = 'c'; 6 <- 'bc'	1
7	c	b	cb	N	next P = 'b'; 7 <- 'cb'	2
8	b	a	ba	Y	next P = 'ba'	-
9	ba	c	bac	N	next P = 'c'; 8 <- 'bac'	4
10	c	b	cb	Y	next P = 'cb'	-
11	cb	-	-	-	-	7

The string is therefore encoded as the code sequence 0 1 3 1 2 4 7.
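The encoding steps can be sketched in C on the same three-letter alphabet. This is a minimal illustration, not the lab's implementation: it stores each dictionary entry as a plain string and searches linearly, and the names `toy_lzw_encode`, `toy_find`, and `toy_dict` are hypothetical. It assumes the input contains only the characters a, b, and c.

```c
#include <assert.h>
#include <string.h>

#define TOY_MAX_ENTRIES 64
#define TOY_MAX_LEN     32

/* Toy dictionary (hypothetical): entry i holds the string whose code is i. */
static char toy_dict[TOY_MAX_ENTRIES][TOY_MAX_LEN];
static int  toy_count;

/* Return the code of s, or -1 if s is not in the dictionary. */
static int toy_find(const char *s)
{
    for (int i = 0; i < toy_count; i++)
        if (strcmp(toy_dict[i], s) == 0)
            return i;
    return -1;
}

/* Encode text over the alphabet {a, b, c}; returns the number of codes written. */
int toy_lzw_encode(const char *text, int *codes, int max_codes)
{
    char p[TOY_MAX_LEN] = "";   /* current prefix P */
    char pc[TOY_MAX_LEN];       /* candidate string P+C */
    int n = 0;

    /* Step 1: default entries 0->a, 1->b, 2->c. */
    toy_count = 0;
    strcpy(toy_dict[toy_count++], "a");
    strcpy(toy_dict[toy_count++], "b");
    strcpy(toy_dict[toy_count++], "c");

    for (const char *c = text; *c != '\0'; c++) {
        size_t len = strlen(p);
        assert(len + 1 < TOY_MAX_LEN);
        memcpy(pc, p, len);          /* Step 2: form P+C */
        pc[len] = *c;
        pc[len + 1] = '\0';

        if (toy_find(pc) >= 0) {
            strcpy(p, pc);           /* Step 3, found: P = P+C */
        } else {
            assert(n < max_codes && toy_count < TOY_MAX_ENTRIES);
            codes[n++] = toy_find(p);            /* output the code of P */
            strcpy(toy_dict[toy_count++], pc);   /* add P+C to the dictionary */
            p[0] = *c;                           /* P = C */
            p[1] = '\0';
        }
    }
    if (p[0] != '\0') {
        assert(n < max_codes);
        codes[n++] = toy_find(p);    /* flush the final prefix */
    }
    return n;
}
```

Calling `toy_lzw_encode("ababbcbacb", codes, 16)` fills `codes` with the seven values from the table. A linear search over strings is fine at this size but would be far too slow for a 65536-entry dictionary.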

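LZW Decoding

Now decode the encoded data again. The decoding algorithm is somewhat more involved than encoding.

1. Initially the dictionary holds only the default entries, for example 0->a, 1->b, 2->c. Both pW and cW are empty.
2. Read the first code cW and output its decoding. The first cW can always be decoded directly, and it is always a single character.
3. Set pW = cW.
4. Read the next code cW.
5. Look up cW in the dictionary:
   a. If cW is in the dictionary:
      (1) Decode cW, that is, output Str(cW).
      (2) Let P = Str(pW) and C = the **first character** of Str(cW).
      (3) Add a new code for P+C to the dictionary.
   b. If cW is not in the dictionary:
      (1) Let P = Str(pW) and C = the **first character** of Str(pW).
      (2) Add a new code for P+C to the dictionary; this new code is necessarily cW.
      (3) Output P+C.
6. Go back to step 3 and repeat until all codes have been read.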

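Step 5 is the most important and the hardest to grasp; an example helps.

Decode the following code sequence:

0 1 3 1 2 4 7

Assume the same default dictionary as above. The decoding process is shown below; the first code is necessarily in the default dictionary and decodes without difficulty.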

Step	pW	cW	cW in dict?	Action	Output
1	-	0	-	-	a
2	0	1	Y	P = 'a'; C = 'b'; P+C = 'ab'; 3 <- 'ab'	b
3	1	3	Y	P = 'b'; C = 'a'; P+C = 'ba'; 4 <- 'ba'	ab
4	3	1	Y	P = 'ab'; C = 'b'; P+C = 'abb'; 5 <- 'abb'	b
5	1	2	Y	P = 'b'; C = 'c'; P+C = 'bc'; 6 <- 'bc'	c
6	2	4	Y	P = 'c'; C = 'b'; P+C = 'cb'; 7 <- 'cb'	ba
7	4	7	Y	P = 'ba'; C = 'c'; P+C = 'bac'; 8 <- 'bac'	cb

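The trace above never hits the case where cW is not in the dictionary. That case differs only in that C is the first character of Str(pW) and the output is P+C.

Decoding yields: ababbcbacb.

The decoder recovers the original data, and the dictionary it builds along the way is identical to the one the encoder built.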
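The decoding steps, including the branch where cW is not yet in the dictionary, can be sketched in C as follows. This is an illustration under stated assumptions rather than the lab's code: entries are stored as plain strings, the alphabet is fixed to a, b, c, and the names `toy_lzw_decode`, `toyd_dict`, and `toyd_count` are hypothetical.

```c
#include <assert.h>
#include <string.h>

#define TOYD_MAX_ENTRIES 64
#define TOYD_MAX_LEN     32

/* Toy dictionary (hypothetical): entry i holds Str(i). */
static char toyd_dict[TOYD_MAX_ENTRIES][TOYD_MAX_LEN];
static int  toyd_count;

/* Decode n codes into out (NUL-terminated).
 * Returns the output length, or -1 on a malformed stream or lack of space. */
int toy_lzw_decode(const int *codes, int n, char *out, size_t out_cap)
{
    char entry[TOYD_MAX_LEN];   /* string decoded in this iteration */
    size_t used = 0;
    int pw = -1;                /* previous code pW; -1 before the first code */

    assert(out != NULL && out_cap > 0);
    toyd_count = 0;
    strcpy(toyd_dict[toyd_count++], "a");
    strcpy(toyd_dict[toyd_count++], "b");
    strcpy(toyd_dict[toyd_count++], "c");
    out[0] = '\0';

    for (int i = 0; i < n; i++) {
        int cw = codes[i];
        if (cw >= 0 && cw < toyd_count) {
            strcpy(entry, toyd_dict[cw]);        /* cW is in the dictionary */
        } else if (cw == toyd_count && pw >= 0) {
            /* cW not in the dictionary: Str(pW) + first character of Str(pW) */
            size_t len = strlen(toyd_dict[pw]);
            if (len + 1 >= TOYD_MAX_LEN) return -1;
            memcpy(entry, toyd_dict[pw], len);
            entry[len] = toyd_dict[pw][0];
            entry[len + 1] = '\0';
        } else {
            return -1;                           /* code cannot be valid here */
        }

        size_t elen = strlen(entry);
        if (used + elen + 1 > out_cap) return -1;
        memcpy(out + used, entry, elen + 1);     /* output the decoded string */
        used += elen;

        /* Add Str(pW) + first character of the decoded string. */
        if (pw >= 0 && toyd_count < TOYD_MAX_ENTRIES) {
            size_t plen = strlen(toyd_dict[pw]);
            if (plen + 1 >= TOYD_MAX_LEN) return -1;
            memcpy(toyd_dict[toyd_count], toyd_dict[pw], plen);
            toyd_dict[toyd_count][plen] = entry[0];
            toyd_dict[toyd_count][plen + 1] = '\0';
            toyd_count++;
        }
        pw = cw;
    }
    return (int)used;
}
```

Decoding `{0, 1, 3, 1, 2, 4, 7}` reproduces `ababbcbacb`. The sequence `{0, 3, 0}`, which is what the encoding steps produce for `aaaa`, exercises the other branch: code 3 arrives before the decoder has created it.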

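Data Structure for LZW Encoding and Decoding

Continuing with the same example.

A little analysis shows that the prefix of every dictionary entry added during encoding or decoding is itself already in the dictionary.

These entries can therefore be written as:

3: 'ab' = 0b
4: 'ba' = 1a
5: 'abb' = 3b
and so on.

It follows that a trie is a natural fit, so this lab uses a trie as the data structure for the C implementation of LZW.

The dictionary trie (lookup tree) can be built as follows:

Each dictionary entry, that is, each node, can be defined as: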

struct {
    int suffix;
    int parent;      // default = -1
    int firstchild;  // default = -1
    int nextsibling; // default = -1
} dictionary;

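suffix: the last character of the string this entry represents, i.e. the character appended when the node was created as a child.
parent: index of this node's parent, or -1 if it has none.
firstchild: index of this node's first child (the leftmost child in the figure), or -1 if it has none.
nextsibling: index of the next node under the same parent, or -1 if there is none. For 'ba' the nextsibling is 'bc', i.e. 6.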
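To make the four fields concrete, here is a small sketch that builds the trie for the worked example (roots 0->a, 1->b, 2->c). It mirrors the field names described above, but the type `trie_node` and the helpers `trie_init`, `trie_find_child`, and `trie_add_child` are hypothetical names chosen for this illustration.

```c
#include <assert.h>

#define TRIE_MAX 64

struct trie_node {
    int suffix;       /* last character of the phrase */
    int parent;       /* index of the prefix phrase, -1 for a root */
    int firstchild;   /* index of the first child, -1 if none */
    int nextsibling;  /* next child of the same parent, -1 if none */
};

static struct trie_node trie[TRIE_MAX];
static int trie_next;   /* index of the next entry to be written */

/* Create the three single-character roots 0->a, 1->b, 2->c. */
void trie_init(void)
{
    for (int i = 0; i < 3; i++) {
        trie[i].suffix = 'a' + i;
        trie[i].parent = -1;
        trie[i].firstchild = -1;
        trie[i].nextsibling = (i < 2) ? i + 1 : -1;
    }
    trie_next = 3;
}

/* Return the child of `parent` whose suffix is `ch`, or -1 if there is none. */
int trie_find_child(int parent, int ch)
{
    for (int s = trie[parent].firstchild; s != -1; s = trie[s].nextsibling)
        if (trie[s].suffix == ch)
            return s;
    return -1;
}

/* Append the phrase (parent + ch) as a new node and return its index. */
int trie_add_child(int parent, int ch)
{
    assert(trie_next < TRIE_MAX);
    int node = trie_next++;
    trie[node] = (struct trie_node){ ch, parent, -1, -1 };

    if (trie[parent].firstchild == -1) {
        trie[parent].firstchild = node;          /* first child of this parent */
    } else {
        int s = trie[parent].firstchild;         /* walk to the last sibling */
        while (trie[s].nextsibling != -1)
            s = trie[s].nextsibling;
        trie[s].nextsibling = node;
    }
    return node;
}
```

Adding the phrases in the order the encoder creates them ('ab', 'ba', 'abb', 'bc', 'cb', 'bac') gives node 4 ('ba') a nextsibling of 6 ('bc'), matching the description of the nextsibling field.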

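Carried over to the code, the encoding/decoding procedure can be summarized as:

Build the initial default dictionary containing all single-byte values (0-255); int next_code marks the index of the next dictionary entry to be written.
Read the data into memory.
Read characters one at a time and encode them as described above, updating the dictionary fields along the way.
Write the encoded output to a file.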

LZW Code Walkthrough

The code provided for the lab is as follows:

bitio.h

/*
 * Declaration for bitwise IO
 *
 * vim: ts=4 sw=4 cindent
 */
#ifndef __BITIO__
#define __BITIO__

#include <stdio.h>

typedef struct {
    FILE *fp;
    unsigned char mask;
    int rack;
} BITFILE;

BITFILE *OpenBitFileInput( char *filename);
BITFILE *OpenBitFileOutput( char *filename);
void CloseBitFileInput( BITFILE *bf);
void CloseBitFileOutput( BITFILE *bf);
int BitInput( BITFILE *bf);
unsigned long BitsInput( BITFILE *bf, int count);
void BitOutput( BITFILE *bf, int bit);
void BitsOutput( BITFILE *bf, unsigned long code, int count);

#endif  // __BITIO__

bitio.c

/*
 * Definitions for bitwise IO
 *
 * vim: ts=4 sw=4 cindent
 */
#include <stdlib.h>
#include <stdio.h>
#include "bitio.h"

BITFILE *OpenBitFileInput( char *filename){
    BITFILE *bf;
    bf = (BITFILE *)malloc( sizeof(BITFILE));
    if( NULL == bf) return NULL;
    if( NULL == filename) bf->fp = stdin;
    else bf->fp = fopen( filename, "rb");
    if( NULL == bf->fp){
        free( bf); // do not leak the handle when the file cannot be opened
        return NULL;
    }
    bf->mask = 0x80;
    bf->rack = 0;
    return bf;
}

BITFILE *OpenBitFileOutput( char *filename){
    BITFILE *bf;
    bf = (BITFILE *)malloc( sizeof(BITFILE));
    if( NULL == bf) return NULL;
    if( NULL == filename) bf->fp = stdout;
    else bf->fp = fopen( filename, "wb");
    if( NULL == bf->fp){
        free( bf); // do not leak the handle when the file cannot be opened
        return NULL;
    }
    bf->mask = 0x80;
    bf->rack = 0;
    return bf;
}

void CloseBitFileInput( BITFILE *bf){
    fclose( bf->fp);
    free( bf);
}

void CloseBitFileOutput( BITFILE *bf){
    // Output the remaining bits
    if( 0x80 != bf->mask) fputc( bf->rack, bf->fp);
    fclose( bf->fp);
    free( bf);
}

int BitInput( BITFILE *bf){
    int value;
    if( 0x80 == bf->mask){
        bf->rack = fgetc( bf->fp);
        if( EOF == bf->rack){
            fprintf(stderr, "Read after the end of file reached\n");
            exit( -1);
        }
    }
    value = bf->mask & bf->rack;
    bf->mask >>= 1;
    if( 0==bf->mask) bf->mask = 0x80;
    return( (0==value)?0:1);
}

unsigned long BitsInput( BITFILE *bf, int count){
    unsigned long mask;
    unsigned long value;
    mask = 1L << (count-1);
    value = 0L;
    while( 0!=mask){
        if( 1 == BitInput( bf))
            value |= mask;
        mask >>= 1;
    }
    return value;
}

void BitOutput( BITFILE *bf, int bit){
    if( 0 != bit) bf->rack |= bf->mask;
    bf->mask >>= 1;
    if( 0 == bf->mask){ // eight bits in rack
        fputc( bf->rack, bf->fp);
        bf->rack = 0;
        bf->mask = 0x80;
    }
}

void BitsOutput( BITFILE *bf, unsigned long code, int count){
    unsigned long mask;
    mask = 1L << (count-1);
    while( 0 != mask){
        BitOutput( bf, (int)(0==(code&mask)?0:1));
        mask >>= 1;
    }
}
#if 0
// Disabled test driver: copies the input to the output bit by bit.
int main( int argc, char **argv){
    BITFILE *bfi, *bfo;
    int bit;
    int count = 0;

    if( 1<argc){
        if( NULL==(bfi = OpenBitFileInput( argv[1]))){
            fprintf( stderr, "fail open the file\n");
            return -1;
        }
    }else{
        if( NULL==(bfi = OpenBitFileInput( NULL))){
            fprintf( stderr, "fail open stdin\n");
            return -2;
        }
    }
    if( 2<argc){
        if( NULL==(bfo = OpenBitFileOutput( argv[2]))){
            fprintf( stderr, "fail open file for output\n");
            return -3;
        }
    }else{
        if( NULL==(bfo = OpenBitFileOutput( NULL))){
            fprintf( stderr, "fail open stdout\n");
            return -4;
        }
    }
    while( 1){
        bit = BitInput( bfi);
        fprintf( stderr, "%d", bit);
        count ++;
        if( 0==(count&7)) fprintf( stderr, " ");
        BitOutput( bfo, bit);
    }
    return 0;
}
#endif

lzw_E.c

/*
 * Definition for LZW coding
 *
 * vim: ts=4 sw=4 cindent nowrap
 */
#include <stdlib.h>
#include <stdio.h>
#include "bitio.h"

#define MAX_CODE 65535

struct {
    int suffix;
    int parent, firstchild, nextsibling;
} dictionary[MAX_CODE+1];
int next_code;
int d_stack[MAX_CODE]; // stack for decoding a phrase

#define input(f) ((int)BitsInput( f, 16))
#define output(f, x) BitsOutput( f, (unsigned long)(x), 16)

int DecodeString( int start, int code);
void InitDictionary( void);

void PrintDictionary( void){
    int n;
    int count;
    for( n=256; n<next_code; n++){
        count = DecodeString( 0, n);
        printf( "%4d->", n);
        while( 0<count--) printf("%c", (char)(d_stack[count]));
        printf( "\n");
    }
}

int DecodeString( int start, int code){
    int count;
    count = start;
    while( 0<=code){
        d_stack[ count] = dictionary[code].suffix;
        code = dictionary[code].parent;
        count ++;
    }
    return count;
}

void InitDictionary( void){
    int i;
    for( i=0; i<256; i++){
        dictionary[i].suffix = i;
        dictionary[i].parent = -1;
        dictionary[i].firstchild = -1;
        dictionary[i].nextsibling = i+1;
    }
    dictionary[255].nextsibling = -1;
    next_code = 256; // index of the next dictionary entry
}

/*
 * Input: string represented by string_code in dictionary,
 * Output: the index of character+string in the dictionary
 *      index = -1 if not found
 */
int InDictionary( int character, int string_code){
    int sibling;
    if( 0>string_code) return character; // first character: there is no parent yet
    sibling = dictionary[string_code].firstchild; // stringcode + character
    while( -1<sibling){
        if( character == dictionary[sibling].suffix) return sibling;
        sibling = dictionary[sibling].nextsibling;
    }
    return -1;
}

void AddToDictionary( int character, int string_code){
    int firstsibling, nextsibling;
    if( 0>string_code) return;
    dictionary[next_code].suffix = character;
    dictionary[next_code].parent = string_code;
    dictionary[next_code].nextsibling = -1;
    dictionary[next_code].firstchild = -1;
    firstsibling = dictionary[string_code].firstchild;
    if( -1<firstsibling){ // the parent has child
        nextsibling = firstsibling;
        while( -1<dictionary[nextsibling].nextsibling )
            nextsibling = dictionary[nextsibling].nextsibling;
        dictionary[nextsibling].nextsibling = next_code;
    }else{ // no child before, modify it to be the first
        dictionary[string_code].firstchild = next_code;
    }
    next_code ++;
}

void LZWEncode( FILE *fp, BITFILE *bf){
    int character;
    int string_code;
    int index;
    unsigned long file_length;

    fseek( fp, 0, SEEK_END);
    file_length = ftell( fp);
    fseek( fp, 0, SEEK_SET);
    BitsOutput( bf, file_length, 4*8);
    InitDictionary();
    string_code = -1;
    while( EOF!=(character=fgetc( fp))){
        index = InDictionary( character, string_code);
        if( 0<=index){ // string+character in dictionary
            string_code = index;
        }else{ // string+character not in dictionary
            output( bf, string_code);
            if( MAX_CODE > next_code){ // free space in dictionary
                // add string+character to dictionary
                AddToDictionary( character, string_code);
            }
            string_code = character; // P = C
        }
    }
    output( bf, string_code); // flush the final P, which has no following character
}

void LZWDecode( BITFILE *bf, FILE *fp){
    int character;
    int new_code, last_code;
    int phrase_length;
    unsigned long file_length;

    InitDictionary();
    file_length = BitsInput( bf, 4*8); // length of the original file, written by the encoder
    if( -1 == file_length) file_length = 0;
    /* to be filled in */
    // No code precedes the first one, so last_code starts at -1. The first code is
    // always in the default (ASCII) dictionary, so character holds a valid value by
    // the time it is used and needs no initial value.
    last_code = -1;
    while( 0<file_length){
        new_code = input( bf);
        if( new_code >= next_code){ // this is the case CSCSC( not in dict)
            d_stack[0] = character; // d_stack[0] holds the first character of the current last_code
            phrase_length = DecodeString( 1, last_code); // keep d_stack[0] reserved and decode pW
        }else{ // in dict
            phrase_length = DecodeString( 0, new_code); // decode cW
        }
        character = d_stack[phrase_length-1]; // first character of pW/cW
        while( 0<phrase_length){
            phrase_length --;
            fputc( d_stack[ phrase_length], fp); // write the stack from the top down
            file_length--;
        }
        if( MAX_CODE>next_code){ // add the new phrase to dictionary
            AddToDictionary( character, last_code);
        }
        last_code = new_code;
    }
}

int main( int argc, char **argv){
    FILE *fp;
    BITFILE *bf;

    if( 4>argc){
        fprintf( stdout, "usage: \n%s <o> <ifile> <ofile>\n", argv[0]);
        fprintf( stdout, "\t<o>: E or D refers encode or decode\n");
        fprintf( stdout, "\t<ifile>: input file name\n");
        fprintf( stdout, "\t<ofile>: output file name\n");
        return -1;
    }
    if( 'E' == argv[1][0]){ // do encoding
        fp = fopen( argv[2], "rb");
        bf = OpenBitFileOutput( argv[3]);
        if( NULL!=fp && NULL!=bf){
            LZWEncode( fp, bf);
            fclose( fp);
            CloseBitFileOutput( bf);
            fprintf( stdout, "encoding done\n");
        }
    }else if( 'D' == argv[1][0]){ // do decoding
        bf = OpenBitFileInput( argv[2]);
        fp = fopen( argv[3], "wb");
        if( NULL!=fp && NULL!=bf){
            LZWDecode( bf, fp);
            fclose( fp);
            CloseBitFileInput( bf);
            fprintf( stdout, "decoding done\n");
        }
    }else{ // otherwise
        fprintf( stderr, "not supported operation\n");
    }
    system("pause");
    return 0;
}

The code that deserves close analysis is lzw_E.c.

Structure Definition and Initialization

struct {
    int suffix;
    int parent, firstchild, nextsibling;
} dictionary[MAX_CODE+1];
int next_code;
int d_stack[MAX_CODE]; // stack for decoding a phrase

void InitDictionary( void){
    int i;
    for( i=0; i<256; i++){
        dictionary[i].suffix = i;
        dictionary[i].parent = -1;
        dictionary[i].firstchild = -1;
        dictionary[i].nextsibling = i+1;
    }
    dictionary[255].nextsibling = -1;
    next_code = 256; // index of the next dictionary entry
}

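This defines the dictionary structure and reserves enough memory for the new entries.

Initializing the structure means writing the default dictionary of single-byte values. Note that at this point every entry is a first-level node with neither a parent nor children, so parent and firstchild are both -1; entry 255 has no next sibling, so its nextsibling is -1.

d_stack[] is used during decoding; it holds the string decoded in one loop iteration.

The global variable next_code marks the index of the next entry to be written to the dictionary.

Encoding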

void LZWEncode( FILE *fp, BITFILE *bf){
    int character;
    int string_code;
    int index;
    unsigned long file_length;

    fseek( fp, 0, SEEK_END);
    file_length = ftell( fp);
    fseek( fp, 0, SEEK_SET);
    BitsOutput( bf, file_length, 4*8);
    InitDictionary();
    string_code = -1;
    while( EOF!=(character=fgetc( fp))){
        index = InDictionary( character, string_code);
        if( 0<=index){ // string+character in dictionary
            string_code = index;
        }else{ // string+character not in dictionary
            output( bf, string_code);
            if( MAX_CODE > next_code){ // free space in dictionary
                // add string+character to dictionary
                AddToDictionary( character, string_code);
            }
            string_code = character; // P = C
        }
    }
    output( bf, string_code); // flush the final P, which has no following character
}

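Encoding involves three functions: the dictionary lookup, the dictionary insertion, and the encoder itself. We go through them one by one.

Start with the main routine, LZWEncode:

In LZWEncode, character holds the newly read byte C, and index receives the dictionary index of P+C (the index if the entry exists, -1 otherwise). string_code holds the dictionary index of the current prefix P, which is the code that is eventually output; bf is the bit file the codes are written to.

Note that LZWEncode never builds the string P+C explicitly, since that would be awkward to work with. P is represented by its dictionary index (string_code) and C by its byte value, which is also its index in the default dictionary. InDictionary searches the children of P for one whose suffix is C and, if P+C is in the dictionary, returns its index. In that case P becomes P+C (string_code = index) and nothing is output.

The overall flow is:

First read: nothing is output. InDictionary returns character itself, so next P = C (string_code = character), and the next character is read.

Read the next character C and, together with the P from before, use the index returned by InDictionary to decide whether P+C is in the dictionary.

In the dictionary: index is the index of P+C. It is assigned to string_code (next P = P+C), nothing is output, and the next character is read.
Not in the dictionary: index is -1. First output( bf, string_code) writes the index of P. Then AddToDictionary writes P+C to the dictionary, provided there is room. Finally next P = C (string_code = character), and the next character is read.

After the last read there is no C left, only P, and P is guaranteed to be in the dictionary. Its index is output and encoding ends.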

/*
 * Input: string represented by string_code in dictionary,
 * Output: the index of character+string in the dictionary
 *       index = -1 if not found
 */
int InDictionary( int character, int string_code){
    int sibling;
    if( 0>string_code) return character; // first character: there is no parent yet
    sibling = dictionary[string_code].firstchild; // stringcode + character
    while( -1<sibling){
        if( character == dictionary[sibling].suffix) return sibling;
        sibling = dictionary[sibling].nextsibling;
    }
    return -1;
}

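The dictionary lookup function InDictionary takes the newly read byte C and the index of P.

Its return value is assigned to index in LZWEncode: -1 means P+C is not in the dictionary; any other value is the dictionary index of P+C.

The lookup relies on the firstchild and nextsibling links. The 0>string_code check handles the first read: string_code starts at -1 and can never be negative afterwards. On the first read C is necessarily in the default dictionary, so the function returns the index of C directly, which becomes index, and the next character is read.

Thanks to the trie structure, checking whether P+C is in the dictionary reduces to checking whether P has children and then whether P+C is among them.

In the general case, first check whether P has any children. If it has none, firstchild is -1, P+C cannot be in the dictionary, and the function returns -1.

Otherwise, walk P's children and compare each suffix with C. If none matches, P+C is not in the dictionary and the function returns -1.

If one matches, i.e. character == dictionary[sibling].suffix, return the current sibling value, which is the index of that child and therefore the dictionary index of P+C.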

void AddToDictionary( int character, int string_code){
    int firstsibling, nextsibling;
    if( 0>string_code) return;
    dictionary[next_code].suffix = character;
    dictionary[next_code].parent = string_code;
    dictionary[next_code].nextsibling = -1;
    dictionary[next_code].firstchild = -1;
    firstsibling = dictionary[string_code].firstchild;
    if( -1<firstsibling){ // the parent has child
        nextsibling = firstsibling;
        while( -1<dictionary[nextsibling].nextsibling )
            nextsibling = dictionary[nextsibling].nextsibling;
        dictionary[nextsibling].nextsibling = next_code;
    }else{ // no child before, modify it to be the first
        dictionary[string_code].firstchild = next_code;
    }
    next_code ++;
}

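The dictionary insertion function AddToDictionary takes C and P. Writing an entry means setting each field of the dictionary node; the new node's index is the current value of the global variable next_code.

The insertion steps are:

Initialize the new node: suffix is the newly read C and parent is P. A freshly written entry has no children and no next sibling, so firstchild and nextsibling are -1.
Update the parent node and its children. First check whether the parent already has children:
If not, set the parent's firstchild to the new node's index, next_code.
If so, walk the chain with nextsibling, which takes in turn the index of each existing child of the parent (each child's nextsibling is the index of the parent's next child). On reaching the last child, set its nextsibling to the current next_code.
Finally, increment next_code in preparation for the next insertion.

Decoding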

void LZWDecode( BITFILE *bf, FILE *fp){
    int character;
    int new_code, last_code;
    int phrase_length;
    unsigned long file_length;

    InitDictionary();
    file_length = BitsInput( bf, 4*8); // length of the original file, written by the encoder
    if( -1 == file_length) file_length = 0;
    /* to be filled in */
    // No code precedes the first one, so last_code starts at -1. The first code is
    // always in the default (ASCII) dictionary, so character holds a valid value by
    // the time it is used and needs no initial value.
    last_code = -1;
    while( 0<file_length){
        new_code = input( bf);
        if( new_code >= next_code){ // this is the case CSCSC( not in dict)
            d_stack[0] = character; // d_stack[0] holds the first character of the current last_code
            phrase_length = DecodeString( 1, last_code); // keep d_stack[0] reserved and decode pW
        }else{ // in dict
            phrase_length = DecodeString( 0, new_code); // decode cW
        }
        character = d_stack[phrase_length-1]; // first character of pW/cW
        while( 0<phrase_length){
            phrase_length --;
            fputc( d_stack[ phrase_length], fp); // write the stack from the top down
            file_length--;
        }
        if( MAX_CODE>next_code){ // add the new phrase to dictionary
            AddToDictionary( character, last_code);
        }
        last_code = new_code;
    }
}

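Decoding is harder to follow than encoding. It involves one more function than encoding, DecodeString; statements already explained above are not repeated.

Start with the main routine, LZWDecode.

character stores the first character of pW or cW. It has no initial value because its value is carried over from the previous iteration, and it holds a valid value whenever it is actually used. new_code is the cW read in each iteration, and last_code is the pW set by the previous iteration (pW is always in the dictionary). phrase_length is the length of the string decoded in this iteration, obtained from the return value of DecodeString. d_stack[] holds the string decoded in one iteration; it is stored in reverse order, so it has to be written out from the end backwards. fp is the output file.

In the first iteration there is no pW (hence last_code starts at -1) and cW is necessarily in the default dictionary. character is set to the character of cW (which becomes the first character of pW in the next iteration), and that character is output directly (it is always a single character). AddToDictionary returns immediately because last_code is negative. Then next pW = cW (last_code = new_code) and the next iteration begins.

Each later iteration first checks whether cW is in the dictionary. The test if( new_code >= next_code) is true when cW is not in the dictionary.

If it is in the dictionary, things are simple: phrase_length = DecodeString( 0, new_code) decodes cW, the number of entries used in d_stack is the length of cW, and character is set to the first character of cW (character = d_stack[phrase_length-1]), which becomes the first character of pW in the next iteration. The while loop that follows writes out the decoded characters in d_stack[]. AddToDictionary then writes pW + the first character of cW to the dictionary. Finally next pW = cW (last_code = new_code) and the next iteration begins.

If it is not in the dictionary, things are more involved:

phrase_length = DecodeString( 1, last_code) decodes pW, so the number of entries used in d_stack is the length of pW plus 1. What is finally decoded is pW + the first character of pW, and this string is necessarily equal to cW. Because pW was the cW of the previous iteration, where character was set to its first character, character is guaranteed to hold the first character of the current pW. pW + the first character of pW is then written to the dictionary. Finally next pW = cW (last_code = new_code) and the next iteration begins.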

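There is no special final iteration. Every character written decrements file_length, and the loop ends once file_length reaches 0, that is, once as many bytes as the original file contained have been written.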

int DecodeString( int start, int code){
    int count;
    count = start;
    while( 0<=code){
        d_stack[ count] = dictionary[code].suffix;
        code = dictionary[code].parent;
        count ++;
    }
    return count;
}

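DecodeString stores the decoded characters of a code in d_stack[] in reverse order and returns the number of entries used in d_stack[].

Its inputs are start, the position in d_stack at which writing begins, and code, the code to decode.

Starting at position start, the function decodes code into the stack using the parent links of the trie. After reading the suffix of the current node it moves to the node's parent, so a single pass collects the suffix of every node on the chain. Read in reverse order, these are the decoded characters. code is repeatedly set to the parent of the current node and becomes -1 at the end of the chain. At that point count is the number of entries used in d_stack, and it is returned.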
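The parent-walk can be tried in isolation with a hand-filled table. The sketch below uses the entries of the earlier three-letter example (3 = 'ab', 4 = 'ba', 5 = 'abb', 6 = 'bc', 7 = 'cb', 8 = 'bac'); the names `phrase_node`, `phrases`, and `phrase_to_string` are hypothetical, and unlike DecodeString it reverses the stack into an output buffer itself instead of leaving that to the caller.

```c
#include <assert.h>
#include <stddef.h>

/* Only the two fields needed for decoding. */
struct phrase_node { int suffix; int parent; };

/* Hand-filled dictionary from the worked example (assumed contents). */
static const struct phrase_node phrases[] = {
    {'a', -1}, {'b', -1}, {'c', -1},   /* 0, 1, 2: roots */
    {'b', 0},                          /* 3: 'ab'  */
    {'a', 1},                          /* 4: 'ba'  */
    {'b', 3},                          /* 5: 'abb' */
    {'c', 1},                          /* 6: 'bc'  */
    {'b', 2},                          /* 7: 'cb'  */
    {'c', 4}                           /* 8: 'bac' */
};

/* Write the phrase for `code` into out (NUL-terminated).
 * Returns its length, or -1 if it does not fit. */
int phrase_to_string(int code, char *out, size_t cap)
{
    int stack[16];   /* suffixes collected leaf-to-root, i.e. in reverse */
    int count = 0;

    assert(code >= 0 && code < (int)(sizeof phrases / sizeof phrases[0]));
    while (code >= 0) {
        if (count == 16) return -1;
        stack[count++] = phrases[code].suffix;
        code = phrases[code].parent;     /* move one step towards the root */
    }
    if ((size_t)count + 1 > cap) return -1;

    for (int i = 0; i < count; i++)      /* pop the stack to restore order */
        out[i] = (char)stack[count - 1 - i];
    out[count] = '\0';
    return count;
}
```

For code 8 the walk visits 8, 4, 1 and collects c, a, b, which reads 'bac' once reversed.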

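Note the two decoding statements for the cases where cW is or is not in the dictionary:

// cW not in dict:
d_stack[0] = character; // d_stack[0] holds the first character of the current last_code
phrase_length = DecodeString( 1, last_code); // keep d_stack[0] reserved and decode pW
// cW in dict:
phrase_length = DecodeString( 0, new_code); // decode cW

When cW is not in the dictionary, start is 1, which reserves d_stack[0] for the final character of the output (the first character of pW; it comes last because the stack is stored in reverse). When cW is in the dictionary no slot needs reserving, so start is 0.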

Analysis of LZW Compression Performance

Several common file formats were compressed with the code above:

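Note that LZW is lossless compression: no information is lost. LZW is used in formats such as GIF and TIFF and in the Unix compress utility. The familiar ZIP and RAR archive formats do not use LZW as their main method; ZIP typically uses DEFLATE, which combines LZ77 with Huffman coding.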

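In this lab, ten file formats that come up often in coursework were compressed with LZW; the different image formats all hold the same picture. (The output is a bit file.)

As the results show, not every file got smaller even though a compression algorithm was applied, which is surprising at first.

Take the image files: for the same picture, bmp compressed very well (898 KB to 43 KB), whereas tif, png, and jpg all grew to varying degrees.

By contrast, the audio/video files wav, ts, and avi all compressed to some extent, though less dramatically than bmp.

doc compressed successfully, but pdf and pptx both grew.

This growth is actually predictable, for the following reasons: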

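Growth mostly occurs when the data has little repetition. The encoder then keeps adding new dictionary entries and rarely matches long strings, so the output bit stream stays large. This alone would not be enough to make a file grow.
In addition, the encoder reads the source one byte at a time but writes every code as 16 bits, i.e. two bytes. Combined with low repetition, this means the output can be expected to grow rather than shrink.
Finally, formats such as png are already compressed, so their data probably has very little repetition left.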
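The size argument can be made explicit with a little arithmetic. In the listing above, LZWEncode writes a 32-bit length header and then 16 bits per code, so the output size depends only on the number of codes emitted. The sketch below computes that size; the function names `lzw16_output_bytes` and `lzw16_shrinks` are hypothetical helpers for illustration.

```c
#include <assert.h>
#include <limits.h>

/* Bytes written by the 16-bit encoder: a 4-byte length header
 * plus 2 bytes for every code emitted. */
unsigned long lzw16_output_bytes(unsigned long n_codes)
{
    assert(n_codes <= (ULONG_MAX - 4UL) / 2UL);   /* guard against overflow */
    return 4UL + 2UL * n_codes;
}

/* Returns 1 if emitting n_codes codes yields an output smaller than
 * the n_input-byte source, 0 otherwise. */
int lzw16_shrinks(unsigned long n_input, unsigned long n_codes)
{
    return lzw16_output_bytes(n_codes) < n_input;
}
```

The 10-byte string ababbcbacb from the worked example encodes to 7 codes, which would cost 4 + 2 * 7 = 18 bytes in this format. The output only shrinks when each code covers more than two input bytes on average, which is why data with little repetition grows.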