LZW (Lempel Ziv Welch) : 60.1 Brief History
Most programmers have some knowledge of data compression. Data compression is the process of
reducing the size of a data file. One way of achieving this is by eliminating redundant data, and
there are many other methods as well. In this chapter, let's look at the LZW (Lempel Ziv Welch)
algorithm. This neat algorithm is not widely known, as many books on algorithms ignore it.
A to Z of C 595
elements in the table. So when we store each element in the table, it has to be converted to a
12-bit number.
For example, to store A (dec 65, hex 41), T (dec 84, hex 54), O (dec 79, hex 4F) and
Z (dec 90, hex 5A), you have to store them in bytes as 04, 10, 54, 04, F0, 5A. This is because
each character is allotted 12 bits (the codes 041, 054, 04F and 05A), so every two characters
fill exactly three bytes.
Consider the string ATOZOFC. It takes 7 x 8 = 56 bits. If a code such as 400 (190h) is assigned
to it, it takes only 12 bits instead of 56 bits!
60.3.2 Example
Input string: ATOZOFCATOZOFCATOZOFC

Characters  String Stored  Process in     In file
Read        / Retrieved    Table
A           -              -              -
T           AT             Store (256)    Store A
O           TO             Store (257)    Store T
Z           OZ             Store (258)    Store O
O           ZO             Store (259)    Store Z
F           OF             Store (260)    Store O
C           FC             Store (261)    Store F
A           CA             Store (262)    Store C
T           AT             Retrieve       Store code 256 (AT)
O           ATO            Store (263)    -
Z           OZ             Retrieve       Store code 258 (OZ)
O           OZO            Store (264)    -
F           OF             Retrieve       Store code 260 (OF)
C           OFC            Store (265)    -
A           CA             Retrieve       Store code 262 (CA)
T           CAT            Store (266)    -
O           TO             Retrieve       Store code 257 (TO)
Z           TOZ            Store (267)    -
O           ZO             Retrieve       Store code 259 (ZO)
F           ZOF            Store (268)    -
C           FC             Retrieve       Store code 261 (FC)
In this example string, the first character A is read, followed by the second character T. The
two characters are concatenated as AT and this new string is stored in the Code table. Codes 0 to
255 already stand for the single 8-bit characters, so AT, the first new string, is assigned
256 (100h). Then the second and third characters are concatenated to form another new string, TO.
This string is also new to the Code table, so the table grows to hold it and it is assigned the
next code, 257 (101h). In this way every new string formed by concatenation is given the next
free code, and the Code table is built up. The table keeps growing until all 4096 codes that fit
in 12 bits have been used or the end of the file is reached.
When a string that is already stored in the table is read again, it is retrieved and its code
from the Code table is written to the file instead of the characters. The output codes are written
using the number of bits specified by the program. In other words, since we have extended the code
width from 8 to 12 bits, even single characters, which normally occupy 8 bits, must be written in
12-bit format.
60.4.2 Example
Consider the same example given above and decompress it.
Compressed bytes (in hex):
04 10 54 04 F0 5A 04 F0 46 04 31 00 10 21 04 10 61 01 10 31 05
Here the bytes are read three at a time. Each group of 3 bytes (24 bits) holds two 12-bit codes,
and codes below 256 are converted straight back to 8-bit (ASCII) characters. Thus the bytes 04,
10 and 54 are combined as 041054, which is split to get A (041) and T (054). The table is rebuilt
concurrently as each new string is read. When we read a code such as 100 or 102, we look it up in
the table and write the corresponding string to the file. For example, when we reach the 4th group
of bytes, 04, 31 and 00, they are split into the 12-bit codes 043 and 100, which output the
character C and the string AT respectively. Thus all the characters can be recovered without the
Code table ever being stored in the compressed file.
Suggested Projects
1. Write your own compression utility using LZW algorithm.