?? lzwtut.htm
字號:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<html>
<head>
<title>LZW Compression Tutorial</title>
</head>
<body>
<b>LZW Compression Tutorial Version 1.01</b>
<hr>
<p>
By Martin Zolnieryk aka Hard Rock. <br>
A Stars Dev Company Production (c) 2004<br>
This Version: Aug 5,2004<br>
Email: <a href ="matilto:hard_rock_2@yahoo.com"> Hard_rock_2@yahoo.com</a><br>
<br>
</p>
<p>
<b>ChangeLog</b>
<hr>
<b>Version 1.01</b>
<ul>
<li>Lots of Spelling/Grammer Error Fixes</li>
</ul>
<b>Version 1.0</b>
<ul>
<li>First Release</li>
</ul>
<b>Table of Contents</b>
<hr>
<p>
<ul>
<li><a href = "#intro">1. Introduction</a>
<ul>
<li>a. About</li>
<li>b. Compression</li>
<li>c. LZW</li>
</ul>
</li>
<li><a href = "#compression">2. Compression</a>
<ul>
<li>a. Psuedo Code</li>
<li>b. How it works</li>
<li>c. Table</li>
</ul>
</li>
<li><a href = "#decompression">3. Decompression</a>
<ul>
<li>a. How it Works</li>
<li>b. Psuedo Code</li>
<li>c. Example</li>
</ul>
</li>
<li><a href = "#gif">4. The Gif</a>
<ul>
<li>a. What is it?</li>
<li>b. The Gif Format</li>
<li>c. Psuedo Code</li>
<li>d. Advanced GIF Topics</li>
</ul>
</li>
<li><a href = "#code">5. Source Code</a>
<ul>
<li>a. About</li>
<li>b. Notes</li>
<li>c. C Code</li>
<li>d. Java Code</li>
</ul>
</li>
<li><a href = "#ref">6. References/The End</a>
<ul>
<li>a. Bibliography</li>
<li>b. Other Sources</li>
<li>c. Patent</li>
<li>d. Libraries/Tools Used</li>
<li>e. The End</li>
</ul>
</li>
</ul>
</p>
<a name = "intro"><b>Introduction</b></a>
<hr><br>
<b>1a. About</b>
<p>
Hey, welcome to my first tutorial. I've decided to write this tutorial on compression, specifically LZW and GIF, as the majority of tutorials on this subject I found, were hard to understand, therefore I started this tutorial to hopefully fill that gap. Compression admittedly takes some time to master so I've tried to make things as simple as I could. Once you've learned it however, you'll find it really simple. I've included a little list of references at the end of the tutorial, so hopefully if I missed something, or you're still unclear, you have some places to look. And if anyone finds any errors in this tutorial, email them to me at <a href = "mailto:hard_rock_2@yahoo.com">hard_rock_2@yahoo.com</a> and I will fix them as soon as I can. If you don't find any errors, or enjoyed reading this, then email me anyway since I like feedback. Maybe you'll even motivate me to write more tutorials! Of course source code is included with this tutorial, with sources of C/C++ and Java.
</p>
<b>1b.Compression</b>
<p>
Now let抯 start off. There are millions practical uses of compression, and a common one being the method of transferring data over the Internet. For that most Windows users use either .zip or self extracting/installing .exe files both of which are compressed. (These are not all of course; there are hundreds of compression formats). Now that we know how to use it and what it is, we can start!
</p>
<b>1c.LZW</b>
<p>
The compression format that will be discussed in this tutorial, is Lempel-Ziv-Welch or LZW compression. Rather than fill you with a history of the format we'll get right down to how to use it within your own programs. Let's discuss LZW compression, how does it work? Basically LZW creates tables of strings while compressing/decompressing data, and then later outputs the ID of the string rather than the string itself saving space. Decompression reads this and it to build a table of strings as it reads, saving the compression from needing to write large string tables at the beginning of the file, thus making it very effective. If this confuses you, don't worry we have the whole tutorial to explain it, as well as many link to online references. Well let get started and focus on how it works, worrying about the technicalities later.
</p>
<a name = "compression"><b>Compression</b></a>
<hr><br>
<b>2a.Psuedo Code</b>
<p>
Now we will learn how to actually compress data! Wohooo! Are we excited yet? Let's show some simple pseudo code and then we'll dissect it to see how it works:
<br><br><b>Pseudo Code</b>
<pre>
Initialize String Table
Add next char from charstream to String_Buffer
START LOOP here
get Next_Char from charstream
if String_Buffer + Next_Char is in string table
add Next_Char to String_Buffer
else
output code for String_Buffer
add String_Buffer + Next_Char code to table
clear String_Buffer and equal it to Next_Char
END LOOP when at end of charstream
</pre>
</p>
<b>2b.How it Works</b>
<p>
Whoa, that looks sweet, it looks so easy! This is more or less how you'd actually do it, although there are a few minor gripes, but this is for explanation purposes only. Rather then try to explain piece by piece how it works, why not use an example?
Let say we have a set of words say:<br>
__________________________<br>
CatCatInTheHatAndTheRat <br>
__________________________<br>
(This makes absolutely no sense I know, but it'll provide a good test bed for our code here, capitalization used to make reader friendly)<br>
Here are the steps are program would take:<br>
<ol>
<li>C is added to the String_Buffer.</li>
<li>Now it reads Next_Char, Next_Char = 'a', "Ca" and it is not in the string table so we output "C" to the file. "Ca" is given the code of 256 (0-255 are ASCII codes).</li>
<li>String_Buffer is now set to only equal Next_Char, "A"</li>
<li>Now it reads Next_Char, Next_Char = 't', "at" is not in string table so we output "a" to the file. "at" is given the code of 257.</li>
<li>Now it reads Next_Char, Next_Char = 'C', "tC" is not in string table so we output "t" to the file. "tC" is given the code of 258.</li>
<li>Whoa, Wait a second. Now when Next_Char is read, Next_Char = 'a', "Ca" and it is IN the string table! So now String_Buffer = "Ca"</li>
<li>Now it reads Next_Char, Next_Char = 't', "Cat" is not in string table so we output "256" to the file. "Cat" is given the code of 259.</li>
</ol>
This is basically how it would go on, until the entire file is complete! Cool eh! Now the Results are:<br>
__________________________________________<br>
CatCatInTheHatAndTheRat (uncompressed)<br>
__________________________________________<br>
Cat256TInTheH257And263eR257 (compressed)<br>
__________________________________________<br>
Sweet huh? We dropped, what 4 whole bytes! But realize that because when you compress, you have to use more bits per character to write out everything in order to write a code. Here a minimum of 9 bits (up to code 511) is required so in reality we haven't dropped 4 bytes, using 9 bits we would have dropped a single byte. This type of compression is a lot more effective when you have a larger amount of data, so the overhead goes away and the compression really starts to work! But really, would you normally try compressing a 23 byte files? I think not.
Still a little confused? Let抯 take a look at a table of values:
</p>
<b>2c. Table</b>
<p>
<pre>
String_ Next_ Code Code Output
Buffer Char Value
C a 256 Ca C
a t 257 at a
t C 258 tC t
Ca t 259 Cat 256
t I 260 tI t
I n 261 In I
n t 262 nt n
T h 263 Th T
h e 264 he h
e H 265 eH e
H a 266 Ha H
at a 267 atA 257
A n 268 An A
n d 269 nd n
d T 270 dT d
Th e 271 Th 263
e R 272 eR e
R a 273 Ra R
at - --- --- 257
</pre>
You have to only look at the table and see how easy it really is to use this compression, it抯 not that complicated right? And be sure to check out the source code to see exactly how it works! Right now let抯 march onto...
</p>
<a name = "decompression"><b>Decompression</b></a>
<hr><br>
<b>3a.How it works</b>
<p>
Were moving along at blinding speed here!
Compression of course, can be pretty useless if you can抰 use the compressed data, and that's what decompression is for, so we can use that now compressed data, so lets get started. Remember those cool string tables you saw earlier aren't saved, so once again we'll have to generate them as we go. Also there is an exception we have to check for, is if a code is listed but not in the table, this usually happens with highly repetitive data, and is easily remedied. Let抯 get some pseudo-code and see what we have:
</p>
<b>3b. Pseudo Code</b>
<p>
<pre>
Initialize String Table
get First_Code from charstream
output First_Code
START LOOP here
get Next_Code from charstream
if Next_Code is NOT in the string table
String_Buffer = translated First_code +
first byte of First_Code
else
String_Buffer = Translation of Next_Code
add translated First_code + first byte of
First_Code to the table
First_Code = Next_Code
output String_Buffer
END LOOP when at end of charstream
</pre>
I don't think there will be much explanation needed, now that you know how the compression algorithm works, it shouldn't be too difficult to decipher this code. Just remember that the compressor has already done all the work and we need to continue to build the string table both when compressing and decompressing data.
?? 快捷鍵說明
復制代碼
Ctrl + C
搜索代碼
Ctrl + F
全屏模式
F11
切換主題
Ctrl + Shift + D
顯示快捷鍵
?
增大字號
Ctrl + =
減小字號
Ctrl + -