INFORMATION TECHNOLOGY
DATA COMPRESSION AND DATA RETRIEVAL
SUBJECT CODE: 2161603
B.E. 6th SEMESTER
Prerequisite: None
Rationale: Data compression is the process of encoding information so that memory and transmission
capacity requirements are minimized. Although memory and transmission capacity have grown
exponentially, many high-bandwidth applications, such as digital storage and transmission of video,
would not work without compression.
Content:
3 Huffman Coding 6 15
The Huffman Coding Algorithm
Minimum Variance Huffman Codes
Adaptive Huffman Coding
Update Procedure
Encoding Procedure
Decoding Procedure
Golomb Codes
Rice Codes
Tunstall Codes
Applications of Huffman Coding
Lossless Image Compression
Text Compression
Audio Compression
4 Arithmetic Coding 5 10
Introduction
Coding a Sequence
Generating a Tag
Deciphering the Tag
Generating a Binary Code
Uniqueness and Efficiency of the Arithmetic Code
Algorithm Implementation
Integer Implementation
Comparison of Huffman and Arithmetic Coding
Adaptive Arithmetic Coding
5 Dictionary Techniques 6 15
Static Dictionary
Digram Coding
Adaptive Dictionary
The LZ77 Approach
The LZ78 Approach
Applications
File Compression—UNIX compress
Image Compression—The Graphics Interchange Format (GIF)
Image Compression—Portable Network Graphics (PNG)
Compression over Modems—V.42 bis
6 Predictive Coding 6 10
Prediction with Partial Match (PPM)
The Basic Algorithm
The Escape Symbol
Length of Context
The Exclusion Principle
The Burrows-Wheeler Transform
Move-to-front coding
Lossless Image Compression
CALIC, JPEG-LS, Multi-resolution Approaches
Facsimile Encoding
Dynamic Markov Compression
7 Mathematical Preliminaries for Lossy Coding 6 10
Distortion Criteria, Models
The Quantization Problem
Uniform Quantizer
Adaptive Quantization
Forward Adaptive Quantization
Backward Adaptive Quantization
Nonuniform Quantization
pdf-Optimized Quantization
Companded Quantization
8 Vector Quantization 7 10
Advantages of Vector Quantization over Scalar Quantization
The Linde-Buzo-Gray Algorithm
Initializing the LBG Algorithm
The Empty Cell Problem
Use of LBG for Image Compression
Tree-Structured Vector Quantizers
Design of Tree-Structured Vector Quantizers
Pruned Tree-Structured Vector Quantizers
Structured Vector Quantizers
Pyramid Vector Quantization
Polar and Spherical Vector Quantizers
Lattice Vector Quantizers
9 Boolean retrieval 4 10
An example information retrieval problem
A first take at building an inverted index
Processing Boolean queries
The extended Boolean model versus ranked retrieval
The term vocabulary and postings lists
Document delineation and character sequence decoding
Obtaining the character sequence in a document
Choosing a document unit
Determining the vocabulary of terms
Tokenization
Dropping common terms: stop words
Normalization (equivalence classing of terms)
Stemming and lemmatization
Faster postings list intersection via skip pointers
Positional postings and phrase queries
Biword indexes
Positional indexes
10 XML retrieval 2 5
Basic XML concepts
Challenges in XML retrieval
A vector space model for XML retrieval
Evaluation of XML retrieval
Text-centric vs. data-centric XML retrieval
Note: This specification table shall be treated as a general guideline for students and teachers. The actual
distribution of marks in the question paper may vary slightly from the above table.
Reference Books:
Course Outcome:
List of Experiments:
1. Write a program that compresses and displays an uncompressed Windows BMP image file.
2. Write a program to generate a binary code using arithmetic coding.
3. Implement Huffman coding (HC) to generate binary codes when the symbols and their probabilities are given.
4. Implement Huffman coding to compress a given file and decompress the compressed file.
5. Implement an adaptive Huffman program to compress and decompress a file.
6. Write a program to implement the LZ77 algorithm.
7. Write a program to implement the LZSS algorithm.
8. Write a program to implement the LZ78 algorithm.
9. Write a program that performs JPEG compression step by step on a given 8x8 block, and also
performs decompression.
10. Write a program to extract tokens from files and eliminate stop words.
11. Write a program to implement the vector space model for XML retrieval.
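As a possible starting point for experiment 3, the following is a minimal sketch of static Huffman code construction from given symbols and probabilities. It is not a prescribed solution. The function names `huffman_codes` and `average_length` and the sample probabilities are illustrative assumptions, and ties are broken by insertion order, so other equally valid Huffman codes exist for the same input.

```python
import heapq
from itertools import count


def huffman_codes(probabilities):
    """Return a dict mapping each symbol to a binary codeword string.

    `probabilities` is a dict {symbol: probability}. Illustrative helper,
    not part of the syllabus.
    """
    if len(probabilities) == 1:
        # A lone symbol still needs a one-bit codeword.
        return {symbol: "0" for symbol in probabilities}

    tiebreak = count()  # unique counter so the heap never compares the dicts
    heap = [(p, next(tiebreak), {symbol: ""}) for symbol, p in probabilities.items()]
    heapq.heapify(heap)

    while len(heap) > 1:
        # Merge the two least probable subtrees.
        p0, _, codes0 = heapq.heappop(heap)
        p1, _, codes1 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in codes0.items()}
        merged.update({s: "1" + c for s, c in codes1.items()})
        heapq.heappush(heap, (p0 + p1, next(tiebreak), merged))

    return heap[0][2]


def average_length(probabilities, codes):
    """Expected codeword length in bits per symbol."""
    return sum(p * len(codes[s]) for s, p in probabilities.items())


if __name__ == "__main__":
    # Assumed example source; any probabilities summing to 1 would do.
    probs = {"a": 0.4, "b": 0.2, "c": 0.2, "d": 0.1, "e": 0.1}
    codes = huffman_codes(probs)
    for symbol in sorted(codes):
        print(symbol, codes[symbol])
    print("average length:", average_length(probs, codes))
```

Storing a partial code table in each heap entry keeps the sketch short. An explicit tree of nodes would be the more natural structure for the file compression and adaptive variants in experiments 4 and 5.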
Major Equipment:
Computer, Laptop
1) http://ocw.usu.edu/Electrical_and_Computer_Engineering/Information_Theory/