Concept in Information and Processing

1
CONCEPTS IN INFORMATION AND PROCESSING
Contents
Information Technology
An Overview of Current IT Application
What is the Difference between Data and Information?
Information System
Important Data Types
Value of Information
Quality of Information
Data Compression
Encoding vs Compression
Entropy of Information
Number System
1.1
INFORMATION TECHNOLOGY
The last decade has witnessed tremendous growth in information technology worldwide. Rapid advances in communication media such as television, computers, the internet, printing and publishing have given us prompt access to the information we need. The computer is the most versatile machine man has ever made. Computers at home have become a reality, and computers at work are commonplace. Almost all government departments and commercial organizations have now accepted the computer as a major tool for modernizing their functions. Computers are used in areas ranging from intricate scientific problems to art, culture, history, accounting, finance, medicine and even the domestic sector. With Information Technology, computers have made a significant impact on every dimension of day-to-day life, e.g. reservation of air and railway tickets, buying and selling items on the Internet, electronic markets, online bank transactions, entertainment, education, communication, hotel reservations and so on.
Information Technology has replaced conventional methods of solving technical and operational problems with much faster and more convenient methods based on the ability to access large and complex pools of data. Initially, computers could process only information in the form of text. Text is written with letters, digits and other characters that you can read. Later it was realized that information in the form of images, animation, audio and video can also be processed. Suppose you have to create a database of your friends for future reference. You will create it using attributes like Name, Date of Birth, Father's Name, Telephone No., Street, City, Pin Code etc. Just think how good it would be if you could also store in the database an image of your friend, a recording of his voice or a video clip in which he appears.
The pressing demand for storage and retrieval of data represented in multiple forms like Text, Image, Animation, Graphics, Audio and Video has given computer scientists and technologists a new direction: processing information stored in multiple formats. All this has revolutionized information technology. Information Technology is a generic name for the following functions:
1. Information/Data Representation
2. Information/Data Storage
3. Information/Data Retrieval and Processing
4. Information/Data Communication
The computer is a tool for performing the above-mentioned tasks effectively, efficiently and extremely quickly.
1.2
AN OVERVIEW OF CURRENT INFORMATION TECHNOLOGY APPLICATIONS
Among the fundamental computer applications are the processing, storage and retrieval of information and the development of effective technologies for communicating information represented in various formats. The information may be in the form of text, images, graphics, audio, video or animation.
An important application is Video on Demand, which is very common nowadays. The cable TV operator provides a service for watching any video clip, movie or favorite TV program. A channel is established between the computer at home and the cable operator. You may browse the available programs and select the one of your choice on your computer. The compressed video is transmitted over the communication channel, usually the cable, and is decompressed on your computer during playback. All the functions of a video cassette player, such as record, play, forward and rewind, are available on your computer.
Another important application is multimedia conferencing. It is now possible to arrange a meeting between several executives who are not physically present in one place. Using current technologies, a group of people can talk and hold discussions as though they were in one room. Whoever speaks is heard by everybody. This is achieved using an underlying high-bandwidth channel that can transmit video data at an extremely fast rate.
Applications like home shopping or shopping on the web, where details of the items to be purchased are presented as images, graphics or video, are very common today. Systems such as Telemedicine in healthcare or Geographic Information Systems require high bandwidth, since in all such cases it is necessary to communicate video or graphics. Information in any format other than text requires high storage capacity.
Storage, retrieval and processing of such information is costly for two reasons: lack of bandwidth and lack of effective tools and technologies to handle such large volumes of information.
Apart from the applications described above, Information Technology concepts are used in business applications such as inventory control, preparation of business documents like invoices, pay bills, salary statements and issue/dispatch transactions, accounting and financial management, account-wise consumption, analysis reports, sales reports etc. There exist a number of special-purpose business systems developed to meet the specific requirements of a company or business. Central to these software packages are modules that handle human resources, invoices, accounting etc. The need to bring all the activities of a business organization under a single software system has led to the development of Enterprise Resource Planning (ERP) systems. ERP systems are bundles of software that embody standard business practices. The software is customized according to the needs of an enterprise and provides a tailored solution for that enterprise.
Information Technology is playing a significant role in standardizing processes in banks. Banking has taken a major leap in the past few years after deploying Information Technology. It is now possible to transfer balances, bank over the internet, use tele-services and use automated teller machines (ATMs). The time, effort and money required to monitor business processes in banks have been reduced drastically in the past
few years. EDI (Electronic Data Interchange) allows automated/computerized organizations to transfer documents electronically. EDI has reduced the cost of transportation, paperwork and human interaction, and has made the exchange of documents between organizations faster.
This is not all: applications of Information Technology in areas such as hospitals, medicine, reservations, tele-shopping, manufacturing, communication etc. are very common. The process of updating conventional practices through Information Technology is still going on in many organizations.
1.3
WHAT IS THE DIFFERENCE BETWEEN DATA AND INFORMATION?
It is generally not easy to decide when a particular piece of text, numbers, tables, images or graphics serves merely as data and when it becomes information. In fact, there is no hard line that tells us whether a piece of text or a sample of numbers represents data or information. Let us take an example. The government has launched a polio vaccination drive to eradicate polio from India. In this programme, officers or executives have been deployed at different levels. The top-level executives monitor the overall progress and are interested in success at the national level. Similarly, the next level of executives watches progress at the state level, and further levels watch the zonal, district, block and village levels. The top level has fixed a target that a certain percentage of the population be vaccinated at the national level. To monitor overall progress at a particular time, the top level collects data from each state and processes it to find the current status. Similarly, at the state level, data are collected from the zones and then processed. Data from lower levels are collected and processed to find the current status at the upper level. The result of processing data at each level serves as information at the next higher level.
For example, suppose there are 100 villages in a particular block. If executives at the block level are handed the vaccination data of all one hundred villages, the raw figures will probably not be of much use to them. However, if these hundred figures are processed to obtain the average percentage of vaccination for the block, this single figure will be of much greater importance to the executives at the block level, who may then take decisions based on it. This processed figure thus serves as information at the block level.
Data are the basic facts and figures which may be used as a historical record about, say, a company or an organization.
These may be assembled in the form of files, reports, graphs, payrolls etc. If raw data is processed as per certain rules or policies, the results obtained (if they are meaningful) are called information. The word meaningful here signifies that executives or the management can take decisions on the basis of the results. Note that information obtained at one level may serve as raw data for further information at another level. That is probably why the words data and information are used interchangeably. Strictly speaking, data consists of numbers, text etc. that a computer processes according to certain procedures to produce information. The computer can be used to organize raw data in some order so that it becomes information. Preparing charts, tables, reports, worksheets etc. are examples of creating information from raw data. We may therefore conclude that processing data is a cyclic process and that at every hop we obtain more meaningful data, as is evident from Figure 1.1.
Figure 1.1 (diagram labels: Raw Data: Numbers/Text/Sound/Image/Audio/Video; Data Processing; Information: data obtained in the form of Chart/Table/Text or Multimedia Presentations; Refining Information (Next Hop))
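The village-to-block example above can be sketched in a few lines of Python. This is only an illustration: the village names and percentages are invented, and the function name block_summary is hypothetical. It shows raw figures (data) being reduced to a single block-level figure (information).

```python
from statistics import mean

# Hypothetical raw data: vaccination percentage reported by each village
# in one block (a real block in the example would have 100 entries).
village_coverage = {
    "village_a": 82.0,
    "village_b": 67.5,
    "village_c": 91.0,
    "village_d": 74.5,
}

def block_summary(coverage_by_village):
    """Reduce raw village figures to block-level information."""
    values = list(coverage_by_village.values())
    return {
        "villages_reported": len(values),
        "average_coverage": round(mean(values), 2),
        # The village with the lowest figure is a natural candidate for follow-up.
        "lowest_village": min(coverage_by_village, key=coverage_by_village.get),
    }

summary = block_summary(village_coverage)
print(summary)
```

The summary produced for one block can in turn be treated as raw data at the district level, which is the "next hop" of the cycle. Note that a plain average of percentages, as used here, assumes that the villages are of similar size.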
1.4
INFORMATION SYSTEM
The past decade has witnessed tremendous growth in information innovation and applications. Information Technology has become a vital component of business success because most organizations require fast dissemination and processing of information and fast storage and retrieval of data. Today the management of a business organization requires high-speed processing of huge amounts of data, facts and figures. High-speed communication between the organization and its customers, clients etc. also plays an important role in achieving business goals. These requirements of modern business led to the development of business information systems, which provide the appropriate information to the appropriate person in the desired format and at the correct time. Timely processing of data also enables management to take important decisions at the earliest possible time.
An Information System may be defined as an organized collection of people, software, hardware, communication equipment and databases, in which people control, process and communicate information. The overall objective of an Information System is to gather data, process it and communicate the resulting information to the users of the system. The user group includes persons from all levels, i.e. the top, middle and operational levels. The information obtained from the information system allows these persons to take decisions. To provide appropriate information to the user, it is necessary to collect data, process it and output the results. An Information System may include a feedback mechanism under which processed data or output is fed back into the system to make changes in processing activities. For example, sales and inventory reports may be fed back to the appropriate managers so that they can take decisions in time.
Therefore, high-end information systems are designed around feedback and control mechanisms, based on user-based criteria, to produce and communicate information for the planning and control of business.
Information Systems may be broadly divided into two categories: (i) Manual and (ii) Computer Based Information Systems (CBIS). As discussed before, the major objective of an information system is to collect, process and disseminate data to the appropriate users. Traditionally, business analysts in an organization studied the pattern of investment, expenditure, sales etc. to evaluate performance and to take decisions for the future. These analysts used to collect the data and prepare reports in the form of charts, tables, graphs etc. to analyze the business. Nowadays, the requirements of a business analyst may be programmed, and a computer-based system may be developed to study and analyze these reports. Such Information Systems are called Computer Based Information Systems. For example, in earlier days the rail reservation system was manual. Travellers used to fill in an application form, and seats were allotted under different quotas on different trains. These reservations were made on the basis of certain well-defined rules. After the introduction of the computer, these rules and guidelines have been programmed
into the computer, and the resulting software has emerged as the reservation agent. We may say that the Information System existed previously but was manual. The new Information System, which uses the computer as its central component, is known as a Computer Based Information System.
The basic components of a Computer Based Information System are:
1. Users
2. Hardware/Communication Equipment
3. Software
4. Database
5. Set of Methods
1. Users: Users are one of the most important components of the Information System. They include the groups of persons who manage the system and those who retrieve information from it to take decisions. Another set of users are those who not only retrieve information but also provide information to the Information System. For example, marketing and sales personnel provide details of sales etc. to the Information System.
2. Hardware/Communication Equipment: In modern business, it is not only necessary to gather and process information; fast dissemination of information is also essential. Many organizations maintain constant touch with a large customer base. This requires that the Information System of an organization be network-enabled and able to communicate information through the internet or other communication channels. All hardware, network and communication equipment forms an important component of a Computer Based Information System.
3. Software: Software is a collection of programs that perform specific tasks. The rules, methods and practices prevailing in a business organization are coded into the programs or software. Once installed in the computer system, the software is considered the most important component of the Information System. These programs process the data and generate documents such as invoices and bills for customers and various reports, such as sales reports, for managers.
4. Database: A database is a structured collection of data.
The software or programs fetch data from the database and process it as required. The database may contain customer and employee records and data pertaining to sales, inventory, accounts etc. The raw data gathered from the field by sales or marketing persons, from customers etc. is stored in the database. To develop an efficient Information System, it is necessary to have a good database design. Information Systems are said to be built on top of the database, and the performance of an Information System depends on the underlying database.
5. Set of Methods: The set of methods is another important component of the Information System. It refers to the traditions and practices prevailing in the business house where the Information System is used. The traditions and practices that govern the business are laid down in the form of rules, which are then coded into the programs. These rules or methods change from time to time, whenever a new business practice is adopted or a change in the business environment is observed. The Information System must be flexible enough to incorporate such changes.
1.4.1 Types of Information System
The following are the motivating factors for a business enterprise to use an information system:
1. Information Systems support business processes and practices.
2. Information Systems support decision making.
3. Information Systems support innovative planning.
Depending upon the specific requirements of the organization and the needs of its users, information systems may be divided into the following categories:
1. Transaction Processing System
2. Management Information System
3. Workflow System
4. Decision Support System
5. Expert System
1.4.2 Transaction Processing System (TPS)
A transaction processing system is a traditional system that combines people, software, hardware and a database. The main focus of these systems is the completion of business transactions. Their objective is to reduce cost and effort and to automate business activities in the organization. For example, business transactions in an organization include activities like raising an invoice, accepting a sales order, and receiving and dispatching items from the store.
A business transaction is considered an atomic activity: it must either be completed in full or not take effect at all, otherwise the underlying database may enter an inconsistent state. Suppose a sales order is received by an organization from a client. After receipt of the sales order, a chain of activities needs to be invoked. These involve informing the manufacturing unit so that it can raise its requirement of items, and informing the sales department, accounts, shipping etc. If any of the related activities is not completed, the required modifications to the database may not occur. This situation may lead to disaster, because incomplete or inconsistent information may jeopardize the business activity.
The nature of these transactions may vary from one organization to another. The information system processes these transactions as a basic activity that satisfies the organization's day-to-day needs. There may exist a number of transactions in the organization which need to be completed to fully assist persons working at the operative level and the top management.
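The idea of a business transaction as an atomic activity can be sketched with Python's built-in sqlite3 module. The table names, the item and the place_order function below are hypothetical and chosen only for illustration; the point is that recording the order and reducing the stock either both happen or neither happens.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The CHECK constraint makes the stock update fail if quantity would go negative.
conn.execute(
    "CREATE TABLE stock (item TEXT PRIMARY KEY, "
    "quantity INTEGER CHECK (quantity >= 0))"
)
conn.execute(
    "CREATE TABLE sales_order (id INTEGER PRIMARY KEY, item TEXT, quantity INTEGER)"
)
conn.execute("INSERT INTO stock VALUES ('radio', 10)")
conn.commit()

def place_order(conn, item, quantity):
    """Record the order and reduce the stock as one all-or-nothing unit."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "INSERT INTO sales_order (item, quantity) VALUES (?, ?)",
                (item, quantity),
            )
            conn.execute(
                "UPDATE stock SET quantity = quantity - ? WHERE item = ?",
                (quantity, item),
            )
        return True
    except sqlite3.IntegrityError:
        # The stock update failed, so the order row inserted above is undone too.
        return False

ok = place_order(conn, "radio", 4)         # enough stock: both steps are kept
rejected = place_order(conn, "radio", 20)  # not enough stock: both steps are undone
stock_left = conn.execute(
    "SELECT quantity FROM stock WHERE item = 'radio'"
).fetchone()[0]
orders = conn.execute("SELECT COUNT(*) FROM sales_order").fetchone()[0]
print(ok, rejected, stock_left, orders)
```

After the second, rejected order the database still holds one order and six radios, so it never shows an order without the matching change in stock.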
These systems ensure timely and correct completion of the job. A transaction processing system deals with transactions in two different ways:
1. Batch Processed Information System
2. On Line Transaction Processing (OLTP)
In batch processing, the transactions are queued and executed one after another. These transactions keep modifying the data or database, and each succeeding transaction operates on the data left by the previous transaction. Payroll systems, electricity billing and telephone billing are examples of batch processed systems. These activities are triggered at the required time; they fetch data from the database and prepare reports like marksheets, telephone bills etc. These transactions also modify the database when required.
The On Line Transaction Processing (OLTP) system, in contrast to batch processing, processes the data instantaneously. OLTP systems are becoming more popular nowadays as they provide instant service to customers. A request raised by a customer or any other person is processed by the computer instantly (on line), as soon as it is submitted. Good examples of OLTP systems are the railway reservation system, banking systems etc. OLTP provides operational-level support to the organization by processing data through business transactions. These requests retrieve and store data in the database on line. Any failure in these systems may prove costly, as recovery from the failure is time-consuming and intricate.
There exists another type of transaction processing called Real Time Transaction Processing. In a Real Time Transaction Processing System, transactions are not only processed on line but must also meet deadlines.
In a mission control operation, it is not only important to process the data; it is even more important that the transactions are completed within their deadlines.
1.4.3 Management Information System (MIS)
On Line Transaction Processing Systems provide operational-level support to the organization by processing data through business transactions. These business transactions are submitted to the system from time to time. MIS is used in organizations where management requires information in the form of reports and presentations to take decisions. Transaction Processing Systems merely process business transactions. In MIS, the requirement is much higher, as different areas of an organization like accounts, inventory, sales, purchase, marketing etc. need to be tightly integrated to provide collective information to the management. MIS provides reports or feedback to the management using data that arises from the transaction processing systems. For example, MIS may be used by the finance controller of a huge organization to view daily budgetary positions under the budget heads. A sales manager may seek a report from MIS to judge the performance and work of the sales representatives. MIS also helps in obtaining scheduled reports such as income reports, weekly sales reports etc.
1.4.4 Workflow System
Workflow systems are used in an organization to manage and control the interrelated activities required to achieve a business goal. These systems help users, employees and managers to evaluate and control the status of different interrelated tasks. They are based on rules that control the flow of tasks. The primary objective of workflow systems is to track and route tasks or documents from one process to another. For example, in a typical university, a student falling short of attendance is required to obtain permission before appearing in the examination.
Suppose the rules state that if a student's attendance falls short by up to ten percent, permission from the head is required; if it falls short by up to twenty percent, permission from the principal is required; and if it falls short by twenty-five percent or more, permission from the dean is required. If all officers of the university and the students are connected via a network, a student may download the application form and submit it electronically. The various steps, i.e. the routing of the application from one desk to another, will be monitored, and the permission from the concerned person will be transmitted to the student for the examination cell. A few workflow system tools exist, of which Lotus Notes, MS Exchange and Novell GroupWise are popular. The major advantages of workflow systems include reducing the time spent on retyping and on filling in forms and reports, and reducing the amount of work needed to reconcile several reports.
1.4.5 Decision Support System
As discussed, MIS helps meet the organization's requirement to automate business processes and produces the required information for employees or managers. MIS helps the organization perform its tasks correctly but lacks decision-making capabilities. A Decision Support System supports management in solving business problems that often cannot be solved by a management information system. For example, management often needs to decide which products of the company should be continued and which should be discontinued, or to identify the areas, locations and conditions in which a particular product has better sales prospects. These decisions are based on underlying facts and on feedback obtained by the company and its representatives. For such decisions, MIS, which merely processes data and provides information, is not sufficient. The information must be prepared in specific formats, and certain organization-specific methods need to be deployed to take the appropriate decision.
Some time after introducing MIS, organizations started to feel that MIS was not able to meet the decision-making requirements of management, as management remained dependent on the MIS merely to obtain information for decision making. A Decision Support System is a collection of software and hardware that supports decision making in a specific environment or for a specific problem. The main objective of a decision support system is to suggest the right options. In most cases, Decision Support Systems are used to solve complex problems where the information needed to make effective decisions is difficult to obtain. Decision Support Systems are often designed as per the managers' requirements and play a vital role in making managerial judgements. They are designed around the business policies and methods for decision making, with a supporting database to provide information.
1.4.6 Expert Systems
Expert Systems are used to solve the problems of individuals by providing expert decision making. These systems use Artificial Intelligence to solve problems that require significant human expertise. At their core, Expert Systems are computer-based systems that emulate the decision-making capability of a human expert. Emulation means that the computer system acts as an expert. General-purpose MIS are used to gather information from the database, and decision support systems help in the decision-making process; the expert system goes beyond the scope of MIS and DSS by providing expert guidance that draws on the specialized knowledge required for decision making. These systems incorporate knowledge that is not available to most people. The terms Expert System and knowledge based system are often used interchangeably. One of the classical expert systems, MYCIN, was developed to provide expert guidance for medical diagnosis. In contrast to expert systems, several knowledge based systems have also been developed to provide knowledge, as intelligent agents, to human experts. Most expert systems are designed around a knowledge base and an inference engine.
The user enters information, and the expert system responds by invoking the inference engine, which draws conclusions on the basis of the information stored in the knowledge base. One limitation of expert systems is that the knowledge and the techniques used by the inference engine limit their performance. If the knowledge base has no knowledge or information about one of the facets of a problem, the system may not be able to provide expert guidance.
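The split between a knowledge base and an inference engine can be illustrated with a toy sketch in Python. The rules below (about a computer that does not start) are invented for illustration, and the simple forward-chaining loop is only one possible inference technique; it is not a description of how MYCIN or any particular expert system works.

```python
# Hypothetical knowledge base: each rule pairs a set of required facts
# with the conclusion that follows when all of them are known.
KNOWLEDGE_BASE = [
    ({"no_display", "power_light_on"}, "check_monitor_cable"),
    ({"no_power_light"}, "check_power_supply"),
    ({"check_power_supply", "supply_ok"}, "suspect_motherboard"),
]

def infer(facts, rules):
    """Inference engine: apply rules repeatedly until nothing new is concluded."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in rules:
            # A rule fires when all its conditions are already known.
            if conditions <= known and conclusion not in known:
                known.add(conclusion)
                changed = True
    # Return only the new conclusions, not the facts the user entered.
    return known - set(facts)

conclusions = infer({"no_power_light", "supply_ok"}, KNOWLEDGE_BASE)
print(sorted(conclusions))

# Facts the knowledge base knows nothing about produce no guidance at all.
unknown = infer({"fan_noise"}, KNOWLEDGE_BASE)
print(unknown)
```

The second call shows the limitation mentioned above: when the knowledge base has no rule covering the facts entered, the engine has nothing to conclude.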
1.5
IMPORTANT DATA TYPES
The most popular way of representing information is the textual form, in which a combination of letters, numerals and some special characters is used. However, today there are several other ways in which data can be represented. The important forms are Text, Image, Graphics and Animation, Audio and Video.
1.5.1 Text
Text is a collection of letters of the alphabet (both lower and upper case), numerals (0-9) and special characters (*, ?, :, # etc.). Data presented in textual form may be written and read. The information content of a text can be determined only after reading and interpreting it. An arbitrary collection of these characters does not constitute information; the characters must be organized according to some order or plan, and only then can they have informative value.
1.5.2 Image
Images are another data type. Images refer to data in the form of pictures, photographs, hand drawings etc. Suppose we have to create a database of the employees of an organization in order to produce identity cards bearing the employees' photographs. To generate an identity card, several attributes of the employee must be stored, such as Employee Id, Employee Name, Date of Birth, Address, Telephone Number etc. All this information may be stored in textual form and may be printed on the
identity card. A good and effective database of employees should also store the photograph of each employee, since no collection of textual attributes can reproduce the photograph. While generating an identity card, the photograph of the employee is printed together with the textual attributes. Different software is required to generate images like photographs. Information may thus be represented in the form of images. These images may be processed, and several software programs have been developed for the purpose. Editing an image includes changing the size of an object in the image, changing the background, modifying the colors, shading, zooming in on an object etc. All of these operations change the image or photograph, and thus change or modify the information contained in it.
1.5.3 Graphics and Animation
Graphics and animation are another way of presenting information. For example, if you have to present information about an organization systematically, you can combine text, images and sound pertaining to that organization to prepare a good presentation. There are various programs for preparing this type of presentation, for example Microsoft PowerPoint. PowerPoint comes with music, sounds and videos that you can play during your slide shows. You can also insert music, sound or video clips wherever you want them on a slide, and add different animation effects to make the presentation more effective. The following are popular graphics file extensions used by Microsoft:
Enhanced Metafile (.emf)
Joint Photographic Experts Group (.jpg)
Portable Network Graphics (.png)
Windows Bitmap (.bmp, .rle, .dib)
1.5.4 Audio
Audio is data in the form of sounds. Different types of sound convey important information.
For example, the sounds of the heart obtained through medical devices, or the speech or voice of a person, provide important diagnostic information to doctors. The meaning or value of the information contained in audio can be interpreted by listening. Audio may be stored in a database in the form of files. Audio data may be processed by the computer, for example by mixing sounds or modifying sound parameters like frequency, pitch, amplitude, bass etc.
1.5.5 Video
Video is another important data format for holding information. It combines sound with a stack of images displayed over a period of time, and stores the sound and the images so that they play in synchrony. The images are called frames. Successive frames are shown one after another so that the objects appear to move as in real life. Among these data types, a video clip takes the most storage space. Video can also be processed in a way similar to sound and images. .avi and .dat are popular extensions of files holding video data.
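The remark that video takes the most storage space can be checked with a rough calculation. The sketch below estimates uncompressed sizes only; every parameter (one byte per character, 3 bytes per pixel, 44,100 two-byte stereo samples per second, 25 frames per second, a 640x480 picture) is an assumption chosen for illustration, and the video figure ignores the sound track.

```python
def text_bytes(characters, bytes_per_char=1):
    """Size of plain text, assuming a one-byte character encoding."""
    return characters * bytes_per_char

def image_bytes(width, height, bytes_per_pixel=3):
    """Size of one uncompressed colour image."""
    return width * height * bytes_per_pixel

def audio_bytes(seconds, sample_rate=44_100, bytes_per_sample=2, channels=2):
    """Size of uncompressed sampled sound."""
    return seconds * sample_rate * bytes_per_sample * channels

def video_bytes(seconds, width, height, fps=25, bytes_per_pixel=3):
    """Size of the frames of an uncompressed video (sound track not included)."""
    return seconds * fps * image_bytes(width, height, bytes_per_pixel)

sizes = {
    "text (2,000 characters)": text_bytes(2_000),
    "image (640x480)": image_bytes(640, 480),
    "audio (10 s)": audio_bytes(10),
    "video (10 s, 640x480)": video_bytes(10, 640, 480),
}
for name, size in sizes.items():
    print(f"{name:>24}: {size:>12,} bytes")
```

Under these assumptions ten seconds of video needs about 230 million bytes, against roughly 1.8 million for ten seconds of sound and under one million for a single image, which is why video is normally stored and transmitted in compressed form.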
1.6 VALUE OF INFORMATION

The need for information is a fundamental ingredient of any development process in society. The emergence of information triggers the development process. Modern society may be termed the Information Society, as it is characterized by increasing responsiveness towards the individual's need
for information. This society motivates the individual to engage in productive businesses that are knowledge based and knowledge generating. Information has come to be seen as a dynamic resource.
The chronological development of society may be seen in three phases: Agricultural society, Industrial society and Knowledge based society. In earlier times, society was mainly dependent on agriculture and agriculture-based activities. Different societies during those times were quite isolated. During the past 400 years, after the Industrial Revolution took place, industrial activities, business, trade and commerce grew rapidly. During this time it was realized that information about products and technologies as well as customer needs plays a vital role in any business. This trend continued until the last decade. In the 1970s, after organizations accepted the digital computer for information storage, retrieval and processing, a new dimension was added to economic growth. The Industrial society is now rapidly moving towards a knowledge based society. This society is centered around information, information processing tools and innovative ways of communicating information. In the industrial society, capital was considered the prime resource for individuals or organizations. In the knowledge based society, information is considered the prime resource. High-speed telecommunication services also play an important role in information dissemination and communication. The rapid delivery of information has become a primary activity in this society.
The value of information plays an important role in the decision-making process. It is possible to quantify the amount of information, but it is difficult to compute its absolute value. The value of the same information differs from one group of persons to another.
The value of information depends on parameters such as who uses the information, under what circumstances it is used and, most importantly, how it is used. For this purpose, information may be treated as an item or commodity used by different persons for different purposes. This may be understood from an example. A glass of water may have high value to a thirsty person in summer and a different value to a person who has just had a cup of water in winter. Similarly, a forecast from the meteorological department that there may be heavy showers next week will have a different impact or value for different persons. This information may have high value to a farmer waiting for the rains but little value to those who are not farmers. Therefore, the value of information differs from person to person and depends greatly on the person, time and environment. There are different types of value of information:
1. Normative Value
2. Realistic Value
3. Subjective Value
Suppose the management of an electronic equipment manufacturing company gets the information that a bulk order for different equipment is going to be placed with them in the coming days. Management will estimate the cost of production and margins based on the additional cost required to manufacture the required number of equipment. Based on these estimates, management will plan a revised price to quote to the purchaser. The profits may be estimated by calculating the cost of production with and without knowledge of the information. The difference between the estimate made with prior knowledge of the order and the estimate made without it is the normative value of information. Normative values are obtained by theoretical procedures of decision making and assume that the decision taken will be optimal.
Experienced managers treat the information differently. The major drawback of the normative value of information is that it is based on theoretical and standard procedures and ignores the human factor, the environment and other risk factors. An experienced manager will prefer to carry out some experiment that includes the human and other environmental factors in order to study the impact of the information. The gain in payoffs may be estimated after obtaining the information. When these payoffs are taken into consideration to estimate the profit margin, the result is the realistic value of information. Therefore, the value of information obtained after taking the behavioural aspects into consideration is known as the realistic value of information. Often it is not possible to calculate the normative or realistic value of information, and most experienced persons then make an intuitive guess for the expected profit margins. Based on this intuitive guess, management will quote the price to the purchaser. The value obtained by using the intuitive guess is known as the subjective value of information. In real life, we mostly use the subjective value of information.
1.7
QUALITY OF INFORMATION
It may be noted that data in the form of audio, video, graphics or animation requires far more memory for storage than text and numbers. Many applications require storage, retrieval and processing of data in various formats, and also require that information be communicated from one place to another over a communication channel. Bandwidth requirement has therefore become a prime area of concern, and bandwidth is costly. It is always desirable that information be presented in such a way that it enables one to take decisions. Quality of information refers to the extent to which it enables decision making. The need for information in an enterprise arises for the following reasons:
1. Identifying the opportunities before the organization and formalizing the short term or long term policy for the growth of the organization.
2. Allocating resources in an optimal way in order to attain the basic goals of the organization.
3. Adjusting to new and rapidly changing technological advancement and opening new vistas for the overall progress of the organization.
4. Maintaining the relationship with the management, suppliers, customers, government, banking institutions, etc.
5. Carrying out product surveys, product marketing, sales of products etc., which require data to be gathered from the field and then processed to generate information.
1.8
DATA COMPRESSION
Images, audio and video take a very large amount of storage, ranging from kilobytes to gigabytes. It is therefore desirable to store such information in a compressed form. Data compression may be divided into the following two categories:
1. Lossless Data Compression
2. Lossy Data Compression
Lossless data compression refers to compression where the exact input data is reproduced after decompression. In the case of lossy compression, the data may lose some of its content and the exact information will not be reproduced after decompression. There exist several techniques for lossless and lossy compression. Images, audio and video are usually compressed using lossy data compression techniques because, even after the losses, the information retrieved after decompression still has value. Most lossy compression techniques may be adjusted to different quality levels. Because they result in a certain loss of accuracy, they are more suitable to formats (images, graphics etc.) other than text. For text, where it is not acceptable to miss or lose even a single digit, lossless compression techniques are applied. All software, programs and important data are compressed using lossless data compression techniques. Suppose a file containing bank account details is compressed. After decompression, every figure must appear without any loss. If any digit is lost or missed, the processing of that data may have catastrophic results. Therefore lossless compression techniques are normally applied to text files.
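The lossless property described above can be checked with a short sketch. It uses Python's standard zlib module, which is not mentioned in the text and is chosen here only as a convenient lossless compressor; the account record is made-up sample data.

```python
import zlib

# Hypothetical sample data: a repetitive text record, as in the bank account example.
record = b"Account 1234567890, balance 10500.75\n" * 50

packed = zlib.compress(record)      # lossless compression
restored = zlib.decompress(packed)  # decompression

# Lossless means every byte comes back exactly as it went in.
print("original size  :", len(record), "bytes")
print("compressed size:", len(packed), "bytes")
print("identical after decompression:", restored == record)
```

A lossy technique would fail the final comparison by design, which is why it is reserved for images, audio and video.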
1.9
ENCODING vs COMPRESSION
There is a fine difference between encoding and compression. The objective of compression is to convert the input data into a format which requires less space for storage. Graphics, audio and video data usually take a very large amount of storage, ranging from several megabytes to gigabytes. Storage, retrieval, processing and communication of such huge data is very costly. The basic principle behind compression is to code the input data in such a way that the coded data takes less storage. Many coding techniques are used for this purpose, and the process is called encoding. Encoding is therefore a part of compression. The objective of compression is to minimize the storage requirement and produce the same input data at the decompression phase. The objective of encoding is to generate a code for the input data which, after decoding, produces the same information. Data compression is one of the applications of Information Theory. Information theory is a branch of mathematics which deals with information or data representation. Information storage, retrieval, processing and communication are also a part of Information Theory. Information theory mainly deals with computing and minimising the redundant information in a sample of data. Audio, video, graphics and animation contain a lot of redundant information which can easily be modified without adversely affecting the value of the information. Such modification is made in the values of some of the parameters. For example, if you take an original photograph and process it in such a way that some parameters like color, size of background objects etc. are slightly changed, it will still carry some information. The level of such adjustment must be controlled.
If modifying the parameters pertaining to audio, video or text saves storage space, it will generally also reduce the processing time and the time for communication, and enable faster storage and retrieval. Data compression therefore consists of taking a stream of characters and converting it into codes, such that the resulting stream of codes is smaller than the original stream. The compression is obtained by following a model of compression. The model of compression is a collection of statistical data and coding rules which determine which code to output.
1.10 ENTROPY OF INFORMATION

The prime difference between lossless and lossy data compression is that a lossless data compression algorithm compresses the data without any loss of information. Data compressed using lossless compression is recovered exactly, while a lossy data compression algorithm allows certain losses to occur. Information theory provides the basic framework for the development of lossless algorithms. For data compression, it is essential to measure the information content of the data, or the degree of disorder/randomness in the data. A quantitative measure of information serves as the basis for data compression. Claude Shannon did pioneering work in information theory and proposed the concept of self-information. Self-information is associated with every outcome of an event.
Suppose A and B are the possible outcomes of an event. With every possible outcome there is self-information associated. Let
P(A) = probability of occurrence of A
P(B) = probability of occurrence of B
and let Si(A) and Si(B) denote the self-information associated with A and B respectively. According to Shannon, Si(A) and Si(B) may be defined as
$S_i(A) = -\log_m P(A) = \log_m (1/P(A))$
$S_i(B) = -\log_m P(B) = \log_m (1/P(B))$
The base of the log function (m) defines the unit of information. For example, if m = 2 the unit is bits, and if m = 10 the unit is hartleys. Since we are almost always interested in knowing information in terms of bits, we generally set the value of m to 2. Let us analyze what is meant by self-information. Note that log (1) = 0 and that the value of $-\log_2 (x)$, where x is any number between zero and one, increases as x decreases from one towards zero. This is evident from the following table, which assumes that the base of the log is 2.

P(A) (probability of occurrence of event A)    Self-information Si(A) = -log2 (P(A))
1.0                                            0.0
.60                                            0.74
.50                                            1.0
.25                                            2.0
.20                                            2.32
.15                                            2.74
.10                                            3.32
.05                                            4.32

The table shows that with decreasing values of P(A), the self-information associated with event A increases. It clearly indicates that a high probability event contains less self-information while a low probability event carries much more self-information. Let us try to understand the meaning of this, leaving the mathematics behind. We know that the sun rises in the east. That the sun will rise in the east tomorrow is an extremely probable event (the probability is very close to 1). Since this event has a high probability of occurrence, it does not carry much information. If, one morning, the sun did not rise in the east (a very low probability event), that would carry a lot of self-information.
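The self-information values tabulated above can be reproduced with a few lines of Python. This is a minimal sketch; the function name self_information and its base parameter are illustrative choices, not part of the text.

```python
import math

def self_information(p, base=2):
    """Self-information log_base(1/p) of an outcome with probability p, 0 < p <= 1."""
    if not 0 < p <= 1:
        raise ValueError("probability must lie in (0, 1]")
    return math.log(1 / p, base)

# Probabilities taken from the table; base 2 gives the result in bits.
for p in (1.0, 0.60, 0.50, 0.25, 0.20, 0.15, 0.10, 0.05):
    print(f"P(A) = {p:.2f}   Si(A) = {self_information(p):.2f} bits")
```

Passing base=10 gives the same quantity in hartleys instead of bits.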
Entropy of information may be defined as a measure of the information content of the input sample or message. A higher entropy indicates that more information is present in the message. Entropy also indicates how far a message can be compressed: the further the entropy lies below the size of the code currently used for each symbol, the higher the potential for data compression. The concept of self-information may also be used to make inferences about two independent events taken together. Suppose A and B are independent events. The self-information Si(AB) associated with both events occurring is the sum of the self-information obtained from these events separately. Since A and B are independent events,
$P(AB) = P(A) \cdot P(B)$
and the self-information of events A and B is
$S_i(A) = -\log_2 P(A)$
$S_i(B) = -\log_2 P(B)$
The self-information associated with the occurrence of both A and B, Si(AB), may then be written as
$S_i(AB) = -\log_2 P(AB) = -(\log_2 P(A) + \log_2 P(B)) = S_i(A) + S_i(B)$
1.10.1 Entropy Function
The term entropy in Information Theory has been borrowed from thermodynamics. Shannon used this term in Information Theory to describe the degree of randomness or disorder in the data, and proposed the following entropy function. Suppose there are N possible outcomes of an event and Pi denotes the probability of the ith outcome. The entropy may then be computed as,
$\text{Entropy} = -\sum_{i=1}^{N} P_i \log_2 (P_i)$ ...(1)
Let us understand the concept with the following example.
Example Suppose we have to examine the outcome of tossing a coin. There are two possible outcomes, Head and Tail. We will compute the self-information and entropy under the following cases.
Case 1: The coin is fair and the probabilities of getting Head and Tail are equal.
Case 2: The coin is biased and the probabilities of getting Head and Tail are not equal.
Case 3: The coin always falls on one side, i.e. either Head or Tail.
The analysis for all cases is given below.
Case 1: Assuming that the coin is fair, the probabilities of getting Head and Tail are equal:
P (Head) = 1/2, P (Tail) = 1/2 and P (Head) + P (Tail) = 1
The self-information of both outcomes may therefore be computed as
$S_i(\text{Head}) = -\log_2 (P(\text{Head})) = 1$
$S_i(\text{Tail}) = -\log_2 (P(\text{Tail})) = 1$
The self-information associated with each outcome is therefore 1 bit. We use the unit bit because the base of the logarithm is two. Since tossing a coin has only two possible outcomes, we may compute the following function:
$E = -(P(\text{Head}) \log_2 (P(\text{Head})) + P(\text{Tail}) \log_2 (P(\text{Tail}))) = -(1/2 \cdot \log_2 (1/2) + 1/2 \cdot \log_2 (1/2)) = 1$
The term denoted by E is known as entropy. In this example the value of entropy is 1. Alternatively, the entropy function may be written as
$E = P(\text{Head}) \cdot S_i(\text{Head}) + P(\text{Tail}) \cdot S_i(\text{Tail}) = 1/2 \cdot 1 + 1/2 \cdot 1 = 1$
Case 2: Assume that the coin is not fair and is biased toward Head. The probability of getting a Head is .75 and the probability of getting a Tail is .25.
P (Head) = .75, P (Tail) = .25
The self-information of both outcomes may therefore be computed as
$S_i(\text{Head}) = -\log_2 (P(\text{Head})) = -\log_2 (.75) = .41$
$S_i(\text{Tail}) = -\log_2 (P(\text{Tail})) = -\log_2 (.25) = 2.0$
If we compute the entropy function,
$E = -(P(\text{Head}) \log_2 (P(\text{Head})) + P(\text{Tail}) \log_2 (P(\text{Tail}))) = -(.75 \cdot \log_2 (.75) + .25 \cdot \log_2 (.25)) = .807$
For Case 2, the entropy value is therefore .807. Alternatively, the entropy function may be written as
$E = P(\text{Head}) \cdot S_i(\text{Head}) + P(\text{Tail}) \cdot S_i(\text{Tail}) = .807$
Similarly, if the probabilities of getting Head and Tail are .60 and .40 respectively, the entropy function yields the value .972.
Case 3: If one of the outcomes, e.g. Head, is guaranteed, the probabilities of getting Head and Tail are
P (Head) = 1, P (Tail) = 0
Using the method given above, and taking the term 0 * log2 (0) as zero, the entropy function yields
$E = -(1 \cdot \log_2 (1.0) + 0 \cdot \log_2 (0)) = 0$
The results obtained in Case 1, Case 2 and Case 3 are presented in the following table.
Case                                      Probability                        Entropy
Case 1: Coin is fair                      P (Head) = P (Tail) = 1/2          1.0
Case 2: Coin is biased                    P (Head) = .60; P (Tail) = .40     .97
Case 2: Coin is biased                    P (Head) = .75; P (Tail) = .25     .80
Case 3: Coin always falls on Head side    P (Head) = 1; P (Tail) = 0         0
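The coin cases tabulated above can be recomputed with a small sketch of equation (1). The function name entropy and the dictionary of cases are illustrative only. The values are computed without rounding the intermediate logarithms, so the last digit can differ slightly from the hand-worked figures.

```python
import math

def entropy(probabilities):
    """Shannon entropy in bits. Outcomes with probability 0 contribute nothing."""
    return sum(p * math.log2(1 / p) for p in probabilities if p > 0)

# (P(Head), P(Tail)) for each case discussed in the text.
cases = {
    "Case 1: fair coin": (0.5, 0.5),
    "Case 2: biased .60/.40": (0.60, 0.40),
    "Case 2: biased .75/.25": (0.75, 0.25),
    "Case 3: always Head": (1.0, 0.0),
}

for name, probs in cases.items():
    print(f"{name:26s} entropy = {entropy(probs):.3f} bits")
```

Skipping zero-probability outcomes implements the usual convention that 0 * log2 (0) is treated as 0.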
Earlier we observed that a high probability event contains less self-information while a low probability event carries much more self-information. This means that a high probability event, which carries less self-information, requires fewer bits to represent. From the table shown above, it is evident that the entropy value decreases when the degree of disorder decreases. Case 1 is the fair coin. The outcome of tossing a fair coin is completely uncertain, as the probability of getting Head or Tail is 1/2, so both outcomes are equally likely to occur. The degree of disorder is therefore maximum, and in this case the entropy value is maximum. In Case 2, when the coin is biased, two sub-cases are considered in which tossing the coin is more likely to result in a Head. The degree of disorder is reduced when the probability of getting a Head is .60, and the entropy function yields the value .97, which is smaller than 1.0. When the degree of disorder is further reduced, i.e. when the probability of getting a Head is .75, the entropy value is further reduced (entropy = .80). The extreme case is Case 3, when one of the outcomes is certain, as tossing the coin will always result in a Head (P (Head) = 1). This event is certain and possesses no disorder, and the entropy function yields the value zero. From the above discussion, it is observed that under certainty (minimum degree of disorder) entropy reaches its minimum value, and under the most uncertain condition (maximum degree of disorder) entropy reaches its maximum value. We may conclude that the function of information is to reduce uncertainty, either by reducing randomness or by decreasing the number of choices. These observations made by Shannon were widely accepted by the scientific community and later found application in generating efficient codes for use in communication. We may use the concepts of self-information and entropy to generate an efficient binary code for the different characters appearing in a text. Here an efficient code means a code of minimum size, so that when the codes of the different characters are sent over a communication channel, the minimum number of bits is required. This improves channel efficiency and reduces channel congestion. Suppose the text or message contains N characters. Then the entropy of the whole message can be defined as the average self-information of all N characters. The self-information of a character is also known as the entropy of the character.
$\text{Entropy of Message} = \frac{1}{N} \sum_{i=1}^{N} (\text{entropy of character } i)$ ...(2)
The entropy of a character is related to the probability of occurrence of the character. It is defined as follows:
Entropy (Self-Information) of Character = $-\log_2 (\text{Probability of character})$ ...(3)
By equation (2), the entropy of the whole message is therefore the average of the entropies of its individual characters. Entropy is also used to determine how many bits of information are actually present in the message stream.
Example: Compute the self-information and entropy of the following message stream: AABACDACDBABCAB.
Total number of characters in the message (N) = 15
Each distinct character, its probability and its self-information (entropy) are shown in the following table.

Character    Probability    Self-information (= -log2 (Probability of character))
A            6/15           1.32
B            4/15           1.90
C            3/15           2.32
D            2/15           2.90
The table below lists the characters of the message in order, with their associated self-information.

Character          A     A     B     A     C     D     A     C     D     B     A     B     C     A     B
Self-information   1.32  1.32  1.90  1.32  2.32  2.90  1.32  2.32  2.90  1.90  1.32  1.90  2.32  1.32  1.90

Using equation (2), the entropy of the message may be obtained as follows:
$\text{Entropy of Message} = \frac{1}{N} \sum_{i=1}^{N} (\text{entropy of character } i)$
= 1/15 * (1.32 + 1.32 + 1.90 + 1.32 + 2.32 + 2.90 + 1.32 + 2.32 + 2.90 + 1.90 + 1.32 + 1.90 + 2.32 + 1.32 + 1.90)
≈ 1.88
The entropy of the message indicates the average number of bits required to represent a character in the message. We may also compute the entropy by the function given by equation (1).
$\text{Entropy} = -\sum_{i=1}^{N} P_i \log_2 (P_i)$
= 6/15 * 1.32 + 4/15 * 1.90 + 3/15 * 2.32 + 2/15 * 2.90
= .528 + .506 + .464 + .386
= 1.88
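Both ways of computing the message entropy, the per-character average of equation (2) and the probability-weighted sum of equation (1), can be compared in a short sketch. The variable names are illustrative. Because the logarithms are not rounded before summing, the result can differ from the hand calculation in the last digit.

```python
import math
from collections import Counter

message = "AABACDACDBABCAB"
n = len(message)
counts = Counter(message)  # occurrences of each distinct character

# Self-information of each distinct character: log2(1 / probability).
self_info = {ch: math.log2(n / count) for ch, count in counts.items()}

# Equation (2): average self-information over every character of the message.
average_form = sum(self_info[ch] for ch in message) / n

# Equation (1): probability-weighted sum over the distinct characters.
weighted_form = sum((count / n) * self_info[ch] for ch, count in counts.items())

for ch in sorted(counts):
    print(f"{ch}: probability {counts[ch]}/{n}, self-information {self_info[ch]:.2f}")
print(f"entropy via equation (2): {average_form:.3f} bits per character")
print(f"entropy via equation (1): {weighted_form:.3f} bits per character")
```

The two forms agree because each character's self-information is counted once per occurrence in the first sum and weighted by its relative frequency in the second.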
1.10.2 Use of Entropy for Coding
As discussed before, the entropy function may be used for developing an efficient code for the purpose of communication or compression. Suppose we have to communicate a message stream containing several characters. We would like to assign a code to every distinct character, so that a binary code may be communicated in place of the character. The smaller the code, the higher the efficiency achieved in communication. While developing a code, the entropy function reveals the scope for further refinement of the coding scheme, because the entropy of the message is a lower limit on the average number of bits required to represent a character. Let us try to understand this with the following examples.
Example Consider a message stream consisting of characters A, B, C and D. Suppose the probability of occurrence of these characters is .60, .30, .08 and .02 respectively. The self-information associated with every character is shown below.

Character    Binary Code    Probability    Self-Information
A            00             .60            0.73
B            01             .30            1.73
C            10             .08            3.64
D            11             .02            5.64
If we generate shortest binary code for representation of every character without considering probability of occurrence, we will probably generate the code as shown in column 2 of above table. If we consider the probability of occurrence of every character, we may compute the self-information for every character. Entropy of message stream may be computed as follows:
$\text{Entropy} = -\sum_{i=1}^{N} P_i \log_2 (P_i)$
= .60 * 0.73 + .30 * 1.73 + .08 * 3.64 + .02 * 5.64 = 1.36
The entropy function suggests that the minimum average size of the code for representing a character is about 1.36 bits. However, if we generate the code through the simplest method (column 2), the average size of the code for each character is 2.0 bits. This difference (between 1.36 and 2.0) suggests that there is still scope for improvement. We may use some other method or scheme to develop a code whose average size is less than 2.0 and closer to 1.36. Generally, the entropy value serves as an estimate of the average code length per character. We may define the quantity of information as the average code size necessary to represent a character.
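The gap between the entropy and the two-bit code can be computed directly. The following is a minimal sketch with illustrative variable names; it uses unrounded logarithms, so the entropy it prints can differ from the hand-worked figure in the last digit.

```python
import math

# Probabilities and fixed-size codes from the example table.
probability = {"A": 0.60, "B": 0.30, "C": 0.08, "D": 0.02}
fixed_code = {"A": "00", "B": "01", "C": "10", "D": "11"}

# Lower limit on the average number of bits per character.
entropy_bits = sum(p * math.log2(1 / p) for p in probability.values())

# Average number of bits per character actually used by the fixed-size code.
average_length = sum(probability[ch] * len(fixed_code[ch]) for ch in probability)

# Redundancy: bits per character spent beyond the entropy limit.
redundancy = average_length - entropy_bits

print(f"entropy        : {entropy_bits:.2f} bits per character")
print(f"average length : {average_length:.2f} bits per character")
print(f"redundancy     : {redundancy:.2f} bits per character")
```

A coding scheme that assigns shorter codes to A and B would reduce the redundancy figure.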
Example Suppose the probability of the character * appearing in a particular text is 1/8. How many bits will be required to represent this character in compression? If the message string ***** has to be compressed, determine the number of bits saved in comparison to ASCII code.
Solution
The probability of character * = 1/8
Entropy of character = $-\log_2 (\text{Probability of character}) = -\log_2 (1/8) = 3$ ...(1)
Thus the entropy of the character * is 3, which means that the character may be represented by a 3 bit code in compressed form.
Total number of characters in the string ***** = 5
Total number of bits required to represent the message string ***** = 5 * 3 = 15 ...(2)
Each character or symbol requires an 8 bit code in ASCII.
Total number of bits required to encode the text ***** = 5 * 8 = 40 ...(3)
Total number of bits saved = 40 - 15 = 25 ...(4)
The difference between the 15 bits indicated by the entropy and the 40 bits needed to encode the message in standard ASCII code shows the potential for data compression.
1.10.3 Motivating Factors for Data Compression
Shannon's work in information theory has been widely accepted in communication and data compression. The concepts of entropy and self-information are used to develop efficient codes, which require fewer bits to represent the data. Consider the following. Suppose a message consists of four characters A, B, C and D. The message is to be sent over a communication channel, and the receiver receives it from the channel for further use. Suppose the probabilities of occurrence of the characters are Pa, Pb, Pc and Pd respectively. The following condition holds on the probabilities:
Pa + Pb + Pc + Pd = 1 ...(1)
If an equivalent binary code has to be generated for N = 4 characters, the number of bits required to code each character distinctly may be obtained as follows:
Total number of bits (M) = log2 (N) = log2 (4) = 2 ...(2)
If we assume that the probabilities of occurrence of all characters in the message are equal, the entropy function yields the following:
$\text{Entropy} = -\sum_{i=1}^{N} P_i \log_2 (P_i)$ ...(3)
= -(.25 * log2 (.25) + .25 * log2 (.25) + .25 * log2 (.25) + .25 * log2 (.25)) = 2
Therefore, a code of two bits is required to represent each of the four characters. Suppose the message consists of 100 such characters. Then the total number of bits to be transmitted will be 100 * 2 = 200 bits. According to this scheme, the possible codes for the characters are shown below.
Character    Code
A            00
B            01
C            10
D            11
The computation above with equations (2) and (3) shows that if the probability of occurrence of all characters is equal, the entropy function yields the value two. The number of bits used for actual coding is also two. Therefore, the coding scheme which generates the two bit code shown in the table above is optimum, because the entropy value is also equal to two. Using this code, a message containing 100 characters will require 200 bits. Consider another case, where Pa, Pb, Pc and Pd are not the same, i.e. the probabilities of occurrence of the characters are not equal. Suppose Pa = .70, Pb = .15, Pc = .10 and Pd = .05. In this case, we will see that the equal size codes shown in the table above are not efficient. Let us compute the entropy for this second case, where the probabilities are not equal,
$\text{Entropy} = -\sum_{i=1}^{N} P_i \log_2 (P_i)$ ...(4)
= -(.7 * log2 (.7) + .15 * log2 (.15) + .1 * log2 (.1) + .05 * log2 (.05)) = 1.31
The entropy value is therefore 1.31 for the case when the probabilities are unequal. This value suggests that the average size of the code for characters with unequal probabilities should be close to 1.31. The coding scheme shown above, which generates codes of size two, proves to be inefficient because there is scope for improvement. This is evident from the difference between the average code size (= 2) and the new entropy value (= 1.31). To generate a better code, we must generate codes of different sizes according to the probabilities. A high probability character must be assigned a smaller size code. Let us examine the codes shown in the following table without bothering how they have been generated.

Character    Code    Probability
A            1       .70
B            01      .15
C            000     .10
D            001     .05
If we use this coding scheme, the approximate number of bits to be transmitted over the communication channel for a message containing 100 characters is:
No. of bits = 70 * 1 + 15 * 2 + 10 * 3 + 5 * 3 = 145
For 100 characters, a total of 145 bits will be transmitted under the new coding scheme. It also implies that the average number of bits transmitted per character is 1.45. This value is much closer to the entropy (1.31) for the case of unequal character probabilities than the two bits of the fixed size code. The difference between the entropy value and the actual number of bits transmitted can be used as a factor when considering a new and better strategy for generating codes. Minimum difference ensures minimum redundancy. This result is considered a motivating factor to deploy better coding schemes for the communication and compression of messages.

Work on data compression started well before digital computers came into general use. In the late 1940s, coding of information was a major issue for mathematicians. Researchers started exploring the possibilities for efficient coding, redundancy and entropy in text. Basically there are two ways of assigning a code to a character or symbol. These are static coding and dynamic coding. In the static coding scheme, fixed length codes are generated to identify each symbol uniquely. The whole message or text is converted into coded form by replacing each symbol with its code. This method has the disadvantage that it does not consider the frequency or probability of occurrence of a particular symbol in a message. In fact, statistical analysis of a text or message typically reveals that a few symbols have the highest frequency, i.e. these symbols are repeated frequently in the text. If these symbols can be identified, they can be represented by smaller codes, and we can then obtain a higher degree of compression. This type of coding is known as dynamic coding using a variable length code. The static and dynamic coding schemes are explained below:
1.10.4 Static Coding (Fixed Size Code)
In static coding, fixed size codes are allocated to each symbol. Each symbol can be uniquely identified by its corresponding code. It is also possible to compute the minimum number of bits required to represent a symbol. Suppose there are M symbols which are used to constitute a message or text:
Let N = minimum number of digits required to represent M distinct symbols.
Let I = base of number system.
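$N = \lceil \log_I (M) \rceil$ ...(1)
In a digital computer system, we represent data in binary form. Thus the minimum number of bits required to uniquely represent a symbol will be
$N = \lceil \log_2 (M) \rceil$ ...(2)
Here $\lceil x \rceil$ denotes the smallest whole number not less than x, since a code cannot contain a fraction of a digit.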
Example Suppose a message is composed of five symbols a, b, c, d, e. Compute the following:
1. Find the minimum number of bits required to represent/code each symbol uniquely.
2. Generate the code for all symbols.
3. Find the coded form of message string bddac.
Solution
Total number of distinct symbols M = 5.
Total number of bits required (as per Eq. 2): $N = \lceil \log_2 (M) \rceil = \lceil \log_2 (5) \rceil = 3$
Thus the minimum number of bits required to represent each symbol uniquely is 3.
The code may be generated in the following way. Using 3 digits, the following unique codes may be generated.

Static Code    Symbol
000            a
001            b
010            c
011            d
100            e
101            (not assigned)
110            (not assigned)
111            (not assigned)
Total number of unique codes generated = $2^3$ = 8. We may assign any five of these codes to the symbols, for instance as in the table above. Using this scheme, the coded string of bddac is as follows: 001 011 011 000 010. Thus the string bddac requires 5 * 3 = 15 digits to code.
Example Consider the above example and show how many bits will be saved by using static coding over ASCII code for the string bddac.
Solution
Using static code, the total number of bits to represent the string bddac = 5 * 3 = 15
Using ASCII code, the total number of bits to represent the string bddac = 5 * 8 = 40
So the total bit saving = 40 - 15 = 25. Thus during compression, a text of size 40 bits may be compressed to 15 bits.
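The static coding example can be traced step by step in code. This is a sketch under the assumption that codes are assigned to the symbols in alphabetical order, which matches the coded string given in the text; the variable names are illustrative.

```python
import math

symbols = ["a", "b", "c", "d", "e"]

# Minimum number of bits per symbol: log2(M) rounded up to a whole number.
bits_per_symbol = math.ceil(math.log2(len(symbols)))

# Assign fixed size binary codes in order: a -> 000, b -> 001, and so on.
static_code = {
    symbol: format(index, f"0{bits_per_symbol}b")
    for index, symbol in enumerate(symbols)
}

text = "bddac"
encoded = " ".join(static_code[ch] for ch in text)

static_bits = len(text) * bits_per_symbol
ascii_bits = len(text) * 8  # 8 bits per character, as assumed in the text

print("bits per symbol:", bits_per_symbol)
print("coded string   :", encoded)
print("bits saved     :", ascii_bits - static_bits)
```

Any other assignment of five distinct 3 bit codes would give the same bit count, only a different coded string.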
1.10.5 Dynamic Coding (Variable Size Codes)
We have seen that a fixed size code may be generated to uniquely represent each symbol of the input text. If we code according to this scheme, we obtain a compressed form of the input text. If the coded text is communicated over a channel, the input text may be recovered at the receiving end by the decoding process. In this process, we reduce the size of the input text which has to be communicated. The same process can be applied if we have to store the text, and we will be able to save a considerable amount of disk space. Further compression may be obtained if dynamic coding is done using a variable size code. This method is based on the principle of identifying the symbols which appear frequently. Suppose a symbol a appears in the text most frequently. This property may be exploited by assigning the minimum number of digits to represent a: since a appears most frequently, we may assign it a one bit code to save space. The symbols which appear in the text less frequently are assigned longer codes. Any statistical model may be used to calculate the average frequency of occurrence of symbols. Consider the following example:
Example Suppose a text or message is composed of four symbols. These symbols are a, b, c and d. The frequency distribution of occurrence of each symbol (per 100 symbols) is as under:

Symbol    Frequency
a         15
b         10
c         70
d         5
Suppose a text containing 1000 symbols has to be compressed. Compute the following:
1. Total number of bits required to represent the whole text using ASCII codes.
2. Total number of bits required to represent the whole text using fixed size code/static code. Generate the static code for all symbols.
3. Total number of bits required to represent the whole text using dynamic coding/variable length codes. Consider the frequency distribution. Generate the dynamic code for all symbols.
Solution
1. Number of bits used in ASCII code = 8.
   Total number of symbols in text = 1000.
   Total number of bits to represent the whole text = 8 * 1000 = 8000 bits.
2. Total number of distinct symbols, M = 4.
   Number of bits required to represent each symbol:
   $N = \lceil \log_2 M \rceil = \lceil \log_2 4 \rceil = 2$
   Thus a 2-bit code will be required to represent each symbol. The code may be allocated as below:
   Symbol   a    b    c    d
   Code     00   01   10   11
   Total number of bits required to represent the text in this scheme = 1000 * N = 1000 * 2 = 2000 bits.
3. In decreasing order of frequency, the symbols may be arranged as follows:
   Symbol        c    a    b     d
   Frequency     70   15   10    5
   No. of Bits   1    2    3     4
   Code          1    01   001   0001
Since c is the most frequent symbol, it may be given a one-bit code. Thereafter symbols a, b and d may be allocated 2, 3 and 4 bit codes respectively, as given in the above table. Using this scheme, we may compute the total number of bits required.
Total bits required to represent a text containing 1000 symbols
= 1 * (total occurrences of c) + 2 * (total occurrences of a) + 3 * (total occurrences of b) + 4 * (total occurrences of d)
= 1 * 700 + 2 * 150 + 3 * 100 + 4 * 50
= 700 + 300 + 300 + 200
= 1500 bits.
Thus, using the variable length code, the coded text requires 1500 bits. It may be noted that this scheme of compression is suitable only if there is a large variation in the frequency of occurrence of the symbols.
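The three totals in this example can be checked with a short Python sketch. The frequencies and code words are those of the example; the helper names `encode` and `decode` are illustrative only, and the decoder relies on the fact that no code word in this table is a prefix of another.

```python
import math

# Frequencies in percent and the text length, as in the worked example.
freq = {"a": 15, "b": 10, "c": 70, "d": 5}
total_symbols = 1000

# 1. ASCII uses 8 bits per symbol.
ascii_bits = 8 * total_symbols

# 2. A fixed size code needs ceil(log2(M)) bits for M distinct symbols.
fixed_len = math.ceil(math.log2(len(freq)))
fixed_bits = fixed_len * total_symbols

# 3. Variable size code from the example: shorter codes for frequent symbols.
var_code = {"c": "1", "a": "01", "b": "001", "d": "0001"}
var_bits = sum(len(var_code[s]) * freq[s] * total_symbols // 100 for s in freq)


def encode(text, code):
    """Replace every symbol by its code word (hypothetical helper)."""
    return "".join(code[ch] for ch in text)


def decode(bits, code):
    """Read bits left to right and emit a symbol whenever a code word matches."""
    inverse = {word: symbol for symbol, word in code.items()}
    symbols, buffer = [], ""
    for bit in bits:
        buffer += bit
        if buffer in inverse:
            symbols.append(inverse[buffer])
            buffer = ""
    return "".join(symbols)


print(ascii_bits, fixed_bits, var_bits)
print(decode(encode("cabcd", var_code), var_code))
```

The round trip through `encode` and `decode` shows why the receiver can recover the text without separators between code words.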
Example Suppose a text is composed of four symbols. These symbols are a, b, c and d. Frequency distribution of occurrence of each symbol is as under:
Symbol      a    b    c    d
Frequency   15   10   70   5
Calculate the entropy and show the average number of bits required to represent a symbol.
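Solution Let $P_i$ = probability of occurrence of the $i$th symbol, and let $N$ = number of distinct symbols.
Entropy of symbol $i$:
$E_i = \log_2 (1/P_i)$ ... (1)
Entropy of message:
$E_m = \sum_{i=1}^{N} P_i \log_2 (1/P_i)$ ... (2)
$E_m = 0.15 \log_2 (1/0.15) + 0.1 \log_2 (1/0.1) + 0.7 \log_2 (1/0.7) + 0.05 \log_2 (1/0.05)$
$= 0.41 + 0.33 + 0.36 + 0.22 = 1.32$
Hence about 1.32 bits are required, on average, to represent a symbol.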
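As a cross-check of the entropy calculation, the following is a minimal Python sketch of equation (2). The function name `entropy` is an illustrative choice; the probabilities are those of the example.

```python
import math


def entropy(probabilities):
    """Sum of P_i * log2(1 / P_i) over all symbols with non-zero probability."""
    return sum(p * math.log2(1 / p) for p in probabilities if p > 0)


# Probabilities from the example (frequencies 15, 10, 70 and 5 percent).
probs = {"a": 0.15, "b": 0.10, "c": 0.70, "d": 0.05}
message_entropy = entropy(probs.values())

print(f"Entropy: {message_entropy:.3f} bits per symbol")
```

For comparison, the variable size code of section 1.10.5 spends 1500 bits on 1000 symbols, that is 1.5 bits per symbol, which is above the entropy of the message.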
1.11 NUMBER SYSTEM
We use the decimal number system in our day-to-day work. This system uses the digits 0, 1, 2, 3, 4, 5, 6, 7, 8 and 9. It is called decimal because it uses a total of ten digits, and any number is represented as a string of these ten digits. A computer, however, does not work with decimal digits directly. Instead, it works on binary digits. A binary system has only two digits, 0 and 1. This is because the computer uses integrated circuits with thousands of transistors which process the work submitted by the outside world in terms of electronic pulses.
1.11.1 Decimal Number System
The decimal number system uses ten digits (0, 1, 2, 3, 4, 5, 6, 7, 8, 9). It is thus said to have a base of ten. Using the various digits in different positions we can express any number. Since the base of the decimal number system is 10, the number 4563 is written as $(4563)_{10}$. Each digit carries a specific weight according to the position at which it is used. For example, the decimal number 4563 may be represented as
4563 = 4 * 10^3 + 5 * 10^2 + 6 * 10^1 + 3 * 10^0
1.11.2 Binary Number System
This system uses only two digits, 0 and 1, and thus it is known as the binary number system. Any number represented in the binary number system is a string of 0s and 1s. Hence this system has a base of 2. The abbreviation of binary digit is bit. A string of 8 bits is known as a byte. A byte is the basic unit of storage in the computer. In most computers, the data processed is a string of 8 bits or some multiple of 8 bits. As in the decimal system, the binary number system is position weighted. For example, the binary number 1001 may be represented as
1001 = 1 * 2^3 + 0 * 2^2 + 0 * 2^1 + 1 * 2^0
1.11.3 Octal Number System
This system uses eight digits (0, 1, 2, 3, 4, 5, 6, 7). Since the octal number system uses a total of eight digits to compose a number, it is said to have a base of eight. Using the different digits in different positions, we can express any number. Since the base of the octal number system is 8, the number 4563 is written as $(4563)_{8}$. Each digit carries a specific weight according to the position at which it is used. For example, the octal number 4563 may be represented as
4563 = 4 * 8^3 + 5 * 8^2 + 6 * 8^1 + 3 * 8^0
1.11.4 Hexadecimal Number System
This system uses sixteen digits (0, 1, 2, 3, 4, 5, 6, 7, 8, 9, A, B, C, D, E, F) to express a number. Thus it is said to have a base of sixteen. Using the different digits in different positions, we can express any number. Since the base of the hexadecimal number system is 16, the number 4563 is written as $(4563)_{16}$. Each digit carries a specific weight according to the position at which it is used.
For example, the hexadecimal number 45AB may be represented as
45AB = 4 * 16^3 + 5 * 16^2 + 10 * 16^1 + 11 * 16^0
The following table presents the four-bit binary equivalents of the hexadecimal digits:
Hexadecimal Digit   Four Digit Binary Equivalent
0                   0000
1                   0001
2                   0010
3                   0011
4                   0100
5                   0101
6                   0110
7                   0111
8                   1000
9                   1001
A                   1010
B                   1011
C                   1100
D                   1101
E                   1110
F                   1111
1.11.5 Binary to Decimal Conversion
A binary number may be converted to a decimal number using the following process.
Example Convert 11001 to the decimal number system.
Each position has a weight. The first bit (Least Significant Bit, LSB) has a weight of 2^0, and the nth position from the LSB has a weight of 2^(n-1). Thus
11001 = 1 * 2^4 + 1 * 2^3 + 0 * 2^2 + 0 * 2^1 + 1 * 2^0
      = 1 * 16 + 1 * 8 + 0 * 4 + 0 * 2 + 1 * 1
      = 16 + 8 + 0 + 0 + 1
      = 25
Hence the equivalent decimal number is 25.
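The weighted sum used in this example can be written as a short Python sketch. The function name `binary_to_decimal` is illustrative; positions are counted from 0 in the code, so position 0 corresponds to the LSB with weight 2^0.

```python
def binary_to_decimal(bits):
    """Sum bit * 2**position, counting positions from the rightmost bit."""
    total = 0
    for position, bit in enumerate(reversed(bits)):
        if bit not in ("0", "1"):
            raise ValueError(f"not a binary digit: {bit!r}")
        total += int(bit) * 2 ** position
    return total


print(binary_to_decimal("11001"))
```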
1.11.6 Decimal to Binary Conversion
The integer part and the fractional part of a decimal number are converted separately.
To convert the integer part, successive division by 2 is used, recording the remainder at each step (the remainder will always be 0 or 1). The division is stopped when we get a quotient of 0 with a remainder of 1. The remainders, when read upward, give the equivalent binary number.
To convert a fraction, successive multiplication by 2 is used. After each multiplication, the integer part is noted and the remaining fraction is again multiplied by 2 till the fraction becomes zero. Sometimes the fraction does not become zero even after many stages. In such a case, an approximation is made and the result is taken up to a certain number of bits after the binary point.
For a number having both an integer part and a fraction, the two procedures are applied to the respective parts. Binary fractions are added and subtracted in the same way as decimal numbers.
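The division and multiplication procedures described above can be sketched in Python as follows. The function names and the `max_bits` cut-off for fractions that never terminate are assumptions made for illustration.

```python
def integer_to_binary(n):
    """Successive division by 2; the remainders read upward give the result."""
    if n < 0:
        raise ValueError("only non-negative integers are handled")
    if n == 0:
        return "0"
    remainders = []
    while n > 0:
        n, remainder = divmod(n, 2)
        remainders.append(str(remainder))
    return "".join(reversed(remainders))


def fraction_to_binary(fraction, max_bits=8):
    """Successive multiplication by 2; the integer parts read downward."""
    bits = []
    while fraction > 0 and len(bits) < max_bits:
        fraction *= 2
        bit = int(fraction)      # integer part is the next bit
        bits.append(str(bit))
        fraction -= bit          # keep only the remaining fraction
    return "".join(bits)


print(integer_to_binary(25))
print("0." + fraction_to_binary(0.625))
```

A fraction such as 0.1 has no finite binary form, so `fraction_to_binary(0.1, max_bits=4)` stops after four bits and returns an approximation.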
Example Convert decimal number 25 to a binary number.
Division   Quotient   Remainder
25 / 2     12         1
12 / 2     6          0
6 / 2      3          0
3 / 2      1          1
1 / 2      0          1
The procedure begins with successive division by 2. Keep noting the remainder of each division until the quotient becomes 0. The string of remainders obtained from the successive divisions, read upward from the last remainder to the first, constitutes the equivalent binary number. The binary equivalent of decimal number 25 is 11001.
1.11.7 Hexadecimal to Binary Conversion
The hexadecimal number system is very convenient and extensively used because hexadecimal numbers are very short as compared to binary numbers. Hexadecimal means 16. Thus the hexadecimal system has a base of 16. It uses 16 digits to represent all numbers. The digits are 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, A, B, C, D, E and F.
Each hexadecimal digit is converted to binary by substituting its 4-bit equivalent from the conversion table.
Example
Convert 45A4 to its binary equivalent.
Hexadecimal digit           4      5      A      4
Each digit's equivalent     0100   0101   1010   0100
Equivalent binary number: 0100010110100100
1.11.8 Binary to Hexadecimal Conversion
Starting from the right, divide the binary number into groups of 4 bits and convert each group into its equivalent hexadecimal digit.
Example Convert 0001010001001101 to the equivalent hexadecimal number.
Binary number        0001   0100   0100   1101
Hexadecimal digit    1      4      4      D
Equivalent hexadecimal number: 144D
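Both table-driven conversions can be sketched in Python with a single lookup table. The names `HEX_TO_BIN`, `hex_to_binary` and `binary_to_hex` are illustrative, and padding the binary string with leading zeros to a multiple of four bits is an assumption for inputs whose length is not already a multiple of four.

```python
# Build the 16-row conversion table: "0" -> "0000", ..., "F" -> "1111".
HEX_TO_BIN = {format(d, "X"): format(d, "04b") for d in range(16)}
BIN_TO_HEX = {bits: digit for digit, bits in HEX_TO_BIN.items()}


def hex_to_binary(hex_string):
    """Substitute every hexadecimal digit by its 4-bit equivalent."""
    return "".join(HEX_TO_BIN[ch] for ch in hex_string.upper())


def binary_to_hex(bits):
    """Group the bits in fours from the right and look each group up."""
    width = -(-len(bits) // 4) * 4          # round length up to a multiple of 4
    padded = bits.zfill(width)
    groups = [padded[i:i + 4] for i in range(0, width, 4)]
    return "".join(BIN_TO_HEX[group] for group in groups)


print(hex_to_binary("45AB"))
print(binary_to_hex("0001010001001101"))
```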
1.11.9 Addition of Binary Number
The rules for addition of binary numbers are as follows:
0 + 0 = 0
0 + 1 = 1
1 + 0 = 1
1 + 1 = 10
It may be noted that 1 + 1 is represented as 10 i.e., the sum is 0 and carry is 1.
Example Add two binary numbers 111010 and 1001.
Carry             1 1 1 0 0 0
First number        1 1 1 0 1 0
Second number     +     1 0 0 1
Sum               1 0 0 0 0 1 1
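The column-by-column addition with carry can be sketched in Python as below. The function name `add_binary` is illustrative; the loop starts from the least significant bit and passes the carry to the next column.

```python
def add_binary(x, y):
    """Add two binary strings bit by bit, propagating the carry leftward."""
    width = max(len(x), len(y))
    x, y = x.zfill(width), y.zfill(width)   # align the two numbers
    carry = 0
    result = []
    for bit_x, bit_y in zip(reversed(x), reversed(y)):
        total = int(bit_x) + int(bit_y) + carry
        result.append(str(total % 2))       # sum bit of this column
        carry = total // 2                  # carry into the next column
    if carry:
        result.append("1")
    return "".join(reversed(result))


print(add_binary("111010", "1001"))
```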
1.11.10 Binary Subtraction
The rules for subtraction of binary numbers are as follows:
0 - 0 = 0
1 - 0 = 1
1 - 1 = 0
10 - 1 = 1
In both the operations of addition and subtraction, we start with the least significant bit (LSB) i.e. start with the bit on the extreme right side and proceed to the left.
Example Subtract 10001 from 110001.
First number      1 1 0 0 0 1
Second number   -   1 0 0 0 1
Difference        1 0 0 0 0 0
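A borrow-based subtraction following the rules above might look like this in Python. It is a sketch for unsigned numbers only, under the assumption that the first number is not smaller than the second; the function name `subtract_binary` is illustrative.

```python
def subtract_binary(x, y):
    """Compute x - y for binary strings with x >= y, using borrows."""
    if int(x, 2) < int(y, 2):
        raise ValueError("this sketch handles only x >= y")
    width = max(len(x), len(y))
    x, y = x.zfill(width), y.zfill(width)
    borrow = 0
    result = []
    for bit_x, bit_y in zip(reversed(x), reversed(y)):
        diff = int(bit_x) - int(bit_y) - borrow
        if diff < 0:
            diff += 2        # borrow from the next column: 10 - 1 = 1
            borrow = 1
        else:
            borrow = 0
        result.append(str(diff))
    return "".join(reversed(result)).lstrip("0") or "0"


print(subtract_binary("110001", "10001"))
```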
1.11.11 Multiplication of Binary Numbers
The four basic rules for multiplication of binary numbers are as follows:
0 * 0 = 0
0 * 1 = 0
1 * 0 = 0
1 * 1 = 1
The method of binary multiplication is similar to that of decimal multiplication. The method involves forming partial products, shifting successive partial products left by one place and adding all the partial products.
Example Multiply 10001 by 101.
          1 0 0 0 1
        *     1 0 1
        -----------
          1 0 0 0 1
        0 0 0 0 0
      1 0 0 0 1
      -------------
      1 0 1 0 1 0 1
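The partial-product method can be sketched in Python as follows. The function name `multiply_binary` is illustrative, and for brevity the partial products are summed with Python integers rather than with a bit-by-bit adder.

```python
def multiply_binary(multiplicand, multiplier):
    """Form one shifted partial product per multiplier bit and add them up."""
    partial_products = []
    product = 0
    for shift, bit in enumerate(reversed(multiplier)):
        # A 1 bit copies the multiplicand shifted left; a 0 bit contributes 0.
        partial = multiplicand + "0" * shift if bit == "1" else "0"
        partial_products.append(partial)
        product += int(partial, 2)
    return format(product, "b"), partial_products


result, partials = multiply_binary("10001", "101")
print(partials)
print(result)
```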
1.11.12 Signed Binary Number
In the binary number system, the digit 0 is used for the positive (+ve) sign and the digit 1 for the negative (-ve) sign, placed as the most significant bit. The most significant bit is thus the sign bit, followed by the magnitude bits. Numbers expressed in this form are known as signed binary numbers. The number may be written in 4 bits, 8 bits, 16 bits etc. In every case the leading bit represents the sign and the remaining bits represent the magnitude.
1's Complement
The 1's complement of a binary number is obtained by complementing each bit.
Example Obtain the 1's complement of 100001.
Number            1 0 0 0 0 1
1's complement    0 1 1 1 1 0
2's Complement
The signed-magnitude representation requires too much electronic circuitry for addition and subtraction. Therefore, positive numbers are expressed in signed-magnitude form but negative numbers are expressed in 2's complement form. The 2's complement of a number is obtained by adding the binary digit 1 to the 1's complement of the number.
Example Obtain the 2's complement of 100001.
Number            1 0 0 0 0 1
1's complement    0 1 1 1 1 0
Add 1           +           1
2's complement    0 1 1 1 1 1
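The two complements can be sketched in Python as below. The function names are illustrative, and the result is kept to the same number of bits as the input, which is an assumption about the word width.

```python
def ones_complement(bits):
    """Invert every bit."""
    return "".join("1" if bit == "0" else "0" for bit in bits)


def twos_complement(bits):
    """Add 1 to the 1's complement, keeping the original width."""
    width = len(bits)
    value = (int(ones_complement(bits), 2) + 1) % (2 ** width)
    return format(value, f"0{width}b")


number = "100001"
print(ones_complement(number))
print(twos_complement(number))
```

Adding a number to its 2's complement gives zero once the carry out of the top bit is discarded, which is why the 2's complement can stand in for the negative of a number.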
2's Complement Addition and Subtraction
The use of 2's complement representation has simplified the computer hardware for arithmetic operations. When A and B are added, the bits are not inverted and so we get
S = A + B
When B is to be subtracted from A, the computer hardware forms the 2's complement of B, written B', and then adds it to A. Thus
S = A + B' = A + (-B) = A - B
Conversion of Hexadecimal to Decimal
One method to convert a hexadecimal number into its decimal equivalent is to first convert hexadecimal to binary and then convert binary to decimal. A direct conversion of hexadecimal into decimal is also possible. Since the base of hexadecimal is 16, the weights of the different digits are 16^0, 16^1, 16^2, etc., starting with the digit on the extreme right. The decimal equivalent of a hexadecimal number equals the sum of all digits multiplied by their weights.
Decimal to Hexadecimal Conversion
One method is to convert the decimal number to binary and then convert binary to hexadecimal. The direct method is successive division by 16, writing the hexadecimal equivalent of each remainder.
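The two direct conversions between hexadecimal and decimal can be sketched in Python as follows. The function names are illustrative; the example value 45AB is the one used in section 1.11.4.

```python
HEX_DIGITS = "0123456789ABCDEF"


def hex_to_decimal(hex_string):
    """Sum digit * 16**position, counting positions from the right."""
    total = 0
    for position, ch in enumerate(reversed(hex_string.upper())):
        total += HEX_DIGITS.index(ch) * 16 ** position
    return total


def decimal_to_hex(n):
    """Successive division by 16; the remainders read upward give the result."""
    if n < 0:
        raise ValueError("only non-negative integers are handled")
    if n == 0:
        return "0"
    digits = []
    while n > 0:
        n, remainder = divmod(n, 16)
        digits.append(HEX_DIGITS[remainder])
    return "".join(reversed(digits))


print(hex_to_decimal("45AB"))
print(decimal_to_hex(17835))
```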
1.11.13 Binary Coded Decimal (BCD)
In computer technology numbers are represented in binary form, while in our day-to-day functions numbers are represented in decimal form. BCD codes are used to represent decimal numbers in binary form. A weighted binary code is one in which each bit position carries a certain weight. A string of 4 bits is known as a nibble. BCD means that each decimal digit is represented by a nibble (a binary code of 4 bits). The 8421 code is the most predominant BCD code. The designation 8421 indicates the weights of the 4 bits. When one refers to BCD code, it always means the 8421 code. Though 16 numbers (2^4) can be represented by 4 bits, only 10 of them are used. The remaining 6 are invalid in 8421 BCD code. To represent any number in BCD code, each decimal digit is replaced by the appropriate 4-bit code. BCD code is used in pocket calculators, electronic counters, digital voltmeters and digital clocks. The early versions of computers used BCD code. However, BCD code was discarded later because it is slower and more complicated than the binary system.
BCD Addition
Addition is the most important arithmetic operation. Subtraction, multiplication and division can be done by using addition. The rules of BCD addition are:
1. Add the two numbers using binary addition. If the four-bit sum is equal to or less than 9, it is a valid BCD number.
2. If the four-bit sum is more than 9, or a carry is generated from the group of four bits, the result is invalid. In such a case, add 6 (0110) to the four-bit sum and add the resulting carry to the next four-bit group.
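The rules of BCD addition can be illustrated with a small Python sketch. It assumes the usual 8421 correction of adding 6 (0110) to a digit sum that exceeds 9, which also produces the carry into the next group; the function names `bcd_encode` and `bcd_add` are illustrative.

```python
def bcd_encode(number):
    """Replace every decimal digit by its 4-bit 8421 code."""
    return " ".join(format(int(digit), "04b") for digit in str(number))


def bcd_add(x, y):
    """Add two non-negative integers one BCD digit (nibble) at a time."""
    x_digits = [int(d) for d in reversed(str(x))]
    y_digits = [int(d) for d in reversed(str(y))]
    width = max(len(x_digits), len(y_digits))
    x_digits += [0] * (width - len(x_digits))
    y_digits += [0] * (width - len(y_digits))

    carry = 0
    nibbles = []
    for dx, dy in zip(x_digits, y_digits):
        total = dx + dy + carry      # plain binary addition of the group
        if total > 9:                # invalid BCD result
            total += 6               # correction: add 0110
        carry = total >> 4           # carry into the next four-bit group
        nibbles.append(total & 0xF)  # keep the low four bits
    if carry:
        nibbles.append(carry)
    return " ".join(format(n, "04b") for n in reversed(nibbles))


print(bcd_encode(58), "+", bcd_encode(67), "=", bcd_add(58, 67))
```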
1.12 ALPHANUMERIC CODE

For proper communication, we need to represent numbers, letters and symbols. Alphanumeric code can represent all these three.
ASCII Code (American Standard Code for Information Interchange)
It is a seven-bit code used extensively for printers and terminals of usually small computer systems. Many large computer systems also accommodate this code. The characters are assigned in the ascending order of binary numbers. Sometimes an eighth bit is also added; this bit is either fixed at 0 or 1, or used as a parity bit.
EBCDIC Code
This refers to Extended Binary Coded Decimal Interchange Code. EBCDIC is used in most of the large computers for communication. It is an eight-bit code and uses BCD.
Error Detection Codes
Every digit of a digital system must be correct. An error in any digit can cause a problem because the computer may recognize it as something else. Many methods have been devised to detect such errors.
Parity
Parity refers to the number of 1s in the binary word. When the number of 1s in the binary word is odd, it is said to have odd parity. When the number of 1s in the word is even, it is said to have even parity. One method for error detection is to use 7 bits for data and the 8th bit for parity. The parity bit can be 1 or 0. At the receiving end the parity is checked, and if an error has occurred, the data is required to be transmitted again. Some computer systems use even parity; others use odd parity.
Check Sums
The parity check cannot detect two errors in the same word. One method to handle such cases is the check sum. As each word is transmitted, it is added to the sum of the previous words, and the running sum is retained at the sending end. At the end of transmission, the sum is also sent and is checked at the receiving point. The check sum method is commonly used in teleprocessing.
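The parity and check sum ideas can be sketched in Python as below. Placing the parity bit after the 7 data bits and keeping the check sum modulo 256 are assumptions made for illustration; the function names are hypothetical.

```python
def parity_bit(data_bits, even=True):
    """Return the bit that makes the total number of 1s even (or odd)."""
    bit = data_bits.count("1") % 2
    if not even:
        bit ^= 1
    return str(bit)


def add_parity(data_bits, even=True):
    """Append the parity bit as the 8th bit (assumed position)."""
    return data_bits + parity_bit(data_bits, even)


def check_parity(word, even=True):
    """True when the received word still has the agreed parity."""
    return word.count("1") % 2 == (0 if even else 1)


def checksum(words, bits=8):
    """Running sum of all words, kept to a fixed width (assumed 8 bits)."""
    return sum(words) % (2 ** bits)


sent = add_parity("1000011")               # 7 data bits plus even parity
print(sent, check_parity(sent))
print(check_parity("10000110"))            # one bit flipped: detected
print(check_parity("10000100"))            # two bits flipped: not detected

good = [0b01000001, 0b01000010, 0b01000011]
bad = [0b01000001, 0b01000001, 0b01000011]  # two bits flipped in second word
print(checksum(good), checksum(bad))
```

The last line shows the case the text describes: two flipped bits leave the parity of the word unchanged, but the check sum of the whole transmission differs.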
OBJECTIVE TYPE QUESTIONS

Multiple Choice
1. Data can be represented in a digital computer as
   (a) Text (b) Image (c) Both (a) & (b) (d) None of above
2. Which of the following are part of Information Technology?
   (a) Data Processing (b) Data Storage (c) Data Communication (d) All of above
3. Which of the following is true for Information?
   i. Information is obtained by processing raw data.
   ii. Raw data consist of numbers, text etc.
   iii. Information is meaningful.
   (a) Only (i) (b) Both (i) & (iii) (c) Only (iii) (d) All of above
4. Which of the following data files may be compressed?
   (a) Text (b) Image (c) Audio (d) All of above
5. Which of the following compression techniques loses some data contents?
   (a) Lossless (b) Lossy (c) None of above (d) All of above
6. Value of Information produced on the basis of intuitive guess is
   (a) Normative (b) Realistic (c) Subjective (d) None of above
7. Which of the following are not graphics file extensions?
   (a) .JPG (b) .BMP (c) .DOC (d) .PNG
8. Entropy term is used to
   (a) Measure Information Contents (b) Represent value of information (c) Represent Information System (d) None of above
9. Lossless Data Compression technique is normally applied to
   (a) Image (b) Video (c) Text (d) Sound
10. Which of the following may be compressed with Lossy data compression technique?
   (a) Image (b) Video (c) Sound (d) All of above
Answers
1-c, 2-d, 3-d, 4-d, 5-b, 6-c, 7-c, 8-a, 9-c, 10-d
State True or False

1. Information plays a key role in decision-making. [True] [False]
2. Computer can process the data represented in the form of text only. [True] [False]
3. In Lossy compression, some data contents are lost. [True] [False]
4. Encoding of a text ensures compression in data. [True] [False]
5. Higher entropy of message implies high potential for data compression. [True] [False]
6. Dynamic code yields better compression result than static codes. [True] [False]
7. Normative value of information ignores the human errors and other risk factors. [True] [False]
8. Base of Hexadecimal number system is 16. [True] [False]
9. Graphics and animations also contain information. [True] [False]
10. Entropy of information is a measure of information contents in a message. [True] [False]
Answers
1-True, 2-False, 3-True, 4-False, 5-True, 6-True, 7-True, 8-True, 9-True, 10-True.
SELECTED EXERCISES
1. What do you mean by the term Information Technology? Express your views.
2. How are digital computers being used for storage, retrieval and processing of information?
3. List any five areas of IT application. Explain what you mean by IT enabled services.
4. How can a video hold information? Is it possible to process video data? If yes, then explain what kind of operations may be performed on video data.
5. Explain the various data formats.
6. What do we mean by the term interactive multimedia? Explain the reason why it is becoming popular in day-to-day life.
7. What do you mean by an image? How is this format of storing data different from animation and video?
8. What are animations? How are they developed? Explain the role of the computer in developing animations.
9. What is the difference between data and information?
10. What do you mean by the term Information Representation? How is information represented in digital media or a digital computer?
11. Information is obtained by processing raw data. Is it possible to process the data in an arbitrary manner and term the results Information?
12. Why is information required in an enterprise or an organization?
13. What do you mean by Quality of Information?
14. How is an image represented in the computer system? Is it possible to compress the image and obtain a copy of the original image after decompression? If yes, then how?
15. What do you understand by the term Graphics? Explain what kind of information may be contained in graphics. How are graphics represented in a digital computer system?
16. Define Information System. Why does an organization require an Information System? How do the employees of an organization benefit from the Information System?
17. Define the steps by which you would develop an Information System for your college or university.
18. What are Computer Based Information Systems (CBIS)? How are they different from manual information systems?
19. What are the major components of an information system? Explain why a set of methods is considered the basis for Information System development.
20. How are databases related to the information system? Explain why a good database design leads to the development of an efficient Information System.
21. How many types of information system exist today? List the name of any commercial information system.
22. Define Business Transaction. What are Transaction Processing Systems? List the limitations and advantages of a transaction processing system.
23. What do you mean by On Line Transaction Processing? How is it different from Real Time Transaction Processing?
24. What are Management Information Systems (MIS)? List the advantages of MIS.
25. What are decision support systems?
26. What are Workflow Systems? Under what circumstances are workflow systems required? Explain the working of any workflow system you have seen in day-to-day life.
27. What are Expert Systems? Explain the role of the inference engine and the knowledge base in the design of an expert system. How is a knowledge base different from a database?
28. What do you mean by Normative Value of Information? How is it different from the subjective and realistic value of information?
29. What do you mean by self-information? Compute the self-information related to the event of tossing a coin.
30. Explain the relation of certainty with self-information. Explain why self-information is less for the most certain outcome of an event.
31. How are the concepts of self-information and entropy used in data communication? Explain with an example.
32. Write any four popular formats for storing graphics.
33. What are the popular formats for storing video data?
34. What do you mean by the term Data Compression?
35. Explain what Lossless Data Compression techniques are. Detail the circumstances when data should be compressed using lossless data compression techniques.
36. What do you understand by Lossy Data Compression Techniques?
37. Explain why images are compressed by Lossy Data Compression Techniques.
38. What do you mean by the term Sound? Is it possible to store and process sound using a digital computer? Can audio data be compressed? List a few popular formats for storing audio data.
39. Write the essential differences between Encoding and Compression of Data.
40. Why is coding required in the process of data compression?
41. What do you understand by Entropy of information? How do you measure the entropy of a message?
42. What is the meaning of entropy of symbols and entropy of the whole message?
43. How many applications at home are found to be computer run, or could possibly be run through a computer by making some changes in them? Give your own account.
44. Convert the following decimal numbers to Binary, Octal and Hexadecimal numbers: 1000, 22, 556, 229
45. Write down the rules to add two binary numbers. Compute the addition of the following binary numbers: (1001 + 10 + 11010), (1 + 1111), (10101 + 11100 + 10101)
46. Do the following operations on binary numbers:
    Subtract 1001 from 110001
    Multiply 1001 by 110001
47. What are signed binary numbers? How is addition performed using signed numbers?
48. What are BCD codes? Explain.
49. What are Alphanumeric codes? How are they used?
