Computer Virus: Not To Be Confused With
A computer virus is a computer program that can copy itself[1] and infect a computer. The term
"virus" is also commonly but erroneously used to refer to other types of malware, including but
not limited to adware and spyware programs that do not have the reproductive ability. A true
virus can spread from one computer to another (in some form of executable code) when its host
is taken to the target computer; for instance because a user sent it over a network or the Internet,
or carried it on a removable medium such as a floppy disk, CD, DVD, or USB drive.[2]
Viruses can increase their chances of spreading to other computers by infecting files on a
network file system or a file system that is accessed by another computer.[3][4]
As stated above, the term "computer virus" is sometimes used as a catch-all phrase to include all
types of malware, even those that do not have the reproductive ability. Malware includes
computer viruses, computer worms, Trojan horses, most rootkits, spyware, dishonest adware and
other malicious and unwanted software. Viruses are sometimes confused
with worms and Trojan horses, which are technically different. A worm can exploit security
vulnerabilities to spread itself automatically to other computers through networks, while a Trojan
horse is a program that appears harmless but hides malicious functions. Worms and Trojan
horses, like viruses, may harm a computer system's data or performance. Some viruses and other
malware have symptoms noticeable to the computer user, but many are surreptitious or simply
do nothing to call attention to themselves. Some viruses do nothing beyond reproducing
themselves.
Contents
1 History
o 1.1 Academic work
o 1.2 Science Fiction
o 1.3 Virus programs
2 Infection strategies
o 2.1 Nonresident viruses
o 2.2 Resident viruses
3 Vectors and hosts
4 Methods to avoid detection
o 4.1 Avoiding bait files and other undesirable hosts
o 4.2 Stealth
4.2.1 Self-modification
4.2.2 Encryption with a variable key
4.2.3 Polymorphic code
4.2.4 Metamorphic code
5 Vulnerability and countermeasures
o 5.1 The vulnerability of operating systems to viruses
o 5.2 The role of software development
o 5.3 Anti-virus software and other preventive measures
o 5.4 Recovery methods
5.4.1 Virus removal
5.4.2 Operating system reinstallation
6 See also
7 References
8 Further reading
9 External links
History
Academic work
The first academic work on the theory of computer viruses (although the term "computer virus"
had not yet been coined) was done by John von Neumann, who in 1949 gave lectures at the
University of Illinois on the "Theory and Organization of Complicated Automata". Von Neumann's
work was later published as the "Theory of self-reproducing automata".[5] In his essay
von Neumann postulated that a computer program could reproduce.
In 1972 Veith Risak published his article "Selbstreproduzierende Automaten mit minimaler
Informationsübertragung" (Self-reproducing automata with minimal information exchange).[6]
The article describes a fully functional virus written in assembler language for a SIEMENS
4004/35 computer system.
In 1980 Jürgen Kraus wrote his Diplom thesis "Selbstreproduktion bei Programmen"
(Self-reproduction of programs) at the University of Dortmund.[7] In his work Kraus postulated that
computer programs can behave in a way similar to biological viruses.
In 1984 Fred Cohen from the University of Southern California wrote his paper "Computer
Viruses - Theory and Experiments".[8] It was the first paper to explicitly call a self-reproducing
program a "virus", a term introduced by his mentor Leonard Adleman.
An article that describes "useful virus functionalities" was published by J. B. Gunn under the title
"Use of virus functions to provide a virtual APL interpreter under user control" in 1984.[9]
Science Fiction
The Terminal Man, a science fiction novel by Michael Crichton (1972), told (as a sideline story)
of a computer with telephone modem dialing capability, which had been programmed to
randomly dial phone numbers until it reached a modem answered by another computer. It then
attempted to program the answering computer with its own program, so that the second computer
would also begin dialing random numbers, in search of yet another computer to program. The
program was assumed to spread exponentially through susceptible computers.
The actual term 'virus' was first used in David Gerrold's 1972 novel, When HARLIE Was One. In
that novel, a sentient computer named HARLIE writes viral software to retrieve damaging
personal information from other computers to blackmail the man who wants to turn him off.
Virus programs
The Creeper virus was first detected on ARPANET, the forerunner of the Internet, in the early
1970s.[10] Creeper was an experimental self-replicating program written by Bob Thomas at BBN
Technologies in 1971.[11] Creeper used the ARPANET to infect DEC PDP-10 computers running
the TENEX operating system.[12] Creeper gained access via the ARPANET and copied itself to
the remote system where the message, "I'm the creeper, catch me if you can!" was displayed. The
Reaper program was created to delete Creeper.[13]
A program called "Elk Cloner" was the first computer virus to appear "in the wild" — that is,
outside the single computer or lab where it was created.[14] Written in 1981 by Richard Skrenta, it
attached itself to the Apple DOS 3.3 operating system and spread via floppy disk.[14][15] This
virus, created as a practical joke when Skrenta was still in high school, was injected in a game on
a floppy disk. On its 50th use the Elk Cloner virus would be activated, infecting the computer
and displaying a short poem beginning "Elk Cloner: The program with a personality."
The first PC virus in the wild was a boot sector virus dubbed (c)Brain,[16] created in 1986 by the
Farooq Alvi Brothers in Lahore, Pakistan, reportedly to deter piracy of the software they had
written.[17]
Before computer networks became widespread, most viruses spread on removable media,
particularly floppy disks. In the early days of the personal computer, many users regularly
exchanged information and programs on floppies. Some viruses spread by infecting programs
stored on these disks, while others installed themselves into the disk boot sector, ensuring that
they would be run when the user booted the computer from the disk, usually inadvertently. PCs
of the era would attempt to boot first from a floppy if one had been left in the drive. Until floppy
disks fell out of use, this was the most successful infection strategy and boot sector viruses were
the most common in the wild for many years.[1]
Traditional computer viruses emerged in the 1980s, driven by the spread of personal computers
and the resultant increase in BBS, modem use, and software sharing. Bulletin board-driven
software sharing contributed directly to the spread of Trojan horse programs, and viruses were
written to infect popularly traded software. Shareware and bootleg software were equally
common vectors for viruses on BBSs.[citation needed]
Macro viruses have become common since the mid-1990s. Most of these viruses are written in
the scripting languages for Microsoft programs such as Word and Excel and spread throughout
Microsoft Office by infecting documents and spreadsheets. Since Word and Excel were also
available for Mac OS, most could also spread to Macintosh computers. Although most of these
viruses did not have the ability to send infected e-mail, those that did took advantage of
the Microsoft Outlook COM interface.[citation needed]
Some old versions of Microsoft Word allow macros to replicate themselves with additional blank
lines. If two macro viruses simultaneously infect a document, the combination of the two, if also
self-replicating, can appear as a "mating" of the two and would likely be detected as a virus
distinct from the "parents".[18]
A virus may also send a web address link as an instant message to all the contacts on an infected
machine. If the recipient, thinking the link is from a friend (a trusted source) follows the link to
the website, the virus hosted at the site may be able to infect this new computer and continue
propagating.
Viruses that spread using cross-site scripting were first reported in 2002,[19] and were
academically demonstrated in 2005.[20] There have been multiple instances of the cross-site
scripting viruses in the wild, exploiting websites such as MySpace and Yahoo.
Infection strategies
In order to replicate itself, a virus must be permitted to execute code and write to memory. For
this reason, many viruses attach themselves to executable files that may be part of legitimate
programs. If a user attempts to launch an infected program, the virus's code may be executed
simultaneously. Viruses can be divided into two types based on their behavior when they are
executed. Nonresident viruses immediately search for other hosts that can be infected, infect
those targets, and finally transfer control to the application program they infected. Resident
viruses do not search for hosts when they are started. Instead, a resident virus loads itself into
memory on execution and transfers control to the host program. The virus stays active in the
background and infects new hosts when those files are accessed by other programs or the
operating system itself.
Nonresident viruses
Nonresident viruses can be thought of as consisting of a finder module and a replication module.
The finder module is responsible for finding new files to infect. For each new executable file the
finder module encounters, it calls the replication module to infect that file.
Resident viruses
Resident viruses contain a replication module that is similar to the one that is employed by
nonresident viruses. This module, however, is not called by a finder module. The virus loads the
replication module into memory when it is executed instead and ensures that this module is
executed each time the operating system is called to perform a certain operation. The replication
module can be called, for example, each time the operating system executes a file. In this case
the virus infects every suitable program that is executed on the computer.
Resident viruses are sometimes subdivided into a category of fast infectors and a category of
slow infectors. Fast infectors are designed to infect as many files as possible. A fast infector, for
instance, can infect every potential host file that is accessed. This poses a special problem when
using anti-virus software, since a virus scanner will access every potential host file on a
computer when it performs a system-wide scan. If the virus scanner fails to notice that such a
virus is present in memory the virus can "piggy-back" on the virus scanner and in this way infect
all files that are scanned. Fast infectors rely on their fast infection rate to spread. The
disadvantage of this method is that infecting many files may make detection more likely, because
the virus may slow down a computer or perform many suspicious actions that can be noticed by
anti-virus software. Slow infectors, on the other hand, are designed to infect hosts infrequently.
Some slow infectors, for instance, only infect files when they are copied. Slow infectors are
designed to avoid detection by limiting their actions: they are less likely to slow down a
computer noticeably and will, at most, infrequently trigger anti-virus software that detects
suspicious behavior by programs. The slow infector approach, however, does not seem very
successful.
Vectors and hosts
Viruses have targeted various kinds of hosts and transmission media, including:
Binary executable files (such as COM files and EXE files in MS-DOS, Portable
Executable files in Microsoft Windows, the Mach-O format in OS X, and ELF files in
Linux)
Volume Boot Records of floppy disks and hard disk partitions
The master boot record (MBR) of a hard disk
General-purpose script files (such as batch files in MS-DOS and Microsoft Windows,
VBScript files, and shell script files on Unix-like platforms).
Application-specific script files (such as Telix-scripts)
System-specific autorun script files (such as the Autorun.inf file that Windows uses to
automatically run software stored on USB memory storage devices).
Documents that can contain macros (such as Microsoft Word documents, Microsoft
Excel spreadsheets, AmiPro documents, and Microsoft Access database files)
Cross-site scripting vulnerabilities in web applications (see XSS Worm)
Arbitrary computer files. An exploitable buffer overflow, format string, race condition or
other exploitable bug in a program which reads the file could be used to trigger the
execution of code hidden within it. Most bugs of this type can be made more difficult to
exploit in computer architectures with protection features such as an execute disable bit
and/or address space layout randomization.
PDFs, like HTML, may link to malicious code. PDFs can also be infected with malicious code.
In operating systems that use file extensions to determine program associations (such as
Microsoft Windows), the extensions may be hidden from the user by default. This makes it
possible to create a file that is of a different type than it appears to the user. For example, an
executable may be created named "picture.png.exe", in which the user sees only "picture.png"
and therefore assumes that this file is an image and most likely is safe, yet when opened runs the
executable on the client machine.
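The hidden-extension trick described above can be checked for mechanically. The following is a minimal defensive sketch, not a complete filter: the function names and the short list of executable extensions are assumptions made for illustration, and a name such as "report.v2.exe" would also be flagged.

```python
from pathlib import PurePath

# Hypothetical, deliberately short list of extensions treated as runnable.
EXECUTABLE_EXTENSIONS = {".exe", ".com", ".bat", ".cmd", ".scr", ".pif", ".vbs"}


def displayed_name(filename: str) -> str:
    """Return the name a user sees when the final extension is hidden."""
    return PurePath(filename).stem


def looks_disguised(filename: str) -> bool:
    """True if an executable's displayed name still appears to carry an extension."""
    suffixes = [s.lower() for s in PurePath(filename).suffixes]
    return len(suffixes) >= 2 and suffixes[-1] in EXECUTABLE_EXTENSIONS


print(displayed_name("picture.png.exe"))   # picture.png
print(looks_disguised("picture.png.exe"))  # True
print(looks_disguised("setup.exe"))        # False
```

A mail gateway or download handler could apply a check of this kind before a file ever reaches the user, which avoids relying on the user noticing the real extension.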
An additional method is to generate the virus code from parts of existing operating system files
by using the CRC16/CRC32 data. The initial code can be quite small (tens of bytes) and unpack
a fairly large virus. This is analogous to a biological "prion" in the way it works but is vulnerable
to signature based detection. This attack has not yet been seen "in the wild".
Methods to avoid detection
Some viruses can infect files without increasing their sizes or damaging the files. They
accomplish this by overwriting unused areas of executable files. These are called cavity viruses.
For example, the CIH virus, or Chernobyl Virus, infects Portable Executable files. Because those
files have many empty gaps, the virus, which was 1 KB in length, did not add to the size of the
file.
Some viruses try to avoid detection by killing the tasks associated with antivirus software before
it can detect them.
As computers and operating systems grow larger and more complex, old hiding techniques need
to be updated or replaced. Defending a computer against viruses may demand that a file system
migrate towards detailed and explicit permission for every kind of file access.
Avoiding bait files and other undesirable hosts
A virus needs to infect hosts in order to spread further. In some cases, it might be a bad idea to
infect a host program. For example, many anti-virus programs perform an integrity check of their
own code. Infecting such programs will therefore increase the likelihood that the virus is
detected. For this reason, some viruses are programmed not to infect programs that are known to
be part of anti-virus software. Another type of host that viruses sometimes avoid is the bait file.
Bait files (or goat files) are files that are specially created by anti-virus software, or by anti-virus
professionals themselves, to be infected by a virus. These files can be created for various
reasons, all of which are related to the detection of the virus:
Anti-virus professionals can use bait files to take a sample of a virus (i.e. a copy of a
program file that is infected by the virus). It is more practical to store and exchange a
small, infected bait file, than to exchange a large application program that has been
infected by the virus.
Anti-virus professionals can use bait files to study the behavior of a virus and evaluate
detection methods. This is especially useful when the virus is polymorphic. In this case,
the virus can be made to infect a large number of bait files. The infected files can be used
to test whether a virus scanner detects all versions of the virus.
Some anti-virus software employs bait files that are accessed regularly. When these files
are modified, the anti-virus software warns the user that a virus is probably active on the
system.
Since bait files are used to detect the virus, or to make detection possible, a virus can benefit
from not infecting them. Viruses typically do this by avoiding suspicious programs, such as
small program files or programs that contain certain patterns of 'garbage instructions'.
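The third use of bait files listed above, warning the user when a bait file changes, amounts to an integrity check. The sketch below shows one way this might look, assuming SHA-256 digests as the fingerprint; the function names are hypothetical and the source does not say how any particular product implements the check.

```python
import hashlib
import tempfile
from pathlib import Path


def fingerprint(path: Path) -> str:
    """Return the SHA-256 digest of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def record_baseline(bait_files: list[Path]) -> dict[Path, str]:
    """Remember the current digest of every bait file."""
    return {path: fingerprint(path) for path in bait_files}


def modified_baits(baseline: dict[Path, str]) -> list[Path]:
    """Return bait files that have disappeared or whose contents changed."""
    return [
        path
        for path, digest in baseline.items()
        if not path.exists() or fingerprint(path) != digest
    ]


with tempfile.TemporaryDirectory() as tmp:
    bait = Path(tmp) / "bait.bin"
    bait.write_bytes(b"\x00" * 64)             # placeholder bait content
    baseline = record_baseline([bait])
    print(modified_baits(baseline))            # []
    bait.write_bytes(b"\x00" * 64 + b"extra")  # simulate a modification
    print(len(modified_baits(baseline)))       # 1
```

Any change to a bait file is suspicious because nothing legitimate should write to it, so even this simple comparison gives a useful signal.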
A related strategy to make baiting difficult is sparse infection. Sometimes, sparse infectors do
not infect a host file that would be a suitable candidate for infection in other circumstances. For
example, a virus can decide on a random basis whether to infect a file or not, or a virus can only
infect host files on particular days of the week.
Stealth
Some viruses try to trick antivirus software by intercepting its requests to the operating system.
A virus can hide itself by intercepting the antivirus software’s request to read the file and passing
the request to the virus, instead of the OS. The virus can then return an uninfected version of the
file to the antivirus software, so that it seems that the file is "clean". Modern antivirus software
employs various techniques to counter stealth mechanisms of viruses. The only completely
reliable method to defeat stealth is to boot from a medium that is known to be clean.
Self-modification
Most modern antivirus programs try to find virus-patterns inside ordinary programs by scanning
them for so-called virus signatures. A signature is a characteristic byte-pattern that is part of a
certain virus or family of viruses. If a virus scanner finds such a pattern in a file, it notifies the
user that the file is infected. The user can then delete, or (in some cases) "clean" or "heal" the
infected file. Some viruses employ techniques that make detection by means of signatures
difficult but probably not impossible. These viruses modify their code on each infection. That is,
each infected file contains a different variant of the virus.
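Signature scanning as described above can be sketched as a search for known byte patterns. The signature names and patterns below are invented for illustration and match no real malware; production scanners use far larger databases and faster multi-pattern matching.

```python
# Hypothetical signature database: name -> characteristic byte pattern.
SIGNATURES = {
    "Example.A": bytes.fromhex("deadbeef0102"),
    "Example.B": b"EXAMPLE-TEST-PATTERN",
}


def scan(data: bytes) -> list[str]:
    """Return the names of all signatures found anywhere in data."""
    return sorted(name for name, pattern in SIGNATURES.items() if pattern in data)


clean = b"ordinary program bytes"
flagged = b"header" + bytes.fromhex("deadbeef0102") + b"trailer"
print(scan(clean))    # []
print(scan(flagged))  # ['Example.A']
```

Because the match depends on an exact byte sequence, any change to those bytes between infections defeats the scan, which is the weakness that self-modifying viruses exploit.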
Encryption with a variable key
A more advanced method is the use of simple encryption to encipher the virus. In this case, the
virus consists of a small decrypting module and an encrypted copy of the virus code. If the virus
is encrypted with a different key for each infected file, the only part of the virus that remains
constant is the decrypting module, which would (for example) be appended to the end. In this
case, a virus scanner cannot directly detect the virus using signatures, but it can still detect the
decrypting module, which still makes indirect detection of the virus possible. Since these would
be symmetric keys, stored on the infected host, it is in fact entirely possible to decrypt the final
virus, but this is probably not required, since self-modifying code is such a rarity that it may be
reason for virus scanners to at least flag the file as suspicious.
An old, but compact, encryption involves XORing each byte in a virus with a constant, so that
the exclusive-or operation has only to be repeated for decryption. It is suspicious for code to
modify itself, so the code that does the encryption/decryption may be part of the signature in many
virus definitions.
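The XOR property mentioned above, and its effect on signature matching, can be seen on harmless data. This is a minimal sketch: the byte string and the "pattern" are placeholders chosen for illustration, and the key values are arbitrary.

```python
def xor_bytes(data: bytes, key: int) -> bytes:
    """XOR every byte of data with a single-byte key (0-255)."""
    if not 0 <= key <= 255:
        raise ValueError("key must fit in one byte")
    return bytes(b ^ key for b in data)


body = b"harmless placeholder payload"
pattern = b"placeholder"  # stands in for a scanner signature

copy_a = xor_bytes(body, 0x5A)
copy_b = xor_bytes(body, 0xC3)

print(xor_bytes(copy_a, 0x5A) == body)  # True
print(pattern in body)                  # True
print(pattern in copy_a)                # False
print(copy_a == copy_b)                 # False
```

The same function encodes and decodes, and two copies encoded under different keys share no fixed byte pattern. This is why scanners fall back on the decoding routine itself, which stays constant, as the thing to recognize.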
Polymorphic code
Polymorphic code was the first technique that posed a serious threat to virus scanners. Just like
regular encrypted viruses, a polymorphic virus infects files with an encrypted copy of itself,
which is decoded by a decryption module. In the case of polymorphic viruses, however, this
decryption module is also modified on each infection. A well-written polymorphic virus
therefore has no parts which remain identical between infections, making it very difficult to
detect directly using signatures. Antivirus software can detect it by decrypting the viruses using
an emulator, or by statistical pattern analysis of the encrypted virus body. To enable polymorphic
code, the virus has to have a polymorphic engine (also called mutating engine or mutation
engine) somewhere in its encrypted body. See Polymorphic code for technical detail on how
such engines operate.[21]
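The text does not say which statistics are used for pattern analysis of an encrypted body. One commonly taught measure is the Shannon entropy of the byte distribution, shown below as an assumed example rather than as the method any particular scanner uses.

```python
import math
from collections import Counter


def byte_entropy(data: bytes) -> float:
    """Shannon entropy of the byte distribution, in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return sum((n / total) * math.log2(total / n) for n in counts.values())


print(byte_entropy(b"\x00" * 1024))         # 0.0
print(byte_entropy(bytes(range(256)) * 4))  # 8.0
```

Data produced by strong encryption or compression tends to score close to 8 bits per byte, while ordinary code and text score lower. XORing with a single constant only permutes byte values and leaves this measure unchanged, so entropy alone would not reveal that simpler scheme.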
Some viruses employ polymorphic code in a way that constrains the mutation rate of the virus
significantly. For example, a virus can be programmed to mutate only slightly over time, or it
can be programmed to refrain from mutating when it infects a file on a computer that already
contains copies of the virus. The advantage of using such slow polymorphic code is that it makes
it more difficult for antivirus professionals to obtain representative samples of the virus, because
bait files that are infected in one run will typically contain identical or similar samples of the
virus. This will make it more likely that the detection by the virus scanner will be unreliable, and
that some instances of the virus may be able to avoid detection.
Metamorphic code
To avoid being detected by emulation, some viruses rewrite themselves completely each time
they are to infect new executables. Viruses that utilize this technique are said to be metamorphic.
To enable metamorphism, a metamorphic engine is needed. A metamorphic virus is usually
very large and complex. For example, W32/Simile consisted of over 14,000 lines of assembly
language code, 90% of which was part of the metamorphic engine.[22][23]
Vulnerability and countermeasures
The vulnerability of operating systems to viruses
Just as genetic diversity in a population decreases the chance of a single disease wiping out a
population, the diversity of software systems on a network similarly limits the destructive
potential of viruses. This became a particular concern in the 1990s, when Microsoft gained
market dominance in desktop operating systems and office suites. The users of Microsoft
software (especially networking software such as Microsoft Outlook and Internet Explorer) are
especially vulnerable to the spread of viruses. Microsoft software is targeted by virus writers due
to its desktop dominance, and is often criticized for including many errors and holes for virus
writers to exploit. Integrated and non-integrated Microsoft applications (such as Microsoft
Office), applications with scripting languages that have access to the file system (for example
Visual Basic Script (VBS)), and applications with networking features are also particularly
vulnerable.
Although Windows is by far the most popular target operating system for virus writers, viruses
also exist on other platforms. Any operating system that allows third-party programs to run can
theoretically run viruses. Some operating systems are more secure than others. Unix-based
operating systems (and NTFS-aware applications on Windows NT based platforms) only allow
their users to run executables within their own protected memory space.
An Internet based experiment revealed that there were cases when people willingly pressed a
particular button to download a virus. Security analyst Didier Stevens ran a half year advertising
campaign on Google AdWords which said "Is your PC virus-free? Get it infected here!". The
result was 409 clicks.[24][25]
As of 2006, there are relatively few security exploits targeting Mac OS X (with a Unix-based file
system and kernel).[26] The number of viruses for the older Apple operating systems, known as
Mac OS Classic, varies greatly from source to source, with Apple stating that there are only four
known viruses, and independent sources stating there are as many as 63 viruses. Many Mac OS
Classic viruses targeted the HyperCard authoring environment. The difference in virus
vulnerability between Macs and Windows is a chief selling point, one that Apple uses in their
Get a Mac advertising.[27] In January 2009, Symantec announced the discovery of a trojan that
targets Macs.[28] This discovery did not gain much coverage until April 2009.[28]
While Linux, and Unix in general, has always natively blocked normal users from making
changes to the operating system environment, Windows users are generally not blocked in this
way. This difference has continued partly due to the widespread use of administrator accounts in
contemporary versions like XP. In 1997, when a virus for Linux known as "Bliss" was released,
leading antivirus vendors issued warnings that Unix-like systems could fall prey to
viruses just like Windows.[29] The Bliss virus may be considered characteristic of viruses, as
opposed to worms, on Unix systems. Bliss requires that the user run it explicitly, and it can only
infect programs that the user has permission to modify. Unlike Windows users, most Unix users
do not log in as an administrator user except to install or configure software; as a result, even if a
user ran the virus, it could not harm their operating system. The Bliss virus never became
widespread, and remains chiefly a research curiosity. Its creator later posted the source code to
Usenet, allowing researchers to see how it worked.[30]
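The point that a program can only modify files its user is allowed to write can be illustrated with the classic Unix permission bits. The function below is a simplified sketch with a hypothetical name: it ignores access control lists, read-only mounts and directory permissions, all of which real systems also consult.

```python
import stat


def can_modify(mode: int, owner_uid: int, owner_gid: int,
               uid: int, gids: set[int]) -> bool:
    """Simplified Unix write check: root always may; else owner, group, then other bits."""
    if uid == 0:
        return True
    if uid == owner_uid:
        return bool(mode & stat.S_IWUSR)
    if owner_gid in gids:
        return bool(mode & stat.S_IWGRP)
    return bool(mode & stat.S_IWOTH)


# A system binary owned by root with mode 0o755, seen by an ordinary user.
print(can_modify(0o755, 0, 0, uid=1000, gids={1000}))        # False
# A program the user owns, for example in their home directory.
print(can_modify(0o755, 1000, 1000, uid=1000, gids={1000}))  # True
# The same system binary when the process runs as root.
print(can_modify(0o755, 0, 0, uid=0, gids={0}))              # True
```

The first and third cases show why routine use of an administrator account matters: the same file that is out of reach for an ordinary user becomes writable once the program runs with root privileges.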
The role of software development
Because software is often designed with security features to prevent unauthorized use of system
resources, many viruses must exploit software bugs in a system or application to spread.
Software development strategies that produce large numbers of bugs will generally also produce
potential exploits.
Anti-virus software and other preventive measures
Many users install anti-virus software that can detect and eliminate known viruses after the
computer downloads or runs the executable. There are two common methods that an anti-virus
software application uses to detect viruses. The first, and by far the most common method of
virus detection is using a list of virus signature definitions. This works by examining the content
of the computer's memory (its RAM, and boot sectors) and the files stored on fixed or removable
drives (hard drives, floppy drives), and comparing those files against a database of known virus
"signatures". The disadvantage of this detection method is that users are only protected from
viruses that pre-date their last virus definition update. The second method is to use a heuristic
algorithm to find viruses based on common behaviors. This method has the ability to detect
novel viruses that anti-virus security firms have yet to create a signature for.
Some anti-virus programs are able to scan opened files in addition to sent and received e-mails
"on the fly" in a similar manner. This practice is known as "on-access scanning". Anti-virus
software does not change the underlying capability of host software to transmit viruses. Users
must update their software regularly to patch security holes. Anti-virus software also needs to be
regularly updated in order to recognize the latest threats.
One may also minimize the damage done by viruses by making regular backups of data (and the
operating systems) on different media, that are either kept unconnected to the system (most of
the time), read-only or not accessible for other reasons, such as using different file systems. This
way, if data is lost through a virus, one can start again using the backup (which should preferably
be recent).
If a backup session on optical media like CD and DVD is closed, it becomes read-only and can
no longer be affected by a virus (so long as a virus or infected file was not copied onto the
CD/DVD). Likewise, an operating system on a bootable CD can be used to start the computer if
the installed operating systems become unusable. Backups on removable media must be
carefully inspected before restoration. The Gammima virus, for example, propagates via
removable flash drives.[31][32]
Recovery methods
Once a computer has been compromised by a virus, it is usually unsafe to continue using the
same computer without completely reinstalling the operating system. However, a number of
recovery options exist once a computer has been infected. Which option is appropriate depends on
the type and severity of the virus.
Virus removal
One possibility on Windows Me, Windows XP, Windows Vista and Windows 7 is a tool known
as System Restore, which restores the registry and critical system files to a previous checkpoint.
Often a virus will cause a system to hang, and a subsequent hard reboot will render a system
restore point from the same day corrupt. Restore points from previous days should work
provided the virus is not designed to corrupt the restore files or also exists in previous restore
points.[33] Some viruses, however, disable System Restore and other important tools such as Task
Manager and Command Prompt. An example of a virus that does this is CiaDoor. However,
many such viruses can be removed by rebooting the computer, entering Windows safe mode, and
then using system tools.
Administrators have the option to disable such tools for limited users for various reasons (for
example, to reduce potential damage from and the spread of viruses). A virus can modify the
registry to do the same even if the Administrator is controlling the computer; it blocks all users
including the administrator from accessing the tools. The message "Task Manager has been
disabled by your administrator" may be displayed, even to the administrator.[citation needed]
Users running a Microsoft operating system can access Microsoft's website to run a free scan,
provided they have their 20-digit registration number. Many websites run by anti-virus software
companies provide free online virus scanning, with limited cleaning facilities (the purpose of the
sites is to sell anti-virus products). Some websites allow a single suspicious file to be checked by
many antivirus programs in one operation.
Operating system reinstallation
Reinstalling the operating system is another approach to virus removal. It involves either
reformatting the computer's hard drive and installing the OS and all programs from original
media, or restoring the entire partition with a clean backup image. User data can be restored by
booting from a Live CD, or putting the hard drive into another computer and booting from its
operating system with great care not to infect the second computer by executing any infected
programs on the original drive; and once the system has been restored precautions must be taken
to avoid reinfection from a restored executable file.
These methods are simple to do, may be faster than disinfecting a computer, and are guaranteed
to remove any malware. If the operating system and programs must be reinstalled from scratch,
the time and effort to reinstall, reconfigure, and restore user preferences must be taken into
account. Restoring from an image is much faster, totally safe, and restores the exact
configuration to the state it was in when the image was made, with no further trouble.