FSCK Paper For AIX
T. J. Kowalski
Bell Laboratories
Murray Hill, New Jersey 07974
ABSTRACT
This document reflects the use of fsck with the 4.2BSD and 4.3BSD file system
organization. This is a revision of the original paper written by T. J. Kowalski.
The File System Check Program (fsck) is an interactive file system check and repair
program. Fsck uses the redundant structural information in the UNIX file system to perform
several consistency checks. If an inconsistency is detected, it is reported to the
operator, who may elect to fix or ignore each inconsistency. These inconsistencies result
from the permanent interruption of the file system updates, which are performed every
time a file is modified. Unless there has been a hardware failure, fsck is able to repair
corrupted file systems using procedures based upon the order in which UNIX honors these
file system update requests.
The purpose of this document is to describe the normal updating of the file system,
to discuss the possible causes of file system corruption, and to present the corrective
actions implemented by fsck. Both the program and the interaction between the program
and the operator are described.
TABLE OF CONTENTS
1. Introduction
Acknowledgements
References
4. Appendix A
4.1. Conventions
4.2. Initialization
4.3. Phase 1 - Check Blocks and Sizes
4.4. Phase 1b - Rescan for more Dups
4.5. Phase 2 - Check Pathnames
4.6. Phase 3 - Check Connectivity
4.7. Phase 4 - Check Reference Counts
4.8. Phase 5 - Check Cyl groups
4.9. Cleanup
The UNIX File System Check Program SMM:3-3
1. Introduction
This document reflects the use of fsck with the 4.2BSD and 4.3BSD file system organization. This is
a revision of the original paper written by T. J. Kowalski.
When a UNIX operating system is brought up, a consistency check of the file systems should always
be performed. This precautionary measure helps to ensure a reliable environment for file storage on disk. If
an inconsistency is discovered, corrective action must be taken. Fsck runs in two modes. Normally it is run
non-interactively by the system after a normal boot. When running in this mode, it will only make changes
to the file system that are known to always be correct. If an unexpected inconsistency is found, fsck will exit
with a non-zero exit status, leaving the system running single-user. Typically the operator then runs fsck
interactively. When running in this mode, each problem is listed followed by a suggested corrective action.
The operator must decide whether or not the suggested correction should be made.
The purpose of this memo is to dispel the mystique surrounding file system inconsistencies. It first
describes the updating of the file system (the calm before the storm) and then describes file system
corruption (the storm). Finally, the set of deterministic corrective actions used by fsck (the Coast Guard
to the rescue) is presented.
2.1. Superblock
A file system is described by its super-block. The super-block is built when the file system is created
(newfs(8)) and never changes. The super-block contains the basic parameters of the file system, such as the
number of data blocks it contains and a count of the maximum number of files. Because the super-block
contains critical data, newfs replicates it to protect against catastrophic loss. The default super-block always
resides at a fixed offset from the beginning of the file system’s disk partition. The redundant super-blocks
are not referenced unless a head crash or other hard disk error causes the default super-block to be
unusable. The redundant blocks are sprinkled throughout the disk partition.
Within the file system are files. Certain files are distinguished as directories and contain collections
of pointers to files that may themselves be directories. Every file has a descriptor associated with it called
an inode. The inode contains information describing ownership of the file, time stamps indicating
modification and access times for the file, and an array of indices pointing to the data blocks for the file. In this
section, we assume that the first 12 blocks of the file are directly referenced by values stored in the inode
structure itself†. The inode structure may also contain references to indirect blocks containing further data
block indices. In a file system with a 4096 byte block size, a singly indirect block contains 1024 further
block addresses, a doubly indirect block contains 1024 addresses of further single indirect blocks, and a
triply indirect block contains 1024 addresses of further doubly indirect blocks (the triple indirect block is
never needed in practice).
In order to create files with up to 2^32 bytes, using only two levels of indirection, the minimum size
of a file system block is 4096 bytes. The size of file system blocks can be any power of two greater than or
equal to 4096. The block size of the file system is maintained in the super-block, so it is possible for file
systems of different block sizes to be accessible simultaneously on the same system. The block size must
be decided when newfs creates the file system; the block size cannot be subsequently changed without
rebuilding the file system.
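
The arithmetic behind the 4096-byte minimum can be checked with a short sketch. It assumes 12 direct blocks and 4-byte block addresses (so that a 4096-byte block holds 1024 addresses, as stated above). The names NDIRECT, ADDR_SIZE, and max_file_bytes are illustrative only and do not come from fsck or newfs.

```python
NDIRECT = 12      # direct block pointers held in the inode (assumed, as in the text)
ADDR_SIZE = 4     # bytes per block address (assumed: 4096 / 1024)

def max_file_bytes(block_size, levels=2):
    """Largest file reachable through the direct blocks plus `levels` of indirection."""
    addrs_per_block = block_size // ADDR_SIZE
    blocks = NDIRECT
    for level in range(1, levels + 1):
        # level 1 adds one indirect block of addresses, level 2 a block of those, etc.
        blocks += addrs_per_block ** level
    return blocks * block_size

for size in (2048, 4096):
    reachable = max_file_bytes(size)
    print(size, reachable, reachable >= 2**32)
```

Under these assumptions a 2048-byte block falls well short of 2^32 bytes with two levels of indirection, while a 4096-byte block just exceeds it.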
†The actual number may vary from system to system, but is usually in the range 5-13.
2.4. Fragments
To avoid waste in storing small files, the file system space allocator divides a single file system block
into one or more fragments. The fragmentation of the file system is specified when the file system is
created; each file system block can be optionally broken into 2, 4, or 8 addressable fragments. The lower
bound on the size of these fragments is constrained by the disk sector size; typically 512 bytes is the lower
bound on fragment size. The block map associated with each cylinder group records the space availability
at the fragment level. Aligned fragments are examined to determine block availability.
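
A minimal sketch of how block availability might be derived from a fragment-level map follows. The list-of-booleans representation and the name free_blocks are assumptions made for illustration; the real cylinder group map is a bitmap with its own layout.

```python
def free_blocks(frag_map, frags_per_block):
    """Return the indices of blocks whose aligned fragments are all free.

    frag_map is a list of booleans, one per fragment, True meaning free
    (a hypothetical in-memory form of the per-cylinder-group map).
    """
    result = []
    last_start = len(frag_map) - frags_per_block
    for start in range(0, last_start + 1, frags_per_block):
        # Only runs that begin on a block boundary count as a whole block.
        if all(frag_map[start:start + frags_per_block]):
            result.append(start // frags_per_block)
    return result

# Four free fragments in a row do not form a block unless they are aligned.
print(free_blocks([False, True, True, True, True, False, False, False], 4))
```

The example map contains a run of four free fragments that straddles a block boundary, so no whole block is reported as available.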
On a file system with a block size of 4096 bytes and a fragment size of 1024 bytes, a file is
represented by zero or more 4096 byte blocks of data, and possibly a single fragmented block. If a file system
block must be fragmented to obtain space for a small amount of data, the remainder of the block is made
available for allocation to other files. For example, consider an 11000 byte file stored on a 4096/1024 byte
file system. This file uses two full size blocks and a 3072 byte fragment. If no fragments with at least 3072
bytes are available when the file is created, a full size block is split yielding the necessary 3072 byte
fragment and an unused 1024 byte fragment. This remaining fragment can be allocated to another file, as
needed.
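
The 11000 byte example can be reproduced with a small calculation. This is a sketch only: the function name layout is hypothetical, and it ignores indirect blocks and every other allocation policy.

```python
def layout(file_size, block_size=4096, frag_size=1024):
    """Return (full_blocks, tail_fragments) needed to hold file_size bytes."""
    full_blocks, tail_bytes = divmod(file_size, block_size)
    tail_frags = -(-tail_bytes // frag_size)      # ceiling division
    if tail_frags == block_size // frag_size:
        # A tail that needs every fragment of a block is just another full block.
        full_blocks, tail_frags = full_blocks + 1, 0
    return full_blocks, tail_frags

full, frags = layout(11000)
print(full, "full blocks and a", frags * 1024, "byte fragment")
```

For the file in the text this gives two full blocks and three 1024 byte fragments, which is the 3072 byte fragment described above.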
before the pointer to the block in the old inode has been cleared in the copy of the old inode on the disk,
and after the pointer to the block in the new inode has been written out to the copy of the new inode on the
disk. Here, there is no deterministic method for deciding which inode should really claim the block. A
similar problem can arise with a multiply claimed inode.
The problem with asynchronous inode updates can be avoided by doing all inode deallocations
synchronously. Consequently, inodes and indirect blocks are written to the disk synchronously (i.e. the process
blocks until the information is really written to disk) when they are being deallocated. Similarly, inodes are
kept consistent by synchronously deleting, adding, or changing directory entries.
Fsck checks the range of each block number claimed by an inode. If the block number is lower than
the first data block in the file system, or greater than the last data block, then the block number is a bad
block number. Many bad blocks in an inode are usually caused by an indirect block that was not written to
the file system, a condition which can only occur if there has been a hardware failure. If an inode contains
bad block numbers, fsck prompts the operator to clear it.
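
The range check described above can be sketched as follows. The function and parameter names are illustrative assumptions, not identifiers taken from the fsck source.

```python
def bad_blocks(claimed, first_data_block, last_data_block):
    """Return the claimed block numbers that fall outside the data area."""
    return [b for b in claimed
            if b < first_data_block or b > last_data_block]

# Hypothetical inode claiming four blocks on a file system whose
# data blocks run from 10 through 1000.
print(bad_blocks([5, 100, 2000, 50], first_data_block=10, last_data_block=1000))
```

An inode for which this list is non-empty is one that fsck would ask the operator about.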
Acknowledgements
I thank Bill Joy, Sam Leffler, Robert Elz and Dennis Ritchie for their suggestions and help in
implementing the new file system. Thanks also to Robert Henry for his editorial input to get this document
together. Finally we thank our sponsors, the National Science Foundation under grant MCS80-05144, and
the Defense Advanced Research Projects Agency (DoD) under Arpa Order No. 4031 monitored by Naval
Electronic System Command under Contract No. N00039-82-C-0235. (Kirk McKusick, July 1983)
I would like to thank Larry A. Wehr for advice that led to the first version of fsck and Rick B. Brandt
for adapting fsck to UNIX/TS. (T. Kowalski, July 1979)
References
[Dolotta78] Dolotta, T. A., and Olsson, S. B. eds., UNIX User’s Manual, Edition 1.1, January
1978.
[Joy83] Joy, W., Cooper, E., Fabry, R., Leffler, S., McKusick, M., and Mosher, D. 4.2BSD
System Manual, University of California at Berkeley, Computer Systems Research
Group Technical Report #4, 1982.
[McKusick84] McKusick, M., Joy, W., Leffler, S., and Fabry, R. A Fast File System for UNIX,
ACM Transactions on Computer Systems 2, 3. pp. 181-197, August 1984.
[Ritchie78] Ritchie, D. M., and Thompson, K., The UNIX Time-Sharing System, The Bell
System Technical Journal 57, 6 (July-August 1978, Part 2), pp. 1905-29.
[Thompson78] Thompson, K., UNIX Implementation, The Bell System Technical Journal 57, 6
(July-August 1978, Part 2), pp. 1931-46.
4.1. Conventions
Fsck is a multi-pass file system check program. Each file system pass invokes a different Phase of
the fsck program. After the initial setup, fsck performs successive Phases over each file system, checking
blocks and sizes, path-names, connectivity, reference counts, and the map of free blocks (possibly
rebuilding it), and performs some cleanup.
Normally fsck is run non-interactively to preen the file systems after an unclean halt. While preen’ing a file
system, it will only fix corruptions that are expected to occur from an unclean halt. These actions are a
proper subset of the actions that fsck will take when it is running interactively. Throughout this appendix
many errors have several options that the operator can take. When an inconsistency is detected, fsck reports
the error condition to the operator. If a response is required, fsck prints a prompt message and waits for a
response. When preen’ing, most errors are fatal. For those that are expected, the response taken is noted.
This appendix explains the meaning of each error condition, the possible responses, and the related error
conditions.
The error conditions are organized by the Phase of the fsck program in which they can occur. The error
conditions that may occur in more than one Phase will be discussed in initialization.
4.2. Initialization
Before a file system check can be performed, certain tables have to be set up and certain files opened.
This section concerns itself with the opening of files and the initialization of tables. This section lists error
conditions resulting from command line options, memory requests, opening of files, status of files, file
system size checks, and creation of the scratch file. All the initialization errors are fatal when the file system is
being preen’ed.
C option?
C is not a legal option to fsck; legal options are −b, −c, −y, −n, and −p. Fsck terminates on this error
condition. See the fsck(8) manual entry for further detail.
Can’t stat F
Can’t make sense out of name F
Fsck’s request for statistics about the file system F failed. When running manually, it ignores this file
system and continues checking the next file system given. Check access modes of F.
Can’t open F
Fsck’s attempt to open the file system F failed. When running manually, it ignores this file system
and continues checking the next file system given. Check access modes of F.
F: (NO WRITE)
Either the −n flag was specified or fsck’s attempt to open the file system F for writing failed. When running
manually, all the diagnostics are printed out, but no modifications are attempted to fix them.
INTERNAL INCONSISTENCY: M
Fsck has had an internal panic, whose message is specified as M. This should never happen. See a guru.
B BAD I=I
Inode I contains block number B with a number lower than the number of the first data block in the file
system or greater than the number of the last block in the file system. This error condition may invoke the
EXCESSIVE BAD BLKS error condition in Phase 1 (see next paragraph) if inode I has too many block
numbers outside the file system range. This error condition will always invoke the BAD/DUP error
condition in Phase 2 and Phase 4.
B DUP I=I
Inode I contains block number B that is already claimed by another inode. This error condition may invoke
the EXCESSIVE DUP BLKS error condition in Phase 1 if inode I has too many block numbers claimed
by other inodes. This error condition will always invoke Phase 1b and the BAD/DUP error condition in
Phase 2 and Phase 4.
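
How a single scan over the inodes can notice duplicate claims is sketched below. The dictionary input and the name find_dups are assumptions for illustration; fsck itself works from its own in-memory block maps.

```python
def find_dups(inode_blocks):
    """Report (block, inode) pairs for blocks already claimed earlier in the scan.

    inode_blocks maps an inode number to the list of block numbers it claims
    (a hypothetical stand-in for reading the inodes in order).
    """
    owner = {}    # block number -> first inode seen claiming it
    dups = []
    for inode in sorted(inode_blocks):
        for block in inode_blocks[inode]:
            if block in owner:
                dups.append((block, inode))   # would be reported as "B DUP I=I"
            else:
                owner[block] = inode
    return dups

print(find_dups({2: [100, 101], 3: [101, 102], 4: [100]}))
```

In this sketch only the later claimant of each block is reported during the scan. Identifying the earlier claimant takes a second look at the inodes, which is consistent with the DUP condition always invoking Phase 1b.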
B DUP I=I
Inode I contains block number B that is already claimed by another inode. This error condition will always
invoke the BAD/DUP error condition in Phase 2. You can determine which inodes have overlapping blocks
by examining this error condition and the DUP error condition in Phase 1.
ZERO LENGTH DIRECTORY I=I OWNER=O MODE=M SIZE=S MTIME=T DIR=F (REMOVE)
A directory entry F has a size S that is zero. The owner O, mode M, size S, modify time T, and directory
name F are printed.
Possible responses to the REMOVE prompt are:
YES the directory entry F is removed; this will always invoke the BAD/DUP error condition in Phase 4.
NO ignore this error condition.
DIRECTORY TOO SHORT I=I OWNER=O MODE=M SIZE=S MTIME=T DIR=F (FIX)
A directory F has been found whose size S is less than the minimum size directory. The owner O, mode M,
size S, modify time T, and directory name F are printed.
Possible responses to the FIX prompt are:
YES increase the size of the directory to the minimum directory size.
NO ignore this directory.
NO skip up to the next directory boundary and resume reading, but do not modify the directory.
BAD INODE NUMBER FOR ‘.’ I=I OWNER=O MODE=M SIZE=S MTIME=T DIR=F (FIX)
A directory I has been found whose inode number for ‘.’ does not equal I.
Possible responses to the FIX prompt are:
YES change the inode number for ‘.’ to be equal to I.
NO leave the inode number for ‘.’ unchanged.
EXTRA ‘.’ ENTRY I=I OWNER=O MODE=M SIZE=S MTIME=T DIR=F (FIX)
A directory I has been found that has more than one entry for ‘.’.
Possible responses to the FIX prompt are:
YES remove the extra entry for ‘.’.
NO leave the directory unchanged.
BAD INODE NUMBER FOR ‘..’ I=I OWNER=O MODE=M SIZE=S MTIME=T DIR=F (FIX)
A directory I has been found whose inode number for ‘..’ does not equal the parent of I.
Possible responses to the FIX prompt are:
YES change the inode number for ‘..’ to be equal to the parent of I (‘..’ in the root inode points to itself).
NO leave the inode number for ‘..’ unchanged.
EXTRA ‘..’ ENTRY I=I OWNER=O MODE=M SIZE=S MTIME=T DIR=F (FIX)
A directory I has been found that has more than one entry for ‘..’.
Possible responses to the FIX prompt are:
YES remove the extra entry for ‘..’.
NO leave the directory unchanged.
(CLEAR)
The inode mentioned in the immediately previous error condition cannot be reconnected. This cannot
occur if the file system is being preen’ed, since lack of space to reconnect files is a fatal error.
Possible responses to the CLEAR prompt are:
YES de-allocate the inode mentioned in the immediately previous error condition by zeroing its contents.
NO ignore this error condition.
and aborts the attempt to link up the lost inode. This will always invoke the UNREF error condition
in Phase 4.
NO abort the attempt to link up the lost inode. This will always invoke the UNREF error condition in
Phase 4.
LINK COUNT type I=I OWNER=O MODE=M SIZE=S MTIME=T COUNT=X SHOULD BE Y
(ADJUST)
The link count for inode I is X but should be Y. The owner O, mode M, size S, and modify time T are
printed. When preen’ing, the link count is adjusted unless the number of references is increasing, a
condition that should never occur unless precipitated by a hardware failure. When the number of references is
increasing under preen mode, fsck exits with the message:
LINK COUNT INCREASING
Possible responses to the ADJUST prompt are:
YES replace the link count of file inode I with Y.
NO ignore this error condition.
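
The preen-mode rule for link counts can be sketched as below. This reads "the number of references is increasing" as meaning that the correct count Y is larger than the stored count X, which is an interpretation of the text. The exception class and function name are hypothetical.

```python
class PreenFatal(Exception):
    """Raised where fsck in preen mode would exit with a message (hypothetical)."""

def preen_link_count(stored, actual):
    """Return the link count to write back, or raise if preen mode must give up.

    stored is the count recorded in the inode (X); actual is the number of
    references found during the check (Y).
    """
    if actual > stored:
        # Assumed reading: more references than recorded is not an expected
        # result of an unclean halt, so no automatic repair is attempted.
        raise PreenFatal("LINK COUNT INCREASING")
    return actual

print(preen_link_count(stored=3, actual=2))
```

When run interactively, the same situation leads to the ADJUST prompt instead, leaving the decision to the operator.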
inode maps, allocated inodes missing from used-inode maps, and the total used-inode count incorrect.
4.9. Cleanup
Once a file system has been checked, a few cleanup functions are performed. This section lists
advisory messages about the file system and modify status of the file system.