Dir STR

fr

Uploaded by

unimourya

Directory Structure

UNIX stores data on disk in named units called files. Files can be nearly any length,
although each system imposes a maximum, ranging from 2 GB to 64 TB or more (kB=1024
bytes, MB=1024 kB, GB=1024 MB, TB=1024 GB), depending on whether the file offset
counters are 32- or 64-bit integers and on the filesystem in use. Files are often small
chunks of data, and UNIX filesystems are built to ensure efficient storage of both small
and large files, and are very good at keeping files from being fragmented across a disk.
This is in distinct contrast to the older Windows filesystems, which would fragment files
quite heavily, slowing file I/O considerably. Users used to the old (pre-NTFS) Windows
world should note that UNIX filesystems rarely, if ever, need to be defragmented.
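
The link between offset width and maximum file size can be checked with a little
arithmetic. The sketch below assumes signed offsets and the binary units defined above;
the helper name max_file_size is made up for illustration, and real limits also depend
on the filesystem.

```python
# Binary units, as defined in the text above.
KB = 1024
MB = 1024 * KB
GB = 1024 * MB
TB = 1024 * GB


def max_file_size(offset_bits):
    """Largest offset (in bytes) that a signed integer of this width can hold."""
    return 2 ** (offset_bits - 1) - 1


# A signed 32-bit offset tops out one byte short of 2 GB.
limit_32 = max_file_size(32)
print(limit_32, "bytes, about", round(limit_32 / GB, 2), "GB")

# A signed 64-bit offset addresses far more than a single disk usually holds,
# so the filesystem, not the offset width, sets the practical limit.
limit_64 = max_file_size(64)
print(limit_64 // TB, "TB addressable with 64-bit offsets")
```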

Files and Directories in UNIX


File names in UNIX are typically limited to 255 bytes, and full paths to 1024 bytes or
more, depending on the system. The only forbidden characters in a name are ``/'', which
is used as the directory separator, and the NUL byte. Many characters (?, *, !, etc.) are
more difficult to use than standard letters and numbers, because the shell treats them
specially, but they can be used. As noted before, files can be anywhere from 0 bytes to
64 TB in size, although most files are small, often only a few kB in size.
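
The naming rules can be checked programmatically. The following is a minimal sketch,
assuming a POSIX system where os.pathconf reports the per-name length limit; the
function names and the fallback value of 255 are assumptions for illustration. It also
rejects the NUL byte and the reserved names "." and "..".

```python
import os


def name_limit(directory="/"):
    """Ask the system for the longest file name allowed in `directory`."""
    try:
        return os.pathconf(directory, "PC_NAME_MAX")
    except (OSError, ValueError, AttributeError):
        return 255  # assumed fallback; a common value, not a guarantee


def is_valid_name(name, directory="/"):
    """True if `name` can be used as a single file or directory name."""
    encoded = os.fsencode(name)
    if not encoded or name in (".", ".."):
        return False
    # The directory separator and the NUL byte can never appear in a name.
    if b"/" in encoded or b"\0" in encoded:
        return False
    return len(encoded) <= name_limit(directory)
```

Awkward characters such as ?, * and ! pass this check: they are legal in a name, and
only need quoting when typed at a shell prompt.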

Directories have the same naming rules as files, and are effectively just a small, special
type of file. They hold the entries for files and directories that are contained below them
in the directory tree. All UNIX machines have a directory tree that starts at the root
directory ``/''. Users generally have a home directory located under /home.

UNIX machines do not have a concept of drive letters or names. Instead, all disks on a
system (including network-accessible disks) are given a unique mount point, or directory,
where they are accessible. A disk is mounted to a directory, and the contents appear as
files and directories under the mount point. Thus, a typical UNIX server will have one
disk (or partition) mounted for /, another for /usr, another for /home, and perhaps also
one for /var. To a user, all these disks appear as a single, coherent, filesystem. This
means that an administrator needs to keep track of what is mounted where (for space
concerns), but a user need not. If a disk fills, an administrator can move all or part of the
data on that disk to a new disk (under a new mount point or the same one), and the user
will not notice, except that the space available increases.
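
An administrator who wants to know which disk actually holds a directory can walk up the
tree until a mount point is reached. This is a rough sketch using only the Python
standard library; the function names are invented for illustration, and the directories
checked at the end are the ones named above, which may not all be separate mounts (or
even exist) on a given machine.

```python
import os
import shutil


def find_mount_point(path):
    """Walk up from `path` until a directory that is a mount point is found."""
    path = os.path.realpath(path)
    while not os.path.ismount(path):
        path = os.path.dirname(path)
    return path


def describe(path):
    """Return (mount point, free bytes) for the disk holding `path`."""
    mount = find_mount_point(path)
    return mount, shutil.disk_usage(mount).free


for directory in ("/", "/usr", "/home", "/var"):
    if os.path.exists(directory):
        mount, free = describe(directory)
        print(f"{directory} is stored on the disk mounted at {mount} "
              f"({free // 2**30} GB free)")
```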

This ability to make disks and the filesystem independent, to maintain constant absolute
paths to a given file regardless of the disk it is actually stored on, is invaluable to a
scientist. Upgrades of the disk space on a UNIX machine are not accompanied by a great
reorganization of file locations; the disks simply get bigger, or entire directory trees are
shifted to new disks, while retaining their old filesystem locations. Hence, once an
absolute path is chosen, it can be maintained for all time. This makes maintaining
software and data hierarchies vastly easier.

Standardized Directory Names


UNIX, over its long history, has standardized some directory names and file locations
across all implementations. The following describes some of the ``special'' names. These
are not hard and fast rules, but are strongly followed customs among UNIX
administrators and software designers.
1. /etc holds system-wide configuration files, often in subdirectories.
2. /usr holds system binaries and data for users, such as editors, X11, graphing
programs, and so forth. /usr has many subdirectories, including /usr/bin for the
binaries, /usr/X11 for data and programs for X11, /usr/etc for special
configuration files, and /usr/local for local software not part of the standard
UNIX distribution.
3. /var holds log files, cache directories, and spool directories for mail and printing.
/var/tmp is also used by some editors for temporary storage, as /var is often on a
different disk than /tmp.
4. /tmp is meant for temporary storage by all users. This is often its own disk (or
partition), so filling /tmp does not impact the rest of the system.
5. /bin and /sbin hold system binaries. These are different from the programs in
/usr/bin in that they are designed as the basic programs to run the system.
Programs in /bin and /sbin include the user shells, system startup and shutdown,
and run-time library handlers.
6. /lib and /usr/lib hold system libraries. These are chunks of code used by many
programs to do common tasks. Rather than keep a copy in every program, the
code is kept once in the library, and the run-time dynamic linker stitches the
program and libraries together. /lib holds libraries for the system binaries, /usr/lib
for the binaries in /usr/bin.
7. /home is the typical location of user home directories. Ordinary users in UNIX
are not allowed write permission to the system directories, so they need a place to
store their own files. These go in their ``home directory'', normally
/home/username. Within the home directory, a user normally has absolute
authority, although the system superuser (root) can modify files if necessary.

These directories are present on virtually all UNIX systems, although a specific file may
change its location depending on the UNIX distribution and system administrator. Often,
administrators who move files from the ``standard'' locations will use a symbolic link to
help users navigate the directory tree.
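
To see how a particular machine follows these customs, the standard locations can be
surveyed for directories that are missing or have been replaced by symbolic links. This
is only an illustrative sketch; the list of directories is taken from the names above,
and the function name survey is made up.

```python
import os

STANDARD_DIRS = ["/etc", "/usr", "/var", "/tmp", "/bin", "/sbin", "/lib", "/home"]


def survey(directories):
    """Map each directory to 'missing', 'directory', or its symlink target."""
    report = {}
    for directory in directories:
        # Test for a link first: isdir() follows links and would hide them.
        if os.path.islink(directory):
            report[directory] = "link to " + os.readlink(directory)
        elif os.path.isdir(directory):
            report[directory] = "directory"
        else:
            report[directory] = "missing"
    return report


for directory, status in survey(STANDARD_DIRS).items():
    print(f"{directory:8} {status}")
```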

One Directory per Project


Individual users on UNIX can generally only write to their own home directory, and
subdirectories thereof. This means that all of a user's files are normally in a single
directory structure rooted at /home/username. If a user creates all files in their home
directory, it will quickly become cluttered with thousands of files, which will need
increasingly complex names to avoid namespace collision (only one file can exist
with a given name in a directory).

Therefore, it is best to give each scientific project its own directory, under the home
directory. All data, results, and interpretations can be kept in the project directory.
Programs which are used by many projects can be kept in a ``bin'' directory in the home
directory, and referenced easily as ``~/bin/program_name'' in scripts and documentation.

When each project has its own directory, namespace collision is minimized, so results
and data don't get overwritten by accident. Each project directory can also be the root of
its own standardized directory tree, with (for example) a subdirectory doc/ for reports
and documents related to the project, data/ for raw data files, processed/ for processed
data, etc. If each project directory has roughly the same structure under it, it is easier to
find results, documents, and data years after finishing a project.
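
One way to keep the structure identical across projects is to create it with a small
script rather than by hand. A minimal sketch, assuming the three subdirectories named
above; the function name create_project is hypothetical.

```python
from pathlib import Path

# Subdirectories named in the text; add or rename to suit the project.
SUBDIRS = ("doc", "data", "processed")


def create_project(home, name):
    """Create home/name with the standard subdirectories and return its path."""
    project = Path(home) / name
    for sub in SUBDIRS:
        # exist_ok=True makes the call safe to repeat on an existing project.
        (project / sub).mkdir(parents=True, exist_ok=True)
    return project
```

For example, create_project(Path.home(), "my_project") would create doc/, data/ and
processed/ under a my_project directory in the home directory (the project name here is
a placeholder).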

Finally, if each project has its own directory, it is easy to back up or move an entire
project using UNIX file tools: package the entire directory into an archive (see tar or
cpio) and transfer the one archive to another location. This makes collaboration
significantly easier.
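
The same packaging step can be scripted. The sketch below uses Python's standard tarfile
module as a stand-in for the tar command; the function names are invented for
illustration, and the archive should be written outside the project directory so that it
does not end up containing itself.

```python
import tarfile
from pathlib import Path


def archive_project(project_dir, archive_path):
    """Pack project_dir into a gzip-compressed tar archive."""
    project_dir = Path(project_dir)
    with tarfile.open(archive_path, "w:gz") as archive:
        # arcname stores paths relative to the project name, so the
        # archive unpacks cleanly under any directory.
        archive.add(project_dir, arcname=project_dir.name)
    return Path(archive_path)


def list_archive(archive_path):
    """Return the names stored in the archive, for a quick sanity check."""
    with tarfile.open(archive_path, "r:gz") as archive:
        return archive.getnames()
```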

Keep Data and Results Together


Keep the raw data for a project together with the results and interpretations. This makes it
easier to move and archive a project, and also makes it easier to discover what was done
after a project is finished.

If the raw data and results are in the same directory tree (under the project directory),
then scripts which operate on the data and results need short relative paths, which are less
prone to breaking (less fragile) than long, absolute paths. Scripts can use a shallow, local
directory structure in the project directory to track processing steps. This structure can be
specialized to the project (or subproblem), without worrying about breaking another
project's scripts.
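
As an illustration of short relative paths, the sketch below resolves everything against
the project directory, so it keeps working when the whole project is moved. The
directory names data/ and processed/ follow the example layout given earlier; the
function name and the upper-casing step are placeholders for a real processing step.

```python
from pathlib import Path


def process(project_root="."):
    """Copy each raw file in data/ to processed/, upper-cased as a stand-in step."""
    root = Path(project_root)
    raw_dir = root / "data"        # short paths, relative to the project
    out_dir = root / "processed"
    out_dir.mkdir(exist_ok=True)
    written = []
    for raw in sorted(raw_dir.glob("*.txt")):
        result = out_dir / raw.name
        result.write_text(raw.read_text().upper())
        # Report paths relative to the project, not to the disk layout.
        written.append(result.relative_to(root).as_posix())
    return written
```

Because no absolute path appears anywhere in the function, nothing in it needs editing
after the project directory is renamed, archived, or unpacked on another machine.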

Symbolic Links Make Life Better


If it is necessary to have final results from multiple projects in a single directory under
the home directory, use symbolic links to link the final results from the project
directories into that directory. Symbolic links explicitly show the link target, so the
source of the results can be quickly determined.

If files from one project are needed in another, use symbolic links rather than copying the
files. Copied files must be updated by hand; symbolic links need no updating.
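
Creating and inspecting such links is straightforward. A minimal sketch, assuming a
system that supports symbolic links; the function names link_result and link_source are
hypothetical.

```python
import os
from pathlib import Path


def link_result(result_file, collection_dir):
    """Create a symbolic link in collection_dir that points at result_file."""
    result_file = Path(result_file).resolve()
    collection_dir = Path(collection_dir)
    collection_dir.mkdir(parents=True, exist_ok=True)
    link = collection_dir / result_file.name
    if link.is_symlink():
        link.unlink()  # replace a stale link of the same name
    link.symlink_to(result_file)
    return link


def link_source(link):
    """Report where a symbolic link points, i.e. the source of the result."""
    return os.readlink(link)
```

Reading through the link always returns the current contents of the original file, which
is why a link needs no updating when the result is regenerated.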

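Raw data files should generally be kept in a primary location for use, and also in a
backup location in case of corruption or accidental deletion. In this case, do not use links,
symbolic or otherwise: a link is another name for the same data rather than a second
copy, so it does not protect against corruption.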
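
A backup therefore has to be a real copy. The sketch below makes one and confirms it
matches the original by comparing checksums; the function names are invented for
illustration, and reading whole files into memory is a simplification that would need
chunked reads for very large data files.

```python
import hashlib
import shutil
from pathlib import Path


def checksum(path):
    """SHA-256 digest of a file's contents (whole file read at once)."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def back_up(raw_file, backup_dir):
    """Copy raw_file into backup_dir and confirm the copy is identical."""
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    # copy2 makes an independent copy and preserves timestamps.
    copy = Path(shutil.copy2(raw_file, backup_dir))
    if checksum(copy) != checksum(raw_file):
        raise OSError(f"backup of {raw_file} does not match the original")
    return copy
```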
