Linux and (Bash) Shell Basics
Disclaimer I: Keep in mind that the true way to master Linux is to make
the man and help pages in the command line your best friends:
$ man <COMMAND>
Disclaimer II: The Linux distributions I'm currently using are Fedora and Kali (Debian-based).
You should be comfortable exploring several distributions until you find your
favorite. You should definitely aim to go beyond Ubuntu.
Disclaimer III: This guide is written for bash. I encourage you to go further and search
for your favorite shell.
Let's start by getting an idea of our system. The Linux filesystem is composed of
several system directories located at /:
$ ls /
You can verify their sizes, filesystem types, and where they are mounted with:
$ df -hT
Filesystem Type Size Used Avail Use% Mounted on
/dev/mapper/fedora-home ext4 127G 62G 59G 51% /home
/dev
/dev contains device nodes, which are a type of pseudo-file used by most
hardware and software devices (except for network devices).
The directory also contains entries that are created by the udev system, which
creates and manages device nodes on Linux (creating them dynamically when
devices are found).
/var
/var stands for variable and contains files that are expected to change in
size and content as the system runs.
For example, the system log files are located at /var/log, the packages and
database files are located at /var/lib, the print queues are located at /var/spool,
temporary files stay inside /var/tmp, and network services can be found in
subdirectories such as /var/ftp and /var/www.
/etc
/etc stands for the system configuration files. It contains no binary programs,
but it might have some executable scripts.
For instance, the file /etc/resolv.conf tells the system which name servers to
use to resolve host names into IP addresses (i.e. DNS).
Additionally, the /etc/passwd file is the authoritative list of users on any Unix
system. It does not contain the passwords: the encrypted password
information was migrated into /etc/shadow.
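As a quick sketch (assuming a standard Linux system), you can peek at root's entry in /etc/passwd and see that the password field holds no hash:

```shell
# each line of /etc/passwd has 7 colon-separated fields:
# name:password:UID:GID:comment:home:shell
grep '^root:' /etc/passwd
# the second field is typically just "x", a placeholder
# indicating the real hash lives in /etc/shadow
```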
/lib
/lib contains libraries (common code shared by applications and needed for
them to run) for essential programs in /bin and /sbin.
These library filenames start with either ld or lib and are called dynamically
loaded libraries (or shared libraries).
/boot
/boot contains the few essential files needed to boot the system.
For every alternative kernel installed on the system, there are four files:
o vmlinuz: the compressed Linux kernel, required for booting.
o initramfs or initrd: the initial RAM filesystem, required for booting.
o config: the kernel configuration file, used for debugging.
o System.map: the kernel symbol table.
GRUB files can also be found here.
/opt
/opt is reserved for optional, self-contained third-party software packages.
/tmp
/tmp holds temporary files, which may be deleted when the system reboots.
/usr
/usr contains the non-essential programs, libraries, and shared data installed
by the distribution (e.g. /usr/bin, /usr/lib, /usr/share).
/dev Specials
There exist files provided by the operating system that do not represent any
physical device, but provide a way to access special features:
o /dev/null ignores everything written to it. It's convenient for discarding
unwanted output.
o /dev/zero provides an infinite stream of zero bytes, which can be
useful for creating files of a specified length.
o /dev/urandom and /dev/random provide an infinite stream of operating-
system-generated random numbers, available to any application that
wants to read them. The difference between them is that the second
guarantees strong randomness (it will wait until enough entropy is
available) and so it should be used for encryption, while the former
can be used for games.
For example, to output random bytes, you can type:
$ cat /dev/urandom | strings
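A small sketch of these devices in action, creating a 1 KiB file of zeros with dd and printing a few random bytes as hex:

```shell
# create a file of exactly 1024 zero bytes from /dev/zero
dd if=/dev/zero of=zeros.bin bs=1024 count=1 2>/dev/null

# read 8 bytes from /dev/urandom and show them as hex
head -c 8 /dev/urandom | od -An -tx1
```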
The Kernel
The Linux Kernel is the program that manages input/output requests from
software, and translates them into data processing instructions for the central
processing unit (CPU).
To find the Kernel information you can type:
$ cat /proc/version
Linux version 3.14.9-200.fc20.x86_64 ([email protected])
(gcc version 4.8.3 20140624 (Red Hat 4.8.3-1) (GCC) ) #1 SMP Thu Jun 26 21:40:51
UTC 2014
You can also print similar system information with uname, the command
dedicated to printing system information. The flag -a stands for all:
$ uname -a
Linux XXXXX 3.14.9-200.fc20.x86_64 #1 SMP Thu Jun 26 21:40:51 UTC 2014 x86_64
x86_64 x86_64 GNU/Linux
For instance, you might be interested in checking whether you are using the
latest kernel. You can do this by checking whether the outputs of the following
commands match:
Additionally, for Fedora (and RPM systems) you can check what kernels are
installed with:
$ rpm -q kernel
Processes
A running program is called a process. Each process has an owner (in the same
sense as when we talk about file permissions below).
You can find out which programs are running with the ps command. This also
gives the process ID, or PID, which is a unique identifier for the process
while it runs (different copies of a given program will have separate PIDs).
To put a job (process) in the background, we either run it with & or press
CTRL-Z and then type bg. To bring it back to the foreground, we type fg.
To get the list of running jobs in the shell, we type jobs. Each job has a job
ID, which can be used with the percent sign % as an argument to bg, fg, or kill
(described below).
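A minimal sketch of driving a background job from a script; here $! (the PID of the most recent background command) is used instead of a %n job ID, since job IDs are most convenient interactively:

```shell
sleep 30 &                       # run a long command in the background
pid=$!                           # $! holds its PID
jobs                             # list this shell's background jobs
kill "$pid"                      # send SIGTERM to stop it
wait "$pid" 2>/dev/null || true  # reap it; ignore the signal exit status
```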
ps
To see all of your processes, including those not started from the current
session, you can run:
$ ps x
To see every process on the system, along with its owner:
$ ps aux
To look for zombie processes (state Z):
$ ps aux | grep -w Z
or, to simply list every process:
$ ps -e
top
top gives a continuously updated view of the running processes and their
resource usage:
$ top
I particularly like htop over top, but it needs to be installed separately if
you want to use it.
kill
To stop a running command you can use kill. This will send a message
called a signal to the program. There are 64 different signals, and not all of
them mean "stop running":
Pressing CTRL-C is a simpler way to tell the program to quit; it sends a
signal called SIGINT. You can also specify the PID as an argument to kill.
uptime
Another great command is uptime, which shows how long the system has
been running, with a measure of its load average as well:
$ uptime
Finally, you can change a process's priority using nice (runs a program with a
modified scheduling priority) and renice (alters the priority of a running process).
Environment Variables
You can see the environment variables and configuration in your system with:
$ set
or
$ env
To set an environment variable and read it back, you can use export and echo:
$ export VAR=<value>
$ echo $VAR
The PATH (search path) is the list of directories that the shell looks in to try
to find a particular command. For example, when you type ls it will look
at /bin/ls. The path is stored in the variable PATH, which is a list of directory
names separated by colons, and it's usually set inside ~/.bashrc. To add a new
directory to the search path you can do:
$ export PATH=$PATH:/<DIRECTORY>
Variables in Scripts
Inside a running shell script, there are pseudo-environment variables called
$1, $2, etc., holding the individual arguments that were passed to the script
when it was run. In addition, $0 is the name of the script and $@ is the list
of all the command-line arguments.
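A tiny throwaway script (written to /tmp purely for illustration) shows these variables in action, plus $# for the argument count:

```shell
# write a script that echoes its name and arguments
cat > /tmp/args.sh <<'EOF'
#!/bin/bash
echo "script name: $0"
echo "first arg: $1"
echo "arg count: $#"
echo "all args: $@"
EOF
chmod +x /tmp/args.sh
/tmp/args.sh hello world
```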
Dot-files
The leading dot in a filename is an indicator to not list the file normally,
but only when it is specifically requested. The reason is that, generally,
dot-files are used to store configuration and sensitive information for
applications.
~/.bashrc
~/.bashrc contains scripts and variables that are executed when bash is
invoked.
It's a good experience to customize your ~/.bashrc. Just google for samples,
or take a look at this site dedicated to sharing dot-files, or at mine. Don't
forget to source your ~/.bashrc file every time you make a change (opening a
new terminal has the same effect):
$ source ~/.bashrc
Sensitive dot-files
If you use cryptographic programs such as ssh and gpg, you'll find that they
keep a lot of information in the directories ~/.ssh and ~/.gnupg.
If you are a Firefox user, the ~/.mozilla directory contains your web browsing
history, bookmarks, cookies, and any saved passwords.
If you use Pidgin, the ~/.purple directory (after the name of the IM library)
contains private information. This includes sensitive cryptographic keys for
users of cryptographic extensions to Pidgin such as Off-the-Record.
File Descriptors
A file descriptor (FD) is a number indicator for accessing an I/O resource. The
values are the following:
o fd 0: stdin (standard input).
o fd 1: stdout (standard output).
o fd 2: stderr (standard error).
This naming is used for manipulation of these resources in the command line.
For example, to send an input to a program you use <:
$ <PROGRAM> < <INPUT>
To send a program's output somewhere other than the terminal (such as a file),
you use >. For example, to just discard the output:
$ <PROGRAM> > /dev/null
To send the program's error messages to a file, you use the file descriptor 2:
$ <PROGRAM> 2> <FILENAME>
To send the program's error messages to the same place where stdout is
going, i.e. merging them into a single stream (this works great for pipelines):
$ <PROGRAM> 2>&1
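A self-contained sketch of all three redirections, using a command group that writes one line to each stream:

```shell
# one line to stdout, one line to stderr
{ echo "normal output"; echo "an error" >&2; } > out.txt 2> err.txt
cat out.txt     # normal output
cat err.txt     # an error

# merge both streams into a single file
{ echo "normal output"; echo "an error" >&2; } > both.txt 2>&1
```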
File Permissions
chmod
The Unix permissions model does not support access control lists that would
allow a file to be shared with an enumerated list of users for a particular
purpose. Instead, the admin needs to put all the users in a group and make the
file belong to that group. File owners cannot share files with an arbitrary
list of users.
There are three agents related to a resource: user, group, and others. Each of
them can have separate permissions to read, write, and execute.
To change the owner of a resource you use chown. There are two ways of
setting permissions with chmod:
o A numeric form using octal modes: read = 4, write = 2, execute = 1,
where you multiply by user = x100, group = x10, others = x1, and sum the
values corresponding to the granted permissions. For example, 755 =
700 + 50 + 5 = rwxr-xr-x: $ chmod 755 <FILENAME>
o An abbreviated letter-based form using symbolic modes: u, g, o, or a,
followed by a plus or minus, followed by a letter r, w, or x. This means
that u+x "grants the user execute permission", g-w "denies the group write
permission", and a+r "grants everyone read permission": $ chmod g-w <FILENAME>.
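A short sketch showing that the two notations compose (stat -c %a, available on GNU systems, prints the octal mode):

```shell
touch script.sh
chmod 755 script.sh    # numeric form: rwxr-xr-x
ls -l script.sh
chmod o-rx script.sh   # symbolic form: remove read/execute from others
stat -c %a script.sh   # prints 750
```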
To change the group you use chgrp, using the same logic as for chmod.
To see the file permissions in the current folder, type:
$ ls -l
For example, -rw-r--r-- means that it is a regular file (-) where the owner has
read (r) and write (w) permissions but not execute permission (-), while the
group and everyone else have read-only access.
Reading Files
cat
$ cat <FILENAME>
tac
Prints the contents of a file in reverse line order (starting from the
bottom):
$ tac <FILENAME>
less
less displays a file one screen at a time, allowing scrolling in both directions:
$ less <FILENAME>
more
more is an older pager that scrolls forward one screen at a time:
$ more <FILENAME>
nl
To print (cat) a file with line numbers:
$ nl <FILENAME>
tee
tee copies its input to standard output and also to the given files, which is
useful for saving intermediate results in a pipeline:
$ <PROGRAM> | tee <FILENAME>
wc
wc counts the lines, words, and bytes in a file:
$ wc <FILENAME>
diff
diff can be used to compare files and directories. Useful flags include: -c to
list differences, -r to recursively compare subdirectories, -i to ignore case,
and -w to ignore spaces and tabs.
You can compare three files at once using diff3, which uses one file as the
reference basis for the other two.
file
$ file requirements.txt
requirements.txt: ASCII text
grep
grep finds matches for a particular search pattern. The flag -l lists the files
that contain matches, the flag -i makes the search case insensitive, and the
flag -r searches all the files in a directory and its subdirectories:
$ grep -ri <PATTERN> <DIRECTORY>
ls
ls lists directories and files. Useful flags are -l to list the permissions of
each file in the directory and -a to include the dot-files:
$ ls -la
To list files sorted by size, in reverse order (largest last):
$ ls -lrS
To list the names of the 10 most recently modified files ending with .txt:
$ ls -t *.txt | head
tree
tree lists the contents of directories in a tree-like format.
find
find searches for files in a directory hierarchy. For example, to find a file
by name under the current directory:
$ find . -name <FILENAME>
which
which shows the full path of the command that would be executed:
$ which ls
whereis
whereis locates the binary, source, and manual pages for a command:
$ whereis <COMMAND>
locate
To find files by name (using database):
$ locate <FILENAME>
test
test checks file types and compares values. For example, to check whether a
file exists and is a regular file:
$ test -f <FILENAME>
Modifying Files
true
true does nothing and always succeeds; it's useful as a placeholder and for
writing infinite loops in scripts.
tr
tr takes a pair of strings as arguments and replaces, in its input, every
character that occurs in the first string with the corresponding character in
the second string. For example, to make everything lowercase:
$ tr A-Z a-z
The flag -d deletes the given characters instead; for example, to remove all
newlines:
$ tr -d '\n'
tr doesn't accept filenames as arguments, so we can pipe from cat to feed it
the contents of a file (same effect as $ <PROGRAM> < <FILENAME>):
$ cat "$@" | tr
sort
Sorts the contents of text files. The flag -r sorts in reverse, the flag -n
selects numeric sort order (for example, without it, 2 comes after 1000), and
the flag -m merges files that are already sorted:
$ sort -m <FILENAME1> <FILENAME2>
uniq
uniq removes adjacent duplicate lines (so the input is usually sorted first).
The flag -c prefixes each line with a count, and the flag -d prints only the
duplicated lines:
$ uniq -c <FILENAME>
$ uniq -d <FILENAME>
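These commands combine into the classic frequency-count pipeline, sort | uniq -c | sort -rn:

```shell
# count occurrences of each line, most frequent first
printf 'apple\nbanana\napple\napple\nbanana\n' | sort | uniq -c | sort -rn
# prints "3 apple" above "2 banana" (counts have leading padding)
```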
cut
cut selects particular fields (columns) from structured text files (or
particular characters from each line of any text file). The flag -d specifies
what delimiter should be used to divide the columns (the default is tab), and
the flag -f specifies which field or fields to print and in what order:
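For instance (assuming the standard /etc/passwd layout, where field 1 is the user name and field 7 the login shell):

```shell
# print user name and shell for the first few accounts
cut -d: -f1,7 /etc/passwd | head -5

# the same flags work on any delimited text
echo "2024-06-01" | cut -d- -f1    # prints 2024
```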
join
join joins the lines of two files that share a common field.
mkdir
mkdir creates a directory. A useful flag is -p, which creates the entire path
of directories (in case they don't exist):
$ mkdir -p <DIRNAME>
cp
Copying directory trees is done with cp. The flag -a is used to preserve all
metadata:
$ cp -a <ORIGIN> <DEST>
Interestingly, commands enclosed in $() are run first and then their output is
substituted in place, so it can be used as part of another command line:
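A sketch using command substitution to build a dated directory name:

```shell
today=$(date +%Y-%m-%d)     # capture the output of date
mkdir -p "backup-$today"    # splice it into another command
ls -d "backup-$today"
```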
The pushd command saves the current working directory in memory so it can
be returned to at any time, optionally changing to a new directory:
$ pushd ~/Desktop/
The popd command returns to the path at the top of the directory stack.
ln
Files can be linked under different names with ln. To create a symbolic (soft)
link you use the flag -s:
$ ln -s <TARGET> <LINKNAME>
dd
dd is used for disk-to-disk copies and is useful for making copies of raw disk
space. For example, to back up your Master Boot Record (MBR, the first 512-byte
sector of the disk):
$ dd if=/dev/sda of=sda.mbr bs=512 count=1
To copy one disk onto another:
$ dd if=/dev/sda of=/dev/sdb
du
du estimates file and directory space usage. The flags -s (summarize) and -h
(human-readable) are commonly combined:
$ du -sh
df
df reports filesystem disk space usage. The flag -h prints sizes in
human-readable units:
$ df -h
ifconfig
ifconfig reports information about your network interfaces:
$ ifconfig
In general, you will see the following devices when you issue ifconfig:
o eth0: shows the Ethernet card with information such as: hardware
(MAC) address, IP address, and the network mask.
o lo: loopback address or localhost.
ifconfig is deprecated in favor of the ip command. See my short guide on ip-netns.
dhclient
dhclient is used to obtain an IP address for your system from a DHCP server.
On the server side, Linux runs a daemon called dhcpd, which assigns IP
addresses to all the systems on the subnet (it also keeps log files):
$ dhclient
dig
dig queries DNS name servers and is useful for troubleshooting name resolution:
$ dig <DOMAIN>
netstat
netstat prints network connections and statistics. For example, to list the
listening TCP and UDP ports along with the owning processes:
$ netstat -tulpn
To connect to a host server, you can use netcat (nc) and telnet. To connect
under an encrypted session, ssh is used. For example, to send a string to a
host at port 3000:
$ echo '<STRING>' | nc <HOST> 3000
lsof
lsof lists open files (remember that everything is considered a file in Linux).
For example, to list the processes that have a particular file open:
$ lsof <FILENAME>
Useful Stuff
echo
echo prints its arguments as output. It can be useful for pipelining, in which
case you can use the flag -n to omit the trailing newline:
$ echo -n <STRING>
echo can also write to standard error inside scripts (remember the discussion
about file descriptors):
$ echo 'Done!' >&2
$ echo $PATH
bc
A calculator program is given by the command bc. The flag -l loads the
standard math library:
$ bc -l
To find information about logged-in users you can use the commands w, who,
finger, and users.
Regular Expression 101
To find lines that end with a particular string you can use the anchor $:
$ grep awesome$
As an extension, egrep uses a version called extended regular
expressions (EREs), which include things such as:
o () for grouping
o | for or
o + for one or more times
o \n for back-references (to refer to an additional copy of whatever was
matched before by parenthesis group number n in this expression).
For instance, you can use egrep '.{12}' to find words of at least 12 letters.
You can use egrep -x '.{12}' to find words of exactly twelve letters.
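A couple of runnable sketches of these extended features (egrep is equivalent to grep -E):

```shell
# grouping and alternation: match both spellings
printf 'gray\ngrey\ngroy\n' | grep -E 'gr(a|e)y'
# prints gray and grey, but not groy

# lines with at least 11 characters
printf 'cat\nconcatenate\n' | grep -E '.{11}'
# prints concatenate
```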
awk is a pattern scanning tool, while sed is a stream editor for filtering and
transforming text. While these tools are extremely powerful, if you have
knowledge of a very high-level language such as Python or Ruby, you don't
necessarily need to learn them.
sed
Let's say we want to replace every occurrence of mysql with MySQL (Linux is
case sensitive), and then save the result to a new file. We can write a
one-line command that says "search for the word mysql and replace it with the
word MySQL":
$ sed s/mysql/MySQL/g <OLDFILE> > <NEWFILE>
To pass an input through a stream editor and then quit after printing the
number of lines designated by the script's first parameter:
$ sed ${1}q
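A concrete run of such a substitution, piped from echo (s/old/new/g replaces every occurrence on each line):

```shell
echo "mysql is great and mysql is free" | sed 's/mysql/MySQL/g'
# prints: MySQL is great and MySQL is free
```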
at
A very cute bash command is at, which allows you to run processes later. Type
the commands at the prompt and finish with CTRL+D:
$ at 3pm
cron
If you have to run processes periodically, you should use cron, which is
already running as a system daemon. You can add a list of tasks in a file
named crontab and install those lists using a program also
called crontab. cron checks all the installed crontab files and run cron jobs.
To view the contents of your crontab, run:
$ crontab -l
To edit your crontab, run:
$ crontab -e
The format of a cron job entry is: min, hour, day, month, dow (day of the
week, where Sunday is 0). The fields are separated by tabs or spaces. The
symbol * means any, and it's possible to specify several values with commas.
For example, to run a backup every day at 5am, edit your crontab to:
0 5 * * * /home/files/backup.sh
Or if you want to remember some birthday, you can edit your crontab to:
0 9 <DAY> <MONTH> * echo 'Happy birthday!'
rsync
rsync performs file synchronization and file transfer. It can compress the data
transferred using zlib and can use SSH or stunnel to encrypt the transfer.
rsync is very efficient when recursively copying one directory tree to another
because only the differences are transmitted over the network.
Useful flags are: -e to specify SSH as the remote shell, -a for archive mode,
-r to recurse into directories, and -z to compress file data.
A very common set is -av, which makes rsync work recursively, preserving
metadata about the files it copies, and displaying the name of each file as it
is copied. For example, the command below transfers a directory to
the /planning subdirectory on a remote host:
$ rsync -av <DIRECTORY> <HOST>:/planning
File Compression
Historically, tar stood for tape archive and was used to archive files to a
magnetic tape. Today, tar is used to create or extract files from an archive
file, often called a tarball.
Additionally, you can add file compression, which works by finding
redundancies in a file (like repeated strings) and creating a more concise
representation of the file's content. The most common compression programs
are gzip and bzip2.
When issuing tar, the flag f (the archive filename) must be the last option.
No hyphen is needed, and you can add v for verbose output.
A simple tarball is created with the flag c:
$ tar cf <FILE.tar> <CONTENTS>
To extract the contents, use the flag x:
$ tar xf <FILE.tar>
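A round-trip sketch: archive a directory, delete the original, and restore it from the tarball:

```shell
mkdir -p project && echo "hello" > project/notes.txt
tar cf project.tar project   # c = create, f = archive filename
rm -r project                # remove the original tree
tar xf project.tar           # x = extract it back
cat project/notes.txt        # prints hello
```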
gzip
gzip is the most frequently used Linux compression utility. To create the
archive and compress it with gzip you use the flag z:
$ tar zcf <FILE.tar.gz> <CONTENTS>
bzip2
bzip2 produces smaller archives than gzip at the cost of speed. To create the
archive and compress with bzip2 you use the flag j:
$ tar jcf <FILE.tar.bz2> <CONTENTS>
xz
xz is the most space-efficient compression utility used in Linux. To create
the archive and compress with xz you use the flag J:
$ tar Jcf <FILE.tar.xz> <CONTENTS>
Logs
To report the most recent login of all users, you can use:
$ lastlog
If the last link to a file is deleted but the file is still open in some
program, we can still retrieve its content. This can be done, for example, by:
1. attaching a debugger like gdb to the program that has the file open and
commanding the program to read the content out of the file descriptor, or
2. copying the file content directly out of the open file descriptor
pseudo-file inside the /proc filesystem.
For example, if one runs $ dd if=/dev/zero of=trash & sleep 10; rm
trash, the available disk space on the system will continue to go down
(since more content gets written into the file to which dd is sending its
output).
However, the file can't be seen anywhere in the system! Only killing
the dd process will cause this space to be reclaimed.
Inodes
An index node (inode) is a data structure used to represent a filesystem object
such as a file or directory. The true name of a file, even when it has no other
name, is in fact its inode number within the filesystem where it was created,
which can be obtained with
$ stat
or
$ ls -i
Creating a hard link with ln results in a new name for the same inode as the
original, and running rm on one name won't affect the other:
$ echo awesome > awesome
$ cp awesome more-awesome
$ ln awesome same-awesome
$ ls -i *some
7602299 awesome
7602302 more-awesome
7602299 same-awesome
A Linux text file contains lines consisting of zero or more text characters,
followed by the newline character (ASCII 10, also referred to as hexadecimal
0x0A or '\n').
A text file with a single line containing the word 'Hello' in ASCII would be
6 bytes (one for each letter, plus one for the trailing newline). For example,
the text below:
$ cat text.txt
Hello everyone!
Linux is really cool.
Let's learn more!
is represented as a stream of bytes in which each line ends with a newline
character.
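You can verify the byte count with wc -c and inspect the raw characters, including the trailing newline, with od -c:

```shell
printf 'Hello\n' > hello.txt
wc -c < hello.txt   # prints 6: five letters plus the newline
od -c hello.txt     # shows   H   e   l   l   o  \n
```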
Password Generation
A handy function you can add to your ~/.bashrc to generate random passwords
(with a default length of 16):
genpass() {
local p=$1
[ "$p" == "" ] && p=16
tr -dc A-Za-z0-9_ < /dev/urandom | head -c ${p} | xargs
}
For example:
$ genpass
dIBObynGX9epYogz
$ genpass 8
c_yhmaXt
$ genpass 12
FZI2wz2LzyVQ
$ genpass 14
ZEfgQvpY4ixePt
Password Asterisks
By default, when you type your password in the terminal you see no feedback.
If you would like to see asterisks instead, run:
$ sudo visudo
and add the line:
Defaults pwfeedback
imagemagick
imagemagick is a suite of command-line tools for creating, editing, and
converting images (e.g. the convert command).
History
Type !! to run the last command in the history, !-2 for the command before
that, and so on.