Learning Linux Binary Analysis - Sample Chapter
This book will lead you into territory that is uncharted even
by some experts, right into the world of the computer hacker.
Uncover the secrets of Linux binary analysis with this handy guide
Preface
Software engineering is the act of creating an invention that exists, lives, and
breathes on a microprocessor. We call it a program. Reverse engineering is the act
of discovering exactly how that program lives and breathes. It is how we understand,
dissect, or modify the behavior of a program, using a combination of disassemblers
and reversing tools and relying on our hacker instincts to master the target program
that we are reverse engineering. To do this, we must understand
the intricacies of binary formats, memory layout, and the instruction set of the
given processor. We therefore become masters of the very life given to a program
on a microprocessor. A reverse engineer is skilled in the art of binary mastery. This
book is going to give you the proper lessons, insight, and tasks required to become
a Linux binary hacker. When someone can call themselves a reverse engineer, they
elevate themselves beyond the level of just engineering. A true hacker can not only
write code but also dissect code, disassembling the binaries and memory segments in
pursuit of modifying the inner workings of a software program; now that is power.
On both a professional and a hobbyist level, I use my reverse engineering skills in
the computer security field, whether it is vulnerability analysis, malware analysis,
antivirus software, rootkit detection, or virus design. Much of this book will be
focused towards computer security. We will analyze memory dumps, reconstruct
process images, and explore some of the more esoteric regions of binary analysis,
including Linux virus infection and binary forensics. We will dissect malware-infected
executables and infect running processes. This book is aimed at explaining
the necessary components for reverse engineering in Linux, so we will be going deep
into learning ELF (Executable and Linking Format), which is the binary format used
in Linux for executables, shared libraries, core dumps, and object files. One of the
most significant aspects of this book is the deep insight it gives into the structural
complexities of the ELF binary format. The ELF sections, segments, and dynamic
linking concepts are vital and exciting chunks of knowledge. We will explore the
depths of hacking ELF binaries and see how these skills can be applied to a broad
spectrum of work.
The goal of this book is to teach you to be one of the few people with a strong
foundation in Linux binary hacking, which will be revealed as a vast topic that opens
the door to innovative research and puts you on the cutting edge of low-level hacking
in the Linux operating system. You will walk away with valuable knowledge of Linux
binary (and memory) patching, virus engineering/analysis, kernel forensics, and the
ELF binary format as a whole. You will also gain more insights into program execution
and dynamic linking and achieve a higher understanding of binary protection and
debugging internals.
I am a computer security researcher, software engineer, and hacker. This book is
merely an organized observation and documentation of the research I have done
and the foundational knowledge that has manifested as a result.
This knowledge covers a wide span of information that can't be found in any one
place on the Internet. This book tries to bring many interrelated topics together into
one piece so that it may serve as an introductory manual and reference to the subject
of Linux binary and memory hacking. It is by no means a complete reference but
does contain a lot of core information to get started with.
Linux tools
Throughout this book, we will be using a variety of free tools that are accessible by
anyone. This section will give a brief synopsis of some of these tools for you.
GDB
The GNU Debugger (GDB) is useful for more than debugging buggy applications. It can also
be used to learn about a program's control flow, change a program's control flow,
and modify the code, registers, and data structures. These tasks are common for a
hacker who is working to exploit a software vulnerability or is unraveling the inner
workings of a sophisticated virus. GDB works on ELF binaries and Linux processes.
It is an essential tool for Linux hackers and will be used in various examples
throughout this book.
We will be exploring objdump and other tools in great depth during our introduction
to the ELF format in Chapter 2, The ELF Binary Format.
Chapter 1
To copy the .data section from an ELF object to a file, use this line:
objcopy --only-section=.data <infile> <outfile>
The objcopy tool will be demonstrated as needed throughout the rest of this book.
Just remember that it exists and can be a very useful tool for the Linux binary hacker.
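To make the idea of pulling a single section out of an ELF object more concrete, here is a minimal sketch in Python. It is not how objcopy is implemented; it simply walks the section header table and returns the raw bytes of the named section, which is closer in spirit to combining --only-section with -O binary. The function name extract_section is made up for illustration, and the sketch assumes a 64-bit little-endian ELF file.

```python
import struct

# ELF64 little-endian layouts (field order follows the ELF specification).
EHDR = struct.Struct("<16sHHIQQQIHHHHHH")   # ELF file header, 64 bytes
SHDR = struct.Struct("<IIQQQQIIQQ")         # section header, 64 bytes
SHT_NOBITS = 8                              # e.g. .bss: occupies no space in the file


def extract_section(image: bytes, wanted: str) -> bytes:
    """Return the raw file contents of the section named `wanted` (hypothetical helper)."""
    if image[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    if image[4] != 2 or image[5] != 1:
        raise ValueError("this sketch handles only 64-bit little-endian ELF")

    header = EHDR.unpack_from(image, 0)
    e_shoff, e_shentsize = header[6], header[11]
    e_shnum, e_shstrndx = header[12], header[13]

    # Read every section header from the section header table.
    sections = [SHDR.unpack_from(image, e_shoff + i * e_shentsize)
                for i in range(e_shnum)]

    # Section names are offsets into the section header string table.
    _, _, _, _, str_off, str_size, *_ = sections[e_shstrndx]
    strtab = image[str_off:str_off + str_size]

    for sh_name, sh_type, _, _, sh_offset, sh_size, *_ in sections:
        name_end = strtab.index(b"\0", sh_name)
        if strtab[sh_name:name_end].decode() == wanted:
            if sh_type == SHT_NOBITS:
                return b""
            return image[sh_offset:sh_offset + sh_size]
    raise KeyError(wanted)
```

Given the bytes of an object file read with open(path, "rb").read(), calling extract_section(image, ".data") returns the section contents, which can then be written to an output file.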
strace
System call trace (strace) is a tool based on the ptrace(2) system call. It uses
the PTRACE_SYSCALL request in a loop to show information about the system call
(syscall) activity of a running program, as well as the signals that are caught
during execution. This program can be highly useful for debugging, or just for
collecting information about which syscalls are made at runtime.
This is the strace command used to trace a basic program and write the trace to ls.out:
strace -o ls.out /bin/ls
The initial output will show you the file descriptor number of each system call that
takes a file descriptor as an argument, such as this:
SYS_read(3, buf, sizeof(buf));
If you want to see all of the data that was read from file descriptor 3, you can
run the following command:
strace -e read=3 /bin/ls
You may also use -e write=fd to see written data. The strace tool is a great little
tool, and you will undoubtedly find many reasons to use it.
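Once a trace has been saved to a file, it is often handy to summarize it. The following is a minimal sketch, not part of strace itself, that tallies how many times each syscall appears in a saved log. It assumes the usual one-call-per-line format of name(args) = result; the function name count_syscalls and the sample log are made up for illustration.

```python
import re
from collections import Counter

# Matches the syscall name at the start of a log line, e.g. "read(3, ...".
# An optional PID prefix (as written when following child processes) is skipped.
# Signal lines ("--- SIGCHLD ... ---") and exit lines ("+++ exited ... +++")
# do not match and are therefore ignored.
SYSCALL_RE = re.compile(r"^(?:\[pid\s+\d+\]\s+|\d+\s+)?([a-z_][a-z0-9_]*)\(")


def count_syscalls(log_text: str) -> Counter:
    """Count syscall names in the text of a saved strace log (hypothetical helper)."""
    counts = Counter()
    for line in log_text.splitlines():
        match = SYSCALL_RE.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Illustrative log content only; a real trace of /bin/ls is much longer.
SAMPLE_LOG = """\
execve("/bin/ls", ["/bin/ls"], 0x7ffd1c2a3b10 /* 24 vars */) = 0
openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
read(3, "ELF"..., 832) = 832
read(3, "", 4096) = 0
close(3) = 0
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED} ---
+++ exited with 0 +++
"""

if __name__ == "__main__":
    for name, count in count_syscalls(SAMPLE_LOG).most_common():
        print(f"{count:4d}  {name}")
```

For a real trace, pass the contents of the log file instead of SAMPLE_LOG, for example count_syscalls(open("ls.out").read()).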
ltrace
Library trace (ltrace) is another neat little tool, and it is very similar to strace. It
works similarly, but it parses the shared library linking information of a
program and prints the library functions being called.
ftrace
Function trace (ftrace) is a tool designed by me. It is similar to ltrace, but it also
shows calls to functions within the binary itself. There was no other tool I could find
publicly available that could do this in Linux, so I decided to code one. This tool can
be found at https://github.com/elfmaster/ftrace. A demonstration of this tool
is given in the next chapter.
readelf
The readelf command is one of the most useful tools around for dissecting ELF
binaries. It provides every bit of ELF-specific data needed to gather
information about an object before reverse engineering it. This tool will be used
often throughout the book to gather information about symbols, segments, sections,
relocation entries, dynamic linking data, and more. The readelf command is the
Swiss Army knife of ELF. We will be covering it in depth as needed, during Chapter 2,
The ELF Binary Format, but here are a few of its most commonly used flags:
issues/63/9.txt)
/proc/<pid>/maps
The /proc/<pid>/maps file contains the layout of a process image by showing each
memory mapping. This includes the executable, shared libraries, stack, heap, VDSO,
and more. This file is critical for being able to quickly parse the layout of a process
address space and is used more than once throughout this book.
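Because each line of the maps file follows a fixed field order (address range, permissions, offset, device, inode, and an optional pathname), it is easy to parse programmatically. The following is a minimal sketch of such a parser; the names Mapping and parse_maps, and the sample content, are made up for illustration.

```python
from typing import NamedTuple


class Mapping(NamedTuple):
    start: int    # first address of the mapping
    end: int      # address just past the end of the mapping
    perms: str    # e.g. "r-xp"
    offset: int   # offset into the backing file
    path: str     # backing file, "[heap]", "[stack]", ... or "" if anonymous


def parse_maps(text: str) -> list:
    """Parse the text of a /proc/<pid>/maps file (hypothetical helper)."""
    mappings = []
    for line in text.splitlines():
        # Fields: address-range perms offset dev inode [pathname]
        fields = line.split(None, 5)
        if len(fields) < 5:
            continue
        start, end = (int(part, 16) for part in fields[0].split("-"))
        path = fields[5].strip() if len(fields) > 5 else ""
        mappings.append(Mapping(start, end, fields[1], int(fields[2], 16), path))
    return mappings


# Illustrative content only; addresses and paths will differ on a real system.
SAMPLE_MAPS = """\
08048000-08049000 r-xp 00000000 08:01 1054231    /home/user/hello
08049000-0804a000 rw-p 00000000 08:01 1054231    /home/user/hello
0804a000-0806b000 rw-p 00000000 00:00 0          [heap]
b7e1a000-b7fc3000 r-xp 00000000 08:01 395412     /lib/i386-linux-gnu/libc-2.19.so
bfd7e000-bfd9f000 rw-p 00000000 00:00 0          [stack]
"""

if __name__ == "__main__":
    for m in parse_maps(SAMPLE_MAPS):
        print(f"{m.start:#010x}-{m.end:#010x} {m.perms} {m.path}")
```

On a live Linux system, the same function can be fed the contents of /proc/self/maps, or of /proc/<pid>/maps for a process you are permitted to inspect.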
/proc/kcore
/proc/kcore is an entry in the proc filesystem that acts as a dynamic core file
of the Linux kernel. That is, it is a raw dump of memory that is presented in the form
of an ELF core file that can be used by GDB to debug and analyze the kernel. We will
explore /proc/kcore in depth in Chapter 9, Linux /proc/kcore Analysis.
/boot/System.map
This file is available on almost all Linux distributions and is very useful for kernel
hackers. It contains every symbol for the entire kernel.
/proc/kallsyms
The kallsyms file is very similar to System.map, except that it is a /proc entry,
which means that it is maintained by the kernel and is dynamically updated. Therefore, if
any new LKMs are installed, their symbols will be added to /proc/kallsyms on the
fly. /proc/kallsyms contains at least most of the symbols in the kernel and will
contain all of them if the CONFIG_KALLSYMS_ALL kernel config option is enabled.
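Both System.map and /proc/kallsyms list one symbol per line as an address, a one-letter type, and a name; kallsyms additionally appends the module name in square brackets for symbols that belong to an LKM. The following is a minimal sketch of a lookup table built from that text. The function name parse_symbols and the sample addresses, symbols, and module name are made up for illustration.

```python
def parse_symbols(text: str) -> dict:
    """Map symbol name -> (address, type, module or None) (hypothetical helper).

    Works on the text of System.map or /proc/kallsyms. If a name appears
    more than once, the first occurrence is kept.
    """
    symbols = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        address = int(fields[0], 16)
        sym_type = fields[1]
        name = fields[2]
        # kallsyms appends "[module]" for symbols that come from an LKM.
        module = fields[3].strip("[]") if len(fields) > 3 else None
        symbols.setdefault(name, (address, sym_type, module))
    return symbols


# Illustrative content only; real addresses vary per kernel build and boot.
SAMPLE_KALLSYMS = """\
ffffffff81000000 T _text
ffffffff810002b0 T do_one_initcall
ffffffff81801400 R sys_call_table
ffffffffa0012000 t demo_init\t[demo_lkm]
"""

if __name__ == "__main__":
    table = parse_symbols(SAMPLE_KALLSYMS)
    address, sym_type, module = table["sys_call_table"]
    print(f"sys_call_table is at {address:#x} (type {sym_type})")
```

Note that on many systems, /proc/kallsyms shows all addresses as zero to unprivileged users, so reading real addresses typically requires root.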
/proc/iomem
The iomem file is a useful proc entry as it is very similar to /proc/<pid>/maps, but for
all of the system memory. If, for instance, you want to know where the kernel's text
segment is mapped in physical memory, you can search for the Kernel string
and you will see the code/text segment, the data segment, and the bss segment:
$ grep Kernel /proc/iomem
01000000-016d9b27 : Kernel code
016d9b28-01ceeebf : Kernel data
01df0000-01f26fff : Kernel bss
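The ranges in this output are inclusive physical addresses in hexadecimal, so the size of each region is end - start + 1. As a minimal sketch, the following parses lines of this form and computes the size of each region; the function name parse_iomem is made up for illustration, and the sample text is the grep output shown above.

```python
def parse_iomem(text: str) -> list:
    """Parse /proc/iomem-style lines into (name, start, end, size) tuples
    (hypothetical helper)."""
    regions = []
    for line in text.splitlines():
        span, sep, name = line.partition(" : ")
        if not sep:
            continue
        start_s, _, end_s = span.strip().partition("-")
        start, end = int(start_s, 16), int(end_s, 16)
        # Ranges are inclusive, so the size is end - start + 1.
        regions.append((name.strip(), start, end, end - start + 1))
    return regions


SAMPLE_IOMEM = """\
01000000-016d9b27 : Kernel code
016d9b28-01ceeebf : Kernel data
01df0000-01f26fff : Kernel bss
"""

if __name__ == "__main__":
    for name, start, end, size in parse_iomem(SAMPLE_IOMEM):
        print(f"{name:12s} {start:#010x}-{end:#010x} {size / 1024:.0f} KiB")
```

As with kallsyms, recent kernels typically show zeroed addresses in /proc/iomem to non-root users, so run as root when reading the real file.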
ECFS
Extended core file snapshot (ECFS) is a special core dump technology that was
specifically designed for advanced forensic analysis of a process image. The code for
this software can be found at https://github.com/elfmaster/ecfs. Also, Chapter 8,
ECFS Extended Core File Snapshot Technology, is solely devoted to explaining what
ECFS is and how to use it. Those of you who are into advanced memory forensics
will want to pay close attention to this.
0x8048034
AT_PHENT:  32
AT_PHNUM:  9
AT_BASE:   0xb777a000
AT_FLAGS:  0x0
AT_ENTRY:  0x8048eb8
AT_UID:    1000
AT_EUID:   1000
AT_GID:    1000
AT_EGID:   1000
AT_SECURE: 0
The auxiliary vector will be covered in more depth in Chapter 2, The ELF Binary Format.
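The same information is also exposed by the kernel in raw form through /proc/<pid>/auxv, as a sequence of (type, value) pairs of native word size terminated by an AT_NULL entry. The following is a minimal sketch of decoding that data; the function name parse_auxv is made up for illustration, only a handful of the AT_* types from elf.h are named, and the word size must be supplied by the caller.

```python
import struct

# A subset of the AT_* type values defined in <elf.h>.
AT_NULL = 0
AT_NAMES = {
    3: "AT_PHDR", 4: "AT_PHENT", 5: "AT_PHNUM", 6: "AT_PAGESZ",
    7: "AT_BASE", 8: "AT_FLAGS", 9: "AT_ENTRY", 11: "AT_UID",
    12: "AT_EUID", 13: "AT_GID", 14: "AT_EGID", 23: "AT_SECURE",
}


def parse_auxv(raw: bytes, word_size: int = 8) -> dict:
    """Decode raw auxiliary vector bytes into {name: value} (hypothetical helper).

    word_size is 8 for a 64-bit process and 4 for a 32-bit one. Native byte
    order is assumed, which is what /proc/<pid>/auxv uses.
    """
    fmt = "=QQ" if word_size == 8 else "=II"
    pair_size = struct.calcsize(fmt)
    usable = len(raw) - (len(raw) % pair_size)   # ignore any trailing partial pair
    entries = {}
    for a_type, a_val in struct.iter_unpack(fmt, raw[:usable]):
        if a_type == AT_NULL:
            break
        entries[AT_NAMES.get(a_type, f"AT_{a_type}")] = a_val
    return entries


if __name__ == "__main__":
    # Linux only: decode the auxiliary vector of the current 64-bit process.
    with open("/proc/self/auxv", "rb") as f:
        for name, value in parse_auxv(f.read()).items():
            print(f"{name}: {value:#x}")
```

Types that are not in the AT_NAMES table are reported under a generic AT_<number> key rather than being dropped.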
Linker scripts
Linker scripts are a point of interest to us because they are interpreted by the linker
and help shape a program's layout with regard to sections, memory, and symbols.
The default linker script can be viewed with ld -verbose.
The ld linker program has a complete language that it interprets when it is taking
input files (such as relocatable object files, shared libraries, and static archives), and it
uses this language to determine how the output file, such as an executable program,
will be organized. For instance, if the output is an ELF executable, the linker script
will help determine what the layout will be and what sections will exist in which
segments. Here is another instance: the .bss section is always at the end of the data
segment; this is determined by the linker script. You might be wondering how this is
interesting to us. For one, it is important to have some insight into the linking
process during compile time. gcc relies on the linker and other programs to
perform this task, and in some instances, it is important to be able to control
the layout of the executable file. The ld command language is quite an in-depth
language and is beyond the scope of this book, but it is worth checking out. While
reverse engineering executables, remember that common segment addresses
may sometimes be modified, and so can other portions of the layout; this indicates
that a custom linker script is involved. A linker script can be specified with gcc using
the -T flag. We will look at a specific example of using a linker script in Chapter 5,
Linux Binary Protection.
Summary
We just touched upon some fundamental aspects of the Linux environment and the
tools that will be used most commonly in the demonstrations from each chapter.
Binary analysis is largely about knowing the tools and resources that are available
to you and how they all fit together. We only briefly covered the tools, but we will
get an opportunity to emphasize the capabilities of each one as we explore the vast
world of Linux binary hacking in the following chapters. In the next chapter, we will
delve into the internals of the ELF binary format and cover many interesting topics,
such as dynamic linking, relocations, symbols, sections, and more.
www.PacktPub.com