Basics of Reverse Engineering
Table of Contents
Basics of Reverse Engineering for x86
Numbers and numbering systems
  Integers (whole numbers) versus floating-point (fractional numbers)
  Numbering systems
  Number ranges
  Negative values
Basic Computer Architecture
  Registers
  Memory
  Status flags
  Floating point stack
Program code
  Concepts and tools
  Assembler code
  Mathematical operations on integer values
  Mathematical operations on floating-point values
  Logical operations
    Bitwise operations
    Bit rotation
  Data movement instructions
    Stack operations
    Moving a byte, word, dword
    Widening data (bytes into words, words into dwords)
    Moving larger amounts of data
      Direction flag
      Optimization
    Working with pointers
      LEA math
  Comparison instructions
    Conditions for signed values
    Conditions for unsigned values
  Code flow instructions
    Jumps
    Function calls
      Passing arguments to a function
      Expendable and non-expendable registers
      Function result
      Function return address
      Function environment
      Indirect function calls
      Examples
Higher-level constructs expressed in assembler
  C constructs
    Arrays
      Strings
      Allocation
    Constants
    Pointer math
    Structs
    Conditional statements
      Labels
      If
      If-else
      If-elseif-else
    Loops
      While loop
      Do-while loop
      for loop
      Switch statements
        Tightly grouped cases with multiple different actions
        Tightly grouped cases with few different actions
        Ungrouped cases
    Varargs
  Additional C++ constructs
    Constructors and destructors
      Heap memory
      Independently allocated memory / placement new
      Local variable allocation
    Enumerators
    Methods
    Inheritance
    Virtual functions
    COM
    Templates
Appendix A: Syringe
Appendix B: Security considerations
  String overflows
  Integer overflows
  Use of uninitialized values
Appendix C: analysing gamemd.exe
  Reading the program code in IDA Pro
    How-tos
Appendix D: reading crash dumps from the game
usually written down in base-2 and stored in special binary formats which are irrelevant for
now.
Number ranges
When performing math, we humans can calculate 10 + 1 just as easily as 1000000000000 + 1.
Computers, on the other hand, do not have such luxury and are designed to operate
effectively on fixed number ranges. Operating outside those ranges is possible with special
libraries, but it tends to be slower. Certain higher-level programming languages abstract
these complications away from the programmer and always use the slower way, since it's
more correct and less surprising, but those are not the languages we're interested in.
x86, like most other computers today, operates on binary numbers. A single binary digit (bit)
can only have one of two values, 0 or 1. Bits are usually grouped into groups of 8. 8 bits can
represent unsigned numbers between 0 and 2^8-1 = 255, or signed values between -128 and
127. A value with 8 bits is called a byte. Two bytes (16 bits) make up a word. (Technically, the
term "word" used to refer to the "machine word", which is a value that a machine can operate
on in one step without breaking it down into smaller pieces.) A word can represent unsigned
values between 0 and 2^16-1 = 65535, or signed values between -32768 and 32767. Four
bytes (32 bits) are called a dword (double word) and can represent values between 0 and
2^32-1 = 4294967295, or between -2147483648 and 2147483647. Eight bytes (64 bits) are
called a qword (quad word) and can represent values between 0 and 2^64-1.
One hexadecimal digit can represent 4 binary digits, so a byte needs only 2 hexadecimal
digits instead of 8 binary ones. For this reason, hexadecimal representation is much more
popular. A group of 4 bits (one hexadecimal digit's worth) is also called a nibble.
Negative values
Representation of negative values is more complex than simply summing up all the digits'
values, because the computer doesn't have a separate place to store the signedness of the
value. Instead, it is up to the code to know whether to interpret a given value as a signed or
an unsigned number. If the value is signed, the highest (leftmost) bit is used as a sign
indicator and not as part of the value. If the value is negative, the values of all the bits except
the sign bit are inverted, and 1 is added to the resulting value, yielding the absolute value of
the number that is being represented. (This scheme is known as two's complement.) The
representation of non-negative values within the range [0; 011...1] is the same in signed and
unsigned modes. In unsigned mode, bit patterns from 000...0 to 111...1 represent values in
correct increasing order. In signed mode, bit patterns from 000...0 to 011...1 represent 0 and
the positive values in increasing order, 100...0 represents the smallest representable
negative value, and 111...1 represents -1.
(Register diagram: the low 16 bits of EAX form AX; AX in turn splits into the high byte AH
(bits 15-8) and the low byte AL (bits 7-0).)
Memory
In 32-bit mode, a process (program) can use 4 Gigabytes of memory (since it can only address
that much with 32 bits). In certain operating systems, additional limitations and workarounds
apply. Note that the operating system doesn't let the program access raw memory, but
instead lets it operate on Virtual Memory. So even if the actual computer has less than 4
Gigabytes of physical memory, the program will not have any problems. In fact, the OS lets
each program have its own 4 Gigabytes without interfering with other programs.
This memory space contains everything the program needs to run. That includes the actual
program code, the data it operates on, and the operating system code necessary to run the
program. Free memory that the program can use is called the Heap, and dynamically
allocated memory usually comes from there. A specific area of the memory is reserved for a
Stack, which contains small amounts of data that are modified very often, including the
program's current call stack (what function called the current one, what function called that,
etc.). Assuming you are familiar with C++, you should know that the Stack is where functions'
automatic local variables are allocated, whereas manually allocated objects (managed
through new/delete) live on the Heap, and global and static variables are placed in the
program's data sections.
The Stack operates in a pretty simple manner: normally, the last value you put on it is the
first one you will get out of it. What's not so obvious is that it starts at a fairly high memory
address and grows towards lower ones.
The ESP register always contains the address of the topmost value of the stack. Therefore,
adding a new value to the stack decrements ESP by the size of that value (which is typically
32 bits, i.e. 4 bytes), and removing a value increases ESP by the same amount.
Status flags
The status flags pseudo-register contains several logical indicators, such as "overflow
happened", "carry out of the top bit", "sign of the value" and "value is zero", that are updated
by most mathematical operations. These indicators are often used to make logical decisions
(if-else logic). It also contains some control indicators, such as "direction", which is used by
repeated instructions to decide whether to operate on increasing or decreasing addresses.
The precise layout of this pseudo-register is outlined in Intel's reference manuals (Volume I,
page 3-20) and any other reference, and because it isn't really relevant for us, it will not be
repeated here. A list of the most commonly used flags should be sufficient:
Carry Flag (CF): if set, the last operation resulted in an unsigned integer overflow.
Overflow Flag (OF): if set, the last operation resulted in a signed integer overflow.
Zero Flag (ZF): set if the result is zero.
Sign Flag (SF): set to the value of the sign bit of the result; when the result is treated as a
signed integer, this is 1 if it is negative, otherwise 0.
Direction Flag (DF): controls the direction of certain looping constructs; it will be explained
later.
Note that most mathematical operations such as addition or subtraction evaluate their result
in both signed and unsigned modes and set all the related flags appropriately.
Floating point stack
The FPU contains 8 80-bit registers that are used to store floating-point values for short
periods of time. This collection of registers is treated as a pseudo-stack, which operates and
is addressed in a slightly different manner from the stack referenced by ESP. The FPU status
word tracks which of the 8 registers currently contains the top of the stack; adding a value to
the stack or removing one changes that register's value. The registers do not have any fixed
names: when the code needs to refer to them, the topmost register is called st or st(0), the
next one is called st(1), and so on. Notice that after adding a new value to the stack, the
register that was st(0) becomes st(1), and so on. Removing a value operates in the reverse
direction.
A register similar to EFLAGS exists to indicate similar attributes, but it cannot be accessed
directly; special instructions must be used to copy its value to the AX register before using it.
The precise layout of this pseudo-register is outlined in Intel's reference manuals (Volume I,
page 8-10). There are three useful flags in this register:
the C0 flag: equivalent to the Carry Flag, indicated by AH & 0x1;
the C2 flag: equivalent to the Parity Flag, AH & 0x4;
the C3 flag: equivalent to the Zero Flag, AH & 0x40.
Program code
Concepts and tools
On the lowest level, a program's code, like any other data, is a sequence of bytes that the
CPU interprets as instructions. These numeric instructions have textual mnemonics that are
easier to read and understand. An instruction performs a very small task, such as moving a
value from one register to another, or multiplying it by a different value.
An assembler takes the mnemonics, turns them into the sequence of bytes and places them
in an executable. Examples of these tools include MASM (Microsoft Assembler), NASM and
GAS (GNU Assembler).
A compiler takes code written in a high-level language, such as C++, and transforms it into
assembler code, which is just a fancy name for these same mnemonics. The mnemonics are
then typically fed to the assembler. The most popular C++ compilers for x86 are Microsoft's
Visual C++ (known as MSVC) and GNU's G++.
A tool called a disassembler takes the sequence of bytes and displays the corresponding
mnemonic sequence. However, the information that was contained in the original high-level
code, such as actual operations, variable names, function names, data types, etc., is not
saved in the executable and therefore cannot be recovered. What the disassembler gives you
is a very low-level view of what the program does, like "the value from this address is read,
multiplied by 3 and placed into that address". This naturally takes more lines to express
things that take just a few lines in high-level code. Obviously, this makes the code more
difficult to understand. Popular disassemblers are IDA Pro, OllyDbg and Linux's objdump -d.
More powerful disassemblers contain helpful functionality like automatically adding empty
lines around logical sections (e.g., before/after the loop iteration code).
A debugger lets you inspect a running process step by step: you can make the program stop
after each operation, inspect its variables and observe the code flow. This is very helpful
when trying to figure out a bug in the program, or in a myriad of other scenarios. Any decent
development environment for a higher-level language includes a debugger for that language,
and typically that allows you to see the corresponding assembler code as well. Virtually every
disassembler includes a debugger, as both of those tools are very useful and complement
each other really well. Debuggers can operate either at user level (as a normal program,
where they can debug ordinary programs) or at kernel level, where they can debug even the
operating system and the drivers. For example, OllyDbg and IDA Pro both contain powerful
user-level debuggers. SoftICE (now known as NuMega Driver Studio) is an incredibly powerful
kernel-level debugger. Microsoft's Debugging Tools for Windows package contains several
debuggers for both user- and kernel-level debugging. Unix developers typically use GDB.
If a program was built in debug mode, it typically contains symbols (data structures, types,
variable and function names) that simplify debugging. Most debuggers and disassemblers
can make good use of these to make the disassembled code more readable.
When a program crashes, the operating system or the program's own crash handling code
can make a memory snapshot (process dump) of the whole program and save it. Certain
debuggers and disassemblers can read these snapshots, so the developers can try to figure
out what caused the crash.
A decompiler attempts to go one step further than the disassembler and turn the sequence of
bytes, or the equivalent assembler code, back into high-level code, usually C. Due to the
previously mentioned limitations, this process heavily depends on the ability to identify the
compiler that was used and on the decompiler's familiarity with that compiler's code
generation. Even then, the generated code is not going to look much like the original code,
but it is still immensely helpful when reverse engineering. The most famous decompiler is IDA
Pro's companion Hex-Rays.
Assembler code
Assembler (the mnemonics for CPU instructions) is what a reverse engineer spends most of
his/her time looking at, so being familiar with it is very important.
There are two main syntaxes for x86 assembler: one created by Intel and widely used in the
Windows environment, and another created by AT&T and preferred in Unix environments.
Microsoft Visual C++ only supports the Intel syntax, whereas GCC defaults to AT&T syntax but
has a flag to use Intel syntax instead. In my opinion, Intel's syntax is more readable, but that's
a matter of opinion, of course.
In both syntaxes, an instruction begins with optional prefixes, followed by the instruction
code, followed by optional arguments. In Intel syntax, the first argument is the destination
and any following ones are sources, while AT&T's syntax is more mundane and puts the
destination argument at the end. AT&T's syntax also uses very different expressions for
arguments, which we will not go deeply into.
x86 has a large number of instructions, not all of which are equally useful. The best reference
for these is the Intel IA-32 Programmer's Reference multi-volume book set, available as free
PDFs from Intel's website. It used to also be available in free dead-tree versions, but they
don't do that any more. It's hard to overstate the importance of having these for easy
reference.
Manually writing a complete program in assembler is not often done these days. Manual
assembly used to be reserved for small performance-critical code blocks, but modern
compilers can often generate better code than humans. Instruction set extensions such as
MMX or SSE contain instructions that can be used to parallelize operations and perform other
tasks faster, but C/C++ compilers do not always have enough information in the source code
to determine which extensions would be safe to apply, so they play it safe; this is an area
where manual code generation can still yield better performance.
For similar reasons, this document will show small snippets of assembly code, but not entire
programs.
Mathematical operations on integer values
Mathematical operations usually take two operands, apply the operation and put the result
into the destination operand. Abstractly, it looks like this: dst = dst <op> src, where <op>
represents the mathematical operation.
Adding two numbers is done simply with an ADD instruction: ADD EAX, EBX. Same for
subtraction: SUB EBX, EAX.
Multiplication is a little trickier, since it can result in much larger values than either operand.
It also yields different results depending on the signedness of the operands. Therefore there
are two instructions to perform multiplication, MUL (unsigned) and IMUL (signed), and their
result is 64 bits wide: its top dword is put into EDX and the bottom one into EAX. This is called
a register pair and written as EDX:EAX.
Division is more complex still. Again we have two instructions, DIV (unsigned) and IDIV
(signed); the quotient is placed into EAX and the remainder into EDX. Note that these perform
integer division (5 / 2 = 2); to get floating-point results you need the floating-point
instructions.
Mathematical operations on floating-point values
Operations on floating-point values are usually named similarly to the integer ones: addition
is performed by FADD, subtraction by FSUB, and so on. They follow the same pattern of
dst = dst <op> src, but their operands are usually the floating-point registers.
Logical operations
Bitwise operations
Bitwise operations (OR, XOR, AND...) follow the same operand/result rules as mathematical
ones.
OR EAX, 1h sets the lowest bit of EAX.
AND EAX, 1111b clears all bits of EAX except the bottom four.
NOT EAX flips each bit from 0 to 1 and vice versa, e.g. 0101b becomes 1010b.
NEG EAX changes the sign of the value: if EAX contained 44, it will now contain -44. This is
done according to the rules described in the "Negative values" section.
Bit rotation
Sometimes it is useful to shift or rotate the bits that represent a value. For example, if we
move all of the value's bits one position to the left, we get a value that is exactly 2 times
larger than the original. If we move them to the right instead, we get a value that is 2 times
smaller. Other bit manipulation methods exist, but they are useful under more specific
circumstances.
The SHL EAX, 2 instruction shifts all bits two positions to the left and sets the bottom two bits
to zero. This yields a value that is 2^2 = 4 times larger than the original one (this loses the
values of the top two bits, so we actually get four times the value of the bottom 30 bits of
EAX).
The SHR EAX, 2 instruction shifts all bits two positions to the right and fills the top two bits
with zero. This basically removes the sign, divides the value by 2^2 and drops the remainder.
The SAR EAX, 2 instruction shifts all bits two positions to the right and fills the top two bits
with the value of the original top bit. This is analogous to SHR, but it retains the sign bit,
making it suitable for signed values.
The ROL EAX, 3 instruction rotates the bits in EAX three positions to the left; the three bottom
bits are filled with the values of the three formerly topmost bits. This is useful in
cryptographic routines, but it has no simple mathematical interpretation.
The ROR EAX, 3 instruction does the same as ROL, but to the right instead of the left.
Using SHL/SHR used to be a fast way to perform multiplication/division by powers of two.
Nowadays, processors are fast enough that the dedicated multiplication/division instructions
are just as adequate.
Data movement instructions
One of the key tasks a CPU has to do is move data from point A to point B, for example,
calculating X+Y and putting the result somewhere other code can find it.
Stack operations
The Stack uses some specific terminology: "pushing" a value refers to adding a new value at
the top of the stack, and "popping" a value means the reverse, removal of the topmost value.
This terminology is reflected in the instruction names:
PUSH arg pushes the argument onto the stack (and decrements the stack pointer). In 32-bit
x86 code, this usually means 32-bit operands.
POP reg moves the topmost value off the stack into the specified register and increments the
stack pointer.
Moving a byte, word, dword
Moving a small amount of data that fits in one register can be done by a single MOV
instruction.
For example, assembler code to move the value 42 into the EAX register would look like MOV
EAX, 42. Moving the value of ECX into EAX would be MOV EAX, ECX. Moving the DWORD from
memory address 0x102030 into EAX is MOV EAX, dword ptr ds:[102030h].
Widening data (bytes into words, words into dwords)
It is sometimes necessary to take a byte/word value and turn it into a word/dword.
The MOVZX (move with zero-extend) instruction extends values by filling the unused top bits
with zeroes; this is known as unsigned extension.
In contrast, the MOVSX (move with sign-extend) instruction fills the unused top bits with the
value of the source operand's sign bit. This is signed extension.
For example: assume the memory address 0x203040 contains 0x71 0x81 0x91 0x41. If we
consider these to be signed values, that is 113 -127 -111 65. If we consider them unsigned
though, they become 113 129 145 65.
MOVSX AX, byte ptr ds:[0x203042] puts 0xFF91 into AX (the highest bit of 0x91 is 1, so the
top 8 bits of AX are filled with 1). When viewed as a signed value, AX also contains -111. If
you view it as unsigned, however, it becomes a nonsensical 65425.
MOVSX AX, byte ptr ds:[0x203043], however, puts 0x0041 into AX, since the highest bit of
0x41 is 0.
MOVZX EAX, byte ptr ds:[0x203041] puts 0x00000081 into EAX. EAX now contains 129,
whether you view it as signed or unsigned.
MOVZX AX, byte ptr ds:[0x203043] puts 0x0041 into AX, just like MOVSX did.
The CDQ instruction sign-extends the DWORD in EAX into the EDX:EAX register pair. It is
commonly used to prepare EDX:EAX before a signed division.
Moving larger amounts of data
Moving larger amounts of data requires more instructions. Two key instructions for this are
MOVSD and MOVSB ("move string" by dwords/bytes). They both read the value of ESI,
interpret it as a memory address, read a DWORD or a BYTE respectively from that location
and put it at the address contained in EDI. Afterwards, they modify both ESI and EDI by the
number of bytes copied. So, you can write five MOVSD instructions in a row to copy 20 bytes.
However, there are helper prefix bytes to simplify loops. The REP prefix makes the instruction
repeat multiple times. Before running the instruction, the value in ECX is checked: if it's not
zero, it's decremented, the instruction is executed, and ECX is tested again. This process
repeats until ECX becomes zero. For example, MOV ECX, 40h; MOV ESI, 102030h; MOV
EDI, 202030h; REP MOVSD; will copy 40h * 4 = 100h bytes from the memory block starting at
102030h to the memory block starting at 202030h. After this operation, ECX will contain zero,
ESI will contain 102130h, and EDI will contain 202130h.
Direction flag
The previous section glossed over a slight detail. As was mentioned in the EFLAGS section,
EFLAGS has a so-called Direction Flag bit that indicates whether MOVS* will increment or
decrement the ESI/EDI registers. It is usually left set to "increment" these days, but you can
modify it (the STD and CLD instructions set and clear it) if you want. Why would you? A fairly
common scenario is copying data between two overlapping memory ranges: to copy data
from the range [0x1000...0x2000] to [0x1800...0x2800], you should begin at the higher
address and copy towards the lower one.
Optimization
What would be the fastest way to copy 66h bytes? MOVSD cannot copy that exact amount,
and a long run of MOVSB instructions would be slow. The simple answer, and one that is
used by most compilers today, is to divide the work:
MOV EDX, 66h
MOV ECX, EDX
SHR ECX, 2 ; 66h / 4, dropping the remainder
REP MOVSD
MOV ECX, EDX
AND ECX, 3 ; 66h % 4 *
REP MOVSB
*This is a common trick when you need to divide by a power of two: shifting the dividend
right by the appropriate number of positions gives you the quotient, and ANDing the dividend
with the divisor minus one yields the remainder.
It should be noted that modern CPUs have instruction set extensions such as MMX and SSE,
which provide more interesting instructions. Some of them can be used to build an even faster
data copying function, but these extensions are outside the scope of this document.
Working with pointers
Pointers are very commonly used in assembler, much like in C/C++. Consider the example
from earlier:
MOVZX AX, byte ptr ds:[0x203043]
This instruction takes one byte from the memory location 0x203043 and operates on it.
Unless your function operates on the same data all the time, the address holding the
information you need might be different (suppose you have to read from a buffer that was
allocated dynamically to store user input; the memory location of the buffer would be different
each time). In this case, you cannot actually hardcode the 0x203043 into the code, can you?
Something has to tell you where the information you need is, and you need to operate on that
address. Let's say you know that the buffer's starting address is currently in the EDX register,
and you need to sign-extend the first and fourth bytes from that buffer into AX and EBX
respectively. One possible way to do that is:
MOVSX AX, byte ptr ds:[EDX]
ADD EDX, 3 ; move three bytes ahead, from #1 to #4
MOVSX EBX, byte ptr ds:[EDX]
SUB EDX, 3
You have to restore the EDX value so that other code can use it correctly. A different approach
would be to use an instruction called LEA (nothing to do with Star Wars, sorry). LEA stands for
Load Effective Address: it calculates the value of its source operand (the address it would be
pointing to) and writes that value into the destination operand. It doesn't read any data
from the pointer, it just calculates it. The useful part comes when you see how complex the
source operands can be:
LEA dst, [reg1 + x * reg2 + y]
Here, x can be 1, 2, 4 or 8, and y can be any constant displacement. LEA is optimized to
calculate its result very quickly, so it can be used often and for unintended purposes. With
knowledge of this instruction, you can perform our task differently:
MOVSX AX, byte ptr ds:[EDX]
LEA EBX, [EDX + 3]
MOVSX EBX, byte ptr ds:[EBX]
This saves us the trouble of restoring a register value and is arguably easier to read.
LEA math
As noted earlier, thanks to its speed LEA used to be employed for purposes other than
pointer math. This is largely irrelevant nowadays, since processors are so much faster than
they were when this trick was invented, but you will need to know about it since it's widely
used in existing programs. You can probably see how a compiler can use it to turn a complex
mathematical operation into a sequence of simpler ones, for example:
EAX = EDX * 210h can be implemented as a single multiply instruction, or as a sequence of
LEAs, for example (210h = 528 = 33 * 16):
LEA ECX, [EDX*8]       ; ECX = EDX * 8
LEA ECX, [EDX + ECX*4] ; ECX = EDX * 33
LEA ECX, [ECX + ECX]   ; ECX = EDX * 66
LEA EAX, [ECX + ECX]   ; EAX = EDX * 132
LEA EAX, [EAX*4]       ; EAX = EDX * 528 = EDX * 210h
Comparison instructions
The x86 has two ways to compare values: mathematically (less, equal, greater) and bitwise (is
bit # set). Both of them put their results into the EFLAGS register. Certain other instructions
behave differently based on specific flag values. The most common example of a conditional
instruction would be the family of conditional jumps (Jxx). These instructions take one
operand, a memory address, and each instruction corresponds to a more-or-less commonly
used condition. Such an instruction checks whether EFLAGS meets its condition, and if it does,
tells the CPU "after this instruction, start executing code at <given memory address>".
Otherwise, nothing happens. Note that this instruction will not save any information about
where it was before switching to <given memory address>, so you can't later tell the CPU
"ok, now go back to what you were doing earlier".
The problem is that almost every instruction that calculates a mathematical result also writes
the result's attributes to EFLAGS, so code has to be carefully arranged to avoid using such
instructions between making the comparison and using the result flags. Documenting each
instruction's result attributes here would be a lot of work, and that is much better covered in
Intel's reference.
The TEST arg1, arg2 instruction performs the bitwise comparison. It basically ANDs the two
operands and sets the ZF, SF, PF flags depending on the result, which is not saved anywhere.
The CMP arg1, arg2 instruction performs the mathematical comparison. It basically subtracts
arg2 from arg1 and sets the ZF, SF, CF, OF, PF flags according to the result, which is NOT
saved anywhere. The flags that should be checked depend on whether the compared values
are signed or not, and what condition we are testing for.
Condition to meet             Instruction(s)
Equality                      JZ/JE
Inequality                    JNZ/JNE
Less or equal (signed)        JLE/JNG
Greater or equal (signed)     JGE/JNL
Greater (signed)              JG/JNLE
Below or equal (unsigned)     JBE/JNA
Above or equal (unsigned)     JAE/JNB
Above (unsigned)              JA/JNBE

Other conditions
JS                            Jump if Sign
JP                            Jump if Parity
Function calls
Logical division of code into functions is really important for code quality, and any complex
program is bound to contain a large number of functions. This makes the function call
mechanism a very important part of x86 operations. The instruction to call a function is
unsurprisingly named CALL. The function returns to its caller when it encounters a RET/RETN
instruction. RET takes an optional argument, but surprisingly it's not the function's return
value! We'll see what it is soon enough.
Passing arguments to a function
When you call a function, you first need to pass any required arguments to it. There are
several widely used pseudo-standard ways to do that, called calling conventions.
The earliest such mechanism is to use the stack and let the caller clean it up:
calculate how many bytes are needed for all the arguments, PUSH them all from right to left,
CALL the function, and once it returns, ADD ESP, <needed size> to restore the stack. This
allows functions to take a varying number of parameters like C's printf does. These functions
end with RETN and no argument. This is known as the cdecl calling convention and is used by
default for C functions.
Another common convention, stdcall, also pushes arguments from right to left, but this time
the callee is responsible for cleaning them up. This means the callee has to know how many
arguments it receives and remove them from the stack. This is done through the RET
instruction: its argument tells the CPU to remove that many bytes from the stack before
returning to the caller. The RET instruction can only take a constant integer argument,
meaning stdcall functions cannot have varying numbers of arguments the way printf can.
However, default arguments are okay. This convention is used in the Windows API.
A variation of stdcall is called fastcall. It puts the first two DWORD-or-smaller arguments
into ECX and EDX respectively, and only the remaining ones onto the stack. As a result, the
RET argument should not take those two arguments into account. This convention is
arguably faster than stdcall since it uses less stack, hence the name.
C++ non-static class functions (methods) use yet another set of conventions, and these vary
from compiler to compiler. They need a different convention to pass around the invisible this
pointer, which points to the object whose method is called. MSVC puts the this pointer into
ECX and all the other arguments onto the stack; G++ pushes the this pointer onto the stack
after all the other arguments. This convention is somewhat predictably labelled thiscall, but
unlike the other conventions, it cannot be explicitly assigned to a function, and methods
cannot be assigned a different convention.
Note that while these are the most popular conventions, the compiler can use a different
convention (e.g. putting arguments into ESI, EDI, EAX and EDX instead of the stack) if the
function doesn't explicitly specify a convention and the compiler can prove it has seen all the
calls to this function (meaning the function is not exported to other compilation units and not
exported from a DLL). This is inconvenient when reversing, because it requires extra work
to determine where the arguments are. However, this is not overly difficult; to find which
registers are used to pass parameters, you just need to find registers that are read without
first being written to. Don't forget instructions that read/write data through hardcoded
registers without actually referencing them; e.g. MOVSD reads from ESI and writes to EDI, so
barring earlier writes to those registers you can assume they were used to pass arguments.
(Figure: stack layout after a CALL, showing the return address, the stack arguments up to the
rightmost one, and the caller's local variables above them.)
Right after CALL, you can access the arguments at [ESP+04h], [ESP+08h] and so on. But as
soon as you do something that changes ESP, the offsets change and become more difficult to
track. To avoid this, a so-called stack frame pointer is introduced. The current value of EBP
is PUSHed onto the stack, and EBP is assigned the current value of ESP. EBP is then left
unmodified for the duration of the function and POPed off the stack as the function ends. This
way, arguments can be accessed at [EBP+08h], [EBP+0Ch] and so on. Note that the offset
from EBP is 4 bytes larger than the offset from ESP, because saving the EBP value on the
stack decreased ESP by 4 bytes.
If a function needs local variables, the space for them is typically allocated right after the
stack frame is set up, by SUBtracting the wanted size from ESP, and released by ADDing the
same value back before restoring EBP. As a bonus, local variables can also be accessed
through fixed offsets from EBP like stack arguments can, except this time the offsets are
negative.
If the function's local variables include double-precision numbers, the compiler will
additionally perform an AND ESP, -8 before allocating local variables. -8 has the same bit
pattern as (NOT 7), that is, all bits set except the three lowest. This reduces ESP to the
nearest smaller value divisible by 8, which (after any necessary reordering of the local
variables) ensures the doubles will be placed at addresses aligned to 8 bytes. This is done
because loading values from an unaligned memory address is slow (and on some other
architectures even disallowed!). This alignment procedure is not done for complex type
variables (structs and objects).
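The effect of AND ESP, -8 can be shown in C (the helper name is illustrative; the stack pointer is modelled as a plain integer):

```c
#include <stdint.h>

/* -8 is ...11111000 in two's complement, so ANDing with it clears the
   three lowest bits and rounds the value DOWN to a multiple of 8.
   Rounding down is safe here because the stack grows downwards. */
uint32_t align_down8(uint32_t sp) {
    return sp & (uint32_t)-8;
}
```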
Indirect function calls
CALL supports both direct function calls ("call the function that starts at <address relative to
current location>") and indirect ones ("<absolute address> points to a memory location that
holds the address of the function to call"). The former is used when the compiler can
determine at compile time exactly which function we want to call and knows where that
function is. The latter is used when calling functions imported from a separate DLL, when
using function pointers, or when the function to call cannot be determined at compile time
(for example, with C++ virtual functions, where the actual called function depends on the
object's type at runtime).
Examples
Given a function definition:
int __cdecl delta(int a, int b)
Calls to it will compile to this (or similar) code:
PUSH b
PUSH a
CALL delta; EAX now contains the result
ADD ESP, 8
Given a function definition:
void __fastcall remove(void * structure, void * item)
Assuming structure and item are both declared as objects (not pointers), calls to it will
compile to:
LEA ECX, [structure]
LEA EDX, [item]
CALL remove; no stack modification
Assuming structure and item are declared as pointers, calls to it will compile to:
MOV ECX, structure
MOV EDX, item
CALL remove; no stack modification
Heres a complex example using function pointers:
typedef int (__stdcall *fptr)(int x1, int y1, int x2, int y2); // function pointer type definition
int __stdcall Distance(int x1, int y1, int x2, int y2) { }; // function definition; let's assume
this function's code starts at 0x701050
int __fastcall callback(fptr function, int x1, int y1, int x2, int y2) {
int dist = (*function)(x1, y1, x2, y2);
return dist * 2;
}
int __fastcall callerFunction() {
return callback(&Distance, 3, 0, 0, 4);
}
; -- callerFunction --
PUSH 4
PUSH 0
PUSH 0
MOV EDX, 3
MOV ECX, 0x701050
CALL callback
RET
; -- callback --
PUSH EBP
MOV EBP, ESP
PUSH dword ptr [EBP+10h]
PUSH dword ptr [EBP+0Ch]
PUSH dword ptr [EBP+08h]
PUSH EDX
CALL ECX ; ECX holds the function pointer value itself, so no extra dereference is needed
SHL EAX, 1
MOV ESP, EBP
POP EBP
RET 0Ch
Additionally, observe that the array size is not stored anywhere and must be managed
separately.
Strings
Strings are stored as character arrays (plain char[] for a single-byte encoding, more complex
types such as wchar_t[] for multi-byte encodings).
Notice again that the length is not saved as part of the array. By convention, a string has to
contain a null byte (0x00) after its last symbol, and this byte is considered the string end
marker: most C string management functions will keep reading symbols until they encounter
it. So this byte has to be accounted for when calculating the buffer size.
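A C sketch of getting the buffer size right (the helper is hypothetical; it mirrors the standard strdup):

```c
#include <stdlib.h>
#include <string.h>

/* The buffer must hold strlen(s) characters PLUS one byte for the
   0x00 terminator - forgetting the "+ 1" is a classic off-by-one bug. */
char *duplicate(const char *s) {
    size_t n = strlen(s) + 1;   /* + 1 for the null terminator */
    char *copy = malloc(n);
    if (copy)
        memcpy(copy, s, n);     /* copies the terminator too */
    return copy;
}
```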
Allocation
Normally, arrays follow the same allocation rules as other variables: function local variables
are allocated on the stack, while global and static variables live in the data segment. Strings
are handled slightly differently: string literals, which are known at compile time, are placed in
the read-only data segment. The compiler may or may not choose to collapse multiple
identical literals into one allocation. As a result, string literals cannot be modified at runtime
(but for historical reasons they can be pointed to by a plain char *, which would suggest
otherwise).
Constants
Integer constants are usually substituted by their values when compiling, which makes it very
difficult to determine which uses of a specific value were actually constants in the source
code. By extension, it makes it difficult to replace said constant without source code.
Floating-point constants are either substituted by their values or preserved in the read-only
data segment.
Pointer math
Pointer math works differently between C and assembler:
int x[2] = {1, 2};
int *p = &x[0]; // p now points to the first element of x
++p; // p now points to the second element of x, that is, p was actually incremented by
sizeof(*p).
LEA EAX, [x] ; EAX now points to the first byte of x
ADD EAX, 1 ; EAX now points to the second byte of x, which is certainly not the second
element of x.
As you can see, in assembler, pointer math is no different from normal math, and operates on
bytes. In C, pointer math operates on elements that the pointer is declared as pointing to.
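Casting to char * makes the C side behave exactly like the assembler. The sketch below (illustrative) shows that moving to the next element is really a move of sizeof(int) bytes:

```c
/* p + 1 advances by sizeof(*p) bytes; casting to char * first gives
   plain byte arithmetic, matching ADD EAX, 1 in the example above. */
int second_element_offset(void) {
    int x[2] = {1, 2};
    return (int)((char *)(x + 1) - (char *)x);  /* distance in bytes */
}
```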
Structs
As a C struct is simply a collection of values, there's not much special handling needed. The
only tricky part is that structs can have packing: small values (bytes, bools) followed by
larger ones (dwords, doubles) can have unused bytes (called padding) after them to make
those larger values align to their preferred byte boundary. Neither the C nor the C++
standard defines any specific packing indicators, so each compiler has its own way to change
struct packing as necessary. However, no padding is ever inserted between the elements of
an array.
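A short C illustration of padding, assuming default packing and 4-byte ints (typical for x86):

```c
#include <stddef.h>

/* A byte followed by a 4-byte int: with default packing the compiler
   inserts 3 unused padding bytes after `flag` so that `value` starts
   on a 4-byte boundary. */
struct Sample {
    char flag;   /* offset 0 */
                 /* offsets 1-3: padding */
    int  value;  /* offset 4 */
};
```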
Conditional statements
Conditional statements are usually expressed slightly differently in ASM, because ASM is built
around "if condition, then skip the following code" logic rather than the "if condition, then
execute the following code" logic that C/C++ uses.
Labels
Like in most other languages, a simple string followed by a ":" is called a label. It doesn't
produce any actual code but serves as an anchor/goal post that jump instructions can aim for.
If
if(condition) {
x += 5;
}
Becomes something like this in assembler:
MOV ECX, condition ; assuming condition is a Boolean value
MOV EAX, x
TEST ECX, ECX
JZ after_if
ADD EAX, 5
after_if:
If-else
if(condition) {
x += 5;
} else {
x -= 3;
}
Becomes:
MOV ECX, condition ; assuming condition is a Boolean value
MOV EAX, x
TEST ECX, ECX
JZ else_branch ; if ECX is zero, go to label
ADD EAX, 5
JMP after_if
else_branch:
SUB EAX, 3
after_if:
If-elseif-else
This is just a variation on the simple if-else logic.
if(condition > 5) {
x += 4;
} else if(condition < 1) {
x += 7;
} else {
x -= 10;
}
Turns into:
MOV ECX, condition
MOV EAX, x
CMP ECX, 5
JLE check_second ; condition is not greater than 5, try the next test
ADD EAX, 4
JMP after_if
check_second:
CMP ECX, 1
JGE else_branch ; not less than 1 either, fall through to the else
ADD EAX, 7
JMP after_if
else_branch:
SUB EAX, 10
after_if:
do-while loop
This loop performs its condition check only after the body has already run once:
do {
a += 2;
--x;
} while (x > 1);
Becomes:
MOV ECX, x
MOV EAX, a
while_body:
ADD EAX, 2
DEC ECX
CMP ECX, 1
JA while_body ; assuming x is unsigned, change to JG if signed
after_while:
All that is gone is the pre-loop condition check.
for loop
This loop is somewhat more terse, so its translation is a bit more complex:
for(int a = 0, x = 1; x < 10; ++x) {
a += 5 * x;
}
Becomes:
XOR EAX, EAX ; a = 0
MOV ECX, 1 ; x = 1
CMP ECX, 10 ; this check will not be made if the compiler is able to deduce its uselessness
JG after_loop ; x is signed, therefore JG
for_body:
LEA EDX, [ECX + 4 * ECX] ; tmp = 5 * x
ADD EAX, EDX
for_post_iteration: ; the loop body is finished, do the post-iteration action and comparison
INC ECX
CMP ECX, 10
JL for_body ; x is signed, therefore JL
after_loop:
As you can see, the structure of the transformation looks like this:
int a = 0, x = 1;
if(x < 10) {
do {
a += x * 5;
++x;
} while (x < 10);
}
It looks awkward, so get used to it.
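The two shapes really are equivalent, which can be checked directly in C (the function names are illustrative):

```c
/* The original for loop... */
int sum_for(void) {
    int a = 0;
    for (int x = 1; x < 10; ++x)
        a += 5 * x;
    return a;
}

/* ...and the if + do-while shape the compiler actually emits. */
int sum_rotated(void) {
    int a = 0, x = 1;
    if (x < 10) {
        do {
            a += 5 * x;
            ++x;
        } while (x < 10);
    }
    return a;
}
```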
Switch statements
Switch statements are also tricky, and can be represented in several patterns, depending on
specific case values:
}
va_end(Arguments);
return sum;
}
This functionality is implemented through macros, and normally C programmers don't need to
understand how it works.
On the assembler side, this is not really different from accessing ordinary arguments.
Remember that in cdecl functions, arguments are all passed on the stack and take at least
four bytes.
PUSH EBP
MOV EBP, ESP
PUSH EBX
MOV ECX, dword ptr [EBP + 08h] ; take first argument
MOV EAX, ECX
LEA ECX, [EBP + 0Ch] ; address of second argument
OR EDX, 0FFFFFFFFh ; EDX = -1, the terminator value
loop_body:
MOV EBX, [ECX] ; read addressed argument
CMP EBX, EDX
JZ after_loop
ADD EAX, EBX
ADD ECX, 4 ; increment pointer to point to next argument
JMP loop_body
after_loop:
POP EBX ; this is not a scratch register, so it needs to be restored to its original state
MOV ESP, EBP
POP EBP
RET ; in cdecl, the caller is responsible for cleaning up the stack, so no argument
Notice that we calculated the sum directly in EAX, so we don't need to explicitly move it there
as a return value.
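For reference, a C function matching this assembler might look as follows. This is a reconstruction, not the original listing; the terminator value -1 matches the OR EDX, 0FFFFFFFFh used in the comparison.

```c
#include <stdarg.h>

/* Sums its arguments until it reads the sentinel value -1,
   mirroring the register loop above. */
int sum_until_sentinel(int first, ...) {
    va_list args;
    int sum = first;                /* first argument goes straight into the sum */
    va_start(args, first);
    for (;;) {
        int v = va_arg(args, int);  /* read the next stack slot */
        if (v == -1)                /* compare against the -1 terminator */
            break;
        sum += v;
    }
    va_end(args);
    return sum;
}
```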
Additional C++ constructs
Constructors and destructors
C++ object construction is a bit complex: it covers allocation of memory for the object(s) and
invocation of the appropriate initializer lists and constructor functions. Memory allocation is
complicated by:
1) the number of objects - allocating a single object differs from allocating an array;
2) the programmer's choice of where to store the object:
a) if it's a local variable (on the stack), the memory space for it has already been allocated
and nothing needs to be done;
b) whereas if it is in heap memory (operator new), the appropriate memory allocation
routine has to be called;
c) it is possible for the programmer to allocate a memory block themselves instead of asking
operator new to do so; this is known as placement new.
Similar issues arise with destructors: if the object has a destructor, it has to be called, and if
an array holds such objects, the destructor has to be called for each of them.
Heap memory
Objects allocated on the heap (using operator new) begin by asking the memory allocator
routine for a memory block of the appropriate size. If that fails, the reaction varies (C++
lets the programmer choose how to handle this problem): an exception can be thrown, or
NULL can be returned without calling the constructor.
If the allocation succeeds, the initializer list is invoked, followed by the appropriate
constructor. The constructor receives the pointer to the memory block as its this pointer, and
the compiler silently returns that pointer as the function result.
When allocating an array, and if the object being allocated has a destructor, things become
more complicated. The array can be of variable length, so something has to know how many
objects there are in the array and call the destructor for each of them. MSVC handles this by
requesting 4 more bytes than the array actually needs (which is array length * object size),
hiding the array length in the first 4 bytes, and returning the fifth byte of the allocated block
as the array start address. The compiler then runs the constructor in a loop, invoking it for
each object in order. After that is done, the array is ready for use. The destructor works in a
similar fashion it expects to find the array length in the four bytes immediately preceding
the array, runs the destructor that number of times (in reverse order! The last object in the
array is destroyed first, because it was constructed last), and returns the entire block to the
memory manager. This means that mixing up the scalar and vector allocators/deallocators
will result in nasty crashes.
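The layout can be sketched in plain C to make the hidden length prefix visible. This is an illustration only (real new[]/delete[] also run constructors and destructors, and all names below are made up):

```c
#include <stdint.h>
#include <stdlib.h>

/* MSVC-style vector-new layout: allocate 4 extra bytes, store the
   element count there, and hand out a pointer just past it. */
void *alloc_array(uint32_t count, uint32_t elem_size) {
    uint32_t *block = malloc(4 + (size_t)count * elem_size);
    if (!block) return NULL;
    block[0] = count;            /* hidden length prefix */
    return block + 1;            /* caller sees this as the array start */
}

/* delete[] reads the count from the 4 bytes preceding the array. */
uint32_t array_count(const void *array) {
    return ((const uint32_t *)array)[-1];
}

void free_array(void *array) {
    uint32_t *block = (uint32_t *)array - 1;
    /* a real delete[] would run array_count(array) destructors here,
       in reverse order, before freeing */
    free(block);
}
```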
Independently allocated memory / placement new
A programmer can choose to manage memory independently of the constructors/destructors.
In this case, placement new can be used to tell the compiler to just construct the object at the
given memory location. The object is constructed without any memory management involved.
The destruction is a bit trickier, since there's no built-in placement delete: you have to call the
destructor manually, and then pass the pointer to the memory manager for deallocation.
There is no standard way to manage arrays through this mechanism.
Local variable allocation
Local variables are typically allocated on the stack, unless the programmer uses the register
keyword to try and get them placed in a register, or the optimizer decides that would be a
good idea even without the programmer's hints. When they are allocated on the stack, the
memory necessary to store the object is allocated at function start automatically, and the
constructor doesn't have to do it. If the variable is an array, it is definitely of constant size, so
the compiler knows how much memory it needs to allocate, and how many times to call the
initializer list/constructor/destructor, so no extra memory is necessary to store that
information, and no tricks are needed in the destruction.
Note that if the function requires a large amount of stack space, the compiler inserts
additional code that walks the allocated area and pokes a byte every so often (in MSVC
parlance, this is __alloca_probe). This mechanism is necessary because of virtual memory:
memory given to the program by the memory manager is committed in pages on first use,
and after each committed page a guard page is placed. Accessing this guard page raises an
exception which the memory manager silently swallows, committing the page. If your stack
space takes more than 2 pages, you could be accessing the area beyond the guard page,
which results in a different exception that the memory manager doesn't handle. The poking
process touches the allocated area page by page to avoid this. If you manually modify the
stack pointer to get scratch space this way, you should perform a similar poking routine too.
Enumerators
Enums are handled identically to simple numeric constants: their occurrences in the source
are replaced by their literal values.
Methods
A method is simply a member function of a class. That is, it has a this pointer pointing to
the object itself. The way the this pointer is passed to the function depends on the compiler
(see the discussion of thiscall in the calling conventions section). Unless the function is
virtual, a method looks much like an ordinary function whose extra argument happens to be
a class pointer.
Inheritance
Single inheritance is straightforward: the data members of the parent class are placed at the
beginning of the class, and the data members of the child class are placed after them. The
layout of the parent data members is exactly the same as it was in the parent class alone.
(Otherwise you couldn't pass a derived class pointer to a function expecting a base class one.)
Multiple inheritance is somewhat more complex: each parent class is laid out in the same way,
but the actual ordering of the parent classes is undefined; the compiler is free to choose
whatever ordering it considers optimal.
Casting a multiply-inheriting class to one of its parents is a simple enough task for the
compiler: it just has to adjust the this pointer by the appropriate number of bytes. Suppose
we have a class Base1 that takes 20 bytes, Base2 that takes 40, Base3 that takes 12, and
Derived that inherits from all three of them. If the compiler has chosen to keep the class
ordering as Base1, Base2, Base3, casting Derived* to Base3* requires adding 60 bytes to the
pointer. If, on the other hand, the compiler had chosen the ordering Base2, Base3, Base1, the
same cast would require adding only 40 bytes.
Multiple inheritance complicates method calls as well: the compiler has to quietly cast the
this pointer to the class that the method expects.
Virtual inheritance
Virtual functions
Virtual functions are an important part of inheritance. Compilers typically collect all of the
class's virtual functions in declaration order and put their addresses into an array. This array
is called the virtual function table, or vftable/vtable for short. The address of this array is
then (normally) inserted as the first invisible data member of the class. Child classes replace
those pointers with their own during construction and restore the base pointers during
destruction.
Calling a virtual function usually looks like this (in MSVC):
MOV ECX, ESI ; assuming ESI contains the object whose function we're calling
MOV EDX, [ECX] ; EDX contains the pointer to the vftable
CALL [EDX+24h] ; call the function pointed to by the 9th pointer in the table
Remember that in MSVC, the this pointer is passed in ECX.
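In C terms, what the compiler builds can be emulated explicitly with an array of function pointers (all names below are illustrative):

```c
/* The hidden structures behind virtual calls: a table of function
   pointers, and an invisible first member pointing at it. */
struct Shape;

typedef struct {
    int (*area)(struct Shape *self);       /* slot 0 of the vtable */
    int (*perimeter)(struct Shape *self);  /* slot 1 of the vtable */
} ShapeVtbl;

typedef struct Shape {
    const ShapeVtbl *vptr;  /* hidden first member */
    int w, h;
} Shape;

static int rect_area(Shape *s)      { return s->w * s->h; }
static int rect_perimeter(Shape *s) { return 2 * (s->w + s->h); }

static const ShapeVtbl rect_vtbl = { rect_area, rect_perimeter };

/* A virtual call: load the vptr, index the table, call through it -
   the same MOV/MOV/CALL sequence shown above. */
int call_area(Shape *s) {
    return s->vptr->area(s);
}
```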
A class inheriting from multiple classes with their own virtual functions naturally has a
separate pointer for each class's vtable. But the functions of the second and later vtables
have the same problem as ordinary methods of a multiply-inheriting class. Suppose we have a
base class Base1 that has virtual functions and no data members (this is called an interface),
a base class Base2 that has both virtual functions and data members, and a derived class
Derived that inherits from both Base1 and Base2. Then calling a virtual function of Base2
looks like this:
MOV ECX, ESI
LEA ECX, [ECX + 4] ; load this pointer of the Base2 portion
MOV EDX, [ECX] ; address of vtable
CALL [EDX + 14h] ; call the 5th function in the vtable
Notice that the this pointer points to the Base2 portion. The actual implementations of the
functions in Derived's Base2 vtable can reference Derived's own data members, but the
member offsets will be different from those you'd see when working with Derived's
non-virtual functions.
In addition, Derived's implementations of Base2 virtual functions can call Base1's functions,
in which case the this pointer has to be adjusted again:
MOV ECX, ESI ; assume we're in a Base2 virtual function
LEA ECX, [ECX - 4] ; reduce the pointer by four bytes (sizeof Base1)
MOV EDX, [ECX] ; load the vtable
CALL [EDX + 10h] ; call the 4th function in Base1's vtable
This construct is relatively simple, but like all C++ additions it is usually beyond the
disassembler's range of comprehension, so it resorts to showing you a confusing pointer math
expression similar to this:
v48 = (DWORD *)(((char *)v3) - 4);
v49 = *v48;
v49[4](v48);
You can assign a type to v48 to make the last two lines nicer and more readable, but the
pointer math is quite impossible to get rid of.
Notice that even if you define just one implementation of a virtual function, any derived class
that inherits from its base class plus some additional classes potentially requires an
adjustment of the this pointer. This is impossible to place at the call sites, so the adjustment
happens on the way into the function itself. Such an adjustment is known as a thunk. At
least in MSVC, it is implemented as follows: the derived class's vftable contains a pointer to
this small function:
MOV ECX, ESI
LEA ECX, [ECX + 4] ; load the this pointer of the appropriate base class; in this case that
base class starts 0x4 bytes in
JMP BaseFunctionImplementation ; the linker places the starting address of the base class's
implementation of this particular function here
The JMP means the RET in the parent's implementation will return to the place the thunk was
called from. This is easier than replicating the whole argument layout, CALLing the
implementation and RETing in each thunk. Since nothing is done in the thunk itself, it usually
doesn't create its own stack frame.
COM
A COM interface is really nothing more than a simple class that has virtual functions (declared
abstract; the inheriting class has to implement them manually) and no ordinary methods or
data members. Implementing an interface only adds one vftable pointer to the class.
Templates
Despite the complexity of C++ templates themselves, their assembler representations are
usually straightforward: each template specialization that is actually used in the code is
emitted as a separate standalone class/function with concrete types and whatnot. Often the
only hint that a function/class was a template is the associated RTTI data.
Appendix A: Syringe
Syringe is a DLL injection tool developed by us (mostly pd) to insert custom-written DLLs into
the target process and hook its code at specific locations. This enables us to write custom
code for the game in C++ instead of raw ASM. It's open source, and for the curious, here's a
breakdown of how it works.
- Once the executable is found, Syringe attempts to parse its import table, looking for
LoadLibrary and GetProcAddress. If they are not found, the target will not be launched.
- Syringe scans the target executable's directory for pairs of .dll and .dll.inj files. An .inj file
contains information about which hooks should be placed at which addresses.
- When the process is launched, Syringe injects a small piece of code that makes the process
LoadLibrary each injectable DLL and GetProcAddress each hook function.
- When Syringe knows all the hook function addresses, it allocates a small memory area for
each hook location and rewrites the code at that location with a jump into this memory (each
hook can specify how many bytes of code should be backed up; those bytes are copied to the
end of the buffer). The buffer contains a call to the hook function and an analysis of its
return value.
  - Because an absolute jump takes five bytes, each hook will overwrite at least that many
bytes of original code. Make sure you are not overwriting bytes belonging to a different code
branch (there should be no way to jump into the middle of the bytes being overwritten). That
includes if/else branches, switch cases, and function epilogues: if you are hooking near the
end of a function, make sure your hook won't spill into the next function.
- Before calling the hook, Syringe executes PUSHAD; PUSHFD; PUSH (hook address);
PUSH ESP. The hook's return value then determines what happens next:
  - If it is zero, execution continues at the backed-up original bytes and afterwards jumps to
the first byte that hasn't been backed up, unless the original bytes contain a control flow
instruction.
  - If it is not zero, the return value is treated as an absolute address to jump to.
  - In both cases, all registers and flags are restored from the stack before continuing.
- After all hook functions are hooked up, Syringe lets the target process run normally but
keeps acting as a debugger, trapping exceptions and terminating the process if necessary.
Current limitations:
- No support for packed executables. This was not necessary for the game, and as such is not
considered a priority.
- No way to have two executables with different hooks in the same directory: Syringe will
hook all .inj files into the launched process.
- No way to debug the process yourself, since Syringe acts as the debugger. This will
hopefully be solved in the future by making Syringe detach from the process after all hooks
are set up.
Integer overflows
Mathematical operations on numbers are subject to limits of values that can be represented
by the variables' types. As was mentioned very early, signed and unsigned integers typically
take up 32 bits in a 32-bit environment. Which means that a signed integer can represent
values between -231 and 231-1 inclusive, while an unsigned integer can represent values
between 0 and 232-1 inclusive. If the programmer is not careful when performing math,
unexpected results might happen (which is generally true for doing anything carelessly, come
to think of it...).
For example, consider processing file formats: a file that contains an image typically starts
with a header that specifies the image's width and height. If the file format is not compressed
(e.g. BMP or TGA), the code can try to allocate a buffer to store the pixels like this:
byte * allocateBuffer(int width, int height, int bitDepth) {
    int size = width * height;
    size *= (bitDepth >> 3); // bytes per pixel
    return new byte[size];
}
multiplying width by height by bytes-per-pixel. However, if the file header specifies very
large dimensions (e.g. a 24-bit BMP with width=0x10001, height=0x10001), the needed buffer
size would be 0x10`001 * 0x10`001 * 3 = 0x100`020`001 * 3 = 0x300`060`003, but that number
exceeds 32 bits... so the signed integer representing the size is left with only the low 32
bits, a value of 0x60`003 bytes. If the code then tries to copy the file contents into this
buffer, it will start overwriting other data outside this buffer after a while.
Of course, using signed integers for dimensions is fairly stupid because they cannot be
negative, but using unsigned integers doesn't solve the problem of integer overflow. In fact, C
doesn't offer any way to detect that such an overflow has happened. The most reasonable
thing to do, in my opinion, would be to use unsigned integers for the dimensions and a longer
unsigned integer for the size. If the dimensions cannot exceed 32 bits, then their product
cannot exceed 64 bits, so an unsigned 64-bit integer would be sufficient to store
width*height. That product should then be sanity-checked to not exceed 64 bits when
multiplied by the bytes-per-pixel value (this can be tested by using a constant equal to
0xFFFFFFFF`FFFFFFFF divided by the bytes-per-pixel, and testing that width*height is not
larger than this value). Of course, other sanity checks for image dimensions should exist to
refuse to load images exceeding reasonable dimensions long before these bit size limitations
come into play.
Promoting values is also a possible source of problems. If you store the length of a string in a
short (16 bits), you might be tempted to just pass this value to a memory allocator to
get a new buffer of the same length. But even though operator new[] takes an unsigned
(size_t) argument, the short is extended to int beforehand. Depending on the compiler, short
is likely to be signed by default, which means that for any string longer than 0x7FFF chars (that
is, 32 KB or more), it will be sign-extended and the memory allocator will receive an argument
that exceeds 0x7FFFFFFF (2 GB).
Another likely problem is comparing signed and unsigned values. While modern compilers
can catch this, it's still a problem, especially when coupled with the previously mentioned
integer promotions. The coder might try to limit the maximum value of this buffer, but the
naïve comparison can still be defeated by these implicit conversions.
Hit Ctrl-F1 to see a list of all the custom data structures that have been defined
(note that these are C equivalents of the game's original C++ structures,
massaged into a format that IDA can understand; do not blindly copy these to
YR++ or assume they precisely represent Westwood's original data structures);
Hit Ctrl-M to get a list of all the locations that we've marked as important for
some reason: maybe we need to refer to them often, maybe they represent
bugs that we will want to fix someday, maybe they are just exceptionally
interesting or dumb.
Reading the string literals can tell you a lot of interesting things, including all the flags that
are read from the INI files (although their presence does not always mean the flag value is
actually used). Inspecting the functions can explain how the game performs certain
functionality, though this is going to be a tricky undertaking given the size of the game. You
can try using Hex-Rays (F5) to create a C representation of a function, but this is going to be
quite different from what Westwood originally wrote; do not cite Hex-Rays-restored code as
an example of Westwood stupidity.
There is no real recipe for how to understand the game's behaviour, so you'll just have to read
the code and follow any function calls it performs to see what actually is done where and
when. This is worsened by the fact that the original code wasn't exceptionally well thought
out.
How-tos
If you want to find out when the flag Bunkerable= is used by the game, you can find
Bunkerable= in the string literals list and check the references to it; this will show you the
function where all the flags are read from the INI and stored in the respective objects. In the
general case, you'll see a call to INIClass_GetBool(pINI, pSection, "Bunkerable"), and the result
of this function will be placed into the respective object at a specific offset. Later on, when
that flag is checked, the game has to read from that specific offset. Once you know what
offset that flag is stored at, you can use Alt-I (Search for Immediate) to search for all uses of
that numeric value (be sure to check "Find all occurrences"). This will produce a large list
including a lot of unrelated lines, because the search simply finds where that value is used as
an instruction operand, regardless of whether it's mov eax, [esp+18h], call [ebx+18h] or add
ecx, 18h. But since you know how data structure members are likely to be accessed (mov
edx, [esi+0DFEh]), you can look over all the results of that form. If other data structures do not
have a member at the same offset, or you can determine when the instruction is referring to
the right data structure, you'll be able to see where that flag is read and what is done as a
result.
When your cursor is on a named item (function name, static object, etc.), you can hit X to get
a list of all known references to that item in the executable. You can hit Ctrl-X to get a list of
all known references to the current address. Mind that if the named item's name can be
interpreted as an integer value, IDA will assume it is an address rather than a name; hitting X
while the cursor is over dd or db data labels will cause this confusion.
When your cursor is on a stack variable, you can use K to toggle between a named variable
and a simple numeric stack offset. You can also hit Enter when it's a named variable to get a
new window containing the function's stack frame; here you can modify local variables. If the
function uses any structures on the stack, you can use Alt-Q to place a structure at the right
offset.
When your cursor is on an expression like [ebx+48h], you can use Ctrl-T to select a structure
and display its member access instead of 48h; e.g. you can see expressions like
[ebx+ObjectClass.IsFalling], which makes it a lot more readable. If you select several lines
before hitting Ctrl-T, all accesses through the register your cursor was pointing at will be
turned into that structure's offsets. This is convenient when you're analysing a large function;
in that case, usually the ESI register is dedicated to storing the this pointer and is not
modified throughout the function. You can drag-select the code from the function start to the
last access to ESI, making sure the cursor stays on ESI, and hit Ctrl-T, then select the
appropriate structure. All the references to ESI will display that structure's members at the
given offsets. We have defined class vftables as structures as well, so this works even for
virtual function calls. Note that right-clicking the expression will offer a very similar Structure
Offset submenu with the structures and members listed.
the code is passing function pointers around as callbacks; this usually means the
pointer points to the beginning of a function instead of the middle,
IDA has miscalculated the local stack pointer; this happens fairly often with virtual
functions (in which case you need to manually fix it in the IDB for future reference and
recheck the stack),
those addresses are leftovers from previous stack uses (uninitialized values, as far as
this call stack is concerned), in which case they can be ignored. One common case of
seeing these uninitialized values is fixed-size arrays that track their used size
separately: if the array is 10 elements wide, and only the first three elements are used
at the point it crashed, the remaining 7 elements can legitimately be uninitialized and
contain leftover pointers.
WinDBG's Memory View can also be used to inspect objects at byte level, and with symbols
for Ares loaded it can even display the objects as actual collections of properties, making
analysis even easier. That means that if the crash happened because an object is misconfigured
(often the cause of crashes in gamemd code), there will likely be a pointer to this object on the
stack near the crash site, or in one of the registers. It can be used to determine the faulty
object's ID, which in turn can be used to analyse the object's INI code for problems.