Good Slides To Understand System Call


SYSTEM CALL IMPLEMENTATION
CS124 – Operating Systems
Fall 2018-2019, Lecture 14

User Processes and System Calls


• Previously stated that user applications interact with the
kernel via system calls
• Typically invoked via a trap instruction
• An intentional software-generated exception

• The kernel registers a handler for a specific trap


• int $0x80 for Linux system calls
• int $0x2e for Windows system calls
• int $0x30 for Pintos system calls

• Can’t easily pass arguments to system calls on the stack


• Trap instruction causes the CPU to switch operating modes (i.e.
from user mode to kernel mode)
• Different operating modes have different stacks

User Processes and System Calls (2)


• Typically, arguments to system calls passed in registers,
and the return-value(s) come back in registers
• One of the arguments is an integer indicating which
system call to invoke
• e.g. on Linux and Windows, %eax is set to the number of the operation to perform
• e.g. on UNIX systems, sys/syscall.h specifies these numbers
• Note: UNIX syscall IDs are not uniform across different UNIXes
• Obvious constraint: system-call arguments can’t be wider
than the registers
• Several possible approaches:
• Can split larger arguments across multiple registers
• Can store larger arguments in a struct, then pass a pointer to the
struct as an argument
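The first approach above (splitting a wide argument across registers) can be sketched in plain C. This is a minimal illustration, assuming 32-bit registers and a 64-bit argument; `struct reg_pair`, `split_arg64()` and `join_arg64()` are hypothetical names invented for this sketch, not part of any kernel API.

```c
#include <stdint.h>

/* Hypothetical: the two 32-bit halves of one 64-bit argument, as they
 * might be placed in two separate registers by the caller. */
struct reg_pair {
    uint32_t lo;   /* low 32 bits, e.g. in one register     */
    uint32_t hi;   /* high 32 bits, e.g. in the next register */
};

/* Caller side: split the wide argument into register-sized pieces. */
struct reg_pair split_arg64(uint64_t value) {
    struct reg_pair p;
    p.lo = (uint32_t)(value & 0xffffffffu);
    p.hi = (uint32_t)(value >> 32);
    return p;
}

/* Handler side: rebuild the original argument from the two pieces. */
uint64_t join_arg64(struct reg_pair p) {
    return ((uint64_t)p.hi << 32) | p.lo;
}
```

For example, `split_arg64(0x1122334455667788ULL)` yields `hi = 0x11223344` and `lo = 0x55667788`, and `join_arg64()` on that pair returns the original value.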

User Processes and System Calls (3)


• The operating system frequently exposes system calls via
a standard library
• e.g. UNIX syscalls are exposed via the C standard library (libc)
• e.g. Windows syscalls are exposed via the (largely undocumented)
Native API (ntdll.dll)
• The library serves as an intermediary between apps and
the operating system
• Some functions are direct wrappers for system calls
• e.g. ssize_t read(int fd, void *buf, size_t nbyte)
• Implementation stores arguments from stack into registers, invokes
the system call entry-point (e.g. int $0x80), and returns result
• Others utilize system call wrappers internally
• e.g. malloc() is mainly implemented in user space, but uses
system calls to increase the process’ heap size
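To make the "direct wrapper" idea concrete, the sketch below asks for the process ID twice: once through the ordinary libc wrapper, and once by handing the system-call number to the generic `syscall()` function. It assumes a Linux system with glibc, where `syscall()` and `SYS_getpid` are available; the two function names are made up for this example.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>   /* SYS_getpid: the system-call number */
#include <sys/types.h>     /* pid_t */
#include <unistd.h>        /* getpid(), syscall() */

/* Go through the usual libc wrapper. */
pid_t pid_via_wrapper(void) {
    return getpid();
}

/* Pass the system-call number explicitly; libc loads the registers
 * and executes the trap on our behalf. */
pid_t pid_via_raw_syscall(void) {
    return (pid_t)syscall(SYS_getpid);
}
```

Both functions should report the same process ID, since the wrapper ultimately issues the same system call.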

Review: Interrupt Mechanics


• Previously discussed how interrupts and traps are
handled on IA32 (see lecture 9 for details)
• User process has its own stack
• Executing the trap causes the CPU to switch to the
kernel-mode stack associated with the process
• Since system calls change from user mode to kernel
mode, IA32 saves pointer to previous stack on new stack
• Next, CPU saves the user process’ execution state:
cs, eip and eflags

[Figure: User Process Stack (current contents of user process stack) and, after the trap, Kernel Thread Stack holding, from the top: Caller’s SS, Caller’s ESP, Caller’s EFLAGS, Caller’s CS, Caller’s EIP]

Review: Interrupt Mechanics (2)


• Operating system has a stub for every possible interrupt
• Some interrupts push an error code onto the stack; if not,
the OS stub will push a dummy value for consistency
• Next, stub pushes the interrupt number onto the stack
• Finally, stub records all register state onto kernel stack
• Now the ISR can run without disrupting the interrupted code

[Figure: User Process Stack (current contents of user process stack) and Kernel Thread Stack holding, from the top: Caller’s SS, Caller’s ESP, Caller’s EFLAGS, Caller’s CS, Caller’s EIP, Error Code, Interrupt No., Register State of Interrupted Program]

System Call Mechanics


• The OS exposes the user program’s CPU and register
state as arguments to the ISR
• Typically exposed to ISR as a struct with a field for each register
• System call handler needs to receive arguments from the
user program
• Can easily access these values on the kernel stack
• Syscall handler must also return a status result in eax
• Can modify user program’s eax on the kernel stack
• When kernel returns to the user program, its context is restored
• Program sees new value of eax

[Figure: User Process Stack (current contents of user process stack) and Kernel Thread Stack holding, from the top: Caller’s SS, Caller’s ESP, Caller’s EFLAGS, Caller’s CS, Caller’s EIP, Error Code, Interrupt No., Register State of Interrupted Program]
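The mechanics on this slide can be modeled in ordinary C: the handler receives a pointer to the saved registers, reads its arguments from them, and overwrites the saved eax with the result. This is a toy sketch only; `struct saved_regs` and `toy_syscall_handler()` are hypothetical, and a real kernel saves many more registers.

```c
#include <stdint.h>

/* Hypothetical, simplified record of the user program's registers as
 * saved on the kernel stack. */
struct saved_regs {
    uint32_t eax;  /* syscall number on entry, return value on exit */
    uint32_t ebx;  /* first argument  */
    uint32_t ecx;  /* second argument */
    uint32_t edx;  /* third argument  */
};

/* Toy "system call" that returns the sum of its first two arguments. */
void toy_syscall_handler(struct saved_regs *frame) {
    uint32_t a = frame->ebx;   /* arguments come from the saved registers */
    uint32_t b = frame->ecx;
    frame->eax = a + b;        /* restored into the user's eax on return */
}
```

When the saved state is restored, the user program would see the new eax while its other registers are unchanged.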

System Call Mechanics (2)


• The ID of the system call is used to dispatch to a function
that implements the system call
• Called a system call service routine
• System call service routines are usually named after their
user-mode entry points
• e.g. sys_write() implements write()
• e.g. sys_fork() implements fork()
• (Aside: these service routines are sometimes called within the
kernel implementation to implement more complex operations)
• A system call table holds an array of function pointers to
all system call service routines
• The syscall ID is used to index into this table when making the call

System Call Mechanics (3)


• Need to check the system call ID to ensure it’s valid…
• If it’s invalid, return ENOSYS “Function not implemented” error

• Can easily check that the ID is below the max syscall ID


• If a specific syscall ID below the max is not supported,
simply register a service routine that returns ENOSYS
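A small C model of the table lookup and both ENOSYS cases, as a sketch under stated assumptions: every name here (`toy_call_table`, `toy_dispatch()`, the `toy_sys_*` routines) is invented for illustration, and all routines share one three-argument signature for simplicity.

```c
#include <errno.h>    /* ENOSYS */
#include <stddef.h>

typedef long (*syscall_fn)(long a1, long a2, long a3);

/* Hypothetical service routines. */
long toy_sys_add(long a, long b, long c) { (void)c; return a + b; }
long toy_sys_sub(long a, long b, long c) { (void)c; return a - b; }

/* Stub registered for IDs that are below the maximum but unsupported. */
long toy_sys_ni(long a, long b, long c) {
    (void)a; (void)b; (void)c;
    return -ENOSYS;
}

/* ID 1 is deliberately unsupported, so it maps to the stub. */
static const syscall_fn toy_call_table[] = {
    toy_sys_add, toy_sys_ni, toy_sys_sub
};
#define TOY_NR_SYSCALLS (sizeof toy_call_table / sizeof toy_call_table[0])

long toy_dispatch(unsigned long id, long a1, long a2, long a3) {
    if (id >= TOY_NR_SYSCALLS)      /* ID past the end of the table */
        return -ENOSYS;
    return toy_call_table[id](a1, a2, a3);
}
```

With this table, ID 0 adds, ID 2 subtracts, and both ID 1 and any ID of 3 or more produce `-ENOSYS`, one via the stub and the other via the range check.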
Example: Linux System Calls
• Snippet [paraphrased] of Linux system_call() handler:

        ...                     # Save registers onto stack

        # Make sure it's a valid syscall ID
        cmpl $(NR_syscalls), %eax
        jb nobadsys

        # Return-value of syscall() will be in eax
        # as usual, so set value of eax stored on
        # kernel stack to ENOSYS to indicate error
        movl $(-ENOSYS), 24(%esp)
        jmp ret_from_sys_call
nobadsys:
        ...
Example: Linux System Calls (2)
• Linux system_call() handler, continued:

        ...
nobadsys:
        # Dispatch to the function in the system-call
        # table corresponding to the specified ID
        # (On IA32, pointers are 4 bytes, so use
        # ID*4 as the address within the table)
        call *sys_call_table(, %eax, 4)

        # Store return-value from routine into
        # location of eax on the kernel stack
        movl %eax, 24(%esp)
        jmp ret_from_sys_call

Example: Linux System Calls (3)


• Different syscalls require different numbers of arguments
• e.g. getpid() and fork() require no arguments
• e.g. mmap() requires up to six arguments
• System-call arguments are passed from the
user process in specific registers
• ebx is first argument, ecx is second argument, etc.
• Syscall service routines are written in C, and
they expect their args on the kernel stack
• Linux system_call() handler pushes all
of the process’ registers onto the kernel stack
in a specific order
• Specifically, the reverse order that registers are
used to pass arguments to system calls

[Figure: Kernel Thread Stack holding, from the top: ebp = arg6, edi = arg5, esi = arg4, edx = arg3, ecx = arg2, ebx = arg1]

Example: Linux System Calls (4)


• Arguments to syscall service routines are pushed in
reverse order, following the cdecl calling convention
• Under cdecl, if a function is passed more arguments than
it expects, the extra arguments are ignored
• Allows system_call() to dispatch to all the
different service routines, regardless of the
number of arguments they take
• e.g. int sys_write(int fd, char *buf, int size)
• Service routine for write(int fd, char *buf, int size)
• When system_call() dispatches to sys_write(),
sys_write() sees only the expected arguments
• Extra arguments are simply ignored by sys_write()

[Figure: Kernel Thread Stack holding, from the top: …, ebp = arg6, edi = arg5, esi = arg4, edx = arg3 = size, ecx = arg2 = buf, ebx = arg1 = fd, return address, sys_write() frame, …]

System Calls: Security Holes?


• It goes without saying that the system call service routine
must carefully check all arguments to the system call…

• Are there potential security holes in accepting pointers as
arguments to system calls?
• Example: ssize_t read(int fd, void *buf, size_t nbytes)
• Reads bytes from a file descriptor into a buffer

• Caller specifies:
• The file-descriptor to read
• A pointer to the buffer to store the data in
• A number of bytes to read

System Calls: Security Holes?!


• Example: ssize_t read(int fd, void *buf,
size_t nbytes)
• Generally the pointers are expected to be in user space…
• What if the user-mode program specifies an address in
the kernel’s address space?
• As long as the user-mode program doesn’t access this address, it
won’t cause a general protection fault…
• But, the kernel is allowed to write to this address!
• If kernel naïvely accepts the address from the user-mode program,
it could overwrite critical data
• Example: target critical kernel data structures
• Program opens file containing the data it wants to insert into kernel
• Program passes that file descriptor and address of kernel struct…

System Calls: Security Holes


• Very important to verify all addresses
that come from user-mode programs:
• Addresses must be in userspace!
• If an address is in kernel space, it’s an
access violation
• Fast way to verify addresses:
• Make sure the address is below the
kernel / user address boundary!
(e.g. 0xc0000000 in Linux/Pintos)

[Figure: process address space, from high to low addresses. Kernel Space: process-specific data structures, kernel stack, mapping to physical memory, kernel code and global data. Boundary at 0xc0000000. User Space: user stack (%esp), memory mapped region for shared libraries (0x40000000), run-time heap (via malloc, ending at brk), uninitialized data (.bss), initialized data (.data), program text (.text) starting at 0x08048000, forbidden region down to 0]
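The "fast way" above amounts to one or two comparisons. Here is a minimal sketch, assuming 32-bit addresses and the 0xc0000000 boundary from the slide; `TOY_USER_TOP` and `is_user_range()` are names chosen for this example. It checks a whole buffer rather than a single address, since a buffer that starts in user space could still extend past the boundary.

```c
#include <stdbool.h>
#include <stdint.h>

#define TOY_USER_TOP 0xc0000000u   /* kernel / user boundary from the slide */

/* True if every byte of [addr, addr + size) lies below the boundary. */
bool is_user_range(uint32_t addr, uint32_t size) {
    if (addr >= TOY_USER_TOP)
        return false;
    /* Compare against the remaining room instead of computing
     * addr + size, which could wrap around past 0xffffffff. */
    return size <= TOY_USER_TOP - addr;
}
```

This only establishes that the range is on the user side of the boundary; it says nothing about whether those pages are actually mapped or writable.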

System Calls and Page Faults


• Addresses below the kernel / user
boundary could still be invalid…
• e.g. pass a pointer to unallocated memory
to a read() system call
• e.g. pass a pointer to read-only memory
to a write() system call
• OS will see a page fault or a general
protection fault within the kernel
• Problem: this isn’t always an error!
• Many OSes don’t allocate virtual memory
pages until they are actually accessed
• Private copy-on-write pages are marked
read-only; first write attempt causes the
page to be copied for the writing process

[Figure: process address space, from high to low addresses. Kernel Space above 0xc0000000; User Space below it: user stack (%esp), memory mapped region for shared libraries (0x40000000), run-time heap (via malloc, ending at brk), .bss, .data, .text starting at 0x08048000, forbidden region down to 0]

System Calls and Page Faults (2)


• Aside:
• In the Pintos system-call lab, virtual memory management isn’t
completed yet, so a page fault does mean an invalid address :)
• The OS may see memory faults within the kernel:
• Sometimes these are valid scenarios
• Sometimes it’s an invalid pointer passed to a syscall :(
• Sometimes it is a kernel bug :( :(
• Assume there is a way to identify valid scenarios…
• (We will examine that question in a few weeks)
• How do we distinguish between the remaining two cases?



System Calls and Page Faults (3)


• How to distinguish between:
• Faults caused by invalid addresses passed to system calls
• Faults caused by kernel bugs

• Linux has a very interesting solution to this problem

• How much kernel code actually interacts with user space?


• (Remember, CPU state of user processes is saved onto the kernel
stack, which is in kernel space)

System Calls and Page Faults (4)


• The amount of kernel code that interacts with user space
is actually very small…

• Linux kernel keeps an exception table, which records the
addresses of all instructions that touch user space
• In the fault handler, consult the exception table:
• If the faulting instruction is in the exception table, then the user
program passed the kernel a bad pointer
• Otherwise, it’s a kernel bug :(
• Aside: if it’s a kernel bug, Linux performs a kernel oops
• Print out suitable info for a kernel developer to debug the error, and
log it to the system log
• Then terminate the process!
• Keeps kernel bugs from bringing down the entire system…
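The table lookup can be modeled as a search over a sorted array of instruction addresses. This is a simplified sketch, not the Linux implementation: the `toy_*` names are hypothetical, and real Linux entries also record a fixup address at which execution resumes, which is omitted here.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified entry: the address of one kernel instruction that is
 * allowed to touch user space. */
struct toy_extable_entry {
    uintptr_t insn;
};

enum toy_fault_verdict { TOY_BAD_USER_POINTER, TOY_KERNEL_BUG };

/* Binary search; the table must be sorted by ascending insn. */
bool toy_in_extable(const struct toy_extable_entry *table, size_t count,
                    uintptr_t fault_ip) {
    size_t lo = 0, hi = count;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (table[mid].insn == fault_ip)
            return true;
        if (table[mid].insn < fault_ip)
            lo = mid + 1;
        else
            hi = mid;
    }
    return false;
}

/* The decision described on the slide. */
enum toy_fault_verdict toy_classify_fault(const struct toy_extable_entry *table,
                                          size_t count, uintptr_t fault_ip) {
    return toy_in_extable(table, count, fault_ip) ? TOY_BAD_USER_POINTER
                                                  : TOY_KERNEL_BUG;
}
```

A fault at a listed address is blamed on the pointer the user program supplied; a fault anywhere else in the kernel is treated as a kernel bug.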

Example Kernel Oops



Pintos System Calls


• Pintos doesn’t follow the Linux syscall mechanism
• Syscall arguments are on the user stack, not in the registers
• This complicates the syscall mechanism, but only slightly
• Pictorially:

[Figure: User Process Stack holding the current contents of the user process stack followed by the arguments to the system call (Sys-Call Args). Kernel Thread Stack laid out as struct intr_frame (threads/interrupt.h), from the top: Caller’s SS, Caller’s ESP, Caller’s EFLAGS, Caller’s CS, Caller’s EIP, Error Code, Interrupt No., Register State of Interrupted Program. A pointer to this frame is passed to the system-call function (userprog/syscall.c)]

Pintos System Calls (2)


• intr_frame struct exposes process machine context
• Note that topmost values on stack appear at bottom
of the structure…
• Recall: C structure members assigned increasing offsets
• Last struct members have the highest addresses
• This struct makes it easy to access the user process’
stack contents
• e.g. retrieve esp member, cast to uint32_t*, then
access user stack like an array

struct intr_frame {
    // Pushed by intr_entry (intr-stubs.S).
    // The interrupted task's saved registers.
    uint32_t edi;          // Saved EDI
    uint32_t esi;          // Saved ESI
    uint32_t ebp;          // Saved EBP
    uint32_t esp_dummy;    // Not used
    uint32_t ebx;          // Saved EBX
    ...

    // Pushed by intrNN_stub (intr-stubs.S).
    uint32_t vec_no;       // Interrupt vector no.

    // Sometimes pushed by CPU; otherwise for
    // consistency, 0 is pushed (intrNN_stub).
    uint32_t error_code;

    // Pushed by the CPU. These are the
    // interrupted task's saved registers.
    void (*eip) (void);    // Next instruction
    uint16_t cs, :16;      // Code segment
    uint32_t eflags;       // Saved CPU flags
    void *esp;             // Saved stack ptr
    uint16_t ss, :16;      // Stack segment
};
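The relationship between push order and member order can be checked with a small simulation. The sketch below uses a hypothetical three-field `struct toy_frame` rather than the real intr_frame: it performs three pushes onto a downward-growing stack and then views the resulting memory as a struct.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical three-field frame. Members are declared in increasing
 * address order, so the value pushed last comes first. */
struct toy_frame {
    uint32_t saved_reg;    /* pushed last:  lowest address  */
    uint32_t vec_no;
    uint32_t error_code;   /* pushed first: highest address */
};

/* Simulate the pushes, then copy the stack contents into a struct. */
struct toy_frame toy_build_frame(uint32_t error_code, uint32_t vec_no,
                                 uint32_t saved_reg) {
    uint32_t stack[3];
    size_t sp = 3;               /* empty stack: one past the top slot */
    stack[--sp] = error_code;    /* first push lands at the highest index */
    stack[--sp] = vec_no;
    stack[--sp] = saved_reg;     /* last push lands at the lowest index */

    struct toy_frame frame;
    memcpy(&frame, &stack[sp], sizeof frame);
    return frame;
}
```

Each value ends up in the member of the same name, which shows why the earliest-pushed items sit at the end of the structure definition.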

Pintos System Calls (3)


• Pintos system-call arguments are pushed on the user
process stack
• Arguments themselves are pushed in reverse order
• Finally, system-call number is pushed
• Caller’s esp points to the system-call number
• Use syscall no. to determine how many args are required
• Finally, read in the args themselves
• Accessing user-space, so need to do this carefully

[Figure: User Process Stack holding, from the top: current contents of user process stack, then the arguments to the system call (Arg N, …, Arg 1), then the Syscall Number. Kernel Thread Stack holding, from the top: Caller’s SS, Caller’s ESP, Caller’s EFLAGS, Caller’s CS, Caller’s EIP, Error Code, Interrupt No., Register State of Interrupted Program]
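The three steps above (read the number, look up the argument count, read the arguments) can be sketched as follows. The syscall numbers and argument counts are invented for this example and are not the Pintos definitions; the sketch also reads from an ordinary array, whereas a real handler would have to validate every user address before dereferencing it.

```c
#include <stdint.h>

/* Hypothetical syscall numbers and argument counts (illustration only). */
enum { TOY_SYS_HALT = 0, TOY_SYS_EXIT = 1, TOY_SYS_WRITE = 2, TOY_NR = 3 };
static const int toy_arg_count[TOY_NR] = { 0, 1, 3 };

/* esp points at the syscall number; Arg 1 .. Arg N follow at higher
 * addresses. Copies the arguments into out[] and returns how many
 * there were, or -1 if the syscall number is unknown. */
int toy_read_args(const uint32_t *esp, uint32_t out[3]) {
    uint32_t number = esp[0];
    if (number >= TOY_NR)
        return -1;
    int n = toy_arg_count[number];
    for (int i = 0; i < n; i++)
        out[i] = esp[1 + i];     /* user stack accessed like an array */
    return n;
}
```

For a simulated stack of `{ TOY_SYS_WRITE, 1, 0x1234, 5 }`, the function returns 3 and fills `out` with the three arguments in order.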

Next Topics!
• Next three lectures cover two fun topics!

• How signal handling works (1 lecture)

• Kernel allocators: how memory allocations are managed
within the kernel (2 lectures)
