Ipc
6.1
Introduction
The Linux IPC (inter-process communication) facilities provide a method for multiple processes to communicate with one another. There are several methods of IPC available to Linux C programmers:

- Half-duplex UNIX pipes
- FIFOs (named pipes)
- SYSV style message queues
- SYSV style semaphore sets
- SYSV style shared memory segments
- Networking sockets (Berkeley style) (not covered in this paper)
- Full-duplex pipes (STREAMS pipes) (not covered in this paper)

These facilities, when used effectively, provide a solid framework for client/server development on any UNIX system (including Linux).
6.2
6.2.1
Simply put, a pipe is a method of connecting the standard output of one process to the standard input of another. Pipes are the oldest of the IPC tools, having been around since the earliest incarnations of the UNIX operating system. They provide a method of one-way communication (hence the term half-duplex) between processes. This feature is widely used, even on the UNIX command line (in the shell).
ls | sort | lp
The above sets up a pipeline, taking the output of ls as the input of sort, and the output of sort as the input of lp. The data is running through a half-duplex pipe, traveling (visually) left to right through the pipeline.

Although most of us use pipes quite religiously in shell script programming, we often do so without giving a second thought to what transpires at the kernel level. When a process creates a pipe, the kernel sets up two file descriptors for use by the pipe. One descriptor is used to allow a path of input into the pipe (write), while the other is used to obtain data from the pipe (read). At this point, the pipe is of little practical use, as the creating process can only use the pipe to communicate with itself. Consider this representation of a process and the kernel after a pipe has been created:
From the above diagram, it is easy to see how the descriptors are connected together. If the process sends data into the pipe (through fd1, the write end), it has the ability to obtain (read) that information from fd0, the read end. However, the simplistic sketch above makes a larger point. While a pipe initially connects a process to itself, data traveling through the pipe moves through the kernel. Under Linux, in particular, pipes are actually represented internally with a valid inode. Of course, this inode resides within the kernel itself, and not within the bounds of any physical file system. This particular point will open up some pretty handy I/O doors for us, as we will see a bit later on.

At this point, the pipe is fairly useless. After all, why go to the trouble of creating a pipe if we are only going to talk to ourselves? At this point, the creating process typically forks a child process. Since a child process will inherit any open file descriptors from the parent, we now have the basis for multiprocess communication (between parent and child). Consider this updated version of our simple sketch:
Above, we see that both processes now have access to the file descriptors which constitute the pipeline. It is at this stage that a critical decision must be made. In which direction do we desire data to travel? Does the child process send information to the parent, or vice versa? The two processes mutually agree on this issue, and proceed to close the end of the pipe that they are not concerned with. For discussion purposes, let's say the child performs some processing, and sends information back through the pipe to the parent. Our newly revised sketch would appear as such:
Construction of the pipeline is now complete! The only thing left to do is make use of the pipe. To access a pipe directly, the same system calls that are used for low-level file I/O can be used (recall that pipes are actually represented internally as a valid inode). To send data to the pipe, we use the write() system call, and to retrieve data from the pipe, we use the read() system call. Remember, low-level file I/O system calls work with file descriptors! However, keep in mind that certain system calls, such as lseek(), do not work with descriptors to pipes.
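As a minimal sketch of the points above, the following helper pushes a short message through a pipe inside a single process using plain write() and read(), and records what lseek() reports on a pipe descriptor. The function name pipe_roundtrip and its parameters are hypothetical, chosen only for this illustration; it assumes the message is small enough to fit in the pipe without blocking.

```c
#include <errno.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: write msg into a fresh pipe and read it back in
 * the same process. Returns the number of bytes read, or -1 on error.
 * If seek_errno is non-NULL, it receives the errno left by an lseek()
 * attempt on the pipe (0 if the seek unexpectedly succeeded).
 * Assumes outsz >= 1 and a message far smaller than the pipe capacity.
 */
ssize_t pipe_roundtrip(const char *msg, char *out, size_t outsz,
                       int *seek_errno)
{
    int fd[2];
    ssize_t n;

    if (pipe(fd) == -1)
        return -1;

    if (seek_errno != NULL) {
        errno = 0;
        /* Pipes have no file offset, so seeking is expected to fail */
        if (lseek(fd[0], 0, SEEK_SET) == (off_t)-1)
            *seek_errno = errno;
        else
            *seek_errno = 0;
    }

    /* fd[1] is the write end, fd[0] is the read end */
    if (write(fd[1], msg, strlen(msg)) == -1) {
        close(fd[0]);
        close(fd[1]);
        return -1;
    }
    close(fd[1]);

    n = read(fd[0], out, outsz - 1);
    if (n >= 0)
        out[n] = '\0';
    close(fd[0]);

    return n;
}
```

On Linux the lseek() attempt is expected to fail with ESPIPE, which is the kernel's way of saying that the descriptor refers to a pipe.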
6.2.2
Creating Pipes in C
Creating pipelines with the C programming language can be a bit more involved than our simple shell example. To create a simple pipe with C, we make use of the pipe() system call. It takes a single argument, which is an array of two integers, and if successful, the array will contain two new file descriptors to be used for the pipeline. After creating a pipe, the process typically spawns a new process (remember the child inherits open file descriptors).

    SYSTEM CALL: pipe();

    PROTOTYPE: int pipe( int fd[2] );
      RETURNS: 0 on success
               -1 on error: errno = EMFILE (no free descriptors)
                                    ENFILE (system file table is full)
                                    EFAULT (fd array is not valid)

    NOTES: fd[0] is set up for reading, fd[1] is set up for writing

The first integer in the array (element 0) is set up and opened for reading, while the second integer (element 1) is set up and opened for writing. Visually speaking, the output of fd1 becomes the input for fd0. Once again, all data traveling through the pipe moves through the kernel.

    #include <stdio.h>
    #include <unistd.h>
    #include <sys/types.h>

    int main(void)
    {
            int     fd[2];

            pipe(fd);
            /* ... */
    }
Remember that an array name in C decays into a pointer to its first member. Above, fd is equivalent to &fd[0]. Once we have established the pipeline, we then fork our new child process:

    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>
    #include <sys/types.h>

    int main(void)
    {
            int     fd[2];
            pid_t   childpid;

            pipe(fd);

            if((childpid = fork()) == -1)
            {
                    perror("fork");
                    exit(1);
            }
            /* ... */
    }

If the parent wants to receive data from the child, it should close fd1, and the child should close fd0. If the parent wants to send data to the child, it should close fd0, and the child should close fd1. Since descriptors are shared between the parent and child, we should always be sure to close the end of the pipe we aren't concerned with. On a technical note, a reader will never see EOF as long as any copy of the write end remains open, so the unnecessary ends of the pipe must be explicitly closed.

    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>
    #include <sys/types.h>

    int main(void)
    {
            int     fd[2];
            pid_t   childpid;

            pipe(fd);

            if((childpid = fork()) == -1)
            {
                    perror("fork");
                    exit(1);
            }

            if(childpid == 0)
            {
                    /* Child process closes up input side of pipe */
                    close(fd[0]);
            }
            else
            {
                    /* Parent process closes up output side of pipe */
                    close(fd[1]);
            }
            /* ... */
    }

As mentioned previously, once the pipeline has been established, the file descriptors may be treated like descriptors to normal files.
    /*****************************************************************************
     Excerpt from "Linux Programmer's Guide - Chapter 6"
     (C)opyright 1994-1995, Scott Burkett
     *****************************************************************************
     MODULE: pipe.c
     *****************************************************************************/

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>
    #include <sys/types.h>

    int main(void)
    {
            int     fd[2], nbytes;
            pid_t   childpid;
            char    string[] = "Hello, world!\n";   /* sample text to send */
            char    readbuffer[80];

            pipe(fd);

            if((childpid = fork()) == -1)
            {
                    perror("fork");
                    exit(1);
            }

            if(childpid == 0)
            {
                    /* Child process closes up input side of pipe */
                    close(fd[0]);

                    /* Send "string" through the output side of pipe */
                    write(fd[1], string, strlen(string));
                    exit(0);
            }
            else
            {
                    /* Parent process closes up output side of pipe */
                    close(fd[1]);

                    /* Read in a string from the pipe, leaving room for '\0' */
                    nbytes = read(fd[0], readbuffer, sizeof(readbuffer) - 1);
                    if(nbytes < 0)
                            nbytes = 0;
                    readbuffer[nbytes] = '\0';
                    printf("Received string: %s", readbuffer);
            }

            return(0);
    }

Often, the descriptors in the child are duplicated onto standard input or output. The child can then exec() another program, which inherits the standard streams. Let's look at the dup() system call:

    SYSTEM CALL: dup();

    PROTOTYPE: int dup( int oldfd );
      RETURNS: new descriptor on success
               -1 on error: errno = EBADF (oldfd is not a valid descriptor)
                                    EMFILE (too many descriptors for the process)

    NOTES: the old descriptor is not closed! Both may be used interchangeably
Although the old descriptor and the newly created descriptor can be used interchangeably, we will typically close one of the standard streams first. The dup() system call uses the lowest-numbered, unused descriptor for the new one. Consider:

            /* ... */
            childpid = fork();

            if(childpid == 0)
            {
                    /* Close up standard input of the child */
                    close(0);

                    /* Duplicate the input side of pipe to stdin */
                    dup(fd[0]);
                    execlp("sort", "sort", NULL);
                    /* ... */
            }

Since file descriptor 0 (stdin) was closed, the call to dup() duplicated the input descriptor of the pipe (fd0) onto its standard input. We then make a call to execlp(), to overlay the child's text segment (code) with that of the sort program. Since newly exec'd programs inherit standard streams from their spawners, it actually inherits the input side of the pipe as its standard input! Now, anything that the original parent process sends to the pipe goes into the sort facility.

There is another system call, dup2(), which can be used as well. This particular call originated with Version 7 of UNIX, and was carried on through the BSD releases and is now required by the POSIX standard.

    SYSTEM CALL: dup2();

    PROTOTYPE: int dup2( int oldfd, int newfd );
      RETURNS: new descriptor on success
               -1 on error: errno = EBADF (oldfd is not a valid descriptor)
                                    EBADF (newfd is out of range)
                                    EMFILE (too many descriptors for the process)

    NOTES: if newfd is already open, dup2() closes it first! The old
           descriptor (oldfd) stays open.

With this particular call, we have the close operation, and the actual descriptor duplication, wrapped up in one system call. In addition, it is guaranteed to be atomic: the close and the duplication happen as a single step, with no window in between. With the original dup() system call, programmers had to perform a close() operation before calling it. That resulted in two system calls, with a small degree of vulnerability in the brief amount of time which elapsed between them. If a signal handler ran during that brief instant and opened a file of its own, it would claim the freed descriptor, and the duplication would land on the wrong one. Of course, dup2() solves this problem for us. Consider:

            /* ... */
            childpid = fork();

            if(childpid == 0)
            {
                    /* Close stdin, duplicate the input side of pipe to stdin */
                    dup2(fd[0], 0);
                    execlp("sort", "sort", NULL);
                    /* ... */
            }
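Putting pipe(), fork(), dup2() and execlp() together, a complete parent/child exchange might look like the sketch below. The helper name pipe_to_command is hypothetical, and the sketch assumes the input is short enough to be written in one call and that the command reads its standard input until EOF.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: run cmd (found via PATH, no arguments) with its
 * standard input connected to a pipe, feed it input, and wait for it.
 * Returns the command's exit status, or -1 on failure.
 */
int pipe_to_command(const char *cmd, const char *input)
{
    int fd[2], status;
    pid_t pid;

    if (pipe(fd) == -1)
        return -1;

    if ((pid = fork()) == -1) {
        close(fd[0]);
        close(fd[1]);
        return -1;
    }

    if (pid == 0) {
        /* Child: make the read end of the pipe its standard input */
        if (fd[0] != STDIN_FILENO) {
            if (dup2(fd[0], STDIN_FILENO) == -1)
                _exit(127);
            close(fd[0]);
        }
        close(fd[1]);               /* child never writes to the pipe */
        execlp(cmd, cmd, (char *)NULL);
        _exit(127);                 /* only reached if exec failed */
    }

    /* Parent: keep only the write end */
    close(fd[0]);
    if (write(fd[1], input, strlen(input)) == -1)
        status = -1;                /* still close and reap below */
    close(fd[1]);                   /* the child now sees EOF */

    if (waitpid(pid, &status, 0) == -1)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling pipe_to_command("sort", "echo\nbravo\nalpha\n") should leave the sorted lines on the caller's standard output, since the child inherits it unchanged.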
6.2.3
If all of the above ramblings seem like a very round-about way of creating and utilizing pipes, there is an alternative.

    LIBRARY FUNCTION: popen();

    PROTOTYPE: FILE *popen( const char *command, const char *type );
      RETURNS: new file stream on success
               NULL on unsuccessful fork() or pipe() call

    NOTES: creates a pipe, and performs fork/exec operations using "command"

This standard library function creates a half-duplex pipeline by calling pipe() internally. It then forks a child process, execs the Bourne shell, and executes the command argument within the shell. Direction of data flow is determined by the second argument, type. It can be "r" or "w", for read or write. It cannot be both! Under Linux, the pipe will be opened up in the mode specified by the first character of the type argument. So, if you try to pass "rw", it will only open it up in read mode. (Current C libraries may instead reject such a string outright, so pass exactly "r" or "w".)

While this library function performs quite a bit of the dirty work for you, there is a substantial tradeoff. You lose the fine control you once had by using the pipe() system call, and handling the fork/exec yourself. However, since the Bourne shell is used directly, shell metacharacter expansion (including wildcards) is permissible within the command argument.

Pipes which are created with popen() must be closed with pclose(). By now, you have probably realized that popen/pclose share a striking resemblance to the standard file stream I/O functions fopen() and fclose().
    LIBRARY FUNCTION: pclose();

    PROTOTYPE: int pclose( FILE *stream );
      RETURNS: exit status of wait4() call
               -1 if "stream" is not valid, or if wait4() fails

    NOTES: waits on the pipe process to terminate, then closes the stream.
The pclose() function performs a wait4() on the process forked by popen(). When it returns, it destroys the pipe and the file stream. Once again, it is synonymous with the fclose() function for normal stream-based file I/O.

Consider this example, which opens up a pipe to the sort command, and proceeds to sort an array of strings:

    /*****************************************************************************
     Excerpt from "Linux Programmer's Guide - Chapter 6"
     (C)opyright 1994-1995, Scott Burkett
     *****************************************************************************
     MODULE: popen1.c
     *****************************************************************************/

    #include <stdio.h>
    #include <stdlib.h>

    #define MAXSTRS 5

    int main(void)
    {
            int  cntr;
            FILE *pipe_fp;
            char *strings[MAXSTRS] = { "echo", "bravo", "alpha",
                                       "charlie", "delta"};

            /* Create one way pipe line with call to popen() */
            if (( pipe_fp = popen("sort", "w")) == NULL)
            {
                    perror("popen");
                    exit(1);
            }

            /* Processing loop */
            for(cntr=0; cntr<MAXSTRS; cntr++) {
                    fputs(strings[cntr], pipe_fp);
                    fputc('\n', pipe_fp);
            }

            /* Close the pipe */
            pclose(pipe_fp);

            return(0);
    }

Since popen() uses the shell to do its bidding, all shell expansion characters and metacharacters are available for use! In addition, more advanced techniques such as redirection, and even output piping, can be utilized with popen(). Consider the following sample calls:

    popen("ls ~scottb", "r");
    popen("sort > /tmp/foo", "w");
    popen("sort | uniq | more", "w");

As another example of popen(), consider this small program, which opens up two pipes (one to the ls command, the other to sort):

    /*****************************************************************************
     Excerpt from "Linux Programmer's Guide - Chapter 6"
     (C)opyright 1994-1995, Scott Burkett
     *****************************************************************************
     MODULE: popen2.c
     *****************************************************************************/

    #include <stdio.h>
    #include <stdlib.h>

    int main(void)
    {
            FILE *pipein_fp, *pipeout_fp;
            char readbuf[80];

            /* Create one way pipe line with call to popen() */
            if (( pipein_fp = popen("ls", "r")) == NULL)
            {
                    perror("popen");
                    exit(1);
            }

            /* Create one way pipe line with call to popen() */
            if (( pipeout_fp = popen("sort", "w")) == NULL)
            {
                    perror("popen");
                    exit(1);
            }

            /* Processing loop */
            while(fgets(readbuf, 80, pipein_fp))
                    fputs(readbuf, pipeout_fp);

            /* Close the pipes */
            pclose(pipein_fp);
            pclose(pipeout_fp);

            return(0);
    }

For our final demonstration of popen(), let's create a generic program that opens up a pipeline between a passed command and filename:

    /*****************************************************************************
     Excerpt from "Linux Programmer's Guide - Chapter 6"
     (C)opyright 1994-1995, Scott Burkett
     *****************************************************************************
     MODULE: popen3.c
     *****************************************************************************/

    #include <stdio.h>
    #include <stdlib.h>

    int main(int argc, char *argv[])
    {
            FILE *pipe_fp, *infile;
            char readbuf[80];

            if( argc != 3) {
                    fprintf(stderr, "USAGE: popen3 [command] [filename]\n");
                    exit(1);
            }

            /* Open up input file */
            if (( infile = fopen(argv[2], "r")) == NULL)
            {
                    perror("fopen");
                    exit(1);
            }

            /* Create one way pipe line with call to popen() */
            if (( pipe_fp = popen(argv[1], "w")) == NULL)
            {
                    perror("popen");
                    exit(1);
            }

            /* Processing loop: copy the file into the pipe line by line */
            while(fgets(readbuf, 80, infile))
                    fputs(readbuf, pipe_fp);

            fclose(infile);
            pclose(pipe_fp);

            return(0);
    }

Try this program out, with the following invocations:

    popen3 sort popen3.c
    popen3 cat popen3.c
    popen3 more popen3.c
    popen3 cat popen3.c | grep main
6.2.4
In order for an operation to be considered atomic, it must not be interrupted for any reason at all. The entire operation occurs at once. The POSIX standard dictates in /usr/include/posix1_lim.h that the maximum buffer size for an atomic operation on a pipe is:

    #define _POSIX_PIPE_BUF         512

A write of up to 512 bytes to a pipe is therefore guaranteed to be atomic. Anything that crosses this threshold may be split, and is not atomic. Under Linux, however, the atomic operational limit is defined in linux/limits.h as:

    #define PIPE_BUF        4096

As you can see, Linux comfortably exceeds the minimum number of bytes required by POSIX. The atomicity of a pipe operation becomes important when more than one process is involved (FIFOs). For example, if the number of bytes written to a pipe exceeds the atomic limit for a single operation, and multiple processes are writing to the pipe, the data will be interleaved or chunked. In other words, one process may insert data into the pipeline between the writes of another.
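Rather than hardcoding either constant, a program can ask the system for the limit that applies to a particular pipe. The sketch below uses fpathconf() with _PC_PIPE_BUF for this; the helper name pipe_atomic_limit is hypothetical.

```c
#include <unistd.h>

/*
 * Hypothetical helper: report the largest write that is guaranteed to
 * be atomic on a freshly created pipe. Returns the limit in bytes, or
 * -1 if the pipe cannot be created or the limit cannot be determined.
 */
long pipe_atomic_limit(void)
{
    int fd[2];
    long limit;

    if (pipe(fd) == -1)
        return -1;

    /* Query the limit for this descriptor at run time */
    limit = fpathconf(fd[0], _PC_PIPE_BUF);

    close(fd[0]);
    close(fd[1]);

    return limit;
}
```

On a POSIX-conforming system the result should be at least 512, and on Linux it is expected to match PIPE_BUF.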
6.2.5
- Two-way pipes can be created by opening up two pipes, and properly reassigning the file descriptors in the child process.
- The pipe() call must be made BEFORE a call to fork(), or the descriptors will not be inherited by the child! (The same applies to popen().)
- With half-duplex pipes, any connected processes must share a related ancestry. Since the pipe resides within the confines of the kernel, any process that is not in the ancestry for the creator of the pipe has no way of addressing it. This is not the case with named pipes (FIFOs).
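The first two notes can be sketched as follows: two pipes are created before fork(), one per direction, and each process closes the ends it does not use. The helper name upcase_via_child and the upper-casing "service" performed by the child are hypothetical, picked only to give the child something visible to do. The sketch assumes short messages, since the parent writes everything before it starts reading.

```c
#include <ctype.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: send text to a child over one pipe and collect
 * the child's upper-cased reply over a second pipe.
 * Returns the reply length, or -1 on error.
 */
ssize_t upcase_via_child(const char *text, char *reply, size_t replysz)
{
    int to_child[2], to_parent[2];
    size_t len = strlen(text), total = 0;
    ssize_t n = 0;
    pid_t pid;

    if (replysz == 0 || pipe(to_child) == -1)
        return -1;
    if (pipe(to_parent) == -1) {
        close(to_child[0]);
        close(to_child[1]);
        return -1;
    }

    /* Both pipes exist before fork(), so the child inherits all ends */
    if ((pid = fork()) == -1) {
        close(to_child[0]);
        close(to_child[1]);
        close(to_parent[0]);
        close(to_parent[1]);
        return -1;
    }

    if (pid == 0) {
        char buf[256];
        ssize_t got, i;

        close(to_child[1]);         /* child only reads this pipe */
        close(to_parent[0]);        /* child only writes this pipe */
        while ((got = read(to_child[0], buf, sizeof(buf))) > 0) {
            for (i = 0; i < got; i++)
                buf[i] = (char)toupper((unsigned char)buf[i]);
            if (write(to_parent[1], buf, (size_t)got) != got)
                _exit(1);
        }
        _exit(got == 0 ? 0 : 1);
    }

    close(to_child[0]);             /* parent only writes this pipe */
    close(to_parent[1]);            /* parent only reads this pipe */

    if (write(to_child[1], text, len) != (ssize_t)len)
        n = -1;
    close(to_child[1]);             /* the child now sees EOF */

    /* Collect the reply until the child closes its end */
    while (n >= 0 && total < replysz - 1 &&
           (n = read(to_parent[0], reply + total,
                     replysz - 1 - total)) > 0)
        total += (size_t)n;
    reply[total] = '\0';

    close(to_parent[0]);
    waitpid(pid, NULL, 0);

    return n < 0 ? -1 : (ssize_t)total;
}
```

For larger exchanges the two sides would need to interleave reads and writes, since each pipe can only hold a limited amount of data before the writer blocks.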
6.3
6.3.1
A named pipe works much like a regular pipe, but does have some noticeable differences.

- Named pipes exist as a device special file in the file system.
- Processes of different ancestry can share data through a named pipe.
- When all I/O is done by sharing processes, the named pipe remains in the file system for later use.
6.3.2
Creating a FIFO
There are several ways of creating a named pipe. The first two can be done directly from the shell.

    mknod MYFIFO p
    mkfifo -m a=rw MYFIFO
The above two commands perform identical operations, with one exception. The mkfifo command provides a hook (the -m option) for altering the permissions on the FIFO file directly after creation. With mknod, a quick call to the chmod command will be necessary.

FIFO files can be quickly identified in a physical file system by the "p" indicator seen here in a long directory listing:

    $ ls -l MYFIFO
    prw-r--r--   1 root     root     ...     MYFIFO|

Also notice the vertical bar ("pipe sign") located directly after the file name, which ls appends when file type marking (the -F option) is in effect. Another great reason to run Linux, eh?

To create a FIFO in C, we can make use of the mknod() system call:

    LIBRARY FUNCTION: mknod();

    PROTOTYPE: int mknod( const char *pathname, mode_t mode, dev_t dev );
      RETURNS: 0 on success,
               -1 on error: errno = EFAULT (pathname invalid)
                                    EACCES (permission denied)
                                    ENAMETOOLONG (pathname too long)
                                    ENOENT (invalid pathname)
                                    ENOTDIR (invalid pathname)
                                    (see man page for mknod for others)

    NOTES: Creates a filesystem node (file, device file, or FIFO)

I will leave a more detailed discussion of mknod() to the man page, but let's consider a simple example of FIFO creation from C:

    mknod("/tmp/MYFIFO", S_IFIFO|0666, 0);

In this case, the file "/tmp/MYFIFO" is created as a FIFO file. The requested permissions are "0666", although they are affected by the umask setting as follows:

    final_permissions = requested_permissions & ~original_umask

A common trick is to use the umask() system call to temporarily zap the umask value:

    umask(0);
    mknod("/tmp/MYFIFO", S_IFIFO|0666, 0);

In addition, the third argument to mknod() is ignored unless we are creating a device file. In that instance, it should specify the major and minor numbers of the device file.
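The umask arithmetic and the "zap the umask" trick can be sketched as below. Note that this sketch swaps in mkfifo(), the POSIX library call dedicated to FIFOs, in place of the mknod() call used in the text, and that it restores the previous umask afterwards. The helper names effective_mode and create_fifo_exact are hypothetical.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/stat.h>

/*
 * Hypothetical helper: the permission bits a new file actually gets
 * for a given request and umask (bits set in the mask are cleared).
 */
mode_t effective_mode(mode_t requested, mode_t mask)
{
    return requested & ~mask;
}

/*
 * Hypothetical helper: create a FIFO whose permission bits are exactly
 * perms, regardless of the caller's umask.
 * Returns 0 on success, -1 on error with errno set by mkfifo().
 */
int create_fifo_exact(const char *path, mode_t perms)
{
    mode_t old_mask;
    int rc, saved_errno;

    old_mask = umask(0);            /* temporarily clear the umask */
    rc = mkfifo(path, perms);
    saved_errno = errno;
    umask(old_mask);                /* put the caller's umask back */
    errno = saved_errno;

    return rc;
}
```

With the common umask of 022, effective_mode(0666, 022) yields 0644, which is why a FIFO requested with 0666 typically shows up as prw-r--r-- unless the umask is cleared first.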
6.3.3
FIFO Operations
I/O operations on a FIFO are essentially the same as for normal pipes, with one major exception. An "open" system call or library function should be used to physically open up a channel to the pipe. With half-duplex pipes, this is unnecessary, since the pipe resides in the kernel and not on a physical filesystem. In our examples, we will treat the pipe as a stream, opening it up with fopen(), and closing it with fclose().

Consider a simple server process:
    /*****************************************************************************
     Excerpt from "Linux Programmer's Guide - Chapter 6"
     (C)opyright 1994-1995, Scott Burkett
     *****************************************************************************
     MODULE: fifoserver.c
     *****************************************************************************/

    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/stat.h>
    #include <unistd.h>

    #define FIFO_FILE       "MYFIFO"    /* same name the client uses */

    int main(void)
    {
            FILE *fp;
            char readbuf[80];

            /* Create the FIFO if it does not exist */
            umask(0);
            mknod(FIFO_FILE, S_IFIFO|0666, 0);

            while(1)
            {
                    /* Blocks until a client opens the FIFO for writing */
                    if((fp = fopen(FIFO_FILE, "r")) == NULL) {
                            perror("fopen");
                            exit(1);
                    }
                    if(fgets(readbuf, 80, fp) != NULL)
                            printf("Received string: %s\n", readbuf);
                    fclose(fp);
            }

            return(0);
    }

Since a FIFO blocks by default, run the server in the background after you compile it:

    $ fifoserver&

We will discuss a FIFO's blocking action in a moment. First, consider the following simple client frontend to our server:

    /*****************************************************************************
     Excerpt from "Linux Programmer's Guide - Chapter 6"
     (C)opyright 1994-1995, Scott Burkett
     *****************************************************************************
     MODULE: fifoclient.c
     *****************************************************************************/

    #include <stdio.h>
    #include <stdlib.h>

    #define FIFO_FILE       "MYFIFO"

    int main(int argc, char *argv[])
    {
            FILE *fp;

            if ( argc != 2 ) {
                    printf("USAGE: fifoclient [string]\n");
                    exit(1);
            }

            if((fp = fopen(FIFO_FILE, "w")) == NULL) {
                    perror("fopen");
                    exit(1);
            }

            fputs(argv[1], fp);

            fclose(fp);
            return(0);
    }
6.3.4
Normally, blocking occurs on a FIFO. In other words, if the FIFO is opened for reading, the process will "block" until some other process opens it for writing. This action works vice versa as well. If this behavior is undesirable, the O_NONBLOCK flag can be used in an open() call to disable the default blocking action.

In the case of our simple server, we just shoved it into the background, and let it do its blocking there. The alternative would be to jump to another virtual console and run the client end, switching back and forth to see the resulting action.
6.3.5
On a last note, pipes must have a reader and a writer. If a process tries to write to a pipe that has no reader, it will be sent the SIGPIPE signal from the kernel. This is important to keep in mind when more than two processes are involved in a pipeline.
6.4
System V IPC
6.4.1
Fundamental Concepts
With System V, AT&T introduced three new forms of IPC facilities (message queues, semaphores, and shared memory). While the POSIX committee has not yet completed its standardization of these facilities, most implementations do support these. In addition, Berkeley (BSD) uses sockets as its primary form of IPC, rather than the System V elements. Linux has the ability to use both forms of IPC (BSD and System V), although we will not discuss sockets until a later chapter.

The Linux implementation of System V IPC was authored by Krishna Balasubramanian.

IPC Identifiers

Each IPC object has a unique IPC identifier associated with it. When we say "IPC object", we are speaking of a single message queue, semaphore set, or shared memory segment.
This identifier is used within the kernel to uniquely identify an IPC object. For example, to access a particular shared memory segment, the only item you need is the unique ID value which has been assigned to that segment.

The uniqueness of an identifier is relative to the type of object in question. To illustrate this, assume a numeric identifier of "12345". While there can never be two message queues with this same identifier, there exists the distinct possibility of a message queue and, say, a shared memory segment, which have the same numeric identifier.

IPC Keys

To obtain a unique ID, a key must be used. The key must be mutually agreed upon by both client and server processes. This represents the first step in constructing a client/server framework for an application.

When you use a telephone to call someone, you must know their number. In addition, the phone company must know how to relay your outgoing call to its final destination. Once the other party responds by answering the telephone call, the connection is made.

In the case of System V IPC facilities, the "telephone" correlates directly with the type of object being used. The "phone company", or routing method, can be directly associated with an IPC key.

The key can be the same value every time, by hardcoding a key value into an application. This has the disadvantage of the key possibly being in use already. Often, the ftok() function is used to generate key values for both the client and the server.

    LIBRARY FUNCTION: ftok();

    PROTOTYPE: key_t ftok( const char *pathname, int proj );
      RETURNS: new IPC key value if successful
               -1 if unsuccessful, errno set to return of stat() call

The returned key value from ftok() is generated by combining the inode number and minor device number from the file in argument one, with the one character project identifier in the second argument. This doesn't guarantee uniqueness, but an application can check for collisions and retry the key generation.
    key_t   mykey;
    mykey = ftok("/tmp/myapp", 'a');

In the above snippet, the directory /tmp/myapp is combined with the one letter identifier of 'a'. Another common example is to use the current directory:

    key_t   mykey;
    mykey = ftok(".", 'a');

The key generation algorithm used is completely up to the discretion of the application programmer. As long as measures are in place to prevent race conditions, deadlocks, etc., any method is viable. For our demonstration purposes, we will use the ftok() approach. If we assume that each client process will be running from a unique "home" directory, the keys generated should suffice for our needs.

The key value, however it is obtained, is used in subsequent IPC system calls to create or gain access to IPC objects.
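Since ftok() consults the file named in its first argument, it fails when that file does not exist, so the result is worth checking before the key is used. A minimal sketch, with the hypothetical helper name make_ipc_key:

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/ipc.h>

/*
 * Hypothetical helper: derive an IPC key from an existing path and a
 * one-character project identifier, reporting failures on stderr.
 * Returns the key, or (key_t)-1 if ftok() fails.
 */
key_t make_ipc_key(const char *path, char proj)
{
    key_t key = ftok(path, proj);

    if (key == (key_t)-1)
        perror("ftok");     /* typically: the path does not exist */

    return key;
}
```

A client and a server that call make_ipc_key() with the same existing path and the same project character will arrive at the same key, which is exactly the mutual agreement described above.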
The ipcs Command

The ipcs command can be used to obtain the status of all System V IPC objects. The Linux version of this tool was also authored by Krishna Balasubramanian.

    ipcs -q:      Show only message queues
    ipcs -s:      Show only semaphores
    ipcs -m:      Show only shared memory
    ipcs --help:  Additional arguments
By default, all three categories of objects are shown. Consider the following sample output of ipcs:

    ------ Shared Memory Segments --------
    shmid     owner     perms     bytes     nattch    status

    ------ Semaphore Arrays --------
    semid     owner     perms     nsems     status

    ------ Message Queues --------
    msqid     owner     perms     used-bytes  messages
    0         root      660       5           1
Here we see a single message queue which has an identifier of "0". It is owned by the user root, and has octal permissions of 660, or -rw-rw----. There is one message in the queue, and that message has a total size of 5 bytes.

The ipcs command is a very powerful tool which provides a peek into the kernel's storage mechanisms for IPC objects. Learn it, use it, revere it.

The ipcrm Command

The ipcrm command can be used to remove an IPC object from the kernel. While IPC objects can be removed via system calls in user code (we'll see how in a moment), the need often arises, especially under development environments, to remove IPC objects manually. Its usage is simple:

    ipcrm <msg | sem | shm> <IPC ID>
Simply specify whether the object to be deleted is a message queue (msg), a semaphore set (sem), or a shared memory segment (shm). The IPC ID can be obtained by the ipcs command. You have to specify the type of object, since identifiers are unique only among objects of the same type (recall our discussion of this earlier).
6.4.2
Message Queues
Basic Concepts

Message queues can be best described as an internal linked list within the kernel's addressing space. Messages can be sent to the queue in order and retrieved from the queue in several different ways. Each message queue (of course) is uniquely identified by an IPC identifier.
The key to fully understanding such complex topics as System V IPC is to become intimately familiar with the various internal data structures that reside within the confines of the kernel itself. Direct access to some of these structures is necessary for even the most primitive operations, while others reside at a much lower level.

Message buffer

The first structure we'll visit is the msgbuf structure. This particular data structure can be thought of as a template for message data. While it is up to the programmer to define structures of this type, it is imperative that you understand that there is actually a structure of type msgbuf. It is declared in linux/msg.h as follows:

    /* message buffer for msgsnd and msgrcv calls */
    struct msgbuf {
        long mtype;         /* type of message */
        char mtext[1];      /* message text */
    };

There are two members in the msgbuf structure:

mtype
    The message type, represented as a positive number. This must be a positive number!

mtext
    The message data itself.

The ability to assign a given message a type essentially gives you the capability to multiplex messages on a single queue. For instance, client processes could be assigned a magic number, which could be used as the message type for messages sent from a server process. The server itself could use some other number, which clients could use to send messages to it. In another scenario, an application could mark error messages as having a message type of 1, request messages could be type 2, etc. The possibilities are endless.

On another note, do not be misled by the almost too-descriptive name assigned to the message data element (mtext). This field is not restricted to holding only arrays of characters, but any data, in any form. The field itself is actually completely arbitrary, since this structure gets redefined by the application programmer. Consider this redefinition:

    struct my_msgbuf {
            long    mtype;          /* Message type */
            long    request_id;     /* Request identifier */
            struct  client info;    /* Client information structure */
    };
Here we see the message type, as before, but the remainder of the structure has been replaced by two other elements, one of which is another structure! This is the beauty of message queues. The kernel makes no translations of data whatsoever. Any information can be sent.

There does exist an internal limit, however, on the maximum size of a given message. In Linux, this is defined in linux/msg.h as follows:

    #define MSGMAX  4056   /* <= 4056 */   /* max size of message (bytes) */

Messages can be no larger than 4,056 bytes in total size, including the mtype member, which is sizeof(long) bytes in length (4 bytes on 32-bit Linux systems).
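When working with a redefined message structure, it helps to know how many bytes follow the mtype member, because the size argument of the msgsnd() and msgrcv() calls named in the msgbuf comment counts only the data after mtype. The sketch below shows one common idiom for computing it. The layout of struct client is not given in the text, so the one used here is purely hypothetical, as is the macro name MY_MSG_PAYLOAD.

```c
#include <stddef.h>

/* Hypothetical client description; the text does not define it */
struct client {
    int  pid;
    char name[16];
};

/* The redefined message structure from the text */
struct my_msgbuf {
    long          mtype;        /* message type, must be positive */
    long          request_id;   /* request identifier */
    struct client info;         /* client information structure */
};

/*
 * Number of bytes that follow mtype, including any padding the
 * compiler adds at the end of the structure.
 */
#define MY_MSG_PAYLOAD (sizeof(struct my_msgbuf) - sizeof(long))
```

Using sizeof() here, instead of adding up member sizes by hand, keeps the figure correct when the compiler inserts padding or when long is 8 bytes wide.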
Kernel msg structure The kernel stores each message in the queue within the framework of the msg structure. It is dened for us in linux/msg.h as follows: /* one msg structure for each message */ struct msg { struct msg *msg_next; /* next message on queue */ long msg_type; char *msg_spot; /* message text address */ short msg_ts; /* message text size */ }; msg next This is a pointer to the next message in the queue. They are stored as a singly linked list within kernel addressing space. msg type This is the message type, as assigned in the user structure msgbuf. msg spot A pointer to the beginning of the message body. msg ts The length of the message text, or body. Kernel msqid ds structure Each of the three types of IPC objects has an internal data structure which is maintained by the kernel. For message queues, this is the msqid ds structure. The kernel creates, stores, and maintains an instance of this structure for every message queue created on the system. It is dened in linux/msg.h as follows: /* one msqid structure for each queue on the system */ struct msqid_ds { struct ipc_perm msg_perm; struct msg *msg_first; /* first message on queue */ struct msg *msg_last; /* last message in queue */ time_t msg_stime; /* last msgsnd time */ time_t msg_rtime; /* last msgrcv time */ time_t msg_ctime; /* last change time */ struct wait_queue *wwait; struct wait_queue *rwait; ushort msg_cbytes; ushort msg_qnum; ushort msg_qbytes; /* max number of bytes on queue */ ushort msg_lspid; /* pid of last msgsnd */ ushort msg_lrpid; /* last receive pid */ }; While you will rarely have to concern yourself with most of the members of this structure, a brief description of each is in order to complete our tour: msg perm An instance of the ipc perm structure, which is dened for us in linux/ipc.h. This holds the permission information for the message queue, including the access permissions, and information about the creator of the queue (uid, etc).
6.4. SYSTEM V IPC msg first Link to the rst message in the queue (the head of the list). msg last Link to the last message in the queue (the tail of the list). msg stime Timestamp (time t) of the last message that was sent to the queue. msg rtime Timestamp of the last message retrieved from the queue. msg ctime Timestamp of the last change made to the queue (more on this later). wwait and rwait
    Pointers into the kernel's wait queue. They are used when an operation on a message queue requires the process to go to sleep (i.e. the queue is full and the process is waiting for an opening).
msg_cbytes
    Total number of bytes residing on the queue (sum of the sizes of all messages).
msg_qnum
    Number of messages currently in the queue.
msg_qbytes
    Maximum number of bytes on the queue.
msg_lspid
    The PID of the process that sent the last message.
msg_lrpid
    The PID of the process that retrieved the last message.

Kernel ipc_perm structure

The kernel stores permission information for IPC objects in a structure of type ipc_perm. For example, in the internal structure for a message queue described above, the msg_perm member is of this type. It is declared for us in linux/ipc.h as follows:

    struct ipc_perm {
        key_t  key;
        ushort uid;     /* owner euid and egid */
        ushort gid;
        ushort cuid;    /* creator euid and egid */
        ushort cgid;
        ushort mode;    /* access modes see mode flags below */
        ushort seq;     /* slot usage sequence number */
    };
All of the above are fairly self-explanatory. Stored along with the IPC key of the object is information about both the creator and owner of the object (they may be different). The octal access modes are also stored here, as an unsigned short. Finally, the slot usage sequence number is stored at the end. Each time an IPC object is closed via a system call (destroyed), this value gets incremented by the maximum number of IPC objects that can reside in a system. Will you have to concern yourself with this value? No.

NOTE: There is an excellent discussion on this topic, and the security reasons as to its existence and behavior, in Richard Stevens' UNIX Network Programming book, p. 125.

SYSTEM CALL: msgget()

In order to create a new message queue, or access an existing queue, the msgget() system call is used.

  SYSTEM CALL: msgget();

  PROTOTYPE: int msgget ( key_t key, int msgflg );
    RETURNS: message queue identifier on success
             -1 on error: errno = EACCES (permission denied)
                                  EEXIST (Queue exists, cannot create)
                                  EIDRM  (Queue is marked for deletion)
                                  ENOENT (Queue does not exist)
                                  ENOMEM (Not enough memory to create queue)
                                  ENOSPC (Maximum queue limit exceeded)
  NOTES:

The first argument to msgget() is the key value (in our case returned by a call to ftok()). This key value is then compared to the key values that exist within the kernel for other message queues. At that point, the open or access operation is dependent upon the contents of the msgflg argument.

IPC_CREAT
    Create the queue if it doesn't already exist in the kernel.
IPC_EXCL
    When used with IPC_CREAT, fail if the queue already exists.

If IPC_CREAT is used alone, msgget() either returns the message queue identifier for a newly created message queue, or returns the identifier for a queue which exists with the same key value. If IPC_EXCL is used along with IPC_CREAT, then either a new queue is created, or if the queue exists, the call fails with -1 (errno is set to EEXIST).
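The two flag combinations can be tried out with a short experiment. The following is only a sketch, not part of the original wrappers: the helper names create_or_open() and exclusive_create_demo(), and the pid-derived key used to avoid clashing with other queues, are assumptions made for illustration.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/msg.h>
#include <unistd.h>

/* Hypothetical helper: try to create the queue exclusively first.
 * Returns 1 if this call created the queue, 0 if it already existed
 * and was merely opened, or -1 on error. The identifier goes in *qid. */
int create_or_open(key_t key, int *qid)
{
    *qid = msgget(key, IPC_CREAT | IPC_EXCL | 0660);
    if (*qid != -1)
        return 1;                   /* brand new queue */
    if (errno != EEXIST)
        return -1;                  /* some other failure */

    *qid = msgget(key, 0);          /* open the existing queue */
    return (*qid == -1) ? -1 : 0;
}

/* Hypothetical demo: returns 0 when the behavior matches the text. */
int exclusive_create_demo(void)
{
    /* Assumption: a pid-derived key is unlikely to be in use already */
    key_t key = (key_t)(0x4C500000 + (getpid() & 0xFFFF));
    int first, second, ok;

    if (create_or_open(key, &first) != 1)
        return -1;
    /* The second attempt must find the existing queue, same identifier */
    ok = (create_or_open(key, &second) == 0) && (first == second);
    msgctl(first, IPC_RMID, NULL);  /* clean up the experiment */
    return ok ? 0 : -1;
}
```

Calling exclusive_create_demo() from a small main() should return 0 on a system with System V IPC enabled. The point of the design is that the caller learns whether it is the creator, which is useful when only the creator should initialize a shared resource.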
IPC_EXCL is useless by itself, but when combined with IPC_CREAT, it can be used as a facility to guarantee that no existing queue is opened for access. An optional octal mode may be OR'd into the mask, since each IPC object has permissions that are similar in functionality to file permissions on a UNIX file system!

Let's create a quick wrapper function for opening or creating a message queue:

    int open_queue( key_t keyval )
    {
        int qid;

        if((qid = msgget( keyval, IPC_CREAT | 0660 )) == -1)
        {
            return(-1);
        }

        return(qid);
    }
Note the use of the explicit permissions of 0660. This small function either returns a message queue identifier (int), or -1 on error. The key value must be passed to it as its only argument.

SYSTEM CALL: msgsnd()

Once we have the queue identifier, we can begin performing operations on it. To deliver a message to a queue, you use the msgsnd system call:

  SYSTEM CALL: msgsnd();

  PROTOTYPE: int msgsnd ( int msqid, struct msgbuf *msgp, int msgsz, int msgflg );
    RETURNS: 0 on success
             -1 on error: errno = EAGAIN (queue is full, and IPC_NOWAIT was asserted)
                                  EACCES (permission denied, no write permission)
                                  EFAULT (msgp address isn't accessible - invalid)
                                  EIDRM  (The message queue has been removed)
                                  EINTR  (Received a signal while waiting to write)
                                  EINVAL (Invalid message queue identifier, nonpositive
                                          message type, or invalid message size)
                                  ENOMEM (Not enough memory to copy message buffer)
  NOTES:

The first argument to msgsnd is our queue identifier, returned by a previous call to msgget. The second argument, msgp, is a pointer to our redeclared and loaded message buffer. The msgsz argument contains the size of the message in bytes, excluding the length of the message type (a long, which is 4 bytes on 32-bit systems). The msgflg argument can be set to 0 (ignored), or:

IPC_NOWAIT
    If the message queue is full, then the message is not written to the queue, and control is returned to the calling process. If not specified, then the calling process will suspend (block) until the message can be written.

Let's create another wrapper function for sending messages:

    int send_message( int qid, struct mymsgbuf *qbuf )
    {
        int result, length;

        /* The length is essentially the size of the structure minus sizeof(mtype) */
        length = sizeof(struct mymsgbuf) - sizeof(long);

        if((result = msgsnd( qid, qbuf, length, 0)) == -1)
        {
            return(-1);
        }

        return(result);
    }
This small function attempts to send the message residing at the passed address (qbuf) to the message queue designated by the passed queue identifier (qid). Here is a sample code snippet utilizing the two wrapper functions we have developed so far:

    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/types.h>
    #include <sys/ipc.h>
    #include <sys/msg.h>

    main()
    {
        int    qid;
        key_t  msgkey;
        struct mymsgbuf {
            long   mtype;      /* Message type */
            int    request;    /* Data element #1 */
            double salary;     /* Data element #2 */
        } msg;

        /* Generate our IPC key value */
        msgkey = ftok(".", 'm');

        /* Open/create the queue */
        if(( qid = open_queue( msgkey)) == -1) {
            perror("open_queue");
            exit(1);
        }

        /* Load up the message with arbitrary test data */
        msg.mtype   = 1;        /* Message type must be a positive number! */
        msg.request = 1;        /* Data element #1 */
        msg.salary  = 1000.00;  /* Data element #2 (my yearly salary!) */

        /* Bombs away! */
        if((send_message( qid, &msg )) == -1) {
            perror("send_message");
            exit(1);
        }
    }

After creating/opening our message queue, we proceed to load up the message buffer with test data (note the lack of character data to illustrate our point about sending binary information). A quick call to send_message merrily distributes our message out to the message queue.

Now that we have a message on our queue, try the ipcs command to view the status of your queue. Now let's turn the discussion to actually retrieving the message from the queue. To do this, you use the msgrcv() system call:
  SYSTEM CALL: msgrcv();

  PROTOTYPE: int msgrcv ( int msqid, struct msgbuf *msgp, int msgsz, long mtype, int msgflg );
    RETURNS: Number of bytes copied into message buffer
             -1 on error: errno = E2BIG  (Message length is greater than msgsz, no MSG_NOERROR)
                                  EACCES (No read permission)
                                  EFAULT (Address pointed to by msgp is invalid)
                                  EIDRM  (Queue was removed during retrieval)
                                  EINTR  (Interrupted by arriving signal)
                                  EINVAL (msgqid invalid, or msgsz less than 0)
                                  ENOMSG (IPC_NOWAIT asserted, and no message exists
                                          in the queue to satisfy the request)
  NOTES:
Obviously, the first argument is used to specify the queue to be used during the message retrieval process (it should have been returned by an earlier call to msgget). The second argument (msgp) represents the address of a message buffer variable in which to store the retrieved message. The third argument (msgsz) represents the size of the message buffer structure, excluding the length of the mtype member. Once again, this can easily be calculated as:

    msgsz = sizeof(struct mymsgbuf) - sizeof(long);

The fourth argument (mtype) specifies the type of message to retrieve from the queue. The kernel will search the queue for the oldest message having a matching type, and will return a copy of it in the address pointed to by the msgp argument. One special case exists. If the mtype argument is passed with a value of zero, then the oldest message on the queue is returned, regardless of type.

If IPC_NOWAIT is passed as a flag, and no messages are available, the call fails with ENOMSG. Otherwise, the calling process blocks until a message arrives in the queue that satisfies the msgrcv() parameters. If the queue is deleted while a client is waiting on a message, EIDRM is returned. EINTR is returned if a signal is caught while the process is blocked, waiting for a message to arrive.

Let's examine a quick wrapper function for retrieving a message from our queue:

    int read_message( int qid, long type, struct mymsgbuf *qbuf )
    {
        int result, length;

        /* The length is essentially the size of the structure minus sizeof(mtype) */
        length = sizeof(struct mymsgbuf) - sizeof(long);

        if((result = msgrcv( qid, qbuf, length, type, 0)) == -1)
        {
            return(-1);
        }

        return(result);
    }

After successfully retrieving a message from the queue, the message entry within the queue is destroyed.

The MSG_NOERROR bit in the msgflg argument provides some additional capabilities. If the size of the physical message data is greater than msgsz, and MSG_NOERROR is asserted, then the message is truncated, and only msgsz bytes are returned. Normally, the msgrcv() system call returns -1 (E2BIG), and the message will remain on the queue for later retrieval. This behavior can be used to create another wrapper function, which will allow us to peek inside the queue, to see if a message has arrived that satisfies our request:
    int peek_message( int qid, long type )
    {
        int result;

        if((result = msgrcv( qid, NULL, 0, type, IPC_NOWAIT)) == -1)
        {
            if(errno == E2BIG)
                return(TRUE);
        }

        return(FALSE);
    }

Above, you will notice the lack of a buffer address and a length. In this particular case, we want the call to fail. However, we check for the return of E2BIG, which indicates that a message does exist which matches our requested type. The wrapper function returns TRUE on success, FALSE otherwise. Also note the use of the IPC_NOWAIT flag, which prevents the blocking behavior described earlier. (The function assumes that errno.h is included and that TRUE and FALSE are defined by the program.)

SYSTEM CALL: msgctl()

Through the development of the wrapper functions presented earlier, you now have a simple, somewhat elegant approach to creating and utilizing message queues in your applications. Now, we will turn the discussion to directly manipulating the internal structures associated with a given message queue.

To perform control operations on a message queue, you use the msgctl() system call.
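Looking back at the msgrcv() rules described earlier (selection by type, the special type 0, and ENOMSG under IPC_NOWAIT), a small experiment may make them concrete. This is a sketch under stated assumptions: struct textmsg, put() and receive_by_type_demo() are hypothetical names, and the queue is created with IPC_PRIVATE so that it cannot collide with anything else on the system.

```c
#include <errno.h>
#include <string.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/msg.h>

struct textmsg {                /* hypothetical message layout */
    long mtype;
    char mtext[32];
};

/* Send a short text message of the given type; 0 on success. */
static int put(int qid, long type, const char *text)
{
    struct textmsg m;

    m.mtype = type;
    strncpy(m.mtext, text, sizeof(m.mtext) - 1);
    m.mtext[sizeof(m.mtext) - 1] = '\0';
    return msgsnd(qid, &m, strlen(m.mtext) + 1, 0);
}

/* Hypothetical demo: returns 0 if msgrcv() behaves as described. */
int receive_by_type_demo(void)
{
    struct textmsg m;
    int qid, ok = 0;

    /* IPC_PRIVATE always creates a fresh queue */
    if ((qid = msgget(IPC_PRIVATE, 0600)) == -1)
        return -1;

    if (put(qid, 1, "first") == 0 && put(qid, 2, "second") == 0) {
        /* Ask for type 2: the older type 1 message is skipped */
        ok = msgrcv(qid, &m, sizeof(m.mtext), 2, IPC_NOWAIT) != -1
             && strcmp(m.mtext, "second") == 0;
        /* Type 0: the oldest remaining message, whatever its type */
        ok = ok && msgrcv(qid, &m, sizeof(m.mtext), 0, IPC_NOWAIT) != -1
                && m.mtype == 1;
        /* The queue is empty now: -1 with errno set to ENOMSG */
        ok = ok && msgrcv(qid, &m, sizeof(m.mtext), 0, IPC_NOWAIT) == -1
                && errno == ENOMSG;
    }
    msgctl(qid, IPC_RMID, NULL);    /* remove the experiment queue */
    return ok ? 0 : -1;
}
```

Every receive passes IPC_NOWAIT so that a mistaken expectation shows up as a failed check instead of a process blocked forever.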
  SYSTEM CALL: msgctl();

  PROTOTYPE: int msgctl ( int msgqid, int cmd, struct msqid_ds *buf );
    RETURNS: 0 on success
             -1 on error: errno = EACCES (No read permission and cmd is IPC_STAT)
                                  EFAULT (Address pointed to by buf is invalid with
                                          IPC_STAT commands)
                                  EIDRM  (Queue was removed during retrieval)
                                  EINVAL (msgqid invalid, or cmd invalid)
                                  EPERM  (IPC_SET or IPC_RMID command was issued, but
                                          calling process does not have write (alter)
                                          access to the queue)
  NOTES:

Now, common sense dictates that direct manipulation of the internal kernel data structures could lead to some late night fun. Unfortunately, the resulting duties on the part of the programmer could only be classified as fun if you like trashing the IPC subsystem. By using msgctl() with a selective set of commands, you have the ability to manipulate those items which are less likely to cause grief. Let's look at these commands:

IPC_STAT
    Retrieves the msqid_ds structure for a queue, and stores it in the address of the buf argument.
IPC_SET
    Sets the value of the ipc_perm member of the msqid_ds structure for a queue. Takes the values from the buf argument.
IPC_RMID
    Removes the queue from the kernel.
Recall our discussion about the internal data structure for message queues (msqid_ds). The kernel maintains an instance of this structure for each queue which exists in the system. By using the IPC_STAT command, we can retrieve a copy of this structure for examination. Let's look at a quick wrapper function that will retrieve the internal structure and copy it into a passed address:

    int get_queue_ds( int qid, struct msqid_ds *qbuf )
    {
        if( msgctl( qid, IPC_STAT, qbuf) == -1)
        {
            return(-1);
        }

        return(0);
    }

If we are unable to copy the internal buffer, -1 is returned to the calling function. If all went well, a value of 0 (zero) is returned, and the passed buffer should contain a copy of the internal data structure for the message queue represented by the passed queue identifier (qid).

Now that we have a copy of the internal data structure for a queue, what attributes can be manipulated, and how can we alter them? The only modifiable item in the data structure is the ipc_perm member. This contains the permissions for the queue, as well as information about the owner and creator. However, the only members of the ipc_perm structure that are modifiable are mode, uid, and gid. You can change the owner's user id, the owner's group id, and the access permissions for the queue.

Let's create a wrapper function designed to change the mode of a queue. The mode must be passed in as a character array (e.g. "660").

    int change_queue_mode( int qid, char *mode )
    {
        struct msqid_ds tmpbuf;

        /* Retrieve a current copy of the internal data structure */
        if( get_queue_ds( qid, &tmpbuf) == -1)
        {
            return(-1);
        }

        /* Change the permissions using an old trick */
        sscanf(mode, "%ho", &tmpbuf.msg_perm.mode);

        /* Update the internal data structure */
        if( msgctl( qid, IPC_SET, &tmpbuf) == -1)
        {
            return(-1);
        }

        return(0);
    }

We retrieve a current copy of the internal data structure by a quick call to our get_queue_ds wrapper function. We then make a call to sscanf() to alter the mode member of the associated msg_perm structure.
No changes take place, however, until
the new copy is used to update the internal version. This duty is performed by a call to msgctl() using the IPC_SET command.

BE CAREFUL! It is possible to alter the permissions on a queue, and in doing so, inadvertently lock yourself out! Remember, these IPC objects don't go away unless they are properly removed, or the system is rebooted. So, just because you can't see a queue with ipcs doesn't mean that it isn't there.

To illustrate this point, a somewhat humorous anecdote seems to be in order. While teaching a class on UNIX internals at the University of South Florida, I ran into a rather embarrassing stumbling block. I had dialed into their lab server the night before, in order to compile and test the labwork to be used in the week-long class. In the process of my testing, I realized that I had made a typo in the code used to alter the permissions on a message queue. I created a simple message queue, and tested the sending and receiving capabilities with no incident. However, when I attempted to change the mode of the queue from 660 to 600, the resulting action was that I was locked out of my own queue! As a result, I could not test the message queue labwork in the same area of my source directory. Since I used the ftok() function to create the IPC key, I was trying to access a queue that I did not have proper permissions for. I ended up contacting the local system administrator on the morning of the class, only to spend an hour explaining to him what a message queue was, and why I needed him to run the ipcrm command for me. grrrr.

After successfully retrieving a message from a queue, the message is removed. However, as mentioned earlier, IPC objects remain in the system unless explicitly removed, or the system is rebooted. Therefore, our message queue still exists within the kernel, available for use long after a single message disappears.
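One way to reduce the risk of the lock-out described above is to validate the mode string before handing it to IPC_SET. The sketch below is an assumption-laden variant of the mode wrapper, not the guide's own code: parse_mode() and change_queue_mode_checked() are hypothetical names, strtoul() replaces the sscanf() trick, and the rule "the owner must keep read and write access" is simply one possible policy.

```c
#include <errno.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/msg.h>

/* Hypothetical helper: parse an octal mode string such as "660".
 * Returns the mode, or -1 if the string is not a valid 9-bit mode. */
long parse_mode(const char *text)
{
    char *end;
    unsigned long mode;

    errno = 0;
    mode = strtoul(text, &end, 8);
    if (errno != 0 || end == text || *end != '\0' || mode > 0777)
        return -1;
    return (long)mode;
}

/* Hypothetical variant of change_queue_mode(): refuses any mode that
 * would take read or write access away from the owner. */
int change_queue_mode_checked(int qid, const char *text)
{
    struct msqid_ds ds;
    long mode = parse_mode(text);

    if (mode == -1 || (mode & 0600) != 0600)
        return -1;                      /* rejected before any change */
    if (msgctl(qid, IPC_STAT, &ds) == -1)
        return -1;

    /* Replace only the nine permission bits */
    ds.msg_perm.mode = (ds.msg_perm.mode & ~0777) | mode;
    return msgctl(qid, IPC_SET, &ds);
}
```

With this policy a request such as "060" is refused outright, while "600" or "660" goes through. Assigning the parsed value also avoids depending on the exact integer width of the mode member, which differs between systems.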
To complete the life cycle of a message queue, it should be removed with a call to msgctl(), using the IPC_RMID command:

    int remove_queue( int qid )
    {
        if( msgctl( qid, IPC_RMID, 0) == -1)
        {
            return(-1);
        }

        return(0);
    }

This wrapper function returns 0 if the queue was removed without incident, else a value of -1. The removal of the queue is atomic in nature, and any subsequent accesses to the queue for whatever purpose will fail miserably.

msgtool: An interactive message queue manipulator

Few can deny the immediate benefit of having accurate technical information readily available. Such materials provide a tremendous mechanism for learning and exploring new areas. On the same note, having real world examples to accompany any technical information will speed up and reinforce the learning process.

Until now, the only useful examples which have been presented were the wrapper functions for manipulating message queues. While they are extremely useful, they have not been presented in a manner which would warrant further study and experimentation. To remedy this, you will be presented with msgtool, an interactive command line utility for manipulating IPC message queues. While it certainly functions as an adequate tool for education reinforcement, it can be applied directly to real world assignments, by providing message queue functionality via standard shell scripts.
Background

The msgtool program relies on command line arguments to determine its behavior. This is what makes it especially useful when called from a shell script. All of the capabilities are provided, from creating, sending, and retrieving, to changing the permissions and finally removing a queue. Currently, it uses a character array for data, allowing you to send textual messages. Changing it to facilitate additional data types is left as an exercise to the reader.

Command Line Syntax

Sending Messages
    msgtool s (type) "text"
Retrieving Messages
    msgtool r (type)
Changing the Permissions (mode)
    msgtool m (mode)
Deleting a Queue
    msgtool d

Examples

    msgtool s 1 test
    msgtool s 5 test
    msgtool s 1 "This is a test"
    msgtool r 1
    msgtool d
    msgtool m 660
The Source

The following is the source code for the msgtool facility. It should compile clean on any recent (decent) kernel revision that supports System V IPC. Be sure to enable System V IPC in your kernel when doing a rebuild!

On a side note, this utility will create a message queue if it does not exist, no matter what type of action is requested.

NOTE: Since this tool uses the ftok() function to generate IPC key values, you may encounter directory conflicts. If you change directories at any point in your script, it probably won't work. Another solution would be to hardcode a more complete path into msgtool, such as /tmp/msgtool, or possibly even allow the path to be passed on the command line, along with the operational arguments.

    /*****************************************************************************
     Excerpt from "Linux Programmer's Guide - Chapter 6"
     (C)opyright 1994-1995, Scott Burkett
     *****************************************************************************
     MODULE: msgtool.c
     *****************************************************************************
     A command line tool for tinkering with SysV style Message Queues
     *****************************************************************************/

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <ctype.h>
    #include <sys/types.h>
    #include <sys/ipc.h>
    #include <sys/msg.h>

    #define MAX_SEND_SIZE 80

    struct mymsgbuf {
        long mtype;
        char mtext[MAX_SEND_SIZE];
    };

    void send_message(int qid, struct mymsgbuf *qbuf, long type, char *text);
    void read_message(int qid, struct mymsgbuf *qbuf, long type);
    void remove_queue(int qid);
    void change_queue_mode(int qid, char *mode);
    void usage(void);

    int main(int argc, char *argv[])
    {
        key_t key;
        int   msgqueue_id;
        struct mymsgbuf qbuf;

        if(argc == 1)
            usage();

        /* Create unique key via call to ftok() */
        key = ftok(".", 'm');

        /* Open the queue - create if necessary */
        if((msgqueue_id = msgget(key, IPC_CREAT|0660)) == -1) {
            perror("msgget");
            exit(1);
        }

        switch(tolower(argv[1][0]))
        {
            case 's': send_message(msgqueue_id, (struct mymsgbuf *)&qbuf,
                                   atol(argv[2]), argv[3]);
                      break;
            case 'r': read_message(msgqueue_id, &qbuf, atol(argv[2]));
                      break;
            case 'd': remove_queue(msgqueue_id);
                      break;
            case 'm': change_queue_mode(msgqueue_id, argv[2]);
                      break;
             default: usage();
        }

        return(0);
    }
    void send_message(int qid, struct mymsgbuf *qbuf, long type, char *text)
    {
        /* Send a message to the queue */
        printf("Sending a message ...\n");
        qbuf->mtype = type;

        /* Bounded copy: mtext holds at most MAX_SEND_SIZE bytes */
        strncpy(qbuf->mtext, text, MAX_SEND_SIZE - 1);
        qbuf->mtext[MAX_SEND_SIZE - 1] = '\0';

        if((msgsnd(qid, (struct msgbuf *)qbuf,
                   strlen(qbuf->mtext)+1, 0)) == -1)
        {
            perror("msgsnd");
            exit(1);
        }
    }

    void read_message(int qid, struct mymsgbuf *qbuf, long type)
    {
        /* Read a message from the queue */
        printf("Reading a message ...\n");
        qbuf->mtype = type;

        if(msgrcv(qid, (struct msgbuf *)qbuf, MAX_SEND_SIZE, type, 0) == -1)
        {
            perror("msgrcv");
            exit(1);
        }

        printf("Type: %ld Text: %s\n", qbuf->mtype, qbuf->mtext);
    }

    void remove_queue(int qid)
    {
        /* Remove the queue */
        msgctl(qid, IPC_RMID, 0);
    }

    void change_queue_mode(int qid, char *mode)
    {
        struct msqid_ds myqueue_ds;

        /* Get current info */
        msgctl(qid, IPC_STAT, &myqueue_ds);

        /* Convert and load the mode */
        sscanf(mode, "%ho", &myqueue_ds.msg_perm.mode);

        /* Update the mode */
        msgctl(qid, IPC_SET, &myqueue_ds);
    }

    void usage(void)
    {
        fprintf(stderr, "msgtool - A utility for tinkering with msg queues\n");
        fprintf(stderr, "\nUSAGE: msgtool (s)end <type> <messagetext>\n");
        fprintf(stderr, "               (r)ecv <type>\n");
        fprintf(stderr, "               (d)elete\n");
        fprintf(stderr, "               (m)ode <octal mode>\n");
        exit(1);
    }
6.4.3
Semaphores
Basic Concepts

Semaphores can best be described as counters used to control access to shared resources by multiple processes. They are most often used as a locking mechanism to prevent processes from accessing a particular resource while another process is performing operations on it. Semaphores are often dubbed the most difficult to grasp of the three types of System V IPC objects. In order to fully understand semaphores, we'll discuss them briefly before engaging any system calls and operational theory.

The name semaphore is actually an old railroad term, referring to the crossroad arms that prevent cars from crossing the tracks at intersections. The same can be said about a simple semaphore set. If the semaphore is on (the arms are up), then a resource is available (cars may cross the tracks). However, if the semaphore is off (the arms are down), then resources are not available (the cars must wait).

While this simple example may stand to introduce the concept, it is important to realize that semaphores are actually implemented as sets, rather than as single entities. Of course, a given semaphore set might only have one semaphore, as in our railroad example.

Perhaps another approach to the concept of semaphores is to think of them as resource counters. Let's apply this concept to another real world scenario. Consider a print spooler, capable of handling multiple printers, with each printer handling multiple print requests. A hypothetical print spool manager will utilize semaphore sets to monitor access to each printer.

Assume that in our corporate print room, we have 5 printers online. Our print spool manager allocates a semaphore set with 5 semaphores in it, one for each printer on the system. Since each printer is only physically capable of printing one job at a time, each of the five semaphores in our set will be initialized to a value of 1 (one), meaning that they are all online, and accepting requests.

John sends a print request to the spooler.
The print manager looks at the semaphore set, and finds the first semaphore which has a value of one. Before sending John's request to the physical device, the print manager decrements the semaphore for the corresponding printer by one (that is, it adds -1 to it). Now, that semaphore's value is zero. In the world of System V semaphores, a value of zero represents 100% resource utilization on that semaphore. In our example, no other request can be sent to that printer until its semaphore is no longer equal to zero.

When John's print job has completed, the print manager increments the value of the semaphore which corresponds to the printer. Its value is now back up to one (1), which means it is available again. Naturally, if all 5 semaphores had a value of zero, that would indicate that they are all busy printing requests, and that no printers are available.

Although this was a simple example, please do not be confused by the initial value of one (1) which was assigned to each semaphore in the set. Semaphores, when thought of as resource counters, may be initialized to any positive integer value, and are not limited to either being zero or one. If it were possible for each of our five printers to handle 10 print jobs at a time, we could initialize each of our semaphores to 10, decrementing by one for every new job, and incrementing by one whenever a print job was finished. As you will discover in the next chapter, semaphores have a close working relationship with shared
memory segments, acting as a watchdog to prevent multiple writes to the same memory segment.

Before delving into the associated system calls, let's take a brief tour through the various internal data structures utilized during semaphore operations.

Internal Data Structures

Let's briefly look at the data structures maintained by the kernel for semaphore sets.

Kernel semid_ds structure

As with message queues, the kernel maintains a special internal data structure for each semaphore set which exists within its addressing space. This structure is of type semid_ds, and is defined in linux/sem.h as follows:

    /* One semid data structure for each set of semaphores in the system. */
    struct semid_ds {
        struct ipc_perm sem_perm;       /* permissions .. see ipc.h */
        time_t          sem_otime;      /* last semop time */
        time_t          sem_ctime;      /* last change time */
        struct sem      *sem_base;      /* ptr to first semaphore in array */
        struct wait_queue *eventn;
        struct wait_queue *eventz;
        struct sem_undo *undo;          /* undo requests on this array */
        ushort          sem_nsems;      /* no. of semaphores in array */
    };

As with message queues, operations on this structure are performed by a special system call, and it should not be tinkered with directly. Here are descriptions of the more pertinent fields:

sem_perm
    This is an instance of the ipc_perm structure, which is defined for us in linux/ipc.h. This holds the permission information for the semaphore set, including the access permissions, and information about the creator of the set (uid, etc).
sem_otime
    Time of the last semop() operation (more on this in a moment)
sem_ctime
    Time of the last change to this structure (mode change, etc)
sem_base
    Pointer to the first semaphore in the array (see next structure)
undo
    The undo requests on this array (more on this in a moment)
sem_nsems
    Number of semaphores in the semaphore set (the array)
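User programs never touch semid_ds directly, but a copy of it can be requested through semctl() with the IPC_STAT command. The sketch below, which is an illustration and not the guide's own code, reads sem_nsems for a freshly created set of five semaphores. The names set_size() and set_size_demo() are hypothetical, and the union semun declaration assumes a Linux system, where the calling program is expected to define it.

```c
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>

/* Assumption: on Linux the caller must define union semun itself. */
union semun {
    int val;
    struct semid_ds *buf;
    unsigned short *array;
};

/* Hypothetical helper: number of semaphores in a set, or -1 on error. */
long set_size(int semid)
{
    struct semid_ds ds;
    union semun arg;

    arg.buf = &ds;
    if (semctl(semid, 0, IPC_STAT, arg) == -1)
        return -1;
    return (long)ds.sem_nsems;
}

/* Hypothetical demo: one semaphore per printer in the print room. */
int set_size_demo(void)
{
    int semid, ok;

    if ((semid = semget(IPC_PRIVATE, 5, 0600)) == -1)
        return -1;
    ok = set_size(semid) == 5;
    semctl(semid, 0, IPC_RMID);     /* remove the experiment set */
    return ok ? 0 : -1;
}
```

The same copy of the structure also carries sem_otime and sem_ctime, so the helper is easy to extend when the timestamps are of interest.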
Kernel sem structure

In the semid_ds structure, there exists a pointer to the base of the semaphore array itself. Each array member is of the sem structure type. It is also defined in linux/sem.h:

    /* One semaphore structure for each semaphore in the system. */
    struct sem {
        short  sempid;      /* pid of last operation */
        ushort semval;      /* current value */
        ushort semncnt;     /* num procs awaiting increase in semval */
        ushort semzcnt;     /* num procs awaiting semval = 0 */
    };

sempid
    The PID (process ID) that performed the last operation
semval
    The current value of the semaphore
semncnt
    Number of processes waiting for resources to become available
semzcnt
    Number of processes waiting for 100% resource utilization

SYSTEM CALL: semget()

In order to create a new semaphore set, or access an existing set, the semget() system call is used.

  SYSTEM CALL: semget();

  PROTOTYPE: int semget ( key_t key, int nsems, int semflg );
    RETURNS: semaphore set IPC identifier on success
             -1 on error: errno = EACCES (permission denied)
                                  EEXIST (set exists, cannot create (IPC_EXCL))
                                  EIDRM  (set is marked for deletion)
                                  ENOENT (set does not exist, no IPC_CREAT was used)
                                  ENOMEM (Not enough memory to create new set)
                                  ENOSPC (Maximum set limit exceeded)
  NOTES:

The first argument to semget() is the key value (in our case returned by a call to ftok()). This key value is then compared to the key values that exist within the kernel for other semaphore sets. At that point, the open or access operation is dependent upon the contents of the semflg argument.

IPC_CREAT
    Create the semaphore set if it doesn't already exist in the kernel.
IPC_EXCL
    When used with IPC_CREAT, fail if the semaphore set already exists.
If IPC_CREAT is used alone, semget() either returns the semaphore set identifier for a newly created set, or returns the identifier for a set which exists with the same key value. If IPC_EXCL is used along with IPC_CREAT, then either a new set is created, or if the set exists, the call fails with -1. IPC_EXCL is useless by itself, but when combined with IPC_CREAT, it can be used as a facility to guarantee that no existing semaphore set is opened for access.

As with the other forms of System V IPC, an optional octal mode may be OR'd into the mask to form the permissions on the semaphore set.

The nsems argument specifies the number of semaphores that should be created in a new set. This represents the number of printers in our fictional print room described earlier. The maximum number of semaphores in a set is defined in linux/sem.h as:

    #define SEMMSL 32 /* <=512 max num of semaphores per id */
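The role of nsems when a set already exists can be checked with a short experiment. This is a sketch of Linux behavior under stated assumptions: nsems_demo() is a hypothetical name, and the pid-derived key is used only to make a collision with an existing set unlikely.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>
#include <unistd.h>

/* Hypothetical demo: returns 0 if semget() treats nsems as expected. */
int nsems_demo(void)
{
    /* Assumption: a pid-derived key is unlikely to be in use already */
    key_t key = (key_t)(0x4C530000 + (getpid() & 0xFFFF));
    int created, opened, ok;

    /* Create a set of five semaphores (five printers) */
    if ((created = semget(key, 5, IPC_CREAT | IPC_EXCL | 0660)) == -1)
        return -1;

    /* Opening the existing set: nsems may simply be 0 */
    opened = semget(key, 0, 0);
    ok = (opened == created);

    /* Asking for more semaphores than the set holds fails with EINVAL */
    ok = ok && semget(key, 6, 0) == -1 && errno == EINVAL;

    semctl(created, 0, IPC_RMID);   /* remove the experiment set */
    return ok ? 0 : -1;
}
```

In other words, nsems never resizes an existing set; it only matters at creation time, and an oversized request on an existing set is rejected.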
Note that the nsems argument does not resize an existing set: when you are explicitly opening an existing set it may simply be 0 (on Linux, a value larger than the size of the existing set makes the call fail with EINVAL).

Let's create a wrapper function for opening or creating semaphore sets:

    int open_semaphore_set( key_t keyval, int numsems )
    {
        int sid;

        if ( ! numsems )
            return(-1);

        if((sid = semget( keyval, numsems, IPC_CREAT | 0660 )) == -1)
        {
            return(-1);
        }

        return(sid);
    }

Note the use of the explicit permissions of 0660. This small function either returns a semaphore set identifier (int), or -1 on error. The key value must be passed to it, as well as the number of semaphores to allocate space for if creating a new set. In the example presented at the end of this section, notice the use of the IPC_EXCL flag to determine whether or not the semaphore set exists.

SYSTEM CALL: semop()
  SYSTEM CALL: semop();

  PROTOTYPE: int semop ( int semid, struct sembuf *sops, unsigned nsops);
    RETURNS: 0 on success (all operations performed)
             -1 on error: errno = E2BIG  (nsops greater than max number of ops allowed)
                                  EACCES (permission denied)
                                  EAGAIN (IPC_NOWAIT asserted, operation could not go through)
                                  EFAULT (invalid address pointed to by sops argument)
                                  EIDRM  (semaphore set was removed)
                                  EINTR  (Signal received while sleeping)
                                  EINVAL (set doesn't exist, or semid is invalid)
                                  ENOMEM (SEM_UNDO asserted, not enough memory to create
                                          the undo structure necessary)
                                  ERANGE (semaphore value out of range)
  NOTES:
The first argument to semop() is the semaphore set identifier (in our case returned by a call to semget). The second argument (sops) is a pointer to an array of operations to be performed on the semaphore set, while the third argument (nsops) is the number of operations in that array.

The sops argument points to an array of type sembuf. This structure is declared in linux/sem.h as follows:

    /* semop system call takes an array of these */
    struct sembuf {
        ushort sem_num;     /* semaphore index in array */
        short  sem_op;      /* semaphore operation */
        short  sem_flg;     /* operation flags */
    };

sem_num
    The number of the semaphore you wish to deal with
sem_op
    The operation to perform (positive, negative, or zero)
sem_flg
    Operational flags

If sem_op is negative, then its absolute value is subtracted from the semaphore. This correlates with obtaining resources that the semaphore controls or monitors access to. If IPC_NOWAIT is not specified, then the calling process sleeps until the requested amount of resources is available in the semaphore (another process has released some).

If sem_op is positive, then its value is added to the semaphore. This correlates with returning resources back to the application's semaphore set. Resources should always be returned to a semaphore set when they are no longer needed!

Finally, if sem_op is zero (0), then the calling process will sleep until the semaphore's value is 0. This correlates to waiting for a semaphore to reach 100% utilization. A good example of this would be a daemon running with superuser permissions that could dynamically adjust the size of the semaphore set if it reaches full utilization.

In order to explain the semop call, let's revisit our print room scenario. Let's assume only one printer, capable of only one job at a time. We create a semaphore set with only one semaphore in it (only one printer), and initialize that one semaphore to a value of one (only one job at a time).

Each time we desire to send a job to this printer, we need to first make sure that the resource is available. We do this by attempting to obtain one unit from the semaphore.
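The single-printer scenario can be sketched end to end as follows. This is a minimal illustration under stated assumptions, not a definitive implementation: printer_create(), printer_try_lock() and printer_unlock() are hypothetical names, the semaphore is given its initial value with the standard SETVAL command of semctl(), and union semun is declared by the program, as Linux expects.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>

/* Assumption: on Linux the caller must define union semun itself. */
union semun {
    int val;
    struct semid_ds *buf;
    unsigned short *array;
};

/* One printer, one job at a time: a set holding a single semaphore
 * initialized to 1. Returns the set identifier, or -1 on error. */
int printer_create(void)
{
    union semun arg;
    int semid = semget(IPC_PRIVATE, 1, 0600);

    if (semid == -1)
        return -1;
    arg.val = 1;                            /* printer is available */
    if (semctl(semid, 0, SETVAL, arg) == -1) {
        semctl(semid, 0, IPC_RMID);
        return -1;
    }
    return semid;
}

/* Try to obtain the printer without blocking. Returns 0 on success,
 * or -1 with errno set to EAGAIN if a job is already printing. */
int printer_try_lock(int semid)
{
    struct sembuf op = { .sem_num = 0, .sem_op = -1, .sem_flg = IPC_NOWAIT };

    return semop(semid, &op, 1);
}

/* Return the printer to the set so that others may use it. */
int printer_unlock(int semid)
{
    struct sembuf op = { .sem_num = 0, .sem_op = 1, .sem_flg = 0 };

    return semop(semid, &op, 1);
}
```

A second printer_try_lock() on a busy printer fails at once with EAGAIN; dropping IPC_NOWAIT from the lock operation would make the caller sleep until the printer is released instead.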
Lets load up a sembuf array to perform the operation: struct sembuf sem_lock = { 0, -1, IPC_NOWAIT }; Translation of the above initialized structure dictates that a value of -1 will be added to semaphore number 0 in the semaphore set. In other words, one unit of resources will be obtained from the only semaphore in our set (0th member). IPC NOWAIT is specied, so the call will either go through immediately, or fail if another print job is currently printing. Here is an example of using this initialized sembuf structure with the semop system call: if((semop(sid, &sem_lock, 1) == -1) perror("semop"); call takes an array of these */ sem_num; sem_op; sem_flg; /* semaphore index in array */ /* semaphore operation */ /* operation flags */
The third argument (nsops) says that we are only performing one (1) operation (there is only one sembuf structure in our array of operations). The sid argument is the IPC identifier for our semaphore set.

When our print job has completed, we must return the resources back to the semaphore set, so that others may use the printer.

    struct sembuf sem_unlock = { 0, 1, IPC_NOWAIT };

Translation of the above initialized structure dictates that a value of 1 will be added to semaphore number 0 in the semaphore set. In other words, one unit of resources will be returned to the set.

SYSTEM CALL: semctl()

  SYSTEM CALL: semctl();

  PROTOTYPE: int semctl ( int semid, int semnum, int cmd, union semun arg );
    RETURNS: non-negative integer on success
             -1 on error: errno = EACCES (permission denied)
                                  EFAULT (invalid address pointed to by arg argument)
                                  EIDRM (semaphore set was removed)
                                  EINVAL (set doesn't exist, or semid is invalid)
                                  EPERM (EUID has no privileges for cmd in arg)
                                  ERANGE (semaphore value out of range)
  NOTES: Performs control operations on a semaphore set

The semctl system call is used to perform control operations on a semaphore set. This call is analogous to the msgctl system call which is used for operations on message queues. If you compare the argument lists of the two system calls, you will notice that the list for semctl varies slightly from that of msgctl. Recall that semaphores are actually implemented as sets, rather than as single entities. With semaphore operations, not only does the IPC identifier need to be passed, but the target semaphore within the set as well.

Both system calls utilize a cmd argument, for specification of the command to be performed on the IPC object. The remaining difference lies in the final argument to both calls. In msgctl, the final argument represents a copy of the internal data structure used by the kernel. Recall that we used this structure to retrieve internal information about a message queue, as well as to set or change permissions and ownership of the queue. With semaphores, additional operational commands are supported, thus requiring a more complex data type as the final argument. The use of a union confuses many neophyte semaphore programmers. We will dissect this structure carefully, in an effort to prevent any confusion.

The first argument to semctl() is the semaphore set identifier (in our case returned by a call to semget). The second argument (semnum) is the semaphore number that an operation is targeted towards. In essence, this can be thought of as an index into the semaphore set, with the first semaphore (or only one) in the set being represented by a value of zero (0).

The cmd argument represents the command to be performed against the set. As you can see, the familiar IPC_STAT/IPC_SET commands are present, along with a wealth of additional commands specific to semaphore sets:

IPC_STAT
    Retrieves the semid_ds structure for a set, and stores it in the address of the buf argument in the semun union.
IPC_SET
    Sets the value of the ipc_perm member of the semid_ds structure for a set. Takes the values from the buf argument of the semun union.
IPC_RMID
    Removes the set from the kernel.
GETALL
    Used to obtain the values of all semaphores in a set. The integer values are stored in an array of unsigned short integers pointed to by the array member of the union.
GETNCNT
    Returns the number of processes currently waiting for resources.
GETPID
    Returns the PID of the process which performed the last semop call.
GETVAL
    Returns the value of a single semaphore within the set.
GETZCNT
    Returns the number of processes currently waiting for 100% resource utilization.
SETALL
    Sets all semaphore values within a set to the matching values contained in the array member of the union.
SETVAL
    Sets the value of an individual semaphore within the set to the val member of the union.

The arg argument represents an instance of type semun. This particular union is declared in linux/sem.h as follows:

    /* arg for semctl system calls. */
    union semun {
            int val;                /* value for SETVAL */
            struct semid_ds *buf;   /* buffer for IPC_STAT & IPC_SET */
            ushort *array;          /* array for GETALL & SETALL */
            struct seminfo *__buf;  /* buffer for IPC_INFO */
            void *__pad;
    };

val
    Used when the SETVAL command is performed. Specifies the value to set the semaphore to.
buf
    Used in the IPC_STAT/IPC_SET commands. Represents a copy of the internal semaphore data structure used in the kernel.
array
    A pointer used in the GETALL/SETALL commands. Should point to an array of unsigned short values to be used in setting or retrieving all semaphore values in a set.
The remaining members, __buf and __pad, are used internally in the semaphore code within the kernel, and are of little or no use to the application developer. As a matter of fact, these two members are specific to the Linux operating system, and are not found in other UNIX implementations.

Since this particular system call is arguably the most difficult to grasp of all the System V IPC calls, we'll examine multiple examples of it in action.

The following snippet returns the value of the passed semaphore. The final argument (the union) is ignored when the GETVAL command is used:

    int get_sem_val( int sid, int semnum )
    {
            return( semctl(sid, semnum, GETVAL, 0));
    }

To revisit the printer example, let's say the status of all five printers was required:

    #define MAX_PRINTERS 5

    void printer_usage( int sid )
    {
            int x;

            for(x=0; x<MAX_PRINTERS; x++)
                    printf("Printer %d: %d\n", x, get_sem_val( sid, x ));
    }

Consider the following function, which could be used to initialize a new semaphore value:

    void init_semaphore( int sid, int semnum, int initval)
    {
            union semun semopts;

            semopts.val = initval;
            semctl( sid, semnum, SETVAL, semopts);
    }

Note that the final argument of semctl is a copy of the union, rather than a pointer to it. While we're on the subject of the union as an argument, allow me to demonstrate a rather common mistake when using this system call.

Recall from the msgtool project that the IPC_STAT and IPC_SET commands were used to alter permissions on the queue. While these commands are supported in the semaphore implementation, their usage is a bit different, as the internal data structure is retrieved and copied from a member of the union, rather than as a single entity. Can you locate the bug in this code?

    /* Required permissions should be passed in as text (ex: "660") */

    void changemode(int sid, char *mode)
    {
            int rc;
            unsigned int newmode;
            struct semid_ds mysemds;
            union semun semopts;

            /* Get current values for internal data structure */
            if((rc = semctl(sid, 0, IPC_STAT, semopts)) == -1)
            {
                    perror("semctl");
                    exit(1);
            }

            printf("Old permissions were %o\n", semopts.buf->sem_perm.mode);

            /* Change the permissions on the semaphore */
            sscanf(mode, "%o", &newmode);
            semopts.buf->sem_perm.mode = newmode;

            /* Update the internal data structure */
            semctl(sid, 0, IPC_SET, semopts);

            printf("Updated...\n");
    }

The code is attempting to make a local copy of the internal data structure for the set, modify the permissions, and IPC_SET them back to the kernel. However, the first call to semctl promptly returns EFAULT, or bad address for the last argument (the union!). In addition, if we hadn't checked for errors from that call, we would have gotten a memory fault. Why?

Recall that the IPC_SET/IPC_STAT commands use the buf member of the union, which is a pointer to a type semid_ds. Pointers are pointers are pointers are pointers! The buf member must point to some valid storage location in order for our function to work properly. Consider this revamped version:

    void changemode(int sid, char *mode)
    {
            int rc;
            unsigned int newmode;
            struct semid_ds mysemds;
            union semun semopts;

            /* Get current values for internal data structure */

            /* Point to our local copy first! */
            semopts.buf = &mysemds;

            /* Let's try this again! */
            if((rc = semctl(sid, 0, IPC_STAT, semopts)) == -1)
            {
                    perror("semctl");
                    exit(1);
            }

            printf("Old permissions were %o\n", semopts.buf->sem_perm.mode);

            /* Change the permissions on the semaphore */
            sscanf(mode, "%o", &newmode);
            semopts.buf->sem_perm.mode = newmode;

            /* Update the internal data structure */
            semctl(sid, 0, IPC_SET, semopts);

            printf("Updated...\n");
    }
semtool: An interactive semaphore manipulator

Background

The semtool program relies on command line arguments to determine its behavior. This is what makes it especially useful when called from a shell script. All of the capabilities are provided, from creating and manipulating, to changing the permissions and finally removing a semaphore set. It can be used to control shared resources via standard shell scripts.

Command Line Syntax

Creating a Semaphore Set
    semtool c (number of semaphores in set)
Locking a Semaphore
    semtool l (semaphore number to lock)
Unlocking a Semaphore
    semtool u (semaphore number to unlock)
Changing the Permissions (mode)
    semtool m (mode)
Deleting a Semaphore Set
    semtool d

Examples

    semtool c 5
    semtool l 0
    semtool u 0
    semtool m 660
    semtool d

The Source

    /*****************************************************************************
     Excerpt from "Linux Programmer's Guide - Chapter 6"
     (C)opyright 1994-1995, Scott Burkett
     *****************************************************************************
     MODULE: semtool.c
     *****************************************************************************
     A command line tool for tinkering with SysV style Semaphore Sets
     *****************************************************************************/
    #include <stdio.h>
    #include <ctype.h>
    #include <stdlib.h>
    #include <sys/types.h>
    #include <sys/ipc.h>
    #include <sys/sem.h>

    #define SEM_RESOURCE_MAX        1       /* Initial value of all semaphores */
    void opensem(int *sid, key_t key);
    void createsem(int *sid, key_t key, int members);
    void locksem(int sid, int member);
    void unlocksem(int sid, int member);
    void removesem(int sid);
    unsigned short get_member_count(int sid);
    int getval(int sid, int member);
    void dispval(int sid, int member);
    void changemode(int sid, char *mode);
    void usage(void);

    int main(int argc, char *argv[])
    {
            key_t key;
            int   semset_id;

            if(argc == 1)
                    usage();

            /* Create unique key via call to ftok() */
            key = ftok(".", 's');

            switch(tolower(argv[1][0]))
            {
                    case 'c': if(argc != 3)
                                      usage();
                              createsem(&semset_id, key, atoi(argv[2]));
                              break;
                    case 'l': if(argc != 3)
                                      usage();
                              opensem(&semset_id, key);
                              locksem(semset_id, atoi(argv[2]));
                              break;
                    case 'u': if(argc != 3)
                                      usage();
                              opensem(&semset_id, key);
                              unlocksem(semset_id, atoi(argv[2]));
                              break;
                    case 'd': opensem(&semset_id, key);
                              removesem(semset_id);
                              break;
                    case 'm': if(argc != 3)     /* the mode string is required */
                                      usage();
                              opensem(&semset_id, key);
                              changemode(semset_id, argv[2]);
                              break;
                     default: usage();
            }

            return(0);
    }

    void opensem(int *sid, key_t key)
    {
            /* Open the semaphore set - do not create! */
            if((*sid = semget(key, 0, 0666)) == -1)
            {
                    printf("Semaphore set does not exist!\n");
                    exit(1);
            }
    }

    void createsem(int *sid, key_t key, int members)
    {
            int cntr;
            union semun semopts;

            if(members > SEMMSL) {
                    printf("Sorry, max number of semaphores in a set is %d\n",
                            SEMMSL);
                    exit(1);
            }

            printf("Attempting to create new semaphore set with %d members\n",
                            members);

            if((*sid = semget(key, members, IPC_CREAT|IPC_EXCL|0666)) == -1)
            {
                    fprintf(stderr, "Semaphore set already exists!\n");
                    exit(1);
            }

            semopts.val = SEM_RESOURCE_MAX;

            /* Initialize all members (could be done with SETALL) */
            for(cntr=0; cntr<members; cntr++)
                    semctl(*sid, cntr, SETVAL, semopts);
    }

    void locksem(int sid, int member)
    {
            struct sembuf sem_lock={ 0, -1, IPC_NOWAIT};

            if( member<0 || member>(get_member_count(sid)-1))
            {
                    fprintf(stderr, "semaphore member %d out of range\n", member);
                    return;
            }

            /* Attempt to lock the semaphore set */
            if(!getval(sid, member))
            {
                    fprintf(stderr, "Semaphore resources exhausted (no lock)!\n");
                    exit(1);
            }

            sem_lock.sem_num = member;

            if((semop(sid, &sem_lock, 1)) == -1)
            {
                    fprintf(stderr, "Lock failed\n");
                    exit(1);
            }
            else
                    printf("Semaphore resources decremented by one (locked)\n");

            dispval(sid, member);
    }

    void unlocksem(int sid, int member)
    {
            struct sembuf sem_unlock={ member, 1, IPC_NOWAIT};
            int semval;

            if( member<0 || member>(get_member_count(sid)-1))
            {
                    fprintf(stderr, "semaphore member %d out of range\n", member);
                    return;
            }

            /* Is the semaphore set locked? */
            semval = getval(sid, member);
            if(semval == SEM_RESOURCE_MAX) {
                    fprintf(stderr, "Semaphore not locked!\n");
                    exit(1);
            }

            sem_unlock.sem_num = member;

            /* Attempt to unlock the semaphore set */
            if((semop(sid, &sem_unlock, 1)) == -1)
            {
                    fprintf(stderr, "Unlock failed\n");
                    exit(1);
            }
            else
                    printf("Semaphore resources incremented by one (unlocked)\n");

            dispval(sid, member);
    }
    void removesem(int sid)
    {
            semctl(sid, 0, IPC_RMID, 0);
            printf("Semaphore removed\n");
    }

    unsigned short get_member_count(int sid)
    {
            union semun semopts;
            struct semid_ds mysemds;

            semopts.buf = &mysemds;

            /* Fetch the kernel's copy of the structure before reading it */
            if(semctl(sid, 0, IPC_STAT, semopts) == -1)
            {
                    perror("semctl");
                    exit(1);
            }

            /* Return number of members in the semaphore set */
            return(semopts.buf->sem_nsems);
    }

    int getval(int sid, int member)
    {
            int semval;

            semval = semctl(sid, member, GETVAL, 0);
            return(semval);
    }

    void changemode(int sid, char *mode)
    {
            int rc;
            unsigned int newmode;
            union semun semopts;
            struct semid_ds mysemds;

            /* Get current values for internal data structure */
            semopts.buf = &mysemds;

            rc = semctl(sid, 0, IPC_STAT, semopts);

            if (rc == -1) {
                    perror("semctl");
                    exit(1);
            }

            printf("Old permissions were %o\n", semopts.buf->sem_perm.mode);

            /* Change the permissions on the semaphore */
            sscanf(mode, "%o", &newmode);
            semopts.buf->sem_perm.mode = newmode;

            /* Update the internal data structure */
            semctl(sid, 0, IPC_SET, semopts);

            printf("Updated...\n");
    }

    void dispval(int sid, int member)
    {
            int semval;

            semval = semctl(sid, member, GETVAL, 0);
            printf("semval for member %d is %d\n", member, semval);
    }

    void usage(void)
    {
            fprintf(stderr, "semtool - A utility for tinkering with semaphores\n");
            fprintf(stderr, "\nUSAGE:  semtool (c)reate <semcount>\n");
            fprintf(stderr, "                (l)ock <sem #>\n");
            fprintf(stderr, "                (u)nlock <sem #>\n");
            fprintf(stderr, "                (d)elete\n");
            fprintf(stderr, "                (m)ode <mode>\n");
            exit(1);
    }
semstat: A semtool companion program

As an added bonus, the source code to a companion program called semstat is provided next. The semstat program displays the values of each of the semaphores in the set created by semtool.

    /*****************************************************************************
     Excerpt from "Linux Programmer's Guide - Chapter 6"
     (C)opyright 1994-1995, Scott Burkett
     *****************************************************************************
     MODULE: semstat.c
     *****************************************************************************
     A companion command line tool for the semtool package. semstat displays
     the current value of all semaphores in the set created by semtool.
     *****************************************************************************/

    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/types.h>
    #include <sys/ipc.h>
    #include <sys/sem.h>

    int get_sem_count(int sid);
    void show_sem_usage(int sid);
    void dispval(int sid);

    int main(int argc, char *argv[])
    {
            key_t key;
            int   semset_id;

            /* Create unique key via call to ftok() */
            key = ftok(".", 's');

            /* Open the semaphore set - do not create! */
            if((semset_id = semget(key, 1, 0666)) == -1)
            {
                    printf("Semaphore set does not exist\n");
                    exit(1);
            }

            show_sem_usage(semset_id);
            return(0);
    }

    void show_sem_usage(int sid)
    {
            int cntr=0, maxsems, semval;

            maxsems = get_sem_count(sid);

            while(cntr < maxsems) {
                    semval = semctl(sid, cntr, GETVAL, 0);
                    printf("Semaphore #%d:  --> %d\n", cntr, semval);
                    cntr++;
            }
    }

    int get_sem_count(int sid)
    {
            int rc;
            struct semid_ds mysemds;
            union semun semopts;

            /* Get current values for internal data structure */
            semopts.buf = &mysemds;

            if((rc = semctl(sid, 0, IPC_STAT, semopts)) == -1) {
                    perror("semctl");
                    exit(1);
            }

            /* return number of semaphores in set */
            return(semopts.buf->sem_nsems);
    }

    void dispval(int sid)
    {
            int semval;

            semval = semctl(sid, 0, GETVAL, 0);
            printf("semval is %d\n", semval);
    }
6.4.4
Shared Memory
Basic Concepts

Shared memory can best be described as an area (segment) of memory that is mapped into, and shared by, more than one process. This is by far the fastest form of IPC, because there is no intermediation (i.e. a pipe, a message queue, etc). Instead, information is mapped directly from a memory segment, and into the addressing space of the calling process. A segment can be created by one process, and subsequently written to and read from by any number of processes.

Internal and User Data Structures

Let's briefly look at data structures maintained by the kernel for shared memory segments.

Kernel shmid_ds structure

As with message queues and semaphore sets, the kernel maintains a special internal data structure for each shared memory segment which exists within its addressing space. This structure is of type shmid_ds, and is defined in linux/shm.h as follows:

    /* One shmid data structure for each shared memory segment in the system. */
    struct shmid_ds {
            struct ipc_perm shm_perm;        /* operation perms */
            int     shm_segsz;               /* size of segment (bytes) */
            time_t  shm_atime;               /* last attach time */
            time_t  shm_dtime;               /* last detach time */
            time_t  shm_ctime;               /* last change time */
            unsigned short  shm_cpid;        /* pid of creator */
            unsigned short  shm_lpid;        /* pid of last operator */
            short   shm_nattch;              /* no. of current attaches */

                                             /* the following are private */

            unsigned short   shm_npages;     /* size of segment (pages) */
            unsigned long   *shm_pages;      /* array of ptrs to frames -> SHMMA */
            struct vm_area_struct *attaches; /* descriptors for attaches */
    };

Operations on this structure are performed by a special system call, and should not be tinkered with directly. Here are descriptions of the more pertinent fields:

shm_perm
    This is an instance of the ipc_perm structure, which is defined for us in linux/ipc.h. This holds the permission information for the segment, including the access permissions, and information about the creator of the segment (uid, etc).
shm_segsz
    Size of the segment (measured in bytes).
shm_atime
    Time the last process attached the segment.
shm_dtime
    Time the last process detached the segment.
shm_ctime
    Time of the last change to this structure (mode change, etc).
shm_cpid
    The PID of the creating process.
shm_lpid
    The PID of the last process to operate on the segment.
shm_nattch
    Number of processes currently attached to the segment.

SYSTEM CALL: shmget()
In order to create a new shared memory segment, or access an existing segment, the shmget() system call is used.

  SYSTEM CALL: shmget();

  PROTOTYPE: int shmget ( key_t key, int size, int shmflg );
    RETURNS: shared memory segment identifier on success
             -1 on error: errno = EINVAL (Invalid segment size specified)
                                  EEXIST (Segment exists, cannot create)
                                  EIDRM (Segment is marked for deletion, or was removed)
                                  ENOENT (Segment does not exist)
                                  EACCES (Permission denied)
                                  ENOMEM (Not enough memory to create segment)
  NOTES:

This particular call should almost seem like old news at this point. It is strikingly similar to the corresponding get calls for message queues and semaphore sets.

The first argument to shmget() is the key value (in our case returned by a call to ftok()). This key value is then compared to existing key values that exist within the kernel for other shared memory segments. At that point, the open or access operation is dependent upon the contents of the shmflg argument.

IPC_CREAT
    Create the segment if it doesn't already exist in the kernel.
IPC_EXCL
    When used with IPC_CREAT, fail if segment already exists.

If IPC_CREAT is used alone, shmget() either returns the segment identifier for a newly created segment, or returns the identifier for a segment which exists with the same key value. If IPC_EXCL is used along with IPC_CREAT, then either a new segment is created, or if the segment exists, the call fails with -1. IPC_EXCL is useless by itself, but when combined with IPC_CREAT, it can be used as a facility to guarantee that no existing segment is opened for access.

Once again, an optional octal mode may be OR'd into the mask.

Let's create a wrapper function for locating or creating a shared memory segment:
    int open_segment( key_t keyval, int segsize )
    {
            int shmid;

            if((shmid = shmget( keyval, segsize, IPC_CREAT | 0660 )) == -1)
            {
                    return(-1);
            }

            return(shmid);
    }

Note the use of the explicit permissions of 0660. This small function either returns a shared memory segment identifier (int), or -1 on error. The key value and requested segment size (in bytes) are passed as arguments.

Once a process has a valid IPC identifier for a given segment, the next step is for the process to attach or map the segment into its own addressing space.

SYSTEM CALL: shmat()

  SYSTEM CALL: shmat();

  PROTOTYPE: void *shmat ( int shmid, const void *shmaddr, int shmflg );
    RETURNS: address at which segment was attached to the process, or
             (void *) -1 on error: errno = EINVAL (Invalid IPC ID value or attach address passed)
                                           ENOMEM (Not enough memory to attach segment)
                                           EACCES (Permission denied)
  NOTES:

If the shmaddr argument is zero (0), the kernel tries to find an unmapped region. This is the recommended method. An address can be specified, but is typically only used to facilitate proprietary hardware or to resolve conflicts with other apps. The SHM_RND flag can be OR'd into the flag argument to force a passed address to be page aligned (rounds down to the nearest page size).

In addition, if the SHM_RDONLY flag is OR'd in with the flag argument, then the shared memory segment will be mapped in, but marked as read-only.

This call is perhaps the simplest to use. Consider this wrapper function, which is passed a valid IPC identifier for a segment, and returns the address that the segment was attached to:

    char *attach_segment( int shmid )
    {
            return(shmat(shmid, 0, 0));
    }

Once a segment has been properly attached, and a process has a pointer to the start of that segment, reading and writing to the segment become as easy as simply referencing or dereferencing the pointer! Be careful not to lose the value of the original pointer! If this happens, you will have no way of accessing the base (start) of the segment.

SYSTEM CALL: shmctl()

  SYSTEM CALL: shmctl();

  PROTOTYPE: int shmctl ( int shmqid, int cmd, struct shmid_ds *buf );
    RETURNS: 0 on success
             -1 on error: errno = EACCES (No read permission and cmd is IPC_STAT)
                                  EFAULT (Address pointed to by buf is invalid with IPC_SET and
                                          IPC_STAT commands)
                                  EIDRM (Segment was removed during retrieval)
                                  EINVAL (shmqid invalid)
                                  EPERM (IPC_SET or IPC_RMID command was issued, but
                                         calling process does not have write (alter)
                                         access to the segment)
  NOTES:

This particular call is modeled directly after the msgctl call for message queues. In light of this fact, it won't be discussed in too much detail. Valid command values are:

IPC_STAT
    Retrieves the shmid_ds structure for a segment, and stores it in the address of the buf argument
IPC_SET
    Sets the value of the ipc_perm member of the shmid_ds structure for a segment. Takes the values from the buf argument.
IPC_RMID
    Marks a segment for removal.

The IPC_RMID command doesn't actually remove a segment from the kernel. Rather, it marks the segment for removal. The actual removal itself occurs when the last process currently attached to the segment has properly detached it. Of course, if no processes are currently attached to the segment, the removal seems immediate.

To properly detach a shared memory segment, a process calls the shmdt system call.

SYSTEM CALL: shmdt()

  SYSTEM CALL: shmdt();

  PROTOTYPE: int shmdt ( char *shmaddr );
    RETURNS: 0 on success
             -1 on error: errno = EINVAL (Invalid attach address passed)

After a shared memory segment is no longer needed by a process, it should be detached by calling this system call. As mentioned earlier, this is not the same as removing the segment from the kernel! After a detach is successful, the shm_nattch member of the associated shmid_ds structure is decremented by one. When this value reaches zero (0) and the segment has been marked for removal with IPC_RMID, the kernel will physically remove the segment.

shmtool: An interactive shared memory manipulator

Background

Our final example of System V IPC objects will be shmtool, which is a command line tool for creating, reading, writing, and deleting shared memory segments.
Once again, like the previous examples, the segment is created during any operation, if it did not previously exist.

Command Line Syntax

Writing strings to the segment
    shmtool w "text"
Retrieving strings from the segment
    shmtool r
Changing the Permissions (mode)
    shmtool m (mode)
Deleting the segment
    shmtool d

Examples

    shmtool w test
    shmtool w "This is a test"
    shmtool r
    shmtool d
    shmtool m 660

The Source

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <ctype.h>
    #include <sys/types.h>
    #include <sys/ipc.h>
    #include <sys/shm.h>
    #define SEGSIZE 100

    void writeshm(int shmid, char *segptr, char *text);
    void readshm(int shmid, char *segptr);
    void removeshm(int shmid);
    void changemode(int shmid, char *mode);
    void usage(void);

    int main(int argc, char *argv[])
    {
            key_t key;
            int   shmid;
            char  *segptr;

            if(argc == 1)
                    usage();

            /* Create unique key via call to ftok() */
            key = ftok(".", 'S');

            /* Open the shared memory segment - create if necessary */
            if((shmid = shmget(key, SEGSIZE, IPC_CREAT|IPC_EXCL|0666)) == -1)
            {
                    printf("Shared memory segment exists - opening as client\n");

                    /* Segment probably already exists - try as a client */
                    if((shmid = shmget(key, SEGSIZE, 0)) == -1)
                    {
                            perror("shmget");
                            exit(1);
                    }
            }
            else
            {
                    /* The segment did not exist and has just been created */
            }

            /* Attach (map) the shared memory segment into the current process */
            if((segptr = shmat(shmid, 0, 0)) == (char *) -1)
            {
                    perror("shmat");
                    exit(1);
            }

            switch(tolower(argv[1][0]))
            {
                    case 'w': if(argc != 3)     /* the text is required */
                                      usage();
                              writeshm(shmid, segptr, argv[2]);
                              break;
                    case 'r': readshm(shmid, segptr);
                              break;
                    case 'd': removeshm(shmid);
                              break;
                    case 'm': if(argc != 3)     /* the mode string is required */
                                      usage();
                              changemode(shmid, argv[2]);
                              break;
                     default: usage();
            }

            return(0);
    }

    void writeshm(int shmid, char *segptr, char *text)
    {
            /* Never copy more than the segment can hold */
            strncpy(segptr, text, SEGSIZE - 1);
            segptr[SEGSIZE - 1] = '\0';
            printf("Done...\n");
    }

    void readshm(int shmid, char *segptr)
    {
            printf("segptr: %s\n", segptr);
    }

    void removeshm(int shmid)
    {
            shmctl(shmid, IPC_RMID, 0);
            printf("Shared memory segment marked for deletion\n");
    }

    void changemode(int shmid, char *mode)
    {
            struct shmid_ds myshmds;
            unsigned int newmode;

            /* Get current values for internal data structure */
            shmctl(shmid, IPC_STAT, &myshmds);

            /* Display old permissions */
            printf("Old permissions were: %o\n", myshmds.shm_perm.mode);

            /* Convert and load the mode */
            sscanf(mode, "%o", &newmode);
            myshmds.shm_perm.mode = newmode;

            /* Update the mode */
            shmctl(shmid, IPC_SET, &myshmds);

            printf("New permissions are : %o\n", myshmds.shm_perm.mode);
    }

    void usage(void)
    {
            fprintf(stderr, "shmtool - A utility for tinkering with shared memory\n");
            fprintf(stderr, "\nUSAGE:  shmtool (w)rite <text>\n");
            fprintf(stderr, "                (r)ead\n");
            fprintf(stderr, "                (d)elete\n");
            fprintf(stderr, "                (m)ode change <octal mode>\n");
            exit(1);
    }

Sven Goldt, The Linux Programmer's Guide