Cluster Management: AIX Version 7.1
Cluster Management
SC23-6779-00
Note: Before using this information and the product it supports, read the information in Notices on page 25.
First Edition (September 2010) This edition applies to AIX Version 7.1 and to all subsequent releases and modifications until otherwise indicated in new editions. Copyright IBM Corporation 2010. US Government Users Restricted Rights Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.
Contents
About this document
  Highlighting
  Case-sensitivity in AIX
  ISO 9000
Troubleshooting with the snap command
Troubleshooting with the syslog facility
Troubleshooting with cluster maintenance mode
Sample output for cluster commands
  clcmd date command sample output
  lscluster -m command sample output
  lscluster -i command sample output
  lscluster -s command sample output
  lscluster -d command sample output
  nodeState cluster event sample output
Code samples for cluster events
  Cluster events using AHAFS sample code
  Cluster socket programming sample code
Notices
Trademarks
Highlighting
The following highlighting conventions are used in this book:
Bold: Identifies commands, subroutines, keywords, files, structures, directories, and other items whose names are predefined by the system. Also identifies graphical objects such as buttons, labels, and icons that the user selects.
Italics: Identifies parameters whose actual names or values are to be supplied by the user.
Monospace: Identifies examples of specific data values, examples of text similar to what you might see displayed, examples of portions of program code similar to what you might write as a programmer, messages from the system, or information you should actually type.
Case-sensitivity in AIX
Everything in the AIX operating system is case-sensitive, which means that it distinguishes between uppercase and lowercase letters. For example, you can use the ls command to list files. If you type LS, the system responds that the command is not found. Likewise, FILEA, FiLea, and filea are three distinct file names, even if they reside in the same directory. To avoid causing undesirable actions to be performed, always ensure that you use the correct case.
ISO 9000
ISO 9000 registered quality systems were used in the development and manufacturing of this product.
Using Cluster Aware for AIX, you can monitor communications and network topology changes at various levels for all available services. With cluster monitoring, you can sense that a node is down, and a cluster can detect that a specific adapter is down or that a specific interface on an adapter is down. A point-of-contact indicates that a node has actually received communication packets across this interface from another node. This communication process allows the application that is monitoring the health of a node to take discrete actions based on near real-time event notification. You can also monitor the storage devices to provide UP events and DOWN events for any recovery actions that are identified as necessary by the monitoring application.
chcluster Use this command to change the cluster configuration. The following example adds a node to the cluster configuration:
chcluster -n mycluster -m +nodeD
rmcluster Use this command to remove the cluster configuration. The following example removes the cluster configuration:
rmcluster -n mycluster
lscluster Use this command to list cluster configuration information. The following example lists the cluster configuration for all nodes:
lscluster -m
clcmd Use this command to distribute a command to a set of nodes that are members of a cluster. The following example lists the date for all the nodes in the cluster:
clcmd date
Related concepts Sample output for cluster commands on page 8 You can view sample output for the lscluster -m command, the lscluster -i command, the lscluster -s command, and the lscluster -d command. Related information chcluster command clcmd command lscluster command mkcluster command rmcluster command
Cluster repository
The cluster repository disk is used as the central repository for the cluster configuration data.
The cluster repository disk must be accessible from all nodes in the cluster and must be at least 10 GB in size. The cluster repository disk is backed up by a redundant and highly available storage configuration. The cluster repository disk should be configured for RAID to accommodate the requirements of the data center. The cluster repository disk is a special device for the cluster. The use of LVM commands is not supported on the cluster repository disk. The AIX LVM commands are single-node administrative commands, and are not applicable in a clustered configuration. The cluster repository disk is renamed to a private device name. Due to the special device characteristics required by the cluster repository disk, a raw section of the disk and a section of the disk that contains a special volume group and special logical volumes are used during cluster operations.
Naming a cluster
When you are naming a cluster, you must follow specific guidelines. The only acceptable ASCII characters you can use when naming a cluster are A - Z, a - z, 0 - 9, - (hyphen), . (period), and _ (underscore). The first character of the cluster name and domain name cannot be a hyphen. The maximum length of a cluster name is 63 characters.
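The naming guidelines above can be checked before a name is passed to a cluster command. The following is a minimal sketch only: is_valid_cluster_name and CLUSTER_NAME_MAX are hypothetical names that are not part of AIX, and rejecting an empty name is an assumption that the text does not state.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define CLUSTER_NAME_MAX 63   /* maximum length stated in the guidelines */

/* Hypothetical helper (not an AIX API): returns true when name follows
 * the cluster naming guidelines described above. */
bool is_valid_cluster_name(const char *name)
{
    static const char allowed[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
        "abcdefghijklmnopqrstuvwxyz"
        "0123456789-._";
    size_t len;

    if (name == NULL)
        return false;

    len = strlen(name);
    /* Assumption: an empty name is treated as invalid. */
    if (len == 0 || len > CLUSTER_NAME_MAX)
        return false;

    /* The first character cannot be a hyphen. */
    if (name[0] == '-')
        return false;

    /* strspn() returns the length of the leading run of allowed
     * characters; the name is valid only if that run covers all of it. */
    return strspn(name, allowed) == len;
}
```

For example, is_valid_cluster_name("mycluster") returns true, and is_valid_cluster_name("-mycluster") returns false because of the leading hyphen.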
Cluster communication
Cluster communication takes advantage of traditional networking interfaces, such as IP-based network communications and storage interface communication through Fibre Channel and SAS adapters. When you use both the IP-based network communications and the storage interface communications, all nodes in the cluster can always communicate with any other nodes in the cluster configuration. Having clusters in this configuration eliminates "split brain" incidents. You must complete the Fibre Channel setup before the cluster can use the storage interfaces as an alternative communication path. The SAS adapter does not require special setup. During SAN port configuration, you must verify that your server interfaces are connected to the SAN fabric ports in the same zone.
Related concepts
Setting up cluster storage communication on page 6
You must complete the following setup before creating a cluster that uses storage communication interfaces.
The following steps describe the process for event handling:
1. Create a monitor file based on the /aha directory.
2. Write the required information to the monitor file to represent the wait type, either a select() call or a blocking read() call, and when the event should be triggered. For example, a state change of node down.
3. Wait in a select() call or a blocking read() call.
4. Read from the monitor file to obtain the event data.
Related concepts
nodeState cluster event sample output on page 11
Related information
AIX Event Infrastructure for AIX and AIX Clusters - AHAFS
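The four event-handling steps can be sketched as a single function. This is a minimal illustration only, not the AHAFS sample program: wait_for_event is a hypothetical helper, the monitor file path and the specification string are supplied by the caller, and the file mode 0600 is an assumption. The sketch always waits in select(); waiting in a blocking read() is not shown.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/select.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: performs the four event-handling steps on the
 * monitor file mon_path. spec is the string written in step 2.
 * Returns the number of bytes of event data placed in buf, or -1. */
ssize_t wait_for_event(const char *mon_path, const char *spec,
                       char *buf, size_t buflen)
{
    fd_set readfds;
    size_t speclen = strlen(spec);
    ssize_t n;
    int fd;

    if (buflen == 0)
        return -1;

    /* Step 1: create (or open) the monitor file. */
    fd = open(mon_path, O_CREAT | O_RDWR, 0600);
    if (fd < 0)
        return -1;

    /* Step 2: write the wait type and trigger information. */
    if (write(fd, spec, speclen) != (ssize_t)speclen) {
        close(fd);
        return -1;
    }

    /* Step 3: wait in select() until the file becomes readable. */
    FD_ZERO(&readfds);
    FD_SET(fd, &readfds);
    if (select(fd + 1, &readfds, NULL, NULL, NULL) <= 0) {
        close(fd);
        return -1;
    }

    /* Step 4: read the event data from the beginning of the file. */
    n = pread(fd, buf, buflen - 1, 0);
    if (n >= 0)
        buf[n] = '\0';

    close(fd);
    return n;
}
```

A caller would pass a .mon file under the /aha directory as mon_path and print the returned buffer once the function returns.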
Note: To find the node number, view the output from the lscluster -m command. For the cluster shorthand ID, you can also use the get_clusterid function.
To start the socksimple program as the sender on node 3 (nodeC), run the following command:
./socksimple -s -a 1
Note: The -a (address) option sends the packets to node 1 in this local cluster.
The following output is displayed from running the socksimple -s -a 1 command:
./socksimple -s -a 1
socksimple version 1.2
socksimple 1/12 with ttl=1:

1275 bytes from cluster host id = 1: seqno=1275 ttl=1 time=0.411 ms
1276 bytes from cluster host id = 1: seqno=1276 ttl=1 time=0.275 ms
1277 bytes from cluster host id = 1: seqno=1277 ttl=1 time=0.287 ms
1278 bytes from cluster host id = 1: seqno=1278 ttl=1 time=0.284 ms

--- socksimple statistics ---
4 packets transmitted, 4 packets received
round-trip min/avg/max = 0.267/0.291/0.411 ms
Note: For the most current list of supported Fibre Channel adapters, contact your IBM representative.
To configure the Fibre Channel adapters that will be used for cluster storage communications, complete the following steps:
Note: In the following steps, the X in fcsX represents the number of your Fibre Channel adapter, for example, fcs1, fcs2, or fcs3.
1. Run the following command:
rmdev -Rl fcsX
Note: If you booted from the Fibre Channel adapter, you do not need to complete this step. 2. Run the following command:
chdev -l fcsX -a tme=yes
Note: If you booted from the Fibre Channel adapter, add the -P flag. 3. Run the following command:
chdev -l fscsiX -a dyntrk=yes -a fc_err_recov=fast_fail
4. Run the cfgmgr command. Note: If you booted from the Fibre Channel adapter and used the -P flag, you must reboot. 5. Verify the configuration changes by running the following command:
lsdev -C | grep sfwcom
The following is an example of the output displayed from running the lsdev -C | grep sfwcom command:
lsdev -C | grep sfwcom
sfwcomm0 Available 01-00-02-FF Fiber Channel Storage Framework Comm
sfwcomm1 Available 01-01-02-FF Fiber Channel Storage Framework Comm
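The verification step can also be automated by examining each line of the output shown above. The following is a minimal sketch that assumes the column layout of that sample output (device name first, state second); sfwcomm_available is a hypothetical helper and not part of AIX.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: returns true when one line of lsdev -C output
 * describes an sfwcomm device whose state column is "Available". */
bool sfwcomm_available(const char *line)
{
    char name[32];
    char state[32];

    /* Read the first two whitespace-separated columns. */
    if (sscanf(line, "%31s %31s", name, state) != 2)
        return false;

    return strncmp(name, "sfwcomm", 7) == 0 &&
           strcmp(state, "Available") == 0;
}
```

A program could read the command output line by line, for example with fgets(), and report a problem if no line satisfies the check.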
After you create the cluster, you can list the cluster interfaces and view the storage interfaces by running the following command:
lscluster -i
Related concepts Cluster communication on page 4 Cluster communication takes advantage of traditional networking interfaces, such as IP based network communications and storage interface communication through Fibre Channel and SAS adapters.
Disaster recovery
When you are setting up the cluster configuration, you must create a disaster recovery plan. When creating the disaster recovery plan, evaluate the storage fabric to adequately cover the requirements of the data center. When you have a total loss of the hardware and storage elements associated with the cluster configuration, the cluster nodes must be completely rebuilt. In the operating system image, there are inter-relationships of device names and host names that are unique and that cannot be restored from a backup disk or from an image of the cluster repository disk. You can reinstall a lost node if the cluster repository is intact. To manually configure the newly reinstalled node back into the cluster configuration, complete the following steps:
1. Save the output from the lscluster -m command. From the output, identify the universally unique identifier (UUID) value for the node.
2. Verify that the node's host name is exactly as it was before it was lost.
3. Install the new node.
4. To update the node's UUID, run the following command (with the node's UUID):
chdev -l cluster0 -a node_uuid=2adef228-a712-11df-8baa-0245c0004002
Note: The UUID value is located in the output from the lscluster -m command.
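Because a mistyped UUID would be written to the cluster0 device, it can help to check the shape of the value before running the chdev command. The following is a minimal sketch: looks_like_uuid is a hypothetical helper, and the 8-4-4-4-12 hexadecimal layout is inferred from the example value above rather than stated by the text.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns true when s has the same layout as the
 * example node_uuid value (hex digits grouped 8-4-4-4-12 by hyphens). */
bool looks_like_uuid(const char *s)
{
    size_t i;

    if (s == NULL || strlen(s) != 36)
        return false;

    for (i = 0; i < 36; i++) {
        if (i == 8 || i == 13 || i == 18 || i == 23) {
            /* Group separators must be hyphens. */
            if (s[i] != '-')
                return false;
        } else if (!isxdigit((unsigned char)s[i])) {
            /* Every other position must be a hexadecimal digit. */
            return false;
        }
    }
    return true;
}
```

The check covers only the format; it does not confirm that the value is the UUID that the lscluster -m command reported for the node.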
The following structure is an example of the data files collected during the snap script execution for Cluster Aware for AIX:
/tmp/ibmsupt
|-- caa
    |-- Data
        |-- 20100817215934 (For example, a timestamp at which "snap caa" was run)
        |   |-- nodeA.austin.ibm.com.tar.gz
        |   |-- ...
        |   |-- nodeB.austin.ibm.com.tar.gz
        |   |-- nodeC.austin.ibm.com.tar.gz
        |-- ... (For example, more timestamp directories to distinguish separate "snap caa" invocations)
3. Use the touch command on the syslog.caa file to initiate the syslog facility. Related information syslog.conf file refresh command
-------------------------------
NODE nodeB.austin.ibm.com
-------------------------------
Fri Jul 30 08:00:00 CDT 2010

-------------------------------
NODE nodeC.austin.ibm.com
-------------------------------
Fri Jul 30 08:00:00 CDT 2010
Related concepts Cluster event management on page 4 AIX event management is implemented using a pseudo-file system architecture. The use of the pseudo-file system allows you to use existing application programming interfaces (APIs) to program the monitoring of events, such as a select ( ) call or a blocking read ( ) call.
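#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <strings.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>
#include <limits.h>
#include <libgen.h>
#include <time.h>
#include <sys/types.h>
#include <sys/select.h>
#include <sys/stat.h>

/* The value of MAX_WRITE_STR_LEN is missing from the source text. */
#define MAX_WRITE_STR_LEN

void syntax(char *prog);
int ahaMonFile(char *str);
static int mk_parent_dirs (char *path);
void read_data (int fd,int outfd);

char *monFile;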
test_prog :: main
int main (int argc, char *argv[])
{
    int fd, outfd, rc, i=0, cnt=0;
    fd_set readfds;
    char *outputFile;
    char wrStr[MAX_WRITE_STR_LEN+1];
    char waitInRead[] = "WAIT_TYPE=WAIT_IN_READ";

    if (argc < 5)
        syntax( argv[0]);

    monFile = argv[1];
    if ( ! ahaMonFile(monFile) )    /* Not a .mon file under /aha */
        syntax( argv[0]);

    /* Create intermediate directories of the .mon file */
    rc = mk_parent_dirs(monFile);
    if (rc)
    {
        fprintf (stderr,
                 "Could not create intermediate directories of the file %s !\n",
                 monFile);
        return(-1);
    }
    printf("Monitor file name: %s\n", monFile);
    /* Bounded copy so that a long argument cannot overflow wrStr */
    snprintf (wrStr, sizeof(wrStr), "%s", argv[2]);
    cnt = atoi(argv[3]);
    printf("Write String : %s\n", wrStr);
    outputFile = argv[4];

    /* open() requires a mode argument whenever O_CREAT is specified */
    fd = open (monFile, O_CREAT|O_RDWR, 0644);
    if (fd < 0)
    {
        fprintf (stderr,"Could not open the file %s; errno = %d\n",
                 monFile,errno);
        exit (1);
    }
    outfd = open (outputFile, O_CREAT|O_RDWR, 0644);
    if (outfd < 0)
    {
        /* Report the output file, which is the file that failed to open */
        fprintf (stderr, "Could not open the file %s; errno = %d !\n",
                 outputFile, errno);
        return(-1);
    }

    /* Write the monitoring specification to the monitor file */
    write(fd, wrStr, strlen(wrStr));

    for(i = 0; i < cnt; i++)
    {
        if (strstr(wrStr, waitInRead) == NULL)
        {
            /* Wait in select() until the event occurs */
            FD_ZERO(&readfds);
            FD_SET(fd, &readfds);
            printf(
              "Entering select() to wait till the event corresponding to the AHA node %s occurs.\n",
              monFile);
            printf("Please issue a command from another window to trigger this event.\n");
            rc = select (fd+1, &readfds, NULL, NULL, NULL);
            printf("\nThe select() completed. \n");
            if (rc <= 0) /* No event occurred or an error was found. */
            {
                fprintf (stderr, "The select() returned %d.\n", rc);
                perror ("select: ");
                return (-1);
            }
            if(! FD_ISSET(fd, &readfds))
                goto end;
            printf("The event corresponding to the AHA node %s has occurred.\n",
                   monFile);
        }
        else
        {
            /* Wait in the blocking read performed by read_data() */
            printf(
              "Entering read() to wait till the event corresponding to the AHA node %s occurs.\n",
              monFile);
            printf("Please issue a command from another window to trigger this event.\n");
        }
        read_data(fd,outfd);
    }
end:
    close(fd);
    close(outfd);
    return (0);
}
test_prog :: syntax
/* -------------------------------------------------------------------------- */
void syntax(char *prog)
{
    printf("\nSYNTAX: %s <aha-monitor-file> [<key1>=<value1>[;<key2>=<value2>;...]] <count> <outfile> \n",prog);
    exit (1);
}
test_prog :: ahaMonFile
/* --------------------------------------------------------------------------
 * PURPOSE: To check whether the file provided is an AHA monitor file.
 */
int ahaMonFile(char *str)
{
    char cwd[PATH_MAX];
    int len1=strlen(str), len2=strlen(".mon");
    int rc = 0;
    struct stat sbuf;

    /* Make sure /aha is mounted. */
    if ((stat("/aha", &sbuf) < 0) || (sbuf.st_flag != FS_MOUNT))
    {
        printf("ERROR: The filesystem /aha is not mounted!\n");
        return (rc);
    }

    /* Make sure the path has .mon as a suffix. */
    if ((len1 <= len2) || (strcmp ( (str + len1 - len2), ".mon")) )
        goto end;

    if (! strncmp (str, "/aha",4))      /* The given path starts with /aha */
        rc = 1;
    else                                /* It could be a relative path */
    {
        getcwd (cwd, PATH_MAX);
        if ((str[0] != '/' ) &&             /* Relative path and */
            (! strncmp (cwd, "/aha",4))     /* cwd starts with /aha . */
           )
            rc = 1;
    }
end:
    if (!rc)
        printf("ERROR: %s is not an AHA monitor file !\n", str);
    return (rc);
}
test_prog :: mk_parent_dirs
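/*-----------------------------------------------------------------
 * NAME: mk_parent_dirs()
 * PURPOSE: To create intermediate directories of a .mon file if
 *          they are not created.
 */
static int mk_parent_dirs (char *path)
{
    char s[PATH_MAX];
    char tmp[PATH_MAX];
    char *dirp;
    struct stat buf;
    int rc=0;

    /* dirname() may modify its argument, so work on a copy of path */
    snprintf(tmp, sizeof(tmp), "%s", path);
    dirp = dirname(tmp);
    if (stat(dirp, &buf) != 0)
    {
        /* Bounded formatting so that a long path cannot overflow s */
        snprintf(s, sizeof(s), "/usr/bin/mkdir -p %s", dirp);
        rc = system(s);
    }
    return (rc);
}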
test_prog :: read_data
/*-----------------------------------------------------------------
 * PURPOSE: To parse and print the data received at the occurrence
 *          of the event.
 */
void read_data (int fd,int outfd)
{
#define READ_BUF_SIZE 3072
    char data[READ_BUF_SIZE];
    char *p, *line;
    char cmd[64];
    time_t sec, nsec;
    pid_t pid;
    uid_t uid, luid;
    gid_t gid;
    char curTm[64];
    int n;
    int stackInfo = 0;
    char uname[64], lname[64], gname[64];

    bzero((char *)data, READ_BUF_SIZE);

    /* Read the info from the beginning of the file.
     * One byte is left unread so that data stays NUL-terminated. */
    n=pread(fd, data, READ_BUF_SIZE - 1, 0);
    p = data;
    printf("%s\n",p);

    /* Copy the event data to the output file only if the read succeeded */
    if (n > 0)
        write(outfd, data, n);
}
Function :: main
#include <socksimple.h>

/* TEST Program Only */
int sndflag=0;                  /* sender flag */
int rcvflag=0;                  /* receiver flag */
int iend=DEFAULT_END;
int istart=DEFAULT_START;
int errcount=DEFAULT_ERRCOUNT;
int actual_err=0;
int current_ping;

int main(int argc, char **argv)
{
    int c;                      /* hold command-line args */
    extern int getopt();        /* for getopt */
    extern char *optarg;        /* for getopt */

    /* parse command-line arguments */
    while ((c = getopt(argc, argv, "vrsa:p:t:b:e:c:")) != -1)
    {
        switch (c)
        {
        case 'r':               /* socksimple receiver */
            rcvflag=1;
            break;
        case 's':               /* socksimple sender */
            sndflag=1;
            break;
        case 'v':
            verbose=1;
            break;
        case 'a':               /* socksimple address override */
            /* bounded copy so that a long argument cannot overflow arg_addr_str */
            strncpy(arg_addr_str, optarg, sizeof(arg_addr_str) - 1);
            arg_addr_str[sizeof(arg_addr_str) - 1] = '\0';
            break;
        case 'p':               /* socksimple port override */
            arg_port = atoi(optarg);
            break;
        case 'b':
            istart = atoi(optarg);
            if ( istart <= 0 )
                istart = 1;
            break;
        case 'c':
            errcount = atoi(optarg);
            break;
        case 'e':
            /* read the value first, then clamp it to the buffer size */
            iend = atoi(optarg);
            if ( iend > MAX_BUF_LEN )
                iend = MAX_BUF_LEN;
            break;
        case 't':               /* socksimple ttl override */
            arg_ttl = atoi(optarg);
            break;
        case '?':
            usage();
            break;
        }
    }

    /* verify one and only one send or receive flag */
    if ( ((!rcvflag) && (!sndflag)) || ((rcvflag) && (sndflag)) )
    {
        usage();
    }

    current_ping=istart;
    printf("socksimple version %d.%d\n", VERSION_MAJOR, VERSION_MINOR);
    init_socket();
    get_local_host_info();

    if (sndflag)
    {
        printf("socksimpleing %s/%d with ttl=%d:\n\n",
               arg_addr_str, arg_port, arg_ttl);
        /* catch interrupts with clean_exit() */
        signal(SIGINT, clean_exit);
        /* catch alarm signal with send_socksimple() */
        signal(SIGALRM, send_socksimple);
        /* send an alarm signal now */
        send_socksimple(SIGALRM);
        /* listen for response packets */
        sender_listen_loop();
    }
    else
    {
        receiver_listen_loop();
    }
    exit(0);
}
Function :: init_socket
void init_socket()
{
    int flag_on=1;

    /* create a cluster datagram socket */
    if ((sock = socket(AF_CLUST, SOCK_DGRAM, 0)) < 0)
    {
        perror("receive socket() failed");
        exit(1);
    }

    /* construct a cluster address structure */
    memset(&dst_addr, 0, sizeof(dst_addr));
    dst_addr.sclust_family = AF_CLUST;
    dst_addr.sclust_len = sizeof(struct sockaddr_clust);
    if ( sndflag )
    {
        dst_addr.sclust_addr = atoi(arg_addr_str);
        dst_addr.sclust_port = arg_port;
        dst_addr.sclust_cluster_id = WWID_LOCAL_CLUSTER;
    }

    memset(&src_addr, 0, sizeof(src_addr));
    src_addr.sclust_family = AF_CLUST;
    src_addr.sclust_len = sizeof(struct sockaddr_clust);
    src_addr.sclust_addr = get_clusterid();
    src_addr.sclust_port = arg_port;
    src_addr.sclust_cluster_id = WWID_LOCAL_CLUSTER;

    /* bind the source address to the socket */
    if ((bind(sock, (struct sockaddr *) &src_addr, sizeof(src_addr))) < 0)
    {
        perror("bind() failed");
        exit(1);
    }
}
Function :: get_local_host_info
void get_local_host_info()
{
    char hostname[MAX_HOSTNAME_LEN];
    struct hostent* hostinfo;

    /* lookup local hostname */
    gethostname(hostname, MAX_HOSTNAME_LEN);
    if (verbose)
        printf("Localhost is %s, ", hostname);

    /* use gethostbyname to get the host's IP address */
    if ((hostinfo = gethostbyname(hostname)) == NULL)
    {
        perror("gethostbyname() failed");
        exit(1);    /* hostinfo must not be dereferenced after a failure */
    }

    /* copy exactly the size of s_addr; unsigned long can be 8 bytes */
    memcpy(&localIP.s_addr, hostinfo->h_addr_list[0], sizeof(localIP.s_addr));
    if (verbose)
        printf("%s\n", inet_ntoa(localIP));
    pid = getpid();
}
Function :: send_socksimple
void send_socksimple(int sig)
{
    struct timeval now;
    int ioffset;
    /* increment count, check if done */
    if (current_ping >= iend)
    {
        clean_exit();
    }

    /* fill the send buffer with the sender pattern (4) */
    memset(&socksimple_payload, 4, sizeof(socksimple_payload));

    /* populate the socksimple packet */
    socksimple_payload.socksimple_packet.type = SENDER;
    socksimple_payload.socksimple_packet.version_major = htons(VERSION_MAJOR);
    socksimple_payload.socksimple_packet.version_minor = htons(VERSION_MINOR);
    socksimple_payload.socksimple_packet.seq_no = htonl(current_ping);
    socksimple_payload.socksimple_packet.src_host = get_clusterid();
    socksimple_payload.socksimple_packet.dest_host = atoi(arg_addr_str);
    socksimple_payload.socksimple_packet.ttl = arg_ttl;
    socksimple_payload.socksimple_packet.pid = pid;

    /* place the end marker at the end of the payload */
    ioffset = current_ping - strlen(PKT_END)
              - sizeof(struct socksimple_struct) - 2;
    strcpy((char *) &socksimple_payload.payload[ioffset],PKT_END);

    gettimeofday(&now, NULL);
    socksimple_payload.socksimple_packet.tv.tv_sec = htonl(now.tv_sec);
    socksimple_payload.socksimple_packet.tv.tv_usec = htonl(now.tv_usec);

    /* send the outgoing packet */
    send_packet(&socksimple_payload, &dst_addr, current_ping);
    current_ping++;

    /* set another alarm call to send in 1 second */
    (void) signal(SIGALRM, send_socksimple);
    alarm(1);
}
Function :: send_packet
void send_packet(struct socksimple_payload *packet,
                 struct sockaddr_clust *target, int ilen)
{
    int pkt_len;

    pkt_len = ilen;
    /* send the packet to the cluster socket address */
    if ((sendto(sock, packet, pkt_len, 0, (struct sockaddr *) target,
                sizeof(struct sockaddr_clust))) != pkt_len)
    {
        perror("sendto() sent incorrect number of bytes");
        exit(1);
    }
    packets_sent++;
}
Function :: sender_listen_loop
void sender_listen_loop()
{
    char *recv_packet;              /* buffer to receive packet */
    int recv_len;                   /* len of packet received */
    struct timeval current_time;    /* time value structure */
    double rtt;                     /* round trip time */
    socklen_t from_len;
    struct sockaddr_clust send_host;
    int ilen;

    ilen = sizeof(struct socksimple_payload);
    if (!(recv_packet = (char *)malloc(ilen)))
    {
        fprintf(stderr,"malloc_failed\n");
        exit(-1);
    }

    from_len = sizeof(struct sockaddr_clust);
    while (1)
    {
        /* clear the receive buffer */
        memset(recv_packet, 0, ilen);

        /* block waiting to receive a packet */
        if ((recv_len = recvfrom(sock, recv_packet, ilen, 0,
                (struct sockaddr *) &send_host, &from_len)) < 0)
        {
            if (errno == EINTR)
            {
                /* interrupt is ok */
                continue;
            }
            else
            {
                perror("recvfrom() failed");
                exit(1);
            }
        }

        /* get current time */
        gettimeofday(&current_time, NULL);

        /* process the received packet */
        if (process_socksimple_packet(recv_packet, recv_len, RECEIVER) == 0)
        {
            /* packet processed successfully */
            /* calculate round trip time in milliseconds */
            subtract_timeval(&current_time, &rcvd_pkt->socksimple_packet.tv);
            rtt = timeval_to_ms(&current_time);

            /* keep rtt total, min and max */
            rtt_total += rtt;
            if (rtt > rtt_max)
                rtt_max = rtt;
            if (rtt < rtt_min)
                rtt_min = rtt;

            /* output received packet information */
            printf("%d bytes from cluster host id = %d: seqno=%d ttl=%d time=%.3f ms\n",
                   recv_len,
                   send_host.sclust_addr,
                   rcvd_pkt->socksimple_packet.seq_no,
                   rcvd_pkt->socksimple_packet.ttl,
                   rtt);
        }
    }
}
Function :: receiver_listen_loop
void receiver_listen_loop()
{
    char *recv_packet;      /* buffer to receive packet */
    int recv_len;           /* len of string received */
    socklen_t from_len;
    struct sockaddr_clust send_host;
    int ilen,ioffset;

    ilen = sizeof(struct socksimple_payload);
    if (!(recv_packet = (char *)malloc(ilen)))
    {
        fprintf(stderr,"malloc_failed\n");
        exit(-1);
    }

    printf("Listening on %s/%d:\n\n", arg_addr_str, arg_port);
    from_len = sizeof(struct sockaddr_clust);
    while (1)
    {
        /* clear the receive buffer */
        memset(recv_packet, 0, ilen);

        /* block waiting to receive a packet */
        if ((recv_len = recvfrom(sock, recv_packet, ilen, 0,
                (struct sockaddr *) &send_host, &from_len)) < 0)
        {
            perror("recvfrom() failed");
            exit(1);
        }
        /* printf("recvfrom cluster node id = %d port = %d \n",
                  send_host.sclust_addr, send_host.sclust_port); */

        /* process the received packet */
        if (process_socksimple_packet(recv_packet, recv_len, SENDER) == 0)
        {
            /* packet processed successfully */
            /* printf("Replying to socksimple from cluster node id = %d bytes=%d seqno=%d ttl=%d\n",
                      rcvd_pkt->src_host, recv_len, rcvd_pkt->seq_no, rcvd_pkt->ttl); */
            printf("Replying to socksimple from cluster node id = %d bytes=%d seqno=%d ttl=%d\n",
                   send_host.sclust_addr,
                   recv_len,
                   rcvd_pkt->socksimple_packet.seq_no,
                   rcvd_pkt->socksimple_packet.ttl);

            /* populate socksimple response packet; fill with the receiver pattern (6) */
            memset(&socksimple_payload, 6, sizeof(socksimple_payload));
            socksimple_payload.socksimple_packet.type = RECEIVER;
            socksimple_payload.socksimple_packet.version_major = htons(VERSION_MAJOR);
            socksimple_payload.socksimple_packet.version_minor = htons(VERSION_MINOR);
            socksimple_payload.socksimple_packet.seq_no =
                htonl(rcvd_pkt->socksimple_packet.seq_no);
            socksimple_payload.socksimple_packet.dest_host =
                rcvd_pkt->socksimple_packet.src_host;
            socksimple_payload.socksimple_packet.src_host = get_clusterid();
            socksimple_payload.socksimple_packet.ttl =
                rcvd_pkt->socksimple_packet.ttl;
            socksimple_payload.socksimple_packet.pid =
                rcvd_pkt->socksimple_packet.pid;
            socksimple_payload.socksimple_packet.tv.tv_sec =
                htonl(rcvd_pkt->socksimple_packet.tv.tv_sec);
            socksimple_payload.socksimple_packet.tv.tv_usec =
                htonl(rcvd_pkt->socksimple_packet.tv.tv_usec);

            /* place the end marker at the end of the payload */
            ioffset = recv_len - sizeof(struct socksimple_struct)
                      - strlen(PKT_END) - 2;
            strcpy((char *) &socksimple_payload.payload[ioffset],PKT_END);

            /* send response packet */
            send_packet(&socksimple_payload, &send_host, recv_len);
        }
    }
}
Function :: subtract_timeval
void subtract_timeval(struct timeval *val, const struct timeval *sub)
{
    /* subtract sub from val and leave result in val */
    if ((val->tv_usec -= sub->tv_usec) < 0)
    {
        /* borrow one second when the microseconds go negative */
        val->tv_sec--;
        val->tv_usec += 1000000;
    }
    val->tv_sec -= sub->tv_sec;
}
Function :: timeval_to_ms
double timeval_to_ms(const struct timeval *val)
{
    /* return the timeval converted to a number of milliseconds */
    return (val->tv_sec * 1000.0 + val->tv_usec / 1000.0);
}
Function :: process_socksimple_packet
int process_socksimple_packet(char *packet, int recv_len, unsigned char type)
{
    int ioffset, icheck;

    /* validate packet size */
    ioffset = recv_len - strlen(PKT_END) - 2 - sizeof(struct socksimple_struct);
    if (ioffset < 0)    /* too short to hold the header and the end marker */
        return(-1);

    /* cast data to socksimple_payload */
    rcvd_pkt = (struct socksimple_payload *) packet;

    /* convert required fields to host byte order */
    rcvd_pkt->socksimple_packet.version_major =
        ntohs(rcvd_pkt->socksimple_packet.version_major);
    rcvd_pkt->socksimple_packet.version_minor =
        ntohs(rcvd_pkt->socksimple_packet.version_minor);
    rcvd_pkt->socksimple_packet.seq_no =
        ntohl(rcvd_pkt->socksimple_packet.seq_no);
    rcvd_pkt->socksimple_packet.tv.tv_sec =
        ntohl(rcvd_pkt->socksimple_packet.tv.tv_sec);
    rcvd_pkt->socksimple_packet.tv.tv_usec =
        ntohl(rcvd_pkt->socksimple_packet.tv.tv_usec);

    /* validate socksimple version matches */
    if ((rcvd_pkt->socksimple_packet.version_major != VERSION_MAJOR) ||
        (rcvd_pkt->socksimple_packet.version_minor != VERSION_MINOR))
    {
        if (verbose)
            printf("Discarding packet: version mismatch (%d.%d)\n",
                   rcvd_pkt->socksimple_packet.version_major,
                   rcvd_pkt->socksimple_packet.version_minor);
        return(-1);
    }

    /* validate socksimple packet type (sender or receiver) */
    if (rcvd_pkt->socksimple_packet.type != type)
    {
        if (verbose)
        {
            switch (rcvd_pkt->socksimple_packet.type)
            {
            case SENDER:
                printf("Discarding sender packet\n");
                break;
            case RECEIVER:
                printf("Discarding receiver packet\n");
                break;
            default:    /* any type other than sender or receiver */
                printf("Discarding packet: unknown type(%c)\n",
                       rcvd_pkt->socksimple_packet.type);
                break;
            }
        }
        return(-1);
    }

    /* if response packet, validate pid */
    if (rcvd_pkt->socksimple_packet.type == RECEIVER)
    {
        if (rcvd_pkt->socksimple_packet.pid != pid)
        {
            if (verbose)
                printf("Discarding packet: pid mismatch (%d/%d)\n",
                       (int)pid, (int)rcvd_pkt->socksimple_packet.pid);
            return(-1);
        }
    }

    /* the end marker must be found at the expected offset */
    if (strcmp((char *) &rcvd_pkt->payload[ioffset],PKT_END))
    {
        printf("Payload mismatch: = %s\n", &rcvd_pkt->payload[ioffset]);
        printf(" payload mismatch: = %x:%x:%x:%x:%x:%x\n",
               rcvd_pkt->payload[ioffset],
               rcvd_pkt->payload[ioffset+1],
               rcvd_pkt->payload[ioffset+2],
               rcvd_pkt->payload[ioffset+3],
               rcvd_pkt->payload[ioffset+4],
               rcvd_pkt->payload[ioffset+5]);
        actual_err++;
    }

    /* every payload byte before the end marker must hold the fill pattern:
     * 6 in packets from the receiver, 4 in packets from the sender */
    for (icheck = 0; icheck < ioffset; icheck++)
    {
        if (rcvd_pkt->socksimple_packet.type == RECEIVER)
        {
            if ( (int) rcvd_pkt->payload[icheck] != 6 )
            {
                printf("Junk at offset %d 0x%x\n",
                       icheck, rcvd_pkt->payload[icheck]);
                actual_err++;
            }
        }
        else
        {
            if ( (int) rcvd_pkt->payload[icheck] != 4 )
            {
                printf("Junk at offset %d 0x%x\n",
                       icheck, rcvd_pkt->payload[icheck]);
                actual_err++;
            }
        }
        if ( actual_err > errcount )
            exit(-1);
    }

    /* packet validated, increment counter */
    packets_rcvd++;
    return(0);
}
Function :: clean_exit
void clean_exit()
{
    /* close the socket */
    close(sock);

    /* output statistics and exit program */
    printf("\n--- socksimple statistics ---\n");
    printf("%d packets transmitted, %d packets received\n",
           packets_sent, packets_rcvd);
    if (packets_rcvd == 0)
        printf("round-trip min/avg/max = NA/NA/NA ms\n");
    else
        printf("round-trip min/avg/max = %.3f/%.3f/%.3f ms\n",
               rtt_min, (rtt_total/packets_rcvd), rtt_max);
    exit(0);
}
Function :: usage
void usage()
{
    printf("Usage: socksimple -r|-s [-v] [-a address]");
    printf(" [-p port] [-t ttl]\n\n");
    printf("-r|-s Receiver or sender. Required argument,\n");
    printf(" mutually exclusive\n");
    printf("-a address Cluster address to listen/send on,\n");
    printf(" overrides the default.\n");
    printf("-p port port to listen/send on,\n");
    printf(" overrides the default of 12.\n");
    printf("-t ttl Time-To-Live to send,\n");
    printf(" overrides the default of 1.\n");
    printf("-v Verbose mode\n");
    exit(1);
}
#define SENDER 's'              /* socksimple sender identifier */
#define RECEIVER 'r'            /* socksimple receiver identifier */
#define PKT_END "lwrwashere"    /* marker written at the end of the payload */

/* socksimple packet structure */
struct socksimple_struct
{
    unsigned short version_major;
    unsigned short version_minor;
    unsigned char type;
    unsigned char ttl;
    clustid_t src_host;
    clustid_t dest_host;
    unsigned int seq_no;
    pid_t pid;
    struct timeval tv;
};

struct socksimple_payload
{
    struct socksimple_struct socksimple_packet;
    char payload[MAX_BUF_LEN];
} socksimple_payload;

/* pointer to socksimple packet buffer */
struct socksimple_payload *rcvd_pkt;

int sock;                           /* socket descriptor */
pid_t pid;                          /* pid of socksimple program */
struct sockaddr_clust dst_addr;     /* socket address structure */
struct sockaddr_clust src_addr;     /* socket address structure */
struct in_addr localIP;

/* counters and statistics variables */
int packets_sent = 0;
int packets_rcvd = 0;
double rtt_total = 0;
double rtt_max = 0;
double rtt_min = 999999999.0;
/* default command-line arguments */
char arg_addr_str[16] = "1";
int arg_port = 12;
unsigned char arg_ttl = 1;
int verbose=0;

/* function prototypes */
void init_socket();
void get_local_host_info();
void send_socksimple(int);
void send_packet(struct socksimple_payload *payload,
                 struct sockaddr_clust *target, int len);
void sender_listen_loop();
void receiver_listen_loop();
void subtract_timeval(struct timeval *val, const struct timeval *sub);
double timeval_to_ms(const struct timeval *val);
int process_socksimple_packet(char *packet, int recv_len, unsigned char type);
void clean_exit();
void usage();
#define CLUSTPCB_REF(rp) { \
    fetch_and_add(&((rp)->rclust_refcnt), 1); \
}
#define CLUSTPCB_UNREF(rp) { \
    fetch_and_add(&((rp)->rclust_refcnt), -1); \
}
#endif /* _H_CLUST_VAR */
Notices
This information was developed for products and services offered in the U.S.A. IBM may not offer the products, services, or features discussed in this document in other countries. Consult your local IBM representative for information on the products and services currently available in your area. Any reference to an IBM product, program, or service is not intended to state or imply that only that IBM product, program, or service may be used. Any functionally equivalent product, program, or service that does not infringe any IBM intellectual property right may be used instead. However, it is the user's responsibility to evaluate and verify the operation of any non-IBM product, program, or service. IBM may have patents or pending patent applications covering subject matter described in this document. The furnishing of this document does not give you any license to these patents. You can send license inquiries, in writing, to: IBM Director of Licensing IBM Corporation North Castle Drive Armonk, NY 10504-1785 U.S.A. For license inquiries regarding double-byte character set (DBCS) information, contact the IBM Intellectual Property Department in your country or send inquiries, in writing, to: Intellectual Property Licensing Legal and Intellectual Property Law IBM Japan, Ltd. 1623-14, Shimotsuruma, Yamato-shi Kanagawa 242-8502 Japan The following paragraph does not apply to the United Kingdom or any other country where such provisions are inconsistent with local law: INTERNATIONAL BUSINESS MACHINES CORPORATION PROVIDES THIS PUBLICATION "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF NON-INFRINGEMENT, MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Some states do not allow disclaimer of express or implied warranties in certain transactions, therefore, this statement may not apply to you. This information could include technical inaccuracies or typographical errors. 
Changes are periodically made to the information herein; these changes will be incorporated in new editions of the publication. IBM may make improvements and/or changes in the product(s) and/or the program(s) described in this publication at any time without notice.

Any references in this information to non-IBM websites are provided for convenience only and do not in any manner serve as an endorsement of those websites. The materials at those websites are not part of the materials for this IBM product and use of those websites is at your own risk.

IBM may use or distribute any of the information you supply in any way it believes appropriate without incurring any obligation to you.

Licensees of this program who wish to have information about it for the purpose of enabling: (i) the exchange of information between independently created programs and other programs (including this
one) and (ii) the mutual use of the information which has been exchanged, should contact:

IBM Corporation
Dept. LRAS/Bldg. 903
11501 Burnet Road
Austin, TX 78758-3400
U.S.A.

Such information may be available, subject to appropriate terms and conditions, including in some cases, payment of a fee.

The licensed program described in this document and all licensed material available for it are provided by IBM under terms of the IBM Customer Agreement, IBM International Program License Agreement or any equivalent agreement between us.

Any performance data contained herein was determined in a controlled environment. Therefore, the results obtained in other operating environments may vary significantly. Some measurements may have been made on development-level systems and there is no guarantee that these measurements will be the same on generally available systems. Furthermore, some measurements may have been estimated through extrapolation. Actual results may vary. Users of this document should verify the applicable data for their specific environment.

Information concerning non-IBM products was obtained from the suppliers of those products, their published announcements or other publicly available sources. IBM has not tested those products and cannot confirm the accuracy of performance, compatibility or any other claims related to non-IBM products. Questions on the capabilities of non-IBM products should be addressed to the suppliers of those products.

All statements regarding IBM's future direction or intent are subject to change or withdrawal without notice, and represent goals and objectives only.

All IBM prices shown are IBM's suggested retail prices, are current and are subject to change without notice. Dealer prices may vary.

This information is for planning purposes only. The information herein is subject to change before the products described become available.

This information contains examples of data and reports used in daily business operations.
To illustrate them as completely as possible, the examples include the names of individuals, companies, brands, and products. All of these names are fictitious and any similarity to the names and addresses used by an actual business enterprise is entirely coincidental.

COPYRIGHT LICENSE:

This information contains sample application programs in source language, which illustrate programming techniques on various operating platforms. You may copy, modify, and distribute these sample programs in any form without payment to IBM, for the purposes of developing, using, marketing or distributing application programs conforming to the application programming interface for the operating platform for which the sample programs are written. These examples have not been thoroughly tested under all conditions. IBM, therefore, cannot guarantee or imply reliability, serviceability, or function of these programs. The sample programs are provided "AS IS", without warranty of any kind. IBM shall not be liable for any damages arising out of your use of the sample programs.

Each copy or any portion of these sample programs or any derivative work, must include a copyright notice as follows:
(your company name) (year). Portions of this code are derived from IBM Corp. Sample Programs. Copyright IBM Corp. _enter the year or years_.

If you are viewing this information softcopy, the photographs and color illustrations may not appear.
Trademarks
IBM, the IBM logo, and ibm.com are trademarks or registered trademarks of International Business Machines Corp., registered in many jurisdictions worldwide. Other product and service names might be trademarks of IBM or other companies. A current list of IBM trademarks is available on the Web under "Copyright and trademark information" at www.ibm.com/legal/copytrade.shtml.

Adobe, the Adobe logo, PostScript, and the PostScript logo are either registered trademarks or trademarks of Adobe Systems Incorporated in the United States, and/or other countries.

Other company, product, or service names may be trademarks or service marks of others.
Printed in USA
SC23-6779-00