Schema Thatdefines The Operations Database
Schema Thatdefines The Operations Database
A deep dive into the key relationships of the schema that defines the Operations Database
Len Greenwood (len.greenwood@uk.ibm.com) DataStage Core Architect IBM Arron Harden (arron.harden@uk.ibm.com) Senior Software Engineer IBM Geoff McClean (geoff.mcclean@uk.ibm.com) Senior Software Developer IBM Sumit Kumar (sumit.kumar6@in.ibm.com) Senior Software Developer IBM This article is a deep dive into the schema of the IBM InfoSphere DataStage and QualityStage Operations Database, and the tables and columns that make up its key relationships. Specimen SQL queries are included to demonstrate how data can be read from these tables to answer specific operational questions. You can adapt these to build, for example, custom reports based on the operational data collected at your particular DataStage and QualityStage installation. View more content in this series Skill Level: Intermediate Date: 17 May 2012
Introduction
The DataStage and QualityStage Operations Database (DSODB) is designed as a relational schema whose tables contain information about job runs and system resources used on a system that has a DataStage engine installed. This information
Copyright IBM Corporation 2012 Get started with the IBM InfoSphere DataStage and QualityStage Operations Console Database, Part 1: An introduction Trademarks Page 1 of 24
developerWorks
ibm.com/developerWorks/
is used to drive the Operations Console, but can also be queried directly given some knowledge of the key columns of the tables and the relationships among them. This article describes the main tables and views of the DSODB, including all columns used as primary or natural keys, and all foreign key relationships defined. It gives examples of SQL SELECT statements, which can be used to answer specific questions based on the information held by DSODB. Online reference material for DSODB 8.7, including descriptions of all current tables and columns, can be found in the schema document reference in the Resources section. Table diagrams Some pictures are included to illustrate the relationships between various tables. Each box represents a table or view. Relevant column names are shown, with icons against each column to show their use as follows (Figure 1). Figure 1. Key to types of column
Note that only the columns of each table that are mentioned in the text of this article are shown. For a complete list of table columns, consult the schema document reference in the Resources section. The arrows between tables show the direction of relationships established by foreign keys and SQL views, as shown next (Figure 2).
Get started with the IBM InfoSphere DataStage and QualityStage Operations Console Database, Part 1: An introduction
Page 2 of 24
ibm.com/developerWorks/
developerWorks
Each row in the HOST table represents a system on which all or part of an InfoSphere Information Server engine is installed. The HOSTID column is a surrogate primary key; The HostName column is the system name as derived from the hostname operating system command or the HA_ALIAS environment variable, if that is set in the environment of the EngMonApp process when it starts. For systems acting as a conductor node and on which an instance of the EngMonApp process is running, the InstallationDir column is set to a path name. This is the home directory for the Information Server engine installation, such as / opt/IBM/InformationServer/Server/DSEngine. The key of the HOSTS table is the combination of HostName and InstallationDir, which allows for more than one engine to be installed on the same host system.
Get started with the IBM InfoSphere DataStage and QualityStage Operations Console Database, Part 1: An introduction Page 3 of 24
developerWorks
ibm.com/developerWorks/
The CreatedTimestamp column is the UTC time at which EngMonApp inserted the row. Each time a ResMonApp instance starts, it looks to see if there is a row in the table with the values of HostName and InstallationDir it requires, and if so, it updates the MonStartTimestamp column; otherwise, it creates a new row. ResMonApp also creates new HOST entries, as required. Remote nodes have InstallationDir set to a single hyphen character, '-'. These are systems whose HostNames have been defined in the DSODBConfig.cfg file via the ResourceNode property. These rows have a CreatedTimestamp, but no MonStartTimestamp entry. Cascading delete constraints are set so that deleting a row in HOST removes all rows that point back to it. Therefore, records of any jobs that ran on that system and all resource usage records are deleted at the same time.
The JOBEXEC table Every time a new instance of a DataStage and QualityStage job run starts, it creates an initial event describing which version of a job executable has started, when, and what its invocation ID string is in the case of a multi-instanceable job. The JOBEXEC
Get started with the IBM InfoSphere DataStage and QualityStage Operations Console Database, Part 1: An introduction Page 4 of 24
ibm.com/developerWorks/
developerWorks
table holds details about each unique job executable for which a run has been seen. Note that this might not be the same list as would be found by querying the Metadata Repository; a job therein might never have been run, or it might no longer exist if it has been deleted from the repository since it last ran. Rows have a surrogate key JOBID used to relate all runs and other details of this particular version of the job. Job names are unique within a DataStage project, and project names are unique for a given host. Whenever a job is compiled, the job's executable has potentially been changed, so one can distinguish between versions of a job's executable by including the compilation time. Hence, the unique primary key for the table is a combination of the ProjectName, JobName, and CompilationTimestamp columns, combined with the HOSTID as a foreign key to the HOST table to identify the system where the project is located. These rows are only created once. Each run of a job either finds an existing row put there by a previous run of the same version of the job or creates a new one. Listing 1. SQL Example: Querying the JOBEXEC table
-- List names and locations of all job sequences that have -ever been run on host H. SELECT X.ProjectName, X.FolderPath, X.JobName, X.CompilationTimestamp FROM DSODB.JOBEXEC AS X JOIN DSODB.HOST AS H ON X.HOSTID = H.HOSTID WHERE H.HostName = "H" AND X.JobType = "SEQ" ORDER BY X.ProjectName, X.FolderPath, X.JobName, X.CompilationTimestamp
Cascading delete constraints are set so that deleting an entry in JOBEXEC removes all records to do with any runs of that version of the job. Listing 2. SQL Example: Deleting from the JOBEXEC table
-- Delete all monitored information for all jobs that have -been run from project P on host H. DELETE FROM DSODB.JOBEXEC WHERE JOBID IN ( SELECT X.JOBID FROM DSODB.JOBEXEC AS X JOIN DSODB.HOST AS H ON X.HOSTID = H.HOSTID WHERE H.HostName = "H" AND X.ProjectName = "P" )
The JOBRUN table One row is inserted in this table for each monitored run of a particular version of a job. It is updated as the job run progresses. The RUNID column is a surrogate
Get started with the IBM InfoSphere DataStage and QualityStage Operations Console Database, Part 1: An introduction Page 5 of 24
developerWorks
ibm.com/developerWorks/
primary key, used to relate run details from other tables (such as job parameters and log messages). The JOBID column is a foreign key to the row in the JOBEXEC table that contains the description of the exact version of the job executable that is running which host is started on, in which project, the job's name, and its compilation time. The other columns that form the unique primary key are InvocationId and CreationTimestamp. The former is only relevant for jobs with the multi-instanceable property set; this allows more than one instance of a job to be running at once, as long as each instance has a different invocation ID, which is an arbitrary string assigned by the user at runtime. Note that if a run does not have an invocation ID string, the value is set to a single hyphen. The CreationTimestamp is the UTC time when the monitoring system created the first run event. Note that this might be different from the RunStartTimestamp, in the case of a run that gets queued. The RunStartTimestamp is the UTC time that the engine really launched the run, as opposed to when it was first submitted for running. If a run is started from a job sequence, or via job control from another job, the CONTROLLING_RUNID column contains a foreign key to another entry in JOBRUN, which is the run that started this one. Therefore, all runs started by a particular instance of a sequence can be related by their common CONTROLLING_RUNID, and these potentially form a tree-structured set of relationships if job sequences are nested. As a run progresses, it generates other events when its status changes or when it produces new values for some of its monitored properties. These cause its row in JOBRUN to be updated, and the LastUpdateTimestamp column records the UTC time that last happened. A note about run statuses: there are two columns in the JobRun table that hold status values: RunMajor Status and RunMinorStatus. The major status can be used to quickly filter those runs that have finished from those that are starting up or still running. The minor status gives the specific status for the run (see the schema document reference in the Resources section for all possible values). Listing 3. SQL Example: Querying the JOBRUN table - 1
-- List start and finish times, names, and statuses of all jobs that have -been run on host H, started after time YYYY-MM-DD HH:MM:SS, -and have finished. SELECT X.ProjectName, X.JobName, R.RunStartTimestamp, R.RunEndTimestamp, R.RunMinorStatus FROM DSODB.JOBRUN AS R JOIN DSODB.JOBEXEC AS X ON R.JOBID = X.JOBID JOIN DSODB.HOST AS H ON X.HOSTID = H.HOSTID WHERE H.HostName = "H" AND R.RunMajorStatus = "FIN" AND R.RunStartTimestamp >= "YYYY-MM-DD HH:MM:SS" ORDER BY R.RunStartTimestamp
Get started with the IBM InfoSphere DataStage and QualityStage Operations Console Database, Part 1: An introduction
Page 6 of 24
ibm.com/developerWorks/
developerWorks
Cascading delete constraints are set so that deleting an entry in JOBRUN deletes all related details of the run (parameters, logs, metrics, etc.). If the run started other runs (if it was a job sequence, for example), all those runs are also deleted, along with their details. Listing 5. SQL Example: Deleting from the JOBRUN table
-- Delete all job runs from project P on host H -that finished normally. DELETE FROM DSODB.JOBRUN WHERE RUNID IN ( SELECT R.RUNID FROM DSODB.JOBRUN AS R JOIN DSODB.JOBEXEC AS X ON R.JOBID = X.JOBID JOIN DSODB.HOST AS H ON X.HOSTID = H.HOSTID WHERE H.HostName = "H" AND X.ProjectName = "P" AND R.RunMinorStatus = '"FOK" )
The JOBRUNPARAMS table and JOBRUNPARAMSVIEW view There is one row in the JOBRUNPARAMS table to describe the values of the job parameters that have been set for a particular run. These are the values you would see in a message at the beginning of the run's log in the Director client. The RUNID column of this table is a foreign key to the JOBRUN table for the run to which these parameters relate. The ParamList column contains the information as an XML string; you can easily extract the names and values for the parameters by querying through the JOBRUNPARAMSVIEW view. This expands the XML into separate rows for each individual ParamName and ParamValue, with RUNID as the same foreign key.
Get started with the IBM InfoSphere DataStage and QualityStage Operations Console Database, Part 1: An introduction Page 7 of 24
developerWorks
ibm.com/developerWorks/
Constraints are set so that deleting an entry in JOBRUN deletes any corresponding entry in JOBRUNPARAMS. The JOBRUNLOG table As a job runs, certain of its log messages are captured and stored in the JOBRUNLOG table, where each row's RUNID column is a foreign key back to a RUNID in the JOBRUN table. Exactly which messages are captured depends on how the DSODBConfig.cfg file is configured, but in general this is a small subset of the overall jog logs, and contains only the job's initial and final messages, fatal messages, and the first N warnings. The EventId column is an integer that can be used to get messages in the order emitted by the run, since the LogTimestamp column is the UTC time to the nearest second and might not be unique. Listing 7. SQL Example: Querying the JOBRUNLOG table
-- List job details and message info for all runs on host H that have -emitted fatal messages since time YYYY-MM-DD HH:MM:SS. SELECT X.ProjectName, X.JobName, R.RunStartTimestamp, L.LogTimestamp, L.MessageId, L.MessageText FROM DSODB.JOBRUN AS R JOIN DSODB.JOBRUNLOG AS L ON R.RUNID = L.RUNID JOIN DSODB.JOBEXEC AS X ON R.JOBID = X.JOBID JOIN DSODB.HOST AS H ON X.HOSTID = H.HOSTID WHERE H.HostName = "H" AND L.LogType = "FAT" AND R.RunStartTimestamp > "YYYY-MM-DD HH:MM:SS" ORDER BY R.RunStartTimestamp, R.RUNID, L.EventID
Constraints are set so that deleting an entry in JOBRUN deletes all corresponding entries in JOBRUNLOG. The PARALLELCONFIG table and PARALLELCONFIGNODES view Parallel job runs always specify a path to a file containing configuration information for the parallel engine (via the APT_CONFIG_FILE environment variable). The path
Get started with the IBM InfoSphere DataStage and QualityStage Operations Console Database, Part 1: An introduction Page 8 of 24
ibm.com/developerWorks/
developerWorks
is recorded in the ConfigFileName column of the JOBRUN table. However, that path might point to a temporary file, or the contents of the file my have changed since the job ran. So the monitoring system also reads and parses the file and stores information about how many nodes the run was configured for in the PARALLELCONFIG table. Since many runs can use the same configuration file, or at least files whose parallelization instructions are equivalent, the schema saves space by making all runs with equivalent configurations point to the same row. So the CONFIGID column in this table is a surrogate key and the CONFIGID column on the JOBRUN table is a foreign key to it. The PARALLELCONFIG table also has HOSTID as a foreign key, so configurations are partitioned by host that the run started on. The list of physical and logical nodes in the configuration is held as an XML column. Use the PARALLELCONFIGNODES view to query the physical name of each node in the configuration and the number of logical nodes assigned to each. (Note that the NodeListHash column is used simply to find whether an existing row with the same configuration exists, since the XML column cannot be used directly in a lookup.) Listing 8. SQL Example: Querying the PARALLELCONFIGNODES view
-- Get the names of all job runs started on host H that used a node named N -and ran between times YYYY-MM-D1 HH:MM:SS and YYYY-MM-D2 HH:MM:SS. SELECT R.RunStartTimestamp, X.ProjectName, X.JobName FROM DSODB.JOBRUN AS R JOIN DSODB.PARALLELCONFIGNODES AS N ON R.CONFIGID = N.CONFIGID JOIN DSODB.JOBEXEC AS X ON R.JOBID = X.JOBID JOIN DSODB.HOST AS H ON X.HOSTID = H.HOSTID WHERE H.HostName = 'H' AND N.PhysicalName = "N" AND R.RunStartTimestamp >= "YYYY-MM-D1 HH:MM:SS" AND R.RunEndTimestamp <= "YYYY-MM-D2 HH:MM:SS" ORDER BY R.RunStartTimestamp
Constraints are set so that deleting a row in HOST deletes all PARALLELCONFIG entries generated by runs on that host (and all the runs are deleted at the same time). However, deleting entries in JOBRUN does not delete entries in PARALLELCONFIG; they might be linked to by a later run of another job on that host. Listing 9. SQL Example: Deleting from the PARALLELCONFIG table
-- Delete all rows in PARALLELCONFIG that are not referenced -by any row in JOBRUN. DELETE FROM DSODB.PARALLELCONFIG WHERE NOT EXISTS ( SELECT * FROM DSODB.JOBRUN AS R JOIN DSODB.PARALLELCONFIG AS C ON R.CONFIGID = C.CONFIGID )
Get started with the IBM InfoSphere DataStage and QualityStage Operations Console Database, Part 1: An introduction
Page 9 of 24
developerWorks
ibm.com/developerWorks/
The JOBRUNUSAGE table and JOBRUNTOTALROWSUSAGE view During the course of a server or parallel job run, the number of rows read and written by its source and target stages (if any) is collected on a timed basis. So a snapshot of the volume of data processed by the run is built up at intervals, and can be used to graph its progress over time. Each row in JOBRUNUSAGE contains an XML structure containing a set of snapshots taken at increasing elapsed times since the start of the job. The set is identified by the StartTimestamp column, which gives the UTC time at which the set of snapshots starts, and the RUNID column, which is a foreign key to the JOBRUN table to say which run this relates to. These rows should be queried through the JOBRUNTOTALROWSUSAGE view. This expands the snapshot sets into individual snapshots that give the elapsed time since the start of the run, and the values of TotalRowsConsumed and TotalRowsProduced at that point in the run.
Get started with the IBM InfoSphere DataStage and QualityStage Operations Console Database, Part 1: An introduction
Page 10 of 24
ibm.com/developerWorks/
developerWorks
Get started with the IBM InfoSphere DataStage and QualityStage Operations Console Database, Part 1: An introduction
Page 11 of 24
developerWorks
ibm.com/developerWorks/
Figure 6. Key columns and relationships between tables related to JOBRUNSTAGE and JOBRUNLINK
The JOBSTAGE table A row appears in this table for each stage to be reported on for a job run. It is created if it does not exist and relates back to the JOBEXEC entry for the current run via its JOBID foreign key. The entry contains static information about the stage, such as name, description and stage type. Note that since stage names are only unique within the top level of the job, or a container if they are at a lower level, there is a ContainerPath column that might need to be used to fully identify the stage as well as the StageName column. Each row is given a surrogate key so it can be joined to its links. The JOBLINK table This contains rows for each link determined to be a source or target link for an overall job that has been run. It uses foreign key FROMSTAGEID and TOSTAGEID to point back to the row in JOBSTAGE table that describes the stage it connects to
Get started with the IBM InfoSphere DataStage and QualityStage Operations Console Database, Part 1: An introduction Page 12 of 24
ibm.com/developerWorks/
developerWorks
in its role as a source or target. This depends on the setting of the IsSource and IsTarget columns; in general, only one is set and only one of the foreign keys is correspondingly related. The LinkName is unique, in combination with FROMSTAGEID/TOSTAGEID, since a stage cannot have two links with the same name. The JOBRUNSTAGE table Every run that is monitored in this mode generates one row per stage, to record, among other things, the number of CPU milliseconds that can be ascribed to the stage. A parallel job might be running each stage in a partitioned mode, which involves more than one process. Therefore, the NumInstances column might be greater than 1, and the InstanceCPUList might be a comma-separated string of numbers rather than a single value. But it is easier to select the TotalCPU column, which gives the overall value by summing entries in the list, if necessary. Listing 11. SQL Example: Querying the JOBRUNSTAGE table
-- List the stages of jobs run on host H in descending order of CPU used. SELECT X.ProjectName, X.JobName, R.RunStartTimestamp, JS.ContainerPath, JS.StageName, RS.TotalCPU FROM DSODB.JOBRUN AS R JOIN DSODB.JOBEXEC AS X ON R.JOBID = X.JOBID JOIN DSODB.HOST AS H ON X.HOSTID = H.HOSTID JOIN DSODB.JOBSTAGE AS JS ON X.JOBID = JS.JOBID JOIN DSODB.JOBRUNSTAGE AS RS ON JS.STAGEID = RS.STAGEID WHERE H.HostName = "H" ORDER BY RS.TotalCPU DESC
The JOBRUNLINK table Every run of a job that has source or target links also generates one row per such link. This records the number of rows that passed down the link (these rows contribute to the "consumed" or "produced" total row counts if the link is marked as a source or target, respectively). As for stages, parallelization is recorded via a commaseparated list of row counts per partition, where relevant, and the TotalRows column is always the sum over all partitions.
Get started with the IBM InfoSphere DataStage and QualityStage Operations Console Database, Part 1: An introduction
Page 13 of 24
developerWorks
ibm.com/developerWorks/
The DATALOCATOR table A link can also record "locator" information. This involves providing eight strings that identify the location of an external data resource. So for a sequential file stage, these strings should identify a file-system path name; for a database table, this should be a database type, name, schema, and table name. Exact details depend on the stage type. For each unique set of identifiers, there is a separate row in the DATALOCATOR table, with a surrogate key in the LOCATORID column. Each link that has such information attempts to see if a row with the same identifiers has been created and creates it if not. The LOCATORID column in the JOBRUNLINK table is then a foreign key to that row. Table 1. The DATALOCATOR columns are used as follows
Column name ComputerName SoftwareProductName DataStoreSubClass DataStoreName DataSchemaSubClass DataSchemaName DataCollectionSubClass DataCollectionName Usage Name of system that hosts the resource Name of product that manages the resource Overall type of the data store Overall name of database or path to top level of resource Type of next level of resource if applicable Name of next level of resource if applicable Type of lowest level of resource Name of lowest level of resource, such as table or file
Get started with the IBM InfoSphere DataStage and QualityStage Operations Console Database, Part 1: An introduction
Page 14 of 24
ibm.com/developerWorks/
developerWorks
DSODB.JOBSTAGE AS JS ON X.JOBID = JS.JOBID DSODB.JOBRUNSTAGE AS RS ON JS.STAGEID = RS.STAGEID DSODB.JOBLINK AS JL ON (JS.STAGEID = JL.FROMSTAGEID OR JS.STAGEID = JL.TOSTAGEID) DSODB.JOBRUNLINK AS RL ON R.RUNID = RL.RUNID DSODB.DATALOCATOR AS DL ON RL.LOCATORID = DL.LOCATORID
H.HostName = "H" AND JL.IsTarget = 1 AND DL.DataStoreName = "D" ORDER BY TableName, RL.TotalRows DESC, R.RunStartTimestamp
The HOSTDETAIL table Every row in the HOST table should have at least one row from the HOSTDETAIL table related to it. These rows contain information about the operating system properties that were in force at some time when an instance of the ResMonApp process started. The CreatedTimestamp column is the UTC time when the particular set of properties was first inserted. The LastCheckedTimestamp is the UTC time when an instance of ResMonApp last determined that the properties of that particular system had not changed since the previous check (i.e., none of the values in any of the other columns needed to be changed). If any column would require a different value, a new row is inserted, rather than the exiting row being updated. The new row has its CreatedTimestamp and LastCheckedTimestamp set to "now." The rows form a timeline that can be used to track changes to the operating system properties of a particular node. The entries in HOSTDETAIL are related to the HOST table by two foreign keys. HOSTID is the primary key of the HOST row that identifies which host name this set of detail records describes. HEAD_HOSTID is the primary key of the HOST row that identifies the host on which the instance of ResMonApp that inserted the detail record is running. In the case of a conductor node, HOSTID and HEAD_HOSTID are the same; for a remote node, they are different. In the latter case, the detail record is recording the properties of a remote node as seen from one particular conductor
Get started with the IBM InfoSphere DataStage and QualityStage Operations Console Database, Part 1: An introduction Page 15 of 24
developerWorks
ibm.com/developerWorks/
node. Note that the unique primary key for this table is the combination of HOSTID, HEAD_HOSTID, and CreatedTimestamp, since any change to columns other than LastCheckedTimestamp results in a new row being inserted, rather than this row being updated. Listing 14. SQL Example: Querying the HOSTDETAIL table 1
-- List operating system name and version for host H as they were at time -YYYY-MM-DD HH:MM:SS (this means finding the row whose CreatedTimestamp -is nearest to that time). SELECT Max(HD.CreatedTimestamp) AS "Last Changed", HD.PlatformName, HD.PlatformVersion FROM DSODB.HOSTDETAIL AS HD JOIN DSODB.HOST AS H ON HD.HEAD_HOSTID = H.HOSTID WHERE H.HostName = "H" AND HD.CreatedTimestamp <= "YYYY-MM-DD HH:MM:SS" GROUP BY HD.PlatformName, HD.PlatformVersion
Get started with the IBM InfoSphere DataStage and QualityStage Operations Console Database, Part 1: An introduction
Page 16 of 24
ibm.com/developerWorks/
developerWorks
Figure 8. RESOURCESNAP and RESOURCEUSAGE tables; key columns and related views
The RESOURCESNAP table This table only contains one row per host that is being monitored from a particular engine. Its HOSTID column is a foreign key to an entry in the HOST table to identify which system's resources are described. The HEAD_HOSTID column in RESOURCESNAP is also defined as a foreign key that identifies which engine system inserted the row. In the case of a conductor node, HOSTID, and HEAD_HOSTID are the same; for a remote node, they are different. On each engine (corresponding to a HEAD_HOSTID value), for each system it is monitoring including itself (the HOSTID values), a row is updated at intervals containing all the system resource usage information being collected for that system. The LastUpdateTimestamp column tells you when the last update took place. The content of RESOURCESNAP should always be queried through either of the views RESOURCESNAPSYSTEM or RESOURCESNAPDISKS. For the full set of columns available via these views, please see the schema document reference in the Resources section. The RESOURCESNAPSYSTEM view This view presents several columns that hold various values for CPU, free memory, process counts, and paging. The names of the columns are used to group them counters. So CPUPctxxxx refers to percentage of CPU used; MemFreeKBxxxx refers to free memory in kilobytes; ProcNumxxxx refers to number of processes; PageNumxxxx refers to number of paging events.
Get started with the IBM InfoSphere DataStage and QualityStage Operations Console Database, Part 1: An introduction Page 17 of 24
developerWorks
ibm.com/developerWorks/
For example, CPUPctUser is the percentage of the CPU assigned to user processes in the interval leading up to LastUpdateTimestamp; MemFreeKBPhysical is the number of kilobytes of physical memory that were free in that period. The RESOURCESNAPDISKS view This view expands a repeating field in RESOURCESNAP, which is controlled by the number of disk paths, if any, that have been entered in the DSODBConfig.cfg configuration file (properties ResourceLocalFS and ResourceRemoteFS, which can be repeated). The DiskPathMonitored column identifies each file system path, and the DiskTotalKB and DiskFreeKB columns give the total number of kilobytes and number currently unused for the disk mounted on that path. The RESOURCEUSAGE table This contains counters derived from those in the RESOURCESNAP table, aggregated over intervals and arranged as a timeline. As an entry in RESOURCESNAP is updated, its values are used to calculate the latest maximum, minimum, and average numbers over the most recent time period, then at intervals a row is inserted to describe the behavior over that period. Using the default configuration settings the RESOURCESNAP table is updated every 10 seconds, and every 60 seconds, a new row is inserted in RESOURCEUSAGE containing the maximum, minimum, and averages of the last six updates for that combination of HOSTID and HEAD_HOSTID. The StartTimestamp column of this table can be used to sort the entries into time order. The RESOURCEUSAGESYSTEM view For each column of RESOURCESNAP that holds system counters, there are three columns, with the suffixes Avg, Max, and Min. So CPUPctUserAvg is the average value derived form the CPUPctUser column, and MemFreeKBPhysicalMin is the minimum value of the MemFreeKBPhysical column over the period as recorded between StartTimestamp and EndTimestamp. Listing 16. SQL Example: Querying the RESOURCEUSAGESYSTEM view
-- List average user CPU percentage usage for engine system H -between times YYYY-MM-D1 HH:MM:SS and YYYY-MM-D2 HH:MM:SS. SELECT RS.StartTimestamp, RS.CPUPctUserAvg FROM DSODB.RESOURCEUSAGESYSTEM AS RS JOIN DSODB.HOST AS H ON RS.HEAD_HOSTID = H.HOSTID WHERE H.HostName= "H" AND RS.StartTimeStamp >= "YYYY-MM-D1 HH:MM:SS" AND RS.EndTimeStamp <= "YYYY-MM-D2 HH:MM:SS"
The RESOURCEUSAGEDISKS view This allows you to query the average, minimum, and maximum number of free kilobytes for each monitored disk path over each monitored period.
Get started with the IBM InfoSphere DataStage and QualityStage Operations Console Database, Part 1: An introduction Page 18 of 24
ibm.com/developerWorks/
developerWorks
The MASTERREF table This contains a row for each distinct value of each enumerated type. The combination of the Enumeration and Code columns forms the primary key that can be used by the following views to look up a specific code value. The Name and Description columns then give you correspondingly longer strings that can be used in reports instead of the code value itself. The xxxxREF Views Each view corresponds to a table and column that contains a code, as shown in Table 2.
Get started with the IBM InfoSphere DataStage and QualityStage Operations Console Database, Part 1: An introduction Page 19 of 24
developerWorks
ibm.com/developerWorks/
For example, if a row of the JOBRUN table contains "FIN" in the RunMajorStatus column, this can be looked up via the RUNMAJORSTATUSREF view as Name = "Finished" and Description = "Run has finished". A RunMinorStatus of "FWF" can be looked up via RUNMINORSTATUSREF as Name = "Finished aborted" and Description = "Run has finished and logged at least one fatal message". Listing 18. SQL Example: Querying the JOBRUN table using lookups against reference views
-- List the job names , job types, run types, and minor status names -of all runs on host H that have finished. SELECT X.ProjectName, X.JobName, JT.JobTypeName, RT.RunTypeName, R.RunStartTimestamp, R.RunEndTimestamp, SR.MinorStatusName FROM DSODB.JOBRUN AS R JOIN DSODB.JOBEXEC AS X ON R.JOBID = X.JOBID JOIN DSODB.HOST AS H ON X.HOSTID = H.HOSTID JOIN DSODB.JOBTYPEREF AS JT ON X.JobType = JT.JobTypeCode JOIN DSODB.RUNTYPEREF AS RT ON R.RunType = RT.RunTypeCode JOIN DSODB.RUNMINORSTATUSREF AS SR ON R.RunMinorStatus = SR.MinorStatusCode WHERE H.HostName = "H" AND R.RunMajorStatus = "FIN" ORDER BY R.RunStartTimestamp
Conclusion
We have examined how the tables that make up the Operations Database schema contain rows that describe job runs, host systems, and system resource usage. We have described the identifying key fields for the main tables, some of the other important fields, and how foreign key fields are used to join other tables to them. We have given several examples of real-world questions that can be answered by appropriate use of these fields.
Acknowledgements
Thanks go to the members of the DataStage Operations Console development team who ensured that the schema got ported to the various flavors of database
Get started with the IBM InfoSphere DataStage and QualityStage Operations Console Database, Part 1: An introduction Page 20 of 24
ibm.com/developerWorks/
developerWorks
supported: DB2 LUW 9.5 and 9.7, Oracle 10g and 11gR2, Microsoft SQL Server 2005 and 2008, on the various Windows and UNIX platforms. Thanks also to the QA team who patiently checked that data could indeed be inserted and retrieved correctly from all those variants.
Get started with the IBM InfoSphere DataStage and QualityStage Operations Console Database, Part 1: An introduction
Page 21 of 24
developerWorks
ibm.com/developerWorks/
Resources
Learn The operations database schema is documented on the IBM InfoCenter. IBM InfoSphere Information Server 8.7 What's New whitepaper, containing details of the Operations Console. IBM DB2 9.7 for Linux, UNIX, and Windows Information Center and SQL reference. Learn more about Information Management at the developerWorks Information Management zone. Find technical documentation, how-to articles, education, downloads, product information, and more. Stay current with developerWorks technical events and webcasts. Follow developerWorks on Twitter. Get products and technologies Build your next development project with IBM trial software, available for download directly from developerWorks. Now you can use DB2 for free. Download DB2 Express-C, a no-charge version of DB2 Express Edition for the community that offers the same core data features as DB2 Express Edition and provides a solid base to build and deploy applications. Discuss Check out the developerWorks blogs and get involved in the developerWorks community.
Get started with the IBM InfoSphere DataStage and QualityStage Operations Console Database, Part 1: An introduction
Page 22 of 24
ibm.com/developerWorks/
developerWorks
Arron Harden Arron Harden is a senior software engineer for IBM InfoSphere DataStage and QualityStage. Staying with the DataStage product after several mergers and acquisitions, he has worked on DataStage for more than 12 years, joining IBM through the acquisition of Ascential Software Inc in 2005. Having spent a year working in Boston, he is currently based in the United Kingdom, working at the IBM Milton Keynes office. In his most recent role, he was the lead developer for the web application component of the DataStage and QualityStage Operations Console, written using the Dojo toolkit.
Geoff McClean Geoff McClean was on the original DataStage development team at its inception and is currently a senior software developer for core components of the IBM InfoSphere DataStage and QualityStage development and production tools, part of the IBM InfoSphere Information Server suite. He oversaw the implementation of the database management, event handling, and resource tracking services of the IBM InfoSphere DataStage and QualityStage Operations Console.
Get started with the IBM InfoSphere DataStage and QualityStage Operations Console Database, Part 1: An introduction
Page 23 of 24
ibm.com/developerWorks/
Sumit Kumar has 13 years of industry experience, including the financial, banking, telecom, supply-chain management, insurance, and healthcare domains. He has worked with IBM InfoSphere Information Server from early 2010, including the Operations Console feature of Information Server, from the beginning of its design and development phase as a key contributor. He was involved in the implementation of most of the public APIs and service layers of the Operations Console. Copyright IBM Corporation 2012 (www.ibm.com/legal/copytrade.shtml) Trademarks (www.ibm.com/developerworks/ibm/trademarks/)
Get started with the IBM InfoSphere DataStage and QualityStage Operations Console Database, Part 1: An introduction
Page 24 of 24