SQLBase Performance - Gupta Technologies
By Raj Parmar
Sales Engineer
September 2004
Introduction
We will concentrate on how the data is stored and organized, what files
are used to ensure database consistency and what techniques are used
to improve overall database performance in a multi-user environment.
We will also look at improving performance through the use of compiled
queries held on the database plus efficient use of message buffers.
SQLBase runs on Windows and Linux. Let's start the discussion with the attributes of the operating system that can help us achieve better performance.
First, if the machine has multiple disk controllers, use ones that support multi-threaded I/O.
The database (.dbs) and the log file(s) (.log) constitute the physical
database. In addition, temporary and history files (.tmp) are used for
aggregate type functions and read-only isolation levels respectively.
Database size
The extent size for the database indicates how much the database will
be extended when it runs out of room. This value is typically around
100K. With a large database with lots of growth, you might want it to
expand by 10-20 MB, which can reduce disk fragmentation of the file.
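One way to adjust this, assuming your SQLBase version exposes the extension size through SQLTalk's SET EXTENSION command (verify the command name and its unit against your documentation; the value here is illustrative and assumes kilobytes):
CONNECT mydb;
SET EXTENSION 10240;
This would grow the database in roughly 10 MB steps instead of the typical 100K.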
The advantages of using cache and checkpointing in this way are that pages are buffered in memory, with the potential for multiple processes to access the same data, and that sequential reads and heavily accessed tables may stay in memory, increasing performance. This is very effective for non-fragmented tables during sequential reads.
Cache
Cache and Sortcache settings depend on the types of transactions and may be set to minimize the number of physical I/Os.
You need to experiment and find what works best for your server and application mix! Remember that 'the optimum' may change as new applications and processes are installed later, utilizing more memory. You should keep an eye on potential problems before they become real problems.
Use the SQLConsole and SQLPerformance tools to monitor physical I/O and cache-hit ratios for cache efficiency.
A suggested cache setting is one fourth of the physical memory.
Sortcache
The sortcache is used when creating indexes and when any sorting of the retrieved data is done, i.e. when executing SELECT statements that use ORDER BY, GROUP BY or DISTINCT. It is used on a per-user basis when needed, and the memory is then released.
Since this is a per-user setting, care should be taken not to set the value too high, as that can create memory-starved situations.
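Both keywords live in the server section of sql.ini. A minimal sketch, assuming the NT server section [dbntsrv]; the values are illustrative starting points, not recommendations, and the units each keyword uses should be checked in your documentation:
[dbntsrv]
cache=30000
sortcache=2000
Watch the cache hit ratio after any change and adjust from there.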
PCTFREE
This is the percent of each row that is reserved for increases in row data. It is used when rows are updated with longer data, when NULL columns are updated with non-NULL data, and when new columns are added to the table. The free space is calculated for each row based on the size of the actual data; character columns are variable length, so if only 10 bytes are entered in a 30-character column, the size used for the calculation is 10. The free space is placed directly after the inserted row.
When an increase in a row's length exceeds its PCTFREE area, extent pages are created and the data is moved to the extent page. Extent pages force additional I/O operations, because two pages must be read or written instead of one, and can significantly affect performance.
A smaller PCTFREE value yields more rows per page and higher cache
hit ratios, better performance during sequential reads and smaller
database size. It does have the disadvantage of increased contention.
A large PCTFREE value yields fewer rows per page and larger database
size with decreased contention. It generally also decreases
performance.
If you know a field will grow once the row has been built, it may be a good idea to increase PCTFREE.
Extent pages
For example, if you define a column as char(35) but only insert a value of 3 characters, SQLBase will only allocate storage for the 3 characters. Suppose the table is initially populated with rows whose character columns hold null values or short strings, but in the future those columns will likely hold larger values, closer to their full column size. By specifying an appropriate PCTFREE parameter, you can reserve space in the table page for future row growth within the page. This minimizes the need to allocate extent pages when a row increases in size.
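For instance, PCTFREE can be specified when the table is created. A sketch with illustrative names and values, reserving 30 percent because the rows are expected to grow:
CREATE TABLE customer
(
id INTEGER NOT NULL,
name VARCHAR(50)
) PCTFREE 30;
A static lookup table, by contrast, could safely use a much lower value to pack more rows per page.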
Log files
The size of the log files does not depend on the size of the database, but on the amount of log activity you have. The maximum log file size can be set with the SQLTalk SET LOGFILESIZE command or the sqlset API function. When the log file grows past this size, SQLBase closes it and creates a new log file. The default log file size is 1 megabyte; the smallest is 100,000 bytes. Once set, the size stays in effect until an unload/load is done.
If you're creating 50-60 log files a day and your log files are 1 MB, consider setting the logfilesize to 10 MB. You must also think about how often a backup is done: when you do a backup, the log file is 'rolled over' to a new log regardless of how much of the current log is used.
Log files are essential for transaction integrity and backup, and the settings can be tuned for individual servers. A large log file will improve database performance slightly, because log files will need to be created less often. However, if the log file is too large, it wastes disk space.
By default, the current log file grows in increments of 10 percent of its current size. This uses space conservatively but can lead to a fragmented log file. For added performance, the log file can instead be pre-allocated: it is built at the full specified size on creation and does not grow incrementally. To use this, set logfileprealloc=1 in the sql.ini file.
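For example, to combine 10 MB log files with pre-allocation (the SET LOGFILESIZE value is in bytes; both values are illustrative):
In SQLTalk:
SET LOGFILESIZE 10000000;
In the server section of sql.ini:
logfileprealloc=1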
Although you may think you have plenty of disk space, it may not be
adequate to fulfil SQLBase recovery requirements. The physical disk
space required for recovery is twice the size of the data required for
processing. SQLBase also requires space allowance for a second
recovery attempt, should the first fail.
Update Statistics
For the SQLBase optimizer to make the most cost-effective and efficient decisions when preparing and formulating a query, it needs accurate and current statistics.
If a lot of tables show extent pages in your results, or say 15-20% of a table's pages are extent pages, then a reorganization procedure needs to be done.
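For example, statistics can be refreshed from SQLTalk for the whole database or for a single table (the table name is illustrative):
UPDATE STATISTICS ON DATABASE;
UPDATE STATISTICS ON TABLE customer;
Run this after bulk loads, large deletes or a reorganization so the optimizer's cost estimates reflect reality.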
Forcing Statistics
For example, systables shows the page count (the number of data pages in a table) and extent page counts. Sysindexes shows the number of overflow pages allocated for an underlying table that uses a hashed index.
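A sketch of such a check; the catalog column names used here (PAGECOUNT and EXTENTPAGECOUNT) are assumptions to verify against your SQLBase version:
SELECT NAME, PAGECOUNT, EXTENTPAGECOUNT
FROM SYSADM.SYSTABLES;
Tables whose extent page count is a sizable fraction of their page count are candidates for the reorganization procedure below.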
REORGANIZE procedure
Over time a database can become fragmented if it does not have contiguous disk space, and tables can become fragmented due to modifications to the data. We need to defragment!
CONNECT <databasename>
UNLOAD … ON SERVER
DISCONNECT ALL
SET SERVER servername/password
DROP DATABASE
CREATE DATABASE
CONNECT <databasename>
SET RECOVERY OFF
LOCK DATABASE
LOAD … ON SERVER
UPDATE STATISTICS
UNLOCK DATABASE
SET RECOVERY ON
DISCONNECT ALL
It is better to turn off referential integrity and to drop indexes, recreating them after the bulk operation.
After bulk deletes, be sure to run your REORGANIZE scripts (as above), as deleted pages are not returned to the free page pool, which leads to incorrect statistics.
For DBA-related activities, e.g. 'load', 'check database', etc., a single user can hold an exclusive lock on the entire database. No new connections would be allowed.
Timeout mechanism
A transaction that waits on a lock for longer than the configured timeout is rolled back by SQLBase with a timeout error.
For a detailed look at timeout and locking activity you can use the 'START AUDIT' category 8 feature, which produces an audit file of lock manager, deadlock and timeout activity. Note that other audit categories are available for tracing, including monitoring who logs on to a database, what tables they access, or recording command execution time.
Auditing has its own effect on performance, so use it judiciously.
You can also use the SQLBase server process screen to check server
status, process and system activity for up to 4 levels of tracing.
Stored Procedures
Server-side compiled procedures can dramatically improve performance and reduce maintenance effort.
A stored procedure is a sequence of GUPTA Team Developer SAL statements that can be assigned a name, compiled, and used immediately or (optionally) stored in SQLBase.
The logic uses SAL procedural statements for flow control and a subset
of SAL Sql* functions with the ability to accept input and output
parameters and call external functions. Note that an instance of a procedure is associated with a single application cursor and is executed, fetched and closed using that same cursor.
Types of Procedures
External Functions
Triggers
Trace statement
The Trace statement prints the value of one or more variables to the
Process Activity screen on the server. Embed Trace statements in the
Actions section of a procedure.
In SQLTalk you can SET TRACE ON | OFF, which displays ALL statements in the procedure's Actions section on the Process Activity screen before execution.
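A minimal sketch of a procedure that embeds a Trace statement; the name and logic are illustrative and the syntax is approximate, so check it against the stored procedure chapter of your SQLBase documentation:
CREATE PROCEDURE DoubleIt
Parameters
    Number: nInput
Local Variables
    Number: nResult
Actions
    Set nResult = nInput * 2
    Trace nResult
While the procedure runs, the traced value appears on the server's Process Activity screen.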
Monitoring Tools
SQLConsole
Gives a picture of the whole server. SQLConsole is invaluable in analyzing the actual behavior of transactions and other activity on the server, and can be used to monitor cursor and locking activity, virtual-to-physical I/O ratios, the cache hit ratio and many other useful figures.
Some examples to aid performance assessment are:
How many processes are running concurrently? If more than 100, the server is probably overloaded.
Are process switches greater than 100? This means that the server is CPU-bound.
What isolation levels are being used? Most often RL.
How many cursors are attached? If more than 400, the server is probably overloaded.
Are there any 'old' cursors not marked as 'inactive'? Old cursors in a 'fetched' or other operational state that have been 'hanging around' mean the command has not been committed and could be holding locks.
Are there any large SQL costs shown? If so, the query is poorly designed, so check the query plans. The SQLPerformance tool can help check how frequently the query runs.
What isolation level is being used? This should normally be Release Locks (RL) mode. Check for other isolation levels; for example, Read Repeatability (RR) may show for a SQLTalk user who has not set the default isolation level for a session.
SQL.INI settings
Read-only mode is a huge performance hog. Check whether readonly=1 is set. If so, do you really need any database on the server to operate in read-only mode? If not, remove the line; it can hugely affect performance because 'history' files must be maintained for transactions. If you need to have one database operate in read-only mode, you can set this for that database only using the SQLTalk 'set readonly on' command.
You also need to check that all client DLL versions match the server
version, e.g. check versions of SQLWNTM.DLL and SQLBAPIW.DLL.
groupcommit
SQLBase can group several COMMITs together before physically writing them to the transaction log. A group commit is performed when either the number of COMMITs accumulated reaches the groupcommit value, or the number of ticks that have elapsed since the last commit is greater than the value of groupcommitdelay (see below).
groupcommitdelay
Sets the maximum number of ticks SQLBase waits before performing a group commit, even if fewer than groupcommit COMMITs have accumulated.
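A sketch of the corresponding entries in the server section of sql.ini; the values are illustrative only:
[dbntsrv]
groupcommit=10
groupcommitdelay=2
Larger values reduce physical log writes on busy servers, at the cost of slightly delaying individual commits.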
optimizefirstfetch
Used to set the optimization method for the first fetch. When set, the
keyword instructs the optimizer to pick a query execution plan that
takes the least amount of time to fetch the first row of the result set.
The valid values for this keyword are 0 and 1. When optimizefirstfetch
is set to 0, SQLBase optimizes the time it takes to return the entire
result set. When optimizefirstfetch is set to 1, SQLBase optimizes the
time it takes to return the first row.
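For example, assuming your server reads this keyword from its sql.ini section:
optimizefirstfetch=1
A setting of 1 favors interactive screens that must paint the first rows quickly; leave it at 0 for batch work that always consumes the entire result set.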
logfileprealloc
As described under log files above, setting logfileprealloc=1 pre-allocates each log file at the full logfilesize instead of growing it in 10 percent increments, trading disk space for a less fragmented log.
inmessage
The behavior and use of the input and output message buffers is affected by the isolation level set.
The input message buffer holds the data coming back from the server. There is one input message buffer per client connection handle, and the default size is 2000 bytes. Regardless of the inmessage value, SQLBase dynamically allocates more space when necessary, since the buffer is automatically sized to hold at least one row.
When fetching data, as many rows as possible are stored into one input
message buffer. Each fetch command reads the next row from the
input message buffer until the end of the message buffer is reached.
The SQL/API transparently fetches the next input buffer of rows (the actual behavior depends on the current isolation level).
outmessage
The output message buffer holds the output from the application, such as the SQL statement to compile or rows of data to insert. Again, it is dynamically enlarged when needed. The setting does not affect the input message buffer, which is sized separately with inmessage.
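Both buffers are sized from the client side of sql.ini. A sketch, assuming the [winclient] section; the values are illustrative:
[winclient]
inmessage=8000
outmessage=4000
A larger inmessage lets more fetched rows travel in each network round trip.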
Fetchthrough
Fetchthrough causes rows to be fetched directly from the server rather than from the client's input message buffer, so the application always sees current data. The feature is only applicable to queries against base tables and will not work if the query involves ORDER BY, GROUP BY, joins, unions, etc.
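To enable it for a session, assuming your version supports the SQLTalk SET FETCHTHROUGH command:
SET FETCHTHROUGH ON;
Use it only where row currency matters, since it bypasses the buffering benefits described under inmessage above.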
Study entries in the PLAN TABLE to check whether temporary tables are being created during 'conversions'. This indicates that the optimizer is sorting the intermediate results, so try adding an index to avoid the sort process.
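One way to inspect a plan from SQLTalk, assuming your version supports the SET PLANONLY command, which compiles the statement and displays its plan without executing it (table and column names are illustrative):
SET PLANONLY ON;
SELECT name FROM customer ORDER BY name;
SET PLANONLY OFF;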
Other tips:
Indexes
Do not load data in sorted order by the designated index; data values should be randomly distributed.
You may notice that performance is really slow when deleting from a table that has foreign keys. This is because foreign keys require an index on the joining column in each child table. For each table with the foreign key, create an index on the joining column, as in the sketch below.
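A sketch with illustrative table and column names, where orders.cust_id references the customer table:
CREATE INDEX ndx_orders_cust ON orders (cust_id);
With the index in place, a delete on the parent no longer has to scan each child table for matching rows.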
Clustered Hashed Indexes (CHIs) can also be used to specify how row data should be positioned for efficient access. Table rows are stored in locations based on their key value (clustering). CHIs must be created on a table before any data is inserted. They perform best for random row access on static tables. When the index is created, space is pre-allocated in the table for the specified number of rows.
You cannot update the columns that make up a clustered hashed index.
Any rows that don’t fit in the pre-allocated pages are stored in overflow
pages. This will degrade performance so care must be taken to choose
the size judiciously.
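A sketch, assuming the SIZE n ROWS clause for pre-allocation; the names and the row estimate are illustrative:
CREATE CLUSTERED HASHED INDEX ndx_cust_hash
ON customer (id)
SIZE 100000 ROWS;
Estimate the row count generously, since rows beyond the pre-allocated space spill into the overflow pages described above.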
Composite indexes
Based upon more than one column. They can be defined in conjunction
with any of the other index types (for example, a clustered hashed
index based upon one function and one column).
Choosing an Index
Poor Candidates
Small tables: if the table has fewer than five pages, the overhead outweighs the benefit.
Large batch updates and inserts: the indexes must be constantly updated, degrading performance. An alternative is to drop the indexes before the batch operation and recreate them afterwards.
Non-uniqueness of keys: where the cardinality is high but a large number of rows share the same value, there may be a performance penalty.
Low cardinality: the optimizer will avoid these indexes because of their low selectivity.
Many unknown values: null values skew the distribution and can lead to performance penalties.
Frequently changing values: there are index maintenance overheads, as well as locking contention on the index structure.
Transaction semantics
AUTOCOMMIT
Enabling CCP (cursor-context preservation) maintains held locks and result sets even after a COMMIT. The application can continue to process a result set on one cursor after a COMMIT, without having to rebuild it.
Locking
SQLBase uses Shared locks (S), Update locks (U) and Exclusive locks
(X). S and X locks behave as ‘read locks’ and ‘exclusive write locks’. A
U lock is treated as an ‘intended update’ or ‘intended exclusive’ lock.
Although S-locks and U-locks can co-exist on the same page, all S-locks
from other transactions must be released before a U-lock can be
upgraded to an X-lock.
When any kind of lock is placed on a row page, the same type of lock is
applied to any extent pages and LONG VARCHAR pages associated with
the row.
For indexes, the entire index page is locked, even if only one index
node is the subject of the lock.
Read Repeatability (RR)
The RR isolation level makes effective use of the input message buffer, filling it with row data at the server, then sending the buffer to the client. This is suitable for transactions that require a high degree of consistency throughout the transaction.
Cursor Stability (CS)
As you fetch rows after executing a query, the page of the current row is held with an S-lock. You are guaranteed stability at the cursor (the cursor in this context is the current row of the result set). When the page changes as rows are fetched, the S-lock on the current page is released and an S-lock for the next page is acquired.
Read-Only (RO)
Performance impact of this isolation level is huge. You also must stop
and start processes periodically to clean up history files!
Release Locks (RL)
Once the result set is complete and the server is ready to return control to the client, all S-locks are released. As rows are fetched from the result set, an S-lock is briefly placed on the page of the row to place it in the input message buffer.
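For example, a SQLTalk session can be switched to Release Locks mode explicitly:
SET ISOLATION RL;
This avoids accidentally running in RR, the situation noted in the SQLConsole checklist earlier.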
Locking strategies
Pessimistic Locking
The RR isolation level is suitable for use with pessimistic locking.
Optimistic Locking
For updates and deletes, we need to use the ROWID as a timestamp to operate in an optimistic locking scenario.
If the client application is using result set mode, it can simply attempt to re-fetch the row (important: enable Fetchthrough; see the topic earlier in this chapter). The fetch indicator value will either be 2 (row was updated) or 3 (row was deleted).
If the client is not using result set mode, it can execute another SELECT based on the primary key of the row to retrieve the new ROWID, if the row still exists.
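A sketch of that non-result-set variant, with illustrative names and host-variable markers as they would appear through the SQL/API:
SELECT name, ROWID
FROM customer
WHERE id = :nId;
The application lets the user edit the row, then updates it only if the ROWID is unchanged:
UPDATE customer
SET name = :sNewName
WHERE id = :nId AND ROWID = :sOldRowid;
Because SQLBase assigns a new ROWID whenever a row is updated, an update count of zero means another transaction changed or deleted the row since it was read.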
Result Set Mode