
Appendix D

Developing Performance-Optimized ODBC Applications

Developing performance-optimized ODBC applications is not easy. Microsoft's ODBC 3.0 Programmer's Reference does not provide information about system performance. In addition, ODBC drivers and the ODBC driver manager do not return warnings when applications run inefficiently.

This appendix provides general guidelines for optimizing system performance that have been compiled by examining how numerous shipping ODBC applications have been implemented. These guidelines are divided into the following categories:

  • Catalog Functions
  • Retrieving Data
  • ODBC Function Selection
  • Designing ODBC Applications
  • Updating Data

Catalog Functions

The following ODBC functions are catalog functions:

  • SQLColumns
  • SQLColumnPrivileges
  • SQLForeignKeys
  • SQLGetTypeInfo
  • SQLProcedures
  • SQLProcedureColumns
  • SQLSpecialColumns
  • SQLStatistics
  • SQLTables
  • SQLTablePrivileges

SQLGetTypeInfo is listed as a potentially performance-expensive ODBC function, because many drivers must query the server to obtain accurate information about which data types are supported (for example, to find dynamic data types such as user-defined types).

Catalog Functions Are Slow

Because catalog functions are slow compared to other ODBC functions, their frequent use can impair system performance. Although it is almost impossible to write an ODBC application without using catalog functions, you can improve performance by minimizing their use.

To return all result column information mandated by the ODBC specification, an ODBC driver may have to perform multiple queries, joins, subqueries, and unions to return the required result set for a single call to a catalog function. These particular elements of the SQL language are performance "hogs."

Applications should cache information from catalog functions so that multiple executions are unnecessary. For example, call SQLGetTypeInfo once in the application and cache the elements of the result set on which your application depends. It is unlikely that an application will use every element of the result set generated by a catalog function, so the cached information should not be difficult to maintain.
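
For example, the following sketch caches type information at connect time, assuming the application depends only on the TYPE_NAME and DATA_TYPE columns (the structure, array size, and variable names are illustrative):

typedef struct {
   char  TypeName[129];   // TYPE_NAME column
   SWORD SqlType;         // DATA_TYPE column
} TypeEntry;

TypeEntry TypeCache[100];
UWORD     nTypes = 0;
SDWORD    cb;

rc = SQLGetTypeInfo (hstmt, SQL_ALL_TYPES);
while (SQLFetch (hstmt) == SQL_SUCCESS && nTypes < 100) {
   SQLGetData (hstmt, 1, SQL_C_CHAR, TypeCache[nTypes].TypeName,
      sizeof(TypeCache[nTypes].TypeName), &cb);
   SQLGetData (hstmt, 2, SQL_C_SHORT, &TypeCache[nTypes].SqlType, 0, &cb);
   nTypes++;
}
// later type lookups read TypeCache instead of calling SQLGetTypeInfo again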

Passing Null Arguments

Passing null arguments to catalog functions generates time-consuming queries. In addition, network traffic may increase because of unnecessary result set information. Always supply as many non-null arguments to catalog functions as possible.

Because catalog functions are slow, applications should invoke them efficiently. Many applications pass the minimum number of non-null arguments required for the function to return a successful result set.

For example, consider a call to SQLTables where the application requests information about the table "Customers." Often, this call is coded as shown:

rc = SQLTables (hstmt, NULL, 0, NULL, 0, "Customers", SQL_NTS,
   NULL, 0);



A driver may process this SQLTables call into SQL as shown:

SELECT ... FROM SysTables WHERE TableName = 'Customers' UNION ALL
SELECT ... FROM SysViews WHERE ViewName = 'Customers' UNION ALL
SELECT ... FROM SysSynonyms WHERE SynName = 'Customers'
ORDER BY ...



Sometimes, not much information is known about the object for which you are requesting information. Any information that the application can send the driver when calling catalog functions can result in improved performance and reliability.

Using the previous example, suppose three "Customers" tables, each owned by a different user, were returned in the result set.

It may not be obvious to the user which table to choose. If the application had specified the OwnerName argument in the SQLTables call, only one table would be returned, and performance would improve because returning a single result row requires less network traffic and unnecessary rows are filtered by the database. In addition, if the TableType argument can be supplied, the SQL sent to the server can be changed from a three-query union to a single Select statement, as shown:

SELECT ... FROM SysTables WHERE TableName = 'Customers'
   AND Owner = 'Beth'
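
The corresponding catalog call might look as follows (a sketch that assumes the owner is known to be "Beth" and that only tables, not views or synonyms, are of interest):

rc = SQLTables (hstmt, NULL, 0, "Beth", SQL_NTS,
   "Customers", SQL_NTS, "TABLE", SQL_NTS);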



SQLColumns

Avoid using SQLColumns to determine table characteristics. Instead, use a dummy query with SQLDescribeCol.

Consider an application that allows the user to choose columns. Should the application use SQLColumns to return information about the columns to the user or prepare a dummy query and call SQLDescribeCol?

Case 1: SQLColumns Method

rc = SQLColumns (... "UnknownTable" ...);
// This call to SQLColumns will generate a query to the system
// catalogs... possibly a join which must be prepared,
// executed, and produce a result set
rc = SQLBindCol (...);
rc = SQLExtendedFetch (...);
// user must retrieve N rows from the server
// N = # result columns of UnknownTable
// result column information has now been obtained



Case 2: SQLDescribeCol Method

// prepare dummy query
rc = SQLPrepare (... "SELECT * FROM UnknownTable WHERE 1 = 0" ...);
// query is never executed on the server - only prepared
rc = SQLNumResultCols (...);
for (icol = 1; icol <= NumColumns; icol++) {
   rc = SQLDescribeCol (...);
   // + optional calls to SQLColAttributes
}
// result column information has now been obtained
// Note we also know the column ordering within the table!
// This information cannot be assumed from the SQLColumns example.



In both cases, a query is sent to the server. In Case 1, the query must be evaluated and return a result set to the client. Case 2 is the better performing model.

To complicate this discussion, consider a database server that does not support natively preparing a SQL statement. The performance of Case 1 would not change, but the performance of Case 2 would decrease slightly, because the dummy query must be evaluated instead of only prepared. Because the Where clause of the query always evaluates to FALSE, the query generates no result rows and is processed without accessing table data. Again, Case 2 performs better than Case 1.

Retrieving Data

This section provides guidelines for retrieving data with ODBC applications.

Retrieving Long Data

Unless it is necessary, applications should not request long data (SQL_LONGVARCHAR and SQL_LONGVARBINARY data), because retrieving long data across a network is slow and resource-intensive.

Most users do not want to see long data. If the user does need to see these result items, the application can query the database again, specifying only the long columns in the select list. This method allows the average user to retrieve result sets without having to pay a high performance penalty for network traffic.

Although the best method is to exclude long data from the select list, some applications do not formulate the select list before sending the query to the ODBC driver (for example, some applications select * from <table name> ...). If the select list contains long data, the driver must retrieve that data at fetch time, even if the application does not bind the long data in the result set. When possible, the application developer should use a method that does not retrieve all columns of the table.

Reducing the Size of Retrieved Data

To reduce network traffic and improve performance, you can limit the size of the data being retrieved to a manageable amount by calling SQLSetStmtOption with the SQL_MAX_LENGTH option. Although eliminating SQL_LONGVARCHAR and SQL_LONGVARBINARY data from the result set is ideal for performance optimization, retrieving long data is sometimes necessary. When it is, remember that most users do not want to see 100 KB, or more, of text on the screen. What techniques, if any, are available to limit the amount of data retrieved?

Many application developers mistakenly assume that if they call SQLGetData with a container of size x, the ODBC driver only retrieves x bytes of information from the server. Because SQLGetData can be called multiple times for any one column, the driver optimizes its network use by retrieving long data in large chunks and returning it to the user when requested. For example:

char CaseContainer[1000];

...

rc = SQLExecDirect (hstmt, "SELECT CaseHistory FROM Cases WHERE
   CaseNo = 71164", SQL_NTS);

...

rc = SQLFetch (hstmt);
rc = SQLGetData (hstmt, 1, SQL_C_CHAR, CaseContainer,
   (SDWORD) sizeof(CaseContainer), ...);



At this point, an ODBC driver is more likely to retrieve 64 KB of information from the server than 1000 bytes. One 64-KB retrieval requires less network traffic than sixty-four 1000-byte retrievals. Unfortunately, the application may not call SQLGetData again; thus, the first and only retrieval of CaseHistory is slowed by the fact that 64 KB of data must be sent across the network.

Many ODBC drivers allow you to limit the amount of data retrieved across the network by using the SQL_MAX_LENGTH statement option. This option allows the driver to communicate to the database server that only Z bytes of data are relevant to the client. The server responds by sending only the first Z bytes of data for all result columns. This optimization substantially reduces network traffic and improves performance. Our example returned only one row, but consider the case where 100 rows are returned in the result set--the performance improvement would be substantial.
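
For example, a minimal sketch, assuming the driver supports the SQL_MAX_LENGTH option (the 4096-byte limit is an arbitrary illustration); the option must be set before the query is executed:

rc = SQLSetStmtOption (hstmt, SQL_MAX_LENGTH, 4096);
rc = SQLExecDirect (hstmt, "SELECT CaseHistory FROM Cases WHERE
   CaseNo = 71164", SQL_NTS);
// the server now sends at most 4096 bytes of CaseHistory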

Using Bound Columns

Retrieving data using bound columns (SQLBindCol), instead of SQLGetData, reduces the ODBC call load and improves performance.

Consider the following example:

rc = SQLExecDirect (hstmt, "SELECT <20 columns> FROM Employees
   WHERE HireDate >= ?", SQL_NTS);
do {
   rc = SQLFetch (hstmt);
   // call SQLGetData 20 times
} while ((rc == SQL_SUCCESS) || (rc == SQL_SUCCESS_WITH_INFO));



Suppose the query returns 90 result rows. More than 1890 ODBC calls are made (20 calls to SQLGetData x 90 result rows + 91 calls to SQLFetch).

Consider the same scenario that uses SQLBindCol, instead of SQLGetData:

rc = SQLExecDirect (hstmt, "SELECT <20 columns> FROM Employees
   WHERE HireDate >= ?", SQL_NTS);
// call SQLBindCol 20 times
do {
   rc = SQLFetch (hstmt);
} while ((rc == SQL_SUCCESS) || (rc == SQL_SUCCESS_WITH_INFO));



The number of ODBC calls made is reduced from more than 1890 to about 110 (20 calls to SQLBindCol + 91 calls to SQLFetch). In addition to reducing the call load, many ODBC drivers optimize how SQLBindCol is used by binding result information directly from the database server to the user's buffer. That is, instead of retrieving information into a container and copying it to the user's buffer, the ODBC driver requests that information from the server be placed directly into the user's buffer.
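
For illustration, one of the 20 SQLBindCol calls might look as follows (the column name and buffer size are hypothetical):

char   EmpName[31];
SDWORD cbEmpName;

rc = SQLBindCol (hstmt, 1, SQL_C_CHAR, EmpName,
   sizeof(EmpName), &cbEmpName);
// each SQLFetch now places column 1 directly into EmpName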

Using SQLExtendedFetch Instead of SQLFetch

Retrieving data with SQLExtendedFetch instead of SQLFetch decreases the ODBC call load, which improves performance, and reduces code complexity, which makes the code easier to maintain.

Most ODBC drivers support SQLExtendedFetch for forward-only cursors; yet, most ODBC applications use SQLFetch to retrieve data. Again, consider the previous example, this time using SQLExtendedFetch instead of SQLFetch:

rc = SQLSetStmtOption (hstmt, SQL_ROWSET_SIZE, 100);
// use arrays of 100 elements
rc = SQLExecDirect (hstmt, "SELECT <20 columns> FROM Employees
   WHERE HireDate >= ?", SQL_NTS);
// call SQLBindCol 1 time specifying row-wise binding
do {
   rc = SQLExtendedFetch (hstmt, SQL_FETCH_NEXT, 0, &RowsFetched,
      RowStatus);
} while ((rc == SQL_SUCCESS) || (rc == SQL_SUCCESS_WITH_INFO));



In this second example, the number of ODBC calls made by the application is reduced from 110 to 4 (1 SQLSetStmtOption + 1 SQLExecDirect + 1 SQLBindCol + 1 SQLExtendedFetch)--a total savings from an initial call load of more than 1890 ODBC calls to 4 ODBC calls. In addition to reducing the call load, many ODBC drivers retrieve data from the server in arrays, further improving performance by reducing network traffic.

For ODBC drivers that do not support SQLExtendedFetch, the application can enable forward-only cursors using the ODBC cursor library (call SQLSetConnectOption using SQL_ODBC_CURSORS/SQL_CUR_USE_IF_NEEDED).

Although using the cursor library does not improve performance, it should not be detrimental to your application's response time when using forward-only cursors (no logging is required). Furthermore, using the cursor library when SQLExtendedFetch is not supported natively by the ODBC driver simplifies the code, because the application can depend on SQLExtendedFetch being available. The application does not require two algorithms (one using SQLExtendedFetch and another using SQLFetch).

ODBC Function Selection

This section provides guidelines for selecting which ODBC functions will provide the best performance.

Using SQLPrepare/SQLExecute and SQLExecDirect

Do not assume that SQLPrepare/SQLExecute is always as efficient as SQLExecDirect. Use SQLExecDirect for queries that will be executed once and SQLPrepare/SQLExecute for queries that will be executed multiple times.

ODBC drivers are optimized based on the perceived use of the functions that are being executed. SQLPrepare/SQLExecute is optimized for multiple executions of a statement that uses parameter markers. SQLExecDirect is optimized for a single execution of a SQL statement. Unfortunately, more than 75 percent of all ODBC applications use SQLPrepare/SQLExecute exclusively.

Consider an ODBC driver that implements SQLPrepare by creating a stored procedure on the server that contains the prepared statement. Creating a stored procedure involves substantial overhead, but the statement can then be executed multiple times. Although creating the stored procedure is performance-expensive, each execution requires minimal processing, because the query is parsed and optimization paths are stored when the procedure is created.

Using SQLPrepare/SQLExecute for a statement that will be executed only once results in unnecessary overhead. Furthermore, applications that use SQLPrepare/SQLExecute for large, single-execution query batches will probably exhibit poor performance. Similarly, applications that use SQLExecDirect exclusively do not perform as well as those that use a logical combination of SQLPrepare/SQLExecute and SQLExecDirect sequences.
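
A sketch of this rule of thumb (the table and column names are illustrative):

// executed once - use SQLExecDirect
rc = SQLExecDirect (hstmt, "DELETE FROM OldOrders WHERE Filled = 1",
   SQL_NTS);

// executed many times - prepare once, then execute repeatedly
rc = SQLPrepare (hstmt, "INSERT INTO Orders (ItemNo, Qty) VALUES
   (?,?)", SQL_NTS);
// bind parameters
...
do {
   rc = SQLExecute (hstmt);
} while (!eof);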

Using SQLPrepare and Multiple SQLExecute Calls

Applications that use SQLPrepare and multiple SQLExecute calls should use SQLParamOptions. Passing arrays of parameter values reduces the ODBC call load and network traffic.

Consider the following example that inserts data:

rc = SQLPrepare (hstmt, "INSERT INTO DailyLedger (...) VALUES
   (?,?,...)", SQL_NTS);
// bind parameters
...
do {
   // read ledger values into bound parameter buffers
   ...
   rc = SQLExecute (hstmt);      // insert row
} while (!eof);



If there are 100 rows to insert, SQLExecute is called 100 times, resulting in 100 network requests to the server. Alternatively, consider an algorithm that uses parameter arrays by calling SQLParamOptions:

rc = SQLPrepare (hstmt, "INSERT INTO DailyLedger (...) VALUES
   (?,?,...)", SQL_NTS);
rc = SQLParamOptions (hstmt, (UDWORD) 50, &CurrentRow);
// pass 50 parameters per execute
// bind parameters
...
do {
   // read up to 50 ledger values into bound parameter buffers
   ...
   rc = SQLExecute (hstmt);      // insert up to 50 rows
} while (!eof);



The call load is reduced from 100 SQLExecute calls to 2. Furthermore, network traffic is reduced considerably. Some ODBC drivers do not support SQLParamOptions; to achieve the best performance, applications should include a code path that uses SQLParamOptions when the driver supports it. SQLParamOptions is ideal for copying data into new tables or bulk-loading tables.

Using the Cursor Library

If scrollable cursors are provided by the driver, do not use the cursor library automatically. The cursor library creates local temporary log files, which are performance-expensive to generate and provide worse performance than native scrollable cursors.

The cursor library provides support for static cursors, which simplifies the coding of applications that use scrollable cursors; however, the cursor library creates temporary log files on the user's local disk drive as it performs the task. Typically, disk input/output is one of the slowest operations on PCs. Although the cursor library is beneficial, applications should not choose automatically to use the cursor library when an ODBC driver supports scrollable cursors natively.

Typically, ODBC drivers that support scrollable cursors achieve better performance by requesting that the database server produce a scrollable result set, instead of emulating this ability by creating log files.

Many applications use:

rc = SQLSetConnectOption (hdbc, SQL_ODBC_CURSORS,
   SQL_CUR_USE_ODBC);

but should use:

rc = SQLSetConnectOption (hdbc, SQL_ODBC_CURSORS,
   SQL_CUR_USE_IF_NEEDED);



Designing ODBC Applications

This section provides guidelines for designing ODBC applications.

Managing Connections

Connection management affects application performance. Developers should optimize their applications by connecting once and using multiple statement handles, instead of performing multiple connections. Connection management is poorly designed in many ODBC applications. Avoid connecting to a data source again after establishing an initial connection.

Some ODBC applications are designed to call information-gathering routines that have no access to the connection handle that is already attached. For example, some applications establish a connection and then call a routine in a separate DLL or shared library that reconnects and gathers information about the driver.

Although gathering ODBC driver information at connect time is a good practice, it should not require a second connection. One popular commercial ODBC application connects a second time to gather driver information, and never disconnects that second connection. Applications that are designed as separate entities should pass the connected HDBC pointer to the data collection routine, instead of establishing a second connection.
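
For example, a sketch of a data collection routine (the routine name is hypothetical) that reuses the caller's connection handle:

void GatherDriverInfo (HDBC hdbc)
{
   char  DriverVer[32];
   SWORD cb;

   // reuse the existing connection; no second SQLConnect is needed
   SQLGetInfo (hdbc, SQL_DRIVER_VER, DriverVer, sizeof(DriverVer), &cb);
}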

Another bad practice is to connect and disconnect several times throughout your application to perform SQL statements. A connection handle can have multiple statement handles associated with it. Because statement handles store information about each SQL statement, applications do not need to allocate a new connection handle to perform each SQL statement. Instead, applications should use multiple statement handles on a single connection to manage multiple SQL statements.
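
A minimal sketch: allocate several statement handles on a single connection instead of reconnecting:

HSTMT hstmtSelect, hstmtUpdate;

rc = SQLAllocStmt (hdbc, &hstmtSelect);
rc = SQLAllocStmt (hdbc, &hstmtUpdate);
// both handles share one connection; no additional SQLConnect is made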

Connection and statement handling should be addressed before implementation. Spending time up front on thoughtful connection management improves application performance and maintainability.

Committing Data

Committing data is extremely disk input/output intensive and slow. If the ODBC driver supports transactions, turn Autocommit off.

What does a commit involve? The database server must flush back to disk every data page containing updated or new data. This is not a sequential write, but a searched write to replace existing data in the table. By default, Autocommit is on when connecting to a data source, and Autocommit mode usually impairs performance because of the significant amount of disk input/output required to commit every operation.

Furthermore, some database servers do not provide an Autocommit mode. For this server type, the ODBC driver must explicitly issue a COMMIT statement and a BEGIN TRANSACTION for every operation sent to the server. In addition to the large amount of disk input/output required to support Autocommit mode, a performance penalty is paid for up to three network requests for every statement issued by an application.
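
A minimal sketch: turn Autocommit off at connect time and commit explicitly after a batch of work (henv is the application's environment handle):

rc = SQLSetConnectOption (hdbc, SQL_AUTOCOMMIT, SQL_AUTOCOMMIT_OFF);
// ... execute a batch of inserts or updates ...
rc = SQLTransact (henv, hdbc, SQL_COMMIT);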

Updating Data

This section provides guidelines for updating data in databases.

Using Positional Updates and Deletes

Although positional updates do not apply to all types of applications, developers should attempt to use positional updates and deletes when it makes sense. Positional updates (using "update where current of cursor" or using SQLSetPos) allow the developer to signal the ODBC driver to "change the data here" by positioning the database cursor on the appropriate row. Instead of building a complex SQL statement, the developer simply supplies the new data to be changed.

In addition to making the application more easily maintainable, positional updates usually result in improved performance. Because the database server is already positioned on the row for the Select statement in process, performance-expensive operations to locate the row to be changed are not needed. If the row must be located, the server usually has an internal pointer to the row available (for example, ROWID).
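
A sketch of a positioned update (the cursor name and new value are illustrative, the FOR UPDATE syntax varies by database, and hstmt2 is a second statement handle on the same connection):

rc = SQLSetCursorName (hstmt, "C1", SQL_NTS);
rc = SQLExecDirect (hstmt, "SELECT address FROM emp FOR UPDATE OF
   address", SQL_NTS);
rc = SQLFetch (hstmt);      // position on the row to be changed
rc = SQLExecDirect (hstmt2, "UPDATE emp SET address = '12 Oak St'
   WHERE CURRENT OF C1", SQL_NTS);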

Using SQLSpecialColumns

Use SQLSpecialColumns to determine the optimal set of columns to use in the Where clause for updating data. Often, pseudo-columns provide the fastest access to the data, and these columns can only be determined by using SQLSpecialColumns.

Some applications cannot be designed to take advantage of positional updates and deletes. These applications usually update data by using a Where clause consisting of some subset of the column values returned in the result set. Some applications may formulate the Where clause by using all searchable result columns or by calling SQLStatistics to find columns that may be part of a unique index. These methods usually work, but may result in fairly complex queries.

Consider the following:

rc = SQLExecDirect (hstmt, "SELECT first_name, last_name, ssn,
   address, city, state, zip FROM emp", SQL_NTS);
// fetch data
...
rc = SQLExecDirect (hstmt, "UPDATE emp SET address = ?
   WHERE first_name = ? and last_name = ? and ssn = ?
   and address = ? and city = ? and state = ? and zip = ?",
   SQL_NTS);
// fairly complex query



Applications should call SQLSpecialColumns/SQL_BEST_ROWID to retrieve the optimal set of columns (possibly a pseudo-column) that identifies a specific record. Many databases support special columns that are not explicitly defined by the user in the table definition but are "hidden" columns of every table (for example, ROWID and TID). These pseudo-columns usually provide the fastest access to the data, because they usually point to the exact location of the record. Because pseudo-columns are not part of the explicit table definition, they are not returned from SQLColumns. To determine if pseudo-columns exist, call SQLSpecialColumns.

Consider the previous example again:

...
rc = SQLSpecialColumns (hstmt, ..... "emp", ...);
...
rc = SQLExecDirect (hstmt, "SELECT first_name, last_name, ssn,
   address, city, state, zip, ROWID FROM emp", SQL_NTS);
// fetch data and probably "hide" ROWID from the user
...
rc = SQLExecDirect (hstmt, "UPDATE emp SET address = ? WHERE
   ROWID = ?", SQL_NTS);
// fastest access to the data!



If your data source does not contain special pseudo-columns, the result set of SQLSpecialColumns consists of the columns of the optimal unique index on the specified table (if a unique index exists); therefore, your application does not need to call SQLStatistics to find the smallest unique index.



