Product: Cluster Server Guides   
Manual: Cluster Server 4.1 User's Guide   

Can My Application be Clustered?

Most applications can be placed under cluster control, provided the following basic guidelines are met:

  • Defined start, stop, and monitor procedures.
  • Ability to restart in a known state.
  • Ability to store required data on shared disks.
  • Adherence to license requirements and host name dependencies.

Defined Start, Stop, and Monitor Procedures

The application to be clustered must have defined procedures for starting, stopping, and monitoring.

Defined Start Procedure

The application must have a command to start it and all resources it may require, such as mounted file systems and IP addresses. VCS brings up the required resources in a specific order, then brings up the application using the defined start procedure.

For example, to start an Oracle database, VCS first brings the required storage and file systems online, then the database instance. To start the instance, VCS must know which Oracle utility to call, such as sqlplus. To use this utility properly, VCS must also know the Oracle user, instance ID, Oracle home directory, and the pfile.
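The ordering described above can be sketched as a small dependency walk. This is only an illustration, not how VCS itself is implemented: the resource names, the DEPENDS_ON table, and the start_actions callables are all hypothetical stand-ins for real start commands.

```python
from graphlib import TopologicalSorter

# Hypothetical resources: each name maps to the resources it depends on.
DEPENDS_ON = {
    "database_instance": {"file_system"},
    "file_system": {"storage"},
    "storage": set(),
}


def start_order(depends_on):
    """Return resource names ordered so each follows everything it depends on."""
    return list(TopologicalSorter(depends_on).static_order())


def bring_online(depends_on, start_actions):
    """Start resources in dependency order and stop at the first failure.

    start_actions maps each resource name to a callable that returns True
    when that resource started successfully (an assumption of this sketch).
    Returns (success, names_started_so_far).
    """
    started = []
    for name in start_order(depends_on):
        if not start_actions[name]():
            return False, started
        started.append(name)
    return True, started
```

In this sketch the database instance is attempted only after the storage and file system steps report success, which mirrors the order given in the Oracle example.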

Defined Stop Procedure

An individual instance of the application must be capable of being stopped without affecting other instances. For example, killing all HTTPd processes on a Web server is unacceptable because it would also stop any other Web server instances running on that system. Instead, the application must have a defined procedure for stopping a single instance.

In many cases, a method to "clean up" after an application must also be identified. If VCS cannot stop an application cleanly, it may call for a more forceful method, like a kill signal. After a forced stop, the clean-up procedure may also be required for various process- and application-specific items left behind, such as shared memory segments or semaphores.
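A stop procedure of this kind, with escalation to a forced stop and a clean-up step, might look like the following sketch. It assumes the instance is represented by a single subprocess.Popen handle; the function name, the grace period, and the clean_up callback are hypothetical and are not part of any VCS interface.

```python
import subprocess


def stop_instance(proc, grace_seconds=5.0, clean_up=None):
    """Stop exactly one application instance, escalating only if needed.

    proc is the subprocess.Popen handle of the instance to stop, so no
    other instance of the same application is touched.
    Returns "stopped" after a clean exit, or "forced" after a kill.
    """
    proc.terminate()  # polite request; on POSIX systems this sends SIGTERM
    try:
        proc.wait(timeout=grace_seconds)
        return "stopped"
    except subprocess.TimeoutExpired:
        proc.kill()  # forceful stop; the process cannot ignore this
        proc.wait()  # reap the process so it does not linger as a zombie
        if clean_up is not None:
            # Hypothetical hook for removing leftovers such as shared
            # memory segments or semaphores after a forced stop.
            clean_up()
        return "forced"
```

Because the function acts on one process handle rather than on a process name, other instances of the same application keep running.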

Defined Monitor Procedure

The application must have a monitor procedure that determines if the specified application instance is healthy. The application must allow individual monitoring of unique instances.

For example, the monitor procedure for a Web server connects to the specified server and verifies that it is serving Web pages. In a database environment, the monitoring application can connect to the database server and run SQL commands to verify that it can read from and write to the database. In both cases, end-to-end monitoring is a far more robust check of application health than simply confirming that the application's processes exist. The closer a test comes to matching what a user does, the better it is at discovering problems. However, there is a tradeoff: end-to-end monitoring increases system load and may increase system response time. The level of monitoring should be carefully balanced between ensuring the application is up and minimizing monitor overhead.
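The Web server check described above can be sketched as follows. This is a minimal illustration under stated assumptions: the function name, the default path, and the rule that only an HTTP 200 response counts as healthy are choices made for this example, not a description of any VCS agent.

```python
import http.client


def monitor_web_instance(host, port, path="/", timeout=3.0):
    """Return True if the given Web server instance serves the page.

    The check targets one host and port, so each instance can be
    monitored individually. The timeout bounds how long the monitor
    can take, which limits its own overhead.
    """
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("GET", path)
        return conn.getresponse().status == 200
    except (OSError, http.client.HTTPException):
        # Connection refused, timeout, or a malformed reply: report unhealthy.
        return False
    finally:
        conn.close()
```

Requesting a real page is closer to what a user does than checking for a process, at the cost of one extra request per monitor cycle.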

Ability to Restart the Application in a Known State

When the application is taken offline, it must close out all tasks, store data properly on shared disk, and exit. Stateful servers must not keep client state only in memory. State should be written to shared storage to ensure proper failover.

Commercial databases such as Oracle, Sybase, or SQL Server are good examples of well-written, crash-tolerant applications. On any client SQL request, the client is responsible for holding the request until it receives acknowledgement from the server. When the server receives a request, it places the request in a special redo log file. The server confirms that the data has been written to stable disk storage before acknowledging the client. After a server crash, the database recovers to the last-known committed state by mounting the data tables and applying the redo logs. This returns the database to its state at the time of the crash. The client resubmits any outstanding requests that the server did not acknowledge; all others are contained in the redo logs. Note the cooperation between the client application and the server. This must be factored in when assessing whether the application is cluster-compatible.
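The log-then-acknowledge pattern described above can be illustrated with a toy key-value store. This is a simplified sketch, not how Oracle, Sybase, or SQL Server implement redo logging: the TinyStore class, its JSON record format, and the log path are hypothetical, and the log file stands in for storage on a shared disk.

```python
import json
import os


class TinyStore:
    """Toy store that logs every change durably before acknowledging it."""

    def __init__(self, log_path):
        self.log_path = log_path  # assumed to live on shared storage
        self.data = {}
        self.recover()

    def recover(self):
        """Rebuild the in-memory state by replaying the log from the start."""
        self.data = {}
        if not os.path.exists(self.log_path):
            return
        with open(self.log_path, "r", encoding="utf-8") as log:
            for line in log:
                try:
                    record = json.loads(line)
                except json.JSONDecodeError:
                    # Incomplete record from a crash mid-write. It was never
                    # acknowledged, so the client is expected to resubmit it.
                    break
                self.data[record["key"]] = record["value"]

    def put(self, key, value):
        """Append the change to the log, force it to disk, then acknowledge."""
        record = json.dumps({"key": key, "value": value})
        with open(self.log_path, "a", encoding="utf-8") as log:
            log.write(record + "\n")
            log.flush()
            os.fsync(log.fileno())  # on stable storage before the ack
        self.data[key] = value
        return True  # the acknowledgement the client waits for
```

A second TinyStore opened on the same log path plays the role of the takeover server: it replays the log and arrives at the same acknowledged state. A production design would also truncate an incomplete final record before appending new ones.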

If an application cannot recover gracefully after a server crashes, it cannot run in a cluster environment. The takeover server may be unable to start the application because of data corruption and other problems.

VERITAS Software Corporation
www.veritas.com