CHAPTER 1
Watchdog Timer
The System Management Controller (SMC) on the Netra CP2000/CP2100 board implements a two-level watchdog timer. The watchdog timer is used to recover the central processing unit (CPU) if the CPU freezes.
This chapter provides detailed information on the SMC-based watchdog timer driver and its operation for the Netra CP2000/CP2100 boards. This chapter also describes the user-level application programming interface (API) and behavior of the Netra CP2000/CP2100 board watchdog timer. For functional details of the watchdog timer, see the technical reference and installation guide for your board product. See Accessing Sun Documentation for information on accessing this documentation.
This chapter includes the following sections:
There are two watchdog timers:
This section describes one of the many options the user can select for the actions of WD1 and WD2.
Each tick represents 100 ms. This timer, set to a nonzero value, counts down first. When the timer reaches zero and the interrupt option is enabled, a warning is sent to the host CPU through EBus and the WD2 pre-timeout counter is set to a nonzero value. Otherwise, the SMC resets the host CPU immediately, provided that the reset option is enabled.
Each tick represents one second. This timer is started when the countdown timer reaches zero (if WD1 is set to zero, WD2 starts right away). When the value of this counter reaches zero, the host is reset. If the hard reset option is enabled, no warning is issued prior to reset.
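The two-level sequence can be sketched as a small calculation. The following is a minimal illustration rather than driver code: it assumes the reset option is enabled, and the type name, function name, and counter widths are hypothetical choices made for this sketch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Tick lengths as stated in the text. */
#define WD1_TICK_MS 100u   /* countdown timer: 100 ms per tick */
#define WD2_TICK_MS 1000u  /* pre-timeout timer: 1 second per tick */

/* Hypothetical result type for this sketch; not part of any Sun header. */
typedef struct {
    bool     warning_sent;  /* true if the host is warned before the reset */
    uint32_t warning_ms;    /* time of the warning; valid if warning_sent */
    uint32_t reset_ms;      /* time at which the host CPU is reset */
} wd_timeline_t;

/*
 * Compute when the warning and the reset occur, measured in milliseconds
 * from the moment the countdown timer starts. When the interrupt option is
 * disabled, the reset follows the WD1 expiry immediately (assuming the
 * reset option is enabled).
 */
wd_timeline_t wd_timeline(uint16_t wd1_ticks, uint8_t wd2_ticks,
                          bool interrupt_enabled)
{
    wd_timeline_t t;
    uint32_t wd1_ms = (uint32_t)wd1_ticks * WD1_TICK_MS;

    t.warning_sent = interrupt_enabled;
    t.warning_ms = interrupt_enabled ? wd1_ms : 0;
    t.reset_ms = interrupt_enabled
        ? wd1_ms + (uint32_t)wd2_ticks * WD2_TICK_MS
        : wd1_ms;
    return t;
}
```

With 6000 WD1 ticks and 80 WD2 ticks, the warning arrives after 600 seconds and the reset after 680 seconds when the interrupt option is enabled.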
The watchdog driver is a loadable STREAMS pseudo driver layered atop the Netra CP2000/CP2100 series service processor hardware. This driver implements a standardized watchdog timer function that can be used by systems management software for a number of systems timeout tasks.
The systems management software that uses the watchdog driver has access to two independent timers, the WD1 timer and the WD2 timer. The WD2 timer is the main timer and is used to detect conditions where the Solaris operating environment hangs. Systems management software starts and periodically restarts the WD2 timer before it expires. If the WD2 timer expires, the watchdog function of the WD2 timer can force the SPARC processor to reset. Alternatively, the WD2 timer can be set to take no action. The maximum range for WD2 is 255 seconds.
The WD1 timer is typically set to a shorter interval than the WD2 timer. User applications can examine the expiration status of the WD1 timer to get advance warning if the main timer, WD2, is about to expire. The system management software has to start WD1 before it can start WD2. If WD1 expires, then WD2 starts only if enabled. The maximum range for WD1 is 6553.5 seconds.
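The stated ranges follow from the tick lengths given earlier in this chapter (100 ms for WD1, one second for WD2): 6553.5 seconds is 65535 ticks of 100 ms, and 255 seconds is 255 ticks of one second. The helpers below are a sketch of the conversion with range checks; the function and macro names are hypothetical and do not come from sys/wd_if.h.

```c
#include <stdint.h>

#define WD1_TICKS_PER_SEC 10u     /* one WD1 tick is 100 ms */
#define WD1_MAX_TICKS     65535u  /* 6553.5 seconds at 100 ms per tick */
#define WD2_MAX_TICKS     255u    /* 255 seconds at 1 second per tick */

/* Convert whole seconds to WD1 ticks; returns -1 if the value is out of range. */
int32_t wd1_seconds_to_ticks(uint32_t seconds)
{
    if (seconds > WD1_MAX_TICKS / WD1_TICKS_PER_SEC)  /* more than 6553 s */
        return -1;
    return (int32_t)(seconds * WD1_TICKS_PER_SEC);
}

/* Convert whole seconds to WD2 ticks; returns -1 if the value is out of range. */
int32_t wd2_seconds_to_ticks(uint32_t seconds)
{
    if (seconds > WD2_MAX_TICKS)
        return -1;
    return (int32_t)seconds;
}
```

For example, wd1_seconds_to_ticks(600) yields 6000 (0x1770), and wd2_seconds_to_ticks(80) yields 80 (0x50).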
The application programming interface (API) exported by the watchdog driver is based on input/output controls (IOCTLs). The watchdog driver is an exclusive-use device. If the device has already been opened, subsequent opens fail with EBUSY.
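An application can distinguish the exclusive-use failure from other open errors by checking errno. The sketch below uses only the standard open(2) call. The device node name is not given in this chapter, so the caller supplies it; the O_RDWR mode and the WD_OPEN_* names are assumptions made for illustration.

```c
#include <errno.h>
#include <fcntl.h>

/* Hypothetical return codes for this sketch. */
#define WD_OPEN_BUSY  (-1)  /* another process already holds the device */
#define WD_OPEN_ERROR (-2)  /* any other failure; inspect errno */

/*
 * Try to open the watchdog device node at `path`.
 * Returns a file descriptor (>= 0) on success, or one of the codes above.
 */
int wd_open(const char *path)
{
    int fd = open(path, O_RDWR);  /* open mode is an assumption */

    if (fd >= 0)
        return fd;
    /* EBUSY indicates that the exclusive-use device is already open. */
    return (errno == EBUSY) ? WD_OPEN_BUSY : WD_OPEN_ERROR;
}
```

A caller that receives WD_OPEN_BUSY can report that another watchdog client is already running instead of treating the condition as a hardware fault.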
Operations on the watchdog timers require a call to ioctl(2) using the parameters appropriate to the operation. The watchdog driver exports Input Output Controls (IOCTLs) to start, stop, and get the current status of the watchdog timers.
When the device is initially opened, both watchdog timers, WD1 and WD2, are in the STOPPED state. To start either timer, an application program must use the WIOCSTART command. Once started, the WD1 timer can be stopped by using the WIOCSTOP command. Once started, the WD2 timer cannot be stopped; it can only be restarted. Each watchdog timer takes the default action when it expires.
If the WD1 timer expires and the default action is enabled, WD1 interrupts the SPARC processor. This interrupt is handled and the status of the WD1 timer queried shows the EXPIRED condition. If the default action is disabled, then the WD1 timer is in FREERUN state and no interrupt is delivered to the SPARC processor on expiration.
If the WD2 timer expires and the default action is enabled, WD2 resets the SPARC processor. If the default action is disabled, the WD2 timer is put in FREERUN state and its expiration does not affect the SPARC processor.
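The expiration rules for the two timers reduce to a small decision function. This is a sketch for illustration only; the enum and function names are hypothetical and are not defined by the driver.

```c
#include <stdbool.h>

/* Hypothetical identifiers for this sketch. */
typedef enum { WD_TIMER_1, WD_TIMER_2 } wd_timer_id_t;

typedef enum {
    WD_EFFECT_NONE,       /* timer is in FREERUN state; processor unaffected */
    WD_EFFECT_INTERRUPT,  /* WD1 default action: interrupt the processor */
    WD_EFFECT_RESET       /* WD2 default action: reset the processor */
} wd_effect_t;

/* Return the effect on the SPARC processor when the given timer expires. */
wd_effect_t wd_expiry_effect(wd_timer_id_t timer, bool default_action_enabled)
{
    if (!default_action_enabled)
        return WD_EFFECT_NONE;
    return (timer == WD_TIMER_1) ? WD_EFFECT_INTERRUPT : WD_EFFECT_RESET;
}
```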
On Netra CP2000/CP2100 series boards, the SMC-based watchdog timers are not independent. The WD2 timer is a continuation of the WD1 timer. This implementation has some behavioral consequences that give the Netra CP2000/CP2100 series watchdog timer different semantics. The most obvious difference is that starting one timer when the other timer is active causes the other timer to be restarted with its programmed timeout period.
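The coupling between the timers can be modeled in a few lines. The structure and function names below are hypothetical and exist only to illustrate the restart side effect; they do not reflect the driver's internal data structures.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of one timer; not the driver's data structure. */
typedef struct {
    bool     active;      /* timer has been started */
    uint16_t programmed;  /* timeout period last programmed, in ticks */
    uint16_t remaining;   /* ticks left before expiry */
} wd_model_timer_t;

typedef struct {
    wd_model_timer_t wd1;
    wd_model_timer_t wd2;
} wd_model_t;

/* Start `self` with a new period; rewind `other` if it is already active. */
static void wd_model_start(wd_model_timer_t *self, wd_model_timer_t *other,
                           uint16_t ticks)
{
    self->active = true;
    self->programmed = ticks;
    self->remaining = ticks;
    if (other->active)
        other->remaining = other->programmed;  /* coupling side effect */
}

void wd_model_start_wd1(wd_model_t *m, uint16_t ticks)
{
    wd_model_start(&m->wd1, &m->wd2, ticks);
}

void wd_model_start_wd2(wd_model_t *m, uint16_t ticks)
{
    wd_model_start(&m->wd2, &m->wd1, ticks);
}
```

In this model, starting WD2 while WD1 is partway through its countdown returns WD1 to its full programmed period, which is the behavior an application written for independent timers would not expect.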
The IOCTL-based watchdog timer application programming interface (API) uses a common data structure to communicate all requests and responses between the watchdog timer driver and user applications.
Along with other API definitions, this structure is defined in the include file sys/wd_if.h. The structure, called watchdog_if_t, is provided below for reference.
The following fields are used by the IOCTL interface. The watchdog timer driver does not use the thr_fd and thr_lock fields.
The states that each watchdog timer can assume are listed below. These states are exclusive of each other.
The counter is running, and its associated action (interrupt or system reset) is enabled.
The counter is running, but no associated action is enabled.
In addition to these states, the following modes can become attached to a timer, based on its state:
The watchdog timer driver supports the following input/output control (IOCTL) requests:
This code example retrieves the status of the watchdog timers, then starts both timers:
The watchdog device driver runs only on the following implementations:
By rule, the watchdog driver and its configuration file must reside in the platform-specific driver directory, /platform/implementation/kernel/drv. The value of implementation for a given Netra CP2000/CP2100 board system can be obtained by running the uname(1) command on that machine with the -i option:
This directory contains the wdog.conf driver configuration file. This file controls the boot-time configuration of the watchdog timer driver. The driver is configured through a directive to send a notice to syslog when the WD1 timer interrupt is serviced. The Netra CP2000/CP2100 board implementation requires that the appropriate control directive be placed in wdog.conf.
The format for this directive is as follows:
#
# control to enable syslog notification when a WD1
# interrupt is handled.
# handler-message="on" enables syslog notice.
# handler-message="off" disables syslog notice.
#
handler-message="on"
The OpenBoot PROM provides two environmental parameters, settable at the ok prompt, that control the behavior of the SMC watchdog timer.
These parameters are watchdog-enable? and watchdog-timeout?. The watchdog-enable? parameter is a logical switch with two possible values: true or false.
If watchdog-enable? is set to false, the watchdog timer is disabled at boot time. Once the kernel is booted, applications have the option to start the watchdog timer.
If watchdog-enable? is set to true, the watchdog timer is enabled at boot time with its default actions. The WD1 timer is controlled by the value in the watchdog-timeout variable. When WD1 expires, it sends an asynchronous message to the local CPU and starts the WD2 timer. The default value for the WD2 timer is 1 second. If the WD2 timer expires, it resets the CPU board.
If the watchdog timer is enabled at boot time, it is your responsibility to ensure that an application program is run to periodically restart the WD1 timer. If you fail to do so, the timer expires. The system could be reset when the watchdog timer expires.
Refer to CODE EXAMPLE 1-1 for details on the data structure that is used with watchdog timer programs.
The watchdog operation (the local watchdog) is the watchdog that works between the host CPU and the System Management Controller (SMC).
TABLE 1-1 lists the commands available at the OpenBoot prompt.
When a watchdog reset occurs, the power module is toggled. Thus, the state of the CPU, except for anything stored in nonvolatile memory, is lost. After a watchdog reset occurs and the host CPU is restarted, the host CPU must restart the watchdog timer.
The host CPU must handle a corner case. After the SMC resets the host CPU, the output buffer full (OBF) bit and the OEM1 bit in the EBus status register remain set. Since this is a read-only bit, the SMC cannot reset the bit. The host must ignore the status bits and clear the OBF bit by reading one byte of data from EBus. This action must be performed after a watchdog reset. Otherwise, the host CPU can inadvertently restart the watchdog. For example, if the timer values are set to very low numbers, the board might never boot to the Solaris operating system.
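The post-reset cleanup can be illustrated with a simulated register pair. This sketch does not touch real hardware. The bit positions follow the IPMI KCS status register layout, and whether the EBus status register on this board uses the same layout is an assumption; all names are hypothetical.

```c
#include <stdint.h>

/* Bit positions taken from the IPMI KCS status register layout (assumed). */
#define EBUS_STATUS_OBF   0x01u  /* output buffer full */
#define EBUS_STATUS_OEM1  0x10u  /* OEM-defined bit 1 */

/* Simulated register pair standing in for the real EBus interface. */
typedef struct {
    uint8_t status;
    uint8_t data;
} ebus_sim_t;

/* Reading the data register returns the byte and clears OBF. */
uint8_t ebus_sim_read_data(ebus_sim_t *bus)
{
    bus->status &= (uint8_t)~EBUS_STATUS_OBF;
    return bus->data;
}

/*
 * Post-reset cleanup: discard one stale byte without interpreting the
 * status bits, so that leftover state is not mistaken for a new event.
 */
void wd_post_reset_cleanup(ebus_sim_t *bus)
{
    (void)ebus_sim_read_data(bus);
}
```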
The SMC manages the race condition with an interlock: it does not start the pre-timeout timer unless the warning has been dispatched to the host CPU. The code is set up on the host side after the watchdog warning is issued. Use a Keyboard Controller Style (KCS) command to clear the watchdog interrupt. Using this command is the only way to avoid the selected pre-timeout action, such as a hard reset. This command rewinds the watchdog timer. The host code internally manages the warning, along with the command being sent to the SMC.
If diag-switch? is set to true, the watchdog timing can be affected.
To Set the Watchdog Timer Without Running the Pre-Timeout Timer
The examples below are at the OpenBoot PROM level. After Level 1 expires, the local CPU is put into reset.
1. Set the timer to 10 minutes = 600 sec = 600,000 msec / 100 msec per tick = 6000 ticks = 0x1770.
2. Set the reload values inside the SMC:
This procedure sets the reload values of countdown timer and pre-timeout timer. Following the Level 1 expiry, there are 80 seconds before the reset action.
1. Set the timer to 80 seconds = 0x50.
Set the countdown value to 10 minutes, as in the previous procedure, and set the pre-timeout timer to 80 seconds.
Copyright © 2004, Sun Microsystems, Inc. All Rights Reserved.