CHAPTER 1
Watchdog Timer
The System Management Controller (SMC) on the Netra CP2300 cPSB board implements a two-level watchdog timer. The watchdog timer is used to recover the central processing unit (CPU) in case the CPU freezes.
This chapter provides detailed information on the SMC-based watchdog timer driver and its operation for the Netra CP2300 cPSB board. This chapter also describes the user-level application programming interface (API) and behavior of the Netra CP2300 cPSB board watchdog timer. For functional details of the watchdog timer, see the technical reference and installation guide for your board product. See Accessing Sun Documentation for information on accessing this documentation.
There are two watchdog timers:
WD1 (countdown timer): Each tick represents 100 ms. This timer, when set to a nonzero value, counts down first. When it reaches zero and the interrupt option is enabled, a warning is sent to the SPARC CPU through the ISA bus and the WD2 pre-timeout counter is set to a nonzero value. Otherwise, if the reset option is enabled, the SMC resets the SPARC CPU immediately.
WD2 (pre-timeout timer): Each tick represents one second. This timer starts when the countdown timer reaches zero (if WD1 is set to zero, WD2 starts right away). When this counter reaches zero, the SPARC CPU is reset. If the hard reset option is enabled, no warning is issued before the reset.
The watchdog driver is a loadable STREAMS pseudo driver layered atop the Netra CP2300 cPSB board service processor hardware. This driver implements a standardized watchdog timer function that can be used by systems management software for a number of systems timeout tasks.
The systems management software that uses the watchdog driver has access to two independent timers, the WD1 timer and the WD2 timer. The WD2 is the main timer and is used to detect conditions where the Solaris operating environment hangs. Systems management software starts and periodically restarts the WD2 timer before it expires. If the WD2 timer expires, the watchdog function of the WD2 timer forces the SPARC processor to reset. The maximum range for WD2 is 255 seconds.
The WD1 timer is typically set to a shorter interval than the WD2 timer. User applications can examine the expiration status of the WD1 timer to get advance warning if the main timer, WD2, is about to expire. The system management software has to start WD1 before it can start WD2. If WD1 expires, then WD2 starts only if enabled. The maximum range for WD1 is 6553.5 seconds.
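The tick sizes and maximum ranges above imply simple conversion rules: a WD1 timeout is a count of 100 ms ticks up to 65535 (6553.5 seconds), and a WD2 timeout is a count of one-second ticks up to 255. The following is a minimal sketch of that arithmetic only. The function and macro names are hypothetical and are not part of the watchdog driver API.

```c
#include <assert.h>
#include <stddef.h>

/* Limits derived from the text: WD1 uses 100 ms ticks with a maximum of
 * 6553.5 s (65535 ticks); WD2 uses 1 s ticks with a maximum of 255 s.
 * All names here are illustrative, not part of the driver API. */
#define WD1_TICK_MS   100L
#define WD1_MAX_TICKS 65535L
#define WD2_MAX_TICKS 255L

/* Convert a WD1 timeout in milliseconds to 100 ms ticks.
 * Returns 0 on success, or -1 if the value is negative, is not a
 * multiple of 100 ms, or exceeds the WD1 range. */
int wd1_ms_to_ticks(long ms, unsigned *ticks)
{
    assert(ticks != NULL);
    if (ms < 0 || ms % WD1_TICK_MS != 0)
        return -1;
    if (ms / WD1_TICK_MS > WD1_MAX_TICKS)
        return -1;
    *ticks = (unsigned)(ms / WD1_TICK_MS);
    return 0;
}

/* Convert a WD2 timeout in seconds to 1 s ticks.
 * Returns 0 on success, or -1 if the value is outside 0..255. */
int wd2_sec_to_ticks(long sec, unsigned *ticks)
{
    assert(ticks != NULL);
    if (sec < 0 || sec > WD2_MAX_TICKS)
        return -1;
    *ticks = (unsigned)sec;
    return 0;
}
```

For example, a 10-minute WD1 timeout (600,000 ms) converts to 6000 ticks, while 6,553,600 ms is rejected because it is one tick past the WD1 maximum.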
The application programming interface exported by the watchdog driver is based on input/output controls (IOCTLs). The watchdog driver is an exclusive-use device. If the device has already been opened, subsequent opens fail with EBUSY.
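Because the device is exclusive-use, an application may want to tell "already open elsewhere" apart from other open failures. The sketch below shows one way to do that with open(2) and errno. The helper name is hypothetical, and the device node path is deliberately left as a caller-supplied parameter because this chapter does not name it.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>

/* Open the watchdog device node at `path` (caller-supplied; the actual
 * node name is not given here and must come from your system).
 * Returns the file descriptor, or -1 on failure. When `busy` is not NULL,
 * *busy is set to 1 only if the open failed with EBUSY, meaning another
 * process already holds the exclusive-use device; otherwise it is 0. */
int wd_open_exclusive(const char *path, int *busy)
{
    int fd;

    assert(path != NULL);
    fd = open(path, O_RDWR);
    if (busy != NULL)
        *busy = (fd < 0 && errno == EBUSY);
    return fd;
}
```

A caller that sees the busy flag set can report that another management application owns the watchdog, rather than treating the failure as a missing or broken device.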
Operations on the watchdog timers require a call to ioctl(2) using the parameters appropriate to the operation. The watchdog driver exports Input Output Controls (IOCTLs) to start, stop, and get the current status of the watchdog timers.
When the device is initially opened, both watchdog timers, WD1 and WD2, are in the STOPPED state. To start either timer, an application program must use the WIOCSTART command. Once started, the WD1 timer can be stopped by using the WIOCSTOP command. Once started, the WD2 timer cannot be stopped; it can only be restarted. Each watchdog timer takes the default action when it expires.
If the WD1 timer expires and the default action is enabled, WD1 interrupts the SPARC processor. This interrupt is handled and the status of the WD1 timer queried shows the EXPIRED condition. If the default action is disabled, then the WD1 timer is in FREERUN state and no interrupt is delivered to the SPARC processor on expiration.
If the WD2 timer expires and the default action is enabled, WD2 resets the SPARC processor. If the default action is disabled, the WD2 timer is put in FREERUN state and its expiration does not affect the SPARC processor.
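The expiration rules in the two preceding paragraphs reduce to a small decision table: with the default action enabled, WD1 interrupts the SPARC processor and WD2 resets it; with the default action disabled, the timer is in FREERUN state and the processor is unaffected. The following sketch encodes that table. The enum and function names are hypothetical and do not come from sys/wd_if.h.

```c
#include <assert.h>

/* Illustrative names only; these are not the driver's definitions. */
typedef enum { WD_TIMER_1, WD_TIMER_2 } wd_timer;
typedef enum { WD_NO_EFFECT, WD_INTERRUPT, WD_RESET } wd_effect;

/* Return what the SPARC processor experiences when `timer` expires.
 * action_enabled == 0 corresponds to the FREERUN state. */
wd_effect wd_expiry_effect(wd_timer timer, int action_enabled)
{
    assert(timer == WD_TIMER_1 || timer == WD_TIMER_2);
    if (!action_enabled)
        return WD_NO_EFFECT;   /* FREERUN: expiration is not delivered */
    return (timer == WD_TIMER_1) ? WD_INTERRUPT : WD_RESET;
}
```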
In the Netra CP2300 cPSB board, the SMC-based watchdog timers are not independent. The WD2 timer is a continuation of the WD1 timer. There are some behavioral consequences to this implementation that result in the Netra CP2300 cPSB board watchdog timer having different semantics. The most obvious difference is that starting one timer when the other timer is active causes the other timer to be restarted with its programmed timeout period.
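The coupling described above can be pictured with a small model: starting one timer reloads it, and if the other timer is active, that one is reloaded with its programmed timeout as well. This is a sketch of the stated behavior only, not of the SMC firmware or the driver; every name in it is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of one timer: programmed timeout, ticks left, running flag. */
typedef struct {
    unsigned programmed;
    unsigned remaining;
    int      active;
} wd_model_timer;

typedef struct {
    wd_model_timer wd1;
    wd_model_timer wd2;
} wd_model;

/* Start (or restart) timer `which` (1 or 2). The started timer is
 * reloaded; if the other timer is already active, it is reloaded with
 * its own programmed timeout too, mirroring the behavior in the text. */
void wd_model_start(wd_model *m, int which)
{
    wd_model_timer *self;
    wd_model_timer *other;

    assert(m != NULL && (which == 1 || which == 2));
    self  = (which == 1) ? &m->wd1 : &m->wd2;
    other = (which == 1) ? &m->wd2 : &m->wd1;

    self->remaining = self->programmed;
    self->active = 1;
    if (other->active)
        other->remaining = other->programmed;
}
```

In this model, if WD1 has nearly run down and the application then starts WD2, WD1 goes back to its full programmed timeout, which is the side effect an application written for independent timers would not expect.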
The IOCTL-based watchdog timer application programming interface (API) uses a common data structure to communicate all requests and responses between the watchdog timer driver and user applications.
Along with other API definitions, this structure is defined in the include file sys/wd_if.h. The structure, called watchdog_if_t, is provided below for reference.
The following fields are used by the IOCTL interface. The watchdog timer driver does not use the thr_fd and thr_lock fields.
The states that each watchdog timer can assume are listed below. These states are exclusive of each other.
The counter is running, and its associated action (interrupt or system reset) is enabled.
The counter is running, but no associated action is enabled.
In addition to these states, the following modes can become attached to a timer, based on its state:
The watchdog timer driver supports the following input/output control (IOCTL) requests:
This code example retrieves the status of the watchdog timers, then starts both timers:
The watchdog device driver runs only on the following implementation:
The watchdog configuration file resides in /platform/implementation/kernel/drv. The watchdog driver binary resides in /platform/implementation/kernel/drv/sparcv9. The value of implementation for a given Netra CP2300 cPSB board system can be obtained by running the uname(1) command on that machine with the -i option:
# uname -i
SUNW,Netra-CP2300
The wdog.conf driver configuration file controls the boot-time configuration of the watchdog timer driver. The driver is configured through a directive to send a notice to syslog when the WD1 timer interrupt is serviced. The Netra CP2300 cPSB board implementation requires that the appropriate control directive be placed in wdog.conf.
The format for this directive is as follows:
#
# control to enable syslog notification when a WD1
# interrupt is handled.
# handler-message="on" enables syslog notice.
# handler-message="off" disables syslog notice.
#
handler-message="off";
The OpenBoot PROM provides two environmental parameters, settable at the ok prompt, that control the behavior of the SMC watchdog timer.
These parameters are watchdog-enable? and watchdog-timeout. The watchdog-enable? parameter is a logical switch with two possible values: true or false.
If watchdog-enable? is set to false, the watchdog timer is disabled at boot time. Once the kernel is booted, applications have the option to open and start the watchdog timer.
If watchdog-enable? is set to true, the watchdog timer is enabled at boot time with its default actions, as follows. The WD1 timer is controlled by the value in the watchdog-timeout variable. The default value for watchdog-timeout is 65535, in units of one-tenth of a second (6553.5 seconds). When WD1 expires, it sends an asynchronous message to the SPARC CPU and starts the WD2 timer. The default value for WD2 is one second. If WD2 expires, it resets the system.
If the watchdog timer is enabled at boot time, it is your responsibility to ensure that an application program is run to periodically restart the WD1 timer. If you fail to do so, the watchdog timer may reset the SPARC CPU when the watchdog expires.
For information on the data structure that is used with watchdog timer programs, refer to CODE EXAMPLE 1-1.
The watchdog operation (the local watchdog) is the watchdog that works between the SPARC CPU and System Management Controller (SMC).
Commands for smc are available in the SMC controller device node (/pci@1f,0/pci@1,1/isa@7/sysmgmt@0,8010 alias hsc). Before executing the smc commands, go to the sysmgmt node by executing the following once:
ok dev hsc
TABLE 1-1 lists the commands at the OpenBoot prompt.
When a watchdog reset occurs, the power module is toggled. As a result, all CPU state except what is stored in nonvolatile memory is lost. After the SPARC CPU restarts following a watchdog reset, it must restart the watchdog timer.
The SPARC CPU must also handle a corner case. After the SMC resets the SPARC CPU, the output buffer full (OBF) bit and the OEM1 bit in the ISA bus status register remain set. Because the OBF bit is read-only, the SMC cannot clear it. The SPARC CPU must ignore the status bits and clear the OBF bit by reading one byte of data from the ISA bus. This action must be performed after a watchdog reset. Otherwise, the SPARC CPU can inadvertently restart the watchdog. For example, if the timer values are set to very low numbers, the board might never boot to the Solaris operating system.
The SMC manages the race condition by using an interlock: the SMC does not start the pre-timeout timer unless the warning has been dispatched to the SPARC CPU. The code is set up on the SPARC CPU side after the watchdog warning is issued. Use a Keyboard Controller Style (KCS) command to clear the watchdog interrupt. Using this command is the only way to avoid the selected pre-timeout action, such as a hard reset. This command rewinds the watchdog timer. The application program internally manages the warning, along with the command being sent to the SMC.
If diag-switch? is set to true, the timing for watchdog can be affected.
The examples in this section are performed at the OpenBoot PROM level.
To Set the Watchdog Timer Without Running the Pre-Timeout Timer
In this example, after level one expires, the CPU is reset.
1. Set the timer to 10 minutes = 600 seconds = 600,000 ms / 100 ms per tick = 6000 ticks = 0x1770.
2. Set the reload values inside the SMC:
ok 17 70 ff 0 31 4 smc-set-wdt
ok smc-reset-wdt
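The count computed in step 1 can be checked with a few lines of C. The sketch below splits a 16-bit tick count into its high and low bytes; for 0x1770 these are 0x17 and 0x70, which match the first two values on the smc-set-wdt line above. That correspondence is an observation from this example, not a statement of the command's argument layout, which TABLE 1-1 defines. The function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Split a countdown value (in 100 ms ticks, at most 0xFFFF) into its
 * most significant and least significant bytes. Illustrative helper;
 * not part of any SMC or OpenBoot interface. */
void wd_split_count(unsigned ticks, unsigned char *msb, unsigned char *lsb)
{
    assert(msb != NULL && lsb != NULL);
    assert(ticks <= 0xFFFFu);
    *msb = (unsigned char)((ticks >> 8) & 0xFFu);
    *lsb = (unsigned char)(ticks & 0xFFu);
}
```

For the 10-minute example, 600 seconds at 10 ticks per second gives 6000 ticks, and splitting 6000 yields the byte pair 0x17 and 0x70.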
This procedure sets the reload values of the countdown timer and the pre-timeout timer. In this example, after level one expires, there are 80 seconds before the reset.
1. Set the timer to 80 seconds = 0x50.
Set the countdown value to 10 minutes, as in the previous procedure, and set the pre-timeout timer to 80 seconds.
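Because the pre-timeout timer runs only after the countdown timer expires, the time from the last restart to the reset is the sum of the two intervals. The following sketch works that out for the values in this example, using the tick sizes given earlier in the chapter (100 ms for the countdown timer, one second for the pre-timeout timer). The function name is hypothetical.

```c
#include <assert.h>

/* Milliseconds from the last restart until the reset, assuming the
 * timers are never restarted: the countdown timer (100 ms ticks) runs
 * first, then the pre-timeout timer (1 s ticks). Illustrative helper. */
unsigned long wd_ms_until_reset(unsigned wd1_ticks, unsigned wd2_ticks)
{
    assert(wd1_ticks <= 65535u && wd2_ticks <= 255u);
    return (unsigned long)wd1_ticks * 100UL
         + (unsigned long)wd2_ticks * 1000UL;
}
```

With a countdown value of 0x1770 (600 seconds) and a pre-timeout value of 0x50 (80 seconds), the result is 680,000 ms, so the reset occurs 680 seconds after the last restart.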