CHAPTER 1

Watchdog Timer

The System Management Controller (SMC) on the Netra CP2000/CP2100 board implements a two-level watchdog timer. The watchdog timer is used to recover the central processing unit (CPU) if the CPU freezes.

This chapter provides detailed information on the SMC-based watchdog timer driver and its operation for the Netra CP2000/CP2100 boards. This chapter also describes the user-level application programming interface (API) and behavior of the Netra CP2000/CP2100 board watchdog timer. For functional details of the watchdog timer, see the technical reference and installation guide for your board product. See Accessing Sun Documentation for information on accessing this documentation.



Watchdog Timers

There are two watchdog timers: a 16-bit countdown timer (WD1) and an 8-bit pre-timeout timer (WD2).

This section describes one of the many options that the user can select for the actions of WD1 and WD2.

16-bit Timer (WD1)

Each tick represents 100 ms. This timer, when set to a nonzero value, counts down first. When the timer reaches zero and the interrupt option is enabled, a warning is sent to the host CPU through the EBus and the WD2 pre-timeout counter is set to a nonzero value. Otherwise, if the reset option is enabled, the SMC resets the host CPU immediately.

8-bit Pre-timeout Timer (WD2)

Each tick represents one second. This timer starts when the countdown timer (WD1) reaches zero; if WD1 is set to zero, WD2 starts right away. When the value of this counter reaches zero, the host is reset. If the hard reset option is enabled, no warning is issued prior to the reset.
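The two tick sizes are easy to mix up when computing reload values. The following is a minimal sketch, not part of the SMC firmware or the driver, that converts a timeout into hardware tick counts. The helper names wd1_ticks_from_ms and wd2_ticks_from_sec are hypothetical; the only facts taken from the text are the 100 ms and one-second tick sizes and the 16-bit and 8-bit counter widths.

```c
#include <stdint.h>

/* Tick sizes and counter widths as described in the text above. */
#define	WD1_TICK_MS	100u	/* WD1: one tick = 100 ms, 16-bit counter */
#define	WD1_MAX_TICKS	0xFFFFu
#define	WD2_MAX_TICKS	0xFFu	/* WD2: one tick = 1 second, 8-bit counter */

/*
 * Convert milliseconds to WD1 ticks, rounding up to a whole tick.
 * Returns 0 on success, -1 if the result does not fit in 16 bits.
 * (Hypothetical helper, for illustration only.)
 */
int
wd1_ticks_from_ms(uint32_t ms, uint16_t *ticks)
{
	uint32_t t = ms / WD1_TICK_MS + (ms % WD1_TICK_MS != 0);

	if (t > WD1_MAX_TICKS)
		return (-1);
	*ticks = (uint16_t)t;
	return (0);
}

/*
 * Convert seconds to WD2 ticks. Returns 0 on success, -1 if the
 * value does not fit in 8 bits. (Hypothetical helper.)
 */
int
wd2_ticks_from_sec(uint32_t sec, uint8_t *ticks)
{
	if (sec > WD2_MAX_TICKS)
		return (-1);
	*ticks = (uint8_t)sec;
	return (0);
}
```

With these helpers, 10 minutes (600,000 ms) yields 6000 WD1 ticks (0x1770) and 80 seconds yields 0x50 WD2 ticks, which are the values used in the OpenBoot PROM procedures later in this chapter.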


Watchdog Timer Driver

The watchdog driver is a loadable STREAMS pseudo driver layered atop the Netra CP2000/CP2100 series service processor hardware. This driver implements a standardized watchdog timer function that can be used by systems management software for a number of systems timeout tasks.

The systems management software that uses the watchdog driver has access to two independent timers, the WD1 timer and the WD2 timer. WD2 is the main timer and is used to detect conditions where the Solaris operating environment hangs. Systems management software starts the WD2 timer and periodically restarts it before it expires. If the WD2 timer expires, its watchdog function can force the SPARC processor to reset; alternatively, the WD2 timer can be set to take no action. The maximum range for WD2 is 255 seconds.

The WD1 timer is typically set to a shorter interval than the WD2 timer. User applications can examine the expiration status of the WD1 timer to get advance warning if the main timer, WD2, is about to expire. The system management software has to start WD1 before it can start WD2. If WD1 expires, then WD2 starts only if enabled. The maximum range for WD1 is 6553.5 seconds.

The application programming interface exported by the watchdog driver is input/output control-based (IOCTL-based). The watchdog driver is an exclusive-use device. If the device has already been opened, subsequent opens fail with EBUSY.
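Because the device is exclusive-use, an application will usually want to tell "already in use" apart from other open failures. The following is a minimal sketch under that assumption; wd_open is a hypothetical helper name, and the path is passed in as a parameter (on the target it would be /dev/wd) so that the helper can be exercised on any system.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>

/*
 * Open the watchdog device node. Returns a file descriptor, or -1
 * with errno preserved. EBUSY is reported separately because the
 * driver allows only one open at a time.
 * (Hypothetical helper, for illustration only.)
 */
int
wd_open(const char *path)
{
	int fd = open(path, O_RDWR);

	if (fd < 0) {
		int saved = errno;	/* fprintf may change errno */

		if (saved == EBUSY)
			(void) fprintf(stderr,
			    "%s: already opened by another process\n", path);
		else
			(void) fprintf(stderr, "%s: open failed: %s\n",
			    path, strerror(saved));
		errno = saved;
	}
	return (fd);
}
```

A caller that receives -1 with errno set to EBUSY can report that another management application owns the watchdog instead of treating the failure as a missing device.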


Operations on the Watchdog Timers

Operations on the watchdog timers require a call to ioctl(2) using the parameters appropriate to the operation. The watchdog driver exports input/output controls (IOCTLs) to start, stop, and get the current status of the watchdog timers.

When the device is initially opened, both watchdog timers, WD1 and WD2, are in the STOPPED state. To start either timer, an application program must use the WIOCSTART command. Once started, the WD1 timer can be stopped by using the WIOCSTOP command. Once started, the WD2 timer cannot be stopped; it can only be restarted. Each watchdog timer takes the default action when it expires.

If the WD1 timer expires and the default action is enabled, WD1 interrupts the SPARC processor. The driver handles this interrupt, and a subsequent query of the WD1 timer status shows the EXPIRED condition. If the default action is disabled, the WD1 timer is in the FREERUN state and no interrupt is delivered to the SPARC processor on expiration.

If the WD2 timer expires and the default action is enabled, WD2 resets the SPARC processor. If the default action is disabled, the WD2 timer is put in FREERUN state and its expiration does not affect the SPARC processor.

In the Netra CP2000/CP2100 series board, the SMC-based watchdog timers are not independent. The WD2 timer is a continuation of the WD1 timer. There are some behavioral consequences to this implementation that result in the Netra CP2000/CP2100 series watchdog timer having different semantics. The most obvious difference is that starting one timer when the other timer is active causes the other timer to be restarted with its programmed timeout period.


Parameters Transfer Structure

The IOCTL-based watchdog timer application programming interface (API) uses a common data structure to communicate all requests and responses between the watchdog timer driver and user applications.

Along with other API definitions, this structure is defined in the include file sys/wd_if.h. The structure, called watchdog_if_t, is provided below for reference.

 

 


CODE EXAMPLE 1-1 Include File wd_if.h

#ifndef _SYS_WD_IF_H
#define _SYS_WD_IF_H
 
#pragma ident   "@(#)wd_if.h    1.3     01/12/17 SMI"
 
/*
 * wd_if.h
 * watchdog timer user interface header file.
 */
 
#ifdef  __cplusplus
extern "C" {
#endif
 
/*
 * handy defines:
 */
#define WD1             1               /* wd level 1 */
#define WD2             2               /* wd level 2 */
#define WD3             3               /* wd level 3 */
 
/*
 * state of the counters:
 */
#define FREERUN         0x01            /* counter is running, no intr */
#define EXPIRED         0x02            /* counter has  expired */
#define RUNNING         0x04            /* counter is running, intr is on */
#define STOPPED         0x08            /* counter not started at all */
#define SERVICED        0x10            /* intr was serviced */
 
/*
 * IOCTL related stuff.
 */
/*
 * TIOC ioctls for watchdog control and monitor
 */
#if (!defined(_POSIX_C_SOURCE) && !defined(_XOPEN_SOURCE)) || \
        defined(__EXTENSIONS__)
#define wIOC    ('w' << 8)
#endif /* (!defined(_POSIX_C_SOURCE) && !defined(_XOPEN_SOURCE))... */
 
#define WIOCSTART       (wIOC | 0)      /* start counters */
#define WIOCSTOP        (wIOC | 1)      /* inhibit interrupts (stop) */
#define WIOCGSTAT       (wIOC | 2)      /* get status of counters */
 
 
typedef struct {
        int             thr_fd;         /* wd fd, used in the thread */
        uint8_t         thr_lock;       /* lock for the thread */
        uint8_t         level;          /* wd level */
        uint16_t        count;          /* value to be loaded into limit reg */
        uint16_t        next_count;     /* next lev timer count */
        uint8_t         restart;        /* timer to restart, 0 = stop */
        uint8_t         status[3];      /* status filled in ioctl() */
        uint8_t         inhibit;        /* inhibit timers, bit field */
} watchdog_if_t;
 
/*
 * Bit field defines for the user interface
 * inhibit.
 */
#define WD1_INHIBIT     0x1             /* inhibit timer 1 */
#define WD2_INHIBIT     0x2             /* inhibit timer 2 */
#define WD3_INHIBIT     0x4             /* inhibit timer 3 */
 
#ifdef  __cplusplus
}
#endif
 
#endif  /* _SYS_WD_IF_H */
 

The following fields are used by the IOCTL interface. The watchdog timer driver does not use the thr_fd and thr_lock fields.


level

Select timer to perform operations on: WD1 or WD2

count

The period for the timer specified by level to run before it expires. Legal values lie in the range from 1 to 65534. If the value of count is equal to 0 or -1 (0xFFFF, because count is an unsigned 16-bit field), the timer is set to its default value. The default value for WD1 is 10 seconds and for WD2 it is 15 seconds.

restart

(Optional) Select a timer to start automatically when the timer specified by level expires. Legal values are WD1 or WD2. This timer can be the same or different from that specified by level.

next_count

(Optional) The period for the timer specified by restart to run before it expires. The next_count parameter is subject to the same range and default value rules as count, described above.

inhibit

This is a mechanism for controlling the action taken by a timer when it expires. The inhibit flag is a mask to control the default actions taken on the expiration of each timer. A bit corresponding to each timer determines whether the timer's default action is enabled or disabled. If the corresponding bit in inhibit is zero, then the default action occurs on expiration of that timer; if the bit is set to one, then the default action is disabled. The symbolic names for the control masks, defined in sys/wd_if.h, are WD1_INHIBIT for timer WD1, and WD2_INHIBIT for timer WD2.

status

After a call to ioctl(2) with the WIOCGSTAT command, the status vector reflects the state of each watchdog timer (WD1 and WD2) available on the system. The status vector element status[0] corresponds to the state of WD1 and status[1] corresponds to the state of WD2.
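The field descriptions above can be gathered into a small request builder. This is a minimal sketch rather than driver code: wd_fill_request is a hypothetical helper, and the constants and structure are copied from CODE EXAMPLE 1-1 only so that the fragment compiles on its own (on the target, include sys/wd_if.h instead). Zeroing the structure first is a defensive choice of this sketch, not a documented requirement.

```c
#include <stdint.h>
#include <string.h>

/* Copied from wd_if.h (CODE EXAMPLE 1-1) so the sketch is standalone. */
#define	WD1		1
#define	WD2		2
#define	WD1_INHIBIT	0x1
#define	WD2_INHIBIT	0x2

typedef struct {
	int		thr_fd;
	uint8_t		thr_lock;
	uint8_t		level;
	uint16_t	count;
	uint16_t	next_count;
	uint8_t		restart;
	uint8_t		status[3];
	uint8_t		inhibit;
} watchdog_if_t;

/*
 * Fill in a request for WIOCSTART. Returns 0 on success, or -1 if
 * level or restart is not one of the legal values listed in the text.
 * (Hypothetical helper, for illustration only.)
 */
int
wd_fill_request(watchdog_if_t *req, uint8_t level, uint16_t count,
    uint8_t restart, uint16_t next_count, uint8_t inhibit)
{
	if (level != WD1 && level != WD2)
		return (-1);
	if (restart != 0 && restart != WD1 && restart != WD2)
		return (-1);

	(void) memset(req, 0, sizeof (*req));	/* no stale data in unused fields */
	req->level = level;
	req->count = count;		/* 0 or 0xFFFF selects the default */
	req->restart = restart;		/* 0 = no automatic restart */
	req->next_count = next_count;
	req->inhibit = inhibit;		/* set bit = default action disabled */
	return (0);
}
```

For example, passing WD2_INHIBIT as the inhibit argument would leave the WD1 interrupt enabled while disabling the WD2 reset action.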


 

The states that each watchdog timer can assume are listed below. These states are exclusive of each other.


STOPPED

The counter is not running.

RUNNING

The counter is running, and its associated action (interrupt or system reset) is enabled.

FREERUN

The counter is running, but no associated action is enabled.


In addition to these states, the following modes can become attached to a timer, based on its state:


EXPIRED

This mode is applicable only to the WD1 timer. This mode indicates that the WD1 timer has expired.

SERVICED

This mode is also applicable only to the WD1 timer. This mode indicates that an expiration interrupt has occurred and been serviced by the driver. This mode is cleared once it is reported to the user through WIOCGSTAT. Thus, if two consecutive IOCTL calls using WIOCGSTAT are made by a user program, the driver might return SERVICED for the first IOCTL call, but not for the second.
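When logging the result of WIOCGSTAT, it can help to print a status byte symbolically. The sketch below assumes, based on the bit values in CODE EXAMPLE 1-1, that a mode bit such as EXPIRED can be set alongside a state bit in the same byte. wd_status_str is a hypothetical helper, and the constants are copied from the header so that the fragment compiles on its own.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Copied from wd_if.h (CODE EXAMPLE 1-1) so the sketch is standalone. */
#define	FREERUN		0x01
#define	EXPIRED		0x02
#define	RUNNING		0x04
#define	STOPPED		0x08
#define	SERVICED	0x10

/*
 * Render one status byte as text, for example "RUNNING|EXPIRED".
 * Writes at most len bytes including the terminator; returns buf.
 * (Hypothetical helper, for illustration only.)
 */
char *
wd_status_str(uint8_t status, char *buf, size_t len)
{
	static const struct {
		uint8_t		bit;
		const char	*name;
	} flags[] = {
		{ STOPPED, "STOPPED" }, { RUNNING, "RUNNING" },
		{ FREERUN, "FREERUN" }, { EXPIRED, "EXPIRED" },
		{ SERVICED, "SERVICED" },
	};
	size_t i, used = 0;

	if (len == 0)
		return (buf);
	buf[0] = '\0';
	for (i = 0; i < sizeof (flags) / sizeof (flags[0]); i++) {
		int n;

		if (!(status & flags[i].bit))
			continue;
		n = snprintf(buf + used, len - used, "%s%s",
		    used ? "|" : "", flags[i].name);
		if (n < 0 || (size_t)n >= len - used)
			break;		/* output truncated: stop appending */
		used += (size_t)n;
	}
	return (buf);
}
```

Because SERVICED is cleared once it has been reported, the string for WD1 can differ between two consecutive WIOCGSTAT calls even if the timer state has not changed.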



Input/Output Controls

The watchdog timer driver supports the following input/output control (IOCTL) requests:


WIOCGSTAT

Get the state of all the watchdog timers. If the level field of the watchdog_if_t structure is a valid value (either WD1 or WD2), the WIOCGSTAT IOCTL returns the status of both timers in the status vector of the structure. Getting the status of the timers clears the EXPIRED bit, if set, for the timer specified by the level field of the watchdog_if_t structure, so that each timer expiration event is reported.

WIOCSTART

A few behavioral consequences are associated with the WIOCSTART command. They arise from the fact that the WD1 and WD2 timers are not independent in the Netra CP2000/CP2100 series board implementation. When a WIOCSTART command is issued, the other timer, if already running, is restarted from its programmed initial value. In addition, because the WD2 timer is in a sense an extension of the WD1 timer, it is not permissible to set the count value for WD1 to a value greater than that of an active WD2 timer. Similarly, it is not permissible to set the count value for WD2 to a value less than that of an active WD1 timer. The following rules are applied when setting a timer if the other timer is already active: When WD1 is active, lowering WD2 to a value less than that of WD1 causes WD1 to be lowered to equal WD2. When WD2 is active, raising WD1 to a value greater than that of WD2 raises the value of WD2 to equal WD1.

WIOCSTOP

 

The WIOCSTOP command disables timer expiration actions. The inhibit mask parameter of the watchdog_if_t structure determines which timer is being controlled by WIOCSTOP. The level parameter of the watchdog_if_t structure passed with this command must be a valid watchdog level: either WD1 or WD2. If the watchdog level is not valid, you will receive an error message indicating that the device is not valid. It is possible to stop the WD1 timer if it is running. However, once started, the WD2 timer cannot be stopped and resets the system unless it is prevented from expiration by being periodically restarted.
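The two adjustment rules stated for WIOCSTART can be modelled in a few lines. This is only an illustrative model of the documented behavior, not the driver's implementation: struct wd_counts and wd_apply_start are hypothetical names, a count of 0 is used here to mean "timer not active", and both counts are assumed to be expressed in the same unit.

```c
#include <stdint.h>

#define	WD1	1
#define	WD2	2

/* Model of the two programmed timeouts; 0 means "timer not active". */
struct wd_counts {
	uint16_t	wd1;
	uint16_t	wd2;
};

/*
 * Apply a new count to one timer and adjust the other timer, if it
 * is active, following the two rules quoted in the text. Returns 0,
 * or -1 for an unknown level. (Hypothetical model, for illustration.)
 */
int
wd_apply_start(struct wd_counts *c, int level, uint16_t count)
{
	switch (level) {
	case WD1:
		c->wd1 = count;
		if (c->wd2 != 0 && count > c->wd2)
			c->wd2 = count;	/* WD2 raised to match WD1 */
		return (0);
	case WD2:
		c->wd2 = count;
		if (c->wd1 != 0 && count < c->wd1)
			c->wd1 = count;	/* WD1 lowered to match WD2 */
		return (0);
	default:
		return (-1);
	}
}
```

In this model the relation WD1 <= WD2 holds whenever both timers are active, which is one way to read the statement that WD2 is an extension of WD1.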


Errors


EBUSY

An application program attempted to perform an open(2) on /dev/wd but another application already owned the device.

EFAULT

An invalid pointer to a watchdog_if_t structure was passed as a parameter to ioctl(2).

EINVAL

The IOCTL command passed to the driver was not recognized.

OR

The level parameter of the watchdog_if_t structure is set to an invalid value. Legal values are WD1 or WD2.

OR

The restart parameter of the watchdog_if_t structure is set to an invalid value. Legal values are WD1, WD2, or zero.

ENXIO

The watchdog driver has not been plumbed to communicate with the SMC device driver.
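An application might map these error codes to short diagnostic hints when open(2) or ioctl(2) fails. The following sketch does only that; wd_errno_hint is a hypothetical helper, and the message strings are paraphrases of the descriptions above rather than text produced by the driver.

```c
#include <errno.h>

/*
 * Return a short hint for the errno values documented for the
 * watchdog driver. (Hypothetical helper, for illustration only.)
 */
const char *
wd_errno_hint(int err)
{
	switch (err) {
	case EBUSY:
		return ("device already opened by another application");
	case EFAULT:
		return ("invalid pointer to watchdog_if_t passed to ioctl");
	case EINVAL:
		return ("unrecognized command, or invalid level or restart");
	case ENXIO:
		return ("watchdog driver not plumbed to the SMC driver");
	default:
		return ("unexpected error");
	}
}
```

A typical call site would print the hint next to the output of perror(3C) so that the log shows both the system message and the watchdog-specific cause.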


Example

This code example retrieves the status of the watchdog timers, then starts both timers:


CODE EXAMPLE 1-2 Status of Watchdog Timers and Starting Timers
#include <sys/types.h>
#include <sys/fcntl.h>
#include <sys/wd_if.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <stropts.h>

#define	RES(sec)	(10 * (sec))	/* convert to 0.1 sec resolution */

int
main(void)
{
	int		fd;
	watchdog_if_t	wdog1;
	watchdog_if_t	wdog2;
	int		rperiod = 5;	/* seconds between restarts */

	/*
	 * open the watchdog driver
	 */
	if ((fd = open("/dev/wd", O_RDWR)) < 0) {
		perror("/dev/wd open failed");
		exit(1);
	}

	/*
	 * Clear both request structures so that unused fields do not
	 * carry stack garbage. inhibit = 0 leaves the default
	 * expiration actions enabled.
	 */
	(void) memset(&wdog1, 0, sizeof (wdog1));
	(void) memset(&wdog2, 0, sizeof (wdog2));

	/*
	 * get the status of the timers
	 */
	wdog1.level = WD1;		/* must be a valid value */
	if (ioctl(fd, WIOCGSTAT, &wdog1) < 0) {
		perror("WIOCGSTAT ioctl failed");
		exit(1);
	}

	printf("Status WD1: 0x%x WD2: 0x%x\n",
	    wdog1.status[0], wdog1.status[1]);

	/*
	 * Start WD1 to give advance warning if we don't
	 * respond in 10 seconds. Also, when WD1 expires,
	 * restart it automatically.
	 */
	wdog1.level = WD1;
	wdog1.count = RES(10);		/* 10 sec, resolution of 0.1 sec */
	wdog1.restart = WD1;
	wdog1.next_count = RES(10);	/* 10 sec, resolution of 0.1 sec */

	/*
	 * start the timers ticking...
	 */
	if (ioctl(fd, WIOCSTART, &wdog1) < 0) {
		perror("WIOCSTART ioctl failed");
		exit(1);
	}

	/*
	 * Start WD2 to reset the SPARC processor if we don't
	 * kick it again within 20 seconds.
	 */
	wdog2.level = WD2;
	wdog2.count = RES(20);		/* 20 sec, resolution of 0.1 sec */
	wdog2.restart = 0;

	if (ioctl(fd, WIOCSTART, &wdog2) < 0) {
		perror("WIOCSTART ioctl failed");
		exit(1);
	}

	/*
	 * loop, restarting the timers to prevent RESET
	 */
	for (;;) {
		watchdog_if_t	wstat;

		/*
		 * first sleep for the desired period
		 * before restarting the timer(s)
		 */
		(void) sleep(rperiod);

		/*
		 * setup to get the status of the timers
		 */
		(void) memset(&wstat, 0, sizeof (wstat));
		wstat.level = WD1;	/* must be a valid value */
		if (ioctl(fd, WIOCGSTAT, &wstat) < 0) {
			perror("WIOCGSTAT ioctl failed");
			exit(1);
		}

		/*
		 * If the WD1 timer has expired, take
		 * appropriate action.
		 */
		if (wstat.status[0] & EXPIRED) {
			/* timer expired. shorten sleep? */
			puts("WD1: <EXPIRED>");
		}

		/*
		 * Restart the timers. Starting WD2 also restarts
		 * WD1, because the two timers are not independent.
		 */
		if (ioctl(fd, WIOCSTART, &wdog2) < 0) {
			perror("WIOCSTART ioctl failed");
			exit(1);
		}
	}
	/* NOTREACHED */
}

Configuration

The watchdog device driver runs only on the following implementations:

By rule, the watchdog driver and its configuration file must reside in the platform-specific driver directory, /platform/implementation/kernel/drv. The value of implementation for a given Netra CP2000/CP2100 board system can be obtained by running the uname(1) command on that machine with the -i option:


# uname -i
SUNW,UltraSPARCengine_CP-60

This directory contains the wdog.conf driver configuration file. This file controls the boot-time configuration of the watchdog timer driver. The driver is configured through a directive to send a notice to syslog when the WD1 timer interrupt is serviced. The Netra CP2000/CP2100 board implementation requires that the appropriate control directive be placed in wdog.conf.

The format for this directive is as follows:


        #
        # control to enable syslog notification when a WD1
        # interrupt is handled.
        # handler-message="on" enables syslog notice.
        # handler-message="off" disables syslog notice.
        #
        handler-message="on"
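The location of the configuration file follows directly from the directory rule above. As a small illustration, the sketch below builds that path from an implementation name such as the one printed by uname -i. wd_conf_path is a hypothetical helper; it only formats a string and does not check that the file exists.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Build the path of wdog.conf for a given implementation name.
 * Returns 0 on success, or -1 if the buffer is too small.
 * (Hypothetical helper, for illustration only.)
 */
int
wd_conf_path(const char *implementation, char *buf, size_t len)
{
	int n = snprintf(buf, len, "/platform/%s/kernel/drv/wdog.conf",
	    implementation);

	return ((n < 0 || (size_t)n >= len) ? -1 : 0);
}
```

The caller supplies the implementation string exactly as uname -i reports it, so no platform names are hard-coded in the helper.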

OpenBoot PROM Interface

The OpenBoot PROM provides two environmental parameters, settable at the ok prompt, that control the behavior of the SMC watchdog timer.

These parameters are watchdog-enable? and watchdog-timeout. The watchdog-enable? parameter is a logical switch with two possible values: true or false.

If watchdog-enable? is set to false, the watchdog timer is disabled at boot time. Once the kernel is booted, applications have the option to start the watchdog timer.

If watchdog-enable? is set to true, the watchdog timer is enabled at boot time with its default actions. The WD1 timer is controlled by the value in the watchdog-timeout variable. When WD1 expires, it sends an asynchronous message to the local CPU and starts the WD2 timer. The default value for the WD2 timer is 1 second. If the WD2 timer expires, it resets the CPU board.

If the watchdog timer is enabled at boot time, it is your responsibility to ensure that an application program is run to periodically restart the WD1 timer. If you fail to do so, the timer expires. The system could be reset when the watchdog timer expires.


Data Structure

Refer to CODE EXAMPLE 1-1 for details on the data structure that is used with watchdog timer programs.


Watchdog Operation

The watchdog operation (the local watchdog) is the watchdog that works between the host CPU and System Management Controller (SMC).

Commands at OpenBoot PROM Prompt

TABLE 1-1 lists the commands available at the OpenBoot PROM prompt.


TABLE 1-1 OpenBoot PROM Prompt Commands

Command          Description

smc-get-wdt      Gets the current timer values and other watchdog state bits.
smc-set-wdt      Sets the timer values and other flags. This command is also used to stop watchdog operations.
smc-reset-wdt    Starts the timer countdown. This command is often referred to as the "heartbeat".


Corner Cases

When a watchdog reset occurs, the power module is toggled. As a result, all CPU state, except what is stored in nonvolatile memory, is lost. After the host CPU restarts following a watchdog reset, the host CPU must restart the watchdog timer.

The host CPU must handle one corner case. After the SMC resets the host CPU, the output buffer full (OBF) bit and the OEM1 bit in the EBus status register remain set. Because the OBF bit is read-only, the SMC cannot clear it. The host must ignore the status bits and clear the OBF bit by reading one byte of data from the EBus. This action must be performed after a watchdog reset; otherwise, the host CPU can inadvertently restart the watchdog. For example, if the timer values are set to very low numbers, the board might never boot to the Solaris operating system.

The SMC manages the race condition with an interlock: it does not start the pre-timeout timer unless the warning has been dispatched to the host CPU. The host-side code is invoked after the watchdog warning is issued. Use a Keyboard Controller Style (KCS) command to clear the watchdog interrupt. Using this command is the only way to avoid the selected pre-timeout action, such as a hard reset. This command rewinds the watchdog timer. The host code internally manages the warning, along with the command being sent to the SMC.

If diag-switch? is set to true, the timing for watchdog can be affected.

Setting the Watchdog Timer at OpenBoot PROM


To Set the Watchdog Timer Without Running the Pre-Timeout Timer

The examples below are at the OpenBoot PROM level. After Level 1 expires, the local CPU is put into reset.

1. Set the timer to 10 minutes = 600 sec = 600,000 msec / 100 msec per tick = 6000 ticks = 0x1770.

2. Set the reload values inside the SMC:


ok 17 70 ff 0 31 4 smc-set-wdt

3. Start the watchdog timer:


ok smc-reset-wdt


To Set the Watchdog Timer With the Pre-Timeout Timer

This procedure sets the reload values of the countdown timer and the pre-timeout timer. Following the Level 1 expiry, there are 80 seconds before the reset action.

1. Set the timer to 80 seconds = 0x50.

Set the countdown value to 10 minutes, as in the previous procedure, and set the pre-timeout timer to 80 seconds.


ok 17 70 ff 50 31 4 smc-set-wdt

2. Start the watchdog timer:


ok smc-reset-wdt
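The first two words passed to smc-set-wdt in these procedures (17 70) match the high and low bytes of the 0x1770 countdown value, and the fourth word (0 or 50) matches the pre-timeout value in seconds. Taking that reading as an assumption, the sketch below computes the two countdown bytes for a timeout given in seconds. wd_countdown_bytes is a hypothetical helper; the remaining words (ff, 31, and 4) are not explained in this chapter and are not modelled here.

```c
#include <stdint.h>

/*
 * Split a countdown timeout into the high and low bytes of the
 * 16-bit WD1 tick count (one tick = 100 ms). Returns 0 on success,
 * or -1 if the timeout does not fit in the 16-bit counter.
 * (Hypothetical helper, for illustration only.)
 */
int
wd_countdown_bytes(uint32_t seconds, uint8_t *hi, uint8_t *lo)
{
	uint32_t ticks;

	if (seconds > 6553)		/* 6553 s = 65530 ticks, the largest whole second */
		return (-1);
	ticks = seconds * 10;		/* 10 ticks per second */
	*hi = (uint8_t)(ticks >> 8);
	*lo = (uint8_t)(ticks & 0xFF);
	return (0);
}
```

For 600 seconds the helper returns 0x17 and 0x70, the same two words that appear at the start of the smc-set-wdt command lines above.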


To Stop the Watchdog Timer


ok ff ff ff 0 31 4 smc-set-wdt