CHAPTER 1

Watchdog Timer

The Netra CT system's watchdog service captures catastrophic faults in the Solaris operating environment (OE) running on the node card. The watchdog service reports such faults to the baseboard management controller (BMC) either by sending an IPMI message or by de-asserting the CPU's HEALTHY# signal.

This chapter contains the following sections:


Watchdog Timers

The Netra CT system management controller provides two watchdog timers: the watchdog level 2 (WD2) timer and the watchdog level 1 (WD1) timer. Systems management software starts the timers, and the Solaris OE periodically pats them before they expire. If the WD2 timer expires, its watchdog function can optionally force the SPARC processor to reset. The maximum range for WD2 is 255 seconds.

The WD1 timer is typically set to a shorter interval than the WD2 timer. User applications can examine the expiration status of the WD1 timer to get advance warning if the main timer, WD2, is about to expire. The system management software has to start WD1 before it can start WD2. If WD1 expires, then WD2 starts only if enabled. The maximum range for WD1 is 6553.5 seconds.


PICL Plug-in Module

The watchdog subsystem is managed by a platform information and control library (PICL) plug-in module. This PICL plug-in module provides a set of PICL properties to the system, which enables a Solaris PICL client to specify the attributes of the watchdog system.

To use the PICL API to set the watchdog properties, your application must perform the following sequence:



Note - The following instructions are not server-specific. Check your server documentation for additional software configuration that may be needed with the watchdog timer.



1. Before setting the watchdog timer, disable the primary HEALTHY# signal monitoring for the node card on which the watchdog timer is to be changed.

2. In your application, use the PICL API to disarm, set, and arm the active watchdog timer.

Refer to the picld(1M), libpicl(3LIB), and libpicl(3PICL) man pages for a complete description of the PICL architecture and programming interface. Develop your application to use the PICL programming interface to do the following:

3. Re-enable the primary HEALTHY# signal monitoring on the CPU card in the specified slot.

PICL interfaces for the watchdog plug-in module include the nodes watchdog-controller and watchdog-timer. See TABLE 1-1, TABLE 1-2 and TABLE 1-3 for descriptions of the properties of these nodes.

TABLE 1-1 Watchdog Plug-in Interfaces for Netra CP2300 Board Software

PICL Class            Property    Meaning
watchdog-controller   WdOp        Represents a watchdog subsystem.
watchdog-timer        State       Represents a watchdog timer hardware that belongs to its controller. Each timer depends on the status of its peers to be activated or deactivated.
                      WdTimeout   Timeout for the watchdog timer.
                      WdAction    Action to be taken after the watchdog expires.


 

TABLE 1-2 Properties Under watchdog-controller Node

Property   Operations   Description
WdOp       arm          Activates all timers under the controller with values already set for WdTimeout and WdAction.
           disarm       All active timers under the controller will be stopped.


 

TABLE 1-3 Properties Under watchdog-timer Node

Property       Values                             Description
State          armed                              Indicates timer is armed or running. Cleared by disarm.
               expired                            Indicates timer has expired. Cleared by disarm.
               disarmed                           Default value set at startup time. Indicates timer is disarmed or stopped.
WdTimeout[1]   Varies by system and timer level   Indicates the timer initial countdown value. Should be set prior to arming the timer.
WdAction[2]    none                               Default value. No action is taken.
               alarm                              Send notifications to system alarm hardware by means of HEALTHY#.
               reset                              Perform a soft or hard reset of the system (implementation specific).
               reboot                             Reboot the system.


To identify current settings of watchdog-controller, issue the command prtpicl -v as shown in CODE EXAMPLE 1-1.

CODE EXAMPLE 1-1 Example of watchdog-controller Settings

# prtpicl -v
         <snip>
        watchdog-controller1 (watchdog-controller,3600000729) 
                :wd-op  disarm
                :_class watchdog-controller
                :name   watchdog-controller1 
                   watchdog-level1 (watchdog-timer, 360000073f)
                        :WdAction     alarm
                        :WdTimeout    0x1f4
                        :State        armed 
                        :_class       watchdog-timer 
                        :name  watchdog-level1 
                   watchdog-level2 (watchdog-timer, 3600000742)
                        :WdAction     none 
                        :WdTimeout    0xffff 
                        :State        disarmed 
                        :_class       watchdog-timer 
                        :name  watchdog-level2 
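
The WdTimeout values in this output are printed in hexadecimal. The following is a minimal sketch of how an application might turn such a string back into a number, assuming the value is a millisecond count (as the wdadm help text and the table footnote suggest). The function name wd_timeout_from_text is hypothetical and is not part of the PICL API.

```c
#include <errno.h>
#include <stdlib.h>

/*
 * Convert a WdTimeout string as printed by prtpicl (for example "0x1f4")
 * to a number. Base 0 lets strtol accept both hexadecimal ("0x...") and
 * decimal text. Returns 0 on success, -1 if the text is not a clean number.
 */
int
wd_timeout_from_text(const char *text, long *value_out)
{
        char *end = NULL;
        long v;

        errno = 0;
        v = strtol(text, &end, 0);
        /* reject overflow, empty input, and trailing garbage */
        if (errno != 0 || end == text || *end != '\0')
                return (-1);
        *value_out = v;
        return (0);
}
```

With the values shown above, "0x1f4" converts to 500 and "0xffff" converts to 65535.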
 


Watchdog Node Management Code

CODE EXAMPLE 1-2 contains an example of the code used for managing the watchdog timer nodes. This code can be used to change watchdog timer action and timeout values and also to arm and disarm the watchdog controller.

CODE EXAMPLE 1-2 System Watchdog Node Management Code Example
/*
 * Copyright 2003 Sun Microsystems, Inc.  All rights reserved.
 * Use is subject to license terms.
 */
 
#pragma ident   "@(#)wdadm.c    1.6     03/10/16 SMI"
 
/*
 * This program is used to manage the system watchdog nodes.
 * Please refer to libpicl(3LIB) for information on picl APIs
 * To compile:
 *      cc -o wdadm -lpicl wdadm.c
 */
 
#include <stdio.h>
#include <stdlib.h>
#include <stdarg.h>
#include <string.h>
#include <strings.h>
#include <errno.h>
#include <alloca.h>
#include <libintl.h>
#include <locale.h>
#include <unistd.h>
#include <assert.h>
#include <inttypes.h>
#include <sys/termios.h>
#include <picl.h>
 
/*
 * Error codes
 */
#define EM_USAGE                0
#define EM_INIT                 1
#define EM_GETROOT              2
#define EM_GETPVALBYNAME        3
 
#define USAGE_STR       "Usage:\n"\
                "wdadm -l [<controller_name:timer_name>...]\n"\
                "wdadm -m <controller_name:timer_name> [-t <timeout>]"\
                " [-a <action>]\n"\
                "wdadm -c <controller_name> -o <op>\n"
 
#define DETAILED_HELP   "wdadm  - System Watchdog Controller Administration\n"\
"Description:\n"\
"The operations include displaying status (-l), modifying the values (-m)\n"\
"and executing commands on the watchdog controller (-c).\n"\
"This utility must be run with super user permissions.\n"\
"OPTIONS\n"\
"       -l   list all the watchdog timer nodes.\n"\
"            Each Timer node is denoted as controller:timer\n"\
"            Example:\n"\
"            wdadm -l                   - lists all the nodes\n"\
"            wdadm -l c1:t1 c1:t2       - lists c1:t1 and c1:t2 nodes\n"\
"                     c1  - controller name\n"\
"                     t1  - timer name\n"\
"       -m   modify the timeout and action parameters for a timer node.\n"\
"            Example:\n"\
"            wdadm -m c1:t1 -t <timeout in ms> -a <action>\n"\
"            wdadm -m c1:t1 -t <timeout in ms>\n"\
"            wdadm -m c1:t1 -a <action>\n"\
"            Note: Before using this option, the controller must be\n"\
"                  disarmed (using -c option).\n"\
"       -c   Execute commands on the watchdog controller node\n"\
"            Commands supported are : arm, disarm\n"\
"            Example:\n"\
"            wdadm -c controller -o arm\n"\
"            arms the watchdog controller node called controller\n"
 
#define HEADER          "NAME (controller:timer)\t\tSTATUS"\
                        "\t\tACTION\t\tTIMEOUT\n"
#define PRINT_FORMAT    "\t%-10s\t%-10s\t%d"
#define ILLEGAL_TIMEOUT -999
 
/* watchdog properties */
#define WATCHDOG_ACTION                 "WdAction"
#define WATCHDOG_TIMEOUT                "WdTimeout"
#define WATCHDOG_STATUS                 "State"
#define WATCHDOG_OP                     "WdOp"
#define PICL_WATCHDOG_CONTROLLER        "watchdog-controller"
#define WATCHDOG_DISARMED               "disarmed"
 
/*
 * data structure that will be passed as argument to
 * picl_walk_tree_by_class callback function
 */
typedef struct {
        int start_index;
        int max_index;
        char    **list;
        char    *name;
        char    *action;
        char    *op;
        int32_t timeout;
        int     error_code;
} wdadm_args_t;
 
static  char            *prog;
static picl_nodehdl_t   rooth;
static          int count = 0;
 
/*
 * Error message texts
 */
static  char    *err_msg[] = {
        /* program usage */
        USAGE_STR,                                              /* 0 */
        /* picl call failed messages */
        "picl_initialize failed: %s\n",                         /*  1 */
        "picl_get_root failed: %s\n",                           /*  2 */
        "picl_get_propval_by_name failed: %s\n"                 /*  3 */
};
 
#define NUM_ERROR_CODES 7
/* mapping between picl error codes and errno */
static int error_map[][2] =  {
        {PICL_SUCCESS, 0}, { PICL_FAILURE, -1}, {PICL_VALUETOOBIG, E2BIG},
        {PICL_NODENOTFOUND, ENODEV}, {PICL_PERMDENIED, EPERM},
        {PICL_NOSPACE, ENOMEM}, {PICL_INVALIDARG, EINVAL} };
 
static int
picl2errno(int piclerr)
{
        int i;
        for (i = 0; i < NUM_ERROR_CODES; i++) {
                if (error_map[i][0] == piclerr)
                        return (error_map[i][1]);
        }
        return (-1);
}
 
static void
print_errmsg(char *message, ...)
{
        va_list ap;
 
        va_start(ap, message);
        (void) fprintf(stderr, "%s: ", prog);
        (void) vfprintf(stderr, message, ap);
        va_end(ap);
}
 
/*
 * Print wdadm usage
 */
static void
usage(void)
{
        print_errmsg(gettext(err_msg[EM_USAGE]));
        exit(1);
}
 
/*
 * This function is used to read picl property. The value is copied
 * into vbuf.
 * memory allocated for vbuf must be free'd by caller
 */
static picl_errno_t
wdadm_get_picl_prop(picl_nodehdl_t nodeh, const char *prop_name, void **vbuf)
{
        picl_errno_t    err;
        picl_propinfo_t pinfo;
        picl_prophdl_t  proph;
 
        /* get the information about the property */
        if ((err = picl_get_propinfo_by_name(nodeh, prop_name,
                        &pinfo, &proph)) != PICL_SUCCESS) {
                return (err);
        }
 
        *vbuf = malloc(pinfo.size);
        if (*vbuf == NULL)      /* test the allocation, not the out-pointer */
                return (PICL_NOSPACE);
 
        /* read the property value */
        if ((err = picl_get_propval(proph, *vbuf, pinfo.size)) !=
                        PICL_SUCCESS) {
                return (err);
        }
        return (PICL_SUCCESS);
}
 
/*
 * This function is used to set the value of a picl property
 */
static picl_errno_t
wdadm_set_picl_prop(picl_nodehdl_t nodeh, const char *prop_name,
                                void *vbuf, int size)
{
        picl_errno_t    err;
        picl_propinfo_t pinfo;
        picl_prophdl_t  proph;
        void            *tmp_buf;
 
        if ((err = picl_get_propinfo_by_name(nodeh, prop_name,
                        &pinfo, &proph)) != PICL_SUCCESS) {
                return (err);
        }
 
        tmp_buf = alloca(pinfo.size);
        if (tmp_buf == NULL) {
                return (PICL_NOSPACE);
        }
        if (size > pinfo.size) {
                return (PICL_VALUETOOBIG);
        }
 
        bzero(tmp_buf, pinfo.size);
        (void) memcpy(tmp_buf, vbuf, size);
 
        /* set the property value from the zero-padded copy, not from vbuf */
        if ((err = picl_set_propval(proph, tmp_buf, pinfo.size)) !=
                        PICL_SUCCESS) {
                return (err);
        }
        return (PICL_SUCCESS);
}
 
/*
 * This function prints the timeout, state, action of a
 * watchdog-timer node
 */
static picl_errno_t
print_watchdog_node_props(picl_nodehdl_t nodeh)
{
        int32_t *timeout = NULL;
        char    *action = NULL, *status = NULL;
 
        if (wdadm_get_picl_prop(nodeh, WATCHDOG_TIMEOUT,
                        (void **)&timeout) != PICL_SUCCESS) {
                free(timeout);
                return (PICL_FAILURE);
        }
 
        if (wdadm_get_picl_prop(nodeh, WATCHDOG_STATUS,
                                (void **)&status) != PICL_SUCCESS) {
                free(status);
                free(timeout);
                return (PICL_FAILURE);
        }
 
        if (wdadm_get_picl_prop(nodeh, WATCHDOG_ACTION,
                                (void **)&action) != PICL_SUCCESS) {
                free(status);
                free(timeout);
                free(action);
                return (PICL_FAILURE);
        }
 
        (void) printf(PRINT_FORMAT, status, action, *timeout);
        free(status);
        free(timeout);
        free(action);
        return (PICL_SUCCESS);
}
 
/*
 * This function is the callback function that gets called
 * due to picl_walk_tree_by_class call from print_wd_info function.
 * This function traverses all the watchdog-timer nodes under the given
 * controller and makes a call to print_watchdog_node_props to print
 * the watchdog properties
 */
static int
wd_printf_info(picl_nodehdl_t nodeh, void *args)
{
        int err = PICL_SUCCESS;
        int print = 0, i = 0;
        wdadm_args_t    *wd_arg = NULL;
        picl_nodehdl_t childh, peerh;
        char cntrl_name[PICL_PROPNAMELEN_MAX];
        char wd_name[PICL_PROPNAMELEN_MAX];
        char name[2 * PICL_PROPNAMELEN_MAX];
 
        wd_arg = (wdadm_args_t *)args;
 
        /* get the controller name */
        err = picl_get_propval_by_name(nodeh, PICL_PROP_NAME,
                (void *)cntrl_name, PICL_PROPNAMELEN_MAX);
        if (err != PICL_SUCCESS) {
                print_errmsg(gettext(err_msg[EM_GETPVALBYNAME]),
                    picl_strerror(err));
                return (err);
        }
 
        /* get the first child of controller */
        err = picl_get_propval_by_name(nodeh, PICL_PROP_CHILD,
                &childh, sizeof (picl_nodehdl_t));
        if (err != PICL_SUCCESS) /* This controller has no children */
                return (PICL_WALK_CONTINUE); /* move to next controller */
 
        peerh = childh;
        /* traverse thru all the timer nodes using peer property. */
        do
        {
                /* get the name of watchdog node */
                err = picl_get_propval_by_name(peerh, PICL_PROP_NAME,
                        (void *)wd_name, PICL_PROPNAMELEN_MAX);
                if (err != PICL_SUCCESS) {
                        print_errmsg(gettext(err_msg[EM_GETPVALBYNAME]),
                            picl_strerror(err));
                        return (err);
                }
                (void) sprintf(name, "%s:%s", cntrl_name, wd_name);
 
                if (wd_arg != NULL) {
                        /* check if the node is in the list  to print */
                        for (i = wd_arg->start_index; i < wd_arg->max_index;
                                                                i++) {
                                if (strcmp(wd_arg->list[i], name) == 0) {
                                        print = 1;
                                        break;
                                }
                        }
                }
 
                if (wd_arg == NULL || print) {
                        if (count == 0) {
                                (void) printf("%s", HEADER);
                                count++;
                        }
 
                        (void) printf("%-30s", name);
                        (void) print_watchdog_node_props(peerh);
                        (void) printf("\n");
                        print = 0;
                }
                /* move to next timer node */
                err = picl_get_propval_by_name(peerh, PICL_PROP_PEER,
                        &peerh, sizeof (picl_nodehdl_t));
        } while (err == PICL_SUCCESS);
 
        return (PICL_WALK_CONTINUE); /* move to next controller */
}
 
/*
 * This routine is used to print the information of watchdog nodes
 */
static int
print_wd_info(int argc, char **argv, int optind)
{
        int             err = PICL_SUCCESS;
        wdadm_args_t    *args = NULL;
        wdadm_args_t    wd_args;
 
        if (argc == optind) {
                /* print information of all the nodes */
                args  = NULL;
        } else {
                /* print information of only specified nodes */
                wd_args.list = argv;
                wd_args.start_index = optind;
                wd_args.max_index = argc;
                args  = &wd_args;
        }
        err = picl_walk_tree_by_class(rooth, PICL_WATCHDOG_CONTROLLER,
                (void  *)args, wd_printf_info);
 
        if (count == 0) {
                (void) fprintf(stderr, "%s:Node not found:%d\n",
                        prog, picl2errno(PICL_NODENOTFOUND));
                return (PICL_NODENOTFOUND);
        }
        return (err);
}
 
/*
 * This function is the callback function that gets called
 * due to picl_walk_tree_by_class call from set_wd_params function.
 * This function checks if the given controller node has the watchdog-timer
 * of interest and then changes the timeout and action of that timer.
 */
static int
wd_set_params(picl_nodehdl_t nodeh, void *args)
{
        int err = PICL_SUCCESS;
        char     *ptr = NULL;
        char cntrl_name[PICL_PROPNAMELEN_MAX];
        char wd_name[PICL_PROPNAMELEN_MAX];
        picl_nodehdl_t childh, peerh;
        wdadm_args_t    *wd_arg = NULL;
        char            *status = NULL;
 
        wd_arg = (wdadm_args_t *)args;
        if (wd_arg == NULL || wd_arg->name == NULL)
                return (PICL_WALK_TERMINATE);
 
        /* get the name of the controller */
        err = picl_get_propval_by_name(nodeh, PICL_PROP_NAME,
                (void *)cntrl_name, PICL_PROPNAMELEN_MAX);
        if (err != PICL_SUCCESS) {
                print_errmsg(gettext(err_msg[EM_GETPVALBYNAME]),
                    picl_strerror(err));
                return (err);
        }
 
        /*
         * name is of cntrl:node_name format (user input)
         * do the parsing to extract controller name and watchdog-timer
         * name
         */
        ptr = strchr(wd_arg->name, ':');
        if (ptr == NULL) {
                (void) fprintf(stderr, "%s:Node not found:%d\n",
                        prog, picl2errno(PICL_NODENOTFOUND));
                return (PICL_NODENOTFOUND);
        }
 
        /* check if the controller is of interest (exact match, not prefix) */
        if (strncmp(cntrl_name, wd_arg->name, (ptr - wd_arg->name)) != 0 ||
            cntrl_name[ptr - wd_arg->name] != '\0') {
                return (PICL_WALK_CONTINUE);
        }
 
        err = picl_get_propval_by_name(nodeh, PICL_PROP_CHILD,
                &childh, sizeof (picl_nodehdl_t));
 
        if (err != PICL_SUCCESS)
                return (PICL_WALK_TERMINATE);
 
        ptr++;  /* this points to watchdog node name */
        if (*ptr == '\0') {     /* nothing follows the ':' */
                (void) fprintf(stderr, "%s:Node not found:%d\n",
                        prog, picl2errno(PICL_NODENOTFOUND));
                return (PICL_WALK_TERMINATE);
        }
 
        /* traverse thru the list of timers under this controller */
        peerh = childh;
        do
        {
                /* get the name of watchdog node */
                err = picl_get_propval_by_name(peerh, PICL_PROP_NAME,
                        (void *)wd_name, PICL_PROPNAMELEN_MAX);
                if (err != PICL_SUCCESS) {
                        print_errmsg(gettext(err_msg[EM_GETPVALBYNAME]),
                            picl_strerror(err));
                        return (err);
                }
 
                /* This code segment changes the watchdog timeout and action */
                if (strcmp(ptr, wd_name) == 0) {
                        if ((err = wdadm_get_picl_prop(peerh, WATCHDOG_STATUS,
                                (void **)&status)) != PICL_SUCCESS) {
                                (void) free(status);
                                return (err);
                        }
                        if (strcmp(status, WATCHDOG_DISARMED) != 0) {
                                (void) fprintf(stderr, "%s: Timer is not "
                                        "disarmed, cannot change the "
                                        "parameters\n", prog);
                                (void) free(status);
                                return (PICL_PERMDENIED);
                        }
                        (void) free(status);
 
                        /* set watchdog action */
                        if (wd_arg->action)
                        if ((err = wdadm_set_picl_prop(peerh, WATCHDOG_ACTION,
                                wd_arg->action,
                                strlen(wd_arg->action) + 1)) != PICL_SUCCESS) {
                                        (void) fprintf(stderr, "%s:Error in "
                                        "setting action:%d\n", prog,
                                        picl2errno(err));
                                return (err);
                        }
 
                        /* set watchdog timeout */
                        if (wd_arg->timeout != ILLEGAL_TIMEOUT)
                        if ((err = wdadm_set_picl_prop(peerh, WATCHDOG_TIMEOUT,
                                        (void *)&wd_arg->timeout,
                                        sizeof (wd_arg->timeout))) !=
                                                        PICL_SUCCESS) {
                                        (void) fprintf(stderr, "%s:Error in "
                                        "setting timeout:%d\n", prog,
                                        picl2errno(err));
                                return (err);
                        }
                        return (PICL_WALK_TERMINATE);
                }
                err = picl_get_propval_by_name(peerh, PICL_PROP_PEER,
                        &peerh, sizeof (picl_nodehdl_t));
        } while (err == PICL_SUCCESS);
 
        (void) fprintf(stderr, "%s:Node not found:%d\n",
                prog, picl2errno(PICL_NODENOTFOUND));
        return (PICL_NODENOTFOUND);
}
 
/*
 * This routine gets called to change the watchdog timeout and
 * action.
 * wd_name is of "controller:watchdog-timer" format
 */
static int
set_wd_params(char *wd_name, char *action, char *timeout)
{
        int             err = PICL_SUCCESS;
        char            *ptr = NULL;
        wdadm_args_t    wd_arg;
 
        if (wd_name == NULL) {
                return (PICL_INVALIDARG);
        }
 
        ptr = strchr(wd_name, ':');
        if (ptr == NULL) {      /* invalid format */
                (void) fprintf(stderr, "%s:Node not found:%d\n",
                        prog, picl2errno(PICL_NODENOTFOUND));
                return (PICL_NODENOTFOUND);
        }
 
        wd_arg.name = wd_name;
        wd_arg.action = action;
        wd_arg.error_code = 0;
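        if (timeout) {
                char *endp = NULL;

                errno = 0;
                wd_arg.timeout = strtol(timeout, &endp, 10);
                /* reject out-of-range, empty, and non-numeric input */
                if (errno != 0 || endp == timeout || *endp != '\0') {
                        (void) fprintf(stderr, "%s:Illegal timeout value\n",
                                                                prog);
                        return (PICL_INVALIDARG);
                }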
        } else {
                wd_arg.timeout = ILLEGAL_TIMEOUT; /* need not program timeout */
        }
 
        err = picl_walk_tree_by_class(rooth, PICL_WATCHDOG_CONTROLLER,
                (void  *)&wd_arg, wd_set_params);
        return (err);
}
 
/*
 * This is the callback function that gets called due to
 * picl_walk_tree_by_class function call from control_wd function.
 * This function is used to arm/disarm the watchdog controller.
 */
static int
wd_change_state(picl_nodehdl_t nodeh, void *arg)
{
        int err = PICL_SUCCESS;
        char cntrl_name[PICL_PROPNAMELEN_MAX];
        wdadm_args_t    *wd_arg = NULL;
 
        wd_arg = (wdadm_args_t *)arg;
        if (wd_arg == NULL || wd_arg->name == NULL)
                return (PICL_WALK_TERMINATE);
 
        err = picl_get_propval_by_name(nodeh, PICL_PROP_NAME,
                (void *)cntrl_name, PICL_PROPNAMELEN_MAX);
        if (err != PICL_SUCCESS) {
                print_errmsg(gettext(err_msg[EM_GETPVALBYNAME]),
                    picl_strerror(err));
                return (err);
        }
 
        /*
         * check to see if the controller is of interest, otherwise
         * move to the next controller.
         */
        if (strcmp(cntrl_name, wd_arg->name) != 0) {
                return (PICL_WALK_CONTINUE);
        }
 
        count++;
        /* change the watchdog-controller's WdOp property */
        if ((err = wdadm_set_picl_prop(nodeh, WATCHDOG_OP,
                wd_arg->op, strlen(wd_arg->op) + 1)) != PICL_SUCCESS) {
                        (void) fprintf(stderr, "%s:Failed:%d\n", prog,
                                        picl2errno(err));
        }
        return (err);
}
 
/*
 * Function is used to disarm/arm the watchdog controller
 */
static int
control_wd(char *cntrl_name, char *op)
{
        wdadm_args_t    wd_arg;
        int err = PICL_SUCCESS;
 
        if (cntrl_name == NULL || op == NULL) {
                (void) fprintf(stderr, "%s:Invalid arguments\n", prog);
                return (PICL_INVALIDARG);
        }
        wd_arg.name = cntrl_name;
        wd_arg.op = op;
        wd_arg.error_code = 1;
        err = picl_walk_tree_by_class(rooth, PICL_WATCHDOG_CONTROLLER,
                (void  *)&wd_arg, wd_change_state);
 
        if (count == 0) {
                (void) fprintf(stderr, "%s:Invalid controller name\n",
                                                                prog);
                return (PICL_NODENOTFOUND);
        }
 
        return (err);
}
 
int
main(int argc, char **argv)
{
        int             err;
        int             c, rc = 0;
        char            cntrl_name[PICL_CLASSNAMELEN_MAX];
        char            op[PICL_CLASSNAMELEN_MAX];
        char            wd_name[PICL_CLASSNAMELEN_MAX];
        char            timeout[PICL_CLASSNAMELEN_MAX];
        char            action[PICL_CLASSNAMELEN_MAX];
        int             cflg = 0, oflg = 0, lflg = 0;
        int             mflg = 0, tflg = 0, aflg = 0;
 
        (void) setlocale(LC_ALL, "");
        if ((prog = strrchr(argv[0], '/')) == NULL)
                prog = argv[0];
        else
                prog++;
 
        bzero(timeout, PICL_CLASSNAMELEN_MAX);
        bzero(action, PICL_CLASSNAMELEN_MAX);
 
        while ((c = getopt(argc, argv, "hlc:o:m:t:a:")) != EOF) {
                switch (c) {
                case 'l':
                        lflg = 1;
                        break;
                case 'c':
                        cflg = 1;
                        (void) strlcpy(cntrl_name, optarg,
                            PICL_CLASSNAMELEN_MAX);
                        break;
                case 'o':
                        oflg = 1;
                        (void) strlcpy(op, optarg,
                            PICL_CLASSNAMELEN_MAX);
                        break;
                case 'm':
                        mflg = 1;
                        (void) strlcpy(wd_name, optarg,
                            PICL_CLASSNAMELEN_MAX);
                        break;
                case 't':
                        tflg = 1;
                        (void) strlcpy(timeout, optarg,
                            PICL_CLASSNAMELEN_MAX);
                        break;
                case 'a':
                        aflg = 1;
                        (void) strlcpy(action, optarg,
                            PICL_CLASSNAMELEN_MAX);
                        break;
                case 'h':
                        (void) printf("%s\n", USAGE_STR);
                        (void) printf("%s", DETAILED_HELP);
                        exit(0);
                case '?': /*FALLTHROUGH*/
                default:
                        usage();
                        /*NOTREACHED*/
                }
        }
 
        /* check if more than one action is specified */
        if ((lflg + cflg + mflg) > 1) {
                (void) printf("wdadm: more than one action "
                        "specified (-l,-m,-c)\n");
                usage();
        }
 
        if ((lflg + cflg + mflg) == 0) {
                /* if no args are specified, default action is listing */
                lflg++;
        }
 
        err = picl_initialize();
        if (err != PICL_SUCCESS) {
                print_errmsg(gettext(err_msg[EM_INIT]), picl_strerror(err));
                exit(1);
        }
 
        err = picl_get_root(&rooth);
        if (err != PICL_SUCCESS) {
                print_errmsg(gettext(err_msg[EM_GETROOT]),
                    picl_strerror(err));
                (void) picl_shutdown();
                exit(1);
        }
 
        if (lflg) {
                rc = print_wd_info(argc, argv, optind);
                (void) picl_shutdown();
                return (picl2errno(rc));
        }
 
        if (argc != optind) {
                (void) picl_shutdown();
                usage();
        }
 
        if (mflg) {
                if ((aflg + tflg) < 1) {
                        /*
                         * m flag must be associated with at least
                         * action or timeout
                         */
                        (void) printf("wdadm: timeout and action values "
                                "are missing\n");
                        (void) picl_shutdown();
                        usage();
                }
                rc = set_wd_params(wd_name, (aflg ? action : NULL),
                                (tflg ? timeout : NULL));
        }
 
        if (cflg) {
                if (oflg == 0) {
                        /* operation must be specified along with c option */
                        (void) printf("wdadm: operation argument is missing\n");
                        (void) picl_shutdown();
                        usage();
                }
                rc = control_wd(cntrl_name, op);
        }
        (void) picl_shutdown();
        return (picl2errno(rc));
}


OpenBoot PROM Interface

The OpenBoot PROM provides two environmental parameters, settable at the ok prompt, that control the behavior of the SMC watchdog timer.

These parameters are watchdog-enable? and watchdog-timeout. The watchdog-enable? parameter is a logical switch with two possible values: true or false.

If watchdog-enable? is set to false, the watchdog timer is disabled at boot time. Once the kernel is booted, applications have the option to open and start the watchdog timer.

If watchdog-enable? is set to true, the watchdog timer is enabled at boot time with its default actions, as follows. The WD1 timer is controlled by the value in the watchdog-timeout variable. The default value for watchdog-timeout is 65535 (in units of one-tenth of a second). When WD1 expires, it sends an asynchronous message to the SPARC CPU and starts the WD2 timer. The default value for WD2 is one second. If WD2 expires, it resets the system.

If the watchdog timer is enabled at boot time, it is your responsibility to ensure that an application program is run to periodically restart the WD1 timer. If you fail to do so, the watchdog timer may reset the SPARC CPU when the watchdog expires.
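
The boot-time behavior described above can be pictured as a small state machine. The following is a simplified model only, not SMC firmware: the type and function names are hypothetical, both timers are counted in 0.1 second ticks for convenience, and the model assumes that a pat received after the WD1 warning returns the watchdog to normal counting.

```c
#include <stdint.h>

/* States of the simplified two-stage watchdog model. */
enum wd_state { WD_COUNTING, WD_WARNED, WD_RESET };

struct wd_model {
        uint32_t wd1_reload;    /* WD1 countdown, in 0.1 s ticks */
        uint32_t wd2_reload;    /* WD2 countdown, in 0.1 s ticks */
        uint32_t remaining;     /* ticks left in the current stage */
        enum wd_state state;
};

/* Load both reload values and begin counting down WD1. */
void
wd_start(struct wd_model *wd, uint32_t wd1_ticks, uint32_t wd2_ticks)
{
        wd->wd1_reload = wd1_ticks;
        wd->wd2_reload = wd2_ticks;
        wd->remaining = wd1_ticks;
        wd->state = WD_COUNTING;
}

/* The heartbeat: reload WD1 unless the reset has already happened. */
void
wd_pat(struct wd_model *wd)
{
        if (wd->state != WD_RESET) {
                wd->remaining = wd->wd1_reload;
                wd->state = WD_COUNTING;
        }
}

/* Advance the model by one 0.1 s tick and return the new state. */
enum wd_state
wd_tick(struct wd_model *wd)
{
        if (wd->state == WD_RESET)
                return (wd->state);
        if (wd->remaining > 0)
                wd->remaining--;
        if (wd->remaining == 0) {
                if (wd->state == WD_COUNTING) {
                        /* WD1 expired: warn the CPU and start WD2 */
                        wd->state = WD_WARNED;
                        wd->remaining = wd->wd2_reload;
                } else {
                        /* WD2 expired: the system is reset */
                        wd->state = WD_RESET;
                }
        }
        return (wd->state);
}
```

In this model, an application that calls wd_pat more often than every wd1_reload ticks never leaves the WD_COUNTING state. One that stops patting reaches WD_RESET after wd1_reload + wd2_reload ticks.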


Watchdog Operation

The watchdog operation (the local watchdog) is the watchdog that works between the SPARC CPU and System Management Controller (SMC).

Commands at OpenBoot PROM Prompt

Commands for the SMC are available in the SMC controller device node (/pci@1f,0/pci@1,1/isa@7/sysmgmt@0,8010, alias hsc). Before executing the smc commands, go to the sysmgmt node by executing the following once:

ok dev hsc

TABLE 1-4 lists the commands available at the OpenBoot PROM prompt.

TABLE 1-4 OpenBoot PROM Prompt Commands

Command         Description
smc-get-wdt     Gets the current timer values and other watchdog state bits.
smc-set-wdt     Sets the timer values and other flags. This command is also used to stop watchdog operations.
smc-reset-wdt   Starts the timer countdown. This command is often referred to as the "heartbeat".


Corner Cases

When a watchdog reset occurs, the power module is toggled. Thus, the state of the CPU, except for what is stored in nonvolatile memory, is lost. After a watchdog reset, once the SPARC CPU has restarted, the SPARC CPU must restart the watchdog timer.

The SPARC CPU must handle one corner case. After the SMC resets the SPARC CPU, the output buffer full (OBF) bit and the OEM1 bit in the ISA bus status register remain set. Since this is a read-only bit, the SMC cannot reset the bit. The SPARC CPU must ignore the status bits and clear the OBF bit by reading one byte of data from the ISA bus. This action must be performed after a watchdog reset. Otherwise, the SPARC CPU can inadvertently restart the watchdog. For example, if the timer values are set to very low numbers, the board might never boot to the Solaris operating system.

The SMC manages the race condition by using an interlock. The SMC does not start the pre-timeout timer unless the warning has been dispatched to the SPARC CPU. The code is set up on the SPARC CPU side after the watchdog warning is issued. Use a Keyboard Controller Style (KCS) command to clear the watchdog interrupt. Using this command is the only way to avoid the selected pre-timeout action, such as a hard reset. This command rewinds the watchdog timer. The application program internally manages the warning, along with the command being sent to the SMC.

If diag-switch? is set to true, the timing for watchdog can be affected.

Setting the Watchdog Timer at OpenBoot PROM

The examples in this section are performed at the OpenBoot PROM level.


To Set the Watchdog Timer Without Running the Pre-Timeout Timer

In this example, after level one expires, the CPU is reset.

1. Set the timer to 10 minutes = 600 seconds = 6000 units of 100 msec = 0x1770.

2. Set the reload values inside the SMC:

ok 17 70 ff 0 31 4 smc-set-wdt

3. Start the watchdog timer:

ok smc-reset-wdt


To Set the Watchdog Timer With the Pre-Timeout Timer

This procedure sets the reload values of the countdown timer and the pre-timeout timer. In this example, after level one expires, there are 80 seconds before the reset.

1. Set the pre-timeout timer to 80 seconds = 0x50.

Set the countdown value to 10 minutes, as in the previous procedure, and set the pre-timeout timer to 80 seconds.

ok 17 70 ff 50 31 4 smc-set-wdt

2. Start the watchdog timer:

ok smc-reset-wdt
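
The byte values in the two procedures above follow from simple arithmetic: the countdown is expressed in units of 100 msec and split into a high byte and a low byte, and the pre-timeout is expressed in whole seconds. The following sketch reproduces that arithmetic. The helper name smc_set_wdt_cmd is hypothetical, and the remaining arguments (ff, 31, and 4) are copied verbatim from the examples because their meaning is not described in this chapter.

```c
#include <stdio.h>

/*
 * Format an smc-set-wdt command line for a countdown and a pre-timeout,
 * both given in seconds. The countdown becomes a 16-bit count of 100 msec
 * units (high byte first); the pre-timeout stays in seconds.
 * Returns the length written by snprintf, or -1 if a value is out of range.
 */
int
smc_set_wdt_cmd(char *buf, size_t len, unsigned countdown_sec,
    unsigned pretimeout_sec)
{
        unsigned ticks;

        /* 6553 s * 10 still fits in 16 bits; 255 s fits in 8 bits */
        if (countdown_sec > 6553 || pretimeout_sec > 255)
                return (-1);
        ticks = countdown_sec * 10;     /* seconds to 100 msec units */
        return (snprintf(buf, len, "%x %x ff %x 31 4 smc-set-wdt",
            ticks >> 8, ticks & 0xff, pretimeout_sec));
}
```

Calling the helper with 600 and 0 yields the command line used in the first procedure, and calling it with 600 and 80 yields the command line used in the second.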


To Stop the Watchdog Timer

ok ff ff ff 0 31 4 smc-set-wdt

 


[1] A platform might not support a specified timeout resolution. For example, Netra CT systems only take -1, 0, and 100 to 6553500 msec in increments of 100 msec for level 1; and -1, 0, and 1000 to 255000 msec in increments of 1000 msec for level 2.
[2] A specific timer node might not support all action types. For example, the Netra CT watchdog level 1 timer supports only the none, alarm, and reboot actions. The watchdog level 2 timer supports only the none and reset actions.
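
An application can check a requested timeout or action against the Netra CT limits quoted in these footnotes before writing the WdTimeout or WdAction property. The following is a minimal sketch of such a check. The function names are hypothetical, and the limits are hard-coded from the footnotes rather than queried from the platform.

```c
#include <string.h>

/*
 * Return 1 if timeout_ms is acceptable for the given watchdog level
 * (1 or 2) under the Netra CT limits in footnote 1, otherwise 0.
 */
int
wd_timeout_ok(int level, long timeout_ms)
{
        long step, max;

        if (level == 1) {
                step = 100;
                max = 6553500;
        } else if (level == 2) {
                step = 1000;
                max = 255000;
        } else {
                return (0);
        }
        /* -1 and 0 are accepted at both levels */
        if (timeout_ms == -1 || timeout_ms == 0)
                return (1);
        return (timeout_ms >= step && timeout_ms <= max &&
            timeout_ms % step == 0);
}

/*
 * Return 1 if the action name is supported by the given watchdog level
 * under the Netra CT limits in footnote 2, otherwise 0.
 */
int
wd_action_ok(int level, const char *action)
{
        static const char *level1[] = { "none", "alarm", "reboot", NULL };
        static const char *level2[] = { "none", "reset", NULL };
        const char **p;

        if (level != 1 && level != 2)
                return (0);
        for (p = (level == 1) ? level1 : level2; *p != NULL; p++) {
                if (strcmp(*p, action) == 0)
                        return (1);
        }
        return (0);
}
```

For example, a level 1 timeout of 500 msec passes the check, while a level 2 timeout of 500 msec does not, because level 2 accepts only multiples of 1000 msec.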