Sun HPC ClusterTools™ 5 Software Administrator's Guide

817-0083-10



Contents

Preface

1. Introduction

Sun HPC Clusters

Cluster Runtime Environment Daemons

Sun HPC ClusterTools Software

Sun CRE's Integration With Batch Processing Systems

Sun MPI and MPI I/O

Loadable Protocol Modules

Prism Environment

Sun S3L

Related Tools

Sun Compilers

Cluster Console Manager

2. Getting Started

Fundamental Sun CRE Concepts

Cluster of Nodes

Security

Partitions

Load Balancing

Jobs and Processes

Communication Protocols

Activating the Sun HPC ClusterTools Software

Activating Specified Nodes From a Central Host

Activating the Local Node

Verifying Basic Functionality

Check That Nodes Are Up

Create a Default Partition

Verify That Sun CRE Executes Jobs

Verifying MPI Communications

Stopping and Restarting Sun CRE

Stopping and Starting Sun CRE Daemons From a Central Host

Stopping and Starting Sun CRE Daemons on the Local Node

3. Overview of Administration Controls

The Sun CRE Daemons

Master Daemon tm.rdb

Master Daemon tm.mpmd

Master Daemon tm.watchd

Nodal Daemon tm.omd

Nodal Daemon tm.spmd

Spin Daemon tm.spind

RSM Daemon

mpadmin: Administration Interface

Introduction to mpadmin

Understanding Objects, Attributes, and Contexts

Performing Sample mpadmin Tasks

Quitting mpadmin

Cluster Configuration File hpc.conf

Preparing to Edit hpc.conf

Specifying MPI Options

Updating the Sun CRE Database

Authentication and Security

Setting the Sun CRE Cluster Password

Establishing the Current Authentication Method

Setting Up the Default Authentication

Setting Up DES Authentication

Setting Up Kerberos Authentication

4. Cluster Configuration Notes

Nodes

Number of CPUs

Memory

Swap Space

Interconnects

Sun HPC ClusterTools Internode Communication

Network Characteristics

Notes on RSM Setup

Close Integration With Batch Processing Systems

How Close Integration Works

How Close Integration Is Used

Instructions for Enabling Close Integration

5. mpadmin: Detailed Description

mpadmin Syntax

Command-Line Options

-c command - Single Command Option

-f file-name - Take Input From a File

-h - Display Help

-q - Suppress Warning Message

-s cluster-name - Connect to Specified Cluster

-V - Version Display Option

mpadmin Objects, Attributes, and Contexts

mpadmin Objects and Attributes

mpadmin Contexts

mpadmin Command Overview

Types of mpadmin Commands

Configuration Control

Attribute Control

Context Navigation

Information Retrieval

Miscellaneous Commands

Additional mpadmin Functionality

Multiple Commands on a Line

Command Abbreviation

Using mpadmin

Note on Naming Partitions and Custom Attributes

Logging In to the Cluster

Customizing Cluster-Level Attributes

Managing Nodes

Managing Partitions

Setting Custom Attributes

6. hpc.conf Configuration File

ShmemResource Section

Guidelines for Setting Limits

MPIOptions Section

Setting MPI Spin Policy

CREOptions Section

Specifying the Cluster

Logging System Events

Enabling Core Files

Enabling Authentication

Changing the Maximum Number of Published Names

Identifying a Default Resource Manager

Limiting mprun's Ability to Launch Programs in Batch Mode

HPCNodes Section

PMODULES Section

PM Section

NAME Column

RANK Column

AVAIL Column

TCP-IP PM Section

Propagating hpc.conf Information

7. Maintenance and Troubleshooting

Cleaning Up Defunct Sun CRE Jobs

Removing Sun CRE Jobs That Have Exited

Removing Sun CRE Jobs That Have Not Terminated

Killing Orphaned Processes

Cleaning Up After RSM Failures

Using Diagnostics

Using Network Diagnostics

Checking Load Averages

Using Interval Diagnostics

Interpreting Sun CRE Error Messages

Anticipating Common Problems

Understanding Protocol-Related Errors

Errors When Sun CRE Daemons Load Protocol Modules

Errors When Protocol Modules Discover Interfaces

Errors When the RSM Protocol Module Reads MPI Options

Action of the RSM Daemon

Recovering From System Failure

Configuring Out Network Controllers

Using the PM Section

Using the MPIOptions Section

A. Cluster Console Manager Tools

Launching Cluster Console Tools

Common Window

Hosts Menu

Select Hosts Dialog

Options Menu

Help Menu

Text Field

Term Windows

Using the Cluster Console

Administering Configuration Files

The clusters File

The serialports File

Index