APPENDIX C

Troubleshooting

This appendix describes some common problems, the error messages they produce, and suggestions for fixing them. Topics include MPI messages, the standard error classes, MPI I/O error handling, and System V shared-memory limits.

Sun MPI error reporting, including I/O, follows the MPI-2 Standard. By default, errors are reported in the form of standard error classes. These classes and their meanings are listed in TABLE C-1 (for non-I/O MPI) and TABLE C-2 (for MPI I/O) and are also available on the MPI man page.

Three predefined error handlers are available in Sun MPI:


MPI Messages

You can make changes to and get information about the error handler by using any of the following routines:

Messages resulting from an MPI program fall into two categories: error messages and warning messages.

Error Messages

Sun MPI error messages use a standard format:

[x y z] Error in function_name: errclass_string:intern(a):description:unixerrstring

Where

Warning Messages

Sun MPI warning messages also use a standard format:

[x y z] Warning message

Where message is a description of the error.
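Both formats above start with the bracketed [x y z] prefix, so a log filter can separate errors from warnings with a small amount of string handling. The following is a minimal sketch in C, not part of Sun MPI: the names parse_mpi_message, struct mpi_msg, and enum msg_kind are hypothetical, and the code assumes that the prefix itself contains no ] character and that the literal text "Error in " or "Warning" follows the prefix exactly as shown in the formats above.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical types for this sketch; they are not defined by Sun MPI. */
enum msg_kind { MSG_OTHER = 0, MSG_ERROR, MSG_WARNING };

struct mpi_msg {
    enum msg_kind kind;
    char ident[128];   /* text between '[' and ']', that is, "x y z" */
    char func[64];     /* function_name (error messages only) */
    const char *rest;  /* remainder of the line; points into the input */
};

/* Copy at most cap-1 characters and always NUL-terminate. */
static void copy_span(char *dst, size_t cap, const char *src, size_t n)
{
    if (n >= cap)
        n = cap - 1;
    memcpy(dst, src, n);
    dst[n] = '\0';
}

/* Classify one line of output and split off the leading fields. */
enum msg_kind parse_mpi_message(const char *line, struct mpi_msg *out)
{
    static const char err_tag[] = "Error in ";
    static const char warn_tag[] = "Warning";
    const char *rb, *p, *colon;

    out->kind = MSG_OTHER;
    out->ident[0] = '\0';
    out->func[0] = '\0';
    out->rest = line;

    /* Every message of interest starts with "[x y z]". */
    if (line[0] != '[' || (rb = strchr(line, ']')) == NULL)
        return MSG_OTHER;
    copy_span(out->ident, sizeof out->ident, line + 1, (size_t)(rb - line - 1));

    p = rb + 1;
    while (*p == ' ')
        p++;

    if (strncmp(p, err_tag, sizeof err_tag - 1) == 0) {
        /* "Error in function_name: ..." */
        p += sizeof err_tag - 1;
        colon = strchr(p, ':');
        if (colon == NULL)
            return MSG_OTHER;
        copy_span(out->func, sizeof out->func, p, (size_t)(colon - p));
        p = colon + 1;
        while (*p == ' ')
            p++;
        out->rest = p;   /* errclass_string and any later fields */
        out->kind = MSG_ERROR;
    } else if (strncmp(p, warn_tag, sizeof warn_tag - 1) == 0) {
        /* "Warning message"; a separating colon is tolerated (assumption). */
        p += sizeof warn_tag - 1;
        while (*p == ' ' || *p == ':')
            p++;
        out->rest = p;
        out->kind = MSG_WARNING;
    }
    return out->kind;
}
```

A caller would pass each line of captured program output to parse_mpi_message() and then, for example, group MSG_ERROR lines by the func field. The colon-separated fields after function_name are left unsplit in rest, because a description may itself contain colons.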

Standard Error Classes

TABLE C-1 lists the error return classes you can encounter in your MPI programs. Error values may also be found in mpi.h (for C), mpif.h (for Fortran), and mpi++.h (for C++).

MPI I/O messages are listed separately, in TABLE C-2.

TABLE C-1 Sun MPI Standard Error Classes

Error Class             Value   Meaning

MPI_SUCCESS             0       Successful return code.
MPI_ERR_BUFFER          1       Invalid buffer pointer.
MPI_ERR_COUNT           2       Invalid count argument.
MPI_ERR_TYPE            3       Invalid datatype argument.
MPI_ERR_TAG             4       Invalid tag argument.
MPI_ERR_COMM            5       Invalid communicator.
MPI_ERR_RANK            6       Invalid rank.
MPI_ERR_ROOT            7       Invalid root.
MPI_ERR_GROUP           8       Null group passed to function.
MPI_ERR_OP              9       Invalid operation.
MPI_ERR_TOPOLOGY        10      Invalid topology.
MPI_ERR_DIMS            11      Illegal dimension argument.
MPI_ERR_ARG             12      Invalid argument.
MPI_ERR_UNKNOWN         13      Unknown error.
MPI_ERR_TRUNCATE        14      Message truncated on receive.
MPI_ERR_OTHER           15      Other error; use MPI_Error_string().
MPI_ERR_INTERN          16      Internal error code.
MPI_ERR_IN_STATUS       17      Look in status for error value.
MPI_ERR_PENDING         18      Pending request.
MPI_ERR_REQUEST         19      Illegal MPI_Request handle.
MPI_ERR_KEYVAL          36      Illegal key value.
MPI_ERR_INFO            37      Invalid info object.
MPI_ERR_INFO_KEY        38      Illegal info key.
MPI_ERR_INFO_NOKEY      39      No such key.
MPI_ERR_INFO_VALUE      40      Illegal info value.
MPI_ERR_TIMEDOUT        41      Timed out.
MPI_ERR_RESOURCES       42      Out of resources.
MPI_ERR_TRANSPORT       43      Transport layer error.
MPI_ERR_HANDSHAKE       44      Error accepting/connecting.
MPI_ERR_SPAWN           45      Error spawning.
MPI_ERR_WIN             46      Invalid window.
MPI_ERR_BASE            47      Invalid base.
MPI_ERR_SIZE            48      Invalid size.
MPI_ERR_DISP            49      Invalid displacement.
MPI_ERR_LOCKTYPE        50      Invalid lock type.
MPI_ERR_ASSERT          51      Invalid assert.
MPI_ERR_RMA_CONFLICT    52      Conflicting accesses to window.
MPI_ERR_RMA_SYNC        53      Erroneous RMA synchronization.
MPI_ERR_NO_MEM          54      Memory exhausted.
MPI_ERR_LASTCODE        55      Last error code.



MPI I/O Error Handling

Sun MPI I/O error reporting follows the MPI-2 Standard. By default, errors are reported in the form of standard error codes (found in /opt/SUNWhpc/include/mpi.h). Error classes and their meanings are listed in TABLE C-2. You can also find them in mpif.h (for Fortran) and mpi++.h (for C++).

You can change the default error handler by calling MPI_File_set_errhandler() with MPI_FILE_NULL as the file handle, even if no file is currently open. To change the error handler for a specific file, call the same routine with that file's handle.

TABLE C-2 Sun MPI I/O Error Classes

Error Class                     Value   Meaning

MPI_ERR_FILE                    20      Bad file handle.
MPI_ERR_NOT_SAME                21      Collective argument not identical on all processes.
MPI_ERR_AMODE                   22      Unsupported amode passed to open.
MPI_ERR_UNSUPPORTED_DATAREP     23      Unsupported datarep passed to MPI_File_set_view().
MPI_ERR_UNSUPPORTED_OPERATION   24      Unsupported operation, such as seeking on a file that supports only sequential access.
MPI_ERR_NO_SUCH_FILE            25      File (or directory) does not exist.
MPI_ERR_FILE_EXISTS             26      File exists.
MPI_ERR_BAD_FILE                27      Invalid file name (for example, path name too long).
MPI_ERR_ACCESS                  28      Permission denied.
MPI_ERR_NO_SPACE                29      Not enough space.
MPI_ERR_QUOTA                   30      Quota exceeded.
MPI_ERR_READ_ONLY               31      Read-only file system.
MPI_ERR_FILE_IN_USE             32      File operation could not be completed because the file is currently open by a process.
MPI_ERR_DUP_DATAREP             33      Conversion functions could not be registered because a data representation identifier that was already defined was passed to MPI_Register_datarep().
MPI_ERR_CONVERSION              34      An error occurred in a user-supplied data-conversion function.
MPI_ERR_IO                      35      I/O error.
MPI_ERR_INFO                    37      Invalid info object.
MPI_ERR_INFO_KEY                38      Illegal info key.
MPI_ERR_INFO_NOKEY              39      No such key.
MPI_ERR_INFO_VALUE              40      Illegal info value.
MPI_ERR_LASTCODE                55      Last error code.



Adjusting System V Shared-Memory Limits

When using the Sun MPI library on a cluster equipped with RSM links, memory allocated from System V shared memory is used for process communication.

Because the maximum number of such shared-memory segments is set to a conservative value by default, calls to allocate new segments can fail when they reach this system-imposed limit. When such calls fail, check the values specified for the following variables in /etc/system:

If the values are too low, increase them by editing /etc/system and rebooting the system.