This appendix describes common problem situations, the error messages they produce, and suggestions for fixing the problems.
Sun MPI error reporting, including I/O, follows the MPI-2 Standard. By default, errors are reported in the form of standard error classes. These classes and their meanings are listed in TABLE C-1 (for non-I/O MPI) and TABLE C-2 (for MPI I/O) and are also available on the MPI man page.
Three predefined error handlers are available in Sun MPI:
- MPI_ERRORS_RETURN - The default; returns an error code if an error occurs.
- MPI_ERRORS_ARE_FATAL - I/O errors are fatal, and no error code is returned.
- MPI_THROW_EXCEPTION - A special error handler to be used only with C++.
MPI Messages
You can invoke, create, query, and set error handlers, and define your own error classes, codes, and strings, by using any of the following routines:
- MPI_Comm_call_errhandler
- MPI_File_call_errhandler
- MPI_Win_call_errhandler
- MPI_Comm_create_errhandler
- MPI_Comm_get_errhandler
- MPI_Comm_set_errhandler
- MPI_Add_error_class
- MPI_Add_error_code
- MPI_Add_error_string
Messages resulting from an MPI program fall into two categories:
- Error messages - Error messages stem from within MPI. Usually an error message explains why your program cannot complete, and the program aborts.
- Warning messages - Warnings stem from the environment in which you are running your MPI program and are usually sent by MPI_Init(). They are not associated with an aborted program; that is, programs continue to run despite warning messages.
Error Messages
Sun MPI error messages use a standard format:
[x y z] Error in function_name: errclass_string:intern(a):description:unixerrstring
Where
- [x y z] is the process communication identifier, which is present in every error message, and:
- x is the job ID (or jid).
- y is the name of the communicator if a name exists; otherwise it is the address of the opaque object.
- z is the rank of the process.
- function_name is the name of the associated MPI function. It is present in every error message.
- errclass_string is the string associated with the MPI error class. It is present in every error message.
- intern is an internal function. It is optional.
- a is a system call if one is the cause of the error. It is optional.
- description is a description of the error. It is optional.
- unixerrstring is the UNIX error string that describes system call a. It is optional.
Warning Messages
Sun MPI warning messages also use a standard format:
[x y z] Warning message
Where message is a description of the condition that triggered the warning.
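The two message formats above are regular enough to be picked apart by a log filter. The sketch below is one possible way to do that in C, not a Sun-provided facility. It assumes that x and y contain no spaces and that errclass_string contains no colon; the struct, enum, and function names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

enum msg_kind { MSG_OTHER = 0, MSG_ERROR, MSG_WARNING };

struct mpi_msg {
    enum msg_kind kind;
    char jid[32];        /* x: job ID, kept as text */
    char comm[64];       /* y: communicator name or object address */
    int  rank;           /* z: rank of the process */
    char function[64];   /* errors only: function_name */
    char errclass[128];  /* errors only: errclass_string */
    const char *detail;  /* points into the input: optional trailing fields
                            of an error, or the text of a warning */
};

/* Returns 1 if line matches one of the two formats, 0 otherwise. */
int parse_mpi_message(const char *line, struct mpi_msg *out)
{
    int off = -1;
    const char *rest;

    assert(line != NULL && out != NULL);
    *out = (struct mpi_msg){ MSG_OTHER };

    /* "[x y z] " prefix; off stays -1 if the closing bracket is missing. */
    if (sscanf(line, "[%31s %63s %d] %n",
               out->jid, out->comm, &out->rank, &off) != 3 || off < 0)
        return 0;
    rest = line + off;

    if (strncmp(rest, "Error in ", 9) == 0) {
        int used = -1;
        if (sscanf(rest + 9, "%63[^:]: %127[^:\n]%n",
                   out->function, out->errclass, &used) != 2 || used < 0)
            return 0;
        out->kind = MSG_ERROR;
        out->detail = rest + 9 + used;
        if (*out->detail == ':')
            out->detail++;          /* skip separator before optional fields */
        return 1;
    }
    if (strncmp(rest, "Warning", 7) == 0) {
        rest += 7;
        while (*rest == ' ')
            rest++;
        out->kind = MSG_WARNING;
        out->detail = rest;
        return 1;
    }
    return 0;
}
```

Because the optional fields (intern(a), description, unixerrstring) may or may not be present, the sketch leaves them unsplit in detail rather than guessing which ones appear.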
Standard Error Classes
TABLE C-1 lists the error return classes you can encounter in your MPI programs. Error values may also be found in mpi.h (for C), mpif.h (for Fortran), and mpi++.h (for C++).
MPI I/O messages are listed separately, in TABLE C-2.
TABLE C-1 Sun MPI Standard Error Classes
Error Class | Value | Meaning
MPI_SUCCESS | 0 | Successful return code.
MPI_ERR_BUFFER | 1 | Invalid buffer pointer.
MPI_ERR_COUNT | 2 | Invalid count argument.
MPI_ERR_TYPE | 3 | Invalid datatype argument.
MPI_ERR_TAG | 4 | Invalid tag argument.
MPI_ERR_COMM | 5 | Invalid communicator.
MPI_ERR_RANK | 6 | Invalid rank.
MPI_ERR_ROOT | 7 | Invalid root.
MPI_ERR_GROUP | 8 | Null group passed to function.
MPI_ERR_OP | 9 | Invalid operation.
MPI_ERR_TOPOLOGY | 10 | Invalid topology.
MPI_ERR_DIMS | 11 | Illegal dimension argument.
MPI_ERR_ARG | 12 | Invalid argument.
MPI_ERR_UNKNOWN | 13 | Unknown error.
MPI_ERR_TRUNCATE | 14 | Message truncated on receive.
MPI_ERR_OTHER | 15 | Other error; use MPI_Error_string().
MPI_ERR_INTERN | 16 | Internal error code.
MPI_ERR_IN_STATUS | 17 | Look in status for error value.
MPI_ERR_PENDING | 18 | Pending request.
MPI_ERR_REQUEST | 19 | Illegal MPI_Request handle.
MPI_ERR_KEYVAL | 36 | Illegal key value.
MPI_ERR_INFO | 37 | Invalid info object.
MPI_ERR_INFO_KEY | 38 | Illegal info key.
MPI_ERR_INFO_NOKEY | 39 | No such key.
MPI_ERR_INFO_VALUE | 40 | Illegal info value.
MPI_ERR_TIMEDOUT | 41 | Timed out.
MPI_ERR_RESOURCES | 42 | Out of resources.
MPI_ERR_TRANSPORT | 43 | Transport layer error.
MPI_ERR_HANDSHAKE | 44 | Error accepting/connecting.
MPI_ERR_SPAWN | 45 | Error spawning.
MPI_ERR_WIN | 46 | Invalid window.
MPI_ERR_BASE | 47 | Invalid base.
MPI_ERR_SIZE | 48 | Invalid size.
MPI_ERR_DISP | 49 | Invalid displacement.
MPI_ERR_LOCKTYPE | 50 | Invalid lock type.
MPI_ERR_ASSERT | 51 | Invalid assert.
MPI_ERR_RMA_CONFLICT | 52 | Conflicting accesses to window.
MPI_ERR_RMA_SYNC | 53 | Erroneous RMA synchronization.
MPI_ERR_NO_MEM | 54 | Memory exhausted.
MPI_ERR_LASTCODE | 55 | Last error code.
MPI I/O Error Handling
Sun MPI I/O error reporting follows the MPI-2 Standard. By default, errors are reported in the form of standard error codes (found in /opt/SUNWhpc/include/mpi.h). Error classes and their meanings are listed in TABLE C-2. You can also find them in mpif.h (for Fortran) and mpi++.h (for C++).
To change the default file error handler, call MPI_File_set_errhandler() with MPI_FILE_NULL as the file handle; this works even if no file is currently open. To change the error handler for one specific file, call the same routine with that file's handle.
TABLE C-2 Sun MPI I/O Error Classes
Error Class | Value | Meaning
MPI_ERR_FILE | 20 | Bad file handle.
MPI_ERR_NOT_SAME | 21 | Collective argument not identical on all processes.
MPI_ERR_AMODE | 22 | Unsupported amode passed to open.
MPI_ERR_UNSUPPORTED_DATAREP | 23 | Unsupported datarep passed to MPI_File_set_view().
MPI_ERR_UNSUPPORTED_OPERATION | 24 | Unsupported operation, such as seeking on a file that supports only sequential access.
MPI_ERR_NO_SUCH_FILE | 25 | File (or directory) does not exist.
MPI_ERR_FILE_EXISTS | 26 | File exists.
MPI_ERR_BAD_FILE | 27 | Invalid file name (for example, path name too long).
MPI_ERR_ACCESS | 28 | Permission denied.
MPI_ERR_NO_SPACE | 29 | Not enough space.
MPI_ERR_QUOTA | 30 | Quota exceeded.
MPI_ERR_READ_ONLY | 31 | Read-only file system.
MPI_ERR_FILE_IN_USE | 32 | File operation could not be completed, because the file is currently open by a process.
MPI_ERR_DUP_DATAREP | 33 | Conversion functions could not be registered because a data representation identifier that was already defined was passed to MPI_REGISTER_DATAREP.
MPI_ERR_CONVERSION | 34 | An error occurred in a user-supplied data-conversion function.
MPI_ERR_IO | 35 | I/O error.
MPI_ERR_INFO | 37 | Invalid info object.
MPI_ERR_INFO_KEY | 38 | Illegal info key.
MPI_ERR_INFO_NOKEY | 39 | No such key.
MPI_ERR_INFO_VALUE | 40 | Illegal info value.
MPI_ERR_LASTCODE | 55 | Last error code.
Adjusting System V Shared-Memory Limits
When using the Sun MPI library on a cluster equipped with RSM links, memory allocated from System V shared memory is used for process communication.
Because the default limits on System V shared-memory segments are conservative, calls that allocate new segments can fail when they reach these system-imposed limits. When such calls fail, check the values specified for the following variables in /etc/system:
- shmsys:shminfo_shmseg
- shmsys:shminfo_shmmni
- shmsys:shminfo_shmmax
If the values are too low, increase them by editing /etc/system and rebooting the system.
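When checking these values across many nodes, a small helper can extract them from the file's text. The sketch below assumes the usual /etc/system conventions, in which an entry has the form set module:variable=value and a line beginning with * is a comment. The function name parse_shm_setting is hypothetical, and the caller is expected to read /etc/system line by line and decide what counts as too low for its own workload.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: if line is "set shmsys:shminfo_<name>=<value>",
 * copy <name> into name, store the number in *value, and return 1.
 * Comments, blank lines, and other directives return 0. */
int parse_shm_setting(const char *line, char *name, size_t name_len,
                      unsigned long long *value)
{
    static const char prefix[] = "shmsys:shminfo_";
    const size_t prefix_len = sizeof prefix - 1;
    char key[64], val[64], *end;
    unsigned long long parsed;

    assert(line != NULL && name != NULL && value != NULL && name_len > 0);

    while (*line == ' ' || *line == '\t')
        line++;
    if (strncmp(line, "set", 3) != 0 || !isspace((unsigned char)line[3]))
        return 0;
    /* Accept optional spaces around '='. */
    if (sscanf(line + 3, " %63[^= \t\n] = %63s", key, val) != 2)
        return 0;
    if (strncmp(key, prefix, prefix_len) != 0)
        return 0;

    /* Base 0 accepts decimal, 0x hexadecimal, and leading-0 octal. */
    parsed = strtoull(val, &end, 0);
    if (end == val || *end != '\0')
        return 0;

    snprintf(name, name_len, "%s", key + prefix_len);
    *value = parsed;
    return 1;
}
```

A variable that never appears in /etc/system is left at its system default, so the absence of a match is itself worth reporting.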