Rationale for Shell and Utilities

The Open Group Base Specifications Issue 6
IEEE Std 1003.1, 2003 Edition
Copyright © 2001-2003 The IEEE and The Open Group, All Rights reserved.

Introduction

Scope

Refer to Scope.

Conformance

Refer to Conformance.

Normative References

There is no additional rationale provided for this section.

Change History

The change history is provided as an informative section, to track changes from previous issues of IEEE Std 1003.1-2001.

The following sections describe changes made to the Shell and Utilities volume of IEEE Std 1003.1-2001 since Issue 5 of the base document. The CHANGE HISTORY section for each utility describes technical changes made to that utility from Issue 5. Changes between earlier issues of the base document and Issue 5 are not included.

The change history between Issue 5 and Issue 6 also lists the changes since the ISO POSIX-2:1993 standard.

Changes from Issue 5 to Issue 6 (IEEE Std 1003.1-2001)

The following list summarizes the major changes that were made in the Shell and Utilities volume of IEEE Std 1003.1-2001 from Issue 5 to Issue 6:

This volume of IEEE Std 1003.1-2001 is extensively revised so that it can be both an IEEE POSIX Standard and an Open Group Technical Standard.
The terminology has been reworked to meet the style requirements.
Shading notation and margin codes are introduced for identification of options within the volume.
This volume of IEEE Std 1003.1-2001 is updated to mandate support of FIPS 151-2. The following changes were made:
- Support is mandated for the capabilities associated with the following symbolic constants:
```
_POSIX_CHOWN_RESTRICTED
_POSIX_JOB_CONTROL
_POSIX_SAVED_IDS
```
- In the environment for the login shell, the environment variables LOGNAME and HOME shall be defined and have the properties described in the Base Definitions volume of IEEE Std 1003.1-2001, Chapter 8, Environment Variables.
This volume of IEEE Std 1003.1-2001 is updated to align with some features of the Single UNIX Specification.
A new section on Utility Limits is added.
A section on the Relationships to Other Documents is added.
Concepts and definitions have been moved to a separate volume.
A RATIONALE section is added to each reference page.
The c99 utility is added as a replacement for c89, which is withdrawn in this issue.
IEEE Std 1003.2d-1994 is incorporated, adding the qalter, qdel, qhold, qmove, qmsg, qrerun, qrls, qselect, qsig, qstat, and qsub utilities.
IEEE P1003.2b draft standard is incorporated, making extensive updates and adding the iconv utility.
IEEE PASC Interpretations are applied.
The Open Group's corrigenda and resolutions are applied.

New Features in Issue 6

The following table lists the new utilities introduced since the ISO POSIX-2:1993 standard (as modified by IEEE Std 1003.2d-1994). Apart from the c99 and iconv utilities, these are all part of the XSI extension.

New Utilities in Issue 6
admin, c99, cal, cflow, compress, cxref, delta, fuser, gencat, get, hash, iconv, ipcrm, ipcs, link, m4, nl, prs, sact, sccs, tsort, ulimit, uncompress, unget, unlink, uucp, uustat, uux, val, what, zcat

Terminology

Refer to Terminology.

Definitions

Refer to Definitions.

Relationship to Other Documents

System Interfaces

It has been pointed out that the Shell and Utilities volume of IEEE Std 1003.1-2001 assumes that a great deal of functionality from the System Interfaces volume of IEEE Std 1003.1-2001 is present, but never states exactly how much (and strictly does not need to since both are mandated on a conforming system). This section is an attempt to clarify the assumptions.

File Removal

This is intended to be a summary of the unlink() and rmdir() requirements. Note that item 4 can occur when the unlink() function is used.

Concepts Derived from the ISO C Standard

This section was introduced to address the issue that there was insufficient detail presented by such utilities as awk or sh about their procedural control statements and their methods of performing arithmetic functions.

The ISO C standard was selected as a model because most historical implementations of the standard utilities were written in C. Thus, it was more likely that they would act in the desired manner without modification.

Using the ISO C standard is primarily a notational convenience so that the many procedural languages in the Shell and Utilities volume of IEEE Std 1003.1-2001 would not have to be rigorously described in every aspect. Its selection does not require that the standard utilities be written in Standard C; they could be written in Common Usage C, Ada, Pascal, assembler language, or anything else.

The sizes of the various numeric values refer to C-language data types that are allowed to be different sizes by the ISO C standard. Thus, like a C-language application, a shell application cannot rely on their exact size. However, it can rely on their minimum sizes expressed in the ISO C standard, such as {LONG_MAX} for a long type.

The behavior on overflow is undefined for ISO C standard arithmetic. Therefore, the standard utilities can use "bignum" representation for integers so that there is no fixed maximum unless otherwise stated in the utility description. Similarly, standard utilities can use infinite-precision representations for floating-point arithmetic, as long as these representations exceed the ISO C standard requirements.

This section addresses only the issue of semantics; it is not intended to specify syntax. For example, the ISO C standard requires that 0L be recognized as an integer constant equal to zero, but utilities such as awk and sh are not required to recognize 0L (though they are allowed to, as an extension).

The ISO C standard requires that a C compiler must issue a diagnostic for constants that are too large to represent. Most standard utilities are not required to issue these diagnostics; for example, the command:

diff -C 2147483648 file1 file2

has undefined behavior, and the diff utility is not required to issue a diagnostic even if the number 2147483648 cannot be represented.
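Because a utility need not diagnose an out-of-range operand, a cautious script can check the operand itself before invoking the utility. The following is a minimal sketch and not part of the standard: the function name fits_in_int32 and the bound 2147483647 are illustrative assumptions. It compares digit strings rather than relying on shell arithmetic, since the shell's own arithmetic could overflow on the very value being tested.

```sh
# fits_in_int32 N (hypothetical helper): succeed if N is a plain,
# non-negative decimal integer no greater than 2147483647.
fits_in_int32() {
    n=$1
    case $n in
        ''|*[!0-9]*) return 1 ;;    # empty, or contains a non-digit such as "L"
    esac
    # Drop leading zeros so that the digit count is meaningful.
    while [ "${#n}" -gt 1 ]; do
        case $n in
            0*) n=${n#0} ;;
            *) break ;;
        esac
    done
    if [ "${#n}" -lt 10 ]; then return 0; fi
    if [ "${#n}" -gt 10 ]; then return 1; fi
    # Exactly ten digits: compare the two five-digit halves separately.
    hi=${n%?????}
    lo=${n#?????}
    if [ "$hi" -lt 21474 ]; then return 0; fi
    if [ "$hi" -gt 21474 ]; then return 1; fi
    # Prefix a 1 so that a low half with leading zeros is still compared
    # as a decimal number.
    [ "1$lo" -le 183647 ]
}
```

A script might then guard the call with something like `fits_in_int32 "$context" && diff -C "$context" file1 file2`, where the variable name context is again only illustrative.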

Portability

Refer to Portability.

Codes

Refer to Codes.

Utility Limits

This section grew out of an idea that originated with the original POSIX.1, in the tables of system limits for the sysconf() and pathconf() functions. The idea is that a conforming application can be written to use the most restrictive values that a minimal system can provide, but it should not have to. The values provided represent compromises so that some vendors can use historically limited versions of UNIX system utilities. They are the highest values that a strictly conforming application can assume, given no other information.

However, by using the getconf utility or the sysconf() function, the elegant application can be tailored to more liberal values on some of the specific instances of specific implementations.

There is no explicitly stated requirement that an implementation provide finite limits for any of these numeric values; the implementation is free to provide essentially unbounded capabilities (where it makes sense), stopping only at reasonable points such as {ULONG_MAX} (from the ISO C standard). Therefore, applications desiring to tailor themselves to the values on a particular implementation need to be ready for possibly huge values; it may not be a good idea to allocate blindly a buffer for an input line based on the value of {LINE_MAX}, for instance. However, unlike the System Interfaces volume of IEEE Std 1003.1-2001, there is no set of limits that return a special indication meaning "unbounded". The implementation should always return an actual number, even if the number is very large.
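As an illustration of the caution above, a script that sizes a buffer from {LINE_MAX} can clamp the reported value to a ceiling of its own choosing. This is a sketch only: the names APP_LINE_CAP and buf_size and the ceiling of 65536 are assumptions made for the example, and the fallback of 2048 is the minimum value, {_POSIX2_LINE_MAX}.

```sh
# Hypothetical application-chosen ceiling; not a value from the standard.
APP_LINE_CAP=65536

# Fall back to 2048 ({_POSIX2_LINE_MAX}) if getconf is unavailable or
# prints something that is not a plain decimal number.
line_max=$(getconf LINE_MAX 2>/dev/null) || line_max=2048
case $line_max in
    ''|*[!0-9]*) line_max=2048 ;;
esac

# Compare digit counts first so that an enormous value is never handed
# to the shell's integer comparison.
if [ "${#line_max}" -gt "${#APP_LINE_CAP}" ]; then
    buf_size=$APP_LINE_CAP
elif [ "$line_max" -gt "$APP_LINE_CAP" ]; then
    buf_size=$APP_LINE_CAP
else
    buf_size=$line_max
fi
```

The application then works with buf_size and treats longer lines as a case to be handled explicitly, rather than assuming that the implementation's limit is small enough to allocate directly.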

The statement:

"It is not guaranteed that the application ..."

is an indication that many of these limits are designed to ensure that implementors design their utilities without arbitrary constraints related to unimaginative programming. There are certainly conditions under which combinations of options can cause failures that would not render an implementation non-conforming. For example, {EXPR_NEST_MAX} and {ARG_MAX} could collide when expressions are large; combinations of {BC_SCALE_MAX} and {BC_DIM_MAX} could exceed virtual memory.

In the Shell and Utilities volume of IEEE Std 1003.1-2001, the notion of a limit being guaranteed for the process lifetime, as it is in the System Interfaces volume of IEEE Std 1003.1-2001, is not as useful to a shell script. The getconf utility is probably a process itself, so the guarantee would be without value. Therefore, the Shell and Utilities volume of IEEE Std 1003.1-2001 requires the guarantee to be for the session lifetime. This will mean that many vendors will either return very conservative values or possibly implement getconf as a built-in.

It may seem confusing to have limits that apply only to a single utility grouped into one global section. However, the alternative, which would be to disperse them out into their utility description sections, would cause great difficulty when sysconf() and getconf were described. Therefore, the standard developers chose the global approach.

Each language binding could provide symbol names that are slightly different from those shown here. For example, the C-Language Binding option adds a leading underscore to the symbols as a prefix.

The following comments describe selection criteria for the symbols and their values:

{ARG_MAX}: This is defined by the System Interfaces volume of IEEE Std 1003.1-2001. Unfortunately, it is very difficult for a conforming application to deal with this value, as it does not know how much of its argument space is being consumed by the environment variables of the user.
{BC_BASE_MAX}
{BC_DIM_MAX}
{BC_SCALE_MAX}: These were originally one value, {BC_SCALE_MAX}, but it was unreasonable to link all three concepts into one limit.
{CHILD_MAX}: This is defined by the System Interfaces volume of IEEE Std 1003.1-2001.
{COLL_WEIGHTS_MAX}: The weights assigned to order can be considered as "passes" through the collation algorithm.
{EXPR_NEST_MAX}: The value for expression nesting was borrowed from the ISO C standard.
{LINE_MAX}: This is a global limit that affects all utilities, unless otherwise noted. The {MAX_CANON} value from the System Interfaces volume of IEEE Std 1003.1-2001 may further limit input lines from terminals. The {LINE_MAX} value was the subject of much debate and is a compromise between those who wished to have unlimited lines and those who understood that many historical utilities were written with fixed buffers. Frequently, utility writers selected the UNIX system constant BUFSIZ to allocate these buffers; therefore, some utilities were limited to 512 bytes for I/O lines, while others achieved 4096 bytes or greater.
It should be noted that {LINE_MAX} applies only to input line length; there is no requirement in IEEE Std 1003.1-2001 that limits the length of output lines. Utilities such as awk, sed, and paste could theoretically construct lines longer than any of the input lines they received, depending on the options used or the instructions from the application. They are not required to truncate their output to {LINE_MAX}. It is the responsibility of the application to deal with this. If the output of one of those utilities is to be piped into another of the standard utilities, line length restrictions will have to be considered; the fold utility, among others, could be used to ensure that only reasonable line lengths reach utilities or applications.
{LINK_MAX}: This is defined by the System Interfaces volume of IEEE Std 1003.1-2001.
{MAX_CANON}
{MAX_INPUT}
{NAME_MAX}
{NGROUPS_MAX}
{OPEN_MAX}
{PATH_MAX}
{PIPE_BUF}: These limits are defined by the System Interfaces volume of IEEE Std 1003.1-2001. Note that the byte lengths described by some of these values continue to represent bytes, even if the applicable character set uses a multi-byte encoding.
{RE_DUP_MAX}: The value selected is consistent with historical practice. Although the name implies that it applies to all REs, only BREs use the interval notation \{m,n\} addressed by this limit.
{POSIX2_SYMLINKS}: The {POSIX2_SYMLINKS} variable indicates that the underlying operating system supports the creation of symbolic links in specific directories. Many of the utilities defined in IEEE Std 1003.1-2001 that deal with symbolic links do not depend on this value. For example, a utility that follows symbolic links (or does not, as the case may be) will only be affected by a symbolic link if it encounters one. Presumably, a file system that does not support symbolic links will not contain any. This variable does affect such utilities as ln -s and pax that attempt to create symbolic links.
{POSIX2_SYMLINKS} was developed even though there is no comparable configuration value for the system interfaces.
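The use of fold mentioned under {LINE_MAX} above can be sketched as follows. The width of 16 and the variable names are assumptions chosen to keep the example small; a real script would more plausibly derive the width from getconf LINE_MAX.

```sh
# Hypothetical width for illustration only.
width=16

# Build one 50-byte line, longer than the chosen width.
long_line=$(awk 'BEGIN { for (i = 0; i < 50; i++) printf "x"; print "" }')

# fold -b counts bytes rather than column positions; the trailing awk
# reports the length of the longest line that comes out of fold.
longest=$(printf '%s\n' "$long_line" |
    fold -b -w "$width" |
    awk '{ if (length($0) > max) max = length($0) } END { print max + 0 }')
```

Note that fold inserts <newline>s, so this approach suits only data in which the position of line breaks is not significant.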

There are different limits associated with command lines and input to utilities, depending on the method of invocation. In the case of a C program exec-ing a utility, {ARG_MAX} is the underlying limit. In the case of the shell reading a script and exec-ing a utility, {LINE_MAX} limits the length of lines the shell is required to process, and {ARG_MAX} will still be a limit. If a user is entering a command on a terminal to the shell, requesting that it invoke the utility, {MAX_INPUT} may restrict the length of the line that can be given to the shell to a value below {LINE_MAX}.
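Since {ARG_MAX} is shared between the arguments and the environment, a script can make only a rough estimate of the space left for arguments. The sketch below is one such estimate, under stated assumptions: the exact accounting is implementation-specific, the 2048-byte margin is an arbitrary safety allowance chosen for this example, and the fallback of 4096 is the minimum value, {_POSIX_ARG_MAX}.

```sh
# Fall back to 4096 ({_POSIX_ARG_MAX}) if getconf is unavailable or
# prints something that is not a plain decimal number.
arg_max=$(getconf ARG_MAX 2>/dev/null) || arg_max=4096
case $arg_max in
    ''|*[!0-9]*) arg_max=4096 ;;
esac

# Approximate size of the current environment in bytes. The arithmetic
# expansion also discards any padding that wc may print.
env_bytes=$(( $(env | wc -c) + 0 ))

# Hypothetical safety margin; the real overhead per string is not
# something a script can know portably.
margin=2048

headroom=$(( arg_max - env_bytes - margin ))
if [ "$headroom" -lt 0 ]; then
    headroom=0
fi
```

The result is an estimate to be used conservatively, for example when deciding how many operands to pass in a single invocation.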

When an option is supported, getconf returns a value of 1. For example, when C development is supported:

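# A string comparison is used so that a non-numeric reply such as
# "undefined" does not provoke a diagnostic from the test utility.
if [ "$(getconf POSIX2_C_DEV)" = 1 ]; then
    echo C supported
fi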

The sysconf() function in the C-Language Binding option would return 1.

The following comments describe selection criteria for the symbols and their values:

POSIX2_C_BIND
POSIX2_C_DEV
POSIX2_FORT_DEV
POSIX2_FORT_RUN
POSIX2_SW_DEV
POSIX2_UPE: It is possible for some (usually privileged) operations to remove utilities that support these options or otherwise to render these options unsupported. The header files, the sysconf() function, or the getconf utility will not necessarily detect such actions, in which case they should not be considered as rendering the implementation non-conforming. A test suite should not attempt tests such as:

rm /usr/bin/c99
getconf POSIX2_C_DEV

POSIX2_LOCALEDEF: This symbol was introduced to allow implementations to restrict supported locales to only those supplied by the implementation.

IEEE Std 1003.1-2001/Cor 1-2002, item XCU/TC1/D6/2 is applied, deleting the entry for {POSIX2_VERSION} since it is not a utility limit minimum value.

IEEE Std 1003.1-2001/Cor 1-2002, item XCU/TC1/D6/3 is applied, changing the text in Utility Limits from: "utility (see getconf) through the sysconf() function defined in the System Interfaces volume of IEEE Std 1003.1-2001. The literal names shown in Table 1-3 apply only to the getconf utility; the high-level language binding describes the exact form of each name to be used by the interfaces in that binding." to: "utility (see getconf).".

Grammar Conventions

There is no additional rationale provided for this section.

Utility Description Defaults

This section is arranged with headings in the same order as all the utility descriptions. It is a collection of related and unrelated information concerning:

The default actions of utilities
The meanings of notations used in IEEE Std 1003.1-2001 that are specific to individual utility sections

Although this material may seem out of place here, it is important that this information appear before any of the utilities to be described later.

NAME

There is no additional rationale provided for this section.

SYNOPSIS

There is no additional rationale provided for this section.

DESCRIPTION

There is no additional rationale provided for this section.

OPTIONS

Although it has not always been possible, the standard developers tried to avoid repeating information to reduce the risk that duplicate explanations could each be modified differently.

Recognition of -- is required because conforming applications need to shield their operands from any arbitrary options that the implementation may provide as an extension. For example, if the standard utility foo is listed as taking no options, and the application needed to give it a pathname with a leading hyphen, it could safely do it as:

foo -- -myfile

and avoid any problems with -m used as an extension.
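The same technique can be tried with a real utility. The sketch below uses rm in place of the hypothetical foo; the scratch directory name and the variable names are assumptions made for the example.

```sh
# Scratch directory for the demonstration (name is illustrative).
demo_dir=${TMPDIR:-/tmp}/dashdash_demo.$$
mkdir "$demo_dir" || exit 1

# Work in a subshell so that the caller's working directory is unchanged.
removed=$(
    cd "$demo_dir" || exit 1
    : > ./-myfile        # create a file whose name begins with a hyphen
    rm -- -myfile        # "--" ends the options, so -myfile is an operand
    if [ -e ./-myfile ]; then echo no; else echo yes; fi
)

rmdir "$demo_dir"
```

An alternative that does not depend on -- is to write the operand as ./-myfile, which no longer begins with a hyphen.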

OPERANDS

The usage of - is never shown in the SYNOPSIS. Similarly, the usage of -- is never shown.

The requirement for processing operands in command-line order is to avoid a "WeirdNIX" utility that might choose to sort the input files alphabetically, by size, or by directory order. Although this might be acceptable for some utilities, in general the programmer has a right to know exactly what order will be chosen.

Some of the standard utilities take multiple file operands and act as if they were processing the concatenation of those files. For example:

asa file1 file2

and:

cat file1 file2 | asa

have similar results when questions of file access, errors, and performance are ignored. Other utilities such as grep or wc have completely different results in these two cases. This latter type of utility is always identified in its DESCRIPTION or OPERANDS sections, whereas the former is not. Although it might be possible to create a general assertion about the former case, the following points must be addressed:

Access times for the files might be different in the operand case versus the cat case.
The utility may have error messages that are cognizant of the input filename, and this added value should not be suppressed. (As an example, awk sets a variable with the filename at each file boundary.)
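The difference for a utility such as wc can be observed directly. This is a minimal sketch; the scratch directory and variable names are assumptions made for the example.

```sh
work=${TMPDIR:-/tmp}/operand_demo.$$
mkdir "$work" || exit 1
printf 'a\nb\n' > "$work/file1"
printf 'c\n' > "$work/file2"

# As operands: wc reports each file and then a total, three lines in all.
operand_report=$(wc -l "$work/file1" "$work/file2")
operand_lines=$(( $(printf '%s\n' "$operand_report" | wc -l) + 0 ))

# As a concatenated stream: wc sees one unnamed input and prints only
# the combined line count.
piped_count=$(( $(cat "$work/file1" "$work/file2" | wc -l) + 0 ))

rm -f "$work/file1" "$work/file2"
rmdir "$work"
```

In the operand case the report names each file, which is the kind of added value that a general assertion about concatenation would have to account for.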

STDIN

There is no additional rationale provided for this section.

INPUT FILES

A conforming application cannot assume the following three commands are equivalent:

tail -n +2 file
(sed -n 1q; cat) < file
cat file | (sed -n 1q; cat)

The second command is equivalent to the first only when the file is seekable. In the third command, if the file offset in the open file description were not unspecified, sed would have to be implemented so that it read from the pipe 1 byte at a time or it would have to employ some method to seek backwards on the pipe. Such functionality is not defined currently in POSIX.1 and does not exist on all historical systems. Other utilities, such as head, read, and sh, have similar properties, so the restriction is described globally in this section.
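Where the intent is simply to skip the first line, an application can avoid the question by letting a single utility consume the whole input, so that no second process depends on where the first one left the file offset. The following sketch assumes nothing beyond tail and sed; the file and variable names are illustrative.

```sh
input=${TMPDIR:-/tmp}/skip_header_demo.$$
printf 'header\nrow1\nrow2\n' > "$input"

# One utility reads the whole file.
from_file=$(tail -n +2 "$input")

# One utility reads the whole pipe; nothing else shares the stream.
from_pipe=$(cat "$input" | sed 1d)

rm -f "$input"
```

Both forms yield the same two remaining lines whether or not the input is seekable.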

The definition of "text file" is strictly enforced for input to the standard utilities; very few of them list exceptions to the undefined results called for here. (Of course, "undefined" here does not mean that historical implementations necessarily have to change to start indicating error conditions. Conforming applications cannot rely on implementations succeeding or failing when non-text files are used.)

The utilities that allow line continuation are generally those that accept input languages, rather than pure data. It would be unusual for an input line of this type to exceed {LINE_MAX} bytes and unreasonable to require that the implementation allow unlimited accumulation of multiple lines, each of which could reach {LINE_MAX}. Thus, for a conforming application the total of all the continued lines in a set cannot exceed {LINE_MAX}.

The format description is intended to be sufficiently rigorous to allow other applications to generate these input files. However, since <blank>s can legitimately be included in some of the fields described by the standard utilities, particularly in locales other than the POSIX locale, this intent is not always realized.

ENVIRONMENT VARIABLES

There is no additional rationale provided for this section.

ASYNCHRONOUS EVENTS

Because there is no language prohibiting it, a utility is permitted to catch a signal, perform some additional processing (such as deleting temporary files), restore the default signal action (or action inherited from the parent process), and resignal itself.
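The sequence described above can be sketched in the shell language itself. This is an illustration under stated assumptions, not a description of how any standard utility is implemented: the handler name on_term and the temporary file name are hypothetical, and the inner script sends SIGTERM to itself merely to stand in for a signal arriving from elsewhere.

```sh
tmpfile=${TMPDIR:-/tmp}/resignal_demo.$$

# Run the sketch in a separate shell so that "$$" names that shell and
# the signal does not reach the caller.
status=0
sh -c '
    tmpfile=$1
    : > "$tmpfile"

    on_term() {
        rm -f "$tmpfile"      # additional processing: delete the temporary file
        trap - TERM           # restore the default action for SIGTERM
        kill -s TERM "$$"     # resignal; the default action now applies
    }
    trap on_term TERM

    kill -s TERM "$$"         # stand-in for a signal sent from elsewhere
    sleep 1                   # not expected to be reached
' sh "$tmpfile" || status=$?
```

Because the process finally terminates through the default action, the caller sees an exit status greater than 128, just as if the signal had never been caught.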

STDOUT

The format description is intended to be sufficiently rigorous to allow post-processing of output by other programs, particularly by an awk or lex parser.

STDERR

This section does not describe error messages that refer to incorrect operation of the utility. Consider a utility that processes program source code as its input. This section is used to describe messages produced by a correctly operating utility that encounters an error in the program source code that it is processing. However, a message indicating that the utility had insufficient memory in which to operate would not be described.

Some utilities have traditionally produced warning messages without returning a non-zero exit status; these are specifically noted in their sections. Other utilities shall not write to standard error if they complete successfully, unless the implementation provides some sort of extension to increase the verbosity or debugging level.

The format descriptions are intended to be sufficiently rigorous to allow post-processing of output by other programs.

OUTPUT FILES

The format description is intended to be sufficiently rigorous to allow post-processing of output by other programs, particularly by an awk or lex parser.

Receipt of the SIGQUIT signal should generally cause termination (unless in some debugging mode) that would bypass any attempted recovery actions.

EXTENDED DESCRIPTION

There is no additional rationale provided for this section.

EXIT STATUS

Note the additional discussion of exit values in Exit Status for Commands in the sh utility. It describes requirements for returning exit values greater than 125.

A utility may list zero as a successful return, 1 as a failure for a specific reason, and greater than 1 as "an error occurred". In this case, unspecified conditions may cause a 2 or 3, or other value, to be returned. A strictly conforming application should be written so that it tests for successful exit status values (zero in this case), rather than relying upon the single specific error value listed in IEEE Std 1003.1-2001. In that way, it will have maximum portability, even on implementations with extensions.
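The recommendation can be illustrated with grep, which distinguishes "no lines selected" from "an error occurred" by exit status. The function name check_pattern and its messages are assumptions made for this sketch.

```sh
# check_pattern PATTERN FILE (hypothetical helper): branch only on
# success, rather than matching one specific non-zero value.
check_pattern() {
    if grep -q -e "$1" -- "$2" 2>/dev/null; then
        echo found
    else
        # Any non-zero status lands here, whichever value the
        # implementation happened to return.
        echo "not found or error"
    fi
}
```

A script that instead tested for a status of exactly 1 would silently mishandle an implementation that returned some other non-zero value.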

The standard developers are aware that the general non-enumeration of errors makes it difficult to write test suites that test the incorrect operation of utilities. There are some historical implementations that have expended effort to provide detailed status messages and a helpful environment to bypass or explain errors, such as prompting, retrying, or ignoring unimportant syntax errors; other implementations have not. Since there is no realistic way to mandate system behavior in cases of undefined application actions or system problems (in a manner acceptable to all cultures and environments), attention has been limited to the correct operation of utilities by the conforming application. Furthermore, the conforming application does not need detailed information concerning errors that it caused through incorrect usage or that it cannot correct.

There is no description of defaults for this section because all of the standard utilities specify something (or explicitly state "Unspecified") for exit status.

CONSEQUENCES OF ERRORS

Several actions are possible when a utility encounters an error condition, depending on the severity of the error and the state of the utility. Included in the possible actions of various utilities are: deletion of temporary or intermediate work files; deletion of incomplete files; and validity checking of the file system or directory.

The text about recursive traversing is meant to ensure that utilities such as find process as many files in the hierarchy as they can. They should not abandon all of the hierarchy at the first error and resume with the next command-line operand, but should attempt to keep going.

APPLICATION USAGE

This section provides additional caveats, issues, and recommendations to the developer.

EXAMPLES

This section provides sample usage.

RATIONALE

There is no additional rationale provided for this section.

FUTURE DIRECTIONS

FUTURE DIRECTIONS sections act as pointers to related work that may impact the interface in the future, and often caution the developer to architect the code to account for a change in this area. Note that a future directions statement should not be taken as a commitment to adopt a feature or interface in the future.

CHANGE HISTORY

There is no additional rationale provided for this section.

Considerations for Utilities in Support of Files of Arbitrary Size

This section is intended to clarify the requirements for utilities in support of large files.

The utilities listed in this section are used to perform administrative tasks such as creating, moving, copying, or removing a file, changing its permissions, or measuring its resources. They are useful both as end-user tools and as utilities invoked by applications during software installation and operation.

The chgrp, chmod, chown, ln, and rm utilities probably require use of large file-capable versions of stat(), lstat(), ftw(), and the stat structure.

The cat, cksum, cmp, cp, dd, mv, sum, and touch utilities probably require use of large file-capable versions of creat(), open(), and fopen().

The cat, cksum, cmp, dd, df, du, ls, and sum utilities may require writing large integer values. For example:

The cat utility might have a -n option which counts <newline>s.
The cksum and ls utilities report file sizes.
The cmp utility reports the line number at which the first difference occurs, and also has a -l option which reports file offsets.
The dd, df, du, ls, and sum utilities report block counts.

The dd, find, and test utilities may need to interpret command arguments that contain 64-bit values. For dd, the arguments include skip=n, seek=n, and count=n. For find, the arguments include -size n. For test, the arguments are those associated with algebraic comparisons.

The df utility might need to access large file systems with statvfs().

The ulimit utility will need to use large file-capable versions of getrlimit() and setrlimit() and be able to read and write large integer values.

Built-In Utilities

All of these utilities can be exec-ed. There is no requirement that these utilities are actually built into the shell itself, but many shells need the capability to do so because the Shell and Utilities volume of IEEE Std 1003.1-2001, Section 2.9.1.1, Command Search and Execution requires that they be found prior to the PATH search. The shell could satisfy its requirements by keeping a list of the names and directly accessing the file-system versions regardless of PATH. Providing all of the required functionality for those such as cd or read would be more difficult.

There were originally three justifications for allowing the omission of exec-able versions:

It would require wasting space in the file system, at the expense of very small systems. However, it has been pointed out that all 16 utilities in the table can be provided with 16 links to a single-line shell script:
```
${0##*/} "$@"
```
It is not logical to require invocation of utilities such as cd because they have no value outside the shell environment or cannot be useful in a child process. However, counter-examples always seemed to be available for even the most unusual cases:
```
find . -type d -exec cd {} \; -exec foo {} \;
    (which invokes "foo" on accessible directories)

ps ... | sed ... | xargs kill

find . -exec true \; -a ...
    (where "true" is used for temporary debugging)
```
It is confusing to have a utility such as kill that can easily be in the file system in the base standard, but that requires built-in status for the User Portability Utilities option (for the % job control job ID notation). It was decided that it was more appropriate to describe the required functionality (rather than the implementation) to the system implementors and let them decide how to satisfy it.

On the other hand, it was realized that any distinction like this between utilities was not useful to applications, and that the cost to correct it was small. These arguments were ultimately the most effective.

There were varying reasons for including utilities in the table of built-ins:

alias, fc, unalias: The functionality of these utilities is performed more simply within the shell itself and that is the model most historical implementations have used.
bg, fg, jobs: All of the job control-related utilities are eligible for built-in status because that is the model most historical implementations have used.
cd, getopts, newgrp, read, umask, wait: The functionality of these utilities is performed more simply within the context of the current process. An example can be taken from the usage of the cd utility. The purpose of the cd utility is to change the working directory for subsequent operations. The actions of cd affect the process in which cd is executed and all subsequent child processes of that process. Based on the POSIX standard process model, changes in the process environment of a child process have no effect on the parent process. If the cd utility were executed from a child process, the working directory change would be effective only in the child process. Child processes initiated subsequent to the child process that executed the cd utility would not have a changed working directory relative to the parent process.
command: This utility was placed in the table primarily to protect scripts that are concerned about their PATH being manipulated. The "secure" shell script example in the command utility in the Shell and Utilities volume of IEEE Std 1003.1-2001 would not be possible if a PATH change retrieved an alien version of command. (An alternative would have been to implement getconf as a built-in, but the standard developers considered that it carried too many changing configuration strings to require in the shell.)
kill: Since kill provides optional job control functionality using shell notation (%1, %2, and so on), some implementations would find it extremely difficult to provide this outside the shell.
true, false: These are in the table as a courtesy to programmers who wish to use the "while true" shell construct without protecting true from PATH searches. (It is acknowledged that "while :" also works, but the idiom with true is historically pervasive.)
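The process-model argument given for cd above can be observed with a subshell, which stands in for a child process. This is a minimal sketch; the variable names are illustrative.

```sh
before=$(pwd)

# The command substitution runs in a subshell, so this cd changes only
# the working directory of that child environment.
child=$(cd / && pwd)

after=$(pwd)
```

Here child holds the root directory while before and after are identical, which is why a cd that is to affect subsequent commands has to run in the current shell environment.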

All utilities, including those in the table, are accessible via the system() and popen() functions in the System Interfaces volume of IEEE Std 1003.1-2001. There are situations where the return functionality of system() and popen() is not desirable. Applications that require the exit status of the invoked utility will not be able to use system() or popen(), since the exit status returned is that of the command language interpreter rather than that of the invoked utility. The alternative for such applications is the use of the exec family.

UNIX ® is a registered Trademark of The Open Group.
POSIX ® is a registered Trademark of The IEEE.