The variable environ is not intended to be declared in any header, but rather to be declared by the user for accessing the array of strings that is the environment. This is the traditional usage of the symbol. Putting it into a header could break some programs that use the symbol for their own purposes.
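Below is a minimal sketch of the traditional usage described above. The application declares `environ` itself and walks the array directly. The helper names `find_env_entry` and `count_env_entries` are hypothetical. For looking up a single variable, `getenv()` is normally the simpler choice.

```c
#define _POSIX_C_SOURCE 200112L
#include <stdlib.h>
#include <string.h>

/* Declared by the application itself, not obtained from any header. */
extern char **environ;

/* Return the full "name=value" string for NAME by scanning environ,
   or NULL if NAME is not present. */
const char *find_env_entry(const char *name)
{
    size_t len = strlen(name);

    for (char **ep = environ; ep != NULL && *ep != NULL; ep++) {
        /* Match the whole name: "FOO" must not match "FOOBAR=...". */
        if (strncmp(*ep, name, len) == 0 && (*ep)[len] == '=')
            return *ep;
    }
    return NULL;
}

/* Count the strings currently in the environment. */
size_t count_env_entries(void)
{
    size_t n = 0;

    for (char **ep = environ; ep != NULL && *ep != NULL; ep++)
        n++;
    return n;
}
```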
The decision to restrict conforming systems to the use of digits, uppercase letters, and underscores for environment variable names allows applications to use lowercase letters in their environment variable names without conflicting with any conforming system.
In addition to the obvious conflict with the shell syntax for positional parameter substitution, some historical applications (including some shells) exclude names with leading digits from the environment.
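The naming rule in the two paragraphs above can be checked mechanically. The following is a sketch under stated assumptions: the function name `is_utility_style_env_name` is hypothetical, and a `false` result only means the name lies outside the uppercase/digit/underscore style used by the system. It does not mean the name is unusable by an application.

```c
#include <stdbool.h>
#include <string.h>

/* True if NAME consists only of uppercase letters, digits, and
   underscores from the portable character set and does not begin with
   a digit. Names that fail this test (for example, ones containing
   lowercase letters) are not necessarily invalid for applications. */
bool is_utility_style_env_name(const char *name)
{
    static const char upper_digit_us[] = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789_";

    if (name == NULL || name[0] == '\0')
        return false;
    if (strchr("0123456789", name[0]) != NULL)
        return false;   /* leading digit: clashes with positional parameters like $1 */
    return strspn(name, upper_digit_us) == strlen(name);
}
```

Explicit character sets are used instead of `isupper()` so that the result does not depend on the current locale.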
The text about locale implies that any utilities written in standard C and conforming to IEEE Std 1003.1-2001 must issue the following call:
setlocale(LC_ALL, "")
If this were omitted, the ISO C standard specifies that the C locale would be used.
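A minimal sketch of the required call in context, with its return value checked. The wrapper name `init_locale_from_environment` and the warning text are hypothetical. When `setlocale()` returns NULL the locale is left unchanged, so the program continues in the C locale.

```c
#define _POSIX_C_SOURCE 200112L
#include <locale.h>
#include <stdio.h>
#include <stdlib.h>

/* Adopt the locale named by LC_ALL, the individual LC_* variables, and
   LANG. Without this call a C program stays in the "C" locale.
   Returns 0 on success, -1 if the environment does not describe a
   locale that the implementation supports. */
int init_locale_from_environment(void)
{
    if (setlocale(LC_ALL, "") == NULL) {
        const char *lang = getenv("LANG");
        fprintf(stderr, "warning: cannot set locale from environment (LANG=%s)\n",
                lang != NULL ? lang : "(unset)");
        return -1;
    }
    return 0;
}
```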
If any of the locale environment variables are invalid, it makes sense to default to an implementation-defined, consistent locale environment: a user is more likely to be confused by partial settings taking effect after a mistake. With a consistent default, all utilities behave in one language/cultural environment, and the user also gains a way of forcing the whole environment to the implementation-defined default. Disastrous results could occur if the utilities in a pipeline each used the environment variables partially and in different ways; in that case, it would be appropriate for utilities that use LANG and related variables to exit with an error if any of the variables are invalid. On the other hand, users typing individual commands at a terminal might want date to work even if LC_MONETARY is invalid, as long as LC_TIME is valid. Since these are conflicting reasonable alternatives, IEEE Std 1003.1-2001 leaves the results unspecified if the locale environment variables would not produce a complete locale matching the specification of the user.
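The alternatives weighed above can be sketched as three application-level policies. All function names are hypothetical. The use of "C" as the consistent fallback is an assumption of this sketch, since the standard leaves the actual behavior unspecified. The sketch relies on `setlocale(LC_ALL, "")` failing as a whole when any category cannot be set, and on `setlocale(LC_TIME, "")` consulting only LC_ALL, LC_TIME, and LANG.

```c
#define _POSIX_C_SOURCE 200112L
#include <locale.h>
#include <stdio.h>
#include <stdlib.h>

/* Policy 1: all or nothing. If the variables do not describe a complete
   locale, use one consistent locale for every category ("C" here, by
   assumption). Returns 0 if the environment was honored, -1 if not. */
int adopt_locale_or_fallback(void)
{
    if (setlocale(LC_ALL, "") != NULL)
        return 0;
    (void)setlocale(LC_ALL, "C");
    return -1;
}

/* Policy 2: refuse to run with a partial locale, as a utility used in a
   pipeline might prefer. Does not return on failure. */
void require_complete_locale(const char *progname)
{
    if (setlocale(LC_ALL, "") == NULL) {
        fprintf(stderr, "%s: invalid locale settings in the environment\n", progname);
        exit(EXIT_FAILURE);
    }
}

/* Policy 3: honor only the category this utility needs. A date-like
   program needs LC_TIME; an invalid LC_MONETARY does not affect it. */
int adopt_time_locale_only(void)
{
    return setlocale(LC_TIME, "") != NULL ? 0 : -1;
}
```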
The locale settings of individual categories cannot be truly independent and still guarantee correct results. For example, when collating two strings, characters must first be extracted from each string (governed by LC_CTYPE) before being mapped to collating elements (governed by LC_COLLATE) for comparison. That is, if LC_CTYPE is causing parsing according to the rules of a large, multibyte codeset (potentially returning 20000 or more distinct character values), but LC_COLLATE is set to handle only an 8-bit codeset with 256 distinct characters, meaningful results are obviously impossible.
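An application worried about this kind of mismatch could compare the locale names currently in effect for the two categories. This is only a heuristic, and an assumption of this sketch rather than a rule from the standard: identical names mean both categories come from the same locale, while different names do not by themselves prove a conflict. The function name is hypothetical.

```c
#include <locale.h>
#include <stdbool.h>
#include <string.h>

/* Report whether LC_CTYPE and LC_COLLATE currently name the same locale. */
bool ctype_and_collate_match(void)
{
    char ctype[256];
    const char *name = setlocale(LC_CTYPE, NULL);

    if (name == NULL || strlen(name) >= sizeof ctype)
        return false;
    /* The next setlocale() call may overwrite the returned string: copy it. */
    strcpy(ctype, name);
    name = setlocale(LC_COLLATE, NULL);
    return name != NULL && strcmp(ctype, name) == 0;
}
```

Setting all categories together with `setlocale(LC_ALL, "")` and leaving the individual LC_* variables unset is the simplest way to keep this check true.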
The LC_MESSAGES variable affects the language of messages generated by the standard utilities.
The description of the environment variable names starting with the characters "LC_" acknowledges the fact that the interfaces presented may be extended as new international functionality is required. In the ISO C standard, names preceded by "LC_" are reserved in the name space for future categories.
To avoid name clashes, new categories and environment variables are divided into two classifications: "implementation-independent" and "implementation-defined".
Implementation-independent names will have the following format:
LC_NAME
where NAME is the name of the new category and environment variable. Capital letters must be used for implementation-independent names.
Implementation-defined names must be in lowercase letters, as below:
LC_name
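The two naming classes can be told apart by the case of the suffix. A sketch follows; the enumeration, its constants, and `classify_lc_name` are hypothetical, and the sketch accepts only letters after the "LC_" prefix, which is a simplifying assumption. The constants deliberately avoid the "LC_" prefix, since ISO C reserves such macro names for `<locale.h>`.

```c
#include <string.h>

enum lcname_class {
    LCNAME_NONE,          /* not of the form LC_xxx */
    LCNAME_INDEPENDENT,   /* LC_NAME: implementation-independent */
    LCNAME_IMPL_DEFINED,  /* LC_name: implementation-defined */
    LCNAME_MIXED          /* mixed case or other characters */
};

/* True if every character of S appears in SET. */
static int only_chars_from(const char *s, const char *set)
{
    return strspn(s, set) == strlen(s);
}

enum lcname_class classify_lc_name(const char *name)
{
    const char *suffix;

    if (strncmp(name, "LC_", 3) != 0 || name[3] == '\0')
        return LCNAME_NONE;
    suffix = name + 3;
    if (only_chars_from(suffix, "ABCDEFGHIJKLMNOPQRSTUVWXYZ"))
        return LCNAME_INDEPENDENT;
    if (only_chars_from(suffix, "abcdefghijklmnopqrstuvwxyz"))
        return LCNAME_IMPL_DEFINED;
    return LCNAME_MIXED;
}
```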
The default values for the number of column positions, COLUMNS, and screen height, LINES, are unspecified because historical implementations use different methods to determine values corresponding to the size of the screen in which the utility is run. This size is typically known to the implementation through the value of TERM, or by more elaborate methods such as extensions to the stty utility or knowledge of how the user is dynamically resizing windows on a bit-mapped display terminal. Users should not need to set these variables in the environment unless there is a specific reason to override the default behavior of the implementation, such as to display data in an area arbitrarily smaller than the terminal or window. Values for these variables that are not decimal integers greater than zero are implicitly undefined values; it is unnecessary to enumerate all of the possible values outside of the acceptable set.
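A utility might honor COLUMNS or LINES only when the value is a decimal integer greater than zero and otherwise fall back to its own default. In the sketch below, the function name is hypothetical, and rejecting a leading sign or leading blanks is a stricter reading chosen for this example.

```c
#define _POSIX_C_SOURCE 200112L
#include <errno.h>
#include <stdlib.h>

/* Return the value of environment variable NAME (for example "COLUMNS"
   or "LINES") if it is a decimal integer greater than zero; otherwise
   return FALLBACK, treating any other value as if the variable were unset. */
long positive_env_value(const char *name, long fallback)
{
    const char *s = getenv(name);
    char *end;
    long v;

    /* Require a leading digit: rejects "", "-5", "+5", and leading blanks. */
    if (s == NULL || s[0] < '0' || s[0] > '9')
        return fallback;
    errno = 0;
    v = strtol(s, &end, 10);
    if (errno != 0 || *end != '\0' || v <= 0)
        return fallback;   /* overflow, trailing characters, or zero */
    return v;
}
```

The fallback value (for example 80 for COLUMNS) is the caller's choice here; as noted above, the real default is unspecified and typically comes from the terminal.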
In most implementations, the value of LOGNAME, like that of any environment variable, is easily forged, so security-critical applications should rely on other means of determining user identity. LOGNAME is required to be constructed from the portable filename character set for reasons of interchange. No diagnostic condition is specified for violating this rule, and no requirement for enforcement exists. The intent of the requirement is that if extended characters are used, the "guarantee" of portability implied by a standard is void.
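One "other means" is to ask the user database about the real user ID rather than trusting LOGNAME. The sketch below also checks a string against the portable filename character set. Both helper names are hypothetical, and using the real user ID from `getuid()` is a choice made for this example.

```c
#define _POSIX_C_SOURCE 200112L
#include <pwd.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* True if S is non-empty and uses only the portable filename character
   set: A-Z, a-z, 0-9, '.', '_', and '-'. */
bool in_portable_filename_set(const char *s)
{
    static const char set[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789._-";

    if (s == NULL || s[0] == '\0')
        return false;
    return strspn(s, set) == strlen(s);
}

/* Look up the login name for the real user ID in the user database.
   Returns NULL if there is no entry. The result points to static storage. */
const char *login_name_from_uid(void)
{
    struct passwd *pw = getpwuid(getuid());

    return pw != NULL ? pw->pw_name : NULL;
}

/* True if LOGNAME is set and agrees with the user database. A mismatch
   suggests that the environment value should not be trusted. */
bool logname_matches_uid(void)
{
    const char *env = getenv("LOGNAME");
    const char *db = login_name_from_uid();

    return env != NULL && db != NULL && strcmp(env, db) == 0;
}
```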
Many historical implementations of the Bourne shell do not interpret a trailing colon in PATH as representing the current working directory and are thus non-conforming. The C Shell and the KornShell conform to IEEE Std 1003.1-2001 on this point. The usual name, dot ('.'), may also be used to refer to the current working directory.
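The conforming interpretation treats any zero-length prefix in PATH (leading, trailing, or between two adjacent colons) as the current working directory. A sketch with a hypothetical helper follows. Reporting an entirely empty PATH as having no entries is an assumption of this sketch.

```c
#include <stddef.h>
#include <string.h>

/* Copy the N-th (0-based) directory of the PATH-style string PATH into
   OUT, reporting a zero-length prefix as ".". Returns 0 on success, or
   -1 if PATH is empty, has fewer than N+1 entries, or the entry does
   not fit in OUT. */
int nth_path_dir(const char *path, size_t n, char *out, size_t outsz)
{
    const char *start = path;

    if (path[0] == '\0')
        return -1;   /* empty PATH: treated here as having no entries */
    for (;;) {
        const char *colon = strchr(start, ':');
        size_t len = colon != NULL ? (size_t)(colon - start) : strlen(start);

        if (n == 0) {
            const char *src = start;
            if (len == 0) {          /* zero-length prefix: current directory */
                src = ".";
                len = 1;
            }
            if (len >= outsz)
                return -1;
            memcpy(out, src, len);
            out[len] = '\0';
            return 0;
        }
        if (colon == NULL)
            return -1;               /* fewer than N+1 entries */
        n--;
        start = colon + 1;
    }
}
```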
Many implementations historically have used a default value of /bin and /usr/bin for the PATH variable. IEEE Std 1003.1-2001 does not mandate that this default path be identical to the value retrieved from getconf _CS_PATH, because the standardized utilities may well be provided in a directory separate from the directories used by some historical applications.
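From C, the value that `getconf` reports for _CS_PATH is available through `confstr()`. A minimal sketch follows; the wrapper name is hypothetical. `confstr()` is first called with a null buffer to learn the required size, which includes the terminating null byte.

```c
#define _POSIX_C_SOURCE 200112L
#include <stdlib.h>
#include <unistd.h>

/* Return a malloc'd copy of the _CS_PATH value (a PATH that finds the
   standard utilities), or NULL on failure. The caller frees the result. */
char *standard_utility_path(void)
{
    size_t n = confstr(_CS_PATH, NULL, 0);   /* size including the '\0' */
    char *buf;

    if (n == 0)
        return NULL;                          /* no value, or invalid name */
    buf = malloc(n);
    if (buf == NULL)
        return NULL;
    (void)confstr(_CS_PATH, buf, n);
    return buf;
}
```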
The SHELL variable names the preferred shell of the user; it is a guide to applications. There is no direct requirement that this shell conform to IEEE Std 1003.1-2001; that decision should rest with the user. It is the intention of the standard developers that alternative shells be permitted, if the user chooses to develop or acquire one. An operating system that builds its shell into the "kernel" in such a manner that alternative shells would be impossible does not conform to the spirit of IEEE Std 1003.1-2001.
The quoted form of the timezone variable allows timezone names of the form UTC+1 (or any name that contains the character plus ('+'), the character minus ('-'), or digits), which may be appropriate for countries that do not have an official timezone name. It would be coded as <UTC+1>+1<UTC+2>, which would cause std to have a value of UTC+1 and dst a value of UTC+2, each with a length of 5 characters. This does not appear to conflict with any existing usage. The characters '<' and '>' were chosen for quoting because they are easier to parse visually than a quoting character that does not provide some sense of bracketing (and in a string like this, such bracketing is helpful). They were also chosen because they do not need special treatment when assigning to the TZ variable; users are often confused by embedding quotes in a string. Because '<' and '>' are meaningful to the shell, the whole string would have to be quoted, but that is easily explained. (Parentheses would have presented the same problems.) Although the '>' symbol could have been permitted in the string by either escaping it or doubling it, it seemed of little value to require that. This could be provided as an extension if there were a need. Timezone names of this new form lead to a requirement that the value of {_POSIX_TZNAME_MAX} change from 3 to 6.
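Below is a sketch of how a parser might read the std or dst name in either the unquoted or the quoted form. The helper `tz_parse_name` is hypothetical and handles only the name. It leaves the offset and rule fields to the caller, and it requires names of at least three characters.

```c
#include <stddef.h>
#include <string.h>

/* Parse the std (or dst) name at the start of a TZ string into OUT.
   Accepts the unquoted form (letters only, e.g. the "EST" in "EST5EDT")
   and the quoted form ("<UTC+1>"), whose contents may also include
   digits, '+', and '-'. Returns a pointer to the character after the
   name (the start of the offset), or NULL if no valid name is found. */
const char *tz_parse_name(const char *tz, char *out, size_t outsz)
{
    static const char letters[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";
    static const char quoted_ok[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+-";
    const char *start;
    const char *next;
    size_t len;

    if (tz[0] == '<') {
        start = tz + 1;
        len = strspn(start, quoted_ok);
        if (start[len] != '>')
            return NULL;          /* missing '>' or a disallowed character */
        next = start + len + 1;   /* skip the closing '>' */
    } else {
        start = tz;
        len = strspn(start, letters);
        next = start + len;
    }
    if (len < 3 || len >= outsz)  /* names are at least three characters */
        return NULL;
    memcpy(out, start, len);
    out[len] = '\0';
    return next;
}
```

For the example in the text, parsing "<UTC+1>+1<UTC+2>" yields the 5-character std name UTC+1 and leaves the parser at "+1<UTC+2>". Parsing again after the offset yields the dst name UTC+2.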
Since the TZ environment variable is usually inherited by all applications started by a user after the value of the TZ environment variable is changed and since many applications run using the C or POSIX locale, using characters that are not in the portable character set in the std and dst fields could cause unexpected results.
The format of the TZ environment variable is changed in Issue 6 to allow for the quoted form, as defined in previous versions of the ISO POSIX-1 standard.
IEEE Std 1003.1-2001/Cor 1-2002, item XBD/TC1/D6/7 is applied, adding the ctime_r() and localtime_r() functions to the list of functions that use the TZ environment variable.
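As an illustration of the functions that consult TZ, the sketch below changes TZ and then breaks down a time with `localtime_r()`. Because `localtime_r()` need not re-read TZ by itself, the sketch calls `tzset()` explicitly after the change. The helper name is hypothetical. Note that it modifies the process environment, so it is not safe to call while other threads may be reading the environment.

```c
#define _POSIX_C_SOURCE 200112L
#include <stdlib.h>
#include <time.h>

/* Break down T as local time under the timezone rule TZ_VALUE.
   Returns 0 on success, -1 on failure. */
int local_time_in(const char *tz_value, time_t t, struct tm *out)
{
    if (setenv("TZ", tz_value, 1) != 0)
        return -1;
    tzset();   /* make the new TZ value take effect */
    return localtime_r(&t, out) != NULL ? 0 : -1;
}
```

Under TZ=EST5 (five hours west of UTC), the Epoch breaks down to 19:00 on 31 December 1969, while under TZ=UTC0 it is midnight on 1 January 1970.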