[BE] This chapter describes the services and utilities that shall be implemented on all systems that claim conformance to the Batch Environment Services and Utilities option. This functionality is dependent on support of this option (and the rest of this section is not further marked for this option).
Batch jobs are created and managed by batch servers. A batch client interacts with a batch server to access batch services on behalf of the user. In order to use batch services, a user must have access to a batch client.
A batch server is a computational entity, such as a daemon process, that provides batch services. Batch servers route, queue, modify, and execute batch jobs on behalf of batch clients.
The batch utilities described in this volume of IEEE Std 1003.1-2001 (and listed in Batch Utilities) are clients of batch services; they allow users to create, modify, and delete batch jobs from a shell command line. Although these batch utilities may be said to accomplish certain services, they actually obtain services on behalf of a user by means of requests to batch servers.
Client-server interaction takes place by means of the batch requests defined in this chapter. Because direct access to batch jobs and queues is limited to batch servers, dependencies on private structures for batch jobs and queues are confined to batch servers, and clients and servers of different implementations can interoperate. Also, batch servers may be clients of other batch servers.
Two types of batch queue are described: routing queues and execution queues. When a batch job is placed in a routing queue, it is a candidate for routing. A batch job is removed from routing queues under the following conditions:
The batch job has been routed to another queue.
The batch job has been deleted from the batch queue.
The batch job has been aborted.
When a batch job is placed in an execution queue, it is a candidate for execution.
A batch job is removed from an execution queue under the following conditions:
The batch job has been executed and exited.
The batch job has been aborted.
The batch job has been deleted from the batch queue.
The batch job has been moved to another queue.
Access to a batch queue is limited to the batch server that manages the batch queue. Clients never access a batch queue or a batch job directly, either to read or write information; all client access to batch queues or jobs takes place through batch servers.
When a batch server creates a batch job on behalf of a client, it shall assign a batch job identifier to the job. A batch job identifier consists of both a sequence number that is unique among the sequence numbers issued by that server and the name of the server. Since the batch server name is unique within a name space, the job identifier is likewise unique within the name space.
The batch server that creates a batch job shall return the batch server-assigned job identifier to the client that requested the job creation. If the batch server routes or moves the job to another server, it sends the job identifier with the job. Once assigned, the job identifier of a batch job shall never change.
Since a batch job may be moved after creation, the batch server name component of the job identifier need not indicate the location of the job. An implementation may provide a batch job tracking mechanism, in which case the user generally does not need to know the location of the job. However, an implementation need not provide a batch job tracking mechanism, in which case the user must find routed jobs by probing the possible destinations.
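The identifier rules above can be illustrated with a minimal sketch. The class name `BatchServer`, the method `create_job_id`, and the `sequence.server` text layout are assumptions made for this example; the text above only requires that the identifier combine a per-server unique sequence number with the server name.

```python
import itertools
from dataclasses import dataclass, field


@dataclass
class BatchServer:
    """Toy server that issues job identifiers (hypothetical, illustrative only)."""
    name: str
    _sequence: itertools.count = field(default_factory=lambda: itertools.count(1))

    def create_job_id(self) -> str:
        # The sequence number is unique among those issued by this server.
        # The "sequence.server" layout is an assumption of this sketch.
        return f"{next(self._sequence)}.{self.name}"


alpha = BatchServer("alpha")
first = alpha.create_job_id()
second = alpha.create_job_id()
```

Because the server name is part of the identifier, two servers may reuse the same sequence number without producing the same identifier, and the identifier can travel unchanged with a job that is later moved.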
To route a batch job, a batch server either moves the job to some other queue that is managed by the batch server, or requests that some other batch server accept the job.
Each routing queue has one or more queues to which it can route batch jobs. The batch server administrator creates routing queues.
A batch server may route a batch job from a routing queue to another routing queue. Batch servers shall prevent or otherwise handle cases of circular routing paths. As a deferred service, a batch server routes jobs from the routing queues that it manages. The algorithm by which a batch server selects a batch queue to which to route a batch job is implementation-defined.
A batch job need not be eligible for routing to all the batch queues fed by the routing queue from which it is routed. A batch server that has been asked to accept the job may reject the request if the job requires resources that are unavailable to that batch server, or if the client is not authorized to access the batch server.
Batch servers may route high-priority jobs before low-priority jobs, but, on other than overloaded systems, the effect may be imperceptible to the user. If all the batch servers fed by a routing queue reject requests to accept the job for reasons that are permanent, the batch server that manages the job shall abort the job. If all or some rejections are temporary, the batch server should try to route the job again at some later point.
The reasons for rejecting a batch job are implementation-defined. The reasons for which the routing should be retried later and the reasons for which the job should be aborted are also implementation-defined.
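The abort-or-retry rule for rejected routings can be sketched as follows. The names `Rejection` and `routing_outcome` and the returned strings are hypothetical; how a server classifies a rejection reason as permanent or temporary is implementation-defined and is taken as an input here.

```python
from enum import Enum


class Rejection(Enum):
    PERMANENT = "permanent"
    TEMPORARY = "temporary"


def routing_outcome(responses):
    """Decide what to do after offering a job to every destination.

    `responses` holds one entry per destination: None if the destination
    accepted the job, otherwise a Rejection value.
    """
    if not responses:
        # A routing queue has one or more destinations, so this is a caller error.
        raise ValueError("a routing queue needs at least one destination")
    if any(r is None for r in responses):
        return "routed"
    if all(r is Rejection.PERMANENT for r in responses):
        return "abort"
    # All or some rejections are temporary: try again at some later point.
    return "retry_later"
```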
To execute a batch job is to create a session leader (a process) that runs the shell program indicated by the Shell_Path_List attribute of the job. The script shall be passed to the program as its standard input. An implementation may pass the script to the program by other implementation-defined means. At the time a batch job begins execution, it is defined to enter the RUNNING state. The primary program that is executed by a batch job is typically, though not necessarily, a shell program.
A batch server shall execute eligible jobs as a deferred service; no client request is necessary once the batch job is created and eligible. However, the attributes of a batch job, such as the job hold type, may render the job ineligible. A batch server shall scan the execution queues that it manages for jobs that are eligible for execution. The algorithm by which the batch server selects eligible jobs for execution is implementation-defined.
As part of creating the process for the batch job, the batch server shall open the standard output and standard error streams of the session.
The attributes of a batch job may indicate that the batch server executing the job shall send mail to a list of users at the time it begins execution of the job.
When the session leader of an executing job terminates, the job exits. As part of exiting a batch job, the batch server that manages the job shall remove the job from the batch queue in which it resides. The server shall transfer output files of the job to a location described by the attributes of the job.
The attributes of a batch job may indicate that the batch server managing the job shall send mail to a list of users at the time the job exits.
A batch server shall abort jobs for which a required deferred service cannot be performed. The attributes of a batch job may indicate that the batch server that aborts the job shall send mail to a list of users at the time it aborts the job.
Clients, such as the batch environment utilities (marked BE), access batch services by means of requests to one or more batch servers. To acquire the services of any given batch server, the user identifier under which the client runs must be authorized to use that batch server.
The user whose user name is associated with the creation of a batch job shall own the job and can perform actions on it such as read, modify, delete, and move.
A user identifier of the same value at a different host need not be the same user. For example, user name smith at host alpha may or may not represent the same person as user name smith at host beta. Likewise, the same person may have access to different user names on different hosts.
An implementation may optionally provide an authorization mechanism that permits one user name to access jobs under another user name.
A process on a client host may be authorized to run processes under multiple user names at a batch server host. Where appropriate, the utilities defined in this volume of IEEE Std 1003.1-2001 provide a means for a user to choose from among such user names when creating or modifying a batch job.
The processing of a batch job by a batch server is affected by the attributes of the job. The processing of a batch job may also be affected by the attributes of the batch queue in which the job resides and by the status of the batch server that manages the job. See also the Base Definitions volume of IEEE Std 1003.1-2001, Chapter 3, Definitions for batch definitions.
Whereas batch servers are persistent entities, clients are often transient. For example, the qsub utility creates a batch job and exits. For this reason, batch servers notify users of batch job events by sending mail to the user that owns the job, or to other designated users.
The presence of Batch Environment Services and Utilities option services is indicated by the configuration variable POSIX2_PBS. A conforming batch server provides services as defined in this section.
A batch server shall provide batch services in two ways:
The batch server provides a service at the request of a client.
The batch server provides a deferred service as a result of a change in conditions monitored by the batch server.
If a batch server cannot complete a request, it shall reject the request. If a batch server cannot complete a deferred service for a batch job, the batch server shall abort the batch job.
Environment Variable Summary lists the environment variables that shall be supported by an implementation of the batch server and utilities.
Variable | Description |
---|---|
PBS_DPREFIX | Defines the directive prefix (see qsub) |
PBS_ENVIRONMENT | Batch Job is batch or interactive (see Batch Job Execution) |
PBS_JOBID | The job_identifier attribute of job (see Queue Batch Job Request) |
PBS_JOBNAME | The job_name attribute of job (see Queue Batch Job Request) |
PBS_O_HOME | Defines the HOME of the batch client (see qsub) |
PBS_O_HOST | Defines the host name of the batch client (see qsub) |
PBS_O_LANG | Defines the LANG of the batch client (see qsub) |
PBS_O_LOGNAME | Defines the LOGNAME of the batch client (see qsub) |
PBS_O_MAIL | Defines the MAIL of the batch client (see qsub) |
PBS_O_PATH | Defines the PATH of the batch client (see qsub) |
PBS_O_QUEUE | Defines the submit queue of the batch client (see qsub) |
PBS_O_SHELL | Defines the SHELL of the batch client (see qsub) |
PBS_O_TZ | Defines the TZ of the batch client (see qsub) |
PBS_O_WORKDIR | Defines the working directory of the batch client (see qsub) |
PBS_QUEUE | Defines the initial execution queue (see Batch Job Execution) |
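As an illustration of how a program running inside a batch job might consume these variables, the following sketch gathers the PBS_O_* values that describe the batch client's environment. The function name `submission_context` and the idea of keying the result by the short name are assumptions of this example; only the variable names come from the summary above.

```python
import os

# Suffixes of the PBS_O_* variables listed in the summary above.
ORIGIN_VARS = ("HOME", "HOST", "LANG", "LOGNAME", "MAIL",
               "PATH", "QUEUE", "SHELL", "TZ", "WORKDIR")


def submission_context(environ=None):
    """Return the PBS_O_* values that are present, keyed by short name."""
    environ = os.environ if environ is None else environ
    return {
        name: environ[f"PBS_O_{name}"]
        for name in ORIGIN_VARS
        if f"PBS_O_{name}" in environ
    }
```

Passing a mapping explicitly keeps the function usable outside a batch job, for example in tests; called with no argument it reads the environment of the current process.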
A batch job shall always be in one of the following states: QUEUED, RUNNING, HELD, WAITING, EXITING, or TRANSITING. The state of a batch job determines the types of requests that the batch server that manages the batch job can accept for the batch job. A batch server shall change the state of a batch job either in response to service requests from clients or as a result of deferred services, such as job execution or job routing.
A batch job that is in the QUEUED state resides in a queue but is still pending either execution or routing, depending on the queue type.
A batch server that queues a batch job in a routing queue shall put the batch job in the QUEUED state. A batch server that puts a batch job in an execution queue, but has not yet executed the batch job, shall put the batch job in the QUEUED state. A batch job that resides in an execution queue and is executing is defined to be in the RUNNING state. While a batch job is in the RUNNING state, a session leader is associated with the batch job.
A batch job that resides in an execution queue, but is ineligible to run because of a hold attribute, is defined to be in the HELD state.
A batch job that is not held, but must wait until a future date and time before executing, is defined to be in the WAITING state.
When the session leader associated with a running job exits, the batch job shall be placed in the EXITING state.
A batch job for which the session leader has terminated is defined to be in the EXITING state, and the batch server that manages such a batch job cannot accept job modification requests that affect the batch job. While a batch job is in the EXITING state, the batch server that manages the batch job is staging output files and notifying clients of job completion. Once a batch job has exited, it no longer exists as an object managed by a batch server.
A batch job that is being moved from a routing queue to another queue is defined to be in the TRANSITING state.
When a batch job in a routing queue has been selected to be moved to a new destination, then the batch job shall be in either the QUEUED state or the TRANSITING state, depending on the batch server implementation.
Batch jobs with either an Execution_Time attribute value set in the future or a Hold_Types attribute of value not equal to NO_HOLD, or both, may be routed or held in the routing queue. The treatment of jobs with the Execution_Time or Hold_Types attributes in a routing queue is implementation-defined.
When a batch job in a routing queue has not been selected to be moved to a new destination and the batch job has a Hold_Types attribute value of other than NO_HOLD, then the job should be in the HELD state.
When a batch job in a routing queue has not been selected to be moved to a new destination and the batch job has:
A Hold_Types attribute value of NO_HOLD
An Execution_Time attribute in the past
then the batch job shall be in the QUEUED state.
When a batch job in a routing queue has not been selected to be moved to a new destination and the batch job has:
A Hold_Types attribute value of NO_HOLD
An Execution_Time attribute in the future
then the batch job may be in the WAITING state.
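The rules for the state of a job in a routing queue can be summarized in a small decision function. This is a sketch under stated assumptions: the function and parameter names are hypothetical, Hold_Types is represented by the string "n" for NO_HOLD, times are seconds since the Epoch, and the sketch follows the recommended ("should", "may") states where the text leaves a choice. An Execution_Time equal to the current time is treated as not being in the future, which the text does not settle.

```python
NO_HOLD = "n"


def routing_queue_state(hold_types, execution_time, now,
                        selected=False, reports_transiting=True):
    """Return the state of a job that resides in a routing queue."""
    if selected:
        # QUEUED or TRANSITING, depending on the batch server implementation.
        return "TRANSITING" if reports_transiting else "QUEUED"
    if hold_types != NO_HOLD:
        return "HELD"
    if execution_time > now:
        return "WAITING"
    return "QUEUED"
```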
Next State Table describes the next state of a batch job, given the current state of the batch job and the type of request. Results/Output Table describes the response of a batch server to a request, given the current state of the batch job and the type of request.
This section describes the deferred services performed by batch servers: job execution, job routing, job exit, job abort, and the rerunning of jobs after a restart.
To execute a batch job is to create a session leader (a process) that runs the shell program indicated by the Shell_Path_List attribute of the batch job. The script is passed to the program as its standard input. An implementation may pass the script to the program by other implementation-defined means. At the time a batch job begins execution, it is defined to enter the RUNNING state.
Next State Table (columns give the current state of the batch job)

Request Type | X | Q | R | H | W | E | T |
---|---|---|---|---|---|---|---|
Queue Batch Job Request | Q | e | e | e | e | e | e |
Modify Batch Job Request | e | Q | R | H | W | e | T |
Delete Batch Job Request | e | X | E | X | X | E | X |
Batch Job Message Request | e | Q | R | H | W | E | T |
Rerun Batch Job Request | e | e | Q | e | e | e | e |
Signal Batch Job Request | e | e | R | H | W | e | e |
Batch Job Status Request | e | Q | R | H | W | E | T |
Batch Queue Status Request | X | Q | R | H | W | E | T |
Server Status Request | X | Q | R | H | W | E | T |
Select Batch Jobs Request | X | Q | R | H | W | E | T |
Move Batch Job Request | e | Q | R | H | W | e | T |
Hold Batch Job Request | e | H | R/H | H | H | e | T |
Release Batch Job Request | e | Q | R | Q/W/H | W | e | T |
Server Shutdown Request | X | Q | Q | H | W | E | T |
Locate Batch Job Request | e | Q | R | H | W | E | T |
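The next-state table lends itself to a simple lookup structure. The following sketch transcribes the cells as they appear in the table and returns them unchanged; it does not interpret entries such as "e" or "R/H". The names `STATES`, `NEXT_STATE`, and `next_state` are hypothetical.

```python
# Column order of the table: current state of the batch job.
STATES = ("X", "Q", "R", "H", "W", "E", "T")

# One string of seven cells per request type, transcribed from the table.
NEXT_STATE = {
    "Queue Batch Job Request":    "Q e e e e e e",
    "Modify Batch Job Request":   "e Q R H W e T",
    "Delete Batch Job Request":   "e X E X X E X",
    "Batch Job Message Request":  "e Q R H W E T",
    "Rerun Batch Job Request":    "e e Q e e e e",
    "Signal Batch Job Request":   "e e R H W e e",
    "Batch Job Status Request":   "e Q R H W E T",
    "Batch Queue Status Request": "X Q R H W E T",
    "Server Status Request":      "X Q R H W E T",
    "Select Batch Jobs Request":  "X Q R H W E T",
    "Move Batch Job Request":     "e Q R H W e T",
    "Hold Batch Job Request":     "e H R/H H H e T",
    "Release Batch Job Request":  "e Q R Q/W/H W e T",
    "Server Shutdown Request":    "X Q Q H W E T",
    "Locate Batch Job Request":   "e Q R H W E T",
}


def next_state(request, current):
    """Return the table cell for a request type and current state."""
    cells = NEXT_STATE[request].split()
    return cells[STATES.index(current)]
```

For example, `next_state("Rerun Batch Job Request", "R")` yields the cell in the R column of the rerun row. Keeping each row as a single string makes it easy to compare the code against the table line by line.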
A batch server that has an execution queue containing jobs is said to own the queue and manage the batch jobs in that queue. A batch server that has been started shall execute the batch jobs in the execution queues owned by the batch server. The batch server shall schedule for execution those jobs in the execution queues that are in the QUEUED state. The algorithm for scheduling jobs is implementation-defined.
A batch server that executes a batch job shall create, in the environment of the session leader of the batch job, an environment variable named PBS_ENVIRONMENT, the value of which is the string PBS_BATCH encoded in the portable character set.
A batch server that executes a batch job shall create, in the environment of the session leader of the batch job, an environment variable named PBS_QUEUE, the value of which is the name of the execution queue of the batch job encoded in the portable character set.
To rerun a batch job is to requeue a batch job that is currently executing and then kill the session leader of the executing job by sending a SIGKILL prior to completion; see Rerun Batch Job Request. A batch server that reruns a batch job shall append the standard output and standard error files of the batch job to the corresponding files of the previous execution, if they exist, with appropriate annotation. If either file does not exist, that file shall be created as in normal execution.
Results/Output Table (columns give the current state of the batch job)

Request Type | X | Q | R | H | W | E | T |
---|---|---|---|---|---|---|---|
Queue Batch Job Request | O | e | e | e | e | e | e |
Modify Batch Job Request | e | O | e | O | O | e | e |
Delete Batch Job Request | e | O | O | O | O | e | O |
Batch Job Message Request | e | e | O | e | e | e | e |
Rerun Batch Job Request | e | e | O | e | e | e | e |
Signal Batch Job Request | e | e | O | e | e | e | e |
Batch Job Status Request | e | O | O | O | O | O | O |
Batch Queue Status Request | O | O | O | O | O | O | O |
Server Status Request | O | O | O | O | O | O | O |
Select Batch Job Request | e | O | O | O | O | O | O |
Move Batch Job Request | e | O | O | O | O | e | e |
Hold Batch Job Request | e | O | O | O | O | e | e |
Release Batch Job Request | e | O | e | O | O | e | e |
Server Shutdown Request | O | O | e | O | O | e | e |
Locate Batch Job Request | e | O | O | O | O | O | O |
The execution of a batch job by a batch server shall be controlled by job, queue, and server attributes, as defined in this section.
Batch accounting is an optional feature of batch servers. If a batch server implements accounting, the statements in this section apply and the configuration variable POSIX2_PBS_ACCOUNTING shall be set to 1.
A batch server that executes a batch job shall charge the account named in the Account_Name attribute of the batch job for resources consumed by the batch job.
If the Account_Name attribute of the batch job is absent from the batch job attribute list or is altered while the batch job is in execution, the batch server action is implementation-defined.
Batch checkpointing is an optional feature of batch servers. If a batch server implements checkpointing, the statements in this section apply and the configuration variable POSIX2_PBS_CHECKPOINT shall be set to 1.
There are two attributes associated with the checkpointing feature: Checkpoint and Minimum_Cpu_Interval. Checkpoint is a batch job attribute, while Minimum_Cpu_Interval is a queue attribute. An implementation that does not support checkpointing shall support the Checkpoint job attribute to the extent that the batch server shall maintain and pass this attribute to other servers.
The behavior of a batch server that executes a batch job for which the value of the Checkpoint attribute is CHECKPOINT_UNSPECIFIED is implementation-defined. A batch server that executes a batch job for which the value of the Checkpoint attribute is NO_CHECKPOINT shall not checkpoint the batch job.
A batch server that executes a batch job for which the value of the Checkpoint attribute is CHECKPOINT_AT_SHUTDOWN shall checkpoint the batch job only when the batch server accepts a request to shut down during the time when the batch job is in the RUNNING state.
A batch server that executes a batch job for which the value of the Checkpoint attribute is CHECKPOINT_AT_MIN_CPU_INTERVAL shall checkpoint the batch job at the interval specified by the Minimum_Cpu_Interval attribute of the queue for which the batch job has been selected. The Minimum_Cpu_Interval attribute shall be specified in units of CPU minutes.
A batch server that executes a batch job for which the value of the Checkpoint attribute is an unsigned integer shall checkpoint the batch job at an interval that is the value of either the Checkpoint attribute, or the Minimum_Cpu_Interval attribute of the queue for which the batch job has been selected, whichever is greater. Both intervals shall be in units of CPU minutes. When the Minimum_Cpu_Interval attribute is greater than the Checkpoint attribute, the batch job shall write a warning message to the standard error stream of the batch job.
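The interval rule for an unsigned integer Checkpoint value reduces to taking the larger of two numbers. A minimal sketch, with a hypothetical function name and a boolean return value standing in for the warning message:

```python
def checkpoint_interval(checkpoint, minimum_cpu_interval):
    """Return (interval, warn); both inputs are in units of CPU minutes.

    `warn` is True when the queue's Minimum_Cpu_Interval overrides the
    job's Checkpoint value, the case in which a warning is written to
    the standard error stream of the batch job.
    """
    interval = max(checkpoint, minimum_cpu_interval)
    warn = minimum_cpu_interval > checkpoint
    return interval, warn
```

With a Checkpoint value of 5 and a Minimum_Cpu_Interval of 10, the job is checkpointed every 10 CPU minutes and the warning applies; with the values reversed, the job's own value of 10 is used and no warning is needed.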
The Error_Path attribute of a running job cannot be changed by a Modify Batch Job Request. When the Join_Path attribute of the batch job is set to the value FALSE and the Keep_Files attribute of the batch job does not contain the value KEEP_STD_ERROR, a batch server that executes a batch job shall perform one of the following actions:
Set the standard error stream of the session leader of the batch job to the path described by the value of the Error_Path attribute of the batch job.
Buffer the standard error of the session leader of the batch job until completion of the batch job, and when the batch job exits return the contents to the destination described by the value of the Error_Path attribute of the batch job.
Applications shall not rely on having access to the standard error of a batch job prior to the completion of the batch job.
When the Error_Path attribute does not specify a host name, then the batch server shall retain the standard error of the batch job on the host of execution.
When the Error_Path attribute does specify a host name and the Keep_Files attribute does not contain the value KEEP_STD_ERROR, then the final destination of the standard error of the batch job shall be on the host whose host name is specified.
If the path indicated by the value of the Error_Path attribute of the batch job is a relative path, the batch server shall expand the path relative to the home directory of the user on the host to which the file is being returned.
When the batch server buffers the standard error of the batch job and the file cannot be opened for write upon completion of the batch job, then the server shall place the standard error in an implementation-defined location and notify the user of the location via mail. It shall be possible for the user to process this mail using the mailx utility.
If a batch server that does not buffer the standard error cannot open the standard error path of the batch job for write access, then the batch server shall abort the batch job.
A batch server shall not execute a batch job before the time represented by the value of the Execution_Time attribute of the batch job. The Execution_Time attribute is defined in seconds since the Epoch.
A batch server shall support the following hold types:
An implementation may define other hold types. Any additional hold types, how they are specified, their internal representation, their behavior, and how they affect the behavior of other utilities are implementation-defined.
The value of the Hold_Types attribute shall be the union of the valid hold types ('s', 'o', 'u', and any implementation-defined hold types), or 'n'.
A batch server shall not execute a batch job if the Hold_Types attribute of the batch job has a value other than NO_HOLD. If the Hold_Types attribute of the batch job has a value other than NO_HOLD, the batch job shall be in the HELD state.
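The Execution_Time and Hold_Types rules together determine whether a job in an execution queue may be started. The sketch below assumes that a Hold_Types value is written as a string of single-character hold types, with "n" standing for NO_HOLD; that encoding, and the function names, are assumptions of this example, and implementation-defined hold types may be represented differently.

```python
def parse_hold_types(value):
    """Turn a Hold_Types string into a set of hold types ('n' means none)."""
    if value == "n":
        return frozenset()
    holds = frozenset(value)
    if not holds or "n" in holds:
        # 'n' cannot be combined with other hold types, and the value
        # cannot be empty.
        raise ValueError(f"invalid Hold_Types value: {value!r}")
    return holds


def may_execute(hold_types, execution_time, now):
    """True if neither a hold nor a future Execution_Time blocks the job.

    `execution_time` and `now` are in seconds since the Epoch.
    """
    return not parse_hold_types(hold_types) and now >= execution_time
```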
The Job_Owner attribute consists of a pair of user name and host name values of the form:
username@hostname
A batch server that accepts a Queue Batch Job Request shall set the Job_Owner attribute to a string that is the username@hostname of the user who submitted the job.
A batch server that executes a batch job for which the value of the Join_Path attribute is TRUE shall ignore the value of the Error_Path attribute and merge the standard error of the batch job with the standard output of the batch job.
A batch server that executes a batch job for which the value of the Keep_Files attribute includes the value KEEP_STD_OUTPUT shall retain the standard output of the batch job on the host where execution occurs. The standard output shall be retained in the home directory of the user under whose user ID the batch job is executed and the filename shall be the default filename for the standard output as defined under the -o option of the qsub utility. The Output_Path attribute is not modified.
A batch server that executes a batch job for which the value of the Keep_Files attribute includes the value KEEP_STD_ERROR shall retain the standard error of the batch job on the host where execution occurs. The standard error shall be retained in the home directory of the user under whose user ID the batch job is executed and the filename shall be the default filename for standard error as defined under the -e option of the qsub utility. The Error_Path attribute is not modified.
A batch server that executes a batch job for which the value of the Keep_Files attribute includes values other than KEEP_STD_OUTPUT and KEEP_STD_ERROR shall retain these other files on the host where execution occurs. These files (with implementation-defined names) shall be retained in the home directory of the user under whose user identifier the batch job is executed.
A batch server that executes a batch job for which one of the values of the Mail_Points attribute is the value MAIL_AT_BEGINNING shall send a mail message to each user account listed in the Mail_Users attribute of the batch job.
The mail message shall contain at least the batch job identifier, queue, and server at which the batch job currently resides, and the Job_Owner attribute.
The Output_Path attribute of a running job cannot be changed by a Modify Batch Job Request. When the Keep_Files attribute of the batch job does not contain the value KEEP_STD_OUTPUT, a batch server that executes a batch job shall either:
Set the standard output stream of the session leader of the batch job to the destination described by the value of the Output_Path attribute of the batch job.
or:
Buffer the standard output of the session leader of the batch job until completion of the batch job, and when the batch job exits return the contents to the destination described by the value of the Output_Path attribute of the batch job.
When the Output_Path attribute does not specify a host name, then the batch server shall retain the standard output of the batch job on the host of execution.
When the Keep_Files attribute does not contain the value KEEP_STD_OUTPUT and the Output_Path attribute does specify a host name, then the final destination of the standard output of the batch job shall be on the host specified.
If the path specified in the Output_Path attribute of the batch job is a relative path, the batch server shall expand the path relative to the home directory of the user on the host to which the file is being returned.
Whether or not the batch server buffers the standard output of the batch job until completion of the batch job is implementation-defined. Applications shall not rely on having access to the standard output of a batch job prior to the completion of the batch job.
When the batch server does buffer the standard output of the batch job and the file cannot be opened for write upon completion of the batch job, then the batch server shall place the standard output in an implementation-defined location and notify the user of the location via mail. It shall be possible for the user to process this mail using the mailx utility.
If a batch server that does not buffer the standard output cannot open the standard output path of the batch job for write access, then the batch server shall abort the batch job.
A batch server implementation may choose to preferentially execute a batch job based on the Priority attribute. The interpretation of the batch job Priority attribute by a batch server is implementation-defined. If an implementation uses the Priority attribute, it shall interpret larger values of the Priority attribute to mean the batch job shall be preferentially selected for execution.
A batch job that began execution but did not complete, because the batch server either shut down or terminated abnormally, shall be requeued if the Rerunable attribute of the batch job has the value TRUE.
If a batch job, which was requeued after beginning execution but prior to completion, has a valid checkpoint file and the batch server supports checkpointing, then the batch job shall be restarted from the last valid checkpoint.
If the batch job cannot be restarted from a checkpoint, then when a batch job has a Rerunable attribute value of TRUE and was requeued after beginning execution but prior to completion, the batch server shall place the batch job into execution at the beginning of the job.
When a batch job has a Rerunable attribute value other than TRUE and was requeued after beginning execution but prior to completion, and the batch job cannot be restarted from a checkpoint, then the batch server shall abort the batch job.
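The three outcomes for a job that was requeued after beginning execution but prior to completion can be expressed as one decision. The function name and the returned strings are hypothetical labels chosen for this sketch.

```python
def restart_action(rerunable, has_valid_checkpoint, server_supports_checkpointing):
    """Choose how to resume a job requeued before it completed."""
    if has_valid_checkpoint and server_supports_checkpointing:
        # Restart from the last valid checkpoint.
        return "restart_from_checkpoint"
    if rerunable:
        # No usable checkpoint: place the job into execution at its beginning.
        return "rerun_from_beginning"
    # Not rerunable and no usable checkpoint.
    return "abort"
```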
A batch server that executes a batch job shall establish the resource limits of the session leader of the batch job according to the values of the Resource_List attribute of the batch job. Resource limits shall be enforced by an implementation-defined method.
The Shell_Path_List job attribute consists of a list of pairs of pathname and host name values. The host name component can be omitted, in which case the pathname serves as the default pathname when a batch server cannot find the name of the host on which it is running in the list.
A batch server that executes a batch job shall select, from the value of the Shell_Path_List attribute of the batch job, a pathname where the shell to execute the batch job shall be found. The batch server shall select the pathname, in order of preference, according to the following methods:
Select the pathname that contains the name of the host on which the batch server is running.
Select the pathname for which the host name has been omitted.
Select the pathname for the login shell of the user under which the batch job is to execute.
If the shell path value selected is an invalid pathname, the batch server shall abort the batch job.
If the value of the selected pathname from the Shell_Path_List attribute of the batch job represents a partial path, the batch server shall expand the path relative to a path that is implementation-defined.
The batch server that executes the batch job shall execute the program that was selected from the Shell_Path_List attribute of the batch job. The batch server shall pass the path to the script of the batch job as the first argument to the shell program.
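The order of preference for choosing a shell pathname can be sketched as two passes over the list followed by a fallback. The representation of Shell_Path_List as a list of `(pathname, host)` tuples, with `None` for an omitted host name, is an assumption of this example, as is the function name; validation of the selected pathname is left out.

```python
def select_shell(shell_path_list, server_host, login_shell):
    """Pick the shell pathname for a job executed on `server_host`."""
    # 1. A pathname paired with the host on which the server is running.
    for path, host in shell_path_list:
        if host == server_host:
            return path
    # 2. A pathname for which the host name has been omitted.
    for path, host in shell_path_list:
        if host is None:
            return path
    # 3. The login shell of the user under which the job is to execute.
    return login_shell
```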
The User_List job attribute consists of a list of pairs of user name and host name values. The host name component can be omitted, in which case the user name serves as a default when a batch server cannot find the name of the host on which it is running in the list.
A batch server that executes a batch job shall select, from the value of the User_List attribute of the batch job, a user name under which to create the session leader. The server shall select the user name, in order of preference, according to the following methods:
Select the user name of a value that contains the name of the host on which the batch server executes.
Select the user name of a value for which the host name has been omitted.
Select the user name from the Job_Owner attribute of the batch job.
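User name selection follows the same pattern, with the Job_Owner attribute as the last resort. In this sketch User_List is assumed to be a list of `(user_name, host)` tuples with `None` for an omitted host name, and Job_Owner is the `username@hostname` string; the function name is hypothetical, and the treatment of duplicate host entries (the last one wins) is a choice of the example, not something the text specifies.

```python
def select_user(user_list, server_host, job_owner):
    """Pick the user name under which to create the session leader."""
    by_host = {host: user for user, host in user_list}
    if server_host in by_host:
        return by_host[server_host]
    if None in by_host:
        return by_host[None]
    # Fall back to the user name part of username@hostname.
    return job_owner.split("@", 1)[0]
```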
A batch server that executes a batch job shall create, in the environment of the session leader of the batch job, each environment variable listed in the Variable_List attribute of the batch job, and set the value of each such environment variable to that of the corresponding variable in the variable list.
To route a batch job is to select a queue from a list and move the batch job to that queue.
A batch server that has routing queues, which have been started, shall route the jobs in the routing queues owned by the batch server. A batch server may delay the routing of a batch job. The algorithm for selecting a batch job and the queue to which it will be routed is implementation-defined.
When a routing queue has multiple possible destinations specified, then the precedence of the destinations is implementation-defined.
A batch server that routes a batch job to a queue at another server shall move the batch job into the target queue with a Queue Batch Job Request.
If the target server rejects the Queue Batch Job Request, the routing server shall retry routing the batch job or abort the batch job. A batch server that retries failed routings shall provide a means for the batch administrator to specify the number of retries and the minimum period of time between retries. The means by which an administrator specifies the number of retries and the delay between retries is implementation-defined. When the number of retries specified by the batch administrator has been exhausted, the batch server shall abort the batch job and perform the functions of Batch Job Exit; see Batch Job Exit.
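One way a server might structure the retry behavior is shown below. This is a sketch under stated assumptions: `send_queue_request` and `abort_job` are hypothetical callbacks standing in for the Queue Batch Job Request and the Batch Job Abort service, and `max_retries` and `min_delay_seconds` stand in for the administrator-specified values, whose configuration mechanism is implementation-defined.

```python
import time
from typing import Callable


def route_with_retries(send_queue_request: Callable[[], bool],
                       abort_job: Callable[[], None],
                       max_retries: int,
                       min_delay_seconds: float,
                       sleep: Callable[[float], None] = time.sleep) -> bool:
    """Try to route a job; abort it once the retry budget is exhausted."""
    attempts = 0
    while True:
        if send_queue_request():
            return True              # target server accepted the job
        if attempts >= max_retries:
            abort_job()              # retries exhausted: abort the job
            return False
        attempts += 1
        sleep(min_delay_seconds)     # wait at least the configured period
```

The `sleep` parameter exists only so that the delay can be replaced in tests; a real server would more likely schedule the retry than block.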
For each job in the EXITING state, the batch server that exited the batch job shall perform the following deferred services in the order specified:
If buffering standard error, move that file into the location specified by the Error_Path attribute of the batch job.
If buffering standard output, move that file into the location specified by the Output_Path attribute of the batch job.
If the Mail_Points attribute of the batch job includes MAIL_AT_EXIT, send mail to the users listed in the Mail_Users attribute of the batch job. The mail message shall contain at least the batch job identifier, queue, and server at which the batch job currently resides, and the Job_Owner attribute.
Remove the batch job from the queue.
If a batch server that buffers the standard error output cannot return the standard error file to the standard error path at the time the batch job exits, the batch server shall do one of the following:
Mail the standard error file to the batch job owner.
Save the standard error file and mail the location and name of the file where the standard error is stored to the batch job owner.
Save the standard error file and notify the user by other implementation-defined means.
If a batch server that buffers the standard output cannot return the standard output file to the standard output path at the time the batch job exits, the batch server shall do one of the following:
Mail the standard output file to the batch job owner.
Save the standard output file and mail the location and name of the file where the standard output is stored to the batch job owner.
Save the standard output file and notify the user by other implementation-defined means.
At the conclusion of job exit processing, the batch job is no longer managed by a batch server.
A batch server that has been either shut down or terminated abnormally, and has returned to operation, is said to have "restarted".
Upon restarting, a batch server shall requeue those jobs managed by the batch server that were in the RUNNING state at the time the batch server shut down and for which the Rerunable attribute of the batch job has the value TRUE.
Queues are defined to be non-volatile. A batch server shall store the content of queues that it controls in such a way that server and system shutdowns do not erase the content of the queues.
A batch server that cannot perform a deferred service for a batch job shall abort the batch job.
A batch server that aborts a batch job shall perform the following services:
Delete the batch job from the queue in which it resides.
If the Mail_Points attribute of the batch job includes the value MAIL_AT_ABORT, send mail to the users listed in the value of the Mail_Users attribute of the job. The mail message shall contain at least the batch job identifier, queue, and server at which the batch job currently resides, the Job_Owner attribute, and the reason for the abort.
If the batch job was in the RUNNING state, terminate the session leader of the executing job by sending the session leader a SIGKILL, place the batch job in the EXITING state, and perform the actions of Batch Job Exit.
This section describes the services provided by batch servers in response to requests from clients. Batch Services Summary summarizes the current set of batch service requests and for each gives its type (deferred or not) and whether it is an optional function.
Batch Service | Deferred | Optional |
---|---|---|
Batch Job Execution | Yes | No |
Batch Job Routing | Yes | No |
Batch Job Exit | Yes | No |
Batch Server Restart | Yes | No |
Batch Job Abort | Yes | No |
Delete Batch Job Request | No | No |
Hold Batch Job Request | No | No |
Batch Job Message Request | No | Yes |
Batch Job Status Request | No | No |
Locate Batch Job Request | No | Yes |
Modify Batch Job Request | No | No |
Move Batch Job Request | No | No |
Queue Batch Job Request | No | No |
Batch Queue Status Request | No | No |
Release Batch Job Request | No | No |
Rerun Batch Job Request | No | No |
Select Batch Jobs Request | No | No |
Server Shutdown Request | No | No |
Server Status Request | No | No |
Signal Batch Job Request | No | No |
Track Batch Job Request | No | Yes |
If a request is rejected because the batch client is not authorized to perform the action, the batch server shall return the same status as when the batch job does not exist.
A batch job is defined to have been deleted when it has been removed from the queue in which it resides and not instantiated in another queue. A client requests that the server that manages a batch job delete the batch job. Such a request is called a Delete Batch Job Request.
A batch server shall reject a Delete Batch Job Request if any of the following statements are true:
The user of the batch client is not authorized to delete the designated job.
The designated job is not managed by the batch server.
The designated job is in a state inconsistent with the delete request.
A batch server may reject a Delete Batch Job Request for other implementation-defined reasons. The method used to determine whether the user of a client is authorized to perform the requested action is implementation-defined.
A batch server requested to delete a batch job shall delete the batch job if the batch job exists and is not in the EXITING state.
A batch server that deletes a batch job in the RUNNING state shall send a SIGKILL signal to the session leader of the batch job. It is implementation-defined whether additional signals are sent to the session leader of the job prior to sending the SIGKILL signal.
A batch server that deletes a batch job in the RUNNING state shall place the batch job in the EXITING state after it has killed the session leader of the batch job and shall perform the actions of Batch Job Exit.
A batch client can request that the batch server add one or more holds to a batch job. Such a request is called a Hold Batch Job Request.
A batch server shall reject a Hold Batch Job Request if any of the following statements are true:
The batch server does not support one or more of the requested holds to be added to the batch job.
The user of the batch client is not authorized to add one or more of the requested holds to the batch job.
The batch server does not manage the specified job.
The designated job is in the EXITING state.
A batch server may reject a Hold Batch Job Request for other implementation-defined reasons. The method used to determine whether the user of a client is authorized to perform the requested action is implementation-defined.
A batch server that accepts a Hold Batch Job Request for a batch job in the RUNNING state shall place a hold on the batch job. The effects, if any, the hold will have on a batch job in the RUNNING state are implementation-defined.
A batch server that accepts a Hold Batch Job Request shall add each type of hold listed in the Hold Batch Job Request, that is not already present, to the value of the Hold_Types attribute of the batch job.
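The update of Hold_Types can be sketched as a set operation. The example below is illustrative only: it models Hold_Types as a Python set of strings, and the function name `add_holds` and the `supported` parameter are assumptions. Authorization and job-state checks are omitted.

```python
from typing import Iterable, Set


def add_holds(hold_types: Set[str],
              requested: Iterable[str],
              supported: Set[str]) -> Set[str]:
    """Return the new Hold_Types value, or raise if a hold is unsupported."""
    requested = set(requested)
    unsupported = requested - supported
    if unsupported:
        # Reject the whole request; the job's holds are left untouched.
        raise ValueError(f"unsupported hold types: {sorted(unsupported)}")
    # Set union adds only the holds that are not already present.
    return hold_types | requested
```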
Batch Job Message Request is an optional feature of batch servers. If an implementation supports Batch Job Message Request, the statements in this section apply and the configuration variable POSIX2_PBS_MESSAGE shall be set to 1.
A batch client can request that a batch server write a message into certain output files of a batch job. Such a request is called a Batch Job Message Request.
A batch server shall reject a Batch Job Message Request if any of the following statements are true:
The batch server does not support sending messages to jobs.
The user of the batch client is not authorized to post a message to the designated job.
The designated job does not exist on the batch server.
The designated job is not in the RUNNING state.
A batch server may reject a Batch Job Message Request for other implementation-defined reasons. The method used to determine whether the user of a client is authorized to perform the requested action is implementation-defined.
A batch server that accepts a Batch Job Message Request shall write the message sent by the batch client into the files indicated by the batch client.
A batch client can request that a batch server respond with the status and attributes of a batch job. Such a request is called a Batch Job Status Request.
A batch server shall reject a Batch Job Status Request if any of the following statements are true:
The user of the batch client is not authorized to query the status of the designated job.
The designated job is not managed by the batch server.
A batch server may reject a Batch Job Status Request for other implementation-defined reasons. The method used to determine whether the user of a client is authorized to perform the requested action is implementation-defined.
A batch server that accepts a Batch Job Status Request shall return a Batch Job Status Message to the batch client.
A batch server may return other information in response to a Batch Job Status Request.
Locate Batch Job Request is an optional feature of batch servers. If an implementation supports Locate Batch Job Request, the statements in this section apply and the configuration variable POSIX2_PBS_LOCATE shall be set to 1.
A batch client can ask a batch server to respond with the location of a batch job that was created by the batch server. Such a request is called a Locate Batch Job Request.
A batch server that accepts a Locate Batch Job Request shall return a Batch Job Location Message to the batch client.
A batch server may reject a Locate Batch Job Request for a batch job that was not created by that server.
A batch server may reject a Locate Batch Job Request for a batch job that is no longer managed by that server; that is, for a batch job that is not in a queue owned by that server.
A batch server may reject a Locate Batch Job Request for other implementation-defined reasons.
Batch clients modify (alter) the attributes of a batch job by making a request to the server that manages the batch job. Such a request is called a Modify Batch Job Request.
A batch server shall reject a Modify Batch Job Request if any of the following statements are true:
The user of the batch client is not authorized to make the requested modification to the batch job.
The designated job is not managed by the batch server.
The requested modification is inconsistent with the state of the batch job.
An unrecognized resource is requested for a batch job in an execution queue.
A batch server may reject a Modify Batch Job Request for other implementation-defined reasons. The method used to determine whether the user of a client is authorized to perform the requested action is implementation-defined.
A batch server that accepts a Modify Batch Job Request shall modify all the specified attributes of the batch job. A batch server that rejects a Modify Batch Job Request shall modify none of the attributes of the batch job.
If the servicing by a batch server of an otherwise valid request would result in no change, then the batch server shall indicate successful completion of the request.
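The all-or-nothing behavior of a Modify Batch Job Request suggests validating every change before applying any of them. A minimal sketch, assuming job attributes are held in a dictionary and that `is_allowed` is a hypothetical predicate covering the rejection conditions listed above:

```python
from typing import Callable, Dict, Mapping


def modify_job(attributes: Dict[str, str],
               changes: Mapping[str, str],
               is_allowed: Callable[[str, str], bool]) -> bool:
    """Apply all changes or none; return True if the request is accepted."""
    # Validate every requested change before touching the job.
    for name, value in changes.items():
        if not is_allowed(name, value):
            return False             # rejected: no attribute is modified
    # A request that changes nothing still completes successfully.
    attributes.update(changes)
    return True
```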
A batch client can request that a batch server move a batch job to another destination. Such a request is called a Move Batch Job Request.
A batch server shall reject a Move Batch Job Request if any of the following statements are true:
The user of the batch client is not authorized to remove the designated job from the queue in which the batch job resides.
The user of the batch client is not authorized to move the designated job to the destination.
The designated job is not managed by the batch server.
The designated job is in the EXITING state.
The destination is inaccessible.
A batch server may reject a Move Batch Job Request for other implementation-defined reasons. The method used to determine whether the user of a client is authorized to perform the requested action is implementation-defined.
A batch server that accepts a Move Batch Job Request shall perform the following services:
Queue the designated job at the destination.
Remove the designated job from the queue in which the batch job resides.
If the destination resides on another batch server, the batch server shall queue the batch job at the destination by sending a Queue Batch Job Request to the other server. If the Queue Batch Job Request fails, the batch server shall reject the Move Batch Job Request. If the Queue Batch Job Request succeeds, the batch server shall remove the batch job from its queue.
The batch server shall not modify any attributes of the batch job.
A batch queue is controlled by one and only one batch server. A batch server is said to own the queues that it controls. Batch clients make requests of batch servers to have jobs queued. Such a request is called a Queue Batch Job Request.
A batch server requested to queue a batch job for which the queue is not specified shall select an implementation-defined queue for the batch job. Such a queue is called the "default queue" of the batch server. The implementation shall provide the means for a batch administrator to specify the default queue. The queue, whether specified or defaulted, is called the "target queue".
A batch server shall reject a Queue Batch Job Request if any of the following statements are true:
The client is not authorized to create a batch job in the target queue.
The request specifies a queue that does not exist on the batch server.
The target queue is an execution queue and the batch server cannot satisfy a resource requirement of the batch job.
The target queue is an execution queue and an unrecognized resource is requested.
The target queue is an execution queue, the batch server does not support checkpointing, and the value of the Checkpoint attribute of the batch job is not NO_CHECKPOINT.
The job requires access to a user identifier that the batch client is not authorized to access.
A batch server may reject a Queue Batch Job Request for other implementation-defined reasons.
A batch server that accepts a Queue Batch Job Request for a batch job for which the PBS_O_QUEUE value is missing from the value of the Variable_List attribute of the batch job shall add that variable to the list and set the value to the name of the target queue. Once set, no server shall change the value of PBS_O_QUEUE, even if the batch job is moved to another queue.
A batch server that accepts a Queue Batch Job Request for a batch job for which the PBS_JOBID value is missing from the value of the Variable_List attribute shall add that variable to the list and set the value to the batch job identifier assigned by the server in the format:
sequence_number.server
A batch server that accepts a Queue Batch Job Request for a batch job for which the PBS_JOBNAME value is missing from the value of the Variable_List attribute of the batch job shall add that variable to the list and set the value to the Job_Name attribute of the batch job.
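The three "add if missing" rules for PBS_O_QUEUE, PBS_JOBID, and PBS_JOBNAME can be sketched as below. The dictionary representation of Variable_List and the function name `fill_job_variables` are assumptions made for the example.

```python
from typing import Dict


def fill_job_variables(variable_list: Dict[str, str],
                       target_queue: str,
                       sequence_number: int,
                       server: str,
                       job_name: str) -> Dict[str, str]:
    """Add PBS_O_QUEUE, PBS_JOBID and PBS_JOBNAME only where missing."""
    result = dict(variable_list)
    # setdefault leaves an existing value alone, so a PBS_O_QUEUE set by an
    # earlier server survives later moves between queues.
    result.setdefault("PBS_O_QUEUE", target_queue)
    result.setdefault("PBS_JOBID", f"{sequence_number}.{server}")
    result.setdefault("PBS_JOBNAME", job_name)
    return result
```

Calling the function a second time with a different target queue, as would happen after a move, leaves the original PBS_O_QUEUE value in place.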
A batch client can request that a batch server respond with the status and attributes of a queue. Such a request is called a Batch Queue Status Request.
A batch server shall reject a Batch Queue Status Request if any of the following statements are true:
The user of the batch client is not authorized to query the status of the designated queue.
The designated queue does not exist on the batch server.
A batch server may reject a Batch Queue Status Request for other implementation-defined reasons. The method used to determine whether the user of a client is authorized to perform the requested action is implementation-defined.
A batch server that accepts a Batch Queue Status Request shall return a Batch Queue Status Reply to the batch client.
A batch client can request that the server remove one or more holds from a batch job. Such a request is called a Release Batch Job Request.
A batch server shall reject a Release Batch Job Request if any of the following statements are true:
The user of the batch client is not authorized to remove one or more of the requested holds from the batch job.
The batch server does not manage the specified job.
A batch server may reject a Release Batch Job Request for other implementation-defined reasons. The method used to determine whether the user of a client is authorized to perform the requested action is implementation-defined.
A batch server that accepts a Release Batch Job Request shall remove each type of hold listed in the Release Batch Job Request, that is present, from the value of the Hold_Types attribute of the batch job.
To rerun a batch job is to kill the session leader of the batch job and leave the batch job eligible for re-execution. A batch client can request that a batch server rerun a batch job. Such a request is called a Rerun Batch Job Request.
A batch server shall reject a Rerun Batch Job Request if any of the following statements are true:
The user of the batch client is not authorized to rerun the designated job.
The Rerunable attribute of the designated job has the value FALSE.
The designated job is not in the RUNNING state.
The batch server does not manage the designated job.
A batch server may reject a Rerun Batch Job Request for other implementation-defined reasons. The method used to determine whether the user of a client is authorized to perform the requested action is implementation-defined.
A batch server that rejects a Rerun Batch Job Request shall in no way modify the execution of the batch job.
A batch server that accepts a request to rerun a batch job shall perform the following services:
Requeue the batch job in the execution queue in which it was executing.
Send a SIGKILL signal to the process group of the session leader of the batch job.
An implementation may indicate to the batch job owner that the batch job has been rerun. Whether and how the batch job owner is notified that a batch job is rerun is implementation-defined.
A batch server that reruns a batch job may send other implementation-defined signals to the session leader of the batch job prior to sending the SIGKILL signal.
A batch server may preferentially select a rerun job for execution. Whether rerun jobs shall be selected for execution before other jobs is implementation-defined.
A batch client can request from a batch server a list of jobs managed by that server that match a list of selection criteria. Such a request is called a Select Batch Jobs Request. All the batch jobs managed by the batch server that receives the request are candidates for selection.
A batch server that accepts a Select Batch Jobs Request shall return a list of zero or more job identifiers that correspond to jobs that meet the selection criteria.
If the batch client is not authorized to query the status of a batch job, the batch server shall not select the batch job.
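Selection combined with the authorization rule can be sketched as a filter. This example is a simplification: it assumes jobs are held as a mapping from job identifier to attribute dictionary, treats each criterion as an exact match, and uses a hypothetical `may_query` predicate for the implementation-defined authorization check.

```python
from typing import Callable, List, Mapping


def select_jobs(jobs: Mapping[str, Mapping[str, str]],
                criteria: Mapping[str, str],
                may_query: Callable[[str], bool]) -> List[str]:
    """Return identifiers of jobs that match every criterion."""
    selected: List[str] = []
    for job_id, attributes in jobs.items():
        if not may_query(job_id):
            continue  # unauthorized jobs are silently not selected
        if all(attributes.get(name) == value
               for name, value in criteria.items()):
            selected.append(job_id)
    return selected
```

An empty result list is a valid reply; the request is not rejected merely because nothing matches.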
A batch server is defined to have shut down when it does not respond to requests from clients and does not perform deferred services for jobs. A batch client can request that a batch server shut down. Such a request is called a Server Shutdown Request.
A batch server shall reject a Server Shutdown Request from a client that is not authorized to shut down the batch server. The method used to determine whether the user of a client is authorized to perform the requested action is implementation-defined.
A batch server may reject a Server Shutdown Request for other implementation-defined reasons.
At server shutdown, a batch server shall do, in order of preference, one of the following:
If checkpointing is implemented and the batch job is checkpointable, then checkpoint the batch job and requeue it.
If the batch job is rerunnable, then requeue the batch job to be rerun (restarted from the beginning).
Abort the batch job.
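The order of preference at shutdown amounts to a short decision function. The sketch below is illustrative; the function name and the returned labels are hypothetical, and the three flags stand in for whatever the implementation uses to record checkpoint support and the job's attributes.

```python
def shutdown_action(checkpointing_implemented: bool,
                    checkpointable: bool,
                    rerunnable: bool) -> str:
    """Choose what to do with a batch job when the server shuts down."""
    if checkpointing_implemented and checkpointable:
        return "checkpoint_and_requeue"   # first preference
    if rerunnable:
        return "requeue_for_rerun"        # restart from the beginning later
    return "abort"                        # last resort
```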
A batch client can request that a batch server respond with the status and attributes of the batch server. Such a request is called a Server Status Request.
A batch server shall reject a Server Status Request if the following statement is true:
The user of the batch client is not authorized to query the status of the designated server.
A batch server may reject a Server Status Request for other implementation-defined reasons. The method used to determine whether the user of a client is authorized to perform the requested action is implementation-defined.
A batch server that accepts a Server Status Request shall return a Server Status Reply to the batch client.
A batch client can request that a batch server signal the session leader of a batch job. Such a request is called a Signal Batch Job Request.
A batch server shall reject a Signal Batch Job Request if any of the following statements are true:
The user of the batch client is not authorized to signal the batch job.
The job is not in the RUNNING state.
The batch server does not manage the designated job.
The requested signal is not supported by the implementation.
A batch server may reject a Signal Batch Job Request for other implementation-defined reasons. The method used to determine whether the user of a client is authorized to perform the requested action is implementation-defined.
A batch server that accepts a request to signal a batch job shall send the signal requested by the batch client to the process group of the session leader of the batch job.
Track Batch Job Request is an optional feature of batch servers. If an implementation supports Track Batch Job Request, the statements in this section apply and the configuration variable POSIX2_PBS_TRACK shall be set to 1.
Track Batch Job Request provides a method for tracking the current location of a batch job. Clients may use the tracking information to determine the batch server that should receive a batch server request.
If Track Batch Job Request is supported by a batch server, then when the batch server queues a batch job as a result of a Queue Batch Job Request, and the batch server is not the batch server that created the batch job, the batch server shall send a Track Batch Job Request to the batch server that created the job.
If Track Batch Job Request is supported by a batch server, then the Track Batch Job Request may also be sent to other servers as a backup to the primary server. The method by which backup servers are specified is implementation-defined.
If Track Batch Job Request is supported by a batch server that receives a Track Batch Job Request, then the batch server shall record the current location of the batch job as contained in the request.
A utility shall recognize job_identifiers of the format:
[sequence_number][.server_name][@server]
where:
If the application omits the batch server_name portion of a batch job identifier, a utility shall use the name of a default batch server.
If the application omits the batch server portion of a batch job identifier, a utility shall use:
The batch server indicated by server_name, if present
The name of the default batch server
The name of the batch server that is currently managing the batch job
If only @server is specified, then the status of all jobs owned by the user on the requested server is listed.
The means by which a utility determines the default batch server is implementation-defined.
If the application presents the batch server portion of a batch job identifier to a utility, the utility shall send the request to the specified server.
A strictly conforming application shall use the syntax described for the job identifier. Whenever a batch job identifier is specified whose syntax is not recognized by an implementation, then a message for each error that occurs shall be written to standard error and the utility shall exit with an exit status greater than zero.
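A utility might recognize the job identifier syntax with a regular expression along the following lines. This is a sketch under assumptions: it treats sequence_number as a run of decimal digits and lets server_name contain any characters other than '@', neither of which is stated in the text above. The names `JobId` and `parse_job_identifier` are hypothetical. A utility would report the `ValueError` on standard error and exit with a status greater than zero.

```python
import re
from typing import NamedTuple, Optional

# [sequence_number][.server_name][@server]; every component is optional.
_JOB_ID = re.compile(
    r"(?P<seq>\d+)?(?:\.(?P<server_name>[^@]+))?(?:@(?P<server>.+))?"
)


class JobId(NamedTuple):
    sequence_number: Optional[int]
    server_name: Optional[str]
    server: Optional[str]


def parse_job_identifier(text: str) -> JobId:
    """Split a job identifier into its components or raise ValueError."""
    match = _JOB_ID.fullmatch(text)
    if not text or match is None:
        raise ValueError(f"unrecognized job identifier: {text!r}")
    seq = match.group("seq")
    return JobId(int(seq) if seq is not None else None,
                 match.group("server_name"),
                 match.group("server"))
```

Components that come back as `None` are the ones for which the utility applies the defaulting rules described above.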
When a batch job identifier is supplied as an argument to a batch utility and the server_name portion of the batch job identifier is omitted, then the utility shall use the name of the default batch server.
When a batch job identifier is supplied as an argument to a batch utility and the batch server portion of the batch job identifier is omitted, then the utility shall use either:
The name of the default batch server
or:
The name of the batch server that is currently managing the batch job
When a batch job identifier is supplied as an argument to a batch utility and the batch server portion of the batch job identifier is specified, then the utility shall send the required Batch Server Request to the specified server.
The utility shall recognize a destination of the format:
[queue][@server]
where:
If the application omits the batch server portion of a destination, then the utility shall use either:
The name of the default batch server
or:
The name of the batch server that is currently managing the batch job
The means by which a utility determines the default batch server is implementation-defined.
If the application omits the queue portion of a destination, then the utility shall use the name of the default queue at the batch server chosen. The means by which a batch server determines its default queue is implementation-defined. If a destination is specified in the queue@server form, then the utility shall use the specified queue at the specified server.
A strictly conforming application shall use the syntax described for a destination. Whenever a destination is specified whose syntax is not recognized by an implementation, then a message shall be written to standard error and the utility shall exit with an exit status greater than zero.
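Resolving a destination to a concrete queue and server can be sketched as follows. The example assumes the utility always falls back to the default batch server (one of the two permitted choices) and uses a hypothetical `default_queue_of` lookup to stand in for the server's implementation-defined default queue.

```python
from typing import Callable, Tuple


def resolve_destination(text: str,
                        default_queue_of: Callable[[str], str],
                        default_server: str) -> Tuple[str, str]:
    """Return (queue, server) for a destination of the form [queue][@server]."""
    queue, sep, server = text.partition("@")
    if sep and not server:
        # "queue@" names no server; treat it as unrecognized syntax.
        raise ValueError(f"unrecognized destination: {text!r}")
    if not server:
        server = default_server
    if not queue:
        queue = default_queue_of(server)  # default queue at the chosen server
    return queue, server
```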
For each option that can have multiple keyword-value pair arguments, the following rules shall apply. Examples of options that can have list-oriented option-arguments are -u value@keyword and -l keyword=value.
If a batch utility is presented with a list-oriented option-argument for which a keyword has a corresponding value that begins with a single or double quote, then the utility shall stop interpreting the input stream for delimiters until a second single or double quote, respectively, is encountered. This feature allows some flexibility for a comma (',') or equals sign ('=') to be part of the value string for a particular keyword; for example:
keywd1='val1,val2',keywd2="val3,val4"
foo -xkeywd1=\'val1,val2\',keywd2=\"val3,val4\"
If a batch server is presented with a list-oriented attribute that has a keyword that was encountered earlier in the list, then the later entry for that keyword shall replace the earlier entry.
If a batch server is presented with a list-oriented attribute that has a keyword without any corresponding value of the form keyword= or @keyword and the same keyword was encountered earlier in the list, then the prior entry for that keyword shall be ignored by the batch server.
If a batch utility is expecting a list-oriented option-argument entry of the form keyword=value, but is presented with an entry of the form keyword without any corresponding value, then the entry shall be treated as though a default value of NULL was assigned (that is, keyword=NULL) for entry parsing purposes. The utility shall include only the keyword, not the NULL value, in the associated job attribute.
If a batch utility is expecting a list-oriented option-argument entry of the form value@keyword, but is presented with an entry of the form value without any corresponding keyword, then the entry shall be treated as though a keyword of NULL was assigned (that is, value@NULL) for entry parsing purposes. The utility shall include only the value, not the NULL keyword, in the associated job attribute.
A batch server shall accept a list-oriented attribute that has multiple occurrences of the same keyword, interpreting the keywords, in order, with the last value encountered taking precedence over prior instances of the same keyword. This rule allows, but does not require, a batch utility to preprocess the attribute to remove duplicate keywords.
If a batch utility is presented with multiple list-oriented option-arguments on the command line or in script directives, or both, for a single option, then the utility shall concatenate, in order, any command line keyword and value pairs to the end of any directive keyword and value pairs separated by a single comma to produce a single string that is an equivalent, valid option-argument. The resulting string shall be assigned to the associated attribute of the batch job (after optionally removing duplicate entries as described in the preceding rule).
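The quoting, duplicate-keyword, and concatenation rules above can be combined in a small parser for keyword=value lists. The sketch below is an illustration, not a conforming implementation: it honors quotes anywhere in an entry rather than only at the start of a value, keeps the quote characters in the stored value, and does not handle the value@keyword form. The function names are hypothetical.

```python
from typing import Dict, List


def split_list_argument(text: str) -> List[str]:
    """Split on commas, ignoring commas inside single or double quotes."""
    entries: List[str] = []
    current: List[str] = []
    quote = ""                      # the quote character currently open, if any
    for ch in text:
        if quote:
            current.append(ch)
            if ch == quote:
                quote = ""          # matching close quote ends the quoted run
        elif ch in "'\"":
            quote = ch
            current.append(ch)
        elif ch == ",":
            entries.append("".join(current))
            current = []
        else:
            current.append(ch)
    entries.append("".join(current))
    return [entry for entry in entries if entry]


def merge_keyword_values(directive_arg: str,
                         command_line_arg: str) -> Dict[str, str]:
    """Concatenate directive then command-line entries; last keyword wins."""
    combined = ",".join(arg for arg in (directive_arg, command_line_arg) if arg)
    merged: Dict[str, str] = {}
    for entry in split_list_argument(combined):
        keyword, _, value = entry.partition("=")
        merged.pop(keyword, None)   # discard the earlier entry for this keyword
        merged[keyword] = value
    return merged
```

Because command-line entries are appended after directive entries, a keyword given on the command line replaces the same keyword from a script directive.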