Monday, November 5, 2007

How to Decipher Grid Engine Statuses – Part II

Sinisa Veseli

In Part I of this article I discussed the meanings of the various queue states that one might see after invoking the Grid Engine qstat command. The list of possible job states is just as long as the list of queue states:



• d (deletion) — Indicates that a job has been deleted using qdel.



• r (running) — Indicates that a job is executing.



• R (restarted) — Indicates that the job was restarted. This state can be caused by a job migration or because of one of the reasons described in the -r section of the qsub man page.



• s (suspended) — Shows that an already running job has been suspended using qmod.



• S (suspended) — Shows that an already running job has been suspended because the queue that it belongs to has been suspended.



• t (transferring) — Indicates that a job is being transferred to an execution host and is about to be executed.



• T (threshold) — Shows that an already running job has been suspended because at least one suspend threshold of the corresponding queue was exceeded.



• w (waiting) — Indicates that the job is suspended pending the availability of a critical resource or specified condition.



• q (queued) — Indicates that the job has been queued.



• E (error) — Indicates that the job is in the error state. You can find the reason for this state using the qstat command with the “-explain E” option.



• h (hold) — Indicates that the job is not eligible for execution due to a hold state assigned to it via the qhold, qalter, or qsub -h command.



Just as with queue states, one frequently encounters combinations of the above job states.
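To make a combined state easier to read, one can split it into its individual letters and look each one up. The following Python snippet is only a minimal sketch: the JOB_STATES table simply restates the list above, and the decode_job_state helper is a hypothetical name, not part of Grid Engine. It assumes that every character in the state string is a single state letter.

```python
# Job state letters and their meanings, restating the list above.
JOB_STATES = {
    "d": "deletion",
    "r": "running",
    "R": "restarted",
    "s": "suspended (via qmod)",
    "S": "suspended (queue suspended)",
    "t": "transferring",
    "T": "threshold (suspend threshold exceeded)",
    "w": "waiting",
    "q": "queued",
    "E": "error",
    "h": "hold",
}


def decode_job_state(state):
    """Return the meaning of each letter in a combined job state string.

    Hypothetical helper for illustration; letters not in the table are
    reported as unknown rather than raising an error.
    """
    return [JOB_STATES.get(letter, f"unknown ({letter})") for letter in state]


# A job that is on hold while queued and waiting.
print(decode_job_state("hqw"))
```

With a state such as Eqw, the first entry in the result would be "error", which is a hint to run qstat with the “-explain E” option mentioned above.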
