Tuesday, February 5, 2008

How To Write Your Own Load Sensors For Grid Engine

By Sinisa Veseli

As most Grid Engine (GE) administrators know, the GE execution daemon periodically reports values for a number of host load parameters. Those values are stored in the qmaster internal host object, and are used internally (e.g., for job scheduling) if a complex resource attribute with a corresponding name is defined.



Parameters that are reported by default (about 20 or so) are documented in the file $SGE_ROOT/doc/load_parameters.asc. For a large number of clusters the default set is sufficient to adequately describe their load situation. However, for those sites where this is not the case, the Grid Engine software provides administrators with the ability to introduce additional custom load parameters. Accomplishing this task is not difficult, and involves three steps:



1) Provide a custom load sensor. This can be a script or a binary that feeds the GE execution daemon with additional load information. It must comply with the following rules:



• It should be written as an infinite loop that waits for input on the standard input stream.

• If the string “quit” is received, the sensor should exit.

• Otherwise, it should retrieve data necessary for computing the desired load figures, calculate those, and write them to the standard output stream.

• The individual host-related load figures should be reported one per line, in the form hostname:name:value (without any blanks). The load figures should be enclosed by a pair of lines containing only the strings “begin” and “end”. For example, a custom load sensor running on the machine tolkien.univaud.com and measuring the parameters n_app_users and n_app_threads might produce the following output:



begin

tolkien.univaud.com:n_app_users:12

tolkien.univaud.com:n_app_threads:23

end



Note that for global consumable resources that are not attached to any particular host (for example, the number of floating licenses in use), the load sensor needs to output the string “global” instead of the machine name.
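The rules above can be sketched as a small Python script. This is only an illustration, not code from the Grid Engine distribution: the function names are made up for this example, collect_load_values() returns placeholder numbers where a real sensor would query the application, and the host name comes from socket.gethostname(), which is assumed to match the name Grid Engine uses for the host.

```python
import socket
import sys


def collect_load_values():
    """Return the current load figures as a dict.

    Hypothetical placeholder: a real sensor would query the
    application here instead of returning fixed numbers.
    """
    return {"n_app_users": 12, "n_app_threads": 23}


def format_report(host, values):
    """Build one report: 'begin', one host:name:value line each, 'end'."""
    lines = ["begin"]
    for name, value in values.items():
        lines.append(f"{host}:{name}:{value}")
    lines.append("end")
    return "\n".join(lines) + "\n"


def run_sensor(stdin=sys.stdin, stdout=sys.stdout, host=None,
               collect=collect_load_values):
    """Loop forever: wait for a line of input, then emit one report.

    The loop ends when the string 'quit' is received or when the
    input stream is closed.
    """
    host = host or socket.gethostname()
    while True:
        line = stdin.readline()
        if not line or line.strip() == "quit":
            break
        stdout.write(format_report(host, collect()))
        # Flush so the reader sees the report immediately.
        stdout.flush()
```

To use this as an actual sensor, one would add a call to run_sensor() as the last line of the script and make the file executable. For a global resource, passing host="global" produces lines such as global:n_app_users:12.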



2) For each custom load parameter, define a complex resource attribute using, for example, the “qconf -mc” command.

3) Enable the custom load sensor by executing the “qconf -mconf” command and providing the full path to your script or executable as the value of the “load_sensor” parameter. If all goes well, the execution daemon will start reporting the new load parameters within a minute or two, and you should be able to see them using the “qhost -F” command.
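Before pointing “load_sensor” at a new script, it can help to check by hand that the script follows the protocol described in step 1. The helper below is a hypothetical sketch for that purpose and is not part of Grid Engine: probe_sensor() is a name invented here, and it simply starts the sensor, sends one empty line followed by “quit”, and parses whatever comes back.

```python
import subprocess


def probe_sensor(command, timeout=10):
    """Run a load sensor once and return {(host, name): value}.

    `command` is the argument list used to start the sensor,
    e.g. ["/path/to/sensor"]. Raises ValueError if the output is
    not wrapped in 'begin' and 'end' lines.
    """
    proc = subprocess.Popen(command, stdin=subprocess.PIPE,
                            stdout=subprocess.PIPE, text=True)
    try:
        # One empty line requests a report; 'quit' asks the sensor to exit.
        out, _ = proc.communicate("\nquit\n", timeout=timeout)
    finally:
        if proc.poll() is None:
            # The sensor ignored 'quit' or hung: stop it.
            proc.kill()
            proc.wait()

    lines = [ln.strip() for ln in out.splitlines() if ln.strip()]
    if len(lines) < 2 or lines[0] != "begin" or lines[-1] != "end":
        raise ValueError("sensor output is not enclosed in begin/end")

    report = {}
    for line in lines[1:-1]:
        # Each figure has the form host:name:value.
        host, name, value = line.split(":", 2)
        report[(host, name)] = value
    return report
```

A sensor that passes this check at least speaks the expected format; whether the values show up in “qhost -F” still depends on the complex attributes from step 2 being defined with matching names.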



Administrators with decent scripting skills (or those with a bit of luck ☺) can usually implement and enable new load sensors for their Grid Engine installations in a very short period of time. Note that some simple examples for custom load sensors can be found in the Grid Engine Admin Guide, as well as in the corresponding HowTo document.
