Monday, June 30, 2008

About Grid Engine Advance Reservations

By Sinisa Veseli

Advance reservation (AR) capability is one of the most important new features of the upcoming Grid Engine 6.2 release. New command line utilities allow users and administrators to submit resource reservations (qrsub), view granted reservations (qrstat), or delete reservations (qrdel). Some of the existing commands are also getting new switches. For example, the “-ar <AR id>” option for qsub indicates that the submitted job is part of an existing advance reservation. Given that AR is new functionality, I thought that it might be useful to describe how it works with a simple example (using the 6.2 Beta software).

Advance reservations can be submitted to Grid Engine by queue operators and managers, and also by a designated set of privileged users. Those users are defined in the ACL “arusers”, which by default looks as follows:



$ qconf -sul
arusers
deadlineusers
defaultdepartment


$ qconf -su arusers
name    arusers
type    ACL
fshare  0
oticket 0
entries NONE




The “arusers” ACL can be modified via the “qconf -mu” command:



$ qconf -mu arusers
veseli@tolkien.ps.uud.com modified "arusers" in userset list


$ qconf -su arusers
name    arusers
type    ACL
fshare  0
oticket 0
entries veseli




Once designated as a member of this list, the user is allowed to submit ARs to Grid Engine:



[veseli@tolkien]$ qrsub -e 0805141450.33 -pe mpi 2
Your advance reservation 3 has been granted


[veseli@tolkien]$ qrstat
ar-id   name       owner        state start at             end at               duration
-----------------------------------------------------------------------------------------
      3            veseli       r     05/14/2008 14:33:08  05/14/2008 14:50:33  00:17:25

[veseli@tolkien]$ qstat -f 
queuename                      qtype resv/used/tot. load_avg arch          states
---------------------------------------------------------------------------------
all.q@tolkien.ps.uud.com       BIP   2/0/4          0.04     lx24-x86      




For the sake of simplicity, the above example uses a single queue (all.q) that has 4 job slots and a parallel environment (PE) named mpi assigned to it. After reserving 2 slots for the mpi PE, only 2 slots are left for running regular jobs until the AR shown above expires. Note that the "-e" switch for qrsub designates the requested reservation end time in the format YYMMDDhhmm.ss. It is also worth pointing out that the qstat output has changed slightly with respect to previous software releases in order to accommodate the display of existing reservations (the slot column now reads resv/used/tot.).
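As an illustration of the end time format only, here is a minimal Java sketch that builds and parses a YYMMDDhhmm.ss string such as the one passed to "qrsub -e" above. The class ArEndTime and its methods are hypothetical helpers invented for this example; they are not part of Grid Engine. The start time is copied from the qrstat output above.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helper for illustration; not part of Grid Engine.
class ArEndTime {
    // Two-digit year, month, day, hour, minute, then ".seconds" (YYMMDDhhmm.ss).
    static final DateTimeFormatter QRSUB_FORMAT =
            DateTimeFormatter.ofPattern("uuMMddHHmm.ss");

    // Turn a date and time into the string form shown with "qrsub -e".
    static String format(final LocalDateTime end) {
        return end.format(QRSUB_FORMAT);
    }

    // Read a YYMMDDhhmm.ss string back into a date and time.
    static LocalDateTime parse(final String text) {
        return LocalDateTime.parse(text, QRSUB_FORMAT);
    }

    public static void main(String[] args) {
        // Start time as reported by qrstat in the example above.
        final LocalDateTime start = LocalDateTime.of(2008, 5, 14, 14, 33, 8);
        final LocalDateTime end = parse("0805141450.33");
        final Duration d = Duration.between(start, end);

        System.out.println(format(end)); // prints 0805141450.33
        System.out.println(d.toMinutes() + " min " + (d.getSeconds() % 60) + " s"); // prints 17 min 25 s
    }
}
```

The computed duration agrees with the 00:17:25 that qrstat reports for reservation 3.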

If we now submit several regular jobs, only 2 of them will be able to run:



[veseli@tolkien]$ qsub regular_job.sh 
Your job 15 ("regular_job.sh") has been submitted
...
[veseli@tolkien]$ qsub regular_job.sh 
Your job 19 ("regular_job.sh") has been submitted


[veseli@tolkien]$ qstat -f
queuename                      qtype resv/used/tot. load_avg arch          states
---------------------------------------------------------------------------------
all.q@tolkien.ps.uud.com       BIP   2/2/4          0.03     lx24-x86      
     15 0.55500 regular_jo veseli       r     05/14/2008 14:34:32     1        
     16 0.55500 regular_jo veseli       r     05/14/2008 14:34:32     1        

############################################################################
- PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS
############################################################################
     17 0.55500 regular_jo veseli       qw    05/14/2008 14:34:22     1        
     18 0.55500 regular_jo veseli       qw    05/14/2008 14:34:23     1        
     19 0.55500 regular_jo veseli       qw    05/14/2008 14:34:24     1        




However, if we submit jobs that are part of the existing AR, those are allowed to run, while jobs submitted earlier are still pending:



[veseli@tolkien]$ qsub -ar 3 reserved_job.sh 
Your job 20 ("reserved_job.sh") has been submitted
[veseli@tolkien]$ qsub -ar 3 reserved_job.sh 
Your job 21 ("reserved_job.sh") has been submitted


[veseli@tolkien]$ qstat -f
queuename                      qtype resv/used/tot. load_avg arch          states
---------------------------------------------------------------------------------
all.q@tolkien.ps.uud.com       BIP   2/4/4          0.02     lx24-x86      
     15 0.55500 regular_jo veseli       r     05/14/2008 14:34:32     1        
     16 0.55500 regular_jo veseli       r     05/14/2008 14:34:32     1        
     20 0.55500 reserved_j veseli       r     05/14/2008 14:35:02     1        
     21 0.55500 reserved_j veseli       r     05/14/2008 14:35:02     1        

############################################################################
- PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS
############################################################################
     17 0.55500 regular_jo veseli       qw    05/14/2008 14:34:22     1        
     18 0.55500 regular_jo veseli       qw    05/14/2008 14:34:23     1        
     19 0.55500 regular_jo veseli       qw    05/14/2008 14:34:24     1        




The above example illustrates how ARs work. As long as a particular reservation is valid, only jobs that are designated as part of it can utilize the resources that have been reserved.
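The slot arithmetic behind the resv/used/tot. column can be mimicked with a toy model. The sketch below is only an assumption-laden illustration of the counting seen in the qstat output; the class ReservedQueue and its methods are hypothetical and say nothing about how the Grid Engine scheduler is actually implemented.

```java
// Toy model of the slot counts shown by "qstat -f"; hypothetical, not Grid Engine code.
class ReservedQueue {
    private final int total;     // "tot." column
    private final int reserved;  // "resv" column
    private int usedRegular;     // slots taken by regular jobs
    private int usedReserved;    // slots taken by jobs submitted with "-ar"

    ReservedQueue(final int total, final int reserved) {
        this.total = total;
        this.reserved = reserved;
    }

    // A regular job may only use slots that are not reserved.
    boolean startRegular() {
        if (usedRegular < total - reserved) {
            usedRegular++;
            return true;
        }
        return false; // job stays pending ("qw")
    }

    // A job that belongs to the reservation may only use reserved slots.
    boolean startReserved() {
        if (usedReserved < reserved) {
            usedReserved++;
            return true;
        }
        return false;
    }

    // Same layout as the resv/used/tot. column.
    String status() {
        return reserved + "/" + (usedRegular + usedReserved) + "/" + total;
    }

    public static void main(String[] args) {
        final ReservedQueue queue = new ReservedQueue(4, 2);

        int running = 0;
        for (int job = 15; job <= 19; job++) { // five regular jobs, as in the example
            if (queue.startRegular()) {
                running++;
            }
        }
        System.out.println(running + " regular jobs running, " + queue.status()); // 2 regular jobs running, 2/2/4

        queue.startReserved(); // job 20
        queue.startReserved(); // job 21
        System.out.println(queue.status()); // 2/4/4
    }
}
```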

I think that AR will prove to be an extremely valuable tool for planning grid resource usage, and I’m very pleased to see it in the new Grid Engine release.

Friday, June 6, 2008

Steaming Java

By Roderick Flores

When Rich asked us to walk through a software development process, I immediately thought back to a conversation that I had with my friend Leif Wickland about building high-performance Java applications, so I emailed him to ask for his best practices. We have both produced Java code that is as fast as, if not faster than, C compiled with optimization (for me it was using a 64-bit JRE on an x86_64 architecture with multiple cores).



That is not to say that the equivalent C code could not be made to go faster if you spent time optimizing it. Rather, the main point is that Java is a viable HPC language. On a related note, Brian Goetz of Sun has a very interesting discussion on IBM's DeveloperWorks, "Urban performance legends, revisited," on how garbage collection allows faster raw allocation performance.



However, I digress… Here is a summary of what we both came up with (in no particular order):


           
  1. It is vitally important to "measure, measure, measure" everything you do. We can offer any number of helpful hints, but the likelihood that all of them should be applied is extremely low.

  2. It is equally important to remember to optimize only the areas of the program that are bottlenecks. Optimizing anything else is a waste of development time for no real gain.

  3. One of the simplest and most overlooked things that can help your application is to explicitly mark read-only method parameters with the final modifier. Not only can it help the compiler with optimization, but it is also a good way of communicating your intentions to your teammates. The more of your method parameters you can make final, the more this helps. One thing to be aware of is that not all things that are declared final behave as expected (see "Is that your final answer?" for more detail).

  4. If you have state shared between threads, make whatever you can final so that the VM takes no steps to ensure consistency. This is not something that we would have expected to make a difference, but it seems to help.

  5. An equally ignored practice is using the finally clause. It is very important to clean up after the code in a try block. Otherwise you could leave open streams, SQL queries, or other objects lying around taking up space.

  6. Create your data structures and declare your variables early. A core goal is to avoid allocating short-lived variables. While it is true that the garbage collector may reserve memory for variables that are declared often, why make it guess your intentions? For example, if a loop is called repeatedly, there is no need to say for (int i = 0; … when you could have declared i earlier. Of course, you have to be careful not to reset counters from inside of loops.

  7. Use static for values that are constants. This may seem obvious, but not everybody does it.

  8. For loops embedded within other loops:

    • Replace your outer loop with a fixed pool of threads. In the next release of Java, this will be even easier using the fork-join framework. This has become increasingly important as processors gain more cores.

    • Make sure that your innermost loop is the longest, even if it doesn't necessarily map directly to the business goals. You shouldn't force the program to create a new loop too often, as it wastes cycles.

    • Unroll your inner loops. This can save an enormous amount of time even if it isn't pretty. The quick test I just ran was 300% faster. If you haven't unrolled a loop before, it is pretty simple:

          unrollRemainder = count % LOOP_UNROLL_COUNT;

          // Handle the leftover iterations first.
          for( n = 0; n < unrollRemainder; n++ ) {
              // do some stuff here.
          }

          // The rest of the iterations divide evenly by LOOP_UNROLL_COUNT.
          for( n = unrollRemainder; n < count; n+=LOOP_UNROLL_COUNT ) {
              // do stuff for n here
              // do stuff for n+1 here
              // do stuff for n+2 here
              …
              // do stuff for n+LOOP_UNROLL_COUNT - 1 here
          }

      Notice that both n and unrollRemainder were declared earlier, as recommended previously.

  9. Preload all of your input data and then operate on it later. There is absolutely no reason that you should be loading data of any kind inside of your main calculation code. If the data doesn't fit or belong on one machine, use a Map-Reduce approach to distribute it across the Grid.

  10. Use the factory pattern to create objects.

    • Data structures can be created ahead of time and only the necessary pieces are passed to the new object.

    • Any preloaded data can also be segmented so that only the necessary parts are passed to the new object.

    • You can avoid the allocation of short-lived variables by using constructors with the final keyword on their parameters.

    • The factory can perform some heuristic calculations to see if a particular object should even be created for future processing.

  11. When doing calculations on a large number of floating-point values, use a byte array to store the data and a ByteWrapper to convert it to floats. This should primarily be used for read-only (input) data. If you are writing floating-point values, you should do this with caution, as it may take more time than using a float array. One major advantage that Java has when you use this approach is that you can switch between big-endian and little-endian data rather easily.

  12. Pass fewer parameters to methods. This results in less overhead. If you can use a static value instead, that is one fewer parameter to pass.

  13. Use static methods if possible. For example, a FahrenheitToCelsius(float fahrenheit) method could easily be made static. The main advantage here is that the compiler will likely inline the function.

  14. There is some debate about whether you should make particular methods final if they are called often. There is a strong argument against doing this because the enhancement is small or nonexistent (see "Urban Performance Legends" or, once again, "Is that your final answer?"). However, my experience is that a small enhancement on a calculation that is run thousands of times can make a significant difference. Both Leif and I have seen measurable differences here. The key is to benchmark your code to be certain.
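
Several of the tips above can be combined in one small sketch. The example below is only an illustration under stated assumptions, not the code that Leif or I benchmarked: the class LoopTips and its method names are made up for this post, LOOP_UNROLL_COUNT is fixed at 4, and the "ByteWrapper" from the list is assumed to be something like the standard java.nio.ByteBuffer. It shows an unrolled inner loop, an outer loop handed to a fixed pool of threads, final parameters, cleanup in a finally clause, and a byte array read as floats with a chosen byte order.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical class for illustration only.
class LoopTips {
    static final int LOOP_UNROLL_COUNT = 4; // assumed unroll factor

    // Inner loop unrolled four times, with the remainder handled first.
    static double unrolledSum(final float[] row) {
        final int count = row.length;
        final int unrollRemainder = count % LOOP_UNROLL_COUNT;
        double sum = 0.0;
        int n;
        for (n = 0; n < unrollRemainder; n++) {
            sum += row[n];
        }
        for (n = unrollRemainder; n < count; n += LOOP_UNROLL_COUNT) {
            sum += row[n];
            sum += row[n + 1];
            sum += row[n + 2];
            sum += row[n + 3];
        }
        return sum;
    }

    // Read preloaded raw bytes as floats; the byte order is chosen by the caller.
    // Assumes java.nio.ByteBuffer stands in for the "ByteWrapper" mentioned above.
    static float[] toFloats(final byte[] raw, final ByteOrder order) {
        final float[] out = new float[raw.length / Float.BYTES];
        ByteBuffer.wrap(raw).order(order).asFloatBuffer().get(out);
        return out;
    }

    // Outer loop replaced by a fixed pool of threads: one task per row.
    static double[] rowSums(final float[][] rows, final int threads) {
        final ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            final List<Future<Double>> futures = new ArrayList<>();
            for (final float[] row : rows) {
                final Callable<Double> task = () -> unrolledSum(row);
                futures.add(pool.submit(task));
            }
            final double[] sums = new double[rows.length];
            for (int i = 0; i < sums.length; i++) {
                sums[i] = futures.get(i).get();
            }
            return sums;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("row sum failed", e);
        } finally {
            pool.shutdown(); // always release the worker threads
        }
    }

    public static void main(String[] args) {
        // Two little-endian floats packed into a byte array (sample input data).
        final byte[] raw = ByteBuffer.allocate(8).order(ByteOrder.LITTLE_ENDIAN)
                .putFloat(1.5f).putFloat(-2.0f).array();
        System.out.println(Arrays.toString(toFloats(raw, ByteOrder.LITTLE_ENDIAN))); // [1.5, -2.0]

        final float[][] rows = { {1, 2, 3}, {4, 5, 6, 7, 8} };
        System.out.println(Arrays.toString(rowSums(rows, 2))); // [6.0, 30.0]
    }
}
```

Whether any of these changes actually pays off depends on the JVM and the workload, so, as the first tip says, measure before and after.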