-- CarstenPreuss - 12 Apr 2011

Using QMON

Queue Control

This dialog shows the overall cluster status as well as the status of all nodes. Queues are also configured and administered from here.

Complex Configuration

Here the variables (complexes) used by the scheduler are configured, and new ones can be defined.

Policy Configuration

This is the place to configure the number of tickets of each type, the FairShare tree and the weighting of the different priorities.

Currently there are 100 FS tickets per core, spread over the main groups Alice, AliceGrid, Hades, Theory and RZ.

OverallPriority = Priority + Urgency + Ticket

Priority = -1023 to +1024
Urgency = WeightDeadline + WeightWaitingtime
Ticket = ShareTreeTickets + FunctionalTickets + OverrideTickets
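
The formula above can be written out as a small Python sketch. It follows the sums exactly as they are given on this page; the class and function names are made up for illustration, and the real scheduler may additionally normalise and weight each term before adding them.

```python
from dataclasses import dataclass

# Range of the user-supplied priority as stated on this page.
PRIORITY_MIN, PRIORITY_MAX = -1023, 1024


@dataclass
class JobPolicyInputs:
    """Hypothetical container for the terms of the formula."""
    priority: int = 0
    weight_deadline: float = 0.0
    weight_waiting_time: float = 0.0
    share_tree_tickets: float = 0.0
    functional_tickets: float = 0.0
    override_tickets: float = 0.0


def urgency(job: JobPolicyInputs) -> float:
    # Urgency = WeightDeadline + WeightWaitingtime
    return job.weight_deadline + job.weight_waiting_time


def tickets(job: JobPolicyInputs) -> float:
    # Ticket = ShareTreeTickets + FunctionalTickets + OverrideTickets
    return (job.share_tree_tickets
            + job.functional_tickets
            + job.override_tickets)


def overall_priority(job: JobPolicyInputs) -> float:
    # OverallPriority = Priority + Urgency + Ticket
    if not PRIORITY_MIN <= job.priority <= PRIORITY_MAX:
        raise ValueError("priority outside %d..%d" % (PRIORITY_MIN, PRIORITY_MAX))
    return job.priority + urgency(job) + tickets(job)
```

With the current OS hierarchy the functional tickets would simply stay at 0.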

FairShare can take used CPU time, used memory (memory over time) and I/O in GB, network and disk combined, into account, each with its own weight.
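
As a rough illustration of such a weighting, the sketch below combines the three usage figures into one number. The function name, the weight keys and the example weights are assumptions for this page only and are not taken from the cluster configuration.

```python
def combined_usage(cpu_seconds, mem_gb_seconds, io_gb, weights):
    """Weighted sum of the three usage figures.

    weights is a dict with the (hypothetical) keys "cpu", "mem" and "io".
    """
    return (weights["cpu"] * cpu_seconds
            + weights["mem"] * mem_gb_seconds
            + weights["io"] * io_gb)


# Example weights, chosen only for illustration.
EXAMPLE_WEIGHTS = {"cpu": 0.5, "mem": 0.25, "io": 0.25}
example = combined_usage(100.0, 50.0, 10.0, EXAMPLE_WEIGHTS)
```

A group with a higher combined usage would get a smaller share of FairShare tickets until its usage has decayed again.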

Priorities are not accounted for in the usage, so it would not make sense to have high-priority queues with unlimited access for users or to resources.

OverrideTickets are tickets that can be assigned to users and are used as one-time tickets, meaning they are used up.

FunctionalTickets are tickets dedicated to particular kinds of jobs for special purposes, for example admin tasks.

WeightDeadline applies only to special users who are allowed to submit deadline jobs. If activated, the priority of such jobs increases as the job's deadline comes closer.

WeightWaitingTime simply raises the priority of jobs as their waiting time increases.

Priority is the priority that users themselves assign to their jobs.

The ticket hierarchy currently in use is OS, meaning OverrideTickets and FairShare (share tree) tickets.

Resource Quota Configuration

With this it is possible to restrict users, hosts and queues in the usage of different resources.

{
   name         max_slots_alicegrid
   description  "max. amount of jobs for the alicegrid queue"
   enabled      TRUE
   limit        queues {alicegrid} to slots=2500
}
{
   name         max_slots_default_users
   description  "max. slots for default users"
   enabled      TRUE
   limit        users {*,!cpreuss,!alicesgm} to slots=10
}

The first rule set restricts the alicegrid queue to 2500 slots, i.e. at most 2500 jobs. The second rule set restricts every user to a maximum of 10 slots, except alicesgm and cpreuss. Instead of listing single users, it would probably be better to maintain a list of registered users. This mechanism can be used to give all users restricted access to SGE at first and to grant them more access after they have attended a tutorial, signed the rules, and so on.
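
The effect of the second rule set can be sketched in Python as follows. This only models the single user list {*,!cpreuss,!alicesgm} with its per-user limit of 10 slots; it is not SGE's full quota evaluation, and all function names are invented for the example.

```python
from fnmatch import fnmatchcase

USER_LIST = ["*", "!cpreuss", "!alicesgm"]   # from max_slots_default_users
SLOT_LIMIT = 10


def user_is_limited(user, user_list=USER_LIST):
    """True if the user matches the list and is not excluded by a '!' entry."""
    included = False
    for entry in user_list:
        if entry.startswith("!"):
            if fnmatchcase(user, entry[1:]):
                return False          # explicitly excluded from the limit
        elif fnmatchcase(user, entry):
            included = True
    return included


def may_start(user, slots_in_use, requested=1, limit=SLOT_LIMIT):
    """slots_in_use maps user name -> slots currently occupied."""
    if not user_is_limited(user):
        return True
    return slots_in_use.get(user, 0) + requested <= limit
```

For example, a default user who already occupies 10 slots gets no further job started, while cpreuss and alicesgm are not affected by this rule.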

It is not possible (outside the configuration tool itself) to get a list of all resource quotas; only the currently enabled and active ones are shown, so users cannot plan what they will be able to do.

Load based scheduling

SGE offers the opportunity to move from slot-based to load-based scheduling by using load and suspend thresholds instead of slots only (which is also possible).

In LSF we mostly have 8 slots on an 8-core node, meaning there can be at most 8 running jobs on a node (plus some suspended ones). So the efficiency of the cluster rises and falls with the efficiency of each single job.

With load-based scheduling a node can take as many jobs as it can handle with its resources. On top of that there should of course still be a maximum number of running jobs, because the resource usage of a job changes during its lifetime. Load and suspend thresholds can be any SGE complex variables, such as the CPU load or mem_free, but also custom variables introduced to SGE by the admins.

Load Thresholds          Suspend Thresholds
np_load_avg  0.8         np_load_avg  1.5
mem_free     4G          mem_free     1.0G
tmp_free     10G         tmp_free     2.0G

New jobs are scheduled as long as neither the load thresholds nor the maximum number of slots are exceeded. If the suspend thresholds are exceeded, jobs running in this queue instance are suspended, X jobs in every suspend interval.
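
The decision described above can be sketched in Python, using the threshold values from the table. np_load_avg is the standard SGE name of the load average normalised per core. The comparison direction per variable (a load value counts as exceeded when it is too high, a free-space value when it is too low) and all function names are assumptions made for this example.

```python
import operator

# Relation under which a threshold counts as exceeded (assumed).
RELOPS = {
    "np_load_avg": operator.ge,   # too much load
    "mem_free": operator.le,      # too little free memory (GB)
    "tmp_free": operator.le,      # too little free scratch space (GB)
}

LOAD_THRESHOLDS = {"np_load_avg": 0.8, "mem_free": 4.0, "tmp_free": 10.0}
SUSPEND_THRESHOLDS = {"np_load_avg": 1.5, "mem_free": 1.0, "tmp_free": 2.0}


def exceeded(values, thresholds):
    """Names of all thresholds that the reported host values exceed."""
    return [name for name, limit in thresholds.items()
            if RELOPS[name](values[name], limit)]


def queue_instance_state(values, used_slots, max_slots):
    """'suspend': suspend jobs, 'closed': no new jobs, 'open': schedule."""
    if exceeded(values, SUSPEND_THRESHOLDS):
        return "suspend"
    if used_slots >= max_slots or exceeded(values, LOAD_THRESHOLDS):
        return "closed"
    return "open"
```

A host reporting np_load_avg 1.0 with plenty of free memory and disk would therefore keep its running jobs but receive no new ones.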

With multiple queues on each host, priorities can be enforced by setting different load/suspend thresholds for every queue:

Queue             Load Thresholds     Suspend Thresholds    Nice
high_prio_queue   np_load_avg 0.9     np_load_avg 1.5       0
low_prio_queue    np_load_avg 0.9     np_load_avg 1.2       10

In addition, CPU time can be distributed among the different queues by setting the Nice value.

Resource/problem based scheduling

For example, assume there are two queues, one for jobs that need Lustre and one for jobs that do not. Modify the load sensor (or just add a new one for a Lustre check) and introduce a new complex value called lustre_ok:

name        type      consumable   requestable
lustre_ok   boolean   no           yes

name               load              suspend
lustre_queue       lustre_ok=true    lustre_ok=false
non_lustre_queue   -                 -

Whenever Lustre is not ok on a host, the jobs running in the queue instance lustre_queue@X are suspended one by one, and no new jobs are scheduled to this queue instance. The queue instances of non_lustre_queue are not affected; as a side effect they have the chance to get more resources and more jobs.
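
A load sensor that reports lustre_ok could look roughly like the Python sketch below. It follows the usual SGE load sensor convention: the execution daemon writes a line to the sensor's stdin for each report it wants, the sensor answers with a begin/end block of host:name:value lines, and it exits on "quit". The mount point /lustre, the mount check itself and the true/false value format are assumptions; a real check would probably also test that the file system actually responds.

```python
import io
import os
import socket

LUSTRE_MOUNT = "/lustre"   # hypothetical mount point


def lustre_ok(mount_point=LUSTRE_MOUNT):
    # Simplistic check: is something mounted at the mount point?
    return os.path.ismount(mount_point)


def report(host, ok):
    """One load report block for the complex lustre_ok."""
    value = "true" if ok else "false"
    return "begin\n%s:lustre_ok:%s\nend\n" % (host, value)


def serve(stdin, stdout, host=None, check=lustre_ok):
    """Answer every input line with a report until 'quit' arrives."""
    host = host or socket.gethostname()
    for line in stdin:
        if line.strip() == "quit":
            break
        stdout.write(report(host, check()))
        stdout.flush()


# Dry run with in-memory streams and a fake check instead of sge_execd.
demo_out = io.StringIO()
serve(io.StringIO("\nquit\n"), demo_out, host="node1", check=lambda: False)
```

To use the sketch as an actual sensor, the dry run at the end would be replaced by a call to serve(sys.stdin, sys.stdout).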

WARNING: Changes take effect immediately on pressing Ok, Enter or Write/Quit. If there is an error in the changes, SGE reports it and discards them. If the changes are correct, then they become active in the next scheduling/report/whatever interval; there is no "-Reconfigure-Are you sure?-Yes-Are you really sure?..." mechanism.

Using Job Submission Verifiers

From SUN/ORACLE :

Job Submission Verifiers (JSVs) allow users and administrators to define rules that determine which jobs are allowed to enter into a cluster and which jobs should be rejected immediately. A JSV is a script or binary that can be used to verify, modify, or reject a job during the time of job submission or on the master host.

The following are examples of how an administrator might use JSVs:

- To verify that a user has write access to certain file systems.
- To make sure that jobs do not contain certain resource requests, such as memory resource requests (h_vmem or h_data).
- To add resource requests to a job that the submitting user may not know are necessary in the cluster.
- To attach a user's job to a specific project or queue to ensure that cluster usage is accounted for correctly.
- To inform a user about certain job details like queue allocation, account name, parallel environment, total number of tasks per node, and other job requests.

A verification can be performed by a client JSV instance at the time of job submission, by a server JSV instance on the master host, or by a combination of client JSVs and server JSVs. In general, client JSVs should meet your cluster's needs. See below for more information on what client JSVs and server JSVs have to offer.
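
The kind of rule a JSV implements can be sketched as a plain Python function. This is only the decision logic for two of the examples above (rejecting h_vmem/h_data requests and attaching a project); it does not implement the protocol a real JSV script speaks with qsub or the master. The result names ACCEPT, CORRECT and REJECT correspond to the JSV result states, while the layout of the params dictionary and the default project name are assumptions for this example.

```python
FORBIDDEN_RESOURCES = ("h_vmem", "h_data")
DEFAULT_PROJECT = "default_project"   # hypothetical project name


def verify_job(params):
    """Return (state, message, params) for one submitted job.

    params is a hypothetical dict: "l_hard" maps hard resource
    requests to their values, "P" holds the project (if any).
    """
    hard_requests = params.get("l_hard", {})
    for name in FORBIDDEN_RESOURCES:
        if name in hard_requests:
            return "REJECT", "%s requests are not allowed" % name, params

    if not params.get("P"):
        # Attach a project so that usage is accounted for correctly.
        corrected = dict(params, P=DEFAULT_PROJECT)
        return "CORRECT", "project %s was added" % DEFAULT_PROJECT, corrected

    return "ACCEPT", "", params
```

The same function could be called from a client JSV to give users immediate feedback and from a server JSV to enforce the rule for all submissions.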
Topic revision: r6 - 2011-11-07, BastianNeuburger