-- CarstenPreuss - 12 Apr 2011
Using QMON
Queue Control
This dialog is used to show the overall cluster status as well as the status of all nodes.
From here the queues are configured and administered.
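Roughly the same information and actions are available on the command line; a hedged sketch using the standard SGE tools:
qstat -g c          # cluster-wide summary of all cluster queues
qstat -f            # status of every queue instance and its jobs
qconf -sql          # list all cluster queues
qconf -sq <queue>   # show the configuration of a single queue
qconf -mq <queue>   # modify a queue configuration in an editor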
Complex Configuration
Here the complex variables used by the scheduler are configured and new complex values are defined.
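The complex configuration can also be handled on the command line; a hedged sketch:
qconf -sc           # show all complex (variable) definitions
qconf -mc           # modify the complex definitions in an editor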
Policy Configuration
This is the place to configure the number of the different tickets, the FairShare tree and the weighting of the different priorities.
Currently there are 100 FS tickets per core, spread over the main groups Alice,
AliceGrid, Hades, Theory and RZ.
OverallPriority = Priority + Urgency + Ticket
Priority = -1023 ... +1024
Urgency  = WeightDeadline + WeightWaitingTime
Ticket   = ShareTreeTickets + FunctionalTickets + OverrideTickets
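In SGE the relative influence of these contributions is set by weights in the scheduler configuration; a hedged sketch of the relevant parameters (the values shown are purely illustrative, the real ones can be checked with qconf -ssconf):
qconf -ssconf | grep weight
weight_priority       1.000000
weight_urgency        0.100000
weight_ticket         0.010000
weight_waiting_time   0.000000
weight_deadline       3600000.000000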
FairShare can take used CPU time, used memory time and I/O in GB (for both network and disk) into account, with different weights.
Priorities are not accounted for, so it would not make sense to have high-priority queues with unlimited access for users / to resources.
OverrideTickets are tickets that can be assigned to users and act as one-time tickets, meaning they will be used up.
FunctionalTickets are tickets dedicated to special sorts of jobs for special purposes, e.g. admin tasks.
WeightDeadline is for the special users who are allowed to submit deadline jobs (only they may do so). If activated, the priority of such a job increases as its deadline comes closer.
WeightWaitingTime simply raises the priority of jobs as their waiting time increases.
Priority is the priority given to jobs by the users themselves.
The ticket hierarchy currently used is OS, meaning OverrideTickets are evaluated first, then the ShareTree (FairShare) tickets.
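This hierarchy is set via the policy_hierarchy parameter of the scheduler configuration; a hedged sketch (letters: O = override, F = functional, S = share tree):
qconf -msconf              # opens the scheduler configuration in an editor
   policy_hierarchy  OS    # override tickets are applied first, then share tree tickets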
Resource Quota Configuration
With resource quotas it is possible to restrict the usage of different resources by users, hosts and queues.
{
   name         max_slots_alicegrid
   description  "max. amount of jobs for the alicegrid queue"
   enabled      TRUE
   limit        queues {alicegrid} to slots=2500
}
{
   name         max_slots_default_users
   description  "max. slots for default users"
   enabled      TRUE
   limit        users {*,!cpreuss,!alicesgm} to slots=10
}
The first rule set restricts the number of jobs in the alicegrid queue to 2500.
The second rule set restricts all users to a maximum of 10 jobs, except for alicesgm and cpreuss. Instead of listing single users, it would (maybe) be better to have a list of registered users...
This mechanism can be used to give all users restricted access to SGE and grant them more access after they have visited a tutorial, signed the rules....
It is not possible (outside the configuration tool itself) to get a list of all resource quotas; only the currently enabled and active ones are shown, so a user can't plan what he/she can do...
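For completeness, the corresponding command-line operations (part of the configuration tooling mentioned above); a hedged sketch:
qconf -arqs max_slots_alicegrid   # add a new resource quota set in an editor
qconf -srqsl                      # list the names of all resource quota sets
qconf -srqs                       # show all resource quota sets
qquota -u <user>                  # show the quota rules currently applying to a user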
Load based scheduling
SGE gives the opportunity to move from slot-based to load-based scheduling by using load and suspend thresholds instead of (or in addition to) slot limits (slot-only scheduling is also possible).
In LSF we mainly have 8 slots on an 8-core node, which means there can be at most 8 running jobs on a node (plus some suspended ones).
So the efficiency of the cluster rises and falls with the efficiency of each single job.
With load-based scheduling a node can take as many jobs as it can handle with its resources; on top of that there should of course still be a maximum number of running jobs (because the resource usage of a job will change during its lifetime).
Load/suspend thresholds can be any SGE complex variable, like CPU load or mem_free, but also custom variables introduced to SGE by the admins.
| Load Threshold | Value | Suspend Threshold | Value |
| np_load_avg | 0.8 | np_load_avg | 1.5 |
| mem_free | 4G | mem_free | 1.0G |
| tmp_free | 10G | tmp_free | 2.0G |
New jobs will be scheduled as long as the load thresholds (and the maximum number of slots) are not exceeded; if the suspend thresholds are exceeded, jobs running on this queue instance will be suspended, X jobs in every suspend interval.
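In the queue configuration (qconf -mq <queue>) these thresholds map to the following attributes; a hedged sketch with illustrative values (the queue name is made up):
qconf -mq example_queue
   load_thresholds     np_load_avg=0.8,mem_free=4G,tmp_free=10G
   suspend_thresholds  np_load_avg=1.5,mem_free=1.0G,tmp_free=2.0G
   nsuspend            1            # the "X" above: jobs suspended per suspend interval
   suspend_interval    00:05:00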
With multiple queues on each host, priorities can be enforced by setting different load/suspend thresholds for every queue:
| | Load Threshold | Value | Suspend Threshold | Value |
| high_prio_queue | np_load_avg | 0.9 | np_load_avg | 1.5 |
| low_prio_queue | np_load_avg | 0.9 | np_load_avg | 1.2 |
Additionally, by defining the nice value of each queue, CPU time can be allocated between the different queues.
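The nice value corresponds to the priority attribute in the queue configuration; a hedged sketch with illustrative values:
# qconf -mq high_prio_queue
priority   0      # jobs in this queue run with nice 0
# qconf -mq low_prio_queue
priority   19     # jobs in this queue get the smallest CPU share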
Resource/problem based scheduling
E.g. given that there are two queues, one for jobs that need Lustre, the other one for jobs that don't need Lustre.
Modify the load sensor (or just add a new one for a Lustre check, see the sketch further below) and introduce a new complex value called lustre_ok:
| name | type | consumable | requestable |
| lustre_ok | boolean | no | yes |
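On the command line this would be one additional line in the complex configuration (qconf -mc); a hedged sketch, the shortcut lok is made up (columns: name, shortcut, type, relation, requestable, consumable, default, urgency):
#name       shortcut   type   relop   requestable   consumable   default   urgency
lustre_ok   lok        BOOL   ==      YES           NO           0         0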
The two queues then use lustre_ok in their thresholds:
| name | load | suspend |
| lustre_queue | lustre_ok=true | lustre_ok=false |
| non_lustre_queue | - | - |
Whenever Lustre is not OK on a single host, jobs running in the queue instance lustre_queue@X will be suspended one by one, and no new jobs will be scheduled to these queue instances.
The non_lustre_queue queue instances will not be affected; as a side effect they get the chance to receive more resources and more jobs.
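A minimal sketch of such a load sensor, assuming the standard load sensor protocol (wait for a line on stdin, exit on "quit", otherwise report host:complex:value between begin/end and report 0/1 for the boolean complex); the Lustre check itself and the path /lustre/.sensor_test are only placeholders:
#!/bin/sh
HOST=`hostname`
while read input; do
   if [ "$input" = "quit" ]; then
      exit 0
   fi
   # placeholder check: Lustre is considered OK if a test file can be touched
   if touch /lustre/.sensor_test 2>/dev/null; then
      STATE=1
   else
      STATE=0
   fi
   echo "begin"
   echo "$HOST:lustre_ok:$STATE"
   echo "end"
done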
WARNING: Changes take effect immediately when pressing Ok, Enter or Write/Quit. If there is an error in the changes, SGE will inform about it and discard them.
If the changes are correct, they will take effect in the next scheduling/report/whatever interval; there is no "-Reconfigure-Are you sure?-Yes-Are you really sure?..." mechanism.
Using Job Submission Verifiers
From SUN/ORACLE :
Job Submission Verifiers (JSVs) allow users and administrators to define rules that determine which jobs are allowed to enter into a cluster and which jobs should be rejected immediately. A JSV is a script or binary that can be used to verify, modify, or reject a job during the time of job submission or on the master host.
The following are examples of how an administrator might use JSVs:
To verify that a user has write access to certain file systems.
To make sure that jobs do not contain certain resource requests, such as memory resource requests (h_vmem or h_data).
To add resource requests to a job that the submitting user may not know are necessary in the cluster.
To attach a user's job to a specific project or queue to ensure that cluster usage is accounted for correctly.
To inform a user about certain job details like queue allocation, account name, parallel environment, total number of tasks per node, and other job requests.
A verification can be performed by a client JSV instance at the time of job submission, by a server JSV instance on the master host, or by a combination of client JSVs and server JSVs. In general, client JSVs should meet your cluster's needs. See below for more information on what client JSVs and server JSVs have to offer.
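A minimal client JSV sketch in Bourne shell, assuming the jsv_include.sh helper script shipped with SGE; the concrete policy (rejecting explicit h_vmem requests, as in the second example above) and the script name are only illustrative:
#!/bin/sh
# usage (client side): qsub -jsv ./reject_h_vmem.jsv ...
. $SGE_ROOT/util/resources/jsv/jsv_include.sh

jsv_on_start()
{
   # nothing to prepare for this simple check
   return
}

jsv_on_verify()
{
   # reject jobs that explicitly request h_vmem in their hard resource list
   l_hard=`jsv_get_param l_hard`
   if [ "$l_hard" != "" ]; then
      has_h_vmem=`jsv_sub_is_param l_hard h_vmem`
      if [ "$has_h_vmem" = "true" ]; then
         jsv_reject "explicit h_vmem requests are not allowed here"
         return
      fi
   fi
   jsv_accept "job accepted"
   return
}

jsv_main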