-- CarstenPreuss - 16 May 2011

* no Black Hole detection for hosts
   - for queue instances there is something similar, but it reacts mainly to problems with SGE itself, NOT to job or environment problems
* no closing of nodes / host groups
   - only queues or queue instances can be closed (which means one has to determine which queues a host belongs to)
   - helper script needed here
   - qmod -d '*@lxb500'
* no sort/filter options in lists
* preemption is not implemented in a good way
   - subordinate queue instances are preempted as a whole, not single jobs -> should be solved in 6.2.5 but is not
* Fair Share accounts only for CPU/Mem/IO
   - no runtime!!
   - maybe because of suspended times, the load scheduling model and renicing
   - but this means that inefficient jobs are not penalised
   - need for more than 1 job per core
   - what about the memory accounting if a job is reniced? It then takes longer to execute and therefore also consumes more memory seconds
* Fair Share is not working properly with ratios more uneven than 80:20 (a configured 90:10 is in reality 95:5)
   - reason unclear
* no formulas for Fair Share/Tickets given (claimed that it is highly interactive!)
* needs to be tested in larger production with many users over a longer time period
   - new cores will be available under SGE only
* no differences in accounting for priority jobs
   - should be done by defining priority as a resource and accounting for this
* no differences in accounting whether the cluster is fully loaded or empty
* pend/run/susp and done/exit jobs are monitored via 2 commands instead of one
* future of OGE not clear: it will be further developed (but also for free?)
   - yes, currently done by Univa
   - 6.2u5 (currently installed) is the last free version
   - from here a free development has started
* monitoring/accounting(?) of memory consumption above 4 GB is not working
   - problem in the source code due to a wrongly chosen variable type
* not possible to switch running jobs from one queue to another
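
The helper script asked for under "no closing of nodes / host groups" could look roughly like the sketch below. It wraps the `qmod -d '*@host'` call from the list so that every queue instance on a host is disabled (or re-enabled with `qmod -e`) in one step. The function name `host_qmod` and the `DRY_RUN` variable are made up for illustration. The sketch assumes that `qmod` is in the PATH and that the caller has operator or manager rights. Host groups are not handled.

```shell
#!/bin/sh
# Hypothetical helper (not part of SGE): close or open all queue
# instances on the given hosts with a single call.
# With DRY_RUN=1 the qmod commands are only printed, not executed.

host_qmod() {
    if [ $# -lt 2 ]; then
        echo "usage: host_qmod close|open host..." >&2
        return 2
    fi
    action=$1
    shift
    case "$action" in
        close) flag=-d ;;   # qmod -d disables queue instances
        open)  flag=-e ;;   # qmod -e enables them again
        *)
            echo "usage: host_qmod close|open host..." >&2
            return 2
            ;;
    esac
    for host in "$@"; do
        if [ "${DRY_RUN:-0}" = 1 ]; then
            printf "qmod %s '*@%s'\n" "$flag" "$host"
        else
            # '*@host' matches every queue instance on that host
            qmod "$flag" "*@$host"
        fi
    done
}
```

For example, after sourcing the file, `DRY_RUN=1 host_qmod close lxb500` only prints the command, while `host_qmod close lxb500` actually disables the queue instances. Jobs that are already running on the host are not affected by disabling.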
Topic revision: r2 - 2011-11-07, BastianNeuburger