-- CarstenPreuss - 16 May 2011

Resource-based (SGE) versus slot-based (LSF) scheduling

In LSF we run one job per core. Most of the time this is inefficient, because our workload consists mainly of analysis jobs that make poor use of the CPU. Overprovisioning with 1.0 to 1.5 jobs per core would therefore be desirable.

Advantage:
  • -can use resources more efficiently than static slot-based scheduling

Disadvantage:
  • -needs more memory because more jobs are active at the same time

Even so, this should be better than the current LSF setup with its mix of running and suspended jobs.
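
The trade-off between more jobs per core and less memory per job can be made concrete with a little arithmetic. The following is only a minimal sketch: the helper name `slots_and_memory` and the example host (8 cores, 16 GB RAM) are hypothetical and not taken from SGE, LSF, or our cluster.

```python
import math


def slots_and_memory(cores, mem_gb, jobs_per_core):
    """Return (job slots, memory per job in GB) for a single host.

    Hypothetical helper for illustration only; neither SGE nor LSF
    provides a function with this name.
    """
    if not 1.0 <= jobs_per_core <= 1.5:
        raise ValueError("factor is outside the 1.0 to 1.5 range discussed above")
    # Round down so that we never exceed the requested overprovisioning.
    slots = math.floor(cores * jobs_per_core)
    return slots, mem_gb / slots


# Assumed example host: 8 cores, 16 GB RAM.
for factor in (1.0, 1.25, 1.5):
    slots, mem = slots_and_memory(8, 16.0, factor)
    print(f"{factor:.2f} jobs/core -> {slots} jobs, {mem:.2f} GB per job")
```

For this assumed host, going from 1.0 to 1.5 jobs per core raises the job count from 8 to 12 while the memory available to each job drops from 2 GB to about 1.33 GB.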

  • job to core binding
  • allocation rules: Round Robin or Fill Up (custom strategies should also be possible in the future)
  • live monitoring of used resources
  • -not the current resource usage
  • -but it should be possible to determine whether a job is still doing something or not
  • jobs that can’t start go back to the Eqw (error) state
  • -with a short explanation of why the job can’t start
  • -once the problem is solved and the error marker is deleted, the job will start again
  • source code available
  • custom load sensors and complex values
  • nice suspend mechanism
  • load/suspend thresholds
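
The difference between the two allocation rules listed above can be illustrated with a simplified model. This is a sketch under stated assumptions, not the scheduler's actual implementation: the function names, the host names, and the free-slot counts are all hypothetical.

```python
def round_robin(free_slots, n_jobs):
    """Place jobs one at a time, cycling over the hosts until all are placed."""
    placed = {host: 0 for host in free_slots}
    remaining = n_jobs
    while remaining > 0:
        progress = False
        for host, capacity in free_slots.items():
            if remaining == 0:
                break
            if placed[host] < capacity:
                placed[host] += 1
                remaining -= 1
                progress = True
        if not progress:
            raise ValueError("not enough free slots for all jobs")
    return placed


def fill_up(free_slots, n_jobs):
    """Fill each host completely before moving on to the next one."""
    placed = {host: 0 for host in free_slots}
    remaining = n_jobs
    for host, capacity in free_slots.items():
        take = min(capacity, remaining)
        placed[host] = take
        remaining -= take
    if remaining > 0:
        raise ValueError("not enough free slots for all jobs")
    return placed


# Assumed example: three hosts with four free slots each, six jobs to place.
hosts = {"node1": 4, "node2": 4, "node3": 4}
print(round_robin(hosts, 6))  # {'node1': 2, 'node2': 2, 'node3': 2}
print(fill_up(hosts, 6))      # {'node1': 4, 'node2': 2, 'node3': 0}
```

In this model Round Robin spreads the load evenly over all hosts, while Fill Up packs jobs onto as few hosts as possible and leaves the remaining hosts free.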
Topic revision: r3 - 2011-11-07, BastianNeuburger