Open
Description
Container jobs, once scheduled (or when resources are subtracted for already scheduled jobs), cause a slotset to be created or edited. That slotset is named after the job ID or the "container" type value.
The process of creating a new slotset, or adding resources to an existing one, has some bugs and room for improvement.
- The code is duplicated between these two functions, which could be merged into a single consistent function:
  - Line 24 in b615fa5: `def set_slots_with_prev_scheduled_jobs(`
  - Line 405 in b615fa5: `def schedule_id_jobs_ct(slots_sets, jobs, hy, id_jobs, job_security_time):`
- In `set_slots_with_prev_scheduled_jobs`, a pseudo job of the size of the container job (minus `job_security_time`) is created like this (https://github.com/oar-team/oar3/blob/b615fa52de2dd36a71d04f3b49d1169e744b474d/oar/kao/scheduling.py#L54C1-L57C48):

  ```python
  if job.start_time < now:
      start_time = now
  else:
      start_time = job.start_time
  j = JobPseudo(
      id=0,
      start_time=start_time,
      walltime=job.walltime - job_security_time,
      res_set=job.res_set,
      ts=job.ts,
      ph=job.ts,
  )
  ```

  This is a bug: it does not crop the interval so that it starts at `now` at the earliest, it moves the interval. The walltime is unchanged while the start becomes `max(now, start_time)`, so the end is pushed later by the same amount. This may lead to inner jobs being scheduled outside of the container job.
- In `schedule_id_jobs_ct`, the slotset is created differently than in `set_slots_with_prev_scheduled_jobs` (https://github.com/oar-team/oar3/blob/b615fa52de2dd36a71d04f3b49d1169e744b474d/oar/kao/scheduling.py#L510C1-L519C56):

  ```python
  slot = Slot(
      1,
      0,
      0,
      copy.copy(job.res_set),
      job.start_time,
      job.start_time + job.walltime - job_security_time,
  )
  slots_sets[ss_name] = SlotSet(slot)
  ```

  This is a bug because:
  - It does not take timesharing and placeholders into account, whereas every other place does (when a pseudo job is used to add resources, the `ts` and `ph` fields of the job are defined, so the slots get `ts` and `ph` entries).
  - The SlotSet has the size of the container job (minus `job_security_time`) instead of spanning from 1 to `max_time` or from `now` to `max_time`. This can lead to bugs if multiple container jobs share the same container name (and therefore the same slotset). The first job creates a slotset that might be too small. If the second job has a time interval not included in the first one, any resources outside of the first job's time interval will, at best, not be added.
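
The difference between moving and cropping the container interval can be illustrated with a small standalone sketch. It does not use the OAR classes: `Interval`, `moved_interval` and `cropped_interval` are hypothetical names introduced only for this example, and the end of an interval is assumed to be `start + walltime`, as in the `Slot` construction quoted above.

```python
from dataclasses import dataclass


@dataclass
class Interval:
    """Hypothetical stand-in for the time span of a pseudo job."""

    start: int
    walltime: int

    @property
    def end(self) -> int:
        return self.start + self.walltime


def moved_interval(job_start, job_walltime, now, job_security_time):
    # Mirrors the quoted logic: the start is clamped to `now`,
    # but the walltime is left untouched, so the end shifts too.
    start = max(now, job_start)
    return Interval(start, job_walltime - job_security_time)


def cropped_interval(job_start, job_walltime, now, job_security_time):
    # Possible fix (an assumption, not the project's code): keep the
    # original end fixed and shorten the walltime accordingly.
    end = job_start + job_walltime - job_security_time
    start = max(now, job_start)
    return Interval(start, max(0, end - start))


# Container job started at t=100 with a walltime of 3600; the scheduler runs at t=600.
moved = moved_interval(100, 3600, now=600, job_security_time=60)
cropped = cropped_interval(100, 3600, now=600, job_security_time=60)

print(moved.end)    # 4140: 500 time units past the real end of the container
print(cropped.end)  # 3640: matches 100 + 3600 - 60
```

With the moved interval, an inner job could be placed between 3640 and 4140, which is after the container job has ended. With the cropped interval, the available window shrinks as time passes but never extends past the container.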