Wrong initialization of new slotsets for container jobs. #107

@ClementGre

Description

When a container job is scheduled, or when resources are subtracted for already scheduled jobs, a slotset named after the job ID or the "container" type value is created or updated.

This process of creating a new slotset or adding resources to an existing one has some bugs and room for improvement.

  • The code is duplicated between these two functions and could be merged into a single consistent function:
    def set_slots_with_prev_scheduled_jobs(
    def schedule_id_jobs_ct(slots_sets, jobs, hy, id_jobs, job_security_time):
  • In set_slots_with_prev_scheduled_jobs, a pseudo job of the size of the container job (minus job_security_time) is created as follows:
    if job.start_time < now:
        start_time = now
    else:
        start_time = job.start_time
    
    j = JobPseudo(
        id=0,
        start_time=start_time,
        walltime=job.walltime - job_security_time,
        res_set=job.res_set,
        ts=job.ts,
        ph=job.ts,
    )
    (https://github.com/oar-team/oar3/blob/b615fa52de2dd36a71d04f3b49d1169e744b474d/oar/kao/scheduling.py#L54C1-L57C48)
    This is a bug: the interval is not cropped so that it starts no earlier than now; it is shifted instead. The walltime is unchanged while the start becomes max(now, job.start_time), so the end is pushed later by now - job.start_time whenever the job has already started. This may lead to inner jobs being scheduled outside of the container job.
  • In schedule_id_jobs_ct, creating a slotset is done differently than in set_slots_with_prev_scheduled_jobs:
    slot = Slot(
        1,
        0,
        0,
        copy.copy(job.res_set),
        job.start_time,
        job.start_time + job.walltime - job_security_time,
    )
    slots_sets[ss_name] = SlotSet(slot)
    (https://github.com/oar-team/oar3/blob/b615fa52de2dd36a71d04f3b49d1169e744b474d/oar/kao/scheduling.py#L510C1-L519C56)
    This is a bug because:
    • It does not take timesharing and placeholder into account, while they are taken into account everywhere else (when using a pseudo job to add resources, the ts and ph fields of the job are defined so the slots will get ts and ph entries).
    • The SlotSet covers only the container job's interval (minus job_security_time) instead of spanning from 1 to max_time or from now to max_time. This can cause bugs when multiple container jobs share the same container name (and therefore the same slotset): the first job creates a slotset that may be too small, and if the second job's time interval is not included in the first one, resources outside of the first job's time interval will not be added, or worse.
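
To make the interval problem in set_slots_with_prev_scheduled_jobs concrete, here is a minimal standalone sketch comparing the quoted "shift" behavior with a possible "crop" fix. It does not use any OAR code: Interval, shifted_window and cropped_window are hypothetical names introduced only for illustration, and the end of the window is computed as start + walltime for simplicity, which may not match the exact end convention used by OAR slots.

```python
from dataclasses import dataclass


@dataclass
class Interval:
    """Hypothetical stand-in for a container job's time window (not an OAR type)."""

    start: int
    walltime: int

    @property
    def stop(self) -> int:
        # Simplified end convention: start + walltime (assumption for this sketch).
        return self.start + self.walltime


def shifted_window(job_start, job_walltime, now, job_security_time):
    """Mirrors the quoted logic: the start is clamped, the walltime is unchanged."""
    start = max(now, job_start)
    return Interval(start, job_walltime - job_security_time)


def cropped_window(job_start, job_walltime, now, job_security_time):
    """Possible fix: keep the original end and shorten the walltime instead."""
    end = job_start + job_walltime - job_security_time
    start = max(now, job_start)
    # Assumption: a container whose window has already ended gets a walltime of 0.
    return Interval(start, max(0, end - start))


# Container job started at t=100 with a walltime of 60, a security time of 10,
# and the scheduler running at now=130. The usable window should end at t=150.
shifted = shifted_window(100, 60, now=130, job_security_time=10)
cropped = cropped_window(100, 60, now=130, job_security_time=10)

print("shifted:", shifted.start, shifted.stop)
print("cropped:", cropped.start, cropped.stop)
```

With the shifted window, inner jobs could be placed up to t=180 although the container itself ends at t=150; the cropped window keeps the original end. For a container job that starts in the future, both functions return the same interval.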
