Running Jobs via Reservations

  • What is a Reservation?

    In addition to the interactive and batch modes for running jobs on Neocortex, you can run jobs via reservations. A reservation created against your grant ID ensures that the resource is explicitly reserved for you during the specified time window. Within the approved window, you can use the reserved resource for your jobs without waiting in the queue.

  • When do we need a reservation?

    Use a reservation when you want to actively debug your code and do not want to wait for the resource every time you start an interactive or batch job. The wait time depends on the number and duration of the jobs in the queue. You can use the following command to list the jobs in the queue, including the pending ones:

    $ squeue
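    As a rough sketch of how to gauge the wait, the pending jobs can be counted from the job-state column. The helper name count_pending is hypothetical. The squeue options used (--noheader, and --format with the %T job-state field) are standard Slurm options, but the exact output on Neocortex is an assumption here.

```shell
# Hypothetical helper: count PENDING entries in a list of job states
# read from stdin (one state per line, as printed by squeue's %T field).
count_pending() {
    # grep -c prints the number of matching lines; it exits non-zero when
    # the count is 0, so "|| true" keeps the function's exit status clean.
    grep -c 'PENDING' || true
}

# Only query the scheduler when squeue is actually available.
if command -v squeue >/dev/null 2>&1; then
    squeue --noheader --format="%T" | count_pending
fi
```

    A large count suggests a longer wait for a normal interactive or batch job, which is the situation a reservation is meant to avoid.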

  • How is a reservation different from normal interactive or batch mode jobs?

    Interactive mode jobs can fail if there is a network connection issue. They also have a wall-time restriction (default is 4 hours, maximum is 48 hours). If your interactive session ends because the time is up, you cannot extend it and have to start another interactive session. Other pending jobs in the queue can then delay your job, whereas a reservation guarantees that the resource is explicitly assigned to you regardless of the job queue.

  • How to create a reservation?

    To create a reservation, please send an email to neocortex@psc.edu with the following details:

    • Username
    • Grant ID
    • Desired start and end date and time.
    • Number of SDF nodes (SDF-1, SDF-2), and SDF node preferred, if any.
    • Explain why you need a reservation (why the job can’t be executed with batch or interactive mode).

    You can use the following command to check the booked slots and make sure that you are not asking for a time slot that is already reserved:

    $ scontrol show reservations
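    The check against booked slots can be sketched as a small overlap test. The helper names to_key and windows_overlap are hypothetical, and the sketch assumes timestamps in the YYYY-MM-DDTHH:MM:SS form that scontrol typically prints in the StartTime and EndTime fields of a reservation; you would copy those values in by hand.

```shell
# Hypothetical helper: turn a timestamp such as 2024-05-01T09:00:00 into a
# sortable integer (20240501090000) by dropping every non-digit character.
to_key() {
    printf '%s' "$1" | tr -cd '0-9'
}

# Hypothetical helper: succeed (exit 0) when the window [$1, $2) overlaps
# the window [$3, $4). Windows that merely touch do not count as overlapping.
windows_overlap() {
    a_start=$(to_key "$1"); a_end=$(to_key "$2")
    b_start=$(to_key "$3"); b_end=$(to_key "$4")
    [ "$a_start" -lt "$b_end" ] && [ "$b_start" -lt "$a_end" ]
}

# Example: a desired 10:00-14:00 window against a booked 09:00-12:00 slot.
if windows_overlap 2024-05-01T10:00:00 2024-05-01T14:00:00 \
                   2024-05-01T09:00:00 2024-05-01T12:00:00; then
    echo "overlap: pick another slot"
else
    echo "free"
fi
```

    The comparison only works when all four timestamps use the same format and time zone.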

    Please note that asking for a reservation does not guarantee you the reserved time slot. We will try our best to grant it.

  • How to run a job with a reservation?

    Once your reservation is confirmed, you will receive a confirmation that includes the reservation name. To start a job, pass the --reservation parameter to srun or sbatch:

    $ sbatch --reservation reservation-name --account GRANT_ID mnist.batch

    If your reservation is not assigned to your default grant ID, you also need to pass the --account option with the grant ID that the reservation was created for.

    Please note that the --reservation flag does not automatically override the default interactive or batch parameters, such as the job run time. Explicitly specify the wall time, the SDF node (if you prefer one), etc. to override the defaults.
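    A minimal sketch of commands that set the wall time explicitly alongside the reservation is shown here. The helper name reserved_job_command is hypothetical and only prints the command instead of running it. The --time option is the standard Slurm wall-time flag; the 02:00:00 value, reservation-name and GRANT_ID are placeholders, and SDF node selection is left out because the exact option for it is not given here.

```shell
# Hypothetical helper: print (do not run) a job command that uses a reservation.
#   $1 mode: "batch" or "interactive"
#   $2 reservation name, $3 grant ID, $4 wall time (HH:MM:SS)
#   $5 batch script (batch mode only)
reserved_job_command() {
    common="--reservation=$2 --account=$3 --time=$4"
    case "$1" in
        batch)       echo "sbatch $common $5" ;;
        interactive) echo "srun $common --pty bash -i" ;;
        *)           echo "unknown mode: $1" >&2; return 1 ;;
    esac
}

reserved_job_command batch reservation-name GRANT_ID 02:00:00 mnist.batch
reserved_job_command interactive reservation-name GRANT_ID 02:00:00
```

    Printing the command first makes it easy to confirm the reservation name, grant ID and wall time before submitting.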

  • Comparison table for various modes in which jobs can be run:

    Please note that the comparison is between jobs in different modes that request the same number and type of resources, the same duration, etc.

    • Network connectivity
      • Interactive: Interactive jobs are not robust to network connection issues. If there is any network connectivity issue, the resources are relinquished and the job is interrupted. If the queue is busy and other jobs have acquired the resources in the meantime, there can be a significant wait before you get the resources back to resume debugging or training.
      • Batch: Batch jobs are robust to network connection issues. The jobs maintain their job status even if the network connection is unstable.
      • Reservation: Even if the network connection is unstable, you can quickly reacquire the resources without any wait time via an interactive or batch job and resume your work.

    • Time limit
      • Interactive: Default: 4 hours. Maximum: 48 hours.
      • Batch: Default: 4 hours. Maximum: 48 hours.
      • Reservation: As per your request and confirmation by PSC.

    • Priority
      • Interactive: Interactive jobs have higher priority than batch jobs.
      • Batch: Batch jobs have lower priority than interactive jobs.
      • Reservation: The resource is explicitly reserved for your usage.

    • How to?
      • Interactive: Issue an interactive command such as:
        $ srun --pty bash -i
        Wait times depend on how busy the queue is.
      • Batch: Create a batch (or job) script that contains the commands to be run, then submit the job. It stays in the queue until the resources become available.
      • Reservation: Send an email to neocortex@psc.edu with the details (see the section above). Both interactive and batch jobs can be run under a reservation (for the confirmed duration).
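    For batch jobs that run under a reservation, the settings can also be carried inside the batch script as #SBATCH directives rather than on the command line. The following is a minimal sketch under that assumption: the helper name write_reserved_batch_script, the file name reserved_job.batch and the placeholder workload are hypothetical, while --reservation, --account and --time are standard Slurm options.

```shell
# Hypothetical helper: write a minimal batch script that carries the
# reservation, account and wall time as #SBATCH directives.
#   $1 output path, $2 reservation name, $3 grant ID, $4 wall time (HH:MM:SS)
write_reserved_batch_script() {
    cat > "$1" <<EOF
#!/bin/bash
#SBATCH --reservation=$2
#SBATCH --account=$3
#SBATCH --time=$4

# Placeholder workload; replace with the commands your job needs.
echo "job running on \$(hostname)"
EOF
}

# Write the script to a scratch location and show it for review.
script_path="${TMPDIR:-/tmp}/reserved_job.batch"
write_reserved_batch_script "$script_path" reservation-name GRANT_ID 02:00:00
cat "$script_path"
```

    The generated file can then be submitted with sbatch during the confirmed reservation window.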