BYU

Office of Research Computing

Why won't my job submit?

Job submission can fail for many reasons. The usual causes are an impossible resource request (such as 200 CPUs on one node), a missing required flag (such as --time=...), or a typo. A submission filter catches most such errors and prints a helpful message instead of submitting a job that would fail anyway.

Failure to specify required flags

At a minimum, every job must request memory and specify a time limit. Omitting either one results in an error message:

$ salloc --time=01:00:00 # forgot to request memory
salloc: error: All jobs must request memory with either --mem or --mem-per-cpu.
salloc: error: Job submit/allocate failed: Unspecified error
$ sbatch --mem=1G myjob.sh # forgot to set a time limit
sbatch: error: Must specify --time
sbatch: error: Job submit/allocate failed: Unspecified error
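
Both requirements can also be met inside the batch script with #SBATCH directives instead of on the command line. The following is a minimal sketch, assuming a single-task job. The values shown, the --ntasks line, and the job_label variable are illustrative assumptions, not site recommendations:

```shell
#!/bin/bash
#SBATCH --time=01:00:00   # required: wall-clock time limit
#SBATCH --mem=1G          # required: memory (or use --mem-per-cpu)
#SBATCH --ntasks=1        # illustrative: a single task

# Slurm sets SLURM_JOB_ID inside a job; fall back to a placeholder
# when the script is run by hand outside the scheduler.
job_label="${SLURM_JOB_ID:-interactive}"
echo "Job ${job_label} starting"
```

With the directives in the script, a plain "sbatch myjob.sh" carries both required settings.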

Impossible resource requests

The filter also rejects jobs that could not possibly run on the hardware we have. For example, no public node has both 26 or more CPUs and 260G or more of memory, so the following job request fails:

$ sbatch --time=24:00:00 --nodes=1 --ntasks=26 --mem-per-cpu=10G myjob.sh
sbatch: error: Cannot access partition knlg due to memory requirements.
Cannot access partition knlp due to memory requirements.
...
[truncated for brevity]
...
Cannot access partition bio due to QOS normal not being allowed.
Your job cannot run due to an incorrect resource request or there is a bug in the submit plugin. Please correct your
parameters or open a support ticket at https://marylou.byu.edu/ticket with the exact sbatch command line parameters and
#SBATCH lines. Note that only one rejection message is included per partition; there may be multiple reasons for
rejection.
sbatch: error: Job submit/allocate failed: Unspecified error

When you submit a job, the filter checks every partition that the job could run on and rejects the job if none is suitable. See the compute resources page for a list of available hardware.
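
To see why the request above needs 260G on a single node, multiply the task count by the per-CPU memory. This is a small sketch of that arithmetic, assuming one CPU per task. The helper name total_mem_gb is hypothetical and not part of Slurm:

```shell
# Total memory needed on one node, in gigabytes.
# Assumes one CPU per task, so tasks * mem-per-cpu is the node total.
total_mem_gb() {
    ntasks=$1
    mem_per_cpu_gb=$2
    echo $(( ntasks * mem_per_cpu_gb ))
}

# --ntasks=26 with --mem-per-cpu=10G on --nodes=1
total_mem_gb 26 10   # prints 260

# On the cluster, sinfo can list CPUs and memory (in MB) per node for
# each partition, for comparison against the total computed above:
#   sinfo -o "%P %c %m"
```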

Other checks performed by the filter

If a job would inevitably be rejected because of unacceptable or conflicting resource requests, the filter blocks its submission. The best way to avoid this is to place as few constraints on the job as possible. If you need 24 cores and 100G on one node, with no other requirements, specify only that. The less you specify, the sooner your job is likely to start and the less likely it is to be rejected.
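
As a sketch of such a minimal request, the 24-core, 100G, one-node case might look like the following. The 24-hour time limit and the script name myjob.sh are placeholder assumptions. The command is printed rather than executed so that it can be reviewed first:

```shell
# Only what the job needs: 24 cores and 100G on one node, plus the
# required time limit. No partition, QOS, or constraint is named.
request="--time=24:00:00 --nodes=1 --ntasks=24 --mem=100G"

# Print the command for review; remove the echo to submit for real.
echo "sbatch $request myjob.sh"
```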

QOS restrictions

Sometimes a QOS or partition cannot be requested in combination with other restrictions. For example, no more than 5 jobs can be submitted to the test QOS at a time (including job arrays), so this request fails:

$ sbatch --time=01:00:00 --ntasks=1 --mem=1G --qos=test --array=1-10 myjob.sh
sbatch: error: Batch job submission failed: Job violates accounting/QOS policy (job submit limit, user's size and/or
time limits)

If you submit a job to a QOS that you do not have access to, the job fails. It also fails if you request more time, memory, or CPUs than the QOS allows.

Partition restrictions

Every partition lacks some resource. For example, you cannot require InfiniBand on the m8 partition:

$ salloc --time 04:00:00 --mem=1G --partition=m8 --constraint=ib
salloc: error: Cannot access partition m8 due to feature requirements.
Your job cannot run due to an incorrect resource request or there is a bug in the submit plugin. Please correct your
parameters or open a support ticket at https://marylou.byu.edu/ticket with the exact sbatch command line parameters and
#SBATCH lines. Note that only one rejection message is included per partition; there may be multiple reasons for
rejection.
salloc: error: Job submit/allocate failed: Unspecified error

Some partitions also have specific restrictions on resource usage. For example, a request for m8f or m8h (the big memory nodes) is rejected if it asks for more than 6 hours without requesting enough memory to make using those nodes worthwhile:

$ sbatch --time=24:00:00 --partition=m8f --mem=100G --nodes=1 --ntasks=24 myjob.sh
sbatch: error: Cannot access partition m8f with QOS normal, time limit > 360 minutes, and memory needs <= 128G
Your job cannot run due to an incorrect resource request or there is a bug in the submit plugin. Please correct your
parameters or open a support ticket at https://marylou.byu.edu/ticket with the exact sbatch command line parameters and
#SBATCH lines. Note that only one rejection message is included per partition; there may be multiple reasons for
rejection.
sbatch: error: Job submit/allocate failed: Unspecified error
$ sbatch --time=03:00:00 --partition=m8f --mem=100G --nodes=1 --ntasks=24 myjob.sh # less time requested
Submitted batch job 1234

The simplest way to avoid these rejections is not to specify a partition for your job unless you have a compelling reason to do so.

Other

There are several other reasons that a job could be rejected (in addition to typos, of course). A couple of examples follow.

If you request job emails but don't specify an email address, your job will be rejected:

$ sbatch --time=01:00:00 --mem=1G --mail-type=BEGIN myjob.sh # no `--mail-user=...` specified
sbatch: error: You must specify an email address with --mail-user if you want emails.
sbatch: error: Batch job submission failed: Unspecified error
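
The rule can be checked locally before submitting. The function below is a hypothetical pre-check written for illustration. It is not the cluster's submit filter, and the name check_mail_flags and the example address are assumptions:

```shell
# Hypothetical local pre-check; NOT the cluster's actual submit filter.
# Returns 1 if --mail-type is given without --mail-user, 0 otherwise.
check_mail_flags() {
    has_type=0
    has_user=0
    for arg in "$@"; do
        case "$arg" in
            --mail-type|--mail-type=*) has_type=1 ;;
            --mail-user|--mail-user=*) has_user=1 ;;
        esac
    done
    if [ "$has_type" -eq 1 ] && [ "$has_user" -eq 0 ]; then
        echo "missing --mail-user" >&2
        return 1
    fi
    return 0
}

# Corrected form of the rejected request (the address is a placeholder).
if check_mail_flags --time=01:00:00 --mem=1G --mail-type=BEGIN --mail-user=you@example.com; then
    echo "mail flags look consistent"
fi
```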

Your job will also be rejected if you specify a GPU partition (m8g or m9g) without requesting GPUs:

$ sbatch --time=12:00:00 --mem=64G --nodes=1 --ntasks=24 --partition=m8g myjob.sh # failed to specify `--gres=gpu:2`
sbatch: error: Cannot access partition m8g with QOS normal, time limit > 180 minutes, and without a need for GPUs
Your job cannot run due to an incorrect resource request or there is a bug in the submit plugin. Please correct your
parameters or open a support ticket at https://marylou.byu.edu/ticket with the exact sbatch command line parameters and
#SBATCH lines. Note that only one rejection message is included per partition; there may be multiple reasons for
rejection.
sbatch: error: Batch job submission failed: Unspecified error