Job ran out of memory but the reported MaxRSS value was lower than the requested amount

Jobs may die due to insufficient memory, yet the MaxRSS accounting field may not reflect that. For example, you may see the following output for a job that requested 2 GB of memory:

slurmd[m6-1-17]: Exceeded step memory limit at some point. Step may have been partially swapped out to disk

However, the MaxRSS field in sacct shows that the job used only about 25% (0.5 GB) of the memory it requested. If the job stayed well under its limit, why did it die? The problem is with the reporting, not the limit: the job really did exceed the requested 2 GB at some point.
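The arithmetic above can be checked directly. sacct typically reports MaxRSS with a unit suffix (for example "524288K" for 0.5 GB); the helper below is a hedged sketch of that conversion, and the sample value is illustrative rather than output from a real job:

```python
def rss_to_gb(field: str) -> float:
    """Convert a sacct memory field such as '524288K' or '0.50G' to GB.

    The suffix handling here is an assumption about sacct's formatting;
    check your site's sacct output before relying on it.
    """
    units = {"K": 1 / (1024 ** 2), "M": 1 / 1024, "G": 1.0, "T": 1024.0}
    if field and field[-1] in units:
        return float(field[:-1]) * units[field[-1]]
    return float(field) / (1024 ** 3)  # assume bare bytes otherwise

requested_gb = 2.0
max_rss_gb = rss_to_gb("524288K")  # hypothetical MaxRSS value from sacct
print(f"MaxRSS = {max_rss_gb:.2f} GB "
      f"({max_rss_gb / requested_gb:.0%} of requested)")
# -> MaxRSS = 0.50 GB (25% of requested)
```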

SLURM's accounting mechanism is polling-based and doesn't always catch spikes in memory usage. FSL's implementation uses a Linux kernel feature called "cgroups" to control memory and CPU usage. SLURM sets up a cgroup for each job with the appropriate limits, which the Linux kernel strictly enforces.

The explanation is simple: the kernel killed a process in the offending job for exceeding its cgroup limit, but SLURM's accounting poll did not happen to run at the moment of the spike, so the spike never appears in MaxRSS.
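The mismatch between a polled maximum and the true peak can be illustrated with a toy simulation. All numbers below (poll interval, timeline, usage values) are invented for illustration and are not real Slurm data:

```python
POLL_INTERVAL = 30  # seconds between accounting samples (illustrative)
LIMIT_GB = 2.0      # the job's requested memory / cgroup limit

# (time_s, usage_gb): a brief spike at t=45 falls between the polls
# at t=30 and t=60, so the accountant never sees it.
true_usage = [(0, 0.3), (30, 0.5), (45, 2.4), (60, 0.5), (90, 0.4)]

# The poller only sees samples that land on its interval.
polled = [gb for t, gb in true_usage if t % POLL_INTERVAL == 0]
max_rss = max(polled)                        # what sacct would report
true_peak = max(gb for _, gb in true_usage)  # what the kernel enforced

print(f"reported MaxRSS: {max_rss} GB")   # 0.5 GB -- looks fine
print(f"true peak:       {true_peak} GB") # 2.4 GB -- over the limit
print(f"kernel kills job: {true_peak > LIMIT_GB}")
```

The polled maximum (0.5 GB) sits well under the limit even though the true peak (2.4 GB) exceeded it, which is exactly the discrepancy described above.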

There is ongoing work in SLURM to correct this reporting gap, but it is not yet ready. To repeat: the job did use more memory than it requested, regardless of what MaxRSS says.