BYU

Office of Research Computing

What happens when jobs hit their walltime, or I remove a running job using qdel?

This applies only to running jobs; it does not apply to jobs that are not running when you remove them with qdel. If a running job hits its walltime, or you delete it with qdel, the system does the following:

  1. The scheduler sends the job a TERM signal, telling the process to stop. Normally this will end the process, and the job will die.
  2. If the job hasn't ended within a few seconds, the scheduler sends a KILL signal, which unconditionally kills the process.
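
The two steps above can be imitated on a small scale. The following Python sketch is only an illustration of the TERM-then-KILL pattern, not the scheduler's actual code; the function name `stop_with_grace` and the 5-second default grace period are assumptions made for this example.

```python
import subprocess


def stop_with_grace(proc, grace_seconds=5.0):
    """Send TERM to a child process, then KILL if it is still running.

    `proc` is a subprocess.Popen object. Returns the name of the last
    signal that was sent: "TERM" or "KILL".
    """
    # Step 1: ask the process to stop (SIGTERM on POSIX systems).
    proc.terminate()
    try:
        # Give the process a short grace period to exit on its own.
        proc.wait(timeout=grace_seconds)
        return "TERM"
    except subprocess.TimeoutExpired:
        # Step 2: the process is still running, so kill it unconditionally.
        proc.kill()
        proc.wait()
        return "KILL"
```

A process that ignores or is slow to handle TERM is the case in which the second step matters: it cannot catch or ignore KILL.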

You can change how the job script responds to the TERM signal so that it does some cleanup before the process ends. For example, suppose your program writes a log file. If the job ends normally, you don't care about the log, but if the job is killed prematurely, you want to keep a copy of it. You can set up the job to discard the log during normal operation and install a signal handler that copies the log when the job receives TERM. Keep the cleanup short, because the KILL signal follows within a few seconds. For more information, see this page.
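
As a minimal sketch of the log-saving idea described above, the following Python code installs a TERM handler that copies a log file and then exits. It assumes the Python process is the one that receives the TERM signal; the function name `keep_log_on_term`, the file names, and the exit code of 128 plus the signal number (a common shell convention) are all choices made for this example.

```python
import os
import shutil
import signal
import sys


def keep_log_on_term(log_path, saved_path):
    """Install a SIGTERM handler that copies log_path to saved_path, then exits."""

    def handler(signum, frame):
        # Copy the log only if the program has created it already.
        if os.path.exists(log_path):
            shutil.copy(log_path, saved_path)
        # Exit promptly; a KILL signal follows TERM within a few seconds.
        sys.exit(128 + signum)

    signal.signal(signal.SIGTERM, handler)
```

A program would call something like `keep_log_on_term("run.log", "run.log.saved")` once at startup (both file names are hypothetical) and delete `run.log` itself when it finishes normally. In a shell job script, the `trap` builtin serves the same purpose.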