Memory Requests

As of September 2010, all jobs require a memory request at submit time. This gives the scheduler more flexibility in finding a place for your job.

For example, suppose job A needs 6 GB of memory but only 1 processor. Without a memory request, it would have to request 3 processors to be sure of getting 6 GB of memory (2 GB per processor). Now suppose job B needs 10 processors but only 1 GB per process. It could not run on the same m6 node as job A, because the total of 13 processors is greater than the number available. With memory requests, job A asks for 1 processor and 6 GB, and the scheduler can see that the node still has 11 processors and 18 GB of memory free. That is enough for job B to run alongside job A.
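
The arithmetic in this example can be checked with a short script. This is only a sketch: fits_on_node is a hypothetical helper, not a scheduler command, and the node size of 12 processors and 24 GB is inferred from the numbers in the example (11 processors and 18 GB free once job A holds 1 processor and 6 GB).

```shell
#!/bin/bash
# Hypothetical helper: does a job fit in what is left on a node?
# Arguments: free_procs free_mem_gb job_procs job_mem_gb
fits_on_node() {
    local free_procs=$1 free_mem=$2 job_procs=$3 job_mem=$4
    if [ "$job_procs" -le "$free_procs" ] && [ "$job_mem" -le "$free_mem" ]; then
        echo yes
    else
        echo no
    fi
}

# Node size inferred from the example (assumption): 12 processors, 24 GB.
NODE_PROCS=12
NODE_MEM_GB=24

# Job B needs 10 processors at 1 GB each, so 10 GB in total.
# Without memory requests, job A holds 3 processors to reserve its 6 GB.
fits_on_node $((NODE_PROCS - 3)) $((NODE_MEM_GB - 6)) 10 10   # prints: no

# With memory requests, job A holds 1 processor and 6 GB.
fits_on_node $((NODE_PROCS - 1)) $((NODE_MEM_GB - 6)) 10 10   # prints: yes
```

The real scheduler weighs many more factors (priority, reservations, qos); the sketch only shows why counting memory separately from processors frees up room.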

If you don't know how much memory your jobs use, here is a simple way to find out.

Launch Test Job

Launch a job using the test qos. Since the amount of memory to be used is not known, assume 2GB per process for testing.

bash-3.2$ qsub -l qos=test,pmem=2gb mpijob
2487566.fslsched.fsl.byu.edu

Find out which nodes the job is running on.

Run checkjob <jobid> and look at the nodes that have been allocated to the job. They are listed in the form [-node-:-procs-], where -node- is the hostname of the assigned node and -procs- is the number of processors allocated on that node. Other similar forms may appear; for example, "m5-1-[1-4]*8" means that nodes m5-1-1, m5-1-2, m5-1-3, and m5-1-4 each have 8 processors allocated.

bash-3.2$ checkjob 2487566
job 2487566

AName: Primes
State: Running
Creds: user:hamiltop group:hamiltop account:staff class:batch qos:test
WallTime: 00:00:03 of 1:00:00
SubmitTime: Tue Jul 6 11:41:31
(Time Queued Total: 00:00:03 Eligible: 00:00:03)

StartTime: Tue Jul 6 11:41:34
NodeMatchPolicy: EXACTNODE
Total Requested Tasks: 1

Req[0] TaskCount: 1 Partition: base
Dedicated Resources Per Task: PROCS: 1 MEM: 2048M


    Allocated Nodes:
    [m5-1-1:1]


StartCount: 1
Flags: PREEMPTOR,FSVIOLATION
Attr: FSVIOLATION,checkpoint
StartPriority: 21725
Reservation '2487566' (-00:00:44 -> 00:59:16 Duration: 1:00:00)
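
If a job spans many nodes, picking hostnames out of the checkjob output by eye gets tedious. The sketch below pulls them out with grep and sed. allocated_nodes is a hypothetical helper name, and it assumes the plain [-node-:-procs-] form with lowercase hostnames; it does not expand the compressed "m5-1-[1-4]*8" form.

```shell
#!/bin/bash
# Hypothetical helper: read checkjob output on stdin and print one
# "hostname procs" pair per line for every [node:procs] entry.
# Entries such as Req[0] are skipped because they contain no colon.
allocated_nodes() {
    grep -o '\[[a-z0-9-]*:[0-9]*\]' | sed 's/[][]//g; s/:/ /'
}

# Sample input in the format checkjob prints. On the cluster you would
# run:  checkjob <jobid> | allocated_nodes
allocated_nodes <<'EOF'
Req[0] TaskCount: 48 Partition: base
Allocated Nodes:
[m6-9-14:12][m6-18-13:12][m6-18-16:12][m6-19-2:12]
EOF
```

Each output line gives a hostname you can ssh to, followed by the number of processors the job holds there.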

SSH to an allocated node.

Log on to the first node in the Allocated Nodes List.

bash-3.2$ ssh m5-1-1
hamiltop@m5-1-1's password:
Last login: Tue Jul 6 11:44:55 2010 from m6int02.fsl.byu.edu
Fulton Supercomputing Lab

System Administrators: Tom Raisor, Lloyd Brown, and Ryan Cox.

For support issues please email fslsupport@byu.edu

By using this system, you agree to abide by the Supercomputing usage
policy. See http://marylou.byu.edu/policy.php for details.

Run top

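Run the program top. The values under RES show the amount of physical memory (RAM) being used by each process, in kilobytes unless a unit suffix such as m is shown. The pmem value in your job request should be at least as large as the biggest RES value among your processes.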

top - 15:40:15 up 20 days, 22:21, 1 user, load average: 13.41, 11.47, 6.27
Tasks: 224 total, 11 running, 213 sleeping, 0 stopped, 0 zombie
Cpu(s): 1.7%us, 93.0%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.1%hi, 5.2%si, 0.0%
Mem: 24679044k total, 11530628k used, 13148416k free, 355968k buffers
Swap: 4096564k total, 27844k used, 4068720k free, 10828984k cached

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
6465 hamiltop 25 0 153m 4076 2440 S 100.4 0.0 9:56.11 a.out
6458 hamiltop 25 0 153m 4072 2436 R 100.1 0.0 9:56.04 a.out
6461 hamiltop 25 0 153m 4080 2448 R 100.1 0.0 9:56.09 a.out
6462 hamiltop 25 0 153m 4072 2440 R 100.1 0.0 9:56.05 a.out
6463 hamiltop 25 0 153m 4076 2440 R 100.1 0.0 9:55.43 a.out
6464 hamiltop 25 0 153m 4080 2440 R 100.1 0.0 9:55.66 a.out
6466 hamiltop 25 0 153m 4072 2440 S 100.1 0.0 9:56.00 a.out
6459 hamiltop 25 0 153m 4076 2440 R 99.8 0.0 9:55.89 a.out
6460 hamiltop 25 0 153m 4072 2436 S 99.8 0.0 9:55.98 a.out
6467 hamiltop 25 0 153m 4080 2444 R 99.8 0.0 9:55.99 a.out
6468 hamiltop 25 0 153m 4072 2436 R 99.8 0.0 9:56.09 a.out
6457 hamiltop 25 0 154m 4676 2828 R 78.8 0.0 7:42.05 a.out
6456 hamiltop 15 0 56776 2384 1632 S 21.0 0.0 2:13.80 mpiexec
6852 hamiltop 15 0 12848 1196 820 R 0.3 0.0 0:00.04 top
1 root 15 0 10348 80 52 S 0.0 0.0 1:06.19 init 
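
To turn the RES column into a pmem value, take the largest figure and round it up to whole megabytes. A minimal sketch, assuming the RES values are plain kilobyte counts with no unit suffix; max_res_mb is a hypothetical helper name.

```shell
#!/bin/bash
# Hypothetical helper: read RES values in KB, one per line, and print
# the largest one rounded up to whole MB.
max_res_mb() {
    awk '$1 > max { max = $1 } END { print int((max + 1023) / 1024) }'
}

# RES column for the a.out processes in the top output above.
printf '%s\n' 4076 4072 4080 4072 4076 4080 4072 4076 4072 4080 4072 4676 |
    max_res_mb   # prints: 5

# On a compute node, ps can supply the same numbers without the
# interactive display (this includes every process you own there):
#   ps -u "$USER" -o rss= | max_res_mb
```

The result is a snapshot. If your program's memory use grows over time, check again later in the run or add some headroom.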

After the job has completed, the runtime statistics are available on our website in the Account Manager under Job Stats.

Submit job with memory request

Using the information obtained through job statistics and top, you can request the amount of memory your job needs to run. You can request either the memory required per process (pmem) or the total memory the job will need (mem). Most often pmem will be easier to calculate. In the top output above, the largest RES value is 4676 KB, so 5 MB per process is enough. Once you have calculated the value, submit the job using either pmem or mem. Memory can be requested in mb or gb.

bash-3.2$ qsub -l pmem=5mb mpijob
2487667.fslsched.fsl.byu.edu

#or

bash-3.2$ qsub -l mem=240mb mpijob
2487668.fslsched.fsl.byu.edu
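
The two requests above are equivalent for this job: mem is pmem multiplied by the number of tasks, and checkjob reports 48 tasks for it. A small sketch of the conversion; total_mem_mb is a hypothetical helper name, not a scheduler command.

```shell
#!/bin/bash
# Hypothetical helper: total memory request in MB for a job.
# Arguments: pmem_mb tasks
total_mem_mb() {
    echo $(( $1 * $2 ))
}

PMEM_MB=5    # per-process memory, from the RES column in top
TASKS=48     # number of processes the job runs

MEM_MB=$(total_mem_mb "$PMEM_MB" "$TASKS")
echo "qsub -l pmem=${PMEM_MB}mb   or   qsub -l mem=${MEM_MB}mb"
```

The script only prints the two forms of the request; it does not submit anything.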

Verify that job has memory request

Run checkjob on the new job ID. Check that the "Dedicated Resources Per Task" line shows the memory you requested in its MEM field.

bash-3.2$ checkjob 2487667
job 2487667

AName: Primes
State: Running
Creds: user:hamiltop group:hamiltop account:staff class:batch qos:default
WallTime: 00:00:54 of 1:00:00
SubmitTime: Tue Jul 6 16:43:48
(Time Queued Total: 00:00:51 Eligible: 00:00:51)

StartTime: Tue Jul 6 16:44:39
NodeMatchPolicy: EXACTNODE
Total Requested Tasks: 48

Req[0] TaskCount: 48 Partition: base
Memory >= 5M Disk >= 0 Swap >= 0
Dedicated Resources Per Task: PROCS: 1 MEM: 5M

Allocated Nodes:
[m6-9-14:12][m6-18-13:12][m6-18-16:12][m6-19-2:12]


StartCount: 1
Flags: PREEMPTOR,FSVIOLATION
Attr: FSVIOLATION,checkpoint
StartPriority: 25345
Reservation '2487667' (-00:01:39 -> 00:58:21 Duration: 1:00:00)
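
The same check can be scripted when several jobs are submitted at once. This is a sketch that assumes the checkjob output layout shown above; per_task_mem is a hypothetical helper name.

```shell
#!/bin/bash
# Hypothetical helper: read checkjob output on stdin and print the MEM
# value from the "Dedicated Resources Per Task" line.
per_task_mem() {
    sed -n 's/.*Dedicated Resources Per Task:.*MEM: *\([0-9][0-9]*[A-Za-z]*\).*/\1/p'
}

# Sample line in the format checkjob prints. On the cluster you would
# run:  checkjob <jobid> | per_task_mem
echo 'Dedicated Resources Per Task: PROCS: 1 MEM: 5M' | per_task_mem   # prints: 5M
```

If the helper prints nothing, the line was not found, which may mean the job has no memory request or that the output format differs from the one assumed here.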