How do I use the local hard drive on the node?

In some situations, it makes sense to use the local hard drive on the compute nodes as temporary space. This document describes how this is done, and the important considerations.

PLEASE READ THIS WHOLE PAGE BEFORE GETTING STARTED USING LOCAL HARD DRIVES. IT IS VERY IMPORTANT THAT YOU UNDERSTAND HOW TO PROPERLY USE THE SYSTEM.

Motivation

Some applications do a lot of reading and writing, especially random, small-block reads and writes, which are among the worst types of Input/Output (I/O) workload. Additionally, we occasionally see surges of traffic on the centralized filesystems, which make them run slowly. Of course we are working on fixing this, but you may still see it on occasion.

If you need space for temporary files, the local hard drive on the compute nodes might be a good option. Whether it helps depends on your workload, the current state of the system, and other context.

Location

The space available on the local hard drive may be used under the "/tmp" folder. To avoid overwriting files from other jobs, we recommend creating a directory named after the unique job number. The example on this page demonstrates this.
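A minimal sketch of creating such a job-unique directory. It assumes the $SLURM_JOB_ID variable that SLURM sets inside a job; the fallback to the shell PID ($$) is only so the snippet can be tried outside a job:

```shell
#!/bin/bash
# Create a per-job scratch directory under /tmp.
# $SLURM_JOB_ID is set by SLURM inside a running job; the $$ fallback
# lets this snippet run outside a job as well.
TEMPORARY_DIR="/tmp/${SLURM_JOB_ID:-$$}"
mkdir -p "$TEMPORARY_DIR"
echo "Scratch space for this job: $TEMPORARY_DIR"
```

Because the directory name includes the job number, two of your jobs landing on the same node will not clobber each other's files.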

It is also important to know that the /tmp folder is local to each compute node. If your job uses multiple nodes, data you put there on one node will not be visible on the others. Most people who use this method run one-node jobs, so this is a non-issue for them, but be aware of the implications.
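For multi-node jobs, one possible way to make the scratch directory exist on every node is to launch one task per node with srun. This is a sketch, not a requirement; it assumes you are inside a SLURM allocation (where srun and $SLURM_JOB_ID are available), and falls back to a plain mkdir otherwise so the snippet can still be tested:

```shell
#!/bin/bash
# Create the per-job scratch directory on every node of the allocation.
TEMPORARY_DIR="/tmp/${SLURM_JOB_ID:-$$}"

if command -v srun >/dev/null 2>&1 && [ -n "$SLURM_JOB_ID" ]; then
    # One task per node, so each node's local /tmp gets the directory.
    srun --ntasks-per-node=1 mkdir -p "$TEMPORARY_DIR"
else
    # Outside SLURM (e.g. when trying this snippet out), just do it locally.
    mkdir -p "$TEMPORARY_DIR"
fi
echo "Created $TEMPORARY_DIR"
```

Remember that the cleanup at the end of the job would need the same treatment: each node has to remove its own copy of the directory.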

Cleaning Up After Yourself

Since the "/tmp" folder is only available on each individual compute node, you won't be able to clean up after yourself interactively. Therefore, each job must do its own cleanup.

In general, this consists of the following, either at the end of the job, or when the job is deleted/canceled:

  • Copying any needed data back to the central filesystems (e.g. home or compute directories)
  • Removing the temporary directories on the local hard drive

If you put code in your job script to do this at the end of the job, it will work, assuming that your job runs to completion. However, if your job hits its walltime and is killed by the system, or if you remove it using "scancel" after it starts running, the cleanup code at the end of the job will never be reached. The answer lies in handling system signals, which is demonstrated in the example on this page.
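The signal-handling idea can be shown in miniature. This sketch traps the TERM signal (which is how SLURM typically kills jobs at walltime or on scancel) and, for demonstration purposes only, sends TERM to itself so the handler fires; in a real job SLURM delivers the signal for you:

```shell
#!/bin/bash
# Demonstration of cleanup-on-signal: the trap handler removes the
# scratch directory even though the "normal" end of the script is
# interrupted by the signal.
SCRATCH="/tmp/trap_demo.$$"
mkdir -p "$SCRATCH"

cleanup_scratch() {
    # In a real job, copy any results you need out of $SCRATCH first.
    rm -rf "$SCRATCH"
    echo "cleanup ran"
}
trap 'cleanup_scratch' TERM

# Simulate what happens at walltime or on scancel by signaling ourselves.
kill -TERM $$
```

One caveat worth knowing: bash only runs a trap handler between commands, so a long-running foreground program can delay the handler until that program itself exits.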

Space Available

It should be noted that the "/tmp" filesystem is used for a number of other purposes, so the full capacity may not be available. Also, if more than one job is using the local "/tmp" filesystem on the same node, those jobs will compete for space.

In general, the nodes have the following data capacities:

Nodes                       Approximate space in /tmp
m7                          200 GB
m8                          200 GB
m9                          800 GB
m8f/m8h (bigmemory) nodes   400 GB
m8g (GPU) nodes             400 GB
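Since the numbers above are only approximate upper bounds, it can be worth checking how much space is actually free before relying on it. A small sketch using df (the --output option assumes GNU coreutils, which is typical on Linux clusters):

```shell
#!/bin/bash
# Show a human-readable summary of the /tmp filesystem.
df -h /tmp

# For use inside a script: the free space in kilobytes (GNU df).
FREE_KB=$(df --output=avail -k /tmp | tail -n 1 | tr -d ' ')
echo "Free space in /tmp: ${FREE_KB} KB"
```

Your job could test $FREE_KB against its expected usage and fail early, rather than dying partway through with a full disk.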

Example 

#!/bin/bash
#SBATCH --time=03:00:00   # walltime
#SBATCH --ntasks=8   # number of processor cores (i.e. tasks)
#SBATCH --nodes=1   # number of nodes
#SBATCH --mem-per-cpu=1024M   # memory per CPU core
#SBATCH -J "myjobname"   # job name


#define variables to represent the directories involved
TEMPORARY_DIR="/tmp/$SLURM_JOB_ID"
DATASRC_DIR="$HOME/data_source"
DATADEST_DIR="$HOME/data_dest"

#set up function.  this isn't called/run here.  It's just used 
#   if the job is canceled via a signal
cleanup_scratch()
{
        echo "Deleting inside signal handler: the job probably hit its walltime or was canceled with scancel"
        
        #copy wanted data from $TEMPORARY_DIR to $DATADEST_DIR
        cp -v "$TEMPORARY_DIR/results.dat" "$DATADEST_DIR"

        #change to a safe location
        cd "$HOME"
        
        #remove the remaining data in $TEMPORARY_DIR
        rm -rfv "$TEMPORARY_DIR"
        echo "---"
        echo "Signal handler ending time:"
        date
        exit 0
}

#Associate the function "cleanup_scratch" with the TERM signal, which is usually how jobs get killed
trap 'cleanup_scratch' TERM

#create temporary directory
echo "Creating Temporary directory at $TEMPORARY_DIR"
mkdir -pv "$TEMPORARY_DIR"
echo "---"

#copy working data information from $DATASRC_DIR* to $TEMPORARY_DIR
echo "Copying working data information from $DATASRC_DIR* to $TEMPORARY_DIR"
cp -v "$DATASRC_DIR/"* "$TEMPORARY_DIR"
echo "---"

#DO YOUR JOB'S WORK HERE
#  NOTE: what you do to utilize $TEMPORARY_DIR depends on your program and environment

echo "Deleting at the end of the job"
        
#copy wanted data from $TEMPORARY_DIR to $DATADEST_DIR
cp -v "$TEMPORARY_DIR/results.dat" "$DATADEST_DIR"

#change to a safe location
cd "$HOME"
        
#remove the remaining data in $TEMPORARY_DIR
rm -rfv "$TEMPORARY_DIR"
echo "---"