TensorFlow on RHe7¶
- TensorFlow is a popular open source library for machine learning
- Running it works the same way as on RHe6, except that you now have to load a different module
Things to Note¶
- To use `tensorflow-gpu/2.0.0`, as this guide does, you must submit the job to a GPU-enabled queue such as `ece-gpu`. `force-gpu` has recently been updated to RHe7, so it should work as well. Run `pace-whoami` to see which queues you have access to.
- The tensorflow script in this guide is a slightly modified version of Google's text classification with TensorFlow and Keras Guide.
- The neural net is trained on imdb movie reviews and is designed to predict if a given movie review is positive or negative.
- The TensorFlow script can be found here: imdb_tf.py
- Data for script: imdb_data.pickle
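Because the guide depends on a GPU-enabled queue, it can help to confirm from inside the job that TensorFlow actually sees a GPU. The following is a minimal sketch, not part of `imdb_tf.py`; the helper name `visible_gpus` is hypothetical. It uses `tf.config.experimental.list_physical_devices`, which is available in TensorFlow 2.0, and returns `None` if TensorFlow cannot be imported (for example, when the module has not been loaded).

```python
def visible_gpus():
    """Return the GPUs TensorFlow can see, or None if TensorFlow is not importable.

    Hypothetical helper for illustration; it is not part of the guide's script.
    """
    try:
        import tensorflow as tf  # provided by `module load tensorflow-gpu/2.0.0`
    except ImportError:
        return None
    # An empty list means TensorFlow loaded but found no GPU on this node.
    return tf.config.experimental.list_physical_devices("GPU")
```

Calling `print(visible_gpus())` at the top of a job script would show an empty list if the job landed on a node without a GPU.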
Part 1: PBS Script¶
- Make sure you load `tensorflow-gpu/2.0.0`. This is the difference between running TensorFlow on RHe6 and RHe7: `tensorflow-gpu/2.0.0` isn't available on RHe6.
    #PBS -N tensorflow_test
    #PBS -A [Account]
    #PBS -l nodes=1:ppn=4:gpus=1
    #PBS -l walltime=5:00
    #PBS -q inferno
    #PBS -j oe
    #PBS -o tf_imdb_results.out

    cd $PBS_O_WORKDIR
    module purge
    module load tensorflow-gpu/2.0.0
    python imdb_tf.py
- The above script can be found here: tensorflow_rhe7.pbs
- The `#PBS` directives request 5 minutes of walltime, 1 node with 4 cores, and 1 GPU. More on `#PBS` directives can be found in the PBS guide.
- `$PBS_O_WORKDIR` is a variable that represents the directory you submit the PBS script from. Input and output files for the script should be in that same directory. Make sure the data file and the Python script are in the folder from which you submit the PBS script.
- `module load tensorflow-gpu/2.0.0` loads version 2.0.0 of TensorFlow. To see which TensorFlow versions are available, run `module avail tensorflow` and load the one you want.
- `python imdb_tf.py` runs the TensorFlow script.
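Since the job runs from the submission directory, a quick pre-flight check can catch a missing input before the job waits in the queue. The sketch below is illustrative only. The helper `missing_files` is hypothetical, and the file names are the ones used in this guide.

```python
from pathlib import Path

# File names taken from this guide; adjust them if yours differ.
REQUIRED_FILES = ("tensorflow_rhe7.pbs", "imdb_tf.py", "imdb_data.pickle")


def missing_files(directory=".", required=REQUIRED_FILES):
    """Return the names in `required` that are not regular files in `directory`."""
    base = Path(directory)
    return [name for name in required if not (base / name).is_file()]
```

Running `missing_files()` in the submission directory returns an empty list when everything the job needs is in place.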
Part 2: Submit Job and Check Status¶
- Make sure you're in the directory that contains the PBS script as well as the TensorFlow script.
- Submit as normal, with `qsub <pbs script name>`. In this case, `qsub tensorflow_rhe7.pbs`.
- Check job status with `qstat -t 22182721`, replacing the number with the job ID returned by `qsub`.
- You can delete the job with `qdel 22182721`, again replacing the number with the job ID returned by `qsub`.
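The submit-and-check steps can also be scripted. The sketch below wraps `qsub` and `qstat -t` with Python's `subprocess` module. It assumes `qsub` prints the job ID in the form `<number>.<server>`, matching the job ID format shown in this guide's sample output; array jobs and other schedulers may print something different. The function names are hypothetical.

```python
import subprocess


def parse_job_id(qsub_output):
    """Extract the numeric job ID from output like '22182721.shared-sched.pace.gatech.edu'."""
    job_id = qsub_output.strip().split(".", 1)[0]
    if not job_id.isdigit():
        raise ValueError("unexpected qsub output: {!r}".format(qsub_output))
    return job_id


def submit(script_name):
    """Submit a PBS script with qsub and return the numeric job ID as a string."""
    result = subprocess.run(
        ["qsub", script_name],
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,  # raise CalledProcessError if qsub exits with an error
    )
    return parse_job_id(result.stdout)


def job_status(job_id):
    """Return the raw text printed by `qstat -t <job_id>`."""
    result = subprocess.run(
        ["qstat", "-t", job_id],
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    return result.stdout
```

On a login node, `print(job_status(submit("tensorflow_rhe7.pbs")))` would submit the job and print its current status.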
Part 3: Collecting Results¶
- In the directory where you submitted the PBS script, you should see a `tf_imdb_results.out` file, which contains the results of the job. Use `cat` or open the file in a text editor to take a look.
- `tf_imdb_results.out` should look like this:
    25000/25000 [==============================] - 1s 29us/step
    Loss, accuracy: [0.31646855477809904, 0.8754]
    ---------------------------------------
    Begin PBS Epilogue Thu Aug 2 14:45:27 EDT 2018
    Job ID: 21872475.shared-sched.pace.gatech.edu
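If you want the final numbers rather than the whole log, the `Loss, accuracy:` line can be pulled out programmatically. This is a minimal sketch that assumes the line has exactly the format shown above (a two-element Python-style list); the helper name `read_loss_accuracy` is hypothetical.

```python
import ast
import re

# Matches the bracketed list that follows "Loss, accuracy:" in the results file.
RESULT_PATTERN = re.compile(r"Loss, accuracy:\s*(\[[^\]]*\])")


def read_loss_accuracy(text):
    """Return (loss, accuracy) as floats, or None if the line is not present."""
    match = RESULT_PATTERN.search(text)
    if match is None:
        return None
    loss, accuracy = ast.literal_eval(match.group(1))
    return float(loss), float(accuracy)
```

For example, `read_loss_accuracy(open("tf_imdb_results.out").read())` returns the loss and accuracy as a tuple once the job has finished.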
- The TensorFlow script should also create a training vs accuracy chart, which will appear as `training_accuracy.png` in the directory where you submitted the PBS script.
- After the result files are produced, you can move them off the cluster. Refer to the file transfer guide for help.
- Congratulations! You successfully ran Tensorflow on RHe7.