Sungrid Use
Latest revision as of 06:50, 12 July 2012
Basic Commands
Our machines are set up as a grid managed by a batch queue system (Sun Grid Engine). You must submit your jobs from one of the sungrid machines (qconf -sh lists them). You can check your job status with:
qstat
You can delete jobs with:
qdel
Pass either a specific job ID, or "-u yourid" to delete all of your jobs.
To submit jobs, you use qsub as shown later.
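If a script should wait until all of your jobs are done (for example, before collecting results), the status check can be wrapped in a small polling loop. This is a minimal sketch, assuming qstat's default layout (two header lines, then one line per job with the job ID in the first column); my_jobs and wait_for_jobs are hypothetical helper names, not Grid Engine commands.

```shell
#!/bin/sh
# my_jobs: print the IDs of the current user's queued or running jobs.
# Assumes qstat's default layout: two header lines, then one job per line
# with the job ID in the first column (qstat prints nothing when idle).
my_jobs() {
    qstat -u "${USER:-$(id -un)}" | awk 'NR > 2 { print $1 }'
}

# wait_for_jobs [SECONDS]: poll every SECONDS (default 60) until qstat
# no longer lists any of the user's jobs.
wait_for_jobs() {
    wfj_interval=${1:-60}
    while [ -n "$(my_jobs)" ]; do
        sleep "$wfj_interval"
    done
}
```

For example, "wait_for_jobs 300" returns once none of your jobs are left in the queue, checking every five minutes.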
Common Options
-S /bin/sh
For job scripts (when you have not given -b y), this specifies which shell interprets the script.
-m es
Mail the user when the job ends or is suspended.
-M user@soe.ucsc.edu
The email address to send mail to.
-cwd
Run the job from the current working directory.
-o soc_fpu.log
File for the job's standard output.
-e soc_fpu.err
File for the job's standard error.
-N "fpu SOC P&R"
A name for the job.
-b y
Specify that the executable is a binary and not a script.
-V
Export all of the submitting shell's environment variables to the job.
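The common options above can be bundled into a shell function so that each submission only supplies what changes. This is a sketch under stated assumptions: sub_bin and the SUB_MAIL variable are hypothetical names, and the option set is simply the list above with -b y and -V always on.

```shell
#!/bin/sh
# sub_bin NAME LOG ERR BINARY [ARGS...]
# Hypothetical wrapper around qsub that applies the common options above.
# SUB_MAIL is an assumed variable holding the address passed to -M.
sub_bin() {
    if [ "$#" -lt 4 ]; then
        echo "usage: sub_bin NAME LOG ERR BINARY [ARGS...]" >&2
        return 1
    fi
    if [ -z "${SUB_MAIL:-}" ]; then
        echo "sub_bin: set SUB_MAIL to your e-mail address" >&2
        return 1
    fi
    sb_name=$1 sb_log=$2 sb_err=$3
    shift 3
    # -S: interpreter, -m es: mail on end/suspension, -cwd: start in the
    # current directory, -V: export this shell's environment, -b y: binary.
    qsub -S /bin/sh -m es -M "$SUB_MAIL" -cwd -V -b y \
        -N "$sb_name" -o "$sb_log" -e "$sb_err" "$@"
}
```

After setting SUB_MAIL=user@soe.ucsc.edu, a call looks like: sub_bin "fpu SOC P&R" soc_fpu.log soc_fpu.err ../path/to/binary <options for the binary> (the binary path here is a placeholder).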
Submitting Jobs
When you submit a job, it reads your ".profile" rather than your ".bashrc" for configuration, because the job runs in a "non-interactive" shell.
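One way to keep batch jobs and interactive shells configured the same is to have ".profile" read ".bashrc". This is a config fragment, assuming bash is your login shell; it mirrors the default ~/.profile shipped with Debian and Ubuntu. The other route is the -V option above, which copies your current environment into the job.

```shell
# Fragment for ~/.profile (assumes bash is your login shell).
# Only when the shell reading this file is bash, also read ~/.bashrc so
# batch jobs get the same settings as interactive shells.
if [ -n "$BASH_VERSION" ] && [ -f "$HOME/.bashrc" ]; then
    . "$HOME/.bashrc"
fi
```

Two caveats: many ".bashrc" files stop early for non-interactive shells (a "If not running interactively" guard), so settings jobs need must sit above that guard; and if /bin/sh is not bash on the machine running the job, the BASH_VERSION check skips ".bashrc" entirely.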
You can submit jobs in one of two ways:
Command Line
qsub <options> -b y <binary> -- <options for the binary>
Example:
qsub -S /bin/sh -m es -M myid@soe.ucsc.edu -cwd -b y -N "myid_hard_area_FP" \
  -e hard_results/n10.err -o hard_results/n10.log \
  ../simanneal/simanneal -- -i ../benchmarks/hard/n10 -o hard_results/n10
I have found that the "--" is sometimes not needed; I'm not sure why.
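To run the same program on several benchmarks, the command-line form can go in a loop that submits one job per benchmark. A minimal sketch: submit_benchmarks is a hypothetical helper, the paths and the "--" are copied from the example above, the e-mail address is assumed to be your login name at soe.ucsc.edu, and appending the benchmark name to -N is an assumption so the jobs are easy to tell apart. It creates hard_results first so the -o and -e paths have a directory to land in.

```shell
#!/bin/sh
# submit_benchmarks BENCH...
# Hypothetical helper: submit one simanneal job per benchmark name using
# the paths from the example (../benchmarks/hard/<name>, hard_results/).
submit_benchmarks() {
    mkdir -p hard_results   # directory for the -o/-e files and results
    for sb_bench in "$@"; do
        qsub -S /bin/sh -m es -M "$USER@soe.ucsc.edu" -cwd -b y \
            -N "${USER}_hard_area_FP_${sb_bench}" \
            -e "hard_results/$sb_bench.err" -o "hard_results/$sb_bench.log" \
            ../simanneal/simanneal -- -i "../benchmarks/hard/$sb_bench" \
            -o "hard_results/$sb_bench" || return 1   # stop at the first failed submission
    done
}
```

For example, "submit_benchmarks n10 n30" queues two jobs (any name other than n10 is a placeholder for your own benchmark files).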
Script
Instead of specifying all the options on the command line, you can put them in a script on lines beginning with "#$" and submit the script with "qsub <script>":
#!/bin/sh
#$ -S /bin/sh
#$ -m es
#$ -M myid
#$ -cwd
#$ -o hard_results/n10.log
#$ -e hard_results/n10.err
#$ -N "myid_hard_area_FP"
../simanneal/simanneal -i ../benchmarks/hard/n10 -o hard_results/n10
exit 0
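When you need many near-identical job scripts, you can generate them instead of editing each by hand. This sketch prints a script of the same form as the one above for a given benchmark name. make_job_script is a hypothetical name, the per-benchmark suffix on -N is an assumption, and the trailing "exit 0" is left out so the job's exit status is the simulator's own.

```shell
#!/bin/sh
# make_job_script BENCH
# Print a job script like the example, parameterized by benchmark name.
# "#\$" keeps the shell from treating the directive prefix as an expansion.
make_job_script() {
    mjs_bench=$1
    cat <<EOF
#!/bin/sh
#\$ -S /bin/sh
#\$ -m es
#\$ -M $USER
#\$ -cwd
#\$ -o hard_results/$mjs_bench.log
#\$ -e hard_results/$mjs_bench.err
#\$ -N "${USER}_hard_area_FP_$mjs_bench"
../simanneal/simanneal -i ../benchmarks/hard/$mjs_bench -o hard_results/$mjs_bench
EOF
}
```

Usage: make_job_script n10 > n10.sh && qsub n10.sh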
Restricting to the same machines
Often for papers you will want to restrict your simulations to identical machines (e.g. when comparing run-times). For this, we have a hostgroup containing all of the mada servers (mada1-7). You can submit to these machines with:
qsub -q '*@@mada' <other arguments>
You can also specify other requirements, such as available memory, with the "-l" option.
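The hostgroup restriction and a memory request can be combined in one submission. A sketch only: submit_on_mada is a hypothetical helper, and mem_free is one of Grid Engine's standard memory attributes, but whether it can be requested here (and which units it accepts) depends on the local complex configuration.

```shell
#!/bin/sh
# submit_on_mada MEM BINARY [ARGS...]
# Hypothetical helper: run a binary job only on the @mada hostgroup and
# ask for at least MEM of free memory (e.g. 4G).
submit_on_mada() {
    som_mem=$1
    shift
    # The queue pattern is quoted so the shell never tries to glob-expand it.
    qsub -q '*@@mada' -l "mem_free=$som_mem" -cwd -b y "$@"
}
```

For example: submit_on_mada 4G ../simanneal/simanneal -- -i ../benchmarks/hard/n10 -o hard_results/n10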
To list the submit hosts:
qconf -ss
To list the cluster queues:
qconf -sql
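Since jobs have to be submitted from one of the grid machines, a script can check that first. This sketch assumes "qconf -ss" prints one host name per line, usually fully qualified; is_submit_host is a hypothetical helper name.

```shell
#!/bin/sh
# is_submit_host [HOST]
# Succeed if HOST (default: this machine's hostname) is in the submit host
# list from "qconf -ss". Pass the fully qualified name, e.g.
# mada3.cse.ucsc.edu rather than mada3.
is_submit_host() {
    ish_host=${1:-$(hostname)}
    qconf -ss | grep -Fxq -- "$ish_host"
}
```

For example: is_submit_host || echo "log in to one of the sungrid machines first" >&2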
Administration
Sometimes a queue on one of the machines gets into an error state (E). Running
qstat -f
shows a line such as:
----------------------------------------------------------------------------
all.q@mada3.cse.ucsc.edu BIP 0/4 0.00 lx24-amd64 E
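To find errored queues without reading the whole listing, the "qstat -f" output can be filtered. A minimal sketch, assuming the layout shown above: queue lines start with queue@host and the sixth column holds the state letters, while job and separator lines are skipped because their first field contains no "@". error_queues is a hypothetical helper name.

```shell
#!/bin/sh
# error_queues: print every queue instance that "qstat -f" reports with an
# E (error) in its state column.
error_queues() {
    qstat -f | awk '$1 ~ /@/ && NF >= 6 && $6 ~ /E/ { print $1 }'
}
```

With the output above, error_queues would print all.q@mada3.cse.ucsc.edu.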
You can diagnose the error by typing:
qstat -explain E
----------------------------------------------------------------------------
all.q@mada3.cse.ucsc.edu BIP 0/4 0.00 lx24-amd64 E
        queue all.q marked QERROR as result of job 18526's failure at host mada3.cse.ucsc.edu
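Before clearing the queue it is usually worth looking at the job that caused the error. This sketch pulls the job IDs out of the explanation lines; it relies on the message wording in the sample output above, and failing_jobs is a hypothetical helper name.

```shell
#!/bin/sh
# failing_jobs: read "qstat -explain E" output on stdin and print the job
# IDs from "... as result of job NNN's failure ..." messages, one per line.
failing_jobs() {
    sed -n "s/.*as result of job \([0-9][0-9]*\)'s failure.*/\1/p"
}
```

For example, "qstat -explain E | failing_jobs" prints 18526 for the output above; you can then check that job's .err and .log files, or its accounting record with "qacct -j 18526" where accounting is available.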
Then you can clear the error state of that queue instance with:
qmod -cq all.q@mada3.cse.ucsc.edu
or of every queue at once with:
qmod -c '*'
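Between clearing one named queue and clearing everything, a script can clear just the queue instances that are currently in error. A sketch, assuming the "qstat -f" layout shown earlier; clear_error_queues and the DRY_RUN variable are hypothetical names, and clearing queues normally requires Grid Engine operator or manager rights.

```shell
#!/bin/sh
# clear_error_queues: run "qmod -cq" on every queue instance that
# "qstat -f" shows in the E state. With DRY_RUN=1 it only prints the
# qmod commands it would run.
clear_error_queues() {
    qstat -f | awk '$1 ~ /@/ && NF >= 6 && $6 ~ /E/ { print $1 }' |
    while read -r ceq_queue; do
        if [ "${DRY_RUN:-0}" = 1 ]; then
            echo "qmod -cq $ceq_queue"
        else
            qmod -cq "$ceq_queue"
        fi
    done
}
```

Running "DRY_RUN=1 clear_error_queues" first shows which queues would be touched.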