Sungrid Use

Latest revision as of 06:50, 12 July 2012

Basic Commands

Our machines are set up as a grid using a batch queue system. You must run your jobs on one of the sungrid machines (use qconf -sh to see them). You can check your job status with:

qstat

You can delete jobs with:

qdel

where you can give a specific job ID, or delete all of your jobs with "-u yourid".

To submit jobs, you use qsub as shown later.
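
As a small illustration of the status and delete commands above, the sketch below prints a few typical invocations. The user id "myid" and the job ID 18526 are placeholders, and the show helper is hypothetical: it only echoes each command, so the snippet can be tried away from the grid.

```shell
#!/bin/sh
# Hypothetical helper: echo a command instead of running it.
# On a sungrid machine, drop "show" to run the command for real.
show() { echo "$@"; }

show qstat -u myid   # list only the jobs owned by user "myid" (placeholder id)
show qdel 18526      # delete a single job by its job ID (placeholder ID)
show qdel -u myid    # delete every job owned by "myid"
```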

Common Options

-S /bin/sh

When submitting a script (that is, you did not specify -b y), this specifies which interpreter to use.

-m es

Mail the user when the job ends (e) or is suspended (s).

-M user@soe.ucsc.edu

The email address to send the mail to.

-cwd

Start the simulation from the current working directory.

-o soc_fpu.log

File to write standard output to.

-e soc_fpu.err

File to write standard error to.

-N "fpu SOC P&R"

A name for the job.

-b y

Specify that the executable is a binary and not a script.

-V

Export all of the submitting shell's environment variables to the job, so the job sees the same environment as your shell.

Submitting Jobs

When you submit a job, it sources your ".profile" instead of your ".bashrc" for configuration since it is a "non-interactive" shell.

You can submit jobs in one of two ways:

Command Line

qsub <options> -b y <binary> -- <options for the binary>

Example:

qsub -S /bin/sh -m es -M myid@soe.ucsc.edu -cwd -b y -N "myid_hard_area_FP" \
-e hard_results/n10.err -o hard_results/n10.log ../simanneal/simanneal  -- -i ../benchmarks/hard/n10 -o hard_results/n10

Sometimes, I have found that the "--" isn't needed... I'm not sure why.
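
When you have several benchmarks, the command line above can be wrapped in a loop. The following is a minimal sketch, not a tested site script: the submit_bench function and the QSUB variable are hypothetical names, the benchmark names n30 and n50 are made up for illustration, and the mail options are left out for brevity. By default it only prints the commands; set QSUB=qsub to actually submit.

```shell
#!/bin/sh
# QSUB defaults to "echo qsub" so the commands are printed, not submitted.
# Run with QSUB=qsub on a sungrid machine to submit for real.
QSUB=${QSUB:-echo qsub}

# Hypothetical helper: submit the simanneal binary for one benchmark name.
submit_bench() {
    bench=$1
    $QSUB -S /bin/sh -cwd -b y -N "myid_hard_${bench}" \
        -e "hard_results/${bench}.err" -o "hard_results/${bench}.log" \
        ../simanneal/simanneal -- -i "../benchmarks/hard/${bench}" -o "hard_results/${bench}"
}

# n10 comes from the example above; n30 and n50 are made-up names.
for b in n10 n30 n50; do
    submit_bench "$b"
done
```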

Script

Instead of specifying all the options on the command line, you can specify them in a script like this:

#!/bin/sh
#$ -S /bin/sh
#$ -m es
#$ -M myid
#$ -cwd
#$ -o hard_results/n10.log
#$ -e hard_results/n10.err
#$ -N "myid_hard_area_FP"
../simanneal/simanneal -i ../benchmarks/hard/n10 -o hard_results/n10 
exit 0
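
If you need one such script per benchmark, you can generate them from a template. This is only a sketch under assumptions: make_job_script is a hypothetical helper, and the mail options from the script above are omitted to keep it short. It prints the job script to standard output so you can inspect it before saving and submitting it.

```shell
#!/bin/sh
# Hypothetical helper: print a job script for the given benchmark name.
# Inside the unquoted here-document, "#$ " stays literal and ${bench} is expanded.
make_job_script() {
    bench=$1
    cat <<EOF
#!/bin/sh
#$ -S /bin/sh
#$ -cwd
#$ -o hard_results/${bench}.log
#$ -e hard_results/${bench}.err
#$ -N "myid_hard_${bench}"
../simanneal/simanneal -i ../benchmarks/hard/${bench} -o hard_results/${bench}
exit 0
EOF
}

# Print the script for n10. To use it, redirect it to a file and submit that
# file, for example: make_job_script n10 > job_n10.sh; qsub job_n10.sh
make_job_script n10
```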

Restricting to the same machines

Often for papers, you will want to restrict your simulations to identical machines (e.g. when you want to compare run-times). For this purpose, we have a hostgroup containing all of the mada servers (mada1-7). You can submit to these machines with:

qsub -q *@@mada <other arguments>

You can also specify other requirements, such as available memory, with the "-l" option.
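
The queue restriction and a resource request can be combined in one submission. The sketch below assumes that the grid defines the mem_free resource (a common Grid Engine default, but check your configuration) and uses 4G purely as an example value; submit_on_mada and QSUB are hypothetical names. Quoting '*@@mada' is a precaution that keeps the shell from expanding the asterisk. By default the command is only printed.

```shell
#!/bin/sh
# QSUB defaults to "echo qsub" (dry run); set QSUB=qsub to really submit.
QSUB=${QSUB:-echo qsub}

# Hypothetical helper: restrict a job to the mada hostgroup and request memory.
# mem_free=4G is an assumed resource name and an example value.
submit_on_mada() {
    $QSUB -q '*@@mada' -l mem_free=4G "$@"
}

submit_on_mada -cwd -b y ../simanneal/simanneal -- -i ../benchmarks/hard/n10 -o hard_results/n10
```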

To list all submit hosts:

qconf -ss

To list the cluster queues:

qconf -sql

Administration

Sometimes a machine's queue will get into an error state (E):

qstat -f

will show

----------------------------------------------------------------------------
all.q@mada3.cse.ucsc.edu       BIP   0/4       0.00     lx24-amd64    E

You can diagnose the error by typing:

qstat -explain E
----------------------------------------------------------------------------
all.q@mada3.cse.ucsc.edu       BIP   0/4       0.00     lx24-amd64    E
       queue all.q marked QERROR as result of job 18526's failure at host mada3.cse.ucsc.edu

Then you can force clear a queue with:

qmod -cq all.q@mada3.cse.ucsc.edu

or clear the error state on all queues with:

qmod -c '*'
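
To clear only the queues that are actually in the error state, you can filter the qstat -f output. This is a rough sketch: error_queues_to_clear is a hypothetical helper, and it assumes the six-column layout shown above with the state in the last column. It prints the qmod commands rather than running them, so you can review them first.

```shell
#!/bin/sh
# Hypothetical helper: read "qstat -f" output on stdin and print one
# "qmod -cq" command per queue instance whose state column contains E.
# Assumes six columns per queue line, with the state in column 6.
error_queues_to_clear() {
    awk 'NF == 6 && $1 ~ /@/ && $6 ~ /E/ { print "qmod -cq " $1 }'
}

# Sample input copied from the qstat -f output above. On the grid, use:
#   qstat -f | error_queues_to_clear
error_queues_to_clear <<'EOF'
----------------------------------------------------------------------------
all.q@mada3.cse.ucsc.edu       BIP   0/4       0.00     lx24-amd64    E
EOF
```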