
The squeue command can report on jobs in the job queue according to their state; valid states
are: pending, running, completing, completed, failed, timeout, and node_fail. Example 6-3 uses
the squeue command to report on failed jobs.
Example 6-3: Reporting on Failed Jobs in the Queue
$ squeue --state=FAILED
JOBID PARTITION NAME     USER ST TIME NODES NODELIST
   59 amt1      hostname root F  0:00     0
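When the queue holds jobs in several states, a per-state tally can be easier to scan than the full listing. The following is a minimal sketch, not part of SLURM: count_by_state is a hypothetical helper, and it assumes the default column layout shown in Example 6-3 (ST in the fifth column, no spaces inside job names). The sample input is the output from Example 6-3; on a live system the output of squeue would be piped into the function instead.

```shell
# Hypothetical helper: tally jobs per state code from squeue-style output on stdin.
# Assumes the default layout of Example 6-3, where ST is the fifth column.
count_by_state() {
    awk 'NR > 1 && NF >= 5 { count[$5]++ }
         END { for (s in count) print s, count[s] }' | sort
}

# Sample input copied from Example 6-3.
sample_squeue='JOBID PARTITION NAME USER ST TIME NODES NODELIST
59 amt1 hostname root F 0:00 0'

printf '%s\n' "$sample_squeue" | count_by_state
```

For the sample input, the function prints one line, F 1, because the only job listed is in the failed state.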
6.6 Killing Jobs with the scancel Command
The scancel command cancels a pending or running job or job step. It can also be used to
send a specified signal to all processes on all nodes associated with a job. Only job owners
or administrators can cancel jobs.
Example 6-4 kills job 415 and all its job steps.
Example 6-4: Killing a Job by Its JobID
$ scancel 415
Example 6-5 cancels all pending jobs.
Example 6-5: Cancelling All Pending Jobs
$ scancel --state=PENDING
Example 6-6 sends the TERM signal to terminate job steps 421.2 and 421.3.
Example 6-6: Sending a Signal to a Job
$ scancel --signal=TERM 421.2 421.3
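Because scancel acts immediately, it can help to review the exact command before running it. The sketch below is an illustration only: build_scancel_signal is a hypothetical helper that performs a rough check of each ID (digits and dots only) and prints the scancel command line used in Example 6-6 without sending anything to SLURM.

```shell
# Hypothetical helper: print the scancel command that would signal the given
# jobs or job steps. It only builds and prints the command; nothing is run.
build_scancel_signal() {
    signal=$1
    shift
    for id in "$@"; do
        case $id in
            # Rough check: reject empty IDs, IDs with characters other than
            # digits and dots, and IDs that start or end with a dot.
            ''|*[!0-9.]*|.*|*.)
                echo "invalid job or step ID: $id" >&2
                return 1
                ;;
        esac
    done
    echo "scancel --signal=$signal $*"
}

# Same signal and job steps as Example 6-6.
build_scancel_signal TERM 421.2 421.3
```

The printed line can be reviewed and then run as is.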
6.7 Getting System Information with the sinfo Command
The sinfo command reports the state of partitions and nodes managed by SLURM. It has
a wide variety of filtering, sorting, and formatting options. sinfo displays a summary of
available partition and node (not job) information (such as partition names, nodes/partition,
and CPUs/node).
Example 6-7: Using the sinfo Command (No Options)
$ sinfo
PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
lsf       up    infinite      1 down* n15
lsf       up    infinite      2 idle  n[14,16]
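The sinfo output can also be filtered with standard text tools, for example to pick out nodes that are down. The following is a minimal sketch under stated assumptions: down_nodes is a hypothetical helper, and it relies on the default six-column layout of Example 6-7 (STATE in the fifth column, NODELIST in the sixth). The sample input is the output from Example 6-7; on a live system the output of sinfo would be piped into the function instead.

```shell
# Hypothetical helper: print the NODELIST of every row whose STATE starts
# with "down" (this also matches "down*"), reading sinfo-style output on stdin.
down_nodes() {
    awk 'NR > 1 && $5 ~ /^down/ { print $6 }'
}

# Sample input copied from Example 6-7.
sample_sinfo='PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
lsf up infinite 1 down* n15
lsf up infinite 2 idle n[14,16]'

printf '%s\n' "$sample_sinfo" | down_nodes
```

For the sample input, the function prints n15, the only node reported as down.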