HP XC System 2.x Software User Manual, Page 33

2.3 Launching and Managing Jobs Quick Start
This section provides a brief description of some of the many ways to launch jobs, manage jobs,
and get information about jobs on an HP XC system. This section is intended only as a quick
overview of some basic ways of running and managing jobs. Full information and details
about the HP XC job launch environment are provided in the SLURM chapter (Chapter 6) and
the LSF chapter (Chapter 7) of this manual.
2.3.1 Introduction
As described in Section 1.4, SLURM and LSF cooperate to run and manage jobs on the HP
XC system, combining LSF’s powerful and flexible scheduling functionality with SLURM’s
scalable parallel job launching capabilities.
SLURM is the low-level resource manager and job launcher, and performs processor allocation
for jobs. LSF gathers information about the cluster from SLURM. When a job is ready to be
launched, LSF creates a SLURM node allocation and dispatches the job to that allocation.
Although jobs can be launched directly using SLURM, it is recommended that you use LSF
to take advantage of its scheduling and job management capabilities. SLURM options can be
added to the LSF job launch command line to further define job launch requirements. The
HP-MPI mpirun command and its options can be used within LSF to launch jobs that require
MPI’s high-performance message-passing capabilities.
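The launch styles described above can be sketched as follows. This is a minimal illustration, not the manual's own example: the helper lsf_launch_cmd is hypothetical and only prints a command line instead of submitting it, ./hello_world is a placeholder program name, and the bsub options shown (-n for the processor count, -I for an interactive job) should be checked against Chapter 7 before use.

```shell
# Hypothetical helper: builds and prints an LSF launch command line.
# It does not submit anything, so it is safe to run on any machine.
#   $1        number of processors to request (bsub -n)
#   $2...     the command LSF should run inside the allocation
lsf_launch_cmd() {
    nprocs="$1"
    shift
    printf 'bsub -n %s -I %s\n' "$nprocs" "$*"
}

# A SLURM-launched job submitted through LSF.
lsf_launch_cmd 4 srun hostname

# An HP-MPI job submitted through LSF (./hello_world is a placeholder).
lsf_launch_cmd 4 mpirun -srun ./hello_world
```

Printing the command first makes it easy to review what would be submitted; on an HP XC system the printed line can then be entered at the prompt.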
When the HP XC system is installed, a SLURM partition of nodes is created to contain LSF
jobs. This partition is called the lsf partition.
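One way to look at this partition is with the SLURM sinfo command, restricted to a single partition with -p. The sketch below assumes the SLURM client tools are on your PATH; the guard only keeps the snippet harmless on machines where they are not installed.

```shell
# Name of the SLURM partition that holds LSF jobs (from the text above).
partition="lsf"

# Show the nodes and their state for that partition, if sinfo is available.
if command -v sinfo >/dev/null 2>&1; then
    sinfo -p "$partition"
else
    echo "sinfo not found; run this on an HP XC system" >&2
fi
```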
When a job is submitted to LSF, the LSF scheduler prioritizes the job and waits until the
required resources (compute nodes from the lsf partition) are available.
When the requested resources are available for the job, LSF-HPC creates a SLURM allocation
of nodes on behalf of the user, sets the SLURM JobID for the allocation, and dispatches the
job with the LSF-HPC JOB_STARTER script to the first allocated node.
A detailed explanation of how SLURM and LSF interact to launch and manage jobs is provided
in Section 7.1.4.
2.3.2 Getting Information About Queues
The LSF bqueues command lists the configured job queues in LSF. By default, bqueues
returns the following information about all queues: queue name, queue priority, queue status,
job slot statistics, and job state statistics.
To get information about queues, enter the bqueues command as follows:
$ bqueues
Refer to Section 7.3.4 for more information about using this command and a sample of its output.
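As a small illustration of working with this output, the sketch below filters the queue list down to queues that are open. It assumes the default bqueues layout, with one header line, the queue name in the first column, and the queue status in the third column; the function name open_queues is hypothetical, and the column positions should be confirmed against the sample output in Section 7.3.4.

```shell
# Hypothetical filter: reads bqueues-style output on stdin and prints the
# names of queues whose status begins with "Open".
# Assumes one header line, queue name in column 1, status in column 3.
open_queues() {
    awk 'NR > 1 && $3 ~ /^Open/ { print $1 }'
}

# Run against the live queue list only when bqueues is available.
if command -v bqueues >/dev/null 2>&1; then
    bqueues | open_queues
fi
```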
2.3.3 Getting Information About Resources
The LSF bhosts, lshosts, and lsload commands are quick ways to get information about
system resources. LSF daemons run on only one node in the HP XC system, so the bhosts
and lshosts commands will list one host, which represents all the resources of the HP
XC system. The total number of processors for that host should be equal to the total number of
processors assigned to the SLURM lsf partition.
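The processor check described above could be scripted roughly as follows. This is a sketch under several assumptions: that ncpus is the fifth column of lshosts output, that your SLURM version supports the sinfo format fields %D (node count) and %c (processors per node) together with -h (no header) and -e (no grouping of differently configured nodes), and that both toolsets are on your PATH. The function names lsf_ncpus and sum_cpus are hypothetical.

```shell
# Hypothetical helper: sums the ncpus column (assumed to be column 5) of
# lshosts-style output read from stdin, skipping the header line.
lsf_ncpus() {
    awk 'NR > 1 { total += $5 } END { print total + 0 }'
}

# Hypothetical helper: reads "node-count cpus-per-node" pairs from stdin,
# one pair per line, and prints the total number of processors.
sum_cpus() {
    awk '{ total += $1 * $2 } END { print total + 0 }'
}

# Compare the two totals only when both LSF and SLURM tools are present.
if command -v lshosts >/dev/null 2>&1 && command -v sinfo >/dev/null 2>&1; then
    lsf_total=$(lshosts | lsf_ncpus)
    slurm_total=$(sinfo -h -e -p lsf -o '%D %c' | sum_cpus)
    if [ "$lsf_total" -eq "$slurm_total" ]; then
        echo "OK: $lsf_total processors reported by both LSF and SLURM"
    else
        echo "Mismatch: LSF reports $lsf_total, SLURM lsf partition has $slurm_total"
    fi
fi
```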
• The LSF bhosts command provides a summary of the jobs on the system and information
about the current state of LSF.
$ bhosts
Refer to Section 7.3.1 for more information about using this command and a sample of
its output.