HP XC System Software User's Guide
Version 3.0
Part number: 5991-4847
Published: January 2006
Examine the local host information:
$ hostname
n2
Examine the job information:
$ bjobs
No unfinished job found
Run the LSF bsub -Is command to launch the interactive shell
SCHEDULING PARAMETERS:
           r15s  r1m  r15m  ut  pg  io  ls  it  tmp  swp  mem
 loadSched   -    -    -    -   -   -   -   -   -    -    -
 loadStop    -    -    -    -
Examine the partition information:
$ sinfo
PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
lsf       up    infinite      6  idle n[5-10]
Examine the local host information:
Examine the finished job's information:
$ bhist -l 124
Job <124>, User <lsfadmin>, Project <default>, Interactive pseudo-terminal
n16
n16
Linux n14 2.4.21-15.3hp.XCsmp #2 SMP date and time stamp ia64 ia64 ia64 GNU/Linux
Linux n14 2.4.21-15.3hp.XCs
n15
n15
n16
n16
$ srun -n3 hostname
n13
n14
n15
Exit the pseudo-terminal:
$ exit
exit
View the interactive jobs:
$ bjobs -l 1008
Job <1008>, User smith, Proj
Copyright 1992-2004 Platform Computing Corporation
My cluster name is penguin
My master name is lsfhost.localdomain
$ sinfo
PARTITION AVAIL TIMELIMIT N
6 Processors Requested;
date and time stamp: Dispatched to 6 Hosts/Processors <6*lsfhost.localdomain>;
Glossary

A

administration branch    The half (branch) of the administration network that contains all of the general-purpose administration ports to the nodes
List of Examples
4-1 Directory Structure
FCFS    First-come, first-served. An LSF job-scheduling policy that specifies that jobs are dispatched according to their order in a queue, which is deter
L

Linux Virtual Server    See LVS.

load file    A file containing the names of multiple executables that are to be launched simultaneously by a single command.
Network Information Services    See NIS.

NIS    Network Information Services. A mechanism that enables centralization of common data that is pertinent across mul
SMP    Symmetric multiprocessing. A system with two or more CPUs that share equal (symmetric) access to all of the facilities of a computer system, such a
Index

A
ACML library, 42
application development, 37
building parallel applications, 42
building serial applications, 39
communication between nodes, 97
comp
configuring local disk, 96
core availability, 38

D
DDT, 53
debugger
    TotalView, 53
debugging
    DDT, 53
    gdb, 53
    idb, 53
    pgdbg, 53
    TotalView, 53
debugging options
    setti
submitting jobs, 77
summary of bsub command, 77
using srun with, 64
viewing historical information of jobs, 82
LSF-SLURM external scheduler, 45
lshosts com
examples of, 99
programming model, 39
shared file view, 97
signal
    sending to a job, 65
Simple Linux Utility for Resource Management (see SLURM)
sinfo comman
About This Document
This document provides information about using the features and functions of the HP XC System Software. It describes how the HP XC u
• Chapter 10: Advanced Topics (page 91) provides information on remote execution, running an X terminal session from a remote node, and I/O performance
Documentation for the HP Integrity and HP ProLiant servers is available at the following URL:
http://www.docs.hp.com/

For More Information
The HP Web sit
• http://supermon.sourceforge.net/
  Home page for Supermon, a high-speed cluster monitoring system that emphasizes low perturbation, high sampling rates,
Related Linux Web Sites
• http://www.redhat.com
  Home page for Red Hat®, distributors of Red Hat Enterprise Linux Advanced Server, a Linux distribution wi
• Perl Cookbook, Second Edition, by Tom Christiansen and Nathan Torkington
• Perl in a Nutshell: A Desktop Quick Reference, by Ellen Siever, et al.

Typogr
1 Overview of the User Environment
The HP XC system is a collection of computer nodes, networks, storage, and software, built into a cluster, that work
© Copyright 2003, 2005, 2006 Hewlett-Packard Development Company, L.P.
Confidential computer software. Valid license from HP required for possession, u
Table 1-1 Determining the Node Platform

Partial Output of /proc/cpuinfo        Platform
processor  : 0
vendor_id  : GenuineIntel
cpu family : 15
SAN Storage
The HP XC system uses the HP StorageWorks Scalable File Share (HP StorageWorks SFS), which is based on Lustre technology and uses the Lustre
Be aware of the following information about the HP XC file system layout:
• Open source software that by default would be installed under the /usr/loca
free -m

Disk Partitions    Use the following command to display the disk partitions and their sizes:
cat /proc/partitions

Swap    Use the following command to d
Documentation CD contains XC LSF manuals from Platform Computing. LSF manpages are available on the HP XC system.

SLURM commands    HP XC uses the Simple L
by default for LSF-HPC batch jobs. The system administrator has the option of creating additional partitions. For example, another partition could be c
SLURM    Allocates nodes for jobs as determined by LSF-HPC. It controls task/rank distribution within the allocated nodes. SLURM also starts the executabl
2 Using the System
This chapter describes the tasks and commands that the general user must know to use the system. It addresses the following topics:
•
Introduction
As described in Run-Time Environment (page 24), SLURM and LSF-HPC cooperate to run and manage jobs on the HP XC system, combining LSF-HPC
$ lsload
For more information about using this command and a sample of its output, see Getting Host Load Information (page 76).

Getting Information About
Table of Contents
About This Document
    Intended Audience
Getting System Help and Information
In addition to the hardcopy documentation described in the preface of this document (About This Document), the HP XC
3 Configuring Your Environment with Modulefiles
The HP XC system supports the use of Modules software to make it easier to configure and modify your
could cause inconsistencies in the use of shared objects. If you have multiple compilers (perhaps with incompatible shared objects) installed, it is pr
Table 3-1 Supplied Modulefiles

Modulefile    Sets the HP XC User Environment to Use:
icc/8.0       Intel C/C++ Version 8.0 compilers.
icc/8.1       Intel C/C++ Version 8.1 compil
you are attempting to load conflicts with a currently loaded modulefile, the modulefile will not be loaded and an error message will be displayed.
If yo
When a modulefile conflict occurs, unload the conflicting modulefile before loading the new modulefile. In the previous example, you should unload the
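A script can check for a conflicting modulefile before attempting a load. The following is a sketch only: the modulefile names and the LOADEDMODULES value are hypothetical, but the Modules package does maintain the colon-separated list of loaded modulefiles in the LOADEDMODULES environment variable.

```shell
# Sketch: detect a conflicting compiler modulefile before loading a new one.
# The value assigned to LOADEDMODULES here is invented for illustration; on a
# real system the Modules package sets it for you.
LOADEDMODULES="icc/8.0:mpi/hp"
wanted="icc/8.1"
case ":$LOADEDMODULES:" in
  *:icc/8.0:*) echo "conflict: unload icc/8.0 before loading $wanted" ;;
  *)           echo "no conflict: safe to load $wanted" ;;
esac
```

On a live system, the same check would be followed by module unload and module load commands.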
4 Developing Applications
This chapter discusses topics associated with developing applications in the HP XC environment. Before reading this chapter, y
Table 4-1, “Compiler Commands” displays the compiler commands for Standard Linux, Intel, and PGI compilers for the C, C++, and Fortran languages.
Table
The Ctrl/C key sequence will report the state of all tasks associated with the srun command. If the Ctrl/C key sequence is entered twice within one sec
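The two-stage interrupt behavior can be imitated in a small shell script. This is a sketch only, not srun's actual implementation: the messages are invented, and the one-second window between interrupts is not modeled.

```shell
#!/bin/sh
# Sketch: first SIGINT reports status, a second SIGINT terminates, loosely
# mimicking srun's Ctrl/C handling. Messages are invented for illustration.
count=0
on_int() {
    count=$((count + 1))
    if [ "$count" -ge 2 ]; then
        echo "second interrupt: terminating tasks"
        exit 0
    fi
    echo "first interrupt: reporting task states"
}
trap on_int INT
kill -INT $$    # stands in for the first Ctrl/C
kill -INT $$    # stands in for the second Ctrl/C
```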
3 Configuring Your Environment with ModulefilesOverview of Modules...
Developing Parallel Applications
This section describes how to build and run parallel applications. The following topics are discussed:
• Parallel Appli
Pthreads
POSIX Threads (Pthreads) is a standard library that programmers can use to develop portable threaded applications. Pthreads can be used in conj
http://www.pathscale.com/ekopath.html

GNU Parallel Make
The GNU parallel Make command is used whenever the make command is invoked. GNU parallel Make pr
Examples of Compiling and Linking HP-MPI Applications
The following examples show how to compile and link your application code by invoking a compiler
recommends an alternative method. The dynamic linker, during its attempt to load libraries, will suffix candidate directories with the machine type. Th
5 Submitting Jobs
This chapter describes how to submit jobs on the HP XC system; it addresses the following topics:
• Overview of Job Submission (page 4
Submitting a Serial Job Using Standard LSF

Example 5-1 Submitting a Serial Job Using Standard LSF
Use the bsub command to submit a serial job to standar
Example 5-3 Submitting an Interactive Serial Job Using LSF-HPC Only
$ bsub -I hostname
Job <73> is submitted to default queue <normal>.
<<
The output for this command could also have been 1 core on each of 4 compute nodes in the SLURM allocation.

Submitting a Non-MPI Parallel Job
Use the fol
to the number provided by the -n option of the bsub command. Any additional SLURM srun options are job-specific, not allocation-specific.
The mpi-jobnam
Running Preexecution Programs

6 Debugging Applications
In Example 5-9, a simple script named myscript.sh, which contains two srun commands, is displayed then submitted.

Example 5-9 Submitting a Job Script
$ c
Example 5-12 Submitting a Batch Job Script That Uses the srun --overcommit Option
$ bsub -n4 -I ./myscript.sh
Job <81> is submitted to default que
program should pick up the SLURM_JOBID environment variable. The SLURM_JOBID has the information LSF-HPC needs to run the job on the nodes required by
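A batch script can read the SLURM_JOBID that is set for the allocation, as in the following sketch. The fallback value 250 is purely illustrative (it matches the example SLURM job ID used later in this document); on a real system the variable is set by the allocation itself.

```shell
# Sketch: read the SLURM_JOBID environment variable as a job script would.
# The fallback value 250 is hypothetical, for illustration only.
SLURM_JOBID=${SLURM_JOBID:-250}
echo "attaching to SLURM allocation $SLURM_JOBID"
```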
6 Debugging Applications
This chapter describes how to debug serial and parallel applications in the HP XC development environment. In general, effectiv
This section provides only minimum instructions to get you started using TotalView. Instructions for installing TotalView are included in the HP XC Syst
Using TotalView with LSF-HPC
HP recommends the use of xterm when debugging an application with LSF-HPC. You also need to allocate the nodes you will nee
4. The TotalView process window opens.
This window contains multiple panes that provide various debugging functions and debugging information. The name of
Exiting TotalView
It is important that you make sure your job has completed before exiting TotalView. This may require that you wait a few seconds from
7 Tuning Applications
This chapter discusses how to tune applications in the HP XC environment.

Using the Intel Trace Collector and Intel Trace Analyzer
Getting Information About the lsf Partition
Submitting Jobs
Example 7-1 The vtjacobic Example Program
For the purposes of this example, the examples directory under /opt/IntelTrace/ITC is copied to the user'
<install-path-name>/ITA/doc/Intel_Trace_Analyzer_Users_Guide.pdf
8 Using SLURM
HP XC uses the Simple Linux Utility for Resource Management (SLURM) for system resource management and job scheduling.
This chapter address
The srun command has a significant number of options to control the execution of your application closely. However, you can use it for a simple launch
The squeue command can report on jobs in the job queue according to their state; possible states are: pending, running, completing, completed, failed.
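State-based filtering can also be done by post-processing squeue output, as the following sketch shows. The sample output below is invented for illustration; on a live system you would pipe squeue itself, or use its own state filter (squeue --states=PENDING).

```shell
# Sketch: select pending jobs (state PD) from squeue-style output with awk.
# The sample lines below are invented; a real system would supply them via
# the squeue command.
sample='JOBID PARTITION NAME USER ST TIME NODES
101 lsf job1 smith R 1:02 2
102 lsf job2 smith PD 0:00 4'
echo "$sample" | awk 'NR > 1 && $5 == "PD" { print $1 }'
```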
# chmod a+r /hptc_cluster/slurm/job/jobacct.log
You can find detailed information on the sacct command and job accounting data in the sacct(1) manpage.
9 Using LSF
The Load Sharing Facility (LSF) from Platform Computing Corporation is a batch system resource manager used on the HP XC system. LSF is an i
job management and information capabilities. LSF-HPC schedules, launches, controls, and tracks jobs that are submitted to it according to the policies
Differences Between LSF-HPC and Standard LSF
LSF-HPC for the HP XC environment supports all the standard features and functions that standard LSF suppo
List of Figures
9-1 How LSF-HPC and SLURM Launch and Manage a Job
• All HP XC nodes are dynamically configured as “LSF Floating Client Hosts” so that you can execute LSF commands from any HP XC node. When you do execu
Serial jobs are allocated a single CPU on a shared node with minimal capacities that satisfies other allocation criteria. LSF-HPC always tries to run mu
• exclude=list-of-nodes
• contiguous=yes
The srun(1) manpage provides details on these options and their arguments.
The following are interactive exampl
• Use the bjobs command to monitor job status in LSF-HPC.
• Use the bqueues command to list the configured job queues in LSF-HPC.

How LSF-HPC and SLURM
This bsub command launches a request for four cores (from the -n4 option of the bsub command) across four nodes (from the -ext "SLURM[nodes=4]"
Preemption
LSF-HPC uses the SLURM "node share" feature to facilitate preemption. When a low-priority job is preempted, job processes are suspe
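The general suspend/resume mechanism can be illustrated with standard signals, as in the following sketch. This is an illustration of SIGSTOP/SIGCONT semantics only, not LSF-HPC's actual preemption code.

```shell
# Sketch: suspend and resume a process with SIGSTOP and SIGCONT, the generic
# mechanism for pausing a running process. Illustration only.
sleep 30 &
pid=$!
kill -STOP "$pid"
sleep 1                                   # allow the stop to take effect
ps -o stat= -p "$pid" | grep -q 'T' && echo "process suspended"
kill -CONT "$pid"                         # resume the suspended process
kill "$pid" 2>/dev/null                   # clean up the helper process
echo "process resumed and reaped"
```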
The following example shows the output from the lshosts command:
$ lshosts
HOST_NAME              type   model  cpuf ncpus maxmem maxswp server RESOURCES
lsfhost
$ sinfo -p lsf
PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
lsf       up    infinite    128  idle n[1-128]
Use the following command to obtain more i
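Fields from this kind of output are easy to extract with awk, as the following sketch shows. The sample text reproduces the example above; on a live system you would pipe sinfo -p lsf directly.

```shell
# Sketch: pull the idle node count out of sinfo-style output with awk.
# The sample text mirrors the example above and stands in for real output.
sample='PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
lsf up infinite 128 idle n[1-128]'
echo "$sample" | awk '$5 == "idle" { print $4 " idle nodes in " $1 }'
```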
LSF-HPC node allocation (compute nodes). LSF-HPC node allocation is created by the -n num-procs parameter, which specifies the number of cores the job requ
Refer to the LSF bsub command manpage for additional information about using the external scheduler (-ext) option. See the srun manpage for more detail
Getting Information About Jobs
There are several ways you can get information about a specific job after it has been submitted to LSF-HPC. This section
Job Allocation Information for a Finished Job
The following is an example of the output obtained using the bhist -l command to obtain job allocation inf
Example 9-5 Using the bjobs Command (Long Output)
$ bjobs -l 24
Job <24>, User <msmith>, Project <default>, Status <RUN>,
Example 9-7 Using the bhist Command (Long Output)
$ bhist -l 24
Job <24>, User <lsfadmin>, Project <default>, Interactive pseudo-termi
$ sacct -j 123
Jobstep    Jobname    Partition  Ncpus   Status   Error
---------- ---------- ---------- ------- -------- ------
123
Be sure to unset the SLURM_JOBID when you are finished with the allocation, to prevent a previous SLURM_JOBID from interfering with future jobs:
$ unset
confirm an expected high load on the nodes. The following is an example of this; the LSF JOBID is 200 and the SLURM JOBID is 250:
$ srun --jobid=250 upt
Table 9-2 LSF-HPC Equivalents of SLURM srun Options

srun Option           Description                                               LSF-HPC Equivalent
-n, --ntasks=ntasks   Number of processes (tasks) to run.                       bsub -n num
                      Root attempts to submit or run a job as normal            You cannot use this option. LSF-HPC uses it to create the allocation.
                      How long to wait after the first task terminates before   Use as an argument to srun when launching parallel tasks.
List of Tables
1-1 Determining the Node Platform
10 Advanced Topics
This chapter covers topics intended for the advanced user. This chapter addresses the following topics:
• Enabling Remote Execution w
$ hostname
mymachine
Then, use the host name of your local machine to retrieve its IP address:
$ host mymachine
mymachine has address 14.26.206.134
Step 2.
Determine the address of your monitor's display server, as shown at the beginning of "Running an X Terminal Session from a Remote Node"
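Once the local address is known, the DISPLAY variable can be composed from it for the remote X session, as in this sketch. The address is the example value from the text above; display 0, screen 0 is assumed.

```shell
# Sketch: compose a DISPLAY value from the retrieved address. The address is
# the example value from the text; :0.0 (display 0, screen 0) is an assumption.
addr=14.26.206.134
DISPLAY="$addr:0.0"
export DISPLAY
echo "$DISPLAY"
```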
Further, if the recursive make is run remotely, it can be told to use concurrency on the remote node. For example:
$ cd subdir; srun -n1 -N1 $(MAKE) -j4
	@ \
	for i in ${HYPRE_DIRS}; \
	do \
	  if [ -d $$i ]; \
	  then \
	    echo "Cleaning $$i ..."; \
struct_matrix_vector/libHYPRE_mv.a:
	$(PREFIX) $(MAKE) -C struct_matrix_vector

struct_linear_solvers/libHYPRE_ls.a:
	$(PREFIX) $(MAKE) -
Shared File View
Although a file opened by multiple processes of an application is shared, each core maintains a private file pointer and file position.
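The private-pointer behavior can be demonstrated with two independent readers of the same file, as in this sketch. Each subshell opens the file itself, so each gets its own file position; the scratch file and its contents are invented for illustration.

```shell
# Sketch: two subshells open the same file independently; each has a private
# file position, so both read from the start. Scratch data is illustrative.
tmp=$(mktemp)
printf 'first\nsecond\n' > "$tmp"
( IFS= read -r line < "$tmp"; echo "reader A saw: $line" )
( IFS= read -r line < "$tmp"; echo "reader B saw: $line" )
rm -f "$tmp"
```

Both readers report the first line, showing that one reader's progress does not advance the other's pointer.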
Appendix A Examples
This appendix provides examples that illustrate how to build and run applications on the HP XC system. The examples in this section