HP XC System 3.x Software User's Guide

Browse online or download the user's guide for the HP XC System 3.x Software (HP XC System 3.x Software User Manual).


Summary of Contents

Page 1 - User's Guide

HP XC System Software User's Guide
Version 3.0
Part number: 5991-4847
Published January 2006

Page 3 - Table of Contents

Examine the local host information:
$ hostname
n2
Examine the job information:
$ bjobs
No unfinished job found
Run the LSF bsub -Is command to launch the in

Page 4 - 5 Submitting Jobs

SCHEDULING PARAMETERS:
           r15s  r1m  r15m  ut  pg  io  ls  it  tmp  swp  mem
loadSched     -    -     -   -   -   -   -   -    -    -    -
loadStop      -    -     -   -

Page 5 - 9 Using LSF

Examine the partition information:
$ sinfo
PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
lsf       up    infinite      6 idle  n[5-10]
Examine the loca

Page 6 - 10 Advanced Topics

Examine the finished job's information:
$ bhist -l 124
Job <124>, User <lsfadmin>, Project <default>, Interactive pseudo-term

Page 7 - List of Figures

n16
n16
Linux n14 2.4.21-15.3hp.XCsmp #2 SMP date and time stamp ia64 ia64 ia64 GNU/Linux
Linux n14 2.4.21-15.3hp.XCs

Page 8

n15
n15
n16
n16
$ srun -n3 hostname
n13
n14
n15
Exit the pseudo-terminal:
$ exit
exit
View the interactive jobs:
$ bjobs -l 1008
Job <1008>, User smith, Proj

Page 9 - List of Tables

Copyright 1992-2004 Platform Computing Corporation
My cluster name is penguin
My master name is lsfhost.localdomain
$ sinfo
PARTITION AVAIL TIMELIMIT N

Page 10

6 Processors Requested;
date and time stamp: Dispatched to 6 Hosts/Processors <6*lsfhost.localdomain>;
d

Page 12

Glossary
A
administration branch
The half (branch) of the administration network that contains all of the general-purpose administration ports to the nodes

Page 13 - About This Document

List of Examples
4-1 Directory Structure...

Page 14 - HP XC Information

FCFS
First-come, first-served. An LSF job-scheduling policy that specifies that jobs are dispatched according to their order in a queue, which is deter

Page 15 - Supplementary Information

L
Linux Virtual Server
See LVS.
load file
A file containing the names of multiple executables that are to be launched simultaneously by a single command.
Lo

Page 16 - Related Information

Network Information Services
See NIS.
NIS
Network Information Services. A mechanism that enables centralization of common data that is pertinent across mul

Page 17 - Additional Publications

SMP
Symmetric multiprocessing. A system with two or more CPUs that share equal (symmetric) access to all of the facilities of a computer system, such a

Page 19 - System Architecture

Index
A
ACML library, 42
application development, 37
building parallel applications, 42
building serial applications, 39
communication between nodes, 97
comp

Page 20 - Storage and I/O

configuring local disk, 96
core availability, 38
D
DDT, 53
debugger
TotalView, 53
debugging
DDT, 53
gdb, 53
idb, 53
pgdbg, 53
TotalView, 53
debugging options
setti

Page 21 - File System

submitting jobs, 77
summary of bsub command, 77
using srun with, 64
viewing historical information of jobs, 82
LSF-SLURM external scheduler, 45
lshosts com

Page 22 - System Interconnect Network

examples of, 99
programming model, 39
shared file view, 97
signal
sending to a job, 65
Simple Linux Utility for Resource Management (see SLURM)
sinfo comman

Page 24 - Run-Time Environment

About This Document
This document provides information about using the features and functions of the HP XC System Software. It describes how the HP XC u

Page 25 - Standard LSF

• Chapter 10: Advanced Topics (page 91) provides information on remote execution, running an X terminal session from a remote node, and I/O performance

Page 26

Documentation for the HP Integrity and HP ProLiant servers is available at the following URL:
http://www.docs.hp.com/
For More Information
The HP Web sit

Page 27 - 2 Using the System

• http://supermon.sourceforge.net/
Home page for Supermon, a high-speed cluster monitoring system that emphasizes low perturbation, high sampling rates,

Page 28 - Introduction

Related Linux Web Sites
• http://www.redhat.com
Home page for Red Hat®, distributors of Red Hat Enterprise Linux Advanced Server, a Linux distribution wi

Page 29

• Perl Cookbook, Second Edition, by Tom Christiansen and Nathan Torkington
• Perl in A Nutshell: A Desktop Quick Reference, by Ellen Siever, et al.
Typogr

Page 30 - 30 Using the System

1 Overview of the User Environment
The HP XC system is a collection of computer nodes, networks, storage, and software, built into a cluster, that work

Page 31 - Overview of Modules

© Copyright 2003, 2005, 2006 Hewlett-Packard Development Company, L.P.
Confidential computer software. Valid license from HP required for possession, u

Page 32 - Supplied Modulefiles

Table 1-1 Determining the Node Platform
Columns: Partial Output of /proc/cpuinfo | Platform
processor : 0
vendor_id : GenuineIntel
cpu family : 15
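Table 1-1 distinguishes platforms by fields such as vendor_id and cpu family in /proc/cpuinfo. As a minimal sketch (assuming any Linux host; the exact fields present vary by architecture), the same information can be pulled from the shell:

```shell
# Identify the node platform the way Table 1-1 does: inspect
# /proc/cpuinfo. Output varies from machine to machine.
uname -m                              # machine architecture (ia64, x86_64, ...)
grep -c '^processor' /proc/cpuinfo    # number of processor entries
```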

Page 33 - Loading a Modulefile

SAN Storage
The HP XC system uses the HP StorageWorks Scalable File Share (HP StorageWorks SFS), which is based on Lustre technology and uses the Lustre

Page 34 - Modulefile Conflicts

Be aware of the following information about the HP XC file system layout:
• Open source software that by default would be installed under the /usr/loca

Page 35 - Creating a Modulefile

free -m
Disk Partitions: Use the following command to display the disk partitions and their sizes:
cat /proc/partitions
Swap: Use the following command to d
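The commands listed here are thin front ends over the /proc file system. A quick sketch of the underlying files (assuming a Linux host; `free -m` summarizes /proc/meminfo, and the partition table lives in /proc/partitions):

```shell
# Show the first lines of the /proc files behind free and the
# partition listing. Values are machine-specific.
head -n 2 /proc/meminfo
head -n 2 /proc/partitions
```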

Page 36

Documentation CD contains XC LSF manuals from Platform Computing. LSF manpages are available on the HP XC system.
SLURM commands: HP XC uses the Simple L

Page 37 - 4 Developing Applications

by default for LSF-HPC batch jobs. The system administrator has the option of creating additional partitions. For example, another partition could be c

Page 38 - Interrupting a Job

SLURM: Allocates nodes for jobs as determined by LSF-HPC. It controls task/rank distribution within the allocated nodes. SLURM also starts the executabl

Page 39 - Setting Debugging Options

2 Using the System
This chapter describes the tasks and commands that the general user must know to use the system. It addresses the following topics:
•

Page 40 - Modulefiles

Introduction
As described in Run-Time Environment (page 24), SLURM and LSF-HPC cooperate to run and manage jobs on the HP XC system, combining LSF-HPC

Page 41 - Standard

$ lsload
For more information about using this command and a sample of its output, see Getting Host Load Information (page 76).
Getting Information About

Page 42 - 42 Developing Applications

Table of Contents
About This Document...13
Intended Audience...

Page 43 - Developing Libraries

Getting System Help and Information
In addition to the hardcopy documentation described in the preface of this document (About This Document), the HP XC

Page 44 - 44 Developing Applications

3 Configuring Your Environment with Modulefiles
The HP XC system supports the use of Modules software to make it easier to configure and modify the you

Page 45

could cause inconsistencies in the use of shared objects. If you have multiple compilers (perhaps with incompatible shared objects) installed, it is pr

Page 46 - 46 Submitting Jobs

Table 3-1 Supplied Modulefiles
Modulefile | Sets the HP XC User Environment to Use:
icc/8.0 | Intel C/C++ Version 8.0 compilers.
icc/8.1 | Intel C/C++ Version 8.1 compil

Page 47

you are attempting to load conflicts with a currently loaded modulefile, the modulefile will not be loaded and an error message will be displayed.
If yo

Page 48 - 48 Submitting Jobs

When a modulefile conflict occurs, unload the conflicting modulefile before loading the new modulefile. In the previous example, you should unload the

Page 50 - 50 Submitting Jobs

4 Developing Applications
This chapter discusses topics associated with developing applications in the HP XC environment. Before reading this chapter, y

Page 51 - Running Preexecution Programs

Table 4-1, “Compiler Commands” displays the compiler commands for Standard Linux, Intel, and PGI compilers for the C, C++, and Fortran languages.
Table

Page 52 - 52 Submitting Jobs

The Ctrl/C key sequence will report the state of all tasks associated with the srun command. If the Ctrl/C key sequence is entered twice within one sec
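The two-interrupt pattern srun uses (first Ctrl/C reports, second terminates) can be sketched in plain bash with a signal trap. This is an illustrative analogue only, not srun's implementation; the messages and the self-delivered signals stand in for the user pressing Ctrl/C:

```shell
#!/bin/bash
# Analogue of srun's Ctrl/C handling: the first SIGINT reports task
# state, a second SIGINT terminates the tasks.
interrupts=0
stop=0
on_int() {
  interrupts=$((interrupts + 1))
  if [ "$interrupts" -ge 2 ]; then
    echo "second interrupt: terminating tasks"
    stop=1
  else
    echo "first interrupt: reporting task state"
  fi
}
trap on_int INT
kill -INT $$    # simulate the first Ctrl/C
kill -INT $$    # simulate the second Ctrl/C
echo "stop=$stop"
```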

Page 53

3 Configuring Your Environment with Modulefiles
Overview of Modules...

Page 54 - Using TotalView with SLURM

Developing Parallel Applications
This section describes how to build and run parallel applications. The following topics are discussed:
• Parallel Appli

Page 55 - Debugging an Application

Pthreads
POSIX Threads (Pthreads) is a standard library that programmers can use to develop portable threaded applications. Pthreads can be used in conj

Page 56 - TotalView process window

http://www.pathscale.com/ekopath.html
GNU Parallel Make
The GNU parallel Make command is used whenever the make command is invoked. GNU parallel Make pr

Page 57 - Exiting TotalView

Examples of Compiling and Linking HP-MPI Applications
The following examples show how to compile and link your application code by invoking a compiler

Page 58

recommends an alternative method. The dynamic linker, during its attempt to load libraries, will suffix candidate directories with the machine type. Th

Page 59

5 Submitting Jobs
This chapter describes how to submit jobs on the HP XC system; it addresses the following topics:
• Overview of Job Submission (page 4

Page 60 - 60 Tuning Applications

Submitting a Serial Job Using Standard LSF
Example 5-1 Submitting a Serial Job Using Standard LSF
Use the bsub command to submit a serial job to standar

Page 61

Example 5-3 Submitting an Interactive Serial Job Using LSF-HPC only
$ bsub -I hostname
Job <73> is submitted to default queue <normal>.

Page 62

The output for this command could also have been 1 core on each of 4 compute nodes in the SLURM allocation.
Submitting a Non-MPI Parallel Job
Use the fol

Page 63

to the number provided by the -n option of the bsub command. Any additional SLURM srun options are job specific, not allocation-specific.
The mpi-jobnam

Page 64 - The srun Roles and Modes

Running Preexecution Programs...51
6 Debugg

Page 65 - Job Accounting

In Example 5-9, a simple script named myscript.sh, which contains two srun commands, is displayed then submitted.
Example 5-9 Submitting a Job Script
$ c
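A script in the style of Example 5-9's myscript.sh can be sketched as follows. The srun lines are cluster commands and their arguments here are illustrative (not the manual's exact script), so this sketch only writes and displays the file rather than submitting it:

```shell
# Write a minimal job script containing two srun commands, then
# show it. The srun arguments are placeholders for illustration.
cat > myscript.sh <<'EOF'
#!/bin/sh
srun hostname
srun -n2 uname -r
EOF
chmod +x myscript.sh
cat myscript.sh
```

On an HP XC system the script would then be submitted through LSF-HPC, for example with `bsub -n4 -I ./myscript.sh`.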

Page 66 - Security

Example 5-12 Submitting a Batch Job Script That Uses the srun --overcommit Option
$ bsub -n4 -I ./myscript.sh
Job <81> is submitted to default que

Page 67

program should pick up the SLURM_JOBID environment variable. The SLURM_JOBID has the information LSF-HPC needs to run the job on the nodes required by

Page 68 - Overview of LSF-HPC

6 Debugging Applications
This chapter describes how to debug serial and parallel applications in the HP XC development environment. In general, effectiv

Page 69 - Using LSF-HPC 69

This section provides only minimum instructions to get you started using TotalView. Instructions for installing TotalView are included in the HP XC Syst

Page 70 - Job Terminology

Using TotalView with LSF-HPC
HP recommends the use of xterm when debugging an application with LSF-HPC. You also need to allocate the nodes you will nee

Page 71 - Using LSF-HPC 71

4. The TotalView process window opens.
This window contains multiple panes that provide various debugging functions and debugging information. The name of

Page 72 - Notes on LSF-HPC

Exiting TotalView
It is important that you make sure your job has completed before exiting TotalView. This may require that you wait a few seconds from

Page 74 - Job Startup and Job Control

7 Tuning Applications
This chapter discusses how to tune applications in the HP XC environment.
Using the Intel Trace Collector and Intel Trace Analyzer

Page 75 - Preemption

Getting Information About the lsf Partition...76
Submitting Jobs...

Page 76 - Getting Host Load Information

Example 7-1 The vtjacobic Example Program
For the purposes of this example, the examples directory under /opt/IntelTrace/ITC is copied to the user'

Page 77 - Submitting Jobs

<install-path-name>/ITA/doc/Intel_Trace_Analyzer_Users_Guide.pdf
Using the Intel Trace Collector and Intel Trace Analyzer 61

Page 79 - Using LSF-HPC 79

8 Using SLURM
HP XC uses the Simple Linux Utility for Resource Management (SLURM) for system resource management and job scheduling.
This chapter address

Page 80 - 80 Using LSF

The srun command has a significant number of options to control the execution of your application closely. However, you can use it for a simple launch

Page 81 - Examining the Status of a Job

The squeue command can report on jobs in the job queue according to their state; possible states are: pending, running, completing, completed, failed,

Page 82 - 82 Using LSF

# chmod a+r /hptc_cluster/slurm/job/jobacct.log
You can find detailed information on the sacct command and job accounting data in the sacct(1) manpage.

Page 83 - Using LSF-HPC 83

9 Using LSF
The Load Sharing Facility (LSF) from Platform Computing Corporation is a batch system resource manager used on the HP XC system. LSF is an i

Page 84 - 84 Using LSF

job management and information capabilities. LSF-HPC schedules, launches, controls, and tracks jobs that are submitted to it according to the policies

Page 85 - Using LSF-HPC 85

Differences Between LSF-HPC and Standard LSF
LSF-HPC for the HP XC environment supports all the standard features and functions that standard LSF suppo

Page 86 - 86 Using LSF

List of Figures
9-1 How LSF-HPC and SLURM Launch and Manage a Job...73
7

Page 87 - Using LSF-HPC 87

• All HP XC nodes are dynamically configured as “LSF Floating Client Hosts” so that you can execute LSF commands from any HP XC node. When you do execu

Page 88 - 88 Using LSF

Serial jobs are allocated a single CPU on a shared node with minimal capacities that satisfies other allocation criteria. LSF-HPC always tries to run mu

Page 89 - Using LSF-HPC 89

• exclude=list-of-nodes
• contiguous=yes
The srun(1) manpage provides details on these options and their arguments.
The following are interactive exampl

Page 90

• Use the bjobs command to monitor job status in LSF-HPC.
• Use the bqueues command to list the configured job queues in LSF-HPC.
How LSF-HPC and SLURM

Page 91

This bsub command launches a request for four cores (from the -n4 option of the bsub command) across four nodes (from the -ext "SLURM[nodes=4]"

Page 92 - 92 Advanced Topics

Preemption
LSF-HPC uses the SLURM "node share" feature to facilitate preemption. When a low-priority job is preempted, job processes are suspe

Page 93

The following example shows the output from the lshosts command:
$ lshosts
HOST_NAME  type  model  cpuf  ncpus  maxmem  maxswp  server  RESOURCES
lsfhost

Page 94 - 94 Advanced Topics

$ sinfo -p lsf
PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
lsf       up    infinite    128 idle  n[1-128]
Use the following command to obtain more i

Page 95 - Example Procedure 2

LSF-HPC node allocation (compute nodes). LSF-HPC node allocation is created by the -n num-procs parameter, which specifies the number of cores the job requ

Page 96 - Local Disks on Compute Nodes

Refer to the LSF bsub command manpage for additional information about using the external scheduler (-ext) option. See the srun manpage for more detail

Page 98

Getting Information About Jobs
There are several ways you can get information about a specific job after it has been submitted to LSF-HPC. This section

Page 99 - Appendix A Examples

Job Allocation Information for a Finished Job
The following is an example of the output obtained using the bhist -l command to obtain job allocation inf

Page 100 - View the job:

Example 9-5 Using the bjobs Command (Long Output)
$ bjobs -l 24
Job <24>, User <msmith>, Project <default>, Status <RUN>,

Page 101

Example 9-7 Using the bhist Command (Long Output)
$ bhist -l 24
Job <24>, User <lsfadmin>, Project <default>, Interactive pseudo-termi

Page 102 - 102 Examples

$ sacct -j 123
Jobstep    Jobname            Partition  Ncpus   Status     Error
---------- ------------------ ---------- ------- ---------- -----
123

Page 103 - Run the job:

Be sure to unset the SLURM_JOBID when you are finished with the allocation, to prevent a previous SLURM_JOBID from interfering with future jobs:
$ unset
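The step above is plain environment-variable hygiene, sketched here with an illustrative job ID (a stale SLURM_JOBID would steer later commands at an old allocation):

```shell
# Set a stale job ID, show it, then clear it as the manual advises.
export SLURM_JOBID=250
echo "before: ${SLURM_JOBID:-unset}"   # prints "before: 250"
unset SLURM_JOBID
echo "after: ${SLURM_JOBID:-unset}"    # prints "after: unset"
```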

Page 104 - Show the job allocation:

confirm an expected high load on the nodes. The following is an example of this; the LSF JOBID is 200 and the SLURM JOBID is 250:
$ srun --jobid=250 upt

Page 105 - View the node state:

Table 9-2 LSF-HPC Equivalents of SLURM srun Options
srun Option | Description | LSF-HPC Equivalent
-n, --ntasks=nt | Number of processes (tasks) to run. | bsub -n num

Page 106 - View the finished job:

srun Option | Description | LSF-HPC Equivalent
| Root attempts to submit or run a job as normal | You cannot use this option. LSF-HPC uses it to create allocation.

Page 107

srun Option | Description | LSF-HPC Equivalent
| How long to wait after the first task terminates before | Use as an argument to srun when launching parallel tasks.

Page 108

List of Tables
1-1 Determining the Node Platform...

Page 110 - 110 Glossary

10 Advanced Topics
This chapter covers topics intended for the advanced user. This chapter addresses the following topics:
• Enabling Remote Execution w

Page 111

$ hostname
mymachine
Then, use the host name of your local machine to retrieve its IP address:
$ host mymachine
mymachine has address 14.26.206.134
Step 2.
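The same lookup can be sketched with `getent`, a common glibc fallback when the BIND `host` utility is not installed (assumption: a glibc-based Linux host; the names returned depend on the local resolver):

```shell
# Resolve the local host's name, then look up an address entry.
hostname
getent hosts localhost    # fallback demo; a real host name works the same way
```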

Page 112 - 112 Glossary

Determine the address of your monitor's display server, as shown at the beginning of "Running an X Terminal Session from a Remote Node"

Page 113

Further, if the recursive make is run remotely, it can be told to use concurrency on the remote node. For example:
$ cd subdir; srun -n1 -N1 $(MAKE) -j4
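The concurrent recursive make can be tried locally as a sketch (assuming GNU make is installed; on an XC system the make invocation would be wrapped in srun to place it on a remote node, and the targets here are throwaway placeholders):

```shell
# Build a throwaway Makefile and run it with two parallel jobs, the
# local analogue of running `make -j` inside srun on a remote node.
mkdir -p subdir
printf 'all: a b\na:\n\t@echo built-a\nb:\n\t@echo built-b\n' > subdir/Makefile
make -C subdir -j2
```

With -j2 the two targets may finish in either order, which is exactly the behavior a parallel recursive build must tolerate.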

Page 114

@ \
for i in ${HYPRE_DIRS}; \
do \
  if [ -d $$i ]; \
  then \
    echo "Cleaning $$i ..."; \

Page 115

struct_matrix_vector/libHYPRE_mv.a:
    $(PREFIX) $(MAKE) -C struct_matrix_vector
struct_linear_solvers/libHYPRE_ls.a:
    $(PREFIX) $(MAKE) -

Page 116 - 116 Index

Shared File View
Although a file opened by multiple processes of an application is shared, each core maintains a private file pointer and file position.
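The consequence of private file pointers is that cooperating writers must target distinct offsets in the shared file. A local sketch with dd standing in for application processes (dd's seek= sets each writer's position independently):

```shell
# Two independent writers share one file but write at distinct
# offsets, the pattern a shared file view requires.
rm -f shared.dat
printf 'AAAA' | dd of=shared.dat bs=1 seek=0 conv=notrunc 2>/dev/null
printf 'BBBB' | dd of=shared.dat bs=1 seek=4 conv=notrunc 2>/dev/null
cat shared.dat; echo    # prints AAAABBBB
```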

Page 118 - 118 Index

Appendix A Examples
This appendix provides examples that illustrate how to build and run applications on the HP XC system. The examples in this section
