9.23.2 Nodeset and Node ID information
To determine what nodes are visible to a node on the Quadrics interconnect, enter the following command:
# cat /proc/qsnet/ep/rail0/nodeset
[7-10]
In this example, the output indicates that nodes 7, 8, 9, and 10 are visible to the node.
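When the nodeset needs to be processed by a script, it can help to expand the bracketed ranges into individual node IDs. The following is a minimal sketch, assuming the nodeset is always written as one or more [low-high] ranges, as in the examples in this section. The expand_nodeset function is a hypothetical helper and is not part of the Quadrics software.

```shell
# expand_nodeset: hypothetical helper, not shipped with the Quadrics tools.
# Assumes the input is made up of [low-high] ranges such as "[7-10]".
expand_nodeset() (
    set -f                      # no filename expansion while word splitting
    out=""
    # Replace everything except digits and hyphens with spaces,
    # which leaves one "low-high" word per range.
    for range in $(printf '%s' "$1" | tr -c '0-9-' ' '); do
        first=${range%-*}
        last=${range#*-}
        i=$first
        while [ "$i" -le "$last" ]; do
            out="$out $i"
            i=$((i + 1))
        done
    done
    printf '%s\n' "${out# }"    # drop the leading space
)

# Usage on a node where the ep module is loaded:
#   expand_nodeset "$(cat /proc/qsnet/ep/rail0/nodeset)"
# With the example output [7-10], this prints: 7 8 9 10
```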
Note that the /proc/qsnet area is created when the qsnet module is loaded. Similarly, the
/proc/qsnet/elan area is created when the elan module is loaded, the /proc/qsnet/ep area is
created when the ep module is loaded, and so on.
To find the ID (on the Quadrics interconnect) of a node, and to determine how many ports are on the switch,
enter the following command:
# cat /proc/qsnet/ep/rail0/state
running NodeId=10 NumNodes=16
In this example, the output indicates that the ID of this client node on the Quadrics interconnect is 10, and
that there are sixteen ports on the switch.
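To use these values in a script, the state line can be split into its fields. The following is a minimal sketch, assuming the line always has the form shown above (a status word followed by NodeId= and NumNodes= fields). The parse_state function and its output format are hypothetical and are shown for illustration only.

```shell
# parse_state: hypothetical helper, not shipped with the Quadrics tools.
# Assumes a line of the form "running NodeId=10 NumNodes=16".
parse_state() (
    set -f                      # no filename expansion while word splitting
    line=$1
    status=${line%% *}          # first word, for example "running"
    node_id=""
    num_nodes=""
    for field in $line; do
        case $field in
            NodeId=*)   node_id=${field#NodeId=} ;;
            NumNodes=*) num_nodes=${field#NumNodes=} ;;
        esac
    done
    printf 'status=%s node_id=%s num_nodes=%s\n' \
        "$status" "$node_id" "$num_nodes"
)

# Usage on a node where the ep module is loaded:
#   parse_state "$(cat /proc/qsnet/ep/rail0/state)"
# With the example output above, this prints:
#   status=running node_id=10 num_nodes=16
```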
To determine whether the nodeset has stabilized after booting, enter the following command:
# cat /proc/qsnet/ep/rail0/cluster
[8-9][11-11] Online Connected=1
[0-7][12-15] Online Connected=1
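One way to watch for stabilization from a script is to read the cluster entry repeatedly and stop when two consecutive reads are identical. The following is a minimal sketch of that approach. Treating unchanged output as "stabilized" is an assumption made for this example, not a definition taken from the Quadrics software, and the wait_until_unchanged function, its polling interval, and its retry count are hypothetical.

```shell
# wait_until_unchanged: hypothetical helper, not shipped with the Quadrics tools.
# Arguments: FILE [INTERVAL_SECONDS] [MAX_TRIES]
# Exit status: 0 if two consecutive reads matched, 1 if the tries ran out,
#              2 if the file could not be read.
wait_until_unchanged() (
    file=$1
    interval=${2:-5}
    tries=${3:-12}
    prev=$(cat "$file") || exit 2
    while [ "$tries" -gt 0 ]; do
        sleep "$interval"
        cur=$(cat "$file") || exit 2
        if [ "$cur" = "$prev" ]; then
            exit 0              # same contents twice in a row
        fi
        prev=$cur
        tries=$((tries - 1))
    done
    exit 1
)

# Usage on a node where the ep module is loaded:
#   wait_until_unchanged /proc/qsnet/ep/rail0/cluster 5 12 && echo "unchanged"
```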
9.23.3 Checking active Lustre communications over the Quadrics interconnect
The kernel comms module provides active transaction descriptors (active_tx) that may be useful for
debugging purposes. To view active_tx information, enter the following command:
# lctl --net elan active_tx
9.23.4 Gathering debugging information
The commands shown in the following example can be used to gather debugging information about the Quadrics
interconnect:
# qsnetdebug debug /tmp/debug.log &
# echo epcomms > /proc/qsnet/ep/rail0/display
The display entry in the kernel comms procfs module supports the following commands:
    epcomms   Displays epcomms state.
    status    Displays cluster membership global state information.
    segs      Displays cluster membership segment state.
    rail      Displays cluster membership spanning tree.
    nodes     Displays cluster membership node state.
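To request every display in one pass, each command can be written to the display entry in turn. The following is a minimal sketch, assuming that qsnetdebug is already running in the background as shown above so that the output is captured. The dump_all_displays function is a hypothetical helper; its optional argument exists only so that a different rail's display entry can be named.

```shell
# dump_all_displays: hypothetical helper, not shipped with the Quadrics tools.
# Argument: path of the display entry (defaults to rail0).
dump_all_displays() (
    display=${1:-/proc/qsnet/ep/rail0/display}
    for cmd in epcomms status segs rail nodes; do
        # Same operation as "echo epcomms > .../display", once per command.
        echo "$cmd" > "$display" || exit 1
        echo "requested: $cmd"
    done
)

# Usage, run as root on a node where the ep module is loaded:
#   qsnetdebug debug /tmp/debug.log &
#   dump_all_displays
```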