
CFS Problems

If a device fails, or a controller to a device fails, the file system may become disabled cluster-wide. To recover, unmount the file system from all secondary nodes, then unmount it from the primary, and run a full fsck. When the file system check completes, mount the file system on all nodes again.
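For example, the recovery sequence might look like the following. The disk group sharedg, volume vol01, and mount point /mnt1 are illustrative names, not values from this guide:

    On each secondary node:
    # umount /mnt1

    On the primary node:
    # umount /mnt1

    Run a full file system check from one node:
    # fsck -F vxfs -o full -y /dev/vx/rdsk/sharedg/vol01

    Remount in shared mode on each node:
    # mount -F vxfs -o cluster /dev/vx/dsk/sharedg/vol01 /mnt1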

Unmount Failures

The umount command can fail for the following reasons:

  • A secondary mount still exists. When unmounting shared file systems, you must unmount the secondaries before unmounting the primary.
  • A reference is being held by an NFS server. Unshare the mount point and try the unmount again.
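For example, if an NFS share is holding a reference to the mount point, you might unshare it before retrying the unmount (the mount point /mnt1 is an illustrative name):

    # unshare /mnt1
    # umount /mnt1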

Mount Failures

Mounting a file system can fail for the following reasons:

  • The file system is not using disk layout Version 4, 5 or 6.
  • The mount options do not match the options of already mounted nodes.
  • A cluster file system is mounted with the qio option enabled by default if the node has a Quick I/O for Databases license installed, even if the qio mount option was not explicitly specified. If the Quick I/O license is not installed, a cluster file system is mounted without the qio option enabled. So if some nodes in the cluster have a Quick I/O license installed and others do not, a cluster mount can succeed on some nodes and fail on others because the mount options differ. To avoid this situation, apply Quick I/O licensing uniformly, or explicitly specify the qio or noqio option when mounting the cluster file system on each node. See the mount(1M) manual page for more information on mount options.
  • A shared CVM volume was not specified.
  • The device is still mounted as a local file system somewhere on the cluster. Unmount the device.
  • The fsck or mkfs command is being run on the same volume from another node, or the volume is mounted in non-cluster mode from another node.
  • The vxfsckd daemon is not running. This typically happens only if the CFSfsckd agent was not started correctly.
  • If mount fails with the error message:

       vxfs mount: cannot open mnttab

    /etc/mnttab is missing or you do not have root privileges.
  • If mount fails with the error message:

       vxfs mount: device already mounted, ...

    the device is in use by mount, mkfs, or fsck on the same node. This error cannot be generated from another node in the cluster.
  • If this error message displays:

       mount: slow

    the node may be in the process of joining the cluster.
  • If you try to mount a file system that is already mounted without the -o cluster option (that is, not in shared mode) on another cluster node, for example:

       # mount -F vxfs /dev/vx/dsk/share/vol01 /vol01

    the following error message displays:

       vxfs mount: /dev/vx/dsk/share/vol01 is already mounted, /vol01
       is busy, allowable number of mount points exceeded, or cluster
       reservation failed for the volume

    A recovery sequence for this case is sketched after this list.
  • If umount fails with the error message:

       vxfs umount: /vol01 cannot unmount : Device busy

    unmount the file system on all secondary systems before unmounting it from the primary.
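As referenced above, a minimal recovery sketch for the case where the volume is mounted in non-shared mode on another node, using the same device and mount point names as the example in the list: unmount the local mount on the node that holds it, then mount in shared mode on every node:

    On the node holding the local (non-cluster) mount:
    # umount /vol01

    On each node of the cluster:
    # mount -F vxfs -o cluster /dev/vx/dsk/share/vol01 /vol01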

Command Failures

  • If manual pages are not accessible with the man command, set the MANPATH environment variable as described under Setting PATH and MANPATH Environment Variables.
  • The mount, fsck, and mkfs utilities reserve a shared volume, so they fail on volumes that are in use. Be careful when accessing shared volumes with other utilities such as dd; these commands can destroy data on the disk.
  • Running some commands, such as fsadm -E /vol02, can generate the following error message:

       vxfs fsadm: ERROR: not primary in a cluster file system

    This means that you can run this command only on the primary, that is, the system that mounted this file system first. A way to identify the primary is sketched after this list.
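As noted above, one way to identify the primary node for a mounted cluster file system is the fsclustadm command. A minimal sketch, assuming the file system is mounted at /vol02; the node name system01 in the output is illustrative:

    # fsclustadm -v showprimary /vol02
    system01

    Then run the failing command on that node:
    # fsadm -E /vol02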

Performance Issues

File system performance is adversely affected if a cluster file system is mounted with the qio option enabled and Quick I/O is licensed, but the file system is not used for Quick I/O files. Because qio is enabled by default, if you do not intend to use a shared file system for Quick I/O, explicitly specify the noqio option when mounting.
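For example, to mount a shared file system with Quick I/O explicitly disabled (the device and mount point names are illustrative):

    # mount -F vxfs -o cluster,noqio /dev/vx/dsk/sharedg/vol01 /mnt1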

High Availability Issues

Network Partition/Jeopardy

Network partition (or split brain) is a condition where a network failure can be misinterpreted as a failure of one or more nodes in a cluster. If one system in the cluster incorrectly assumes that another system failed, it may restart applications already running on the other system, thereby corrupting data. CFS tries to prevent this by having redundant heartbeat links.

At least one link must be active to maintain the integrity of the cluster. When the last network link is broken, the node can no longer communicate with the other nodes in the cluster. At that point the node is in one of two possible states: either the last network link is broken (a network partition condition), or the last network link is okay but the node itself crashed, in which case it is not a network partition problem. It is not possible to distinguish between these two states, so a kernel message is issued to indicate that a network partition may exist and that there is a possibility of data corruption.

Jeopardy is a condition where a node in the cluster has a problem connecting to other nodes. In this situation, the link or disk heartbeat may be down, so a jeopardy warning may be displayed. Specifically, this message appears when a node has only one remaining link to the cluster and that link is a network link. This is considered a critical event because the node may lose its only remaining connection to the cluster.
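To check the current state of the heartbeat links, you can query LLT, the low-latency transport that carries cluster heartbeats; the exact output depends on your configuration:

    # lltstat -nvv | more

A node for which only one link shows UP is in jeopardy.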


Caution    Do not remove the communication links while shared storage is still connected.

Low Memory

Under heavy loads, software that manages heartbeat communication links may not be able to allocate kernel memory. If this occurs, a node halts to avoid any chance of network partitioning. Reduce the load on the node if this happens frequently.

A similar situation may occur if the values in the /etc/llttab files are incorrect or are not identical on all cluster nodes.
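One way to confirm that the /etc/llttab files match is to compare them from a single node. A minimal sketch, assuming rsh access between nodes; the node name system02 is illustrative:

    # rsh system02 cat /etc/llttab > /tmp/llttab.system02
    # diff /etc/llttab /tmp/llttab.system02

Any output from diff indicates that the files differ and should be reconciled.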


