Monday, 5 September 2011

PowerHA Failure Rate

Changing Module failure rate -

Within PowerHA you can adjust the rate at which HA detects the failure of the various heartbeat networks. There are 3 predefined values: slow, normal and fast.

# smitty cm_config_networks
    - then select 'Change a Network Module using Predefined Values' and select the module you wish to change, in this case 'ether'

                               [Entry Fields]
* Network Module Name          ether
  Description                  Ethernet Protocol
  Failure Detection Rate       Normal                     +

  NOTE: Changes made to this panel must be
        propagated to the other nodes by
        Verifying and Synchronizing the cluster


As the note says, once the setting has been changed it is important to verify and synchronize the cluster ('smitty cm_ver_and_sync.select').
If you wish to tune this further, you can select the 'Custom Values' option -

                                     [Entry Fields]
* Network Module Name                   ether
  Description                           [Ethernet Protocol]
  Address Type                          Address                       +
  Path                                  [/usr/sbin/rsct/bin/hats_nim]  /
  Parameters                            []
  Grace Period                          [60]                           #
  Supports gratuitous arp               [true]                         +
  Entry type                            [adapter_type]
  Next generic type                     [transport]
  Next generic name                     [Generic_UDP]
  Supports source routing               [true]                         +
  Failure Cycle                         [10]                           #
  Interval between Heartbeats (seconds) [1.00]
 
  Heartbeat rate is the rate at which cluster
  services sends 'keep alive' messages between
  adapters in the cluster. The combination of
  heartbeat rate and failure cycle determines how
  quickly a failure can be detected and may be
  calculated using this formula:
  (heartbeat rate) * (failure cycle) * 2 seconds

  NOTE: Changes made to this panel must be
        propagated to the other nodes by
        Verifying and Synchronizing the cluster


So with the default values above, 10 * 1.00 * 2 = 20.00 seconds pass before a failure of the network is declared.
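
If you want to compare a few candidate settings before touching the panel, the arithmetic is easy to script. This is only a minimal Python sketch of the formula quoted in the panel text. The function name detection_time and the second example (a failure cycle of 5) are my own illustrations, not PowerHA tooling or one of the predefined values.

```python
def detection_time(heartbeat_interval_s: float, failure_cycle: int) -> float:
    """Seconds before a network failure is declared.

    Implements the formula shown on the SMIT panel:
    (heartbeat rate) * (failure cycle) * 2 seconds
    """
    if heartbeat_interval_s <= 0 or failure_cycle <= 0:
        raise ValueError("interval and failure cycle must be positive")
    return heartbeat_interval_s * failure_cycle * 2


if __name__ == "__main__":
    # Values from the 'ether' panel above: interval 1.00 s, failure cycle 10.
    print(detection_time(1.00, 10))  # 20.0

    # Hypothetical tuning: halving the failure cycle halves the detection time.
    print(detection_time(1.00, 5))   # 10.0
```

Lower values detect a failure sooner, but they also leave less room for a busy node or network before a failure is declared.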

Once you have this set, the 'Show a Network Module' option will give you a summary of the current values.
