Veritas Cluster (VCS) Cheat Sheet
Overview
A Veritas Cluster Server (VCS) is a high availability system provided by Symantec that consists of multiple servers connected to shared storage devices. VCS links commodity hardware with intelligent software to provide application failover and control. If a node or application fails, VCS takes predefined actions to keep the service running within the cluster. VCS monitors the systems and their services; the systems in the cluster communicate over a private network.
A switchover is an orderly shutdown of an application and its supporting resources on one server, followed by a controlled startup on another server under VCS control.
A failover is a situation where applications and resources are stopped abruptly; an ordered shutdown of the applications on the original node may not be possible, so the services are started on another node.
The process of starting the application on the node is identical in a failover or switchover.
CLUSTER COMPONENTS:
- Resources
- Resource Dependencies
- Resource Categories
- Service Groups
- Agents
- High-Availability Daemon (HAD)
- Low Latency Transport (LLT)
- Traffic Distribution
- Heartbeat
- Group Membership Services/Atomic Broadcast (GAB)
- Cluster Membership
- Cluster Communications
LLT and GAB files
/etc/llthosts | The file is a database, containing one entry per system, that links the LLT system ID with the host name. The file is identical on each server in the cluster. |
/etc/llttab | The file contains information that is derived during installation and is used by the utility lltconfig. |
/etc/gabtab | The file contains the information needed to configure the GAB driver. This file is used by the gabconfig utility. |
/etc/VRTSvcs/conf/config/main.cf | The VCS configuration file. The file contains the information that defines the cluster and its systems. |
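For reference, entries in these files typically look like the following sketch. The node names (sun1, sun2), system IDs, and interface names are illustrative only; the exact link lines depend on the platform and network hardware.

```text
# /etc/llthosts — maps LLT system IDs to host names (identical on every node)
0 sun1
1 sun2

# /etc/llttab — generated at install time; read by lltconfig
set-node sun1
set-cluster 2
link qfe0 /dev/qfe:0 - ether - -
link qfe1 /dev/qfe:1 - ether - -

# /etc/gabtab — GAB start-up; -n is the number of nodes that must be
# communicating before the cluster seeds
/sbin/gabconfig -c -n2
```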
LLT Commands
Verify that links are active for LLT | lltstat -n |
Verbose output of the lltstat command | lltstat -nvv | more |
Display open ports for LLT | lltstat -p |
Display the values of LLT configuration directives | lltstat -c |
List information about each configured LLT link | lltstat -l |
List all MAC addresses in the cluster | lltconfig -a list |
Stop LLT | lltconfig -U |
Start LLT | lltconfig -c |
GAB Commands
Verify that GAB is operating | gabconfig -a |
Stop GAB | gabconfig -U |
Start GAB | gabconfig -c -n <number of nodes> |
Override the seed values in the gabtab file | gabconfig -c -x |
GAB Port Membership
List Membership | gabconfig -a |
Unregister port f | /opt/VRTS/bin/fsclustadm cfsdeinit |
Port Function:
- a – GAB driver
- b – I/O fencing (designed to guarantee data integrity)
- d – ODM (Oracle Disk Manager)
- f – CFS (Cluster File System)
- h – VCS (VERITAS Cluster Server: high availability daemon)
- o – VCSMM driver (kernel module needed for the Oracle and VCS interface)
- q – QuickLog daemon
- v – CVM (Cluster Volume Manager)
- w – vxconfigd (configuration daemon for CVM)
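The port letters can be pulled out of `gabconfig -a` output with a short script. A minimal sketch, run here against illustrative sample text rather than a live cluster; the gen numbers and membership values are made up, and only a subset of the port roles is mapped.

```shell
# Sample `gabconfig -a` output (illustrative, not from a live cluster).
sample='Port a gen a36e0003 membership 01
Port b gen a36e0006 membership 01
Port h gen fd570002 membership 01'

# Map each registered port letter to its role (subset of the table above).
decoded=$(printf '%s\n' "$sample" | awk '
  $1 == "Port" {
    role = "unknown"
    if ($2 == "a") role = "GAB driver"
    if ($2 == "b") role = "I/O fencing"
    if ($2 == "h") role = "VCS engine (had)"
    printf "port %s: %s, membership %s\n", $2, role, $6
  }')
printf '%s\n' "$decoded"
```

On a real node, replace the here-string with `gabconfig -a` output.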
Cluster daemons
High Availability Daemon | had |
Companion Daemon | hashadow |
Resource Agent daemon | <resource>Agent |
Web console cluster management daemon | CmdServer |
Cluster Log Files
Log Directory | /var/VRTSvcs/log |
Primary log file (engine log file) | /var/VRTSvcs/log/engine_A.log |
Starting Cluster
Start the cluster, treating the local configuration as stale (wait for a valid configuration from another running node) | hastart -stale |
Force the cluster to start using a stale local configuration | hastart -force |
Bring the cluster into running mode from a stale state using the configuration file from a particular server | hasys -force <server_name> |
Stopping Cluster
Stop the cluster on the local server but leave the applications running (do not fail them over) | hastop -local |
Stop the cluster on the local server and evacuate (fail over) the applications to another node in the cluster | hastop -local -evacuate |
Stop the cluster on all nodes but leave the applications running | hastop -all -force |
Cluster Status
Display cluster summary | hastatus -summary |
Continually monitor cluster | hastatus |
Verify the cluster is operating | hasys -display |
Cluster Details
Information about a cluster | haclus -display |
Value for a specific cluster attribute | haclus -value <attribute> |
Modify a cluster attribute | haclus -modify <attribute name> <new value> |
Enable LinkMonitoring | haclus -enable LinkMonitoring |
Disable LinkMonitoring | haclus -disable LinkMonitoring |
System Operations
Add a user | hauser -add <username> |
Modify a user | hauser -update <username> |
Delete a user | hauser -delete <username> |
Display all users | hauser -display |
Add a system to the cluster | hasys -add <sys> |
Delete a system from the cluster | hasys -delete <sys> |
Modify a system’s attributes | hasys -modify <sys> <modify options> |
List a system’s state | hasys -state |
Force a system to start | hasys -force |
Display a system’s attributes | hasys -display [-sys] |
List all the systems in the cluster | hasys -list |
Change the load attribute of a system | hasys -load <system> <value> |
Display the value of a system’s node ID (/etc/llthosts) | hasys -nodeid |
Freeze a system (no service groups can be taken offline or brought online on it) | hasys -freeze [-persistent][-evacuate] |
Unfreeze a system (re-enable bringing groups and resources online) | hasys -unfreeze [-persistent] |
Dynamic Configuration
Change configuration to read/write mode | haconf -makerw |
Change configuration to read-only mode | haconf -dump -makero |
Check what mode the cluster is running in | haclus -display | grep -i ‘readonly’ |
Check the configuration file | hacf -verify /etc/VRTSvcs/conf/config |
Convert a main.cf file into cluster commands | hacf -cftocmd /etc/VRTSvcs/conf/config -dest /tmp |
Convert a command file into a main.cf file | hacf -cmdtocf /tmp -dest /etc/VRTSvcs/conf/config |
Service Groups
Add a service group | haconf -makerw; hagrp -add <group>; hagrp -modify <group> SystemList sun1 1 sun2 2; hagrp -autoenable <group> -sys sun1; haconf -dump -makero |
Delete a service group | haconf -makerw; hagrp -delete <group>; haconf -dump -makero |
Change a service group | haconf -makerw; hagrp -modify <group> SystemList sun1 1 sun2 2 sun3 3; haconf -dump -makero |
List the service groups | hagrp -list |
List the groups dependencies | hagrp -dep <group> |
List the parameters of a group | hagrp -display <group> |
Display a service group’s resources | hagrp -resources <group> |
Display the current state of the service group | hagrp -state <group> |
Clear a faulted non-persistent resource in a specific group | hagrp -clear <group> [-sys <system>] |
Change the system list in a cluster | hagrp -modify <group> SystemList -delete <hostname>; hagrp -modify <group> SystemList -add <hostname> 1; hagrp -modify <group> AutoStartList <host> <host> |
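Strung together, a typical add-a-service-group session looks like the sketch below. The group name websg and the node names sun1/sun2 are hypothetical; the run wrapper only echoes each command so the sequence can be reviewed first — replace its body with "$@" to execute on a real cluster.

```shell
# Dry-run wrapper: echoes instead of executing (swap in "$@" on a live node).
run() { echo "+ $*"; }

run haconf -makerw                                # open the config read/write
run hagrp -add websg                              # create the service group
run hagrp -modify websg SystemList sun1 1 sun2 2  # nodes and failover priority
run hagrp -modify websg AutoStartList sun1        # where it starts by default
run hagrp -autoenable websg -sys sun1
run haconf -dump -makero                          # save and close the config
```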
Service Group Operations
Start a service group and bring its resources online | hagrp -online <group> -sys <sys> |
Stop a service group and take its resources offline | hagrp -offline <group> -sys <sys> |
Switch a service group from one system to another | hagrp -switch <group> to <sys> |
Enable all the resources in a group | hagrp -enableresources <group> |
Disable all the resources in a group | hagrp -disableresources <group> |
Freeze a service group (disable online and offline) | hagrp -freeze <group> [-persistent] |
Unfreeze a service group (enable online and offline) | hagrp -unfreeze <group> [-persistent] |
Enable a service group (only enabled groups can be brought online) | haconf -makerw; hagrp -enable <group> [-sys]; haconf -dump -makero |
Disable a service group (stop it from being brought online) | haconf -makerw; hagrp -disable <group> [-sys]; haconf -dump -makero |
Flush a service group and enable corrective action. | hagrp -flush <group> -sys <system> |
Resources
Add a resource | haconf -makerw; hares -add <resource> DiskGroup <group>; hares -modify <resource> Enabled 1; hares -modify <resource> DiskGroup <disk_group>; hares -modify <resource> StartVolumes 0; haconf -dump -makero |
Delete a resource | haconf -makerw; hares -delete <resource>; haconf -dump -makero |
Change a resource | haconf -makerw; hares -modify <resource> Enabled 1; haconf -dump -makero |
Make a resource attribute value global (cluster-wide) | hares -global <resource> <attribute> <value> |
Make a resource attribute value local (per-system) | hares -local <resource> <attribute> <value> |
List the parameters of a resource | hares -display <resource> |
List the resources | hares -list |
List the resource dependencies | hares -dep |
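The add-a-resource sequence above, expanded with comments. The resource name webdg, group websg, and disk group webdg01 are hypothetical; as in the service-group sketch, run echoes rather than executes, so this is a review-first dry run.

```shell
run() { echo "+ $*"; }   # dry-run: echoes the command (use "$@" to execute)

run haconf -makerw
run hares -add webdg DiskGroup websg        # type DiskGroup, in group websg
run hares -modify webdg DiskGroup webdg01   # the VxVM disk group to import
run hares -modify webdg StartVolumes 0      # do not auto-start volumes
run hares -modify webdg Enabled 1           # allow the agent to manage it
run haconf -dump -makero
```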
Resource Operations
Online a resource | hares -online <resource> [-sys] |
Offline a resource | hares -offline <resource> [-sys] |
Display the state of a resource (offline, online, etc.) | hares -state |
Display the parameters of a resource | hares -display <resource> |
Offline a resource and propagate the command to its children | hares -offprop <resource> -sys <sys> |
Cause a resource agent to immediately monitor the resource | hares -probe <resource> -sys <sys> |
Clear a faulted resource (automatically initiates onlining) | hares -clear <resource> [-sys] |
Resource Types
Add a resource type | hatype -add <type> |
Remove a resource type | hatype -delete <type> |
List all resource types | hatype -list |
Display a resource type | hatype -display <type> |
List the resources of a particular type | hatype -resources <type> |
Display the value of a particular resource type attribute | hatype -value <type> <attr> |
Resource Agents
Add an agent | pkgadd -d . <agent package> |
Remove an agent | pkgrm <agent package> |
Change an agent | N/A |
List all HA agents | haagent -list |
Display an agent’s run-time information (has it started, is it running?) | haagent -display <agent_name> |
Display agent faults | haagent -display | grep Faults |
Start an agent | haagent -start <agent_name> [-sys] |
Stop an agent | haagent -stop <agent_name> [-sys] |