Thursday, February 4, 2016

To replace a failed disk in a ZFS root pool (rpool):


First, identify the device path of the faulted boot disk and match it against the /dev/rdsk links:

 # prtconf -vp | grep -i bootpath
        bootpath:  '/pci@0,600000/pci@0/pci@8/pci@0/scsi@1/disk@0,0:a'   (faulted disk)

 # ls -l /dev/rdsk/c1t0d0s0
lrwxrwxrwx   1 root     root          65 Mar 11  2010 /dev/rdsk/c1t0d0s0 -> ../../devices/pci@10,600000/pci@0/pci@8/pci@0/scsi@1/sd@0,0:a,raw

 # ls -ld /dev/rdsk/c0t0d0s0
lrwxrwxrwx   1 root     root          64 Mar 11  2010 /dev/rdsk/c0t0d0s0 -> ../../devices/pci@0,600000/pci@0/pci@8/pci@0/scsi@1/sd@0,0:a,raw


The bootpath matches the device link for c0t0d0, so that is the disk to replace. Unconfigure it before removing it:

# cfgadm -c unconfigure c0::dsk/c0t0d0

<Physically remove failed disk c0t0d0>
<Physically insert replacement disk c0t0d0>

Once the replacement disk is in place, configure it and verify that it is visible:

# cfgadm -c configure c0::dsk/c0t0d0

# echo | format | grep -i c0t0d0

Copy the partition table (VTOC) from the surviving mirror disk to the new disk:

# prtvtoc /dev/rdsk/c1t0d0s2 | fmthard -s - /dev/rdsk/c0t0d0s2

Make sure the new disk has an SMI (VTOC) label rather than an EFI label, since this is a ZFS root pool and rpool disks must use SMI-labeled slices.
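If the replacement disk arrived with an EFI label, it can be relabeled interactively with format in expert mode; a minimal sketch (the menu text varies slightly between Solaris releases):

# format -e c0t0d0
format> label
[0] SMI Label
[1] EFI Label
Specify Label type[0]: 0
format> quit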

# zpool replace rpool c0t0d0s0
Make sure to wait until the resilver is complete before rebooting.

# zpool online rpool c0t0d0s0


 # zpool status rpool
  pool: rpool
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
 scan: resilver in progress since Thu Feb  4 12:16:01 2016
    24.5G scanned out of 114G at 56.1M/s, 0h27m to go
    24.5G resilvered, 21.41% done
config:

        NAME                STATE     READ WRITE CKSUM
        rpool               DEGRADED     0     0     0
          mirror-0          DEGRADED     0     0     0
            replacing-0     DEGRADED     0     0     0
              c0t0d0s0/old  FAULTED      0   951     0  too many errors
              c0t0d0s0      ONLINE       0     0     0  (resilvering)
            c1t0d0s0        ONLINE       0     0     0

errors: No known data errors
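To avoid rebooting too early, you can poll until the resilver finishes; a minimal sketch of such a loop (plain shell, adjust the sleep interval to taste):

# while zpool status rpool | grep "resilver in progress" > /dev/null; do sleep 60; done; echo resilver complete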


Once the resilver finishes, c0t0d0s0/old is removed from the pool configuration automatically.

# zpool status rpool
  pool: rpool
 state: ONLINE
status: The pool is formatted using an older on-disk format.  The pool can
        still be used, but some features are unavailable.
action: Upgrade the pool using 'zpool upgrade'.  Once this is done, the
        pool will no longer be accessible on older software versions.
 scan: resilvered 114G in 0h51m with 0 errors on Thu Feb  4 13:07:55 2016
config:

        NAME          STATE     READ WRITE CKSUM
        rpool         ONLINE       0     0     0
          mirror-0    ONLINE       0     0     0
            c0t0d0s0  ONLINE       0     0     0
            c1t0d0s0  ONLINE       0     0     0

errors: No known data errors

<Let disk resilver before installing the boot blocks>
SPARC# installboot -F zfs /usr/platform/`uname -i`/lib/fs/zfs/bootblk /dev/rdsk/c0t0d0s0
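The installboot command above is for SPARC, as the prompt indicates. On an x86 system running a GRUB-based Solaris release, the equivalent step would use installgrub; a sketch, assuming the same root slice:

x86# installgrub /boot/grub/stage1 /boot/grub/stage2 /dev/rdsk/c0t0d0s0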

Monday, February 1, 2016

Creating a new NFS resource group in Sun Cluster 3.2


1. Make sure the new LUNs are visible and available to be configured.

     # echo | format > format.b4

     # scdidadm -L > scdidadm.b4

     # cfgadm -c configure <controller(s)>

     # devfsadm

     # scdidadm -r

     # scgdevs    (run on one node only)

     # scdidadm -L > scdidadm.after

     # diff scdidadm.b4 scdidadm.after

Note down the new DID devices; these will be used to create the file systems.

2. Create the new metaset

        # metaset -s sap-set -a -h phys-host1 phys-host2

3. Add disks to metaset

    # metaset -s sap-set -a /dev/did/rdsk/d34

    # metaset -s sap-set -a /dev/did/rdsk/d35

    # metaset -s sap-set -a /dev/did/rdsk/d36

    # metaset -s sap-set -a /dev/did/rdsk/d37

4.  Take ownership of metaset on phys-host1

          # cldg switch -n phys-host1 sap-set

5. Create new volumes for sap-set

    # metainit -s sap-set d134 -p /dev/did/dsk/d34s0 1g
    # metainit -s sap-set d135 -p /dev/did/dsk/d34s0 all
    # metainit -s sap-set d136 -p /dev/did/dsk/d35s0 all
    # metainit -s sap-set d137 -p /dev/did/dsk/d36s0 all
    # metainit -s sap-set d138 -p /dev/did/dsk/d37s0 all
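Optionally, verify the layout of the new volumes before creating the file systems (standard SVM status command):

    # metastat -s sap-set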

6. Create new filesystems

    # umask 022
    # newfs /dev/md/sap-set/rdsk/d134
    # newfs /dev/md/sap-set/rdsk/d135
    # newfs /dev/md/sap-set/rdsk/d136
    # newfs /dev/md/sap-set/rdsk/d137
    # newfs /dev/md/sap-set/rdsk/d138

7. Create the new mount points on both nodes.

    # mkdir -p /sap ; chown sap:sap /sap

    # mkdir -p /sapdata/sap11 ; chown sap:sap /sapdata/sap11
    # mkdir -p /sapdata/sap12 ; chown sap:sap /sapdata/sap12
    # mkdir -p /sapdata/sap13 ; chown sap:sap /sapdata/sap13
    # mkdir -p /sapdata/sap14 ; chown sap:sap /sapdata/sap14

8. Edit the /etc/vfstab file on both nodes and add the new file systems. Set the "mount at boot" field to no.
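For example, the entries might look like the following (the volume-to-mount-point mapping here is only an assumption based on the volumes created above; adjust to your layout):

    /dev/md/sap-set/dsk/d134  /dev/md/sap-set/rdsk/d134  /sap            ufs  2  no  -
    /dev/md/sap-set/dsk/d135  /dev/md/sap-set/rdsk/d135  /sapdata/sap11  ufs  2  no  -
    /dev/md/sap-set/dsk/d136  /dev/md/sap-set/rdsk/d136  /sapdata/sap12  ufs  2  no  -
    /dev/md/sap-set/dsk/d137  /dev/md/sap-set/rdsk/d137  /sapdata/sap13  ufs  2  no  -
    /dev/md/sap-set/dsk/d138  /dev/md/sap-set/rdsk/d138  /sapdata/sap14  ufs  2  no  -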

9. Create the Resource group SAP-RG.

      # clrg create -n phys-host1,phys-host2 SAP-RG

10. Create logical hostname resource.

      # clrslh create -g SAP-RG saplh-rs
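If the SUNW.HAStoragePlus resource type has not been registered on this cluster yet, register it before the next step (a one-time operation):

       # clrt register SUNW.HAStoragePlus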

11. Create the HAStoragePlus resource

       # clrs create -t HAStoragePlus -g SAP-RG -p AffinityOn=true -p FileSystemMountPoints="/sap,/sapdata/sap11,/sapdata/sap12,/sapdata/sap13,/sapdata/sap14" sap-data-res

12. Bring the Resource Group Online

    # clrg online -M phys-host1 SAP-RG

13. Test the failover of the Resource Group

     # clrg switch -n phys-host2 SAP-RG

14. Failover Back

     # clrg switch -n phys-host1 SAP-RG

15. Create the SUNW.nfs Config Directory on the /sap file system.

     # mkdir -p /sap/nfs/SUNW.nfs

16. Create the dfstab file that lists the file systems to share. The file name must be dfstab.<nfs-resource-name>, and the share entries are plain commands (no leading "#", which would turn them into comments):

    # vi /sap/nfs/SUNW.nfs/dfstab.sap-nfs-res

    share -F nfs -o rw /sapdata/sap11
    share -F nfs -o rw /sapdata/sap12
    share -F nfs -o rw /sapdata/sap13
    share -F nfs -o rw /sapdata/sap14

17. Offline the SAP-RG resource group.

      # clrg offline SAP-RG

18. Set the Pathprefix property on the resource group so that HA-NFS knows where to find the cluster dfstab:

    # clrg set -p Pathprefix=/sap/nfs SAP-RG

19. Bring the Resource Group online

    # clrg online -n phys-host1 SAP-RG
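If the SUNW.nfs resource type has not been registered yet, register it before creating the NFS resource in the next step:

    # clrt register SUNW.nfs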

20. Create the NFS resource in SAP-RG resource group.

    # clrs create -g SAP-RG -t nfs -p Resource_dependencies=sap-data-res sap-nfs-res

21. Verify that the resource has been created and enabled as part of SAP-RG:

    # clrs status

22. Check that the server is exporting the file systems:

    # dfshares

Saturday, January 16, 2016

Sun Cluster: Purging Quorum Keys

Purging Quorum Keys

CAUTION: Purging the keys from a quorum device may result in amnesia. It should only be done after careful diagnostics have verified why the cluster is not coming up, and it should never be done as long as the cluster is able to come up on its own. It may be needed if the last node to leave the cluster is unable to boot, leaving every other node fenced out. In that case, boot one of the other nodes to single-user mode, identify the quorum device, and proceed as follows.

For SCSI-2 disk reservations, the relevant command is pgre, which is located in /usr/cluster/lib/sc:

pgre -c pgre_inkeys -d /dev/did/rdsk/d#s2   (List the keys on the quorum device.)
pgre -c pgre_scrub -d /dev/did/rdsk/d#s2    (Remove the keys from the quorum device.)

Similarly, for SCSI-3 disk reservations, the relevant command is scsi, also located in /usr/cluster/lib/sc:
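The equivalent operations (these mirror the reservation-clearing steps shown below) are:

scsi -c inkeys -d /dev/did/rdsk/d#s2   (List the keys on the quorum device.)
scsi -c scrub -d /dev/did/rdsk/d#s2    (Remove the keys from the quorum device.)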

Sun Cluster 3.2 & SCSI Reservation Issues


If you have worked with LUNs and Sun Cluster 3.2, you may have discovered that removing a LUN from a system is not always possible, because of the SCSI-3 reservation that Sun Cluster places on the disks. The example scenario below walks you through how to overcome this issue and proceed as though Sun Cluster is not even installed.

Example: We had a 100GB LUN off of a Hitachi disk array that we were using in a metaset controlled by Sun Cluster. We had removed the resource from the Sun Cluster configuration and removed the device with cfgadm/devfsadm; however, when the storage admin attempted to remove the LUN ID from the Hitachi array zone, the array indicated the LUN was still in use. From the Solaris server side it did not appear to be in use, but Sun Cluster had set SCSI-3 reservations on the disk.
Clearing the Sun Cluster SCSI reservation:

1. Determine which DID device the LUN is mapped to (a quick lookup example follows this list): /usr/cluster/bin/scdidadm -L
2. Disable failfast on the DID device: /usr/cluster/lib/sc/scsi -c disfailfast -d /dev/did/rdsk/DID
3. Release the DID device: /usr/cluster/lib/sc/scsi -c release -d /dev/did/rdsk/DID
4. Scrub the reserve keys from the DID device: /usr/cluster/lib/sc/scsi -c scrub -d /dev/did/rdsk/DID
5. Confirm the reserve keys are removed: /usr/cluster/lib/sc/scsi -c inkeys -d /dev/did/rdsk/DID
6. Remove the LUN from the array zone, or complete whatever procedure you were originally trying to perform.
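A quick way to find the DID instance for a particular LUN is to grep the scdidadm listing for its c#t#d# name (c4t5d0 here is only a hypothetical example):

# /usr/cluster/bin/scdidadm -L | grep c4t5d0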

How to recover from an amnesia situation


Amnesia scenario:

1. Node node-1 is shut down.
2. Node node-2 crashes and will not boot due to hardware failure.
3. Node node-1 is rebooted but stops and prints out the following messages:
Booting as part of a cluster
    NOTICE: CMM: Node node-1 (nodeid = 1) with votecount = 1 added.
    NOTICE: CMM: Node node-2 (nodeid = 2) with votecount = 1 added.
    NOTICE: CMM: Quorum device 1 (/dev/did/rdsk/d4s2) added; votecount = 1, bitmask of nodes with configured paths = 0x3.
    NOTICE: CMM: Node node-1: attempting to join cluster.
  
 NOTICE: CMM: Quorum device 1 (gdevname /dev/did/rdsk/d4s2) can not be acquired by the current cluster members. This quorum device is held by node 2.
NOTICE: CMM: Cluster doesn't have operational quorum yet; waiting for quorum.
Node node-1 cannot boot completely because it cannot achieve the needed quorum vote count.
In the above case, node node-1 cannot start the cluster due to the amnesia protection of Oracle Solaris Cluster. Since node node-1 was not a member of the cluster when the cluster went down (it had already been shut down when node-2 crashed), it may have an outdated CCR and is not allowed to start the cluster automatically on its own.
The general rule is that a node can only start the cluster if it was part of the cluster when the cluster was last shut down. In a multi-node cluster it is possible for more than one node to count as "the last" to leave the cluster.
How to recover Sun Cluster 3.3 from amnesia when only one node is operational

When all nodes in a Sun Cluster are stopped, the last node that left the cluster must be the first one to boot, to keep the CCR consistent. If for any reason that last node cannot boot (hardware failure, etc.), the other nodes in the cluster will not boot either, and this message will appear:
Jul 15 11:05:19 maquina01 cl_runtime: [ID 980942 kern.notice]
 NOTICE: CMM: Cluster doesn't have operational quorum yet; waiting
 for quorum.
This is normal behavior that occurs to prevent what Sun Cluster calls "amnesia" (see the documentation for details). To start the cluster while the faulty node is being repaired, make the following changes:
Boot the surviving node outside of the cluster:

# reboot -- -x

Edit the infrastructure file (/etc/cluster/ccr/global/infrastructure on Sun Cluster 3.3; on older releases it lives at /etc/cluster/ccr/infrastructure) and change the quorum_vote to 1 for the node that is up:

# cd /etc/cluster/ccr/global/
# vi infrastructure

  cluster.nodes.1.name   NODE1
  cluster.nodes.1.state  enable
  cluster.nodes.1.properties.quorum_vote  1
For all other nodes and any Quorum Device, set the votecount to zero (0). For example:
cluster.nodes.N.properties.quorum_vote  0
cluster.quorum_devices.Q.properties.votecount  0
Where N is the node id and Q is the quorum device id.
Regenerate the checksum of the infrastructure file:
# /usr/cluster/lib/sc/ccradm -i /etc/cluster/ccr/global/infrastructure -o
Reboot node NODE1 into the cluster:
# reboot
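Once the node comes up and forms the cluster, the quorum and resource group state can be checked with the standard status commands:

# clquorum status
# clrg status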

Sun Cluster commands


Some resource group cluster commands are:

* clrt register resource-type                          : Register a resource type.
* clrt register -n node1name,node2name resource-type   : Register a resource type on specific nodes.
* clrt unregister resource-type                        : Unregister a resource type.
* clrt list -v                                         : List all resource types and their associated node lists.
* clrt show resource-type                              : Display all information for a resource type.
* clrg create -n node1name,node2name rgname            : Create a resource group.
* clrg delete rgname                                   : Delete a resource group.
* clrg set -p property-name rgname                     : Set a resource group property.
* clrg show -v rgname                                  : Show resource group information.
* clrs create -t HAStoragePlus -g rgname -p AffinityOn=true -p FileSystemMountPoints=/mountpoint resource-name : Create an HAStoragePlus resource.
* clrg online -M rgname                                : Bring a resource group online and put it in the managed state.
* clrg switch -M -n nodename rgname                    : Switch a resource group to another node.
* clrg offline rgname                                  : Offline the resource group, but leave it in a managed state.
* clrg restart rgname                                  : Restart a resource group.
* clrs disable resource-name                           : Disable a resource and its fault monitor.
* clrs enable resource-name                            : Re-enable a resource and its fault monitor.
* clrs clear -n nodename -f STOP_FAILED resource-name  : Clear the STOP_FAILED flag on a resource.
* clrs unmonitor resource-name                         : Disable the fault monitor, but leave the resource running.
* clrs monitor resource-name                           : Re-enable the fault monitor for a resource that is currently enabled.
* clrg suspend rgname                                  : Preserve the online status of the group, but do not continue monitoring.
* clrg resume rgname                                   : Resume monitoring of a suspended group.
* clrg status                                          : List the status of resource groups.
* clrs status -g rgname                                : List the status of resources in a resource group.

How to add a zpool to an existing HAStoragePlus resource in Sun Cluster



When you add a local or global file system to an HAStoragePlus resource, the resource automatically mounts the file system.

In the /etc/vfstab file on each node of the cluster, add an entry for the mount point of each file system that you are adding (see the example entry after this list).

For local file systems:
  - Set the "mount at boot" field to no.
  - Remove the global flag from the mount options.

For cluster file systems:
  - If the file system is a global file system, set the mount options field to contain the global option.
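For example, a vfstab entry for a local (failover) file system might look like this (hypothetical metadevice and mount point):

/dev/md/oraset/dsk/d100  /dev/md/oraset/rdsk/d100  /oradata  ufs  2  no  logging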
Retrieve the list of mount points for the file systems that the HAStoragePlus resource already manages:

# scha_resource_get -O extension -R hasp-resource -G hasp-rg FileSystemMountPoints


Modify the FileSystemMountPoints extension property of the HAStoragePlus resource to contain both the existing mount points and the new ones:

# clresource set -p FileSystemMountPoints="mount-point-list" hasp-resource

Here mount-point-list is a comma-separated list of the mount points that the HAStoragePlus resource already manages plus the mount points of the file systems that you are adding.
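For a ZFS storage pool (as in the post title), the analogous change is made to the Zpools extension property instead of FileSystemMountPoints. A minimal sketch, assuming a pool named sap-pool and a resource named hasp-resource (verify the syntax against your cluster release):

# clresource set -p Zpools+=sap-pool hasp-resource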