W2008 R2 SP1 - Add node, validate cluster - losing a disk through event 1568


We had a problem adding a third node to our existing cluster: a communication timeout. We therefore chose to update the servers and try again with the latest fix levels.

When validating the cluster in order to add a third node, we saw this in the validation log:

List disks visible to 2 or more nodes that will be validated for cluster compatibility. Online clustered disks will be excluded.

Disk with identifier 6390744f has a Persistent Reservation on it. The disk might be part of some other cluster. Removing the disk from the validation set

Disk with identifier ca0db766 has a Persistent Reservation on it. The disk might be part of some other cluster. Removing the disk from the validation set

And:

Cluster disk 8 from node svr01.domain.local has 8 usable path(s) to storage target
Cluster disk 8 is not managed by Microsoft MPIO from node svr02.domain.local

Cluster disk 8 is not managed by Microsoft MPIO from node svr03.domain.local

There are 11 disks: 2 were excluded from validation and 1 failed the MPIO check, which is strange, certainly on svr02, an existing cluster node.

And on every svr node:

Getting SCSI page 83h VPD descriptors for cluster disk 8 from node svr01.domain.local
SCSI page 83h VPD descriptors for cluster disk 8 and 9 match

SCSI page 83h VPD descriptors for cluster disk 8 and 10 match

At the end of the test:

An error occurred while executing the test.
Specified argument was out of the range of valid values.
Parameter name: percentage

So the validation test failed. We checked the cluster event log and saw no errors or warnings, and the cluster was online. We then logged in to the VMs to check their event logs, and on one server we were welcomed by a screen saying that a disk needed an MBR record to be set.

When checking the disk in Disk Management on the node, we saw it as unallocated with status Reserved. Looking under the storage resources of the cluster, we could see the disk was online, but the volume path was not there.

When looking at the cluster event log we can see:

Event 1568 - Cluster disk resource 'sqlprod_log' found the disk identifier to be stale. This may be expected if a restore operation was just performed or if this cluster uses replicated storage. The DiskSignature or DiskUniqueIds property for the disk resource has been corrected.

This is a pass-through disk, and it is the disk the VM wanted to set the MBR record on.

We removed the storage resource, disk, MPIO and SAN volume, then exposed a new SAN volume, set up MPIO and the disk, added a new storage resource and restored the data.

What can cause validating a cluster to create such a potentially disastrous problem?

TIA,

Fred

 



This unique disk identifier mechanism is not rock solid. Scattered across the internet I found several related posts, one of which gives the necessary steps to fix it:

Microsoft Failover Cluster CSV Volume Disappear
We began experiencing problems with the 3rd member of our Windows failover cluster. Our cluster consists of 3 servers running 2008 R2 SP1, running Failover Cluster Manager with a SAN backend. The SAN presents a number of Cluster Shared Volumes (CSV) to the servers, and our data sits on these CSVs.
One afternoon our primary CSV went into redirected mode. This is a normal occurrence during backup operations, but no backup was scheduled and we were not able to turn off redirected mode. We had to schedule a short outage, power off the Hyper-V hosts and power them on again. After a full investigation turned up nothing, we put it down to an anomaly. Three weeks later the problem happened again, and this time we scheduled a longer outage to investigate the problem more thoroughly.
During testing we discovered that 1 of the 3 hosts was causing the issue: when it was removed from the cluster there were no problems, and when it was in the cluster the CSV in question would randomly go into redirected mode. The logs of the SAN and the Hyper-V hosts turned up nothing, and the cluster tests passed perfectly.
Unfortunately, during our testing we encountered a bigger problem. When bringing the faulty host online for the 3rd time, the CSV disappeared on the 2 healthy hosts, while it was still visible on the 3rd host. We promptly removed the 3rd host from the cluster, but the CSV did not reappear on the 2 healthy hosts.
What didn't work
We tried a number of processes to get the volume to re-appear.
  • Rescanning/refreshing in Disk Manager
  • Deleting and re-adding the CSV
  • Repairing the CSV
  • Restarting the Hyper-V hosts
  • Removing the faulty host from the SAN LUN zones
At this point we were a little worried: our primary CSV was displayed in Windows as an empty disk. With the failover cluster tools we checked the DiskSignature and were greeted with a grim 0.
Command: cluster resource vms /priv
D  vms                  DiskSignature                  0 (0x0)
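The zero-signature check above can be scripted when there are many disk resources to look through. What follows is only a rough sketch in Python: it assumes the output of `cluster resource <name> /priv` has already been captured as text, and that the columns (type flag, resource name, property name, value) are separated by two or more spaces, which is inferred from the single line shown here. The helper names `parse_priv_output` and `disk_signature` are made up for illustration.

```python
import re

# Assumed layout: type flag, resource name, property name, value,
# separated by runs of two or more spaces (inferred from the one line above).
ROW = re.compile(r"^\s*(\S+)\s{2,}(.+?)\s{2,}(\S+)\s{2,}(.+?)\s*$")


def parse_priv_output(text):
    """Return {property_name: raw_value} for rows matching the assumed layout."""
    props = {}
    for line in text.splitlines():
        match = ROW.match(line)
        if match:
            _type, _resource, name, value = match.groups()
            props[name] = value
    return props


def disk_signature(props):
    """Decode a value such as '0 (0x0)' into an int, or None if absent."""
    raw = props.get("DiskSignature")
    if raw is None:
        return None
    # The first token is the decimal form; base 0 also accepts a 0x prefix.
    return int(raw.split()[0], 0)


sample = "D  vms                  DiskSignature                  0 (0x0)"
sig = disk_signature(parse_priv_output(sample))
if sig == 0:
    print("DiskSignature is 0: the resource no longer points at a known disk")
```

A signature of 0 is the case described in this post; any other value would need to be compared against what the event log says the cluster expects.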
Scanning the FailoverClustering event logs turned up the following events:
Event ID: 1568
Source: FailoverClustering
Cluster physical disk resource 'vms' cannot be brought online because the associated disk could not be found. The expected signature of the disk was 'F62FC592'. If the disk was replaced or restored, in the Failover Cluster Manager snap-in, you can use the Repair function (in the properties sheet for the disk) to repair the new or restored disk. If the disk will not be replaced, delete the associated disk resource.
and
Event ID: 1568
Source: FailoverClustering

Cluster disk resource 'vms' found the disk identifier to be stale. This may be expected if a restore operation was just performed or if this cluster uses replicated storage. The DiskSignature or DiskUniqueIds property for the disk resource has been corrected.

This repeated over and over, with the cluster trying to repair the problem but not having success.
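The value worth keeping from these events is the expected signature, since it is what gets written back later. As a small illustration, and not something the original poster used, the Python sketch below pulls that hex value out of an event message that has been copied as text. The regular expression only assumes the signature appears in single quotes somewhere after the word "signature"; `expected_signature` is a hypothetical helper name.

```python
import re

# Matches the quoted hex value that follows the word "signature".
SIGNATURE = re.compile(r"signature[^']*'([0-9A-Fa-f]{1,8})'", re.IGNORECASE)


def expected_signature(message):
    """Return the expected disk signature as an int, or None if not present."""
    match = SIGNATURE.search(message)
    return int(match.group(1), 16) if match else None


# Message text taken from the first event quoted in this post.
event_text = (
    "Cluster physical disk resource 'vms' cannot be brought online because "
    "the associated disk could not be found. The expected signature of the "
    "disk was 'F62FC592'."
)

sig = expected_signature(event_text)
if sig is not None:
    print(f"expected signature: 0x{sig:08X}")
```

The second event (the "stale identifier" one) carries no quoted signature, so the helper returns None for it.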

The solution
After reading this thread we noticed that in the last post a user mentioned "a Microsoft tech fixed the problem, the disk's first sector was corrupted", so we decided a partition table scan and re-write was worth a shot.
Using TestDisk we were able to recover the volume by first analyzing the disk partitions and then writing the changes.
I re-wrote the disk signature (which I found in the FailoverClustering logs, as per above) to the volume using the below command.
cluster resource vms /priv DiskSignature=0xF62FC592
The volume came online, phew, and within the outage window!
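Some background on why a corrupted first sector and a DiskSignature of 0 go together: on an MBR-partitioned disk the 32-bit disk signature is stored in sector 0 at byte offset 440 (0x1B8), little-endian, and the sector ends with the bytes 0x55 0xAA at offset 510. The Python sketch below decodes that field from a 512-byte buffer. It assumes an MBR disk and works on synthetic bytes only; it does not open a physical disk, and the helper names are invented for the example.

```python
import struct

SECTOR_SIZE = 512
SIGNATURE_OFFSET = 440    # 0x1B8: 4-byte disk signature in a classic MBR
BOOT_MARKER_OFFSET = 510  # 0x1FE: the bytes 0x55 0xAA close a valid MBR


def read_mbr_signature(sector):
    """Return the disk signature as an int, or None if this is not a valid MBR."""
    if len(sector) < SECTOR_SIZE:
        return None
    if sector[BOOT_MARKER_OFFSET:BOOT_MARKER_OFFSET + 2] != b"\x55\xaa":
        return None
    (signature,) = struct.unpack_from("<I", sector, SIGNATURE_OFFSET)
    return signature


def build_test_sector(signature):
    """Build a synthetic, otherwise empty sector 0 (demonstration only)."""
    sector = bytearray(SECTOR_SIZE)
    struct.pack_into("<I", sector, SIGNATURE_OFFSET, signature)
    sector[BOOT_MARKER_OFFSET:BOOT_MARKER_OFFSET + 2] = b"\x55\xaa"
    return bytes(sector)


healthy = build_test_sector(0xF62FC592)  # signature the cluster expected
wiped = bytes(SECTOR_SIZE)               # all zeroes, like a blank disk

print(read_mbr_signature(healthy))
print(read_mbr_signature(wiped))
```

A sector that fails the marker check is the situation in which Windows offers to initialize the disk, which matches the "set MBR record" prompt in the original question.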


