2k8R2 Cluster: certain cluster group not coming online


2-node cluster (virtual, VMware)
Windows Server 2008 R2
SQL Server 2008 R2 (10.50.1600)

3 SQL Server clustered instances
1 clustered MSDTC

SAN storage using iSCSI connections

Here are my observations:

- We started getting IP conflict errors within the last month; the conflict occurs on the Windows Server cluster virtual IP.
- Clustered instances started failing for no apparent reason, seemingly at the same time (10 PM), and it has since started happening at 12 noon. We get errors about quorum being lost, followed by volumes being lost. It recovered through failover to the other node.

- After a while, one of the SQL Server instances came online on one node and two came online on one node (i.e. no failover).
- I have one of those SQL Server instances that will not come online on either of the nodes. The peculiar thing is that while attempting to bring it online (during 'Online Pending') I am able to connect and query data using SSMS. After the set timeout for the resource to come online expires, it goes to 'Failed' and is not accessible. Here are the errors the SQL Server instance is producing:

[sqsrvres] odbc sqldriverconnect failed
[sqsrvres] checkodbcconnecterror: sqlstate = 08001; native error = ffffffff; message = [microsoft][sql server native client 10.0]sql server network interfaces: error locating server/instance specified [xffffffff].
[sqsrvres] odbc sqldriverconnect failed
[sqsrvres] checkodbcconnecterror: sqlstate = hyt00; native error = 0; message = [microsoft][sql server native client 10.0]login timeout expired
[sqsrvres] odbc sqldriverconnect failed
[sqsrvres] checkodbcconnecterror: sqlstate = 08001; native error = ffffffff; message = [microsoft][sql server native client 10.0]a network-related or instance-specific error has occurred while establishing a connection to sql server. server is not found or not accessible. check if instance name is correct and if sql server is configured to allow remote connections. for more information see sql server books online.
[sqsrvres] odbc sqldriverconnect failed
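
To see at a glance which ODBC SQLSTATE codes the resource DLL is reporting, the `[sqsrvres]` lines can be tallied with a short script. This is only a minimal sketch: the function names `parse_sqsrvres` and `count_states` are made up for illustration, and the regular expression assumes the line layout shown in the log excerpt above. In standard ODBC terms, 08001 means the client could not establish a connection and HYT00 means a timeout expired.

```python
import re
from collections import Counter

# Matches the "sqlstate = ...; native error = ...; message = ..." part of a
# [sqsrvres] checkodbcconnecterror line (layout assumed from the excerpt above).
PATTERN = re.compile(
    r"sqlstate\s*=\s*(?P<state>\w+);\s*"
    r"native error\s*=\s*(?P<native>\w+);\s*"
    r"message\s*=\s*(?P<msg>.*)",
    re.IGNORECASE,
)


def parse_sqsrvres(lines):
    """Return (sqlstate, native_error, message) tuples for matching lines."""
    entries = []
    for line in lines:
        m = PATTERN.search(line)
        if m:
            entries.append((m["state"].upper(), m["native"], m["msg"].strip()))
    return entries


def count_states(lines):
    """Count how often each SQLSTATE appears."""
    return Counter(state for state, _, _ in parse_sqsrvres(lines))


if __name__ == "__main__":
    sample = [
        "[sqsrvres] odbc sqldriverconnect failed",
        "[sqsrvres] checkodbcconnecterror: sqlstate = 08001; native error = ffffffff; message = error locating server/instance specified",
        "[sqsrvres] checkodbcconnecterror: sqlstate = hyt00; native error = 0; message = login timeout expired",
    ]
    for state, count in count_states(sample).items():
        print(state, count)
```

Lines without a `sqlstate` field, such as the bare "odbc sqldriverconnect failed" entries, are skipped.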

Then there is this informational log entry:

fault bucket , type 0
event name: wsfc resource deadlock
response: not available
cab id: 0
problem signature:
p1: sql server (mvstg)
p2: sql server
p3: onlineresource
p4:
p5:
p6:
p7:
p8:
p9:
p10:

attached files:

these files may available here:
c:\programdata\microsoft\windows\wer\reportqueue\critical_sql server (mvst_f6c6c38b8673478d1cca2c1659ac3f41af00e9_1507c067

analysis symbol:
rechecking solution: 0
report id: ef126730-ee89-11e0-a201-005056be606f
report status: 4

I have checked the file specified in the informational message and found no errors.

Any help would be amazing!

I have seen this behaviour as well. The first thing is to make sure you exclude the cluster service (clussvc.exe) from antivirus scanning.

Also make sure the private network is using a unique subnet available to the nodes of the cluster you are configuring. The private cluster network can use the following blocks of IP addresses, since they are reserved for private use:

10.0.0.0 - 10.255.255.255 (class A)

172.16.0.0 - 172.31.255.255 (class B)

192.168.0.0 - 192.168.255.255 (class C)
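
As a quick sanity check, the subnet planned for the private (heartbeat) network can be tested against those three blocks, and against the public cluster network to confirm the two do not overlap. This is a rough sketch using Python's standard `ipaddress` module; the helper names and the example subnets are hypothetical and are not taken from the cluster in question.

```python
import ipaddress

# The three private-use blocks listed above, written in CIDR form.
PRIVATE_BLOCKS = [
    ipaddress.ip_network("10.0.0.0/8"),      # 10.0.0.0 - 10.255.255.255
    ipaddress.ip_network("172.16.0.0/12"),   # 172.16.0.0 - 172.31.255.255
    ipaddress.ip_network("192.168.0.0/16"),  # 192.168.0.0 - 192.168.255.255
]


def in_private_block(subnet: str) -> bool:
    """True if the IPv4 subnet lies entirely inside one of the private blocks."""
    net = ipaddress.ip_network(subnet, strict=False)
    return any(net.subnet_of(block) for block in PRIVATE_BLOCKS)


def subnets_overlap(a: str, b: str) -> bool:
    """True if the two subnets share any address."""
    net_a = ipaddress.ip_network(a, strict=False)
    net_b = ipaddress.ip_network(b, strict=False)
    return net_a.overlaps(net_b)


if __name__ == "__main__":
    heartbeat = "192.168.50.0/24"  # hypothetical private cluster network
    public = "10.20.30.0/24"       # hypothetical public/client network
    print("heartbeat in private block:", in_private_block(heartbeat))
    print("overlaps public network:", subnets_overlap(heartbeat, public))
```

If the second check reports an overlap, the heartbeat network is not on a unique subnet and should be moved.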

 

 

It looks like the issue is that cluster communications are interrupted long enough that each node thinks the other node is down, and there is contention for resources (this can be caused by Group Policy).

And when the IP fails, the NIC does not release the IP in due time.

 

Hope this helps!


