Discussion:
DPM 2010 RC Backups Seem to Crash Cluster Service
Rich McCue
2010-03-22 21:35:01 UTC
Hi,

We are protecting a number of Hyper-V (2008 R2 Enterprise) hosts using DPM
2010 RC. We have two clusters of 4 nodes, with each node running 4 virtual
machines. Our storage is on a NetApp 3140 connected via Emulex HBAs, and all
the servers run the NetApp DSM 3.3.1, NetApp Host Utilities 5.2 and SnapDrive
6.2. Everything seems fine under normal conditions and when we create
recovery points manually, but when our scheduled DPM backups run at 8 PM we
find that the server appears to reboot and all of the guests on the affected
node move to another node. However, the reboot doesn't appear to generate a
blue screen.

There appears to be no pattern to this, and I've tried running different
guests on different hosts, but nothing seems to help. The only common factor
seems to be that multiple backups are running at the same time on the nodes
that fail.

Here is one of the events logged when the problem occurs:

Time: 20:00:44
EventID: 1135
Cluster node 'SERVER' was removed from the active failover cluster
membership. The Cluster service on this node may have stopped. This could
also be due to the node having lost communication with other active nodes in
the failover cluster. Run the Validate a Configuration wizard to check your
network configuration. If the condition persists, check for hardware or
software errors related to the network adapters on this node. Also check for
failures in any other network components to which the node is connected such
as hubs, switches, or bridges.

Time: 20:03:00
EventID: 5121
Cluster Shared Volume 'Volume1' ('HyperV-CSV') is no longer directly
accessible from this cluster node. I/O access will be redirected to the
storage device over the network through the node that owns the volume. This
may result in degraded performance. If redirected access is turned on for
this volume, please turn it off. If redirected access is turned off, please
troubleshoot this node's connectivity to the storage device and I/O will
resume to a healthy state once connectivity to the storage device is
reestablished.

Time: 20:04:50
EventID: 41
The system has rebooted without cleanly shutting down first. This error
could be caused if the system stopped responding, crashed, or lost power
unexpectedly.
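Lining the three events up against the 8 PM schedule makes the sequence easier to follow. A minimal Python sketch, assuming the backup starts exactly at 20:00:00; the date and the short labels are placeholders, and only the times and event IDs come from the log above:

```python
from datetime import datetime

# Assumed backup start: the 8 PM schedule (the date part is a placeholder).
BACKUP_START = datetime(2010, 3, 22, 20, 0, 0)

# (time, event ID, short label) taken from the log excerpt above.
EVENTS = [
    ("20:00:44", 1135, "node removed from cluster membership"),
    ("20:03:00", 5121, "CSV no longer directly accessible"),
    ("20:04:50", 41, "unclean reboot detected"),
]

def offsets_from_start(events, start):
    """Return (seconds after start, event ID, label) tuples, sorted by time."""
    rows = []
    for hhmmss, event_id, label in events:
        # strptime gives year 1900; move the time onto the start date.
        t = datetime.strptime(hhmmss, "%H:%M:%S").replace(
            year=start.year, month=start.month, day=start.day)
        rows.append((int((t - start).total_seconds()), event_id, label))
    return sorted(rows)

for secs, event_id, label in offsets_from_start(EVENTS, BACKUP_START):
    print(f"+{secs // 60}m{secs % 60:02d}s  Event {event_id}: {label}")
```

Event 41 is written when the machine starts back up, so its timestamp marks the restart rather than the moment the node actually failed.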

I've installed the 2008 R2 hotfixes that should resolve the CSV problems on
both the host and guest operating systems, but we still get this strange
restart problem.

Can anyone shed any light on this problem?

Thanks,
Richard
Shyama Hembram[MSFT]
2010-03-23 05:48:22 UTC
Is host and cluster communication happening on the same network?
Is the cluster communication happening over a Gbps LAN? Can you increase the
cluster heartbeat timeout by running the following on one of the nodes of the
cluster:

cluster.exe /prop ClusSvcHangTimeout=120
--
Thanks
Shyama
[This posting is provided "AS IS" with no warranties, and confers no rights]
Rich McCue
2010-03-23 08:13:02 UTC
Hi,

Cluster comms and management traffic occur across the same 1 Gbps NIC; the
Hyper-V guests have their own dedicated 1 Gbps NIC. Unfortunately we are
limited to two NICs per blade, so we can't separate this any further.

I've now updated the timeout as per your recommendation. I'll update this
post later once I know whether the system is still going down.

Thanks,
Richard
Rich McCue
2010-03-25 09:08:01 UTC
Hi again,

I've come in today to check the setup and we have experienced the cluster
crash once again. Below is the event chain up to the failure:

20:00 DPM backup of VMs begins

20:04 Navssprv starts the VSS hardware provider
20:04 SnapDrive states that a snapshot has been successfully created
20:04 Navssprv - 'Data ONTAP VSS hardware provider has successfully
completed CommitSnapshots for SnapshotSetId
{da581d58-74d3-45f6-b5df-2e86e5e0ae3c} in 234 milliseconds'
20:05 Navssprv and SnapDrive map a LUN.

Time: 20:05:21
EventID: 1000
Data: Faulting application name: clussvc.exe, version: 6.1.7600.16385, time
stamp: 0x4a5bc614
Faulting module name: KERNELBASE.dll, version: 6.1.7600.16385, time stamp:
0x4a5bdfe0
Exception code: 0x80000003
Fault offset: 0x0000000000032442
Faulting process id: 0x3fbc
Faulting application start time: 0x01cac9317842c77c
Faulting application path: C:\Windows\Cluster\clussvc.exe
Faulting module path: C:\Windows\system32\KERNELBASE.dll
Report Id: 92cbeed2-3780-11df-88f4-001e0b61d0a2

Time: 20:05:21
EventID: 1001
Data: Fault bucket , type 0
Event Name: APPCRASH
Response: Not available
Cab Id: 0

Problem signature:
P1: clussvc.exe
P2: 6.1.7600.16385
P3: 4a5bc614
P4: KERNELBASE.dll
P5: 6.1.7600.16385
P6: 4a5bdfe0
P7: 80000003
P8: 0000000000032442
P9:
P10:

Attached files:

These files may be available here:
C:\ProgramData\Microsoft\Windows\WER\ReportQueue\AppCrash_clussvc.exe_7351f8239c1232a3331892369f231b119a972bd_8b2f5b28

Analysis symbol:
Rechecking for solution: 0
Report Id: 92cbeed2-3780-11df-88f4-001e0b61d0a2
Report Status: 4
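Two fields in the Event 1000 record above are easier to read once decoded. "Faulting application start time" is a Windows FILETIME (100-nanosecond ticks since 1 January 1601 UTC), and exception code 0x80000003 is the NTSTATUS value STATUS_BREAKPOINT. A minimal Python sketch; the helper names and the one-entry lookup table are illustrative, not part of any Windows API:

```python
from datetime import datetime, timedelta, timezone

# Windows FILETIME: count of 100-nanosecond intervals since 1601-01-01 00:00 UTC.
FILETIME_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)

# Illustrative lookup only; 0x80000003 is STATUS_BREAKPOINT.
EXCEPTION_NAMES = {0x80000003: "STATUS_BREAKPOINT"}

def filetime_to_utc(filetime: int) -> datetime:
    """Convert a 64-bit FILETIME value to a timezone-aware UTC datetime."""
    # 100 ns ticks -> microseconds; the sub-microsecond remainder is dropped.
    return FILETIME_EPOCH + timedelta(microseconds=filetime // 10)

def describe_exception(code: int) -> str:
    """Return a readable name for a known exception code, else its hex form."""
    return EXCEPTION_NAMES.get(code, f"0x{code:08X}")

start = filetime_to_utc(0x01CAC9317842C77C)  # "Faulting application start time"
print("clussvc.exe started:", start.isoformat())
print("Exception:", describe_exception(0x80000003))
```

Applied to the values above, this puts the start of the clussvc.exe process that later faulted at 2010-03-21 20:02:44 UTC.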

The backup seems to succeed, but the cluster resources fail over to other
nodes. I then get various events from SnapDrive and Navssprv saying that the
LUNs were successfully unmapped and deleted around 1 minute later.

Thanks,
Rich
Rich McCue
2010-03-29 08:06:01 UTC
Hi,

Has anyone got any further ideas for this problem?

Thanks,
Richard
Perry van Erning
2010-04-02 20:46:01 UTC
Hi,

We have the same problem, only we use NetApp SnapManager for Hyper-V, and we
get the same error 5121 on the hosts. Microsoft and NetApp are asking me to
install DPM 2010 RC and try to back up with it, so they can see whether the
problem is caused by the NetApp DSM or not. You could try uninstalling the
NetApp Data ONTAP (MPIO) DSM and using the Microsoft MPIO DSM instead.

Also, you could run the following command in a command prompt on the cluster:

cluster.exe res "CSV1" /priv

This gives you a list of private properties.
What is the DiskSignature property? Is it 0 (0x0)?
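This check can be scripted against the saved output. A minimal Python sketch, assuming the /priv listing shows DiskSignature as the property name followed by its decimal value on one line; the sample line is hypothetical, built from the value Rich reports later in the thread, and the helper name is illustrative:

```python
import re
from typing import Optional

def disk_signature(priv_output: str) -> Optional[int]:
    """Extract the DiskSignature value from saved `cluster.exe res ... /priv` output.

    Assumed layout (not verified against every cluster.exe version):
    "<type> <resource> DiskSignature <decimal> (<hex>)" on a single line.
    """
    match = re.search(r"\bDiskSignature\s+(\d+)", priv_output)
    return int(match.group(1)) if match else None

# Hypothetical sample line using the value reported in this thread.
sample = "D  CSV1                 DiskSignature                  3032226888 (0xb4bc1c48)"
sig = disk_signature(sample)
if sig is None:
    print("DiskSignature not found")
elif sig == 0:
    print("DiskSignature is 0 (0x0)")
else:
    print(f"DiskSignature is {sig} (0x{sig:08x})")
```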

Regards,

Perry
Rich McCue
2010-04-07 15:41:02 UTC
Hi Perry,

Thanks for the update. I'll try it without the NetApp DSM on a test platform
and let you know. Our DiskSignature is 3032226888 (0xb4bc1c48).

Please could you let me know if you find anything out regarding this problem.

Thanks,
Rich
Perry van Erning
2010-04-15 13:56:01 UTC
Hi Richard,

We have to accept the 5121 errors in the cluster, because this is normal
behaviour when you back up the CSV.

DPM or SnapManager for Hyper-V backs up the VMs per host and puts the
CSV in a backup state. Communication to the CSV from the other cluster
nodes (hosts) is then redirected: the other hosts connect through
another network (card) to the owner of the CSV (the one being backed up),
and from there to the Cluster Shared Volume.

When this happens you get the 5121 error. NetApp says there will be a
KB article on the Microsoft site soon.

Regards,

Perry
Jako
2011-10-20 08:20:28 UTC
I don't think that error 5121 is normal, or that we have to accept it.

If I don't have room for hardware snapshots, then I don't get errors, just information event ID 5140: Cluster Shared Volume 'Volume1' ('Cluster Disk 2') backup was turned on. Once the backup application completes the backup process, the Cluster Shared Volume backup mode will be turned off. If the backup application has not initiated a snapshot using the Volume Shadow Copy Service within 30 minutes, Cluster Shared Volume backup will be turned off.

And when the backup ends, I get event 5122: Cluster Shared Volume 'Volume1' ('Cluster Disk 2') has now resumed normal operation.

But if I have room for a hardware snapshot, then I get error 5121, and after 2-4 minutes event 5122: Cluster Shared Volume 'Volume1' ('Cluster Disk 2') has now resumed normal operation.

So it's related to the hardware VSS provider; maybe it isn't working properly.

I use an IBM DS3512 and the LSI VSS hardware provider software.
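The comparison above comes down to which event opens the backup window for a volume and how long it takes to reach 5122. A minimal Python sketch, assuming the CSV events for one volume have already been exported as (seconds, event ID) pairs; the helper name and input shape are hypothetical:

```python
from typing import List, Optional, Tuple

def classify_csv_backup(events: List[Tuple[int, int]]) -> Tuple[str, Optional[int]]:
    """Classify one CSV backup window from (timestamp_seconds, event_id) pairs.

    Returns a label for the opening event (5140 or 5121) and the seconds until
    event 5122 (normal operation resumed), or None if 5122 never appears.
    """
    events = sorted(events)
    opening = next(((t, e) for t, e in events if e in (5140, 5121)), None)
    if opening is None:
        return "no backup-mode or redirected-access event", None
    start, event_id = opening
    label = ("backup mode turned on (5140)" if event_id == 5140
             else "direct access lost (5121)")
    resumed = next((t for t, e in events if e == 5122 and t >= start), None)
    return label, (resumed - start if resumed is not None else None)

# Hardware-snapshot case described above: 5121, then 5122 a few minutes later.
print(classify_csv_backup([(0, 5121), (180, 5122)]))
```

Run over several nights, this separates the quiet 5140 windows from the 5121 ones and shows how long each redirected period lasted.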
2010-06-07 13:39:49 UTC
Rich,

I'm having the same problem you mentioned here, with a different storage
solution and VSS provider. Were you able to resolve your issue?

Thanks in advance.

Chris
unknown
2010-06-30 09:04:41 UTC
Hi,

I am having the same problem with an EqualLogic PS6000 and the EqualLogic VSS provider installed on Hyper-V. This provider is compatible with DPM 2010 and Hyper-V R2.

Does anyone have a solution?

Thanks for any replies.
JBritto
2010-05-13 14:23:01 UTC
Hello, take a look at
http://fawzi.wordpress.com/2008/09/15/cluster-disk-0-does-not-support-persistent-reservation/