Rich McCue
2010-03-22 21:35:01 UTC
Hi,
We are protecting a number of Hyper-V (2008 R2 Enterprise) hosts using DPM
2010 RC. We have two clusters of 4 nodes, with each node running 4 virtual
machines. Our storage is on a NetApp 3140 connected via Emulex HBAs, and all
the servers run the NetApp DSM 3.3.1, NetApp host utils 5.2 and SnapDrive
6.2. Everything seems fine under normal conditions and when we create
recovery points manually, but when our scheduled DPM backups run at 8PM the
server appears to reboot and all of the guests on the affected node move to
another node. The reboot doesn't appear to generate a blue screen, however.
There appears to be no pattern to this. I've tried running different
guests on different hosts, but nothing seems to help. The only common factor
seems to be that multiple backups are occurring at the same time on the
nodes that fail.
Here are the events logged when the problem occurs:
Time: 20:00:44
EventID: 1135
Cluster node 'SERVER' was removed from the active failover cluster
membership. The Cluster service on this node may have stopped. This could
also be due to the node having lost communication with other active nodes in
the failover cluster. Run the Validate a Configuration wizard to check your
network configuration. If the condition persists, check for hardware or
software errors related to the network adapters on this node. Also check for
failures in any other network components to which the node is connected such
as hubs, switches, or bridges.
Time: 20:03:00
EventID: 5121
Cluster Shared Volume 'Volume1' ('HyperV-CSV') is no longer directly
accessible from this cluster node. I/O access will be redirected to the
storage device over the network through the node that owns the volume. This
may result in degraded performance. If redirected access is turned on for
this volume, please turn it off. If redirected access is turned off, please
troubleshoot this node's connectivity to the storage device and I/O will
resume to a healthy state once connectivity to the storage device is
reestablished.
Time: 20:04:50
EventID: 41
The system has rebooted without cleanly shutting down first. This error
could be caused if the system stopped responding, crashed, or lost power
unexpectedly.
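To make the timing of that sequence easier to see, here is a rough Python sketch that works out the gaps between the three events. It assumes all three timestamps are from the same evening and come from clocks that agree with each other, which may not hold if the events were logged on different nodes. The `events` list and the `gaps` helper are just illustrative names, not anything from DPM or the cluster tooling.

```python
from datetime import datetime

# Times and event IDs copied from the log excerpts above.
# Assumption: same day, and the clocks that logged them are in sync.
events = [
    ("20:00:44", 1135),  # node removed from failover cluster membership
    ("20:03:00", 5121),  # CSV no longer directly accessible (redirected I/O)
    ("20:04:50", 41),    # system rebooted without a clean shutdown
]

def gaps(events):
    """Return (earlier_id, later_id, seconds_between) for consecutive events."""
    parsed = [(datetime.strptime(t, "%H:%M:%S"), eid) for t, eid in events]
    return [
        (a_id, b_id, int((b_t - a_t).total_seconds()))
        for (a_t, a_id), (b_t, b_id) in zip(parsed, parsed[1:])
    ]

for earlier, later, seconds in gaps(events):
    print(f"Event {earlier} -> Event {later}: {seconds} s")
```

On those figures the whole sequence, from the node being dropped from the cluster to the unclean reboot being recorded, spans a little over four minutes.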
I've installed the 2008 R2 hotfixes that should resolve the CSV problems on
both the host and guest operating systems, but we still get this strange
restart problem.
Can anyone shed any light on this problem?
Thanks,
Richard