Thursday, November 18, 2010

Clearing broken-off snapmirror relationships

We have migrated many vFilers and volumes recently, and some of these migrations left behind old SnapMirror relationships in the "Broken-off" state (as reported by snapmirror status).
The snapmirror.conf file no longer referred to these volumes, so this is internal state that needs to be cleared.
It turns out the method that works is snapmirror release:

snapmirror release {srcpath} {dstfiler}:{dstpath}

where {srcpath} and {dstpath} are either {volname} or /vol/{volname}/{qtreename}; the CLI help describes the command as "remove destinations".
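For example, a hedged walk-through with hypothetical names (volume vol_app1 mirrored to a filer called dstfiler, qtree qt1); take the real paths from the Source and Destination columns of snapmirror status:

snapmirror status
snapmirror release vol_app1 dstfiler:vol_app1
snapmirror release /vol/vol_app1/qt1 dstfiler:/vol/vol_app1/qt1
snapmirror status

(the first release form is for a volume SnapMirror, the second for a qtree SnapMirror)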

I received an error when I issued the command:

No release-able destination found that matches those parameters. Use 'snapmirror destinations' to see a list of release-able destinations.

But a subsequent snapmirror status showed the Broken-off record had been cleared.
(Don't always believe what ONTAP tells you! :)
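For the record, the error text points at snapmirror destinations, which lists what ONTAP considers release-able for local sources. A quick hedged check (vol_app1 is again a placeholder):

snapmirror destinations
snapmirror destinations vol_app1

If the volume no longer appears there, there is nothing left to release, which is presumably why the command complained even though the stale status record still went away.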

Friday, November 12, 2010

vFiler migrate NetApp lockup

While we were using NetApp Provisioning Manager to migrate vFilers from an older 3040 to a new 3170, we ran into a bug that caused the NetApp head to drop off the network.
On the RLM console I observed many failed operations with the message:

error=No buffer space available

After consulting with NetApp support, the recommendation was to apply the solution for bug 90314:

http://now.netapp.com/NOW/products/csb/csb0803-03.shtml

http://now.netapp.com/NOW/cgi-bin/bol?Type=Detail&Display=90314

Specifically, we set these hidden options to de-prioritize volume-deletion-related operations (these operations had swamped the NetApp during an aborted vFiler migration, which is another, related issue):

options wafl.trunc.throttle.system.max 30
options wafl.trunc.throttle.vol.max 30
options wafl.trunc.throttle.min.size 1530
options wafl.trunc.throttle.hipri.enable off
options wafl.trunc.throttle.slice.enable off
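To confirm the values took, the options command accepts a prefix and lists everything that matches. Depending on the ONTAP release these hidden options may only be visible at advanced privilege (that part is an assumption, adjust to your environment):

priv set advanced
options wafl.trunc.throttle
priv set admin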

So far, we have not seen the issue again.

Monday, November 8, 2010

vMotion High CPU vmmemctl

I was investigating a strange issue today during a vMotion of a CentOS 5 Tomcat server (running 6 Tomcat containers).

source host: Dell R900, Intel Xeon E7450, ESX 4.1 build 260247

destination host: Dell R610, Intel Xeon 5680, ESXi 4.1 build 260247 (new)

I vMotioned the first of the two CentOS 5 VMs from the source to the destination while running top in the guest, and observed an unusual lag in top's refresh during the cutover.

After about 15 seconds of no refreshes the VM came back, and top showed a high load average with vmmemctl near the top of the process list. I also observed the "mem used" figure on this 8 GB VM go from 7.9 GB to 2.4 GB in the span of 10-40 seconds!

It turned out (after examining the esxtop memory view, reached by pressing 'm') that the VM was swapping, even though we were nowhere near memory overcommitment on these ESX hosts.

The SWCUR and SWTGT columns showed zeros for all the other well-behaved VMs, but these two app servers had recently had their memory allocation increased from 2 GB to 8 GB, and the
Edit Settings -> Resources -> Memory -> Unlimited checkbox
had not retained its unlimited value, so the VMs were effectively still capped at the old limit and anything above it was being ballooned and swapped.
Setting the limit back to unlimited and restarting the VM fixed the issue.
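If you want to double-check a VM's memory limit from the host shell rather than the GUI, one hedged option follows. The datastore path, VM name, and vmid are placeholders, and treating sched.mem.max as the .vmx representation of the limit (with -1 or absent meaning unlimited) is my assumption; verify against your environment:

# show the resource scheduling parameters recorded in the VM's .vmx file
grep sched.mem /vmfs/volumes/<datastore>/<vmname>/<vmname>.vmx

# or list VM ids and dump a VM's configuration, then look for the memoryAllocation limit in the output
vim-cmd vmsvc/getallvms
vim-cmd vmsvc/get.config <vmid>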