Thursday, April 26, 2012

Our recent forays into SSD and hybrid storage trials (startup vendor) have given me a renewed appreciation for the solid, staid, well-QA'ed feature sets of the tier 1 storage vendors - in this case NetApp's DataMotion.
DataMotion is to the datastore as vMotion is to the VM. DataMotion is "meta storage vMotion," and the benefits are analogous.
Our vSphere VMs run in vFiler NFS datastores on our NetApp (ONTAP 8.1) production cluster, which we replicate to our campus standby cluster. Just as vMotion allows for non-disruptive, zero-downtime hypervisor upgrades, DataMotion allows us to shift vFilers hosting dozens of VMs live, with zero downtime, between physical clusters separated by campus/metro-area distance.
We've employed DataMotion for the last 3-4 cluster upgrades - "DataMotioning" all vFilers off the production cluster to the standby cluster, then upgrading the evacuated cluster (ONTAP versions, plus hardware, trays, flash cache SSD) - all with zero downtime.
In this age of storage startups - each with their own brand-new filesystem that may or may not keep your data integrity protected - I have to commend the engineering acumen behind DataMotion. I have seen DataMotions fail (< 5% of the time), but as with failed vMotions, the failure happens prior to the cutover fail-safes, so it's robust: no downtime - correct the underlying issue and the next DataMotion succeeds.
We recently installed NetApp SSD flash cache - in a follow-up post I will use (unsupported, online) DataMotion between cluster heads to compare performance with and without the SSD flash cache.
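For context, the supported DataMotion for vFilers workflow is driven through NetApp Provisioning Manager; the sketch below is just to give a rough feel for what a vFiler move involves at the 7-Mode CLI level. Treat it as illustrative only - the vFiler and filer names are made up and the exact syntax varies by ONTAP release:

    # On the destination filer: start the migration (SnapMirror copy of the vFiler's volumes)
    vfiler migrate start vfiler_prod@filer-prod-01

    # Check progress of the data copy
    vfiler migrate status vfiler_prod@filer-prod-01

    # Cut over - the vFiler (and the NFS datastores it serves) comes online on this cluster
    vfiler migrate complete vfiler_prod@filer-prod-01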
Sunday, April 15, 2012
vExpert 2012
It's been a great privilege being associated with the vExpert 2011 class - whether meeting up at VMworld, PEX, and VMUGs, or leveraging the great knowledge base this diverse group represents, the experience has been richly rewarding on many levels!
So I am very pleased to be part of the just announced 2012 vExpert group!
Congratulations to those returning and new vExperts - if @sherrod's SVVMUG keynote last week was any indication, we have some very exciting times ahead!
Thursday, April 5, 2012
ESXi Reboot Causes ARP flood
We recently encountered a very strange issue where rebooting one of our ESXi 5.0 hosts would cause network disruption (NTI Enviromux devices would drop off the network, and older switches would experience packet loss and VLAN reconfig flapping).
Eventually, with a packet capture, we determined the host was flooding the subnet with ARP requests related to one of its NFS datastores. When we removed this datastore, the host could reboot without causing the network disruption - but we needed this datastore to satisfy the HA requirement for two heartbeat datastores.
The datastores are all provisioned from NetApp, with the exports file listing each host's IP address for both rw and root access. Upon closer inspection, the problem host's IP was listed only in the rw section of the NetApp exports file. So, while the datastore would show up as mounted, ESXi was not given the full access it needed for HA heartbeat functionality, and the result was this flood of ARP.
Once the exports file was updated with the host's IP in the root section, the datastore remounted with full permissions and the host stopped exhibiting the problematic ARP/network disruption behavior.
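For illustration, the corrected 7-Mode /etc/exports entry looks roughly like this, with the host's IP present in both the rw and root lists (the volume name and IPs here are made up):

    # /etc/exports on the NetApp - host 10.0.10.21 must appear in rw= AND root=
    /vol/vm_nfs_ds01  -sec=sys,rw=10.0.10.21:10.0.10.22,root=10.0.10.21:10.0.10.22

    # Re-export after editing so the change takes effect
    exportfs -a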
We still have a case open with VMware to determine why ESXi 5.0 Update 1 behaves this way (allowing an NFS mount without the access required for full functionality, and unleashing an ARP flood in the process).
Tuesday, April 3, 2012
vCenter SQL Max Server Memory
We encountered an error on our vCenter 5 server today:

SQL Server failed with error code 0xc0000000 to spawn a thread to process a new login or connection. Check the SQL Server error log and the Windows event logs for information about possible related problems.
Our vCenter VM is Windows 2008 (64-bit) with MS SQL 2008. The VM is allocated 16 GB of RAM, but I've noticed for a while that no matter how much memory is allocated, the SQL Server process will eventually grow to fill it. This error got me digging for a solution.
As it turns out, in SQL 2008 the memory allocation for SQL Server is dynamic, and this growth in SQL Server memory use is expected - UNLESS you set "Maximum Server Memory".
This is done simply via SSMS (SQL Server Management Studio):

Right-click the server instance -> Properties -> Memory -> Maximum server memory
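The same setting can also be applied with T-SQL via sp_configure; a quick sketch, assuming a 10 GB cap (adjust the MB value to whatever you want to leave free for the OS and vCenter services):

    -- Enable advanced options so 'max server memory' is visible to sp_configure
    EXEC sp_configure 'show advanced options', 1;
    RECONFIGURE;

    -- Cap SQL Server at 10 GB (value is in MB)
    EXEC sp_configure 'max server memory (MB)', 10240;
    RECONFIGURE;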
The change is effective immediately - in Task Manager I saw our SQL Server memory usage DROP from 13 GB to 10 GB. Limiting the SQL memory should actually help other vCenter processes and jobs - like the vCheck5 report, which has been taking > 3 hours lately. Will it? Tune in tomorrow to find out!