Monday, December 9, 2013

vNFS datastore for Cluster upgrades

In our pre-5.1 environment we have multiple clusters without centralized storage.
This is a problem when the cluster nodes need to be upgraded: we cannot evacuate the nodes with vMotion, because the VMs are on local disk.
Here is a solution, provided your hosts have adequate local disk: a vNFS datastore.

Concept:
Present an NFS datastore to the cluster via a new VM using local storage.

Practice:
1 - Configure a CentOS/Linux VM on the local storage and share that storage via NFS
2 - Add the NFS datastore to the cluster nodes
3 - Storage vMotion the VMs off the node to be upgraded, one at a time (monitor the load on the NFS VM with top; throughput depends mostly on disk I/O and secondarily on the network)
4 - vMotion the VMs off the node, now that they are on the NFS datastore
5 - Put the host in maintenance mode and upgrade it
6 - Reverse the procedure to migrate the VMs back to the upgraded host and its local storage
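Steps 1 and 2 can be sketched as below. This is a minimal sketch, assuming a CentOS 6-era VM (SysV "service" scripts) and ESXi 5.x hosts that provide "esxcli storage nfs add". The function name vnfs_commands, the IP addresses, the export path and the datastore name are hypothetical placeholders. The helper only prints the commands, since they run on different machines (the NFS VM, then each ESXi host), so you can review them before pasting. Note that ESXi mounts NFS as root, so the export must use no_root_squash.

```shell
#!/bin/sh
# Hypothetical helper: prints (does not run) the commands for steps 1 and 2.
# Arguments (placeholders, adjust for your environment):
#   $1 IP address of the CentOS NFS VM
#   $2 directory to export (on the VM's local-storage-backed disk)
#   $3 subnet of the ESXi VMkernel ports allowed to mount the export
#   $4 datastore name as it should appear in vCenter
vnfs_commands() {
    nfs_ip=$1 export_dir=$2 esxi_subnet=$3 ds_name=$4

    echo "# On the CentOS NFS VM (as root):"
    echo "mkdir -p $export_dir"
    # ESXi mounts NFS as root, so root squashing must be disabled.
    echo "echo '$export_dir $esxi_subnet(rw,sync,no_root_squash)' >> /etc/exports"
    echo "service rpcbind start"
    echo "service nfs start"
    echo "exportfs -ra"

    echo "# On each ESXi host in the cluster:"
    echo "esxcli storage nfs add --host=$nfs_ip --share=$export_dir --volume-name=$ds_name"
}

# Example with placeholder values.
vnfs_commands 192.168.10.50 /vnfs 192.168.10.0/24 vNFS
```

Run the printed lines in order: first on the NFS VM, then on every host in the cluster. The datastore can also be added per host through the vSphere Client's Add Storage wizard instead of esxcli.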

With local-to-local storage vMotion in 5.1 and the upcoming vSAN offering, this is less of an issue, but in the interim it has saved us a lot of downtime.


Tuesday, October 22, 2013

vCenter 5.5 upgrade fails and rolls back

Upgrading our 5.1 vCenter to 5.5, we ran into this error:

"Simple Install Setup Wizard ended prematurely because of an error"



This turned out to be caused by a mismatch between the vCenter IP address stored in the registry and the expected FQDN.
VMware KB 2060511 provides the solution: clean up the failed upgrade directory and modify the registry setting.

Sunday, April 14, 2013

unregistering virtual machine hangs

We were migrating 40 VMs from one NetApp filer to another and implementing IPspaces at the same time, so we needed to power down all 40 VMs.
Once the VMs were migrated, we powered on the DB VMs first - no issues.
Once the DBA verified Oracle was up, I started powering on app VMs.
But they all hung on "unregistering virtual machine from source host system".

It turned out this was caused by three VMware Tools updates that were hanging; I had started these updates while the DBA was working on Oracle.
They were not running on the app VMs I was trying to power on, but once two of the hung VMware Tools updates finally timed out, some of the app VMs finished powering on.
I manually killed the remaining hung VMware Tools update from the ESXi command line, and then everything proceeded normally.
Hope this helps someone else, since at the time it was not obvious that the VMware Tools updates were preventing other VMs from powering on.

Friday, February 8, 2013

vCenter 5.1 SSO login failed

I've worked through a few SSO issues, but this one was new. Newly provisioned users with a Power VM type role were failing to authenticate ("login or password incorrect") in the classic vSphere Client, and got "ns0:RequestFailed: Internal Error while creating SAML 2.0 Token" in the Web Client.

VMware finally posted KB 2043070 on 02/08/2013, and it solved this issue for my users, even though I was not seeing the log errors mentioned in the KB article.

To resolve this issue, remove the localOS identity source from vCenter Server Single Sign-On (SSO).


To remove the localOS identity source from the SSO configuration:

  1. Log in to the vSphere Web Client as the SSO administrator, admin@system-domain.
  2. Click Sign-On and Discovery.
  3. Click Configuration.
  4. Identify the Local Identity Source. Its domain name should match the machine name.
  5. Right-click the Local Identity Source and click Delete Identity Source.

There is another, older KB article, 2034798, which is the top Google hit for this error. It was a time-waster for me: it covers misconfigured AD DNS, which was not relevant in my case.

Hope this saves some folks and their users time and aggravation!

Also make sure your vCenter service can restart after this change.
Ours failed to start, with an error in the vpxd log under c:\programdata\vmware\vmware virtualcenter\logs\ (programdata is hidden by default).
This was because the service account the vCenter service runs as was missing from the VPX_ACCESS table in the vCenter database.
We followed KB 1005680 and inserted the missing row with:

insert into vpx_access (ID, PRINCIPAL, ROLE_ID, ENTITY_ID, FLAG) values ('100', 'DOM\dom.user', '-1', '1', '1');