VC Troubleshooting
Troubleshooting Guide
CONTENTS
VSPHERE TROUBLESHOOTING INTRODUCTION
vSphere Troubleshooting Tools
VMware Command Line Tools
VMware Log Locations for Troubleshooting
The vSphere Syslog Collector
The VM Support Command
The vCenter Bash Shell
CONCLUSION
ABOUT ALTARO
ABOUT RYAN BIRK
FOLLOW ALTARO
VSPHERE TROUBLESHOOTING INTRODUCTION
Before we begin, let’s introduce a few things that will make life easier. We’ll start with a troubleshooting methodology and how to gather logs. After that, the eBook is broken into the following sections: Installation, Virtual Machines, Storage, Networking, vCenter/ESXi and Clustering (HA, DRS and vMotion).
ESXi and vSphere problems arise from many different places, but they generally fall into
one of these categories:
• Hardware issues
• Resource contention
• Network attacks
• Software bugs
• Configuration problems
When a host has a problem, one of the first things you should do is try to reproduce the issue. If you can reproduce it, you have a reliable way to confirm that the issue is resolved once you fix it. It also helps to benchmark your systems before they go into production. If you know HOW they should be running, it’s easier to pinpoint a problem.
At the virtual machine level, could a limit or share value be misconfigured?
At the ESXi host level, you may simply need more resources. It’s hard to believe sometimes, but you might need another host to help with the load!
Once you have identified the root cause, assess the impact of the problem on your day-to-day operations. When should you implement a fix, and what kind: a short-term workaround or a long-term solution? Assess the impact of that solution on daily operations as well.
―― A set of esxcfg-* commands: The esxcfg commands are deprecated, but you will likely still see them in older documentation. The recommendation today is to use esxcli.
―― The host shell can be accessed in a couple of different ways: through the local DCUI (Direct Console User Interface) or via SSH.
• Local access by using the Direct Console User Interface (DCUI):
2. Access the ESXi Shell from the DCUI by pressing Alt-F1 after
logging in.
3. When finished, disable the ESXi Shell service.
• Remote access by using SSH:
1. Enable the SSH service on your ESXi host, either in the DCUI or through the vSphere Web Client.
2. Use PuTTY or your preferred SSH Client to access the ESXi host.
• vSphere Management Assistant (vMA; deprecated, and 6.5 is the final release):
• esxcli
• vmware-cmd
• vicfg-* commands
• VMware PowerCLI:
VMWARE LOG LOCATIONS FOR TROUBLESHOOTING
VMware stores logs for its products in various locations. When you’re having problems, it’s important to know where to look so you can find answers quickly and efficiently.
■■ %ALLUSERSPROFILE%\VMWare\vCenterServer\logs
■■ /var/log/vmware/
• Includes logs for SSO, Inventory Service and the Web Client.
◦◦ vCenter vpxd.log
■■ This log file is the main vCenter Server log file. If you ever contact
VMware for support, it is highly likely that they will ask you for this file.
Don’t confuse it with vpxa, which is the vCenter agent that runs on the ESXi hosts.
◦◦ You can monitor and view the logs through the vSphere Web Client under the Monitor tab (Figure 1), in an SSH session at /var/log (Figure 2), or in the DCUI under “View System Logs” in System Customization (Figure 3).
(Figure 1)
(Figure 2)
(Figure 3)
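When you first open a large vpxd.log, it can help to get a rough picture of how many warnings and errors it contains before you read it line by line. The Python sketch below counts log lines by severity and keeps the error lines. It assumes the vCenter 6.x line layout, in which a timestamp is followed by a level keyword such as info, warning or error. That layout is an assumption, so compare it with your own log first.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple

# Assumed layout: "<ISO timestamp> <level> vpxd[<thread>] [...] <message>"
LINE_RE = re.compile(r"^\S+\s+(verbose|info|warning|error)\b", re.IGNORECASE)


def triage(lines: Iterable[str]) -> Tuple[Counter, List[str]]:
    """Count log lines per level and keep the error lines."""
    counts: Counter = Counter()
    errors: List[str] = []
    for line in lines:
        match = LINE_RE.match(line)
        if match is None:
            continue  # continuation line or a layout this sketch does not know
        level = match.group(1).lower()
        counts[level] += 1
        if level == "error":
            errors.append(line.rstrip("\n"))
    return counts, errors
```

To use it on the appliance, open the log (usually /var/log/vmware/vpxd/vpxd.log, but check your version) and pass the file object to triage(). The last few entries in the error list are often the best place to start.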
THE VSPHERE SYSLOG COLLECTOR
You can gather logs from the locations above, or you can set up a single location that all of your ESXi hosts send their logs to. The vSphere Syslog Collector uses port 514 for TCP and UDP, and port 1514 for SSL. It is installed with both the Windows-based vCenter and the vCenter Appliance.
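You can check that a machine can reach the collector by sending it a hand-built message. The Python sketch below formats a BSD-style (RFC 3164) syslog line and sends it over UDP port 514. In that format the priority value is facility * 8 + severity. The host name and tag are placeholders. The test only exercises the UDP path, not TCP or SSL on port 1514.

```python
import socket
from datetime import datetime

FACILITY_USER = 1   # RFC 3164 facility code for user-level messages
SEVERITY_INFO = 6   # RFC 3164 severity code for informational messages


def priority(facility: int, severity: int) -> int:
    """PRI value = facility * 8 + severity (RFC 3164)."""
    return facility * 8 + severity


def format_message(hostname: str, tag: str, text: str, when: datetime,
                   facility: int = FACILITY_USER,
                   severity: int = SEVERITY_INFO) -> bytes:
    """Build '<PRI>Mmm dd hh:mm:ss HOST TAG: TEXT' (day padded with a space)."""
    stamp = f"{when.strftime('%b')} {when.day:2d} {when.strftime('%H:%M:%S')}"
    return f"<{priority(facility, severity)}>{stamp} {hostname} {tag}: {text}".encode()


def send_udp(collector: str, message: bytes, port: int = 514) -> None:
    """Send one datagram; UDP gives no delivery confirmation."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(message, (collector, port))
```

On an ESXi host, the remote log target itself is configured with `esxcli system syslog config set --loghost=udp://<collector>:514`. Apply the change with `esxcli system syslog reload`.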
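THE VM SUPPORT COMMAND
The vm-support command, run from the ESXi Shell, collects the following into a single bundle:
• Log files
• System status
• Configuration files
The tool does not require any arguments. It creates a compressed (.tgz) bundle named with the host name and time stamp.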
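You rarely need everything in a support bundle. The Python sketch below uses the standard tarfile module, which reads gzip-compressed tar archives like the ones vm-support writes. It lists the members whose file names match a few common ESXi logs and reads one of them without extracting the whole archive. The set of log names is an assumption. The folder layout inside a bundle varies between ESXi versions, which is why the sketch matches on file name only.

```python
import tarfile
from typing import List, Tuple

# Log names this sketch looks for; adjust to the logs you care about.
INTERESTING: Tuple[str, ...] = ("vmkernel.log", "hostd.log", "vpxa.log", "vobd.log")


def find_logs(bundle_path: str, names: Tuple[str, ...] = INTERESTING) -> List[str]:
    """List bundle members whose file name is one of the interesting logs."""
    with tarfile.open(bundle_path, "r:gz") as bundle:
        return sorted(
            member.name
            for member in bundle.getmembers()
            if member.isfile() and member.name.rsplit("/", 1)[-1] in names
        )


def read_member(bundle_path: str, member_name: str) -> str:
    """Return one file from the bundle as text, without extracting the rest."""
    with tarfile.open(bundle_path, "r:gz") as bundle:
        handle = bundle.extractfile(member_name)
        return handle.read().decode("utf-8", errors="replace") if handle else ""
```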
PART 1: VSPHERE INSTALLATION TROUBLESHOOTING
This section covers some of the common issues with vSphere deployments. It is split into two parts: the first covers ESXi host troubleshooting during installation, and the second covers vCenter deployment issues at installation.
Specific drivers are tested and chosen for the VMware Compatibility Guide. If a device is not on the list, don’t expect support. VMware has a large partner ecosystem, and both hardware and software go through rigorous testing before they are certified for official support.
VMware also has community-supported drivers. This means that even though your hardware can work with ESXi, it is not running in a fully supported mode. This is useful for people who build home labs for practice.
Another point to remember during installation is that not all of your drivers may install automatically. If your hardware is newer than the installer, you might have to download a vSphere Installation Bundle, also called a VIB. A VIB is somewhat like a tarball or ZIP archive: it is a collection of files packaged into a single archive to make software deployment easier.
A VIB is made up of three parts:
• A file archive
• An XML descriptor file
• A signature file
The signature file is the electronic signature used to verify the VIB’s level of trust. The trust level will be one of the four listed below:
• VMwareCertified
• VMwareAccepted
• PartnerSupported
• CommunitySupported
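From most to least trusted, the four trust levels are VMwareCertified, VMwareAccepted, PartnerSupported and CommunitySupported. The host has an acceptance level of its own, and a VIB can be installed only if it is at least as trusted as the host’s setting. This is why community drivers require the host to be set to CommunitySupported. The Python sketch below models that ordering; the function name is illustrative. On a real host, you can view the setting with `esxcli software acceptance get` and change it with `esxcli software acceptance set --level=<level>`.

```python
from enum import IntEnum


class Acceptance(IntEnum):
    """VIB acceptance levels, ordered from least to most trusted."""
    CommunitySupported = 1
    PartnerSupported = 2
    VMwareAccepted = 3
    VMwareCertified = 4


def can_install(vib_level: str, host_level: str) -> bool:
    """A VIB installs only if it is at least as trusted as the host's level."""
    return Acceptance[vib_level] >= Acceptance[host_level]
```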
If installation was successful and you have all the right VIBs and software configured, but other issues have come up, always check the hostd.log file first. The hostd management service (the host agent) is the main communication channel between management clients, such as vCenter, and the VMkernel. If hostd fails, the ESXi host disconnects from vCenter and cannot be easily managed.
Occasionally, an ESXi host will crash and display a purple diagnostic screen, often called a PSOD (purple screen of death). A host can crash for several reasons: CPU exceptions, driver issues, machine check exceptions, hardware faults or software bugs.
2. Restart the host and, if possible, get the VMs up and running on another host. If HA is configured properly, this should happen on its own.
3. Search online for the error first. Others have often hit the same issue, and the fix may be a simple firmware or software update. If you can’t find any information, contact VMware support.
Another possible issue is that the ESXi host simply hangs during the boot process. You never get a PSOD; the host just sits there and the entire system becomes unresponsive. Hangs typically happen while the system is booting after a power cycle. They are caused by the VMkernel being too busy or by a hardware lockup.
2. Try to log in to the host with the vSphere Web Client or the Embedded Host Client.
If you can ping the host, that’s a good sign. Next, go to the DCUI and press Alt-F12 at the host console to display any log messages on the screen.
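A ping only proves that the host’s network stack answers. Sending ICMP from a script usually needs elevated privileges, so the Python sketch below instead checks whether TCP ports accept connections, which is closer to what the vSphere clients need. The default ports are an assumption based on standard settings: 443 for HTTPS (the Host Client) and 902 for vCenter-to-host traffic.

```python
import socket
from typing import Dict, Iterable


def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, unreachable, name resolution failure
        return False


def check_host(host: str, ports: Iterable[int] = (443, 902)) -> Dict[int, bool]:
    """Probe each port and report which ones answered."""
    return {port: port_open(host, port) for port in ports}
```

For example, check_host("esx01.example.local") returning False for 443 while ping succeeds points toward hostd or the management network rather than a dead host. The host name is hypothetical.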
To recover from a host that has hung, reboot the ESXi host, review the logs and gather performance statistics. If you determine it’s a hardware issue, fix the hardware and, if required, reinstall or reconfigure ESXi. Finally, update the host with the most recent patches.
VCENTER TROUBLESHOOTING
VMware splits the vCSA installation into two stages. Stage 1 deploys the appliance; Stage 2 configures it.
Occasionally you might run into issues replacing certificates with the Certificate Manager. It can hang at 0% and then fail with an automatic rollback error. This can be caused by using certificates that are not Base64-encoded. To resolve it, manually publish the full certificate chain to the certificate store using the vSphere 6.0 Certificate Manager.
To launch the vSphere 6.x Certificate Manager, run this command using the command
prompt:
PART 2: VIRTUAL MACHINE TROUBLESHOOTING
Before we jump into troubleshooting virtual machines, let’s review some of the typical
virtual machine files you will run into.
A Content ID mismatch can be triggered by an interrupted virtual machine migration (such as a Storage vMotion or another migration), by a VMware software error, or by user action.
The Content ID (CID) value in a virtual disk descriptor file helps ensure that the content of a parent virtual disk, such as a flat or base disk, stays in a consistent state. The child delta disks derived from that base disk’s snapshot contain all further writes and changes. Those changes depend on the parent disk remaining intact.
To resolve it, open the latest vmware.log and locate the affected disk chain. You will see a line or warning similar to: “Content ID mismatch (parentCID ed06b3ce != 0cb205b1)”
In this example, change the parentCID in the affected child disk’s descriptor file from ed06b3ce to 0cb205b1. Then save the edited descriptor over the existing .vmdk descriptor file and power the virtual machine back on.
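The same repair can be sketched in Python. A child descriptor’s parentCID must equal its parent’s CID. The sketch reads both descriptor files as text, detects the mismatch, and rewrites only the child’s parentCID line. It assumes the usual descriptor layout with one key=value entry per line (CID=..., parentCID=...). Work on a copy and keep a backup of the original descriptor.

```python
import re

# One "key=value" entry per line, as in a typical descriptor file
CID_LINE = re.compile(r"^(?P<key>CID|parentCID)=(?P<value>[0-9a-fA-F]{8})[ \t\r]*$",
                      re.MULTILINE)
PARENT_LINE = re.compile(r"^parentCID=[0-9a-fA-F]{8}(?=[ \t\r]*$)", re.MULTILINE)


def read_cids(descriptor: str) -> dict:
    """Return the CID and parentCID values found in descriptor text."""
    return {m.group("key"): m.group("value").lower() for m in CID_LINE.finditer(descriptor)}


def mismatch(child: str, parent: str) -> bool:
    """True when the child's parentCID does not match the parent's CID."""
    return read_cids(child).get("parentCID") != read_cids(parent).get("CID")


def fix_parent_cid(child: str, parent: str) -> str:
    """Rewrite the child's parentCID so it matches the parent's CID."""
    parent_cid = read_cids(parent).get("CID")
    if parent_cid is None:
        raise ValueError("parent descriptor has no CID line")
    fixed, count = PARENT_LINE.subn(f"parentCID={parent_cid}", child, count=1)
    if count != 1:
        raise ValueError("child descriptor has no parentCID line")
    return fixed
```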
• An error occurred while quiescing the virtual machine. The error code was: 4
The error was: Quiesce aborted.
When taking snapshots be sure the following occur:
A snapshot chain can be at most 32 levels deep; once you reach that limit, you cannot create more snapshots. Generally, it’s a recommended practice to keep as few snapshots as possible on a virtual machine. They can hurt performance and are difficult to troubleshoot.
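To see how deep a chain has grown, follow each delta descriptor’s parentFileNameHint entry back to the base disk. The Python sketch below does this over descriptor text that you have already read into a dictionary keyed by file name. It assumes the hints are plain file names in the same folder. That may not hold for every disk, because hints can also be full paths.

```python
import re
from typing import Dict, List, Optional

HINT_RE = re.compile(r'^parentFileNameHint="(?P<name>[^"]+)"', re.MULTILINE)
MAX_CHAIN = 32  # the snapshot-level limit described in the text


def chain(leaf: str, descriptors: Dict[str, str]) -> List[str]:
    """Follow parentFileNameHint links from the leaf disk back to the base disk."""
    order: List[str] = []
    current: Optional[str] = leaf
    while current is not None:
        if current in order:
            raise ValueError(f"loop in disk chain at {current}")
        order.append(current)
        match = HINT_RE.search(descriptors[current])
        current = match.group("name") if match else None  # base disks have no hint
    return order


def snapshot_depth(leaf: str, descriptors: Dict[str, str]) -> int:
    """Number of delta disks stacked on top of the base disk."""
    return len(chain(leaf, descriptors)) - 1
```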
If snapshot creation fails, check that the user has permission to take snapshots. Then check that the disk type is supported: RDMs in physical mode, independent disks and VMs with bus sharing are not supported.
Snapshots grow as their delta files grow. You cannot create or commit a snapshot if a snapshot (delta) disk does not have a descriptor file.
File Description
<vm name>-00000n-delta.vmdk A delta vmdk is created whenever a snapshot is taken. The pre-snapshot vmdk in use is locked for writing, and any changes from then on are written to the VM’s delta vmdk. This allows the VM to be restored to its state at the time a specific snapshot was taken.
<vm name>-00000n.vmdk The descriptor file for the delta vmdk file.
If the -delta.vmdk file has no descriptor file, you will need to create one before doing anything else:
1. Copy the base disk descriptor file and give the copy the name of the missing descriptor file.
• The base descriptor file is the original <vm name>.vmdk file.
2. Edit the new descriptor file and change its format from a base disk descriptor to a snapshot delta disk descriptor.
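Step 2 can be sketched in Python to show which entries change. This is a minimal sketch under several assumptions:
• The missing delta’s parent is the base disk itself.
• The delta uses the VMFS-style vmfsSparse format. SEsparse disks, for example on VMFS6, use different values.
• The base descriptor has a single "RW <sectors> VMFS" extent line.

Check the result against VMware’s guidance for your version before using it, and keep the original files.

```python
import re


def base_to_delta(base: str, base_name: str, delta_extent: str, new_cid: str) -> str:
    """Convert a copy of a base-disk descriptor into a vmfsSparse delta descriptor.

    Assumes the missing delta's parent is the base disk itself.
    """
    match = re.search(r"^CID=([0-9a-fA-F]{8})", base, re.MULTILINE)
    if match is None:
        raise ValueError("base descriptor has no CID line")
    parent_cid = match.group(1)
    lines = []
    for line in base.splitlines():
        if line.startswith("CID="):
            lines.append(f"CID={new_cid}")                 # the delta gets its own CID
        elif line.startswith("parentCID="):
            lines.append(f"parentCID={parent_cid}")        # point at the base disk
        elif line.startswith("createType="):
            lines.append('createType="vmfsSparse"')        # delta instead of base format
            lines.append(f'parentFileNameHint="{base_name}"')
        elif re.match(r'^RW \d+ VMFS "', line):
            sectors = line.split()[1]                       # keep the size in sectors
            lines.append(f'RW {sectors} VMFSSPARSE "{delta_extent}"')
        else:
            lines.append(line)
    return "\n".join(lines) + "\n"
```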
Another possible issue when troubleshooting snapshots is insufficient space on a datastore to commit all the snapshots. Check the Summary tab of your datastore, or run the command “df -h”, to determine whether you have enough space. If you don’t, you’ll need to increase the size of the datastore or move virtual machines to other datastores with enough space.
If the test VM does power on, the problem is likely isolated to that specific virtual machine. Each virtual machine has a vmware.log file in its directory that contains detailed information. Before going through the logs, also review the recent tasks and events in vCenter. They can point you to an earlier task that suggests an obvious fix.
Browse to the location of the VM and verify that all the virtual machine files are present. Look for the .vmx file, the .vmdk files, etc. Restore any file that is missing.
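The file check can be sketched in Python from the ESXi Shell or against a copied folder. The sketch relies only on naming conventions, which is an assumption:
• <name>.vmdk descriptors pair with <name>-flat.vmdk, <name>-delta.vmdk or <name>-sesparse.vmdk data files.
• -ctk.vmdk change-tracking files are ignored.

It does not apply to vSAN or vVols folders, where disk data is not stored as separate flat files.

```python
from pathlib import Path
from typing import List

EXTENT_SUFFIXES = ("-flat", "-delta", "-sesparse")  # data-file naming convention
SKIP_SUFFIXES = EXTENT_SUFFIXES + ("-ctk",)          # files that are not descriptors


def missing_vm_files(vm_dir: str, vm_name: str) -> List[str]:
    """Names of expected files that are absent from a VM's folder."""
    names = {p.name for p in Path(vm_dir).iterdir() if p.is_file()}
    missing = [] if f"{vm_name}.vmx" in names else [f"{vm_name}.vmx"]
    for name in sorted(names):
        if not name.endswith(".vmdk"):
            continue
        stem = name[: -len(".vmdk")]
        if stem.endswith(SKIP_SUFFIXES):
            # A data file must have its descriptor next to it.
            for suffix in EXTENT_SUFFIXES:
                descriptor = f"{stem[: -len(suffix)]}.vmdk"
                if stem.endswith(suffix) and descriptor not in names:
                    missing.append(descriptor)
        elif not any(f"{stem}{suffix}.vmdk" in names for suffix in EXTENT_SUFFIXES):
            # A descriptor must have a data file behind it.
            missing.append(f"{stem}-flat.vmdk (or -delta/-sesparse)")
    return missing
```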
A virtual machine will also not power on if one of the virtual machine’s files is locked.
―― touch filename
―― vmkfstools -D /vmfs/volumes/Shared/VM02/VM02-flat.vmdk
―― Check the MAC address at the location (See below) in the output.
―― If you see all zeros for the owner that means the owner is the current ESXi
server.
4. Log in to the host that holds the lock on the file and identify the process.
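In the vmkfstools -D output, the last 12 hexadecimal digits of the owner field hold the MAC address of the host that holds the lock. The Python sketch below extracts that value and formats it as a MAC address. The sample line used in the test is illustrative, and the exact output layout varies between ESXi versions. Compare the result with your hosts’ NIC MAC addresses (for example, from `esxcli network nic list`) to find the owner.

```python
import re
from typing import Optional

OWNER_RE = re.compile(
    r"owner\s+([0-9a-f]{8})-([0-9a-f]{8})-([0-9a-f]{4})-([0-9a-f]{12})", re.IGNORECASE
)


def lock_owner_mac(vmkfstools_output: str) -> Optional[str]:
    """Return the lock owner's MAC as aa:bb:cc:dd:ee:ff, or None if there is no owner field.

    An all-zero result is returned as-is; per the text, it means the current host.
    """
    match = OWNER_RE.search(vmkfstools_output)
    if match is None:
        return None
    tail = match.group(4).lower()
    return ":".join(tail[i:i + 2] for i in range(0, 12, 2))
```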
• Virtual machines show as invalid or orphaned after an ESXi host comes out of maintenance mode
To fix, follow the steps below:
1. Determine the datastore where the virtual machine configuration (.vmx) file is
located.
2. Return to the virtual machine in the vSphere Web Client, right-click, and select:
Side Note: Remove from Inventory is very different from Delete from Disk. Remove from Inventory allows you to re-add the machine later; Delete from Disk destroys the data.
4. Click Storage.
6. Navigate to the folder named after the virtual machine and locate the virtual machine’s .vmx file.
7. Right-click the .vmx file and click Add to inventory. The Add to Inventory wizard
opens.
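The same registration can be done from the ESXi Shell with `vim-cmd solo/registervm` followed by the path to the .vmx file. The Python sketch below only builds that command line and quotes paths that contain spaces. The datastore and folder names in the test are hypothetical. You would run the resulting command on the host yourself.

```python
import shlex
from typing import List


def register_command(datastore: str, folder: str, vmx_name: str) -> List[str]:
    """Build the vim-cmd call that registers an existing .vmx file on this host."""
    vmx_path = f"/vmfs/volumes/{datastore}/{folder}/{vmx_name}"
    return ["vim-cmd", "solo/registervm", vmx_path]


def as_shell(command: List[str]) -> str:
    """Quote the command for pasting into the ESXi Shell (handles spaces in names)."""
    return shlex.join(command)
```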
If you need to recreate the virtual machine rather than just re-register it, try the following:
1. Browse to the datastore and verify that the virtual machine files exist.
2. If the .vmx configuration file was deleted or removed but the disk files are still there, attach the old disk files to a newly created virtual machine.
PART 3: STORAGE TROUBLESHOOTING
If a virtual machine cannot access its virtual disks, the cause of the problem might be
anywhere from the virtual machine to physical storage.
As you can see below, there are multiple types of storage, so it’s important to determine which type you’re troubleshooting before you start. A “datastore” can be several different things with different types of connectivity.
• Verify that the ESXi host can see the LUN by running: “esxcli storage core path
list” from the host.
You can also rescan your storage adapters in the storage adapter section of the vSphere Web Client, or rescan at the cluster level as shown below:
If the rescan does not resolve it, it is likely that something else is causing the issue. Have
there been any other recent changes to the ESXi host?
Is LUN masking in place? Is the LUN still presented? Check to see if the array is supported.
Check your adapter settings. Are the network port bindings set up properly? Is the target name spelled correctly? Is the initiator name correct? Are any CHAP settings required? Do you see your storage devices under the Devices tab?
If the storage device is online but performing poorly, also check your physical device latency metrics. High values (greater than 15 or 20 ms) indicate a slow or overworked array. Your goal is to avoid oversubscribing your links. Try to isolate iSCSI and NFS traffic.
Column Description
CMDS/s The total number of commands per second. This includes IOPS (Input/Output Operations Per Second) and other SCSI commands, such as SCSI reservations, locks, vendor string requests and unit attention commands, sent to or coming from the device or virtual machine being monitored.
DAVG/cmd The average response time, in milliseconds, per command sent to the device.
KAVG/cmd The amount of time the command spends in the VMkernel (2-3 ms here represents either an overworked array or an overworked host).
GAVG/cmd The response time as perceived by the guest operating system.
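The three latency columns are related: the guest sees the device time plus the time spent in the VMkernel, so GAVG is approximately DAVG + KAVG. The Python sketch below applies that relation and flags a sample using the thresholds from the text: about 15 to 20 ms for DAVG and 2 to 3 ms for KAVG. The default cut-offs of 20 ms and 2 ms are a choice within those ranges, not fixed VMware limits.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Latency:
    davg_ms: float  # time at the device (array and fabric)
    kavg_ms: float  # time spent in the VMkernel

    @property
    def gavg_ms(self) -> float:
        """Guest-perceived latency is device time plus kernel time."""
        return self.davg_ms + self.kavg_ms


def diagnose(sample: Latency, davg_limit: float = 20.0, kavg_limit: float = 2.0) -> List[str]:
    """Flag the latency components that exceed the chosen thresholds."""
    findings = []
    if sample.davg_ms > davg_limit:
        findings.append("device latency high: slow or overworked array")
    if sample.kavg_ms > kavg_limit:
        findings.append("kernel latency high: overworked host or array")
    return findings
```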
• On the NFS server, are the ACLs correct? (read/write or read only).
VMware supports both NFS v3 and v4.1, but it’s important to remember that they use
different locking mechanisms:
Configure an NFS array to allow only one NFS protocol, and use either NFS v3 or NFS v4.1 to mount the same NFS share across all ESXi hosts. Do not mix them: data corruption might occur if hosts access the same NFS share with different client versions.
NFS 4.1 also does not currently support Storage DRS, vSphere Storage I/O Control, Site
Recovery Manager or Virtual Volumes.
A path to a storage/LUN device can be marked as Dead in these situations:
• The ESXi storage stack determines a path is Dead due to the TEST_UNIT_READY command failing on probing
• The ESXi storage stack marks paths as Dead after a permanent device loss (PDL)
• The ESXi storage stack receives a Host Status of 0x1 from an HBA driver
For iSCSI storage, verify that NIC teaming is configured correctly. Next, verify that your path selection policy is set up properly.
Check for Permanent Device Loss or All Paths Down. When storage connectivity is lost, a device can be in one of two distinct states, All Paths Down (APD) or Permanent Device Loss (PDL), and it is handled differently in each. APD is a condition where all paths to the storage device are lost or the storage device is removed. It occurs when the change happens in an uncontrolled manner, so the VMkernel storage stack does not know how long the loss of access to the device will last. APD is treated as a temporary (transient) condition, since the storage device might come back online. If the loss is permanent, it is referred to as Permanent Device Loss (PDL).
• You are unable to connect directly to the ESXi host using the vSphere Client
In the event of an APD, you will also need to check the SAN/NAS fabric to get more detailed information.
All paths down (APD) handling is enabled by default on the ESXi host. When it is enabled, the host continues to retry non-virtual-machine I/O commands to a storage device in the APD state for a limited time. When that time expires, the host stops retrying and terminates any non-virtual-machine I/O. You can disable the APD handling feature on your host. If you do, the host will keep retrying issued commands indefinitely in an attempt to reconnect to the APD device. This can cause virtual machines on the host to exceed their internal I/O timeout and become unresponsive.
You might want to increase the timeout if storage devices connected to your ESXi host might take longer than the default 140 seconds to recover from a connection loss. You can enter a value between 20 and 99999 seconds for the Misc.APDTimeout setting.
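From the ESXi Shell, the timeout can be changed with `esxcli system settings advanced set -o /Misc/APDTimeout -i <seconds>`. Check the current value first with `esxcli system settings advanced list -o /Misc/APDTimeout`. The Python sketch below only builds the set command and enforces the 20 to 99999 second range given above. It does not run anything on the host.

```python
from typing import List

APD_MIN, APD_MAX, APD_DEFAULT = 20, 99999, 140  # seconds, from the text above


def apd_timeout_command(seconds: int) -> List[str]:
    """Build the esxcli call that sets Misc.APDTimeout, after a range check."""
    if not APD_MIN <= seconds <= APD_MAX:
        raise ValueError(f"APDTimeout must be between {APD_MIN} and {APD_MAX} seconds")
    return ["esxcli", "system", "settings", "advanced", "set",
            "-o", "/Misc/APDTimeout", "-i", str(seconds)]
```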
STORAGE TROUBLESHOOTING SCENARIO #4 – VSAN WON’T TURN ON
Before you begin, it is important to realize that vSAN is a software-based storage product. It depends entirely on properly functioning underlying hardware components, such as the network, the storage I/O controller and the individual storage devices. Always follow the vSAN Compatibility Guide for all deployments.
Many vSAN errors can be traced back to faulty VMkernel ports, mismatched MTU sizes, etc. vSAN networking involves far more than simple TCP/IP connectivity.
When enabling vSAN, verify that each host can reach all other hosts in the cluster through a VMkernel port enabled for vSAN traffic. You will also need to verify that each host has a flash device dedicated to the cache tier and at least one disk, typically slower, for the capacity tier. Note that you can also run all-flash, using flash devices for both the cache and capacity tiers.
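A quick way to catch an MTU mismatch on the vSAN network is a don’t-fragment vmkping from each host’s vSAN VMkernel port to every other host. The payload must leave room for the 20-byte IPv4 header and the 8-byte ICMP header, so a 9000-byte MTU is tested with 8972 bytes (9000 - 20 - 8 = 8972). The Python sketch below does that arithmetic and builds one command per peer. The interface name and addresses in the test are hypothetical.

```python
from typing import List

IP_HEADER = 20   # bytes, IPv4 header without options
ICMP_HEADER = 8  # bytes, ICMP echo header


def max_payload(mtu: int) -> int:
    """Largest ICMP payload that fits in one unfragmented IPv4 packet."""
    return mtu - IP_HEADER - ICMP_HEADER


def vmkping_commands(vmk: str, peers: List[str], mtu: int) -> List[str]:
    """One don't-fragment (-d) vmkping per peer, sized (-s) to the expected MTU."""
    size = max_payload(mtu)
    return [f"vmkping -I {vmk} -d -s {size} {peer}" for peer in peers]
```

If the 8972-byte ping fails but a 1472-byte one succeeds, some hop on the path is not configured for jumbo frames.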
◦◦ Can inspect underlying disk devices and how they are being used by
vSAN.
• esxcli vsan
Ruby vSphere Console
• vSAN Observer
PART 4: NETWORK TROUBLESHOOTING
In vSphere, networking problems can occur at many different levels. It is important to
know which level to start with. Is it a virtual machine problem or a host problem? Did
the issue arise when you migrated the machine to a new host?
■■ Standard switches
■■ Distributed switches
You also must determine if it’s a virtual machine or a host management issue.
• Does the ESXi host network configuration appear correct? IP, subnet mask,
gateway?
• If using VLANs, does the VLAN ID of the port group look correct?
• Check the trunk port configuration on the switch. Have there been any recent
changes?
• Does the physical uplink adapter have all settings configured properly? (speed,
duplex, etc.)
• If using NIC teaming, is it setup and configured properly?
• If all of the above check out, make sure you don’t have a physical adapter failure.
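On a standard switch, the port group VLAN ID has special values:
• 0 means the virtual switch does no tagging.
• 4095 passes all VLANs through to the guest.
• 1 to 4094 tag traffic on that VLAN, which must also be allowed on the physical trunk port.

The Python sketch below encodes those checks. You would read the set of VLANs allowed on the trunk from your physical switch configuration; the values in the test are hypothetical.

```python
from typing import Set


def check_port_group_vlan(vlan_id: int, trunk_allowed: Set[int]) -> str:
    """Sanity-check a standard switch port group VLAN ID against the physical trunk."""
    if not 0 <= vlan_id <= 4095:
        return "invalid: VLAN IDs range from 0 to 4095"
    if vlan_id == 0:
        return "no VLAN tagging by the virtual switch"
    if vlan_id == 4095:
        return "all VLANs passed to the guest"
    if vlan_id not in trunk_allowed:
        return f"VLAN {vlan_id} is not allowed on the physical trunk"
    return "ok"
```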
If you recently moved the VM to a new host, also verify that an equivalent port group
exists on the host and that the network adapter is connected in the virtual machine
settings. The firewall in the guest operating system might be blocking traffic. Ensure
that the firewall does not block required ports.
Typically, this issue is caused by lost heartbeat packets between vCenter (vpxd) and an ESXi host (vpxa).
First, check that no firewall is blocking the vCenter communication ports. Then verify that the network is not congested. This issue is more prevalent with Windows-based vCenter systems.
◦◦ If the firewall is configured with the proper ports, ensure that Windows
Firewall is not blocking UDP port 902.
By default, vpxa uses UDP port 902, but it is possible to change the port. Check the <ServerPort> setting in the /etc/vmware/vpxa/vpxa.cfg file.
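The element’s exact case and nesting may differ between versions. For that reason, the Python sketch below searches the whole vpxa.cfg XML for a ServerPort element regardless of case, and falls back to 902 when none is set. The sample configuration in the test is hypothetical and trimmed to the relevant element. On a host, you would pass in the text of /etc/vmware/vpxa/vpxa.cfg.

```python
import xml.etree.ElementTree as ET

DEFAULT_HEARTBEAT_PORT = 902


def heartbeat_port(vpxa_cfg_xml: str) -> int:
    """Return the ServerPort value from vpxa.cfg, or 902 if it is not set."""
    root = ET.fromstring(vpxa_cfg_xml)
    for element in root.iter():  # walks the root and every descendant
        if element.tag.lower() == "serverport" and (element.text or "").strip():
            return int(element.text.strip())
    return DEFAULT_HEARTBEAT_PORT
```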
Network congestion can also cause dropped heartbeats. Some tools you can use to troubleshoot:
◦◦ You can use the resxtop utility or graphical views to analyze traffic.
■■ Direction of traffic is specified using --dir 0 for inbound and --dir 1 for
outbound.
■■ Two (or more) separate traces can be run in parallel, but they need to be merged later in Wireshark.
◦◦ Wireshark
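Wireshark ships a command-line tool, mergecap, that performs this merge for you. To show what the merge involves, the Python sketch below interleaves the packets of several separately captured traces by timestamp. It assumes classic little-endian pcap files with microsecond timestamps and the same link type. Other inputs, such as pcapng files or files with the other byte order, are rejected rather than converted.

```python
import heapq
import struct
from typing import List, Tuple

PCAP_MAGIC = 0xA1B2C3D4  # classic pcap, microsecond timestamps
Record = Tuple[int, int, bytes]  # (seconds, microseconds, raw record bytes)


def read_pcap(data: bytes) -> Tuple[bytes, List[Record]]:
    """Split a little-endian classic pcap file into its global header and records."""
    if len(data) < 24 or struct.unpack_from("<I", data, 0)[0] != PCAP_MAGIC:
        raise ValueError("expected a little-endian, microsecond classic pcap file")
    header, records, offset = data[:24], [], 24
    while offset + 16 <= len(data):
        ts_sec, ts_usec, incl_len, _orig_len = struct.unpack_from("<IIII", data, offset)
        end = offset + 16 + incl_len
        records.append((ts_sec, ts_usec, data[offset:end]))  # record header + packet
        offset = end
    return header, records


def merge_pcaps(captures: List[bytes]) -> bytes:
    """Interleave the packets of several captures in timestamp order."""
    parsed = [read_pcap(capture) for capture in captures]
    if len({header[20:24] for header, _ in parsed}) != 1:  # link-layer type field
        raise ValueError("captures use different link-layer types")
    merged = heapq.merge(*(records for _, records in parsed), key=lambda r: (r[0], r[1]))
    return parsed[0][0] + b"".join(raw for _, _, raw in merged)
```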
One VMware feature that helps in this case is network rollback. Several different types of events can trigger a network rollback:
◦◦ Updating DNS and routing settings on the ESXi host
If one of these changes causes the host to lose its connection to vCenter, the host rolls back to the last known good configuration.
PART 5: VCENTER AND ESXI TROUBLESHOOTING
Before we begin troubleshooting vCenter, it is important to note that vCenter has two major components. The first is the Platform Services Controller. This component scales services across an organization and handles identity management for the various tools that interact with vSphere. It includes vCenter Single Sign-On, the certificate store, the certificate authority, the license service, the directory service and the lookup service. A Platform Services Controller can be connected to multiple vCenter Server instances.
The second, vCenter Server, includes the vCenter service, the vSphere Web Client service, the Inventory Service, PostgreSQL, a syslog service, the Dump Collector and Auto Deploy.
Multiple Platform Services Controller instances can be deployed together behind a load balancer.
One of the most important vCenter logs is vpxd.log. This is the main vCenter Server log. It contains all vSphere Client and web services connections, internal tasks and events, and communication with the vCenter Server Agent (vpxa) on managed ESXi/ESX hosts.
You can also export a support bundle from the vSphere Web Client.
• In the Export Support Bundle window, expand the trees to view the services
running in the appliance and deselect the services for which you do not want
to export log files.
VCENTER/ESXI TROUBLESHOOTING SCENARIO #1 – THE VCENTER SERVICE FAILS TO START
As mentioned above, there are two types of vCenter: the Linux-based appliance and the Windows-based installation. If you’re using the appliance, check the vCenter Server service status from the vSphere Web Client. On a Windows-based vCenter, run services.msc and check for error messages related to the service. The image below shows the check on the vCenter appliance.
• Use netstat to verify that ports 902, 443 and 80 are not being used by other services.
Occasionally the vCenter service will start but the Inventory Service is not functioning.
You will generally notice this issue when creating storage profiles. VMware has also
written scripts that will allow you to reset the inventory service database. See
KB 2146248 for more information and to download the script.
Most of these issues are related to a SQL Express installation, where the database size is limited by the edition:
• SQL Express 2008 is limited to 1 CPU, 1 GB of RAM and a 4 GB database file.
• SQL Express 2008 R2 increases the maximum database size to 10 GB.
• SQL Express 2012 can also use up to 4 cores.
Generally, this type of database should only be used in test environments. Avoid using SQL Express for anything in a production environment!
Often, you will receive an error like this:
If you are having problems with slow startups check the following items:
If vCenter does not start, check that the ODBC data source configuration is correct. Verify that ports 902, 80 and 443 are not being used by other processes. This happens more frequently when running vCenter on Windows.
If vCenter starts but is very slow to start up, you could also be experiencing database issues. Check that disk space is available, that the transaction logs are healthy and that database authentication has not changed.
• Keep the vCenter statistic level at level 2 or lower and only increase when
troubleshooting an existing issue.
VMware has written a script and provided it in KB 1025914. This script will truncate the
event and tasks tables. Before running this script, be sure you take a full backup of the
vCenter database.
VCENTER/ESXI TROUBLESHOOTING SCENARIO #3 – AN ESXI HOST HANGS DURING BOOT OR RANDOMLY
When an ESXi host hangs, it is typically because the VMkernel is too busy, or because of a hardware or software failure.
To verify that an ESXi host is hanging, try the following tasks on the host:
• If the host is hanging during boot, check what it’s hanging on.
• Monitor the host’s network traffic, including its virtual machine traffic. If you see virtual machine traffic, the host is still functioning at a minimal level.
To verify that the host is hung, use the ESXi host’s DCUI to check the logs.
If the host is not responsive, reboot it first. Determine why the host locked up by reviewing the logs and looking for errors or performance anomalies. If necessary, reinstall ESXi, then install the latest patches and updates.
◦◦ Forces virtual machines to use their own internal page file when the host is under memory contention.
• Memory Compression
• Memory Swapping
◦◦ Uses swap space specified either at the host level or alongside the virtual machine files. Generally performs poorly.
You can use esxtop or vCenter advanced performance charts to troubleshoot. First, though, check that hyperthreading is enabled; it is easily overlooked.
c = cpu
m = memory
n = network
d = disk adapter
u = disk device
MCTL This column is either YES or NO. YES means that the balloon driver is installed. The balloon driver is installed automatically with VMware Tools and should be in every virtual machine.
MCTLSZ This column shows how inflated the balloon is in the virtual machine. A value of 500MB means the balloon driver inside the guest operating system has “borrowed” 500MB from Windows, Linux, etc. Ideally, you want to see 0 in this column.
SWCUR This column tells you how much of the virtual machine’s memory is currently in the vswp file. A value of 500MB means that 500MB of the virtual machine’s memory currently resides in the swap file. This does not necessarily mean bad performance, however. To determine whether your virtual machine is suffering from swapping, look at the next two counters.
SWR/s This value tells you the read activity from your swap file. Any nonzero value here means your virtual machine is suffering from hypervisor swapping. Performance will likely be poor, and you will probably be getting user complaints.
SWW/s This value tells you the write activity to your swap file. You want to see 0 (zero) here. Every number above 0 is BAD. Again, users will likely notice the performance issue at this point.
%USED This metric tells you how much time the virtual machine spends executing CPU cycles on the physical CPU.
%RDY is one of the most important performance indicators! Always start here. It measures how much time your virtual machine is ready to execute CPU cycles but cannot get access to the physical CPU. In other words, it is how much time the virtual machine spends in line, patiently waiting for a physical CPU.
%CSTP (co-stop) tells you how much time a virtual machine with multiple vCPUs spends waiting for its vCPUs to catch up with each other.
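The rules of thumb above can be written down as a small checklist. The Python sketch below takes counter values you have copied from esxtop (for example, from a batch-mode export made with `esxtop -b`) and reports what they suggest. It also divides %RDY by the number of vCPUs, because in the per-VM view esxtop sums ready time over all of the VM’s vCPUs. The wording of the messages is illustrative.

```python
from typing import Dict, List


def memory_flags(counters: Dict[str, float]) -> List[str]:
    """Interpret esxtop memory columns the way the text describes."""
    flags = []
    if counters.get("MCTLSZ", 0) > 0:
        flags.append("ballooning: guest memory is being reclaimed")
    if counters.get("SWR/s", 0) > 0 or counters.get("SWW/s", 0) > 0:
        flags.append("active hypervisor swapping: expect poor performance")
    elif counters.get("SWCUR", 0) > 0:
        flags.append("memory sits in the swap file but is not being read or written")
    return flags


def ready_per_vcpu(rdy_percent: float, vcpus: int) -> float:
    """Per-vCPU ready time, so VMs of different sizes can be compared."""
    return rdy_percent / vcpus
```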
If you notice that things are off, here are a few things you can try. Start with resource pools. If you are swapping and seeing significant performance problems, though, a resource pool is likely only a temporary fix. If acceptable, you can shut down non-production machines to free up resources. Another option is to add more ESXi hosts.
PART 6: HA, DRS AND VMOTION TROUBLESHOOTING
• Verify that DNS name resolution works; this is more important in older versions of vSphere (4.x).
• Are the datastores used for HA datastore heartbeating connected to all the hosts? Two heartbeat datastores are recommended (see below).
It is also possible that the FDM Agent cannot be installed on the ESXi host. Check the
agent installation log at /var/log/fdm-installer.log.
On the ESXi host, check to see if there are agent files (clusterconfig, fdm.cfg, hostlist
and vmmetadata) in the /etc/opt/vmware/fdm directory.
HA, DRS AND VMOTION TROUBLESHOOTING SCENARIO #2 – ERROR MESSAGES ABOUT INSUFFICIENT RESOURCES
Typically, you will receive this error when powering on a virtual machine in an HA cluster with insufficient resources. Check the following:
• The virtual machine does not have reservations set properly. Generally, they
are set too high and the host cannot guarantee the reservation. Remember
an ESXi host will only power on virtual machines if the reservations can be
guaranteed.
• It is also possible that the cluster has no available physical resources. You will
need to add more hosts to the cluster to ensure adequate performance.
HA, DRS AND VMOTION TROUBLESHOOTING SCENARIO #3 – VMOTION FAILS OR TIMES OUT
If vMotion was working and suddenly stops working, start by restarting the management agents with the following commands:
• /etc/init.d/hostd restart
• /etc/init.d/vpxa restart
You can also restart the management agents in the DCUI on the host.
• Check NTP settings, and verify all hosts are syncing to the same source.
• Verify that the log.rotateSize parameter in the virtual machine .vmx file is not
set too low. The default is 0, which is unlimited.
If the hosts are not evenly balanced and the cluster is set to fully automated mode, DRS should trigger vMotion migrations automatically.
Verify that the DRS automation level is not set to manual.
If your virtual machine fails to migrate, check that it is not attached to any local
resources that the other host does not have access to.
CONCLUSION
To wrap up, there is no perfect way to troubleshoot. The scenarios in this eBook cover some of the more common problems you will see as a VMware administrator, and they should help point you in the right direction. Always start with the simple things and work from there!
ABOUT ALTARO
Altaro Software (www.altaro.com) is a fast-growing developer of easy-to-use backup solutions for Hyper-V and VMware-based virtual machines. Its products are built specifically for small and mid-market businesses with up to 50 host servers, and over 30,000 customers use them to back up and restore their virtual machines. Altaro takes pride in its software and its high level of personal customer service and support, and it shows. Founded in 2009, Altaro already serves over 30,000 satisfied customers worldwide. It is a Gold Microsoft Partner for Application Development and a Technology Alliance VMware Partner.
Altaro VM Backup is intuitive and feature-rich, and you get outstanding support as part of the package. Demonstrating Altaro’s dedication to Hyper-V, it was the first backup provider for Hyper-V to support Windows Server 2012 and 2012 R2, and it continues to support Windows Server 2008 R2.
ABOUT RYAN BIRK
Ryan has been working in Information Technology for many years and calls himself a “virtualization snob” these days. Most recently, he has been a Virtualization Consultant, Engineer and Technical Instructor. Since 2012, he has been a proud VMware vExpert, and he runs a blog at ryanbirk.com that focuses on VMware home labs.
@ryanbirk
VMware vExpert 12-17
VMware Certified Instructor
FOLLOW ALTARO
Like our eBook? There’s more!