ESXi Issues
When connecting to the ESXi host using PuTTY, you see the error:
Network error: Connection Refused
Cause
This issue may occur if the /etc/inetd.conf file is empty or does not contain the
correct settings for remote shell access and VMware authentication daemon.
Note: In ESXi 5.0, the inetd.conf file is located in the /var/run/ directory.
Resolution
To resolve this issue:
1. Connect to the ESXi console directly using a KVM session, such as iLO, iLOM,
DRAC, RSA, or IP KVM, press ALT+F1, and then log in as root.
2. Open the inetd.conf file using a text editor.
To open the file using the vi editor, run this command:
# vi /etc/inetd.conf
3. Ensure that the contents of the /etc/inetd.conf file are similar to:
# Internet server configuration database
# Remote shell access
ssh stream tcp nowait root /sbin/dropbearmulti dropbear ++min=0,swap,group=shell -i
ssh stream tcp6 nowait root /sbin/dropbearmulti dropbear ++min=0,swap,group=shell -i
In ESXi 5.0, the contents under Remote shell access appear similar to:
ssh stream tcp nowait root /usr/lib/vmware/openssh/bin/sshd sshd ++swap,group=host/vim/vimuser/terminal/ssh -i
ssh stream tcp6 nowait root /usr/lib/vmware/openssh/bin/sshd sshd ++swap,group=host/vim/vimuser/terminal/ssh -i
# VMware authentication daemon
authd stream tcp nowait root /sbin/authd authd
authd stream tcp6 nowait root /sbin/authd authd
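4. Save the file. For the change to take effect, inetd must re-read its configuration. One common approach on ESXi is to find the PID of inetd with ps and send it a HUP signal:
# ps | grep inetd
# kill -HUP <inetd_pid>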
When attempting to add an ESXi/ESX host to vCenter Server, you see an error
similar to:
Unable to access the specified host, either it doesn't exist, the server
software is not responding, or there is a network problem
3. Verify that the ESXi host is able to respond to vCenter Server at the
correct IP address. If vCenter Server does not receive heartbeats from the
ESXi host, it goes into a not responding state. To verify if the correct Managed
IP Address is set, see Verifying the vCenter Server Managed IP Address
(1008030) and ESXi 5.0 hosts are marked as Not Responding 60 seconds
after being added to vCenter Server (2020100). See also, ESXi/ESX host
disconnects from vCenter Server after adding or connecting it to the
inventory (2040630) and ESX/ESXi host keeps disconnecting and
reconnecting when heartbeats are not received by vCenter Server (1005757).
4. Verify that network connectivity exists from vCenter Server to the ESXi host
with the IP and FQDN. For more information, see Testing network connectivity
with the ping command (1003486).
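For example, from the vCenter Server machine (the address and hostname here are hypothetical):
ping 192.168.1.50
ping esxi01.example.com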
5. Verify that you can connect from vCenter Server to the ESXi host on TCP/UDP
port 902. If the host was upgraded from version 2.x and you cannot connect
on port 902, then verify that you can connect on port 905. For more
information, see Testing port connectivity with Telnet (1003487).
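For example, from the vCenter Server machine (hostname hypothetical):
telnet esxi01.example.com 902
A successful connection confirms the port is reachable; an immediate refusal or a timeout points to a network or firewall problem.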
6. Verify if restarting the ESXi Management Agents resolves the issue. For more
information, see Restarting the Management agents on an ESXi or ESX host
(1003490).
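On an ESXi 5.x host, this can be done from the local shell, for example:
# /etc/init.d/hostd restart
# /etc/init.d/vpxa restart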
7. Verify if the hostd process has stopped responding on the affected ESXi host.
For more information, see Troubleshooting vmware-hostd service if it fails or
stops responding on an ESX/ESXi host (1002849).
8. Verify if the vpxa agent has stopped responding on the affected ESXi host. For
more information, see Troubleshooting the vCenter Server Agent when it does not
start (1006128).
9. Verify if the ESXi host has experienced a purple diagnostic screen. For more
information, see Interpreting an ESX/ESXi host purple diagnostic screen
(1004250).
10. ESXi hosts can disconnect from vCenter Server due to underlying storage
issues. For more information, see Identifying Fibre Channel, iSCSI, and NFS
storage issues on ESXi/ESX hosts (1003659).
2. Verify that port 22 on the ESX host can be reached from the SSH client
machine. For more information, see Testing port connectivity with the Telnet
command (1003487).
3. Verify that the username and password are correct by logging in directly at the
console. A common error is using an incorrect username or password, which
causes a login failure and an access denied error message to appear.
4. Verify that the SSH Daemon is running. For more information, see Verifying
that the Secure Shell Daemon is running on an ESX host (1003906).
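A quick check from the host's local shell is to look for the daemon process; on ESXi 5.x the service can also be restarted with /etc/init.d/SSH restart:
# ps | grep sshd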
5. Verify that the user is permitted to access the host via SSH. For more information,
see Enabling root SSH login on an ESX host (8375637).
6. Verify that the ESX host firewall permits SSH connections. For more
information, see Configuring the ESX host firewall for SSH (1003808).
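On ESXi 5.x, a sketch for inspecting and enabling the SSH server ruleset with esxcli:
# esxcli network firewall ruleset list --ruleset-id sshServer
# esxcli network firewall ruleset set --ruleset-id sshServer --enabled true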
This issue is resolved in a patch release for ESXi 5.1 and ESXi 5.0 Update 2. For
more information, see VMware ESXi 5.1, Patch ESXi510-201212401-BG: Updates
esx-base (2035777).
Similar symptoms may also occur on HP ProLiant Gen8 servers running ESXi 5.x,
where the /var/log/hpHelper.log file fills the ESXi 5.x RAMDisk.
For more information, see ESXi ramdisk full due to /var/log/hpHelper.log file size
(2055924).
For related information, see HP ProLiant Gen8 Servers - ESXi 5: The
/var/log/hpHelper.log File Can Grow Very Large and Fill the ESXi 5 RAMDisk.
Note:
To help identify which files are consuming space, see ESX error: No free space
left on device (1007638).
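For a quick look on the host itself, you can check RAMDisk utilization and the size of the suspect log file (assuming ESXi 5.x):
# vdf -h
# ls -lh /var/log/hpHelper.log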
The preceding links were correct as of June 4, 2014. If you find a link is
broken, provide feedback and a VMware employee will update the link.
Resolution
This issue is resolved in ESXi 5.1 Patch 04. For more information, see VMware ESXi
5.1, Patch ESXi510-201404401-BG: Updates esx-base (2070667).
Note: If the output indicates that the value is 2000 or more, the .trp files
may be exhausting the available inodes.
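The count referred to in this note is typically obtained by counting the .trp files, for example:
# ls /var/spool/snmp | wc -l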
3. Delete the .trp files in the /var/spool/snmp/ directory by running these
commands:
# cd /var/spool/snmp
# for i in $(ls | grep trp); do rm -f $i; done
4. Back up the existing snmp.xml file by running these commands:
# cd /etc/vmware
# mv snmp.xml snmp.xml.bkup
5. Create a new file named snmp.xml and open it using a text editor. For more
information, see Editing files on an ESX host using vi or nano (1020302).
6. Copy and paste these contents to the file:
<?xml version="1.0" encoding="ISO-8859-1"?>
<config>
  <snmpSettings>
    <enable>false</enable>
    <port>161</port>
    <syscontact></syscontact>
    <syslocation></syslocation>
    <EnvEventSource>indications</EnvEventSource>
    <communities></communities>
    <loglevel>info</loglevel>
    <authProtocol></authProtocol>
    <privProtocol></privProtocol>
  </snmpSettings>
</config>
9. To confirm the SNMP services are running normally again, run the command:
# esxcli system snmp get
To ensure that the issue does not recur, you can temporarily disable snmpd to stop
logging. To stop the snmpd service, run this command:
# /etc/init.d/snmpd stop
Additional Information
Note: In the event that the host has run out of inodes, attempt to stop vpxa on the
host to free up an inode:
1. Connect to the host with the vSphere Client.
2. Click Configuration > Security Profile.
Rebooting an ESXi 5.1 host enabled with Beacon probing on the NIC teaming
policy fails with a purple diagnostic screen.
Resolution
This is a known issue affecting ESXi 5.1 and is resolved in ESXi 5.1 Update 1. For
more information, please refer to the ESXi 5.1 Update 1 Release Notes.
To reduce the likelihood of this issue occurring, deactivate Beacon Probing if it is
activated on one of the vSwitches.
To disable Beacon probing, edit the failover detection policy:
1. Log in to the VMware vSphere Client and select the ESX/ESXi host from the
inventory panel.
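Alternatively, on ESXi 5.x the failover detection policy can be checked and changed from the command line; a sketch assuming the standard vSwitch name vSwitch0:
# esxcli network vswitch standard policy failover get -v vSwitch0
# esxcli network vswitch standard policy failover set -v vSwitch0 --failure-detection=link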
The hostd log files are usually the logs you will want to check first.
ESX Update log - /var/log/vmware/esxupdate.log: logs all updates done through the
esxupdate tool.
As part of the troubleshooting process, you'll often need to find out the version of various
ESX components and which patches are applied. Below are some commands you can run from
the service console to do this:
Type vmware -v to check the ESX Server version, i.e., VMware ESX Server 3.0.1 build-32039.
Type vpxa -v to check the ESX Server management agent version, i.e., VMware
VirtualCenter Agent Daemon 2.0.1 build-40644.
Type rpm -qa | grep VMware-esx-tools to check the installed ESX Server VMware Tools
version, i.e., VMware-esx-tools-3.0.1-32039.
ESXi 5.0 host experiences a purple diagnostic screen with the errors "Failed to ack
TLB invalidate" or "no heartbeat" on HP servers with PCC support (2000091)
Symptoms
The purple diagnostic screen or core dump contains messages similar to the errors quoted above: "Failed to ack TLB invalidate" or "no heartbeat".
Cause
Some HP servers experience a situation where the PCC (Processor Clocking Control
or Collaborative Power Control) communication between the VMware ESXi kernel
(VMkernel) and the server BIOS does not function correctly.
As a result, one or more PCPUs may remain in SMM (System Management Mode) for
many seconds. When the VMkernel notices a PCPU is not available for an extended
period of time, a purple diagnostic screen occurs.
Resolution
This issue has been resolved as of ESXi 5.0 Update 2, as PCC is disabled by default.
For more information, see VMware ESXi 5.0, Patch ESXi500-Update02: VMware ESXi
5.0 Complete Update 2 (2033751) and the ESXi 5.0 Update 2 Release Notes.
To work around this issue in versions prior to ESXi 5.0 U2, disable PCC manually.
To disable PCC:
1. Connect to the ESXi host using the vSphere Client.
2. Click the Configuration tab.
3. In the Software menu, click Advanced Settings.
4. Select vmkernel.
The following issues were addressed in this latest HPSA driver version:
Fixed a null pointer dereference in error-handling code that could cause a PSOD
in rare cases when device inquiries fail.
To identify the current version of the HPSA driver installed on your ESXi 5.5 host, run the
command below:
~ # vmkload_mod -s hpsa | grep Version
Version: Version 5.5.0.58-1OEM, Build: 1331820, Interface: 9.2 Built on: Dec 16 2013
A reported driver version of HP HPSA Driver (v 5.5.0.58-1OEM) means the affected driver is
in use.
How to Install the latest HPSA driver for ESXi 5.5:
HP ESXi utilities can be used for iLO configuration tasks such as resetting the iLO
configuration, reconfiguring the iLO IP address, and resetting the iLO administrative password.
iLO configuration of an ESXi host can now easily be done, without a host restart, using the HP
ESXi utilities.
The HP ESXi Utilities Offline bundle for VMware ESXi 5 is included in the HP-customized
ESXi installer image. If you are not running an HP-customized image, you may need to
download and install the bundle separately. The download is a ZIP file containing three
utilities for remote online configuration of servers: HPONCFG, HPBOOTCFG, and HPACUCLI.
HPONCFG can be used to set up and reconfigure the iLO (Integrated Lights-Out) management
controller of a server.
HPBOOTCFG can be used to set up the boot configuration for the server.
HPACUCLI can be used to configure HP Smart Array controllers and attached storage.
You can directly download HPONCFG, upload the VIB file to your ESXi host, and
run the command below to install the HPONCFG utility:
esxcli software vib install -f -v /tmp/hponcfg-04-00.10.vib
Once it is installed, browse to the /opt/hp/tools directory and run the commands below to
perform the corresponding operations. Running ./hponcfg with the -h option displays the
available help options for the HPONCFG utility.
Browse to /opt/hp/tools and run the command below to reset the HP iLO configuration:
./hponcfg -r
Browse to /opt/hp/tools and run the command below to export the iLO configuration to a
text file.
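For example (the output file name is hypothetical), the running configuration can be written out with hponcfg's -w flag:
./hponcfg -w /tmp/ilo-config.txt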
Create a file named reset_admin_pw.xml with the content below, and put the new password in
the PASSWORD element:
<RIBCL VERSION="2.0">
<LOGIN USER_LOGIN="Administrator" PASSWORD="YOUR-NEW-PASSWORD">
<USER_INFO MODE="write">
<MOD_USER USER_LOGIN="Administrator">
<PASSWORD value="newpass"/>
</MOD_USER>
</USER_INFO>
</LOGIN>
</RIBCL>
Commit the updated administrator password information from the file
(reset_admin_pw.xml) to iLO using the command below:
./hponcfg -f reset_admin_pw.xml
VMware VirtualCenter Server service fails to start with the error: Unable to create
SSO facade: vmodl.fault.SystemError (2056296)
Symptoms
When attempting to ping localhost with a command prompt from the vCenter
Server, it returns ::1 instead of 127.0.0.1.
The Windows Event logs located under Control Panel > Administrative
Tools > Event Viewer contain events similar to:
Event ID 7024 - The VMware VirtualCenter service terminated with a service-specific error. The system cannot find the file specified.
Event ID 7009 - A timeout was reached while waiting for the VMware
VirtualCenter Server service to connect.
Resolution
To resolve the issue, remove the comment from the IPv4 line for localhost and
comment out the IPv6 line.
To remove the comment from the IPv4 line for localhost and comment out the IPv6
line:
1. Log in to the machine running vCenter Server.
2. Open the C:\Windows\System32\drivers\etc\hosts file using a text editor.
3. Locate the localhost lines for IPv4 and IPv6:
# 127.0.0.1 localhost
::1 localhost
Note: If either line does not exist in the hosts file, add it to the end of the
text.
4. Delete the hash symbol (#) to remove the comment from the IPv4 line and
add a hash symbol to comment out the IPv6 line.
127.0.0.1 localhost
# ::1 localhost
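After saving the hosts file, restart the VMware VirtualCenter Server service; a minimal sketch from an elevated command prompt, assuming the default service name vpxd:
net stop vpxd
net start vpxd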
If the VirtualCenter Server service still fails to start, reboot the vCenter Server
machine.
You will have come across many instances of hard disk failures on your physical servers, and it
is necessary to identify exactly which disk has failed. This can easily be checked using
hardware management tools like HP System Management or HP iLO, or even in the Hardware
Status tab of the ESXi host in the vSphere Client. This post talks about checking the status of
disk failures on an ESXi host using command-line utilities. In this post, I am going to discuss
HP hardware and how to check for disk failures from the command line on HP hardware. It gives
you a step-by-step procedure to verify the disk status on an ESXi host using the HPSSACLI
utility, which is part of the HP ESXi Utilities Offline bundle for VMware ESXi 5.x.
The HP ESXi Utilities Offline bundle for VMware ESXi 5.x is included in the HP-customized
ESXi installer image. If you are not running an HP-customized image, you may need to
download and install the bundle separately. The ZIP file contains three utilities for remote
online configuration of servers: HPONCFG, HPBOOTCFG, and HPSSACLI.
HPONCFG: a command-line utility used for obtaining and setting ProLiant iLO
configurations.
You can download and install the HP ESXi utilities offline bundle for ESXi 5.x using the
command below (note that an offline bundle ZIP is installed with the -d depot option rather
than -v):
esxcli software vib install -f -d /tmp/hp-esxi5.5uX-bundle-1.7-13.zip
You can also directly download the HPSSACLI utility, upload the VIB file to your ESXi host,
and run the command below to install the HPSSACLI utility:
esxcli software vib install -f -v /tmp/hpssacli-1.60.17.0-5.5.0.vib
Once it is installed, browse to the /opt/hp/hpssacli/bin directory and verify the installation.
Check the Disk Failure Status:
Type the command below to check the status of the disks in your ESXi host. It displays the
status of the disks in all arrays under the controller.
/opt/hp/hpssacli/bin/hpssacli controller slot=0 physicaldrive all show
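For a quick overall health summary of all controllers, the same utility can be used, for example:
/opt/hp/hpssacli/bin/hpssacli controller all show status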
That's it. We have identified the disk failure. You may need to generate an HP ADU (Array
Diagnostics Utility) report to raise a support case with the hardware vendor. Please refer to my
blog post How to Generate HP ADU Disk Report in ESXi host for a step-by-step guide to
generating an ADU report from the ESXi host command line. I hope this is informative for you.
Thanks for reading! Be social and share this post on social media if you feel it is worth sharing.
Removing a datastore from an ESXi host might sound simple, but it is not as simple as it
sounds. VMware administrators might think it is just a matter of right-clicking the datastore and
unmounting it, but that is not the whole process: there are additional pre-checks and post-tasks,
such as detaching the device from the host, that must be completed before we ask the storage
administrator to unpresent the LUN from the backend storage array. If this process is not
followed properly, it can cause serious issues like an APD (All Paths Down) condition on the
ESXi host. Let's review what an All Paths Down (APD) condition is.
As per VMware, APD occurs when there are no longer any active paths to a storage device from the
ESX host, yet the ESX host continues to try to access that device. When hostd tries to open a disk device, a
number of commands such as read capacity and read requests to validate the partition table are
sent. If the device is in APD, these commands will be retried until they time out. The problem is
that hostd is responsible for a number of other tasks as well, not just opening devices. One task
is ESX to vCenter communication, and if hostd is blocked waiting for a device to open, it may
not respond in a timely enough fashion to these other tasks. One consequence is that you might
observe your ESX hosts disconnecting from vCenter.
VMware has made many improvements to APD handling over the last several releases, but
prevention is better than cure, so I want to use this post to explain the best practices for
removing a LUN from an ESXi host.
Pre-Checks before unmounting the Datastore:
1. If the LUN is being used as a VMFS datastore, all objects (such as virtual machines, snapshots,
and templates) stored on the VMFS datastore must be unregistered or moved to another datastore
using Storage vMotion. You can browse the datastore to verify that no objects remain on it.
2. Ensure the Datastore is not used for vSphere HA heartbeat.
3. Ensure the Datastore is not part of a Datastore cluster and not managed by Storage DRS.
4. Datastore should not be used as a Diagnostic coredump partition.
5. Storage I/O control should be disabled for the datastore.
6. No third-party scripts or utilities are accessing the datastore.
7. If the LUN is being used as an RDM, remove the RDM from the virtual machine. Click
Edit Settings, highlight the RDM hard disk, and click Remove. Select Delete from disk if it is not
selected, and click OK. Note: This destroys the mapping file, but not the LUN content.
Procedure to Remove Datastore or LUN from ESXi 5.X hosts:
1. Ensure you have reviewed all the pre-checks mentioned above for the datastore which you
are going to unmount.
2. Select the ESXi host -> Configuration -> Storage -> Datastores. Note down the naa ID for that
datastore, which starts with something like naa.XXXXXXXXXXXXXXXXXXXXX.
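If you prefer the command line, the datastore-to-device mapping can also be listed from the host shell; a sketch using a standard ESXi 5.x command:
# esxcli storage vmfs extent list
The output maps each VMFS volume name to its device name (the naa ID).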
5. Select the ESXi host -> Configuration -> Storage -> Devices. Match the devices against the
naa ID (naa.XXXXXXXX) which you noted down in step 2 using the Identifier column. Select the
device which has the same naa ID as the unmounted datastore. Right-click the device and choose
Detach. Verify that all the checks are green and click OK to detach the LUN.
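The same detach can also be performed from the command line; a sketch with a placeholder device ID:
# esxcli storage core device set --state=off -d naa.XXXXXXXXXXXXXXXXXXXXX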
6. Repeat the same steps on all ESXi hosts where you want to unpresent this datastore.
7. Inform your storage administrator to physically unpresent the LUN from the ESXi host using
the appropriate array tools. You can even share naa.id of the LUN with your storage
administrator to easily identify from the storage end.
8. Rescan the ESXi host and verify that the detached LUNs have disappeared from the ESXi host.
That's it. I hope this post helps you understand the detailed procedure for properly removing a
datastore or LUN from your ESXi host.