Yesterday the Linux OOM killer took out one of our production VMs twice. The first time was irritating, but the system came back. We had a plan to upgrade the system's kernel to a HUGEMEM kernel, as recommended here, to try to prevent that from happening again.
The second time was a real problem: the VM would not start. When I started it from the VMware Server Console, the screen flashed and then simply returned to the inventory. Starting the machine with vmware-cmd returned 1, and vmware-cmd getstatus said it was running, but it definitely was not.
I upgraded VMware Server from 1.0.1 to 1.0.3 in case vmware-config.pl had not completed correctly when I had to reconfigure it for the new kernel, but I still had the same problem.
/var/log/vmware-server.log has entries like this:
May 17 21:32:46: app| OvlHostStartIo: errno 104
May 17 21:32:46: app| vmdbPipe_Streams: Couldn't read
May 17 21:32:46: app| VMHS: Connection to VM broken: cfg: /VirtualMachines/RedHat62/RedHat62.vmx; error: Pipe: Read failed; state: 3
May 17 21:32:46: app| VMServerd IPC closed the connection with thread /VirtualMachines/RedHat62/RedHat62.vmx (0x8302a50)
May 17 21:32:46: app| Lost connection to /VirtualMachines/RedHat62/RedHat62.vmx (/VirtualMachines/RedHat62/RedHat62.vmx) unexpectedly.
May 17 21:32:46: app| VM suddenly changed state: poweredOff.
May 17 21:32:46: app| VM suddenly changed state: poweredOff.
May 17 21:32:47: app| SP: Retrieved username: root
May 17 21:32:47: app| SP: Retrieved username: root
May 17 21:32:47: app| SP: Retrieved username: root
May 17 21:32:47: app| SP: Retrieved username: root
May 17 21:32:52: app| vmdbPipe_Streams Couldn't read: OVL_STATUS_EOF
May 17 21:32:52: app| SP: Deleting user session: 1 username: root
I believe this is related to another problem we have: since the machine was moved from a RHEL3 host to a CentOS 4.4 host, it no longer starts when the host boots.
When I connected to the vmware-server process from a Server Console on another machine and tried to start it, it came up fine.
This is still a bit of a problem, because I can't manage the server locally. I could not find any problems logged beyond the entries above, so I can't provide much more detail right now. I'm happy to provide whatever I can if there are questions that could help narrow down the problem.
Any help would be very much appreciated.