Hello,
I am running VMware Server on an Ubuntu 7.04 (Feisty) server, with a mix of (K)Ubuntu server/desktop, 32/64-bit guests.
The host is a Sun server with two dual-core Opterons (first generation, so without hardware virtualisation support), 8GB of RAM and 10k SAS disks in RAID 1.
There are 8 different VMs running on this host, some with one CPU assigned, others with two. Overall memory usage is below the physical memory of the machine, so the machine is not swapping.
The machine is called Grimston; Munin monitoring is available here: http://monitor.thehumanjourney.net/munin/ (Host.Grimston being Grimston, the VMware host).
So, here are the couple of weird things we are experiencing. The CPU usage on the host is rather low: http://monitor.thehumanjourney.net/munin/Grimston/Host.Grimston-cpu.html - less than 33% of the CPU seems to be used.
The load, however, is rather high, oscillating between 3 and 5: http://monitor.thehumanjourney.net/munin/Grimston/Host.Grimston-load.html
So I had a look at htop to see which machines were using the most CPU. It seems the CPU usage comes mostly from 2 machines, NoMachine (Kubuntu desktop, with X running) and Zimbra (Ubuntu Dapper 32-bit server).
This actually surprised me, as the NoMachine server is hardly ever used; see its CPU usage:
http://monitor.thehumanjourney.net/munin/Grimston/Nomachine.Grimston.html
I then thought that it might be handling a lot of network traffic and that the problem lay in VMware's network drivers, but no:
http://monitor.thehumanjourney.net/munin/Grimston/Nomachine.Grimston-if_eth0.html
But here is what I get when pinging from one VM to another VM on the SAME host:
yann@mirror:~$ ping 10.0.10.35
PING 10.0.10.35 (10.0.10.35) 56(84) bytes of data.
64 bytes from 10.0.10.35: icmp_seq=1 ttl=64 time=8.68 ms
64 bytes from 10.0.10.35: icmp_seq=2 ttl=64 time=139 ms
64 bytes from 10.0.10.35: icmp_seq=3 ttl=64 time=3350 ms
64 bytes from 10.0.10.35: icmp_seq=4 ttl=64 time=2348 ms
64 bytes from 10.0.10.35: icmp_seq=5 ttl=64 time=1348 ms
64 bytes from 10.0.10.35: icmp_seq=6 ttl=64 time=348 ms
64 bytes from 10.0.10.35: icmp_seq=7 ttl=64 time=401 ms
(That doesn't _always_ happen, but often enough to be noticed by our Nagios monitoring.) We also sometimes have trouble using Zimbra, which occasionally complains that the network is too slow. I guess both issues are related.
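To put numbers on how bad the spikes are, here is a minimal Python sketch that parses ping output like the above and reports the minimum, median and maximum round-trip times, plus which replies were slow. The function names and the 100 ms threshold are my own choices for illustration, not anything from Nagios or VMware.

```python
import re
import statistics

# Matches reply lines such as: "64 bytes from 10.0.10.35: icmp_seq=2 ttl=64 time=139 ms"
PING_LINE = re.compile(r"icmp_seq=(\d+) ttl=\d+ time=([\d.]+) ms")

# The ping output quoted in this post.
SAMPLE = """\
64 bytes from 10.0.10.35: icmp_seq=1 ttl=64 time=8.68 ms
64 bytes from 10.0.10.35: icmp_seq=2 ttl=64 time=139 ms
64 bytes from 10.0.10.35: icmp_seq=3 ttl=64 time=3350 ms
64 bytes from 10.0.10.35: icmp_seq=4 ttl=64 time=2348 ms
64 bytes from 10.0.10.35: icmp_seq=5 ttl=64 time=1348 ms
64 bytes from 10.0.10.35: icmp_seq=6 ttl=64 time=348 ms
64 bytes from 10.0.10.35: icmp_seq=7 ttl=64 time=401 ms
"""


def parse_ping(output):
    """Return a list of (sequence number, round-trip time in ms) pairs."""
    return [(int(seq), float(rtt)) for seq, rtt in PING_LINE.findall(output)]


def summarize(samples, threshold_ms=100.0):
    """Summarise round-trip times; threshold_ms is an arbitrary 'slow' cut-off."""
    rtts = [rtt for _, rtt in samples]
    return {
        "count": len(rtts),
        "min_ms": min(rtts),
        "median_ms": statistics.median(rtts),
        "max_ms": max(rtts),
        "slow": [seq for seq, rtt in samples if rtt > threshold_ms],
    }


if __name__ == "__main__":
    print(summarize(parse_ping(SAMPLE)))
```

On the sample above, six of the seven replies are over the threshold, which is far too much for two VMs on the same host.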
So, here are my questions:
- How can it be that the load is so high, even if the CPU usage is so low? Most big VMs are using 2 CPUs...
- How can it be that a VM shows up at the top of top (hum, at the top when you run top on the host), appearing to use a lot of CPU, although the VM itself is nearly completely idle?
- Have other people experienced the same networking issues between two VMs on the same host?
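Regarding the first question, one thing that could be checked: on Linux the load average counts tasks in uninterruptible sleep (state D, usually waiting on I/O) as well as runnable ones, so load can be high while CPU usage stays low. Whether that is what happens on Grimston is only a guess on my part. Below is a minimal Python sketch to run on the host that prints the load averages next to a tally of task states read from /proc; the function names are mine.

```python
import glob
import os
from collections import Counter


def parse_loadavg(text):
    """Return the 1, 5 and 15 minute load averages from /proc/loadavg content."""
    one, five, fifteen = text.split()[:3]
    return float(one), float(five), float(fifteen)


def parse_state(stat_text):
    """Return the state letter (R, S, D, ...) from a /proc/<pid>/stat line.

    The command name is wrapped in parentheses and may itself contain spaces
    or ')', so the line is split on the last ')'.
    """
    return stat_text.rsplit(")", 1)[1].split()[0]


def tally_states(stat_texts):
    """Count how many tasks are in each state."""
    return Counter(parse_state(text) for text in stat_texts)


def read_task_stats():
    """Read the stat line of every task (thread) currently visible in /proc."""
    texts = []
    for path in glob.glob("/proc/[0-9]*/task/[0-9]*/stat"):
        try:
            with open(path) as handle:
                text = handle.read()
        except OSError:
            continue  # the task exited while we were scanning
        if text:
            texts.append(text)
    return texts


def main():
    if not os.path.exists("/proc/loadavg"):
        print("No /proc/loadavg here; this sketch assumes a Linux host.")
        return
    with open("/proc/loadavg") as handle:
        print("load averages (1/5/15 min):", parse_loadavg(handle.read()))
    states = tally_states(read_task_stats())
    print("runnable (R):", states["R"], " uninterruptible (D):", states["D"])


if __name__ == "__main__":
    main()
```

If the D count is regularly non-zero while the CPU graphs stay low, that would point towards I/O wait rather than CPU as the source of the load.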
Thanks in advance for your help