Dedicated Server Performance Degraded on Virtual Machines?



Posted (edited)

TL;DR: I ran a dedicated server benchmark on Windows 10 both natively and virtualized (Linux QEMU+KVM). The VM reproducibly performs worse than the native server by more than 3 orders of magnitude: it yields median frame times >1000 times those of native execution on the same hardware. This makes me doubt the validity of these results, so I'd appreciate it if someone else could independently verify these findings.

I wrote Benchmark.lua to get some performance figures when running the DCS dedicated server on Linux. It repeatedly runs a simple benchmark mission for a configurable duration and logs the simulation frame times. I finally got around to running the benchmark on my local rig (Ryzen 9 5900X, 64 GB DDR4-3600), both natively on my regular Windows 10 install and on a fresh Windows 10 install running inside a VM (QEMU 10.0 + KVM on Linux 6.12.22). The benchmark mission and server configuration are, of course, identical. I've tried my best to give the VM a fighting chance: reserved an entire CPU CCD for exclusive use by the VM, used a raw partition on the same physical NVMe SSD (not just a disk image on a Linux file system), and pre-allocated the VM memory on 1G hugepages (to avoid page faults). These optimizations do make a difference (more on that in a future post). Here's my QEMU command line for reference and peer review:

qemu-system-x86_64 -enable-kvm -machine q35 -smp 6 -bios /usr/share/ovmf/OVMF.fd -mem-prealloc -mem-path /dev/hugepages/qemu -m 24576 -vga virtio -device qemu-xhci -device usb-tablet -device usb-kbd -nic user,model=virtio-net-pci -drive file=/dev/nvme0n1p6,aio=io_uring,index=0,media=disk,format=raw
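
For completeness, the 1G hugepage pool and the CPU isolation are configured on the Linux host rather than on the QEMU command line. On my setup it looks roughly along these lines (the core numbers, SMT siblings, and hugepage count are examples for a 5900X with -m 24576 and will differ on other systems; the mount point matches -mem-path above):

# Kernel command line: pre-allocate 24x 1G hugepages and keep one CCD
# (here cores 6-11 plus their SMT siblings 18-23) away from the host scheduler:
#   default_hugepagesz=1G hugepagesz=1G hugepages=24 isolcpus=6-11,18-23
# Mount a hugetlbfs instance at the path passed to -mem-path:
mount -t hugetlbfs -o pagesize=1G none /dev/hugepages/qemu
# Pin QEMU onto the isolated cores when launching it:
taskset -c 6-11,18-23 qemu-system-x86_64 -enable-kvm ...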

The results are catastrophic (note the logarithmic scaling). The first (left) plot shows the time series of the measured frame times (you'll notice the 10 benchmark iterations). The second (right) plot shows the frame time distribution as exclusive percentiles (read: xx% of the benchmark duration had frame times worse than yy).

Benchmark-2.9.15.9408-W10_KVM.time.png Benchmark-2.9.15.9408-W10_KVM.dist.png
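
To be explicit about how the percentile plot is computed: it is weighted by duration, not by frame count. A rough shell equivalent, assuming a hypothetical file with one frame time in milliseconds per line (not the actual benchmark.log format, just for illustration):

# Sort frame times from worst to best, then report which fraction of the
# total benchmark duration was spent in frames at least this slow.
sort -rn frametimes.txt | awk '{ft[NR] = $1; cum[NR] = cum[NR-1] + $1}
  END {for (i = 1; i <= NR; i++) printf "%7.3f%%  >= %.3f ms\n", 100 * cum[i] / cum[NR], ft[i]}'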

I'd indeed expect the VM to perform somewhat worse than running the server natively. However, the VM repeatedly exhibits frame times in excess of dozens of seconds, some even exceeding the configured benchmark round duration (5 min). Looking at CPU utilization, the native server simply shrugged off the benchmark (the busiest core averaged ~30% load), whereas the VM constantly maxed out a single core.

Benchmark-2.9.15.9408-W10-20250419-085425Z.png Benchmark-2.9.15.9408-W10_KVM+hugepage+isol-20250419-200019Z.png

As an auxiliary performance metric, I measured the system and user CPU time consumed by DCS_server.exe (see benchmark.ps1).

System                         System CPU Time (s)    User CPU Time (s)
Windows 10 (native)                        63.9375            1135.78125
Windows 10 (Linux QEMU+KVM)             2674.03125           1995.234375

Running virtualized, the system CPU time increased to 31.2 times the native value. I have no reasonable explanation for this. The benchmark runs an isolated dedicated server without clients, which should therefore require only a minimal number of syscalls (no clients -> no network). That assumption holds true when running DCS_server.exe natively, but not on the VM. The only educated guess I have right now is that the multi-threaded dedicated server might be subject to significantly higher thread synchronization overhead on VMs.
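
If anyone wants to dig into where that system time goes, one host-side option would be counting KVM exits while the benchmark runs, e.g. via perf's KVM support (just a sketch; the PID lookup is an example, and the interpretation is my guess: a high rate of HLT/PAUSE/MSR exits would point towards synchronization or timer overhead):

# Record guest exits for 60 s while the benchmark is running,
# then summarize which exit reasons dominate.
perf kvm stat record -p "$(pidof qemu-system-x86_64)" -- sleep 60
perf kvm stat report --event=vmexit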

I'd appreciate feedback from anyone with experience running the dedicated server on VMs:

  1. Can you reproduce a significant performance degradation between native and virtualized execution of the dedicated server on the same hardware?
  2. Do you see any systematic or other possible sources of error within these benchmarks?
  3. Can multi-threading be forcibly disabled in the dedicated server (other than just pinning DCS_server.exe to a single CPU core)?

I've attached the dcs.log and benchmark.log files if anyone wants to poke at the data. You can find the Python script that generated the plots here.

Benchmark-2.9.15.9408-W10-20250419-085425Z.dcs.log.gz Benchmark-2.9.15.9408-W10-20250419-085425Z.log.gz Benchmark-2.9.15.9408-W10_KVM+hugepage+isol-20250419-200019Z.dcs.log.gz Benchmark-2.9.15.9408-W10_KVM+hugepage+isol-20250419-200019Z.log.gz

Edited by Actium
Posted

Did a quick sanity check on the fundamental performance of the VM vs. the native Windows 10 install via PowerShell:

(Measure-Command { for ($i = 0; $i -lt 10000000; $i++) {} }).TotalSeconds

This just runs 10 million loop iterations and measures the total duration. On the native Win 10 install it takes about 6.0 seconds, on the VM about 7.0 seconds. That's well within the performance margin I'd expect, so nothing is outright erroneous with the VM setup itself that would explain the huge discrepancy in dedicated server performance.

Posted

As promised, a comparison of QEMU+KVM with and without optimization (1G hugepages and CPU isolation). While the server performance is unusable in either configuration, the optimization reduces the peak frame time by up to half.

benchmark_time.png benchmark.png

Unoptimized QEMU command line (identical to the one in my first post, minus the -mem-prealloc and -mem-path hugepage options):

qemu-system-x86_64 -enable-kvm -machine q35 -smp 6 -bios /usr/share/ovmf/OVMF.fd -m 24576 -vga virtio -device qemu-xhci -device usb-tablet -device usb-kbd -nic user,model=virtio-net-pci -drive file=/dev/nvme0n1p6,aio=io_uring,index=0,media=disk,format=raw

Benchmark-2.9.15.9408-W10_KVM+hugepage+isol-20250419-200019Z.log.gz Benchmark-2.9.15.9408-W10_KVM-20250418-143902Z.log.gz
