Everything posted by Actium
-
This will definitely happen if your computer is properly suspended for at least 30 minutes, because the DCS client process will not be executing and is thus unable to communicate with the DCS master server, which in turn causes the master server to forget its session ID. For details, see here (same thread as I posted above). Windows may, however, decide to do some suspension voodoo that causes periodic process execution, e.g., thru Connected Standby. Just hypothesizing. For me, a 30+ minute suspension always results in termination due to an expired session.
-
That's a bit like blaming the road if your car's engine stops every time you hit a pothole. Not an ideal situation, but a proper car should be able to deal with it. The DCS client and master server should simply be more resilient to less-than-ideal internet connections. Even if a session check with the master server fails, the client should try to reauthenticate instead of just terminating. That should be a relatively straightforward change that'll make a lot of paying customers less frustrated. I've laid out the issue in detail here (first post updated since you reported it to the team):
-
I reported this issue in February. It's been wishlisted. An easy workaround is to enter offline mode before starting the mission. If you forgot to do that, there's another use-at-your-own-risk workaround: Before suspending your computer, add a firewall rule or an invalid network route that prevents further communication with the DCS master server. While this sounds counterintuitive, DCS will only terminate itself after it reconnects to the master server. It will happily live on without the master server, at least for a couple of hours. Haven't tried that overnight, yet. Update: Tested with the dedicated server, which keeps running for more than 3 days after losing connection to the master server. I'd assume the client will, too.
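For illustration only, a minimal PowerShell sketch of the firewall-rule variant, run from an elevated prompt. The master server address is the one observed via tcpdump further down this page and is an assumption here; verify it against your own dcs.log/traffic before relying on it. Use at your own risk.

# Block outbound traffic to the (assumed) DCS master server before suspending.
New-NetFirewallRule -DisplayName "Block DCS master server" -Direction Outbound -RemoteAddress 185.195.197.4 -Action Block

# After resuming, remove the rule again so DCS can reconnect and reauthenticate.
Remove-NetFirewallRule -DisplayName "Block DCS master server"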
-
Thanks. Glad to know this is of use to others. install_dcs.sh has been setting Windows 10 as the Windows version via wine winecfg -v win10 since the initial commit. I guess DCS is just getting a bit confused by your Wine version/installation. Curiously, I'm unable to reproduce the warnings. Line 5 of my server dcs.log correctly identifies the Windows version:

2025-06-12 15:20:31.344 INFO APP (Main): DCS/2.9.16.10523 (x86_64; MT; Windows NT 10.0.18362)

Just ran install_dcs.sh on an empty WINEPREFIX (.wine directory). Couldn't reproduce your error, either. This is my output (full log: install_dcs.sh.log):

...
+ export WAYLAND_DISPLAY=wayland-1
+ wine winecfg -v win10
0050:err:ole:StdMarshalImpl_MarshalInterface Failed to create ifstub, hr 0x80004002
0050:err:ole:CoMarshalInterface Failed to marshal the interface {6d5140c1-7436-11ce-8034-00aa006009fa}, hr 0x80004002
0050:err:ole:apartment_get_local_server_stream Failed: 0x80004002
0050:err:ole:start_rpcss Failed to open RpcSs service
0048:err:ole:StdMarshalImpl_MarshalInterface Failed to create ifstub, hr 0x80004002
0048:err:ole:CoMarshalInterface Failed to marshal the interface {6d5140c1-7436-11ce-8034-00aa006009fa}, hr 0x80004002
0048:err:ole:apartment_get_local_server_stream Failed: 0x80004002
wine: configuration in L"/home/dcs/.winetest" has been updated.
+ cat
+ wine reg import /home/dcs/.winetest/drive_c/wineconfig.reg
+ rm /home/dcs/.winetest/drive_c/wineconfig.reg
...

Did you follow the installation instructions? Anything else relevant to reproducing your error? Your error message

wine: could not load kernel32.dll, status c0000135

sounds like something is wrong with the Wine prefix/installation.
-
I used my server benchmark framework to compare the performance of the three most recent dedicated server releases and found no statistically significant differences on Caucasus. Total CPU usage is also almost equal:

Benchmark-20250531-2.9.15.9599.log.gz   # STATS: {"CPU_TIME_USER":1745.796875,"CPU_TIME_SYSTEM":102.546875}
Benchmark-20250531-2.9.16.10532.log.gz  # STATS: {"CPU_TIME_USER":1772.40625,"CPU_TIME_SYSTEM":102.609375}
Benchmark-20250531-2.9.16.10973.log.gz  # STATS: {"CPU_TIME_USER":1822.828125,"CPU_TIME_SYSTEM":103.625}

Raw measurement data attached, if anyone wants to take a closer look.
Benchmark-20250531-2.9.15.9599.log.gz Benchmark-20250531-2.9.16.10532.log.gz Benchmark-20250531-2.9.16.10973.log.gz
-
Dedicated Server Performance Degraded on Virtual Machines?
Actium replied to Actium's topic in Multiplayer Bugs
So just to clarify, the dedicated server does work for me on a Linux QEMU+KVM VM. A simple mission with a few dozen units is no problem whatsoever. However, once I throw too many units at each other, the server performs far worse on the VM than natively on Windows. After noticing that, I started digging, i.e., benchmarking, and figured that a performance degradation of more than two orders of magnitude is far beyond the overhead one would reasonably expect from a VM. Because another KVM user (thru Proxmox) also has performance issues, this may be KVM-related, like @_UnknownCheater_ said. Unfortunately, I don't have the time right now to set up objectively comparable benchmarks with other hypervisors. -
Dedicated Server Performance Degraded on Virtual Machines?
Actium replied to Actium's topic in Multiplayer Bugs
I'll adapt my benchmarking code to use .getRealTime(). Real-time sounded too much like wall-clock time and therefore not monotonic. Is any documentation available on the new API namespace? I read about DCS.getModelTime() in DCS_Client/API/DCS_ControlAPI.html, which does not mention the Sim namespace. Neither does the Singletons page.

How did you arrive at that conclusion? Both getRealTime() and getModelTime() definitely have sub-second resolution. DCS.getModelTime() returns 3 non-zero decimal places and DCS.getRealTime() returns 6 non-zero decimal places. The attached log files contain the raw difference of getModelTime() return values, see Benchmark.lua:

function benchmark.onSimulationFrame()
    local now = DCS.getModelTime() -- current model time in seconds
    -- log the frame time (difference of consecutive getModelTime() values) in milliseconds
    benchmark.logfile:write(string.format("%.0f\n", (now - benchmark.prev_frame) * 1e3))
    benchmark.prev_frame = now
end

The plots in my first post (native Windows performance) clearly show that the values have millisecond resolution and no higher level of quantization. See the steps in the distribution function.

Thank you for the suggestions. I already had prealloc (on 1G hugepages), KVM, and asynchronous I/O enabled (see first post). I'm using io_uring instead of native AIO, as their performance is comparable and I have experience with io_uring. I added the other options you suggested. Here are both for comparison:

# old QEMU options (r2)
qemu-system-x86_64 -bios /usr/share/ovmf/OVMF.fd \
    -enable-kvm -machine q35 -smp 6 \
    -mem-prealloc -mem-path /dev/hugepages/qemu -m 24576 \
    -vga virtio -device qemu-xhci -device usb-tablet -device usb-kbd \
    -nic user,model=virtio-net-pci \
    -drive file=/dev/nvme0n1p6,aio=io_uring,index=0,media=disk,format=raw

# new QEMU options (r3)
qemu-system-x86_64 -bios /usr/share/ovmf/OVMF.fd \
    -enable-kvm -machine q35,accel=kvm -cpu host,topoext,kvm=on -smp 6,sockets=1,cores=3,threads=2 \
    -mem-prealloc -mem-path /dev/hugepages/qemu -m 24576 \
    -vga virtio -device qemu-xhci -device usb-tablet -device usb-kbd \
    -nic user,model=virtio-net-pci \
    -drive file=/dev/nvme0n1p6,aio=io_uring,cache=none,index=0,media=disk,format=raw

I reran the benchmark (with the same DCS version, obviously). Results attached. Presumably, the most important change was passing through the host CPU. Thanks again, I missed that entirely. Unfortunately, the performance improved only slightly. However, the CPU usage within the VM went down noticeably. Total CPU usage did not change significantly, particularly the system CPU time:

System               | System CPU Time (s) | User CPU Time (s)
Windows 10 (native)  | 63.9375             | 1135.78125
Windows 10 (KVM, r2) | 2674.03125          | 1995.234375
Windows 10 (KVM, r3) | 2524.5625           | 1625.78125

Found this deep dive on QEMU+KVM resource isolation. Will do some more digging once I find the time. Do you have any additional info on that, preferably comparable benchmark results? I believe my QEMU config should be fairly comparable to an optimized Proxmox setup, so I'd expect similar results. Even when avoiding the virtualization overhead and running the DCS dedicated server on Linux via Wine, the performance is a little worse than on native Windows 10 as of upstream Wine 10.0 (see benchmark results).

Benchmark-2.9.15.9408-W10_KVM_r3-20250426-125439Z.log.gz -
Recommended spec for dedicated server?
Actium replied to Lace's topic in Multiplayer Server Administration
@Potato Is that a VM? 12 dedicated cores and 32 GB RAM out of an 84-core CPU with 12 memory channels sounds very much like it. The current dedicated server implementation relies primarily on its main thread. Actual performance will depend on the mission (number of active units) and the peak single-thread performance you get from the CPU. If it's a VM, the hosting provider may have opted to disable turbo boost to improve overall stability. Unfortunately, there's no rule of thumb for DCS dedicated server performance. For optimum performance, rent a physical server with a CPU that has very high single-thread performance. Otherwise, find a VM hosting provider with no setup fee and hourly billing, so you can simply give VMs a shot with the most complex mission you plan on running. For intuitive, real-time server performance monitoring, I've written FPSmon.lua, which will warn in global chat if the performance (server simulation frame rate) hits configurable thresholds. -
Dedicated Server Performance Degraded on Virtual Machines?
Actium replied to Actium's topic in Multiplayer Bugs
As promised, a comparison of QEMU+KVM with and without optimization (1G hugepages and CPU isolation). While the server performance is unusable in either configuration, the optimization cuts the peak frame time by up to half. Unoptimized QEMU command line:

qemu-system-x86_64 -enable-kvm -machine q35 -smp 6 \
    -bios /usr/share/ovmf/OVMF.fd -m 24576 \
    -vga virtio -device qemu-xhci -device usb-tablet -device usb-kbd \
    -nic user,model=virtio-net-pci \
    -drive file=/dev/nvme0n1p6,aio=io_uring,index=0,media=disk,format=raw

Benchmark-2.9.15.9408-W10_KVM+hugepage+isol-20250419-200019Z.log.gz Benchmark-2.9.15.9408-W10_KVM-20250418-143902Z.log.gz -
@Noisy Thanks. Love the xdotool idea. I'm fully committed to Wayland, though, particularly because Wine 10 has enabled Wayland by default. It should certainly be possible to implement that with Wayland tooling and subsequently get rid of XWayland, e.g., via ext-foreign-toplevel-list-v1 to punch in the credentials as soon as the login window pops up. Unfortunately, the extension lacks the ability to explicitly bring windows to the front. I'll look into that down the road.

Thanks for sharing. I've had a look. I'm surprised by what PowerShell is capable of (i.e., pressing buttons chosen by name). Alternatively, you could install directly via a DCS_updater.exe invocation, as I do in install_dcs.sh.

In terms of performance, I have already packaged ntsync for straightforward deployment on Debian. The Wine ntsync patch is still being worked on upstream and has not yet been merged. Dunno if I'll wait for it to be merged before toying around with it.

However, I just ran a bunch of dedicated server benchmarks on different platforms. The server appears to suffer severely when running on a Windows 10 VM with QEMU+KVM as the hypervisor (frame times degrade by 3 orders of magnitude). Using my Benchmark.lua and Benchmark_150_v2024.11.23.miz, I found that Wine 10.0 performs almost on par with running natively on Windows 10 on my Ryzen 9 5900X, whereas the Windows VM has a performance excursion that renders it entirely unusable (server simulation frame times on the order of a minute). See the plotted results for 10 rounds of 5 minutes each (or more). Windows goes nuts in terms of system CPU time consumption. With Wine it's also significantly elevated compared to native Windows, but I'd expect that to improve with ntsync.

System                      | System CPU Time (s) | User CPU Time (s)
Windows 10 (native)         | 63.9375             | 1135.78125
Wine 10.0 (no VM)           | 460.45              | 1654.63
Windows 10 (Linux QEMU+KVM) | 2674.03125          | 1995.234375

I haven't made any benchmarks of Wine running in a VM. That'll probably have a noticeable impact, too, but currently I have no clue where it'll end up relative to the Windows VM performance. As of right now, I'd recommend using Wine (possibly in containers, but not VMs) instead of Windows VMs, as far as performance is concerned. Of course, running the dedicated server on a physical Windows box still has a performance edge over Wine, particularly when missions get more demanding (see older results with the Benchmark_200.miz). I've attached the raw benchmark log files for anyone to have a closer look.

Benchmark-2.9.15.9408-W10-20250419-085425Z.log.gz Benchmark-2.9.15.9408-W10_KVM+hugepage+isol-20250419-200019Z.log.gz Benchmark-2.9.15.9408-Wine10-20250420-123953Z.log.gz
-
Dedicated Server Performance Degraded on Virtual Machines?
Actium replied to Actium's topic in Multiplayer Bugs
Did a quick sanity check on the fundamental performance of the VM vs. the native Windows 10 install via PowerShell:

(Measure-Command { for ($i = 0; $i -lt 10000000; $i++) {} }).TotalSeconds

It just runs 10 million loop iterations and measures the total duration. On native Win 10 it takes about 6.0 seconds and on the VM about 7.0 seconds. Well within the performance margin I'd expect. So nothing's outright erroneous with the VM setup itself that'd explain the huge discrepancy in dedicated server performance. -
Dedicated Server Performance Degraded on Virtual Machines?
Actium posted a topic in Multiplayer Bugs
TL;DR: I ran a dedicated server benchmark on Windows 10 both natively and virtualized (Linux QEMU+KVM). The VM reproducibly performs worse than the native server by more than 3 orders of magnitude: it yields median frame times >1000 times those of native execution on the same hardware. This has me doubting the validity of these results. Therefore, I'd appreciate it if someone else could independently verify these findings.

I wrote Benchmark.lua to get some performance figures for running the DCS dedicated server on Linux. It repeatedly runs a simple benchmark mission for a configurable duration and logs simulation frame times. I finally got around to running a benchmark on my local rig (Ryzen 9 5900X, 64G DDR4-3600) both natively on my regular Windows 10 install and on a fresh Windows 10 install running inside a VM (QEMU 10.0 + KVM on Linux 6.12.22). Of course, benchmark mission and server configuration are identical. I've tried my best to give the VM a fighting chance: reserved an entire CPU CCD for exclusive use by the VM, used a raw partition on the same physical NVMe SSD (not just a disk image on a Linux file system), and pre-allocated the VM memory on 1G hugepages (to avoid page faults). These optimizations do make a difference (more on that in a future post). Here's my QEMU command line for reference and peer review:

qemu-system-x86_64 -enable-kvm -machine q35 -smp 6 -bios /usr/share/ovmf/OVMF.fd \
    -mem-prealloc -mem-path /dev/hugepages/qemu -m 24576 \
    -vga virtio -device qemu-xhci -device usb-tablet -device usb-kbd \
    -nic user,model=virtio-net-pci \
    -drive file=/dev/nvme0n1p6,aio=io_uring,index=0,media=disk,format=raw

The results are catastrophic (note the logarithmic scaling). The first (left) plot shows the time series of the measured frame times (you'll notice the 10 benchmark iterations). The second (right) plot shows the frame time distribution as exclusive percentiles (read: xx% of the benchmark duration had frame times worse than yy). I'd indeed expect the VM to perform worse than running the server natively. However, the VM repeatedly exhibits frame times in excess of dozens of seconds, even exceeding the configured benchmark duration (5 min per round) as a consequence. Looking at CPU utilization, the native server simply shrugged off the benchmark (the busiest core averaged ~30% load), whereas the VM constantly maxed out a single core.

As an auxiliary performance metric, I measured the system and user CPU time consumed by DCS_server.exe (see benchmark.ps1; a rough sketch of reading these figures follows at the end of this post):

System                      | System CPU Time (s) | User CPU Time (s)
Windows 10 (native)         | 63.9375             | 1135.78125
Windows 10 (Linux QEMU+KVM) | 2674.03125          | 1995.234375

Running virtualized, the system CPU time increased to 31.2 times the native value. I have no reasonable explanation. The benchmark runs an isolated dedicated server without clients, which should thus require a minimal number of syscalls (no clients -> no network). This assumption holds true when running DCS_server.exe natively, but not on the VM. The only educated guess I have right now is that the multi-threaded dedicated server might be subject to significantly higher thread synchronization overhead on VMs.

I'd appreciate feedback from anyone with experience running the dedicated server on VMs: Can you reproduce a significant performance degradation between native and virtualized execution of the dedicated server on the same hardware? Do you see any systematic or other possible sources of error within these benchmarks? Can multi-threading be forcibly disabled in the dedicated server (other than just pinning DCS_server.exe to a single CPU core)?

I've attached the dcs.log and benchmark.log files if anyone wants to poke at the data. You can find the Python script that generated the plots here.

Benchmark-2.9.15.9408-W10-20250419-085425Z.dcs.log.gz Benchmark-2.9.15.9408-W10-20250419-085425Z.log.gz Benchmark-2.9.15.9408-W10_KVM+hugepage+isol-20250419-200019Z.dcs.log.gz Benchmark-2.9.15.9408-W10_KVM+hugepage+isol-20250419-200019Z.log.gz
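For anyone who wants to collect comparable CPU-time figures: this is not the actual benchmark.ps1, just a minimal PowerShell sketch of reading the system/user CPU times of the server process (run it shortly before the server exits; the process name DCS_server is taken from the post above).

# Read cumulative CPU times of the dedicated server process.
$p = Get-Process -Name DCS_server
[pscustomobject]@{
    SystemCpuTimeSeconds = $p.PrivilegedProcessorTime.TotalSeconds
    UserCpuTimeSeconds   = $p.UserProcessorTime.TotalSeconds
}
-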
Unfortunately, DCS 2.9.15.9408 just broke the a_do_script() return value pass-thru. Thus, code run via a_do_script() will always yield an empty string as the result until the issue is fixed by ED. I sincerely hope a fix will come quickly, so I won't have to bring back the previous, cumbersome workaround, which used a temporary file to pass along the serialized return value.
-
DCS 2.9.13.6818 introduced return value pass-thru for a_do_script() and a_do_file(). This worked up to and including 2.9.14.8394. However, DCS 2.9.15.9408 fixed a segfault within that function, and unfortunately the return value pass-thru no longer works. Presumably this is a regression introduced by the above-mentioned fix.

Steps to reproduce: Run the following code in the hooks (GUI) scripting environment, e.g., using my WebConsole.lua:

return {net.dostring_in("mission", "return a_do_script('return 42')")}

Since 2.9.15.9408, the return value is:

["", true]

However, the return value should be:

["42", true]

This issue was first reported here by @MarcosR.

P.S.: @ED Please implement proper unit testing in your CI/CD pipeline! The described regression seems like a bug that could have been easily caught by a very simple unit test.
-
Unofficial: https://wiki.hoggitworld.com/view/Running_a_Server#AutoExec.cfg Dunno. But start.exe has the /affinity parameter that you could use for that. It works similarly with PowerShell.
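To illustrate, a hedged sketch (not from the wiki) of pinning the server to a single core; the install path below is an assumption, adjust it for your setup.

# cmd equivalent: start "" /affinity 1 "C:\DCS World Server\bin\DCS_server.exe"
$p = Start-Process -FilePath "C:\DCS World Server\bin\DCS_server.exe" -PassThru
$p.ProcessorAffinity = 0x1   # bitmask: run on CPU core 0 only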
-
Force server to use certain public IP?
Actium replied to Omelette's topic in Multiplayer Server Administration
I'd keep that system air-gapped or, better yet, scrap it. Windows Server 2008 has been without security updates for several years now, unless you're on Premium Assurance. -
Exactly. I'd consider it a no-brainer to fix an issue that'll boot all clients and brick all servers whenever the master server takes a break. Good point. But it still shouldn't be a problem to track these. So that's once a month on average. I'd call that quite often. Too often to forcibly terminate all currently logged-in sessions, IMHO.
-
Universal, straightforward solution: If the master server responds 401 – for whatever reason, e.g., an expired token – the client (or dedicated server) should simply try to log in again in the background (no interaction required) instead of terminating itself. The regular 3-day grace period should start automatically (i.e., no popups that need manual acknowledgment). Only after it expires without ever reaching the master server again may the client (or dedicated server) terminate itself. The grace period should reset after a successful (re-)login with the master server. The current situation, where all affected clients (or dedicated servers) terminate themselves whenever the network goes down on either end, results in a terrible user experience, which should IMHO be of utmost concern. A good user experience leads to customers willing to spend money on the product, which is presumably the primary revenue source. An expired session token, which could simply be renewed by logging in automatically in the background as suggested above, is no justification for unceremoniously terminating the game. Irrelevant wrt. the above suggested solution, but just for the sake of argument: Why not? A session token should take no more database space than a properly salted and hashed password. I see no technical difference between keeping session tokens for up to 30 minutes (they should be in non-volatile memory anyway, for the sake of master server crash resilience) or retaining the most recent session token for each account indefinitely.
-
Yeah. Nothing wrong with your computer, just the ED master server that took a nap.
-
Recommended spec for dedicated server?
Actium replied to Lace's topic in Multiplayer Server Administration
You do need to install the terrains on the dedicated server using the updater, as detailed here; a command-line sketch follows below.
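For illustration, a hedged sketch of installing a terrain via the updater's command line. The updater path and the module ID are assumptions; check the linked guide for the exact IDs of the terrains you need.

# Install a terrain module on the dedicated server via the updater (path and module ID are examples).
& "C:\DCS World Server\bin\DCS_updater.exe" install PERSIANGULF_terrain
-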
@Toumal Short of resorting to sslsplit, having a look at the traffic to and from the master server may shed some light on the issue. I've been running tcpdump for the last couple of days to see whether anything stands out (run as root, adjust eth0 as necessary):

tcpdump -ni eth0 "host 185.195.197.4"

Already found out that the master server took a nap for half an hour shortly after midnight UTC. That resulted in symptoms very similar to what you describe. Maybe worth a shot to run tcpdump and compare its output and the dcs.log to what I posted here.