
memory reserved by spawning units is never deallocated leading to server crashes



TL;DR: when a mission spawns or clones a unit, memory is allocated. This memory is never deallocated, even on group/unit destroy, explode or removeJunk. Thus over time any mission that creates units will crash either from memory starvation or from hitting the RegMapStorage cap of 4094 groups.

This investigation started after we noticed the excellent Pretense mission was dying after a few hours on our hosted server. By profiling the memory usage of the DCS_Server process using Process Lasso logging, we noticed that the VM (private bytes memory) continued to increase regardless of unit destruction, client join/leave or any other factor. 

image.png

The control was performed with a basic ME mission with a single client helo and no AI. VM usage was flat.
The logical next step was then to write a script that spawned new units at a constant rate, and destroyed them on a cadence. If VM was deallocated, we would expect to see a sawtooth pattern. If it wasn't, we would expect to see a relatively straight incline up. 
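The sawtooth-versus-incline expectation can be turned into a rough numeric check. The following is only a sketch: the function names, the 0.25 threshold and the sample numbers are assumptions made for illustration, not values taken from the Process Lasso logs.

```python
def max_drawdown_fraction(vm_samples):
    """Largest fall from a running peak, as a fraction of the overall range.

    A value near 0 means memory only ever climbed (straight incline);
    a large value means memory was handed back at some point (sawtooth).
    """
    if len(vm_samples) < 2:
        return 0.0
    lowest = peak = vm_samples[0]
    worst_drop = 0.0
    for value in vm_samples:
        peak = max(peak, value)
        lowest = min(lowest, value)
        worst_drop = max(worst_drop, peak - value)
    total_range = peak - lowest
    if total_range == 0:
        return 0.0
    return worst_drop / total_range


def looks_like_sawtooth(vm_samples, threshold=0.25):
    # threshold is an arbitrary illustrative cut-off, not a measured value
    return max_drawdown_fraction(vm_samples) >= threshold


# Made-up VM readings in MB, purely to exercise the functions
steady_incline = [100, 110, 120, 130]
sawtooth = [100, 150, 200, 110, 160, 210, 120]
print(looks_like_sawtooth(steady_incline), looks_like_sawtooth(sawtooth))
```

Run over an exported VM column, a result close to zero would match the "no deallocation" outcome described here.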

image.png

The mission starts with 2 minutes of no spawning to establish a baseline with all scripts loaded; VM is virtually flat. Once spawning starts (1 group of 5 vehicles every 3 seconds), VM rises steadily and does not drop when destroy() is called on all ground objects every 60 seconds. When the spawn rate was increased to 1 group of 10 units every second, the rate of VM accrual increased as expected, and VM was still never deallocated. A mission (not server) restart performed no effective deallocation either: after the restart, VM resumed above the level it had reached when the mission was restarted.

The next step was to also call removeJunk around the spawn zone to clean up as much as is permitted in the scripting environment. The same spawn parameters were used, with an additional step of calling removeJunk on a sphere 2x the size of the spawn zone every 5 minutes.

image.png

In this test, VM continued the trend of accruing with no reduction due to destroy() or removeJunk().  It continued to accrue VM until the server crashed with the following error:

2024-04-11 03:15:28.743 WARNING EDOBJECTS (Main): RegMapStorage start cycle to find empty space in <viColumn>
2024-04-11 03:15:28.743 ERROR   EDOBJECTS (Main): RegMapStorage has no more IDs (4094 max) in <viColumn>
2024-04-11 03:15:28.743 ERROR   EDOBJECTS (Main): Failed assert `false` at Projects\edObjects\Source\Registry\RegMapStorage.cpp:124
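For anyone wanting to find the moment of exhaustion in their own server log, here is a minimal Python sketch. It assumes log lines shaped exactly like the three above; the regexes and the function name are illustrative and not part of any DCS tooling.

```python
import re
from datetime import datetime

# Assumed line shape: "<date> <time> <LEVEL> <SUBSYSTEM> (<thread>): <message>"
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3})\s+"
    r"(?P<level>\w+)\s+(?P<subsystem>\w+) \((?P<thread>\w+)\): (?P<msg>.*)$"
)
EXHAUSTED_RE = re.compile(
    r"RegMapStorage has no more IDs \((?P<cap>\d+) max\) in <(?P<table>\w+)>"
)


def find_exhaustion_events(log_lines):
    """Return (timestamp, table, cap) for every 'no more IDs' line."""
    events = []
    for line in log_lines:
        parsed = LINE_RE.match(line.strip())
        if not parsed:
            continue  # skip lines that do not follow the assumed shape
        hit = EXHAUSTED_RE.search(parsed["msg"])
        if hit:
            stamp = datetime.strptime(parsed["ts"], "%Y-%m-%d %H:%M:%S.%f")
            events.append((stamp, hit["table"], int(hit["cap"])))
    return events


# The three lines quoted in this post
SAMPLE_LOG = [
    r"2024-04-11 03:15:28.743 WARNING EDOBJECTS (Main): RegMapStorage start cycle to find empty space in <viColumn>",
    r"2024-04-11 03:15:28.743 ERROR   EDOBJECTS (Main): RegMapStorage has no more IDs (4094 max) in <viColumn>",
    r"2024-04-11 03:15:28.743 ERROR   EDOBJECTS (Main): Failed assert `false` at Projects\edObjects\Source\Registry\RegMapStorage.cpp:124",
]

for stamp, table, cap in find_exhaustion_events(SAMPLE_LOG):
    print(stamp.isoformat(), table, cap)
```

Comparing the first reported timestamp with the mission start time gives the observed time-to-cap for a given spawn rate.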

This effectively caps the life of any mission that creates units on the fly, based on the server RAM and the RegMapStorage cap of 4094. At 1 group per second being created, we'd expect to hit this cap at ~4094 seconds, or about 1.1 hours. This is basically exactly what we saw, despite all units in those groups being destroyed and removeJunk being run over the area.
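The arithmetic behind that estimate, as a small Python sketch. It assumes what the tests suggest: that each spawned group consumes one RegMapStorage ID and that IDs are never released. The function name is made up for illustration.

```python
REGMAP_CAP = 4094  # limit reported in the error log


def seconds_until_cap(groups_per_second, cap=REGMAP_CAP):
    """Time until the ID table is full, assuming one ID per spawned group
    and no IDs ever being freed (the hypothesis of this report)."""
    if groups_per_second <= 0:
        raise ValueError("groups_per_second must be positive")
    return cap / groups_per_second


# 1 group per second, as in the removeJunk test
fast = seconds_until_cap(1)
print(f"{fast:.0f} s = {fast / 3600:.2f} h")

# 1 group every 3 seconds, as in the first spawn test
slow = seconds_until_cap(1 / 3)
print(f"{slow:.0f} s = {slow / 3600:.2f} h")
```

At 1 group per second this gives 4094 s, about 1.14 hours; at 1 group every 3 seconds it gives roughly 12282 s, about 3.4 hours. Server RAM may run out before either point, so these are upper bounds on mission life under the stated assumption.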

A working theory is that this behaviour was introduced with the Apache FCR in order to allow destroyed vehicles to still be seen by the FCR. That said, others have said anecdotally that this predates the Apache. The effect is that dynamic missions require a regular server (not mission) restart to prevent this VM saturation, on a cadence that depends on the rate at which new units are spawned by the mission, and the total server RAM.

Request that ED advise or investigate a solution that allows mission editors to purge the VM allocation of "dead but still there objects" in areas that are no longer relevant to the mission to prolong the life of the mission/server. While this may have other effects e.g. rendering them invisible to FCR, the alternative (a server crash or restart) is surely not better.

Log data and charts:
https://docs.google.com/spreadsheets/d/1p0tKoeipHJOaChhKnzNKjLD30maU3xKCvJH5o4quBY0/edit?usp=sharing

(edit: if anyone wants to help me replace the MOOSE functions with stock to eliminate that potential source of error, please do!)

edit2: another data point using trigger.action.explosion rather than destroy(). Same issue.

RnR_Memtest_Syria.miz

image.png

update: tested a different miz with no MOOSE (thanks cfrag), which creates 7 groups per second and destroys the group rather than the individual vehicles. Same VM accrual, and right on cue at ~590 sec (4094 / 7 ≈ 585) it starts spamming "RegMapStorage has no more IDs" errors.

 

 

image.png


Edited by ruprecht

DCS Wishlist: | Navy F-14 | Navy F/A-18 | AH-6 | Navy A-6 | Official Navy A-4 | Carrier Ops | Dynamic Campaign | Marine AH-1 |

 

Streaming DCS sometimes:


Thank you for this very interesting and detailed analysis. It confirms what I have speculated about, and is the reason why I tend to have missions save and then re-start (I'm hoping that a mission re-start frees the memory occupied by units). I'm looking forward to ED handling this quite serious bug (a massive memory leak).

 

2 hours ago, ruprecht said:

This effectively caps the life of any mission that creates units on the fly

For the life of me I can't figure out why you filed this important bug as a Mission Editor bug. It doesn't affect the ME but, much more importantly, the core game engine itself.

Great work, thank you, and hopefully this is going to be tackled soon.


Edited by cfrag

8 minutes ago, cfrag said:

For the life of me I can't figure out why you filed this important bug as a Mission Editor bug

In my head, it's in a category at the top of the page for visibility, and it won't get buried as quickly as it might in some other performance/general thread.

*shrug*


29 minutes ago, cfrag said:

I'm hoping that a mission re-start frees the memory occupied by units

Unfortunately a mission restart doesn't seem to. You can see it halfway through the second chart. Only a server restart cures it.


2 hours ago, ruprecht said:

The working theory is that this behaviour was introduced with the Apache FCR in order to allow destroyed vehicles to still be seen by the FCR. The effect is that dynamic missions require a regular server (not mission) restart to prevent this VM saturation, on a cadence that depends on the rate at which new units are spawned by the mission, and the total server RAM.

This has been the behavior for as long as DCS has existed, or at least for as long as I can remember, since I started running servers back in 2017; it has nothing to do with the Apache FCR. That's why we routinely perform a complete stop/start of our DCS server instances every 6 hours.

Below you can see the memory trend of our DCS server back in 2022:

image.png

It's a good analysis and something worth a deep dive by the ED team (@c0ff), but I guess it is not something that will be solved quickly.


FlighRIG => CPU: RyZen 5900x | RAM: 64GB Corsair 3000Mhz | GPU: nVIDIA RTX 4090 FE | OS Storage: SSD NVMe Samsung 850 Pro 512GB, DCS Storage: SSD NVMe Sabrent 1TB | Device: Multipurpose-UFC, VirPil T-50, TM WARTHOG Throttle, TrackHat, MFD Cougar with screen.

Our Servers => [ITA] Banshee | Krasnodar - PvE | PersianConquest PvE Live Map&Stats | Syria Liberation PvE Conquest

Support us on twitch subscribing with amazon prime account linked, it's free!


Here's a mission that, once per SECOND (pls ignore the silly misleading name) destroys (if they exist) and then allocates 7 groups of 17 ground vehicles.

This will exhaust the storage after a while (some 580 seconds until RegMapStorage runs out, which can only store 4094 entries). After a while, another table (with 65534 limit) runs out.

ERROR   EDOBJECTS (Main): RegMapStorage has no more IDs (4094 max) in <viColumn>

and

EDOBJECTS (Main): RegMapStorage has no more IDs (65534 max) in <viWorldHeavyObject>

Here's the miz that reliably re-creates the issue on a local server:

 

clone once a minute.miz


Edited by cfrag

6 hours ago, Maverick87Shaka said:

This has been the behavior for as long as DCS has existed, or at least for as long as I can remember, since I started running servers back in 2017; it has nothing to do with the Apache FCR. That's why we routinely perform a complete stop/start of our DCS server instances every 6 hours.

Maybe, though it definitely seems more extreme lately. Just shrugging and accepting a 6, or 4, or 2 hourly restart isn't something everyone is relaxed about. 

In any case, there's hard data here that the problem is directly affecting customers of commercial hosting providers so I'd suspect there is some incentive to want this finally hunted down and shacked. Big persistent dynamic sandbox missions like these are the raison d'etre for commercial hosting. 

If it's a leak because some dev in 2008 missed a pointer delete somewhere, it's about time it was fixed. If it's an intentional architectural decision to persist these groups, the impact needs to be reconsidered and alternative designs explored.

6 hours ago, Maverick87Shaka said:

Below you can see the memory trend of our DCS server back in 2022

It's interesting, but without knowing any details about the mission running and how it is spawning and destroying units, it's hard to draw any conclusions from it.


Edited by ruprecht


  • 2 weeks later...
  • 5 weeks later...
On 4/11/2024 at 1:04 PM, cfrag said:

Here's a mission that, once per SECOND (pls ignore the silly misleading name) destroys (if they exist) and then allocates 7 groups of 17 ground vehicles.

This will exhaust the storage after a while (some 580 seconds until RegMapStorage runs out, which can only store 4094 entries). After a while, another table (with 65534 limit) runs out.

ERROR   EDOBJECTS (Main): RegMapStorage has no more IDs (4094 max) in <viColumn>

and

EDOBJECTS (Main): RegMapStorage has no more IDs (65534 max) in <viWorldHeavyObject>

Here's the miz that reliably re-creates the issue on local server:

 

clone once a minute.miz

 

@Flappie I think you were not around when this one was posted but it would be great if you can try to reproduce the issue with cfrag's miz.
