mojira.dev
BDS-17527

Multiple server crashes due to memory leak when loading chunks

Server is running mostly default survival settings except for a simulation distance of 6 and a chunk render distance of 32. It had been stable for the past year without issue until around 1 month ago. At first we believed the crashes were caused by entering portals too quickly (going from the nether to the overworld, then from the overworld to the end portal quickly), since some of the crashes were occurring under those conditions, but we have also been getting crashes throughout normal gameplay, when performing normal actions such as placing blocks or even just walking around. The crashes occur in multiple areas: overworld, nether, and end. There doesn't seem to be a specific chunk or group of chunks causing this, since the same areas may be fine one moment and then crash at another. As of now the server will crash at least 4-5 times a day if people are active. Checking memory usage showed steady usage of around 700MB out of the 3GB allocated, so there doesn't seem to be any issue there.

Made the bug public and removed the world file since I confirmed the memory leak occurs with a fresh world. Also adding some extra info from my comments below:

After further testing, this seems to be a memory leak related to chunk loading. If you check the memory while standing still, not loading chunks, it stays stable. When you load chunks, whether it's via crossing portals, flying, etc., the memory use increases and does not come back down, which it should not be doing. Running farms while standing still did not affect the memory usage for us; passive mob farms, redstone mechanisms, etc. did not contribute to the memory leak. Here's a log of our memory use up until the server crashed again.

Crash reports are fairly limited in information, but some of them are attached below.

Linked issues

Attachments

Comments 55

Hi

Were there any changes made to server.properties? Were any files from the root BDS folder deleted? Have you tried reinstalling BDS from a new download?
Can you please look at BDS-17453 and tell us if it relates to your issue?

This ticket will automatically reopen when you reply.

I don't think this is related to BDS-17453 since there is a permissions.json file present. Either way, the server crashes are not on startup; they happen after some time playing, which can vary dramatically. Not sure if that's relevant to the script-watchdog setting.
Attached here is the server.properties file. Not sure what would cause an issue there; I thought it might be outdated since I do not see a script-watchdog setting.

[media]

Sorry for the duplicate comment; I can't seem to add another attachment by editing existing comments. Here's a screenshot of the server directory. If anything, I'm seeing more files than are normally present, so I'll back up the server files, download a fresh install, and try again.

[media]

After further testing, this seems to be a memory leak related to chunk loading. If you check the memory while standing still, not loading chunks, it stays stable. When you load chunks, whether it's via crossing portals, flying, etc., the memory use increases and does not come back down. Running farms while standing still did not affect the memory usage for us; passive mob farms, redstone mechanisms, etc. did not contribute to the memory leak. Here's a log of our memory use up until the server crashed again.

[media]

 

Edit: Tested with a fresh world on the server, issue still persists.

My ticket was merged here, so I'm adding my comments here as requested:

There appears to be a significant memory leak in BDS for Linux. The bedrock_server process continues to grow its memory utilization whenever any activity occurs, and appears to never release that memory. Utilization continues to increase until all available RAM on the host is exhausted, at which point either swap kicks in (resulting in host paging/thrashing) or the process is killed by the kernel's OOM killer, which of course releases all the memory but terminates the BDS service. Either situation results in all players being forcibly disconnected, and could result in database corruption.

The problem appears to be related to loading chunks. It is exacerbated when teleporting. Teleporting to a distant location can trigger significant memory allocations (on the order of 10MB per second) making this problem easier to duplicate. It also exposes an additional aspect of this bug, which I will describe below.

Steps to duplicate (a memory-polling sketch follows this list):
1. Fresh install a current Ubuntu LTS version on a bare metal or virtual server.
2. Download BDS 1.19.21.01. No mods, blank world. Start the process running.
3. Observe utilization of about 300MB initially.
4. Connect to game. Note that RAM increases slightly (that's expected of course.)
5. Teleport to a distant location (e.g. 2000, 80, 2000).
6. Note that RAM increases by 10-20MB within seconds.
7. Teleport back to origin. Wait.
8. Note that RAM is not released.
9. Teleport to the same distant location.
10. Observe that RAM again increases by 10-20 MB within seconds.
11. Repeat steps 7-10. Observe that RAM continues to increase.
12. Disconnect from game. Wait for hours. Observe that allocated RAM is never released.
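For anyone wanting to log the numbers in steps 3-12, here is a minimal sketch of a stand-alone poller, assuming a Linux host where /proc is available. The file name rss_watch.cpp, the one-second interval, and the way the PID is passed are all just illustrative choices, not anything BDS ships with.

    // rss_watch.cpp - sketch: print the resident set size (VmRSS) of a given
    // process once per second so that growth during the teleport test above
    // can be logged. Assumes a Linux /proc filesystem.
    #include <chrono>
    #include <fstream>
    #include <iostream>
    #include <string>
    #include <thread>

    int main(int argc, char* argv[]) {
        if (argc != 2) {
            std::cerr << "usage: rss_watch <pid>\n";
            return 1;
        }
        const std::string statusPath = std::string("/proc/") + argv[1] + "/status";
        for (;;) {
            std::ifstream status(statusPath);
            if (!status) {
                std::cerr << "process " << argv[1] << " is gone\n";
                return 0;
            }
            std::string line;
            while (std::getline(status, line)) {
                // The line looks like "VmRSS:   723456 kB".
                if (line.rfind("VmRSS:", 0) == 0) {
                    std::cout << line << std::endl;
                    break;
                }
            }
            std::this_thread::sleep_for(std::chrono::seconds(1));
        }
    }

Compile with something like g++ -std=c++17 rss_watch.cpp -o rss_watch and run it as ./rss_watch $(pidof bedrock_server); if the leak described above is present, the printed VmRSS values climb with each teleport and never come back down afterwards.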

I used an Azure server with 2 CPUs, 4GB of RAM, and 32GB of disk, but this has also been tested on larger server configurations with the same result. I used Ubuntu 20 LTS as directed, but I also tested this under OpenSuse 15.3, with the same result.

Following the above steps, I was, within the space of 10 minutes, able to more than double the process' RAM allocation to above 700MB. Clearly I could have continued teleporting, back and forth, until I brought the server down from memory exhaustion.

This highlights a number of points:

1. As C++ has no automated garbage collection mechanism, save for very limited scope-exit recoveries, it is necessary for the program to track and release its own memory. This clearly is not happening. An idle server with no players on it should detect this condition and release unused memory. A chunk which is no longer in use after a period of time should be written back to disk and similarly released from memory. Neither of these things is happening.

2. If you're building an in-memory copy of loaded portions of the database (as is clearly the case here), an in-core index or similar data structure should be used to track which chunks are already loaded and point functions back to them. This clearly is also not happening. Only two chunks (or areas) were being visited in my test: 0,0,0 (the spawn point, or near to it) and 2000,80,2000. Yet each visit to those same chunks, in either direction, caused additional RAM to be allocated by the process; each teleport required, in my case, an additional 10-20MB of RAM to complete. This suggests that multiple copies of the same chunk(s) were being maintained in RAM (clearly without the process realizing it), which amplifies the leak: not only is RAM not being released, but data is being duplicated in RAM, compounding the growth. This may be related to why memory is not being freed: if the process isn't tracking which memory it has allocated and loses the pointer to the allocated memory block(s), it CANNOT release them. That type of thing seems to be happening here.
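As a purely conceptual sketch of the kind of index point 2 describes (not BDS's actual internals; ChunkKey, Chunk, and ChunkCache are made-up names), something along these lines hands back the already-resident copy of a chunk on repeat visits and frees the memory once the chunk is dropped from the index:

    // Conceptual sketch only, not BDS internals: an in-core index keyed by
    // chunk coordinates. Repeat visits to the same chunk return the copy that
    // is already resident instead of allocating a new one, and dropping a
    // chunk from the index releases its memory as soon as no other owner
    // still holds the shared_ptr.
    #include <cstddef>
    #include <cstdint>
    #include <functional>
    #include <memory>
    #include <unordered_map>

    struct ChunkKey {
        std::int32_t x, z;
        std::int32_t dimension;
        bool operator==(const ChunkKey& o) const {
            return x == o.x && z == o.z && dimension == o.dimension;
        }
    };

    struct ChunkKeyHash {
        std::size_t operator()(const ChunkKey& k) const {
            const std::int64_t xz = (std::int64_t(k.x) << 32) ^ std::uint32_t(k.z);
            return std::hash<std::int64_t>{}(xz) ^ (std::size_t(k.dimension) << 1);
        }
    };

    struct Chunk { /* block data, entities, ... */ };

    class ChunkCache {
    public:
        // Return the already-loaded chunk if present; otherwise load it once.
        std::shared_ptr<Chunk> acquire(const ChunkKey& key) {
            auto it = loaded_.find(key);
            if (it != loaded_.end())
                return it->second;                  // no duplicate allocation
            auto chunk = std::make_shared<Chunk>(); // load from disk here
            loaded_.emplace(key, chunk);
            return chunk;
        }

        // When a chunk falls out of range, drop it from the index; the memory
        // is released once nothing else still holds the pointer.
        void release(const ChunkKey& key) { loaded_.erase(key); }

    private:
        std::unordered_map<ChunkKey, std::shared_ptr<Chunk>, ChunkKeyHash> loaded_;
    };

With a structure like this in place, teleporting back and forth between the same two areas would reuse the cached entries rather than allocating fresh copies each time, and an unloaded chunk's memory could actually be returned because the owning pointer is never lost.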

3. This obviously exposes the potential for a denial-of-service-style attack against a BDS. If a world has a malicious operator, or has set up (for example) command blocks that let regular users teleport, then repeated teleporting by players, whether maliciously in quick succession or simply over time as the server process continues to run, will speed up memory consumption and hasten the crash of the server process itself. Again, this applies to any in-server action: the more activity, the faster the RAM exhaustion appears to occur.

Note here that this bug is NOT about teleporting itself: memory usage increases whenever new chunks are loaded via ANY method, and that memory appears to never be released. Teleporting simply speeds things up and makes the problem more visible. Even with teleporting disabled, BDS on Linux slowly grows in RAM size and never releases any of it, eventually exhausting the server's available resources and causing a crash of some kind.

I've been off the test instance for an hour as I post this bug, and I firewalled it off so nobody could get in. It's still at 705.64MB - after being unused by anyone for an hour - and will continue that way until it's reset.

45 more comments

Just chiming in to say I'm experiencing this too. The only difference is that for me, the server hangs instead of crashing while a process called "kswapd" climbs to 100% CPU usage and stays there. Sometimes it resolves itself after a few minutes; sometimes it doesn't and requires me to restart my cloud VPS. I think this is because the VPSs are provisioned for me with swap enabled, so I get a hang instead of a crash.

From the looks of the comments here, other people have more experience with debugging steps, providing logs, etc. than I do, so I'll hold off on adding mine until asked. I'll subscribe to this issue for updates.

Thanks!

jtp10181: the Bedrock team does not use the "Assignee" field on this bug tracker. You can see that the report is being tracked internally by Mojang when it has a number in the ADO field.

@GoldenHelmet, fair enough. It would be nice to get some sort of an update, though. Is it actively being worked on? Can they reproduce it? Do they need anything from the users? Even a response acknowledging the problem and saying they cannot figure out what is causing it would be better than totally ignoring everyone. I imagine there are thousands of people experiencing this bug, but they either have no idea why their server is crashing or just haven't bothered to search around and find this report. I found it with a search while trying to figure out if it was something I did wrong and could fix.

Pretty sad this has been on here for just about a year now with only two responses from "Mojang" in that entire time.

Looks like this issue has finally been fixed in the latest preview, according to the changelog! The next stable release should bring the fix to everyone. “Only” took a year, but at least they fixed it. https://feedback.minecraft.net/hc/en-us/articles/18619357250701-Minecraft-Beta-Preview-1-20-30-22

I feel like they just ignored it until people recently started to pester them about it more. It seems like they were very confused about how to replicate the issue up until at least the last Mojang post on 6/16/23. Not sure how, though, since all you had to do was load a server and play on it.

Either way, glad a fix is finally coming. Will be interesting to see how it runs after the fix. I had to bump up the RAM allocation on my VM just to deal with this issue. Will have to keep an eye on the release notes to watch for the update.

migrated

(Unassigned)

Community Consensus

crash, server
