mojira.dev

DoctorKest

Assigned

No issues.

Reported

BDS-16839 Server Memory Usage Explodes when Specific User Joins (Duplicate)

Comments

Chiming in to confirm, 1.18.31 fixed the issue on my BDS.

I re-entered the area with the "bad" chunk, and at first there were no entities, and some chunks wouldn't load. But the server RAM usage stayed steady. Fearing the worst, I opened a door and voila! Everything started working again. Villagers, animals, mobs, missing chunks, everything came back. I played for 30 minutes in the same area and had zero problems.

So if you load up your world again in 1.18.31 and find a creepy ghost town, just interact with something (door, bed, maybe break a block) to bring everyone back Avengers Endgame style.

Thank you to the devs for their quick work!

@Michal Brzek Awesome job using strace! BDS often has multiple threads doing things, so it's not surprising that some of them are sleeping while others are busy/crashing. If you run strace against all of the bedrock server threads, you'll see more activity, but none of it looks unusual to me: lseek() and read() calls for reading world data from disk, sendto() and recvfrom() for network traffic between client/server, and plenty of nanosleep() calls while threads wait on things. If I have time, I'll try to compare strace logs between a healthy server and a crashing one to see if anything sticks out.
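For anyone who wants to try that healthy-vs-crashing comparison themselves, here is a rough Python sketch that counts syscalls per name in two strace logs and lists the biggest differences first. It assumes plain strace output (optionally with a `[pid N]` or bare PID prefix, and no timestamp options); the function names are my own, and it skips `<... resumed>` continuation lines and signal/exit lines rather than trying to stitch them together.

```python
import re
from collections import Counter

# Matches the start of lines such as:
#   [pid  1234] read(7, "..."..., 4096) = 4096
#   1234  lseek(7, 0, SEEK_SET) = 0
#   nanosleep({tv_sec=0, tv_nsec=1000000}, NULL) = 0
# Lines like "<... read resumed>" or "+++ killed by SIGKILL +++" do not match.
SYSCALL_RE = re.compile(r"^(?:\[pid\s+\d+\]\s+|\d+\s+)?([a-z_][a-z0-9_]*)\(")


def count_syscalls(lines):
    """Count how often each syscall name starts a line in an strace log."""
    counts = Counter()
    for line in lines:
        match = SYSCALL_RE.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def compare(healthy, crashing):
    """Return (name, healthy_count, crashing_count) rows, largest difference first."""
    names = set(healthy) | set(crashing)
    rows = [(name, healthy.get(name, 0), crashing.get(name, 0)) for name in names]
    return sorted(rows, key=lambda row: abs(row[2] - row[1]), reverse=True)


def compare_files(healthy_path, crashing_path):
    """Convenience wrapper: read two log files and compare their syscall counts."""
    with open(healthy_path) as h, open(crashing_path) as c:
        return compare(count_syscalls(h), count_syscalls(c))
```

Raw counts depend on how long each capture ran, so this is only useful if both logs cover roughly the same amount of time.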

If you want to dig a bit further, when BDS crashes, it generates a huge file called "core", which is a core dump. You can try to read it with gdb with `gdb ./bedrock_server-1.18.30.04 ./core`. You can see the state of all threads running with `thread apply all bt`. You'll see a lot of `??` because we don't have the debug symbols (names for anything "inside" Minecraft), but you can usually see what system calls the program was in at the time of the crash.

Unfortunately, that's still not super helpful (at least not without the debug symbols mentioned above). BDS seems busy doing something while consuming all the memory on the system, until the OS OOM killer finally kills it for hogging all the memory. So BDS isn't hitting a wall; it's stuck in a loop. I usually see the main thread doing something involving memcmp or memmove when it crashes, which suggests it's probably stuck loading more and more of something into memory.
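If you want to catch the runaway growth before the OOM killer does, a small watcher along these lines might help. It is only a sketch, assuming a Linux host where `/proc/<pid>/status` exposes a `VmRSS` line in kB; the function names, the sampling approach, and whatever limit you pick are my own assumptions, not anything BDS provides.

```python
import re


def parse_vmrss_kb(status_text):
    """Return the VmRSS value (in kB) from /proc/<pid>/status text, or None."""
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def read_rss_kb(pid):
    """Read the current resident set size of a process (Linux only)."""
    with open(f"/proc/{pid}/status") as status_file:
        return parse_vmrss_kb(status_file.read())


def is_runaway(samples_kb, limit_kb, min_growing=3):
    """True if the latest sample is at or above the limit and the last
    min_growing samples are strictly increasing."""
    if not samples_kb or samples_kb[-1] < limit_kb:
        return False
    tail = samples_kb[-min_growing:]
    if len(tail) < min_growing:
        return False
    return all(earlier < later for earlier, later in zip(tail, tail[1:]))
```

Calling `read_rss_kb(pid)` every few seconds and feeding the samples to `is_runaway` would let a wrapper script stop or restart the server, or at least log the time, once memory starts climbing toward the limit you chose.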

@Whitemist that doesn't sound like the same issue.

Just to summarize the comments/reports thus far, the hallmarks of this issue seem to be:

  • Extremely high RAM usage (>10GB, up to consuming all physical memory on the machine)

  • World/chunks stop loading

  • All entities (mobs, animals, villagers, etc.) stop moving

  • Blocks stop updating (e.g. break blocks below sand/sugarcane/scaffolding and nothing happens to the blocks above)

  • For BDS/Realms, server crashes after several moments, disconnecting everyone

  • Clients remain responsive (players can walk around, break blocks, etc.)

  • All of the above triggered exclusively when a player enters a very specific part of the world.

I guess if you're playing locally (no realm/server) the RAM explosion could start affecting your performance/FPS, but that's just an unfortunate side effect.

We keep using the term "corruption" not because anyone has confirmed our world files have in fact become corrupted, but because the issue is triggered by loading something in a specific part of our worlds. We've taken a lot of guesses about what that something is (Nether portals, traveling to/from the Nether, lots of storage/chests, lots of entities, etc.) but for every guess we make there seems to be a counterexample, so we really don't know.

I have a nether portal in the area, but I haven't been through it since updating to 1.18.30.

While the corrupted area in my world is also pretty heavily “detailed,” I find it strange that the server/world becomes bugged while the client remains responsive. In Luminicent Owl’s video, the player can still walk and modify the world, but the world is frozen.

Those of us running BDS are seeing the server crash, not the clients.

All this seems to point away from anything related to graphical rendering, although entity density within a chunk being loaded or updated could play a role.

Any ideas if it's worth playing on backups or does it keep happening again and again? If the issue keeps happening it's pointless to keep playing the world cuz the progress gets lost.

As Rogue Cheddar points out, it seems to be repeatable and related to the world data. Either something that existed before the update, or something that was changed post-update, not sure. Some of us are reporting that we played in the world for hours after updating before the problem manifested, which raises the question of whether we could load a backup and avoid doing/creating the thing that corrupted the world.

But with the number of people reporting and voting for this bug, whatever "it" is must be pretty common.

Not trying to badger or anything, but how long until I can expect this to be fixed? It's REALLY irritating.

You know as much as we do. We've had mods mark other issues as duplicates and consolidate on this issue, which is always a good sign. Thank you OcelotOnesie!

Yep, the main issue seems to be MCPE-154278. I’ve been commenting over there; this one should be marked as a duplicate.

Just to add another data point:

I had another user log in (in a non-corrupted part of the world) and, at the moment I logged in, teleport me to them. We both promptly logged out. The server crashed after a minute, but it was alive long enough to save my new location.

When I restarted the server, we were both able to log in without the server crashing. Memory usage stayed at a reasonable ~1 GB.

So, for now, that region of the world is off-limits. Hopefully we can get a fix or workaround that will heal the corruption.

— Client side memory leak, maybe? Possibly caused by pendingticks?

Definitely not client-side, I'm seeing the issue crash the server (Bedrock Dedicated Server). Client (Minecraft for Windows) has no issues and stays responsive the entire time.

EDIT: Additionally, I can log in and log out really quickly from the client, and the server still crashes after a minute of intense RAM usage.

NeoCyania is really on to something. On my server, the problem only manifests when a specific user (in a specific part of the world) logs in. Users in other parts of the world are unaffected.

This may also be related to BDS-16839 and BDS-16847. As NeoCyania mentioned, it may be related to excessive RAM usage.

My BDS produces a core dump when it crashes that I'd be happy to share, but it's 1.5 GB when compressed. I'm not sure how to get it to the devs.