The 1.14 snapshots have some severe performance problems. Servers fall behind continually, and users are regularly kicked. The first thing I noticed is that even when nobody is logged in to a server, 100% of one CPU core is being used. As you'll see, this is something of a red herring, but it means that everyone else is stuck until this is implemented properly.
I profiled this, and I found that Thread::yield dominates the run time:
Tree Profile:
(t 100.0,s 7.9) java.lang.Thread::run
  (t 92.1,s 0.3) net.minecraft.server.MinecraftServer::run
    (t 87.9,s 87.9) java.lang.Thread::yield
    (t 1.9,s 0.0) net.minecraft.server.MinecraftServer::a
I only have obfuscated names; here's where that's called from, in MinecraftServer.run():
@Override
public void run() {
    try {
        if (this.d()) {
            this.Z = k.b();
            this.n.a(new jf(this.E));
            this.n.a(new pd.c("18w43c", 442));
            this.a(this.n);
            while (this.u) {
                long l2 = k.b() - this.Z;
                if (l2 > 2000L && this.Z - this.Q >= 15000L) {
                    long l3 = l2 / 50L;
                    h.warn("Can't keep up! Is the server overloaded? Running {}ms or {} ticks behind", (Object)l2, (Object)l3);
                    this.Z += l3 * 50L;
                    this.Q = this.Z;
                }
                this.Z += 50L;
                this.a(this::aU);
                while (this.aU()) {
                    Thread.yield();
                }
                this.P = true;
            }
        } else {
            this.a((b)null);
        }
    }
Specifically:
while (this.aU()) {
    Thread.yield();
}
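To make the mechanism concrete: a yield-based wait never blocks, so the scheduler keeps the thread runnable and it burns a full core doing nothing between ticks. A minimal, self-contained sketch of the same pattern (hypothetical code, not Minecraft's) reproduces the 100% core usage:

// Minimal reproduction of a yield-based busy-wait (illustrative only):
// the inner loop spins until the next 50 ms tick is due, pinning one core.
public class BusyWaitDemo {
    public static void main(String[] args) {
        long nextTick = System.currentTimeMillis() + 50L;
        while (true) {
            while (System.currentTimeMillis() < nextTick) {
                Thread.yield(); // never blocks, so this thread stays runnable
            }
            nextTick += 50L;
            // ... per-tick work would go here ...
        }
    }
}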
There are multiple other places where Thread::yield is called from, including another place in MinecraftServer and in ty.java.
This may be decompiled code, but it looks intentional. Basically, it looks like a hack that was necessary to get a snapshot out in reasonable time. No negative judgement there; we all do cheap hacks to get alpha releases usable for others, and we're better off having a snapshot than not!
Unfortunately, this makes it hard for anyone else to help with profiling the server, and as a result, we are unable to help the developers find the performance hotspots that their profiling tools miss.
This problem also makes it inadvisable (or outright disallowed) to run 1.14 snapshots in the cloud, since some VM hosts will kill CPU-hogging processes, and it can also cost more.
Therefore I urge the developers to please implement this properly sooner rather than later.
Comments


It appears net/minecraft/server/MinecraftServer.run() calls Thread.yield() as well while waiting for the next tick; according to this StackOverflow answer, that might not be ideal either.

Whoops. My bad. I'll edit the bug description, because THAT is the place where it's hammering the CPU from.

18w44a hasn't done anything to help with this. Here's the top of the profile tree now:
(t 100.0,s 13.1) java.lang.Thread::run
  (t 86.9,s 1.7) net.minecraft.server.MinecraftServer::run
    (t 52.9,s 52.9) java.lang.Thread::yield
    (t 14.0,s 0.0) net.minecraft.server.MinecraftServer::a
      (t 14.0,s 0.0) ti::b
        (t 14.0,s 0.0) net.minecraft.server.MinecraftServer::b
          (t 13.8,s 0.0) ua::a
            (t 13.7,s 0.0) tz::a
              (t 13.5,s 2.1) tz::n
                (t 3.4,s 2.9) azp::c
                  (t 0.5,s 0.0) java.util.TreeMap::get
                    (t 0.5,s 0.4) java.util.TreeMap::getEntry
                      (t 0.1,s 0.1) java.lang.String::compareTo
                (t 2.2,s 0.0) tq::c
                  (t 1.2,s 0.0) java.util.concurrent.CompletableFuture::getNow
                    (t 1.2,s 1.2) java.util.concurrent.CompletableFuture::reportJoin
                  (t 0.6,s 0.0) com.mojang.datafixers.util.Either$Left::left
                    (t 0.6,s 0.5) java.util.Optional::of
                      (t 0.1,s 0.1) java.util.Optional::<init>
                  (t 0.4,s 0.4) com.mojang.datafixers.util.Either$Right::left
                (t 2.2,s 0.2) azt::a
                  (t 0.8,s 0.6) java.util.Random::nextInt
                    (t 0.2,s 0.1) java.util.Random::next
                      (t 0.1,s 0.1) java.util.concurrent.atomic.AtomicLong::compareAndSet
Can confirm the constant 100% CPU core usage for 18w45a.
Wasted CPU time is still an issue with 18w46a:
Performance has worsened since this snapshot version, with players being kicked or the server crashing upon exceeding the maximum tick delay, or being frequently overloaded even when no one is online.

Can confirm that this is a problem in snapshot versions 18w50a as well as 19w02a.

Replacing yield with sleep(1) is a quick fix.
An even better fix would be to avoid the busy loop entirely and use futures or wait/notify instead.
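To sketch what the wait/notify suggestion might look like (illustrative only; the class and method names TickLoop, signalWork and awaitWorkOrNextTick are invented, and this is not Mojang's code): worker threads notify the main thread when they queue a task, and the main thread blocks with a timeout instead of spinning.

public final class TickLoop {
    private final Object lock = new Object();
    private boolean workAvailable = false;

    // Called by worker threads whenever they hand a task to the main thread.
    public void signalWork() {
        synchronized (lock) {
            workAvailable = true;
            lock.notifyAll();
        }
    }

    // Replaces the `while (hasWork()) Thread.yield();` spin: blocks until work
    // is signalled or the next tick is due, using ~0% CPU while waiting.
    public void awaitWorkOrNextTick(long nextTickMillis) throws InterruptedException {
        synchronized (lock) {
            long remaining = nextTickMillis - System.currentTimeMillis();
            while (!workAvailable && remaining > 0) {
                lock.wait(remaining);
                remaining = nextTickMillis - System.currentTimeMillis();
            }
            workAvailable = false;
        }
    }
}

The key point is that wait(remaining) deschedules the thread, so an idle server uses essentially no CPU, while notifyAll() wakes it up immediately when work arrives.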

Can confirm still an issue on 19w05a. This problem is preventing proper performance testing of other components. Could this be taken to developers as a higher priority so we have sufficient testing time before release?

This is an absolute show stopper for me; it's basically unplayable due to this issue, which, as others have said, prevents or at least complicates actual play testing of new content. I accept that snapshots aren't stable releases, but getting this fixed should be a priority.
Edit: This problem seems to be made worse by riding horses and/or generating new chunks.

Using wait/notify is the way to go.
Right now this kills a CPU core, and it becomes really hard to debug things.

I'm not sure if this is related, but once the second player connects, the server begins lagging. This has been noticed on the last few snapshot builds.

This probably doesn't help much, but since I want to test this server, I can't let the CPU idle at 100% on a core all the time.
This was my first time editing bytecode: I modified my 19w06a snapshot by replacing the call to Thread.yield() with Thread.sleep(1), just for testing (as suggested in a previous comment). Funny thing: it isn't pretty, and the average tick time rose by about 50% on my system, but the CPU usage of the process dropped to around 3-4%. I don't know whether overall performance is way down now, but at least I can let the server run for further testing.
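For anyone who wants to reproduce that kind of patch, one possible approach is the ASM bytecode library (org.objectweb.asm); the sketch below is only illustrative, the class and file paths are placeholders, and I don't know what tool the previous commenter actually used:

import java.nio.file.Files;
import java.nio.file.Paths;
import org.objectweb.asm.ClassReader;
import org.objectweb.asm.ClassVisitor;
import org.objectweb.asm.ClassWriter;
import org.objectweb.asm.MethodVisitor;
import org.objectweb.asm.Opcodes;

public class YieldToSleepPatcher {
    public static void main(String[] args) throws Exception {
        // Placeholder path: the class file extracted from the server jar.
        byte[] input = Files.readAllBytes(Paths.get("MinecraftServer.class"));
        ClassReader reader = new ClassReader(input);
        // COMPUTE_MAXS because the replacement pushes an extra long onto the stack.
        ClassWriter writer = new ClassWriter(reader, ClassWriter.COMPUTE_MAXS);

        reader.accept(new ClassVisitor(Opcodes.ASM7, writer) {
            @Override
            public MethodVisitor visitMethod(int access, String name, String desc,
                                             String sig, String[] exceptions) {
                MethodVisitor mv = super.visitMethod(access, name, desc, sig, exceptions);
                return new MethodVisitor(Opcodes.ASM7, mv) {
                    @Override
                    public void visitMethodInsn(int opcode, String owner, String insnName,
                                                String insnDesc, boolean isInterface) {
                        // Rewrite every Thread.yield() call into Thread.sleep(1L).
                        if (opcode == Opcodes.INVOKESTATIC
                                && "java/lang/Thread".equals(owner)
                                && "yield".equals(insnName) && "()V".equals(insnDesc)) {
                            super.visitLdcInsn(1L);
                            super.visitMethodInsn(Opcodes.INVOKESTATIC, "java/lang/Thread",
                                    "sleep", "(J)V", false);
                            return;
                        }
                        super.visitMethodInsn(opcode, owner, insnName, insnDesc, isInterface);
                    }
                };
            }
        }, 0);

        Files.write(Paths.get("MinecraftServer.class"), writer.toByteArray());
    }
}

Note that Thread.sleep declares InterruptedException, which the patched method won't declare; the JVM does not enforce checked exceptions at the bytecode level, so the class still verifies, but an interrupt would surface as an undeclared exception.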

If you are into bytecode editing, you could try using Thread.sleep(0, 1000) instead of Thread.sleep(1), since sleep can take either just a number of milliseconds, or a number of milliseconds plus a number of nanoseconds. If that fixes the tick time issue, you could then play around with that second value to find a setting that gives acceptable tick time and acceptable CPU load.

@nickovs most JVMs will not do nanosecond sleeps; internally they just round up to the next millisecond if nanos is large enough.
See e.g. OpenJDK https://github.com/md-5/OpenJDK/blob/9ce194a08077f2d0d6acf034c3326b9a53baa14f/src/java.base/share/classes/java/lang/Thread.java#L325
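For reference, the two-argument overload in that linked source boils down to roughly the following (paraphrased from memory; see the link for the exact code), so sleep(0, 1000) ends up as sleep(1) anyway:

// Approximate shape of java.lang.Thread.sleep(long, int) in older OpenJDK builds.
public static void sleep(long millis, int nanos) throws InterruptedException {
    if (millis < 0) {
        throw new IllegalArgumentException("timeout value is negative");
    }
    if (nanos < 0 || nanos > 999999) {
        throw new IllegalArgumentException("nanosecond timeout value out of range");
    }
    // Sub-millisecond requests are rounded: any non-zero nanos with millis == 0,
    // or nanos >= 500000, just bumps millis by one.
    if (nanos >= 500000 || (nanos != 0 && millis == 0)) {
        millis++;
    }
    sleep(millis);
}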

I was hoping that even if they don't do nanoseconds, they might at least do microseconds 🙂

I assume this is still happening in 19w07a? I haven't tested it yet.

Yes, it seems to still be an issue. Running an idle 19w07a server on my machine, it takes up 101% CPU, so clearly some thread is still in a spin loop.

Using notifyAll() and wait() with a short timeout is probably the way to go (or use semaphores instead of timeouts). Technically this is not hard to do; somebody just has to do it. But one would need the complete, compilable source code to fix this 🙂
Minecraft is not FOSS, after all :/
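A minimal sketch of the semaphore variant mentioned above (illustrative names, not actual Minecraft code): producers release a permit whenever they queue work, and the main thread blocks on a timed acquire instead of spinning.

import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

public final class TickSignal {
    private final Semaphore workPermits = new Semaphore(0);

    // Called by producer threads when they queue a task for the main thread.
    public void signalWork() {
        workPermits.release();
    }

    // Called by the main thread: blocks (using no CPU) until work is signalled
    // or the tick deadline passes; returns true if work arrived in time.
    public boolean awaitWork(long timeoutMillis) throws InterruptedException {
        return workPermits.tryAcquire(timeoutMillis, TimeUnit.MILLISECONDS);
    }
}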
Present in 19w08a.

Boy, it sure would be nice if someone would bother fixing this, so people could actually test things...

Can confirm it's still present in 19w08a. We had to revoke snapshot access weeks back due to this. The least they could do is acknowledge that it's an issue and will be fixed.

I believe this would impact Realms too when released? All their servers would overload and lock out.
No worries, this will for sure never reach a stable release with this issue. But as has been said above, it's not as easy as changing one line; it impacts the game's performance.
You can test the snapshots fine in the cloud… but for it to run as expected, ideally you'll need a multi-core dedicated server for… a Minecraft server.

This issue looks like it's sitting in the main thread. I have an 8-core 4 GHz CPU in my server, and with 3 players the server is overloaded and uses a little more than 100% of one core;
the other 7 cores are doing nothing. (And yes, it's dedicated; you will not find a virtual server with 4 GHz.)
The effects are:
very slow chunk loading (already existing chunks, not travelling the world)
white holes (chunks not loading at all)
mobs reacting slowly or standing still for a moment
"Can't keep up! Is the server overloaded? Running xxxxms or xx ticks behind"
I will not upgrade to 1.14 if this is not fixed.
I really hope they care about something like this.

This is still a problem on 19w09a, idling between 99-104% CPU with no players logged in to the server.

I still have 100% load on one core on 19w11a, client.
Arch Linux

Just wanted to follow up on 19w11a in case anyone wanted to know.
Client - 14% on Windows 10.
Server (after being online with one player for 10 min) on Debian 8, Intel Xeon E3-1270 v5 - 52% and gradually going down.
Thank you for fixing this horrific issue.

@unknown Did you check that it's not just lag? I just tested it and the fix worked for me: empty world, 2-chunk render distance, 200 fps. Setting the fps limit to 60 brings the CPU usage down to 4% (a third of one CPU core).

Fabian, FPS on a client does not affect servers - unless they are both being run on the same machine.
I have this issue on my dedicated i7-7700k server (boost to 5GHz). No one on, and CPU idling at 90-100%, with nothing running except spawn chunks.
Will see if it's resolved later on today. 🙂

What I meant to say with the 200fps is that there's processing power left, but you're right, there's another process (I only looked at one before) that does take up 100% of one core still. Not sure if the cause is the one in this report or a new one.

Still persists in 19w11a on Linux (Ubuntu).

Are y'all using old maps or a fresh install for snapshot 19w11a? Currently, mine is at 8.4% idle, which is "fine".

New superflat map with no one online.

I updated the world I was running previously, same conditions, inside a Docker container from Pterodactyl, and it is no longer idling at 100%. Idling below 10% here. Host OS is Ubuntu 18.04.1 LTS and the container is Alpine.

On my server machine (Ubuntu 18.04, 2x8-core Xeon) things are better but still not perfect with 19w11a. The machine is running one instance using 19w11a and two using 1.13.2. The newest server is idling at around 16% of a CPU, while the others are idling at about 2%. While this is a great deal better than the 100% it was using until now, it would be great to get it back to where it was in previous releases. Hopefully it's just due to excessive debugging code in the pre-release version.

The released 1.14 server .jar is still affected (I do not know how to correlate the NNwNNa snapshots to the releases, sorry).
I host 4 minecraft server instances on an Arch Linux amd64 machine, and due to this issue I had to suspend hosting after I upgraded from 1.13.2 to 1.14.0. I can definitely see how one could get a nasty surprise when hosting this on a cloud service.
SHA-384 checksums here:
951dbcef3af5f952cafb9a89379aed2018d8eebe4e9ebf801bd03c44719931d8652ddc7dab83f0d26e2568e8add4b788 minecraft_server.1.13.2.jar
2e691ee6bc50f67b9ad2237d86c30e7394aba39d73e20fa43b535f1c74b01238a8724965fdb22c1b3f52501f94005fe6 minecraft_server.1.14.jar
The 1.14 server .jar file has been obtained from here.
Here is a commented screenshot from a graphical system load indicator when stopping, then starting, and then stopping again the 4 service instances:
[media]
The baseline idle CPU load can be seen pretty well.
The shown time window is 4 minutes (120 px, 1 pixel = 2 seconds), so yeah, the huge startup load is pretty bad too; I am quite sure 1.13.2 started up in a fraction of that time. This is not an embedded system or anything; it is an entry-level server, the CPU being an Intel Xeon E3-1225 v3 with 4 cores at 3.2 GHz, and 32 GiB of memory.
Java runtime environment:
openjdk version "1.8.0_212"
OpenJDK Runtime Environment (build 1.8.0_212-b01)
OpenJDK 64-Bit Server VM (build 25.212-b01, mixed mode)

I can confirm that this bug is still in the latest stable 1.14 release. CPU load is at 100% when no players are logged in.
There's a similar ticket for versions after 19w12a: MC-146579
Please post updates there.