The 1.14 snapshots have some severe performance problems. Servers fall behind continually, and users are regularly kicked. The first thing I noticed is that even when nobody is logged in to a server, 100% of one CPU core is being used. As you'll see, this is something of a red herring, but it means that everyone else is stuck until this is implemented properly.
I profiled this, and I found that Thread::yield dominates the run time:
Tree Profile:
(t 100.0,s 7.9) java.lang.Thread::run
  (t 92.1,s 0.3) net.minecraft.server.MinecraftServer::run
    (t 87.9,s 87.9) java.lang.Thread::yield
    (t 1.9,s 0.0) net.minecraft.server.MinecraftServer::a
I only have obfuscated names; here's where that's called from, in MinecraftServer.run():
@Override
public void run() {
    try {
        if (this.d()) {
            this.Z = k.b();
            this.n.a(new jf(this.E));
            this.n.a(new pd.c("18w43c", 442));
            this.a(this.n);
            while (this.u) {
                long l2 = k.b() - this.Z;
                if (l2 > 2000L && this.Z - this.Q >= 15000L) {
                    long l3 = l2 / 50L;
                    h.warn("Can't keep up! Is the server overloaded? Running {}ms or {} ticks behind", (Object)l2, (Object)l3);
                    this.Z += l3 * 50L;
                    this.Q = this.Z;
                }
                this.Z += 50L;
                this.a(this::aU);
                while (this.aU()) {
                    Thread.yield();
                }
                this.P = true;
            }
        } else {
            this.a((b)null);
        }
    }
Specifically:
while (this.aU()) {
    Thread.yield();
}
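To make the mechanism concrete: a yield-based wait never blocks, so the scheduler keeps the thread runnable and it burns a full core doing nothing between ticks. A minimal, self-contained sketch of the same pattern (hypothetical code, not Minecraft's) reproduces the 100% core usage:

// Minimal reproduction of a yield-based busy-wait (illustrative only):
// the inner loop spins until the next 50 ms tick is due, pinning one core.
public class BusyWaitDemo {
    public static void main(String[] args) {
        long nextTick = System.currentTimeMillis() + 50L;
        while (true) {
            while (System.currentTimeMillis() < nextTick) {
                Thread.yield(); // never blocks, so this thread stays runnable
            }
            nextTick += 50L;
            // ... per-tick work would go here ...
        }
    }
}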
There are multiple other places where Thread::yield is called from, including another place in MinecraftServer and in ty.java.
This may be decompiled code, but it looks intentional. Basically, it looks like a hack that was necessary to get a snapshot out in reasonable time. No negative judgement there; we all do cheap hacks to get alpha releases usable for others, and we're better off having a snapshot than not!
Unfortunately, this makes it hard for anyone else to help with profiling the server, and as a result, we are unable to help the developers find the performance hotspots that their profiling tools miss.
This problem also makes it inadvisable (or outright disallowed) to run 1.14 snapshots in the cloud, since some VM hosts will kill CPU-hogging processes, and it can also cost more.
Therefore I urge the developers to please implement this properly sooner rather than later.
Comments


It appears net/minecraft/server/MinecraftServer.run() calls Thread.yield() as well while waiting for the next tick; according to this StackOverflow answer, that might not be ideal either.

Whoops. My bad. I'll edit the bug description, because THAT is the place where it's hammering the CPU from.

18w44a hasn't done anything to help with this. Here's the top of the profile tree now:
(t 100.0,s 13.1) java.lang.Thread::run
  (t 86.9,s 1.7) net.minecraft.server.MinecraftServer::run
    (t 52.9,s 52.9) java.lang.Thread::yield
    (t 14.0,s 0.0) net.minecraft.server.MinecraftServer::a
      (t 14.0,s 0.0) ti::b
        (t 14.0,s 0.0) net.minecraft.server.MinecraftServer::b
          (t 13.8,s 0.0) ua::a
            (t 13.7,s 0.0) tz::a
              (t 13.5,s 2.1) tz::n
                (t 3.4,s 2.9) azp::c
                  (t 0.5,s 0.0) java.util.TreeMap::get
                    (t 0.5,s 0.4) java.util.TreeMap::getEntry
                      (t 0.1,s 0.1) java.lang.String::compareTo
                (t 2.2,s 0.0) tq::c
                  (t 1.2,s 0.0) java.util.concurrent.CompletableFuture::getNow
                    (t 1.2,s 1.2) java.util.concurrent.CompletableFuture::reportJoin
                  (t 0.6,s 0.0) com.mojang.datafixers.util.Either$Left::left
                    (t 0.6,s 0.5) java.util.Optional::of
                      (t 0.1,s 0.1) java.util.Optional::<init>
                  (t 0.4,s 0.4) com.mojang.datafixers.util.Either$Right::left
                (t 2.2,s 0.2) azt::a
                  (t 0.8,s 0.6) java.util.Random::nextInt
                    (t 0.2,s 0.1) java.util.Random::next
                      (t 0.1,s 0.1) java.util.concurrent.atomic.AtomicLong::compareAndSet
Can confirm the constant 100% CPU core usage for 18w45a.
Wasted CPU time is still an issue with 18w46a:
Performance has worsened since this snapshot version, with players being kicked or the server crashing upon exceeding the maximum tick delay, or being frequently overloaded even when no one is online.

Can confirm that this is a problem in snapshot versions 18w50a as well as 19w02a.

Replacing yield with sleep(1) is a quick fix.
An even better fix would be to avoid the busy loop entirely and use futures or wait/notify instead.
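To sketch what the wait/notify suggestion might look like (illustrative only; the class and method names TickLoop, signalWork and awaitWorkOrNextTick are invented, and this is not Mojang's code): worker threads notify the main thread when they queue a task, and the main thread blocks with a timeout instead of spinning.

public final class TickLoop {
    private final Object lock = new Object();
    private boolean workAvailable = false;

    // Called by worker threads whenever they hand a task to the main thread.
    public void signalWork() {
        synchronized (lock) {
            workAvailable = true;
            lock.notifyAll();
        }
    }

    // Replaces the `while (hasWork()) Thread.yield();` spin: blocks until work
    // is signalled or the next tick is due, using ~0% CPU while waiting.
    public void awaitWorkOrNextTick(long nextTickMillis) throws InterruptedException {
        synchronized (lock) {
            long remaining = nextTickMillis - System.currentTimeMillis();
            while (!workAvailable && remaining > 0) {
                lock.wait(remaining);
                remaining = nextTickMillis - System.currentTimeMillis();
            }
            workAvailable = false;
        }
    }
}

The key point is that wait(remaining) deschedules the thread, so an idle server uses essentially no CPU, while notifyAll() wakes it up immediately when work arrives.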

Can confirm still an issue on 19w05a. This problem is preventing proper performance testing of other components. Could this be taken to developers as a higher priority so we have sufficient testing time before release?

This is an absolute show stopper for me; it's basically unplayable due to this issue, which, as others have said, prevents or at least complicates actual play testing of new content. I accept that snapshots aren't stable releases, but getting this fixed should be a priority.
Edit: This problem seems to be made worse by riding horses and/or generating new chunks.

Using wait/notify is the way to go.
Right now this kills a CPU core, and it becomes really hard to debug things.

I'm not sure if this is related, but once the second player connects, the server begins lagging. This has been noticed on the last few snapshot builds.

This probably doesn't help much, but since I want to test this server, I can't let the CPU idle at 100% on a core all the time.
This was my first time editing bytecode: I modified my 19w06a snapshot by replacing the call to Thread.yield() with Thread.sleep(1), just for testing (as suggested in a previous comment). Funny thing: it isn't pretty, and the average tick time rose by about 50% on my system, but the CPU usage of the process dropped to around 3-4%. I don't know whether overall performance is way down now, but at least I can let the server run for further testing.
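For anyone who wants to reproduce that kind of patch, one possible approach is the ASM bytecode library (org.objectweb.asm); the sketch below is only illustrative, the class and file paths are placeholders, and I don't know what tool the previous commenter actually used:

import java.nio.file.Files;
import java.nio.file.Paths;
import org.objectweb.asm.ClassReader;
import org.objectweb.asm.ClassVisitor;
import org.objectweb.asm.ClassWriter;
import org.objectweb.asm.MethodVisitor;
import org.objectweb.asm.Opcodes;

public class YieldToSleepPatcher {
    public static void main(String[] args) throws Exception {
        // Placeholder path: the class file extracted from the server jar.
        byte[] input = Files.readAllBytes(Paths.get("MinecraftServer.class"));
        ClassReader reader = new ClassReader(input);
        // COMPUTE_MAXS because the replacement pushes an extra long onto the stack.
        ClassWriter writer = new ClassWriter(reader, ClassWriter.COMPUTE_MAXS);

        reader.accept(new ClassVisitor(Opcodes.ASM7, writer) {
            @Override
            public MethodVisitor visitMethod(int access, String name, String desc,
                                             String sig, String[] exceptions) {
                MethodVisitor mv = super.visitMethod(access, name, desc, sig, exceptions);
                return new MethodVisitor(Opcodes.ASM7, mv) {
                    @Override
                    public void visitMethodInsn(int opcode, String owner, String insnName,
                                                String insnDesc, boolean isInterface) {
                        // Rewrite every Thread.yield() call into Thread.sleep(1L).
                        if (opcode == Opcodes.INVOKESTATIC
                                && "java/lang/Thread".equals(owner)
                                && "yield".equals(insnName) && "()V".equals(insnDesc)) {
                            super.visitLdcInsn(1L);
                            super.visitMethodInsn(Opcodes.INVOKESTATIC, "java/lang/Thread",
                                    "sleep", "(J)V", false);
                            return;
                        }
                        super.visitMethodInsn(opcode, owner, insnName, insnDesc, isInterface);
                    }
                };
            }
        }, 0);

        Files.write(Paths.get("MinecraftServer.class"), writer.toByteArray());
    }
}

Note that Thread.sleep declares InterruptedException, which the patched method won't declare; the JVM does not enforce checked exceptions at the bytecode level, so the class still verifies, but an interrupt would surface as an undeclared exception.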

If you are into bytecode editing, you could try using Thread.sleep(0, 1000) instead of Thread.sleep(1), since sleep can take either just a number of milliseconds, or a number of milliseconds plus a number of nanoseconds. If that fixes the tick time issue, you could then play around with that second value to find a setting that gives acceptable tick time and acceptable CPU load.

@nickovs most JVMs will not do nanosecond sleeps; internally they just round up to the next millisecond if nanos is large enough.
See e.g. OpenJDK https://github.com/md-5/OpenJDK/blob/9ce194a08077f2d0d6acf034c3326b9a53baa14f/src/java.base/share/classes/java/lang/Thread.java#L325
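For reference, the two-argument overload in that linked source boils down to roughly the following (paraphrased from memory; see the link for the exact code), so sleep(0, 1000) ends up as sleep(1) anyway:

// Approximate shape of java.lang.Thread.sleep(long, int) in older OpenJDK builds.
public static void sleep(long millis, int nanos) throws InterruptedException {
    if (millis < 0) {
        throw new IllegalArgumentException("timeout value is negative");
    }
    if (nanos < 0 || nanos > 999999) {
        throw new IllegalArgumentException("nanosecond timeout value out of range");
    }
    // Sub-millisecond requests are rounded: any non-zero nanos with millis == 0,
    // or nanos >= 500000, just bumps millis by one.
    if (nanos >= 500000 || (nanos != 0 && millis == 0)) {
        millis++;
    }
    sleep(millis);
}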

I was hoping that even if they don't do nanoseconds, they might at least do microseconds 🙂

I assume this is still happening in 19w07a? I haven't tested it yet.

Yes, it seems to still be an issue. Running an idle 19w07a server on my machine, it takes up 101% CPU, so clearly some thread is still in a spin loop.

Using notifyAll() and wait() with a short timeout is probably the way to go (or use semaphores instead of timeouts). Technically this is not hard to do; somebody just has to do it. But one would need the complete, compilable source code to fix this 🙂
Minecraft is not FOSS, after all :/
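A minimal sketch of the semaphore variant mentioned above (illustrative names, not actual Minecraft code): producers release a permit whenever they queue work, and the main thread blocks on a timed acquire instead of spinning.

import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

public final class TickSignal {
    private final Semaphore workPermits = new Semaphore(0);

    // Called by producer threads when they queue a task for the main thread.
    public void signalWork() {
        workPermits.release();
    }

    // Called by the main thread: blocks (using no CPU) until work is signalled
    // or the tick deadline passes; returns true if work arrived in time.
    public boolean awaitWork(long timeoutMillis) throws InterruptedException {
        return workPermits.tryAcquire(timeoutMillis, TimeUnit.MILLISECONDS);
    }
}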
Present in 19w08a.

Boy, it sure would be nice if someone would bother fixing this, so people could actually test things...

Can confirm it's still present in 19w08a. We had to revoke snapshot access weeks back due to this. The least they could do is acknowledge that it's an issue and will be fixed.

I believe this would impact Realms too when released? All their servers would overload and lock out.
No worries, this will for sure never reach a stable release with this issue. But as has been said above, it's not as easy as changing one line; it impacts the game's performance.
You can test the snapshots fine in the cloud… but for it to run as expected, ideally you'll need a multi-core dedicated server for… a Minecraft server.

This issue looks like it's sitting in the main thread. I have an 8-core 4 GHz CPU in my server, and with 3 players the server is overloaded and uses a little more than 100% of one core;
the other 7 cores are doing nothing. (And yes, it's dedicated; you will not find a virtual server with 4 GHz.)
The effects are:
very slow chunk loading (already existing chunks, not travelling the world)
white holes (chunks not loading at all)
mobs reacting slowly or standing still for a moment
"Can't keep up! Is the server overloaded? Running xxxxms or xx ticks behind"
I will not upgrade to 1.14 if this is not fixed.
I really hope they care about something like this.

This is still a problem on 19w09a, idling between 99-104% CPU with no players logged in to the server.

I still have 100% load on one core on 19w11a, client.
Arch Linux

Just wanted to follow up on 19w11a in case anyone wanted to know.
Client - 14% on Windows 10.
Server (after being online with one player for 10 min) on Debian 8, Intel Xeon E3-1270 v5 - 52% and gradually going down.
Thank you for fixing this horrific issue.

@unknown Did you check that it's not just lag? I just tested it and the fix worked for me: empty world, 2-chunk render distance, 200 fps. Setting the fps limit to 60 brings the CPU usage down to 4% (a third of one CPU core).

Fabian, FPS on a client does not affect servers - unless they are both being run on the same machine.
I have this issue on my dedicated i7-7700k server (boost to 5GHz). No one on, and CPU idling at 90-100%, with nothing running except spawn chunks.
Will see if it's resolved later on today. 🙂

What I meant to say with the 200fps is that there's processing power left, but you're right, there's another process (I only looked at one before) that does take up 100% of one core still. Not sure if the cause is the one in this report or a new one.

Still persists in 19w11a on Linux (Ubuntu).

Are y'all using old maps or a fresh install for snapshot 19w11a? Currently, mine is at 8.4% idle, which is "fine".

New superflat map with no one online.

I updated the world I was running previously, same conditions, inside a Docker container from Pterodactyl, and it is no longer idling at 100%. Idling below 10% here. Host OS is Ubuntu 18.04.1 LTS and the container is Alpine.

On my server machine (Ubuntu 18.04, 2x8-core Xeon) things are better but still not perfect with 19w11a. The machine is running one instance using 19w11a and two using 1.13.2. The newest server is idling at around 16% of a CPU, while the others are idling at about 2%. While this is a great deal better than the 100% it was using until now, it would be great to get it back to where it was in previous releases. Hopefully it's just due to excessive debugging code in the pre-release version.

The released 1.14 server .jar is still affected (I do not know how to correlate the NNwNNa snapshots to the releases, sorry).
I host 4 minecraft server instances on an Arch Linux amd64 machine, and due to this issue I had to suspend hosting after I upgraded from 1.13.2 to 1.14.0. I can definitely see how one could get a nasty surprise when hosting this on a cloud service.
SHA-384 checksums here:
951dbcef3af5f952cafb9a89379aed2018d8eebe4e9ebf801bd03c44719931d8652ddc7dab83f0d26e2568e8add4b788 minecraft_server.1.13.2.jar
2e691ee6bc50f67b9ad2237d86c30e7394aba39d73e20fa43b535f1c74b01238a8724965fdb22c1b3f52501f94005fe6 minecraft_server.1.14.jar
The 1.14 server .jar file has been obtained from here.
Here is a commented screenshot from a graphical system load indicator when stopping, then starting, and then stopping again the 4 service instances:
[media]
The baseline idle CPU load can be seen pretty well.
The shown time window is 4 minutes (120 px, 1 pixel = 2 seconds), so yeah, the huge startup load is pretty bad too; I am quite sure 1.13.2 started up in a fraction of that time. This is not an embedded system or anything; it is an entry-level server, the CPU being an Intel Xeon E3-1225 v3 with 4 cores at 3.2 GHz, and 32 GiB of memory.
Java runtime environment:
openjdk version "1.8.0_212"
OpenJDK Runtime Environment (build 1.8.0_212-b01)
OpenJDK 64-Bit Server VM (build 25.212-b01, mixed mode)

I can confirm that this bug is still in the latest stable 1.14 release. CPU load is at 100% when no players are logged in.
There's a similar ticket for versions after 19w12a: MC-146579
Please post updates there.