Since the way structures are saved has changed for 1.6.4, the save-all is causing huge lagspikes on servers with bigger maps.
In 1.6.2, save-all didn't cause any lagspikes.
1.7.2 and the lagspikes still occurs.
Attachments
Comments

I have seen /save-all cause 20+ second pauses, which, combined with existing pauses from a busy server, is enough to cause clients (netty) to timeout.
Observed in 1.7.2.
Even when /save-all does not cause long pauses, I often see the following exception:
java.util.ConcurrentModificationException
at java.util.HashMap$HashIterator.nextEntry(HashMap.java:894)
at java.util.HashMap$KeyIterator.next(HashMap.java:928)
at dg.a(SourceFile:21)
at dg.a(SourceFile:314)
at dg.a(SourceFile:23)
at dg.a(SourceFile:314)
at dg.a(SourceFile:23)
at dr.a(SourceFile:94)
at dr.a(SourceFile:84)
at dr.a(SourceFile:23)
at ayc.a(SourceFile:93)
at ayc.a(SourceFile:75)
at mj.a(SourceFile:616)
at mj.a(SourceFile:601)
at bg.b(SourceFile:36)
at z.a(SourceFile:58)
at net.minecraft.server.MinecraftServer.g(SourceFile:674)
at nn.run(SourceFile:84)
at java.lang.Thread.run(Thread.java:722)
As well as:
java.util.ConcurrentModificationException
at java.util.ArrayList$Itr.checkForComodification(ArrayList.java:819)
at java.util.ArrayList$Itr.next(ArrayList.java:791)
at mj.a(SourceFile:455)
at aow.a(SourceFile:225)
at aow.a(SourceFile:96)
at mi.b(SourceFile:146)
at mi.a(SourceFile:175)
at mj.a(SourceFile:604)
at bg.b(SourceFile:36)
at z.a(SourceFile:58)
at net.minecraft.server.MinecraftServer.g(SourceFile:674)
at nn.run(SourceFile:84)
at java.lang.Thread.run(Thread.java:722)
java.util.ConcurrentModificationException
at java.util.TreeMap$PrivateEntryIterator.nextEntry(TreeMap.java:1115)
at java.util.TreeMap$KeyIterator.next(TreeMap.java:1169)
at mj.a(SourceFile:455)
at aow.a(SourceFile:225)
at aow.a(SourceFile:96)
at mi.b(SourceFile:146)
at mi.a(SourceFile:175)
at mj.a(SourceFile:604)
at bg.b(SourceFile:36)
at z.a(SourceFile:58)
at net.minecraft.server.MinecraftServer.g(SourceFile:674)
at nn.run(SourceFile:84)
at java.lang.Thread.run(Thread.java:722)
The stack traces happen most often when RCON is used to issue the /save-all command. But I suspect they are hidden and happen in the background when /save-all is issued by OP from the Minecraft client.
This is related. In my case, save spikes are indeed caused by minecraft.dat. Client timeouts can occur if a) the server is already behind (Can't keep up!) and b) the save takes long enough. The combination can cause the client to think the server is no longer responding. It is my understanding that the timeout threshold is 20 seconds. So if it takes 10 seconds to do a /save-all (random), and your server is 10 seconds behind, you have a high chance of timeouts.
Still have this issue in 1.7.4. When the game saves the world, either automatically by the game, or externally by cron, there is a save spike that continues to increase over the course of time until it is great enough to cause the clients to disconnect.
I have a start script that deletes the Mineshaft.dat file. Every time it deletes the file, the file has grown to about 1MB. It only takes 30 minutes for it to grow to that size. When the file is small, /save-all is always fast. But when it grows to about 512KB to 1MB, the /save-all can take up to 40 seconds.
There is a direct correlation to save spikes and the size of Mineshaft.dat, at least on my server.
Total world size is currently 43GB. The world was restarted when 1.7.0 was released, so there are no legacy chunks from prior versions. And, of course, no mods.
Still an issue in 14w02c.
I found the cause of this issue, and the solution for it is pretty simple.
First of all, what exactly are these .dat structure files:
These files are a gzip compressed NBTTagCompound, which itself is build out of many other NBTTagCompounds. In case of Mineshaft.dat, these can easily be several thousands.
What is wrong with them:
The procedure of saving these files is currently very inefficient.
How are these files saved:
Via OutputStreams, they are layered in this fashion:
DataOutputStream(GZipOutputStream(FileOutputStream(File)))
Saving is initiated at the root NBTTagCompound and from there each child-NBTTag is recursively visited. Each time a NBTTag is visited, it is written to the top output stream (the DataOutputStream object).
Here is the culprit: For each write operation on the DataOutputStream, the GZipOutputStream is instantly compressing the new data. The problem is, that we dont have a single write operation with lots of data, but thousands of write operations with very litte data. The overhead of the GZipOutputStream for each compression is building up here.
The solution:
Instead of writing each NBTTag directly to a compressed stream, gather the data in an uncompressed one first, and
then write it all at once to the compressed stream.
Gathering can be done with a ByteArrayOutputStream:
DataOutputStream(ByteArrayOutputStream())
When done, get the data with a call to .toByteArray() on the ByteArrayOutputStream object and write the byte array to the previous compressed stream.
When done, write the data from the ByteArrayOutputStream to the previous compressed stream via .writeTo(OutputStream out)
==========================================
You can even take it one step further:
1) Create a copy of the data with .clone() on the root NBTTag, which takes only a fraction of what compressing and writing takes
2) Let a seperate thread write the copy to the compressed stream
Actually, anything that involves .toByteArray() would be a bad workaround. We're talking about a pretty large data set. Without looking at the code myself, I wonder if adding BufferedOutputStream to the chain would help.
On the other hand, I have a hard time believing that all these problems have been caused by what amounts to a one-liner. It has to be a huge mystery to Mojang, so it can't be that simple.
You're right, passing the data along via .toByteArray() is not a good choice here. The class ByteArrayOutputStream also got the method .writeTo(OutputStream out), how about that?
Buffering the data will cut down the number of compressions by a great deal, too, but not as much as doing it in one step.
I did some profiling of the class NBTCompressedStreamTools.
(check the diagram in the attachments area)
With and without this fix:
https://github.com/Poweruser/MinetickMod-1.7.2/commit/b09ed358542129600d02b5be5c90da47cfe58816
Added a more complete profiling.
My vote goes for the BufferedOutputStream now. Changing its buffer size during the tests didn't really make a difference.
Check out the difference in IOPS when i switches to Spigot 1.7.2 from vanilla 1.7.4 on the 12th of januari. Those spikes are backups.:
Michiel, are you aware that Spigot does not save the structure files, unless you specifically tell it to do so? And even if you enable it, it just skips the mineshafts, no matter what you set.
EDIT: Actually, I just went over their code changes again: They are saving the files, just not loading previous saved ones. Either way ...
This is definitely not helpful for Mojang.
EDIT2: The first part of your diskIO log is helpful though, as it adds another piece to the puzzle.
Anthony and I have instead found a way which drastically lowers the time spent for saving compressed NBT files altogether, not just structures. This bug has been in previous versions of Minecraft already, it just hasn't been noticed before.
I don't know how Spigot and Vanilla save this stuff differently. Hope the graph helps.
@Poweruser - see my comment at https://bugs.mojang.com/browse/MC-33134?focusedCommentId=133962&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-133962 - there seems to be something fundamentally odd about how the mineshaft.dat object is stored in memory which is also causing issues
I believe this is also related, observed in the console running 14w04b:
java.util.ConcurrentModificationException
at java.util.TreeMap$PrivateEntryIterator.nextEntry(TreeMap.java:1115)
at java.util.TreeMap$KeyIterator.next(TreeMap.java:1169)
at nm.a(SourceFile:494)
at nm.a(SourceFile:474)
at arn.a(SourceFile:226)
at arn.a(SourceFile:99)
at nl.b(SourceFile:160)
at nl.a(SourceFile:191)
at nm.a(SourceFile:655)
at bs.b(SourceFile:36)
at x.a(SourceFile:65)
at net.minecraft.server.MinecraftServer.g(SourceFile:709)
at oq.run(SourceFile:80)
at java.lang.Thread.run(Thread.java:722)
[01:45:00] [RCON Client #24/INFO]: [Rcon: Saved the world]
Still an issue in 14w11b. I believe this console output is related, because it often occurs when /save-all command is issued over rcon:
java.util.ConcurrentModificationException
at java.util.TreeMap$PrivateEntryIterator.nextEntry(TreeMap.java:1115)
at java.util.TreeMap$KeyIterator.next(TreeMap.java:1169)
at nz.a(SourceFile:502)
at nz.a(SourceFile:482)
at ave.a(SourceFile:226)
at ave.a(SourceFile:99)
at ny.b(SourceFile:161)
at ny.a(SourceFile:192)
at nz.a(SourceFile:663)
at bz.b(SourceFile:36)
at z.a(SourceFile:65)
at net.minecraft.server.MinecraftServer.g(SourceFile:733)
at qe.run(SourceFile:80)
at java.lang.Thread.run(Thread.java:744)
java.util.ConcurrentModificationException
at java.util.TreeMap$PrivateEntryIterator.nextEntry(TreeMap.java:1115)
at java.util.TreeMap$KeyIterator.next(TreeMap.java:1169)
at nz.a(SourceFile:502)
at nz.a(SourceFile:482)
at ave.a(SourceFile:226)
at ave.a(SourceFile:99)
at ny.b(SourceFile:161)
at ny.a(SourceFile:192)
at nz.a(SourceFile:663)
at bz.b(SourceFile:36)
at z.a(SourceFile:65)
at net.minecraft.server.MinecraftServer.g(SourceFile:733)
at qe.run(SourceFile:80)
at java.lang.Thread.run(Thread.java:744)
java.util.ConcurrentModificationException
at java.util.TreeMap$PrivateEntryIterator.nextEntry(TreeMap.java:1115)
at java.util.TreeMap$KeyIterator.next(TreeMap.java:1169)
at nz.a(SourceFile:502)
at nz.a(SourceFile:482)
at ave.a(SourceFile:226)
at ave.a(SourceFile:99)
at ny.b(SourceFile:161)
at ny.a(SourceFile:192)
at nz.a(SourceFile:663)
at bz.b(SourceFile:36)
at z.a(SourceFile:65)
at net.minecraft.server.MinecraftServer.g(SourceFile:733)
at qe.run(SourceFile:80)
at java.lang.Thread.run(Thread.java:744)
[10:00:01] [RCON Client #271/INFO]: [Rcon: Saved the world]

We now buffer up NBT writing before compression
I verify, the buffer indeed helped a lot. Before, I had a Mineshaft.dat file that was 11,895,844 and back then every player would get kicked every time the server saved. I had a script that would delete this file before every start.
After this fix, I took out the script and now, the same file has grown to 15,640,524 but no one is being kicked when the server saves.
I'm not saying we're lag free. But those spikes are not crippling the server any longer. Thank you @unknown.
Now that Mineshaft.dat has grown to 21,228,624, the server can take up to 10 seconds to save, often results in Full GC event.
To see these events, here's how I launch the server:
java -server -Xmx4G -Xms1024M -XX:PermSize=128m -XX:MaxPermSize=256m -XX:InitiatingHeapOccupancyPercent=80 -XX:+PrintGCDetails -Xloggc:server_gc.log -XX:+UseG1GC -XX:+UseFastAccessorMethods -XX:+UseStringCache -XX:+OptimizeStringConcat -XX:+UseBiasedLocking -XX:+UseNUMA -Xnoclassgc -XX:OnError=/Users/steve/Minecraft/Swim3/scripts/jvm_on_error.sh -XX:OnOutOfMemoryError=/Users/steve/Minecraft/Swim3/scripts/jvm_on_out_of_memory.sh
Here is what the GC is saying when this occurs:
400.910: [Full GC 3941M->1988M(4096M), 10.3670910 secs]
This means it took > 10 seconds to do a Full GC with only 6 minutes of uptime, 4GB memory. This is unusual. Deleting the Mineshaft.dat eliminates the Full GC events.
It's certainly an improvement, but I think this is still too long and this item should be re-opened.
Also still getting the following stack trace in the console during save (14w20b):
java.util.ConcurrentModificationException
at java.util.TreeMap$PrivateEntryIterator.nextEntry(TreeMap.java:1115)
at java.util.TreeMap$KeyIterator.next(TreeMap.java:1169)
at on.a(SourceFile:536)
at on.a(SourceFile:516)
at ayu.a(SourceFile:226)
at ayu.a(SourceFile:99)
at om.b(SourceFile:161)
at om.a(SourceFile:192)
at on.a(SourceFile:697)
at bz.b(SourceFile:36)
at z.a(SourceFile:65)
at net.minecraft.server.MinecraftServer.g(SourceFile:748)
at qt.run(SourceFile:80)
at java.lang.Thread.run(Thread.java:722)
java.util.ConcurrentModificationException
at java.util.TreeMap$PrivateEntryIterator.nextEntry(TreeMap.java:1115)
at java.util.TreeMap$KeyIterator.next(TreeMap.java:1169)
at on.a(SourceFile:536)
at on.a(SourceFile:516)
at ayu.a(SourceFile:226)
at ayu.a(SourceFile:99)
at om.b(SourceFile:161)
at om.a(SourceFile:192)
at on.a(SourceFile:697)
at bz.b(SourceFile:36)
at z.a(SourceFile:65)
at net.minecraft.server.MinecraftServer.g(SourceFile:748)
at qt.run(SourceFile:80)
at java.lang.Thread.run(Thread.java:722)
... and ...
java.util.ConcurrentModificationException
at java.util.ArrayList$Itr.checkForComodification(ArrayList.java:819)
at java.util.ArrayList$Itr.next(ArrayList.java:791)
at on.a(SourceFile:536)
at on.a(SourceFile:516)
at ayu.a(SourceFile:226)
at ayu.a(SourceFile:99)
at om.b(SourceFile:161)
at om.a(SourceFile:192)
at on.a(SourceFile:697)
at bz.b(SourceFile:36)
at z.a(SourceFile:65)
at net.minecraft.server.MinecraftServer.g(SourceFile:748)
at qt.run(SourceFile:80)
at java.lang.Thread.run(Thread.java:722)
14w21b
[00:45:05] [RCON Client #838/ERROR]: Couldn't process command: 'save-all'
java.util.ConcurrentModificationException
at java.util.ArrayList$Itr.checkForComodification(ArrayList.java:819) ~[?:1.7.0_07]
at java.util.ArrayList$Itr.next(ArrayList.java:791) ~[?:1.7.0_07]
at oq.a(SourceFile:704) ~[minecraft_server.jar:?]
at bz.b(SourceFile:37) ~[minecraft_server.jar:?]
at z.a(SourceFile:75) [minecraft_server.jar:?]
at z.a(SourceFile:59) [minecraft_server.jar:?]
at net.minecraft.server.MinecraftServer.g(SourceFile:760) [minecraft_server.jar:?]
at qv.run(SourceFile:80) [minecraft_server.jar:?]
at java.lang.Thread.run(Thread.java:722) [?:1.7.0_07]
[00:47:21] [RCON Listener #2/INFO]: Rcon connection from: /192.168.0.3
14w21b; Java 1.7.0_60-b19; Mac OS 10.9.2
[08:17:36] [Server thread/WARN]: Can't keep up! Did the system time change, or is the server overloaded? Running 2416ms behind, skipping 48 tick(s)
s: Saving entity NBT
at ub.e(SourceFile:1112)
at ub.d(SourceFile:1074)
at ayx.a(SourceFile:210)
at ayx.a(SourceFile:99)
at op.b(SourceFile:161)
at op.a(SourceFile:192)
at oq.a(SourceFile:700)
at net.minecraft.server.MinecraftServer.a(SourceFile:339)
at net.minecraft.server.MinecraftServer.v(SourceFile:524)
at net.minecraft.server.MinecraftServer.run(SourceFile:422)
at java.lang.Thread.run(Thread.java:722)
Caused by: java.lang.NullPointerException
at aku.k(SourceFile:127)
at akv.a(SourceFile:128)
at acc.b(SourceFile:228)
at ub.e(SourceFile:1100)
... 10 more
s: Saving entity NBT
at ub.e(SourceFile:1112)
at ub.d(SourceFile:1074)
at ayx.a(SourceFile:210)
at ayx.a(SourceFile:99)
at op.b(SourceFile:161)
at op.a(SourceFile:192)
at oq.a(SourceFile:700)
at net.minecraft.server.MinecraftServer.a(SourceFile:339)
at net.minecraft.server.MinecraftServer.v(SourceFile:524)
at net.minecraft.server.MinecraftServer.run(SourceFile:422)
at java.lang.Thread.run(Thread.java:722)
Caused by: java.lang.NullPointerException
at aku.k(SourceFile:127)
at akv.a(SourceFile:128)
at acc.b(SourceFile:228)
at ub.e(SourceFile:1100)
... 10 more
s: Saving entity NBT
at ub.e(SourceFile:1112)
at ub.d(SourceFile:1074)
at ayx.a(SourceFile:210)
at ayx.a(SourceFile:99)
at op.b(SourceFile:161)
at op.d(SourceFile:220)
at oq.c(SourceFile:179)
at net.minecraft.server.MinecraftServer.w(SourceFile:570)
at np.w(SourceFile:284)
at net.minecraft.server.MinecraftServer.v(SourceFile:506)
at net.minecraft.server.MinecraftServer.run(SourceFile:422)
at java.lang.Thread.run(Thread.java:722)
Caused by: java.lang.NullPointerException
at aku.k(SourceFile:127)
at akv.a(SourceFile:128)
at acc.b(SourceFile:228)
at ub.e(SourceFile:1100)
... 11 more
[08:17:53] [Server thread/WARN]: Can't keep up! Did the system time change, or is the server overloaded? Running 2468ms behind, skipping 49 tick(s)
[08:17:56] [RCON Listener #2/INFO]: Rcon connection from: /192.168.0.3