I've uploaded src/minecraft_server/net/minecraft/world/chunk/storage/AnvilChunkLoader.java
The fix is removing some code, which I have shown commented out with the annotation MC-22147 FIX. As I said before, ConcurrentHashMap.put() and ConcurrentHashMap.remove() are fully synchronized between threads and do not need a separate layer of synchronization wrapped around them. The code using pendingAnvilChunksCoordinates causes new requests to save a chunk to be thrown away whenever that chunk is currently being written out to disk (with old data), which is incorrect behavior: the new data should not be discarded, but added to the queue even while old data is in the process of being written.
These are the patched 1.11.2 server files that fix the problem for me. If any data has to be discarded, it should be the old data, never the most recent.
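For reference, this is a sketch of what the patched method looks like, with the offending check commented out (method and field names are from the MCP decompile; the uploaded file may differ in detail):

protected void addChunkToPending(ChunkPos pos, NBTTagCompound compound) {
    // MC-22147 FIX: never skip the put. ConcurrentHashMap.put() is already
    // fully synchronized, and the newest snapshot must always replace the
    // older one rather than being discarded.
    //if (!this.pendingAnvilChunksCoordinates.contains(pos)) {
    this.chunksToRemove.put(pos, compound);
    //}

    ThreadedFileIOBase.getThreadedIOInstance().queueIO(this);
}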
Also, I am getting this bug on 1.11.2. I run a multiplayer server and like to build large villages, so I hit this bug every few days. I see the following sorts of warning messages in the server logs:
[20:55:39] [Server thread/WARN]: Keeping entity minecraft:villager that already exists with UUID 7789a225-8dfb-4c8b-9ff7-d09220e2357c
[20:55:39] [Server thread/WARN]: Keeping entity minecraft:villager that already exists with UUID 520cf954-4346-4bd4-9d86-c22f4af6dc81
[20:55:39] [Server thread/WARN]: Keeping entity minecraft:villager that already exists with UUID b9003671-7440-47ff-ad3f-c845a30bcac5
Using NBTExplorer on the region files, I can also see that some villagers I have named with a Name Tag are disappearing.
While browsing the MCP-decompiled source code, I found a bug in AnvilChunkLoader.java that can lead to duplicated or missing entities. Here is how it can happen:
1. The chunk's current data is added to the save queue as part of unloading the chunk.
2. Chunk gets reloaded for whatever reason and entities enter/leave the chunk.
3. Queued chunk data starts to get written out in the separate IO thread.
4. While the queued chunk data is in the process of being written, the chunk is queued to be unloaded again.
The problem is that the following method (from the MCP source code) is called when the chunk is unloaded the second time:
protected void addChunkToPending(ChunkPos pos, NBTTagCompound compound) {
    // If the I/O thread is currently writing this position, the fresh
    // snapshot is silently dropped instead of replacing the stale one.
    if (!this.pendingAnvilChunksCoordinates.contains(pos)) {
        this.chunksToRemove.put(pos, compound);
    }

    ThreadedFileIOBase.getThreadedIOInstance().queueIO(this);
}
Since the same ChunkPos is already in the process of being written, the new chunk data is not added to the queue. The result is that the stale data is written to the region file while the new data is simply discarded. Entities that left the chunk are duplicated (they remain in the stale chunk data and are also added to their new chunk), and entities that entered the chunk are lost (they are absent from the stale chunk data but have already been removed from their previous chunk).
There is only a single I/O thread used to write out chunks, which is accessed through the singleton class src/minecraft_server/net/minecraft/world/storage/ThreadedFileIOBase.java.
When a chunk is queued for saving, a snapshot of the chunk data is taken in the form of an NBTTagCompound and placed in the ConcurrentHashMap chunksToRemove with the chunk's ChunkPos as the key. This snapshot is what the I/O thread later writes to disk, as sketched below.
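Here is a simplified sketch of that save path, based on the MCP decompile (error handling and some wrapping of the NBT data are omitted; exact signatures may vary slightly between mappings):

// Inside AnvilChunkLoader (simplified):
public void saveChunk(World world, Chunk chunk) throws IOException {
    NBTTagCompound compound = new NBTTagCompound();
    // Serialize the chunk's current state into an NBT snapshot.
    this.writeChunkToNBT(chunk, world, compound);
    // Hand the snapshot to the I/O thread via the chunksToRemove map.
    this.addChunkToPending(chunk.getPos(), compound);
}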
Here are the possible cases when a chunk is queued for saving (data A), is modified, and then is requeued for saving (data B):
1. Data B is requested to be queued before data A has started writing to disk. In this case data B replaces data A in the queue.
2. Data B is requested to be queued while data A is being written to disk. In this case data B is not put on the queue and is discarded. However, the server continues as if data B had been queued (it clears the dirty flags, etc.).
3. Data B is requested to be queued after data A has finished writing to disk. In this case data B is added to the queue.
Clearly, case 2 is doing the wrong thing. I see no reason to throw away data B just because data A is in the process of being written to disk, especially since just before and just after that window, data B would instead be added to the queue. The sketch below shows where this window opens on the I/O thread.
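This is roughly what the I/O thread does per chunk, simplified from the MCP decompile of writeNextIO(); while the position sits in pendingAnvilChunksCoordinates, any concurrent addChunkToPending() for the same chunk silently drops its snapshot:

// Runs on the ThreadedFileIOBase thread (simplified):
public boolean writeNextIO() {
    if (this.chunksToRemove.isEmpty()) {
        return false;
    }

    ChunkPos pos = this.chunksToRemove.keySet().iterator().next();

    try {
        // Case 2 begins here: while pos is in this set, addChunkToPending()
        // refuses to put a newer snapshot for the same chunk.
        this.pendingAnvilChunksCoordinates.add(pos);
        NBTTagCompound compound = this.chunksToRemove.remove(pos);

        if (compound != null) {
            try {
                this.writeChunkData(pos, compound);
            } catch (Exception exception) {
                LOGGER.error("Failed to save chunk", exception);
            }
        }
    } finally {
        // Case 2 ends here; saves requested after this point are queued again.
        this.pendingAnvilChunksCoordinates.remove(pos);
    }

    return true;
}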