It turns out that MC-214808, MC-216148 and MC-214793 are all caused by the same code change introduced in the 1.17 snapshots. In the following I will describe this change and explain how it leads to each of these bugs. I am posting this as a separate report since it affects multiple other reports and hence doesn't really fit as a comment on any single one of them.
World generation/loading stores one Future for each generation stage of a chunk in a corresponding ChunkHolder. These Futures are created once something requests the chunk in the respective stage, under the condition that the chunk ticket level is sufficiently large to issue generation into that stage. What changed in the 1.17 snapshots is the handling of demotion of chunk ticket levels.
In 1.16, demoting the chunk ticket level completed all pending Futures above the new level with a special UNLOADED_CHUNK marker, while leaving already finished Futures untouched.
net.minecraft.server.world.ChunkHolder.java
protected void tick(ThreadedAnvilChunkStorage chunkStorage) {
    ...
    Either<Chunk, ChunkHolder.Unloaded> either = Either.right(new ChunkHolder.Unloaded() {...});
    for (int i = bl2 ? chunkStatus2.getIndex() + 1 : 0; i <= chunkStatus.getIndex(); ++i) {
        completableFuture = (CompletableFuture)this.futuresByStatus.get(i);
        if (completableFuture != null) {
            completableFuture.complete(either);
        } else {
            this.futuresByStatus.set(i, CompletableFuture.completedFuture(either));
        }
    }
    ...
}
This has two important consequences:
1. Once a generation stage finishes, the Future stays completed with the correct result until the chunk is unloaded.
2. A Future that is still pending at some point can only be aborted (completed with UNLOADED_CHUNK, which does not abort the actual generation step) if the chunk ticket level drops too low; otherwise it will complete with a valid chunk. This requires a small argument, since the Future can be aborted not only directly by the above code, but also indirectly if any of its dependencies is aborted (completed with UNLOADED_CHUNK). However, the latter cannot happen if the ticket level stays high enough (after observing the pending Future), by construction of the ticket system.
In the 1.17 snapshots, this handling of demotion fundamentally changed. Instead of directly aborting the pending Futures above the new level, all Futures, including already finished ones, are replaced with a new one that completes with an UNLOADED_CHUNK once the original Future completes.
net.minecraft.server.world.ChunkHolder.java
protected void tick(ThreadedAnvilChunkStorage chunkStorage) {
    ...
    Either<Chunk, ChunkHolder.Unloaded> either = Either.right(new ChunkHolder.Unloaded() {...});
    for (int i = bl2 ? chunkStatus2.getIndex() + 1 : 0; i <= chunkStatus.getIndex(); ++i) {
        completableFuture = (CompletableFuture)this.futuresByStatus.get(i);
        if (completableFuture != null) {
            this.futuresByStatus.set(i, completableFuture.thenApply(__ -> either));
        } else {
            this.futuresByStatus.set(i, CompletableFuture.completedFuture(either));
        }
    }
    ...
}
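The difference between the two demotion strategies can be reproduced with plain CompletableFutures, entirely outside of Minecraft. The following is a minimal standalone sketch (class, method names and string values are invented for illustration, not actual game code):

```java
import java.util.concurrent.CompletableFuture;

public class DemotionSketch {
    static final String CHUNK = "CHUNK";
    static final String UNLOADED = "UNLOADED_CHUNK";

    // 1.16 demotion: complete(...) aborts a pending future immediately,
    // but is a no-op on a future that already holds a result.
    static String demote116(CompletableFuture<String> stage) {
        stage.complete(UNLOADED);
        return stage.join();
    }

    // 1.17 snapshot demotion: the stored future is replaced by a derived
    // one that discards the original result, even for a finished stage.
    static CompletableFuture<String> demote117(CompletableFuture<String> stage) {
        return stage.thenApply(result -> UNLOADED);
    }

    public static void main(String[] args) {
        CompletableFuture<String> finished = CompletableFuture.completedFuture(CHUNK);
        System.out.println(demote116(finished));        // prints CHUNK: result preserved
        System.out.println(demote117(finished).join()); // prints UNLOADED_CHUNK
    }
}
```

Note how the 1.16 variant preserves the finished stage while the 1.17 variant clobbers it; this is exactly the loss of property 1 discussed below.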
As a result, both of the above properties are no longer valid:
1. An already finished stage might be replaced with an UNLOADED_CHUNK_FUTURE.
2. If the chunk ticket level drops below, say, t, then the corresponding Future is not directly aborted, but instead replaced with a Future that will be lazily aborted. If the ticket level is now raised above t again while the original Future is still running, then no new Future will be issued for this generation stage.
net.minecraft.server.world.ChunkHolder.java
public CompletableFuture<Either<Chunk, ChunkHolder.Unloaded>> getChunkAt(ChunkStatus targetStatus, ThreadedAnvilChunkStorage chunkStorage) {
    int i = targetStatus.getIndex();
    CompletableFuture<Either<Chunk, ChunkHolder.Unloaded>> completableFuture = (CompletableFuture)this.futuresByStatus.get(i);
    if (completableFuture != null) {
        Either<Chunk, ChunkHolder.Unloaded> either = (Either)completableFuture.getNow((Object)null);
        if (either == null || either.left().isPresent()) {
            return completableFuture;
        }
    }

    if (getTargetStatusForLevel(this.level).isAtLeast(targetStatus)) {
        CompletableFuture<Either<Chunk, ChunkHolder.Unloaded>> completableFuture2 = chunkStorage.getChunk(this, targetStatus);
        this.combineSavingFuture(completableFuture2, "schedule " + targetStatus);
        this.futuresByStatus.set(i, completableFuture2);
        return completableFuture2;
    } else {
        return completableFuture == null ? UNLOADED_CHUNK_FUTURE : completableFuture;
    }
}
This was fine in 1.16 since the Future could only be pending if it was not already (indirectly) aborted (which is exactly point 2 above). However, in 1.17 the Future is already indirectly aborted (it is replaced by a lazily aborted one), so it will always produce an UNLOADED_CHUNK even if the ticket level does not fall again. This breaks property 2.
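The failure mode of the getChunkAt check above can again be sketched with plain CompletableFutures (a simplified standalone model, not the actual implementation — getOrSchedule stands in for the pending-future check in getChunkAt):

```java
import java.util.concurrent.CompletableFuture;

public class StaleFutureSketch {
    // Simplified version of the getChunkAt check: a stored future that is
    // still pending (getNow(null) == null) is reused instead of scheduling
    // a new generation task.
    static CompletableFuture<String> getOrSchedule(CompletableFuture<String> stored) {
        if (stored != null && stored.getNow(null) == null) {
            return stored; // looks like "still generating", so it is trusted
        }
        return CompletableFuture.completedFuture("FRESHLY_SCHEDULED");
    }

    static String demo() {
        // The generation step is still running...
        CompletableFuture<String> original = new CompletableFuture<>();
        // ...but a demotion already replaced the stored future with a
        // lazily aborted one (the 1.17 snapshot behavior):
        CompletableFuture<String> stored = original.thenApply(r -> "UNLOADED_CHUNK");

        CompletableFuture<String> reused = getOrSchedule(stored);
        original.complete("CHUNK"); // the generation step itself succeeds...
        return reused.join();       // ...yet the caller receives UNLOADED_CHUNK
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints UNLOADED_CHUNK
    }
}
```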
It turns out that this code was changed in 21w06a, which is precisely the version where all 3 bugs were reported. (I actually checked the precise version retrospectively after figuring out the cause for these issues).
So, how does violation of these 2 properties cause the 3 linked bugs?
1. Ticking-Futures might never complete
Every time the ticket level is raised high enough, a chunk is promoted to BORDER, TICKING and ENTITY_TICKING state respectively. This will create additional Futures that depend on the generation Futures in some small neighborhood of the chunk. Upon completion, these will execute the respective tasks for promoting the chunk to the respective state, like registering tick schedulers, marking the chunk tickable/entity-tickable and sending the chunk data to players.
Assuming property 2 above, these Futures will always complete with a valid chunk at some point, unless the ticket level drops too low. In the latter case, the Futures get recreated once the chunk is promoted again, so this is not an issue.
Now, in the 1.17 snapshots property 2 is violated and hence these special Futures might never complete (with a valid chunk), since they might depend on lazily aborted Futures upon creation. Hence the respective promotion tasks might never execute. In particular, the ticking-Future, which is responsible for sending the chunk data to players, might not execute, hence causing MC-214808.
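A rough standalone model of how such a promotion future can get stuck follows. The combination logic here (allOf over a neighborhood list) is invented for illustration and does not mirror the actual implementation:

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;

public class PromotionSketch {
    // Sketch of a ticking-style future: the promotion task (e.g. sending
    // chunk data to players) only runs once every generation future in the
    // neighborhood has produced a valid chunk.
    static CompletableFuture<String> makeTickingFuture(
            List<CompletableFuture<String>> neighborhood, Runnable promotionTask) {
        return CompletableFuture.allOf(neighborhood.toArray(new CompletableFuture[0]))
                .thenApply(v -> {
                    if (neighborhood.stream().allMatch(f -> f.join().equals("CHUNK"))) {
                        promotionTask.run();
                        return "TICKING";
                    }
                    return "UNLOADED_CHUNK";
                });
    }

    static boolean demo() {
        // One neighbor's generation future is lazily aborted but still
        // pending, so the combined future can never fire the task.
        CompletableFuture<String> stuckNeighbor = new CompletableFuture<>();
        CompletableFuture<String> ticking = makeTickingFuture(
                List.of(CompletableFuture.completedFuture("CHUNK"), stuckNeighbor),
                () -> System.out.println("chunk data sent to players"));
        return ticking.isDone();
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints false: the promotion task never ran
    }
}
```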
Note that all 3 Futures can fail independently. For example, I did observe chunks where the ticking-Future completed but the entity-ticking-Future did not, causing the chunk to load but entities to get stuck in these chunks. Also note that this issue can similarly lead to the "Chunk not there when requested: " error when the ServerChunkManager queries a lazily aborted Future for the FULL stage. I actually observed this crash a few times during debugging.
In order to provoke this issue, one can try the following steps. Due to the random nature of these bugs, several attempts might be needed. Run

/tp ~1000 ~ ~
/tp ~-1000 ~ ~
/tp ~1000 ~ ~

in quick succession, e.g., with one tick in-between, or while pausing the server thread through the debugger. The first teleport will create the ChunkHolders at the target location and create the generation Futures up to the FULL stage. The second tp will then lazily abort these Futures again, and the third and final tp will create the ticking-Futures on the lazily aborted but still pending generation Futures, hence causing MC-214808.
I also observed this issue with only a single tp, although this was less reproducible. This might seem strange at first, since the ticket levels should change monotonically in this case and not experience the jitter required for the above explanation. I think this is caused by an interaction of the POST_TELEPORT chunk ticket and the player ticket throttling. The POST_TELEPORT ticket is created right after teleporting, but expires before the player tickets are added because of the throttling, hence causing the required jitter.
2. Futures might be erased completely
If the chunk ticket level drops low enough, the corresponding ChunkHolder is scheduled for unloading. It can still be revoked up to the point where it is actually saved to NBT and all the unloading tasks are done. This can be a few ticks after the actual scheduling, depending on server load. By property 1, once the corresponding chunk has passed the first generation/loading stage, all Futures will keep the reference to this chunk, so that it can indeed be revoked from the ChunkHolder.
However, in 1.17, when the ChunkHolder is scheduled for unloading, all generation Futures are replaced with (lazily) aborted ones. If this ChunkHolder later gets revoked (before it could properly save and unload), all generation Futures are recreated and the very first stage then reloads the chunk from disk again. Hence, the chunk object gets replaced with a completely new version that is reloaded from disk, erasing any progress since the last save. This is exactly MC-216148.
Note that the unloading tasks did not run in this scenario, so any externally stored data is still present. In particular, the chunk is still marked as loaded and will hence skip the block-entity loading step.
net.minecraft.server.world.ThreadedAnvilChunkStorage.java
private CompletableFuture<Either<Chunk, ChunkHolder.Unloaded>> convertToFullChunk(ChunkHolder chunkHolder) {
    ...
    worldChunk2.loadToWorld();
    if (this.loadedChunks.add(chunkPos.toLong())) {
        worldChunk2.setLoadedToWorld(true);
        worldChunk2.updateAllBlockEntities();
    }
    ...
}
As a consequence, block-entity tickers will not be recreated and still reference the old copies, hence block-entities will not be ticked. This is indeed a side effect noted in the bug report.
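The loadedChunks check can be modeled in isolation; the following sketch (a hypothetical reduction, not the real class) shows why the second, revoked-and-regenerated conversion skips block-entity initialization:

```java
import java.util.HashSet;
import java.util.Set;

public class ReloadSketch {
    // Mirrors the loadedChunks check above: block entities are only wired
    // up the first time a chunk position is marked as loaded to the world.
    private final Set<Long> loadedChunks = new HashSet<>();

    // Returns true if block entities were (re)initialized by this conversion.
    boolean convertToFull(long pos) {
        return loadedChunks.add(pos);
    }

    static boolean[] demo() {
        ReloadSketch storage = new ReloadSketch();
        boolean first = storage.convertToFull(42L); // initial load: tickers created
        // ...the holder is scheduled for unloading, its futures are erased,
        // then it is revoked and the chunk is reloaded from disk as a brand
        // new object, while loadedChunks still contains the position...
        boolean second = storage.convertToFull(42L); // skipped: no new tickers
        return new boolean[] { first, second };
    }

    public static void main(String[] args) {
        boolean[] r = demo();
        System.out.println(r[0] + " " + r[1]); // prints true false
    }
}
```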
In order to provoke this issue, run for example

/tp ~1000 ~ ~
/tp ~-1000 ~ ~

in quick succession, so that the ChunkHolders do not save and unload in-between. This can be achieved by pausing the server thread, for example. The first tp will erase all generation Futures and the second tp will then regenerate them and reload the chunk from disk.
In the report, it is noted that this seems to happen near nether portals. I'm not really sure how this is correlated. Most certainly, the relevant feature of nether portals is the PORTAL tickets created upon using them. However, I don't know how they enter the equation. They might play a similar role to the POST_TELEPORT ticket of the previous section, or it might be something else.
3. Chunks might drop below FEATURES stage during initial lighting
While MC-214793 is actually caused by another concurrency bug, namely MC-224894, it might still be worthwhile to explain why this issue only showed up recently, even though the other bug is way older than 1.16.
Chunks below the FEATURES stage, i.e., chunks for which the FEATURES generation Future does return UNLOADED_CHUNK, are considered opaque by the lighting engine. Due to MC-224894, chunks are not kept in this required stage during initial lighting and hence might become opaque to the lighting engine.
However, due to property 1, in 1.16 the chunk always stayed in FEATURES
stage once it was scheduled for initial lighting (and even was kept from unloading due to the pending light task). What could have happened even in 1.16 is that neighbor chunks might be unloaded between starting the initial lighting and finishing, which would cause glitches at the border. However, these neighbor chunks were kept loaded by the light ticket that is only removed (wrongly) at the start of the initial lighting, so the time frame was usually way too small, given that the actual unloading usually requires some extra ticks.
On the other hand, in 1.17 the chunk will immediately lose its FEATURES status once the light ticket is removed (and the player is sufficiently far away), hence triggering the issue with a relatively small amount of work required by the server thread, which thus gives a large enough time frame for the issue to happen.
Furthermore, note that in 1.16 the light generation stage can be aborted before execution if any pending dependency was aborted due to ticket levels dropping too low. In 1.17 this is no longer the case, since generation Futures don't get aborted directly. Hence, 1.17 processes initial lighting more often even if the player is already far away, compared to 1.16, increasing the chance for glitches at the chunk borders.
MC-224894 already describes how to trigger this lighting issue alone. However, MC-214793 mentions corrupted features in conjunction with this. I am not sure what is happening here, but I think this is a combination of the pure lighting issue and the previous section (progress being reverted). For example, one might argue that chunks which do make it to the LIGHT stage are kept from unloading a bit longer than their neighbors (which already finish after the FEATURES stage), due to the additional Future that is waited upon by the unloading code. Hence, these ChunkHolders might still be alive while their neighbors are already saved and unloaded, and are hence susceptible to the previous issue, which then regenerates these chunks completely, erasing all features that were leaking in from neighbors. However, as the light data is stored externally in the lighting engine, it will stay broken and not be regenerated. The neighbors were already saved to disk and will hence not need to regenerate, so they keep their features, causing corruption at the boundary.
Anyway, this is just a wild guess and probably not the whole story. But I think it's not too important to know precisely what's going on here. Debugging this is a real nightmare due to the rather random nature of everything involved and light tickets interacting with the whole argument.
Conclusion
The actual bug should be reasonably easy to understand and fix, e.g., by reverting to the 1.16 behavior. On the other hand, there might be good reasons for this change, so some more sophisticated solution might be necessary.
In any case, I hope these explanations have made clear the connections to the other 3 bugs and improved understanding of the general concurrency issues.
Best,
PhiPro
Comments


Fixed in 21w18a?

Whoops, of course 1.16.5 is not affected. Guess all my other reports affected older versions as well, so I clicked it kinda automatically.

The implemented fix for this issue did not revert to the 1.16 code, but instead just dropped the abort part altogether:
net.minecraft.server.world.ChunkHolder.java
protected void tick(ThreadedAnvilChunkStorage chunkStorage) {
    ...
    Either<Chunk, ChunkHolder.Unloaded> either = Either.right(new ChunkHolder.Unloaded() {...});
    for (int i = bl2 ? chunkStatus2.getIndex() + 1 : 0; i <= chunkStatus.getIndex(); ++i) {
        completableFuture = (CompletableFuture)this.futuresByStatus.get(i);
        if (completableFuture == null) {
            this.futuresByStatus.set(i, CompletableFuture.completedFuture(either));
        }
    }
    ...
}
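The remaining hole — a pending future that is already doomed through a dependency — can be demonstrated with two chained CompletableFutures (again a standalone sketch with invented names, not game code):

```java
import java.util.concurrent.CompletableFuture;

public class IndirectAbortSketch {
    static String demo() {
        // A dependency, standing in for a getRegion(...) call on the
        // chunk's neighborhood.
        CompletableFuture<String> region = new CompletableFuture<>();
        // The generation future builds on it; with the 21w18a fix, nothing
        // ever completes it directly anymore.
        CompletableFuture<String> generation = region.thenApply(
                chunks -> chunks.equals("UNLOADED") ? "UNLOADED_CHUNK" : "CHUNK");

        // The server thread observes a pending future and (now wrongly)
        // concludes it will eventually hold a valid chunk:
        assert generation.getNow(null) == null;

        // But the ticket level dropped at some point in the past, so the
        // dependency resolves to UNLOADED, e.g. on another thread:
        region.complete("UNLOADED");
        return generation.join();
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints UNLOADED_CHUNK
    }
}
```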
Unfortunately, this breaks property 2 above. (Guess I was a bit too sloppy when sketching the argument for why it holds in 1.16, and I honestly didn't spot the issue either when briefly looking at the implemented solution.)
The problem is that there are still some mechanisms that can indirectly abort the Futures, namely ThreadedAnvilChunkStorage.getRegion(...), which returns an UNLOADED_CHUNK if any of the requested chunks has a ticket level below EMPTY (note that the chunk does not need to be completely unloaded for the check to fail, as the code only checks the live chunk holder map, but does not attempt to revoke any chunk), or ThreadedAnvilChunkStorage.convertToFullChunk(...), which does abort if the chunk has a ticket level below FULL.
This can lead to generation Futures that are still pending (for example when looking at them for creating the ticking-Future) but are already indirectly aborted through their dependencies (because the ticket level dropped too low at some point in the past), while this information might not yet have been propagated, e.g., because it needs to pass through other threads. This can hence break property 2.
Note that in 1.16 this does not happen, since Futures are directly aborted when the ticket level drops too low. So one can conclude from observing a still pending generation Future that the ticket level did not drop (below the respective status) since creation of the Future, and then conclude from this that the Future is also not indirectly aborted. In 21w18a this argument does not work, since Futures are not directly aborted and hence do not allow any conclusion about past ticket levels.
As a consequence, symptom 1 (ticking-Futures might never complete) can still occur (whereas the other 2 symptoms were caused by failure of property 1, which is indeed fixed). This was indeed reported in MC-224986.
In the following I will present some very artificial steps to reproduce the issue deterministically. (Unfortunately, these steps have some chance of failure, since there is no way to prohibit the server thread from looking at chunks around the player (even in spectator mode), so these steps will unintentionally pause the server thread even when only explicitly pausing the worldgen thread. This leads to a timing issue which can cause the steps to fail. Nevertheless, they worked quite reliably for me.)
1. Create a new world (these steps will only work when generating new chunks, not when reloading them).
2. We need a bunch of breakpoints in order to force some bad timing between the worldgen and server threads. It is important that the breakpoints are configured to only pause the hitting thread, and not all threads. Concretely, we will place these in the following method:
net.minecraft.server.world.ThreadedAnvilChunkStorage.java
private CompletableFuture<Either<Chunk, ChunkHolder.Unloaded>> upgradeChunk(ChunkHolder holder, ChunkStatus requiredStatus) {
    ChunkPos chunkPos = holder.getPos();
    CompletableFuture<Either<List<Chunk>, ChunkHolder.Unloaded>> completableFuture = this.getRegion(chunkPos, requiredStatus.getTaskMargin(), (i) -> {
        return this.getRequiredStatusForGeneration(requiredStatus, i);
    });
    ...
    Executor executor = (runnable) -> this.worldGenExecutor.send(ChunkTaskPrioritySystem.createMessage(holder, runnable));
    return completableFuture.thenComposeAsync((either) -> {
        return (CompletableFuture)either.map((list) -> {
            try {
                CompletableFuture<Either<Chunk, ChunkHolder.Unloaded>> completableFuture = requiredStatus.runGenerationTask(executor, this.world, this.chunkGenerator, this.structureManager, this.serverLightingProvider, (chunk) -> this.convertToFullChunk(holder), list);
                this.worldGenerationProgressListener.setChunkStatus(chunkPos, requiredStatus);
                return completableFuture;
            } catch (Exception var9) {
                ...
            }
        }, (unloaded) -> {
            this.releaseLightTicket(chunkPos);
            return CompletableFuture.completedFuture(Either.right(unloaded));
        });
    }, executor);
}
3. After generating the world, place a breakpoint on the requiredStatus.runGenerationTask(...) line. This will eventually pause the worldgen thread and prohibit chunks from finishing generation.
4. /tp ~1000 ~ ~. This should now trigger the breakpoint on the worldgen thread. Chunks are now frozen in some early generation stage.
5. /tp ~-1000 ~ ~ while the worldgen thread is still paused. Leaving the region will eventually cause the ThreadedAnvilChunkStorage.getRegion(...) calls to produce UNLOADED_CHUNKs.
6. Next, move the breakpoint to the this.getRegion(...) line (removing the old one) and continue execution. This should trigger the breakpoint on the server thread. Chunks can now finish their current execution step and are then all waiting on getRegion(...) for their next step.
7. Move the breakpoint a last time to the this.releaseLightTicket(...) line (again removing the old one) and continue execution. The breakpoint will now trigger again on the worldgen thread. (This step may fail if one has bad luck, see the remark in the beginning. Just repeat the whole process in this case.) All the pending getRegion(...) calls will now produce UNLOADED_CHUNK results, but this information is not propagated due to the worldgen thread being paused.
8. /tp ~1000 ~ ~. This will now recreate the ticking-Futures in the target region, using the still pending, but already indirectly aborted generation Futures.
9. Finally, remove the breakpoint and continue execution. All the Futures, including the ticking-Futures, will now be completed with an UNLOADED_CHUNK (verify this for example by looking at the chunk holder map), hence triggering the chunk loading issue (symptom 1).
This problem shows that it is necessary to keep track of the history of the ticket level, either by directly storing that information in the Future by aborting it like 1.16 does, or by using some other external tracking, which then needs to extend still pending Futures with some retry mechanism upon increasing ticket level (or something in that direction).
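The external-tracking variant could look roughly like the following. This is purely a hypothetical sketch of the suggested direction; withRetry, the level check and the reschedule supplier are all invented names, not proposals for actual method signatures:

```java
import java.util.concurrent.CompletableFuture;
import java.util.function.BooleanSupplier;
import java.util.function.Supplier;

public class RetrySketch {
    // Hypothetical helper: if the stored future surfaces an abort marker
    // but the ticket level has recovered in the meantime, reschedule
    // generation instead of handing the marker to the caller.
    static CompletableFuture<String> withRetry(
            CompletableFuture<String> stored,
            BooleanSupplier levelStillSufficient,
            Supplier<CompletableFuture<String>> reschedule) {
        return stored.thenCompose(result -> {
            if (result.equals("UNLOADED_CHUNK") && levelStillSufficient.getAsBoolean()) {
                return reschedule.get(); // ticket level recovered: retry
            }
            return CompletableFuture.completedFuture(result);
        });
    }

    public static void main(String[] args) {
        CompletableFuture<String> aborted =
                CompletableFuture.completedFuture("UNLOADED_CHUNK");
        String result = withRetry(aborted, () -> true,
                () -> CompletableFuture.completedFuture("CHUNK")).join();
        System.out.println(result); // prints CHUNK
    }
}
```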
Best,
PhiPro

This issue actually still exists in 1.17-pre3 to some extent. It took me quite a while to find out what actually changed in pre3 at all, but as far as I can tell, the relevant change is that the getRegion(...) calls are now done eagerly instead of lazily, or more precisely, ThreadedAnvilChunkStorage.getChunk(...) executes immediately instead of waiting for the previous worldgen stage to finish. As a consequence, they should indeed no longer be able to produce UNLOADED_CHUNK results, by construction of the ticket system, hence solving the example in my previous comment.
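The eager-versus-lazy distinction can be illustrated with a toy ticket level (all names and numbers here are made up; in the real ticket system, smaller numbers mean higher levels, which the sketch mimics):

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicInteger;

public class EagerVsLazySketch {
    // Smaller numbers mean higher (more important) ticket levels; 33 is
    // treated as "high enough" here, purely for illustration.
    static final AtomicInteger ticketLevel = new AtomicInteger(33);

    // Stand-in for the getRegion(...)/getChunk(...) dependency lookup.
    static String checkRegion() {
        return ticketLevel.get() <= 33 ? "CHUNKS" : "UNLOADED";
    }

    // Lazy (before 1.17-pre3): the lookup runs only after the previous
    // stage finishes, so it can observe a ticket level dropped in-between.
    static String lazyDemo() {
        ticketLevel.set(33);
        CompletableFuture<String> prev = new CompletableFuture<>();
        CompletableFuture<String> next = prev.thenApply(r -> checkRegion());
        ticketLevel.set(45); // level drops before the previous stage is done
        prev.complete("done");
        return next.join();
    }

    // Eager (1.17-pre3): the lookup runs at scheduling time, so a later
    // drop can no longer turn this stage into an UNLOADED result.
    static String eagerDemo() {
        ticketLevel.set(33);
        CompletableFuture<String> prev = new CompletableFuture<>();
        String region = checkRegion();
        CompletableFuture<String> next = prev.thenApply(r -> region);
        ticketLevel.set(45);
        prev.complete("done");
        return next.join();
    }

    public static void main(String[] args) {
        System.out.println(lazyDemo());  // prints UNLOADED
        System.out.println(eagerDemo()); // prints CHUNKS
    }
}
```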
As a side remark: this change might have some negative impact on performance, as it gets increasingly difficult to abort already scheduled worldgen steps. Lazy evaluation of the getRegion(...) calls (or rather of the getChunk(...) code) was at least able to abort all but the lowest pending worldgen stages (which still misses a bunch of steps that could possibly be aborted (MC-183841)), whereas with eager evaluation no stages at all are aborted. I haven't actually looked into actual numbers to tell whether or not this is relevant. Might be a good idea to investigate.
Anyway, back to the original issue. There is still one place that can produce UNLOADED_CHUNK results when generating new chunks, namely ThreadedAnvilChunkStorage.convertToFullChunk(...). The following steps show how this can be employed to prevent a single chunk from loading to the client:
1. Create a new world (these steps will only work when generating new chunks, not when reloading them).
2. Set a breakpoint on the return completableFuture; line inside ThreadedAnvilChunkStorage.upgradeChunk(...) (see the code snippet in the previous comment). Configure this breakpoint to only pause the hitting thread and only trigger on the condition requiredStatus == ChunkStatus.FULL.
3. Run the following two commands in chained command blocks, so that they execute in the same tick:
/tp @p ~1000 ~ ~
/tp @p ~ ~ ~
The first of these commands will trigger chunk loading through the PLAYER (and POST_TELEPORT) ticket. The second will teleport the player back before it tries to access the chunk, which would result in the server thread getting paused (see the timing issue of the previous comment).
4. This will now trigger the breakpoint on the worldgen thread. The returned completableFuture corresponds to the FULL stage due to the breakpoint condition. Since the server thread was not paused, the corresponding call to convertToFullChunk(...) could already finish and produce an UNLOADED_CHUNK result, as the player was already teleported away, so the ticket level of the chunk is below FULL. This can be verified by inspecting the returned completableFuture. (This step might fail due to the POST_TELEPORT ticket, which can keep the ticket level of some of the chunks at FULL. In this case simply continue until hitting a breakpoint where the completableFuture contains an UNLOADED_CHUNK.)
5. /tp @p ~1000 ~ ~. The UNLOADED_CHUNK result is not propagated since it is still stuck in the paused worldgen thread. Teleporting back to the area creates the ticking-Future, which then depends on the still pending Future for the FULL stage and will hence eventually produce an UNLOADED_CHUNK. As in the previous examples, this will prevent the chunk from loading to the client.
6. Remove the breakpoint and continue execution. The chunk where we forced the UNLOADED_CHUNK result will not load to the client, but all the others do.
Best,
PhiPro


Can confirm in 1.19.3 and 23w03a
How is 1.16.5 affected? All the related issues are new to 1.17.