The overall issue dealt with in this report is that block changes are not properly synchronized with the lighting engine; in particular they are not properly synchronized with the lightmap handling which determines the region the lighting engine can operate on and the region where skylight optimizations are applicable. As shown below, this can lead to persistent lighting glitches.
In the following we'll look into 3 variants of this general issue that are relevant for my proposed solution. I attached a world download containing a simple setup for all 3 problems with the labeling as below. Due to the multithreading nature of the problem, the issues are not necessarily reliably reproducible, although they were on my system; it would be nice if someone else could confirm that the setup works on their system. Below I'll try to extract the relevant points.
A boring solution to these issues would be to employ some copy-on-write data structure to synchronize block changes to the lighting thread in a more controlled fashion. However, that might add quite a lot of overhead in order to fix some bug that might not occur that often in practice. The solution developed below should be much more efficient and in fact not very complicated.
While the first issue is not directly a multithreading issue, it is still relevant for the solution of the general bug.
Problem 1: Lightmaps get removed too early
When the last block of a subchunk gets removed, the lighting engine is told to remove the corresponding lightmap (if there are no nearby blocks). Only afterwards it is told to reevaluate the removed block. However, at this point the lightmap was already removed, so the lighting engine simply ignores the light check; even if it wouldn't be ignored, all the relevant information would be erased.
This can be visualized as follows:
Create a new redstone-ready world
/setblock 7 82 7 minecraft:sea_lantern
Destroy the sea lantern
Observe that the block light value is still 15. This is actually not directly the bug we are after. This part of the issue will disappear upon reloading the world or triggering some block light update. We will come back to this later.
[media]
Reload the world to fix the previous point.
Note that the light is still broken in subchunk
[media]0 4 0
As soon as we break the sea lantern, the lightmap of that subchunk gets removed, so the lighting engine cannot process the block update to reduce the blocklight level. Hence the light stays in subchunk 0 4 0
which doesn't have its lightmap removed because of nearby blocks. The blocklight level for subchunk 0 5 0
is set to 0 by removing the lightmap, as we see after reloading the world.
The relevant code pieces are
net.minecraft.world.chunk.WorldChunk.java
@Nullable
public BlockState setBlockState(BlockPos pos, BlockState state, boolean bl) {
...
boolean bl2 = chunkSection.isEmpty();
BlockState blockState = chunkSection.setBlockState(i, j & 15, k, state);
...
boolean bl3 = chunkSection.isEmpty();
if (bl2 != bl3) {
this.world.getChunkManager().getLightingProvider().updateSectionStatus(pos, bl3);
}
...
}
net.minecraft.world.World.java
public boolean setBlockState(BlockPos pos, BlockState state, int flags) {
....
BlockState blockState = worldChunk.setBlockState(pos, state, (flags & 64) != 0);
...
this.profiler.push("queueCheckLight");
this.getChunkManager().getLightingProvider().checkBlock(pos);
this.profiler.pop();
}
...
Chunk.setBlockState(...)
calls updateSectionStatus(...)
to issue the creation and removal of lightmaps to the lighting engine. World.setBlockState(...)
first calls Chunk.setBlockState(...)
and only afterwards checkBlock(pos)
which processes the block update for the lighting engine.
To fix this issue, we should first issue the lighting check and only afterwards the removal of the lightmap. By a similar argument, we should first issue the creation of a lightmap and then the lighting check for a newly added block. As we will see later, we must in fact issue the creation of a lightmap before changing the block via chunkSection.setBlockState(...)
.
Hence I propose to move the lightmap handling (updateSectionStatus(...)
) to World.setBlockState(...)
and change the order to
Issue lightmap creations if necessary
Call
Chunk.setBlockState(...)
Issue light checks
Issue lightmap removals if necessary
Also, the lighting engine should process the issued operations in this order. Keeping only a single call toupdateSectionStatus(...)
as in the current code and adding some priority system to achieve the right order might work in the single threaded case, but not in the multithreaded case, as one operation might be processed before the others get issued.
There is a similar version for skylight included in the attached world, however it requires some more operations as there is some special handling for adding and removing skylight maps.
For completeness, let's discuss why in the example above the light stayed in subchunk 0 5 0
until reloading the world. The main thread does not read light values directly from the internal lightmaps, but gets its own immutable copies that are updated inside
net.minecraft.world.chunk.light.LightStorage.java
protected void notifyChunkProvider() {
if (!this.field_15802.isEmpty()) {
M chunkToNibbleArrayMap = this.lightArrays.copy();
chunkToNibbleArrayMap.disableCache();
this.uncachedLightArrays = chunkToNibbleArrayMap;
...
}
field_15802
gets populated whenever a light value changes or a new lightmap is added (which can contain precalculated light values and hence cause a change of light values as well), but not when a lightmap is removed. Hence the immutable copy of the lightmaps does not get updated until reloading the world or triggering some blocklight update, as observed above.
By itself this isn't really a bug. If everything else worked correctly, lightmap removal shouldn't change light values, hence there shouldn't be a reason to create a new copy of the lightmaps, other than freeing some memory.
Images produced by the attached world:
[media][media]
Problem 2: Blocks can be seen too early
Since the lighting engine reads blocks directly from the main thread world, it can see block changes "immediately". In particular, it can see blocks before all relevant lightmaps have been created; even though lightmap creation should be issued before the actual block change is carried out, as noted above, the lighting engine might encounter a new block while processing other light updates. Although the issued lightmap creation will be visible to the lighting thread at that point (by looking at the queued operations), there is no reason why this operation should have been processed. Even if we processed all pending operations just before the block lookup, it could happen that the block gets placed and the lightmap creation gets issued inbetween those two operations.
This can lead to the problem that the lighting engine can see and take into account newly placed blocks although the relevant adjacent lightmaps, which that block can affect, have not yet been created, so the light updates will get stuck at those neighbor subchunks. Later on, the relevant lightmaps will be added and a light check for the newly added block will be carried out. However, the light is already correct around this block; the lighting errors occur at the boundary to the neighbor subchunk. Hence, the light check at the newly added block will succeed and no further updates will be carried out, keeping the errors at the chunk boundary.
For skylight this also causes the skylight optimization to be applied wrongly; the optimization only works 16 tiles away from non-air blocks, so is not applicable around those newly placed blocks. But since the lighting engine has not been told yet about those new non-air blocks, it might apply the optimization nevertheless, causing errors. This is what we will demonstrate in the following.
The setup is as follows:
We have some large stone platform high up in the sky (at y=255 in order to cause long light propagation times). Subchunk 1 6 1
has a lightmap that is kept alive by nearby blocks, but subchunk 1 5 1
doesn't. Inside a single tick
Punch a small hole in the stone platform
Trigger the lighting engine to propagate the light downwards from this hole by causing some more light checks
Cause a bunch of other lightmap creations and light updates, so the priority system doesn't accidentally move the next step before processing the updates of the previous one. Don't cause too many updates, as otherwise the light propagation of the previous step will be finished before we can do cool stuff.
The lighting thread will now propagate the light downwards from the hole and hence not process any operation caused by the next step until done
Place a platform of leave blocks in subchunk
1 6 1
below the holeOn its way down, the light will now see the leave blocks and correctly incorporate them into the result. As subchunk {{ 1 5 1}} doesn't have a lightmap, the skylight optimization will kick in and simply copy down the light values from subchunk
1 6 1
.
After passing the leave blocks, the light should decay to 0 after 15 blocks. But because of the wrongly applied optimization, it travels a whole 16 blocks (or arbitrarily many) without any decay.Now the lighting engine adds the lightmap for subchunk
1 5 1
which is initialized by copying down the values from subchunk1 6 1
, ie. the values as produced by the skylight optimization. Also, light checks are carried out for the leave blocks. But as the light was already correct around those, nothing changes and the light in subchunk1 5 1
stays broken.Add some blocks in subchunk
[media]1 5 1
to better visualize the issue (F3 would work as well)
What we learn from this is that we should make sure that the lighting engine doesn't see any non-air blocks for subchunks that it has been told to be empty, as this can conflict with the skylight optimization.
The lighting engine already keeps track of which subchunks should be empty according to the main thread and spawns lightmaps accordingly. I propose to employ this information inside the block lookup used by the lighting engine: First test whether the queried subchunk should be empty and return air in that case. Otherwise, proceed as normal. This prevents the described issue that blocks get seen before all relevant lightmaps have been added.
Unfortunately, things can get a bit more delicate as Problem 3 shows.
Problem 3: Lightmap removal must be cancellable
The basic issue encountered here is that lightmap removal should be canceled if a subchunk gets repopulated before the lightmap is actually removed, rather than removing the lightmap and then recreating it, as this would erase information.
In order to get a better understanding, let's look at the setup in the attached world. The situation is mainly the same as in Problem 2: we have a stone platform high up in the sky and subchunk 1 6 4
has a light map. This time subchunk 1 5 4
has a light map that is kept alive by some nearby block, as well. Similarly as above, inside a single tick
Punch a small hole in the stone platform
Trigger the lighting engine to propagate the light downwards from this hole by causing some more light checks
Cause a bunch of lightmap creations and light updates to prevent priority reordering
The lighting thread will now propagate the light downwards from the hole and hence not process any operation caused by the next steps until done
As in Problem 2, we will spawn a leave platform while the lighting engine is propagating the light downwards in order to cause some issues. However, we want the lighting engine to remove the lightmap for subchunk
1 5 4
before it gets kept alive by the leave platform, but after propagating the light downwards, in order to produce the issue outlined above. Hence we need some other steps before spawning the leave platform.
Now, issue removal for the lightmap of subchunk1 5 4
by deleting the blocks that keep it alive, but in such a way that the lightmap for subchunk1 6 4
does not get removed.Again cause some more lightmap creations and light updates to prevent priority reordering
Now spawn the leave platform in subchunk
1 6 4
below the holeOn its way down, the lighting engine will now see and incorporate the leaves blocks, as in Problem 2. This time, however, subchunk
1 5 4
does have a lightmap, so we don't wrongly apply the skylight optimization, but correctly propagate the skylight to subchunk1 5 4
. Subchunk1 4 4
now correctly stays dark.Now the lighting engine processes the issued removal of the lightmap at
1 5 4
. The issued recreation of the lightmap through the leave platform is not yet processed, as we issued a bunch of other lightmap creations, so it is still stuck in the queue.Now we finally process the issued recreation of the lightmap at
[media]1 5 4
. However, at this point the lighting information has already been erased and the new lightmap for1 5 4
will be initialized by copying down the values from1 6 4
, hence the light below the leave platform does not decay, looking as in Problem 2. However, subchunk1 4 4
will stay dark, in contrast to Problem 2, showing that we indeed observe a different issue here.
This issue is a mixture of Problem 1 and 2. In some sense, the issue is that the lighting engine sees the leave blocks before it is told that subchunk 1 5 4
is not empty (i.e. before it processes the lightmap creations). However, the solution for Problem 2 of simply treating that block as air does not apply here, because at the time the leave block is seen, that subchunk is not marked empty, but gets only marked empty later on. In some sense this is also related to Problem 1, as lightmaps that are relevant to the leave blocks get removed after the lighting engine sees the leave blocks but before it processes the removal of the leave blocks.
The general lesson learned here is that the lightmap creation issued by spawning the leave platform should cancel the previous removals of lightmaps.
The precise order of operations needed is summarized in the following section.
Proposed Solution
As noted above, the main thread should carry out operations in the following order:
Issue an operation to mark the subchunk as non-empty, as necessary
Call
Chunk.setBlockState(...)
Issue light checks
Issue an operation to mark the subchunk as empty, as necessary
It will become clear in a moment why the subchunk must be marked non-empty not only before the light check, but also before the block gets written to the chunk.
For block lookups, the lighting engine should first test whether the queried subchunk should be empty, according to its current knowledge, and in this case return air.
The lighting engine should carry out operations in the following order:
When starting an update cycle, take a snapshot of the operation queue and don't process any later updates. This makes sure that we don't process light checks without processing the corresponding lightmap creation or lightmap removals without the corresponding light checks.
Process all operations that mark subchunks as non-empty
Process all light checks and propagate all light updates
In case that only a limited number of light updates is carried out at a time, finish the current set of updates before processing any new operation
After all pending updates have been processed, process the operations that mark subchunks as empty. In this step, we need to take another snapshot of the operation queue and search it for operations that mark subchunks as non-empty, and cancel operations accordingly. This is needed as shown by Problem 3.
The correctness of this last step requires some small argument, as follows. Suppose some subchunk has been marked as empty. We consider two cases:The lighting engine has not seen any block that was added to the subchunk after the main thread issued the operation to mark it as empty. In this case, we can safely mark this subchunk as empty, since we processed all pending light updates that were caused by the removal of blocks and no new blocks have been observed.
The lighting engine does see some block that was added to the subchunk after the main thread issued the operation to mark it as empty. Now we are in the situation of Problem 3. The important point is now that we can use acquire-release semantics on reading this newly added block and the corresponding write on the main thread. Because the operation to mark subchunks as non-empty is issued before writing the block ⚠️ , acquire-release semantics make sure that we will see this operation in the operation queue after observing the block. Hence we will find the operation in the second snapshot of the operation queue and correctly cancel the operation.
In fact, acquire-release semantics are not precisely guaranteed as reading blocks isn't even threadsafe (MC-162080), however in practice they will hold on x86 architectures as long as MC-162080 doesn't occur.
Leave all remaining operations to mark subchunks as non-empty for the next cycle or process them right away.
Unfortunately, as far as I can tell, this is not so easily achievable with the current design of the multithreading code. That is because all operations are wrapped inside some generic runnable tasks and put into a single task queue, so it is not straightforward to scan this task queue without actually processing its elements.
I propose to introduce more specialized "task queues" for different types of tasks which then can be scanned more easily. More concretely, I propose to add 3 sets (and possibly more for other stuff not covered here) lightChecks
, markEmpty
, markNonEmpty
. When the lighting engine starts its update cycles, it takes ownership over all 3 sets and gives the main thread new ones to populate. Now we can easily process markNonEmpty
and lightChecks
. When processing markEmpty
, we again take ownership over the new markNonEmpty
and cancel any subchunks occurring in both. Now we can either merge the remaining operations of markNonEmpty
back into the current set for the main thread to populate or simply keep it and merge it with the next update cycle.
The main thread then populates the sets as follows:
Use some lock, so that we don't write to the sets after the lighting engine has taken ownership
Add light checks to
lightChecks
(obviously 😛)When issuing a new empty subchunk, add it to
markEmpty
When issuing a new non-empty subchunk, first try to cancel a corresponding entry in
markEmpty
and otherwise add it tomarkNonEmpty
.Issuing an empty subchunk must not cancel any operation in
markNonEmpty
, as this operation might be needed to cancel a removal in the currently ongoing cycle. Canceling entries ofmarkEmpty
by non-empty subchunk, on the other hand, is unproblematic.
I hope this report was more or less understandable. If anything is unclear, please don't hesitate to ask 🙂
Best,
PhiPro
Related issues
relates to
Attachments
Comments


Just some random thought I had while implementing some other bugfixes. I put this here preemptively in case this point comes up in your discussions as well.
One could say that it sounds rather inefficient to first carry out all light updates before removing lightmaps, just to throw away the results immediately afterwards. Once we know that a subchunk and all its neighbors are empty, we know rather easily what the the lighting should look like in the end (at least for blocklight, so the result will be 0). So one could propose to schedule removal of the lightmap before scheduling the light updates, as is currently done in the vanilla code, hence optimizing away all the propagations for the lightmap that will be removed anyway. This might have even been the original idea behind the current vanilla code.
The main issue with this approach is that it not only eliminates all propagations inside the removed lightmap but also propagations to all neighbors, which might not be removed. Whenever you do such an optimization overriding light values, you need to inform all relevant neighbors about the changes. Concretely, with this approach you would need to propagate the new values to all neighbors (that still have a lightmap) whenever removing a lightmap. Even if this would not cause any actual propagation to take place, as in most cases you can expect that the ligthvalues don't actually change upon removing a lightmap, the lighting engine would still have to check a lot of blocks, just to come to the conclusion that there is nothing to do.
In practice, it should be very likely that the geometry of non-air blocks is connected and changes continously with time (whatever that means exactly). In this scenario, you would indeed only remove lightmaps after all their nearby blocks have been replaced with air some time ago, hence their light would be trivial. So in this case the issue discussed in this bug report does not occur and it does not matter whether we first remove blocks and carry out their light updates or first remove lightmaps. In particular, the hypothetical optimization considered above would not actually optimize anything, as the light was already trivial before the lightmap removal. However, since the lighting engine does not know this, it would still have to propagated potential changes to all neighbors. So this hypothetical optimization would in fact increase the workload in the expected case.
So in conclusion I would expect this hypothetical optimization to be actually more expensive in most practical cases, compared to the solution proposed in the report, due to the compensation needed for manually chaning light values.
One could then try to apply this optimization only to lightmaps that are sufficiently far away that also their neighbors will be removed, i.e., those which are 3 subchunks away from any non-air block. In this case, we would not need to propagate changes to the neighbors, since those would be removed as well (at least for blocklight; for skylight one would still need the far-reaching downwards propagation).
However, I don't think that this would be worth the effort, as it will only be applicable for very discontinous geometry transformations.
Also note that this discussion only concerns the first part of the bug report. The more difficult multithreading part of the issue is unaffected by this approach.

In 1.16.2

It turns out that the workaround for the easy part 1 of this issue that I described in this comment does not work. This is because Vanilla schedules light checks to the POST_UPDATE
phase, so scheduling lightmap removal to the POST_UPDATE
phase aswell does not guarantee that all pending light updates have been carried out before the lightmaps are removed. Also, light checks cannot be moved to the PRE_LIGHT
phase as they would otherwise be carried out before the relevant lightmaps are created, which would again cause issues.
Fortunately, this part of the issue is automatically solved by my proposed solution for MC-196725. Note that the argument given there at the end regarding the interaction of skylight optimization and unloaded chunks also shows that my proposed solution mitigates this easy part of the issue here.
One important point in the argument was that non-optimizable subchunks are always locally correct. This does not hold in the current Vanilla code, precisely because of the issue discussed here, i.e., non-trivial lightmaps getting removed, hence changing values without informing neighbors. This is fixed by my proposed solution for MC-196725 which only removes lightmaps once they are trivial.
Also note that it is not an issue that the distance tracking of non-empty subchunks gets updated before all pending light updates are carried out. The argument only requires that if in the end state there is a non-air block then the surrounding subchunks must not be optimizable. Hence it is sufficient to mark a subchunk as empty after all blocks have been removed, but possibly before the corresponding light updates have been fully processed. Which is what the Vanilla code currently does.
Hence no extra treatment for the easy part of this issue is required once implementing my solution for MC-196725.
In fact, the argument given in MC-196725 can be greatly simplified for what we need here, since we do not need to deal with unloaded chunks for this matter. Hence the region that was constructed in the argument is simply (-1 -1 -1)..(1 1 1)
in this case, so the only interesting parts of the boundary are the top faces. And for those we didn't even use that the subchunks at y=-1 were empty, hence the more efficient specification of the skylight optimization, as briefly mentioned at the end of MC-196725, would be sufficient here.

In 1.16.4 and 20w48a. I was able to reproduce all 3 issues mentioned using the attached map in both of these versions.

Is this still an issue with the new lighting engine?