mojira.dev

JellySquid


Comments

I've created a GitHub repository to manage future releases of the shader code optimization resource pack in order to avoid additional clutter on this thread. Any issues caused by it should be reported on its respective issue tracker. I've just published the r3 pack attached to this issue as version 1.1.1 which you can now download from here.

For Mojangsters looking at this issue: I have explicitly licensed the shader code here (with permission from @gegy1000 for their additional fixes) under the CC0 Public Domain license. Feel free to use what's there.

@PitNox The issue you describe is caused by MC-186064, which is another bug in vanilla's new framebuffer blend code. Edit: this was incorrect; it was caused by a sporadic bug in my own shader code and has since been fixed.

@JustBoyann Can you try the updated resource pack? It should perform better on Intel graphics cards.

@DieselDorky16 There is a resource pack included as an attachment

[media]

which might help you. Any feedback on it would be useful.

Seeing GPU power consumption and utilization roughly double between 20w22a and the previous snapshot on my workstation. My game's overall frame rate has not changed due to a CPU bottleneck caused by the number of chunk draw calls, but the GPU is much busier when playing in fullscreen (1080p) than before. I'm not locked to v-sync here despite the misleading frame rates.

GPU stats

Version     VRAM Allocated   GPU TDP    GPU Utilization   Frame rate
20w21a      312MiB           36W        37%               59 fps
20w22a      423MiB           63W        71%               59 fps

System Specs

Arch Linux (5.6.14 kernel)

AMD FX-8370 (8 cores) @ 4.3GHz

24GB DDR3 RAM (2x8GB + 2x4GB)

GTX 960 (2GB VRAM)

1920x1080 Display

It looks like the culprit of this massive increase in GPU usage comes down to the new framebuffer blending technique used to resolve some translucency issues between render layers (i.e. items behind fluids). I've made a small resource pack which applies some optimizations to the shader in question to help reduce GPU usage. Primarily, it eliminates the secondary indexing table used to sort the fragments on each framebuffer and replaces it with a faster insertion sort algorithm. The blend code itself accounts for only a very small portion of the shader's overall execution time.
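As a rough illustration of the sorting change (plain Java rather than the actual GLSL, with hypothetical names and a hypothetical fragment layout), sorting a small per-pixel fragment list in place by depth with insertion sort avoids the bookkeeping of a separate index table, and for the handful of fragments a pixel typically holds, insertion sort's low constant overhead wins:

```java
// Hypothetical sketch: depth-sort a per-pixel fragment list in place with
// insertion sort instead of maintaining a secondary index table. Fragments
// are represented here as parallel arrays of depth values and color indices.
final class FragmentSort {
    // Sorts the first `count` fragments by ascending depth, carrying the
    // color index along with each depth value.
    public static void sortByDepth(float[] depths, int[] colors, int count) {
        for (int i = 1; i < count; i++) {
            float d = depths[i];
            int c = colors[i];
            int j = i - 1;
            // Shift entries with a greater depth one slot to the right.
            while (j >= 0 && depths[j] > d) {
                depths[j + 1] = depths[j];
                colors[j + 1] = colors[j];
                j--;
            }
            depths[j + 1] = d;
            colors[j + 1] = c;
        }
    }
}
```

In a shader this would operate on small fixed-size local arrays, where the quadratic worst case is irrelevant because the per-pixel fragment count stays tiny.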

On my machine, this patch (in a different scene) makes a considerable improvement to GPU utilization at a render distance of 16 chunks and brings it much closer to 20w21a.

GPU stats with attached resource pack

                    GPU TDP     GPU Utilization   Frame rate
20w21a (Vanilla)    54W         38%               123fps
20w22a (Vanilla)    66W         95%               100fps
20w22a (w/ Patch)   54W         45%               118fps

I've uploaded a resource pack to this issue with the optimized shader code that can be used to help improve performance on affected machines, but it does not fully eliminate the added overhead of this new render pass.

Hey there. I'm the author of Lithium, which was mentioned in the ticket. The specific benchmark used here is a bit of an unrealistic synthetic workload, but it does demonstrate very clearly how recent versions have impacted performance.

The extreme slowness here occurs because the game always performs an expensive shape comparison beneath a rail block to determine whether the rail can stay at a given location (even if the block shape beneath it is a simple full cube) and never caches the result to avoid subsequent computations. This issue is not specific to rails; it also causes significant slowdown for Redstone gate blocks, which are otherwise very simple in comparison.

There are two solutions which could be employed here with varying levels of complexity, though neither is especially invasive.

  • The shape information cache for each block state could be extended to include other properties, such as whether or not it can support a Redstone component or light source on its given faces. This data could be encoded into 6 bits (one for each face) and fit into a single byte for each property, making it essentially free when the information is cached, whereas a simple boolean array for each would require something like 24-28 bytes.

  • Or, the code responsible for checking whether a block can support another component could simply test whether the block's shape is a simple full cube with an equality check and early-exit with true, since full-cube shapes can always support another component. In cases involving other non-simple shapes this performs the same as the current code, and it is only very slightly slower than caching the information in the BlockState, due to the virtual call overhead of retrieving the shape from the block implementation.
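A sketch of the first approach (all names hypothetical, not actual Mojang or Lithium code): the six per-face support flags can be packed into one byte when the block state's shape cache is built, making the later lookup a single bit test:

```java
// Hypothetical sketch: encode "can this face support a component?" for all
// six faces of a block shape into a single byte, one bit per face, so the
// cached answer costs one byte instead of a ~24-28 byte boolean array.
final class FaceSupportCache {
    // Face ordinals 0..5 (e.g. DOWN, UP, NORTH, SOUTH, WEST, EAST).
    private final byte supportBits;

    public FaceSupportCache(boolean[] faceSupportsComponent) {
        byte bits = 0;
        for (int face = 0; face < 6; face++) {
            if (faceSupportsComponent[face]) {
                bits |= 1 << face; // set the bit for this face
            }
        }
        this.supportBits = bits;
    }

    // O(1) bit test once the byte is cached alongside the shape data.
    public boolean canSupport(int face) {
        return (supportBits & (1 << face)) != 0;
    }
}
```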

I use both of these techniques in Lithium. The block state's shape information cache is checked first, and if it can't be used because the shape is non-static, a simple full-cube test is done before falling back to the slow shape equality test.
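The layered strategy described above can be sketched roughly like this (hypothetical names and simplified types; the real Lithium code operates on the game's VoxelShape and BlockState classes):

```java
import java.util.function.Supplier;

// Hypothetical sketch of a layered support check: consult the cached answer
// first, then a cheap full-cube identity test, and only fall back to the
// expensive shape comparison when neither fast path applies.
final class SupportCheck {
    // Stand-in for the game's shared full-cube shape singleton; identity
    // comparison against it is a cheap reference check.
    public static final Object FULL_CUBE = new Object();

    public static boolean canSupportComponent(Object shape,
                                              Boolean cachedAnswer,
                                              Supplier<Boolean> slowShapeTest) {
        if (cachedAnswer != null) {
            return cachedAnswer;      // fast path: shape cache hit
        }
        if (shape == FULL_CUBE) {
            return true;              // full cubes always support components
        }
        return slowShapeTest.get();   // last resort: expensive comparison
    }
}
```

The ordering matters: the cache answers most queries, the identity check catches the common full-cube case when the cache can't be used, and the slow comparison only runs for genuinely irregular shapes.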